Researchers have unveiled a new jailbreaking technique, 'Deceptive Delight,' which successfully manipulates AI models to produce unsafe responses in only three interactions.
Palo Alto Networks’ Unit 42 researchers developed the method by embedding restricted topics within benign prompts, effectively bypassing safety filters. By carefully layering requests, researchers managed to coerce...