Policy Puppetry Prompt Injection

A few days ago I experimented with some jailbreaking techniques, which I share in the repo. I started from a HiddenLayer article published a few weeks ago, in which the research team described a rather creative and ingenious jailbreaking technique for bypassing the safety guardrails and alignment of frontier models. The technique appears to be universal: it works across multiple models with a single prompt and can elicit typically unsafe content or even portions of the native system prompt....

May 15, 2025 · 2 min · 417 words · Me