Scary Agent Skills: Hidden Unicode Instructions in Skills ...And How To Catch Them
There is a lot of talk about Skills recently, both in terms of capabilities and security concerns. However, so far I haven’t seen anyone bring up hidden prompt injection. So, I figured to demo a Skills supply chain backdoor that survives human review.
Additionally, I also built a basic scanner, and had my agent propose updates to OpenClaw to catch such attacks.
Attack SurfaceSkills introduce common threats, like prompt injection, supply chain attacks, RCE, data exfiltration,… This post discusses some basics, highlights the most simple prompt injection avenue, and shows how one can backdoor a real Skill from OpenAI with invisible Unicode Tag codepoints that certain models, like Gemini, Claude, Grok are known to interpret as instructions.