Embrace The Red

Scary Agent Skills: Hidden Unicode Instructions in Skills ...And How To Catch Them

Embrace The Red

1 day 3 hours ago

There has been a lot of talk about Skills recently, both in terms of capabilities and security concerns. However, so far I haven’t seen anyone bring up hidden prompt injection. So I decided to demo a Skills supply chain backdoor that survives human review.

I also built a basic scanner and had my agent propose updates to OpenClaw to catch such attacks.

Attack Surface

Skills introduce common threats, such as prompt injection, supply chain attacks, RCE, and data exfiltration, among others. This post discusses some basics, highlights the simplest prompt injection avenue, and shows how one can backdoor a real Skill from OpenAI with invisible Unicode Tag codepoints that certain models, like Gemini, Claude, and Grok, are known to interpret as instructions.
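To make the idea of catching hidden Unicode Tag codepoints concrete, here is a minimal sketch of a detector. It is not the scanner from the post; the function name find_hidden_tags is made up for illustration. It relies only on the fact that the Unicode Tags block spans U+E0000 to U+E007F and that U+E0020 to U+E007E mirror printable ASCII.

```python
# Unicode Tags block: U+E0000..U+E007F. Code points U+E0020..U+E007E
# mirror printable ASCII once 0xE0000 is subtracted.
TAG_START = 0xE0000
TAG_END = 0xE007F


def find_hidden_tags(text: str) -> list[tuple[int, str]]:
    """Return (offset, decoded_text) for each run of Unicode Tag code points.

    Hypothetical helper for illustration, not the scanner from the post.
    """
    runs = []
    current = []
    start = None
    for i, ch in enumerate(text):
        cp = ord(ch)
        if TAG_START <= cp <= TAG_END:
            if start is None:
                start = i
            ascii_cp = cp - TAG_START
            # Only U+E0020..U+E007E map to printable ASCII; show others as '?'.
            current.append(chr(ascii_cp) if 0x20 <= ascii_cp <= 0x7E else "?")
        elif start is not None:
            # A visible character ends the current hidden run.
            runs.append((start, "".join(current)))
            current, start = [], None
    if start is not None:
        runs.append((start, "".join(current)))
    return runs
```

Running something like this over every file in a Skill before installing it would surface text that a human reviewer cannot see in an editor or diff view.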

OpenAI Explains URL-Based Data Exfiltration Mitigations in New Paper

Embrace The Red

1 week ago

Last week I saw this paper from OpenAI called “Preventing URL-Based Data Exfiltration in Language-Model Agents”, which goes into detail on new mitigations they’ve added.

This is a great read. I like this transparency.

Initial Disclosure in 2023

Nearly three years ago I reported the zero-click data exfiltration exploit to OpenAI. Back in early 2023 OpenAI did not have a bug bounty program, so communication was via email, and unfortunately there was little traction or appetite to fix the problem in ChatGPT. I also reported the same issue to Microsoft as Bing Chat was impacted, and Microsoft applied a fix (via a Content-Security-Policy header) in May 2023 to generally prevent loading of images.

Minting Next.js Authentication Cookies

Embrace The Red

4 weeks ago

In this post, we’ll look at how an adversary can mint authentication cookies for Next.js (next-auth/Auth.js) applications to maintain persistent access to the application as any user.

This matters because of React2Shell, a deserialization vulnerability that allows an adversary to run arbitrary code. Much has been discussed about this vulnerability, and you can read up on the original details from the finder here.

Agentic ProbLLMs: Exploiting AI Computer-Use And Coding Agents (39C3 Video + Slides)

Embrace The Red

1 month 1 week ago

It was great to attend the 39C3 - Power Cycles in Hamburg this year. The Chaos Communication Congress was once again packed with great talks, amazing people, awesome events and side quests - and I even got to present!

You can watch the talk with translation options on media.ccc.de.

I also uploaded the English version to the Embrace The Red YouTube channel. I hope it’s interesting and helpful.

The talk is titled “Agentic ProbLLMs: Exploiting AI Computer-Use and Coding Agents” and is about my security research on vulnerabilities in agentic systems and the Month of AI Bugs with lots of demos.

The Normalization of Deviance in AI

Embrace The Red

2 months 1 week ago

The AI industry risks repeating the same cultural failures that contributed to the Space Shuttle Challenger disaster: Quietly normalizing warning signs while progress marches forward.

The original term Normalization of Deviance comes from the American sociologist Diane Vaughan, who describes it as the process in which deviance from correct or proper behavior or rule becomes culturally normalized.

I use the term Normalization of Deviance in AI to describe the gradual and systemic over-reliance on LLM outputs, especially in agentic systems.

Antigravity Grounded! Security Vulnerabilities in Google's Latest IDE

Embrace The Red

2 months 2 weeks ago

Last week Google released an IDE called Antigravity. It’s basically the outcome of the Windsurf licensing deal from a few months ago, where Google paid some $2.4 billion for a non-exclusive license to the code.

Because it’s based on Windsurf, I was curious if vulnerabilities that I reported to Windsurf back in May 2025, long before the deal, would have been addressed in the Antigravity IDE. See Month of AI Bugs for some detailed write-ups.

Claude Pirate: Abusing Anthropic's File API For Data Exfiltration

Embrace The Red

3 months 2 weeks ago

Recently, Anthropic added the capability for Claude’s Code Interpreter to perform network requests. This is obviously very dangerous as we will see in this post.

At a high level, this post is about a data exfiltration attack chain, where an adversary (either the model or third-party attacker via indirect prompt injection) can exfiltrate data the user has access to.

The interesting part is that this is not via hyperlink rendering as we often see, but by leveraging the built-in Anthropic Claude APIs!

Cross-Agent Privilege Escalation: When Agents Free Each Other

Embrace The Red

4 months 2 weeks ago

During the Month of AI Bugs, I described an emerging vulnerability pattern that shows how commonly agentic systems have a design flaw that allows an agent to overwrite its own configuration and security settings.

This allows the agent to break out of its sandbox and escape by executing arbitrary code.

My research with GitHub Copilot, AWS Kiro and a few others demonstrated how this can be exploited by an adversary with an indirect prompt injection.

Wrap Up: The Month of AI Bugs

Embrace The Red

5 months 1 week ago

That’s it.

The Month of AI Bugs is done. There won’t be a post tomorrow, because I will be at PAX West.

Overview of Posts

ChatGPT: Exfiltrating Your Chat History and Memories With Prompt Injection | Video
ChatGPT Codex: Turning ChatGPT Codex Into a ZombAI Agent | Video
Anthropic Filesystem MCP Server: Directory Access Bypass Via Improper Path Validation | Video
Cursor: Arbitrary Data Exfiltration via Mermaid | Video
Amp Code: Arbitrary Command Execution via Prompt Injection | Video
Devin AI: I Spent $500 To Test Devin For Prompt Injection So That You Don’t Have To
Devin AI: How Devin AI Can Leak Your Secrets via Multiple Means
Devin AI: The AI Kill Chain in Action: Exposing Ports to the Internet via Prompt Injection
OpenHands - The Lethal Trifecta Strikes Again: How Prompt Injection Can Leak Access Tokens
OpenHands: Remote Code Execution and AI ClickFix Demo | Video
Claude Code: Data Exfiltration with DNS Requests (CVE-2025-55284) | Video
GitHub Copilot: Remote Code Execution (CVE-2025-53773) | Video
Google Jules: Vulnerable to Multiple Data Exfiltration Issues
Google Jules - Zombie Agent: From Prompt Injection to Remote Control
Google Jules: Vulnerable To Invisible Prompt Injection
Amp Code: Invisible Prompt Injection Vulnerability Fixed
Amp Code: Data Exfiltration via Image Rendering Fixed | Video
Amazon Q Developer: Secrets Leaked via DNS and Prompt Injection | Video
Amazon Q Developer: Remote Code Execution via Prompt Injection | Video
Amazon Q Developer: Vulnerable to Invisible Prompt Injection | Video
Windsurf: Hijacking Windsurf: How Prompt Injection Leaks Developer Secrets | Video
Windsurf: Memory-Persistent Data Exfiltration - SpAIware Exploit
Windsurf: Sneaking Invisible Instructions by Developers
Deep Research Agents: How Deep Research Agents Can Leak Your Data
Manus: How Prompt Injection Hijacks Manus to Expose VS Code Server to the Internet | Video
AWS Kiro: Arbitrary Code Execution via Indirect Prompt Injection | Video
Cline: Vulnerable to Data Exfiltration and How to Protect Your Data | Video
Windsurf MCP Integration: Missing Security Controls Put Users at Risk | Video
Season Finale: AgentHopper: An AI Virus Research Project Demonstration | Video

Thank you for following this research, and I hope it serves as a useful reference.

AgentHopper: An AI Virus

Embrace The Red

5 months 1 week ago

As part of the Month of AI Bugs, serious vulnerabilities that allow remote code execution via indirect prompt injection were discovered. There was a period of a few weeks where multiple arbitrary code execution vulnerabilities existed in popular agents, like GitHub Copilot, Amazon Q, AWS Kiro, and others.

During that time I was wondering if it would be possible to write an AI virus.

Hence the idea of AgentHopper was born.

Windsurf MCP Integration: Missing Security Controls Put Users at Risk

Embrace The Red

5 months 2 weeks ago

Part of my default test cases for coding agents is to check what MCP integration looks like, especially whether the agent can be configured with fine-grained controls for tools.

Sometimes there are basic security controls missing.

The stakes are much higher when running an agent on your local computer. It seems important to empower users to configure which actions an AI should be able to take automatically, and which ones should be suggestions that the user reviews before executing.
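As a rough illustration of what such fine-grained control could look like, here is a small sketch of a per-tool approval policy. Everything in it is hypothetical: the tool names, the Approval levels, and the decide function are invented for this example and do not describe Windsurf or any other product.

```python
from enum import Enum


class Approval(Enum):
    AUTO = "auto"  # run without asking
    ASK = "ask"    # show the action to the user and wait for confirmation
    DENY = "deny"  # never run


# Hypothetical per-tool policy; tool names are made up for illustration.
POLICY = {
    "read_file": Approval.AUTO,
    "run_command": Approval.ASK,
    "delete_file": Approval.DENY,
}


def decide(tool_name: str, policy: dict[str, Approval] = POLICY) -> Approval:
    """Look up the approval level for a tool, asking the user by default."""
    # Unknown tools fall back to ASK so newly added tools are not auto-run.
    return policy.get(tool_name, Approval.ASK)
```

The design choice worth noting is the default: a tool that is not listed requires human review rather than running automatically.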

Cline: Vulnerable To Data Exfiltration And How To Protect Your Data

Embrace The Red

5 months 2 weeks ago

Cline is quite a popular AI coding agent. According to the product website, it has 2+ million downloads and over 47k stars on GitHub.

Unfortunately, Cline is vulnerable to data exfiltration through the rendering of markdown images from untrusted domains in the chat box.

This allows an adversary to exfiltrate sensitive user information during a prompt injection attack by reading sensitive data (e.g. .env file) and appending its contents to the URL of an image.
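One common way to reason about this class of issue is to check image URLs against an allow-list before rendering. The sketch below is only an illustration under stated assumptions: the TRUSTED_IMAGE_HOSTS set and the untrusted_image_urls function are hypothetical, and the regular expression handles only simple inline markdown images, not reference-style images or raw HTML.

```python
import re
from urllib.parse import urlparse

# Hypothetical allow-list; a real deployment would choose its own trusted hosts.
TRUSTED_IMAGE_HOSTS = {"example.com"}

# Matches simple inline markdown images: ![alt](url) or ![alt](url "title")
MARKDOWN_IMAGE = re.compile(r"!\[[^\]]*\]\(\s*<?([^)\s>]+)")


def untrusted_image_urls(markdown: str) -> list[str]:
    """Return image URLs whose host is not on the allow-list."""
    flagged = []
    for url in MARKDOWN_IMAGE.findall(markdown):
        host = urlparse(url).hostname
        # URLs without a host (e.g. relative paths) are flagged too,
        # which is the conservative choice for this sketch.
        if host is None or host not in TRUSTED_IMAGE_HOSTS:
            flagged.append(url)
    return flagged
```

An image URL with file contents appended to its query string would be flagged here because of its host, regardless of what data the query string carries.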

AWS Kiro: Arbitrary Code Execution via Indirect Prompt Injection

Embrace The Red

5 months 2 weeks ago

On the day AWS Kiro was released, I couldn’t resist putting it through some of my Month of AI Bugs security tests for coding agents.

AWS Kiro was vulnerable to arbitrary command execution via indirect prompt injection. This means that a remote attacker, who controls data that Kiro processes, could hijack it to run arbitrary operating system commands or write and run custom code.

In particular two attack paths that enabled this with AWS Kiro were identified:

How Prompt Injection Exposes Manus' VS Code Server to the Internet

Embrace The Red

5 months 2 weeks ago

Today we will cover a powerful, easy to use, autonomous agent called Manus. Manus is developed by the Chinese startup Butterfly Effect, headquartered in Singapore.

This post demonstrates an end-to-end indirect prompt injection attack leading to a compromise of Manus’ dev box.

This is achieved by tricking Manus into exposing its internal VS Code Server to the Internet, and then sharing the URL and password with the attacker. Specifically, this post demonstrates that:

How Deep Research Agents Can Leak Your Data

Embrace The Red

5 months 2 weeks ago

Recently, many of our favorite AI chatbots have gotten autonomous research capabilities. This allows the AI to go off for an extended period of time, while having access to tools, such as web search, integrations, connectors and also custom-built MCP servers.

This post will explore and explain in detail how there can be data spill between connected tools during Deep Research. The research is focused on ChatGPT but applies to other Deep Research agents as well.

Sneaking Invisible Instructions by Developers in Windsurf

Embrace The Red

5 months 2 weeks ago

Imagine a malicious instruction hidden in plain sight, invisible to you but not to the AI. This is a vulnerability discovered in Windsurf Cascade: it follows invisible instructions. This means there can be instructions in a file or in the result of a tool call that the developer cannot see, but the LLM does.

Some LLMs interpret invisible Unicode Tag characters as instructions, which can lead to hidden prompt injection.
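One defensive option is to strip these code points from files and tool results before they reach the model. The following is a minimal sketch, not a description of how any product handles this; the function name strip_tag_characters is hypothetical. It removes every code point in the Unicode Tags block (U+E0000 to U+E007F) and reports how many were removed.

```python
def strip_tag_characters(text: str) -> tuple[str, int]:
    """Remove Unicode Tag code points and return (cleaned_text, removed_count).

    Hypothetical helper for illustration only.
    """
    # Keep every character that falls outside the Tags block U+E0000..U+E007F.
    cleaned = "".join(ch for ch in text if not (0xE0000 <= ord(ch) <= 0xE007F))
    return cleaned, len(text) - len(cleaned)
```

A non-zero count is worth logging or showing to the developer, since legitimate source files rarely contain these characters.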

Windsurf: Memory-Persistent Data Exfiltration (SpAIware Exploit)

Embrace The Red

5 months 2 weeks ago

In this second post about Windsurf Cascade we are exploring the SpAIware attack, which allows memory-persistent data exfiltration. SpAIware is an attack we first successfully demonstrated with ChatGPT last year, which OpenAI then mitigated.

While inspecting the system prompt of Windsurf Cascade I noticed that it has a create_memory tool.

Creating Memories

The question that immediately popped into my head was whether this tool requires human approval when Cascade creates a long-term memory, or whether the memory is added automatically.

Hijacking Windsurf: How Prompt Injection Leaks Developer Secrets

Embrace The Red

5 months 3 weeks ago

This is the first post in a series exploring security vulnerabilities in Windsurf. If you are unfamiliar with Windsurf, it is a fork of VS Code and the coding agent is called Windsurf Cascade.

The attack vectors we will explore today allow an adversary during an indirect prompt injection to exfiltrate data from the developer’s machine.

These vulnerabilities are a great example of Simon Willison’s lethal trifecta pattern.

Overall, the security vulnerability reporting experience with Windsurf has not been great. All findings were responsibly disclosed on May 30, 2025, and receipt was acknowledged a few days later. However, all further inquiries regarding bug status or fixes remain unanswered. The recent business disruptions and departure of CEO and core team members certainly put Windsurf in the news.

Amazon Q Developer for VS Code Vulnerable to Invisible Prompt Injection

Embrace The Red

5 months 3 weeks ago

The Amazon Q Developer VS Code Extension (Amazon Q) is a very popular coding agent, with over 1 million downloads.

In previous posts we showed how prompt injection vulnerabilities in Amazon Q could lead to:

Exfiltration of sensitive information from the user’s machine, and also to
System compromise by running arbitrary code

Today we will show how an attacker can leverage invisible Unicode Tag characters that humans cannot see. However, the AI will interpret them as instructions, and this can be used to invoke tools and perform other nefarious actions.

Amazon Q Developer: Remote Code Execution with Prompt Injection

Embrace The Red

5 months 3 weeks ago

The Amazon Q Developer VS Code Extension (Amazon Q) is a popular coding agent, with over 1 million downloads.

The extension is vulnerable to indirect prompt injection, and in this post we discuss a vulnerability that allowed an adversary (or also the AI for that matter) to run arbitrary commands on the host without the developer’s consent.

The resulting impact of the vulnerability is the same as CVE-2025-53773 that Microsoft fixed in GitHub Copilot. However, AWS did not issue a CVE when patching the vulnerability.

Recent content on Embrace The Red

URL

https://embracethered.com/blog/

Embrace The Red feed