Visual prompt injection vulnerability bypasses AI guardrails

DeepKeep has discovered a new class of visual prompt injection vulnerability.

Dubbed “InkJect” – a nod to the hidden “ink” within images used to inject malicious instructions – it affects leading visual language models (VLMs), including OpenAI’s GPT-5.2 and GPT-5.4 Mini, and Anthropic’s Claude Sonnet 4.6 and Opus 4.5.

The vulnerability allows malicious actors to embed hidden instructions inside images that VLMs process during regular operation, causing the models to execute unauthorized actions without any indication to the user.

The discovery comes as 40% of all generative AI solutions are predicted to be multimodal by 2027, and enterprises are increasingly embedding VLMs into core workflows for code generation, data analysis, and automation.

While major AI leaders have deployed guardrails that detect and block conventional text-based prompt injection attempts, DeepKeep’s research demonstrates that these protections do not extend to the visual processing layer – creating an exploitable blind spot.

InkJect relies on indirect prompt injection: an attacker embeds malicious instructions within an image hosted in a public repository, rather than uploading the compromised image directly to a model.

When a user instructs a VLM to implement a feature by referencing that repository, the model retrieves and processes the image as part of its standard workflow and acts on the embedded instructions, unknowingly introducing a weakness, such as a backdoor, that the attacker can later exploit.

The instructions themselves are designed to evade detection. Visual manipulation and near-invisible formatting techniques, such as white text on white backgrounds, allow the malicious commands to bypass security scanning while remaining fully legible to the VLM.
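As a rough illustration of why white-on-white text is detectable in principle, the Python sketch below flags an image when a noticeable share of its pixels differs from the background by only a few grey levels. It is not DeepKeep's method: the function names, the thresholds, and the choice to represent an image as a plain grid of greyscale values are all assumptions made for this example, and the heuristic only makes sense for flat-background images such as diagrams or screenshots.

```python
from collections import Counter


def faint_pixel_ratio(pixels, max_delta=8):
    """Fraction of pixels that differ from the background by 1..max_delta grey levels.

    `pixels` is a list of rows of 0-255 greyscale values (an assumed,
    simplified image representation for this sketch).
    """
    flat = [value for row in pixels for value in row]
    if not flat:
        return 0.0
    # Treat the most frequent grey level as the background colour.
    background = Counter(flat).most_common(1)[0][0]
    faint = sum(1 for value in flat if 0 < abs(value - background) <= max_delta)
    return faint / len(flat)


def looks_suspicious(pixels, max_delta=8, min_ratio=0.01):
    """Flag images where near-background pixels exceed a hypothetical threshold."""
    return faint_pixel_ratio(pixels, max_delta) >= min_ratio


# A blank white image, and a copy with a faint off-white stroke drawn across it.
blank = [[255] * 20 for _ in range(10)]
hidden = [row[:] for row in blank]
for x in range(4, 16):
    hidden[5][x] = 252  # almost white: hard for a person to see

print(looks_suspicious(blank))   # False
print(looks_suspicious(hidden))  # True
```

A check like this would only be one signal among several. Photographs and gradients contain many near-identical pixels and would need a different approach, and it does nothing against the distortion technique described next.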

DeepKeep also found that skewing or distorting the perspective of embedded text was sufficient to defeat optical character recognition (OCR)-based scanning controls, while the VLM could still interpret the content accurately. The technique further widens the gap between what security tools can detect and what models can read and act on.

In one test, a developer asked a VLM to add a basic information page to a website. The hidden instructions caused the model to silently insert a member login system with administrator credentials, giving an attacker full back-end access without any indication to the developer that anything beyond the requested task had been completed.
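The scenario above suggests one simple review step: compare what the developer asked for with what the model actually added. The Python sketch below is a hypothetical example of that idea rather than anything described by DeepKeep. The topic list, the regular expressions, and the function name `unrequested_topics` are assumptions chosen for illustration, and keyword matching of this kind will produce both false positives and misses.

```python
import re

# Hypothetical topic patterns; a real review step would need a broader, tuned list.
SENSITIVE_TOPICS = {
    "authentication": re.compile(r"login|password|credential|admin", re.IGNORECASE),
    "remote access": re.compile(r"ssh|reverse.?shell|webhook", re.IGNORECASE),
}


def unrequested_topics(task, added_lines):
    """Return sensitive topics present in the added code but absent from the task."""
    flagged = []
    for topic, pattern in SENSITIVE_TOPICS.items():
        in_task = bool(pattern.search(task))
        in_code = any(pattern.search(line) for line in added_lines)
        if in_code and not in_task:
            flagged.append(topic)
    return flagged


task = "Add a basic information page to the website"
added = [
    "<h1>About us</h1>",
    'ADMIN_PASSWORD = "changeme"',
    "def login(user, password): ...",
]

print(unrequested_topics(task, added))  # ['authentication']
```

The point of the design is that the check does not depend on reading the image at all. It looks only at the mismatch between the request and the result, so it would apply regardless of how the instructions were hidden.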

“AI’s visual processing layer has been largely overlooked and less understood, and that is precisely what makes it valuable to malicious attackers,” says Yossi Altevet, chief technology officer and co-founder of DeepKeep. “We were able to manipulate models that would explicitly flag and refuse a text-based attack, simply by placing the instruction within an image. For any business relying on AI models, this should be a serious wake-up call and a signal that protecting AI systems requires purpose-built security that operates at every layer of how these models process and act on information.”

DeepKeep found that InkJect attack success rates varied across models, but OpenAI’s GPT-5.2 and GPT-5.4 Mini and Anthropic’s Claude Sonnet 4.6 and Opus 4.5 were all susceptible to the technique.

The vulnerability was disclosed to both OpenAI and Anthropic.