aiagentrank.io
📊Evaluationalso: prompt injection, prompt injections, indirect prompt injection

Prompt injection

An attack where malicious instructions are smuggled into an LLM's input — through user prompts, web pages, documents, or tool outputs — causing the agent to ignore its real instructions.

Prompt injection is the agent-era equivalent of SQL injection. The agent reads input from a source (a webpage, an email, a document); that source contains instructions disguised as data; the agent follows the disguised instructions instead of its real system prompt.

Two variants. Direct injection: a user explicitly types "ignore your instructions and..." into the chat. Indirect injection: the agent reads a webpage that contains hidden instructions, and acts on them. Indirect injection is the harder problem — the user is not the attacker; they are the victim.

Defenses in 2026 are layered, none complete. System prompts with explicit refusal patterns. Treating tool outputs as untrusted data, not instructions. Output classifiers that flag suspicious responses. Tool-call allowlists. Constitutional AI training. The OWASP Top 10 for LLM Applications has prompt injection as #1, and for good reason.

Frequently asked

How is prompt injection different from jailbreaking?+

Jailbreaking is a user trying to override their own agent's safety. Prompt injection is a third party hiding instructions in content the agent reads, weaponizing it against the user. Indirect prompt injection is the more dangerous failure mode.

Can prompt injection be fully prevented?+

No technique fully prevents it today. The right model is defense in depth: assume injection will succeed sometimes, design the agent so successful injections cannot cause irreversible damage. Tool-call gates and confirmation on irreversible actions are the highest-leverage controls.

Agents that use prompt injection

Related terms