Stored Prompt Injection Now Persists Across AI Agent Sessions

Agents that persist state across sessions are vulnerable to a class of attack that input-sanitization defenses cannot catch. A June 3 paper by researchers at the Chinese Academy of Sciences formalizes cross-session stored prompt injection (SPI): malicious instructions written into persistent memory, tool state, or conversation logs during one session resurface in later sessions and hijack agent behavior, with no attacker involvement at exploitation time.

What the paper found

Yuanbo Xie, Tianyun Liu, and colleagues at the Institute of Information Engineering (Chinese Academy of Sciences) and AI Sec Lab (Beijing Chaitin Technology) published arXiv:2606.04425 on June 3, 2026. The paper’s contribution is a taxonomy and framework, not a large-scale empirical study with attack-success percentages. It proposes a shared vocabulary for a vulnerability class that agent builders have been brushing up against without naming systematically.

The core argument is straightforward: current agent architectures treat prompt injection as an input-validation problem. Check what comes in, reject the bad stuff, move on. SPI breaks that model because the injection doesn’t arrive through the front door. It arrives through the agent’s own persistent state: memories it wrote, files it saved, tool outputs it cached. The attacker plants the payload once. The agent’s own infrastructure serves it back indefinitely.

The write-then-reactivate lifecycle

SPI involves two coupled vulnerabilities, as the paper frames them:

Unsafe persistent writes. Attacker-controlled content is written into long-lived state: agent memories, filesystem artifacts, tool-visible resources, shared databases. The write happens during normal agent operation, triggered by crafted input that looks benign at the point of entry.
Malicious downstream reactivation. The runtime’s context manager loads contaminated state back into execution context in a later session, with a different user, or during a different task. The adversarial instructions are treated as trusted context because they originate from the agent’s own storage, not from external input.

The decoupling between injection and exploitation is what distinguishes SPI from standard prompt injection. In a traditional attack, the user sends a malicious prompt and the model responds within the same turn. With SPI, the attacker plants instructions on Monday and they execute on Wednesday, triggered by a different user’s session. The attacker never needs to be present at exploitation time.

The researchers built a benchmark and sandbox toolkit to evaluate SPI attacks across models, attack goals, and persistence channels. Because the paper is a position paper, the benchmark numbers are secondary to the framework itself. The point is that SPI is reproducible, classifiable, and testable, which is the prerequisite for any future mitigation work.

Why per-request sanitization fails

OWASP’s 2025 Top 10 for LLM Applications ranks prompt injection as the number-one security risk. A 2025 study documented over 461,640 prompt injection attack submissions in a single research challenge, with 208,095 unique attempted prompts. The ecosystem already knows this is a problem and is actively testing against it.

The defenses, however, are built around a per-request model. Input sanitization, content filtering, instruction-hierarchy enforcement: all operate at the point of entry, checking what arrives from the user. As Matproof’s regulatory summary notes, current per-session safety guardrails fail to detect stored prompt injection because they check inputs at the point of entry, not at the point of re-read from persisted state.

This is the re-read gap. The agent writes to memory during session A. Session A ends. Session B starts. The context manager loads state from memory. Nobody checks that state for adversarial content because it came from the agent’s own storage. The agent trusts its own infrastructure.

Proofpoint documents a related variant called “recursive injection,” where an initial injection causes the AI system to generate additional self-modifying prompts that persist across multiple user interactions even after the original attack vector is removed. This is a precursor pattern: recursive injection demonstrates persistence in practice, but arXiv:2606.04425, published June 3, 2026, formalizes the full cross-session attack lifecycle with a systematic taxonomy.

The stored-XSS analogy and its limits

The paper draws a structural analogy to stored cross-site scripting (XSS) in web security. In stored XSS, an attacker injects malicious JavaScript into a web application’s storage (a comment field, a profile bio, a database record). The server later serves that content to other users, who execute it in their browsers. Injection and exploitation are decoupled in time, user, and context.

SPI follows the same pattern: inject once into persistent storage, exploit later in a different context. The analogy is useful for understanding the temporal decoupling, but it has a critical limit. Web XSS has well-understood, architectural-level mitigations: output encoding, content security policies, input sanitization at render time. The boundary between data and executable code in a browser is enforceable at the HTML parser level.

LLM systems have no equivalent boundary. The instruction/data separation in a language model is a convention of prompt formatting, not an architectural property of the runtime. A context window is a flat sequence of tokens. There is no NX bit for natural language.

A Chinese-language analysis of agent infrastructure compares prompt injection to buffer overflow in this regard: just as buffer overflow took decades to get hardware-level mitigations (NX bit, ASLR, Stack Canary), prompt injection currently has no architectural-level solution, only prompt-based instructions and heuristic detection. That comparison is editorial opinion, not from the paper’s authors, but the underlying observation is accurate: the defense gap is structural, not incremental.

What agent builders should do

The paper argues that secure context management must become a first-class design principle in agent engineering. Drawn from its framework, the concrete postures are:

Validate reads, not just writes. Every time the context manager loads state from persistent storage, treat that state as untrusted input. The agent’s own memory deserves the same scrutiny as user-generated content: sanitize at render time, not just at write time.
Audit persistence channels. Map every channel through which agent state is persisted: memory stores, filesystem writes, database records, tool outputs cached between sessions, shared resources visible to other agents. Each channel is both an injection target and a reactivation vector.
Assume session boundaries are not trust boundaries. The end of a session does not reset the attack surface. Contaminated state survives and crosses into subsequent sessions, users, and tasks.
Segregate instruction context from data context. If the architecture allows it, separate the channels through which instructions and data flow into the context window. This is harder than it sounds in current LLM architectures, but it is the direction any durable fix will need to go.

Which sectors are most exposed

Systems that store sensitive user data and execute actions based on recalled information are the highest-value targets. According to the Matproof analysis, the sectors most exposed include financial services, healthcare, legal tech, and customer service automation. These are the domains where agent systems are being deployed fastest: autonomous financial advisors, medical triage assistants, contract review tools, and customer service bots that maintain conversation history and user preferences across sessions.

The common pattern is persistent memory combined with autonomous action. An agent that remembers things and acts on those memories is an agent that can be hijacked through its own memories. The more persistent state an agent accumulates, the larger the SPI attack surface becomes.

Frequently Asked Questions

How does SPI differ from prompt injection targeting RAG pipelines or external knowledge bases?

RAG poisoning requires the attacker to modify an external data source the agent retrieves from. SPI requires no external access: the attacker feeds crafted input to the agent, which writes the payload into its own persistent state during normal operation. Restricting write access to knowledge bases blocks RAG poisoning but does nothing to prevent SPI because the agent itself becomes the injection vector.

Does isolating each user’s persistent state prevent SPI?

User-level isolation blocks cross-user SPI but leaves same-user SPI intact. The paper’s framework covers both: an attacker plants instructions during one task, and those instructions reactivate when the same user starts a different task in a later session. Per-user storage boundaries do not protect against this because the contaminated state travels with the user.

What existing security testing approaches miss SPI entirely?

Standard LLM red-team exercises test per-request inputs: submit adversarial prompts, check if filters catch them. SPI requires a two-step protocol: first verify whether an input writes adversarial content into persistent state, then open a separate session and check whether that state loads without sanitization. Current LLM security benchmarks do not include this cross-session test pattern.

What would a structural fix for SPI require at the model architecture level?

The conceptual fix is a strict instruction/data boundary in the context window, analogous to how operating systems separate code and data memory using hardware permissions (NX bit, W^X). Current transformers process all tokens identically with no mechanism to mark tokens as unexecutable data. Implementing this would require either model-level changes (restricted attention patterns, token-level permissions) or runtime-level isolation (multiple context windows with different privilege levels).