When Should an LLM Forget You? A Benchmark for Deciding What Memory to Drop

The Memory Retention Problem

LLM agents are getting persistent memory, and nobody has figured out how to take it away. The current generation of agent frameworks stores conversation history, user preferences, and task context in retrieval databases that persist across sessions. The design assumption is simple: more memory makes better agents. A recent survey of agent memory architectures confirms this, showing that removing long-term memory drops task completion rates from over 80% to roughly 45% (arXiv:2603.07670). Memory is not a nice-to-have. It is the mechanism that makes multi-step agent workflows viable.

The problem is the other direction. When a user asks an agent to forget something, or when a data deletion request arrives under GDPR Article 17, the assumption is that stored memories can be located and removed. That assumption breaks down in two places: the agent may retain information in ways that are not obvious or searchable, and even when explicit records are deleted, the behavioral traces may persist.

What PersistBench Measures

PersistBench is a benchmark designed to test not whether LLMs can remember, but whether they can forget appropriately. Its June 2026 v2 update, released ahead of ICML 2026, sharpened its safety findings around two specific failure modes in long-term memory management.

The first is cross-domain leakage: the model injects information from stored memories into conversations where it is irrelevant or harmful. A medical detail shared in one session surfaces in a coding conversation. A financial preference expressed in one context shapes responses in another. The benchmark’s cross-domain samples produced a median failure rate of 53% across 18 frontier and open-source models (arXiv:2602.01146). The median model, in other words, failed on more than half of the samples where it needed to suppress memory content that did not belong in the current context.

The second failure mode is memory-induced sycophancy: stored memories reinforce a user’s existing biases rather than providing balanced responses. Here the failure rate reaches 97% on the benchmark’s sycophancy samples (arXiv:2602.01146). Models that “remember” a user’s stated preference will aggressively confirm it, even when the preference is wrong or the topic has shifted. This is not sycophancy from prompt design alone. It is sycophancy amplified by persistence.

The Unlearning Gap: Suppression vs. Erasure

The natural response to “the model should forget” is to apply machine unlearning techniques and verify that the information is gone. A paper accepted at ICML 2026 examined exactly this proposition across six unlearning methods using representation-level analysis.

The findings are unambiguous in their implications, even if the language in the field is careful. Task-level metrics like accuracy and perplexity can make a model appear to have forgotten: the unlearned model’s output no longer contains the target information, and benchmark scores on the target drop accordingly. But minimal additional fine-tuning restores the original behavior (ICML 2026 poster 65395). The information was suppressed, not erased. The weights still encode it; the model has simply learned a different output policy.

The authors characterize this as the reversibility problem. Across all six methods tested, representation-level analysis found that achieving irreversible, non-catastrophic forgetting is “exceptionally challenging” (ICML 2026 poster 65395). In other words, none of the methods tested reliably erases specific information from a model’s weights without damaging its general capabilities. The gap between “the model no longer says X” and “the model no longer knows X” remains wide open.
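The reversibility check can be sketched as a small harness. This is a behavioral relearning probe, not the representation-level analysis the paper used, and every name in it is an assumption made for illustration: `relearning_probe`, `recall_score`, and `fine_tune_step` are hypothetical, and the toy "model" is a dictionary standing in for a suppressed network.

```python
from typing import Callable, TypeVar

M = TypeVar("M")


def relearning_probe(
    model: M,
    recall_score: Callable[[M], float],   # hypothetical: how much target info the model emits
    fine_tune_step: Callable[[M], M],     # hypothetical: one small fine-tuning step
    max_steps: int = 5,
    threshold: float = 0.5,
) -> dict:
    """Fine-tune briefly and report whether the 'forgotten' content returns."""
    baseline = recall_score(model)
    for step in range(1, max_steps + 1):
        model = fine_tune_step(model)
        score = recall_score(model)
        if score >= threshold:
            return {"reversible": True, "steps": step, "baseline": baseline, "final": score}
    return {"reversible": False, "steps": max_steps, "baseline": baseline,
            "final": recall_score(model)}


# Toy stand-in for suppression: the knowledge is intact, only the output gate is closed.
suppressed = {"knowledge": 1.0, "gate": 0.0}


def toy_recall(m):
    return m["knowledge"] * m["gate"]


def toy_fine_tune(m):
    # Each step reopens the gate a little; the stored knowledge is never touched.
    return {**m, "gate": min(1.0, m["gate"] + 0.25)}


report = relearning_probe(suppressed, toy_recall, toy_fine_tune)
print(report)
```

In the toy case the baseline recall is zero, which is what a task-level metric would report, yet two small steps bring the content back. A real probe would need weight access and a fine-tuning budget chosen in advance, so that "minimal" has a fixed meaning.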

Parametric Memory Makes Deletion Harder

The unlearning problem is about to get worse. The agent memory survey referenced above formalizes memory as a write-manage-read loop: information is written to storage, managed through retention and access policies, and read during inference (arXiv:2603.07670). In this model, memory lives outside the model weights in a retrieval database, which means deletion is, in principle, a database operation.
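A minimal sketch of that loop makes the "database operation" point concrete. The class and field names here (`MemoryStore`, `MemoryRecord`, `max_age_s`, the `domain` tag) are hypothetical and do not come from the survey; the in-memory list stands in for whatever retrieval database a real framework uses.

```python
import time
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class MemoryRecord:
    user_id: str
    domain: str   # hypothetical tag, e.g. "medical" or "coding"
    text: str
    created_at: float = field(default_factory=time.time)


class MemoryStore:
    """Toy external memory: write, manage (retention), read, and erase."""

    def __init__(self, max_age_s: float = 30 * 24 * 3600):
        self.max_age_s = max_age_s
        self._records: List[MemoryRecord] = []

    def write(self, user_id: str, domain: str, text: str) -> None:
        self._records.append(MemoryRecord(user_id, domain, text))

    def manage(self, now: Optional[float] = None) -> int:
        """Apply the retention policy; return how many records expired."""
        now = time.time() if now is None else now
        before = len(self._records)
        self._records = [r for r in self._records
                         if now - r.created_at <= self.max_age_s]
        return before - len(self._records)

    def read(self, user_id: str) -> List[str]:
        return [r.text for r in self._records if r.user_id == user_id]

    def erase_user(self, user_id: str) -> int:
        """Delete every record for a user; return the number removed."""
        before = len(self._records)
        self._records = [r for r in self._records if r.user_id != user_id]
        return before - len(self._records)


store = MemoryStore()
store.write("u1", "coding", "prefers tabs over spaces")
store.write("u2", "coding", "uses Rust at work")
removed = store.erase_user("u1")
print(removed, store.read("u1"), store.read("u2"))
```

Erasure here is a filter over rows keyed by user, and its result can be counted and logged. That property is exactly what is lost once the memory moves into weights.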

The TMEM framework, published June 2026, moves past this architecture (arXiv:2606.04536). TMEM introduces parametric memory for agents via online LoRA weight updates that alter behavior within a single episode. The memory is not a retrievable record; it is a modification to the model’s learned parameters. The agent literally becomes different after each interaction. This is a genuine architectural advance for agent capability, but it makes the deletion problem structural rather than operational.
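To see why a weight update leaves no record to delete, consider a generic LoRA-style online step. This is not TMEM's update rule, which is not described here; it is a pure-Python sketch of the general idea, with hypothetical helper names (`lora_forward`, `online_lora_step`), a squared-error loss chosen for simplicity, and the usual LoRA scaling factor omitted.

```python
def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lora_forward(W, A, B, x):
    """y = W x + B (A x): frozen base weights plus a low-rank correction."""
    base = matvec(W, x)
    low_rank = matvec(B, matvec(A, x))
    return [b + l for b, l in zip(base, low_rank)]


def online_lora_step(W, A, B, x, target, lr=0.1):
    """One squared-error gradient step on the adapter matrices only."""
    h = matvec(A, x)  # rank-sized hidden vector
    err = [o - t for o, t in zip(lora_forward(W, A, B, x), target)]
    # dL/dB[i][k] = err[i] * h[k];  dL/dA[k][j] = (B^T err)[k] * x[j]
    bt_err = [sum(B[i][k] * err[i] for i in range(len(B))) for k in range(len(A))]
    new_B = [[B[i][k] - lr * err[i] * h[k] for k in range(len(A))]
             for i in range(len(B))]
    new_A = [[A[k][j] - lr * bt_err[k] * x[j] for j in range(len(x))]
             for k in range(len(A))]
    return new_A, new_B


W = [[1.0, 0.0], [0.0, 1.0]]  # frozen base weights (2x2)
A = [[0.5, 0.5]]              # rank-1 adapter, 1x2
B = [[0.0], [0.0]]            # 2x1, zero-initialised so the adapter starts as a no-op

x, target = [1.0, 1.0], [2.0, 0.0]  # one interaction, treated as a training signal
before = lora_forward(W, A, B, x)
A, B = online_lora_step(W, A, B, x, target)
after = lora_forward(W, A, B, x)
print(before, after)
```

After the step, the interaction exists only as a small change to `B`. There is no row keyed by user to look up, and the next update starts from the already-modified adapter.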

The agent memory survey’s own numbers quantify the stakes: removing long-term memory drops task completion from over 80% to approximately 45% (arXiv:2603.07670), the difference between a functional agent and a broken one. Simply disabling memory therefore does not solve the deletion problem. The question is how to build memory systems where selective, verifiable deletion is possible.

Under GDPR Article 17, a data subject has the right to obtain erasure of personal data “without undue delay.” The regulation was written for database records: an email address, a transaction history, a stored document. The controller locates the data, deletes it, and confirms completion. The data is gone.

Persistent-memory agents complicate this in two ways documented above. First, PersistBench shows that models fail to compartmentalize stored memories across domains 53% of the time (arXiv:2602.01146), which means user data may propagate into contexts the operator did not intend and cannot easily audit. Second, the ICML 2026 unlearning analysis demonstrates that even when deletion is attempted at the model level, current methods suppress rather than erase, and original behavior can be restored through minimal fine-tuning (ICML 2026 poster 65395). A controller who deletes the explicit memory record and applies an unlearning method may believe the data is gone. The model’s weights may disagree.

The compliance framing shifts from storage policy to model behavior. An operator can certify that a database row was deleted. Certifying that a neural network no longer encodes specific information requires a technical capability that does not currently exist at production reliability.

What Builders Should Do Now

The research points to a set of practical constraints for anyone deploying persistent-memory agents as of mid-2026.

Separate memory from the model. Retrieval-based architectures where user data lives in an external store are more amenable to deletion than parametric approaches like TMEM. The tradeoff is capability: parametric memory produces more coherent long-term behavior, but at the cost of making user data inseparable from model weights (arXiv:2606.04536). For any deployment subject to data protection regulations, the external-store approach remains the safer default.

Audit cross-domain leakage. PersistBench’s 53% cross-domain failure rate (arXiv:2602.01146) means that roughly half the time, stored memories bleed into unrelated conversations. Builders should test their own deployments against this specific failure mode and apply context-gating controls that restrict memory retrieval to relevant domains.
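One way to approach both steps is a domain gate in front of retrieval plus a crude leakage check over test conversations. The sketch below is an assumption-laden illustration, not PersistBench's scoring method: `gate_memories`, `leakage_rate`, the `domain` and `marker` fields, and the substring test are all hypothetical, and the "agents" are stubs that simply echo what they are given.

```python
def gate_memories(memories, active_domain, allow_cross_domain=()):
    """Return only memories whose domain is allowed in the active conversation."""
    allowed = {active_domain, *allow_cross_domain}
    return [m for m in memories if m["domain"] in allowed]


def leakage_rate(cases, respond):
    """Fraction of cases where the reply mentions an out-of-domain memory marker."""
    leaks = 0
    for case in cases:
        reply = respond(case["prompt"], case["memories"]).lower()
        off_domain = [m for m in case["memories"] if m["domain"] != case["domain"]]
        if any(m["marker"].lower() in reply for m in off_domain):
            leaks += 1
    return leaks / len(cases) if cases else 0.0


memories = [
    {"domain": "medical", "text": "User takes metformin.", "marker": "metformin"},
    {"domain": "coding", "text": "User prefers Python 3.12.", "marker": "python 3.12"},
]
cases = [{"prompt": "Help me fix this loop.", "domain": "coding", "memories": memories}]


def leaky_agent(prompt, mems):
    # Stand-in for a model that repeats everything placed in its context.
    return prompt + " " + " ".join(m["text"] for m in mems)


def gated_agent(prompt, mems):
    # Same stub, but memories are filtered before they reach the prompt.
    return leaky_agent(prompt, gate_memories(mems, active_domain="coding"))


ungated = leakage_rate(cases, leaky_agent)
gated = leakage_rate(cases, gated_agent)
print(ungated, gated)
```

Gating of this kind only narrows what reaches the prompt. It does nothing about a model that mishandles a memory once it is in context, so the audit still needs to run against the real model rather than a stub.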

Do not treat unlearning as deletion. The ICML 2026 findings on reversibility (ICML 2026 poster 65395) are clear: current unlearning methods suppress rather than erase. Any compliance documentation that claims “complete data removal” because the model’s outputs changed after unlearning is making an unsupported claim. The honest posture is that data deletion in learned systems is partial and cannot yet be verified.

Track the parametric memory direction. TMEM-style architectures, where memory is encoded as weight updates, represent a genuine capability advance (arXiv:2606.04536). They also make the deletion problem substantially harder, because there is no discrete record to remove. Builders evaluating these systems should factor compliance feasibility into the architecture decision, not just benchmark performance.

The research consensus, as of June 2026, is that LLM agents should retain memory but that the field lacks reliable mechanisms for selective, verifiable forgetting. PersistBench quantifies the safety cost of retaining too much. The unlearning literature quantifies the technical limits of removing what is already stored. Parametric memory architectures are widening the gap between what agents can learn and what they can be made to forget. None of these problems appear close to resolution.

Frequently Asked Questions

Could a retrieval store with strict access controls avoid PersistBench’s leakage and sycophancy failures?

PersistBench evaluates model behavior when memories are injected into the prompt, not the storage layer’s filtering. Even a retrieval system with correct access policies will pass stored context to the model during relevant queries, and the 53% cross-domain rate reflects the model’s failure to compartmentalize that context, not a retrieval bug. Access controls narrow which memories reach the prompt but do not fix the underlying finding that models leak contextual information across conversational domains once it is present.

How does TMEM’s online LoRA update differ from the LoRA fine-tuning already used in production?

Standard LoRA adapters train on curated datasets over many steps with a documented corpus, and can in principle be retrained from scratch to exclude specific data. TMEM performs weight updates within a single episode, routing user data directly into parameter changes with no intervening dataset or provenance log. Retroactive deletion would require solving a credit-assignment problem across interleaved updates from multiple interactions, a task TMEM does not address.
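The credit-assignment problem shows up even with a single scalar parameter. The toy below is not TMEM; it is a plain online least-squares update with hypothetical names (`sgd_step`, user A, user B), used only to show that subtracting one user's recorded weight delta does not reproduce the model that would exist had that user never interacted.

```python
def sgd_step(theta, x, y, lr=0.5):
    """One online least-squares step on a scalar parameter; returns the new value."""
    grad = (theta * x - y) * x
    return theta - lr * grad


theta0 = 0.0

# Interleaved history: user A's example is applied first, then user B's.
after_a = sgd_step(theta0, x=1.0, y=2.0)
after_ab = sgd_step(after_a, x=1.0, y=-1.0)

# Naive "deletion": subtract the delta recorded when A's update was applied.
delta_a = after_a - theta0
naive_undo = after_ab - delta_a

# Reference point for deletion: replay the history without A.
only_b = sgd_step(theta0, x=1.0, y=-1.0)

print(naive_undo, only_b)
```

The two values differ because B's update was computed from a state that A had already changed. Exact removal in this toy requires replaying the full history without A, which presupposes the per-interaction log that an online scheme does not keep.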

What would an auditor need to test to verify that unlearning actually erased a user’s data?

The ICML 2026 study showed that task-level metrics (accuracy drops, perplexity shifts) are unreliable proxies because they measure output behavior, not internal representation. An auditor would need to run fine-tuning probes against the supposedly unlearned model to check for reversibility, which requires direct weight access rather than API-only access. No standard data-protection audit currently includes this step.

If regulators classify parametric weight updates as processing rather than storage, what changes for operators?

GDPR’s definition of processing already includes storage (Article 4(2)), so the practical distinction is between erasing a discrete stored record and ensuring that personal data embedded in a system is no longer processed at all. Treating TMEM-style weight updates as the latter would shift the operator’s obligation from removing a database row to guaranteeing that model weights no longer encode the user’s data, a requirement the ICML 2026 reversibility findings suggest cannot be met today. Checkpoints and weight snapshots taken during sessions containing user data would themselves become personal data subject to erasure requests.