When an AI Agent Causes a Loss, Who Files the Insurance Claim?

Filing an insurance claim for a loss caused by an autonomous AI agent requires reconstructing a causal chain that most current agent deployments do not log. arXiv:2606.03777, published June 2, 2026 by Alex Leung, proposes the CER framework to close this gap: a three-part diagnostic defining what operators need to retain, reconstruct, and prove before an insurer will pay out.

Why AI Agent Losses Break Traditional Insurance Models

Traditional insurance underwriting depends on event reconstruction: something happened, there is a record of it, and the policy covers that category of event. AI agents break this model. The relevant question shifts from what happened to what the system was allowed to do, what it actually did, and the causal chain connecting the two.

The CER paper argues that AI losses require state reconstruction, not merely event reconstruction. The system’s state changes as it reasons, retrieves context, calls tools, and acts. Without capturing those intermediate states, the causal chain is irrecoverable after the fact.

This is not theoretical. The paper cites three anchor cases: a reported PocketOS agentic database-deletion incident, a reported Replit agentic database-deletion incident, and the adjudicated Moffatt v. Air Canada case (arXiv:2606.03777). The first two involve agents taking real destructive actions in production systems. Moffatt v. Air Canada is a different vector: an output-reliance claim in which British Columbia’s Civil Resolution Tribunal held the airline responsible for what its chatbot told a customer. Because it concerns chatbot output rather than agentic tool use, stretching it to cover autonomous agent causality requires caution. What it does indicate is the adjudicative direction: operators bear responsibility for AI system behavior, whether the system acts or merely speaks.

Inside the CER Framework: Control, Evidence, Response

CER operates at the use-case level with three legs. Each must be satisfied for an AI-mediated loss to qualify as insurable.

Control boundary (C): Did the system have an enforceable operating envelope? This is not a protocol boundary (MCP, A2A, or similar). It is the set of constraints defining what the agent is permitted to do: permissions, rate limits, scope restrictions, human-in-the-loop checkpoints. If the control boundary was undefined or unenforced when the loss occurred, CER flags it as a gap in the insured’s duty of care.
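A control boundary that is machine-enforceable, rather than described in prose, can start as a deny-by-default check that runs before every tool call. The sketch below is illustrative only: `ControlBoundary`, `authorize`, and the tool and resource names are hypothetical, and the CER paper does not prescribe any particular implementation.

```python
from dataclasses import dataclass, field
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"
    NEEDS_APPROVAL = "needs_approval"


@dataclass
class ControlBoundary:
    """Hypothetical operating envelope for one agent use case (not from the CER paper)."""
    allowed_tools: set[str]
    # Tools that pause for a human sign-off (human-in-the-loop checkpoint).
    approval_required: set[str] = field(default_factory=set)
    # Resource prefixes the agent may touch (scope restriction); empty means nothing.
    allowed_scopes: tuple[str, ...] = ()
    # Per-session cap on executed calls, a crude stand-in for a rate limit.
    max_calls: int = 100
    calls_made: int = 0

    def authorize(self, tool: str, resource: str) -> Decision:
        """Deny by default; only calls that are allowed to run count toward the cap."""
        if tool not in self.allowed_tools:
            return Decision.DENY
        if not any(resource.startswith(prefix) for prefix in self.allowed_scopes):
            return Decision.DENY
        if self.calls_made >= self.max_calls:
            return Decision.DENY
        if tool in self.approval_required:
            return Decision.NEEDS_APPROVAL
        self.calls_made += 1
        return Decision.ALLOW


boundary = ControlBoundary(
    allowed_tools={"sql.read", "sql.delete"},
    approval_required={"sql.delete"},
    allowed_scopes=("db/staging/",),
)
print(boundary.authorize("sql.read", "db/staging/users"))    # Decision.ALLOW
print(boundary.authorize("sql.delete", "db/prod/users"))     # Decision.DENY
print(boundary.authorize("sql.delete", "db/staging/users"))  # Decision.NEEDS_APPROVAL
```

The design choice that matters for CER is deny-by-default: an empty scope list denies everything, so a missing configuration shows up as a refusal rather than as silent permission.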

Evidence reconstruction (E): Can the system state and causal chain be rebuilt from retained artifacts? This requires action logs, tool inputs and outputs, retrieved context, chain-of-thought traces, and permission-state snapshots. The paper specifically calls out externally triggered failures: prompt injection, retrieval-augmented generation (RAG) poisoning, malicious tool output, credential misuse, and data poisoning (arXiv:2606.03777). In these scenarios, the insured’s AI system sits in the causal chain, but the trigger may originate outside the operator’s perimeter.
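One way to make those artifacts hold up is to write each step to an append-only log in which every entry commits to the hash of the one before it, so later edits become detectable. This is a minimal sketch under stated assumptions: the record fields mirror the artifact list above, but `EvidenceRecord`, `EvidenceLog`, and the hash-chaining scheme are illustrative choices, not something the CER paper specifies, and `reasoning_trace` assumes the model provider exposes a reasoning trace at all.

```python
import hashlib
import json
from dataclasses import dataclass, asdict


@dataclass
class EvidenceRecord:
    """One step of agent activity; field names are hypothetical."""
    step: int
    tool: str
    tool_input: dict
    tool_output: str
    retrieved_context: list[str]
    reasoning_trace: str
    permission_snapshot: dict
    prev_hash: str


class EvidenceLog:
    """Append-only log where each entry commits to the hash of the previous one."""

    GENESIS = "0" * 64

    def __init__(self) -> None:
        self.entries: list[tuple[EvidenceRecord, str]] = []

    def append(self, **fields) -> str:
        prev = self.entries[-1][1] if self.entries else self.GENESIS
        record = EvidenceRecord(step=len(self.entries), prev_hash=prev, **fields)
        digest = self._digest(record)
        self.entries.append((record, digest))
        return digest

    @staticmethod
    def _digest(record: EvidenceRecord) -> str:
        # Canonical JSON (sorted keys) so identical records always hash identically.
        payload = json.dumps(asdict(record), sort_keys=True).encode()
        return hashlib.sha256(payload).hexdigest()

    def verify(self) -> bool:
        """Recompute every hash and check each entry links to its predecessor."""
        prev = self.GENESIS
        for record, digest in self.entries:
            if record.prev_hash != prev or self._digest(record) != digest:
                return False
            prev = digest
        return True


log = EvidenceLog()
log.append(tool="sql.read", tool_input={"query": "SELECT 1"}, tool_output="1",
           retrieved_context=["runbook.md#cleanup"], reasoning_trace="check connectivity",
           permission_snapshot={"role": "reader"})
print(log.verify())  # True
log.entries[0][0].tool_output = "tampered"
print(log.verify())  # False
```

A hash chain only shows that entries were not altered after the fact if the latest hash is also kept somewhere the operator cannot quietly rewrite, such as a separate write-once store or a third party.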

Insurance response (R): Is the reconstructed loss covered by an existing policy, and can the proof survive insurer scrutiny? This leg connects the technical diagnostic to the commercial question. Even with complete logging, the policy must cover the failure mode and the artifacts must meet the evidentiary bar.
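Because all three legs must hold, the diagnostic behaves like a conjunction: complete logging does not rescue a claim whose failure mode the policy does not cover. The sketch below encodes that logic with hypothetical field names; reducing each leg to a yes/no flag is a simplification, since in practice each leg is a judgment rather than a boolean.

```python
from dataclasses import dataclass


@dataclass
class CERAssessment:
    """Hypothetical yes/no encoding of CER's three legs for one use case."""
    control_boundary_enforced: bool      # C: envelope defined and enforced at time of loss
    causal_chain_reconstructable: bool   # E: state and chain rebuildable from retained artifacts
    policy_covers_failure_mode: bool     # R: policy wording covers this failure mode
    evidence_meets_insurer_bar: bool     # R: artifacts would survive insurer scrutiny

    def gaps(self) -> list[str]:
        checks = {
            "C: control boundary undefined or unenforced": self.control_boundary_enforced,
            "E: causal chain cannot be reconstructed": self.causal_chain_reconstructable,
            "R: failure mode outside policy coverage": self.policy_covers_failure_mode,
            "R: evidence below insurer's bar": self.evidence_meets_insurer_bar,
        }
        return [label for label, ok in checks.items() if not ok]

    def insurable(self) -> bool:
        # A single failed leg is enough to fail the whole diagnostic.
        return not self.gaps()


complete_logs_wrong_policy = CERAssessment(True, True, False, True)
print(complete_logs_wrong_policy.insurable())  # False
print(complete_logs_wrong_policy.gaps())       # ['R: failure mode outside policy coverage']
```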

The Calibrated-Abstention Gap

A related structural problem: agents that never refuse. arXiv:2604.19457, an April 2026 study of long-horizon enterprise agents, tested six memory architectures and found that all six committed on every case. None abstained. The paper describes this as a calibrated-abstention gap: aggregate accuracy metrics mask the fact that the system never says no.
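The masking effect shows up in a small calculation. In this sketch, `None` marks an abstention; the metric names (`abstention_rate`, `selective_accuracy`) are standard selective-prediction terms used here for illustration, not metrics taken from arXiv:2604.19457, and the decisions are made-up toy data.

```python
from typing import Optional


def abstention_report(predictions: list[Optional[str]], labels: list[str]) -> dict:
    """Summarize accuracy and abstention; None in predictions means the agent abstained."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must align")
    n = len(labels)
    committed = [(p, y) for p, y in zip(predictions, labels) if p is not None]
    correct = sum(p == y for p, y in committed)
    return {
        "accuracy_overall": correct / n,  # abstentions count as not correct
        "abstention_rate": (n - len(committed)) / n,
        "selective_accuracy": correct / len(committed) if committed else None,
    }


labels = ["approve", "deny", "approve", "deny"]
always_commits = ["approve", "deny", "approve", "approve"]
sometimes_abstains = ["approve", "deny", "approve", None]
print(abstention_report(always_commits, labels))
# {'accuracy_overall': 0.75, 'abstention_rate': 0.0, 'selective_accuracy': 0.75}
print(abstention_report(sometimes_abstains, labels))
# {'accuracy_overall': 0.75, 'abstention_rate': 0.25, 'selective_accuracy': 1.0}
```

Both toy agents report the same 0.75 aggregate accuracy, but only the second ever declines to act, and it declines on exactly the case it would have gotten wrong.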

This matters for insurability because an agent that cannot abstain is an agent that will eventually act outside its competence. If the loss involves a regulated decision such as claims adjudication, the absence of any abstention mechanism becomes evidence that the operator did not implement adequate safeguards. Under CER’s control-boundary leg, a system with no mechanism for refusing to act has an incomplete operating envelope by definition.
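A minimal abstention mechanism is a gate between the agent’s proposed action and its execution, with a stricter bar for regulated decisions. This sketch assumes the agent produces a calibrated confidence score, which is itself a strong assumption; `Proposal`, `gate`, and both thresholds are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Proposal:
    """Hypothetical action an agent wants to take."""
    action: str
    confidence: float  # assumed to be calibrated, in [0, 1]
    regulated: bool = False  # e.g. a claims-adjudication decision


def gate(proposal: Proposal, threshold: float = 0.80,
         regulated_threshold: float = 0.95) -> Optional[str]:
    """Return the action to execute, or None to abstain."""
    bar = regulated_threshold if proposal.regulated else threshold
    if proposal.confidence < bar:
        return None  # abstain; the caller is expected to log this and escalate to a human
    return proposal.action


print(gate(Proposal("send_status_update", 0.90)))            # send_status_update
print(gate(Proposal("approve_claim", 0.90, regulated=True)))  # None
```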

The Regulatory Clock

The CER framework arrives as regulatory timelines compress. Agent Liability’s global desk tracks AI agent liability regulation across at least 17 jurisdictions. In the EU, the AI Act’s Article 5 prohibitions have applied since February 2, 2025, and the general-purpose AI (GPAI) obligations under Articles 53 and 55 since August 2, 2025. August 2, 2026 adds the Article 50 transparency obligations and the Commission’s power to fine GPAI providers. These dates stand regardless of the Digital Omnibus delay to Annex III high-risk classifications.

As of August 2026, operators deploying autonomous agents in EU markets face statutory duties that overlap with CER’s diagnostic structure. Article 5 prohibits specific AI practices. Article 50 requires that people be told when they are interacting with an AI system. Articles 53 and 55 impose documentation duties on general-purpose AI model providers, plus evaluation and risk-mitigation duties for models with systemic risk; an operator building agents on a third-party model meets these mainly through its supplier. None are insurance requirements per se, but they establish the regulatory baseline that insurers underwrite against.

The Insurance Market Today

As of mid-2026, the specialty AI insurance market exists but is fragmented and expensive. Agent Liability describes it as underwritten against the same operator duties the statutes describe: control boundaries, evidence retention, and documented risk management. The market is not waiting for standards to settle, but it is also not offering cheap coverage for operators who cannot demonstrate these capabilities.

A 2026 industry report projects that 60% of Fortune 100 companies will appoint a designated head of AI governance; treat the figure as directional rather than precise.

What to Instrument Before Your Agent Ships

If CER’s three legs describe the minimum evidentiary bar for insurability, the deployment implication is direct: logging architecture decisions made today determine whether tomorrow’s agent failure is an insurable event or an uninsured loss.

For teams shipping autonomous agents as of June 2026:

Define the control boundary explicitly. Document what the agent is permitted to do, at what scope, with what constraints. Make this machine-enforceable, not just policy-level prose.
Log every tool call’s input and output, together with the context and reasoning that produced it, not just the action taken. This is the evidence-reconstruction leg.
Implement abstention mechanisms. The calibrated-abstention gap from arXiv:2604.19457 is a control-boundary deficiency. An agent that never refuses has an incomplete operating envelope.
Run post-incident reconstruction tests. Inject a known failure mode and try to reconstruct the causal chain from existing logs. If it cannot be done in a controlled test, it will not be possible after a real loss.
Scope insurance to the residual. The AI insurance market underwrites against operational safeguards. Deploy the controls first; the policy covers what is left over.
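The post-incident reconstruction drill can start as a simple test: seed a known failure, then walk the recorded cause links backward from the loss-causing step and check whether the chain reaches the originating request. The sketch assumes each logged step records which earlier step triggered it (`caused_by`); that field, `Step`, and `reconstruct` are hypothetical, and real agent logs may need extra work to recover such links.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Step:
    step_id: str
    kind: str                  # e.g. "user_request", "retrieval", "tool_call:sql.delete"
    caused_by: Optional[str]   # step_id of the step that triggered this one


def reconstruct(log: list[Step], final_step_id: str) -> tuple[list[str], Optional[str]]:
    """Walk caused_by links back from the loss-causing step.

    Returns the recovered chain (root first) and the first missing step_id, if any.
    """
    index = {s.step_id: s for s in log}
    chain: list[str] = []
    seen: set[str] = set()
    current: Optional[str] = final_step_id
    while current is not None:
        if current in seen:
            raise ValueError(f"cycle in cause links at {current}")
        seen.add(current)
        step = index.get(current)
        if step is None:
            return list(reversed(chain)), current  # gap: referenced but never logged
        chain.append(step.step_id)
        current = step.caused_by
    return list(reversed(chain)), None


drill_log = [
    Step("u1", "user_request", None),
    Step("r1", "retrieval", "u1"),            # poisoned document injected by the drill
    Step("t1", "tool_call:sql.delete", "r1"),
]
chain, missing = reconstruct(drill_log, "t1")
print(chain, missing)  # ['u1', 'r1', 't1'] None

# Same drill, but retrievals were never logged:
chain, missing = reconstruct([s for s in drill_log if s.kind != "retrieval"], "t1")
print(chain, missing)  # ['t1'] r1
```

The second run is the failure the drill exists to catch: the destructive action is logged, but the poisoned retrieval that triggered it is not, so the chain stops one step short of its cause.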

Frequently Asked Questions

Does CER cover text-only AI systems, or just agents that take actions?

CER is a use-case-level framework, so it can apply to non-agentic outputs. Moffatt v. Air Canada shows that output-reliance claims already produce operator liability without tool use. However, CER’s control-boundary and evidence-reconstruction legs assume tool calls and permission states. For a static chatbot with no tool access, the control boundary collapses to input-output filtering, and the evidence leg shrinks to prompt-response pairs, which is a narrower diagnostic.

How does CER’s evidence bar differ from SOC 2 or ISO 27001 audit trails?

Traditional compliance frameworks log discrete access events: who touched what resource, when, and with what credential. CER demands the reasoning chain behind the action: retrieved context, intermediate tool outputs, chain-of-thought traces, and permission-state snapshots. An ISO 27001 trail can confirm an agent accessed a database. CER’s bar requires showing why the agent decided to act, what context informed that decision, and whether the operating envelope permitted it.
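The difference can be framed as a coverage check: which of CER’s questions can a given record answer? In this sketch the field names and the question-to-field mapping are illustrative assumptions, not fields defined by SOC 2, ISO 27001, or the CER paper.

```python
# Fields a typical access-event audit record carries (illustrative).
ACCESS_EVENT_FIELDS = {"actor", "resource", "timestamp", "credential"}

# Hypothetical mapping from CER-style questions to the fields needed to answer them.
CER_QUESTIONS = {
    "did the agent act?": {"actor", "resource", "timestamp"},
    "why did it decide to act?": {"reasoning_trace"},
    "what context informed the decision?": {"retrieved_context", "tool_outputs"},
    "did the envelope permit it?": {"permission_snapshot"},
}


def unanswered(record_fields: set[str]) -> list[str]:
    """List the questions whose required fields are not all present in the record."""
    return [q for q, needed in CER_QUESTIONS.items() if not needed <= record_fields]


print(unanswered(ACCESS_EVENT_FIELDS))
# ['why did it decide to act?', 'what context informed the decision?', 'did the envelope permit it?']
```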

What breaks in multi-agent systems where one agent’s output feeds another?

CER addresses individual agent deployments. In multi-agent architectures, attribution breaks when Agent A passes corrupted or out-of-scope context to Agent B, who then causes a loss. Agent B’s control boundary may be intact while the harmful input originated upstream. MCP and A2A protocol boundaries operate at a different layer than CER’s control boundary, so orchestrated agent fleets require extending the evidence-reconstruction leg to cover inter-agent context handoffs specifically.
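One way to extend the evidence leg across agents is to wrap every inter-agent handoff in an envelope that names its sender, fingerprints its payload, and points to the handoff that fed the sender. The sketch below is an assumption-laden illustration: `Handoff`, `make_handoff`, `payload_intact`, and `upstream_senders` are hypothetical and are not part of the MCP or A2A specifications.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Handoff:
    """Hypothetical envelope for context passed from one agent to another."""
    handoff_id: str
    sender: str
    receiver: str
    payload: str
    payload_sha256: str
    parent_handoff: Optional[str]  # the handoff that delivered the sender's input, if any


def make_handoff(handoff_id: str, sender: str, receiver: str, payload: str,
                 parent: Optional[str] = None) -> Handoff:
    digest = hashlib.sha256(payload.encode()).hexdigest()
    return Handoff(handoff_id, sender, receiver, payload, digest, parent)


def payload_intact(h: Handoff) -> bool:
    """Check the payload still matches the fingerprint taken at send time."""
    return hashlib.sha256(h.payload.encode()).hexdigest() == h.payload_sha256


def upstream_senders(handoffs: dict[str, Handoff], handoff_id: str) -> list[str]:
    """Walk parent links; nearest sender first, originating agent last."""
    path: list[str] = []
    seen: set[str] = set()
    current = handoffs.get(handoff_id)
    while current is not None and current.handoff_id not in seen:
        seen.add(current.handoff_id)
        path.append(current.sender)
        parent = current.parent_handoff
        current = handoffs.get(parent) if parent is not None else None
    return path


h1 = make_handoff("h1", "planner", "researcher", "task: tidy staging tables")
h2 = make_handoff("h2", "researcher", "executor", "DROP TABLE users", parent="h1")
handoffs = {h.handoff_id: h for h in (h1, h2)}
print(upstream_senders(handoffs, "h2"))  # ['researcher', 'planner']
print(payload_intact(h2))                # True
```

Agent B’s own control boundary can still pass every check in this setup; the gain is that its evidence log can now say which upstream agent introduced the context.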

How soon could CER-style logging become a legal requirement rather than an insurance suggestion?

Gartner projects that fragmented AI regulation will cover half the world’s economies by 2027, driving USD 5 billion in compliance costs. With the EU AI Act’s GPAI obligations under Articles 53 and 55 in force since August 2025, enforceable by fines from August 2026, and requiring documentation and risk management from GPAI providers, the logging infrastructure CER describes could shift from an insurance prerequisite to a regulatory minimum within 12 to 18 months across multiple jurisdictions. That timeline is a projection, not a scheduled deadline.