Can AI Agents Share Context Without a Central Coordinator?

DeLM replaces central multi-agent coordinators with shared context, posting 10.5-point SWE-bench gains at half cost. Consistency, stale reads, and write conflicts remain.

As of mid-2026, every widely used multi-agent framework in production assumes a central node. CrewAI routes tasks through a manager agent. AutoGen funnels conversations through a GroupChat controller. LangGraph maintains a shared state graph with a single coordinator deciding what runs next. A June 2026 arXiv preprint proposes an alternative: agents that coordinate through a shared context store with no supervisor routing every message. It reports measurable gains on two standard benchmarks while cutting cost per task roughly in half. The question is whether the mechanism survives contact with real workloads where writes conflict and context goes stale.

The Central-Coordinator Assumption

The dominant multi-agent architectures all share a structural trait: a single point of orchestration. CrewAI (roughly 48k GitHub stars as of April 2026) chains agents through a manager that assigns tasks, merges outputs, and handles errors. LangGraph exposes a state graph where edges and conditional routing live in one place. AutoGen uses a GroupChat manager to determine speaker order and message flow.

This is not an accident. Central coordination is the easiest model to reason about because the orchestrator has a total view of agent state. The trade-off is that the orchestrator becomes the bottleneck. Every update passes through it. Every error is its responsibility. And when the coordinator’s context window fills or its planning latency spikes, every agent waits.

The production track record is uneven. As one practitioner comparison notes, LangGraph is designed to “emit traces at every node transition”, while higher-abstraction frameworks tend to be “black boxes.” AutoGen’s v0.4 rewrite broke most existing integrations. These are framework-specific failures, but they share a common root: the central coordinator is doing too much of the work that agents could do themselves.

The DeLM Alternative: Shared Context Without a Supervisor

DeLM (Decentralized Language Model Multi-Agent Framework) replaces the central controller with three components: parallel agents, a shared verified context, and a task queue. No single node decides what runs next. Agents read from the shared context, pick up tasks from the queue, and write verified results back. Coordination emerges from the context itself rather than from an orchestrator’s routing logic.

The “verified” qualifier matters. Agents don’t write raw output directly into shared context. Each contribution passes a verification step before it becomes visible to other agents. This is the mechanism that makes leaderless coordination tractable: agents don’t need to trust each other’s unchecked outputs because the context store only admits verified entries.
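The three components can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the DeLM implementation: the names `VerifiedContext`, `agent`, and `run` are hypothetical, the "agent work" is a placeholder multiplication standing in for a model call, and the verifier is a toy predicate.

```python
import queue
import threading
from dataclasses import dataclass, field


@dataclass
class VerifiedContext:
    """Hypothetical shared store: only verified entries become visible."""
    _entries: dict = field(default_factory=dict)
    _lock: threading.Lock = field(default_factory=threading.Lock)

    def read(self) -> dict:
        with self._lock:
            return dict(self._entries)  # snapshot copy, safe to use outside the lock

    def write(self, key, value, verify) -> bool:
        if not verify(key, value):  # gate: unverified output never enters the store
            return False
        with self._lock:
            self._entries[key] = value
        return True


def agent(tasks: queue.Queue, ctx: VerifiedContext, verify) -> None:
    """Pull tasks until the queue is empty; no supervisor assigns work."""
    while True:
        try:
            task_id, payload = tasks.get_nowait()
        except queue.Empty:
            return
        result = payload * 2  # placeholder for a model call (assumption)
        ctx.write(task_id, result, verify)
        tasks.task_done()


def run(num_agents: int = 3) -> dict:
    tasks = queue.Queue()
    for i in range(6):
        tasks.put((f"task-{i}", i))
    ctx = VerifiedContext()

    def verify(key, value):
        return value % 4 == 0  # toy verifier; a real one would check task requirements

    workers = [
        threading.Thread(target=agent, args=(tasks, ctx, verify))
        for _ in range(num_agents)
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return ctx.read()


if __name__ == "__main__":
    print(run())
```

Which agent handles which task is left to the queue, so the scheduling differs between runs, but the set of admitted entries does not: only results that pass the verifier ever appear in the store.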

The benchmark results, as reported in the June 2026 DeLM paper, are concrete. On SWE-bench Verified, DeLM achieved the best performance across Avg.@1, Pass@2, and Pass@4 metrics, with gains of up to 10.5 percentage points over the strongest baseline, while reducing cost per task by roughly 50% (DeLM evaluation). On LongBench-v2 Multi-Doc QA, DeLM achieved the highest average accuracy across four frontier model families, improving over the strongest baseline by up to 5.7 percentage points.

These are strong numbers on established benchmarks. The cost reduction is particularly notable: removing the central coordinator does not just simplify the architecture, it also lowers the spend per task, plausibly because no tokens go to a supervisor’s planning calls.

How CA-MCP Reduces the Central LLM Bottleneck

DeLM is not the only recent work moving in this direction. CA-MCP (Context-Aware Model Context Protocol) introduces a Shared Context Store where the central LLM functions only as a long-term planner for high-level goals and final summarization. The MCP servers act as stateful short-term reactors that autonomously read from and write to shared context.

The architecture inverts the usual relationship. Instead of the LLM calling tools, the tools are running their own coordination logic against a shared state. The central model is still present, but its responsibilities narrow to goal-setting and synthesis, not step-by-step routing. This is a smaller shift than DeLM’s fully leaderless model, but it points in the same direction: the central LLM does less, and the shared context does more.
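The inverted relationship can be illustrated with a toy loop. The sketch below does not use the MCP SDK or any real CA-MCP API; `fetch_server`, `parse_server`, and `run` are hypothetical stand-ins, and the shared context store is a plain dictionary. The point is only the division of labor: the planner sets a goal and writes the final summary, while the servers decide for themselves when to act by inspecting shared state.

```python
def fetch_server(store: dict) -> bool:
    """Reacts when a goal exists and no raw data has been fetched yet."""
    if "goal" in store and "raw" not in store:
        store["raw"] = f"data for {store['goal']}"
        return True
    return False


def parse_server(store: dict) -> bool:
    """Reacts when raw data exists and has not been parsed yet."""
    if "raw" in store and "parsed" not in store:
        store["parsed"] = store["raw"].upper()
        return True
    return False


def run(goal: str) -> str:
    store = {"goal": goal}  # planner, step 1: set the high-level goal
    # Listed in the "wrong" order on purpose: no caller sequences the servers.
    servers = [parse_server, fetch_server]
    progressed = True
    while progressed:
        # The list comprehension gives every server a turn in each round.
        progressed = any([server(store) for server in servers])
    return f"summary: {store['parsed']}"  # planner, step 2: final synthesis


if __name__ == "__main__":
    print(run("weather report"))
```

Nothing in `run` says that fetching precedes parsing. That ordering falls out of the preconditions each server checks against the store.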

The Consistency Catch: Staleness and Conflicting Writes

Removing the central coordinator eliminates the bottleneck but introduces a different problem. When multiple agents read and write to a shared context with no single arbiter, the system needs to handle three failure modes that central coordination avoids by construction:

Stale reads. Agent A produces a result that is still in verification. Agent B reads the context before that result is admitted and proceeds on information that is about to be outdated. DeLM’s verification step keeps unverified output out of the store by gating writes, but it does not close the window between a read and the next admitted write.

Conflicting writes. Two agents independently tackle overlapping subtasks. Both write verified results to shared context. The results contradict each other. Without a coordinator to resolve the conflict, the system needs a separate reconciliation mechanism, or it needs to accept that some context entries are mutually exclusive.

Cascading errors. A verified-but-wrong entry in shared context propagates to every agent that reads it. In a central-coordinator model, the supervisor can catch and quarantine bad outputs before they spread. In a decentralized model, the verification step is the only gate, and if it passes a wrong result, there is no second line of defense.
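The conflicting-writes case is easy to reproduce in miniature. The following sketch assumes a hypothetical store, `ConflictTrackingContext`, that keeps every verified entry per key instead of overwriting, so contradictions can at least be surfaced. The verifier is a toy check and is not taken from the DeLM paper.

```python
from collections import defaultdict


class ConflictTrackingContext:
    """Hypothetical store that records all verified writes per key."""

    def __init__(self, verify):
        self._verify = verify
        self._entries = defaultdict(list)  # key -> [(agent, value), ...]

    def write(self, agent: str, key: str, value: str) -> bool:
        if not self._verify(key, value):
            return False  # rejected entries never become visible
        self._entries[key].append((agent, value))
        return True

    def conflicts(self) -> dict:
        """Keys for which verified entries disagree with each other."""
        return {
            key: writes
            for key, writes in self._entries.items()
            if len({value for _, value in writes}) > 1
        }


if __name__ == "__main__":
    # Toy verifier: accepts anything that looks like a function signature.
    ctx = ConflictTrackingContext(lambda key, value: value.startswith("def "))
    ctx.write("agent-a", "parse", "def parse(text: str) -> dict: ...")
    ctx.write("agent-b", "parse", "def parse(text: bytes) -> list: ...")
    print(ctx.conflicts())
```

Both writes pass verification on their own terms, and the store still ends up holding two incompatible answers for the same key. Detecting that is the easy half; deciding which entry wins is the part a reconciliation mechanism would have to supply.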

These are not theoretical concerns. They are the standard distributed-systems problems that reappear whenever a system moves from central coordination to shared-state concurrency. The database world spent decades building isolation levels, conflict resolution, and consensus protocols for exactly this class of failure. Multi-agent systems are about to rediscover all of them.

Benchmarks vs. Real-World Failure Modes

The SWE-bench and LongBench-v2 benchmarks are well-structured evaluation environments with clearly scoped tasks and ground-truth answers. They are also environments where conflicting writes are unlikely because the tasks are decomposable into mostly independent subproblems. An agent fixing one function rarely steps on an agent fixing another.

Open-ended tasks do not have this property. A multi-agent system debugging a production incident, writing a grant proposal, or refactoring a coupled codebase will generate overlapping work products where conflicts are the norm, not the exception. The benchmarks show that decentralized coordination works when tasks are partitionable. They do not show that it works when partitioning itself is the hard problem.

Enterprise orchestration architectures illustrate the gap. PwC’s Agent OS positions multi-agent coordination as a switchboard function, while the emerging Agent-to-Agent (A2A) protocol governs peer coordination, negotiation, and delegation. These systems assume that coordination requires explicit protocol-level negotiation between agents, not just shared context. The assumption may be conservative, but it reflects production experience: real workloads generate conflicts that shared context alone does not resolve.

What Practitioners Should Watch

The architectural shift from central coordination to shared context is real and the benchmark evidence is strong, but the gap between benchmark performance and production readiness is where the actual engineering work lives. Three things to track:

  1. Conflict resolution mechanisms. DeLM’s verification step handles bad inputs but does not obviously handle conflicting good inputs. The next iteration of these systems will need something analogous to optimistic concurrency control or conflict-free replicated data types. Whichever team ships a working version of this first will have a genuine advantage.

  2. Observability in leaderless systems. Debugging a central coordinator is difficult enough. Debugging a system where coordination emerges from shared state, with no single node’s logs telling the whole story, is a categorically different problem. As of mid-2026, the observability tooling for this does not exist.

  3. Framework adoption signals. CrewAI and AutoGen are not going away. If their next major versions start offering shared-context modes alongside their existing central-coordination modes, that is a stronger signal than any arXiv paper that the industry is moving toward hybrid architectures rather than wholesale replacement.
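The first item in the list above names optimistic concurrency control as one candidate mechanism. A minimal sketch of the idea, assuming a hypothetical `VersionedContext` in which every key carries a version number: an agent reads a value together with its version, does its work, and the write succeeds only if nobody else has updated the key in the meantime. Nothing here comes from DeLM or any existing framework.

```python
import threading


class VersionedContext:
    """Hypothetical store where each key holds (version, value)."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def read(self, key):
        """Return (version, value); missing keys read as (0, None)."""
        with self._lock:
            return self._data.get(key, (0, None))

    def compare_and_set(self, key, expected_version, value) -> bool:
        """Write only if the key is still at the version the caller read."""
        with self._lock:
            current_version, _ = self._data.get(key, (0, None))
            if current_version != expected_version:
                return False  # someone else wrote first; caller must re-read
            self._data[key] = (current_version + 1, value)
            return True


def update_with_retry(ctx, key, transform, max_attempts=5) -> bool:
    """Re-read and retry when a concurrent write invalidates the snapshot."""
    for _ in range(max_attempts):
        version, value = ctx.read(key)
        if ctx.compare_and_set(key, version, transform(value)):
            return True
    return False


if __name__ == "__main__":
    ctx = VersionedContext()
    version, _ = ctx.read("plan")
    print(ctx.compare_and_set("plan", version, "draft A"))  # first writer
    print(ctx.compare_and_set("plan", version, "draft B"))  # stale version
```

This turns a silent overwrite into an explicit failure that the losing agent can react to. It does not decide whether the loser should retry, merge, or give up, which is where the task-specific reconciliation logic would live.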

The central-coordinator model is a bottleneck, but it is a bottleneck practitioners understand. The decentralized alternative posts better numbers on structured benchmarks and halves the cost. The open question is whether it can handle the messy, conflicting, overlapping tasks that make up most real multi-agent work. The answer to that question will determine whether shared context becomes the default coordination mechanism or remains a specialized technique for well-partitioned problems.

Frequently Asked Questions

What does a team migrating from CrewAI to a shared-context model actually need to build?

CrewAI’s manager agent handles task decomposition, routing, and error recovery out of the box. Replacing it requires building a verified context store with read/write gating, a task queue that agents can pull from autonomously, and a verification function that validates outputs before they become visible to other agents. Teams also need a conflict resolution layer that CrewAI never required, because the manager previously serialized all decisions. The DeLM paper does not specify an off-the-shelf implementation of any of these components.

How does DeLM’s verification step differ from consensus protocols like Raft or Paxos?

Raft and Paxos solve agreement on a single value among known participants with strong consistency guarantees. DeLM’s verification validates individual agent outputs against task requirements before writing them to shared context, but it does not establish ordering or agreement across concurrent writes. Two agents can pass verification with contradictory results, and both can land in the context store. The verification step is closer to a type check than a consensus round.
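The difference can be shown with two small pieces of code. The sketch below is neither Raft nor Paxos and not DeLM’s verifier: `verify` is a hypothetical per-entry predicate, and `WriteOnceRegister` is a single-process stand-in for the property that consensus protocols provide across machines, namely that every participant learns the same decided value.

```python
def verify(entry: dict) -> bool:
    """Per-entry check: inspects one output and never looks at its siblings."""
    return bool(entry["tests_passed"]) and bool(entry["patch"])


class WriteOnceRegister:
    """Single-process stand-in for agreement: the first proposal is decided."""

    def __init__(self):
        self._value = None
        self._decided = False

    def propose(self, value):
        if not self._decided:
            self._value, self._decided = value, True
        return self._value  # every caller gets the same decided value


if __name__ == "__main__":
    a = {"patch": "rename field to user_id", "tests_passed": True}
    b = {"patch": "rename field to uid", "tests_passed": True}

    # Both entries pass verification even though they contradict each other.
    print(verify(a), verify(b))

    register = WriteOnceRegister()
    print(register.propose(a["patch"]))
    print(register.propose(b["patch"]))  # returns the already decided value
```

The verifier answers "is this entry acceptable?" while the register answers "which entry did we pick?". A shared-context system that needs the second answer has to add a mechanism for it on top of verification.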

What workload characteristics make shared-context coordination riskiest?

Tasks with tight coupling between subproblems, where one agent’s output changes the correct answer for another agent’s subtask, are the worst case. Refactoring a monorepo with cross-module dependencies is one example: two agents might independently refactor interdependent interfaces and each produce verified but mutually incompatible changes. SWE-bench avoids this problem because its issues decompose into mostly independent function-level fixes with low coupling between patches.

Can DeLM mix model families within a single run, or must all agents use the same LLM?

The LongBench-v2 evaluation tested DeLM across four distinct frontier model families, which shows the framework is not tied to a single LLM. The reported result does not by itself establish that families were mixed within one run, so heterogeneous agent composition should be treated as plausible rather than demonstrated. If it holds, it matters for cost: a team can route simpler verification steps to cheaper models while reserving expensive ones for complex reasoning. Central orchestrators like CrewAI’s manager can also do model routing, but every routing decision adds a planning call that a shared-context architecture skips entirely.

sources · 5 cited

  1. CrewAI: The Leading Multi-Agent Platform (vendor, accessed 2026-06-10)
  2. Multi-Agent Orchestration: LangGraph vs CrewAI vs AutoGen vs Custom (analysis, accessed 2026-06-10)
  3. Decentralized Multi-Agent Systems with Shared Context (primary, accessed 2026-06-10)
  4. Enhancing Model Context Protocol (MCP) with Context-Aware Server Collaboration (primary, accessed 2026-06-10)
  5. The Orchestration of Multi-Agent Systems: Architectures, Protocols, and Enterprise Adoption (primary, accessed 2026-06-10)