Agents & Frameworks
43 articles exploring Agents & Frameworks. Expert analysis and insights from our editorial team.
The agentic layer of AI is where the interesting engineering problems live. This cluster covers the full stack of autonomous AI systems: the orchestration frameworks that sequence tool calls, the memory architectures that let agents stay coherent across sessions, the trust and autonomy models that determine how much rope you give a running agent, and the emerging protocol standards that connect those agents to external systems.
On the protocol front, the Model Context Protocol has become the connective tissue of the modern AI stack. Groundy has tracked MCP from its initial Anthropic release through GitHub’s registry play, the Zed and JetBrains integrations, and the ACP (Agent Communication Protocol) effort that targets multi-agent coordination at the registry level. The MCP registry now hosts thousands of servers; the question has shifted from “will this standard win?” to “how do you audit what you’re connecting to?”
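As a concrete gut check, that audit can start before any agent is wired up: connect to a candidate server and enumerate what it actually exposes. Here is a minimal sketch using the official MCP Python SDK, with the reference filesystem server standing in for whatever registry entry you are vetting:

```python
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

# Illustrative server choice: the reference filesystem server, scoped to /tmp.
params = StdioServerParameters(
    command="npx",
    args=["-y", "@modelcontextprotocol/server-filesystem", "/tmp"],
)

async def audit() -> None:
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            # Enumerate every tool the server exposes before wiring it
            # into an agent; review the names and descriptions by hand.
            result = await session.list_tools()
            for tool in result.tools:
                print(f"{tool.name}: {tool.description}")

asyncio.run(audit())
```

Listing tools is the floor, not the ceiling, of an audit, but it catches the cheapest failure: a server whose capabilities are broader than its registry description suggests.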
Framework choices are less settled. PydanticAI and LangChain represent genuinely different philosophies—type-safe structured outputs versus flexible chain composition—and the right answer depends heavily on whether you’re building a one-shot pipeline or a long-running autonomous process. CrewAI, AutoGen, and Superpowers sit further up the abstraction stack, enforcing workflow discipline on top of bare tool-use APIs.
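To make that philosophical gap concrete, here is roughly what the type-safe side looks like: the schema is the contract, and output that fails validation raises instead of flowing downstream. A sketch assuming a recent pydantic-ai release (earlier versions spell these `result_type` and `result.data`), with an invented `Triage` schema:

```python
from pydantic import BaseModel
from pydantic_ai import Agent

# Hypothetical schema: the model must return exactly these fields.
class Triage(BaseModel):
    severity: str      # e.g. "critical", "minor"
    component: str     # subsystem the report points at
    needs_human: bool  # escalate past the agent?

# output_type turns free-form generation into validated structured output.
agent = Agent("openai:gpt-4o", output_type=Triage)

result = agent.run_sync("Checkout returns a 500 whenever payment is submitted.")
print(result.output)  # a validated Triage instance
```

LangChain can express the same thing, but as one composable step among many rather than as the core abstraction, which is the trade-off in miniature.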
Memory architecture is the underappreciated dimension. Hindsight, vector stores, and session management each solve a different failure mode—context overflow, retrieval hallucination, and cross-session amnesia are distinct problems with distinct solutions. The pattern of conflating them is one reason production agent deployments underperform their demos.
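A toy sketch of that separation, with invented names: a bounded working buffer addresses context overflow, a durable per-session store addresses cross-session amnesia, and retrieval hallucination is a third, separate problem that lives in whatever search layer sits over the store.

```python
from collections import deque

class AgentMemory:
    """Toy illustration: each tier addresses a distinct failure mode."""

    def __init__(self, working_limit: int = 8):
        # Working memory: bounded, so old events fall off instead of
        # overflowing the context window.
        self.working: deque[str] = deque(maxlen=working_limit)
        # Episodic memory: keyed by session, survives after the
        # conversation ends.
        self.episodes: dict[str, list[str]] = {}

    def observe(self, session_id: str, event: str) -> None:
        self.working.append(event)
        self.episodes.setdefault(session_id, []).append(event)

    def recall(self, session_id: str) -> list[str]:
        # Recall pulls from the durable store, not the live buffer.
        return self.episodes.get(session_id, [])
```

Production systems replace the dict with a vector store or knowledge graph, but the boundary between tiers is the part that demos tend to blur.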
Groundy covers this cluster with an engineering orientation: architecture decisions, production failure modes, benchmark methodology, and the governance questions—autonomy bounds, human-in-the-loop thresholds—that don’t show up in vendor benchmarks but matter most when agents run unattended.
The autonomy question is not just philosophical. Agents that can browse the web, execute code, push commits, and make API calls on your behalf are also attack surfaces. Prompt injection through tool outputs, malicious content retrieved by browsing agents, and supply-chain attacks on MCP server packages are all documented. This cluster covers the security dimensions of agent architecture alongside the engineering ones—because in 2026 you cannot separate them.
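One recurring mitigation pattern is to quarantine tool output before it re-enters the prompt: screen it, then delimit it so the model can treat it as data rather than instructions. A deliberately naive sketch (the patterns are invented, and regex screening alone is not a real defense):

```python
import re

# Illustrative patterns only; real injection defenses need far more.
SUSPECT_PATTERNS = [
    re.compile(r"ignore (all )?previous instructions", re.I),
    re.compile(r"you are now", re.I),
    re.compile(r"<\s*system\s*>", re.I),
]

def quarantine_tool_output(output: str) -> str:
    """Wrap untrusted tool output so it enters the prompt as data."""
    for pattern in SUSPECT_PATTERNS:
        if pattern.search(output):
            raise ValueError("possible prompt injection in tool output")
    # Explicit delimiters mark the content as untrusted downstream.
    return f"<untrusted_tool_output>\n{output}\n</untrusted_tool_output>"
```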
Featured in this cluster
Multi-Agent Coordination Protocols: When AI Agents Work Together
Multi-agent coordination protocols are standardized communication frameworks that enable autonomous AI agents to delegate tasks, share information, and resolve conflicts in distributed systems. These protocols are essential infrastructure for modern AI systems from autonomous vehicles to LLM-based agent frameworks.
Cornerstone: How AI Agents Remember: Memory Architectures That Work
AI agents use four distinct memory tiers—working, episodic, semantic, and procedural—stored across context windows, vector databases, knowledge graphs, and model weights. Choosing the right architecture determines whether your agent stays coherent across sessions or forgets everything the moment a conversation ends.
Cornerstone: How Much Autonomy Should AI Agents Have? A Framework for Trust
As AI agents gain real-world capabilities—browsing, coding, purchasing—the question of how much autonomy to grant these systems becomes critical. This article proposes the VERIFIED framework for determining appropriate trust levels.
Cornerstone: MCP Is Everywhere: The Protocol That Connected AI to Everything
How the Model Context Protocol became the universal standard connecting AI assistants to data sources, tools, and enterprise systems—transforming isolated models into truly connected agents.
Cornerstone: Pydantic AI vs LangChain: A Developer's Guide to the New Generation of Agent Frameworks
A comprehensive comparison of Pydantic AI and LangChain, exploring type safety, developer experience, and production readiness in modern Python AI agent frameworks.
Latest in Agents & Frameworks
Council Mode Cuts Multi-Agent LLM Hallucination 35.9% at 4.2x Token Cost on HaluEval
Council Mode routes queries through three frontier LLMs and a consensus model, cutting hallucinations 35.9% on HaluEval at 4.2x token cost. Major frameworks lack this pattern.
Salesforce TDX 2026: Headless 360 Ships 60+ MCP Tools and Agentforce Vibes 2.0 With Claude Sonnet 4.5
Salesforce TDX 2026 shipped 60+ MCP tools and a Claude-default IDE, collapsing wrapper value for LangGraph, CrewAI, and AutoGen while shifting to cross-MCP routing.
CrewAI 1.14.2 Lands Checkpoint TUI with Tree View, Fork Support, and Lineage Tracking
CrewAI 1.14.2 and 1.14.3 ship a checkpoint TUI with fork support and lineage tracking, making resumability a framework primitive for expensive multi-step agent pipelines.
LLM Agent for Iterative Chart Refinement Exposes a Logging Gap in CrewAI and AutoGen
An arXiv paper shows iterative chart agents need per-step rationale schemas that CrewAI and AG2 lack, while the token and storage cost of structured traces remains unmeasured.
Cloudflare Agents Week Moved Sandbox Execution, Private Networking, and Memory From Framework Code to Network Primitives
Cloudflare shipped four production primitives in April 2026 — Sandboxes GA, Mesh, Dynamic Workers, and Agent Memory — replacing infrastructure that CrewAI, LangGraph, and AutoGen currently carry in framework code.
Frontier LLMs Fail Agentic Threat Hunting: Best Model Catches 3.8% of Malicious Events in 11-Model Benchmark
Simbian AI's benchmark tests 11 LLMs on raw Windows event log hunting; Claude Opus 4.6 leads with a 0.55 coverage score while no other model clears a single one of the 13 ATT&CK tactics.
FSE 2026: Chain-of-Thought Fails Per-Bias as Debiasing; Axiomatic Cues Cut Sensitivity 51%
FSE 2026: chain-of-thought fails per-bias on PROBE-SWE SE tasks. Axiomatic cues cut bias sensitivity 51%, exposing gaps in CrewAI, LangChain, Pydantic AI defaults.
PROBE-SWE Finds Chain-of-Thought and Self-Debiasing Don't Reduce Prompt-Induced Bias in Coding Agents
PROBE-SWE (arXiv 2604.16756) finds chain-of-thought and self-debiasing fail to reduce prompt-induced cognitive bias in SE agents; axiomatic reasoning cues cut it 51%.
ACL 2026: Multi-Agent LLM Topologies Accelerate Premature Convergence; Adding Agents Makes It Worse
An ACL 2026 Findings paper shows dense communication topologies in multi-agent LLM systems accelerate premature convergence, meaning topology matters more than model strength.
'Beyond the Diff' Quantifies Agentic Entropy: Why AI Coding Agents Drift Across Iterations
A CHI 2026 paper formalizes agentic entropy as structural drift between agent actions and intent, showing why per-step benchmarks miss cumulative misalignment in long agent runs.
Diversity Collapse in Multi-Agent LLM Systems: Structural Coupling Breaks Open-Ended Idea Generation Even When Topologies Are Sparse
An ACL 2026 Findings paper finds multi-agent LLM brainstorming collapses because agents share models, prompts, and context, not because topologies are too dense.
Google's TPU 8i Targets Agentic Workloads. What CrewAI, LangGraph, and AutoGen Must Measure
Google's TPU 8i adds SRAM and a collectives engine for agentic workloads, yet CrewAI, LangGraph, and AutoGen lack the per-step latency and branch-utilization metrics needed.
Nous Research's Hermes Ships Persistent Memory and Auto-Skill Capture: CrewAI and AutoGen Must Reconsider
Hermes Agent bakes persistent memory and auto-skill capture into its core, shifting the comparison from orchestration to self-improvement. CrewAI has static skills; AutoGen is frozen.
OpenAI Responses API WebSocket Is Production-Ready; Pydantic AI, LangChain, and CrewAI Lack Adapters
OpenAI's Responses API WebSocket transport is production-ready as of April 2026, but Pydantic AI has only a pending PR and LangChain and CrewAI have no adapters.
A2A v1.0 Left Agent Discovery Blank: Why AAIF's 170-Member Standard Still Forces Every Enterprise to Build Its Own Governance Layer
A2A v1.0 defines Agent Cards but deliberately leaves registry, discovery, and governance infrastructure unspecified, forcing every enterprise to build its own.