Agents & Frameworks

50 articles exploring Agents & Frameworks. Expert analysis and insights from our editorial team.

The agentic layer of AI is where the interesting engineering problems live. This cluster covers the full stack of autonomous AI systems: the orchestration frameworks that sequence tool calls, the memory architectures that let agents stay coherent across sessions, the trust and autonomy models that determine how much rope you give a running agent, and the emerging protocol standards that connect those agents to external systems.

On the protocol front, the Model Context Protocol has become the connective tissue of the modern AI stack. Groundy has tracked MCP from its initial Anthropic release through GitHub’s registry play, the Zed and JetBrains integrations, and the ACP (Agent Communication Protocol) effort that targets multi-agent coordination at the registry level. The MCP registry now hosts thousands of servers; the question has shifted from “will this standard win?” to “how do you audit what you’re connecting to?”

Framework choices are less settled. Pydantic AI and LangChain represent genuinely different philosophies (type-safe structured outputs versus flexible chain composition), and the right answer depends heavily on whether you’re building a one-shot pipeline or a long-running autonomous process. CrewAI, AutoGen, and Superpowers sit further up the abstraction stack, enforcing workflow discipline on top of bare tool-use APIs.

Memory architecture is the underappreciated dimension. Hindsight, vector stores, and session management each address a different failure mode: context overflow, retrieval hallucination, and cross-session amnesia are distinct problems with distinct solutions. Conflating them is one reason production agent deployments underperform their demos.

Groundy covers this cluster with an engineering orientation: architecture decisions, production failure modes, benchmark methodology, and the governance questions—autonomy bounds, human-in-the-loop thresholds—that don’t show up in vendor benchmarks but matter most when agents run unattended.

The autonomy question is not just philosophical. Agents that can browse the web, execute code, push commits, and make API calls on your behalf are also attack surfaces. Prompt injection through tool outputs, malicious content retrieved by browsing agents, and supply-chain attacks on MCP server packages are all documented. This cluster covers the security dimensions of agent architecture alongside the engineering ones—because in 2026 you cannot separate them.

Featured in this cluster

Cornerstone

Multi-Agent Coordination Protocols: When AI Agents Work Together

Multi-agent coordination protocols are standardized communication frameworks that enable autonomous AI agents to delegate tasks, share information, and resolve conflicts in distributed systems. These protocols are essential infrastructure for modern AI systems, from autonomous vehicles to LLM-based agent frameworks.

· 9 min read
Cornerstone

How AI Agents Remember: Memory Architectures That Work

AI agents use four distinct memory tiers—working, episodic, semantic, and procedural—stored across context windows, vector databases, knowledge graphs, and model weights. Choosing the right architecture determines whether your agent stays coherent across sessions or forgets everything the moment a conversation ends.

· 9 min read
Cornerstone

How Much Autonomy Should AI Agents Have? A Framework for Trust

As AI agents gain real-world capabilities—browsing, coding, purchasing—the question of how much autonomy to grant these systems becomes critical. This article proposes the VERIFIED framework for determining appropriate trust levels.

· 12 min read
Cornerstone

MCP Is Everywhere: The Protocol That Connected AI to Everything

How the Model Context Protocol became the universal standard connecting AI assistants to data sources, tools, and enterprise systems—transforming isolated models into truly connected agents.

· 6 min read
Cornerstone

Pydantic AI vs LangChain: A Developer's Guide to the New Generation of Agent Frameworks

A comprehensive comparison of Pydantic AI and LangChain, exploring type safety, developer experience, and production readiness in modern Python AI agent frameworks.

· 8 min read

Latest in Agents & Frameworks

Newest first
01

A New Trust Schema Exposes Why Agent Skill Registries Fail Enterprise Audit Requirements

Metere's arXiv 2605.00424 formalizes a four-level trust schema and biconditional correctness criterion for agent skills, exposing why current SKILL.md-based registries fail enterprise audit requirements.

02

Trojan Hippo Plants Dormant Payloads in Agent Memory, Hits 85-100% Exfiltration on Frontier Models

Trojan Hippo plants dormant payloads in agent memory via a single untrusted email, achieving 85-100% exfiltration ASR on frontier models after surviving 100 benign sessions.

03

CrewAI vs AutoGen vs LangGraph 2026: The Real Trade-Off After Maintenance Mode

AutoGen is in maintenance mode, so the 2026 choice is CrewAI vs LangGraph. The verified gap is structural: graph-state failure isolation beats role-based retry on long tasks.

04

FormulaCode's 957-Task Benchmark Catches Frontier Agents Failing at Real-Codebase Performance Optimization

FormulaCode finds frontier agents trail human experts at repo-scale optimization, exposing SWE-Bench's blind spot: passing patches that never verify real-world speedups.

05

LangGraph 1.2.0 Makes Error-Handler Resume Crash-Durable — With Conditions

LangGraph 1.2.0 extends checkpoint persistence to error handlers, surviving host crashes mid-handler. The guarantee requires Postgres, sync mode, and idempotent nodes.

06

Spectral Analysis of LLM Agent Graphs Predicts Three Failure Modes: r=1.0, 0.5, and -1.0 on Qwen2.5

A new paper applies the successor representation to multi-agent LLM graphs, finding condition number perfectly predicts perturbation robustness (r_s=1.0).

07

IFPV's Adversarial Cognitive Simulation Cuts Multi-Agent Operational Cost 41.7% Over Single-Step LLMs

IFPV pairs a multi-agent planner with a fine-tuned adversarial simulator, cutting operational cost 41.7% in ACTS and challenging agent frameworks to own plan verification.

08

Council Mode Cuts Multi-Agent LLM Hallucination 35.9% at 4.2x Token Cost on HaluEval

Council Mode routes queries through three frontier LLMs and a consensus model, cutting hallucinations 35.9% on HaluEval at 4.2x token cost. Major frameworks lack this pattern.

09

CrewAI 1.14.2 Lands Checkpoint TUI with Tree View, Fork Support, and Lineage Tracking

CrewAI 1.14.2 and 1.14.3 ship a checkpoint TUI with fork support and lineage tracking, making resumability a framework primitive for expensive multi-step agent pipelines.

10

LLM Agent for Iterative Chart Refinement Exposes a Logging Gap in CrewAI and AutoGen

An arXiv paper shows iterative chart agents need per-step rationale schemas that CrewAI and AG2 lack, while the token and storage cost of structured traces remains unmeasured.

11

Salesforce TDX 2026: Headless 360 Ships 60+ MCP Tools and Agentforce Vibes 2.0 With Claude Sonnet 4.5

Salesforce TDX 2026 shipped 60+ MCP tools and a Claude-default IDE, collapsing wrapper value for LangGraph, CrewAI, and AutoGen while shifting to cross-MCP routing.

12

Cloudflare Agents Week Moved Sandbox Execution, Private Networking, and Memory From Framework Code to Network Primitives

Cloudflare shipped four production primitives in April 2026 — Sandboxes GA, Mesh, Dynamic Workers, and Agent Memory — replacing infrastructure CrewAI, LangGraph, and AutoGen.

13

Frontier LLMs Fail Agentic Threat Hunting: Best Model Catches 3.8% of Malicious Events in 11-Model Benchmark

Simbian AI's benchmark tests 11 LLMs on raw Windows event log hunting; Claude Opus 4.6 leads at 0.55 coverage score while every other model clears zero of 13 ATT&CK tactics.

14

FSE 2026: Chain-of-Thought Fails Per-Bias as Debiasing; Axiomatic Cues Cut Sensitivity 51%

FSE 2026: chain-of-thought fails per-bias on PROBE-SWE SE tasks. Axiomatic cues cut bias sensitivity 51%, exposing gaps in CrewAI, LangChain, and Pydantic AI defaults.

15

PROBE-SWE Finds Chain-of-Thought and Self-Debiasing Don't Reduce Prompt-Induced Bias in Coding Agents

PROBE-SWE (arXiv 2604.16756) finds chain-of-thought and self-debiasing fail to reduce prompt-induced cognitive bias in SE agents; axiomatic reasoning cues cut it 51%.
