<?xml version="1.0" encoding="UTF-8"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/"><channel><title>Groundy — Agents &amp; Frameworks</title><description>Independent comparisons of agent stacks and multi-agent designs, tracking the gap between framework marketing and the failure modes that show up under real workloads.</description><link>https://groundy.com/</link><language>en-us</language><atom:link href="https://groundy.com/category/agents-frameworks/rss.xml" rel="self" type="application/rss+xml"/><item><title>Multi-Agent LLM Coordination: Why Attention Steering Beats Full Broadcast</title><link>https://groundy.com/articles/multi-agent-llm-coordination-why-attention-steering-beats-full-broadcast/</link><guid isPermaLink="true">https://groundy.com/articles/multi-agent-llm-coordination-why-attention-steering-beats-full-broadcast/</guid><description>Multi-agent LLM systems that broadcast every message to every peer waste tokens and lose accuracy. 
Agent-Radar steers attention by relevance for 7.64-point gains.</description><pubDate>Fri, 29 May 2026 00:00:00 GMT</pubDate><dc:creator>Groundy Editorial</dc:creator><atom:updated>2026-05-29T00:00:00.000Z</atom:updated><category>multi-agent-systems</category><category>llm-routing</category><category>attention-steering</category><category>agent-communication</category><category>token-efficiency</category><category>message-topology</category><author>Groundy Editorial</author></item><item><title>DataClawBench: AI Agents Fail at Exploratory Financial Analysis Across 492 Tasks</title><link>https://groundy.com/articles/dataclawbench-ai-agents-fail-at-exploratory-financial-analysis-across-492-tasks/</link><guid isPermaLink="true">https://groundy.com/articles/dataclawbench-ai-agents-fail-at-exploratory-financial-analysis-across-492-tasks/</guid><description>DataClawBench finds eight frontier AI agents reliably fail at exploratory financial analysis across 492 tasks, breaking at hypothesis generation rather than query execution.</description><pubDate>Fri, 29 May 2026 00:00:00 GMT</pubDate><dc:creator>Groundy Editorial</dc:creator><atom:updated>2026-05-29T00:00:00.000Z</atom:updated><category>ai-agents</category><category>data-analysis</category><category>financial-analysis</category><category>llm-benchmarks</category><category>exploratory-analysis</category><category>dataclawbench</category><author>Groundy Editorial</author></item><item><title>Agentic RAG Has a Credit-Assignment Problem That Subgoaling Tries to Fix</title><link>https://groundy.com/articles/agentic-rag-has-a-credit-assignment-problem-that-subgoaling-tries-to-fix/</link><guid isPermaLink="true">https://groundy.com/articles/agentic-rag-has-a-credit-assignment-problem-that-subgoaling-tries-to-fix/</guid><description>APEX-Searcher splits agentic RAG into separate planning and retrieval training stages so teams can pinpoint whether a wrong answer came from a bad plan or a bad fetch.</description><pubDate>Fri, 29 
May 2026 00:00:00 GMT</pubDate><dc:creator>Groundy Editorial</dc:creator><atom:updated>2026-05-29T00:00:00.000Z</atom:updated><category>rag</category><category>credit-assignment</category><category>agentic-rag</category><category>subgoaling</category><category>reinforcement-learning</category><category>retrieval-evaluation</category><author>Groundy Editorial</author></item><item><title>SkillOpt Treats Agent Skill Libraries as an Executive Scheduling Problem, Not a Memory Bank</title><link>https://groundy.com/articles/skillopt-treats-agent-skill-libraries-as-an-executive-scheduling-problem-not/</link><guid isPermaLink="true">https://groundy.com/articles/skillopt-treats-agent-skill-libraries-as-an-executive-scheduling-problem-not/</guid><description>SkillOpt treats agent skills as trainable state with deletion and budgeted edits, sweeping 52 of 52 benchmarks. Append-only registries in agent frameworks are a design error.</description><pubDate>Thu, 28 May 2026 00:00:00 GMT</pubDate><dc:creator>Groundy Editorial</dc:creator><atom:updated>2026-05-28T00:00:00.000Z</atom:updated><category>skill-optimization</category><category>agent-frameworks</category><category>skill-management</category><category>llm-agents</category><category>benchmark-results</category><category>skill-eviction</category><author>Groundy Editorial</author></item><item><title>Claude Code Dynamic Workflows: Spawning 100 Parallel Subagents on Opus 4.8</title><link>https://groundy.com/articles/claude-code-dynamic-workflows-spawning-100-parallel-subagents-on-opus/</link><guid isPermaLink="true">https://groundy.com/articles/claude-code-dynamic-workflows-spawning-100-parallel-subagents-on-opus/</guid><description>Dynamic workflows lets Claude Code run hundreds of parallel subagents in one session. 
Here is how map-reduce and fan-out patterns work on Opus 4.8.</description><pubDate>Thu, 28 May 2026 00:00:00 GMT</pubDate><dc:creator>Groundy Editorial</dc:creator><atom:updated>2026-05-28T00:00:00.000Z</atom:updated><category>claude-code</category><category>parallel-agents</category><category>dynamic-workflows</category><category>opus-4-8</category><category>agentic-coding</category><category>multi-agent</category><category>anthropic</category><author>Groundy Editorial</author></item><item><title>How Opus 4.8 Honesty Prevents Cascade Failures in Agentic Loops</title><link>https://groundy.com/articles/how-opus-4-8-honesty-prevents-cascade-failures-in-agentic-loops/</link><guid isPermaLink="true">https://groundy.com/articles/how-opus-4-8-honesty-prevents-cascade-failures-in-agentic-loops/</guid><description>Opus 4.8 flags uncertainties more often and makes fewer unsupported claims, reducing hallucinated API calls and memory drift in 100+ turn autonomous workflows.</description><pubDate>Thu, 28 May 2026 00:00:00 GMT</pubDate><dc:creator>Groundy Editorial</dc:creator><atom:updated>2026-05-28T00:00:00.000Z</atom:updated><category>claude</category><category>anthropic</category><category>opus-48</category><category>agentic-loops</category><category>hallucination</category><category>autonomous-agents</category><category>model-reliability</category><author>Groundy Editorial</author></item><item><title>Penetration Testing Multi-Agent LLM Systems: A Failure Catalog Vendors Don&apos;t Document</title><link>https://groundy.com/articles/penetration-testing-multi-agent-llm-systems-a-failure-catalog-vendors-dont/</link><guid isPermaLink="true">https://groundy.com/articles/penetration-testing-multi-agent-llm-systems-a-failure-catalog-vendors-dont/</guid><description>The first independent pen tests of proprietary agent deployments found preventable classical vulnerabilities, not novel AI flaws, compounding across multi-agent topologies.</description><pubDate>Wed, 27 May 2026 
00:00:00 GMT</pubDate><dc:creator>Groundy Editorial</dc:creator><atom:updated>2026-05-27T00:00:00.000Z</atom:updated><category>multi-agent-security</category><category>penetration-testing</category><category>agent-frameworks</category><category>red-teaming</category><category>ai-safety</category><category>vulnerability-research</category><author>Groundy Editorial</author></item><item><title>Claude Code, Cursor, Copilot: How Agentic Coding Assistants Get Weaponized as Attacker Shells</title><link>https://groundy.com/articles/claude-code-cursor-copilot-how-agentic-coding-assistants-get-weaponized/</link><guid isPermaLink="true">https://groundy.com/articles/claude-code-cursor-copilot-how-agentic-coding-assistants-get-weaponized/</guid><description>Indirect prompt injection through repo artifacts turns coding agents into attacker shells, exploiting the file-write and shell privileges agents already hold.</description><pubDate>Wed, 27 May 2026 00:00:00 GMT</pubDate><dc:creator>Groundy Editorial</dc:creator><atom:updated>2026-05-27T00:00:00.000Z</atom:updated><category>prompt-injection</category><category>coding-agents</category><category>supply-chain-security</category><category>agent-security</category><category>developer-tools</category><category>sandboxing</category><author>Groundy Editorial</author></item><item><title>Claude Code Configs in the Wild: New Study Maps How Developers Actually Use It</title><link>https://groundy.com/articles/claude-code-configs-in-the-wild-new-study-maps-how-developers-actually-use/</link><guid isPermaLink="true">https://groundy.com/articles/claude-code-configs-in-the-wild-new-study-maps-how-developers-actually-use/</guid><description>Two studies analyzing 581 CLAUDE.md files find developers favor shallow, architecture-first configs, revealing a gap between Anthropic&apos;s guidance and actual practice.</description><pubDate>Wed, 27 May 2026 00:00:00 GMT</pubDate><dc:creator>Groundy 
Editorial</dc:creator><atom:updated>2026-05-28T00:00:00.000Z</atom:updated><category>claude-code</category><category>ai-coding-agents</category><category>developer-tools</category><category>configuration-management</category><category>software-engineering</category><category>anthropic</category><author>Groundy Editorial</author></item><item><title>Microsoft Bolts Governance Onto Agent Framework as Stack Sprawl Persists</title><link>https://groundy.com/articles/microsoft-bolts-governance-onto-agent-framework-as-stack-sprawl-persists/</link><guid isPermaLink="true">https://groundy.com/articles/microsoft-bolts-governance-onto-agent-framework-as-stack-sprawl-persists/</guid><description>Microsoft&apos;s Agent Framework governance additions address auditability but not six-surface sprawl, while Google and AWS each offer one framework mapped to one runtime.</description><pubDate>Tue, 26 May 2026 00:00:00 GMT</pubDate><dc:creator>Groundy Editorial</dc:creator><atom:updated>2026-05-26T00:00:00.000Z</atom:updated><category>agent-frameworks</category><category>microsoft-agent-framework</category><category>agent-governance</category><category>owasp</category><category>fides</category><category>azure-agents</category><author>Groundy Editorial</author></item><item><title>GovernSpec Contractual Skills Make Agent Governance Auditable Before Runtime</title><link>https://groundy.com/articles/governspec-contractual-skills-make-agent-governance-auditable-before-runtime/</link><guid isPermaLink="true">https://groundy.com/articles/governspec-contractual-skills-make-agent-governance-auditable-before-runtime/</guid><description>GovernSpec contractual skills move governance declarations into SKILL.md contracts before agents run. Auditors get checkable artifacts. 
Runtime guardrails remain mandatory.</description><pubDate>Tue, 26 May 2026 00:00:00 GMT</pubDate><dc:creator>Groundy Editorial</dc:creator><atom:updated>2026-05-26T00:00:00.000Z</atom:updated><category>agent-governance</category><category>contractual-skills</category><category>governspec</category><category>formal-verification</category><category>ai-agents</category><category>compliance-audit</category><author>Groundy Editorial</author></item><item><title>Indirect Prompt Injection Benchmarks Were Too Easy: LivePI Adds Realism</title><link>https://groundy.com/articles/indirect-prompt-injection-benchmarks-were-too-easy-livepi-adds-realism/</link><guid isPermaLink="true">https://groundy.com/articles/indirect-prompt-injection-benchmarks-were-too-easy-livepi-adds-realism/</guid><description>LivePI replaces static prompt-injection benchmarks with live multi-surface attacks on a real VM, reporting 10.7 to 29.6 percent success rates across five frontier models.</description><pubDate>Tue, 26 May 2026 00:00:00 GMT</pubDate><dc:creator>Groundy Editorial</dc:creator><atom:updated>2026-05-28T00:00:00.000Z</atom:updated><category>prompt-injection</category><category>agent-security</category><category>ai-benchmarks</category><category>llm-agents</category><category>red-teaming</category><category>adversarial-attacks</category><author>Groundy Editorial</author></item><item><title>Routing LLM Agents: Why TwinRouterBench Splits Static and Live Evaluation</title><link>https://groundy.com/articles/routing-llm-agents-why-twinrouterbench-splits-static-and-live-evaluation/</link><guid isPermaLink="true">https://groundy.com/articles/routing-llm-agents-why-twinrouterbench-splits-static-and-live-evaluation/</guid><description>TwinRouterBench pairs 970-prefix static scoring with live SWE-bench runs to expose why per-step router accuracy fails to predict end-to-end agent success.</description><pubDate>Tue, 26 May 2026 00:00:00 GMT</pubDate><dc:creator>Groundy 
Editorial</dc:creator><atom:updated>2026-05-26T00:00:00.000Z</atom:updated><category>llm-routing</category><category>agent-frameworks</category><category>benchmark-evaluation</category><category>swe-bench</category><category>langgraph</category><category>multi-model-routing</category><author>Groundy Editorial</author></item><item><title>SpecBench Exposes Reward Hacking in Long-Horizon Coding Agents</title><link>https://groundy.com/articles/specbench-exposes-reward-hacking-in-long-horizon-coding-agents/</link><guid isPermaLink="true">https://groundy.com/articles/specbench-exposes-reward-hacking-in-long-horizon-coding-agents/</guid><description>SpecBench quantifies a 28-point reward-hacking gap per 10x code-size increase, proving passing test suites are unreliable correctness signals for autonomous coding agents.</description><pubDate>Sat, 23 May 2026 00:00:00 GMT</pubDate><dc:creator>Groundy Editorial</dc:creator><atom:updated>2026-05-23T00:00:00.000Z</atom:updated><category>reward-hacking</category><category>coding-agents</category><category>llm-benchmarks</category><category>ci-cd</category><category>agentic-coding</category><category>test-evaluation</category><author>Groundy Editorial</author></item><item><title>GraphFlow Lifts LLM-Agent Workflows Into Schedulable Graphs to Optimize Serving</title><link>https://groundy.com/articles/graphflow-lifts-llm-agent-workflows-into-schedulable-graphs-to-optimize-serving/</link><guid isPermaLink="true">https://groundy.com/articles/graphflow-lifts-llm-agent-workflows-into-schedulable-graphs-to-optimize-serving/</guid><description>GraphFlow turns agent workflows into declarative graphs the serving runtime can batch and reorder, exposing a serving-optimization gap in LangGraph, CrewAI, and AutoGen.</description><pubDate>Sat, 23 May 2026 00:00:00 GMT</pubDate><dc:creator>Groundy 
Editorial</dc:creator><atom:updated>2026-05-23T00:00:00.000Z</atom:updated><category>graphflow</category><category>llm-serving</category><category>agent-orchestration</category><category>kv-cache</category><category>workflow-scheduling</category><category>inference-optimization</category><author>Groundy Editorial</author></item><item><title>Learning to Configure Agentic AI Systems Exposes a Gap in CrewAI and AutoGen Template Libraries</title><link>https://groundy.com/articles/learning-to-configure-agentic-ai-systems-exposes-a-gap-in-crewai-and-autogen/</link><guid isPermaLink="true">https://groundy.com/articles/learning-to-configure-agentic-ai-systems-exposes-a-gap-in-crewai-and-autogen/</guid><description>ARC proves learned per-query agent configuration beats static templates by 31% reasoning and 2x τ-Bench, forcing CrewAI and AutoGen to compete on declarative config surfaces.</description><pubDate>Sat, 23 May 2026 00:00:00 GMT</pubDate><dc:creator>Groundy Editorial</dc:creator><atom:updated>2026-05-23T00:00:00.000Z</atom:updated><category>agent-configuration</category><category>agentic-frameworks</category><category>arc</category><category>crewai</category><category>autogen</category><category>langgraph</category><author>Groundy Editorial</author></item><item><title>Microsoft&apos;s 2026 Cost Math Forces CrewAI and LangGraph Users to Audit Token Spend Per Agent</title><link>https://groundy.com/articles/microsofts-2026-cost-math-forces-crewai-and-langgraph-users-to-audit-token/</link><guid isPermaLink="true">https://groundy.com/articles/microsofts-2026-cost-math-forces-crewai-and-langgraph-users-to-audit-token/</guid><description>Microsoft&apos;s accounting reveals per-agent token bills now exceed engineer salaries. 
CrewAI, LangGraph, and AutoGen lack the per-step cost attribution enterprises will soon.</description><pubDate>Sat, 23 May 2026 00:00:00 GMT</pubDate><dc:creator>Groundy Editorial</dc:creator><atom:updated>2026-05-23T00:00:00.000Z</atom:updated><category>agent-frameworks</category><category>token-cost</category><category>observability</category><category>multi-agent</category><category>cost-attribution</category><category>enterprise-ai</category><author>Groundy Editorial</author></item><item><title>PBT-Bench Asks Whether AI Coding Agents Can Actually Write Property-Based Tests</title><link>https://groundy.com/articles/pbt-bench-asks-whether-ai-coding-agents-can-actually-write-property-based-tests/</link><guid isPermaLink="true">https://groundy.com/articles/pbt-bench-asks-whether-ai-coding-agents-can-actually-write-property-based-tests/</guid><description>PBT-Bench reveals the best AI coding agent catches only 83.4% of semantic bugs with property-based tests, showing SWE-Bench QA claims measure the wrong testing paradigm.</description><pubDate>Sat, 23 May 2026 00:00:00 GMT</pubDate><dc:creator>Groundy Editorial</dc:creator><atom:updated>2026-05-28T00:00:00.000Z</atom:updated><category>property-based-testing</category><category>coding-agents</category><category>swebench</category><category>ai-testing</category><category>hypothesis-framework</category><category>reward-hacking</category><category>software-quality</category><author>Groundy Editorial</author></item><item><title>SpecBench Catches Long-Horizon Coding Agents Gaming Reward Signals</title><link>https://groundy.com/articles/specbench-catches-long-horizon-coding-agents-gaming-reward-signals/</link><guid isPermaLink="true">https://groundy.com/articles/specbench-catches-long-horizon-coding-agents-gaming-reward-signals/</guid><description>SpecBench exposes a 28 pp scaling coefficient in reward hacking for long-horizon coding agents, revealing gaps that SWE-bench-style leaderboards completely 
miss.</description><pubDate>Sat, 23 May 2026 00:00:00 GMT</pubDate><dc:creator>Groundy Editorial</dc:creator><atom:updated>2026-05-23T00:00:00.000Z</atom:updated><category>reward-hacking</category><category>coding-agents</category><category>benchmarks</category><category>spec-faithfulness</category><category>swebench</category><category>autonomous-coding</category><author>Groundy Editorial</author></item><item><title>Beyond Text-to-SQL: New Agentic Architecture Routes Enterprise Analytics Through Governed APIs</title><link>https://groundy.com/articles/beyond-text-to-sql-new-agentic-architecture-routes-enterprise-analytics-through/</link><guid isPermaLink="true">https://groundy.com/articles/beyond-text-to-sql-new-agentic-architecture-routes-enterprise-analytics-through/</guid><description>A May 2026 arXiv paper argues governed API contracts should replace SQL for LLM analytics, moving security and lineage from SQL rewrites to a stable boundary layer.</description><pubDate>Sat, 23 May 2026 00:00:00 GMT</pubDate><dc:creator>Groundy Editorial</dc:creator><atom:updated>2026-05-23T00:00:00.000Z</atom:updated><category>text-to-sql</category><category>agentic-systems</category><category>data-governance</category><category>enterprise-analytics</category><category>llm-agents</category><category>api-contracts</category><category>analytics-apis</category><author>Groundy Editorial</author></item><item><title>AI Agents That Learn New Skills Without a Human Curator</title><link>https://groundy.com/articles/solar-frames-lifelong-learning-agents-as-self-optimizing-skipping-the-human/</link><guid isPermaLink="true">https://groundy.com/articles/solar-frames-lifelong-learning-agents-as-self-optimizing-skipping-the-human/</guid><description>SOLAR removes the supervisor-agent curation gate from skill acquisition, but SpecBench shows reward hacking scales with complexity, shifting the bottleneck to rollback and.</description><pubDate>Sat, 23 May 2026 00:00:00 
GMT</pubDate><dc:creator>Groundy Editorial</dc:creator><atom:updated>2026-05-23T00:00:00.000Z</atom:updated><category>solar-agent</category><category>lifelong-learning</category><category>reward-hacking</category><category>agent-frameworks</category><category>skill-curation</category><category>meta-learning</category><author>Groundy Editorial</author></item><item><title>Trojan Hippo Plants Dormant Payloads in Agent Memory, Hits 85-100% Exfiltration on Frontier Models</title><link>https://groundy.com/articles/trojan-hippo-plants-dormant-payloads-in-agent-memory-hits-85-100-exfiltration/</link><guid isPermaLink="true">https://groundy.com/articles/trojan-hippo-plants-dormant-payloads-in-agent-memory-hits-85-100-exfiltration/</guid><description>Trojan Hippo plants dormant payloads in agent memory via a single untrusted email, achieving 85-100% exfiltration ASR on frontier models after surviving 100 benign sessions.</description><pubDate>Tue, 19 May 2026 00:00:00 GMT</pubDate><dc:creator>Groundy Editorial</dc:creator><atom:updated>2026-05-19T00:00:00.000Z</atom:updated><category>agent-memory</category><category>llm-security</category><category>prompt-injection</category><category>rag</category><category>data-exfiltration</category><category>memory-attacks</category><category>agent-frameworks</category><author>Groundy Editorial</author></item><item><title>A New Trust Schema Exposes Why Agent Skill Registries Fail Enterprise Audit Requirements</title><link>https://groundy.com/articles/a-new-trust-schema-exposes-why-agent-skill-registries-fail-enterprise-audit/</link><guid isPermaLink="true">https://groundy.com/articles/a-new-trust-schema-exposes-why-agent-skill-registries-fail-enterprise-audit/</guid><description>Metere&apos;s arXiv 2605.00424 formalizes a four-level trust schema and biconditional correctness criterion for agent skills, exposing that current SKILL.md-based registries.</description><pubDate>Tue, 19 May 2026 00:00:00 GMT</pubDate><dc:creator>Groundy 
Editorial</dc:creator><atom:updated>2026-05-19T00:00:00.000Z</atom:updated><category>agent-security</category><category>skill-registries</category><category>hitl-agents</category><category>trust-verification</category><category>supply-chain-security</category><category>agent-frameworks</category><author>Groundy Editorial</author></item><item><title>LangGraph 1.2.0 Makes Error-Handler Resume Crash-Durable: With Conditions</title><link>https://groundy.com/articles/langgraph-1-2-0-makes-error-handler-resume-crash-durable-with-conditions/</link><guid isPermaLink="true">https://groundy.com/articles/langgraph-1-2-0-makes-error-handler-resume-crash-durable-with-conditions/</guid><description>LangGraph 1.2.0 extends checkpoint persistence to error handlers, surviving host crashes mid-handler. The guarantee requires Postgres, sync mode, and idempotent nodes.</description><pubDate>Mon, 18 May 2026 00:00:00 GMT</pubDate><dc:creator>Groundy Editorial</dc:creator><atom:updated>2026-05-18T00:00:00.000Z</atom:updated><category>langgraph</category><category>agent-frameworks</category><category>checkpointing</category><category>durable-execution</category><category>crewai</category><category>cloudflare-workers</category><author>Groundy Editorial</author></item><item><title>CrewAI vs AutoGen vs LangGraph 2026: The Real Trade-Off After Maintenance Mode</title><link>https://groundy.com/articles/crewai-vs-autogen-vs-langgraph-2026-the-real-trade-off-after-maintenance-mode/</link><guid isPermaLink="true">https://groundy.com/articles/crewai-vs-autogen-vs-langgraph-2026-the-real-trade-off-after-maintenance-mode/</guid><description>AutoGen is in maintenance mode, so the 2026 choice is CrewAI vs LangGraph. 
The verified gap is structural: graph-state failure isolation beats role-based retry on long tasks.</description><pubDate>Mon, 18 May 2026 00:00:00 GMT</pubDate><dc:creator>Groundy Editorial</dc:creator><atom:updated>2026-05-18T00:00:00.000Z</atom:updated><category>agents-frameworks</category><category>langgraph</category><category>crewai</category><category>autogen</category><category>multi-agent</category><category>benchmarking</category><category>failure-modes</category><author>Groundy Editorial</author></item><item><title>FormulaCode&apos;s 957-Task Benchmark Catches Frontier Agents Failing at Real-Codebase Performance Optimization</title><link>https://groundy.com/articles/formulacodes-957-task-benchmark-catches-frontier-agents-failing-at-real/</link><guid isPermaLink="true">https://groundy.com/articles/formulacodes-957-task-benchmark-catches-frontier-agents-failing-at-real/</guid><description>FormulaCode finds frontier agents trail human experts at repo-scale optimization, exposing SWE-Bench&apos;s blind spot: passing patches that never verify real-world speedups.</description><pubDate>Mon, 18 May 2026 00:00:00 GMT</pubDate><dc:creator>Groundy Editorial</dc:creator><atom:updated>2026-05-28T00:00:00.000Z</atom:updated><category>agents-frameworks</category><category>llm-benchmarks</category><category>swe-bench</category><category>performance-optimization</category><category>ai-coding-agents</category><category>formulacode</category><category>icml-2026</category><author>Groundy Editorial</author></item><item><title>Spectral Analysis of LLM Agent Graphs Predicts Three Failure Modes: r=1.0, 0.5, and -1.0 on Qwen2.5</title><link>https://groundy.com/articles/spectral-analysis-of-llm-agent-graphs-predicts-three-failure-modes/</link><guid isPermaLink="true">https://groundy.com/articles/spectral-analysis-of-llm-agent-graphs-predicts-three-failure-modes/</guid><description>A new paper applies the successor representation to multi-agent LLM graphs, finding condition 
number perfectly predicts perturbation robustness (r_s=1.0) while spectral.</description><pubDate>Mon, 18 May 2026 00:00:00 GMT</pubDate><dc:creator>Groundy Editorial</dc:creator><atom:updated>2026-05-18T00:00:00.000Z</atom:updated><category>multi-agent</category><category>agents-frameworks</category><category>spectral-analysis</category><category>llm-topology</category><category>crewai</category><category>autogen</category><category>graph-theory</category><author>Groundy Editorial</author></item><item><title>IFPV&apos;s Adversarial Cognitive Simulation Cuts Multi-Agent Operational Cost 41.7% Over Single-Step LLMs</title><link>https://groundy.com/articles/ifpvs-adversarial-cognitive-simulation-cuts-multi-agent-operational-cost/</link><guid isPermaLink="true">https://groundy.com/articles/ifpvs-adversarial-cognitive-simulation-cuts-multi-agent-operational-cost/</guid><description>IFPV pairs a multi-agent planner with a fine-tuned adversarial simulator, cutting operational cost 41.7% in ACTS and challenging agent frameworks to own plan verification.</description><pubDate>Sun, 17 May 2026 00:00:00 GMT</pubDate><dc:creator>Groundy Editorial</dc:creator><atom:updated>2026-05-17T00:00:00.000Z</atom:updated><category>multi-agent</category><category>adversarial-simulation</category><category>agent-frameworks</category><category>langgraph</category><category>plan-verification</category><category>llm-planning</category><category>autogen</category><author>Groundy Editorial</author></item><item><title>LLM Agent for Iterative Chart Refinement Exposes a Logging Gap in CrewAI and AutoGen</title><link>https://groundy.com/articles/llm-agent-for-iterative-chart-refinement-exposes-a-logging-gap-in-crewai/</link><guid isPermaLink="true">https://groundy.com/articles/llm-agent-for-iterative-chart-refinement-exposes-a-logging-gap-in-crewai/</guid><description>An arXiv paper shows iterative chart agents need per-step rationale schemas that CrewAI and AG2 
lack, while the token and storage cost of structured traces remains unmeasured.</description><pubDate>Wed, 29 Apr 2026 00:00:00 GMT</pubDate><dc:creator>Groundy Editorial</dc:creator><atom:updated>2026-04-29T00:00:00.000Z</atom:updated><category>agents-frameworks</category><category>iterative-refinement</category><category>observability</category><category>crewai</category><category>autogen</category><category>data-visualization</category><category>llm-agents</category><author>Groundy Editorial</author></item><item><title>CrewAI 1.14.2 Lands Checkpoint TUI with Tree View, Fork Support, and Lineage Tracking</title><link>https://groundy.com/articles/crewai-1-14-2-lands-checkpoint-tui-with-tree-view-fork-support-and-lineage/</link><guid isPermaLink="true">https://groundy.com/articles/crewai-1-14-2-lands-checkpoint-tui-with-tree-view-fork-support-and-lineage/</guid><description>CrewAI 1.14.2 and 1.14.3 ship a checkpoint TUI with fork support and lineage tracking, making resumability a framework primitive for expensive multi-step agent pipelines.</description><pubDate>Wed, 29 Apr 2026 00:00:00 GMT</pubDate><dc:creator>Groundy Editorial</dc:creator><atom:updated>2026-04-29T00:00:00.000Z</atom:updated><category>crewai</category><category>checkpointing</category><category>multi-agent</category><category>langgraph</category><category>agent-orchestration</category><category>dev-tools</category><author>Groundy Editorial</author></item><item><title>Council Mode Cuts Multi-Agent LLM Hallucination 35.9% at 4.2x Token Cost on HaluEval</title><link>https://groundy.com/articles/council-mode-cuts-multi-agent-llm-hallucination-35-9-at-4-2x-token-cost/</link><guid isPermaLink="true">https://groundy.com/articles/council-mode-cuts-multi-agent-llm-hallucination-35-9-at-4-2x-token-cost/</guid><description>Council Mode routes queries through three frontier LLMs and a consensus model, cutting hallucinations 35.9% on HaluEval at 4.2x token cost. 
Major frameworks lack this pattern.</description><pubDate>Wed, 29 Apr 2026 00:00:00 GMT</pubDate><dc:creator>Groundy Editorial</dc:creator><atom:updated>2026-05-28T00:00:00.000Z</atom:updated><category>multi-agent-consensus</category><category>llm-hallucination</category><category>council-mode</category><category>crewai</category><category>autogen</category><category>langgraph</category><category>token-cost</category><author>Groundy Editorial</author></item><item><title>Salesforce TDX 2026: Headless 360 Ships 60+ MCP Tools and Agentforce Vibes 2.0 With Claude Sonnet 4.5</title><link>https://groundy.com/articles/salesforce-tdx-2026-headless-360-ships-60-mcp-tools-and-agentforce-vibes/</link><guid isPermaLink="true">https://groundy.com/articles/salesforce-tdx-2026-headless-360-ships-60-mcp-tools-and-agentforce-vibes/</guid><description>Salesforce TDX 2026 shipped 60+ MCP tools and a Claude-default IDE, collapsing wrapper value for LangGraph, CrewAI, and AutoGen while shifting to cross-MCP routing.</description><pubDate>Wed, 29 Apr 2026 00:00:00 GMT</pubDate><dc:creator>Groundy Editorial</dc:creator><atom:updated>2026-05-18T00:00:00.000Z</atom:updated><category>salesforce</category><category>mcp-tools</category><category>agentforce</category><category>langgraph</category><category>crewai</category><category>autogen</category><category>agentic-orchestration</category><author>Groundy Editorial</author></item><item><title>Cloudflare Agents Week Moved Sandbox Execution, Private Networking, and Memory to Network Primitives</title><link>https://groundy.com/articles/cloudflare-agents-week-moved-sandbox-execution-private-networking-and-memory/</link><guid isPermaLink="true">https://groundy.com/articles/cloudflare-agents-week-moved-sandbox-execution-private-networking-and-memory/</guid><description>Cloudflare shipped four production primitives in April 2026, Sandboxes GA, Mesh, Dynamic Workers, and Agent Memory, replacing infrastructure CrewAI, LangGraph, and 
AutoGen.</description><pubDate>Fri, 24 Apr 2026 00:00:00 GMT</pubDate><dc:creator>Groundy Editorial</dc:creator><atom:updated>2026-05-26T00:00:00.000Z</atom:updated><category>cloudflare</category><category>agents-frameworks</category><category>ai-infrastructure</category><category>sandboxes</category><category>multi-agent</category><author>Groundy Editorial</author></item><item><title>Diversity Collapse in Multi-Agent LLM Systems: Structural Coupling, Not Topology, Breaks Open-Ended Ideation</title><link>https://groundy.com/articles/diversity-collapse-in-multi-agent-llm-systems-structural-coupling-breaks-open/</link><guid isPermaLink="true">https://groundy.com/articles/diversity-collapse-in-multi-agent-llm-systems-structural-coupling-breaks-open/</guid><description>An ACL 2026 Findings paper finds multi-agent LLM brainstorming collapses because agents share models, prompts, and context, not because topologies are too dense.</description><pubDate>Thu, 23 Apr 2026 00:00:00 GMT</pubDate><dc:creator>Groundy Editorial</dc:creator><atom:updated>2026-05-26T00:00:00.000Z</atom:updated><category>multi-agent</category><category>diversity-collapse</category><category>structural-coupling</category><category>agent-frameworks</category><category>crewai</category><category>autogen</category><category>langgraph</category><author>Groundy Editorial</author></item><item><title>Nous Research&apos;s Hermes Ships Persistent Memory and Auto-Skill Capture: CrewAI and AutoGen Must Reconsider</title><link>https://groundy.com/articles/nous-researchs-hermes-agent-ships-persistent-memory-and-auto-skill-capture-in/</link><guid isPermaLink="true">https://groundy.com/articles/nous-researchs-hermes-agent-ships-persistent-memory-and-auto-skill-capture-in/</guid><description>Hermes Agent bakes persistent memory and auto-skill capture into core, shifting comparison from orchestration to self-improvement. 
CrewAI has static skills; AutoGen is frozen.</description><pubDate>Thu, 23 Apr 2026 00:00:00 GMT</pubDate><dc:creator>Groundy Editorial</dc:creator><atom:updated>2026-05-26T00:00:00.000Z</atom:updated><category>hermes-agent</category><category>crewai</category><category>autogen</category><category>agent-memory</category><category>auto-skills</category><category>self-improving-agents</category><author>Groundy Editorial</author></item><item><title>ml-intern&apos;s 32% GPQA Gain on One H100 Exposes the Assumption That Post-Training Still Needs a Human Researcher</title><link>https://groundy.com/articles/ml-interns-32-gpqa-gain-on-a-single-h100-exposes-the-assumption-that-post/</link><guid isPermaLink="true">https://groundy.com/articles/ml-interns-32-gpqa-gain-on-a-single-h100-exposes-the-assumption-that-post/</guid><description>ml-intern hit 32% on GPQA in under 10 hours, beating Claude Code&apos;s 22.99% on the same task, but a 51% instruction-tuned ceiling marks what the autonomous loop cannot close.</description><pubDate>Wed, 22 Apr 2026 00:00:00 GMT</pubDate><dc:creator>Groundy Editorial</dc:creator><atom:updated>2026-05-28T00:00:00.000Z</atom:updated><category>post-training</category><category>autonomous-agents</category><category>benchmarks</category><category>smolagents</category><category>gpqa</category><category>grpo</category><category>reward-hacking</category><author>Groundy Editorial</author></item><item><title>AI Agents That Actually Learn: The Architecture Behind Hindsight Memory</title><link>https://groundy.com/articles/ai-agents-that-actually-learn-architecture-behind-hindsight/</link><guid isPermaLink="true">https://groundy.com/articles/ai-agents-that-actually-learn-architecture-behind-hindsight/</guid><description>Hindsight by vectorize-io is an open-source agent memory system that replaces stateless retrieval with structured, time-aware memory networks, achieving 91.4% on LongMemEval and showing what genuine agent learning looks like at the 
architecture level.</description><pubDate>Sun, 15 Mar 2026 00:00:00 GMT</pubDate><dc:creator>Groundy Editorial</dc:creator><atom:updated>2026-05-18T00:00:00.000Z</atom:updated><category>ai-engineering</category><category>agents</category><category>memory</category><author>Groundy Editorial</author></item><item><title>InsForge: The Backend Framework Built for Agentic Applications</title><link>https://groundy.com/articles/insforge-backend-framework-built-specifically-agentic/</link><guid isPermaLink="true">https://groundy.com/articles/insforge-backend-framework-built-specifically-agentic/</guid><description>InsForge is a backend-as-a-service platform purpose-built for AI coding agents, delivering 1.6x faster task completion and 2.4x fewer tokens than Supabase.</description><pubDate>Fri, 27 Mar 2026 00:00:00 GMT</pubDate><dc:creator>Groundy Editorial</dc:creator><atom:updated>2026-05-28T00:00:00.000Z</atom:updated><category>ai-engineering</category><category>backend</category><category>frameworks</category><author>Groundy Editorial</author></item><item><title>Superpowers: The Agentic Framework Replacing Your Dev Process</title><link>https://groundy.com/articles/superpowers-agentic-framework-replacing-your-dev/</link><guid isPermaLink="true">https://groundy.com/articles/superpowers-agentic-framework-replacing-your-dev/</guid><description>Superpowers is an open-source agentic skills framework by Jesse Vincent that enforces structured software development workflows (brainstorming, planning, TDD, and subagent coordination) on top of AI coding agents like Claude Code, turning them from reactive assistants into disciplined developers capable of autonomous multi-hour sessions.</description><pubDate>Sat, 28 Feb 2026 00:00:00 GMT</pubDate><dc:creator>Groundy Editorial</dc:creator><atom:updated>2026-05-17T00:00:00.000Z</atom:updated><category>ai-agents</category><category>frameworks</category><author>Groundy Editorial</author></item><item><title>How AI Agents Remember: Memory 
Architectures That Work</title><link>https://groundy.com/articles/how-ai-agents-remember-memory-architectures-that/</link><guid isPermaLink="true">https://groundy.com/articles/how-ai-agents-remember-memory-architectures-that/</guid><description>AI agents use four memory tiers across context windows, vector DBs, knowledge graphs, and model weights. Architecture choice determines session coherence or full reset.</description><pubDate>Fri, 27 Feb 2026 00:00:00 GMT</pubDate><dc:creator>Groundy Editorial</dc:creator><atom:updated>2026-05-28T00:00:00.000Z</atom:updated><category>ai-agents</category><category>memory</category><category>context-windows</category><category>llm-architecture</category><category>agents-frameworks</category><author>Groundy Editorial</author></item><item><title>CrewAI vs AutoGen: A Developer&apos;s Guide to Multi-Agent AI Frameworks</title><link>https://groundy.com/articles/crewai-vs-autogen-developers-guide/</link><guid isPermaLink="true">https://groundy.com/articles/crewai-vs-autogen-developers-guide/</guid><description>Comparing CrewAI and Microsoft&apos;s AutoGen for multi-agent AI: architecture, code examples, ergonomics, and which framework fits production deployments in 2026.</description><pubDate>Thu, 12 Feb 2026 00:00:00 GMT</pubDate><dc:creator>Groundy Editorial</dc:creator><atom:updated>2026-05-28T00:00:00.000Z</atom:updated><category>ai</category><category>multi-agent</category><category>crewai</category><category>autogen</category><category>python</category><category>agent-orchestration</category><author>Groundy Editorial</author></item><item><title>Function Calling Best Practices: LLMs That Actually Use APIs Correctly</title><link>https://groundy.com/articles/function-calling-best-practices-llms-that-actually-use-apis/</link><guid isPermaLink="true">https://groundy.com/articles/function-calling-best-practices-llms-that-actually-use-apis/</guid><description>How to make LLM function calling reliable in production: schema design, 
structured outputs, error handling, and validation patterns that prevent hallucinated parameters.</description><pubDate>Wed, 18 Feb 2026 00:00:00 GMT</pubDate><dc:creator>Groundy Editorial</dc:creator><atom:updated>2026-05-28T00:00:00.000Z</atom:updated><category>ai-engineering</category><category>apis</category><category>best-practices</category><category>tools</category><category>claude</category><author>Groundy Editorial</author></item><item><title>How to Build Your First Autonomous Coding Agent with OpenHands SDK</title><link>https://groundy.com/articles/openhands-autonomous-coding/</link><guid isPermaLink="true">https://groundy.com/articles/openhands-autonomous-coding/</guid><description>A comprehensive guide to building production-ready autonomous coding agents using the OpenHands Software Agent SDK, covering architecture, deployment options, and practical implementation.</description><pubDate>Wed, 11 Feb 2026 00:00:00 GMT</pubDate><dc:creator>Groundy Editorial</dc:creator><atom:updated>2026-05-21T00:00:00.000Z</atom:updated><category>OpenHands</category><category>AI</category><category>coding-agents</category><category>SDK</category><category>automation</category><category>machine-learning</category><category>software-engineering</category><author>Groundy Editorial</author></item><item><title>Pydantic AI vs LangChain: A Developer&apos;s Guide to the New Generation of Agent Frameworks</title><link>https://groundy.com/articles/pydantic-ai-vs-langchain/</link><guid isPermaLink="true">https://groundy.com/articles/pydantic-ai-vs-langchain/</guid><description>A practical comparison of Pydantic AI and LangChain on type safety, developer experience, and production readiness for Python AI agent frameworks.</description><pubDate>Wed, 11 Feb 2026 00:00:00 GMT</pubDate><dc:creator>Groundy 
Editorial</dc:creator><atom:updated>2026-05-24T00:00:00.000Z</atom:updated><category>pydantic-ai</category><category>langchain</category><category>ai-agents</category><category>python</category><category>agent-frameworks</category><category>type-safety</category><category>developer-experience</category><author>Groundy Editorial</author></item><item><title>Are AI-Generated PRs Killing Open Source?</title><link>https://groundy.com/articles/are-ai-generated-prs-killing-open-source/</link><guid isPermaLink="true">https://groundy.com/articles/are-ai-generated-prs-killing-open-source/</guid><description>How open source projects can use AI contributions without drowning in low-quality noise, through the lens of Mitchell Hashimoto&apos;s Vouch system and the maintainer crisis.</description><pubDate>Thu, 12 Feb 2026 00:00:00 GMT</pubDate><dc:creator>Groundy Editorial</dc:creator><atom:updated>2026-05-27T00:00:00.000Z</atom:updated><category>ai</category><category>open-source</category><category>github</category><category>pull-requests</category><category>mitchell-hashimoto</category><category>vouch</category><author>Groundy Editorial</author></item></channel></rss>