Agents & Frameworks
LLM agents, agent frameworks, tool use, MCP, and agent architecture.
43 articles exploring Agents & Frameworks. Expert analysis and insights from our editorial team.
Latest in Agents & Frameworks
ml-intern's 32% GPQA Gain on a Single H100 Exposes the Assumption That Post-Training Still Needs a Human ML Researcher
ml-intern hit 32% on GPQA in under 10 hours, beating Claude Code's 22.99% on the same task, but a 51% instruction-tuned ceiling marks the gap the autonomous loop cannot close.
Neural Computers' Symbolic Stability Failure Contradicts the Case for Pure-Neural Agent Orchestration
Meta AI and KAUST's Neural Computers paper names routine reuse, controlled updates, and symbolic stability as open problems, exactly the ones that LangGraph and AutoGen already solve.
InsForge: The Backend Framework Built for Agentic Applications
InsForge is a backend-as-a-service platform purpose-built for AI coding agents, delivering 1.6x faster task completion and 2.4x fewer tokens than Supabase.
AI Agents That Actually Learn: The Architecture Behind Hindsight Memory
Hindsight by vectorize-io is an open-source agent memory system that replaces stateless retrieval with structured, time-aware memory networks—achieving 91.4% on LongMemEval and showing what genuine agent learning looks like at the architecture level.
SWE-Bench's Dirty Secret: AI-Passing PRs That Real Engineers Would Reject
New research from METR shows roughly half of SWE-bench-passing AI-generated PRs would be rejected by actual project maintainers—exposing a 24-percentage-point gap between benchmark scores and real-world code acceptability.
Hugging Face Skills: Pretrained Agent Capabilities
Hugging Face Skills are standardized, self-contained instruction packages that give coding agents—Claude Code, Codex, Gemini CLI, and Cursor—procedural expertise for AI/ML tasks. Launched in November 2025, the Apache 2.0-licensed library reached 7,500 GitHub stars by early 2026 and provides nine composable capabilities from model training to paper publishing.
Superpowers: The Agentic Framework Replacing Your Dev Process
Superpowers is an open-source agentic skills framework by Jesse Vincent that enforces structured software development workflows—brainstorming, planning, TDD, and subagent coordination—on top of AI coding agents like Claude Code, turning them from reactive assistants into disciplined developers capable of autonomous multi-hour sessions.
How AI Agents Remember: Memory Architectures That Work
AI agents use four distinct memory tiers—working, episodic, semantic, and procedural—stored across context windows, vector databases, knowledge graphs, and model weights. Choosing the right architecture determines whether your agent stays coherent across sessions or forgets everything the moment a conversation ends.
Vibe Coding One Year Later: What Actually Survived
One year after Andrej Karpathy coined 'vibe coding,' the evidence is clear: rapid prototyping and non-developer productivity are genuine wins, but production security and organizational-level gains remain elusive. Here's what the data shows.
Browser-Use Agents: AI That Browses Like a Human
A comprehensive guide to browser-use AI agents, covering OpenAI Operator, Claude Computer Use, the Browser-Use framework, and Google Project Mariner, with benchmarks and capabilities.
GGML Joins Hugging Face: What It Means for Local AI
Hugging Face acquired ggml-org, the team behind llama.cpp, on February 20, 2026. This strategic move ensures the long-term sustainability of the world's most popular local AI inference framework while accelerating its integration with the broader ML ecosystem.
AI Testing Automation: Agents That Write and Run Tests
AI agents can now generate, execute, and maintain test suites with minimal human intervention. While unit tests and regression suites achieve 60-80% automation rates, exploratory testing and complex business logic validation still require human oversight.
Function Calling Best Practices: LLMs That Actually Use APIs Correctly
Function calling enables LLMs to interact with external systems through structured API calls, but reliability requires careful schema design, error handling patterns, and validation strategies to prevent hallucinated parameters and malformed requests.
AI-Orchestrated Systems: The Rise of Multi-Agent Development Frameworks
AI-orchestrated development systems like AutoGen, CrewAI, and ChatDev are emerging as comprehensive platforms for managing end-to-end software development through coordinated multi-agent workflows, revealing both significant capabilities and critical limitations in AI-managed software engineering.
AI Code Review Agents: Catching Bugs Before Humans Do
AI code review agents can reduce review time by 50% and catch security vulnerabilities human reviewers miss, but they augment rather than replace human expertise in 2026.