Agents & Frameworks

LLM agents, agent frameworks, tool use, MCP, and agent architecture.

43 articles exploring Agents & Frameworks. Expert analysis and insights from our editorial team.

Showing 16–30 of 43 articles · Page 2 of 3

Latest in Agents & Frameworks

16

ml-intern's 32% GPQA Gain on a Single H100 Exposes the Assumption That Post-Training Still Needs a Human ML Researcher

ml-intern hit 32% on GPQA in under 10 hours, beating Claude Code's 22.99% on the same task, but the 51% instruction-tuned ceiling marks a gap the autonomous loop cannot close.

17

Neural Computers' Symbolic Stability Failure Contradicts the Case for Pure-Neural Agent Orchestration

Meta AI and KAUST's Neural Computers paper names routine reuse, controlled updates, and symbolic stability as open problems — exactly what LangGraph and AutoGen already solve.

18

InsForge: The Backend Framework Built for Agentic Applications

InsForge is a backend-as-a-service platform purpose-built for AI coding agents, delivering 1.6x faster task completion and 2.4x fewer tokens than Supabase.

8 min read

19

AI Agents That Actually Learn: The Architecture Behind Hindsight Memory

Hindsight by vectorize-io is an open-source agent memory system that replaces stateless retrieval with structured, time-aware memory networks—achieving 91.4% on LongMemEval and showing what genuine agent learning looks like at the architecture level.

8 min read

20

SWE-Bench's Dirty Secret: AI-Passing PRs That Real Engineers Would Reject

New research from METR shows roughly half of SWE-bench-passing AI-generated PRs would be rejected by actual project maintainers—exposing a 24-percentage-point gap between benchmark scores and real-world code acceptability.

9 min read

21

Hugging Face Skills: Pretrained Agent Capabilities

Hugging Face Skills are standardized, self-contained instruction packages that give coding agents—Claude Code, Codex, Gemini CLI, and Cursor—procedural expertise for AI/ML tasks. Launched in November 2025, the Apache 2.0-licensed library reached 7,500 GitHub stars by early 2026 and provides nine composable capabilities from model training to paper publishing.

8 min read

22

Superpowers: The Agentic Framework Replacing Your Dev Process

Superpowers is an open-source agentic skills framework by Jesse Vincent that enforces structured software development workflows—brainstorming, planning, TDD, and subagent coordination—on top of AI coding agents like Claude Code, turning them from reactive assistants into disciplined developers capable of autonomous multi-hour sessions.

8 min read

23

How AI Agents Remember: Memory Architectures That Work

AI agents use four distinct memory tiers—working, episodic, semantic, and procedural—stored across context windows, vector databases, knowledge graphs, and model weights. Choosing the right architecture determines whether your agent stays coherent across sessions or forgets everything the moment a conversation ends.

9 min read

24

Vibe Coding One Year Later: What Actually Survived

One year after Andrej Karpathy coined 'vibe coding,' the evidence is clear: rapid prototyping and non-developer productivity are genuine wins, but production security and organizational-level gains remain elusive. Here's what the data shows.

9 min read

25

Browser-Use Agents: AI That Browses Like a Human

A guide to browser-use AI agents covering OpenAI Operator, Claude Computer Use, the Browser-Use framework, and Google Project Mariner, with benchmarks and capability comparisons.

8 min read

26

GGML Joins Hugging Face: What It Means for Local AI

Hugging Face acquired ggml-org, the team behind llama.cpp, on February 20, 2026. This strategic move ensures the long-term sustainability of the world's most popular local AI inference framework while accelerating its integration with the broader ML ecosystem.

8 min read

27

AI Testing Automation: Agents That Write and Run Tests

AI agents can now generate, execute, and maintain test suites with minimal human intervention. While unit tests and regression suites achieve 60-80% automation rates, exploratory testing and complex business logic validation still require human oversight.

8 min read

28

Function Calling Best Practices: LLMs That Actually Use APIs Correctly

Function calling enables LLMs to interact with external systems through structured API calls, but reliability requires careful schema design, error handling patterns, and validation strategies to prevent hallucinated parameters and malformed requests.

8 min read

29

AI-Orchestrated Systems: The Rise of Multi-Agent Development Frameworks

Multi-agent systems like AutoGen, CrewAI, and ChatDev are emerging as platforms for managing end-to-end software development through coordinated agent workflows. They reveal both significant capabilities and critical limitations in AI-managed software engineering.

12 min read

30

AI Code Review Agents: Catching Bugs Before Humans Do

As of 2026, AI code review agents can reduce review time by 50% and catch security vulnerabilities that human reviewers miss, but they augment rather than replace human expertise.

7 min read
