Topic
#swe-bench
Two articles exploring SWE-bench. Expert insights and analysis from our editorial team.
Showing 1–2 of 2 articles
Articles
Agents & Frameworks
'Beyond the Diff' Quantifies Agentic Entropy: Why AI Coding Agents Drift Across Iterations
A CHI 2026 paper formalizes agentic entropy as structural drift between agent actions and intent, showing why per-step benchmarks miss cumulative misalignment in long agent.
Developer Tools
SWE-bench Verified Explained: What the Coding Agent Leaderboard Actually Measures (and What It Misses)
SWE-bench Verified tests AI agents on 500 real GitHub bug fixes. Learn what 'resolved 49%' means, how scoring works, and the benchmark's critical blind spots.