Topic

#swe-bench

Two articles exploring SWE-bench. Expert insights and analysis from our editorial team.

Showing 1–2 of 2 articles

Articles

Agents & Frameworks

'Beyond the Diff' Quantifies Agentic Entropy: Why AI Coding Agents Drift Across Iterations

A CHI 2026 paper formalizes agentic entropy as structural drift between agent actions and intent, showing why per-step benchmarks miss cumulative misalignment in long agent.

Developer Tools

SWE-bench Verified Explained: What the Coding Agent Leaderboard Actually Measures (and What It Misses)

SWE-bench Verified tests AI agents on 500 real GitHub bug fixes. Learn what 'resolved 49%' means, how scoring works, and the benchmark's critical blind spots.

· 8 min read