Topic
#compositional-reasoning
Two articles exploring compositional reasoning. Expert insights and analysis from our editorial team.
Articles
Models & Research
STaD Exposes What HumanEval Hides: Compositional Skill Gaps in LLMs That Aggregate Benchmarks Miss
IBM Research's STaD shows that models with identical benchmark scores can fail on different subskills, which makes leaderboard rank a poor proxy for compositional code-generation ability.
Models & Research
STaD's Scaffolded Tasks Isolate the Compositional Skill Gaps That Aggregate LLM Benchmarks Hide
IBM Research's STaD framework exposes compositional skill gaps that aggregate benchmarks miss: two models scoring 32% on ToT Arithmetic needed fundamentally different fixes.