Topic

#grpo

Two articles exploring GRPO. Expert insights and analysis from our editorial team.

Articles

Models & Research

Fixed Entropy Coefficients Break Down on Mixed-Difficulty Tasks: What AER Means for Teams Running LLM RL at Scale

Static entropy regularization in GRPO underperforms on mixed-difficulty tasks. Difficulty-aware allocation closes the gap by 7–10 points on pass@1 without extra compute.

Agents & Frameworks

ml-intern's 32% GPQA Gain on a Single H100 Exposes the Assumption That Post-Training Still Needs a Human ML Researcher

ml-intern hit 32% on GPQA in under 10 hours, beating Claude Code's 22.99% on the same task, but the 51% scored by the instruction-tuned model marks a ceiling the autonomous loop cannot yet close.