Topic

#process-reward-models

Articles

Models & Research

The Last Word Often Wins: A Format Confound Inflates Chain-of-Thought Corruption Robustness Scores

A format confound in chain-of-thought (CoT) corruption benchmarks means published faithfulness scores are inflated: suffix sensitivity collapsed 19× when final-answer text was stripped.