Models & Research
23 articles exploring Models & Research. Expert analysis and insights from our editorial team.
Foundation models are moving faster than anyone’s ability to track them. This cluster covers new model releases, architectural research, benchmark methodology, and the increasingly murky transparency picture across frontier labs. Groundy focuses on what the numbers actually mean—not the headline score, but the benchmark design, the eval contamination risk, the gap between synthetic performance and production behavior.
The open-weight ecosystem has matured into a genuine competitive front. DeepSeek's V3 and R1 models demonstrated that training efficiency—Mixture of Experts routing, multi-head latent attention, reinforcement-learning reward shaping—can close the gap with proprietary models at a fraction of the compute cost. Alibaba's Qwen series has done similar work on the multilingual and structured-data side, largely underreported in the English-language press. Meta's Llama line and Mistral's Devstral and Magistral models round out an open-weight landscape that now produces genuinely usable code and reasoning models.
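The efficiency claim around Mixture of Experts comes down to sparse activation: each token is routed to only a few experts, so compute grows with the number of active experts rather than the total parameter count. A minimal top-k routing sketch (illustrative only; real MoE layers sit inside transformer blocks and add load-balancing losses, and the shapes here are arbitrary):

```python
import numpy as np

def moe_route(token_vec, expert_weights, gate_weights, k=2):
    """Top-k MoE routing: score all experts, run only the k winners,
    and combine their outputs weighted by a softmax over the winners."""
    logits = gate_weights @ token_vec          # one gating score per expert
    top_k = np.argsort(logits)[-k:]            # indices of the k best experts
    probs = np.exp(logits[top_k] - logits[top_k].max())
    probs /= probs.sum()                       # softmax over selected experts only
    # Weighted sum of just the k selected experts' outputs;
    # the other experts are never evaluated for this token.
    return sum(p * (expert_weights[e] @ token_vec)
               for p, e in zip(probs, top_k))

rng = np.random.default_rng(0)
d, n_experts = 16, 8
out = moe_route(rng.normal(size=d),
                rng.normal(size=(n_experts, d, d)),   # one weight matrix per expert
                rng.normal(size=(n_experts, d)))      # gating matrix
print(out.shape)  # (16,)
```

With k=2 of 8 experts active, only a quarter of the expert parameters do work per token, which is the basic mechanism behind the cost figures DeepSeek reports.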
Context windows deserve skepticism. The race to advertised million-token limits often outpaces actual usable capacity; Groundy has tested what models can reliably retrieve from middle-of-context versus end-of-context positions, and the degradation curves are steep. The same applies to reasoning-model claims: Qwen3.6-Max-Preview, Gemini 3.1 Pro, and comparable models each require independent evaluation against your actual task distribution rather than aggregate leaderboard numbers.
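The middle-versus-end retrieval test described above is straightforward to run yourself. A minimal needle-in-a-haystack probe, assuming a hypothetical `query_model` callable standing in for whatever LLM API you use:

```python
def build_probe(filler_sentences, needle, depth):
    """Insert `needle` at a fractional depth of the filler context
    (0.0 = start of context, 1.0 = end of context)."""
    i = int(depth * len(filler_sentences))
    return " ".join(filler_sentences[:i] + [needle] + filler_sentences[i:])

def retrieval_curve(query_model, filler, needle, question, answer,
                    depths=(0.0, 0.25, 0.5, 0.75, 1.0)):
    """Map insertion depth -> did the model retrieve the answer?
    A dip at mid-range depths is the 'lost in the middle' degradation."""
    return {d: answer.lower() in query_model(
                build_probe(filler, needle, d) + "\n\n" + question).lower()
            for d in depths}
```

Run it with filler scaled to the advertised window and your own task-shaped needles; aggregate hit rates per depth give you the degradation curve directly.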
Benchmark integrity is its own coverage thread. SWE-bench Verified has become the de facto coding-agent evaluation, but its methodology has blind spots—test isolation assumptions, the gap between isolated bug fixes and real codebase navigation—that matter when you’re making model selection decisions. Groundy reads the methodology papers, not just the press releases.
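For context on what a headline number like "resolved 49%" actually encodes: SWE-bench scores each instance by whether the previously failing tests now pass and the previously passing tests still do, then reports the fraction of instances resolved. A simplified sketch of that scoring logic (field names follow the benchmark's FAIL_TO_PASS / PASS_TO_PASS convention; the real harness runs the test suites in isolated containers):

```python
def instance_resolved(fail_to_pass_results, pass_to_pass_results):
    """An instance counts as resolved only if every originally failing
    test now passes AND no originally passing test has regressed."""
    return all(fail_to_pass_results) and all(pass_to_pass_results)

def resolved_rate(instances):
    """'Resolved 49%' = resolved instances / total instances evaluated."""
    hits = sum(instance_resolved(f2p, p2p) for f2p, p2p in instances)
    return hits / len(instances)
```

Note what this scoring cannot see: whether the patch matches the repository's conventions, whether it would survive review, or whether the agent could have found the bug without the benchmark pointing at it.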
Transparency is collapsing at exactly the moment it matters most. The Stanford HAI 2026 AI Index found frontier model transparency scores dropped 31% in a single year—labs are disclosing less about training data sources, evaluation methodology, and safety testing even as they ask for regulatory deference. This cluster tracks both the technical capabilities and the institutional opacity that makes independent evaluation more important than it has ever been.
Featured in this cluster
DeepSeek V3/R1: How Chinese Engineers Matched GPT-4 for $6 Million
DeepSeek's V3 and R1 models match GPT-4-class performance using a fraction of the compute through architectural innovations in Mixture of Experts, attention compression, and reinforcement learning—demonstrating that training efficiency may matter more than raw hardware scale.
Cornerstone · The Million-Token Context Window: What Can You Actually Do?
Million-token context windows let you feed entire codebases, legal contracts, and hours of video to an LLM in one pass—but advertised limits routinely overstate practical capability. Here's what the benchmarks, failure modes, and real deployment patterns actually show.
Cornerstone · Qwen 2.5 vs Llama 3.3: The Open-Weight Showdown Nobody Is Talking About
Alibaba's Qwen 2.5 beats Meta's Llama 3.3 on math, multilingual tasks, and structured data — yet gets a fraction of the Western press coverage.
Cornerstone · Constitutional AI: Teaching Models to Self-Correct Before They Act
Anthropic's Constitutional AI trains language models to critique and revise their own outputs using principles rather than human labels, but questions remain about whether this represents genuine safety gains or sophisticated filtering mechanisms.
Cornerstone · SWE-bench Verified Explained: What the Coding Agent Leaderboard Actually Measures (and What It Misses)
SWE-bench Verified tests AI agents on 500 real GitHub bug fixes. Learn what 'resolved 49%' means, how scoring works, and the benchmark's critical blind spots.
Latest in Models & Research
Self-Correction Comes to Diffusion Models: What SOAR Means for Iterative Image Generation Pipelines
Tencent's SOAR replaces SFT post-training in diffusion models, yielding an 11% GenEval lift on SD3.5-M — no reward model, no preference labels required.
NVIDIA Ising: Open-Source AI Models That Let Quantum Processors Self-Calibrate
NVIDIA's Ising family adds a 2.5× faster pre-decoder and a 35B calibration VLM to quantum pipelines — here's what the benchmarks actually mean and who should use which.
Qwen3.6-Max-Preview: What Alibaba's Latest Model Means for Open-Weight Competitors
Alibaba's Qwen3.6-Max-Preview launches API-only and free today, while the genuinely open-weight 35B-A3B sibling offers a separate self-hosting path.
Swarm AI for Prediction Markets: Collective Intelligence Gets an Algorithm
MiroFish uses swarm intelligence to simulate thousands of AI agents forecasting outcomes. What it actually does—and what the benchmarks don't yet show.
Qwen 2.5 vs Llama 3.3: The Open-Weight Showdown Nobody Is Talking About
Alibaba's Qwen 2.5 beats Meta's Llama 3.3 on math, multilingual tasks, and structured data — yet gets a fraction of the Western press coverage.
Running DeepSeek R1 Locally: Hardware Requirements, Quantization, and Real Throughput
What hardware actually runs DeepSeek R1 at useful speeds? Specific token/s benchmarks across GPU configs, quantization options, and the honest tradeoffs.
Chinese AI Models Compared: DeepSeek, Qwen, Kimi, Doubao, and Ernie
DeepSeek isn't China's only frontier AI. Compare DeepSeek, Qwen, Kimi, Doubao, and Ernie on benchmarks, licensing, API access, and use-case fit.
Executing Programs Inside Transformers: The Inference Breakthrough Nobody Expected
A new architecture from Percepta embeds a program interpreter directly into transformer weights, achieving logarithmic-time execution lookups that could reshape how AI agents handle deterministic computation—if the early claims survive scrutiny.
Fish-Speech: The Open-Source TTS Model That's Threatening ElevenLabs
Fish Audio's S2 model reached SOTA benchmarks in March 2026 with sub-100ms latency, 80+ languages, and open-sourced weights—directly challenging ElevenLabs' commercial dominance while exposing the real costs of 'free' voice AI.
Claude's Web Search Changes Everything for AI Research
Anthropic's web search integration removes the static knowledge ceiling from Claude, enabling real-time retrieval directly inside the reasoning loop—with verifiable citations, domain filtering, and a new dynamic filtering layer that cuts token use by 24% while improving accuracy by 11%.
DeepSeek V3/R1: How Chinese Engineers Matched GPT-4 for $6 Million
DeepSeek's V3 and R1 models match GPT-4-class performance using a fraction of the compute through architectural innovations in Mixture of Experts, attention compression, and reinforcement learning—demonstrating that training efficiency may matter more than raw hardware scale.
Gemini 2.0 Pro's 2 Million Token Context: What Can You Actually Do With It?
Google's Gemini 2.0 Pro Experimental ships with a 2 million token context window—the largest among production-accessible models. Here's what practitioners have discovered works, what doesn't, and what the hard limits are.
Google's TimesFM: A Foundation Model for Time Series
TimesFM is Google's pretrained, decoder-only transformer model for zero-shot time-series forecasting, trained on ~100 billion real-world time-points to deliver accurate predictions across domains without retraining.
The Million-Token Context Window: What Can You Actually Do?
Million-token context windows let you feed entire codebases, legal contracts, and hours of video to an LLM in one pass—but advertised limits routinely overstate practical capability. Here's what the benchmarks, failure modes, and real deployment patterns actually show.
Synthetic Data Is Eating AI Training
The internet's supply of high-quality human-generated text is approaching exhaustion. Synthetic data—AI-generated training corpora—is filling the gap, but introduces new failure modes practitioners must understand, including model collapse and quality drift.