Models & Research

DuQuant++ Brings Fine-Grained Rotation to FP4: What Microscaling Quantization Means for Running Larger Models on the Same GPU

DuQuant++ adapts outlier-aware rotation to MXFP4, halving online rotation cost on LLaMA 3 and shifting the FP4 deployment bottleneck from memory to calibration engineering.

Models & Research

Fixed Entropy Coefficients Break Down on Mixed-Difficulty Tasks: What AER Means for Teams Running LLM RL at Scale

Static entropy regularization in GRPO underperforms on mixed-difficulty tasks. Difficulty-aware allocation closes the gap by 7-10 points on pass@1 without extra compute.

Agents & Frameworks

Google's TPU 8i Targets Agentic Workloads. What CrewAI, LangGraph, and AutoGen Must Measure

Google's TPU 8i adds SRAM and a collectives engine for agentic workloads, yet CrewAI, LangGraph, and AutoGen lack the per-step latency and branch-utilization metrics needed to exploit it.

Open Source

Hugging Face's Spring 2026 State of Open Source Report: China Hits 41% of Downloads, Industry Share Collapses From 70% to 37%

Chinese models hit 41% of Hugging Face downloads, overtaking the US, while independents hit 39%. The top 200 models capture half of all downloads, forcing Western procurement teams to adapt.

Infrastructure & Runtime

Ingress-Nginx Is Dead, Not Deprecated: The Final CVE Patches Shipped, But Platform Teams Still Need a Migration Plan

ingress-nginx was retired March 24, 2026. CVE-2026-4342 patches shipped March 19, but no future fixes are coming. How platform teams should pick a migration path.

Models & Research

JumpLoRA's Sparse Adapters Break the Assumption That Continual Fine-Tuning Requires Full-Rank LoRA Stacks

JumpLoRA adds learnable JumpReLU gates to LoRA blocks for 87-95% sparse adapters with near-zero cross-task overlap. The work exposes that PEFT has no router for continual learning.

Industry & Business

KV Packet's Recomputation-Free Cache Exposes a Gap in How Cloud AI Vendors Price Multi-Document RAG Inference

KV Packet proves near-zero-FLOPs context-independent KV reuse is achievable, exposing how prefix-only vendor caching tiers structurally exclude multi-document RAG.

Developer Tools

LACE Forces vLLM and SGLang to Rethink How Parallel Reasoning Threads Run

LACE lets parallel reasoning threads share state mid-inference, yielding 3-7 point accuracy gains but forcing vLLM and SGLang to abandon independent-sequence batching.

Developer Tools

LiteRT-LM v0.10.1 Ships Gemma 4 MTP Heads That llama.cpp Can't Access

LiteRT-LM v0.10.1 ships Gemma 4 with Qualcomm NPU acceleration, but Google stripped MTP heads from public weights, locking peak Gemma 4 throughput to its own runtime.

Security

March-April MCP CVEs Expose the Local-Host Trust Model in AI Agent Frameworks

Three CVEs scoring up to 9.8 reveal a structural flaw: MCP's local-host trust model lacks authentication primitives for networked multi-tenant deployments.

Security

Marimo's CVE-2026-39987 Pre-Auth RCE Puts AI Notebooks on the Same CVE Treadmill as Inference Servers

CVE-2026-39987 skipped auth on Marimo's /terminal/ws, handing any caller a root PTY shell (CVSS 9.3) — exploited in the wild just 9h 41m after the advisory.

Security

Marimo's CVE-2026-39987: 9h 41m From Disclosure to Exploitation, NKAbuse Staged on Hugging Face

Marimo CVE-2026-39987 was exploited 9h 41m after disclosure, with 662 events and an NKAbuse backdoor staged on Hugging Face. Same-day patching is the new minimum for AI tooling.

Models & Research

MM-JudgeBias Exposes Compositional Bias in MLLM-as-a-Judge: What It Means for Teams Running Model-Based Eval Pipelines

MM-JudgeBias shows MLLM judges inherit the compositional biases they evaluate, so teams must audit judge selection rather than assume model-based eval removes labeling work.

Open Source

Neural Computers From MetaAuto: Video Models Can Replace Shell Interpreters, But Not Handle Stateful Tasks

Neural Computers replace the interpreter with learned pixel I/O, but the paper shows these agents fail at symbolic state and multi-step arithmetic.

Agents & Frameworks

Nous Research's Hermes Ships Persistent Memory and Auto-Skill Capture: CrewAI and AutoGen Must Reconsider

Hermes Agent bakes persistent memory and auto-skill capture into its core, shifting the comparison from orchestration to self-improvement. CrewAI has static skills; AutoGen is frozen.