Groundy — independent coverage of developer tools, infrastructure, and platforms
Flow Matching vs U-Net: A Skip-Free Backbone for Speech Models
A June 2026 preprint argues U-Net skip connections leak noise into flow-matching speech decoders, and a codec-supervised skip-free backbone can replace them at parity.
security Measuring LLM Safety by Refusal Alignment Instead of Attack Success Rate
A June 2026 preprint proposes RAS, a white-box metric that scores LLM safety by hidden-state refusal alignment rather than blocked output, challenging ASR-only leaderboards.
Poisoning Physics-Informed Neural Networks Slips Past Loss-Based Validation
A June 2026 preprint shows poisoned physics-informed neural networks hit clean training loss while their solutions diverge up to 128%, defeating loss-based validation.
policy 50 Years of Aviation Certification Expose a Structural Gap in AI Governance
A June 2026 arXiv paper argues AI governance lacks the epoch limits and proof surfaces that 50 years of aviation certification built in, breaking one-time approval stamps.
security Catching LLM Jailbreaks by Watching Per-Layer Entropy, Not Outputs
A June 2026 paper reports jailbreaks perturb per-layer entropy of frozen LLMs before any harmful token emits, but adaptive attackers will likely follow one layer deeper.
oss Cost and Access, Not Ideology, Drive Open-Weight Chinese Model Adoption
The shift toward open-weight Chinese models runs on cost and access, not openness. Operators inherit the work of vetting provenance, licenses, and benchmark reproducibility.
models A Per-Neuron Sequence Model Was Withdrawn From arXiv as Coverage Hailed It
TND proposed per-neuron dynamics as a sequence-modeling primitive, then was withdrawn from arXiv for accuracy errors the day before coverage called it a Transformer.
policy Do Reasoning Tokens Actually Make LLMs Safer? A New Paper Tests It
A June 2026 preprint finds refusal decisions are locked at a model's first token, undercutting the safety case for premium reasoning modes billed per thinking token.
- models GLM-5.2 vs Kimi K2.7 Code: Two Open-Weight Bets on Agentic Coding
- devtools Cursor Goes to SpaceX, Windsurf to Cognition: What Changes for Dev Teams
- policy US Export Order Forces Anthropic to Disable Fable 5 and Mythos 5 Worldwide
- infra MiniMax M3 Ships 1M Context and Desktop Control as Open Weights
- agents When AI Agents Delegate Work, Your Observability Stack Goes Blind
- devtools GitHub Copilot vs Cursor vs Claude Code: The 2026 AI Coding Showdown
- industry Cursor's Meteoric Rise: Inside the AI Editor Hitting $300M ARR
- models GLM-5.2 Benchmarks: What 62.1% SWE-bench Pro and 99.2% AIME Actually Mean
- infra Running GLM-5.2 at Home: SGLang, vLLM, Transformers, and KTransformers Setup Guide
- models AI Code Generation Benchmarks 2026: Which Model Actually Writes Better Code?
- models Chinese AI Models Compared: DeepSeek, Qwen, Kimi, Doubao, and Ernie
- devtools Running GLM-5.2 in Cursor, Cline, and Roo Code: Migration Checklist and Gotchas
- industry Fable 5 Credit Cliff: What the June 23 Billing Shift Means for Teams
- devtools Claude Code Plugins: Anthropic's Official Plugin Ecosystem Explained
- jun 24 models Flow Matching vs U-Net: A Skip-Free Backbone for Speech Models
- jun 24 security Measuring LLM Safety by Refusal Alignment Instead of Attack Success Rate
- jun 24 security Poisoning Physics-Informed Neural Networks Slips Past Loss-Based Validation
- jun 24 policy 50 Years of Aviation Certification Expose a Structural Gap in AI Governance
- jun 24 security Catching LLM Jailbreaks by Watching Per-Layer Entropy, Not Outputs
- jun 24 oss Cost and Access, Not Ideology, Drive Open-Weight Chinese Model Adoption
- jun 24 models A Per-Neuron Sequence Model Was Withdrawn From arXiv as Coverage Hailed It
- jun 24 policy Do Reasoning Tokens Actually Make LLMs Safer? A New Paper Tests It
- jun 24 devtools Nub Bundles a Bun-Style Toolkit Onto Node Without the Runtime Swap
- jun 24 oss Bot-Account Lookups Miss 97% of AI Coding Agent Commits, 180M-Repo Census Finds
- jun 24 security How Reliable Are the LLM Judges Scoring Jailbreak Attacks?
- jun 24 models PV-TAM Corrects Decoding Drift and Boundary-Marker Bias in VLM Localization Scoring
- jun 24 agents Do AGENTS.md Files Actually Help Coding Agents? A New Benchmark Tests It
- jun 24 agents Should AI Shopping Agents Pay Micro-Transactions for Verified Product Data?
- jun 24 models Meituan's General 365 Benchmark: Top Models All Score Under 63%
- jun 24 models LLM Surrogates in A/B Tests: The 39% Recovery Gap and the Silent Bias Risk
- jun 24 models LLM Token Pricing vs Compute Cost: What the Tokenomics Math Shows
- jun 24 models Do LLM Judges Favor Their Own Output? A Sanity Check on Self-Preference
- jun 23 agents Can a Conversational Graph Compile Into a Goal-Oriented Dialogue Runtime?
- jun 23 security Auto-Reproducing Text-to-Image Jailbreaks From Papers: The PixJail Pipeline
- jun 23 agents Can a Cryptographic Certificate Prove an AI Agent's Output Is Valid?
- jun 23 infra Vercel on the AWS Marketplace: What the Listing Does to Procurement and Lock-In
- jun 23 policy Machine-Readable AI Usage Terms: Does ODRL's Permission Model Hold Up?
- jun 23 agents CrewAI vs AutoGen vs Microsoft Agent Framework: AutoGen's Merger Reframes the 2026 Choice
- jun 23 devtools Vercel Now Deploys Long-Running Node Servers: The Serverless Boundary Shifts
- jun 23 policy Who Audits the Safety Rules an LLM Agent Evolves for Itself?
- jun 23 agents Can You Trust an LLM Judge to Grade an Agentic Data Analysis System?
- jun 23 agents Do LLM Agent Societies Develop Their Own Authority Hierarchies?
- jun 23 infra Serving Cold MoE Models: CrossPool Disaggregates KV Cache and Weights
- jun 23 security Vercel BotID's Telemetry Is a Threat Intelligence Feed Most Teams Discard
- jun 23 policy When Vibe-Coded Software Is Safety-Critical, Who Verifies It?
- jun 23 security Extracting Unseen Training Data From an LLM by Poisoning Its Loss Landscape
- jun 23 agents Do Retrieval Metrics Predict Tool-Use Agent Success? A Paper Says No
- jun 23 infra Vercel's In-Function Concurrency: What It Does to Cold Starts and Billing
- jun 23 policy Can You Trust an AI Robustness Certificate? A Paper Says Verify It
- jun 23 agents Can You Pinpoint Which Step Broke a Long-Horizon AI Agent?
- jun 23 industry Vercel's Series D Thesis Hardened Into a Whole-Stack Lock-In
- jun 23 devtools make-look-scanned Simulates Scans in an Offline WASM File, Exposing PDF Provenance as a Pixel Check
- jun 23 infra Poisoning a RAG Retriever: How Conflict-Aware Edits Inject False Knowledge
- jun 23 models Can AI Write CAD Programs? CADBench Measures the Gap
- jun 23 infra Vercel Raised Its CDN Origin Timeout to Two Minutes: What Breaks First
- jun 23 infra Gradio-Lite Runs Model Inference in the Browser via Pyodide, No Server
- jun 23 devtools Vercel's Billing Usage API: Wiring Cost Data Into CI Cost Gates
- jun 23 infra Cloudflare AI Gateway Adds Spend Limits to Cap the Runaway Inference Bill
- jun 23 infra Vercel Now Honors stale-if-error: Serving Stale Cache When the Origin Dies
- jun 23 models ByteDance's Doubao 2.1 Pro vs GPT-5.5: Reading Self-Reported Benchmarks
- jun 22 policy Can a Benchmark Catch When AI Discharge Summaries Drop Care Steps?
- jun 22 devtools Vercel CLI Now Scopes Commands to the Local Directory: Audit Your CI Scripts
- jun 22 security React Router CVE-2025-31137: Vercel's Edge Fix Is Not the Patch
- jun 22 infra Vercel's Manual CDN Purge API: Cache Control Without a Redeploy