Groundy — independent coverage of developer tools, infrastructure, and platforms
Why Audio Deepfake Detectors Keep Losing the Voice-Cloning Arms Race
A 34,000-parameter audio deepfake detector reaches only 75 to 80 percent cross-domain accuracy, a result that shows why post-hoc detection sits downstream of generation.
security Mixed Compliance Data Makes Safety Fine-Tuning a Curation Problem
A June 2026 preprint shows benign and harmful compliance examples are not interchangeable, with DPO, not SFT, the stage that stops benign examples from amplifying harm.
When an LLM Narrates a Solver, the Explanation Drifts From the Math
A June 2026 arXiv paper isolates the narration gap in LLM-solver loops: prompt injection can invert a verified verdict at the prose stage, breaking reasoning-log audits.
infra Cloudflare's Temporary Accounts Give AI Agents Disposable Credentials
Cloudflare's temporary accounts give agents auto-expiring 60-minute credentials, but the launch is an onboarding shortcut, not a scoped security control.
policy Grading DiffusionGemma: How an Open-Weight Diffusion Model Scores on Transparency
An arXiv paper finds DiffusionGemma's opaque serial depth collapses from 28.6x to 1.1x via a token bottleneck, though its model card leaves training data unitemized.
policy Who Owns Editorial Authority When LLMs Mediate Knowledge?
A June 2026 preprint argues no role in the LLM pipeline holds editorial sign-off for what answer engines surface as public knowledge, framing it as a governance gap.
oss Lithuania's Open-Source Drone-Detection Network Signals an Air-Defense Shift
A Lithuanian open-source drone-detection network points to cheap passive sensor meshes that could move air defense off centralized radar, though field tests have yet to validate it.
culture Why AI Misreads Nigerian English: A Register Gap in Public Discourse
Models tuned on standard English misread Nigerian English and Pidgin register shifts, pushing intent validation onto local annotators that vendors rarely fund.
- models GLM-5.2 vs Kimi K2.7 Code: Two Open-Weight Bets on Agentic Coding
- devtools Cursor Goes to SpaceX, Windsurf to Cognition: What Changes for Dev Teams
- policy US Export Order Forces Anthropic to Disable Fable 5 and Mythos 5 Worldwide
- infra MiniMax M3 Ships 1M Context and Desktop Control as Open Weights
- agents When AI Agents Delegate Work, Your Observability Stack Goes Blind
- devtools GitHub Copilot vs Cursor vs Claude Code: The 2026 AI Coding Showdown
- models AI Code Generation Benchmarks 2026: Which Model Actually Writes Better Code?
- models Chinese AI Models Compared: DeepSeek, Qwen, Kimi, Doubao, and Ernie
- industry Cursor's Meteoric Rise: Inside the AI Editor Hitting $300M ARR
- infra Running GLM-5.2 at Home: SGLang, vLLM, Transformers, and KTransformers Setup Guide
- models GLM-5.2 Benchmarks: What 62.1% SWE-bench Pro and 99.2% AIME Actually Mean
- infra MLX vs llama.cpp on Apple Silicon: Which Runtime to Use for Local LLM Inference
- devtools Claude Code Plugins: Anthropic's Official Plugin Ecosystem Explained
- devtools Running GLM-5.2 in Cursor, Cline, and Roo Code: Migration Checklist and Gotchas
- industry Stargate: Inside OpenAI's $100B Infrastructure Buildout
- jun 21 culture Why Audio Deepfake Detectors Keep Losing the Voice-Cloning Arms Race
- jun 20 security Mixed Compliance Data Makes Safety Fine-Tuning a Curation Problem
- jun 20 policy When an LLM Narrates a Solver, the Explanation Drifts From the Math
- jun 20 infra Cloudflare's Temporary Accounts Give AI Agents Disposable Credentials
- jun 20 policy Grading DiffusionGemma: How an Open-Weight Diffusion Model Scores on Transparency
- jun 20 policy Who Owns Editorial Authority When LLMs Mediate Knowledge?
- jun 20 oss Lithuania's Open-Source Drone-Detection Network Signals an Air-Defense Shift
- jun 20 culture Why AI Misreads Nigerian English: A Register Gap in Public Discourse
- jun 20 agents Deep-Research Benchmarks Hide How Agents Fail at Open-Web Source Grounding
- jun 20 policy Vector Database Access Control Is Missing, and RAG Pipelines Pay for It
- jun 20 agents DSPy Ships Autonomous Prompt Optimization, but Judge Drift Is the Failure Mode
- jun 20 culture What YouTube's Coding Tutorials Teach About Who Belongs in Software
- jun 20 industry Finance Agent Benchmarks Expose Where Lending Automation Breaks
- jun 20 oss NLnet's Grant Model Diverges From VC-Backed Open Source
- jun 20 oss Adam's Open-Source AI CAD Claim Lacks a Confirmed Repo or Accuracy Benchmark
- jun 20 agents Do AI Agents Reach for Over-Privileged Tools When Simpler Ones Suffice?
- jun 20 agents When Should Multi-Agent Systems Use an Event Bus Instead of an Orchestrator?
- jun 20 oss Epic Open-Sources Lore, a VCS Pitched at Git's Scaling Ceiling
- jun 20 infra Running Long-Context Agents on a 4-Bit KV Cache: Where Accuracy Breaks
- jun 20 security Defending Agentic AI With Deception: Misdirecting Model-Guided Attacks
- jun 20 security The Autonomy Tax: Why RL Rewards the Wrong Behavior in Agents
- jun 20 security Anthropic's Procurement Risk Is Policy Refusal, Not Jailbreaks
- jun 19 industry Can You Predict a Fine-Tune's Payoff Before Training Finishes?
- jun 19 culture When an Algorithm Sequences Gig Hiring, Whose Objective Does It Optimize?
- jun 19 infra When LLM-Generated CUDA Kernels Pass Tests but Get the Math Wrong
- jun 19 models Can RoboSSM's State-Space Backbone Replace Transformer Imitation Policies?
- jun 19 models Pruning Experts to Shrink MoE Models: Does Attribution-Guided Compression Beat Magnitude?
- jun 19 agents Can Deontic Policy Rules Govern an AI Agent at Runtime?
- jun 19 models GLM-5.2 vs Kimi K2.7 Code: Two Open-Weight Bets on Agentic Coding
- jun 19 models How Linear Is a Transformer Feed-Forward Block? A New Test Says It's Learned, Not Built In
- jun 19 devtools Cursor Goes to SpaceX, Windsurf to Cognition: What Changes for Dev Teams
- jun 18 culture AI Essay Grading: What a Probe of LLM Internals Reveals About Scoring
- jun 18 models GLM-5.2 Benchmarks: What 62.1% SWE-bench Pro and 99.2% AIME Actually Mean
- jun 18 policy GLM-5.2 MIT Weights vs Llama License: Self-Hosting Compliance for Regulated Industries
- jun 18 models GLM-5.2 on Terminal-Bench 2.1: Strengths, Gaps, and How to Route Real Coding Tasks
- jun 18 models GLM-5.2 vs Claude Opus 4.8: Open-Weight Coding at Frontier Pricing
- jun 18 models GLM-5.2's 753B MoE Costs More to Self-Host Than the MIT License Suggests
- jun 18 infra Running GLM-5.2 at Home: SGLang, vLLM, Transformers, and KTransformers Setup Guide
- jun 18 devtools Running GLM-5.2 in Cursor, Cline, and Roo Code: Migration Checklist and Gotchas
- jun 17 models STAR Replaces Scalar Reward in Text-to-Image RL with Attention-Derived Spatial Maps
- jun 15 oss Zhipu Open-Sources GLM-5.2 Under MIT While Anthropic Tightens Model Access
- jun 15 models Can Editing One Neuron Fix LLM Repetition Loops?
- jun 15 industry Zhipu Ships GLM-5.2 With 1M Context and MIT Weights, but Zero Benchmarks at Launch
- jun 15 infra AWS Bedrock Now Requires Data Sharing for Mythos: The Self-Hosting Calculus
- jun 15 devtools Vercel's Remend Turns Streaming-Markdown Repair Into a Dependency
- jun 15 industry Moonshot's Kimi K2.7 Code Loses 11 of 12 Benchmark Cells, Leads on Efficiency Instead
- jun 14 policy Can Reinforcement Learning Be Provably Safe Without Sacrificing Scale?
- jun 14 infra vLLM Cold Start Latency: Why Scale-to-Zero LLM Serving Stalls
- jun 14 infra The Vercel-AWS Deal Reveals Where AI Inference Runs
- jun 14 agents Do Programming Languages Still Matter to Your AI Coding Agent?