Groundy — independent coverage of developer tools, infrastructure, and platforms
Do AI Agents Reach for Over-Privileged Tools When Simpler Ones Suffice?
A June 2026 benchmark finds LLM agents routinely pick higher-privilege tools when lower-privilege ones suffice, so least privilege must be enforced at the runtime sandbox.
agents When Should Multi-Agent Systems Use an Event Bus Instead of an Orchestrator?
Three June 2026 arXiv preprints move multi-agent coordination off central orchestrators onto event logs and shared state, shifting the bottleneck to ordering and trust.
Epic Open-Sources Lore, a VCS Pitched at Git's Scaling Ceiling
Epic open-sourced Lore, a version control system pitched at the scaling ceiling where Git's object model strains. Operators should demand benchmarks before any migration.
infra Running Long-Context Agents on a 4-Bit KV Cache: Where Accuracy Breaks
UltraQuant cuts agent time-to-first-token 3.47x with 4-bit KV caching on AMD CDNA4, but its June 2026 preprint omits the accuracy numbers operators need to ship it.
security Defending Agentic AI With Deception: Misdirecting Model-Guided Attacks
A preprint shows defensive misdirection can cut estimated attacker-success bounds by up to two orders of magnitude, but leaves the cost to legitimate tasks unmeasured.
security The Autonomy Tax: Why RL Rewards the Wrong Behavior in Agents
Two June 2026 preprints find RL training in LLM agents rewards the wrong behavior, widening the safety gap and inflating SWE-bench scores by 14 points.
security Anthropic's Procurement Risk Is Policy Refusal, Not Jailbreaks
Anthropic's record splits AI procurement risk in two: model behavior on benign prompts versus vendor refusal. Both block deployments but need different diligence.
industry Can You Predict a Fine-Tune's Payoff Before Training Finishes?
TuneAhead forecasts fine-tuning success from a short simulated probe, catching 89.4% of winners and 91.0% of failures on Qwen2.5-7B-Instruct for 58.4% compute savings.
- models GLM-5.2 vs Kimi K2.7 Code: Two Open-Weight Bets on Agentic Coding
- devtools Cursor Goes to SpaceX, Windsurf to Cognition: What Changes for Dev Teams
- policy US Export Order Forces Anthropic to Disable Fable 5 and Mythos 5 Worldwide
- infra MiniMax M3 Ships 1M Context and Desktop Control as Open Weights
- agents When AI Agents Delegate Work, Your Observability Stack Goes Blind
- devtools GitHub Copilot vs Cursor vs Claude Code: The 2026 AI Coding Showdown
- models AI Code Generation Benchmarks 2026: Which Model Actually Writes Better Code?
- models Chinese AI Models Compared: DeepSeek, Qwen, Kimi, Doubao, and Ernie
- industry Cursor's Meteoric Rise: Inside the AI Editor Hitting $300M ARR
- infra MLX vs llama.cpp on Apple Silicon: Which Runtime to Use for Local LLM Inference
- infra Running GLM-5.2 at Home: SGLang, vLLM, Transformers, and KTransformers Setup Guide
- devtools Claude Code Plugins: Anthropic's Official Plugin Ecosystem Explained
- models GLM-5.2 Benchmarks: What 62.1% SWE-bench Pro and 99.2% AIME Actually Mean
- industry Stargate: Inside OpenAI's $100B Infrastructure Buildout
- devtools Running GLM-5.2 in Cursor, Cline, and Roo Code: Migration Checklist and Gotchas
- jun 20 agents Do AI Agents Reach for Over-Privileged Tools When Simpler Ones Suffice?
- jun 20 agents When Should Multi-Agent Systems Use an Event Bus Instead of an Orchestrator?
- jun 20 oss Epic Open-Sources Lore, a VCS Pitched at Git's Scaling Ceiling
- jun 20 infra Running Long-Context Agents on a 4-Bit KV Cache: Where Accuracy Breaks
- jun 20 security Defending Agentic AI With Deception: Misdirecting Model-Guided Attacks
- jun 20 security The Autonomy Tax: Why RL Rewards the Wrong Behavior in Agents
- jun 20 security Anthropic's Procurement Risk Is Policy Refusal, Not Jailbreaks
- jun 19 industry Can You Predict a Fine-Tune's Payoff Before Training Finishes?
- jun 19 culture When an Algorithm Sequences Gig Hiring, Whose Objective Does It Optimize?
- jun 19 infra When LLM-Generated CUDA Kernels Pass Tests but Get the Math Wrong
- jun 19 models Can RoboSSM's State-Space Backbone Replace Transformer Imitation Policies?
- jun 19 models Pruning Experts to Shrink MoE Models: Does Attribution-Guided Compression Beat Magnitude?
- jun 19 agents Can Deontic Policy Rules Govern an AI Agent at Runtime?
- jun 19 models GLM-5.2 vs Kimi K2.7 Code: Two Open-Weight Bets on Agentic Coding
- jun 19 models How Linear Is a Transformer Feed-Forward Block? A New Test Says It's Learned, Not Built In
- jun 19 devtools Cursor Goes to SpaceX, Windsurf to Cognition: What Changes for Dev Teams
- jun 18 culture AI Essay Grading: What a Probe of LLM Internals Reveals About Scoring
- jun 18 models GLM-5.2 Benchmarks: What 62.1% SWE-bench Pro and 99.2% AIME Actually Mean
- jun 18 policy GLM-5.2 MIT Weights vs Llama License: Self-Hosting Compliance for Regulated Industries
- jun 18 models GLM-5.2 on Terminal-Bench 2.1: Strengths, Gaps, and How to Route Real Coding Tasks
- jun 18 models GLM-5.2 vs Claude Opus 4.8: Open-Weight Coding at Frontier Pricing
- jun 18 models GLM-5.2's 753B MoE Costs More to Self-Host Than the MIT License Suggests
- jun 18 infra Running GLM-5.2 at Home: SGLang, vLLM, Transformers, and KTransformers Setup Guide
- jun 18 devtools Running GLM-5.2 in Cursor, Cline, and Roo Code: Migration Checklist and Gotchas
- jun 17 models STAR Replaces Scalar Reward in Text-to-Image RL with Attention-Derived Spatial Maps
- jun 15 oss Zhipu Open-Sources GLM-5.2 Under MIT While Anthropic Tightens Model Access
- jun 15 models Can Editing One Neuron Fix LLM Repetition Loops?
- jun 15 industry Zhipu Ships GLM-5.2 With 1M Context and MIT Weights, but Zero Benchmarks at Launch
- jun 15 infra AWS Bedrock Now Requires Data Sharing for Mythos: The Self-Hosting Calculus
- jun 15 devtools Vercel's Remend Turns Streaming-Markdown Repair Into a Dependency
- jun 15 industry Moonshot's Kimi K2.7 Code Loses 11 of 12 Benchmark Cells, Leads on Efficiency Instead
- jun 14 policy Can Reinforcement Learning Be Provably Safe Without Sacrificing Scale?
- jun 14 infra vLLM Cold Start Latency: Why Scale-to-Zero LLM Serving Stalls
- jun 14 infra The Vercel-AWS Deal Reveals Where AI Inference Runs
- jun 14 agents Do Programming Languages Still Matter to Your AI Coding Agent?
- jun 14 agents Why Production AI Agents Fail Silently and Your Logs Never Catch It
- jun 13 security AMD Took 124 Days to Patch the RCE It First Called Out of Scope
- jun 12 policy US Export Order Forces Anthropic to Disable Fable 5 and Mythos 5 Worldwide
- jun 10 models Claude Fable 5 Benchmarks: What FrontierCode, CursorBench, and ViBench Show
- jun 11 agents Computer-Use Agents Fabricate Success on 8 to 33 Percent of Long-Horizon Tasks
- jun 10 infra Running RAG on a Snapdragon NPU: The On-Device Retrieval Tradeoff
- jun 10 models Does Attribution Patching Lie? A Fix for a Common Interpretability Shortcut
- jun 11 models Can You Make a Multimodal Model Unlearn With Activation Steering?
- jun 11 models Why Pruning a Model Can Raise Its Out-of-Distribution Accuracy
- jun 11 industry Vercel's Turborepo: Build Speed Becomes a Hosting-Vendor Feature
- jun 10 security OpenAI Frames Instruction Hierarchy as an Open Challenge, Not a Prompt-Injection Fix
- jun 10 devtools JetBrains Mellum2: A 12B Open-Weights Code Model for Self-Hosted Completion
- jun 09 models Do Unified Multimodal Models Actually Interleave Understanding and Generation?
- jun 09 agents Can AI Agents Share Context Without a Central Coordinator?
- jun 09 agents Why Skill Creation and Reward Optimization Collide in Agentic RL