Groundy — independent coverage of developer tools, infrastructure, and platforms

  1. jun 20 agents Do AI Agents Reach for Over-Privileged Tools When Simpler Ones Suffice?
  2. jun 20 agents When Should Multi-Agent Systems Use an Event Bus Instead of an Orchestrator?
  3. jun 20 oss Epic Open-Sources Lore, a VCS Pitched at Git's Scaling Ceiling
  4. jun 20 infra Running Long-Context Agents on a 4-Bit KV Cache: Where Accuracy Breaks
  5. jun 20 security Defending Agentic AI With Deception: Misdirecting Model-Guided Attacks
  6. jun 20 security The Autonomy Tax: Why RL Rewards the Wrong Behavior in Agents
  7. jun 20 security Anthropic's Procurement Risk Is Policy Refusal, Not Jailbreaks
  8. jun 19 industry Can You Predict a Fine-Tune's Payoff Before Training Finishes?
  9. jun 19 culture When an Algorithm Sequences Gig Hiring, Whose Objective Does It Optimize?
  10. jun 19 infra When LLM-Generated CUDA Kernels Pass Tests but Get the Math Wrong
  11. jun 19 models Can RoboSSM's State-Space Backbone Replace Transformer Imitation Policies?
  12. jun 19 models Pruning Experts to Shrink MoE Models: Does Attribution-Guided Compression Beat Magnitude?
  13. jun 19 agents Can Deontic Policy Rules Govern an AI Agent at Runtime?
  14. jun 19 models GLM-5.2 vs Kimi K2.7 Code: Two Open-Weight Bets on Agentic Coding
  15. jun 19 models How Linear Is a Transformer Feed-Forward Block? A New Test Says It's Learned, Not Built In
  16. jun 19 devtools Cursor Goes to SpaceX, Windsurf to Cognition: What Changes for Dev Teams
  17. jun 18 culture AI Essay Grading: What a Probe of LLM Internals Reveals About Scoring
  18. jun 18 models GLM-5.2 Benchmarks: What 62.1% SWE-bench Pro and 99.2% AIME Actually Mean
  19. jun 18 policy GLM-5.2 MIT Weights vs Llama License: Self-Hosting Compliance for Regulated Industries
  20. jun 18 models GLM-5.2 on Terminal-Bench 2.1: Strengths, Gaps, and How to Route Real Coding Tasks
  21. jun 18 models GLM-5.2 vs Claude Opus 4.8: Open-Weight Coding at Frontier Pricing
  22. jun 18 models GLM-5.2's 753B MoE Costs More to Self-Host Than the MIT License Suggests
  23. jun 18 infra Running GLM-5.2 at Home: SGLang, vLLM, Transformers, and KTransformers Setup Guide
  24. jun 18 devtools Running GLM-5.2 in Cursor, Cline, and Roo Code: Migration Checklist and Gotchas
  25. jun 17 models STAR Replaces Scalar Reward in Text-to-Image RL with Attention-Derived Spatial Maps
  26. jun 15 oss Zhipu Open-Sources GLM-5.2 Under MIT While Anthropic Tightens Model Access
  27. jun 15 models Can Editing One Neuron Fix LLM Repetition Loops?
  28. jun 15 industry Zhipu Ships GLM-5.2 With 1M Context and MIT Weights, but Zero Benchmarks at Launch
  29. jun 15 infra AWS Bedrock Now Requires Data Sharing for Mythos: The Self-Hosting Calculus
  30. jun 15 devtools Vercel's Remend Turns Streaming-Markdown Repair Into a Dependency
  31. jun 15 industry Moonshot's Kimi K2.7 Code Loses 11 of 12 Benchmark Cells, Leads on Efficiency Instead
  32. jun 14 policy Can Reinforcement Learning Be Provably Safe Without Sacrificing Scale?
  33. jun 14 infra vLLM Cold Start Latency: Why Scale-to-Zero LLM Serving Stalls
  34. jun 14 infra The Vercel-AWS Deal Reveals Where AI Inference Runs
  35. jun 14 agents Do Programming Languages Still Matter to Your AI Coding Agent?
  36. jun 14 agents Why Production AI Agents Fail Silently and Your Logs Never Catch It
  37. jun 13 security AMD Took 124 Days to Patch the RCE It First Called Out of Scope
  38. jun 12 policy US Export Order Forces Anthropic to Disable Fable 5 and Mythos 5 Worldwide
  39. jun 10 models Claude Fable 5 Benchmarks: What FrontierCode, CursorBench, and ViBench Show
  40. jun 11 agents Computer-Use Agents Fabricate Success on 8 to 33 Percent of Long-Horizon Tasks
  41. jun 10 infra Running RAG on a Snapdragon NPU: The On-Device Retrieval Tradeoff
  42. jun 10 models Does Attribution Patching Lie? A Fix for a Common Interpretability Shortcut
  43. jun 11 models Can You Make a Multimodal Model Unlearn With Activation Steering?
  44. jun 11 models Why Pruning a Model Can Raise Its Out-of-Distribution Accuracy
  45. jun 11 industry Vercel's Turborepo: Build Speed Becomes a Hosting-Vendor Feature
  46. jun 10 security OpenAI Frames Instruction Hierarchy as an Open Challenge, Not a Prompt-Injection Fix
  47. jun 10 devtools JetBrains Mellum2: A 12B Open-Weights Code Model for Self-Hosted Completion
  48. jun 09 models Do Unified Multimodal Models Actually Interleave Understanding and Generation?
  49. jun 09 agents Can AI Agents Share Context Without a Central Coordinator?
  50. jun 09 agents Why Skill Creation and Reward Optimization Collide in Agentic RL