Groundy — independent coverage of developer tools, infrastructure, and platforms
policy Do LLM Personality Tests Measure Anything? A New Paper Says No
A June 2026 arXiv preprint finds 81 to 90 percent of LLM personality test variation stems from directional response bias, undermining persona and safety scores.
security Reported React Server Components Leak Is Unconfirmed: Audit the Payload
A reported React Server Components source-code leak has no CVE or advisory in React or Vercel's channels. Audit what your app serializes before trusting the boundary.
devtools Generating Vercel Firewall Rules From Natural Language: What to Audit
Vercel's natural-language WAF rules silently fill rate-limit defaults and persistent block durations. Read the generated config before publish, not the prompt.
devtools GLM-5.2 Coding Plan vs Claude Opus 4.8: Picking a Model for Coding Agents
GLM-5.2 ships an MIT-licensed Coding Plan at $12.6 to $112 per month, forcing coding-agent teams to weigh billing model and license terms over the missing benchmark table.
security Vercel's Secure AI Agent Guidance Pushes Defense Into the Sandbox
Vercel treats prompt injection and agent hallucination as unsolvable at the model layer, routing defense into per-session sandboxes and shifting security onto deployment ops.
security Nx Supply-Chain Attack Used Developers' Own AI CLIs to Hunt Secrets
s1ngularity's malicious Nx packages invoked installed AI CLIs with permission bypass flags to enumerate secrets, making any local agent a scriptable recon primitive.
industry Vercel Folds Backends, Agent Tooling, and Operations Into Its Deploy Platform
At Ship 2026, Vercel launched an agentic infrastructure platform that folds backends, agent tooling, and operations into its deploy stack, raising switching costs.
infra Cloudflare Now Routes Public Traffic to Private Apps via DNS, No VPN
Cloudflare's private-origins DNS routing lets public hostnames reach RFC 1918 apps without a VPN, but the flag only routes traffic; it does not authenticate callers.
- models GLM-5.2 vs Kimi K2.7 Code: Two Open-Weight Bets on Agentic Coding
- devtools Cursor Goes to SpaceX, Windsurf to Cognition: What Changes for Dev Teams
- policy US Export Order Forces Anthropic to Disable Fable 5 and Mythos 5 Worldwide
- infra MiniMax M3 Ships 1M Context and Desktop Control as Open Weights
- agents When AI Agents Delegate Work, Your Observability Stack Goes Blind
- devtools GitHub Copilot vs Cursor vs Claude Code: The 2026 AI Coding Showdown
- models AI Code Generation Benchmarks 2026: Which Model Actually Writes Better Code?
- industry Cursor's Meteoric Rise: Inside the AI Editor Hitting $300M ARR
- infra Running GLM-5.2 at Home: SGLang, vLLM, Transformers, and KTransformers Setup Guide
- models Chinese AI Models Compared: DeepSeek, Qwen, Kimi, Doubao, and Ernie
- models GLM-5.2 Benchmarks: What 62.1% SWE-bench Pro and 99.2% AIME Actually Mean
- devtools Running GLM-5.2 in Cursor, Cline, and Roo Code: Migration Checklist and Gotchas
- industry Fable 5 Credit Cliff: What the June 23 Billing Shift Means for Teams
- devtools Claude Code Plugins: Anthropic's Official Plugin Ecosystem Explained
- infra MLX vs llama.cpp on Apple Silicon: Which Runtime to Use for Local LLM Inference
- jun 22 policy Do LLM Personality Tests Measure Anything? A New Paper Says No
- jun 22 security Reported React Server Components Leak Is Unconfirmed: Audit the Payload
- jun 22 devtools Generating Vercel Firewall Rules From Natural Language: What to Audit
- jun 22 devtools GLM-5.2 Coding Plan vs Claude Opus 4.8: Picking a Model for Coding Agents
- jun 22 security Vercel's Secure AI Agent Guidance Pushes Defense Into the Sandbox
- jun 22 security Nx Supply-Chain Attack Used Developers' Own AI CLIs to Hunt Secrets
- jun 22 industry Vercel Folds Backends, Agent Tooling, and Operations Into Its Deploy Platform
- jun 22 infra Cloudflare Now Routes Public Traffic to Private Apps via DNS, No VPN
- jun 22 oss OpenAI's Patch the Planet Is Security Capacity for Nine Projects, Not Sustainability Funding
- jun 22 oss MiniMax M3 Claims GPT-5.5-Beating Code With 1M Context and Open Weights
- jun 22 industry George Hotz Says Only AGI Doom Justifies Today's AI Valuations
- jun 22 infra GitHub's AI Capacity Crunch Pushes Microsoft to Rent AWS Compute
- jun 22 policy Community LoRA Mining Raises a Consent Gap for Style Generation
- jun 21 culture Why Audio Deepfake Detectors Keep Losing the Voice-Cloning Arms Race
- jun 20 security Mixed Compliance Data Makes Safety Fine-Tuning a Curation Problem
- jun 20 policy When an LLM Narrates a Solver, the Explanation Drifts From the Math
- jun 20 infra Cloudflare's Temporary Accounts Give AI Agents Disposable Credentials
- jun 20 policy Grading DiffusionGemma: How an Open-Weight Diffusion Model Scores on Transparency
- jun 20 policy Who Owns Editorial Authority When LLMs Mediate Knowledge?
- jun 20 oss Lithuania's Open-Source Drone-Detection Network Signals an Air-Defense Shift
- jun 20 culture Why AI Misreads Nigerian English: A Register Gap in Public Discourse
- jun 20 agents Deep-Research Benchmarks Hide How Agents Fail at Open-Web Source Grounding
- jun 20 policy Vector Database Access Control Is Missing, and RAG Pipelines Pay for It
- jun 20 agents DSPy Ships Autonomous Prompt Optimization, but Judge Drift Is the Failure Mode
- jun 20 culture What YouTube's Coding Tutorials Teach About Who Belongs in Software
- jun 20 industry Finance Agent Benchmarks Expose Where Lending Automation Breaks
- jun 20 oss NLnet's Grant Model Diverges From VC-Backed Open Source
- jun 20 oss Adam's Open-Source AI CAD Claim Lacks a Confirmed Repo or Accuracy Benchmark
- jun 20 agents Do AI Agents Reach for Over-Privileged Tools When Simpler Ones Suffice?
- jun 20 agents When Should Multi-Agent Systems Use an Event Bus Instead of an Orchestrator?
- jun 20 oss Epic Open-Sources Lore, a VCS Pitched at Git's Scaling Ceiling
- jun 20 infra Running Long-Context Agents on a 4-Bit KV Cache: Where Accuracy Breaks
- jun 20 security Defending Agentic AI With Deception: Misdirecting Model-Guided Attacks
- jun 20 security The Autonomy Tax: Why RL Rewards the Wrong Behavior in Agents
- jun 20 security Anthropic's Procurement Risk Is Policy Refusal, Not Jailbreaks
- jun 19 industry Can You Predict a Fine-Tune's Payoff Before Training Finishes?
- jun 19 culture When an Algorithm Sequences Gig Hiring, Whose Objective Does It Optimize?
- jun 19 infra When LLM-Generated CUDA Kernels Pass Tests but Get the Math Wrong
- jun 19 models Can RoboSSM's State-Space Backbone Replace Transformer Imitation Policies?
- jun 19 models Pruning Experts to Shrink MoE Models: Does Attribution-Guided Compression Beat Magnitude?
- jun 19 agents Can Deontic Policy Rules Govern an AI Agent at Runtime?
- jun 19 models GLM-5.2 vs Kimi K2.7 Code: Two Open-Weight Bets on Agentic Coding
- jun 19 models How Linear Is a Transformer Feed-Forward Block? A New Test Says It's Learned, Not Built In
- jun 19 devtools Cursor Goes to SpaceX, Windsurf to Cognition: What Changes for Dev Teams
- jun 18 culture AI Essay Grading: What a Probe of LLM Internals Reveals About Scoring
- jun 18 models GLM-5.2 Benchmarks: What 62.1% SWE-bench Pro and 99.2% AIME Actually Mean
- jun 18 policy GLM-5.2 MIT Weights vs Llama License: Self-Hosting Compliance for Regulated Industries
- jun 18 models GLM-5.2 on Terminal-Bench 2.1: Strengths, Gaps, and How to Route Real Coding Tasks
- jun 18 models GLM-5.2 vs Claude Opus 4.8: Open-Weight Coding at Frontier Pricing
- jun 18 models GLM-5.2's 753B MoE Costs More to Self-Host Than the MIT License Suggests