Groundy — independent coverage of developer tools, infrastructure, and platforms
Why LLMs Fail at Spatial Reasoning When Planning Navigation
LLMs fail at spatial navigation because training text encodes geometry poorly. Two papers show explicit structural scaffolding, not prompt tweaks, is the fix teams need.
culture Ranking LLMs Side by Side Makes Their Dialect Bias Worse
A FAccT 2026 study finds pairwise LLM evaluation, the format behind chatbot arenas and RLHF, amplifies bias against AAVE, and dialect labels make the problem worse.
Replacing Workers With AI Erodes the Skills You'll Need Later
Replacing junior roles with AI tools masks a hidden cost: the erosion of senior-level skills needed to verify, correct, and supervise those same systems over time.
culture Does AI Have 6.5 Years Before It Breaches a Planetary Boundary?
A preprint assigns AI a planetary boundary with a 6.5-year breach window, but the countdown excludes AI's thermal load and a rival roadmap targets 1000× efficiency gains.
oss An Open-Source 80386 Rebuilt Around Intel's Original Microcode
z386 is an 80386 FPGA core driven by Intel's original microcode ROM, recovered from die photographs. It runs Doom at 16.5 FPS, but the microcode's IP status is unresolved.
security Vercel AI SDK CVE-2025-48985: Input Validation Bypass Hits LLM App Builders
An index mismatch in Vercel AI SDK lets attackers inject arbitrary bytes into prompt file inputs. With no NVD CVSS score yet, most dependency scanners will not flag it.
policy Can a Mental Health Support Chatbot Be Safe If It Learns From Forums?
LLUMI matches GPT empathy scores by training on Reddit upvotes, but its safety evaluations lack clinical credentials, shifting liability to any platform that deploys it.
policy Dataset Watermarks Fail to Trace Fine-Tuned AI Image Models, New Benchmark Finds
A new benchmark finds dataset watermarks can be stripped from fine-tuned diffusion models without quality loss, undermining post-hoc traceability as a regulatory mechanism.
- models Opus 4.8 vs Opus 4.7: What Changed and What Did Not
- agents Claude Code, Cursor, Copilot: How Agentic Coding Assistants Get Weaponized as Attacker Shells
- devtools Anthropic Buys Stainless: OpenAI and Google Now Depend on a Rival for SDK Tooling
- agents A New Trust Schema Exposes Why Agent Skill Registries Fail Enterprise Audit Requirements
- policy FTC's TAKE IT DOWN Act Lands May 19: 48-Hour Deepfake NCII Takedowns and No Safe Harbor
- devtools Claude Code Plugins: Anthropic's Official Plugin Ecosystem Explained
- devtools GitHub Copilot vs Cursor vs Claude Code: The 2026 AI Coding Showdown
- devtools GitHub Copilot's Opus 4.7 Multiplier: 7.5x to 15x to 27x in 60 Days
- infra MLX vs llama.cpp on Apple Silicon: Which Runtime to Use for Local LLM Inference
- models AI Code Generation Benchmarks 2026: Which Model Actually Writes Better Code?
- models Chinese AI Models Compared: DeepSeek, Qwen, Kimi, Doubao, and Ernie
- devtools Claude Code in GitHub Actions: A Complete Guide to Automated PR Fixes
- security The Mysterious Case of Chinese Bot Traffic in 2026: How AI-Powered Bots Are Rewriting the Rules of Detection
- industry Cursor's Meteoric Rise: From $300M to $3B ARR in a Year
- infra Prefill-Decode Disaggregation: The Architecture Shift Redefining LLM Serving at Scale
- may 31 models Why LLMs Fail at Spatial Reasoning When Planning Navigation
- may 31 culture Ranking LLMs Side by Side Makes Their Dialect Bias Worse
- may 30 culture Replacing Workers With AI Erodes the Skills You'll Need Later
- may 30 culture Does AI Have 6.5 Years Before It Breaches a Planetary Boundary?
- may 29 oss An Open-Source 80386 Rebuilt Around Intel's Original Microcode
- may 31 security Vercel AI SDK CVE-2025-48985: Input Validation Bypass Hits LLM App Builders
- may 30 policy Can a Mental Health Support Chatbot Be Safe If It Learns From Forums?
- may 30 policy Dataset Watermarks Fail to Trace Fine-Tuned AI Image Models, New Benchmark Finds
- may 30 culture Can LLM Agents Realistically Fake Reactions to Online News?
- may 31 policy Can Synthetic Preference Data Keep RLHF Private Without Wrecking Alignment?
- may 30 security Job Seekers Are Prompt-Injecting AI Resume Screeners. New Study Measures the Hit Rate
- may 31 agents What Breaks When Claude Code Writes Production Code: A New Failure Catalog
- may 31 security Hijacking AI Agent Memory: One Conversation Can Plant a Persistent Trojan
- jun 01 devtools JetBrains Ships Codex Natively, Making Its IDE the Multi-Vendor AI Surface
- may 30 security Why Audio Jailbreaks Slip Past the Safety Training Built for Text LLMs
- may 31 security Why Attack Success Rate Misleads LLM Jailbreak Benchmarks
- may 31 agents More Agents, Worse Results: Why Multi-Agent LLM Teams Hold Experts Back
- may 30 models Can an LLM Peer-Review Your Paper? A New Behavior Benchmark
- may 30 security LoRA Adapter Backdoors Generalize Beyond Their Trigger Tokens
- may 31 devtools Transformers.js v4 Moves Transformer Inference Into the Browser
- may 30 infra Cloudflare Turnstile Now Fingerprints WebGL: The Privacy CAPTCHA Tradeoff
- may 29 industry Valve's $200 Steam Deck Price Hike Concedes the Handheld PC Margin Squeeze
- may 30 models Anthropic Scaled Sparse Autoencoders to Claude 3 Sonnet. Interpretability Now Costs Compute
- may 31 industry OpenRouter's $113M Series B Bets Routing Beats Picking a Single LLM
- jun 01 industry Anthropic's $965B Private Mark Now Faces a Confidential S-1
- may 31 models Does Giving AI Agents More Skills Help? A Controlled SkillsBench Study
- may 31 policy FTC's May 11 Take It Down Act Letters Set May 19 Deadline: 48-Hour Removal, $53,088 Per Violation
- may 28 policy Can LLM Personas Replace Human Survey Respondents? New arXiv Paper Tests Decision Alignment
- may 28 culture Wikipedia's Foundation Is Running Big Tech's Anti-Labor Playbook, an Editor Argues
- may 28 security Three Labs Concede Browser Agents Cannot Stop Prompt Injection
- may 28 agents Multi-Agent LLM Coordination: Why Attention Steering Beats Full Broadcast
- may 28 models Tracing Why LLM Agent Memory Fails: A Method for Attributing Errors
- may 28 security Vercel Firewall Now Blocks SAMLStorm. Can an Edge WAF Fix a SAML Signature Flaw?
- may 28 models Persona Prompts Change Who an LLM Recommends as an Expert
- may 28 policy Distributed Training Breaks the Compute Thresholds Behind AI Regulation
- may 28 agents DataClawBench: AI Agents Fail at Exploratory Financial Analysis Across 492 Tasks
- may 28 infra The Viral AWS Support Post Is a Warning About Cloud Escalation Paths
- may 28 policy A Single RLHF Pass Can't Align an LLM to Every Online Community
- may 28 oss Models.dev Turns Scattered AI Model Pricing Into One Open Database
- may 28 policy RLHF Can Be Exploited to Optimize the Biases It Was Built to Suppress
- may 28 agents Agentic RAG Has a Credit-Assignment Problem That Subgoaling Tries to Fix
- may 27 oss Frontier AI Has Broken Open CTFs: Why Claude Code Now One-Shots Medium Pwn Challenges
- may 27 policy Selective Geometry Attacks Bypass LLM Safety Alignment, New arXiv Paper Reports
- may 27 industry OpenAI's Indeed Customer Story Pushes ChatGPT Into the Job-Description Stack Ahead of LinkedIn
- may 27 industry HiBob Runs 2,500 Internal GPTs: OpenAI's New Enterprise Adoption Metric
- may 27 industry OpenAI's Trusted-Access Programs Force a Compliance Tier onto Pharma AI Buyers
- may 27 agents SkillOpt Treats Agent Skill Libraries as an Executive Scheduling Problem, Not a Memory Bank
- may 27 oss Audiomass Adds Multitrack to the Browser-Only Open-Source Audio Editor
- may 27 agents Claude Code Dynamic Workflows: Spawning 100 Parallel Subagents on Opus 4.8
- may 27 agents How Opus 4.8 Honesty Prevents Cascade Failures in Agentic Loops