Groundy — independent coverage of developer tools, infrastructure, and platforms

  1. may 31 models Why LLMs Fail at Spatial Reasoning When Planning Navigation
  2. may 31 culture Ranking LLMs Side by Side Makes Their Dialect Bias Worse
  3. may 30 culture Replacing Workers With AI Erodes the Skills You'll Need Later
  4. may 30 culture Does AI Have 6.5 Years Before It Breaches a Planetary Boundary?
  5. may 29 oss An Open-Source 80386 Rebuilt Around Intel's Original Microcode
  6. may 31 security Vercel AI SDK CVE-2025-48985: Input Validation Bypass Hits LLM App Builders
  7. may 30 policy Can a Mental Health Support Chatbot Be Safe If It Learns From Forums?
  8. may 30 policy Dataset Watermarks Fail to Trace Fine-Tuned AI Image Models, New Benchmark Finds
  9. may 30 culture Can LLM Agents Realistically Fake Reactions to Online News?
  10. may 31 policy Can Synthetic Preference Data Keep RLHF Private Without Wrecking Alignment?
  11. may 30 security Job Seekers Are Prompt-Injecting AI Resume Screeners. New Study Measures the Hit Rate
  12. may 31 agents What Breaks When Claude Code Writes Production Code: A New Failure Catalog
  13. may 31 security Hijacking AI Agent Memory: One Conversation Can Plant a Persistent Trojan
  14. jun 01 devtools JetBrains Ships Codex Natively, Making Its IDE the Multi-Vendor AI Surface
  15. may 30 security Why Audio Jailbreaks Slip Past the Safety Training Built for Text LLMs
  16. may 31 security Why Attack Success Rate Misleads LLM Jailbreak Benchmarks
  17. may 31 agents More Agents, Worse Results: Why Multi-Agent LLM Teams Hold Experts Back
  18. may 30 models Can an LLM Peer-Review Your Paper? A New Behavior Benchmark
  19. may 30 security LoRA Adapter Backdoors Generalize Beyond Their Trigger Tokens
  20. may 31 devtools Transformers.js v4 Moves Transformer Inference Into the Browser
  21. may 30 infra Cloudflare Turnstile Now Fingerprints WebGL: The Privacy CAPTCHA Tradeoff
  22. may 29 industry Valve's $200 Steam Deck Price Hike Concedes the Handheld PC Margin Squeeze
  23. may 30 models Anthropic Scaled Sparse Autoencoders to Claude 3 Sonnet. Interpretability Now Costs Compute
  24. may 31 industry OpenRouter's $113M Series B Bets Routing Beats Picking a Single LLM
  25. jun 01 industry Anthropic's $965B Private Mark Now Faces a Confidential S-1
  26. may 31 models Does Giving AI Agents More Skills Help? A Controlled SkillsBench Study
  27. may 31 policy FTC's May 11 Take It Down Act Letters Set May 19 Deadline: 48-Hour Removal, $53,088 Per Violation
  28. may 28 policy Can LLM Personas Replace Human Survey Respondents? New arXiv Paper Tests Decision Alignment
  29. may 28 culture Wikipedia's Foundation Is Running Big Tech's Anti-Labor Playbook, an Editor Argues
  30. may 28 security Three Labs Concede Browser Agents Cannot Stop Prompt Injection
  31. may 28 agents Multi-Agent LLM Coordination: Why Attention Steering Beats Full Broadcast
  32. may 28 models Tracing Why LLM Agent Memory Fails: A Method for Attributing Errors
  33. may 28 security Vercel Firewall Now Blocks SAMLStorm. Can an Edge WAF Fix a SAML Signature Flaw?
  34. may 28 models Persona Prompts Change Who an LLM Recommends as an Expert
  35. may 28 policy Distributed Training Breaks the Compute Thresholds Behind AI Regulation
  36. may 28 agents DataClawBench: AI Agents Fail at Exploratory Financial Analysis Across 492 Tasks
  37. may 28 infra The Viral AWS Support Post Is a Warning About Cloud Escalation Paths
  38. may 28 policy A Single RLHF Pass Can't Align an LLM to Every Online Community
  39. may 28 oss Models.dev Turns Scattered AI Model Pricing Into One Open Database
  40. may 28 policy RLHF Can Be Exploited to Optimize the Biases It Was Built to Suppress
  41. may 28 agents Agentic RAG Has a Credit-Assignment Problem That Subgoaling Tries to Fix
  42. may 27 oss Frontier AI Has Broken Open CTFs: Why Claude Code Now One-Shots Medium Pwn Challenges
  43. may 27 policy Selective Geometry Attacks Bypass LLM Safety Alignment, New arXiv Paper Reports
  44. may 27 industry OpenAI's Indeed Customer Story Pushes ChatGPT Into the Job-Description Stack Ahead of LinkedIn
  45. may 27 industry HiBob Runs 2,500 Internal GPTs: OpenAI's New Enterprise Adoption Metric
  46. may 27 industry OpenAI's Trusted-Access Programs Force a Compliance Tier onto Pharma AI Buyers
  47. may 27 agents SkillOpt Treats Agent Skill Libraries as an Executive Scheduling Problem, Not a Memory Bank
  48. may 27 oss Audiomass Adds Multitrack to the Browser-Only Open-Source Audio Editor
  49. may 27 agents Claude Code Dynamic Workflows: Spawning 100 Parallel Subagents on Opus 4.8
  50. may 27 agents How Opus 4.8 Honesty Prevents Cascade Failures in Agentic Loops