groundy

Groundy — independent coverage of developer tools, infrastructure, and platforms

  1. jun 09 models How LLMs Track Who Did What: The Entity Rebinding Circuit
  2. jun 09 devtools Vercel's Chat SDK Targets Every Chat Platform From One Codebase
  3. jun 09 infra MiniMax M3 Ships 1M Context and Desktop Control as Open Weights
  4. jun 09 devtools NPM v12 Breaking Changes: Auditing Your Lockfiles Before the Upgrade
  5. jun 09 infra DeepSeek-V4 FlashMemory: Sparse Attention for Million-Token Context
  6. jun 09 agents When AI Agents Delegate Work, Your Observability Stack Goes Blind
  7. jun 08 security Skill Injection: Hiding Undetectable Instructions in What an AI Agent Loads
  8. jun 08 models LLM Steganography: Can Defenders Detect Payloads Hidden in Model Output?
  9. jun 08 policy Who Gets to Audit Your Health Chatbot? Almost No One
  10. jun 08 policy Do Word-Subset Explanations Satisfy the EU AI Act's Transparency Rule?
  11. jun 08 infra Is Cloudflare's Bot Traffic Surge Real? The Measurement Dispute
  12. jun 08 industry OpenAI Pushes ChatGPT Into Compensation Data, Pressuring Mercer and Radford
  13. jun 08 policy Bit-Exact Inference Verification Gives AI Audits a Proof Mechanism
  14. jun 08 models Do Privacy Defenses Actually Protect Fine-Tuned LLMs? A New Benchmark
  15. jun 08 models Can You Reconstruct an LLM's System Prompt From Its Activations?
  16. jun 08 policy Can a Robot's Own Attention Flag Its Unsafe Actions Before They Run?
  17. jun 08 devtools Can a CLI Replace Screenshots for GUI Automation Agents?
  18. jun 08 agents Bloomberg's Pomona Makes Small Automated Code Changes, Not Big Agent PRs
  19. jun 08 agents Agent Tool-Gating Moves From Prompt Rules to Learned Policies
  20. jun 08 culture Does Debate Quality Survive When LLMs Argue Outside English?
  21. jun 08 security Splitting a Malicious Task Across Tool Calls Slips Past LLM Agent Guardrails
  22. jun 08 agents More Capable LLMs Cooperate Less in Zero-Cost Collaboration Tests
  23. jun 08 policy Can One Safety Adapter Realign Every Fine-Tuned LLM?
  24. jun 08 industry Bending Spoons Files to IPO: The App Roll-Up Playbook Goes Public
  25. jun 08 devtools How Cursor Uses GPT-5: What OpenAI's Writeup Tells Coding Teams
  26. jun 08 oss DuckDB Queries Hugging Face Parquet Files Over HTTP Without Downloads
  27. jun 08 models Does Softmax Normalization Limit What Attention Can Represent?
  28. jun 08 infra Huawei's KVarN Puts KV-Cache Quantization Inside vLLM's Backend
  29. jun 07 policy Can AI Be Aligned Without Modeling Human Cognitive Diversity?
  30. jun 07 models Can an Attacker Steal Your Model's Last Layer From Its Outputs?
  31. jun 07 policy Is the Pentagon's Software Pathway Ready to Buy AI Systems?
  32. jun 07 security Web Agents Can Be Talked Into Abandoning Their Task: The TRAP Benchmark
  33. jun 07 security Shallow Neural Nets Beat LLM Guardrails at Catching Prompt Injection
  34. jun 07 security When an AI Agent Clicks a Link: OpenAI's Data-Exfiltration Model
  35. jun 07 agents Why Foundation Model Agents Pass Benchmarks but Fail in Production
  36. jun 07 industry Vercel's Rox Case Study Pitches AI Agents as a Revenue Operating System
  37. jun 07 industry AI Patent Valuation Models Aim to Replace the Expert Appraiser
  38. jun 06 policy Data Safety Policies for AI Agents: Controlling What an Agent Can Leak
  39. jun 06 agents Can AI Agents Repair Broken Network Configs? A New Benchmark Tests It
  40. jun 06 agents Can Self-Evolving AI Agents Drift Without a Human in the Loop?
  41. jun 06 culture A Covert LLM Persuasion Experiment Was Shut Down: How Far Did the Bots Get?
  42. jun 06 infra Indexing Images for RAG: kapa.ai's Approach to Multimodal Retrieval
  43. jun 06 models Can LLMs Leak Training Data? A New Test Splits Capacity From Intent
  44. jun 06 policy GDPR Rectification Rights Have No Clear Owner in ML Supply Chains
  45. jun 06 security Benchmarking RAG Over Cyber Threat Intelligence: Where Retrieval Breaks
  46. jun 06 models When an AI Agent's Tools Break, Can It Recover? A New Benchmark
  47. jun 06 industry US Hyperscale Data Centers: A Carbon Audit That Recasts AI Power Costs
  48. jun 05 infra The RTX Spark Bet on Unified Memory for Local LLMs: Where Bandwidth Caps It
  49. jun 05 infra Reading Vercel's Fluid Compute vs Cloudflare Workers Benchmark
  50. jun 05 agents Fine-Tuning Multi-Agent LLM Systems: RL Enters Where Prompt Tweaks Stall