Groundy — independent coverage of developer tools, infrastructure, and platforms
Do Word-Subset Explanations Satisfy the EU AI Act's Transparency Rule?
A KDD 2026 paper attributes LLM outputs to input words without model access. It shows which tokens mattered, not how the model reasoned, which leaves a compliance gap under the EU AI Act's transparency rule.
infra Is Cloudflare's Bot Traffic Surge Real? The Measurement Dispute
Cloudflare claims a 15x bot surge using a classifier that flags privacy browsers as bots. Audit your own logs before trusting the numbers behind Pay-Per-Crawl.
OpenAI Pushes ChatGPT Into Compensation Data, Pressuring Mercer and Radford
With 3M compensation queries a day, OpenAI is pushing ChatGPT into salary benchmarking, but the model lacks the proprietary employer panels that form Radford's and Mercer's moat.
policy Bit-Exact Inference Verification Gives AI Audits a Proof Mechanism
An arXiv preprint shows GPU inference outputs can be reproduced bit-for-bit across hardware, giving auditors a forensic trail to verify which model produced a given output.
models Do Privacy Defenses Actually Protect Fine-Tuned LLMs? A New Benchmark
A June 2026 benchmark shows that passing privacy attack probes on fine-tuned LLMs is not a formal guarantee, exposing a compliance gap for teams deploying models on customer data.
models Can You Reconstruct an LLM's System Prompt From Its Activations?
PRISM recovers full instruction sets from the hidden states of frozen LLMs, letting anyone with activation access reconstruct system prompts without probing the model's outputs.
policy Can a Robot's Own Attention Flag Its Unsafe Actions Before They Run?
Two June 2026 preprints show VLA robot policies already compute safety-relevant signals at inference, enabling real-time collision monitors with no retraining.
devtools Can a CLI Replace Screenshots for GUI Automation Agents?
AppAgent-Claw replaces the VLM screenshot loop with CLI queries for GUI automation, cutting cost and latency, but only where applications expose a usable text surface.
- models Opus 4.8 vs Opus 4.7: What Changed and What Did Not
- agents Claude Code, Cursor, Copilot: How Agentic Coding Assistants Get Weaponized as Attacker Shells
- devtools Anthropic Buys Stainless: OpenAI and Google Now Depend on a Rival for SDK Tooling
- agents A New Trust Schema Exposes Why Agent Skill Registries Fail Enterprise Audit Requirements
- policy FTC's TAKE IT DOWN Act Lands May 19: 48-Hour Deepfake NCII Takedowns and No Safe Harbor
- infra MLX vs llama.cpp on Apple Silicon: Which Runtime to Use for Local LLM Inference
- devtools GitHub Copilot vs Cursor vs Claude Code: The 2026 AI Coding Showdown
- models Chinese AI Models Compared: DeepSeek, Qwen, Kimi, Doubao, and Ernie
- models AI Code Generation Benchmarks 2026: Which Model Actually Writes Better Code?
- industry Cursor's Meteoric Rise: Inside the AI Editor Hitting $300M ARR
- devtools Claude Code in GitHub Actions: A Complete Guide to Automated PR Fixes
- devtools GitHub Copilot's Opus 4.7 Multiplier: 7.5x to 15x to 27x in 60 Days
- devtools Claude Code Plugins: Anthropic's Official Plugin Ecosystem Explained
- devtools GitHub Copilot Replaces Premium Request Units With Token-Metered AI Credits on June 1
- culture EU's 2027 Replaceable Battery Mandate: What It Means for Phone Buyers and Repairers Right Now
- jun 08 policy Do Word-Subset Explanations Satisfy the EU AI Act's Transparency Rule?
- jun 08 infra Is Cloudflare's Bot Traffic Surge Real? The Measurement Dispute
- jun 08 industry OpenAI Pushes ChatGPT Into Compensation Data, Pressuring Mercer and Radford
- jun 08 policy Bit-Exact Inference Verification Gives AI Audits a Proof Mechanism
- jun 08 models Do Privacy Defenses Actually Protect Fine-Tuned LLMs? A New Benchmark
- jun 08 models Can You Reconstruct an LLM's System Prompt From Its Activations?
- jun 08 policy Can a Robot's Own Attention Flag Its Unsafe Actions Before They Run?
- jun 08 devtools Can a CLI Replace Screenshots for GUI Automation Agents?
- jun 08 agents Bloomberg's Pomona Makes Small Automated Code Changes, Not Big Agent PRs
- jun 08 agents Agent Tool-Gating Moves From Prompt Rules to Learned Policies
- jun 08 culture Does Debate Quality Survive When LLMs Argue Outside English?
- jun 08 security Splitting a Malicious Task Across Tool Calls Slips Past LLM Agent Guardrails
- jun 08 agents More Capable LLMs Cooperate Less in Zero-Cost Collaboration Tests
- jun 08 policy Can One Safety Adapter Realign Every Fine-Tuned LLM?
- jun 08 industry Bending Spoons Files to IPO: The App Roll-Up Playbook Goes Public
- jun 08 devtools How Cursor Uses GPT-5: What OpenAI's Writeup Tells Coding Teams
- jun 08 oss DuckDB Queries Hugging Face Parquet Files Over HTTP Without Downloads
- jun 08 models Does Softmax Normalization Limit What Attention Can Represent?
- jun 08 infra Huawei's KVarN Puts KV-Cache Quantization Inside vLLM's Backend
- jun 07 policy Can AI Be Aligned Without Modeling Human Cognitive Diversity?
- jun 07 models Can an Attacker Steal Your Model's Last Layer From Its Outputs?
- jun 07 policy Is the Pentagon's Software Pathway Ready to Buy AI Systems?
- jun 07 security Web Agents Can Be Talked Into Abandoning Their Task: The TRAP Benchmark
- jun 07 security Shallow Neural Nets Beat LLM Guardrails at Catching Prompt Injection
- jun 07 security When an AI Agent Clicks a Link: OpenAI's Data-Exfiltration Model
- jun 07 agents Why Foundation Model Agents Pass Benchmarks but Fail in Production
- jun 07 industry Vercel's Rox Case Study Pitches AI Agents as a Revenue Operating System
- jun 07 industry AI Patent Valuation Models Aim to Replace the Expert Appraiser
- jun 06 policy Data Safety Policies for AI Agents: Controlling What an Agent Can Leak
- jun 06 agents Can AI Agents Repair Broken Network Configs? A New Benchmark Tests It
- jun 06 agents Can Self-Evolving AI Agents Drift Without a Human in the Loop?
- jun 06 culture A Covert LLM Persuasion Experiment Was Shut Down: How Far Did the Bots Get?
- jun 06 infra Indexing Images for RAG: kapa.ai's Approach to Multimodal Retrieval
- jun 06 models Can LLMs Leak Training Data? A New Test Splits Capacity From Intent
- jun 06 policy GDPR Rectification Rights Have No Clear Owner in ML Supply Chains
- jun 06 security Benchmarking RAG Over Cyber Threat Intelligence: Where Retrieval Breaks
- jun 06 models When an AI Agent's Tools Break, Can It Recover? A New Benchmark
- jun 06 industry US Hyperscale Data Centers: A Carbon Audit That Recasts AI Power Costs
- jun 05 infra The RTX Spark Bet on Unified Memory for Local LLMs: Where Bandwidth Caps It
- jun 05 infra Reading Vercel's Fluid Compute vs Cloudflare Workers Benchmark
- jun 05 agents Fine-Tuning Multi-Agent LLM Systems: RL Enters Where Prompt Tweaks Stall
- jun 05 security Stronger Safety Alignment Made LLMs Easier to Jailbreak, Not Harder
- jun 05 security SAML Signature Bypass Is Back: Inside the SAMLStorm Vulnerability Class
- jun 05 policy When LLM Safety Lives at Inference, Not Training: A Certification Gap
- jun 05 culture Do LLMs Understand Idioms in Low-Resource Languages?
- jun 05 infra Does CUDA Tile Match Hand-Tuned Kernels on Hopper and Blackwell?
- jun 05 security SAMLStorm: The SAML Signature Bug That Forges Valid SSO Logins
- jun 05 models MiniMax M3 Bets on Sparse Attention for 1M Context. Does the Math Hold?
- jun 05 models Can One Model Handle Every CAD Task? UniCAD Tests It
- jun 05 models Do Foundation Models Actually Learn Relational Structure In-Context?