Groundy — independent coverage of developer tools, infrastructure, and platforms

  1. jun 07 security Web Agents Can Be Talked Into Abandoning Their Task: The TRAP Benchmark
  2. jun 07 security Shallow Neural Nets Beat LLM Guardrails at Catching Prompt Injection
  3. jun 07 security When an AI Agent Clicks a Link: OpenAI's Data-Exfiltration Model
  4. jun 07 agents Why Foundation Model Agents Pass Benchmarks but Fail in Production
  5. jun 07 industry Vercel's Rox Case Study Pitches AI Agents as a Revenue Operating System
  6. jun 07 industry AI Patent Valuation Models Aim to Replace the Expert Appraiser
  7. jun 06 policy Data Safety Policies for AI Agents: Controlling What an Agent Can Leak
  8. jun 06 agents Can AI Agents Repair Broken Network Configs? A New Benchmark Tests It
  9. jun 06 agents Can Self-Evolving AI Agents Drift Without a Human in the Loop?
  10. jun 06 culture A Covert LLM Persuasion Experiment Was Shut Down: How Far Did the Bots Get?
  11. jun 06 infra Indexing Images for RAG: kapa.ai's Approach to Multimodal Retrieval
  12. jun 06 models Can LLMs Leak Training Data? A New Test Splits Capacity From Intent
  13. jun 06 policy GDPR Rectification Rights Have No Clear Owner in ML Supply Chains
  14. jun 06 security Benchmarking RAG Over Cyber Threat Intelligence: Where Retrieval Breaks
  15. jun 06 models When an AI Agent's Tools Break, Can It Recover? A New Benchmark
  16. jun 06 industry US Hyperscale Data Centers: A Carbon Audit That Recasts AI Power Costs
  17. jun 05 infra The RTX Spark Bet on Unified Memory for Local LLMs: Where Bandwidth Caps It
  18. jun 05 infra Reading Vercel's Fluid Compute vs Cloudflare Workers Benchmark
  19. jun 05 agents Fine-Tuning Multi-Agent LLM Systems: RL Enters Where Prompt Tweaks Stall
  20. jun 05 security Stronger Safety Alignment Made LLMs Easier to Jailbreak, Not Harder
  21. jun 05 security SAML Signature Bypass Is Back: Inside the SAMLStorm Vulnerability Class
  22. jun 05 policy When LLM Safety Lives at Inference, Not Training: A Certification Gap
  23. jun 05 culture Do LLMs Understand Idioms in Low-Resource Languages?
  24. jun 05 infra Does CUDA Tile Match Hand-Tuned Kernels on Hopper and Blackwell?
  25. jun 05 security SAMLStorm: The SAML Signature Bug That Forges Valid SSO Logins
  26. jun 05 models MiniMax M3 Bets on Sparse Attention for 1M Context. Does the Math Hold?
  27. jun 05 models Can One Model Handle Every CAD Task? UniCAD Tests It
  28. jun 05 models Do Foundation Models Actually Learn Relational Structure In-Context?
  29. jun 05 models Can LLMs Write Better Research Paper Titles Than Authors?
  30. jun 05 models Does Information-Theoretic Example Selection Beat kNN for In-Context Learning?
  31. jun 05 infra Pod-Level Remote Attestation in Kubernetes: Confidential Workloads on dstack
  32. jun 05 models Do Concept Bottleneck Model Benchmarks Measure Interpretability or Dataset Bias?
  33. jun 05 agents Cascading Hallucination in Agentic RAG: When One Bad Retrieval Poisons the Chain
  34. jun 05 security Vercel's Flags SDK Exposed Feature-Flag Definitions via CVE-2025-46332
  35. jun 05 models Continuous Bit-Width Quantization vs Fixed INT4: Does LiftQuant Beat Discrete?
  36. jun 04 models Federated Learning for Industrial IoT Anomaly Detection: The Data-Locality Tradeoff
  37. jun 04 infra Generating GPU Kernels for Moore Threads Silicon: Can LLMs Break CUDA Lock-In?
  38. jun 04 devtools Alibaba's Open Code Review Moves AI Review Into the CLI, Not the PR
  39. jun 04 infra Microsoft's Azure Linux Goes General-Purpose: The Container Base-Image Play
  40. jun 04 models Reading Failed LLM Reasoning Traces Won't Tell You Which Ones RL Can Fix
  41. jun 04 agents Can AI Agents Build Other Agents? The Meta-Agent Challenge Says Mostly Not Yet
  42. jun 04 models Can You Stitch Two Foundation Models Together Without Retraining?
  43. jun 04 infra Cloudflare Acquires VoidZero, the Company Behind Vite's Rust Toolchain
  44. jun 04 security Jailbreak Suffixes Hit Harder at Specific Token Positions, New GCG Variant Shows
  45. jun 04 policy When Should an LLM Forget You? A Benchmark for Deciding What Memory to Drop
  46. jun 04 security OpenAI Adds Lockdown Mode to ChatGPT, Shifting Prompt-Injection Risk to Users
  47. jun 04 policy When RL Training Rewards Capability-Seeking: A New Alignment Risk
  48. jun 04 models Do Reasoning LLMs Waste Tokens? OckBench Tries to Measure It
  49. jun 04 security Activation Steering Was Sold as LLM Control. New Work Makes It an Attack Surface
  50. jun 04 culture Can Teaching Logical Fallacies Inoculate People Against AI Misinformation?