groundy

all articles


  1. jun 13 security AMD Took 124 Days to Patch the RCE It First Called Out of Scope
  2. jun 12 policy US Export Order Forces Anthropic to Disable Fable 5 and Mythos 5 Worldwide
  3. jun 10 models Claude Fable 5 Benchmarks: What FrontierCode, CursorBench, and ViBench Show
  4. jun 11 agents Computer-Use Agents Fabricate Success on 8 to 33 Percent of Long-Horizon Tasks
  5. jun 10 infra Running RAG on a Snapdragon NPU: The On-Device Retrieval Tradeoff
  6. jun 10 models Does Attribution Patching Lie? A Fix for a Common Interpretability Shortcut
  7. jun 11 models Can You Make a Multimodal Model Unlearn With Activation Steering?
  8. jun 11 models Why Pruning a Model Can Raise Its Out-of-Distribution Accuracy
  9. jun 11 industry Vercel's Turborepo: Build Speed Becomes a Hosting-Vendor Feature
  10. jun 10 security OpenAI Frames Instruction Hierarchy as an Open Challenge, Not a Prompt-Injection Fix
  11. jun 10 devtools JetBrains Mellum2: A 12B Open-Weights Code Model for Self-Hosted Completion
  12. jun 09 models Do Unified Multimodal Models Actually Interleave Understanding and Generation?
  13. jun 09 agents Can AI Agents Share Context Without a Central Coordinator?
  14. jun 09 agents Why Skill Creation and Reward Optimization Collide in Agentic RL
  15. jun 09 infra GraphRAG vs VectorRAG: Does the Graph Index Earn Its Cost?
  16. jun 09 models How LLMs Track Who Did What: The Entity Rebinding Circuit
  17. jun 09 devtools Vercel's Chat SDK Targets Every Chat Platform From One Codebase
  18. jun 09 infra MiniMax M3 Ships 1M Context and Desktop Control as Open Weights
  19. jun 09 devtools NPM v12 Breaking Changes: Auditing Your Lockfiles Before the Upgrade
  20. jun 09 infra DeepSeek-V4 FlashMemory: Sparse Attention for Million-Token Context
  21. jun 09 agents When AI Agents Delegate Work, Your Observability Stack Goes Blind
  22. jun 09 models Claude Fable 5 vs Opus 4.8: When 2x Pricing Is Worth It
  23. jun 09 models Claude Mythos 5 Access Rules: Who Gets Project Glasswing and Why
  24. jun 09 policy Fable 5 Biology Classifiers: How Flagged Prompts Fall Back to Opus 4.8
  25. jun 09 industry Fable 5 Credit Cliff: What the June 23 Billing Shift Means for Teams
  26. jun 09 models Fable 5 Distillation Protection: How Anthropic Blocks Model Copying
  27. jun 09 models Skip Fable 5 or Upgrade? When Opus 4.8 and Sonnet 4.6 Are Still Enough
  28. jun 08 security Skill Injection: Hiding Undetectable Instructions in What an AI Agent Loads
  29. jun 08 models LLM Steganography: Can Defenders Detect Payloads Hidden in Model Output?
  30. jun 08 policy Who Gets to Audit Your Health Chatbot? Almost No One
  31. jun 08 policy Do Word-Subset Explanations Satisfy the EU AI Act's Transparency Rule?
  32. jun 08 infra Is Cloudflare's Bot Traffic Surge Real? The Measurement Dispute
  33. jun 08 industry OpenAI Pushes ChatGPT Into Compensation Data, Pressuring Mercer and Radford
  34. jun 08 policy Bit-Exact Inference Verification Gives AI Audits a Proof Mechanism
  35. jun 08 models Do Privacy Defenses Actually Protect Fine-Tuned LLMs? A New Benchmark
  36. jun 08 models Can You Reconstruct an LLM's System Prompt From Its Activations?
  37. jun 08 policy Can a Robot's Own Attention Flag Its Unsafe Actions Before They Run?
  38. jun 08 devtools Can a CLI Replace Screenshots for GUI Automation Agents?
  39. jun 08 agents Bloomberg's Pomona Makes Small Automated Code Changes, Not Big Agent PRs
  40. jun 08 agents Agent Tool-Gating Moves From Prompt Rules to Learned Policies
  41. jun 08 culture Does Debate Quality Survive When LLMs Argue Outside English?
  42. jun 08 security Splitting a Malicious Task Across Tool Calls Slips Past LLM Agent Guardrails
  43. jun 08 agents More Capable LLMs Cooperate Less in Zero-Cost Collaboration Tests
  44. jun 08 policy Can One Safety Adapter Realign Every Fine-Tuned LLM?
  45. jun 08 industry Bending Spoons Files to IPO: The App Roll-Up Playbook Goes Public
  46. jun 08 devtools How Cursor Uses GPT-5: What OpenAI's Writeup Tells Coding Teams
  47. jun 08 oss DuckDB Queries Hugging Face Parquet Files Over HTTP Without Downloads
  48. jun 08 models Does Softmax Normalization Limit What Attention Can Represent?
  49. jun 08 infra Huawei's KVarN Puts KV-Cache Quantization Inside vLLM's Backend
  50. jun 07 policy Can AI Be Aligned Without Modeling Human Cognitive Diversity?
  51. jun 07 models Can an Attacker Steal Your Model's Last Layer From Its Outputs?
  52. jun 07 policy Is the Pentagon's Software Pathway Ready to Buy AI Systems?
  53. jun 07 security Web Agents Can Be Talked Into Abandoning Their Task: The TRAP Benchmark
  54. jun 07 security Shallow Neural Nets Beat LLM Guardrails at Catching Prompt Injection
  55. jun 07 security When an AI Agent Clicks a Link: OpenAI's Data-Exfiltration Model
  56. jun 07 agents Why Foundation Model Agents Pass Benchmarks but Fail in Production
  57. jun 07 industry Vercel's Rox Case Study Pitches AI Agents as a Revenue Operating System
  58. jun 07 industry AI Patent Valuation Models Aim to Replace the Expert Appraiser
  59. jun 06 policy Data Safety Policies for AI Agents: Controlling What an Agent Can Leak
  60. jun 06 agents Can AI Agents Repair Broken Network Configs? A New Benchmark Tests It
  61. jun 06 agents Can Self-Evolving AI Agents Drift Without a Human in the Loop?
  62. jun 06 culture A Covert LLM Persuasion Experiment Was Shut Down: How Far Did the Bots Get?
  63. jun 06 infra Indexing Images for RAG: kapa.ai's Approach to Multimodal Retrieval
  64. jun 06 models Can LLMs Leak Training Data? A New Test Splits Capacity From Intent
  65. jun 06 policy GDPR Rectification Rights Have No Clear Owner in ML Supply Chains
  66. jun 06 security Benchmarking RAG Over Cyber Threat Intelligence: Where Retrieval Breaks
  67. jun 06 models When an AI Agent's Tools Break, Can It Recover? A New Benchmark
  68. jun 06 industry US Hyperscale Data Centers: A Carbon Audit That Recasts AI Power Costs
  69. jun 05 infra The RTX Spark Bet on Unified Memory for Local LLMs: Where Bandwidth Caps It
  70. jun 05 infra Reading Vercel's Fluid Compute vs Cloudflare Workers Benchmark
  71. jun 05 agents Fine-Tuning Multi-Agent LLM Systems: RL Enters Where Prompt Tweaks Stall
  72. jun 05 security Stronger Safety Alignment Made LLMs Easier to Jailbreak, Not Harder
  73. jun 05 security SAML Signature Bypass Is Back: Inside the SAMLStorm Vulnerability Class
  74. jun 05 policy When LLM Safety Lives at Inference, Not Training: A Certification Gap
  75. jun 05 culture Do LLMs Understand Idioms in Low-Resource Languages?
  76. jun 05 infra Does CUDA Tile Match Hand-Tuned Kernels on Hopper and Blackwell?
  77. jun 05 security SAMLStorm: The SAML Signature Bug That Forges Valid SSO Logins
  78. jun 05 models MiniMax M3 Bets on Sparse Attention for 1M Context. Does the Math Hold?
  79. jun 05 models Can One Model Handle Every CAD Task? UniCAD Tests It
  80. jun 05 models Do Foundation Models Actually Learn Relational Structure In-Context?
  81. jun 05 models Can LLMs Write Better Research Paper Titles Than Authors?
  82. jun 05 models Does Information-Theoretic Example Selection Beat kNN for In-Context Learning?
  83. jun 05 infra Pod-Level Remote Attestation in Kubernetes: Confidential Workloads on dstack
  84. jun 05 models Do Concept Bottleneck Model Benchmarks Measure Interpretability or Dataset Bias?
  85. jun 05 agents Cascading Hallucination in Agentic RAG: When One Bad Retrieval Poisons the Chain
  86. jun 05 security Vercel's Flags SDK Exposed Feature-Flag Definitions via CVE-2025-46332
  87. jun 05 models Continuous Bit-Width Quantization vs Fixed INT4: Does LiftQuant Beat Discrete?
  88. jun 04 models Federated Learning for Industrial IoT Anomaly Detection: The Data-Locality Tradeoff
  89. jun 04 infra Generating GPU Kernels for Moore Threads Silicon: Can LLMs Break CUDA Lock-In?
  90. jun 04 devtools Alibaba's Open Code Review Moves AI Review Into the CLI, Not the PR
  91. jun 04 infra Microsoft's Azure Linux Goes General-Purpose: The Container Base-Image Play
  92. jun 04 models Reading Failed LLM Reasoning Traces Won't Tell You Which Ones RL Can Fix
  93. jun 04 agents Can AI Agents Build Other Agents? The Meta-Agent Challenge Says Mostly Not Yet
  94. jun 04 models Can You Stitch Two Foundation Models Together Without Retraining?
  95. jun 04 infra Cloudflare Acquires VoidZero, the Company Behind Vite's Rust Toolchain
  96. jun 04 security Jailbreak Suffixes Hit Harder at Specific Token Positions, New GCG Variant Shows
  97. jun 04 policy When Should an LLM Forget You? A Benchmark for Deciding What Memory to Drop
  98. jun 04 security OpenAI Adds Lockdown Mode to ChatGPT, Shifting Prompt-Injection Risk to Users
  99. jun 04 policy When RL Training Rewards Capability-Seeking: A New Alignment Risk
  100. jun 04 models Do Reasoning LLMs Waste Tokens? OckBench Tries to Measure It