Groundy — independent coverage of developer tools, infrastructure, and platforms

  1. jun 24 models Flow Matching vs U-Net: A Skip-Free Backbone for Speech Models
  2. jun 24 security Measuring LLM Safety by Refusal Alignment Instead of Attack Success Rate
  3. jun 24 security Poisoning Physics-Informed Neural Networks Slips Past Loss-Based Validation
  4. jun 24 policy 50 Years of Aviation Certification Expose a Structural Gap in AI Governance
  5. jun 24 security Catching LLM Jailbreaks by Watching Per-Layer Entropy, Not Outputs
  6. jun 24 oss Cost and Access, Not Ideology, Drive Open-Weight Chinese Model Adoption
  7. jun 24 models A Per-Neuron Sequence Model Was Withdrawn From arXiv as Coverage Hailed It
  8. jun 24 policy Do Reasoning Tokens Actually Make LLMs Safer? A New Paper Tests It
  9. jun 24 devtools Nub Bundles a Bun-Style Toolkit Onto Node Without the Runtime Swap
  10. jun 24 oss Bot-Account Lookups Miss 97% of AI Coding Agent Commits, 180M-Repo Census Finds
  11. jun 24 security How Reliable Are the LLM Judges Scoring Jailbreak Attacks?
  12. jun 24 models PV-TAM Corrects Decoding Drift and Boundary-Marker Bias in VLM Localization Scoring
  13. jun 24 agents Do AGENTS.md Files Actually Help Coding Agents? A New Benchmark Tests It
  14. jun 24 agents Should AI Shopping Agents Pay Micro-Transactions for Verified Product Data?
  15. jun 24 models Meituan's General 365 Benchmark: Top Models All Score Under 63%
  16. jun 24 models LLM Surrogates in A/B Tests: The 39% Recovery Gap and the Silent Bias Risk
  17. jun 24 models LLM Token Pricing vs Compute Cost: What the Tokenomics Math Shows
  18. jun 24 models Do LLM Judges Favor Their Own Output? A Sanity Check on Self-Preference
  19. jun 23 agents Can a Conversational Graph Compile Into a Goal-Oriented Dialogue Runtime?
  20. jun 23 security Auto-Reproducing Text-to-Image Jailbreaks From Papers: The PixJail Pipeline
  21. jun 23 agents Can a Cryptographic Certificate Prove an AI Agent's Output Is Valid?
  22. jun 23 infra Vercel on the AWS Marketplace: What the Listing Does to Procurement and Lock-In
  23. jun 23 policy Machine-Readable AI Usage Terms: Does ODRL's Permission Model Hold Up?
  24. jun 23 agents CrewAI vs AutoGen vs Microsoft Agent Framework: AutoGen's Merger Reframes the 2026 Choice
  25. jun 23 devtools Vercel Now Deploys Long-Running Node Servers: The Serverless Boundary Shifts
  26. jun 23 policy Who Audits the Safety Rules an LLM Agent Evolves for Itself?
  27. jun 23 agents Can You Trust an LLM Judge to Grade an Agentic Data Analysis System?
  28. jun 23 agents Do LLM Agent Societies Develop Their Own Authority Hierarchies?
  29. jun 23 infra Serving Cold MoE Models: CrossPool Disaggregates KV Cache and Weights
  30. jun 23 security Vercel BotID's Telemetry Is a Threat Intelligence Feed Most Teams Discard
  31. jun 23 policy When Vibe-Coded Software Is Safety-Critical, Who Verifies It?
  32. jun 23 security Extracting Unseen Training Data From an LLM by Poisoning Its Loss Landscape
  33. jun 23 agents Do Retrieval Metrics Predict Tool-Use Agent Success? A Paper Says No
  34. jun 23 infra Vercel's In-Function Concurrency: What It Does to Cold Starts and Billing
  35. jun 23 policy Can You Trust an AI Robustness Certificate? A Paper Says Verify It
  36. jun 23 agents Can You Pinpoint Which Step Broke a Long-Horizon AI Agent?
  37. jun 23 industry Vercel's Series D Thesis Hardened Into a Whole-Stack Lock-In
  38. jun 23 devtools make-look-scanned Simulates Scans in an Offline WASM File, Exposing PDF Provenance as a Pixel Check
  39. jun 23 infra Poisoning a RAG Retriever: How Conflict-Aware Edits Inject False Knowledge
  40. jun 23 models Can AI Write CAD Programs? CADBench Measures the Gap
  41. jun 23 infra Vercel Raised Its CDN Origin Timeout to Two Minutes: What Breaks First
  42. jun 23 infra Gradio-Lite Runs Model Inference in the Browser via Pyodide, No Server
  43. jun 23 devtools Vercel's Billing Usage API: Wiring Cost Data Into CI Cost Gates
  44. jun 23 infra Cloudflare AI Gateway Adds Spend Limits to Cap the Runaway Inference Bill
  45. jun 23 infra Vercel Now Honors stale-if-error: Serving Stale Cache When the Origin Dies
  46. jun 23 models ByteDance's Doubao 2.1 Pro vs GPT-5.5: Reading Self-Reported Benchmarks
  47. jun 22 policy Can a Benchmark Catch When AI Discharge Summaries Drop Care Steps?
  48. jun 22 devtools Vercel CLI Now Scopes Commands to the Local Directory: Audit Your CI Scripts
  49. jun 22 security React Router CVE-2025-31137: Vercel's Edge Fix Is Not the Patch
  50. jun 22 infra Vercel's Manual CDN Purge API: Cache Control Without a Redeploy