Groundy — independent coverage of developer tools, infrastructure, and platforms
AI Agent Alignment Tests Are One-Shot. A New Benchmark Catches Multi-Step Failures
MoralityGym proves AI agents pass one-shot alignment checks but drift toward violations across multi-step trajectories, a failure mode red-team prompt batteries cannot detect.
oss Files.md Bets on Plain Markdown Folders as the Obsidian Exit Ramp
Files.md's HN traction exposed that Obsidian's core is closed-source, reframing its plugin ecosystem as a migration tax for developers evaluating plain-markdown alternatives.
Green Card Rule Change Forces Tech Workers to Leave the US to Apply
A May 22 USCIS memo eliminates standard in-country green card processing, forcing temporary visa holders into consular processing abroad with no guaranteed return.
culture Microsoft's Own Numbers: AI Agents Cost More Per Task Than the Human Employees They Replace
Microsoft's internal data shows agentic AI workflows consume 1000x more tokens than chat, with costs varying 30x between identical runs and zero correlation to output quality.
policy Microsoft's Own Numbers Now Show AI Agents Cost More Than the Humans They Replaced
Microsoft's internal data shows token-burning AI agents now exceed the all-in cost of human labor, giving procurement teams vendor-supplied evidence to challenge 2027 renewal.
industry OpenAI Hires Slack's Denise Dresser as CRO, Conceding Enterprise Growth Needs a Sales Org
OpenAI hiring Salesforce veteran Denise Dresser as CRO signals a shift from product-led growth to field sales, driven by IPO pressure and Anthropic's enterprise momentum.
security OpenAI Ships Lockdown Mode and Elevated Risk Labels for ChatGPT Sessions
OpenAI's Lockdown Mode kills ChatGPT network exfiltration paths at the infrastructure layer, conceding that model-level filtering cannot stop prompt injection at scale.
culture Trump Ends Domestic Green Card Filing: Applicants Must Now Leave the US to Apply
A May 22 USCIS memo closes the domestic adjustment-of-status path, requiring H-1B, L-1, and F-1 visa holders to leave the US and refile through consular processing abroad.
- 01 devtools GitHub Copilot vs Cursor vs Claude Code: The 2026 AI Coding Showdown
- 02 devtools Claude Code Plugins: Anthropic's Official Plugin Ecosystem Explained
- 03 models Chinese AI Models Compared: DeepSeek, Qwen, Kimi, Doubao, and Ernie
- 04 infra MLX vs llama.cpp on Apple Silicon: Which Runtime to Use for Local LLM Inference
- 05 models AI Code Generation Benchmarks 2026: Which Model Actually Writes Better Code?
- 06 infra Prefill-Decode Disaggregation: The Architecture Shift Redefining LLM Serving at Scale
- 07 policy Atlassian Turned On AI Training Data Collection by Default — Here's What to Disable
- 08 industry Cursor's Meteoric Rise: Inside the AI Editor Hitting $300M ARR
- 09 devtools Claude Code in GitHub Actions: A Complete Guide to Automated PR Fixes
- 10 devtools GitHub Copilot's Opus 4.7 Multiplier: 7.5x to 15x to 27x in 60 Days
- may 17 Claude Code Adds Plugin Dependency Enforcement: disable Now Refuses to Break Transitive Chains
- may 23 policy AI Agent Alignment Tests Are One-Shot. A New Benchmark Catches Multi-Step Failures
- may 23 oss Files.md Bets on Plain Markdown Folders as the Obsidian Exit Ramp
- may 23 industry Green Card Rule Change Forces Tech Workers to Leave the US to Apply
- may 23 culture Microsoft's Own Numbers: AI Agents Cost More Per Task Than the Human Employees They Replace
- may 23 policy Microsoft's Own Numbers Now Show AI Agents Cost More Than the Humans They Replaced
- may 23 industry OpenAI Hires Slack's Denise Dresser as CRO, Conceding Enterprise Growth Needs a Sales Org
- may 23 security OpenAI Ships Lockdown Mode and Elevated Risk Labels for ChatGPT Sessions
- may 23 culture Trump Ends Domestic Green Card Filing: Applicants Must Now Leave the US to Apply
- may 23 culture US Researchers Hit With New Federal Limits on Publishing With Foreign Collaborators
- may 23 infra What Cloudflare's Q1 2026 Outage Data Says About Designing for State-Level Shutdowns
- may 22 models A Theory of Time-Sensitive Language Generation Says Sparse Hallucination Beats Mode Collapse
- may 22 models arXiv 2605.16428 Measures AI Search's Drag on Publisher Traffic Using Paired Google and Reddit Data
- may 22 agents Beyond Text-to-SQL: New Agentic Architecture Routes Enterprise Analytics Through Governed APIs
- may 22 policy CISA's Own Data Leak Has Lawmakers Demanding Answers About the Voluntary Threat-Sharing Pact
- may 22 devtools Cursor's In-House Model Changes the Vendor Calculus for AI Coding Teams
- may 22 devtools Deno 2.8 Lands as Bun Gets Deprecated by yt-dlp: The JavaScript Runtime Field Is Reshuffling
- may 22 culture Employer-Side Law Firms Create a Structural Asymmetry in US Organizing Drives
- may 22 devtools Google Sunsets Gemini CLI on June 18: Forced Migration to Antigravity CLI Breaks Existing Automation
- may 22 agents GraphFlow Lifts LLM-Agent Workflows Into Schedulable Graphs to Optimize Serving
- may 22 agents Learning to Configure Agentic AI Systems Exposes a Gap in CrewAI and AutoGen Template Libraries
- may 22 devtools Malicious VSCode Extension Hit 3,800 Repos: What GitHub's Marketplace Trust Model Actually Verifies
- may 22 security AI Jailbreaks Are Now a Reasoning Problem, Not a Prompt Problem
- may 22 industry Microsoft and Uber's AI Agent Bills Expose a Per-Token Pricing Problem
- may 22 agents Microsoft's 2026 Cost Math Forces CrewAI and LangGraph Users to Audit Token Spend Per Agent
- may 22 policy NIH Demands Advance Clearance for Foreign Co-Authors Without a Published Rule
- may 22 oss Nx Console 18.95.0 Compromise Hides a Multi-Stage Credential Stealer in an Orphan Commit
- may 22 security OpenAI's New Agent Defense Post Concedes Prompt Injection Is Architectural, Not Patchable
- may 22 industry OpenAI's S-1 Will Force the First Public Audit of LLM Inference Margins
- may 22 industry OpenAI's S-1 Will Have to Define AGI for SEC Reviewers, Not Just Investors
- may 22 agents PBT-Bench Asks Whether AI Coding Agents Can Actually Write Property-Based Tests
- may 22 infra Railway's May 19 GCP Suspension Exposes the Single-Account Risk Underneath Every Reseller PaaS
- may 22 security Jailbreak Defense Now Lives in Model Weights, Not in Prompt Filters
- may 22 agents AI Agents That Learn New Skills Without a Human Curator
- may 22 agents SpecBench Catches Long-Horizon Coding Agents Gaming Reward Signals
- may 22 agents SpecBench Exposes Reward Hacking in Long-Horizon Coding Agents
- may 22 security Vercel Blocks Deploys With Vulnerable next-mdx-remote by Default: Platform Mitigation Outpaces the CVE Cycle
- may 22 security Vercel's Next.js Middleware Bypass Postmortem: What the Fix Reveals About Edge Runtime Auth
- may 22 infra vLLM 0.21 Makes Prefill-Decode Disaggregation Actually Practical
- may 22 security When Stronger Backdoor Triggers Backfire: An arXiv Theory Paper Inverts a Core Defense Assumption
- may 18 agents A New Trust Schema Exposes Why Agent Skill Registries Fail Enterprise Audit Requirements
- may 18 industry Anthropic Passes OpenAI in US Business Adoption, But Per-Token Billing Shifts Cost Risk to Buyers
- may 18 industry Anthropic Ships 10 Finance Agents With Moody's 600M-Company Credit Data and Expanded Microsoft 365 Integration
- may 18 industry Bret Taylor's Sierra Raises $950M at $15B, Claims 40% of Fortune 50 Use Its Agents
- may 18 infra DMax Hits 1,338 Tokens/Sec on 2x H200: Parallel Decoding Pushes dLLM Serving Past the Autoregressive Bar
- may 18 infra KV Cache Offloading Breaks on Text2JSON: Why Llama 3 and Qwen 3 Lose Accuracy on Context-Intensive Prompts
- may 18 policy Maryland Enacts First US Ban on Algorithmic Grocery Pricing, Effective Immediately
- may 18 models The Last Word Often Wins: A Format Confound Inflates Chain-of-Thought Corruption Robustness Scores
- may 18 agents Trojan Hippo Plants Dormant Payloads in Agent Memory, Hits 85-100% Exfiltration on Frontier Models
- may 17 culture AB 566 Forces Chrome and Safari to Ship Opt-Out Signals by 2027 — Then Shields Them from Google's 86% GPC Failure
- may 17 industry AI Was Cited in 26% of Challenger's April Layoffs. UBS Notes the Series Captures 5% of US Job Flow