groundy

infrastructure & runtime

30 articles

Top in infrastructure & runtime

infra

Cloudflare Flagship Is a Feature Flag Service That Deepens Platform Gravity

Cloudflare's Flagship is a feature flag service with a native Workers binding that replaces third-party flag providers, consolidating more of the edge stack under one vendor.

infra

Why LLMs Still Botch Kubernetes Manifests: The Training-Data Gap

A 1.5B-parameter model hits 91.5% on Kubernetes YAML generation, but the remaining failures are syntactically valid manifests that deploy and quietly violate cluster intent.

infra

ObjectCache Moves KV Reuse to S3-Class Storage: Why Layerwise Retrieval Beats Full-Prefix Cache Hits

ObjectCache retrieves the KV cache layer by layer from S3, adding 5.6% to TTFT at 64K context but 56-75 ms at 4K. Long-context deployments where DRAM is the bottleneck benefit most.

infra

Vercel's CDN Origin Timeout Jumps to 2 Minutes: A Concession to LLM Streaming Workloads

Vercel raised its CDN origin timeout from 30s to 120s to support LLM streaming, removing a constraint that forced teams to route AI traffic through separate infrastructure.

infra

Fluid Compute vs PgBouncer: Vercel's Undocumented Bet on Connection Reuse

Vercel says Fluid Compute holds Postgres connections open across requests, potentially eliminating PgBouncer, but the claim lacks published technical specs.

infra

Vercel Fluid Pools Database Connections Across Invocations, Bypassing External Poolers

Fluid Compute reuses Postgres connections across warm invocations via attachDatabasePool, dropping the pooler for simple apps but not for shared-database architectures.


  1. may 25 infra Railway's GCP Suspension Is a Reseller PaaS Problem, Not a Google One
  2. may 24 infra Vercel CDN Request Collapsing: One Origin Fetch Per ISR Cache Miss
  3. may 24 infra CISA Admin Leaked AWS GovCloud Keys on GitHub: What Federal Secret Scanning Missed
  4. may 23 infra What Cloudflare's Q1 2026 Outage Data Says About Designing for State-Level Shutdowns
  5. may 22 infra Railway's May 19 GCP Suspension Exposes the Single-Account Risk Underneath Every Reseller PaaS
  6. may 22 infra vLLM 0.21 Makes Prefill-Decode Disaggregation Actually Practical
  7. may 18 infra DMax Hits 1,338 Tokens/Sec on 2x H200: Parallel Decoding Pushes dLLM Serving Past the Autoregressive Bar
  8. may 17 infra Kioxia and Dell's 10 PB in 2RU: What Storage Density Means for Cluster Power and Rebuild Windows
  9. may 17 infra KV Cache Offloading Breaks on Context-Intensive Tasks: Text2JSON Exposes the Landmark Failure Mode
  10. apr 28 infra Crawshaw's 'I Am Building a Cloud': What a Tailscale Co-Founder's Solo Stack Implies for Platform Teams
  11. apr 23 infra UCCL-Zip: Lossless Compression for NCCL, 47.5% Faster RL Sync, 10% Lower vLLM Latency
  12. apr 22 infra Ingress-Nginx Is Dead, Not Deprecated: Final CVE Patches Shipped, But Platform Teams Need a Migration Plan
  13. mar 23 infra MLX vs llama.cpp on Apple Silicon: Which Runtime to Use for Local LLM Inference
  14. mar 23 infra Prefill-Decode Disaggregation: The Architecture Shift Redefining LLM Serving at Scale
  15. mar 14 infra Google LiteRT: Running LLMs on Your Phone Without the Cloud
  16. mar 12 infra Microsoft's BitNet: How 1-Bit LLMs Could Make GPU Farms Obsolete
  17. mar 26 infra OpenRAG: The Open-Source RAG Platform Challenging Pinecone
  18. feb 27 infra WebAssembly AI: Running Models in the Browser
  19. feb 18 infra DNS-Persist-01 Validation: Let's Encrypt's Model for Permanent ACME Certificate Authorization
  20. feb 18 infra Tailscale Peer Relays: The Missing Piece for True P2P Networking
  21. feb 14 infra Perplexity API: Adding Real-Time Search to Your Apps in Minutes
  22. feb 11 infra The Complete Guide to Local LLMs in 2026

Production AI runs on infrastructure that was never designed for it. Inference serving is a moving target as prefill and decode pull apart onto different hardware, KV caches spill into tiered storage, and collective communication libraries get rewritten to claw back bandwidth. Every benchmark win on synthetic workloads has to survive long-context synthesis, multi-tenant interference, and the unglamorous math of tokens-per-dollar before it counts.

The fabric underneath is just as contested. Vector databases are converging with the OLTP stack, serverless runtimes are quietly absorbing what connection poolers used to own, and overlay networks keep colliding with cloud-provider NAT and egress policy in ways that turn architecture diagrams into invoices. Storage density is outrunning rebuild windows, forcing erasure-coding choices that used to be theoretical. Cheaper-inference research keeps threatening the assumption that scale must mean GPU farms, while denser GPU farms keep proving it.

This beat covers that tension on the merits. We track serving architectures, networking and peering economics, retrieval and caching layers, GPU and storage hardware, and the cloud-account dependencies that quietly underwrite the whole stack. We compare vendor claims against published numbers, flag when a throughput headline hides a quality regression, and pay attention to the boring failure modes that take down platforms more often than the exciting ones do.