Infrastructure & Runtime

Inference, serving, RAG, vector DBs, edge deployment, and hardware.

33 articles exploring Infrastructure & Runtime. Expert analysis and insights from our editorial team.

Showing 16–30 of 33 articles · Page 2 of 3

Latest in Infrastructure & Runtime

16

Prefill-Decode Disaggregation: The Architecture Shift Redefining LLM Serving at Scale

Prefill-decode disaggregation runs the compute-bound prefill phase and the memory-bound decode phase on dedicated hardware pools, eliminating interference between the two.

9 min read
17

Google LiteRT: Running LLMs on Your Phone Without the Cloud

Google's LiteRT (formerly TensorFlow Lite) is now the production backbone for on-device GenAI across Android, Chrome, and Pixel devices. Here's what it means for developers building AI apps that run privately, without the cloud.

8 min read
18

Microsoft's BitNet: How 1-Bit LLMs Could Make GPU Farms Obsolete

Microsoft's BitNet inference framework runs billion-parameter LLMs on ordinary CPUs using ternary weights, delivering up to 6x faster inference and 82% lower energy consumption—potentially upending the assumption that AI inference requires expensive GPU hardware.

7 min read
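The ternary-weight idea behind BitNet can be sketched in a few lines. The b1.58 papers describe absmean quantization: each weight matrix is scaled by its mean absolute value, then rounded into {-1, 0, +1}. A minimal NumPy illustration (not Microsoft's implementation; the function names here are ours):

```python
import numpy as np

def ternary_quantize(w):
    # Absmean scaling: divide by the mean absolute value of the matrix,
    # then round each weight to the nearest value in {-1, 0, +1}.
    scale = np.mean(np.abs(w)) + 1e-8
    q = np.clip(np.round(w / scale), -1, 1)
    return q.astype(np.int8), scale

def dequantize(q, scale):
    # Reconstruct an approximate float matrix from the ternary codes.
    return q.astype(np.float32) * scale

w = np.array([[0.4, -1.2, 0.05],
              [0.9, -0.1, -0.7]], dtype=np.float32)
q, s = ternary_quantize(w)
```

Because the weights collapse to {-1, 0, +1}, matrix multiplies reduce to additions and subtractions of activations, which is where the claimed CPU speedups and energy savings come from.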
19

WebAssembly AI: Running Models in the Browser

WebAssembly enables production-ready AI inference directly in the browser—no server required. Learn how WASM, WebGPU, and modern frameworks make client-side ML practical, what the performance trade-offs actually look like, and when to use it.

9 min read
20

The MCP Registry: GitHub's Play to Become the App Store for AI Tools

GitHub's MCP Registry centralizes discovery of Model Context Protocol servers, positioning GitHub as the primary distribution layer for AI agent tooling and addressing the fragmentation that emerged as MCP's ecosystem exploded past 5,000 servers in under a year.

7 min read
21

Microsoft's Data Storage That Lasts Millennia

Microsoft's Project Silica has demonstrated a way to encode terabytes of data into ordinary borosilicate glass using femtosecond lasers, with accelerated aging tests projecting data integrity for at least 10,000 years—at a fraction of previous costs.

8 min read
22

MCP Is Everywhere: The Protocol That Connected AI to Everything

How the Model Context Protocol became the universal standard connecting AI assistants to data sources, tools, and enterprise systems—transforming isolated models into truly connected agents.

6 min read
23

Nvidia's Deal With Meta Signals a New Era in AI Computing Power

Meta and Nvidia announced a multi-year strategic partnership in February 2026 that will see Meta deploy Nvidia's Vera Rubin platform across gigawatt-scale data centers, representing one of the largest single commitments of AI computing resources in history.

10 min read
24

Pebble Is Back: Inside the Community-Driven Smartwatch Revival

After nine years in stasis, Pebble—the iconic smartwatch that pioneered wearable computing—is returning through a grassroots revival led by its original founder and a passionate community of developers.

12 min read
25

Alibaba's zvec: A Lightning-Fast Vector Database That Fits In-Process

Zvec is Alibaba's open-source, in-process vector database built on the battle-tested Proxima engine. It enables millisecond semantic search across billions of vectors without requiring external servers or infrastructure, making it ideal for edge AI and embedded applications.

8 min read
26

DNS-Persist-01: Let's Encrypt's New Model for Permanent Certificate Validation

DNS-Persist-01 is a proposed ACME challenge type that allows persistent DNS TXT records for certificate validation, eliminating the need for real-time DNS updates at each renewal—a growing burden as certificate lifetimes shrink to 47 days by March 2029 under the CA/Browser Forum's SC-081v3.

8 min read
27

Tailscale Peer Relays: The Missing Piece for True P2P Networking

Tailscale Peer Relays became generally available on February 18, 2026, enabling high-throughput peer-to-peer relaying within your own infrastructure. This feature eliminates the performance bottleneck of DERP servers when NAT traversal fails, delivering true mesh networking even in restrictive network environments.

8 min read
28

Edge AI Deployment: Running Models Where the Data Lives

Edge AI deploys machine learning models directly on local devices, reducing latency to milliseconds while keeping sensitive data private. This comprehensive guide covers deployment strategies, optimization techniques, and key frameworks for running AI from smartphones to IoT sensors.

8 min read
29

GitHub Agentic Workflows: AI That Commits Code For You

GitHub's agentic workflows bring autonomous AI agents directly into the developer workflow, enabling AI to write code, create pull requests, and respond to feedback—transforming the PR process from manual coding to AI-assisted systems thinking.

8 min read
30

Vector Search at Scale: Architectures That Handle Billions of Embeddings

Vector search at scale requires distributed architectures, approximate nearest neighbor algorithms like HNSW and IVF, and intelligent sharding strategies. Leading implementations can query billions of embeddings in milliseconds with 95%+ recall.

6 min read
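The IVF approach mentioned above can be illustrated with a toy index: a k-means coarse quantizer partitions the embedding space into posting lists, and a query probes only the `nprobe` nearest lists instead of scanning every vector. A minimal NumPy sketch (illustrative only; production systems layer on sharding, HNSW graphs, and compressed codes):

```python
import numpy as np

rng = np.random.default_rng(0)

def build_ivf(vectors, n_lists=8, iters=10):
    # Toy k-means coarse quantizer: assign each vector to its nearest
    # centroid's posting list (the "inverted file").
    centroids = vectors[rng.choice(len(vectors), n_lists, replace=False)]
    for _ in range(iters):
        assign = np.argmin(((vectors[:, None] - centroids) ** 2).sum(-1), axis=1)
        for c in range(n_lists):
            members = vectors[assign == c]
            if len(members):
                centroids[c] = members.mean(axis=0)
    # Final assignment against the trained centroids.
    assign = np.argmin(((vectors[:, None] - centroids) ** 2).sum(-1), axis=1)
    lists = {c: np.where(assign == c)[0] for c in range(n_lists)}
    return centroids, lists

def search(query, vectors, centroids, lists, nprobe=2, k=5):
    # Probe only the nprobe closest posting lists, then rank candidates
    # exactly—this is the recall/latency trade-off behind IVF indexes.
    order = np.argsort(((centroids - query) ** 2).sum(-1))[:nprobe]
    cand = np.concatenate([lists[c] for c in order])
    dists = ((vectors[cand] - query) ** 2).sum(-1)
    return cand[np.argsort(dists)[:k]]

vecs = rng.normal(size=(1000, 32)).astype(np.float32)
ids = search(vecs[0], vecs, *build_ivf(vecs), nprobe=2, k=5)
```

Raising `nprobe` scans more lists, trading latency for recall; billion-scale systems apply the same idea with far more lists, distributed shards, and quantized vectors in each list.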

Explore More Categories

Discover insights across different technology domains.

Browse All Articles