Topic

#inference infrastructure

2 articles exploring inference infrastructure. Expert insights and analysis from our editorial team.

Showing 1–2 of 2 articles

Articles

Infrastructure & Runtime

UCCL-Zip Adds Lossless Compression to NCCL Collectives: 47.5% Faster RL Weight Sync, No API Changes

UCCL-Zip fuses lossless compression into NCCL collectives at the kernel level, cutting cross-node wire bytes without accuracy tradeoffs or application changes.

Infrastructure & Runtime

Prefill-Decode Disaggregation: The Architecture Shift Redefining LLM Serving at Scale

Prefill-decode disaggregation separates compute-bound prefill from memory-bound decode onto dedicated hardware, eliminating phase interference.

9 min read