
#distributed-inference

Two articles exploring distributed inference. Expert insights and analysis from our editorial team.


Articles

Infrastructure & Runtime

UCCL-Zip Brings Lossless Compression to NCCL Collectives — 47.5% Faster RL Weight Sync and 10% Lower vLLM Latency

UCCL-Zip fuses lossless compression into NCCL and GPU P2P transfers, cutting RL weight sync by 47.5% and vLLM latency by 10% with no API changes and bit-identical outputs.

Infrastructure & Runtime

CoCoDiff Exposes the All-to-All Bottleneck That Caps Distributed Diffusion Transformer Inference Well Below Theoretical GPU Count

Ulysses parallelism caps distributed DiT inference scaling on heterogeneous interconnects. CoCoDiff delivers 3.6x average speedups on Aurora via topology-aware scheduling.