Topic
#distributed-inference
2 articles exploring distributed-inference. Expert insights and analysis from our editorial team.
Articles
Infrastructure & Runtime
UCCL-Zip Brings Lossless Compression to NCCL Collectives — 47.5% Faster RL Weight Sync and 10% Lower vLLM Latency
UCCL-Zip fuses lossless compression into NCCL and GPU P2P transfers, cutting RL weight sync time by 47.5% and vLLM latency by 10% with no API changes and bit-identical outputs.
Infrastructure & Runtime
CoCoDiff Exposes the All-to-All Bottleneck That Caps Distributed Diffusion Transformer Inference Well Below Theoretical GPU Count
Ulysses parallelism caps distributed DiT inference scaling on heterogeneous interconnects. CoCoDiff delivers a 3.6x average speedup on Aurora via topology-aware scheduling.