Topic
#nccl
2 articles exploring NCCL. Expert insights and analysis from our editorial team.
Articles
Infrastructure & Runtime
UCCL-Zip Adds Lossless Compression to NCCL Collectives: 47.5% Faster RL Weight Sync, No API Changes
UCCL-Zip fuses lossless compression into NCCL collectives at the kernel level, cutting cross-node wire bytes without accuracy tradeoffs or application changes.
Infrastructure & Runtime
UCCL-Zip: Lossless Compression for NCCL, 47.5% Faster RL Sync, 10% Lower vLLM Latency
UCCL-Zip fuses lossless compression into NCCL and GPU P2P transfers, speeding up RL weight sync by 47.5% and lowering vLLM latency by 10%, with no API changes and bit-identical outputs.