Topic

#nccl

2 articles exploring NCCL. Expert insights and analysis from our editorial team.

Articles

Infrastructure & Runtime

UCCL-Zip Brings Lossless Compression to NCCL Collectives — 47.5% Faster RL Weight Sync and 10% Lower vLLM Latency

UCCL-Zip fuses lossless compression into NCCL and GPU P2P transfers, cutting RL weight sync by 47.5% and vLLM latency by 10% with no API changes and bit-identical outputs.

Infrastructure & Runtime

UCCL-Zip Brings Lossless Compression to NCCL Collectives — 47.5% Faster RL Weight Sync, No API Changes

UCCL-Zip fuses lossless compression into NCCL collectives at the kernel level, cutting cross-node wire bytes without accuracy tradeoffs or application changes.