Topic

#llada

Articles

Infrastructure & Runtime

DMax Hits 1,338 Tokens/Sec on 2x H200: Parallel Decoding Pushes dLLM Serving Past the Autoregressive Bar

DMax reformulates diffusion LLM decoding as embedding refinement, reaching 1,338 tok/s on 2× H200 and challenging ParallelBench's finding of a quality trade-off in parallel decoding.