Topic

#block-preemption

1 article exploring block-preemption.

Articles

Infrastructure & Runtime

vLLM Block-Level Preemption and FlexKV Shift the Long-Context Bottleneck From GPU Memory to PCIe

vLLM v0.19 block preemption and v0.18 FlexKV shift the long-context bottleneck from GPU memory to PCIe and CPU cache, but require experimental flags and carry unresolved…