Topic

#consumer-gpu

1 article exploring consumer-gpu. Expert insights and analysis from our editorial team.

Articles

Infrastructure & Runtime

K-Token Merging Compresses Sequences in Latent Space — Lowering the KV Cache Floor for Long-Context Serving on 24GB and 48GB Cards

K-Token Merging compresses prompts in latent space before attention, cutting the prefill KV cache by 75% on 0.5B-parameter models and extending the feasible context length on 24GB and 48GB consumer GPUs.