Topic

#llm-inference

3 articles exploring llm-inference. Expert insights and analysis from our editorial team.


Articles

Infrastructure & Runtime

KServe + llm-d Claims 57× P90 TTFT. RC1 Ships with a Routing Deadlock and No Migration Guide

Red Hat's KServe + llm-d integration claims 57× P90 TTFT gains against an unoptimized vLLM baseline, but RC1 ships with a known routing deadlock, a prematurely merged WIP, and no migration guide.

Models & Research

DuQuant++ Makes FP4 Quantization Practical for LLM Inference: What Fine-Grained Rotation Means for Blackwell Deployments

DuQuant++ aligns rotation block size with MXFP4 microscaling groups, halving preprocessing cost and pushing W4A4 accuracy close to FP8 as Blackwell FP4 Tensor Cores ship.

Infrastructure & Runtime

KV Cache Is Becoming a Distributed Infrastructure Layer: What KV Packet and llm-d Mean for Self-Hosted LLM Teams

KV Packet eliminates cross-request recomputation; llm-d brings cache-aware routing to Kubernetes. Here's what both mean for vLLM capacity planning.

6 min read