Vercel’s KIKO Milano case study documents a genuine workflow shift: no manual Black Friday prep, builds down from roughly 20 minutes to under four, deploys several times a day. What it never documents is the part that decides whether a seasonal peak is safe on serverless. The post quotes no QPS, no concurrency ceiling, no cold-start figures, no bill. “It scaled” is operator relief, not a benchmark.
What the KIKO post actually claims (and the numbers it never gives)
The KIKO case study’s headline wins are operational, not performance. Per Vercel’s post, KIKO eliminated three weeks of Black Friday infrastructure prep, cut app build times by 75%, and moved from minimal releases to deploying multiple times a day. Builds now finish in under four minutes on average against roughly 20 minutes for the old pipeline.
None of that is contested. Build-time and deploy-cadence improvements are measurable and reproducible by the team that made them. The contested part is the scaling claim, which is the only thing the post offers about Black Friday itself. That claim is qualitative: the post frames “automatic scaling during traffic spikes” as what let the team stop planning around infrastructure limits. No benchmark is attached. No bill is attached.
The distinction matters because operational relief and capacity headroom are different questions. Faster deploys mean you can ship a fix during the peak; they do not mean the system has headroom to absorb the peak. The case study answers the first question and is silent on the second.
The mechanism the post rests on: request collapsing and ISR shielding
The capability that actually carries a Black Friday on this stack is request collapsing in the CDN, backed by ISR cache shielding that sits between the CDN and the function. Both are documented with reproducible mechanics, unlike the case study’s scaling claim.
Request collapsing synchronizes concurrent cache misses so that only one function invocation runs per region, while the other concurrent requests wait briefly and receive the same response once it is cached. That is the mechanism that keeps origin load roughly flat for cacheable pages while traffic spikes.
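A minimal single-flight sketch makes the mechanism concrete. It assumes one in-memory map per region and an async `origin` function standing in for the function invocation; `RequestCollapser` and its fields are hypothetical names, and this is not Vercel’s CDN code.

```typescript
// Minimal single-flight sketch of request collapsing. Assumes one in-memory
// map per region; an illustration, not Vercel's CDN implementation.
type Origin<T> = (key: string) => Promise<T>;

class RequestCollapser<T> {
  // Responses cached after a successful origin call.
  private cache = new Map<string, T>();
  // Origin calls currently in flight, keyed by cache key.
  private inFlight = new Map<string, Promise<T>>();
  private origin: Origin<T>;
  originCalls = 0;

  constructor(origin: Origin<T>) {
    this.origin = origin;
  }

  get(key: string): Promise<T> {
    const hit = this.cache.get(key);
    if (hit !== undefined) return Promise.resolve(hit);

    // Concurrent misses for the same key wait on the same pending call.
    const pending = this.inFlight.get(key);
    if (pending !== undefined) return pending;

    this.originCalls += 1;
    const call = this.origin(key).then(
      (value) => {
        this.cache.set(key, value); // later requests become cache hits
        this.inFlight.delete(key);
        return value;
      },
      (err: unknown) => {
        this.inFlight.delete(key); // let the next request retry the origin
        throw err;
      },
    );
    this.inFlight.set(key, call);
    return call;
  }
}

// 100 simultaneous misses on one product page trigger a single origin render.
const collapser = new RequestCollapser(async (key: string) => `rendered ${key}`);
const waiters = Array.from({ length: 100 }, () => collapser.get("/product/lipstick"));
console.log(collapser.originCalls); // prints 1
```

The design choice that matters is registering the pending promise before anything awaits, so the 99 requests that arrive while the render is in flight find it and wait instead of starting their own.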
ISR adds a second layer. The ISR cache persists content for 31 days, is scoped per-deployment, and provides cache shielding: on a CDN miss, Vercel reads from the ISR cache before invoking the function. Global revalidation purges propagate within 300ms. This is the layer you can pin claims to. When the KIKO post says it scaled, the load-bearing mechanism is this cache fronting the origin.
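The lookup order described above can be sketched as a three-step chain. This assumes, purely for illustration, that two CDN regions read from one shared ISR store; `ShieldedRegion`, `render`, and the region names are hypothetical, and where the ISR cache physically lives is specified in Vercel’s docs, not modeled here.

```typescript
// Hypothetical lookup chain: regional CDN cache, then an ISR store that
// (for illustration only) both regions share, then the function itself.
type Tier = "cdn" | "isr" | "function";

class ShieldedRegion {
  private cdn = new Map<string, string>(); // per-region edge cache
  private isr: Map<string, string>; // shield consulted on a CDN miss
  private render: (path: string) => string;
  functionInvocations = 0;

  constructor(isr: Map<string, string>, render: (path: string) => string) {
    this.isr = isr;
    this.render = render;
  }

  get(path: string): { body: string; servedBy: Tier } {
    const edge = this.cdn.get(path);
    if (edge !== undefined) return { body: edge, servedBy: "cdn" };

    // CDN miss: read the ISR cache before paying for an invocation.
    const shielded = this.isr.get(path);
    if (shielded !== undefined) {
      this.cdn.set(path, shielded);
      return { body: shielded, servedBy: "isr" };
    }

    // Both layers missed: only now does the function run.
    this.functionInvocations += 1;
    const body = this.render(path);
    this.isr.set(path, body);
    this.cdn.set(path, body);
    return { body, servedBy: "function" };
  }
}

const sharedIsr = new Map<string, string>();
const render = (path: string) => `<html>${path}</html>`;
const frankfurt = new ShieldedRegion(sharedIsr, render);
const virginia = new ShieldedRegion(sharedIsr, render);

const first = frankfurt.get("/black-friday"); // servedBy: "function"
const second = virginia.get("/black-friday"); // servedBy: "isr"
const third = virginia.get("/black-friday"); // servedBy: "cdn"
```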
There is a second-order catch the post does not raise. Because the ISR cache is scoped per-deployment, every new deployment starts with its own empty cache. The same “deploy multiple times per day” rhythm that the post celebrates means each deploy during a peak has to prime a fresh cache, briefly shifting load back to the origin. The workflow win and the capacity risk share a mechanism.
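A toy cache keyed by deployment ID shows why a mid-peak deploy shifts load back to the origin. The class, the deployment IDs, and the traffic pattern are hypothetical; the only property carried over from the docs is that cached entries do not carry over into a new deployment.

```typescript
// Hypothetical deployment-scoped cache: every key carries the deployment ID,
// so a new deployment cannot read entries primed by the previous one.
class DeploymentScopedCache {
  private entries = new Map<string, string>();
  deploymentId: string;
  originRenders = 0;

  constructor(deploymentId: string) {
    this.deploymentId = deploymentId;
  }

  deploy(newDeploymentId: string): void {
    this.deploymentId = newDeploymentId; // old entries become unreachable
  }

  get(path: string, render: (p: string) => string): string {
    const key = `${this.deploymentId}:${path}`;
    const cached = this.entries.get(key);
    if (cached !== undefined) return cached;
    this.originRenders += 1;
    const body = render(path);
    this.entries.set(key, body);
    return body;
  }
}

const hotPaths = ["/", "/sale", "/product/a", "/product/b"];
const render = (p: string) => `page ${p}`;
const cache = new DeploymentScopedCache("dpl_morning");

// Three passes of traffic: only the first pass reaches the origin.
for (let pass = 0; pass < 3; pass++) hotPaths.forEach((p) => cache.get(p, render));
const rendersBeforeDeploy = cache.originRenders; // 4

// A mid-peak deploy: the same traffic pays the full priming cost again.
cache.deploy("dpl_afternoon");
for (let pass = 0; pass < 3; pass++) hotPaths.forEach((p) => cache.get(p, render));
const rendersAfterDeploy = cache.originRenders - rendersBeforeDeploy; // 4
```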
The open question is what fraction of Black Friday traffic is cacheable. Collapsing and shielding only help the cacheable portion.
Fluid Compute savings: the “over 50%” headline and the workload it assumes
Fluid Compute’s savings figure, on Vercel’s own marketing page, comes from a customer quote: “cutting costs by over 50% with zero code changes” for API endpoints that were “lightweight and involved external requests, resulting in idle compute time.” That is not a guarantee. It is a workload-specific result from a customer whose handlers were I/O-bound.
The Vercel feature table frames the win the same way. Traditional serverless is marked with “I/O bound inefficiency”; Fluid is marked with “Optimized I/O efficiency.” CPU-bound workloads are not in the headline. The optimized concurrency path that drives the model is available only for the Node.js and Python runtimes, per Groundy’s in-function concurrency teardown.
When the KIKO post implies Fluid Compute absorbed the peak, the honest read is narrower: I/O-bound handlers would have gotten cheaper, and CPU-bound handlers may have regressed. The post does not say which kind KIKO runs.
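A deliberately crude cost model shows why workload shape decides the result. It assumes ten requests arrive together, CPU work on one instance runs serially, and I/O waits overlap freely; the handler profiles are illustrative, and this is not Vercel’s pricing formula.

```typescript
// Simplified cost model, not Vercel's pricing formula. Assumes `concurrent`
// requests arrive together, CPU work on one instance runs serially, and
// I/O waits overlap freely.
interface HandlerProfile {
  cpuMs: number; // time actually computing
  ioMs: number; // time waiting on databases, APIs, storage
}

// One instance per request: each request is billed for its full duration.
function perInvocationMs(h: HandlerProfile, concurrent: number): number {
  return concurrent * (h.cpuMs + h.ioMs);
}

// One shared instance: CPU slices queue up, I/O waits overlap.
function sharedInstanceMs(h: HandlerProfile, concurrent: number): number {
  return concurrent * h.cpuMs + h.ioMs;
}

function savings(h: HandlerProfile, concurrent: number): number {
  return 1 - sharedInstanceMs(h, concurrent) / perInvocationMs(h, concurrent);
}

const ioBound: HandlerProfile = { cpuMs: 20, ioMs: 180 };
const cpuBound: HandlerProfile = { cpuMs: 180, ioMs: 20 };

const ioSavings = savings(ioBound, 10); // about 0.81
const cpuSavings = savings(cpuBound, 10); // about 0.09
// The last CPU-bound request finishes near 1820 ms instead of 200 ms alone.
const cpuBoundTailMs = sharedInstanceMs(cpuBound, 10);
```

Under these assumptions the I/O-bound handler saves about 81% of billed time and the CPU-bound one about 9%, while its slowest request takes roughly nine times longer than it would on its own instance.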
Where serverless billing regresses on spiky traffic
Spiky seasonal traffic is the worst case for the billing model Vercel sells, and the case study is silent on it. Three lines move the wrong way under a spike.
Cold starts do not vanish. Fluid Compute reduces cold-start frequency but does not eliminate cold starts, per Autonoma’s independent analysis. On the busiest day of the year, a first-hit cold start is still real and nonzero.
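Measuring your own cold starts means tagging them at the source and reporting them apart from warm requests, since a blended average hides them. The sketch assumes a module-scope flag marks the first request an instance serves (module scope runs once per instance); `Sample`, `markCold`, and the latency values are hypothetical.

```typescript
// Hypothetical sample shape: your own handler records its latency and whether
// it was the first request served by a fresh instance.
interface Sample {
  latencyMs: number;
  cold: boolean;
}

// Module-scope flag: true only for the first request an instance handles.
let firstRequestOnInstance = true;
function markCold(): boolean {
  const cold = firstRequestOnInstance;
  firstRequestOnInstance = false;
  return cold;
}

// Nearest-rank percentile; expects a non-empty array.
function percentile(values: number[], p: number): number {
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}

function coldStartReport(samples: Sample[]) {
  const cold = samples.filter((s) => s.cold).map((s) => s.latencyMs);
  const warm = samples.filter((s) => !s.cold).map((s) => s.latencyMs);
  return {
    coldShare: cold.length / samples.length,
    coldP50: cold.length > 0 ? percentile(cold, 50) : NaN,
    warmP50: warm.length > 0 ? percentile(warm, 50) : NaN,
  };
}

const firstTwo = [markCold(), markCold()]; // [true, false]
const samples: Sample[] = [
  { latencyMs: 30, cold: false },
  { latencyMs: 35, cold: false },
  { latencyMs: 40, cold: false },
  { latencyMs: 45, cold: false },
  { latencyMs: 50, cold: false },
  { latencyMs: 400, cold: true },
  { latencyMs: 600, cold: true },
];
const report = coldStartReport(samples); // coldP50 400, warmP50 40
```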
Database connections leak. Serverless functions that suspend when idle hold database connections until the pooler-side timeout fires. Vercel’s own post names VM suspension as the root cause and documents a pool-management fix, but it has to be wired in. This is a known footgun, and it collides directly with the “deploy multiple times per day” cadence the post praises: a deploy without that fix in place can stampede the pooler at the worst possible moment.
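The pooler arithmetic is simple enough to check before the peak. The sketch assumes each instance opens up to `poolSize` connections and that instances suspended or superseded mid-deploy keep theirs until the pooler-side timeout unless idle clients are closed first; every number, including the pooler limit, is a placeholder. It models connection counts only and is not Vercel’s pool-management API.

```typescript
// Connection-count model only; not Vercel's pool-management API.
// Assumes each instance opens up to `poolSize` connections and that suspended
// instances keep theirs until the pooler-side idle timeout, unless idle
// clients are closed before suspension.
interface PoolScenario {
  activeInstances: number;
  suspendedInstances: number;
  poolSize: number;
  releasedOnSuspend: boolean;
}

function connectionsHeld(s: PoolScenario): number {
  const active = s.activeInstances * s.poolSize;
  const stranded = s.releasedOnSuspend ? 0 : s.suspendedInstances * s.poolSize;
  return active + stranded;
}

const poolerLimit = 500; // hypothetical max client connections on the pooler
// Mid-deploy: the new deployment's instances are busy while the old ones sit suspended.
const midDeploy = { activeInstances: 40, suspendedInstances: 40, poolSize: 10 };
const withoutFix = connectionsHeld({ ...midDeploy, releasedOnSuspend: false }); // 800, over the limit
const withFix = connectionsHeld({ ...midDeploy, releasedOnSuspend: true }); // 400, under the limit
```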
Egress goes uncited. The post quotes no egress figure. For a Black Friday, egress and image delivery are where the bill actually lives, and none of the Fluid Compute savings touches that line item.
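A rough egress estimate needs only request counts and average response sizes per asset class. All figures below, including the per-GB rate, are hypothetical placeholders rather than KIKO’s traffic or Vercel’s pricing.

```typescript
// Placeholder traffic mix for one peak day; not KIKO's numbers.
interface AssetClass {
  name: string;
  requests: number;
  avgBytes: number; // average response size on the wire
}

function egressBytes(classes: AssetClass[]): number {
  return classes.reduce((sum, c) => sum + c.requests * c.avgBytes, 0);
}

const peakDay: AssetClass[] = [
  { name: "html", requests: 20_000_000, avgBytes: 60_000 },
  { name: "images", requests: 150_000_000, avgBytes: 120_000 },
  { name: "api-json", requests: 40_000_000, avgBytes: 4_000 },
];

const totalBytes = egressBytes(peakDay);
const gb = totalBytes / 1e9; // 19,360 GB with these placeholders
const imageBytes = egressBytes(peakDay.filter((c) => c.name === "images"));
const imageShare = imageBytes / totalBytes; // roughly 0.93
const hypotheticalRatePerGb = 0.15; // placeholder, not a published price
const egressCost = gb * hypotheticalRatePerGb;
```

With these placeholder numbers images account for over 90% of egress, which is the line item that compute savings never reach.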
Why the checkout hot path can’t collapse
Request collapsing only applies to paths known to produce cacheable responses. The checkout and payment routes are exactly the ones that cannot collapse, and they carry the expensive part of Black Friday traffic.
Per Vercel’s request-collapsing post, a dynamic API route returning user-specific data cannot be collapsed. On Black Friday that means the payment confirmation, the cart, the live inventory check. These are the uncacheable, origin-hitting, user-specific requests that do the actual work and bill against you.
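One way to audit which routes could collapse is to check each route’s method and `Cache-Control` directives. The sketch is a simplified reading of HTTP shared-cache rules (`private` and `no-store` keep a response out of shared caches); the route list is hypothetical, and Vercel’s actual eligibility rules for collapsing may be stricter.

```typescript
// Simplified shared-cache check based on HTTP Cache-Control semantics.
// Vercel's own collapsing eligibility rules may be stricter than this.
interface Route {
  path: string;
  method: string;
  cacheControl: string; // the Cache-Control header the route responds with
}

function sharedCacheable(route: Route): boolean {
  if (route.method !== "GET" && route.method !== "HEAD") return false;
  const directives = route.cacheControl
    .toLowerCase()
    .split(",")
    .map((d) => d.trim().split("=")[0]);
  // private and no-store keep a response out of any shared cache.
  if (directives.includes("private") || directives.includes("no-store")) return false;
  return directives.includes("s-maxage") || directives.includes("max-age") || directives.includes("public");
}

// Hypothetical Black Friday routes.
const routes: Route[] = [
  { path: "/product/lipstick", method: "GET", cacheControl: "public, s-maxage=60, stale-while-revalidate=300" },
  { path: "/api/cart", method: "GET", cacheControl: "private, no-store" },
  { path: "/api/checkout", method: "POST", cacheControl: "no-store" },
  { path: "/api/inventory/sku-123", method: "GET", cacheControl: "no-store" },
];

// Only the product page qualifies; cart, checkout, and inventory all hit origin.
const collapsingCandidates = routes.filter(sharedCacheable).map((r) => r.path);
```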
This is the second-order consequence worth carrying away: cache shielding handles the cheap, cacheable majority; collapsing does nothing for the uncacheable minority that drives the cost and the latency under load. The case study’s silence on traffic composition is the silence that matters most.
What to pin to docs, what to treat as marketing
Treat the case study as the contested layer on top of the docs. Pin every capability claim to the request-collapsing and ISR docs; treat the KIKO specifics as a workflow story rather than evidence of headroom.
| Claim | What actually backs it |
|---|---|
| 3 weeks of prep eliminated | Case study: operational, credible |
| 75% faster builds, under 4 min | Case study: measurable, reproducible by the team |
| “It scaled” for Black Friday | Qualitative framing, no QPS, cost, or benchmark |
| Fluid Compute absorbs the peak | Teardown: I/O-bound wins, CPU-bound may regress |
| Request collapsing cuts origin load | Mechanism: reproducible, cacheable paths only |
| Checkout handled at peak | No evidence; collapse does not apply to user-specific routes |
The credible rows are the operational ones. The scaling row is the one a buyer should not rely on without their own numbers.
A capacity-planning checklist for a seasonal peak
Before signing up for a peak on this stack, pin the numbers the case study omits. Each item below maps to a documented mechanism, not a vendor headline.
- Split traffic by cacheability. Only the cacheable portion benefits from request collapsing and ISR shielding. Size the origin function pool for the uncacheable remainder.
- Classify handlers as I/O-bound or CPU-bound. The headline figure assumes I/O-bound. CPU-bound handlers may regress under concurrency and need their own benchmark.
- Measure your own cold start. Fluid Compute reduces cold-start frequency but does not eliminate them; benchmark against your own baseline rather than assuming Fluid removed them.
- Wire Vercel’s documented pool-management fix. Without it, budget for pooler-side connection exhaustion during the spike.
- Get the egress number the post never gives. Egress and image delivery are the bill lines that scaling savings do not touch.
- Account for the per-deploy capacity cost. Each new deployment brings new code that has to initialize and a fresh ISR cache that has to be primed, so the same fast-cadence deploys the post celebrates also shift load back to origin during a peak.
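The checklist can be folded into one back-of-envelope sizing pass using Little’s law (requests in flight equal arrival rate times latency). Every input is a placeholder for a number you measure yourself; `PeakPlan` and its fields are hypothetical, and the per-instance concurrency should be the cap you can actually get rather than a marketing figure.

```typescript
// Back-of-envelope peak sizing with Little's law: in-flight = rate × latency.
// Every input is a placeholder for a number you measure yourself.
interface PeakPlan {
  peakRps: number; // total requests per second at the CDN
  cacheableShare: number; // 0..1, fraction of requests on cacheable paths
  cacheableMissRate: number; // 0..1, cacheable requests that still reach origin
  deployMissMultiplier: number; // >= 1, extra misses right after a deploy
  originLatencyMs: number; // measured p95 of origin handlers under load
  perInstanceConcurrency: number; // the cap you can actually get
}

function sizePeak(p: PeakPlan) {
  const uncacheableRps = p.peakRps * (1 - p.cacheableShare);
  const missRps = p.peakRps * p.cacheableShare * p.cacheableMissRate * p.deployMissMultiplier;
  const originRps = uncacheableRps + missRps;
  const inFlight = originRps * (p.originLatencyMs / 1000);
  return { originRps, inFlight, instances: Math.ceil(inFlight / p.perInstanceConcurrency) };
}

const base: PeakPlan = {
  peakRps: 10_000,
  cacheableShare: 0.8,
  cacheableMissRate: 0.01,
  deployMissMultiplier: 1,
  originLatencyMs: 400,
  perInstanceConcurrency: 25,
};
const steady = sizePeak(base); // about 2080 origin RPS, 34 instances
const afterDeploy = sizePeak({ ...base, deployMissMultiplier: 20 }); // about 3600 origin RPS, 58 instances
```

With these placeholders, a deploy that pushes the cacheable miss rate from 1% to 20% raises the required instance count from 34 to 58, which is the per-deploy cost expressed as capacity.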
The case study is evidence that the workflow got easier. It is not evidence that the bill stays flat, and it tells you nothing about the checkout path. Run your own numbers before the peak, because the post already ran the ones it wanted to show.
Frequently Asked Questions
How do Vercel’s Node.js cold-start times compare to AWS Lambda and Cloudflare Workers?
Per Autonoma’s March 2026 independent analysis, Next.js on Node.js cold-start p50 lands between 200ms and 800ms, compared to 100-500ms for AWS Lambda Node.js and under 5ms for Cloudflare Workers. Fluid Compute shrinks how often those initializations fire, but does not change what they cost when they do.
What is the gap between Vercel’s 85% savings headline and the beta efficiency range?
Vercel’s launch changelog markets ‘Reduce compute costs by up to 85%’ while the beta changelog reports overlapping invocations ‘can increase efficiency 20%-50%.’ The two numbers describe different workload classes: 85% is a best-case I/O-bound ceiling, 20-50% is the measured band. The ‘over 50%’ customer quote on Vercel’s Fluid Compute page sits at the top edge of the measured band, well short of the ceiling, and describes a customer whose handlers were explicitly I/O-bound and idle-heavy.
What does the documented database connection fix actually require teams to implement?
Vercel’s fix uses attachDatabasePool from @vercel/functions, which calls waitUntil to close idle clients before function suspension rather than leaving connections open until pooler-side timeout. Pairing that with rolling releases (rather than a full blue-green swap) spreads initialization load across instances on deploy instead of spiking the pooler all at once.
How much traffic does request collapsing handle across Vercel’s CDN?
Vercel’s own post reports the CDN collapses over 3 million requests per day on cache miss, plus 90 million from background revalidations. Those figures are platform-wide across all tenants, so they describe what the mechanism can absorb in aggregate, not a throughput floor any individual project can rely on during its own peak.
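Converting the daily aggregates into average rates shows why they cannot stand in for a peak figure:

$$
\frac{3 \times 10^{6}\ \text{requests/day}}{86{,}400\ \text{s/day}} \approx 34.7\ \text{requests/s},
\qquad
\frac{9 \times 10^{7}\ \text{requests/day}}{86{,}400\ \text{s/day}} \approx 1{,}041.7\ \text{requests/s}
$$

Both are averages over every tenant and every hour of the day, which is the opposite of one store’s peak minute.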
Is Fluid Compute’s per-instance concurrency limit high enough for a genuine retail peak?
The per-instance concurrency limit was still beta-capped as of mid-2026, below the ‘tens of thousands’ figure in Vercel’s marketing copy. Vendor documentation does not publish a timeline for when the cap will be lifted or what SLA applies, which makes it a planning constraint that any capacity model for a seasonal peak needs to account for explicitly.