Vercel CDN Request Collapsing: One Origin Fetch Per ISR Cache Miss

Incremental Static Regeneration has always carried a mechanical flaw: when a cached page expires under high traffic, every concurrent visitor triggers a separate origin fetch. The resulting cache stampede can multiply function invocations dozens of times over for a single page. Vercel’s CDN request collapsing now deduplicates those concurrent misses down to one function invocation per region.

The stampede ISR always had

ISR works by serving a static page from cache until a configurable revalidation window expires. The first request after expiration regenerates the page and writes the fresh result back to cache. The problem is the gap: between the moment the stale page is evicted and the moment the regenerated page lands, every incoming request for that URL fires a separate function invocation. At low traffic this is invisible. During a traffic spike on a popular route, it is an origin-side thundering herd.
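The gap described above can be reproduced with a small simulation. This is a minimal sketch, not Vercel's implementation: the cache is a plain dictionary, `regenerate` is a hypothetical stand-in for the origin function, and the `/pricing` path and 50-request burst are made-up values.

```python
import asyncio

async def regenerate(stats: dict, render_seconds: float = 0.05) -> str:
    """Hypothetical stand-in for the origin function; counts every invocation."""
    stats["invocations"] += 1
    await asyncio.sleep(render_seconds)  # simulated render time
    return "<html>fresh</html>"

async def handle_request(cache: dict, key: str, stats: dict) -> str:
    """Naive miss handling: each request that sees a miss regenerates on its own."""
    if key in cache:
        return cache[key]
    page = await regenerate(stats)
    cache[key] = page
    return page

async def simulate_stampede(concurrent_requests: int) -> int:
    cache: dict = {}  # an empty cache models the moment after expiry
    stats = {"invocations": 0}
    await asyncio.gather(
        *(handle_request(cache, "/pricing", stats) for _ in range(concurrent_requests))
    )
    return stats["invocations"]

if __name__ == "__main__":
    # Every request arrives before the first render finishes, so each one invokes.
    print(asyncio.run(simulate_stampede(50)))  # prints 50
```

Because all 50 requests check the cache before the first render completes, the invocation count equals the request count.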

This is not theoretical. A Next.js site that uses ISR with revalidate values in the 10-to-60-second range and sees bursty traffic has likely hit it. The symptoms are straightforward: function duration spikes, invocation counts higher than traffic justifies, and intermittent 504s when the origin saturates.

Per-region collapsing

Vercel’s fix collapses concurrent requests at the region level. When multiple requests hit the same uncached path within a CDN region, only one triggers a function invocation. The remaining requests wait and receive the cached response once regeneration completes. The Vercel blog post describes each CDN node as maintaining an in-memory cache for frequently requested content, with multiple workers per node handling concurrent requests. The coordination ensures that only one invocation per region runs for a given cache key.

Each CDN region operates independently. A page that goes viral globally still triggers one invocation per region, not one per request. Collapsing deduplicates concurrent requests within each region; it does not guarantee a single global invocation.
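The per-region behavior can be illustrated with a generic single-flight pattern, in which concurrent callers for the same key await one shared in-flight task. This is a sketch under assumptions: the `SingleFlight` class and `simulate_regions` helper are hypothetical names, the region codes are used only as labels, and Vercel's actual coordination across workers and nodes is not shown here.

```python
import asyncio
from typing import Awaitable, Callable

class SingleFlight:
    """Collapse concurrent calls for the same key onto one in-flight task."""

    def __init__(self) -> None:
        self._in_flight: dict = {}

    async def do(self, key: str, fn: Callable[[], Awaitable[str]]) -> str:
        task = self._in_flight.get(key)
        if task is None:
            # First caller for this key starts the work; later callers reuse it.
            task = asyncio.ensure_future(fn())
            self._in_flight[key] = task
            task.add_done_callback(lambda _t: self._in_flight.pop(key, None))
        return await task

async def simulate_regions(regions: list, requests_per_region: int) -> int:
    invocations = 0

    async def render() -> str:
        nonlocal invocations
        invocations += 1
        await asyncio.sleep(0.05)  # simulated render time
        return "<html>fresh</html>"

    # One independent collapser per region: regions do not coordinate.
    flights = {region: SingleFlight() for region in regions}
    await asyncio.gather(
        *(flights[r].do("/pricing", render) for r in regions for _ in range(requests_per_region))
    )
    return invocations

if __name__ == "__main__":
    print(asyncio.run(simulate_regions(["iad1", "fra1", "hnd1"], 20)))  # prints 3
```

Sixty requests across three regions produce three invocations: one per region, not one globally.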

What collapsing means for origin sizing

Origin function invocations no longer fan out linearly with edge concurrency. Before collapsing, every concurrent request for an expiring page triggered a separate function invocation in that region. After collapsing, concurrent requests resolve to a single invocation. The math changes origin sizing from worst-case fan-out to worst-case unique-keys-per-region.
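A back-of-the-envelope calculation makes the change in sizing concrete. The traffic figures below are hypothetical, and the model assumes every concurrent request lands inside a single regeneration with no lock timeouts.

```python
def invocations_without_collapsing(concurrent: dict) -> int:
    """Worst-case fan-out: every concurrent request on a miss invokes the function."""
    return sum(concurrent.values())

def invocations_with_collapsing(concurrent: dict) -> int:
    """One invocation per (path, region) pair that has at least one request."""
    return sum(1 for count in concurrent.values() if count > 0)

# Hypothetical concurrent requests during one expiry, keyed by (path, region).
burst = {
    ("/pricing", "iad1"): 400,
    ("/pricing", "fra1"): 120,
    ("/blog/launch", "iad1"): 35,
}

if __name__ == "__main__":
    print(invocations_without_collapsing(burst))  # prints 555
    print(invocations_with_collapsing(burst))     # prints 3
```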

This matters most for teams paying for function invocation count or duration. A site with 50 ISR routes and modest traffic might have seen its origin bill dominated by a handful of hot pages during peak hours. Collapsing removes that spike from the cost curve, though the protection depends on how quickly regeneration completes relative to the collapse window.

Routes that don’t collapse

Not everything qualifies. Request collapsing applies only to routes Vercel identifies as cacheable via framework-defined infrastructure metadata. Dynamic API routes returning user-specific data, pages with random content or timestamps, and routes served solely via Cache-Control headers do not collapse.

External-origin proxied content is also excluded. If you are using Vercel as a CDN layer in front of a separate backend, the stampede problem persists for those routes. The feature is specific to framework-declared ISR, which means self-hosted Next.js on Docker or Node does not benefit at all; there is no CDN-level collapsing layer to lean on.

Comparison with Bunny.net request coalescing

Bunny.net offers a comparable feature called Request Coalescing that combines simultaneous uncached requests into a single origin fetch. The architectural difference is scope: Bunny.net’s coalescing runs independently per edge point of presence and does not guarantee a single origin request globally. Vercel’s collapsing operates at the region level, which is a wider scope than a single PoP but narrower than global.

Feature	Scope	Route coverage	Fallback on timeout
Vercel request collapsing	Per-region (all nodes)	Framework-declared ISR only	Independent invocation after 3-second lock timeout
Bunny.net request coalescing	Per-PoP	Configurable per pull zone	Serve stale / pass through

What changes for capacity planning

The practical shift: origin sizing moves from modeling worst-case concurrent requests per hot key to modeling worst-case unique expiring keys per region. On sites with hot routes that can be a narrower number by one to two orders of magnitude, though the actual reduction depends on traffic shape.

What has not changed: dynamic routes and external origins still stampede. Each region still invokes independently, so global traffic patterns still fan out across regions. And Vercel has not published queue-depth limits, which means the failure mode under extreme hot-key pressure is not well characterized beyond the 3-second lock timeout.

For teams already running ISR on Vercel, the feature is live and has likely been reducing invocation counts since it was enabled by default. The question worth asking your monitoring is whether your regeneration times are fast enough for the collapse mechanism to hold under peak load. If they are not, collapsing may do less work than the architecture suggests.

Frequently Asked Questions

What happens to cached content if the origin function returns an error during revalidation?

Vercel preserves the stale cached page and schedules a retry after a 30-second TTL. The error does not poison the cache. Only HTTP 200, 301, 302, 307, 308, 404, and 410 status codes are treated as successful revalidation results. Network errors and 5xx responses both trigger the stale-serve-and-retry behavior, so a brief origin outage will not leave visitors seeing error pages for previously cached ISR routes.
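The rule above can be written as a small decision function. This is an illustrative model only: `CacheEntry` and `apply_revalidation` are hypothetical names, and a status of `None` is an assumed way to represent a network error.

```python
from dataclasses import dataclass
from typing import Optional

# Status codes the text lists as successful revalidation results.
CACHEABLE_STATUSES = frozenset({200, 301, 302, 307, 308, 404, 410})
RETRY_TTL_SECONDS = 30  # retry window stated in the text

@dataclass(frozen=True)
class CacheEntry:
    body: str
    ttl_seconds: int

def apply_revalidation(
    stale: CacheEntry, status: Optional[int], body: str, revalidate_seconds: int
) -> CacheEntry:
    """Return the entry that should be in cache after a revalidation attempt."""
    if status in CACHEABLE_STATUSES:
        # Successful result: replace the page and restart the revalidation window.
        return CacheEntry(body, revalidate_seconds)
    # Failure (5xx, network error, other status): keep the stale page, retry soon.
    return CacheEntry(stale.body, RETRY_TTL_SECONDS)

if __name__ == "__main__":
    stale = CacheEntry("<html>old</html>", 60)
    print(apply_revalidation(stale, 200, "<html>new</html>", 60))
    print(apply_revalidation(stale, 503, "upstream error", 60))
```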

Does request collapsing reduce time to first byte, or only invocation count?

Both. Vercel uses double-checked locking (check cache, acquire locks, check cache again, regenerate only if still empty) and writes the regenerated page to cache asynchronously after the function returns. The async write means the request that triggered regeneration receives its response before the cache update completes, reducing TTFB while still collapsing subsequent requests behind the lock.
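The check, lock, check-again sequence can be sketched with an asyncio lock per cache key. This is a simplification under stated assumptions: it uses one lock rather than separate node and regional locks, it writes to cache synchronously rather than asynchronously, and `CollapsingCache` is a hypothetical name.

```python
import asyncio
from collections import defaultdict

class CollapsingCache:
    """Double-checked locking: only the first lock holder regenerates a key."""

    def __init__(self, render) -> None:
        self._render = render
        self._cache: dict = {}
        self._locks = defaultdict(asyncio.Lock)  # one lock per cache key
        self.invocations = 0

    async def get(self, key: str) -> str:
        page = self._cache.get(key)  # first check, without the lock
        if page is not None:
            return page
        async with self._locks[key]:
            page = self._cache.get(key)  # second check, holding the lock
            if page is not None:
                return page  # another request filled the cache while we waited
            self.invocations += 1
            page = await self._render(key)
            self._cache[key] = page
            return page

async def simulate_double_checked(concurrent_requests: int) -> int:
    async def render(key: str) -> str:
        await asyncio.sleep(0.05)  # simulated render time
        return f"<html>{key}</html>"

    cache = CollapsingCache(render)
    await asyncio.gather(*(cache.get("/pricing") for _ in range(concurrent_requests)))
    return cache.invocations

if __name__ == "__main__":
    print(asyncio.run(simulate_double_checked(50)))  # prints 1
```

The second check is what makes the waiting requests cheap: by the time they acquire the lock, the page is already cached and they return without invoking the function.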

What happens to queued requests if regeneration takes longer than the collapse window?

Lock timeouts are set to 3 seconds at both the node and regional levels. If a request cannot acquire the lock within that window, it abandons waiting and invokes the function independently as a hedging mechanism. Slow regenerations (database-heavy pages, slow upstream APIs) can break through collapsing and cause the same fan-out the feature is designed to prevent. Teams with regeneration times approaching 3 seconds should monitor invocation counts for hot keys during peak traffic.
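The break-through behavior can be reproduced by adding a timeout to lock acquisition. This is a sketch, assuming a single lock level and scaled-down timings so the simulation runs quickly; `HedgedCollapser` and `simulate_hedging` are hypothetical names, and only the 3-second default comes from the text.

```python
import asyncio
from collections import defaultdict

class HedgedCollapser:
    """Collapse behind a per-key lock, but invoke independently on lock timeout."""

    def __init__(self, render, lock_timeout: float = 3.0) -> None:
        self._render = render
        self._lock_timeout = lock_timeout
        self._cache: dict = {}
        self._locks = defaultdict(asyncio.Lock)
        self.invocations = 0

    async def _regenerate(self, key: str) -> str:
        self.invocations += 1
        page = await self._render(key)
        self._cache[key] = page
        return page

    async def get(self, key: str) -> str:
        if key in self._cache:
            return self._cache[key]
        lock = self._locks[key]
        try:
            await asyncio.wait_for(lock.acquire(), self._lock_timeout)
        except asyncio.TimeoutError:
            # Hedge: stop waiting and invoke the function without the lock.
            return await self._regenerate(key)
        try:
            if key in self._cache:  # filled while we waited for the lock
                return self._cache[key]
            return await self._regenerate(key)
        finally:
            lock.release()

async def simulate_hedging(render_seconds: float, lock_timeout: float, concurrent: int = 5) -> int:
    async def render(key: str) -> str:
        await asyncio.sleep(render_seconds)  # simulated render time
        return f"<html>{key}</html>"

    collapser = HedgedCollapser(render, lock_timeout)
    await asyncio.gather(*(collapser.get("/pricing") for _ in range(concurrent)))
    return collapser.invocations

if __name__ == "__main__":
    # Render finishes well inside the lock timeout: requests collapse.
    print(asyncio.run(simulate_hedging(render_seconds=0.01, lock_timeout=0.5)))  # prints 1
    # Render outlasts the lock timeout: every waiter hedges.
    print(asyncio.run(simulate_hedging(render_seconds=0.3, lock_timeout=0.05)))  # prints 5
```

When the render outlasts the timeout, each waiting request invokes on its own and the fan-out returns.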

Are the 3M collapsed requests per day measured globally or per region?

The 3M cache-miss collapses and 90M background revalidation collapses are global totals self-reported by Vercel, not independently audited. Because collapsing operates independently per region, a site whose traffic is spread thinly across many regions may see less deduplication than one whose traffic is concentrated in a single region, since each region then has fewer same-key concurrent requests to collapse.