Vercel's CDN Origin Timeout Jumps to 2 Minutes: A Concession to LLM Streaming Workloads

Vercel raised the CDN origin timeout from 30 seconds to 120 seconds, explicitly citing LLM generation and complex data queries as the driver. The change is available on all plans at no extra cost, requires no configuration, and directly addresses the class of streaming AI workloads that were hitting 504 gateway timeouts before the backend could emit a single byte.

What Changed

Vercel’s changelog entry on the CDN origin timeout raises the ceiling for proxied requests from 30 seconds to 120 seconds. The ceiling is a first-byte deadline: once the first byte is received, the backend can take longer than two minutes to complete the request as long as it keeps sending data at least once every 120 seconds. The changelog carries a publication date of May 8, 2025, while the Vercel Weekly for 2026-05-18 does not mention the change. That leaves a date ambiguity worth noting: this may be less a recent change than one that went largely unremarked when it first appeared.

The scope of the change matters because the proxied request timeout is not the same as the function execution timeout. Vercel Functions with Fluid Compute (enabled by default) have a 300-second default max duration and an 800-second ceiling on Pro and Enterprise plans. The CDN origin timeout governs the edge proxy layer, the hop between Vercel’s CDN node and the origin function. Under the old ceiling, a function could be allowed to run for five minutes, yet if it had sent nothing by the 30-second mark, the CDN proxy gave up and the result never reached the client.

The Problem It Solves

The 30-second ceiling was a hard constraint for any LLM streaming endpoint whose model needed time to think before responding. One developer running Gemini 2.5 Flash streaming on the Next.js Edge runtime reported a reproducible 120-second cutoff on SSE responses with 35k-40k token prompts, even with keep-alive pings every 10 seconds. The cutoff in that report is 120 seconds rather than 30, which lines up with the new ceiling and suggests the change either addressed or coincided with that specific pain point.

Before this change, the practical options for teams running long-context or agentic LLM chains on Vercel were limited:

Route streaming traffic through a separate infrastructure layer (Cloudflare Workers, a raw origin server) that did not enforce a 30-second first-byte deadline.
Accept 504s on any request where the model needed more than 30 seconds of reasoning before emitting tokens.
Avoid slow-first-byte models entirely and stick to fast-response endpoints.

Vercel’s own timeout guidance recommends streaming for AI applications: “If building AI applications, we recommend streaming. Always send an HTTP response (even an error).” Good advice, but it only works if the CDN proxy layer does not kill the connection before the stream starts.
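As a rough illustration of that advice, the sketch below wraps a token source in an SSE stream that writes a comment frame immediately and then at a fixed interval until the model finishes. It uses only web-standard APIs (ReadableStream, TextEncoder, Response). The names sseStreamWithHeartbeat and toSseResponse and the 15-second default are hypothetical. Whether a comment frame counts as data for the proxy's first-byte and idle checks is an assumption, and the developer report cited earlier suggests keep-alive pings do not help in every case.

```typescript
// Sketch only. Assumption: an SSE comment frame (": ...") counts as data for the
// CDN proxy's first-byte and idle checks. The changelog does not confirm this.
function sseStreamWithHeartbeat(
  tokens: AsyncIterable<string>,
  heartbeatMs: number = 15_000, // hypothetical default, well under the 120s window
): ReadableStream<Uint8Array> {
  const encoder = new TextEncoder();
  let timer: ReturnType<typeof setInterval> | undefined;
  let stopped = false;

  return new ReadableStream<Uint8Array>({
    start(controller) {
      const send = (frame: string) => {
        if (!stopped) controller.enqueue(encoder.encode(frame));
      };
      const stop = () => {
        if (stopped) return;
        stopped = true;
        clearInterval(timer);
        controller.close();
      };

      // Emit something right away so the response has a first byte
      // before the model has produced any tokens.
      send(": connected\n\n");
      timer = setInterval(() => send(": ping\n\n"), heartbeatMs);

      // Pump tokens in the background; start() itself returns immediately.
      void (async () => {
        try {
          for await (const token of tokens) {
            if (stopped) return;
            send(`data: ${JSON.stringify(token)}\n\n`);
          }
        } catch {
          // Always send a response, even when the upstream call fails.
          send('event: error\ndata: "upstream failed"\n\n');
        } finally {
          stop();
        }
      })();
    },
    cancel() {
      // Client went away: stop the heartbeat and drop any further tokens.
      stopped = true;
      clearInterval(timer);
    },
  });
}

// Wraps the stream in a Response with SSE headers.
function toSseResponse(tokens: AsyncIterable<string>, heartbeatMs?: number): Response {
  return new Response(sseStreamWithHeartbeat(tokens, heartbeatMs), {
    headers: {
      "Content-Type": "text/event-stream",
      "Cache-Control": "no-cache, no-transform",
    },
  });
}
```

In a Next.js Route Handler, the exported GET or POST function could return toSseResponse(...) with whatever async iterable the model client exposes. The heartbeat only keeps the connection busy. It does not extend the function's own max duration.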

How the Timeout Layers Stack Up

Vercel’s timeout model is layered, and conflating the layers is the most common source of confusion:

Layer	Timeout	Scope
CDN origin proxy	120s (was 30s)	First-byte deadline from CDN to origin
Function max duration	300s default, 800s ceiling (Pro/Enterprise)	Total function execution with Fluid Compute
Keepalive cadence	120s between data frames	Must send at least one byte per window after first byte

The CDN origin timeout is the narrowest constraint for AI streaming. A model that takes 90 seconds to start emitting tokens would have failed under the old 30-second ceiling regardless of how long the function was allowed to run.
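To make the layering concrete, here is a small TypeScript sketch that takes a response timeline (when each chunk is emitted and when the function finishes) and reports which layer, if any, would cut it off first. The limits are the figures quoted in this article, hard-coded as assumptions rather than read from any Vercel API. checkTimeline and its types are hypothetical names.

```typescript
// Limits quoted in the article; assumptions, not values read from any Vercel API.
interface Limits {
  firstByteS: number;   // CDN proxy: deadline for the first byte from the origin
  idleS: number;        // CDN proxy: longest allowed silence after the first byte
  maxDurationS: number; // function runtime: total execution budget
}

const DEFAULT_LIMITS: Limits = { firstByteS: 120, idleS: 120, maxDurationS: 300 };

type Verdict =
  | { ok: true }
  | { ok: false; layer: "cdn-first-byte" | "cdn-idle" | "function-duration"; atS: number };

// emitTimesS: seconds (from request start) at which the origin sends a chunk.
// finishS: second at which the function ends the response.
function checkTimeline(
  emitTimesS: number[],
  finishS: number,
  limits: Partial<Limits> = {},
): Verdict {
  const { firstByteS, idleS, maxDurationS } = { ...DEFAULT_LIMITS, ...limits };
  // Activity the proxy can observe: each emitted chunk, then the end of the response.
  const events = [...emitTimesS.filter((t) => t < finishS).sort((a, b) => a - b), finishS];

  let lastActivityS = 0;
  let sawFirstByte = false;
  for (const t of events) {
    const deadlineS = lastActivityS + (sawFirstByte ? idleS : firstByteS);
    // The proxy gives up first if its deadline passes before the function budget does.
    if (t > deadlineS && deadlineS < maxDurationS) {
      return { ok: false, layer: sawFirstByte ? "cdn-idle" : "cdn-first-byte", atS: deadlineS };
    }
    // Otherwise the function runtime is the layer that ends the request.
    if (t > maxDurationS) {
      return { ok: false, layer: "function-duration", atS: maxDurationS };
    }
    lastActivityS = t;
    sawFirstByte = true;
  }
  return { ok: true };
}

// A model that starts emitting at 90s passes under the new ceiling...
console.log(checkTimeline([90, 100, 110], 115));
// → { ok: true }
// ...and would have failed under the old 30s one.
console.log(checkTimeline([90, 100, 110], 115, { firstByteS: 30 }));
// → { ok: false, layer: 'cdn-first-byte', atS: 30 }
// A fast first byte followed by a long silence trips the idle window.
console.log(checkTimeline([10, 200], 210));
// → { ok: false, layer: 'cdn-idle', atS: 130 }
```

The third case is the one that is easy to miss: a fast first byte does not help if the stream then goes quiet for longer than the idle window.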

The Competitive Angle

The 30-second CDN origin timeout was a constraint specific to Vercel’s proxy layer. Other edge platforms handle long-lived proxied connections with different timeout models, and the constraint was a common reason teams adopted split architectures: static assets through Vercel, SSE streams through a separate origin.

The practical effect of Vercel’s change is that teams already invested in the Vercel ecosystem no longer need that split deployment. Whether 120 seconds is sufficient depends on the model: fast-response models like GPT-4o-mini or Claude Haiku will never approach the limit, while deep-reasoning models and long-context agentic chains that need 60-90 seconds of thinking before the first token now fall within the CDN’s patience.

What’s Still Missing

The change addresses the most common failure mode, but gaps remain.

Deep-reasoning models that exceed 120 seconds of initial thinking before any token emission will still fail at the CDN layer. The 4x increase from 30 to 120 seconds covers most current production workloads, but the trajectory of model complexity suggests that thinking times will continue to grow. A model that spends three minutes on chain-of-thought reasoning before responding is not served by a two-minute ceiling.

The date ambiguity in the changelog is also worth flagging. The changelog entry lists May 8, 2025. The Vercel Weekly covering May 8-15, 2026 does not mention the change. If this change shipped in May 2025, it predates much of the current wave of long-context model adoption and may have been a forward-looking adjustment that Vercel did not promote heavily at the time. If the date is a typo for May 2026, the change is recent but undocumented in the weekly roundup. Either way, the infrastructure analysis holds: the 30-second ceiling was a real constraint, the increase to 120 seconds removes it for most current workloads, and the remaining gaps will matter as models get slower to start and faster once they begin emitting.

Frequently Asked Questions

Does the CDN timeout increase apply to Vercel Hobby plans?

The CDN origin timeout is identical across all tiers at 120 seconds. The plan difference emerges at the function runtime layer: Hobby deployments are capped at 300 seconds of total function execution with Fluid Compute, while Pro and Enterprise plans can extend to 800 seconds. A streaming endpoint on Hobby that fits within the CDN proxy window but needs more than five minutes of total function execution will still fail.

How does Cloudflare Workers handle the same long-lived streaming scenario?

Cloudflare Workers enforce a 30-second CPU-time limit that measures actual compute cycles, not wall-clock elapsed time. An SSE connection that is idle while waiting for an LLM’s next token burst does not consume CPU time, so there is no equivalent first-byte or keepalive timeout at the proxy layer. The tradeoff is that Workers lack an integrated function runtime: teams must build their own streaming proxy and manage the origin server that calls the LLM API, rather than deploying a single serverless function.

Can the 120-second ceiling be raised per route for models that need more time?

The changelog documents a single fixed ceiling with no per-route or per-deployment configuration option. Teams whose models consistently exceed two minutes before emitting the first byte must route streaming traffic outside Vercel’s CDN entirely, typically through Cloudflare Workers or a dedicated origin. Vercel Workflows offer unlimited execution duration but use a different invocation and orchestration model than standard serverless functions, so they are not a direct substitute for a higher CDN proxy timeout.

Does the CDN timeout affect Edge Functions and Route Handlers, or only Serverless Functions?

The CDN origin timeout governs the proxy hop between Vercel’s CDN nodes and the origin, so it applies to any proxied request regardless of what serves it: Serverless Functions, Edge Functions, and Next.js Route Handlers all pass through the same CDN proxy layer. The execution-time limits differ by runtime, but the 120-second first-byte constraint is a property of that CDN hop, not the compute tier behind it.