Vercel’s changelog entry extends the CDN-to-origin timeout from 30 seconds to 120 seconds, effective immediately across all plans at no additional cost. The change is not a removal of a failure boundary. It is a relocation: the edge will now hold a slow-origin connection four times longer before issuing a 504, which shifts slow-backend detection from the CDN onto your application’s own timeout and cancellation logic.
What did Vercel actually change?
The specific rule: the CDN now waits up to 120 seconds for the first byte from the origin, up from 30 seconds. Once that first byte arrives, the connection can stay open past two minutes, provided the backend sends at least one byte every 120 seconds. These are two separate windows that apply in sequence: one before the first byte, and one between bytes after it. No per-route or per-deployment override exists; the ceiling is uniform across all plans and all routes.
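As a rough illustration of the two windows, the sketch below checks a timeline of byte arrivals against both limits. The function name `survivesProxy` and its parameters are hypothetical, and treating a gap of exactly 120 seconds as acceptable is an assumption; the changelog wording does not settle that edge case.

```typescript
// Hypothetical model of the two CDN windows described above; not a Vercel API.
// byteTimesSec: seconds since request start at which the origin sent bytes, ascending.
function survivesProxy(
  byteTimesSec: number[],
  firstByteLimitSec = 120, // window 1: time allowed before the first byte
  idleLimitSec = 120,      // window 2: longest allowed silence between later bytes
): boolean {
  if (byteTimesSec.length === 0) return false; // origin never sent anything
  if (byteTimesSec[0] > firstByteLimitSec) return false; // first-byte window missed
  for (let i = 1; i < byteTimesSec.length; i++) {
    // Assumption: a gap of exactly idleLimitSec is still tolerated.
    if (byteTimesSec[i] - byteTimesSec[i - 1] > idleLimitSec) return false;
  }
  return true;
}

console.log(survivesProxy([45, 100, 219]));     // true: first byte at 45 s, gaps of 55 s and 119 s
console.log(survivesProxy([45, 100, 219], 30)); // false: under the old 30 s first-byte limit
console.log(survivesProxy([45, 170]));          // false: 125 s of silence after the first byte
```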
Vercel identifies the target workloads directly in the entry: LLM generation and complex data queries. The framing is accurate as far as it goes, and the reduction in 504 gateway timeouts on slow-first-byte requests is real.
One timing note: the changelog page itself carries a publication date of May 8, 2025, which conflicts with the “June 2026 changelog” framing and with the entry’s absence from recent June 2026 changelog pages. That discrepancy is unresolved. The behavioral specification in the entry is not in dispute.
Which timeout layer does this affect?
Vercel has at least three timeout boundaries that interact, and conflating them is the main source of confused debugging.
The CDN origin proxy timeout is what changed: the edge now waits up to 120 seconds, up from 30, for the first byte on the edge-to-origin hop. If the backend starts streaming within 120 seconds, the CDN stays connected. If it does not, the edge drops the connection and returns a 504, regardless of whether the function itself is still running.
The function maxDuration is a separate limit on how long the underlying serverless function is allowed to execute. Vercel’s function limits docs set the Fluid Compute default at 300 seconds across all plans; Pro and Enterprise can configure up to 800 seconds (GA) or 1,800 seconds (beta). Exceeding maxDuration returns a 504 FUNCTION_INVOCATION_TIMEOUT, which is a different error path from a CDN proxy timeout.
The Edge runtime first-byte rule is a third boundary: Edge Functions must begin sending a response within 25 seconds, after which they can continue streaming for up to 300 seconds. This 25-second first-byte requirement is distinct from both the CDN proxy change and maxDuration, and the CDN change does not affect it.
The practical consequence: a function can run for 300 seconds and emit data correctly, but if the CDN proxy gives up first, the client gets a 504 before the function’s output ever arrives. That proxy-vs-function race was why the 30-second ceiling mattered; the 120-second ceiling changes where the race is lost, not whether it exists.
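The race between the proxy and the function can be made concrete with a small model. This is a sketch under stated assumptions: the type and function names are hypothetical, only the first-byte window and maxDuration are modeled (the Edge runtime rule and the between-bytes window are left out), and the limit values are the ones quoted in the text.

```typescript
// Hypothetical model of which boundary fires first; not a Vercel API.
type Outcome = "ok" | "cdn_proxy_504" | "function_invocation_timeout";

interface RequestProfile {
  firstByteSec: number; // when the origin would send its first byte
  totalSec: number;     // how long the function needs to finish
}

interface Limits {
  cdnFirstByteSec: number; // 30 before the change, 120 after
  maxDurationSec: number;  // e.g. the 300-second default cited above
}

function firstFailure(req: RequestProfile, limits: Limits): Outcome {
  // Each boundary fires at its own limit, but only if the request would exceed it.
  const cdnFiresAt =
    req.firstByteSec > limits.cdnFirstByteSec ? limits.cdnFirstByteSec : Infinity;
  const fnFiresAt =
    req.totalSec > limits.maxDurationSec ? limits.maxDurationSec : Infinity;
  if (cdnFiresAt === Infinity && fnFiresAt === Infinity) return "ok";
  // Whichever boundary is reached earlier decides the error the client sees.
  return cdnFiresAt <= fnFiresAt ? "cdn_proxy_504" : "function_invocation_timeout";
}

const slowStart: RequestProfile = { firstByteSec: 45, totalSec: 200 };
console.log(firstFailure(slowStart, { cdnFirstByteSec: 30, maxDurationSec: 300 }));  // cdn_proxy_504
console.log(firstFailure(slowStart, { cdnFirstByteSec: 120, maxDurationSec: 300 })); // ok
```

The same request profile fails under the old limit and succeeds under the new one; a profile with a 150-second first byte would still lose the race at the proxy.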
Where did the failure boundary go?
The boundary moved from 30 seconds to 120 seconds before first byte. For a backend that completes successfully in under 30 seconds, the change is invisible. For one that takes 40 to 90 seconds to start streaming, a plausible range for deep-reasoning model inference, the CDN used to terminate the connection before first byte; now it waits. That is a real expansion of the success window.
What the expanded window does not change: a backend that is wedged, crashed, or stuck in a blocking call still eventually times out. The CDN now holds that connection for 120 seconds before giving up instead of 30, so each failed request can tie up its connection, and the resources behind it, for up to four times as long.
Under the previous regime, a fast-failing CDN was also a fast-detecting CDN. At 30 seconds, the edge returned a 504, the client saw an error, and the connection was closed. Slow-origin detection was, in effect, outsourced to the CDN. That implicit contract is now gone.
What breaks first at 120 seconds?
The most constrained resource in a wedged-connection scenario is Vercel’s shared file descriptor pool: functions share 1,024 file descriptors across all concurrent executions, including runtime overhead. A held-open connection to a wedged LLM backend occupies one of those descriptors for up to 120 seconds instead of 30. Under load, with multiple concurrent requests hitting a slow or unresponsive upstream, that pool saturates faster.
The failure mode is not a clean error. When file descriptors exhaust, new connections cannot open, which manifests as failures on unrelated requests sharing the same function instance. A wedged LLM call can impair a health check or a database query running in a different concurrent execution. The 4x longer hold time makes this class of failure more likely and more persistent.
Concurrency pressure is the second failure mode. If your function serves 10 concurrent requests and 3 of them wedge on upstream LLM calls, those 3 slots stay occupied for up to 120 seconds. The CDN change does not introduce this problem; it prolongs it.
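A back-of-envelope way to size the effect is Little’s law: the number of connections held at steady state equals the arrival rate of wedged requests multiplied by how long each one is held. The sketch below assumes one descriptor per wedged request, a single function instance, no runtime overhead, and that every wedged call is held for the full CDN window. All of those are simplifications, and the function names are hypothetical.

```typescript
// Hypothetical sizing helpers; the 1,024 figure is the shared budget cited in the text.
const FD_LIMIT = 1024;

// Little's law: items in the system = arrival rate x time each item stays.
function heldDescriptors(wedgedPerSec: number, holdSec: number): number {
  return wedgedPerSec * holdSec;
}

// Arrival rate of wedged requests at which the whole budget is in use.
function saturatingRate(holdSec: number, fdLimit: number = FD_LIMIT): number {
  return fdLimit / holdSec;
}

console.log(heldDescriptors(5, 30));             // 150
console.log(heldDescriptors(5, 120));            // 600
console.log(saturatingRate(30).toFixed(1));      // 34.1
console.log(saturatingRate(120).toFixed(1));     // 8.5
```

Under these assumptions, the rate of wedged requests needed to exhaust the pool drops by a factor of four, which is the same ratio as the timeout change.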
What does your application have to handle now?
The CDN previously acted as a rough backstop: if the backend was too slow, the edge killed the connection. Teams that never implemented application-level timeouts on their LLM API calls could rely on that 30-second guardrail to bound worst-case latency, at the cost of a 504 on slow requests.
At 120 seconds, that implicit backstop fires four times later. An LLM call that hangs on the provider side, whether from a stalled stream or from a provider outage that returns HTTP 200 but emits no tokens, will hold its connection open for two minutes before the edge terminates it. That is long enough to cause visible user-facing stalls and long enough to exhaust the file descriptor pool under concurrent load.
Three things the application now owns explicitly:
Application-level timeouts on upstream calls. If you want fail-fast behavior at, say, 15 seconds without a first token, that timer lives in your code. AbortController with a deadline is the standard mechanism; the CDN will not do this for you at a useful granularity.
Streaming keepalive versus generation progress. The CDN’s 120-second keepalive window (post-first-byte) measures data flow, not token output. A backend can emit whitespace or a heartbeat byte every 119 seconds and the CDN will stay connected. Detecting a stalled generation requires tracking time-between-substantive-tokens, not just time-since-last-byte.
Graceful degradation on timeout. When your application-level timer fires and you cancel the upstream call, return a partial response or a user-facing error immediately rather than waiting for the CDN to time out. The CDN will eventually catch a wedged connection, but at 120 seconds most users have already abandoned the request.
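The three responsibilities above can be sketched as one wrapper around the upstream stream. This is a minimal illustration, not a definitive implementation: `guardStream`, `UpstreamStallError`, and the `isSubstantive` predicate are hypothetical names, and the deadlines are whatever the caller chooses (the 15-second figure above is only an example). The wrapper enforces a first-token deadline, resets its timer only on substantive chunks so heartbeat bytes do not count as progress, and aborts the caller’s `AbortController` when the budget runs out.

```typescript
// Hypothetical helper, not part of any SDK.
class UpstreamStallError extends Error {
  phase: "first-token" | "mid-stream";
  constructor(phase: "first-token" | "mid-stream", budgetMs: number) {
    super(`no substantive output within ${budgetMs} ms (${phase})`);
    this.name = "UpstreamStallError";
    this.phase = phase;
  }
}

async function* guardStream<T>(
  source: AsyncIterable<T>,
  controller: AbortController,          // the same controller whose signal the upstream call uses
  firstTokenMs: number,                 // deadline for the first substantive chunk
  stallMs: number,                      // max gap between substantive chunks afterwards
  isSubstantive: (chunk: T) => boolean, // false for whitespace or heartbeat chunks
): AsyncGenerator<T> {
  const it = source[Symbol.asyncIterator]();
  let budgetMs = firstTokenMs;
  let phase: "first-token" | "mid-stream" = "first-token";
  let lastProgress = Date.now();

  while (true) {
    // Time the consumer spends between chunks also counts against the budget.
    const remaining = Math.max(0, budgetMs - (Date.now() - lastProgress));
    let timer: ReturnType<typeof setTimeout> | undefined;
    const timeout = new Promise<"timeout">((resolve) => {
      timer = setTimeout(() => resolve("timeout"), remaining);
    });
    const result = await Promise.race([it.next(), timeout]);
    clearTimeout(timer);

    if (result === "timeout") {
      controller.abort(); // cancel the upstream call instead of waiting for the CDN
      throw new UpstreamStallError(phase, budgetMs);
    }
    if (result.done) return;
    if (isSubstantive(result.value)) {
      // Only real output resets the clock; heartbeats are passed through but ignored.
      lastProgress = Date.now();
      budgetMs = stallMs;
      phase = "mid-stream";
    }
    yield result.value;
  }
}
```

In a handler, the controller’s `signal` would be passed to the upstream `fetch` call, and a caught `UpstreamStallError` would be turned into a partial response or an immediate user-facing error rather than left for the edge to time out.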
Where does the ceiling still cut?
The 120-second first-byte window covers most current streaming LLM workloads, but it is not an open-ended window. Deep-reasoning models with extended thinking budgets can exceed 120 seconds before emitting a first token, a pattern that was uncommon when the original 30-second limit was set but has become routine for multi-step planning tasks.
For those workloads, the changelog offers no per-route override. If your origin exceeds 120 seconds before sending any bytes, the CDN returns a 504 regardless of your function’s maxDuration configuration. Teams that need to support first-byte latencies above 120 seconds must route those requests outside Vercel’s CDN proxy hop entirely.
The other ceiling that did not move: the Edge runtime’s 25-second first-byte requirement. An Edge Function must still start responding within 25 seconds. The CDN origin timeout change applies to the CDN-to-origin proxy layer, not to the Edge runtime’s own response-start requirement.
The 30-second ceiling was cutting valid requests short on workloads that happen to be common now. The 120-second ceiling solves that class of failure. What it does not solve, and modestly worsens, is the cost of failure when backends are genuinely slow or wedged. Fail-fast is now the application’s job.
Frequently Asked Questions
How does the longer ceiling affect function compute billing?
Vercel bills function execution by wall-clock duration, so a wedged upstream call held open for the full 120 seconds costs roughly four times the compute of the old 30-second regime. The longer ceiling multiplies per-request compute spend on slow or dead backends, not just the observed 504 rate.
Is the 120-second ceiling configurable per route or tier?
No. The changelog sets a uniform 120-second first-byte ceiling across all plans, including Hobby, with no per-route or per-deployment override on any tier. This contrasts with function maxDuration, which is tier-gated: the 300-second Fluid Compute default applies everywhere, but anything above 300 seconds requires Pro or Enterprise, capping at 800 seconds GA or 1,800 seconds in beta.
Can Vercel Workflows absorb requests that exceed 120 seconds to first byte?
Not as a drop-in replacement. Vercel Workflows use a different invocation model than the synchronous CDN proxy hop, so they do not satisfy the streaming-proxy contract the timeout governs. Teams whose origins exceed 120 seconds before first byte must route those specific requests outside Vercel’s CDN proxy layer entirely rather than migrating the workload to Workflows.
How does general trade press coverage frame this change differently?
Most coverage treats the 30-to-120-second extension as a pure reliability win that removes a streaming constraint. The failure-cost view inverts that: the longer ceiling relocates the boundary, prolongs held connections, and pushes slow-backend detection onto application-owned timeout logic, a net cost on genuinely slow backends rather than an unqualified improvement.
What should teams monitor now that the CDN no longer fails fast?
Instrument time-to-first-token distributions and count connections held near the 120-second mark, both of which were invisible when the edge cut slow origins at 30 seconds. Logging when AbortController fires in production becomes the primary signal for upstream provider stalls, replacing the 504 rate that previously served as the rough health indicator.
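A minimal sketch of those signals, assuming the application records one sample per upstream call. The `RequestSample` shape, the nearest-rank percentile, and the 90 percent “near the ceiling” threshold are all illustrative choices, not anything Vercel specifies.

```typescript
// Hypothetical per-request record collected by the application.
interface RequestSample {
  ttftMs: number | null; // time to first token; null if no token ever arrived
  heldMs: number;        // how long the upstream connection was held open
  abortedByApp: boolean; // true if the application's own deadline fired
}

// Nearest-rank percentile over an unsorted list of values.
function percentile(values: number[], p: number): number {
  if (values.length === 0) return NaN;
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
  return sorted[rank - 1];
}

function summarize(samples: RequestSample[], ceilingMs = 120000, nearFraction = 0.9) {
  const ttfts = samples
    .map((s) => s.ttftMs)
    .filter((v): v is number => v !== null);
  return {
    ttftP50Ms: percentile(ttfts, 50),
    ttftP95Ms: percentile(ttfts, 95),
    // Connections that came close to the CDN ceiling before ending.
    nearCeiling: samples.filter((s) => s.heldMs >= ceilingMs * nearFraction).length,
    // How often the application-level deadline, not the CDN, ended the call.
    appAborts: samples.filter((s) => s.abortedByApp).length,
  };
}
```

A rising `appAborts` count alongside a flat 504 rate would suggest the application deadline is catching upstream stalls before the edge does.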