The Vercel-AWS Deal Reveals Where AI Inference Runs

On May 27, 2026, AWS announced a databases integration that lets developers provision Aurora PostgreSQL, DynamoDB, and Aurora DSQL directly from the Vercel dashboard and the v0 assistant. Read as an architecture decision rather than a procurement convenience, the move clarifies what the product already implied: Vercel’s edge was never running the AI, and the stateful layer it depends on now sits in AWS regions.

What the Vercel-AWS deal actually moves

The partnership is older than the latest announcement suggests. The Strategic Collaboration Agreement (SCA) dates to re:Invent on December 9, 2024; the genuinely new artifact is the databases integration announced on May 27, 2026. Vercel’s announcement framed the original SCA as a three-year AWS investment in Vercel AI products: v0.dev, the open-source AI SDK (explicitly including AWS Bedrock), one-click AI Integrations, and AI Templates.

Until now, that collaboration has lived mostly in go-to-market form: v0 on AWS Marketplace, procurement convenience, and retail and consumer-goods competencies. The databases integration is the first time the SCA concretely touches the stateful data tier, and it arrives paired with an H0 hackathon offering $160,000 in prizes, with a June 29, 2026, deadline, per coverage on aws-news.com.

That data-tier move is the part worth reading closely, because it is the first concrete sign of where Vercel-backed AI workloads actually run.

Edge Functions were never running your inference

Vercel’s edge runtime was architecturally incapable of running inference, and the documented production pattern never asked it to. According to a production-architecture analysis from markaicode, Edge Functions cap responses at roughly 512 KB, time out after about 10 seconds, and have no GPU access. The documented AI pattern routes model calls to external APIs such as OpenAI and Bedrock, caches responses in Vercel KV at the edge, and uses Background Functions (300-second timeout) for async logging.

Vercel frames the edge’s lack of GPU as deliberate design rather than a deficiency: you route to an external API and cache the result. The partnership changes which external API does the heavy lifting. The AI SDK’s Amazon Bedrock provider (@ai-sdk/amazon-bedrock, available since June 2024 per Vercel’s changelog) makes AWS-hosted models like Llama 3 70B a first-class inference path inside Vercel AI apps.
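The route-and-cache pattern can be sketched as a cache-aside wrapper. This is a minimal sketch under stated assumptions: an in-memory `Map` stands in for Vercel KV, and `ModelCall`, `KeyValueStore`, `InMemoryStore`, and `cachedCompletion` are hypothetical names, not Vercel or AI SDK APIs. The model call is injected, so it could wrap OpenAI, the Bedrock provider, or a test stub.

```typescript
// Hypothetical signature for an external model API call (OpenAI, Bedrock, etc.).
type ModelCall = (prompt: string) => Promise<string>;

// Minimal key-value interface; the in-memory class below stands in for Vercel KV.
interface KeyValueStore {
  get(key: string): Promise<string | undefined>;
  set(key: string, value: string): Promise<void>;
}

class InMemoryStore implements KeyValueStore {
  private readonly data = new Map<string, string>();

  async get(key: string): Promise<string | undefined> {
    return this.data.get(key);
  }

  async set(key: string, value: string): Promise<void> {
    this.data.set(key, value);
  }
}

// Cache-aside: serve from the cache when possible; on a miss, route the
// prompt to the external model API and store the answer for next time.
async function cachedCompletion(
  prompt: string,
  cache: KeyValueStore,
  callModel: ModelCall,
): Promise<{ text: string; cacheHit: boolean }> {
  const key = `completion:${prompt}`;
  const cached = await cache.get(key);
  if (cached !== undefined) {
    return { text: cached, cacheHit: true };
  }
  // Inference happens behind the external API, not in this function.
  const text = await callModel(prompt);
  await cache.set(key, text);
  return { text, cacheHit: false };
}
```

The sketch caches whole completions, so it does not cover streamed responses; a production version would also normalize or hash the prompt into the key and set an expiry, both omitted here.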

Vercel’s own AI Gateway leaderboard, dated May 23, 2026, reinforces the point. Its two most-routed models are Gemini 3 Flash (16.8%) and Claude Opus 4.7 (13.5%), and neither executes on Vercel’s edge infrastructure. Vercel markets itself as “the AI Cloud,” yet the inference does not run on Vercel.

Why Fluid Compute makes streaming LLM responses expensive

Fluid Compute replaced Vercel’s earlier per-invocation pricing with Active CPU billing, according to an edge-cases.com analysis that dates the rollout to early 2026. That billing model penalizes exactly what an AI app spends most of its time doing: waiting on a model.

Vercel’s pricing documentation states that Active CPU is billed only while code is executing, not during I/O waits such as database queries or AI model calls. Provisioned memory, however, is billed for the entire instance lifetime and keeps accruing during I/O until the last in-flight request finishes.

The consequence for AI workloads, per the same edge-cases.com analysis: CPU-bound workloads, including LLM inference, image processing, and video transcoding, get more expensive under Active CPU billing, while I/O-bound database and webhook workloads get cheaper. The specific trap for LLM apps is streaming. A streaming response keeps the function alive for the full token stream, so provisioned memory accrues for the entire duration even though the function spends most of that time waiting on the model API.
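A rough per-request cost model makes the streaming trap concrete. This sketch assumes Vercel’s listed iad1 Active CPU rate ($0.128 per hour) and the low end of its provisioned-memory range ($0.0106 per GB-hour); the request profile (10 ms of CPU, a 20-second stream, 1 GB of memory) is hypothetical, and `Rates`, `RequestProfile`, and `requestCost` are illustrative names.

```typescript
// Billing rates in USD. Vercel revises these, so treat them as inputs.
interface Rates {
  activeCpuPerHour: number; // billed only while code executes
  memoryPerGbHour: number; // billed for the instance's whole lifetime
}

// Hypothetical profile of one function invocation.
interface RequestProfile {
  activeCpuMs: number; // time spent actually executing code
  wallClockMs: number; // time the function stays alive, including I/O waits
  memoryGb: number; // provisioned memory
}

const MS_PER_HOUR = 3600000;

function requestCost(p: RequestProfile, r: Rates): { cpu: number; memory: number; total: number } {
  // Active CPU accrues only for execution time, so waiting on the model API adds nothing here.
  const cpu = (p.activeCpuMs / MS_PER_HOUR) * r.activeCpuPerHour;
  // Provisioned memory accrues for the full wall-clock duration, stream included.
  const memory = (p.wallClockMs / MS_PER_HOUR) * p.memoryGb * r.memoryPerGbHour;
  return { cpu, memory, total: cpu + memory };
}

// Assumed rates: iad1 Active CPU and the low end of the memory range.
const lowestRates: Rates = { activeCpuPerHour: 0.128, memoryPerGbHour: 0.0106 };

// Hypothetical streaming chat request: 10 ms of CPU over a 20 s token stream, 1 GB memory.
const streaming = requestCost({ activeCpuMs: 10, wallClockMs: 20000, memoryGb: 1 }, lowestRates);
```

With these inputs the memory line comes out roughly 166 times the CPU line (212 / 1.28 ≈ 165.6), because memory accrues for the whole 20-second stream while CPU accrues for 10 milliseconds.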

One caveat on the edge-cases figures: the analysis’s cost-spike math uses the author’s own CPU rate ($0.00025 per millisecond), not a Vercel-published number. The qualitative finding holds; the magnitude is the author’s estimate, not Vercel’s.
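Converting the analysis’s per-millisecond rate into the per-hour unit of Vercel’s pricing table shows how far apart the two figures sit. This assumes the $0.00025 figure is a rate per millisecond of Active CPU and compares it with the $0.128-per-hour rate Vercel lists for iad1:

```latex
0.00025\ \frac{\$}{\text{ms}} \times 3.6 \times 10^{6}\ \frac{\text{ms}}{\text{h}} = \$900\ \text{per hour},
\qquad
\frac{\$900/\text{h}}{\$0.128/\text{h}} \approx 7.0 \times 10^{3}
```

On that reading, the analysis’s rate is roughly 7,000 times Vercel’s lowest listed Active CPU rate, so its cost-spike figures illustrate the mechanism rather than predict a bill.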

How much the region choice changes your bill

The region where your inference-adjacent code runs materially changes the bill. Vercel’s current pricing documentation, accessed June 15, 2026, lists a wide regional spread in Active CPU rates:

Vercel region (as listed in docs)	Active CPU rate
Cleveland (cle1), Portland (pdx1), Washington, D.C. (iad1)	$0.128 / hour
São Paulo (gru1)	$0.221 / hour

Provisioned memory ranges from $0.0106 to $0.0183 per GB-hour across regions, per the same pricing table. By that table, São Paulo’s Active CPU rate is roughly 1.7 times that of the cheapest US regions. Vercel revises these figures, so treat the regional ranking, not the exact dollar amounts, as the stable reference point when you model cost.
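Keying rates by region ID rather than hard-coding dollar amounts keeps a cost model easy to refresh when Vercel revises its table. A minimal sketch using only the two Active CPU rates listed above; `RegionRates` and `regionMultiplier` are hypothetical names.

```typescript
// Active CPU rates in USD per hour, keyed by Vercel region ID.
// Figures come from the pricing table quoted above; refresh them when Vercel revises it.
type RegionRates = Record<string, number>;

const activeCpuPerHour: RegionRates = {
  iad1: 0.128, // Washington, D.C.
  gru1: 0.221, // São Paulo
};

// How many times more a region costs than a baseline region
// for the same amount of Active CPU time.
function regionMultiplier(rates: RegionRates, region: string, baseline: string): number {
  const rate = rates[region];
  const base = rates[baseline];
  if (rate === undefined || base === undefined) {
    throw new Error(`no rate listed for ${region} or ${baseline}`);
  }
  return rate / base;
}

const gruVsIad = regionMultiplier(activeCpuPerHour, "gru1", "iad1");
```

Here `gruVsIad` is about 1.73 (0.221 / 0.128). The quoted provisioned-memory range spans a similar factor (0.0183 / 0.0106 ≈ 1.73), although the table excerpt does not say which region sits at each end.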

That spread stands between a Next.js team and any clean “run close to the user” story. Vercel has spent years arguing for edge proximity, the idea that running code close to the user lowers latency. When the expensive work is region-pinned anyway (a stateful database in an AWS region, inference routed to a specific Bedrock endpoint), the edge-proximity argument narrows to the cache layer. The compute does not move.
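An expected-latency model shows why proximity narrows to the cache layer: a cache hit is served near the user, while a miss pays the trip to the pinned region plus database and model time there. All names and latency figures below are hypothetical placeholders, not measurements.

```typescript
// All inputs in milliseconds; the values used below are hypothetical.
interface LatencyProfile {
  userToEdgeMs: number; // round trip from the user to the nearest edge location
  edgeToRegionMs: number; // round trip from the edge to the pinned AWS region
  databaseMs: number; // query time inside the region
  modelMs: number; // time to first token from the model API
}

// Expected latency for a given cache hit rate in [0, 1].
// Only the userToEdgeMs term benefits from running close to the user.
function expectedLatencyMs(p: LatencyProfile, hitRate: number): number {
  if (hitRate < 0 || hitRate > 1) {
    throw new RangeError("hitRate must be in [0, 1]");
  }
  const hit = p.userToEdgeMs;
  const miss = p.userToEdgeMs + p.edgeToRegionMs + p.databaseMs + p.modelMs;
  return hitRate * hit + (1 - hitRate) * miss;
}
```

With a placeholder profile of 20 ms to the edge, 80 ms to the region, 10 ms of database time, and 500 ms to first token, a hit costs 20 ms and a miss 610 ms. Shaving edge latency can save at most 20 ms per request, while raising the hit rate from 0.5 to 0.8 cuts the expected latency from 315 ms to 138 ms.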

Where the stateful tier now lives

The May 27 integration moves the stateful layer into AWS regions and provisions it through Vercel’s UI, which is consistent with where the compute already was. It lands where edge functions were structurally weak: stateful, persistent, region-resident. Read architecturally, the deal aligns three layers that had already drifted apart in practice. The developer surface stays in Vercel. The stateful data tier moves to AWS regions. The inference stays where it always was, behind an external API, increasingly AWS Bedrock. The integration shortens the provisioning path between them; it does not relocate the compute.

What “AI on Vercel” actually is

Vercel sells “the AI Cloud,” but the architecture behind that label is an assembly, not a stack. The edge handles routing and caching; the stateful tier sits in AWS regions after this integration; the inference stays behind an external model API, increasingly Bedrock, and never executes on Vercel’s compute. The May 27 databases integration is what makes that topology concrete enough to draw. Read that way, the operational questions of billing, latency, and region reduce to which layer of the assembly you are actually paying for.

Frequently Asked Questions

Why does the databases integration offer Aurora PostgreSQL, DynamoDB, and Aurora DSQL instead of a single store?

Each targets a different stateful pattern. DynamoDB handles key-value workloads with single-digit-millisecond lookups, Aurora PostgreSQL covers relational data, and Aurora DSQL is built for multi-region active-active distributed SQL. Choosing among them is itself an architecture decision because they replicate and bill differently inside AWS regions.

How does the Vercel-AWS model differ from Cloudflare’s Workers AI?

Cloudflare executes inference on its own edge network through Workers AI, so model calls stay on Cloudflare infrastructure. Vercel’s edge only routes to external model APIs, and the AWS partnership formalizes that split rather than collapsing it: Vercel keeps routing and caching, while AWS regions hold the stateful tier and, increasingly through Bedrock, the inference.

What should a team monitor once Fluid Compute bills a streaming chat app?

The billing line that grows during a token stream is provisioned-memory duration, since Active CPU pauses during the model’s I/O wait and memory does not. The figure to track is instance wall-clock time multiplied by provisioned memory size, which gives the GB-hours billed. The lever that keeps it low is the cache hit rate in Vercel KV, because a cache hit returns without holding the function open for a token stream.
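The GB-hours figure and the effect of the cache hit rate can be sketched as follows. This assumes each request occupies its own instance for its full wall-clock time (no concurrency sharing), so it is an upper bound under Fluid Compute; `TrafficProfile` and `memoryGbHours` are hypothetical names, and the traffic figures are placeholders.

```typescript
// Hypothetical traffic profile for a streaming chat endpoint.
interface TrafficProfile {
  requests: number;
  cacheHitRate: number; // fraction of requests served from the cache, in [0, 1]
  missSeconds: number; // wall-clock time of a request that streams from the model
  hitSeconds: number; // wall-clock time of a request answered from the cache
  memoryGb: number; // provisioned memory per instance
}

// GB-hours = wall-clock hours times provisioned memory in GB.
// Assumes no two requests share an instance, so this overestimates the
// bill when Fluid Compute packs concurrent requests onto one instance.
function memoryGbHours(t: TrafficProfile): number {
  const misses = t.requests * (1 - t.cacheHitRate);
  const hits = t.requests * t.cacheHitRate;
  const wallClockSeconds = misses * t.missSeconds + hits * t.hitSeconds;
  return (wallClockSeconds / 3600) * t.memoryGb;
}
```

With 3,600 requests, 20-second streams, 0.2-second cache hits, and 1 GB of memory, the estimate falls from 20 GB-hours with no caching to about 10.1 GB-hours at a 50% hit rate.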

What would force a team to move AI compute off Vercel entirely?

Two thresholds: a workload that needs GPU access the edge cannot provide, or a job that exceeds Background Functions’ 300-second ceiling. Because the compute was already region-pinned to the database and Bedrock endpoint, the migration path is to run the function inside the same AWS region rather than rewrite the stack.