Where DeepSeek Weights Actually Run on Vercel's AI Gateway

DeepSeek’s share of tokens flowing through Vercel’s AI Gateway jumped from under 1% to 17% in a single month, according to gateway telemetry reported by TheNewStack. What that surge does not tell you is where the weights execute. The sources reviewed here confirm that Vercel operates a routing gateway and that DeepSeek traffic through it is climbing fast. None of them confirm that Vercel serves DeepSeek from Microsoft Azure, which is the premise the title turns on.

What is Vercel’s AI Gateway: a model host, or a routing layer?

Vercel markets its AI Model Gateway as a first-class piece of agentic infrastructure, sold alongside durable orchestration, sandboxed environments, and fluid compute, with the homepage positioning Vercel as the deployment substrate for agents built by Notion, Zapier, and Mintlify. That framing matters: a gateway is a routing and observability layer, not the place inference happens. When a Vercel app calls DeepSeek through the gateway, the request is dispatched to a backend, and the backend is where the GPU, the memory of your prompt, and the legal jurisdiction all live. The gateway decides the route; the backend owns the execution and the data path.

What the cached sources do not establish is which backend Vercel uses for DeepSeek. The Vercel homepage describes the gateway as a feature; the TheNewStack report quotes gateway telemetry showing DeepSeek’s usage climbing; Microsoft’s announcement describes a separate Azure-hosted DeepSeek evaluation for its own Copilot Cowork product. There is no primary Vercel or Microsoft document in the source set that confirms a Vercel-to-Azure DeepSeek routing integration. Read the title’s “via Azure” as the open question this article interrogates, not a confirmed fact.

Where can DeepSeek weights actually run? Three paths

Three hosting paths exist for DeepSeek weights today, and each carries a different trust story.

Path	Who hosts	Where prompts execute	Evidence
DeepSeek-native	DeepSeek’s own endpoint	China	The default endpoint behind the data-sovereignty objection
Hyperscaler-hosted	Microsoft Azure	US (Azure regions)	Microsoft’s Copilot Cowork evaluation
Third-party US inference	Atlas Cloud	US	Lindy’s 100% migration to DeepSeek v4

DeepSeek-native, China-hosted is the most direct path and the one that carries the full data-sovereignty objection. This is the endpoint procurement and legal teams reflexively reject.

Hyperscaler-hosted, US jurisdiction is the path the title is gesturing at. Microsoft confirmed on 2026-06-17 that it is evaluating a fine-tuned DeepSeek V4 on Azure for Copilot Cowork, with execution fully inside Azure so customer data inherits Azure’s enterprise security, compliance, and data-residency protections. Launch of the low-cost option was expected within weeks of that date. Note what this actually is: Microsoft’s own product path, not a documented Vercel feature.

Third-party US inference, not Azure is the path a flagship migration actually picked. Lindy moved 100% of its traffic from Anthropic to DeepSeek v4 but chose US-based Atlas Cloud as the host rather than Azure or DeepSeek’s own endpoint, explicitly to keep inference on US soil. Self-hosting was considered and rejected as a distraction. Lindy’s choice is a useful corrective to the title’s Azure framing: Azure is one US answer, not the only one.

Why does host jurisdiction matter more than model origin?

The compliance objection to DeepSeek has always been about data flow, not weights. Weights are public and downloadable; the objection is that prompts and completions sent to a Beijing-hosted endpoint traverse Chinese jurisdiction. Relocating execution reframes the objection entirely.

Once the weights run on Azure or on a US-based provider like Atlas Cloud, the trust question shifts from “who trained the model” to “who operates the GPUs your prompts touch, under what legal regime, and through what data path.” Microsoft’s pitch for its Azure-hosted DeepSeek leans on exactly this: the Azure surface inherits Azure’s compliance, data-residency, and security posture, plus the policy controls in Foundry Control Plane. The model’s Chinese origin becomes a procurement footnote rather than the blocking issue.

Satya Nadella has been pushing this multi-provider posture publicly, arguing on X that enterprises should avoid over-dependence on a small number of AI providers and that “a frontier without an ecosystem is unstable.” That framing turns host-jurisdiction portability into a feature rather than a workaround: the strategic value is being able to run the same open weights under whichever jurisdiction the workload requires.

The deeper architectural shift points the other way. DeepSeek V4, previewed in April 2026, was rewritten over months to run on CANN, Huawei’s CUDA alternative, reducing reliance on US chip infrastructure. EPFL’s Marcel Salathé called it the first end-to-end AI stack (chips through framework through model) developed entirely in China; that characterization carries medium confidence in the brief. The dry irony: the model most associated with decoupling from US silicon is, in US deployments, being re-coupled to US-hosted infrastructure to clear compliance.

How much cheaper are DeepSeek tokens, and what do you trade for it?

DeepSeek V4 Flash is priced at $0.14 per million input tokens and $0.28 per million output tokens, according to Vercel’s AI Gateway production index, roughly 20 to 50 times cheaper than comparable Anthropic models. Reporting on Microsoft’s Copilot Cowork evaluation cites the same per-token pricing and projects a more-than-90% drop in cost for routine office tasks. The gateway telemetry makes the ratio concrete: DeepSeek hit 17% of token volume through Vercel’s gateway in May 2026 while staying around 1% of spend, per TheNewStack’s reporting. Seventeen percent of the tokens for one percent of the bill is the entire sales pitch compressed into one ratio.

The per-token prices are Vercel’s own published figures. The 20-to-50x and 90% comparisons are vendor framing rather than independent benchmarks, and the 90% figure is measured against routine office tasks, a category that favors the cheaper model.

The real-world cost data point is Lindy. Inference was the company’s largest line item, exceeding payroll, before the switch; CEO Flo Crivello said the move saves millions of dollars a year. The same account tempers the savings: the migration took roughly 100x the originally scoped effort, dominated by online and offline evals and prompt retuning. Cheap tokens did not mean a cheap migration. Anyone planning a frontier-model swap should budget for evals and prompt engineering as the dominant cost, not the per-token price.

The tradeoff is what you inherit along with the host. Running DeepSeek on Azure means Azure’s regions, Azure’s billing, and Azure’s Foundry Models catalog and Foundry Control Plane for routing, observability, and guardrails. Azure’s 600-plus-service footprint and 99.9% SLAs are enterprise-comforting; they are also lock-in. A model whose appeal was being cheap and portable gets bound to a hyperscaler’s pricing mechanics and region availability. The same consumption-based billing Microsoft is moving Copilot Cowork toward, because heavy agent users made flat-rate tiers uneconomic per EVP Charles Lamanna, is the billing shape you inherit when you route through that host.

What does this mean for a Vercel app?

For a practitioner, the takeaway is that “which model” and “which jurisdiction runs it” are now separate dial-turns. A Vercel app that adds DeepSeek through the AI Gateway gets one routing entry; the sovereignty story depends entirely on which backend that entry points at, and the sources here do not confirm that backend is Azure. If Vercel routes to DeepSeek’s own endpoint, the China-jurisdiction objection is still live. If it routes to a US hyperscaler or a third-party US provider, the objection relocates with the execution.

Two moves follow. First, verify the backend before relying on the compliance framing. The procurement win reads as “your prompts never leave Azure’s US regions,” and that only holds if the gateway actually dispatches there. No primary source in this set confirms it does, so the question to put to Vercel’s docs or sales is direct: which backend serves the DeepSeek routing entry, and in which region? Second, treat the comparison ratios as directional rather than contractual. The 17%-of-tokens, 1%-of-spend ratio is the compelling part and it is reported; the per-token prices are Vercel’s published figures, but the 20-to-50x and 90% savings claims are vendor framing against categories chosen to favor the cheaper model.

The durable pattern underneath the timely details is the split between model selection and execution jurisdiction. DeepSeek’s open weights let you separate the two, and US hyperscalers plus providers like Atlas Cloud are competing to be the jurisdiction that runs those weights for compliance-sensitive customers. The title’s “via Azure on Vercel’s AI Gateway” is, on the evidence available, a plausible configuration rather than a shipped integration. Where the weights actually run is the question worth asking. The honest answer, as of the sources reviewed, is whichever backend the gateway points at, and that backend is not yet documented.

Frequently Asked Questions

How does the Azure-hosted DeepSeek path compare to how Anthropic models reach AWS customers?

Anthropic models reach AWS customers through Amazon Bedrock, which hosts the weights inside AWS regions under AWS’s compliance posture. The Azure-hosted DeepSeek evaluation for Copilot Cowork follows the same relocate-the-weights pattern, shifting the trust question from who trained the model to which cloud operates the GPUs. Bedrock is a shipped product; the Azure-DeepSeek combination is, as of the sources reviewed, still an evaluation rather than generally available infrastructure.

Which sources for DeepSeek model specs should practitioners avoid?

Domains such as deepseek.net and deep-seek.com are third-party chat or SEO surfaces rather than official DeepSeek documentation, and they have published fabricated version numbers like ‘V3.2.’ Capability claims, context windows, and benchmark scores from those sites should not be cited. Official DeepSeek GitHub repositories and the API documentation endpoint are the reliable sources for model specifications.

What does a team lose by standardizing on Foundry Control Plane for DeepSeek routing?

Foundry Control Plane policies, guardrails, and observability instrumentation are Azure-specific and do not transfer to a non-Azure host like Atlas Cloud without rework. A team that builds its prompt firewall or content filters inside Foundry has bound that operational layer to Microsoft’s catalog. Migrating the same open weights to another US provider later means rebuilding the control plane, even though the weights themselves move cleanly.

When does the China-hosted DeepSeek endpoint remain the rational choice despite the sovereignty objection?

Workloads carrying no personal data, no proprietary content, and no regulatory exposure face little real cost from Chinese jurisdiction. Batch translation of public-domain text, open-source code review, and synthetic-data generation are cases where the native endpoint’s pricing and earliest model access win outright. The sovereignty objection tracks what flows through the prompt, not a property of the model itself.