Vercel’s /billing/charges endpoint exposes team-level usage and cost data in the FinOps open standard FOCUS v1.3, returned as a streamed newline-delimited JSON response with one-day granularity. That gives you a programmable feed you can wire into a CI cost gate: a step that fetches month-to-date spend, projects it to month-end, compares the projection against a budget, and fails the deploy when the threshold is crossed. The constraint to design around is attribution. The stream breaks cost down by Vercel service and by day at the team scope, not by project, environment, or the specific agent run that caused the overrun.
What the /billing/charges endpoint actually returns
The /billing/charges endpoint returns cost data in the FOCUS v1.3 open-standard format, the same column schema the FinOps Foundation maintains for normalizing cloud billing across AWS, Azure, and GCP. The practical payoff is that a Vercel cost row lands in an existing FinOps ingestion pipeline without a bespoke adapter mapping Vercel’s internal units onto someone else’s schema.
The endpoint supports one-day granularity with a maximum date range of one year, and responses are streamed as newline-delimited JSON (JSONL) to keep large datasets from exhausting memory. A year of team-level spend is a long stream, so the JSONL framing matters: you consume it incrementally rather than materializing the whole response. The changelog’s curl example uses -N (unbuffered) and --compressed alongside an Accept-Encoding: gzip header, confirming the server compresses the body and expects the client to stream it.
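Incremental consumption of a JSONL body can be sketched as a small line splitter that buffers the trailing partial line between chunks. This is a minimal illustration, not Vercel's implementation: createJsonlParser is a hypothetical helper name, and it assumes the body has already been gzip-decoded into text chunks.

```typescript
// Incremental JSONL splitter (hypothetical helper, not part of any SDK).
// Feed it decoded text chunks as they arrive; it returns the rows each chunk completes.
type Row = Record<string, unknown>;

function createJsonlParser() {
  let buffer = ""; // trailing partial line carried over between chunks

  return {
    // Accepts one text chunk and returns every row that chunk completed.
    push(chunk: string): Row[] {
      buffer += chunk;
      const lines = buffer.split("\n");
      buffer = lines.pop() ?? ""; // the last piece may be an incomplete line
      return lines
        .filter((line) => line.trim() !== "")
        .map((line) => JSON.parse(line) as Row);
    },
    // Call once when the stream ends to parse a final line without a newline.
    flush(): Row[] {
      const rest = buffer.trim();
      buffer = "";
      return rest === "" ? [] : [JSON.parse(rest) as Row];
    },
  };
}
```

Only one partial line is ever held in memory, which is the property that makes a year-long range tractable.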
The endpoint is called as GET https://api.vercel.com/v1/billing/charges?teamId=<team> with a Bearer token. It is scoped to a team, not to a project or environment. That single scoping decision is the thing the rest of this article designs around, because it determines what a CI gate can and cannot blame a cost breach on.
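As a rough sketch, the request can be assembled by a helper that also enforces the one-year range limit before anything goes over the wire. buildChargesRequest is a hypothetical name. Passing from and to as query parameters is an assumption carried over from the SDK call shape rather than something confirmed for the raw REST endpoint.

```typescript
// Hypothetical request builder for the team-scoped charges endpoint.
// Assumption: `from` and `to` are accepted as ISO-8601 query parameters.
interface ChargesRequest {
  url: string;
  headers: Record<string, string>;
}

const MAX_RANGE_MS = 366 * 24 * 60 * 60 * 1000; // one year, leap-year tolerant

function buildChargesRequest(
  teamId: string,
  token: string,
  from: Date,
  to: Date,
): ChargesRequest {
  const rangeMs = to.getTime() - from.getTime();
  if (!(rangeMs > 0)) throw new Error("`to` must be later than `from`");
  if (rangeMs > MAX_RANGE_MS) throw new Error("date range exceeds one year");

  const url = new URL("https://api.vercel.com/v1/billing/charges");
  url.searchParams.set("teamId", teamId);
  url.searchParams.set("from", from.toISOString());
  url.searchParams.set("to", to.toISOString());

  return {
    url: url.toString(),
    headers: {
      Authorization: `Bearer ${token}`,
      "Accept-Encoding": "gzip", // mirrors the changelog's curl example
    },
  };
}
```

Rejecting an oversized range locally keeps a misconfigured gate from spending a rate-limited request on a call that cannot succeed.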
How to reach it from CI: the CLI and SDK surface
Vercel ships three credible entry points for a CI step, and the right one depends on how much plumbing you want to own.
vercel usage is the interactive surface. Per the CLI reference, it outputs a table broken down by Vercel Service showing Usage (in USD, or the legacy Metered Infrastructure Units), Effective Cost, and Billed Cost. With --breakdown daily|weekly|monthly it groups spend per period with per-service detail and a grand total. For quick human inspection of the current billing period, that is enough. For a gate, it is the wrong tool: parsing a CLI-rendered table is brittle, and vercel usage is built for a human reading a terminal.
For a script that does not want to manage a long-lived token, vercel api (beta) makes authenticated HTTP requests to https://api.vercel.com using the existing CLI session. The reference notes --paginate to fetch all pages and --generate=curl to emit a reusable curl command. The advantage is that a CI runner authenticated via vercel login can call the billing endpoint without a plaintext token in the environment. The disadvantage is the beta tag: treat the surface as likely to change.
The cleanest programmatic path is the @vercel/sdk, which the changelog demonstrates with a streaming for await loop:
import { Vercel } from "@vercel/sdk";

const vercel = new Vercel({ bearerToken: process.env.VERCEL_TOKEN });

const result = await vercel.billing.listBillingCharges({
  from: "2026-06-01T00:00:00.000Z",
  to: "2026-06-24T00:00:00.000Z",
  teamId: process.env.VERCEL_TEAM_ID,
});

for await (const event of result) {
  // accumulate per-service effective cost
}

This is the shape you want behind a cost gate: a typed, streaming aggregation loop you control end to end.
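The accumulation step inside that loop might look like the following sketch. The field names ServiceName and EffectiveCost follow the FOCUS column naming and are an assumption about the row shape; the SDK may expose them under different casing. addCharge and sumByService are hypothetical helpers.

```typescript
// Hypothetical row shape. Assumption: rows carry FOCUS-style column names.
interface ChargeRow {
  ServiceName?: string;
  EffectiveCost?: number | string;
}

// Adds one row's effective cost to the running per-service totals.
function addCharge(totals: Map<string, number>, row: ChargeRow): void {
  const service = row.ServiceName ?? "unknown";
  const cost = Number(row.EffectiveCost ?? 0);
  if (!Number.isFinite(cost)) {
    // Fail loudly: a silently skipped row would under-report spend.
    throw new Error(`Non-numeric EffectiveCost for service ${service}`);
  }
  totals.set(service, (totals.get(service) ?? 0) + cost);
}

// Drains any async stream of rows into per-service totals.
async function sumByService(
  rows: AsyncIterable<ChargeRow>,
): Promise<Map<string, number>> {
  const totals = new Map<string, number>();
  for await (const row of rows) addCharge(totals, row);
  return totals;
}
```

Throwing on a malformed cost is a deliberate choice for a gate: an aggregation that skips bad rows fails open.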
A concrete GitHub Actions cost gate
The deploy-failing pattern is four steps: authenticate, fetch the current period, project to month-end, compare against the budget.
A minimal step that runs before the deploy step:
- name: Spend gate
  env:
    VERCEL_TOKEN: ${{ secrets.VERCEL_TOKEN }}
    VERCEL_TEAM_ID: ${{ secrets.VERCEL_TEAM_ID }}
  run: node scripts/spend-gate.mjs

Inside spend-gate.mjs, you accumulate effectiveCost per service across the streaming response, sum the total for the current billing period, and apply a linear projection: (month-to-date spend / elapsed days in month) * total days in month. If the projection exceeds BUDGET_USD, exit non-zero and the deploy step never runs. The projection is deliberately crude. The point is to fail a runaway-spend trajectory before the invoice lands, not to be a precise forecast.
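The linear projection can be sketched as below. projectMonthEnd and exceedsBudget are hypothetical helper names. The sketch assumes the billing period lines up with the UTC calendar month, which may not hold for every team, and it takes the last day with complete data as the elapsed-day count.

```typescript
// Linear month-end projection: (month-to-date / elapsed days) * days in month.
// Assumption: the billing period matches the UTC calendar month.
function projectMonthEnd(monthToDateUsd: number, lastCompleteDay: Date): number {
  const year = lastCompleteDay.getUTCFullYear();
  const month = lastCompleteDay.getUTCMonth();
  // Day 0 of the next month is the last day of this month.
  const daysInMonth = new Date(Date.UTC(year, month + 1, 0)).getUTCDate();
  const elapsedDays = lastCompleteDay.getUTCDate(); // always >= 1
  return (monthToDateUsd / elapsedDays) * daysInMonth;
}

// True when the projected spend is over the budget and the gate should fail.
function exceedsBudget(projectedUsd: number, budgetUsd: number): boolean {
  return projectedUsd > budgetUsd;
}
```

For example, 240 USD spent through June 12 projects to 600 USD for the 30-day month. In the gate script, a true result from exceedsBudget would be followed by a non-zero exit so the deploy step never runs.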
Two implementation details are load-bearing. First, the gate must respect the rate-limit headers the REST API exposes: X-RateLimit-Limit, X-RateLimit-Remaining, and X-RateLimit-Reset. A polling gate that ignores these will get throttled and then silently under-report spend. Second, the gate compares against the team total, not a per-project subtotal, because the API does not hand you one.
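One way to respect those headers is to read them off a prior response and fail closed when quota is low or the headers are missing. The sketch below is illustrative: readRateLimit and shouldAbort are hypothetical helpers, the minimum-remaining threshold is the caller's choice, and treating X-RateLimit-Reset as UTC epoch seconds is an assumption to verify against the API documentation.

```typescript
// Minimal interface satisfied by the standard fetch Headers object.
interface HeaderReader {
  get(name: string): string | null;
}

interface RateLimitState {
  limit: number;
  remaining: number;
  resetAt: Date;
}

// Parses the three rate-limit headers; returns null if any is absent or malformed.
function readRateLimit(headers: HeaderReader): RateLimitState | null {
  const raw = [
    headers.get("X-RateLimit-Limit"),
    headers.get("X-RateLimit-Remaining"),
    headers.get("X-RateLimit-Reset"),
  ];
  if (raw.some((value) => value === null || value.trim() === "")) return null;
  const [limit, remaining, reset] = raw.map((value) => Number(value));
  if (![limit, remaining, reset].every((n) => Number.isFinite(n))) return null;
  // Assumption: the reset header is expressed in epoch seconds.
  return { limit, remaining, resetAt: new Date(reset * 1000) };
}

// Fail closed: unknown quota is treated the same as exhausted quota.
function shouldAbort(state: RateLimitState | null, minRemaining: number): boolean {
  return state === null || state.remaining < minRemaining;
}
```

Aborting on missing headers errs toward blocking a deploy rather than passing one on an unverified total.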
The attribution wall: per-service and per-day, not per-run
Here is the design constraint the launch coverage underplays. The /billing/charges stream is team-scoped and broken down by Vercel service and by day. It is not broken down by project, by environment, by deployment, or by the agent run that produced the spend.
That means a cost gate can correctly flag that the team is trending 40% over budget this month, and be unable to name whether the cause is a Vercel Sandbox session that a coding agent left looping overnight, an AI Gateway route that a misconfigured fallback pinned to an expensive model, or a legitimate traffic spike on a production deploy. The breach is visible. The culprit is not.
Per-resource attribution does exist on the platform, but at a different layer. Vercel’s integration billing API supports two scopes: installation-level billing for a whole integration and resource-level billing for an individual resource. That tells you per-resource attribution is a billing-side concept that the platform understands. It is not, however, exposed through the team /billing/charges stream that a CI gate consumes.
This is not a Vercel-specific bug. AWS Cost Explorer, Azure Cost Management, and GCP billing all share the same shape of gap: granular spend data that stops one level above the resource you actually want to blame. A CI cost gate built on any of them hits the same attribution wall. What is new here is that the wall now sits between you and a deploy-failing enforcement point, where the gap costs more than it does in a monthly invoice review.
Where it fits: Vantage, focus-analyze-charges, and the billing distinction
The launch came with a reference implementation and a partner integration, and both clarify what the stream is for.
Vercel’s own vercel-labs/focus-analyze-charges repo demonstrates the client-side work the stream demands: gzip decompression, JSONL parsing, and aggregation into per-day and per-service totals carrying MIUs, effective cost, and billed cost. Read it before writing your own aggregator. The bulk of a cost gate’s logic is the same accumulation loop, and the repo already shows the field names and the decompression handling.
Vantage shipped a native integration that syncs Vercel usage and cost data alongside other cloud providers, placing Vercel spend inside a multi-cloud FinOps view. That is the right home for cost reporting and reconciliation across providers, but it is the wrong home for a deploy-failing gate. FinOps tooling answers “where did the money go”; a CI gate answers “should this deploy proceed”. Different question, different latency requirement, different consumer.
One distinction matters for anyone wiring this up: do not conflate the consumption endpoint with the integration-billing endpoints. The team-level /v1/billing/charges stream is for teams reading their own spend. The Submit Billing Data and Submit Invoice integration endpoints are for integrations charging their own customers. Same product family, different audience, different data. A gate that accidentally calls the integration surface will get nothing useful back.
What a cost gate does not solve
A deploy-failing gate built on /billing/charges enforces a budget trajectory. It does not enforce three things teams usually also want.
It does not enforce real-time spend. The one-day granularity and the daily batch settle mean the gate reads yesterday’s numbers. A coding agent that burns a week of budget in an afternoon will not trip the gate until the following day’s data lands, by which point the spend is already incurred.
It does not enforce per-PR budgets. Because the stream is team-scoped and per-service, there is no row in the response that says “this pull request spent X.” A per-PR gate requires reconstructing attribution from deployment metadata, and the reconstruction is approximate. If you need strict per-PR budget enforcement, the billing API is the wrong data source; you want deployment-time usage estimates or per-deployment cost telemetry, which the stream does not provide.
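To make the approximation concrete, here is one deliberately crude reconstruction: split a day's team-level cost across projects in proportion to how many deployments each project made that day. This is a stand-in heuristic, not a method the billing API or Vercel provides. allocateByDeployments is a hypothetical helper, and the per-project deployment counts are assumed to come from a separate lookup.

```typescript
// Crude attribution heuristic (hypothetical): share one day's cost across
// projects in proportion to that day's deployment counts.
function allocateByDeployments(
  dayCostUsd: number,
  deploymentsByProject: Map<string, number>,
): Map<string, number> {
  const counts = Array.from(deploymentsByProject.entries());
  const totalDeployments = counts.reduce((sum, [, n]) => sum + n, 0);

  const shares = new Map<string, number>();
  if (totalDeployments === 0) {
    // No deployments that day: nothing to blame, so keep the cost visible.
    shares.set("unattributed", dayCostUsd);
    return shares;
  }
  for (const [project, n] of counts) {
    shares.set(project, (dayCostUsd * n) / totalDeployments);
  }
  return shares;
}
```

Deployment count is a weak proxy for cost: a single long-running sandbox can outspend a dozen small deploys, so the output is a hint for investigation rather than a basis for enforcement.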
It does not self-heal. The gate fails the deploy. It does not scale the run down, kill the looping sandbox, or reroute the AI Gateway away from the expensive model. Those remediations are a separate system a human or a different automation layer has to build. The billing API gives you the signal; the action on that signal is yours to wire.
The honest summary is narrow. The /billing/charges endpoint makes Vercel spend queryable in a standard format through a streaming API, and that is enough to build a deploy-failing cost gate on top of the existing @vercel/sdk. It does not make that spend attributable to the agent run that caused it, and it does not make the gate real-time. Teams that expect automatic cost optimization from the launch will be disappointed; teams that want to move budget enforcement out of the monthly invoice and into the deploy pipeline now have the plumbing to do it.
Frequently Asked Questions
What is the practical difference between the ‘effective cost’ and ‘billed cost’ fields the FOCUS stream returns?
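In FOCUS v1.3 terminology, billed cost is the charge that forms the basis for invoicing: it already reflects negotiated rates and discounts but does not amortize upfront purchases. Effective cost applies the same rates and discounts and additionally spreads the applicable share of prepaid commitments across the periods they cover. A gate threshold should sum effectiveCost to compare against a dollar budget, not billedCost, because billedCost follows invoice timing: it can spike on the day a commitment is purchased and understate the days that commitment covers, which distorts a day-by-day projection.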
If the API rate limiter throttles the gate mid-stream, does the accumulated spend total become unreliable?
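It can. HTTP status codes arrive before the body, so a throttled request normally fails up front with a 429 rather than partway through the JSONL stream. The risk is a gate that swallows that failure, or a connection that drops mid-stream, and then treats a partial total as complete: it under-reports spend and can pass a deployment that should be blocked. The safer pattern is to check X-RateLimit-Remaining before starting the stream, exit non-zero if quota is critically low, and fail closed on any non-2xx status or interrupted read rather than allowing a partial read to produce a falsely low total.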
How does Vercel’s one-day granularity ceiling compare to AWS Cost Explorer’s hourly option, and does the difference matter for agentic workloads?
AWS Cost Explorer supports hourly granularity on some dimensions, shortening the settle lag compared to Vercel’s daily ceiling. The attribution gap is the same class of problem on both platforms. The daily lag matters most for agentic workloads that can exhaust a monthly budget in hours, where an hourly settle window would at least catch the breach on the same day rather than the following morning.
Could FOCUS v1.3’s ResourceId column eventually carry per-project attribution without requiring a schema change from Vercel?
Yes. FOCUS v1.3 includes a ResourceId column designed for sub-service attribution. Vercel could populate it with a projectId or deploymentId without leaving the standard. Until that field is populated, per-project breakdowns require joining the billing stream against the deployments API by day and estimating attribution from overlapping timestamps, which is the same approximate reconstruction a per-PR gate would need.
Does vercel api --generate=curl produce a command safe to cache across CI runs, or does the embedded token expire?
The generated curl command embeds the session token active at generation time. When that session expires or is rotated, the cached command returns a 401, often silently. For pipelines that rotate credentials on a schedule, the @vercel/sdk with a dedicated API token stored in a secrets manager is more durable because the token reference is resolved at runtime, not baked into a shell command.