groundy

Vercel Sandbox CLI: Reproducible Agent Runs Belong in CI, Not the Dashboard

Vercel's sandbox CLI pairs access-token auth with snapshotting, tags, and Drives so agent runs become reproducible CI steps rather than dashboard clicks.


Vercel’s sandbox CLI is one of the documented integration paths for Vercel Sandbox, listed for manual testing, agentic workflows, debugging, and one-off operations alongside the @vercel/sandbox JS SDK and the vercel.sandbox Python SDK. What earns it a place in a pipeline is the auth model behind it: access tokens exist for exactly the external CI/CD systems that Vercel’s OIDC tokens don’t reach, and snapshotting, tags, and Drives turn one-off runs into resumable, reproducible steps.

What the Sandbox CLI is, and where the docs position it

The sandbox CLI is one of three documented ways to drive Vercel Sandbox, and the docs are explicit that it is the one for hands-on, non-application work: “manual testing, agentic workflows, debugging, and one-off operations” (Vercel Sandbox docs).

The docs frame the SDKs as “the recommended way to integrate Vercel Sandbox into your applications,” with the CLI listed alongside for cases where you are not embedding sandbox control into app code (Vercel Sandbox docs). Read that way, the CLI’s job is the operator’s job: spin a sandbox up, run commands, inspect logs and file edits, then discard it or checkpoint it.

Sandbox itself is a compute primitive for running untrusted or user-generated code in isolated, ephemeral Linux VMs, built for AI agent output, user uploads, and third-party scripts (Vercel Sandbox docs). Each sandbox is a Firecracker microVM with its own filesystem and network. It runs Amazon Linux 2023 with node26, node24, node22, or python3.13 available (node24 is the default), and work executes as the vercel-sandbox user, with sudo, in a working directory of /vercel/sandbox (Vercel Sandbox docs). The same primitive runs system-privileged workloads that need root, including container runtimes like Docker, VPN clients, and FUSE filesystem drivers, and Vercel advertises startup times in milliseconds (Vercel Sandbox docs). That capability set is identical regardless of which integration path drives it: the docs describe one runtime model, not a CLI-specific one.

Why access-token auth makes the CLI the CI/CD-native entry point

The CLI’s pipeline story rests on one auth detail: Vercel Sandbox accepts two credentials, and the second is explicitly for environments the first cannot reach.

The recommended method is Vercel OIDC tokens, which Vercel generates and associates with your project, automatic in production on Vercel and available locally through vercel link and vercel env pull (Vercel Sandbox docs). The fallback is access tokens, and the docs state exactly when to reach for them: “when VERCEL_OIDC_TOKEN is unavailable, such as in external CI/CD systems or non-Vercel environments” (Vercel Sandbox docs).

That quote is the whole CI/CD argument. OIDC assumes you are inside Vercel; access tokens assume you are not. A GitHub Actions runner, a GitLab CI box, or a self-hosted Jenkins controller has no Vercel project context to mint an OIDC token from, so the CLI with an access token is the documented path in.
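
As a minimal sketch of that decision, assume credentials reach the CI job as environment variables. VERCEL_OIDC_TOKEN is the variable the docs name; VERCEL_TOKEN, VERCEL_TEAM_ID, and VERCEL_PROJECT_ID are hypothetical names for the access-token path, and the assumption that this path also needs team and project identifiers is this sketch’s, not a quote from the docs. The function select_credential is illustrative and is not part of the CLI or either SDK.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class SandboxCredential:
    kind: str  # "oidc" or "access_token"
    token: str
    team_id: Optional[str] = None
    project_id: Optional[str] = None


def select_credential(env: Mapping[str, str]) -> SandboxCredential:
    """Prefer the OIDC token; fall back to an access token only when it is absent."""
    oidc = env.get("VERCEL_OIDC_TOKEN")  # the variable named in the Vercel docs
    if oidc:
        return SandboxCredential(kind="oidc", token=oidc)

    # External CI (GitHub Actions, GitLab CI, Jenkins): no project context to mint OIDC.
    # The three names below are hypothetical placeholders for this sketch.
    settings = {
        "VERCEL_TOKEN": env.get("VERCEL_TOKEN"),
        "VERCEL_TEAM_ID": env.get("VERCEL_TEAM_ID"),
        "VERCEL_PROJECT_ID": env.get("VERCEL_PROJECT_ID"),
    }
    missing = [name for name, value in settings.items() if not value]
    if missing:
        # Fail loudly rather than run a sandbox step with a half-configured fallback.
        raise RuntimeError(
            "no OIDC token and missing access-token settings: " + ", ".join(missing)
        )
    return SandboxCredential(
        kind="access_token",
        token=settings["VERCEL_TOKEN"],
        team_id=settings["VERCEL_TEAM_ID"],
        project_id=settings["VERCEL_PROJECT_ID"],
    )
```

The ordering mirrors the docs: OIDC first, access tokens only when the OIDC variable is missing. Raising on a partial fallback keeps a misconfigured runner from reaching the sandbox step at all.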

This is where the CLI separates from the SDKs in practice, not in capability. The SDKs accept access tokens too. The difference is that the CLI is the interface meant for ad-hoc and scripted operation, where you orchestrate a sandbox as a pipeline step instead of spawning one inside request handling.

Snapshotting, persistence, tags, and Drives: the reproducibility primitives

Four documented features do the heavy lifting for reproducible runs: default persistence, explicit snapshotting, key-value tags, and Drives.

The docs list persistence and snapshotting separately, and the distinction matters. Persistence is the default: a sandbox “auto-save[s] state on stop and resume[s] where you left off,” with “no manual snapshot management needed” (Vercel Sandbox docs). Snapshotting is the on-demand version: “save the state of a running sandbox to resume later,” explicitly to “skip dependency installation on subsequent runs” (Vercel Sandbox docs). Persistence saves you from losing work when a sandbox stops. Snapshotting gives you a named checkpoint to resume from deterministically, which is what makes a pipeline reproducible rather than merely durable.
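
The distinction can be modeled in a few lines. This is a toy, in-memory model written for illustration, not the Vercel API: ToySandbox, ToyStore, and their methods are hypothetical. It shows why resuming persisted state carries over whatever the last run left behind, while starting from a snapshot ID returns the exact checkpoint.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ToySandbox:
    """In-memory stand-in for a sandbox filesystem (hypothetical, not the Vercel API)."""
    files: Dict[str, str] = field(default_factory=dict)


class ToyStore:
    """Models two behaviours: auto-persist on stop, and explicit named snapshots."""

    def __init__(self) -> None:
        self.persisted: Dict[str, Dict[str, str]] = {}  # sandbox name -> state at last stop
        self.snapshots: Dict[str, Dict[str, str]] = {}  # snapshot id -> frozen state
        self._counter = 0

    def stop(self, name: str, sandbox: ToySandbox) -> None:
        # Persistence: whatever exists at stop time overwrites the previous save.
        self.persisted[name] = dict(sandbox.files)

    def resume(self, name: str) -> ToySandbox:
        # Picks up where the last run left off, including its leftovers.
        return ToySandbox(dict(self.persisted.get(name, {})))

    def snapshot(self, sandbox: ToySandbox) -> str:
        # Snapshotting: an explicit checkpoint addressed by its own id.
        self._counter += 1
        snap_id = f"snap-{self._counter}"
        self.snapshots[snap_id] = dict(sandbox.files)
        return snap_id

    def from_snapshot(self, snap_id: str) -> ToySandbox:
        # Always returns a fresh copy of the checkpoint, never later edits.
        return ToySandbox(dict(self.snapshots[snap_id]))
```

In the toy, a scratch file written after the snapshot survives a stop-and-resume but is absent when starting from the snapshot ID, which is the property a pipeline wants.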

Tags let you “categorize sandboxes by environment, team, or any other criteria using key-value tags” (Vercel Sandbox docs). In CI that is filter plumbing: tag a sandbox env=ci, branch=main, team=platform, and you can list, resume, or tear down by criteria instead of by opaque ID.
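
A sketch of that selection logic, assuming some listing call returns each sandbox’s ID with its tags. SandboxRecord, parse_selector, and select_by_tags are hypothetical helpers, and the env=ci,branch=main string syntax is an illustration, not the CLI’s documented filter format.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Mapping


@dataclass
class SandboxRecord:
    """Hypothetical shape of a listed sandbox: an opaque id plus key-value tags."""
    id: str
    tags: Dict[str, str] = field(default_factory=dict)


def parse_selector(text: str) -> Dict[str, str]:
    """Parse 'env=ci,branch=main' into {'env': 'ci', 'branch': 'main'}."""
    selector: Dict[str, str] = {}
    for pair in filter(None, (p.strip() for p in text.split(","))):
        key, sep, value = pair.partition("=")
        if not sep or not key.strip():
            raise ValueError(f"expected key=value, got {pair!r}")
        selector[key.strip()] = value.strip()
    return selector


def select_by_tags(sandboxes: Iterable[SandboxRecord], selector: Mapping[str, str]) -> List[str]:
    """Return ids of sandboxes carrying every key=value pair in the selector."""
    return [
        sb.id
        for sb in sandboxes
        if all(sb.tags.get(key) == value for key, value in selector.items())
    ]
```

Matching requires every selector pair to be present, so env=ci alone sweeps all CI sandboxes while env=ci,branch=main narrows to one branch.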

Drives, still in beta, attach “persistent filesystem storage to sandboxes and reuse data across sandbox runs” (Vercel Sandbox docs). The payoff is the dependency-install tax: a Drive holding node_modules or a model cache that survives across sandbox lifecycles turns a cold setup into a warm one. Pair Drives with snapshotting and the “skip dependency installation” line stops being a feature bullet and becomes a concrete workflow.

What a scripted lifecycle buys over dashboard clicks

The argument for driving sandboxes from the CLI in CI is reproducibility, and reproducibility is a property scripts have that interactive UIs do not.

Concretely: a snapshot ID pinned in a pipeline step means every run resumes from the same filesystem state, not from whatever an operator clicked last. Snapshotting and Drives exist to amortize setup cost: in a dashboard you pay that cost every fresh session, while in a script you pay it once and reference the result. Tags make cleanup and accounting queryable, so a pipeline can spin up, label, and tear down sandboxes by env=ci without a human in the loop.
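
That lifecycle (resume from a pinned snapshot, label, run, tear down) can be sketched against a hypothetical client interface. SandboxClient, its method names, FakeClient, and the snapshot ID are placeholders, not the real CLI subcommands or SDK methods; the point is the shape of the step, with teardown in a finally block.

```python
from typing import Dict, List, Protocol, Tuple


class SandboxClient(Protocol):
    """Hypothetical client interface; method names are placeholders, not the real CLI or SDK."""
    def create_from_snapshot(self, snapshot_id: str, tags: Dict[str, str]) -> str: ...
    def run(self, sandbox_id: str, command: List[str]) -> int: ...
    def stop(self, sandbox_id: str) -> None: ...


PINNED_SNAPSHOT = "snap_example123"  # hypothetical id, committed with the pipeline config


def ci_step(client: SandboxClient, command: List[str], branch: str) -> int:
    """Resume from the pinned snapshot, label the sandbox, run once, always tear down."""
    sandbox_id = client.create_from_snapshot(
        PINNED_SNAPSHOT, tags={"env": "ci", "branch": branch}
    )
    try:
        return client.run(sandbox_id, command)
    finally:
        # Runs even if the command raises, so no sandbox outlives the step.
        client.stop(sandbox_id)


class FakeClient:
    """Records calls so the lifecycle can be checked without network access."""

    def __init__(self, exit_code: int = 0) -> None:
        self.calls: List[Tuple] = []
        self.exit_code = exit_code

    def create_from_snapshot(self, snapshot_id: str, tags: Dict[str, str]) -> str:
        self.calls.append(("create", snapshot_id, dict(tags)))
        return "sbx_fake"

    def run(self, sandbox_id: str, command: List[str]) -> int:
        self.calls.append(("run", sandbox_id, list(command)))
        return self.exit_code

    def stop(self, sandbox_id: str) -> None:
        self.calls.append(("stop", sandbox_id))
```

Swapping FakeClient for a real wrapper around the CLI or SDK leaves ci_step unchanged, and because the snapshot ID lives in version control, changing it shows up in review like any other diff.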

The deeper point is diffability. A dashboard click has no diff, no blame, no review. A CLI invocation in a YAML file is a line someone can read, change, and bisect when the pipeline breaks. That is the structural difference, and it is the reason the reproducibility primitives matter: they give the script something deterministic to resume from.

None of this is a docs claim that the CLI “moves debugging out of the dashboard.” That is the read this article is making. What the docs do say is that the CLI is for “debugging, and one-off operations” and that access tokens are for “external CI/CD systems” (Vercel Sandbox docs). Taken together, those two sentences describe a workflow where the sandbox lifecycle belongs in version-controlled scripts, and the reproducibility primitives are what make that workflow pay off.

The dashboard’s strengths are real and documented: live previews, and direct access to logs, file edits, and running state for inspection (Vercel Sandbox docs). It is the right tool for understanding a single sandbox. It is the wrong tool for asserting that two runs started from the same state.

Limits and open questions

Three things are worth flagging before betting a pipeline on this.

The runtime matrix reflects the docs today, not a permanent contract. The docs list node26, node24, node22, and python3.13 on Amazon Linux 2023 with node24 as the default, on a page last updated 2026-06-17 (Vercel Sandbox docs). Node and Python release on their own cadence, so pin the runtime you depend on rather than trusting the default.

Drives are still beta (Vercel Sandbox docs). A CI workflow that leans on cross-run persistent storage depends on a feature Vercel can still change. For now, snapshotting and persistence are the safer primitives to anchor on, with Drives as an acceleration layer you can afford to lose.

The docs give no ship date or version number for the CLI itself, and they do not contrast CLI behavior with a dashboard at the isolation level. The Firecracker microVM model is described once, for all integration paths (Vercel Sandbox docs). If the dashboard and the CLI ever appear to behave differently at the runtime level, treat that as your observation, not a documented guarantee.

Vercel, founded as ZEIT in 2015 and valued at $9.3B after a September 2025 Series F (Wikipedia), markets Sandbox under an “Agentic Infrastructure” umbrella alongside Durable Orchestration, the AI Model Gateway, and Fluid Compute, citing Notion running agent conversations on the platform (Vercel). Read that as positioning, where Vercel wants the product to sit, not as a capability spec.

The sandbox CLI earns its keep when a sandbox has to behave identically across runs. Access tokens, snapshots, tags, and Drives are the primitives that make a sandbox lifecycle scriptable, and a scriptable lifecycle is the only kind that reproduces. For a step that has to run the same way next Tuesday, the CLI is the documented path.

Frequently Asked Questions

When would the @vercel/sandbox JS SDK beat the CLI?

The docs recommend the SDKs for integrating Sandbox into applications, so they fit when each user action needs its own isolated execution, like a code playground or an in-app agent that runs generated code. The CLI serves the operator side, where a sandbox is a pipeline step rather than a per-request spawn. Vercel ships v0 and the AI SDK in the same Agentic Infrastructure line, which is the product shape that calls for in-app sandbox execution rather than a scripted lifecycle.

What does putting an access token in CI cost you operationally?

An access token is a long-lived credential stored as a CI variable, whereas an OIDC token is short-lived and associated with the Vercel project it was generated for. A leaked CI variable can drive sandboxes under whatever scope the token carries, so teams typically scope access tokens narrowly and rotate them on a schedule, a chore the OIDC path does not require. That ongoing rotation overhead is the operational tax of running sandboxes off Vercel.
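
A sketch of a rotation check a pipeline could run before using the token, assuming the team records the token’s issue date somewhere the job can read. The 30-day window and the function name token_needs_rotation are hypothetical policy choices for illustration, not a Vercel recommendation.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical team policy, not a Vercel-prescribed interval.
MAX_TOKEN_AGE = timedelta(days=30)


def token_needs_rotation(
    issued_at: datetime,
    now: Optional[datetime] = None,
    max_age: timedelta = MAX_TOKEN_AGE,
) -> bool:
    """Return True once a long-lived access token is at or past the allowed age."""
    if issued_at.tzinfo is None:
        # Naive timestamps make age arithmetic ambiguous across runners.
        raise ValueError("issued_at must be timezone-aware")
    now = now or datetime.now(timezone.utc)
    return now - issued_at >= max_age
```

Failing the job when this returns True turns rotation from a calendar reminder into a pipeline gate.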

What does pinning the runtime actually require in a script?

Pass the runtime explicitly on every sandbox invocation instead of accepting the node24 default, because the matrix moves on Node and Python’s own cadence and the page listing them was last updated 2026-06-17. The vercel-sandbox user ships with sudo, so a script can install system packages into a snapshot, but that bakes host-specific state into the checkpoint and reduces its portability across runtime versions. Pin both the runtime and any system packages you install, or a Node bump can silently invalidate your snapshot.
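
One way to make that pinning enforceable, sketched under assumptions: derive a fingerprint from the explicit runtime and the sorted list of pinned system packages, store it next to the snapshot ID, and rebuild the snapshot when it no longer matches. DOCUMENTED_RUNTIMES copies the matrix cited above and will go stale; snapshot_fingerprint and snapshot_is_current are hypothetical helpers, and package strings such as "jq-1.7" are placeholders.

```python
import hashlib
import json
from typing import Iterable

# Runtimes listed on the docs page last updated 2026-06-17; re-check before relying on it.
DOCUMENTED_RUNTIMES = {"node26", "node24", "node22", "python3.13"}


def snapshot_fingerprint(runtime: str, system_packages: Iterable[str]) -> str:
    """Hash the explicit runtime plus pinned system packages into a cache key.

    No default runtime is accepted, so every caller must name one. A changed
    runtime or package list yields a different key, so a stale snapshot is
    rebuilt instead of silently reused.
    """
    if runtime not in DOCUMENTED_RUNTIMES:
        raise ValueError(f"unknown runtime {runtime!r}; pin one of {sorted(DOCUMENTED_RUNTIMES)}")
    payload = json.dumps(
        {"runtime": runtime, "packages": sorted(system_packages)},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def snapshot_is_current(recorded: str, runtime: str, system_packages: Iterable[str]) -> bool:
    """Compare the fingerprint stored with a snapshot against the current pins."""
    return recorded == snapshot_fingerprint(runtime, system_packages)
```

Sorting the package list makes the fingerprint order-independent, so reordering a config file does not force a rebuild while a Node bump does.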

Where does relying on Drives break first?

A Drive that caches node_modules across runs creates a hidden coupling to your lockfile. If the lockfile changes but the Drive does not invalidate, the sandbox resumes with stale dependency resolution that passes locally and fails unpredictably in CI. Treat Drives as an acceleration layer you can disable without breaking the pipeline, and let snapshotting or a clean install carry correctness.
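
A sketch of that guard, assuming the Drive is visible to the job as a directory path (a hypothetical mount point, not a documented one) and that npm’s package-lock.json is the lockfile. The cache is reused only when a digest stamped on the Drive matches the current lockfile; a missing Drive or a mismatch falls through to a clean install. The helper names and the .lockfile-sha256 marker file are illustrative.

```python
import hashlib
from pathlib import Path
from typing import Optional

MARKER = ".lockfile-sha256"  # hypothetical stamp file kept on the Drive


def lockfile_digest(lockfile: Path) -> str:
    """Content hash of the lockfile; any dependency change produces a new digest."""
    return hashlib.sha256(lockfile.read_bytes()).hexdigest()


def plan_dependency_step(lockfile: Path, drive_root: Optional[Path]) -> str:
    """Return "reuse-cache" only when the Drive was built from this exact lockfile.

    drive_root is None when no Drive is attached (beta feature unavailable or
    disabled); that path must still produce a correct install.
    """
    if drive_root is None:
        return "clean-install"
    marker = drive_root / MARKER
    if marker.is_file() and marker.read_text().strip() == lockfile_digest(lockfile):
        return "reuse-cache"
    return "clean-install"


def record_cache(lockfile: Path, drive_root: Path) -> None:
    """After a clean install, stamp the Drive with the lockfile digest it matches."""
    (drive_root / MARKER).write_text(lockfile_digest(lockfile))
```

Because the no-Drive path and the stale-cache path both end in a clean install, turning Drives off costs time but never correctness.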

What would shift the CLI from a standalone tool to a building block?

Vercel bundles Sandbox with Durable Orchestration, the AI Model Gateway, and Fluid Compute under one Agentic Infrastructure line. The forward risk is that orchestration absorbs sandbox lifecycle, in which case the CLI commands you pin in CI become the substrate a higher-level orchestrator drives rather than the interface you script against directly. The reproducibility primitives would still matter, but the entry point would move up a layer.

Sources (3 cited)

  1. Vercel Sandbox · vercel.com · primary · accessed 2026-06-28
  2. Vercel · en.wikipedia.org · community · accessed 2026-06-28
  3. Agentic Infrastructure · vercel.com · vendor · accessed 2026-06-28