
Models.dev Turns Scattered AI Model Pricing Into One Open Database

Models.dev aggregates 1,000+ AI model specs into a TOML database with a public JSON API, but a single stale price field can silently corrupt every downstream cost estimate that relies on it.


Models.dev is a community-maintained, open-source database that aggregates specs and pricing for over 1,000 AI models across 50+ providers, served through a public JSON API. It solves a problem that has been compounding: every model router, cost dashboard, and comparison tool was independently scraping vendor pricing pages, duplicating effort and duplicating stale data. The project has drawn attention to a gap that most builders had been working around rather than fixing.

How it works

The data model is straightforward. Each model is a TOML file in a Git repository. A build pipeline validates every file against Zod schemas, checking for valid TOML syntax, required fields, correct types, and value ranges. A GitHub Action runs this validation on every pull request before merge, so malformed data is rejected before it can land on the main branch.

The site deploys as a Cloudflare Worker, serving pre-rendered HTML and a JSON API from the edge. The codebase is a Bun monorepo with three packages: core for schema and validation logic, web for server-side rendering, and function for the worker entrypoint. The public endpoint at https://models.dev/api.json returns the full catalog as a single JSON payload, which is the integration point most downstream tools will consume.
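A minimal sketch of consuming that endpoint follows. The URL comes from the article; the payload layout (providers at the top level, each with a `models` mapping) is an assumption and should be checked against the live response before relying on it.

```python
import json
import urllib.request
from datetime import datetime, timezone

API_URL = "https://models.dev/api.json"  # endpoint named in the article


def fetch_catalog(url: str = API_URL, timeout: float = 10.0) -> dict:
    """Download the catalog and record when it was fetched."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        payload = json.load(resp)
    return {
        "fetched_at": datetime.now(timezone.utc).isoformat(),
        "catalog": payload,
    }


def iter_models(catalog: dict):
    """Yield (provider_id, model_id, model) tuples.

    Assumes a provider -> "models" -> model nesting; adjust if the
    actual payload is shaped differently.
    """
    for provider_id, provider in catalog.items():
        for model_id, model in provider.get("models", {}).items():
            yield provider_id, model_id, model
```

Storing `fetched_at` next to the payload is what makes a later staleness check possible.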

What the schema tracks

The schema goes beyond a name-and-price lookup. Per-model fields include:

  • Pricing: input, output, reasoning, cache read, cache write, input audio, and output audio costs, all in USD per million tokens.
  • Limits: context window sizes.
  • Capabilities: supported modalities (text, image, audio, video).
  • Metadata: release dates, knowledge cutoffs, open-weights status, and lifecycle status.

The pricing granularity is worth pausing on. Separating cache-read and cache-write costs, and splitting audio I/O from text, reflects how frontier providers actually bill. A schema that only tracks “input price” and “output price” is already outdated for models with prompt-caching tiers.
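To see why the extra tiers matter, here is a small cost estimator that bills cached tokens at their own rates. The class and field names are illustrative, and the prices used in the usage note are made-up numbers, not any vendor's actual rates.

```python
from dataclasses import dataclass


@dataclass
class Cost:
    """Prices in USD per million tokens. Field names are illustrative."""
    input: float
    output: float
    cache_read: float = 0.0
    cache_write: float = 0.0


def estimate_usd(
    cost: Cost,
    *,
    input_tokens: int,
    output_tokens: int,
    cache_read_tokens: int = 0,
    cache_write_tokens: int = 0,
) -> float:
    """Estimate the cost of a request, billing each token class separately."""
    total = (
        input_tokens * cost.input
        + output_tokens * cost.output
        + cache_read_tokens * cost.cache_read
        + cache_write_tokens * cost.cache_write
    )
    return total / 1_000_000
```

With hypothetical prices of $3 input, $15 output, $0.30 cache read, and $3.75 cache write, a workload of 200,000 fresh input tokens, 50,000 output tokens, 1,000,000 cache-read tokens, and 100,000 cache-write tokens comes to about $2.03. A two-field schema that bills all 1,300,000 prompt tokens at the input rate would estimate $4.65 for the same workload.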

Automated sync vs. manual updates

The project includes automated sync scripts that fetch model data from external provider APIs. As of late May 2026, these scripts cover Venice, Vercel, Friendli, and Weights & Biases. They merge API-sourced data with manual TOML configs while preserving manually maintained fields like knowledge cutoffs and lifecycle status.
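The merge behavior described above can be approximated as follows. This is a sketch under stated assumptions, not the project's sync code: the set of manually maintained field names (`knowledge`, `status`) is hypothetical and stands in for knowledge cutoffs and lifecycle status.

```python
# Fields a human maintains; hypothetical names based on the article's examples.
MANUAL_FIELDS = {"knowledge", "status"}


def merge_synced(manual: dict, synced: dict) -> dict:
    """Overlay API-sourced fields on a manual config without touching manual fields."""
    merged = dict(manual)
    for key, value in synced.items():
        if key in MANUAL_FIELDS and key in manual:
            continue  # keep the hand-maintained value
        merged[key] = value
    return merged
```

The consequence is that a preserved field is only as fresh as the last human edit, no matter how often the sync runs.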

The remaining 45+ providers rely entirely on community-submitted pull requests. There is no SLA on how quickly a pricing change at, say, Mistral or Cohere gets reflected in the database. The sync scripts are a start, but they cover a minority of the catalog.

Who’s behind it

The project is a community-contributed effort used internally to power opencode. This is not a weekend side project; it has a production consumer with an incentive to keep the data accurate. But that incentive is narrow. The project’s primary consumer cares about the models and providers that opencode routes to, which may not align with the long tail of the catalog.

The stale-data problem

The pricing range across frontier models illustrates why a shared catalog matters, and why it is fragile. According to AI Model Benchmarks, verified pricing as of early May 2026 spans from $0.20/$0.50 per million input/output tokens (Grok 4.1 Fast) to $5/$30 (GPT-5.5). A 25x spread on input pricing and a 60x spread on output pricing means even small inaccuracies compound into materially wrong cost estimates downstream.

Consider what happens when a model router uses Models.dev to pick the cheapest provider for a given request. If the catalog lists Provider A at $0.50/M input tokens but the actual price was raised to $2.00/M three days ago, the router will keep favoring Provider A, which now costs 4x what the database says. The error is silent. No alert fires. The cost dashboard that also pulls from Models.dev will underreport spend until someone notices the discrepancy against the actual invoice.

One stale token-price field, replicated across every downstream tool that trusts the API, is the structural risk.
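The scenario can be reproduced in a few lines. Provider A's figures come from the example above; Provider B and its $1.00/M price are hypothetical, added only so the router has an alternative to choose from.

```python
def pick_cheapest(prices: dict[str, float]) -> str:
    """Return the provider with the lowest input price (USD per million tokens)."""
    return min(prices, key=prices.get)


# What the catalog says versus what the vendors actually bill.
catalog_prices = {"provider_a": 0.50, "provider_b": 1.00}
actual_prices = {"provider_a": 2.00, "provider_b": 1.00}

choice = pick_cheapest(catalog_prices)       # decision made on stale data
best = pick_cheapest(actual_prices)          # decision a fresh catalog would make

million_tokens = 100                         # hypothetical monthly volume
estimated = million_tokens * catalog_prices[choice]
billed = million_tokens * actual_prices[choice]

print(f"routed to {choice}, should have been {best}")
print(f"dashboard shows ${estimated:.2f}, invoice shows ${billed:.2f}")
```

Both the routing decision and the dashboard figure are wrong, and nothing in the code path can detect it without a second source of truth.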

Using it responsibly

Models.dev fills a real gap. Before it, the options were maintaining your own vendor-page scraper, depending on a proprietary catalog like OpenRouter’s model list, or using LiteLLM’s built-in model info. None of those are versioned, community-editable, and schema-validated in the way Models.dev is.

For production use, the reasonable approach is to treat Models.dev as a high-quality starting point, not a primary source. Cache its JSON API output, timestamp the fetch, and cross-reference vendor pricing pages for any model where cost accuracy affects routing or billing. If you are building a cost dashboard, flag any model entry older than a configurable threshold (7 days is conservative) and surface a staleness indicator rather than silently displaying potentially outdated numbers.
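A staleness flag along those lines might look like the sketch below. It assumes you record your own `verified_at` timestamp for each entry when you fetch or cross-check it; that field name is hypothetical and is not claimed to be part of the Models.dev payload.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def flag_stale(
    entries: list[dict],
    max_age: timedelta = timedelta(days=7),
    now: Optional[datetime] = None,
) -> list[dict]:
    """Return copies of the entries with a boolean "stale" key added.

    Each entry must carry a timezone-aware "verified_at" datetime
    (a hypothetical field maintained by the consumer).
    """
    now = now or datetime.now(timezone.utc)
    flagged = []
    for entry in entries:
        age = now - entry["verified_at"]
        flagged.append({**entry, "stale": age > max_age})
    return flagged
```

A dashboard can then render a warning badge for flagged rows instead of presenting every price with the same confidence.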

The project’s README on GitHub has contribution guidelines. If you consume the data and notice a stale field, the fix is a pull request. The burden of accuracy is distributed, which is both the strength and the structural weakness of any community-maintained catalog.

Frequently Asked Questions

How does Models.dev differ from LiteLLM’s model catalog or OpenRouter’s model list?

LiteLLM bundles model metadata inside a Python package, so updates require a library release cycle. OpenRouter’s list is proprietary to its routing service and exposes only the fields OpenRouter needs for its own billing. Models.dev stores everything as versioned TOML files with a schema that separates seven distinct pricing tiers (input, output, reasoning, cache read, cache write, input audio, output audio). Neither LiteLLM nor OpenRouter exposes an extends mechanism where a wrapper provider can inherit a canonical model definition from a first-party lab and selectively override only the fields that differ.

Does the extends system cover all wrapper providers?

The extends feature currently supports inheritance from six first-party lab definitions: Anthropic, OpenAI, Google, xAI, MiniMax, and Moonshot. A wrapper provider like OpenRouter that mirrors Claude or GPT models can point to the canonical TOML for that model and override only the fields where its offering differs, such as pricing or rate limits. Providers outside those six first-party labs cannot be extended and must be defined from scratch in their own TOML files.
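The inheritance idea can be illustrated with a recursive merge. This is an assumption about the semantics, not the project's implementation: the `extends` key pointing at a canonical ID, the nested-table merge, and all identifiers and prices below are hypothetical.

```python
def deep_merge(base: dict, override: dict) -> dict:
    """Merge override into base, recursing into nested tables."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def resolve(model: dict, canonical: dict[str, dict]) -> dict:
    """Expand a wrapper definition by inheriting from its canonical model."""
    base_id = model.get("extends")
    if base_id is None:
        return dict(model)
    overrides = {k: v for k, v in model.items() if k != "extends"}
    return deep_merge(canonical[base_id], overrides)


# Hypothetical canonical definition and a wrapper that changes one price.
canonical = {
    "lab/model-x": {
        "name": "Model X",
        "cost": {"input": 3.0, "output": 15.0},
        "limit": {"context": 200000},
    }
}
wrapper = {"extends": "lab/model-x", "cost": {"input": 3.3}}
resolved = resolve(wrapper, canonical)
```

Here the wrapper inherits the name, output price, and context limit, and overrides only the input price. The trade-off is that an error in the canonical definition propagates to every wrapper that extends it.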

Which fields are most likely to go stale in the database?

Context window sizes, modality lists, and open-weights status change infrequently and tend to stay accurate. Pricing fields and lifecycle status (alpha, beta, deprecated) are the highest-risk entries. Lifecycle transitions such as a model moving from beta to general availability or being deprecated are often announced in a vendor blog post with no corresponding API signal, which means the sync scripts cannot catch them even for the providers they cover.

Can the sync scripts catch price changes for all 50+ providers?

No. The automated scripts cover only a subset: Venice, Vercel, Friendli, and Weights & Biases as of late May 2026. Providers like Mistral, Cohere, and Google have no sync scripts, so a mid-month pricing change from any of them will sit undetected until a contributor notices and files a pull request. The sync scripts also only merge fields that exist in the provider API; manually maintained fields like knowledge cutoffs are preserved during merge but never auto-updated.

Sources

  1. DeepWiki: anomalyco/models.dev architecture documentation (analysis). Accessed 2026-05-29.
  2. DeepWiki: models.dev automated synchronization (analysis). Accessed 2026-05-29.
  3. GitHub: anomalyco/models.dev, "An open-source database of AI models" (primary). Accessed 2026-05-29.
  4. AI Model Benchmarks: "AI Model Comparison & Pricing (2026)" (community). Accessed 2026-05-29.