Mistral shipped two models under the Devstral 2 name on December 9, 2025, and as of April 2026 practitioners are still untangling what “open source” means in Mistral’s licensing vocabulary. The answer depends on which model you mean. Devstral Small 2 (24B) is the genuinely open variant — Apache 2.0, no restrictions, fits in 14 GB. The flagship 123B carries a revenue clause that most companies above mid-market scale will quietly breach.

What Devstral 2 Actually Is: Two Models, Two Licenses, Two Stories

Mistral’s December 9, 2025 release[1] packaged two distinct models under one product name:

  • Devstral 2 (123B parameters): the flagship, positioned at enterprise-grade agentic coding tasks
  • Devstral Small 2 (24B parameters): the laptop-friendly variant, positioned for local inference

Both share a 256K token context window[2]. The capability gap between them is considerably smaller than the parameter ratio implies — but the licensing difference is substantial enough to determine which one your legal team will approve.

The Modified MIT Trap: Why the 123B Model Isn’t Open Source for Most Companies

Devstral 2 (123B) ships under a “modified MIT” license. The binding clause reads[3]:

“You are not authorized to exercise any rights under this license if the global consolidated monthly revenue of your company (or that of your employer) exceeds $20 million”

That’s $240M in annual revenue as the ceiling. Organizations above that threshold require a separate commercial agreement with Mistral to use the 123B model legally.
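The threshold arithmetic is simple enough to sketch directly; the $20M/month figure comes from the license clause quoted above, and the function name here is illustrative, not part of any official tooling:

```python
MONTHLY_CAP_USD = 20_000_000  # revenue clause in the 123B "modified MIT" license


def requires_commercial_license(global_monthly_revenue_usd: float) -> bool:
    """True if the company (or its parent) exceeds the monthly revenue cap."""
    return global_monthly_revenue_usd > MONTHLY_CAP_USD


# Annual ceiling implied by the clause: $240M/year
print(MONTHLY_CAP_USD * 12)                      # 240000000
print(requires_commercial_license(25_000_000))   # True: needs a Mistral contract
print(requires_commercial_license(5_000_000))    # False: within the cap
```

Note that the clause keys on *global consolidated* revenue, so a small subsidiary inherits its parent company’s number.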

The Open Source Initiative’s definition of open source explicitly prohibits discrimination against persons, groups, or fields of endeavor[3]. A revenue cap violates this directly. Devstral 2 (123B) is source-available with commercial restrictions — it does not qualify as open source under the OSI standard.

Mistral’s marketing describes both models as “open-source and permissively licensed” without clearly distinguishing the revenue gate on the 123B. The distinction matters.

Devstral Small 2: The Actually-Open-Source Coding Agent That Fits in 14 GB

Devstral Small 2 (24B) ships under genuine Apache 2.0[4]. No revenue restrictions. No commercial use limitations. No requirement to negotiate a separate license. You can deploy it, modify it, fine-tune it, and redistribute it without legal exposure — regardless of your company’s revenue.

The capability gap between Small 2 and the 123B is narrower than the marketing suggests. On SWE-bench Verified, Small 2 scores 68.0% versus the 123B’s 72.2%[2] — a 4.2 percentage point difference at one-fifth the parameters.

Benchmarks on Real Agentic Tasks

SWE-bench evaluates a model’s ability to resolve real GitHub issues from popular open-source repositories. It’s a more meaningful proxy for coding agent performance than completion-style benchmarks like HumanEval. Devstral 2’s reported scores[2]:

| Model | SWE-bench Verified | SWE-bench Multilingual | Terminal Bench 2 |
| --- | --- | --- | --- |
| Devstral 2 (123B) | 72.2% | 61.3% | 32.6% |
| Devstral Small 2 (24B) | 68.0% | 55.7% | not reported |

Terminal Bench 2 scores for Small 2 were not reported in the available data as of April 2026. The multilingual gap (5.6 points) is slightly wider than the Verified gap, suggesting the 123B’s advantage is more pronounced on non-English codebases.

Mistral also claims Devstral 2 is up to 7× more cost-efficient than Claude Sonnet at real-world agentic tasks, priced after the free period at $0.40/$2.00 per million tokens (input/output) for the 123B and $0.10/$0.30 for Small 2[5]. The specific Sonnet version and benchmark methodology behind the 7× figure are not clearly documented, so treat it as directional rather than a precise comparison.
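The listed prices make back-of-envelope cost estimates easy. A sketch, using the per-million-token rates above; the 2M-input/200K-output session is a hypothetical workload, not a published benchmark:

```python
# Post-free-period API prices, USD per million tokens (input, output).
PRICES = {
    "Devstral 2 (123B)":      {"input": 0.40, "output": 2.00},
    "Devstral Small 2 (24B)": {"input": 0.10, "output": 0.30},
}


def run_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for one session at the listed per-token rates."""
    p = PRICES[model]
    return (input_tokens / 1e6) * p["input"] + (output_tokens / 1e6) * p["output"]


# Hypothetical agentic session: 2M input tokens (repo context), 200K output.
for model in PRICES:
    print(model, round(run_cost(model, 2_000_000, 200_000), 2))
# Devstral 2 (123B) 1.2
# Devstral Small 2 (24B) 0.26
```

At these rates the 123B costs roughly 4–7× more per session than Small 2, depending on the input/output mix.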

Running It Locally: Hardware Requirements and Quantization Tradeoffs

Devstral 2 (123B) requires at least four H100 GPUs for self-hosted deployment[2]. That’s outside the reach of individual practitioners and most small teams without enterprise infrastructure.

Devstral Small 2 operates on a completely different hardware tier[4]:

| Quantization | Size | Hardware target |
| --- | --- | --- |
| Q4_K_M | 14.33 GB | Single RTX 4090 or Apple Silicon Mac (32 GB) |
| Q6_K_L | 19.67 GB | 16 GB RAM + 12 GB VRAM (28 GB combined) |
| Q8_0 | 25.06 GB | Prosumer or higher-end consumer hardware |

Q4_K_M is the practical entry point. It fits on widely available consumer hardware while preserving most of the model’s capability. Q6_K_L is worth considering if you have split CPU/GPU RAM available and want higher precision. Q8_0 is full-quality but requires dedicated higher-end hardware.
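A quantized model needs more than its file size at runtime: KV cache and activations claim additional memory, growing with context length. A rough fit check, using the file sizes from the table; the 20% working-memory margin is an assumption for illustration, not a published figure:

```python
# Quantized file sizes in GB, from the table above.
QUANTS = {"Q4_K_M": 14.33, "Q6_K_L": 19.67, "Q8_0": 25.06}


def fits(quant: str, available_gb: float, margin: float = 0.20) -> bool:
    """Crude check: file size plus a working-memory margin for KV cache
    and activations must fit in available VRAM or unified memory.
    The 20% margin is a ballpark and grows with context length."""
    return QUANTS[quant] * (1 + margin) <= available_gb


print(fits("Q4_K_M", 24.0))  # True: single RTX 4090 (24 GB VRAM)
print(fits("Q8_0", 24.0))    # False: needs more than a 4090 offers
```

Long contexts shift the picture: filling a meaningful fraction of the 256K window inflates the KV cache well past a fixed 20% margin, so budget accordingly.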

Mistral Vibe CLI: The Terminal Agent Bundled with the Release

Mistral released Mistral Vibe CLI alongside Devstral 2[1] — a terminal agent that automates software engineering tasks end-to-end using Devstral as the backend. It ships under Apache 2.0.

The CLI enters a space alongside other terminal coding agents, though as of April 2026 it’s a new release with limited independent evaluation. It’s worth monitoring as the toolchain matures, particularly for workflows that favor a CLI interface over IDE integrations.

Who Should Use Which Model (and Under What Terms)

| Scenario | Recommended model | Reason |
| --- | --- | --- |
| Individual developer, local inference | Small 2 (Q4_K_M) | Apache 2.0, consumer GPU, no legal exposure |
| Startup below $20M/month revenue | Either | 123B is in-scope; Small 2 eliminates cap risk entirely |
| Company above $20M/month revenue | Small 2 or commercial license | 123B modified MIT requires separate Mistral contract |
| Subsidiary of a large parent company | Small 2 | Parent’s global revenue determines eligibility |
| API usage, cost-sensitive, eligible | 123B via Mistral API | $0.40/$2.00 per million tokens |

The benchmark difference — 4.2 points on SWE-bench Verified — is real but unlikely to be meaningful for most production coding workloads. For teams that can use either model legally, Small 2 is the variant with fewer infrastructure requirements, no licensing ambiguity, and hardware requirements that match what practitioners actually own.
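The decision logic above reduces to two questions: do you need local inference, and does the revenue gate apply? A sketch encoding the table’s outcomes; the function and its return strings are illustrative only:

```python
MONTHLY_CAP_USD = 20_000_000  # threshold from the 123B license clause


def pick_model(global_monthly_revenue_usd: float, local_only: bool) -> str:
    """Illustrative encoding of the scenario table. Revenue means the
    parent company's global consolidated monthly revenue."""
    if local_only:
        # Hardware and license both point the same way for local use.
        return "Devstral Small 2 (Q4_K_M)"
    if global_monthly_revenue_usd > MONTHLY_CAP_USD:
        return "Small 2, or 123B under a separate commercial Mistral contract"
    return "Either; Small 2 eliminates cap risk entirely"


print(pick_model(5_000_000, local_only=True))    # Devstral Small 2 (Q4_K_M)
print(pick_model(50_000_000, local_only=False))  # needs the commercial path
```

The only branch that forces a conversation with Mistral is self-hosting the 123B above the cap; every other path has a clean answer.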

FAQ

Does the $20M/month revenue cap apply if I use Devstral 2 (123B) through Mistral’s managed API rather than self-hosting?

The modified MIT license governs use of the model weights directly. API access is governed by Mistral’s separate API terms of service, which may differ. If you’re calling Mistral’s managed API rather than hosting the weights yourself, verify the commercial API terms for your revenue tier — the weight-level license may not apply, but Mistral’s API terms could impose equivalent restrictions. When in doubt, using Small 2 via the API eliminates the question entirely.

What is Terminal Bench 2, and is the 32.6% score good?

Terminal Bench 2 evaluates agents operating through a terminal on longer-horizon, multi-step tasks — closer to real-world agentic workflows than single-turn benchmarks. The 123B’s 32.6%[2] reflects the genuine difficulty of these tasks; it is not a ceiling-scraping score, but agentic task benchmarks are generally harder than code completion benchmarks and scores across the field are lower. No comparative scores for other models at the same tier were included in the sourced data.

Can I fine-tune Devstral Small 2 and ship the fine-tuned model in a product?

Apache 2.0 permits modification, fine-tuning, and redistribution of derivatives without additional licensing requirements. You are not obligated to open-source fine-tuned weights or notify Mistral. The 123B under modified MIT applies the same $20M/month threshold to any derived works.


Footnotes

  1. Mistral AI. “Introducing: Devstral 2 and Mistral Vibe CLI.” https://mistral.ai/news/devstral-2-vibe-cli

  2. Hugging Face. “mistralai/Devstral-2-123B-Instruct-2512 — Model Card.” https://huggingface.co/mistralai/Devstral-2-123B-Instruct-2512

  3. Implicator.ai. “Mistral’s ‘Open Source’ Trick: Build a Great Model, Gate It Behind Revenue Caps, Call It Freedom.” https://www.implicator.ai/mistrals-open-source-trick-build-a-great-model-gate-it-behind-revenue-caps-call-it-freedom/

  4. Hugging Face (bartowski). “Devstral-Small-2-24B-Instruct-2512 GGUF — quantization specs and Apache 2.0 license.” https://huggingface.co/bartowski/mistralai_Devstral-Small-2-24B-Instruct-2512-GGUF

  5. Simon Willison’s Weblog. “Devstral 2.” https://simonwillison.net/2025/Dec/9/devstral-2/
