Chinese models accounted for 41% of downloads on Hugging Face in the platform’s Spring 2026 ecosystem audit, surpassing the US. The same report shows independent developers have edged past industry contributors in share of downloads, while the mean size of downloaded open models rose from 827M parameters in 2023 to 20.8B in 2025—though the median stayed near 400M. For Western teams building fine-tune pipelines and procurement shortlists, the assumption that US frontier labs seed the downstream ecosystem no longer matches the data.

The Two Geographic Shifts: China Hits 41% and the US Slips

China’s 41% share of Hugging Face downloads marks the first time the country has outpaced the US on the platform. The shift is driven by release volume: Baidu went from zero Hub uploads in 2024 to more than 100 in 2025, while ByteDance and Tencent each increased their release counts by eight to nine times over the same period (State of Open Source on Hugging Face: Spring 2026). A companion arXiv study notes that US dominance by Google, Meta, and OpenAI has declined sharply amid the rise of Chinese industry and unaffiliated developers (Economies of Open Intelligence: Tracing Power & Participation in the Model Ecosystem).

Industry Share Halved: From 70% to 37% as Independents Hit 39%

Industry-affiliated developers accounted for roughly 70% of downloads before 2022; by 2025 that share had fallen to roughly 37%. Independent or unaffiliated developers rose from 17% to 39%, at times accounting for more than half of total usage (State of Open Source on Hugging Face: Spring 2026). The same arXiv study notes that open-weights models surpassed truly open-source models for the first time in 2025 (Economies of Open Intelligence: Tracing Power & Participation in the Model Ecosystem)—a distinction that matters for licensing and modification rights even when download counts look similar.

Small Models Won the Download War — But the Mean Is Lying

Among the top-10 models in each size category, 1-9B parameter models are downloaded roughly four times more often than models above 100B parameters. More than 1.4 billion of roughly 2 billion total downloads come from the 1-9B range (ATOM Project: Relative Adoption Metric). However, the report explicitly acknowledges that automated systems and CI pipelines inflate small-model counts, so the ratio overstates organic human preference (ATOM Project: Relative Adoption Metric).

The divergence between mean and median model size tells a parallel story. The mean downloaded model grew from 827M parameters in 2023 to 20.8B in 2025, while the median increased only marginally from 326M to 406M (State of Open Source on Hugging Face: Spring 2026). The headline average is pulled upward by a small set of large-model users; most production use remains firmly in the sub-billion to few-billion parameter range.
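The pull a few large-model users exert on the average is easy to reproduce. The sketch below uses made-up numbers (not the report's raw data) chosen only to mimic the skew: if roughly 5 in every 100 downloads target a 400B-parameter model while the rest sit near 350M, the mean lands above 20B even though the median never moves.

```python
import statistics

# Illustrative, made-up download-weighted model sizes in millions of
# parameters -- NOT the report's raw data. 95 downloads of a ~350M model
# plus 5 downloads of a 400B-scale model, mimicking the skew described.
sizes_m = [350] * 95 + [400_000] * 5

mean = statistics.mean(sizes_m)      # dominated by the few huge models
median = statistics.median(sizes_m)  # reflects the common case

print(f"mean:   {mean / 1000:.1f}B params")
print(f"median: {median:.0f}M params")
```

Five large-model downloads per hundred are enough to push the mean past 20B parameters, which is why the median, not the mean, is the better guide to typical deployment size.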

Qwen’s Derivative Moat: 113,000+ and Counting

Alibaba’s Qwen family now counts more than 113,000 derivative models and over 200,000 tagged models in total on the Hub. As an organization, Alibaba has more derivative models than Google and Meta combined (State of Open Source on Hugging Face: Spring 2026). That downstream activity is a signal procurement teams can use: when a model family accumulates tens of thousands of fine-tuned variants, the surrounding tooling ecosystem—quantization configs, LoRA adapters, deployment scripts—tends to follow.

The Concentration Catch: Half of All Downloads, Six Weeks to Live

Despite the geographic and institutional diversification, the Hub remains heavily concentrated. The top 200 most downloaded models—0.01% of all models—account for 49.6% of all downloads. Half of all models on the platform have fewer than 200 total downloads (State of Open Source on Hugging Face: Spring 2026). The average model’s active engagement period after release is approximately six weeks (State of Open Source on Hugging Face: Spring 2026), meaning most uploads fail to sustain attention regardless of origin.

What This Means for Western Procurement and Fine-Tune Pipelines

Western practitioners should update two assumptions. First, the default “frontier lab” shortlist—OpenAI, Google, Meta, Anthropic—is no longer a reliable proxy for what the open-weights ecosystem will actually adopt. Chinese organizations, particularly Alibaba via Qwen and Baidu via its 2025 release surge, now drive a plurality of downstream activity (State of Open Source on Hugging Face: Spring 2026). Second, “open” does not mean distributed. With the top 200 models capturing half of all downloads, ecosystem health cannot be read from volume metrics alone (State of Open Source on Hugging Face: Spring 2026).

Teams evaluating base models for fine-tuning should weigh derivative count and sustained engagement over raw download totals. A model with 113,000+ derivatives and a living community of adapters is a safer integration bet than a freshly uploaded checkpoint with a spike of automated downloads and a six-week engagement window.
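The shortlisting heuristic above can be sketched as a simple filter. All field names and thresholds here are hypothetical choices for illustration, not fields from the report or any official API; the point is only that derivative count and recency of activity gate the list before download totals are even consulted.

```python
from dataclasses import dataclass

# Hypothetical model metadata -- field names and thresholds are
# illustrative assumptions, not from the report or any official API.
@dataclass
class ModelSignal:
    name: str
    downloads: int
    derivatives: int           # fine-tunes, quantizations, adapters
    weeks_since_activity: int  # weeks since last derivative or discussion

def shortlist(models, min_derivatives=1_000, max_idle_weeks=6):
    """Prefer sustained derivative activity over raw download totals."""
    return [
        m.name for m in models
        if m.derivatives >= min_derivatives
        and m.weeks_since_activity <= max_idle_weeks
    ]

candidates = [
    ModelSignal("established-family-7b", 4_000_000, 25_000, 1),
    ModelSignal("viral-checkpoint-3b", 9_000_000, 40, 12),
]
print(shortlist(candidates))  # ['established-family-7b']
```

Note that the higher-download checkpoint is filtered out: a download spike with few derivatives and stale activity is exactly the pattern the six-week engagement statistic warns about.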

Frequently Asked Questions

Does the 41% China download share reflect model quality or release volume?

Release volume is the stronger driver. Baidu’s 0-to-100+ upload surge and ByteDance and Tencent’s 8–9x increases in 2025 inflated download counts independent of benchmark performance. Download share measures downstream adoption pressure, not capability.

Why does the gap between mean (20.8B) and median (406M) model size matter for hardware planning?

A small cohort of large-model users pulls the mean upward while most production deployments stay in the sub-billion to few-billion range. Infrastructure and budget decisions based on the mean will overestimate the GPU memory and inference costs most teams actually face.

How should procurement teams weigh derivative counts against download totals?

Use derivative count as a first-pass filter: a model family like Qwen with 113,000+ variants has accumulated quantization configs, LoRA adapters, and deployment scripts that reduce integration risk. Then check whether engagement persists beyond the roughly six-week average engagement period, since most models lose activity quickly regardless of origin.

Are download counts a reliable proxy for ecosystem health?

Only in combination with other signals. Automated CI pipelines inflate small-model downloads, and the top 200 models capture 49.6% of all volume. Cross-reference downloads with derivative activity, sustained engagement, and license terms before drawing conclusions about adoption.

What does the open-weights majority mean for teams that need modification rights?

Open-weights models surpassed truly open-source models in 2025, but the former typically withhold training code, data, and documentation. Teams relying on reproducibility or the ability to modify and redistribute must verify license terms per model rather than assuming any Hub download carries unrestricted rights.

Sources

  1. State of Open Source on Hugging Face: Spring 2026 (vendor report), accessed 2026-04-23
  2. Economies of Open Intelligence: Tracing Power & Participation in the Model Ecosystem (primary source), accessed 2026-04-23
  3. ATOM Project: Relative Adoption Metric (analysis), accessed 2026-04-23
