groundy

DeepSeek V4.1 Flash vs Qwen 3.7 vs Llama 4.5: June 2026 HF Trending Ranks Velocity, Not Installs

DeepSeek V4.1 Flash led Hugging Face trending in June 2026 within one week, and five of the top ten slots went to Chinese labs. Trending measures velocity, not installs.


The presenc.ai June 2026 Hugging Face trending snapshot, published June 10, ranks DeepSeek V4.1 Flash first, Qwen 3.7 second, Gemma 4 (31B Dense) third, Llama 4.5 Maverick fourth, and GLM-6 fifth among text-generation models. Chinese open-weight labs hold five of the top ten slots, described by presenc.ai as the highest concentration on record. Download velocity is not benchmark rank, and the gap between the two tells you something specific about how practitioners actually choose models.

Trending on Hugging Face reflects recent download velocity, not cumulative installs. Per the presenc.ai methodology note, the ranking favors newly released models by design: a 100k-download spike in the first week outranks a model with 6 million cumulative downloads that has plateaued. Framing this list as “most downloaded” misrepresents what the signal measures.
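The difference between the two orderings can be sketched in a few lines. This is an illustration only, not the Hugging Face or presenc.ai ranking formula: the model names, the seven-day window, and the incumbent's 20,000 recent downloads are assumptions, while the 100k spike and the 6 million cumulative total come from the paragraph above.

```python
from dataclasses import dataclass


@dataclass
class ModelStats:
    name: str
    downloads_last_7d: int  # recent velocity (the 7-day window is an assumption)
    downloads_total: int    # cumulative downloads


# Hypothetical models. Only the 100k and 6M figures come from the text;
# the incumbent's 20k recent downloads is an illustrative guess.
models = [
    ModelStats("new-release", downloads_last_7d=100_000, downloads_total=100_000),
    ModelStats("plateaued-incumbent", downloads_last_7d=20_000, downloads_total=6_000_000),
]

# Same data, two sort keys, two different leaders.
by_velocity = sorted(models, key=lambda m: m.downloads_last_7d, reverse=True)
by_cumulative = sorted(models, key=lambda m: m.downloads_total, reverse=True)

print("trending-style order: ", [m.name for m in by_velocity])
print("most-downloaded order:", [m.name for m in by_cumulative])
```

The point is that a "trending" list and a "most downloaded" list can disagree on the same data without either being wrong.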

The rotation rate makes this concrete. The live huggingface.co/models trending page, captured 2026-06-27, already shows a different leader: zai-org/GLM-5.2 (753 billion parameters, 99k downloads, 2.64k likes), with MiniMaxAI/MiniMax-M3 and a DeepSeek-v4-Fable variant in the next slots. That’s a different top-3 than the June 10 snapshot, seventeen days later. Both readings are accurate; neither is stable.

Underneath the large-model trending numbers sits a structural context worth naming. A 2025 study of HF download patterns found that 92.48% of all Hub downloads go to models under 1 billion parameters, and decoder LLMs represent only 9.5% of NLP downloads. The frontier model trending list is a story about deployer attention, not Hub volume.

Who’s in the top 10, and what does each represent?

The June 2026 presenc.ai snapshot ranks the top 10 HF text-generation trending models as: (1) DeepSeek V4.1 Flash, (2) Qwen 3.7 flagship, (3) Gemma 4 31B Dense, (4) Llama 4.5 Maverick, (5) GLM-6 flagship MoE, (6) DeepSeek V4.1 smaller variants, (7) Llama 4.5 Scout, (8) Qwen 3.7 Coder, (9) Qwen 3.7 VL, (10) Kimi K2.6. Chinese labs (DeepSeek, Qwen/Alibaba, GLM/Zhipu, Kimi/Moonshot) hold five of those ten slots by presenc.ai's count.

Why did DeepSeek V4.1 Flash hit #1 in a week?

DeepSeek V4.1 Flash reached the top trending slot within one week of release, which requires explanation beyond generic popularity. The architecture and economics do most of the work.

DeepSeek-V4 launched on 2026-04-24 as an open-weight mixture-of-experts model, released in Pro and Flash variants on Hugging Face and ModelScope. The headline specification is 1.6 trillion total parameters with 49 billion activated per forward pass, trained on 33 trillion tokens, with a 1 million token context window. Flash API pricing, per Tencent Cloud’s coverage (a vendor-adjacent source), starts at 0.2 yuan per million input tokens at the base tier. The Pro variant is listed at 1 yuan input / 12 yuan output per million tokens, with a note that Pro throughput is limited pending Ascend 950 supernode availability in H2 2026.

MoE activation economics matter here. A model with 1.6T total parameters and 49B active parameters runs inference at roughly the per-token compute cost of a dense 49B model, not a 1.6T one, although all 1.6T parameters still have to be held in memory. That’s what makes large MoEs attractive for practitioners who want headline parameter counts, and the capability gains that often track them, without paying full dense-model compute costs.
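The arithmetic behind that claim can be checked quickly. This is a back-of-the-envelope sketch: the two parameter counts come from the text, while the "about 2 FLOPs per active parameter per token" figure is a common rule of thumb assumed here, not a DeepSeek specification.

```python
TOTAL_PARAMS = 1.6e12   # total parameters, from the text
ACTIVE_PARAMS = 49e9    # parameters activated per forward pass, from the text

# Share of the model that does work on any single token.
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS

# Assumption: roughly 2 FLOPs per active parameter per generated token.
FLOPS_PER_PARAM = 2
flops_per_token_moe = FLOPS_PER_PARAM * ACTIVE_PARAMS
flops_per_token_dense = FLOPS_PER_PARAM * TOTAL_PARAMS

# How much more compute a hypothetical dense model of the same total size needs.
compute_ratio = flops_per_token_dense / flops_per_token_moe

print(f"active fraction: {active_fraction:.2%}")
print(f"dense-1.6T vs MoE compute per token: {compute_ratio:.1f}x")
```

The estimate covers compute only. Serving the model still requires storing every expert's weights, so the memory footprint follows the total parameter count rather than the active one.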

Why does Gemma 4 hold the cumulative download record despite ranking third on velocity?

Gemma 4 placed third on the June 2026 presenc.ai trending list by velocity, but an April 27, 2026 agents-radar digest recorded google/gemma-4-31B-it at 6,042,134 cumulative downloads versus DeepSeek-V4-Pro’s 123,431. That is a gap of roughly 49x, and it reflects something structural: Gemma shipped earlier and accumulated a longer download tail, while DeepSeek-V4 had been public for only three days when that digest was taken.

Format support extends the lead. An April 12, 2026 agents-radar digest found Gemma 4 variants occupying multiple trending slots alongside proliferating Unsloth GGUF quantizations, evidence that the Gemma line had penetrated the local-inference layer before the June frontier wave hit. Dense architecture quantizes more predictably than MoE; the Gemma 4 GGUF ecosystem was already built out.

Its Apache 2.0 license is also load-bearing. Teams with legal constraints around derivative works or commercial redistribution eliminate half the top-10 list immediately and arrive at Gemma 4 or Phi-4 (MIT) by default.

What are the concrete tradeoffs when choosing among these models?

License, parameter profile, and context window are the three axes where the top-5 differ in ways that constrain deployment choices before benchmarks enter the picture.

| Model | License | Architecture | Active Params | Context |
| --- | --- | --- | --- | --- |
| DeepSeek V4.1 Flash | DeepSeek License | MoE | 13B of 284B | 1M tokens |
| Qwen 3.7 | Qwen License | Dense / MoE variants | varies by variant | varies |
| Gemma 4 31B Dense | Apache 2.0 | Dense | 31B | varies |
| Llama 4.5 Maverick | Llama Community License | MoE | [unverified] | [unverified] |
| GLM-6 | MIT-modified | MoE | [unverified] | [unverified] |

The presenc.ai snapshot identifies the license spectrum: Gemma 4 on Apache 2.0, Llama 4.5 on the Llama Community License, DeepSeek V4.1 on the DeepSeek License, Qwen 3.7 on the Qwen License, and GLM-6 and Kimi K2.6 on MIT-modified terms. Apache 2.0 and MIT-modified are the permissive end; the Llama Community License and the proprietary vendor licenses each require legal review for enterprise deployment.
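A license-first shortlist of this kind is simple to express as a filter. The sketch below uses the license labels from the paragraph above. The `PREAPPROVED` set is a hypothetical policy standing in for whatever a given legal team has actually signed off on, and `shortlist` is an illustrative helper, not part of any library.

```python
# License labels as reported in the presenc.ai snapshot.
LICENSES = {
    "Gemma 4": "Apache 2.0",
    "Llama 4.5": "Llama Community License",
    "DeepSeek V4.1": "DeepSeek License",
    "Qwen 3.7": "Qwen License",
    "GLM-6": "MIT-modified",
    "Kimi K2.6": "MIT-modified",
}

# Hypothetical policy: licenses a legal team has already cleared.
PREAPPROVED = {"Apache 2.0", "MIT-modified"}


def shortlist(licenses: dict[str, str], approved: set[str]) -> tuple[list[str], list[str]]:
    """Split models into (pre-approved, needs legal review), each sorted by name."""
    ok = sorted(m for m, lic in licenses.items() if lic in approved)
    review = sorted(m for m, lic in licenses.items() if lic not in approved)
    return ok, review


ok, review = shortlist(LICENSES, PREAPPROVED)
print("pre-approved:", ok)
print("needs review:", review)
```

Treating "MIT-modified" as pre-approved is itself a judgment call, since the modifications vary by vendor. A stricter policy would drop it from the set and route those models to review as well.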

Context window is a differentiator primarily for specific workloads: long document analysis, large codebase inference, extended multi-turn sessions. DeepSeek V4’s 1 million token context window is the only confirmed figure here; context figures for the other top-5 models could not be confirmed from the cited sources.

What does the China-concentration signal actually mean?

Five of the top ten trending slots going to Chinese open-weight labs is a real signal, but it requires context. The presenc.ai analysis captured a week with multiple concurrent Chinese model releases: DeepSeek V4.1, GLM-6, and Kimi K2.6 all landed in a compressed window. Release clustering inflates the count. Two months earlier, Gemma 4 dominated the April 12 trending list across multiple variants.

What the concentration does indicate is that Chinese labs have moved past the “impressive given constraints” framing that governed early DeepSeek and Qwen coverage. Qwen 3.7 Coder and Qwen 3.7 VL each hold separate top-10 slots (positions 8 and 9 per the June snapshot), meaning Alibaba’s multimodal and coding variants are being treated by practitioners as distinct deployment targets worth pulling separately.

The benchmark-versus-adoption gap is worth naming explicitly. Models trending on HF are not necessarily the models scoring highest on whatever aggregate leaderboard is current. The ones deployers reach for first tend to combine legible licensing, established toolchain support (vLLM, Ollama, llama.cpp), and name recognition from a prior version. Benchmark position is a factor; it is not the decisive one for most teams making a real deployment decision.

How should a team actually decide?

The deployment target determines the choice more than any ranking.

Local inference on consumer hardware: Gemma 4 31B Dense with a GGUF quantization is the path of least friction. Unsloth’s GGUF builds were already proliferating across the Gemma line by April 2026. Dense architecture quantizes more predictably than MoE; Apache 2.0 removes license diligence entirely; the tooling questions are already answered.

Self-hosted enterprise with legal review: Apache 2.0 (Gemma 4) or MIT-modified (GLM-6, Kimi K2.6) are the only options that survive most enterprise legal teams without custom negotiation. The Llama Community License and the DeepSeek/Qwen licenses each require a read before signing off.

API routing with long-context requirements: DeepSeek V4.1’s 1M token context window and Flash-tier pricing make it the relevant option, with the caveat that Pro throughput is listed as constrained pending H2 2026 hardware availability, and the pricing figures come from vendor-adjacent coverage.

Download rank tells you where deployer attention went in a narrow window. It doesn’t tell you whether the model will still be there in six months, whether the weights are actually available at the throughput you need, or whether the license survives your legal review. The June 2026 presenc.ai snapshot is a useful signal about momentum. It is not a decision framework on its own.

Frequently Asked Questions

What are DeepSeek V4.1 Flash’s three pricing tiers, and how does the spread compare to Pro?

Tencent Cloud’s coverage (a vendor-adjacent source) lists three Flash tiers at 0.2, 1, and 2 yuan per million tokens, with Pro priced at 1 yuan input and 12 yuan output per million. The 60x spread between Flash’s cheapest tier and Pro’s output price is operationally significant for volume workloads, though these figures come from promotional coverage and require verification against official API documentation before committing to a cost model.
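To see what that spread means for a budget, a rough cost comparison helps. The prices below are the ones quoted above and carry the same verification caveat. The 500-million-token monthly volume is a hypothetical workload, and pricing the whole volume at a single rate is a simplification that gives a floor and a ceiling, not a real bill.

```python
# Prices in yuan per million tokens, as listed in the coverage cited above.
FLASH_TIERS = [0.2, 1.0, 2.0]
PRO_INPUT = 1.0
PRO_OUTPUT = 12.0


def cost_yuan(million_tokens: float, price_per_million: float) -> float:
    """Cost in yuan for a token volume at a given per-million-token price."""
    return million_tokens * price_per_million


# Spread between the cheapest Flash tier and Pro's output price.
spread = PRO_OUTPUT / min(FLASH_TIERS)

# Hypothetical workload: 500M tokens per month, priced at each extreme.
MONTHLY_M_TOKENS = 500
flash_floor = cost_yuan(MONTHLY_M_TOKENS, min(FLASH_TIERS))
pro_output_ceiling = cost_yuan(MONTHLY_M_TOKENS, PRO_OUTPUT)

print(f"spread: {spread:.0f}x")
print(f"Flash cheapest tier: {flash_floor:,.0f} yuan/month")
print(f"Pro output price:    {pro_output_ceiling:,.0f} yuan/month")
```

A realistic model would split the volume into input and output tokens and apply the tier that actually matches the request profile, which the quoted coverage does not specify.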

How does the GGUF quantization ecosystem for Gemma 4 compare to what is available for the June 2026 MoE frontrunners?

By April 2026, Unsloth’s GGUF quantizations of the Gemma 4 family had over 3.5 million combined downloads, and the family held 8 of 30 trending slots that month. MoE architectures require quantizing sparse expert layers separately, which delays official GGUF support for models like DeepSeek V4.1 and GLM-6 by weeks to months after release. Teams targeting local inference on consumer hardware are choosing between a mature, well-tested ecosystem and a pending tooling backlog.

Which lab holds the most slots in the June 2026 top-10, and what does the shape of those slots reveal?

Qwen/Alibaba holds three slots (positions 2, 8, and 9) versus DeepSeek’s two (positions 1 and 6). DeepSeek’s two slots reflect a Flash-versus-variants split within one general-purpose model family. Qwen’s three slots span separate capability domains: general text, code generation, and vision-language. Alibaba holds more top-10 real estate than DeepSeek despite DeepSeek occupying the top position.
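The per-lab tally can be reproduced from the top-10 list itself. This sketch hard-codes the ranking as given in the June snapshot. The lab labels for Gemma (Google) and Llama (Meta) are added here for completeness and are not taken from the snapshot.

```python
from collections import Counter, defaultdict

# (position, model, lab) as listed in the June 2026 snapshot.
# Lab labels for Gemma and Llama are this sketch's own attribution.
TOP_10 = [
    (1, "DeepSeek V4.1 Flash", "DeepSeek"),
    (2, "Qwen 3.7 flagship", "Qwen/Alibaba"),
    (3, "Gemma 4 31B Dense", "Google"),
    (4, "Llama 4.5 Maverick", "Meta"),
    (5, "GLM-6 flagship MoE", "GLM/Zhipu"),
    (6, "DeepSeek V4.1 smaller variants", "DeepSeek"),
    (7, "Llama 4.5 Scout", "Meta"),
    (8, "Qwen 3.7 Coder", "Qwen/Alibaba"),
    (9, "Qwen 3.7 VL", "Qwen/Alibaba"),
    (10, "Kimi K2.6", "Kimi/Moonshot"),
]

# Number of slots per lab.
slots = Counter(lab for _, _, lab in TOP_10)

# Positions held by each lab, in rank order.
positions = defaultdict(list)
for pos, _, lab in TOP_10:
    positions[lab].append(pos)

for lab, count in slots.most_common():
    print(f"{lab}: {count} slot(s) at positions {positions[lab]}")
```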

What restriction does ‘MIT-modified’ typically add for Chinese open-weight models like GLM-6 and Kimi K2.6?

Chinese models labeled ‘MIT-modified’ commonly add a clause restricting use of the weights or derived outputs to train competing commercial large language models, which standard MIT does not include. Academic researchers and non-commercial deployers are generally unaffected, but teams building synthetic data pipelines or fine-tuning workflows that feed commercial training runs need to review the specific license text before treating these as fully permissive.

sources · 6 cited

  1. Hugging Face Trending Models June 2026. presenc.ai (analysis), accessed 2026-06-27.
  2. Models – Hugging Face (live trending). huggingface.co (primary), accessed 2026-06-27.