groundy
models & research

Chinese AI Models Compared: DeepSeek, Qwen, Kimi, Doubao, and Ernie

DeepSeek isn't China's only frontier AI. Compare DeepSeek, Qwen, Kimi, Doubao, and Ernie on benchmarks, licensing, API access, and use-case fit.

10 min···16 sources ↓

China’s AI model ecosystem is diverse and rapidly evolving. DeepSeek dominates Western headlines, but Qwen, Kimi, Doubao, and Ernie each occupy distinct positions (see also Qwen’s dense model architecture), with different licensing models, API accessibility, and technical strengths. This structured comparison covers all five — plus Zhipu’s GLM-5.2, an increasingly relevant sixth entry — refreshed for mid-2026 with the DeepSeek V4, Qwen3.7-Max, Kimi K2.7 Code, and GLM-5.2 releases.


Why the Chinese AI Ecosystem Deserves a Map

When DeepSeek-R1 launched in January 2025 and matched OpenAI o1’s performance on MATH-500 (97.3 vs. 96.4) at a fraction of the cost, it forced a reassessment of how much frontier AI performance actually costs to produce. (DeepSeek AI. “DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning.” arXiv, January 2025) But the “DeepSeek moment” was a narrow lens on a broader story.

Chinese AI labs collectively account for roughly 41% of all model downloads on Hugging Face, a plurality that overtook U.S. labs, according to Hugging Face’s Spring 2026 State of Open Source report.2 The ecosystem spans at least six distinct companies with distinct strategies: open-source maximalism (DeepSeek), closed-but-cheap API plays (Doubao), long-context specialization (Kimi), consumer-first deployment (Doubao/ByteDance), search-engine heritage (Baidu/ERNIE), and fully permissive open weights (Zhipu/GLM).

Understanding each model on its own terms is more useful than ranking them on a single leaderboard.


The Competitors at a Glance

Model FamilyCompanyOpen SourceContext WindowInternational API
DeepSeek V4 Pro / V4 Flash / R1DeepSeek AIYes (MIT)1M (V4)Easy
Qwen3.7-Max / Qwen3 / QwQ-32BAlibaba CloudYes (Apache 2.0)1M (3.7-Max)Easy
Kimi K2.6 / K2.7 CodeMoonshot AIYes (Mod. MIT)256KYes
Doubao Seed 2.0ByteDanceNoUndisclosedDifficult
ERNIE 4.5 / X1.1 / 5.0BaiduPartial (Apache 2.0)UndisclosedLimited
GLM-5.2Zhipu AI (Z.ai)Yes (MIT)1MYes

DeepSeek: The Open-Source Disruptor

DeepSeek AI’s flagship pair, V3 for general tasks and R1 for reasoning, defined the terms of the 2025 cost efficiency debate. DeepSeek-V3 uses a 671B-parameter Mixture-of-Experts architecture with only ~37B parameters active at inference, combined with Multi-head Latent Attention (MLA) to compress key-value representations, a genuine architectural innovation, not just efficient training. (DeepSeek AI. “DeepSeek-V3 Technical Report.” arXiv, December 2024)

DeepSeek-R1 went further by training its reasoning chain through pure reinforcement learning, skipping supervised fine-tuning for bootstrapping. The result: an open-source reasoning model competitive with OpenAI o1 on AIME 2024 (79.8 vs. 79.2) and MATH-500 (97.3 vs. 96.4), available under an MIT license at roughly $0.55 per million input tokens and $2.19 per million output tokens as of March 2026.4

In late April 2026, DeepSeek shipped V4, splitting the flagship into V4 Pro (1.6T total / 49B active parameters) and V4 Flash (284B total / 13B active). On the Artificial Analysis Intelligence Index, V4 Pro sits second only to Kimi K2.6, returning DeepSeek to the leading edge of open-weights releases. (Artificial Analysis. “DeepSeek is back among the leading open weights models with V4 Pro and V4 Flash.”) V3.2 remains supported for production deployments that haven’t migrated yet, at the same pricing tier.

Use case fit: Best for mathematics, code generation, and any reasoning-intensive task where you need frontier-level capability at open-source prices. The MIT license permits distillation, which has spawned a generation of fine-tuned derivatives.


Qwen: The Broadest Ecosystem

Alibaba’s Qwen series has become the most downloaded open-source model family globally, outpacing Meta’s Llama by download volume in 2025.6 The range is remarkable: Qwen3 covers sizes from 0.5B to 235B, with QwQ-32B as a standalone reasoning model and Qwen2.5-Coder as a specialized code variant.

The Qwen3.5 release in February 2026 introduced a new architecture that fuses linear attention (Gated Delta Networks) with sparse MoE, delivering 8.6x faster decoding than Qwen3-Max at 32K context while claiming the highest instruction-following scores of any model evaluated on IFBench (76.5), and outperforming GPT-5.2 on MathVision benchmarks (88.6 vs. 83.0).7 In April 2026 Alibaba previewed Qwen3.6-Max, with improved agentic coding, stronger world knowledge, and more reliable instruction following. Community evaluations placed it close to DeepSeek V4 and Kimi K2.6 on open-weights leaderboards. (DeepLearning.AI The Batch. “Kimi K2.6 Matches Open Qwen3.6 Max.”) Since then Alibaba has open-sourced two Qwen3.6 variants (a dense 27B and a 35B-A3B MoE) and launched its next flagship, Qwen3.7-Max, on May 19, 2026. Qwen3.7-Max is a closed-weight reasoning agent model with a 1M-token context window, scoring 56.6 on the Artificial Analysis Intelligence Index; it is text-only (use Qwen3.7-Plus-Preview for vision input) and is designed specifically for long-horizon autonomous execution, with a verified 35-hour continuous run reported by Alibaba. So treat “Max-Preview” as a fast-moving label rather than a fixed endpoint. [Updated June 2026]

Most models in the Qwen family carry Apache 2.0 licensing, a substantial advantage over MIT for enterprise legal teams who need clarity on patent clauses. The closed-source Qwen-Max (API only) remains for users who want the best Alibaba has to offer without self-hosting.

Use case fit: The breadth of sizes makes Qwen the practical choice for on-device deployments (0.5B–7B), resource-constrained inference (14B–32B), and production API use. Qwen2.5-Coder 32B outperforms GPT-4o on LiveCodeBench (37.2% vs. 29.2%) and Aider code editing benchmarks (73.7%) (Qwen Blog. “Qwen2.5-Coder: Code the World.” October 2024), making it the strongest open-source code model in this comparison at time of writing.


Kimi: Long Context as a First Principle

Moonshot AI, founded in 2023, built its identity around one idea: context length matters more than most labs acknowledge. Kimi k1.5 (January 2025) demonstrated that scaling reinforcement learning context windows to 128K tokens produces continuous reasoning improvements, without the Monte Carlo Tree Search complexity used by competing approaches. (Moonshot AI. “Kimi k1.5: Scaling Reinforcement Learning with LLMs.” January 2025)

The Kimi K2 architecture escalates this to 1 trillion total parameters (32B active), making it the largest open-source MoE in this comparison by raw parameter count. Kimi K2.5 (January 2026) extended context to 256K tokens and added native multimodal capabilities, with automatic caching reducing effective input costs by up to 75% on long-context workloads.10 On April 20, 2026, Moonshot released Kimi K2.6, a 1T-parameter vision-language model that became the first open-weight system to beat GPT-5.4 on SWE-Bench Pro (58.6 vs. 57.7), with native INT4 quantization, a “preserve thinking” mode, and agent-swarm capabilities [Updated June 2026]. (DeepLearning.AI The Batch. “Kimi K2.6 Matches Open Qwen3.6 Max.”)

Pricing at the API level is context-window-tiered: ~$0.20 per million input tokens for 8K context up to $2.00 per million for 128K. The API is OpenAI SDK-compatible, a drop-in replacement via api.moonshot.ai/v1, accepting USD payment from international developers.

Use case fit: Long-document analysis, research synthesis, and agentic tasks where maintaining coherent context across many tool calls is the bottleneck. With K2.6, Kimi is now competitive on reasoning benchmarks as well as long-context workflows, narrowing the prior gap to DeepSeek-R1.

June 2026 update — Kimi K2.7 Code: On June 12, 2026, Moonshot released Kimi K2.7 Code, a coding-specialized derivative of the K2.7 architecture with the same 1T-parameter MoE structure (32B active), a 256K context window, and native INT4 weights. Pricing sits at $0.95 per million input tokens and $4.00 per million output tokens, with cached input at $0.19 per million — roughly a tenth of Claude Fable 5’s list rates. The model ships with thinking always enabled (preserve_thinking) and no instant mode, so trivial edits still incur reasoning overhead. Moonshot published a six-benchmark comparison table showing K2.7-Code trailing GPT-5.5 and Claude Opus 4.8 in 11 of 12 cells, winning only on MCPMark Verified (tool-server integration). The efficiency pitch — approximately 30% fewer thinking tokens than K2.6 per accepted change — is the actual differentiator, not a benchmark headline. See Moonshot’s Kimi K2.7 Code benchmarks and pricing analysis for the full breakdown. [Updated June 2026]


Doubao: Consumer Scale, Enterprise Pricing

ByteDance’s Doubao is arguably the most powerful model in this comparison by deployment scale, with over 340 million monthly active users on the consumer app as of mid-2026 [Updated June 2026], and the least accessible to international developers.11

The Doubao Seed 2.0 family (February 14, 2026 launch) includes four variants covering Pro, Lite, Mini, and Code specializations. Doubao Seed 2.0 Pro claims competitive positioning with frontier Western models on AIME 2025 (98.3) and Codeforces (rating 3020), and ranked 6th on the LMSYS Text Arena and 3rd on Vision Arena as of its launch date.12 Doubao Seed 2.0 Pro is priced at approximately $0.47 per million input tokens and $2.37 per million output tokens via Volcano Engine [Updated June 2026]; the older Doubao-1.5-Pro still runs at $0.11 per million input tokens for teams tolerating the capability downgrade.

The access barrier is real. Standard registration requires a Chinese phone number. International enterprise access routes through Volcano Engine (ByteDance’s cloud platform) via negotiated agreements, or through third-party API aggregators.

Use case fit: Businesses operating within ByteDance’s product ecosystem (Douyin, TikTok) or companies with existing Volcano Engine relationships. The multimodal video understanding capabilities are strong (VideoMME scores 89.5), making it competitive for media-adjacent applications. For most international developers, the friction outweighs the price advantage unless there’s a specific ByteDance integration requirement.


Ernie: Heritage Brand, Mixed Execution

Baidu’s ERNIE has the longest pedigree in Chinese NLP: the name stands for Enhanced Representation through kNowledge IntEgration, and early versions in 2019 preceded the modern LLM era. ERNIE 4.5, released March 2025 and open-sourced under Apache 2.0 in July 2025, covers a broad parameter range (0.3B dense to 424B MoE) and achieves MMLU-Pro scores near GPT-4.5 (~78) according to Baidu’s benchmarks.13

ERNIE X1 and X1.1 target the reasoning segment, with Baidu claiming X1 matches DeepSeek-R1 at 50% of the cost, and X1.1 surpassing DeepSeek R1-0528 on key benchmarks. ERNIE 5.0, previewed November 2025 and fully released January 22, 2026, expands to 2.4 trillion parameters via a sparse MoE that activates under 3% of experts per inference, and natively handles text, images, audio, and video. [Updated June 2026]

The caveat: Baidu’s benchmark claims have historically attracted skepticism, and independent verification of ERNIE’s stated performance has been uneven. ERNIE 4.5’s open-source release is a substantial shift, signaling that Baidu recognizes the open-weight ecosystem as a competitive pressure rather than a curiosity.

Use case fit: Chinese-language applications where Baidu’s search data heritage matters, and workflows that require full multimodal capability including audio and video (ERNIE 5.0). The open-sourced ERNIE 4.5 is accessible and well-documented; the proprietary ERNIE 5.0 is less straightforward for international developers to access.


Zhipu / GLM-5.2: The Sixth Contender

Zhipu AI (also operating as Z.ai) released GLM-5.2 on June 13, 2026, and as of June 19 the model sits among the stronger entries in this comparison on several coding and reasoning benchmarks, not merely an honorable mention. GLM-5.2 is a 753B-parameter Mixture-of-Experts architecture (README designation “744B-A40B,” implying approximately 40B active parameters) with a 1M-token context window, an Anthropic Messages API-compatible endpoint, and MIT-licensed weights on Hugging Face at zai-org/GLM-5.2 (BF16) and zai-org/GLM-5.2-FP8. The 1M context is a 5x increase over GLM-5.1’s 200K window. Architecture highlights include IndexShare sparse attention (reuses the same indexer across every four sparse-attention layers, reducing per-token FLOPs by 2.9x at 1M context) and an MTP speculative decoding layer. The model deploys via SGLang, vLLM, Transformers, or KTransformers.

Benchmarks (official, as of June 19 2026): SWE-bench Pro 62.1% (ahead of Kimi K2.6’s 58.6% on that benchmark); Terminal-Bench 2.1 81.0, trailing Claude Opus 4.8’s 85.0 by 4 points; AIME 2026 99.2; HMMT Nov 2025 94.4; GPQA-Diamond 91.2; HLE 40.5. The SWE-bench Pro gain over GLM-5.1 (58.4%) is 3.7 points; Terminal-Bench 2.1 improved from 62.0 to 81.0. These are Zhipu’s self-reported figures; independent replication is not yet available as of this writing.

Pricing: The hosted API runs on a flat subscription: GLM Coding Plan Lite at $18/month (approximately 400 prompts per week), Pro at 5x Lite usage, Max at 20x Lite usage (~$112/month on annual billing). There are no per-token fees on the subscription tiers. The MIT-licensed weights are self-hostable at hardware cost only, with no per-token fees under the license. GLM-5.2 is the most permissively licensed long-context model in this comparison: MIT with no commercial revenue threshold, unlike Kimi’s Modified MIT (which adds a display obligation above $20M/month revenue or 100M MAUs). Eight coding agents supported at launch include Claude Code, Cline, and Goose via the Anthropic-compatible endpoint.

The reliability caveat from GLM-5.1 deserves mention: a degradation event in June 2026 exposed a ~50% bad-output rate under production load for the prior generation. Whether the GLM-5.2 architecture resolves that is not yet established by independent testing. See Zhipu ships GLM-5.2 with 1M context and MIT weights for the release-day analysis and Zhipu open-sources GLM-5.2 under MIT while Anthropic tightens model access for the open-source licensing angle. [Updated June 2026]


Benchmark Comparison: Where They Stand

BenchmarkDeepSeek V4 Pro / R1Qwen3.6 / 3.5Kimi K2.6Doubao Seed 2.0 ProERNIE X1.1GLM-5.2
MMLU90.8 (R1)n/an/an/an/an/a
MATH-50097.3 (R1)n/a96.2 (K2.5)n/an/an/a
AIME 2024/202579.8 (R1)n/a77.5 (K2.5)98.3Competitiven/a
AIME 2026n/an/an/an/an/a99.2
HMMT Nov 2025n/an/an/an/an/a94.4
GPQA-Diamondn/an/an/an/an/a91.2
HLEn/an/an/an/an/a40.5
IFBenchn/a76.5 (3.5)n/an/an/an/a
MathVisionn/a88.6 (3.5)n/an/an/an/a
Codeforcesn/an/a94th pct. (K2.5)3020 ratingn/an/a
SWE-Bench Pron/an/a58.6 (K2.6)n/an/a62.1%
Terminal-Bench 2.1n/an/an/an/an/a81.0 (Opus 4.8: 85.0)
LMSYS Text Arena (Feb 2026)n/an/an/a6thn/an/a
Artificial Analysis Intelligence Index2nd of open-weights (V4 Pro)close to V4 (3.6 Max)1st of open-weights (K2.6)n/an/anot yet rated

GLM-5.2 benchmarks are Zhipu self-reported as of June 19 2026; independent replication pending. All other benchmarks are model-reported or third-party verified where noted. Direct comparisons across labs require caution: evaluation setups differ.


API Pricing at a Glance

ModelInput ($/1M)Output ($/1M)Notes
DeepSeek V4 Pro$0.435$0.8701.6T total / 49B active; 1M context; cached input $0.0036
DeepSeek V4 Flash$0.14$0.28284B total / 13B active; 1M context
DeepSeek V3.2$0.28$0.42Cached: $0.028
DeepSeek-R1$0.55$2.19Reasoning model
Qwen-Plus$0.40$1.20
Qwen3-Max$0.78$3.90
Qwen3.6-Max-Preview$1.30$7.80Preview; superseded by Qwen3.7-Max (May 2026)
Kimi K2.6$0.60$2.50Auto-caching; INT4 quant; vision-language
Kimi K2.7 Code$0.95$4.00Cached input $0.19; coding-specialized; June 2026
Doubao Seed 2.0 Pro$0.47$2.37China-accessible; limited intl.
Doubao-1.5-Pro$0.11$0.275Legacy tier; China-accessible
ERNIE X1$0.28$1.10
ERNIE 4.5$0.55$2.20
GLM-5.2 (Zhipu)SubscriptionGLM Coding Plan: Lite $18/mo (~400 prompts/wk), Pro 5x, Max ~$112/mo yearly; MIT weights self-hostable at hardware cost only; 1M context
GPT-4o (reference)~$2.50~$10.00

Choosing the Right Model

The selection decision reduces to three axes: openness (do you need to self-host?), access (are you outside China?), and task type (reasoning, code, long context, or multimodal?).

  • Self-hosting a reasoning model: DeepSeek-R1 (or V4 Pro once it stabilizes) with MIT license. Distill down to the 32B or 70B variant if hardware is constrained.
  • Self-hosting a long-context coding model (June 2026): GLM-5.2 at MIT license with no commercial revenue threshold. The 753B MoE weights are live on HuggingFace (BF16 and FP8); self-hosting requires significant GPU capacity, but there are no per-token fees at any scale. SWE-bench Pro 62.1% and Terminal-Bench 2.1 81.0 are the relevant capability anchors. Wait for independent benchmark replication before production commitment.
  • Self-hosting a general model: Qwen3 series with Apache 2.0. The breadth of sizes from 0.5B to 235B is unmatched, and Qwen3.6-Max is the natural upgrade path once GA arrives.
  • Long-context document work via API: Kimi K2.6 at 256K context with automatic caching and INT4 quantization.
  • Agentic coding at volume: Kimi K2.7 Code (June 2026) at $0.95/$4.00 per million tokens if token efficiency per accepted change matters more than raw benchmark rank; run your own repo eval before committing (see K2.7 Code benchmark and pricing breakdown).
  • Cheapest capable API with easy international access: DeepSeek-V3.2 at $0.28/$0.42 per million tokens.
  • Chinese-language production workloads: DeepSeek or ERNIE, both with strong native language performance and open-source options.
  • Multimodal including video: Doubao Seed 2.0 or ERNIE 5.0, with the caveat that international access is limited for both.

Where Claude Fable 5 fits in this comparison: Anthropic’s June 9, 2026 release of Claude Fable 5 (priced at $10/$50 per million tokens, 1M context) establishes a new tier above Opus 4.8, not a direct substitute for any of the Chinese models here. For teams comparing against the Chinese API field on price, DeepSeek V4 Pro at $0.435/$0.870 and Kimi K2.6 at $0.60/$2.50 remain in a different cost bracket. Fable 5 is relevant as a Western performance ceiling for teams evaluating whether Chinese open-weight quality is sufficient for their task before committing to an API dependency. See best AI models 2026 for a broader cross-lab ranking, and AI code generation benchmarks 2026 for task-level coding numbers.


Frequently Asked Questions

Q: Is DeepSeek actually the best Chinese AI model? A: As of June 2026, the picture is more distributed than in 2025. DeepSeek V4 Pro and Kimi K2.6 sit at the top of open-weights leaderboards by Artificial Analysis ranking. GLM-5.2 (Zhipu, June 13) posts the highest SWE-bench Pro among Chinese models at 62.1%, ahead of Kimi K2.6’s 58.6%, though GLM-5.2’s numbers are self-reported and independent replication is pending. “Best” depends on the task: GLM-5.2 leads on reported SWE-bench Pro; Kimi K2.6 leads on independently verified long-context and coding; Kimi K2.7 Code leads on agentic coding token efficiency; Qwen3.7-Max (May 2026, 1M context) leads on long-horizon autonomous execution; Qwen3.5 leads on instruction following; and Doubao Seed 2.0 claims top math scores with less independent verification. [Updated June 2026]

Q: Which Chinese models can I use outside China without restrictions? A: DeepSeek and Qwen (via Alibaba Cloud or OpenRouter) offer the most accessible international APIs with USD billing and no residency requirements. Kimi’s API also supports international developers. Doubao requires a Chinese phone number for standard access; ERNIE’s Qianfan platform is primarily China-oriented.

Q: Are Chinese open-source models truly open source? A: DeepSeek-R1/V3 use MIT license (fully permissive, distillation allowed). Qwen3 series and ERNIE 4.5 use Apache 2.0. Kimi K2/K2.5/K2.7 use a Modified MIT license (adds attribution requirement above 100M MAUs or $20M/month revenue). GLM-5.2 (Zhipu) uses standard MIT with no commercial threshold. Doubao’s models carry no open-source license, weights are not available. Always verify the specific model version, as licensing can differ within a model family.

Q: Should enterprise teams be concerned about data privacy with Chinese AI APIs? A: Yes, with nuance. The concern applies to all API providers, data transmitted to any external API is subject to the provider’s data policies and applicable law. For Chinese providers, this includes China’s data security laws. DeepSeek had a documented 2025 incident involving exposed chat logs. Enterprise users should review data processing agreements, consider on-premises deployment of open-weight models where possible, and avoid sending sensitive data to consumer-facing apps regardless of provider.

Q: How quickly are Chinese models closing the gap with GPT and Claude? A: Analysis from Epoch AI and MIT Technology Review indicates Chinese frontier models, on average, trailed leading U.S. releases by approximately seven months, a figure Epoch AI reaffirmed in early 2026, down from a significantly wider gap in 2023. By Q2 2026, Kimi K2.6 became the first open-weights model to beat GPT-5.4 on SWE-Bench Pro (58.6 vs. 57.7), and DeepSeek V4 Pro sits second on the Artificial Analysis Intelligence Index among open-weights releases. The picture on Claude has shifted further following Anthropic’s June 9, 2026 launch of Claude Fable 5, the first generally available Mythos-class model and Anthropic’s most capable widely released model, which sits above the Opus tier entirely. Opus 4.8 (launched May 2026, scoring 69.2% on SWE-Bench Pro against GPT-5.5’s 58.6% and Gemini 3.1 Pro’s 54.2%) remains Anthropic’s most capable Opus-tier model and is not deprecated. Fable 5 leads on agentic coding benchmarks including FrontierCode and CursorBench, and was the first model to break 90% on a core analytics benchmark, though Anthropic has not published numeric scores. At $10/$50 per million tokens, it costs exactly twice Opus 4.8 and substantially more than the Chinese models in this comparison. In specific domains (math reasoning, code, instruction following), Chinese models have reached or exceeded Western benchmarks at lower cost. The gap at the very frontier is widening again following the Fable 5 launch, though Chinese labs have consistently closed prior gaps faster than Western observers anticipated. (Anthropic. “Claude Fable 5 and Claude Mythos 5.” June 2026)


sources · 16 cited

  1. Qwen Blog. "Qwen2.5-Coder: Code the World." October 2024qwenlm.github.iocommunityaccessed 2026-04-24
  2. Anthropic. "Claude Opus 4.8." May 2026anthropic.comprimaryaccessed 2026-05-28
  3. Anthropic. "Claude Fable 5 and Claude Mythos 5." June 2026anthropic.comprimaryaccessed 2026-06-10
  4. DeepSeek. "Models & Pricing." API Docsapi-docs.deepseek.comprimaryaccessed 2026-05-29
  5. Moonshot AI. "Kimi K2.7 Code Release." June 2026nerova.aianalysisaccessed 2026-06-17
  6. Zhipu AI. "GLM-5.2 Open Source Release, June 2026"gate.comanalysisaccessed 2026-06-17
  7. Hugging Face. "zai-org/GLM-5.2 Model Card." June 2026huggingface.coprimaryaccessed 2026-06-19