groundy
models

Chinese AI Models Compared: DeepSeek, Qwen, Kimi, Doubao, and Ernie

DeepSeek isn't China's only frontier AI. Compare DeepSeek, Qwen, Kimi, Doubao, and Ernie on benchmarks, licensing, API access, and use-case fit.

10 min · · · 6 sources ↓

China’s AI model ecosystem is diverse and rapidly evolving. DeepSeek dominates Western headlines, but Qwen, Kimi, Doubao, and Ernie each occupy distinct positions (see also Qwen’s dense model architecture), with different licensing models, API accessibility, and technical strengths. Here’s a structured comparison of all five to help practitioners choose the right model for their stack, refreshed for Q2 2026 with the DeepSeek V4, Qwen 3.6, and Kimi K2.6 releases.


Why the Chinese AI Ecosystem Deserves a Map

When DeepSeek-R1 launched in January 2025 and matched OpenAI o1’s performance on MATH-500 (97.3 vs. 96.4) at a fraction of the cost, it forced a reassessment of how much frontier AI performance actually costs to produce. (DeepSeek AI. “DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning.” arXiv, January 2025) But the “DeepSeek moment” was a narrow lens on a broader story.

Chinese AI labs collectively now account for roughly 30% of all open-source model downloads globally, surpassing U.S. labs for the first time in 2025, according to MIT Technology Review.2 The ecosystem spans five distinct companies with five different strategies: open-source maximalism, closed-but-cheap API plays, long-context specialization, consumer-first deployment, and search-engine heritage.

Understanding each model on its own terms is more useful than ranking them on a single leaderboard.


The Five Competitors at a Glance

Model FamilyCompanyOpen SourceContext WindowInternational API
DeepSeek V4 Pro / V4 Flash / R1DeepSeek AIYes (MIT)128K–164KEasy
Qwen3.6-Max-Preview / Qwen3 / QwQ-32BAlibaba CloudYes (Apache 2.0)256KEasy
Kimi K2.6Moonshot AIYes (Mod. MIT)256KYes
Doubao Seed 2.0ByteDanceNoUndisclosedDifficult
ERNIE 4.5 / X1.1 / 5.0BaiduPartial (Apache 2.0)UndisclosedLimited

DeepSeek: The Open-Source Disruptor

DeepSeek AI’s flagship pair, V3 for general tasks and R1 for reasoning, defined the terms of the 2025 cost efficiency debate. DeepSeek-V3 uses a 671B-parameter Mixture-of-Experts architecture with only ~37B parameters active at inference, combined with Multi-head Latent Attention (MLA) to compress key-value representations, a genuine architectural innovation, not just efficient training. (DeepSeek AI. “DeepSeek-V3 Technical Report.” arXiv, December 2024)

DeepSeek-R1 went further by training its reasoning chain through pure reinforcement learning, skipping supervised fine-tuning for bootstrapping. The result: an open-source reasoning model competitive with OpenAI o1 on AIME 2024 (79.8 vs. 79.2) and MATH-500 (97.3 vs. 96.4), available under an MIT license at roughly $0.55 per million input tokens and $2.19 per million output tokens as of March 2026.4

In late April 2026, DeepSeek shipped V4, splitting the flagship into V4 Pro (1.6T total / 49B active parameters) and V4 Flash (284B total / 13B active). On the Artificial Analysis Intelligence Index, V4 Pro sits second only to Kimi K2.6, returning DeepSeek to the leading edge of open-weights releases. (Artificial Analysis. “DeepSeek is back among the leading open weights models with V4 Pro and V4 Flash.”) V3.2 remains supported for production deployments that haven’t migrated yet, at the same pricing tier.

Use case fit: Best for mathematics, code generation, and any reasoning-intensive task where you need frontier-level capability at open-source prices. The MIT license permits distillation, which has spawned a generation of fine-tuned derivatives.


Qwen: The Broadest Ecosystem

Alibaba’s Qwen series has become the most downloaded open-source model family globally, outpacing Meta’s Llama by download volume in 2025.6 The range is remarkable: Qwen3 covers sizes from 0.5B to 235B, with QwQ-32B as a standalone reasoning model and Qwen2.5-Coder as a specialized code variant.

The Qwen3.5 release in February 2026 introduced a new architecture that fuses linear attention (Gated Delta Networks) with sparse MoE, delivering 8.6x faster decoding than Qwen3-Max at 32K context while claiming the highest instruction-following scores of any model evaluated on IFBench (76.5), and outperforming GPT-5.2 on MathVision benchmarks (88.6 vs. 83.0).7 In April 2026 Alibaba previewed Qwen3.6-Max, with improved agentic coding, stronger world knowledge, and more reliable instruction following. Community evaluations placed it close to DeepSeek V4 and Kimi K2.6 on open-weights leaderboards. (DeepLearning.AI The Batch. “Kimi K2.6 Matches Open Qwen3.6 Max.”)

Most models in the Qwen family carry Apache 2.0 licensing, a substantial advantage over MIT for enterprise legal teams who need clarity on patent clauses. The closed-source Qwen-Max (API only) remains for users who want the best Alibaba has to offer without self-hosting.

Use case fit: The breadth of sizes makes Qwen the practical choice for on-device deployments (0.5B–7B), resource-constrained inference (14B–32B), and production API use. Qwen2.5-Coder 32B outperforms GPT-4o on LiveCodeBench (37.2% vs. 29.2%) and Aider code editing benchmarks (73.7%) (Qwen Blog. “Qwen2.5-Coder: Code the World.” October 2024), making it the strongest open-source code model in this comparison at time of writing.


Kimi: Long Context as a First Principle

Moonshot AI, founded in 2023, built its identity around one idea: context length matters more than most labs acknowledge. Kimi k1.5 (January 2025) demonstrated that scaling reinforcement learning context windows to 128K tokens produces continuous reasoning improvements, without the Monte Carlo Tree Search complexity used by competing approaches. (Moonshot AI. “Kimi k1.5: Scaling Reinforcement Learning with LLMs.” January 2025)

The Kimi K2 architecture escalates this to 1 trillion total parameters (32B active), making it the largest open-source MoE in this comparison by raw parameter count. Kimi K2.5 (January 2026) extended context to 256K tokens and added native multimodal capabilities, with automatic caching reducing effective input costs by up to 75% on long-context workloads.10 On April 20, 2026, Moonshot released Kimi K2.6, a 1T-parameter vision-language model that became the first open-weight system to beat GPT-5.4 (xhigh) on SWE-Bench Pro, with native INT4 quantization, a “preserve thinking” mode, and agent-swarm capabilities. (DeepLearning.AI The Batch. “Kimi K2.6 Matches Open Qwen3.6 Max.”)

Pricing at the API level is context-window-tiered: ~$0.20 per million input tokens for 8K context up to $2.00 per million for 128K. The API is OpenAI SDK-compatible, a drop-in replacement via api.moonshot.ai/v1, accepting USD payment from international developers.

Use case fit: Long-document analysis, research synthesis, and agentic tasks where maintaining coherent context across many tool calls is the bottleneck. With K2.6, Kimi is now competitive on reasoning benchmarks as well as long-context workflows, narrowing the prior gap to DeepSeek-R1.


Doubao: Consumer Scale, Enterprise Pricing

ByteDance’s Doubao is arguably the most powerful model in this comparison by deployment scale, with over 200 million users on the consumer app as of early 2026, and the least accessible to international developers.11

The Doubao Seed 2.0 family (February 14, 2026 launch) includes four variants covering Pro, Lite, Mini, and Code specializations. Doubao Seed 2.0 Pro claims competitive positioning with frontier Western models on AIME 2025 (98.3) and Codeforces (rating 3020), and ranked 6th on the LMSYS Text Arena and 3rd on Vision Arena as of its launch date.12 Doubao-1.5-Pro input pricing sits at approximately $0.11 per million tokens, among the lowest in the comparison.

The access barrier is real. Standard registration requires a Chinese phone number. International enterprise access routes through Volcano Engine (ByteDance’s cloud platform) via negotiated agreements, or through third-party API aggregators.

Use case fit: Businesses operating within ByteDance’s product ecosystem (Douyin, TikTok) or companies with existing Volcano Engine relationships. The multimodal video understanding capabilities are strong (VideoMME scores 89.5), making it competitive for media-adjacent applications. For most international developers, the friction outweighs the price advantage unless there’s a specific ByteDance integration requirement.


Ernie: Heritage Brand, Mixed Execution

Baidu’s ERNIE has the longest pedigree in Chinese NLP: the name stands for Enhanced Representation through kNowledge IntEgration, and early versions in 2019 preceded the modern LLM era. ERNIE 4.5, released March 2025 and open-sourced under Apache 2.0 in July 2025, covers a broad parameter range (0.3B dense to 424B MoE) and achieves MMLU-Pro scores near GPT-4.5 (~78) according to Baidu’s benchmarks.13

ERNIE X1 and X1.1 target the reasoning segment, with Baidu claiming X1 matches DeepSeek-R1 at 50% of the cost, and X1.1 surpassing DeepSeek R1-0528 on key benchmarks. ERNIE 5.0, released in late 2025, expands to 2.4 trillion parameters and natively handles text, images, audio, and video.

The caveat: Baidu’s benchmark claims have historically attracted skepticism, and independent verification of ERNIE’s stated performance has been uneven. ERNIE 4.5’s open-source release is a substantial shift, signaling that Baidu recognizes the open-weight ecosystem as a competitive pressure rather than a curiosity.

Use case fit: Chinese-language applications where Baidu’s search data heritage matters, and workflows that require full multimodal capability including audio and video (ERNIE 5.0). The open-sourced ERNIE 4.5 is accessible and well-documented; the proprietary ERNIE 5.0 is less straightforward for international developers to access.


Benchmark Comparison: Where They Stand

BenchmarkDeepSeek V4 Pro / R1Qwen3.6 / 3.5Kimi K2.6Doubao Seed 2.0 ProERNIE X1.1
MMLU90.8 (R1)n/an/an/an/a
MATH-50097.3 (R1)n/a96.2 (K2.5)n/an/a
AIME 2024/202579.8 (R1)n/a77.5 (K2.5)98.3Competitive
IFBenchn/a76.5 (3.5)n/an/an/a
MathVisionn/a88.6 (3.5)n/an/an/a
Codeforcesn/an/a94th pct. (K2.5)3020 ratingn/a
SWE-Bench Pron/an/abeats GPT-5.4 xhigh (K2.6)n/an/a
LMSYS Text Arena (Feb 2026)n/an/an/a6thn/a
Artificial Analysis Intelligence Index2nd of open-weights (V4 Pro)close to V4 (3.6 Max)1st of open-weights (K2.6)n/an/a

Benchmarks are model-reported or third-party verified where noted. Direct comparisons across labs require caution: evaluation setups differ.


API Pricing at a Glance

ModelInput ($/1M)Output ($/1M)Notes
DeepSeek V4 ProTBC at launchTBC at launch1.6T total / 49B active; April 2026
DeepSeek V4 FlashTBC at launchTBC at launch284B total / 13B active; April 2026
DeepSeek V3.2$0.28$0.42Cached: $0.028
DeepSeek-R1$0.55$2.19Reasoning model
Qwen-Plus$0.40$1.20
Qwen3-Max$0.78$3.90
Qwen3.6-Max-PreviewTBC at GATBC at GAApril 2026 preview
Kimi K2.5 / K2.6$0.60$2.50Auto-caching; K2.6 adds INT4 quant
Doubao-1.5-Pro$0.11$0.275China-accessible; limited intl.
ERNIE X1$0.28$1.10
ERNIE 4.5$0.55$2.20
GPT-4o (reference)~$2.50~$10.00

Choosing the Right Model

The selection decision reduces to three axes: openness (do you need to self-host?), access (are you outside China?), and task type (reasoning, code, long context, or multimodal?).

  • Self-hosting a reasoning model: DeepSeek-R1 (or V4 Pro once it stabilizes) with MIT license. Distill down to the 32B or 70B variant if hardware is constrained.
  • Self-hosting a general model: Qwen3 series with Apache 2.0. The breadth of sizes from 0.5B to 235B is unmatched, and Qwen3.6-Max is the natural upgrade path once GA arrives.
  • Long-context document work via API: Kimi K2.6 at 256K context with automatic caching and INT4 quantization.
  • Cheapest capable API with easy international access: DeepSeek-V3.2 at $0.28/$0.42 per million tokens.
  • Chinese-language production workloads: DeepSeek or ERNIE, both with strong native language performance and open-source options.
  • Multimodal including video: Doubao Seed 2.0 or ERNIE 5.0, with the caveat that international access is limited for both.

Frequently Asked Questions

Q: Is DeepSeek actually the best Chinese AI model? A: As of May 2026, DeepSeek V4 Pro and Kimi K2.6 sit at the top of open-weights leaderboards, with Qwen3.6-Max-Preview close behind. “Best” depends on the task: Kimi K2.6 leads on long-context and now SWE-Bench Pro, Qwen3.5 leads on instruction following, and Doubao Seed 2.0 claims top scores on math (with less independent verification).

Q: Which Chinese models can I use outside China without restrictions? A: DeepSeek and Qwen (via Alibaba Cloud or OpenRouter) offer the most accessible international APIs with USD billing and no residency requirements. Kimi’s API also supports international developers. Doubao requires a Chinese phone number for standard access; ERNIE’s Qianfan platform is primarily China-oriented.

Q: Are Chinese open-source models truly open source? A: DeepSeek-R1/V3 use MIT license (fully permissive, distillation allowed). Qwen3 series and ERNIE 4.5 use Apache 2.0. Kimi K2/K2.5 uses a Modified MIT license. Doubao’s models carry no open-source license — weights are not available. Always verify the specific model version, as licensing can differ within a model family.

Q: Should enterprise teams be concerned about data privacy with Chinese AI APIs? A: Yes, with nuance. The concern applies to all API providers — data transmitted to any external API is subject to the provider’s data policies and applicable law. For Chinese providers, this includes China’s data security laws. DeepSeek had a documented 2025 incident involving exposed chat logs. Enterprise users should review data processing agreements, consider on-premises deployment of open-weight models where possible, and avoid sending sensitive data to consumer-facing apps regardless of provider.

Q: How quickly are Chinese models closing the gap with GPT and Claude? A: Analysis from Epoch AI and MIT Technology Review indicates Chinese frontier models, on average, trailed leading U.S. releases by approximately seven months as of 2025, down from a significantly wider gap in 2023. By Q2 2026, Kimi K2.6 became the first open-weights model to beat GPT-5.4 (xhigh) on SWE-Bench Pro, and DeepSeek V4 Pro sits second on the Artificial Analysis Intelligence Index among open-weights releases. In specific domains (math reasoning, code, instruction following), Chinese models have reached or exceeded Western benchmarks at lower cost. The gap on multimodal and reasoning tasks at the very frontier remains, but it is narrowing faster than most Western observers anticipated.


  1. DeepSeek AI. "DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning." arXiv, January 2025 primary accessed 2026-04-24
  2. DeepSeek AI. "DeepSeek-V3 Technical Report." arXiv, December 2024 primary accessed 2026-04-24
  3. Qwen Blog. "Qwen2.5-Coder: Code the World." October 2024 community accessed 2026-04-24
  4. Moonshot AI. "Kimi k1.5: Scaling Reinforcement Learning with LLMs." January 2025 community accessed 2026-04-24
  5. DeepLearning.AI The Batch. "Kimi K2.6 Matches Open Qwen3.6 Max and DeepSeek V4." analysis accessed 2026-05-17
  6. Artificial Analysis. "DeepSeek is back among the leading open weights models with V4 Pro and V4 Flash." analysis accessed 2026-05-17