GLM-5.2 vs Kimi K2.7 Code: Two Open-Weight Bets on Agentic Coding

Moonshot shipped Kimi K2.7 Code on June 12, 2026. Zhipu opened GLM-5.2 to its Coding Plan the next day.⁸¹ Two Chinese labs, two open-weight models pointed at the same job (autonomous coding agents), released inside the same 24 hours, in the same week the US Commerce Department ordered Anthropic to disable worldwide access to its top two models, Claude Fable 5 and Mythos 5.¹⁴ The timing is not subtle, and neither lab pretended otherwise: Zhipu founder Jie Tang called the restriction “deeply regrettable” days before GLM-5.2 went live.

The two models look like twins on paper and turn out to be opposites. GLM-5.2 leads with a benchmark sheet, a million-token context window, and the top spot on every open-weight leaderboard that ranks it. Kimi K2.7 Code leads with a lower price and an efficiency claim, ships a benchmark table it mostly loses, and scores below its own predecessor on general intelligence. These are opposite bets on what an open-weight coding model should compete on. Choosing between them has almost nothing to do with the few benchmark points separating their generations, and almost everything to do with whether you are buying capability or buying a cheaper meter.

Two labs, two bets, one week

GLM-5.2 comes from Zhipu AI (operating internationally as Z.ai), a 2019 Tsinghua University spin-off that became the first Chinese AI lab to go public, listing in Hong Kong in January 2026 before a volatile run carried its market cap to a record near $112 billion in late May.¹⁵ It is the fourth GLM-5 release in roughly four months, and Zhipu positions it as its strongest open model to date.¹ The pitch is capability plus reach: a 753B-parameter mixture-of-experts model with a 1M-token window, openly released weights, and an endpoint that speaks the Anthropic Messages API so it drops into Claude Code-style harnesses behind a base-URL swap.²³

Kimi K2.7 Code comes from Moonshot AI, the Beijing lab valued near $20 billion in a Meituan-led round in May and reportedly raising again at a higher mark.¹⁹ It is a coding-specialized derivative of the K2.7 architecture (there is no general-purpose K2.7 instruct model; the coding build is the release), carrying forward the trillion-parameter MoE design Moonshot has run since K2.⁷⁹ The pitch is economics: openly released weights under a Modified MIT license, an API priced below GLM-5.2 on every axis, and a claim that the model spends about 30% fewer reasoning tokens than its predecessor per accepted change.⁸ What it does not lead with is a capability win. Across the six benchmarks Moonshot ran itself, K2.7 Code trails GPT-5.5 and Claude Opus 4.8 in 11 of 12 head-to-head cells, the subject of our breakdown of the K2.7 Code launch.⁹

The strategic backdrop is the same for both. Chinese-origin models went from under 2% of OpenRouter token volume in 2024 to a majority share in early 2026, settling around 45% by June, though usage tracks price more than quality and the revenue split still favors Western labs.¹⁶ Open weights route around the export restrictions that bind closed models. When Washington can switch off Fable 5 worldwide but cannot switch off a file on HuggingFace, the release calendar starts to look like policy. That is the stakes layer under what is otherwise a straightforward question: which of these two should you actually run.

Here is the shape of the decision before the detail.

	GLM-5.2 (Z.ai / Zhipu)	Kimi K2.7 Code (Moonshot)
Released	June 13, 2026	June 12, 2026
Total / active params	753B / ~40B ²	1T / 32B ⁷
Experts	256 routed, 8+1 active ⁵	384 routed, 8+1 active, 61 layers ⁷
Context window	1,000,000 ²	256K ⁷
Vision input	No (text-only) ³	Yes (400M MoonViT) ⁹
License	MIT (HF) / Apache-2.0 (GitHub) ²¹	Modified MIT ⁷
Intelligence Index (independent)	51 (#1 open-weights) ⁴	42 ⁶
API price, in / out per 1M	$1.40 / $4.40 ⁵	$0.95 / $4.00 ⁶
Output speed (independent)	~82.7 tok/s ⁵	~57.1 tok/s ⁶

The architecture: 753B sparse against a trillion-parameter MoE

Both models are sparse mixtures of experts, and in both cases the headline parameter count is the least useful number on the page. What sets inference cost is the active-parameter slice, not the total.

GLM-5.2 carries 753B total parameters and activates roughly 40B per token across 256 routed experts plus a shared one.²⁵ Kimi K2.7 Code is the larger model on paper at a trillion parameters, but activates only 32B per token across 384 experts (8 routed plus 1 shared) over 61 layers.⁷ So the trillion-parameter model is the cheaper one to run per token, by a wide margin: 32B active against 40B. Raw size is a memory-footprint problem, not a compute-per-token problem, and the two models invert the ranking depending on which you care about.

The decoding stacks differ in ways that matter for long-running agents. GLM-5.2 uses what Zhipu calls IndexShare sparse attention, reusing a single token-selection indexer across every four sparse-attention layers to cut per-token FLOPs by 2.9x at 1M context, plus a multi-token-prediction layer for speculative decoding.² IndexShare is what makes the million-token window usable rather than nominal, and our benchmark explainer walks through the mechanism. Kimi K2.7 Code carries the K2 line’s Multi-head Latent Attention (the same KV-compression family DeepSeek popularized), native INT4 weights, and a 400M-parameter MoonViT vision encoder.⁷⁹

That vision encoder is a real asymmetry. GLM-5.2 is text-only; the vision work in Zhipu’s lineup lives in a separate, closed-weights GLM-5V model.³ Kimi K2.7 Code can read a screenshot, a UI mockup, or a rendered error directly, which removes a manual transcription step for tasks like reproducing a design or debugging a layout. Moonshot’s benchmarks do not isolate vision-grounded coding performance, so the capability is real but unmeasured. If your workflow never touches an image, it is also irrelevant.

Context and speed: a million tokens, and which model is actually faster

GLM-5.2 exposes a 1,000,000-token input window with up to 131,072 output tokens, a 5x jump over GLM-5.1’s 200K.²³ Kimi K2.7 Code holds the K2 line’s 256K window, paired with aggressive prompt caching that drops the cost of repeated context.⁶⁷

The two approaches answer different questions. A 1M window lets you load an entire repository and its dependency surface in one pass and skip the retrieve-then-reason loop most coding agents run. Kimi’s bet is that you rarely need a million tokens live, and that caching the 256K you do reuse is cheaper than paying attention compute over a window four times larger.

Speed cuts the other way from what the parameter counts suggest. On Artificial Analysis’s independent measurements, GLM-5.2 is the faster model: about 82.7 tokens per second with a 1.51-second time-to-first-token, against Kimi K2.7 Code’s 57.1 tokens per second and 2.22 seconds.⁵⁶ Moonshot has announced a high-speed Kimi variant claiming 180 tokens per second and bursts past 260, but that figure is vendor-stated and not independently confirmed.⁸ On the numbers anyone outside the two labs can check, GLM-5.2 is faster, holds more context, and starts responding sooner. Those are three points in its column before a single benchmark is argued.

The benchmark problem, and the one number that escapes it

Here is the honest difficulty in any GLM-5.2-versus-Kimi comparison: the two vendors published benchmark tables you cannot merge. GLM-5.2 reported public benchmarks (SWE-bench Pro, Terminal-Bench 2.1, GPQA, AIME, HLE). Kimi K2.7 Code reported only Moonshot’s own proprietary suites (Kimi Code Bench v2, Program Bench, MCP Mark, Kimi Claw 24/7). There is no public benchmark both models ran.

Except one. The Artificial Analysis Intelligence Index is a third-party composite that ranks both, and there the gap is not close.

Model	Intelligence Index	Note
Claude Fable 5	60	closed, restricted
Claude Opus 4.8	56	closed
GPT-5.5	55	closed
GLM-5.2	51	#1 open-weights, #4 overall ⁴
DeepSeek V4 Pro	44	open
MiniMax-M3	44	open
Kimi K2.6	43	open, predecessor
Kimi K2.7 Code	42	open ⁶

On the only common independent scale, GLM-5.2 (51) is the leading open-weights model on the board, and Kimi K2.7 Code (42) sits not just below it but below Kimi’s own previous generation, K2.6 at 43.⁴⁶ That last detail is the tell. K2.7 Code is not a general-capability upgrade; it is a coding-and-efficiency release that trades general intelligence for cheaper, faster-converging agent loops. Read that way, the lower Index score is a design choice, not a regression. It also means a buyer who treats K2.7 Code as a smarter Kimi is misreading the release.

The vendor coding numbers, taken on their own terms, broadly agree with the Index. GLM-5.2 self-reports 62.1% on SWE-bench Pro, up from GLM-5.1’s 58.4%, and notably ahead of GPT-5.5’s 58.6% on the same benchmark, the same GPT-5.5 that beat Kimi K2.7 Code across nearly every cell of Moonshot’s table.¹⁹ Kimi K2.7 Code published no SWE-bench score at all. Figures circulating on aggregator sites (78.2%, 60.4%) trace to no primary source, contradict each other, and the independent evaluator Vals.ai lists the model’s SWE-bench run as not completed, so treat any SWE-bench number attached to K2.7 Code as unverified.⁹ Kimi’s one published win is MCP Mark Verified at 81.1, ahead of Opus 4.8’s 76.4 but behind GPT-5.5’s 92.9.⁸ MCP Mark scores Model Context Protocol tool-server integration, which is closer to real agent work than single-shot generation, so it is a genuine win in the one place it lands.

Benchmark	GLM-5.2	Kimi K2.7 Code	Reference: Opus 4.8
SWE-bench Pro	62.1% ¹	not published	69.2% ¹³
Terminal-Bench 2.1	81.0 ¹	not published	74.6 per Anthropic ¹³ / 85.0 per Zhipu ¹
MCP Mark Verified	not published	81.1 ⁸	76.4 ⁸
AIME 2026	99.2 ¹	not published	95.7 ¹
HLE (with tools)	54.7 ¹	not published	57.9 ¹³

Two caveats travel out of that table. The Terminal-Bench row shows why vendor numbers don’t merge: Anthropic’s own Opus 4.8 announcement reported 74.6, while Zhipu’s GLM-5.2 table lists Opus at 85.0 on the same-named benchmark, a 10-point spread on one model’s score that most likely reflects different harnesses or eval dates.¹³¹ And GLM-5.2’s 99.2% AIME is a near-ceiling math score with the highest contamination exposure in its suite; competition math saturates training crawls, so it measures recall as much as reasoning. Its 40.5% raw HLE (54.7 with tools) is the more honest signal, because that benchmark is built to resist saturation.

Price: cheaper tokens against a faster, smarter, pricier model

This is where Kimi’s strategy pays off, and it is the cleanest win either model has.

Both models are metered per token, and Kimi K2.7 Code is cheaper on every axis: $0.95 per million input tokens against GLM-5.2’s $1.40, $0.19 cached against $0.26, and $4.00 output against $4.40.⁵⁶ Artificial Analysis blends those to roughly $0.70 per million for Kimi against $0.90 for GLM-5.2. Then the efficiency claim compounds it: if K2.7 Code really spends about 30% fewer reasoning tokens per accepted change, its effective cost-per-task advantage is wider than the per-token spread alone.⁸ That claim cuts directly at GLM-5.2’s known weakness. Independent reviewers flag GLM-5.2 as verbose, around 43,000 output tokens per task against 26,000 for GLM-5.1, the kind of generation bloat that turns a per-token edge into a per-task deficit.⁴

The subscription tiers are a near-tie at the entry point, and the two have been lined up directly by third-party reviewers since launch.¹⁷ Zhipu’s GLM Coding Plan starts at $18/month for the Lite tier (roughly 400 prompts a week), with Pro and Max above it, and GLM-5.2 included on every tier.¹⁰ Moonshot’s Kimi Code plans start at $19/month and climb to a $199 tier that runs hundreds of parallel subagents.⁸ One wrinkle favors Kimi on heavy subscription use: GLM-5.2 consumes 3x quota at peak hours and 2x off-peak against the plan, so the flat plan is less flat than it looks for an always-on agent.¹⁰ And the self-host path zeroes the per-token line for either model, moving the bill to hardware.

The summary is uncomfortable for GLM-5.2 on this axis and only this axis: it is the more expensive model to run, per token and per task, and it is competing on a benchmark lead and a context window, not on price. Kimi K2.7 Code is the budget instrument. For a coding agent that loops hundreds of times per task, that is the number that shows up on the invoice.

Open weights, two licenses, two hardware bills

Both models ship openly, and the licenses are close but not identical. GLM-5.2 is the more permissive on paper, with one genuine inconsistency: its HuggingFace model card declares MIT, while the LICENSE file in Zhipu’s GitHub repository (which does cover GLM-5.2) reads Apache-2.0.²¹ Both are permissive and neither carries regional restrictions, but a legal team should note the ambiguity rather than assume MIT. Kimi K2.7 Code is Modified MIT: standard MIT behavior until a product crosses 100 million monthly active users or US$20 million per month in revenue, at which point it must display “Kimi K2” in its interface.⁷ For all but the largest products that clause never fires.

The hardware bill inverts the licensing story. GLM-5.2 is the smaller model but still 753B parameters, around 744GB to 753GB in FP8 and 1.51TB in BF16. Kimi K2.7 Code is the larger model at a trillion parameters, near 1TB in its native low precision, with the MoE memory penalty of holding every expert resident even though only 32B fire per token.⁹ Neither is a single-card proposition; both want an 8-way H200-class node for full-precision serving, or a community GGUF quant (down to roughly 217GB for GLM-5.2, 304GB for Kimi) to fit smaller rigs. GLM-5.2 is marginally the easier of the two to self-host, on footprint and on the lighter quant rungs; Kimi’s edge is shipping an official INT4 build rather than leaning on community quantization.

One number resists the “GLM-5.2 leads everything” reading: downloads. In its first month Kimi K2.7 Code pulled roughly 318,000 HuggingFace downloads against GLM-5.2’s 19,700.⁶⁵ A single coding SKU and a day’s head start inflate that, and downloads measure curiosity rather than quality, but the gap is wide enough to suggest Kimi’s price-and-efficiency pitch is landing with the self-host crowd even as GLM-5.2 wins the leaderboards. GLM-5.2 leads on HuggingFace likes, for whatever that is worth.

Agentic coding: the near-tie that splits by use case

Strip away the leaderboards and the question becomes narrower: which one writes and ships better code inside an agent loop. Here the independent evidence is thinner, all of it a week old, and it lands closer to a tie than the Intelligence Index does.

Integration is a near-wash. GLM-5.2 speaks the Anthropic Messages API and lists eight validated agent harnesses including Claude Code, Cline, OpenCode, Roo Code, and Goose; a base-URL and model-name swap usually suffices.³ Kimi K2.7 Code offers both OpenAI- and Anthropic-compatible endpoints, broader surface, but the OpenAI path carries two real gotchas: multi-turn tool use requires replaying the prior reasoning_content block or the tool loop breaks, and tool_choice accepts only auto or none.¹⁸ Both need adapter awareness despite the compatible-looking API. Reasoning control diverges sharply: GLM-5.2 exposes High and Max effort presets, while Kimi K2.7 Code forces thinking on through preserve_thinking with no instant mode, so a one-line rename pays full reasoning overhead it cannot opt out of.³⁸ For hard tasks that is a feature; for trivial high-volume edits it is the efficiency pitch arguing with itself.

The week-one hands-on comparisons split predictably:

Kilo Code scored the two across planning and building and gave the nod to GLM-5.2 in both phases (planning 9.0 to Kimi’s 8.1, building 15/15 to 14/15), noting GLM-5.2’s plan landed within 0.1 of Claude Fable 5 at roughly a tenth of the cost.¹¹
AkitaOnRails ran a full Rails 8 build and scored it almost even: GLM-5.2 at 87/100 with the cleanest dependency injection in the field but the slowest run at 43 minutes, and Kimi K2.7 Code at 86/100 with a verified end-to-end result in 22 minutes for about $0.30, marred by a regression where it drops the system prompt via with_instructions.¹² The same reviewer clocked GLM-5.2’s jump from GLM-5.1’s 46/100, a generational leap that tracks the Intelligence Index story.
Across the cluster of reviews the pattern is consistent: Kimi K2.7 Code is the more autonomous agent on long-horizon and bug-fix work and the cheaper one to run; GLM-5.2 is stronger on planning, one-shot generation, and frontend and visual output, where it tops the Design Arena among open models. Both still cede the hardest tasks to Opus 4.8 and GPT-5.5.

The tool-reliability heritage favors GLM, for what predecessor data is worth: GLM-4.5 reported a 90.6% tool-call success rate against the original Kimi K2’s 86.2%, the long-running pattern of GLM holding tool loops together. Whether that survives into this generation is exactly the kind of thing a week of reviews cannot settle. For teams already on Claude Code or Cline, our GLM-5.2 migration checklist covers the harness specifics.

Both releases rest on the same shaky foundation: vendor-reported tables, no independent replication of the coding numbers, and a week of impressions standing in for consensus. Every GLM-5.2 coding benchmark is Zhipu self-reported, including the Opus 4.8 comparator. Every Kimi K2.7 Code cell was run by Moonshot, two of its six benchmarks are in-house suites, and the efficiency claim is self-measured. The Artificial Analysis Index is the one independent anchor, and it tempers the picture rather than settling it.

Each model also carries a specific risk a benchmark sheet hides. GLM-5.2’s predecessor, GLM-5.1, hit a degradation event in June 2026 that exposed a roughly 50% bad-output rate under production load, and whether the 5.2 architecture resolves it is unestablished; the release-day analysis in Zhipu ships GLM-5.2 with zero benchmarks flags the gap between availability and trust, and the self-hosting cost reality is steeper than the MIT label implies. Kimi K2.7 Code’s risk is quieter: forced thinking means its cost advantage is real on hard tasks and can evaporate on trivial ones, the documented with_instructions regression is the kind of integration footgun that only surfaces in production, and some practitioners report its headline 30% token saving does not reproduce on real repositories.²⁰

Neither launch hands a procurement team a number it can defer to. The comparison work has moved off the leaderboard and onto your repository.

How to choose

The decision resolves to a few axes, and on most of them the two models answer different questions rather than beat each other.

General capability, speed, and long context: GLM-5.2. It is the leading open-weights model on the one independent index, the faster model on independent measurement, and the only one of the two with a million-token window. If you want the strongest open coding model and can absorb its higher per-token cost, this is it.
Cost per task at volume: Kimi K2.7 Code. Cheaper on every per-token axis and more token-efficient per accepted change, it is the budget instrument for agents that loop hundreds of times, provided you run it against real pull requests first.
Autonomous long-horizon and bug-fix loops: Kimi K2.7 Code, on the week-one hands-on evidence. Planning, frontend, and one-shot generation: GLM-5.2.
Vision in the coding loop: Kimi K2.7 Code, via MoonViT. GLM-5.2 is text-only.
Reasoning-budget control: GLM-5.2, with explicit effort presets. Kimi forces thinking on.

The broader read is that this is no longer a story about whether Chinese open-weight models are competitive. GLM-5.2 holding the top open-weights slot and trading blows with closed frontier models on developer-relevant tasks, the same week the US restricted those closed models, is the story. For the full field these two sit in, including DeepSeek, Qwen, and MiniMax, our map of the Chinese model ecosystem puts them in context, and the task-level coding benchmarks collect the numbers against GPT and Claude.

The answer neither vendor prints on a slide is the same one that closed every comparison this year: run both on your own codebase and measure cost per merged pull request, not benchmark cells. GLM-5.2 gives you the stronger model and the bigger bill. Kimi K2.7 Code gives you the cheaper meter and the more autonomous loop. Both stopped handing you a leaderboard to hide behind, which is not a relief from evaluation. It is a transfer of the work to you.

Frequently Asked Questions

Is GLM-5.2 or Kimi K2.7 Code better overall?

On the only independent benchmark that ranks both, the Artificial Analysis Intelligence Index, GLM-5.2 scores 51 (the leading open-weights model) against Kimi K2.7 Code’s 42, which sits below even Kimi’s own predecessor K2.6 at 43.⁴⁶ GLM-5.2 is also faster on independent measurement and holds a far larger context window. GLM-5.2’s weakness is cost: Kimi K2.7 Code is cheaper per token and more token-efficient, and the two are closer to a tie on real coding tasks than the Index suggests. “Better” depends on whether you are buying capability or buying a cheaper meter.

Which is cheaper for a high-volume coding agent?

Kimi K2.7 Code, clearly. It undercuts GLM-5.2 on every per-token axis ($0.95 vs $1.40 input, $4.00 vs $4.40 output, $0.19 vs $0.26 cached) and claims roughly 30% fewer reasoning tokens per accepted change, which widens the effective gap.⁵⁶⁸ GLM-5.2 is the more verbose model (around 43,000 output tokens per task), so its per-task cost runs higher than the per-token spread alone. Subscription entry points are near-tied at $18 (GLM Coding Plan) and $19 (Kimi Code), but GLM-5.2 burns 2x to 3x plan quota depending on time of day.¹⁰

Can I run GLM-5.2 or Kimi K2.7 Code on my own hardware?

Both ship open weights, but neither is a single-GPU job. GLM-5.2’s FP8 weights are roughly 744GB to 753GB for 753B parameters; Kimi K2.7 Code’s trillion-parameter MoE is near 1TB in its native INT4 precision, because every expert stays resident even though only 32B activate per token.⁹ Both run on vLLM, SGLang, and KTransformers, and both have community GGUF quants (down to roughly 217GB for GLM-5.2 and 304GB for Kimi) for smaller rigs. Full-precision serving wants an 8-way H200-class node either way.

Do both models work with Claude Code and Cline?

Yes, with different friction. GLM-5.2 exposes an Anthropic Messages API-compatible endpoint and lists eight validated harnesses including Claude Code, Cline, OpenCode, Roo Code, and Goose; a base-URL and model-name swap usually suffices.³ Kimi K2.7 Code offers OpenAI- and Anthropic-compatible endpoints, but its OpenAI path requires replaying reasoning_content across turns and restricts tool_choice to auto or none, so some frameworks need adapter code.¹⁸

Is there an independent benchmark comparing GLM-5.2 and Kimi K2.7 Code directly?

Partly. The two labs never ran the same coding benchmarks, so their published tables cannot be merged. The one shared independent measurement is the Artificial Analysis Intelligence Index (GLM-5.2 at 51, Kimi K2.7 Code at 42), plus a handful of week-one hands-on reviews (Kilo Code, AkitaOnRails) that score real coding tasks and land close to a tie that splits by use case.⁴⁶¹¹¹² Any merged coding-benchmark table you see elsewhere is stitching together numbers the vendors measured in different harnesses, and at least one widely circulated Kimi SWE-bench figure is refuted by independent evaluators.