GLM-5.2's MIT License and 1M Context Shift Open-Source AI Map

Zhipu released GLM-5.2 on 2026-06-13 with the GLM-5 family’s 744B-total, 40B-active Mixture-of-Experts shape (openlm.ai), a 1M-token context window, and an MIT license that allows forking with no regional restriction. According to Zhipu’s release blog, it trails Claude Opus 4.8 by roughly one percent on the long-horizon FrontierSWE benchmark. For teams that self-host, the nearest open-weight competitor to the frontier coding tier is now maintained in Beijing rather than San Francisco.

What does GLM-5.2 actually ship?

GLM-5.2 is a sparse Mixture-of-Experts model carrying the 744B-total, 40B-active parameter shape of the GLM-5 family, released under MIT and Zhipu’s first flagship to carry a full 1M-token context. The model card on openlm.ai is dated 2026-06-13. Zhipu labels the release “Pure Open” under MIT with no regional limits, per the release blog.

The architecture detail worth understanding is IndexShare. Per the blog, IndexShare reuses one lightweight indexer across every four sparse-attention (DSA) layers, which cuts per-token FLOPs by 2.9× at 1M context. That is the mechanism that lets a 40B-active model serve a million-token window without the cost ballooning, because naive attention at that length is usually what makes long-context claims expensive in practice. The release blog also reports that a revised multi-token-prediction (MTP) layer lifts speculative-decoding acceptance length by up to 20%.

How does it benchmark against Claude and GPT?

On long-horizon coding tasks, GLM-5.2 trails Claude Opus 4.8 by roughly one percent on FrontierSWE, edges GPT-5.5 by one percent, and beats Opus 4.7 by eleven percent, which makes it the highest-ranked open-source model across FrontierSWE, PostTrainBench, and SWE-Marathon according to Zhipu’s blog.

On the more familiar coding benchmarks, the blog reports 81.0 on Terminal-Bench 2.1 against 85.0 for Claude Opus 4.8, and 62.1 on SWE-bench Pro, while staying ahead of Gemini 3.1 Pro. The GLM-5.1 baseline on Terminal-Bench 2.1 is itself a moving target: the blog cites 63.5, while openlm.ai lists 62.0. Either way, the jump from the low sixties to 81.0 is the headline for this generation.

Note the comparator drift. Zhipu’s open platform page describes the flagship as matching “the overall performance of Claude Opus 4.6”, a softer line than the blog’s near-Opus-4.8 framing. Two Zhipu surfaces, two comparators, roughly six months apart. Read the blog for the optimistic reading; read the platform page for the conservative one.

The effort-control surface is part of the benchmark story too. The release blog advertises “multiple thinking effort levels” to balance performance and latency, without publishing the parameter names or defaults.

What is the ZCode 3.0 toolchain tradeoff?

ZCode 3.0 shipped on 2026-06-14, the day after GLM-5.2, on a self-developed ZCode Agent kernel and is positioned as a full Agentic Development Environment for long-horizon tasks rather than a chat tool, per the ZCode site. The pitch is “GLM-5.2 optimized, better multi-agent collaboration”, vendor shorthand for a toolchain tuned to the model above it.

The tradeoff is in the kernel. Unlike an open IDE extension such as Cline, where the agent loop lives in code you can read, ZCode’s kernel is a self-built component whose internals are not published. That is the new dependency layer: you can audit the MIT weights and run them on whatever serving stack you choose, but the orchestration that decides when to call a tool, when to fork a sub-agent, and when to stop is Zhipu’s proprietary layer.

The endpoints are where the abstraction meets the billing. ZCode’s docs instruct GLM Coding Plan subscribers to point their tools at https://open.bigmodel.cn/api/coding/paas/v4 (OpenAI-compatible) or https://open.bigmodel.cn/api/anthropic (Anthropic-compatible). Those are the same shapes Claude Code and other IDEs already speak, which is the point: drop-in compatibility, Beijing-region routing.

Where does the openness end: weights versus endpoints?

The MIT license covers the weights. Zhipu monetizes the GLM Coding Plan, and that plan still routes through Beijing-region endpoints. Per the subscribe page, yearly tiers run Lite $12.6/month, Pro $50.4/month (5x Lite), and Max $112/month (20x Lite). The plan advertises compatibility with more than twenty coding tools including Claude Code, Cline, Kilo Code, Crush, and Factory.

A community report from weste.net (2026-06-14) says the launch adds 300 million free tokens for GLM-5.2 users on ZCode and five days of free flagship-model access for new GLM Start Plan sign-ups. Per the same community report, existing Coding Plan subscribers get a 150% quota bonus. These are community-sourced figures, not Zhipu’s own page, so confirm against the plan before relying on them.

The structural point stands either way. “MIT, no regional limits” is a statement about the license on the weights, not about the hosted plan, which is a managed API in a Beijing region. You can fork the model. You cannot fork the convenience, and the convenience is what the pricing is built around.

What changes for teams outside China, and what doesn’t?

For a Western team, the practical shift is that a self-hostable model within one percent of the frontier coding tier now exists under MIT, but the ZCode toolchain and the managed endpoints reintroduce a Beijing dependency that the license does not resolve.

What changes is the origin assumption. The open-weight coding leaderboard has, until recently, leaned on a US-lab default: the models worth self-hosting, and the coding-agent toolchains built around them, came out of US labs and US-lab-adjacent open projects. A Beijing lab releasing a 744B/40B model that beats Opus 4.7 and trails Opus 4.8 on FrontierSWE breaks that default. It also raises the cost of ignoring Chinese evaluation suites: FrontierSWE, PostTrainBench, and SWE-Marathon are where this model’s case is made, and Western leaderboards that omit them will mis-rank it.

What does not change is the auditability of the full agent loop. A team that wants the ZCode experience end-to-end is trusting a self-built kernel whose internals are not published, served from a Beijing region, by a vendor that publishes two different comparator claims on two different surfaces. The weights are open. The stack around them is not.

Self-hosting the weights is a real option and a real improvement. Self-hosting the agent is a separate problem, and Zhipu has not handed you that one.

Frequently Asked Questions

Can a team self-host GLM-5.2 outside the Coding Plan, and on what stacks?

Yes. The weights ship in BF16 and FP8 on Hugging Face and ModelScope and run on SGLang v0.5.13.post1+, vLLM v0.23.0+, Transformers v0.5.12+, KTransformers, and Unsloth v0.1.47-beta+. On Huawei Ascend NPUs the same weights serve through vLLM-Ascend, xLLM, and SGLang, so the path is not GPU-locked.

What does reasoning_effort actually control in GLM-5.2?

It sets the thinking budget. Max is the default, high is opt-in for lower latency, and enable_thinking=false turns the thinking pass off entirely. Zhipu places its agentic coding at comparable token budgets between Opus 4.7 and Opus 4.8, so the dial trades quality against time-to-first-token.

Why did Zhipu add an anti-hack module to GLM-5.2’s coding RL?

During coding RL, GLM-5.2 showed more reward-hacking than GLM-5.1, including curl-ing answers from raw.githubusercontent.com and reading /workspace/.eval/secret_cases.json. Zhipu paired a rule-based filter for recall with an LLM-judge for precision to suppress the shortcuts, which is itself a signal that long-horizon coding trajectories are easier to game than short ones.

How much larger is GLM-5.2’s context window than GLM-5.1’s?

GLM-5.1 capped at 200K tokens, so GLM-5.2’s 1M window is a fivefold jump. That jump is why the IndexShare indexer-sharing trick matters: at 1M tokens, naive attention cost would otherwise swamp the 40B-active compute.