Zhipu opened GLM-5.2 to all GLM Coding Plan tiers on June 13, 2026, shipping a one-million-token context window under a promised MIT license and an Anthropic-compatible API endpoint. The open-weights release is the substance. The “free API” in the headline is not: hosted access requires a paid plan, and Zhipu published no benchmark scores at launch.
What shipped on June 13, and what is still only promised
GLM-5.2 went live across every GLM Coding Plan tier (Lite, Pro, Max, and Team) on June 13, with Zhipu calling it its strongest open-source model to date. The model ships as glm-5.2[1m], exposing a 1,000,000-token input window and up to 131,072 output tokens alongside two thinking-effort presets, High and Max, aimed at long, multi-step coding tasks.
Two pieces are still promised rather than shipped. The standalone API is slated for “the following week” from the June 13 launch, and the MIT-licensed open weights are due the same week. As of June 16, the weights have not appeared in public. Zhipu’s Hong Kong-listed entity (02513.HK) confirmed the 1M context and the MIT intent, framing the open release as a way to grow open-platform and API call volume.
This is the third major iteration of the GLM-5 line in roughly four months. GLM-5 shipped February 11, GLM-5-Turbo March 15, and GLM-5.1 April 7, per a compiled release timeline, a cadence Zhipu accelerated after its January 2026 Hong Kong IPO. The “three releases in three months” shorthand floating around early coverage understates it; the line spans February through June, and counting the Turbo variant makes it four.
What does a 1M-token context and an Anthropic-compatible endpoint actually buy?
GLM-5.2 speaks the Anthropic Messages API, so it drops into Claude Code-style harnesses behind a base-URL and model-name swap rather than a real integration. The practical hook is MODEL_NAME=glm-5.2[1m]: a config change, not an SDK migration.
For teams whose context budget today forces retrieval round-trips, a one-million-token window lets you load an entire repository and its dependency surface in one pass, collapsing the retrieve-then-reason loop that most coding agents run. That is the capacity claim, and it is the part Zhipu can credibly put on the spec sheet.
Context window is a capacity claim, not a quality claim. A model that ingests a million tokens is only useful if its attention holds across them, and that is precisely the thing Zhipu did not demonstrate.
Where are the benchmarks, and is the architecture even confirmed?
Zhipu published no benchmark scores at the GLM-5.2 launch: no SWE-bench, no human-eval proxy, no eval-leaderboard placement. That is an unusual omission for a “strongest open-source model to date” positioning. The closest available reference is the predecessor: GLM-5.1 scored 58.4 on SWE-bench Pro and 1530 Elo on Code Arena, ranking third globally at the time, but both figures are third-party rather than vendor-confirmed.
The architecture is equally unverified. Community posts describe a 744B-parameter mixture-of-experts model with roughly 40B active parameters per token, DeepSeek-style sparse attention, and a 28.5T-token training run. Zhipu confirmed none of these figures at launch.
What does GLM-5.2 actually cost?
Only the forthcoming MIT-licensed weights are free. Hosted access requires a GLM Coding Plan, a flat subscription with the Lite tier from $18 a month and higher flat-fee tiers above it (Pro, Max, Team).
This is where the “free API” framing breaks. MIT licenses the weights you can run on your own hardware for nothing beyond the hardware itself; Zhipu’s hosted endpoint is a paid, flat-rate subscription. The self-host path only wins if you already operate inference infrastructure and can amortize it, which most teams do not.
Why the MIT timing matters for closed-model pricing
GLM-5.2 launched the same day Anthropic’s Claude Fable 5 was pulled under US export controls, and Chinese commentary has framed the MIT release as a hedge: open weight distribution routes around export restrictions that bind closed models. That framing rests on secondary reporting, so read it as commentary rather than policy analysis.
The more durable effect is economic. A frontier-adjacent model under MIT with a million-token window changes what Western labs can defensibly charge for. If capability, rather than availability, becomes the only premium closed inference can claim, the per-token pricing of models like Anthropic’s Opus line and OpenAI’s frontier tiers has to be justified on raw quality alone. GLM-5.2’s own quality is unproven, so the pressure today is reputational and tomorrow it is contractual.
The context-window comparison is where the sourcing is thinnest. One opinion column places GLM-5.2’s 1M window against Meta’s Llama 4.5 at 512K, Alibaba’s Qwen3.5 at 256K, and Google’s Gemini 2.0 Ultra at 2M. Those figures come from a single low-confidence source and should be checked against each vendor before they enter a procurement decision.
What should a team test before migrating?
Run a controlled proof-of-concept before swapping Claude Code or any production agent loop onto GLM-5.2, because no vendor benchmark exists to de-risk the move for you. The Anthropic-compatible endpoint makes the swap mechanically cheap; the validation is not.
Three experiments worth running:
- Whole-repo refactor. Load the full codebase into the 1M window and attempt a cross-cutting change. Measure whether the model holds intent across the context or drifts partway through.
- Agent-loop stability. Run a multi-hour autonomous task and count context-loss incidents, tool-call errors, and silent failures.
- Large-document recall. Feed a long spec or API reference and probe recall at the end of the document, where attention is weakest.
Until Zhipu ships benchmarks alongside the weights, every team adopting GLM-5.2 is running its own eval suite. The MIT license and the million-token window lower the barrier to trying. They do not lower the barrier to trusting.
Frequently Asked Questions
How does GLM-5.2’s MIT license compare to rival open coding models?
MIT is more permissive than every named rival. Qwen3.5 ships under Apache 2.0, which carries patent-retaliation clauses; Llama 4.5 uses a custom commercial license that restricts competitors and caps deployment size; and Gemini 2.0 Ultra ships no weights at all. For enterprise legal review, MIT is the cleanest of the group to clear.
What hardware does self-hosting the GLM-5.2 weights actually require?
Mixture-of-experts models hold every expert in memory even when only a fraction fires per token. The unconfirmed 744B-parameter community spec would need enough GPU memory to load the full parameter set, not just the roughly 40B active slice, which rules out single-card rigs and pushes buyers toward multi-GPU or high-memory inference boxes.
Is GLM-5.2 really Zhipu’s newest model line?
Zhipu’s December 2025 flagship was GLM-4.7, and the company’s own About page does not list any GLM-5.x release. The 5.x line (5, 5-Turbo, 5.1, 5.2) spans only February through June 2026, so the version jump is recent and older 4.x artifacts still circulate in vendor materials.
Which buyers face the hardest clearance bar with GLM-5.2?
Regulated-industry procurement teams cannot sign off without vendor-confirmed architecture and benchmark scores, and Zhipu has published neither. They also inherit unverified community claims about the 744B mixture-of-experts design, so the MIT weights, once released, become the first artifact they can actually audit rather than trust on reputation.
What is Zhipu’s track record as a model provider?
Zhipu (Z.ai) is a 2019 spin-off from Tsinghua University’s KEG lab, now publicly listed in Hong Kong as 02513.HK after a January 2026 IPO. The post-IPO period produced four GLM-5-line releases in roughly four months, a cadence that outpaces the company’s willingness to disclose benchmarks on the current flagship.