Running GLM-5.2 in Cursor, Cline, and Roo Code: Migration Checklist and Gotchas

Q: Does switching to GLM-5.2 require changing the agent's SDK?

No. Z.ai's endpoint is Anthropic Messages API-compatible, so agents using the standard Anthropic client library swap over with a base URL and model-name change. No new SDK dependency is required.

Q: Can I self-host GLM-5.2 instead of paying the Z.ai subscription?

MIT-licensed weights are publicly available at BF16 and FP8 precision. Supported deployment frameworks are SGLang, vLLM, Transformers, and KTransformers. Self-hosting removes the per-tier subscription cost, but running a 753B-parameter model requires substantial GPU or NPU capacity. The FP8 variant is the practical path for teams that do not have enough VRAM for BF16.

Q: What is the parameter count: 744B or 753B?

Zhipu's own HuggingFace model cards state 753B. The 744B figure was a community-circulated estimate that predated the official weight release. Use 753B when citing the model in documentation or architecture reviews.

GLM-5.2 launched on June 13, 2026 with an Anthropic Messages API-compatible endpoint,³ which means switching a Claude Code, Cline, or Roo Code setup does not require a new SDK or a rewrite of agent configuration. A base URL swap and a model-name change are the mechanical steps. Everything else is a question of tradeoffs, which are worth examining before the swap, not after.

What is GLM-5.2 and what changed from prior generations?

GLM-5.2 is Zhipu AI’s latest coding-focused model, released under the Z.ai brand.¹ It is a 753B-parameter Mixture-of-Experts model² with approximately 40B active parameters per token implied by the 744B-A40B designation in the official README.¹ The architecture uses IndexShare sparse attention, which reuses the same attention indexer across every four sparse layers, reducing per-token FLOPs by 2.9x at 1M context length.² A Multi-Token Prediction speculative decoding layer is also present.²

The numbers that matter for the coding-agent comparison:

Metric	GLM-5.2	GLM-5.1 (prior gen)
SWE-bench Pro	62.1%¹	58.4%¹
Terminal-Bench 2.1	81.0¹	62.0¹
Context window	1M tokens¹	200K tokens¹
Max output	128K tokens (131,072)³	—

Claude Opus 4.8 scores 85.0 on Terminal-Bench 2.1,¹ four points ahead of GLM-5.2’s 81.0. That gap is relevant for terminal-heavy workflows. For AIME 2026 GLM-5.2 reports 99.2,¹ HMMT Nov 2025 at 94.4,¹ and GPQA-Diamond at 91.2.¹ These are Zhipu’s own published numbers; independent verification has not appeared as of June 19, 2026.

The context window increase from 200K to 1M tokens¹ is the single largest architectural change for agent use. It means full-repository ingestion in a single pass is feasible for codebases that previously required chunking. Whether the model holds coherent retrieval and reasoning at token counts well above 200K has not been validated by an independent long-context eval.

MIT-licensed weights are downloadable now at BF16² and FP8² precision. The FP8 variant had accumulated roughly 93,900 downloads against 11,900 for BF16 as of June 19, 2026,² a ratio that reflects the hardware cost reality of running a 753B model in production.

Which coding agents support GLM-5.2 at launch?

Eight agents ship official GLM-5.2 integration as of the June 13 launch:⁴ Claude Code, Cline, OpenCode, Roo Code, Goose, Crush, OpenClaw, and Kilo Code. All eight route through Z.ai’s Anthropic-compatible API,⁶ so the integration pattern is uniform across agents that already have Anthropic client support.

Cursor is not listed in Z.ai’s published eight-agent launch set.⁴ Any Cursor user migrating to GLM-5.2 would do so through Cursor’s custom model or OpenAI-compatible endpoint configuration, not through a Z.ai first-party integration. That path exists, but it has no official support from either side at launch.

How do you migrate from Claude Opus to GLM-5.2 in Cline, Roo Code, or Claude Code?

The base URL for Z.ai’s Anthropic-compatible endpoint is the single required change.⁶ The pattern applies to any agent that uses the Anthropic Messages API client:

Base URL. Replace Anthropic’s endpoint with Z.ai’s Anthropic-compatible URL. Zhipu’s model overview documentation⁶ specifies this endpoint; the exact string is in their official docs rather than reproduced here to avoid stale copy.
Model name. Set the model identifier to glm-5.2 or glm-5.2[1m] for the 1M-context variant.³ Do not carry over claude-opus-4-8 or any Anthropic model string; it will 404.
API key. The Z.ai API key replaces the Anthropic API key. If the agent stores credentials per-provider (Claude Code and Cline both do), create a new provider entry rather than overwriting the Anthropic key. Keeping both lets you switch back without re-entering credentials.
Thinking presets. GLM-5.2 exposes High and Max thinking-effort presets for long multi-step coding tasks.³ These are not Anthropic extended thinking and may behave differently on tasks tuned for Anthropic’s budget_tokens parameter. Test on a representative task before committing to Max on long agentic sessions.
Context size. The 1M-token window¹ is an upper bound, not a floor. Agents that batch large context windows by default will send more tokens per turn, which draws from the Z.ai subscription allowance faster than the same task would against per-token Anthropic billing.

What gotchas appear at the integration boundary?

Subscription vs. token billing mismatch. Z.ai prices GLM-5.2 through flat-tier subscriptions:⁴ Lite at $18/month (approximately 400 prompts per week), Pro at 5x that usage, Max at 20x (approximately $112/month yearly). There are no per-token charges under this model. For a developer accustomed to Anthropic’s per-token billing, “prompts per week” is a coarser unit. A single 1M-token context pass counts as one prompt against the weekly allowance; a sequence of short tool-call loops counts as many prompts. The Lite tier can drain in a few heavy agentic sessions.

MIT-licensed self-hosting is not free. The BF16 weights² require hardware that can run a 753B-parameter model at inference speed. That is not a consumer GPU workload. Deployment is supported via SGLang, vLLM, Transformers, and KTransformers.¹ The MIT license removes royalty cost but not infrastructure cost. FP8 quantization² reduces memory requirements substantially, which is why the FP8 download count is running nearly 8x BF16.

Benchmark attribution. The 62.1% SWE-bench Pro and 81.0 Terminal-Bench 2.1 numbers are Zhipu-published and appear in the official GitHub repository.¹ They are not yet independently replicated. The 77.8% SWE-bench Verified figure in wider circulation belongs to GLM-5 base (February 2026), a different model. Attributing that number to GLM-5.2 is an error present in some community migration guides.

The Opus 4.8 Terminal-Bench gap. Claude Opus 4.8 scores 85.0 on Terminal-Bench 2.1 against GLM-5.2’s 81.0.¹ For workflows that are terminal-execution-heavy, agentic bash chains, shell testing, CI debugging, this four-point spread may be measurable in practice. For pure code-generation tasks the SWE-bench Pro comparison (62.1% for GLM-5.2) is more relevant, and no current Opus 4.8 SWE-bench Pro number is published in the Z.ai README for a direct apples-to-apples comparison.

Company and supply chain. Zhipu AI is a 2019 Tsinghua University KEG lab spin-off,⁵ publicly listed in Hong Kong as 02513.HK after its January 2026 IPO.⁵ The model is developed and served from China. Teams with data residency requirements or geopolitical supply-chain policies should review those requirements before routing production coding-agent traffic through Z.ai’s hosted endpoint. The MIT weight path avoids this by allowing self-hosting outside Z.ai’s infrastructure.

Why would you switch, and why might you stay on Anthropic?

The case for switching is pricing structure and context window. A flat $18/month Lite subscription⁴ against a budget of roughly 400 prompts per week is straightforward for a solo developer doing bounded coding sessions. The 1M-token window¹ lets full-repository context pass in a single call, which Anthropic’s current per-token pricing makes expensive for large-context workloads.

The case for staying on Anthropic is benchmark verification and tooling maturity. Claude Opus 4.8 leads GLM-5.2 by four points on Terminal-Bench 2.1.¹ The Anthropic ecosystem, extended thinking, tool-use schemas, computer-use, has years of production validation. Claude Code and Cline configurations against Anthropic endpoints are well-documented and stable. GLM-5.2 is a June 2026 release; the integration surface for the eight listed agents⁴ will accumulate edge cases over the next few weeks that are not yet documented.

A reasonable migration path is parallel: keep the Anthropic key active, add Z.ai as a second provider in whichever agent you use, and run a two-week comparison on representative tasks before switching the primary model. The Anthropic-compatible endpoint⁶ makes that comparison low-friction. The flat-tier pricing⁴ makes the two-week trial period cost-bounded.

Frequently Asked Questions

Does switching to GLM-5.2 require changing the agent’s SDK?

No. Z.ai’s endpoint is Anthropic Messages API-compatible,⁶ so agents using the standard Anthropic client library swap over with a base URL and model-name change. No new SDK dependency is required.

Is the 1M-token context available on all Z.ai subscription tiers?

The 1,000,000-token context window is a model-level property of GLM-5.2¹ accessible via the glm-5.2[1m] model identifier.³ All three subscription tiers (Lite, Pro, Max)⁴ route to the same model. The difference between tiers is weekly prompt volume, not context size.

Can I self-host GLM-5.2 instead of paying the Z.ai subscription?

MIT-licensed weights are publicly available at BF16² and FP8² precision. Supported deployment frameworks are SGLang, vLLM, Transformers, and KTransformers.¹ Self-hosting removes the per-tier subscription cost, but running a 753B-parameter model requires substantial GPU or NPU capacity. The FP8 variant is the practical path for teams that do not have enough VRAM for BF16.

What is the parameter count: 744B or 753B?

Zhipu’s own HuggingFace model cards state 753B.² The 744B figure was a community-circulated estimate that predated the official weight release. Use 753B when citing the model in documentation or architecture reviews.