groundy
models & research

GLM-5.2 vs Claude Opus 4.8: Open-Weight Coding at Frontier Pricing

GLM-5.2 posts 62.1% on SWE-bench Pro and 81.0 on Terminal-Bench 2.1, four points behind Opus 4.8. MIT weights are self-hostable; flat plan starts at $18/month.

6 min · · · 6 sources ↓

GLM-5.2 scores 62.1% on SWE-bench Pro and 81.0 on Terminal-Bench 2.11, four points behind Claude Opus 4.8 on the latter benchmark1. The practical difference for coding teams is less about those four points than about what each model costs to run: Opus 4.8 bills per token; GLM-5.2’s MIT-licensed weights2 are self-hostable at hardware cost only, and Z.ai’s flat-subscription plan starts at $18 per month5.

Zhipu released GLM-5.2 on June 13, 20261. The model is a 753B-parameter mixture-of-experts architecture2 with a confirmed 1M-token context window1, up from 200K in GLM-5.11. This is the fourth GLM-5 family release in roughly four months since Zhipu’s Hong Kong IPO in January 20266.

What Is GLM-5.2 and How Does It Compare to Opus 4.8?

GLM-5.2 is a 753B-parameter sparse mixture-of-experts model from Zhipu/Z.ai, a 2019 Tsinghua University spin-off now publicly listed as 02513.HK6. The README designates it as 744B-A40B, implying approximately 40B active parameters per forward pass, though Zhipu has not stated that figure explicitly1. Total parameter count on the HuggingFace model cards is 753B2.

Claude Opus 4.8 is Anthropic’s closed-weight frontier model, accessible only via Anthropic’s API at per-token pricing.

The core benchmark comparison, both figures from Zhipu’s official GitHub README1:

BenchmarkGLM-5.2Claude Opus 4.8
SWE-bench Pro62.1%not reported
Terminal-Bench 2.181.085.0
AIME 202699.2not reported
HMMT Nov 202594.4not reported
GPQA-Diamond91.2not reported
HLE40.5not reported

The Terminal-Bench 2.1 comparison is the one Zhipu chose to publish directly against Opus 4.8: GLM-5.2 at 81.0 versus Opus 4.8 at 85.01. That is a four-point gap on a benchmark measuring autonomous terminal-based coding tasks. For SWE-bench Pro, Zhipu reports only its own score; the 62.1% figure represents a 3.7-percentage-point improvement over GLM-5.1’s reported 58.4%1.

How Does GLM-5.2 Architecture Work?

GLM-5.2 uses three architectural features worth understanding for production deployments2:

Mixture-of-Experts with sparse activation. The 753B total parameter count does not activate fully on each token. The 744B-A40B designation implies roughly 40B parameters active per forward pass, which determines actual inference compute cost (not the total parameter count).

IndexShare sparse attention. Zhipu’s term for an attention efficiency technique that reuses the same indexer across every four sparse attention layers2. At 1M context length, this reduces per-token FLOPs by 2.9x compared to dense attention over the same context2. For teams running long-context coding tasks, this is the mechanism that makes 1M-token context practically usable rather than theoretically possible.

MTP speculative decoding layer. A multi-token prediction layer that generates multiple output tokens per decoding step, improving throughput on generation-heavy workloads2.

The supported deployment frameworks are SGLang, vLLM, Transformers, and KTransformers1. Weights are publicly downloadable: BF16 at zai-org/GLM-5.2 on HuggingFace2 and FP8 at zai-org/GLM-5.2-FP83. As of June 19, 2026, the FP8 variant has accumulated roughly 93,900 downloads versus 11,900 for BF163, reflecting that most self-hosters are running quantized weights.

The weight license is MIT per the HuggingFace model cards2. The GitHub repository for the GLM-5 family shows Apache-2.0 for the code; the MIT designation applies specifically to the model weights.

What Does GLM-5.2 Cost vs Claude Opus 4.8?

The cost comparison divides into two distinct paths:

Via Z.ai’s hosted API, Zhipu offers a flat subscription called the GLM Coding Plan5:

TierMonthly (annual billing)Monthly (monthly billing)Usage
Lite~$12.6/month$18/month~400 prompts/week
Pro~$50.4/monthnot published5x Lite
Max~$112/monthnot published20x Lite

The API endpoint is Anthropic Messages API compatible, meaning a base URL swap is sufficient to point Claude Code or similar agents at GLM-5.2 without modifying agent code4. At launch, Zhipu listed eight coding agent integrations: Claude Code, Cline, OpenCode, Roo Code, Goose, Crush, OpenClaw, and Kilo Code5.

Via self-hosted weights, the MIT license means no per-token fees and no usage caps2. The hardware requirement is the constraint: at 753B parameters, running the FP8 weights requires a multi-GPU cluster with sufficient combined VRAM. KTransformers targets CPU+DRAM inference for the active-parameter slice, which can reduce hardware requirements, but verified throughput figures for self-hosted configurations are not published by Zhipu.

Claude Opus 4.8 has no self-hosting path. It is available only through Anthropic’s API.

Why the Context Window Jump Matters for Coding Teams

GLM-5.1 had a 200K-token context window. GLM-5.2 extends this to 1M tokens1. Zhipu describes it as “1M lossless context”1.

For coding tasks specifically, a 1M-token window fits:

  • Complete large codebases in a single context (a 200K-line codebase at roughly 4 chars per token fits comfortably)
  • Full test suites alongside the source code they test
  • Entire dependency chains for multi-file refactoring tasks

The IndexShare sparse attention mechanism2 is what enables this window to be practical: without it, attention computation over 1M tokens at the full parameter count would be prohibitively expensive per token. With a 2.9x FLOPs reduction at 1M context2, the model can sustain coherent reasoning over the full window.

Opus 4.8’s context window specification is not compared directly in Zhipu’s benchmark materials.

What Teams Should Consider Before Switching

The four-point Terminal-Bench gap between GLM-5.2 (81.0) and Opus 4.8 (85.0)1 is the headline constraint. Terminal-Bench 2.1 measures autonomous coding in terminal environments, which is the closest published proxy for agentic coding agent performance.

Several factors complicate a direct switch decision:

Benchmark provenance. The Opus 4.8 Terminal-Bench score is reported in Zhipu’s own materials1. Independent third-party verification of the head-to-head comparison is not available as of June 19, 2026. SWE-bench Pro is also self-reported by Zhipu.

Inference infrastructure. Self-hosting 753B parameters requires purpose-built multi-GPU infrastructure. Teams without that should compare the $18/month Lite tier5 against their actual Anthropic API spend. At roughly 400 prompts per week under Lite, heavy users will need Pro or Max.

Context window utilization. If coding workflows regularly approach or exceed 200K tokens, the 5x context increase is structurally significant regardless of benchmark delta.

API compatibility. The Anthropic Messages API compatibility4 means migration friction is low for teams already running Claude Code or similar agents, which reduces the switching cost substantially.

Thinking presets. GLM-5.2 supports High and Max thinking-effort presets for long multi-step coding tasks4, comparable in framing to Anthropic’s extended thinking mode, though the implementations differ.

Frequently Asked Questions

What is GLM-5.2’s SWE-bench Pro score? 62.1%, as reported by Zhipu in the official GLM-5 GitHub repository1. This is up from 58.4% for GLM-5.1.

How does GLM-5.2 compare to Claude Opus 4.8 on Terminal-Bench 2.1? GLM-5.2 scores 81.0 and Claude Opus 4.8 scores 85.0 on Terminal-Bench 2.1, a four-point gap in Opus 4.8’s favor1. Both scores are reported in Zhipu’s benchmark materials.

Can GLM-5.2 be self-hosted? Yes. Weights are MIT-licensed2 and publicly available in BF16 and FP8 formats on HuggingFace at zai-org/GLM-5.2 and zai-org/GLM-5.2-FP83. Supported deployment frameworks include SGLang, vLLM, Transformers, and KTransformers1. Hardware requirements for a 753B model are substantial.

What does GLM-5.2 cost via Z.ai? The GLM Coding Plan Lite tier is $18/month billed monthly, or approximately $12.6/month on an annual plan5, covering roughly 400 prompts per week. Pro and Max tiers offer 5x and 20x Lite usage respectively5.

Is GLM-5.2 API-compatible with Claude Code? Yes. The Z.ai API endpoint implements the Anthropic Messages API, so a base URL change is sufficient to redirect Claude Code or other Anthropic-compatible agents to GLM-5.24.

sources · 6 cited

  1. GLM-5 Model Family. zai-org GitHub repository primary accessed 2026-06-19
  2. GLM-5.2 Model Card. Hugging Face, zai-org primary accessed 2026-06-19
  3. GLM-5.2-FP8 Model Card. Hugging Face, zai-org primary accessed 2026-06-19
  4. GLM-5.2 API Documentation. Zhipu BigModel primary accessed 2026-06-19
  5. GLM Coding Plan Pricing. Z.ai primary accessed 2026-06-19
  6. Zhipu (02513.HK) Unveils Flagship Model GLM-5.2. New Times Space analysis accessed 2026-06-19