
GLM-5 — February 2026

What Changed

Zhipu AI's GLM-5 is a 744B-parameter MoE model (40B active per forward pass) released open-weight under MIT license in February 2026. It represents a leap from GLM-4.6 (357B/32B active) in both scale and capability, with a focus on agentic engineering — sustained multi-step planning, tool orchestration, and autonomous code generation across full-stack applications.

Key Technical Details

Architecture scaling from GLM-4.6 to GLM-5:

| Metric | GLM-4.6 (Sep 2025) | GLM-5 (Feb 2026) |
|---|---|---|
| Total parameters | 357B | 744B |
| Active parameters | ~32B | ~40B |
| Pre-training tokens | ~20T | 28.5T |
| Context length | 200K | 200K+ |
| License | MIT | MIT |
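The table's two parameter counts drive different costs. A back-of-envelope sketch (using only the figures above) shows why compute scales with active parameters while memory scales with total parameters:

```python
# Back-of-envelope MoE cost arithmetic from the table above.
TOTAL_PARAMS = 744e9   # all experts; must be resident in (V)RAM
ACTIVE_PARAMS = 40e9   # parameters touched per token per forward pass

# Roughly 2 FLOPs per active parameter per token for a forward pass.
flops_per_token = 2 * ACTIVE_PARAMS
activation_ratio = ACTIVE_PARAMS / TOTAL_PARAMS

print(f"FLOPs/token:     {flops_per_token:.2e}")   # 8.00e+10
print(f"Active fraction: {activation_ratio:.1%}")  # 5.4%
```

So each token costs about as much compute as a 40B dense model, even though the full 744B of weights must be held in memory.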

Slime framework: GLM-5 introduces a novel asynchronous reinforcement learning infrastructure called Slime that improves training throughput at scale. Unlike synchronous PPO or GRPO pipelines, Slime decouples rollout generation from policy updates, enabling higher GPU utilization during RL fine-tuning.
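The decoupling idea can be illustrated with a minimal producer/consumer sketch. This is a schematic of asynchronous RL in general, not Zhipu's actual Slime code: rollout workers and the learner are coupled only through a bounded buffer, so sample generation never stalls waiting for a policy update (and vice versa):

```python
import queue
import threading

# Illustrative sketch (NOT the actual Slime implementation): rollout
# generation and policy updates run concurrently, linked by a queue.
rollout_buffer = queue.Queue(maxsize=64)
stop = threading.Event()
policy_version = 0

def rollout_worker():
    """Generate trajectories with the latest policy snapshot available."""
    while not stop.is_set():
        # A real worker would run model inference here; each trajectory
        # is stamped with the policy version it was sampled from, since
        # rollouts may lag the learner by a few versions.
        rollout_buffer.put({"reward": 1.0, "version": policy_version})

def learner(num_updates):
    """Consume rollouts and update the policy without blocking workers."""
    global policy_version
    for _ in range(num_updates):
        batch = [rollout_buffer.get() for _ in range(4)]
        # ...apply off-policy correction for stale samples, gradient step...
        policy_version += 1
    stop.set()

threading.Thread(target=rollout_worker, daemon=True).start()
learner(num_updates=10)
print("final policy version:", policy_version)  # 10
```

The price of this throughput is staleness: consumed rollouts may come from an older policy version, which is why asynchronous pipelines typically apply an off-policy correction at the gradient step.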

Thinking modes: Multiple inference-time reasoning modes allow users to trade latency for depth — similar to the thinking/non-thinking paradigm seen in Kimi K2.5 and Qwen 3, but with GLM-specific routing.

In Plain English

GLM-5 is built for tasks where the model acts as an engineer, not just a chatbot — it plans a multi-file code change, invokes compilers and test runners, interprets results, and iterates. The RL training specifically rewards long-horizon task completion rather than single-turn helpfulness.
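That plan/act/observe/iterate cycle can be sketched as a generic agent loop. This is a schematic of the pattern described above, not GLM-5's actual harness; `model_step` and `run_tests` stand in for the model call and a sandboxed test runner:

```python
def agent_loop(model_step, run_tests, max_iters=5):
    """Schematic long-horizon coding loop: the model proposes an action,
    the environment (compiler / test runner) returns feedback, repeat."""
    feedback = ""
    for i in range(max_iters):
        action = model_step(feedback)     # e.g. a multi-file patch
        ok, feedback = run_tests(action)  # execute in a sandbox
        if ok:
            return i + 1                  # iterations used
    return None                           # budget exhausted

# Toy run with a stubbed environment that passes on the third attempt:
calls = {"n": 0}
def stub_tests(action):
    calls["n"] += 1
    return calls["n"] >= 3, "FAILED: 2 tests"

iters = agent_loop(lambda feedback: "patch-" + str(calls["n"]), stub_tests)
print(iters)  # 3
```

RL training that rewards `ok` after many iterations (rather than scoring each reply in isolation) is what "long-horizon task completion" means in practice.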

GLM pretraining objective (inherited from the GLM family):

\[ \mathcal{L}_{\text{GLM}} = -\mathbb{E}\left[\sum_{i=1}^{m} \log P_\theta(s_i \mid x_{\text{corrupt}}, s_{<i})\right] \]

where \(s_1, \dots, s_m\) are the masked spans, \(x_{\text{corrupt}}\) is the input with those spans blanked out, and \(s_{<i}\) denotes the spans already generated.

This autoregressive blank-infilling objective unifies bidirectional context encoding (like BERT) with sequential span generation (like GPT).
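A toy construction makes the two parts concrete. The sketch below shows the data layout for a single masked span (Part A is attended bidirectionally, Part B is generated left-to-right); token names and the `[S]`/`[E]` sentinels are illustrative, not GLM's exact vocabulary:

```python
# Toy GLM-style blank infilling for one masked span.
tokens = ["x1", "x2", "x3", "x4", "x5"]
span = (2, 4)  # mask out x3, x4

# Part A: corrupted input with the span replaced by a mask token
# (encoded with bidirectional attention, like BERT).
part_a = tokens[:span[0]] + ["[MASK]"] + tokens[span[1]:]

# Part B: the span is regenerated autoregressively (like GPT),
# shifted right behind a start sentinel; targets end with a stop token.
part_b = ["[S]"] + tokens[span[0]:span[1]]
targets = tokens[span[0]:span[1]] + ["[E]"]

print(part_a)   # ['x1', 'x2', '[MASK]', 'x5']
print(part_b)   # ['[S]', 'x3', 'x4']
print(targets)  # ['x3', 'x4', '[E]']
```

The loss above is the sum of the per-token log-probabilities over `targets`, conditioned on `part_a` plus the Part B tokens generated so far.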

Benchmark Performance

GLM-5 achieves state-of-the-art among open-source models on agentic and coding benchmarks:

| Benchmark | GLM-5 | Previous Best (Open) |
|---|---|---|
| SWE-bench Verified | 77.8 | ~65 (DeepSeek-V3) |
| Terminal Bench 2.0 | 56.2 | ~45 |
| Vending Bench 2 | $4,432 | — |

Performance approaches Claude Opus 4.5 on software engineering tasks, making GLM-5 the strongest open-weight model for agentic coding as of its release.

Practical Implications

Agentic deployment requires infrastructure beyond a simple chat API — tool registries, sandboxed execution environments, and multi-turn state management. GLM-5's MIT license makes it viable for self-hosted agentic systems without per-token API costs, but serving the MoE (744B total weights) demands expert-aware batching and substantial VRAM, typically multi-GPU setups combining tensor and expert parallelism.
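A quick weights-only footprint estimate shows why multi-GPU serving is unavoidable. The precisions below are common deployment choices, not official GLM-5 serving specs, and KV cache and activations add further memory on top:

```python
# Rough weights-only serving footprint for a 744B-parameter MoE.
# Assumed precisions for illustration; KV cache/activations not included.
TOTAL_PARAMS = 744e9

for name, bytes_per_param in [("FP16", 2), ("FP8", 1), ("INT4", 0.5)]:
    weights_gb = TOTAL_PARAMS * bytes_per_param / 1e9
    gpus_80gb = -(-weights_gb // 80)  # ceiling division: 80 GB cards
    print(f"{name}: {weights_gb:.0f} GB weights ≈ {gpus_80gb:.0f} x 80GB GPUs")
```

Even at 4-bit quantization the weights alone span several 80 GB accelerators, which is why expert-parallel sharding (placing different experts on different devices) is the standard layout for models of this shape.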

Open-weight parity with frontier closed models on coding tasks reduces the moat for proprietary APIs. Organizations with GPU infrastructure can now run competitive agentic systems in-house.

Interview Questions

  1. How does the Slime asynchronous RL framework differ from standard synchronous PPO training? What are the throughput benefits?
  2. With 744B total but only 40B active parameters, what determines the serving cost of GLM-5 — active parameters or total parameters? Why?
  3. Compare the GLM pretraining objective (autoregressive blank infilling) with standard causal LM and masked LM. When would each be preferred?
  4. What infrastructure is needed to serve a 744B MoE model on-premise? How does expert parallelism differ from tensor parallelism?