GLM-5 — February 2026¶
What Changed¶
Zhipu AI's GLM-5 is a 744B-parameter MoE model (40B active per forward pass) released open-weight under MIT license in February 2026. It represents a leap from GLM-4.6 (357B/32B active) in both scale and capability, with a focus on agentic engineering — sustained multi-step planning, tool orchestration, and autonomous code generation across full-stack applications.
Key Technical Details¶
Architecture scaling from GLM-4.6 to GLM-5:
| Metric | GLM-4.6 (Sep 2025) | GLM-5 (Feb 2026) |
|---|---|---|
| Total parameters | 357B | 744B |
| Active parameters | ~32B | ~40B |
| Pre-training tokens | ~20T | 28.5T |
| Context length | 200K | 200K+ |
| License | MIT | MIT |
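The table implies GLM-5 got sparser, not just bigger: the active fraction of weights per token actually drops. A quick check of the ratios from the table:

```python
# Total vs. active parameters (billions), taken from the table above.
models = {"GLM-4.6": (357, 32), "GLM-5": (744, 40)}

fractions = {name: active / total for name, (total, active) in models.items()}
for name, frac in fractions.items():
    print(f"{name}: {frac:.1%} of weights active per token")
# GLM-4.6 activates ~9.0% of its weights per token; GLM-5 only ~5.4%.
```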
Slime framework: GLM-5 introduces a novel asynchronous reinforcement learning infrastructure called Slime that improves training throughput at scale. Unlike synchronous PPO or GRPO pipelines, Slime decouples rollout generation from policy updates, enabling higher GPU utilization during RL fine-tuning.
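The decoupling idea can be sketched in miniature (this is not Slime's actual code; all names are illustrative). Rollout workers push trajectories into a shared buffer while the learner consumes whatever is ready, so policy updates never stall waiting for generation and vice versa:

```python
import queue
import threading

rollouts = queue.Queue(maxsize=64)   # buffer that decouples generation from learning
policy_version = 0                   # bumped on every learner update

def rollout_worker(worker_id: int, n_trajs: int) -> None:
    # Workers generate with whatever policy snapshot they last saw
    # (possibly stale), instead of blocking until the newest weights arrive.
    for _ in range(n_trajs):
        rollouts.put({"worker": worker_id, "behind_version": policy_version})

def learn(n_updates: int, batch_size: int) -> int:
    global policy_version
    seen = 0
    for _ in range(n_updates):
        batch = [rollouts.get() for _ in range(batch_size)]  # consume what's ready
        policy_version += 1          # update the policy; workers were never paused
        seen += len(batch)
    return seen

workers = [threading.Thread(target=rollout_worker, args=(i, 8)) for i in range(4)]
for w in workers:
    w.start()
consumed = learn(n_updates=8, batch_size=4)   # 4 workers x 8 trajs = 32 total
for w in workers:
    w.join()
print(consumed)  # 32
```

The off-policy gap (`behind_version`) is the price paid for throughput; asynchronous RL systems typically correct for it with importance weighting or staleness limits.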
Thinking modes: Multiple inference-time reasoning modes allow users to trade latency for depth — similar to the thinking/non-thinking paradigm seen in Kimi K2.5 and Qwen 3, but with GLM-specific routing.
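The latency/depth trade-off can be made concrete with a toy latency model (the mode names and numbers below are invented for illustration; this document does not specify GLM-5's actual modes or API). Deeper modes spend more decode tokens on hidden reasoning before the answer:

```python
from dataclasses import dataclass

@dataclass
class ThinkingMode:
    name: str
    reasoning_budget: int  # hidden reasoning tokens allowed (illustrative numbers)

# Hypothetical presets -- NOT GLM-5's documented mode names.
MODES = {
    "off":  ThinkingMode("off", 0),
    "fast": ThinkingMode("fast", 512),
    "deep": ThinkingMode("deep", 8192),
}

def decode_seconds(mode: str, answer_tokens: int, tok_per_s: float = 50.0) -> float:
    """Rough latency model: reasoning tokens are decoded before the answer."""
    return (MODES[mode].reasoning_budget + answer_tokens) / tok_per_s

for m in MODES:
    print(f"{m}: ~{decode_seconds(m, answer_tokens=200):.1f}s")
```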
In Plain English¶
GLM-5 is built for tasks where the model acts as an engineer, not just a chatbot — it plans a multi-file code change, invokes compilers and test runners, interprets results, and iterates. The RL training specifically rewards long-horizon task completion rather than single-turn helpfulness.
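The plan–act–observe–iterate loop described above can be sketched generically (the `run_tests` tool and the toy model below are placeholders, not GLM-5's actual interface):

```python
from typing import Callable

def run_tests(patch: str) -> dict:
    # Placeholder tool: pretend the test suite passes once the patch has a fix.
    return {"passed": "fix" in patch}

def agent_loop(model: Callable[[str], str], task: str, max_iters: int = 5) -> str:
    """Plan -> act (call a tool) -> observe -> iterate, until the tests pass."""
    observation = ""
    for _ in range(max_iters):
        patch = model(f"task: {task}\nlast result: {observation}")  # plan + act
        if run_tests(patch)["passed"]:                              # tool call
            return patch                                            # done
        observation = "tests failed"                                # observe, retry
    return patch

# Toy "model": fails once, then emits a fix -- just enough to exercise the loop.
calls = {"n": 0}
def toy_model(prompt: str) -> str:
    calls["n"] += 1
    return "fix applied" if calls["n"] > 1 else "first attempt"

result_patch = agent_loop(toy_model, "repair the build")
print(result_patch)  # fix applied
```

Long-horizon RL rewards exactly this kind of trajectory: the reward attaches to the final passing state, not to any single model turn.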
GLM pretraining objective (inherited from the GLM family): spans of the input are masked out, and the model regenerates each masked span autoregressively, conditioned on the corrupted text and on previously generated spans.
This autoregressive blank-infilling objective unifies bidirectional context encoding (like BERT) with sequential span generation (like GPT).
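Written out, in the notation of the original GLM paper (the formula below is reconstructed from that paper, not stated in this document): sample $m$ spans $\{s_1,\dots,s_m\}$ from input $x$, replace each with a mask token to form $x_{\text{corrupt}}$, and generate the spans in a random permutation $z \sim Z_m$:

$$\max_\theta \; \mathbb{E}_{z \sim Z_m}\left[\sum_{i=1}^{m} \log p_\theta\!\left(s_{z_i} \mid x_{\text{corrupt}},\, s_{z_{<i}}\right)\right]$$

Attention over $x_{\text{corrupt}}$ is bidirectional (the BERT-like part), while each span $s_{z_i}$ is produced token by token, left to right (the GPT-like part).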
Benchmark Performance¶
GLM-5 achieves state-of-the-art among open-source models on agentic and coding benchmarks:
| Benchmark | GLM-5 | Previous Best (Open) |
|---|---|---|
| SWE-bench Verified | 77.8 | ~65 (DeepSeek-V3) |
| Terminal Bench 2.0 | 56.2 | ~45 |
| Vending Bench 2 | $4,432 | — |
Performance approaches Claude Opus 4.5 on software engineering tasks, making GLM-5 the strongest open-weight model for agentic coding as of its release.
Practical Implications¶
Agentic deployment requires infrastructure beyond a simple chat API — tool registries, sandboxed execution environments, and multi-turn state management. GLM-5's MIT license makes it viable for self-hosted agentic systems without API costs, but MoE serving (744B total weights) demands expert-aware batching and sufficient VRAM (multi-GPU setups with tensor parallelism).
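A back-of-the-envelope weight-memory check under assumed quantization levels (KV cache, activations, and runtime overhead all come on top): the 744B total parameters set the memory floor, because every expert must be resident, while the ~40B active parameters set per-token compute.

```python
def weight_gib(params_billion: float, bytes_per_param: float) -> float:
    """Memory for the weights alone, in GiB."""
    return params_billion * 1e9 * bytes_per_param / 2**30

TOTAL_B = 744  # all experts stay resident, though only ~40B are active per token
for fmt, bpp in [("bf16", 2.0), ("fp8", 1.0), ("int4", 0.5)]:
    gib = weight_gib(TOTAL_B, bpp)
    print(f"{fmt}: {gib:,.0f} GiB of weights ~ {gib / 80:.1f} 80-GiB GPUs")
```

Even at 4-bit, the weights alone span several 80-GiB accelerators, which is why multi-GPU tensor and expert parallelism is non-negotiable for self-hosting.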
Open-weight parity with frontier closed models on coding tasks reduces the moat for proprietary APIs. Organizations with GPU infrastructure can now run competitive agentic systems in-house.
Interview Questions¶
- How does the Slime asynchronous RL framework differ from standard synchronous PPO training? What are the throughput benefits?
- With 744B total but only 40B active parameters, what determines the serving cost of GLM-5 — active parameters or total parameters? Why?
- Compare the GLM pretraining objective (autoregressive blank infilling) with standard causal LM and masked LM. When would each be preferred?
- What infrastructure is needed to serve a 744B MoE model on-premise? How does expert parallelism differ from tensor parallelism?