# Recent Advances in LLM Research
## How to Use This Section
A chronological log of frontier LLM developments from January 2025 onward. Each entry explains what changed technically, why it matters, and what interviewers probe. Use the timeline to orient yourself, then drill into individual entries for depth.
## Last Updated
April 2026
## Timeline at a Glance
| Date | Event | Significance |
|---|---|---|
| Jan 2025 | DeepSeek-R1 & R1-Zero released | RL-only reasoning without SFT; GRPO goes mainstream |
| Jan 2025 | Kimi k1.5 (Moonshot AI) | Long-context RL scaling; matches o1 without MCTS |
| Feb 2025 | GPT-4.5 (OpenAI) | Last major GPT-4 generation; unsupervised learning scaling |
| Mar 2025 | Gemini 2.5 Pro preview (Google) | 1M context, Deep Think mode, configurable reasoning budget |
| Apr 2025 | Llama 4 Scout & Maverick (Meta) | First open-weight natively multimodal MoE; 10M ctx Scout |
| Apr 2025 | Qwen 3 (Alibaba) | 8-model family with hybrid thinking modes; 119 languages |
| May 2025 | Gemini 2.5 Pro GA + I/O updates | Top LMArena ranking, grounded search |
| Jun 2025 | Gemini 2.5 Flash GA | 25% faster, 85% cheaper than Gemini 1.5 Pro |
| Aug 2025 | GPT-5 (OpenAI) | Unified fast+reasoning router; replaces GPT-4o/o1 split |
| Sep 2025 | GLM-4.6 (Zhipu AI) | 357B MoE, 200K context, open-weight under MIT |
| Sep 2025 | Qwen3-Next (Alibaba) | Hybrid linear+softmax attention; 10× throughput at 32K+ |
| Nov 2025 | Claude Opus 4.5 (Anthropic) | SotA on software engineering benchmarks |
| Jan 2026 | Kimi K2.5 (Moonshot AI) | 1T MoE, native vision, Agent Swarm (100 parallel sub-agents) |
| Feb 2026 | Gemini 3.1 Pro (Google) | 77.1% on ARC-AGI-2; cost-competitive with the rest of the proprietary frontier |
| Feb 2026 | GLM-5 (Zhipu AI) | 744B MoE, agentic engineering, SotA open-weight on SWE-bench |
| Feb 2026 | Qwen 3.5 (Alibaba) | Surpasses Qwen 3 on most benchmarks |
| Mar 2026 | GPT-5.4 family (OpenAI) | 1M+ ctx, native computer use, mini/nano variants |
| Apr 2026 | Llama 4 Behemoth (Meta, in training) | 288B active / 16-expert MoE; distillation teacher |
## Individual Entries (Chronological)
| # | Entry | Date |
|---|---|---|
| 1 | DeepSeek-R1 and R1-Zero | January 2025 |
| 2 | Kimi k1.5 | January 2025 |
| 3 | GPT-4.5 | February 2025 |
| 4 | Gemini 2.5 Pro | March–June 2025 |
| 5 | Llama 4 | April 2025 |
| 6 | Qwen 3 | April 2025 |
| 7 | GPT-5 | August 2025 |
| 8 | GLM-4.6 | September 2025 |
| 9 | Claude Opus 4.5 | November 2025 |
| 10 | Kimi K2.5 | January 2026 |
| 11 | Gemini 3.1 Pro | February 2026 |
| 12 | GLM-5 | February 2026 |
| 13 | GPT-5.4 Family | March 2026 |
Each entry covers: What Changed, Key Technical Details (with math and plain-English explanations), Practical Implications, Interview Questions, and Code Examples where applicable.
## Cross-Cutting Themes (2025–2026)
Reasoning as a first-class axis: every frontier lab now has a reasoning model (o3, R1, Gemini Deep Think, GLM-Z1, Qwen 3 thinking mode, Claude extended thinking). RL with verifiable rewards is the standard recipe.
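The group-relative policy optimization (GRPO) recipe popularized by R1 drops the learned critic entirely: advantages come from normalizing verifiable rewards within a group of sampled responses. A minimal sketch of that advantage computation, assuming a scalar reward per response (function name and example values are ours, not DeepSeek's):

```python
# Sketch of GRPO's group-relative advantage (illustrative, not DeepSeek's code).
# For each prompt, sample G responses, score each with a verifiable reward,
# and normalize within the group -- no value/critic model is needed.
from statistics import mean, stdev

def group_relative_advantages(rewards: list[float]) -> list[float]:
    """Advantage A_i = (r_i - mean(r)) / std(r) over one sampled group."""
    mu = mean(rewards)
    sigma = stdev(rewards) if len(rewards) > 1 else 0.0
    if sigma == 0.0:
        return [0.0 for _ in rewards]  # all responses scored equally: no learning signal
    return [(r - mu) / sigma for r in rewards]

# Example: 4 sampled answers, binary correctness reward from a verifier.
print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
```

Note the degenerate case: when every sample in the group gets the same reward, the advantages are all zero, which is why reward design (and group size) matters in practice.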
Open-weight MoE at scale: with Llama 4, GLM-5, Kimi K2.5, and Qwen 3, open MoE models are now competitive with closed frontier models. Self-hosting is viable for many production use cases.
Hybrid thinking modes: Qwen 3 and Kimi K2.5 pioneered single-checkpoint models that switch between fast and reasoning modes, eliminating the need for separate model deployments.
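Qwen 3's published usage describes two controls for its hybrid modes: an `enable_thinking` flag in the chat template and a per-turn `/think` / `/no_think` soft switch appended to the user message. A minimal sketch of the soft switch, assuming the tag is simply appended to the turn (the helper function is ours, not Qwen's):

```python
# Illustrative sketch of Qwen 3's documented "soft switch": appending /think or
# /no_think to a user turn overrides the session default for that turn.
# (The string handling here is our own; only the tag names come from Qwen's docs.)

def with_mode(user_message: str, thinking: bool) -> str:
    """Tag a user turn so a hybrid checkpoint answers with or without a reasoning trace."""
    switch = "/think" if thinking else "/no_think"
    return f"{user_message} {switch}"

print(with_mode("Prove that sqrt(2) is irrational.", thinking=True))
```

The same deployed checkpoint serves both behaviors, which is the operational point: one model to host, quantize, and cache instead of a fast/reasoning pair.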
Context window beyond 128K: 10M (Scout), 1M (Maverick, Gemini 2.5, GPT-5.4), 256K (Kimi K2.5), 200K (GLM-5). Cost-per-token matters as much as quality at these lengths.
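A major reason cost-per-token dominates at these lengths is the KV cache, which grows linearly with context. A back-of-envelope sizing sketch, assuming an illustrative 80-layer model with grouped-query attention (8 KV heads, head dim 128, fp16); the config numbers are ours for illustration, not any vendor's:

```python
# Back-of-envelope KV-cache sizing: at 128K+ contexts, per-request memory is
# dominated by this cache, not by model weights. Default args below describe a
# hypothetical 70B-class GQA model, purely for illustration.

def kv_cache_bytes(seq_len: int, n_layers: int = 80, n_kv_heads: int = 8,
                   head_dim: int = 128, dtype_bytes: int = 2) -> int:
    """Bytes for one sequence: 2 (K and V) * layers * kv_heads * head_dim * len * dtype."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * dtype_bytes

for ctx in (8_192, 131_072, 1_048_576):
    gib = kv_cache_bytes(ctx) / 2**30
    print(f"{ctx:>9,} tokens -> {gib:6.1f} GiB per sequence")
```

Under these assumptions, a single 1M-token sequence needs hundreds of GiB of cache, which is why long-context serving leans on techniques like MLA, quantized KV caches, and paged attention.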
Natively multimodal pretraining: early fusion (Llama 4, Kimi K2.5) is displacing bolt-on vision encoders. Models learn joint text-image representations from day one.
Agentic capabilities: GLM-5's agentic engineering focus and Kimi K2.5's Agent Swarm represent a shift from single-turn chat to autonomous multi-step task execution.
Model families via distillation: GPT-5.4 (Pro/mini/nano), R1 (Distill variants), Scout (distilled from Behemoth). One training run, multiple deployment tiers.
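The standard recipe behind such tiers is logit distillation: the smaller model is trained to match the teacher's temperature-softened output distribution. A pure-Python sketch of the Hinton-style KL objective (constants, temperature, and function names are illustrative, not any lab's published setup):

```python
# Minimal sketch of logit distillation: the student minimizes KL divergence to
# the teacher's softened distribution, which is how one large training run can
# seed mini/nano tiers. Pure Python for clarity; real training is batched in a
# framework over full vocabularies.
import math

def softmax(logits, T=1.0):
    zs = [z / T for z in logits]
    m = max(zs)  # subtract max for numerical stability
    exps = [math.exp(z - m) for z in zs]
    s = sum(exps)
    return [e / s for e in exps]

def distill_kl(teacher_logits, student_logits, T=2.0):
    """KL(teacher || student) on temperature-T softened distributions, scaled by T^2."""
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return T * T * sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Identical logits -> zero loss; divergent logits -> positive loss to minimize.
print(distill_kl([2.0, 1.0, 0.1], [2.0, 1.0, 0.1]))  # 0.0
print(distill_kl([2.0, 1.0, 0.1], [0.1, 1.0, 2.0]))  # > 0
```

The temperature softens the teacher's distribution so low-probability "dark knowledge" still carries gradient signal; the `T^2` factor keeps gradient magnitudes comparable when mixing this loss with a hard-label term.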
Cost as a frontier: Gemini 3.1 at 1/3 the cost of GPT-5.4 Pro at comparable quality — efficiency is now a primary competitive dimension, not just a secondary consideration.
## Further Reading
- Core Architectures (MoE, MLA, iRoPE) — technical details on the attention and expert mechanisms underlying these models
- Training and Alignment (RLHF, DPO, GRPO) — the RL recipes powering reasoning models
- Inference and Serving (KV cache, speculative decoding, quantization) — how these large models run in production
- Research Papers — the research lineage: DeepSeekMath → R1, Chinchilla → compute-optimal training, InstructGPT → RLHF, FlashAttention 1→4
Verify vendor docs and licenses before production use — this page reflects the state as of April 2026.