GPT-5 — August 2025¶
What Changed¶
OpenAI's GPT-5 unified the previously split product line (GPT-4o for speed, o1/o3 for reasoning) into a single system with an intelligent router that dynamically selects between a fast main model and a deeper reasoning model based on query complexity.
GPT-5 represented a qualitative jump in coding (complex front-end generation, debugging), health-domain questions, writing structure, and hallucination reduction. It is available to all users; Pro users get access to gpt-5-thinking-pro for extended reasoning.
Key Technical Details¶
Adaptive routing is the architectural novelty: a lightweight classifier routes each query to either gpt-5-main (fast, efficient) or gpt-5-thinking (deeper, slower). The router is trained on signals including user preferences, measured correctness, conversation type, and tool requirements.
One way to write the decision rule is
\[
m^{*}(q) = \arg\max_{m \in \{\text{main},\,\text{thinking}\}} \bigl(\hat{p}_m(\text{correct} \mid q) - w_m\bigr),
\]
where \(\hat{p}_m\) is the router's estimate that model \(m\) answers query \(q\) correctly and \(w_m\) incorporates latency and cost penalties for the heavy model.
In Plain English¶
The router is itself a learned policy: given the query, decide whether it is worth paying the latency and cost of the reasoning model. Over time, the router improves from feedback — user thumbs-down on fast answers that were wrong teaches it to route more carefully.
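The policy described above can be sketched as a tiny utility-maximizing rule. All names, numbers, and penalties below are illustrative, not OpenAI's actual router:

```python
# Toy sketch of a utility-based router: pick the model whose estimated
# correctness, minus its latency/cost penalty w_m, is highest.
from dataclasses import dataclass

@dataclass
class ModelArm:
    name: str
    cost_penalty: float  # w_m: combined latency and price penalty

def route(p_correct: dict[str, float], arms: list[ModelArm]) -> str:
    """Return the name of the arm maximizing p_correct - cost_penalty."""
    return max(arms, key=lambda a: p_correct[a.name] - a.cost_penalty).name

arms = [ModelArm("gpt-5-main", 0.0), ModelArm("gpt-5-thinking", 0.25)]

# Easy query: the fast model is nearly as accurate, so the penalty
# tips the decision to the fast path.
assert route({"gpt-5-main": 0.90, "gpt-5-thinking": 0.95}, arms) == "gpt-5-main"

# Hard query: the accuracy gap exceeds the penalty, so route to thinking.
assert route({"gpt-5-main": 0.40, "gpt-5-thinking": 0.85}, arms) == "gpt-5-thinking"
```

In the real system the correctness estimates would come from a learned classifier updated with feedback signals (thumbs-down, measured correctness), which is what makes the router itself a trained policy rather than a fixed heuristic.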
API surface: introduces a reasoning field with effort level (minimal, low, medium, high) and a verbosity parameter, making reasoning budget explicit.
Technical Details¶
- Reduced sycophancy: GPT-5 was specifically trained to minimize agreement-seeking behavior — a known failure mode of RLHF-trained models.
- Improved calibration: Better uncertainty expression; more frequent "I don't know" or "I'm not sure" when the model genuinely lacks knowledge.
- Computer use: not a primary feature at GPT-5 launch but available in GPT-5.4.
- Pricing: lower than GPT-4.5 at launch, reflecting inference efficiency improvements.
Practical Implications¶
Treat reasoning effort and verbosity as first-class product knobs: log which settings correlate with user satisfaction and cost. Expect bimodal latency (fast path vs. thinking path) — set timeouts and UX copy accordingly. For reliability-critical flows, allow explicit override to the reasoning model or higher effort instead of relying on the router alone.
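A minimal sketch of treating effort and timeout as per-flow product knobs; the flow names, effort values, and timeout numbers are illustrative assumptions, not recommendations:

```python
# Per-flow settings table: reliability-critical flows force high effort
# (bypassing reliance on the router), and timeouts reflect the bimodal
# latency of fast-path vs. thinking-path responses.
FLOW_SETTINGS = {
    "autocomplete": {"effort": "minimal", "timeout_s": 5},
    "chat":         {"effort": "medium",  "timeout_s": 30},
    "compliance":   {"effort": "high",    "timeout_s": 120},
}

def settings_for(flow: str) -> dict:
    """Look up the reasoning effort and timeout for a product flow."""
    return FLOW_SETTINGS.get(flow, FLOW_SETTINGS["chat"])

# Reliability-critical flow gets explicit high effort and a long timeout.
assert settings_for("compliance")["effort"] == "high"
# Unknown flows fall back to the default chat settings.
assert settings_for("unknown")["timeout_s"] == 30
```

Logging which settings each request used, alongside satisfaction and cost, is what makes these knobs tunable over time.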
Interview Questions¶
- What are the engineering challenges of a routing architecture (fast main + deep reasoning model)? How do you ensure the router decision is itself low-latency?
- Why is sycophancy a specific failure mode of RLHF training — what in the training signal causes it, and how would you measure and mitigate it?
- How does the unified router model change the user experience compared to explicitly choosing between GPT-4o and o3?
Code Example¶
Illustrative request shape (field names vary by SDK; check current OpenAI API docs):
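A minimal Python sketch of such a request, using the `reasoning.effort` and `verbosity` knobs discussed above; the exact field names and call shape are assumptions to verify against the current API reference:

```python
# Illustrative request payload in the style of the Responses API.
# Field names are assumptions -- check the current OpenAI API docs.
payload = {
    "model": "gpt-5",
    "input": "Explain the trade-offs of adaptive model routing.",
    "reasoning": {"effort": "medium"},  # one of: minimal, low, medium, high
    "text": {"verbosity": "low"},       # keep the answer terse
}

# With the official SDK the call would look roughly like:
#   from openai import OpenAI
#   client = OpenAI()
#   response = client.responses.create(**payload)
#   print(response.output_text)
```

Making the reasoning budget an explicit request field, rather than an implicit router decision, is what lets callers trade latency and cost against answer quality per request.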