GPT-5 — August 2025

What Changed

OpenAI's GPT-5 unified the previously split product line (GPT-4o for speed, o1/o3 for reasoning) into a single system with an intelligent router that dynamically selects between a fast main model and a deeper reasoning model based on query complexity.

GPT-5 represented a qualitative jump in coding (complex front-end generation, debugging), health-domain questions, writing structure, and hallucination reduction. It is available to all users; Pro users get access to gpt-5-thinking-pro for extended reasoning.

Key Technical Details

Adaptive routing is the architectural novelty: a lightweight classifier routes each query to either gpt-5-main (fast, efficient) or gpt-5-thinking (deeper, slower). The router is trained on signals including user preferences, measured correctness, conversation type, and tool requirements.

\[ \text{route}(q) = \arg\max_{m \in \{\text{main, thinking}\}} P(\text{correct} \mid m, q) \cdot w_m \]

where \(w_m\) incorporates latency and cost penalties for the heavy model.
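The routing rule above can be sketched in a few lines. This is an illustrative toy, not OpenAI's implementation: the correctness probabilities would come from the lightweight classifier, and the weights fold in the latency/cost penalty on the heavy model.

```python
# Hypothetical sketch of the routing rule: score each model by its
# estimated correctness times a latency/cost weight, pick the argmax.
# The probabilities and weights below are made-up placeholders.

def route(p_correct: dict[str, float], weights: dict[str, float]) -> str:
    """Return the model maximizing P(correct | m, q) * w_m."""
    return max(p_correct, key=lambda m: p_correct[m] * weights[m])

# Toy example: the thinking model is more likely correct on this query,
# but its weight penalizes latency and cost, so the fast path still wins.
p = {"main": 0.70, "thinking": 0.95}   # classifier's P(correct | m, q)
w = {"main": 1.0, "thinking": 0.6}     # w_m folds in latency/cost penalties
```

Note how the weight can flip the decision: with a milder penalty (say `w["thinking"] = 0.9`), the same query would route to the thinking model.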

In Plain English

The router is itself a learned policy: given the query, decide whether it is worth paying the latency and cost of the reasoning model. Over time, the router improves from feedback: a user's thumbs-down on a fast answer that was wrong teaches it to route more carefully.
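One simple way to picture this feedback loop is per-query-type success counters that feed back into the router's correctness estimates. This is a minimal sketch under invented assumptions (the query-type buckets, Beta-style counts, and class names are all hypothetical), not OpenAI's training procedure.

```python
from collections import defaultdict

# Illustrative sketch: a router keeps (success, failure) counts per
# (query_type, model) pair and updates them from user feedback.
class FeedbackRouter:
    def __init__(self):
        # Start each pair at (1, 1), i.e. a uniform prior of 0.5.
        self.counts = defaultdict(lambda: [1, 1])

    def p_correct(self, query_type: str, model: str) -> float:
        """Estimated P(correct) for this model on this query type."""
        s, f = self.counts[(query_type, model)]
        return s / (s + f)

    def record(self, query_type: str, model: str, thumbs_up: bool):
        """Fold one piece of user feedback into the counts."""
        self.counts[(query_type, model)][0 if thumbs_up else 1] += 1

router = FeedbackRouter()
# A run of thumbs-downs on fast answers to coding queries drags the
# fast model's estimate down, steering future coding queries elsewhere.
for _ in range(5):
    router.record("coding", "main", thumbs_up=False)
```

After those five failures the fast model's estimate for coding queries drops from 0.5 to 1/7, while the thinking model's stays at its prior.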

API surface: GPT-5 introduces a reasoning field with an effort level (minimal, low, medium, high) and a verbosity parameter, making the reasoning budget explicit.

Technical Details
  • Reduced sycophancy: GPT-5 was specifically trained to minimize agreement-seeking behavior — a known failure mode of RLHF-trained models.
  • Improved calibration: Better uncertainty expression; more frequent "I don't know" or "I'm not sure" when the model genuinely lacks knowledge.
  • Computer use: not a primary feature at GPT-5 launch but available in GPT-5.4.
  • Pricing: lower than GPT-4.5 at launch, reflecting inference efficiency improvements.

Practical Implications

Treat reasoning effort and verbosity as first-class product knobs: log which settings correlate with user satisfaction and cost. Expect bimodal latency (fast path vs. thinking path) — set timeouts and UX copy accordingly. For reliability-critical flows, allow explicit override to the reasoning model or higher effort instead of relying on the router alone.
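The override-plus-timeout pattern above can be sketched as a small dispatch function. Everything here is illustrative (the flow labels, effort choices, and timeout values are assumptions, not SDK defaults): the point is that reliability-critical flows bypass the router, and each path gets its own timeout budget.

```python
# Sketch of the override pattern: reliability-critical flows force high
# reasoning effort; everything else trusts the router's choice. Timeouts
# reflect the bimodal latency of the fast vs. thinking paths.

FAST_TIMEOUT_S = 10       # fast path: tight timeout, responsive UX
THINKING_TIMEOUT_S = 120  # thinking path: generous timeout, "working on it" copy

def choose_effort(flow: str, router_choice: str) -> tuple[str, int]:
    """Return (reasoning effort, timeout in seconds) for this request."""
    if flow == "reliability_critical":
        return "high", THINKING_TIMEOUT_S   # explicit override, skip the router
    if router_choice == "thinking":
        return "medium", THINKING_TIMEOUT_S
    return "minimal", FAST_TIMEOUT_S
```

Logging the chosen effort alongside latency and user satisfaction gives the data needed to tune these knobs per flow.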

Interview Questions

  1. What are the engineering challenges of a routing architecture (fast main + deep reasoning model)? How do you ensure the router decision is itself low-latency?
  2. Why is sycophancy a specific failure mode of RLHF training — what in the training signal causes it, and how would you measure and mitigate it?
  3. How does the unified router model change the user experience compared to explicitly choosing between GPT-4o and o3?

Code Example

Illustrative request shape (field names vary by SDK; check current OpenAI API docs):

{
  "model": "gpt-5",
  "input": "Refactor this React component for accessibility.",
  "reasoning": {
    "effort": "medium"
  },
  "text": {
    "verbosity": "low"
  }
}
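The same request shape can be built and validated in Python before sending. This is a sketch, not an SDK call: the transport is omitted, and field names should be checked against the current OpenAI API docs.

```python
# Build the request body shown in the JSON above, validating the
# reasoning effort against the documented levels before sending.

ALLOWED_EFFORT = {"minimal", "low", "medium", "high"}

def build_request(prompt: str, effort: str = "medium",
                  verbosity: str = "low") -> dict:
    """Assemble a GPT-5 request body; raise on an unknown effort level."""
    if effort not in ALLOWED_EFFORT:
        raise ValueError(f"unknown reasoning effort: {effort!r}")
    return {
        "model": "gpt-5",
        "input": prompt,
        "reasoning": {"effort": effort},
        "text": {"verbosity": verbosity},
    }
```

Centralizing the body construction like this makes it easy to log effort/verbosity settings per request, as suggested under Practical Implications.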