GPT-5.4 Family — March 2026

What Changed

The GPT-5.4 family (Thinking, Pro, mini, nano) brought native computer use, up to 1,050,000 token context, and a tiered pricing structure down to $0.20/M tokens for nano. This established the pattern of model families: a flagship for quality, a mini for cost, and a nano for edge/volume use cases — all from the same training run via distillation.

Key Technical Details

Native computer use: the model generates structured action sequences (click, type, scroll, screenshot-then-reason) in a loop — not just text. The action space is formalized:

\[ a_t = \pi_\theta(s_t), \quad s_{t+1} = \text{env}(s_t, a_t) \]

where \(s_t\) is the current screenshot/DOM state and \(a_t\) is a structured action.

In Plain English

The model is now an agent operating in a GUI. Each step: observe state (screenshot or structured UI tree), reason, emit a structured action (click, type), then observe the next state. Training and safety analysis map naturally to an MDP: policy \(\pi_\theta\) maps observations to actions; the environment advances the state.
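The structured action space described above can be sketched as plain types. This is an illustrative sketch only; the class and field names are assumptions, not the vendor's actual schema:

```python
from dataclasses import dataclass
from typing import Union

# Hypothetical structured actions for a GUI agent.
# Field names are illustrative, not a real API.

@dataclass
class Click:
    x: int  # screen coordinates
    y: int

@dataclass
class Type:
    text: str  # text to enter at the focused element

@dataclass
class Scroll:
    dy: int  # vertical scroll delta in pixels

@dataclass
class Screenshot:
    pass  # request a fresh observation before reasoning

# The policy emits one of these per step instead of free-form text.
Action = Union[Click, Type, Scroll, Screenshot]
```

Typed actions make the policy's output machine-checkable: an emitted action either parses into one of these variants or is rejected before it touches the environment.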

Technical Details
  • GPT-5.4 standard: $2.50 in / $15.00 out per million tokens.
  • GPT-5.4 Pro: $30.00 in / $180.00 out per million tokens — extended reasoning.
  • GPT-5.4 mini: $0.75 in / $4.50 out per million tokens.
  • GPT-5.4 nano: $0.20 in / $1.25 out per million tokens — optimized for high-volume, latency-sensitive calls.
  • 1M token context: full document, codebase, or conversation history in a single prompt.
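The per-million-token prices above translate directly into a per-request cost estimate. A minimal sketch, using the tier prices listed here (verify current vendor pricing; model keys are illustrative):

```python
# Illustrative per-million-token (input, output) prices from the tiers above.
PRICES = {
    "gpt-5.4":      (2.50, 15.00),
    "gpt-5.4-pro":  (30.00, 180.00),
    "gpt-5.4-mini": (0.75, 4.50),
    "gpt-5.4-nano": (0.20, 1.25),
}

def cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated request cost: (tokens / 1e6) * per-million price, summed."""
    in_price, out_price = PRICES[model]
    return input_tokens / 1e6 * in_price + output_tokens / 1e6 * out_price
```

For example, a nano call with 1,000,000 input tokens and 100,000 output tokens costs $0.20 + $0.125 = $0.325 under these numbers, which is why long prompts shift spend toward prefill.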

Practical Implications

Pricing tiers let you route traffic: nano/mini for classification and extraction, standard/Pro for reasoning-heavy or computer-use tasks. For computer use, enforce allowlisted domains, read-only filesystems, and human confirmation for irreversible actions. Long 1M context shifts cost to prefill — profile prompt size and cache reuse.
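The routing idea above can be sketched as a simple policy. This is a hedged sketch under assumptions: the task labels and model keys are hypothetical, and a production router would use measured quality/cost data rather than hard-coded rules:

```python
# Hypothetical routing policy matching the tiering described above:
# cheap tiers for classification/extraction, flagship/Pro for
# reasoning-heavy or computer-use work. Labels are illustrative.

def route(task: str, needs_computer_use: bool = False) -> str:
    if needs_computer_use:
        # Computer use gets the strongest tier; irreversible actions
        # should still be gated by human confirmation elsewhere.
        return "gpt-5.4-pro"
    if task in {"classification", "extraction"}:
        return "gpt-5.4-nano"
    if task in {"summarization", "drafting"}:
        return "gpt-5.4-mini"
    return "gpt-5.4"  # default: reasoning-heavy work
```

The safety guardrails (allowlists, read-only filesystems, confirmation gates) sit outside the router: routing decides cost/quality, not permissions.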

Interview Questions

  1. How do you distill a frontier model into mini/nano variants without losing critical capabilities? What training signals are preserved vs. compressed away?
  2. What new safety challenges arise from native computer use that don't exist for text-only models?

Code Example

Pricing tiers (per million tokens, illustrative — verify current vendor pricing):

Variant        Input    Output
GPT-5.4        $2.50    $15.00
GPT-5.4 Pro    $30.00   $180.00
GPT-5.4 mini   $0.75    $4.50
GPT-5.4 nano   $0.20    $1.25

MDP-style action sketch:

s_t = encode(screenshot, DOM, goal)
a_t ~ π_θ(· | s_t)   # structured action: click(x,y), type(text), ...
s_{t+1} = Environment.step(s_t, a_t)
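The sketch above can be made runnable with toy stand-ins: here state is just a counter and actions are strings, so the policy and environment are placeholders for a real screenshot/DOM encoder and GUI executor. All names are illustrative, not the vendor's API:

```python
def policy(state: int) -> str:
    # Stand-in for pi_theta: a real policy would condition on a
    # screenshot/DOM encoding and emit a structured action.
    return "click" if state % 2 == 0 else "type"

def env_step(state: int, action: str) -> int:
    # Stand-in for env(s_t, a_t): a real environment would execute
    # the action in a GUI and return the next observation.
    return state + 1

def rollout(s0: int, max_steps: int = 5) -> list[str]:
    """Run the observe -> act -> advance loop and return the action trace."""
    s, trace = s0, []
    for _ in range(max_steps):
        a = policy(s)        # a_t = pi_theta(s_t)
        trace.append(a)
        s = env_step(s, a)   # s_{t+1} = env(s_t, a_t)
    return trace

# rollout(0) alternates actions as the toy state increments:
# ["click", "type", "click", "type", "click"]
```

The loop structure, not the toy internals, is the point: every real computer-use trajectory is this same observe/act/advance cycle, which is what makes MDP-style safety analysis applicable.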