TL;DR

OpenAI shipped ChatGPT Images 2.0 on April 21, 2026, powered by a new model called gpt-image-2. It is the first OpenAI image model with native reasoning: it can plan composition, search the web for real-time facts, and double-check its own outputs before rendering. A single prompt can now return up to 8 coherent images with character and object continuity, at up to 2K resolution, across aspect ratios from 3:1 to 1:3. DALL-E 2 and DALL-E 3 are being retired on May 12, 2026.

What's new

OpenAI frames Images 2.0 as a move from rendering tool to "visual thought partner." Two modes ship out of the box:

  • Instant — fast single-shot generation, available to every ChatGPT and Codex user.
  • Thinking — deliberate mode that reasons through structure, searches the web, and verifies outputs. Limited to Plus, Pro, and Business subscribers.

Thinking mode unlocks the headline tricks: 8-image sets with persistent characters and objects (storyboards, multi-panel manga, room-by-room redesigns), dense multilingual text, and self-correction before delivery. Fine-grained rendering also improves across the board — small text, iconography, UI elements, dense compositions. Native multilingual support now covers Japanese, Korean, Chinese, Hindi, and Bengali, with language treated as part of the design rather than a label slapped on top.
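
For API users, an 8-image set maps naturally onto a single request. Here is a minimal sketch in Python, assuming gpt-image-2 is exposed through the existing OpenAI Images API with the same n, size, and quality parameters as gpt-image-1 (the model name comes from the launch announcement; whether Thinking mode is selected automatically or via a separate parameter is not specified, so none is passed here):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Assumption: gpt-image-2 uses the existing Images API surface,
# with the same n / size / quality parameters as gpt-image-1.
result = client.images.generate(
    model="gpt-image-2",
    prompt=(
        "A 4-panel storyboard of the same astronaut character: "
        "suiting up, launch, spacewalk, splashdown. Keep the "
        "character design consistent across panels."
    ),
    n=4,               # up to 8 per prompt in Thinking mode, per the launch notes
    size="1024x1536",  # portrait; ratios from 3:1 to 1:3 are supported
    quality="high",
)

for i, image in enumerate(result.data):
    print(i, image.url or "(base64 payload)")
```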

Why it matters

Diffusion models historically reconstructed images from noise and treated text as a few stray pixels, which is why you got "enchuita" and "burrto" on AI-made Mexican menus. Images 2.0 flips that: it spells. TechCrunch reports that a generated Mexican menu from the new model would pass for real in a restaurant. Reliable text changes what image models are useful for — infographics, ads, slides, maps, UI mockups, and educational graphics become production-ready instead of rough drafts.

Reasoning is the second unlock. Traditional image generators produce one output per prompt with no self-check loop. Thinking mode gives gpt-image-2 a plan-and-verify stage that, per OpenAI, cuts the reroll tax on wrong object counts, mislabeled diagrams, and inconsistent character details across frames.

Technical facts

Property                  gpt-image-2
Launch date               April 21, 2026
Modes                     Instant + Thinking
Max images per prompt     8 (Thinking mode)
Max resolution            2K via API (above 2K in beta)
Aspect ratios             3:1 to 1:3
Multilingual rendering    Japanese, Korean, Chinese, Hindi, Bengali
Knowledge cutoff          December 2025
Surfaces                  ChatGPT, Codex, API

API pricing is token-based: $8 per million image input tokens, $30 per million image output tokens, text tokens $5 input / $10 output. A 1024×1024 generation costs $0.006 (low), $0.053 (medium), or $0.211 (high). Larger 1024×1536 outputs are actually slightly cheaper at the high tier: $0.165.
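
Those per-image figures make batch costs easy to sanity-check. A quick sketch using only the prices quoted above; the price table and helper are illustrative, not part of any SDK:

```python
# Per-image prices quoted above for gpt-image-2 (USD).
PRICE_PER_IMAGE = {
    ("1024x1024", "low"): 0.006,
    ("1024x1024", "medium"): 0.053,
    ("1024x1024", "high"): 0.211,
    ("1024x1536", "high"): 0.165,
}

def batch_cost(size: str, quality: str, n: int = 1) -> float:
    """Estimated cost of generating n images at the quoted per-image rate."""
    return PRICE_PER_IMAGE[(size, quality)] * n

# A full 8-image Thinking-mode set, square, high quality:
print(f"${batch_cost('1024x1024', 'high', n=8):.3f}")  # $1.688
# The same set at the larger 1024x1536 size comes out cheaper:
print(f"${batch_cost('1024x1536', 'high', n=8):.3f}")  # $1.320
```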

Comparison

Before launch, Google's Gemini held the top spot on the LM Arena text-to-image leaderboard, with OpenAI's previous gpt-image-1.5 in second. Google's Nano Banana Pro pioneered thinking-before-drawing and had a clear edge in avoiding the over-smooth "AI look." gpt-image-2 copies that playbook and pushes further on text rendering and multilingual support.

Resolution & quality    gpt-image-2    gpt-image-1.5
1024×1024, high         $0.211         $0.133
1024×1536, high         $0.165         $0.200
1024×1024, low          $0.006         $0.009

Standard square high-quality output costs more on gpt-image-2 than on gpt-image-1.5; the larger 1024×1536 size costs less; the low tier is cheaper across the board.

Use cases

  • Marketers — localized ad creative, social graphic series, banner-to-mobile-story without post-processing.
  • Educators & technical writers — infographics and diagrams where correctness matters.
  • Designers & comic artists — storyboards, multi-panel manga, iterative design exploration in a single prompt.
  • Developers — UI directions and prototype screenshots inside Codex, no separate API key.
  • Small businesses — menus, posters, signage with reliable text, including non-Latin scripts.

Canva creative strategist Dwayne Koh, quoted by OpenAI: "The model wasn't just rendering images. It was interpreting briefs, understanding audiences, and making creative decisions behind the scenes."

Limitations & pricing

  • Physical reasoning is still shaky — origami guides, Rubik's Cubes, objects on angled or reversed surfaces.
  • Very fine repetitive detail (grains of sand) can still exceed fidelity limits; dense part diagrams may need manual review.
  • Iterative editing stalls — Wharton professor Ethan Mollick notes edits work for the first round or two, then progress plateaus. Workaround: drop the image into a fresh chat to reset context.
  • Above-2K outputs are still in API beta and can be inconsistent.
  • Thinking mode is paywalled — free ChatGPT users get the base quality bump but not the 8-image sets or web search.

What's next

DALL-E 2 and DALL-E 3 retire on May 12, 2026, making gpt-image-2 the canonical OpenAI image endpoint. The company calls physical-world reasoning and high-density detail "important frontiers for future work," and is signaling a bigger strategic bet: image generation as a core interface layer for AI, not a side feature.

Sources: OpenAI, TechCrunch, The New Stack, The Decoder, PetaPixel.