- OpenAI launched ChatGPT Images 2.0 on April 21, 2026.
- The new gpt-image-2 model reasons mid-generation, produces up to 8 coherent images per prompt, renders dense text and non-Latin scripts, and retires DALL-E 2 and DALL-E 3 on May 12.
TL;DR
OpenAI shipped ChatGPT Images 2.0 on April 21, 2026, powered by a new model called gpt-image-2. It is the first OpenAI image model with native reasoning: it can plan composition, search the web for real-time facts, and double-check its own outputs before rendering. A single prompt can now return up to 8 coherent images with character and object continuity, at up to 2K resolution, across aspect ratios from 3:1 to 1:3. DALL-E 2 and DALL-E 3 are being retired on May 12.
What's new
OpenAI frames Images 2.0 as a move from rendering tool to "visual thought partner." Two modes ship out of the box:
- Instant — fast single-shot generation, available to every ChatGPT and Codex user.
- Thinking — deliberate mode that reasons through structure, searches the web, and verifies outputs. Limited to Plus, Pro, and Business subscribers.
Thinking mode unlocks the headline tricks: 8-image sets with persistent characters and objects (storyboards, multi-panel manga, room-by-room redesigns), dense multilingual text, and self-correction before delivery. Fine-grained rendering also improves across the board — small text, iconography, UI elements, dense compositions. Native multilingual support now covers Japanese, Korean, Chinese, Hindi, and Bengali, with language treated as part of the design rather than a label slapped on top.
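The 8-image Thinking-mode sets can be sketched as a request builder. This is a minimal sketch assuming gpt-image-2 is exposed through an Images-API-style interface; the model name and the 8-image cap come from the article, but the exact parameter names are assumptions, not a confirmed API.

```python
def build_storyboard_request(prompt, panels=8, size="2048x2048"):
    """Assemble a hypothetical Images-API-style payload for a multi-panel set.

    `panels` maps to the article's 8-image Thinking-mode limit; `size`
    reflects the 2K maximum resolution. Field names are assumptions.
    """
    if not 1 <= panels <= 8:
        raise ValueError("Thinking mode returns at most 8 images per prompt")
    return {
        "model": "gpt-image-2",
        "prompt": prompt,
        "n": panels,
        "size": size,
    }
```

A four-panel manga request, for example, would set `panels=4` and rely on the model's character continuity to keep the same cast across frames.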
Why it matters
Diffusion models historically reconstructed images from noise and treated text as a few stray pixels, which is why you got "enchuita" and "burrto" on AI-made Mexican menus. Images 2.0 flips that: it spells. TechCrunch reports that a generated Mexican menu from the new model would pass for real in a restaurant. Reliable text changes what image models are useful for — infographics, ads, slides, maps, UI mockups, and educational graphics become production-ready instead of rough drafts.
Reasoning is the second unlock. Traditional image generators produce one output per prompt with no self-check loop. Thinking mode gives gpt-image-2 a plan-and-verify stage that, per OpenAI, cuts the reroll tax on wrong object counts, mislabeled diagrams, and inconsistent character details across frames.
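The plan-and-verify stage can be illustrated as a simple loop. This is a conceptual sketch only: `plan`, `render`, and `verify` are hypothetical stand-ins for internal stages described in the article, not OpenAI APIs.

```python
def generate_with_verification(prompt, plan, render, verify, max_attempts=3):
    """Plan composition, render, self-check, and retry with feedback.

    Illustrates the Thinking-mode idea of catching wrong object counts
    or mislabeled diagrams before delivering an image.
    """
    layout = plan(prompt)                    # decide composition before drawing
    for _ in range(max_attempts):
        image = render(prompt, layout)
        ok, issues = verify(prompt, image)   # self-check counts, labels, continuity
        if ok:
            return image
        layout = plan(prompt, feedback=issues)  # revise the plan and reroll
    return image                             # best effort after max_attempts
```

The point of the loop is that a failed check feeds back into planning instead of forcing the user to reroll manually.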
Technical facts
| Property | gpt-image-2 |
|---|---|
| Launch date | April 21, 2026 |
| Modes | Instant + Thinking |
| Max images per prompt | 8 (Thinking mode) |
| Max resolution | 2K via API (above 2K in beta) |
| Aspect ratios | 3:1 to 1:3 |
| Multilingual rendering | Japanese, Korean, Chinese, Hindi, Bengali |
| Knowledge cutoff | December 2025 |
| Surfaces | ChatGPT, Codex, API |
API pricing is token-based: $8 per million image input tokens, $30 per million image output tokens, and $5/$10 per million text input/output tokens. A 1024×1024 generation costs $0.006 (low), $0.053 (medium), or $0.211 (high). Counterintuitively, larger 1024×1536 outputs are slightly cheaper at the high tier: $0.165.
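The token rates above can be turned into a rough cost estimator. The dollar rates are taken from the article; per-image token counts are not published there, so any counts passed in are placeholders, not real measurements.

```python
# $ per 1M tokens, from the published gpt-image-2 API rates
RATES = {
    "image_in": 8.00,
    "image_out": 30.00,
    "text_in": 5.00,
    "text_out": 10.00,
}

def request_cost(tokens):
    """Estimate dollar cost for a request.

    `tokens` maps rate keys (e.g. "image_out") to token counts;
    missing keys count as zero.
    """
    return sum(tokens.get(key, 0) / 1_000_000 * rate
               for key, rate in RATES.items())
```

For instance, 250k image input tokens plus 100k image output tokens would come to $2 + $3 = $5 under these rates.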
Comparison
Before launch, Google's Gemini held the top spot on the LM Arena text-to-image leaderboard, with OpenAI's previous gpt-image-1.5 in second. Google's Nano Banana Pro pioneered thinking-before-drawing and had a clear edge in avoiding the over-smooth "AI look." gpt-image-2 copies that playbook and pushes further on text rendering and multilingual support.
| Resolution & quality | gpt-image-2 ($/image) | gpt-image-1.5 ($/image) |
|---|---|---|
| 1024×1024, high | $0.211 | $0.133 |
| 1024×1536, high | $0.165 | $0.20 |
| 1024×1024, low | $0.006 | $0.009 |
High-quality square output costs more than on gpt-image-1.5, the larger aspect ratio costs less, and the low tier is cheaper across the board.
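The comparison table reduces to a small lookup. The per-image prices are the ones listed above; the helper simply reports how much more (positive) or less (negative) gpt-image-2 costs than gpt-image-1.5 for a given configuration.

```python
# Per-image prices ($) from the comparison table
PRICE = {
    "gpt-image-2": {
        ("1024x1024", "high"): 0.211,
        ("1024x1536", "high"): 0.165,
        ("1024x1024", "low"): 0.006,
    },
    "gpt-image-1.5": {
        ("1024x1024", "high"): 0.133,
        ("1024x1536", "high"): 0.20,
        ("1024x1024", "low"): 0.009,
    },
}

def price_delta(size, quality):
    """Positive means gpt-image-2 costs more than gpt-image-1.5."""
    key = (size, quality)
    return round(PRICE["gpt-image-2"][key] - PRICE["gpt-image-1.5"][key], 3)
```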
Use cases
- Marketers — localized ad creative, social graphic series, banner-to-mobile-story without post-processing.
- Educators & technical writers — infographics and diagrams where correctness matters.
- Designers & comic artists — storyboards, multi-panel manga, iterative design exploration in a single prompt.
- Developers — UI directions and prototype screenshots inside Codex, no separate API key.
- Small businesses — menus, posters, signage with reliable text, including non-Latin scripts.
Canva creative strategist Dwayne Koh, quoted by OpenAI: "The model wasn't just rendering images. It was interpreting briefs, understanding audiences, and making creative decisions behind the scenes."
Limitations & pricing
- Physical reasoning is still shaky — origami guides, Rubik's Cubes, objects on angled or reversed surfaces.
- Very fine repetitive detail (grains of sand) can still exceed fidelity limits; dense part diagrams may need manual review.
- Iterative editing stalls — Wharton professor Ethan Mollick notes edits work for the first round or two, then progress plateaus. Workaround: drop the image into a fresh chat to reset context.
- Above-2K outputs are still in API beta and can be inconsistent.
- Thinking mode is paywalled — free ChatGPT users get the base quality bump but not the 8-image sets or web search.
What's next
DALL-E 2 and DALL-E 3 retire on May 12, 2026, making gpt-image-2 the canonical OpenAI image endpoint. The company calls physical-world reasoning and high-density detail "important frontiers for future work," and is signaling a bigger strategic bet: image generation as a core interface layer for AI, not a side feature.
Sources: OpenAI, TechCrunch, The New Stack, The Decoder, PetaPixel.

