- OpenAI launched ChatGPT Images 2.0 on April 21, 2026.
- The new gpt-image-2 model reasons mid-generation, produces up to 8 coherent images per prompt, renders dense text and non-Latin scripts, and retires DALL-E 2 and DALL-E 3 on May 12.
TL;DR
OpenAI shipped ChatGPT Images 2.0 on April 21, 2026, powered by a new model called gpt-image-2. It is the first OpenAI image model with native reasoning: it can plan composition, search the web for real-time facts, and double-check its own outputs before rendering. A single prompt can now return up to 8 coherent images with character and object continuity, at up to 2K resolution, across aspect ratios from 3:1 to 1:3. DALL-E 2 and DALL-E 3 are being retired on May 12.
What's new
OpenAI frames Images 2.0 as a move from rendering tool to "visual thought partner." Two modes ship out of the box:
- Instant — fast single-shot generation, available to every ChatGPT and Codex user.
- Thinking — deliberate mode that reasons through structure, searches the web, and verifies outputs. Limited to Plus, Pro, and Business subscribers.
Thinking mode unlocks the headline tricks: 8-image sets with persistent characters and objects (storyboards, multi-panel manga, room-by-room redesigns), dense multilingual text, and self-correction before delivery. Fine-grained rendering also improves across the board — small text, iconography, UI elements, dense compositions. Native multilingual support now covers Japanese, Korean, Chinese, Hindi, and Bengali, with language treated as part of the design rather than a label slapped on top.
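The 8-image Thinking-mode sets can be sketched as a request builder. This is a minimal sketch assuming gpt-image-2 is exposed through an Images-API-style interface; the model name and the 8-image cap come from the article, but the exact parameter names are assumptions, not a confirmed API.

```python
def build_storyboard_request(prompt, panels=8, size="2048x2048"):
    """Assemble a hypothetical Images-API-style payload for a multi-panel set.

    `panels` maps to the article's 8-image Thinking-mode limit; `size`
    reflects the 2K maximum resolution. Field names are assumptions.
    """
    if not 1 <= panels <= 8:
        raise ValueError("Thinking mode returns at most 8 images per prompt")
    return {
        "model": "gpt-image-2",
        "prompt": prompt,
        "n": panels,
        "size": size,
    }
```

A four-panel manga request, for example, would set `panels=4` and rely on the model's character continuity to keep the same cast across frames.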
Why it matters
Diffusion models historically reconstructed images from noise and treated text as a few stray pixels, which is why you got "enchuita" and "burrto" on AI-made Mexican menus. Images 2.0 flips that: it spells. TechCrunch reports that a generated Mexican menu from the new model would pass for real in a restaurant. Reliable text changes what image models are useful for — infographics, ads, slides, maps, UI mockups, and educational graphics become production-ready instead of rough drafts.
Reasoning is the second unlock. Traditional image generators produce one output per prompt with no self-check loop. Thinking mode gives gpt-image-2 a plan-and-verify stage that, per OpenAI, cuts the reroll tax on wrong object counts, mislabeled diagrams, and inconsistent character details across frames.
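The plan-and-verify stage can be illustrated as a simple loop. This is a conceptual sketch only: `plan`, `render`, and `verify` are hypothetical stand-ins for internal stages described in the article, not OpenAI APIs.

```python
def generate_with_verification(prompt, plan, render, verify, max_attempts=3):
    """Plan composition, render, self-check, and retry with feedback.

    Illustrates the Thinking-mode idea of catching wrong object counts
    or mislabeled diagrams before delivering an image.
    """
    layout = plan(prompt)                    # decide composition before drawing
    for _ in range(max_attempts):
        image = render(prompt, layout)
        ok, issues = verify(prompt, image)   # self-check counts, labels, continuity
        if ok:
            return image
        layout = plan(prompt, feedback=issues)  # revise the plan and reroll
    return image                             # best effort after max_attempts
```

The point of the loop is that a failed check feeds back into planning instead of forcing the user to reroll manually.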
Technical facts
| Property | gpt-image-2 |
|---|---|
| Launch date | April 21, 2026 |
| Modes | Instant + Thinking |
| Max images per prompt | 8 (Thinking mode) |
| Max resolution | 2K via API (above 2K in beta) |
| Aspect ratios | 3:1 to 1:3 |
| Multilingual rendering | Japanese, Korean, Chinese, Hindi, Bengali |
| Knowledge cutoff | December 2025 |
| Surfaces | ChatGPT, Codex, API |
API pricing is token-based: $8 per million image input tokens, $30 per million image output tokens, and $5/$10 per million text input/output tokens. A 1024×1024 generation costs $0.006 (low), $0.053 (medium), or $0.211 (high). Counterintuitively, larger 1024×1536 outputs are slightly cheaper at the high tier: $0.165.
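The token rates above can be turned into a rough cost estimator. The dollar rates are taken from the article; per-image token counts are not published there, so any counts passed in are placeholders, not real measurements.

```python
# $ per 1M tokens, from the published gpt-image-2 API rates
RATES = {
    "image_in": 8.00,
    "image_out": 30.00,
    "text_in": 5.00,
    "text_out": 10.00,
}

def request_cost(tokens):
    """Estimate dollar cost for a request.

    `tokens` maps rate keys (e.g. "image_out") to token counts;
    missing keys count as zero.
    """
    return sum(tokens.get(key, 0) / 1_000_000 * rate
               for key, rate in RATES.items())
```

For instance, 250k image input tokens plus 100k image output tokens would come to $2 + $3 = $5 under these rates.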
Comparison
Before launch, Google's Gemini held the top spot on the LM Arena text-to-image leaderboard, with OpenAI's previous gpt-image-1.5 in second. Google's Nano Banana Pro pioneered thinking-before-drawing and had a clear edge in avoiding the over-smooth "AI look." gpt-image-2 copies that playbook and pushes further on text rendering and multilingual support.
| Resolution & quality | gpt-image-2 ($/image) | gpt-image-1.5 ($/image) |
|---|---|---|
| 1024×1024, high | $0.211 | $0.133 |
| 1024×1536, high | $0.165 | $0.20 |
| 1024×1024, low | $0.006 | $0.009 |
High-quality square output costs more than on gpt-image-1.5, the larger aspect ratio costs less, and the low tier is cheaper across the board.
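The comparison table reduces to a small lookup. The per-image prices are the ones listed above; the helper simply reports how much more (positive) or less (negative) gpt-image-2 costs than gpt-image-1.5 for a given configuration.

```python
# Per-image prices ($) from the comparison table
PRICE = {
    "gpt-image-2": {
        ("1024x1024", "high"): 0.211,
        ("1024x1536", "high"): 0.165,
        ("1024x1024", "low"): 0.006,
    },
    "gpt-image-1.5": {
        ("1024x1024", "high"): 0.133,
        ("1024x1536", "high"): 0.20,
        ("1024x1024", "low"): 0.009,
    },
}

def price_delta(size, quality):
    """Positive means gpt-image-2 costs more than gpt-image-1.5."""
    key = (size, quality)
    return round(PRICE["gpt-image-2"][key] - PRICE["gpt-image-1.5"][key], 3)
```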
Use cases
- Marketers — localized ad creative, social graphic series, banner-to-mobile-story without post-processing.
- Educators & technical writers — infographics and diagrams where correctness matters.
- Designers & comic artists — storyboards, multi-panel manga, iterative design exploration in a single prompt.
- Developers — UI directions and prototype screenshots inside Codex, no separate API key.
- Small businesses — menus, posters, signage with reliable text, including non-Latin scripts.
Canva creative strategist Dwayne Koh, quoted by OpenAI: "The model wasn't just rendering images. It was interpreting briefs, understanding audiences, and making creative decisions behind the scenes."
Limitations & pricing
- Physical reasoning is still shaky — origami guides, Rubik's Cubes, objects on angled or reversed surfaces.
- Very fine repetitive detail (grains of sand) can still exceed fidelity limits; dense part diagrams may need manual review.
- Iterative editing stalls — Wharton professor Ethan Mollick notes edits work for the first round or two, then progress plateaus. Workaround: drop the image into a fresh chat to reset context.
- Above-2K outputs are still in API beta and can be inconsistent.
- Thinking mode is paywalled — free ChatGPT users get the base quality bump but not the 8-image sets or web search.
What's next
DALL-E 2 and DALL-E 3 retire on May 12, 2026, making gpt-image-2 the canonical OpenAI image endpoint. The company calls physical-world reasoning and high-density detail "important frontiers for future work," and is signaling a bigger strategic bet: image generation as a core interface layer for AI, not a side feature.
Sources: OpenAI, TechCrunch, The New Stack, The Decoder, PetaPixel.

