TL;DR

On 2026-04-16, OpenAI shipped Codex app 26.415 — the release that turns Codex from a one-shot coding assistant into a long-horizon teammate. Three features carry the weight: reusable threads, scheduled thread automations that let the agent wake itself up, and a persistent memory preview that carries preferences and project conventions across sessions. Real numbers back it up: GPT-5.3-Codex has already run ~25 hours uninterrupted, burned ~13M tokens, and shipped ~30k lines of code in a single autonomous run.

What's new in 26.415

  • Threads as first-class objects. Chats no longer require picking a project folder. Start a thread for research, planning, or tool-driven work, and come back to it later with full context intact.
  • Scheduled automations. A thread can wake itself on a cron-like schedule, check a long-running process, watch external sources, and continue a follow-up loop — across days or weeks.
  • Persistent memory (preview). Codex retains stable preferences, project conventions, user corrections, and recurring work patterns across threads. CLI 0.121.0 exposes memory mode, reset, and extension cleanup controls.
  • Bigger surface area. The same release ships an in-app browser, background macOS computer use, GPT-Image-1.5 for UI mockups, in-app GitHub PR review, and 90+ new plugins (Atlassian Rovo, CircleCI, CodeRabbit, GitLab Issues, Microsoft Suite, Neon, Remotion, Render, Superpowers).

Why it matters

Single-shot prompts and tight pair-programming loops are a local maximum. What was missing for real delegation was durable state: a way for the agent to remember how you work, pause a task, and pick it back up without a human prompt. That's exactly what threads + automations + memory deliver. The mental model shifts from "assistant you babysit" to "teammate you steer at milestones." OpenAI is explicit about the direction: less micromanaging, more delegation with guardrails.

Technical facts

| Signal | Number |
| --- | --- |
| Longest continuous GPT-5.3-Codex run | ~25 hours |
| Tokens consumed in that run | ~13M |
| Lines of code shipped in that run | ~30,000 |
| GPT-5.4 experimental context window | 1M tokens |
| GPT-5.3-Codex-Spark throughput (Cerebras) | >1,000 tok/s, 128k context |
| Container start time | 48s → 5s (~90% reduction) |
| Max task diff size | 5 MB (up from 1 MB) |
| Setup script ceiling (Pro / Business) | 20 minutes |
| Task-reliability doubling time (METR) | ~7 months |
| New plugins shipped in 26.415 | 90+ |

How "wake itself up" actually works

A thread automation is a recurring schedule attached to a specific conversation thread. When it fires, Codex reopens that thread — with its full history, plans, and artifact sidebar — and runs the next step. In practice, teams use it for three things (a sketch of the polling case follows the list):

  1. Polling long-running processes (CI, deploys, cloud jobs) and reacting when state changes.
  2. Monitoring external tools — open PR comments, Slack threads, Gmail, Notion docs, Google Docs comments — then summarizing or acting.
  3. Multi-day engineering work — pause, sleep, resume on the next milestone without replaying context.
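
To make the polling case concrete, here is a minimal sketch of what one wake-up of such an automation amounts to. This is not the Codex API: check_pipeline() and notify() are hypothetical stand-ins, and the only real call is the GitHub CLI's gh run list, used here as an example CI status source.

```python
import json
import subprocess

# One "wake-up" of a polling automation, conceptually: read the current state of a
# long-running process, act only if it changed, and hand the new state back to the
# thread so the next wake-up can diff against it. Not the Codex API; check_pipeline()
# and notify() are hypothetical stand-ins.

def check_pipeline() -> str:
    """Ask the CI system (here: GitHub Actions via the gh CLI) for the latest run state."""
    out = subprocess.run(
        ["gh", "run", "list", "--limit", "1", "--json", "status,conclusion"],
        capture_output=True, text=True, check=True,
    )
    run = json.loads(out.stdout)[0]
    return run["conclusion"] or run["status"]

def notify(message: str) -> None:
    """Placeholder for the 'react' step: summarize, comment, or continue the plan."""
    print(message)

def on_wake(previous_state: str) -> str:
    """One automation tick: check, diff, react, return state for the next tick."""
    state = check_pipeline()
    if state != previous_state:
        notify(f"Pipeline moved from {previous_state!r} to {state!r}")
    return state
```

The design point is that state lives with the thread between wake-ups, so each tick is cheap and self-contained on its own.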

For in-session durability, the OpenAI team also recommends a "durable project memory" file stack: Prompt.md (spec + deliverables), Plan.md (milestones + validations), Implement.md (runbook), Documentation.md (live audit log). This is what kept the 25-hour, 30k-LOC run coherent end to end.
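
The four file names come straight from that recommendation; the sketch below only shows one way to bootstrap them at the repo root before kicking off a long run, and the skeleton headings are a reasonable starting point rather than a prescribed format.

```python
from pathlib import Path

# Bootstrap the four-file "durable project memory" stack. File names follow the
# recommendation above; the skeleton headings are illustrative, not prescribed.
SKELETONS = {
    "Prompt.md": "# Prompt\n\n## Spec\n\n## Deliverables\n",
    "Plan.md": "# Plan\n\n## Milestones\n\n## Validations\n",
    "Implement.md": "# Implement\n\n## Runbook\n",
    "Documentation.md": "# Documentation\n\n## Live audit log\n",
}

def bootstrap_memory_stack(root: str = ".") -> None:
    """Create any missing memory files without overwriting existing ones."""
    for name, body in SKELETONS.items():
        path = Path(root) / name
        if not path.exists():
            path.write_text(body)

if __name__ == "__main__":
    bootstrap_memory_stack()
```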

Codex vs Claude Code

Claude Code has been the default pick for long, complex codebase work thanks to strong reasoning. Codex was the faster execution-first alternative but lagged at long horizons. 26.415 targets that specific gap:

| Capability | Codex (Apr 2026) | Claude Code |
| --- | --- | --- |
| Autonomous multi-day runs | Thread automations + memory | Agent SDK + MCP servers |
| Desktop app control | Background computer use (macOS) | Via MCP / external tooling |
| Multi-agent in parallel | Yes, inside Codex app | Yes, via subagents |
| Visual / UI generation | GPT-Image-1.5 built-in | Claude Design (separate) |
| Plugin ecosystem | 90+ native + MCP | MCP marketplace |

Real use cases

  • Migrations & refactors that stretch across a sprint — let the agent checkpoint progress, run validation, and resume the next morning.
  • Operational watchers — a thread that wakes every hour to triage new GitHub issues, label them, and draft replies (the fetch step is sketched after this list).
  • Morning standup assistant — on wake, Codex pulls open Slack threads, failing CI, and Notion comments into a prioritized list.
  • Non-developers — PMs and designers can drive real software work because plans, previews, and rollbacks give safe scaffolding without an IDE.
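
For the operational-watcher case, the polling half is ordinary API plumbing. A minimal sketch using the GitHub REST API, assuming OWNER/REPO placeholders and a GITHUB_TOKEN environment variable; the triage itself (labeling, drafting replies) is left to the agent:

```python
import os
from datetime import datetime, timedelta, timezone

import requests

# Fetch open issues updated in the last hour via the GitHub REST API. OWNER/REPO are
# placeholders; labeling and reply drafting are the agent's job and are not shown here.
API = "https://api.github.com/repos/OWNER/REPO/issues"

def issues_updated_since(hours: int = 1) -> list[dict]:
    since = (datetime.now(timezone.utc) - timedelta(hours=hours)).isoformat()
    resp = requests.get(
        API,
        params={"state": "open", "since": since, "per_page": 100},
        headers={
            "Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}",
            "Accept": "application/vnd.github+json",
        },
        timeout=30,
    )
    resp.raise_for_status()
    # The issues endpoint also returns pull requests; keep only true issues.
    return [item for item in resp.json() if "pull_request" not in item]
```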

Limitations & pricing

Limitations: background computer use is not available in EEA, UK, or Switzerland at launch. Memory is a preview — Enterprise, Edu, and EU/UK rollout is "soon." SSH to remote devboxes is still alpha. Codex-Spark is text-only and may queue under load.

Pricing tiers: ChatGPT Free and Go include Codex for a limited time. Plus and Pro get doubled rate limits and on-demand credits for overage (Pro also gets Codex-Spark). Business, Enterprise, and Edu add admin controls, managed config, and analytics. Codex-Spark is Pro-only during the research preview.

What's next

OpenAI's near-term roadmap: expand computer use and memory personalization to EU / UK / Enterprise / Edu, push the in-app browser beyond localhost into general web control, move SSH remote devboxes from alpha to GA, and continue the ~7-month doubling cadence on task reliability. The strategic bet is clear — the next battleground isn't who writes the best function, it's who can reliably hold a coherent, multi-day engineering context. 26.415 is OpenAI's first real entry in that race.

Sources: OpenAI Codex changelog, OpenAI — Run long horizon tasks with Codex, gHacks, Help Net Security, The Tech Portal.