- Postgres Background Workers run inside the database server, not next to it.
- For small, state-driven, data-local jobs, pg_cron beats waking up an app server to run a query and write the result back.
TL;DR
Postgres Background Workers are processes forked by the postmaster that live inside the database server, attach to shared memory, and run custom logic with direct SPI access. Extensions like pg_cron use them to schedule jobs without any app server in the loop. For state-driven, data-local work — expiring rows, cleaning stale data, refreshing derived state — that is a cleaner design than a five-hop trip through your API.
What's actually new here
Background Workers themselves are old (PostgreSQL 9.3+), but the mechanism is quietly becoming the default way to run small automation inside managed Postgres.
- `pg_cron` is now at v1.6.7 and ships on every major managed Postgres: AWS RDS, Google Cloud SQL, Supabase, Neon, Azure Flexible Server.
- Since v1.5, schedules can run as often as every 1 second, using interval syntax such as `'10 seconds'`.
- v1.6 added PostgreSQL 16 support; PG18 packages (`pg_cron_18`) are available today.
- `cron.use_background_workers = on` now routes jobs through real background workers instead of localhost connections — cheaper and faster.
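In pg_cron's SQL interface, scheduling is a single function call. A minimal sketch — the job names and the `sessions` table are illustrative, not from the source:

```sql
-- Second-granularity schedule (pg_cron >= 1.5): run every 10 seconds.
-- 'purge-stale-sessions' and the sessions table are hypothetical.
SELECT cron.schedule(
  'purge-stale-sessions',
  '10 seconds',
  $$DELETE FROM sessions WHERE expires_at < now()$$
);

-- Classic five-field cron syntax still works: every night at 03:00.
SELECT cron.schedule('nightly-analyze', '0 3 * * *', 'VACUUM ANALYZE');
```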
Why it matters
The typical scheduler pattern in most stacks looks like this:
scheduler → app → database → app logic → database write
Five hops for work that never needed to surface to an application layer at all. Every hop adds latency, a failure mode, and a piece of infrastructure you have to operate. Kubernetes CronJobs die. Sidekiq workers restart. Lambda cold-starts add seconds. The data was always sitting right there.
A background worker inside Postgres collapses those hops to one process. It reads state, executes logic, writes the result — all without ever touching a network socket to an app server. No API endpoint to secure, no queue to drain, no container to keep warm. When the primary fails over, your scheduler fails over with it, because the scheduler is the database.
This is the part most teams miss: a Kubernetes CronJob hitting a Postgres-backed API is not “close to the data.” It is a distributed system pretending to do local work. Background Workers remove the pretending.
Technical facts worth knowing
- Workers are registered either at startup via `RegisterBackgroundWorker()` in `_PG_init()` (requires `shared_preload_libraries`) or dynamically at runtime via `RegisterDynamicBackgroundWorker()`.
- Two flags matter: `BGWORKER_SHMEM_ACCESS` for shared memory, `BGWORKER_BACKEND_DATABASE_CONNECTION` for a DB connection.
- `max_worker_processes` caps total registered workers. Tune this if you plan to run many concurrent jobs.
- A worker can call `BackgroundWorkerInitializeConnection()` exactly once — no switching databases mid-worker.
- `pg_cron` defaults: `cron.max_running_jobs = 32`, parallel jobs allowed, but only one instance per job at a time. Re-triggers queue.
- Run history is persisted in `cron.job_run_details`, queryable with plain SQL.
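Because run history lands in `cron.job_run_details`, checking recent runs is one query. A sketch against pg_cron's documented schema:

```sql
-- Most recent runs, newest first; failed runs show status = 'failed'
-- with the error text in return_message.
SELECT jobid, command, status, return_message, start_time, end_time
FROM cron.job_run_details
ORDER BY start_time DESC
LIMIT 20;
```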
pg_cron vs app-level cron
| Aspect | App cron (K8s, Sidekiq, Lambda) | pg_cron / Background Worker |
|---|---|---|
| Hops per job | ≥ 5 | 1 (in-process) |
| Deployment | Separate scheduler + app infra | CREATE EXTENSION pg_cron; |
| Failover | Need HA scheduler | Follows Postgres primary |
| Observability | App logs + DB logs | One SQL table |
| Runtime | Full language ecosystem | SQL / PL/pgSQL / C only |
Where this pattern wins
- Expiring stale reservations — flip `status` from `pending` to `expired` after N minutes.
- Cleaning up old rows — GDPR retention, soft-delete purges, session garbage collection.
- Refreshing derived data — `REFRESH MATERIALIZED VIEW CONCURRENTLY` on a schedule.
- Row state transitions — workflow steps that depend only on local data and a timer.
- Small maintenance jobs — partition rotation, statistics refresh, index bloat checks.
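The first and third patterns above can be sketched in two schedules. The `reservations` table and `daily_stats` materialized view are hypothetical examples:

```sql
-- Expire reservations stuck in 'pending' for more than 15 minutes.
SELECT cron.schedule(
  'expire-reservations',
  '* * * * *',  -- every minute
  $$UPDATE reservations
       SET status = 'expired'
     WHERE status = 'pending'
       AND created_at < now() - interval '15 minutes'$$
);

-- Refresh derived state every 5 minutes without blocking readers
-- (CONCURRENTLY requires a unique index on the materialized view).
SELECT cron.schedule(
  'refresh-daily-stats',
  '*/5 * * * *',
  'REFRESH MATERIALIZED VIEW CONCURRENTLY daily_stats'
);
```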
Limitations & when to stay in the app
This is not the right pattern for everything. Keep the following in your application layer:
- External API calls, payment webhooks, third-party SDKs.
- Long-running workflows and multi-service orchestration.
- Core business logic that needs a rich runtime — Python libs, TypeScript tools, ML models.
- Anything that should survive a Postgres failover independently.
Also worth noting: pg_cron jobs do not execute on hot-standby replicas, only on the primary. The extension's metadata lives in a single database per instance (set via `cron.database_name`) — use `cron.schedule_in_database()` to run jobs in other databases. And background workers written in C run with unrestricted access to the server, so audit anything you load into `shared_preload_libraries`.
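Targeting a database other than the one hosting pg_cron looks roughly like this; the `analytics` database, job name, and `events` table are hypothetical:

```sql
-- Run a retention purge in the analytics database from the
-- database where pg_cron is installed.
SELECT cron.schedule_in_database(
  'purge-analytics-events',
  '30 2 * * *',
  $$DELETE FROM events WHERE created_at < now() - interval '90 days'$$,
  'analytics'
);

-- Jobs can later be removed by name with cron.unschedule.
SELECT cron.unschedule('purge-analytics-events');
```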
What's next
The interesting trend is that the same Background Worker primitive is powering a whole small ecosystem of in-database automation: pgmq for message queues, pg_later for async SQL, pg_net for async HTTP, YugabyteDB porting pg_cron with multi-node failover. The shift is subtle but real: instead of bolting workers on top of Postgres, teams are letting Postgres be the worker.
If you are already running Postgres and already running a small fleet of app-side cron jobs, try auditing which ones actually touch anything but the database. You will probably find a handful that have no business living in your app tier at all — and moving them into pg_cron deletes more infrastructure than it adds.
The mistake was never using cron. The mistake was forcing data-local work to run far away from the data.
Sources: PostgreSQL Background Workers docs, citusdata/pg_cron, Citus Data blog, original insight by @RaulJuncoV.
