RDSPostgreSQLMaximumUsedTransaction #
Meaning #
Alert is triggered when a PostgreSQL server is closed from its maximum number of visible transactions hard limit.
Important It’s a race against time to prevent PostgreSQL from shutting down to prevent data loss.
More
PostgreSQL uses Multi-Version Concurrency Control (MVCC) to provide data access concurrency.
Transaction IDs are stored in 32 bits integers, so PostgreSQL can have up to 4 billion visible transactions.
PostgreSQL continuously recycles transaction IDs once transactions are released.
If transactions are not released, PostgreSQL won’t be able to accept new transactions (or queries).
PostgreSQL internal events:
- 40 million transactions before the upper limit is reached, WARNING messages consisting of a countdown will be logged
- 3 million transactions before the upper limit is reached, PostgreSQL goes to READ-ONLY mode
See official postgresql documentation about routine-vacuuming
Impact #
PostgreSQL will shut down and refuse to start any new transactions once there are fewer than 3 million transactions left until wraparound.
Diagnosis #
Transaction ID Wraparound can be caused by a combination of one or more of the following circumstances:
Check if there a long-running transactions on the
live dashboard
SQL
SELECT pid, leader_pid, datname as database, usename AS user, application_name, EXTRACT(EPOCH FROM now() - xact_start) as transaction_duration, CASE WHEN state = 'active' THEN EXTRACT(EPOCH FROM now() - query_start) ELSE null END query_duration, state, wait_event_type, wait_event, CASE WHEN state = 'active' THEN query ELSE null END query, xact_start as transaction_start, CASE WHEN state = 'active' THEN query_start ELSE null END query_start, CASE WHEN state != 'active' THEN EXTRACT(EPOCH FROM state_change - query_start) ELSE null END last_query_duration, CASE WHEN state != 'active' THEN query ELSE null END last_query, pg_blocking_pids(pid) as blocked_by, client_addr, backend_start, ceil(EXTRACT(EPOCH FROM now() - backend_start)) as backend_duration FROM pg_stat_activity WHERE query_start is not null AND usename!='rdsrepladmin' AND query not like 'START_REPLICATION %' AND pid != pg_backend_pid();
Check replication slots are active and don’t have lag on
live dashboard
.PostgreSQL retains WAL files until the replication slot client confirms received data.
SQL
SELECT database, slot_name, active::TEXT, xmin, pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS replication_lag, pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS replication_lag_raw FROM pg_replication_slots ORDER by replication_lag_raw, database, slot_name desc;
Check the
autovacuum
is turned onNote:
autovacumm
could be disabled on specific tablesSQL
SHOW autovacuum;
Autovacuum can’t be executed due to intense operations (SELECT, INSERT, UPDATE, DELETE)
Check PostgreSQL vacuum logs
Mitigation #
Terminate long-running queries
Vacuum databases as quickly as possible to prevent a forced shutdown.