RDSPostgreSQLMaximumUsedTransaction #

Meaning #

Alert is triggered when a PostgreSQL server is closed from its maximum number of visible transactions hard limit.

Important It’s a race against time to prevent PostgreSQL from shutting down to prevent data loss.

PostgreSQL uses Multi-Version Concurrency Control (MVCC) to provide data access concurrency.

Transaction IDs are stored in 32 bits integers, so PostgreSQL can have up to 4 billion visible transactions.

PostgreSQL continuously recycles transaction IDs once transactions are released.

If transactions are not released, PostgreSQL won’t be able to accept new transactions (or queries).

PostgreSQL internal events:

40 million transactions before the upper limit is reached, WARNING messages consisting of a countdown will be logged
3 million transactions before the upper limit is reached, PostgreSQL goes to READ-ONLY mode

See official postgresql documentation about routine-vacuuming

Impact #

PostgreSQL will shut down and refuse to start any new transactions once there are fewer than 3 million transactions left until wraparound.

Diagnosis #

Transaction ID Wraparound can be caused by a combination of one or more of the following circumstances:

Check if there a long-running transactions on the live dashboard

SQL

SELECT
    pid,
    leader_pid,
    datname as database,
    usename AS user,
    application_name,
    EXTRACT(EPOCH FROM now() - xact_start) as transaction_duration,
    CASE
        WHEN state = 'active' THEN EXTRACT(EPOCH FROM now() - query_start)
        ELSE null
    END query_duration,
    state,
    wait_event_type,
    wait_event,
    CASE
        WHEN state = 'active' THEN query
        ELSE null
    END query,
    xact_start as transaction_start,
    CASE
        WHEN state = 'active' THEN query_start
        ELSE null
    END query_start,
    CASE
        WHEN state != 'active' THEN EXTRACT(EPOCH FROM state_change - query_start)
        ELSE null
    END last_query_duration,
    CASE
        WHEN state != 'active' THEN query
        ELSE null
    END last_query,
    pg_blocking_pids(pid) as blocked_by,
    client_addr,
    backend_start,
    ceil(EXTRACT(EPOCH FROM now() - backend_start)) as backend_duration
FROM pg_stat_activity
WHERE query_start is not null
    AND usename!='rdsrepladmin'
    AND query not like 'START_REPLICATION %'
    AND pid != pg_backend_pid();

Check replication slots are active and don’t have lag on live dashboard.

PostgreSQL retains WAL files until the replication slot client confirms received data.

SQL

SELECT
    database,
    slot_name,
    active::TEXT,
    xmin,
    pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS replication_lag,
    pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS replication_lag_raw
FROM pg_replication_slots
ORDER by replication_lag_raw, database, slot_name desc;

Check the autovacuum is turned on
Note: autovacumm could be disabled on specific tables
SQL
```
SHOW autovacuum;
```
Autovacuum can’t be executed due to intense operations (SELECT, INSERT, UPDATE, DELETE)
Check PostgreSQL vacuum logs

Mitigation #

Terminate long-running queries
Vacuum databases as quickly as possible to prevent a forced shutdown.

RDSPostgreSQLMaximumUsedTransaction #

Meaning #

Impact #

Diagnosis #

Mitigation #

Additional resources #