Maximum used transactions

RDSPostgreSQLMaximumUsedTransaction #

Meaning #

Alert is triggered when a PostgreSQL server is closed from its maximum number of visible transactions hard limit.

Important It’s a race against time to prevent PostgreSQL from shutting down to prevent data loss.
More

PostgreSQL uses Multi-Version Concurrency Control (MVCC) to provide data access concurrency.

Transaction IDs are stored in 32 bits integers, so PostgreSQL can have up to 4 billion visible transactions.

PostgreSQL continuously recycles transaction IDs once transactions are released.

If transactions are not released, PostgreSQL won’t be able to accept new transactions (or queries).

PostgreSQL internal events:

  • 40 million transactions before the upper limit is reached, WARNING messages consisting of a countdown will be logged
  • 3 million transactions before the upper limit is reached, PostgreSQL goes to READ-ONLY mode

See official postgresql documentation about routine-vacuuming

Impact #

PostgreSQL will shut down and refuse to start any new transactions once there are fewer than 3 million transactions left until wraparound.

Diagnosis #

Transaction ID Wraparound can be caused by a combination of one or more of the following circumstances:

  • Check if there a long-running transactions on the live dashboard

    SQL
    SELECT
        pid,
        leader_pid,
        datname as database,
        usename AS user,
        application_name,
        EXTRACT(EPOCH FROM now() - xact_start) as transaction_duration,
        CASE
            WHEN state = 'active' THEN EXTRACT(EPOCH FROM now() - query_start)
            ELSE null
        END query_duration,
        state,
        wait_event_type,
        wait_event,
        CASE
            WHEN state = 'active' THEN query
            ELSE null
        END query,
        xact_start as transaction_start,
        CASE
            WHEN state = 'active' THEN query_start
            ELSE null
        END query_start,
        CASE
            WHEN state != 'active' THEN EXTRACT(EPOCH FROM state_change - query_start)
            ELSE null
        END last_query_duration,
        CASE
            WHEN state != 'active' THEN query
            ELSE null
        END last_query,
        pg_blocking_pids(pid) as blocked_by,
        client_addr,
        backend_start,
        ceil(EXTRACT(EPOCH FROM now() - backend_start)) as backend_duration
    FROM pg_stat_activity
    WHERE query_start is not null
        AND usename!='rdsrepladmin'
        AND query not like 'START_REPLICATION %'
        AND pid != pg_backend_pid();
    
  • Check replication slots are active and don’t have lag on live dashboard.

    PostgreSQL retains WAL files until the replication slot client confirms received data.

    SQL
    SELECT
        database,
        slot_name,
        active::TEXT,
        xmin,
        pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS replication_lag,
        pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS replication_lag_raw
    FROM pg_replication_slots
    ORDER by replication_lag_raw, database, slot_name desc;
    
  • Check the autovacuum is turned on

    Note: autovacumm could be disabled on specific tables

    SQL
    SHOW autovacuum;
    
  • Autovacuum can’t be executed due to intense operations (SELECT, INSERT, UPDATE, DELETE)

  • Check PostgreSQL vacuum logs

Mitigation #

  1. Terminate long-running queries

  2. Vacuum databases as quickly as possible to prevent a forced shutdown.

Additional resources #