Inactive logical replication slot

PostgreSQLInactiveLogicalReplicationSlot #

Meaning #

Alert is triggered when a PostgreSQL logical replication slot is inactive.

Impact #

A non-running replication slot forces PostgreSQL to keep all WAL files on its local storage.

It could lead to:

  • Disk space saturation on the PostgreSQL server
  • Replication slot will no longer be usable if it reaches its max allowed storage

Diagnosis #

Logical replication slots are used by applications for Change Data Capture.

Example of services that may use CDC: Kafka connect, AWS DMS, …

  1. Prioritize. Look at the replication slot disk space consumption trend in Replication slot available storage panel of the Replication slot dashboard to estimate the delay before reaching storage space saturation

  2. Identify the non-running logical replication slot

    List of Replication Slots
    SELECT
        slot_type,
        database,
        slot_name,
        active::TEXT,
        pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS replication_lag,
        pg_size_pretty(safe_wal_size) remaining_disk_space,
        wal_status,
        CASE
            WHEN safe_wal_size IS NOT NULL
                THEN (select (safe_wal_size / 1024 / 1024) * 100 / (setting::int) from pg_settings where name = 'max_slot_wal_keep_size')
            else
                100
        END as remaining_disk_space_percent
    FROM pg_replication_slots
    ORDER by remaining_disk_space_percent, database, slot_name desc;
    

    The inactive slot can be identified with active=false from the SQL query above.

    The database and slot_name information provide elements to identify the slot replication client.

    If the wal_status is lost, you may need to recreate the slot.

Mitigation #

The replication slot client is not consuming its replication slot. Investigate and fix the replication slot client:

  • Ensure the client consuming the replication slot (Kafka Connect, AWS DMS, etc.) is up and running
  • Check logs of the client to determine if it is producing an error
  • Check logs of the PostgreSQL server to determine if there is an error related to the client

Additional resources #