Replication slot storage limit

PostgreSQLReplicationSlotStorageLimit

Meaning

The alert is triggered when the WAL retained by a PostgreSQL replication slot approaches the maximum size the slot is allowed to use on disk.

Impact

A replication slot whose consumer is stopped or lagging forces PostgreSQL to keep, on its local storage, all WAL files that the consumer has not yet received.

It could lead to:

  • Disk space saturation on the PostgreSQL server
  • The replication slot becomes unusable once it exceeds its maximum allowed storage

The max_slot_wal_keep_size parameter specifies the maximum size of WAL files that replication slots can retain in the pg_wal directory at checkpoint time.

If the restart_lsn of a replication slot falls behind the current LSN by more than this size, the required WAL files may be removed, and the standby using the slot may no longer be able to continue replication.
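
To see the limit currently in effect, the setting can be read from pg_settings. This is a minimal sketch, not a definitive check; it assumes a PostgreSQL version that provides max_slot_wal_keep_size (13 or later). A setting of -1 means no limit is applied, and the unit column is expected to report MB.

```sql
-- Inspect the configured limit for WAL retained by replication slots.
-- setting = -1 means slots may retain an unlimited amount of WAL.
SELECT name, setting, unit, context
FROM pg_settings
WHERE name = 'max_slot_wal_keep_size';
```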

Diagnosis

  • Check the replication slot status in the Replication slot dashboard, or with the following query:

    -- One row per slot, slots closest to the limit first.
    SELECT
        slot_type,
        database,
        slot_name,
        active::text,
        -- WAL retained for the slot (distance from restart_lsn to the current LSN)
        pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS replication_lag,
        pg_size_pretty(safe_wal_size) AS remaining_disk_space,
        wal_status,
        CASE
            WHEN safe_wal_size IS NOT NULL
                -- safe_wal_size is in bytes; max_slot_wal_keep_size is read here in MB
                THEN (SELECT (safe_wal_size / 1024 / 1024) * 100 / (setting::int) FROM pg_settings WHERE name = 'max_slot_wal_keep_size')
            ELSE
                100
        END AS remaining_disk_space_percent
    FROM pg_replication_slots
    ORDER BY remaining_disk_space_percent, database, slot_name DESC;
    
  • Check the replication slot client's logs and performance

    If the replication slot client is not running or is performing poorly, it may be unable to consume the replication slot fast enough.
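
To check from the server side whether a client is currently attached to each slot, a query along these lines may help. It is a sketch that joins pg_replication_slots to pg_stat_replication on the slot's active_pid; a slot with no matching connection row has no client connected at the moment.

```sql
-- List each slot with the connection (if any) currently consuming it.
-- Rows where application_name is NULL have no attached client.
SELECT
    s.slot_name,
    s.slot_type,
    s.active,
    s.active_pid,
    r.application_name,
    r.client_addr,
    r.state
FROM pg_replication_slots AS s
LEFT JOIN pg_stat_replication AS r ON r.pid = s.active_pid
ORDER BY s.active, s.slot_name;
```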

Mitigation

  • Fix the root cause on the replication slot client

  • Increase max_slot_wal_keep_size to allow more disk space for the replication slot (check free storage first!)

  • Increase server storage
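
Raising max_slot_wal_keep_size could be sketched as follows. The value '20GB' is purely illustrative and must be chosen from the free storage actually available; the statements assume a superuser session and a server where ALTER SYSTEM is permitted (managed services often require changing the parameter through their own configuration interface instead).

```sql
-- Current size of the pg_wal directory, to compare against free disk space.
SELECT pg_size_pretty(sum(size)) AS pg_wal_size FROM pg_ls_waldir();

-- Illustrative value only: pick a limit that fits the available storage.
ALTER SYSTEM SET max_slot_wal_keep_size = '20GB';

-- Apply the change by reloading the configuration.
SELECT pg_reload_conf();
```

After the reload, rerunning the diagnosis query should show the updated remaining space for each slot.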

Additional resources