PostgreSQLReplicationSlotStorageLimit #
Meaning #
The alert is triggered when a PostgreSQL replication slot is close to the maximum amount of WAL it is allowed to retain on disk.
Impact #
A replication slot that is not being consumed forces PostgreSQL to retain, on its local storage, all WAL files that the slot still needs.
This can lead to:
- Disk space saturation on the PostgreSQL server
- The replication slot becoming unusable once it reaches its maximum allowed storage
More
The max_slot_wal_keep_size parameter specifies the maximum size of WAL files that replication slots can retain in the pg_wal directory at checkpoint time.
If restart_lsn of a replication slot falls behind the current LSN by more than the given size, the standby using the slot may no longer be able to continue replication due to removal of required WAL files.
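The comparison described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not how PostgreSQL implements it: the helper names (lsn_to_int, wal_retained_bytes, exceeds_keep_size) are hypothetical, the limit is assumed to be expressed in megabytes with -1 meaning "no limit", and the real server only removes WAL at checkpoint time and in whole segments, so the actual threshold is approximate.

```python
def lsn_to_int(lsn: str) -> int:
    """Convert a textual LSN such as '16/B374D848' into a byte position.

    An LSN is printed as two hexadecimal numbers separated by a slash:
    the high 32 bits, then the low 32 bits.
    """
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def wal_retained_bytes(current_lsn: str, restart_lsn: str) -> int:
    """Bytes of WAL between the slot's restart_lsn and the current LSN."""
    return lsn_to_int(current_lsn) - lsn_to_int(restart_lsn)


def exceeds_keep_size(current_lsn: str, restart_lsn: str,
                      max_slot_wal_keep_size_mb: int) -> bool:
    """Return True when the slot lags by more than the configured limit.

    Assumption: the limit is given in MB and a negative value means
    that no limit is enforced.
    """
    if max_slot_wal_keep_size_mb < 0:
        return False
    limit_bytes = max_slot_wal_keep_size_mb * 1024 * 1024
    return wal_retained_bytes(current_lsn, restart_lsn) > limit_bytes


if __name__ == "__main__":
    # A slot 32 MiB behind the current LSN, checked against a 16 MB limit.
    print(exceeds_keep_size("0/3000000", "0/1000000", 16))
```

For example, a slot whose restart_lsn is 0/1000000 while the current LSN is 0/3000000 retains 32 MiB of WAL, which is over a 16 MB limit but under a 64 MB one.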
Diagnosis #
Check the replication slot status in the Replication slot dashboard, or with the following SQL query:
SELECT
    slot_type,
    database,
    slot_name,
    active::TEXT,
    pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS replication_lag,
    pg_size_pretty(safe_wal_size) AS remaining_disk_space,
    wal_status,
    CASE
        WHEN safe_wal_size IS NOT NULL THEN (
            -- safe_wal_size is in bytes; the max_slot_wal_keep_size setting is in MB
            SELECT (safe_wal_size / 1024 / 1024) * 100 / (setting::int)
            FROM pg_settings
            WHERE name = 'max_slot_wal_keep_size'
        )
        ELSE 100
    END AS remaining_disk_space_percent
FROM pg_replication_slots
ORDER BY remaining_disk_space_percent, database, slot_name DESC;
Check the replication slot client logs and performance
If the replication slot client is not running or is performing poorly, it may have difficulty consuming the replication slot.
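To make the remaining_disk_space_percent column of the status query easier to read, here is a rough Python equivalent of its CASE expression. The function name remaining_percent is hypothetical. It assumes safe_wal_size is reported in bytes and the max_slot_wal_keep_size setting in megabytes, and it uses floating-point division, so results can differ slightly from the integer arithmetic of the query.

```python
from typing import Optional


def remaining_percent(safe_wal_size_bytes: Optional[int],
                      max_slot_wal_keep_size_mb: int) -> float:
    """Share of the slot's WAL budget that is still available, in percent.

    A missing safe_wal_size (NULL in SQL) is reported as 100, which is
    what the ELSE branch of the query does.
    """
    if safe_wal_size_bytes is None:
        return 100.0
    if max_slot_wal_keep_size_mb <= 0:
        # The query would fail or be meaningless here; make that explicit.
        raise ValueError("a positive max_slot_wal_keep_size (in MB) is required")
    safe_wal_size_mb = safe_wal_size_bytes / 1024 / 1024
    return safe_wal_size_mb * 100 / max_slot_wal_keep_size_mb


if __name__ == "__main__":
    # 512 MiB still safe to write, with a 1024 MB limit.
    print(remaining_percent(512 * 1024 * 1024, 1024))
```

A low percentage means the slot is close to its limit; slots with the lowest values are listed first by the query.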
Mitigation #
Fix the root cause on the replication slot client
Increase max_slot_wal_keep_size to allow more disk space for the replication slot (check free storage first!)
Increase server storage