PostgreSQLInactiveLogicalReplicationSlot #
Meaning #
Alert is triggered when a PostgreSQL logical replication slot is inactive.
Impact #
A non-running replication slot forces PostgreSQL to keep all WAL files on its local storage.
It could lead to:
- Disk space saturation on the PostgreSQL server
- Replication slot will no longer be usable if it reaches its max allowed storage
Diagnosis #
Logical replication slots are used by applications for Change Data Capture.
Example of services that may use CDC: Kafka connect, AWS DMS, …
Prioritize. Look at the replication slot disk space consumption trend in
Replication slot available storage
panel of theReplication slot dashboard
to estimate the delay before reaching storage space saturationIdentify the non-running logical replication slot
List of Replication Slots
SELECT slot_type, database, slot_name, active::TEXT, pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS replication_lag, pg_size_pretty(safe_wal_size) remaining_disk_space, wal_status, CASE WHEN safe_wal_size IS NOT NULL THEN (select (safe_wal_size / 1024 / 1024) * 100 / (setting::int) from pg_settings where name = 'max_slot_wal_keep_size') else 100 END as remaining_disk_space_percent FROM pg_replication_slots ORDER by remaining_disk_space_percent, database, slot_name desc;
The inactive slot can be identified with
active=false
from the SQL query above.The
database
andslot_name
information provide elements to identify the slot replication client.If the
wal_status
islost
, you may need to recreate the slot.
Mitigation #
The replication slot client is not consuming its replication slot. Investigate and fix the replication slot client:
- Ensure the client consuming the replication slot (Kafka Connect, AWS DMS, etc.) is up and running
- Check logs of the client to determine if it is producing an error
- Check logs of the PostgreSQL server to determine if there is an error related to the client