Disk space prediction

RDSDiskSpacePrediction #

Meaning #

Alert is triggered when monitoring predict the RDS storage disk space will be full.

Impact #

The PostgreSQL instance will stop to prevent data corruption if no more disk space is available.

Diagnosis #

Determine whether it’s a long-term growth trend requiring storage increase or abnormal disk usage reflecting another problem:

Check database size growth over the last 24 hours to identify abnormal disk usage growth
Look at Storage usage panel in RDS instance details Grafana dashboard
Check if there are long-running transactions (or queries)
Look at Postgresql live activity Grafana dashboard to find long-running query
Why?
A long running transaction (or query)

PostgreSQL

SELECT pid, leader_pid, datname as database, usename AS user, application_name, EXTRACT(EPOCH FROM now() - xact_start) as transaction_duration, CASE WHEN state = 'active' THEN EXTRACT(EPOCH FROM now() - query_start) ELSE null END query_duration, state, wait_event_type, wait_event, CASE WHEN state = 'active' THEN query ELSE null END query, xact_start as transaction_start, CASE WHEN state = 'active' THEN query_start ELSE null END query_start, CASE WHEN state != 'active' THEN EXTRACT(EPOCH FROM state_change - query_start) ELSE null END last_query_duration, CASE WHEN state != 'active' THEN query ELSE null END last_query, pg_blocking_pids(pid) as blocked_by, client_addr, backend_start, ceil(EXTRACT(EPOCH FROM now() - backend_start)) as backend_duration style=color:#66d9ef>FROM pg_stat_activity style=color:#66d9ef>WHERE query_start is not null AND usename!='rdsrepladmin' AND query not like 'START_REPLICATION %' AND pid != pg_backend_pid(); ils>Check the status of PostgreSQL replication slots
Why?
PostgreSQL keeps WAL files on its disk until the replication slot client acknowledges they consumed it.
Check log file sizes
Why?
PostgreSQL logs could consume large disk space.Usually related to:
Connection/disconnections
Slow queries
PGaudit logs
Internal errors
Temporary files
PostgreSQL parameters:
log_temp_files can be set to log temporary file creation above a size threshold
temp_file_limit parameter can be set to avoid over-usage: https://www.postgresql.org/docs/16/runtime-config-resource.html#RUNTIME-CONFIG-RESOURCE-DISK
Mitigation
#
You must avoid reaching no disk space left situation.
Fix the system that blocks PostgreSQL to recycle its WAL files
If long-running transactions/queries: cancel or kill the transactions
If non-running replication slot: delete replication slot
Increase RDS disk space
AWS RDS limitations
Minimal disk increase is 10%
You can’t make further storage modifications for either six (6) hours or until storage optimization has completed on the instance, whichever is longer.
Storage optimization can take several hours
Storage can’t be reclaimed
See more on AWS documentation
Additional resources
#
n/a

Sep 28, 2023

Edit this page

RDSDiskSpacePrediction #

Meaning #

Impact #

Diagnosis #

Mitigation #

Additional resources #