Long running queries

PostgreSQLLongRunningQueries #

Meaning #

Alert is triggered when SQL queries run for an extended period.

Impact #

  • Block WAL file rotation

  • Could block vacuum operations

  • Could block other queries due to locks

  • Could lead to replication lag on replica

Diagnosis #

  1. Open PostgreSQL server live dashboard

  2. Click on the queries to get details

Mitigation #

  1. Identify the PIDs of the long running queries

    SQL
    SELECT
        pid,
        leader_pid,
        datname as database,
        usename AS user,
        application_name,
        EXTRACT(EPOCH FROM now() - xact_start) as transaction_duration,
        CASE
            WHEN state = 'active' THEN EXTRACT(EPOCH FROM now() - query_start)
            ELSE null
        END query_duration,
        state,
        wait_event_type,
        wait_event,
        CASE
            WHEN state = 'active' THEN query
            ELSE null
        END query,
        xact_start as transaction_start,
        CASE
            WHEN state = 'active' THEN query_start
            ELSE null
        END query_start,
        CASE
            WHEN state != 'active' THEN EXTRACT(EPOCH FROM state_change - query_start)
            ELSE null
        END last_query_duration,
        CASE
            WHEN state != 'active' THEN query
            ELSE null
        END last_query,
        pg_blocking_pids(pid) as blocked_by,
        client_addr,
        backend_start,
        ceil(EXTRACT(EPOCH FROM now() - backend_start)) as backend_duration
    FROM pg_stat_activity
    WHERE query_start is not null
        AND usename!='rdsrepladmin'
        AND query not like 'START_REPLICATION %'
        AND pid != pg_backend_pid();
    

    Queries could be blocked in trying to acquire a lock, so pay particular attention at the blocked_by column. If you identify specific queries blocking others, note down their PIDs. Below is a focused view of current locks on the database:

    SQL
    SELECT
        blocked_locks.pid AS blocked_pid,
        blocked_activity.usename AS blocked_user,
        blocking_locks.pid AS blocking_pid,
        blocking_activity.usename AS blocking_user,
        blocked_activity.query AS blocked_statement,
        blocking_activity.query AS current_statement_in_blocking_process
    FROM pg_catalog.pg_locks blocked_locks
    JOIN pg_catalog.pg_stat_activity blocked_activity ON blocked_activity.pid = blocked_locks.pid
    JOIN pg_catalog.pg_locks blocking_locks ON blocking_locks.locktype = blocked_locks.locktype
        AND blocking_locks.database IS NOT DISTINCT FROM blocked_locks.database
        AND blocking_locks.relation IS NOT DISTINCT FROM blocked_locks.relation
        AND blocking_locks.page IS NOT DISTINCT FROM blocked_locks.page
        AND blocking_locks.tuple IS NOT DISTINCT FROM blocked_locks.tuple
        AND blocking_locks.virtualxid IS NOT DISTINCT FROM blocked_locks.virtualxid
        AND blocking_locks.transactionid IS NOT DISTINCT FROM blocked_locks.transactionid
        AND blocking_locks.classid IS NOT DISTINCT FROM blocked_locks.classid
        AND blocking_locks.objid IS NOT DISTINCT FROM blocked_locks.objid
        AND blocking_locks.objsubid IS NOT DISTINCT FROM blocked_locks.objsubid
        AND blocking_locks.pid != blocked_locks.pid
    JOIN pg_catalog.pg_stat_activity blocking_activity ON blocking_activity.pid = blocking_locks.pid
    WHERE NOT blocked_locks.granted;
    
  2. Terminate in priority the blocking queries, if not enough, terminate the other long running queries

    SELECT
        pg_terminate_backend(pid),
        usename,
        datname,
        application_name,
        client_addr,
        client_port,
        state,
        wait_event_type,
        wait_event,
        state_change,
        query
    FROM pg_stat_activity
    WHERE pid in ('<replace_with_pids>');
    

Additional resources #

n/a