# Runbook: DLQ Replay

**Applies to:** pg-tide relay v0.13.0+
**Scope:** How to drain a flooded dead-letter queue (DLQ), requeue messages for retry, and monitor progress.
## What the DLQ Is

When a message cannot be delivered after exhausting retries (permanent error or circuit breaker open), pg-tide writes it to `tide.relay_dlq`. This preserves the message for operator review rather than silently dropping it.

```sql
-- Count DLQ entries per pipeline:
SELECT pipeline_name, error_kind, COUNT(*) AS entries
FROM tide.relay_dlq
GROUP BY pipeline_name, error_kind
ORDER BY entries DESC;
```
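If the count is climbing, the arrival rate over time tells you whether the flood is ongoing or historical. A sketch, using the `failed_at` column that appears in the inspection queries below:

```sql
-- DLQ arrivals per hour over the last day (assumes the failed_at
-- timestamp column on tide.relay_dlq, as used elsewhere in this runbook):
SELECT date_trunc('hour', failed_at) AS hour,
       pipeline_name,
       COUNT(*) AS entries
FROM tide.relay_dlq
WHERE failed_at > NOW() - INTERVAL '24 hours'
GROUP BY 1, 2
ORDER BY 1, 2;
```

A flat or falling per-hour count suggests the root cause has already passed; a rising one means fix first, requeue later.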
## Step 1 — Identify the Root Cause

Before requeuing, understand why messages landed in the DLQ:

```sql
-- Inspect recent DLQ entries for a pipeline:
SELECT id, event_id, error_kind, error_message, failed_at
FROM tide.relay_dlq
WHERE pipeline_name = 'my-pipeline'
ORDER BY failed_at DESC
LIMIT 20;
```
Common `error_kind` values:

| Kind | Meaning | Resolution |
|---|---|---|
| `permanent` | Sink rejected the message (invalid schema, auth, etc.) | Fix the root cause before requeuing |
| `transient` | Sink was unavailable; max retries exhausted | Confirm the sink is healthy, then requeue |
| `dlq_write_failure` | Secondary DLQ write failure | Indicates the DLQ table is misconfigured; check `tide.relay_dlq` permissions |
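When the backlog is large, grouping by `error_message` is usually faster than reading entries one by one, since a flood typically has one dominant cause. A sketch using only the columns shown above:

```sql
-- Most frequent error messages for a pipeline:
SELECT error_kind, error_message, COUNT(*) AS occurrences
FROM tide.relay_dlq
WHERE pipeline_name = 'my-pipeline'
GROUP BY error_kind, error_message
ORDER BY occurrences DESC
LIMIT 10;
```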
## Step 2 — Fix the Underlying Problem

Requeuing DLQ entries while the root cause is still present will return them to the DLQ immediately. Fix the sink, schema, or configuration first:

```bash
# Validate pipeline config against the live catalog and sink:
pg-tide validate-config --pipeline my-pipeline --postgres-url "$PG_TIDE_POSTGRES_URL"

# Check that the relay can connect to all required services:
pg-tide doctor --postgres-url "$PG_TIDE_POSTGRES_URL"
```
## Step 3 — Requeue Messages for Retry

### Via SQL

```sql
-- Requeue all DLQ entries for a pipeline (marks them as pending retry):
SELECT tide.dlq_requeue('my-pipeline');

-- Requeue a single entry by ID:
SELECT tide.dlq_requeue_entry(42);
```
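When only part of the backlog is safe to retry (for example, `transient` failures after the sink has recovered, while `permanent` entries still need a schema fix), the per-entry function can be applied selectively. A sketch, assuming `tide.dlq_requeue_entry` takes the DLQ `id` as in the example above:

```sql
-- Requeue only transient failures for one pipeline:
SELECT tide.dlq_requeue_entry(id)
FROM tide.relay_dlq
WHERE pipeline_name = 'my-pipeline'
  AND error_kind = 'transient';
```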
### Via CLI

```bash
# Requeue all DLQ entries for a pipeline:
pg-tide replay dlq-requeue --pipeline my-pipeline --postgres-url "$PG_TIDE_POSTGRES_URL"

# Preview without actually requeuing (dry run):
pg-tide replay dlq-requeue --pipeline my-pipeline --dry-run --postgres-url "$PG_TIDE_POSTGRES_URL"
```
## Step 4 — Monitor Progress

Watch the DLQ depth decrease and message throughput increase:

```bash
# Tail the relay logs:
docker logs -f pg-tide 2>&1 | grep -E "(dlq|pipeline=my-pipeline)"

# Or watch Prometheus metrics:
curl -s http://localhost:9090/metrics | grep pg_tide_relay_dlq
```
Key metrics:

| Metric | Description |
|---|---|
| `pg_tide_relay_dlq_entries_written_total` | Cumulative DLQ writes — should stop growing after the root cause is fixed |
| `pg_tide_relay_messages_published_total` | Should increase as requeued messages are delivered |
| `pg_tide_relay_consumer_lag` | Should decrease as the pipeline catches up |
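The same progress can be read from the DLQ table itself. A sketch, assuming `requeued_at` is set when an entry is requeued, as the purge step below relies on:

```sql
-- Pending vs. already-requeued entries per pipeline:
SELECT pipeline_name,
       COUNT(*) FILTER (WHERE requeued_at IS NULL)     AS pending,
       COUNT(*) FILTER (WHERE requeued_at IS NOT NULL) AS requeued
FROM tide.relay_dlq
GROUP BY pipeline_name;
```

`pending` should trend toward zero as the replay drains.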
## Step 5 — Purge Resolved DLQ Entries

After successful redelivery, clean up resolved entries:

```sql
-- Delete all successfully redelivered DLQ entries for a pipeline:
DELETE FROM tide.relay_dlq
WHERE pipeline_name = 'my-pipeline'
  AND requeued_at IS NOT NULL;

-- Or delete all entries older than 30 days:
DELETE FROM tide.relay_dlq
WHERE failed_at < NOW() - INTERVAL '30 days';
```
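For a cautious purge, the DELETE can be wrapped in a transaction with `RETURNING`, so the affected rows are visible before the change is committed. A sketch:

```sql
BEGIN;

DELETE FROM tide.relay_dlq
WHERE pipeline_name = 'my-pipeline'
  AND requeued_at IS NOT NULL
RETURNING id, event_id, failed_at;  -- shows exactly which rows were removed

-- COMMIT if the returned rows look right; ROLLBACK otherwise.
COMMIT;
```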
## Flood Control

If the DLQ is growing very fast (more than ~100 entries/min):

- Disable the pipeline to stop new DLQ writes:

  ```sql
  SELECT tide.relay_disable('my-pipeline');
  ```

- Fix the root cause.
- Clear the DLQ entries that will never be retryable.
- Re-enable the pipeline and requeue surviving entries:

  ```sql
  SELECT tide.relay_enable('my-pipeline');
  SELECT tide.dlq_requeue('my-pipeline');
  ```