Feature: Dead Letter Queue

When a message fails to deliver after all retry attempts — because the sink rejected it, the payload couldn't be decoded, or a permanent error occurred — it needs somewhere to go. The dead letter queue (DLQ) captures these failed messages in a PostgreSQL table where you can inspect them, understand why they failed, fix the underlying issue, and replay them back through the pipeline.

Without a DLQ, a single poison message can block an entire pipeline. The DLQ isolates failures so healthy messages continue flowing while problematic ones wait for human attention.

How It Works

Message fails        →  Retry (up to max_retries)
Still failing        →  Classify error kind
Route to DLQ         →  INSERT INTO tide.relay_dlq
Continue pipeline    →  Next messages flow normally

Failed messages are inserted into tide.relay_dlq with full context: the pipeline name, source and sink names, the original payload, the error message, and a classification of why it failed.
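Based on the context captured per entry, the table can be pictured as having roughly the following shape. This is an illustrative sketch, not the exact DDL shipped by tide; column names and types may differ.

```sql
-- Illustrative sketch of the DLQ table shape (not the actual DDL).
CREATE TABLE tide.relay_dlq (
    id             bigserial PRIMARY KEY,
    pipeline_name  text NOT NULL,
    source_name    text,
    sink_name      text,
    payload        bytea,          -- original message payload
    error_kind     text NOT NULL,  -- decode, sink_permanent, ...
    error_message  text,
    retry_count    int NOT NULL DEFAULT 0,
    created_at     timestamptz NOT NULL DEFAULT now(),
    resolved_at    timestamptz     -- NULL while the entry is open
);
```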

Error Classifications

| Error Kind | Meaning | Example |
|------------|---------|---------|
| `decode` | Payload couldn't be decoded from wire format | Malformed JSON, Avro schema mismatch |
| `sink_permanent` | Sink rejected permanently (no retry will help) | Invalid credentials, schema validation failure |
| `inbox_permanent` | Inbox insertion failed | Constraint violation, duplicate key |
| `max_retries_exceeded` | Transient error persisted beyond retry limit | Network timeout after 5 attempts |

Configuration

The DLQ is configured per pipeline as part of the sink configuration:

SELECT tide.relay_set_outbox(
    'orders-pipeline',
    'order_events',
    '{
        "sink_type": "kafka",
        "brokers": "kafka:9092",
        "topic": "orders",
        "dlq": {
            "enabled": true,
            "max_retries": 5,
            "retry_delay_seconds": 10,
            "retention_days": 30
        }
    }'::jsonb
);

Configuration Reference

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `dlq.enabled` | bool | `true` | Enable the dead letter queue |
| `dlq.max_retries` | int | `5` | Delivery attempts before DLQ routing |
| `dlq.retry_delay_seconds` | int | `10` | Delay between retry attempts |
| `dlq.retention_days` | int | `30` | Days to keep resolved DLQ entries |

Inspecting the DLQ

Query the DLQ table directly:

SELECT id, pipeline_name, error_kind, error_message, created_at
FROM tide.relay_dlq
WHERE resolved_at IS NULL
ORDER BY created_at DESC;

Or use the SQL API:

-- List unresolved DLQ entries for a pipeline
SELECT * FROM tide.relay_dlq_list('orders-pipeline');

-- View full payload of a specific entry
SELECT payload FROM tide.relay_dlq WHERE id = 42;
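For triage it often helps to see which pipelines and failure classes dominate. A grouping query like the following works directly against the table (assuming the `resolved_at` column discussed above marks closed entries):

```sql
-- Count open DLQ entries per pipeline and error kind,
-- oldest failure first, to prioritize triage.
SELECT pipeline_name,
       error_kind,
       count(*)        AS entries,
       min(created_at) AS oldest
FROM tide.relay_dlq
WHERE resolved_at IS NULL
GROUP BY pipeline_name, error_kind
ORDER BY oldest;
```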

Replaying Failed Messages

Once you've fixed the underlying issue (corrected credentials, updated schema, fixed payload format), replay messages back through the pipeline:

-- Retry a single message
SELECT tide.relay_dlq_retry(42);

-- Retry all messages for a pipeline
SELECT tide.relay_dlq_retry_all('orders-pipeline');

Replayed messages go through the normal pipeline path. If they fail again, they return to the DLQ with an updated retry count.
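The `retention_days` setting implies that resolved entries are eventually purged. A manual equivalent of that cleanup, sketched here under the assumption that resolution is tracked in a `resolved_at` column, would look like:

```sql
-- Manually purge resolved entries older than the retention window
-- (here 30 days, matching the retention_days default).
DELETE FROM tide.relay_dlq
WHERE resolved_at IS NOT NULL
  AND resolved_at < now() - interval '30 days';
```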

Integration with Circuit Breaker

When the circuit breaker opens (sink is unhealthy), messages are routed directly to the DLQ rather than waiting indefinitely. This prevents message buildup in memory while the sink recovers. Once the circuit closes, new messages flow normally — and you can replay DLQ entries to recover the ones that were sidelined.

Monitoring

Track DLQ activity via Prometheus metrics:

  • pg_tide_dlq_entries_total — Total messages routed to DLQ (by pipeline, error_kind)
  • Alert on the count of unresolved rows in tide.relay_dlq as part of your monitoring
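As a starting point for table-based alerting, a check like the following flags entries that have sat untriaged for a while. The one-hour threshold is an arbitrary example, not a recommendation from tide:

```sql
-- Open DLQ entries older than one hour indicate failures
-- that nobody has triaged yet; a non-empty result should alert.
SELECT pipeline_name, count(*) AS stuck
FROM tide.relay_dlq
WHERE resolved_at IS NULL
  AND created_at < now() - interval '1 hour'
GROUP BY pipeline_name;
```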

Further Reading