Feature: Graceful Shutdown

When the relay receives a shutdown signal (SIGTERM or SIGINT), it doesn't abruptly terminate. Instead, it performs a graceful drain: in-flight batches complete, messages are acknowledged, advisory locks are released, and connections are closed cleanly. This ensures no messages are lost or double-processed during deployments, restarts, or scaling events.

Shutdown Sequence

1. SIGTERM or SIGINT received
2. Coordinator signals all worker tasks to stop
3. Each worker:
   a. Finishes current batch publish (if in progress)
   b. Acknowledges the batch with the source
   c. Exits its processing loop
4. Coordinator waits for all workers to exit
5. Coordinator releases all advisory locks
6. Metrics server stops accepting new requests
7. OpenTelemetry flushes pending traces
8. Process exits with code 0
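
The ordering of steps 2 through 5 can be sketched with a small asyncio model. This is an illustration only, not pg_tide's implementation: `worker`, `coordinator`, `run_drain`, and the event log are hypothetical names, and `asyncio.sleep` stands in for a real publish to the sink. The point is that a worker checks the stop flag only between batches, so a batch that has started is always published and acknowledged before the worker exits.

```python
import asyncio

async def worker(name, batches, stop, log):
    """Process batches until asked to stop; never abandon a batch midway."""
    for batch in batches:
        if stop.is_set():                       # checked only between batches
            break
        await asyncio.sleep(0.01)               # stand-in for publishing to the sink
        log.append((name, "published", batch))  # step 3a
        log.append((name, "acked", batch))      # step 3b
    log.append((name, "exited", None))          # step 3c

async def coordinator(log, signal_delay_s=0.035):
    stop = asyncio.Event()
    workers = [
        asyncio.create_task(worker(f"w{i}", range(100), stop, log))
        for i in range(2)
    ]
    await asyncio.sleep(signal_delay_s)   # pretend the signal arrives mid-batch
    stop.set()                            # step 2: tell every worker to stop
    await asyncio.gather(*workers)        # step 4: wait for all workers to exit
    log.append(("coordinator", "locks_released", None))  # step 5
    return log

def run_drain():
    """Run the model once and return the ordered event log."""
    return asyncio.run(coordinator([]))
```

In the log returned by `run_drain()`, every "published" entry is followed by its "acked" entry, and "locks_released" appears only after both workers have logged "exited".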

Why This Matters

Without graceful shutdown:

  • In-flight messages could be published to the sink but not acknowledged in the source, causing re-delivery (duplicates)
  • Advisory locks could stay held until PostgreSQL notices the dead connection (potentially minutes, depending on TCP keepalive settings), delaying failover
  • Metrics might not be scraped for the final interval
  • Traces might be lost

With graceful shutdown:

  • Every message is either fully processed (published + acknowledged) or not processed at all, provided the drain finishes within the shutdown timeout
  • Advisory locks are released immediately, enabling instant failover
  • Final metrics are available for scraping
  • All traces are exported
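
The duplicate-delivery case can be made concrete with a toy model. The sketch below is not pg_tide code: `relay_run` and `simulate` are hypothetical helpers, the source is a plain list, and the acknowledged set stands in for whatever the real source tracks. It shows that a process killed between publish and acknowledge causes the replacement process to publish the same message again.

```python
def relay_run(source, sink, acked, crash_after_publish=None):
    """Deliver unacknowledged messages; optionally die between publish and ack."""
    for msg in source:
        if msg in acked:
            continue                  # already acknowledged, skip
        sink.append(msg)              # publish to the sink
        if msg == crash_after_publish:
            return                    # abrupt termination: the ack is never sent
        acked.add(msg)                # acknowledge with the source

def simulate(crash_after_publish=None):
    """Run one relay process, then a replacement, and return what the sink saw."""
    source, sink, acked = ["m1", "m2", "m3"], [], set()
    relay_run(source, sink, acked, crash_after_publish)  # first process
    relay_run(source, sink, acked)                       # restart or failover
    return sink
```

`simulate()` returns `["m1", "m2", "m3"]`, while `simulate("m2")` returns `["m1", "m2", "m2", "m3"]`: the message that was published but never acknowledged reaches the sink twice.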

Shutdown Timeout

The relay enforces a maximum shutdown duration. If workers don't exit within the timeout, the process terminates forcefully:

# Default: 30 seconds
pg-tide --shutdown-timeout 30

If a sink is extremely slow (e.g., a webhook endpoint that takes 60 seconds to respond), increase this timeout. In Kubernetes, ensure terminationGracePeriodSeconds exceeds your shutdown timeout.
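
One way to picture the timeout is as a bounded wait on the worker tasks. The following is a minimal sketch under assumptions, not the relay's actual code: `drain_with_timeout` and `demo` are hypothetical names, `asyncio.sleep` stands in for a slow sink call, and cancelling the leftover tasks stands in for the forceful termination described above.

```python
import asyncio

async def drain_with_timeout(workers, timeout_s):
    """Wait for worker tasks; return True if all exited, False on timeout."""
    done, pending = await asyncio.wait(workers, timeout=timeout_s)
    for task in pending:
        task.cancel()                 # forceful path: abandon the stragglers
    if pending:
        # Let the cancellations settle so no task is left dangling.
        await asyncio.gather(*pending, return_exceptions=True)
    return not pending

async def demo(sink_latency_s, timeout_s):
    """Model one worker stuck in a publish that takes sink_latency_s seconds."""
    slow_publish = asyncio.create_task(asyncio.sleep(sink_latency_s))
    return await drain_with_timeout([slow_publish], timeout_s)
```

Here `asyncio.run(demo(0.01, 0.5))` returns `True` (the drain finished in time) and `asyncio.run(demo(0.5, 0.01))` returns `False` (the sink was slower than the timeout), which is the situation where increasing the timeout helps.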

Kubernetes Integration

In Kubernetes deployments, the pod receives SIGTERM when it's being evicted, scaled down, or updated. Configure your deployment to give pg_tide enough time:

spec:
  terminationGracePeriodSeconds: 60  # Must exceed shutdown-timeout
  containers:
    - name: pg-tide
      command: ["pg-tide", "--shutdown-timeout", "45"]

PreStop Hook (optional)

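If you need extra time for load balancers to drain connections to the metrics endpoint, add a preStop hook. Kubernetes runs the hook before it sends SIGTERM, and the hook's duration counts against terminationGracePeriodSeconds, so budget for both: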

lifecycle:
  preStop:
    exec:
      command: ["sleep", "5"]
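
Because the preStop hook runs before SIGTERM is delivered and its duration counts against the pod's grace period, the three numbers have to be budgeted together. A small check like the one below can catch a misconfiguration before it is deployed; `grace_period_ok` is a hypothetical helper written for this page, not part of pg_tide or Kubernetes.

```python
def grace_period_ok(termination_grace_s, shutdown_timeout_s, pre_stop_s=0):
    """Return True if Kubernetes leaves room for the full drain.

    The preStop hook and the relay's shutdown timeout run back to back,
    and both must fit strictly inside terminationGracePeriodSeconds.
    """
    return pre_stop_s + shutdown_timeout_s < termination_grace_s
```

With the values used on this page, `grace_period_ok(60, 45, pre_stop_s=5)` is `True` because 5 + 45 = 50 seconds fits inside 60. Setting the grace period equal to the shutdown timeout, as in `grace_period_ok(30, 30)`, is `False`: the kubelet could send SIGKILL at the same moment the relay's own timeout fires.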

Signal Handling

| Signal  | Behavior                                       |
|---------|------------------------------------------------|
| SIGTERM | Graceful shutdown (standard Kubernetes signal) |
| SIGINT  | Graceful shutdown (Ctrl+C in terminal)         |
| SIGKILL | Immediate termination (cannot be caught)       |
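
The signal behavior above can be reproduced in a few lines on a POSIX system. This sketch uses Python's standard `signal` module and is not how pg_tide itself is implemented; `state`, `request_shutdown`, `install_handlers`, and `can_catch_sigkill` are hypothetical names. The handler only records the request, leaving the actual drain to the main loop, and the last function shows that the operating system refuses a handler for SIGKILL.

```python
import os
import signal

# Shared flag that a main loop would poll between batches.
state = {"shutdown_requested": False}

def request_shutdown(signum, frame):
    """Signal handler: record the request and return immediately."""
    state["shutdown_requested"] = True

def install_handlers():
    """Route both graceful signals to the same handler (main thread only)."""
    for sig in (signal.SIGTERM, signal.SIGINT):
        signal.signal(sig, request_shutdown)

def can_catch_sigkill():
    """Try to register a SIGKILL handler; the OS rejects the attempt."""
    try:
        signal.signal(signal.SIGKILL, request_shutdown)
    except (OSError, ValueError):
        return False
    return True
```

After `install_handlers()`, sending the process SIGTERM (for example with `os.kill(os.getpid(), signal.SIGTERM)`) sets `state["shutdown_requested"]` instead of terminating it, and `can_catch_sigkill()` returns `False`.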

Further Reading