Feature: Graceful Shutdown
When the relay receives a shutdown signal (SIGTERM or SIGINT), it does not terminate abruptly. Instead, it performs a graceful drain: in-flight batches complete, their messages are acknowledged, advisory locks are released, and connections are closed cleanly. As long as the drain finishes within the shutdown timeout, no messages are lost or double-processed during deployments, restarts, or scaling events.
Shutdown Sequence
1. SIGTERM received
2. Coordinator signals all worker tasks to stop
3. Each worker:
a. Finishes current batch publish (if in progress)
b. Acknowledges the batch with the source
c. Exits its processing loop
4. Coordinator waits for all workers to exit
5. Coordinator releases all advisory locks
6. Metrics server stops accepting new requests
7. OpenTelemetry flushes pending traces
8. Process exits with code 0
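The ordering above can be sketched with Python threads standing in for worker tasks. This is a minimal illustration under stated assumptions, not pg-tide's implementation. `worker`, `coordinator_shutdown`, the in-memory queue of batches and the log strings are all hypothetical. Steps 5 to 7 are placeholders for the real lock release, metrics shutdown and trace flush.

```python
import queue
import threading


def worker(name: str, batches: queue.Queue, stop: threading.Event, log: list) -> None:
    """Hypothetical worker loop. The stop flag is checked only between batches,
    so a batch that has been taken is always published and acknowledged."""
    while not stop.is_set():
        try:
            batch = batches.get(timeout=0.05)
        except queue.Empty:
            continue  # nothing to do yet; re-check the stop flag
        # list.append is atomic in CPython, which is enough for this sketch.
        log.append(f"{name} publish {batch}")  # 3a: finish the publish
        log.append(f"{name} ack {batch}")      # 3b: acknowledge with the source
        batches.task_done()
    log.append(f"{name} exit")                 # 3c: leave the processing loop


def coordinator_shutdown(workers: list, stop: threading.Event, log: list) -> int:
    """Steps 2 and 4-8 of the sequence, in order (steps 5-7 are placeholders)."""
    stop.set()                                 # 2: signal every worker to stop
    for t in workers:
        t.join()                               # 4: wait for all workers to exit
    log.append("release advisory locks")       # 5
    log.append("stop metrics server")          # 6
    log.append("flush traces")                 # 7
    return 0                                   # 8: exit code for the process
```

The stop flag is only checked at the top of the loop. A worker that has already taken a batch therefore publishes and acknowledges it before exiting, which is what steps 3a to 3c require. The coordinator touches shared resources only after every worker has joined.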
Why This Matters
Without graceful shutdown:
- In-flight messages could be published to the sink but not acknowledged in the source, causing re-delivery (duplicates)
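- Advisory locks could stay held until PostgreSQL detects the dropped connection. If the connection is not closed cleanly, that can take minutes (for example, until TCP keepalive times out) and delays failover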
- Metrics might not be scraped for the final interval
- Traces might be lost
With graceful shutdown:
- Every message is either fully processed (published + acknowledged) or not processed at all
- Advisory locks are released immediately, so a standby instance can acquire them and take over without waiting
- Final metrics are available for scraping
- All traces are exported
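The duplicate risk in the first bullet can be modeled with a toy at-least-once relay. This is a hypothetical model, not pg-tide code. Its `source` redelivers anything not yet in `acked`, and `kill_after_publish` simulates the process dying between the publish and the acknowledgement.

```python
def run_relay(source: list, sink: list, acked: set, kill_after_publish=None) -> bool:
    """Toy at-least-once relay (hypothetical, not pg-tide code).

    The source redelivers every message that is not in `acked`. If
    `kill_after_publish` names a message, the run stops right after publishing
    it, before the acknowledgement, as an abrupt kill would.
    Returns True if the run completed, False if it was "killed".
    """
    for msg in source:
        if msg in acked:
            continue                 # already acknowledged: not redelivered
        sink.append(msg)             # publish to the sink
        if msg == kill_after_publish:
            return False             # killed between publish and ack
        acked.add(msg)               # acknowledge with the source
    return True


# Abrupt kill after publishing m2, then a restart: m2 reaches the sink twice.
source = ["m1", "m2", "m3"]
sink, acked = [], set()
run_relay(source, sink, acked, kill_after_publish="m2")
run_relay(source, sink, acked)
print(sink)  # ['m1', 'm2', 'm2', 'm3']

# Graceful drain: every started batch is published and acknowledged once.
clean_sink, clean_acked = [], set()
run_relay(source, clean_sink, clean_acked)
print(clean_sink)  # ['m1', 'm2', 'm3']
```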
Shutdown Timeout
The relay enforces a maximum shutdown duration. If workers don't exit within the timeout, the process terminates forcefully:
```shell
# Default: 30 seconds
pg-tide --shutdown-timeout 30
```
If a sink is extremely slow (e.g., a webhook endpoint that takes 60 seconds to respond), increase this timeout. In Kubernetes, ensure terminationGracePeriodSeconds exceeds your shutdown timeout.
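A timeout like this is commonly enforced with one deadline shared by all workers. That way, several slow workers cannot stretch the wait to a multiple of the limit. The sketch below assumes each worker is a thread that exits once `stop` is set. `drain_with_timeout` is a hypothetical name. The forced exit itself (for example via `os._exit`) is left to the caller so the sketch can run safely, and the exit code pg-tide uses after a forced termination is not specified here.

```python
import threading
import time


def drain_with_timeout(workers: list, stop: threading.Event, timeout_s: float) -> bool:
    """Signal shutdown, then wait for all workers against one shared deadline.

    Returns True if every worker exited in time and False otherwise. A real
    relay would terminate forcefully on False; this sketch only reports it.
    """
    stop.set()
    deadline = time.monotonic() + timeout_s
    for t in workers:
        # Each join gets only the time left before the shared deadline.
        t.join(max(deadline - time.monotonic(), 0.0))
    return not any(t.is_alive() for t in workers)
```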
Kubernetes Integration
In Kubernetes, a pod receives SIGTERM when it is evicted, scaled down, or replaced during a rollout. Give the relay enough time to drain by setting terminationGracePeriodSeconds in the pod spec (under spec.template.spec in a Deployment):
```yaml
spec:
  terminationGracePeriodSeconds: 60  # Must exceed --shutdown-timeout
  containers:
    - name: pg-tide
      command: ["pg-tide", "--shutdown-timeout", "45"]
```
PreStop Hook (optional)
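If load balancers need extra time to stop sending traffic to the metrics endpoint, add a preStop hook to the container. Kubernetes runs the hook before sending SIGTERM, and the hook's duration counts against terminationGracePeriodSeconds, so the grace period must cover it as well: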
```yaml
lifecycle:
  preStop:
    exec:
      command: ["sleep", "5"]
```
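The timing rules in this section reduce to simple arithmetic. SIGTERM arrives only after the preStop hook finishes. The hook's time counts against the grace period, and SIGKILL follows when the grace period runs out. The hypothetical helper below (not part of pg-tide) computes how much headroom remains after the relay's own shutdown timeout.

```python
def shutdown_budget(termination_grace_s: float, shutdown_timeout_s: float,
                    prestop_s: float = 0) -> float:
    """Seconds of headroom between the relay's own shutdown timeout expiring
    and the kubelet's SIGKILL (hypothetical helper).

    The preStop hook runs first and its time counts against the grace period,
    so it is subtracted before the relay's timeout.
    """
    return termination_grace_s - prestop_s - shutdown_timeout_s


# Values from this section: 60 s grace period, --shutdown-timeout 45, 5 s preStop sleep.
print(shutdown_budget(60, 45, prestop_s=5))  # 10
```

A positive result means the relay's own timeout fires before Kubernetes sends SIGKILL. A result of zero or less means the kubelet can kill the process mid-drain.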
Signal Handling
| Signal | Behavior |
|---|---|
| SIGTERM | Graceful shutdown (standard Kubernetes signal) |
| SIGINT | Graceful shutdown (Ctrl+C in terminal) |
| SIGKILL | Immediate termination (cannot be caught) |
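In Python terms, the table maps to one handler for SIGTERM and SIGINT that only requests a drain, while SIGKILL cannot be handled at all. This is an illustrative, POSIX-only sketch. The names `stop`, `request_shutdown`, `install_handlers` and `sigkill_catchable` are hypothetical, and pg-tide's own signal handling may be structured differently.

```python
import signal
import threading

# Set once a shutdown has been requested; the drain logic waits on this.
stop = threading.Event()


def request_shutdown(signum, frame) -> None:
    """Handler for SIGTERM and SIGINT: record the request and return at once."""
    stop.set()


def install_handlers() -> None:
    """Route SIGTERM (Kubernetes) and SIGINT (Ctrl+C) to the same graceful path.
    Python requires this to be called from the main thread."""
    for sig in (signal.SIGTERM, signal.SIGINT):
        signal.signal(sig, request_shutdown)


def sigkill_catchable() -> bool:
    """Try to install a SIGKILL handler; the OS rejects it, so this returns False."""
    try:
        signal.signal(signal.SIGKILL, request_shutdown)
    except (OSError, ValueError):
        return False
    return True
```

The handler does nothing more than set a flag, so the drain runs in ordinary code rather than inside the signal handler. To try it, call `install_handlers()` and then `signal.raise_signal(signal.SIGTERM)` (Python 3.8+). After that, `stop.wait(timeout=1.0)` returns True.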
Further Reading
- HA Coordination — How shutdown interacts with advisory locks
- Deployment Guide — Production deployment practices