Feature: Circuit Breaker
The circuit breaker protects your relay pipelines from cascading failures. When a sink becomes unavailable — a Kafka broker goes down, an HTTP endpoint returns 503s, a database connection drops — the circuit breaker detects the pattern of consecutive failures and stops attempting delivery. This prevents wasted resources, avoids flooding error logs, and gives the downstream system time to recover.
State Machine
The circuit breaker has three states:
                   failure_threshold
    ┌────────┐    consecutive    ┌─────────┐
    │ CLOSED │ ─────────────────→│  OPEN   │
    │(normal)│                   │(failing)│
    └────────┘                   └─────────┘
        ↑                             │
        │ success_threshold           │ half_open_timeout
        │ consecutive                 ↓
    ┌────────────┐              ┌───────────┐
    │   CLOSED   │←─────────────│ HALF-OPEN │
    └────────────┘   success    │  (probe)  │
                                └───────────┘
                                      │
                                      │ failure
                                      ↓
                                 ┌─────────┐
                                 │  OPEN   │
                                 └─────────┘
Closed (normal operation): Messages flow to the sink. Each failure increments a counter; each success resets it. When consecutive failures reach failure_threshold, the circuit opens.
Open (failing): All publish attempts fail immediately without contacting the sink. After the half_open_timeout_seconds interval elapses, the circuit transitions to half-open.
Half-open (recovery probe): A single message is allowed through as a probe. Each successful probe counts toward success_threshold; once that many consecutive probes succeed, the circuit closes. If a probe fails, the circuit re-opens immediately.
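The three states and their transitions can be modeled in a few lines of Python. This is a minimal sketch for reasoning about the behavior described above, not pg_tide's actual implementation: the class name, method names, and the injectable clock are all hypothetical, and only the three parameter names come from the configuration.

```python
import time
from enum import Enum


class State(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"


class CircuitBreaker:
    """Illustrative model only; names and structure are assumptions."""

    def __init__(self, failure_threshold=5, success_threshold=3,
                 half_open_timeout_seconds=30, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.success_threshold = success_threshold
        self.half_open_timeout_seconds = half_open_timeout_seconds
        self._clock = clock  # injectable so the timeout can be tested
        self.state = State.CLOSED
        self._failures = 0   # consecutive failures while closed
        self._successes = 0  # consecutive successful probes while half-open
        self._opened_at = None

    def allow_request(self):
        """Return True if a publish attempt may contact the sink."""
        if self.state is State.OPEN:
            elapsed = self._clock() - self._opened_at
            if elapsed >= self.half_open_timeout_seconds:
                self.state = State.HALF_OPEN
                self._successes = 0
                return True  # this attempt is the first probe
            return False     # fail fast without contacting the sink
        return True

    def record_success(self):
        if self.state is State.HALF_OPEN:
            self._successes += 1
            if self._successes >= self.success_threshold:
                self.state = State.CLOSED
                self._failures = 0
        else:
            self._failures = 0  # any success resets the failure streak

    def record_failure(self):
        if self.state is State.HALF_OPEN:
            self._open()  # a failed probe re-opens immediately
        elif self.state is State.CLOSED:
            self._failures += 1
            if self._failures >= self.failure_threshold:
                self._open()

    def _open(self):
        self.state = State.OPEN
        self._opened_at = self._clock()
        self._failures = 0
```

A caller would check allow_request() before each publish attempt and report the outcome with record_success() or record_failure(). Passing a fake clock makes the open to half-open transition easy to exercise without waiting.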
Configuration
    SELECT tide.relay_set_outbox(
        'orders-pipeline',
        'order_events',
        '{
            "sink_type": "kafka",
            "brokers": "kafka:9092",
            "topic": "orders",
            "circuit_breaker": {
                "enabled": true,
                "failure_threshold": 5,
                "success_threshold": 3,
                "half_open_timeout_seconds": 30
            }
        }'::jsonb
    );
Configuration Reference
| Parameter | Type | Default | Description |
|---|---|---|---|
| circuit_breaker.enabled | bool | true | Enable circuit breaker |
| circuit_breaker.failure_threshold | int | 5 | Consecutive failures to open circuit |
| circuit_breaker.success_threshold | int | 3 | Consecutive successes to close from half-open |
| circuit_breaker.half_open_timeout_seconds | int | 30 | Seconds before open → half-open transition |
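Because every parameter has a default, a sink configuration only needs to list the values it overrides. The sketch below shows one way a client-side script might resolve the effective settings before calling tide.relay_set_outbox. The function name is hypothetical, and rejecting unknown keys or non-positive values is an assumption made for this example, not documented pg_tide behavior.

```python
# Defaults copied from the reference table above.
DEFAULTS = {
    "enabled": True,
    "failure_threshold": 5,
    "success_threshold": 3,
    "half_open_timeout_seconds": 30,
}


def resolve_circuit_breaker(sink_config: dict) -> dict:
    """Merge the circuit_breaker overrides in a sink config with the defaults."""
    overrides = sink_config.get("circuit_breaker", {})

    # Assumption: catch typos such as "half_open_timeout" early.
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown circuit_breaker keys: {sorted(unknown)}")

    resolved = {**DEFAULTS, **overrides}

    if not isinstance(resolved["enabled"], bool):
        raise ValueError("enabled must be a boolean")
    for key in ("failure_threshold", "success_threshold",
                "half_open_timeout_seconds"):
        value = resolved[key]
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or value < 1:
            raise ValueError(f"{key} must be a positive integer, got {value!r}")
    return resolved


# Only failure_threshold is overridden; the other values fall back to defaults.
effective = resolve_circuit_breaker(
    {"sink_type": "kafka", "circuit_breaker": {"failure_threshold": 10}}
)
```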
Behavior When Open
When the circuit is open:
- Messages are not sent to the sink (no wasted network calls)
- If a DLQ is configured, messages are routed there for later replay
- If no DLQ, the worker sleeps until the half-open timeout, then probes
- Prometheus metrics reflect the unhealthy state (pipeline_healthy = 0)
- The /health endpoint reports the pipeline as unhealthy
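The DLQ and no-DLQ branches above amount to a small decision per message. The following sketch models that decision only; the function name, the list standing in for a DLQ, and the returned tuples are hypothetical and do not mirror pg_tide's internals.

```python
def handle_while_open(message, dlq, seconds_until_probe):
    """Decide what happens to a message while the circuit is open.

    dlq is a list standing in for a configured dead letter queue, or None
    when no DLQ is configured. Returns ("dlq", 0.0) when the message was
    routed to the DLQ, or ("sleep", seconds) when the worker should wait
    for the half-open timeout before probing.
    """
    if dlq is not None:
        dlq.append(message)  # kept for later replay
        return ("dlq", 0.0)
    # No DLQ: nothing is sent; wait out the remaining timeout, never a
    # negative duration.
    return ("sleep", max(0.0, seconds_until_probe))
```

In both branches the sink itself is never contacted, which is the point of the open state.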
Tuning Guidelines
Low failure_threshold (2-3): Opens quickly, aggressive protection. Use for sinks that rarely have transient errors — if they fail twice, something is seriously wrong.
High failure_threshold (10-20): Tolerates intermittent failures. Use for sinks with occasional transient errors (network blips, DNS resolution hiccups).
Short half_open_timeout_seconds (5-15): Recovers quickly after brief outages. Use for sinks that recover fast (load-balanced services, managed cloud endpoints).
Long half_open_timeout_seconds (60-300): Gives downstream systems more time to recover. Use for sinks that take time to restart (database failovers, broker rebalances).
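As a starting point, the guidelines above can be captured as named presets that generate the circuit_breaker fragment of a sink configuration. The profile names below are hypothetical, and the specific values are simply picked from within the suggested ranges. Treat them as illustrations to adjust, not as recommended settings.

```python
import json

# Hypothetical presets; values chosen from the ranges in the guidelines above.
PROFILES = {
    # Sink rarely has transient errors and recovers fast.
    "strict_fast_recovery": {
        "failure_threshold": 3,
        "half_open_timeout_seconds": 10,
    },
    # Sink has occasional blips and is slow to restart.
    "tolerant_slow_recovery": {
        "failure_threshold": 15,
        "half_open_timeout_seconds": 120,
    },
}


def circuit_breaker_json(profile: str, success_threshold: int = 3) -> str:
    """Render the circuit_breaker section for the chosen preset as JSON."""
    settings = {
        "enabled": True,
        "success_threshold": success_threshold,
        **PROFILES[profile],
    }
    return json.dumps({"circuit_breaker": settings}, indent=2)


fragment = circuit_breaker_json("tolerant_slow_recovery")
```

The resulting fragment would still need to be merged with the sink's own keys (sink_type, brokers, topic, and so on) before being passed as the jsonb argument.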
Monitoring
The circuit breaker state is reflected in:
- Prometheus gauge: pg_tide_pipeline_healthy (1 = closed, 0 = open)
- Health endpoint: /health returns 503 when any pipeline's circuit is open
- Logs: State transitions logged at warn level (open) and info level (close)
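An external check can poll the health endpoint and act on the 503. The sketch below uses only the Python standard library. The base URL passed by the caller is hypothetical, and treating any 2xx response as healthy is an assumption, since only the 503 case is described above.

```python
import urllib.error
import urllib.request


def interpret_health_status(status_code: int) -> str:
    """Map an HTTP status from /health to a coarse pipeline state."""
    if status_code == 503:
        return "circuit_open"  # at least one pipeline's circuit is open
    if 200 <= status_code < 300:
        return "healthy"       # assumption: 2xx means all circuits closed
    return "unknown"


def check_health(base_url: str, timeout: float = 2.0) -> str:
    """Query <base_url>/health and classify the result."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return interpret_health_status(resp.status)
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for 4xx/5xx responses, including 503.
        return interpret_health_status(err.code)
    except (urllib.error.URLError, OSError):
        return "unreachable"
```

For example, check_health("http://localhost:9090") would query a relay listening on that (hypothetical) address. Because the endpoint reports 503 when any pipeline's circuit is open, identifying which pipeline is affected still requires the per-pipeline gauge or the logs.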
Further Reading
- Dead Letter Queue — Where messages go when circuit is open
- Rate Limiting — Complementary back-pressure mechanism
- Monitoring — Prometheus metrics reference