Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

Wire Format: Debezium

The Debezium wire format produces and consumes messages in the same shape as Debezium, the popular open-source CDC platform. This means pg_tide can be a drop-in replacement for Debezium in existing architectures — your Kafka consumers, ksqlDB queries, Flink jobs, and stream processors continue working without modification.

Encoded Format (Outbox → Sink)

For an INSERT operation:

{
  "schema": { ... },
  "payload": {
    "before": null,
    "after": {
      "order_id": "ORD-001",
      "status": "confirmed",
      "total": 99.95
    },
    "op": "c",
    "ts_ms": 1714029482000,
    "source": {
      "version": "pg-tide",
      "connector": "postgresql",
      "name": "production",
      "ts_ms": 1714029482000,
      "db": "mydb",
      "schema": "public",
      "table": "orders",
      "lsn": 12345678
    }
  }
}

For a DELETE operation with tombstones enabled, two messages are produced:

  1. The delete event (with before populated and after as null)
  2. A tombstone message (null value with the same key) for log compaction

Operation Mapping

pg_tide opDebezium opNotes
insertc (create)before = null, after = new row
updateu (update)before = old row, after = new row
deleted (delete)before = old row, after = null

Decoded Format (Source → Inbox)

When consuming Debezium messages (e.g., from an actual Debezium deployment feeding Kafka), pg_tide extracts:

FieldSource
event_idFrom message key or generated UUID
event_type{source.db}.{source.table}
opMapped from payload.op: c→insert, u→update, d→delete, r→insert/upsert
payloadFrom payload.after (or payload.before for deletes)
old_payloadFrom payload.before (for updates)
commit_tsFrom payload.source.ts_ms
source_positionFrom payload.source.lsn

Snapshot reads (op: "r") are treated as inserts by default, configurable via snapshot_op_treatment.

Configuration

[[pipelines]]
name = "orders-to-kafka"
wire_format = "debezium"

[pipelines.wire_config]
server_name = "production"
emit_tombstones = true
envelope = "json"
tombstone_handling = "delete"
key_strategy = "primary_key"
snapshot_op_treatment = "insert"
heartbeat_interval_ms = 10000

Configuration Reference

ParameterTypeDefaultDescription
server_namestring"pg-tide"Logical name in source.name field
envelopestring"json"Envelope type: "json" or "avro"
emit_tombstonesbooltrueEmit null-value tombstone after DELETE
tombstone_handlingstring"delete"How to handle incoming tombstones: "delete" or "drop"
key_strategystring"primary_key"Message key: "primary_key" or "message_key"
snapshot_op_treatmentstring"insert"Treat op=r as: "insert" or "upsert"
heartbeat_interval_msint10000Heartbeat emission interval (0 = disabled)
schema_registry_urlstringnullConfluent Schema Registry URL (for Avro)

Avro Support

When envelope is set to "avro", messages are serialized using Apache Avro with schemas registered in a Confluent-compatible Schema Registry. This provides:

  • Compact binary encoding (smaller messages)
  • Schema evolution with compatibility checks
  • Integration with the Confluent ecosystem
[pipelines.wire_config]
envelope = "avro"
schema_registry_url = "http://schema-registry:8081"

Schema Evolution

The Debezium format tracks schema changes. When a new column appears in the outbox, it's added to the Avro schema or JSON structure automatically. Incompatible changes (column removal) are detected and reported.

Tombstones and Log Compaction

Kafka topic compaction uses null-value messages (tombstones) to signal that a key should be removed. When emit_tombstones is true, a DELETE operation produces two messages:

  1. Delete event — Contains the old row state in before
  2. Tombstone — Null value with the same key, enabling compaction to remove the key

This is essential for Kafka topics configured with cleanup.policy=compact.

Migrating from Debezium to pg_tide

If you're currently using Debezium and want to switch to pg_tide:

  1. Configure pg_tide with wire_format = "debezium" and matching server_name
  2. Consumers see identical message shapes — no changes needed
  3. The source.version field changes to "pg-tide" but this rarely matters
  4. You lose Debezium-specific features (schema history topic) but gain transactional outbox guarantees

Further Reading