Delta Lake

Delta Lake is an open-source storage framework that brings ACID transactions and scalable metadata handling to data lakes. Originally created by Databricks, Delta Lake has become a foundational technology in the Databricks ecosystem and is widely adopted beyond it. When pg_tide delivers messages to Delta Lake, your PostgreSQL events are written as Parquet files with a transaction log that enables time travel, schema enforcement, and reliable upserts.

When to Use This Sink

Choose Delta Lake when your analytics platform is built on Databricks or Spark, when you need ACID transactions on object storage, or when you want to support both streaming and batch queries on the same dataset. Delta Lake's integration with Databricks Unity Catalog provides governance, lineage tracking, and fine-grained access control.

Configuration

SELECT tide.relay_set_outbox(
    'events-to-delta',
    'events',
    'delta-relay',
    '{
        "sink_type": "delta",
        "table_uri": "s3://my-lake/delta/events",
        "storage_options": {
            "AWS_ACCESS_KEY_ID": "${env:AWS_ACCESS_KEY_ID}",
            "AWS_SECRET_ACCESS_KEY": "${env:AWS_SECRET_ACCESS_KEY}",
            "AWS_REGION": "us-east-1"
        },
        "batch_size": 1000
    }'::jsonb
);

Configuration Reference

| Parameter       | Type   | Default    | Description                                    |
|-----------------|--------|------------|------------------------------------------------|
| sink_type       | string | (required) | Must be "delta"                                |
| table_uri       | string | (required) | Delta table location (S3/GCS/ADLS/local path)  |
| storage_options | object | {}         | Cloud storage credentials and options          |
| batch_size      | int    | 1000       | Records per commit                             |
| mode            | string | "append"   | Write mode: "append" or "overwrite"            |
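
For example, to replace the table's contents on each commit instead of appending, the sink configuration would carry an explicit mode (a sketch of the jsonb fragment only; the other fields are as in the example above):

    {
        "sink_type": "delta",
        "table_uri": "s3://my-lake/delta/events",
        "mode": "overwrite"
    }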

How It Works

Each batch of messages is written as a Parquet data file to the Delta table location. The relay then atomically commits the file to the Delta transaction log (_delta_log/). This ensures that readers always see complete, consistent batches. Failed or partial writes do not affect the table state.
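The write-then-commit sequence can be sketched in Python. This is a simplified illustration, not pg_tide's actual implementation: JSON stands in for Parquet, the log entry carries only a minimal "add" action, and the single-writer version numbering ignores concurrent-commit conflicts.

```python
import json
import os
import tempfile

def commit_batch(table_dir: str, records: list[dict]) -> str:
    """Write a batch as a data file, then atomically commit it to the log."""
    log_dir = os.path.join(table_dir, "_delta_log")
    os.makedirs(log_dir, exist_ok=True)

    # Phase 1: write the data file. Until a log entry references it,
    # readers treat it as orphaned and ignore it.
    data_name = f"part-{os.urandom(4).hex()}.parquet"
    with open(os.path.join(table_dir, data_name), "w") as f:
        json.dump(records, f)  # stand-in for a real Parquet writer

    # Phase 2: publish a commit entry. The log file name is the
    # zero-padded version number; rename makes it appear atomically.
    version = len(os.listdir(log_dir))
    commit = {"add": {"path": data_name, "dataChange": True}}
    fd, tmp = tempfile.mkstemp(dir=table_dir)
    with os.fdopen(fd, "w") as f:
        json.dump(commit, f)
    os.rename(tmp, os.path.join(log_dir, f"{version:020}.json"))
    return data_name
```

Because the log entry appears atomically via rename, a reader listing _delta_log sees the batch either fully committed or not at all.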

Delivery Guarantees

At-least-once delivery. Delta Lake's transaction log is append-only, and each commit is atomic. If the relay crashes after writing a data file but before committing it, the orphaned Parquet file is ignored by readers and the batch is redelivered on restart, which can introduce duplicates. Handle duplicates with Delta Lake's MERGE operation or by deduplicating in downstream queries.
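
For example, assuming the delivered events carry a unique id column, a downstream MERGE (shown in Databricks/Delta SQL; the table and column names here are illustrative, not part of pg_tide) can collapse redelivered rows into the target table:

    MERGE INTO events AS target
    USING staged_events AS source
      ON target.id = source.id
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *;

Matching rows are overwritten rather than duplicated, so replaying a batch converges to the same table state.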

Troubleshooting

  • "Access denied" — Check storage credentials in storage_options
  • "Table does not exist" — Create the table first using Spark or delta-rs, or enable auto-creation
  • "Conflict during commit" — Concurrent writers detected; relay retries automatically

Further Reading