pg_trickle

pg_trickle is a PostgreSQL 18 extension that adds self-maintaining materialized views — stream tables — and keeps them up to date incrementally as the underlying data changes. No external streaming engine, no sidecars, no bespoke refresh pipeline. Just install the extension and write SQL.

SELECT pgtrickle.create_stream_table(
    name     => 'active_orders',
    query    => 'SELECT * FROM orders WHERE status = ''active''',
    schedule => '30s'
);

INSERT INTO orders (id, status) VALUES (42, 'active');
SELECT count(*) FROM active_orders;  -- 1, after the next scheduled refresh (≤ 30 s)

New here? Read What is pg_trickle? for the plain-language overview, or jump to the 5-Minute Quickstart to try it. First time installing? See the Installation Guide.


How it works

pg_trickle keeps stream tables current by tracking every change to the source tables — inserts, updates, and deletes — and recomputing only the parts of the view that are affected by those changes. This is called differential (or incremental) view maintenance. Instead of re-running the full query on every refresh cycle, pg_trickle applies a delta computation proportional to the number of changed rows, not the total table size. A stream table over a billion-row orders table refreshes in milliseconds when only a few rows changed.

Change capture works through row-level AFTER triggers (the default) or WAL-based logical decoding (cdc_mode = 'wal' or the automatic 'auto' mode). Trigger-based capture writes changed rows into a per-source change-buffer table within the same transaction, so a committed change can never be missed. The background scheduler reads from the change buffer, computes the delta SQL, and applies the result to the stream table using MERGE in a separate transaction.
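As a sketch of opting in (the cdc_mode parameter name follows the setting above; treat the exact call signature as illustrative):

SELECT pgtrickle.create_stream_table(
    name     => 'active_orders_wal',
    query    => 'SELECT * FROM orders WHERE status = ''active''',
    schedule => '30s',
    cdc_mode => 'wal'   -- 'trigger' (default) | 'wal' | 'auto'
);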

For queries that cannot be maintained incrementally (non-monotonic functions, LATERAL with volatile sub-expressions, etc.), pg_trickle automatically falls back to a full refresh — replacing the entire stream table contents in a single transaction. You can also force full mode explicitly or let the cost-based AUTO strategy choose per-refresh based on the change-to-table-size ratio.
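For example — the query here is deliberately non-incremental (a volatile LATERAL), and the table and function names are illustrative:

SELECT pgtrickle.create_stream_table(
    'order_enrichment',
    $$SELECT o.id, d.details
      FROM orders o,
           LATERAL fetch_details(o.id) AS d(details)$$,  -- volatile → not differentiable
    schedule     => '1m',
    refresh_mode => 'FULL'   -- or 'AUTO' to let the cost model choose per refresh
);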


Choose your path

Persona                                     Start here
Curious / evaluator                         What is pg_trickle? · Use Cases · Comparisons · Playground
Application developer                       5-Minute Quickstart · Getting Started tutorial · Patterns · SQL Reference
DBA / SRE                                   Pre-Deployment Checklist · Configuration · Troubleshooting · Capacity Planning
Data / analytics engineer                   Use Cases · dbt integration · Migrating from materialized views
Building a dashboard backend                Real-Time Analytics Dashboard tutorial
Event-sourced architecture                  Event Sourcing / CQRS tutorial
Migrating from REFRESH MATERIALIZED VIEW    Backfill and Migration tutorial
Hardening a production deployment           Security Hardening tutorial · Security Guide
Confused by jargon                          Glossary

What's new

See What's New for a curated summary of recent releases, or the Changelog for the full history.


Project & licensing

pg_trickle is open source under the Apache 2.0 license. Source, issues, and roadmap live at https://github.com/trickle-labs/pg-trickle.

What is pg_trickle?

pg_trickle is a PostgreSQL 18 extension that adds stream tables — tables that are defined by a SQL query and stay up to date automatically as the underlying data changes. No external process, no streaming engine, no pipeline to operate. Just install the extension and write SQL.

If you have ever wished CREATE MATERIALIZED VIEW would just keep itself fresh, this is that.


The problem

PostgreSQL's materialized views are powerful but frustrating. REFRESH MATERIALIZED VIEW re-runs the entire query from scratch, even if only one row changed in a million-row table. Your choices are:

  1. Burn CPU on full recomputation, on a schedule, and hope you refresh often enough.
  2. Accept stale data, and try to explain that to the dashboard user.
  3. Build a bespoke refresh pipeline — Debezium, Kafka Connect, a streaming engine, a separate read database. Now you have two systems to operate.

Most teams pick option 3 and end up maintaining infrastructure that is more complex than the application it supports.


What pg_trickle does instead

You declare a stream table with a SQL query and a schedule:

SELECT pgtrickle.create_stream_table(
    name     => 'revenue_by_region',
    query    => $$
        SELECT c.region,
               COUNT(*)                  AS order_count,
               SUM(o.quantity * p.price) AS total_revenue
        FROM orders   o
        JOIN customers c ON c.id = o.customer_id
        JOIN products  p ON p.id = o.product_id
        GROUP BY c.region
    $$,
    schedule => '1s'
);

-- Read it like any table — always fresh.
SELECT * FROM revenue_by_region ORDER BY total_revenue DESC;

Behind the scenes:

  1. pg_trickle parses your query into an operator tree (scans, joins, aggregates).
  2. It captures every INSERT, UPDATE, and DELETE on the source tables — by default with lightweight row-level triggers, with no replication slots required.
  3. On each refresh cycle (every 1s in the example above) it derives a delta query from the operator tree — the SQL that computes only the change since the last refresh — and merges the result into the stream table.

When you insert one row into a million-row source table, pg_trickle processes one row's worth of computation. Not a million.


Why people care

A few things make this materially different from "just another IVM project":

No external infrastructure. No Kafka, no Flink, no Debezium, no sidecar. The extension lives inside PostgreSQL and uses only its built-in mechanisms.

Stream tables can depend on stream tables. A single write to a base table can ripple through a graph of derived tables, each refreshed in the right order, each doing only the work proportional to what actually changed.

Demand-driven scheduling. With the default CALCULATED schedule mode, you only set a refresh interval on the consumer-facing stream tables — the ones your application actually reads. Upstream stream tables inherit the tightest cadence among their downstream dependents. You declare freshness where it matters; the system propagates it everywhere else.
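A minimal sketch (table and column names are illustrative): set a schedule only on the table your application reads, and leave the upstream table on CALCULATED:

-- Upstream: no cadence of its own; inherits from downstream consumers.
SELECT pgtrickle.create_stream_table(
    'orders_clean',
    $$SELECT DISTINCT ON (id) * FROM orders ORDER BY id, updated_at DESC$$,
    schedule => 'CALCULATED'
);

-- Downstream: the consumer-facing table declares the freshness target.
SELECT pgtrickle.create_stream_table(
    'orders_per_day',
    $$SELECT created_at::date AS day, COUNT(*) AS orders
      FROM orders_clean GROUP BY 1$$,
    schedule => '5s'   -- orders_clean now refreshes at least this often
);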

Hybrid change capture. It bootstraps with row-level triggers, which always work. If wal_level = logical is available, it transitions automatically to WAL-based capture for near-zero write-side overhead. The transition is seamless. If anything goes wrong, it falls back to triggers.
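The only prerequisite for the WAL backend is ordinary PostgreSQL configuration:

ALTER SYSTEM SET wal_level = 'logical';
-- restart the server; pg_trickle then transitions on its own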

Broad SQL coverage. Joins (inner, left, right, full outer, lateral), GROUP BY with 30+ aggregates, DISTINCT, UNION/INTERSECT/EXCEPT, subqueries (EXISTS, IN, scalar), CTEs including WITH RECURSIVE, window functions, and most of standard SQL. See docs/SQL_REFERENCE.md for the complete matrix.

Inside the same transaction, if you want it. IMMEDIATE refresh mode maintains the stream table inside the same transaction as the source DML, giving read-your-writes consistency without any background worker.
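Assuming active_orders from the opening example had been created with refresh_mode => 'IMMEDIATE', the effect is visible to the writing transaction itself:

BEGIN;
INSERT INTO orders (id, status) VALUES (43, 'active');
SELECT count(*) FROM active_orders;   -- already reflects row 43
COMMIT;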


Where the design comes from

The mathematical foundation is the DBSP differential dataflow framework (Budiu et al., 2022). Delta queries are derived automatically from your SQL's operator tree:

  • joins produce the classic bilinear expansion,
  • aggregates maintain auxiliary counters,
  • linear operators like filters and projections pass deltas through unchanged.

You do not need to know any of this to use the extension; the rules are baked in. If you are curious, the DVM Operators reference and the DBSP Comparison explain how pg_trickle relates to the original.
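For a flavour of what a derived delta looks like, here is the join rule written out as plain SQL — a hand-written sketch with illustrative table names, not pg_trickle's actual output (inspect the real generated SQL with pgtrickle.explain_st):

-- Δ(A ⋈ B) = ΔA ⋈ B_old  ∪  A_old ⋈ ΔB  ∪  ΔA ⋈ ΔB
SELECT da.id, b.payload  FROM delta_a da JOIN b_old   b  ON b.a_id  = da.id
UNION ALL
SELECT a.id,  db.payload FROM a_old   a  JOIN delta_b db ON db.a_id = a.id
UNION ALL
SELECT da.id, db.payload FROM delta_a da JOIN delta_b db ON db.a_id = da.id;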


Performance, briefly

Performance is a primary goal. Maximum throughput, low latency, and minimal overhead drive every design decision. Differential refresh is the default; full refresh is a fallback of last resort.

A few headline numbers from the TPC-H validation:

  • All 22 standard analytical queries pass in DIFFERENTIAL, IMMEDIATE, and FULL modes, with identical results across modes.
  • 5–90× measured speedup over FULL across the suite at 1% change rate.
  • The bottleneck is PostgreSQL's own MERGE, not pg_trickle's pipeline.

For workloads with high write volume, hybrid CDC and column-level change tracking keep the write-side overhead low (sub-microsecond per row in WAL mode).


What it's built on

  • Language: Rust, using pgrx 0.18.
  • Targets: PostgreSQL 18.
  • License: Apache 2.0.
  • Status: active development, approaching 1.0. APIs may still change between minor versions; see ROADMAP.md.
  • Tests: thousands of unit, integration, and end-to-end tests. TPC-H 22/22 in all modes.

Try it

The fastest paths from zero to a working stream table:

# Option 1 — playground (sample data, dashboards)
cd playground && docker compose up -d

# Option 2 — minimal Docker image
docker run --rm -e POSTGRES_PASSWORD=secret -p 5432:5432 \
  ghcr.io/trickle-labs/pg_trickle:latest

Then read the 5-Minute Quickstart.


Where to go next

Audience                     Start here
Curious / evaluator          Use Cases · Comparisons
Application developer        Quickstart (5 min) · Tutorial (in depth) · Patterns
DBA / SRE                    Pre-Deployment Checklist · Configuration · Troubleshooting
Data / analytics engineer    Use Cases · dbt integration · Migrating from materialized views

Source: https://github.com/trickle-labs/pg-trickle

Note on terminology. A few terms are used throughout the docs without further definition: stream table, differential, delta query, DAG, frontier. The Glossary defines all of them.

Use Cases

pg_trickle is for any team that has wanted PostgreSQL to keep a derived table up to date automatically — without writing a refresh cron, an external streaming pipeline, or a custom CDC consumer.

This page is a gallery of the most common things people build with stream tables. Each section tells you what the pattern looks like, gives you a minimal SQL example, and links to a deeper guide.

New to stream tables? Read What is pg_trickle? first, or jump straight to the 5-minute Quickstart.


At a glance

Real-time dashboards · Operational read models / CQRS · Fraud detection & alerting · Leaderboards & TopK · Bronze/Silver/Gold pipelines · Event-driven services · Cross-system replication · Slowly-changing dimensions (SCD) · Multi-tenant analytics · Citus distributed analytics · dbt-managed warehouse models

1. Real-time dashboards

You want KPI tiles ("orders today", "revenue per region", "active users this hour") to update within a second or two of the underlying data changing. With pg_trickle you write the SQL once and any number of dashboards (Grafana, Metabase, Looker, a custom React app) just read from the stream table.

SELECT pgtrickle.create_stream_table(
    'kpi_revenue_today',
    $$SELECT region, SUM(amount) AS revenue
      FROM orders
      WHERE created_at >= date_trunc('day', now())
      GROUP BY region$$,
    schedule => '2s'
);

Why it's a fit: aggregates over high-cardinality groups are exactly where DIFFERENTIAL refresh wins biggest.


2. Operational read models / CQRS

A microservice writes to a normalised event/order/customer table; a read API needs the denormalised projection (one row per order with customer name, current status, latest payment). Most teams build this with a separate read database and a CDC pipeline. With pg_trickle the projection is just a stream table sitting next to the write tables.

SELECT pgtrickle.create_stream_table(
    'order_view',
    $$SELECT o.id, o.placed_at, c.name AS customer,
             p.status AS payment_status,
             s.shipped_at
      FROM orders o
      JOIN customers c ON c.id = o.customer_id
      LEFT JOIN payments p ON p.order_id = o.id
      LEFT JOIN shipments s ON s.order_id = o.id$$,
    refresh_mode => 'IMMEDIATE'
);

Why it's a fit: IMMEDIATE mode gives you read-your-writes consistency without a second database.


3. Fraud detection, alerting, anomaly detection

Define a stream table that flags suspicious activity (large transactions, velocity rules, unusual geographies). Subscribe an alerter to that stream table — either via PostgreSQL LISTEN/NOTIFY, a downstream publication, or by polling.

SELECT pgtrickle.create_stream_table(
    'high_velocity_accounts',
    $$SELECT account_id, COUNT(*) AS txn_count, SUM(amount) AS total
      FROM transactions
      WHERE occurred_at >= now() - interval '5 minutes'
      GROUP BY account_id
      HAVING COUNT(*) > 20$$,
    schedule => '1s'
);

The demo ships a full 9-node fraud-detection DAG you can run locally with docker compose up.


4. Leaderboards & TopK

ORDER BY ... LIMIT N stream tables are a special case where pg_trickle stores only the top N rows and updates them with scoped recomputation when the changes affect the leaderboard.

SELECT pgtrickle.create_stream_table(
    'top_10_customers',
    $$SELECT customer_id, SUM(amount) AS lifetime_spend
      FROM orders
      GROUP BY customer_id
      ORDER BY lifetime_spend DESC
      LIMIT 10$$,
    schedule => '5s'
);

5. Bronze / Silver / Gold

A medallion architecture in PostgreSQL alone:

  • Bronze – raw ingest table (a regular table you write to).
  • Silver – cleaned and deduplicated stream table.
  • Gold – business-level aggregates stream table that depends on Silver.

Set the schedule only on Gold; pg_trickle propagates the cadence upstream automatically (CALCULATED scheduling).

See Patterns §1.


6. Event-driven services

The transactional outbox/inbox pattern, native to PostgreSQL:

  • Outbox — write events in the same transaction as your business data; an external system drains them to Kafka / NATS / SQS / a webhook.
  • Inbox — receive events idempotently from an external system; stream tables give you live views of pending work, retries, and a dead-letter queue.

See Transactional Outbox and Transactional Inbox.


7. Cross-system replication

Once a stream table exists, you can expose it as a standard PostgreSQL logical-replication publication. Anything that speaks logical replication — Debezium, Kafka Connect, a downstream PostgreSQL replica, a Spark Structured Streaming job, a custom WAL consumer — can subscribe to live changes.

SELECT pgtrickle.stream_table_to_publication('order_view');
-- Subscribers can now subscribe to publication pgt_pub_order_view

See Downstream Publications.


8. Slowly-changing dimensions (SCD)

Type 2 SCDs (one row per version, with valid_from / valid_to) fall out naturally from a stream table over an event log. See Patterns §3.


9. Multi-tenant analytics

If your application is multi-tenant, keyed by a tenant_id column, you can build per-tenant aggregates as a single stream table grouped by tenant_id and protect access with PostgreSQL Row-Level Security. See Multi-tenant integration and the Row-Level Security tutorial.
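A sketch (schema and setting names are illustrative; the Row-Level Security tutorial has the full pattern), relying on the stream table's storage being a regular table to which standard RLS applies:

SELECT pgtrickle.create_stream_table(
    'tenant_usage',
    $$SELECT tenant_id, COUNT(*) AS events, SUM(amount) AS spend
      FROM usage_events GROUP BY tenant_id$$,
    schedule => '10s'
);

ALTER TABLE tenant_usage ENABLE ROW LEVEL SECURITY;
CREATE POLICY tenant_isolation ON tenant_usage
    USING (tenant_id = current_setting('app.current_tenant')::bigint);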


10. Citus distributed analytics

pg_trickle works on Citus-distributed source tables. The scheduler polls per-worker WAL slots via dblink, merges changes on the coordinator, and applies the delta — automatically and idempotently. See Citus.


11. dbt-managed warehouse models

Use the stream_table materialization in dbt. No custom adapter needed — works with the standard dbt-postgres adapter.

{{ config(materialized='stream_table', schedule='5m', refresh_mode='AUTO') }}
SELECT customer_id, SUM(amount) AS total
FROM {{ source('raw', 'orders') }}
GROUP BY customer_id

See dbt integration.


When pg_trickle is not the right fit

A short, honest list — knowing when to look elsewhere is a feature.

  • Pure OLTP with no derived state. If you don't have anything to materialize, you don't need a materializer.
  • Sub-millisecond-latency derived state. IMMEDIATE mode is fast, but it pays the differential cost inside your transaction. If you need every transaction to commit in < 1 ms, benchmark first.
  • Stateless event transformation only. If you want "transform each Kafka event" with no stored state, a stream processor (Flink, Bytewax) is closer to that shape.
  • Cross-database joins at scale. pg_trickle reads from local PostgreSQL tables (or Citus distributions). For federated joins across many heterogeneous databases, consider a streaming engine.
  • Workloads where you cannot install extensions. Some managed PostgreSQL services don't allow third-party extensions. Check Installation for the support matrix.

See also: Patterns · Performance Cookbook · SQL Reference · Comparisons

Comparisons

This page compares pg_trickle to adjacent tools so you can decide whether it's the right fit. Each comparison is a short, honest summary — strengths, weaknesses, and "use this instead if…".

If you are evaluating pg_trickle from a specific tool you already run, jump to the relevant section. If you want a deeper academic comparison, see also DBSP Comparison, pg_ivm Comparison, and Prior Art.


At a glance

Tool                              Lives in PostgreSQL?   Incremental?   External infra?               Best for
pg_trickle                        ✅                      ✅              none                          Self-maintaining materialized views inside one PostgreSQL
REFRESH MATERIALIZED VIEW         ✅                      ✕              none                          Periodic full recomputation, no automation
pg_ivm                            ✅                      ✅ (limited)    none                          Incremental views with a smaller SQL surface
Materialize                       ✕ (own engine)         ✅              whole new database            Cross-source streaming SQL
RisingWave                        ✕ (own engine)         ✅              whole new database            Streaming SQL with PostgreSQL wire compat
Apache Flink                      ✕                      ✅              JVM cluster + state backend   Stateful event processing at scale
Debezium + sink                   ✕ (CDC only)           ✕              Kafka + Connect               Replicating change events out of PostgreSQL
ksqlDB                            ✕                      ✅              Kafka cluster                 Streaming SQL on top of Kafka
Snowflake Dynamic Tables          ✕                      ✅              Snowflake                     Auto-refreshing tables in Snowflake
Custom cron + materialized view   ✅                      ✕              none                          What teams build before they find pg_trickle

vs. PostgreSQL REFRESH MATERIALIZED VIEW

The question this answers: "I'm already using materialized views — what would I gain?"

                           REFRESH MATERIALIZED VIEW            pg_trickle stream table
Refresh trigger            Manual (or your cron)                Schedule, transition, or in-transaction (IMMEDIATE)
Refresh cost               Always full recomputation            Incremental (delta only) for most queries
Cross-table dependencies   Manual coordination                  DAG-aware topological refresh
Concurrency                CONCURRENTLY requires unique index   Always non-blocking; advisory locks coordinate
Read-your-writes           Not possible                         IMMEDIATE mode
Operator coverage          Anything PostgreSQL supports         A large but explicit subset (see SQL Reference)

Use vanilla materialized views if: you only refresh occasionally, your data is small, and you do not have a chain of dependent views.

Switch to pg_trickle if: any of those things stop being true.


vs. pg_ivm

The question this answers: "There's another PostgreSQL extension in this space — how do they relate?"

pg_ivm is an open-source IVM extension that pioneered much of the relevant work in PostgreSQL land. The two projects have different scopes.

                     pg_ivm                                    pg_trickle
Maturity             First released 2022                       First released 2024
Refresh model        Trigger-driven, statement-by-statement    Trigger or WAL CDC + scheduler + DAG
SQL coverage         Aggregates, simple joins, sub-queries     Full DBSP-style coverage incl. WITH RECURSIVE, window functions, FULL OUTER JOIN, LATERAL, GROUPING SETS, scalar subqueries
Cross-table chains   Manual                                    DAG with topological refresh and CALCULATED schedules
Modes                Always immediate                          AUTO / DIFFERENTIAL / FULL / IMMEDIATE
Distributed          ✕                                         Citus integration
Operations           Minimal tooling                           Health-check, fuse, parallel refresh, snapshots, dbt

There is a more thorough side-by-side at research/PG_IVM_COMPARISON.md.

If your queries are simple aggregates and you want the smallest possible install footprint, pg_ivm is a perfectly good choice. If you want broader SQL, multi-layer DAGs, or operational tooling, pg_trickle is closer to that shape.


vs. Materialize

Materialize is a cloud-native database built specifically for incremental view maintenance. It is the inspiration for much of this space.

                                 Materialize                                             pg_trickle
Deployment                       Separate cloud database (or self-hosted server)         Extension inside PostgreSQL
Source coverage                  PostgreSQL, Kafka, S3, MySQL, …                         PostgreSQL tables (incl. Citus, foreign tables)
Latency                          Streaming, sub-second                                   Sub-second with 1s schedule; in-transaction with IMMEDIATE
Joins / aggregates / recursion   Yes, very mature                                        Yes
Pricing                          Commercial cloud product                                Open-source, runs anywhere PostgreSQL runs
Operational footprint            Managed service or significant self-hosted commitment   Add-on to existing PostgreSQL

Use Materialize if: you want one engine to materialise across many heterogeneous sources, you want true streaming semantics, and you are happy operating a separate database.

Use pg_trickle if: your data lives in PostgreSQL and you want the materialisation to live there too.


vs. RisingWave

RisingWave is a PostgreSQL-wire-compatible streaming database in Rust. Like Materialize, it is its own engine that you deploy alongside (or instead of) PostgreSQL.

The same trade-off applies: RisingWave is a richer streaming engine; pg_trickle is the answer if you do not want to operate a second database.


vs. Apache Flink

Flink is a general stateful stream processor. It can do everything pg_trickle can and a lot more — including state-machine workflows, event-time semantics, and complex windowing.

The trade-off is operational. Flink wants a JVM cluster, a state backend (RocksDB / S3), checkpointing, savepoint management, a schema registry, and so on. For "I want my materialized views to update themselves", that is overkill.

Use Flink if: you have stateful event processing that goes beyond derived tables — state machines, complex CEP, multi-source joins at high throughput.

Use pg_trickle if: you want stream-table semantics and you are already running PostgreSQL.


vs. Debezium + sink (Kafka Connect, etc.)

Debezium captures changes from PostgreSQL and emits them onto Kafka (or another stream). It is only the change-capture half of the problem — you still need a downstream consumer that turns those changes into a derived table.

                                   Debezium             pg_trickle
Captures changes from PostgreSQL   ✅                    ✅ (built-in CDC)
Computes derived tables            ✕ (you write that)   ✅
Kafka required                     ✅                    ✕
Downstream sinks                   Many                 Logical replication via downstream publications

Use Debezium if: you need to fan changes out to many heterogeneous downstream systems (Elasticsearch, S3, Snowflake, a data lake).

Use pg_trickle if: you want the derived table to live in PostgreSQL itself. You can still expose stream-table changes via downstream publications — and even use Debezium to read those.


vs. ksqlDB

ksqlDB gives you streaming SQL on top of Kafka. Same trade-off as Materialize/RisingWave: another engine, another set of operational concerns.

If your data already lives in Kafka and you want SQL on it, ksqlDB is a fine choice. If your data lives in PostgreSQL, pg_trickle is closer to where it already is.


vs. Snowflake Dynamic Tables

Snowflake Dynamic Tables are auto-refreshing tables inside Snowflake. They occupy almost exactly the same conceptual slot as pg_trickle — but in a different database.

Use whichever matches the database you have.


vs. "cron + REFRESH MATERIALIZED VIEW"

This is what most teams build before they find a real IVM tool. It works, until:

  • Refreshes start to overlap.
  • A long refresh blocks readers.
  • The refresh becomes too expensive to run as often as you'd like.
  • A second view depends on the first and you start writing ordering logic.
  • A failure leaves stale data and nobody notices.

When that happens, pg_trickle's quick start is ~5 minutes of setup.


See also: Use Cases · Migrating from materialized views · Migrating from pg_ivm · Research and prior art

Glossary

A plain-language reference for the terms used throughout the pg_trickle documentation. If a term isn't here, check the FAQ — and please open an issue so we can add it.

How to use this page. Most pg_trickle pages link the first use of a jargon term back to the matching entry below. You can also search this page directly; the entries are grouped by topic.


Core concepts

Stream table

A table whose contents are defined by a SQL query, and that pg_trickle keeps up to date automatically as the underlying data changes. Think of it as a materialized view that maintains itself — without you ever calling REFRESH MATERIALIZED VIEW.

Defining query

The SQL SELECT statement you give to pgtrickle.create_stream_table(). It can use joins, aggregates, CTEs, window functions, and most of standard SQL. The defining query is what pg_trickle differentiates to compute deltas.

Source table

A regular PostgreSQL table that a stream table reads from. Source tables are written to in the normal way (INSERT, UPDATE, DELETE); pg_trickle captures those writes and propagates them downstream.

Base table

Synonym for source table. Used interchangeably in older docs.

Schedule

How often a stream table refreshes. May be a duration ('5s', '10m'), a cron expression ('@hourly', '0 * * * *'), the special value 'CALCULATED' (derived from downstream consumers), or NULL (only refresh when called manually, or for IMMEDIATE mode).

Refresh

A single round of bringing a stream table up to date with its sources. Each refresh either rewrites the whole result (FULL) or applies only the incremental change (DIFFERENTIAL).

Refresh mode

Tells pg_trickle how to refresh. The four modes:

  • AUTO — pick the cheapest mode each cycle (the default).
  • DIFFERENTIAL — incremental: only changed rows are processed.
  • FULL — re-run the entire defining query.
  • IMMEDIATE — refresh inside the same transaction as the source DML (no scheduler involved).

Incremental View Maintenance (IVM)

The technique of updating a materialized view by computing only the change induced by recent edits, rather than re-running the whole query. pg_trickle is an IVM engine for PostgreSQL.

Differential

Synonym for "incremental" in the IVM sense. Also the name of the refresh mode that uses incremental computation. Inspired by differential dataflow and the DBSP framework — see also delta query below.

Delta query (ΔQ)

The SQL pg_trickle generates internally to compute the change in a stream table given a change in its inputs. ΔQ is derived automatically from the defining query's operator tree. You can inspect it with pgtrickle.explain_st(name).

Operator tree

The internal representation of your defining query — a tree of nodes like scan, filter, join, aggregate. pg_trickle differentiates this tree operator by operator to derive the delta query.

DAG (directed acyclic graph)

The shape of your stream-table dependencies. If stream table B reads from stream table A, there is an edge A → B. pg_trickle refreshes the DAG in topological order so that downstream tables always see consistent upstream state.

Diamond (in a DAG)

A pattern where two parallel branches both depend on a common ancestor and both feed into a common descendant (A → B → D and A → C → D). pg_trickle refreshes diamonds atomically to prevent the descendant from seeing one branch updated and the other not.

SCC (strongly connected component)

A cycle in the dependency graph — a group of stream tables that all transitively depend on each other. pg_trickle supports cycles only for monotone queries and only when explicitly enabled (pg_trickle.allow_circular). See Circular Dependencies.


Change capture

CDC (Change Data Capture)

The mechanism that records every INSERT, UPDATE, and DELETE on a source table so pg_trickle knows what changed since the last refresh. pg_trickle has two CDC backends — see CDC Modes.

Trigger-based CDC

The default backend. Lightweight AFTER row-level triggers on each source table write a single row to a change buffer per data change. Cost is roughly 2–15 µs per row, paid by the writing transaction.

WAL-based CDC

The optional backend that uses PostgreSQL's logical replication to read changes from the write-ahead log instead of via triggers. Requires wal_level = logical. Adds near-zero write-side cost.

Hybrid CDC

The default behaviour: pg_trickle starts with triggers (which always work), and if wal_level = logical is available, transitions automatically to WAL once the first refresh succeeds. If anything goes wrong, it falls back to triggers.

Change buffer

A small per-source table in the pgtrickle_changes.* schema that holds captured changes between refreshes. Each refresh drains the relevant rows.

Compaction

Collapsing redundant entries in a change buffer — for example an INSERT followed by a matching DELETE cancels out. pg_trickle compacts buffers automatically when refreshes are batched.

Watermark

A timestamp or position published by an external loader that tells pg_trickle "you can safely consider data through here as complete". Downstream refreshes wait until all relevant watermarks are aligned. Used in ETL bootstrap patterns.

Frontier

A set of per-source positions (LSN or logical timestamps) that records where each input was up to at the moment of the last refresh. The next refresh reads from frontier→now. Frontiers are how pg_trickle guarantees correctness across multiple sources.

LSN (Log Sequence Number)

A PostgreSQL identifier for a position in the write-ahead log. Looks like 16/B374D8A0. pg_trickle uses LSNs to record frontiers in WAL CDC mode.


Refresh & scheduling

CALCULATED schedule

The default. Set an explicit refresh interval only on the consumer-facing stream tables (the ones your application queries). Upstream stream tables inherit the tightest cadence among their downstream dependents automatically.

Tier (Hot / Warm / Cold / Frozen)

A coarse refresh cadence bucket used for very large deployments (pg_trickle.tiered_scheduling = on). Hot tables refresh as scheduled; Frozen tables never refresh until manually thawed.

Adaptive fallback

The engine's automatic switch from DIFFERENTIAL to FULL when the change ratio exceeds a threshold (default 50%). It switches back when the rate drops.

Fuse (circuit breaker)

A safety mechanism: a stream table that fails repeatedly is automatically suspended so it cannot block the scheduler. You re-enable it with pgtrickle.reset_fuse(). See Fuse Circuit Breaker.

MERGE

PostgreSQL's MERGE statement, which lets pg_trickle apply a delta (insert / update / delete in one go) to the stream table's storage. Most of a refresh's wall-clock time is spent in MERGE — i.e., in PostgreSQL itself, not in pg_trickle.
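The shape of that statement, for the revenue example from the overview — illustrative only (delta_rows stands in for the computed delta; the real SQL is visible via pgtrickle.explain_st):

MERGE INTO revenue_by_region t
USING delta_rows d ON d.region = t.region   -- delta_rows: the computed delta (illustrative)
WHEN MATCHED AND d.order_count = 0 THEN DELETE
WHEN MATCHED THEN UPDATE
     SET order_count = d.order_count, total_revenue = d.total_revenue
WHEN NOT MATCHED THEN INSERT (region, order_count, total_revenue)
     VALUES (d.region, d.order_count, d.total_revenue);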

Scoped recomputation

A delta-application strategy used for MIN, MAX, and TopK aggregates: re-aggregate just the affected groups (rather than the whole result) by reading only the rows that match the changed keys.

Group-rescan

Similar to scoped recomputation, used for "holistic" aggregates like STRING_AGG, ARRAY_AGG, MODE, PERCENTILE_*.

Predicate pushdown

The optimisation that injects WHERE clauses from the defining query directly into change-buffer scans, so irrelevant changes are filtered out at read time.

Columnar tracking

A capture-side optimisation: CDC records only the columns referenced by the defining query, encoded as a bitmask. Updates that touch only unreferenced columns are skipped entirely.


Background workers

Launcher

The single per-server background worker that scans pg_database every few seconds and spawns a scheduler in every database where pg_trickle is installed.

Scheduler

The per-database background worker that wakes periodically, decides which stream tables are due for refresh, and (in parallel mode) dispatches refresh jobs to a worker pool.

BGW (Background Worker)

A PostgreSQL concept — a long-running process spawned by the postmaster. pg_trickle uses BGWs for the launcher, schedulers, and parallel refresh workers.

Parallel refresh

An execution mode (pg_trickle.parallel_refresh_mode = 'on') where independent stream tables in the DAG are refreshed concurrently across a pool of dynamic background workers.


Aggregates

Algebraic aggregate

An aggregate that can be maintained from previous state plus a delta (SUM, COUNT, AVG by tracking sum + count). Cheapest possible IVM.

Semi-algebraic aggregate

An aggregate that can be maintained on inserts cheaply, but on a delete may need to rescan the affected group (MIN, MAX). pg_trickle handles this with scoped recomputation.

Holistic aggregate

An aggregate that has no incremental form (PERCENTILE_*, MODE, STRING_AGG, ARRAY_AGG). pg_trickle re-aggregates the affected groups from source.


Stream-table features

Snapshot

A point-in-time copy of a stream table's contents (v0.27+). Useful for backups, replica bootstrap, or test fixtures. See Snapshots.

Outbox

A stream-table-backed implementation of the transactional outbox pattern: write events in the same transaction as your business data; external systems consume them with at-least-once delivery. See Transactional Outbox.

Inbox

The mirror of the outbox: receive events idempotently from an external system, with stream tables giving you live views of pending work and a dead-letter queue. See Transactional Inbox.

Publication (downstream)

A regular PostgreSQL logical-replication publication automatically created over a stream table's storage. Lets Debezium, Kafka Connect, Spark, etc. subscribe to stream-table changes without an extra pipeline. See Downstream Publications.

Relay

A standalone Rust binary (pg-tide-relay) that bridges outbox/inbox tables with external messaging systems (NATS, Kafka, Redis Streams, SQS, RabbitMQ, webhooks). Extracted to the pg_tide project in v0.46.0.

TopK

Stream tables of the form SELECT … ORDER BY x LIMIT N (optionally with OFFSET M). pg_trickle stores only the top N rows and recomputes them incrementally when the changes affect the leaderboard.

IMMEDIATE mode

A refresh mode that maintains the stream table inside the same transaction as the source DML — no scheduler, no change buffers. Gives read-your-writes consistency at the cost of slightly heavier writes.


Engine internals

DVM (Differential View Maintenance)

The engine inside pg_trickle that turns operator trees into delta queries. The name is used informally; the academic name for the underlying technique is differential dataflow.

DBSP

The academic framework that pg_trickle's differentiation rules are based on. See the DBSP Comparison for the relationship between pg_trickle and the original DBSP runtime.

Bilinear expansion

The expansion of Δ(A ⋈ B) = ΔA ⋈ B + A ⋈ ΔB + ΔA ⋈ ΔB for joins. This is the formal recipe for incrementally maintaining a join. You don't need to know it to use pg_trickle.

Semi-naive evaluation

A classical algorithm for incremental recursive queries (WITH RECURSIVE). pg_trickle uses it in IMMEDIATE and DIFFERENTIAL modes for insert-only recursion.

DRed (Delete-and-Rederive)

A companion algorithm to semi-naive evaluation that handles deletions inside recursive queries. Also used in IMMEDIATE mode.

__pgt_row_id

A hidden column pg_trickle adds to every stream table to give each row a stable identity, even if the defining query has no primary key. You can ignore it in your queries; it is replicated correctly to logical-replication subscribers.

Auto-rewrite

A pipeline of small rewrites pg_trickle applies to your defining query before differentiation — for example expanding DISTINCT ON to ROW_NUMBER(), inlining views, or splitting GROUPING SETS into UNION ALL. See Auto-Rewrite Pipeline.


Operations & infrastructure

GUC (Grand Unified Configuration)

A PostgreSQL configuration variable, set in postgresql.conf, by ALTER SYSTEM, or per-session with SET. All pg_trickle GUCs start with pg_trickle.*. The full list is in Configuration.
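For example, using GUCs that appear elsewhere in these docs:

ALTER SYSTEM SET pg_trickle.parallel_refresh_mode = 'on';  -- cluster-wide
SELECT pg_reload_conf();
SET pg_trickle.tiered_scheduling = on;                      -- this session only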

Advisory lock

A lightweight lock you ask PostgreSQL to keep on your behalf. pg_trickle uses advisory locks to coordinate refreshes across processes — including across PgBouncer-pooled sessions, which is why pg_trickle is pooler-friendly.

Pooler

A connection pooler such as PgBouncer or pgcat, often deployed in front of PostgreSQL. pg_trickle's background workers connect directly (not through the pooler), so your app's pooler does not interact with refresh activity.

CNPG

CloudNativePG, a Kubernetes operator for PostgreSQL. pg_trickle ships a minimal scratch-based OCI image suitable for CNPG's Image Volume Extensions feature.

pgrx

The Rust framework pg_trickle is built with. Provides safe wrappers around PostgreSQL internals.

wal_level

The PostgreSQL setting that controls how much information the write-ahead log contains. Values: minimal, replica (default), logical. WAL-based CDC requires logical.

Replication slot

A PostgreSQL object that retains WAL until a consumer has read it. WAL CDC mode creates one slot per source table; trigger CDC mode does not.
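You can watch the slots WAL CDC creates with the standard catalog view:

SELECT slot_name, slot_type, active, restart_lsn
FROM pg_replication_slots;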


Acronym key

Acronym   Meaning
BGW       Background worker
CDC       Change Data Capture
CNPG      CloudNativePG
CTE       Common Table Expression (WITH …)
DAG       Directed Acyclic Graph
DBSP      Database Stream Processor (the framework pg_trickle is inspired by)
DDL       Data Definition Language (CREATE, ALTER, DROP)
DLQ       Dead-Letter Queue
DML       Data Manipulation Language (INSERT, UPDATE, DELETE)
DRed      Delete-and-Rederive (recursive query algorithm)
DVM       Differential View Maintenance (the engine inside pg_trickle)
GUC       Grand Unified Configuration (PostgreSQL setting)
IVM       Incremental View Maintenance
LSN       Log Sequence Number
OID       Object Identifier (PostgreSQL row ID for catalog objects)
ΔQ        Delta query — the SQL pg_trickle generates to compute changes
RLS       Row-Level Security
SCC       Strongly Connected Component (a cycle in the DAG)
SLA       Service Level Agreement
SLO       Service Level Objective
SPI       Server Programming Interface (PostgreSQL's in-process query API)
SRF       Set-Returning Function
ST        Stream Table
WAL       Write-Ahead Log

See also: FAQ · SQL Reference · Architecture · Configuration

Playground

The quickest way to explore pg_trickle is the playground — a pre-configured Docker environment with sample data and stream tables ready to query. No installation, no configuration. One command and you're running.

Quick Start

git clone https://github.com/trickle-labs/pg-trickle.git
cd pg-trickle/playground
docker compose up -d

Then connect:

psql postgresql://postgres:playground@localhost:5432/playground

PostgreSQL 18+ note: The Docker image stores data in a versioned subdirectory (/var/lib/postgresql/18/main). The compose file mounts /var/lib/postgresql (not .../data) — this is intentional.


What's Pre-Loaded

The seed script creates three base tables and five stream tables that cover the most common pg_trickle patterns.

Base Tables

Table       Description
products    Product catalog with categories and prices
orders      Order line items with quantities and timestamps
customers   Customer profiles with regions

Stream Tables

Stream Table              Query                                Pattern demonstrated
sales_by_region           SUM(total) grouped by region         Basic aggregate, DIFFERENTIAL mode
top_products              SUM(quantity) ranked by category     Window function (RANK())
customer_lifetime_value   Revenue + order count per customer   Multi-table join + aggregates
daily_revenue             Revenue per day                      Time-series aggregation
active_products           Products with orders                 EXISTS subquery

Exercises

1. Watch an INSERT propagate

-- Current state
SELECT * FROM sales_by_region ORDER BY region;

-- Insert a new order
INSERT INTO orders (customer_id, product_id, quantity, order_date)
VALUES (1, 1, 10, CURRENT_DATE);

-- After ~1 s the stream table refreshes
SELECT * FROM sales_by_region ORDER BY region;

2. Inspect pg_trickle internals

-- Overall health
SELECT * FROM pgtrickle.health_check();

-- Status of all stream tables
SELECT name, status, refresh_mode, staleness
FROM pgtrickle.pgt_status()
ORDER BY name;

-- Recent refresh activity
SELECT start_time, stream_table, action, status, duration_ms
FROM pgtrickle.refresh_timeline(10);

-- Delta SQL for a stream table
SELECT pgtrickle.explain_st('sales_by_region');

-- Change buffer sizes
SELECT * FROM pgtrickle.change_buffer_sizes();

3. Update and Delete

-- Update a product price
UPDATE products SET price = 99.99 WHERE name = 'Widget';

-- customer_lifetime_value re-calculates
SELECT * FROM customer_lifetime_value ORDER BY total_revenue DESC LIMIT 5;

-- Delete a customer's orders
DELETE FROM orders WHERE customer_id = 3;

-- Stream tables reflect the removal
SELECT * FROM sales_by_region ORDER BY region;

4. Create your own stream table

SELECT pgtrickle.create_stream_table(
    name     => 'my_experiment',
    query    => $$
        SELECT p.category,
               COUNT(DISTINCT o.customer_id) AS unique_buyers,
               SUM(o.quantity)               AS total_units
        FROM orders o
        JOIN products p ON p.id = o.product_id
        GROUP BY p.category
        HAVING SUM(o.quantity) > 5
    $$,
    schedule => '2s'
);

SELECT * FROM my_experiment;

Tear Down

docker compose down -v

The -v flag removes the data volume. Omit it if you want to keep your changes.


Next Steps

Continue with the 5-Minute Quickstart, or explore the Real-time Demo below.

Real-time Demo

This demo shows pg_trickle doing real work: a continuous stream of events flows into PostgreSQL, and a DAG of stream tables keeps a live view of that data up to date — automatically, incrementally, and within seconds.

Three scenarios are available via the DEMO_SCENARIO environment variable:

Scenario    Default?   Pipeline
fraud                  Financial fraud detection — 9-node, 4-layer DAG over a transaction stream
ecommerce   ✅          E-commerce analytics — 6-node DAG over a continuous order stream
finance                Financial risk analytics — 10-level deep DAG with only leaf schedules (CALCULATED throughout)

Each scenario includes two purpose-built differential efficiency showcases: stream tables with sub-1.0 change ratios that demonstrate when DIFFERENTIAL mode is clearly the right choice.

It is the fastest way to see how stream tables, differential refresh, and DAG-aware scheduling work together on data you can watch moving.


Quick Start

cd demo

# E-commerce analytics (default)
docker compose up --build

# Fraud detection
DEMO_SCENARIO=fraud docker compose up --build

# Financial risk analytics (10-level DAG with deep calculated dependencies)
DEMO_SCENARIO=finance docker compose up --build

Open http://localhost:8080 — the dashboard refreshes every 2 seconds.

To stop and remove all data:

docker compose down -v

Switching scenarios requires removing the old data volume:

docker compose down -v
DEMO_SCENARIO=fraud docker compose up

Or use any of the three: fraud, ecommerce, finance.


What the Demo Does

Three Docker services start together:

Service     Role
postgres    PostgreSQL 18 with pg_trickle; initialises the schema, seed data, and all stream tables on first boot
generator   Python script that continuously inserts events; periodically triggers bursts that stress-test differential refresh
dashboard   Flask web app served at http://localhost:8080; reads from stream tables and auto-refreshes every 2 seconds

The DEMO_SCENARIO variable controls which SQL files are loaded on startup and which generator/dashboard module is activated. All three scenarios share the same Docker Compose services.


Scenario: fraud

This scenario models the data pipeline a financial institution might build to spot suspicious activity as it happens, not hours later in a batch job.

Source Data

Four regular PostgreSQL tables hold the reference data:

Table                Contents
users                30 users, each with a name, country, and account age
merchants            15 merchants across categories: Retail, Electronics, Travel, Food, Pharmacy, Gambling, Crypto
transactions         The live stream — the generator inserts here continuously
merchant_risk_tier   Slowly-changing risk tier (STANDARD / ELEVATED / HIGH) for each merchant; the generator rotates one merchant's tier every ~30 cycles

transactions is the only table that grows continuously. merchant_risk_tier changes occasionally (about one row per minute). Everything else is static.

Normal vs. Suspicious Traffic

The generator creates two kinds of transactions:

Normal traffic — a random user buys something from a random merchant at a plausible amount for that merchant's category. Inserted at roughly one per second.

Suspicious burst — every ~45 seconds, the generator picks one user and fires 6–14 rapid transactions (0.15–0.45 s apart) at Crypto or Gambling merchants, with amounts that escalate with each successive transaction. This pattern is designed to cross the risk thresholds and light up the HIGH-risk column on the dashboard.


The DAG of Stream Tables (fraud)

All nine stream tables are defined in demo/postgres/fraud/02_stream_tables.sql.

  Base tables             Layer 1 — Silver           Layer 2 — Gold              Layer 3 — Platinum
  ────────────            ──────────────────────     ─────────────────────       ──────────────────────

  ┌──────────┐            ┌──────────────────┐
  │  users   │───────────►│  user_velocity   │──────────────────────────────────►┌──────────────┐
  └──────────┘            │  DIFFERENTIAL 1s │                                   │ country_risk │
                          └──────┬───────────┘                                   │  DIFF, calc  │
                                 │                                                └──────────────┘
  ┌──────────────┐               │  ┌──────────────────┐
  │ transactions │───────────────┼─►│  merchant_stats  │
  │  (stream)    │               │  │  DIFFERENTIAL 1s │
  └──────────────┘               │  └──────┬───────────┘
          │                      │         │
          │          ┌───────────┴─────────┘ ← DIAMOND DEPENDENCY
          │          │
          │          ▼                                                             ┌─────────────────┐
          │     ┌────────────────────┐                                             │  alert_summary  │
          │     │    risk_scores     │────────────────────────────────────────────►│   DIFF, calc    │
          │     │   FULL, calc       │                                             └─────────────────┘
          │     └────────────────────┘
          │                                                                         ┌───────────────────────┐
          │                                                                         │  top_risky_merchants  │
          └─────────────────────────────────────────────────────────────────────►  │   DIFF, calc          │
                                                                                    └───────────┬───────────┘
  ┌──────────┐            ┌──────────────────┐                                                 │
  │merchants │───────────►│ category_volume  │          ┌──────────────────────────────────────▼──────┐
  └──────────┘            │  DIFFERENTIAL 1s │          │       top_10_risky_merchants                │
                          └──────────────────┘          │  DIFFERENTIAL 5s  ← SHOWCASE #2             │
                                                         │  change ratio ≈ 0.25 (LIMIT 10)             │
  ┌───────────────────┐   ┌──────────────────────┐      └─────────────────────────────────────────────┘
  │ merchant_risk_tier│──►│ merchant_tier_stats  │  ← SHOWCASE #1
  │ (slowly-changing) │   │   DIFFERENTIAL 5s    │     change ratio ≈ 0.07
  └───────────────────┘   │                      │
  ┌──────────┐            │                      │
  │merchants │───────────►│                      │
  └──────────┘            └──────────────────────┘

Layer 1 — Silver: Direct Aggregates

These three stream tables each read directly from the base tables and refresh every second using DIFFERENTIAL mode. pg_trickle calculates only what changed since the last refresh — if five new transactions arrived, it adjusts exactly the five affected aggregate buckets rather than recomputing the full table.

user_velocity — per-user transaction statistics

For each of the 30 users, keeps a running count of transactions, total spend, average transaction amount, and how many distinct merchants they have visited. This is the core input for detecting users who are suddenly transacting far more than usual.

SELECT u.id, u.name, u.country,
       COUNT(t.id)               AS txn_count,
       SUM(t.amount)             AS total_spent,
       ROUND(AVG(t.amount), 2)   AS avg_txn_amount,
       COUNT(DISTINCT t.merchant_id) AS unique_merchants
FROM users u
LEFT JOIN transactions t ON t.user_id = u.id
GROUP BY u.id, u.name, u.country

merchant_stats — per-merchant baseline

Tracks how many transactions each merchant typically sees and at what amounts. A transaction that is 3× the merchant's own average is more suspicious than one that is 3× the user's average — this table supplies that context.

category_volume — industry-level view

Groups by merchant category (Crypto, Gambling, Retail, etc.) so the dashboard can show which sectors are hot at any moment. Refreshes every second and uses DIFFERENTIAL: a new transaction in Electronics updates only the Electronics row.

Layer 2 — Gold: Derived Metrics

These stream tables read from the Layer 1 stream tables (stream tables reading stream tables). pg_trickle's scheduler refreshes Layer 1 first, then triggers Layer 2 automatically — you never schedule this yourself.

risk_scores — the diamond convergence node

This is the most interesting table in the DAG. It joins:

  • transactions — the raw event
  • user_velocity — that user's accumulated behaviour (Layer 1)
  • merchant_stats — that merchant's baseline (Layer 1)

Because transactions feeds both user_velocity and merchant_stats independently, and risk_scores depends on both, this creates a classic diamond dependency:

transactions ──→ user_velocity  ──┐
                                   ├──→ risk_scores
transactions ──→ merchant_stats ──┘

pg_trickle detects this diamond and schedules both Layer 1 nodes before triggering the Layer 2 refresh, so risk_scores always sees fresh context.

The risk scoring logic lives entirely in SQL:

CASE
    WHEN uv.txn_count > 20
         AND t.amount > 3 * uv.avg_txn_amount
        THEN 'HIGH'
    WHEN uv.txn_count > 10
         OR  t.amount > 2 * uv.avg_txn_amount
        THEN 'MEDIUM'
    ELSE 'LOW'
END AS risk_level

risk_scores uses refresh_mode => 'FULL' because it joins three sources (including two stream tables) in a way that requires re-evaluating all rows to maintain correctness. FULL mode is the right choice here — it is still fast because it is triggered only when an upstream actually changes.

country_risk — geographic rollup

Reads user_velocity (Layer 1) and aggregates by country. Pure DIFFERENTIAL because it is a simple GROUP BY country over the Silver layer.

Layer 3 — Platinum: Executive Roll-Ups

These stream tables read from risk_scores (Layer 2) and are defined with schedule => 'calculated', meaning pg_trickle fires them automatically whenever their upstream changes.

alert_summary — the primary KPI

Counts and totals by risk level (LOW / MEDIUM / HIGH). This is what drives the four big counters at the top of the dashboard. Because it is a simple aggregate over risk_scores, it uses DIFFERENTIAL mode and updates only the affected risk-level row on each cycle.

top_risky_merchants — merchant triage list

Groups risk_scores by merchant name and category, counting how many HIGH and MEDIUM transactions each merchant has seen, plus a risk-rate percentage. Operationally this is where a fraud team would start when deciding which merchants to review or block.

Differential Efficiency Showcases

Two additional stream tables sit outside the main fraud pipeline. Their purpose is to demonstrate that DIFFERENTIAL mode pays off when the change ratio stays meaningfully below 1.0 — here because the output cardinality is constrained.

merchant_tier_stats — Showcase #1: slowly-changing lookup source

Joins merchants (static) with merchant_risk_tier (a 15-row lookup in which the generator rotates one row every ~30 cycles). Because no fast-growing table appears in the query, only the one rotated merchant's row changes each cycle:

  • Change ratio ≈ 1/15 ≈ 0.07
  • Refresh Mode Advisor recommendation: ✓ KEEP DIFFERENTIAL
  • Schedule: 5 s (independent of the main DAG)

This is the counterpoint to risk_scores. risk_scores correctly uses FULL because its change ratio is ~1.0; merchant_tier_stats correctly uses DIFFERENTIAL because its change ratio is ~0.07. Seeing both on the same dashboard makes the advisor's logic concrete.

top_10_risky_merchants — Showcase #2: fixed-cardinality output

Reads top_risky_merchants (Layer 3) and applies LIMIT 10. Even though the upstream changes heavily every cycle, only the merchants whose rank crosses the top-10 boundary produce a net change in the output. Typically 2–3 merchants enter or leave the top 10 per refresh cycle:

  • Change ratio ≈ 0.2–0.3
  • Refresh Mode Advisor recommendation: ✓ KEEP DIFFERENTIAL
  • Schedule: 5 s

The Dashboard

Open http://localhost:8080. Panels update every 2 seconds.

KPI Row

Four counters: total transactions, LOW / MEDIUM / HIGH counts. Below them, a proportional colour bar that makes the risk mix visible at a glance.

Recent Alerts

A live table of the most recent HIGH and MEDIUM transactions — transaction ID, user name, merchant, category, amount, and risk badge. Rows turn red for HIGH and amber for MEDIUM. During a burst you will see this fill with HIGH rows as the generator fires rapid transactions.

Merchant Risk Leaderboard

Sorted by HIGH-risk count descending. Crypto and Gambling merchants will typically appear at the top because the generator targets them during bursts.

User Velocity, Country Overview, Category Volume

Three side-by-side tables driven by the Layer 1 stream tables. These are the most-refreshed tables in the DAG (every second); watching user velocity change in real time illustrates why DIFFERENTIAL mode matters — only the affected rows move.

Merchant Tier Stats and Tiers

Two panels driven by merchant_tier_stats. The left panel shows the full 15-row output (merchant ID, name, category, current tier, risk score, and when the tier last changed). The right panel shows a compact tier-only view. Tiers are colour-coded: HIGH = red, ELEVATED = amber, STANDARD = green. Tiers rotate visibly every ~30 generator cycles (roughly once per minute).

Top 10 Risky Merchants Leaderboard

A live leaderboard driven by top_10_risky_merchants. Shows rank, merchant name, category, total transactions, HIGH and MEDIUM risk counts, and a risk-rate percentage. The percentage column is coloured green (<25%), amber (25–49%), or red (≥50%). Watch the rankings shift as the generator's burst patterns accumulate.

Stream Table Status

A compact status panel showing each stream table's refresh mode, schedule, and whether it is populated. This reads from pgtrickle.pgt_status().

DAG Topology

A collapsible ASCII diagram showing the full dependency graph. Useful as a reference while exploring the database directly.


Exploring the Database (fraud)

Connect directly to inspect the stream tables and pg_trickle internals:

docker compose exec postgres psql -U demo -d fraud_demo

Check all stream table status:

SELECT name, status, refresh_mode, is_populated, staleness
FROM pgtrickle.pgt_status();

Inspect the DAG dependency graph (full tree):

SELECT tree_line FROM pgtrickle.dependency_tree()
ORDER BY tree_line;

See the most recent refresh history:

SELECT st.pgt_name, rh.action, rh.status, 
       (rh.end_time - rh.start_time) AS duration, rh.start_time
FROM pgtrickle.pgt_refresh_history rh
JOIN pgtrickle.pgt_stream_tables st ON st.pgt_id = rh.pgt_id
ORDER BY rh.start_time DESC
LIMIT 20;

Spot the diamond groups (stream tables with shared sources):

SELECT group_id, member_name, is_convergence, schedule_policy
FROM pgtrickle.diamond_groups()
ORDER BY group_id, is_convergence DESC;

Watch a HIGH-risk alert appear:

-- In one terminal: watch for new HIGH rows
SELECT txn_id, user_name, merchant_name, amount
FROM risk_scores
WHERE risk_level = 'HIGH'
ORDER BY txn_id DESC
LIMIT 5;

-- Wait ~45 seconds for the next burst, then run the query again.

Force a manual refresh:

SELECT pgtrickle.refresh_stream_table('risk_scores');

How the Files Are Organised

demo/
├── docker-compose.yml          # Service definitions; DEMO_SCENARIO selects the scenario
├── README.md                   # Quick start + scenario descriptions
│
├── postgres/
│   ├── fraud/
│   │   ├── 01_schema.sql       # Base tables + seed data (30 users, 15 merchants,
│   │   │                       # 40 initial transactions, merchant_risk_tier)
│   │   └── 02_stream_tables.sql# All 9 stream table definitions (Layers 1–3 + showcases)
│   ├── ecommerce/
│   │   ├── 01_schema.sql       # Base tables + seed data (customers, products,
│   │   │                       # categories, orders, product_catalog)
│   │   └── 02_stream_tables.sql# All 6 stream table definitions (Layers 1–3 + showcases)
│   └── finance/
│       ├── 01_schema.sql       # Base tables + seed data (30 instruments, 50 accounts,
│       │                       # 5 portfolios, 8 sectors, seed trades/prices)
│       └── 02_stream_tables.sql# All 10 stream table definitions (L1–L10 cascade)
│
├── generator/
│   ├── Dockerfile
│   ├── requirements.txt        # psycopg2-binary only
│   ├── generate.py             # Scenario dispatcher; reads DEMO_SCENARIO
│   └── scenarios/
│       ├── fraud.py            # Transaction generator (normal + burst mode)
│       ├── ecommerce.py        # Order generator (normal + flash sale mode)
│       └── finance.py          # Trade + price tick generator (normal + algo burst mode)
│
└── dashboard/
    ├── Dockerfile
    ├── requirements.txt        # flask + psycopg2-binary
    ├── app.py                  # Scenario dispatcher: / → HTML, /api/data → JSON,
    │                           #   /api/internals → stream table metadata (shared)
    └── scenarios/
        ├── fraud.py            # Fraud HTML, DAG diagram, and data queries
        ├── ecommerce.py        # E-commerce HTML, DAG diagram, and data queries
        └── finance.py          # Finance HTML, DAG diagram, and data queries

Why These Design Choices

Why seven stream tables instead of one big query?

Splitting the computation across three layers means each layer does a smaller, cheaper differential computation. A single-query approach that joins users, transactions, and merchants directly into risk_scores would force a FULL refresh every cycle because no single source table captures all changes. With the layered approach:

  • L1 tables catch source changes differentially (they are cheap to maintain)
  • L2/L3 tables read from already-aggregated L1 data (smaller join inputs)
  • The scheduler only re-runs a layer when its inputs actually changed

Why is risk_scores FULL and not DIFFERENTIAL?

risk_scores is a 1:1 projection of transactions — one output row per input transaction, with enrichment from the L1 stream tables. Since transactions is append-only, the change ratio is ~1.0: every refresh adds roughly as many new rows as existed before, making DIFFERENTIAL no more efficient than FULL. Additionally, FULL mode simplifies the refresh logic (avoiding complex multi-source delta algebra) while remaining fast because the table is small and updates are infrequent (only triggered when L1 feeds new data).

pg_trickle's diagnostic system (recommend_refresh_mode()) confirms this choice: it observed an average change ratio of 1.0 and recommended KEEP FULL.

The L3 tables (alert_summary, top_risky_merchants) are purely single-upstream aggregates over risk_scores and therefore support DIFFERENTIAL efficiently — the change ratio there is much lower.

Why schedule => 'calculated' for L2 and L3?

calculated means "refresh whenever an upstream stream table has new data." This is the right choice for derived layers: there is no point refreshing risk_scores if neither user_velocity nor merchant_stats has changed, and there is no point waiting for a clock interval when they have.
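
As a sketch of the pattern (the demo's real definition lives in 02_stream_tables.sql; the query here is illustrative):

SELECT pgtrickle.create_stream_table(
    name     => 'alert_summary',
    query    => $$ SELECT risk_level, COUNT(*) AS alerts
                   FROM risk_scores GROUP BY risk_level $$,
    schedule => 'calculated'   -- refresh only when risk_scores has new data
);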

Why triggers and not logical replication?

The demo uses the default CDC mode (row-level AFTER triggers). This works with any PostgreSQL 18 installation out of the box — no replication slot configuration, no wal_level = logical requirement. For production deployments with very high write throughput, WAL-based CDC is more efficient. pg_trickle can switch modes transparently; see CONFIGURATION.md for details.

Empirical optimization: FULL vs DIFFERENTIAL by change ratio

The demo illustrates a practical rule of thumb: when a stream table's change ratio (fraction of output rows that are inserted or deleted per refresh cycle) is high (>0.5), FULL mode is often faster than DIFFERENTIAL because the delta overhead dominates the benefit. Use pgtrickle.recommend_refresh_mode(table_name) to check — it analyzes actual refresh history and recommends the best mode with confidence scores.

The Refresh Mode Advisor computes change ratio as:

change_ratio = (rows_inserted + rows_deleted) / max(reltuples, 1)

where reltuples is the stream table's current row count from pg_class. This gives a meaningful fraction: 0.07 means 7% of output rows changed last cycle; 1.0 means the entire output turned over.
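
You can ask the advisor directly (the function is the one named above; the output columns your version shows may differ):

SELECT * FROM pgtrickle.recommend_refresh_mode('risk_scores');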

The showcase tables (plus two contrasting FULL tables) make this concrete:

Table                    Change ratio   Advisor says
-----------------------  -------------  ------------------------------------------------
merchant_tier_stats      ≈ 0.07         ✓ KEEP DIFFERENTIAL
top_10_risky_merchants   ≈ 0.25         ✓ KEEP DIFFERENTIAL
risk_scores              ≈ 1.0          KEEP FULL (append-only source)
alert_summary            ≈ 1.0          KEEP FULL (small table; delta overhead dominates)

Scenario: ecommerce (default)

The e-commerce scenario models a real-time online store analytics pipeline with orders streaming in continuously.

cd demo
DEMO_SCENARIO=ecommerce docker compose up --build

Source Data

Table             Contents
----------------  -----------------------------------------------------------------------
customers         30 customers with name and country
products          15 products across 8 categories (Electronics, Clothing, Sports, etc.)
categories        8 product categories
orders            The live stream — the generator inserts here continuously
product_catalog   Slowly-changing current price per product; the generator reprices one
                  product every ~30 cycles

orders is the only table that grows continuously. product_catalog changes occasionally (about one row per minute). Everything else is static.

Normal vs. Flash Sale Traffic

Normal orders — a random customer orders a random product in quantity 1–2 at roughly the catalog price (±15%). Inserted at roughly one per second.

Flash sale burst — every ~45 seconds, the generator picks one category and fires 8–18 rapid orders (0.10–0.35 s apart) at a 70–90% discount. This creates a visible revenue spike in the Category Revenue panel.

The DAG of Stream Tables (ecommerce)

All six stream tables are defined in demo/postgres/ecommerce/02_stream_tables.sql.

  Base tables          Layer 1 — Silver           Layer 2 — Gold         Layer 3 — Platinum
  ────────────         ──────────────────────      ─────────────────────  ──────────────────────

  ┌────────────┐       ┌──────────────────┐
  │ customers  │──────►│ customer_stats   │──────────────────────────────►┌──────────────────┐
  └────────────┘       │  DIFFERENTIAL 1s │                               │ country_revenue  │
                       └──────────────────┘                               │  DIFF, calc      │
                                │                                          └──────────────────┘
  ┌────────────┐                │
  │  orders    │────────────────┘
  │ (stream)   │────────────────────────────────►┌────────────────┐
  └────────────┘       ┌────────────────┐         │ product_sales  │
  ┌────────────┐       │category_revenue│         │ DIFFERENTIAL   │
  │  products  │──────►│ DIFFERENTIAL   │         │ 1s             │
  │ categories │──────►│ 1s             │         └────────────────┘
  └─────┬──────┘       └────────────────┘
        │
  ┌─────▼──────────────┐   ┌──────────────────────┐
  │  product_catalog   │──►│ catalog_price_impact │  ← DIFFERENTIAL SHOWCASE #1
  │  (slowly-changing) │   │   DIFFERENTIAL 5s    │    change ratio ≈ 0.07
  └────────────────────┘   └──────────────────────┘

  ┌──────────────────┐   ┌──────────────────┐
  │  customer_stats  │──►│  top_10_customers│  ← DIFFERENTIAL SHOWCASE #2
  │  DIFFERENTIAL 1s │   │  DIFFERENTIAL    │    change ratio ≈ 0.1–0.2
  └──────────────────┘   │  calc, LIMIT 10  │
                         └──────────────────┘

Stream Tables (ecommerce)

Name                   Layer      Mode           Schedule     What it computes
---------------------  ---------  -------------  -----------  --------------------------------------------------------
product_sales          L1         DIFFERENTIAL   1 s          Per-product: units sold, revenue, avg selling price
customer_stats         L1         DIFFERENTIAL   1 s          Per-customer: order count, total spent, avg order value
category_revenue       L1         DIFFERENTIAL   1 s          Per-category: orders, units, revenue
country_revenue        L2         DIFFERENTIAL   calculated   Per-country: roll-up from customer_stats
catalog_price_impact   showcase   DIFFERENTIAL   5 s          Per-product: current vs. base price delta
top_10_customers       showcase   DIFFERENTIAL   calculated   Top 10 customers by total spend (LIMIT 10)

Differential efficiency showcases (ecommerce)

catalog_price_impact — Showcase #1: slowly-changing price catalog

Joins products (static) with product_catalog (15 rows, one repriced per ~30 cycles). Only the repriced product's row changes each cycle:

  • Change ratio ≈ 1/15 ≈ 0.07
  • Refresh Mode Advisor recommendation: ✓ KEEP DIFFERENTIAL
  • The price_delta and pct_change columns highlight repriced products in real time.

top_10_customers — Showcase #2: fixed-cardinality leaderboard

Reads customer_stats (all 30 customers) and applies LIMIT 10. Only rank boundary crossings produce net output changes:

  • Change ratio ≈ 0.1–0.2
  • Refresh Mode Advisor recommendation: ✓ KEEP DIFFERENTIAL

Exploring the Database (ecommerce)

docker compose exec postgres psql -U demo -d ecommerce_demo

-- See category revenue live
SELECT category, order_count, units_sold, revenue
FROM   category_revenue ORDER BY revenue DESC;

-- Top 10 customers leaderboard
SELECT * FROM top_10_customers;

-- Price changes vs. base price
SELECT product_name, base_price, current_price, pct_change
FROM   catalog_price_impact ORDER BY ABS(pct_change) DESC;

-- Refresh efficiency comparison across all stream tables
SELECT pgt_name, avg_diff_ms, diff_speedup, avg_change_ratio
FROM   pgtrickle.refresh_efficiency() ORDER BY pgt_name;

Scenario: finance

The finance scenario models a real-time financial risk analytics pipeline demonstrating a 10-level-deep DAG where only the leaf tables have fixed schedules and all downstream layers cascade with schedule => 'calculated'. This showcases pg_trickle's strength in maintaining deep computation graphs where derived data automatically flows downstream through the layers.

cd demo
DEMO_SCENARIO=finance docker compose up --build

Source Data

Table           Contents
--------------  ------------------------------------------------------------------------
sectors         8 financial sectors (Technology, Healthcare, Finance, Energy, etc.)
instruments     30 financial instruments (stocks, bonds, commodities)
accounts        50 trading accounts
portfolios      5 portfolios (institutional investors)
trades          The live stream — the generator inserts here continuously with buy/sell
                orders
market_prices   Slowly-changing OHLC prices per instrument; the generator updates one
                instrument per cycle with realistic bid/ask/mid prices

trades is append-only; market_prices updates continuously (one instrument per generator cycle). Everything else is static reference data.

Real-Time Market Activity

Normal trading — the generator randomly selects accounts and instruments, inserting buy/sell trades at current market prices (~1 trade per second). Buy quantities are positive; sell quantities are negative. This creates a natural, continuous flow of position changes.

Algo bursts — every ~60 seconds, the generator triggers an algorithmic burst (0.05–0.20s between trades, 8–20 trades) concentrated on a few instruments, simulating realistic high-frequency trading patterns that stress-test the differential calculation pipeline.

The DAG of Stream Tables (finance)

All 10 stream tables are defined in demo/postgres/finance/02_stream_tables.sql.

This is the deepest DAG in all three scenarios — a 10-level cascade where each level reads only from the previous level (plus static reference data), creating a linear dependency chain. The two leaf tables (price_snapshot at 2s and net_positions at 1s) are the only ones with fixed schedules; all nine downstream layers use schedule => 'calculated' to propagate changes automatically.

  Base tables          Layer 1          Layer 2          ...         Layer 10
  ───────────          ───────          ───────                      ────────

  ┌──────────────┐     ┌──────────────────────┐
  │market_prices │────►│  price_snapshot      │
  │(stream, 2s)  │     │  DIFFERENTIAL 2s     │
  └──────────────┘     └─────────┬────────────┘
                                 │
  ┌──────────────┐               │     ┌────────────────┐
  │   trades     │──────────┐    └────►│position_values │  L2 (calculated)
  │(stream, 1s)  │          │          │ DIFFERENTIAL   │
  └──────────────┘          │          └─────────┬──────┘
         │                  │                    │
         │                  ▼                    ▼
         │          ┌────────────────┐   ┌───────────────┐
         │          │net_positions   │   │ account_pnl   │  L3 (calculated)
         │          │DIFFERENTIAL 1s │   │ DIFFERENTIAL  │
         │          └────────────────┘   └───────┬───────┘
         │                                       │
         │                  ┌──────────────────────┤
         │                  │                      │
         │                  ▼                      ▼
         └─────────────────►┌────────────────────────────────┐
                            │ portfolio_pnl, sector_exposure │  L4–L5 (calculated)
                            │ DIFFERENTIAL                   │
                            └────────────────┬───────────────┘
                                             │
                                             ▼
                            ┌────────────────────────────┐
                            │  var_contributions         │  L6 (calculated)
                            │  account_var, portfolio_var│  L7–L8 (calculated)
                            │  DIFFERENTIAL throughout   │
                            └────────────────┬───────────┘
                                             │
                                             ▼
                            ┌────────────────────────────┐
                            │ regulatory_capital         │  L9 (calculated)
                            │ DIFFERENTIAL               │
                            └────────────────┬───────────┘
                                             │
                                             ▼
                            ┌────────────────────────────┐
                            │ breach_dashboard           │  L10 (calculated)
                            │ DIFFERENTIAL, LIMIT 10     │  Fixed cardinality
                            └────────────────────────────┘

Stream Tables (finance)

Layer   Name                 Mode           Schedule     What it computes
------  -------------------  -------------  -----------  ------------------------------------------------------------
L1      price_snapshot       DIFFERENTIAL   2s (fixed)   Per-instrument: current bid/ask/mid, sector affinity
L1      net_positions        DIFFERENTIAL   1s (fixed)   Per-account-instrument: net quantity, position status
L2      position_values      DIFFERENTIAL   calculated   Per-account-instrument: marked-to-market value (qty × mid)
L3      account_pnl          DIFFERENTIAL   calculated   Per-account: total P&L, position count, exposure
L4      portfolio_pnl        DIFFERENTIAL   calculated   Per-portfolio: aggregate P&L from accounts
L5      sector_exposure      DIFFERENTIAL   calculated   Per-sector: total exposure, position count
L6      var_contributions    DIFFERENTIAL   calculated   Per-position: parametric Value-at-Risk (95% & 99%)
L7      account_var          DIFFERENTIAL   calculated   Per-account: aggregated VaR, diversification
L8      portfolio_var        DIFFERENTIAL   calculated   Per-portfolio: total VaR + stressed scenario (99%)
L9      regulatory_capital   DIFFERENTIAL   calculated   Per-portfolio: Basel simplified capital requirement
L10     breach_dashboard     DIFFERENTIAL   calculated   Top 10 portfolios by capital utilization ratio (LIMIT 10)

Why This DAG Demonstrates Differential Efficiency

The finance scenario is built specifically to show how DIFFERENTIAL mode excels in deep DAGs:

  1. High cardinality at L1 — The 30 × 50 instrument-account combinations yield up to 1,500 potential positions. A price tick affects ~30 positions (change ratio ≈ 0.02).

  2. Compressing cardinality downstream — As data aggregates up the layers (L2 → L3 → L4 ...), the output cardinality shrinks. By L5 (sector exposure), there are only 8 rows max. By L10 (top 10 portfolios), there are ≤10 rows.

  3. Sub-1.0 change ratios throughout — Because each layer reads only from the previous layer and applies aggregation or filtering:

    • L2 (position_values): change ratio ≈ 0.02 (only changed positions)
    • L5 (sector_exposure): ~1 changed row per tick (typically only 1–2 sectors affected)
    • L10 (breach_dashboard): change ratio ≈ 0–0.1 (typically 0–1 row changes per cycle; fixed cardinality means rank shifts are rare)

  4. All-DIFFERENTIAL cascade — Unlike the fraud scenario (which uses one FULL table), finance uses DIFFERENTIAL throughout. This is valid because each layer is a simple aggregate or filtered view of the previous layer (no complex diamond joins with multiple independent sources).

The Dashboard

Open http://localhost:8080. Panels update every 2 seconds.

Key metrics:

  • Top KPIs — Total positions tracked, total book exposure (USD), portfolio P&L, aggregate portfolio VaR
  • Breach Dashboard — Top 10 portfolios by capital utilization ratio (regulatory capital used / capital limit). Highlights portfolios approaching regulatory constraints.
  • Sector Exposure — Per-sector breakdown of position count and net exposure.
  • Portfolio Metrics — Per-portfolio P&L and 95% VaR.
  • Market Snapshot — Current bid/ask/mid prices for all 30 instruments with sector labels.
  • Stream Table Status — Refresh mode, schedule, and population status for all 10 tables.
  • DAG Topology — ASCII diagram showing the 10-level cascade and dependency flow.

Exploring the Database (finance)

docker compose exec postgres psql -U demo -d finance_demo

-- See all stream table status
SELECT pgt_name, refresh_mode, schedule, is_populated
FROM   pgtrickle.pgt_status()
ORDER BY pgt_id;

-- View the full 10-level dependency graph
SELECT tree_line FROM pgtrickle.dependency_tree()
ORDER BY tree_line;

-- Current market snapshot (L1)
SELECT symbol, sector, bid, ask, mid
FROM   price_snapshot
ORDER BY symbol;

-- Net positions per account-instrument (L1)
SELECT account_id, instrument_symbol, net_quantity, position_value
FROM   net_positions
WHERE  net_quantity != 0
ORDER BY account_id, instrument_symbol;

-- Top-10 portfolio risk ranking (L10)
SELECT portfolio_name, capital_required, capital_limit, utilization_ratio
FROM   breach_dashboard
ORDER BY utilization_ratio DESC;

-- Watch differential efficiency in action
-- This shows very low change ratios at the top levels (L1) and even lower at L10
SELECT pgt_name, avg_change_ratio, avg_diff_ms, diff_speedup
FROM   pgtrickle.refresh_efficiency()
ORDER BY pgt_id;

Differential Efficiency at Work

Open a psql session and watch the refresh history:

docker compose exec postgres psql -U demo -d finance_demo

SELECT
  st.pgt_name,
  rh.action,
  (EXTRACT(EPOCH FROM rh.end_time - rh.start_time) * 1000)::int8 AS duration_ms,
  rh.start_time
FROM pgtrickle.pgt_refresh_history rh
JOIN pgtrickle.pgt_stream_tables st ON st.pgt_id = rh.pgt_id
WHERE rh.start_time > now() - interval '10 seconds'
ORDER BY rh.start_time DESC
LIMIT 50;

-- Re-run the query every 2 seconds:
\watch 2

5-Minute Quickstart

The shortest possible introduction to pg_trickle. By the end of this page you will have created a self-maintaining table, watched it update in real time, and dropped it again — without leaving psql.

Prefer to see it first? Run the playground (cd playground && docker compose up -d) for a pre-loaded environment, or pull the prebuilt image:

docker run --rm -e POSTGRES_PASSWORD=secret -p 5432:5432 \
  ghcr.io/trickle-labs/pg_trickle:latest

Then connect with psql postgres://postgres:secret@localhost:5432/postgres and skip to Step 2 below.


Step 1 — Install the extension

If you already have a PostgreSQL 18 server with pg_trickle installed (via the playground, the Docker image, or a manual install — see Installation for full options), skip this step.

Otherwise, the shortest path on a developer machine is the prebuilt Docker image — one command, no configuration:

docker run --rm -e POSTGRES_PASSWORD=secret -p 5432:5432 \
  ghcr.io/trickle-labs/pg_trickle:latest

Connect with psql:

psql postgres://postgres:secret@localhost:5432/postgres

Step 2 — Enable the extension

CREATE EXTENSION IF NOT EXISTS pg_trickle;

That's all the configuration you need. The extension auto-discovers every database where it's installed and starts a per-database scheduler.


Step 3 — Create a source table

CREATE TABLE orders (
    id      SERIAL PRIMARY KEY,
    region  TEXT     NOT NULL,
    amount  NUMERIC  NOT NULL
);

INSERT INTO orders (region, amount) VALUES
    ('US',   100),
    ('EU',   200),
    ('US',   300),
    ('APAC',  50);

This is a perfectly ordinary table. You will write to it the normal way.


Step 4 — Create a stream table

SELECT pgtrickle.create_stream_table(
    name     => 'revenue_by_region',
    query    => $$
        SELECT region,
               SUM(amount) AS total,
               COUNT(*)    AS order_count
        FROM orders
        GROUP BY region
    $$,
    schedule => '1s'
);

What just happened:

  1. pg_trickle parsed your query and built an internal operator tree.
  2. It created a new table revenue_by_region with the right columns.
  3. It installed lightweight AFTER triggers on orders to capture changes.
  4. It ran an initial full refresh, populating the new table.
  5. It registered a 1-second refresh schedule.

Query the stream table — it's already populated:

SELECT * FROM revenue_by_region ORDER BY region;
 region | total | order_count
--------+-------+-------------
 APAC   |    50 |           1
 EU     |   200 |           1
 US     |   400 |           2

Step 5 — Watch it update

Insert a new order:

INSERT INTO orders (region, amount) VALUES ('US', 999);

Wait one second (or call SELECT pgtrickle.refresh_stream_table('revenue_by_region') to refresh immediately):

SELECT * FROM revenue_by_region WHERE region = 'US';
 region | total | order_count
--------+-------+-------------
 US     |  1399 |           3

Only the US group was recomputed — the other regions were not touched at all. That is differential refresh in action.
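
To see what the engine just did, peek at the refresh history (depending on your version, pgt_name may be schema-qualified):

SELECT rh.action, rh.status, (rh.end_time - rh.start_time) AS duration
FROM pgtrickle.pgt_refresh_history rh
JOIN pgtrickle.pgt_stream_tables st ON st.pgt_id = rh.pgt_id
WHERE st.pgt_name = 'revenue_by_region'
ORDER BY rh.start_time DESC
LIMIT 3;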


Step 6 — Look around

A few useful built-ins worth knowing about right away:

-- Status of all stream tables in this database
SELECT * FROM pgtrickle.pgt_status();

-- A one-shot health triage (returns rows only when something is wrong)
SELECT * FROM pgtrickle.health_check() WHERE severity != 'OK';

-- See what delta SQL pg_trickle would run on the next refresh
SELECT pgtrickle.explain_st('revenue_by_region');

Step 7 — Clean up

SELECT pgtrickle.drop_stream_table('revenue_by_region');
DROP TABLE orders;

This removes the stream table, its catalog entries, and the CDC triggers on orders.


Where to go next

If you want to…                                         Read
------------------------------------------------------  -------------------------------
See a multi-table tutorial with chains and aggregates   15-Minute Tutorial
Walk through every feature in depth                     In-Depth Tour
Browse common patterns and example apps                 Use Cases · Patterns
Understand how it works underneath                      Architecture
Look up a function, GUC, or operator                    SQL Reference · Configuration
Deploy it to production                                 Pre-Deployment Checklist
Decode a piece of jargon                                Glossary

Getting Started with pg_trickle

What is pg_trickle?

pg_trickle adds stream tables to PostgreSQL — tables that are defined by a SQL query and kept automatically up to date as the underlying data changes. Think of them as materialized views that refresh themselves, but smarter: instead of re-running the entire query on every refresh, pg_trickle uses Incremental View Maintenance (IVM) to process only the rows that changed.

Traditional materialized views force a choice: either re-run the full query (expensive) or accept stale data. pg_trickle eliminates this trade-off. When you insert a single row into a million-row table, pg_trickle computes the effect of that one row on the query result — it doesn't touch the other 999,999.

How data flows

The key concept is that data flows downstream automatically — from your base tables through any chain of stream tables, without you writing a single line of orchestration code:

  You write to base tables
         │
         ▼
  ┌─────────────┐   triggers (or WAL)    ┌─────────────────────┐
  │ Base Tables │ ─────────────────────▶ │   Change Buffers    │
  │ (you write) │                        │(pgtrickle_changes.*)│
  └─────────────┘                        └──────────┬──────────┘
                                                     │
                                           delta query (ΔQ) on refresh
                                                     │
                                                     ▼
  ┌──────────────────────────────────────────────────────────────┐
  │  Stream Table A  ◀── depends on base tables                  │
  └──────────────────────────┬───────────────────────────────────┘
                             │  change captured, buffer written
                             ▼
  ┌──────────────────────────────────────────────────────────────┐
  │  Stream Table B  ◀── depends on Stream Table A               │
  └──────────────────────────────────────────────────────────────┘

One write to a base table can ripple through an entire DAG of stream tables — each layer refreshed in the correct topological order, each doing only the work proportional to what actually changed.

  1. You write to your base tables normally — INSERT, UPDATE, DELETE
  2. Lightweight AFTER row-level triggers capture each change into a buffer, atomically in the same transaction. No polling, no logical replication slots required by default.
  3. On each refresh cycle, pg_trickle derives a delta query (ΔQ) that reads only the buffered changes since the last refresh frontier
  4. The delta is merged into the stream table — only the affected rows are written
  5. If other stream tables depend on this one, they are scheduled next (topological order)
  6. Optionally: once wal_level = logical is available and the first refresh succeeds, pg_trickle automatically transitions from triggers to WAL-based CDC (near-zero write-path overhead compared to ~2–15 μs for triggers). The transition is seamless and transparent.

This tutorial walks through a concrete org-chart example so you can see this flow end to end, including a chain of stream tables that propagates changes automatically.


Prerequisites

  • PostgreSQL 18.x with pg_trickle installed (see Installation)
  • shared_preload_libraries = 'pg_trickle' in postgresql.conf
  • max_worker_processes raised to at least 32 (see Installation); the PostgreSQL default of 8 is often exhausted if you have several databases, causing stream tables to silently stop refreshing
  • psql or any SQL client

Deploying to production? See the Pre-Deployment Checklist for a complete list of requirements, pooler compatibility, and recommended GUC values.

Playground: The fastest way to experiment is the playground — a Docker Compose environment with sample tables and stream tables pre-loaded. cd playground && docker compose up -d and you're running.

Quick start with Docker: Pull the pre-built GHCR image — PostgreSQL 18.3 + pg_trickle ready to run, no configuration needed:

docker run --rm -e POSTGRES_PASSWORD=secret -p 5432:5432 ghcr.io/trickle-labs/pg_trickle:latest

All GUC defaults (wal_level, shared_preload_libraries, scheduler settings) are pre-configured. See Installation for tag details and volume mounting.

Security — RLS bypass: The pg_trickle background worker executes refresh queries with SET LOCAL row_security = off. This mirrors the behaviour of PostgreSQL's own REFRESH MATERIALIZED VIEW, which also bypasses Row-Level Security. As a result, stream table output is always the full, unfiltered result set regardless of any RLS policies defined on source tables. Applications must not rely on RLS to restrict what data a stream table exposes — use view filters, column masking, or a separate per-role view on top of the stream table instead. See PRE_DEPLOYMENT.md for the full security checklist.

Connect to the database you want to use and enable the extension:

CREATE EXTENSION pg_trickle;

No additional configuration is needed. pg_trickle automatically discovers all databases on the server and starts a scheduler for each one where the extension is installed.


Chapter 1: Hello World — Your First Stream Table

Before diving into multi-table joins and recursive CTEs, start with the simplest possible stream table: a single-source aggregate with no joins.

1.1 Setup

Create one table and enable the extension:

CREATE EXTENSION IF NOT EXISTS pg_trickle;

CREATE TABLE products (
    id       SERIAL PRIMARY KEY,
    category TEXT           NOT NULL,
    price    NUMERIC(10,2)  NOT NULL,
    in_stock BOOLEAN        NOT NULL DEFAULT true
);

INSERT INTO products (category, price) VALUES
    ('Electronics', 299.99),
    ('Electronics', 49.99),
    ('Books',       14.99),
    ('Books',       24.99),
    ('Books',        9.99);

1.2 Create the stream table

SELECT pgtrickle.create_stream_table(
    name     => 'category_summary',
    query    => $$
        SELECT
            category,
            COUNT(*)                    AS product_count,
            ROUND(AVG(price), 2)        AS avg_price,
            MIN(price)                  AS min_price,
            MAX(price)                  AS max_price,
            COUNT(*) FILTER (WHERE in_stock) AS in_stock_count
        FROM products
        GROUP BY category
    $$,
    schedule => '1s'
);

Query it immediately — it was populated by the initial full refresh:

SELECT category, product_count, avg_price, min_price, max_price, in_stock_count
FROM category_summary ORDER BY category;
  category   | product_count | avg_price | min_price | max_price | in_stock_count
-------------+---------------+-----------+-----------+-----------+----------------
 Books       |             3 |     16.66 |      9.99 |     24.99 |              3
 Electronics |             2 |    174.99 |     49.99 |    299.99 |              2
(2 rows)

1.3 Watch an INSERT update one group

INSERT INTO products (category, price) VALUES ('Books', 39.99);

Within ~1 second (or call SELECT pgtrickle.refresh_stream_table('category_summary') to force it):

SELECT category, product_count, avg_price, min_price, max_price, in_stock_count
FROM category_summary WHERE category = 'Books';
 category | product_count | avg_price | min_price | max_price | in_stock_count
----------+---------------+-----------+-----------+-----------+----------------
 Books    |             4 |     22.49 |      9.99 |     39.99 |              4
(1 row)

The Electronics row was not touched at all — pg_trickle read exactly 1 row from the change buffer, adjusted only the Books group.

1.4 Watch an UPDATE propagate

UPDATE products SET price = 19.99 WHERE price = 299.99;

After the next refresh:

SELECT category, product_count, avg_price, min_price, max_price, in_stock_count
FROM category_summary WHERE category = 'Electronics';
  category   | product_count | avg_price | min_price | max_price | in_stock_count
-------------+---------------+-----------+-----------+-----------+----------------
 Electronics |             2 |     34.99 |     19.99 |     49.99 |              2
(1 row)

For AVG, pg_trickle maintains running sum and count columns internally, so re-aggregating a group is O(1) regardless of group size.
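
Concretely, for the update above: the Electronics running sum goes from 299.99 + 49.99 = 349.98 to 349.98 + (19.99 − 299.99) = 69.98, the count stays 2, and the new average is 69.98 / 2 = 34.99. One subtraction and one division, no matter how many rows the group holds.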

1.5 What you just saw

  • A single function call created the storage table, installed CDC triggers, ran the initial full refresh, and registered a 1-second schedule.
  • Every subsequent DML on products was captured in an AFTER trigger — no polling, no logical replication.
  • Each refresh touched only the rows and groups that changed.
  • The stream table is a real PostgreSQL table — you can SELECT, index, and join against category_summary like any other table.

Clean up: SELECT pgtrickle.drop_stream_table('category_summary'); DROP TABLE products;


Chapter 2: Joins, Aggregates & Chains

What you'll build

An employee org-chart system with three stream tables:

  • department_tree — a recursive CTE that flattens a department hierarchy into paths like Company > Engineering > Backend
  • department_stats — a join + aggregation over department_tree (a stream table!) that computes headcount and salary budget, with the full path included
  • department_report — a further aggregation that rolls up stats to top-level departments

The chain departments → department_tree → department_stats → department_report demonstrates automatic downstream propagation: modify a department name in the base table and all three stream tables update automatically, in the right order, without any manual orchestration.

By the end you will have:

  • Seen how stream tables are created, queried, and refreshed
  • Watched a single UPDATE in a base table cascade through three layers of stream tables automatically
  • Understood the four refresh modes and IVM strategies

Prefer dbt? A runnable dbt companion project mirrors every step below. Clone the repo and run:

./examples/dbt_getting_started/scripts/run_example.sh

See examples/dbt_getting_started/ for full details.


2.1 Create the Base Tables

These are ordinary PostgreSQL tables — pg_trickle doesn't require any special column types, annotations, or schema conventions.

Tables without a primary key work, but pg_trickle will emit a WARNING at stream table creation time: change detection falls back to a content-based hash across all columns, which is slower for wide tables and cannot distinguish between identical duplicate rows. Adding a primary key gives the best performance and most reliable change detection. A primary key is also required for automatic transition to WAL-based CDC (cdc_mode = 'auto'); without one the source table stays on trigger-based CDC.

-- Department hierarchy (self-referencing tree)
CREATE TABLE departments (
    id         SERIAL PRIMARY KEY,
    name       TEXT NOT NULL,
    parent_id  INT REFERENCES departments(id)
);

-- Employees belong to a department
CREATE TABLE employees (
    id            SERIAL PRIMARY KEY,
    name          TEXT NOT NULL,
    department_id INT NOT NULL REFERENCES departments(id),
    salary        NUMERIC(10,2) NOT NULL
);

Now insert some data — a three-level department tree and a handful of employees:

-- Top-level
INSERT INTO departments (id, name, parent_id) VALUES
    (1, 'Company',     NULL);

-- Second level
INSERT INTO departments (id, name, parent_id) VALUES
    (2, 'Engineering', 1),
    (3, 'Sales',       1),
    (4, 'Operations',  1);

-- Third level (under Engineering)
INSERT INTO departments (id, name, parent_id) VALUES
    (5, 'Backend',     2),
    (6, 'Frontend',    2),
    (7, 'Platform',    2);

-- Employees
INSERT INTO employees (name, department_id, salary) VALUES
    ('Alice',   5, 120000),   -- Backend
    ('Bob',     5, 115000),   -- Backend
    ('Charlie', 6, 110000),   -- Frontend
    ('Diana',   7, 130000),   -- Platform
    ('Eve',     3, 95000),    -- Sales
    ('Frank',   3, 90000),    -- Sales
    ('Grace',   4, 100000);   -- Operations

At this point these are plain tables with no triggers, no change tracking, nothing special. The department tree looks like this:

Company (1)
├── Engineering (2)
│   ├── Backend (5)     — Alice, Bob
│   ├── Frontend (6)    — Charlie
│   └── Platform (7)    — Diana
├── Sales (3)           — Eve, Frank
└── Operations (4)      — Grace

2.2 Create the First Stream Table — Recursive Hierarchy

Our first stream table flattens the department tree. For every department, it computes the full path from the root and the depth level. This uses WITH RECURSIVE — a SQL construct that can't be differentiated with simple algebraic rules (the recursion depends on itself), but pg_trickle handles it using incremental strategies (semi-naive evaluation for inserts, Delete-and-Rederive for mixed changes) that we'll explain later.

SELECT pgtrickle.create_stream_table(
    name         => 'department_tree',
    query        => $$
    WITH RECURSIVE tree AS (
        -- Base case: root departments (no parent)
        SELECT id, name, parent_id, name AS path, 0 AS depth
        FROM departments
        WHERE parent_id IS NULL

        UNION ALL

        -- Recursive step: children join back to the tree
        SELECT d.id, d.name, d.parent_id,
               tree.path || ' > ' || d.name AS path,
               tree.depth + 1
        FROM departments d
        JOIN tree ON d.parent_id = tree.id
    )
    SELECT id, name, parent_id, path, depth FROM tree
    $$,
    schedule     => '1s'
);

Note on short schedules: A 1-second schedule is safe for development and production thanks to auto_backoff (on by default since v0.10.0). If a refresh takes more than 95% of the schedule window, the scheduler automatically stretches the effective interval (up to 8× the configured schedule) to prevent CPU runaway, then resets to 1× as soon as a refresh completes on time. You will see a WARNING message when backoff activates.

v0.2.0+: create_stream_table also accepts diamond_consistency ('none' or 'atomic') and diamond_schedule_policy ('fastest' or 'slowest') for diamond-shaped dependency graphs. Schedules can be cron expressions (e.g., '*/5 * * * *', '@hourly'). Set pooler_compatibility_mode => true if you're connecting through PgBouncer or another transaction-mode connection pooler. See SQL_REFERENCE.md for the full parameter list.
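
For example (the table name and query are illustrative; the parameters are the ones listed above):

SELECT pgtrickle.create_stream_table(
    name     => 'hourly_dept_counts',
    query    => $$ SELECT parent_id, COUNT(*) AS children
                   FROM departments GROUP BY parent_id $$,
    schedule => '@hourly',                     -- cron-style schedule
    diamond_consistency       => 'atomic',
    diamond_schedule_policy   => 'slowest',
    pooler_compatibility_mode => true          -- if behind a transaction-mode pooler
);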

What just happened?

That single function call did a lot of work atomically (all in one transaction):

  1. Parsed the defining query into an operator tree — identifying the recursive CTE, the scan on departments, the join, the union
  2. Created a storage table called department_tree in the public schema — a real PostgreSQL heap table with columns matching the SELECT output, plus an internal __pgt_row_id column (a hash used to track individual rows)
  3. Installed CDC triggers on the departments table — lightweight AFTER INSERT OR UPDATE OR DELETE row-level triggers that will capture every future change
  4. Created a change buffer table in the pgtrickle_changes schema — this is where the triggers write captured changes
  5. Ran an initial full refresh — executed the recursive query against the current data and populated the storage table
  6. Registered the stream table in pg_trickle's catalog with a 1-second refresh schedule

TRUNCATE caveat: Row-level triggers do not fire on TRUNCATE. If you TRUNCATE a base table, the change is not captured incrementally — the stream table will become stale. Use DELETE FROM table instead, or call pgtrickle.refresh_stream_table('department_tree') after a TRUNCATE. If the stream table uses DIFFERENTIAL mode, temporarily switch to FULL for a full recompute: pgtrickle.alter_stream_table('department_tree', refresh_mode => 'FULL'), refresh, then switch back.

Query it immediately — it's already populated:

SELECT id, name, parent_id, path, depth FROM department_tree ORDER BY path;

Expected output:

 id |    name     | parent_id |               path               | depth
----+-------------+-----------+----------------------------------+-------
  1 | Company     |           | Company                          |     0
  2 | Engineering |         1 | Company > Engineering            |     1
  5 | Backend     |         2 | Company > Engineering > Backend  |     2
  6 | Frontend    |         2 | Company > Engineering > Frontend |     2
  7 | Platform    |         2 | Company > Engineering > Platform |     2
  4 | Operations  |         1 | Company > Operations             |     1
  3 | Sales       |         1 | Company > Sales                  |     1
(7 rows)

This is a real PostgreSQL table — you can create indexes on it, join it in other queries, reference it in views, or even use it as a source for other stream tables. pg_trickle keeps it in sync automatically.

Key insight: The recursive query that computes paths and depths would normally need to be re-run manually (or via REFRESH MATERIALIZED VIEW). With pg_trickle, it stays fresh — any change to the departments table is automatically reflected within the schedule bound (1 second here).


2.3 Chain Stream Tables — Build the Downstream Layers

Now create department_stats. The twist: instead of joining directly against departments, it joins against department_tree — the stream table we just created. This creates a chain: changes to departments update department_tree, whose changes then trigger department_stats to update.

This demonstrates how pg_trickle builds a DAG — a directed acyclic graph of stream tables — and automatically schedules refreshes in topological order.

SELECT pgtrickle.create_stream_table(
    name         => 'department_stats',
    query        => $$
    SELECT
        t.id          AS department_id,
        t.name        AS department_name,
        t.path        AS full_path,
        t.depth,
        COUNT(e.id)                    AS headcount,
        COALESCE(SUM(e.salary), 0)     AS total_salary,
        COALESCE(AVG(e.salary), 0)     AS avg_salary
    FROM department_tree t
    LEFT JOIN employees e ON e.department_id = t.id
    GROUP BY t.id, t.name, t.path, t.depth
    $$,
    schedule     => 'calculated'      -- CALCULATED: inherit schedule from downstream; see explanation below
);

What just happened — and why this one is different?

Like before, pg_trickle parsed the query, created a storage table, and set up CDC. But department_stats depends on department_tree, not a base table — so no new triggers were installed. Instead, pg_trickle registered department_tree as an upstream dependency in the DAG.

The schedule is 'calculated' (CALCULATED mode), which means: "don't give this table its own schedule — inherit the tightest schedule of any downstream table that queries it". Internally this stores NULL in the catalog, but you must pass the string 'calculated' — passing SQL NULL is an error. Since no other stream table has been created yet, it will be refreshed on demand or when a downstream dependent triggers it.

The query has no recursive CTE, so pg_trickle uses algebraic differentiation:

  1. Decomposed into operators: Scan(department_tree) → LEFT JOIN (with Scan(employees)) → Aggregate(GROUP BY + COUNT/SUM/AVG) → Project
  2. Derived a differentiation rule for each:
    • Δ(Scan) = read only change buffer rows (not the full table)
    • Δ(LEFT JOIN) = join change rows from one side against the full other side
    • Δ(Aggregate) = for COUNT/SUM/AVG, add or subtract per group — no rescan needed
  3. Composed these into a single delta query (ΔQ) that never touches unchanged rows

When one employee is inserted, the refresh reads one change buffer row, joins to find the department, and adjusts only that group's count and sum.
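
To make that concrete, here is the rough shape of such a delta query. This is an illustrative sketch only, not the literal SQL pg_trickle generates; the buffer table and __action column names are placeholders:

-- Illustrative sketch of the employees-side delta for department_stats.
-- Buffer table and __action column are placeholders, not real internals.
WITH delta AS (
    SELECT c.department_id,
           c.salary,
           CASE c.__action WHEN 'D' THEN -1 ELSE +1 END AS sign
    FROM pgtrickle_changes.changes_employees c       -- only buffered change rows
)
SELECT t.id                    AS department_id,
       SUM(d.sign)             AS headcount_delta,   -- adjusts COUNT per group
       SUM(d.sign * d.salary)  AS salary_delta       -- adjusts SUM per group
FROM delta d
JOIN department_tree t ON t.id = d.department_id     -- delta side joined to the full other side
GROUP BY t.id;
-- The result is merged into department_stats, touching only these groups.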

Query it:

SELECT department_name, full_path, headcount, total_salary
FROM department_stats
ORDER BY full_path;

Expected output:

 department_name |            full_path             | headcount | total_salary
-----------------+----------------------------------+-----------+--------------
 Company         | Company                          |         0 |            0
 Engineering     | Company > Engineering            |         0 |            0
 Backend         | Company > Engineering > Backend  |         2 |    235000.00
 Frontend        | Company > Engineering > Frontend |         1 |    110000.00
 Platform        | Company > Engineering > Platform |         1 |    130000.00
 Operations      | Company > Operations             |         1 |    100000.00
 Sales           | Company > Sales                  |         2 |    185000.00
(7 rows)

Notice that the full_path column comes from department_tree — this data already went through one layer of incremental maintenance before landing here.

Add a third layer: department_report

Now add a rollup that aggregates department_stats into top-level divisions (the depth-1 ancestor taken from full_path):

SELECT pgtrickle.create_stream_table(
    name         => 'department_report',
    query        => $$
    SELECT
        split_part(full_path, ' > ', 2) AS division,
        SUM(headcount)                  AS total_headcount,
        SUM(total_salary)               AS total_payroll
    FROM department_stats
    WHERE depth >= 1
    GROUP BY 1
    $$,
    schedule     => '1s'              -- drives the chain; the CALCULATED table above inherits this cadence
);

The DAG is now:

departments (base)          employees (base)
      │                            │
      ▼                            │
department_tree                    │
  (DIFF, 1s)                       │
      │                            │
      ▼                            │
    department_stats ◀─────────────┘
    (DIFF, CALCULATED)
           │
           ▼
    department_report
       (DIFF, 1s)

department_report drives the derived layer: because it has a 1-second schedule, department_stats (CALCULATED) inherits that cadence automatically, and department_tree refreshes on its own 1-second schedule, so a base-table change reaches the report within about a second, in topological order, with no manual configuration.

Query the report:

SELECT division, total_headcount, total_payroll FROM department_report ORDER BY division;
  division   | total_headcount | total_payroll
-------------+-----------------+---------------
 Engineering |               4 |    475000.00
 Operations  |               1 |    100000.00
 Sales       |               2 |    185000.00
(3 rows)

2.4 Watch a Change Cascade Through All Three Layers

This is the heart of pg_trickle. We'll make four changes to the base tables and watch changes propagate automatically through the three-layer DAG — each layer doing only the minimum work.

The data flow pipeline (three layers)

  Your SQL statement
       │
       ▼
  CDC trigger fires (same transaction)
  Change buffer receives one row
       │
       ▼
  Background scheduler fires (within ~1 second)
       │
       ├──▶ [Layer 1] Refresh department_tree
       │         delta query reads change buffer
       │         MERGE touches only affected rows in department_tree
       │         department_tree's own change buffer is updated
       │
       ├──▶ [Layer 2] Refresh department_stats
       │         delta query reads department_tree's change buffer
       │         MERGE touches only affected department groups
       │
       └──▶ [Layer 3] Refresh department_report
                 delta query reads department_stats' change buffer
                 MERGE touches only affected division rows
                 All change buffers cleaned up ✓

All three layers run in a single scheduled pass, in topological order.

2.4a: INSERT ripples through all three layers

INSERT INTO employees (name, department_id, salary) VALUES
    ('Heidi', 6, 105000);  -- New Frontend engineer

What happened immediately (in your transaction): The AFTER INSERT trigger on employees fired and wrote one row to pgtrickle_changes.changes_<employees_oid>. The row contains the new values, action type I, and the LSN at the time of insert. Your transaction committed normally — no blocking.

The stream tables don't know about Heidi yet. The change is in the buffer, waiting for the next refresh.

The background scheduler handles this automatically. With a 1-second schedule, department_stats and department_report refresh within about a second.

To confirm a refresh has happened, check data_timestamp in the monitoring view:

SELECT name, data_timestamp, staleness FROM pgtrickle.pgt_status();

To force an immediate synchronous refresh, wait a moment first (so the scheduler can finish its current tick), then call in topological order. Note that refresh_stream_table only refreshes the named table — it does not cascade upstream:

SELECT pg_sleep(2);  -- let the scheduler finish any in-progress tick
SELECT pgtrickle.refresh_stream_table('department_stats');
SELECT pgtrickle.refresh_stream_table('department_report');

What happened across the three layers:

Layer               What ran                                                                Rows touched
------------------  ----------------------------------------------------------------------  ------------------------
department_tree     No change — employees is not a source for this ST                       0
department_stats    Delta query: read 1 buffer row, join to Frontend, COUNT+1, SUM+105000   1 (Frontend group only)
department_report   Delta query: read 1 change from dept_stats, SUM += 1 headcount,         1 (Engineering row only)
                    += 105000

Check the result:

SELECT department_name, headcount, total_salary FROM department_stats
WHERE department_name = 'Frontend';
 department_name | headcount | total_salary
-----------------+-----------+--------------
 Frontend        |         2 |    215000.00

The 6 other groups in department_stats were not touched at all.

Contrast with a standard materialized view: REFRESH MATERIALIZED VIEW would re-scan all 8 employees, re-join with all 7 departments, re-aggregate, and update all 7 rows. With pg_trickle, the work was proportional to the 1 changed row — across all three layers.

2.4b: A department change cascades through the whole DAG

Now change the departments table — the root of the entire chain:

INSERT INTO departments (id, name, parent_id) VALUES
    (8, 'DevOps', 2);  -- New team under Engineering

What happened: The CDC trigger on departments fired. The change buffer for departments has one new row. None of the stream tables know about it yet.

The scheduler handles this automatically — all three tables will refresh within a second in the correct dependency order (upstream first). To force it synchronously, wait a moment first then refresh each table in topological order (refresh_stream_table does not cascade upstream):

SELECT pg_sleep(2);
SELECT pgtrickle.refresh_stream_table('department_tree');
SELECT pgtrickle.refresh_stream_table('department_stats');
SELECT pgtrickle.refresh_stream_table('department_report');

What happened across all three layers:

Layer               What ran                                                                Rows touched
------------------  ----------------------------------------------------------------------  ------------------------
department_tree     Semi-naive evaluation: the new department enters the delta and the      1 inserted
                    recursive term computes its path. Result: 1 new row
department_stats    Delta query reads the new row from dept_tree's change buffer; DevOps    1 inserted (headcount=0)
                    has 0 employees, so the delta is minimal
department_report   Delta on the Engineering row: headcount stays the same (DevOps has     0 effective changes
                    0 employees)

How the recursive CTE refresh works — unlike department_stats, recursive CTEs can't be algebraically differentiated (the recursion references itself). pg_trickle uses incremental fixpoint strategies:

  • INSERT → semi-naive evaluation: differentiate the base case, propagate the delta through the recursive term, stopping when no new rows are produced. Only new rows inserted.
  • DELETE or UPDATE → Delete-and-Rederive (DRed): remove rows derived from deleted facts, re-derive rows that may have alternative derivation paths, handle cascades cleanly.

SELECT id, name, depth, path FROM department_tree WHERE name = 'DevOps';
 id |  name  | depth |              path
----+--------+-------+--------------------------------
  8 | DevOps |     2 | Company > Engineering > DevOps
(1 row)

The recursive CTE automatically expanded to include the new department at the correct depth and path. One inserted row in departments produced one new row in the stream table.

2.4c: UPDATE — A single rename that cascades everywhere

Rename "Engineering" to "R&D":

UPDATE departments SET name = 'R&D' WHERE id = 2;

What happened in the change buffer: The CDC trigger captured the old row (name='Engineering') and the new row (name='R&D'). Both old and new values are stored so the delta can compute what to remove and what to add.

Wait a moment for the scheduler to propagate the rename through all layers. To force it synchronously, wait then refresh each table in topological order (refresh_stream_table does not cascade upstream):

SELECT pg_sleep(2);
SELECT pgtrickle.refresh_stream_table('department_tree');
SELECT pgtrickle.refresh_stream_table('department_stats');
SELECT pgtrickle.refresh_stream_table('department_report');

What happened across all three layers:

Layer               Work done                                                               Result
------------------  ----------------------------------------------------------------------  -------------------------------
department_tree     DRed strategy: delete rows derived with the old name, re-derive with    Paths now say Company > R&D > …
                    the new name. 5 rows updated (Engineering + 4 sub-teams)
department_stats    Delta reads 5 changed rows from dept_tree's buffer; updates the         5 rows updated
                    full_path column for those 5 departments
department_report   Division name changed: "Engineering" row replaced by "R&D" row          1 DELETE + 1 INSERT

Query to verify the cascade:

SELECT name, path FROM department_tree WHERE path LIKE '%R&D%' ORDER BY depth, name;
   name   |           path           
----------+--------------------------
 R&D      | Company > R&D
 Backend  | Company > R&D > Backend
 DevOps   | Company > R&D > DevOps
 Frontend | Company > R&D > Frontend
 Platform | Company > R&D > Platform
(5 rows)

One UPDATE to a department name flowed through all three layers automatically — updating 5 + 5 + 2 rows across the chain.

2.4d: DELETE — Remove an employee

DELETE FROM employees WHERE name = 'Bob';

What happened: The AFTER DELETE trigger on employees fired, writing a change buffer row with action type D and Bob's old values (department_id=5, salary=115000). The delta query will use these old values to compute the correct aggregate adjustment — it knows to subtract 115000 from Backend's salary sum and decrement the count.

Important — refresh before querying: The background scheduler refreshes all three tables within ~1 second, in topological order. To see the result immediately, wait a moment then explicitly refresh in upstream-first order:

SELECT pg_sleep(2);
SELECT pgtrickle.refresh_stream_table('department_stats');
SELECT pgtrickle.refresh_stream_table('department_report');

Why call department_stats first? department_stats sources from both employees and department_tree. Refreshing in topological order ensures each layer processes its upstream changes before computing its own deltas. Even when department_tree has unprocessed changes from step 4c and a new employee change arrives simultaneously, pg_trickle's differential engine handles both correctly — using the pre-change left snapshot (L₀) to avoid double-counting.

Then verify the result:

SELECT department_name, headcount, total_salary, avg_salary
FROM department_stats WHERE department_name = 'Backend';
 department_name | headcount | total_salary |     avg_salary
-----------------+-----------+--------------+---------------------
 Backend         |         1 |    120000.00 | 120000.000000000000
(1 row)

Headcount dropped from 2 → 1 and the salary aggregates updated. Again, only the Backend group was touched — the other 6 department rows were untouched.


Chapter 3: Scheduling & Backpressure

Automatic Scheduling — Let the DAG Drive Itself

pg_trickle runs a background scheduler that automatically refreshes stale tables in topological order. In the Step 4 examples above, the scheduler handled every change within about a second. You can also call refresh_stream_table() directly when needed (e.g. in scripts or tests), but in normal operation the scheduler takes care of everything.

How schedules propagate

We gave department_report a '1s' schedule and department_stats a 'calculated' schedule (NULL in the catalog). In 2.2 we also gave department_tree an explicit '1s'; in the recommended pattern it would be 'calculated' as well, inheriting the same cadence:

 department_tree    (CALCULATED → inherits 1s from downstream)
       │
 department_stats   (CALCULATED → inherits 1s from downstream)
       │
 department_report  (1s — the only explicit schedule)

CALCULATED mode (pass schedule => 'calculated') means: compute the tightest schedule across all downstream dependents. You declare freshness requirements at the tables your application queries — the system figures out how often each upstream table needs to refresh.
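
If your freshness requirement changes, change it at the table your application reads; the CALCULATED tables upstream follow automatically. Assuming alter_stream_table accepts a schedule parameter analogous to the refresh_mode parameter shown earlier, that looks like:

SELECT pgtrickle.alter_stream_table('department_report', schedule => '5s');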

What the scheduler does every second

  1. Queries the catalog for stream tables past their freshness bound
  2. Sorts them topologically (upstream first) — department_tree refreshes before department_stats, which refreshes before department_report
  3. Runs each refresh (respecting pg_trickle.max_concurrent_refreshes)
  4. Updates the last-refresh frontier

Monitoring

-- Current status of all stream tables
SELECT name, status, refresh_mode, schedule, data_timestamp, staleness
FROM pgtrickle.pgt_status();
            name             | status | refresh_mode | schedule |       data_timestamp       |  staleness
-----------------------------+--------+--------------+----------+----------------------------+---------------
 public.department_tree      | ACTIVE | DIFFERENTIAL | 1s       | 2026-02-26 10:30:00.123+01 | 00:00:00.877
 public.department_stats     | ACTIVE | DIFFERENTIAL |          | 2026-02-26 10:30:00.456+01 | 00:00:00.544
 public.department_report    | ACTIVE | DIFFERENTIAL | 1s       | 2026-02-26 10:30:00.789+01 | 00:00:00.211

-- Detailed performance stats
SELECT pgt_name, total_refreshes, avg_duration_ms, successful_refreshes
FROM pgtrickle.pg_stat_stream_tables;
-- Health check: quick triage of common issues
SELECT check_name, severity, detail FROM pgtrickle.health_check();
-- Visualize the dependency DAG
SELECT * FROM pgtrickle.dependency_tree();
-- Recent refresh timeline across all stream tables
SELECT * FROM pgtrickle.refresh_timeline(10);
-- Check CDC change buffer sizes (spotting buffer build-up)
SELECT * FROM pgtrickle.change_buffer_sizes();

See SQL_REFERENCE.md for the full list of monitoring functions including list_sources(), trigger_inventory(), and diamond_groups().


Chapter 4: Monitoring In Depth

All the monitoring capabilities from the quick reference above, expanded. For the five most important day-to-day introspection queries, see the Monitoring Quick Reference at the end of this guide.

Optional: WAL-based CDC

By default pg_trickle uses triggers. If wal_level = logical is configured, set:

ALTER SYSTEM SET pg_trickle.cdc_mode = 'auto';
SELECT pg_reload_conf();

pg_trickle will automatically transition each stream table from trigger-based to WAL-based capture after the first successful refresh — reducing per-write overhead from ~2–15 μs (triggers) to near-zero (WAL-based capture adds no synchronous overhead to your DML). The transition is transparent; your queries and the refresh schedule are unaffected.
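
To confirm which capture mode each source table is on, list_sources() (see SQL_REFERENCE.md) reports per-source CDC details; its exact columns vary by version:

SELECT * FROM pgtrickle.list_sources();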

Optional: Parallel Refresh (v0.4.0+)

By default the scheduler refreshes stream tables sequentially in topological order within a single background worker. This is correct and efficient for most workloads.

For deployments with many independent stream tables, enable parallel refresh:

ALTER SYSTEM SET pg_trickle.parallel_refresh_mode = 'on';
ALTER SYSTEM SET pg_trickle.max_dynamic_refresh_workers = 4;  -- cluster-wide cap
SELECT pg_reload_conf();

Independent stream tables at the same DAG level will then refresh concurrently in separate dynamic background workers. Refresh pairs with IMMEDIATE-trigger connections and atomic consistency groups still run in a single worker for correctness.

Before enabling, ensure max_worker_processes has enough room:

max_worker_processes >= 1 (launcher)
                      + number of databases with stream tables
                      + max_dynamic_refresh_workers (default 4)
                      + autovacuum and other extension workers

Monitor parallel refresh:

SELECT * FROM pgtrickle.worker_pool_status();        -- live worker budget
SELECT * FROM pgtrickle.parallel_job_status(60);     -- recent jobs

See CONFIGURATION.md — Parallel Refresh for the complete tuning reference.

Optional: PgBouncer / Connection Pooler Compatibility (v0.10.0+)

If you're connecting through PgBouncer or another connection pooler in transaction mode (the default on Supabase, Railway, Neon, and most managed PostgreSQL platforms), set pooler_compatibility_mode when creating or altering a stream table:

SELECT pgtrickle.create_stream_table(
    name                    => 'live_headcount',
    query                   => 'SELECT department_id, COUNT(*) FROM employees GROUP BY 1',
    schedule                => '1s',
    pooler_compatibility_mode => true
);

This disables prepared statements and NOTIFY emissions for that table — the two features that break in transaction-pool mode. Leave it off (the default) if you connect directly to PostgreSQL.

Optional: Change Buffer Compaction (v0.10.0+)

For high-churn tables, pg_trickle automatically compacts the pending change buffer before each refresh cycle when it exceeds pg_trickle.compact_threshold (default 100,000 rows). INSERT→DELETE pairs that cancel each other out are eliminated, and multiple changes to the same row are collapsed to a single net change, reducing delta scan overhead by 50–90%.
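The threshold is an ordinary GUC, so you can tune it like the other settings in this guide. A sketch (the value is illustrative, not a recommendation):

ALTER SYSTEM SET pg_trickle.compact_threshold = 25000;  -- compact earlier on very hot sources
SELECT pg_reload_conf();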


Chapter 5: Advanced Topics

Refresh Modes and IVM Strategies

You've now seen several of the strategies pg_trickle uses for incremental view maintenance (IVM). Understanding the four refresh modes and when each strategy applies helps you write efficient stream table queries.

The Four Refresh Modes

Mode | When it refreshes | Use case
AUTO (default) | On a schedule (background) | Most use cases — uses DIFFERENTIAL when possible, falls back to FULL automatically
DIFFERENTIAL | On a schedule (background) | Like AUTO but errors if the query can't be differentiated
FULL | On a schedule (background) | Forces full recompute every cycle
IMMEDIATE | Synchronously, in the same transaction as the DML | Real-time dashboards, audit tables — the stream table is always up-to-date

When you omit refresh_mode, the default is 'AUTO' — it uses differential (delta-only) maintenance when the query supports it, and automatically falls back to full recomputation when it doesn't. You only need to specify a mode explicitly for advanced cases.

IMMEDIATE mode (new in v0.2.0) maintains stream tables synchronously within the same transaction as the base table DML. It uses statement-level AFTER triggers with transition tables — no change buffers, no scheduler. The stream table is always consistent with the current transaction.

-- Create a stream table that updates in real-time
SELECT pgtrickle.create_stream_table(
    name         => 'live_headcount',
    query        => $$
    SELECT department_id, COUNT(*) AS headcount
    FROM employees
    GROUP BY department_id
    $$,
    refresh_mode => 'IMMEDIATE'
);

-- After any INSERT/UPDATE/DELETE on employees,
-- live_headcount is already up-to-date — no refresh needed!

IMMEDIATE mode supports joins, aggregates, window functions, LATERAL subqueries, and cascading IMMEDIATE stream tables. Recursive CTEs are not supported in IMMEDIATE mode (use DIFFERENTIAL instead).

You can switch between modes at any time:

-- Switch from DIFFERENTIAL to IMMEDIATE
SELECT pgtrickle.alter_stream_table('department_stats', refresh_mode => 'IMMEDIATE');

-- Switch back to DIFFERENTIAL with a schedule
SELECT pgtrickle.alter_stream_table('department_stats', refresh_mode => 'DIFFERENTIAL', schedule => '1s');

Algebraic Differentiation (used by department_stats)

For queries composed of scans, filters, joins, and algebraic aggregates (COUNT, SUM, AVG), pg_trickle can derive the IVM delta mathematically. The rules come from the theory of DBSP (Database Stream Processing):

Operator | Delta Rule | Cost
Scan | Read only change buffer rows (not the full table) | O(changes)
Filter (WHERE) | Apply predicate to change rows | O(changes)
Join | Join change rows from one side against the full other side | O(changes × lookup)
Aggregate (COUNT/SUM/AVG) | Add or subtract deltas per group — no rescan | O(affected groups)
Project | Pass through | O(changes)

The total cost is proportional to the number of changes, not the table size. For a million-row table with 10 changes, the delta query touches ~10 rows.
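As a mental model (this is not the literal SQL pg_trickle generates; the change-buffer table and op codes are assumptions), the delta for a per-group COUNT(*) can be computed from the change buffer alone:

-- Conceptual delta for: SELECT department_id, COUNT(*) FROM employees GROUP BY 1
-- Each buffered INSERT contributes +1, each DELETE -1; an UPDATE that moves a
-- row between groups appears as a delete in one group and an insert in another.
SELECT department_id,
       SUM(CASE WHEN op = 'I' THEN 1 ELSE -1 END) AS count_delta
FROM pgtrickle_changes.changes_employees  -- hypothetical change-buffer table
GROUP BY department_id;

The scheduler then MERGEs these per-group adjustments into the stream table, touching only the affected groups.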

Incremental Strategies for Recursive CTEs (used by department_tree)

For recursive CTEs, pg_trickle can't derive an algebraic delta because the recursion references itself. Instead it uses two complementary strategies, chosen automatically based on what changed:

Semi-naive evaluation (for INSERT-only changes):

  1. Differentiate the base case — find the new seed rows
  2. Propagate the delta through the recursive term, iterating until no new rows are produced
  3. The result is only the new rows created by the change — not the whole tree

Delete-and-Rederive (DRed) (for DELETE or UPDATE):

  1. Remove all rows derived from the old fact
  2. Re-derive rows that had the old fact as one of their derivation paths (they may still be reachable via other paths)
  3. Insert the newly derived rows under the new fact

Both strategies are more efficient than full recomputation — they work on the affected portion of the result set, not the entire recursive query. The MERGE only modifies rows that actually changed.
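A conceptual sketch of semi-naive evaluation for an INSERT (the delta relation name is an assumption; pg_trickle's generated SQL will differ): seed the recursion from the inserted rows only, then iterate.

-- Seed from the inserted departments, then iterate the recursive term
-- until no new rows appear; the full tree is never rescanned.
WITH RECURSIVE new_rows AS (
    SELECT d.id, d.parent_id
    FROM inserted_departments d          -- hypothetical delta relation
  UNION ALL
    SELECT c.id, c.parent_id
    FROM departments c
    JOIN new_rows n ON c.parent_id = n.id
)
SELECT * FROM new_rows;

Only these rows are then MERGEd into department_tree.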

When to use which strategy?

You don't choose — pg_trickle detects the strategy automatically based on the query structure:

Query Pattern | Strategy | Performance
Scan + Filter + Join + algebraic Aggregate (COUNT/SUM/AVG) | Algebraic | Excellent — O(changes)
CORR, COVAR_POP/SAMP, REGR_* (12 functions) | Algebraic (Welford running totals) | O(changes) — running totals updated per changed row, no group rescan (v0.10.0+)
Non-recursive CTEs | Algebraic (inlined) | CTE body is differentiated inline
MIN / MAX aggregates | Semi-algebraic | Uses LEAST/GREATEST merge; per-group rescan only when an extremum is deleted
STRING_AGG, ARRAY_AGG, ordered-set aggregates | Group-rescan | Affected groups fully re-aggregated from source
GROUPING SETS / CUBE / ROLLUP | Algebraic (rewritten) | Auto-expanded to UNION ALL of GROUP BY queries; CUBE capped at 64 branches
Recursive CTEs (WITH RECURSIVE), INSERT | Semi-naive evaluation | O(new rows derived from the change)
Recursive CTEs (WITH RECURSIVE), DELETE/UPDATE | Delete-and-Rederive | Re-derives rows with alternative paths; O(affected subgraph) (v0.10.0+)
LATERAL subqueries | Correlated re-evaluation | Only outer rows correlated with changed inner data re-evaluated — O(correlated rows) (v0.10.0+)
Window functions | Partition recompute | Only affected partitions recomputed
ORDER BY … LIMIT N (TopK) | Scoped recomputation | Re-evaluates top-N via MERGE; stores exactly N rows
IMMEDIATE mode queries | In-transaction delta | Same algebraic strategies, applied synchronously via transition tables

FUSE Circuit Breaker (v0.11.0+)

The fuse is a circuit breaker that stops a stream table from processing an unexpectedly large batch of changes — for example from a runaway script or mass-delete migration — without operator review.

-- Arm a fuse: blow when pending changes exceed 50 000 rows
SELECT pgtrickle.alter_stream_table(
    'category_summary',
    fuse           => 'on',
    fuse_ceiling   => 50000
);

-- Check fuse status across all stream tables
SELECT name, fuse_mode, fuse_state, fuse_ceiling, blown_at
FROM pgtrickle.fuse_status();

-- After investigating and deciding to apply the batch:
SELECT pgtrickle.reset_fuse('category_summary', action => 'apply');

-- Or skip the oversized batch entirely and resume from current state:
SELECT pgtrickle.reset_fuse('category_summary', action => 'skip_changes');

reset_fuse supports three actions:

  • 'apply' — process all pending changes and resume normal scheduling.
  • 'reinitialize' — drop and repopulate the stream table from scratch.
  • 'skip_changes' — discard pending changes and resume from the current frontier.

A pgtrickle_alert NOTIFY is emitted when the fuse blows, making it easy to hook into alerting pipelines or LISTEN from application code.

Partitioned Stream Tables (v0.11.0+)

For large stream tables, declare a partition key at creation time so MERGE operations are scoped to only the relevant partitions:

SELECT pgtrickle.create_stream_table(
    name         => 'sales_by_month',
    query        => $$
        SELECT
            DATE_TRUNC('month', sale_date) AS month,
            product_id,
            SUM(amount) AS total_sales
        FROM sales
        GROUP BY 1, 2
    $$,
    schedule     => '1m',
    partition_by => 'month'    -- partition key must be in the SELECT output
);

pg_trickle creates the storage table as PARTITION BY RANGE (month) with a catch-all partition, then on each refresh:

  1. Inspects the delta to find the MIN and MAX of the partition key.
  2. Injects AND st.month BETWEEN min AND max into the MERGE ON clause.
  3. PostgreSQL prunes all partitions outside the range — giving ~100× I/O reduction for a 0.1% change rate on a 10M-row table.
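As an illustration (a hypothetical fragment, not the literal statement pg_trickle emits), the injected predicate lets the planner prune partitions at plan time:

-- sales_delta stands in for the computed delta; the BETWEEN bounds come
-- from the delta's MIN/MAX month values.
MERGE INTO sales_by_month st
USING sales_delta d
   ON st.month = d.month
  AND st.product_id = d.product_id
  AND st.month BETWEEN DATE '2026-01-01' AND DATE '2026-02-01'
WHEN MATCHED THEN
    UPDATE SET total_sales = st.total_sales + d.delta_sales
WHEN NOT MATCHED THEN
    INSERT (month, product_id, total_sales)
    VALUES (d.month, d.product_id, d.delta_sales);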

See SQL_REFERENCE.md for full partitioning options.

IMMEDIATE Mode — Real-Time In-Transaction IVM

The live_headcount stream table created in the refresh-modes section above (refresh_mode => 'IMMEDIATE') shows the behavior in action:

-- After any INSERT/UPDATE/DELETE on employees, live_headcount is already up-to-date:
INSERT INTO employees (name, department_id, salary) VALUES ('Zara', 2, 95000);
SELECT * FROM live_headcount WHERE department_id = 2;  -- headcount already includes Zara

IMMEDIATE mode uses statement-level AFTER triggers with transition tables — no change buffers, no scheduler, no background workers. The stream table is always consistent with the current transaction. Ideal for audit tables, real-time dashboards, and applications that need zero-latency reads.

Multi-Tenant Worker Quotas (v0.11.0+)

In deployments with multiple databases, one busy database can starve others if all dynamic refresh workers are claimed. The per_database_worker_quota GUC prevents this:

-- Limit one performance-critical database to 4 workers (with burst to 6)
ALTER DATABASE analytics  SET pg_trickle.per_database_worker_quota = 4;
-- Allow a reporting database only 2 base workers
ALTER DATABASE reporting  SET pg_trickle.per_database_worker_quota = 2;
-- Apply changes
SELECT pg_reload_conf();

When the cluster has spare capacity (active workers < 80% of max_dynamic_refresh_workers), a database may temporarily burst to 150% of its quota. Burst is reclaimed within 1 scheduler cycle once load rises. Within each dispatch tick, IMMEDIATE-trigger closures are always dispatched first, followed by atomic groups, singletons, and cyclic SCCs.

See CONFIGURATION.md for full quota tuning options.


Clean Up

When you're done experimenting, drop the stream tables. Drop dependents before their sources:

SELECT pgtrickle.drop_stream_table('department_report');
SELECT pgtrickle.drop_stream_table('department_stats');
SELECT pgtrickle.drop_stream_table('department_tree');

DROP TABLE employees;
DROP TABLE departments;

drop_stream_table atomically removes in a single transaction:

  • The storage table (e.g., public.department_stats)
  • CDC triggers on source tables (removed only if no other stream table references the same source)
  • Change buffer tables in pgtrickle_changes
  • Catalog entries in pgtrickle.pgt_stream_tables


Monitoring Quick Reference

pg_trickle ships several built-in monitoring functions and a ready-made Prometheus/Grafana stack. Here are the five most useful functions for day-to-day operations.

Stream Table Status

-- Overview of all stream tables: status, staleness, last refresh time, errors
SELECT name, status, staleness, last_refresh_at, last_error
FROM pgtrickle.pgt_status();

Health Check

-- Run all built-in health checks; returns severity (OK/WARNING/CRITICAL) per check
SELECT check_name, severity, detail FROM pgtrickle.health_check();

Change Buffer Sizes

-- Show CDC buffer row counts per source table — useful for spotting backlogs
SELECT * FROM pgtrickle.change_buffer_sizes();

Dependency Tree

-- Visualize the DAG: which stream tables depend on what
SELECT * FROM pgtrickle.dependency_tree();

Fuse Status

-- Check circuit breaker state for all stream tables (v0.11.0+)
SELECT * FROM pgtrickle.fuse_status();

Prometheus & Grafana

For production monitoring, pg_trickle ships a ready-made observability stack in the monitoring/ directory:

cd monitoring && docker compose up

This starts PostgreSQL + postgres_exporter + Prometheus + Grafana with pre-configured dashboards and alerting rules. Grafana is available at http://localhost:3000 (admin/admin). See the monitoring README for the full list of exported metrics and alert conditions.

Key Prometheus metrics:

Metric | Description
pgtrickle_refresh_total | Cumulative refresh count per table
pgtrickle_refresh_duration_seconds | Last refresh duration per table
pgtrickle_staleness_seconds | Seconds since last successful refresh
pgtrickle_consecutive_errors | Current error streak per table
pgtrickle_cdc_buffer_rows | Pending change buffer rows per source table

Pre-configured alerts: staleness > 5 min, ≥3 consecutive failures, table SUSPENDED, CDC buffer > 1 GB, scheduler down, high refresh duration.
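For example, the staleness alert can be written as a standard Prometheus rule over the metrics above (a sketch; the rules shipped in monitoring/ may differ in naming and labels):

groups:
  - name: pg_trickle
    rules:
      - alert: PgTrickleStreamTableStale
        expr: pgtrickle_staleness_seconds > 300
        for: 1m
        labels:
          severity: warning
        annotations:
          summary: "pg_trickle stream table stale for more than 5 minutes"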


Summary: What You Learned

Concept | What you saw
Stream tables | Tables defined by a SQL query that stay automatically up to date
CDC triggers | Lightweight change capture in the same transaction — no logical replication or polling required
DAG scheduling | Stream tables can depend on other stream tables; refreshes run in topological order, schedules propagate upstream via CALCULATED mode
Algebraic IVM | Delta queries that process only changed rows — O(changes) regardless of table size
Semi-naive / DRed | Incremental strategies for WITH RECURSIVE — INSERT uses semi-naive, DELETE/UPDATE uses Delete-and-Rederive (v0.10.0+)
IMMEDIATE mode | Synchronous in-transaction IVM — stream tables updated within the same transaction as your DML, always consistent
TopK | ORDER BY … LIMIT N queries store exactly N rows, refreshed via scoped recomputation
Diamond consistency | Atomic refresh groups for diamond-shaped dependency graphs via diamond_consistency = 'atomic'
Downstream propagation | A single base table write cascades through an entire chain of stream tables, automatically, in the right order
Trigger-based CDC | Lightweight row-level triggers by default (no WAL configuration needed); optional transition to WAL-based capture via pg_trickle.cdc_mode = 'auto'
Parallel refresh | Independent stream tables refresh concurrently in dynamic background workers via pg_trickle.parallel_refresh_mode = 'on' (v0.4.0+, default off)
auto_backoff | Scheduler automatically stretches the effective interval when refresh cost exceeds 95% of the schedule window, capped at 8× (on by default, v0.10.0+)
PgBouncer compatibility | Set pooler_compatibility_mode => true per stream table to work behind transaction-mode connection poolers (v0.10.0+)
Monitoring | pgt_status(), health_check(), dependency_tree(), pg_stat_stream_tables, and more for freshness, timing, and error history

The key takeaway: you write to base tables — pg_trickle does the rest. Data flows downstream automatically, each layer doing the minimum work proportional to what changed, in dependency order.


Troubleshooting

Stream table is stale / not refreshing

Check the status view first:

SELECT name, status, last_error, last_refresh_at, staleness FROM pgtrickle.pgt_status();

A status of ERROR means the last refresh failed. last_error contains the message. Fix the underlying issue (e.g., a dropped column referenced in the query) then call:

SELECT pgtrickle.refresh_stream_table('your_table');

For a broader health check:

SELECT check_name, severity, detail FROM pgtrickle.health_check();

Change buffer growing large

If a stream table has status = 'PAUSED' or refreshes are falling behind:

SELECT * FROM pgtrickle.change_buffer_sizes();  -- find large buffers

Large buffers are normal under heavy load — auto_backoff slows the schedule to avoid CPU runaway and will self-correct once throughput stabilizes. If a buffer stays large indefinitely, check last_error in pgt_status() for a blocked refresh.

CDC triggers missing after restore / point-in-time recovery

PITR restores the heap table but not the triggers if the extension was installed after the base backup. Verify:

SELECT * FROM pgtrickle.trigger_inventory();  -- expected vs installed triggers

Any missing trigger can be reinstalled with:

SELECT pgtrickle.repair_stream_table('your_table');

Deployment Best Practices

Once you've built your stream tables interactively, you'll want to deploy them reliably — via SQL migration scripts, dbt, or GitOps pipelines.

Kubernetes Deployment (CloudNativePG)

pg_trickle integrates natively with CloudNativePG using Image Volume Extensions (Kubernetes 1.33+). The extension is packaged as a scratch-based OCI image containing only the .so, .control, and .sql files — no custom PostgreSQL image required.

Prerequisites

  • Kubernetes 1.33+ with the ImageVolume feature gate enabled
  • CloudNativePG operator 1.28+
  • pg_trickle extension image pushed to your cluster registry

Quick Start

  1. Deploy the Cluster with the extension mounted as an Image Volume:
# cnpg/cluster-example.yaml (abridged)
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: pg-trickle-demo
spec:
  instances: 3
  imageName: ghcr.io/cloudnative-pg/postgresql:18
  postgresql:
    shared_preload_libraries:
      - pg_trickle
    extensions:
      - name: pg-trickle
        image:
          reference: ghcr.io/<owner>/pg_trickle-ext:<version>
    parameters:
      max_worker_processes: "8"
  2. Create the extension declaratively with a CNPG Database resource:
# cnpg/database-example.yaml
apiVersion: postgresql.cnpg.io/v1
kind: Database
metadata:
  name: pg-trickle-app
spec:
  name: app
  owner: app
  cluster:
    name: pg-trickle-demo
  extensions:
    - name: pg_trickle
  3. Apply both resources:
kubectl apply -f cnpg/cluster-example.yaml
kubectl apply -f cnpg/database-example.yaml

Full example manifests are in the cnpg/ directory.

Health Monitoring

CNPG manages PostgreSQL liveness/readiness probes via its instance manager. For pg_trickle-specific health, use the built-in health check function:

-- Run against the primary or any replica:
SELECT * FROM pgtrickle.health_check();

This returns rows for scheduler status, error/suspended tables, stale tables, CDC buffer growth, WAL slot lag, and worker pool utilization. Integrate it into your monitoring stack:

  • Prometheus: Use the CNPG monitoring integration to expose pgtrickle.health_check() results as custom metrics (a sketch follows below)
  • Kubernetes CronJob: Schedule periodic health checks and alert via your existing alerting pipeline
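A minimal sketch of such a custom query, in the postgres_exporter format that CNPG's monitoring integration accepts (the metric name, label mapping, and severity encoding are assumptions, not shipped configuration):

pgtrickle_health:
  query: |
    SELECT check_name,
           CASE severity WHEN 'OK' THEN 0 WHEN 'WARNING' THEN 1 ELSE 2 END AS level
    FROM pgtrickle.health_check()
  metrics:
    - check_name:
        usage: "LABEL"
    - level:
        usage: "GAUGE"
        description: "Health check severity (0=OK, 1=WARNING, 2=CRITICAL)"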

Probe Configuration

The example manifests include probe settings tuned for pg_trickle workloads:

probes:
  startup:
    periodSeconds: 10
    failureThreshold: 60     # 10 min for shared_preload_libraries init
  liveness:
    periodSeconds: 10
    failureThreshold: 6      # 60s before restart
  readiness:
    type: streaming
    maximumLag: 64Mi         # replicas must be streaming before serving reads

Why readiness: streaming? Stream tables are readable on replicas, but a lagging replica serves stale stream table data. The maximumLag setting ensures replicas are caught up before receiving traffic.

Failover Behavior

When the primary pod fails and CNPG promotes a replica:

  • Scheduler: The new primary starts the pg_trickle scheduler background worker automatically (registered via shared_preload_libraries)
  • Stream tables: All stream table definitions are stored in the pgtrickle.pgt_stream_tables catalog table, which is replicated to all replicas. The promoted replica has the complete catalog.
  • CDC triggers: Trigger definitions are replicated as part of the WAL stream. The new primary's triggers fire normally on new writes.
  • Change buffers: Uncommitted change buffer rows from in-flight transactions on the old primary are lost (standard PostgreSQL behavior). The next refresh cycle detects the gap and performs a FULL refresh to resynchronize.
  • Refresh frontiers: Each stream table's last-refresh frontier is stored in the catalog. If the frontier is ahead of the available change buffer data (due to WAL replay lag), the scheduler falls back to FULL refresh once and then resumes DIFFERENTIAL.

No manual intervention is required after failover.

Idempotent SQL Migrations

Use create_or_replace_stream_table() in your migration scripts. It's safe to run on every deploy:

-- migrations/V003__stream_tables.sql
-- Creates if absent, updates if definition changed, no-op if identical.

SELECT pgtrickle.create_or_replace_stream_table(
    name         => 'employee_salaries',
    query        => 'SELECT e.id, e.name, d.name AS department, e.salary
                     FROM employees e JOIN departments d ON e.department_id = d.id',
    schedule     => '30s',
    refresh_mode => 'DIFFERENTIAL'
);

SELECT pgtrickle.create_or_replace_stream_table(
    name         => 'department_stats',
    query        => 'SELECT department, COUNT(*) AS headcount, AVG(salary) AS avg_salary
                     FROM employee_salaries GROUP BY department',
    schedule     => '30s',
    refresh_mode => 'DIFFERENTIAL'
);

If someone changes the query in a later migration, create_or_replace detects the difference and migrates the storage table in place — no need to drop and recreate.

dbt Integration

With the dbt-pgtrickle package, stream tables are just dbt models with materialized='stream_table':

-- models/department_stats.sql
{{ config(
    materialized='stream_table',
    schedule='30s',
    refresh_mode='DIFFERENTIAL'
) }}

SELECT department, COUNT(*) AS headcount, AVG(salary) AS avg_salary
FROM {{ ref('employee_salaries') }}
GROUP BY department

Every dbt run calls create_or_replace_stream_table() under the hood, so deployments are always idempotent.


Day 2 Operations

Added in v0.20.0 (UX-4).

Once your stream tables are running in production, pg_trickle can monitor itself using its own stream tables — a technique called self-monitoring.

Enabling Self-Monitoring

-- Create all five monitoring stream tables (idempotent, safe to repeat).
SELECT pgtrickle.setup_self_monitoring();

-- Check what was created.
SELECT * FROM pgtrickle.self_monitoring_status();

This creates five stream tables in the pgtrickle schema:

Stream Table | Purpose
df_efficiency_rolling | Rolling-window refresh statistics (replaces manual refresh_efficiency() calls)
df_anomaly_signals | Detects duration spikes, error bursts, mode oscillation
df_threshold_advice | Recommends threshold adjustments based on multi-cycle analysis
df_cdc_buffer_trends | Tracks CDC buffer growth rates per source table
df_scheduling_interference | Detects concurrent refresh overlap patterns

Checking Recommendations

After at least 10–20 refresh cycles have accumulated:

-- Which stream tables have poorly calibrated thresholds?
SELECT pgt_name, current_threshold, recommended_threshold, confidence, reason
FROM pgtrickle.df_threshold_advice
WHERE confidence IN ('HIGH', 'MEDIUM')
  AND abs(recommended_threshold - current_threshold) > 0.05;

-- Are any stream tables experiencing anomalies?
SELECT pgt_name, duration_anomaly, recent_failures
FROM pgtrickle.df_anomaly_signals
WHERE duration_anomaly IS NOT NULL OR recent_failures >= 2;

Automatic Threshold Tuning

To let pg_trickle automatically apply threshold recommendations:

SET pg_trickle.self_monitoring_auto_apply = 'threshold_only';

This applies changes only when confidence is HIGH and the recommended threshold differs by more than 5%. Changes are rate-limited to once per 10 minutes per stream table and logged with initiated_by = 'SELF_MONITOR'.

Visualizing the DAG

-- See the full refresh graph (Mermaid format, paste into any Mermaid renderer).
SELECT pgtrickle.explain_dag();

Dogfooding (self-monitoring) STs appear in green, user STs in blue, suspended STs in red.

Disabling Self-Monitoring

SELECT pgtrickle.teardown_self_monitoring();

This drops all monitoring stream tables. User stream tables are never affected. The control plane continues operating identically without self-monitoring.



Installation Guide

Choose your installation path

Environment | Recommended approach
Local development / quick evaluation | Docker sandbox — docker compose up -d in playground/ gets you a running pg_trickle instance in ~60 seconds with no build step.
Self-hosted Linux server | Pre-built release — download the .tar.gz, copy two directories, CREATE EXTENSION.
macOS development | Build from source using cargo pgrx install, or use the Docker sandbox above.
Kubernetes / CNPG | CloudNativePG — use the Dockerfile.ghcr image or the CNPG extension manifest.
Managed PostgreSQL | Check your provider's extension list. If pg_trickle is not available, the Docker or Kubernetes path is your best option.

Prerequisites

Requirement | Version
PostgreSQL | 18.x

Building from source additionally requires Rust 1.85+ (edition 2024) and pgrx 0.18.x. Pre-built release artifacts only need a running PostgreSQL 18.x instance.


Installing from a Pre-built Release

1. Download the release archive

Download the archive for your platform from the GitHub Releases page:

Platform | Archive
Linux x86_64 | pg_trickle-<ver>-pg18-linux-amd64.tar.gz
macOS Apple Silicon | pg_trickle-<ver>-pg18-macos-arm64.tar.gz
Windows x64 | pg_trickle-<ver>-pg18-windows-amd64.zip

Optionally verify the checksum against SHA256SUMS.txt from the same release:

sha256sum -c SHA256SUMS.txt

2. Extract and install

Linux / macOS:

tar xzf pg_trickle-<ver>-pg18-linux-amd64.tar.gz
cd pg_trickle-<ver>-pg18-linux-amd64

sudo cp lib/*.so  "$(pg_config --pkglibdir)/"
sudo cp extension/*.control extension/*.sql "$(pg_config --sharedir)/extension/"

Windows (PowerShell):

Expand-Archive pg_trickle-<ver>-pg18-windows-amd64.zip -DestinationPath .
cd pg_trickle-<ver>-pg18-windows-amd64

Copy-Item lib\*.dll  "$(pg_config --pkglibdir)\"
Copy-Item extension\* "$(pg_config --sharedir)\extension\"

3. Using with CloudNativePG (Kubernetes)

pg_trickle is distributed as an OCI extension image for use with CloudNativePG Image Volume Extensions.

Requirements: Kubernetes 1.33+, CNPG 1.28+, PostgreSQL 18.

# Pull the extension image
docker pull ghcr.io/trickle-labs/pg_trickle-ext:<ver>

See cnpg/cluster-example.yaml and cnpg/database-example.yaml for complete Cluster and Database deployment examples.

Docker Image

pg_trickle is published as a ready-to-run Docker image on the GitHub Container Registry. PostgreSQL 18.3 and pg_trickle are pre-installed and all sensible GUC defaults (wal_level, shared_preload_libraries, memory, scheduler settings) are baked in — no configuration file editing needed.

docker pull ghcr.io/trickle-labs/pg_trickle:latest

docker run --rm \
  -e POSTGRES_PASSWORD=secret \
  -p 5432:5432 \
  ghcr.io/trickle-labs/pg_trickle:latest

CREATE EXTENSION pg_trickle; runs automatically on the default postgres database at first startup.

Available tags:

Tag | Meaning
latest | Most recent release
pg18 | Floating alias for the latest PostgreSQL 18 build
<version>-pg18.3 | Immutable tag, e.g. 0.13.0-pg18.3

Override any GUC at runtime without rebuilding:

docker run --rm \
  -e POSTGRES_PASSWORD=secret \
  -p 5432:5432 \
  ghcr.io/trickle-labs/pg_trickle:latest \
  -c shared_buffers=2GB -c work_mem=64MB -c effective_cache_size=6GB

For persistent data, mount a volume:

docker run -d \
  --name pg_trickle \
  -e POSTGRES_PASSWORD=secret \
  -p 5432:5432 \
  -v pg_trickle_data:/var/lib/postgresql/data \
  ghcr.io/trickle-labs/pg_trickle:latest

Alternative — manual mount from a release archive: If you prefer to use the stock postgres:18.3 image rather than the pre-built image, extract the extension files from a release archive and mount them:

tar xzf pg_trickle-<ver>-pg18-linux-amd64.tar.gz
cd pg_trickle-<ver>-pg18-linux-amd64

docker run --rm \
  -v $PWD/lib/pg_trickle.so:/usr/lib/postgresql/18/lib/pg_trickle.so:ro \
  -v $PWD/extension/:/tmp/ext/:ro \
  -e POSTGRES_PASSWORD=postgres \
  postgres:18.3 \
  sh -c 'cp /tmp/ext/* /usr/share/postgresql/18/extension/ && \
         exec postgres -c shared_preload_libraries=pg_trickle'

Installing from PGXN

pg_trickle is published on the PostgreSQL Extension Network (PGXN). Installing via PGXN compiles the extension from source, so the Rust toolchain and pgrx are required.

1. Install prerequisites

# Rust toolchain
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
source "$HOME/.cargo/env"

# pgrx build tool
cargo install --locked cargo-pgrx --version 0.18.0
cargo pgrx init --pg18 "$(pg_config --bindir)/pg_config"

2. Install the pgxn client

pip install pgxnclient

3. Install pg_trickle

pgxn install pg_trickle

To install a specific version:

pgxn install pg_trickle=0.10.0

Note: After installation, follow the PostgreSQL Configuration and Extension Installation steps below.


Building from Source

1. Install Rust

curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

2. Install pgrx

cargo install --locked cargo-pgrx --version 0.18.0
cargo pgrx init --pg18 $(pg_config --bindir)/pg_config

3. Build the Extension

# Development build (faster compilation)
cargo pgrx install --pg-config $(pg_config --bindir)/pg_config

# Release build (optimized, for production)
cargo pgrx install --release --pg-config $(pg_config --bindir)/pg_config

# Package for deployment (creates installable artifacts)
cargo pgrx package --pg-config $(pg_config --bindir)/pg_config

PostgreSQL Configuration

Add the following to postgresql.conf before starting PostgreSQL:

# Required — loads the extension shared library at server start
shared_preload_libraries = 'pg_trickle'

# Must accommodate the pg_trickle launcher + one scheduler per database
# with pg_trickle installed + optional parallel refresh workers.
#
# WARNING: when this limit is reached, the launcher silently skips
# databases it cannot spawn a scheduler for and retries every 5 minutes.
# Those databases stop refreshing without any visible error.
# Check PostgreSQL logs for:
#   WARNING:  pg_trickle launcher: could not spawn scheduler for database '...'
#
# Formula:
#   1 (launcher) + N (one scheduler per DB) + max_dynamic_refresh_workers
#   + autovacuum_max_workers + parallel query workers + other extensions
#
# 32 is a safe starting point for most clusters:
max_worker_processes = 32

Note: wal_level = logical and max_replication_slots are not required. The extension uses lightweight row-level triggers for CDC, not logical replication.

Restart PostgreSQL after modifying these settings:

pg_ctl restart -D /path/to/data
# or
systemctl restart postgresql

Extension Installation

Connect to the target database and run:

CREATE EXTENSION pg_trickle;

This creates:

  • The pgtrickle schema with catalog tables and SQL functions
  • The pgtrickle_changes schema for change buffer tables
  • Event triggers for DDL tracking
  • The pgtrickle.pg_stat_stream_tables monitoring view

Verification

After installation, verify everything is working:

-- Check the extension version
SELECT extname, extversion FROM pg_extension WHERE extname = 'pg_trickle';

-- Or get a full status overview (includes version, scheduler state, stream table count)
SELECT * FROM pgtrickle.pgt_status();

Quick functional test

CREATE TABLE test_source (id INT PRIMARY KEY, val TEXT);
INSERT INTO test_source VALUES (1, 'hello');

SELECT pgtrickle.create_stream_table(
    'test_st',
    'SELECT id, val FROM test_source',
    '1m',
    'FULL'
);

SELECT * FROM test_st;
-- Should return: 1 | hello

-- Clean up
SELECT pgtrickle.drop_stream_table('test_st');
DROP TABLE test_source;

Upgrading

To upgrade pg_trickle to a newer version without losing data, follow the three steps below. For comprehensive upgrade instructions, version-specific notes, troubleshooting, and rollback procedures, see docs/UPGRADING.md.

1. Install the new extension files

Follow the same steps as Installing from a Pre-built Release to overwrite the shared library and SQL files with the new version. You do not need to drop the extension from your databases first.

Linux / macOS:

tar xzf pg_trickle-<new-ver>-pg18-linux-amd64.tar.gz
cd pg_trickle-<new-ver>-pg18-linux-amd64

sudo cp lib/*.so  "$(pg_config --pkglibdir)/"
sudo cp extension/*.control extension/*.sql "$(pg_config --sharedir)/extension/"

2. Restart PostgreSQL (when required)

If the shared library ABI has changed, restart PostgreSQL before proceeding so the new .so/.dll is loaded. The release notes for each version will call this out explicitly when a restart is required.

pg_ctl restart -D /path/to/data
# or
systemctl restart postgresql

3. Apply the schema migration in each database

Connect to every database where pg_trickle is installed and run:

-- Upgrade to the latest bundled version
ALTER EXTENSION pg_trickle UPDATE;

-- Or upgrade to a specific version
ALTER EXTENSION pg_trickle UPDATE TO '<new-version>';

PostgreSQL uses the versioned SQL migration scripts bundled with the release (e.g. pg_trickle--0.2.3--0.3.0.sql, pg_trickle--0.3.0--0.4.0.sql) to apply catalog and SQL-surface changes. PostgreSQL automatically chains these scripts when you run ALTER EXTENSION pg_trickle UPDATE. The command is a no-op when no migration script is needed for a given release.

You can confirm the active version afterwards:

SELECT extversion FROM pg_extension WHERE extname = 'pg_trickle';

Coming soon: A future release will include a helper function (pgtrickle.upgrade()) that automates steps 2–3 across all databases in the cluster and validates catalog integrity after the migration. Until then, the manual steps above are the supported upgrade path.


Uninstallation

-- Drop all stream tables first
SELECT pgtrickle.drop_stream_table(pgt_schema || '.' || pgt_name)
FROM pgtrickle.pgt_stream_tables;

-- Drop the extension
DROP EXTENSION pg_trickle CASCADE;

Remove pg_trickle from shared_preload_libraries in postgresql.conf and restart PostgreSQL.

Troubleshooting

Unit tests crash on macOS 26+ (symbol not found in flat namespace)

macOS 26 (Tahoe) changed dyld to eagerly resolve all flat-namespace symbols at binary load time. pgrx extensions reference PostgreSQL server-internal symbols (e.g. CacheMemoryContext, SPI_connect) via the -Wl,-undefined,dynamic_lookup linker flag. These symbols are normally provided by the postgres executable when the extension is loaded as a shared library — but for cargo test --lib there is no postgres process, so the test binary aborts immediately:

dyld[66617]: symbol not found in flat namespace '_CacheMemoryContext'

This affects local development only — integration tests, E2E tests, and the extension itself running inside PostgreSQL are unaffected.

The fix is built into the just test-unit recipe. It automatically:

  1. Compiles a tiny C stub library (scripts/pg_stub.c → target/libpg_stub.dylib) that provides NULL/no-op definitions for the ~28 PostgreSQL symbols.
  2. Compiles the test binary with --no-run.
  3. Runs the binary with DYLD_INSERT_LIBRARIES pointing to the stub.

The stub is only built on macOS 26+. On Linux or older macOS, just test-unit runs cargo test --lib directly with no changes.

Note: The stub symbols are never called — unit tests exercise pure Rust logic only. If a test accidentally calls a PostgreSQL function it will crash with a NULL dereference (the desired fail-fast behavior).

If you run unit tests without just (e.g. directly via cargo test --lib), you can use the wrapper script instead:

./scripts/run_unit_tests.sh pg18

# With test name filter:
./scripts/run_unit_tests.sh pg18 -- test_parse_basic

Extension fails to load

Ensure shared_preload_libraries = 'pg_trickle' is set and PostgreSQL has been restarted (not just reloaded). The extension requires shared memory initialization at startup.

Background worker not starting

Check that max_worker_processes is high enough. In sequential mode (default) pg_trickle needs one slot per database with stream tables. With parallel refresh enabled (pg_trickle.parallel_refresh_mode = 'on') it additionally needs max_dynamic_refresh_workers slots (default 4) shared across all databases.
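A worked sizing example, assuming three databases with stream tables and the default of four dynamic workers (the final value is illustrative headroom, not a requirement):

# 1 launcher + 3 schedulers (one per database) + 4 dynamic refresh workers
# = 8 slots for pg_trickle alone; leave room for autovacuum, parallel query,
# and other extensions on top of that.
max_worker_processes = 16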

See the worker-budget formula in CONFIGURATION.md for sizing guidance.

Check logs for details

The extension logs at various levels. Enable debug logging for more detail:

SET client_min_messages TO debug1;

Backup & Restore

Backing Up a pg_trickle-Enabled Database

pg_trickle uses two schemas that must be included in any backup alongside your application schema:

  • pgtrickle — stream table catalog, dependency graph, and refresh history
  • pgtrickle_changes — change-buffer tables (one per CDC-enabled source)

Include both schemas explicitly when using pg_dump:

pg_dump \
  --schema=public \
  --schema=pgtrickle \
  --schema=pgtrickle_changes \
  mydb > mydb_backup.sql

If your application data lives in schemas other than public, include those schemas too. Omitting pgtrickle_changes means any unprocessed CDC rows are lost on restore, forcing the next differential refresh to fall back to FULL mode.

Restoring

Restore normally:

psql -d mydb_restored -f mydb_backup.sql

After restore, run the health check to validate catalog integrity:

SELECT * FROM pgtrickle.health_check();

OID Re-assignment After Restore

Change-buffer tables in pgtrickle_changes are named by storage-table OID (e.g. changes_12345). OIDs may differ after restore if tables were created in a different order. Run pgtrickle.repair_stream_table() on each stream table immediately after restore to reconcile any OID mismatches:

SELECT pgtrickle.repair_stream_table(pgt_schema || '.' || pgt_name)
FROM pgtrickle.pgt_stream_tables;


Best-Practice Patterns for pg_trickle

This guide covers common data modeling patterns and recommended configurations for pg_trickle stream tables. Each pattern includes worked SQL examples, anti-patterns to avoid, and refresh mode recommendations.

Version: v0.14.0+. Some features require recent versions — check SQL_REFERENCE.md for per-feature availability.




Pattern 1: Bronze / Silver / Gold Materialization

A multi-layer approach where raw data flows through progressively refined stream tables, similar to a medallion architecture.

Architecture

  [raw_events]          ← Bronze: raw ingest table (regular table)
       ↓
  [events_cleaned]      ← Silver: filtered, deduplicated, typed
       ↓
  [events_aggregated]   ← Gold: business-level aggregates

SQL Example

-- Bronze: regular PostgreSQL table (source of truth)
CREATE TABLE raw_events (
    event_id    BIGSERIAL PRIMARY KEY,
    user_id     INT NOT NULL,
    event_type  TEXT NOT NULL,
    payload     JSONB,
    received_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- Silver: cleaned and deduplicated events
SELECT pgtrickle.create_stream_table(
    'events_cleaned',
    $$SELECT DISTINCT ON (event_id)
        event_id,
        user_id,
        event_type,
        (payload->>'amount')::numeric AS amount,
        received_at
      FROM raw_events
      WHERE event_type IN ('purchase', 'refund', 'subscription')$$,
    schedule => '5s',
    refresh_mode => 'DIFFERENTIAL'
);

-- Gold: per-user purchase summary
SELECT pgtrickle.create_stream_table(
    'user_purchase_summary',
    $$SELECT user_id,
             COUNT(*) AS total_purchases,
             SUM(amount) AS total_spent,
             AVG(amount) AS avg_order
      FROM events_cleaned
      WHERE event_type = 'purchase'
      GROUP BY user_id$$,
    schedule => 'calculated',
    refresh_mode => 'DIFFERENTIAL'
);

Layer | Refresh Mode | Schedule | Tier
Silver | DIFFERENTIAL | 5s – 30s | hot
Gold | DIFFERENTIAL | calculated | hot

Anti-Patterns

  • Don't use FULL refresh for Silver. With frequent small inserts, DIFFERENTIAL is 10–100x faster.
  • Don't skip the Silver layer. Joining raw tables directly in Gold queries produces wider joins and slower deltas.
  • Don't use IMMEDIATE mode for Gold. Aggregate maintenance on every DML row is expensive — batched DIFFERENTIAL is more efficient.

When NOT to use this pattern

  • Your data never changes after insert — a single stream table is simpler.
  • The Bronze source is external (foreign table, dblink) — CDC triggers cannot be attached to foreign tables; use WAL CDC mode or FULL refresh.
  • You have fewer than ~10,000 rows in Silver — the overhead of three layers is not justified; use one or two tables instead.

Pattern 2: Event Sourcing with Stream Tables

Use stream tables as projections of an append-only event log. The source table is the event store; stream tables materialize different read models.

SQL Example

-- Event store (append-only source)
CREATE TABLE events (
    event_id    BIGSERIAL PRIMARY KEY,
    aggregate_id UUID NOT NULL,
    event_type   TEXT NOT NULL,
    payload      JSONB NOT NULL,
    created_at   TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- Projection 1: Current state per aggregate
SELECT pgtrickle.create_stream_table(
    'aggregate_state',
    $$SELECT DISTINCT ON (aggregate_id)
        aggregate_id,
        event_type AS last_event,
        payload AS current_state,
        created_at AS last_updated
      FROM events
      ORDER BY aggregate_id, created_at DESC$$,
    schedule => '2s',
    refresh_mode => 'DIFFERENTIAL'
);

-- Projection 2: Event counts by type per hour
SELECT pgtrickle.create_stream_table(
    'hourly_event_counts',
    $$SELECT date_trunc('hour', created_at) AS hour,
             event_type,
             COUNT(*) AS event_count
      FROM events
      GROUP BY 1, 2$$,
    schedule => '10s',
    refresh_mode => 'DIFFERENTIAL'
);

Projection | Refresh Mode | Why
Current state | DIFFERENTIAL | Small delta per cycle; DISTINCT ON supported
Hourly counts | DIFFERENTIAL | Algebraic aggregate (COUNT), efficient delta
String aggregations | AUTO | GROUP_RESCAN aggs may benefit from FULL

Anti-Patterns

  • Don't DELETE from the event store. pg_trickle tracks changes via triggers; mixing append and delete on the source creates unnecessary delta complexity. Archive old events to a separate table.
  • Don't use append_only => true with UPDATE/DELETE patterns. The append_only flag skips DELETE tracking in the change buffer — only use it when the source truly never updates or deletes.

When NOT to use this pattern

  • Your event log is consumed and processed in real time by application code — a stream table adds latency without benefit.
  • You need strict per-event ordering guarantees within a transaction — use IMMEDIATE mode with a single-row projection instead.
  • Your events are multi-gigabyte payloads — stream tables replicate the whole row; store only metadata in the event log, not the payload.

Pattern 3: Slowly Changing Dimensions (SCD)

SCD Type 1: Overwrite

The stream table always reflects the current state. Source updates overwrite previous values.

-- Source: customer dimension table (updated in place)
CREATE TABLE customers (
    customer_id INT PRIMARY KEY,
    name        TEXT NOT NULL,
    email       TEXT,
    tier        TEXT DEFAULT 'standard',
    updated_at  TIMESTAMPTZ DEFAULT now()
);

-- SCD-1: current customer state enriched with order stats
SELECT pgtrickle.create_stream_table(
    'customer_360',
    $$SELECT c.customer_id,
             c.name,
             c.email,
             c.tier,
             COUNT(o.id) AS total_orders,
             COALESCE(SUM(o.amount), 0) AS lifetime_value
      FROM customers c
      LEFT JOIN orders o ON o.customer_id = c.customer_id
      GROUP BY c.customer_id, c.name, c.email, c.tier$$,
    schedule => '30s',
    refresh_mode => 'DIFFERENTIAL'
);

SCD Type 2: History Tracking

For SCD-2, maintain a history table with valid-from/valid-to ranges. The stream table provides the current snapshot.

-- Source: customer history with validity ranges
CREATE TABLE customer_history (
    customer_id INT NOT NULL,
    name        TEXT NOT NULL,
    tier        TEXT NOT NULL,
    valid_from  TIMESTAMPTZ NOT NULL,
    valid_to    TIMESTAMPTZ,  -- NULL = current
    PRIMARY KEY (customer_id, valid_from)
);

-- Current active records only
SELECT pgtrickle.create_stream_table(
    'customers_current',
    $$SELECT customer_id, name, tier, valid_from
      FROM customer_history
      WHERE valid_to IS NULL$$,
    schedule => '10s',
    refresh_mode => 'DIFFERENTIAL'
);

Anti-Patterns

  • Don't use FULL refresh for SCD-1 with large dimension tables. Customer tables with millions of rows but few changes per cycle are ideal for DIFFERENTIAL.
  • Don't forget to index valid_to IS NULL for SCD-2 sources. Without it, the delta scan touches all historical rows (see the index sketch below).
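A partial index that covers only current rows keeps that delta scan cheap (plain PostgreSQL, nothing pg_trickle-specific):

CREATE INDEX customer_history_current_idx
    ON customer_history (customer_id)
    WHERE valid_to IS NULL;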

When NOT to use this pattern

  • You already have a purpose-built slowly-changing-dimension ETL tool (e.g. dbt snapshots) — pg_trickle's SCD support is complementary, not a replacement, and duplicate ownership creates confusion.
  • Your dimension table changes every row on every load — DIFFERENTIAL offers no benefit; use FULL refresh or rethink the source update pattern.
  • You need Type 3 (add-a-column) or Type 6 SCD — those require schema evolution that pg_trickle does not automate today.

Pattern 4: High-Fan-Out Topology

When a single source table feeds many downstream stream tables.

Architecture

                    [orders]
                   ↙  ↓  ↓  ↘
  [daily_totals] [by_region] [by_product] [top_customers]

SQL Example

-- Single source feeding multiple views
CREATE TABLE orders (
    id          SERIAL PRIMARY KEY,
    customer_id INT NOT NULL,
    region      TEXT NOT NULL,
    product_id  INT NOT NULL,
    amount      NUMERIC(10,2) NOT NULL,
    order_date  DATE NOT NULL DEFAULT CURRENT_DATE
);

-- Fan-out: 4 stream tables on 1 source
SELECT pgtrickle.create_stream_table('daily_totals',
    'SELECT order_date, SUM(amount) AS daily_total, COUNT(*) AS order_count
     FROM orders GROUP BY order_date',
    schedule => '5s', refresh_mode => 'DIFFERENTIAL');

SELECT pgtrickle.create_stream_table('by_region',
    'SELECT region, SUM(amount) AS total, COUNT(*) AS cnt
     FROM orders GROUP BY region',
    schedule => '5s', refresh_mode => 'DIFFERENTIAL');

SELECT pgtrickle.create_stream_table('by_product',
    'SELECT product_id, SUM(amount) AS total, COUNT(*) AS cnt
     FROM orders GROUP BY product_id',
    schedule => '5s', refresh_mode => 'DIFFERENTIAL');

SELECT pgtrickle.create_stream_table('top_customers',
    'SELECT customer_id, SUM(amount) AS lifetime_value, COUNT(*) AS order_count
     FROM orders GROUP BY customer_id',
    schedule => '10s', refresh_mode => 'DIFFERENTIAL');
  • All fan-out targets share the same source change buffer — CDC overhead is paid once regardless of how many stream tables read from orders.
  • Use schedule => 'calculated' on downstream STs when they chain from other stream tables.
  • Consider pg_trickle.max_concurrent_refreshes if fan-out exceeds 8 (default: 4 concurrent refreshes).

Anti-Patterns

  • Don't use IMMEDIATE mode on high-fan-out sources. Each DML row triggers N refreshes (one per downstream ST). Use DIFFERENTIAL with a batched schedule instead.
  • Don't set different schedules on STs that should be consistent. If daily_totals and by_region must agree, give them the same schedule or use diamond_consistency => 'atomic'.

When NOT to use this pattern

  • You only have one or two downstream stream tables — the fan-out pattern adds planning overhead that isn't justified below ~4 targets.
  • Downstream queries have incompatible refresh modes (e.g. one needs IMMEDIATE, another needs FULL) — prefer separate source tables.
  • All downstream STs will always be queried together — a single wider stream table may be simpler and faster.

Pattern 5: Real-Time Dashboards

For dashboards that need sub-second refresh latency.

SQL Example

-- Live order monitor (sub-second freshness)
SELECT pgtrickle.create_stream_table(
    'order_monitor',
    $$SELECT
        date_trunc('minute', order_date) AS minute,
        region,
        COUNT(*) AS orders,
        SUM(amount) AS revenue
      FROM orders
      WHERE order_date >= CURRENT_DATE
      GROUP BY 1, 2$$,
    schedule => '1s',
    refresh_mode => 'DIFFERENTIAL'
);

-- For truly real-time needs, use IMMEDIATE mode (triggers on each DML)
SELECT pgtrickle.create_stream_table(
    'live_counter',
    $$SELECT region, COUNT(*) AS cnt, SUM(amount) AS total
      FROM orders GROUP BY region$$,
    refresh_mode => 'IMMEDIATE'
);

When to Use IMMEDIATE vs Scheduled DIFFERENTIAL

Scenario | Mode | Why
Dashboard polls every 1s | 1s | Batched delta amortizes overhead
GraphQL subscription, < 100ms | IMMEDIATE | Triggers fire synchronously per DML
Aggregate with GROUP_RESCAN | 5s+ | Avoid per-row full rescans
High write throughput (>1K/s) | 2s–5s | IMMEDIATE adds latency to each INSERT

Anti-Patterns

  • Don't use IMMEDIATE for complex joins. Each INSERT/UPDATE/DELETE fires the full DVM delta SQL synchronously — multi-table joins in IMMEDIATE mode add significant latency to writes.
  • Don't forget pooler_compatibility_mode with PgBouncer. Transaction pooling drops temp tables between transactions; enable this flag to avoid stale PREPARE statements.

When NOT to use this pattern

  • The data source itself is the bottleneck (slow sensors, infrequent API polling) — a sub-second schedule on a stream table that changes once a minute burns CPU for nothing.
  • You need consistency across several related tiles — schedule them together or use a single wider query rather than sub-second individual refreshes that can transiently disagree.
  • Write throughput exceeds ~5,000 rows/s — IMMEDIATE mode adds latency to every write; profile with pg_trickle.latency_percentiles() first.

Pattern 6: Tiered Refresh Strategy

Assign refresh importance tiers to control scheduling priority.

-- Hot: real-time operational dashboard
SELECT pgtrickle.create_stream_table('live_metrics', ...);
SELECT pgtrickle.alter_stream_table('live_metrics', tier => 'hot');

-- Warm: hourly business reports (2x interval multiplier)
SELECT pgtrickle.create_stream_table('hourly_report', ...,
    schedule => '1m');
SELECT pgtrickle.alter_stream_table('hourly_report', tier => 'warm');

-- Cold: daily analytics (10x interval multiplier)
SELECT pgtrickle.create_stream_table('daily_analytics', ...,
    schedule => '5m');
SELECT pgtrickle.alter_stream_table('daily_analytics', tier => 'cold');

-- Frozen: archive/audit (skip refresh entirely)
SELECT pgtrickle.alter_stream_table('audit_log_summary', tier => 'frozen');

Tier Multipliers

Tier | Schedule Multiplier | Use Case
hot | 1x | Operational dashboards, alerts
warm | 2x | Hourly reports, batch pipelines
cold | 10x | Daily analytics, low-priority STs
frozen | skip | Paused/archived, manual refresh

When NOT to use this pattern

  • All your stream tables are equally critical — don't introduce tier complexity just to have tiers; use a flat schedule instead.
  • Your scheduler runs with a single worker — tiering helps multi-worker scheduling; it has no effect when max_concurrent_refreshes = 1.
  • Tier multipliers change too frequently — tier is a static property; if freshness requirements change continuously, use SLA scheduling (pg_trickle.sla_scheduling) instead.

General Guidelines

Choosing a Refresh Mode

Scenario | Recommended Mode
Source has < 5% change ratio per cycle | DIFFERENTIAL
Source changes > 50% per cycle | FULL
Query is a simple filter/projection | DIFFERENTIAL
Query has GROUP_RESCAN aggregates (MIN, MAX) | AUTO
Query joins 4+ tables | DIFFERENTIAL
Target table < 1000 rows | FULL
Need per-row latency guarantee | IMMEDIATE

Use pgtrickle.recommend_refresh_mode() (v0.14.0+) for automated analysis:

SELECT pgt_name, recommended_mode, confidence, reason
FROM pgtrickle.recommend_refresh_mode();

Monitoring Checklist

-- Check refresh efficiency across all stream tables
SELECT pgt_name, refresh_mode, diff_speedup, avg_change_ratio
FROM pgtrickle.refresh_efficiency()
ORDER BY total_refreshes DESC;

-- Find stream tables that might benefit from mode change
SELECT pgt_name, current_mode, recommended_mode, reason
FROM pgtrickle.recommend_refresh_mode()
WHERE recommended_mode != 'KEEP';

-- Check for error states
SELECT pgt_name, status, last_error_message
FROM pgtrickle.stream_tables_info
WHERE status IN ('ERROR', 'SUSPENDED');

-- Export definitions for backup
SELECT pgtrickle.export_definition(pgt_schema || '.' || pgt_name)
FROM pgtrickle.pgt_stream_tables;

Common Mistakes

  1. Using FULL refresh by default. Start with DIFFERENTIAL — it's correct for 80%+ of workloads. Switch to FULL only when recommend_refresh_mode() suggests it.

  2. Over-scheduling. A 1-second schedule on a table with 1-hour change cycles wastes CPU. Match the schedule to actual data arrival rate.

  3. Ignoring append_only. If the source table is truly append-only (no UPDATEs, no DELETEs), set append_only => true to halve change buffer writes (see the sketch after this list).

  4. Not using calculated schedule for chained STs. When ST-B reads from ST-A, use schedule => 'calculated' on ST-B to avoid unnecessary refreshes. The scheduler automatically propagates ST-A changes downstream.

  5. Mixing IMMEDIATE and complex joins. IMMEDIATE mode fires delta SQL on every DML — an 8-table join in IMMEDIATE mode adds 50–200ms to each INSERT. Use scheduled DIFFERENTIAL for complex queries.
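A sketch of the fix for mistake 3, reusing Pattern 2's append-only event store (append_only is the parameter named above; its placement among the other named arguments is assumed):

SELECT pgtrickle.create_stream_table(
    'hourly_event_counts',
    $$SELECT date_trunc('hour', created_at) AS hour,
             event_type,
             COUNT(*) AS event_count
      FROM events
      GROUP BY 1, 2$$,
    schedule     => '10s',
    refresh_mode => 'DIFFERENTIAL',
    append_only  => true   -- source table never UPDATEs or DELETEs
);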


Replica Bootstrap & PITR Alignment (v0.27.0)

When bootstrapping a new replica or performing point-in-time recovery, stream tables need special handling because their state is derived from source data at a specific frontier (LSN + timestamp).

The problem

After a pg_basebackup or logical restore, stream table rows are present but their frontiers may be stale. The next refresh would trigger a FULL re-scan of all source data, which is expensive for large stream tables.

Solution: use snapshots for replica bootstrap

-- On the primary: export the stream table state
SELECT pgtrickle.snapshot_stream_table(
    'public.orders_agg',
    'pgtrickle.orders_agg_replica_init'
);

-- Dump only the snapshot table to the replica
pg_dump -t 'pgtrickle.orders_agg_replica_init' mydb | psql replica_db

-- On the replica: restore and align the frontier
SELECT pgtrickle.restore_from_snapshot(
    'public.orders_agg',
    'pgtrickle.orders_agg_replica_init'
);

-- Clean up the bootstrap snapshot
SELECT pgtrickle.drop_snapshot('pgtrickle.orders_agg_replica_init');

After restore_from_snapshot(), the frontier is set to the snapshot's frontier and the next refresh is DIFFERENTIAL — only changes after the snapshot creation time are fetched.

PITR alignment workflow

When performing PITR to a specific LSN:

  1. Take a snapshot immediately before the target LSN
  2. Restore the database to the target LSN using pg_basebackup + WAL replay
  3. Run restore_from_snapshot() on each stream table to align frontiers
-- Step 1: snapshot all stream tables (before PITR)
SELECT pgtrickle.snapshot_stream_table(
    pgt_schema || '.' || pgt_name,
    'pgtrickle.pitr_snapshot_' || pgt_name || '_' || extract(epoch from now())::bigint
)
FROM pgtrickle.pgt_stream_tables
WHERE status = 'ACTIVE';

-- Step 3 (after PITR): restore all snapshots
SELECT pgtrickle.restore_from_snapshot(
    pgt_schema || '.' || pgt_name,
    'pgtrickle.pitr_snapshot_' || pgt_name || '_<epoch>'
)
FROM pgtrickle.pgt_stream_tables;

Performance: Restoring a 1M-row stream table from a snapshot completes in < 5 seconds (bulk INSERT from local table). The frontier alignment ensures the first differential refresh fetches only new changes, not all rows.


Pattern 7: Transactional Outbox (v0.28.0)

Requires: v0.28.0+

The transactional outbox pattern reliably publishes stream table deltas to external consumers — even if the consumer is temporarily offline. Each time the stream table refreshes, pg_trickle writes a header row to a dedicated outbox table. Consumers read from the outbox via poll_outbox(), process the delta, then commit their offset.

Use this pattern when:

  • You need to publish stream table changes to a message queue, webhook, or another service
  • Consumers need at-least-once delivery guarantees
  • Multiple independent consumers need to read the same stream independently
  • You want replay / seek-to-offset for recovery

Architecture

orders (base table)
  └─→ orders_agg (stream table)
        └─→ pgt_outbox_orders_agg (outbox table)
              ├─→ Consumer group A: analytics pipeline
              └─→ Consumer group B: notification service

SQL Example

-- 1. Create the stream table
SELECT pgtrickle.create_stream_table(
    'public.orders_agg',
    'SELECT customer_id, SUM(amount) AS total, COUNT(*) AS cnt FROM orders GROUP BY customer_id',
    schedule_seconds => 5
);

-- 2. Enable the outbox
SELECT pgtrickle.enable_outbox('public.orders_agg', retention_hours => 48);

-- 3. Create consumer groups
SELECT pgtrickle.create_consumer_group('analytics', 'public.orders_agg', auto_offset_reset => 'latest');
SELECT pgtrickle.create_consumer_group('notifications', 'public.orders_agg', auto_offset_reset => 'latest');

-- 4. Consumer A polls and processes
DO $$
DECLARE
    r RECORD;
    last_id BIGINT := 0;
BEGIN
    FOR r IN
        SELECT * FROM pgtrickle.poll_outbox('analytics', 'worker-1', batch_size => 50)
    LOOP
        -- process r.payload (JSONB with inserted/deleted row arrays)
        last_id := r.outbox_id;
    END LOOP;

    IF last_id > 0 THEN
        PERFORM pgtrickle.commit_offset('analytics', 'worker-1', last_id);
    END IF;
END;
$$;

-- 5. Check consumer lag
SELECT * FROM pgtrickle.consumer_lag('analytics');

Consumer Group Tips

Scenario | Setting
Multiple competing workers sharing one offset | Put all workers in the same group
Independent pipelines that each need the full stream | Create a separate group per pipeline
Replay from the beginning | seek_offset('my_group', 'worker-1', 0)
Resume after a crash without re-processing | Commit offsets frequently; use extend_lease() for long processing

Configuration (postgresql.conf):

pg_trickle.outbox_enabled = true
pg_trickle.outbox_retention_hours = 48        # keep 2 days of history
pg_trickle.outbox_skip_empty_delta = true     # don't write rows for no-op refreshes
pg_trickle.outbox_force_retention = true      # keep rows until all groups commit
pg_trickle.consumer_dead_threshold_hours = 24 # mark workers dead after 24h silence
pg_trickle.consumer_cleanup_enabled = true

Anti-Patterns

  • Polling without committing: If commit_offset() is never called, the lease expires and the rows are re-delivered. Always commit after successful processing.
  • One group per worker: Use one group and multiple named consumers within it for competing-consumer parallelism. Use multiple groups only when pipelines are truly independent.
  • Long processing without heartbeats: Call consumer_heartbeat() every 10–15 seconds for long-running processing to avoid being marked dead.

When NOT to use this pattern

  • You only need to expose stream table changes to a single application that reads directly from PostgreSQL — a NOTIFY/LISTEN trigger or change table is simpler than a full outbox (sketched after this list).
  • Delivery guarantees are not required (analytics, dashboards) — the overhead of consumer groups and offset tracking is not justified.
  • Your stream table refreshes every few seconds and consumers can tolerate a few seconds of lag — just poll the stream table directly.
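
For the first case, a minimal sketch of the NOTIFY alternative; the function, trigger, and channel names here are illustrative and not part of pg_trickle:

CREATE OR REPLACE FUNCTION orders_agg_notify_fn() RETURNS trigger AS $$
BEGIN
    -- Tell listeners that the stream table just changed; they re-read it.
    PERFORM pg_notify('orders_agg_changed', '');
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER orders_agg_notify
AFTER INSERT OR UPDATE OR DELETE ON public.orders_agg
FOR EACH STATEMENT EXECUTE FUNCTION orders_agg_notify_fn();

-- Application sessions subscribe with: LISTEN orders_agg_changed;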

Pattern 8: Transactional Inbox (v0.28.0)

Requires: v0.28.0+

The transactional inbox pattern provides a reliable, idempotent message receiver inside PostgreSQL. External producers write events to the inbox table; pg_trickle maintains stream tables that give you live views of pending, failed, and processed messages — all updated incrementally.

Use this pattern when:

  • You receive events from external systems and need to process them exactly-once
  • You want automatic dead-letter handling for failed messages
  • Multiple workers need to process different aggregates without stepping on each other
  • You need per-aggregate ordering guarantees

Architecture

external producer (Kafka / webhook / custom application)
  └─→ pgtrickle.orders_inbox (raw event table)
        ├─→ orders_inbox_pending  (stream table: awaiting processing)
        ├─→ orders_inbox_dlq      (stream table: failed messages)
        └─→ orders_inbox_stats    (stream table: event counts by type)

SQL Example

-- 1. Create the inbox
SELECT pgtrickle.create_inbox(
    'orders_inbox',
    schema           => 'pgtrickle',
    max_retries      => 3,
    with_dead_letter => true,
    with_stats       => true,
    schedule_seconds => 5
);

-- 2. External system inserts a message
INSERT INTO pgtrickle.orders_inbox (event_id, event_type, aggregate_id, payload)
VALUES (
    gen_random_uuid()::text,
    'order.placed',
    'customer-123',
    '{"order_id": 42, "amount": 99.50}'::jsonb
);

-- 3. Worker polls pending messages and processes
UPDATE pgtrickle.orders_inbox
SET processed_at = now()
WHERE event_id = '<event_id>'
  AND processed_at IS NULL;

-- 4. Check inbox health
SELECT pgtrickle.inbox_health('orders_inbox');

-- 5. Replay failed messages
SELECT pgtrickle.replay_inbox_messages(
    'orders_inbox',
    ARRAY['event-id-1', 'event-id-2']
);

Per-Aggregate Ordering

When messages for the same customer / entity must be processed in sequence:

-- Enable ordering: only surface the next unprocessed message per aggregate
SELECT pgtrickle.enable_inbox_ordering(
    'orders_inbox',
    aggregate_id_col => 'aggregate_id',
    seq_col          => 'event_sequence'
);

-- Workers now read from next_orders_inbox (one row per aggregate)
SELECT * FROM pgtrickle.next_orders_inbox;

Multi-Worker Partitioning

Scale horizontally without external coordination:

-- Worker 0 of 4 handles its share of aggregates
SELECT * FROM pgtrickle.orders_inbox_pending
WHERE pgtrickle.inbox_is_my_partition(aggregate_id, 0, 4);

Configuration (postgresql.conf):

pg_trickle.inbox_enabled = true
pg_trickle.inbox_processed_retention_hours = 72   # keep 3 days of processed msgs
pg_trickle.inbox_dlq_retention_hours = 0          # keep DLQ forever for forensics
pg_trickle.inbox_dlq_alert_max_per_refresh = 10   # alert on DLQ growth

Anti-Patterns

  • Not marking messages as processed: The _pending stream table will keep growing. Always set processed_at = now() after successful processing.
  • Ignoring the DLQ: Monitor orders_inbox_dlq and replay or investigate failed messages regularly. Use inbox_health() in your alerting pipeline.
  • Skipping idempotency: The inbox uses event_id for deduplication. Producers must supply stable, unique event_id values — typically a UUID derived from the source event.

When NOT to use this pattern

  • Message volume is very high (>10,000/s) — the inbox table becomes a hot bottleneck; consider a dedicated message queue (Kafka, NATS) fronting a batch INSERT into the inbox.
  • Processing is purely stateless and idempotency is guaranteed by the producer — writing to an inbox and querying _pending adds latency without benefit over direct INSERT + trigger.
  • The event source already provides at-least-once with dedup — a second layer of dedup in the inbox wastes storage.

See also: Use Cases · Performance Cookbook · SQL Reference · Tutorials: What Happens on INSERT · Outbox Pattern · Inbox Pattern

Mental Model: How pg_trickle Works

This document explains the core concepts behind pg_trickle's differential view maintenance engine for developers who know SQL but have not studied incremental view maintenance (IVM) theory. Analogies come before formulas.


1. The Problem: Expensive Full Recomputation

A standard PostgreSQL materialized view is a snapshot. When the source data changes, you call REFRESH MATERIALIZED VIEW and PostgreSQL re-runs the entire defining query — scanning every row in every source table, applying all the joins, filtering, and aggregating — every time.

For a billion-row orders table with 100 new orders since the last refresh, this is the equivalent of re-reading an entire library to update one paragraph.

pg_trickle solves this with differential maintenance: compute only the change in the view output caused by the change in the inputs.


2. The Key Insight: Deltas Are Just Rows

Think of a change to a table as a signed multiset of rows:

  • +1 for an inserted row
  • -1 for a deleted row
  • -1 for the old version of an updated row, +1 for the new version

If the source table T changes by a delta ΔT, and the view V = f(T), then the view output changes by ΔV = f(T + ΔT) - f(T).

For many SQL operators, ΔV can be computed without reading T at all — only ΔT is needed. For a simple SELECT * FROM orders WHERE status = 'active':

  • ΔV = { new rows in ΔT where status = 'active' } - { deleted rows in ΔT where status = 'active' }

For a COUNT(*) aggregate, the delta is even simpler:

  • Δcount = (inserted active rows) - (deleted active rows)

This is why pg_trickle can refresh a stream table in milliseconds even when the source table has billions of rows.
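
To make this concrete, suppose the captured changes were visible in a simplified buffer with an op flag ('I' for insert, 'D' for delete) and a jsonb row image; this layout is illustrative, not the extension's actual schema. The Δcount above is then a single aggregate over the buffer:

SELECT COALESCE(SUM(CASE WHEN op = 'I' THEN 1 ELSE -1 END), 0) AS delta_count
FROM change_buffer
WHERE row_data->>'status' = 'active';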


3. Change Capture: The Change Buffer

Before pg_trickle can compute ΔV, it needs to know ΔT. It captures changes using row-level AFTER triggers (the default) or WAL decoding.

Each source table gets a dedicated change buffer table: pgtrickle_changes.changes_<source_table_oid>. The trigger writes every inserted, updated, or deleted row into this buffer as part of the same transaction as the DML. This gives you:

  • Atomicity: A committed change is guaranteed to be in the buffer.
  • No missed changes: There is no window between commit and capture.
  • Snapshot isolation: The buffer holds the before/after images of each row.

The change buffer accumulates rows between refresh cycles. On each refresh, the DVM engine reads the buffer, computes the delta SQL, applies it to the stream table, and truncates the buffer.
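
The real capture trigger is internal to the extension; a plpgsql sketch of the same idea (simplified single-buffer layout with jsonb row images) looks like this:

CREATE TABLE change_buffer (
    op          char(1),       -- 'I' = insert image, 'D' = delete image
    row_data    jsonb,         -- before/after image of the row
    captured_at timestamptz DEFAULT now()
);

CREATE OR REPLACE FUNCTION capture_change() RETURNS trigger AS $$
BEGIN
    -- An UPDATE contributes both a delete image (OLD) and an insert image (NEW).
    IF TG_OP IN ('DELETE', 'UPDATE') THEN
        INSERT INTO change_buffer (op, row_data) VALUES ('D', to_jsonb(OLD));
    END IF;
    IF TG_OP IN ('INSERT', 'UPDATE') THEN
        INSERT INTO change_buffer (op, row_data) VALUES ('I', to_jsonb(NEW));
    END IF;
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER orders_capture
AFTER INSERT OR UPDATE OR DELETE ON orders
FOR EACH ROW EXECUTE FUNCTION capture_change();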


4. The Delta SQL

For each stream table, pg_trickle pre-generates a delta SQL template at creation time. This template is parameterized by the change buffer contents and produces the ΔV rows to apply.

For a simple aggregation like:

SELECT customer_id, COUNT(*) AS order_count
FROM orders GROUP BY customer_id

The delta SQL looks roughly like:

-- Compute which customers changed in this refresh window
WITH changed_customers AS (
    SELECT DISTINCT customer_id FROM pgtrickle_changes.changes_<oid>
),
-- Recompute count for only the affected customers
new_counts AS (
    SELECT customer_id, COUNT(*) AS order_count
    FROM orders
    WHERE customer_id IN (SELECT customer_id FROM changed_customers)
    GROUP BY customer_id
)
-- Apply: delete old rows, insert new rows for changed customers
MERGE INTO stream_table AS t
USING new_counts AS s ON t.customer_id = s.customer_id
WHEN MATCHED THEN UPDATE SET order_count = s.order_count
WHEN NOT MATCHED THEN INSERT VALUES (s.customer_id, s.order_count)
WHEN NOT MATCHED BY SOURCE AND t.customer_id IN (SELECT customer_id FROM changed_customers)
    THEN DELETE;

The key property: the FROM orders scan is filtered to only the affected customer IDs, not the full table. When 10 customers out of 10 million changed, only 10 customer IDs are scanned.


5. Algebraic Operators: What Can Be Maintained Incrementally?

Not all SQL operators can be maintained in O(Δ). pg_trickle classifies them into categories:

✅ Fully Incremental (O(Δ))

  • SELECT with filters, projections, casts
  • INNER JOIN, LEFT JOIN (equi-join with indexed keys)
  • GROUP BY with algebraic aggregates: COUNT, SUM, MIN, MAX, AVG
  • DISTINCT (with reference counting)
  • UNION ALL
  • WHERE EXISTS / WHERE NOT EXISTS (converted to semi/anti-join)
  • HAVING (filter on aggregate result)

⚠️ Conditionally Incremental

  • COUNT(DISTINCT x) — incremental with algebraic Z-set counting
  • STDDEV, VARIANCE — incremental using sum-of-squares decomposition
  • TOP-N (ORDER BY ... LIMIT) — incremental within the top-N window
  • Multi-table joins — incremental, but delta SQL becomes larger with more tables
  • CUBE / ROLLUP — expanded into UNION ALL branches, each incremental

❌ Not Incremental (falls back to FULL refresh)

  • TABLESAMPLE — non-deterministic, cannot be differentiated
  • VOLATILE functions (random(), now(), nextval()) in the SELECT list
  • ORDER BY without LIMIT — full sort on every refresh
  • FETCH FIRST without ORDER BY — non-deterministic
  • Window functions in the output — planned for future support
  • Recursive CTEs with CYCLE — non-terminating delta

When a query contains a non-incremental operator, pg_trickle automatically uses FULL refresh — replacing the stream table contents entirely. This is transparent to the application.


6. The Row Identity Problem

MERGE needs to know which rows in the stream table correspond to which rows in the delta. This is the row identity problem.

For stream tables with a natural primary key in the output (e.g., customer_id), the MERGE key is obvious. For aggregations without a natural key, or for queries with complex output structures, pg_trickle computes a row identity hash (__pgt_row_id) from the grouping keys or the query structure. This column is maintained automatically and is invisible in normal SELECT * queries.
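
Conceptually, the identity column behaves as if it were a deterministic hash of the grouping keys; the actual hash function and storage are internal details. Illustration only:

SELECT md5(customer_id::text) AS row_id_sketch,  -- stands in for __pgt_row_id
       customer_id,
       COUNT(*) AS order_count
FROM orders
GROUP BY customer_id;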


7. The Refresh Cycle

Source DML → CDC trigger → change buffer (same txn)
                                 ↓
                     Scheduler background worker (async)
                                 ↓
                     delta SQL → MERGE into stream table
                                 ↓
                     Truncate change buffer

The scheduler wakes every pg_trickle.scheduler_interval_ms (default 1s), checks which stream tables are ready to refresh based on their schedule, and runs the refresh in dependency (topological) order.

Key property: The application always sees a consistent read of the stream table. The MERGE has either fully committed or not run at all; no partial update is ever visible to readers.
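
The scheduler normally drives this loop, but a refresh can also be triggered by hand with the same function used in the diagnostics recipes later in this document:

SELECT pgtrickle.refresh_stream_table('my_stream_table');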


8. DAG Chaining: Stream Tables as Sources

Stream tables can themselves be sources for other stream tables, forming a directed acyclic graph (DAG) of dependencies:

orders → orders_by_customer → customer_top10
              ↑
         order_items

When orders changes, pg_trickle refreshes orders_by_customer first, then uses its delta to refresh customer_top10. Each step is O(Δ), so the full chain completes in time proportional to the number of changed rows — not the total data size.

pg_trickle detects cycles and rejects stream table definitions that would create them (unless pg_trickle.allow_circular = true, which enables fixpoint iteration for convergent circular queries).

The scheduler runs refreshes in topological order and supports parallel refresh (pg_trickle.parallel_refresh_mode = 'on', the default) to execute independent branches of the DAG concurrently.



pg_trickle Limitations

This document covers what pg_trickle cannot do, the constraints of DIFFERENTIAL mode, source table restrictions, and operational anti-patterns. Use the decision tree at the end to quickly determine if your use case is supported.


Unsupported SQL Constructs

The following SQL features are restricted in the defining query of a stream table. Constructs in the first table force a fallback to FULL refresh; constructs in the second are rejected with an UnsupportedOperator error at creation time or at the first refresh attempt.

In DIFFERENTIAL mode only

These constructs force a fallback to FULL refresh. The stream table is created successfully, but every refresh performs a full table recomputation:

Construct | Reason | Workaround
ORDER BY without LIMIT | Result ordering is non-deterministic as a delta | Remove ORDER BY, or use ORDER BY ... LIMIT N for Top-N views
TABLESAMPLE | Non-deterministic sampling cannot be differentiated | Use FULL mode explicitly
VOLATILE functions in SELECT | random(), now(), clock_timestamp(), nextval() change on every call | Pre-compute volatile values in the source table, or use FULL mode
STABLE functions in GROUP BY keys | Key values can change between refresh cycles | Use immutable functions only
Window functions in output | ROW_NUMBER(), RANK(), LEAD(), LAG() require global reordering | Use FULL mode, or pre-aggregate and use Top-N views
FETCH FIRST without ORDER BY | Non-deterministic selection | Add a deterministic ORDER BY and use Top-N via LIMIT
GROUPING SETS beyond branch limit | Branch explosion prevents O(Δ) maintenance | Reduce dimensions, or raise pg_trickle.max_grouping_set_branches

Not supported at all (any mode)

Construct | Reason
WITH RECURSIVE | Recursive CTEs require convergence logic not yet implemented
DDL inside the defining query | CREATE TABLE, CALL, etc. are not valid in SELECT
RETURNING clauses | Not applicable to SELECT queries
FOR UPDATE / FOR SHARE | Locking hints cannot be used in defining queries
Subqueries with side effects | INSERT ... RETURNING in subqueries
pg_catalog internal tables as sources | Internal catalog tables are not tracked by CDC
Temp tables as sources | Temporary tables are session-scoped; CDC triggers cannot be installed

DIFFERENTIAL Mode Constraints

Source table requirements

For DIFFERENTIAL mode to work correctly:

  1. Each source table must have a primary key or unique index on the columns used as join keys. Without a reliable row identity, the MERGE step cannot match old and new versions of a row. pg_trickle can fall back to a hash-based row ID (__pgt_row_id) for sources without primary keys, but this adds overhead.

  2. The stream table output must not use UNLOGGED or TEMPORARY storage — it must survive a crash-recovery cycle. Source tables can be UNLOGGED (changes are still captured by triggers).

  3. Source tables must not be altered concurrently in ways that change column structure while a refresh is running. pg_trickle blocks source DDL by default (pg_trickle.block_source_ddl = true). Disabling this risks schema inconsistency between the change buffer and the stream table.

Multi-source join constraints

When a defining query joins multiple source tables:

  • All join keys must be equi-joins (e.g., t1.id = t2.id). Range joins (t1.ts BETWEEN t2.start AND t2.end) force FULL mode.
  • The number of delta CTEs grows with the number of sources. Queries joining 5+ large tables may hit the pg_trickle.max_diff_ctes limit (default 64); raise it or simplify the query.
  • Left outer joins with nullable right-side keys add correctness complexity. pg_trickle handles them correctly, but the delta SQL is larger.

Aggregate constraints

Aggregate | Supported? | Notes
COUNT(*) | ✅ Yes | Fully algebraic
SUM(x) | ✅ Yes | Fully algebraic
MIN(x), MAX(x) | ✅ Yes | With reference counting
AVG(x) | ✅ Yes | Via sum + count decomposition
STDDEV(x), VARIANCE(x) | ✅ Yes | Via sum-of-squares decomposition
COUNT(DISTINCT x) | ✅ Yes | Via Z-set algebraic counting
ARRAY_AGG(x) | ❌ No | Order-dependent; use FULL mode
STRING_AGG(x, sep) | ❌ No | Order-dependent; use FULL mode
JSON_AGG(x) | ❌ No | Order-dependent; use FULL mode
PERCENTILE_CONT(f) WITHIN GROUP (ORDER BY x) | ❌ No | Requires global sort
MODE() | ❌ No | Requires global frequency computation
Custom user-defined aggregates | ⚠️ Maybe | Supported if the aggregate provides sfunc + finalfunc that pg_trickle can decompose; aggregates marked STRICT are rejected

Source Table Restrictions

Supported source types

Source type | Supported? | Notes
Regular heap tables | ✅ Yes | Full CDC support
Partitioned tables (declarative) | ✅ Yes | Triggers installed on each partition
Foreign tables (postgres_fdw) | ✅ Yes | Snapshot-comparison mode
Materialized views | ✅ Yes | Snapshot-comparison mode
Other stream tables | ✅ Yes | DAG chaining supported
UNLOGGED tables | ✅ Yes (source) | Changes captured; stream table output must be logged
Temporary tables | ❌ No | Session-scoped; CDC triggers cannot persist
System catalogs (pg_class, etc.) | ❌ No | Not tracked by CDC
Views (non-materialized) | ❌ No | Automatically inlined; the base tables become the sources
Remote tables via dblink | ⚠️ Limited | Use foreign tables via postgres_fdw instead

Column type constraints

  • text-typed columns named as join keys work, but are less efficient than integer or UUID keys. Use an index on the join key columns.
  • jsonb columns in GROUP BY are supported, but hash joins on JSONB are expensive. Consider extracting the key sub-field (see the example after this list).
  • bytea columns work in the output but cannot be used as GROUP BY keys in DIFFERENTIAL mode.
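
For the jsonb point above, an illustrative rewrite (table and field names are hypothetical) that groups on an extracted sub-field instead of the whole jsonb value:

SELECT pgtrickle.create_stream_table(
    name     => 'events_by_type',
    query    => 'SELECT payload->>''type'' AS event_type, COUNT(*) AS cnt FROM events GROUP BY 1',
    schedule => '1m'
);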

Operational Anti-Patterns

Anti-pattern 1: Very high write rates with low schedules

Problem: If a source table receives 100K inserts/second and the stream table schedule is 1 second, the change buffer accumulates 100K rows per cycle. The DIFFERENTIAL delta SQL must process all 100K rows on every refresh, which may take longer than 1 second — causing the scheduler to fall behind.

Fix:

  • Increase the schedule to allow batching: schedule => '10s'
  • Rely on the adaptive fallback: pg_trickle.differential_max_change_ratio = 0.15 (the default) falls back to FULL when more than 15% of the source table changed in a cycle
  • Use pg_trickle.max_delta_estimate_rows to cap delta size

Anti-pattern 2: Unbounded DAG depth

Problem: A chain of 20+ stream tables where each depends on the previous creates O(depth) sequential refresh latency on every cycle.

Fix: Flatten the DAG where possible. Use parallel refresh (pg_trickle.parallel_refresh_mode = 'on') for independent branches. Consider whether intermediate stream tables are necessary.

Anti-pattern 3: Schema changes on live sources

Problem: ALTER TABLE ... DROP COLUMN on a source table while pg_trickle is running will break the change buffer schema and cause refresh errors.

Fix: Keep pg_trickle.block_source_ddl = true (the default). This causes schema changes to fail with a descriptive error; you can then update the stream table query explicitly before re-applying the schema change.

Anti-pattern 4: Treating stream tables as application write targets

Problem: Inserting or updating rows directly in a stream table bypasses pg_trickle's refresh logic. On the next refresh, the direct writes will be overwritten.

Fix: Stream tables are read-only from the application's perspective. All writes must go through the source tables. Use the pgtrickle.repair_stream_table() function if a stream table gets into an inconsistent state.

Anti-pattern 5: Using pg_trickle.enabled = false in production

Problem: Setting pg_trickle.enabled = false globally stops all refreshes. Change buffers accumulate indefinitely. Re-enabling causes a burst refresh of all stream tables simultaneously.

Fix: Use pgtrickle.suspend_stream_table() to pause individual stream tables, or pg_trickle.drain_mode = true to stop new work while completing in-flight refreshes.
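
For example, to take pressure off a single stream table or drain the scheduler without disabling the extension:

-- Pause one stream table
SELECT pgtrickle.suspend_stream_table('public.orders_mv');

-- Stop scheduling new refreshes cluster-wide while in-flight work completes
ALTER SYSTEM SET pg_trickle.drain_mode = true;
SELECT pg_reload_conf();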


"Will This Work?" Decision Tree

Does your query use window functions in the output?
  YES → Use FULL mode (refresh_mode => 'FULL')
  NO  ↓

Does your query use volatile functions (random(), now(), nextval())?
  YES → Use FULL mode, or pre-compute the volatile value in the source
  NO  ↓

Does your query use ORDER BY without LIMIT?
  YES → Remove ORDER BY, or use LIMIT N for a Top-N stream table
  NO  ↓

Does your query use WITH RECURSIVE?
  YES → Not supported. Materialize the recursive query into a regular table first.
  NO  ↓

Do all join keys use equi-join conditions (= not BETWEEN / >=)?
  NO  → Use FULL mode, or rewrite the join condition
  YES ↓

Does every source table have a primary key or unique index on the join key?
  NO  → pg_trickle will use hash-based row IDs (slightly less efficient, but works)
  YES ↓

✅ Your query is a good candidate for DIFFERENTIAL mode.
   Use: refresh_mode => 'DIFFERENTIAL' or 'AUTO' (default).

Multi-Column NOT IN with Nullable Elements (COR-1, v0.58.0)

When a defining query contains a multi-column NOT IN subquery such as:

SELECT a, b FROM t
WHERE (a, b) NOT IN (SELECT x, y FROM s)

pg_trickle v0.55.0 introduced an optimisation that rewrites (a, b) IN (SELECT x, y …) as a SemiJoin and NOT IN as an AntiJoin. However, SQL semantics for NOT IN differ from AntiJoin semantics when either side of the comparison can be NULL: SQL propagates UNKNOWN (which excludes the outer row), whereas an AntiJoin keeps the outer row.

Behaviour: When any element on the left-hand side of the row constructor is a NULL constant, or when any column in the subquery's SELECT list is a NULL literal, pg_trickle v0.58.0 detects this condition and falls back to the subquery-based (FULL refresh) execution path, emitting a NOTICE:

NOTICE: pg_trickle: multi-column NOT IN with nullable elements cannot be
rewritten to an anti-join; falling back to subquery-based delta computation.

Workaround: Rewrite using NOT EXISTS or add explicit IS NOT NULL guards to avoid NULL-producing expressions in the row constructor.
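
For example, the query above rewritten with NOT EXISTS, which has the anti-join semantics pg_trickle can maintain incrementally:

SELECT a, b
FROM t
WHERE NOT EXISTS (
    SELECT 1 FROM s WHERE s.x = t.a AND s.y = t.b
);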


Known Future Improvements

Limitation | Planned in
Window functions in output | v1.1+
WITH RECURSIVE support | v1.2+
STRING_AGG / ARRAY_AGG incremental maintenance | Researching
Cross-database stream tables (without foreign tables) | Not planned


Performance Tuning Cookbook

This document is a practical, recipe-oriented guide to squeezing the best throughput and latency out of pg_trickle stream tables. Each recipe describes why a problem occurs, when to apply it, and how to implement the fix.


Table of Contents

  1. Choosing the Right Refresh Mode
  2. Tuning the Scheduler Interval
  3. Controlling Change-Buffer Growth
  4. Accelerating Wide-Join Queries
  5. Reducing Lock Contention
  6. Managing Spill-to-Disk in Large Deltas
  7. Speeding Up FULL Refresh with Parallelism
  8. Monitoring with Prometheus
  9. Partition-Aware Stream Tables
  10. Adaptive Threshold Tuning
  11. Canary Testing Query Changes
  12. Recovering from Stale Stream Tables
  13. DVM Query Complexity Limits

1. Choosing the Right Refresh Mode

Problem: DIFFERENTIAL refresh is slower than expected, or FULL refresh keeps being chosen by the adaptive engine when you expect DIFFERENTIAL.

Diagnosis: Run the diagnostics helper:

SELECT * FROM pgtrickle.diagnose_stream_table('public.orders_mv');

Look at recommended_mode, composite_score, and change_ratio_current.

Recipe — Force DIFFERENTIAL for low-churn tables:

SELECT pgtrickle.alter_stream_table(
    'public.orders_mv',
    refresh_mode => 'DIFFERENTIAL'
);

Use this when:

  • change_ratio_current < 0.05 (less than 5% of rows change per tick)
  • The query has no DISTINCT, EXCEPT, or INTERSECT at the top level
  • The table has a suitable covering index on the join/group-by columns

Recipe — Force FULL for high-churn or complex queries:

SELECT pgtrickle.alter_stream_table(
    'public.summary_mv',
    refresh_mode => 'FULL'
);

Use this when:

  • change_ratio_current > 0.30
  • The query contains WITH RECURSIVE, complex GROUPING SETS, or multiple correlated subqueries

Recipe — Use AUTO (recommended default):

SELECT pgtrickle.alter_stream_table(
    'public.orders_mv',
    refresh_mode => 'AUTO'
);

AUTO switches between FULL and DIFFERENTIAL each cycle based on the adaptive cost model (pg_trickle.cost_model_safety_margin).


2. Tuning the Scheduler Interval

Problem: Stream tables are falling behind the source; refreshes are not running often enough. Or conversely, the scheduler is running too frequently, creating unnecessary load.

Diagnosis:

-- Check average staleness across all active stream tables
SELECT pgt_name, staleness_seconds
FROM pgtrickle.st_refresh_stats()
ORDER BY staleness_seconds DESC NULLS LAST;

Recipe — Reduce the poll interval for fresher data:

-- In postgresql.conf or via ALTER SYSTEM:
ALTER SYSTEM SET pg_trickle.scheduler_interval_ms = 250;
SELECT pg_reload_conf();

Minimum safe value: 250 ms. Below this, CPU overhead from the scheduler loop becomes noticeable.

Recipe — Set a per-table schedule:

-- Refresh every 30 seconds
SELECT pgtrickle.alter_stream_table('public.orders_mv', schedule => '30s');

-- Refresh using a cron expression (every 5 minutes)
SELECT pgtrickle.alter_stream_table('public.daily_agg', schedule => '*/5 * * * *');

Per-table schedules override the global poll interval for that stream table.


3. Controlling Change-Buffer Growth

Problem: The change buffer schema (pgtrickle_changes.*) keeps growing and consuming disk space.

Diagnosis:

SELECT schemaname, tablename,
       pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) AS size
FROM pg_tables
WHERE schemaname = current_setting('pg_trickle.change_buffer_schema', true)
ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC
LIMIT 20;

Recipe — Reduce the WAL-to-buffer retention window:

-- Advance the frontier faster by refreshing more frequently
SELECT pgtrickle.alter_stream_table('public.orders_mv', schedule => '5s');

pg_trickle deletes change-buffer rows once every stream table that references the source has consumed them. Slow stream tables block cleanup.

Recipe — Enable truncate-based cleanup (faster for large buffers):

ALTER SYSTEM SET pg_trickle.cleanup_use_truncate = on;
SELECT pg_reload_conf();

Uses TRUNCATE instead of DELETE when cleaning up entire partitioned change-buffer tables. Avoids bloat from frequent deletes.


4. Accelerating Wide-Join Queries

Problem: DIFFERENTIAL refresh on a query with 5+ table joins is slow.

Diagnosis:

-- Check the join scan count
SELECT pgtrickle.validate_query($$ SELECT … FROM a JOIN b JOIN c … $$);

Recipe — Enable planner hints for wide joins:

ALTER SYSTEM SET pg_trickle.planner_aggressive = on;
ALTER SYSTEM SET pg_trickle.merge_planner_hints = on;
SELECT pg_reload_conf();

This sets SET LOCAL enable_seqscan = off and SET LOCAL join_collapse_limit = 1 before the MERGE execution, forcing the planner to use indexes.

Recipe — Limit differential join depth:

SELECT pgtrickle.alter_stream_table(
    'public.complex_mv',
    max_differential_joins => 4
);

When join count exceeds max_differential_joins, pg_trickle falls back to FULL refresh instead of failing with a planning error.

Recipe — Add covering indexes on join keys:

-- The differential engine joins on __pgt_row_id; ensure the join keys
-- are indexed in both the storage table and source tables.
CREATE INDEX CONCURRENTLY ON orders (customer_id, order_date);
CREATE INDEX CONCURRENTLY ON customers (id) INCLUDE (name, region);

5. Reducing Lock Contention

Problem: lock timeout errors appear in pgt_refresh_history, or queries against the stream table are blocked during refresh.

Diagnosis:

SELECT * FROM pgtrickle.diagnose_errors('public.orders_mv') LIMIT 10;

Look for error_type = 'performance' with lock timeout in error_message.

Recipe — Increase lock timeout:

ALTER SYSTEM SET pg_trickle.lock_timeout = '5s';
SELECT pg_reload_conf();

Recipe — Use APPEND_ONLY mode for insert-only pipelines:

SELECT pgtrickle.alter_stream_table(
    'public.events_mv',
    append_only => true
);

APPEND_ONLY skips the MERGE and uses a fast INSERT … SELECT which holds locks for a much shorter time.

Recipe — Use pooler compatibility mode:

SELECT pgtrickle.alter_stream_table(
    'public.orders_mv',
    pooler_compatibility_mode => true
);

Disables prepared-statement reuse, which can cause issues with PgBouncer in transaction-pool mode.


6. Managing Spill-to-Disk in Large Deltas

Problem: Differential refresh writes large amounts of temp data, causing performance degradation.

Diagnosis:

SELECT pgt_name, last_temp_blks_written
FROM pgtrickle.st_refresh_stats();

Recipe — Increase work_mem for MERGE operations:

ALTER SYSTEM SET pg_trickle.merge_work_mem_mb = 256;
SELECT pg_reload_conf();

Recipe — Set a spill threshold to auto-switch to FULL:

-- Force FULL refresh after 3 consecutive spilling differentials
ALTER SYSTEM SET pg_trickle.spill_threshold_blocks = 10000;
ALTER SYSTEM SET pg_trickle.spill_consecutive_limit = 3;
SELECT pg_reload_conf();

After spill_consecutive_limit consecutive differential refreshes that write more than spill_threshold_blocks temp blocks, pg_trickle switches to FULL refresh for that stream table.


7. Speeding Up FULL Refresh with Parallelism

Problem: FULL refresh is slow due to large source tables.

Recipe — Enable parallel query for FULL refresh:

-- Allow more parallel workers
ALTER SYSTEM SET max_parallel_workers_per_gather = 8;
ALTER SYSTEM SET parallel_tuple_cost = 0.01;
SELECT pg_reload_conf();

pg_trickle uses INSERT INTO … SELECT … which respects the standard PostgreSQL parallel query settings.

Recipe — Enable partition-parallel refresh:

SELECT pgtrickle.alter_stream_table(
    'public.orders_mv',
    partition_by => 'region'
);

With partition_by, pg_trickle dispatches one refresh worker per partition, running them in parallel.


8. Monitoring with Prometheus

Problem: You want to monitor pg_trickle metrics with Prometheus.

Recipe — Enable the built-in metrics endpoint (v0.21.0+):

ALTER SYSTEM SET pg_trickle.metrics_port = 9188;
SELECT pg_reload_conf();

Then configure Prometheus to scrape:

scrape_configs:
  - job_name: pg_trickle
    static_configs:
      - targets: ['localhost:9188']
    metrics_path: /metrics

Available metrics:

Metric | Type | Description
pg_trickle_refreshes_total | counter | Successful refreshes per stream table
pg_trickle_refresh_failures_total | counter | Failed refreshes per stream table
pg_trickle_rows_changed_total | counter | Rows inserted + deleted per table
pg_trickle_consecutive_errors | gauge | Current error streak per table
pg_trickle_active | gauge | 1 if ACTIVE, 0 otherwise

Recipe — Check staleness via SQL (for custom alerting):

SELECT pgt_name, staleness_seconds, stale
FROM pgtrickle.st_refresh_stats()
WHERE stale = true;

9. Partition-Aware Stream Tables

Problem: A stream table over a large partitioned source is slow to refresh.

Recipe — Partition the stream table on its grouping key:

-- Stream table partitioned by customer_id (its grouping key):
SELECT pgtrickle.create_stream_table(
    'public.orders_by_customer',
    'SELECT customer_id, SUM(total) FROM orders GROUP BY customer_id',
    partition_by => 'customer_id'
);

Recipe — Per-partition MERGE for HASH-partitioned targets:

pg_trickle automatically uses per-partition MERGE when the stream table is HASH-partitioned. No additional configuration is needed; the optimizer routes each row to the correct partition.


10. Adaptive Threshold Tuning

Problem: The adaptive engine keeps switching between FULL and DIFFERENTIAL unexpectedly.

Recipe — Widen the dead zone (less switching):

-- Require a 30% score difference before switching (default: 20%)
ALTER SYSTEM SET pg_trickle.cost_model_safety_margin = 0.30;
SELECT pg_reload_conf();

Recipe — Use self-monitoring analytics to auto-tune:

-- Let pg_trickle automatically apply threshold recommendations
ALTER SYSTEM SET pg_trickle.self_monitoring_auto_apply = 'threshold_only';
SELECT pg_reload_conf();

With threshold_only, pg_trickle applies max_delta_fraction changes from pgtrickle.df_threshold_advice when confidence is HIGH.


11. Canary Testing Query Changes

Problem: You want to change a stream table's defining query safely without impacting production.

Recipe — Use canary/shadow mode (v0.21.0+):

-- 1. Create a canary table with the new query
SELECT pgtrickle.canary_begin(
    'public.orders_mv',
    'SELECT customer_id, COUNT(*), SUM(total) FROM orders GROUP BY customer_id'
);

-- 2. Wait for the canary to populate (check status)
SELECT status FROM pgtrickle.pgt_stream_tables
WHERE pgt_name = '__pgt_canary_orders_mv';

-- 3. Compare live vs canary output
SELECT * FROM pgtrickle.canary_diff('public.orders_mv');

-- 4. If diff is empty (or acceptable), promote the canary
SELECT pgtrickle.canary_promote('public.orders_mv');

The canary_diff result will be empty when both the old and new queries produce identical output for the current source data.


12. Recovering from Stale Stream Tables

Problem: A stream table is SUSPENDED or has a large backlog of changes.

Recipe — Pause all tables, catch up, then resume:

SELECT pgtrickle.pause_all();

-- Investigate
SELECT pgt_name, status, consecutive_errors
FROM pgtrickle.pgt_stream_tables
ORDER BY consecutive_errors DESC;

-- Fix the root cause, then resume
SELECT pgtrickle.resume_all();

Recipe — Force immediate refresh on stale tables:

-- Refresh only if older than 10 minutes
SELECT pgtrickle.refresh_if_stale('public.orders_mv', '10 minutes');

Recipe — Full reinitialization after schema change:

-- If source schema changed, reinitialize to rebuild column metadata
SELECT pgtrickle.reinitialize_stream_table('public.orders_mv');

13. DVM Query Complexity Limits

Problem: Differential refresh is slower than full refresh for complex queries, especially at scale. Understanding when DIFFERENTIAL mode breaks down helps you choose the right strategy.

Three Failure Mode Categories

Category | SQL Pattern | Symptom | Root Cause
Threshold Collapse | 4+ table JOINs with cascading EXCEPT ALL | Fast at small scale, 100–260× slower per data decade | Intermediate CTE cardinality blowup: O(n²) row generation from L₀ snapshot expansion
Early Collapse | EXISTS anti-join with non-equi predicates | 140× jump at first 10× scale step, then stable | Equi-join key filter not applied correctly; R_old EXCEPT ALL scans full table
Structural Bug | Doubly-nested correlated EXISTS / NOT EXISTS | Slow at all scales (constant ~2s overhead) | Inner R_old re-materialized per outer delta row: O(Δ_outer × n_inner)

Which SQL Patterns Trigger Each Category

Threshold Collapse (queries like TPC-H Q05, Q07, Q08, Q09):

  • Multi-table joins (4+ tables) using the cascading EXCEPT ALL delta strategy
  • Queries with many intermediate join nodes generate exponential intermediate rows
  • Diagnosis: pg_trickle.log_delta_sql = on + EXPLAIN (ANALYZE, BUFFERS)

Early Collapse (queries like TPC-H Q04):

  • WHERE EXISTS (SELECT 1 FROM t WHERE t.key = outer.key AND t.col < t.col2)
  • The non-equi predicates in the EXISTS clause can prevent key-filter extraction
  • Diagnosis: Check if R_old CTE scans the full right table

Structural Bug (queries like TPC-H Q20):

  • WHERE EXISTS (SELECT 1 FROM t1 WHERE EXISTS (SELECT 1 FROM t2 WHERE ...))
  • Inner snapshot CTEs are re-evaluated per outer row instead of shared
  • Diagnosis: Look for repeated CTE evaluations in EXPLAIN ANALYZE

Rule-of-thumb scale guidance:

Pattern | Safe for DIFF | Use FULL above
Simple scan/filter | Any scale | -
2-table JOIN | Up to ~10M rows | -
3-table JOIN | Up to ~1M rows | ~10M rows
4+ table JOIN | Up to ~100K rows | ~1M rows
EXISTS anti-join | Up to ~100K rows | ~1M rows
Nested EXISTS | - | Use FULL mode

Diagnosing Your Query

-- 1. Enable delta SQL logging
SET pg_trickle.log_delta_sql = on;

-- 2. Trigger a manual refresh
SELECT pgtrickle.refresh_stream_table('my_stream_table');

-- 3. Check the PostgreSQL log for the generated delta SQL
-- 4. Run EXPLAIN (ANALYZE, BUFFERS) on the captured SQL
-- 5. Look for:
--    - Nested Loop joins on large tables (threshold collapse)
--    - Sequential scans on R_old CTEs (early collapse)
--    - Repeated CTE evaluations (structural bug)

-- Use explain_diff_sql() to inspect without executing:
SELECT pgtrickle.explain_diff_sql('my_stream_table');

Mitigation GUCs

-- Increase work_mem for delta execution
SET pg_trickle.delta_work_mem = 256;  -- MB

-- Disable nested loops for delta execution
SET pg_trickle.delta_enable_nestloop = off;

-- Run ANALYZE on change buffers (enabled by default)
SET pg_trickle.analyze_before_delta = on;

Worked Example A — max_diff_ctes Hit and Recovery

Symptom: EXPLAIN ANALYZE shows more than pg_trickle.max_diff_ctes (default 64) CTEs in the generated delta SQL, and the refresh falls back to FULL mode with a warning in pgt_refresh_history.

Diagnosis:

-- Check the warning in the refresh history
SELECT pgt_name, refresh_mode, rows_in_last_refresh, warning_message
FROM pgtrickle.pgt_refresh_history
WHERE pgt_name = 'my_complex_view'
ORDER BY started_at DESC
LIMIT 5;

-- Inspect the generated delta SQL
SELECT pgtrickle.explain_diff_sql('my_complex_view');
-- Count the CTE blocks in the output

Recovery steps:

-- Option 1: Raise the limit (accept higher delta execution cost)
ALTER SYSTEM SET pg_trickle.max_diff_ctes = 128;
SELECT pg_reload_conf();

-- Option 2: Simplify the query — split a complex view into two stream tables
-- First level: join + filter
SELECT pgtrickle.create_stream_table(
    'orders_with_products',
    'SELECT o.*, p.name AS product_name FROM orders o JOIN products p ON p.id = o.product_id',
    '10s', 'DIFFERENTIAL'
);
-- Second level: aggregate over the first
SELECT pgtrickle.create_stream_table(
    'revenue_summary',
    'SELECT product_name, SUM(amount) AS total FROM orders_with_products GROUP BY product_name',
    '15s', 'DIFFERENTIAL'
);

-- Option 3: Force FULL mode for queries that genuinely exceed complexity budget
SELECT pgtrickle.alter_stream_table('my_complex_view', refresh_mode => 'FULL');

Worked Example B — Detecting When FULL Beats DIFFERENTIAL

Symptom: The AUTO cost model keeps switching between FULL and DIFFERENTIAL every few cycles, or diff_speedup from refresh_efficiency() is below 1.5×.

Diagnosis using recommend_refresh_mode():

-- Get the weighted signal breakdown for the table
SELECT
    pgt_name,
    current_mode,
    recommended_mode,
    confidence,
    reason,
    jsonb_pretty(signals) AS signals
FROM pgtrickle.recommend_refresh_mode('my_table');

Examine the signals output. Key indicators that FULL is better:

Signal | Value that favours FULL
change_ratio_avg | > 0.30 (>30% of rows change per tick)
empirical_timing | DIFF and FULL latency within 10%
latency_variance | p95/p50 > 3 for DIFFERENTIAL
query_complexity | Score < 0 (many joins / CTEs)

Apply the recommendation:

-- Switch to FULL when composite_score < -0.15
SELECT pgtrickle.alter_stream_table('my_table', refresh_mode => 'FULL');

-- Switch to AUTO and let the cost model decide going forward
SELECT pgtrickle.alter_stream_table('my_table', refresh_mode => 'AUTO');

-- Set the switching dead-zone wider to reduce oscillation
ALTER SYSTEM SET pg_trickle.cost_model_safety_margin = 0.25;
SELECT pg_reload_conf();

Worked Example C — Deep-Join Chain and max_differential_joins

Symptom: A stream table with a 6-way JOIN is slow in DIFFERENTIAL mode, even though individual tables are small.

Diagnosis:

-- Enable delta SQL logging and trigger a refresh
SET pg_trickle.log_delta_sql = on;
SELECT pgtrickle.refresh_stream_table('deep_join_view');

-- Check effective join count reported by the engine
SELECT pgt_name, query_join_depth, last_refresh_mode_reason
FROM pgtrickle.pgt_stream_tables
WHERE pgt_name = 'deep_join_view';

Understanding the GUC:

pg_trickle.max_differential_joins (default: 4) sets the maximum number of right-side scan expansions the delta engine will attempt before falling back to FULL mode. Each additional join roughly doubles the number of delta CTE branches generated.

Tuning steps:

-- Option 1: Allow deeper join differentiation (accept higher delta cost)
ALTER SYSTEM SET pg_trickle.max_differential_joins = 6;
SELECT pg_reload_conf();

-- Option 2: Intermediate stream table to break the join chain
-- Split a 6-way join into two 3-way steps:
SELECT pgtrickle.create_stream_table(
    'join_layer_1',
    $$SELECT a.*, b.val AS b_val, c.val AS c_val
      FROM table_a a
      JOIN table_b b ON b.id = a.b_id
      JOIN table_c c ON c.id = a.c_id$$,
    '5s', 'DIFFERENTIAL'
);
SELECT pgtrickle.create_stream_table(
    'join_layer_2',
    $$SELECT l1.*, d.val AS d_val, e.val AS e_val, f.val AS f_val
      FROM join_layer_1 l1
      JOIN table_d d ON d.id = l1.d_id
      JOIN table_e e ON e.id = l1.e_id
      JOIN table_f f ON f.id = l1.f_id$$,
    '10s', 'DIFFERENTIAL'
);

-- Option 3: Verify the deep-join fast-path is eligible for a given query
SELECT pgtrickle.validate_query(
    $$<your deep-join query>$$
);

Breaking the join chain into two stream tables reduces each step to ≤3 right-side expansions, well within the default max_differential_joins = 4 limit.



Performance Cheat Sheet

Quick reference for pg_trickle performance tuning. For in-depth explanations, see CONFIGURATION.md and PERFORMANCE_COOKBOOK.md.


Three Golden Rules

  1. Measure before tuning. Use pgtrickle.explain_st('my_table') and pgtrickle.refresh_history('my_table') to understand where time is spent before adjusting any GUC.

  2. DIFFERENTIAL is almost always faster when the delta is small. Only switch to FULL mode when the change-to-table ratio is high (> 15%) or the delta SQL is genuinely slower than a full scan on your data distribution.

  3. The bottleneck is usually the change buffer or work_mem. If refreshes are slow, check pg_stat_statements for temp block writes before adjusting schedule frequency.
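
Rule 1 in practice, using the helpers referenced above:

-- Where is refresh time going for this stream table?
SELECT pgtrickle.explain_st('my_table');

-- Recent refresh history: duration, mode, rows changed
SELECT * FROM pgtrickle.refresh_history('my_table', limit => 20);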


Top-10 GUC Quick Wins

GUC | Default | Tune when... | Recommended value
pg_trickle.parallel_refresh_mode | 'on' | You have many independent stream tables | Keep 'on'; set 'off' only to debug
pg_trickle.max_concurrent_refreshes | 4 | Parallelism is bottlenecked or over-saturating I/O | Set to number of independent DAG branches (2–8 typical)
pg_trickle.differential_max_change_ratio | 0.15 | DIFF is slower than FULL at peak write rates | Raise to 0.30–0.50 on high-churn workloads
pg_trickle.delta_work_mem | 0 (inherit) | Refresh spills temp blocks | Set to 128 (MB) for complex joins; 256+ for large aggregates
pg_trickle.analyze_before_delta | true | Planner picks bad plans on stale stats | Keep true; set false only if ANALYZE overhead is measurable
pg_trickle.aggregate_fast_path | true | Aggregation refreshes are slow | Keep true (uses explicit DML instead of MERGE for simple aggregates)
pg_trickle.scheduler_interval_ms | 1000 | Scheduler CPU overhead is high | Raise to 5000–10000 on clusters with 100+ stream tables
pg_trickle.cleanup_use_truncate | true | Change buffer cleanup causes lock contention | Set false if TRUNCATE AccessExclusiveLock conflicts with source DML
pg_trickle.tiered_scheduling | true | Cold stream tables waste CPU cycles | Keep true (prevents cold STs from refreshing at full speed)
pg_trickle.max_delta_estimate_rows | 0 | OOM or excessive temp spill on large deltas | Set to 100000–500000 to cap delta size and trigger FULL fallback

5 FULL-Fallback Patterns and How to Fix Them

These patterns cause pg_trickle to fall back to FULL refresh automatically. Each can often be rewritten for DIFFERENTIAL support.

Pattern 1: Volatile function in SELECT

-- ❌ Forces FULL: now() is volatile
SELECT id, created_at, now() - created_at AS age FROM orders;

-- ✅ DIFFERENTIAL: compute age in the source table or exclude it
SELECT id, created_at FROM orders;
-- Then compute age in the application or in a wrapper view

Pattern 2: ORDER BY without LIMIT

-- ❌ Forces FULL: full sort on every refresh
SELECT customer_id, SUM(amount) AS total
FROM orders GROUP BY customer_id
ORDER BY total DESC;

-- ✅ DIFFERENTIAL: remove ORDER BY (sort in the query layer)
SELECT customer_id, SUM(amount) AS total
FROM orders GROUP BY customer_id;
-- Sort in the SELECT: SELECT * FROM my_stream ORDER BY total DESC

Pattern 3: Non-equi join

-- ❌ Forces FULL: range join cannot be differentiated
SELECT e.*, s.salary_band
FROM employees e
JOIN salary_bands s ON e.salary BETWEEN s.min AND s.max;

-- ✅ DIFFERENTIAL: pre-classify salary_band in the source table
ALTER TABLE employees ADD COLUMN salary_band TEXT;
-- Maintain salary_band via trigger or background job, then:
SELECT e.* FROM employees e;  -- salary_band is now a regular column

Pattern 4: ARRAY_AGG / STRING_AGG

-- ❌ Forces FULL: order-dependent aggregate
SELECT customer_id, STRING_AGG(product, ', ') AS products
FROM order_lines GROUP BY customer_id;

-- ✅ DIFFERENTIAL: use COUNT or a separate denormalized column
SELECT customer_id, COUNT(*) AS product_count
FROM order_lines GROUP BY customer_id;
-- If you need the array: maintain it in the source table

Pattern 5: Window function in output

-- ❌ Forces FULL: window function requires global ordering
SELECT customer_id, amount,
       RANK() OVER (ORDER BY amount DESC) AS rank
FROM orders;

-- ✅ Use a Top-N stream table instead
SELECT pgtrickle.create_stream_table(
    name    => 'top_orders',
    query   => 'SELECT customer_id, amount FROM orders ORDER BY amount DESC LIMIT 100',
    refresh_mode => 'DIFFERENTIAL'  -- pg_trickle supports ORDER BY LIMIT
);

Refresh Latency Quick Diagnostics

-- See last N refresh durations for a stream table
SELECT started_at, duration_ms, mode, rows_changed
FROM pgtrickle.refresh_history('my_stream_table', limit => 20)
ORDER BY started_at DESC;

-- Check if delta is spilling to disk
SELECT query, temp_blks_written
FROM pg_stat_statements
WHERE query LIKE '%pgtrickle_changes%'
ORDER BY temp_blks_written DESC
LIMIT 10;

-- See the generated delta SQL
SELECT pgtrickle.explain_diff_sql('my_stream_table');

-- Check change buffer size
SELECT schemaname, tablename, n_live_tup
FROM pg_stat_user_tables
WHERE schemaname = 'pgtrickle_changes'
ORDER BY n_live_tup DESC
LIMIT 10;


SQL Reference

Complete reference for all SQL functions, views, and catalog tables provided by pg_trickle.




Functions

Core Lifecycle

Create, modify, and manage the lifecycle of stream tables.


pgtrickle.create_stream_table

Create a new stream table.

pgtrickle.create_stream_table(
    name                  text,
    query                 text,
    schedule              text      DEFAULT 'calculated',
    refresh_mode          text      DEFAULT 'AUTO',
    initialize            bool      DEFAULT true,
    diamond_consistency   text      DEFAULT NULL,
    diamond_schedule_policy text    DEFAULT NULL,
    cdc_mode              text      DEFAULT NULL,
    append_only           bool      DEFAULT false,
    pooler_compatibility_mode bool  DEFAULT false
) → void

Parameters:

name (text, required): Name of the stream table. May be schema-qualified (myschema.my_st); defaults to the public schema.

query (text, required): The defining SQL query. Must be a valid SELECT statement using supported operators.

schedule (text, default 'calculated'): Refresh schedule as a Prometheus/GNU-style duration string (e.g., '30s', '5m', '1h', '1h30m', '1d') or a cron expression (e.g., '*/5 * * * *', '@hourly'). Use 'calculated' for CALCULATED mode (inherits schedule from downstream dependents).

refresh_mode (text, default 'AUTO'): 'AUTO' (adaptive — uses DIFFERENTIAL when possible, falls back to FULL if the query is not differentiable), 'FULL' (truncate and reload), 'DIFFERENTIAL' (apply delta only — errors if the query is not differentiable), or 'IMMEDIATE' (synchronous in-transaction maintenance via statement-level triggers).

initialize (bool, default true): If true, populates the table immediately via a full refresh. If false, creates the table empty.

diamond_consistency (text, default NULL, treated as 'atomic'): Diamond dependency consistency mode: 'atomic' (SAVEPOINT-based atomic group refresh) or 'none' (independent refresh).

diamond_schedule_policy (text, default NULL, treated as 'fastest'): Schedule policy for atomic diamond groups: 'fastest' (fire when any member is due) or 'slowest' (fire when all are due). Set on the convergence node.

cdc_mode (text, default NULL, meaning use pg_trickle.cdc_mode): Optional per-stream-table CDC override: 'auto', 'trigger', or 'wal'. Affects all deferred TABLE sources of the stream table.

append_only (bool, default false): When true, differential refreshes use a fast INSERT path instead of MERGE, skipping DELETE/UPDATE/IS DISTINCT FROM checks. If a DELETE or UPDATE is later detected in the change buffer, the flag is automatically reverted to false. Not compatible with FULL, IMMEDIATE, or keyless sources.

pooler_compatibility_mode (bool, default false): When true, the refresh engine uses inline SQL instead of PREPARE/EXECUTE and suppresses all NOTIFY emissions for this stream table. Enable this when the stream table is accessed through a transaction-mode connection pooler (e.g. PgBouncer).

When refresh_mode => 'IMMEDIATE', the cluster-wide pg_trickle.cdc_mode setting is ignored. IMMEDIATE mode always uses statement-level IVM triggers instead of CDC triggers or WAL replication slots. If you explicitly pass cdc_mode => 'wal' together with refresh_mode => 'IMMEDIATE', pg_trickle rejects the call because WAL CDC is asynchronous and incompatible with in-transaction maintenance.
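
For example, this call is rejected:

-- Rejected: WAL CDC is asynchronous and cannot drive in-transaction maintenance
SELECT pgtrickle.create_stream_table(
    name         => 'live_totals_wal',
    query        => 'SELECT region, SUM(amount) AS total FROM orders GROUP BY region',
    refresh_mode => 'IMMEDIATE',
    cdc_mode     => 'wal'
);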

Duration format:

Unit | Suffix | Example
Seconds | s | '30s'
Minutes | m | '5m'
Hours | h | '2h'
Days | d | '1d'
Weeks | w | '1w'
Compound | - | '1h30m', '2m30s'

Cron expression format:

schedule also accepts standard cron expressions for time-based scheduling. The scheduler refreshes the stream table when the cron schedule fires, rather than checking staleness.

Format | Fields | Example | Description
5-field | min hour dom mon dow | '*/5 * * * *' | Every 5 minutes
6-field | sec min hour dom mon dow | '0 */5 * * * *' | Every 5 minutes at :00 seconds
Alias | - | '@hourly' | Every hour
Alias | - | '@daily' | Every day at midnight
Alias | - | '@weekly' | Every Sunday at midnight
Alias | - | '@monthly' | First of every month
Weekday range | - | '0 6 * * 1-5' | 6 AM on weekdays

Note: Cron-scheduled stream tables do not participate in CALCULATED schedule resolution. The stale column in monitoring views returns NULL for cron-scheduled tables.

Example:

-- Duration-based: refresh when data is staler than 2 minutes (refresh_mode defaults to 'AUTO')
SELECT pgtrickle.create_stream_table(
    name     => 'order_totals',
    query    => 'SELECT region, SUM(amount) AS total FROM orders GROUP BY region',
    schedule => '2m'
);

-- Cron-based: refresh every hour
SELECT pgtrickle.create_stream_table(
    name         => 'hourly_summary',
    query        => 'SELECT date_trunc(''hour'', ts), COUNT(*) FROM events GROUP BY 1',
    schedule     => '@hourly',
    refresh_mode => 'FULL'
);

-- Cron-based: refresh at 6 AM on weekdays
SELECT pgtrickle.create_stream_table(
    name         => 'daily_report',
    query        => 'SELECT region, SUM(revenue) AS total FROM sales GROUP BY region',
    schedule     => '0 6 * * 1-5',
    refresh_mode => 'FULL'
);

-- Immediate mode: maintained synchronously within the same transaction
-- No schedule needed — updates happen automatically when base table changes
SELECT pgtrickle.create_stream_table(
    name         => 'live_totals',
    query        => 'SELECT region, SUM(amount) AS total FROM orders GROUP BY region',
    refresh_mode => 'IMMEDIATE'
);

-- Force WAL CDC for this stream table even if the global GUC is 'trigger'
SELECT pgtrickle.create_stream_table(
    name         => 'wal_orders',
    query        => 'SELECT id, amount FROM orders',
    schedule     => '1s',
    refresh_mode => 'DIFFERENTIAL',
    cdc_mode     => 'wal'
);

Aggregate Examples:

All supported aggregate functions work in AUTO mode (and all other modes). Examples below omit refresh_mode — the default 'AUTO' selects DIFFERENTIAL automatically. Explicit modes are shown only when the mode itself is being demonstrated.

-- Algebraic aggregates (fully differential — no rescan needed)
SELECT pgtrickle.create_stream_table(
    name     => 'sales_summary',
    query    => 'SELECT region, COUNT(*) AS cnt, SUM(amount) AS total, AVG(amount) AS avg_amount
     FROM orders GROUP BY region',
    schedule => '1m'
);

-- Semi-algebraic aggregates (MIN/MAX)
SELECT pgtrickle.create_stream_table(
    name     => 'salary_ranges',
    query    => 'SELECT department, MIN(salary) AS min_sal, MAX(salary) AS max_sal
     FROM employees GROUP BY department',
    schedule => '2m'
);

-- Group-rescan aggregates (BOOL_AND/OR, STRING_AGG, ARRAY_AGG, JSON_AGG, JSONB_AGG,
--                          BIT_AND, BIT_OR, BIT_XOR, JSON_OBJECT_AGG, JSONB_OBJECT_AGG,
--                          STDDEV, STDDEV_POP, STDDEV_SAMP, VARIANCE, VAR_POP, VAR_SAMP,
--                          MODE, PERCENTILE_CONT, PERCENTILE_DISC,
--                          CORR, COVAR_POP, COVAR_SAMP, REGR_AVGX, REGR_AVGY,
--                          REGR_COUNT, REGR_INTERCEPT, REGR_R2, REGR_SLOPE,
--                          REGR_SXX, REGR_SXY, REGR_SYY, ANY_VALUE)
SELECT pgtrickle.create_stream_table(
    name     => 'team_members',
    query    => 'SELECT department,
            STRING_AGG(name, '', '' ORDER BY name) AS members,
            ARRAY_AGG(employee_id) AS member_ids,
            BOOL_AND(active) AS all_active,
            JSON_AGG(name) AS members_json
     FROM employees
     GROUP BY department',
    schedule => '1m'
);

-- Bitwise aggregates
SELECT pgtrickle.create_stream_table(
    name     => 'permission_summary',
    query    => 'SELECT department,
            BIT_OR(permissions) AS combined_perms,
            BIT_AND(permissions) AS common_perms,
            BIT_XOR(flags) AS xor_flags
     FROM employees
     GROUP BY department',
    schedule => '1m'
);

-- JSON object aggregates
SELECT pgtrickle.create_stream_table(
    name     => 'config_map',
    query    => 'SELECT department,
            JSON_OBJECT_AGG(setting_name, setting_value) AS settings,
            JSONB_OBJECT_AGG(key, value) AS metadata
     FROM config
     GROUP BY department',
    schedule => '1m'
);

-- Statistical aggregates
SELECT pgtrickle.create_stream_table(
    name     => 'salary_stats',
    query    => 'SELECT department,
            STDDEV_POP(salary) AS sd_pop,
            STDDEV_SAMP(salary) AS sd_samp,
            VAR_POP(salary) AS var_pop,
            VAR_SAMP(salary) AS var_samp
     FROM employees
     GROUP BY department',
    schedule => '1m'
);

-- Ordered-set aggregates (MODE, PERCENTILE_CONT, PERCENTILE_DISC)
SELECT pgtrickle.create_stream_table(
    name     => 'salary_percentiles',
    query    => 'SELECT department,
            MODE() WITHIN GROUP (ORDER BY grade) AS most_common_grade,
            PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY salary) AS median_salary,
            PERCENTILE_DISC(0.9) WITHIN GROUP (ORDER BY salary) AS p90_salary
     FROM employees
     GROUP BY department',
    schedule => '1m'
);

-- Regression / correlation aggregates (CORR, COVAR_*, REGR_*)
SELECT pgtrickle.create_stream_table(
    name     => 'regression_stats',
    query    => 'SELECT department,
            CORR(salary, experience) AS sal_exp_corr,
            COVAR_POP(salary, experience) AS covar_pop,
            COVAR_SAMP(salary, experience) AS covar_samp,
            REGR_SLOPE(salary, experience) AS slope,
            REGR_INTERCEPT(salary, experience) AS intercept,
            REGR_R2(salary, experience) AS r_squared,
            REGR_COUNT(salary, experience) AS regr_n
     FROM employees
     GROUP BY department',
    schedule => '1m'
);

-- ANY_VALUE aggregate (PostgreSQL 16+)
SELECT pgtrickle.create_stream_table(
    name     => 'dept_sample',
    query    => 'SELECT department, ANY_VALUE(office_location) AS sample_office
     FROM employees GROUP BY department',
    schedule => '1m'
);

-- FILTER clause on aggregates
SELECT pgtrickle.create_stream_table(
    name     => 'order_metrics',
    query    => 'SELECT region,
            COUNT(*) AS total,
            COUNT(*) FILTER (WHERE status = ''active'') AS active_count,
            SUM(amount) FILTER (WHERE status = ''shipped'') AS shipped_total
     FROM orders
     GROUP BY region',
    schedule => '1m'
);

-- PgBouncer compatibility (transaction-mode pooler)
SELECT pgtrickle.create_stream_table(
    name                      => 'pooled_orders',
    query                     => 'SELECT id, amount FROM orders',
    schedule                  => '5m',
    pooler_compatibility_mode => true
);

-- Append-only stream table (INSERT-only fast path)
SELECT pgtrickle.create_stream_table(
    name        => 'event_log_st',
    query       => 'SELECT id, event_type, payload, created_at FROM events',
    schedule    => '30s',
    append_only => true
);

CTE Examples:

Non-recursive CTEs are fully supported in both FULL and DIFFERENTIAL modes:

-- Simple CTE
SELECT pgtrickle.create_stream_table(
    name     => 'active_order_totals',
    query    => 'WITH active_users AS (
        SELECT id, name FROM users WHERE active = true
    )
    SELECT a.id, a.name, SUM(o.amount) AS total
    FROM active_users a
    JOIN orders o ON o.user_id = a.id
    GROUP BY a.id, a.name',
    schedule => '1m'
);

-- Chained CTEs (CTE referencing another CTE)
SELECT pgtrickle.create_stream_table(
    name     => 'top_regions',
    query    => 'WITH regional AS (
        SELECT region, SUM(amount) AS total FROM orders GROUP BY region
    ),
    ranked AS (
        SELECT region, total FROM regional WHERE total > 1000
    )
    SELECT * FROM ranked',
    schedule => '2m'
);

-- Multi-reference CTE (referenced twice in FROM — shared delta optimization)
SELECT pgtrickle.create_stream_table(
    name     => 'self_compare',
    query    => 'WITH totals AS (
        SELECT user_id, SUM(amount) AS total FROM orders GROUP BY user_id
    )
    SELECT t1.user_id, t1.total, t2.total AS next_total
    FROM totals t1
    JOIN totals t2 ON t1.user_id = t2.user_id + 1',
    schedule => '1m'
);

Recursive CTEs work with FULL, DIFFERENTIAL, and IMMEDIATE modes:

-- Recursive CTE (hierarchy traversal)
SELECT pgtrickle.create_stream_table(
    name         => 'category_tree',
    query        => 'WITH RECURSIVE cat_tree AS (
        SELECT id, name, parent_id, 0 AS depth
        FROM categories WHERE parent_id IS NULL
        UNION ALL
        SELECT c.id, c.name, c.parent_id, ct.depth + 1
        FROM categories c
        JOIN cat_tree ct ON c.parent_id = ct.id
    )
    SELECT * FROM cat_tree',
    schedule     => '5m',
    refresh_mode => 'FULL'  -- FULL mode: standard re-execution
);

-- Recursive CTE with DIFFERENTIAL mode (incremental semi-naive / DRed)
SELECT pgtrickle.create_stream_table(
    name         => 'org_chart',
    query        => 'WITH RECURSIVE reports AS (
        SELECT id, name, manager_id FROM employees WHERE manager_id IS NULL
        UNION ALL
        SELECT e.id, e.name, e.manager_id
        FROM employees e JOIN reports r ON e.manager_id = r.id
    )
    SELECT * FROM reports',
    schedule     => '2m',
    refresh_mode => 'DIFFERENTIAL'  -- Uses semi-naive, DRed, or recomputation (auto-selected)
);

-- Recursive CTE with IMMEDIATE mode (same-transaction maintenance)
SELECT pgtrickle.create_stream_table(
    name         => 'org_chart_live',
    query        => 'WITH RECURSIVE reports AS (
        SELECT id, name, manager_id FROM employees WHERE manager_id IS NULL
        UNION ALL
        SELECT e.id, e.name, e.manager_id
        FROM employees e JOIN reports r ON e.manager_id = r.id
    )
    SELECT * FROM reports',
    refresh_mode => 'IMMEDIATE'  -- Uses transition tables with semi-naive / DRed maintenance
);

Non-monotone recursive terms: If the recursive term contains operators like EXCEPT, aggregate functions, window functions, DISTINCT, INTERSECT (set), or anti-joins, the system automatically falls back to recomputation to guarantee correctness. Semi-naive and DRed strategies require monotone recursive terms (JOIN, UNION ALL, filter/project only).
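
For example, the recursive term below subtracts blocked rows with EXCEPT, making it non-monotone. The stream table (a sketch over hypothetical nodes, edges, and blocked_nodes tables) is still accepted, but each refresh recomputes the CTE instead of applying semi-naive deltas:

-- Non-monotone recursive term (EXCEPT): maintained by recomputation
SELECT pgtrickle.create_stream_table(
    name         => 'reachable_nodes',
    query        => 'WITH RECURSIVE reach AS (
        SELECT id FROM nodes WHERE is_root
        UNION ALL
        (SELECT e.dst AS id
         FROM edges e JOIN reach r ON e.src = r.id
         EXCEPT
         SELECT id FROM blocked_nodes)
    )
    SELECT * FROM reach',
    schedule     => '5m',
    refresh_mode => 'DIFFERENTIAL'
);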

Set Operation Examples:

INTERSECT, INTERSECT ALL, EXCEPT, EXCEPT ALL, UNION, and UNION ALL are supported:

-- INTERSECT: customers who placed orders in BOTH regions
SELECT pgtrickle.create_stream_table(
    name     => 'bi_region_customers',
    query    => 'SELECT customer_id FROM orders_east
     INTERSECT
     SELECT customer_id FROM orders_west',
    schedule => '2m'
);

-- INTERSECT ALL: preserves duplicates (bag semantics)
SELECT pgtrickle.create_stream_table(
    name     => 'common_items',
    query    => 'SELECT item_name FROM warehouse_a
     INTERSECT ALL
     SELECT item_name FROM warehouse_b',
    schedule => '1m'
);

-- EXCEPT: orders not yet shipped
SELECT pgtrickle.create_stream_table(
    name     => 'unshipped_orders',
    query    => 'SELECT order_id FROM orders
     EXCEPT
     SELECT order_id FROM shipments',
    schedule => '1m'
);

-- EXCEPT ALL: preserves duplicate counts (bag subtraction)
SELECT pgtrickle.create_stream_table(
    name     => 'excess_inventory',
    query    => 'SELECT sku FROM stock_received
     EXCEPT ALL
     SELECT sku FROM stock_shipped',
    schedule => '5m'
);

-- UNION: deduplicated merge of two sources
SELECT pgtrickle.create_stream_table(
    name     => 'all_contacts',
    query    => 'SELECT email FROM customers
     UNION
     SELECT email FROM newsletter_subscribers',
    schedule => '5m'
);

LATERAL Set-Returning Function Examples:

Set-returning functions (SRFs) in the FROM clause are supported in both FULL and DIFFERENTIAL modes. Common SRFs include jsonb_array_elements, jsonb_each, jsonb_each_text, and unnest:

-- Flatten JSONB arrays into rows
SELECT pgtrickle.create_stream_table(
    name     => 'flat_children',
    query    => 'SELECT p.id, child.value AS val
     FROM parent_data p,
     jsonb_array_elements(p.data->''children'') AS child',
    schedule => '1m'
);

-- Expand JSONB key-value pairs (multi-column SRF)
SELECT pgtrickle.create_stream_table(
    name     => 'flat_properties',
    query    => 'SELECT d.id, kv.key, kv.value
     FROM documents d,
     jsonb_each(d.metadata) AS kv',
    schedule => '2m'
);

-- Unnest arrays
SELECT pgtrickle.create_stream_table(
    name     => 'flat_tags',
    query    => 'SELECT t.id, tag.tag
     FROM tagged_items t,
     unnest(t.tags) AS tag(tag)',
    schedule => '1m'
);

-- SRF with WHERE filter
SELECT pgtrickle.create_stream_table(
    name     => 'high_value_items',
    query    => 'SELECT p.id, (e.value)::int AS amount
     FROM products p,
     jsonb_array_elements(p.prices) AS e
     WHERE (e.value)::int > 100',
    schedule => '5m'
);

-- SRF combined with aggregation
SELECT pgtrickle.create_stream_table(
    name         => 'element_counts',
    query        => 'SELECT a.id, count(*) AS cnt
     FROM arrays a,
     jsonb_array_elements(a.data) AS e
     GROUP BY a.id',
    schedule     => '1m',
    refresh_mode => 'FULL'
);

LATERAL Subquery Examples:

LATERAL subqueries in the FROM clause are supported in both FULL and DIFFERENTIAL modes. Use them for top-N per group, correlated aggregation, and conditional expansion:

-- Top-N per group: latest item per order
SELECT pgtrickle.create_stream_table(
    name     => 'latest_items',
    query    => 'SELECT o.id, o.customer, latest.amount
     FROM orders o,
     LATERAL (
         SELECT li.amount
         FROM line_items li
         WHERE li.order_id = o.id
         ORDER BY li.created_at DESC
         LIMIT 1
     ) AS latest',
    schedule => '1m'
);

-- Correlated aggregate
SELECT pgtrickle.create_stream_table(
    name     => 'dept_summaries',
    query    => 'SELECT d.id, d.name, stats.total, stats.cnt
     FROM departments d,
     LATERAL (
         SELECT SUM(e.salary) AS total, COUNT(*) AS cnt
         FROM employees e
         WHERE e.dept_id = d.id
     ) AS stats',
    schedule => '1m'
);

-- LEFT JOIN LATERAL: preserve outer rows with NULLs when subquery returns no rows
SELECT pgtrickle.create_stream_table(
    name     => 'dept_stats_all',
    query    => 'SELECT d.id, d.name, stats.total
     FROM departments d
     LEFT JOIN LATERAL (
         SELECT SUM(e.salary) AS total
         FROM employees e
         WHERE e.dept_id = d.id
     ) AS stats ON true',
    schedule => '1m'
);

WHERE Subquery Examples:

Subqueries in the WHERE clause (and scalar subqueries in the SELECT list) are automatically transformed into semi-join, anti-join, or scalar-subquery operators in the DVM operator tree:

-- EXISTS subquery: customers who have placed orders
SELECT pgtrickle.create_stream_table(
    name     => 'active_customers',
    query    => 'SELECT c.id, c.name
     FROM customers c
     WHERE EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.id)',
    schedule => '1m'
);

-- NOT EXISTS: customers with no orders
SELECT pgtrickle.create_stream_table(
    name     => 'inactive_customers',
    query    => 'SELECT c.id, c.name
     FROM customers c
     WHERE NOT EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.id)',
    schedule => '1m'
);

-- IN subquery: products that have been ordered
SELECT pgtrickle.create_stream_table(
    name     => 'ordered_products',
    query    => 'SELECT p.id, p.name
     FROM products p
     WHERE p.id IN (SELECT product_id FROM order_items)',
    schedule => '1m'
);

-- NOT IN subquery: products never ordered
SELECT pgtrickle.create_stream_table(
    name     => 'unordered_products',
    query    => 'SELECT p.id, p.name
     FROM products p
     WHERE p.id NOT IN (SELECT product_id FROM order_items)',
    schedule => '1m'
);

-- Scalar subquery in SELECT list
SELECT pgtrickle.create_stream_table(
    name     => 'products_with_max_price',
    query    => 'SELECT p.id, p.name, (SELECT max(price) FROM products) AS max_price
     FROM products p',
    schedule => '1m'
);

Notes:

  • The defining query is parsed into an operator tree and validated for DVM support.
  • Views as sources — views referenced in the defining query are automatically inlined as subqueries (auto-rewrite pass #0). CDC triggers are created on the underlying base tables. Nested views (view → view → table) are fully expanded. The user's original query is preserved in original_query for reinit and introspection. Materialized views are rejected in DIFFERENTIAL mode (use FULL mode or the underlying query directly). Foreign tables are also rejected in DIFFERENTIAL mode.
  • CDC triggers and change buffer tables are created automatically for each source table.
  • TRUNCATE on source tables — when a source table is TRUNCATEd, a CDC trigger writes a marker row (action='T') into the change buffer. On the next refresh cycle, pg_trickle detects the marker and automatically falls back to a FULL refresh. For single-source stream tables where no subsequent DML occurred after the TRUNCATE, an optimized fast path deletes all ST rows directly without re-running the full defining query.
  • The ST is registered in the dependency DAG; cycles are rejected.
  • Non-recursive CTEs are inlined as subqueries during parsing (Tier 1). Multi-reference CTEs share delta computation (Tier 2).
  • Recursive CTEs in DIFFERENTIAL mode use three strategies, auto-selected per refresh: semi-naive evaluation for INSERT-only changes, DRed (Delete-and-Rederive) for mixed DELETE/UPDATE changes, and recomputation fallback when CTE columns do not match ST storage columns. Non-monotone recursive terms (containing EXCEPT, Aggregate, Window, DISTINCT, AntiJoin, or INTERSECT SET) automatically fall back to recomputation to ensure correctness.

  • Recursive CTE DIFFERENTIAL mode (DRed algorithm, P2-1): mixed DELETE/UPDATE changes use the DRed (Delete-and-Rederive) algorithm: (1) semi-naive INSERT propagation; (2) over-deletion cascade from ST storage; (3) rederivation from current source tables; (4) combine net deletions. DRed correctly handles derived-column changes such as path rebuilds under a renamed ancestor node. When CTE output columns differ from ST storage columns, recomputation is used instead. Implemented in v0.10.0.

  • LATERAL SRFs in DIFFERENTIAL mode use row-scoped recomputation: when a source row changes, only the SRF expansions for that row are re-evaluated.
  • LATERAL subqueries in DIFFERENTIAL mode also use row-scoped recomputation: when an outer row changes, the correlated subquery is re-executed only for that row.
  • WHERE subqueries (EXISTS, IN, scalar) are parsed into dedicated semi-join, anti-join, and scalar subquery operators with specialized delta computation.
  • Multi-column IN (subquery) is currently rejected (see Expression Support); ALL (subquery) is supported via a NULL-safe anti-join rewrite.
  • ORDER BY is accepted but silently discarded — row order in the storage table is undefined (consistent with PostgreSQL's CREATE MATERIALIZED VIEW behavior). Apply ORDER BY when querying the stream table.
  • TopK (ORDER BY + LIMIT) — When a top-level ORDER BY … LIMIT N is present (with a constant integer limit, optionally with OFFSET M), the query is recognized as a "TopK" pattern and accepted. TopK stream tables store exactly N rows (starting from position M+1 if OFFSET is specified) and are refreshed via a scoped-recomputation MERGE strategy. The DVM delta pipeline is bypassed; instead, each refresh re-evaluates the full ORDER BY + LIMIT [+ OFFSET] query and merges the result into the storage table. The catalog records topk_limit, topk_order_by, and optionally topk_offset for the stream table. TopK is not supported with set operations (UNION/INTERSECT/EXCEPT) or GROUP BY ROLLUP/CUBE/GROUPING SETS. A minimal example follows this list.
  • LIMIT / OFFSET without ORDER BY are rejected — stream tables materialize the full result set. Apply LIMIT when querying the stream table.
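
A minimal TopK sketch, reusing the orders table from the earlier examples:

-- Top-10 orders by amount: stores exactly 10 rows, refreshed via scoped-recomputation MERGE
SELECT pgtrickle.create_stream_table(
    name     => 'top_orders',
    query    => 'SELECT id, amount FROM orders ORDER BY amount DESC LIMIT 10',
    schedule => '1m'
);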

pgtrickle.create_stream_table_if_not_exists

Create a stream table if it does not already exist. If a stream table with the given name already exists, the call is a no-op (an INFO message is logged). The existing definition is never modified.

pgtrickle.create_stream_table_if_not_exists(
    name                    text,
    query                   text,
    schedule                text      DEFAULT 'calculated',
    refresh_mode            text      DEFAULT 'AUTO',
    initialize              bool      DEFAULT true,
    diamond_consistency     text      DEFAULT NULL,
    diamond_schedule_policy text      DEFAULT NULL,
    cdc_mode                text      DEFAULT NULL,
    append_only             bool      DEFAULT false,
    pooler_compatibility_mode bool    DEFAULT false
) → void

Parameters: Same as create_stream_table.

Example:

-- Safe to re-run in migrations:
SELECT pgtrickle.create_stream_table_if_not_exists(
    'order_totals',
    'SELECT customer_id, sum(amount) AS total FROM orders GROUP BY customer_id',
    '1m',
    'DIFFERENTIAL'
);

Notes:

  • Useful for deployment / migration scripts that should be safe to re-run.
  • If the stream table already exists, the provided query, schedule, and other parameters are ignored — the existing definition is preserved.

pgtrickle.create_or_replace_stream_table

Create a stream table if it does not exist, or replace the existing one if the definition changed. This is the declarative, idempotent API for deployment workflows (dbt, SQL migrations, GitOps).

pgtrickle.create_or_replace_stream_table(
    name                    text,
    query                   text,
    schedule                text      DEFAULT 'calculated',
    refresh_mode            text      DEFAULT 'AUTO',
    initialize              bool      DEFAULT true,
    diamond_consistency     text      DEFAULT NULL,
    diamond_schedule_policy text      DEFAULT NULL,
    cdc_mode                text      DEFAULT NULL,
    append_only             bool      DEFAULT false,
    pooler_compatibility_mode bool    DEFAULT false
) → void

Parameters: Same as create_stream_table.

Behavior:

  • Stream table does not exist → Create — identical to create_stream_table(...)
  • Stream table exists, query and all config identical → No-op — logs INFO, returns immediately
  • Stream table exists, query identical but config differs → Alter config — delegates to alter_stream_table(...) for schedule, refresh_mode, diamond settings, cdc_mode, append_only, pooler_compatibility_mode
  • Stream table exists, query differs → Replace query — in-place ALTER QUERY migration plus any config changes; a full refresh is applied

The initialize parameter is honoured on create only. On replace, the stream table is always repopulated via a full refresh.

Query comparison uses the post-rewrite (normalized) form of the SQL. Cosmetic differences such as whitespace, casing, and extra parentheses are ignored.

Example:

-- Idempotent deployment — safe to run on every deploy:
SELECT pgtrickle.create_or_replace_stream_table(
    name         => 'order_totals',
    query        => 'SELECT region, SUM(amount) AS total FROM orders GROUP BY region',
    schedule     => '2m',
    refresh_mode => 'DIFFERENTIAL'
);

-- If the query changed since last deploy, the stream table is
-- migrated in place (no data gap). If nothing changed, it's a no-op.
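
Because comparison uses the normalized query, re-deploying a cosmetic rewrite of the same definition (different casing and whitespace) is likewise a no-op:

-- Same query as above, spelled differently: detected as identical, nothing happens
SELECT pgtrickle.create_or_replace_stream_table(
    name         => 'order_totals',
    query        => 'select region, sum(amount) as total
                     from orders
                     group by region',
    schedule     => '2m',
    refresh_mode => 'DIFFERENTIAL'
);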

Notes:

  • Mirrors PostgreSQL's CREATE OR REPLACE convention (CREATE OR REPLACE VIEW, CREATE OR REPLACE FUNCTION).
  • Never drops the stream table — even for incompatible schema changes, the ALTER QUERY path rebuilds storage in place while preserving the catalog entry (pgt_id).
  • For migration scripts that should not modify an existing definition, use create_stream_table_if_not_exists instead.

pgtrickle.bulk_create

Create multiple stream tables in a single transaction.

pgtrickle.bulk_create(
    definitions  jsonb     -- Array of stream table definitions
) → jsonb                  -- Array of result objects

Each element in the definitions array must be a JSON object with at least name and query keys. All other keys match the parameters of create_stream_table (snake_case):

Key                        Type     Default       Description
name                       string   (required)    Stream table name (optionally schema-qualified).
query                      string   (required)    Defining SQL query.
schedule                   string   'calculated'  Refresh schedule.
refresh_mode               string   'AUTO'        'AUTO', 'FULL', 'DIFFERENTIAL', or 'IMMEDIATE'.
initialize                 boolean  true          Whether to populate immediately.
diamond_consistency        string   NULL          'atomic' or 'none'.
diamond_schedule_policy    string   NULL          'fastest' or 'slowest'.
cdc_mode                   string   NULL          'auto', 'trigger', or 'wal'.
append_only                boolean  false         Enable append-only fast path.
pooler_compatibility_mode  boolean  false         PgBouncer compatibility.
partition_by               string   NULL          Partition key.
max_differential_joins     integer  NULL          Max join scan limit.
max_delta_fraction         number   NULL          Max delta fraction (0.0–1.0).

Returns a JSONB array of result objects:

[
  {"name": "st1", "status": "created", "pgt_id": 42},
  {"name": "st2", "status": "created", "pgt_id": 43}
]

On any error, the entire transaction is rolled back (standard PostgreSQL transactional semantics). The error message includes the index and name of the failing definition.

Example:

SELECT pgtrickle.bulk_create('[
  {"name": "order_totals", "query": "SELECT customer_id, SUM(amount) AS total FROM orders GROUP BY customer_id", "schedule": "30s"},
  {"name": "product_stats", "query": "SELECT product_id, COUNT(*) AS cnt FROM order_items GROUP BY product_id", "schedule": "1m"}
]'::jsonb);

pgtrickle.alter_stream_table

Alter properties of an existing stream table.

pgtrickle.alter_stream_table(
    name                  text,
    query                 text      DEFAULT NULL,
    schedule              text      DEFAULT NULL,
    refresh_mode          text      DEFAULT NULL,
    status                text      DEFAULT NULL,
    diamond_consistency   text      DEFAULT NULL,
    diamond_schedule_policy text    DEFAULT NULL,
    cdc_mode              text      DEFAULT NULL,
    append_only           bool      DEFAULT NULL,
    pooler_compatibility_mode bool  DEFAULT NULL,
    tier                  text      DEFAULT NULL
) → void

Parameters:

  • name (text): Name of the stream table (schema-qualified or unqualified).
  • query (text, default NULL): New defining query. Pass NULL to leave unchanged. When set, the function validates the new query, migrates the storage table schema if needed, updates catalog entries and dependencies, and runs a full refresh. Schema changes are classified as same (no DDL), compatible (ALTER TABLE ADD/DROP COLUMN), or incompatible (full storage rebuild with OID change).
  • schedule (text, default NULL): New schedule as a duration string (e.g., '5m'). Pass NULL to leave unchanged. Pass 'calculated' to switch to CALCULATED mode.
  • refresh_mode (text, default NULL): New refresh mode ('AUTO', 'FULL', 'DIFFERENTIAL', or 'IMMEDIATE'). Pass NULL to leave unchanged. Switching to/from 'IMMEDIATE' migrates trigger infrastructure (IVM triggers ↔ CDC triggers), clears or restores the schedule, and runs a full refresh.
  • status (text, default NULL): New status ('ACTIVE', 'SUSPENDED'). Pass NULL to leave unchanged. Resuming resets consecutive errors to 0.
  • diamond_consistency (text, default NULL): New diamond consistency mode ('none' or 'atomic'). Pass NULL to leave unchanged.
  • diamond_schedule_policy (text, default NULL): New schedule policy for atomic diamond groups ('fastest' or 'slowest'). Pass NULL to leave unchanged.
  • cdc_mode (text, default NULL): New requested CDC mode override ('auto', 'trigger', or 'wal'). Pass NULL to leave unchanged.
  • append_only (bool, default NULL): Enable or disable the append-only INSERT fast path. Pass NULL to leave unchanged. When true, rejected for FULL, IMMEDIATE, or keyless source stream tables.
  • pooler_compatibility_mode (bool, default NULL): Enable or disable pooler-safe mode. When true, prepared statements are bypassed and NOTIFY emissions are suppressed. Pass NULL to leave unchanged.
  • tier (text, default NULL): Refresh tier for tiered scheduling ('hot', 'warm', 'cold', or 'frozen'). Only effective when the pg_trickle.tiered_scheduling GUC is enabled. Hot (1×), Warm (2×), Cold (10×), Frozen (skip). Pass NULL to leave unchanged.

If you switch a stream table to refresh_mode => 'IMMEDIATE' while the cluster-wide pg_trickle.cdc_mode GUC is set to 'wal', pg_trickle logs an INFO and proceeds with IVM triggers. WAL CDC does not apply to IMMEDIATE mode. If the stream table has an explicit cdc_mode => 'wal' override, switching to IMMEDIATE is rejected until you change the requested CDC mode back to 'auto' or 'trigger'.
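
For example, to move a WAL-pinned stream table into IMMEDIATE mode, clear the override first (a sketch of the sequence described above):

-- Step 1: replace the explicit 'wal' override
SELECT pgtrickle.alter_stream_table('order_totals', cdc_mode => 'trigger');
-- Step 2: now the switch to IMMEDIATE is accepted
SELECT pgtrickle.alter_stream_table('order_totals', refresh_mode => 'IMMEDIATE');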

Examples:

-- Change the defining query (same output schema — fast path)
SELECT pgtrickle.alter_stream_table('order_totals',
    query => 'SELECT customer_id, SUM(amount) AS total FROM orders WHERE status = ''active'' GROUP BY customer_id');

-- Change query and add a column (compatible schema migration)
SELECT pgtrickle.alter_stream_table('order_totals',
    query => 'SELECT customer_id, SUM(amount) AS total, COUNT(*) AS cnt FROM orders GROUP BY customer_id');

-- Change query and mode simultaneously
SELECT pgtrickle.alter_stream_table('order_totals',
    query => 'SELECT customer_id, SUM(amount) AS total FROM orders GROUP BY customer_id',
    refresh_mode => 'FULL');

-- Change schedule
SELECT pgtrickle.alter_stream_table('order_totals', schedule => '5m');

-- Switch to full refresh mode
SELECT pgtrickle.alter_stream_table('order_totals', refresh_mode => 'FULL');

-- Switch to immediate (transactional) mode — installs IVM triggers, clears schedule
SELECT pgtrickle.alter_stream_table('order_totals', refresh_mode => 'IMMEDIATE');

-- Switch from immediate back to differential — re-creates CDC triggers, restores schedule
SELECT pgtrickle.alter_stream_table('order_totals',
    refresh_mode => 'DIFFERENTIAL', schedule => '5m');

-- Pin a deferred stream table to trigger CDC even when the global GUC is 'auto'
SELECT pgtrickle.alter_stream_table('order_totals', cdc_mode => 'trigger');

-- Enable append-only INSERT fast path
SELECT pgtrickle.alter_stream_table('event_log_st', append_only => true);

-- Enable pooler compatibility mode (for PgBouncer transaction mode)
SELECT pgtrickle.alter_stream_table('order_totals', pooler_compatibility_mode => true);

-- Set refresh tier (requires pg_trickle.tiered_scheduling = on)
SELECT pgtrickle.alter_stream_table('order_totals', tier => 'warm');
SELECT pgtrickle.alter_stream_table('archive_stats', tier => 'frozen');

-- Suspend a stream table
SELECT pgtrickle.alter_stream_table('order_totals', status => 'SUSPENDED');

-- Resume a suspended stream table
SELECT pgtrickle.resume_stream_table('order_totals');
-- Or via alter_stream_table
SELECT pgtrickle.alter_stream_table('order_totals', status => 'ACTIVE');

Notes:

  • When query is provided, the function runs the full query rewrite pipeline (view inlining, DISTINCT ON, GROUPING SETS, etc.) and validates the new query before applying changes.
  • The entire ALTER QUERY operation runs within a single transaction. If any step fails, the stream table is left unchanged.
  • For same-schema and compatible-schema changes, the storage table OID is preserved — views, policies, and publications referencing the stream table remain valid.
  • For incompatible schema changes (e.g., changing a column from integer to text), the storage table is rebuilt and the OID changes. A WARNING is emitted.
  • The stream table is temporarily suspended during query migration to prevent concurrent scheduler refreshes.

pgtrickle.drop_stream_table

Drop a stream table, removing the storage table and all catalog entries.

pgtrickle.drop_stream_table(name text) → void

Parameters:

Parameter  Type  Description
name       text  Name of the stream table to drop.

Example:

SELECT pgtrickle.drop_stream_table('order_totals');

Notes:

  • Drops the underlying storage table with CASCADE.
  • Removes all catalog entries (metadata, dependencies, refresh history).
  • Cleans up CDC triggers and change buffer tables for source tables that are no longer tracked by any ST.
  • Automatically drops any downstream publication created by stream_table_to_publication().

pgtrickle.stream_table_to_publication

Create a PostgreSQL logical replication publication for a stream table, enabling downstream consumers (Debezium, Kafka Connect, standby replicas) to subscribe to changes.

pgtrickle.stream_table_to_publication(name text) → void

Parameters:

Parameter  Type  Description
name       text  Name of the stream table (schema-qualified or unqualified).

Example:

SELECT pgtrickle.stream_table_to_publication('order_totals');
-- Creates publication 'pgt_pub_order_totals'

Notes:

  • The publication is named pgt_pub_<table_name>.
  • Only one publication per stream table is allowed.
  • The publication is automatically dropped when the stream table is dropped.

pgtrickle.drop_stream_table_publication

Drop the logical replication publication for a stream table.

pgtrickle.drop_stream_table_publication(name text) → void

Parameters:

Parameter  Type  Description
name       text  Name of the stream table (schema-qualified or unqualified).

Example:

SELECT pgtrickle.drop_stream_table_publication('order_totals');

pgtrickle.set_stream_table_sla

Assign a freshness deadline SLA to a stream table. The extension automatically assigns the appropriate refresh tier based on the SLA.

pgtrickle.set_stream_table_sla(name text, sla interval) → void

Parameters:

Parameter  Type      Description
name       text      Name of the stream table (schema-qualified or unqualified).
sla        interval  Maximum acceptable data staleness.

Tier assignment:

  • SLA ≤ 5 seconds → Hot tier
  • SLA ≤ 30 seconds → Warm tier
  • SLA > 30 seconds → Cold tier

Example:

SELECT pgtrickle.set_stream_table_sla('order_totals', interval '10 seconds');
-- Assigns Warm tier

Notes:

  • The scheduler periodically checks actual refresh performance and dynamically re-assigns tiers if the SLA is consistently breached or over-served.

pgtrickle.resume_stream_table

Resume a suspended stream table, clearing its consecutive error count and re-enabling automated and manual refreshes.

pgtrickle.resume_stream_table(name text) → void

Parameters:

Parameter  Type  Description
name       text  Name of the stream table to resume (schema-qualified or unqualified).

Example:

-- Resume a stream table that was auto-suspended due to repeated errors
SELECT pgtrickle.resume_stream_table('order_totals');

Notes:

  • Errors if the ST is not in SUSPENDED state.
  • Resets consecutive_errors to 0 and sets status = 'ACTIVE'.
  • Emits a resumed event on the pg_trickle_alert NOTIFY channel.
  • After resuming, the scheduler will include the ST in its next cycle.

pgtrickle.refresh_stream_table

Manually trigger a synchronous refresh of a stream table.

pgtrickle.refresh_stream_table(name text) → void

Parameters:

Parameter  Type  Description
name       text  Name of the stream table to refresh.

Example:

SELECT pgtrickle.refresh_stream_table('order_totals');

Notes:

  • Blocked if the ST is SUSPENDED — use pgtrickle.resume_stream_table(name) first.
  • Uses an advisory lock to prevent concurrent refreshes of the same ST.
  • For DIFFERENTIAL mode, generates and applies a delta query. For FULL mode, truncates and reloads.
  • Records the refresh in pgtrickle.pgt_refresh_history with initiated_by = 'MANUAL'.
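
To confirm the refresh ran, pull the most recent history row with get_refresh_history (documented below):

SELECT pgtrickle.refresh_stream_table('order_totals');

SELECT action, status, rows_inserted, duration_ms
FROM pgtrickle.get_refresh_history('order_totals', 1);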

pgtrickle.repair_stream_table

Repair a stream table by reinstalling any missing CDC triggers, validating catalog entries, and reconciling change buffer state.

pgtrickle.repair_stream_table(name text) → text

Parameters:

Parameter  Type  Description
name       text  Name of the stream table to repair.

Example:

-- Reinstall missing CDC triggers after a point-in-time recovery
SELECT pgtrickle.repair_stream_table('order_totals');

Notes:

  • Inspects all source tables in the stream table's dependency graph and reinstalls any missing or disabled CDC triggers.
  • Validates that the stream table's catalog entry, storage table, and change buffer tables are consistent.
  • Useful after pg_basebackup or PITR restores where triggers may not have been captured in the backup.
  • Use pgtrickle.trigger_inventory() first to identify which triggers are missing.
  • Safe to call on a healthy stream table — it is a no-op if everything is intact.

Status & Monitoring

Query the state of stream tables, view refresh statistics, and diagnose problems.


pgtrickle.pgt_status

Get the status of all stream tables.

pgtrickle.pgt_status() → SETOF record(
    name                text,
    status              text,
    refresh_mode        text,
    is_populated        bool,
    consecutive_errors  int,
    schedule            text,
    data_timestamp      timestamptz,
    staleness           interval
)

Example:

SELECT * FROM pgtrickle.pgt_status();
 name                | status | refresh_mode | is_populated | consecutive_errors | schedule | data_timestamp         | staleness
---------------------+--------+--------------+--------------+--------------------+----------+------------------------+-----------
 public.order_totals | ACTIVE | DIFFERENTIAL | true         | 0                  | 5m       | 2026-02-21 12:00:00+00 | 00:02:30

pgtrickle.health_check

Run a set of health checks against the pg_trickle installation and return one row per check.

pgtrickle.health_check() → SETOF record(
    check_name  text,   -- identifier for the check
    severity    text,   -- 'OK', 'WARN', or 'ERROR'
    detail      text    -- human-readable explanation
)

Filter to problems only:

SELECT check_name, severity, detail
FROM pgtrickle.health_check()
WHERE severity != 'OK';

Checks: scheduler_running, error_tables, stale_tables, needs_reinit, consecutive_errors, buffer_growth (> 10 000 pending rows), slot_lag (retained WAL above pg_trickle.slot_lag_warning_threshold_mb, default 100 MB), worker_pool (all worker tokens in use — parallel mode only), job_queue (> 10 jobs queued — parallel mode only).


pgtrickle.health_summary

Single-row summary of the entire pg_trickle deployment's health. Designed for monitoring dashboards that want one endpoint to poll instead of joining multiple views.

pgtrickle.health_summary() → SETOF record(
    total_stream_tables   int,
    active_count          int,
    error_count           int,
    suspended_count       int,
    stale_count           int,
    reinit_pending        int,
    max_staleness_seconds float8,    -- NULL if no stream tables
    scheduler_status      text,      -- 'ACTIVE', 'STOPPED', or 'NOT_LOADED'
    cache_hit_rate        float8     -- NULL if no cache lookups yet
)

Example:

SELECT * FROM pgtrickle.health_summary();
 total_stream_tables | active_count | error_count | suspended_count | stale_count | reinit_pending | max_staleness_seconds | scheduler_status | cache_hit_rate
---------------------+--------------+-------------+-----------------+-------------+----------------+-----------------------+------------------+----------------
 12                  | 11           | 0           | 1               | 0           | 0              | 45.2                  | ACTIVE           | 0.94

Tip: Use this in a Grafana single-stat panel or a Prometheus exporter to surface fleet-level health at a glance.


pgtrickle.refresh_timeline

Return recent refresh records across all stream tables in a single chronological view.

pgtrickle.refresh_timeline(
    max_rows int  DEFAULT 50
) → SETOF record(
    start_time      timestamptz,
    stream_table    text,
    action          text,
    status          text,
    rows_inserted   bigint,
    rows_deleted    bigint,
    duration_ms     float8,
    error_message   text
)

Example:

-- Most recent 20 events across all stream tables:
SELECT start_time, stream_table, action, status, round(duration_ms::numeric,1) AS ms
FROM pgtrickle.refresh_timeline(20);

-- Just failures in the last 100 events:
SELECT * FROM pgtrickle.refresh_timeline(100) WHERE status = 'ERROR';

pgtrickle.st_refresh_stats

Return per-ST refresh statistics aggregated from the refresh history.

pgtrickle.st_refresh_stats() → SETOF record(
    pgt_name                text,
    pgt_schema              text,
    status                 text,
    refresh_mode           text,
    is_populated           bool,
    total_refreshes        bigint,
    successful_refreshes   bigint,
    failed_refreshes       bigint,
    total_rows_inserted    bigint,
    total_rows_deleted     bigint,
    avg_duration_ms        float8,
    last_refresh_action    text,
    last_refresh_status    text,
    last_refresh_at        timestamptz,
    staleness_secs         float8,
    stale                  bool
)

Example:

SELECT pgt_name, status, total_refreshes, avg_duration_ms, stale
FROM pgtrickle.st_refresh_stats();

pgtrickle.get_refresh_history

Return refresh history for a specific stream table.

pgtrickle.get_refresh_history(
    name      text,
    max_rows  int  DEFAULT 20
) → SETOF record(
    refresh_id       bigint,
    data_timestamp   timestamptz,
    start_time       timestamptz,
    end_time         timestamptz,
    action           text,
    status           text,
    rows_inserted    bigint,
    rows_deleted     bigint,
    duration_ms      float8,
    error_message    text
)

Example:

SELECT action, status, rows_inserted, duration_ms
FROM pgtrickle.get_refresh_history('order_totals', 5);

pgtrickle.get_staleness

Get the current staleness in seconds for a specific stream table.

pgtrickle.get_staleness(name text) → float8

Returns NULL if the ST has never been refreshed.

Example:

SELECT pgtrickle.get_staleness('order_totals');
-- Returns: 12.345  (seconds since last refresh)

pgtrickle.explain_refresh_mode

Added in v0.11.0

Explain the configured vs. effective refresh mode for a stream table, including the reason for any downgrade (e.g., AUTO choosing FULL).

pgtrickle.explain_refresh_mode(name text) → TABLE(
    configured_mode  text,
    effective_mode   text,
    downgrade_reason text
)

Columns:

Column            Type  Description
configured_mode   text  The refresh mode set on the stream table (e.g., DIFFERENTIAL, AUTO, FULL, IMMEDIATE)
effective_mode    text  The mode actually used on the most recent refresh. NULL for IMMEDIATE mode (handled by triggers)
downgrade_reason  text  Human-readable explanation when effective_mode differs from configured_mode, or an informational note for IMMEDIATE / APPEND_ONLY

Example:

SELECT * FROM pgtrickle.explain_refresh_mode('public.orders_summary');
 configured_mode | effective_mode | downgrade_reason
-----------------+----------------+------------------
 AUTO            | FULL           | The most recent refresh used FULL mode. Possible causes: defining query contains a CTE or unsupported operator, adaptive change-ratio threshold was exceeded, or aggregate saturation occurred. Check pgtrickle.pgt_refresh_history for details.

pgtrickle.cache_stats

Return template cache statistics from shared memory.

Reports L1 (thread-local) hits, L2 (catalog table) hits, full misses (DVM re-parse), evictions (generation flushes), and the current L1 cache size for this backend.

pgtrickle.cache_stats() → SETOF record(
    l1_hits    bigint,
    l2_hits    bigint,
    misses     bigint,
    evictions  bigint,
    l1_size    integer
)

Column     Description
l1_hits    Number of delta template cache hits in the thread-local (L1) cache. ~0 ns lookup.
l2_hits    Number of delta template cache hits in the catalog table (L2) cache. ~1 ms SPI lookup.
misses     Number of full cache misses requiring DVM re-parse (~45 ms).
evictions  Number of entries evicted from L1 due to DDL-triggered generation flushes.
l1_size    Current number of entries in this backend's L1 cache.

Example:

SELECT * FROM pgtrickle.cache_stats();
 l1_hits | l2_hits | misses | evictions | l1_size
---------+---------+--------+-----------+---------
 142     | 35      | 1      | 0         | 8

Note: Counters are cluster-wide (shared memory) except l1_size which is per-backend. Requires shared_preload_libraries = 'pg_trickle'; returns zeros when loaded dynamically.
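
One way to derive an overall hit rate from these counters:

-- Fraction of template lookups served from either cache level
SELECT (l1_hits + l2_hits)::float8
       / NULLIF(l1_hits + l2_hits + misses, 0) AS hit_rate
FROM pgtrickle.cache_stats();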


CDC Diagnostics

Inspect CDC pipeline health, replication slots, change buffers, and trigger coverage.


pgtrickle.slot_health

Check replication slot health for all tracked CDC slots.

pgtrickle.slot_health() → SETOF record(
    slot_name          text,
    source_relid       bigint,
    active             bool,
    retained_wal_bytes bigint,
    wal_status         text
)

Example:

SELECT * FROM pgtrickle.slot_health();
 slot_name             | source_relid | active | retained_wal_bytes | wal_status
-----------------------+--------------+--------+--------------------+------------
 pg_trickle_slot_16384 | 16384        | false  | 1048576            | reserved

pgtrickle.check_cdc_health

Check CDC health for all tracked source tables. Returns per-source health status including the current CDC mode, replication slot details, estimated lag, and any alerts.

The alert column uses the critical threshold configured by pg_trickle.slot_lag_critical_threshold_mb (default 1024 MB).

pgtrickle.check_cdc_health() → SETOF record(
    source_relid   bigint,
    source_table   text,
    cdc_mode       text,
    slot_name      text,
    lag_bytes      bigint,
    confirmed_lsn  text,
    alert          text
)

Columns:

Column         Type    Description
source_relid   bigint  OID of the tracked source table
source_table   text    Resolved name of the source table (e.g., public.orders)
cdc_mode       text    Current CDC mode: TRIGGER, TRANSITIONING, or WAL
slot_name      text    Replication slot name (NULL for TRIGGER mode)
lag_bytes      bigint  Replication slot lag in bytes (NULL for TRIGGER mode)
confirmed_lsn  text    Last confirmed WAL position (NULL for TRIGGER mode)
alert          text    Alert message if unhealthy (e.g., slot_lag_exceeds_threshold, replication_slot_missing)

Example:

SELECT * FROM pgtrickle.check_cdc_health();
 source_relid | source_table  | cdc_mode | slot_name             | lag_bytes | confirmed_lsn | alert
--------------+---------------+----------+-----------------------+-----------+---------------+-------
 16384        | public.orders | TRIGGER  |                       |           |               |
 16390        | public.events | WAL      | pg_trickle_slot_16390 | 524288    | 0/1A8B000     |

pgtrickle.change_buffer_sizes

Show pending change counts and estimated on-disk sizes for all CDC-tracked source tables.

Returns one row per (stream_table, source_table) pair.

pgtrickle.change_buffer_sizes() → SETOF record(
    stream_table  text,     -- qualified stream table name
    source_table  text,     -- qualified source table name
    source_oid    bigint,
    cdc_mode      text,     -- 'trigger', 'wal', or 'transitioning'
    pending_rows  bigint,   -- rows in buffer not yet consumed
    buffer_bytes  bigint    -- estimated buffer table size in bytes
)

Example:

SELECT * FROM pgtrickle.change_buffer_sizes()
ORDER BY pending_rows DESC;

Useful for spotting a source table whose CDC buffer is growing unexpectedly (which may indicate a stalled differential refresh or a high-write source that has outpaced the schedule).
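
For example, to flag buffers above the 10 000-row warning threshold that health_check uses:

SELECT stream_table, source_table, pending_rows,
       pg_size_pretty(buffer_bytes) AS buffer_size
FROM pgtrickle.change_buffer_sizes()
WHERE pending_rows > 10000
ORDER BY pending_rows DESC;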


pgtrickle.worker_pool_status

Snapshot of the parallel refresh worker pool. Returns a single row.

pgtrickle.worker_pool_status() → SETOF record(
    active_workers  int,   -- workers currently executing refresh jobs
    max_workers     int,   -- cluster-wide worker budget (GUC)
    per_db_cap      int,   -- per-database dispatch cap (GUC)
    parallel_mode   text   -- current parallel_refresh_mode value
)

Example:

SELECT * FROM pgtrickle.worker_pool_status();

Returns 0 active workers when parallel_refresh_mode = 'off'.


pgtrickle.parallel_job_status

Active and recently completed scheduler jobs from the pgt_scheduler_jobs table. Shows jobs that are currently queued or running, plus jobs that finished within the last max_age_seconds (default 300).

pgtrickle.parallel_job_status(
    max_age_seconds int  DEFAULT 300
) → SETOF record(
    job_id         bigint,
    unit_key       text,        -- stable unit identifier (s:42, a:1,2, etc.)
    unit_kind      text,        -- 'singleton', 'atomic_group', 'immediate_closure'
    status         text,        -- 'QUEUED', 'RUNNING', 'SUCCEEDED', etc.
    member_count   int,
    attempt_no     int,
    scheduler_pid  int,
    worker_pid     int,         -- NULL if not yet claimed
    enqueued_at    timestamptz,
    started_at     timestamptz, -- NULL if still queued
    finished_at    timestamptz, -- NULL if not finished
    duration_ms    float8       -- NULL if not finished
)

Example — show running and recently failed jobs:

SELECT job_id, unit_key, status, duration_ms
FROM pgtrickle.parallel_job_status(60)
WHERE status NOT IN ('SUCCEEDED');

pgtrickle.trigger_inventory

List all CDC triggers that pg_trickle should have installed, and verify each one exists and is enabled in pg_catalog.

pgtrickle.trigger_inventory() → SETOF record(
    source_table  text,    -- qualified source table name
    source_oid    bigint,
    trigger_name  text,    -- expected trigger name
    trigger_type  text,    -- 'DML' or 'TRUNCATE'
    present       bool,    -- trigger exists in pg_catalog
    enabled       bool     -- trigger is not disabled
)

A present = false row means change capture is broken for that source.

Example:

-- Show only missing or disabled triggers:
SELECT source_table, trigger_type, trigger_name
FROM pgtrickle.trigger_inventory()
WHERE NOT present OR NOT enabled;

pgtrickle.fuse_status

Return the circuit-breaker (fuse) state for every stream table that has a fuse configured.

pgtrickle.fuse_status() → SETOF record(
    name           text,         -- stream table name
    fuse_mode      text,         -- 'off', 'on', or 'auto'
    fuse_state     text,         -- 'armed' or 'blown'
    fuse_ceiling   bigint,       -- change-count threshold
    fuse_sensitivity int,        -- consecutive over-ceiling cycles before blow
    blown_at       timestamptz,  -- when the fuse last blew (NULL if armed)
    blow_reason    text          -- reason the fuse blew (NULL if armed)
)

Example:

-- Check all fuse-enabled stream tables
SELECT name, fuse_mode, fuse_state, fuse_ceiling, blown_at
FROM pgtrickle.fuse_status();

-- Find blown fuses
SELECT name, blow_reason, blown_at
FROM pgtrickle.fuse_status()
WHERE fuse_state = 'blown';

Notes:

  • Returns one row per stream table where fuse_mode != 'off'.
  • A blown fuse suspends differential refreshes until cleared with pgtrickle.reset_fuse().
  • A pg_trickle_alert NOTIFY with event fuse_blown is emitted when the fuse trips.
  • See Configuration — fuse_default_ceiling for global defaults.

pgtrickle.reset_fuse

Clear a blown circuit-breaker fuse and resume scheduling for the stream table.

pgtrickle.reset_fuse(name text, action text DEFAULT 'apply') → void

Parameters:

Parameter  Type  Default  Description
name       text           Name of the stream table whose fuse to reset.
action     text  'apply'  How to handle the pending changes that caused the fuse to blow.

Actions:

Action          Behavior
'apply'         Process all pending changes normally and resume scheduling.
'reinitialize'  Drop and repopulate the stream table from scratch (full refresh from defining query).
'skip_changes'  Discard the pending changes that triggered the fuse and resume from the current frontier.

Example:

-- After investigating a bulk load, apply the changes:
SELECT pgtrickle.reset_fuse('category_summary', action => 'apply');

-- Or skip the oversized batch entirely:
SELECT pgtrickle.reset_fuse('category_summary', action => 'skip_changes');

-- Or rebuild from scratch:
SELECT pgtrickle.reset_fuse('category_summary', action => 'reinitialize');

Notes:

  • Errors if the stream table's fuse is not in 'blown' state.
  • After reset, the fuse returns to 'armed' state and the scheduler resumes normal operation.
  • Use pgtrickle.fuse_status() to inspect the fuse state before resetting.
  • The 'skip_changes' action advances the frontier past the pending changes without applying them — use only when you are certain the changes should be discarded.

Dependency & Inspection

Visualize dependencies, understand query plans, and audit source table relationships.


pgtrickle.dependency_tree

Render all stream table dependencies as an indented ASCII tree.

pgtrickle.dependency_tree() → SETOF record(
    tree_line    text,    -- indented visual line (├──, └──, │ characters)
    node         text,    -- qualified name (schema.table)
    node_type    text,    -- 'stream_table' or 'source_table'
    depth        int,
    status       text,    -- NULL for source_table nodes
    refresh_mode text     -- NULL for source_table nodes
)

Roots (stream tables with no stream-table parents) appear at depth 0. Each dependent is indented beneath its parent. Plain source tables are rendered as leaf nodes tagged [src].

Example:

SELECT tree_line, status, refresh_mode
FROM pgtrickle.dependency_tree();
tree_line                               status   refresh_mode
----------------------------------------+---------+--------------
report_summary                          ACTIVE   DIFFERENTIAL
├── orders_by_region                    ACTIVE   DIFFERENTIAL
│   ├── public.orders [src]
│   └── public.customers [src]
└── revenue_totals                      ACTIVE   DIFFERENTIAL
    └── public.orders [src]

pgtrickle.diamond_groups

List all detected diamond dependency groups and their members.

When stream tables form diamond-shaped dependency graphs (multiple paths converge at a single fan-in node), the scheduler groups them for coordinated refresh. This function exposes those groups for monitoring and debugging.

pgtrickle.diamond_groups() → SETOF record(
    group_id        int4,
    member_name     text,
    member_schema   text,
    is_convergence  bool,
    epoch           int8,
    schedule_policy text
)

Return columns:

Column           Type  Description
group_id         int4  Numeric identifier for the consistency group (1-based).
member_name      text  Name of the stream table in this group.
member_schema    text  Schema of the stream table.
is_convergence   bool  true if this member is a convergence (fan-in) node where multiple paths meet.
epoch            int8  Group epoch counter — advances on each successful atomic refresh of the group.
schedule_policy  text  Effective schedule policy for this group ('fastest' or 'slowest'). Computed from convergence node settings with strictest-wins.

Example:

SELECT * FROM pgtrickle.diamond_groups();
 group_id | member_name | member_schema | is_convergence | epoch | schedule_policy
----------+-------------+---------------+----------------+-------+-----------------
 1        | st_b        | public        | false          | 0     | fastest
 1        | st_c        | public        | false          | 0     | fastest
 1        | st_d        | public        | true           | 0     | fastest

Notes:

  • Singleton stream tables (not part of any diamond) are omitted.
  • The DAG is rebuilt on each call from the catalog — results reflect the current dependency graph.
  • Groups are only relevant when diamond_consistency = 'atomic' is set on the convergence node or globally via the pg_trickle.diamond_consistency GUC.

pgtrickle.pgt_scc_status

List all cyclic strongly connected components (SCCs) and their convergence status.

When stream tables form circular dependencies (with pg_trickle.allow_circular = true), they are grouped into SCCs and iterated to a fixed point. This function exposes those groups for monitoring and debugging.

pgtrickle.pgt_scc_status() → SETOF record(
    scc_id              int4,
    member_count        int4,
    members             text[],
    last_iterations     int4,
    last_converged_at   timestamptz
)

Return columns:

Column             Type         Description
scc_id             int4         SCC group identifier (1-based).
member_count       int4         Number of stream tables in this SCC.
members            text[]       Array of schema.name for each member.
last_iterations    int4         Number of fixpoint iterations in the last convergence (NULL if never iterated).
last_converged_at  timestamptz  Timestamp of the most recent refresh among SCC members (NULL if never refreshed).

Example:

SELECT * FROM pgtrickle.pgt_scc_status();
 scc_id | member_count | members                         | last_iterations | last_converged_at
--------+--------------+---------------------------------+-----------------+------------------------
 1      | 2            | {public.reach_a,public.reach_b} | 3               | 2026-03-15 12:00:00+00

Notes:

  • Only cyclic SCCs (with scc_id IS NOT NULL) are returned. Acyclic stream tables are omitted.
  • last_iterations reflects the maximum last_fixpoint_iterations across SCC members.
  • Results are queried from the catalog on each call.

pgtrickle.explain_st

Explain the DVM plan for a stream table's defining query.

pgtrickle.explain_st(name text) → SETOF record(
    property  text,
    value     text
)

Example:

SELECT * FROM pgtrickle.explain_st('order_totals');
 property             | value
----------------------+----------------------------------------------------
 pgt_name             | public.order_totals
 defining_query       | SELECT region, SUM(amount) ...
 refresh_mode         | DIFFERENTIAL
 status               | active
 is_populated         | true
 dvm_supported        | true
 operator_tree        | Aggregate → Scan(orders)
 output_columns       | region, total
 source_oids          | 16384
 delta_query          | WITH ... SELECT ...
 frontier             | {"orders": "0/15A3B80"}
 amplification_stats  | {"samples":10,"min":1.0,...}
 refresh_timing_stats | {"samples":10,"min_ms":12.3,...}
 source_partitions    | [{"source":"public.orders",...}]
 dependency_graph_dot | digraph dependency_subgraph { ... }
 spill_info           | {"temp_blks_read":0,"temp_blks_written":1234,...}

Output Fields

  • pgt_name: Fully-qualified stream table name
  • defining_query: The SQL query that defines the stream table
  • refresh_mode: DIFFERENTIAL, FULL, or IMMEDIATE
  • status: Current status (active, suspended, etc.)
  • is_populated: Whether the stream table has been initially populated
  • dvm_supported: Whether the defining query supports differential view maintenance
  • operator_tree: Debug representation of the DVM operator tree
  • output_columns: Comma-separated list of output column names
  • source_oids: Comma-separated list of source table OIDs
  • aggregate_strategies: Per-aggregate maintenance strategies (JSON, if aggregates present)
  • delta_query: The generated delta SQL used for DIFFERENTIAL refresh
  • frontier: Current LSN/watermark frontier (JSON)
  • amplification_stats: Delta amplification ratio statistics over the last 20 refreshes (JSON)
  • refresh_timing_stats: Refresh duration statistics over the last 20 completed refreshes (JSON). Fields: samples, min_ms, max_ms, avg_ms, latest_ms, latest_action
  • source_partitions: Partition info for partitioned source tables (JSON array). Fields per entry: source, partition_key, partitions
  • dependency_graph_dot: Dependency sub-graph in DOT format. Shows immediate upstream sources (ellipses for base tables, boxes for stream tables) and downstream dependents. Paste into a Graphviz renderer to visualize.
  • spill_info: Temp file spill metrics from pg_stat_statements (JSON). Fields: temp_blks_read, temp_blks_written, threshold, exceeds_threshold. Only present when pg_trickle.spill_threshold_blocks > 0.

Note: Properties are only included when data is available. For example, source_partitions only appears when at least one source table is partitioned, and refresh_timing_stats only appears after at least one completed refresh.
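
Because the output is a property/value set, a single property can be selected directly, e.g. the generated delta SQL:

SELECT value
FROM pgtrickle.explain_st('order_totals')
WHERE property = 'delta_query';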


pgtrickle.list_sources

List the source tables that a stream table depends on.

pgtrickle.list_sources(name text) → SETOF record(
    source_table   text,         -- qualified source table name
    source_oid     bigint,
    source_type    text,         -- 'table', 'stream_table', etc.
    cdc_mode       text,         -- 'trigger', 'wal', or 'transitioning'
    columns_used   text          -- column-level dependency info (if available)
)

Example:

SELECT * FROM pgtrickle.list_sources('order_totals');

Returns the tables tracked by CDC for the given stream table, along with how they are being tracked. Useful when diagnosing why a stream table is not refreshing or to audit which source tables are being trigger-tracked.


Utilities

Utility functions for CDC management and row identity hashing.


pgtrickle.rebuild_cdc_triggers

Rebuild all CDC triggers (function body + trigger DDL) for every source table tracked by pg_trickle. This recreates trigger functions and re-attaches the trigger to each source table.

pgtrickle.rebuild_cdc_triggers() → text

Returns 'done' on success. Emits a WARNING per table on error and continues processing remaining sources.

When to use:

  • After changing pg_trickle.cdc_trigger_mode from row to statement (or vice versa).
  • After ALTER EXTENSION pg_trickle UPDATE when the CDC trigger function body has changed.
  • After restoring from a backup where triggers may have been lost.

Example:

-- Switch to statement-level triggers and rebuild
SET pg_trickle.cdc_trigger_mode = 'statement';
SELECT pgtrickle.rebuild_cdc_triggers();

Notes:

  • Called automatically during ALTER EXTENSION pg_trickle UPDATE (0.3.0 → 0.4.0) migration.
  • Safe to call at any time — existing triggers are dropped and recreated.
  • On error for a specific table, a WARNING is logged and processing continues with remaining sources.

pgtrickle.pg_trickle_hash

Compute a 64-bit xxHash row ID from a text value.

pgtrickle.pg_trickle_hash(input text) → bigint

Marked IMMUTABLE, PARALLEL SAFE.

Example:

SELECT pgtrickle.pg_trickle_hash('some_key');
-- Returns: 1234567890123456789

pgtrickle.pg_trickle_hash_multi

Compute a row ID by hashing multiple text values (composite keys).

pgtrickle.pg_trickle_hash_multi(inputs text[]) → bigint

Marked IMMUTABLE, PARALLEL SAFE. Uses \x1E (record separator) between values and \x00NULL\x00 for NULL entries.

Example:

SELECT pgtrickle.pg_trickle_hash_multi(ARRAY['key1', 'key2']);
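
NULL entries are encoded with the \x00NULL\x00 sentinel, so a NULL element and an empty string produce different row IDs:

SELECT pgtrickle.pg_trickle_hash_multi(ARRAY['key1', NULL]);  -- NULL element
SELECT pgtrickle.pg_trickle_hash_multi(ARRAY['key1', '']);    -- empty string: different hash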

Operator Support Matrix — Summary

pg_trickle supports 60+ SQL constructs across three refresh modes. The table below summarises broad categories. For the complete per-operator matrix (including notes on caveats, auxiliary columns and strategies), see DVM_OPERATORS.md.

Category                                             FULL  DIFFERENTIAL  IMMEDIATE  Notes
Basic SELECT / WHERE / DISTINCT                      ✅     ✅            ✅
Joins (INNER, LEFT, RIGHT, FULL, CROSS, LATERAL)     ✅     ✅            ✅         Hybrid delta strategy
Subqueries (EXISTS, IN, NOT EXISTS, NOT IN, scalar)  ✅     ✅            ✅
Set operations (UNION ALL, INTERSECT, EXCEPT)        ✅     ✅            ✅
Algebraic aggregates (COUNT, SUM, AVG, STDDEV, …)    ✅     ✅            ✅         Fully invertible delta
Semi-algebraic aggregates (MIN, MAX)                 ✅     ✅            ✅         Group rescan on ambiguous delete
Group-rescan aggregates (STRING_AGG, ARRAY_AGG, …)   ✅     ⚠️            ⚠️         Warning emitted at creation time
Window functions (ROW_NUMBER, RANK, LAG, LEAD, …)    ✅     ✅            ✅         Partition-scoped recompute
CTEs (non-recursive and WITH RECURSIVE)              ✅     ✅            ✅         Semi-naive / DRed strategies
TopK (ORDER BY … LIMIT)                              ✅     ✅            ✅         Scoped recomputation
LATERAL / set-returning functions / JSON_TABLE       ✅     ✅            ✅         Row-scoped re-execution
ST-to-ST dependencies                                ✅     ✅            ✅         Differential via change buffers
VOLATILE functions                                   ❌     ❌            ❌         Rejected at creation time
Legend: ✅ fully supported — ⚠️ supported with caveats — ❌ not supported

For details on each operator's delta strategy, auxiliary columns, and known limitations, see the full Operator Support Matrix.


Expression Support

pg_trickle's DVM parser supports a wide range of SQL expressions in defining queries. All expressions work in both FULL and DIFFERENTIAL modes.

Conditional Expressions

Expression                     Example                                                       Notes
CASE WHEN … THEN … ELSE … END  CASE WHEN amount > 100 THEN 'high' ELSE 'low' END             Searched CASE
CASE <expr> WHEN … THEN … END  CASE status WHEN 1 THEN 'active' WHEN 2 THEN 'inactive' END  Simple CASE
COALESCE(a, b, …)              COALESCE(phone, email, 'unknown')                             Returns first non-NULL argument
NULLIF(a, b)                   NULLIF(divisor, 0)                                            Returns NULL if a = b
GREATEST(a, b, …)              GREATEST(score1, score2, score3)                              Returns the largest value
LEAST(a, b, …)                 LEAST(price, max_price)                                       Returns the smallest value

Comparison Operators

Expression            Example                      Notes
IN (list)             category IN ('A', 'B', 'C')  Also supports NOT IN
BETWEEN a AND b       price BETWEEN 10 AND 100     Also supports NOT BETWEEN
IS DISTINCT FROM      a IS DISTINCT FROM b         NULL-safe inequality
IS NOT DISTINCT FROM  a IS NOT DISTINCT FROM b     NULL-safe equality
SIMILAR TO            name SIMILAR TO '%pattern%'  SQL regex matching
op ANY(array)         id = ANY(ARRAY[1,2,3])       Array comparison
op ALL(array)         score > ALL(ARRAY[50,60])    Array comparison

Boolean Tests

Expression      Example
IS TRUE         active IS TRUE
IS NOT TRUE     flag IS NOT TRUE
IS FALSE        completed IS FALSE
IS NOT FALSE    valid IS NOT FALSE
IS UNKNOWN      result IS UNKNOWN
IS NOT UNKNOWN  flag IS NOT UNKNOWN

SQL Value Functions

Function | Description
CURRENT_DATE | Current date
CURRENT_TIME | Current time with time zone
CURRENT_TIMESTAMP | Current date and time with time zone
LOCALTIME | Current time without time zone
LOCALTIMESTAMP | Current date and time without time zone
CURRENT_ROLE | Current role name
CURRENT_USER | Current user name
SESSION_USER | Session user name
CURRENT_CATALOG | Current database name
CURRENT_SCHEMA | Current schema name

Array and Row Expressions

Expression | Example | Notes
ARRAY[…] | ARRAY[1, 2, 3] | Array constructor
ROW(…) | ROW(a, b, c) | Row constructor
Array subscript | arr[1] | Array element access
Field access | (rec).field | Composite type field access
Star indirection | (data).* | Expand all fields

Subquery Expressions

Subqueries are supported in the WHERE clause and SELECT list. They are parsed into dedicated DVM operators with specialized delta computation for incremental maintenance.

Expression | Example | DVM Operator
EXISTS (subquery) | WHERE EXISTS (SELECT 1 FROM orders WHERE orders.cid = c.id) | Semi-Join
NOT EXISTS (subquery) | WHERE NOT EXISTS (SELECT 1 FROM orders WHERE orders.cid = c.id) | Anti-Join
IN (subquery) | WHERE id IN (SELECT product_id FROM order_items) | Semi-Join (rewritten as equality)
NOT IN (subquery) | WHERE id NOT IN (SELECT product_id FROM order_items) | Anti-Join
ALL (subquery) | WHERE price > ALL (SELECT price FROM competitors) | Anti-Join (NULL-safe)
Scalar subquery (SELECT) | SELECT (SELECT max(price) FROM products) AS max_p | Scalar Subquery

Notes:

  • EXISTS and IN (subquery) in the WHERE clause are transformed into semi-join operators. NOT EXISTS and NOT IN (subquery) become anti-join operators.
  • Multi-column IN (subquery) is not supported (e.g., WHERE (a, b) IN (SELECT x, y FROM ...)). Rewrite as WHERE EXISTS (SELECT 1 FROM ... WHERE a = x AND b = y) for equivalent semantics; see the example after this list.
  • Multiple subqueries in the same WHERE clause are supported when combined with AND. Subqueries combined with OR are also supported — they are automatically rewritten into UNION of separate filtered queries.
  • Scalar subqueries in the SELECT list are supported as long as they return exactly one row and one column.
  • ALL (subquery) is supported — see the worked example below.
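
For example, the multi-column IN rewrite mentioned above looks like this (t, src, and the column names are illustrative):

-- Not supported:
--   SELECT * FROM t WHERE (a, b) IN (SELECT x, y FROM src);

-- Supported equivalent:
SELECT *
FROM t
WHERE EXISTS (
    SELECT 1 FROM src
    WHERE src.x = t.a AND src.y = t.b
);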

ALL (subquery) — Worked Example

ALL (subquery) tests whether a comparison holds against every row returned by the subquery. pg_trickle rewrites it to a NULL-safe anti-join so it can be maintained incrementally.

Comparison operators supported: >, >=, <, <=, =, <>

Example — products cheaper than all competitors:

-- Source tables
CREATE TABLE products (
    id    INT PRIMARY KEY,
    name  TEXT,
    price NUMERIC
);
CREATE TABLE competitor_prices (
    id          INT PRIMARY KEY,
    product_id  INT,
    price       NUMERIC
);

-- Sample data
INSERT INTO products VALUES (1, 'Widget', 9.99), (2, 'Gadget', 24.99), (3, 'Gizmo', 14.99);
INSERT INTO competitor_prices VALUES (1, 1, 12.99), (2, 1, 11.50), (3, 2, 19.99), (4, 3, 14.99);

-- Stream table: find products priced below ALL competitor prices
SELECT pgtrickle.create_stream_table(
    name  => 'cheapest_products',
    query => $$
        SELECT p.id, p.name, p.price
        FROM products p
        WHERE p.price < ALL (
            SELECT cp.price
            FROM competitor_prices cp
            WHERE cp.product_id = p.id
        )
    $$,
    schedule => '1m'
);

Result: Widget (9.99 < all of [12.99, 11.50]) is included. Gadget (24.99 ≮ 19.99) is excluded. Gizmo (14.99 ≮ 14.99) is excluded.

How pg_trickle handles it internally:

  1. WHERE price < ALL (SELECT ...) is parsed into an anti-join with a NULL-safe condition.
  2. The condition NOT (x op col) is wrapped as (col IS NULL OR NOT (x op col)) to correctly handle NULL values in the subquery — if any subquery row is NULL, the ALL comparison fails (standard SQL semantics).
  3. The anti-join uses the same incremental delta computation as NOT EXISTS, so changes to either products or competitor_prices are propagated efficiently.

Other common patterns:

-- Employees whose salary meets or exceeds all department maximums
WHERE salary >= ALL (SELECT max_salary FROM department_caps)

-- Orders with ratings better than all thresholds
WHERE rating > ALL (SELECT min_rating FROM quality_thresholds)

Auto-Rewrite Pipeline

pg_trickle transparently rewrites certain SQL constructs before parsing. These rewrites are applied automatically and require no user action:

Order | Trigger | Rewrite
#0 | View references in FROM | Inline view body as subquery
#1 | DISTINCT ON (expr) | Convert to ROW_NUMBER() OVER (PARTITION BY expr ORDER BY ...) = 1 subquery
#2 | GROUPING SETS / CUBE / ROLLUP | Decompose into UNION ALL of separate GROUP BY queries
#3 | Scalar subquery in WHERE | Convert to CROSS JOIN with inline view
#4 | Correlated scalar subquery in SELECT | Convert to LEFT JOIN with grouped inline view
#5 | EXISTS/IN inside OR | Split into UNION of separate filtered queries
#6 | Multiple PARTITION BY clauses | Split into joined subqueries, one per distinct partitioning
#7 | Window functions inside expressions | Lift to inner subquery with synthetic __pgt_wf_N columns (see below)
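
As an illustration of rewrite #1, a DISTINCT ON query (orders and its columns are illustrative) is converted along these lines:

-- What you write:
SELECT DISTINCT ON (customer_id) customer_id, created_at, amount
FROM orders
ORDER BY customer_id, created_at DESC;

-- Conceptual rewritten form:
SELECT customer_id, created_at, amount
FROM (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY customer_id
                              ORDER BY created_at DESC) AS rn
    FROM orders
) ranked
WHERE rn = 1;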

Window Functions in Expressions (Auto-Rewrite)

Window functions nested inside expressions (e.g., CASE WHEN ROW_NUMBER() ..., ABS(RANK() OVER (...) - 5)) are automatically rewritten. pg_trickle lifts each window function call into a synthetic column in an inner subquery, then applies the original expression in the outer SELECT.

This rewrite is transparent — you write your query naturally and pg_trickle handles it:

Your query:

SELECT
    id,
    name,
    CASE WHEN ROW_NUMBER() OVER (PARTITION BY dept ORDER BY salary DESC) = 1
         THEN 'top earner'
         ELSE 'other'
    END AS rank_label
FROM employees

What pg_trickle generates internally:

SELECT
    "__pgt_wf_inner".id,
    "__pgt_wf_inner".name,
    CASE WHEN "__pgt_wf_inner"."__pgt_wf_1" = 1
         THEN 'top earner'
         ELSE 'other'
    END AS "rank_label"
FROM (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY dept ORDER BY salary DESC) AS "__pgt_wf_1"
    FROM employees
) "__pgt_wf_inner"

The inner subquery produces the window function result as a plain column (__pgt_wf_1), which the DVM engine can maintain incrementally using its existing window function support. The outer expression is then a simple column reference.

More examples:

-- Arithmetic with window functions
SELECT id, ABS(RANK() OVER (ORDER BY score) - 5) AS adjusted_rank
FROM players

-- COALESCE with window function
SELECT id, COALESCE(LAG(value) OVER (ORDER BY ts), 0) AS prev_value
FROM sensor_readings

-- Multiple window functions in expressions
SELECT id,
       ROW_NUMBER() OVER (ORDER BY created_at) * 100 AS seq,
       SUM(amount) OVER (ORDER BY created_at) / COUNT(*) OVER (ORDER BY created_at) AS running_avg
FROM transactions

All of these are handled automatically — each distinct window function call is extracted to its own __pgt_wf_N synthetic column.

HAVING Clause

HAVING is fully supported. The filter predicate is applied on top of the aggregate delta computation — groups that pass the HAVING condition are included in the stream table.

SELECT pgtrickle.create_stream_table(
    name     => 'big_departments',
    query    => 'SELECT department, COUNT(*) AS cnt FROM employees GROUP BY department HAVING COUNT(*) > 10',
    schedule => '1m'
);

Tables Without Primary Keys (Keyless Tables)

Tables without a primary key can be used as sources. pg_trickle generates a content-based row identity by hashing all column values using pg_trickle_hash_multi(). This allows DIFFERENTIAL mode to work, though at the cost of being unable to distinguish truly duplicate rows (rows with identical values in all columns).

-- No primary key — pg_trickle uses content hashing for row identity
CREATE TABLE events (ts TIMESTAMPTZ, payload JSONB);
SELECT pgtrickle.create_stream_table(
    name     => 'event_summary',
    query    => 'SELECT payload->>''type'' AS event_type, COUNT(*) FROM events GROUP BY 1',
    schedule => '1m'
);

Known Limitation — Duplicate Rows in Keyless Tables (G7.1)

When a keyless table contains exact duplicate rows (identical values in every column), content-based hashing produces the same __pgt_row_id for each copy. Consequences:

  • INSERT of a duplicate row may appear as a no-op (the hash already exists in the stream table).
  • DELETE of one copy may delete all copies (the MERGE matches on __pgt_row_id, hitting every duplicate).
  • Aggregate counts over keyless tables with duplicates may drift from the true query result.

Recommendation: Add a PRIMARY KEY or at least a UNIQUE constraint to source tables used in DIFFERENTIAL mode. This eliminates the ambiguity entirely. If duplicates are expected and correctness matters, use FULL refresh mode, which always recomputes from scratch.
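
For example, a surrogate key can be added to the keyless events table from above with plain PostgreSQL DDL:

ALTER TABLE events
    ADD COLUMN event_id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY;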

Volatile Function Detection

pg_trickle checks all functions and operators in the defining query against pg_proc.provolatile:

  • VOLATILE functions (e.g., random(), clock_timestamp(), gen_random_uuid()) are rejected in DIFFERENTIAL and IMMEDIATE modes because they produce different results on each evaluation, breaking delta correctness.
  • VOLATILE operators — custom operators backed by volatile functions are also detected. The check resolves the operator’s implementation function via pg_operator.oprcode and checks its volatility in pg_proc.
  • STABLE functions (e.g., now(), current_timestamp, current_setting()) produce a warning in DIFFERENTIAL and IMMEDIATE modes — they are consistent within a single refresh but may differ between refreshes.
  • IMMUTABLE functions are always safe and produce no warnings.

FULL mode accepts all volatility classes since it re-evaluates the entire query each time.
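
A sketch of both outcomes, reusing the orders table from earlier examples (the first call assumes the stream table would otherwise be maintained differentially):

-- Rejected under the default policy: random() is VOLATILE.
SELECT pgtrickle.create_stream_table(
    name     => 'sampled_orders',
    query    => 'SELECT * FROM orders WHERE random() < 0.1',
    schedule => '1m'
);

-- Accepted: FULL mode re-evaluates the whole query on every refresh.
SELECT pgtrickle.create_stream_table(
    name         => 'sampled_orders',
    query        => 'SELECT * FROM orders WHERE random() < 0.1',
    schedule     => '1m',
    refresh_mode => 'FULL'
);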

Volatile Function Policy (VOL-1)

The pg_trickle.volatile_function_policy GUC controls how volatile functions are handled:

Value | Behavior
reject (default) | ERROR — volatile functions are rejected at creation time.
warn | WARNING emitted but creation proceeds. Delta correctness is not guaranteed.
allow | Silent — no warning or error. Use when you understand the implications.

-- Allow volatile functions with a warning
SET pg_trickle.volatile_function_policy = 'warn';

-- Allow volatile functions silently
SET pg_trickle.volatile_function_policy = 'allow';

-- Restore default (reject volatile functions)
SET pg_trickle.volatile_function_policy = 'reject';

COLLATE Expressions

COLLATE clauses on expressions are supported:

SELECT pgtrickle.create_stream_table(
    name     => 'sorted_names',
    query    => 'SELECT name COLLATE "C" AS c_name FROM users',
    schedule => '1m'
);

IS JSON Predicate (PostgreSQL 16+)

The IS JSON predicate validates whether a value is valid JSON. All variants are supported:

-- Filter rows with valid JSON
SELECT pgtrickle.create_stream_table(
    name     => 'valid_json_events',
    query    => 'SELECT id, payload FROM events WHERE payload::text IS JSON',
    schedule => '1m'
);

-- Type-specific checks
SELECT pgtrickle.create_stream_table(
    name         => 'json_objects_only',
    query        => 'SELECT id, data IS JSON OBJECT AS is_obj,
          data IS JSON ARRAY AS is_arr,
          data IS JSON SCALAR AS is_scalar
   FROM json_data',
    schedule     => '1m',
    refresh_mode => 'FULL'
);

Supported variants: IS JSON, IS JSON OBJECT, IS JSON ARRAY, IS JSON SCALAR, IS NOT JSON (all forms), WITH UNIQUE KEYS.

SQL/JSON Constructors (PostgreSQL 16+)

SQL-standard JSON constructor functions are supported in both FULL and DIFFERENTIAL modes:

-- JSON_OBJECT: construct a JSON object from key-value pairs
SELECT pgtrickle.create_stream_table(
    name     => 'user_json',
    query    => 'SELECT id, JSON_OBJECT(''name'' : name, ''age'' : age) AS data FROM users',
    schedule => '1m'
);

-- JSON_ARRAY: construct a JSON array from values
SELECT pgtrickle.create_stream_table(
    name         => 'value_arrays',
    query        => 'SELECT id, JSON_ARRAY(a, b, c) AS arr FROM measurements',
    schedule     => '1m',
    refresh_mode => 'FULL'
);

-- JSON(): parse a text value as JSON
-- JSON_SCALAR(): wrap a scalar value as JSON
-- JSON_SERIALIZE(): serialize a JSON value to text

Note: JSON_ARRAYAGG() and JSON_OBJECTAGG() are SQL-standard aggregate functions fully recognized by the DVM engine. In DIFFERENTIAL mode, they use the group-rescan strategy (affected groups are re-aggregated from source data). The full deparsed SQL is preserved to handle the special key: value, ABSENT ON NULL, ORDER BY, and RETURNING clause syntax.

JSON_TABLE (PostgreSQL 17+)

JSON_TABLE() generates a relational table from JSON data. It is supported in the FROM clause in both FULL and DIFFERENTIAL modes. Internally, it is modeled as a LateralFunction.

-- Extract structured data from a JSON column
SELECT pgtrickle.create_stream_table(
    name     => 'user_phones',
    query    => $$SELECT u.id, j.phone_type, j.phone_number
    FROM users u,
         JSON_TABLE(u.contact_info, '$.phones[*]'
           COLUMNS (
             phone_type TEXT PATH '$.type',
             phone_number TEXT PATH '$.number'
           )
         ) AS j$$,
    schedule => '1m'
);

Supported column types:

  • Regular columns — name TYPE PATH '$.path' (with optional ON ERROR/ON EMPTY behaviors)
  • EXISTS columns — name TYPE EXISTS PATH '$.path'
  • Formatted columns — name TYPE FORMAT JSON PATH '$.path'
  • Nested columns — NESTED PATH '$.path' COLUMNS (...)

The PASSING clause is also supported for passing named variables to path expressions.

Unsupported Expression Types

The following are rejected with clear error messages rather than producing broken SQL:

Expression | Error Behavior | Suggested Rewrite
TABLESAMPLE | Rejected — stream tables materialize the complete result set | Use WHERE random() < 0.1 if sampling is needed (random() is volatile, so this requires FULL mode or a relaxed volatile_function_policy)
FOR UPDATE / FOR SHARE | Rejected — stream tables do not support row-level locking | Remove the locking clause
Unknown node types | Rejected with type information | —

Note: Window functions inside expressions (e.g., CASE WHEN ROW_NUMBER() OVER (...) ...) were unsupported in earlier versions but are now automatically rewritten — see Auto-Rewrite Pipeline § Window Functions in Expressions.


Restrictions & Interoperability

Stream tables are standard PostgreSQL heap tables stored in the pgtrickle schema with an additional __pgt_row_id BIGINT PRIMARY KEY column managed by the refresh engine. This section describes what you can and cannot do with them.

Referencing Other Stream Tables

Stream tables can reference other stream tables in their defining query. This creates a dependency edge in the internal DAG, and the scheduler refreshes upstream tables before downstream ones. By default, cycles are detected and rejected at creation time.

When pg_trickle.allow_circular = true, circular dependencies are allowed for stream tables that use DIFFERENTIAL refresh mode and have monotone defining queries (no aggregates, EXCEPT, window functions, or NOT EXISTS/NOT IN). Cycle members are assigned an scc_id and the scheduler iterates them to a fixed point. Non-monotone operators are rejected because they prevent convergence.
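
A minimal sketch of opting in, then inspecting cycle membership through the monitoring view described later in this reference:

-- Allow cycles (DIFFERENTIAL mode with monotone queries only):
SET pg_trickle.allow_circular = true;

-- After creating mutually dependent stream tables, check SCC assignment:
SELECT pgt_name, scc_id, last_fixpoint_iterations
FROM pgtrickle.pg_stat_stream_tables
WHERE scc_id IS NOT NULL;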

-- ST1 reads from a base table
SELECT pgtrickle.create_stream_table(
    name     => 'order_totals',
    query    => 'SELECT customer_id, SUM(amount) AS total FROM orders GROUP BY customer_id',
    schedule => '1m'
);

-- ST2 reads from ST1
SELECT pgtrickle.create_stream_table(
    name     => 'big_customers',
    query    => 'SELECT customer_id, total FROM pgtrickle.order_totals WHERE total > 1000',
    schedule => '1m'
);

Views as Sources in Defining Queries

PostgreSQL views can be used as source tables in a stream table's defining query. Views are automatically inlined — replaced with their underlying SELECT definition as subqueries — so CDC triggers land on the actual base tables.

CREATE VIEW active_orders AS
  SELECT * FROM orders WHERE status = 'active';

-- This works (views are auto-inlined):
SELECT pgtrickle.create_stream_table(
    name     => 'order_summary',
    query    => 'SELECT customer_id, COUNT(*) FROM active_orders GROUP BY customer_id',
    schedule => '1m'
);
-- Internally, 'active_orders' is replaced with:
--   (SELECT ... FROM orders WHERE status = 'active') AS active_orders

Nested views (view → view → table) are fully expanded via a fixpoint loop. Column-renaming views (CREATE VIEW v(a, b) AS ...) work correctly — pg_get_viewdef() produces the proper column aliases.

When a view is inlined, the user's original SQL is stored in the original_query catalog column for reinit and introspection. The defining_query column contains the expanded (post-inlining) form.

DDL hooks: CREATE OR REPLACE VIEW on a view that was inlined into a stream table marks that ST for reinit. DROP VIEW sets affected STs to ERROR status.

Materialized views are rejected in DIFFERENTIAL mode — their stale-snapshot semantics prevent CDC triggers from tracking changes. Use the underlying query directly, or switch to FULL mode. In FULL mode, materialized views are allowed (no CDC needed).

Foreign tables are rejected in DIFFERENTIAL mode — row-level triggers cannot be created on foreign tables. Use FULL mode instead.

Partitioned Tables as Sources

Partitioned tables are fully supported as source tables in both FULL and DIFFERENTIAL modes. CDC triggers are installed on the partitioned parent table, and PostgreSQL 13+ ensures the trigger fires for all DML routed to child partitions. The change buffer uses the parent table's OID (pgtrickle_changes.changes_<parent_oid>).

CREATE TABLE orders (
    id INT, region TEXT, amount NUMERIC
) PARTITION BY LIST (region);
CREATE TABLE orders_us PARTITION OF orders FOR VALUES IN ('US');
CREATE TABLE orders_eu PARTITION OF orders FOR VALUES IN ('EU');

-- Works — inserts into any partition are captured:
SELECT pgtrickle.create_stream_table(
    name     => 'order_summary',
    query    => 'SELECT region, SUM(amount) FROM orders GROUP BY region',
    schedule => '1m'
);

ATTACH PARTITION detection: When a new partition is attached to a tracked source table via ALTER TABLE parent ATTACH PARTITION child ..., pg_trickle's DDL event trigger detects the change in partition structure and automatically marks affected stream tables for reinitialize. This ensures pre-existing rows in the newly attached partition are included on the next refresh. DETACH PARTITION is also detected and triggers reinitialization.
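
Continuing the example above, attaching a new region partition is picked up automatically (orders_apac is illustrative):

CREATE TABLE orders_apac (id INT, region TEXT, amount NUMERIC);
ALTER TABLE orders ATTACH PARTITION orders_apac FOR VALUES IN ('APAC');
-- The DDL event trigger marks order_summary for reinitialization; any rows
-- already present in orders_apac are included on the next refresh.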

WAL mode: When using WAL-based CDC (cdc_mode = 'wal'), publications for partitioned source tables are created with publish_via_partition_root = true. This ensures changes from child partitions are published under the parent table's identity, matching trigger-mode CDC behavior.

Note: pg_trickle targets PostgreSQL 18. On PostgreSQL 12 or earlier (not supported), parent triggers do not fire for partition-routed rows, which would cause silent data loss.

Foreign Tables as Sources

Foreign tables (via postgres_fdw or other FDWs) can be used as stream table sources with these constraints:

CDC Method | Supported? | Why
Trigger-based | ❌ No | Foreign tables don't support row-level triggers
WAL-based | ❌ No | Foreign tables don't generate local WAL entries
FULL refresh | ✅ Yes | Re-executes the remote query each cycle
Polling-based | ✅ Yes | When pg_trickle.foreign_table_polling = on

-- Foreign table source — FULL refresh only
SELECT pgtrickle.create_stream_table(
    name         => 'remote_summary',
    query        => 'SELECT region, SUM(amount) FROM remote_orders GROUP BY region',
    schedule     => '5m',
    refresh_mode => 'FULL'
);

When pg_trickle detects a foreign table source, it emits an INFO message explaining the constraints. If you attempt to use DIFFERENTIAL mode without polling enabled, creation succeeds but every refresh falls back to FULL.

Polling-based CDC creates a local snapshot table and computes EXCEPT ALL differences on each refresh. Enable with:

SET pg_trickle.foreign_table_polling = on;

For a complete step-by-step setup guide, see the Foreign Table Sources tutorial.

IMMEDIATE Mode Query Restrictions

The IMMEDIATE refresh mode supports the same SQL constructs as the DIFFERENTIAL and FULL modes. Queries are validated at stream table creation and when switching to IMMEDIATE mode via alter_stream_table.

Supported in IMMEDIATE mode:

  • Simple SELECT ... FROM table scans, filters, projections
  • JOIN (INNER, LEFT, FULL OUTER)
  • GROUP BY with standard aggregates (COUNT, SUM, AVG, MIN, MAX, etc.)
  • DISTINCT
  • Non-recursive WITH (CTEs)
  • UNION ALL, INTERSECT, EXCEPT
  • EXISTS / IN subqueries (SemiJoin, AntiJoin)
  • Subqueries in FROM
  • Window functions (ROW_NUMBER, RANK, DENSE_RANK, etc.)
  • LATERAL subqueries
  • LATERAL set-returning functions (unnest(), jsonb_array_elements(), etc.)
  • Scalar subqueries in SELECT
  • Cascading IMMEDIATE stream tables (ST depending on another IMMEDIATE ST)
  • Recursive CTEs (WITH RECURSIVE) — uses semi-naive evaluation (INSERT-only) or Delete-and-Rederive (DELETE/UPDATE); bounded by pg_trickle.ivm_recursive_max_depth (default 100) to guard against infinite loops from cyclic data

Not yet supported in IMMEDIATE mode:

None — all constructs that work in DIFFERENTIAL mode are now also available in IMMEDIATE mode.

Notes on WITH RECURSIVE in IMMEDIATE mode:

  • A __pgt_depth counter is injected into the generated semi-naive SQL. Propagation stops when the counter reaches ivm_recursive_max_depth (default 100). Raise this GUC for deeper hierarchies or set it to 0 to disable the guard; see the sketch after this list.
  • A WARNING is emitted at stream table creation time reminding operators to monitor for stack depth limit exceeded errors on very deep hierarchies.
  • Non-linear recursion (multiple self-references) is rejected — PostgreSQL itself enforces this restriction.
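
A minimal sketch of an IMMEDIATE-mode recursive stream table, assuming an employees(id, manager_id, name) table:

-- Raise the recursion guard for a deep hierarchy.
SET pg_trickle.ivm_recursive_max_depth = 500;

SELECT pgtrickle.create_stream_table(
    name         => 'org_tree',
    query        => $$
        WITH RECURSIVE tree AS (
            SELECT id, manager_id, name FROM employees WHERE manager_id IS NULL
            UNION ALL
            SELECT e.id, e.manager_id, e.name
            FROM employees e
            JOIN tree t ON e.manager_id = t.id
        )
        SELECT * FROM tree
    $$,
    refresh_mode => 'IMMEDIATE'
);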

Attempting to create a stream table with an unsupported construct produces a clear error message.

Logical Replication Targets

Tables that receive data via logical replication require special consideration. Changes arriving via replication do not fire normal row-level triggers, which means CDC triggers will miss those changes.

pg_trickle emits a WARNING at stream table creation time if any source table is detected as a logical replication target (via pg_subscription_rel).

Workarounds:

  • Use cdc_mode = 'wal' for WAL-based CDC that captures all changes regardless of origin; see the sketch after this list.
  • Use FULL refresh mode, which recomputes entirely from the current table state.
  • Set a frequent refresh schedule with FULL mode to limit staleness.
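
A sketch of the WAL-based workaround; this assumes cdc_mode is passed at creation time, matching the CDC modes discussed elsewhere in this guide (subscribed_orders is illustrative):

SELECT pgtrickle.create_stream_table(
    name     => 'replicated_order_counts',
    query    => 'SELECT region, COUNT(*) FROM subscribed_orders GROUP BY region',
    schedule => '1m',
    cdc_mode => 'wal'
);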

Views on Stream Tables

PostgreSQL views can reference stream tables. The view reflects the data as of the most recent refresh.

CREATE VIEW top_customers AS
SELECT customer_id, total
FROM pgtrickle.order_totals
WHERE total > 500
ORDER BY total DESC;

Materialized Views on Stream Tables

Materialized views can reference stream tables, though this is typically redundant (both are physical snapshots of a query). The materialized view requires its own REFRESH MATERIALIZED VIEW — it does not auto-refresh when the stream table refreshes.

Logical Replication of Stream Tables

Stream tables can be published for logical replication like any ordinary table:

-- On publisher
CREATE PUBLICATION my_pub FOR TABLE pgtrickle.order_totals;

-- On subscriber
CREATE SUBSCRIPTION my_sub
  CONNECTION 'host=... dbname=...'
  PUBLICATION my_pub;

Caveats:

  • The __pgt_row_id column is replicated (it is the primary key), which is an internal implementation detail.
  • The subscriber receives materialized data, not the defining query. Refreshes on the publisher propagate as normal DML via logical replication.
  • Do not install pg_trickle on the subscriber and attempt to refresh the replicated table — it will have no CDC triggers or catalog entries.
  • The internal change buffer tables (pgtrickle_changes.changes_<oid>) and catalog tables are not published by default; subscribers only receive the final output.

Known Delta Computation Limitations

The following edge cases concern DIFFERENTIAL-mode delta computation under specific data mutation patterns, including one that has since been fixed, plus a hard limit on query shape. They have no effect on FULL mode.

JOIN Key Column Change + Simultaneous Right-Side Delete — Fixed (EC-01)

Resolved in v0.14.0. This limitation no longer exists — the delta query now uses a pre-change right snapshot (R₀) for DELETE deltas, ensuring stale rows are correctly removed even when the join partner is simultaneously deleted.

The fix splits Part 1 of the JOIN delta into two arms:

  • Part 1a (inserts): ΔL_inserts ⋈ R₁ — uses current right state
  • Part 1b (deletes): ΔL_deletes ⋈ R₀ — uses pre-change right state

R₀ is reconstructed as R_current EXCEPT ALL ΔR_inserts UNION ALL ΔR_deletes (or via NOT EXISTS anti-join for simple Scan nodes). This ensures the DELETE half always finds the old join partner, even if that partner was deleted in the same cycle.
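
Conceptually, the reconstruction corresponds to SQL of this shape (table and column names are illustrative; the real generated SQL reads the change buffers):

-- R0 = current right state, minus this cycle's inserts, plus this cycle's deletes
SELECT id, price FROM competitor_prices            -- R1 (current state)
EXCEPT ALL
SELECT id, price FROM right_deltas WHERE op = 'I'  -- ΔR inserts
UNION ALL
SELECT id, price FROM right_deltas WHERE op = 'D'; -- ΔR deletes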

The fix applies to INNER JOIN, LEFT JOIN, and FULL OUTER JOIN delta operators. See DVM_OPERATORS.md for implementation details.

CUBE/ROLLUP Expansion Limit

CUBE over N columns generates 2^N grouping-set branches (a UNION ALL of 2^N queries). pg_trickle rejects CUBE/ROLLUP that would produce more than 64 branches to prevent runaway memory usage during query generation. Use explicit GROUPING SETS(...) instead:

-- Rejected: CUBE(a, b, c, d, e, f, g) would generate 128 branches
-- Use instead:
SELECT pgtrickle.create_stream_table(
    name     => 'multi_dim',
    query    => 'SELECT a, b, c, SUM(v) FROM t
   GROUP BY GROUPING SETS ((a, b, c), (a, b), (a), ())',
    schedule => '5m'
);

What Is NOT Allowed

Operation | Restriction | Reason
Direct DML (INSERT, UPDATE, DELETE) | ❌ Not supported | Stream table contents are managed exclusively by the refresh engine.
Direct DDL (ALTER TABLE) | ❌ Not supported | Use pgtrickle.alter_stream_table() to change the defining query or schedule.
Foreign keys referencing or from a stream table | ❌ Not supported | The refresh engine performs bulk MERGE operations that do not respect FK ordering.
User-defined triggers on stream tables | ✅ Supported (DIFFERENTIAL) | In DIFFERENTIAL mode, the refresh engine decomposes changes into explicit DELETE + UPDATE + INSERT statements so triggers fire with correct TG_OP, OLD, and NEW. Row-level triggers are suppressed during FULL refresh. Controlled by pg_trickle.user_triggers GUC (default: auto).
TRUNCATE on a stream table | ❌ Not supported | Use pgtrickle.refresh_stream_table() to reset data.

Tip: The __pgt_row_id column is visible but should be ignored by consuming queries — it is an implementation detail used for delta MERGE operations.

Internal __pgt_* Auxiliary Columns

Stream tables may contain additional hidden columns whose names begin with __pgt_. These are managed exclusively by the refresh engine — they are not part of the user-visible schema and should never be read or written by application queries.

__pgt_row_id — Row identity (always present)

Every stream table has a BIGINT PRIMARY KEY column named __pgt_row_id. It is a content hash of all output columns (xxHash3-128 with Fibonacci-mixing of multiple column hashes), updated by the refresh engine on every MERGE. It is used as the MERGE join key to detect inserts/updates/deletes.

__pgt_count — Group multiplicity (aggregates & DISTINCT)

Added when the defining query contains GROUP BY, DISTINCT, UNION ALL ... GROUP BY, or any aggregate expression that requires tracking how many source rows contribute to each output row.

Type | Triggers
BIGINT NOT NULL DEFAULT 0 | GROUP BY, DISTINCT, COUNT(*), SUM(...), AVG(...), STDDEV(...), VAR(...), UNION deduplication

__pgt_count_l / __pgt_count_r — Dual multiplicity (INTERSECT / EXCEPT)

Added when the defining query contains INTERSECT or EXCEPT. Stores independently the left-branch and right-branch row counts for Z-set delta algebra.

Type | Triggers
BIGINT NOT NULL DEFAULT 0 each | INTERSECT, INTERSECT ALL, EXCEPT, EXCEPT ALL

__pgt_aux_sum_<alias> / __pgt_aux_count_<alias> — Running totals for AVG

Pairs of auxiliary columns added for each AVG(expr) in the query. Instead of recomputing the average from scratch on each delta, the refresh engine maintains a running sum and count and derives the average algebraically.

Type | Triggers
NUMERIC NOT NULL DEFAULT 0 (sum), BIGINT NOT NULL DEFAULT 0 (count) | Any AVG(expr) in a GROUP BY query

Named __pgt_aux_sum_<output_alias> and __pgt_aux_count_<output_alias>, where <output_alias> is the column alias for the AVG expression in the SELECT list.

__pgt_aux_sum2_<alias> — Sum-of-squares for STDDEV / VARIANCE

Added alongside the sum/count pair when the query contains STDDEV, STDDEV_POP, STDDEV_SAMP, VARIANCE, VAR_POP, or VAR_SAMP. Enables O(1) algebraic computation of variance from the Welford identity.

Type | Triggers
NUMERIC NOT NULL DEFAULT 0 | STDDEV(...), STDDEV_POP(...), STDDEV_SAMP(...), VARIANCE(...), VAR_POP(...), VAR_SAMP(...)
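
For reference, a self-contained SQL sketch of the identity, using accumulators for x = {1, 2, 3} (count 3, sum 6, sum of squares 14):

-- Variance derived algebraically from (count, sum, sum of squares).
-- Expected: var_pop = 2/3, var_samp = 1.
SELECT
    sum2 / n - (s / n) ^ 2        AS var_pop,
    (sum2 - s * s / n) / (n - 1)  AS var_samp
FROM (VALUES (3::numeric, 6::numeric, 14::numeric)) AS acc(n, s, sum2);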

__pgt_aux_sumx_* / __pgt_aux_sumy_* / __pgt_aux_sumxy_* / __pgt_aux_sumx2_* / __pgt_aux_sumy2_* — Cross-product accumulators for regression aggregates

Five auxiliary columns per aggregate, used for O(1) algebraic maintenance of the twelve PostgreSQL regression and correlation aggregates.

Type | Triggers
NUMERIC NOT NULL DEFAULT 0 (five columns per aggregate) | CORR(Y,X), COVAR_POP(Y,X), COVAR_SAMP(Y,X), REGR_AVGX(Y,X), REGR_AVGY(Y,X), REGR_COUNT(Y,X), REGR_INTERCEPT(Y,X), REGR_R2(Y,X), REGR_SLOPE(Y,X), REGR_SXX(Y,X), REGR_SXY(Y,X), REGR_SYY(Y,X)

The five columns are named with base prefix __pgt_aux_<kind>_<output_alias> where <kind> is sumx, sumy, sumxy, sumx2, or sumy2. The shared group count is stored in the companion __pgt_aux_count_<output_alias> column.

__pgt_aux_nonnull_<alias> — Non-NULL count for SUM + FULL OUTER JOIN

Added when the query contains SUM(expr) inside a FULL OUTER JOIN aggregate. When matched rows transition to unmatched (null-padded), standard algebraic SUM would produce 0 instead of NULL. This counter tracks how many non-NULL argument values exist in each group; when it reaches zero the SUM is definitively NULL without a full rescan.

Type | Triggers
BIGINT NOT NULL DEFAULT 0 | SUM(expr) in a query with FULL OUTER JOIN at the top level

__pgt_wf_<N> — Window function lift-out (query rewrite)

Added at query-rewrite time (before storage table creation) when the defining query contains window functions embedded inside larger expressions (e.g. CASE WHEN ROW_NUMBER() OVER (...) = 1 THEN ...). The engine lifts the window function to a synthetic inner-subquery column so the outer SELECT can reference it by alias.

Type | Triggers
Inherits the window-function return type | Window function inside an expression — e.g. RANK(), ROW_NUMBER(), DENSE_RANK(), LAG(), LEAD(), etc.

__pgt_depth — Recursion depth counter (recursive CTE)

Present only inside the DVM-generated SQL for recursive CTE queries. Used to limit unbounded recursion in semi-naive evaluation. Not added as a permanent column to the storage table.


Rule of thumb: these columns are transparent to consuming queries and should be treated as invisible. Never SELECT __pgt_* columns in application code; their names, types, and presence may change across minor versions.

Row-Level Security (RLS)

Stream tables follow the same RLS model as PostgreSQL's built-in MATERIALIZED VIEW: the refresh always materializes the full, unfiltered result set. Access control is applied at read time via RLS policies on the stream table itself.

How It Works

Area | Behavior
RLS on source tables | Ignored during refresh. The scheduler runs as superuser; manual refresh_stream_table() and IMMEDIATE-mode triggers bypass RLS via SET LOCAL row_security = off / SECURITY DEFINER. The stream table always contains all rows.
RLS on the stream table | Works naturally. Enable RLS and create policies on the stream table to filter reads per role — exactly as you would on any regular table.
RLS policy changes on source tables | CREATE POLICY, ALTER POLICY, and DROP POLICY on a source table are detected by pg_trickle's DDL event trigger and mark the stream table for reinitialisation.
ENABLE/DISABLE RLS on source tables | ALTER TABLE … ENABLE ROW LEVEL SECURITY and DISABLE ROW LEVEL SECURITY on a source table mark the stream table for reinitialisation.
Change buffer tables | RLS is explicitly disabled on all change buffer tables (pgtrickle_changes.changes_*) so CDC trigger inserts always succeed regardless of schema-level RLS settings.
IMMEDIATE mode | IVM trigger functions are SECURITY DEFINER with a locked search_path, so the delta query always sees all rows. The DML issued by the calling user is still filtered by that user's RLS policies on the source table — only the stream table maintenance runs with elevated privileges.

-- 1. Create a stream table (materializes all rows)
SELECT pgtrickle.create_stream_table(
    name  => 'order_totals',
    query => 'SELECT tenant_id, SUM(amount) AS total FROM orders GROUP BY tenant_id'
);

-- 2. Enable RLS on the stream table
ALTER TABLE pgtrickle.order_totals ENABLE ROW LEVEL SECURITY;

-- 3. Create per-tenant policies
CREATE POLICY tenant_isolation ON pgtrickle.order_totals
    USING (tenant_id = current_setting('app.tenant_id')::INT);

-- 4. Each role sees only its own rows
SET app.tenant_id = '42';
SELECT * FROM pgtrickle.order_totals;  -- only tenant 42's rows

Note: This is identical to how you would apply RLS to a regular MATERIALIZED VIEW. One stream table serves all tenants; per-tenant filtering happens at query time with zero storage duplication.


Views

pgtrickle.stream_tables_info

Status overview with computed staleness information.

SELECT * FROM pgtrickle.stream_tables_info;

Columns include all pgtrickle.pgt_stream_tables columns plus:

Column | Type | Description
staleness | interval | now() - last_refresh_at
stale | bool | true when the scheduler itself is behind (last_refresh_at age exceeds schedule); false when the scheduler is healthy even if source tables have had no writes

pgtrickle.pg_stat_stream_tables

Comprehensive monitoring view combining catalog metadata with aggregate refresh statistics.

SELECT * FROM pgtrickle.pg_stat_stream_tables;

Key columns:

Column | Type | Description
pgt_id | bigint | Stream table ID
pgt_schema / pgt_name | text | Schema and name
status | text | INITIALIZING, ACTIVE, SUSPENDED, ERROR
refresh_mode | text | FULL, DIFFERENTIAL, or IMMEDIATE
data_timestamp | timestamptz | Timestamp of last refresh
staleness | interval | now() - last_refresh_at
stale | bool | true when the scheduler is behind its schedule; false when the scheduler is healthy (quiet source tables do not count as stale)
total_refreshes | bigint | Total refresh count
successful_refreshes | bigint | Successful refresh count
failed_refreshes | bigint | Failed refresh count
avg_duration_ms | float8 | Average refresh duration
consecutive_errors | int | Current error streak
cdc_modes | text[] | Distinct CDC modes across TABLE-type sources (e.g. {wal}, {trigger,wal}, {transitioning,wal})
scc_id | int | SCC group identifier for circular dependencies (NULL if not in a cycle)
last_fixpoint_iterations | int | Number of fixpoint iterations in the last SCC convergence (NULL if not cyclic)

pgtrickle.quick_health

Single-row health summary for dashboards and alerting. Returns the overall health status of the pg_trickle extension at a glance.

SELECT * FROM pgtrickle.quick_health;

Column | Type | Description
total_stream_tables | bigint | Total number of stream tables
error_tables | bigint | Stream tables with status = 'ERROR' or consecutive_errors > 0
stale_tables | bigint | Stream tables whose data is older than their schedule interval
scheduler_running | boolean | Whether a pg_trickle scheduler backend is detected in pg_stat_activity
status | text | Overall status: EMPTY, OK, WARNING, or CRITICAL

Status values:

  • EMPTY — No stream tables exist.
  • OK — All stream tables are healthy and up-to-date.
  • WARNING — Some tables have errors or are stale.
  • CRITICAL — At least one stream table is SUSPENDED.
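
For example, a monitoring check might poll for anything non-OK:

SELECT status, error_tables, stale_tables, scheduler_running
FROM pgtrickle.quick_health
WHERE status IN ('WARNING', 'CRITICAL');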

pgtrickle.pgt_cdc_status

Convenience view for inspecting the CDC mode and WAL slot state of every TABLE-type source for all stream tables. Useful for monitoring in-progress TRIGGER→WAL transitions.

SELECT * FROM pgtrickle.pgt_cdc_status;

Column | Type | Description
pgt_schema | text | Schema of the stream table
pgt_name | text | Name of the stream table
source_relid | oid | OID of the source table
source_name | text | Name of the source table
source_schema | text | Schema of the source table
cdc_mode | text | Current CDC mode: trigger, transitioning, or wal
slot_name | text | Replication slot name (NULL for trigger mode)
decoder_confirmed_lsn | pg_lsn | Last WAL position decoded (NULL for trigger mode)
transition_started_at | timestamptz | When the trigger→WAL transition began (NULL if not transitioning)

Subscribe to the pgtrickle_cdc_transition NOTIFY channel to receive real-time events when a source moves between CDC modes (payload is a JSON object with source_oid, from, and to fields).
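
A minimal listener session (the payload shown is illustrative):

LISTEN pgtrickle_cdc_transition;
-- Example NOTIFY payload during a trigger→WAL transition:
--   {"source_oid": 16385, "from": "trigger", "to": "wal"}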


Catalog Tables

pgtrickle.pgt_stream_tables

Core metadata for each stream table.

Column | Type | Description
pgt_id | bigserial | Primary key
pgt_relid | oid | OID of the storage table
pgt_name | text | Table name
pgt_schema | text | Schema name
defining_query | text | The SQL query that defines the ST
original_query | text | The user-supplied query before normalization
schedule | text | Refresh schedule (duration or cron expression)
refresh_mode | text | FULL, DIFFERENTIAL, or IMMEDIATE
status | text | INITIALIZING, ACTIVE, SUSPENDED, ERROR
is_populated | bool | Whether the table has been populated
data_timestamp | timestamptz | Timestamp of the data in the ST
frontier | jsonb | Per-source LSN positions (version tracking)
last_refresh_at | timestamptz | When last refreshed
consecutive_errors | int | Current error streak count
needs_reinit | bool | Whether upstream DDL requires reinitialization
auto_threshold | double precision | Per-ST adaptive fallback threshold (overrides GUC)
last_full_ms | double precision | Last FULL refresh duration in milliseconds
functions_used | text[] | Function names used in the defining query (for DDL tracking)
topk_limit | int | LIMIT value for TopK stream tables (NULL if not TopK)
topk_order_by | text | ORDER BY clause SQL for TopK stream tables
topk_offset | int | OFFSET value for paged TopK queries (NULL if not paged)
diamond_consistency | text | Diamond consistency mode: none or atomic
diamond_schedule_policy | text | Diamond schedule policy: fastest or slowest
has_keyless_source | bool | Whether any source table lacks a PRIMARY KEY (EC-06)
function_hashes | text | MD5 hashes of referenced function bodies for change detection (EC-16)
scc_id | int | SCC group identifier for circular dependencies (NULL if not in a cycle)
last_fixpoint_iterations | int | Number of iterations in the last SCC fixpoint convergence (NULL if never iterated)
created_at | timestamptz | Creation timestamp
updated_at | timestamptz | Last modification timestamp

pgtrickle.pgt_dependencies

DAG edges — records which source tables each ST depends on, including CDC mode metadata.

Column | Type | Description
pgt_id | bigint | FK to pgt_stream_tables
source_relid | oid | OID of the source table
source_type | text | TABLE, STREAM_TABLE, VIEW, MATVIEW, or FOREIGN_TABLE
columns_used | text[] | Which columns are referenced
column_snapshot | jsonb | Snapshot of source column metadata at creation time
schema_fingerprint | text | SHA-256 fingerprint of column snapshot for fast equality checks
cdc_mode | text | Current CDC mode: TRIGGER, TRANSITIONING, or WAL
slot_name | text | Replication slot name (WAL/TRANSITIONING modes)
decoder_confirmed_lsn | pg_lsn | WAL decoder's last confirmed position
transition_started_at | timestamptz | When the trigger→WAL transition started

pgtrickle.pgt_refresh_history

Audit log of all refresh operations.

Column | Type | Description
refresh_id | bigserial | Primary key
pgt_id | bigint | FK to pgt_stream_tables
data_timestamp | timestamptz | Data timestamp of the refresh
start_time | timestamptz | When the refresh started
end_time | timestamptz | When it completed
action | text | NO_DATA, FULL, DIFFERENTIAL, REINITIALIZE, SKIP
rows_inserted | bigint | Rows inserted
rows_deleted | bigint | Rows deleted
delta_row_count | bigint | Number of delta rows processed from change buffers
merge_strategy_used | text | Which merge strategy was used (e.g. MERGE, DELETE+INSERT)
was_full_fallback | bool | Whether the refresh fell back to FULL from DIFFERENTIAL
error_message | text | Error message if failed
status | text | RUNNING, COMPLETED, FAILED, SKIPPED
initiated_by | text | What triggered: SCHEDULER, MANUAL, or INITIAL
freshness_deadline | timestamptz | SLA deadline (duration schedules only; NULL for cron)
fixpoint_iteration | int | Iteration of the fixed-point loop (NULL for non-cyclic refreshes)

pgtrickle.pgt_change_tracking

CDC slot tracking per source table.

Column | Type | Description
source_relid | oid | OID of the tracked source table
slot_name | text | Logical replication slot name
last_consumed_lsn | pg_lsn | Last consumed WAL position
tracked_by_pgt_ids | bigint[] | Array of ST IDs depending on this source

pgtrickle.pgt_source_gates

Bootstrap source gate registry. One row per source table that has ever been gated. Only sources with gated = true are actively blocking scheduler refreshes.

Column | Type | Description
source_relid | oid | OID of the gated source table (PK)
gated | boolean | true while the source is gated; false after ungate_source()
gated_at | timestamptz | When the gate was most recently set
ungated_at | timestamptz | When the gate was cleared (NULL if still active)
gated_by | text | Actor that set the gate (e.g. 'gate_source')

pgtrickle.pgt_refresh_groups

User-declared Cross-Source Snapshot Consistency groups (v0.9.0). A refresh group guarantees that all member stream tables are refreshed against a snapshot taken at the same point in time, preventing partial-update visibility (e.g. orders and order_lines both reflecting the same transaction boundary).

Column | Type | Description
group_id | serial | Primary key
group_name | text | Unique human-readable group name
member_oids | oid[] | OIDs of the stream table storage relations that participate in this group
isolation | text | Snapshot isolation level for the group: 'read_committed' (default) or 'repeatable_read'
created_at | timestamptz | When the group was created

Management API

-- Create a refresh group
SELECT pgtrickle.create_refresh_group(
    'orders_snapshot',
    ARRAY['public.orders_summary', 'public.order_lines_summary'],
    'repeatable_read'   -- or 'read_committed' (default)
);

-- List all groups:
SELECT * FROM pgtrickle.refresh_groups();

-- Remove a group:
SELECT pgtrickle.drop_refresh_group('orders_snapshot');

Validation rules:

  • At least 2 member stream tables are required.
  • All members must exist in pgt_stream_tables.
  • No member can appear in more than one refresh group.
  • Valid isolation levels: 'read_committed' (default), 'repeatable_read'.

Bootstrap Source Gating (v0.5.0)

These functions let operators pause and resume scheduler-driven refreshes for individual source tables — useful during large bulk loads or ETL windows.

pgtrickle.gate_source(source TEXT)

Mark a source table as gated. The scheduler will skip any stream table that reads from this source until ungate_source() is called.

SELECT pgtrickle.gate_source('my_schema.big_source');

Manual refresh_stream_table() calls are not affected by gates.

pgtrickle.ungate_source(source TEXT)

Clear a gate set by gate_source(). After this call the scheduler resumes normal refresh scheduling for dependent stream tables.

SELECT pgtrickle.ungate_source('my_schema.big_source');

pgtrickle.source_gates()

Table function returning the current gate status for all registered sources.

SELECT * FROM pgtrickle.source_gates();
-- source_table | schema_name | gated | gated_at | ungated_at | gated_by

Column | Type | Description
source_table | text | Relation name
schema_name | text | Schema name
gated | boolean | Whether the source is currently gated
gated_at | timestamptz | When the gate was set
ungated_at | timestamptz | When the gate was cleared (NULL if active)
gated_by | text | Which function set the gate

Typical workflow

-- 1. Gate the source before a bulk load.
SELECT pgtrickle.gate_source('orders');

-- 2. Load historical data (scheduler sits idle for orders-based STs).
COPY orders FROM '/data/historical_orders.csv';

-- 3. Ungate — the next scheduler tick refreshes everything cleanly.
SELECT pgtrickle.ungate_source('orders');

pgtrickle.bootstrap_gate_status() (v0.6.0)

Rich introspection of bootstrap gate lifecycle. Returns the same columns as source_gates() plus computed fields for debugging.

SELECT * FROM pgtrickle.bootstrap_gate_status();
-- source_table | schema_name | gated | gated_at | ungated_at | gated_by | gate_duration | affected_stream_tables

Column | Type | Description
source_table | text | Relation name
schema_name | text | Schema name
gated | boolean | Whether the source is currently gated
gated_at | timestamptz | When the gate was set (updated on re-gate)
ungated_at | timestamptz | When the gate was cleared (NULL if active)
gated_by | text | Which function set the gate
gate_duration | interval | How long the gate has been active (gated: now() - gated_at; ungated: ungated_at - gated_at)
affected_stream_tables | text | Comma-separated list of stream tables whose scheduler refreshes are blocked by this gate

Rows are sorted with currently-gated sources first, then alphabetically.

ETL Coordination Cookbook (v0.6.0)

Step-by-step recipes for common bulk-load patterns using source gating.

Recipe 1 — Single Source Bulk Load

Gate one source table during a large data import. The scheduler pauses refreshes for all stream tables that depend on this source.

-- 1. Gate the source before loading.
SELECT pgtrickle.gate_source('orders');

-- 2. Load the data.  The scheduler sits idle for orders-dependent STs.
COPY orders FROM '/data/orders_2026.csv' WITH (FORMAT csv, HEADER);

-- 3. Ungate.  On the next tick the scheduler refreshes everything cleanly.
SELECT pgtrickle.ungate_source('orders');

Recipe 2 — Coordinated Multi-Source Load

When multiple sources feed into a shared downstream stream table, gate them all before loading so no intermediate refreshes occur.

-- 1. Gate all sources that will be loaded.
SELECT pgtrickle.gate_source('orders');
SELECT pgtrickle.gate_source('order_lines');

-- 2. Load each source (can be parallel, any order).
COPY orders FROM '/data/orders.csv' WITH (FORMAT csv, HEADER);
COPY order_lines FROM '/data/lines.csv' WITH (FORMAT csv, HEADER);

-- 3. Ungate all sources.  The scheduler refreshes downstream STs once.
SELECT pgtrickle.ungate_source('orders');
SELECT pgtrickle.ungate_source('order_lines');

Recipe 3 — Gate + Deferred Initialization

Combine gating with initialize => false to prevent incomplete initial population when sources are loaded asynchronously.

-- 1. Gate sources before creating any stream tables.
SELECT pgtrickle.gate_source('orders');
SELECT pgtrickle.gate_source('order_lines');

-- 2. Create stream tables without initial population.
SELECT pgtrickle.create_stream_table(
    'order_summary',
    'SELECT region, SUM(amount) FROM orders GROUP BY region',
    '1m', initialize => false
);
SELECT pgtrickle.create_stream_table(
    'order_report',
    'SELECT s.region, s.total, l.line_count
     FROM order_summary s
     JOIN (SELECT region, COUNT(*) AS line_count FROM order_lines GROUP BY region) l
       USING (region)',
    '1m', initialize => false
);

-- 3. Run ETL processes (can be in separate transactions).
BEGIN;
  COPY orders FROM 's3://warehouse/orders.parquet';
  SELECT pgtrickle.ungate_source('orders');
COMMIT;

BEGIN;
  COPY order_lines FROM 's3://warehouse/lines.parquet';
  SELECT pgtrickle.ungate_source('order_lines');
COMMIT;

-- 4. Once all sources are ungated, the scheduler initializes and refreshes
--    all stream tables in dependency order.

Recipe 4 — Nightly Batch Pattern

For scheduled ETL that runs overnight, gate sources before the batch starts and ungate after the batch completes.

-- Nightly ETL script:

-- Gate all sources that will be refreshed.
SELECT pgtrickle.gate_source('sales');
SELECT pgtrickle.gate_source('inventory');

-- Truncate and reload (or use COPY, INSERT...SELECT, etc.).
TRUNCATE sales;
COPY sales FROM '/data/nightly/sales.csv' WITH (FORMAT csv, HEADER);

TRUNCATE inventory;
COPY inventory FROM '/data/nightly/inventory.csv' WITH (FORMAT csv, HEADER);

-- All data loaded — ungate and let the scheduler handle the rest.
SELECT pgtrickle.ungate_source('sales');
SELECT pgtrickle.ungate_source('inventory');

-- Verify: check the gate status to confirm everything is ungated.
SELECT * FROM pgtrickle.bootstrap_gate_status();

Recipe 5 — Monitoring During a Gated Load

Use bootstrap_gate_status() to monitor progress when streams appear stalled.

-- Check which sources are currently gated and how long they've been paused.
SELECT source_table, gate_duration, affected_stream_tables
FROM pgtrickle.bootstrap_gate_status()
WHERE gated = true;

-- If a gate has been active too long (e.g. ETL failed), ungate manually.
SELECT pgtrickle.ungate_source('stale_source');

Watermark Gating (v0.7.0)

Watermark gating is a scheduling control for ETL pipelines where multiple source tables are populated by separate jobs that finish at different times. Each ETL job declares "I'm done up to timestamp X", and the scheduler waits until all sources in a group are caught up within a configurable tolerance before refreshing downstream stream tables.

Catalog Tables

pgtrickle.pgt_watermarks

Per-source watermark state. One row per source table that has had a watermark advanced.

Column | Type | Description
source_relid | oid | Source table OID (primary key)
watermark | timestamptz | Current watermark value
updated_at | timestamptz | When the watermark was last advanced
advanced_by | text | User/role that advanced the watermark
wal_lsn_at_advance | text | WAL LSN at the time of advancement

pgtrickle.pgt_watermark_groups

Watermark group definitions. Each group declares that a set of sources must be temporally aligned.

Column | Type | Description
group_id | serial | Auto-generated group ID (primary key)
group_name | text | Unique group name
source_relids | oid[] | Array of source table OIDs in the group
tolerance_secs | float8 | Maximum allowed lag in seconds (default 0)
created_at | timestamptz | When the group was created

pgtrickle.pgt_template_cache

Added in v0.16.0. Cross-backend delta SQL template cache (UNLOGGED). Stores compiled delta query templates so new backends skip the ~45 ms DVM parse+differentiate step. Managed automatically — no user interaction required.

Column | Type | Description
pgt_id | bigint | Stream table ID (PK, FK → pgt_stream_tables)
query_hash | bigint | Hash of the defining query (staleness detection)
delta_sql | text | Delta SQL template with LSN placeholder tokens
columns | text[] | Output column names
source_oids | integer[] | Source table OIDs
is_dedup | boolean | Whether the delta is deduplicated per row ID
key_changed | boolean | Whether __pgt_key_changed column is present
all_algebraic | boolean | Whether all aggregates are algebraically invertible
cached_at | timestamptz | When the entry was last populated

Functions

pgtrickle.advance_watermark(source TEXT, watermark TIMESTAMPTZ)

Signal that a source table's data is complete through the given timestamp.

  • Monotonic: rejects watermarks that go backward (raises error).
  • Idempotent: advancing to the same value is a silent no-op.
  • Transactional: the watermark is part of the caller's transaction.

SELECT pgtrickle.advance_watermark('orders', '2026-03-01 12:05:00+00');

pgtrickle.create_watermark_group(group_name TEXT, sources TEXT[], tolerance_secs FLOAT8 DEFAULT 0)

Create a watermark group. Requires at least 2 sources.

  • tolerance_secs: maximum allowed lag between the most-advanced and least-advanced watermarks. Default 0 means strict alignment.

SELECT pgtrickle.create_watermark_group(
    'order_pipeline',
    ARRAY['orders', 'order_lines'],
    0    -- strict alignment (default)
);

pgtrickle.drop_watermark_group(group_name TEXT)

Remove a watermark group by name.

SELECT pgtrickle.drop_watermark_group('order_pipeline');

pgtrickle.watermarks()

Return the current watermark state for all registered sources.

SELECT * FROM pgtrickle.watermarks();

Column | Type | Description
source_table | text | Source table name
schema_name | text | Schema name
watermark | timestamptz | Current watermark value
updated_at | timestamptz | Last advancement time
advanced_by | text | User that advanced it
wal_lsn | text | WAL LSN at advancement

pgtrickle.watermark_groups()

Return all watermark group definitions.

SELECT * FROM pgtrickle.watermark_groups();

pgtrickle.watermark_status()

Return live alignment status for each watermark group.

SELECT * FROM pgtrickle.watermark_status();

Column | Type | Description
group_name | text | Group name
min_watermark | timestamptz | Least-advanced watermark
max_watermark | timestamptz | Most-advanced watermark
lag_secs | float8 | Lag in seconds between max and min
aligned | boolean | Whether lag is within tolerance
sources_with_watermark | int4 | Number of sources that have a watermark
sources_total | int4 | Total sources in the group

Recipes

Recipe 6 — Nightly ETL with Watermarks

-- Create a watermark group for the order pipeline.
SELECT pgtrickle.create_watermark_group(
    'order_pipeline',
    ARRAY['orders', 'order_lines']
);

-- Nightly ETL job 1: Load orders
BEGIN;
  COPY orders FROM '/data/orders_20260301.csv';
  SELECT pgtrickle.advance_watermark('orders', '2026-03-01');
COMMIT;

-- Nightly ETL job 2: Load order lines (may run later)
BEGIN;
  COPY order_lines FROM '/data/lines_20260301.csv';
  SELECT pgtrickle.advance_watermark('order_lines', '2026-03-01');
COMMIT;

-- order_report refreshes on the next tick after both watermarks align.

Recipe 7 — Micro-Batch Tolerance

-- Allow up to 30 seconds of skew between trades and quotes.
SELECT pgtrickle.create_watermark_group(
    'realtime_pipeline',
    ARRAY['trades', 'quotes'],
    30   -- 30-second tolerance
);

-- External process advances watermarks every few seconds.
SELECT pgtrickle.advance_watermark('trades', '2026-03-01 12:00:05+00');
SELECT pgtrickle.advance_watermark('quotes', '2026-03-01 12:00:02+00');
-- Lag is 3s, within 30s tolerance → stream tables refresh normally.

Recipe 8 — Monitoring Watermark Alignment

-- Check which groups are currently misaligned.
SELECT group_name, lag_secs, aligned
FROM pgtrickle.watermark_status()
WHERE NOT aligned;

-- Check individual source watermarks.
SELECT source_table, watermark, updated_at
FROM pgtrickle.watermarks()
ORDER BY watermark;

Stuck Watermark Detection (WM-7, v0.15.0)

When pg_trickle.watermark_holdback_timeout is set to a positive value (seconds), the scheduler periodically checks all watermark sources. If any source in a watermark group has not been advanced within the timeout, downstream stream tables in that group are paused (refresh is skipped) and a pgtrickle_alert NOTIFY is emitted.

This protects against silent data staleness when an ETL pipeline breaks and stops advancing watermarks; without this guard, stream tables would continue refreshing with stale external data.

Behavior:

  • Stuck detection: Every ~60 seconds, the scheduler checks updated_at for all watermark sources. If now() - updated_at > watermark_holdback_timeout, the source is stuck.
  • Pause: Any stream table whose source set overlaps a group containing a stuck source is skipped. A SKIP record with "stuck" in the reason is logged to pgt_refresh_history.
  • Alert: A pgtrickle_alert NOTIFY with event watermark_stuck is emitted (once per newly-stuck source, not repeated every check cycle).
  • Auto-resume: When the stuck watermark is advanced via advance_watermark(), the next scheduler check detects the advancement, lifts the pause, and emits a watermark_resumed event.

Recipe 9 — Stuck Watermark Protection

-- Enable stuck-watermark detection with a 10-minute timeout.
ALTER SYSTEM SET pg_trickle.watermark_holdback_timeout = 600;
SELECT pg_reload_conf();

-- Listen for alerts in a monitoring process.
LISTEN pgtrickle_alert;

-- When the ETL pipeline breaks and stops calling advance_watermark(),
-- the scheduler will start skipping downstream STs after 10 minutes.
-- You'll receive a NOTIFY payload like:
--   {"event":"watermark_stuck","group":"order_pipeline","source_oid":16385,"age_secs":620}

-- When the ETL pipeline recovers and advances the watermark:
SELECT pgtrickle.advance_watermark('orders', '2026-03-02 00:00:00+00');
-- The scheduler automatically resumes, and you'll receive:
--   {"event":"watermark_resumed","source_oid":16385}

Developer Diagnostics (v0.12.0)

Four SQL-callable introspection functions that surface internal DVM state without side-effects. All functions are read-only — they never modify catalog tables or trigger refreshes.

pgtrickle.explain_query_rewrite(query TEXT)

Walk a query through the full DVM rewrite pipeline and report each pass.

Returns one row per rewrite pass. When a pass changes the query, changed = true and sql_after contains the SQL after the transformation. Two synthetic rows are appended: topk_detection (detects ORDER BY … LIMIT) and dvm_patterns (lists detected DVM constructs such as aggregation strategy, join types, and volatility).

SELECT pass_name, changed, sql_after
FROM pgtrickle.explain_query_rewrite(
  'SELECT customer_id, SUM(amount) FROM orders GROUP BY customer_id'
);

Return columns:

Column | Type | Description
pass_name | text | Rewrite pass name (e.g. view_inlining, distinct_on, grouping_sets)
changed | bool | Whether this pass modified the query
sql_after | text | SQL text after this pass (NULL if unchanged)

Rewrite passes (in order):

Pass | Description
view_inlining | Expand view references to their defining SQL
nested_window_lift | Lift window functions out of expressions (e.g. CASE WHEN ROW_NUMBER() OVER (...) ...)
distinct_on | Rewrite DISTINCT ON to a ROW_NUMBER() window
grouping_sets | Expand GROUPING SETS / CUBE / ROLLUP to UNION ALL of GROUP BY
scalar_subquery_in_where | Rewrite scalar subqueries in WHERE to CROSS JOIN
correlated_scalar_in_select | Rewrite correlated scalar subqueries in SELECT to LEFT JOIN
sublinks_in_or_demorgan | Apply De Morgan normalization and expand SubLinks inside OR
rows_from | Rewrite ROWS FROM() multi-function expressions
topk_detection | Detect the ORDER BY … LIMIT n TopK pattern
dvm_patterns | Detected DVM constructs: join types, aggregate strategies, volatility

pgtrickle.diagnose_errors(name TEXT)

Return the last 5 FAILED refresh events for a stream table, with each error classified by type and supplied with a remediation hint.

SELECT event_time, error_type, error_message, remediation
FROM pgtrickle.diagnose_errors('my_stream_table');

Return columns:

Column | Type | Description
event_time | timestamptz | When the failed refresh started
error_type | text | Classification: user, schema, correctness, performance, infrastructure
error_message | text | Raw error text from pgt_refresh_history
remediation | text | Suggested next step

Error types:

Type | Trigger patterns | Typical action
user | query parse error, unsupported operator, type mismatch | Check query; run validate_query()
schema | upstream table schema changed, upstream table dropped | Reinitialize; check pgt_dependencies
correctness | phantom, EXCEPT ALL, row count mismatch | Switch to refresh_mode='FULL'; report bug
performance | lock timeout, deadlock, serialization failure, spill | Tune lock_timeout; enable buffer_partitioning
infrastructure | permission denied, SPI error, replication slot | Check role grants; verify slot config

pgtrickle.list_auxiliary_columns(name TEXT)

List all __pgt_* internal columns on a stream table's storage relation, with an explanation of each column's role.

These columns are normally hidden from SELECT * output. This function surfaces them for debugging and operator visibility.

SELECT column_name, data_type, purpose
FROM pgtrickle.list_auxiliary_columns('my_stream_table');

Return columns:

Column | Type | Description
column_name | text | Internal column name (e.g. __pgt_row_id)
data_type | text | PostgreSQL type (e.g. bigint, text)
purpose | text | Human-readable description of the column's role

Common auxiliary columns:

Column | Purpose
__pgt_row_id | Row identity hash; MERGE join key for delta application
__pgt_count | Multiplicity counter for DISTINCT / aggregation / UNION dedup
__pgt_count_l | Left-side multiplicity for INTERSECT / EXCEPT
__pgt_count_r | Right-side multiplicity for INTERSECT / EXCEPT
__pgt_aux_sum_<col> | Running SUM for algebraic AVG maintenance
__pgt_aux_count_<col> | Running COUNT for algebraic AVG maintenance
__pgt_aux_sum2_<col> | Sum-of-squares for STDDEV / VAR maintenance
__pgt_aux_sum{x,y,xy,x2,y2}_<col> | Five-column set for CORR / COVAR / REGR_*
__pgt_aux_nonnull_<col> | Non-null count for SUM-above-FULL-JOIN maintenance

pgtrickle.validate_query(query TEXT)

Parse and validate a query through the DVM pipeline without creating a stream table. Returns detected SQL constructs, warnings, and the resolved refresh mode.

SELECT check_name, result, severity
FROM pgtrickle.validate_query(
  'SELECT customer_id, COUNT(*) FROM orders GROUP BY customer_id'
);

Return columns:

Column | Type | Description
check_name | text | Name of the check or detected construct
result | text | Resolved value or construct description
severity | text | INFO, WARNING, or ERROR

The first row always has check_name = 'resolved_refresh_mode' with the mode that would be assigned under refresh_mode = 'AUTO': DIFFERENTIAL, FULL, or TOPK.

Common check names:

Check | Description
resolved_refresh_mode | DIFFERENTIAL, FULL, or TOPK
topk_pattern | Detected LIMIT + ORDER BY values
unsupported_construct | Feature not supported for DIFFERENTIAL mode (→ WARNING)
matview_or_foreign_table | Query references matview/foreign table (→ WARNING, FULL)
ivm_support_check | DVM parse result (→ WARNING if DIFFERENTIAL not possible)
aggregate | Aggregate with strategy: ALGEBRAIC_INVERTIBLE, ALGEBRAIC_VIA_AUX, SEMI_ALGEBRAIC, or GROUP_RESCAN
join | Detected join type: INNER, LEFT_OUTER, FULL_OUTER, SEMI, ANTI
set_op | Set operation: DISTINCT, UNION_ALL, INTERSECT, EXCEPT, EXCEPT_ALL
window_function | Query contains window functions
scalar_subquery | Query contains scalar subqueries
lateral | Query contains LATERAL functions or subqueries
recursive_cte | Query uses WITH RECURSIVE
volatility | Worst-case volatility of functions used: immutable, stable, volatile
needs_pgt_count | Multiplicity counter column will be added
needs_dual_count | Left/right multiplicity counters will be added
parse_warning | Advisory warning from the DVM parse phase

Example output for a GROUP_RESCAN query:

SELECT check_name, result, severity
FROM pgtrickle.validate_query(
  'SELECT grp, STRING_AGG(tag, '','') FROM events GROUP BY grp'
);

check_name | result | severity
resolved_refresh_mode | DIFFERENTIAL | INFO
aggregate | STRING_AGG (GROUP_RESCAN) | WARNING
needs_pgt_count | true (multiplicity counter column required) | INFO
volatility | immutable | INFO

Note on GROUP_RESCAN: STRING_AGG, ARRAY_AGG, BOOL_AND, and other non-algebraic aggregates use a group-rescan strategy — any change in a group triggers full re-aggregation from the source data for that group. This is still DIFFERENTIAL (only changed groups are rescanned), but has higher per-group cost than algebraic strategies. If this is performance-sensitive, consider pre-aggregating with a simpler aggregate and post-processing.
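
As an illustration of that pre-aggregation approach (the table and column names here are hypothetical), you can keep the stream table itself algebraic by counting per (grp, tag) and assemble the string at read time:

-- Algebraic stream table: COUNT is invertible, so no group rescans.
SELECT pgtrickle.create_stream_table(
    name     => 'event_tags_by_grp',
    query    => 'SELECT grp, tag, COUNT(*) AS n FROM events GROUP BY grp, tag',
    schedule => '30s'
);

-- Post-process at query time: STRING_AGG runs over the small
-- pre-aggregated stream table instead of the raw events.
SELECT grp, STRING_AGG(tag, ',') AS tags
FROM event_tags_by_grp
GROUP BY grp;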


Delta SQL Profiling (v0.13.0)

pgtrickle.explain_delta(st_name text, format text DEFAULT 'text')

Generate the delta SQL query plan for a stream table without executing a refresh.

explain_delta produces the differential delta SQL that would be used on the next DIFFERENTIAL refresh, then runs EXPLAIN (ANALYZE false, FORMAT <format>) on it and returns the plan lines. This function is useful for:

  • Identifying slow joins or missing indexes in auto-generated delta SQL.
  • Comparing plan complexity between different query forms.
  • Monitoring how the size of change buffers affects plan shape.

The delta SQL is generated against a hypothetical "scan all changes" window (LSN 0/0 → FF/FFFFFFFF) so the plan shows the full join/filter structure even when the change buffer is currently empty.

Parameters:

Name | Type | Description
st_name | text | Qualified stream table name (e.g. 'public.orders_summary')
format | text | Plan format: 'text' (default), 'json', 'xml', or 'yaml'

Returns: SETOF text — one row per plan line (text format) or one row containing the full JSON/XML/YAML plan.

Example:

-- Show the text plan for the delta query
SELECT line FROM pgtrickle.explain_delta('public.orders_summary');

-- Get the JSON plan for programmatic analysis
SELECT line FROM pgtrickle.explain_delta('public.orders_summary', 'json');

Environment variable (PGS_PROFILE_DELTA=1): When the environment variable PGS_PROFILE_DELTA=1 is set in the PostgreSQL server process, every DIFFERENTIAL refresh automatically captures EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON) for the resolved delta SQL and writes the plan to /tmp/delta_plans/<schema>_<table>.json. This is intended for E2E test diagnostics and local profiling sessions.


pgtrickle.dedup_stats()

Show MERGE deduplication profiling counters accumulated since server start.

When the delta cannot be guaranteed to contain at most one row per __pgt_row_id (e.g. for aggregate queries or keyless sources), the MERGE must group and aggregate the delta before merging. This is tracked as dedup needed. A consistently high ratio indicates that pre-MERGE compaction in the change buffer would reduce refresh latency.

Returns: one row with:

Column | Type | Description
total_diff_refreshes | bigint | Total DIFFERENTIAL refreshes executed since server start that processed at least one change. Resets on server restart.
dedup_needed | bigint | Number of those refreshes where the delta required weight aggregation / deduplication in the MERGE USING clause.
dedup_ratio_pct | float8 | dedup_needed / total_diff_refreshes × 100; 0 when total_diff_refreshes = 0.

Example:

SELECT * FROM pgtrickle.dedup_stats();
-- total_diff_refreshes | dedup_needed | dedup_ratio_pct
-- ----------------------+--------------+-----------------
--                  1234 |           87 |            7.05

A dedup_ratio_pct of 10 or higher is the recommended threshold for investigating a two-pass MERGE strategy. See plans/performance/REPORT_OVERALL_STATUS.md §14 for background.
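
A minimal monitoring check based on that guidance:

-- Returns a row only when the ratio crosses the 10% investigation threshold.
SELECT dedup_ratio_pct
FROM pgtrickle.dedup_stats()
WHERE dedup_ratio_pct >= 10;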

pgtrickle.shared_buffer_stats()

Added in v0.13.0

D-4 observability function. Returns one row per shared change buffer (one per tracked source table), showing how many stream tables share the buffer, which columns are tracked, the safe cleanup frontier, and the current buffer size.

Return columns:

Column | Type | Description
source_oid | bigint | PostgreSQL OID of the source table
source_table | text | Fully qualified source table name
consumer_count | integer | Number of stream tables sharing this buffer
consumers | text | Comma-separated list of consumer stream table names
columns_tracked | integer | Number of new_* columns in the buffer (column superset)
safe_frontier_lsn | text | MIN(frontier LSN) across all consumers; rows at or below this are safe to clean up
buffer_rows | bigint | Current number of rows in the change buffer
is_partitioned | boolean | Whether the buffer uses LSN-range partitioning

Example:

SELECT * FROM pgtrickle.shared_buffer_stats();
-- source_oid | source_table       | consumer_count | consumers                          | columns_tracked | safe_frontier_lsn | buffer_rows | is_partitioned
-- -----------+--------------------+----------------+------------------------------------+-----------------+-------------------+-------------+----------------
--      16456 | public.orders      |              3 | public.orders_by_region, public... |               5 | 0/1A2B3C4D        |         142 | f

UNLOGGED Change Buffers (v0.14.0)

pgtrickle.convert_buffers_to_unlogged()

Converts all existing logged change buffer tables to UNLOGGED. This eliminates WAL writes for trigger-inserted CDC rows, reducing WAL amplification by ~30%.

Returns: bigint — the number of buffer tables converted.

SELECT pgtrickle.convert_buffers_to_unlogged();
-- convert_buffers_to_unlogged
-- ----------------------------
--                            5

Warning: Each conversion acquires ACCESS EXCLUSIVE lock on the buffer table. Run this function during a low-traffic maintenance window to minimize lock contention.

After conversion: Buffer contents will be lost on crash recovery. The scheduler automatically detects this and enqueues a FULL refresh for affected stream tables. See pg_trickle.unlogged_buffers for the full trade-off discussion.
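
To verify the conversion, one option is to check relpersistence in the system catalogs. This sketch assumes the change buffers live in the pgtrickle_changes schema (the CDC schema named in the API stability section below):

-- UNLOGGED relations have relpersistence = 'u'.
SELECT c.relname
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE n.nspname = 'pgtrickle_changes'
  AND c.relpersistence = 'u';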


Refresh Mode Diagnostics (v0.14.0)

pgtrickle.recommend_refresh_mode(st_name TEXT DEFAULT NULL)

Analyze stream table workload characteristics and recommend the optimal refresh mode (FULL vs DIFFERENTIAL). When st_name is NULL, returns one row per stream table. When provided, returns a single row for the named stream table.

The function evaluates seven weighted signals (current and historical change ratio, empirical timing, query complexity, target size, index coverage, and latency variance) and computes a composite score. Scores above +0.15 recommend DIFFERENTIAL; below −0.15 recommend FULL; in between, the function recommends KEEP (the current mode is near-optimal).

Parameters:

Name | Type | Default | Description
st_name | text | NULL | Optional stream table name. NULL = all stream tables.

Return columns:

Column | Type | Description
pgt_schema | text | Stream table schema
pgt_name | text | Stream table name
current_mode | text | Currently configured refresh mode
effective_mode | text | Mode actually used in the last refresh
recommended_mode | text | DIFFERENTIAL, FULL, or KEEP
confidence | text | high, medium, or low
reason | text | Human-readable explanation of the recommendation
signals | jsonb | Detailed signal breakdown with scores and weights

Example:

-- Check all stream tables
SELECT pgt_name, current_mode, recommended_mode, confidence, reason
FROM pgtrickle.recommend_refresh_mode();

-- Check a specific stream table
SELECT recommended_mode, confidence, reason, signals
FROM pgtrickle.recommend_refresh_mode('public.orders_summary');

Signal weights:

Signal | Base Weight | Description
change_ratio_current | 0.25 | Current pending changes / source rows
change_ratio_avg | 0.30 | Historical average change ratio
empirical_timing | 0.35 | Observed DIFF vs FULL speed ratio
query_complexity | 0.10 | JOIN/aggregate/window count
target_size | 0.10 | Target relation + index size
index_coverage | 0.05 | Whether __pgt_row_id index exists
latency_variance | 0.05 | DIFF latency p95/p50 ratio
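
To drill into the per-signal breakdown behind a recommendation, the signals column can be pretty-printed (jsonb_pretty is a PostgreSQL built-in; the JSON structure itself is internal and may change):

SELECT jsonb_pretty(signals)
FROM pgtrickle.recommend_refresh_mode('public.orders_summary');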

pgtrickle.refresh_efficiency()

Per-table refresh efficiency metrics. Returns operational statistics for every stream table — useful for monitoring dashboards and Grafana alerts.

Return columns:

Column | Type | Description
pgt_schema | text | Stream table schema
pgt_name | text | Stream table name
refresh_mode | text | Current refresh mode
total_refreshes | bigint | Total completed refresh count
diff_count | bigint | DIFFERENTIAL refresh count
full_count | bigint | FULL refresh count
avg_diff_ms | float8 | Average DIFFERENTIAL duration (ms)
avg_full_ms | float8 | Average FULL duration (ms)
avg_change_ratio | float8 | Average change ratio from history
diff_speedup | text | Speedup factor (e.g. 12.3x) of FULL / DIFF timing
last_refresh_at | text | Timestamp of last data refresh

Example:

SELECT pgt_name, refresh_mode, diff_count, full_count,
       avg_diff_ms, avg_full_ms, diff_speedup
FROM pgtrickle.refresh_efficiency()
ORDER BY total_refreshes DESC;
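
As an example alert condition (the thresholds here are illustrative), flag stream tables where DIFFERENTIAL is not paying off:

-- Stream tables where the average DIFF refresh is slower than FULL.
SELECT pgt_schema, pgt_name, avg_diff_ms, avg_full_ms
FROM pgtrickle.refresh_efficiency()
WHERE diff_count > 10
  AND avg_diff_ms > avg_full_ms;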

Export API (v0.14.0)

pgtrickle.export_definition(st_name TEXT)

Export a stream table's configuration as reproducible DDL. Returns a SQL script containing DROP STREAM TABLE IF EXISTS followed by SELECT pgtrickle.create_stream_table(...) with all configured options, plus any ALTER STREAM TABLE calls for post-creation settings (tier, fuse mode, etc.).

Parameters:

Name | Type | Description
st_name | text | Fully qualified or search-path-resolved stream table name

Returns: text — SQL script that recreates the stream table.

Example:

-- Export a single definition
SELECT pgtrickle.export_definition('public.orders_summary');

-- Export all definitions
SELECT pgtrickle.export_definition(pgt_schema || '.' || pgt_name)
FROM pgtrickle.pgt_stream_tables;
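
From psql, the all-definitions query can be redirected into a migration script; this sketch uses psql's \t (tuples-only) and \o (output file) metacommands:

\t on
\o stream_tables_backup.sql
SELECT pgtrickle.export_definition(pgt_schema || '.' || pgt_name)
FROM pgtrickle.pgt_stream_tables;
\o
\t off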

dbt Integration (v0.13.0)

The dbt-pgtrickle package exposes two new config(...) keys added in v0.13.0: partition_by and the fuse circuit-breaker options. Use them directly in any stream_table materialization model.

For full dbt documentation see dbt-pgtrickle/README.md.


partition_by config

Partition the stream table's underlying storage table using PostgreSQL PARTITION BY RANGE. Only applied at creation time — changing it after the stream table exists has no effect (use --full-refresh to recreate).

-- models/marts/events_by_day.sql
{{ config(
    materialized='stream_table',
    schedule='1m',
    refresh_mode='DIFFERENTIAL',
    partition_by='event_day'
) }}

SELECT
    event_day,
    user_id,
    COUNT(*) AS event_count
FROM {{ source('raw', 'events') }}
GROUP BY event_day, user_id

pg_trickle creates a PARTITION BY RANGE (event_day) storage table with an automatic default catch-all partition. Add named partitions via standard DDL:

CREATE TABLE analytics.events_by_day_2026
  PARTITION OF analytics.events_by_day
  FOR VALUES FROM ('2026-01-01') TO ('2027-01-01');

The partition_by value is stored in pgtrickle.pgt_stream_tables.st_partition_key and visible via pgtrickle.stream_tables_info.


fuse config

The fuse circuit breaker suspends differential refreshes when the incoming change volume exceeds a threshold, preventing runaway refresh cycles during bulk ingestion. Fuse parameters are applied via alter_stream_table() on every dbt run; they are a no-op if the values have not changed.

-- models/marts/order_totals.sql
{{ config(
    materialized='stream_table',
    schedule='5m',
    refresh_mode='DIFFERENTIAL',
    fuse='auto',
    fuse_ceiling=50000,
    fuse_sensitivity=3
) }}

SELECT customer_id, SUM(amount) AS total
FROM {{ source('raw', 'orders') }}
GROUP BY customer_id

Config key | Type | Default | Description
fuse | 'off', 'on', or 'auto' | null (no-op) | Fuse mode. 'auto' activates only when FULL refresh would be cheaper than DIFFERENTIAL.
fuse_ceiling | integer | null | Change-count threshold (number of changed rows) that triggers the fuse. null uses the global pg_trickle.fuse_default_ceiling GUC.
fuse_sensitivity | integer | null | Number of consecutive over-ceiling observations required before the fuse blows. null means 1 (blow immediately).

Monitor fuse state via pgtrickle.dedup_stats() or check pgtrickle.pgt_stream_tables.fuse_state directly:

SELECT pgt_name, fuse_mode, fuse_state, fuse_ceiling, fuse_sensitivity
FROM pgtrickle.pgt_stream_tables
WHERE fuse_mode != 'off';

Self-Monitoring (v0.20.0)

Added in v0.20.0.

pg_trickle can monitor itself using its own stream tables. Five self-monitoring stream tables maintain reactive analytics over the internal catalog, replacing repeated full-scan diagnostic queries with continuously-maintained incremental views.

Quick Start

-- Create all five self-monitoring stream tables (idempotent).
SELECT pgtrickle.setup_self_monitoring();

-- Check status.
SELECT * FROM pgtrickle.self_monitoring_status();

-- View threshold recommendations (after 10+ refresh cycles).
SELECT * FROM pgtrickle.df_threshold_advice
WHERE confidence IN ('HIGH', 'MEDIUM');

-- View anomalies.
SELECT * FROM pgtrickle.df_anomaly_signals
WHERE duration_anomaly IS NOT NULL;

-- Enable auto-apply (optional).
SET pg_trickle.self_monitoring_auto_apply = 'threshold_only';

-- Clean up.
SELECT pgtrickle.teardown_self_monitoring();

pgtrickle.setup_self_monitoring()

Creates all five self-monitoring stream tables. Idempotent — safe to call multiple times. Emits a warm-up warning if pgt_refresh_history has fewer than 50 rows.

Stream tables created:

Name | Schedule | Mode | Purpose
pgtrickle.df_efficiency_rolling | 48s | AUTO | Rolling-window refresh statistics
pgtrickle.df_anomaly_signals | 48s | AUTO | Duration spikes, error bursts, mode oscillation
pgtrickle.df_threshold_advice | 96s | AUTO | Multi-cycle threshold recommendations
pgtrickle.df_cdc_buffer_trends | 48s | AUTO | CDC buffer growth rates per source
pgtrickle.df_scheduling_interference | 96s | FULL | Concurrent refresh overlap detection

pgtrickle.teardown_self_monitoring()

Drops all self-monitoring stream tables. Safe with partial setups — missing tables are silently skipped. User stream tables are never affected.

pgtrickle.self_monitoring_status()

Returns the status of all five expected self-monitoring stream tables:

Column | Type | Description
st_name | text | Stream table name
exists | bool | Whether the ST exists
status | text | Current status (ACTIVE, SUSPENDED, etc.)
refresh_mode | text | Effective refresh mode
last_refresh_at | text | Last successful refresh timestamp
total_refreshes | bigint | Total completed refreshes

pgtrickle.scheduler_overhead()

Returns scheduler efficiency metrics for the last hour:

Column | Type | Description
total_refreshes_1h | bigint | Total refreshes in the last hour
df_refreshes_1h | bigint | Self-monitoring (DF) refreshes in the last hour
df_refresh_fraction | float | Fraction of refreshes that are self-monitoring
avg_refresh_ms | float | Average refresh duration (ms)
avg_df_refresh_ms | float | Average DF refresh duration (ms)
total_refresh_time_s | float | Total time spent refreshing (seconds)
df_refresh_time_s | float | Time spent on DF refreshes (seconds)
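
Example:

SELECT df_refresh_fraction, avg_refresh_ms, total_refresh_time_s
FROM pgtrickle.scheduler_overhead();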

pgtrickle.explain_dag(format)

Returns the full refresh DAG as a Mermaid markdown (default) or Graphviz DOT string. Node colours: user STs = blue, self-monitoring STs = green, suspended = red, fused = orange.

-- Mermaid format (default).
SELECT pgtrickle.explain_dag();

-- Graphviz DOT format.
SELECT pgtrickle.explain_dag('dot');

Auto-Apply Policy

The pg_trickle.self_monitoring_auto_apply GUC controls whether analytics can automatically adjust stream table configuration:

Value | Behaviour
off (default) | Advisory only; no automatic changes
threshold_only | Apply threshold recommendations when confidence is HIGH and delta > 5%
full | Also apply scheduling hints from interference analysis

Auto-apply is rate-limited to at most one threshold change per stream table per 10 minutes. Changes are logged to pgt_refresh_history with initiated_by = 'SELF_MONITOR'.
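
For example, to enable threshold-only auto-apply cluster-wide and then audit what the self-monitor changed:

ALTER SYSTEM SET pg_trickle.self_monitoring_auto_apply = 'threshold_only';
SELECT pg_reload_conf();

-- Count configuration changes made by the self-monitor.
SELECT count(*)
FROM pgtrickle.pgt_refresh_history
WHERE initiated_by = 'SELF_MONITOR';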

Confidence Levels and Sparse History

df_threshold_advice assigns a confidence level to each recommendation:

Confidence | Criteria | What to expect
HIGH | ≥ 20 total refreshes, ≥ 5 DIFFERENTIAL, ≥ 2 FULL | Reliable recommendation; auto-apply will act on this
MEDIUM | ≥ 10 total refreshes | Directionally useful, but may lack enough FULL/DIFF mix
LOW | < 10 total refreshes | Insufficient data; recommendation equals the current threshold

When you see LOW confidence: This is normal during the first minutes after setup_self_monitoring(). The stream tables need time to accumulate refresh history. In typical deployments with a 1-minute schedule, expect:

  • LOW for the first ~10 minutes
  • MEDIUM after ~10 minutes
  • HIGH after ~20 minutes (requires at least 2 FULL refreshes — these happen naturally when the auto-threshold triggers a mode switch)

If a stream table uses FULL mode exclusively, the advice will remain at MEDIUM because no DIFFERENTIAL observations exist for comparison.

The sla_headroom_pct column shows how much faster DIFFERENTIAL is compared to FULL as a percentage. A value of 70% means "DIFF is 70% faster than FULL". This column is NULL when either FULL or DIFF observations are missing.


Stream Table Snapshots (v0.27.0)

Added in v0.27.0 (SNAP-1–3).

Snapshots let you export the current state of a stream table into an archival table, then restore from that snapshot on another node or after a PITR operation. The main use cases are:

  • Replica bootstrap — populate a new standby without a full re-scan
  • PITR alignment — re-align stream table frontiers after point-in-time recovery so the first refresh is DIFFERENTIAL, not a full re-scan
  • Archiving — preserve a historical snapshot for audit or rollback

pgtrickle.snapshot_stream_table(name, target)

Export the current content of a stream table into a new archival table.

pgtrickle.snapshot_stream_table(
    name   TEXT,              -- stream table (schema.name or plain name)
    target TEXT DEFAULT NULL  -- destination table name; auto-generated if NULL
) → TEXT                      -- returns the fully-qualified snapshot table name

The snapshot table is created in the pgtrickle schema with the naming convention snapshot_<name>_<epoch_ms> unless you supply target. The table includes three metadata columns added by pg_trickle: __pgt_snapshot_version, __pgt_frontier, and __pgt_snapshotted_at.

-- Auto-named snapshot
SELECT pgtrickle.snapshot_stream_table('public.orders_agg');
-- → 'pgtrickle.snapshot_orders_agg_1745452800000'

-- Named snapshot (useful when targeting a replica)
SELECT pgtrickle.snapshot_stream_table(
    'public.orders_agg',
    'pgtrickle.orders_agg_replica_init'
);

pgtrickle.restore_from_snapshot(name, source)

Restore a stream table from an archival snapshot and realign its frontier.

pgtrickle.restore_from_snapshot(
    name   TEXT,  -- stream table to restore into
    source TEXT   -- fully-qualified snapshot table created by snapshot_stream_table()
) → void

After restore_from_snapshot() completes:

  1. The stream table's rows are replaced with the snapshot contents.
  2. The frontier is set to the snapshot's frontier, so the next refresh cycle is DIFFERENTIAL — only changes made after the snapshot are fetched.

SELECT pgtrickle.restore_from_snapshot(
    'public.orders_agg',
    'pgtrickle.orders_agg_replica_init'
);

pgtrickle.list_snapshots(name)

List all archival snapshots for a stream table.

pgtrickle.list_snapshots(
    name TEXT  -- stream table name
) → SETOF record(
    snapshot_table TEXT,
    created_at     TIMESTAMPTZ,
    row_count      BIGINT,
    frontier       JSONB,
    size_bytes     BIGINT
)
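
Example:

-- List snapshots for a stream table, newest first.
SELECT snapshot_table, created_at, row_count,
       pg_size_pretty(size_bytes) AS size
FROM pgtrickle.list_snapshots('public.orders_agg')
ORDER BY created_at DESC;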

pgtrickle.drop_snapshot(snapshot_table)

Drop an archival snapshot table and remove it from the catalog.

pgtrickle.drop_snapshot(
    snapshot_table TEXT  -- fully-qualified snapshot table
) → void

SELECT pgtrickle.drop_snapshot('pgtrickle.orders_agg_replica_init');

Catalog Table

Table | Description
pgtrickle.pgt_snapshots | One row per snapshot: pgt_id, snapshot_schema, snapshot_table, snapshot_version, frontier, created_at

Transactional Outbox & Consumer Groups (v0.28.0)

Added in v0.28.0 (OUTBOX-1–6, OUTBOX-B1–B6).

The outbox pattern lets you reliably publish stream table deltas to external consumers — even if the consumer is temporarily unavailable. Each refresh writes a header row to a dedicated outbox table. Consumers poll for new rows, process them, and commit their offset. The pattern provides:

  • At-least-once delivery with explicit offset commits
  • Kafka-style consumer groups for parallel consumption with independent offsets
  • Visibility leases to prevent duplicate processing within a group
  • Claim-check delivery for large deltas (automatic when delta exceeds a configurable row threshold)
  • Consumer lag metrics (consumer_lag()) for monitoring

Quickstart

-- 1. Enable the outbox on a stream table
SELECT pgtrickle.enable_outbox('public.orders_agg');

-- 2. Create a consumer group
SELECT pgtrickle.create_consumer_group('my_group', 'public.orders_agg');

-- 3. Poll for new messages (returns rows since last committed offset)
SELECT * FROM pgtrickle.poll_outbox('my_group', 'worker-1');

-- 4. Process the rows, then commit the highest offset you processed
SELECT pgtrickle.commit_offset('my_group', 'worker-1', 42);

pgtrickle.enable_outbox(name, retention_hours)

Enable the outbox pattern for a stream table.

pgtrickle.enable_outbox(
    name            TEXT,       -- stream table name
    retention_hours INT DEFAULT 24  -- how long to keep outbox rows
) → void

Creates an outbox table pgtrickle.pgt_outbox_<st> and a convenience view pgtrickle.pgt_outbox_latest_<st>. Records configuration in pgtrickle.pgt_outbox_config.

Restriction: Not compatible with IMMEDIATE refresh mode — use SCHEDULED or AUTO instead.
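
Example:

-- Enable the outbox and keep rows for 7 days instead of the default 24 hours.
SELECT pgtrickle.enable_outbox('public.orders_agg', retention_hours => 168);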

pgtrickle.disable_outbox(name, if_exists)

Disable the outbox pattern and drop the associated outbox table.

pgtrickle.disable_outbox(
    name      TEXT,
    if_exists BOOLEAN DEFAULT false
) → void

pgtrickle.outbox_status(name)

Return a JSONB summary of outbox state for a stream table.

pgtrickle.outbox_status(name TEXT) → JSONB

Returns: enabled, outbox_table, retention_hours, pending_rows, oldest_row_age, consumer_groups.
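
Since the result is a single JSONB value, individual fields can be extracted with the ->> operator:

SELECT s ->> 'pending_rows'    AS pending_rows,
       s ->> 'oldest_row_age'  AS oldest_row_age,
       s -> 'consumer_groups'  AS consumer_groups
FROM pgtrickle.outbox_status('public.orders_agg') AS s;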

pgtrickle.outbox_rows_consumed(stream_table, outbox_id)

Mark an outbox row as consumed and release its claim-check rows (if any).

pgtrickle.outbox_rows_consumed(
    stream_table TEXT,
    outbox_id    BIGINT
) → void

Use this when consuming outbox rows without a consumer group. For consumer-group mode, use commit_offset() instead.

Example:

-- Simple (non-group) consumer: fetch latest, process, release
SELECT outbox_id, payload
FROM pgtrickle.pgt_outbox_latest_orders_agg
LIMIT 10;

-- After successful processing, release the outbox row:
SELECT pgtrickle.outbox_rows_consumed('public.orders_agg', 77);

Consumer Groups

Consumer groups give independent consumers their own offset pointer into the outbox. Multiple consumers in the same group share a single offset (competing consumers); multiple groups each get the full message stream.

pgtrickle.create_consumer_group(name, outbox, auto_offset_reset)

pgtrickle.create_consumer_group(
    name              TEXT,
    outbox            TEXT,
    auto_offset_reset TEXT DEFAULT 'latest'  -- 'latest' | 'earliest'
) → void

auto_offset_reset = 'latest' means a new group starts consuming from the newest row. Use 'earliest' to replay from the beginning.

pgtrickle.drop_consumer_group(name, if_exists)

pgtrickle.drop_consumer_group(
    name      TEXT,
    if_exists BOOLEAN DEFAULT false
) → void

Drops the group and all its offsets and leases.

Example:

-- Remove a consumer group (error if not found)
SELECT pgtrickle.drop_consumer_group('retired-group');

-- Idempotent removal
SELECT pgtrickle.drop_consumer_group('retired-group', if_exists => true);

pgtrickle.poll_outbox(group, consumer, batch_size, visibility_seconds)

Fetch the next batch of unprocessed messages for a consumer.

pgtrickle.poll_outbox(
    group              TEXT,
    consumer           TEXT,
    batch_size         INT DEFAULT 100,
    visibility_seconds INT DEFAULT 30
) → SETOF record(
    outbox_id      BIGINT,
    pgt_id         UUID,
    created_at     TIMESTAMPTZ,
    inserted_count BIGINT,
    deleted_count  BIGINT,
    is_claim_check BOOLEAN,
    payload        JSONB
)

poll_outbox grants a visibility lease for visibility_seconds. The consumer must call commit_offset() or extend_lease() before the lease expires, otherwise the rows become visible again to other consumers.

When is_claim_check = true, the payload is NULL and the actual delta rows are in a separate table (call outbox_rows_consumed() to release them after processing).

Example:

-- Fetch up to 50 messages with a 60-second visibility window
SELECT outbox_id, inserted_count, deleted_count, payload
FROM pgtrickle.poll_outbox(
    'analytics-group',
    'worker-1',
    batch_size         => 50,
    visibility_seconds => 60
);

pgtrickle.commit_offset(group, consumer, last_offset)

Commit the highest outbox offset the consumer has successfully processed.

pgtrickle.commit_offset(
    group       TEXT,
    consumer    TEXT,
    last_offset BIGINT
) → void

Example:

-- After successfully processing messages up through offset 142:
SELECT pgtrickle.commit_offset('analytics-group', 'worker-1', 142);

pgtrickle.extend_lease(group, consumer, extension_seconds)

Extend the visibility lease when processing takes longer than expected.

pgtrickle.extend_lease(
    group             TEXT,
    consumer          TEXT,
    extension_seconds INT DEFAULT 30
) → void

Example:

-- Extend the lease by 2 minutes when a large batch takes longer than expected
SELECT pgtrickle.extend_lease('analytics-group', 'worker-1', extension_seconds => 120);

pgtrickle.seek_offset(group, consumer, new_offset)

Jump to a specific offset for replay or recovery.

pgtrickle.seek_offset(
    group      TEXT,
    consumer   TEXT,
    new_offset BIGINT
) → void

Example:

-- Rewind consumer to replay from offset 100 (disaster recovery)
SELECT pgtrickle.seek_offset('analytics-group', 'worker-1', 100);

-- Fast-forward past known-bad messages to offset 500
SELECT pgtrickle.seek_offset('analytics-group', 'worker-1', 500);

pgtrickle.consumer_heartbeat(group, consumer)

Signal that a consumer is still alive. Prevents the consumer from being marked as dead (controlled by pg_trickle.consumer_dead_threshold_hours).

pgtrickle.consumer_heartbeat(
    group    TEXT,
    consumer TEXT
) → void

Example:

-- Call periodically from a long-running consumer to stay alive
SELECT pgtrickle.consumer_heartbeat('analytics-group', 'worker-1');

pgtrickle.consumer_lag(group)

Return per-consumer lag metrics for a consumer group.

pgtrickle.consumer_lag(group TEXT) → SETOF record(
    consumer         TEXT,
    committed_offset BIGINT,
    latest_offset    BIGINT,
    lag              BIGINT,
    last_seen        TIMESTAMPTZ
)

Example:

-- Monitor lag for all consumers in a group
SELECT consumer, lag, last_seen
FROM pgtrickle.consumer_lag('analytics-group')
ORDER BY lag DESC;

-- Alert if any consumer is more than 1000 messages behind
SELECT consumer, lag
FROM pgtrickle.consumer_lag('analytics-group')
WHERE lag > 1000;

Outbox Catalog Tables

pgtrickle.pgt_outbox_config

Maps stream tables to their pg_tide outbox names. Populated by enable_outbox(); one row per stream table with an outbox enabled.

Column | Type | Description
stream_table_oid | OID | PostgreSQL OID of the stream table (PRIMARY KEY)
stream_table_name | TEXT | Qualified name (schema.table) of the stream table
tide_outbox_name | TEXT | Name of the corresponding pg_tide outbox
created_at | TIMESTAMPTZ | When the outbox was attached

pgtrickle.pgt_consumer_groups

Named consumer groups that track consumption progress on an outbox.

Column | Type | Description
group_name | TEXT | Consumer group name (PRIMARY KEY)
outbox_name | TEXT | Name of the outbox being consumed
auto_offset_reset | TEXT | Starting position for new groups: 'latest' or 'earliest'
created_at | TIMESTAMPTZ | When the group was created

pgtrickle.pgt_consumer_offsets

Per-consumer committed offsets and heartbeat tracking within a group.

Column | Type | Description
group_name | TEXT | Consumer group (FK → pgt_consumer_groups)
consumer_id | TEXT | Consumer identifier within the group
committed_offset | BIGINT | Highest outbox offset successfully committed
last_committed_at | TIMESTAMPTZ | When the last commit occurred
last_heartbeat_at | TIMESTAMPTZ | Last heartbeat signal timestamp

Primary key: (group_name, consumer_id)

pgtrickle.pgt_consumer_leases

Visibility leases for in-flight outbox message batches (prevents duplicate delivery).

Column | Type | Description
group_name | TEXT | Consumer group (FK → pgt_consumer_offsets)
consumer_id | TEXT | Consumer holding the lease
batch_start | BIGINT | First offset in the leased batch
batch_end | BIGINT | Last offset in the leased batch
lease_expires | TIMESTAMPTZ | Lease expiry time; expired leases become visible again

Primary key: (group_name, consumer_id)


Transactional Inbox (v0.28.0)

Added in v0.28.0 (INBOX-1–6, INBOX-B1–B4).

The inbox pattern provides a reliable, idempotent message receiver inside PostgreSQL. Incoming events are written to an inbox table; pg_trickle automatically creates stream tables that give you views of pending messages, dead-letter messages, and statistics — all updated incrementally.

What gets created

When you call create_inbox('orders_inbox', ...), pg_trickle creates:

Table / View | Purpose
pgtrickle.orders_inbox | The raw inbox table (one row per event)
orders_inbox_pending stream table | Events with processed_at IS NULL and retry_count < max_retries
orders_inbox_dlq stream table | Dead-letter events (retry_count >= max_retries)
orders_inbox_stats stream table | Event counts grouped by event_type

pgtrickle.create_inbox(name, ...)

Create a new transactional inbox with its associated stream tables.

pgtrickle.create_inbox(
    name             TEXT,
    schema           TEXT    DEFAULT 'pgtrickle',
    max_retries      INT     DEFAULT 3,
    with_dead_letter BOOLEAN DEFAULT true,
    with_stats       BOOLEAN DEFAULT true,
    schedule_seconds INT     DEFAULT 5
) → void

SELECT pgtrickle.create_inbox('orders_inbox');
-- Creates: pgtrickle.orders_inbox, orders_inbox_pending, orders_inbox_dlq, orders_inbox_stats

pgtrickle.drop_inbox(name, if_exists, cascade)

Drop an inbox and all associated stream tables.

pgtrickle.drop_inbox(
    name      TEXT,
    if_exists BOOLEAN DEFAULT false,
    cascade   BOOLEAN DEFAULT false
) → void

pgtrickle.enable_inbox_tracking(name, table_ref, ...)

Bring-your-own-table (BYOT) mode: register an existing table as an inbox without creating a new one.

pgtrickle.enable_inbox_tracking(
    name             TEXT,
    table_ref        TEXT,            -- fully-qualified existing table
    max_retries      INT     DEFAULT 3,
    with_dead_letter BOOLEAN DEFAULT true,
    with_stats       BOOLEAN DEFAULT true,
    schedule_seconds INT     DEFAULT 5
) → void
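
Example (the app.payment_events table is hypothetical; it must already carry the inbox columns configured in pgt_inbox_config, such as event_id, processed_at, and retry_count):

SELECT pgtrickle.enable_inbox_tracking(
    'payments_inbox',
    'app.payment_events',
    max_retries      => 5,
    schedule_seconds => 10
);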

pgtrickle.inbox_health(name)

Return a JSONB health summary for an inbox.

pgtrickle.inbox_health(name TEXT) → JSONB

Returns: inbox_name, pending_count, dlq_count, processed_24h, oldest_pending_age, stream_table_statuses.

pgtrickle.inbox_status(name)

Return a tabular status summary for one or all inboxes.

pgtrickle.inbox_status(
    name TEXT DEFAULT NULL  -- NULL = all inboxes
) → SETOF record(
    inbox_name   TEXT,
    pending      BIGINT,
    dlq          BIGINT,
    max_retries  INT,
    created_at   TIMESTAMPTZ
)
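
Examples:

-- JSONB health summary for one inbox.
SELECT jsonb_pretty(pgtrickle.inbox_health('orders_inbox'));

-- Tabular status across all inboxes.
SELECT inbox_name, pending, dlq
FROM pgtrickle.inbox_status()
ORDER BY pending DESC;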

pgtrickle.replay_inbox_messages(name, event_ids)

Reset specific messages back to pending state for re-processing.

pgtrickle.replay_inbox_messages(
    name      TEXT,
    event_ids TEXT[]  -- list of event_id values to replay
) → BIGINT            -- number of messages reset

Example:

-- Replay two specific messages that failed processing
SELECT pgtrickle.replay_inbox_messages(
    'orders_inbox',
    ARRAY['evt-001', 'evt-002']
);
-- Returns: 2

-- Replay all dead-letter messages for manual retry
SELECT pgtrickle.replay_inbox_messages(
    'orders_inbox',
    ARRAY(SELECT event_id FROM orders_inbox_dlq)
);

Per-Aggregate Ordering (INBOX-B1)

By default, multiple workers can process inbox messages concurrently. If messages for the same aggregate must be processed in order, enable per-aggregate ordering:

pgtrickle.enable_inbox_ordering(inbox, aggregate_id_col, seq_col)

pgtrickle.enable_inbox_ordering(
    inbox            TEXT,
    aggregate_id_col TEXT,  -- column that identifies the aggregate (e.g. 'customer_id')
    seq_col          TEXT   -- monotonic sequence column (e.g. 'event_sequence')
) → void

Creates a next_<inbox> stream table that surfaces only the lowest-sequence unprocessed message per aggregate. Workers consume from next_<inbox> to avoid concurrent processing of the same aggregate.
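
Example (assuming the inbox rows carry customer_id and event_sequence columns):

SELECT pgtrickle.enable_inbox_ordering(
    'orders_inbox', 'customer_id', 'event_sequence'
);

-- Workers now consume the lowest-sequence pending message per aggregate
-- (schema qualification may be required depending on your search_path).
SELECT * FROM next_orders_inbox LIMIT 10;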

pgtrickle.disable_inbox_ordering(inbox, if_exists)

pgtrickle.disable_inbox_ordering(inbox TEXT, if_exists BOOLEAN DEFAULT false) → void

Priority Tiers (INBOX-B2)

pgtrickle.enable_inbox_priority(inbox, priority_col, tiers)

Register a priority column for cost-model–aware scheduling.

pgtrickle.enable_inbox_priority(
    inbox        TEXT,
    priority_col TEXT,    -- column name that holds the priority value
    tiers        INT DEFAULT 3
) → void
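
Example (assuming a priority column on the inbox rows):

SELECT pgtrickle.enable_inbox_priority('orders_inbox', 'priority', tiers => 3);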

pgtrickle.disable_inbox_priority(inbox, if_exists)

pgtrickle.disable_inbox_priority(inbox TEXT, if_exists BOOLEAN DEFAULT false) → void

Sequence Gap Detection (INBOX-B3)

pgtrickle.inbox_ordering_gaps(inbox_name)

Detect gaps in the per-aggregate sequence — useful for identifying lost or out-of-order messages.

pgtrickle.inbox_ordering_gaps(inbox_name TEXT) → SETOF record(
    aggregate_id TEXT,
    expected_seq BIGINT,
    actual_seq   BIGINT,
    gap_size     BIGINT
)

Example:

-- Find any ordering gaps (missing events) across all aggregates
SELECT aggregate_id, expected_seq, actual_seq, gap_size
FROM pgtrickle.inbox_ordering_gaps('orders_inbox')
ORDER BY gap_size DESC;

-- Alert if any gap is larger than 1
DO $$
DECLARE gap RECORD;
BEGIN
    FOR gap IN
        SELECT * FROM pgtrickle.inbox_ordering_gaps('orders_inbox')
        WHERE gap_size > 1
    LOOP
        RAISE WARNING 'Sequence gap for aggregate %: expected %, got % (gap=%)',
            gap.aggregate_id, gap.expected_seq, gap.actual_seq, gap.gap_size;
    END LOOP;
END;
$$;

Consistent-Hash Partitioning (INBOX-B4)

pgtrickle.inbox_is_my_partition(aggregate_id, worker_id, total_workers)

Distribute inbox processing across multiple workers without external coordination. Returns true when this worker should process messages for the given aggregate.

pgtrickle.inbox_is_my_partition(
    aggregate_id  TEXT,
    worker_id     INT,   -- 0-based worker index
    total_workers INT
) → BOOLEAN

Uses FNV-1a consistent hashing so the same aggregate always routes to the same worker, preventing concurrent processing.

-- Worker 2 of 4 processes only its assigned aggregates:
SELECT * FROM orders_inbox_pending
WHERE pgtrickle.inbox_is_my_partition(customer_id::text, 2, 4);

Inbox Catalog Tables

pgtrickle.pgt_inbox_config

Catalog of named transactional inbox configurations.

Column | Type | Description
inbox_name | TEXT | Inbox name (PRIMARY KEY)
inbox_schema | TEXT | Schema where the inbox table is created (default: pgtrickle)
max_retries | INT | Maximum retry attempts before a message moves to DLQ (default: 3)
schedule | TEXT | Refresh schedule for associated stream tables (default: '1s')
with_dead_letter | BOOL | Whether a dead-letter-queue stream table is created (default: true)
with_stats | BOOL | Whether a stats stream table is created (default: true)
retention_hours | INT | How long processed messages are retained (default: 72)
id_column | TEXT | Column name for the unique event ID (default: 'event_id')
processed_at_column | TEXT | Column name for the processing timestamp (default: 'processed_at')
retry_count_column | TEXT | Column name for the retry counter (default: 'retry_count')
error_column | TEXT | Column name for the last error message (default: 'error')
received_at_column | TEXT | Column name for the receipt timestamp (default: 'received_at')
event_type_column | TEXT | Column name for the event type (default: 'event_type')
is_managed | BOOL | Whether pg_trickle manages the inbox lifecycle (default: true)
created_at | TIMESTAMPTZ | When the inbox was created

pgtrickle.pgt_inbox_ordering_config

Per-inbox ordering configuration for per-aggregate sequenced processing (INBOX-B1).

Column | Type | Description
inbox_name | TEXT | Inbox name (PK, FK → pgt_inbox_config)
aggregate_id_col | TEXT | Column that identifies the aggregate (e.g., 'customer_id')
sequence_num_col | TEXT | Monotonic sequence column (e.g., 'event_sequence')
created_at | TIMESTAMPTZ | When ordering was enabled

pgtrickle.pgt_inbox_priority_config

Priority tier configuration for inbox message scheduling (INBOX-B2).

Column | Type | Description
inbox_name | TEXT | Inbox name (PK, FK → pgt_inbox_config)
priority_col | TEXT | Column that holds the priority value
tiers | JSONB | Priority tier definitions (threshold → schedule mapping)
created_at | TIMESTAMPTZ | When priority was enabled

Note: The relay pipeline SQL API (set_relay_outbox, set_relay_inbox, enable_relay, disable_relay, delete_relay, get_relay_config, list_relay_configs) was moved to the pg_tide extension in v0.46.0.


Public API Stability Contract

Added in v0.19.0 (DB-6).

Stable (will not break without a major version bump)

Surface | Guarantee
All functions in the pgtrickle schema documented in this reference | Signature and return type preserved across minor releases. New optional parameters may be added with defaults that preserve existing behaviour.
Catalog tables pgtrickle.pgt_stream_tables, pgtrickle.pgt_dependencies, pgtrickle.pgt_refresh_history | Existing columns are not renamed or removed. New columns may be added.
NOTIFY channels pg_trickle_refresh, pgtrickle_alert, pgtrickle_wake | Channel names and JSON payload structure preserved. New keys may be added to JSON payloads.
GUC names listed in docs/CONFIGURATION.md | Names preserved; default values may change between minor releases (documented in CHANGELOG).

Unstable (may change in any release)

Surface | Notes
Functions prefixed with _ (e.g. _signal_launcher_rescan) | Internal use only.
Catalog tables not listed above (e.g. pgt_scheduler_jobs, pgt_source_gates, pgt_watermarks) | Schema may change.
The pgtrickle_changes schema and its changes_* tables | CDC implementation detail; format may change.
SQL generated by the DVM engine (MERGE, delta CTEs) | Internal query structure is not an API.
The pgtrickle.pgt_schema_version table | Migration infrastructure; rows and schema may change.

Versioning Policy

  • Patch releases (0.x.Y): Bug fixes only. No breaking changes.
  • Minor releases (0.X.0): New features. Stable API preserved; unstable surfaces may change. Breaking changes to stable API only with a deprecation cycle (WARNING for one release, removal in the next).
  • Major release (1.0.0): Stable API locked. Breaking changes require a major version bump.

See also: Configuration · Patterns · Performance Cookbook · Error Reference · Glossary · FAQ


Reserved Column-Name Prefixes (v0.55.0)

Added in v0.55.0 (M-7).

pg_trickle uses several internal column-name prefixes for synthetic columns it creates during query analysis and delta SQL generation. User-defined columns whose names begin with these prefixes will conflict with internal template tokens and produce incorrect query results.

__pgt_* — DVM engine columns

Prefix | Purpose | Example
__pgt_count | Weight column for aggregate deduplication (DIFF mode) | __pgt_count
__pgt_row_id | Content-based row identity hash for tables without primary keys | __pgt_row_id
__pgt_wf_N | Synthetic window-function lifting columns (rewrite pass #7) | __pgt_wf_1, __pgt_wf_2
__pgt_in_sub_* | Derived-table alias for multi-column IN → SemiJoin rewrite (M-5) | __pgt_in_sub_t
__pgt_src_N | Source partition aliases in generated delta CTEs | __pgt_src_0

__pgs_* — Scheduler / shared-memory columns

Prefix | Purpose | Example
__pgs_* | Reserved for future scheduler-side synthetic columns | (none exposed today)

Consequences of prefix collision

If your defining query produces a column whose name starts with __pgt_ or __pgs_, the DVM engine may:

  • Fail to generate correct delta SQL (silently wrong results in DIFFERENTIAL mode).
  • Produce a parse error if the synthetic name is also used as a template token.
  • Cause the MERGE statement to reference the wrong column in the ON clause.

Mitigation: rename the conflicting column using an alias:

-- Bad: __pgt_count collides with the weight column
SELECT id, sum(amount) AS __pgt_count FROM orders GROUP BY id;

-- Good: use any name that does not start with __pgt_ or __pgs_
SELECT id, sum(amount) AS order_total FROM orders GROUP BY id;

Configuration

Complete reference for all pg_trickle GUC (Grand Unified Configuration) variables.


Quick-tuning by goal

Not sure which GUC to change? Start here.

Goal | GUCs to adjust
Lower refresh latency | scheduler_interval_ms, min_schedule_seconds
Reduce write overhead on busy tables | compact_threshold, max_buffer_rows, cleanup_use_truncate, user_triggers
Handle larger DAGs without timeouts | max_workers, max_dynamic_refresh_workers, scheduler_interval_ms
Connection-pooler compatibility (PgBouncer) | pooler_compatibility_mode, use_prepared_statements
Lower memory usage during refresh | merge_work_mem_mb, max_delta_estimate_rows
Improve cost-model accuracy | cost_model_safety_margin, planner_aggressive, differential_max_change_ratio
Enable WAL-based CDC | cdc_mode, wal_transition_timeout, slot_lag_warning_threshold_mb
Prevent a runaway stream table | max_consecutive_errors, fuse_threshold, buffer_alert_threshold

See the full reference below for each variable's defaults, valid values, and notes. Use pgtrickle.recommend_refresh_mode() for per-table advice.



Overview

pg_trickle exposes over forty configuration variables in the pg_trickle namespace. All can be set in postgresql.conf or at runtime via SET / ALTER SYSTEM.

Required postgresql.conf settings:

shared_preload_libraries = 'pg_trickle'

The extension must be loaded via shared_preload_libraries because it registers GUC variables and a background worker at startup.

Note: wal_level = logical and max_replication_slots are recommended but not required. The default CDC mode (auto) uses lightweight row-level triggers initially and transparently transitions to WAL-based capture if wal_level = logical is available. If wal_level is not logical, pg_trickle stays on triggers permanently — no degradation, no errors. Set pg_trickle.cdc_mode = 'trigger' to disable WAL transitions entirely (see pg_trickle.cdc_mode).


GUC Variables

Essential

The settings most users configure at install time.


pg_trickle.enabled

Enable or disable the pg_trickle extension.

Property | Value
Type | bool
Default | true
Context | SUSET (superuser)
Restart Required | No

When set to false, the background scheduler stops processing refreshes. Existing stream tables remain in the catalog but are not refreshed. Manual pgtrickle.refresh_stream_table() calls still work.

Note on CDC triggers: Setting enabled = false stops the scheduler from refreshing stream tables but does not disable CDC trigger execution. Change buffers continue to accumulate. This is intentional: when the extension is re-enabled, stream tables can refresh immediately from the buffered changes rather than performing a full table scan.

To fully quiesce CDC overhead during extended maintenance, use pgtrickle.drain() before disabling, then DROP TRIGGER the CDC triggers manually and recreate them via pgtrickle.repair_stream_table() when re-enabling.

-- Disable automatic refreshes
SET pg_trickle.enabled = false;

-- Re-enable
SET pg_trickle.enabled = true;
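
A sketch of the full-quiesce procedure described above; the exact drain() and repair_stream_table() argument forms shown here are assumptions:

-- Flush pending changes, then stop the scheduler cluster-wide.
SELECT pgtrickle.drain();
ALTER SYSTEM SET pg_trickle.enabled = false;
SELECT pg_reload_conf();

-- ... drop CDC triggers, perform maintenance ...

-- Re-enable and recreate the CDC triggers.
ALTER SYSTEM SET pg_trickle.enabled = true;
SELECT pg_reload_conf();
SELECT pgtrickle.repair_stream_table('public.orders_summary');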

pg_trickle.cdc_mode

CDC (Change Data Capture) mechanism selection.

Value | Description
'auto' | (default) Use triggers for creation; transition to WAL-based CDC if wal_level = logical. Falls back to triggers automatically on error.
'trigger' | Always use row-level triggers for change capture
'wal' | Require WAL-based CDC (fails if wal_level != logical)

Default: 'auto'

pg_trickle.cdc_mode only affects deferred refresh modes ('AUTO', 'FULL', and 'DIFFERENTIAL'). refresh_mode = 'IMMEDIATE' bypasses CDC entirely and always uses statement-level IVM triggers. If the GUC is set to 'wal' when a stream table is created or altered to IMMEDIATE, pg_trickle logs an INFO and continues with IVM triggers instead of creating CDC triggers or WAL slots.

Per-stream-table overrides take precedence over the GUC when you pass cdc_mode => 'auto' | 'trigger' | 'wal' to pgtrickle.create_stream_table(...) or pgtrickle.alter_stream_table(...). The override is stored in pgtrickle.pgt_stream_tables.requested_cdc_mode. For shared source tables, pg_trickle resolves the effective source-level CDC mechanism conservatively: any dependent stream table that requests 'trigger' keeps the source on trigger CDC; otherwise 'wal' wins over 'auto'.

-- Enable automatic trigger → WAL transition (default)
SET pg_trickle.cdc_mode = 'auto';

-- Force trigger-only CDC (disable WAL transitions)
SET pg_trickle.cdc_mode = 'trigger';

-- Require WAL-based CDC (error if wal_level != logical)
SET pg_trickle.cdc_mode = 'wal';
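
Per-stream-table overrides (described above) go through create_stream_table() or alter_stream_table() rather than the GUC:

-- Pin one stream table to trigger CDC; stored in requested_cdc_mode.
SELECT pgtrickle.alter_stream_table('public.orders_summary', cdc_mode => 'trigger');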

pg_trickle.scheduler_interval_ms

How often the background scheduler checks for stream tables that need refreshing.

Property | Value
Type | int
Default | 1000 (1 second)
Range | 100–60000 (100 ms to 60 s)
Context | SUSET
Restart Required | No

Tuning Guidance:

  • Low-latency workloads (sub-second schedule): Set to 100–500.
  • Standard workloads (minutes of schedule): Default 1000 is appropriate.
  • Low-overhead workloads (many STs with long schedules): Increase to 5000–10000 to reduce scheduler overhead.

The scheduler interval does not determine refresh frequency — it determines how often the scheduler checks whether any ST's staleness exceeds its schedule (or whether a cron expression has fired). The actual refresh frequency is governed by schedule (duration or cron) and canonical period alignment.

SET pg_trickle.scheduler_interval_ms = 500;

pg_trickle.event_driven_wake

⚠️ Removed in v0.51.0 — This GUC has been removed. It had no effect since v0.39.0 because PostgreSQL's LISTEN command is not permitted inside background worker processes. The scheduler always uses efficient latch-based polling regardless of this setting.

Migration: Remove pg_trickle.event_driven_wake from postgresql.conf and any ALTER SYSTEM settings. The scheduler behavior is unchanged — it wakes at pg_trickle.scheduler_interval_ms intervals. To reduce latency, lower scheduler_interval_ms instead (e.g. 200 ms for sub-200 ms refresh latency).


pg_trickle.wake_debounce_ms

⚠️ Removed in v0.51.0 — This GUC has been removed together with event_driven_wake. It had no effect since event_driven_wake was always non-functional in background workers.

Migration: Remove pg_trickle.wake_debounce_ms from postgresql.conf and any ALTER SYSTEM settings. No replacement is needed.


pg_trickle.min_schedule_seconds

Minimum allowed schedule value (in seconds) when creating or altering a stream table with a duration-based schedule. This limit does not apply to cron expressions.

Property | Value
Type | int
Default | 1 (1 second)
Range | 1–86400 (1 second to 24 hours)
Context | SUSET
Restart Required | No

This acts as a safety guardrail to prevent users from setting impractically small schedules that would cause excessive refresh overhead.

Tuning Guidance:

  • Development/testing: Default 1 allows sub-second testing.
  • Production: Raise to 60 or higher to prevent excessive WAL consumption and CPU usage.

-- Restrict to 10-second minimum schedules
SET pg_trickle.min_schedule_seconds = 10;

pg_trickle.default_schedule_seconds

Default effective schedule (in seconds) for isolated CALCULATED stream tables that have no downstream dependents.

Property | Value
Type | int
Default | 1 (1 second)
Range | 1–86400 (1 second to 24 hours)
Context | SUSET
Restart Required | No

When a CALCULATED stream table (scheduled with 'calculated') has no downstream dependents to derive a schedule from, this value is used as its effective refresh interval. This is distinct from min_schedule_seconds, which is the validation floor for duration-based schedules.

Tuning Guidance:

  • Development/testing: Default 1 allows rapid iteration.
  • Production standalone CALCULATED tables: Raise to match your desired update cadence (e.g., 60 for once-per-minute).

-- Set default for isolated CALCULATED tables to 30 seconds
SET pg_trickle.default_schedule_seconds = 30;

pg_trickle.max_consecutive_errors

Maximum consecutive refresh failures before a stream table is moved to ERROR status.

Property | Value
Type | int
Default | 3
Range | 1–100
Context | SUSET
Restart Required | No

When a ST's consecutive_errors reaches this threshold:

  1. The ST status changes to ERROR.
  2. Automatic refreshes stop for this ST.
  3. Manual intervention is required: SELECT pgtrickle.alter_stream_table('...', status => 'ACTIVE').

Tuning Guidance:

  • Strict (production): 3 — fail fast to surface issues.
  • Lenient (development): 10–20 to tolerate transient errors.

SET pg_trickle.max_consecutive_errors = 5;

WAL CDC

Settings specific to WAL-based CDC. Only relevant when pg_trickle.cdc_mode = 'auto' or 'wal'.


pg_trickle.wal_transition_timeout

Note: This setting is only relevant when pg_trickle.cdc_mode = 'auto' or 'wal'. See ARCHITECTURE.md for the full CDC transition lifecycle.

Maximum time (seconds) to wait for the WAL decoder to catch up during the transition from trigger-based to WAL-based CDC. If the decoder has not caught up within this timeout, the system falls back to triggers.

Default: 300 (5 minutes)
Range: 10–3600

SET pg_trickle.wal_transition_timeout = 300;

pg_trickle.slot_lag_warning_threshold_mb

Warning threshold for retained WAL on pg_trickle replication slots.

Property | Value
Type | int
Default | 100 (MB)
Range | 1–1048576
Context | SUSET
Restart Required | No

When retained WAL for a pg_trickle replication slot exceeds this threshold:

  • The scheduler emits a slot_lag_warning event on LISTEN pgtrickle_alert
  • pgtrickle.health_check() reports WARN for the slot_lag check

Raise this on high-throughput systems that intentionally tolerate larger WAL retention. Lower it if you want earlier warning before slots risk invalidation.

SET pg_trickle.slot_lag_warning_threshold_mb = 256;

pg_trickle.slot_lag_critical_threshold_mb

Critical threshold for retained WAL on pg_trickle replication slots.

Property | Value
Type | int
Default | 1024 (MB)
Range | 1–1048576
Context | SUSET
Restart Required | No

When retained WAL for a pg_trickle replication slot exceeds this threshold, pgtrickle.check_cdc_health() returns a per-source slot_lag_exceeds_threshold alert.

This threshold is intentionally higher than the warning threshold so operators can separate early warning from source-level unhealthy state.

SET pg_trickle.slot_lag_critical_threshold_mb = 2048;

Refresh Performance

Fine-grained tuning for the differential refresh engine.


pg_trickle.differential_max_change_ratio

Maximum change-to-table ratio before DIFFERENTIAL refresh falls back to FULL refresh.

Property | Value
Type | float
Default | 0.15 (15%)
Range | 0.0–1.0
Context | SUSET
Restart Required | No

When the number of pending change buffer rows exceeds this fraction of the source table's estimated row count, the refresh engine switches from DIFFERENTIAL (which uses JSONB parsing and window functions) to FULL refresh. At high change rates FULL refresh is cheaper because it avoids the per-row JSONB overhead.

Special Values:

  • 0.0: Disable adaptive fallback — always use DIFFERENTIAL.
  • 1.0: Always fall back to FULL (effectively forces FULL mode).

Tuning Guidance:

  • OLTP with low change rates (< 5%): Default 0.15 is appropriate.
  • Batch-load workloads (bulk inserts): Lower to 0.05–0.10 so large batches trigger FULL refresh sooner.
  • Latency-sensitive (want deterministic refresh time): Set to 0.0 to always use DIFFERENTIAL.

-- Lower threshold for batch-heavy workloads
SET pg_trickle.differential_max_change_ratio = 0.10;

-- Disable adaptive fallback
SET pg_trickle.differential_max_change_ratio = 0.0;

pg_trickle.refresh_strategy

Cluster-wide refresh strategy override.

Property | Value
Type | string
Default | 'auto'
Values | 'auto', 'differential', 'full'
Context | SUSET
Restart Required | No

Controls the FULL vs. DIFFERENTIAL decision for all stream tables whose refresh_mode is DIFFERENTIAL:

  • 'auto' (default): Use the adaptive cost-based heuristic that considers differential_max_change_ratio, per-ST auto_threshold, refresh history, and spill detection to pick the optimal strategy per refresh cycle.
  • 'differential': Always use DIFFERENTIAL refresh — skip the adaptive ratio check entirely. The BUF-LIMIT safety check (max_buffer_rows) still applies.
  • 'full': Always use FULL refresh regardless of change volume. Useful for debugging or when you know DIFFERENTIAL is consistently slower for your workload.

Important: Per-ST refresh_mode in the catalog takes precedence. Stream tables explicitly configured as refresh_mode = 'FULL' always use FULL regardless of this GUC.

Tuning Guidance:

  • Most workloads: Leave at 'auto' — the adaptive heuristic learns from refresh history.
  • Known-low-churn workloads: Set to 'differential' to eliminate the per-source capped-count query overhead.
  • Debugging delta issues: Temporarily set to 'full' to compare behavior.
-- Force DIFFERENTIAL for all stream tables (skip ratio check)
SET pg_trickle.refresh_strategy = 'differential';

-- Force FULL for all stream tables (debugging)
SET pg_trickle.refresh_strategy = 'full';

-- Reset to adaptive heuristic
SET pg_trickle.refresh_strategy = 'auto';

pg_trickle.cost_model_safety_margin

Added in v0.17.0. Safety margin for the predictive cost model that decides FULL vs. DIFFERENTIAL.

Type: float
Default: 0.8
Range: 0.1–2.0
Context: SUSET
Restart Required: No

When refresh_strategy = 'auto', the cost model estimates DIFFERENTIAL and FULL costs from recent refresh history. DIFFERENTIAL is chosen when:

estimated_diff_cost < estimated_full_cost × safety_margin

A value below 1.0 biases toward DIFFERENTIAL (which has lower lock contention and is generally preferred). A value above 1.0 biases toward FULL.
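As a worked example with illustrative numbers: if recent history estimates estimated_diff_cost = 40 ms and estimated_full_cost = 60 ms, the default margin of 0.8 chooses DIFFERENTIAL (40 < 60 × 0.8 = 48), while a margin of 0.5 would fall back to FULL (40 ≥ 60 × 0.5 = 30).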

The cost model also classifies each stream table's query complexity (scan, filter, aggregate, join, or join+aggregate) and uses per-class coefficients learned from historical data.

Tuning Guidance:

  • 0.8 (default): Prefer DIFFERENTIAL unless it's nearly as expensive as FULL.
  • 0.5: Strongly prefer DIFFERENTIAL — only fall back when it's clearly more expensive.
  • 1.0: Neutral — pick whichever is estimated to be cheaper.
  • 1.2: Slightly prefer FULL — useful when FULL is very fast and DIFFERENTIAL lock contention is a concern.
-- Strongly prefer DIFFERENTIAL
SET pg_trickle.cost_model_safety_margin = 0.5;

-- Neutral (pick the estimated cheapest)
SET pg_trickle.cost_model_safety_margin = 1.0;

pg_trickle.max_delta_estimate_rows

Added in v0.15.0. Maximum estimated delta output cardinality before falling back to FULL refresh.

Type: int
Default: 0 (disabled)
Range: 0–10,000,000
Context: SUSET
Restart Required: No

Before executing the MERGE, the refresh executor extracts the delta subquery and runs a capped SELECT count(*) FROM (delta LIMIT N+1). If the count reaches the configured limit, the refresh emits a NOTICE and falls back to FULL refresh to prevent OOM or excessive temp-file spills from unexpectedly large delta output.

This is complementary to differential_max_change_ratio which checks input change buffer size as a ratio of source table size. max_delta_estimate_rows checks output cardinality — catching cases where a small number of input changes produce a large delta output after JOINs.

Special Values:

  • 0 (default): Disable the estimation check entirely.

Tuning Guidance:

  • Servers with 8–16 GB RAM: Start with 100000 and adjust based on observed refresh behavior.
  • Large-memory servers (32+ GB): 500000 or higher.
  • Complex multi-join queries: Lower to 50000 since join fan-out can amplify small changes.
-- Enable delta output estimation with 100K row limit
SET pg_trickle.max_delta_estimate_rows = 100000;

-- Disable estimation (default)
SET pg_trickle.max_delta_estimate_rows = 0;

pg_trickle.cleanup_use_truncate

Use TRUNCATE instead of per-row DELETE for change buffer cleanup when the entire buffer is consumed by a refresh.

Type: bool
Default: true
Context: SUSET
Restart Required: No

After a differential refresh consumes all rows from the change buffer, the engine must clean up the buffer table. TRUNCATE is O(1) regardless of row count, versus DELETE which must update indexes row-by-row. This saves 3–5 ms per refresh at 10%+ change rates.

Trade-off: TRUNCATE acquires an AccessExclusiveLock on the change buffer table. If concurrent DML on the source table is actively inserting into the same change buffer via triggers, this lock can cause brief contention.

Tuning Guidance:

  • Most workloads: Leave at true — the performance benefit outweighs the brief lock.
  • High-concurrency OLTP with continuous writes during refresh: Set to false if you observe lock-wait timeouts on the change buffer.
  • PgBouncer / connection poolers: The AccessExclusiveLock acquired by TRUNCATE is held only on the change buffer table (not the source table), but in transaction-pooling mode with frequent refreshes, even brief exclusive locks can cause connection queuing. If you observe elevated pg_stat_activity wait events on change buffer tables, switch to false.
-- Use per-row DELETE for change buffer cleanup
SET pg_trickle.cleanup_use_truncate = false;
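If you suspect the TRUNCATE lock is causing contention, a quick way to look for lock waits is a generic pg_stat_activity query (standard catalog columns, not a pg_trickle-specific view):

-- Sessions currently waiting on heavyweight locks
SELECT pid, wait_event_type, wait_event, left(query, 60) AS query
FROM pg_stat_activity
WHERE wait_event_type = 'Lock';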

pg_trickle.planner_aggressive

Added in v0.14.0. Consolidated switch for all MERGE planner hints. Replaces the deprecated merge_planner_hints and merge_work_mem_mb GUCs.

Type: bool
Default: true
Context: SUSET
Restart Required: No

When enabled, the refresh executor estimates the delta size and applies optimizer hints within the transaction:

  • Delta ≥ 100 rows: SET LOCAL enable_nestloop = off — forces hash joins instead of nested-loop joins.
  • Delta ≥ 10,000 rows: additionally SET LOCAL work_mem = '<N>MB' (see pg_trickle.merge_work_mem_mb).

Tuning Guidance:

  • Most workloads: Leave at true — the hints improve tail latency without affecting small deltas.
  • Custom plan overrides: Set to false if you manage planner settings yourself or if the hints conflict with your pg_hint_plan configuration.
  • Memory-constrained environments: When enabled, large deltas (≥ 10,000 rows) raise work_mem to 64 MB (configurable via merge_work_mem_mb). If your server has limited RAM and runs many concurrent refreshes, this can cause unexpected memory pressure or temp-file spills. Monitor temp_blks_written in pg_stat_statements and consider lowering merge_work_mem_mb or disabling this GUC if spills are frequent.
-- Disable all planner hints
SET pg_trickle.planner_aggressive = false;

pg_trickle.merge_join_strategy

Added in v0.15.0. Manual override for the join strategy used during MERGE execution.

Type: text
Default: 'auto'
Values: auto, hash_join, nested_loop, merge_join
Context: SUSET
Restart Required: No

Controls which join strategy the refresh executor hints to PostgreSQL via SET LOCAL during differential refresh. Requires planner_aggressive to be enabled.

  • auto (default): Delta-size heuristics choose: nested-loop for tiny deltas, hash-join for larger ones.
  • hash_join: Always disable nested-loop joins and raise work_mem — best for medium-to-large deltas.
  • nested_loop: Always disable hash-join and merge-join — best for very small deltas against indexed tables.
  • merge_join: Always disable hash-join and nested-loop — useful if data is pre-sorted.

Tuning Guidance:

  • Most workloads: Leave at auto — the built-in heuristic performs well.
  • Consistently large deltas (1K+ rows): Setting to hash_join avoids heuristic overhead.
  • Troubleshooting: If refresh is slow, try different strategies and compare with explain_st().
-- Force hash joins for all MERGE operations
SET pg_trickle.merge_join_strategy = 'hash_join';

-- Revert to automatic heuristics
SET pg_trickle.merge_join_strategy = 'auto';
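When troubleshooting, you can flip strategies in a session and compare plans with explain_st(), as the guidance above suggests (the stream-table name is illustrative):

-- Try an alternative strategy, then inspect the resulting refresh plan
SET pg_trickle.merge_join_strategy = 'nested_loop';
SELECT * FROM pgtrickle.explain_st('my_st');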

pg_trickle.merge_strategy

Added in v0.16.0. Controls how differential refresh applies deltas to stream tables.

Type: text
Default: 'auto'
Values: auto, merge
Context: SUSET
Restart Required: No

  • auto (default): Use DELETE+INSERT when delta_rows / target_rows is below merge_strategy_threshold; MERGE otherwise.
  • merge: Always use the PostgreSQL MERGE statement.

Breaking change (v0.19.0): The delete_insert value was removed in v0.19.0 (CORR-1) because it was semantically unsafe for aggregate and DISTINCT queries. Setting it now logs a WARNING and falls back to auto.

The DELETE+INSERT strategy avoids the MERGE join cost by executing two targeted statements: a DELETE for removed rows (matched by __pgt_row_id), then an INSERT for new rows. This is significantly cheaper for sub-1% deltas against large tables because it avoids scanning the entire target for the MERGE join.

Tuning Guidance:

  • Most workloads: Leave at auto — the heuristic picks DELETE+INSERT for small deltas automatically.
  • Correctness concerns: The merge setting preserves the pre-v0.16.0 behaviour.
-- Force MERGE for all differential refreshes
SET pg_trickle.merge_strategy = 'merge';

-- Revert to automatic heuristics
SET pg_trickle.merge_strategy = 'auto';

pg_trickle.merge_strategy_threshold

Added in v0.16.0. Delta ratio threshold for the auto merge strategy.

Type: float
Default: 0.01 (1%)
Range: 0.001–1.0
Context: SUSET
Restart Required: No

When merge_strategy is auto, DELETE+INSERT is used instead of MERGE when delta_rows / target_rows is below this threshold. The target row count is estimated from pg_class.reltuples.

Tuning Guidance:

  • Default (0.01): DELETE+INSERT for deltas under 1% of the target table size.
  • Higher values (0.05–0.10): More aggressive use of DELETE+INSERT; useful for wide tables where MERGE join overhead is high.
  • Lower values (0.001): Only use DELETE+INSERT for very tiny deltas.
-- Use DELETE+INSERT for deltas under 5% of target size
SET pg_trickle.merge_strategy_threshold = 0.05;

pg_trickle.merge_planner_hints

Deprecated in v0.14.0. Use pg_trickle.planner_aggressive instead. This GUC is still accepted for backward compatibility but is ignored at runtime.

Inject SET LOCAL planner hints before MERGE execution during differential refresh.

Type: bool
Default: true
Context: SUSET
Restart Required: No

When enabled, the refresh executor estimates the delta size and applies optimizer hints within the transaction:

  • Delta ≥ 100 rows: SET LOCAL enable_nestloop = off — forces hash joins instead of nested-loop joins.
  • Delta ≥ 10,000 rows: additionally SET LOCAL work_mem = '<N>MB' (see pg_trickle.merge_work_mem_mb).

This reduces P95 latency spikes caused by PostgreSQL choosing nested-loop plans for medium/large delta sizes.

Tuning Guidance:

  • Most workloads: Leave at true — the hints improve tail latency without affecting small deltas.
  • Custom plan overrides: Set to false if you manage planner settings yourself or if the hints conflict with your pg_hint_plan configuration.
-- Disable planner hints
SET pg_trickle.merge_planner_hints = false;

pg_trickle.merge_work_mem_mb

work_mem value (in MB) applied via SET LOCAL when the delta exceeds 10,000 rows and planner hints are enabled.

Type: int
Default: 64 (64 MB)
Range: 8–4096 (8 MB to 4 GB)
Context: SUSET
Restart Required: No

A higher value lets PostgreSQL use larger in-memory hash tables for the MERGE join, avoiding disk-spilling sort/merge strategies on large deltas. This setting is only applied when planner hints are enabled (pg_trickle.planner_aggressive = true) and the delta exceeds 10,000 rows.

Tuning Guidance:

  • Servers with ample RAM (32+ GB): Increase to 128–256 for faster large-delta refreshes.
  • Memory-constrained: Lower to 16–32 or disable planner hints entirely.
  • Very large deltas (100K+ rows): Consider 256–512 if refresh latency matters.
SET pg_trickle.merge_work_mem_mb = 128;

pg_trickle.delta_work_mem_cap_mb

Maximum work_mem (in MB) that planner hints are allowed to set during delta MERGE execution. When the deep-join or large-delta path would set work_mem above this cap, the refresh falls back to FULL instead of risking OOM.

Type: int
Default: 0 (disabled — no cap)
Range: 0–8192 (0 to 8 GB)
Context: SUSET
Restart Required: No

Set to 0 to disable the cap entirely (default). When enabled, the cap is checked before any SET LOCAL work_mem in apply_planner_hints(). If the configured or computed work_mem exceeds the cap, the refresh emits a NOTICE and falls back to FULL refresh.

Tuning Guidance:

  • Production servers with tight memory budgets: Set to 256–512 to prevent runaway hash joins.
  • Servers with ample RAM (64+ GB): Leave at 0 (disabled) or set high (2048+).
  • If you see SCAL-3 fallback notices: Either raise the cap or investigate why delta sizes are unexpectedly large.
SET pg_trickle.delta_work_mem_cap_mb = 512;

pg_trickle.merge_seqscan_threshold

Delta-to-ST row ratio below which sequential scans are disabled for the MERGE transaction. Requires planner hints to be enabled.

Type: real
Default: 0.001
Range: 0.0–1.0
Context: SUSET
Restart Required: No

When the estimated delta row count divided by the stream table's reltuples falls below this threshold, the refresh executor issues SET LOCAL enable_seqscan = off, coercing PostgreSQL into using the __pgt_row_id B-tree index instead of a full sequential scan.

Set to 0.0 to disable the feature entirely.

Tuning Guidance:

  • Default (0.001): Suitable for most workloads. A 10M-row ST with fewer than 10K delta rows triggers the hint.
  • High-throughput / small STs: Increase to 0.01 if your STs are small and you want more aggressive index usage.
  • Disable: Set to 0.0 if index-only scans are not beneficial for your access pattern.
SET pg_trickle.merge_seqscan_threshold = 0.01;

pg_trickle.auto_backoff

Automatically back off the refresh schedule when a stream table is consistently falling behind.

Type: bool
Default: on
Context: SUSET
Restart Required: No

When enabled (the default), the scheduler tracks a per-stream-table backoff factor. If a refresh cycle takes more than 95% of the scheduled interval, the backoff factor doubles (up to a fixed cap), effectively stretching the schedule to avoid runaway refresh storms. The factor resets to 1× on the first on-time completion, and a WARNING is emitted whenever the factor changes, so you always know why a stream table is refreshing more slowly than expected.

The 95% trigger threshold means that brief jitter on developer machines (e.g. a 950 ms refresh on a 1-second schedule) will correctly engage backoff, while a 900 ms refresh on the same schedule will not. The EC-11 operator alert (scheduler_falling_behind NOTIFY) continues to fire at the lower 80% threshold, giving you advance warning before the scheduler is actually stuck.

This is a safety net for overloaded systems — it prevents a single slow stream table from monopolizing the background worker when operators are not available to intervene.

Tuning Guidance:

  • Leave on (the default) for both production and development environments.
  • Disable only if you are deliberately running stream tables at the limit of their schedule budget and want the scheduler to keep trying at full speed regardless.
-- Disable if you want no backoff (not recommended for production)
SET pg_trickle.auto_backoff = off;

pg_trickle.tiered_scheduling

Enable tiered refresh scheduling (Hot/Warm/Cold/Frozen) for stream tables.

Type: bool
Default: on
Context: SUSET
Restart Required: No

When enabled, the scheduler applies a per-stream-table refresh tier multiplier to duration-based schedules. Each stream table has a refresh_tier column (default 'hot') that controls how often it is refreshed relative to its configured schedule:

  • hot (1×): Refresh at the configured schedule (default).
  • warm (2×): Refresh at 2× the configured interval.
  • cold (10×): Refresh at 10× the configured interval.
  • frozen (skip): Never refreshed until manually promoted.

Cron-based schedules are not affected by the tier multiplier.

Set the tier via:

SELECT pgtrickle.alter_stream_table('my_table', tier => 'warm');
SELECT pgtrickle.alter_stream_table('my_table', tier => 'frozen');

Design note: Tiers are user-assigned only. Automatic classification from pg_stat_user_tables was rejected because pg_trickle's own MERGE scans pollute the read counters, making auto-classification unreliable.

Tier Thresholds Reference

The following table summarizes the effective refresh behavior for each tier. All multipliers apply to duration-based schedules only — cron-based schedules are always honored as-is. New stream tables default to hot.

  • hot (1×, effective 1 s on a 1 s base): Real-time dashboards, alerting tables, SLA-bound queries.
  • warm (2×, effective 2 s): Important but not latency-critical tables; reduces CPU by 50%.
  • cold (10×, effective 10 s): Reporting tables queried infrequently; saves significant CPU.
  • frozen (skip, never refreshed until promoted): Archival tables, tables under maintenance, or seasonal reports.

When to use each tier:

  • Hot — default for all new stream tables. Appropriate when downstream consumers expect near-real-time freshness.
  • Warm — set for tables where a few seconds of staleness is acceptable. Halves the refresh CPU cost compared to Hot.
  • Cold — set for tables queried only by batch jobs or low-frequency dashboards. 10× reduction in refresh overhead.
  • Frozen — set when a table should not be refreshed at all (e.g., during a maintenance window or when the upstream source is being migrated). Promote back to Hot/Warm/Cold when ready.
-- Promote a frozen table back to warm
SELECT pgtrickle.alter_stream_table('seasonal_report', tier => 'warm');

-- Freeze a table during maintenance
SELECT pgtrickle.alter_stream_table('my_table', tier => 'frozen');

Changed in v0.12.0: The default for pg_trickle.tiered_scheduling changed from off to on. Set pg_trickle.tiered_scheduling = off in postgresql.conf to restore pre-v0.12.0 behavior (all STs refresh at full speed regardless of tier assignment).


Diamond Schedule Policy (per-stream-table)

Controls how the scheduler fires diamond consistency groups — sets of stream tables that share upstream sources through a diamond-shaped DAG topology.

Column: diamond_schedule_policy in pgt_stream_tables
Values: 'fastest' (default), 'slowest'
Set via: create_stream_table(..., diamond_schedule_policy => 'slowest')
Alter via: alter_stream_table('name', diamond_schedule_policy => 'slowest')

Only meaningful when diamond_consistency = 'atomic' is also set.

fastest (default): The atomic group fires when any member is due. This maximizes freshness but can cause CPU multiplication. In an asymmetric diamond where stream table B refreshes every 1 s and stream table C every 5 s, both feeding D with diamond_consistency = 'atomic': C refreshes 5× more often than its schedule because B triggers the group every second. For N members with schedules S₁ < S₂ < … < Sₙ, the total refresh count is N × (cycle_time / S₁), meaning slower members do up to Sₙ/S₁ times more work than their schedule implies.

slowest: The atomic group fires only when all members are due. This minimizes CPU cost at the expense of freshness — faster members are held back until the slowest member's schedule fires.

Tuning Guidance:

  • Use 'fastest' when freshness of the diamond tip matters and the cost of extra refreshes is acceptable.
  • Use 'slowest' when CPU budget is tight or members have very different schedules (e.g., 1 s vs 60 s) and the multiplication would be excessive.
-- Create with slowest policy to avoid CPU multiplication
SELECT pgtrickle.create_stream_table(
    'my_diamond_tip',
    'SELECT ... FROM a JOIN b ...',
    diamond_consistency => 'atomic',
    diamond_schedule_policy => 'slowest'
);

pg_trickle.use_prepared_statements

Use SQL PREPARE / EXECUTE for MERGE statements during differential refresh.

Type: bool
Default: true
Context: SUSET
Restart Required: No

When enabled, the refresh executor issues PREPARE __pgt_merge_{id} on the first cache-hit cycle, then uses EXECUTE on subsequent cycles. After approximately 5 executions, PostgreSQL switches from a custom plan to a generic plan, saving 1–2 ms of parse/plan overhead per refresh.

Tuning Guidance:

  • Most workloads: Leave at true — the cumulative parse/plan savings are significant for frequently-refreshed stream tables.
  • Highly skewed data: Set to false if prepared-statement parameter sniffing produces poor plans (e.g., highly skewed LSN distributions causing bad join estimates).
-- Disable prepared statements
SET pg_trickle.use_prepared_statements = false;

pg_trickle.user_triggers

Control how user-defined triggers on stream tables are handled during refresh.

Type: text
Default: 'auto'
Values: 'auto', 'off' ('on' accepted as a deprecated alias for 'auto')
Context: SUSET
Restart Required: No

When a stream table has user-defined row-level triggers, the refresh engine can decompose the MERGE into explicit DELETE + UPDATE + INSERT statements so triggers fire with correct TG_OP, OLD, and NEW values.

Values:

  • auto (default): Automatically detect user triggers on the stream table. If present, use the explicit DML path; otherwise use MERGE.
  • off: Always use MERGE. User triggers are suppressed during refresh. This is the escape hatch if explicit DML causes issues.
  • on: Deprecated compatibility alias for auto. Existing configs continue to work, but new configs should use auto.

Notes:

  • Row-level triggers do not fire during FULL refresh regardless of this setting. FULL refresh uses DISABLE TRIGGER USER / ENABLE TRIGGER USER to suppress them.
  • The explicit DML path adds ~25–60% overhead compared to MERGE for affected stream tables.
  • Stream tables without user triggers have zero overhead when using auto (only a fast pg_trigger check).
-- Auto-detect (default)
SET pg_trickle.user_triggers = 'auto';

-- Suppress triggers, use MERGE
SET pg_trickle.user_triggers = 'off';

-- Backward-compatible legacy setting (treated the same as 'auto')
SET pg_trickle.user_triggers = 'on';
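For illustration, a minimal audit trigger that depends on correct TG_OP values during refresh (the audit table, function, and the active_orders stream table are hypothetical names):

-- Hypothetical audit table and row-level trigger on a stream table
CREATE TABLE st_audit (op text, changed_at timestamptz);

CREATE FUNCTION log_st_change() RETURNS trigger LANGUAGE plpgsql AS $$
BEGIN
    INSERT INTO st_audit (op, changed_at) VALUES (TG_OP, now());
    RETURN NULL;  -- AFTER triggers ignore the return value
END $$;

CREATE TRIGGER active_orders_audit
AFTER INSERT OR UPDATE OR DELETE ON active_orders
FOR EACH ROW EXECUTE FUNCTION log_st_change();

With user_triggers = 'auto', the refresh engine detects this trigger and switches to the explicit DML path so TG_OP, OLD, and NEW are populated correctly on each refresh.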

Guardrails & Limits

Safety controls and hard limits.


pg_trickle.block_source_ddl

When enabled, column-affecting DDL (e.g., ALTER TABLE ... DROP COLUMN, ALTER TABLE ... ALTER COLUMN ... TYPE) on source tables tracked by stream tables is blocked with an ERROR instead of silently marking stream tables for reinitialization.

This is useful in production environments where you want to prevent accidental schema changes that would trigger expensive full recomputation of downstream stream tables.

Default: false
Context: Superuser

-- Block column-affecting DDL on tracked source tables
SET pg_trickle.block_source_ddl = true;

-- Allow DDL (stream tables will be marked for reinit instead)
SET pg_trickle.block_source_ddl = false;

Note: Only column-affecting changes are blocked. Benign DDL (adding indexes, comments, constraints) is always allowed regardless of this setting.


pg_trickle.buffer_alert_threshold

When any source table's change buffer exceeds this number of rows, a BufferGrowthWarning alert is emitted. Raise for high-throughput workloads, lower for small tables.

Default: 1000000 (1 million rows)
Range: 1000–100000000

SET pg_trickle.buffer_alert_threshold = 500000;

pg_trickle.compact_threshold

When a source table's pending change buffer exceeds this many rows, compaction is triggered before the next refresh cycle. Compaction eliminates net-zero INSERT+DELETE pairs (rows inserted then deleted within the same refresh window) and collapses multi-change groups to first+last rows per pk_hash, reducing delta scan overhead by 50–90% for high-churn tables.

Set to 0 to disable compaction.

Default: 100000 (100K rows)
Range: 0–100000000

-- Trigger compaction at 50K pending rows
SET pg_trickle.compact_threshold = 50000;

-- Disable compaction
SET pg_trickle.compact_threshold = 0;

pg_trickle.max_buffer_rows

Added in v0.16.0. Hard limit on change buffer rows per source table. When a source table's change buffer exceeds this limit at refresh time, pg_trickle forces a FULL refresh and truncates the buffer, preventing unbounded disk growth when differential refresh fails repeatedly.

Type: integer
Default: 1000000 (1 million rows)
Range: 0–100000000
Context: SUSET
Restart Required: No

Set to 0 to disable the limit (not recommended for production).

Tuning Guidance:

  • Most workloads: Leave at 1000000. This accommodates high-throughput tables while preventing runaway growth.
  • High-throughput event tables: Raise to 5000000–10000000 if your source tables regularly accumulate large change buffers between refresh cycles.
  • Small databases / tight disk budgets: Lower to 100000–500000 to limit change buffer disk usage.
-- Set buffer limit to 5 million rows
SET pg_trickle.max_buffer_rows = 5000000;

-- Disable the limit (not recommended)
SET pg_trickle.max_buffer_rows = 0;

pg_trickle.auto_index

Added in v0.16.0. Controls whether create_stream_table() automatically creates performance indexes on stream tables.

Type: bool
Default: true
Context: SUSET
Restart Required: No

When enabled, the following indexes are created automatically:

  1. GROUP BY composite index — for aggregate queries in DIFFERENTIAL mode, a composite index on the GROUP BY columns is created to speed up group lookups during MERGE.

  2. DISTINCT composite index — for DISTINCT queries with ≤ 8 output columns, a composite index on all output columns is created.

  3. Covering __pgt_row_id index — for stream tables with ≤ 8 output columns, the __pgt_row_id index includes all user columns via INCLUDE, enabling index-only scans during MERGE (20–50% faster for small deltas against large targets).

The __pgt_row_id index itself is always created regardless of this setting (it is required for correctness).

Tuning Guidance:

  • Most workloads: Leave at true.
  • Custom index strategies: Set to false if you prefer to manage indexes manually or if the auto-created indexes conflict with your workload patterns.
-- Disable automatic index creation
SET pg_trickle.auto_index = false;

pg_trickle.aggregate_fast_path

Added in v0.16.0. Controls whether stream tables with all-algebraic aggregates use the explicit DML fast-path instead of MERGE.

Type: bool
Default: true
Context: SUSET
Restart Required: No

When enabled, stream tables whose aggregates are all algebraically invertible (COUNT, SUM, AVG, STDDEV, VAR, CORR, REGR_*, etc.) use the explicit DML path (DELETE + UPDATE + INSERT via a materialized temp table) instead of the generic MERGE statement. This avoids the MERGE hash-join cost, which dominates for aggregate queries with high group cardinality.

Not eligible:

  • Queries with SEMI_ALGEBRAIC aggregates (MIN, MAX) — these may require group-rescan on extremum deletion
  • Queries with GROUP_RESCAN aggregates (STRING_AGG, ARRAY_AGG, JSON_AGG, etc.)
  • Queries with user-defined triggers on the stream table (already use explicit DML via the user-trigger path)

The explain_st() output shows the aggregate_path field:

  • explicit_dml — fast-path is active
  • merge — using the default MERGE path
  • merge (fast-path disabled) — eligible but GUC is off
-- Disable aggregate fast-path
SET pg_trickle.aggregate_fast_path = false;

-- Check the current aggregate path for a stream table
SELECT * FROM pgtrickle.explain_st('my_agg_st');

pg_trickle.template_cache

Added in v0.16.0. Controls the cross-backend delta template cache backed by an UNLOGGED catalog table.

Type: bool
Default: true
Context: SUSET
Restart Required: No

When enabled, delta SQL templates generated by the DVM engine are persisted in pgtrickle.pgt_template_cache so that new backends skip the ~45 ms parse+differentiate step on their first refresh of each stream table (down to ~1 ms SPI lookup).

Templates are automatically invalidated when:

  • A stream table's defining query changes (ALTER STREAM TABLE ... SET QUERY)
  • A stream table is dropped
  • A stream table is reinitialized

The explain_st() output includes template_cache (enabled/disabled) and template_cache_stats with L2 hit and full miss counters.

-- Disable the template cache for debugging
SET pg_trickle.template_cache = false;

-- Check template cache stats
SELECT * FROM pgtrickle.explain_st('my_st')
WHERE property IN ('template_cache', 'template_cache_stats');

pg_trickle.buffer_partitioning

Controls whether change buffer tables use PARTITION BY RANGE (lsn) for O(1) cleanup via partition detach instead of O(n) DELETE.

  • 'off' (default): Unpartitioned heap tables. Cleanup uses DELETE or TRUNCATE. Lowest DDL overhead per cycle.
  • 'on': Always create partitioned change buffers. Old partitions are detached and dropped after consumption — O(1) cleanup regardless of buffer size. Best for high-throughput sources where buffers routinely exceed compact_threshold.
  • 'auto': Start with unpartitioned buffers. If a buffer accumulates more rows than compact_threshold within a single refresh cycle, automatically promote it to RANGE(lsn) partitioned mode. Once promoted, the buffer stays partitioned. Combines low overhead for quiet sources with O(1) cleanup for hot ones.

Default: 'off'
Context: SUSET (superuser session-level)

-- Always partition change buffers
SET pg_trickle.buffer_partitioning = 'on';

-- Auto-promote based on throughput
SET pg_trickle.buffer_partitioning = 'auto';

-- Disable partitioning (default)
SET pg_trickle.buffer_partitioning = 'off';

Interaction with compact_threshold: In 'auto' mode, the compact_threshold value serves double duty — it triggers both compaction and the auto-promotion decision. Lowering compact_threshold makes auto-promotion more sensitive.


pg_trickle.max_grouping_set_branches

Maximum allowed grouping set branches in CUBE/ROLLUP queries. CUBE(n) produces 2^n branches — without a limit, large cubes cause memory exhaustion during parsing. Users who genuinely need more than 64 branches can raise this GUC.

Default: 64
Range: 1–65536

-- Allow up to 128 grouping set branches
SET pg_trickle.max_grouping_set_branches = 128;

pg_trickle.volatile_function_policy

Controls how volatile functions in defining queries are handled for DIFFERENTIAL and IMMEDIATE modes.

  • reject (default): Volatile functions cause an ERROR at stream table creation time.
  • warn: Volatile functions emit a WARNING but creation proceeds. Delta correctness is not guaranteed.
  • allow: Volatile functions are permitted silently. Use only when you understand that delta computation may produce incorrect results.

Default: reject
Context: SUSET (superuser session-level)

-- Allow volatile functions with a warning
SET pg_trickle.volatile_function_policy = 'warn';

-- Allow volatile functions silently
SET pg_trickle.volatile_function_policy = 'allow';

Note: Volatile functions (e.g., random(), clock_timestamp()) produce different values on each evaluation. In DIFFERENTIAL/IMMEDIATE modes, the delta computation assumes deterministic functions — volatile functions may cause stale or incorrect rows. FULL mode is unaffected since it recomputes from scratch every time.
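For example, under the default reject policy, a defining query that uses clock_timestamp() fails at creation time (a sketch; table name and the exact error text are illustrative):

-- Rejected under volatile_function_policy = 'reject' (default)
SELECT pgtrickle.create_stream_table(
    name     => 'recent_events',
    query    => 'SELECT * FROM events WHERE ts > clock_timestamp() - interval ''1 hour''',
    schedule => '30s'
);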


pg_trickle.unlogged_buffers

Create new change buffer tables as UNLOGGED to reduce WAL amplification from CDC trigger inserts.

  • false (default): Change buffers are WAL-logged. Crash-safe — no data loss on crash recovery.
  • true: New change buffers are created as UNLOGGED. Eliminates WAL writes for trigger-inserted rows, reducing WAL amplification by ~30%. Trade-off: buffers are truncated on crash recovery; affected stream tables automatically receive a FULL refresh on the next scheduler cycle.

Default: false
Context: SUSET (superuser session-level)

-- Enable UNLOGGED buffers for new stream tables
SET pg_trickle.unlogged_buffers = true;

Crash recovery: After a PostgreSQL crash or standby restart, UNLOGGED buffer tables are automatically truncated by PostgreSQL. The pg_trickle scheduler detects this condition and enqueues a FULL refresh for each affected stream table on the next tick. During the window between crash recovery and FULL refresh completion, stream table data may be stale.

Standby replicas: UNLOGGED tables are not replicated to standbys. Stream tables on read replicas will be stale after any standby restart until the next FULL refresh completes on the primary.

Converting existing buffers: This GUC only affects newly created change buffer tables. To convert existing logged buffers, use:

SELECT pgtrickle.convert_buffers_to_unlogged();

This function acquires ACCESS EXCLUSIVE lock on each buffer table. Run it during a low-traffic maintenance window.


pg_trickle.max_parse_depth

Maximum recursion depth for the query parser's tree visitors (G13-SD). Prevents stack-overflow crashes on pathological queries with deeply nested subqueries, CTEs, or set operations. When the limit is exceeded, the parser returns a QueryTooComplex error instead of crashing.

Default: 64
Range: 1–10000

-- Raise the limit for deeply nested queries
SET pg_trickle.max_parse_depth = 128;

pg_trickle.ivm_topk_max_limit

Maximum LIMIT value for TopK stream tables in IMMEDIATE mode. TopK queries exceeding this threshold are rejected because the inline micro-refresh (recomputing top-K rows on every DML statement) adds latency proportional to LIMIT. Set to 0 to disable TopK in IMMEDIATE mode entirely.

Default: 1000
Range: 0–1000000

-- Allow TopK up to LIMIT 5000 in IMMEDIATE mode
SET pg_trickle.ivm_topk_max_limit = 5000;

pg_trickle.ivm_recursive_max_depth

Maximum recursion depth for WITH RECURSIVE queries in IMMEDIATE mode. The semi-naive evaluation injects a __pgt_depth counter column into the recursive SQL; iteration stops when the counter reaches this limit. Protects against infinite recursion in pathological graphs.

Default: 100
Range: 1–10000

-- Allow deeper recursion for large hierarchies
SET pg_trickle.ivm_recursive_max_depth = 500;

Invalidation Ring & Deep-Join Tuning (v0.50.0)

SCAL-10-01 — Invalidation ring capacity ceiling

pg_trickle.invalidation_ring_capacity

Controls the number of slots in the per-backend invalidation ring buffer. When a source table DDL change (e.g. ALTER TABLE) or schema reload is detected, the extension marks affected stream-table OIDs in this ring so background refresh workers can schedule a full DAG rebuild.

Default: 128
Range: 1–1024
Hard ceiling: 1024 entries (enforced at registration time; values above 1024 are clamped to 1024)

-- Increase for deployments with many simultaneously-modified source tables
SET pg_trickle.invalidation_ring_capacity = 512;

Overflow behaviour

When more than invalidation_ring_capacity unique OIDs are invalidated in a single burst (e.g. during a schema migration touching hundreds of tables at once), the ring overflows. An overflow causes:

  1. A full DAG rebuild to be triggered on the next scheduler tick, regardless of which individual OIDs were invalidated.
  2. The invalidation_ring_overflows counter (visible via pgtrickle.reliability_counters() and the pg_trickle_reliability_invalidation_ring_overflows_total Prometheus metric) to be incremented by 1.

Overflows are safe but expensive — a full DAG rebuild scans all registered stream tables. A sustained non-zero overflow rate indicates that capacity should be increased.
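To check whether overflows are occurring, query the counters function named above:

-- A non-zero invalidation_ring_overflows count indicates capacity should be raised
SELECT * FROM pgtrickle.reliability_counters();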

Guidance for large deployments (1,000+ stream tables)

  • < 200 stream tables: 128 (default)
  • 200–500: 256
  • 500–1000: 512
  • > 1000: 1024 (maximum)

Note: Each ring slot consumes ~8 bytes of shared memory (allocated from the pg_trickle.max_shared_memory_kb budget). Increasing capacity by 896 slots (128 → 1024) uses an extra ~7 KB of shared memory, which is negligible.

Default: 128
Range: 1–1024
Context: postmaster (requires server restart)

COR-10-01 — Deep join chain threshold

pg_trickle.part3_max_scan_count

Maximum number of source-table rows the differential engine scans in Part 3 (direct scan strategy) before it escalates to a full deep-join delta recomputation.

Part 3 applies when each side of the join chain contributes no more than part3_max_scan_count source rows. Beyond this threshold the planner falls back to a more expensive multi-pass join, which is correct at any depth but generates larger SQL plans.

Default: 5
Range: 110000

-- Tighten to Part 3 only for very small dimension tables (≤3 rows):
SET pg_trickle.part3_max_scan_count = 3;

-- Relax for moderately sized dimensions where Part 3 SQL is manageable:
SET pg_trickle.part3_max_scan_count = 20;

Trade-off: SQL complexity vs. delta correctness at depth

  • Low (1–5): Part 3 used rarely; the deeper join strategy is always chosen for non-trivial chains. Correct, but larger delta SQL and higher planning time.
  • Default (5): Balanced; Part 3 applies only to tiny lookup tables (static enums, 1-row config rows). Recommended for most workloads.
  • High (50+): Part 3 used aggressively; SQL is simpler but intermediate delta estimates may miss correlated rows at join depth > 6.

Recommendation by join-chain depth

  • ≤ 6 table join chains: Default (5) is safe and near-optimal.
  • > 6 table join chains with small dimension tables: Increase to 10–20 only if you observe excessive delta SQL plan sizes in EXPLAIN output.
  • Analytical workloads with 10+ table star schemas: Leave at default 5 and rely on the planner's GROUP_RESCAN fallback for correctness.

Diagnostic: Set pg_trickle.log_format = 'verbose' and look for part3_direct_scan tags in the scheduler log to see how often Part 3 is being selected.
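For example:

-- Enable verbose scheduler logging to observe Part 3 selections in the log
SET pg_trickle.log_format = 'verbose';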

Default: 5
Range: 1–10000
Context: superuser

Parallel Refresh

These settings control whether and how the scheduler dispatches refresh work to multiple dynamic background workers instead of processing stream tables sequentially. See PLAN_PARALLELISM.md for the design.

Note: Parallel refresh has been the default (on) since v0.11.0. Use pg_trickle.parallel_refresh_mode = 'off' to revert to sequential execution.

pg_trickle.parallel_refresh_mode

Controls whether the scheduler dispatches refresh work to dynamic background workers.

Type: text
Default: 'on'
Values: 'off', 'dry_run', 'on'
Context: SUSET
Restart Required: No

  • on (default as of v0.11.0): True parallel refresh. The coordinator builds an execution-unit DAG, dispatches ready units to dynamic background workers, and respects both the per-database cap (max_concurrent_refreshes) and the cluster-wide cap (max_dynamic_refresh_workers).
  • dry_run: The scheduler computes execution units, logs dispatch decisions (unit keys, ready-queue contents, budget), but still executes refreshes inline. Useful for previewing parallel behaviour without actually spawning workers.
  • off: Sequential execution. All stream tables are refreshed one at a time in topological order by the single scheduler background worker.
-- Preview parallel dispatch decisions without changing runtime behaviour
SET pg_trickle.parallel_refresh_mode = 'dry_run';

-- Enable parallel refresh
SET pg_trickle.parallel_refresh_mode = 'on';

pg_trickle.max_dynamic_refresh_workers

Cluster-wide cap on concurrently active pg_trickle dynamic refresh workers.

Type: int
Default: 4
Range: 0–64
Context: SUSET
Restart Required: No

This is distinct from pg_trickle.max_concurrent_refreshes (per-database cap). When multiple databases each have their own scheduler, this GUC prevents them from overcommitting the shared PostgreSQL max_worker_processes budget.

Worker-budget planning: Each dynamic refresh worker consumes one max_worker_processes slot. In addition, pg_trickle uses one slot for the launcher and one per-database scheduler. Ensure:

max_worker_processes >= pg_trickle launchers (1)
                      + pg_trickle schedulers (1 per database)
                      + max_dynamic_refresh_workers
                      + autovacuum workers
                      + parallel query workers
                      + other extensions

A typical small deployment (1–2 databases, 4 parallel workers) needs at least max_worker_processes = 16. The E2E test Docker image uses 128.

-- Allow up to 8 concurrent refresh workers cluster-wide
SET pg_trickle.max_dynamic_refresh_workers = 8;
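Applying the budget formula above with illustrative numbers: one launcher + two schedulers (two databases) + four dynamic refresh workers, plus autovacuum and parallel-query headroom, fits comfortably in 16 slots:

-- Size the cluster-wide background worker budget (requires a server restart)
ALTER SYSTEM SET max_worker_processes = 16;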

pg_trickle.max_concurrent_refreshes

Per-database dispatch cap for parallel refresh workers.

Type: int
Default: 4
Range: 1–32
Context: SUSET
Restart Required: No

When parallel_refresh_mode = 'on', this limits how many execution units a single database coordinator may have in-flight at the same time. In sequential mode (parallel_refresh_mode = 'off'), this setting has no effect.

The effective concurrent refreshes for a database is:

min(max_concurrent_refreshes, max_dynamic_refresh_workers - workers_used_by_other_dbs)
-- Allow up to 8 concurrent refreshes in this database
SET pg_trickle.max_concurrent_refreshes = 8;

pg_trickle.per_database_worker_quota

Per-database dynamic refresh worker quota for multi-tenant cluster isolation.

Type: int
Default: 0 (disabled)
Range: 0–64
Context: SUSET
Restart Required: No

When greater than 0, each per-database scheduler limits itself to this many concurrently active dynamic refresh workers drawn from the shared max_dynamic_refresh_workers pool. This prevents a single busy database from starving others in multi-tenant clusters.

Burst capacity: when the cluster is lightly loaded (active workers < 80% of max_dynamic_refresh_workers), a database may temporarily exceed its quota by up to 50% to absorb sudden change backlogs. The burst is reclaimed automatically within 1 scheduler cycle once global load rises back above the 80% threshold.

Priority dispatch: within each dispatch tick, IMMEDIATE-trigger closures are dispatched before all other unit kinds, ensuring transactional consistency requirements are always met first, even under quota pressure.

-- Limit the analytics DB to 4 base workers (bursts to 6 when cluster is idle)
ALTER DATABASE analytics SET pg_trickle.per_database_worker_quota = 4;
-- Give the reporting DB only 2 base workers
ALTER DATABASE reporting  SET pg_trickle.per_database_worker_quota = 2;
SELECT pg_reload_conf();

When per_database_worker_quota = 0 (the default), this feature is disabled and all databases share the max_dynamic_refresh_workers pool on a first-come-first-served basis, bounded per coordinator by max_concurrent_refreshes.

Note: Set this GUC per-database with ALTER DATABASE rather than globally with ALTER SYSTEM, so different databases can have different quotas.


Advanced / Internal

pg_trickle.change_buffer_schema

Schema name for change-buffer tables created by the trigger-based CDC pipeline.

Default: 'pgtrickle_changes'

Change buffer tables are named <schema>.changes_<oid> where <oid> is the source table's OID. Placing them in a dedicated schema keeps them out of the public namespace.

SET pg_trickle.change_buffer_schema = 'my_change_buffers';

pg_trickle.foreign_table_polling

Enable polling-based change detection for foreign table sources. When enabled, the scheduler periodically re-executes the foreign table query and computes deltas via snapshot comparison (EXCEPT ALL). Foreign tables cannot use trigger or WAL-based CDC, so this is the only mechanism for incremental maintenance.

Default: false

SET pg_trickle.foreign_table_polling = true;

pg_trickle.matview_polling

Enable polling-based CDC for materialized views. When enabled, materialized views referenced in defining queries are supported via snapshot-comparison (the same mechanism as foreign table polling). A local shadow table stores the previous state; EXCEPT ALL computes the delta on each refresh cycle.

Type: boolean
Default: false
Context: SUSET (superuser)
Restart required: No

SET pg_trickle.matview_polling = true;

pg_trickle.cdc_trigger_mode

Controls the CDC trigger granularity: statement (default) or row.

statement uses statement-level AFTER triggers with transition tables (NEW TABLE / OLD TABLE). A single invocation per DML statement processes all affected rows in one bulk INSERT ... SELECT, giving 50–80% less write-side overhead for bulk UPDATE/DELETE. Single-row DML is unaffected.

row uses the legacy per-row trigger approach (pg_trickle < 0.4.0 behavior).

Changing this setting takes effect for newly installed CDC triggers. Call pgtrickle.rebuild_cdc_triggers() to migrate existing stream tables.

Type: string
Default: 'statement'
Valid values: statement, row
Context: SUSET (superuser)
Restart required: No

-- Switch to statement-level triggers (default, recommended)
SET pg_trickle.cdc_trigger_mode = 'statement';

-- After changing, rebuild existing triggers:
SELECT pgtrickle.rebuild_cdc_triggers();

pg_trickle.tick_watermark_enabled

Cap CDC consumption to the WAL LSN at scheduler tick start. When enabled (default), each scheduler tick captures pg_current_wal_lsn() at its start and prevents any refresh from consuming WAL changes beyond that LSN. This bounds cross-source staleness without requiring user configuration.

Disable only if you need stream tables to always advance to the latest available LSN.

Type: boolean
Default: true
Context: SUSET (superuser)
Restart required: No

-- Disable tick watermark bounding
SET pg_trickle.tick_watermark_enabled = false;

pg_trickle.watermark_holdback_timeout

Maximum seconds a user-provided watermark may remain un-advanced before being considered stuck. When a watermark group contains a source whose watermark has not been advanced within this timeout, downstream stream tables in that group are paused (refresh is skipped) and a pgtrickle_alert NOTIFY with watermark_stuck event is emitted.

When the stuck watermark is advanced again (via advance_watermark()), the pause is automatically lifted and a watermark_resumed event is emitted.

Set to 0 to disable stuck-watermark detection (default). Useful values depend on your ETL pipeline cadence: for a pipeline that loads every 5 minutes, a timeout of 600 (10 min) gives a safety margin.

Type: integer
Default: 0 (disabled)
Min: 0
Max: 86400 (24 hours)
Context: SUSET (superuser)
Restart required: No

-- Set stuck-watermark timeout to 10 minutes
ALTER SYSTEM SET pg_trickle.watermark_holdback_timeout = 600;
SELECT pg_reload_conf();

NOTIFY payloads:

{"event":"watermark_stuck","group":"order_pipeline","source_oid":16385,"age_secs":620}
{"event":"watermark_resumed","source_oid":16385}

pg_trickle.spill_threshold_blocks

Temp blocks written threshold for spill detection. After each differential MERGE, pg_trickle queries pg_stat_statements for the temp_blks_written metric. If the value exceeds this threshold, the refresh is considered a spill.

After spill_consecutive_limit consecutive spills, the scheduler forces a FULL refresh for that stream table to prevent repeated expensive differential merges.

Requires the pg_stat_statements extension to be installed. Set to 0 to disable spill detection (default).

Type: integer
Default: 0 (disabled)
Min: 0
Max: 100000000
Context: SUSET (superuser)
Restart required: No

-- Enable spill detection: flag > 1000 temp blocks as a spill
ALTER SYSTEM SET pg_trickle.spill_threshold_blocks = 1000;
SELECT pg_reload_conf();
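To see which statements are spilling, you can inspect pg_stat_statements directly (standard extension columns, nothing pg_trickle-specific):

-- Top temp-file writers; pg_trickle MERGE statements appear here when spilling
SELECT left(query, 60) AS query, temp_blks_written
FROM pg_stat_statements
ORDER BY temp_blks_written DESC
LIMIT 5;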

pg_trickle.spill_consecutive_limit

Number of consecutive spilling differential refreshes before the scheduler automatically forces a FULL refresh. Resets after any non-spilling refresh.

Only effective when spill_threshold_blocks > 0.

Type: integer
Default: 3
Min: 1
Max: 100
Context: SUSET (superuser)
Restart required: No

-- Force FULL after 5 consecutive spills (default: 3)
ALTER SYSTEM SET pg_trickle.spill_consecutive_limit = 5;
SELECT pg_reload_conf();

pg_trickle.log_merge_sql

Log the generated MERGE SQL template on every refresh cycle. When enabled, the MERGE SQL template built during differential refresh is emitted to the PostgreSQL server log at LOG level.

Intended for debugging MERGE query generation only. Do not enable in production — the output is verbose and includes the full SQL for every refresh.

Type: boolean
Default: false
Context: SUSET (superuser)
Restart required: No

SET pg_trickle.log_merge_sql = true;

Guardrails & Diagnostics

These GUCs control safety thresholds and diagnostic warnings.

pg_trickle.fuse_default_ceiling

Global default change-count ceiling for the fuse circuit breaker. When a stream table has fuse_mode = 'on' or 'auto' and no per-ST fuse_ceiling, this value is used. If pending changes exceed this count, the fuse blows and the stream table is suspended (status = SUSPENDED).

Set to 0 to disable the global default (per-ST ceilings still apply).

Type: integer
Default: 0 (disabled)
Range: 0–2,000,000,000
Context: SUSET (superuser)
Restart required: No

-- Set global fuse ceiling to 1 million rows
SET pg_trickle.fuse_default_ceiling = 1000000;

pg_trickle.delta_amplification_threshold

Delta amplification detection threshold (output/input ratio). When a DIFFERENTIAL refresh produces more than this multiple of the input delta rows, a WARNING is emitted so operators can identify pathological join fan-out or many-to-many amplification.

Set to 0.0 to disable.

Type: float
Default: 0.0 (disabled)
Range: 0.0–100,000.0
Context: SUSET (superuser)
Restart required: No

-- Warn when delta output is 10x the input
SET pg_trickle.delta_amplification_threshold = 10.0;

pg_trickle.algebraic_drift_reset_cycles

Differential cycles between automatic full recomputes for algebraic aggregates. After this many differential refresh cycles, stream tables with algebraic aggregates (AVG, STDDEV, VAR) are automatically reinitialized to reset accumulated floating-point drift in auxiliary columns.

Set to 0 to disable automatic resets.

Type: integer
Default: 0 (disabled)
Range: 0–100,000
Context: SUSET (superuser)
Restart required: No

-- Reset algebraic aggregates every 10,000 cycles
SET pg_trickle.algebraic_drift_reset_cycles = 10000;

pg_trickle.agg_diff_cardinality_threshold

Estimated GROUP BY cardinality threshold for algebraic aggregate warnings. At create_stream_table time, if the defining query uses algebraic aggregates (SUM, COUNT, AVG) in DIFFERENTIAL mode and the estimated group cardinality is below this threshold, a WARNING is emitted suggesting FULL or AUTO mode.

Set to 0 to disable the warning.

Type: integer
Default: 0 (disabled)
Range: 0–100,000,000
Context: SUSET (superuser)
Restart required: No

-- Warn when GROUP BY cardinality is below 100
SET pg_trickle.agg_diff_cardinality_threshold = 100;

Connection Pooler

v0.19.0+ (STAB-1).

pg_trickle.connection_pooler_mode

Cluster-wide connection pooler compatibility override.

Type: string
Default: 'off'
Valid values: 'off', 'transaction', 'session'
Context: SUSET

  • off (default): The per-ST pooler_compatibility_mode setting governs behaviour.
  • transaction: Globally disable prepared-statement reuse and suppress NOTIFY emissions (PgBouncer transaction-pool compatibility).
  • session: Explicit opt-in to session mode (same as off today, reserved for future use).

See Connection Pooler Compatibility for deployment guidance.

-- Enable transaction-mode pooler compatibility globally
SET pg_trickle.connection_pooler_mode = 'transaction';

History & Retention

v0.19.0+ (DB-5).

pg_trickle.history_retention_days

Number of days to retain rows in pgtrickle.pgt_refresh_history.

Type: integer
Default: 90
Min: 0 (disabled)
Max: 36500 (~100 years)
Context: SUSET

The scheduler runs a daily background cleanup that deletes rows older than this many days. Set to 0 to disable automatic cleanup (history grows unbounded — monitor disk usage).

-- Keep 30 days of refresh history
SET pg_trickle.history_retention_days = 30;

Circular Dependencies

v0.7.0+ — Circular dependency support is now fully available for safe monotone cycles in DIFFERENTIAL mode. These settings control whether cycles are allowed at all and how many fixpoint iterations the scheduler will try before surfacing a non-convergence error.

pg_trickle.allow_circular

Master switch for circular (cyclic) stream table dependencies. When false (default), creating a stream table that would introduce a cycle in the dependency graph is rejected with a CycleDetected error. When true, monotone cycles — those containing only safe operators (joins, filters, projections, UNION ALL, INTERSECT, EXISTS) — are allowed.

Non-monotone operators (Aggregate, EXCEPT, Window functions, NOT EXISTS) always block cycle creation regardless of this setting, because they cannot guarantee convergence to a fixed point.

Default: false

SET pg_trickle.allow_circular = true;
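With the switch enabled, here is a minimal sketch of a monotone cycle: transitive closure over a hypothetical edges(src, dst) table. This assumes a stream table's defining query may read from the stream table itself once circular dependencies are allowed; the name and schedule are illustrative.

-- Reachability: the defining query references the stream table via UNION ALL
SELECT pgtrickle.create_stream_table(
    name     => 'reachable',
    query    => 'SELECT src, dst FROM edges
                 UNION ALL
                 SELECT r.src, e.dst FROM reachable r JOIN edges e ON e.src = r.dst',
    schedule => '10s'
);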

pg_trickle.max_fixpoint_iterations

Maximum number of iterations per strongly connected component (SCC) before the scheduler declares non-convergence and marks all SCC members as ERROR. Prevents runaway loops in circular dependency chains.

For most practical use cases (transitive closure, graph reachability), convergence happens in 2–5 iterations. The default of 100 provides ample headroom.

Default: 100
Range: 1–10000

SET pg_trickle.max_fixpoint_iterations = 50;

pg_trickle.self_monitoring_auto_apply

Added in v0.20.0 (DF-G1).

Controls whether the self-monitoring analytics stream tables can automatically adjust stream table configuration.

  • off (default): Advisory only — no automatic changes. Dog-feeding stream tables produce analytics that operators and dashboards can read, but nothing is applied automatically.
  • threshold_only: After each 10-minute auto-apply cycle, reads df_threshold_advice. If a recommendation has HIGH confidence and the recommended threshold differs from the current threshold by more than 5%, applies ALTER STREAM TABLE ... SET auto_threshold = <recommended>. Changes are logged with initiated_by = 'SELF_MONITOR'.
  • full: Same as threshold_only, plus applies scheduling hints from df_scheduling_interference (future enhancement).

Default: off

-- Enable threshold auto-apply.
SET pg_trickle.self_monitoring_auto_apply = 'threshold_only';

-- Check current setting.
SHOW pg_trickle.self_monitoring_auto_apply;

Prerequisites: Dog-feeding stream tables must be created first via SELECT pgtrickle.setup_self_monitoring(). If the stream tables do not exist, the auto-apply worker is a no-op.

Rate limiting: At most one threshold change per stream table per 10 minutes.

Audit trail: All auto-apply changes are recorded in pgt_refresh_history with initiated_by = 'SELF_MONITOR' and a SKIP action describing the old and new threshold values.


Scheduler Scalability (v0.25.0)

pg_trickle.worker_pool_size

Added in v0.25.0 (SCAL-5).

Number of persistent background workers kept ready in a pool. When > 0, the scheduler reuses these workers across refresh cycles instead of spawning a new worker for each job, eliminating the ~2 ms per-worker startup cost.

Type: integer
Default: 0 (spawn-per-task)
Range: 0–64
Context: SUSET (superuser)
Restart required: Yes

-- Keep 4 persistent workers ready at all times
ALTER SYSTEM SET pg_trickle.worker_pool_size = 4;
-- Requires a server restart to take effect; pg_reload_conf() is not sufficient.

Set to 0 to use the original spawn-per-task model (default, no change from pre-v0.25.0 behavior).

pg_trickle.template_cache_max_entries

Added in v0.25.0 (CACHE-2).

Maximum number of entries in the per-backend L1 delta SQL template cache. When the cache reaches this limit, the least-recently-used entry is evicted.

Type: integer
Default: 0 (unbounded)
Range: 0–65536
Context: SUSET (superuser)
Restart required: No

-- Cap the template cache at 200 entries per backend
SET pg_trickle.template_cache_max_entries = 200;

Operability, Observability & DR (v0.27.0)

pg_trickle.metrics_port

Added in v0.27.0 (OP-2).

TCP port for the Prometheus/OpenMetrics endpoint served by the per-database background scheduler. When non-zero, GET /metrics returns all pg_trickle monitoring metrics in Prometheus text format.

Type: integer
Default: 0 (disabled)
Range: 0–65535
Context: SUSET (superuser)
Restart required: Yes

-- Expose metrics on port 9188 (per database)
ALTER DATABASE mydb SET pg_trickle.metrics_port = 9188;
-- Requires a server restart to take effect; pg_reload_conf() is not sufficient.

Use 0 (the default) to disable the HTTP endpoint. Each database runs its own scheduler, so the port must be unique per database on the same host.

pg_trickle.metrics_request_timeout_ms

Added in v0.27.0 (METR-2).

Maximum milliseconds the metrics HTTP handler is allowed to run. If a slow HTTP client holds the connection open longer, it is dropped. This protects the scheduler tick loop from being blocked by unresponsive Prometheus scrapers.

Type: integer
Default: 5000 (5 seconds)
Range: 0 (no timeout) – 600,000 (10 min)
Context: SUSET (superuser)
Restart required: No
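For example, to give slow scrapers more headroom:

-- Allow the metrics handler up to 10 seconds per request
SET pg_trickle.metrics_request_timeout_ms = 10000;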

pg_trickle.frontier_holdback_mode

Added in v0.27.0 (issue #536).

Controls how the scheduler prevents silent data loss from long-running transactions. When an uncommitted transaction has written rows to a source table, those change-buffer rows must not be included in a refresh until the transaction commits (or rolls back).

Type: text
Default: 'xmin'
Values: 'xmin', 'none', 'lsn:<N>'
Context: SUSET (superuser)
Restart required: No

  • 'xmin' (default): Probes pg_stat_activity + pg_prepared_xacts once per tick; caps the frontier to exclude rows from uncommitted transactions.
  • 'none': No holdback — maximum performance, but can skip rows from long-lived transactions. Not recommended for production.
  • 'lsn:<N>': Hold back by exactly N bytes. Debugging use only.

pg_trickle.frontier_holdback_warn_seconds

Added in v0.27.0 (issue #536).

Emit a WARNING when a holdback-causing transaction has been blocking frontier advancement for longer than this many seconds. The warning fires at most once per minute to avoid log spam. Useful for identifying forgotten long-running transactions.

| Property | Value |
| --- | --- |
| Type | integer |
| Default | 300 (5 minutes) |
| Range | 0 (disabled) – 3600 |
| Context | SUSET (superuser) |
| Restart required | No |

pg_trickle.publication_lag_warn_bytes

Added in v0.27.0 (PUB-1).

Emit a WARNING and defer change-buffer truncation when a downstream logical replication subscriber's confirmed WAL position lags behind the current change buffer by more than this many bytes.

| Property | Value |
| --- | --- |
| Type | integer |
| Default | 0 (disabled) |
| Range | 0 – 2,147,483,647 |
| Context | SUSET (superuser) |
| Restart required | No |

This prevents data loss for subscribers that rely on the change buffer as part of their replication pipeline.
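
For example, to warn once a subscriber falls more than 100 MB behind:

-- Threshold is in bytes: 100 MB
ALTER SYSTEM SET pg_trickle.publication_lag_warn_bytes = 104857600;
SELECT pg_reload_conf();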

pg_trickle.schedule_recommendation_min_samples

Added in v0.27.0 (PLAN-4).

Minimum number of refresh-history observations before pgtrickle.recommend_schedule() returns a recommendation with non-zero confidence. Raise this for more reliable recommendations; lower it to get early guidance on new stream tables.

| Property | Value |
| --- | --- |
| Type | integer |
| Default | 10 |
| Range | 1 – 1000 |
| Context | SUSET (superuser) |
| Restart required | No |
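
A sketch of getting early guidance on a new stream table; the single-argument recommend_schedule('<name>') call shape is assumed here, not normative:

-- Lower the sample floor, then ask for a recommendation
ALTER SYSTEM SET pg_trickle.schedule_recommendation_min_samples = 3;
SELECT pg_reload_conf();
SELECT * FROM pgtrickle.recommend_schedule('order_totals');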

pg_trickle.schedule_alert_cooldown_seconds

Added in v0.27.0 (PLAN-3).

Minimum seconds between consecutive predicted_sla_breach alerts for the same stream table. Prevents log spam when the cost model consistently predicts an imminent SLA violation.

| Property | Value |
| --- | --- |
| Type | integer |
| Default | 300 (5 minutes) |
| Range | 0 (no cooldown) – 86,400 |
| Context | SUSET (superuser) |
| Restart required | No |

Transactional Outbox (v0.28.0)

These GUCs control the transactional outbox subsystem. See the SQL Reference for the enable_outbox(), poll_outbox(), and consumer group functions.

pg_trickle.outbox_enabled

Master enable/disable switch for the outbox subsystem.

| Property | Value |
| --- | --- |
| Type | boolean |
| Default | true |
| Context | SUSET (superuser) |
| Restart required | No |

pg_trickle.outbox_retention_hours

Default retention period (in hours) for outbox rows. Rows older than this threshold are eligible for the background drain sweep. Can be overridden per stream table via enable_outbox(retention_hours => N).

| Property | Value |
| --- | --- |
| Type | integer |
| Default | 24 |
| Range | 1 – 87,600 (10 years) |
| Context | SUSET (superuser) |
| Restart required | No |
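
A per-table override using the retention_hours parameter mentioned above (the stream-table-name first argument is assumed):

-- Keep this stream table's outbox rows for 3 days instead of the global default
SELECT pgtrickle.enable_outbox('order_totals', retention_hours => 72);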

pg_trickle.outbox_drain_batch_size

Number of expired outbox rows deleted in a single background drain pass.

| Property | Value |
| --- | --- |
| Type | integer |
| Default | 1000 |
| Range | 1 – 1,000,000 |
| Context | SUSET (superuser) |
| Restart required | No |

pg_trickle.outbox_drain_interval_seconds

Seconds between background outbox drain sweeps. Set to 0 to disable automatic draining (you would then drain manually with outbox_rows_consumed()).

| Property | Value |
| --- | --- |
| Type | integer |
| Default | 60 |
| Range | 0 (disabled) – 86,400 |
| Context | SUSET (superuser) |
| Restart required | No |

pg_trickle.outbox_inline_threshold_rows

Maximum number of delta rows stored inline in the outbox payload. When a refresh delta exceeds this count, pg_trickle switches to claim-check mode: the payload is stored in a separate table and poll_outbox() returns is_claim_check = true with a NULL payload.

| Property | Value |
| --- | --- |
| Type | integer |
| Default | 10000 |
| Range | 0 (always inline) – 10,000,000 |
| Context | SUSET (superuser) |
| Restart required | No |
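
A consumer-side sketch for spotting claim-check deliveries; the poll_outbox() argument and column list here are assumptions, not the normative signature:

-- Claim-check rows carry no inline payload and must be fetched separately
SELECT *
FROM pgtrickle.poll_outbox('analytics_consumers')
WHERE is_claim_check;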

pg_trickle.outbox_skip_empty_delta

When true, no outbox row is written for refreshes that produce zero inserted and zero deleted rows. This reduces outbox table growth for frequently-scheduled stream tables with sparse updates.

| Property | Value |
| --- | --- |
| Type | boolean |
| Default | true |
| Context | SUSET (superuser) |
| Restart required | No |

pg_trickle.outbox_storage_critical_mb

Size threshold (in MB) at which the outbox table is considered critically large. When exceeded, a WARNING is emitted on each refresh cycle.

| Property | Value |
| --- | --- |
| Type | integer |
| Default | 1024 (1 GB) |
| Range | 1 – 10,000,000 (10 TB) |
| Context | SUSET (superuser) |
| Restart required | No |

pg_trickle.outbox_force_retention

When true, outbox rows are kept past their retention_hours expiry until all consumer groups have committed an offset past them. Prevents consumers that are temporarily offline from missing messages.

| Property | Value |
| --- | --- |
| Type | boolean |
| Default | false |
| Context | SUSET (superuser) |
| Restart required | No |

pg_trickle.consumer_dead_threshold_hours

Hours of silence (no heartbeat) after which a consumer is marked as dead and eligible for cleanup (when consumer_cleanup_enabled = true).

| Property | Value |
| --- | --- |
| Type | integer |
| Default | 24 |
| Range | 1 – 87,600 |
| Context | SUSET (superuser) |
| Restart required | No |

pg_trickle.consumer_stale_offset_threshold_days

Days of no offset progress after which a consumer's offset record is considered stale and eligible for cleanup.

| Property | Value |
| --- | --- |
| Type | integer |
| Default | 7 |
| Range | 1 – 3650 |
| Context | SUSET (superuser) |
| Restart required | No |

pg_trickle.consumer_cleanup_enabled

Enable automatic background cleanup of dead and stale consumer offsets and leases. When disabled, old records must be removed manually.

| Property | Value |
| --- | --- |
| Type | boolean |
| Default | true |
| Context | SUSET (superuser) |
| Restart required | No |

Transactional Inbox (v0.28.0)

These GUCs control the transactional inbox subsystem. See the SQL Reference for create_inbox() and related functions.

pg_trickle.inbox_enabled

Master enable/disable switch for the inbox subsystem.

| Property | Value |
| --- | --- |
| Type | boolean |
| Default | true |
| Context | SUSET (superuser) |
| Restart required | No |

pg_trickle.inbox_processed_retention_hours

Retention period (in hours) for successfully processed inbox messages (processed_at IS NOT NULL). Rows older than this threshold are deleted by the background drain sweep.

| Property | Value |
| --- | --- |
| Type | integer |
| Default | 72 (3 days) |
| Range | 1 – 87,600 |
| Context | SUSET (superuser) |
| Restart required | No |

pg_trickle.inbox_dlq_retention_hours

Retention period (in hours) for dead-letter queue rows. Set to 0 to keep DLQ rows indefinitely (useful for forensics and manual replay).

| Property | Value |
| --- | --- |
| Type | integer |
| Default | 0 (keep forever) |
| Range | 0 (keep forever) – 87,600 |
| Context | SUSET (superuser) |
| Restart required | No |

pg_trickle.inbox_drain_batch_size

Number of expired inbox messages deleted in a single background drain pass.

| Property | Value |
| --- | --- |
| Type | integer |
| Default | 1000 |
| Range | 1 – 1,000,000 |
| Context | SUSET (superuser) |
| Restart required | No |

pg_trickle.inbox_drain_interval_seconds

Seconds between inbox background drain sweeps. Set to 0 to disable automatic draining.

| Property | Value |
| --- | --- |
| Type | integer |
| Default | 60 |
| Range | 0 (disabled) – 86,400 |
| Context | SUSET (superuser) |
| Restart required | No |

pg_trickle.inbox_dlq_alert_max_per_refresh

Maximum number of DLQ alert events raised per refresh cycle. Limits log volume when many messages are failing simultaneously. Set to 0 to disable DLQ alerting.

| Property | Value |
| --- | --- |
| Type | integer |
| Default | 10 |
| Range | 0 (disabled) – 100 |
| Context | SUSET (superuser) |
| Restart required | No |
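
To silence DLQ alerting entirely (failed messages still land in the DLQ):

ALTER SYSTEM SET pg_trickle.inbox_dlq_alert_max_per_refresh = 0;
SELECT pg_reload_conf();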

Pre-GA Correctness & Stability (v0.30.0)

pg_trickle.use_sqlstate_classification

Added in v0.30.0 (SCAL-1).

When true, the retry classification for SPI errors uses the five-character PostgreSQL SQLSTATE class rather than message-text pattern matching. This makes retry decisions locale-independent (safe with lc_messages=fr_FR or other non-English locales).

| Property | Value |
| --- | --- |
| Type | boolean |
| Default | false |
| Context | SUSET (superuser) |
| Restart required | No |

-- Enable locale-safe SQLSTATE-based retry classification
ALTER SYSTEM SET pg_trickle.use_sqlstate_classification = true;
SELECT pg_reload_conf();

Set to true in any deployment where lc_messages is not en_US.UTF-8, or in mixed-locale clusters.

pg_trickle.template_cache_max_age_hours

Added in v0.30.0 (STAB-3).

Maximum age (in hours) for entries in the L2 catalog-backed template cache (pgtrickle.pgt_template_cache). Entries older than this limit are purged by the background scheduler on each tick. Keeping the cache fresh ensures stale delta SQL templates do not persist after a stream table schema change.

| Property | Value |
| --- | --- |
| Type | integer |
| Default | 168 (7 days) |
| Range | 1 – 8760 (1 year) |
| Context | SUSET (superuser) |
| Restart required | No |

-- Purge L2 cache entries older than 24 hours
ALTER SYSTEM SET pg_trickle.template_cache_max_age_hours = 24;
SELECT pg_reload_conf();

Lower values reduce the risk of stale templates surviving a schema change. Higher values improve performance by retaining warm cache entries across long maintenance windows.

pg_trickle.max_parse_nodes

Added in v0.30.0 (PERF-2).

Maximum estimated number of parse-tree nodes allowed in a stream table defining query. When > 0, queries whose estimated node count exceeds this limit are rejected with a QueryTooComplex error before the OpTree builder allocates memory. This guards against pathological queries such as WHERE id IN (1, 2, …, 1 000 000) that would otherwise exhaust per-backend memory during delta SQL generation.

| Property | Value |
| --- | --- |
| Type | integer |
| Default | 0 (disabled) |
| Range | 0 (disabled) – 10,000,000 |
| Context | SUSET (superuser) |
| Restart required | No |

-- Reject defining queries with more than 100 000 estimated nodes
ALTER SYSTEM SET pg_trickle.max_parse_nodes = 100000;
SELECT pg_reload_conf();

Node count is estimated conservatively as len(query) / 4. The default of 0 disables the check for backward compatibility. A value of 100000 is recommended for production deployments.


Citus Distributed Tables (v0.32.0+)

Configuration for Citus distributed-table CDC and stream table support. See docs/integrations/citus.md for the full setup guide.


pg_trickle.citus_st_lock_lease_ms

Duration in milliseconds of the pgtrickle.pgt_st_locks lease used for cross-node refresh coordination in Citus clusters.

The lease prevents two coordinator nodes from applying changes to the same distributed stream table simultaneously. When pg_ripple is deployed alongside pg_trickle, set this value higher than pg_ripple.merge_fence_timeout_ms so the lease cannot expire mid-merge.

| Property | Value |
| --- | --- |
| Type | integer |
| Default | 60000 (60 seconds) |
| Range | 0 (disabled) – 600,000 ms (10 minutes) |
| Context | SUSET (superuser) |
| Restart required | No |
| Added in | v0.33.0 (COORD-2) |

-- Align with a 30-second pg_ripple merge fence
ALTER SYSTEM SET pg_trickle.citus_st_lock_lease_ms = 45000;
SELECT pg_reload_conf();

pg_trickle.citus_worker_retry_ticks

Number of consecutive per-worker poll failures before the scheduler emits a WARNING in the PostgreSQL log and flags the worker in pgtrickle.citus_status. The warning is for operator attention — healthy workers continue refreshing uninterrupted while a failed worker is skipped.

Set to 0 to disable the alerting entirely (failures are still logged at LOG level per tick).

| Property | Value |
| --- | --- |
| Type | integer |
| Default | 5 |
| Range | 0 (disabled) – 100 |
| Context | SUSET (superuser) |
| Restart required | No |
| Added in | v0.34.0 (COORD-15) |

-- Alert after 3 consecutive failures instead of 5
ALTER SYSTEM SET pg_trickle.citus_worker_retry_ticks = 3;
SELECT pg_reload_conf();

-- Disable alerting (failures logged at LOG level only)
ALTER SYSTEM SET pg_trickle.citus_worker_retry_ticks = 0;
SELECT pg_reload_conf();

WAL Backpressure & Logging (v0.36.0)

pg_trickle.enforce_backpressure

When true, CDC trigger writes are suppressed when the WAL replication slot lag exceeds pg_trickle.slot_lag_critical_threshold_mb. Writes resume once the lag drops below 50 % of the threshold (hysteresis).

When false (default), pg_trickle only emits WARNING log messages when slot lag is critical but does not suppress writes. Use enforce_backpressure = true only when disk exhaustion is an immediate risk.

Important: enforce_backpressure = true operates in discard mode — changes that arrive while backpressure is active are dropped from the CDC buffer. Stream tables must be reinitialized after backpressure clears to recover from the data gap. See also pg_trickle.cdc_capture_mode.

| Property | Value |
| --- | --- |
| Type | boolean |
| Default | false |
| Context | SUSET (superuser) |
| Restart required | No |
| Added in | v0.36.0 (A12) |

-- Enable backpressure suppression (discard mode)
ALTER SYSTEM SET pg_trickle.enforce_backpressure = true;
SELECT pg_reload_conf();
-- After clearing: reinitialize affected stream tables
SELECT pgtrickle.refresh_stream_table('my_stream', 'FULL');

pg_trickle.log_format

Log format for pg_trickle structured log events.

  • "text" (default): Unstructured human-readable messages.
  • "json": Structured JSON with fields event, pgt_id, cycle_id, duration_ms, refresh_reason, error_code. Suitable for log aggregation pipelines (Loki, OpenTelemetry).

| Property | Value |
| --- | --- |
| Type | string |
| Default | text |
| Valid values | text, json |
| Context | SUSET (superuser) |
| Restart required | No |
| Added in | v0.36.0 (A20) |

-- Switch to JSON structured logging
ALTER SYSTEM SET pg_trickle.log_format = 'json';
SELECT pg_reload_conf();

pgVectorMV & OpenTelemetry (v0.37.0)

pg_trickle.enable_vector_agg

When true, avg(vector_col) and sum(vector_col) in stream table defining queries are handled by the DVM engine using incremental aggregate operators for vector, halfvec, and sparsevec types. Requires the pgvector extension.

| Property | Value |
| --- | --- |
| Type | boolean |
| Default | false |
| Context | SUSET (superuser) |
| Restart required | No |
| Added in | v0.37.0 (F4) |

ALTER SYSTEM SET pg_trickle.enable_vector_agg = true;
SELECT pg_reload_conf();

pg_trickle.enable_trace_propagation

When true, pg_trickle reads pg_trickle.trace_id from the session GUC at CDC capture time and stores the W3C traceparent in the change buffer column __pgt_trace_context. At refresh time, spans are exported via OTLP/gRPC to pg_trickle.otel_endpoint.

| Property | Value |
| --- | --- |
| Type | boolean |
| Default | false |
| Context | SUSET (superuser) |
| Restart required | No |
| Added in | v0.37.0 (F10) |

pg_trickle.otel_endpoint

OTLP/gRPC endpoint for OpenTelemetry span export. Empty string (default) disables span export.

| Property | Value |
| --- | --- |
| Type | string |
| Default | '' (disabled) |
| Example | 'http://jaeger:4317' |
| Context | SUSET (superuser) |
| Restart required | No |
| Added in | v0.37.0 (F10) |

-- Export spans to a local Jaeger instance
ALTER SYSTEM SET pg_trickle.otel_endpoint = 'http://localhost:4317';
ALTER SYSTEM SET pg_trickle.enable_trace_propagation = true;
SELECT pg_reload_conf();

pg_trickle.trace_id

Session-level W3C traceparent header for trace context propagation. Set this in the application session before DML so CDC capture links the changes to the initiating trace. Requires enable_trace_propagation = true.

| Property | Value |
| --- | --- |
| Type | string |
| Default | '' |
| Context | USERSET (any user) |
| Restart required | No |
| Added in | v0.37.0 (F10) |

-- Propagate a trace across CDC capture
SET pg_trickle.trace_id = '00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01';
INSERT INTO orders VALUES (42, 'widget', 5);

Operational Truthfulness (v0.39.0)

pg_trickle.cdc_capture_mode

Controls what happens to CDC writes when pg_trickle.cdc_paused = on.

  • "discard" (default): CDC trigger bodies return NULL; changes arriving while paused are dropped. Stream tables must be reinitialized after un-pausing to recover from the data gap.
  • "hold": Reserved for a future durable capture-and-hold mode. Currently emits a WARNING and falls back to "discard".

Operator checklist: Before setting cdc_paused = on, check pgtrickle.cdc_pause_status() to confirm the active mode. After un-pausing in discard mode, run pgtrickle.refresh_stream_table('<name>', 'FULL') for each affected stream table.

| Property | Value |
| --- | --- |
| Type | string |
| Default | discard |
| Valid values | discard, hold (reserved) |
| Context | SUSET (superuser) |
| Restart required | No |
| Added in | v0.39.0 (O39-8) |

-- Check the current CDC pause status
SELECT * FROM pgtrickle.cdc_pause_status();

-- Pause CDC (discard mode — changes arriving now are DROPPED)
ALTER SYSTEM SET pg_trickle.cdc_paused = on;
SELECT pg_reload_conf();

-- After maintenance, un-pause and reinitialize affected tables
ALTER SYSTEM SET pg_trickle.cdc_paused = off;
SELECT pg_reload_conf();
-- Full refresh to recover from the gap:
SELECT pgtrickle.refresh_stream_table('public.my_stream_table', 'FULL');

GUC Interaction Matrix

Some GUC variables interact with or depend on each other. The table below documents these cross-dependencies to help avoid misconfiguration.

| GUC A | GUC B | Interaction |
| --- | --- | --- |
| auto_backoff | min_schedule_seconds | auto_backoff stretches the effective interval up to 8× the configured schedule, but never below min_schedule_seconds. If min_schedule_seconds is high, backoff has limited room to operate. |
| auto_backoff | default_schedule_seconds | The backoff multiplier is applied to default_schedule_seconds (or the per-ST override); raising this value gives backoff a wider range. |
| parallel_refresh_mode | max_concurrent_refreshes | parallel_refresh_mode = 'on' dispatches independent STs to parallel workers, up to max_concurrent_refreshes per database. Setting max_concurrent_refreshes = 1 effectively disables parallelism even when the mode is 'on'. |
| parallel_refresh_mode | max_dynamic_refresh_workers | max_dynamic_refresh_workers is a cluster-wide cap across all databases. If you have 4 databases each wanting 4 concurrent refreshes, set this to ≥16 (or accept queuing). |
| max_dynamic_refresh_workers | per_database_worker_quota | When per_database_worker_quota > 0, each database claims at most that many workers from the shared max_dynamic_refresh_workers pool. Set per_database_worker_quota to max_dynamic_refresh_workers / n_databases for equal sharing. Burst to 150% is allowed when the cluster is < 80% loaded. |
| differential_max_change_ratio | fuse_default_ceiling | Both guard against large change batches but at different levels: differential_max_change_ratio triggers a FULL refresh fallback (proportional to table size), while fuse_default_ceiling halts refresh entirely (absolute row count). The fuse fires first if the change count exceeds it, regardless of the ratio. |
| block_source_ddl | DDL operations | When true, DDL on source tables (ALTER TABLE, DROP COLUMN) is blocked by an event trigger. Disable temporarily with SET pg_trickle.block_source_ddl = false before schema migrations, then re-enable. |
| cdc_mode | cdc_trigger_mode | cdc_trigger_mode ('statement' / 'row') only applies when CDC is trigger-based. When cdc_mode = 'wal' (or after auto-transition to WAL), cdc_trigger_mode is irrelevant. |
| cdc_mode | wal_transition_timeout | wal_transition_timeout only applies when cdc_mode = 'auto'. It controls how many seconds to wait for the first WAL-based refresh to succeed before falling back to triggers. |
| cleanup_use_truncate | compact_threshold | cleanup_use_truncate = true uses TRUNCATE to clear consumed change buffers (fastest, acquires AccessExclusiveLock briefly). compact_threshold controls when fully-consumed buffers are compacted via DELETE — only relevant when TRUNCATE is disabled. |
| buffer_partitioning | compact_threshold | In 'auto' mode, compact_threshold serves as the promotion trigger: if a buffer exceeds this many rows in a single refresh cycle, it is promoted to RANGE(lsn) partitioned mode. Lowering compact_threshold makes auto-promotion more sensitive. |
| allow_circular | max_fixpoint_iterations | max_fixpoint_iterations is only evaluated when allow_circular = true. It caps the number of convergence iterations for circular dependency chains. |
| ivm_topk_max_limit | TopK queries | Queries with LIMIT > ivm_topk_max_limit fall back to FULL refresh instead of the optimized TopK path. Raise this if you have legitimate large TopK queries. |
| ivm_recursive_max_depth | Recursive CTEs | Recursive expansion beyond ivm_recursive_max_depth iterations is terminated with a warning and falls back to FULL refresh. Set to 0 to disable the guard (not recommended). |

Tuning Profiles

Three named profiles for common deployment patterns. Copy the relevant settings into your postgresql.conf and adjust to taste.

Low-Latency Profile

Goal: Minimize end-to-end latency from base table write to stream table update. Best for dashboards, real-time analytics, and operational monitoring.

# Fast scheduling (polling-based, sub-200ms median latency)
pg_trickle.scheduler_interval_ms = 200       # poll interval
pg_trickle.min_schedule_seconds = 1
pg_trickle.default_schedule_seconds = 1

# Parallel refresh for independent STs
pg_trickle.parallel_refresh_mode = 'on'
pg_trickle.max_concurrent_refreshes = 4

# Lean merge
pg_trickle.merge_planner_hints = true
pg_trickle.merge_work_mem_mb = 128           # more memory = fewer disk sorts
pg_trickle.cleanup_use_truncate = true
pg_trickle.use_prepared_statements = true

# Guardrails
pg_trickle.auto_backoff = true               # prevent CPU runaway
pg_trickle.fuse_default_ceiling = 0          # disabled — latency over safety
pg_trickle.block_source_ddl = true

High-Throughput Profile

Goal: Maximize rows-per-second processed across many stream tables under heavy write load. Accepts slightly higher latency in exchange for better batching and resource efficiency.

# Batched scheduling
pg_trickle.scheduler_interval_ms = 2000      # 2-second poll interval
pg_trickle.min_schedule_seconds = 2
pg_trickle.default_schedule_seconds = 5

# Heavy parallelism
pg_trickle.parallel_refresh_mode = 'on'
pg_trickle.max_concurrent_refreshes = 8
pg_trickle.max_dynamic_refresh_workers = 8

# Aggressive performance
pg_trickle.merge_planner_hints = true
pg_trickle.merge_work_mem_mb = 256           # large work_mem for big deltas
pg_trickle.merge_seqscan_threshold = 0.01    # allow seq scans for >1% changes
pg_trickle.cleanup_use_truncate = true
pg_trickle.use_prepared_statements = true
pg_trickle.auto_backoff = true
pg_trickle.buffer_partitioning = 'auto'      # O(1) cleanup for hot buffers

# Safety for bulk workloads
pg_trickle.fuse_default_ceiling = 500000     # pause on >500K changes
pg_trickle.differential_max_change_ratio = 0.25  # FULL fallback at 25%
pg_trickle.block_source_ddl = true

Resource-Constrained Profile

Goal: Minimize CPU and memory footprint for small instances, shared hosting, or development environments. Accepts higher latency and slower throughput.

# Poll-based scheduling (conservative)
pg_trickle.scheduler_interval_ms = 5000      # 5-second poll

# Conservative scheduling
pg_trickle.min_schedule_seconds = 5
pg_trickle.default_schedule_seconds = 10

# Minimal parallelism
pg_trickle.parallel_refresh_mode = 'off'     # single-threaded refresh
pg_trickle.max_concurrent_refreshes = 1
pg_trickle.max_dynamic_refresh_workers = 1

# Conservative memory
pg_trickle.merge_work_mem_mb = 32
pg_trickle.merge_planner_hints = true
pg_trickle.cleanup_use_truncate = true

# Tight guardrails
pg_trickle.auto_backoff = true
pg_trickle.fuse_default_ceiling = 100000
pg_trickle.differential_max_change_ratio = 0.10
pg_trickle.block_source_ddl = true
pg_trickle.buffer_alert_threshold = 500000

Complete postgresql.conf Example

# Required
shared_preload_libraries = 'pg_trickle'

# Essential
pg_trickle.enabled = true
pg_trickle.cdc_mode = 'auto'
pg_trickle.scheduler_interval_ms = 1000
pg_trickle.min_schedule_seconds = 1
pg_trickle.default_schedule_seconds = 1
pg_trickle.max_consecutive_errors = 3

# WAL CDC
pg_trickle.wal_transition_timeout = 300
pg_trickle.slot_lag_warning_threshold_mb = 100
pg_trickle.slot_lag_critical_threshold_mb = 1024

# Refresh performance
pg_trickle.differential_max_change_ratio = 0.15
pg_trickle.merge_planner_hints = true
pg_trickle.merge_work_mem_mb = 64
pg_trickle.cleanup_use_truncate = true
pg_trickle.use_prepared_statements = true
pg_trickle.user_triggers = 'auto'

# Guardrails & limits
pg_trickle.block_source_ddl = false
pg_trickle.buffer_alert_threshold = 1000000
pg_trickle.compact_threshold = 100000
pg_trickle.buffer_partitioning = 'off'
pg_trickle.max_grouping_set_branches = 64
pg_trickle.max_parse_depth = 64
pg_trickle.ivm_topk_max_limit = 1000
pg_trickle.ivm_recursive_max_depth = 100

# Circular dependencies (v0.7.0+)
pg_trickle.allow_circular = false                # master switch
pg_trickle.max_fixpoint_iterations = 100         # convergence limit

# Parallel refresh (v0.11.0+, default 'on')
pg_trickle.parallel_refresh_mode = 'on'         # 'off' | 'dry_run' | 'on'
pg_trickle.max_dynamic_refresh_workers = 4       # cluster-wide worker cap
pg_trickle.max_concurrent_refreshes = 4          # per-database dispatch cap
pg_trickle.max_parallel_workers = 0              # user-facing parallel cap (0 = use automatic sizing)

# Predictive cost model (v0.22.0+)
pg_trickle.prediction_window = 60               # minutes of history for regression
pg_trickle.prediction_ratio = 1.5               # diff/full cost ratio threshold
pg_trickle.prediction_min_samples = 5           # minimum samples before model activates

# DVM scaling & diagnostics (v0.23.0+)
pg_trickle.log_delta_sql = false                # log delta SQL at DEBUG1 (diagnostic only)
pg_trickle.delta_work_mem = 0                   # work_mem MB for delta execution (0 = inherit)
pg_trickle.delta_enable_nestloop = true         # allow nested-loop joins in delta SQL
pg_trickle.analyze_before_delta = true          # ANALYZE change buffers before delta SQL
pg_trickle.max_change_buffer_alert_rows = 0     # change buffer overflow alert threshold (0 = off)
pg_trickle.diff_output_format = 'split'         # 'split' (DI-2 pairs) | 'merged' (compat)

# Scheduler scalability (v0.25.0+)
pg_trickle.worker_pool_size = 0                 # 0 = spawn-per-task; >0 = persistent pool
pg_trickle.template_cache_max_entries = 0       # 0 = unbounded

# Operability & observability (v0.27.0+)
pg_trickle.metrics_port = 0                     # 0 = disabled; set per-database
pg_trickle.frontier_holdback_mode = 'xmin'      # xmin | none | lsn:<N>
pg_trickle.frontier_holdback_warn_seconds = 300 # warn after 5 min of blocked frontier
pg_trickle.publication_lag_warn_bytes = 0       # 0 = disabled

# Transactional outbox (v0.28.0+)
pg_trickle.outbox_enabled = true
pg_trickle.outbox_retention_hours = 24
pg_trickle.outbox_inline_threshold_rows = 10000
pg_trickle.outbox_skip_empty_delta = true
pg_trickle.outbox_force_retention = false
pg_trickle.consumer_dead_threshold_hours = 24
pg_trickle.consumer_cleanup_enabled = true

# Transactional inbox (v0.28.0+)
pg_trickle.inbox_enabled = true
pg_trickle.inbox_processed_retention_hours = 72
pg_trickle.inbox_dlq_retention_hours = 0       # 0 = keep forever
pg_trickle.inbox_dlq_alert_max_per_refresh = 10

# Citus distributed tables (v0.32.0+)
pg_trickle.citus_st_lock_lease_ms = 60000      # lease duration for cross-node coordination
pg_trickle.citus_worker_retry_ticks = 5        # failures before WARNING in citus_status

# Advanced / internal
pg_trickle.change_buffer_schema = 'pgtrickle_changes'
pg_trickle.foreign_table_polling = false

Runtime Configuration

Most GUC variables can be changed at runtime by a superuser; GUCs marked "Restart required: Yes" above (for example worker_pool_size and metrics_port) need a full server restart instead:

-- View current settings
SHOW pg_trickle.enabled;
SHOW pg_trickle.parallel_refresh_mode;

-- Enable parallel refresh for current session
SET pg_trickle.parallel_refresh_mode = 'on';

-- Change persistently (requires reload)
ALTER SYSTEM SET pg_trickle.scheduler_interval_ms = 500;
SELECT pg_reload_conf();

Appendix: Deprecated / Compatibility GUCs

The GUCs listed below are deprecated and retained only for backward compatibility (to allow rolling upgrades without configuration errors). They have no effect on extension behaviour and should be removed from new deployments.

pg_trickle.event_driven_wake

⚠️ Removed in v0.51.0 — This GUC has been fully removed from the extension. If it appears in postgresql.conf after upgrading to v0.51.0, PostgreSQL will emit an "unrecognized configuration parameter" warning at startup. Remove it to suppress the warning.

Migration: Remove pg_trickle.event_driven_wake from postgresql.conf and any ALTER SYSTEM settings. No replacement is needed — the scheduler always uses efficient latch-based polling.

pg_trickle.wake_debounce_ms

⚠️ Removed in v0.51.0 — This GUC has been fully removed together with event_driven_wake. Remove it from postgresql.conf to avoid an "unrecognized configuration parameter" warning at startup.

Migration: Remove pg_trickle.wake_debounce_ms from postgresql.conf. No replacement is needed.

pg_trickle.merge_planner_hints

⚠️ Deprecated — accepted for backwards compatibility but has no effect. Will be removed in a future major version.

pg_trickle.user_triggers (value 'on')

⚠️ Deprecated — The value 'on' is accepted as a deprecated alias for 'auto' and has no distinct behaviour. Use 'auto' (default) or 'off' instead. The 'on' alias will be removed in a future major version.

Predictive Cost Model

pg_trickle's AUTO refresh mode does not just toggle between FULL and DIFFERENTIAL by hand-tuned thresholds. It runs a predictive cost model that estimates the expected cost of each mode for the next refresh, given the current change ratio and historical runtimes, and picks the cheaper one.

This page explains how the model works, the levers you can pull, and when it is safe to ignore the model and pin a mode by hand.


Why a cost model?

Differential refresh is dramatically faster than FULL refresh — when the change ratio is small. As the change ratio grows, the delta overhead (computing ΔQ, scanning change buffers, planning the MERGE) starts to dominate, and at some point a full recomputation wins.

A static threshold ("switch at 50%") is a reasonable default, but it is wrong in either direction for many real queries:

  • Aggregates with a few groups recompute trivially in FULL — DIFF has to do work for nothing.
  • Wide joins with selective filters benefit from DIFF even at very high change ratios.
  • A query whose source has just doubled in size will see different trade-offs from yesterday's plan.

The cost model uses measured last_full_ms and last_diff_ms together with the current change ratio to make a per-refresh decision.


Inputs the model uses

For each stream table, on each scheduler tick:

| Input | Source |
| --- | --- |
| change_ratio_current | pending_changes / source_row_count |
| last_full_ms | most recent full refresh duration |
| last_diff_ms | most recent differential refresh duration |
| pending_rows | size of the change buffer |
| delta_amplification_factor | learned multiplier: estimated delta volume given pending changes |
| cost_model_safety_margin (GUC) | bias toward FULL or DIFF |

It then computes:

predicted_diff_ms = base_diff_overhead
                  + per_row_diff_cost × pending_rows × delta_amplification_factor

predicted_full_ms = last_full_ms × source_growth_factor

AUTO chooses the cheaper one (after applying the safety margin).
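
A worked example with illustrative numbers (not measured defaults): suppose base_diff_overhead = 8 ms, per_row_diff_cost = 0.002 ms, pending_rows = 5,000, delta_amplification_factor = 1.3, last_full_ms = 180 ms, and source_growth_factor = 1.05:

predicted_diff_ms = 8 + 0.002 × 5,000 × 1.3 = 21 ms
predicted_full_ms = 180 × 1.05 = 189 ms

With the default safety margin of 1.20, 21 × 1.20 = 25.2 ms < 189 ms, so AUTO picks DIFFERENTIAL.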


Inspect what the model would do

-- Recommendation and reasoning for a single stream table
SELECT * FROM pgtrickle.recommend_refresh_mode('order_totals');
-- recommended_mode | reason                           | composite_score
-- DIFFERENTIAL     | change ratio 0.018, est. 22 ms   | 0.31

-- Rolling efficiency: refresh durations vs source-table size
SELECT * FROM pgtrickle.refresh_efficiency('order_totals');

-- Or for the whole catalogue
SELECT pgt_name, recommended_mode, change_ratio, est_diff_ms, est_full_ms
FROM pgtrickle.recommend_refresh_mode_all();

Tuning levers

| GUC | Default | Effect |
| --- | --- | --- |
| pg_trickle.cost_model_safety_margin | 1.20 | Multiplier on the predicted DIFF cost. > 1.0 biases toward FULL. |
| pg_trickle.differential_max_change_ratio | 0.50 | Hard cap: never pick DIFF above this ratio. |
| pg_trickle.adaptive_full_threshold | 0.50 | Force FULL when change ratio exceeds this (legacy fallback). |
| pg_trickle.delta_amplification_threshold | 5.0 | When delta_volume / pending_changes > T, prefer FULL. |
| pg_trickle.max_delta_estimate_rows | 10000000 | Cap on the model's estimate; above this, prefer FULL. |
| pg_trickle.planner_aggressive | off | Allow the model to override per-table refresh-mode hints. |

A typical "trust the model" setup:

ALTER SYSTEM SET pg_trickle.cost_model_safety_margin = '1.0';
ALTER SYSTEM SET pg_trickle.planner_aggressive       = 'on';
SELECT pg_reload_conf();

A typical "be conservative" setup (prefer FULL on uncertainty):

ALTER SYSTEM SET pg_trickle.cost_model_safety_margin       = '1.5';
ALTER SYSTEM SET pg_trickle.differential_max_change_ratio  = '0.30';

When to override the model

There are good reasons to pin a mode manually:

  • Queries the model can't see well. Holistic aggregates (PERCENTILE_*, STRING_AGG) often plan badly under DIFF; pin to FULL.
  • Workloads with large but cheap full refreshes. Tiny aggregates that re-aggregate from a small source — the FULL plan is so cheap that the model's DIFF estimate cannot beat it.
  • Predictable bursts. If you know that 9–10 a.m. is your upload window, pre-emptively ALTER STREAM TABLE … SET refresh_mode = 'FULL' for that window.

Pin a mode with:

SELECT pgtrickle.alter_stream_table('order_totals', refresh_mode => 'FULL');

The model will not override a manually-pinned mode unless planner_aggressive = on.


Verifying the model in production

-- Distribution of refresh modes and durations over the last hour
WITH recent AS (
    SELECT pgt_name, refresh_mode, started_at, finished_at,
           EXTRACT(epoch FROM (finished_at - started_at)) * 1000 AS ms
    FROM pgtrickle.pgt_refresh_history
    WHERE started_at > now() - interval '1 hour'
)
SELECT pgt_name,
       refresh_mode,
       AVG(ms) AS avg_ms,
       COUNT(*) AS n
FROM recent
GROUP BY pgt_name, refresh_mode
ORDER BY pgt_name;

If a stream table is consistently slow in AUTO mode, look at the distribution of modes chosen. Often the answer is "the model is flapping" — increase cost_model_safety_margin or pin a mode.


See also: Performance Cookbook · tuning-refresh-mode · Configuration – Refresh Performance · SQL Reference – Diagnostics

Pre-Deployment Checklist

Complete this checklist before deploying pg_trickle to a new environment. Each item links to the relevant documentation for details.

Version: v0.14.0+. Earlier versions may have different requirements.


1. PostgreSQL Version

  • PostgreSQL 18.x is required (pg_trickle is compiled against PG 18)
  • Extension binary matches your exact PostgreSQL major version
SELECT version();  -- Must show PostgreSQL 18.x

2. shared_preload_libraries

pg_trickle must be loaded at server startup via shared_preload_libraries. Without this, GUC variables and the background scheduler are not available.

# postgresql.conf
shared_preload_libraries = 'pg_trickle'
  • shared_preload_libraries includes pg_trickle
  • PostgreSQL has been restarted after changing this setting (reload is not sufficient)
SHOW shared_preload_libraries;  -- Must include pg_trickle

Managed PostgreSQL: Some providers (Supabase, Neon) do not support custom shared_preload_libraries. Check your provider's extension compatibility list. AWS RDS and Google Cloud SQL support custom shared libraries via parameter groups.


3. WAL Level (optional, for WAL-based CDC)

pg_trickle works without wal_level = logical — it uses trigger-based CDC by default. However, WAL-based CDC provides lower overhead on write-heavy workloads.

# postgresql.conf (optional — for WAL-based CDC)
wal_level = logical
max_replication_slots = 10   # At least 1 per tracked source table
  • Decide: trigger-based CDC (default) or WAL-based CDC
  • If WAL: wal_level = logical and server restarted
  • If WAL: max_replication_slots is sufficient for your source table count

Note: CDC mode is configurable per stream table. The default cdc_mode = 'auto' starts with triggers and transitions to WAL automatically when wal_level = logical is detected. See CONFIGURATION.md for details.


4. Extension Installation

CREATE EXTENSION pg_trickle;

-- Verify installation
SELECT extname, extversion FROM pg_extension WHERE extname = 'pg_trickle';
  • Extension created successfully
  • Version matches expected release

5. Background Scheduler

The scheduler runs as a background worker and manages automatic refresh. Verify it's running:

SELECT pid, backend_type, state
FROM pg_stat_activity
WHERE backend_type = 'pg_trickle scheduler';
  • Scheduler process is visible in pg_stat_activity
  • pg_trickle.enabled = true (default; set to false to disable)

6. Connection Pooler Compatibility

PgBouncer (Transaction Mode)

PgBouncer in transaction pooling mode drops session state between transactions. pg_trickle needs special handling:

  • Enable pooler_compatibility_mode on affected stream tables:
SELECT pgtrickle.alter_stream_table('my_st',
    pooler_compatibility_mode => true);
  • Or set globally via GUC:
pg_trickle.pooler_compatibility_mode = true

PgBouncer (Session Mode)

Session mode preserves session state — no special configuration needed.

Supavisor / Other Poolers

Some poolers (Supavisor, pgcat) have their own compatibility characteristics. Test with pgtrickle.validate_query() before deploying.
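
A quick pre-flight check; the exact validate_query() call shape (defining query text in, diagnostics out) is assumed here:

SELECT * FROM pgtrickle.validate_query(
    'SELECT id, val FROM _deploy_test_src'
);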


7. Configuration Review

These are sensible defaults for most workloads. Adjust based on monitoring data.

# Core settings (usually fine as defaults)
pg_trickle.enabled = true                    # Enable scheduler
pg_trickle.schedule_interval = '5s'          # Global default refresh interval
pg_trickle.max_concurrent_refreshes = 4      # Parallel refresh limit

# Performance tuning
pg_trickle.planner_aggressive = true         # Let the cost model override refresh-mode hints
pg_trickle.tiered_scheduling = true          # Tier-aware scheduling

# CDC mode
pg_trickle.cdc_mode = 'auto'                # auto | trigger | wal

# Safety
pg_trickle.unlogged_buffers = false          # true = faster but not crash-safe
pg_trickle.fuse_default_ceiling = 10000      # Auto-fuse change threshold
  • Review GUC values for your workload
  • See CONFIGURATION.md for the full reference

8. Resource Planning

Memory

  • Each background worker uses a separate PostgreSQL backend
  • work_mem applies to each worker's delta SQL execution
  • Monitor RSS growth via pg_stat_activity or OS-level tools

Storage

  • Change buffer tables (pgtrickle_changes.changes_*) grow between refreshes
  • Buffer size depends on DML rate × refresh interval
  • Monitor via pgtrickle.shared_buffer_stats()

Connections

  • The scheduler uses pg_trickle.max_concurrent_refreshes backend connections
  • Ensure max_connections has headroom for workers + application
  • max_connections is at least application connections + pg_trickle.max_concurrent_refreshes + 5


9. Monitoring Setup

Essential Queries

-- Stream table health overview
SELECT pgt_name, status, staleness, refresh_mode
FROM pgtrickle.stream_tables_info
ORDER BY staleness DESC NULLS LAST;

-- Refresh efficiency
SELECT pgt_name, diff_speedup, avg_change_ratio
FROM pgtrickle.refresh_efficiency();

-- Error states
SELECT pgt_name, status, last_error_message, last_error_at
FROM pgtrickle.pgt_stream_tables
WHERE status IN ('ERROR', 'SUSPENDED');

Grafana / Prometheus

See the monitoring/ directory for ready-to-use Grafana dashboards and Prometheus configuration.

  • Monitoring configured for stream table health
  • Alerting on ERROR/SUSPENDED status

10. Backup & Restore

pg_trickle stream tables are standard PostgreSQL tables and are included in pg_dump / pg_restore. See BACKUP_AND_RESTORE.md for details.

  • Backup strategy accounts for both source tables and stream tables
  • Restore procedure tested (stream tables may need re-initialization)

Quick Validation Script

Run this after deployment to verify everything is working:

-- 1. Extension loaded
SELECT extname, extversion FROM pg_extension WHERE extname = 'pg_trickle';

-- 2. Scheduler running
SELECT COUNT(*) > 0 AS scheduler_alive
FROM pg_stat_activity
WHERE backend_type = 'pg_trickle scheduler';

-- 3. Create a test stream table
CREATE TABLE _deploy_test_src (id INT PRIMARY KEY, val INT);
INSERT INTO _deploy_test_src VALUES (1, 100), (2, 200);

SELECT pgtrickle.create_stream_table(
    '_deploy_test_st',
    'SELECT id, val FROM _deploy_test_src',
    refresh_mode => 'FULL'
);

SELECT pgtrickle.refresh_stream_table('_deploy_test_st');

-- 4. Verify data
SELECT * FROM _deploy_test_st ORDER BY id;
-- Expected: (1, 100), (2, 200)

-- 5. Cleanup
SELECT pgtrickle.drop_stream_table('_deploy_test_st');
DROP TABLE _deploy_test_src;

Connection Pooler Compatibility

Added in v0.19.0 (UX-4 / STAB-1).

pg_trickle uses prepared statements and NOTIFY internally. These features require special handling when a connection pooler sits between the application and PostgreSQL.

PgBouncer Transaction Mode

In PgBouncer transaction pooling mode, each transaction may land on a different server-side connection. Prepared statements and LISTEN/NOTIFY do not survive across transactions.

Recommended configuration:

# postgresql.conf
pg_trickle.connection_pooler_mode = 'transaction'

This cluster-wide GUC:

  • Disables prepared-statement reuse for all stream tables.
  • Suppresses NOTIFY pg_trickle_refresh emissions (listeners on other connections will not receive them anyway in transaction mode).

Alternatively, enable pooler compatibility per stream table:

SELECT pgtrickle.alter_stream_table('my_stream_table',
    pooler_compatibility_mode => true);

PgBouncer Session Mode

Session pooling is fully compatible — no special configuration needed.

pgcat / Supavisor

These poolers generally support prepared statements and NOTIFY. Set pg_trickle.connection_pooler_mode = 'off' (the default).

Kubernetes / CNPG

See Scaling — CNPG for connection pooler configuration in Kubernetes environments.


Row-Level Security

Important: pg_trickle background workers execute refresh queries with SET LOCAL row_security = off. This is intentional and matches the semantics of PostgreSQL's REFRESH MATERIALIZED VIEW.

Implications

  • Stream table output always contains the full, unfiltered result set regardless of RLS policies on source tables.
  • Row-Level Security policies on source tables do not filter what ends up in a stream table.
  • If the source table has RLS and the defining query selects *, all rows (including those that would be hidden by RLS for normal roles) will be included in the stream table.

Mitigations

  • Audit all stream table queries: ensure sensitive columns are excluded or aggregated.
  • Do not expose stream tables directly to end-user roles if the source tables are protected by RLS.
  • Use a per-role VIEW on top of the stream table to re-apply filtering: CREATE VIEW orders_view AS SELECT * FROM order_totals_st WHERE user_id = current_user_id().
  • Consider column-level masking extensions (e.g., anon) on the stream table output view.
  • Review pgtrickle.list_stream_tables() output for any stream tables selecting from RLS-protected sources.
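
A minimal hardening sketch of the per-role view pattern (table, column, and role names here are illustrative):

-- Hide the stream table itself; expose a filtered view instead
REVOKE ALL ON order_totals_st FROM PUBLIC;

CREATE VIEW order_totals_visible WITH (security_barrier) AS
    SELECT * FROM order_totals_st
    WHERE tenant_id = current_setting('app.tenant_id', true)::int;

GRANT SELECT ON order_totals_visible TO app_reader;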

Scaling Guide

This document provides guidance for scaling pg_trickle to hundreds of stream tables and beyond. It covers worker pool sizing, scheduler tuning, and diagnostic queries for identifying bottlenecks.

Architecture Overview

pg_trickle uses a two-tier background worker model:

  1. Launcher — one per server. Scans pg_database every 10 seconds, spawns per-database schedulers, and auto-restarts crashed workers.
  2. Per-database scheduler — one per database. Wakes every scheduler_interval_ms (default: 1 s), reads DAG changes from shared memory, consumes CDC buffers, and dispatches refreshes.

When parallel_refresh_mode = 'on', the scheduler dispatches refresh work to a pool of dynamic background workers instead of running refreshes inline.

Worker Pool Sizing

| Deployment Size | Stream Tables | Recommended max_dynamic_refresh_workers | Notes |
| --- | --- | --- | --- |
| Small | 1–20 | 2–4 | Default (4) is usually sufficient |
| Medium | 20–100 | 4–8 | Monitor worker saturation |
| Large | 100–200 | 8–16 | Enable tiered scheduling |
| Very Large | 200+ | 16–32 | Tune per-database quotas |

Budget Formula

Worker slots are drawn from max_worker_processes, which is shared with autovacuum, parallel queries, and other extensions:

max_worker_processes >= launchers(1)
                      + schedulers(N_databases)
                      + max_dynamic_refresh_workers
                      + autovacuum_max_workers
                      + max_parallel_workers
                      + other_extensions

Example for 200 STs across 2 databases with 16 workers:

# postgresql.conf
max_worker_processes = 40
pg_trickle.max_dynamic_refresh_workers = 16
pg_trickle.max_concurrent_refreshes = 8
pg_trickle.per_database_worker_quota = 8
pg_trickle.parallel_refresh_mode = 'on'

Tiered Scheduling

For deployments with 50+ stream tables, enable tiered scheduling to reduce scheduler overhead:

pg_trickle.tiered_scheduling = on   # default since v0.12.0

The scheduler classifies stream tables into tiers based on change frequency:

| Tier | Schedule Multiplier | Behavior |
| --- | --- | --- |
| Hot | 1× (base interval) | Tables with frequent changes |
| Warm |  | Tables with moderate changes |
| Cold | 10× | Tables with rare changes |
| Frozen | skip | Tables with no recent changes |

This reduces the CPU cost of the scheduling loop itself, which can become a bottleneck at 200+ STs when every table is polled every cycle.

Dispatch Priority

When multiple stream tables are ready simultaneously, the scheduler dispatches in priority order:

  1. IMMEDIATE closures — time-critical refresh requests
  2. Atomic groups / Repeatable-read groups / Fused chains — multi-ST units
  3. Singletons — individual stream tables
  4. Cyclic SCCs — strongly-connected components

Within each priority band, the tier sort applies (Hot > Warm > Cold).

Per-Database Quotas and Burst

When per_database_worker_quota > 0, each database gets a guaranteed slice of the worker pool:

  • Normal load (cluster < 80% capacity): database can burst to 150% of its quota using idle capacity from other databases.
  • High load (cluster ≥ 80% capacity): strict quota enforcement.

This prevents a single high-traffic database from starving others.

Monitoring

Worker Pool Status

SELECT * FROM pgtrickle.worker_pool_status();
-- Returns: active_workers, max_workers, per_db_cap, parallel_mode

Active Job Details

SELECT * FROM pgtrickle.parallel_job_status(300);
-- Returns recent jobs (last 300s): status, duration, worker PID, etc.

Health Summary

SELECT * FROM pgtrickle.health_summary();
-- Returns: total/active/error/suspended/stale counts, scheduler status, cache hit rate

Buffer Backlog Check

SELECT * FROM pgtrickle.change_buffer_sizes()
ORDER BY row_count DESC
LIMIT 20;

Identifying Bottlenecks

Is the scheduler loop the bottleneck?

-- If queue_depth is consistently > 10 while workers are not saturated,
-- the scheduler loop is the bottleneck. Reduce scheduler_interval_ms.
SELECT (SELECT active_workers FROM pgtrickle.worker_pool_status()) AS active_workers,
       (SELECT max_workers FROM pgtrickle.worker_pool_status()) AS max_workers,
       (SELECT COUNT(*) FROM pgtrickle.parallel_job_status(5)
        WHERE status = 'QUEUED') AS queue_depth;

Are workers saturated?

-- If active_workers == max_workers consistently, increase the pool.
SELECT active_workers >= max_workers AS saturated
FROM pgtrickle.worker_pool_status();

Which STs take the longest?

SELECT st.pgt_schema, st.pgt_name,
       AVG(EXTRACT(EPOCH FROM (h.end_time - h.start_time))) AS avg_sec,
       MAX(EXTRACT(EPOCH FROM (h.end_time - h.start_time))) AS max_sec,
       COUNT(*) AS refreshes
FROM pgtrickle.pgt_refresh_history h
JOIN pgtrickle.pgt_stream_tables st ON st.pgt_id = h.pgt_id
WHERE h.start_time > now() - interval '1 hour'
  AND h.status = 'COMPLETED'
GROUP BY st.pgt_schema, st.pgt_name
ORDER BY avg_sec DESC
LIMIT 20;

Tuning Profiles

Low-Latency (< 50 ms P99)

pg_trickle.scheduler_interval_ms = 200
pg_trickle.parallel_refresh_mode = 'on'
pg_trickle.max_dynamic_refresh_workers = 8
pg_trickle.tiered_scheduling = on

High-Throughput (200+ STs)

pg_trickle.scheduler_interval_ms = 500
pg_trickle.parallel_refresh_mode = 'on'
pg_trickle.max_dynamic_refresh_workers = 16
pg_trickle.max_concurrent_refreshes = 8
pg_trickle.per_database_worker_quota = 8
pg_trickle.tiered_scheduling = on
pg_trickle.merge_work_mem_mb = 128

Resource-Constrained (4 CPU / 8 GB RAM)

pg_trickle.scheduler_interval_ms = 2000
pg_trickle.parallel_refresh_mode = 'on'
pg_trickle.max_dynamic_refresh_workers = 2
pg_trickle.max_concurrent_refreshes = 2
pg_trickle.tiered_scheduling = on
pg_trickle.delta_work_mem_cap_mb = 256
pg_trickle.merge_work_mem_mb = 32

Profiling Methodology

To profile worker utilization at scale, run a test with 200+ stream tables and max_dynamic_refresh_workers set to 4, 8, and 16 in turn. Collect the following metrics at 1-second intervals:

-- Worker pool utilization over time
SELECT now() AS ts,
       (SELECT active_workers FROM pgtrickle.worker_pool_status()) AS active,
       (SELECT max_workers FROM pgtrickle.worker_pool_status()) AS pool_size,
       (SELECT COUNT(*) FROM pgtrickle.parallel_job_status(5)
        WHERE status = 'QUEUED') AS queue_depth;

Plot active / pool_size (utilization) and queue_depth over time. If utilization is consistently > 90% with non-zero queue depth, the pool is undersized. If utilization is < 50%, the pool is oversized and consuming max_worker_processes slots unnecessarily.

Known Scaling Limits

| Resource | Practical Limit | Bottleneck |
| --- | --- | --- |
| Stream tables per DB | ~500 | Scheduler loop CPU |
| Worker pool size | 64 | GUC max |
| Change buffer rows | max_buffer_rows (default 1M) | Disk I/O |
| Template cache size | 128 entries (L1) | Evictions increase at >128 STs |
| DAG depth | ~20 levels | Topological sort + cascade latency |

Read Replicas & Hot Standby

Added in v0.19.0 (SCAL-1 / STAB-2).

pg_trickle is a primary-only extension. Stream tables are maintained by the background scheduler through DML (INSERT, DELETE, MERGE), which is only possible on the primary server.

Behaviour on Replicas

When the pg_trickle shared library is loaded on a read replica (physical standby or streaming replica):

  1. The launcher worker detects pg_is_in_recovery() = true and enters a sleep loop, checking every 30 seconds for promotion.
  2. Upon promotion (e.g. pg_promote()), the launcher resumes normal operation and spawns per-database schedulers.
  3. Manual refresh calls (pgtrickle.refresh_stream_table()) on a replica are rejected with a clear error message.

Deployment recommendations:

  • Include pg_trickle in shared_preload_libraries on both primary and replicas. This ensures immediate availability after failover without a restart.
  • Stream tables are read-queryable on replicas via physical replication — the storage tables are regular PostgreSQL tables that replicate normally.
  • Monitor the replication lag to estimate stream table staleness on replicas.

CNPG & Kubernetes Operations

Added in v0.19.0 (SCAL-3).

CloudNativePG (CNPG) is the recommended Kubernetes operator for running pg_trickle. The extension is packaged as a custom container image that extends the official PostgreSQL image.

Container Image

Build the pg_trickle image using the provided Dockerfiles:

# GHCR image (multi-stage build)
docker build -f Dockerfile.ghcr -t pg-trickle:latest .

# Or use the CNPG-specific Dockerfile
docker build -f cnpg/Dockerfile.ext -t pg-trickle-cnpg:latest .

CNPG Cluster Configuration

apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: pg-trickle-cluster
spec:
  instances: 3
  imageName: your-registry/pg-trickle:0.19.0
  postgresql:
    shared_preload_libraries:
      - pg_trickle
    parameters:
      pg_trickle.enabled: "true"
      pg_trickle.scheduler_interval_ms: "1000"
      pg_trickle.max_concurrent_refreshes: "4"
      # STAB-1: If using PgBouncer sidecar in transaction mode:
      # pg_trickle.connection_pooler_mode: "transaction"

Operational Notes

  • Failover: pg_trickle detects promotion automatically (see Read Replicas above). After CNPG promotes a replica, the launcher starts within 30 seconds.
  • Scaling replicas: Stream table data replicates to all replicas via physical replication. No pg_trickle-specific configuration needed on replicas.
  • Backup: Use CNPG's built-in Barman backup. pg_trickle's catalog tables are included automatically. See Backup & Restore.
  • Monitoring: The Prometheus endpoint (pgtrickle.health_summary()) is compatible with CNPG's monitoring sidecar. See the Grafana dashboards in monitoring/grafana/.

Cluster-wide Worker Fairness (v0.27.0)

When pg_trickle is installed across multiple databases on the same PostgreSQL instance, all scheduler background workers share a single worker pool bounded by pg_trickle.max_dynamic_refresh_workers. Without care, high-throughput databases can starve lower-priority databases of worker slots.

Quota allocation

Use the quota formula to distribute workers fairly:

per_db_quota = ceil(max_dynamic_refresh_workers / N_databases)
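
For example, a pool of 16 workers shared by 3 databases (illustrative values):

# ceil(16 / 3) = 6 workers guaranteed per database
pg_trickle.max_dynamic_refresh_workers = 16
pg_trickle.per_database_worker_quota   = 6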

For high-priority databases, increase their individual quota via ALTER DATABASE SET:

ALTER DATABASE tenant_high SET pg_trickle.per_database_worker_quota = 4;

Monitoring cluster-wide allocation

Use pgtrickle.cluster_worker_summary() to monitor allocation in real time:

SELECT db_name, active_workers, total_active_workers
FROM pgtrickle.cluster_worker_summary()
ORDER BY active_workers DESC;

See docs/integrations/multi-tenant.md for the complete multi-tenant deployment guide, Prometheus configuration, and Grafana dashboard snippets.

Per-database Prometheus labels

From v0.27.0, all metrics include db_oid and db_name labels so Grafana dashboards can filter by database without requiring separate scrape targets:

rate(pg_trickle_refreshes_total{db_name="tenant_a"}[5m])

See also: Capacity Planning · Configuration · Multi-Database Deployments · Performance Cookbook · Cost Model · Pre-Deployment Checklist

Capacity Planning

This page helps you estimate the resources pg_trickle will need before you put it in production: disk for change buffers and WAL, memory for refresh execution, CPU for the scheduler and refresh workers, and connection budget for background workers.

The rules of thumb here are starting points. The Performance Cookbook and Scaling Guide cover how to tune once you have real data to work with.


Quick sizing table

| Deployment size | Stream tables | Source tables | Sustained write rate | Recommended starting config |
| --- | --- | --- | --- | --- |
| Small | 1–20 | 1–20 | < 100/s | All defaults |
| Medium | 20–100 | 20–100 | 100–1,000/s | parallel_refresh_mode=on, max_dynamic_refresh_workers=4 |
| Large | 100–500 | 50–500 | 1,000–10,000/s | tiered_scheduling=on, max_dynamic_refresh_workers=8, WAL CDC |
| Very large | 500+ | 500+ | > 10,000/s | Add per-database quotas; consider Citus |

Disk: change buffers

Each source table referenced by a stream table gets its own change buffer (pgtrickle_changes.changes_<oid>). One row per captured change.

Per-row size estimate (trigger CDC):

~ row_overhead (24 B)
+ key_columns (≈ 2 × avg_key_size)
+ referenced_columns (sum of referenced col sizes)
+ bitmap (1–2 B for narrow tables, ~ ncols/8 otherwise)

Rule of thumb: budget ~1.5 KB per captured change for a typical wide-row OLTP table, ~150 B for a narrow lookup table.

Steady-state size:

buffer_bytes ≈ writes_per_second × refresh_interval × per_row_size × (1 - compaction_ratio)

Compaction collapses cancelling INSERT/DELETE pairs and successive updates to the same row. Typical compaction ratios:

| Workload | Compaction ratio |
| --- | --- |
| Append-only event log | 0% |
| Mixed OLTP | 30–60% |
| High-churn (frequent UPDATEs to same key) | 70–95% |

Worked example. A source table doing 5,000 writes/s, refreshed every 5 s, with 50% compaction and 1 KB rows:

5000 × 5 × 1024 × 0.5 ≈ 12.8 MiB per refresh cycle

That is the peak size of the buffer between refreshes. After the refresh, the consumed rows are deleted (or TRUNCATEd depending on pg_trickle.cleanup_use_truncate).

Alerts. Set pg_trickle.buffer_alert_threshold (default 100000 rows) so a WARNING is logged before a buffer becomes unbounded.


Disk: WAL retention (WAL CDC mode)

If you use pg_trickle.cdc_mode = 'auto' or 'wal', each source table gets a logical replication slot. PostgreSQL retains WAL until every active slot has consumed it.

Worst-case retention = slot_lag_critical_threshold_mb (default 1024 MB). Add this per source to your WAL disk budget if you expect occasional refresh delays.

Recommended monitoring:

SELECT * FROM pgtrickle.check_cdc_health()
WHERE severity != 'OK';

Memory: refresh execution

Each refresh runs a MERGE (DIFFERENTIAL) or full INSERT … SELECT (FULL). Memory usage is dominated by hash tables and sorts:

peak_memory ≈ work_mem × (number of hash/sort nodes in the plan)

For most stream tables, work_mem = 64 MB is comfortable. Wide joins or large GROUP BY may benefit from 256 MB. Use pg_trickle.merge_work_mem_mb to set it per-refresh without affecting the rest of the database.

pg_trickle.spill_threshold_blocks controls when intermediate results spill to disk; raise it on memory-rich servers.
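
For example, to give delta MERGEs more sort/hash memory without raising work_mem globally:

-- 256 MB for refresh-time hash and sort nodes only
ALTER SYSTEM SET pg_trickle.merge_work_mem_mb = 256;
SELECT pg_reload_conf();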


CPU and worker processes

The scheduler is one background worker per database. Refresh work is either inline (default) or dispatched to a dynamic worker pool (pg_trickle.parallel_refresh_mode = on).

max_worker_processes  ≥  1 (launcher)
                       + N (one scheduler per pg_trickle database)
                       + max_dynamic_refresh_workers
                       + autovacuum_max_workers
                       + max_parallel_workers
                       + other extensions

A typical safe starting point:

# postgresql.conf
max_worker_processes              = 32
max_parallel_workers              = 8
pg_trickle.max_dynamic_refresh_workers = 4

The max_worker_processes default of 8 is usually too low — the Pre-Deployment Checklist calls this out as the most common silent misconfiguration.


DAG topology and scheduling overhead

The scheduler walks the dependency DAG every tick (default 1 s). Per-stream-table overhead is small (sub-millisecond) but not zero. For very large DAGs:

| Stream tables | Recommended scheduler interval |
| --- | --- |
| < 50 | 1000 ms (default) |
| 50–200 | 1000–2000 ms, plus tiered_scheduling=on |
| 200–1000 | 2000–5000 ms + Hot/Warm/Cold tiers |
| 1000+ | Consider splitting across databases |

The scheduler's zero-change overhead is documented in README – Zero-Change Latency (target < 10 ms).


Connection budget

Each parallel-refresh worker uses one PostgreSQL backend slot. The scheduler uses one. The launcher uses one. So:

backends_used_by_pg_trickle = 1 + databases + max_dynamic_refresh_workers

If you front PostgreSQL with PgBouncer, this is separate from your application's pool — pg_trickle's background workers connect directly, not through the pooler.
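
For example, 2 pg_trickle databases with a 4-worker pool budget 1 + 2 + 4 = 7 backends. A quick count (assuming worker backend_type values contain 'pg_trickle', as with the scheduler check in the Pre-Deployment Checklist):

SELECT backend_type, COUNT(*)
FROM pg_stat_activity
WHERE backend_type LIKE '%pg_trickle%'
GROUP BY backend_type;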


Network (Citus & multi-node)

In Citus deployments, the coordinator polls each worker's WAL slot on every scheduler tick via dblink. Bandwidth scales with the delta volume, not the source-table size, but you still need a fast and reliable coordinator-to-worker network.

Plan for:

  • One TCP connection per worker per polling cycle.
  • Short, frequent reads.
  • Tolerance for individual worker failures — pg_trickle.citus_worker_retry_ticks controls when failures escalate to WARNING.

Forecasting growth

A workable rough model for a year of growth:

year_1_disk = current_buffer_peak × growth_factor
            + current_storage × growth_factor × number_of_stream_tables
            + WAL_retention_budget

Stream-table storage itself is just an ordinary heap table — its size is the size of the result set, no different from a materialized view.


Sanity-check queries

-- Per-source change-buffer size
SELECT * FROM pgtrickle.change_buffer_sizes()
ORDER BY pending_rows DESC;

-- Per-stream-table refresh stats
SELECT pgt_name, last_full_ms, last_diff_ms, p95_ms
FROM pgtrickle.st_refresh_stats()
ORDER BY p95_ms DESC NULLS LAST;

-- Worker-pool saturation
SELECT * FROM pgtrickle.worker_pool_status();

See also: Scaling Guide · Performance Cookbook · Configuration · Pre-Deployment Checklist

Multi-Database Deployments

pg_trickle is multi-database aware. A single PostgreSQL server can host pg_trickle in any number of databases simultaneously, and the extension's background workers handle the fan-out automatically. You do not need to start anything per database; the launcher discovers them.


Architecture in one diagram

┌─────────────────────────────────────────────────────────────┐
│                  PostgreSQL 18 server                        │
│                                                              │
│  ┌────────────┐                                              │
│  │  Launcher  │  ── scans pg_database every ~10 s            │
│  └─────┬──────┘                                              │
│        │ spawns                                              │
│   ┌────┼────────────────────┬────────────────────┐           │
│   ▼    ▼                    ▼                    ▼           │
│ ┌──────────┐         ┌──────────┐         ┌──────────┐       │
│ │Scheduler │         │Scheduler │         │Scheduler │       │
│ │   db_a   │         │   db_b   │         │  db_etl  │       │
│ └────┬─────┘         └────┬─────┘         └────┬─────┘       │
│      │ refresh jobs       │ refresh jobs       │             │
│      ▼                    ▼                    ▼             │
│  ┌─────────────────────────────────────────────────┐         │
│  │ Shared dynamic refresh worker pool               │         │
│  │ (max_dynamic_refresh_workers, per-DB quotas)     │         │
│  └─────────────────────────────────────────────────┘         │
└─────────────────────────────────────────────────────────────┘
  • One launcher per PostgreSQL server.
  • One scheduler per database that has pg_trickle installed.
  • One shared dynamic worker pool for parallel refreshes.

The launcher restarts crashed schedulers automatically. Each scheduler is fully independent; failure in one database does not affect the others.


Enabling pg_trickle in additional databases

Just install the extension. The launcher will pick it up on the next discovery cycle (within ~10 s).

\c db_b
CREATE EXTENSION pg_trickle;

Verify the scheduler started:

SELECT * FROM pgtrickle.cluster_worker_summary();

Expected: one row per database with a column showing the scheduler PID and uptime.


Resource budgeting across databases

Background-worker slots are a finite, server-wide resource. The formula:

max_worker_processes ≥ launchers(1)
                     + schedulers(N_databases)
                     + max_dynamic_refresh_workers
                     + autovacuum_max_workers
                     + max_parallel_workers
                     + other_extensions

For 4 databases each running pg_trickle, with 8 dynamic refresh workers and modest autovacuum:

max_worker_processes                    = 32
max_parallel_workers                    = 8
pg_trickle.max_dynamic_refresh_workers  = 8
pg_trickle.per_database_worker_quota    = 4

Without per_database_worker_quota, one busy database can starve the others. Set it to max_dynamic_refresh_workers / N_databases or higher.


Cluster-wide observability

-- One row per database with per-database stats
SELECT * FROM pgtrickle.cluster_worker_summary();

-- Combined health across every database
SELECT datname, severity, message
FROM pgtrickle.cluster_health_check()    -- (where exposed)
WHERE severity != 'OK';

The pgtrickle workers command also aggregates across databases when invoked against a server-level URL.


Common patterns

Per-tenant database

If you isolate tenants by database, each gets its own scheduler. Combined with tiered_scheduling = on, low-traffic tenants pay almost no scheduler cost.

App database + analytics database

A common topology: app_db runs OLTP and a few IMMEDIATE stream tables; analytics_db (often a logical replica) runs the heavy DIFFERENTIAL aggregates. The launcher handles both.

Tenant-of-tenants (Citus)

For very large multi-tenant deployments, see Citus — distributed sources can replace per-tenant databases.


Caveats

  • pg_trickle does not create cross-database stream tables. Stream tables live in exactly one database; their sources must live there too. Use logical replication or downstream publications to bridge databases.
  • shared_preload_libraries = 'pg_trickle' is server-wide. The launcher then discovers per-database state.
  • pg_trickle.enabled = off disables the launcher (and therefore every scheduler) — use it for maintenance windows.

See also: Scaling Guide · Capacity Planning · Configuration

Backup and Restore

pg_trickle plays nicely with every standard PostgreSQL backup mechanism — pg_dump, pg_basebackup, pgBackRest, WAL archiving, PITR, and pre-built tools like CloudNativePG and Crunchy Operator. The catalog, change buffers, and stream-table contents are all ordinary PostgreSQL relations, so they get backed up like anything else.

This page walks through the recommended workflows, the gotchas, and how the v0.27 Snapshots API fits in.

TL;DR. Physical backups (pgBackRest, pg_basebackup) just work. pg_dump works too, with one small ordering rule. Snapshots are an application-level tool for derived state, not a backup replacement.


Choosing the right tool

Tool                                  Best for                                              Notes
pgBackRest / WAL-G / pg_basebackup    Production backup & PITR                              Full-fidelity; no special pg_trickle steps
pg_dump / pg_restore                  Logical copies, dev environments, schema migration    Works; restore order matters slightly
Stream-table snapshots                Replica bootstrap, archival of derived state,         Not a substitute for a real backup
                                      fast rollback of one stream table

Physical backups (pgBackRest, pg_basebackup, WAL-G)

Physical backups copy the data directory at the file-system level. Everything is captured: source tables, stream-table storage, the pgtrickle.* catalog, the pgtrickle_changes.* change buffers, and (in WAL CDC mode) the replication slots' on-disk state.

Restore procedure:

  1. Restore the data directory exactly as you would for any PostgreSQL database.
  2. Start PostgreSQL.
  3. The pg_trickle launcher discovers each database on the next tick (~10 s) and resumes the per-database scheduler.

There is nothing pg_trickle-specific to do.

Point-in-time recovery (PITR). PITR works as expected. If you recover to a point in the middle of a refresh, that refresh is marked failed in pgtrickle.pgt_refresh_history on first start; the next scheduler tick re-runs it. No data loss.

WAL CDC slots after restore. If you were running in pg_trickle.cdc_mode = 'wal' and the restored cluster came up without the original slots (e.g. a logical-decoding replica that did not inherit slots), pg_trickle's scheduler detects the absence and re-bootstraps trigger CDC for the affected sources. You will see one WARNING per source; the system continues to work.


Logical backups (pg_dump / pg_restore)

pg_dump produces a portable SQL script (or directory archive) that can be replayed into a fresh database. pg_trickle objects are included automatically because they are normal extension objects.

The one ordering rule: restore must follow the standard PostgreSQL "schema, then data, then constraints/indexes" order. pg_restore --section=pre-data --section=data --section=post-data does this for you. Avoid hand-editing the dump to interleave sections.

# Create the dump (custom or directory format)
pg_dump --format=custom --file=mydb.dump mydb

# Restore into a fresh database
createdb mydb_restored
pg_restore --dbname=mydb_restored --jobs=4 mydb.dump

Then, if you want to verify everything came back:

-- Should list every stream table
SELECT * FROM pgtrickle.pgt_status();

-- Force a refresh on each one to confirm CDC is wired
SELECT pgtrickle.refresh_stream_table(pgt_name)
FROM pgtrickle.stream_tables_info;

What pg_dump does and does not capture

Object                                      Captured by pg_dump?
Source tables (your data)                   ✅
Stream-table storage (your derived data)    ✅
pgtrickle.* catalog rows                    ✅
CDC trigger definitions                     ✅ (recreated when the extension reapplies them)
pgtrickle_changes.* change buffers          ✅ (typically empty after a clean dump)
WAL replication slots (WAL CDC mode)        ✕ (slots are not dumpable; the scheduler recreates them)
Refresh history                             ✅

If you do not need the audit history, you can shrink the dump with pg_dump --exclude-table='pgtrickle.pgt_refresh_history'.


Stream-table snapshots vs. backups

Snapshots (v0.27+) are an application-level mechanism for capturing the contents of one stream table at a chosen point. They are great for:

  • Bootstrapping a replica without re-running a slow full refresh.
  • Archiving a slowly-changing dimension daily.
  • Rolling one stream table back after a defining-query mistake.

They are not a backup of your database. Use them in addition to, not instead of, pgBackRest / pg_dump.

A reasonable production posture:

  • Daily pgBackRest backup.
  • Snapshots of your most important stream tables on the cadence that matches your business RPO.
  • WAL retention sized to PITR window.

Backup and restore on Kubernetes (CNPG)

CloudNativePG handles backup orchestration via Barman / object storage. pg_trickle is fully compatible:

  • Use Cluster.spec.backup exactly as you would for any other PG cluster.
  • After a Cluster.spec.bootstrap.recovery operation, the pg_trickle launcher resumes automatically.
  • For very large stream tables, consider taking pre-backup snapshots and restoring them on the new cluster to skip an initial full refresh.

See CloudNativePG integration.


Disaster-recovery checklist

  • Backup tool of choice configured (pgBackRest / WAL-G / CNPG / managed service).
  • WAL retention window ≥ your PITR target.
  • If using WAL CDC: alerting on pg_trickle.slot_lag_critical_threshold_mb.
  • Periodic snapshot of business-critical stream tables.
  • Documented restore procedure tested at least once (snapshot → fresh database → pg_trickle.health_check()).
  • Off-site copy of backups (managed service, S3 with cross-region replication, etc.).
  • Monitoring on pg_trickle.pgt_refresh_history for restore drift.

See also: Snapshots · High Availability and Replication · CloudNativePG integration · Capacity Planning

Snapshots

A snapshot is a point-in-time copy of a stream table's contents, stored as an ordinary PostgreSQL table. Snapshots let you back up derived state, bootstrap a replica, build deterministic test fixtures, or compare two refresh runs without having to re-derive the data.

Available since v0.27.0


Why snapshots?

A stream table's contents are derived — pg_trickle can always recompute them from the source tables. But recomputation is not free, and there are operational situations where having a frozen copy is cheaper, safer, or simpler:

  • Replica bootstrap. When you stand up a new read replica or a fresh environment, you can restore from a snapshot in seconds instead of waiting for an initial full refresh that may take minutes or hours on a large dataset.
  • Point-in-time forensics. Take a snapshot before a risky migration or a suspicious incident; compare it to the live stream table later.
  • Test fixtures. Snapshot a stream table from a representative environment and check it into a test database.
  • Cheap rollback. If a defining-query change goes wrong, restore from the most recent snapshot while you investigate.

Quickstart

Take a snapshot

SELECT pgtrickle.snapshot_stream_table('order_totals');
-- pgtrickle.snapshot_order_totals_1735689421000

The function returns the fully-qualified name of the new snapshot table. By default snapshots live in the pgtrickle schema and are named snapshot_<table>_<epoch_ms>.

You can choose your own name with the optional second argument:

SELECT pgtrickle.snapshot_stream_table(
    'order_totals',
    'archive.order_totals_2026_q1'
);

List snapshots

SELECT * FROM pgtrickle.list_snapshots();

Or filter to a single stream table:

SELECT * FROM pgtrickle.list_snapshots('order_totals');

Restore from a snapshot

SELECT pgtrickle.restore_from_snapshot(
    'order_totals',                                       -- stream table to restore into
    'pgtrickle.snapshot_order_totals_1735689421000'       -- snapshot table
);

After a restore, pg_trickle reinitialises the stream table's frontier so that the next refresh reads only changes that occurred after the snapshot was taken.

Drop an old snapshot

SELECT pgtrickle.drop_snapshot('pgtrickle.snapshot_order_totals_1735689421000');

What's in a snapshot

The snapshot table is a plain PostgreSQL heap table with the same columns as the stream table, including the hidden __pgt_row_id column. That is what allows a restore to map snapshot rows back to their stable identities.

Because the snapshot is an ordinary table, you can:

  • Back it up with pg_dump, copy it elsewhere with pg_dump -t, or move it across databases with \copy.
  • Inspect it freely with regular SQL.
  • Add indexes for read-side workloads (the snapshot is independent of the live stream table).

Operational patterns

Periodic archival

-- Every night, snapshot a slowly-changing dimension
SELECT pgtrickle.snapshot_stream_table(
    'customer_360',
    format('archive.customer_360_%s', to_char(now(), 'YYYY_MM_DD'))
);

-- Keep only the last 30 days
SELECT pgtrickle.drop_snapshot(snapshot_table)
FROM pgtrickle.list_snapshots('customer_360')
WHERE created_at < now() - interval '30 days';
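
To run the archival from inside the database on a schedule, one option is the separate pg_cron extension; a sketch, assuming pg_cron is installed:

-- Nightly at 02:00: take the snapshot via pg_cron (not part of pg_trickle)
SELECT cron.schedule(
    'snapshot-customer-360',
    '0 2 * * *',
    $$SELECT pgtrickle.snapshot_stream_table(
          'customer_360',
          format('archive.customer_360_%s', to_char(now(), 'YYYY_MM_DD'))
      )$$
);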

Replica bootstrap

# On the source: dump the snapshot table
pg_dump -t pgtrickle.snapshot_order_totals_1735689421000 mydb > snap.sql

# On the replica: load the snapshot, then reattach it
psql replicadb < snap.sql
psql replicadb -c "SELECT pgtrickle.restore_from_snapshot(
    'order_totals',
    'pgtrickle.snapshot_order_totals_1735689421000'
);"

Disaster recovery

Combine snapshots with regular pg_dump of the source tables. After a restore, pg_trickle's frontier tracking ensures the stream table will catch up correctly when CDC resumes.


Caveats

  • Snapshots are not coordinated across multiple stream tables. If you need a consistent view across several stream tables, take them inside a single transaction and rely on PostgreSQL's MVCC isolation (see the sketch after this list).
  • Snapshots do not freeze the source tables. The "as-of" time is determined by the most recent refresh of the stream table at the moment you take the snapshot.
  • A restore reinitialises the frontier — if you want the stream table to replay changes between the snapshot time and now, the source CDC slots / change buffers must still hold those entries. Otherwise, expect a full refresh on the next cycle.
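
A minimal sketch of such a coordinated snapshot, assuming two stream tables named order_totals and customer_360:

-- Both snapshots observe the same MVCC view of the database
BEGIN ISOLATION LEVEL REPEATABLE READ;
SELECT pgtrickle.snapshot_stream_table('order_totals');
SELECT pgtrickle.snapshot_stream_table('customer_360');
COMMIT;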

See also: Backup & Restore · Replica Bootstrap & PITR Alignment (Patterns) · SQL Reference – Lifecycle

High Availability and Replication

This page covers running pg_trickle in production with PostgreSQL replication: how stream tables behave on physical (streaming) replicas, on logical replicas, during failover, and across read-write splits.

Looking for backups instead? See Backup & Restore. Looking for disaster-recovery snapshots? See Snapshots.


Quick answers

Question                                      Answer
Can I run pg_trickle on a physical replica?   The extension can be installed, but the scheduler does not refresh on a hot
                                              standby. Stream tables on the standby reflect what has been replayed from
                                              the primary.
Will my stream tables survive a failover?     Yes — they are ordinary heap tables. On the new primary, the scheduler
                                              resumes from the last persisted frontier.
Can I logically replicate a stream table?     Yes — including the hidden __pgt_row_id column. See Downstream Publications.
Does pg_trickle work on a logical replica?    Yes — the replica is its own database with its own pg_trickle scheduler.
What about CNPG (Kubernetes)?                 Fully supported; see CloudNativePG integration.

Physical (streaming) replication

PostgreSQL's streaming replication ships WAL from primary to replica. Stream tables, change buffers, and the pg_trickle catalog are all WAL-logged, so they are byte-for-byte identical on the replica.

On the primary: the scheduler runs and refreshes as normal.

On the replica (hot standby): the scheduler does not refresh — the database is read-only. Stream-table contents are exactly the contents that were on the primary at the replica's replay LSN. Reads from the replica are perfectly valid; writes (and refreshes) are not.

After a failover:

  1. The new primary's pg_trickle launcher detects the database is writable.
  2. The scheduler resumes refreshing from the last persisted frontier.
  3. Any change-buffer rows that arrived between the last refresh and the failover are processed in the next cycle.

Recommended GUCs on a streaming-replica role you might promote:

shared_preload_libraries = 'pg_trickle'
pg_trickle.enabled       = on    # safe to leave on; refresh is gated by writability

Logical replication

A logical replica is a separate database that subscribes to one or more publications on the primary. Each logical replica has its own pg_trickle catalog and its own scheduler.

This makes logical replication a good answer for an analytics replica: replicate the source tables to the analytics replica, install pg_trickle there, and define your stream tables on the replica. Heavy DIFFERENTIAL workloads are isolated from the OLTP primary.

┌──────────────┐        logical                 ┌────────────────────┐
│   primary    │  publication: source tables    │  analytics replica │
│   (writes)   │ ─────────────────────────────▶ │  (heavy STs here)  │
└──────────────┘                                 └────────────────────┘

Stream tables on the analytics replica are independent of any stream tables on the primary.


Replicating stream tables themselves

If you want a downstream system to receive stream-table changes (another PostgreSQL, Debezium, Kafka, …), use downstream publications:

SELECT pgtrickle.stream_table_to_publication('order_totals');

The publication exposes the storage table's INSERT/DELETE events, including the __pgt_row_id column. Subscribers receive the materialised data only — they do not need to know that pg_trickle is generating it.
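
On the downstream side this is ordinary logical replication. A sketch, assuming the publication is named pgt_pub_order_totals (check the function's output for the real name) and that a matching table already exists on the subscriber:

-- On the subscribing PostgreSQL (names illustrative)
CREATE SUBSCRIPTION order_totals_sub
    CONNECTION 'host=primary dbname=mydb user=replicator'
    PUBLICATION pgt_pub_order_totals;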


Failover behaviour in detail

When the primary fails and a replica is promoted:

Stage               What happens
Promotion           Replica becomes writable; its pg_trickle launcher detects this.
Scheduler restart   Scheduler resumes from the last persisted frontier (catalog row).
Change-buffer rows  Any rows captured before the failover are still in the buffers (they're WAL-logged). They are processed in the next refresh.
In-flight refresh   An interrupted refresh is marked failed in pgt_refresh_history and retried automatically (subject to the fuse).
WAL CDC slots       If using WAL CDC, the slots exist on the replica (slots are WAL-logged in PG ≥ 17 with failover_slots). On older versions, the scheduler recreates them and falls back briefly to triggers.

CNPG (Kubernetes) specifics

  • Use the OCI extension image: ghcr.io/trickle-labs/pg_trickle-ext:<version>.
  • CNPG's standby cluster topology is supported — pg_trickle behaves exactly as on bare-metal streaming replication.
  • Cluster.spec.postgresql.shared_preload_libraries must include pg_trickle.
  • Use Cluster.spec.postgresql.parameters for pg_trickle.* GUCs.

Full example: see integrations/cloudnativepg.md and the cnpg/ directory in the repository.


Read-write splits with PgBouncer

pg_trickle's background workers connect directly to PostgreSQL, not through a pooler. Your application can use PgBouncer (in transaction-pool mode, including Supabase / Railway / Neon) freely.

For a read-only replica behind a pooler:

  • Reads from stream tables work exactly as reads from any table.
  • IMMEDIATE stream tables only update on the primary; on the replica they reflect what's been replayed.

See PgBouncer integration for tuning.


Geographic / cross-region replication

The recommended pattern for cross-region:

  1. Stream tables on the regional primary (low-latency CDC).
  2. Downstream publications for the materialised results.
  3. PostgreSQL logical replication carries them to the remote region.
  4. Optional: pg_trickle in the remote region builds further stream tables on the replicated data.

This keeps the heavy DIFFERENTIAL maintenance close to its source data, and ships only the final materialised diffs over the WAN.


Caveats

  • pg_trickle does not participate in synchronous replication decisions. It is data, not infrastructure.
  • A logical replica that subscribes to source tables but not to the pg_trickle catalog will need to define its own stream tables — they are not auto-created.
  • Promoting a standby with a stale pgtrickle_changes schema (e.g. after a long replication lag) is fine; the next refresh catches up. If the lag was very long, the model may pick FULL instead of DIFFERENTIAL for the catch-up — by design.

See also: Backup & Restore · Snapshots · Downstream Publications · CloudNativePG integration · PgBouncer integration

Upgrading pg_trickle

This guide covers upgrading pg_trickle from one version to another.


-- 1. Check current version
SELECT extversion FROM pg_extension WHERE extname = 'pg_trickle';

-- 2. Replace the binary files (.so/.dylib, .control, .sql)
--    See the installation method below for your platform.

-- 3. Restart PostgreSQL (required for shared library changes)
--    sudo systemctl restart postgresql

-- 4. Run the upgrade in each database that has pg_trickle installed
ALTER EXTENSION pg_trickle UPDATE;

-- 5. Verify the upgrade
SELECT pgtrickle.version();
SELECT * FROM pgtrickle.health_check();

Step-by-Step Instructions

1. Check Current Version

SELECT extversion FROM pg_extension WHERE extname = 'pg_trickle';
-- Returns your current installed version, e.g. '0.9.0'

2. Install New Binary Files

Replace the extension files in your PostgreSQL installation directory. The method depends on how you originally installed pg_trickle.

From release tarball:

# Replace <new-version> with the target release, for example 0.50.0
curl -LO https://github.com/getretake/pg_trickle/releases/download/v<new-version>/pg_trickle-<new-version>-pg18-linux-amd64.tar.gz
tar xzf pg_trickle-<new-version>-pg18-linux-amd64.tar.gz

# Copy files to PostgreSQL directories
sudo cp pg_trickle-<new-version>-pg18-linux-amd64/lib/* $(pg_config --pkglibdir)/
sudo cp pg_trickle-<new-version>-pg18-linux-amd64/extension/* $(pg_config --sharedir)/extension/

From source (cargo-pgrx):

cargo pgrx install --release

3. Restart PostgreSQL

The shared library (.so / .dylib) is loaded at server start via shared_preload_libraries. A restart is required for the new binary to take effect.

sudo systemctl restart postgresql
# or on macOS with Homebrew:
brew services restart postgresql@18

4. Run ALTER EXTENSION UPDATE

Connect to each database where pg_trickle is installed and run:

ALTER EXTENSION pg_trickle UPDATE;

This executes the upgrade migration scripts in order (for example, pg_trickle--0.5.0--0.6.0.sql, then pg_trickle--0.6.0--0.7.0.sql). PostgreSQL automatically determines the full upgrade chain from your current version to the new default_version.
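
You can preview that chain with PostgreSQL's built-in pg_extension_update_paths() before running the update:

-- Shows every available upgrade path; filter to your current version
SELECT source, target, path
FROM pg_extension_update_paths('pg_trickle')
WHERE source = '0.9.0';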

5. Verify the Upgrade

-- Check version
SELECT pgtrickle.version();

-- Run health check
SELECT * FROM pgtrickle.health_check();

-- Verify stream tables are intact
SELECT * FROM pgtrickle.stream_tables_info;

-- Test a refresh
SELECT pgtrickle.refresh_stream_table('your_stream_table');

Version-Specific Notes

0.1.3 → 0.2.0

New functions added:

  • pgtrickle.list_sources(name) — list source tables for a stream table
  • pgtrickle.change_buffer_sizes() — inspect CDC change buffer sizes
  • pgtrickle.health_check() — diagnostic health checks
  • pgtrickle.dependency_tree() — visualize the dependency DAG
  • pgtrickle.trigger_inventory() — audit CDC triggers
  • pgtrickle.refresh_timeline(max_rows) — refresh history
  • pgtrickle.diamond_groups() — diamond dependency group info
  • pgtrickle.version() — extension version string
  • pgtrickle.pgt_ivm_apply_delta(...) — internal IVM delta application
  • pgtrickle.pgt_ivm_handle_truncate(...) — internal TRUNCATE handler
  • pgtrickle._signal_launcher_rescan() — internal launcher signal

No schema changes to pgtrickle.pgt_stream_tables or pgtrickle.pgt_dependencies catalog tables.

No breaking changes. All v0.1.3 functions and views continue to work as before.

0.2.0 → 0.2.1

Three new catalog columns added to pgtrickle.pgt_stream_tables:

Column              Type              Default  Purpose
topk_offset         INT               NULL     Pre-provisioned for paged TopK OFFSET (activated in v0.2.2)
has_keyless_source  BOOLEAN NOT NULL  FALSE    EC-06: keyless source flag; switches apply strategy from MERGE to counted DELETE
function_hashes     TEXT              NULL     EC-16: stores MD5 hashes of referenced function bodies for change detection

The migration script (pg_trickle--0.2.0--0.2.1.sql) adds these columns via ALTER TABLE … ADD COLUMN IF NOT EXISTS.

No breaking changes. All v0.2.0 functions, views, and event triggers continue to work as before.

What's also new:

  • Upgrade migration safety infrastructure (scripts, CI, E2E tests)
  • GitHub Pages book expansion (6 new documentation pages)
  • User-facing upgrade guide (this document)

0.2.1 → 0.2.2

No catalog table DDL changes. The topk_offset column needed for paged TopK was already added in v0.2.1.

Two SQL function updates are applied by pg_trickle--0.2.1--0.2.2.sql:

  • pgtrickle.create_stream_table(...)
    • default schedule changes from '1m' to 'calculated'
    • default refresh_mode changes from 'DIFFERENTIAL' to 'AUTO'
  • pgtrickle.alter_stream_table(...)
    • adds the optional query parameter used by ALTER QUERY support

Because PostgreSQL stores argument defaults and function signatures in pg_proc, the migration script must DROP FUNCTION and recreate both signatures during ALTER EXTENSION ... UPDATE.

Behavioral notes:

  • Existing stream tables keep their current catalog values. The migration only changes the defaults used by future create_stream_table(...) calls.
  • Existing applications can opt a table into the new defaults explicitly via pgtrickle.alter_stream_table(...) after the upgrade.
  • After installing the new binary and restarting PostgreSQL, the scheduler now warns if the shared library version and SQL-installed extension version do not match. This helps detect stale .so/.dylib files after partial upgrades.

0.2.2 → 0.2.3

One new catalog column is added to pgtrickle.pgt_stream_tables:

Column              Type  Default  Purpose
requested_cdc_mode  TEXT  NULL     Optional per-stream-table CDC override ('auto', 'trigger', 'wal')

The upgrade script also recreates two SQL functions:

  • pgtrickle.create_stream_table(...)
    • adds the optional cdc_mode parameter
  • pgtrickle.alter_stream_table(...)
    • adds the optional cdc_mode parameter

Monitoring view updates:

  • pgtrickle.pg_stat_stream_tables gains the cdc_modes column
  • pgtrickle.pgt_cdc_status is added for per-source CDC visibility

Because PostgreSQL stores function signatures and defaults in pg_proc, the upgrade script drops and recreates both lifecycle functions during ALTER EXTENSION ... UPDATE.

0.6.0 → 0.7.0

One new catalog column is added to pgtrickle.pgt_stream_tables:

Column                    Type  Default  Purpose
last_fixpoint_iterations  INT   NULL     Records how many rounds the last circular-dependency fixpoint run required

Two new catalog tables are added:

Table                           Purpose
pgtrickle.pgt_watermarks        Stores per-source watermark progress reported by external loaders
pgtrickle.pgt_watermark_groups  Stores groups of sources that must stay temporally aligned before refresh

The upgrade script also updates and adds SQL functions:

  • Recreates pgtrickle.pgt_status() so the result includes scc_id
  • Adds pgtrickle.pgt_scc_status() for circular-dependency monitoring
  • Adds pgtrickle.advance_watermark(source, watermark)
  • Adds pgtrickle.create_watermark_group(name, sources[], tolerance_secs)
  • Adds pgtrickle.drop_watermark_group(name)
  • Adds pgtrickle.watermarks()
  • Adds pgtrickle.watermark_groups()
  • Adds pgtrickle.watermark_status()

Behavioral notes:

  • Circular stream table dependencies can now run to convergence when pg_trickle.allow_circular = true and every member of the cycle is safe for monotone DIFFERENTIAL refresh.
  • The scheduler can now hold back refreshes until related source tables are aligned within a configured watermark tolerance.
  • Existing non-circular stream tables continue to work as before. The new catalog objects are additive.

0.7.0 → 0.8.0

No catalog schema changes. The upgrade migration script contains no DDL.

New operational features:

  • pg_dump / pg_restore support: stream tables are now safely exported and re-connected after restore without manual intervention.
  • Connection pooler opt-in was introduced at the per-stream level (superseded by the more comprehensive pooler_compatibility_mode added in v0.10.0).

No breaking changes. All v0.7.0 functions, views, and event triggers continue to work as before.

0.8.0 → 0.9.0

No catalog schema DDL changes to pgtrickle.pgt_stream_tables or the dependency catalog.

New API function added:

  • pgtrickle.restore_stream_tables() — re-installs CDC triggers and re-registers stream tables after a pg_restore from a pg_dump.

Hidden auxiliary columns for AVG / STDDEV / VAR aggregates. Stream tables using these aggregates will automatically receive hidden __pgt_aux_* columns on the next refresh after upgrading. No manual action is needed — pg_trickle detects missing auxiliary columns and performs a single full reinitialise to add them.

Behavioral notes:

  • COUNT, SUM, and AVG now update in O(changed rows) instead of rescanning the whole group.
  • STDDEV and VAR variants likewise update in O(changed rows) via hidden sum-of-squares auxiliary columns.
  • MIN/MAX still requires a group rescan, but only when the deleted value is the current extreme.
  • Refresh groups (create_refresh_group, drop_refresh_group, refresh_groups()) are available starting from this version.

0.9.0 → 0.10.0

Two new catalog columns added to pgtrickle.pgt_stream_tables:

Column                     Type              Default  Purpose
pooler_compatibility_mode  BOOLEAN NOT NULL  FALSE    Disables prepared statements and NOTIFY for this stream table — required when accessed through PgBouncer in transaction-pool mode
refresh_tier               TEXT NOT NULL     'hot'    Tiered scheduling tier: hot, warm, cold, or frozen

One new catalog table is added:

Table                         Purpose
pgtrickle.pgt_refresh_groups  Stores refresh groups for snapshot-consistent multi-table refresh

The upgrade script also updates and adds SQL functions:

  • pgtrickle.create_stream_table(...) gains the pooler_compatibility_mode parameter
  • pgtrickle.create_stream_table_if_not_exists(...) likewise
  • pgtrickle.create_or_replace_stream_table(...) likewise
  • pgtrickle.alter_stream_table(...) likewise
  • Adds pgtrickle.create_refresh_group(name, members, isolation)
  • Adds pgtrickle.drop_refresh_group(name)
  • Adds pgtrickle.refresh_groups() — lists all declared groups

Behavioral notes:

  • pooler_compatibility_mode defaults to false. Existing stream tables are unaffected. Enable it only for stream tables accessed through PgBouncer transaction-mode pooling.
  • pg_trickle.auto_backoff now defaults to on (was off). The backoff threshold is raised from 80% to 95%, and the maximum slowdown is capped at 8× (was 64×). If you relied on the old opt-in behaviour, set pg_trickle.auto_backoff = off explicitly.
  • diamond_consistency now defaults to 'atomic' for new stream tables (was 'none'). Existing stream tables keep their current setting.
  • The scheduler now uses row-level locking for concurrency control instead of session-level advisory locks, making pg_trickle compatible with PgBouncer transaction-pool and similar connection poolers.
  • Statistical aggregates (CORR, COVAR_*, REGR_*) now update incrementally using Welford-style accumulation, no longer requiring a group rescan.
  • Materialized view sources can now be used in DIFFERENTIAL mode when pg_trickle.matview_polling = on is set.
  • Recursive CTE stream tables with DELETE/UPDATE now use the Delete-and-Rederive algorithm (O(delta) instead of O(n)).

0.10.0 → 0.11.0

New catalog columns added to pgtrickle.pgt_stream_tables:

Column                  Type           Default  Purpose
effective_refresh_mode  TEXT           NULL     Actual refresh mode used in the last cycle (FULL / DIFFERENTIAL / APPEND_ONLY / TOP_K / NO_DATA); populated by the scheduler after each completed refresh
fuse_mode               TEXT NOT NULL  'off'    Circuit-breaker mode: off, on, or auto
fuse_state              TEXT NOT NULL  'armed'  Circuit-breaker state: armed, blown, or disabled
fuse_ceiling            BIGINT         NULL     Maximum change-row count that can pass through in one refresh before the fuse blows; NULL = unlimited
fuse_sensitivity        INT            NULL     Sensitivity multiplier for auto-fuse detection
blown_at                TIMESTAMPTZ    NULL     Timestamp when the fuse last triggered
blow_reason             TEXT           NULL     Human-readable reason the fuse blew
st_partition_key        TEXT           NULL     Partition key column for declaratively partitioned stream tables; NULL = not partitioned

Updated function signatures — existing calls continue to work because new parameters all have defaults:

  • pgtrickle.create_stream_table(...) gains partition_by TEXT DEFAULT NULL
  • pgtrickle.create_stream_table_if_not_exists(...) likewise
  • pgtrickle.create_or_replace_stream_table(...) likewise
  • pgtrickle.alter_stream_table(...) gains fuse TEXT DEFAULT NULL, fuse_ceiling BIGINT DEFAULT NULL, fuse_sensitivity INT DEFAULT NULL

New functions:

  • pgtrickle.reset_fuse(name TEXT, action TEXT DEFAULT 'apply') — clear a blown fuse and resume scheduling
  • pgtrickle.fuse_status() — returns circuit-breaker state for every stream table
  • pgtrickle.explain_refresh_mode(name TEXT) — shows configured mode, effective mode, and the reason for any downgrade

Behavioral notes:

  • Stream-table-to-stream-table chains now refresh incrementally — downstream tables receive a small insert/delete delta rather than cascading full refreshes.
  • pg_trickle.tiered_scheduling now defaults to on.
  • Declaratively partitioned stream tables are supported via partition_by — the refresh MERGE is automatically restricted to only the changed partitions.

0.11.0 → 0.12.0

No schema changes. This release adds four new diagnostic SQL functions only:

  • pgtrickle.explain_query_rewrite(query TEXT) → TABLE(pass_name TEXT, changed BOOL, sql_after TEXT). Walks a query through every DVM rewrite pass to show how pg_trickle transforms it.
  • pgtrickle.diagnose_errors(name TEXT) → TABLE(event_time TIMESTAMPTZ, error_type TEXT, error_message TEXT, remediation TEXT). Returns the last 5 FAILED refresh events with error classification and suggested fixes.
  • pgtrickle.list_auxiliary_columns(name TEXT) → TABLE(column_name TEXT, data_type TEXT, purpose TEXT). Lists all hidden __pgt_* auxiliary columns on a stream table's storage relation.
  • pgtrickle.validate_query(query TEXT) → TABLE(valid BOOL, mode TEXT, reason TEXT). Parses and validates a query for stream-table compatibility without creating one.
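
For example, validate_query() can vet a defining query before you create anything (query illustrative):

-- Returns (valid, mode, reason) without creating a stream table
SELECT * FROM pgtrickle.validate_query(
    'SELECT customer_id, sum(total) FROM orders GROUP BY customer_id'
);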

Behavioral notes:

  • The incremental engine now handles multi-table join deletes correctly — phantom rows after simultaneous deletes from multiple join sides no longer occur.
  • Stream-table-to-stream-table row identity is now computed consistently between the change buffer and the downstream table, eliminating stale duplicate rows after upstream UPDATEs.
  • pg_trickle.tiered_scheduling defaults to on (same as 0.11.0 runtime behaviour; this release makes it the explicit default).

0.12.0 → 0.13.0

Ten new catalog columns added to pgtrickle.pgt_stream_tables:

Column                  Type              Default  Purpose
effective_refresh_mode  TEXT              NULL     Computed refresh mode after AUTO resolution
fuse_mode               TEXT NOT NULL     'off'    Fuse configuration: off, auto, or manual
fuse_state              TEXT NOT NULL     'armed'  Current fuse state: armed or blown
fuse_ceiling            BIGINT            NULL     Maximum change count before fuse blows
fuse_sensitivity        INT               NULL     Consecutive cycles above ceiling before triggering
blown_at                TIMESTAMPTZ       NULL     Timestamp when the fuse last blew
blow_reason             TEXT              NULL     Reason the fuse blew
st_partition_key        TEXT              NULL     Partition key specification (RANGE, LIST, or HASH)
max_differential_joins  INT               NULL     Maximum join count for differential mode (auto-fallback to FULL when exceeded)
max_delta_fraction      DOUBLE PRECISION  NULL     Maximum delta-to-table ratio for differential mode (auto-fallback to FULL when exceeded)

All columns use ADD COLUMN IF NOT EXISTS for idempotent upgrades.

Nine new SQL functions (plus one replacement with new signature):

Function                                Purpose
pgtrickle.explain_delta(name, format)   Delta SQL query plan inspection
pgtrickle.dedup_stats()                 MERGE deduplication frequency counters
pgtrickle.shared_buffer_stats()         Per-source-buffer observability
pgtrickle.explain_refresh_mode(name)    Refresh mode decision explanation
pgtrickle.reset_fuse(name)              Reset a blown fuse
pgtrickle.fuse_status()                 Fuse state across all stream tables
pgtrickle.explain_query_rewrite(query)  DVM rewrite pass inspection
pgtrickle.diagnose_errors(name)         Error classification and remediation
pgtrickle.list_auxiliary_columns(name)  Hidden __pgt_* column listing
pgtrickle.validate_query(query)         Query compatibility validation
pgtrickle.alter_stream_table(...)       (replaced) — new partition_by parameter

New GUC variables:

GUC                                   Default   Purpose
pg_trickle.per_database_worker_quota  0 (auto)  Per-database parallel worker limit

Behavioral notes:

  • Shared change buffers: Multiple stream tables reading from the same source now automatically share a single change buffer. No migration action required — existing per-source buffers continue to work.
  • Columnar change tracking: Wide-table UPDATEs that touch only value columns (not GROUP BY / JOIN / WHERE columns) now generate significantly less delta volume. This is fully automatic.
  • Auto buffer partitioning: Set pg_trickle.buffer_partitioning = 'auto' to let high-throughput buffers self-promote to partitioned mode for O(1) cleanup.
  • dbt macros: If you use dbt-pgtrickle, update your macros to the matching v0.13.0 version. New config options: partition_by, fuse, fuse_ceiling, fuse_sensitivity.

No breaking changes. All v0.12.0 functions, views, and event triggers continue to work as before.

0.13.0 → 0.14.0

Two new catalog columns added to pgtrickle.pgt_stream_tables:

Column              Type         Default  Purpose
last_error_message  TEXT         NULL     Error message from the last permanent refresh failure
last_error_at       TIMESTAMPTZ  NULL     Timestamp of the last permanent refresh failure

Updated function signature (return type gained new columns):

  • pgtrickle.st_refresh_stats() — gains consecutive_errors, schedule, refresh_tier, and last_error_message columns. The upgrade script drops and recreates the function. No behavior change for existing callers that ignore unknown columns.

New SQL functions (available immediately after ALTER EXTENSION ... UPDATE):

Function                                 Purpose
pgtrickle.recommend_refresh_mode(name)   Workload-based refresh mode recommendation with confidence level
pgtrickle.refresh_efficiency(name)       Per-table FULL vs. DIFFERENTIAL performance metrics
pgtrickle.export_definition(name)        Export stream table as reproducible DROP+CREATE+ALTER DDL
pgtrickle.convert_buffers_to_unlogged()  Convert logged change buffers to UNLOGGED

New GUC variables:

GUC                                        Default  Purpose
pg_trickle.planner_aggressive              true     Consolidated switch replacing merge_planner_hints + merge_work_mem_mb
pg_trickle.unlogged_buffers                false    Create new change buffers as UNLOGGED (reduces WAL by ~30%)
pg_trickle.agg_diff_cardinality_threshold  1000     Warn at creation time when GROUP BY cardinality is below this

Deprecated GUCs (still accepted but ignored at runtime):

  • pg_trickle.merge_planner_hints → use pg_trickle.planner_aggressive
  • pg_trickle.merge_work_mem_mb → use pg_trickle.planner_aggressive

Behavioral notes:

  • Error-state circuit breaker: A single permanent refresh failure (e.g. a function that doesn't exist for the column type) now immediately sets the stream table status to ERROR with a message stored in last_error_message. The scheduler skips ERROR tables. Use pgtrickle.resume_stream_table(name) followed by pgtrickle.alter_stream_table(name, query => ...) to recover.
  • Tiered scheduling NOTICE: Demoting a stream table from hot to cold or frozen now emits a NOTICE so operators are aware the effective refresh interval has changed (10× for cold, suspended for frozen).
  • SECURITY DEFINER triggers: All CDC trigger functions now run with SECURITY DEFINER and an explicit SET search_path, hardening against privilege-escalation attacks. This is applied automatically on upgrade — no manual action needed.

No breaking changes. All v0.13.0 functions, views, and event triggers continue to work as before.


0.14.0 → 0.15.0

No schema changes. New features: interactive dashboard, bulk create_stream_tables_from_schema(), and per-table runaway-refresh protection (max_refresh_duration_ms).

New GUC variables:

GUC                               Default        Purpose
pg_trickle.ivm_cache_max_entries  0 (unbounded)  Bound per-backend IVM delta cache

0.15.0 → 0.16.0

No schema changes. Performance improvements to the delta pipeline and refresh path. L2 catalog-backed template cache (pgtrickle.pgt_template_cache) introduced.


0.16.0 → 0.17.0

No schema changes. Query intelligence improvements: window function differentiation, correlated-sublink rewriting.


0.17.0 → 0.18.0

No schema changes. Hardening pass: tightened unsafe blocks, improved error propagation, delta performance improvements (prepared-statement MERGE path).


0.18.0 → 0.19.0

No schema changes. Security enhancements: SECURITY DEFINER on all public-facing functions, improved RLS awareness in delta generation.


0.19.0 → 0.20.0

New catalog table: pgtrickle.pgt_self_monitoring for extension health metrics. New function: pgtrickle.metrics_summary().


0.20.0 → 0.21.0

No schema changes. Reliability improvements: advisory-lock hardening, WAL-receiver retry, graceful SIGTERM in background workers.


0.21.0 → 0.22.0

No schema changes. New features: downstream CDC pipeline, parallel refresh scheduling, predictive cost model for FULL vs DIFFERENTIAL selection.


0.22.0 → 0.23.0

No schema changes. Performance tuning and diagnostics: delta amplification detection, EXPLAIN capture (PGS_PROFILE_DELTA), adaptive threshold auto-tuning.

New GUC variables:

GUC                                       Default  Purpose
pg_trickle.delta_amplification_threshold  10.0     Warn when output/input delta ratio exceeds this
pg_trickle.log_delta_sql                  false    Log resolved delta SQL at DEBUG1

0.23.0 → 0.24.0

No schema changes. Join correctness hardening: phantom-row detection infrastructure, durability improvements for committed change buffers.


0.24.0 → 0.25.0

No schema changes. Scheduler scalability: worker pool, L1 template cache with LRU eviction.

New GUC variables:

GUC                                    Default  Purpose
pg_trickle.worker_pool_size            0        Persistent worker pool size
pg_trickle.template_cache_max_entries  0        L1 delta SQL template cache cap

0.25.0 → 0.26.0

No schema changes. Concurrency hardening: improved lock ordering, stress test suite, fixed MERGE race under high concurrency.


0.26.0 → 0.27.0

New catalog columns added to pgtrickle.pgt_stream_tables:

Column          Type    Default  Purpose
last_full_ms    FLOAT8  NULL     Duration of last FULL refresh (ms)
auto_threshold  FLOAT8  NULL     Adaptive FULL/DIFF cost-ratio threshold

New catalog table: pgtrickle.pgt_template_cache for L2 cross-backend delta SQL storage.

New SQL functions:

Function                                       Purpose
pgtrickle.snapshot_stream_table(name, dest)    Consistent snapshot copy
pgtrickle.restore_from_snapshot(name, source)  Restore from snapshot
pgtrickle.list_snapshots(name)                 List available snapshots
pgtrickle.recommend_schedule(name)             SLA-based scheduling recommendation
pgtrickle.schedule_recommendations()           Multi-table scheduling report
pgtrickle.cluster_worker_summary()             Cross-database scheduler health
pgtrickle.metrics_summary()                    Prometheus-compatible extension metrics

New GUC variables:

GUC                                             Default    Purpose
pg_trickle.metrics_port                         9187       Prometheus metrics port
pg_trickle.metrics_request_timeout_ms           5000       Metrics endpoint timeout
pg_trickle.frontier_holdback_mode               warn       Holdback action on stale frontier
pg_trickle.frontier_holdback_warn_seconds       300        Frontier holdback warning threshold
pg_trickle.publication_lag_warn_bytes           104857600  WAL lag warning threshold
pg_trickle.schedule_recommendation_min_samples  20         Min samples for schedule recommendation
pg_trickle.schedule_alert_cooldown_seconds      300        Min interval between schedule alerts
pg_trickle.change_buffer_durability             unlogged   Change buffer WAL level

No breaking changes.


0.27.0 → 0.28.0

New catalog tables: pgtrickle.outbox_events, pgtrickle.inbox_messages, pgtrickle.inbox_dead_letters for transactional outbox and inbox patterns.

New SQL functions: pgtrickle.enable_outbox(name, ...), pgtrickle.enable_inbox(name, ...), and related management functions.

No breaking changes.


0.28.0 → 0.29.0

Added relay catalog tables and SQL functions (set_relay_outbox, set_relay_inbox, enable_relay, disable_relay, delete_relay, get_relay_config, list_relay_configs) and the standalone pgtrickle-relay binary. These were later extracted to pg_tide in v0.46.0.

No breaking changes.


0.29.0 → 0.30.0

No schema changes. All improvements are confined to the Rust extension binary. The migration file (sql/pg_trickle--0.29.0--0.30.0.sql) is empty other than documentation comments.

New GUC variables:

GUC                                      Default  Purpose
pg_trickle.use_sqlstate_classification   false    Locale-safe SQLSTATE-based retry classification
pg_trickle.template_cache_max_age_hours  168      Max age for L2 template-cache entries (hours)
pg_trickle.max_parse_nodes               0        Parser node-count guard (0 = disabled)

Behavioral changes:

  • restore_from_snapshot() now returns a typed error (SnapshotSchemaVersionMismatch) when the snapshot has no __pgt_snapshot_version column (pre-v0.27 snapshots). Previously it silently treated the missing column as compatible.
  • snapshot_stream_table() and restore_from_snapshot() now wrap critical operations in PostgreSQL subtransactions. A failed catalog INSERT rolls back the snapshot table creation, preventing orphan tables.
  • Cross-cycle phantom rows are now cleaned up unconditionally after every differential refresh of a join query.

No breaking changes.


0.30.0 → 0.31.0

No schema changes. All improvements are confined to the Rust extension binary and scheduler logic.

New GUC variables:

GUC                                        Default  Purpose
pg_trickle.cost_model_miss_penalty         2.0      Weight applied to the estimated cost when the planner's row count estimate is inaccurate
pg_trickle.scheduler_hot_tier_interval_ms  500      Effective polling interval (ms) for Hot-tier stream tables

Behavioral changes:

  • Scheduler now uses a predictive cost model to decide DIFFERENTIAL vs. FULL refresh per cycle; the model activates after pg_trickle.prediction_min_samples samples.
  • Event-driven wake now debounces duplicate NOTIFY payloads within a single tick to avoid redundant wakeups on bulk writes.

No breaking changes.


0.31.0 → 0.32.0

No schema changes. Citus stable naming infrastructure is added without altering the public catalog schema.

Behavioral changes:

  • pgtrickle.source_stable_name(rel_oid) introduced as a deterministic, version-stable WAL slot name for Citus distributed sources.
  • Per-source last_frontier column added to pgtrickle.pgt_stream_tables via ADD COLUMN IF NOT EXISTS — existing rows receive NULL.

No breaking changes.


0.32.0 → 0.33.0

Schema additions — new catalog tables for Citus distributed CDC:

Object                      Type   Purpose
pgtrickle.pgt_worker_slots  Table  Tracks per-worker WAL slot name and last-consumed frontier for each Citus worker / source combination
pgtrickle.pgt_st_locks      Table  Lightweight distributed mutex for cross-coordinator refresh serialisation
pgtrickle.citus_status      View   Per-(stream table, source, worker) CDC health view

New SQL functions:

  • pgtrickle.ensure_worker_slot(st_name, worker_host, worker_port) — creates the WAL slot on a Citus worker if it does not exist.
  • pgtrickle.poll_worker_slot_changes(st_name, worker_host, worker_port) — drains pending WAL changes from a worker slot into the coordinator change buffer.
  • pgtrickle.handle_vp_promoted(payload TEXT) — processes a pg_ripple.vp_promoted NOTIFY payload and signals the scheduler.
  • pgtrickle.check_citus_version_compat() — verifies that all worker nodes run the same pg_trickle version.
  • pgtrickle.check_worker_wal_level() — verifies that wal_level = logical on every worker.

New create_stream_table() parameter:

  • output_distribution_column TEXT — when provided (and Citus is installed), converts the output storage table to a Citus distributed table on that column immediately after creation.

New GUC:

GUC                                Default  Purpose
pg_trickle.citus_st_lock_lease_ms  60000    Duration (ms) of the pgt_st_locks lease for cross-node coordination

No application-level breaking changes. Existing stream tables on non-Citus deployments are completely unaffected.


0.33.0 → 0.34.0

Schema additions — the pgtrickle.citus_status view gains five new columns:

Column             Type         Description
last_polled_at     timestamptz  Timestamp of the last successful per-worker poll
lease_holder       text         Session holding the pgt_st_locks lease (NULL when unlocked)
lease_acquired_at  timestamptz  When the current lease was acquired
lease_expires_at   timestamptz  When the current lease expires
lease_health       text         'unlocked' / 'locked' / 'expired'

Behavioral changes:

  • The scheduler now drives the full per-worker slot lifecycle automatically for stream tables with source_placement = 'distributed': ensure_worker_slot() on first tick (and after topology changes), poll_worker_slot_changes() on every tick, and pgt_st_locks lease acquire/extend/release. Manual wiring via LISTEN "pg_ripple.vp_promoted" + handle_vp_promoted() is no longer required (though harmless if left in place).
  • Shard rebalance auto-recovery: the scheduler detects pg_dist_node topology changes, prunes stale pgt_worker_slots rows, inserts new ones, and marks the stream table for a full refresh — no operator intervention required.
  • Worker failure isolation: per-worker poll_worker_slot_changes() failures are caught, logged, and skipped for that tick; healthy workers continue uninterrupted.

New GUC:

GUC                                  Default  Purpose
pg_trickle.citus_worker_retry_ticks  5        Consecutive per-worker poll failures before emitting a WARNING and flagging in citus_status. Set to 0 to disable.

Migration note:

ALTER EXTENSION pg_trickle UPDATE TO '0.34.0';

The migration script adds the five new columns to citus_status via CREATE OR REPLACE VIEW. No data loss.

No breaking changes. Non-Citus deployments are completely unaffected.


0.34.0 → 0.35.0

New catalog columns added to pgtrickle.pgt_stream_tables:

Column             Type              Default  Purpose
in_shadow_build    BOOLEAN NOT NULL  FALSE    Whether this stream table is currently undergoing zero-downtime schema evolution
shadow_table_name  TEXT              NULL     Name of the shadow table being built during schema evolution

New catalog table:

Table                        Purpose
pgtrickle.pgt_subscriptions  Stores reactive subscription registrations (NOTIFY channel → stream table mappings)

New SQL functions:

Function                                                Purpose
pgtrickle.subscribe(stream_table TEXT, channel TEXT)    Register a NOTIFY channel to fire after each refresh cycle
pgtrickle.unsubscribe(stream_table TEXT, channel TEXT)  Remove a subscription
pgtrickle.list_subscriptions()                          List all active subscriptions
pgtrickle.sla_summary()                                 Return p50/p99 latency, freshness lag, error rate, and budget over the SLA window
pgtrickle.explain_stream_table(name TEXT)               Human-readable DVM configuration and refresh mode explanation
pgtrickle.view_evolution_status()                       Status of in-progress shadow table builds

Behavioral notes:

  • Zero-downtime schema evolution (ALTER STREAM TABLE) now builds a shadow table in the background and cuts over atomically. The in_shadow_build column tracks progress; check pgtrickle.view_evolution_status() to monitor.
  • pgtrickle.sla_summary() queries pgt_refresh_history using the pg_trickle.sla_window_hours GUC (default 24 h).
  • Reactive subscriptions emit pg_notify(channel, '') after each non-empty refresh cycle. Debounce interval is controlled by pg_trickle.notify_coalesce_ms.
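
A sketch of wiring a reactive consumer to a stream table (channel name illustrative):

-- Fire NOTIFY on channel 'order_totals_ready' after each non-empty refresh
SELECT pgtrickle.subscribe('order_totals', 'order_totals_ready');

-- In the consuming session (psql, or any driver that supports LISTEN)
LISTEN order_totals_ready;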

New GUCs:

GUC                                        Default  Description
pg_trickle.cdc_paused                      false    Pause CDC trigger writes (discard mode — see CONFIGURATION.md)
pg_trickle.notify_coalesce_ms              250      Debounce window (ms) for reactive subscription NOTIFY calls
pg_trickle.sla_window_hours                24       Reporting window (h) for sla_summary()
pg_trickle.history_prune_interval_seconds  60       Interval between pgt_refresh_history cleanup sweeps

Migration note:

ALTER EXTENSION pg_trickle UPDATE TO '0.35.0';

No breaking changes. All v0.34.0 functions continue to work.


0.35.0 → 0.36.0

New catalog columns added to pgtrickle.pgt_stream_tables:

Column           Type              Default  Purpose
temporal_mode    BOOLEAN NOT NULL  FALSE    Enable temporal IVM (SCD Type 2) tracking
storage_backend  TEXT NOT NULL     'heap'   Storage backend: 'heap', 'citus', or 'pg_mooncake'
column_lineage   JSONB             NULL     Column-level lineage mapping output columns to source tables/columns

New SQL functions:

Function                                                        Purpose
pgtrickle.drain(timeout_s INT)                                  Gracefully quiesce all in-flight refreshes
pgtrickle.is_drained()                                          Check whether the scheduler is fully drained
pgtrickle.bulk_alter_stream_tables(names TEXT[], params JSONB)  Alter multiple stream tables in one call
pgtrickle.bulk_drop_stream_tables(names TEXT[])                 Drop multiple stream tables in one call
pgtrickle.stream_table_lineage(name TEXT)                       Return column-level lineage for a stream table
pgtrickle.exec_stream_ddl(cmd TEXT)                             Execute DDL in the stream-table DDL sandbox

Updated function signatures:

  • pgtrickle.create_stream_table(...) gains temporal BOOLEAN DEFAULT FALSE and storage_backend TEXT DEFAULT 'heap' parameters.
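
A sketch of a temporal stream table using the new parameters (query and names illustrative; assumes pg_trickle.temporal_stream_tables = on):

SELECT pgtrickle.create_stream_table(
    name            => 'orders_history',
    query           => 'SELECT id, status FROM orders',
    temporal        => true,     -- rows gain __pgt_valid_from / __pgt_valid_to
    storage_backend => 'heap'
);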

Behavioral notes:

  • Temporal IVM (CORR-1): Stream tables created with temporal := true maintain SCD Type 2 history. Each row carries __pgt_valid_from TIMESTAMPTZ and __pgt_valid_to TIMESTAMPTZ. Existing tables are unaffected.
  • Alternative storage backends: storage_backend = 'citus' creates the stream table storage as a Citus distributed table; 'pg_mooncake' uses columnar storage. Both require the respective extensions to be installed.
  • Drain mode (A35): pgtrickle.drain() is a safety mechanism for maintenance windows. The scheduler completes all in-flight refreshes, then stops dispatching new ones until the drain is cancelled or the server restarts (see the sketch after this list).
  • WAL slot backpressure (A12): The pg_trickle.enforce_backpressure GUC is now wired — when slot lag exceeds slot_lag_critical_threshold_mb, CDC writes are suppressed. See CONFIGURATION.md for details and the discard semantics of cdc_paused.
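
A maintenance-window sketch using the drain API:

-- Wait up to 60 s for in-flight refreshes to finish, then verify
SELECT pgtrickle.drain(60);
SELECT pgtrickle.is_drained();   -- true once the scheduler has quiesced
-- ... perform maintenance; the drain holds until cancelled or the server restarts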

New GUCs:

GUC                                Default  Description
pg_trickle.enforce_backpressure    false    Suppress CDC writes when WAL slot lag exceeds critical threshold
pg_trickle.log_format              'text'   Structured log format: 'text' or 'json'
pg_trickle.temporal_stream_tables  false    Master switch for temporal IVM support

TRUNCATE and CDC semantics: When pg_trickle.cdc_paused = on, CDC trigger bodies return NULL — changes are discarded. This is the discard mode. After un-pausing, stream tables must be reinitialized (FULL refresh) to recover from the gap. A future cdc_capture_mode = 'hold' option is planned.

Migration note:

ALTER EXTENSION pg_trickle UPDATE TO '0.36.0';

No breaking changes. The column_lineage column is additive. The temporal_mode and storage_backend columns have safe defaults.


0.36.0 → 0.37.0

Schema change: All existing change buffer tables in pgtrickle_changes.* gain a __pgt_trace_context TEXT column via dynamic ALTER TABLE. This is applied automatically by the upgrade script.

Behavioral notes:

  • W3C Trace Context (F10): When pg_trickle.enable_trace_propagation = true, CDC triggers capture the session pg_trickle.trace_id GUC into the __pgt_trace_context column. At refresh time, the stored trace context is propagated to any OTLP span exported to pg_trickle.otel_endpoint.
  • pgVectorMV (F4): avg(vector_col) and sum(vector_col) in defining queries are now handled incrementally when pg_trickle.enable_vector_agg = true. Requires pgvector ≥ 0.7.0.

New GUCs:

| GUC | Default | Description |
| --- | --- | --- |
| pg_trickle.enable_vector_agg | false | Enable incremental vector aggregate operators |
| pg_trickle.enable_trace_propagation | false | Enable W3C Trace Context propagation |
| pg_trickle.otel_endpoint | '' | OTLP/gRPC endpoint for span export (empty = off) |
| pg_trickle.trace_id | '' | Session-level W3C traceparent header |

Migration note:

ALTER EXTENSION pg_trickle UPDATE TO '0.37.0';

The upgrade script applies ALTER TABLE ... ADD COLUMN IF NOT EXISTS __pgt_trace_context TEXT to each existing change buffer table. This is idempotent and safe for large installations. Expect a brief metadata lock on each buffer table during the upgrade.

No breaking changes. The __pgt_trace_context column is NULL unless enable_trace_propagation = true and a session trace_id is set.


0.37.0 → 0.38.0

No schema changes. This is a correctness and diagnostic release.

Behavioral notes:

  • EC-01 correctness closeout: Join phantom row elimination is complete. Property tests now prove convergence across join patterns including three-way joins with simultaneous multi-side deletes.
  • Fuzz regression fixes: All known fuzz corpus failures are resolved.

Migration note:

ALTER EXTENSION pg_trickle UPDATE TO '0.38.0';

No breaking changes.


0.38.0 → 0.39.0

New SQL function:

| Function | Purpose |
| --- | --- |
| pgtrickle.cdc_pause_status() | Return the active CDC pause state: paused flag, capture mode, and operator guidance |

Extended SQL function:

  • pgtrickle.explain_stream_table(name TEXT) — now includes CDC status, backpressure state, and explicit DIFF/FULL reasoning.

New GUC:

| GUC | Default | Description |
| --- | --- | --- |
| pg_trickle.cdc_capture_mode | 'discard' | CDC semantics when cdc_paused=on: 'discard' (default, drops changes) or 'hold' (reserved) |

Behavioral notes:

  • O39-6 SQLSTATE-first retry: When use_sqlstate_classification = true (default), scheduler retry decisions use the bracketed SQLSTATE code from PgTrickleError::SpiErrorCode instead of English error message text, making retry behavior locale-independent.
  • O39-8 CDC capture mode: The cdc_paused discard semantics are now explicitly documented and operator-visible via pgtrickle.cdc_pause_status(). The cdc_capture_mode GUC is reserved for a future hold mode.

TRUNCATE/CDC semantics (explicit):

When pg_trickle.cdc_paused = on:

  • cdc_capture_mode = 'discard' (default): All DML against source tables passes through the CDC trigger, but the trigger returns NULL immediately without writing to the change buffer. Changes are permanently lost. After un-pausing, run a FULL refresh on any stream table that received DML during the pause:
    SELECT pgtrickle.refresh_stream_table('my_stream', 'FULL');
    
  • cdc_capture_mode = 'hold': Not yet implemented. Emits a WARNING and falls back to 'discard'.

When TRUNCATE occurs on a source table:

  • In trigger-based CDC mode, the TRUNCATE trigger calls pgtrickle.pgt_ivm_handle_truncate() which schedules a FULL refresh. If cdc_paused = on, the trigger returns NULL and the TRUNCATE is not recorded. After un-pausing, the stream table will not be aware that a TRUNCATE occurred — reinitialize explicitly.
  • In WAL-based CDC mode, TRUNCATEs are detected during the next logical decoding poll and scheduled for FULL refresh.

Migration note:

ALTER EXTENSION pg_trickle UPDATE TO '0.39.0';

No catalog schema changes. The upgrade script is a no-op DDL-wise.


0.39.0 → 0.40.0

No SQL schema changes. This release contains internal improvements only.

ALTER EXTENSION pg_trickle UPDATE TO '0.40.0';

0.40.0 → 0.50.0

No SQL schema changes. This release contains internal improvements and new documentation only.

ALTER EXTENSION pg_trickle UPDATE TO '0.50.0';

0.50.0 → 0.51.0

Breaking changes:

  • pg_trickle.event_driven_wake GUC has been removed (CQ-10-02). Remove any ALTER SYSTEM SET pg_trickle.event_driven_wake ... or postgresql.conf entries for this GUC before upgrading.
  • pg_trickle.wake_debounce_ms GUC has been removed (CQ-10-02). Remove any references to this GUC from your configuration.

No SQL schema changes. Only code and documentation changes.

Migration:

-- Remove obsolete GUC settings before upgrading (run as superuser):
ALTER SYSTEM RESET pg_trickle.event_driven_wake;
ALTER SYSTEM RESET pg_trickle.wake_debounce_ms;
SELECT pg_reload_conf();

-- Then upgrade:
ALTER EXTENSION pg_trickle UPDATE TO '0.51.0';

Supported Upgrade Paths

The following migration hops are available. PostgreSQL chains them automatically when you run ALTER EXTENSION pg_trickle UPDATE.
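
For example, the two common invocations (standard PostgreSQL behavior):

-- Upgrade to the default_version declared in the .control file
ALTER EXTENSION pg_trickle UPDATE;

-- Or pin an explicit target version
ALTER EXTENSION pg_trickle UPDATE TO '0.51.0';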

| From | To | Script |
| --- | --- | --- |
| 0.1.3 | 0.2.0 | pg_trickle--0.1.3--0.2.0.sql |
| 0.2.0 | 0.2.1 | pg_trickle--0.2.0--0.2.1.sql |
| 0.2.1 | 0.2.2 | pg_trickle--0.2.1--0.2.2.sql |
| 0.2.2 | 0.2.3 | pg_trickle--0.2.2--0.2.3.sql |
| 0.2.3 | 0.3.0 | pg_trickle--0.2.3--0.3.0.sql |
| 0.3.0 | 0.4.0 | pg_trickle--0.3.0--0.4.0.sql |
| 0.4.0 | 0.5.0 | pg_trickle--0.4.0--0.5.0.sql |
| 0.5.0 | 0.6.0 | pg_trickle--0.5.0--0.6.0.sql |
| 0.6.0 | 0.7.0 | pg_trickle--0.6.0--0.7.0.sql |
| 0.7.0 | 0.8.0 | pg_trickle--0.7.0--0.8.0.sql |
| 0.8.0 | 0.9.0 | pg_trickle--0.8.0--0.9.0.sql |
| 0.9.0 | 0.10.0 | pg_trickle--0.9.0--0.10.0.sql |
| 0.10.0 | 0.11.0 | pg_trickle--0.10.0--0.11.0.sql |
| 0.11.0 | 0.12.0 | pg_trickle--0.11.0--0.12.0.sql |
| 0.12.0 | 0.13.0 | pg_trickle--0.12.0--0.13.0.sql |
| 0.13.0 | 0.14.0 | pg_trickle--0.13.0--0.14.0.sql |
| 0.14.0 | 0.15.0 | pg_trickle--0.14.0--0.15.0.sql |
| 0.15.0 | 0.16.0 | pg_trickle--0.15.0--0.16.0.sql |
| 0.16.0 | 0.17.0 | pg_trickle--0.16.0--0.17.0.sql |
| 0.17.0 | 0.18.0 | pg_trickle--0.17.0--0.18.0.sql |
| 0.18.0 | 0.19.0 | pg_trickle--0.18.0--0.19.0.sql |
| 0.19.0 | 0.20.0 | pg_trickle--0.19.0--0.20.0.sql |
| 0.20.0 | 0.21.0 | pg_trickle--0.20.0--0.21.0.sql |
| 0.21.0 | 0.22.0 | pg_trickle--0.21.0--0.22.0.sql |
| 0.22.0 | 0.23.0 | pg_trickle--0.22.0--0.23.0.sql |
| 0.23.0 | 0.24.0 | pg_trickle--0.23.0--0.24.0.sql |
| 0.24.0 | 0.25.0 | pg_trickle--0.24.0--0.25.0.sql |
| 0.25.0 | 0.26.0 | pg_trickle--0.25.0--0.26.0.sql |
| 0.26.0 | 0.27.0 | pg_trickle--0.26.0--0.27.0.sql |
| 0.27.0 | 0.28.0 | pg_trickle--0.27.0--0.28.0.sql |
| 0.28.0 | 0.29.0 | pg_trickle--0.28.0--0.29.0.sql |
| 0.29.0 | 0.30.0 | pg_trickle--0.29.0--0.30.0.sql |
| 0.30.0 | 0.31.0 | pg_trickle--0.30.0--0.31.0.sql |
| 0.31.0 | 0.32.0 | pg_trickle--0.31.0--0.32.0.sql |
| 0.32.0 | 0.33.0 | pg_trickle--0.32.0--0.33.0.sql |
| 0.33.0 | 0.34.0 | pg_trickle--0.33.0--0.34.0.sql |
| 0.34.0 | 0.35.0 | pg_trickle--0.34.0--0.35.0.sql |
| 0.35.0 | 0.36.0 | pg_trickle--0.35.0--0.36.0.sql |
| 0.36.0 | 0.37.0 | pg_trickle--0.36.0--0.37.0.sql |
| 0.37.0 | 0.38.0 | pg_trickle--0.37.0--0.38.0.sql |
| 0.38.0 | 0.39.0 | pg_trickle--0.38.0--0.39.0.sql |
| 0.39.0 | 0.40.0 | pg_trickle--0.39.0--0.40.0.sql |
| 0.40.0 | 0.50.0 | pg_trickle--0.40.0--0.50.0.sql |
| 0.50.0 | 0.51.0 | pg_trickle--0.50.0--0.51.0.sql |

Any installation from 0.1.3 onward can be upgraded to 0.51.0 in a single ALTER EXTENSION pg_trickle UPDATE — PostgreSQL chains the hops automatically after the new binaries are installed and the server has been restarted.


Rollback / Downgrade

PostgreSQL does not support automatic extension downgrades. To roll back:

  1. Export stream table definitions (if you want to recreate them later):
cargo run --bin pg_trickle_dump -- --output backup.sql

Or, if the binary is already installed in your PATH:

pg_trickle_dump --output backup.sql

Use --dsn '<connection string>' or standard PG* / DATABASE_URL environment variables when the default local connection parameters are not sufficient.
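
For example, pointing at a remote server (connection details are illustrative):

pg_trickle_dump --dsn 'host=db.example.com port=5432 dbname=app user=postgres' --output backup.sql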

  2. Drop the extension (destroys all stream tables):

    DROP EXTENSION pg_trickle CASCADE;

  3. Install the old version and restart PostgreSQL.

  4. Recreate the extension at the old version:

    CREATE EXTENSION pg_trickle VERSION '0.1.3';

  5. Recreate stream tables from your backup.


Troubleshooting

"function pgtrickle.xxx does not exist" after upgrade

This means the upgrade script is missing a function. Workaround:

-- Check what version PostgreSQL thinks is installed
SELECT extversion FROM pg_extension WHERE extname = 'pg_trickle';

-- If the version looks correct but functions are missing,
-- the upgrade script may be incomplete. Try a clean reinstall:
DROP EXTENSION pg_trickle CASCADE;
CREATE EXTENSION pg_trickle CASCADE;
-- Warning: this destroys all stream tables!

Report this as a bug — upgrade scripts should never silently drop functions.

"could not access file pg_trickle" after restart

The new shared library file was not installed correctly. Verify:

ls -la $(pg_config --pkglibdir)/pg_trickle*

ALTER EXTENSION UPDATE says "already at version X"

Without an explicit TO clause, ALTER EXTENSION ... UPDATE targets the default_version declared in the .control file. If PostgreSQL reports you are already at that version, the new .control file was probably never installed, so its default_version still matches the catalog version you are running. Check:

cat $(pg_config --sharedir)/extension/pg_trickle.control

Multi-Database Environments

ALTER EXTENSION UPDATE must be run in each database where pg_trickle is installed. A common pattern:

for db in $(psql -t -c "SELECT datname FROM pg_database WHERE datname NOT IN ('template0', 'template1')"); do
  psql -d "$db" -c "ALTER EXTENSION pg_trickle UPDATE;" 2>/dev/null || true
done

CloudNativePG (CNPG)

For CNPG deployments, see cnpg/README.md for upgrade instructions specific to the Kubernetes operator.


Upgrading to v0.23.0

New GUCs

| GUC | Default | Description |
| --- | --- | --- |
| pg_trickle.log_delta_sql | off | Log generated delta SQL at DEBUG1 level for diagnosis |
| pg_trickle.delta_work_mem | 0 (inherit) | work_mem override (MB) for delta SQL execution |
| pg_trickle.delta_enable_nestloop | on | Allow nested-loop joins during delta execution |
| pg_trickle.analyze_before_delta | on | Run ANALYZE on change buffers before delta SQL |
| pg_trickle.max_change_buffer_alert_rows | 0 (disabled) | Alert threshold for change buffer overflow |
| pg_trickle.diff_output_format | split | DIFF output format: split or merged |

Behavioral Changes

DI-2 aggregate UPDATE-split: The DIFF output row format for aggregate stream tables changes from UPDATE rows to DELETE+INSERT pairs. This is the algebraically correct form that enables O(Δ) performance for multi-join queries.

Impact: Application code that reads the change buffer or outbox and checks op = 'UPDATE' will no longer see aggregate updates (they now arrive as DELETE+INSERT pairs) and will silently produce incorrect results.

Migration path:

  1. Set pg_trickle.diff_output_format = 'merged' before upgrading
  2. Migrate application code to handle DELETE+INSERT pairs (see the sketch after this list)
  3. Switch to pg_trickle.diff_output_format = 'split' (default)
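
A hypothetical consumer sketch for step 2; the my_outbox table and its (change_id, op, payload) columns are illustrative, only the op values come from this guide:

-- Old consumer logic treated op = 'UPDATE' as the new row image.
-- In split format an aggregate update arrives as a DELETE (old image)
-- followed by an INSERT (new image), so take the INSERT side as current state.
SELECT change_id, op, payload
FROM my_outbox
WHERE op IN ('DELETE', 'INSERT')   -- op = 'UPDATE' no longer appears
ORDER BY change_id;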

Rollback Strategy

The DI-2/DI-6 code paths are gated by detecting UPDATE rows in the change buffer. Downgrading to v0.22.0 is safe if no writes have occurred to upgraded stream tables.

Pre-Upgrade Validation

# Verify version files are in sync
just check-version-sync

New SQL Functions

  • pgtrickle.explain_diff_sql(name TEXT) — Returns the delta SQL template for a stream table (for inspection/EXPLAIN)
  • pgtrickle.pgtrickle_refresh_stats() — Per-stream-table timing stats with avg/p95/p99 percentiles

Security Guide

This page is the practical security reference for operators of pg_trickle. It covers roles and grants, what privileges the extension needs, how stream tables interact with PostgreSQL Row-Level Security (RLS), how triggers behave under SECURITY DEFINER vs INVOKER, and what to lock down in production.

Reporting a vulnerability? See SECURITY.md in the repository root for the disclosure policy.


Threat model in one paragraph

pg_trickle runs inside PostgreSQL. Anyone who can connect as a superuser, or as a role that owns the relevant tables, can already read, modify, or destroy the data the extension manages — they do not need pg_trickle to do that. The threats this guide focuses on are: privilege escalation through stream tables (e.g., a low-privilege role gaining access to source data via a stream table), accidental exposure of source data through CDC change buffers, and operational mistakes (running everything as the postgres superuser).


Roles & grants

What pg_trickle needs

The extension installs into the pgtrickle and pgtrickle_changes schemas. The role that runs CREATE EXTENSION pg_trickle must be a superuser because the extension installs background workers, but day-to-day usage can (and should) be done with a less-privileged role.

The role that creates a stream table needs:

  • USAGE on the schemas containing source tables.
  • SELECT on the source tables referenced in the defining query.
  • CREATE on the schema where the stream table will live.
  • EXECUTE on the relevant pgtrickle.* functions.

-- Owner of stream tables (your application's "data engineer" role)
CREATE ROLE st_author NOINHERIT;
GRANT USAGE       ON SCHEMA public TO st_author;
GRANT SELECT      ON ALL TABLES IN SCHEMA public TO st_author;
GRANT CREATE      ON SCHEMA public TO st_author;
GRANT EXECUTE     ON ALL FUNCTIONS IN SCHEMA pgtrickle TO st_author;
GRANT USAGE       ON SCHEMA pgtrickle TO st_author;

-- Read-only consumer (your application)
CREATE ROLE app_reader;
GRANT USAGE       ON SCHEMA public TO app_reader;
GRANT SELECT      ON ALL TABLES IN SCHEMA public TO app_reader;

app_reader can read stream tables exactly as it reads any other table — the extension does not require special privileges for reading a stream table.


Stream tables and Row-Level Security (RLS)

A stream table is the materialized result of its defining query. RLS policies on source tables are evaluated at the time the defining query runs, which is during refresh, under the owner's identity (not the consumer's).

This has two important consequences:

  1. Stream-table contents do not honour the consumer's RLS context. Two consumers with different RLS contexts will read the same rows from the stream table.
  2. You can apply RLS to the stream table itself to filter rows per consumer. pg_trickle does not interfere with RLS policies on stream tables (they are ordinary heap tables under the hood).

The recommended pattern is therefore:

-- Define the ST without RLS at the source level
SELECT pgtrickle.create_stream_table(
    'order_summary',
    $$SELECT tenant_id, customer_id, SUM(amount) AS total
      FROM orders GROUP BY tenant_id, customer_id$$
);

-- Apply RLS to the stream table for per-tenant isolation
ALTER TABLE order_summary ENABLE ROW LEVEL SECURITY;
CREATE POLICY tenant_isolation ON order_summary
    FOR SELECT USING (tenant_id = current_setting('app.tenant_id')::int);

See the Row-Level Security tutorial for a complete worked example.


CDC triggers — SECURITY DEFINER vs INVOKER

In trigger CDC mode, pg_trickle installs AFTER row-level triggers on every source table. These triggers run as SECURITY DEFINER under the role that owns the stream table — so they can write to pgtrickle_changes.* regardless of who issued the source-table write.

What this means for you:

  • Any role that can write to a source table will indirectly write to the corresponding change buffer. That is by design.
  • The change buffer table is owned by the stream-table owner. Other roles get no implicit access.
  • If you revoke INSERT on the change buffer, the trigger keeps working (it runs as the owner).
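
A verification sketch for a given source table, assuming the pgt_ trigger-name prefix used elsewhere in this documentation:

-- Who owns the CDC trigger functions, and are they SECURITY DEFINER?
SELECT t.tgname,
       p.proname,
       pg_get_userbyid(p.proowner) AS function_owner,
       p.prosecdef                 AS security_definer
FROM pg_trigger t
JOIN pg_proc p ON p.oid = t.tgfoid
WHERE t.tgrelid = 'public.orders'::regclass
  AND t.tgname LIKE 'pgt_%';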

In WAL CDC mode, no triggers are installed; capture happens in PostgreSQL's logical decoding pipeline and is governed by the max_replication_slots and wal_level settings.


What change buffers contain

pgtrickle_changes.changes_<oid> tables contain the post-image of each changed row, restricted to the columns referenced by the defining query (columnar tracking). Two consequences:

  1. If your defining query references a sensitive column, that column ends up in the change buffer.
  2. The change buffer table inherits the same tablespace and disk layout rules as ordinary tables. If you encrypt your data directory, the change buffers are encrypted at rest the same way.

You can lock change buffers down further:

REVOKE ALL ON ALL TABLES IN SCHEMA pgtrickle_changes FROM PUBLIC;
GRANT  SELECT ON ALL TABLES IN SCHEMA pgtrickle_changes TO st_owner;

Lock down circular dependencies

pg_trickle.allow_circular is off by default and should generally stay that way. Cycles in the DAG are accepted only when this GUC is on, and only for monotone queries — but enabling it widens the class of queries pg_trickle accepts, which deserves explicit attention. Set it via ALTER SYSTEM and require a superuser to flip it.


Audit & monitoring

pg_trickle records every refresh in pgtrickle.pgt_refresh_history. For audit:

-- Last 100 refreshes across the whole installation
SELECT pgt_name, refresh_mode, started_at, finished_at,
       success, rows_in, rows_out, error_message
FROM pgtrickle.pgt_refresh_history
ORDER BY started_at DESC
LIMIT 100;

-- Failed refreshes in the last hour
SELECT * FROM pgtrickle.pgt_refresh_history
WHERE NOT success AND started_at > now() - interval '1 hour';

Combine with pg_audit for full DDL/DML coverage. The Monitoring & Alerting tutorial includes recommended Prometheus alerts.


Copy-Paste Role Templates

The following SQL templates create the three standard pg_trickle roles and grant the minimum required privileges. Run these as a superuser immediately after installing the extension.

pgtrickle_admin — stream table author

CREATE ROLE pgtrickle_admin NOLOGIN NOINHERIT;

-- Extension function access
GRANT USAGE   ON SCHEMA pgtrickle          TO pgtrickle_admin;
GRANT USAGE   ON SCHEMA pgtrickle_changes  TO pgtrickle_admin;
GRANT EXECUTE ON ALL FUNCTIONS IN SCHEMA pgtrickle TO pgtrickle_admin;

-- Create stream tables in the public schema
GRANT CREATE  ON SCHEMA public TO pgtrickle_admin;
GRANT USAGE   ON SCHEMA public TO pgtrickle_admin;
GRANT SELECT  ON ALL TABLES IN SCHEMA public TO pgtrickle_admin;

-- Automatically grant SELECT on new source tables
ALTER DEFAULT PRIVILEGES IN SCHEMA public
    GRANT SELECT ON TABLES TO pgtrickle_admin;

pgtrickle_user — application backend

CREATE ROLE pgtrickle_user NOLOGIN NOINHERIT;

GRANT USAGE   ON SCHEMA pgtrickle TO pgtrickle_user;

-- Monitoring functions (read-only)
GRANT EXECUTE ON FUNCTION pgtrickle.pgt_status()         TO pgtrickle_user;
GRANT EXECUTE ON FUNCTION pgtrickle.refresh_efficiency() TO pgtrickle_user;
GRANT EXECUTE ON FUNCTION pgtrickle.health_check()       TO pgtrickle_user;

-- Per-stream-table SELECT (run after each create_stream_table call):
-- GRANT SELECT ON <stream_table_name> TO pgtrickle_user;

pgtrickle_readonly — BI and reporting tools

CREATE ROLE pgtrickle_readonly NOLOGIN NOINHERIT;

GRANT USAGE ON SCHEMA public TO pgtrickle_readonly;

-- Per-stream-table SELECT (run after each create_stream_table call):
-- GRANT SELECT ON <stream_table_name> TO pgtrickle_readonly;

Assign roles to login roles

-- Data engineer
CREATE ROLE de_alice LOGIN PASSWORD '...';
GRANT pgtrickle_admin    TO de_alice;

-- Application backend
CREATE ROLE app_backend  LOGIN PASSWORD '...';
GRANT pgtrickle_user     TO app_backend;

-- BI tool
CREATE ROLE bi_tool      LOGIN PASSWORD '...';
GRANT pgtrickle_readonly TO bi_tool;

For a complete worked example including CDC trigger ownership verification, see the Security Hardening tutorial.


Hardening checklist

  • pg_trickle.allow_circular = off unless explicitly needed.
  • Stream tables owned by a dedicated, non-superuser role.
  • REVOKE ... FROM PUBLIC on pgtrickle_changes if change buffers contain sensitive columns.
  • RLS policies applied to stream tables that present per-tenant data.
  • Audit logging in place for pgtrickle.pgt_refresh_history.
  • pg_trickle.enabled = on only in environments that should run refreshes (you can disable extension behaviour without uninstalling it).

See also: Row-Level Security tutorial · Pre-Deployment Checklist · Configuration · SECURITY policy

Troubleshooting & Failure Mode Runbook

This document covers common failure scenarios, their symptoms, diagnosis steps, and resolution procedures. It is intended for operators and DBAs running pg_trickle in production.

Quick start: Run SELECT * FROM pgtrickle.health_check() WHERE severity != 'OK'; for a single-call triage of your installation.

Diagnostic Toolkit

These functions are your primary tools for diagnosing issues:

| Function | Purpose |
| --- | --- |
| pgtrickle.health_check() | Single-call overall health triage (OK/WARN/ERROR) |
| pgtrickle.pgt_status() | Status, staleness, error count for all stream tables |
| pgtrickle.refresh_timeline(N) | Last N refresh events across all stream tables |
| pgtrickle.diagnose_errors('name') | Last 5 failed events with classification and remediation |
| pgtrickle.change_buffer_sizes() | CDC pipeline: pending rows and buffer bytes per source |
| pgtrickle.trigger_inventory() | CDC trigger presence and enabled state |
| pgtrickle.check_cdc_health() | WAL replication slot health (WAL mode only) |
| pgtrickle.dependency_tree() | Dependency DAG visualization |
| pgtrickle.worker_pool_status() | Parallel refresh worker pool state |
| pgtrickle.explain_st('name') | DVM operator tree and generated delta SQL |

Quick health check script:

-- 1. Overall health
SELECT * FROM pgtrickle.health_check() WHERE severity != 'OK';

-- 2. Problem stream tables
SELECT name, status, refresh_mode, consecutive_errors, staleness
FROM pgtrickle.pgt_status()
WHERE status != 'ACTIVE' OR consecutive_errors > 0
ORDER BY consecutive_errors DESC;

-- 3. Recent failures
SELECT start_time, stream_table, action, status, duration_ms, error_message
FROM pgtrickle.refresh_timeline(20)
WHERE status = 'FAILED';

Failure Scenarios

1. Scheduler Not Running

Symptoms:

  • No stream tables are refreshing
  • health_check() reports scheduler_running = false
  • No pg_trickle scheduler process in pg_stat_activity

Diagnosis:

-- Check for the scheduler process
SELECT pid, datname, backend_type, state
FROM pg_stat_activity
WHERE backend_type = 'pg_trickle scheduler';

-- Check GUC
SHOW pg_trickle.enabled;

-- Check shared_preload_libraries
SHOW shared_preload_libraries;

Resolution:

| Cause | Fix |
| --- | --- |
| pg_trickle.enabled = off | ALTER SYSTEM SET pg_trickle.enabled = on; SELECT pg_reload_conf(); |
| Not in shared_preload_libraries | Add pg_trickle to shared_preload_libraries in postgresql.conf and restart PostgreSQL |
| max_worker_processes exhausted | Increase max_worker_processes and restart. The launcher retries every 5 minutes — check PostgreSQL logs for WARNING: pg_trickle launcher: could not spawn scheduler |
| Scheduler crashed | Check PostgreSQL logs for crash details. The launcher will auto-restart it. If recurring, check for OOM or resource limits |

2. Stream Table Stuck in SUSPENDED Status

Symptoms:

  • Stream table status shows SUSPENDED
  • consecutive_errors is at or above pg_trickle.max_consecutive_errors
  • No refreshes happening for this stream table

Diagnosis:

-- Check the stream table status
SELECT pgt_name, status, consecutive_errors, last_error_message
FROM pgtrickle.pg_stat_stream_tables
WHERE pgt_name = 'my_stream_table';

-- Get detailed error history
SELECT * FROM pgtrickle.diagnose_errors('my_stream_table');

Resolution:

  1. Fix the underlying error (check last_error_message and diagnose_errors)
  2. Resume the stream table:
SELECT pgtrickle.alter_stream_table('my_stream_table', enabled => true);
  3. Trigger a manual refresh to verify:
SELECT pgtrickle.refresh_stream_table('my_stream_table');

Prevention: Increase pg_trickle.max_consecutive_errors (default 3) if transient errors are common in your environment:

ALTER SYSTEM SET pg_trickle.max_consecutive_errors = 5;
SELECT pg_reload_conf();

3. CDC Triggers Missing or Disabled

Symptoms:

  • Stream table refreshes succeed but shows no changes
  • change_buffer_sizes() shows pending_rows = 0 despite active DML
  • Source tables have no pg_trickle triggers

Diagnosis:

-- Check trigger inventory
SELECT source_table, trigger_type, trigger_name, present, enabled
FROM pgtrickle.trigger_inventory()
WHERE NOT present OR NOT enabled;

-- Manual check on a specific source table
SELECT tgname, tgenabled
FROM pg_trigger
WHERE tgrelid = 'public.orders'::regclass
  AND tgname LIKE 'pgt_%';

Resolution:

| Cause | Fix |
| --- | --- |
| Triggers dropped by DDL (e.g., pg_dump + restore without triggers) | Drop and recreate the stream table, or reinitialize: SELECT pgtrickle.refresh_stream_table('my_st'); |
| Triggers disabled (ALTER TABLE ... DISABLE TRIGGER) | ALTER TABLE source_table ENABLE TRIGGER ALL; |
| Source gating active | Check SELECT * FROM pgtrickle.source_gates(); and ungate: SELECT pgtrickle.ungate_source('source_table'); |
| WAL mode active but slot missing | See WAL Replication Slot Lag or Missing |

4. WAL Replication Slot Lag or Missing

Symptoms:

  • check_cdc_health() shows slot_lag_exceeds_threshold or replication_slot_missing
  • WAL disk usage growing unexpectedly
  • Stream tables not receiving changes in WAL mode

Diagnosis:

-- Check CDC health
SELECT * FROM pgtrickle.check_cdc_health();

-- Check replication slots directly
SELECT slot_name, active, restart_lsn, confirmed_flush_lsn,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS lag
FROM pg_replication_slots
WHERE slot_name LIKE 'pgt_%';

Resolution:

| Cause | Fix |
| --- | --- |
| Slot dropped externally | pg_trickle will auto-fallback to trigger-based CDC. To recreate: drop and recreate the stream table |
| Slot lagging (WAL accumulation) | Check for long-running transactions: SELECT pid, age(backend_xmin) FROM pg_stat_activity WHERE backend_xmin IS NOT NULL; then terminate idle-in-transaction sessions |
| wal_level != logical | WAL CDC requires wal_level = logical. Set it and restart PostgreSQL |
| max_replication_slots exhausted | Increase max_replication_slots and restart |

Fallback: Force trigger-based CDC mode if WAL mode is problematic:

ALTER SYSTEM SET pg_trickle.cdc_mode = 'trigger';
SELECT pg_reload_conf();

5. Stream Table Stuck in INITIALIZING

Symptoms:

  • Stream table status is INITIALIZING for an extended period
  • The initial full refresh may have failed or is still running

Diagnosis:

-- Check refresh history
SELECT * FROM pgtrickle.get_refresh_history('my_st', 5);

-- Check for active refresh
SELECT pid, state, query, now() - query_start AS running_for
FROM pg_stat_activity
WHERE query LIKE '%pgtrickle%' AND state = 'active';

Resolution:

| Cause | Fix |
| --- | --- |
| Initial refresh failed (check error in history) | Fix the error, then: SELECT pgtrickle.refresh_stream_table('my_st'); |
| Defining query is very slow | Optimize the query, add indexes on source tables, or increase work_mem |
| Lock contention during initial refresh | See Lock Contention |

6. Change Buffers Growing Without Refresh

Symptoms:

  • change_buffer_sizes() shows large pending_rows and growing buffer_bytes
  • Stream tables are stale
  • Refreshes are not running or are failing

Diagnosis:

-- Check buffer sizes
SELECT stream_table, source_table, pending_rows, buffer_bytes
FROM pgtrickle.change_buffer_sizes()
ORDER BY pending_rows DESC;

-- Check if refreshes are happening
SELECT * FROM pgtrickle.refresh_timeline(10);

-- Check for blocked refresh processes
SELECT pid, wait_event_type, wait_event, state, query
FROM pg_stat_activity
WHERE query LIKE '%pgtrickle%';

Resolution:

| Cause | Fix |
| --- | --- |
| Scheduler not running | See Scheduler Not Running |
| All refreshes failing | Check diagnose_errors() for each affected stream table |
| Lock contention | See Lock Contention |
| Very large buffer causing slow MERGE | Consider lowering pg_trickle.differential_change_ratio_threshold to trigger FULL refresh for large batches |

Emergency: If buffers are dangerously large and you need immediate relief:

-- Force a full refresh (bypasses change buffers)
SELECT pgtrickle.refresh_stream_table('my_st', force_full => true);

7. Lock Contention Blocking Refresh

Symptoms:

  • Refresh duration is much longer than usual
  • pg_stat_activity shows refresh processes in Lock wait state
  • Long-running transactions on source or stream tables

Diagnosis:

-- Find blocking locks
SELECT blocked.pid AS blocked_pid,
       blocked.query AS blocked_query,
       blocking.pid AS blocking_pid,
       blocking.query AS blocking_query
FROM pg_stat_activity blocked
JOIN pg_locks bl ON bl.pid = blocked.pid AND NOT bl.granted
JOIN pg_locks gl ON gl.locktype = bl.locktype
    AND gl.database IS NOT DISTINCT FROM bl.database
    AND gl.relation IS NOT DISTINCT FROM bl.relation
    AND gl.page IS NOT DISTINCT FROM bl.page
    AND gl.tuple IS NOT DISTINCT FROM bl.tuple
    AND gl.pid != bl.pid
    AND gl.granted
JOIN pg_stat_activity blocking ON blocking.pid = gl.pid
WHERE blocked.query LIKE '%pgtrickle%';

Resolution:

  1. Identify and terminate the blocking session if appropriate:
    SELECT pg_terminate_backend(<blocking_pid>);
    
  2. Investigate why the blocking transaction is long-running (idle in transaction, slow query, etc.)
  3. Consider adding statement_timeout or idle_in_transaction_session_timeout to prevent future occurrences, as sketched below
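
A minimal sketch; the values are illustrative, and a per-role statement_timeout is usually safer than a global one:

-- Abort sessions that sit idle inside a transaction
ALTER SYSTEM SET idle_in_transaction_session_timeout = '5min';
SELECT pg_reload_conf();

-- Scope the statement timeout to the application role rather than the whole server
ALTER ROLE app_backend SET statement_timeout = '2min';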

8. Out-of-Memory During Refresh

Symptoms:

  • Refresh processes killed by OS OOM killer
  • PostgreSQL logs show out of memory errors
  • Stream tables fail with system-category errors

Diagnosis:

# Check OS OOM killer logs
dmesg | grep -i "oom\|killed process" | tail -20

# Check PostgreSQL logs for memory errors
grep -i "out of memory\|oom" /var/log/postgresql/postgresql-*.log | tail -10

-- Check which stream tables have large source data
SELECT stream_table, source_table, pending_rows
FROM pgtrickle.change_buffer_sizes()
ORDER BY pending_rows DESC;

Resolution:

| Cause | Fix |
| --- | --- |
| Large FULL refresh on big table | Reduce work_mem or maintenance_work_mem to limit per-query memory |
| Large change buffer accumulation | Refresh more frequently (shorter schedule) to keep buffers small |
| Complex query with many joins | Simplify the defining query or break into cascading stream tables |
| Parallel refresh amplifies memory | Reduce pg_trickle.max_concurrent_refreshes |

Tuning:

-- Limit per-refresh memory
SET work_mem = '64MB';

-- Limit concurrent refreshes to reduce peak memory
ALTER SYSTEM SET pg_trickle.max_concurrent_refreshes = 2;
SELECT pg_reload_conf();

9. Disk Full / WAL Retention Exceeded

Symptoms:

  • PostgreSQL logs No space left on device errors
  • WAL directory consuming excessive disk
  • Replication slots preventing WAL cleanup

Diagnosis:

# Check disk usage
df -h /var/lib/postgresql/data
du -sh /var/lib/postgresql/data/pg_wal/

-- Check replication slot WAL retention
SELECT slot_name, active,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained_wal
FROM pg_replication_slots
ORDER BY pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) DESC;

-- Check change buffer table sizes
SELECT stream_table, source_table,
       pg_size_pretty(buffer_bytes::bigint) AS buffer_size
FROM pgtrickle.change_buffer_sizes()
ORDER BY buffer_bytes DESC;

Resolution:

| Cause | Fix |
| --- | --- |
| Inactive replication slot holding WAL | Drop the slot: SELECT pg_drop_replication_slot('pgt_...'); |
| Change buffer tables too large | Force full refresh to clear buffers, or refresh more frequently |
| WAL accumulation from long transactions | Terminate idle-in-transaction sessions |
| max_wal_size too low | Increase max_wal_size in postgresql.conf |

Emergency cleanup:

-- Drop inactive pg_trickle replication slots
SELECT pg_drop_replication_slot(slot_name)
FROM pg_replication_slots
WHERE slot_name LIKE 'pgt_%' AND NOT active;

10. Circular Pipeline Convergence Failure

Symptoms:

  • Stream tables in a circular dependency hit the maximum iteration limit
  • Refresh history shows repeated cycles without convergence
  • Error messages mention fixed_point_max_iterations

Diagnosis:

-- Check for circular dependencies
SELECT * FROM pgtrickle.dependency_tree();

-- Check refresh history for iteration patterns
SELECT start_time, stream_table, action, status, error_message
FROM pgtrickle.refresh_timeline(50)
WHERE stream_table IN ('st_a', 'st_b')  -- suspected cycle members
ORDER BY start_time DESC;

Resolution:

  1. Verify the cycle is intentional (see Circular Dependencies tutorial)
  2. Increase the iteration limit if convergence is slow:
    ALTER SYSTEM SET pg_trickle.fixed_point_max_iterations = 20;
    SELECT pg_reload_conf();
    
  3. If the cycle never converges, the defining queries may not be monotone. Restructure to eliminate the cycle or ensure monotonicity

11. Schema Change Broke Stream Table

Symptoms:

  • Stream table has needs_reinit = true
  • Reinitialization keeps failing
  • Error messages reference dropped or renamed columns

Diagnosis:

-- Check for pending reinit
SELECT pgt_name, needs_reinit, status, last_error_message
FROM pgtrickle.pg_stat_stream_tables
WHERE needs_reinit;

-- Get error details
SELECT * FROM pgtrickle.diagnose_errors('my_st');

Resolution:

If the defining query is still valid after the DDL change, force a reinit:

SELECT pgtrickle.refresh_stream_table('my_st');

If the defining query needs to be updated:

-- Option 1: Alter the defining query
SELECT pgtrickle.alter_stream_table('my_st',
    query => 'SELECT new_column, SUM(amount) FROM orders GROUP BY new_column'
);

-- Option 2: Drop and recreate
SELECT pgtrickle.drop_stream_table('my_st');
SELECT pgtrickle.create_stream_table(
    'my_st',
    'SELECT new_column, SUM(amount) FROM orders GROUP BY new_column',
    '1m'
);

12. Worker Pool Exhaustion

Symptoms:

  • Refresh latency increases across the board
  • Some stream tables refresh while others queue indefinitely
  • worker_pool_status() shows all workers busy

Diagnosis:

-- Check worker pool
SELECT * FROM pgtrickle.worker_pool_status();

-- Check for long-running parallel jobs
SELECT job_id, unit_key, status, duration_ms
FROM pgtrickle.parallel_job_status(300)
WHERE status = 'RUNNING'
ORDER BY duration_ms DESC;

Resolution:

| Cause | Fix |
| --- | --- |
| Too few workers for workload | Increase pg_trickle.max_concurrent_refreshes and/or max_worker_processes |
| One stream table monopolizing workers | Check if a single slow refresh is blocking the pool. Consider splitting into smaller stream tables |
| Global worker cap reached | Increase pg_trickle.max_dynamic_refresh_workers |

13. Fuse Tripped (Circuit Breaker)

Symptoms:

  • Stream table shows fuse_state = 'BLOWN' or refresh is paused
  • fuse_status() reports a tripped fuse
  • No refreshes happening despite active scheduler

Diagnosis:

-- Check fuse status
SELECT * FROM pgtrickle.fuse_status();

Resolution:

Reset the fuse after investigating the root cause:

SELECT pgtrickle.reset_fuse('my_stream_table');

See the Fuse Circuit Breaker tutorial for details on fuse thresholds and configuration.


14. Stream Table Appears Stuck Behind a Long Transaction

Symptoms:

  • A stream table's data_timestamp is not advancing even though the source table is receiving new inserts.
  • The pg_trickle_frontier_holdback_lsn_bytes Prometheus gauge is non-zero.
  • Server log contains: pg_trickle: frontier holdback active — the oldest in-progress transaction is Ns old.

Cause: frontier_holdback_mode = 'xmin' (the default) prevents the scheduler from advancing the frontier while any in-progress transaction exists that is older than the previous tick's xmin baseline. A long-running or forgotten session holding an open transaction will pause frontier advancement for all stream tables on that PostgreSQL server.

This is intentional: without the holdback, a transaction that inserts into a tracked source table and commits after the scheduler ticks would have its change permanently lost (see Issue #536 and plans/safety/PLAN_FRONTIER_VISIBILITY_HOLDBACK.md).

Diagnosis:

-- Find the oldest in-progress transaction
SELECT pid, usename, state, application_name,
       backend_xmin,
       EXTRACT(EPOCH FROM (now() - xact_start))::int AS xact_age_secs,
       query
FROM pg_stat_activity
WHERE backend_xmin IS NOT NULL
  AND state <> 'idle'
ORDER BY xact_start;

-- Check for prepared (2PC) transactions
SELECT gid, prepared,
       EXTRACT(EPOCH FROM (now() - prepared))::int AS age_secs,
       owner, database
FROM pg_prepared_xacts
ORDER BY prepared;

Resolution:

  1. Identify and terminate the blocking session:

    SELECT pg_terminate_backend(pid)
    FROM pg_stat_activity
    WHERE state = 'idle in transaction'
      AND backend_xmin IS NOT NULL
    ORDER BY xact_start
    LIMIT 1;
    
  2. Rollback a forgotten 2PC transaction:

    ROLLBACK PREPARED 'gid_from_pg_prepared_xacts';
    
  3. For benchmark or known-safe workloads only, disable holdback to restore the pre-fix fast path (risks silent data loss):

    ALTER SYSTEM SET pg_trickle.frontier_holdback_mode = 'none';
    SELECT pg_reload_conf();
    
  4. Suppress the warning (while keeping holdback active) by raising the threshold:

    ALTER SYSTEM SET pg_trickle.frontier_holdback_warn_seconds = 300;
    SELECT pg_reload_conf();
    
  5. On managed PostgreSQL (RDS, Cloud SQL, Aiven, etc.) where pg_stat_activity is restricted to the current user's own sessions, the probe will silently see no other backends and never trigger a holdback. The server log will contain: pg_trickle: frontier holdback probe cannot see other PostgreSQL backends.

    Fix by granting the monitoring role to the pg_trickle service account:

    GRANT pg_monitor TO <pg_trickle_service_role>;
    

    Then restart the pg_trickle scheduler (or reload PostgreSQL) so the new privilege takes effect.


15. Stale Data After High-Concurrency Writes (Sequence Cache Inversion)

Symptoms:

  • A stream table consistently shows an outdated value for a row that is being updated frequently by concurrent sessions.
  • The issue is reproducible under high write concurrency but not with a single writer.
  • pgtrickle.check_cdc_health() returns a CRITICAL: change buffer sequence(s) have CACHE > 1 alert.

Cause: Someone manually altered a change buffer sequence to use CACHE > 1 (e.g. to reduce sequence LWLock contention). The change buffer BIGSERIAL sequence must use CACHE 1 — this is a hard correctness invariant, not a performance knob.

With CACHE > 1, PostgreSQL backends pre-allocate blocks of sequence values. Two concurrent transactions modifying the same row can commit in an order that inverts their pre-cached change_id values:

  1. Backend A caches [16, 31] and starts updating row id=1.
  2. Backend B caches [33, 64], updates the same row with change_id=33, and commits first.
  3. Backend A commits last (true final state), but its change_id=16.
  4. The compaction/delta pipeline uses ORDER BY change_id DESC to find the final state → picks change_id=33 (Backend B's stale data).

Silent data corruption. See issue #536 for full analysis.

Diagnosis:

-- check_cdc_health() surfaces the problem:
SELECT source_table, cdc_mode, alert
FROM pgtrickle.check_cdc_health()
WHERE alert IS NOT NULL;

-- Directly inspect all change buffer sequences:
SELECT schemaname, sequencename, cache_size
FROM pg_sequences
WHERE schemaname = 'pgtrickle_changes'
  AND (sequencename LIKE 'changes_%_change_id_seq'
    OR sequencename LIKE 'changes_pgt_%_change_id_seq')
  AND cache_size > 1;

Resolution:

  1. Reset every affected sequence back to CACHE 1:

    -- Replace <seq_name> with the sequencename from the query above.
    ALTER SEQUENCE pgtrickle_changes.<seq_name> CACHE 1;
    
  2. Verify the alert is gone:

    SELECT alert FROM pgtrickle.check_cdc_health() WHERE alert IS NOT NULL;
    -- Should return zero rows.
    
  3. Do NOT increase CACHE to reduce LWLock contention. The only structural solution for high-concurrency change_id contention is to switch to the WAL/logical-decoding CDC backend, which uses commit-LSN ordering and has no sequence at all:

    SELECT pgtrickle.alter_stream_table('my_st', cdc_mode => 'wal');
    

General Diagnostic Workflow

When investigating any issue, follow this sequence:

1. health_check()          → identify which subsystem is unhealthy
2. pgt_status()            → find specific affected stream tables
3. diagnose_errors('name') → get root cause for failures
4. refresh_timeline(20)    → correlate with recent refresh events
5. change_buffer_sizes()   → check CDC pipeline health
6. trigger_inventory()     → verify change capture is working
7. dependency_tree()       → confirm DAG wiring
8. PostgreSQL logs         → low-level crash/resource details

GUC Quick Reference for Troubleshooting

| GUC | Default | What to check |
| --- | --- | --- |
| pg_trickle.enabled | on | Must be on for scheduler to run |
| pg_trickle.max_consecutive_errors | 3 | Stream tables suspend after this many failures |
| pg_trickle.scheduler_interval_ms | 100 | Very high values cause refresh lag |
| pg_trickle.cdc_mode | auto | trigger for reliable fallback |
| pg_trickle.max_concurrent_refreshes | 4 | Per-database parallel refresh cap |
| pg_trickle.fixed_point_max_iterations | 10 | Circular pipeline iteration limit |
| pg_trickle.differential_change_ratio_threshold | 0.5 | Falls back to FULL above this ratio |
| pg_trickle.auto_backoff | on | Stretches intervals up to 8x under load |
| pg_trickle.frontier_holdback_mode | xmin | none disables holdback (unsafe); xmin = safe default |
| pg_trickle.frontier_holdback_warn_seconds | 60 | Warn after holding back for this many seconds |

See also: Error Reference · Configuration · FAQ · Pre-Deployment Checklist · Capacity Planning · CDC Modes

pg_trickle Error Reference

This document lists all PgTrickleError variants with descriptions, common causes, and suggested fixes. If you encounter an error not listed here, please open an issue.

Tip: Most errors include context (table name, OID, or query fragment) in the message text. Use that context to narrow down the root cause.


SQLSTATE Code Reference

Every pg_trickle error includes a PostgreSQL SQLSTATE code for programmatic error handling. Use SQLSTATE in PL/pgSQL EXCEPTION WHEN blocks or check the error code in your client library.
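
For example, a PL/pgSQL sketch that branches on two of the codes below (the stream table name is illustrative):

DO $$
BEGIN
    PERFORM pgtrickle.refresh_stream_table('my_st');
EXCEPTION
    WHEN SQLSTATE '55P03' THEN
        -- LockTimeout: transient; the scheduler also retries with backoff
        RAISE NOTICE 'refresh blocked by a lock, retry later';
    WHEN SQLSTATE '0A000' THEN
        -- UnsupportedOperator: fall back to full recomputation
        PERFORM pgtrickle.alter_stream_table('my_st', refresh_mode => 'FULL');
END $$;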

| Error Variant | SQLSTATE | Code Name |
| --- | --- | --- |
| QueryParseError | 42000 | SYNTAX_ERROR_OR_ACCESS_RULE_VIOLATION |
| TypeMismatch | 42804 | DATATYPE_MISMATCH |
| UnsupportedOperator | 0A000 | FEATURE_NOT_SUPPORTED |
| CycleDetected | 3F000 | INVALID_SCHEMA_DEFINITION |
| NotFound | 42P01 | UNDEFINED_TABLE |
| AlreadyExists | 42P07 | DUPLICATE_TABLE |
| InvalidArgument | 22023 | INVALID_PARAMETER_VALUE |
| QueryTooComplex | 54000 | PROGRAM_LIMIT_EXCEEDED |
| PermissionDenied | 42501 | INSUFFICIENT_PRIVILEGE |
| UpstreamTableDropped | 42P01 | UNDEFINED_TABLE |
| UpstreamSchemaChanged | 42P17 | INVALID_TABLE_DEFINITION |
| LockTimeout | 55P03 | LOCK_NOT_AVAILABLE |
| ReplicationSlotError | 55000 | OBJECT_NOT_IN_PREREQUISITE_STATE |
| WalTransitionError | 55000 | OBJECT_NOT_IN_PREREQUISITE_STATE |
| SpiError | XX000 | INTERNAL_ERROR |
| SpiErrorCode | XX000 | INTERNAL_ERROR (SQLSTATE preserved from original) |
| SpiPermissionError | 42501 | INSUFFICIENT_PRIVILEGE |
| WatermarkBackwardMovement | 22000 | DATA_EXCEPTION |
| WatermarkGroupNotFound | 42704 | UNDEFINED_OBJECT |
| WatermarkGroupAlreadyExists | 42710 | DUPLICATE_OBJECT |
| RefreshSkipped | 55000 | OBJECT_NOT_IN_PREREQUISITE_STATE |
| PublicationAlreadyExists | 42710 | DUPLICATE_OBJECT |
| PublicationNotFound | 42704 | UNDEFINED_OBJECT |
| SlaTooSmall | 22023 | INVALID_PARAMETER_VALUE |
| ChangedColsBitmaskFailed | XX000 | INTERNAL_ERROR |
| PublicationRebuildFailed | XX000 | INTERNAL_ERROR |
| DiagnosticError | XX000 | INTERNAL_ERROR |
| SnapshotAlreadyExists | 42710 | DUPLICATE_OBJECT |
| SnapshotSourceNotFound | 42P01 | UNDEFINED_TABLE |
| SnapshotSchemaVersionMismatch | 42P17 | INVALID_TABLE_DEFINITION |
| OutboxAlreadyEnabled | 42710 | DUPLICATE_OBJECT |
| OutboxNotEnabled | 42704 | UNDEFINED_OBJECT |
| PgTideMissing | 0A000 | FEATURE_NOT_SUPPORTED |
| UnresolvedPlaceholder | XX000 | INTERNAL_ERROR |
| DiffDepthExceeded | 54000 | PROGRAM_LIMIT_EXCEEDED |
| DiffCteCountExceeded | 54000 | PROGRAM_LIMIT_EXCEEDED |
| StSourceFrontierMissing | 42P01 | UNDEFINED_TABLE |
| InternalError | XX000 | INTERNAL_ERROR |

Error Categories

pg_trickle classifies errors into four categories that determine retry behavior:

| Category | Retried by scheduler? | Description |
| --- | --- | --- |
| User | No | Invalid queries, type mismatches, DAG cycles. Fix the input. |
| Schema | No (triggers reinitialize) | Upstream DDL changes. The stream table is reinitialized automatically. |
| System | Yes (with backoff) | Lock timeouts, replication slot problems, transient SPI failures. |
| Internal | No | Unexpected bugs. Please report these. |

User Errors

QueryParseError

Message: query parse error: <details>

Description: The defining query could not be parsed or validated by the pg_trickle query analyzer.

Common causes:

  • Syntax error in the defining query
  • Use of PostgreSQL syntax not yet supported by pgrx's query parser
  • A CTE or subquery that cannot be analyzed

Suggested fix: Simplify the query. Check that it runs as a standalone SELECT statement. Review SQL Reference — Expression Support for supported syntax.


TypeMismatch

Message: type mismatch: <details>

Description: A type incompatibility was detected between the defining query output and the stream table schema, or between source columns and expected types.

Common causes:

  • Column type changed on a source table after stream table creation
  • Explicit cast to an incompatible type in the defining query
  • UNION branches with mismatched column types

Suggested fix: Ensure column types match. Use explicit CAST() to align types if needed. If the source table changed, use pgtrickle.repair_stream_table() to reinitialize.


UnsupportedOperator

Message: unsupported operator for DIFFERENTIAL mode: <operator>

Description: The defining query uses an SQL operator or construct that pg_trickle cannot maintain incrementally.

Common causes:

  • TABLESAMPLE, GROUPING SETS beyond the branch limit, recursive CTEs with unsupported patterns, certain window function combinations
  • Non-monotonic or volatile functions in positions that prevent differential maintenance

Suggested fix: Use refresh_mode => 'FULL' to fall back to full recomputation:

SELECT pgtrickle.alter_stream_table('my_stream_table',
    refresh_mode => 'FULL');

Or restructure the query to avoid the unsupported construct. See SQL Reference — Expression Support.


CycleDetected

Message: cycle detected in dependency graph: A -> B -> C -> A

Description: Adding or altering this stream table would create a circular dependency in the refresh DAG.

Common causes:

  • Stream table A depends on stream table B, which depends on A
  • Indirect cycles through chains of stream tables

Suggested fix: Restructure the stream table definitions to break the cycle. Use pgtrickle.get_dependency_graph() to visualize the current DAG. If circular dependencies are intentional, enable pg_trickle.allow_circular = true (see Configuration).


NotFound

Message: stream table not found: <name>

Description: The specified stream table does not exist in the pgtrickle.pgt_stream_tables catalog.

Common causes:

  • Typo in the stream table name
  • The stream table was already dropped
  • Schema-qualified name required but not provided (e.g., myschema.my_st)

Suggested fix: Check the name with pgtrickle.list_stream_tables(). Use the fully qualified name: schema.table_name.


AlreadyExists

Message: stream table already exists: <name>

Description: A create_stream_table() call was made for a stream table name that is already registered.

Common causes:

  • Re-running a migration or DDL script without IF NOT EXISTS

Suggested fix: Use pgtrickle.create_stream_table_if_not_exists() or pgtrickle.create_or_replace_stream_table() for idempotent creation.


InvalidArgument

Message: invalid argument: <details>

Description: An invalid value was passed to a pg_trickle API function.

Common causes:

  • Invalid refresh_mode value (must be 'DIFFERENTIAL', 'FULL', or 'AUTO')
  • Calling resume_stream_table() on a table that is not suspended
  • Invalid schedule interval or threshold value
  • Empty or malformed table name

Suggested fix: Check the function signature in the SQL Reference and correct the argument.


QueryTooComplex

Message: query too complex: <details>

Description: The defining query exceeds the maximum parse depth, which protects against stack overflow during query analysis.

Common causes:

  • Deeply nested subqueries (> 64 levels by default)
  • Large UNION ALL chains
  • Complex CTE hierarchies

Suggested fix: Simplify the query. If the depth limit is too restrictive, increase pg_trickle.max_parse_depth (default: 64). See Configuration.
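
For example (assumes the GUC is reloadable; restart PostgreSQL if the change does not take effect):

ALTER SYSTEM SET pg_trickle.max_parse_depth = 128;
SELECT pg_reload_conf();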


PermissionDenied

Message: permission denied: <details>

Description: The current role does not own the stream table's storage table or lacks the necessary PostgreSQL privileges (SEC-1).

Common causes:

  • Calling alter_stream_table() or drop_stream_table() as a role that does not own the underlying storage table
  • Using SECURITY DEFINER functions that change the effective role

Suggested fix: Run the operation as the table owner, or grant ownership with ALTER TABLE ... OWNER TO <role>.


UpstreamTableDropped

Message: upstream table dropped: OID <oid>

Description: A source table referenced by the stream table's defining query was dropped.

Common causes:

  • DROP TABLE on a source table
  • Table replaced via DROP + CREATE (new OID)

Suggested fix: Either recreate the source table with the same schema or drop the stream table and recreate it. If pg_trickle.block_source_ddl = true (default), the DROP would have been blocked in the first place.


UpstreamSchemaChanged

Message: upstream table schema changed: OID <oid>

Description: A source table's schema was altered (e.g., column added, dropped, or type changed) in a way that affects the defining query.

Common causes:

  • ALTER TABLE ... ADD/DROP/ALTER COLUMN on a source table
  • Type change on a column used in the defining query

Suggested fix: The stream table will be automatically reinitialized on the next scheduler tick. If pg_trickle.block_source_ddl = true (default), most schema changes are blocked proactively. Use pgtrickle.alter_stream_table(..., query => '...') to update the defining query if needed.


System Errors

LockTimeout

Message: lock timeout: <details>

Description: A lock required for refresh could not be acquired within the configured timeout.

Common causes:

  • Long-running transactions holding locks on the stream table or source tables
  • Concurrent ALTER TABLE or VACUUM FULL operations
  • High contention on the change buffer tables

Suggested fix: This error is automatically retried with exponential backoff. If persistent, investigate long-running transactions with pg_stat_activity. Consider increasing lock_timeout or reducing refresh frequency.


ReplicationSlotError

Message: replication slot error: <details>

Description: An error occurred with the logical replication slot used for WAL-based CDC.

Common causes:

  • Replication slot dropped externally
  • wal_level changed from logical to a lower level
  • Slot lag exceeded max_slot_wal_keep_size

Suggested fix: Check replication slot status with SELECT * FROM pg_replication_slots. Ensure wal_level = logical. If the slot was dropped, pg_trickle will recreate it automatically. See Configuration — WAL CDC.


WalTransitionError

Message: WAL transition error: <details>

Description: An error occurred during the transition from trigger-based CDC to WAL-based CDC.

Common causes:

  • wal_level is not logical when cdc_mode = 'auto'
  • Transient connection issues during the transition

Suggested fix: Ensure wal_level = logical in postgresql.conf if you want WAL-based CDC. Otherwise set pg_trickle.cdc_mode = 'trigger' to stay on trigger-based CDC. This error is retried automatically.


SpiError

Message: SPI error: <details>

Description: A PostgreSQL Server Programming Interface (SPI) error occurred during an internal query.

Common causes:

  • Transient serialization failures under high concurrency
  • Deadlocks between refresh and concurrent DML
  • Connection issues in background workers
  • Permanent errors: missing columns, syntax errors in generated SQL

Suggested fix: Transient SPI errors (deadlocks, serialization failures) are retried automatically. Permanent errors (permission denied, missing objects) will suspend the stream table after max_consecutive_errors failures. Check pgtrickle.check_health() for details.


SpiErrorCode

Message: SPI error [<sqlstate_code>]: <details>

Description: A PostgreSQL SPI error where the original SQLSTATE code has been preserved for programmatic classification (SCAL-1, v0.30.0). Used when pg_trickle.use_sqlstate_classification = true, which is the default.

Common causes: Same as SpiError above. The difference is that this variant carries the 5-character SQLSTATE code for locale-safe retry decisions rather than relying on English message pattern matching.

Suggested fix: Inspect the SQLSTATE code. 40001 (serialization failure) and 40P01 (deadlock) are retried automatically. 42xxx (privilege/schema errors) will suspend the stream table.


SpiPermissionError

Message: SPI permission error: <details>

Description: The background worker's role lacks required permissions.

Common causes:

  • Missing SELECT privilege on a source table
  • Missing INSERT/UPDATE/DELETE privilege on the stream table
  • Role used by the background worker is not the table owner

Suggested fix: Grant the necessary privileges to the role running pg_trickle's background workers:

GRANT SELECT ON source_table TO pgtrickle_role;
GRANT ALL ON pgtrickle.my_stream_table TO pgtrickle_role;

This error does not count toward the consecutive error suspension limit.


Watermark Errors

WatermarkBackwardMovement

Message: watermark moved backward: <details>

Description: A watermark advancement was rejected because the new value is older than the current watermark, violating monotonicity.

Common causes:

  • Clock skew in distributed systems
  • Manual watermark manipulation with an incorrect value
  • Bug in watermark tracking logic

Suggested fix: Ensure watermark values are monotonically increasing. Check the current watermark with pgtrickle.get_watermark_groups().


WatermarkGroupNotFound

Message: watermark group not found: <details>

Description: The specified watermark group does not exist.

Common causes:

  • Typo in the watermark group name
  • The group was deleted or never created

Suggested fix: List existing groups with pgtrickle.get_watermark_groups().


WatermarkGroupAlreadyExists

Message: watermark group already exists: <details>

Description: A watermark group with this name already exists.

Common causes:

  • Re-running a setup script without idempotent guards

Suggested fix: Use a different name or delete the existing group first.


Transient Errors

RefreshSkipped

Message: refresh skipped: <details>

Description: A refresh was skipped because a previous refresh for the same stream table is still running.

Common causes:

  • Slow refresh (large delta or complex query) overlapping with the next scheduled cycle
  • Multiple manual refresh_stream_table() calls in parallel

Suggested fix: No action needed — the scheduler will retry on the next cycle. If this happens frequently, increase the schedule interval or investigate why refreshes are slow using pgtrickle.explain_st().

This error does not count toward the consecutive error suspension limit.


Publication Errors

PublicationAlreadyExists

Message: publication already exists for stream table: <name>

Description: pgtrickle.create_publication() was called for a stream table that already has a downstream publication registered.

Common causes:

  • Re-running publication setup without IF NOT EXISTS
  • Concurrent setup in multi-process deployments

Suggested fix: Use pgtrickle.drop_publication() first if you want to recreate it, or check the existing publication with SELECT * FROM pgtrickle.pgt_stream_tables WHERE outbox_enabled = true.


PublicationNotFound

Message: no publication found for stream table: <name>

Description: A publication management call (e.g., drop_publication()) was made for a stream table that does not have a downstream publication.

Common causes:

  • Calling drop_publication() on a stream table that never had one
  • The publication was already dropped

Suggested fix: Check if the stream table has a publication before dropping it. Use pgtrickle.list_publications() to see active publications.


SLA Errors

SlaTooSmall

Message: SLA interval too small for available tiers: <details>

Description: The requested SLA interval is smaller than the fastest available scheduling tier.

Common causes:

  • Specifying a sub-second SLA (e.g., sla_seconds => 0.1) when the minimum schedule is 1 second (pg_trickle.min_schedule_seconds)
  • No available tier can satisfy the requested latency budget

Suggested fix: Lower pg_trickle.min_schedule_seconds if the cluster supports faster scheduling, or set a larger SLA interval. Check available tiers with pgtrickle.list_sla_tiers().


CDC Errors

ChangedColsBitmaskFailed

Message: failed to build changed-columns bitmask: <details>

Description: CDC-1 (v0.24.0): The columnar change tracking system could not build the bitmask expression for changed-column detection. This indicates a table structure that prevents column-level CDC tracking.

Common causes:

  • All columns of the source table are part of the primary key (no non-key columns to track changes for)
  • Very wide tables exceeding the bitmask width limit
  • Schema edge cases with generated or system columns

Suggested fix: Switch to whole-row change tracking with pg_trickle.columnar_cdc = false, or restructure the source table to have at least one non-primary-key column.


PublicationRebuildFailed

Message: publication rebuild failed: <details>

Description: CDC-2 (v0.24.0): The logical replication publication could not be rebuilt for a partitioned source table after a partition attach/detach or schema change.

Common causes:

  • Insufficient privileges to manage publications
  • The publication slot was already dropped
  • Partition schema changed concurrently during the rebuild

Suggested fix: Ensure the pg_trickle background worker role has CREATE PUBLICATION privilege. Check publication status with SELECT * FROM pg_publication. If the issue persists, call pgtrickle.reinitialize_stream_table() to force a clean rebuild.


Diagnostic Errors

DiagnosticError

Message: diagnostic error: <details>

Description: ERR-1 (v0.26.0): An error occurred inside a diagnostic or monitoring function such as explain_refresh_mode(), source_gates(), or watermarks(). These surface as user-visible PostgreSQL errors with context.

Common causes:

  • The stream table was dropped between the diagnostic call and its execution
  • Missing privileges on the internal catalog tables
  • An internal consistency check found unexpected state

Suggested fix: Verify the stream table still exists. Check that the calling role has access to pgtrickle.* catalog tables. If the error message says "stream table not found", the table may have been dropped or its catalog entry is corrupted — use pgtrickle.repair_stream_table().


Snapshot Errors

SnapshotAlreadyExists

Message: snapshot already exists: <name>

Description: SNAP-1 (v0.27.0): A snapshot with the given target name already exists in the snapshot catalog.

Common causes:

  • Re-running snapshot creation without checking for existing snapshots
  • Concurrent snapshot creation with the same name

Suggested fix: Use a unique name for each snapshot, or drop the existing snapshot with pgtrickle.drop_snapshot() before creating a new one.


SnapshotSourceNotFound

Message: snapshot source not found: <name>

Description: SNAP-2 (v0.27.0): The stream table specified as the snapshot source was not found in the catalog.

Common causes:

  • Typo in the stream table name
  • The stream table was dropped before the snapshot was taken

Suggested fix: Verify the stream table name with pgtrickle.list_stream_tables().


SnapshotSchemaVersionMismatch

Message: snapshot schema version mismatch: <details>

Description: SNAP-3 (v0.27.0): The snapshot's schema version does not match the current extension version, indicating the snapshot was taken with a different version of pg_trickle.

Common causes:

  • Upgrading pg_trickle after taking a snapshot
  • Restoring a snapshot from a different version of the extension

Suggested fix: Re-create the snapshot after the upgrade. Old snapshots cannot be used across major version boundaries. See Backup & Restore for migration guidance.


Outbox / pg_tide Errors

OutboxAlreadyEnabled

Message: outbox already attached for stream table: <name>

Description: v0.46.0: pgtrickle.attach_outbox() was called for a stream table that already has an outbox registered via the pg_tide integration.

Common causes:

  • Re-running outbox attachment without checking for existing configuration
  • Concurrent calls to attach_outbox() for the same stream table

Suggested fix: Check for existing outbox configuration with SELECT * FROM pgtrickle.pgt_stream_tables WHERE outbox_enabled = true. Use pgtrickle.detach_outbox() if you need to reconfigure.


OutboxNotEnabled

Message: outbox not attached for stream table: <name>

Description: v0.46.0: An outbox management operation was called on a stream table that does not have a pg_tide outbox attached.

Common causes:

  • Calling detach_outbox() or outbox-related functions on a stream table that never had an outbox configured

Suggested fix: Call pgtrickle.attach_outbox() first, or verify the stream table name.


PgTideMissing

Message: attach_outbox() requires the pg_tide extension. Install it with: CREATE EXTENSION pg_tide;

Description: v0.46.0: The pg_tide extension is not installed in the current database. The outbox/inbox functionality requires pg_tide to be present.

Common causes:

  • Calling pgtrickle.attach_outbox() before installing pg_tide
  • The extension was dropped after outbox configuration

Suggested fix:

CREATE EXTENSION pg_tide;

See pg_tide on GitHub for installation instructions.


Placeholder Errors

UnresolvedPlaceholder

Message: unresolved placeholder '<token>' in SQL for <context>

Description: A41-2: A delta SQL template still contains an unresolved __PGS_*__ or __PGT_*__ placeholder token after all substitution passes have completed. Executing SQL with a raw token would cause an obscure PostgreSQL syntax error; this error is raised early to give a clear, actionable message.

Common causes:

  • A source table OID or stream table ID that is referenced in the query but not present in the current refresh frontier
  • A bug in the delta SQL template generation where a new placeholder type was introduced but not registered in the substitution map
  • An upstream stream table was dropped while the delta SQL was cached

Suggested fix: Reinitialize the affected stream table with pgtrickle.reinitialize_stream_table() to force a fresh template generation. If this persists, please report the issue.


DVM Engine Errors

DiffDepthExceeded

Message: differential query depth exceeded limit of <N> levels; reduce query nesting or raise pg_trickle.max_parse_depth

Description: C-7 (v0.54.0): The diff_node() recursion depth exceeded the configured limit during differential query generation. This prevents stack overflows on pathologically deeply-nested queries.

Common causes:

  • Queries with more than pg_trickle.max_parse_depth levels of nested subqueries, CTEs, or operator trees
  • Highly chained view references that expand into deep nesting

Suggested fix: Simplify the query by reducing nesting depth. If the query is legitimately deep, raise pg_trickle.max_parse_depth:

SET pg_trickle.max_parse_depth = 128;

Alternatively, use refresh_mode => 'FULL' to bypass the differential engine.


DiffCteCountExceeded

Message: differential query CTE count exceeded limit of <N>; simplify the query or raise pg_trickle.max_diff_ctes

Description: R-7 (v0.54.0): The number of CTEs generated during differentiation exceeded the configured limit (pg_trickle.max_diff_ctes). This prevents unbounded memory growth for queries that produce thousands of intermediate CTEs.

Common causes:

  • Multi-source queries with many join paths where each path generates independent delta CTEs
  • Queries with many aggregation levels each requiring separate delta expressions

Suggested fix: Simplify the query or raise the CTE limit:

SET pg_trickle.max_diff_ctes = 500;

Alternatively, use refresh_mode => 'FULL'.


StSourceFrontierMissing

Message: upstream stream table (pgt_id=<id>) not found in refresh frontier; the source stream table may have been dropped — call pgtrickle.reinitialize_stream_table() to recover

Description: C-4 (v0.54.0): A stream-table-to-stream-table source frontier entry is missing from the refresh frontier, indicating the upstream stream table was dropped while a downstream stream table still references it.

Common causes:

  • An upstream stream table was dropped directly (bypassing the dependency check) while a downstream stream table's delta SQL still references it
  • Database restored from backup at a point before the upstream ST was recreated

Suggested fix:

SELECT pgtrickle.reinitialize_stream_table('downstream_stream_table');

If the upstream stream table was intentionally removed, drop and recreate the downstream one with an updated defining query.


Internal Errors

InternalError

Message: internal error: <details>

Description: An unexpected internal error that indicates a bug in pg_trickle.

Common causes:

  • This should not happen in normal operation

Suggested fix: Please report the issue with the full error message, your PostgreSQL version, and pg_trickle version. Include the output of pgtrickle.check_health() and the relevant PostgreSQL log entries.


v0.23.0 — DVM Scaling Errors

change_buffer_overflow Alert

Alert: pg_trickle_alert change_buffer_overflow

Description: A source table's change buffer exceeded the pg_trickle.max_change_buffer_alert_rows threshold during refresh.

Common causes:

  • High write rate on source tables
  • Slow or blocked refresh cycles
  • WAL accumulation during cross-query consistency checks

Suggested fix:

  • Increase pg_trickle.max_change_buffer_alert_rows if the write rate is expected
  • Check for long-running transactions blocking the refresh
  • Consider increasing refresh frequency or using FULL mode for affected tables
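For example, to raise the alert threshold and look for blocking transactions (a sketch; the threshold value is illustrative):

ALTER SYSTEM SET pg_trickle.max_change_buffer_alert_rows = 5000000;
SELECT pg_reload_conf();

-- Long-running transactions that may be blocking refresh cycles
SELECT pid, state, now() - xact_start AS xact_age, query
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
ORDER BY xact_start
LIMIT 5;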

DIFF-Slower-Than-FULL Warning

Warning: [pg_trickle] DIFF refresh for <table> took Xms vs last FULL Yms — DIFF is Nx slower

Description: Emitted when pg_trickle.log_delta_sql = on and a DIFF refresh takes longer than the last recorded FULL refresh.

Common causes:

  • Query complexity exceeds the DVM engine's O(Δ) capacity (see PERFORMANCE_COOKBOOK.md §13)
  • Stale planner statistics on change buffer tables
  • work_mem too low for hash joins in the delta SQL

Suggested fix:

  • Check the delta SQL via pgtrickle.explain_diff_sql('<table>')
  • Increase pg_trickle.delta_work_mem for the affected database
  • Switch to AUTO or FULL mode for queries with known threshold-collapse patterns


Citus Distributed Tables

pg_trickle supports Citus distributed tables as sources for incremental view maintenance and as output targets for stream tables. Once configured, distribution is mostly invisible: you create stream tables exactly as you would on single-node PostgreSQL, and pg_trickle handles per-worker change capture and merging on the coordinator.

Available since v0.32.0 (sources, output targets); the fully automated per-worker scheduler arrived in v0.34.0.

This page is the canonical entry point for Citus support. The long-form reference (worker-slot lifecycle, troubleshooting, and internal architecture) lives at integrations/citus.md.


What you get

  • Distributed sources. Define a stream table whose source is a Citus-distributed table. pg_trickle creates a logical replication slot on each worker, polls all slots from the coordinator via dblink, and merges the changes into the stream table's storage.
  • Distributed output. Pass output_distribution_column to create_stream_table() and the resulting stream table is itself a Citus distributed table, co-located with your source shards.
  • Automated scheduler. Since v0.34, the per-worker slot lifecycle (ensure_worker_slot, poll_worker_slot_changes, lease management) runs automatically — no manual wiring required.
  • Shard-rebalance auto-recovery. Topology changes are detected by comparing pg_dist_node against pgt_worker_slots; stale slots are pruned and new ones inserted without operator intervention.
  • Worker failure isolation. Per-worker poll failures are logged and skipped; healthy workers keep running. After pg_trickle.citus_worker_retry_ticks (default 5) consecutive failures, a WARNING is raised.

Prerequisites

  • PostgreSQL 18 with wal_level = logical on every node (coordinator and workers).
  • Citus 12.x or 13.x on the coordinator and all workers.
  • The dblink extension on the coordinator.
  • pg_trickle installed at the same version on every node.
  • Each source distributed table must have REPLICA IDENTITY FULL.

Quickstart

1. Verify prerequisites

-- Run on coordinator AND each worker:
SHOW wal_level;            -- must be 'logical'
SELECT extname, extversion FROM pg_extension
 WHERE extname IN ('citus', 'pg_trickle', 'dblink');

2. Create extensions on the coordinator

CREATE EXTENSION IF NOT EXISTS dblink;
CREATE EXTENSION IF NOT EXISTS pg_trickle;

3. Prepare a distributed source table

-- Distribute (or co-locate) the source
SELECT create_distributed_table('orders', 'customer_id');

-- Required for logical decoding to capture old values on UPDATE / DELETE
ALTER TABLE orders REPLICA IDENTITY FULL;

4. Create a stream table over distributed sources

SELECT pgtrickle.create_stream_table(
    'order_totals',
    $$SELECT customer_id, SUM(amount) AS total
      FROM orders GROUP BY customer_id$$,
    schedule => '5s'
);

That is it on the user side. pg_trickle:

  1. Detects that orders is distributed.
  2. Creates a per-worker logical replication slot.
  3. Records each slot in pgtrickle.pgt_worker_slots.
  4. Polls every slot on each scheduler tick via dblink.
  5. Merges decoded changes into the coordinator-local change buffer.
  6. Applies the delta to the stream table.

5. (Optional) make the output distributed too

SELECT pgtrickle.create_stream_table(
    'order_totals',
    $$SELECT customer_id, SUM(amount) AS total
      FROM orders GROUP BY customer_id$$,
    schedule                     => '5s',
    output_distribution_column   => 'customer_id'
);

The result table is now itself distributed on customer_id and co-located with the source shards.
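To confirm the distribution and co-location, you can query the standard Citus metadata view (a quick sanity check, not a pg_trickle API):

SELECT table_name, distribution_column, colocation_id
FROM citus_tables
WHERE table_name = 'order_totals'::regclass;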


Observability

Helper                                         Purpose
SELECT * FROM pgtrickle.citus_status;          Per-worker slot summary
SELECT * FROM pgtrickle.pgt_worker_slots;      Raw slot catalogue
SELECT * FROM pgtrickle.check_cdc_health();    WAL slot health (lag, status)
SELECT * FROM pgtrickle.health_check();        Whole-extension triage

Caveats

  • DDL on distributed sources is more involved than on local tables; see the long-form guide.
  • Foreign keys across shards are restricted by Citus, not by pg_trickle.
  • Co-location: if your stream table joins distributed tables, the join columns must be the distribution columns (a Citus requirement).

See also: Long-form Citus reference (worker slots, lifecycle, internals) · CDC Modes · Configuration – pg_trickle.citus_* · CloudNativePG integration

CDC Modes

pg_trickle captures changes from source tables using Change Data Capture (CDC). Two mechanisms are available: row-level triggers and WAL-based logical decoding. Understanding both helps you choose the right setting for your workload.


Quick decision guide

Situation                                                           Recommended mode
Just getting started / unsure                                       auto (default) — triggers now, upgrades to WAL automatically
High-write tables where trigger overhead matters                    auto or wal
wal_level = logical not available (managed PG, read replica)        trigger
You want strict control — no automatic transitions                  trigger or wal
Per-table override (e.g. one hot table on WAL, rest on triggers)    Pass cdc_mode to create_stream_table

How trigger-based CDC works

When you create a stream table, pg_trickle installs three AFTER row-level triggers on every source table:

AFTER INSERT OR UPDATE OR DELETE FOR EACH ROW

Each trigger fires synchronously within the user's transaction and writes one row per changed row to a buffer table (pgtrickle_changes.changes_<oid>). The buffer row is in the same transaction as the user's change — if the transaction rolls back, the buffer row also disappears.

User transaction:
  INSERT INTO orders …
    → trigger fires
    → INSERT INTO pgtrickle_changes.changes_12345 (op, row_data)
  COMMIT
        │
        ▼
  Scheduler picks up buffer rows → computes delta → refreshes stream table

Write-side cost: approximately 2–15 µs per changed row, depending on row width and table size. This is added directly to the user transaction's commit latency.


How WAL-based CDC works

WAL-based CDC uses PostgreSQL's built-in logical decoding to capture changes asynchronously from the write-ahead log, eliminating trigger overhead entirely.

User transaction:
  INSERT INTO orders …
  COMMIT  (no trigger overhead)
        │
        ▼
  WAL written to disk
        │
        ▼
  pg_trickle WAL decoder background worker
  calls pg_logical_slot_get_changes()
        │
        ▼
  Decoded changes written to pgtrickle_changes.changes_<oid>
        │
        ▼
  Scheduler refreshes stream table

The change capture is decoupled from the user transaction. Users see no added latency on commits.

Trade-off: WAL decoding introduces a small additional replication lag (typically < 1 second). Changes committed by the user are visible to the stream table slightly later than with triggers.

Prerequisites for WAL-based CDC

  1. wal_level = logical in postgresql.conf
  2. Sufficient replication slots: max_replication_slots ≥ (number of tracked source tables) + existing slots
  3. Source table has REPLICA IDENTITY DEFAULT (primary key) or REPLICA IDENTITY FULL
  4. PostgreSQL 18.x (required for the pg_trickle extension)
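You can verify all four prerequisites with standard catalog queries:

SHOW server_version;
SHOW wal_level;               -- must be 'logical'
SHOW max_replication_slots;
SELECT relname, relreplident  -- 'd' = default (primary key), 'f' = full
FROM pg_class
WHERE relname = 'orders';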

The auto mode: transparent transition

The default cdc_mode = 'auto' starts with triggers and automatically upgrades to WAL-based CDC when the prerequisites are met.

TRIGGER ──► TRANSITIONING ──► WAL
   ▲                           │
   └───────── (fallback) ──────┘

Transition lifecycle

  1. TRIGGER — pg_trickle installs row-level triggers on the source table.
  2. When wal_level = logical becomes available, pg_trickle starts the transition:
    • Creates a publication (pgtrickle_cdc_<oid>) and replication slot (pgtrickle_<oid>)
    • Sets the source's CDC state to TRANSITIONING
    • Both the trigger and WAL decoder write to the buffer (deduplication happens at refresh)
  3. WAL — once the WAL decoder confirms it has caught up, the trigger is dropped.
  4. Fallback — if the transition times out or errors (e.g. wal_level reverts to replica), the slot and publication are dropped and CDC reverts to triggers.

The transition is transparent — stream tables remain current throughout and there is no window of data loss.


Configuring CDC mode

Global setting

In postgresql.conf:

pg_trickle.cdc_mode = 'auto'     # default: start with triggers, upgrade to WAL
pg_trickle.cdc_mode = 'trigger'  # always use triggers; never create replication slots
pg_trickle.cdc_mode = 'wal'      # require WAL; error if wal_level != logical

Apply without restart:

ALTER SYSTEM SET pg_trickle.cdc_mode = 'auto';
SELECT pg_reload_conf();

Per-stream-table override

Override for a single stream table at creation time:

SELECT pgtrickle.create_stream_table(
    'public.order_totals',
    $$SELECT customer_id, SUM(amount) AS total FROM orders GROUP BY customer_id$$,
    cdc_mode => 'wal'    -- force WAL for this table's sources only
);

Or after the fact:

SELECT pgtrickle.alter_stream_table('public.order_totals', p_cdc_mode => 'trigger');

The per-table override is stored in pgtrickle.pgt_stream_tables.requested_cdc_mode and takes precedence over the global GUC.


Checking the current CDC mode

-- Per-stream-table CDC state for all sources
SELECT source_table, cdc_mode, pending_rows, buffer_bytes
FROM pgtrickle.change_buffer_sizes()
ORDER BY source_table;

-- Full health check including WAL lag
SELECT * FROM pgtrickle.check_cdc_health();

-- Which triggers are installed
SELECT source_table, trigger_type, trigger_name, present, enabled
FROM pgtrickle.trigger_inventory()
ORDER BY source_table;

check_cdc_health() returns one row per source table with:

Column            Description
source_table      Qualified source table name
cdc_mode          Current effective mode: trigger, wal, or transitioning
slot_lag_bytes    WAL slot lag (NULL for trigger mode)
slot_lag_warn     true if lag exceeds publication_lag_warn_bytes
alert             Human-readable status / warning message

Enabling WAL-based CDC

If you start with cdc_mode = 'trigger' and later want to switch to WAL:

Step 1 — Configure PostgreSQL

# postgresql.conf
wal_level = logical
max_replication_slots = 20    # allow enough slots for all tracked sources

Requires a PostgreSQL restart:

pg_ctl restart -D $PGDATA

Step 2 — Set the GUC

ALTER SYSTEM SET pg_trickle.cdc_mode = 'auto';
SELECT pg_reload_conf();

pg_trickle will automatically begin transitioning existing stream tables to WAL-based CDC over the next few scheduler ticks. No manual intervention is needed per stream table.

Step 3 — Monitor the transition

SELECT source_table, cdc_mode FROM pgtrickle.check_cdc_health();

Tables will cycle through trigger → transitioning → wal over the next 1–2 minutes depending on write volume.


Reverting to trigger-based CDC

To revert globally:

ALTER SYSTEM SET pg_trickle.cdc_mode = 'trigger';
SELECT pg_reload_conf();

pg_trickle will drop all CDC replication slots and publications on the next scheduler tick and reinstall row-level triggers. Stream tables remain current throughout — the transition is safe.

To revert a single table:

SELECT pgtrickle.alter_stream_table('public.order_totals', p_cdc_mode => 'trigger');

Trigger mode details

Statement-level vs. row-level triggers

By default, pg_trickle uses row-level AFTER triggers. On high-volume bulk inserts (e.g. INSERT INTO orders SELECT … FROM staging), row-level triggers fire once per row. You can switch to statement-level triggers to reduce overhead at the cost of coarser change capture:

pg_trickle.cdc_trigger_mode = 'statement'   # default: 'row'

Note: cdc_trigger_mode is ignored when WAL-based CDC is active.

REPLICA IDENTITY and triggers

Trigger-based CDC captures the full NEW and OLD row. For DELETE and UPDATE to capture the old row values, the source table needs a primary key or REPLICA IDENTITY FULL. Without a primary key, pg_trickle detects this and may fall back to full refresh for affected stream tables.


WAL mode details

Replication slot naming

Each tracked source table gets its own replication slot:

pgtrickle_<source_table_oid>

And a publication:

pgtrickle_cdc_<source_table_oid>

These are internal to pg_trickle and should not be modified manually.

Slot lag management

If a subscriber (or pg_trickle itself) falls behind, the replication slot holds WAL on disk until it is consumed. This can grow unboundedly if pg_trickle is stopped for an extended period.

pg_trickle monitors slot lag and warns when it exceeds pg_trickle.publication_lag_warn_bytes (default: 64 MB). In auto mode, change-buffer cleanup is paused for lagging slots to prevent data loss.

If a slot grows dangerously large while pg_trickle is down, you can drop and recreate it:

-- 1. Temporarily switch to trigger mode
ALTER SYSTEM SET pg_trickle.cdc_mode = 'trigger';
SELECT pg_reload_conf();

-- 2. Manually drop the stale slot if needed
SELECT pg_drop_replication_slot('pgtrickle_12345');

-- 3. Switch back to auto (pg_trickle recreates the slot)
ALTER SYSTEM SET pg_trickle.cdc_mode = 'auto';
SELECT pg_reload_conf();

Partitioned source tables

WAL-based CDC for partitioned tables uses publish_via_partition_root = true so that child partition changes are published under the parent table name. This matches trigger-mode behaviour and ensures the stream table sees a unified change stream.

If a table is converted to partitioned after CDC is set up, pg_trickle detects the inconsistency on the next health check and rebuilds the publication with the correct setting automatically.


Monitoring slot lag in Prometheus

If you use the Prometheus & Grafana integration, pg_trickle exports per-source slot lag as:

pgtrickle_replication_slot_lag_bytes{slot_name="pgtrickle_12345", source_table="orders"}

Set an alert at 80% of your disk space budget for WAL retention.


Performance comparison

                                                              Trigger                    WAL
Write-side overhead                                           ~2–15 µs per row           Zero (async)
Change latency                                                Sub-millisecond            Up to ~1 second
Prerequisites                                                 None                       wal_level = logical, replication slot
Works on managed PG (e.g. RDS without logical replication)    Yes                        No
Works on physical read replicas                               No                         No
Handles bulk inserts efficiently                              Statement mode optional    Yes (batch decoded)
Replication slot disk usage                                   None                       Yes — grows if consumer lags

Troubleshooting

Trigger CDC: changes not appearing

  1. Verify triggers are installed:
    SELECT * FROM pgtrickle.trigger_inventory() WHERE NOT present OR NOT enabled;
    
  2. If missing, rebuild:
    SELECT pgtrickle.rebuild_cdc_triggers('public.source_table');
    

WAL CDC: slot not advancing

  1. Check slot lag:
    SELECT slot_name, active, pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) AS lag_bytes
    FROM pg_replication_slots WHERE slot_name LIKE 'pgtrickle_%';
    
  2. Check that the pg_trickle background worker is running:
    SELECT * FROM pg_stat_activity WHERE application_name LIKE 'pg_trickle%';
    
  3. Check pg_trickle.cdc_mode is set to 'auto' or 'wal'.

Stuck in TRANSITIONING state

If a source table stays in transitioning for more than a few minutes:

SELECT source_table, cdc_mode FROM pgtrickle.check_cdc_health();

The transition has a timeout (wal_transition_timeout, default: 300 s). After the timeout it falls back to triggers automatically. If it keeps failing:

  1. Check wal_level = logical is still set.
  2. Check max_replication_slots has not been exceeded.
  3. Force revert: ALTER SYSTEM SET pg_trickle.cdc_mode = 'trigger'.


SLA-based Smart Scheduling

Instead of manually tuning refresh schedules, you can tell pg_trickle what your data freshness requirement is and let it figure out the rest.

Set a target — "this stream table must never be more than 10 seconds stale" — and pg_trickle assigns the right scheduling tier, monitors whether it can meet the target based on real refresh history, and alerts you before an SLA breach happens.

set_stream_table_sla: available since v0.22.0. recommend_schedule and predicted_sla_breach alerts: available since v0.27.0.


The problem with manual scheduling

A manually configured schedule like schedule => '5s' works when your source tables are quiet, but it can easily become wrong over time:

  • Source tables grow → refreshes take longer → the 5-second schedule no longer completes in time.
  • A brief write surge hits → a single refresh takes 4× the normal time → the SLA is quietly broken with no warning.
  • You add a complex JOIN → differential refresh cost jumps → you never notice until a user complains about stale data.

SLA-based scheduling solves this by tying the refresh schedule to an observable outcome (data freshness) instead of an assumed refresh duration.


Quickstart

Set an SLA on a stream table

SELECT pgtrickle.set_stream_table_sla('public.order_totals', interval '10 seconds');

This does two things immediately:

  1. Stores 10000 ms as freshness_deadline_ms in the catalog.
  2. Assigns a tier based on the SLA value (see Tier assignment).

pg_trickle will then actively monitor whether each refresh is on track to meet the target, and alert you if it predicts a breach.

Check the current SLA

SELECT pgt_name, freshness_deadline_ms, refresh_tier, staleness
FROM pgtrickle.stream_tables_info
WHERE pgt_name = 'order_totals';

Tier assignment

set_stream_table_sla maps your freshness target to one of three scheduler tiers:

SLA target      Tier assigned    Description
≤ 5 seconds     Hot              Maximum priority; refreshes as fast as the worker pool allows
6–30 seconds    Warm             Standard priority
> 30 seconds    Cold             Background priority; other tables take precedence

You can still override the tier manually after setting an SLA:

-- Force to hot regardless of SLA
SELECT pgtrickle.set_stream_table_tier('public.order_totals', 'hot');

Schedule recommendations

Once a stream table has accumulated enough refresh history, pg_trickle can recommend an optimal schedule based on observed refresh durations using a median+MAD (Median Absolute Deviation) statistical model.

Single table recommendation

SELECT pgtrickle.recommend_schedule('public.order_totals');

Returns JSONB:

{
  "recommended_interval_seconds": 3.8,
  "current_interval_seconds": 5.0,
  "delta_pct": -24.0,
  "peak_window_cron": null,
  "confidence": 0.87,
  "reasoning": "median=1247ms mad=183ms p95_estimate=1796ms recommended=2.7s confidence=0.87"
}
Field                           Meaning
recommended_interval_seconds    Suggested new schedule, with 1.5× headroom over the p95 refresh duration
current_interval_seconds        Current configured schedule
delta_pct                       How much the recommendation differs from the current schedule (negative = speed up)
confidence                      0.0–1.0; reflects how consistent refresh times are; 0.0 means insufficient history
reasoning                       Human-readable explanation of how the recommendation was computed
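You can reproduce the example's arithmetic by hand (assuming p95 is estimated as median + 3 × MAD, which matches the reasoning string above, followed by the 1.5× headroom):

SELECT
    1247 + 3 * 183         AS p95_estimate_ms,   -- 1796 ms, as reported
    (1247 + 3 * 183) * 1.5 AS recommended_ms;    -- 2694 ms ≈ 2.7 s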

All tables at once

SELECT name, current_interval_seconds, recommended_interval_seconds, delta_pct, confidence
FROM pgtrickle.schedule_recommendations()
ORDER BY ABS(delta_pct) DESC;

This is particularly useful for a periodic review of your entire deployment. Sort by delta_pct DESC to find tables whose schedule is too aggressive (the recommendation is longer, so slowing down saves unnecessary CPU), or by delta_pct ASC to find tables whose schedule is too relaxed (refreshes take too long to stay within the SLA).

Minimum sample threshold

The planner requires at least pg_trickle.schedule_recommendation_min_samples completed refreshes (default: 20) before computing a non-zero confidence score. Until then, confidence = 0.0 and the recommendation reflects the last known full refresh duration. You can lower this during initial setup:

ALTER SYSTEM SET pg_trickle.schedule_recommendation_min_samples = 10;
SELECT pg_reload_conf();

Predictive SLA breach alerts

After every refresh, the scheduler checks whether the predicted next refresh duration will exceed the stream table's freshness_deadline_ms by more than 20%. If so, a predicted_sla_breach alert is emitted via LISTEN/NOTIFY on the pg_trickle_alert channel.

This gives you advance warning before the breach happens — not after.

Listening for alerts

LISTEN pg_trickle_alert;

A breach alert payload looks like:

{
  "event": "predicted_sla_breach",
  "stream_table": "public.order_totals",
  "predicted_duration_ms": 12800,
  "deadline_ms": 10000,
  "overage_pct": 28.0,
  "timestamp": "2025-04-23T14:32:00Z"
}

Debouncing

To avoid flooding your alerting system during a temporary spike, alerts are debounced per stream table:

pg_trickle.schedule_alert_cooldown_seconds = 300   # 5 minutes (default)

Only one predicted_sla_breach alert fires per stream table per cooldown window, even if every refresh during that window predicts a breach.

Bridging alerts to external systems

See Monitoring & Alerting for examples of routing pg_trickle_alert notifications to PagerDuty, Slack, Prometheus alertmanager, and other systems.


Workflow: setting up SLA-based scheduling from scratch

1. Create the stream table with a rough initial schedule

SELECT pgtrickle.create_stream_table(
    'public.order_totals',
    $$SELECT customer_id, SUM(amount) AS total FROM orders GROUP BY customer_id$$,
    schedule => '5s'
);

2. Let it run for a while to build history

Wait for at least 20 refreshes (typically a minute or two with a 5-second schedule):

SELECT COUNT(*) FROM pgtrickle.pgt_refresh_history
WHERE pgt_id = (SELECT pgt_id FROM pgtrickle.pgt_stream_tables WHERE pgt_name = 'order_totals')
  AND status = 'COMPLETED';

3. Set an SLA

SELECT pgtrickle.set_stream_table_sla('public.order_totals', interval '8 seconds');

4. Get a data-driven recommendation

SELECT pgtrickle.recommend_schedule('public.order_totals');

5. Apply the recommendation

SELECT pgtrickle.alter_stream_table(
    'public.order_totals',
    p_schedule => '3s'   -- use the recommended value
);

6. Monitor for predicted breaches

LISTEN pg_trickle_alert;

Or query the alert history:

SELECT event_type, stream_table, payload, created_at
FROM pgtrickle.pgt_alert_history
WHERE event_type = 'predicted_sla_breach'
ORDER BY created_at DESC
LIMIT 10;

Checking current SLA status across all tables

SELECT
    pgt_name,
    freshness_deadline_ms,
    staleness,
    CASE WHEN staleness > (freshness_deadline_ms || ' milliseconds')::interval
         THEN 'BREACHED' ELSE 'OK' END AS sla_status
FROM pgtrickle.stream_tables_info
WHERE freshness_deadline_ms IS NOT NULL
ORDER BY sla_status DESC, staleness DESC;

Removing an SLA

To remove an SLA target without changing the schedule:

UPDATE pgtrickle.pgt_stream_tables
SET freshness_deadline_ms = NULL
WHERE pgt_name = 'order_totals';

No predictive breach alerts will fire after this.


When recommendations have low confidence

A low confidence score (< 0.5) means refresh durations are highly variable. Common causes:

Cause                                     Fix
Not enough history                        Wait for more refreshes, or lower schedule_recommendation_min_samples
Highly variable write load                Widen the prediction window; consider a cron schedule for peak hours
Source table growing rapidly              The current schedule may already be too slow; reduce it manually
Mix of FULL and DIFFERENTIAL refreshes    Check that the differential threshold is tuned correctly


Downstream Publications

pg_trickle can expose the live content of any stream table as a PostgreSQL logical replication publication. This lets any tool that understands PostgreSQL logical replication — Debezium, Kafka Connect, Spark Structured Streaming, a read replica, a custom consumer — subscribe to stream table changes in real time, without needing to poll the table or set up a separate CDC pipeline.

Available since v0.22.0


Why use downstream publications?

Stream tables are already the result of incremental view maintenance — every refresh produces a well-defined diff of inserted and deleted rows. Exposing that diff via logical replication means external systems get exactly the same granular change events that pg_trickle computes internally, without extra work.

Use case                                  Tool
Push stream table changes to Kafka        Debezium, Kafka Connect
Replicate to a read replica or standby    PostgreSQL physical/logical replica
Build event-driven microservices          Any logical replication consumer
Feed a data warehouse incrementally       Spark, Flink, Airbyte
Archive change history                    Custom WAL consumer

How it works

When you call stream_table_to_publication, pg_trickle creates a standard PostgreSQL publication named pgt_pub_<stream_table_name> that covers the stream table's underlying storage table.

Stream table refresh (MERGE)
        │
        ▼
  Rows inserted / deleted in stream table storage
        │
        ▼
  PostgreSQL logical replication
        │
        ▼
  Subscribers receive INSERT / DELETE events
  (standard pgoutput protocol)

The publication is named pgt_pub_<stream_table_name> and is owned by the same role that created the stream table.


Quickstart

Step 1 — Verify PostgreSQL is configured

Logical replication requires wal_level = logical in postgresql.conf:

SHOW wal_level;
-- Should return: logical

If it returns replica or minimal, update postgresql.conf:

wal_level = logical

Then restart PostgreSQL. You also need enough replication slots:

max_replication_slots = 10   # at least 1 per subscriber

Step 2 — Create the publication

SELECT pgtrickle.stream_table_to_publication('public.order_totals');
-- INFO: pg_trickle: created publication 'pgt_pub_order_totals' for stream table 'public.order_totals'

This creates the publication immediately. Any subscriber can connect right away.

Step 3 — Create a subscriber

PostgreSQL logical replication subscriber

-- On a downstream PostgreSQL instance:
CREATE SUBSCRIPTION order_totals_sub
    CONNECTION 'host=primary port=5432 dbname=mydb user=replicator password=secret'
    PUBLICATION pgt_pub_order_totals;

Debezium (via Kafka Connect)

{
  "name": "order-totals-connector",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "database.hostname": "primary",
    "database.port": "5432",
    "database.user": "replicator",
    "database.password": "secret",
    "database.dbname": "mydb",
    "publication.name": "pgt_pub_order_totals",
    "table.include.list": "public.order_totals",
    "plugin.name": "pgoutput"
  }
}

Kafka Connect (without Debezium)

The plain JDBC source connector does not consume logical replication; it polls the stream table over JDBC, so no publication name is involved. A minimal sketch (the connection URL, mode, and topic prefix are illustrative):

{
  "name": "order-totals-source",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
    "connection.url": "jdbc:postgresql://primary:5432/mydb?user=replicator&password=secret",
    "table.whitelist": "order_totals",
    "mode": "bulk",
    "topic.prefix": "pgtrickle-"
  }
}

Checking whether a publication exists

-- Via pg_trickle catalog
SELECT pgt_name, downstream_publication_name
FROM pgtrickle.pgt_stream_tables
WHERE pgt_name = 'order_totals';

-- Via PostgreSQL catalog
SELECT pubname, puballtables, pubinsert, pubupdate, pubdelete
FROM pg_publication
WHERE pubname = 'pgt_pub_order_totals';

Monitoring subscriber lag

Slow or stalled subscribers can cause the WAL to grow unboundedly. Monitor replication slot lag:

SELECT slot_name, database, active, pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS lag
FROM pg_replication_slots
WHERE slot_name LIKE 'pgt_%'
ORDER BY restart_lsn;

pg_trickle also watches subscriber lag automatically via pg_trickle.publication_lag_warn_bytes (v0.25.0). When a slot exceeds the configured byte lag:

  1. A warning is logged.
  2. Change-buffer cleanup is paused for that slot until it catches up — preventing data loss for slow consumers.

Configure the threshold:

pg_trickle.publication_lag_warn_bytes = 67108864   # 64 MB

Removing a publication

SELECT pgtrickle.drop_stream_table_publication('public.order_totals');

Publications are also automatically dropped when the stream table is dropped:

SELECT pgtrickle.drop_stream_table('public.order_totals');
-- Also drops pgt_pub_order_totals

Multiple subscribers on the same publication

A single publication can support multiple subscribers (e.g. both Debezium and a PostgreSQL logical replica). Each subscriber gets its own replication slot and offset — they progress independently.

-- One publication, multiple consumers:
-- Consumer 1: Debezium → Kafka
-- Consumer 2: PostgreSQL read replica
-- Consumer 3: Spark Structured Streaming

SELECT pgtrickle.stream_table_to_publication('public.order_totals');
-- All three consumers can subscribe to pgt_pub_order_totals

Partitioned stream tables

If your stream table is backed by a partitioned source, pg_trickle automatically sets publish_via_partition_root = true on the publication so that child partition changes are published under the parent table's identity. This matches the behaviour of trigger-based CDC and ensures subscribers see a consistent stream regardless of partitioning scheme.
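You can confirm the setting on the publication itself; pubviaroot is the pg_publication column behind publish_via_partition_root:

SELECT pubname, pubviaroot
FROM pg_publication
WHERE pubname = 'pgt_pub_order_totals';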


Permissions

The role consuming the publication needs the REPLICATION attribute (or superuser):

CREATE ROLE replicator WITH REPLICATION LOGIN PASSWORD 'secret';

For Debezium and Kafka Connect, grant SELECT on the stream table too:

GRANT SELECT ON public.order_totals TO replicator;

Limitations

  • Only one publication per stream table. Calling stream_table_to_publication twice returns an error. Use a single publication with multiple subscribers instead.
  • wal_level = logical is required. This is not the default in all managed PostgreSQL providers — check your provider's documentation.
  • Subscribers must be able to handle INSERT and DELETE events (stream tables do not use UPDATE — every change is expressed as a delete + insert pair in the logical replication stream).

Relationship to WAL-based CDC

Downstream publications are a separate feature from pg_trickle's own WAL-based CDC mode. pg_trickle uses WAL internally (when cdc_mode = 'wal') to capture source table changes — the downstream publication feature exposes the output (stream table) to external consumers.

See CDC Modes for an explanation of how pg_trickle captures changes from source tables.



Transactional Outbox

The transactional outbox pattern solves the dual-write problem: how to atomically update your database and publish an event to an external system without risking inconsistency if one side fails.

pg_trickle's outbox implementation builds on top of stream tables. Every time a stream table refresh produces a non-empty delta, a summary row is written to an outbox table in the same transaction as the MERGE. Consumers are notified via pg_notify the moment the commit lands.

Available since v0.28.0


How it works

Source tables (INSERT / UPDATE / DELETE)
        │
        ▼
  CDC trigger fires → pgtrickle_changes buffer
        │
        ▼
  Stream table refresh (MERGE)
        │   ← same transaction ─────────────────────────────┐
        ▼                                                    │
  Delta rows applied to stream table               outbox row written
  (inserted_count / deleted_count recorded)        to pgtrickle.outbox_<st>
                                                             │
                                                    pg_notify fired
                                                             │
                                                    Consumer polls / listens

The outbox row is guaranteed to exist if and only if the stream table was updated. There is no window where the stream table changes but no outbox row exists, or an outbox row exists but the stream table did not change.

Inline vs. claim-check mode

Condition                                                    Mode           What the consumer receives
delta_rows ≤ outbox_inline_threshold_rows (default: 1000)    Inline         Full delta serialized as JSONB in payload
delta_rows > outbox_inline_threshold_rows                    Claim-check    is_claim_check = true, payload is NULL; delta rows in pgtrickle.outbox_delta_rows_<st>

Inline mode is simpler — the consumer reads one row and gets everything. Claim-check mode avoids storing very large payloads in the outbox table, at the cost of an extra query to fetch the delta rows.
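If your deltas are routinely large, you can lower the threshold so more of them take the claim-check path (the value here is illustrative; the GUC is the one from the table above):

ALTER SYSTEM SET pg_trickle.outbox_inline_threshold_rows = 500;
SELECT pg_reload_conf();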


Quickstart

1. Create a stream table

SELECT pgtrickle.create_stream_table(
    'public.order_totals',
    $$SELECT customer_id, SUM(amount) AS total
      FROM orders
      GROUP BY customer_id$$
);

2. Enable the outbox

SELECT pgtrickle.enable_outbox('public.order_totals');

This creates:

  • pgtrickle.outbox_order_totals — outbox header table
  • pgtrickle.outbox_delta_rows_order_totals — claim-check delta rows
  • pgtrickle.pgt_outbox_latest_order_totals — convenience view pointing to the most recent outbox row

3. Create consumer groups

Each independent consumer needs its own group. Groups track their own offset into the outbox table so they never interfere with each other.

SELECT pgtrickle.create_consumer_group(
    'shipping_service',
    'public.order_totals'
);

SELECT pgtrickle.create_consumer_group(
    'analytics_pipeline',
    'public.order_totals'
);

4. Poll for messages

A consumer loop looks like this:

-- Claim up to 50 unprocessed rows, hold the lease for 30 seconds
SELECT * FROM pgtrickle.poll_outbox(
    'public.order_totals',
    'shipping_service',
    batch_size    => 50,
    lease_seconds => 30
);

poll_outbox returns outbox rows that this consumer has not yet committed. Each row is leased — no other worker sharing the same consumer group can claim it until the lease expires.

5. Process and commit

After successfully processing each batch:

SELECT pgtrickle.commit_offset('shipping_service', 'public.order_totals', last_id);

last_id is the highest id value from the batch you just processed. Committed rows are never returned by poll_outbox again.
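Putting the two calls together, a single worker iteration looks roughly like this (a sketch; it assumes poll_outbox exposes the outbox id column, and real processing happens in your application between the two calls):

-- Claim a batch and remember the highest id
SELECT max(id) AS last_id
FROM pgtrickle.poll_outbox(
    'public.order_totals', 'shipping_service',
    batch_size => 50, lease_seconds => 30
);

-- ... process the batch in your application ...

-- Acknowledge it, passing the last_id value observed above
SELECT pgtrickle.commit_offset('shipping_service', 'public.order_totals', 1234);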


Reading the payload

Inline mode

SELECT
    id,
    created_at,
    inserted_count,
    deleted_count,
    payload -> 'inserted' AS inserted_rows,
    payload -> 'deleted'  AS deleted_rows
FROM pgtrickle.outbox_order_totals
ORDER BY id DESC
LIMIT 5;

Claim-check mode

-- Get the outbox row
SELECT id, is_claim_check FROM pgtrickle.pgt_outbox_latest_order_totals;

-- Fetch the actual delta rows for a claim-check outbox row
SELECT row_op, row_data
FROM pgtrickle.outbox_delta_rows_order_totals
WHERE outbox_id = <outbox_id>
ORDER BY row_num;

Multiple workers (parallel consumption)

Multiple workers in the same consumer group share the workload. pg_trickle assigns non-overlapping leases, so each row is processed by exactly one worker at a time.

-- Worker 1
SELECT * FROM pgtrickle.poll_outbox('public.order_totals', 'shipping_service');

-- Worker 2 (concurrent, gets a different batch)
SELECT * FROM pgtrickle.poll_outbox('public.order_totals', 'shipping_service');

Workers should register their presence so the system can detect dead workers:

-- Call periodically (e.g. every 30 s) while the worker is alive
SELECT pgtrickle.consumer_heartbeat('shipping_service', 'worker-1');

Workers that miss their heartbeat deadline are removed from the consumer group. Any leases held by a dead worker expire automatically after lease_seconds, returning those rows to the available pool.


Lease management

Extending a lease

If processing is taking longer than expected:

SELECT pgtrickle.extend_lease(
    'shipping_service',
    'public.order_totals',
    outbox_id     => 42,
    extra_seconds => 60
);

Seeking to a specific position

For replay or recovery scenarios:

-- Replay from the beginning
SELECT pgtrickle.seek_offset('shipping_service', 'public.order_totals', 0);

-- Skip ahead to the current tip
SELECT pgtrickle.seek_offset(
    'shipping_service', 'public.order_totals',
    (SELECT MAX(id) FROM pgtrickle.outbox_order_totals)
);

Monitoring

Check outbox health

SELECT pgtrickle.outbox_status('public.order_totals');

Returns JSONB:

{
  "enabled": true,
  "stream_table": "public.order_totals",
  "outbox_table": "pgtrickle.outbox_order_totals",
  "row_count": 1247,
  "oldest_row": "2025-04-20T10:00:00Z",
  "newest_row": "2025-04-23T14:32:00Z",
  "retention_hours": 24
}

Consumer lag

-- Per consumer group
SELECT pgtrickle.consumer_lag('shipping_service', 'public.order_totals');

Returns the number of outbox rows that the consumer group has not yet committed. A large or growing lag means the consumer is falling behind.

Global outbox overview

SELECT * FROM pgtrickle.pgt_outbox_config;

Catalog tables

Table                               Contents
pgtrickle.pgt_outbox_config         One row per enabled outbox: ST OID, outbox table name, retention hours
pgtrickle.pgt_consumer_groups       One row per consumer group: name, stream table, created_at
pgtrickle.pgt_consumer_offsets      Per-group committed offsets and lease state
pgtrickle.outbox_<st>               Outbox header rows (auto-created per stream table)
pgtrickle.outbox_delta_rows_<st>    Claim-check delta rows (auto-created per stream table)

Retention and cleanup

Outbox rows are automatically deleted after outbox_retention_hours (default: 24). Claim-check delta rows are removed when commit_offset is called or when the retention period expires.

Configure retention per stream table at enable time:

SELECT pgtrickle.enable_outbox('public.order_totals', p_retention_hours => 48);

Or globally in postgresql.conf:

pg_trickle.outbox_retention_hours = 48

Disabling the outbox

SELECT pgtrickle.disable_outbox('public.order_totals');

This drops the outbox table, delta-rows table, and latest view, and removes the catalog entry. Consumer groups must be dropped separately:

SELECT pgtrickle.drop_consumer_group('shipping_service', 'public.order_totals');

GUC                                         Recommended value    Notes
pg_trickle.outbox_enabled                   on                   Must be on for the outbox background worker to run
pg_trickle.outbox_retention_hours           24–72                Balance storage cost vs. replay window
pg_trickle.outbox_drain_batch_size          500–2000             Larger batches improve throughput
pg_trickle.outbox_inline_threshold_rows     500–2000             Tune based on typical delta size
pg_trickle.outbox_skip_empty_delta          on                   Skip writing outbox rows when the delta is empty
pg_trickle.consumer_cleanup_enabled         on                   Auto-remove dead consumer workers
pg_trickle.consumer_dead_threshold_hours    1                    Mark a worker dead after 1 h of silence

Anti-patterns

Do not poll without committing. If your consumer processes messages but never calls commit_offset, the lag grows unboundedly and messages are replayed forever after a worker restart.

Do not use a single consumer group for independent services. Each service that needs to process outbox events independently must have its own consumer group. Within a group, each row is delivered to exactly one worker, so two services sharing a group would each see only part of the stream.

Do not delete outbox rows manually. Let the retention mechanism handle cleanup. Manual deletes can cause consumer group offsets to point to non-existent rows.

Do not enable the outbox on IMMEDIATE-mode stream tables. The outbox requires DIFFERENTIAL or FULL refresh mode to detect which rows changed.



Transactional Inbox

The transactional inbox pattern solves the duplicate-processing problem: when messages arrive from an external system, your service needs a guarantee that each message is processed exactly once, even if the message broker delivers it more than once or your service restarts mid-batch.

pg_trickle's inbox works by writing incoming messages to a PostgreSQL table and using stream tables to present live views of pending work, dead letters, and per-type statistics. Because the inbox table is ordinary PostgreSQL, your application's processing step and the "mark as processed" step can be wrapped in a single transaction — making the entire operation atomic.

Available since v0.28.0


How it works

External system (Kafka / NATS / webhook / custom consumer)
        │
        ▼
  INSERT into pgtrickle.<inbox_name>
  (idempotent: ON CONFLICT DO NOTHING on event_id)
        │
        ▼
  Stream tables refresh automatically:
  ├─ <inbox_name>_pending   ← WHERE processed_at IS NULL AND retry_count < max_retries
  ├─ <inbox_name>_dlq       ← WHERE processed_at IS NULL AND retry_count >= max_retries
  └─ <inbox_name>_stats     ← GROUP BY event_type (counts)
        │
        ▼
  Your application queries <inbox_name>_pending,
  processes each message, then:
  UPDATE <inbox_name> SET processed_at = now() WHERE event_id = $1

The stream tables are differential: when a row's processed_at is set, the change propagates to _pending and _stats in the next refresh cycle (typically within 1 second).


Quickstart

1. Create an inbox

SELECT pgtrickle.create_inbox('order_events');

This creates:

  • pgtrickle.order_events — the inbox table (one row per message)
  • pgtrickle.order_events_pending — stream table: unprocessed messages
  • pgtrickle.order_events_dlq — stream table: messages that exhausted retries
  • pgtrickle.order_events_stats — stream table: per-event-type counts

2. Write messages (sender side)

The inbox table has a standard schema:

Column          Type           Description
event_id        TEXT PK        Globally unique message ID (idempotency key)
event_type      TEXT           Message type / topic (e.g. order.placed)
source          TEXT           Originating system or service
aggregate_id    TEXT           Business entity ID (e.g. order ID)
payload         JSONB          Message body
received_at     TIMESTAMPTZ    Set to now() on insert
processed_at    TIMESTAMPTZ    Set by your application after processing
error           TEXT           Last error message, if any
retry_count     INT            Number of failed attempts
trace_id        TEXT           Distributed trace ID for observability

Write messages with conflict protection to guarantee idempotency:

INSERT INTO pgtrickle.order_events
    (event_id, event_type, source, aggregate_id, payload)
VALUES
    ('evt-001', 'order.placed', 'shop-api', 'ORD-123', '{"amount": 49.99}')
ON CONFLICT (event_id) DO NOTHING;

3. Process messages (receiver side)

-- Read pending messages
SELECT event_id, event_type, aggregate_id, payload
FROM pgtrickle.order_events_pending
LIMIT 100;

Process each message in a transaction:

BEGIN;

-- Do your business logic here
-- (e.g. publish to downstream service, update application tables)

-- Mark as processed atomically with your business logic
UPDATE pgtrickle.order_events
SET processed_at = now()
WHERE event_id = 'evt-001';

COMMIT;

If the transaction rolls back, processed_at stays NULL and the message remains in _pending for retry.
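The failure path is the mirror image: record the error and bump retry_count so the _pending / _dlq predicates shown under How it works can act on it. A sketch using the standard columns:

UPDATE pgtrickle.order_events
SET retry_count = retry_count + 1,
    error       = 'downstream service returned 503'  -- your real error message
WHERE event_id = 'evt-001';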


Using an existing table (bring-your-own-table)

If you already have a messages table, point pg_trickle at it instead of creating a new one:

SELECT pgtrickle.enable_inbox_tracking(
    'my_inbox',                          -- logical name
    'app.incoming_events',               -- your existing table
    p_id_column          => 'msg_id',
    p_processed_at_column => 'done_at',
    p_event_type_column  => 'type'
);

pg_trickle validates that the required columns exist, then creates the standard stream tables on top of your table. The underlying table is not modified.


Ordering guarantees (per-aggregate)

By default, multiple workers can process messages for the same aggregate_id concurrently. If your business logic requires strictly sequential processing per aggregate (e.g. events for the same order must be handled in order), enable ordering:

SELECT pgtrickle.enable_inbox_ordering(
    'order_events',
    p_aggregate_column => 'aggregate_id',
    p_sequence_column  => 'received_at'
);

This creates a fourth stream table:

  • pgtrickle.next_order_events — one row per aggregate_id, always the next unprocessed message for that aggregate (DISTINCT ON semantics)

Workers that need ordered processing should query next_order_events instead of order_events_pending:

-- Only the next message per aggregate — safe for parallel workers
SELECT event_id, event_type, aggregate_id, payload
FROM pgtrickle.next_order_events
LIMIT 50;

A worker processing aggregate_id = 'ORD-123' blocks any other message for that order until it commits. Different aggregates are processed in parallel.

Checking for ordering gaps

-- Returns aggregate IDs where messages are out of sequence or missing
SELECT * FROM pgtrickle.inbox_ordering_gaps('order_events');

Priority processing

If some message types should be processed before others, enable priority scheduling:

SELECT pgtrickle.enable_inbox_priority(
    'order_events',
    p_priority_column => 'event_type',
    p_priority_map    => '{"order.cancelled": 1, "order.placed": 2, "order.shipped": 3}'::jsonb
);

Lower priority values are processed first. Messages without an entry in the priority map default to priority 999 (processed last).


Multi-worker partitioning

When many workers process the same inbox concurrently, you can partition the workload by aggregate ID using consistent hashing:

-- Worker 0 of 4: only process messages assigned to partition 0
SELECT event_id, aggregate_id, payload
FROM pgtrickle.order_events_pending
WHERE pgtrickle.inbox_is_my_partition('order_events', aggregate_id, 0, 4);

-- Worker 1 of 4
SELECT event_id, aggregate_id, payload
FROM pgtrickle.order_events_pending
WHERE pgtrickle.inbox_is_my_partition('order_events', aggregate_id, 1, 4);

The hash function is deterministic — the same aggregate_id always maps to the same partition — so you can scale the worker pool without rebalancing.


Dead-letter queue

Messages that exceed max_retries (default: 3) are automatically visible in the DLQ stream table:

-- View dead letters
SELECT event_id, event_type, aggregate_id, error, retry_count
FROM pgtrickle.order_events_dlq
ORDER BY received_at;

Replaying DLQ messages

After fixing the root cause:

-- Reset retry count so the message is picked up again
SELECT pgtrickle.replay_inbox_messages(
    'order_events',
    p_event_ids => ARRAY['evt-001', 'evt-002']
);

-- Or replay all DLQ messages of a specific type
SELECT pgtrickle.replay_inbox_messages(
    'order_events',
    p_event_type => 'order.placed'
);

Monitoring

Health check

SELECT pgtrickle.inbox_health('order_events');

Returns a JSONB object:

{
  "inbox": "order_events",
  "pending_count": 42,
  "dlq_count": 3,
  "oldest_pending_age_seconds": 12,
  "throughput_per_minute": 180,
  "status": "healthy"
}

A status of "degraded" means the DLQ count or pending age is above configured thresholds.

Detailed status

SELECT pgtrickle.inbox_status('order_events');

Returns richer JSONB including processing rates, error breakdown, and stream table refresh counts.

Global inbox overview

SELECT * FROM pgtrickle.pgt_inbox_config;

Catalog tables

Table                                  Contents
pgtrickle.pgt_inbox_config             One row per inbox: name, schema, max_retries, schedule
pgtrickle.pgt_inbox_ordering_config    Ordering settings per inbox
pgtrickle.pgt_inbox_priority_config    Priority map per inbox
pgtrickle.<name>                       The inbox message table (auto-created)
pgtrickle.<name>_pending               Stream table: unprocessed messages
pgtrickle.<name>_dlq                   Stream table: dead letters
pgtrickle.<name>_stats                 Stream table: per-event-type counts
pgtrickle.next_<name>                  Stream table: next message per aggregate (ordering only)

Retention and cleanup

Processed messages are automatically deleted after inbox_processed_retention_hours (default: 72). DLQ rows are held for inbox_dlq_retention_hours (default: 168 = 7 days) to give operators time to inspect and replay them.

Configure globally in postgresql.conf:

pg_trickle.inbox_processed_retention_hours = 72
pg_trickle.inbox_dlq_retention_hours = 168

Dropping an inbox

-- Drop the inbox and its stream tables, but keep the underlying table
SELECT pgtrickle.drop_inbox('order_events');

-- Drop everything including the backing table
SELECT pgtrickle.drop_inbox('order_events', p_cascade => true);

GUC                                           Recommended value    Notes
pg_trickle.inbox_enabled                      on                   Must be on for inbox background workers to run
pg_trickle.inbox_processed_retention_hours    24–72                Adjust based on audit requirements
pg_trickle.inbox_dlq_retention_hours          168                  Keep DLQ items for at least 7 days
pg_trickle.inbox_drain_batch_size             500–2000             Tune for throughput vs. latency
pg_trickle.inbox_dlq_alert_max_per_refresh    100                  Alert when the DLQ grows rapidly

Anti-patterns

Do not mark messages as processed outside a transaction with your business logic. The atomic combination of "do work + mark processed" is what prevents duplicate processing. If you process first and then mark processed in a separate transaction, a crash between the two steps causes duplicate processing.

Do not share a single inbox across unrelated services. Each service should have its own inbox so they can fail, replay, and scale independently.

Do not ignore the DLQ. A growing DLQ is a signal that something is consistently broken. Set up an alert on inbox_dlq_alert_max_per_refresh and review DLQ items regularly.

Do not delete inbox rows manually. Let the retention mechanism handle cleanup. Manual deletes can confuse the stream table refresh cycle.



What Happens When You INSERT a Row?

This tutorial traces the complete lifecycle of a single INSERT statement on a base table that is referenced by a stream table — from the moment the row is written to the moment the stream table reflects the change.

Setup: A Real-World Example

Suppose you run an e-commerce platform. You have an orders table and a stream table that maintains a running total per customer:

-- Base table
CREATE TABLE orders (
    id    SERIAL PRIMARY KEY,
    customer TEXT NOT NULL,
    amount   NUMERIC(10,2) NOT NULL
);

-- Stream table: always-fresh customer totals
SELECT pgtrickle.create_stream_table(
    name     => 'customer_totals',
    query    => $$
      SELECT customer, SUM(amount) AS total, COUNT(*) AS order_count
      FROM orders GROUP BY customer
    $$,
    schedule => '1m'  -- refresh when data is staler than 1 minute
    -- refresh_mode defaults to 'AUTO' (differential with full-refresh fallback)
);

After creation, customer_totals is a real PostgreSQL table:

SELECT * FROM customer_totals;
-- (empty — no orders yet)

Phase 1: The INSERT

A new order arrives:

INSERT INTO orders (customer, amount) VALUES ('alice', 49.99);

What happens inside PostgreSQL

When create_stream_table() was called, pg_trickle installed an AFTER INSERT OR UPDATE OR DELETE trigger on the orders table. This trigger fires automatically — the user's INSERT statement triggers it transparently.

The trigger function (pgtrickle_changes.pg_trickle_cdc_fn_<oid>()) executes inside the same transaction as the INSERT and writes a single row into the change buffer table:

pgtrickle_changes.changes_16384    (where 16384 = orders table OID)
┌───────────┬─────────────┬────────┬─────────┬──────────┬──────────┬────────────┐
│ change_id │ lsn         │ action │ pk_hash  │ new_id   │ new_cust │ new_amount │
├───────────┼─────────────┼────────┼─────────┼──────────┼──────────┼────────────┤
│ 1         │ 0/1A3F2B80  │ I      │ -837291 │ 1        │ alice    │ 49.99      │
└───────────┴─────────────┴────────┴─────────┴──────────┴──────────┴────────────┘

Key details:

  • lsn: The current WAL Log Sequence Number (pg_current_wal_lsn()), used to bound which changes belong to which refresh cycle.
  • action: 'I' for INSERT, 'U' for UPDATE, 'D' for DELETE.
  • pk_hash: A pre-computed hash of the primary key (orders.id), used later for efficient row matching.
  • new_* columns: The actual column values from NEW, stored as native PostgreSQL types (not JSONB). There are no old_* values for INSERTs.

Beyond this single INSERT into the buffer table, the trigger adds no overhead to the user's transaction commit. There is no JSONB serialization, no logical replication slot, and no external process involved.
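You can inspect the buffer directly while the change is pending. The table name assumes the OID from the example above (it will differ on your system), and the column names follow the orders columns:

SELECT change_id, lsn, action, pk_hash, new_id, new_customer, new_amount
FROM pgtrickle_changes.changes_16384
ORDER BY change_id;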

Phase 2: The Scheduler Wakes Up

A background worker called the scheduler runs inside PostgreSQL (registered via shared_preload_libraries). It wakes up every pg_trickle.scheduler_interval_ms milliseconds (default: 1000ms) and performs a tick:

  1. Rebuild the DAG (if any stream tables were created/dropped since last tick) — a dependency graph of all stream tables and their source tables.
  2. Topological sort — determine the refresh order so that stream tables depending on other stream tables are refreshed after their dependencies.
  3. For each stream table, check: has its staleness exceeded its schedule?

For customer_totals with a '1m' schedule, the scheduler compares:

  • now() minus data_timestamp (the freshness watermark from the last refresh)
  • Against the schedule: 60 seconds

If more than 60 seconds have elapsed and the stream table isn't already being refreshed, the scheduler begins a refresh.
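Both knobs described above live in postgresql.conf; the values below are the defaults:

shared_preload_libraries = 'pg_trickle'
pg_trickle.scheduler_interval_ms = 1000   # one scheduler tick per second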

Phase 3: Frontier Advancement

Before executing the refresh, the scheduler creates a new frontier — a snapshot of how far to read changes from each source table:

Previous frontier: { orders(16384): lsn = 0/1A3F2A00 }
New frontier:      { orders(16384): lsn = 0/1A3F2C00 }

The frontier is a DBSP-inspired version vector. Each source table has its own LSN cursor. The refresh will process all changes in the buffer table where lsn > previous_frontier_lsn AND lsn <= new_frontier_lsn.

This means:

  • Changes committed before the previous refresh are already reflected.
  • Changes committed after the new frontier will be picked up in the next cycle.
  • The INSERT we made (lsn = 0/1A3F2B80) falls within this window.

Phase 4: Change Detection — Is There Anything to Do?

Before running the full delta query, the scheduler runs a short-circuit check: does the change buffer actually have any rows in the LSN window?

SELECT count(*)::bigint FROM (
    SELECT 1 FROM pgtrickle_changes.changes_16384
    WHERE lsn > '0/1A3F2A00'::pg_lsn
    AND lsn <= '0/1A3F2C00'::pg_lsn
    LIMIT <threshold>
) __pgt_capped

This query also checks the adaptive threshold: if the number of changes exceeds a percentage of the source table size (default: 10%), the scheduler falls back to a FULL refresh instead of DIFFERENTIAL, because applying that many individual deltas would be slower than a bulk reload. For example, with a one-million-row source table and the default threshold, more than 100,000 buffered changes would trigger the FULL path.

For our single INSERT, the count is 1 — well below the threshold. The scheduler proceeds with a DIFFERENTIAL refresh.

Phase 5: Delta Query Generation (DVM Engine)

This is where the Differential View Maintenance (DVM) engine does its work. The defining query:

SELECT customer, SUM(amount) AS total, COUNT(*) AS order_count
FROM orders GROUP BY customer

is parsed into an operator tree:

Aggregate(GROUP BY customer, SUM(amount), COUNT(*))
  └── Scan(orders)

The DVM engine differentiates each operator — converting it from "compute the full result" to "compute only what changed":

Step 1: Differentiate the Scan

The Scan(orders) operator becomes a read from the change buffer:

-- Reads only changes in the LSN window, splitting UPDATEs into DELETE+INSERT
WITH __pgt_raw AS (
    SELECT c.pk_hash, c.action,
           c."new_customer", c."old_customer",
           c."new_amount", c."old_amount"
    FROM pgtrickle_changes.changes_16384 c
    WHERE c.lsn > '0/1A3F2A00'::pg_lsn
    AND   c.lsn <= '0/1A3F2C00'::pg_lsn
)
-- INSERT rows: take new_* values
SELECT pk_hash AS __pgt_row_id, 'I' AS __pgt_action,
       "new_customer" AS customer, "new_amount" AS amount
FROM __pgt_raw WHERE action IN ('I', 'U')
UNION ALL
-- DELETE rows: take old_* values
SELECT pk_hash AS __pgt_row_id, 'D' AS __pgt_action,
       "old_customer" AS customer, "old_amount" AS amount
FROM __pgt_raw WHERE action IN ('D', 'U')

For our single INSERT, this produces:

__pgt_row_id | __pgt_action | customer | amount
-------------|--------------|----------|-------
-837291      | I            | alice    | 49.99

Step 2: Differentiate the Aggregate

The Aggregate differentiation is the heart of incremental maintenance. Instead of re-computing SUM(amount) over the entire orders table, it computes:

-- Delta for SUM: add new values, subtract deleted values
SELECT customer,
       SUM(CASE WHEN __pgt_action = 'I' THEN amount
                WHEN __pgt_action = 'D' THEN -amount END) AS total,
       SUM(CASE WHEN __pgt_action = 'I' THEN 1
                WHEN __pgt_action = 'D' THEN -1 END) AS order_count,
       pgtrickle.pg_trickle_hash(customer::text) AS __pgt_row_id,
       'I' AS __pgt_action
FROM <scan_delta>
GROUP BY customer

For our INSERT of ('alice', 49.99), this yields:

customer | total  | order_count | __pgt_row_id | __pgt_action
---------|--------|-------------|--------------|-------------
alice    | +49.99 | +1          | 7283194      | I

The stream table uses reference counting: it tracks __pgt_count (how many source rows contribute to each group). When __pgt_count reaches 0, the group row is deleted.

Phase 6: MERGE Into the Stream Table

The delta is applied to the customer_totals storage table using a single SQL MERGE statement:

MERGE INTO public.customer_totals AS st
USING (<delta_query>) AS d
ON st.__pgt_row_id = d.__pgt_row_id
WHEN MATCHED AND d.__pgt_action = 'D' THEN DELETE
WHEN MATCHED AND d.__pgt_action = 'I' THEN
    UPDATE SET customer = d.customer, total = d.total, order_count = d.order_count
WHEN NOT MATCHED AND d.__pgt_action = 'I' THEN
    INSERT (__pgt_row_id, customer, total, order_count)
    VALUES (d.__pgt_row_id, d.customer, d.total, d.order_count)

Since alice didn't exist before, this is a NOT MATCHED → INSERT. The stream table now contains:

SELECT * FROM customer_totals;
 customer | total | order_count
----------|-------|------------
 alice    | 49.99 | 1
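If you don't want to wait for the scheduler, you can force an immediate refresh by hand. The manual-refresh function (shown later in this series) performs a full refresh, but the resulting table contents are the same:

SELECT pgtrickle.refresh_stream_table('customer_totals');
SELECT * FROM customer_totals;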

Phase 7: Cleanup and Bookkeeping

After the MERGE succeeds:

  1. Consumed changes are deleted from the buffer table:

    DELETE FROM pgtrickle_changes.changes_16384
    WHERE lsn > '0/1A3F2A00'::pg_lsn
    AND lsn <= '0/1A3F2C00'::pg_lsn
    
  2. The frontier is saved to the catalog as JSONB, so the next refresh knows where to start.

  3. The refresh is recorded in pgtrickle.pgt_refresh_history:

    refresh_id | pgt_id | action       | rows_inserted | rows_deleted | delta_row_count | status    | initiated_by
    1          | 1      | DIFFERENTIAL | 1             | 0            | 1               | COMPLETED | SCHEDULER
    

    The delta_row_count column (new in v0.2.0) records the total number of change buffer rows consumed during this refresh cycle.

  4. The data timestamp on the stream table is advanced, resetting the staleness clock.

  5. The MERGE template is cached in thread-local storage. The next refresh for this stream table skips SQL parsing, operator tree construction, and differentiation — it only substitutes LSN values into the cached template. This saves ~45ms per refresh cycle.

What About UPDATE and DELETE?

UPDATE

UPDATE orders SET amount = 59.99 WHERE id = 1;

The trigger writes a single row with action = 'U', capturing both OLD and NEW values:

action | new_amount | old_amount | new_customer | old_customer
-------|------------|------------|--------------|-------------
U      | 59.99      | 49.99      | alice        | alice

The scan differentiation splits this into:

  • DELETE old: (alice, 49.99) with action 'D'
  • INSERT new: (alice, 59.99) with action 'I'

The aggregate differentiation computes: +59.99 - 49.99 = +10.00 for alice's total. The MERGE updates the existing row.

DELETE

DELETE FROM orders WHERE id = 1;

The trigger writes action = 'D' with the OLD values. The aggregate differentiation computes -49.99 for the total and -1 for the count. If the __pgt_count reaches 0 (no more orders for alice), the MERGE deletes alice's row from the stream table entirely.

Performance: Why This Is Fast

Step                      | What it avoids
--------------------------|----------------------------------------------------------------------
Trigger-based CDC         | No logical replication slot, no WAL parsing, no external process
Typed columns             | No JSONB serialization in the trigger, no jsonb_populate_record in the delta query
Pre-computed pk_hash      | No per-row hash computation during the delta query
LSN-bounded reads         | Index scan on the change buffer, not a full table scan
Algebraic differentiation | Processes only changed rows — O(changes) not O(table size)
MERGE statement           | Single SQL round-trip for all inserts, updates, and deletes
Cached templates          | After the first refresh, delta SQL generation is skipped entirely
Adaptive fallback         | Automatically switches to FULL refresh when changes exceed a threshold

For a table with 10 million rows and 100 changed rows, a DIFFERENTIAL refresh processes only those 100 rows. A FULL refresh would need to scan all 10 million.


What About IMMEDIATE Mode?

Everything described above applies to the default AUTO mode — changes accumulate in a buffer and are applied on a schedule using differential (delta-only) maintenance. As of v0.2.0, pg_trickle also supports IMMEDIATE mode, which takes a fundamentally different path.

With IMMEDIATE mode, there are no change buffers, no scheduler, and no waiting:

SELECT pgtrickle.create_stream_table(
    name         => 'customer_totals_live',
    query        => $$
      SELECT customer, SUM(amount) AS total, COUNT(*) AS order_count
      FROM orders GROUP BY customer
    $$,
    refresh_mode => 'IMMEDIATE'
);

How IMMEDIATE Mode Differs for INSERT

Phase           | DIFFERENTIAL                               | IMMEDIATE
----------------|--------------------------------------------|----------------------------------------------------------
Trigger type    | Row-level AFTER trigger                    | Statement-level AFTER trigger with REFERENCING NEW TABLE
What's captured | One buffer row per INSERT                  | A transition table containing all inserted rows
When delta runs | Next scheduler tick (up to schedule bound) | Immediately, in the same transaction
Delta source    | Change buffer table (pgtrickle_changes.*)  | Temp table copied from transition table
Concurrency     | No locking between writers                 | Advisory lock per stream table

When you run INSERT INTO orders ...:

  1. A BEFORE INSERT statement-level trigger acquires an advisory lock on the stream table
  2. The AFTER INSERT trigger captures the transition table (NEW TABLE AS __pgt_newtable) into a temp table
  3. The DVM engine generates the same delta query, but reads from the temp table instead of the change buffer
  4. The delta is applied to the stream table via INSERT/DELETE DML (not MERGE)
  5. The stream table is immediately up-to-date — within the same transaction:

BEGIN;
INSERT INTO orders (customer, amount) VALUES ('alice', 49.99);
-- customer_totals_live already shows alice with total=49.99 here!
SELECT * FROM customer_totals_live;
COMMIT;

The delta SQL template is cached per (pgt_id, source_oid, has_new, has_old) combination, so subsequent trigger invocations skip query parsing entirely.


Next in This Series

What Happens When You UPDATE a Row?

This tutorial traces what happens when an UPDATE statement hits a base table that is referenced by a stream table. It covers the trigger capture, the scan-level decomposition into DELETE + INSERT, and how each DVM operator propagates the change — including cases where the group key changes, where JOINs are involved, and where multiple UPDATEs happen within a single refresh window.

Prerequisite: Read WHAT_HAPPENS_ON_INSERT.md first — it introduces the full 7-phase lifecycle. This tutorial focuses on how UPDATE differs.

Setup

Same e-commerce example:

CREATE TABLE orders (
    id       SERIAL PRIMARY KEY,
    customer TEXT NOT NULL,
    amount   NUMERIC(10,2) NOT NULL
);

SELECT pgtrickle.create_stream_table(
    name     => 'customer_totals',
    query    => $$
      SELECT customer, SUM(amount) AS total, COUNT(*) AS order_count
      FROM orders GROUP BY customer
    $$,
    schedule => '1m'
);

-- Seed some data
INSERT INTO orders (customer, amount) VALUES
    ('alice', 49.99),
    ('alice', 30.00),
    ('bob',   75.00);

After the first refresh, the stream table contains:

customer | total | order_count
---------|-------|------------
alice    | 79.99 | 2
bob      | 75.00 | 1

Case 1: Simple Value UPDATE (Same Group Key)

UPDATE orders SET amount = 59.99 WHERE id = 1;

Alice's first order changes from 49.99 to 59.99. The customer (group key) stays the same.

Phase 1: Trigger Capture

The AFTER UPDATE trigger fires and writes one row to the change buffer with both OLD and NEW values:

pgtrickle_changes.changes_16384
┌───────────┬─────────────┬────────┬──────────┬──────────┬────────────┬──────────┬────────────┐
│ change_id │ lsn         │ action │ new_cust │ new_amt  │ old_cust   │ old_amt  │ pk_hash    │
├───────────┼─────────────┼────────┼──────────┼──────────┼────────────┼──────────┼────────────┤
│ 4         │ 0/1A3F3000  │ U      │ alice    │ 59.99    │ alice      │ 49.99    │ -837291    │
└───────────┴─────────────┴────────┴──────────┴──────────┴────────────┴──────────┴────────────┘

Key difference from INSERT: the trigger writes both new_* and old_* columns. The pk_hash is computed from NEW.id.

Phase 2–4: Scheduler, Frontier, Change Detection

Identical to the INSERT flow. The scheduler detects one change row in the LSN window.

Phase 5: Scan Differentiation — The U → D+I Split

This is where UPDATE handling diverges fundamentally. The scan delta operator decomposes the UPDATE into two events:

__pgt_row_id | __pgt_action | customer | amount
-------------|--------------|----------|-------
-837291      | D            | alice    | 49.99     ← old values (DELETE)
-837291      | I            | alice    | 59.99     ← new values (INSERT)

Why split into D+I? This is a core IVM principle. Downstream operators (aggregates, joins, filters) don't have special "update" logic — they only understand insertions and deletions. By decomposing the UPDATE:

  • The DELETE event subtracts the old values from running aggregates
  • The INSERT event adds the new values

This algebraic approach handles arbitrary operator trees without operator-specific update logic.

Phase 5 (continued): Aggregate Differentiation

The aggregate operator processes both events against the alice group:

-- DELETE event: subtract old values
alice: total += CASE WHEN action='D' THEN -49.99 END  →  -49.99
alice: count += CASE WHEN action='D' THEN -1 END       →  -1

-- INSERT event: add new values
alice: total += CASE WHEN action='I' THEN +59.99 END  →  +59.99
alice: count += CASE WHEN action='I' THEN +1 END       →  +1

Net effect on alice's group:

total delta:  -49.99 + 59.99 = +10.00
count delta:  -1 + 1 = 0

The aggregate emits this as an INSERT (because the group still exists and its value changed):

customer | total  | order_count | __pgt_row_id | __pgt_action
---------|--------|-------------|--------------|-------------
alice    | +10.00 | 0           | 7283194      | I

Phase 6: MERGE

The MERGE updates the existing row:

-- MERGE WHEN MATCHED AND action = 'I' THEN UPDATE:
-- alice's total: 79.99 + 10.00 = 89.99  (via reference counting)
-- alice's count: 2 + 0 = 2

Wait — that's not right. The MERGE doesn't add deltas; it replaces the row. The aggregate delta query actually computes the new absolute value by combining the stored state with the delta:

COALESCE(existing.total, 0) + delta.total  → 79.99 + 10.00 = 89.99
COALESCE(existing.__pgt_count, 0) + delta.__pgt_count → 2 + 0 = 2

Result:

SELECT * FROM customer_totals;
 customer | total | order_count
----------|-------|------------
 alice    | 89.99 | 2            ← was 79.99
 bob      | 75.00 | 1

Case 2: Group Key Change (Customer Reassignment)

UPDATE orders SET customer = 'bob' WHERE id = 2;

Alice's second order (amount=30.00) is reassigned to Bob. The group key itself changes.

Trigger Capture

change_id | lsn         | action | new_cust | new_amt | old_cust | old_amt | pk_hash
5         | 0/1A3F3100  | U      | bob      | 30.00   | alice    | 30.00   | 4521038

The old and new customer values differ.

Scan Delta: D+I Split

__pgt_row_id | __pgt_action | customer | amount
-------------|--------------|----------|-------
4521038      | D            | alice    | 30.00    ← removes from alice's group
4521038      | I            | bob      | 30.00    ← adds to bob's group

Aggregate Delta

The aggregate groups by customer, so the DELETE and INSERT land in different groups:

Group "alice":
  total delta:  -30.00
  count delta:  -1

Group "bob":
  total delta:  +30.00
  count delta:  +1

After MERGE

SELECT * FROM customer_totals;
 customer | total  | order_count
----------|--------|------------
 alice    | 59.99  | 1            ← lost one order (-30.00)
 bob      | 105.00 | 2            ← gained one order (+30.00)

This is why the D+I decomposition is essential. Without it, you'd need special "move between groups" logic. With it, the standard aggregate differentiation handles group key changes naturally.


Case 3: UPDATE That Deletes a Group

-- Alice only has one order left. Reassign it to bob.
UPDATE orders SET customer = 'bob' WHERE id = 1;

Aggregate Delta

Group "alice":
  total delta:    -59.99
  count delta:    -1
  new __pgt_count: 1 - 1 = 0  → group vanishes!

Group "bob":
  total delta:    +59.99
  count delta:    +1

When __pgt_count reaches 0, the aggregate emits a DELETE for alice's group:

customer | total | __pgt_row_id | __pgt_action
---------|-------|--------------|-------------
alice    | —     | 7283194      | D             ← group removed
bob      | ...   | 9182734      | I             ← group updated

The MERGE deletes alice's row entirely:

SELECT * FROM customer_totals;
 customer | total  | order_count
----------|--------|------------
 bob      | 165.00 | 3

Case 4: Multiple UPDATEs on the Same Row (Within One Refresh Window)

What if a row is updated multiple times before the next refresh?

UPDATE orders SET amount = 10.00 WHERE id = 3;  -- bob: 75 → 10
UPDATE orders SET amount = 20.00 WHERE id = 3;  -- bob: 10 → 20
UPDATE orders SET amount = 30.00 WHERE id = 3;  -- bob: 20 → 30

The change buffer now has 3 rows for pk_hash of order #3:

change_id | action | old_amt | new_amt
6         | U      | 75.00   | 10.00
7         | U      | 10.00   | 20.00
8         | U      | 20.00   | 30.00

Net-Effect Computation

The scan delta uses a split fast-path design. Since order #3 has multiple changes (cnt > 1), it takes the multi-change path with window functions:

FIRST_VALUE(action) OVER (PARTITION BY pk_hash ORDER BY change_id)  → 'U'
LAST_VALUE(action) OVER (...)                                        → 'U'

Both first and last actions are 'U', so:

  • DELETE: emits using old values from the earliest change (change_id=6): old_amt = 75.00
  • INSERT: emits using new values from the latest change (change_id=8): new_amt = 30.00

Net delta:

__pgt_row_id | __pgt_action | amount
-------------|--------------|-------
pk_hash_3    | D            | 75.00    ← original value before all changes
pk_hash_3    | I            | 30.00    ← final value after all changes

The aggregate sees -75.00 + 30.00 = -45.00. This is correct regardless of the intermediate values. The intermediate rows (10.00, 20.00) are never seen.
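A minimal sketch of that multi-change path, using the column names from this series (the real generated SQL is more involved and covers every column, not just amount):

WITH per_pk AS (
    SELECT pk_hash, change_id,
           FIRST_VALUE(action)     OVER w AS first_action,
           LAST_VALUE(action)      OVER w AS last_action,
           FIRST_VALUE(old_amount) OVER w AS first_old_amount,
           LAST_VALUE(new_amount)  OVER w AS last_new_amount,
           ROW_NUMBER() OVER (PARTITION BY pk_hash ORDER BY change_id) AS rn
    FROM pgtrickle_changes.changes_16384
    WINDOW w AS (PARTITION BY pk_hash ORDER BY change_id
                 ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
)
-- DELETE event: old values from the earliest change (unless the row was born in this window)
SELECT pk_hash AS __pgt_row_id, 'D' AS __pgt_action, first_old_amount AS amount
FROM per_pk WHERE rn = 1 AND first_action <> 'I'
UNION ALL
-- INSERT event: new values from the latest change (unless the row died in this window)
SELECT pk_hash, 'I', last_new_amount
FROM per_pk WHERE rn = 1 AND last_action <> 'D';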


Case 5: INSERT + UPDATE in Same Window

INSERT INTO orders (customer, amount) VALUES ('charlie', 100.00);
UPDATE orders SET amount = 200.00 WHERE customer = 'charlie';

Both happen before the next refresh. The buffer has:

change_id | action | old_amt | new_amt
9         | I      | NULL    | 100.00
10        | U      | 100.00  | 200.00

Net-effect analysis:

  • first_action = 'I' (row didn't exist before this window)
  • last_action = 'U' (row exists after)

Result:

  • No DELETE emitted (first_action = 'I' means the row was born in this window)
  • INSERT with final values: (charlie, 200.00)

The aggregate sees a pure insertion of (charlie, 200.00) — the intermediate value of 100.00 never appears.


Case 6: UPDATE + DELETE in Same Window

UPDATE orders SET amount = 999.99 WHERE id = 3;
DELETE FROM orders WHERE id = 3;

Net-effect:

  • first_action = 'U' (row existed before)
  • last_action = 'D' (row no longer exists)

Result:

  • DELETE with original old values from the first change
  • No INSERT (last_action = 'D')

The aggregate correctly sees only a removal.


Case 7: UPDATE with JOINs

Consider a stream table that joins two tables:

CREATE TABLE customers (
    id   SERIAL PRIMARY KEY,
    name TEXT NOT NULL,
    tier TEXT NOT NULL DEFAULT 'standard'
);

CREATE TABLE orders (
    id          SERIAL PRIMARY KEY,
    customer_id INT REFERENCES customers(id),
    amount      NUMERIC(10,2)
);

SELECT pgtrickle.create_stream_table(
    name         => 'order_details',
    query        => $$
      SELECT c.name, c.tier, o.amount
      FROM orders o
      JOIN customers c ON o.customer_id = c.id
    $$,
    schedule => '1m'
);

Now update a customer's tier:

UPDATE customers SET tier = 'premium' WHERE name = 'alice';

How the JOIN Delta Works

The join differentiation follows the formula:

$$\Delta(L \bowtie R) = (\Delta L \bowtie R) \cup (L \bowtie \Delta R) - (\Delta L \bowtie \Delta R)$$

Since only the customers table changed:

  • $\Delta L$ = changes to orders (empty)
  • $\Delta R$ = changes to customers (alice's tier: standard → premium)

So:

  • Part 1: $\Delta\text{orders} \bowtie \text{customers}$ = empty (no order changes)
  • Part 2: $\text{orders} \bowtie \Delta\text{customers}$ = all of alice's orders joined with her tier change
  • Part 3: $\Delta\text{orders} \bowtie \Delta\text{customers}$ = empty (no order changes)

Part 2 produces the delta: for each of alice's orders, DELETE the old row (with tier='standard') and INSERT a new row (with tier='premium').

The stream table is updated to reflect the new tier across all of alice's order rows.


Performance Summary

Scenario                  | Buffer rows | Delta rows emitted           | Work
--------------------------|-------------|------------------------------|--------------------------
Simple value change       | 1           | 2 (D+I)                      | O(1) per group
Group key change          | 1           | 2 (D+I, different groups)    | O(1) per affected group
Group deletion            | 1           | 1 (D) + 1 (I) or 1 (D)       | O(1)
N updates same row        | N           | 2 (D first-old + I last-new) | O(N) scan, O(1) aggregate
INSERT+UPDATE same window | 2           | 1 (I only)                   | O(1)
UPDATE+DELETE same window | 2           | 1 (D only)                   | O(1)

In all cases, the work is proportional to the number of changed rows, not the total table size. A single UPDATE on a billion-row table produces the same delta cost as on a 10-row table.


What About IMMEDIATE Mode?

Everything above describes DIFFERENTIAL mode — changes accumulate in a buffer and are applied on a schedule. As of v0.2.0, pg_trickle also supports IMMEDIATE mode, where the stream table is updated synchronously within the same transaction as your UPDATE.

How IMMEDIATE Mode Differs for UPDATE

Phase             | DIFFERENTIAL                        | IMMEDIATE
------------------|-------------------------------------|---------------------------------------------------------------------
Trigger type      | Row-level AFTER trigger             | Statement-level AFTER trigger with REFERENCING OLD TABLE, NEW TABLE
What's captured   | One buffer row with old_* and new_* | Two transition tables: __pgt_oldtable and __pgt_newtable
When delta runs   | Next scheduler tick                 | Immediately, in the same transaction
D+I decomposition | In the scan delta CTE               | Same algebra, but reading from transition temp tables
Concurrency       | No locking between writers          | Advisory lock per stream table

When you run UPDATE orders SET amount = 59.99 WHERE id = 1:

  1. A BEFORE UPDATE trigger acquires an advisory lock on the stream table
  2. The AFTER UPDATE trigger captures both OLD TABLE AS __pgt_oldtable and NEW TABLE AS __pgt_newtable into temp tables
  3. The DVM engine generates the same D+I decomposition, reading old values from the old-table and new values from the new-table
  4. The delta is applied to the stream table immediately
  5. Any query within the same transaction sees the updated stream table:

BEGIN;
UPDATE orders SET amount = 59.99 WHERE id = 1;
-- customer_totals already reflects the new amount here!
SELECT * FROM customer_totals WHERE customer = 'alice';
COMMIT;

The same D+I split, aggregate differentiation, and net-effect logic applies — the only difference is the data source (transition tables vs change buffer) and timing (synchronous vs scheduled).


Next in This Series

What Happens When You DELETE a Row?

This tutorial traces what happens when a DELETE statement hits a base table that is referenced by a stream table. It covers the trigger capture, how the scan delta emits a single DELETE event, and how each DVM operator propagates the removal — including group deletion, partial group reduction, JOINs, cascading deletes within a single refresh window, and the important edge case where a DELETE cancels a prior INSERT.

Prerequisite: Read WHAT_HAPPENS_ON_INSERT.md first — it introduces the full 7-phase lifecycle (trigger → scheduler → frontier → change detection → DVM delta → MERGE → cleanup). This tutorial focuses on how DELETE differs.

Setup

Same e-commerce example used throughout the series:

CREATE TABLE orders (
    id       SERIAL PRIMARY KEY,
    customer TEXT NOT NULL,
    amount   NUMERIC(10,2) NOT NULL
);

SELECT pgtrickle.create_stream_table(
    name     => 'customer_totals',
    query    => $$
      SELECT customer, SUM(amount) AS total, COUNT(*) AS order_count
      FROM orders GROUP BY customer
    $$,
    schedule => '1m'
);

-- Seed some data
INSERT INTO orders (customer, amount) VALUES
    ('alice', 50.00),
    ('alice', 30.00),
    ('bob',   75.00),
    ('bob',   25.00);

After the first refresh, the stream table contains:

customer | total  | order_count
---------|--------|------------
alice    | 80.00  | 2
bob      | 100.00 | 2

Case 1: Delete One Row (Group Survives)

DELETE FROM orders WHERE id = 2;  -- alice's 30.00 order

Alice still has one remaining order (id=1, amount=50.00). The group shrinks but doesn't vanish.

Phase 1: Trigger Capture

The AFTER DELETE trigger fires and writes one row to the change buffer with only OLD values:

pgtrickle_changes.changes_16384
┌───────────┬─────────────┬────────┬──────────┬──────────┬────────────┬──────────┬────────────┐
│ change_id │ lsn         │ action │ new_cust │ new_amt  │ old_cust   │ old_amt  │ pk_hash    │
├───────────┼─────────────┼────────┼──────────┼──────────┼────────────┼──────────┼────────────┤
│ 5         │ 0/1A3F3000  │ D      │ NULL     │ NULL     │ alice      │ 30.00    │ 4521038    │
└───────────┴─────────────┴────────┴──────────┴──────────┴────────────┴──────────┴────────────┘

Key difference from INSERT and UPDATE:

  • new_* columns are all NULL — the row no longer exists, so there are no NEW values
  • old_* columns contain the deleted row's data — this is what gets subtracted
  • pk_hash is computed from OLD.id (the deleted row's primary key)

Phase 2–4: Scheduler, Frontier, Change Detection

Identical to the INSERT flow. The scheduler detects one change row in the LSN window.

Phase 5: Scan Differentiation — Pure DELETE

Unlike UPDATE (which splits into D+I), a DELETE produces a single event:

__pgt_row_id | __pgt_action | customer | amount
-------------|--------------|----------|-------
4521038      | D            | alice    | 30.00

The scan delta applies the net-effect filtering rule:

  • first_action = 'D' → row existed before the refresh window
  • last_action = 'D' → row does not exist after

Result: emit a DELETE using old values. No INSERT is emitted (because last_action = 'D').

This is the simplest path through the scan delta — one change, one PK, one DELETE event.

Phase 5 (continued): Aggregate Differentiation

The aggregate operator processes the DELETE event against the alice group:

-- DELETE event: subtract old values from alice's group
__ins_count = 0         -- no inserts
__del_count = 1         -- one deletion
__ins_total = 0         -- no amount added
__del_total = 30.00     -- 30.00 removed

The merge CTE joins this delta with the existing stream table state:

new_count = old_count + ins_count - del_count = 2 + 0 - 1 = 1  (still > 0)

Since new_count > 0 and the group already existed (old_count = 2), the action is classified as 'U' (update). The aggregate emits the group with its new values:

customer | total | order_count | __pgt_row_id | __pgt_action
---------|-------|-------------|--------------|-------------
alice    | 50.00 | 1           | 7283194      | I

Note: the 'U' meta-action is emitted as __pgt_action = 'I' because the MERGE treats it as an update-via-INSERT (see aggregate final CTE: CASE WHEN __pgt_meta_action = 'D' THEN 'D' ELSE 'I' END).
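Schematically, the merge CTE's classification looks like this (a sketch with illustrative relation names, not the generated SQL; the COALESCE handles groups that don't exist yet):

SELECT d.customer,
       COALESCE(st.__pgt_count, 0) + d.__ins_count - d.__del_count AS new_count,
       CASE WHEN COALESCE(st.__pgt_count, 0) + d.__ins_count - d.__del_count <= 0
            THEN 'D'   -- group vanishes
            ELSE 'I'   -- group created or updated (update-via-INSERT)
       END AS __pgt_action
FROM delta d                              -- illustrative name for the aggregate delta
LEFT JOIN customer_totals st USING (customer);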

Phase 6: MERGE

The MERGE statement matches alice's existing row and updates it:

MERGE INTO customer_totals AS st
USING (...delta...) AS d
ON st.__pgt_row_id = d.__pgt_row_id
WHEN MATCHED AND d.__pgt_action = 'I' THEN
  UPDATE SET customer = d.customer, total = d.total, order_count = d.order_count, ...

Result:

SELECT * FROM customer_totals;
 customer | total  | order_count
----------|--------|------------
 alice    | 50.00  | 1            ← was 80.00 / 2
 bob      | 100.00 | 2

Phase 7: Cleanup

The change buffer rows in the consumed LSN window are deleted:

DELETE FROM pgtrickle_changes.changes_16384
WHERE lsn > '0/1A3F2FFF'::pg_lsn AND lsn <= '0/1A3F3000'::pg_lsn;

Case 2: Delete Last Row in Group (Group Vanishes)

-- Alice has one order left (id=1, amount=50.00). Delete it.
DELETE FROM orders WHERE id = 1;

Trigger Capture

change_id | lsn         | action | old_cust | old_amt | pk_hash
6         | 0/1A3F3100  | D      | alice    | 50.00   | -837291

Scan Delta

Single DELETE event:

__pgt_row_id | __pgt_action | customer | amount
-------------|--------------|----------|-------
-837291      | D            | alice    | 50.00

Aggregate Delta

Group "alice":
  ins_count = 0
  del_count = 1
  new_count = old_count + 0 - 1 = 1 - 1 = 0  → group vanishes!

When new_count drops to 0 (or below), the aggregate classifies this as action 'D' (delete). The reference count has reached zero — no rows contribute to this group anymore.

The aggregate emits a DELETE for alice's group:

customer | __pgt_row_id | __pgt_action
---------|--------------|-------------
alice    | 7283194      | D

MERGE

The MERGE matches alice's existing row and deletes it:

WHEN MATCHED AND d.__pgt_action = 'D' THEN DELETE

Result:

SELECT * FROM customer_totals;
 customer | total  | order_count
----------|--------|------------
 bob      | 100.00 | 2

Alice's row is completely removed from the stream table. This is the correct behavior — with zero contributing rows, the group should not exist.


Case 3: Delete Multiple Rows (Same Group, Same Window)

-- Delete both of bob's orders before the next refresh
DELETE FROM orders WHERE id = 3;  -- bob, 75.00
DELETE FROM orders WHERE id = 4;  -- bob, 25.00

The change buffer has two rows with different pk_hash values (different PKs):

change_id | action | old_cust | old_amt | pk_hash
7         | D      | bob      | 75.00   | pk_hash_3
8         | D      | bob      | 25.00   | pk_hash_4

Scan Delta

Each PK has exactly one change, so both take the single-change fast path:

__pgt_row_id | __pgt_action | customer | amount
-------------|--------------|----------|-------
pk_hash_3    | D            | bob      | 75.00
pk_hash_4    | D            | bob      | 25.00

Two DELETE events, both targeting bob's group.

Aggregate Delta

The aggregate sums both deletions:

Group "bob":
  ins_count = 0
  del_count = 2
  del_total = 75.00 + 25.00 = 100.00
  new_count = 2 + 0 - 2 = 0  → group vanishes!

The aggregate emits a DELETE for bob's group.

MERGE

Bob's row is deleted from the stream table. With both alice and bob gone (from Cases 1+2+3), the stream table is now empty.


Case 4: INSERT + DELETE in Same Window (Cancellation)

What if a row is inserted and then deleted before the next refresh?

INSERT INTO orders (customer, amount) VALUES ('charlie', 200.00);
DELETE FROM orders WHERE customer = 'charlie';

The change buffer has:

change_id | action | new_cust | new_amt | old_cust | old_amt | pk_hash
9         | I      | charlie  | 200.00  | NULL     | NULL    | pk_hash_new
10        | D      | NULL     | NULL    | charlie  | 200.00  | pk_hash_new

Net-Effect Computation

Both changes share the same pk_hash. The pk_stats CTE finds cnt = 2, so this goes through the multi-change path:

first_action = FIRST_VALUE(action) OVER (...) → 'I'
last_action  = LAST_VALUE(action)  OVER (...) → 'D'

The scan delta applies the net-effect filtering:

  • DELETE branch: requires first_action != 'I' → FAILS (first_action = 'I')
  • INSERT branch: requires last_action != 'D' → FAILS (last_action = 'D')

Result: zero events emitted. The INSERT and DELETE completely cancel each other out.

The aggregate never sees charlie. The stream table is unchanged. This is correct — the row was born and died within the same refresh window, so it should have no visible effect.
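In this isolated example you can see the cancellation in the bookkeeping: the refresh consumes the two buffer rows but applies nothing (columns as in the pgt_refresh_history example earlier in this series):

SELECT action, delta_row_count, rows_inserted, rows_deleted, status
FROM pgtrickle.pgt_refresh_history
ORDER BY refresh_id DESC
LIMIT 1;
-- Expected shape: DIFFERENTIAL | 2 | 0 | 0 | COMPLETED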


Case 5: UPDATE + DELETE in Same Window

UPDATE orders SET amount = 999.99 WHERE id = 3;  -- bob: 75 → 999.99
DELETE FROM orders WHERE id = 3;

The change buffer:

change_id | action | old_amt | new_amt
11        | U      | 75.00   | 999.99
12        | D      | 999.99  | NULL

Net-Effect Computation

Same pk_hash, cnt = 2:

first_action = 'U'  (row existed before this window)
last_action  = 'D'  (row no longer exists)

Filtering:

  • DELETE branch: first_action != 'I' → OK. Emit DELETE with old values from the earliest change: old_amt = 75.00
  • INSERT branch: last_action != 'D' → FAILS. No INSERT emitted.

Net delta:

__pgt_row_id | __pgt_action | amount
-------------|--------------|-------
pk_hash_3    | D            | 75.00

The intermediate value of 999.99 never appears. The aggregate sees only the removal of the original value (75.00), which is correct — that's the value that was previously accounted for in the stream table.


Case 6: DELETE with JOINs

Consider a stream table that joins two tables:

CREATE TABLE customers (
    id   SERIAL PRIMARY KEY,
    name TEXT NOT NULL,
    tier TEXT NOT NULL DEFAULT 'standard'
);

CREATE TABLE orders (
    id          SERIAL PRIMARY KEY,
    customer_id INT REFERENCES customers(id),
    amount      NUMERIC(10,2)
);

SELECT pgtrickle.create_stream_table(
    name         => 'order_details',
    query        => $$
      SELECT c.name, c.tier, o.amount
      FROM orders o
      JOIN customers c ON o.customer_id = c.id
    $$,
    schedule => '1m'
);

Seed data:

INSERT INTO customers VALUES (1, 'alice', 'premium'), (2, 'bob', 'standard');
INSERT INTO orders VALUES (1, 1, 50.00), (2, 1, 30.00), (3, 2, 75.00);

After refresh, the stream table has:

name  | tier     | amount
------|----------|-------
alice | premium  | 50.00
alice | premium  | 30.00
bob   | standard | 75.00

Now delete an order:

DELETE FROM orders WHERE id = 2;  -- alice's 30.00 order

How the JOIN Delta Works

The join differentiation formula:

$$\Delta(L \bowtie R) = (\Delta L \bowtie R) \cup (L \bowtie \Delta R) - (\Delta L \bowtie \Delta R)$$

Since only the orders table changed:

  • $\Delta L$ = changes to orders (one DELETE: order #2)
  • $\Delta R$ = changes to customers (empty)

So:

  • Part 1: $\Delta\text{orders} \bowtie \text{customers}$ = the deleted order joined with its customer
  • Part 2: $\text{orders} \bowtie \Delta\text{customers}$ = empty (no customer changes)
  • Part 3: $\Delta\text{orders} \bowtie \Delta\text{customers}$ = empty (customers unchanged)

Part 1 produces:

name  | tier    | amount | __pgt_action
------|---------|--------|-------------
alice | premium | 30.00  | D

The deleted order is joined with alice's customer record to produce a DELETE delta row with the complete joined values.

MERGE

The MERGE matches the row (alice, premium, 30.00) and deletes it:

SELECT * FROM order_details;
 name  | tier     | amount
-------|----------|-------
 alice | premium  | 50.00      ← alice's remaining order
 bob   | standard | 75.00

What About Deleting From the Dimension Table?

DELETE FROM customers WHERE id = 2;  -- remove bob entirely

Now $\Delta R$ has a DELETE for bob, while $\Delta L$ is empty:

  • Part 1: $\Delta\text{orders} \bowtie \text{customers}$ = empty
  • Part 2: $\text{orders} \bowtie \Delta\text{customers}$ = bob's order(s) joined with deleted customer record

Part 2 produces DELETE events for every order that referenced bob:

name | tier     | amount | __pgt_action
-----|----------|--------|-------------
bob  | standard | 75.00  | D

After MERGE, bob's rows vanish from the stream table.

Note: This assumes referential integrity — if orders still references customer #2, a foreign key constraint would prevent the DELETE in practice. But from the IVM perspective, the join delta correctly handles the removal regardless.


Case 7: Bulk DELETE

DELETE FROM orders WHERE amount < 50.00;

This deletes multiple rows across potentially multiple groups. The trigger fires once per row (it's a FOR EACH ROW trigger), writing one change buffer entry per deleted row:

change_id | action | old_cust | old_amt | pk_hash
13        | D      | alice    | 30.00   | pk_hash_2
14        | D      | bob      | 25.00   | pk_hash_4

Scan Delta

Each deleted PK is independent (different pk_hash values), so each takes the single-change fast path. Two DELETE events:

__pgt_row_id | __pgt_action | customer | amount
-------------|--------------|----------|-------
pk_hash_2    | D            | alice    | 30.00
pk_hash_4    | D            | bob      | 25.00

Aggregate Delta

The aggregate groups these by customer:

Group "alice":
  del_count = 1, del_total = 30.00
  new_count = 2 - 1 = 1  (survives)

Group "bob":
  del_count = 1, del_total = 25.00
  new_count = 2 - 1 = 1  (survives)

Both groups survive (count > 0), so the aggregate emits UPDATE (as 'I') events with new values:

customer | total | order_count
---------|-------|------------
alice    | 50.00 | 1
bob      | 75.00 | 1

The MERGE updates both rows. All work is proportional to the number of deleted rows (2), not the total table size.


Case 8: TRUNCATE (Automatic Full Refresh)

TRUNCATE orders;

TRUNCATE does not fire row-level triggers. However, as of v0.2.0, pg_trickle installs a statement-level AFTER TRUNCATE trigger that writes a 'T' marker to the change buffer. On the next refresh cycle, the scheduler detects this marker and automatically performs a full refresh — truncating the stream table and recomputing from the defining query.

No manual intervention is required. For details on how TRUNCATE is handled across all three refresh modes (DIFFERENTIAL, IMMEDIATE, FULL), see What Happens When You TRUNCATE a Table?.


How DELETE Differs From INSERT and UPDATE — A Summary

Aspect              | INSERT                  | UPDATE                     | DELETE
--------------------|-------------------------|----------------------------|--------------------------
Trigger writes      | new_* columns only      | Both new_* and old_*       | old_* columns only
new_* columns       | Row values              | New values                 | NULL
old_* columns       | NULL                    | Old values                 | Row values
pk_hash source      | NEW.pk                  | NEW.pk                     | OLD.pk
Scan delta output   | 1 INSERT event          | 2 events (D+I split)       | 1 DELETE event
Aggregate effect    | Adds to group count/sum | Subtracts old, adds new    | Subtracts from group
Can delete a group? | No (only creates/grows) | Yes (if group key changes) | Yes (if count reaches 0)
MERGE action        | INSERT new row          | UPDATE existing row        | DELETE matched row

The Reference Counting Principle

The core insight behind incremental DELETE handling is reference counting. Every aggregate group in the stream table maintains an internal counter (__pgt_count) that tracks how many source rows contribute to the group:

Stream table internal state:
customer | total | order_count | __pgt_count (hidden)
---------|-------|-------------|---------------------
alice    | 80.00 | 2           | 2
bob      | 100.00| 2           | 2

  • INSERT → __pgt_count += 1
  • DELETE → __pgt_count -= 1
  • UPDATE → __pgt_count += 0 (D cancels I for same-group updates)

When __pgt_count reaches 0:

  • The group has zero contributing rows
  • The aggregate emits a DELETE event
  • The MERGE removes the row from the stream table

This is mathematically rigorous — the stream table always reflects the correct result of the defining query over the current base table contents, incrementally maintained through algebraic delta operations.


Performance Summary

Scenario                           | Buffer rows | Delta rows emitted           | Work
-----------------------------------|-------------|------------------------------|--------------------------
Single row DELETE (group survives) | 1           | 1 (D)                        | O(1) per group
Single row DELETE (group vanishes) | 1           | 1 (D)                        | O(1)
N deletes same group               | N           | N (D) → 1 group delta        | O(N) scan, O(1) per group
INSERT+DELETE same window          | 2           | 0 (cancels)                  | O(1)
UPDATE+DELETE same window          | 2           | 1 (D original)               | O(1)
Bulk DELETE across M groups        | N           | N (D) → M group deltas       | O(N) scan, O(M) aggregate
JOIN table DELETE                  | 1           | K (one per matched join row) | O(K) join

In all cases, the work is proportional to the number of changed rows, not the total table size. Deleting 3 rows from a billion-row table produces the same delta cost as from a 10-row table.


What About IMMEDIATE Mode?

Everything above describes DIFFERENTIAL mode — changes accumulate in a buffer and are applied on a schedule. As of v0.2.0, pg_trickle also supports IMMEDIATE mode, where the stream table is updated synchronously within the same transaction as your DELETE.

How IMMEDIATE Mode Differs for DELETE

Phase           | DIFFERENTIAL                                      | IMMEDIATE
----------------|---------------------------------------------------|----------------------------------------------------------
Trigger type    | Row-level AFTER trigger                           | Statement-level AFTER trigger with REFERENCING OLD TABLE
What's captured | One buffer row with old_* columns per deleted row | A transition table containing all deleted rows
When delta runs | Next scheduler tick                               | Immediately, in the same transaction
Delta source    | Change buffer rows with action='D'                | Temp table copied from transition table
Concurrency     | No locking between writers                        | Advisory lock per stream table

When you run DELETE FROM orders WHERE id = 2:

  1. A BEFORE DELETE trigger acquires an advisory lock on the stream table
  2. The AFTER DELETE trigger captures OLD TABLE AS __pgt_oldtable into a temp table
  3. The DVM engine generates the same aggregate delta, reading deleted values from the old-table
  4. The delta is applied to the stream table immediately — groups are decremented, and groups reaching count=0 are removed
  5. Any query within the same transaction sees the updated stream table:

BEGIN;
DELETE FROM orders WHERE id = 2;  -- alice's 30.00 order
-- customer_totals already reflects the deletion here!
SELECT * FROM customer_totals WHERE customer = 'alice';
-- Shows: alice | 50.00 | 1
COMMIT;

The same reference counting, group deletion, and net-effect logic applies — the only difference is the data source (transition tables vs change buffer) and timing (synchronous vs scheduled).


Next in This Series

What Happens When You TRUNCATE a Table?

This tutorial explains what happens when a TRUNCATE statement hits a base table that is referenced by a stream table. Unlike INSERT, UPDATE, and DELETE — which are fully tracked by the CDC trigger — TRUNCATE is a special case that bypasses row-level triggers entirely. Understanding this gap is essential for operating pg_trickle correctly.

Prerequisite: Read WHAT_HAPPENS_ON_INSERT.md first — it introduces the 7-phase lifecycle. This tutorial explains why TRUNCATE breaks that lifecycle and how to recover.

Setup

Same e-commerce example used throughout the series:

CREATE TABLE orders (
    id       SERIAL PRIMARY KEY,
    customer TEXT NOT NULL,
    amount   NUMERIC(10,2) NOT NULL
);

SELECT pgtrickle.create_stream_table(
    name     => 'customer_totals',
    query    => $$
      SELECT customer, SUM(amount) AS total, COUNT(*) AS order_count
      FROM orders GROUP BY customer
    $$,
    schedule => '1m'
);

-- Seed some data
INSERT INTO orders (customer, amount) VALUES
    ('alice', 50.00),
    ('alice', 30.00),
    ('bob',   75.00),
    ('bob',   25.00);

After the first refresh, the stream table contains:

customer | total  | order_count
---------|--------|------------
alice    | 80.00  | 2
bob      | 100.00 | 2

Case 1: TRUNCATE the Base Table (DIFFERENTIAL Mode)

TRUNCATE orders;

All four rows are removed instantly.

What Happens at the Trigger Level: TRUNCATE Marker

Updated in v0.2.0: pg_trickle now installs a statement-level AFTER TRUNCATE trigger on tracked source tables. This trigger writes a single marker row to the change buffer with action = 'T'.

Unlike the per-row DML triggers, the TRUNCATE trigger cannot capture individual row data (PostgreSQL's TRUNCATE does not provide OLD records). Instead, it writes a sentinel:

pgtrickle_changes.changes_16384
┌───────────┬─────────────┬────────┬──────────┬──────────┐
│ change_id │ lsn         │ action │ new_*    │ old_*    │
├───────────┼─────────────┼────────┼──────────┼──────────┤
│ 5         │ 0/1A3F4000  │ T      │ NULL     │ NULL     │
└───────────┴─────────────┴────────┴──────────┴──────────┘

The 'T' action marker tells the refresh engine: "a TRUNCATE happened — a full refresh is required."

What Happens at the Scheduler: Automatic Full Refresh

On the next refresh cycle, the scheduler:

  1. Checks the change buffer for rows in the LSN window
  2. Finds the action = 'T' marker row
  3. Falls back to a FULL refresh — regardless of the stream table's configured refresh_mode
  4. TRUNCATEs the stream table
  5. Re-executes the defining query against the current base table state
  6. Inserts all results

Since the orders table is now empty, the defining query returns zero rows:

-- After the next scheduled refresh:
SELECT * FROM customer_totals;
 customer | total | order_count
----------|-------|------------
 (0 rows)                        ← correct: orders is empty

No manual intervention required. The TRUNCATE marker ensures the stream table is automatically brought back into consistency on the next refresh cycle.

Note: In versions before v0.2.0, TRUNCATE was not captured at all — the change buffer stayed empty and the stream table became silently stale. If you're running an older version, you still need to call pgtrickle.refresh_stream_table() manually after a TRUNCATE.


Case 2: Manual Refresh (Explicit Recovery)

Although TRUNCATE is now automatically handled on the next refresh cycle, you can force an immediate recovery without waiting:

SELECT pgtrickle.refresh_stream_table('customer_totals');

This executes a full refresh regardless of the stream table's configured refresh mode:

  1. TRUNCATE the stream table itself (clearing the stale data)
  2. Re-execute the defining query
  3. INSERT the results into the stream table
  4. Update the frontier so future differential refreshes start from the current LSN

This is useful when you can't wait for the next scheduled refresh cycle and need the stream table consistent immediately.


Case 3: TRUNCATE Then INSERT (Common ETL Pattern)

A common data loading pattern is:

BEGIN;
TRUNCATE orders;
INSERT INTO orders (customer, amount) VALUES
    ('charlie', 100.00),
    ('charlie', 200.00),
    ('dave',    150.00);
COMMIT;

What the Change Buffer Sees

  • TRUNCATE: 1 marker event (action = 'T') — captured by the statement-level trigger
  • INSERT charlie 100.00: 1 event (captured)
  • INSERT charlie 200.00: 1 event (captured)
  • INSERT dave 150.00: 1 event (captured)

The change buffer has 4 rows — the TRUNCATE marker plus 3 INSERT events.

What the Scheduler Does

The scheduler sees the action = 'T' marker and triggers a full refresh, ignoring the individual INSERT events. The full refresh re-executes the defining query against the current state of orders, which now contains only charlie and dave:

-- After the next scheduled refresh:
SELECT * FROM customer_totals;
 customer | total  | order_count
----------|--------|------------
 charlie  | 300.00 | 2            ← correct
 dave     | 150.00 | 1            ← correct

The old data (alice, bob) is gone because the full refresh recomputed from scratch. This is correct — the TRUNCATE marker ensures consistency regardless of what other changes occurred in the same window.


Case 4: TRUNCATE a Dimension Table in a JOIN

Consider a stream table that joins two tables:

CREATE TABLE customers (
    id   SERIAL PRIMARY KEY,
    name TEXT NOT NULL,
    tier TEXT NOT NULL DEFAULT 'standard'
);

CREATE TABLE orders (
    id          SERIAL PRIMARY KEY,
    customer_id INT REFERENCES customers(id),
    amount      NUMERIC(10,2)
);

SELECT pgtrickle.create_stream_table(
    name         => 'order_details',
    query        => $$
      SELECT c.name, c.tier, o.amount
      FROM orders o
      JOIN customers c ON o.customer_id = c.id
    $$,
    schedule => '1m'
);

Now truncate the dimension table:

TRUNCATE customers CASCADE;

The CASCADE also truncates orders (due to the foreign key). Both tables have TRUNCATE triggers installed, so both write a 'T' marker to their respective change buffers.

On the next refresh cycle, the scheduler detects the TRUNCATE markers and performs a full refresh. The stream table is recomputed from the now-empty base tables:

-- After the next scheduled refresh:
SELECT * FROM order_details;
-- (0 rows) — correct

Case 5: FULL Mode Stream Tables Are Immune

If the stream table uses FULL refresh mode instead of DIFFERENTIAL:

SELECT pgtrickle.create_stream_table(
    name         => 'customer_totals_full',
    query        => $$
      SELECT customer, SUM(amount) AS total, COUNT(*) AS order_count
      FROM orders GROUP BY customer
    $$,
    schedule     => '1m',
    refresh_mode => 'FULL'
);

A FULL-mode stream table doesn't use the change buffer at all. Every refresh cycle:

  1. TRUNCATEs the stream table
  2. Re-executes the defining query
  3. Inserts all results

So after a TRUNCATE of the base table, the next scheduled refresh automatically picks up the correct state — no manual intervention needed. The trade-off is that every refresh recomputes from scratch, which is more expensive for large result sets.


Why PostgreSQL Doesn't Fire Row Triggers on TRUNCATE

Understanding the PostgreSQL internals helps explain why per-row capture is impossible:

Operation                | Mechanism                                                   | Row triggers fired?        | Statement triggers fired?
-------------------------|-------------------------------------------------------------|----------------------------|--------------------------
DELETE FROM t            | Scans and removes rows one by one                           | Yes — AFTER DELETE per row | Yes
TRUNCATE t               | Removes all heap files and reinitializes the table storage | No — no per-row processing | Yes — AFTER TRUNCATE
DELETE FROM t WHERE true | Same as DELETE FROM t (full scan)                           | Yes — AFTER DELETE per row | Yes

TRUNCATE is fundamentally different from DELETE. It's an O(1) operation that replaces the table's storage files, while DELETE is O(N) — scanning every row and recording each removal in WAL.

pg_trickle uses a statement-level AFTER TRUNCATE trigger to detect the event and write a 'T' marker to the change buffer. This marker does not contain per-row data (PostgreSQL's TRUNCATE trigger doesn't provide OLD records), but it's sufficient to signal that a full refresh is needed.


Alternative: DELETE FROM Instead of TRUNCATE

For DIFFERENTIAL mode, TRUNCATE is now handled automatically (via the 'T' marker and full refresh fallback). However, using DELETE FROM instead of TRUNCATE has its own advantages:

-- Instead of: TRUNCATE orders;
DELETE FROM orders;

This fires the row-level DELETE trigger for every row. The change buffer captures all removals, and the next differential refresh correctly decrements all reference counts through the standard algebraic delta path — avoiding the need for a full refresh fallback.

Approach                  | Speed                 | Stream table consistent?                   | Refresh type
--------------------------|-----------------------|--------------------------------------------|----------------
TRUNCATE orders           | O(1) — instant        | Yes — automatic full refresh on next cycle | FULL (fallback)
DELETE FROM orders        | O(N) — scans all rows | Yes — per-row triggers fire                | DIFFERENTIAL
TRUNCATE + manual refresh | O(1) + O(query)       | Yes — immediately                          | FULL (manual)

For tables with millions of rows, DELETE FROM can be slow and generate significant WAL. TRUNCATE is generally the better choice — the automatic full refresh fallback makes it safe to use.
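If you do want row-level capture on a very large table, one common compromise (a generic PostgreSQL pattern, not specific to pg_trickle) is to delete in bounded batches so each transaction, and therefore the change buffer, stays small:

-- Repeat until it reports DELETE 0
DELETE FROM orders
WHERE ctid IN (SELECT ctid FROM orders LIMIT 10000);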


Best Practices

1. TRUNCATE Is Safe to Use

As of v0.2.0, TRUNCATE on tracked source tables is automatically detected and triggers a full refresh on the next scheduler cycle. No manual intervention is required for standard operation.

2. Use Manual Refresh for Immediate Consistency

If you need the stream table to be consistent immediately (not on the next cycle), call refresh explicitly:

TRUNCATE orders;
SELECT pgtrickle.refresh_stream_table('customer_totals');

3. Consider IMMEDIATE Mode for Real-Time Needs

For stream tables that need to reflect TRUNCATE instantly (within the same transaction), use IMMEDIATE mode. The TRUNCATE trigger automatically performs a full refresh synchronously.

4. Consider FULL Mode for ETL-Heavy Tables

If a table is routinely truncated and reloaded, FULL refresh mode may be simpler than DIFFERENTIAL — it naturally handles TRUNCATE because it recomputes from scratch every cycle.

5. Use trigger_inventory() to Verify Triggers

You can verify that both the DML trigger and the TRUNCATE trigger are installed and enabled:

SELECT * FROM pgtrickle.trigger_inventory();

This shows one row per (source table, trigger type) confirming both pg_trickle_cdc_<oid> (DML) and pg_trickle_cdc_truncate_<oid> (TRUNCATE) triggers are present.


How TRUNCATE Compares to Other Operations

Aspect                   | INSERT                   | UPDATE                   | DELETE                   | TRUNCATE
-------------------------|--------------------------|--------------------------|--------------------------|----------------------------------
Row trigger fires?       | Yes (per row)            | Yes (per row)            | Yes (per row)            | No
Statement trigger fires? | Yes                      | Yes                      | Yes                      | Yes (writes 'T' marker)
Change buffer            | 1 row per INSERT         | 1 row per UPDATE         | 1 row per DELETE         | 1 marker row (action='T')
Stream table updated?    | Yes (next refresh)       | Yes (next refresh)       | Yes (next refresh)       | Yes (full refresh on next cycle)
Recovery                 | Automatic (differential) | Automatic (differential) | Automatic (differential) | Automatic (full refresh fallback)
FULL mode affected?      | N/A (recomputes)         | N/A (recomputes)         | N/A (recomputes)         | N/A (recomputes)
IMMEDIATE mode?          | Synchronous delta        | Synchronous delta        | Synchronous delta        | Synchronous full refresh
Speed                    | O(1) per row             | O(1) per row             | O(1) per row             | O(1) + O(query) for refresh

What About IMMEDIATE Mode?

In IMMEDIATE mode, TRUNCATE is handled synchronously within the same transaction:

  1. The BEFORE TRUNCATE trigger acquires an advisory lock on the stream table
  2. The AFTER TRUNCATE trigger calls pgt_ivm_handle_truncate(pgt_id)
  3. This function TRUNCATEs the stream table and re-populates it by re-executing the defining query
  4. The stream table is immediately consistent — within the same transaction:

SELECT pgtrickle.create_stream_table(
    name         => 'customer_totals_live',
    query        => $$
      SELECT customer, SUM(amount) AS total, COUNT(*) AS order_count
      FROM orders GROUP BY customer
    $$,
    refresh_mode => 'IMMEDIATE'
);

BEGIN;
TRUNCATE orders;
-- customer_totals_live is already empty here!
SELECT * FROM customer_totals_live;  -- (0 rows)
COMMIT;

No waiting for a scheduler cycle, no stale data — TRUNCATE is fully handled in real-time.


Summary

As of v0.2.0, TRUNCATE is fully tracked by pg_trickle across all three refresh modes. While it cannot be captured as per-row DELETE events (PostgreSQL's TRUNCATE doesn't process individual rows), pg_trickle uses a statement-level trigger to detect the event and respond appropriately.

The key takeaways:

  1. TRUNCATE is automatically handled — a statement-level AFTER TRUNCATE trigger writes a 'T' marker to the change buffer
  2. DIFFERENTIAL mode: automatic full refresh — the scheduler detects the marker and falls back to a full refresh on the next cycle
  3. IMMEDIATE mode: synchronous full refresh — the stream table is rebuilt within the same transaction
  4. FULL mode: naturally immune — every refresh recomputes from scratch regardless
  5. Manual refresh for instant consistency — call pgtrickle.refresh_stream_table() if you can't wait for the next cycle
  6. DELETE FROM remains an alternative — fires per-row triggers, enabling incremental delta processing instead of full refresh fallback

Next in This Series

Tutorial: Build a Real-Time Analytics Dashboard

DOC-NEW-24 (v0.57.0) — End-to-end tutorial: build the backend for a real-time analytics dashboard over a sample e-commerce dataset.

What You Will Build

A real-time analytics backend that powers three dashboard panels:

  1. Revenue by region — running totals updated within seconds of each order
  2. Hourly order counts — time-bucketed activity feed for trend charts
  3. Top 10 products — a continuously maintained leaderboard by revenue

All three panels are backed by pg_trickle stream tables, so they refresh incrementally — only the rows that actually changed are recomputed.


Prerequisites

  • PostgreSQL 18 with pg_trickle installed (see Installation)
  • psql or any SQL client

Step 1 — Create the Source Tables

-- Orders: the core transaction table
CREATE TABLE orders (
    id          BIGSERIAL PRIMARY KEY,
    region      TEXT        NOT NULL,
    product_id  BIGINT      NOT NULL,
    amount      NUMERIC(12,2) NOT NULL,
    placed_at   TIMESTAMPTZ DEFAULT now()
);

-- Products: the product catalogue
CREATE TABLE products (
    id          BIGSERIAL PRIMARY KEY,
    name        TEXT NOT NULL,
    category    TEXT NOT NULL
);

-- Seed products
INSERT INTO products (name, category) VALUES
    ('Laptop Pro 15',   'Electronics'),
    ('Wireless Keyboard', 'Electronics'),
    ('Standing Desk',   'Furniture'),
    ('Ergonomic Chair', 'Furniture'),
    ('USB-C Hub',       'Electronics');

Step 2 — Enable pg_trickle

CREATE EXTENSION IF NOT EXISTS pg_trickle;

Step 3 — Revenue by Region

This panel shows the total revenue in each region, updated automatically as orders arrive.

SELECT pgtrickle.create_stream_table(
    name     => 'revenue_by_region',
    query    => $$
        SELECT
            region,
            COUNT(*)             AS order_count,
            SUM(amount)          AS total_revenue,
            AVG(amount)          AS avg_order_value,
            MAX(placed_at)       AS last_order_at
        FROM orders
        GROUP BY region
    $$,
    schedule => '5s',
    refresh_mode => 'DIFFERENTIAL'
);

-- Index for fast dashboard lookups
CREATE INDEX ON revenue_by_region (region);

Dashboard query:

SELECT region,
       order_count,
       total_revenue,
       avg_order_value,
       last_order_at
FROM revenue_by_region
ORDER BY total_revenue DESC;

Step 4 — Hourly Order Counts

A time-series stream table that aggregates orders into one-hour buckets. date_trunc('hour', ...) is a STABLE function, so DIFFERENTIAL mode works.

SELECT pgtrickle.create_stream_table(
    name     => 'hourly_order_counts',
    query    => $$
        SELECT
            date_trunc('hour', placed_at)  AS hour,
            region,
            COUNT(*)                        AS order_count,
            SUM(amount)                     AS hourly_revenue
        FROM orders
        GROUP BY date_trunc('hour', placed_at), region
    $$,
    schedule => '10s',
    refresh_mode => 'DIFFERENTIAL'
);

CREATE INDEX ON hourly_order_counts (hour DESC, region);

Dashboard query — last 24 hours:

SELECT hour,
       region,
       order_count,
       hourly_revenue
FROM hourly_order_counts
WHERE hour >= now() - interval '24 hours'
ORDER BY hour DESC, region;

Step 5 — Top 10 Products by Revenue

A leaderboard of the top-selling products. The query joins orders to products to include the product name.

SELECT pgtrickle.create_stream_table(
    name     => 'top_products',
    query    => $$
        SELECT
            p.id                    AS product_id,
            p.name                  AS product_name,
            p.category,
            COUNT(o.id)             AS order_count,
            SUM(o.amount)           AS total_revenue
        FROM orders o
        JOIN products p ON p.id = o.product_id
        GROUP BY p.id, p.name, p.category
        ORDER BY SUM(o.amount) DESC
        LIMIT 10
    $$,
    schedule => '5s',
    refresh_mode => 'DIFFERENTIAL'
);

Dashboard query:

SELECT product_name,
       category,
       order_count,
       total_revenue
FROM top_products
ORDER BY total_revenue DESC;

Note: LIMIT N stream tables use differential TOP-K maintenance — pg_trickle tracks the rank boundary and recomputes only when a row enters or exits the top 10.
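
To see the boundary maintenance in action, push a product into the leaderboard with one large order and check the result after the next 5s cycle (a quick sketch using this tutorial's tables):

-- A single large order for the USB-C Hub (product 5)
INSERT INTO orders (region, product_id, amount) VALUES ('US', 5, 9999.00);

-- After the next refresh, product 5 should rank near the top
SELECT product_name, total_revenue
FROM top_products
ORDER BY total_revenue DESC;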


Step 6 — Insert Some Sample Data and Watch It Update

-- Simulate a batch of orders
INSERT INTO orders (region, product_id, amount) VALUES
    ('US',   1, 1299.00),
    ('EU',   2,  149.99),
    ('US',   3,  799.00),
    ('APAC', 1, 1299.00),
    ('US',   4,  599.00),
    ('EU',   5,   49.99),
    ('US',   1, 1299.00);

-- Wait a few seconds, then query the stream tables
SELECT * FROM revenue_by_region ORDER BY total_revenue DESC;
SELECT * FROM top_products;

Step 7 — Chain the Stream Tables (Optional)

You can build derived stream tables on top of other stream tables. For example, compute a "daily summary" that reads from hourly_order_counts:

SELECT pgtrickle.create_stream_table(
    name     => 'daily_revenue_summary',
    query    => $$
        SELECT
            date_trunc('day', hour) AS day,
            region,
            SUM(order_count)        AS total_orders,
            SUM(hourly_revenue)     AS total_revenue
        FROM hourly_order_counts
        GROUP BY date_trunc('day', hour), region
    $$,
    schedule => '30s',
    refresh_mode => 'DIFFERENTIAL'
);

pg_trickle automatically builds a dependency DAG: when orders changes, it refreshes hourly_order_counts first, then daily_revenue_summary in topological order.
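
A quick way to observe the cascade (a sketch using this tutorial's tables):

-- A new order flows into hourly_order_counts first ...
INSERT INTO orders (region, product_id, amount) VALUES ('EU', 2, 149.99);

-- ... within ~10s:
SELECT * FROM hourly_order_counts WHERE region = 'EU' ORDER BY hour DESC LIMIT 1;

-- ... and into daily_revenue_summary on its next cycle (~30s):
SELECT * FROM daily_revenue_summary WHERE region = 'EU' ORDER BY day DESC LIMIT 1;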


Step 8 — Optional: Grafana Data-Source Configuration

If you use Grafana with the PostgreSQL data source plugin, point it at the database and use the stream tables directly as query targets:

-- Grafana time-series panel query (hourly revenue per region)
SELECT
    $__timeGroupAlias(hour, '1h'),
    region,
    SUM(hourly_revenue) AS revenue
FROM hourly_order_counts
WHERE $__timeFilter(hour)
GROUP BY 1, 2
ORDER BY 1;

Set Refresh to 5s in the Grafana panel options to poll for updates.


Monitor the Stream Tables

-- Check that the stream tables (including the optional daily summary)
-- are ACTIVE and refreshing
SELECT pgt_name, status, refresh_mode,
       last_refresh_at,
       consecutive_errors
FROM pgtrickle.pgt_stream_tables
WHERE pgt_name IN ('revenue_by_region', 'hourly_order_counts',
                   'top_products', 'daily_revenue_summary')
ORDER BY pgt_name;

Clean Up

SELECT pgtrickle.drop_stream_table('daily_revenue_summary');
SELECT pgtrickle.drop_stream_table('top_products');
SELECT pgtrickle.drop_stream_table('hourly_order_counts');
SELECT pgtrickle.drop_stream_table('revenue_by_region');

DROP TABLE orders;
DROP TABLE products;

Next Steps

Tutorial: Stream Tables as Event-Sourced Read Models

DOC-NEW-25 (v0.57.0) — End-to-end tutorial: use stream tables as read-model projections over an event-sourced write model. Models an order-processing domain with CQRS pattern and event-replay guidance.

What You Will Build

An event-sourced order-processing system where:

  • Writes go to an immutable order_events table (the event log)
  • Reads are served by three stream tables (the read models):
    • current_order_state — current status of each order
    • customer_lifetime_value — rolling spend and order count per customer
    • inventory_levels — current stock count derived from reservation events

This is the CQRS (Command Query Responsibility Segregation) pattern: the write model is append-only events; the read models are incrementally maintained projections.


Prerequisites

  • PostgreSQL 18 with pg_trickle installed (see Installation)
  • Basic familiarity with event sourcing concepts

Step 1 — The Event Log (Write Model)

The event log is a single append-only table. Every mutation to an order is recorded as a row. The table is never updated or deleted from — only new events are appended.

CREATE TYPE order_event_type AS ENUM (
    'ORDER_PLACED',
    'PAYMENT_RECEIVED',
    'PAYMENT_FAILED',
    'SHIPPED',
    'DELIVERED',
    'CANCELLED',
    'REFUNDED',
    'ITEM_RESERVED',
    'ITEM_RELEASED'
);

CREATE TABLE order_events (
    id           BIGSERIAL PRIMARY KEY,
    event_type   order_event_type NOT NULL,
    order_id     UUID NOT NULL,
    customer_id  UUID NOT NULL,
    product_id   BIGINT,
    quantity     INT,
    amount       NUMERIC(12,2),
    payload      JSONB,
    occurred_at  TIMESTAMPTZ DEFAULT now()
);

-- Immutability enforced: the rules rewrite UPDATE and DELETE to silent no-ops
CREATE RULE no_update_order_events AS ON UPDATE TO order_events DO INSTEAD NOTHING;
CREATE RULE no_delete_order_events AS ON DELETE TO order_events DO INSTEAD NOTHING;

Step 2 — Enable pg_trickle

CREATE EXTENSION IF NOT EXISTS pg_trickle;

Step 3 — Current Order State (Read Model)

This stream table folds all events for each order into its current state. FILTER (WHERE ...) aggregates extract the latest relevant event data per event type.

SELECT pgtrickle.create_stream_table(
    name     => 'current_order_state',
    query    => $$
        SELECT
            order_id,
            customer_id,
            MAX(occurred_at)
                FILTER (WHERE event_type = 'ORDER_PLACED')      AS placed_at,
            MAX(occurred_at)
                FILTER (WHERE event_type = 'PAYMENT_RECEIVED')  AS paid_at,
            MAX(occurred_at)
                FILTER (WHERE event_type = 'SHIPPED')           AS shipped_at,
            MAX(occurred_at)
                FILTER (WHERE event_type = 'DELIVERED')         AS delivered_at,
            MAX(occurred_at)
                FILTER (WHERE event_type = 'CANCELLED')         AS cancelled_at,
            SUM(amount)
                FILTER (WHERE event_type = 'ORDER_PLACED')      AS order_total,
            CASE
                WHEN BOOL_OR(event_type = 'CANCELLED')   THEN 'cancelled'
                WHEN BOOL_OR(event_type = 'DELIVERED')   THEN 'delivered'
                WHEN BOOL_OR(event_type = 'SHIPPED')     THEN 'shipped'
                WHEN BOOL_OR(event_type = 'PAYMENT_RECEIVED') THEN 'paid'
                WHEN BOOL_OR(event_type = 'PAYMENT_FAILED')   THEN 'payment_failed'
                ELSE 'placed'
            END AS status
        FROM order_events
        WHERE event_type IN (
            'ORDER_PLACED', 'PAYMENT_RECEIVED', 'PAYMENT_FAILED',
            'SHIPPED', 'DELIVERED', 'CANCELLED'
        )
        GROUP BY order_id, customer_id
    $$,
    schedule     => '5s',
    refresh_mode => 'DIFFERENTIAL'
);

CREATE INDEX ON current_order_state (order_id);
CREATE INDEX ON current_order_state (customer_id, placed_at DESC);
CREATE INDEX ON current_order_state (status, placed_at DESC);

Read-model query — active orders for a customer:

SELECT order_id,
       status,
       order_total,
       placed_at,
       shipped_at
FROM current_order_state
WHERE customer_id = $1
  AND status NOT IN ('delivered', 'cancelled')
ORDER BY placed_at DESC;

Step 4 — Customer Lifetime Value (Read Model)

Track rolling spend and order count per customer.

SELECT pgtrickle.create_stream_table(
    name     => 'customer_lifetime_value',
    query    => $$
        SELECT
            customer_id,
            COUNT(DISTINCT order_id)                              AS total_orders,
            SUM(amount)
                FILTER (WHERE event_type = 'PAYMENT_RECEIVED')   AS total_spent,
            SUM(amount)
                FILTER (WHERE event_type = 'REFUNDED')           AS total_refunded,
            SUM(amount)
                FILTER (WHERE event_type = 'PAYMENT_RECEIVED') -
            COALESCE(SUM(amount)
                FILTER (WHERE event_type = 'REFUNDED'), 0)       AS net_revenue,
            MIN(occurred_at)
                FILTER (WHERE event_type = 'ORDER_PLACED')       AS first_order_at,
            MAX(occurred_at)
                FILTER (WHERE event_type = 'ORDER_PLACED')       AS last_order_at
        FROM order_events
        WHERE event_type IN ('ORDER_PLACED', 'PAYMENT_RECEIVED', 'REFUNDED')
        GROUP BY customer_id
    $$,
    schedule     => '10s',
    refresh_mode => 'DIFFERENTIAL'
);

CREATE INDEX ON customer_lifetime_value (customer_id);
CREATE INDEX ON customer_lifetime_value (net_revenue DESC);

Read-model query — top customers by net revenue:

SELECT customer_id,
       total_orders,
       net_revenue,
       last_order_at
FROM customer_lifetime_value
ORDER BY net_revenue DESC
LIMIT 20;

Step 5 — Inventory Levels (Read Model)

Derive current stock counts from ITEM_RESERVED and ITEM_RELEASED events.

SELECT pgtrickle.create_stream_table(
    name     => 'inventory_levels',
    query    => $$
        SELECT
            product_id,
            SUM(CASE
                    WHEN event_type = 'ITEM_RESERVED' THEN -quantity
                    WHEN event_type = 'ITEM_RELEASED' THEN  quantity
                    ELSE 0
                END) AS reserved_delta,
            SUM(quantity)
                FILTER (WHERE event_type = 'ITEM_RESERVED') AS total_reserved,
            SUM(quantity)
                FILTER (WHERE event_type = 'ITEM_RELEASED') AS total_released,
            COUNT(DISTINCT order_id)
                FILTER (WHERE event_type = 'ITEM_RESERVED') AS active_reservations
        FROM order_events
        WHERE event_type IN ('ITEM_RESERVED', 'ITEM_RELEASED')
          AND product_id IS NOT NULL
        GROUP BY product_id
    $$,
    schedule     => '5s',
    refresh_mode => 'DIFFERENTIAL'
);

CREATE INDEX ON inventory_levels (product_id);

Step 6 — Try It with Sample Events

-- A customer places an order
-- (fixed example UUIDs: 'ord-001'-style strings are not valid uuid literals)
INSERT INTO order_events (event_type, order_id, customer_id, product_id, quantity, amount) VALUES
    ('ORDER_PLACED',  '11111111-1111-1111-1111-111111111111'::uuid,
                      'aaaaaaaa-aaaa-4aaa-8aaa-aaaaaaaaaaaa'::uuid, 1, 2, 2598.00),
    ('ITEM_RESERVED', '11111111-1111-1111-1111-111111111111'::uuid,
                      'aaaaaaaa-aaaa-4aaa-8aaa-aaaaaaaaaaaa'::uuid, 1, 2, NULL);

-- Payment succeeds
INSERT INTO order_events (event_type, order_id, customer_id, amount) VALUES
    ('PAYMENT_RECEIVED', '11111111-1111-1111-1111-111111111111'::uuid,
                         'aaaaaaaa-aaaa-4aaa-8aaa-aaaaaaaaaaaa'::uuid, 2598.00);

-- Order ships
INSERT INTO order_events (event_type, order_id, customer_id) VALUES
    ('SHIPPED', '11111111-1111-1111-1111-111111111111'::uuid,
                'aaaaaaaa-aaaa-4aaa-8aaa-aaaaaaaaaaaa'::uuid);

-- Wait for the scheduler, then read the projections
SELECT * FROM current_order_state
WHERE order_id = '11111111-1111-1111-1111-111111111111'::uuid;
SELECT * FROM customer_lifetime_value
WHERE customer_id = 'aaaaaaaa-aaaa-4aaa-8aaa-aaaaaaaaaaaa'::uuid;
SELECT * FROM inventory_levels WHERE product_id = 1;

Step 7 — CQRS Pattern Summary

Write path:                         Read path:
──────────────────                  ──────────────────────────────────
Application layer                   Dashboard / API queries
       │                                       │
       │ INSERT INTO order_events              │ SELECT FROM current_order_state
       │                                       │ SELECT FROM customer_lifetime_value
       ▼                                       │ SELECT FROM inventory_levels
order_events (event log)                       ▲
       │                                       │
       │ pg_trickle CDC triggers               │ pg_trickle differential refresh
       └──────────────────────────────────────►│
                    (incremental, per schedule)

The application layer writes only to order_events. pg_trickle handles all projection maintenance automatically.
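
In practice, the write path is often wrapped in small command functions so application code never builds event rows by hand. A minimal sketch (place_order is an illustrative helper, not part of pg_trickle):

CREATE FUNCTION place_order(
    p_order_id    UUID,
    p_customer_id UUID,
    p_product_id  BIGINT,
    p_quantity    INT,
    p_amount      NUMERIC
) RETURNS void
LANGUAGE sql
AS $$
    -- One command = two appended events: the order plus its reservation
    INSERT INTO order_events
        (event_type, order_id, customer_id, product_id, quantity, amount)
    VALUES
        ('ORDER_PLACED',  p_order_id, p_customer_id, p_product_id, p_quantity, p_amount),
        ('ITEM_RESERVED', p_order_id, p_customer_id, p_product_id, p_quantity, NULL);
$$;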


Step 8 — Event Replay and Backfill

If you need to rebuild a projection from scratch (e.g., after changing the defining query), use the reinitialize API:

-- Force a full rebuild of the current_order_state projection
SELECT pgtrickle.reinitialize_stream_table('current_order_state');

This triggers a FULL refresh from the event log, rebuilding the projection from all historical events. Once complete, pg_trickle switches back to differential maintenance automatically.

Backfill workflow for a new projection:

-- 1. Create the new projection in FULL mode for the initial backfill,
--    then switch to DIFFERENTIAL once it is populated
SELECT pgtrickle.create_stream_table(
    name         => 'new_projection',
    query        => '...',
    schedule     => '5s',
    refresh_mode => 'FULL'       -- use FULL for initial backfill
);

-- 2. Wait for the first full cycle to complete
SELECT pgt_name, status, last_refresh_at
FROM pgtrickle.pgt_stream_tables
WHERE pgt_name = 'new_projection';

-- 3. Once status = 'ACTIVE', switch to DIFFERENTIAL
SELECT pgtrickle.alter_stream_table('new_projection',
    refresh_mode => 'DIFFERENTIAL'
);

Monitor the Projections

SELECT pgt_name, status, refresh_mode,
       last_refresh_at,
       consecutive_errors,
       rows_in_last_refresh
FROM pgtrickle.pgt_stream_tables
WHERE pgt_name IN ('current_order_state', 'customer_lifetime_value',
                   'inventory_levels')
ORDER BY pgt_name;

Clean Up

SELECT pgtrickle.drop_stream_table('inventory_levels');
SELECT pgtrickle.drop_stream_table('customer_lifetime_value');
SELECT pgtrickle.drop_stream_table('current_order_state');

DROP TABLE order_events;
DROP TYPE order_event_type;

Next Steps

Tutorial: Zero-Downtime Migration from Materialized Views

DOC-NEW-26 (v0.57.0) — Step-by-step guide: migrate a manually-maintained materialized view to a stream table with zero downtime.

Overview

This tutorial walks through migrating an existing REFRESH MATERIALIZED VIEW workflow to a pg_trickle stream table without any downtime or data loss. The process runs the old view and the new stream table in parallel so you can verify correctness before cutting over consumers.


Prerequisites

  • PostgreSQL 18 with pg_trickle installed (see Installation)
  • An existing MATERIALIZED VIEW with a known refresh schedule
  • At least SELECT access to the materialized view

Step 1 — Pre-Migration Assessment

Before migrating, assess whether the defining query is IVM-eligible.

-- Check the current materialized view definition
SELECT schemaname, matviewname, definition
FROM pg_matviews
WHERE matviewname = 'my_view';

-- Validate the query against pg_trickle's IVM compatibility checker
SELECT pgtrickle.validate_query(
    $$<paste your view definition here>$$
);

Example output:

 result  | detail
---------+--------------------------------------------------------------
 ok      | Query is IVM-eligible. Recommended mode: DIFFERENTIAL

If validate_query returns ok, the migration is straightforward. If it returns warnings or not_eligible, check the detail column — common reasons include volatile functions or unsupported SQL patterns. See LIMITATIONS.md for the full list.
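
For illustration only (the exact wording varies by version), a query using a volatile function might return something like:

 result       | detail
--------------+--------------------------------------------------------------
 not_eligible | Volatile function random() prevents differential maintenance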

For non-eligible queries, use refresh_mode => 'FULL' — pg_trickle will maintain a full-refresh stream table automatically, eliminating the manual REFRESH MATERIALIZED VIEW calls.


Step 2 — Create the Stream Table in Parallel

Do not drop the old materialized view yet. Create the stream table alongside it, pointing at the same source tables.

-- Example: migrating this materialized view
-- CREATE MATERIALIZED VIEW orders_summary AS
--     SELECT region, COUNT(*) AS order_count, SUM(amount) AS total
--     FROM orders GROUP BY region;

-- Create the equivalent stream table
SELECT pgtrickle.create_stream_table(
    name         => 'orders_summary_st',
    query        => $$
        SELECT region,
               COUNT(*)   AS order_count,
               SUM(amount) AS total
        FROM orders
        GROUP BY region
    $$,
    schedule     => '10s',
    refresh_mode => 'DIFFERENTIAL'
);

The stream table will populate within one refresh cycle. Check status:

SELECT pgt_name, status, last_refresh_at, rows_in_last_refresh
FROM pgtrickle.pgt_stream_tables
WHERE pgt_name = 'orders_summary_st';

Step 3 — Verify Output Parity

Compare the outputs of both the old view and the new stream table:

-- Rows in old view but not in stream table
SELECT * FROM orders_summary
EXCEPT
SELECT * FROM orders_summary_st;

-- Rows in stream table but not in old view
SELECT * FROM orders_summary_st
EXCEPT
SELECT * FROM orders_summary;

Both queries should return zero rows. If there are differences:

  1. Check that the stream table has had at least one full refresh cycle.
  2. Verify the defining query is identical (column order and aliases matter).
  3. Run SELECT pgtrickle.reinitialize_stream_table('orders_summary_st') to force a clean full refresh if there is any doubt.

For long-running parallel validation, write DML to the source table and verify both targets update correctly:

INSERT INTO orders (region, amount) VALUES ('TEST', 99.99);

-- Both should show the TEST region
SELECT * FROM orders_summary     WHERE region = 'TEST';
SELECT * FROM orders_summary_st  WHERE region = 'TEST';

Step 4 — Create a Compatibility View (Optional)

If consumers reference orders_summary by name and you cannot update them before cutover, create a compatibility view that points to the stream table:

-- 1. Rename the old materialized view
ALTER MATERIALIZED VIEW orders_summary RENAME TO orders_summary_old;

-- 2. Create a regular view with the original name, reading from the ST
CREATE VIEW orders_summary AS SELECT * FROM orders_summary_st;

Consumers now read from the stream table transparently. The old materialized view remains as a fallback.


Step 5 — Consumer Cutover

Once parallel validation passes, cut over consumers:

Option A — Direct table reference (recommended): Update consumer queries/code to reference orders_summary_st directly. This is the cleanest path and gives consumers the full benefit of pg_trickle's automatic freshness.

Option B — Keep the compatibility view: If you created the compatibility view in Step 4, consumers already read from the stream table. No further changes needed.


Step 6 — Remove the Old Materialized View

After confirming all consumers use the stream table:

-- Remove the old materialized view
DROP MATERIALIZED VIEW orders_summary_old;

-- If you kept the compatibility view, optionally rename the stream table
-- to match the original name:
--   SELECT pgtrickle.drop_stream_table('orders_summary_st');
--   SELECT pgtrickle.create_stream_table('orders_summary', ...)
-- Or rename the view to align naming conventions.

Also remove any cron jobs or application code that called REFRESH MATERIALIZED VIEW orders_summary.


Step 7 — Rollback Procedure

If problems arise after cutover:

-- 1. Stop the stream table from refreshing
SELECT pgtrickle.pause_stream_table('orders_summary_st');

-- 2. Drop the compatibility view if you created one
--    (this frees the original name for the rename below)
DROP VIEW IF EXISTS orders_summary;

-- 3. Revert consumers to the old materialized view
--    (if you renamed it in Step 4)
ALTER MATERIALIZED VIEW orders_summary_old RENAME TO orders_summary;

-- 4. Resume your manual refresh schedule
-- (add back the cron job / pg_cron entry for REFRESH MATERIALIZED VIEW)

-- 5. Optionally drop the stream table
SELECT pgtrickle.drop_stream_table('orders_summary_st');

Common Migration Patterns

Non-IVM-eligible queries (use FULL mode)

-- Query uses a volatile function; pg_trickle will use FULL refresh
SELECT pgtrickle.create_stream_table(
    name         => 'hourly_snapshot',
    query        => $$
        SELECT *, now() AS snapshot_at FROM large_table
    $$,
    schedule     => '1h',
    refresh_mode => 'FULL'
);

Concurrently-refreshed materialized views

If the old view used REFRESH MATERIALIZED VIEW CONCURRENTLY, note that pg_trickle's MERGE-based update is also non-blocking for readers. No special configuration is needed.

Views with WITH DATA at creation

pg_trickle always populates the stream table on the first cycle, equivalent to WITH DATA. The WITHOUT DATA option does not apply.


Post-Migration Checklist

  • pgtrickle.validate_query() returned ok or migration is in FULL mode
  • Stream table reached status = 'ACTIVE'
  • EXCEPT diff queries return zero rows
  • Manual REFRESH MATERIALIZED VIEW calls removed from cron/pg_cron
  • Old materialized view dropped (or retained as read-only archive)
  • Consumer queries point to the stream table

Next Steps

Tutorial: Security Hardening for pg_trickle

DOC-NEW-27 (v0.57.0) — Step-by-step security hardening guide: dedicated roles, CDC trigger ownership, change-buffer protection, and audit logging.

Overview

This guide hardens a pg_trickle installation following the principle of least privilege. After completing these steps:

  • Stream tables are owned by a dedicated non-superuser role.
  • Application users can read (but not write) stream tables.
  • Change buffers are protected from direct application access.
  • DDL operations against stream tables are audit-logged.

Prerequisites

  • PostgreSQL 18 with pg_trickle installed as a superuser
  • psql or an admin SQL client

Step 1 — Create Dedicated Roles

Run these statements as a superuser (e.g., postgres).

-- ─── pgtrickle_admin ──────────────────────────────────────────────────────
-- Manages stream tables: create, alter, drop, reinitialize.
-- Intended for DBAs and data engineers.
CREATE ROLE pgtrickle_admin NOLOGIN NOINHERIT;

GRANT USAGE  ON SCHEMA pgtrickle         TO pgtrickle_admin;
GRANT USAGE  ON SCHEMA pgtrickle_changes TO pgtrickle_admin;
GRANT EXECUTE ON ALL FUNCTIONS IN SCHEMA pgtrickle TO pgtrickle_admin;

-- Allow creating stream tables in the public schema
GRANT CREATE ON SCHEMA public TO pgtrickle_admin;

-- Allow reading source tables (add schemas as required)
GRANT USAGE  ON SCHEMA public TO pgtrickle_admin;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO pgtrickle_admin;

-- Future tables in public schema (run once per schema)
ALTER DEFAULT PRIVILEGES IN SCHEMA public
    GRANT SELECT ON TABLES TO pgtrickle_admin;


-- ─── pgtrickle_user ───────────────────────────────────────────────────────
-- Reads stream tables and calls monitoring functions.
-- Intended for application backends.
CREATE ROLE pgtrickle_user NOLOGIN NOINHERIT;

GRANT USAGE  ON SCHEMA pgtrickle  TO pgtrickle_user;

-- Read-only access to stream tables (granted per-table below)
-- Monitoring functions
GRANT EXECUTE ON FUNCTION pgtrickle.pgt_status()             TO pgtrickle_user;
GRANT EXECUTE ON FUNCTION pgtrickle.refresh_efficiency()     TO pgtrickle_user;
GRANT EXECUTE ON FUNCTION pgtrickle.health_check()           TO pgtrickle_user;


-- ─── pgtrickle_readonly ───────────────────────────────────────────────────
-- Pure read access to stream tables only; no extension function access.
-- Intended for reporting tools and BI consumers.
CREATE ROLE pgtrickle_readonly NOLOGIN NOINHERIT;

GRANT USAGE ON SCHEMA public TO pgtrickle_readonly;
-- Per-table GRANT added below after stream tables are created.

Step 2 — Grant Roles to Login Roles

-- Example: your data engineer login
CREATE ROLE de_alice LOGIN PASSWORD '...';
GRANT pgtrickle_admin TO de_alice;

-- Example: your application backend login
CREATE ROLE app_backend LOGIN PASSWORD '...';
GRANT pgtrickle_user TO app_backend;

-- Example: your BI tool login
CREATE ROLE bi_tool LOGIN PASSWORD '...';
GRANT pgtrickle_readonly TO bi_tool;

Step 3 — Create Stream Tables Under the Admin Role

Connect as the pgtrickle_admin role (or SET ROLE pgtrickle_admin) and create stream tables. The admin role becomes the owner, not the superuser.

SET ROLE pgtrickle_admin;

SELECT pgtrickle.create_stream_table(
    name     => 'order_summary',
    query    => $$SELECT region, SUM(amount) AS total FROM orders GROUP BY region$$,
    schedule => '10s'
);

RESET ROLE;

Verify ownership:

SELECT tablename, tableowner
FROM pg_tables
WHERE tablename = 'order_summary';

Step 4 — Grant Read Access to Consumer Roles

-- pgtrickle_user: reads stream tables and calls monitoring functions
GRANT SELECT ON order_summary TO pgtrickle_user;

-- pgtrickle_readonly: pure read access
GRANT SELECT ON order_summary TO pgtrickle_readonly;

-- For future stream tables, set default privileges so new tables are
-- automatically accessible:
ALTER DEFAULT PRIVILEGES FOR ROLE pgtrickle_admin IN SCHEMA public
    GRANT SELECT ON TABLES TO pgtrickle_user;

ALTER DEFAULT PRIVILEGES FOR ROLE pgtrickle_admin IN SCHEMA public
    GRANT SELECT ON TABLES TO pgtrickle_readonly;

Step 5 — Protect Change Buffers

Change buffers in pgtrickle_changes should never be directly accessible to application users. Revoke all access and grant only to the extension owner:

-- Revoke PUBLIC access (if not already revoked during extension install)
REVOKE ALL ON SCHEMA pgtrickle_changes FROM PUBLIC;

-- Application roles must not see change buffer tables
REVOKE ALL ON ALL TABLES IN SCHEMA pgtrickle_changes FROM pgtrickle_user;
REVOKE ALL ON ALL TABLES IN SCHEMA pgtrickle_changes FROM pgtrickle_readonly;
REVOKE ALL ON ALL TABLES IN SCHEMA pgtrickle_changes FROM pgtrickle_admin;

-- Verify: this query should return zero rows for non-superuser roles
SELECT table_name
FROM information_schema.role_table_grants
WHERE table_schema = 'pgtrickle_changes'
  AND grantee IN ('pgtrickle_user', 'pgtrickle_readonly', 'pgtrickle_admin');

Step 6 — Secure CDC Trigger Ownership

CDC triggers on source tables are owned by the stream table owner (pgtrickle_admin). Verify this:

-- CDC triggers should be owned by pgtrickle_admin, not a superuser
SELECT trigger_name, event_object_table, action_statement
FROM information_schema.triggers
WHERE trigger_name LIKE 'pg_trickle_cdc_%'
ORDER BY event_object_table;

-- Verify trigger function ownership
SELECT proname, rolname AS owner
FROM pg_proc
JOIN pg_roles ON pg_roles.oid = pg_proc.proowner
WHERE proname LIKE 'pg_trickle_cdc_%';

If triggers are owned by postgres (the superuser), recreate the stream tables under pgtrickle_admin (drop and recreate via SET ROLE pgtrickle_admin).


Step 7 — Enable Audit Logging for Stream Table DDL

Use PostgreSQL's log_statement or pgaudit (if installed) to capture DDL events against pg_trickle objects.

Using log_statement (built-in)

-- Log all DDL operations (creates, alters, drops)
ALTER SYSTEM SET log_statement = 'ddl';
SELECT pg_reload_conf();

DDL against stream tables — including pgtrickle.create_stream_table(), pgtrickle.drop_stream_table(), and pgtrickle.alter_stream_table() — will appear in the PostgreSQL log.
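
The exact log format depends on your log_line_prefix; an illustrative (not authoritative) entry for a DDL statement against a stream table:

LOG:  statement: ALTER TABLE order_summary OWNER TO pgtrickle_admin;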

Using pgaudit

-- Install the pgaudit extension (if available)
CREATE EXTENSION IF NOT EXISTS pgaudit;

-- Audit all DDL and function calls in the pgtrickle schema
ALTER SYSTEM SET pgaudit.log = 'DDL, FUNCTION';
SELECT pg_reload_conf();

Query the pg_trickle DDL history

pg_trickle records every refresh in pgtrickle.pgt_refresh_history. For change-level audit trails:

-- All stream table DDL operations (create, alter, drop)
SELECT pgt_name, action, performed_by, performed_at
FROM pgtrickle.pgt_ddl_history
ORDER BY performed_at DESC
LIMIT 50;

-- Recent refresh failures
SELECT pgt_name, refresh_mode, started_at, error_message
FROM pgtrickle.pgt_refresh_history
WHERE NOT success
  AND started_at > now() - interval '24 hours'
ORDER BY started_at DESC;

Step 8 — Disable Extension Behaviour in Non-Refresh Environments

If you have replica databases or analysis environments where you do not want pg_trickle running refreshes:

-- Disable the scheduler without uninstalling the extension
ALTER SYSTEM SET pg_trickle.enabled = off;
SELECT pg_reload_conf();

Verification Checklist

After completing all steps, verify the hardened state:

-- 1. pgtrickle_admin can create stream tables
SET ROLE pgtrickle_admin;
SELECT pgtrickle.validate_query('SELECT 1');
RESET ROLE;

-- 2. pgtrickle_user can read stream tables but cannot modify them
SET ROLE pgtrickle_user;
SELECT * FROM order_summary LIMIT 1;   -- should succeed
-- INSERT INTO order_summary VALUES (...);  -- should fail with permission denied
RESET ROLE;

-- 3. pgtrickle_readonly cannot call extension functions
SET ROLE pgtrickle_readonly;
-- SELECT pgtrickle.refresh_stream_table('order_summary');  -- should fail
RESET ROLE;

-- 4. No application role can see change buffers
SELECT COUNT(*)
FROM information_schema.role_table_grants
WHERE table_schema = 'pgtrickle_changes'
  AND grantee NOT IN ('postgres', 'pg_trickle');
-- Expected: 0

Security Hardening Checklist

  • pgtrickle_admin role created with NOLOGIN NOINHERIT
  • pgtrickle_user role created for application backends
  • pgtrickle_readonly role created for BI / reporting tools
  • Stream tables owned by pgtrickle_admin, not a superuser
  • REVOKE ALL ON SCHEMA pgtrickle_changes FROM PUBLIC
  • Application roles have no access to pgtrickle_changes.*
  • Audit logging enabled (log_statement = 'ddl' or pgaudit)
  • pg_trickle.allow_circular = off (default)
  • pg_trickle.enabled = off on replica / analysis environments

Next Steps

Row-Level Security (RLS) on Stream Tables

This tutorial shows how to apply PostgreSQL Row-Level Security to stream tables so that different database roles see only the rows they are permitted to access.

Background

Stream tables materialize the full result set of their defining query, regardless of any RLS policies on the source tables. This matches the behavior of PostgreSQL's built-in MATERIALIZED VIEW — the cache contains everything, and access control is enforced at read time.

The recommended pattern is:

  1. Source tables: may or may not have RLS. Stream tables always see all rows.
  2. Stream table: enable RLS on the stream table and create per-role policies so each role sees only its permitted rows.

Setup: Multi-Tenant Orders

-- Source table: all tenant orders
CREATE TABLE orders (
    id        SERIAL PRIMARY KEY,
    tenant_id INT    NOT NULL,
    product   TEXT   NOT NULL,
    amount    NUMERIC(10,2) NOT NULL
);

INSERT INTO orders (tenant_id, product, amount) VALUES
    (1, 'Widget A', 19.99),
    (1, 'Widget B',  9.50),
    (2, 'Gadget X', 49.00),
    (2, 'Gadget Y', 25.00),
    (3, 'Doohickey', 5.00);

-- Stream table: per-tenant spend summary
SELECT pgtrickle.create_stream_table(
    name  => 'tenant_spend',
    query => $$
      SELECT tenant_id,
             COUNT(*)       AS order_count,
             SUM(amount)    AS total_spend
      FROM orders
      GROUP BY tenant_id
    $$,
    schedule => '1m'
);

After the first refresh, tenant_spend contains all three tenants:

SELECT * FROM pgtrickle.tenant_spend ORDER BY tenant_id;
--  tenant_id | order_count | total_spend
-- -----------+-------------+-------------
--          1 |           2 |       29.49
--          2 |           2 |       74.00
--          3 |           1 |        5.00

Step 1: Enable RLS on the Stream Table

ALTER TABLE pgtrickle.tenant_spend ENABLE ROW LEVEL SECURITY;

Once RLS is enabled, non-superuser roles see zero rows unless a policy grants access. Superusers always bypass RLS, and the table owner bypasses it by default (see the FORCE ROW LEVEL SECURITY tip below).

Step 2: Create Per-Tenant Roles

CREATE ROLE tenant_1 LOGIN;
CREATE ROLE tenant_2 LOGIN;

GRANT USAGE  ON SCHEMA pgtrickle TO tenant_1, tenant_2;
GRANT SELECT ON pgtrickle.tenant_spend TO tenant_1, tenant_2;

Step 3: Create RLS Policies

-- Tenant 1 sees only tenant_id = 1
CREATE POLICY tenant_1_policy ON pgtrickle.tenant_spend
    FOR SELECT
    TO tenant_1
    USING (tenant_id = 1);

-- Tenant 2 sees only tenant_id = 2
CREATE POLICY tenant_2_policy ON pgtrickle.tenant_spend
    FOR SELECT
    TO tenant_2
    USING (tenant_id = 2);

Step 4: Verify Filtering

Connect as each tenant role and query:

-- As tenant_1:
SET ROLE tenant_1;
SELECT * FROM pgtrickle.tenant_spend;
--  tenant_id | order_count | total_spend
-- -----------+-------------+-------------
--          1 |           2 |       29.49

RESET ROLE;

-- As tenant_2:
SET ROLE tenant_2;
SELECT * FROM pgtrickle.tenant_spend;
--  tenant_id | order_count | total_spend
-- -----------+-------------+-------------
--          2 |           2 |       74.00

RESET ROLE;

Each tenant sees only their own data. The underlying stream table still contains all rows — the filtering happens at query time via RLS.

How Refresh Works with RLS

Both scheduled and manual refreshes run with superuser-equivalent privileges, so RLS on source tables is always bypassed during refresh. This ensures:

  • The stream table always contains the complete result set.
  • A refresh_stream_table() call produces the same result regardless of who calls it.
  • IMMEDIATE mode (IVM triggers) also bypasses RLS via SECURITY DEFINER trigger functions.

Policy Change Detection

pg_trickle automatically detects RLS-related DDL on source tables:

DDL on source table                         | Effect
--------------------------------------------+--------------------------------
CREATE POLICY / ALTER POLICY / DROP POLICY  | Stream table marked for reinit
ALTER TABLE ... ENABLE ROW LEVEL SECURITY   | Stream table marked for reinit
ALTER TABLE ... DISABLE ROW LEVEL SECURITY  | Stream table marked for reinit
ALTER TABLE ... FORCE ROW LEVEL SECURITY    | Stream table marked for reinit
ALTER TABLE ... NO FORCE ROW LEVEL SECURITY | Stream table marked for reinit

Because the stream table always sees all rows (bypassing RLS), these reinits act as a safeguard: they confirm the materialized data remains consistent after the source table's security posture changes.

Tips

  • One stream table, many roles: A single stream table can serve all tenants. Each role's RLS policy filters at read time — no per-tenant duplication needed.
  • Write policies: Stream tables are maintained by pg_trickle. Restrict writes to the pg_trickle system by only creating FOR SELECT policies.
  • Default deny: Once RLS is enabled, roles without a matching policy see zero rows. Always test with a non-superuser role.
  • FORCE ROW LEVEL SECURITY: By default, table owners bypass RLS. Use ALTER TABLE ... FORCE ROW LEVEL SECURITY if the owner should also be subject to policies.
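
For example, to subject the owner of tenant_spend to the same policies (note that scheduled refreshes run with superuser-equivalent privileges and are unaffected, per "How Refresh Works with RLS" above):

ALTER TABLE pgtrickle.tenant_spend FORCE ROW LEVEL SECURITY;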

Partitioned Tables as Sources

This tutorial shows how pg_trickle works with PostgreSQL's declarative table partitioning. It covers RANGE, LIST, and HASH partitioned source tables, explains what happens when you add or remove partitions, and documents known caveats.

Background

PostgreSQL lets you split large tables into smaller "partitions" — for example one partition per month for an orders table. This is a common technique for managing very large datasets. pg_trickle handles partitioned source tables transparently:

  • CDC triggers fire on all partitions. PostgreSQL 13+ automatically clones row-level triggers from the parent to every child partition. All DML (INSERT, UPDATE, DELETE) on any partition is captured in a single change buffer keyed by the parent table's OID.

  • ATTACH PARTITION is detected automatically. When you add a new partition with pre-existing data, pg_trickle's DDL event trigger detects the change and marks affected stream tables for reinitialization. No manual intervention required.

  • WAL-based CDC works correctly. When using WAL mode, publications are created with publish_via_partition_root = true so all partition changes appear under the parent table's identity.

Example: Monthly Sales Partitions (RANGE)

-- Create a RANGE-partitioned source table
CREATE TABLE sales (
    id         SERIAL,
    sale_date  DATE    NOT NULL,
    region     TEXT    NOT NULL,
    amount     NUMERIC NOT NULL,
    PRIMARY KEY (id, sale_date)
) PARTITION BY RANGE (sale_date);

-- Create partitions for each half of the year
CREATE TABLE sales_h1_2025 PARTITION OF sales
    FOR VALUES FROM ('2025-01-01') TO ('2025-07-01');
CREATE TABLE sales_h2_2025 PARTITION OF sales
    FOR VALUES FROM ('2025-07-01') TO ('2026-01-01');

-- Insert data across partitions
INSERT INTO sales (sale_date, region, amount) VALUES
    ('2025-02-15', 'US', 100.00),
    ('2025-05-20', 'EU', 250.00),
    ('2025-08-10', 'US', 175.00),
    ('2025-11-30', 'EU', 300.00);

-- Create a stream table over the partitioned source
SELECT pgtrickle.create_stream_table(
    name  => 'regional_sales',
    query => $$
        SELECT region, SUM(amount) AS total, COUNT(*) AS cnt
        FROM sales
        GROUP BY region
    $$,
    schedule     => '1 minute',
    refresh_mode => 'DIFFERENTIAL'
);

-- Refresh to populate
SELECT pgtrickle.refresh_stream_table('regional_sales');

-- Verify — aggregates span all partitions:
SELECT * FROM regional_sales ORDER BY region;
--  region | total  | cnt
-- --------+--------+-----
--  EU     | 550.00 |   2
--  US     | 275.00 |   2

Adding New Partitions

When you add a new partition, any new rows inserted through the parent are automatically captured by CDC triggers. The trigger on the parent is cloned to the new partition by PostgreSQL.

-- Add a new partition for 2026
CREATE TABLE sales_h1_2026 PARTITION OF sales
    FOR VALUES FROM ('2026-01-01') TO ('2026-07-01');

-- Inserts into the new partition are captured normally
INSERT INTO sales (sale_date, region, amount)
    VALUES ('2026-03-15', 'US', 400.00);

-- Next refresh picks up the new row
SELECT pgtrickle.refresh_stream_table('regional_sales');

SELECT * FROM regional_sales ORDER BY region;
--  region | total  | cnt
-- --------+--------+-----
--  EU     | 550.00 |   2
--  US     | 675.00 |   3

ATTACH PARTITION with Pre-Existing Data

The most important edge case: attaching a table that already contains rows. These rows were never seen by CDC triggers, so the stream table would be stale. pg_trickle detects this automatically.

-- Create a standalone table with existing data
CREATE TABLE sales_h2_2026 (
    id        SERIAL,
    sale_date DATE    NOT NULL,
    region    TEXT    NOT NULL,
    amount    NUMERIC NOT NULL,
    PRIMARY KEY (id, sale_date)
);
INSERT INTO sales_h2_2026 (sale_date, region, amount) VALUES
    ('2026-08-01', 'EU', 500.00),
    ('2026-09-15', 'US', 200.00);

-- Attach it to the partitioned table
ALTER TABLE sales ATTACH PARTITION sales_h2_2026
    FOR VALUES FROM ('2026-07-01') TO ('2027-01-01');

-- pg_trickle detects the partition change and marks the stream table
-- for reinitialize. Check:
SELECT pgt_name, needs_reinit
FROM pgtrickle.pgt_stream_tables
WHERE pgt_name = 'regional_sales';
--  pgt_name        | needs_reinit
-- -----------------+--------------
--  regional_sales  | t

-- The next refresh reinitializes — re-reading all data from scratch:
SELECT pgtrickle.refresh_stream_table('regional_sales');

SELECT * FROM regional_sales ORDER BY region;
--  region | total   | cnt
-- --------+---------+-----
--  EU     | 1050.00 |   3
--  US     |  875.00 |   4

DETACH PARTITION

When you detach a partition, the detached table's data is no longer visible through the parent. pg_trickle detects this too and marks stream tables for reinitialize.

-- Archive the old partition
ALTER TABLE sales DETACH PARTITION sales_h1_2025;

-- Stream table is marked for reinit:
SELECT pgt_name, needs_reinit
FROM pgtrickle.pgt_stream_tables
WHERE pgt_name = 'regional_sales';
--  pgt_name        | needs_reinit
-- -----------------+--------------
--  regional_sales  | t

-- After refresh, the detached partition's rows are gone:
SELECT pgtrickle.refresh_stream_table('regional_sales');
SELECT * FROM regional_sales ORDER BY region;
-- (only rows from remaining partitions)

LIST Partitioning

LIST partitioning splits rows by discrete values. It works identically:

CREATE TABLE events (
    id      SERIAL,
    region  TEXT NOT NULL,
    payload TEXT,
    PRIMARY KEY (id, region)
) PARTITION BY LIST (region);

CREATE TABLE events_us PARTITION OF events FOR VALUES IN ('US');
CREATE TABLE events_eu PARTITION OF events FOR VALUES IN ('EU');
CREATE TABLE events_ap PARTITION OF events FOR VALUES IN ('AP');

SELECT pgtrickle.create_stream_table(
    name  => 'event_counts',
    query => 'SELECT region, count(*) AS cnt FROM events GROUP BY region',
    schedule => '1 minute'
);

HASH Partitioning

HASH partitioning distributes rows across a fixed number of partitions. Useful for spreading write load evenly:

CREATE TABLE metrics (
    id        SERIAL PRIMARY KEY,
    sensor_id INT    NOT NULL,
    value     DOUBLE PRECISION
) PARTITION BY HASH (id);

CREATE TABLE metrics_0 PARTITION OF metrics
    FOR VALUES WITH (MODULUS 4, REMAINDER 0);
CREATE TABLE metrics_1 PARTITION OF metrics
    FOR VALUES WITH (MODULUS 4, REMAINDER 1);
CREATE TABLE metrics_2 PARTITION OF metrics
    FOR VALUES WITH (MODULUS 4, REMAINDER 2);
CREATE TABLE metrics_3 PARTITION OF metrics
    FOR VALUES WITH (MODULUS 4, REMAINDER 3);

SELECT pgtrickle.create_stream_table(
    name  => 'sensor_avg',
    query => $$
        SELECT sensor_id, AVG(value) AS avg_val, COUNT(*) AS cnt
        FROM metrics GROUP BY sensor_id
    $$,
    schedule => '1 minute'
);

Foreign Tables

Tables from other databases (via postgres_fdw) can be used as sources, but with restrictions:

  • No trigger-based CDC — foreign tables don't support row-level triggers.
  • No WAL-based CDC — foreign tables don't generate local WAL.
  • FULL refresh worksSELECT * executes a remote query each time.
  • Polling-based CDC works — when pg_trickle.foreign_table_polling is enabled, pg_trickle creates a local snapshot table and detects changes via EXCEPT ALL comparison.

When you use a foreign table as a source, pg_trickle emits an info message explaining the limitations:

CREATE EXTENSION postgres_fdw;

CREATE SERVER remote_db
    FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'remote-host', dbname 'analytics');

CREATE USER MAPPING FOR CURRENT_USER
    SERVER remote_db OPTIONS (user 'reader');

CREATE FOREIGN TABLE remote_orders (
    id     INT,
    amount NUMERIC
) SERVER remote_db OPTIONS (table_name 'orders');

-- Only FULL refresh is available:
SELECT pgtrickle.create_stream_table(
    name  => 'remote_totals',
    query => 'SELECT SUM(amount) AS total FROM remote_orders',
    schedule     => '5 minutes',
    refresh_mode => 'FULL'
);
-- INFO: pg_trickle: source table remote_orders is a foreign table.
-- Foreign tables cannot use trigger-based or WAL-based CDC —
-- only FULL refresh mode or polling-based change detection is supported.

Known Caveats

Caveat                          | Description
--------------------------------+-------------------------------------------------------------------
PostgreSQL 13+ required         | Parent-table triggers only propagate to child partitions on PG 13+. pg_trickle targets PostgreSQL 18, so this is always satisfied.
Partition key in PRIMARY KEY    | PostgreSQL requires the partition key to be part of any unique constraint, so your PRIMARY KEY must include the partition column.
ATTACH with data = reinitialize | Attaching a partition with pre-existing rows triggers a full reinitialize on the next refresh. For very large tables this may be slow; consider gating the source with pgtrickle.gate_source() during bulk partition operations.
Sub-partitioning                | Multi-level partitioning (partitions of partitions) works in principle because triggers propagate through the entire hierarchy, but it is not extensively tested.
pg_partman compatibility        | pg_partman dynamically creates and drops partitions. Since pg_trickle detects ATTACH/DETACH via DDL event triggers, it should work, but this combination is not yet tested.
Partitioned storage tables      | Using a partitioned table as the stream table's storage is not supported. This is tracked for a future release.
DETACH PARTITION CONCURRENTLY   | DETACH PARTITION ... CONCURRENTLY is a two-phase operation. The DDL event trigger fires after the first phase; the partition is not fully detached until the second phase commits, so the stream table may briefly reflect the old partition count.
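
A sketch of the gating pattern mentioned in the ATTACH caveat. pgtrickle.gate_source() is referenced above; the ungate counterpart and the 2027 partition are assumptions for illustration, so check your version's API before relying on them:

-- Pause change processing for this source during the bulk operation
SELECT pgtrickle.gate_source('sales');

ALTER TABLE sales ATTACH PARTITION sales_h1_2027   -- assumes this table exists
    FOR VALUES FROM ('2027-01-01') TO ('2027-07-01');

-- Hypothetical counterpart: re-open the gate so the pending
-- reinitialize runs on the next cycle
SELECT pgtrickle.ungate_source('sales');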

Foreign Table Sources

This tutorial shows how to use a postgres_fdw foreign table as a source for a stream table. Foreign tables let you aggregate data from remote PostgreSQL databases into a local stream table that refreshes automatically.

Background

PostgreSQL's Foreign Data Wrapper (postgres_fdw) lets you define tables that transparently query a remote database. pg_trickle can use these foreign tables as stream table sources, but with different change-detection semantics than regular tables.

Key difference: Foreign tables cannot use trigger-based or WAL-based CDC. Changes are detected either by re-scanning the entire remote table (FULL refresh) or by comparing a local snapshot to the remote data (polling-based CDC).

Step 1 — Set Up the Foreign Server

-- Enable the foreign data wrapper extension
CREATE EXTENSION IF NOT EXISTS postgres_fdw;

-- Create a connection to the remote database
CREATE SERVER warehouse_db
    FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'warehouse.example.com', dbname 'analytics', port '5432');

-- Map the current user to a remote user
CREATE USER MAPPING FOR CURRENT_USER
    SERVER warehouse_db
    OPTIONS (user 'readonly_user', password 'secret');

Step 2 — Define the Foreign Table

CREATE FOREIGN TABLE remote_orders (
    id          INT,
    customer_id INT,
    amount      NUMERIC(12,2),
    region      TEXT,
    created_at  TIMESTAMP
) SERVER warehouse_db
  OPTIONS (schema_name 'public', table_name 'orders');

Alternatively, import an entire remote schema:

IMPORT FOREIGN SCHEMA public
    LIMIT TO (orders, customers)
    FROM SERVER warehouse_db
    INTO public;

Step 3 — Create a Stream Table with FULL Refresh

The simplest approach uses FULL refresh mode — pg_trickle re-executes the query against the remote table on every refresh cycle:

SELECT pgtrickle.create_stream_table(
    name         => 'orders_by_region',
    query        => $$
        SELECT
            region,
            COUNT(*)        AS order_count,
            SUM(amount)     AS total_revenue,
            AVG(amount)     AS avg_order_value
        FROM remote_orders
        GROUP BY region
    $$,
    schedule     => '5m',
    refresh_mode => 'FULL'
);

pg_trickle will emit an informational message:

INFO: pg_trickle: source table remote_orders is a foreign table.
Foreign tables cannot use trigger-based or WAL-based CDC —
only FULL refresh mode or polling-based change detection is supported.

How FULL refresh works with foreign tables:

  1. Every 5 minutes, pg_trickle executes the defining query.
  2. The query is sent to the remote database via postgres_fdw.
  3. The complete result set replaces the stream table contents.
  4. This is equivalent to a MATERIALIZED VIEW refresh, but automated.

Step 4 — Polling-Based CDC (Optional)

If the remote table is large and changes are small, FULL refresh becomes expensive because it transfers the entire result set every cycle. Polling-based CDC provides a more efficient alternative:

-- Enable polling for this session (use ALTER SYSTEM to enable it cluster-wide)
SET pg_trickle.foreign_table_polling = on;

-- Now create with DIFFERENTIAL mode — pg_trickle will use polling
SELECT pgtrickle.create_stream_table(
    name         => 'orders_by_region_polling',
    query        => $$
        SELECT
            region,
            COUNT(*)        AS order_count,
            SUM(amount)     AS total_revenue,
            AVG(amount)     AS avg_order_value
        FROM remote_orders
        GROUP BY region
    $$,
    schedule     => '5m',
    refresh_mode => 'DIFFERENTIAL'
);

How polling works:

  1. On the first refresh, pg_trickle creates a local snapshot table that mirrors the remote table's data.
  2. On subsequent refreshes, it fetches the current remote data and computes an EXCEPT ALL difference against the snapshot (sketched after this list).
  3. Only the changed rows are written to the change buffer and processed through the incremental delta pipeline.
  4. The snapshot table is updated to reflect the new remote state.
  5. When the stream table is dropped, the snapshot table is cleaned up.
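
Conceptually, the diff in step 2 boils down to a pair of EXCEPT ALL queries (a simplified sketch; the real snapshot table is internal, and snapshot_remote_orders is a hypothetical name):

-- Rows added or changed remotely since the last poll
SELECT * FROM remote_orders
EXCEPT ALL
SELECT * FROM snapshot_remote_orders;

-- Rows deleted or changed remotely since the last poll
SELECT * FROM snapshot_remote_orders
EXCEPT ALL
SELECT * FROM remote_orders;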

Trade-offs:

Aspect           | FULL Refresh                | Polling CDC
-----------------+-----------------------------+---------------------------------------------
Network transfer | Full result set every cycle | Full remote scan, but only diffs applied
Local storage    | Stream table only           | Stream table + snapshot table
Best for         | Small remote tables         | Large remote tables with small change rates
GUC required     | No                          | pg_trickle.foreign_table_polling = on

Step 5 — Verify and Monitor

-- Check stream table status
SELECT * FROM pgtrickle.pgt_status('orders_by_region');

-- Check CDC health (will show foreign table constraints)
SELECT * FROM pgtrickle.check_cdc_health();

-- View refresh history
SELECT * FROM pgtrickle.get_refresh_history('orders_by_region', 5);

-- Monitor staleness
SELECT * FROM pgtrickle.get_staleness('orders_by_region');

Worked Example — Remote Inventory Dashboard

This example aggregates inventory data from a remote warehouse database into a local dashboard table:

-- Remote table definition
CREATE FOREIGN TABLE remote_inventory (
    sku         TEXT,
    warehouse   TEXT,
    quantity    INT,
    updated_at  TIMESTAMP
) SERVER warehouse_db
  OPTIONS (schema_name 'inventory', table_name 'stock_levels');

-- Dashboard: inventory summary by warehouse
SELECT pgtrickle.create_stream_table(
    name     => 'inventory_dashboard',
    query    => $$
        SELECT
            warehouse,
            COUNT(DISTINCT sku)  AS unique_products,
            SUM(quantity)        AS total_units,
            MIN(updated_at)      AS oldest_update,
            MAX(updated_at)      AS newest_update
        FROM remote_inventory
        GROUP BY warehouse
    $$,
    schedule     => '10m',
    refresh_mode => 'FULL'
);

After the first refresh:

SELECT * FROM inventory_dashboard;
 warehouse | unique_products | total_units | oldest_update       | newest_update
-----------+-----------------+-------------+---------------------+---------------------
 east      |             142 |       23500 | 2026-03-14 08:00:00 | 2026-03-14 09:15:00
 west      |              98 |       15200 | 2026-03-14 07:30:00 | 2026-03-14 09:10:00
 central   |             215 |       41000 | 2026-03-14 06:00:00 | 2026-03-14 09:20:00

Constraints and Caveats

Constraint          | Details
--------------------+-------------------------------------------------------------------
No trigger CDC      | Foreign tables don't support PostgreSQL row-level triggers.
No WAL CDC          | Foreign tables don't generate local WAL entries.
Network latency     | Each refresh cycle queries the remote database. Schedule accordingly.
Remote availability | If the remote database is down, the refresh fails (logged in pgt_refresh_history). The stream table retains its last successful data.
Authentication      | CREATE USER MAPPING credentials must remain valid. Use .pgpass or environment variables in production.
Snapshot storage    | Polling CDC creates a snapshot table sized proportionally to the remote table. Monitor disk usage.

FAQ

Q: Why does my foreign table stream table only work in FULL mode?

Foreign tables cannot install row-level triggers (the mechanism pg_trickle uses for trigger-based CDC) and don't generate local WAL records (used by WAL-based CDC). FULL refresh works because it simply re-executes the remote query. Enable pg_trickle.foreign_table_polling if you need differential-style change detection.

Q: Can I mix foreign and local tables in the same defining query?

Yes. If your query joins a foreign table with a local table, pg_trickle uses trigger/WAL CDC for the local table and FULL-rescan or polling for the foreign table. The refresh mode must be FULL unless polling is enabled for the foreign table sources.

Q: What happens if the remote database is temporarily unavailable?

The refresh attempt fails, is logged in pgt_refresh_history with status FAILED, and the consecutive_errors counter increments. The stream table retains its last successful data. When the remote database recovers, the next scheduled refresh succeeds and the error counter resets.

Tutorial: Tiered Scheduling

Tiered scheduling (v0.12.0+) lets you assign refresh priorities to stream tables using four tiers: Hot, Warm, Cold, and Frozen. This reduces CPU and I/O overhead by refreshing less-critical tables less frequently.

When to Use It

  • You have many stream tables (50+) and want to reduce scheduler load
  • Some tables power real-time dashboards (need hot refresh) while others serve weekly reports (can be cold)
  • You want to freeze tables during maintenance windows without dropping them

Tier Overview

Tier   | Multiplier | Effect
-------+------------+----------------------------------------------
hot    | 1×         | Refresh at the configured schedule (default)
warm   | 2×         | Refresh at 2× the configured interval
cold   | 10×        | Refresh at 10× the configured interval
frozen | skip       | Never refreshed until manually promoted

For a stream table with schedule => '1m':

Tier   | Effective Interval
-------+-------------------
hot    | 1 minute
warm   | 2 minutes
cold   | 10 minutes
frozen | never

Note: Cron-based schedules are not affected by the tier multiplier. They always fire at the configured cron time.

Step-by-Step Example

1. Enable tiered scheduling

Tiered scheduling is enabled by default since v0.12.0. Verify:

SHOW pg_trickle.tiered_scheduling;
-- Should return: on

2. Create stream tables with different priorities

-- Real-time dashboard — stays hot (default)
SELECT pgtrickle.create_stream_table(
    name     => 'live_order_count',
    query    => 'SELECT COUNT(*) AS total FROM orders WHERE status = ''active''',
    schedule => '30s'
);

-- Important but not latency-critical
SELECT pgtrickle.create_stream_table(
    name     => 'daily_revenue',
    query    => 'SELECT DATE_TRUNC(''day'', created_at) AS day, SUM(amount) AS revenue
                 FROM orders GROUP BY 1',
    schedule => '1m'
);

-- Weekly report — rarely queried
SELECT pgtrickle.create_stream_table(
    name     => 'customer_lifetime_value',
    query    => 'SELECT customer_id, SUM(amount) AS lifetime_value
                 FROM orders GROUP BY customer_id',
    schedule => '5m'
);

3. Assign tiers

-- live_order_count stays at 'hot' (default) — refreshes every 30s

-- daily_revenue: 2× multiplier → effective interval = 2 minutes
SELECT pgtrickle.alter_stream_table('daily_revenue', tier => 'warm');

-- customer_lifetime_value: 10× multiplier → effective interval = 50 minutes
SELECT pgtrickle.alter_stream_table('customer_lifetime_value', tier => 'cold');

4. Verify effective schedules

SELECT pgt_name, schedule, refresh_tier,
       CASE refresh_tier
           WHEN 'hot'  THEN schedule
           WHEN 'warm' THEN schedule || ' ×2'
           WHEN 'cold' THEN schedule || ' ×10'
           WHEN 'frozen' THEN 'never'
       END AS effective
FROM pgtrickle.pgt_stream_tables
ORDER BY refresh_tier;

5. Freeze a table during maintenance

-- Freeze before a schema migration
SELECT pgtrickle.alter_stream_table('customer_lifetime_value', tier => 'frozen');

-- ... perform migration ...

-- Promote back when ready
SELECT pgtrickle.alter_stream_table('customer_lifetime_value', tier => 'warm');

Choosing the Right Tier

Use Case                                  | Recommended Tier
------------------------------------------|-----------------
Real-time dashboards, alerting tables     | hot
Operational reports queried hourly        | warm
Weekly/monthly analytics, batch consumers | cold
Tables under maintenance, seasonal reports| frozen

Rules of thumb:

  • Start with everything at hot (the default). Move tables to warm or cold as you identify which ones can tolerate more staleness.
  • Warm halves the refresh CPU cost compared to hot.
  • Cold reduces refresh overhead by 90%.
  • Use frozen sparingly — changes accumulate in the buffer and will be processed when you promote the table back; the sketch after this list shows how to watch those buffers.
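
A sketch of that check, assuming change_buffer_sizes().stream_table matches pgt_stream_tables.pgt_name:

SELECT b.stream_table, b.source_table, b.pending_rows, b.buffer_bytes
FROM pgtrickle.change_buffer_sizes() b
JOIN pgtrickle.pgt_stream_tables s ON s.pgt_name = b.stream_table
WHERE s.refresh_tier = 'frozen'
ORDER BY b.pending_rows DESC;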

Monitoring Tiers

-- Check which tables are in which tier
SELECT pgt_name, refresh_tier, status, staleness
FROM pgtrickle.stream_tables_info
ORDER BY refresh_tier, staleness DESC;

-- Find frozen tables (these are NOT being refreshed)
SELECT pgt_name, refresh_tier
FROM pgtrickle.pgt_stream_tables
WHERE refresh_tier = 'frozen';

Troubleshooting

All tables are frozen and nothing is refreshing:
If every stream table is set to frozen, the scheduler has nothing to do. Promote at least one table back to hot or warm.

Staleness exceeds expectations for cold tables:
Remember that cold applies a 10× multiplier. A 5-minute schedule becomes a 50-minute effective interval. If this is too stale, use warm instead.

Tutorial: Fuse Circuit Breaker

The fuse circuit breaker (v0.11.0+) suspends differential refreshes when the incoming change volume exceeds a threshold. This protects your database from runaway refresh cycles during bulk data loads, accidental mass-deletes, or migration scripts.

When to Use It

  • Bulk ETL loads — loading millions of rows that would overwhelm a differential refresh
  • Data migration scripts — large schema or data changes that temporarily spike the change buffer
  • Protection against accidents — an errant DELETE FROM orders shouldn't silently cascade through all downstream stream tables

How It Works

Normal operation:  Source DML ──▶ CDC ──▶ Refresh
Fuse blows:        Source DML ──▶ CDC ──▶ BLOCKED, NOTIFY alert (fuse_blown)
After reset:       Source DML ──▶ CDC ──▶ Refresh (resumed)

  1. Each refresh cycle, the scheduler counts pending changes in the buffer.
  2. If the count exceeds fuse_ceiling for fuse_sensitivity consecutive cycles, the fuse blows.
  3. The stream table enters a paused state — no refreshes occur.
  4. A fuse_blown alert is emitted via NOTIFY pg_trickle_alert.
  5. An operator investigates and calls reset_fuse() to resume.

Step-by-Step Example

1. Create a stream table with fuse protection

SELECT pgtrickle.create_stream_table(
    name         => 'category_summary',
    query        => 'SELECT category, COUNT(*) AS cnt, SUM(price) AS total
                     FROM products GROUP BY category',
    schedule     => '1m',
    refresh_mode => 'DIFFERENTIAL'
);

-- Arm the fuse: blow when pending changes exceed 50,000 rows
SELECT pgtrickle.alter_stream_table(
    'category_summary',
    fuse           => 'on',
    fuse_ceiling   => 50000,
    fuse_sensitivity => 3    -- require 3 consecutive over-ceiling cycles
);

2. Observe normal operation

-- Insert a small batch — well under the ceiling
INSERT INTO products (name, category, price)
SELECT 'Product ' || i, 'Electronics', 9.99
FROM generate_series(1, 100) i;

-- After the next refresh cycle, the stream table is updated normally
SELECT * FROM pgtrickle.category_summary;

3. Trigger a bulk load

-- Simulate a large ETL load — 100,000 rows
INSERT INTO products (name, category, price)
SELECT 'Bulk ' || i, 'Imported', 4.99
FROM generate_series(1, 100000) i;

After fuse_sensitivity scheduler cycles (3 in our example), the fuse blows. The stream table stops refreshing.

4. Inspect the fuse state

SELECT name, fuse_mode, fuse_state, fuse_ceiling, blown_at, blow_reason
FROM pgtrickle.fuse_status();

       name        | fuse_mode | fuse_state | fuse_ceiling |          blown_at          |       blow_reason
-------------------+-----------+------------+--------------+----------------------------+---------------------------
 category_summary  | on        | blown      |        50000 | 2026-03-31 14:22:01.123+00 | change_count_exceeded

5. Decide how to recover

You have three options:

-- Option A: Apply the changes (process the bulk load normally)
SELECT pgtrickle.reset_fuse('category_summary', action => 'apply');

-- Option B: Skip the changes (discard the batch, resume from current state)
SELECT pgtrickle.reset_fuse('category_summary', action => 'skip_changes');

-- Option C: Reinitialize (full rebuild from the defining query)
SELECT pgtrickle.reset_fuse('category_summary', action => 'reinitialize');

After resetting, the fuse returns to 'armed' state and the scheduler resumes.

Fuse Modes

Mode   | Behavior
-------|---------
'off'  | No fuse protection (default)
'on'   | Always armed — blows when changes exceed fuse_ceiling
'auto' | Blows only when a FULL refresh would be cheaper than DIFFERENTIAL

'auto' mode is recommended for most use cases — it protects against bulk loads while allowing large-but-efficient differential refreshes to proceed.
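
Switching an existing table to 'auto' is a one-line alter:

SELECT pgtrickle.alter_stream_table('category_summary', fuse => 'auto');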

Using with dbt

In dbt models, configure the fuse via the stream_table materialization:

-- models/marts/category_summary.sql
{{ config(
    materialized='stream_table',
    schedule='5m',
    refresh_mode='DIFFERENTIAL',
    fuse='auto',
    fuse_ceiling=50000,
    fuse_sensitivity=3
) }}

SELECT category, COUNT(*) AS cnt, SUM(price) AS total
FROM {{ source('raw', 'products') }}
GROUP BY category

Global Defaults

Set a cluster-wide default ceiling via the pg_trickle.fuse_default_ceiling GUC. Stream tables with fuse_ceiling = NULL inherit this value:

ALTER SYSTEM SET pg_trickle.fuse_default_ceiling = 100000;
SELECT pg_reload_conf();

Monitoring

  • pgtrickle.fuse_status() — inspect fuse state for all stream tables
  • LISTEN pg_trickle_alert — receive real-time fuse_blown notifications
  • pgtrickle.dedup_stats() — includes fuse-related counters
  • pgtrickle.pgt_stream_tables.fuse_state — direct catalog query

Tutorial: Circular Dependencies

pg_trickle supports circular (cyclic) stream table dependencies (v0.7.0+) for queries that use only monotone operators. The scheduler groups circular dependencies into Strongly Connected Components (SCCs) and iterates them to a fixed point.

When to Use It

  • Transitive closure — computing all reachable nodes in a graph
  • Graph reachability — finding all paths between nodes
  • Iterative convergence — mutual dependencies that stabilize after a few iterations

Prerequisites

Circular dependencies are disabled by default. Enable them:

SET pg_trickle.allow_circular = true;

Monotone Operator Requirement

Only monotone operators are allowed in circular dependency chains. Monotone operators guarantee convergence — the result set grows (or stays the same) with each iteration until a fixed point is reached.

Allowed (Monotone)               | Blocked (Non-Monotone)
---------------------------------|------------------------
Joins (INNER, LEFT, RIGHT, FULL) | Aggregates (SUM, COUNT, etc.)
Filters (WHERE)                  | EXCEPT
Projections (SELECT)             | Window functions
UNION ALL                        | NOT EXISTS / NOT IN
INTERSECT                        |
EXISTS                           |

Creating a circular dependency with non-monotone operators is rejected with a clear error message, regardless of the allow_circular setting.
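
As a sketch of what that rejection looks like (the exact error text will differ by version), a self-referencing definition that aggregates over itself is refused even when allow_circular is on:

-- Aggregates are non-monotone: this circular definition is rejected
SELECT pgtrickle.create_stream_table(
    name     => 'node_in_degree',
    query    => 'SELECT dst, COUNT(*) AS in_degree
                 FROM pgtrickle.node_in_degree GROUP BY dst',
    schedule => '1m'
);
-- ERROR: non-monotone operator in circular dependency chain (illustrative)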

Step-by-Step Example: Transitive Closure

Suppose you have a graph of relationships:

CREATE TABLE edges (src INT, dst INT);
INSERT INTO edges VALUES
    (1, 2), (2, 3), (3, 4), (4, 5),
    (1, 3), (2, 5);

1. Create the base reachability table

-- Direct edges: all nodes directly connected
SELECT pgtrickle.create_stream_table(
    name     => 'reachable_direct',
    query    => 'SELECT src, dst FROM edges',
    schedule => '1m',
    refresh_mode => 'DIFFERENTIAL'
);

2. Create the transitive closure with a self-reference

-- Transitive closure: if A→B and B→C, then A→C
-- This creates a circular dependency (reachable depends on itself via the join)
SELECT pgtrickle.create_stream_table(
    name     => 'reachable',
    query    => 'SELECT DISTINCT r1.src, r2.dst
                 FROM pgtrickle.reachable_direct r1
                 JOIN pgtrickle.reachable_direct r2 ON r1.dst = r2.src
                 UNION ALL
                 SELECT src, dst FROM edges',
    schedule => '1m',
    refresh_mode => 'DIFFERENTIAL'
);

Note: This example uses the reachable_direct table for the join rather than self-referencing reachable directly. For a true self-referencing cycle, pg_trickle detects the SCC and iterates.

3. Observe the fixed-point iteration

When the scheduler processes an SCC, it iterates until no new rows are produced (the fixed point):

-- Check SCC status
SELECT * FROM pgtrickle.pgt_scc_status();

Output:

 scc_id | members                          | iteration | converged
--------+----------------------------------+-----------+-----------
      1 | {reachable_direct,reachable}     |         3 | true

4. Add new edges and watch convergence

INSERT INTO edges VALUES (5, 1);  -- creates a cycle in the graph

On the next refresh cycle, the scheduler re-iterates the SCC until the transitive closure stabilizes with the new edge.
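
Once the SCC has converged, a quick sanity check is to query the closure for the new edge's source node (expected output inferred from the example data):

SELECT dst FROM pgtrickle.reachable WHERE src = 5 ORDER BY dst;
-- With the 5→1 edge, node 5 now reaches every node: 1, 2, 3, 4, 5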

Monitoring SCCs

-- View all SCCs and their convergence status
SELECT * FROM pgtrickle.pgt_scc_status();

-- Check which stream tables belong to which SCC
SELECT pgt_name, scc_id
FROM pgtrickle.pgt_stream_tables
WHERE scc_id IS NOT NULL;

Controlling Iteration Limits

The pg_trickle.max_fixpoint_iterations GUC limits how many iterations the scheduler attempts before declaring non-convergence:

-- Default: 100 (generous headroom)
SHOW pg_trickle.max_fixpoint_iterations;

-- Lower it for fast-converging workloads
SET pg_trickle.max_fixpoint_iterations = 20;

If convergence is not reached within the limit, all SCC members are marked as ERROR. This prevents runaway infinite loops.

Limitations

  • Non-monotone operators are always rejected — aggregates, EXCEPT, window functions, and NOT EXISTS/NOT IN cannot appear in circular chains because they prevent convergence.
  • Performance scales with iteration count — each iteration runs a full differential refresh cycle for all SCC members. Keep cycles small.
  • All SCC members must use DIFFERENTIAL mode — FULL and IMMEDIATE modes are not supported for circular dependencies.

Tutorial: Tuning Refresh Mode

This tutorial walks you through using pg_trickle's built-in diagnostics to determine whether your stream tables are running in the most efficient refresh mode (FULL vs DIFFERENTIAL), and how to act on the recommendations.

Prerequisites

  • pg_trickle v0.14.0 or later
  • At least one stream table with several completed refresh cycles (the diagnostics become more accurate with more history)

Step 1: Check Current Refresh Efficiency

Start by reviewing how your stream tables are performing with their current refresh mode:

SELECT pgt_name, refresh_mode, diff_count, full_count,
       avg_diff_ms, avg_full_ms, diff_speedup
FROM pgtrickle.refresh_efficiency();

Example output:

pgt_name      | refresh_mode | diff_count | full_count | avg_diff_ms | avg_full_ms | diff_speedup
--------------|--------------|------------|------------|-------------|-------------|-------------
order_totals  | DIFFERENTIAL |        142 |          3 |        12.4 |       850.2 | 68.6x
user_stats    | FULL         |          0 |         14 |             |      5320.1 |
daily_metrics | DIFFERENTIAL |         98 |         47 |       425.8 |       410.3 | 1.0x

Key observations:

  • order_totals: DIFFERENTIAL is 68× faster — this is a great fit.
  • user_stats: Running in FULL mode with no DIFFERENTIAL history — worth checking if DIFFERENTIAL would be faster.
  • daily_metrics: DIFFERENTIAL and FULL take about the same time (1.0× speedup). FULL might actually be simpler and more predictable here.

Step 2: Get Recommendations

Use recommend_refresh_mode() to get weighted, signal-based recommendations:

SELECT pgt_name, current_mode, recommended_mode, confidence, reason
FROM pgtrickle.recommend_refresh_mode();

Example output:

pgt_name      | current_mode | recommended_mode | confidence | reason
--------------|--------------|------------------|------------|-------
order_totals  | DIFFERENTIAL | KEEP             | high       | DIFFERENTIAL is 68.6× faster than FULL with low latency variance
user_stats    | FULL         | DIFFERENTIAL     | medium     | Query is simple (no complex joins), change ratio is low (2.1%), target table is large
daily_metrics | DIFFERENTIAL | FULL             | medium     | DIFFERENTIAL shows no speedup over FULL (1.0×); high latency variance (p95/p50 = 4.2) suggests unstable performance

For a single table with full signal details:

SELECT recommended_mode, confidence, reason,
       jsonb_pretty(signals) AS signal_details
FROM pgtrickle.recommend_refresh_mode('daily_metrics');

Step 3: Understand the Signals

The signals JSONB column contains the detailed breakdown of all seven weighted signals that contributed to the recommendation:

{
  "composite_score": -0.22,
  "signals": [
    { "name": "change_ratio_avg", "score": -0.1, "weight": 0.30 },
    { "name": "empirical_timing", "score": -0.3, "weight": 0.35 },
    { "name": "change_ratio_current", "score": -0.2, "weight": 0.25 },
    { "name": "query_complexity", "score": 0.0, "weight": 0.10 },
    { "name": "target_size", "score": 0.1, "weight": 0.10 },
    { "name": "index_coverage", "score": 0.0, "weight": 0.05 },
    { "name": "latency_variance", "score": -0.4, "weight": 0.05 }
  ]
}

Positive scores favour DIFFERENTIAL; negative scores favour FULL. A composite score above +0.15 recommends DIFFERENTIAL; below −0.15 recommends FULL; in between, the current mode is near-optimal (KEEP).

Why ±0.15? The thresholds create a dead zone between −0.15 and +0.15 where the engine considers the two modes equivalent. Without this dead zone, small fluctuations in the change_ratio signal would cause the engine to oscillate between FULL and DIFFERENTIAL every few cycles — burning scheduling overhead with no net benefit. The +0.15 threshold means DIFFERENTIAL needs a clear edge (roughly a 15% advantage in combined signal weight) before the engine switches away from FULL, and vice versa.

You can widen or narrow the dead zone:

-- Wider dead zone (less switching) — good for stable, predictable workloads
ALTER SYSTEM SET pg_trickle.cost_model_safety_margin = 0.25;
SELECT pg_reload_conf();

-- Narrower dead zone (faster mode switching) — good for highly variable workloads
ALTER SYSTEM SET pg_trickle.cost_model_safety_margin = 0.05;
SELECT pg_reload_conf();

The default is 0.15. If you see frequent mode oscillation in pgtrickle.pgt_refresh_history, increase the margin.

Confidence levels:

Level  | Meaning
-------|--------
high   | 10+ completed refresh cycles; strong signal agreement
medium | 5–10 cycles or mixed signals
low    | Fewer than 5 cycles; recommendation is speculative

Step 4: Apply the Recommendation

If you decide to follow a recommendation, use ALTER STREAM TABLE:

-- Switch daily_metrics from DIFFERENTIAL to FULL
SELECT pgtrickle.alter_stream_table('daily_metrics',
    refresh_mode => 'FULL'
);

Or switch a table to DIFFERENTIAL:

-- Switch user_stats to DIFFERENTIAL mode
SELECT pgtrickle.alter_stream_table('user_stats',
    refresh_mode => 'DIFFERENTIAL'
);

The change takes effect on the next refresh cycle. No data is lost during the transition.
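
If you don't want to wait for the scheduler, trigger the first refresh in the new mode manually:

SELECT pgtrickle.refresh_stream_table('daily_metrics');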

Step 5: Monitor After the Change

After switching modes, wait for several refresh cycles and re-check:

-- Wait a few minutes, then re-check efficiency
SELECT pgt_name, refresh_mode, diff_count, full_count,
       avg_diff_ms, avg_full_ms, diff_speedup
FROM pgtrickle.refresh_efficiency()
WHERE pgt_name = 'daily_metrics';

Run the recommendation function again to verify the change was beneficial:

SELECT recommended_mode, confidence, reason
FROM pgtrickle.recommend_refresh_mode('daily_metrics');

If the recommendation now says KEEP, the new mode is working well.

Common Scenarios

High-cardinality aggregates

Stream tables with SUM/COUNT/AVG over high-cardinality GROUP BY keys (1000+ groups) are almost always better in DIFFERENTIAL mode. pg_trickle warns about low-cardinality groups at creation time (DIAG-2).

Small tables with frequent full rewrites

If the source table is small (< 10,000 rows) and changes affect > 30% of rows per cycle, FULL refresh is often faster because it avoids the overhead of change tracking and delta application.

Complex multi-join queries

Queries with 4+ JOINs may have high DIFFERENTIAL overhead due to the delta propagation rules. If diff_speedup is below 2×, consider FULL mode.

Tables with volatile functions

Stream tables using volatile functions (e.g., now(), random()) must use FULL mode. pg_trickle rejects volatile functions in DIFFERENTIAL mode at creation time.

Tutorial: Monitoring & Alerting

This guide consolidates all pg_trickle monitoring capabilities into a single reference: built-in SQL views, NOTIFY-based alerts, and the Prometheus/Grafana observability stack.

Quick Health Check

The fastest way to verify pg_trickle is healthy:

SELECT * FROM pgtrickle.health_check() WHERE severity != 'OK';

If this returns no rows, everything is working. Any WARN or ERROR rows tell you where to investigate.

Built-in Monitoring Views

Stream table status

-- Overview: name, status, mode, staleness
SELECT name, status, refresh_mode, staleness, stale
FROM pgtrickle.stream_tables_info;

-- Detailed stats: refresh counts, duration, error streaks
SELECT pgt_name, total_refreshes, avg_duration_ms, consecutive_errors, stale
FROM pgtrickle.pg_stat_stream_tables;

-- Live status with error counts
SELECT * FROM pgtrickle.pgt_status();

Refresh history

-- Last 10 refreshes for a specific stream table
SELECT start_time, action, status, duration_ms, rows_inserted, rows_deleted, error_message
FROM pgtrickle.get_refresh_history('order_totals', 10);

-- Global refresh timeline (last 20 events across all stream tables)
SELECT start_time, stream_table, action, status, duration_ms, error_message
FROM pgtrickle.refresh_timeline(20);

-- Aggregate refresh statistics
SELECT * FROM pgtrickle.st_refresh_stats();

CDC pipeline health

-- Per-source CDC mode, WAL lag, and alerts
SELECT * FROM pgtrickle.check_cdc_health();

-- Change buffer sizes (pending changes not yet consumed)
SELECT stream_table, source_table, cdc_mode, pending_rows, buffer_bytes
FROM pgtrickle.change_buffer_sizes()
ORDER BY pending_rows DESC;

-- Verify all CDC triggers are installed and enabled
SELECT source_table, trigger_type, trigger_name
FROM pgtrickle.trigger_inventory()
WHERE NOT present OR NOT enabled;

Dependencies

-- ASCII tree view of the entire dependency graph
SELECT tree_line, status, refresh_mode
FROM pgtrickle.dependency_tree();

-- Diamond consistency groups
SELECT * FROM pgtrickle.diamond_groups();

Fuse circuit breaker

-- Check fuse state for all stream tables
SELECT name, fuse_mode, fuse_state, fuse_ceiling, blown_at
FROM pgtrickle.fuse_status();

Parallel workers

-- Worker pool status (when parallel_refresh_mode = 'on')
SELECT * FROM pgtrickle.worker_pool_status();

-- Recent parallel job history
SELECT job_id, unit_key, status, duration_ms
FROM pgtrickle.parallel_job_status(60);

NOTIFY-Based Alerting

pg_trickle emits real-time events via PostgreSQL's NOTIFY system:

LISTEN pg_trickle_alert;

Event Types

Event                    | Trigger                                                                         | Severity
-------------------------|---------------------------------------------------------------------------------|---------
stale_data               | Scheduler is behind — the view is genuinely out of date                         | Warning
no_upstream_changes      | Scheduler is healthy but source tables have had no writes — the view is correct | Info
auto_suspended           | Stream table suspended after max consecutive errors                             | Critical
resumed                  | Stream table resumed after suspension                                           | Info
reinitialize_needed      | Upstream DDL change detected                                                    | Warning
buffer_growth_warning    | Change buffer growing unexpectedly                                              | Warning
slot_lag_warning         | WAL replication slot retaining excessive data                                   | Warning
fuse_blown               | Circuit breaker tripped                                                         | Warning
refresh_completed        | Refresh completed successfully                                                  | Info
refresh_failed           | Refresh failed                                                                  | Error
diamond_partial_failure  | One member of an atomic diamond group failed                                    | Warning
scheduler_falling_behind | Refresh duration approaching the schedule interval                              | Warning
spill_threshold_exceeded | Delta MERGE spilled to temp files for consecutive refreshes, forcing FULL       | Warning

Notification Payload

Each notification carries a JSON payload:

{
  "event": "auto_suspended",
  "stream_table": "order_totals",
  "consecutive_errors": 3,
  "last_error": "column \"deleted_column\" does not exist",
  "timestamp": "2026-03-31T14:22:01.123Z"
}

Bridging to External Systems

To forward NOTIFY events to external alerting systems (PagerDuty, Slack, OpsGenie), use a listener process:

# Example: Python listener using psycopg
import psycopg
import json

conn = psycopg.connect("postgresql://user:pass@host/db", autocommit=True)
conn.execute("LISTEN pg_trickle_alert")

for notify in conn.notifies():
    payload = json.loads(notify.payload)
    event = payload["event"]
    # no_upstream_changes is informational — source tables are quiet but healthy.
    # Only page on actionable events.
    if event in ("auto_suspended", "fuse_blown", "refresh_failed"):
        send_to_pagerduty(payload)
    elif event == "stale_data":  # scheduler itself is falling behind
        send_to_pagerduty(payload)

Prometheus & Grafana Stack

For production deployments, use the pre-built observability stack in the monitoring/ directory:

cd monitoring/
docker compose up -d

This gives you:

  • Prometheus scraping pg_trickle metrics via postgres_exporter
  • Grafana with a pre-provisioned dashboard
  • Alerting rules for staleness, errors, CDC lag, and scheduler health

See Prometheus & Grafana Integration for full setup details.

Diagnostic Workflow

When something is wrong, follow this systematic workflow:

Step 1 — Global health

SELECT * FROM pgtrickle.health_check() WHERE severity != 'OK';

Step 2 — Status and staleness

SELECT name, status, consecutive_errors, staleness
FROM pgtrickle.pgt_status()
ORDER BY staleness DESC NULLS FIRST;

Step 3 — Recent refresh activity

SELECT start_time, stream_table, action, status, error_message
FROM pgtrickle.refresh_timeline(20);

Step 4 — Error details for a specific stream table

SELECT * FROM pgtrickle.diagnose_errors('my_stream_table');

Step 5 — CDC pipeline

SELECT stream_table, source_table, pending_rows, buffer_bytes
FROM pgtrickle.change_buffer_sizes()
ORDER BY pending_rows DESC;

Step 6 — Trigger verification

SELECT source_table, trigger_type, trigger_name
FROM pgtrickle.trigger_inventory()
WHERE NOT present OR NOT enabled;

Common Alert Responses

Alert                 | Likely Cause                                      | Action
----------------------|---------------------------------------------------|-------
stale_data            | Scheduler behind, long refresh, or lock contention | Check pgt_status() and refresh_timeline()
auto_suspended        | Repeated refresh failures                          | Fix root cause, then resume_stream_table()
fuse_blown            | Bulk load exceeded fuse ceiling                    | Investigate, then reset_fuse()
buffer_growth_warning | Scheduler not consuming buffers fast enough        | Check scheduler status and refresh errors
reinitialize_needed   | Source table DDL changed                           | Verify schema compatibility; the scheduler handles this automatically

Tutorial: ETL & Bulk Load Patterns

pg_trickle provides source gating (v0.5.0+) and watermark gating (v0.7.0+) to coordinate stream table refreshes with ETL pipelines and bulk data loads. This tutorial covers common patterns for pausing refreshes during loads and resuming them safely afterward.

The Problem

When you bulk-load data into a source table (e.g., a nightly ETL job), the change buffer fills rapidly. Without coordination:

  • A differential refresh mid-load sees a partial batch, producing incomplete results
  • The adaptive fallback may trigger repeated FULL refreshes during the load
  • The fuse circuit breaker may blow, requiring manual intervention

Source gating solves this by telling pg_trickle to skip refreshes for gated sources until the load completes.

Recipe 1 — Single Source Bulk Load

The simplest pattern: gate the source, load data, ungate.

-- 1. Gate the source table — all dependent stream tables pause
SELECT pgtrickle.gate_source('public.orders');

-- 2. Perform the bulk load
COPY orders FROM '/data/orders_20260331.csv' WITH (FORMAT csv, HEADER);
-- or: INSERT INTO orders SELECT ... FROM staging_orders;

-- 3. Ungate — stream tables resume and process the full batch
SELECT pgtrickle.ungate_source('public.orders');

While gated, the scheduler skips all stream tables that depend on the gated source. Changes still accumulate in the CDC buffer and are processed in a single batch after ungating.
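
You can watch the batch accumulate while the gate is closed (a sketch, assuming source_table is reported schema-qualified as in the gate_source() calls above):

SELECT stream_table, source_table, pending_rows, buffer_bytes
FROM pgtrickle.change_buffer_sizes()
WHERE source_table = 'public.orders';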

Recipe 2 — Coordinated Multi-Source Load

When your ETL loads multiple tables that feed into the same stream table:

-- Gate all sources involved in the load
SELECT pgtrickle.gate_source('public.orders');
SELECT pgtrickle.gate_source('public.customers');
SELECT pgtrickle.gate_source('public.products');

-- Load all tables
COPY orders FROM '/data/orders.csv' WITH (FORMAT csv, HEADER);
COPY customers FROM '/data/customers.csv' WITH (FORMAT csv, HEADER);
COPY products FROM '/data/products.csv' WITH (FORMAT csv, HEADER);

-- Ungate all at once — stream tables see a consistent snapshot
SELECT pgtrickle.ungate_source('public.orders');
SELECT pgtrickle.ungate_source('public.customers');
SELECT pgtrickle.ungate_source('public.products');

Recipe 3 — Gate + Deferred Stream Table Creation

For initial deployments where data must be loaded before stream tables are created:

-- 1. Gate the source before any stream tables exist
SELECT pgtrickle.gate_source('public.orders');

-- 2. Load the initial data
COPY orders FROM '/data/historical_orders.csv' WITH (FORMAT csv, HEADER);

-- 3. Create stream tables — they won't refresh yet (source is gated)
SELECT pgtrickle.create_stream_table(
    name     => 'order_totals',
    query    => 'SELECT customer_id, SUM(amount) AS total FROM orders GROUP BY customer_id',
    schedule => '1m'
);

-- 4. Ungate — the first refresh processes all data cleanly
SELECT pgtrickle.ungate_source('public.orders');

Recipe 4 — Nightly Batch Pattern

A common production pattern using a scheduled batch job:

-- Run nightly at 02:00 UTC

-- Step 1: Gate all ETL sources
DO $$
DECLARE
    src TEXT;
BEGIN
    FOR src IN SELECT DISTINCT source_table
               FROM pgtrickle.list_sources('daily_report')
    LOOP
        PERFORM pgtrickle.gate_source(src);
    END LOOP;
END;
$$;

-- Step 2: Run the ETL pipeline
CALL etl.load_daily_data();

-- Step 3: Ungate all sources
DO $$
DECLARE
    gated RECORD;
BEGIN
    FOR gated IN SELECT source_name FROM pgtrickle.source_gates()
                 WHERE is_gated = true
    LOOP
        PERFORM pgtrickle.ungate_source(gated.source_name);
    END LOOP;
END;
$$;

Monitoring During a Gated Load

While sources are gated, verify the gate status:

-- Check which sources are currently gated
SELECT * FROM pgtrickle.source_gates();

-- Bootstrap gate status (v0.6.0+)
SELECT * FROM pgtrickle.bootstrap_gate_status();

Combining with the Fuse Circuit Breaker

For extra safety, combine gating with the fuse circuit breaker:

-- Arm the fuse as a safety net
SELECT pgtrickle.alter_stream_table('order_totals',
    fuse         => 'on',
    fuse_ceiling => 500000
);

-- Gate for controlled loads
SELECT pgtrickle.gate_source('public.orders');
-- ... load data ...
SELECT pgtrickle.ungate_source('public.orders');

-- The fuse catches any unexpected bulk changes outside the gated window

Watermark Gating (v0.7.0+)

Watermark gating extends source gating with LSN-based coordination for more precise control:

-- Set a watermark — refreshes only consume changes up to this LSN
SELECT pgtrickle.set_watermark('public.orders', pg_current_wal_lsn());

-- Load new data (changes accumulate beyond the watermark)
COPY orders FROM '/data/new_orders.csv' WITH (FORMAT csv, HEADER);

-- Advance the watermark to include the new data
SELECT pgtrickle.advance_watermark('public.orders', pg_current_wal_lsn());

-- Or clear the watermark entirely
SELECT pgtrickle.clear_watermark('public.orders');

See the SQL Reference — Watermark Gating for the complete API.

Tutorial: Migrating from Materialized Views

This guide shows how to incrementally migrate existing PostgreSQL MATERIALIZED VIEW + manual REFRESH workflows to pg_trickle stream tables.

Coming from a different background?

The step-by-step guide below covers the PostgreSQL materialized view path. If you are migrating from a different system, start here:

You are migrating from                         | Jump to
-----------------------------------------------|--------
PostgreSQL MATERIALIZED VIEW + REFRESH         | This guide
pg_ivm                                         | Migrating from pg_ivm
Cron-based REFRESH (pg_cron, OS cron)          | Step 6 — Remove external refresh jobs
Application-level refresh (manual SQL in code) | Step 2 — Create the stream table
Debezium + Materialize / RisingWave            | Port your queries to PostgreSQL SQL, then follow this guide. See Comparisons for feature mapping.
Looker PDTs (Persistent Derived Tables)        | PDTs map closely to stream tables. Translate the PDT SQL to a create_stream_table() call; the schedule replaces the PDT caching strategy.
Snowflake Dynamic Tables                       | The concepts are nearly identical. Map TARGET_LAG to a pg_trickle schedule; DOWNSTREAM is schedule => 'calculated'. See Comparisons.
Homemade ETL pipeline (INSERT ... SELECT)      | Replace the periodic ETL job with a stream table using the same SELECT query.

Why Migrate?

                    | Materialized View                            | Stream Table
--------------------|----------------------------------------------|-------------
Refresh             | Manual (REFRESH MATERIALIZED VIEW)           | Automatic (scheduler) or manual
Incremental refresh | Not supported                                | Built-in differential mode
Blocking reads      | REFRESH without CONCURRENTLY blocks readers  | Never blocks readers
Dependency ordering | Manual                                       | Automatic (DAG-aware topological refresh)
Monitoring          | None                                         | Built-in views, stats, NOTIFY alerts
Scheduling          | External (cron, pg_cron)                     | Native (duration, cron, CALCULATED)

Step-by-Step Migration

1. Identify materialized views to migrate

-- List all materialized views with their defining queries
SELECT schemaname, matviewname, definition
FROM pg_matviews
ORDER BY schemaname, matviewname;

2. Create the stream table

Take the materialized view's defining query and pass it to create_stream_table():

Before (materialized view):

CREATE MATERIALIZED VIEW order_totals AS
SELECT customer_id, SUM(amount) AS total, COUNT(*) AS order_count
FROM orders
GROUP BY customer_id;

-- Refreshed via cron or pg_cron:
-- */5 * * * * psql -c "REFRESH MATERIALIZED VIEW CONCURRENTLY order_totals"

After (stream table):

SELECT pgtrickle.create_stream_table(
    name     => 'order_totals',
    query    => 'SELECT customer_id, SUM(amount) AS total, COUNT(*) AS order_count
                 FROM orders GROUP BY customer_id',
    schedule => '5m'
);

3. Update application queries

Stream tables live in the pgtrickle schema by default. Update your application queries to reference the new location:

-- Before:
SELECT * FROM public.order_totals WHERE total > 1000;

-- After:
SELECT * FROM pgtrickle.order_totals WHERE total > 1000;

Or create a view in the original schema for backward compatibility:

CREATE VIEW public.order_totals AS
SELECT customer_id, total, order_count
FROM pgtrickle.order_totals;

4. Recreate indexes

Stream tables are regular heap tables — you can add indexes just like any other table. Recreate the indexes your queries depend on:

-- Before (on materialized view):
CREATE UNIQUE INDEX ON order_totals (customer_id);

-- After (on stream table):
CREATE INDEX ON pgtrickle.order_totals (customer_id);

Note: The __pgt_row_id column is the primary key on stream tables. You cannot add a separate primary key, but you can add regular or unique indexes on your business columns.

5. Remove the old materialized view

Once you've verified the stream table is working correctly:

DROP MATERIALIZED VIEW IF EXISTS public.order_totals;

6. Remove external refresh jobs

Delete any cron jobs, pg_cron entries, or application-level refresh triggers that were maintaining the old materialized view.
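
For pg_cron specifically, a cleanup sketch (the LIKE pattern is illustrative — adjust it to your job definitions):

-- Inspect jobs that refresh the old materialized view
SELECT jobid, schedule, command
FROM cron.job
WHERE command ILIKE '%order_totals%';

-- Remove them
SELECT cron.unschedule(jobid)
FROM cron.job
WHERE command ILIKE '%REFRESH MATERIALIZED VIEW%order_totals%';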

Migrating Concurrent Refresh Patterns

If you use REFRESH MATERIALIZED VIEW CONCURRENTLY (which requires a unique index), the stream table equivalent is simpler — differential refresh never blocks readers and doesn't require a unique index:

Before:

CREATE MATERIALIZED VIEW active_users AS
SELECT user_id, MAX(login_at) AS last_login
FROM logins
WHERE login_at > NOW() - INTERVAL '30 days'
GROUP BY user_id;

CREATE UNIQUE INDEX ON active_users (user_id);
REFRESH MATERIALIZED VIEW CONCURRENTLY active_users;

After:

SELECT pgtrickle.create_stream_table(
    name     => 'active_users',
    query    => 'SELECT user_id, MAX(login_at) AS last_login
                 FROM logins
                 WHERE login_at > NOW() - INTERVAL ''30 days''
                 GROUP BY user_id',
    schedule => '1m'
);
-- No unique index needed. No manual refresh needed.

Migrating Cascading Materialized Views

If you have materialized views that depend on other materialized views, the migration is straightforward — pg_trickle handles dependency ordering automatically:

Before:

CREATE MATERIALIZED VIEW order_totals AS
SELECT customer_id, SUM(amount) AS total FROM orders GROUP BY customer_id;

CREATE MATERIALIZED VIEW big_customers AS
SELECT customer_id, total FROM order_totals WHERE total > 1000;

-- Must refresh in order:
REFRESH MATERIALIZED VIEW order_totals;
REFRESH MATERIALIZED VIEW big_customers;

After:

SELECT pgtrickle.create_stream_table(
    name     => 'order_totals',
    query    => 'SELECT customer_id, SUM(amount) AS total FROM orders GROUP BY customer_id',
    schedule => '1m'
);

SELECT pgtrickle.create_stream_table(
    name     => 'big_customers',
    query    => 'SELECT customer_id, total FROM pgtrickle.order_totals WHERE total > 1000',
    schedule => '1m'
);
-- Dependency ordering is automatic. No manual refresh needed.

Idempotent Deployment

For CI/CD pipelines, use create_or_replace_stream_table() so your migration scripts are safe to re-run:

SELECT pgtrickle.create_or_replace_stream_table(
    name         => 'order_totals',
    query        => 'SELECT customer_id, SUM(amount) AS total FROM orders GROUP BY customer_id',
    schedule     => '5m',
    refresh_mode => 'DIFFERENTIAL'
);

Choosing the Right Refresh Mode

Scenario                                          | Mode
--------------------------------------------------|-----
Most migrations                                   | DIFFERENTIAL — only processes changes
Volatile functions (NOW(), RANDOM()) in the query | FULL — the query result changes even without source DML
Need real-time consistency within a transaction   | IMMEDIATE
Unsure                                            | AUTO (default) — pg_trickle picks the best mode per cycle

Migration Checklist

  • Identify all materialized views and their refresh schedules
  • Create equivalent stream tables with matching queries
  • Recreate any required indexes on the stream tables
  • Update application queries to reference the pgtrickle schema
  • Verify data correctness (compare stream table vs. materialized view)
  • Remove external refresh jobs (cron, pg_cron)
  • Drop the old materialized views
  • Set up monitoring (Prometheus/Grafana or built-in views)

Tutorial: Migrating from pg_ivm to pg_trickle

This guide walks through migrating existing pg_ivm IMMVs (Incrementally Maintained Materialized Views) to pg_trickle stream tables. It covers API mapping, behavioral differences, and a step-by-step migration checklist.

See also: plans/ecosystem/GAP_PG_IVM_COMPARISON.md for the full feature comparison and gap analysis between the two extensions.


Why Migrate?

                              | pg_ivm (IMMV)                    | pg_trickle (Stream Table)
------------------------------|----------------------------------|--------------------------
Maintenance model             | Immediate only (in-transaction)  | Deferred (scheduler) and Immediate
Aggregate functions           | 5 (COUNT, SUM, AVG, MIN, MAX)    | 60+ (all built-in + user-defined)
Window functions              | Not supported                    | Full support
CTEs (recursive)              | Not supported                    | Semi-naive, DRed, recomputation
Subqueries                    | Very limited                     | Full (EXISTS, NOT EXISTS, IN, LATERAL, scalar)
Set operations                | Not supported                    | UNION, INTERSECT, EXCEPT (bag + set)
HAVING clause                 | Not supported                    | Supported
GROUPING SETS / CUBE / ROLLUP | Not supported                    | Auto-rewritten to UNION ALL
DISTINCT ON                   | Not supported                    | Auto-rewritten to ROW_NUMBER
Views as sources              | Not supported                    | Auto-inlined
Cascading views               | Not supported                    | DAG-aware topological scheduling
Background scheduling         | None (manual only)               | Native cron, duration, CALCULATED
Monitoring                    | 1 catalog table                  | 15+ diagnostic functions
Concurrency                   | ExclusiveLock during maintenance | Advisory locks, non-blocking reads
Parallel refresh              | Not supported                    | Worker pool with caps

Concept Mapping

pg_ivm Concept                                    | pg_trickle Equivalent                           | Notes
--------------------------------------------------|--------------------------------------------------|------
IMMV (Incrementally Maintained Materialized View) | Stream table                                     | Same idea — a query result kept incrementally up to date
pgivm.create_immv(name, query)                    | pgtrickle.create_stream_table(name, query)       | pg_trickle adds optional schedule and refresh_mode parameters
pgivm.refresh_immv(name, true)                    | pgtrickle.refresh_stream_table(name)             | Manual refresh
pgivm.refresh_immv(name, false)                   | No direct equivalent                             | pg_trickle has pgtrickle.alter_stream_table(name, enabled => false) to suspend
pgivm.pg_ivm_immv catalog                         | pgtrickle.pgt_stream_tables                      | Plus pgt_status(), refresh_timeline(), etc.
DROP TABLE immv_name                              | pgtrickle.drop_stream_table(name)                | Stream tables must be dropped via the API
ALTER TABLE immv RENAME TO ...                    | pgtrickle.alter_stream_table(old, name => new)   | Rename via API
In-transaction maintenance (AFTER row triggers)   | refresh_mode => 'IMMEDIATE'                      | Same model — triggers fire in the writing transaction
(not available)                                   | refresh_mode => 'DIFFERENTIAL'                   | Deferred incremental refresh via change buffers
(not available)                                   | refresh_mode => 'AUTO'                           | Picks DIFFERENTIAL or FULL automatically
Auto-created indexes on GROUP BY / PK             | Manual CREATE INDEX                              | pg_trickle auto-creates the primary key but not secondary indexes

Step-by-Step Migration

1. Inventory existing IMMVs

List all pg_ivm IMMVs in your database:

-- pg_ivm catalog
SELECT immvrelid::regclass AS immv_name,
       pgivm.get_immv_def(immvrelid) AS defining_query
FROM pgivm.pg_ivm_immv
ORDER BY immvrelid::regclass::text;

Record each IMMV's name, defining query, and any indexes you have created on it.

2. Check query compatibility

pg_trickle supports a superset of pg_ivm's SQL dialect, so any query that works with pg_ivm will work with pg_trickle. However, there are a few things to verify:

  • Data types: pg_ivm requires btree operator classes for all columns (excluding json, xml, point, etc.). pg_trickle has no such restriction.
  • Outer joins: If your IMMV uses outer joins, pg_trickle removes pg_ivm's restrictions (single equijoin, no aggregates, no CASE). Your query may work unchanged or you may be able to simplify workarounds you added for pg_ivm.

3. Choose a refresh mode

For each IMMV, decide which pg_trickle refresh mode to use:

pg_ivm behavior           | pg_trickle refresh mode    | When to choose
--------------------------|----------------------------|---------------
Zero staleness required   | IMMEDIATE                  | Same in-transaction behavior as pg_ivm
Some staleness acceptable | DIFFERENTIAL with schedule | Lower write latency, batched refresh
Let pg_trickle decide     | AUTO (default)             | Recommended for most cases

4. Create stream tables

For each IMMV, create the corresponding stream table:

pg_ivm (before):

SELECT pgivm.create_immv(
    'order_totals',
    'SELECT customer_id, SUM(amount) AS total FROM orders GROUP BY customer_id'
);

pg_trickle — IMMEDIATE mode (same behavior as pg_ivm):

SELECT pgtrickle.create_stream_table(
    'order_totals',
    'SELECT customer_id, SUM(amount) AS total FROM orders GROUP BY customer_id',
    NULL,          -- no schedule needed for IMMEDIATE
    'IMMEDIATE'
);

pg_trickle — deferred mode (lower write latency):

SELECT pgtrickle.create_stream_table(
    'order_totals',
    'SELECT customer_id, SUM(amount) AS total FROM orders GROUP BY customer_id',
    '30s'          -- refresh every 30 seconds; mode defaults to AUTO
);

5. Recreate indexes

pg_ivm auto-creates indexes on GROUP BY, DISTINCT, and primary key columns. pg_trickle auto-creates the primary key (__pgt_row_id) but not secondary indexes.

Recreate any indexes that your read queries depend on:

-- Example: index on the GROUP BY column for lookup queries
CREATE INDEX ON pgtrickle.order_totals (customer_id);

6. Update application queries

pg_ivm IMMVs live in the schema where they were created (usually public). pg_trickle stream tables default to the pgtrickle schema.

-- Before (pg_ivm):
SELECT * FROM public.order_totals WHERE customer_id = 42;

-- After (pg_trickle):
SELECT * FROM pgtrickle.order_totals WHERE customer_id = 42;

To avoid changing application code, create a compatibility view:

CREATE VIEW public.order_totals AS
SELECT * FROM pgtrickle.order_totals;

7. Verify correctness

After creating the stream table and running a refresh, compare results:

-- Compare row counts
SELECT 'immv' AS source, COUNT(*) FROM public.order_totals_immv
UNION ALL
SELECT 'stream_table', COUNT(*) FROM pgtrickle.order_totals;

-- Full diff (should return zero rows)
(SELECT * FROM public.order_totals_immv EXCEPT SELECT * FROM pgtrickle.order_totals)
UNION ALL
(SELECT * FROM pgtrickle.order_totals EXCEPT SELECT * FROM public.order_totals_immv);

8. Drop the old IMMV

Once you have verified the stream table is correct and applications are updated:

DROP TABLE public.order_totals_immv;

9. (Optional) Remove pg_ivm

After all IMMVs are migrated:

DROP EXTENSION pg_ivm CASCADE;

Remove pg_ivm from shared_preload_libraries if it was listed there and restart PostgreSQL.


Behavioral Differences to Be Aware Of

Locking

  • pg_ivm: Holds ExclusiveLock on the IMMV during maintenance. In REPEATABLE READ / SERIALIZABLE, concurrent writes to the same IMMV's base tables may raise serialization errors.
  • pg_trickle (IMMEDIATE): Uses advisory locks. Concurrent reads of the stream table are never blocked.
  • pg_trickle (deferred): Base table writes only insert into change buffers (~2–50 μs). No lock contention with refresh.

TRUNCATE

  • pg_ivm: Synchronously truncates or fully refreshes the IMMV.
  • pg_trickle (IMMEDIATE): Performs a full refresh within the same transaction.
  • pg_trickle (deferred): Clears the change buffer and queues a full refresh on the next cycle.

Logical Replication

  • pg_ivm: Not compatible with logical replication — subscriber nodes do not have triggers that fire for replicated changes.
  • pg_trickle (deferred): Supports WAL-based CDC (pg_trickle.cdc_mode = 'wal') which reads from the WAL directly. Trigger-based CDC also works with logical replication if triggers are created on the subscriber.

Schema Changes

  • pg_ivm: No automatic DDL tracking. If a base table column is altered, the IMMV may break silently.
  • pg_trickle: Event triggers detect DDL changes on source tables and automatically reinitialize affected stream tables.

Upgrading Queries That pg_ivm Couldn't Handle

pg_ivm's SQL restrictions often force users to create workarounds. With pg_trickle, many of these workarounds can be simplified:

HAVING clauses

-- pg_ivm workaround: filter in application or wrap in a view
SELECT pgivm.create_immv('big_customers',
    'SELECT customer_id, SUM(amount) AS total
     FROM orders GROUP BY customer_id'
);
-- Then: SELECT * FROM big_customers WHERE total > 1000;

-- pg_trickle: use HAVING directly
SELECT pgtrickle.create_stream_table('big_customers',
    'SELECT customer_id, SUM(amount) AS total
     FROM orders GROUP BY customer_id
     HAVING SUM(amount) > 1000'
);

NOT EXISTS / anti-joins

-- pg_ivm: not supported — manual workaround required

-- pg_trickle: works directly
SELECT pgtrickle.create_stream_table('orphan_orders',
    'SELECT o.* FROM orders o
     WHERE NOT EXISTS (SELECT 1 FROM customers c WHERE c.id = o.customer_id)'
);

Window functions

-- pg_ivm: not supported

-- pg_trickle: works directly
SELECT pgtrickle.create_stream_table('ranked_products',
    'SELECT product_id, category, revenue,
            RANK() OVER (PARTITION BY category ORDER BY revenue DESC) AS rnk
     FROM product_revenue'
);

UNION ALL pipelines

-- pg_ivm: not supported — requires separate IMMVs + application-side UNION

-- pg_trickle: works directly
SELECT pgtrickle.create_stream_table('all_events',
    'SELECT id, ts, ''order'' AS type FROM order_events
     UNION ALL
     SELECT id, ts, ''return'' AS type FROM return_events'
);

Monitoring After Migration

pg_trickle provides extensive monitoring that pg_ivm does not offer:

-- Overall health
SELECT * FROM pgtrickle.health_check();

-- Status of all stream tables (includes staleness, last refresh, error count)
SELECT * FROM pgtrickle.pgt_status();

-- Recent refresh history across all stream tables
SELECT * FROM pgtrickle.refresh_timeline(20);

-- CDC pipeline health
SELECT * FROM pgtrickle.change_buffer_sizes();

-- Diagnose errors for a specific stream table
SELECT * FROM pgtrickle.diagnose_errors('order_totals');

See SQL Reference for the complete list of monitoring functions.

dbt-pgtrickle

dbt-pgtrickle is the official dbt adapter for pg_trickle. It lets you define stream tables as dbt models using standard {{ config() }} blocks, manage them through dbt run / dbt build, and run incremental refreshes as part of your dbt pipeline.

Quick example

-- models/orders_agg.sql
{{ config(
    materialized = 'stream_table',
    schedule     = '5m',
    refresh_mode = 'DIFFERENTIAL'
) }}

SELECT customer_id,
       COUNT(*)         AS order_count,
       SUM(amount)      AS total_spent
FROM {{ ref('orders') }}
GROUP BY customer_id

Run it:

dbt run --select orders_agg

The model is created as a stream table and refreshed automatically. Subsequent dbt run invocations update the defining query if it changed (via ALTER QUERY), without dropping and recreating the table.

Installation

pip install dbt-pgtrickle

Requires dbt-postgres 1.7+ and pg_trickle v0.30+.

The full configuration reference, supported materializations, macros, testing guide, and CI setup are in the dbt-pgtrickle README.


dbt-pgtrickle

A dbt package that integrates pg_trickle stream tables into your dbt project via a custom stream_table materialization.

No custom Python adapter required — works with the standard dbt-postgres adapter. Just Jinja SQL macros that call pg_trickle's SQL API.

Prerequisites

Requirement           | Minimum Version
----------------------|----------------
dbt Core              | ≥ 1.9
dbt-postgres adapter  | Matching dbt Core version
PostgreSQL            | 18.x
pg_trickle extension  | ≥ 0.1.0 (CREATE EXTENSION pg_trickle;)

Installation

Add to your packages.yml:

packages:
  - git: "https://github.com/trickle-labs/pg-trickle.git"
    revision: v0.15.0
    subdirectory: "dbt-pgtrickle"

From dbt Hub (once published)

After the package is listed on dbt Hub, you can install by package name:

packages:
  - package: grove/dbt_pgtrickle
    version: [">=0.15.0", "<1.0.0"]

Note: dbt Hub listing requires a separate GitHub repository for the package. See docs/integrations/dbt-hub-submission.md for the submission checklist and steps.

Then run:

dbt deps

Quick Start

Create a model with materialized='stream_table':

-- models/marts/order_totals.sql
{{
  config(
    materialized='stream_table',
    schedule='5m',
    refresh_mode='DIFFERENTIAL'
  )
}}

SELECT
    customer_id,
    SUM(amount) AS total_amount,
    COUNT(*) AS order_count
FROM {{ source('raw', 'orders') }}
GROUP BY customer_id

dbt run --select order_totals   # Creates the stream table
dbt test --select order_totals  # Tests work normally (it's a real table)

Configuration Reference

Key                 | Type        | Default        | Description
--------------------|-------------|----------------|------------
materialized        | string      |                | Must be 'stream_table'
schedule            | string/null | '1m'           | Refresh schedule (e.g., '5m', '1h', cron). null for pg_trickle's CALCULATED schedule.
refresh_mode        | string      | 'DIFFERENTIAL' | 'FULL', 'DIFFERENTIAL', 'AUTO', or 'IMMEDIATE'
initialize          | bool        | true           | Populate on creation
status              | string/null | null           | 'ACTIVE' or 'PAUSED'. When set, applies on subsequent runs via alter_stream_table().
stream_table_name   | string      | model name     | Override stream table name
stream_table_schema | string      | target schema  | Override schema
cdc_mode            | string/null | null           | CDC mode override: 'auto', 'trigger', or 'wal'. null uses the GUC default.
partition_by        | string/null | null           | Column name for RANGE partitioning of the storage table (v0.13.0+). Cannot be changed after creation.
fuse                | string/null | null           | Fuse circuit-breaker mode: 'off', 'on', or 'auto' (v0.13.0+). Applied via alter_stream_table() on every run; no-op if unchanged.
fuse_ceiling        | int/null    | null           | Change-count threshold that triggers the fuse (v0.13.0+). null uses the global GUC default.
fuse_sensitivity    | int/null    | null           | Number of consecutive over-ceiling observations before the fuse blows (v0.13.0+). null means 1 (immediate).

partition_by — RANGE partitioning

Partition the stream table's storage table by a column value. pg_trickle creates a PARTITION BY RANGE (<col>) storage table with a default catch-all partition. Add your own date/integer range partitions via standard PostgreSQL DDL after dbt run.

-- models/marts/events_by_day.sql
{{ config(
    materialized='stream_table',
    schedule='1m',
    refresh_mode='DIFFERENTIAL',
    partition_by='event_day'
) }}

SELECT
    event_day,
    user_id,
    COUNT(*) AS event_count
FROM {{ source('raw', 'events') }}
GROUP BY event_day, user_id

Note: partition_by is applied only at creation time. Changing it after the stream table exists has no effect. Use dbt run --full-refresh to recreate with a new partition key.
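
For example, assuming the partitioned storage table is addressable under the stream table's name, adding a monthly partition is standard PostgreSQL DDL (partition name and bounds are illustrative):

-- Attach a dedicated partition for March 2026 events
CREATE TABLE pgtrickle.events_by_day_2026_03
    PARTITION OF pgtrickle.events_by_day
    FOR VALUES FROM ('2026-03-01') TO ('2026-04-01');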

fuse — Circuit breaker

The fuse circuit breaker suspends refreshes when the change volume exceeds a threshold, protecting against runaway refresh cycles during bulk ingestion.

-- models/marts/order_totals.sql
{{ config(
    materialized='stream_table',
    schedule='5m',
    refresh_mode='DIFFERENTIAL',
    fuse='auto',
    fuse_ceiling=50000,
    fuse_sensitivity=3
) }}

SELECT customer_id, SUM(amount) AS total
FROM {{ source('raw', 'orders') }}
GROUP BY customer_id

fuse value | Behaviour
-----------|----------
'off'      | Fuse disabled (default)
'on'       | Fuse always active; blows when ceiling is exceeded
'auto'     | Fuse activates only when the delta is large enough to make FULL refresh cheaper than DIFFERENTIAL

Fuse parameters are applied on every dbt run via alter_stream_table(); the materialization only calls the SQL function when the values have actually changed from the catalog state.

Project-level defaults

# dbt_project.yml
models:
  my_project:
    marts:
      +materialized: stream_table
      +schedule: '5m'
      +refresh_mode: DIFFERENTIAL

Operations

pgtrickle_refresh — Manual refresh

dbt run-operation pgtrickle_refresh --args '{"model_name": "order_totals"}'

refresh_all_stream_tables — Refresh all in dependency order

Refreshes all dbt-managed stream tables in topological (dependency) order. Upstream tables are refreshed before downstream ones. Designed for CI pipelines: run after dbt run and before dbt test to ensure all data is current.

# Refresh all dbt-managed stream tables
dbt run-operation refresh_all_stream_tables

# Refresh only stream tables in a specific schema
dbt run-operation refresh_all_stream_tables --args '{"schema": "analytics"}'

drop_all_stream_tables — Drop dbt-managed stream tables

Drops only stream tables defined as dbt models (safe in shared environments):

dbt run-operation drop_all_stream_tables

drop_all_stream_tables_force — Drop ALL stream tables

Drops everything from the pg_trickle catalog, including non-dbt stream tables:

dbt run-operation drop_all_stream_tables_force

pgtrickle_check_cdc_health — CDC pipeline health

dbt run-operation pgtrickle_check_cdc_health

Raises an error (non-zero exit) if any CDC source is unhealthy.

Freshness Monitoring

Native dbt source freshness is not supported (the last_refresh_at column lives in the catalog, not on the stream table). Use the pgtrickle_check_freshness run-operation instead:

# Check all active stream tables (defaults: warn=600s, error=1800s)
dbt run-operation pgtrickle_check_freshness

# Custom thresholds
dbt run-operation pgtrickle_check_freshness \
  --args '{model_name: order_totals, warn_seconds: 300, error_seconds: 900}'

Exits non-zero when any stream table exceeds the error threshold — safe for CI.

Useful dbt Commands

# List all stream table models
dbt ls --select config.materialized:stream_table

# Full refresh (drop + recreate)
dbt run --select order_totals --full-refresh

# Build models + tests in DAG order
dbt build --select order_totals

Note: dbt build runs stream table models early in the DAG. If downstream models depend on a stream table with initialize: false, the table may not be populated yet.

Testing

Stream tables are standard PostgreSQL heap tables — all dbt tests work normally:

models:
  - name: order_totals
    columns:
      - name: customer_id
        tests:
          - not_null
          - unique

Stream Table Health Test

Use the built-in stream_table_healthy generic test to fail your dbt test suite when a stream table is stale, erroring, or paused:

models:
  - name: order_totals
    tests:
      - dbt_pgtrickle.stream_table_healthy:
          warn_seconds: 300  # fail if stale for more than 5 minutes

The test queries pgtrickle.pg_stat_stream_tables and returns rows for any unhealthy condition. An empty result set means the stream table is healthy.
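
A rough SQL equivalent of what the test checks, using only the pg_stat_stream_tables columns shown in the Monitoring tutorial (the exact predicate in the packaged test may differ):

SELECT pgt_name, consecutive_errors, stale
FROM pgtrickle.pg_stat_stream_tables
WHERE pgt_name = 'order_totals'
  AND (stale OR consecutive_errors > 0);
-- Zero rows means the stream table is healthy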

Stream Table Status Macro

For more programmatic control, use the pgtrickle_stream_table_status() macro directly in custom tests or run-operations:

{%- set st = dbt_pgtrickle.pgtrickle_stream_table_status('order_totals', warn_seconds=300) -%}
{# st.status is one of: 'healthy', 'stale', 'erroring', 'paused', 'not_found' #}
{# st.staleness_seconds, st.consecutive_errors, st.total_refreshes, etc. #}

__pgt_row_id Column

pg_trickle adds an internal __pgt_row_id column to stream tables for row identity tracking. This column:

  • Appears in SELECT * and dbt docs generate
  • Does not affect dbt test unless you check column counts
  • Can be documented to reduce confusion:
columns:
  - name: __pgt_row_id
    description: "Internal pg_trickle row identity hash. Ignore this column."

Limitations

Limitation                                    | Workaround
----------------------------------------------|-----------
No in-place query alteration                  | Materialization auto-drops and recreates when the query changes
__pgt_row_id visible                          | Document it; exclude it in downstream SELECTs
No native dbt source freshness                | Use the pgtrickle_check_freshness run-operation
No dbt snapshot support                       | Snapshot the stream table as a regular table
Query change detection is whitespace-sensitive| dbt compiles deterministically; unnecessary recreations are safe
PostgreSQL 18 required                        | Extension requirement
Shared version tags with pg_trickle extension | Pin to a specific git revision

Contributing

See AGENTS.md for development guidelines and the implementation plan for design rationale.

Running tests locally

The quickest way (requires Docker and dbt installed):

# Full run — builds Docker image, starts container, runs tests, cleans up
just test-dbt

# Fast run — reuses existing Docker image (run after first build)
just test-dbt-fast

Or use the script directly with options:

cd dbt-pgtrickle/integration_tests/scripts

# Default: builds image, runs tests with dbt 1.9, cleans up
./run_dbt_tests.sh

# Skip image rebuild (faster iteration)
./run_dbt_tests.sh --skip-build

# Keep the container running after tests (for debugging)
./run_dbt_tests.sh --skip-build --keep-container

# Use a custom port (avoids conflicts with local PostgreSQL)
PGPORT=25432 ./run_dbt_tests.sh

Manual testing against an existing pg_trickle instance

If you already have PostgreSQL 18 + pg_trickle running locally:

export PGHOST=localhost PGPORT=5432 PGUSER=postgres PGPASSWORD=postgres PGDATABASE=postgres
cd dbt-pgtrickle/integration_tests
dbt deps
dbt seed
dbt run
./scripts/wait_for_populated.sh order_totals 30
dbt test
dbt run-operation drop_all_stream_tables

License

Apache 2.0 — see LICENSE.

CloudNativePG / Kubernetes

pg_trickle is designed to work with CloudNativePG (CNPG) — the Kubernetes operator for PostgreSQL. The extension is loaded via Image Volume Extensions, meaning no custom PostgreSQL image is needed.

Prerequisites

  • Kubernetes 1.33+ with the ImageVolume feature gate enabled
  • CloudNativePG operator 1.28+
  • The pg_trickle-ext OCI image available in your cluster registry

Architecture

┌─────────────────────────────────────┐
│  CNPG Cluster (3 pods)              │
│                                     │
│  ┌──────────┐  ┌──────────────────┐ │
│  │ Primary  │  │ pg_trickle-ext   │ │
│  │ PG 18    │◄─┤ (ImageVolume)    │ │
│  │          │  │ .so + .sql only  │ │
│  └──────────┘  └──────────────────┘ │
│  ┌──────────┐  ┌──────────┐        │
│  │ Replica 1│  │ Replica 2│        │
│  │ (standby)│  │ (standby)│        │
│  └──────────┘  └──────────┘        │
└─────────────────────────────────────┘
  • The scheduler runs on the primary pod only. Replica pods detect recovery mode (pg_is_in_recovery() = true) and sleep.
  • Stream tables are replicated to standbys via physical streaming replication like any other heap table.
  • Pod restarts are safe — the scheduler resumes from the stored frontier with no data loss.
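
You can confirm on which pod the scheduler is active with a standard recovery check (pg_is_in_recovery() is core PostgreSQL):

-- false on the primary (scheduler active), true on standbys (scheduler sleeps)
SELECT pg_is_in_recovery();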

Deploying pg_trickle on CNPG

1. Build the extension image

The cnpg/Dockerfile.ext builds a scratch-based OCI image containing only the shared library, control file, and SQL migrations:

# From the dist/ directory with pre-built artifacts:
docker build -t ghcr.io/<owner>/pg_trickle-ext:0.13.0 -f cnpg/Dockerfile.ext dist/
docker push ghcr.io/<owner>/pg_trickle-ext:0.13.0

2. Deploy the Cluster

Apply the Cluster manifest with pg_trickle configured as an Image Volume extension:

# cnpg/cluster-example.yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: pg-trickle-demo
spec:
  instances: 3
  imageName: ghcr.io/cloudnative-pg/postgresql:18

  postgresql:
    shared_preload_libraries:
      - pg_trickle
    extensions:
      - name: pg-trickle
        image:
          reference: ghcr.io/<owner>/pg_trickle-ext:0.13.0
    parameters:
      max_worker_processes: "8"

  bootstrap:
    initdb:
      database: app
      owner: app

  storage:
    size: 10Gi
    storageClass: standard

kubectl apply -f cnpg/cluster-example.yaml

3. Enable the extension

Use the CNPG Database resource for declarative extension management:

# cnpg/database-example.yaml
apiVersion: postgresql.cnpg.io/v1
kind: Database
metadata:
  name: app
spec:
  cluster:
    name: pg-trickle-demo
  name: app
  owner: app
  extensions:
    - name: pg_trickle

kubectl apply -f cnpg/database-example.yaml

4. Verify

kubectl exec -it pg-trickle-demo-1 -- psql -U postgres -d app -c \
  "SELECT pgtrickle.version();"

Key Considerations

Worker processes

Each database with pg_trickle needs one background worker slot. Set max_worker_processes in the Cluster manifest to accommodate the launcher (1) + one scheduler per database + any parallel refresh workers:

parameters:
  max_worker_processes: "16"
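
As a worked example, assuming three databases running pg_trickle and up to four concurrent parallel refresh workers:

max_worker_processes ≥ 1 (launcher) + 3 (schedulers) + 4 (refresh workers) = 8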

Persistent volumes

Catalog tables (pgtrickle.pgt_stream_tables) and change buffers (pgtrickle_changes.*) are stored in regular PostgreSQL tablespaces. Persistent volume claims preserve them across pod rescheduling.

Backups

pg_trickle state (catalog, change buffers, stream table data) is included in CNPG's Barman object-store backups automatically. After a restore, the scheduler detects frontier inconsistencies and performs a full refresh on the first cycle. See Backup and Restore for details.

Failover

When the primary pod fails and a replica is promoted, the new primary's scheduler starts automatically. Since stream tables were replicated via streaming replication, they are already up-to-date (minus replication lag). The scheduler resumes refreshing from the stored frontier.

Resource limits

For production deployments, set resource requests and limits in the Cluster manifest to prevent the scheduler from starving other workloads:

resources:
  requests:
    memory: 512Mi
    cpu: 500m
  limits:
    memory: 2Gi
    cpu: 2000m

Example manifests

The repository includes ready-to-use manifests in the cnpg/ directory:

| File | Purpose |
| --- | --- |
| cnpg/Dockerfile.ext | Build the scratch-based extension image |
| cnpg/Dockerfile.ext-build | Multi-stage build for CI/CD pipelines |
| cnpg/cluster-example.yaml | Complete Cluster manifest with pg_trickle |
| cnpg/database-example.yaml | Database resource with declarative extension management |

Citus Distributed Tables

pg_trickle supports Citus distributed tables as sources for incremental view maintenance and as output targets for stream tables.

Prerequisites

  • PostgreSQL 18 with wal_level = logical on every node (coordinator and all workers).
  • Citus 12.x or 13.x installed on the coordinator and all workers.
  • The dblink extension installed on the coordinator (CREATE EXTENSION IF NOT EXISTS dblink).
  • pg_trickle installed at the same version on every node.
  • Each source distributed table must have REPLICA IDENTITY FULL:
    ALTER TABLE my_distributed_table REPLICA IDENTITY FULL;
    

Architecture Overview

┌───────────────────────────────────────────────────────────┐
│  Citus Coordinator                                         │
│                                                            │
│  pg_trickle scheduler                                      │
│    ├─ reads coordinator WAL slot (local sources)           │
│    └─ polls worker WAL slots via dblink (distributed)      │
│                                                            │
│  pgtrickle.pgt_worker_slots   ← tracks per-worker slots   │
│  pgtrickle.citus_status       ← observability view        │
└─────────────┬────────────┬────────────────────────────────┘
              │            │
        dblink│      dblink│
              ▼            ▼
┌─────────────────┐  ┌─────────────────┐
│  Citus Worker 1 │  │  Citus Worker 2 │
│  WAL slot:      │  │  WAL slot:      │
│  pgtrickle_...  │  │  pgtrickle_...  │
└─────────────────┘  └─────────────────┘

pg_trickle creates a logical replication slot on each worker for every distributed source table. The coordinator scheduler polls these slots via dblink on every tick, merges the decoded changes into the coordinator-local change buffer, and then applies them to the stream table output.
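
You can list the slots pg_trickle created on any worker with a standard catalog query (pg_replication_slots is core PostgreSQL; the slot-name prefix follows the pgtrickle_ convention used elsewhere on this page):

-- Run on a worker node
SELECT slot_name, plugin, active, confirmed_flush_lsn
FROM pg_replication_slots
WHERE slot_name LIKE 'pgtrickle\_%';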

Installation

1. Verify prerequisites on every node

-- Run on coordinator AND each worker:
SHOW wal_level;            -- must be 'logical'
SELECT extname, extversion FROM pg_extension WHERE extname IN ('citus', 'pg_trickle', 'dblink');

2. Create the extension on the coordinator

CREATE EXTENSION IF NOT EXISTS dblink;
CREATE EXTENSION IF NOT EXISTS pg_trickle;

3. Run pre-flight checks

pg_trickle provides two pre-flight helpers that verify worker readiness:

-- COORD-7: Verify pg_trickle version matches on all workers
SELECT pgtrickle.source_stable_name(0::oid);  -- triggers version check on startup

-- COORD-8: Verify wal_level=logical on all workers
-- (checked automatically when a distributed CDC source is set up)

4. Prepare your distributed source table

-- Distribute your source table if not already distributed
SELECT create_distributed_table('orders', 'customer_id');

-- REPLICA IDENTITY FULL is required for CDC on distributed tables
ALTER TABLE orders REPLICA IDENTITY FULL;

5. Create a stream table over a distributed source

-- Basic stream table (output stored on coordinator)
CALL pgtrickle.create_stream_table(
    name  => 'orders_summary',
    query => 'SELECT customer_id, count(*) AS order_count, sum(amount) AS total
              FROM orders GROUP BY customer_id'
);

-- Distributed output: co-locate the stream table with the source
CALL pgtrickle.create_stream_table(
    name                       => 'orders_summary',
    query                      => 'SELECT customer_id, count(*) AS order_count,
                                           sum(amount) AS total
                                   FROM orders GROUP BY customer_id',
    output_distribution_column => 'customer_id'
);

The output_distribution_column parameter (added in v0.33.0) converts the output storage table into a Citus distributed table on that column immediately after creation. If Citus is not loaded and you pass this parameter, an error is raised.
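
To confirm the output table was distributed as requested, you can inspect Citus's own metadata (the citus_tables view ships with Citus; orders_summary is the example table from above):

SELECT table_name, citus_table_type, distribution_column
FROM citus_tables
WHERE table_name = 'orders_summary'::regclass;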

Placement Options

| Placement | When to use | Created by |
| --- | --- | --- |
| local (default) | Small result sets, coordinator-only queries | create_stream_table() without output_distribution_column |
| distributed | Large result sets, co-location with source shards | output_distribution_column => 'col' |
| reference | Lookup tables replicated to all workers | create_reference_table(st) after creation |

Monitoring

The pgtrickle.citus_status view shows per-worker CDC slot health:

SELECT
    pgt_schema || '.' || pgt_name AS stream_table,
    source_stable_name,
    source_placement,
    worker_name,
    worker_port,
    worker_slot,
    worker_frontier,
    last_polled_at,     -- v0.34.0+
    lease_health        -- v0.34.0+: 'unlocked' | 'locked' | 'expired'
FROM pgtrickle.citus_status
ORDER BY pgt_name, worker_name;

| Column | Description |
| --- | --- |
| coordinator_slot | Local WAL slot name on the coordinator |
| source_placement | distributed, reference, or local |
| worker_name | Hostname of the Citus worker |
| worker_port | Port of the Citus worker |
| worker_slot | WAL slot name on the worker |
| worker_frontier | Last consumed LSN on the worker |
| last_polled_at | Timestamp of the last successful poll for each worker slot (v0.34.0+) |
| lease_holder | Session that currently holds the pgt_st_locks lease, if any (v0.34.0+) |
| lease_acquired_at | When the current lease was acquired (v0.34.0+) |
| lease_expires_at | When the current lease expires (v0.34.0+) |
| lease_health | 'unlocked', 'locked', or 'expired' (v0.34.0+) |

Worker-failure alerting GUC (v0.34.0)

| GUC | Default | Description |
| --- | --- | --- |
| pg_trickle.citus_worker_retry_ticks | 5 | Consecutive per-worker poll failures before raising a WARNING in the PostgreSQL log. Set to 0 to disable. |
| pg_trickle.citus_st_lock_lease_ms | 60000 | Duration (ms) of the pgt_st_locks distributed-refresh lease. Must be ≥ pg_ripple.merge_fence_timeout_ms when pg_ripple is in use. |

Failure Modes

Worker unreachable

If a worker becomes unreachable, poll_worker_slot_changes() returns an error. pg_trickle logs the failure and skips that worker's changes for the current tick. Refresh resumes automatically once the worker is reachable again.

Action: Monitor pgtrickle.citus_status and alert on gaps in worker_frontier.
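
A sketch of that alert condition in SQL, using only the citus_status columns documented above (last_polled_at requires v0.34.0+):

SELECT worker_name, worker_slot, last_polled_at
FROM pgtrickle.citus_status
WHERE last_polled_at IS NULL
   OR last_polled_at < now() - interval '5 minutes';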

WAL slot recycled (slot missing or lag too high)

If the coordinator stops polling a worker slot for too long, PostgreSQL may recycle the WAL and invalidate the slot. pg_trickle will log a WalTransitionError and fall back to a full refresh for that stream table.

Prevention: Set pg_trickle.citus_slot_max_lag_bytes (default: 1 GB) and ensure the coordinator restarts within the slot retention window.

Recovery:

-- Drop the stale slot on the worker (via dblink if needed)
SELECT pg_drop_replication_slot('pgtrickle_<stable_name>');
-- pg_trickle will re-create it on the next scheduler tick

Shard rebalance

Citus shard rebalancing changes which worker holds which shards. Since v0.34.0, pg_trickle detects a topology change automatically (by comparing pg_dist_node active primaries against pgt_worker_slots) and recovers without operator intervention:

  1. Stale slot entries for removed workers are dropped.
  2. New pgt_worker_slots rows are inserted for the incoming workers.
  3. The affected stream table is marked for a full refresh on the next tick.

No manual DROP + CREATE of stream tables is required after a rebalance.

Version mismatch across nodes

If pg_trickle versions differ between the coordinator and workers, check_citus_version_compat() raises an error during CDC setup. Install the same pg_trickle version on all nodes before creating distributed stream tables.

Known Limitations

  • MERGE is not supported for distributed stream tables. pg_trickle automatically uses the DELETE + INSERT … ON CONFLICT DO UPDATE path for distributed output tables.

  • Cross-shard JOINs in the stream table query follow normal Citus pushdown rules. If the plan is not pushable, the query runs on the coordinator.

  • Citus reference tables work as sources with trigger-based CDC only (per-worker WAL slots are not needed for reference tables).

  • Worker failure alerting — configure pg_trickle.citus_worker_retry_ticks (default 5) to control how many consecutive poll failures trigger a WARNING. Set to 0 to disable the alert entirely.

pg_ripple Integration (v0.58.0+)

pg_trickle v0.33.0 and pg_ripple v0.58.0 can be deployed together on a Citus cluster. pg_ripple stores its RDF triples in Vertical Partitioning (VP) tables that are distributed by subject hash (s BIGINT). pg_trickle can track changes to these tables and materialize downstream stream tables.

Co-location Contract

VP tables are distributed on s (the XXH3-128 subject ID encoded as BIGINT). Downstream stream tables consuming VP data should use the same distribution column to avoid coordinator fan-out:

CALL pgtrickle.create_stream_table(
    name                       => 'rdf_subjects',
    query                      => 'SELECT s, count(*) AS triple_count
                                   FROM _pg_ripple.vp_42_delta GROUP BY s',
    output_distribution_column => 's'   -- co-locate with VP shards
);

The natural row identity for such a stream table is (s, predicate_hash, g) — the triple's encoded subject, predicate, and named-graph. Configure pg_trickle with this composite key so the DELETE WHERE row_id IN (…) apply path targets the correct shard.

VP Table Promotion Notifications

When pg_ripple distributes a new VP table it emits a pg_ripple.vp_promoted NOTIFY with the following JSON payload:

{
  "table":             "_pg_ripple.vp_42_delta",
  "shard_count":       32,
  "shard_table_prefix":"_pg_ripple.vp_42_delta_",
  "predicate_id":      42
}

pg_trickle ships a helper function that processes this payload. Use it from any regular backend session that LISTENs to the channel:

LISTEN "pg_ripple.vp_promoted";
-- … wait for pg_notify …
-- (in application code: call handle_vp_promoted with the notification payload)
SELECT pgtrickle.handle_vp_promoted(pg_notification_queue_transfer());
-- or pass the payload directly:
SELECT pgtrickle.handle_vp_promoted(
  '{"table":"_pg_ripple.vp_42_delta","shard_count":32,'
  '"shard_table_prefix":"_pg_ripple.vp_42_delta_","predicate_id":42}'
);

handle_vp_promoted() logs the promotion and, when the VP table is already tracked as a distributed CDC source, signals the scheduler that worker-slot probing should run on the next tick.

Merge Fencing and pgt_st_locks Lease Alignment

pg_ripple's merge worker emits pg_ripple.merge_start / merge_end NOTIFY signals as observability hints — the TRUNCATE+INSERT merge is a single 2PC transaction so no inconsistent state is visible to pg_trickle's per-worker WAL decoders even without these signals.

pg_trickle uses pgtrickle.pgt_st_locks (catalog-based leases) for cross-node coordination. Set the pgt_st_locks lease expiry to at least pg_ripple.merge_fence_timeout_ms to prevent a lease from expiring mid-merge:

-- pg_ripple side (postgresql.conf or SET):
SET pg_ripple.merge_fence_timeout_ms = 30000;   -- 30 seconds

-- pg_trickle side:
SET pg_trickle.citus_st_lock_lease_ms = 45000;  -- 45 seconds (≥ 30s fence)

Monitor both together:

SELECT
    r.predicate_id,
    r.cycle_duration_ms,
    c.stream_table,
    c.worker_frontier
FROM pg_ripple.merge_status()   r
JOIN pgtrickle.citus_status      c
  ON c.source_stable_name LIKE '_pg_ripple_vp_' || r.predicate_id || '_%';

Prerequisites

  • pg_ripple ≥ 0.58.0 (Citus support)
  • pg_trickle ≥ 0.33.0 (distributed CDC + stream tables)
  • Citus 12.x on all nodes
  • pg_ripple.citus_sharding_enabled = on
  • pg_ripple.citus_trickle_compat = on (sets colocate_with='none' on VP tables, avoiding cross-shard tombstone deletes during CDC apply)

Performance Considerations

  • dblink polling adds round-trip latency per worker per tick. On a loopback network, throughput exceeds 50 k rows/s (see benches/bench_remote_slot_poll). If your workload requires higher throughput, consider batching slot polls or increasing the scheduler poll interval.
  • For large distributed stream tables, co-locating the output with the source shards (output_distribution_column) avoids data movement during apply.

Prometheus & Grafana Monitoring

pg_trickle ships with a complete observability stack based on postgres_exporter, Prometheus, and Grafana. The monitoring/ directory in the repository contains everything you need.

Quick Start

cd monitoring/
docker compose up -d

Open Grafana at http://localhost:3000 (default: admin / admin). The pg_trickle Overview dashboard is pre-provisioned.

Architecture

PostgreSQL + pg_trickle
        │
        │  custom SQL queries
        ▼
postgres_exporter (:9187)
        │
        │  /metrics (Prometheus format)
        ▼
   Prometheus (:9090)
        │
        │  data source
        ▼
    Grafana (:3000)

postgres_exporter runs custom SQL queries defined in prometheus/pg_trickle_queries.yml against the pg_trickle monitoring views (pgtrickle.stream_tables_info, pgtrickle.pg_stat_stream_tables, etc.) and exposes them as Prometheus metrics.

Connecting to an Existing Database

If you already have PostgreSQL + pg_trickle running, configure the exporter to point at your instance:

export PG_HOST=your-pg-host
export PG_PORT=5432
export PG_USER=postgres
export PG_PASSWORD=yourpassword
export PG_DATABASE=yourdb
docker compose up -d

Or edit the DATA_SOURCE_NAME in docker-compose.yml directly.

Metrics Exposed

All metrics are prefixed pg_trickle_.

| Metric | Type | Description |
| --- | --- | --- |
| pg_trickle_stream_tables_total | gauge | Total stream tables by status |
| pg_trickle_stale_tables_total | gauge | Tables with data older than schedule |
| pg_trickle_consecutive_errors | gauge | Per-table consecutive error count |
| pg_trickle_refresh_duration_ms | gauge | Average refresh duration (ms) |
| pg_trickle_total_refreshes | counter | Total refresh count per table |
| pg_trickle_failed_refreshes | counter | Failed refresh count per table |
| pg_trickle_rows_inserted_total | counter | Rows inserted per table |
| pg_trickle_rows_deleted_total | counter | Rows deleted per table |
| pg_trickle_staleness_seconds | gauge | Seconds since last successful refresh |
| pg_trickle_cdc_pending_rows | gauge | Pending rows in CDC change buffer |
| pg_trickle_cdc_buffer_bytes | gauge | CDC change buffer size in bytes |
| pg_trickle_scheduler_running | gauge | 1 if scheduler background worker is alive |
| pg_trickle_health_status | gauge | Overall health: 0=OK, 1=WARNING, 2=CRITICAL |

Pre-configured Alerts

Alerting rules are defined in prometheus/alerts.yml:

| Alert | Condition | Severity |
| --- | --- | --- |
| PgTrickleTableStale | Staleness > 5 min past schedule | warning |
| PgTrickleConsecutiveErrors | ≥ 3 consecutive refresh failures | warning |
| PgTrickleTableSuspended | Any table in SUSPENDED status | critical |
| PgTrickleCdcBufferLarge | CDC buffer > 1 GB | warning |
| PgTrickleSchedulerDown | Scheduler not running for > 2 min | critical |
| PgTrickleHighRefreshDuration | Avg refresh > 30 s | warning |

NOTIFY-Based Alerting

In addition to Prometheus alerts, pg_trickle emits real-time PostgreSQL NOTIFY events on the pg_trickle_alert channel:

LISTEN pg_trickle_alert;

Events include stale_data, auto_suspended, reinitialize_needed, buffer_growth_warning, fuse_blown, refresh_completed, and refresh_failed. Each notification carries a JSON payload with the stream table name and relevant details.

You can bridge NOTIFY events to external alerting systems (PagerDuty, Slack, etc.) using tools like pgnotify or a simple LISTEN loop in your application.

Grafana Dashboard

The pre-provisioned pg_trickle Overview dashboard (grafana/dashboards/pg_trickle_overview.json) includes panels for:

  • Stream table status distribution (active / suspended / error)
  • Refresh rate and duration over time
  • Staleness heatmap
  • CDC buffer sizes
  • Consecutive error counts
  • Scheduler uptime

Built-in SQL Monitoring Views

pg_trickle also provides built-in monitoring accessible without Prometheus:

-- Quick health overview (returns warnings and errors)
SELECT * FROM pgtrickle.health_check() WHERE severity != 'OK';

-- Stream table status and staleness
SELECT name, status, refresh_mode, staleness
FROM pgtrickle.stream_tables_info;

-- Detailed refresh statistics
SELECT * FROM pgtrickle.pg_stat_stream_tables;

-- CDC health per source table
SELECT * FROM pgtrickle.check_cdc_health();

-- Change buffer sizes
SELECT * FROM pgtrickle.change_buffer_sizes()
ORDER BY pending_rows DESC;

See the SQL Reference for the complete list of monitoring functions.

Files Reference

| File | Purpose |
| --- | --- |
| monitoring/docker-compose.yml | Demo stack: PG + exporter + Prometheus + Grafana |
| monitoring/prometheus/prometheus.yml | Prometheus scrape configuration |
| monitoring/prometheus/pg_trickle_queries.yml | Custom SQL queries for postgres_exporter |
| monitoring/prometheus/alerts.yml | Alerting rules |
| monitoring/grafana/provisioning/ | Auto-provisioned data source + dashboard |
| monitoring/grafana/dashboards/pg_trickle_overview.json | Overview dashboard |

Requirements

  • Docker 24+ with Compose v2
  • pg_trickle 0.10.0+ installed in the target database
  • PostgreSQL user with SELECT on the pgtrickle.* schema

PgBouncer & Connection Poolers

pg_trickle's background scheduler uses session-level PostgreSQL features. This page explains how to configure pg_trickle alongside connection poolers like PgBouncer, Supavisor (Supabase), and PgCat.

Compatibility Matrix

| Pooling Mode | Compatible? | Notes |
| --- | --- | --- |
| Session mode (pool_mode = session) | ✅ Fully | All features work. |
| Direct connection (no pooler for scheduler) | ✅ Fully | Application queries can still go through a pooler. |
| Transaction mode (pool_mode = transaction) | ❌ Not supported | Advisory locks, prepared statements, and LISTEN/NOTIFY are session-scoped. |
| Statement mode (pool_mode = statement) | ❌ Not supported | Same session-scoped limitations. |

Why Transaction Mode Breaks

The pg_trickle scheduler relies on three session-level features:

| Feature | Problem in Transaction Mode |
| --- | --- |
| pg_advisory_lock() | Session lock released when connection returns to pool — concurrent refreshes become possible |
| PREPARE / EXECUTE | Prepared statements vanish on connection hop — "prepared statement does not exist" errors |
| LISTEN / NOTIFY | Listener loses notifications when assigned a different backend connection |
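
You can reproduce the advisory-lock hazard with two core PostgreSQL calls; run them through a transaction-mode pooler and the second statement may land on a different backend:

SELECT pg_advisory_lock(42);    -- lock taken on backend A
SELECT pg_advisory_unlock(42);  -- may run on backend B: returns false with a WARNING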

The resulting topology keeps application traffic on the pooler while the pg_trickle scheduler talks to PostgreSQL directly:

┌─────────────────┐     ┌──────────────┐
│  Application    │────▶│  PgBouncer   │──┐
│  (transaction   │     │  (txn mode)  │  │
│   mode OK)      │     └──────────────┘  │
└─────────────────┘                       │
                                          ▼
┌─────────────────┐                ┌─────────────┐
│  pg_trickle     │───────────────▶│ PostgreSQL  │
│  scheduler      │  direct conn   │             │
│  (session mode) │                └─────────────┘
└─────────────────┘

The scheduler connects directly to PostgreSQL as a background worker — it does not go through the pooler at all. No special configuration is needed for this; the scheduler always uses an internal SPI connection.

The pooler only matters for application queries that read from stream tables or call pg_trickle functions (e.g., refresh_stream_table()).

Platform-Specific Notes

Supabase

Supabase uses Supavisor in transaction mode by default. pg_trickle's scheduler works because it runs as a background worker (bypasses the pooler). Application queries against stream tables work normally through the pooler since they are regular SELECT statements.

If you call pgtrickle.refresh_stream_table() from application code, use the direct connection string (port 5432) rather than the pooled connection (port 6543).

Neon

Neon uses a custom proxy that supports both session and transaction modes. Use the session-mode connection string for any pg_trickle management calls. The scheduler runs as a background worker and is unaffected by the proxy.

AWS RDS Proxy

RDS Proxy only supports transaction-mode pooling. The pg_trickle scheduler runs as a background worker inside the RDS instance and is unaffected. Application queries reading stream tables work normally through the proxy.

Manual refresh_stream_table() calls through the proxy may fail due to advisory lock issues. Use a direct connection for management operations.

Pooler Compatibility Mode

pg_trickle includes a pooler_compatibility_mode setting (v0.10.0+) that adjusts internal behavior for environments where the scheduler's SPI connection may be affected by pooler-like middleware:

-- Usually not needed — the scheduler bypasses external poolers
SHOW pg_trickle.pooler_compatibility_mode;

This GUC is primarily for edge cases in managed PostgreSQL services. For standard deployments, the default setting works correctly.

Flyway & Liquibase Migration Frameworks

pg_trickle stream tables are managed through SQL function calls, not standard DDL (CREATE TABLE / ALTER TABLE). This page documents patterns for integrating pg_trickle with Flyway and Liquibase migration frameworks.

Key Principle

Stream tables are created and managed via pgtrickle.create_stream_table(), pgtrickle.alter_stream_table(), and pgtrickle.drop_stream_table(). These are regular SQL function calls that can be embedded in any migration script.

CDC triggers are automatically installed on source tables during stream table creation — no manual trigger management is needed.


Flyway

Creating Stream Tables in Migrations

Place stream table definitions in versioned migration files alongside your regular schema changes:

-- V3__create_order_stream_tables.sql

-- 1. Create the source tables first (standard DDL)
CREATE TABLE IF NOT EXISTS orders (
    id         SERIAL PRIMARY KEY,
    region     TEXT NOT NULL,
    amount     NUMERIC(10,2) NOT NULL,
    created_at TIMESTAMPTZ DEFAULT now()
);

-- 2. Create stream tables via pg_trickle API
SELECT pgtrickle.create_stream_table(
    'order_totals',
    $$SELECT region, COUNT(*) AS order_count, SUM(amount) AS total
      FROM orders GROUP BY region$$,
    schedule     => '5s',
    refresh_mode => 'DIFFERENTIAL'
);

Altering Stream Tables

Use pgtrickle.alter_stream_table() in a new migration:

-- V5__update_order_totals_schedule.sql
SELECT pgtrickle.alter_stream_table(
    'order_totals',
    schedule => '10s'
);

Altering the Defining Query

Use alter_query to change the SQL without dropping and recreating:

-- V7__add_avg_to_order_totals.sql
SELECT pgtrickle.alter_stream_table(
    'order_totals',
    alter_query => $$SELECT region,
                            COUNT(*) AS order_count,
                            SUM(amount) AS total,
                            AVG(amount) AS avg_amount
                     FROM orders GROUP BY region$$
);

Dropping Stream Tables

-- V9__remove_legacy_stream_tables.sql
SELECT pgtrickle.drop_stream_table('legacy_report');

Bulk Creation

For environments with many stream tables, use bulk_create to create them atomically:

-- V4__create_all_stream_tables.sql
SELECT pgtrickle.bulk_create('[
    {
        "name": "order_totals",
        "query": "SELECT region, COUNT(*) AS cnt, SUM(amount) AS total FROM orders GROUP BY region",
        "schedule": "5s",
        "refresh_mode": "DIFFERENTIAL"
    },
    {
        "name": "daily_revenue",
        "query": "SELECT date_trunc(''day'', created_at) AS day, SUM(amount) AS revenue FROM orders GROUP BY 1",
        "schedule": "30s",
        "refresh_mode": "DIFFERENTIAL"
    }
]'::jsonb);

Ordering: Source Tables Before Stream Tables

Flyway executes migrations in version order. Ensure source tables are created in an earlier migration than their dependent stream tables:

V1__create_schema.sql           -- CREATE TABLE orders, products, ...
V2__create_indexes.sql          -- CREATE INDEX ...
V3__create_stream_tables.sql    -- SELECT pgtrickle.create_stream_table(...)

Repeatable Migrations

If you want stream table definitions to be re-applied on every Flyway run (for development environments), use repeatable migrations:

-- R__stream_tables.sql
-- Drop and recreate all stream tables
SELECT pgtrickle.drop_stream_table('order_totals') 
WHERE EXISTS (
    SELECT 1 FROM pgtrickle.pgt_stream_tables 
    WHERE pgt_name = 'order_totals'
);

SELECT pgtrickle.create_stream_table(
    'order_totals',
    $$SELECT region, COUNT(*) AS cnt FROM orders GROUP BY region$$,
    schedule => '5s',
    refresh_mode => 'DIFFERENTIAL'
);

Or use create_or_replace_stream_table for idempotent definitions:

-- R__stream_tables.sql (idempotent)
SELECT pgtrickle.create_or_replace_stream_table(
    'order_totals',
    $$SELECT region, COUNT(*) AS cnt FROM orders GROUP BY region$$,
    schedule => '5s',
    refresh_mode => 'DIFFERENTIAL'
);

Handling ALTER TABLE on Source Tables

When a Flyway migration alters a source table (e.g., adding a column), pg_trickle's DDL event trigger detects the change and suspends affected stream tables. After the schema change, stream tables resume automatically on the next refresh cycle.

If the source table change invalidates the stream table's defining query (e.g., removing a referenced column), you must update or drop the stream table in the same or a subsequent migration.
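
For example, a migration that renames a source column can update the stream table's defining query in the same file (a sketch based on the order_totals example above):

-- V6__rename_amount_to_amount_usd.sql
ALTER TABLE orders RENAME COLUMN amount TO amount_usd;

SELECT pgtrickle.alter_stream_table(
    'order_totals',
    alter_query => $$SELECT region,
                            COUNT(*) AS order_count,
                            SUM(amount_usd) AS total
                     FROM orders GROUP BY region$$
);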


Liquibase

Creating Stream Tables in Changesets

Use Liquibase's <sql> tag to call pg_trickle functions:

<!-- changelog-3.0.xml -->
<changeSet id="create-order-stream-tables" author="dev">
    <sql>
        SELECT pgtrickle.create_stream_table(
            'order_totals',
            $pgt$SELECT region, COUNT(*) AS order_count, SUM(amount) AS total
                  FROM orders GROUP BY region$pgt$,
            schedule     => '5s',
            refresh_mode => 'DIFFERENTIAL'
        );
    </sql>
    <rollback>
        <sql>SELECT pgtrickle.drop_stream_table('order_totals');</sql>
    </rollback>
</changeSet>

Rollback Support

Always include <rollback> blocks that drop the stream table:

<changeSet id="add-daily-revenue-st" author="dev">
    <sql>
        SELECT pgtrickle.create_stream_table(
            'daily_revenue',
            $pgt$SELECT date_trunc('day', created_at) AS day,
                        SUM(amount) AS revenue
                 FROM orders GROUP BY 1$pgt$,
            schedule => '30s',
            refresh_mode => 'DIFFERENTIAL'
        );
    </sql>
    <rollback>
        <sql>SELECT pgtrickle.drop_stream_table('daily_revenue');</sql>
    </rollback>
</changeSet>

Altering Stream Tables

<changeSet id="update-order-totals-schedule" author="dev">
    <sql>
        SELECT pgtrickle.alter_stream_table(
            'order_totals',
            schedule => '10s'
        );
    </sql>
    <rollback>
        <sql>
            SELECT pgtrickle.alter_stream_table(
                'order_totals',
                schedule => '5s'
            );
        </sql>
    </rollback>
</changeSet>

Preconditions

Use Liquibase preconditions to check whether pg_trickle is available:

<changeSet id="create-stream-tables" author="dev">
    <preConditions onFail="MARK_RAN">
        <sqlCheck expectedResult="1">
            SELECT COUNT(*) FROM pg_extension WHERE extname = 'pg_trickle'
        </sqlCheck>
    </preConditions>
    <sql>
        SELECT pgtrickle.create_stream_table(...);
    </sql>
</changeSet>

Common Patterns

Environment-Specific Schedules

Use different schedules for development vs. production:

-- Derive the schedule from a server-side setting
SELECT pgtrickle.create_stream_table(
    'order_totals',
    $$SELECT region, COUNT(*) AS cnt FROM orders GROUP BY region$$,
    schedule => CASE 
        WHEN current_setting('pg_trickle.enabled', true) = 'on' 
        THEN '5s' 
        ELSE '1m' 
    END,
    refresh_mode => 'DIFFERENTIAL'
);

CI/Test Environments

In CI, set pg_trickle.enabled = off in postgresql.conf to prevent the background scheduler from running during schema migrations. Stream tables will still be created correctly — they just won't auto-refresh until the scheduler is enabled.
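
If you cannot edit postgresql.conf in the CI image, the same setting can usually be applied at runtime (ALTER SYSTEM and pg_reload_conf() are core PostgreSQL):

ALTER SYSTEM SET pg_trickle.enabled = off;
SELECT pg_reload_conf();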

Extension Dependency

Ensure CREATE EXTENSION pg_trickle runs before any stream table migration. In Flyway, use an early versioned migration:

-- V0__extensions.sql
CREATE EXTENSION IF NOT EXISTS pg_trickle;

In Liquibase:

<changeSet id="install-extensions" author="dev" runOnChange="true">
    <sql>CREATE EXTENSION IF NOT EXISTS pg_trickle;</sql>
</changeSet>

ORM Integration Guides

pg_trickle stream tables are read-only materialized views that refresh automatically. This page documents how to use stream tables from popular Python ORMs — SQLAlchemy and Django ORM.

Key Principles

  1. Stream tables are read-only. All writes go to the source tables; pg_trickle refreshes stream tables in the background.
  2. Model stream tables as views, not regular tables. ORMs should never attempt INSERT, UPDATE, or DELETE on a stream table.
  3. Internal columns are hidden. The __pgt_row_id column used for incremental maintenance is excluded from SELECT * queries.

SQLAlchemy

Read-Only Model Definition

Map a stream table as a read-only model using __table_args__:

from sqlalchemy import Column, Numeric, String, BigInteger
from sqlalchemy.orm import DeclarativeBase

class Base(DeclarativeBase):
    pass

class OrderTotals(Base):
    """Read-only model backed by pg_trickle stream table."""
    __tablename__ = "order_totals"

    # Map the stream table's row ID as primary key for ORM identity.
    # Use a single-underscore Python attribute name: a leading "__" would
    # be name-mangled by Python and become awkward to reference.
    pgt_row_id = Column("__pgt_row_id", BigInteger, primary_key=True)

    region = Column(String, nullable=False)
    order_count = Column(BigInteger, nullable=False)
    total = Column(Numeric(10, 2), nullable=False)

    __table_args__ = {
        "info": {"readonly": True},  # Convention marker
    }

Querying

Query stream tables like any other SQLAlchemy model:

from sqlalchemy import select

# All regions
stmt = select(OrderTotals).order_by(OrderTotals.total.desc())
results = session.execute(stmt).scalars().all()

# Filtered
stmt = (
    select(OrderTotals)
    .where(OrderTotals.order_count > 10)
    .where(OrderTotals.region == "East")
)
row = session.execute(stmt).scalar_one_or_none()

Preventing Accidental Writes

Use SQLAlchemy events to block write operations:

from sqlalchemy import event
from sqlalchemy.orm import Session

READONLY_TABLES = {"order_totals", "daily_revenue", "customer_stats"}

# Listening on the Session class applies the guard to every session.
@event.listens_for(Session, "before_flush")
def block_stream_table_writes(session, flush_context, instances):
    for obj in session.new | session.dirty | session.deleted:
        table_name = obj.__class__.__tablename__
        if table_name in READONLY_TABLES:
            raise RuntimeError(
                f"Cannot write to stream table '{table_name}'. "
                f"Write to the source table instead."
            )

Reflecting Stream Tables

If you prefer reflection over explicit models:

from sqlalchemy import MetaData, Table, create_engine

engine = create_engine("postgresql://...")
metadata = MetaData()

# Reflect the stream table (treated as a regular table by PostgreSQL)
order_totals = Table("order_totals", metadata, autoload_with=engine)

# Query
with engine.connect() as conn:
    result = conn.execute(order_totals.select().limit(10))
    for row in result:
        print(row)

Checking Freshness

Query the stream table's metadata to check when it was last refreshed:

from sqlalchemy import text

def get_staleness(session, st_name: str) -> dict:
    """Return freshness info for a stream table."""
    result = session.execute(
        text("SELECT * FROM pgtrickle.get_staleness(:name)"),
        {"name": st_name},
    ).mappings().one()
    return dict(result)

# Usage
staleness = get_staleness(session, "order_totals")
print(f"Last refresh: {staleness['data_timestamp']}")
print(f"Stale for: {staleness['staleness_seconds']}s")

Async SQLAlchemy (2.0+)

Works identically with async_session:

from sqlalchemy.ext.asyncio import AsyncSession

async def get_top_regions(session: AsyncSession, limit: int = 10):
    stmt = (
        select(OrderTotals)
        .order_by(OrderTotals.total.desc())
        .limit(limit)
    )
    result = await session.execute(stmt)
    return result.scalars().all()

Django ORM

Read-Only Model Definition

Use managed = False so Django never creates, alters, or drops the table:

# models.py
from django.db import models

class OrderTotals(models.Model):
    """Read-only model backed by pg_trickle stream table."""
    
    region = models.CharField(max_length=255)
    order_count = models.BigIntegerField()
    total = models.DecimalField(max_digits=10, decimal_places=2)
    
    class Meta:
        managed = False        # Django will not create/alter this table
        db_table = "order_totals"
    
    def save(self, *args, **kwargs):
        raise NotImplementedError("Stream tables are read-only")
    
    def delete(self, *args, **kwargs):
        raise NotImplementedError("Stream tables are read-only")

Querying

Standard Django QuerySet operations work:

# All regions sorted by total
OrderTotals.objects.all().order_by("-total")

# Filtered
OrderTotals.objects.filter(
    order_count__gt=10,
    region="East"
).first()

# Aggregation (on the stream table itself)
from django.db.models import Sum, Avg
OrderTotals.objects.aggregate(
    total_revenue=Sum("total"),
    avg_orders=Avg("order_count"),
)

Django Migrations

Since managed = False, Django migrations won't touch stream tables. Create stream tables in a custom migration using RunSQL:

# migrations/0003_create_stream_tables.py
from django.db import migrations

class Migration(migrations.Migration):
    dependencies = [
        ("myapp", "0002_create_orders_table"),
    ]

    operations = [
        migrations.RunSQL(
            sql="""
                SELECT pgtrickle.create_stream_table(
                    'order_totals',
                    $pgt$SELECT region,
                                COUNT(*) AS order_count,
                                SUM(amount) AS total
                         FROM orders GROUP BY region$pgt$,
                    schedule     => '5s',
                    refresh_mode => 'DIFFERENTIAL'
                );
            """,
            reverse_sql="""
                SELECT pgtrickle.drop_stream_table('order_totals');
            """,
        ),
    ]

Read-Only Mixin

Create a reusable mixin for all stream table models:

class StreamTableMixin(models.Model):
    """Base class for pg_trickle stream table models."""
    
    class Meta:
        abstract = True
        managed = False
    
    def save(self, *args, **kwargs):
        raise NotImplementedError(
            f"{self.__class__.__name__} is a read-only stream table. "
            f"Write to the source table instead."
        )
    
    def delete(self, *args, **kwargs):
        raise NotImplementedError(
            f"{self.__class__.__name__} is a read-only stream table."
        )

# Usage
class OrderTotals(StreamTableMixin):
    region = models.CharField(max_length=255)
    order_count = models.BigIntegerField()
    total = models.DecimalField(max_digits=10, decimal_places=2)
    
    class Meta(StreamTableMixin.Meta):
        db_table = "order_totals"

class DailyRevenue(StreamTableMixin):
    day = models.DateField()
    revenue = models.DecimalField(max_digits=12, decimal_places=2)
    
    class Meta(StreamTableMixin.Meta):
        db_table = "daily_revenue"

Checking Freshness

Use raw SQL to query pg_trickle diagnostics:

from django.db import connection

def get_staleness(st_name: str) -> dict:
    """Return freshness info for a stream table."""
    with connection.cursor() as cursor:
        cursor.execute(
            "SELECT * FROM pgtrickle.get_staleness(%s)", [st_name]
        )
        columns = [col.name for col in cursor.description]
        row = cursor.fetchone()
        return dict(zip(columns, row)) if row else {}

Django REST Framework

Stream table models work with DRF serializers and viewsets:

from rest_framework import serializers, viewsets

class OrderTotalsSerializer(serializers.ModelSerializer):
    class Meta:
        model = OrderTotals
        fields = ["region", "order_count", "total"]

class OrderTotalsViewSet(viewsets.ReadOnlyModelViewSet):
    """Read-only API endpoint for order totals stream table."""
    queryset = OrderTotals.objects.all()
    serializer_class = OrderTotalsSerializer

Common Patterns

Write to Source, Read from Stream

The fundamental pattern: all writes go to source tables (normal ORM models), reads come from stream tables (read-only models).

# Write to source table (normal ORM)
order = Order(region="East", amount=Decimal("99.99"))
session.add(order)
session.commit()

# Read from stream table (auto-refreshed by pg_trickle)
totals = session.execute(
    select(OrderTotals).where(OrderTotals.region == "East")
).scalar_one()
print(f"East: {totals.order_count} orders, ${totals.total}")

Handling Eventual Consistency

Stream tables refresh on a schedule (e.g., every 5 seconds). After writing to a source table, the stream table may be briefly stale. Options:

  1. Accept staleness — suitable for dashboards and reports.
  2. Force refresh — call pgtrickle.refresh_stream_table() after critical writes.
  3. Use IMMEDIATE mode — stream table refreshes within the same transaction.
# Option 2: Force refresh after a critical write
session.execute(text(
    "SELECT pgtrickle.refresh_stream_table('order_totals')"
))
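
Option 3 is chosen at creation time. Assuming IMMEDIATE is accepted by the same refresh_mode parameter used throughout these docs (an assumption; check the SQL Reference):

-- Option 3: refresh synchronously inside the writing transaction
SELECT pgtrickle.create_stream_table(
    'order_totals',
    $$SELECT region, COUNT(*) AS order_count, SUM(amount) AS total
      FROM orders GROUP BY region$$,
    refresh_mode => 'IMMEDIATE'
);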

Multi-tenant Deployment Guide

This guide covers recommended deployment patterns for running pg_trickle across multiple PostgreSQL databases on the same instance, including worker quota allocation, per-database observability, and Grafana dashboard configuration.


Architecture Overview

In a multi-tenant setup, each PostgreSQL database gets its own pg_trickle background worker scheduler. All schedulers share a single worker pool via PostgreSQL shared memory (ACTIVE_REFRESH_WORKERS counter). The total number of concurrent refresh workers is bounded by pg_trickle.max_dynamic_refresh_workers.

┌─────────────────────────────────────────────┐
│             PostgreSQL instance              │
│                                              │
│  ┌──────────────┐  ┌──────────────┐          │
│  │  tenant_a DB │  │  tenant_b DB │  ...     │
│  │  scheduler   │  │  scheduler   │          │
│  └──────┬───────┘  └──────┬───────┘          │
│         │                 │                  │
│         └────────┬────────┘                  │
│                  ▼                           │
│       Shared worker pool (shmem)             │
│       ACTIVE_REFRESH_WORKERS atomic          │
└─────────────────────────────────────────────┘

Worker Quota Formula

When running N databases on the same instance, the recommended per-database worker quota is:

per_db_quota = ceil(max_dynamic_refresh_workers / N_databases)

For example, with pg_trickle.max_dynamic_refresh_workers = 8 and 4 databases:

per_db_quota = ceil(8 / 4) = 2 workers per database

Set this in postgresql.conf or in each database's ALTER DATABASE SET:

-- Global limit (applies to all databases)
ALTER SYSTEM SET pg_trickle.max_dynamic_refresh_workers = 8;

-- Per-database override (optional, for high-priority tenants)
\c tenant_a
ALTER DATABASE tenant_a SET pg_trickle.max_dynamic_refresh_workers = 4;

Monitoring with cluster_worker_summary()

The pgtrickle.cluster_worker_summary() function returns a real-time view of worker allocation across all databases visible from the current connection:

SELECT * FROM pgtrickle.cluster_worker_summary();

Example output:

| db_oid | db_name | active_workers | scheduler_pid | scheduler_running | total_active_workers |
| --- | --- | --- | --- | --- | --- |
| 16384 | tenant_a | 2 | 12345 | true | 5 |
| 16385 | tenant_b | 1 | 12346 | true | 5 |
| 16386 | tenant_c | 2 | 12347 | true | 5 |

The total_active_workers column shows the cluster-wide total from shared memory.
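
To watch for pool exhaustion from SQL (column names as in the example output above; current_setting() is core PostgreSQL):

SELECT DISTINCT
    total_active_workers,
    current_setting('pg_trickle.max_dynamic_refresh_workers') AS pool_limit
FROM pgtrickle.cluster_worker_summary();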


Per-Database Prometheus Labels (CLUS-2)

All pg_trickle metrics emitted by the /metrics endpoint include db_oid and db_name labels from v0.27.0 onwards. This enables per-database Grafana panels and alerting rules without requiring separate Prometheus scrape targets.

Example metric with labels:

pg_trickle_refreshes_total{schema="public",name="orders_agg",db_oid="16384",db_name="tenant_a"} 1247

Configuring Prometheus scrape targets

In a multi-tenant setup, configure one scrape job per database, each pointing to its own scheduler's metrics port:

scrape_configs:
  - job_name: 'pg_trickle_tenant_a'
    static_configs:
      - targets: ['localhost:9101']
        labels:
          instance: 'pg-primary'

  - job_name: 'pg_trickle_tenant_b'
    static_configs:
      - targets: ['localhost:9102']
        labels:
          instance: 'pg-primary'

Configure each database's metrics port:

\c tenant_a
ALTER DATABASE tenant_a SET pg_trickle.metrics_port = 9101;

\c tenant_b
ALTER DATABASE tenant_b SET pg_trickle.metrics_port = 9102;

Grafana Dashboard Snippets

Per-tenant refresh rate panel

rate(pg_trickle_refreshes_total{db_name=~"$tenant"}[5m])

Variable $tenant should be a Grafana template variable sourcing from:

label_values(pg_trickle_refreshes_total, db_name)

Cluster-wide worker utilisation

sum(pg_trickle_active_workers) by (db_name)
  / scalar(pg_trickle_max_concurrent_refreshes)

Refresh failure rate heatmap

sum by (db_name, le) (
  rate(pg_trickle_refresh_failures_total{db_name=~"$tenant"}[1h])
)

SLA breach prediction alerts (PLAN-3)

pg_trickle emits predicted SLA-breach events via NOTIFY; bridge the payload into your alerting system to fire an alert:

-- pg_trickle emits NOTIFY pg_trickle_alert with JSON payload.
-- Parse in your alerting system:
-- {"event":"predicted_sla_breach","pgt_schema":"...","pgt_name":"...",
--   "predicted_ms":...,"sla_ms":...,"pct_over":...}
LISTEN pg_trickle_alert;

dbt Hub Submission Guide

This document describes how to publish dbt-pgtrickle to dbt Hub so users can install it with a simple package name instead of a git URL.

Background

dbt Hub is a package registry maintained by dbt Labs. Packages are indexed by the hubcap automation, which runs hourly and scans listed GitHub repositories for new tagged releases containing a dbt_project.yml at the repository root.

Current Status

dbt-pgtrickle lives in the dbt-pgtrickle/ subdirectory of the trickle-labs/pg-trickle monorepo. Because hubcap expects dbt_project.yml at the repository root, a monorepo layout requires one of the approaches below.

Submission Approaches

Option A: Standalone Mirror Repository

Create a standalone repository (e.g., grove/dbt-pgtrickle) that mirrors the dbt-pgtrickle/ directory. This is the standard pattern used by most Hub packages (Fivetran, Snowplow, etc.).

  1. Create grove/dbt-pgtrickle repository on GitHub.
  2. Copy (or subtree-split) the dbt-pgtrickle/ contents into the repo root.
  3. Tag a release matching the version in dbt_project.yml (e.g., v0.15.0).
  4. Submit a PR to dbt-labs/hubcap adding "grove": ["dbt-pgtrickle"] to hub.json.
  5. Once merged, hubcap will automatically index new tags and publish versions.

After listing, users install with:

packages:
  - package: grove/dbt_pgtrickle
    version: [">=0.15.0", "<1.0.0"]

Option B: Keep Monorepo, Git Install Only

Continue using the git-based install with subdirectory:. This is fully functional but requires users to specify a git URL and revision:

packages:
  - git: "https://github.com/trickle-labs/pg-trickle.git"
    revision: v0.15.0
    subdirectory: "dbt-pgtrickle"

Submission Checklist

  • dbt_project.yml has name, version, config-version, require-dbt-version
  • dbt_project.yml version synced with pg_trickle release (0.15.0)
  • README.md documents both git and Hub installation methods
  • Macros are in macros/ directory
  • Tests are in tests/ directory
  • Package has been tested with dbt deps && dbt run && dbt test
  • Separate grove/dbt-pgtrickle repository created (if using Option A)
  • Tagged release published on the standalone repo
  • PR submitted to dbt-labs/hubcap adding "grove": ["dbt-pgtrickle"]
  • Hub listing verified at https://hub.getdbt.com/grove/dbt_pgtrickle/latest

Hub.json Entry Format

The PR to hubcap adds an entry to hub.json:

{
    "grove": [
        "dbt-pgtrickle"
    ]
}

The key is the GitHub organization name (grove), and the value is an array of repository names. Hubcap will scan grove/dbt-pgtrickle for tags matching semantic versioning and index each version automatically.

Version Syncing

The dbt_project.yml version should track the pg_trickle extension version to avoid confusion. When releasing a new pg_trickle version:

  1. Update dbt-pgtrickle/dbt_project.yml version.
  2. If using a separate repo, sync the changes and tag a new release.
  3. Hubcap will pick up the new tag within ~1 hour.

pg_trickle Blog

Note: This blog directory is an experiment. All posts were generated with AI assistance (GitHub Copilot / Claude) as a way to explore how well LLM-generated technical writing holds up for a niche systems engineering topic. The technical content has been reviewed for accuracy, but treat the posts as drafts rather than officially reviewed documentation: they showcase use cases and interesting topics in the context of pg_trickle, not a definitive reference. For authoritative documentation, see the pg_trickle documentation.


Posts

Core Concepts & Theory

| Post | Summary |
| --- | --- |
| Why Your Materialized Views Are Always Stale | Explains why REFRESH MATERIALIZED VIEW fails at scale — locking, cost, and the full-scan ceiling — and how switching to a stream table with DIFFERENTIAL mode fixes staleness in 5 lines of SQL. |
| Differential Dataflow for the Rest of Us | A plain-language walkthrough of the mathematics behind incremental view maintenance: delta rules for filters, joins, aggregates, the MERGE application step, and why some aggregates (MEDIAN, RANK) can't be made incremental. |
| Incremental Aggregates in PostgreSQL: No ETL Required | How SUM, COUNT, AVG, and (in v0.37) vector_avg are maintained as running algebraic state rather than full scans. Covers multi-table aggregates, conditional aggregates, and the non-differentiable cases. |
| The Z-Set: The Data Structure That Makes IVM Correct | A concrete tour of the integer-weighted multiset that underlies pg_trickle's differential engine — how inserts are +1, deletes are -1, updates are both, and why commutativity eliminates an entire class of ordering bugs. |
| The Cost Model: How pg_trickle Decides Whether to Refresh Differentially | Inside AUTO mode: the decision inputs (delta ratio, query complexity, historical timings), the learned cost model, and when the engine switches between DIFFERENTIAL and FULL refresh mid-flight. |

SQL Operator Deep Dives

| Post | Summary |
| --- | --- |
| Recursive CTEs That Update Themselves | Semi-naive evaluation for insert-only tables and Delete-and-Rederive for mixed DML — how pg_trickle maintains WITH RECURSIVE queries incrementally for org charts, BOMs, and graph reachability. |
| Window Functions Without the Full Recompute | Partition-scoped recomputation for ROW_NUMBER, RANK, LAG, LEAD, and all standard window functions. Change one partition, leave the rest untouched. |
| GROUPING SETS, ROLLUP, and CUBE — Incrementally | Multi-dimensional aggregation decomposed into UNION ALL branches, each maintained with algebraic delta rules. Drill-down dashboards that refresh in milliseconds. |
| EXISTS and NOT EXISTS: The Delta Rules Nobody Talks About | Semi-joins and anti-joins maintained via reference counting on the join key. Delta-key pre-filtering, inverted semantics for NOT EXISTS, SubLink extraction from WHERE clauses. |
| DISTINCT That Doesn't Recount | Reference counting (__pgt_dup_count) for incremental deduplication. Insert increments, delete decrements, row removed when count hits zero. DISTINCT ON with tie-breaking. |
| Scalar Subqueries in the SELECT List — Incrementally | Pre/post snapshot diff for correlated subqueries. Only groups affected by the delta are re-evaluated — O(affected groups), not O(all rows). |
| LATERAL Joins in a Stream Table | Row-scoped re-execution for JSON_TABLE, unnest(), generate_series(), and correlated set-returning functions. Cost proportional to changed left-side rows. |
| Set Operations Done Right: UNION, INTERSECT, EXCEPT | Dual-count multiplicity tracking for all set operations. UNION uses reference counting, INTERSECT requires both-side presence, EXCEPT removes when the right side gains a match. |

Refresh Modes & Scheduling

| Post | Summary |
| --- | --- |
| IMMEDIATE Mode: When "Good Enough Freshness" Isn't Good Enough | Synchronous IVM inside the source transaction — zero lag, no background worker. Account balances, inventory tracking, and the trade-offs vs. DIFFERENTIAL mode. |
| How pg_trickle Handles Diamond Dependencies | When two branches of a DAG share a source and converge downstream, naively refreshing can cause double-counting. How the frontier tracker and diamond-group scheduling ensure correctness. |
| Temporal Stream Tables: Time-Windowed Views That Update Themselves | The "last 7 days" problem — results that change because time passes, not because data changed. Sliding-window eviction, the temporal_mode parameter, and when fixed windows don't need it. |
| Declare Freshness Once: CALCULATED Scheduling | Upstream tables derive their refresh cadence from downstream consumers. Set the SLA on the dashboard; the pipeline adjusts automatically. |
| Cycles in Your Dependency Graph? That's Fine. | Fixed-point iteration for monotone queries. allow_circular = on, SCC detection, convergence guarantees, the iteration limit, and when cycles are a legitimate design choice. |
| Hot, Warm, Cold, Frozen: Tiered Scheduling at Scale | Automatic tier classification by change frequency. The scheduler checks hot tables every cycle, frozen tables every ~60 cycles — 80%+ overhead reduction at 500+ stream tables. |

CDC & Change Tracking

| Post | Summary |
| --- | --- |
| The CDC Mode You Never Have to Choose | Hybrid CDC starts with triggers, silently graduates to WAL. Three-step transition orchestration, automatic fallback on failure, WAL backpressure, and why AUTO is the right default. |
| IVM Without Primary Keys | Content-based hashing (xxHash64) generates synthetic row identity for keyless tables. Multiplicity counting for duplicates, collision probability, and when to add a PK anyway. |
| Foreign Tables as Stream Table Sources | IVM over postgres_fdw, file_fdw, and parquet_fdw sources using polling-based change detection. Mixed local/foreign source queries, performance trade-offs, and the materialize-first optimization. |

Architecture & Data Patterns

| Post | Summary |
| --- | --- |
| The Medallion Architecture Lives Inside PostgreSQL | Bronze/Silver/Gold without Spark or Airflow. Chained stream tables propagate from raw ingest to business aggregates in under 5 seconds, with DAG-aware scheduling and transactional consistency. |
| CQRS Without a Second Database | Command Query Responsibility Segregation using stream tables as the read model — same PostgreSQL instance, no CDC pipeline, read-your-writes with IMMEDIATE mode. |
| Slowly Changing Dimensions in Real Time | SCD Type 2 (historical attribute tracking with valid_from/valid_to) maintained continuously by a stream table — no nightly ETL, no Airflow DAG. |
| The Append-Only Fast Path | Why insert-only tables (event logs, sensor data, clickstreams) get a 2–3× faster refresh: no delete-side delta, no inverse computation, no before-image lookups. |

Use Cases & Migration

| Post | Summary |
| --- | --- |
| Real-Time Leaderboards That Don't Lie | Top-N stream tables for games, sales dashboards, and coding challenges — tied scores, multi-category boards, the pagination problem, and why you might not need Redis. |
| The Hidden Cost of Trigger-Based Denormalization | Four failure modes of hand-rolled trigger sync — blind UPDATE divergence, statement vs. row trigger semantics, invisible deletes, and multi-row races — and how declarative IVM avoids all of them. |
| How We Replaced a Celery Pipeline with 3 SQL Statements | A before/after case study of a Celery + Elasticsearch product search pipeline across three generations of growing complexity, and the pg_trickle stream table that replaced it. Includes benchmark numbers. |
| Migrating from pg_ivm to pg_trickle | Feature gap table, SQL syntax differences, step-by-step migration procedure, and when staying on pg_ivm is the right call. |

Integrations & Ecosystem

| Post | Summary |
| --- | --- |
| Streaming to Kafka Without Kafka Expertise | pgtrickle-relay bridges stream table deltas to Kafka, NATS, SQS, and webhooks — a single binary with TOML config, advisory-lock HA, subject routing, and Prometheus metrics. |
| The Relay Deep Dive: NATS, Redis Streams, and RabbitMQ | Beyond Kafka: per-backend architecture for NATS JetStream, Redis Streams, RabbitMQ, SQS, and HTTP webhooks. Subject templates, consumer groups, multi-sink pipelines, and a decision tree for choosing a backend. |
| The Inbox Pattern: Receiving Events from Kafka into PostgreSQL | Idempotent, ordered event ingestion via the inbox table — deduplication by event ID, dead-letter queue, and stream tables that aggregate incoming events incrementally. |
| The Outbox You Don't Have to Build | pg_trickle's built-in outbox API: enable_outbox(), consumer groups, poll_outbox(), offset tracking, exactly-once delivery, consumer lag monitoring, and cleanup. |
| dbt + pg_trickle: The Analytics Engineer's Stack | The pgtrickle dbt materialization: continuously-fresh models that are also version-controlled, tested, and documented. DAG alignment, freshness checks, and mixing materializations. |
| Distributed IVM with Citus | Incremental view maintenance across sharded PostgreSQL: per-worker CDC, shard-aware delta routing, co-located join push-down, and automatic recovery after shard rebalances. |
| pg_trickle on CloudNativePG | Production Kubernetes deployment using the CloudNativePG operator: Dockerfile, Cluster manifest, GUC configuration, HA failover behaviour, Prometheus metrics ConfigMap, alerting rules, upgrade procedure, and sizing guidance. |
| Making pg_trickle Work Through PgBouncer | Connection pooling modes, the background-worker bypass, LISTEN/NOTIFY caveats in transaction mode, and a configuration checklist for PgBouncer + pg_trickle. |
| Publishing Stream Tables via Logical Replication | Stream tables as standard publication sources for downstream PostgreSQL instances. Replication identity, multi-region distribution, and feeding Debezium/Kafka with clean aggregated events. |
| One PostgreSQL, Five Databases, One Worker Pool | Multi-database architecture: one launcher per server, one scheduler per database, shared worker pool with per-database quotas. Failure isolation and the database-per-tenant SaaS pattern. |

pgvector Integration

Post | Summary
Your pgvector Index Is Lying to You | Four silent failure modes of unmanaged pgvector deployments: stale embedding corpora, drifting aggregates, IVFFlat recall loss, and over-fetching. How pg_trickle's differential IVM and drift-aware reindexing closes each gap.
Incremental Vector Aggregates: Building Recommendation Engines in Pure SQL | How vector_avg (v0.37) turns user taste vectors, category centroids, and cluster representatives into live algebraic aggregates — O(new interactions) cost, not O(history). Comparison with batch recomputation, feature stores, and application-level updates.
Deploying RAG at Scale: pg_trickle as Your Embedding Infrastructure | Production operations for pgvector + pg_trickle: drift-aware HNSW reindexing (reindex_if_drift), vector_status() monitoring, multi-tenant tiered indexing patterns, sparse/half-precision aggregates, reactive distance subscriptions, and the embedding_stream_table() ergonomic API.
HNSW Recall Is a Lie: Distribution Drift Explained | Deep dive on IVFFlat centroid staleness and HNSW tombstone accumulation — how to measure drift, what the right threshold is, and how post_refresh_action => 'reindex_if_drift' (v0.38) automates the fix.
The pgvector Tooling Landscape in 2026 | Honest comparison of pg_trickle against pgai (archived Feb 2026), pg_vectorize, DIY batch pipelines, and Debezium. Introduces the two-layer model: Layer 1 = embedding generation, Layer 2 = derived-state maintenance.
Multi-Tenant Vector Search with Row-Level Security | Zero cross-tenant data leakage using RLS policies on stream tables, tiered tenancy (large / medium / small tenant strategies), per-tenant partial HNSW indexes, and drift-aware reindexing per partition.

Operations & Observability

Post | Summary
Stop Rebuilding Your Search Index at 3am | How pg_trickle's scheduler, SLA tiers (critical / standard / background), backpressure, and parallel workers let you tune refresh behaviour per workload — and why the 3am maintenance window disappears with continuous incremental refresh.
pg_trickle Monitors Itself | Since v0.20, the extension's own health metrics are maintained as stream tables. How self-monitoring works, what it tracks, and the recursion question ("who monitors the monitor?").
How to Change a Stream Table Query Without Taking It Offline | ALTER STREAM TABLE ... QUERY performs online schema evolution — the stream table stays queryable during migration, with atomic swap and cascade-safe dependency checking.
Backup and Restore for Stream Tables | pg_dump, PITR, selective restore, and the repair_stream_table procedure. What to do (and what breaks) when you restore a database with active stream tables.
Testing Stream Tables: Shadow Mode and Correctness Fuzzing | Shadow mode runs DIFFERENTIAL and FULL refresh in parallel and compares. SQLancer fuzzing generates random schemas and DML to find delta engine bugs. The multiset invariant and what it caught.
Snapshots: Time Travel for Stream Tables | snapshot_stream_table() captures point-in-time copies for pre-migration safety, replica bootstrap, forensic comparison, and test fixtures. Restore, list, and clean up with one function call each.
Drain Mode: Zero-Downtime Upgrades for Stream Tables | pgtrickle.drain() quiesces in-flight refreshes before maintenance. Safe upgrade workflow, CloudNativePG integration, HA failover, and the resume path.
Column-Level Lineage in One Function Call | stream_table_lineage() maps output columns to source columns. Impact analysis before ALTER TABLE, GDPR column-deletion audit, documentation generation, and recursive DAG tracing.
Error Budgets for Stream Tables | SRE-style freshness monitoring: sla_summary() with p50/p99 latency, staleness tracking, error budget consumption, alerting thresholds, and Prometheus integration.
Structured Logging and OpenTelemetry for Stream Tables | log_format = json emits structured events with cycle_id correlation. Event taxonomy, log aggregator integration (Loki, Datadog, Elasticsearch), and OpenTelemetry compatibility.

Analytics & Feature Engineering

Post | Summary
Funnel Analysis and Cohort Retention at Scale | Computing conversion funnels, retention matrices, and session aggregates incrementally — keeping product analytics live without billion-row scans.
Incremental ML Feature Engineering in PostgreSQL | Replace nightly feature store batch jobs with continuously fresh features: rolling windows, lag features, cross-entity comparisons, all maintained as stream tables.
Time-Series Downsampling Without TimescaleDB | Hourly, daily, and monthly rollups maintained incrementally from raw sensor data — cascading stream tables as a lightweight alternative to a dedicated TSDB.
Incremental Statistical Aggregates: stddev, Percentiles, and Histograms | Which higher-order statistics (variance, correlation, histograms) can be maintained exactly, which need approximations, and the space-accuracy trade-offs.

Data Patterns & Domain Applications

Post | Summary
Event Sourcing Read Models Without Replay | Project live read-optimized views from an append-only event store without replaying history — order status, revenue analytics, and inventory projections as stream tables.
Soft Deletes and Tombstone Management in Differential IVM | How deleted_at patterns interact with delta propagation, ghost row pitfalls, cascading visibility, and best practices for correct stream tables over soft-deletable data.
Compliance and Audit Trails with Append-Only Stream Tables | GDPR-compliant, tamper-evident audit logs: right-to-erasure reconciliation, hash chains, access pattern monitoring, and retention policies — all incrementally maintained.
Incremental Full-Text Search with tsvector | Maintain ranked search results incrementally as documents change — tracked queries, faceted counts, and top-K ranking without re-indexing the corpus.
Incremental PageRank and Graph Analytics in SQL | Live PageRank, connected components, and shortest-path metrics maintained inside PostgreSQL as stream tables — no graph database required.
PostGIS + pg_trickle: Incremental Geospatial Aggregates | Heatmaps, geofencing, spatial clustering, and distance-based aggregation that update in milliseconds as new points arrive.

Deployment & Multi-Tenancy

Post | Summary
High Availability Failover with pg_trickle and Patroni | How stream table state survives primary switchover, WAL replay semantics for change buffers, split-brain prevention, and zero-data-loss configuration.
Parameterized Stream Tables: Building a SQL View Library | Patterns for reusable, tenant-scoped, and versionable stream table definitions: single-table multi-tenant, template functions, schema isolation, and composable building blocks.

Performance Internals

Post | Summary
The 45ms Cold-Start Tax and How L0 Cache Eliminates It | Connection poolers recycle backends, paying a template-parse penalty. The L0 process-local RwLock<HashMap> cache keyed by (pgt_id, cache_generation) drops p99 from 48ms to 6ms.
Spill-to-Disk and the Auto-Fallback Safety Net | When delta queries exceed work_mem, pg_trickle detects consecutive spills and auto-switches to FULL refresh. Tuning merge_work_mem_mb, spill_threshold_blocks, and the self-healing recovery path.

Benchmarks & Advanced Patterns

Post | Summary
TPC-H at 1GB in 40ms | Reproducible benchmark of differential vs. full refresh across five TPC-H queries (Q1, Q3, Q5, Q6, Q12). Results: 13–22× faster per refresh cycle, with differential lag under 2.5 seconds vs. 186 seconds at 5,000 rows/second sustained write load.
From Nexmark to Production: Benchmarking Stream Processing in PostgreSQL | pg_trickle on the Nexmark streaming benchmark: per-query throughput, latency percentiles, and how the numbers compare to Flink, Materialize, and a cron job.
Reactive Alerts Without Polling | How pg_trickle's reactive subscriptions (v0.39) replace polling loops: SLA breach detection, inventory alerts, fraud velocity checks, and vector distance subscriptions. Covers OLD.*/NEW.* transition semantics and PostgreSQL LISTEN.
The Outbox Pattern, Turbocharged | Using stream tables as transactionally consistent event sources for the outbox pattern — derived aggregate events, fat payloads, transition-based routing, and why stream tables naturally debounce high-frequency changes into fewer events.

Contributing

These posts are deliberately rough-edged — they're drafts exploring how the extension works, not polished marketing copy. If you spot a technical inaccuracy, open an issue or PR. If you want to write a post, open a discussion first to avoid duplication.

Frequently Asked Questions

This FAQ covers everything from core concepts and getting started, through SQL support details, to operational topics like deployment, monitoring, and troubleshooting. Use the table of contents below to jump to a specific topic.


New User FAQ — Top 15 Questions

New to pg_trickle? Start here. Each answer is a short summary with a link to the full explanation further down.

1. What is pg_trickle?

A PostgreSQL 18 extension that adds stream tables — materialized views that refresh themselves incrementally, processing only changed rows instead of re-running the entire query. Full answer →

2. How is this different from a materialized view?

Stream tables refresh automatically on a schedule, support incremental (differential) refresh, track changes via CDC triggers, and propagate updates through dependency chains — none of which REFRESH MATERIALIZED VIEW provides. Full answer →

3. How do I install pg_trickle?

Install from the Docker image, PGXN, or build from source. Add shared_preload_libraries = 'pg_trickle' to postgresql.conf, then CREATE EXTENSION pg_trickle; in each database. Full answer →

4. How do I create my first stream table?

One function call: SELECT pgtrickle.create_stream_table(name => 'my_st', query => 'SELECT ...', schedule => '5s'); See the Getting Started guide for a walkthrough. Full answer →

5. What is the difference between FULL and DIFFERENTIAL refresh?

FULL re-runs the entire defining query. DIFFERENTIAL reads only the changed rows from the change buffer and computes the delta — orders of magnitude faster for small changes on large tables. AUTO mode picks the best strategy per cycle. Full answer →

6. Which refresh mode should I use?

Use AUTO (the default) — it selects DIFFERENTIAL when possible and falls back to FULL when needed. Use IMMEDIATE for same-transaction consistency. Use FULL only when the defining query uses volatile functions or is not IVM-eligible. Full answer →

7. What SQL features are supported?

Joins (INNER, LEFT, RIGHT, FULL OUTER, CROSS, LATERAL), aggregates (60+ functions including SUM, COUNT, AVG, array_agg, jsonb_agg), CTEs (including recursive), window functions, UNION/INTERSECT/EXCEPT, subqueries, CASE, COALESCE, DISTINCT, GROUP BY with ROLLUP/CUBE/GROUPING SETS, and more. Full answer →

8. How fresh is my stream table data?

As fresh as the refresh schedule allows. With a 1s schedule, data is typically < 2 seconds stale. With IMMEDIATE mode, data is updated within the same transaction as the source write. Full answer →

9. Can I chain stream tables (ST reads from another ST)?

Yes — stream tables can reference other stream tables. pg_trickle builds a dependency DAG and refreshes them in topological order automatically. Full answer →

10. How does change data capture work?

Lightweight row-level AFTER triggers capture every INSERT, UPDATE, and DELETE into per-table change buffers. If wal_level = logical is available, pg_trickle can automatically transition to WAL-based CDC for near-zero write-path overhead. Full answer →

11. Do I need wal_level = logical?

No. pg_trickle works with the default wal_level = replica using trigger-based CDC. WAL-based CDC is optional and provides lower write-path overhead. Full answer →

12. Can I use pg_trickle with PgBouncer / connection poolers?

Yes. pg_trickle's background workers use direct connections, not pooled ones. Your application can use any pooler for reads and writes — the scheduler operates independently. Full answer →

13. How do I monitor stream table health?

Built-in views (pgtrickle.pgt_status, pgtrickle.pgt_refresh_history), Prometheus metrics endpoint, Grafana dashboard, and NOTIFY-based alerts. Full answer →

14. What happens if a refresh fails?

The stream table is marked SUSPENDED after exceeding the fuse threshold (default 5 consecutive failures). Data in the change buffer is preserved. Use pgtrickle.reset_fuse('my_st') to resume after fixing the issue. Full answer →

15. Can I use pg_trickle with dbt?

Yes — the dbt-pgtrickle package provides a stream_table materialization. dbt run creates/alters stream tables, dbt source freshness checks staleness. Full answer →


Table of Contents

Getting started

Consistency & refresh modes

SQL features

Internals & architecture

Operations

Troubleshooting & reference


General

These questions cover fundamental concepts — what pg_trickle is, how incremental view maintenance works, and the key building blocks (frontiers, row IDs, the auto-rewrite pipeline) that power the extension.

What is pg_trickle?

pg_trickle is a PostgreSQL 18 extension that implements stream tables — declarative, automatically-refreshing materialized views with Differential View Maintenance (DVM). You define a SQL query and a refresh schedule; the extension handles change capture, delta computation, and incremental refresh automatically.

It is inspired by the DBSP differential dataflow framework. See DBSP_COMPARISON.md for a detailed comparison.

What is incremental view maintenance (IVM) and why does it matter?

Incremental View Maintenance means updating a materialized view by processing only the changes (deltas) to the source data, rather than re-executing the entire defining query from scratch.

Consider a stream table defined as SELECT customer_id, SUM(amount) FROM orders GROUP BY customer_id over a 10-million-row orders table. When you insert 5 new rows:

  • Without IVM (FULL refresh): Re-scans all 10 million rows and recomputes every group. Cost: O(total rows).
  • With IVM (DIFFERENTIAL refresh): Reads only the 5 new rows from the change buffer, identifies the affected groups, and updates just those groups. Cost: O(changed rows × affected groups).

pg_trickle's DVM engine implements IVM using differentiation rules for each SQL operator (Scan, Filter, Join, Aggregate, etc.), generating a delta query that computes the exact changes to the stream table from the exact changes to the source.
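
As a rough sketch, the applied delta for this example looks something like the following MERGE. This is illustrative only: change_buffer stands in for the real pgtrickle_changes table, the column names are assumptions, and the real generated SQL also handles updates and deletes by subtracting from the running totals.

-- Sketch: apply the aggregate delta for the 5 newly inserted orders rows.
MERGE INTO order_totals t
USING (
    SELECT customer_id, SUM(amount) AS delta_total
    FROM change_buffer            -- only the changed rows, not all 10M
    GROUP BY customer_id
) d ON t.customer_id = d.customer_id
WHEN MATCHED THEN
    UPDATE SET total = t.total + d.delta_total
WHEN NOT MATCHED THEN
    INSERT (customer_id, total) VALUES (d.customer_id, d.delta_total);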

What is the difference between a stream table and a regular materialized view, in practice?

Feature | Materialized Views | Stream Tables
Refresh | Manual (REFRESH MATERIALIZED VIEW) | Automatic (scheduler) or manual
Incremental refresh | Not supported natively | Built-in differential mode
Change detection | None — always full recompute | CDC triggers track row-level changes
Dependency ordering | None | DAG-aware topological refresh
Monitoring | None | Built-in views, stats, NOTIFY alerts
Schedule | None | Duration strings (5m) or cron (*/5 * * * *)
Transactional IVM | No | Yes (IMMEDIATE mode)

In practice, stream tables are regular PostgreSQL heap tables under the hood — you can query them, create indexes on them, join them with other tables, and reference them from views. The key difference is that pg_trickle manages their contents automatically.

What happens behind the scenes when I INSERT a row into a table tracked by a stream table?

The full data flow for a DIFFERENTIAL-mode stream table:

  1. Your INSERT completes normally. The row is written to the source table.
  2. A CDC trigger fires (row-level AFTER INSERT). It writes a change record (action=I, the new row data as JSONB, the current WAL LSN) into the source's change buffer table (pgtrickle_changes.changes_<oid>). This happens within your transaction — if you roll back, the change record is also rolled back.
  3. You commit. Both the source row and the change record become visible.
  4. The scheduler wakes up (every pg_trickle.scheduler_interval_ms, default 1 second). It checks whether the stream table's schedule says a refresh is due.
  5. If due, the refresh engine runs. It reads the change buffer for rows with LSN > the stream table's current frontier, generates a delta query from the DVM operator tree, and applies the result via MERGE.
  6. Frontier advances. The stream table's frontier is updated to the new LSN, and the consumed change buffer rows are cleaned up.

For IMMEDIATE-mode stream tables, steps 2–6 are replaced: a statement-level AFTER trigger computes and applies the delta within your transaction, so the stream table is updated before your transaction commits.
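
Because the change buffer is an ordinary table, you can watch step 2 happen yourself. A hypothetical inspection session follows; the buffer's column layout is internal, so the action/lsn/payload names below are illustrative assumptions, not a documented schema.

-- Locate the change buffer for a source table (name pattern from step 2):
SELECT format('pgtrickle_changes.changes_%s', 'orders'::regclass::oid);

-- Then peek at pending records, substituting the OID from above:
--   SELECT action, lsn, payload
--   FROM pgtrickle_changes.changes_<oid>
--   ORDER BY lsn LIMIT 10;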

What does "differential" mean in the context of pg_trickle?

"Differential" refers to the mathematical approach of computing differences (deltas) rather than absolute values. Given a query Q and a set of changes ΔR to source table R, the DVM engine computes ΔQ(R, ΔR) — the change to the query result caused by the change to the source. This delta is then applied (merged) into the stream table.

Each SQL operator has its own differentiation rule. For example:

  • Filter: ΔFilter(R, ΔR) = Filter(ΔR) — just apply the filter to the changes.
  • Join: ΔJoin(R, S, ΔR) = Join(ΔR, S) — join the changes against the other side's current state.
  • Aggregate: Recompute only the groups whose keys appear in the changes.

See DVM_OPERATORS.md for the complete set of differentiation rules.

What is a frontier, and why does pg_trickle track LSNs?

A frontier is a per-source map of {source_oid → LSN} that records exactly how far each stream table has consumed changes from each of its source tables. It is stored as JSONB in the pgtrickle.pgt_stream_tables catalog.

Why LSNs? PostgreSQL's Write-Ahead Log Sequence Number (LSN) provides a globally ordered, monotonically increasing position in the change stream. By recording the LSN at which each source was last consumed, the frontier ensures:

  • No missed changes. The next refresh reads changes with LSN > frontier, ensuring contiguous, non-overlapping windows.
  • No duplicate processing. Changes at or below the frontier are never re-read.
  • Consistent snapshots. When a stream table depends on multiple source tables, the frontier tracks each source independently, enabling consistent multi-source delta computation.

Lifecycle: Created on first full refresh → Advanced on each differential refresh → Reset on reinitialize.
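
You can inspect a stream table's frontier directly from the catalog. A minimal sketch, assuming the JSONB column is simply named frontier (verify with \d pgtrickle.pgt_stream_tables on your installation):

SELECT pgt_name, frontier        -- JSONB map of {source_oid: lsn}
FROM pgtrickle.pgt_stream_tables;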

What is the __pgt_row_id column and why does it appear in my stream tables?

Every stream table has a __pgt_row_id BIGINT PRIMARY KEY column. It stores a 64-bit xxHash of the row's group-by key (for aggregate queries) or all output columns (for non-aggregate queries). The refresh engine uses it to match incoming deltas against existing rows during the MERGE operation.

You should ignore this column in your queries. It is an implementation detail. If it bothers you, exclude it explicitly:

SELECT customer_id, total FROM order_totals;  -- omit __pgt_row_id

What is the auto-rewrite pipeline and how does it affect my queries?

Before parsing a defining query into the DVM operator tree, pg_trickle runs six automatic rewrite passes:

# | Pass | What it does
0 | View inlining | Replaces view references with (view_definition) AS alias subqueries (fixpoint, max depth 10)
1 | DISTINCT ON | Converts to ROW_NUMBER() OVER (PARTITION BY … ORDER BY …) = 1 subquery
2 | GROUPING SETS / CUBE / ROLLUP | Decomposes into UNION ALL of separate GROUP BY queries
3 | Scalar subquery in WHERE | Rewrites WHERE col > (SELECT …) to CROSS JOIN
4 | Correlated scalar subquery in SELECT | Rewrites to LEFT JOIN with grouped inline view
5 | SubLinks in OR | Splits WHERE a OR EXISTS (…) into UNION branches

The rewrites are transparent — your original query is preserved in the catalog (original_query column) while the rewritten version is stored in defining_query. The DVM engine only sees standard SQL operators after rewriting.
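
As a concrete illustration of pass 1, a DISTINCT ON defining query is rewritten roughly like this (schematic only; created_at is an assumed column, and the SQL stored in defining_query may differ in detail):

-- Original defining query:
SELECT DISTINCT ON (customer_id) customer_id, amount
FROM orders
ORDER BY customer_id, created_at DESC;

-- After the DISTINCT ON rewrite pass (schematically):
SELECT customer_id, amount
FROM (
    SELECT customer_id, amount,
           ROW_NUMBER() OVER (PARTITION BY customer_id
                              ORDER BY created_at DESC) AS rn
    FROM orders
) ranked
WHERE rn = 1;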

See ARCHITECTURE.md for details on each pass.

How does pg_trickle compare to DBSP (the academic framework)?

pg_trickle is inspired by DBSP but is not a direct implementation. Key differences:

  • DBSP is a general-purpose differential dataflow framework with a Rust runtime (Feldera). It models computation as circuits over Z-sets (multisets with integer weights).
  • pg_trickle implements the same mathematical principles (delta queries, frontier tracking) but embedded inside PostgreSQL as an extension. It generates SQL delta queries rather than running a separate computation engine.
  • Trade-off: pg_trickle leverages PostgreSQL's optimizer, indexes, and storage engine but is limited to what can be expressed as SQL queries. DBSP can implement arbitrary dataflow computations.

See DBSP_COMPARISON.md for a detailed comparison.

How does pg_trickle compare to pg_ivm?

Feature | pg_ivm | pg_trickle
Refresh timing | Immediate (same transaction) only | Immediate, Deferred (scheduled), or Manual
Incremental strategy | Transition tables + query rewriting | DVM operator tree + delta SQL generation
Supported SQL | Inner joins, simple outer joins, COUNT/SUM/AVG/MIN/MAX, EXISTS, DISTINCT | All of the above + window functions, recursive CTEs, LATERAL, UNION/INTERSECT/EXCEPT, 37 aggregates, TopK, GROUPING SETS
Cascading (view-on-view) | No | Yes (DAG-aware topological refresh)
Scheduling | None (always immediate) | Duration, cron, CALCULATED, or NULL
Monitoring | None | Built-in views, stats, NOTIFY alerts
PostgreSQL version | 14–17 | 18 only (until v0.4.0)

pg_trickle's IMMEDIATE mode is designed as a migration path for pg_ivm users — it uses the same statement-level trigger approach with transition tables.

What PostgreSQL versions are supported?

PostgreSQL 18.x exclusively. pg_trickle uses PostgreSQL 18 features such as enhanced MERGE syntax with NOT MATCHED BY SOURCE and improved event trigger payloads. These features are not available in earlier versions.

Backward compatibility with PostgreSQL 16–17 is planned for a future release (tracked in the roadmap).

Does pg_trickle require wal_level = logical?

No. By default, pg_trickle uses lightweight row-level triggers for change data capture instead of logical replication. This means you do not need to set wal_level = logical, configure max_replication_slots, or create publications.

If you later enable the hybrid CDC mode (pg_trickle.cdc_mode = 'auto'), WAL-based capture becomes an option — but this is opt-in and not required for normal operation.

Is pg_trickle production-ready?

pg_trickle is under active development and approaching production readiness. It has a comprehensive test suite with 700+ unit tests and 290+ end-to-end tests covering correctness, failure recovery, and concurrency scenarios.

That said, as with any new extension, you should evaluate it against your specific workloads before deploying to production. Start with non-critical dashboards or reporting tables, monitor refresh performance and data correctness, and gradually expand usage as confidence grows.


Installation & Setup

How do I install pg_trickle?

  1. Add pg_trickle to shared_preload_libraries in postgresql.conf:
    shared_preload_libraries = 'pg_trickle'
    
  2. Restart PostgreSQL.
  3. Run:
    CREATE EXTENSION pg_trickle;
    

See Installation for platform-specific instructions and pre-built release artifacts.

What are the minimum configuration requirements?

The only mandatory setting is adding pg_trickle to shared_preload_libraries in postgresql.conf (this requires a PostgreSQL restart):

shared_preload_libraries = 'pg_trickle'

All other GUC parameters have sensible defaults and can be tuned later. However, max_worker_processes often needs to be raised from its default of 8, as in the example below.
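
max_worker_processes is a standard PostgreSQL setting, so raising it uses the usual mechanism (this particular GUC only takes effect after a restart):

ALTER SYSTEM SET max_worker_processes = 16;  -- default is 8; restart required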

Can I install pg_trickle on a managed PostgreSQL service (RDS, Cloud SQL, etc.)?

It depends on whether the service allows custom extensions and shared_preload_libraries modifications. Many managed services restrict these. However, pg_trickle has one advantage over replication-based extensions: it does not require wal_level = logical, which avoids one of the most common restrictions on managed PostgreSQL services.

Check your provider's documentation for custom extension support. Services that support custom extensions (e.g., some tiers of Azure Flexible Server, Supabase, Neon) are more likely to work.

How do I uninstall pg_trickle?

  1. Drop all stream tables first (or they will be cascade-dropped):
    SELECT pgtrickle.drop_stream_table(pgt_name) FROM pgtrickle.pgt_stream_tables;
    
  2. Drop the extension:
    DROP EXTENSION pg_trickle CASCADE;
    
  3. Remove pg_trickle from shared_preload_libraries and restart PostgreSQL.

Creating & Managing Stream Tables

Do I need to choose a refresh mode?

No. The default mode ('AUTO') is adaptive: it uses differential (delta-only) maintenance when efficient, and automatically falls back to full recomputation when the change volume is high or the query cannot be differentiated. This works well for the vast majority of queries.

You only need to specify a mode explicitly when:

  • You want FULL mode to force recomputation every time (rare).
  • You want IMMEDIATE mode for sub-second, in-transaction updates (adds overhead to every write on source tables).
  • You want strict DIFFERENTIAL mode and prefer an error over silent fallback when the query isn't differentiable.

How do I create a stream table?

-- Minimal: just name and query. Refreshes on a calculated schedule
-- using adaptive differential maintenance.
SELECT pgtrickle.create_stream_table(
    'order_totals',
    'SELECT customer_id, SUM(amount) AS total
     FROM orders GROUP BY customer_id'
);

-- With custom schedule:
SELECT pgtrickle.create_stream_table(
    name     => 'order_totals',
    query    => 'SELECT customer_id, SUM(amount) AS total
     FROM orders GROUP BY customer_id',
    schedule => '5m'
);

What is the difference between FULL and DIFFERENTIAL refresh mode?

  • FULL — Truncates the stream table and re-runs the entire defining query every refresh cycle. Simple but expensive for large result sets.
  • DIFFERENTIAL — Computes only the delta (changes since the last refresh) using the DVM engine and applies it via a MERGE statement. Much faster when only a small fraction of source data changes between refreshes. When the change ratio exceeds pg_trickle.differential_max_change_ratio (default 15%), DIFFERENTIAL automatically falls back to FULL for that cycle.
  • IMMEDIATE — Maintains the stream table synchronously within the same transaction as the base table DML. Uses statement-level triggers with transition tables — no change buffers, no scheduler. The stream table is always up-to-date.

Why does FULL mode exist if DIFFERENTIAL can fall back to it automatically?

DIFFERENTIAL mode with adaptive fallback covers most user needs — it uses incremental deltas when changes are small and automatically switches to a full recompute when the change ratio is high. However, explicit FULL mode still has its place:

  1. No CDC overhead. FULL mode installs CDC triggers on source tables (for DAG tracking), but the refresh itself ignores the change buffers entirely. If your workload has very high write throughput and you know you'll always do a full recompute, FULL mode avoids the per-row trigger overhead of writing change records that will never be consumed incrementally.

  2. Simpler debugging. When investigating data correctness issues, FULL mode is a clean baseline — it re-runs the defining query with no delta computation, no frontier tracking, and no MERGE logic. If FULL produces correct results but DIFFERENTIAL doesn't, the bug is in the delta pipeline.

  3. Predictable performance. DIFFERENTIAL refresh time varies with the number of changes, which can be unpredictable. FULL refresh time is proportional to the total result set size, which is stable. For SLA-sensitive workloads where you'd rather have consistent 500ms refreshes than variable 5ms–500ms refreshes, FULL provides that predictability.

  4. Unsupported-but-planned constructs. Some queries may parse correctly in DIFFERENTIAL mode but produce suboptimal deltas. Using FULL mode explicitly is a safe fallback while the DVM engine matures.

For most users, DIFFERENTIAL is the right default. Use FULL when you have a specific reason.

When should I use FULL vs. DIFFERENTIAL vs. IMMEDIATE?

Use DIFFERENTIAL (default) when:

  • Source tables are large and changes between refreshes are small
  • The defining query uses supported operators (most common SQL is supported)
  • Some staleness (seconds to minutes) is acceptable

Use FULL when:

  • The defining query uses unsupported aggregates (CORR, COVAR_*, REGR_*)
  • Source tables are small and a full recompute is cheap
  • You see frequent adaptive fallbacks to FULL (check refresh history)

Use IMMEDIATE when:

  • The stream table must always reflect the latest committed data
  • You need transactional consistency (reads within the same transaction see updated data)
  • Write-side overhead per DML statement is acceptable
  • The defining query is relatively simple (no TopK, no materialized view sources)

What are the advantages and disadvantages of IMMEDIATE vs. deferred (FULL/DIFFERENTIAL) refresh modes?

IMMEDIATE mode

  • ✅ Read-your-writes consistency: The stream table is updated within the same transaction as the base table DML — always current from the writer's perspective.
  • ✅ No lag: No background worker, no schedule interval. The view is never stale.
  • ✅ No change buffers: pgtrickle_changes.* tables are not used, reducing write overhead on source tables.
  • ✅ pg_ivm compatibility: Drop-in migration path for existing pg_ivm / IMMV users.
  • ❌ Write amplification: Every DML statement on a base table also executes IVM trigger logic, adding latency to the original transaction.
  • ❌ Serialized concurrent writes: An ExclusiveLock is taken on the stream table during maintenance, serializing writers.
  • ❌ Limited SQL support: Window functions, recursive CTEs, LATERAL joins, scalar subqueries, and TopK (ORDER BY … LIMIT) are not supported — use DIFFERENTIAL instead.
  • ❌ Cascading limitations: Cascading IMMEDIATE stream tables work but may require manual refresh for deep chains.
  • ❌ No throttling: The refresh cannot be delayed or rate-limited.

Deferred mode (FULL / DIFFERENTIAL)

  • ✅ Decoupled write path: Base table writes are fast; view maintenance runs later via the scheduler or manual refresh.
  • ✅ Broadest SQL support: Window functions, recursive CTEs, LATERAL, UNION, user-defined aggregates, TopK, cascading stream tables, and more.
  • ✅ Adaptive cost control: DIFFERENTIAL automatically falls back to FULL when the change ratio exceeds pg_trickle.differential_max_change_ratio.
  • ✅ Concurrency-friendly: Writers never block on view maintenance.
  • ❌ Staleness: The stream table lags by up to one schedule interval (e.g. 1m).
  • ❌ No read-your-writes: A writer querying the stream table immediately after a write may see the pre-change data.
  • ❌ Infrastructure overhead: Requires change buffer tables, a background worker, and frontier tracking.

Rule of thumb: use IMMEDIATE when the query is simple and freshness within the transaction matters. Use DIFFERENTIAL (or FULL) for complex queries, high concurrency, or when you want to decouple write latency from view maintenance.

What happens if I have an IMMEDIATE stream table between two DIFFERENTIAL stream tables in a dependency chain?

Consider the chain: source → ST_A (DIFFERENTIAL) → ST_B (IMMEDIATE) → ST_C (DIFFERENTIAL). This is a valid but unusual configuration with important behavioral consequences:

  • ST_A refreshes on its schedule (e.g., every 1 minute) via the background scheduler.
  • ST_B is IMMEDIATE, so it has no CDC triggers on ST_A — it uses statement-level IVM triggers. ST_A is updated by the scheduler rather than by user DML, but the scheduler's MERGE is still ordinary DML, so it fires the statement-level IVM triggers that maintain ST_A's dependents. ST_B therefore updates within the scheduler's transaction whenever ST_A refreshes.
  • ST_C is DIFFERENTIAL and depends on ST_B. Since ST_B is a stream table, ST_C's CDC triggers fire when ST_B is modified. The scheduler refreshes ST_C on its own schedule.

The practical concern: write latency stacking. When the scheduler refreshes ST_A, ST_B's IVM triggers fire synchronously within that same transaction, adding IVM overhead to ST_A's refresh. If ST_B's delta computation is expensive, it slows down the entire scheduler cycle.

Recommendation: Avoid mixing IMMEDIATE into the middle of a deferred chain. Either make the entire chain IMMEDIATE (for small, simple queries) or keep it entirely DIFFERENTIAL. If you need read-your-writes for one specific step, consider making that the terminal (leaf) stream table in the chain.
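
Using the example names above, converting the leaf to IMMEDIATE while leaving the rest of the chain deferred is a single call (the stream table name st_c is hypothetical):

-- Keep source → ST_A → ST_B DIFFERENTIAL; give the leaf read-your-writes:
SELECT pgtrickle.alter_stream_table('st_c', refresh_mode => 'IMMEDIATE');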

What schedule formats are supported?

Duration strings:

Unit | Suffix | Example
Seconds | s | 30s
Minutes | m | 5m
Hours | h | 2h
Days | d | 1d
Weeks | w | 1w
Compound | | 1h30m

Cron expressions:

Format | Example | Description
5-field | */5 * * * * | Every 5 minutes
Aliases | @hourly, @daily | Built-in shortcuts

CALCULATED mode: Pass NULL as the schedule to inherit the schedule from downstream dependents.

How do cron schedules handle timezones? What does @daily really mean?

pg_trickle evaluates cron expressions in UTC. The underlying croner crate computes the next occurrence from a UTC timestamp, and the scheduler compares this against chrono::Utc::now(). There is no per-stream-table timezone setting.

This means:

  • @daily (equivalent to 0 0 * * *) fires at midnight UTC, not midnight in your local timezone.
  • @hourly (equivalent to 0 * * * *) fires at the top of each UTC hour.
  • 0 9 * * 1-5 fires at 09:00 UTC on weekdays — if your server is in America/New_York, that's 04:00 or 05:00 local time depending on DST.

If you need a schedule aligned to a local timezone, convert the desired local time to UTC and write the cron expression accordingly. For example, to refresh at 08:00 Europe/Oslo (UTC+1 in winter, UTC+2 in summer), use 0 6 * * * in summer and 0 7 * * * in winter — or accept the 1-hour seasonal shift and pick one.

Tip: For most analytics workloads, UTC-based schedules are preferable because they don't shift with daylight saving transitions.
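
PostgreSQL itself can do the local-to-UTC conversion when you are writing such an expression:

-- What is 08:00 Europe/Oslo in UTC on a summer date?
SELECT (TIMESTAMP '2026-06-01 08:00' AT TIME ZONE 'Europe/Oslo')
       AT TIME ZONE 'UTC';
-- => 2026-06-01 06:00:00, so the summer cron expression is 0 6 * * *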

What is the minimum allowed schedule?

The pg_trickle.min_schedule_seconds GUC (default: 60 seconds) sets the shortest allowed refresh schedule. Any create_stream_table or alter_stream_table call with a schedule shorter than this floor is rejected with a clear error message.

This guard exists to prevent accidentally creating stream tables that refresh too frequently, which could overload the scheduler or the source tables. During development and testing, you can lower it:

ALTER SYSTEM SET pg_trickle.min_schedule_seconds = 1;
SELECT pg_reload_conf();

What happens if all stream tables in the DAG have a CALCULATED schedule?

When every stream table uses a CALCULATED schedule (schedule => 'calculated'), there are no explicit schedules for the resolution algorithm to derive from. The CALCULATED logic works by propagating MIN(effective_schedule) from downstream dependents upward through the DAG. If no node has an explicit duration:

  1. Leaf nodes (no downstream dependents) have no schedules to take the minimum of, so they fall back to the pg_trickle.min_schedule_seconds GUC (default: 60 seconds).
  2. Upstream nodes then resolve to MIN(fallback) = fallback.
  3. The result: every stream table in the DAG gets the fallback schedule (60 s by default).

This is safe but usually not what you want — the whole DAG refreshes at the same generic interval. Best practice is to set an explicit schedule on at least the leaf (most-downstream) stream tables so that upstream CALCULATED schedules resolve to something meaningful:

-- Leaf ST with an explicit schedule
SELECT pgtrickle.create_stream_table(
    name     => 'daily_summary',
    query    => 'SELECT region, SUM(total) FROM pgtrickle.order_totals GROUP BY region',
    schedule => '10m'
);

-- Upstream ST inherits that 10m schedule via CALCULATED
SELECT pgtrickle.create_stream_table(
    name     => 'order_totals',
    query    => 'SELECT customer_id, SUM(amount) AS total FROM orders GROUP BY customer_id',
    schedule => 'calculated'
);

You can inspect the resolved effective schedules with:

SELECT pgt_name, schedule, effective_schedule
FROM pgtrickle.pgt_stream_tables;

Can a stream table reference another stream table?

Yes. Stream tables can depend on other stream tables. The scheduler automatically refreshes them in topological order (upstream first). Circular dependencies are detected and rejected at creation time.

-- ST1: aggregates orders
SELECT pgtrickle.create_stream_table(
    name         => 'order_totals',
    query        => 'SELECT customer_id, SUM(amount) AS total FROM orders GROUP BY customer_id',
    schedule     => '1m',
    refresh_mode => 'DIFFERENTIAL'
);

-- ST2: filters ST1
SELECT pgtrickle.create_stream_table(
    name         => 'big_customers',
    query        => 'SELECT customer_id, total FROM pgtrickle.order_totals WHERE total > 1000',
    schedule     => '1m',
    refresh_mode => 'DIFFERENTIAL'
);

How do I change a stream table's schedule or mode?

-- Change schedule
SELECT pgtrickle.alter_stream_table('order_totals', schedule => '10m');

-- Switch refresh mode
SELECT pgtrickle.alter_stream_table('order_totals', refresh_mode => 'FULL');

-- Suspend
SELECT pgtrickle.alter_stream_table('order_totals', status => 'SUSPENDED');

-- Resume
SELECT pgtrickle.alter_stream_table('order_totals', status => 'ACTIVE');

Can I change the defining query of a stream table?

Yes — use the query parameter of alter_stream_table():

SELECT pgtrickle.alter_stream_table('order_totals',
    query => 'SELECT customer_id, SUM(amount) AS total, COUNT(*) AS order_count
              FROM orders GROUP BY customer_id');

The ALTER QUERY operation validates the new query, migrates the storage table schema if needed, updates catalog entries and source dependencies, and runs a full refresh — all within a single transaction. Concurrent readers see either the old data or the new data, never an empty table.

Schema migration behavior:

Schema change | Behavior
Same columns | Fast path — no storage DDL, just catalog update + full refresh
Columns added or removed | Compatible migration via ALTER TABLE ADD/DROP COLUMN — storage table OID preserved
Column type incompatible | Full rebuild — storage table dropped and recreated (OID changes, WARNING emitted)

You can also change the query and other parameters simultaneously:

SELECT pgtrickle.alter_stream_table('order_totals',
    query => 'SELECT customer_id, SUM(amount) AS total FROM orders GROUP BY customer_id',
    refresh_mode => 'FULL');

How do I deploy stream tables idempotently?

Use create_or_replace_stream_table() — one function call that does the right thing automatically:

-- Safe to run on every deploy — creates, updates, or no-ops as needed:
SELECT pgtrickle.create_or_replace_stream_table(
    name         => 'order_totals',
    query        => 'SELECT region, SUM(amount) AS total FROM orders GROUP BY region',
    schedule     => '2m',
    refresh_mode => 'DIFFERENTIAL'
);

What happens on each deploy:

Situation | Action
First deploy (stream table doesn't exist) | Creates it, populates data
Nothing changed since last deploy | No-op — logs INFO, returns instantly
You changed the schedule or mode | Updates config in place (no data loss)
You changed the query | Migrates storage schema + runs a full refresh

This mirrors PostgreSQL's CREATE OR REPLACE VIEW / CREATE OR REPLACE FUNCTION pattern.

When to use which function:

Function | Use case
create_or_replace_stream_table() | Recommended for most deployments. Declarative, idempotent — handles all cases automatically.
create_stream_table_if_not_exists() | Safe re-run, but never modifies an existing definition. Good for one-time seed migrations.
create_stream_table() | Strict mode — errors if the stream table already exists. Use when you want an explicit failure on duplicates.

How do I trigger a manual refresh?

Call refresh_stream_table() to immediately refresh a stream table without waiting for the next scheduled cycle:

SELECT pgtrickle.refresh_stream_table('order_totals');

This runs a synchronous refresh in your current session and returns when complete. It works even when the background scheduler is disabled (pg_trickle.enabled = false), making it useful for testing, debugging, or one-off data refreshes.

To force a full refresh regardless of the stream table's configured mode, temporarily change the refresh mode:

SELECT pgtrickle.alter_stream_table('order_totals', refresh_mode => 'FULL');
SELECT pgtrickle.refresh_stream_table('order_totals');
-- Switch back to the original mode when done:
SELECT pgtrickle.alter_stream_table('order_totals', refresh_mode => 'DIFFERENTIAL');

Data Freshness & Consistency

Understanding when and how stream tables become current is the #1 conceptual hurdle for users coming from synchronous materialized views. This section explains staleness guarantees, read-your-writes behavior, and Delayed View Semantics (DVS).

How stale can a stream table be?

For deferred modes (FULL / DIFFERENTIAL): A stream table can be at most one schedule interval behind the source data, plus the time it takes to execute the refresh itself. For example, with schedule => '1m', the maximum staleness is approximately 1 minute + refresh duration.

In practice, staleness is often less than the schedule interval because the scheduler continuously checks for due refreshes at pg_trickle.scheduler_interval_ms (default: 1 second).

For IMMEDIATE mode: The stream table is always current within the transaction that modified the source data. There is zero staleness.

Check current staleness:

SELECT pgtrickle.get_staleness('order_totals');  -- returns seconds, NULL if never refreshed

-- Or check all stream tables:
SELECT pgt_name, staleness, stale FROM pgtrickle.stream_tables_info;

Can I read my own writes immediately after an INSERT?

It depends on the refresh mode:

  • IMMEDIATE mode: Yes. The stream table is updated within the same transaction as your INSERT. You can query it immediately and see the updated data.
  • DIFFERENTIAL / FULL mode: No. The stream table is updated by the background scheduler in a separate transaction. Your INSERT is captured by the CDC trigger, but the stream table won't reflect it until the next scheduled refresh (or a manual refresh_stream_table() call).

If read-your-writes consistency is a requirement, use refresh_mode => 'IMMEDIATE'.

What consistency guarantees does pg_trickle provide?

pg_trickle provides Delayed View Semantics (DVS): the contents of every stream table are logically equivalent to evaluating its defining query at some past point in time — the data_timestamp. This means:

  • The data is always internally consistent — it corresponds to a valid snapshot of the source data.
  • The data may be stale — it reflects the source state at data_timestamp, not necessarily the current state.
  • For cascading stream tables, the scheduler refreshes in topological order so that when ST B references upstream ST A, A has already been refreshed before B runs its delta query against A's contents.

For IMMEDIATE mode, the guarantee is stronger: the stream table always reflects the state of the source data as of the current transaction.

What are "Delayed View Semantics" (DVS)?

DVS is the formal consistency guarantee: a stream table's contents are equivalent to evaluating its defining query at a specific past time (the data_timestamp). This is analogous to how a materialized view captured at a point in time is always internally consistent, even if the source data has since changed.

The data_timestamp is recorded in the catalog and advanced after each successful refresh:

SELECT pgt_name, data_timestamp FROM pgtrickle.pgt_stream_tables;

What happens if the scheduler is behind — does data get lost?

No. Change data is never lost, even if the scheduler falls behind. Changes accumulate in the change buffer tables (pgtrickle_changes.changes_<oid>) until consumed by a refresh. The frontier ensures that each refresh picks up exactly where the last one left off.

However, a growing change buffer increases:

  • Disk usage (change buffer tables grow)
  • Refresh time (more changes to process per cycle)
  • Risk of adaptive fallback to FULL (if the change ratio exceeds pg_trickle.differential_max_change_ratio)

The monitoring system emits a buffer_growth_warning NOTIFY alert if buffers grow unexpectedly.
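
Because the change buffers live in the pgtrickle_changes schema as regular tables, standard catalog queries track their growth; for example:

-- Largest change buffers first:
SELECT c.relname,
       pg_size_pretty(pg_total_relation_size(c.oid)) AS total_size
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE n.nspname = 'pgtrickle_changes'
ORDER BY pg_total_relation_size(c.oid) DESC;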

How does pg_trickle ensure deltas are applied in the right order across cascading stream tables?

The scheduler uses topological ordering from the dependency DAG. When ST B depends on ST A:

  1. ST A is refreshed first — its data is brought up to date and its frontier advances.
  2. ST A's refresh writes are captured by CDC triggers (since ST A is a source for ST B).
  3. ST B is refreshed next — its delta query reads ST A's current (just-refreshed) data and the change buffer.

This ensures that downstream stream tables always see consistent upstream data. Circular dependencies are rejected at creation time.


IMMEDIATE Mode (Transactional IVM)

IMMEDIATE mode maintains the stream table synchronously — within the same transaction as the source DML. This section covers when to use it, what SQL it supports, locking behavior, and how to switch between modes.

When should I use IMMEDIATE mode instead of DIFFERENTIAL?

Use IMMEDIATE when:

  • Your application requires read-your-writes consistency — e.g., a user inserts an order and immediately queries a dashboard that must include that order.
  • The defining query is relatively simple (single-table aggregation, joins, filters).
  • The source table write rate is moderate (IMMEDIATE adds latency to every DML statement).

Stick with DIFFERENTIAL when:

  • Staleness of a few seconds to minutes is acceptable.
  • The defining query uses unsupported IMMEDIATE constructs (materialized-view sources, foreign-table sources).
  • Write-side performance is critical (high-throughput OLTP).
  • You need to decouple write latency from view maintenance.

What SQL features are NOT supported in IMMEDIATE mode?

IMMEDIATE mode supports all constructs that DIFFERENTIAL supports, with two source-type exceptions:

Feature | Status | Notes
WITH RECURSIVE | ✅ Supported (IM1) | Semi-naive evaluation inside the trigger. A depth counter guards against infinite loops (pg_trickle.ivm_recursive_max_depth, default 100). A warning is emitted at create time for very deep hierarchies.
TopK (ORDER BY … LIMIT N [OFFSET M]) | ✅ Supported (IM2) | Micro-refresh: recomputes the top-N rows on every DML statement. Gated by pg_trickle.ivm_topk_max_limit to prevent unbounded scans.
Materialized views as sources | ❌ Rejected | The materialized view's stale-snapshot semantics prevent trigger-based capture — use the underlying query instead.
Foreign tables as sources | ❌ Rejected | No triggers on foreign tables — use FULL mode instead.

Attempting to create or switch to IMMEDIATE mode with an unsupported construct produces a clear error message.

What happens when I TRUNCATE a source table in IMMEDIATE mode?

A statement-level AFTER TRUNCATE trigger fires and truncates the stream table, then re-populates it by executing a full refresh from the defining query — all within the same transaction. The stream table remains consistent.

Can I have cascading IMMEDIATE stream tables (ST A → ST B)?

Yes. When ST A is IMMEDIATE and ST B depends on ST A and is also IMMEDIATE, changes propagate through the chain within the same transaction. The IVM triggers on the base table update ST A, and since that write is visible within the transaction, ST B's triggers fire and update ST B.

What locking does IMMEDIATE mode use?

IMMEDIATE mode acquires statement-level locks on the stream table during delta application:

  • Simple queries (single-table scan/filter without aggregates or DISTINCT): RowExclusiveLock — allows concurrent readers, blocks other writers.
  • Complex queries (joins, aggregates, DISTINCT, window functions): ExclusiveLock — blocks both readers and writers to ensure delta consistency.

This means concurrent writes to the same base table are serialized through the stream table lock. For high-concurrency write workloads, DIFFERENTIAL mode avoids this bottleneck.

How do I switch an existing DIFFERENTIAL stream table to IMMEDIATE?

SELECT pgtrickle.alter_stream_table('order_totals', refresh_mode => 'IMMEDIATE');

This:

  1. Validates the defining query against IMMEDIATE mode restrictions.
  2. Removes the row-level CDC triggers from source tables.
  3. Installs statement-level IVM triggers (BEFORE + AFTER with transition tables).
  4. Clears the schedule (IMMEDIATE mode has no schedule).
  5. Performs a full refresh to establish a consistent baseline.

To switch back:

SELECT pgtrickle.alter_stream_table('order_totals', refresh_mode => 'DIFFERENTIAL');

This reverses the process: removes IVM triggers, installs CDC triggers, restores the schedule (default 1m), and performs a full refresh.

What happens to IMMEDIATE mode during a manual refresh_stream_table() call?

For IMMEDIATE mode stream tables, refresh_stream_table() performs a FULL refresh — truncates and re-populates from the defining query. This is useful for recovering from edge cases or forcing a clean baseline. It is equivalent to pg_ivm's refresh_immv(name, true).

How much write-side overhead does IMMEDIATE mode add?

Each DML statement on a base table tracked by an IMMEDIATE stream table incurs:

  • BEFORE trigger: Advisory lock acquisition + pre-state setup (~0.1–0.5 ms).
  • AFTER trigger: Transition table copy to temp tables + delta SQL generation + delta application (~1–50 ms depending on query complexity and delta size).

For a simple single-table aggregate, expect 2–10 ms overhead per statement. For multi-table joins or window functions, overhead is higher. The overhead scales with the number of IMMEDIATE stream tables that depend on the same source table.
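
To measure this overhead on your own workload, time the same statement under both modes. A quick psql sketch (table and stream table names from earlier examples; your numbers will vary with query complexity and delta size):

\timing on
INSERT INTO orders (id, status) VALUES (1001, 'active');   -- with IMMEDIATE ST

SELECT pgtrickle.alter_stream_table('order_totals', refresh_mode => 'DIFFERENTIAL');
INSERT INTO orders (id, status) VALUES (1002, 'active');   -- deferred maintenance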


SQL Support

pg_trickle supports a broad range of SQL in defining queries. This section covers what’s supported, what’s rejected (with rewrites), and how specific constructs like aggregates and ORDER BY are handled. The subsections that follow dive deeper into aggregates, joins, CTEs, window functions, and TopK.

What SQL features are supported in defining queries?

Most common SQL is supported in both FULL and DIFFERENTIAL modes:

  • Table scans, projections, WHERE/HAVING filters
  • INNER, LEFT, RIGHT, FULL OUTER JOIN (including multi-table joins)
  • GROUP BY with 25+ aggregate functions (COUNT, SUM, AVG, MIN, MAX, BOOL_AND/OR, STRING_AGG, ARRAY_AGG, JSON_AGG, JSONB_AGG, BIT_AND/OR/XOR, STDDEV, VARIANCE, MODE, PERCENTILE_CONT/DISC, and more)
  • FILTER (WHERE ...) on aggregates
  • DISTINCT
  • Set operations: UNION ALL, UNION, INTERSECT, INTERSECT ALL, EXCEPT, EXCEPT ALL
  • Subqueries: EXISTS, NOT EXISTS, IN (subquery), NOT IN (subquery), scalar subqueries
  • Non-recursive and recursive CTEs
  • Window functions (ROW_NUMBER, RANK, SUM OVER, etc.)
  • LATERAL joins with set-returning functions and correlated subqueries
  • CASE, COALESCE, NULLIF, GREATEST, LEAST, BETWEEN, IS DISTINCT FROM

See DVM Operators for the complete list.

What SQL features are NOT supported?

The following are rejected with clear error messages and suggested rewrites:

Feature | Reason | Suggested Rewrite
TABLESAMPLE | Stream tables materialize the full result set | Use WHERE random() < fraction in consuming query
Window functions in expressions | Cannot be differentially maintained | Move window function to a separate column
LIMIT / OFFSET (without ORDER BY) | Stream tables materialize the full result set; ORDER BY … LIMIT N [OFFSET M] is supported as TopK | Apply when querying the stream table, or add ORDER BY + LIMIT to use the TopK pattern
FOR UPDATE / FOR SHARE | Row-level locking not applicable | Remove the locking clause
RANGE_AGG / RANGE_INTERSECT_AGG | No incremental delta decomposition exists for range aggregates | Use FULL mode, or compute range unions in the consuming query

Each rejected feature is explained in detail in the Why Are These SQL Features Not Supported? section below.

What happens to ORDER BY in defining queries?

ORDER BY in the defining query is accepted but silently discarded. This is consistent with how PostgreSQL handles CREATE MATERIALIZED VIEW AS SELECT ... ORDER BY ... — the ordering only affects the initial INSERT, not the stored data.

Stream tables are heap tables with no guaranteed row order. Apply ORDER BY when querying the stream table instead:

-- Don't rely on ORDER BY in the defining query:
-- 'SELECT region, SUM(amount) AS total FROM orders GROUP BY region ORDER BY total DESC'

-- Instead, order when reading:
SELECT * FROM regional_totals ORDER BY total DESC;

Exception: When ORDER BY is paired with LIMIT N (with or without OFFSET M), pg_trickle recognizes the TopK pattern and preserves the ordering, limit, and offset.

Which aggregates support DIFFERENTIAL mode?

Algebraic (O(changes), fully incremental): COUNT, SUM, AVG

Semi-algebraic (incremental with occasional group rescan): MIN, MAX

Group-rescan (affected groups re-aggregated from source): STRING_AGG, ARRAY_AGG, JSON_AGG, JSONB_AGG, BOOL_AND, BOOL_OR, BIT_AND, BIT_OR, BIT_XOR, JSON_OBJECT_AGG, JSONB_OBJECT_AGG, STDDEV, STDDEV_POP, STDDEV_SAMP, VARIANCE, VAR_POP, VAR_SAMP, MODE, PERCENTILE_CONT, PERCENTILE_DISC, CORR, COVAR_POP, COVAR_SAMP, REGR_AVGX, REGR_AVGY, REGR_COUNT, REGR_INTERCEPT, REGR_R2, REGR_SLOPE, REGR_SXX, REGR_SXY, REGR_SYY

37 aggregate function variants are supported in total.


Aggregates & Group-By

Aggregate handling is one of the most complex parts of incremental view maintenance. This section explains how pg_trickle categorizes aggregates by their incremental cost, how hidden auxiliary columns work, and what happens when groups are created or destroyed.

Which aggregates are fully incremental (O(1) per change) vs. group-rescan?

pg_trickle categorizes aggregates into three tiers:

Tier | Cost per change | Aggregates | Mechanism
Algebraic | O(1) | COUNT, SUM, AVG | Hidden auxiliary columns (__pgt_count, __pgt_sum_x) track running totals. Delta updates these columns arithmetically.
Semi-algebraic | O(1) normally, O(group) on extremum deletion | MIN, MAX | Maintained via LEAST/GREATEST. If the current MIN/MAX is deleted, the group is rescanned to find the new extremum.
Group-rescan | O(group size) per affected group | All others (35 functions) | Affected groups are re-aggregated from source data. A NULL sentinel marks stale groups for rescan.

For most workloads, the algebraic tier (COUNT/SUM/AVG) covers the majority of aggregations and is the fastest.

Why do some aggregates have hidden auxiliary columns?

For algebraic aggregates (COUNT, SUM, AVG), the DVM engine adds hidden __pgt_count and __pgt_sum_x columns to the stream table's storage. These store running totals that can be updated with O(1) arithmetic per change instead of rescanning the entire group.

For example, a stream table defined as SELECT dept, AVG(salary) FROM employees GROUP BY dept internally stores:

  • dept — the group-by key
  • avg — the user-visible average (computed as __pgt_sum_x / __pgt_count)
  • __pgt_count — running count of rows in the group
  • __pgt_sum_x — running sum of salary values
  • __pgt_row_id — row identity hash

When a new employee is inserted, the refresh updates __pgt_count += 1, __pgt_sum_x += new_salary, and recomputes avg. No rescan of the source table is needed.
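
Because the stream table is a plain heap table, you can look at this bookkeeping directly if you are curious. A sketch, assuming the stream table for the example above is named dept_avg_salary (the auxiliary columns are implementation details and may change between versions):

SELECT dept, avg, __pgt_count, __pgt_sum_x
FROM dept_avg_salary;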

How does HAVING work with incremental refresh?

HAVING is fully supported in DIFFERENTIAL mode. The DVM engine tracks threshold transitions — groups entering or exiting the HAVING condition:

  • Group crosses threshold upward: A previously excluded group (e.g., HAVING COUNT(*) > 5) gains enough members → the group is inserted into the stream table.
  • Group crosses threshold downward: A group that was included drops below the threshold → the group is deleted from the stream table.
  • Group stays above threshold: Normal delta update (adjust aggregate values).

This means the stream table always reflects only the groups that satisfy the HAVING clause, even as group membership changes.
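
For example, this stream table only ever contains customers with more than five orders; group rows appear and disappear as their counts cross the threshold:

SELECT pgtrickle.create_stream_table(
    name     => 'frequent_customers',
    query    => 'SELECT customer_id, COUNT(*) AS order_count
                 FROM orders GROUP BY customer_id HAVING COUNT(*) > 5',
    schedule => '1m'
);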

What happens to a group when all its rows are deleted?

When the last row of a group is deleted from the source table, the DVM engine detects that __pgt_count drops to zero and deletes the group row from the stream table. The hidden auxiliary columns are cleaned up along with it.

If a new row for the same group-by key is later inserted, a fresh group row is created from scratch.

Why are CORR, COVAR_*, and REGR_* limited to FULL mode?

Regression aggregates like CORR, COVAR_POP, COVAR_SAMP, and the REGR_* family require maintaining running sums of products and squares across the entire group. Unlike COUNT/SUM/AVG (where deltas can be computed from the change alone), regression aggregates:

  1. Lack algebraic delta rules. There is no closed-form way to update a correlation coefficient from a single row change without access to the full group's data.
  2. Would degrade to group-rescan anyway. Even if supported, the implementation would need to rescan the full group from source — identical to FULL mode for most practical group sizes.

These aggregates work fine in FULL refresh mode, which re-runs the entire query from scratch each cycle.
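
For example (sales is an illustrative table), declare the mode explicitly so the intent is visible:

SELECT pgtrickle.create_stream_table(
    name         => 'price_qty_corr',
    query        => 'SELECT region, CORR(price, quantity) AS price_qty_corr FROM sales GROUP BY region',
    schedule     => '5m',
    refresh_mode => 'FULL'
);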


Joins

Join delta computation can produce surprising results when both sides change simultaneously. This section covers the standard IVM join rule, FULL OUTER JOIN support, and known edge cases.

How does a DIFFERENTIAL refresh handle a join when both sides changed?

When both tables in a join have changes since the last refresh, the DVM engine computes the join delta using the standard IVM join rule:

$$\Delta(R \bowtie S) = (\Delta R \bowtie S) \cup (R \bowtie \Delta S) \cup (\Delta R \bowtie \Delta S)$$

In practice, this means:

  1. Join the changes from the left against the current state of the right.
  2. Join the current state of the left against the changes from the right.
  3. Join the changes from both sides (handles simultaneous changes to matching keys).

All three parts are combined into a single CTE-based delta query that PostgreSQL executes in one pass.
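
A minimal sketch of that shape, with illustrative names (r and s are the current table states, delta_r and delta_s the captured changes; the real delta query also carries row IDs and change actions):

SELECT d.k, d.r_val, s.s_val
FROM delta_r d JOIN s USING (k)              -- ΔR ⋈ S
UNION ALL
SELECT r.k, r.r_val, d.s_val
FROM r JOIN delta_s d USING (k)              -- R ⋈ ΔS
UNION ALL
SELECT dr.k, dr.r_val, ds.s_val
FROM delta_r dr JOIN delta_s ds USING (k);   -- ΔR ⋈ ΔS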

Does pg_trickle support FULL OUTER JOIN incrementally?

Yes. FULL OUTER JOIN is supported in DIFFERENTIAL mode with an 8-part delta computation. This handles all four cases: matched rows on both sides, left-only rows, right-only rows, and rows that transition between matched and unmatched states as data changes.

The 8 parts cover: new left matches, removed left matches, new right matches, removed right matches, newly matched from left-only, newly matched from right-only, newly unmatched to left-only, and newly unmatched to right-only.

What happens when a join key is updated and the joined row is simultaneously deleted?

This is a known edge case. When a join key column is updated in the same refresh cycle as the joined-side row is deleted, the delta may miss the required DELETE, potentially leaving a stale row in the stream table.

Mitigations:

  • The adaptive FULL fallback (triggered when the change ratio exceeds pg_trickle.differential_max_change_ratio) catches most high-change-rate scenarios where this is likely.
  • You can stagger changes across refresh cycles.
  • Use FULL mode for tables where this pattern is common.

How does NATURAL JOIN work?

NATURAL JOIN is fully supported. At parse time, pg_trickle resolves the common columns between the two tables and synthesizes explicit equi-join conditions. The internal __pgt_row_id column is excluded from common column resolution, so NATURAL JOINs between stream tables also work correctly.


CTEs & Recursive Queries

Recursive CTE support is a key differentiator for pg_trickle. This section explains the three maintenance strategies (semi-naive, DRed, recomputation) and when each is used.

Do recursive CTEs work in DIFFERENTIAL mode?

Yes. pg_trickle supports WITH RECURSIVE in DIFFERENTIAL mode with three auto-selected strategies:

  • Semi-naive evaluation, used for INSERT-only changes to the base case: iteratively evaluates new derivations from the inserted rows without touching existing rows. This is the fastest path.
  • Delete-and-Rederive (DRed), used for mixed changes (INSERT plus DELETE/UPDATE): deletes potentially affected derived rows, then rederives them from scratch to determine the true delta.
  • Recomputation fallback, used on column mismatch or non-monotone recursive terms: falls back to full recomputation of the recursive CTE. Triggered when the recursive term contains EXCEPT or INTERSECT set operations, or Aggregate, Window, DISTINCT, or AntiJoin nodes.

The strategy is selected automatically based on the type of changes and the recursive term's structure.
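
For example, a hierarchy traversal over an illustrative employees table:

SELECT pgtrickle.create_stream_table(
    name     => 'org_chart',
    query    => 'WITH RECURSIVE subordinates AS (
                     SELECT id, manager_id, name FROM employees WHERE manager_id IS NULL
                     UNION ALL
                     SELECT e.id, e.manager_id, e.name
                     FROM employees e
                     JOIN subordinates s ON e.manager_id = s.id
                 )
                 SELECT * FROM subordinates',
    schedule => '1m'
);

Append-only inserts (new hires) take the semi-naive path; deleting or moving a node triggers DRed.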

What are the three strategies for recursive CTE maintenance?

See the table above. In brief:

  • Semi-naive is the fast path for append-only workloads (e.g., adding nodes to a tree). It's O(new derivations) — much cheaper than a full re-evaluation.
  • DRed handles deletions and updates correctly by first removing potentially invalidated rows and then rederiving them. More expensive than semi-naive, but still incremental.
  • Recomputation is the safe fallback that re-executes the entire recursive CTE. Used when the recursive term's structure is too complex for incremental processing.

What triggers a fallback from semi-naive to recomputation?

A recomputation fallback is triggered when:

  1. The recursive term contains non-monotone operators: EXCEPT or INTERSECT set operations, or Aggregate, Window, DISTINCT, or AntiJoin nodes. These operators can "un-derive" rows when inputs change, which semi-naive evaluation cannot handle.
  2. Column mismatch — the CTE's output columns don't match the stream table's storage schema (e.g., after a schema change).
  3. Mixed DML with non-monotone terms — DELETE or UPDATE changes combined with non-monotone recursive terms always trigger recomputation.

Check which strategy was used in the refresh history:

SELECT action, rows_inserted, rows_deleted
FROM pgtrickle.get_refresh_history('my_recursive_st', 5);

What happens when a CTE is referenced multiple times in the same query?

When a non-recursive CTE is referenced more than once, pg_trickle uses shared delta computation — the CTE's delta is computed once and cached, then reused by each reference. This is tracked via CteScan operator nodes that look up the shared delta from an internal CTE registry.

For single-reference CTEs, pg_trickle simply inlines them as subqueries (no overhead).


Window Functions & LATERAL

Window functions are maintained via partition-based recomputation rather than row-level deltas. This section covers what’s supported, the expression restriction, and LATERAL constructs.

How are window functions maintained incrementally?

pg_trickle uses partition-based recomputation for window functions. When source data changes, the DVM engine:

  1. Identifies which partitions are affected by the changes (based on the PARTITION BY key).
  2. Recomputes the window function for only the affected partitions.
  3. Replaces the old partition results with the new ones in the stream table.

This is more efficient than a full recomputation when changes affect a small number of partitions.
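
For example, a per-department ranking (employees is illustrative); only the departments touched by a change are recomputed:

SELECT pgtrickle.create_stream_table(
    name     => 'salary_ranks',
    query    => 'SELECT id, dept, salary,
                        RANK() OVER (PARTITION BY dept ORDER BY salary DESC) AS salary_rank
                 FROM employees',
    schedule => '1m'
);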

Why can't I use a window function inside a CASE or COALESCE expression?

Window functions like ROW_NUMBER() OVER (…) are supported as standalone columns but cannot be embedded in expressions (e.g., CASE WHEN ROW_NUMBER() OVER (...) = 1 THEN ...).

This restriction exists because the DVM engine handles window functions by recomputing entire partitions. When a window function is buried inside an expression, the engine cannot isolate the window computation from the surrounding expression.

Rewrite: Move the window function to a separate column in one stream table, then reference it in a second stream table:

-- ST1: compute the window function
SELECT id, dept, salary,
       ROW_NUMBER() OVER (PARTITION BY dept ORDER BY salary DESC) AS rn
FROM employees

-- ST2: use it in an expression (references ST1)
SELECT id, CASE WHEN rn = 1 THEN 'top' ELSE 'other' END AS rank_label
FROM st1

What LATERAL constructs are supported?

pg_trickle supports three kinds of LATERAL constructs:

  • Set-returning functions, e.g. LATERAL jsonb_array_elements(data): row-scoped recomputation; only affected parent rows are re-expanded.
  • Correlated subqueries, e.g. LATERAL (SELECT ... WHERE t.id = s.id): row-scoped recomputation.
  • JSON_TABLE (PG 17+), e.g. JSON_TABLE(data, '$.items[*]' ...): modeled as a LateralFunction node.

Additional supported SRFs: jsonb_each, jsonb_each_text, unnest, generate_series, and others.

What happens when a row moves between window partitions during a refresh?

When a row's PARTITION BY key changes (e.g., an employee moves departments), the DVM engine recomputes both the old partition (to remove the row) and the new partition (to add it). Both partitions are re-evaluated from the source data, ensuring window function results are correct.


TopK (ORDER BY … LIMIT)

TopK queries (ORDER BY ... LIMIT N, optionally with OFFSET M) are handled via a specialized MERGE-based strategy that re-executes the bounded query each cycle. This section explains how it works and its limitations.

How does ORDER BY … LIMIT N work in a stream table?

When a defining query has a top-level ORDER BY … LIMIT N (with a constant integer N), pg_trickle recognizes it as a TopK pattern. An optional OFFSET M (constant integer) selects a "page" within the ranked result. The stream table stores exactly N rows and is refreshed via a MERGE-based scoped-recomputation strategy:

  1. On each refresh, the full query (with ORDER BY + LIMIT, and OFFSET if present) is re-executed against the source tables.
  2. The result is merged into the stream table using MERGE with NOT MATCHED BY SOURCE for deletes.
  3. The catalog records topk_limit, topk_order_by, and optionally topk_offset for the stream table.

TopK bypasses the DVM delta pipeline — it always re-executes the bounded query. This is efficient because the result set is bounded by N.

SELECT pgtrickle.create_stream_table(
    name         => 'top_customers',
    query        => 'SELECT customer_id, total FROM order_totals ORDER BY total DESC LIMIT 100',
    schedule     => '1m',
    refresh_mode => 'DIFFERENTIAL'
);

-- With OFFSET — "page 2" of the leaderboard (rows 101–200):
SELECT pgtrickle.create_stream_table(
    name         => 'next_customers',
    query        => 'SELECT customer_id, total FROM order_totals ORDER BY total DESC LIMIT 100 OFFSET 100',
    schedule     => '1m',
    refresh_mode => 'DIFFERENTIAL'
);

Does OFFSET work with TopK?

Yes. ORDER BY … LIMIT N OFFSET M is fully supported. The stream table stores exactly N rows starting from position M+1 in the ranked result. This is useful for:

  • Paginated dashboards: Each page is a separate stream table with a different OFFSET.
  • Excluding outliers: OFFSET 5 LIMIT 50 skips the top 5 and shows the next 50.
  • Windowed leaderboards: OFFSET 10 LIMIT 10 shows the "second tier."

Caveat: When source data changes, the "page" can shift — a row on page 3 may move to page 2 or 4. The stream table always reflects the current state of the page at the time of the last refresh.

OFFSET 0 is treated as no offset.

What happens when a row below the top-N cutoff rises above it?

On the next refresh, the full ORDER BY … LIMIT N query is re-executed. The newly qualifying row appears in the result, and the row that fell out of the top-N is removed. The MERGE operation handles this by:

  • INSERT the newly qualifying row
  • DELETE the row that fell below the cutoff
  • UPDATE any rows whose values changed but remained in the top-N

Since TopK always re-executes the bounded query, it correctly detects all ranking changes.

Can I use TopK with aggregates or joins?

Yes. The defining query can contain any SQL that pg_trickle supports, plus ORDER BY … LIMIT N:

-- TopK over an aggregate
SELECT dept, SUM(salary) AS total_salary
FROM employees GROUP BY dept
ORDER BY total_salary DESC LIMIT 10

-- TopK over a join
SELECT e.name, d.name AS dept, e.salary
FROM employees e JOIN departments d ON e.dept_id = d.id
ORDER BY e.salary DESC LIMIT 20

The only restriction is that TopK cannot be combined with set operations (UNION/INTERSECT/EXCEPT) or GROUPING SETS/CUBE/ROLLUP.


Tables Without Primary Keys

While primary keys are not required, their absence changes how pg_trickle identifies rows. This section explains the content-based hashing fallback and its limitations with duplicate rows.

Do source tables need a primary key?

No, but it is strongly recommended. When a source table has a primary key, pg_trickle uses it to generate a deterministic __pgt_row_id for each row — this is the most reliable way to track row identity across refreshes.

Without a primary key, pg_trickle falls back to content-based hashing — an xxHash of all column values. This works correctly for tables where every row is unique, but has known issues with exact duplicate rows. See What are the risks of using tables without primary keys? for details.

What are the risks of using tables without primary keys?

Content-based row identity has known limitations with exact duplicate rows (rows where every column value is identical):

  1. INSERT as no-op: If a row identical to an existing one is inserted, both have the same __pgt_row_id hash, so the MERGE treats it as a no-op (the row already exists).
  2. DELETE removes all copies: Deleting one of N identical rows generates a DELETE delta, but the MERGE removes all rows with that __pgt_row_id.
  3. Aggregate drift: Over time, these mismatches can cause aggregate values to drift from the true result.

Recommendation: Add a primary key or unique constraint to source tables, or use FULL mode for tables with frequent exact-duplicate rows.
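
If the table has no natural key, a surrogate identity column is the simplest fix (events is an illustrative table; note that this rewrites the table and takes an exclusive lock):

ALTER TABLE events
    ADD COLUMN event_id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY;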

How does content-based row identity work for duplicate rows?

For tables without a primary key, __pgt_row_id is computed as pg_trickle_hash_multi(ARRAY[col1::text, col2::text, ...]) — an xxHash of all column values. Rows with identical content produce identical hashes.

The hash uses \x1E (record separator) between values and \x00NULL\x00 for NULL values, minimizing collision risk for rows with different content. However, truly identical rows (same values in every column) will always hash to the same value — this is inherent to content-based identity.


Change Data Capture (CDC)

This section explains how pg_trickle captures changes to your source tables, the trade-offs between trigger-based and WAL-based CDC, and operational topics like backup/restore and buffer inspection.

How does pg_trickle capture changes to source tables?

pg_trickle installs AFTER INSERT/UPDATE/DELETE row-level PL/pgSQL triggers on each source table referenced by a stream table. Whenever a row in the source table is modified, the trigger writes a change record into a per-source buffer table in the pgtrickle_changes schema.

Each change record contains:

  • Action — I (insert), U (update), D (delete), or T (truncate marker)
  • Row data — old and/or new row values serialized as JSONB
  • LSN — the current WAL log sequence number, used for frontier tracking
  • Transaction ID — links the change to its originating transaction

The trigger fires within your transaction, so if you roll back, the change record is also rolled back. This guarantees that only committed changes appear in the buffer.

What is the overhead of CDC triggers?

The per-row overhead is approximately 20–55 μs, which covers the PL/pgSQL function dispatch, row_to_json() serialization, and the buffer table INSERT.

At typical write rates (fewer than 1,000 writes per second per source table), this adds less than 5% additional DML latency. For most OLTP workloads, the overhead is negligible — a single network round-trip to the database is usually 10–100× more expensive.

If you have very high-throughput source tables (>10K writes/sec), consider enabling the hybrid CDC mode (pg_trickle.cdc_mode = 'auto') which can automatically transition to WAL-based capture for lower per-row overhead (~5–15 μs).

What happens when I TRUNCATE a source table?

TRUNCATE is captured via a statement-level AFTER TRUNCATE trigger that writes a T marker row to the change buffer. When the differential refresh engine detects this marker, it automatically falls back to a full refresh for that cycle, ensuring the stream table stays consistent. Both FULL and DIFFERENTIAL mode stream tables handle TRUNCATE correctly.

Are CDC triggers automatically cleaned up?

Yes. pg_trickle tracks which source tables are referenced by which stream tables in the pgt_dependencies catalog. When the last stream table referencing a particular source table is dropped, pg_trickle automatically:

  1. Removes the CDC triggers from the source table.
  2. Drops the associated change buffer table (pgtrickle_changes.changes_<oid>).

You do not need to manually clean up triggers or buffer tables.

What happens if a source table is dropped or altered?

pg_trickle has DDL event triggers that listen for ALTER TABLE and DROP TABLE on source tables. When a change is detected, pg_trickle responds automatically:

  1. All stream tables that depend on the altered source are marked with needs_reinit = true in the catalog.
  2. On the next scheduler cycle, each affected stream table is reinitialized — the existing storage table is dropped, recreated from the current defining query schema, and re-populated with a full refresh.
  3. A reinitialize_needed NOTIFY alert is sent so your monitoring can detect the event.

If the DDL change breaks the defining query (e.g., a column referenced in the query was dropped), the reinitialization will fail and the stream table will enter ERROR status. In that case, you need to drop and recreate the stream table with an updated query.

How do I check if a source table has switched from trigger-based CDC to WAL-based CDC?

When you enable hybrid CDC (pg_trickle.cdc_mode = 'auto'), pg_trickle starts capturing changes with triggers and can automatically transition to WAL-based logical replication once conditions are met. There are several ways to check the current CDC mode for each source table:

1. Query the dependency catalog directly:

SELECT d.source_relid, c.relname AS source_table, d.cdc_mode,
       d.slot_name, d.decoder_confirmed_lsn, d.transition_started_at
FROM pgtrickle.pgt_dependencies d
JOIN pg_class c ON c.oid = d.source_relid;

The cdc_mode column shows one of three values:

  • TRIGGER — changes are captured via row-level triggers (the default)
  • TRANSITIONING — the system is in the process of switching from triggers to WAL
  • WAL — changes are captured via logical replication

2. Use the built-in health check function:

SELECT source_table, cdc_mode, slot_name, lag_bytes, alert
FROM pgtrickle.check_cdc_health();

This returns a row per source table with the current mode, replication slot lag (for WAL-mode sources), and any alert conditions such as slot_lag_exceeds_threshold or replication_slot_missing.

3. Listen for real-time transition notifications:

LISTEN pg_trickle_cdc_transition;

pg_trickle sends a NOTIFY with a JSON payload whenever a transition starts, completes, or is rolled back. Example payload:

{
  "event": "transition_complete",
  "source_table": "public.orders",
  "old_mode": "TRANSITIONING",
  "new_mode": "WAL",
  "slot_name": "pg_trickle_slot_16384"
}

This lets you integrate CDC mode changes into your monitoring stack without polling.

4. Check the global GUC setting:

SHOW pg_trickle.cdc_mode;

This shows the desired global behavior (trigger, auto, or wal), not the per-table actual state. The per-table state lives in pgt_dependencies.cdc_mode as described above.

See CONFIGURATION.md for details on the pg_trickle.cdc_mode, pg_trickle.wal_transition_timeout, pg_trickle.slot_lag_warning_threshold_mb, and pg_trickle.slot_lag_critical_threshold_mb GUCs.

Is it safe to add triggers to a stream table while the source table is switching CDC modes?

Yes, this is completely safe. CDC mode transitions and user-defined triggers operate on different tables and do not interfere with each other:

  • CDC transitions affect how changes are captured from source tables (e.g., orders). The transition switches the capture mechanism from row-level triggers on the source table to WAL-based logical replication.
  • User-defined triggers live on stream tables (e.g., order_totals) and control how the refresh engine applies changes to the materialized output.

Because these are independent concerns, you can freely add, modify, or remove triggers on a stream table at any point — including during an active CDC transition on its source tables.

How it works in practice:

  1. The refresh engine checks for user-defined triggers on the stream table at the start of each refresh cycle (via a fast pg_trigger lookup, <0.1 ms).
  2. If user triggers are detected, the engine uses explicit DELETE / UPDATE / INSERT statements instead of MERGE, so your triggers fire with correct TG_OP, OLD, and NEW values.
  3. The change data consumed by the refresh engine has the same format regardless of whether it came from CDC triggers or WAL decoding — so the trigger detection and the CDC mode are fully decoupled.

A trigger added between two refresh cycles will simply be picked up on the next cycle. The only theoretical edge case is a trigger created during a single refresh, between the trigger-detection check and the MERGE execution; because both steps run inside the same transaction and see the same snapshot, a concurrently added trigger is simply not visible until the next cycle.

Why does pg_trickle use triggers instead of logical replication for initial CDC?

pg_trickle always bootstraps CDC with row-level AFTER triggers because they provide single-transaction atomicity — the change record is written in the same transaction as the source DML, so:

  1. No commit-order ambiguity. The change buffer always reflects committed data; rolled-back transactions never produce partial change records.
  2. No replication slot management at creation time. Logical replication requires creating and monitoring replication slots, which can bloat WAL if the subscriber falls behind. Trigger-based bootstrap avoids this complexity.
  3. Works on all hosting providers. Some managed PostgreSQL services restrict wal_level = logical or limit the number of replication slots. Trigger bootstrap works everywhere, with no configuration changes.
  4. Simpler initial deployment. No need for wal_level = logical, no publication/subscription setup, and no extra connections for WAL senders.

With pg_trickle.cdc_mode = 'auto' (the default since v0.3.0), pg_trickle uses triggers initially and then transparently transitions to WAL-based CDC if wal_level = logical is available. If WAL is not available, triggers are kept permanently — no degradation, no errors. Set pg_trickle.cdc_mode = 'trigger' if you want to disable WAL transitions entirely. See ADR-001 and ADR-002 in the architecture documentation for the full rationale.

Why is auto the default pg_trickle.cdc_mode?

As of v0.3.0, auto is the default CDC mode. This was changed from trigger based on the following considerations:

1. Safe no-op on standard installs. PostgreSQL ships with wal_level = replica by default. In this configuration, auto simply stays on trigger-based CDC permanently — it does not create replication slots, publications, or any WAL infrastructure. There is no error, warning, or user-visible difference from the old trigger default. auto only activates the WAL transition path when wal_level = logical is explicitly configured by the operator.

2. Automatic fallback hardening. The WAL transition and steady-state polling now include robust automatic fallback:

  • Consecutive poll errors (5 failures) trigger automatic revert to triggers.
  • check_decoder_health() validates slot existence, WAL lag, and wal_level on every tick.
  • The TRANSITIONING phase has a progressive timeout with informative warnings.
  • Post-restart health checks (check_cdc_transition_health()) automatically clean up stale transitions.

3. Zero overhead for trigger-only deployments. When wal_level != logical, the auto scheduler branch takes a fast-path exit after a single GUC check and pg_replication_slots query. The overhead compared to trigger mode is negligible (<1 ms per scheduler tick).

4. Progressive optimisation without config changes. When an operator later enables wal_level = logical (e.g., for other replication needs), pg_trickle automatically benefits from lower per-row CDC overhead (~5–15 μs vs ~20–55 μs) without any configuration change. This aligns with the principle of least surprise.

When to use trigger instead: Set pg_trickle.cdc_mode = 'trigger' if you want fully deterministic trigger-only behaviour, need to minimize any replication slot management, or are on a restricted managed PostgreSQL that caps replication slots. This reverts to the pre-v0.3.0 default.

Caveats to be aware of in auto mode:

  • Keyless tables (no PRIMARY KEY) stay on triggers permanently — WAL mode requires a PK for pk_hash computation.
  • Replication slots prevent WAL recycling: if the decoder falls behind, WAL accumulates. pg_trickle now warns at pg_trickle.slot_lag_warning_threshold_mb (default 100 MB) and marks per-source CDC health unhealthy at pg_trickle.slot_lag_critical_threshold_mb (default 1024 MB).
  • The TRANSITIONING phase runs both trigger and WAL decoder simultaneously; LSN-based deduplication handles correctness. If anything goes wrong, the system rolls back to triggers.

How does the trigger-to-WAL automatic transition work?

When pg_trickle.cdc_mode = 'auto', pg_trickle monitors each source table's write rate. When the rate exceeds an internal threshold, the transition proceeds in three phases:

  1. Slot creation. A logical replication slot is created for the source table's OID (e.g., pg_trickle_slot_16384).
  2. Dual capture. For a brief period, both triggers and WAL decoding capture changes. The system uses LSN comparison to deduplicate, ensuring no changes are lost or double-counted.
  3. Trigger removal. Once the WAL decoder has confirmed it is caught up (its confirmed LSN ≥ the frontier LSN), the row-level triggers are dropped and the source transitions fully to WAL mode.

The transition is tracked in pgt_dependencies.cdc_mode (values: TRIGGER → TRANSITIONING → WAL). If the transition times out (pg_trickle.wal_transition_timeout, default 5 minutes), it is rolled back and triggers are kept.

What happens to CDC if I restore a database backup?

After restoring a backup (pg_dump, pg_basebackup, or PITR), the CDC state depends on the backup type:

  • pg_dump (logical): triggers preserved (in the DDL), buffer rows included, catalog restored. Usually no action needed; the next refresh detects the stale frontier and does a full refresh.
  • pg_basebackup (physical): triggers preserved, buffer rows preserved (committed at backup time), catalog restored. Replication slots may be invalid, so WAL-mode sources may need a manual transition back to TRIGGER mode.
  • PITR (point-in-time): triggers preserved, only buffer rows committed at the recovery target, catalog restored. Similar to pg_basebackup; the frontier may point ahead of the actual buffer content, so the first refresh does a full refresh to reconcile.

In all cases, the pg_trickle scheduler automatically detects frontier inconsistencies and falls back to a full refresh for the first cycle after restore. No manual intervention is required for trigger-mode sources.

For WAL-mode sources, replication slots created after the backup point will not exist in the restored state. Set pg_trickle.cdc_mode = 'trigger' temporarily, or let the auto transition recreate slots.

For full guidance on disaster recovery strategies, see the dedicated Backup and Restore chapter.

Do CDC triggers fire for rows inserted via logical replication (subscribers)?

Yes. PostgreSQL fires row-level triggers on the subscriber side for rows applied via logical replication. This means if you have a subscriber database with pg_trickle installed, the CDC triggers will capture replicated changes into the local change buffers.

Implication: You can run stream tables on a subscriber database that tracks replicated tables — the change capture works transparently. However, be careful about:

  • Double-counting. If the same table is tracked by pg_trickle on both the publisher and subscriber, changes are captured twice (once on each side). This is fine if the stream tables are independent, but confusing if you expect them to be identical.
  • Replication lag. The stream table on the subscriber will be delayed by both the replication lag and the pg_trickle refresh schedule.

Can I inspect the change buffer tables directly?

Yes. Change buffers are ordinary tables in the pgtrickle_changes schema, named changes_<source_oid>:

-- List all change buffer tables
SELECT tablename FROM pg_tables WHERE schemaname = 'pgtrickle_changes';

-- Inspect recent changes for a source table (find OID first)
SELECT c.oid FROM pg_class c JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relname = 'orders' AND n.nspname = 'public';

-- Then query the buffer
SELECT action, lsn, txid, old_data, new_data
FROM pgtrickle_changes.changes_16384
ORDER BY lsn DESC LIMIT 10;

The action column contains: I (insert), U (update), D (delete), or T (truncate).

Warning: Do not modify buffer tables directly. The refresh engine manages buffer cleanup (truncation) after each successful refresh. Manual changes will corrupt the frontier tracking.

How does pg_trickle prevent its own refresh writes from re-triggering CDC?

When the refresh engine writes to a stream table (via MERGE or explicit DML), it does not trigger CDC capture on that stream table, even if the stream table is itself a source for a downstream stream table. This is because:

  1. CDC triggers are only installed on source tables, not on stream tables. The refresh engine writes directly to the stream table's storage without going through any change-capture mechanism.
  2. Downstream change propagation uses a different path. When stream table A is a source for stream table B, changes to A are detected at B's refresh time by re-reading A's data (not via triggers on A). The topological ordering ensures A is refreshed before B.

This design prevents infinite loops (A triggers B triggers A) and avoids the overhead of capturing changes to materialized output that will be recomputed anyway.


Diamond Dependencies & DAG Scheduling

When multiple stream tables form a diamond-shaped dependency graph, careful coordination is needed to avoid inconsistent snapshots. This section covers atomic consistency, schedule policies, and topological ordering.

What is a diamond dependency and why does it matter?

A diamond dependency occurs when two (or more) intermediate stream tables both depend on the same source, and a downstream stream table depends on both of them:

       Source: orders
       /             \
  ST: totals      ST: counts
       \             /
    ST: combined_report

Without coordination, combined_report might be refreshed after totals is updated but before counts is updated (or vice versa), producing a temporarily inconsistent snapshot — totals reflects the latest data but counts is stale.

What does diamond_consistency = 'atomic' do?

When diamond_consistency = 'atomic' is set on the downstream stream table (e.g., combined_report), pg_trickle ensures that all upstream stream tables in the diamond are refreshed within the same scheduler cycle before the downstream table is refreshed. This guarantees a consistent point-in-time snapshot.

If any upstream refresh in the atomic group fails, the downstream refresh is skipped for that cycle to avoid inconsistency. The failed upstream will be retried on the next cycle.

SELECT pgtrickle.alter_stream_table('combined_report',
    diamond_consistency => 'atomic');

What is the difference between 'fastest' and 'slowest' schedule policy?

When a stream table has multiple upstream dependencies with different schedules, pg_trickle needs a policy for when to refresh the downstream table:

  • fastest: refresh the downstream table whenever any upstream refreshes. Best for low-latency dashboards where partial freshness is acceptable.
  • slowest: refresh the downstream table only after all upstreams have refreshed. Best for reports requiring all-or-nothing consistency.

The default is fastest. Use slowest with diamond_consistency = 'atomic' for the strongest consistency guarantees.

What happens when an atomic diamond group partially fails?

When diamond_consistency = 'atomic' is set and one upstream stream table in the diamond fails to refresh:

  1. The downstream refresh is skipped for that cycle (it reads stale-but-consistent data from the previous successful cycle).
  2. The failed upstream follows the normal retry logic (exponential backoff, up to max_consecutive_errors).
  3. Other non-failing upstreams in the diamond are still refreshed normally — their data is fresh, but the downstream won't consume it until all upstreams succeed.
  4. A NOTIFY pg_trickle_alert with event diamond_partial_failure is sent so your monitoring can detect the situation.

How does pg_trickle determine topological refresh order?

The scheduler builds a directed acyclic graph (DAG) of stream table dependencies at startup and after any create_stream_table / drop_stream_table call. The algorithm:

  1. Edge discovery. For each stream table, the defining query's source tables are extracted. If a source table is itself a stream table, a dependency edge is added.
  2. Cycle detection. The DAG is checked for cycles. If a cycle is detected, the offending create_stream_table call is rejected with a clear error message listing the cycle path.
  3. Topological sort. Kahn's algorithm produces the refresh order — leaf nodes (no stream table dependencies) are refreshed first, then their dependents, and so on.
  4. Level assignment. Each stream table is assigned a "level" (0 for leaves, max(parent levels) + 1 for dependents). Stream tables at the same level are refreshed concurrently when pg_trickle.parallel_refresh_mode = 'on'.

The topological order is recalculated whenever the DAG changes. You can inspect it with:

SELECT pgt_name, depends_on, topo_level
FROM pgtrickle.stream_tables_info
ORDER BY topo_level, pgt_name;

Schema Changes & DDL Events

pg_trickle detects source table schema changes via PostgreSQL’s DDL event trigger system and reacts automatically. This section explains what happens for various DDL operations and how to handle them.

What happens when I add a column to a source table?

Adding a column to a source table is safe and non-disruptive if the stream table's defining query does not use SELECT *:

  • Named columns: If the defining query explicitly lists columns (e.g., SELECT id, name, amount FROM orders), the new column is simply not captured by CDC and has no effect on the stream table.
  • SELECT *: If the defining query uses SELECT *, pg_trickle detects the schema mismatch at the next refresh and marks the stream table with needs_reinit = true. The next scheduler cycle performs a full reinitialization — drops the storage table, recreates it with the new column set, and does a full refresh.

CDC triggers capture the full row as JSONB regardless of which columns the stream table uses, so no trigger changes are needed.

What happens when I drop a column used in a stream table's query?

Dropping a column that is referenced in a stream table's defining query will cause the next refresh to fail because the column no longer exists in the source table. pg_trickle handles this via:

  1. DDL event trigger detects the ALTER TABLE ... DROP COLUMN and marks all affected stream tables with needs_reinit = true.
  2. On the next refresh cycle, the scheduler attempts reinitialization — but the defining query will fail with a PostgreSQL error (e.g., column "amount" does not exist).
  3. The stream table moves to ERROR status after max_consecutive_errors failures.
  4. A reinitialize_needed NOTIFY alert is sent.

Resolution: Drop and recreate the stream table with an updated defining query:

SELECT pgtrickle.drop_stream_table('order_totals');
SELECT pgtrickle.create_stream_table(
    name         => 'order_totals',
    query        => 'SELECT id, name FROM orders',  -- updated query without dropped column
    schedule     => '1m',
    refresh_mode => 'DIFFERENTIAL'
);

What happens when I CREATE OR REPLACE a view used by a stream table?

PostgreSQL event triggers fire on CREATE OR REPLACE VIEW, so pg_trickle detects the change and marks dependent stream tables with needs_reinit = true. On the next refresh:

  • If the new view definition is compatible (same output columns, same types), reinitialization succeeds transparently — the stream table is repopulated with the new query logic.
  • If the new view definition changes the output schema (different columns or types), the delta query will fail and the stream table enters ERROR status.

Tip: To avoid disruption, use pgtrickle.alter_stream_table() to pause the stream table before replacing the view, then resume after verifying compatibility.

What happens when I alter or drop a function used in a stream table's query?

If a stream table's defining query calls a user-defined function (e.g., SELECT my_func(amount) FROM orders) and that function is altered or dropped:

  • ALTER FUNCTION (changing the body): pg_trickle does not detect this automatically, because PostgreSQL does not fire DDL event triggers for function body changes. The stream table continues refreshing with the new function behavior. If this is intentional, no action is needed. If you want existing rows recomputed under the new logic, temporarily switch to FULL mode and refresh:
    SELECT pgtrickle.alter_stream_table('my_st', refresh_mode => 'FULL');
    SELECT pgtrickle.refresh_stream_table('my_st');
    SELECT pgtrickle.alter_stream_table('my_st', refresh_mode => 'DIFFERENTIAL');
    
  • DROP FUNCTION: The next refresh fails because the function no longer exists. The stream table enters ERROR status. Recreate the function or drop and recreate the stream table.

What is reinitialize and when does it trigger?

Reinitialize is pg_trickle's mechanism for handling structural changes to source tables. When a stream table is marked with needs_reinit = true, the next scheduler cycle performs:

  1. Drop the existing storage table (the physical heap table backing the stream table).
  2. Recreate the storage table from the defining query's current output schema.
  3. Full refresh — run the defining query against current source data and populate the new storage table.
  4. Reset the frontier to the current LSN.
  5. Clear the needs_reinit flag.

Reinitialize triggers automatically when DDL event triggers detect ALTER TABLE, DROP TABLE, or CREATE OR REPLACE VIEW on source tables or intermediate views; a reinitialize_needed NOTIFY alert is sent when this happens. You can also trigger reinitialization manually:

UPDATE pgtrickle.pgt_stream_tables SET needs_reinit = true WHERE pgt_name = 'my_st';

Can I block DDL on tracked source tables?

pg_trickle does not block DDL on source tables by default — it only reacts to DDL changes via event triggers (but see the pg_trickle.block_source_ddl GUC described later in this section). If you want to prevent accidental schema changes on critical source tables, use PostgreSQL's built-in mechanisms:

-- Revoke ALTER/DROP from application roles
REVOKE ALL ON TABLE orders FROM app_user;
GRANT SELECT, INSERT, UPDATE, DELETE ON TABLE orders TO app_user;
-- Only the table owner (or superuser) can now ALTER/DROP

Alternatively, create a custom event trigger that raises an exception when DDL targets tracked source tables:

-- Catches ALTER TABLE on tracked sources. DROP TABLE is not visible to
-- pg_event_trigger_ddl_commands(); guarding drops would need a second
-- trigger on the sql_drop event using pg_event_trigger_dropped_objects().
CREATE OR REPLACE FUNCTION prevent_source_ddl() RETURNS event_trigger AS $$
BEGIN
    IF EXISTS (
        SELECT 1 FROM pg_event_trigger_ddl_commands() cmd
        JOIN pgtrickle.pgt_dependencies d ON d.source_relid = cmd.objid
    ) THEN
        RAISE EXCEPTION 'Cannot alter a table tracked by pg_trickle';
    END IF;
END;
$$ LANGUAGE plpgsql;

CREATE EVENT TRIGGER guard_source_ddl ON ddl_command_end
WHEN TAG IN ('ALTER TABLE')
EXECUTE FUNCTION prevent_source_ddl();

What happens if I run DDL on a source table during an active refresh?

PostgreSQL's locking mechanism prevents most conflicts. The refresh transaction acquires a ShareLock on source tables before reading them. Since ALTER TABLE (including ADD COLUMN, DROP COLUMN, ALTER TYPE) requires an AccessExclusiveLock, the DDL statement blocks until the refresh transaction completes.

In practice:

  • During a refresh: The ALTER TABLE waits for the refresh to finish, then proceeds. pg_trickle's DDL event trigger then detects the change and marks the stream table for reinitialization.
  • Between refreshes: DDL proceeds immediately. The next refresh picks up the reinitialization flag.

There is a tiny theoretical window between lock acquisition and the first read where DDL could sneak in, but this is prevented by PostgreSQL's MVCC — the refresh's snapshot was taken before the DDL committed, so it reads the old schema regardless.

If pg_trickle.block_source_ddl = true: Column-affecting DDL on tracked source tables is rejected entirely with an ERROR, regardless of whether a refresh is running.

Do stream tables work with logical replication?

Stream tables are replicated to standbys via physical (streaming) replication like any other heap table. However, they are not automatically maintained by pg_trickle on the subscriber:

  • Scheduler runs: primary yes; physical standby no (read-only); logical subscriber no (no pg_trickle catalog).
  • Stream tables readable: primary yes; physical standby yes (replicated); logical subscriber only if published.
  • Refreshes occur: primary yes; physical standby no (standby is read-only); logical subscriber no.
  • Change buffers: primary managed by pg_trickle; physical standby replicated but not consumed; logical subscriber not available.

Key limitations:

  • Change buffer tables (pgtrickle_changes.*) are not published through logical replication — they are internal transient data.
  • The pg_trickle catalog (pgtrickle.pgt_stream_tables) is not replicated through logical replication.
  • On a physical standby, stream tables receive updates through streaming replication with the usual replication lag.

Recommended pattern: Run pg_trickle on the primary only. Read stream tables from any physical standby.


Performance & Tuning

This section covers scheduler tuning, the adaptive FULL fallback, disk space management, and guidance on when to use DIFFERENTIAL vs. FULL mode.

How do I tune the scheduler interval?

The pg_trickle.scheduler_interval_ms GUC controls how often the scheduler checks for stale stream tables (default: 1000 ms).

  • Low-latency (near real-time): 100–500 ms
  • Standard: 1000 ms (the default)
  • Low-overhead (many STs, long schedules): 5000–10000 ms
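
For example, to tighten the interval for a latency-sensitive deployment (assuming this GUC is reloadable with the same ALTER SYSTEM + pg_reload_conf() pattern used for the other pg_trickle settings in this section):

ALTER SYSTEM SET pg_trickle.scheduler_interval_ms = 250;
SELECT pg_reload_conf();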

Is there any risk in setting min_schedule_seconds very low?

Yes. pg_trickle.min_schedule_seconds (default: 60) is a safety guardrail, not an arbitrary limit. Setting it very low — especially in production — can cause several problems:

WAL amplification. Every differential refresh writes a MERGE to the WAL. At 1-second intervals across many stream tables, WAL generation rises sharply, increasing replication lag and storage costs.

Lock contention. Each refresh acquires locks on the change buffer table. With cleanup_use_truncate = true (the default), this is an AccessExclusiveLock. Sub-second schedules can starve concurrent INSERT/UPDATE/DELETE statements on the source tables.

Cascading refresh load. If a refresh takes longer than the schedule interval (e.g., an 800 ms refresh on a 1-second schedule), the next refresh fires almost immediately upon completion. With chained or diamond-shaped ST graphs, the entire topological chain must complete within the interval to avoid falling behind.

Autovacuum pressure. Rapid MERGE operations produce dead tuples in the stream table faster than autovacuum can clean them up, bloating the table and degrading query performance over time.

Adaptive fallback triggering. At high change rates, pg_trickle.differential_max_change_ratio may trigger a FULL refresh instead of DIFFERENTIAL. A FULL refresh at 1-second intervals is very expensive and defeats the purpose of differential maintenance.

Practical guidance:

  • Development / testing: 1 s is fine for fast iteration
  • Lightly loaded production: 10–30 s
  • Standard production: 60 s (the default)
  • High-throughput OLTP: 120 s or more, letting change buffers accumulate for efficient batch merging

If you need near-real-time results, consider IMMEDIATE mode (refresh_mode => 'DIFFERENTIAL' with same-transaction refresh) instead of a very short schedule — it avoids the scheduler overhead entirely and updates the stream table within your transaction.

What is the adaptive fallback to FULL?

When the number of pending changes exceeds pg_trickle.differential_max_change_ratio (default: 15%) of the source table size, DIFFERENTIAL mode automatically falls back to FULL for that refresh cycle. This prevents pathological delta queries on bulk changes.

  • Set to 0.0 to always use DIFFERENTIAL (even on large change sets)
  • Set to 1.0 to effectively always use FULL
  • Default 0.15 (15%) is a good balance
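
For example, to make the fallback more aggressive for deployments where delta queries are expensive (assuming the same reload pattern as the other GUCs in this section):

ALTER SYSTEM SET pg_trickle.differential_max_change_ratio = 0.05;
SELECT pg_reload_conf();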

How many concurrent refreshes can run?

By default (parallel_refresh_mode = 'off') refreshes are processed sequentially within the scheduler's single background worker. This is safe and efficient for most deployments.

Starting in v0.4.0, true parallel refresh is available via:

ALTER SYSTEM SET pg_trickle.parallel_refresh_mode = 'on';
ALTER SYSTEM SET pg_trickle.max_dynamic_refresh_workers = 4;  -- cluster-wide cap
ALTER SYSTEM SET pg_trickle.max_concurrent_refreshes = 4;    -- per-database cap
SELECT pg_reload_conf();

When enabled, independent stream tables at the same DAG level are refreshed concurrently in separate dynamic background workers. Each worker uses one max_worker_processes slot — see the worker-budget formula before enabling.

Monitor parallel refresh with:

SELECT * FROM pgtrickle.worker_pool_status();
SELECT * FROM pgtrickle.parallel_job_status(60);

For most deployments with fewer than 100 stream tables, sequential processing is still efficient (each differential refresh typically takes 5–50 ms).

How do I check if my stream tables are keeping up?

-- Quick overview
SELECT pgt_name, status, staleness, stale
FROM pgtrickle.stream_tables_info;

-- Detailed statistics
SELECT pgt_name, total_refreshes, avg_duration_ms, consecutive_errors, stale
FROM pgtrickle.pg_stat_stream_tables;

-- Recent refresh history for a specific ST
SELECT * FROM pgtrickle.get_refresh_history('order_totals', 10);

What is __pgt_row_id?

Every stream table has a __pgt_row_id BIGINT PRIMARY KEY column that stores a 64-bit xxHash of the row's identity key. The refresh engine uses it to match incoming deltas against existing rows during MERGE operations.

For a detailed explanation of how this column is computed and why it exists, see What is the __pgt_row_id column and why does it appear in my stream tables? in the General section.

You should ignore this column in your queries. It is an implementation detail.

How much disk space do change buffer tables consume?

Each change buffer table stores one row per source-table change (INSERT, UPDATE, DELETE, or TRUNCATE marker). The row size depends on the source table's column count and data types:

  • action column (char): 1 byte
  • old_data / new_data (JSONB): 1–10 KB per row, depending on the source columns
  • lsn (pg_lsn): 8 bytes
  • txid (xid8): 8 bytes
  • index on lsn: ~40 bytes per row

Rule of thumb: Buffer tables consume roughly 2–3× the raw row size of the source change, because both OLD and NEW values are stored as JSONB.

Buffer tables are cleaned up (truncated or deleted) after each successful refresh. If you suspect buffer bloat, check:

SELECT relname, pg_size_pretty(pg_total_relation_size(oid)) AS size
FROM pg_class
WHERE relnamespace = (SELECT oid FROM pg_namespace WHERE nspname = 'pgtrickle_changes')
ORDER BY pg_total_relation_size(oid) DESC;

What determines whether DIFFERENTIAL or FULL is faster for a given workload?

The breakeven point depends on the change ratio — the number of changed rows relative to the total source table size:

  • Below 5% changed: DIFFERENTIAL. The delta query touches few rows, which is much cheaper than re-reading everything.
  • 5–15%: usually DIFFERENTIAL. Still faster, but approaching the crossover.
  • 15–50%: FULL. The delta query scans a large fraction of the source anyway, so FULL avoids the overhead of delta computation.
  • Above 50%: FULL. This is a bulk-load scenario; TRUNCATE + INSERT is simpler and faster.

Additional factors:

  • Query complexity: Queries with many joins or window functions have more expensive delta computation. The crossover shifts lower.
  • Source table size: For small tables (<10K rows), FULL is nearly always faster because the overhead is negligible.
  • Index presence: DIFFERENTIAL uses indexes to look up changed rows. Missing indexes on join keys or GROUP BY columns can make delta queries slow.

The adaptive fallback (pg_trickle.differential_max_change_ratio, default 0.15) automates this decision per-cycle.

What are the planner hints and when should I disable them?

Before executing a delta query, pg_trickle sets several session-level planner parameters to guide PostgreSQL toward efficient delta plans:

SET LOCAL enable_seqscan = off;     -- Prefer index scans for small deltas
SET LOCAL enable_nestloop = on;     -- Nested loops are good for small delta × large table joins
SET LOCAL enable_mergejoin = off;   -- Merge joins are worse for skewed delta sizes

These hints are active only during the refresh transaction and are reset afterward.

When to disable hints: If you notice that a particular stream table's refresh is slow (check avg_duration_ms in pg_stat_stream_tables), the planner hints may be suboptimal for that specific query. You can disable them by setting:

SET pg_trickle.planner_hints = off;

This allows PostgreSQL's planner to choose its own strategy. Test both settings and compare avg_duration_ms.

How do prepared statements help refresh performance?

The refresh engine uses PostgreSQL prepared statements (PREPARE / EXECUTE) for the delta and MERGE queries. On the first refresh, the statement is prepared; subsequent refreshes reuse the cached plan. Benefits:

  • Reduced planning overhead. For complex delta queries with many joins and CTEs, planning can take 5–50 ms. Prepared statements skip this on subsequent refreshes.
  • Stable plans. The planner uses generic plans after the 5th execution (PostgreSQL default), avoiding plan instability from statistic fluctuations.

Prepared statements are stored per-session and are invalidated when:

  • The stream table is reinitialized (schema change)
  • The shared cache generation advances after DDL or stream-table metadata changes
  • The PostgreSQL connection is recycled
  • The session ends

How does the adaptive FULL fallback threshold work in practice?

The pg_trickle.differential_max_change_ratio GUC (default: 0.15) is evaluated per source table, per refresh cycle:

  1. Before each differential refresh, the engine counts pending changes in the buffer table: pending_changes = COUNT(*) FROM pgtrickle_changes.changes_<oid>.
  2. It estimates the source table size from pg_class.reltuples.
  3. If pending_changes / reltuples > differential_max_change_ratio, the engine falls back to FULL for that cycle. (You can reproduce this check by hand; see the sketch after the edge cases below.)

Edge cases:

  • If the source table has reltuples = 0 (freshly created, no ANALYZE yet), the engine always uses FULL until statistics are available.
  • For multi-source stream tables (joins), each source is evaluated independently. If any source exceeds the threshold, the entire refresh falls back to FULL.
  • The threshold applies to the current cycle only — the next cycle re-evaluates.
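
The sketch below reproduces the check by hand, reusing the buffer-table naming shown earlier (changes_16384, where 16384 is the source table's OID):

SELECT (SELECT count(*) FROM pgtrickle_changes.changes_16384) AS pending_changes,
       c.reltuples::bigint                                    AS estimated_rows,
       (SELECT count(*) FROM pgtrickle_changes.changes_16384)
           / NULLIF(c.reltuples, 0)                           AS change_ratio
FROM pg_class c
WHERE c.oid = 16384;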

How many stream tables can a single PostgreSQL instance handle?

There is no hard limit. Practical limits depend on:

  • Scheduler overhead: each cycle iterates over all STs; at 1,000 STs with ~1 ms of overhead per check, a cycle takes about 1 s.
  • Background connections: 1 per database (the scheduler) plus 1 per manual refresh call.
  • Change buffer bloat: each source table gets its own buffer table, so many sources means many tables in pgtrickle_changes.
  • Catalog size: pgt_stream_tables and pgt_dependencies grow linearly with the number of stream tables.
  • Refresh throughput: with sequential processing, the total cycle time is the sum of the individual refresh times.

Tested benchmarks: Up to 500 stream tables on a single instance with <2s total cycle time for DIFFERENTIAL refreshes averaging 3ms each.

What is the TRUNCATE vs DELETE cleanup trade-off for change buffers?

After each successful refresh, the engine cleans up processed change records from the buffer table. The pg_trickle.cleanup_use_truncate GUC (default: true) controls the method:

  • TRUNCATE (the default): instant, O(1) regardless of row count, and reclaims disk space immediately. Takes an ACCESS EXCLUSIVE lock on the buffer table, briefly blocking concurrent INSERTs from CDC triggers (~0.1 ms typical).
  • DELETE: takes row-level locks only, so concurrent CDC writes are not blocked. O(N) in the number of processed rows, and dead tuples require VACUUM to reclaim space.

When to switch to DELETE: If your source table has extremely high write throughput (>10K writes/sec) and you observe brief stalls in DML latency during refresh cleanup, switch to DELETE:

ALTER SYSTEM SET pg_trickle.cleanup_use_truncate = false;
SELECT pg_reload_conf();

For most workloads, TRUNCATE is the better choice because buffer tables are typically emptied completely after each refresh.


Interoperability

Stream tables are standard PostgreSQL heap tables, which means they work with most PostgreSQL features. This section clarifies what’s compatible (views, replication, triggers) and what’s not (direct DML, foreign keys).

Can PostgreSQL views reference stream tables?

Yes. Since stream tables are standard PostgreSQL heap tables, you can create views on top of them just like any other table. The view will return whatever data is currently in the stream table, reflecting the most recent refresh:

CREATE VIEW high_value_customers AS
SELECT customer_id, total FROM pgtrickle.order_totals WHERE total > 1000;

This is a common pattern for adding per-user filters or formatting on top of a shared stream table.

Can materialized views reference stream tables?

Yes, though this is usually redundant — both materialized views and stream tables are physical snapshots of query results. The key difference is that the materialized view requires its own manual REFRESH MATERIALIZED VIEW call; it does not auto-refresh when the underlying stream table refreshes.

A more idiomatic approach is to create a second stream table that references the first one. This way, pg_trickle handles the dependency ordering and refresh scheduling for both automatically.
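
For example, a dependent stream table that post-processes order_totals; pg_trickle refreshes order_totals first, then this one:

SELECT pgtrickle.create_stream_table(
    name     => 'high_value_totals',
    query    => 'SELECT customer_id, total FROM pgtrickle.order_totals WHERE total > 1000',
    schedule => '1m'
);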

Can I replicate stream tables with logical replication?

Yes. Stream tables can be published like any ordinary table:

CREATE PUBLICATION my_pub FOR TABLE pgtrickle.order_totals;

Important caveats:

  • The __pgt_row_id column is replicated (it is the primary key)
  • Subscribers receive materialized data, not the defining query
  • Do not install pg_trickle on the subscriber and attempt to refresh the replicated table — it will have no CDC triggers or catalog entries
  • Internal change buffer tables are not published by default

Can I INSERT, UPDATE, or DELETE rows in a stream table directly?

No. Stream table contents are managed exclusively by the refresh engine, and direct DML will corrupt the internal state (row IDs, frontier tracking, and change buffer consistency). See Why can't I INSERT, UPDATE, or DELETE rows in a stream table? for a detailed explanation of what goes wrong.

If you need to post-process stream table data, create a view or a second stream table that references the first one.

Can I add foreign keys to or from stream tables?

No. Foreign key constraints are incompatible with how the refresh engine operates. The engine uses bulk MERGE operations that apply inserts and deletes atomically, without guaranteeing the row-by-row ordering that foreign key checks require. Full refreshes also use TRUNCATE + INSERT, which bypasses cascade logic entirely.

See Why can't I add foreign keys? for details. If you need referential integrity, enforce it in your application or in a view that joins the stream tables.

Can I add my own triggers to stream tables?

Yes, for DIFFERENTIAL mode stream tables. When user-defined row-level triggers are detected, the refresh engine automatically switches from MERGE to explicit DELETE + UPDATE + INSERT statements. This ensures triggers fire with the correct TG_OP, OLD, and NEW values. Legacy configs that still set pg_trickle.user_triggers = 'on' are treated the same as auto.
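
A minimal sketch of an audit trigger on a stream table (all names are illustrative):

CREATE TABLE total_audit (
    op          text,
    customer_id bigint,
    changed_at  timestamptz DEFAULT now()
);

CREATE FUNCTION log_total_change() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
    -- OLD is unset for INSERT and NEW is unset for DELETE, so branch on TG_OP.
    INSERT INTO total_audit (op, customer_id)
    VALUES (TG_OP,
            CASE WHEN TG_OP = 'DELETE' THEN OLD.customer_id
                 ELSE NEW.customer_id END);
    RETURN NULL;  -- return value is ignored for AFTER row triggers
END;
$$;

CREATE TRIGGER order_totals_audit
AFTER INSERT OR UPDATE OR DELETE ON pgtrickle.order_totals
FOR EACH ROW EXECUTE FUNCTION log_total_change();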

Limitations:

  • Row-level triggers do not fire during FULL refresh (they are automatically suppressed via DISABLE TRIGGER USER). Use REFRESH MODE DIFFERENTIAL for stream tables with triggers.
  • The IS DISTINCT FROM guard prevents no-op UPDATE triggers when the aggregate result is unchanged.
  • BEFORE triggers that modify NEW will affect the stored value — the next refresh may "correct" it back, causing oscillation.

See the pg_trickle.user_triggers GUC in CONFIGURATION.md for control options.

Can I ALTER TABLE a stream table directly?

No. Direct ALTER TABLE would change the physical table without updating pg_trickle's catalog, causing column mismatches and __pgt_row_id invalidation on the next refresh. See Why can't I ALTER TABLE a stream table directly? for details.

Instead, use the pg_trickle API:

-- Change schedule, mode, or status:
SELECT pgtrickle.alter_stream_table('order_totals', schedule => '10m');

-- To change the defining query or column structure, drop and recreate:
SELECT pgtrickle.drop_stream_table('order_totals');
SELECT pgtrickle.create_stream_table(
    name         => 'order_totals',
    query        => '...',
    schedule     => '5m',
    refresh_mode => 'DIFFERENTIAL'
);

Does pg_trickle work with PgBouncer or other connection poolers?

It depends on the pooling mode. pg_trickle's background scheduler uses session-level features that are incompatible with transaction-mode connection pooling:

| Feature | Issue with Transaction-Mode Pooling |
| --- | --- |
| pg_advisory_lock() | Session-level lock released when the connection returns to the pool — concurrent refreshes possible |
| PREPARE / EXECUTE | Prepared statements are session-scoped — "does not exist" errors on different connections |
| LISTEN / NOTIFY | Notifications lost when listeners change connections |

Recommended configurations:

  • Session-mode pooling (pool_mode = session): Fully compatible. The scheduler holds a dedicated connection.
  • Direct connection (no pooler for the scheduler): Fully compatible. Application queries can still go through a pooler.
  • Transaction-mode pooling (pool_mode = transaction): Not supported. The scheduler requires a persistent session.

Tip: If your infrastructure requires transaction-mode pooling (e.g., AWS RDS Proxy, Supabase), route the pg_trickle background worker through a direct connection while keeping application traffic on the pooler. Most connection poolers support per-database or per-user routing rules.
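
As one illustration, PgBouncer allows pool_mode to be overridden per database entry, so a session-mode alias can coexist with transaction-mode pooling. A hypothetical pgbouncer.ini sketch (host, port, and names are illustrative):

[databases]
; Application traffic: transaction-mode pooling
mydb        = host=127.0.0.1 port=5432 dbname=mydb pool_mode=transaction
; Session-mode alias for clients that need a persistent session
mydb_direct = host=127.0.0.1 port=5432 dbname=mydb pool_mode=session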

Does pg_trickle work with pgvector?

Partially — it depends on the refresh mode and what the defining query does.

What works:

  • Source tables with vector columns. CDC triggers are generated using PostgreSQL's format_type(), which returns the full type name (e.g. vector(1536)). Change buffer tables mirror the source schema correctly, so inserts, updates, and deletes on pgvector tables are captured and replayed without issue.
  • Passing vector columns through in DIFFERENTIAL mode. Stream tables that select, filter (on non-vector columns), or join sources that happen to contain vector columns work correctly — the vector data is treated as an opaque value and copied through unchanged.
  • FULL mode with any pgvector expression. Because FULL mode re-executes the entire defining query, all pgvector operators (<->, <=>, <#>) and functions (cosine_distance, l2_normalize, etc.) work exactly as they do in a regular query.

What does not work:

  • DIFFERENTIAL mode with pgvector distance operators in the query. The DVM engine needs a differentiation rule for every SQL operator it encounters. Custom operators like <-> (L2 distance) or <=> (cosine distance) are not in the built-in rule set. The engine will fall back automatically to FULL mode if such operators appear in the delta query path. Set refresh_mode => 'FULL' explicitly to make this intent clear.
  • Incremental aggregation over vector columns. There is no meaningful incremental form for aggregates over vector values (e.g. averaging embeddings). Use FULL mode for any aggregate that involves vector arithmetic.

Recommended pattern for a nearest-neighbour cache or semantic search result set:

CREATE EXTENSION IF NOT EXISTS vector;

SELECT pgtrickle.create_stream_table(
    name         => 'top_similar_docs',
    query        => $$
        SELECT d.id, d.title, d.embedding,
               d.embedding <=> '[0.1, 0.2, 0.3]'::vector AS distance
        FROM documents d
        ORDER BY distance
        LIMIT 100
    $$,
    schedule     => '5m',
    refresh_mode => 'FULL'
);

For use cases that only carry vector columns through without computing on them, DIFFERENTIAL mode works fine:

-- Vectors are not used in the delta computation — DIFFERENTIAL is safe here
SELECT pgtrickle.create_stream_table(
    name         => 'active_doc_embeddings',
    query        => $$
        SELECT id, embedding
        FROM documents
        WHERE status = 'published'
    $$,
    schedule     => '1m',
    refresh_mode => 'DIFFERENTIAL'
);

dbt Integration

The dbt-pgtrickle package provides a stream_table materialization that lets you manage stream tables through dbt’s standard workflow. This section covers setup, commands, freshness checks, and query change handling.

How do I use pg_trickle with dbt?

Install the dbt-pgtrickle package (a pure Jinja SQL macro package — no Python dependencies):

# packages.yml
packages:
  - package: pg_trickle/dbt_pgtrickle
    version: ">=0.2.0"

Then define a stream table model using the stream_table materialization:

-- models/order_totals.sql
{{ config(
    materialized='stream_table',
    schedule='1m',
    refresh_mode='DIFFERENTIAL'
) }}

SELECT customer_id, SUM(amount) AS total
FROM {{ source('public', 'orders') }}
GROUP BY customer_id

The stream_table materialization calls pgtrickle.create_stream_table() on the first run and pgtrickle.alter_stream_table() on subsequent runs (if the schedule or mode changes).

What dbt commands work with stream tables?

| Command | Behavior |
| --- | --- |
| dbt run | Creates stream tables that don't exist; updates schedule/mode if changed; does not alter the defining query of existing STs |
| dbt run --full-refresh | Drops and recreates all stream tables from scratch (new defining query, fresh data) |
| dbt test | Works normally — tests query the stream table as a regular table |
| dbt source freshness | Works if you configure a freshness block on the stream table source |
| dbt docs generate | Documents stream tables like any other model |

How does dbt run --full-refresh work with stream tables?

When --full-refresh is passed, the stream_table materialization:

  1. Calls pgtrickle.drop_stream_table('model_name') to remove the existing stream table, CDC triggers, and change buffers.
  2. Calls pgtrickle.create_stream_table(...) with the current defining query from the model file.
  3. The new stream table starts in INITIALIZING status and performs its first full refresh.

This is the correct way to update a stream table's defining query in dbt. Without --full-refresh, dbt will not detect query changes (it only compares schedule and mode).

How do I check stream table freshness in dbt?

Use dbt's built-in source freshness feature by adding a freshness block to your source definition:

# models/sources.yml
sources:
  - name: pgtrickle
    schema: pgtrickle
    tables:
      - name: order_totals
        loaded_at_field: "last_refreshed_at"  # from stream_tables_info
        freshness:
          warn_after: {count: 5, period: minute}
          error_after: {count: 15, period: minute}

Then run dbt source freshness to check.

Alternatively, query the pg_trickle monitoring views directly in a dbt test:

-- tests/check_freshness.sql
SELECT pgt_name FROM pgtrickle.stream_tables_info WHERE stale = true

What happens when the defining query changes in dbt?

If you modify the SQL in a stream table model file and run dbt run without --full-refresh:

  • The stream_table materialization detects that the stream table already exists.
  • It compares the schedule and refresh mode — if either changed, it calls alter_stream_table() to update them.
  • It does not compare the defining query text. The existing defining query remains in effect.

To apply a new defining query, you must run dbt run --full-refresh. This drops and recreates the stream table with the new query.

Recommendation: After changing a model's SQL, always run dbt run --full-refresh -s model_name to apply the change.

Can I use dbt snapshot with stream tables?

Yes, with caveats. dbt snapshots work by tracking changes to a source table over time using updated_at or check strategies. You can snapshot a stream table like any other table.

However, keep in mind:

  • Stream tables are refreshed periodically, not on every write. The snapshot will only capture changes at refresh boundaries, not at the granularity of individual source-table writes.
  • The __pgt_row_id column will appear in the snapshot. You may want to exclude it with check_cols or a select in the snapshot configuration.
  • FULL refresh mode replaces all rows each cycle, which will appear as "updates" to the snapshot strategy even if the data hasn't changed. Use DIFFERENTIAL mode for stream tables that are snapshotted.

What dbt versions are supported?

dbt-pgtrickle is a pure Jinja SQL macro package that works with:

  • dbt-core 1.7+ (the stream_table materialization uses standard Jinja patterns)
  • dbt-postgres adapter (required for PostgreSQL connection)

There are no Python dependencies beyond dbt-core and dbt-postgres. The package is tested against dbt 1.7.x and 1.8.x in CI.


Row-Level Security (RLS)

Does RLS on source tables affect stream table content?

No. Stream tables always materialize the full, unfiltered result set, regardless of any RLS policies on source tables. This matches the behavior of PostgreSQL's built-in REFRESH MATERIALIZED VIEW.

The scheduled refresh runs as a superuser background worker. Manual calls to refresh_stream_table() and IMMEDIATE-mode IVM triggers also bypass RLS internally (SET LOCAL row_security = off / SECURITY DEFINER trigger functions), ensuring the stream table content is always complete and deterministic.

Can I use RLS on a stream table to filter reads per role?

Yes. Stream tables are regular PostgreSQL tables, so ALTER TABLE … ENABLE ROW LEVEL SECURITY and CREATE POLICY work exactly as expected. This is the recommended pattern for multi-tenant filtering:

ALTER TABLE pgtrickle.order_totals ENABLE ROW LEVEL SECURITY;

CREATE POLICY tenant_isolation ON pgtrickle.order_totals
    USING (tenant_id = current_setting('app.tenant_id')::INT);

One stream table serves all tenants. Per-tenant filtering happens at query time with zero storage duplication.
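
At query time, the application sets the tenant for its session before reading (assuming app.tenant_id is the setting referenced by the policy above):

SET app.tenant_id = '42';
SELECT * FROM pgtrickle.order_totals;  -- returns only rows where tenant_id = 42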

What happens when I ENABLE or DISABLE RLS on a source table?

pg_trickle's DDL event trigger detects ALTER TABLE … ENABLE ROW LEVEL SECURITY, DISABLE ROW LEVEL SECURITY, FORCE ROW LEVEL SECURITY, and NO FORCE ROW LEVEL SECURITY on source tables and marks all dependent stream tables for reinitialisation. The same applies to CREATE POLICY, ALTER POLICY, and DROP POLICY.

Why are IVM trigger functions SECURITY DEFINER?

In IMMEDIATE mode, the IVM trigger fires in the DML-issuing user's context. If that user has restricted RLS visibility, the delta query could see only a subset of the base table rows, producing a corrupt stream table. Making the trigger function SECURITY DEFINER (owned by the extension installer, typically a superuser) ensures the delta query always has full visibility. The DML itself is still subject to the user's own RLS policies — only the stream table maintenance runs with elevated privileges.

The trigger functions also set search_path = pg_catalog, pgtrickle, pgtrickle_changes, public to prevent search_path hijacking — a security best practice for all SECURITY DEFINER functions. The public schema is included because the delta SQL references user tables that typically reside there.


Deployment & Operations

This section covers the operational aspects of running pg_trickle in production: background workers, upgrades, restarts, replicas, Kubernetes, partitioned tables, and multi-database deployments.

How many background workers does pg_trickle use?

pg_trickle uses a two-tier background worker model:

  1. Launcher (pg_trickle launcher) — one per cluster, static. Scans pg_database every ~10 seconds and spawns a per-database scheduler for every database where pg_trickle is installed. Automatically re-spawns schedulers that exit.
  2. Per-database scheduler (pg_trickle scheduler) — one dynamic worker per database with pg_trickle installed.

| Component | Workers | Notes |
| --- | --- | --- |
| Launcher | 1 (static) | Cluster-wide; connects to the postgres database |
| Scheduler | 1 per database (dynamic) | Persistent per database; drives all refreshes |
| Parallel refresh workers | 0–N per database | Only when pg_trickle.parallel_refresh_mode = 'on' |
| WAL decoder | 0 (shared) | Shares the scheduler's SPI connection |
| Manual refresh | 0 | Runs in the caller's session |

How do I size max_worker_processes?

When max_worker_processes is too low, the launcher silently fails to spawn schedulers for some databases and retries every 5 minutes. Those databases stop refreshing with no error in the stream table itself — you only see it in the PostgreSQL log:

WARNING:  pg_trickle launcher: could not spawn scheduler for database 'mydb'

The minimum formula:

max_worker_processes ≥
  1 (pg_trickle launcher)
  + N  (one scheduler per database with pg_trickle installed)
  + max_dynamic_refresh_workers  (only if parallel_refresh_mode = 'on'; default 4)
  + autovacuum_max_workers        (default 3)
  + parallel query workers        (max_parallel_workers_per_gather × concurrent queries)
  + slots for other extensions    (logical replication launcher, etc.)
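
A worked example under assumed values (3 databases with pg_trickle, parallel refresh on with the default 4 workers, default autovacuum, max_parallel_workers_per_gather = 2 with up to 4 concurrent parallel queries, and 2 slots for other extensions):

1 + 3 + 4 + 3 + (2 × 4) + 2 = 21  →  32 leaves comfortable headroom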

A practical starting point for a cluster with a handful of databases:

max_worker_processes = 32

This value requires a full PostgreSQL restart (not just reload).

How do I upgrade pg_trickle to a new version?

  1. Install the new shared library (replace the .so/.dylib file in PostgreSQL's lib directory).
  2. Run the upgrade SQL:
    ALTER EXTENSION pg_trickle UPDATE;
    
    This applies migration scripts (e.g., pg_trickle--0.2.1--0.2.2.sql) that update catalog tables, add new functions, and migrate data as needed.
  3. Restart PostgreSQL if the shared library changed (required for shared_preload_libraries changes).
  4. Verify:
    SELECT pgtrickle.version();
    

Zero-downtime upgrades are possible for minor versions (patch releases) that don't change the shared library. Just run ALTER EXTENSION pg_trickle UPDATE — no restart needed.

For detailed instructions, version-specific notes, rollback procedures, and troubleshooting, see the full Upgrading Guide.

How do I know if my shared library and SQL extension versions match?

The background worker checks for version mismatches at startup and logs a WARNING if the compiled .so version differs from the installed SQL extension version. You can also check manually:

-- Compiled .so version:
SELECT pgtrickle.version();

-- Installed SQL extension version:
SELECT extversion FROM pg_extension WHERE extname = 'pg_trickle';

If these differ, run ALTER EXTENSION pg_trickle UPDATE; and restart PostgreSQL if prompted.

Are stream tables preserved during an upgrade?

Yes. ALTER EXTENSION pg_trickle UPDATE applies only additive schema migrations (new columns, updated function signatures). Existing stream tables, their data, refresh history, and CDC infrastructure are preserved. The scheduler resumes normal operation after the upgrade completes.

For version-specific migration notes, see the Upgrading Guide — Version-Specific Notes.

What happens to stream tables during a PostgreSQL restart?

During a restart:

  1. The scheduler stops. No refreshes occur while PostgreSQL is down.
  2. No committed changes are lost. CDC triggers are persistent DDL objects, so change capture resumes the moment PostgreSQL accepts writes again; no source writes can occur while the server is down.
  3. On startup, the scheduler background worker starts, reads the catalog, rebuilds the DAG, and resumes refresh cycles from where it left off.
  4. Frontier reconciliation. The scheduler detects any gap between the stored frontier LSN and the current WAL position. Source changes that occurred between the last successful refresh and the restart are in the change buffers (for trigger-mode CDC) and will be processed in the first refresh cycle.

Net effect: Stream tables may be stale for the duration of the downtime, but no data is lost. The first refresh cycle after restart catches up automatically.

Can I use pg_trickle on a read replica / standby?

The scheduler does not run on standby servers. When pg_trickle detects it is running in recovery mode (pg_is_in_recovery() = true), the background worker enters a sleep loop and does not attempt any refreshes.

However, stream tables replicated from the primary are readable on the standby — they are regular heap tables and are replicated via physical (streaming) replication like any other table.

Pattern for read-heavy workloads:

  • Run pg_trickle on the primary — it performs all refreshes.
  • Query stream tables on the standby — read replicas get the latest refreshed data via streaming replication, with replication lag as the only additional delay.

How does pg_trickle work with CloudNativePG / Kubernetes?

pg_trickle is compatible with CloudNativePG. The cnpg/ directory in the repository contains example manifests:

  • Dockerfile.ext — builds a PostgreSQL image with pg_trickle pre-installed
  • cluster-example.yaml — CloudNativePG Cluster manifest with shared_preload_libraries = 'pg_trickle'

Key considerations:

  • Include pg_trickle in shared_preload_libraries in the Cluster's postgresql configuration.
  • The scheduler runs on the primary pod only. Replica pods detect recovery mode and sleep.
  • Pod restarts are handled the same way as PostgreSQL restarts (see above).
  • Persistent volume claims preserve catalog and change buffers across pod rescheduling.
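
A minimal Cluster manifest reflecting these points might look like the following sketch; the image name and sizes are illustrative, and cluster-example.yaml in the repository is the authoritative reference:

apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: pgtrickle-demo
spec:
  instances: 3
  imageName: ghcr.io/example/postgres-pgtrickle:18   # hypothetical image built from Dockerfile.ext
  postgresql:
    shared_preload_libraries:
      - pg_trickle
    parameters:
      max_worker_processes: "32"
  storage:
    size: 10Gi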

Does pg_trickle work with partitioned source tables?

Yes. pg_trickle installs CDC triggers on the partitioned parent table, which PostgreSQL automatically propagates to all existing and future partitions. When a row is inserted into any partition, the trigger fires and writes the change to the buffer table.

Caveats:

  • TRUNCATE on individual partitions fires the partition-level trigger, which is also captured.
  • Attaching or detaching partitions (ALTER TABLE ... ATTACH/DETACH PARTITION) fires DDL event triggers, which may mark the stream table for reinitialization.
  • Row movement between partitions (when the partition key is updated) is captured as a DELETE from the old partition and an INSERT into the new partition.

Can I run pg_trickle in multiple databases on the same cluster?

Yes. Each database gets its own independent scheduler background worker, its own catalog tables, and its own change buffers. Stream tables in different databases do not interact.

Resource planning: Each database with stream tables requires 1 background worker slot in max_worker_processes. If you have many databases, the default of 8 is easily exhausted.

Important: When max_worker_processes is exhausted, the launcher silently skips databases it cannot spawn a scheduler for and retries every 5 minutes. This means stream tables in those databases stop refreshing with no visible error — they just go stale. Check the PostgreSQL log for:

WARNING:  pg_trickle launcher: could not spawn scheduler for database 'mydb'

If you see this, increase max_worker_processes and restart PostgreSQL.

See How do I size max_worker_processes? for the full formula.

-- On each database where you want pg_trickle:
CREATE EXTENSION pg_trickle;

The extension must be created separately in each database — shared_preload_libraries loads the shared library cluster-wide, but the SQL objects (catalog tables, functions) are per-database.


Monitoring & Alerting

pg_trickle provides built-in monitoring views and NOTIFY-based alerting. This section explains the available views, alert events, and failure handling.

How do I list all stream tables in my database?

Several options depending on how much detail you need:

-- Quickest: name + status + mode + staleness
SELECT name, status, refresh_mode, is_populated, staleness
FROM pgtrickle.stream_tables_info;

-- Full stats: refresh counts, rows inserted/deleted, avg duration, error streaks
SELECT * FROM pgtrickle.pg_stat_stream_tables;

-- Live status including consecutive_errors and data_timestamp
SELECT * FROM pgtrickle.pgt_status();

-- Raw catalog (all persisted properties, no computed fields)
SELECT * FROM pgtrickle.pgt_stream_tables;

How do I inspect what pg_trickle is doing right now?

Quick status snapshot:

SELECT name, status, refresh_mode, consecutive_errors, staleness
FROM pgtrickle.pgt_status();

Deep dive into a specific stream table — shows the defining query, DVM operator tree, source tables, generated delta SQL, and current WAL frontier:

SELECT * FROM pgtrickle.explain_st('my_table');

Key properties returned:

| Property | Description |
| --- | --- |
| dvm_supported | Whether differential maintenance is possible for this query |
| operator_tree | How the DVM engine has decomposed the query |
| delta_query | The actual SQL executed during a differential refresh |
| frontier | Per-source LSN positions flushed at last refresh |

Recent refresh activity:

-- Last 10 refreshes for a stream table (action, status, rows, duration):
SELECT * FROM pgtrickle.get_refresh_history('my_table', 10);

-- Aggregate refresh stats for all stream tables:
SELECT * FROM pgtrickle.st_refresh_stats();

CDC and slot health:

-- Per-source CDC mode, WAL lag, and alerts:
SELECT * FROM pgtrickle.check_cdc_health();

-- Replication slot health (slot_name, active, lag_bytes):
SELECT * FROM pgtrickle.slot_health();

Real-time event stream:

LISTEN pg_trickle_alert;
-- Receives JSON payloads for: stale_data, auto_suspended, resumed,
-- reinitialize_needed, buffer_growth_warning, refresh_completed, refresh_failed

Pending change buffers (rows not yet consumed by a differential refresh):

SELECT stream_table, source_table, cdc_mode, pending_rows, buffer_bytes
FROM pgtrickle.change_buffer_sizes()
ORDER BY pending_rows DESC;

Are there convenience functions for inspecting source tables and CDC buffers?

Yes. pg_trickle provides two convenience functions that complement the existing monitoring suite:

pgtrickle.list_sources(name) — shows every source table a stream table depends on, the CDC mode each uses, and any column-level usage metadata:

SELECT * FROM pgtrickle.list_sources('order_totals');
-- Returns: source_table, source_oid, source_type, cdc_mode, columns_used

pgtrickle.change_buffer_sizes() — shows, for every tracked source table, how many CDC rows are pending (not yet consumed by a differential refresh) and the estimated on-disk size of the change buffer:

SELECT * FROM pgtrickle.change_buffer_sizes()
ORDER BY pending_rows DESC;
-- Returns: stream_table, source_table, source_oid, cdc_mode, pending_rows, buffer_bytes

A large pending_rows value for a source table means a differential refresh is overdue or stalled — use pgtrickle.get_refresh_history() to investigate.

Can I see a tree view of all stream table dependencies?

Yes. pgtrickle.dependency_tree() walks the dependency DAG and renders it as an indented ASCII tree:

SELECT tree_line, status, refresh_mode
FROM pgtrickle.dependency_tree();

Example output:

tree_line                                 | status | refresh_mode
------------------------------------------+--------+--------------
report_summary                            | ACTIVE | DIFFERENTIAL
├── orders_by_region                      | ACTIVE | DIFFERENTIAL
│   ├── public.orders [src]              |        |
│   └── public.customers [src]           |        |
└── revenue_totals                        | ACTIVE | DIFFERENTIAL
    └── public.orders [src]              |        |

Each row has node (qualified name), node_type (stream_table or source_table), depth, status, and refresh_mode. Source tables are shown as leaves tagged with [src].

What monitoring views are available?

| View | Description |
| --- | --- |
| pgtrickle.stream_tables_info | Status overview with computed staleness |
| pgtrickle.pg_stat_stream_tables | Comprehensive stats (refresh counts, avg duration, error streaks) |

How do I get alerted when something goes wrong?

pg_trickle sends PostgreSQL NOTIFY messages on the pg_trickle_alert channel with JSON payloads:

| Event | When |
| --- | --- |
| stale_data | Staleness exceeds 2× the schedule |
| auto_suspended | Stream table suspended after max consecutive errors |
| reinitialize_needed | Upstream DDL change detected |
| buffer_growth_warning | Change buffer growing unexpectedly |
| refresh_completed | Refresh completed successfully |
| refresh_failed | Refresh failed |

Listen with:

LISTEN pg_trickle_alert;

What happens when a stream table keeps failing?

After pg_trickle.max_consecutive_errors (default: 3) consecutive failures, the stream table moves to ERROR status and automatic refreshes stop. An auto_suspended NOTIFY alert is sent.

To recover:

-- Fix the underlying issue (e.g., restore a dropped source table), then:
SELECT pgtrickle.alter_stream_table('my_table', status => 'ACTIVE');

Retries use exponential backoff (base 1s, max 60s, ±25% jitter, up to 5 retries before counting as a real failure).
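
Concretely, with those defaults a failing refresh is retried after roughly:

1s → 2s → 4s → 8s → 16s   (each delay jittered by ±25%, individual delays capped at 60s)

Only when the fifth retry also fails does the attempt count as one consecutive error toward auto-suspension.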


Configuration Reference

All pg_trickle settings are configured via PostgreSQL GUC parameters. The table below lists the most-used parameters; for the complete reference see CONFIGURATION.md.

| GUC | Type | Default | Description |
| --- | --- | --- | --- |
| pg_trickle.enabled | bool | true | Enable/disable the scheduler. Manual refreshes still work when false. |
| pg_trickle.scheduler_interval_ms | int | 1000 | Scheduler wake interval in milliseconds (100–60000) |
| pg_trickle.min_schedule_seconds | int | 60 | Minimum allowed schedule duration (1–86400) |
| pg_trickle.max_consecutive_errors | int | 3 | Failures before auto-suspending (1–100) |
| pg_trickle.change_buffer_schema | text | pgtrickle_changes | Schema for CDC buffer tables |
| pg_trickle.max_concurrent_refreshes | int | 4 | Max parallel refresh workers (1–32) |
| pg_trickle.user_triggers | text | auto | User trigger handling: auto (detect), off (suppress), on (deprecated alias for auto) |
| pg_trickle.differential_max_change_ratio | float | 0.15 | Change ratio threshold for adaptive FULL fallback (0.0–1.0) |
| pg_trickle.cleanup_use_truncate | bool | true | Use TRUNCATE instead of DELETE for buffer cleanup |

All GUCs are SUSET context (superuser SET) and take effect without restart, except shared_preload_libraries which requires a PostgreSQL restart.
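
For example, to raise the error threshold cluster-wide without a restart:

ALTER SYSTEM SET pg_trickle.max_consecutive_errors = 5;
SELECT pg_reload_conf();

-- Verify:
SHOW pg_trickle.max_consecutive_errors;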

For SQL function signatures and descriptions see SQL_REFERENCE.md.


Troubleshooting

This section covers common problems and how to debug them. If your issue isn’t listed here, check the refresh history for error messages and the monitoring views for status information.

How do I diagnose stalled data flow through stream tables?

See also: Error Reference — comprehensive guide to all pg_trickle error variants with causes and fixes.

If data seems to have stopped flowing -- stream tables show stale results despite DML on the source tables -- follow this systematic diagnostic workflow. Each step narrows the problem from broad health checks down to specific root causes.

Step 0 -- Verify GUC configuration:

Misconfigured GUCs are a common and easy-to-miss cause of stalled or severely throttled data flow. Check all pg_trickle settings in one query:

SELECT name, setting, unit
FROM pg_settings
WHERE name LIKE 'pg_trickle.%'
   OR name = 'max_worker_processes'
ORDER BY name;

Key values to check:

| GUC | Safe value | Problem if set to |
| --- | --- | --- |
| pg_trickle.enabled | on | off -- stops all automatic refreshes |
| pg_trickle.tiered_scheduling | on (fine) | on with all STs at tier = 'frozen' -- silently skips them |
| pg_trickle.max_consecutive_errors | 3–10 | 1 -- one transient error suspends the ST permanently |
| pg_trickle.scheduler_interval_ms | 100–1000 | Very high (e.g. 60000) -- scheduler only wakes every 60 s |
| pg_trickle.auto_backoff | on | Fine normally, but if refreshes take >95% of the schedule it silently stretches intervals up to 8x |
| pg_trickle.default_schedule_seconds | 1–60 | Very high -- isolated CALCULATED tables refresh very infrequently |
| max_worker_processes | >= 16 (typical) | Too low -- workers cannot be spawned; parallel mode silently stalls |

Also check whether any stream tables are frozen:

SELECT pgt_name, refresh_tier
FROM pgtrickle.pgt_stream_tables
WHERE refresh_tier = 'frozen';

Step 1 -- Quick health overview:

SELECT * FROM pgtrickle.health_check() WHERE severity != 'OK';

This single call checks scheduler status, error tables, stale tables, buffer growth, replication slot lag, and the worker pool. Any WARN or ERROR row tells you where to look next.

Step 2 -- Check stream table status and staleness:

SELECT name, status, refresh_mode, consecutive_errors, staleness
FROM pgtrickle.pgt_status()
ORDER BY staleness DESC NULLS FIRST;

Look for SUSPENDED status (auto-suspended after repeated errors), high consecutive_errors, or unexpectedly large staleness.

Step 3 -- Check recent refresh activity:

SELECT start_time, stream_table, action, status, duration_ms, error_message
FROM pgtrickle.refresh_timeline(20);

If no recent rows appear, the scheduler may not be running. If rows show ERROR, the error messages explain why refreshes are failing.

Step 4 -- Inspect errors for a specific stream table:

SELECT * FROM pgtrickle.diagnose_errors('my_stream_table');

Returns the last 5 FAILED refresh events with error classification and suggested remediation steps.

Step 5 -- Check the CDC pipeline (are changes being captured?):

SELECT stream_table, source_table, cdc_mode, pending_rows, buffer_bytes
FROM pgtrickle.change_buffer_sizes()
ORDER BY pending_rows DESC;
  • pending_rows = 0 everywhere: either no DML is happening on the source tables, or CDC triggers are missing.
  • pending_rows growing but stream tables are not refreshing: scheduler or refresh problem (go back to Steps 1-3).

Step 6 -- Verify CDC triggers exist and are enabled:

SELECT source_table, trigger_type, trigger_name
FROM pgtrickle.trigger_inventory()
WHERE NOT present OR NOT enabled;

Any rows returned here mean change capture is broken for that source table -- DML changes are not being recorded.

Step 7 -- Check CDC slot health (WAL mode only):

SELECT * FROM pgtrickle.check_cdc_health();

Look for alert values like slot_lag_exceeds_threshold or replication_slot_missing.

Step 8 -- Verify the dependency DAG:

SELECT tree_line, status, refresh_mode
FROM pgtrickle.dependency_tree();

Confirms the dependency graph is wired as expected. A missing edge means upstream changes will not propagate to downstream stream tables.

Step 9 -- Check the parallel worker pool (if using parallel mode):

SELECT * FROM pgtrickle.worker_pool_status();

SELECT job_id, unit_key, status, duration_ms
FROM pgtrickle.parallel_job_status(300)
WHERE status NOT IN ('SUCCEEDED');

Common root causes at a glance:

| Symptom | Diagnostic function | Likely root cause |
| --- | --- | --- |
| No refreshes happening at all | health_check -> scheduler_running | Background worker not running or pg_trickle.enabled = off |
| Stream table in SUSPENDED status | pgt_status | Repeated errors hit max_consecutive_errors threshold |
| Zero pending changes despite DML | trigger_inventory | CDC trigger was dropped or disabled by DDL |
| WAL slot missing or lagging | check_cdc_health, slot_health | Replication slot dropped, or WAL retention exceeded |
| Buffers growing but no refreshes | change_buffer_sizes + refresh_timeline | Scheduler stalled, refresh failing, or lock contention |
| Upstream changes not propagating | dependency_tree | Upstream ST not connected in the DAG |

Unit tests crash with symbol not found in flat namespace on macOS 26+

macOS 26 (Tahoe) changed the dynamic linker (dyld) to eagerly resolve all flat-namespace symbols at binary load time. pgrx extensions link PostgreSQL server symbols (e.g. CacheMemoryContext, SPI_connect) with -Wl,-undefined,dynamic_lookup, which previously resolved lazily. Since cargo test --lib runs outside the postgres process, those symbols are missing and the test binary aborts:

dyld[66617]: symbol not found in flat namespace '_CacheMemoryContext'

Use just test-unit — it automatically detects macOS 26+ and injects a stub library (libpg_stub.dylib) via DYLD_INSERT_LIBRARIES. The stub provides NULL/no-op definitions for the ~28 PostgreSQL symbols; they are never called during unit tests (pure Rust logic only).

This does not affect integration tests, E2E tests, just lint, just build, or the extension running inside PostgreSQL.

See the Installation Guide for details and manual usage.

My stream table is stuck in INITIALIZING status

The initial full refresh may have failed. Check:

SELECT * FROM pgtrickle.get_refresh_history('my_table', 5);

If the error is transient, retry with:

SELECT pgtrickle.refresh_stream_table('my_table');

My stream table shows stale data but the scheduler is running

Common causes:

  1. TRUNCATE on source table — bypasses CDC triggers. Manual refresh needed.
  2. Too many errors — check consecutive_errors in pgtrickle.pg_stat_stream_tables. Resume with ALTER ... status => 'ACTIVE'.
  3. Long-running refresh — check for lock contention or slow defining queries.
  4. Scheduler disabled — verify SHOW pg_trickle.enabled; returns on.

I get "cycle detected" when creating a stream table

Stream tables cannot have circular dependencies. If stream table A depends on stream table B and B depends on A (either directly or through a chain of intermediate stream tables), pg_trickle rejects the creation with a clear error message listing the cycle path.

To resolve this, restructure your queries to eliminate the circular reference. Common patterns:

  • Extract the shared logic into a single base stream table that both A and B reference.
  • Use a regular view instead of a stream table for one side of the dependency.
  • Merge the two queries into a single stream table if possible.

A source table was altered and my stream table stopped refreshing

pg_trickle detects DDL changes (column additions, drops, type changes) via event triggers and marks affected stream tables with needs_reinit = true. The next scheduler cycle will attempt to reinitialize the stream table — drop the storage table, recreate it from the current defining query schema, and perform a full refresh.

If the schema change breaks the defining query (e.g., a column referenced in the query was dropped or renamed), the reinitialization will fail repeatedly until the stream table hits max_consecutive_errors and enters ERROR status.

To fix it: Update the defining query and recreate the stream table:

SELECT pgtrickle.drop_stream_table('order_totals');
SELECT pgtrickle.create_stream_table(
    name         => 'order_totals',
    query        => 'SELECT id, name FROM orders',  -- updated query reflecting new schema
    schedule     => '1m',
    refresh_mode => 'DIFFERENTIAL'
);

Check the refresh history for the specific error message:

SELECT * FROM pgtrickle.get_refresh_history('order_totals', 5);

How do I see the delta query generated for a stream table?

SELECT pgtrickle.explain_st('order_totals');

This shows the DVM operator tree, source tables, and the generated delta SQL.

How do I interpret the refresh history?

The pgtrickle.get_refresh_history() function returns the most recent refresh records for a stream table:

SELECT * FROM pgtrickle.get_refresh_history('order_totals', 10);

Key columns:

| Column | Meaning |
| --- | --- |
| action | Refresh type: FULL, DIFFERENTIAL, TOPK, IMMEDIATE, or REINITIALIZE |
| rows_inserted | Rows added to the stream table in this cycle |
| rows_deleted | Rows removed from the stream table in this cycle |
| rows_updated | Rows modified in the stream table (for the explicit DML path) |
| duration_ms | Wall-clock time for the refresh |
| error_message | NULL for success; error text for failures |
| source_changes | Number of pending change records processed |
| fallback_reason | If DIFFERENTIAL fell back to FULL: change_ratio_exceeded, truncate_detected, or reinitialize |

Patterns to look for:

  • High rows_inserted + rows_deleted with low source_changes → possible duplicate rows (keyless source tables)
  • Frequent fallback_reason = 'change_ratio_exceeded' → the change volume regularly exceeds the threshold; raise pg_trickle.differential_max_change_ratio or switch the stream table to FULL mode
  • Increasing duration_ms over time → index maintenance or buffer bloat; consider VACUUM or checking for missing indexes

How can I tell if the scheduler is running?

Several ways to verify:

1. Check the background worker:

SELECT pid, datname, backend_type, state, query
FROM pg_stat_activity
WHERE backend_type = 'pg_trickle scheduler';

If no rows are returned, the scheduler is not running. Common causes:

  • pg_trickle.enabled = false
  • Extension not in shared_preload_libraries
  • max_worker_processes exhausted — the launcher silently skips databases it cannot accommodate and retries every 5 minutes. Check the PostgreSQL log for WARNING: pg_trickle launcher: could not spawn scheduler for database '...'.

2. Check recent refresh activity:

SELECT MAX(refreshed_at) AS last_refresh
FROM pgtrickle.pgt_stream_tables
WHERE status = 'ACTIVE';

If the last refresh was long ago relative to the shortest schedule, the scheduler may be stuck.

3. Check PostgreSQL logs: The scheduler logs startup and shutdown messages at LOG level:

LOG:  pg_trickle scheduler started for database "mydb"
LOG:  pg_trickle scheduler shutting down (SIGTERM)

How do I debug a stream table that shows stale data?

Follow this diagnostic checklist:

  1. Is the scheduler running? (See above)
  2. Is the stream table active?
    SELECT pgt_name, status, consecutive_errors FROM pgtrickle.pg_stat_stream_tables
    WHERE pgt_name = 'my_st';
    
    If status is ERROR or SUSPENDED, the stream table has been auto-suspended after repeated failures.
  3. Are there pending changes?
    SELECT COUNT(*) FROM pgtrickle_changes.changes_<source_oid>;
    
    If zero, the source table may not have CDC triggers (check SELECT tgname FROM pg_trigger WHERE tgrelid = '<source_oid>').
  4. Is the refresh failing silently?
    SELECT * FROM pgtrickle.get_refresh_history('my_st', 5);
    
    Check for error messages.
  5. Is there lock contention? Long-running transactions holding locks on the source or stream table can block refreshes. Check pg_locks and pg_stat_activity.

What does the needs_reinit flag mean and how do I clear it?

The needs_reinit flag in pgtrickle.pgt_stream_tables indicates that the stream table's physical storage needs to be rebuilt — typically because a source table's schema changed.

When needs_reinit = true:

  • The scheduler skips normal differential/full refresh.
  • Instead, it performs a reinitialize: drop the storage table, recreate it from the current defining query schema, and populate with a full refresh.
  • If reinitialization succeeds, needs_reinit is cleared automatically.

If reinitialization keeps failing (e.g., the defining query references a dropped column):

-- Fix the underlying issue first, then clear manually:
UPDATE pgtrickle.pgt_stream_tables SET needs_reinit = false WHERE pgt_name = 'my_st';
-- Or drop and recreate:
SELECT pgtrickle.drop_stream_table('my_st');
SELECT pgtrickle.create_stream_table(
    name         => 'my_st',
    query        => 'SELECT ...',
    schedule     => '1m',
    refresh_mode => 'DIFFERENTIAL'
);

Why Are These SQL Features Not Supported?

This section gives detailed technical explanations for each SQL limitation. pg_trickle follows the principle of "fail loudly rather than produce wrong data" — every unsupported feature is detected at stream-table creation time and rejected with a clear error message and a suggested rewrite.

For all of these, returning an explicit error is a deliberate design choice: the alternative would be silently producing incorrect results after a refresh, which is far harder to diagnose.

How does NATURAL JOIN work?

NATURAL JOIN is fully supported. At parse time, pg_trickle resolves the common columns between the two tables (using OpTree::output_columns()) and synthesizes explicit equi-join conditions. This supports INNER, LEFT, RIGHT, and FULL NATURAL JOIN variants.

Internally, NATURAL JOIN is converted to an explicit JOIN ... ON before the DVM engine builds its operator tree, so delta computation works identically to a manually specified equi-join.

Note: The internal __pgt_row_id column is excluded from common column resolution, so NATURAL JOINs between stream tables work correctly.
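
For illustration, assuming orders and customers share only a customer_id column, the parse-time conversion is equivalent to:

-- This defining query:
SELECT customer_id, amount, name
FROM orders NATURAL JOIN customers

-- Is converted internally to an explicit equi-join:
SELECT orders.customer_id, amount, name
FROM orders JOIN customers ON orders.customer_id = customers.customer_id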

How do GROUPING SETS, CUBE, and ROLLUP work?

GROUPING SETS, CUBE, and ROLLUP are fully supported via an automatic parse-time rewrite. pg_trickle decomposes these constructs into a UNION ALL of separate GROUP BY queries before the DVM engine processes the query.

Explosion guard: CUBE(N) generates 2^N branches. pg_trickle rejects CUBE/ROLLUP combinations that would produce more than 64 branches to prevent runaway memory usage. Use explicit GROUPING SETS(...) instead.

For example:

-- This defining query:
SELECT dept, region, SUM(amount) FROM sales GROUP BY CUBE(dept, region)

-- Is automatically rewritten to:
SELECT dept, region, SUM(amount) FROM sales GROUP BY dept, region
UNION ALL
SELECT dept, NULL::text, SUM(amount) FROM sales GROUP BY dept
UNION ALL
SELECT NULL::text, region, SUM(amount) FROM sales GROUP BY region
UNION ALL
SELECT NULL::text, NULL::text, SUM(amount) FROM sales

GROUPING() function calls are replaced with integer literal constants corresponding to the grouping level. The rewrite is transparent — the DVM engine sees only standard GROUP BY + UNION ALL operators and can apply incremental delta computation to each branch independently.

How does DISTINCT ON (…) work?

DISTINCT ON is fully supported via an automatic parse-time rewrite. pg_trickle transparently transforms DISTINCT ON into a ROW_NUMBER() window function subquery:

-- This defining query:
SELECT DISTINCT ON (dept) dept, employee, salary
FROM employees ORDER BY dept, salary DESC

-- Is automatically rewritten to:
SELECT dept, employee, salary FROM (
    SELECT dept, employee, salary,
           ROW_NUMBER() OVER (PARTITION BY dept ORDER BY salary DESC) AS rn
    FROM employees
) sub WHERE rn = 1

The rewrite happens before DVM parsing, so the operator tree sees a standard window function query and can apply partition-based recomputation for incremental delta maintenance.

Why is TABLESAMPLE rejected?

TABLESAMPLE returns a random subset of rows from a table (e.g., FROM orders TABLESAMPLE BERNOULLI(10) gives ~10% of rows).

Stream tables materialize the complete result set of the defining query and keep it up-to-date across refreshes. Baking a random sample into the defining query is not meaningful because:

  1. Non-determinism. Each refresh would sample different rows, making the stream table contents unstable and unpredictable. The delta between refreshes would be dominated by sampling noise, not actual data changes.

  2. CDC incompatibility. The trigger-based change-capture system tracks specific row-level changes (inserts, updates, deletes). A TABLESAMPLE defining query has no stable row identity — the "changed rows" concept doesn't apply when the entire sample shifts each cycle.

Rewrite:

-- Instead of sampling in the defining query:
SELECT * FROM orders TABLESAMPLE BERNOULLI(10)

-- Materialize the full result and sample when querying:
SELECT * FROM order_stream_table WHERE random() < 0.1

Why is LIMIT / OFFSET rejected?

Stream tables materialize the complete result set and keep it synchronized with source data. Bare LIMIT/OFFSET (without a recognized pattern) would truncate the result:

  1. Undefined ordering. LIMIT without ORDER BY returns an arbitrary subset.

  2. Delta instability. When source rows change, the boundary between "in the LIMIT" and "out of the LIMIT" shifts. A single INSERT could evict one row and admit another, requiring the refresh to track the full ordered position of every row.

  3. Semantic mismatch. Users who write LIMIT 100 typically want to limit what they read, not what is stored.

Exception — TopK pattern: When the defining query has a top-level ORDER BY … LIMIT N (constant integer, optionally with OFFSET M), pg_trickle recognizes this as a TopK query and accepts it. The ORDER BY clause is required — bare LIMIT without ORDER BY is always rejected because it selects an arbitrary subset. With ORDER BY, the top-N boundary is well-defined and the stream table stores exactly those N rows (starting from position M+1 if OFFSET is specified). See the TopK section for details.
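
For example, this defining query is accepted and maintained as a TopK stream table (names are illustrative):

SELECT pgtrickle.create_stream_table(
    name     => 'latest_orders',
    query    => 'SELECT id, customer_id, created_at FROM orders ORDER BY created_at DESC LIMIT 100',
    schedule => '30s'
);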

Rewrite (when TopK doesn't apply):

-- Instead of:
'SELECT * FROM orders ORDER BY created_at DESC LIMIT 100'

-- Omit LIMIT from the defining query, apply when reading:
SELECT * FROM orders_stream_table ORDER BY created_at DESC LIMIT 100

Why are window functions in expressions rejected?

Window functions like ROW_NUMBER() OVER (…) are supported as standalone columns in stream tables. However, embedding a window function inside an expression — such as CASE WHEN ROW_NUMBER() OVER (...) = 1 THEN ... or SUM(x) OVER (...) + 1 — is rejected.

This restriction exists because:

  1. Partition-based recomputation. pg_trickle's differential mode handles window functions by recomputing entire partitions that were affected by changes. When a window function is buried inside an expression, the DVM engine cannot isolate the window computation from the surrounding expression, making it impossible to correctly identify which partitions to recompute.

  2. Expression tree ambiguity. The DVM parser would need to differentiate the outer expression (arithmetic, CASE, etc.) while treating the inner window function specially. This creates a combinatorial explosion of differentiation rules for every possible expression type × window function combination.

Rewrite:

-- Instead of:
SELECT id, CASE WHEN ROW_NUMBER() OVER (PARTITION BY dept ORDER BY salary DESC) = 1
                THEN 'top' ELSE 'other' END AS rank_label
FROM employees

-- Move window function to a separate column, then use a wrapping stream table:
-- ST1:
SELECT id, dept, salary,
       ROW_NUMBER() OVER (PARTITION BY dept ORDER BY salary DESC) AS rn
FROM employees

-- ST2 (references ST1):
SELECT id, CASE WHEN rn = 1 THEN 'top' ELSE 'other' END AS rank_label
FROM pgtrickle.employees_ranked

Why is FOR UPDATE / FOR SHARE rejected?

FOR UPDATE and related locking clauses (FOR SHARE, FOR NO KEY UPDATE, FOR KEY SHARE) acquire row-level locks on selected rows. This is incompatible with stream tables because:

  1. Refresh semantics. Stream table contents are managed by the refresh engine using bulk MERGE operations. Row-level locks taken during the defining query would conflict with the refresh engine's own locking strategy.

  2. No direct DML. Since users cannot directly modify stream table rows, there is no use case for locking rows inside the defining query. The locks would be held for the duration of the refresh transaction and then released, serving no purpose.

How does ALL (subquery) work?

ALL (subquery) comparisons (e.g., WHERE x > ALL (SELECT y FROM t)) are supported via an automatic rewrite to NOT EXISTS. For example, x > ALL (SELECT y FROM t) is rewritten to NOT EXISTS (SELECT 1 FROM t WHERE y >= x), which pg_trickle handles via its anti-join operator.
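
Spelled out with illustrative names:

-- This defining query:
SELECT p.id FROM products p
WHERE p.price > ALL (SELECT price FROM competitor_prices)

-- Is rewritten internally to:
SELECT p.id FROM products p
WHERE NOT EXISTS (
    SELECT 1 FROM competitor_prices c WHERE c.price >= p.price
)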

Why is ORDER BY silently discarded?

ORDER BY in the defining query is accepted but ignored. This is consistent with how PostgreSQL treats CREATE MATERIALIZED VIEW AS SELECT ... ORDER BY ... — the ordering is not preserved in the stored data.

Stream tables are heap tables with no guaranteed row order. The ORDER BY in the defining query would only affect the order of the initial INSERT, which has no lasting effect. Apply ordering when querying the stream table:

-- This ORDER BY is meaningless in the defining query:
'SELECT region, SUM(amount) AS total FROM orders GROUP BY region ORDER BY total DESC'

-- Instead, order when reading:
SELECT * FROM regional_totals ORDER BY total DESC

Why are unsupported aggregates (CORR, COVAR_*, REGR_*) limited to FULL mode?

Regression aggregates like CORR, COVAR_POP, COVAR_SAMP, and the REGR_* family require maintaining running sums of products and squares across the entire group. Unlike COUNT/SUM/AVG (where deltas can be computed from the change alone) or group-rescan aggregates (where only affected groups are re-read), regression aggregates:

  1. Lack algebraic delta rules. There is no closed-form way to update a correlation coefficient from a single row change without access to the full group's data.

  2. Would degrade to group-rescan anyway. Even if supported, the implementation would need to rescan the full group from source — identical to FULL mode for most practical group sizes.

These aggregates work fine in FULL refresh mode, which re-runs the entire query from scratch each cycle.


Why Are These Stream Table Operations Restricted?

Stream tables are regular PostgreSQL heap tables under the hood, but their contents are managed exclusively by the refresh engine. This section explains why certain operations that work on ordinary tables are disallowed or unsupported on stream tables, and what to do instead.

Why can't I INSERT, UPDATE, or DELETE rows in a stream table?

Stream table contents are the output of the refresh engine — they represent the materialized result of the defining query at a specific point in time. Direct DML would corrupt this contract in several ways:

  1. Row ID integrity. Every row has a __pgt_row_id (a 64-bit xxHash of the group-by key or all columns). The refresh engine uses this for delta MERGE — matching incoming deltas against existing rows. A manually inserted row with an incorrect or duplicate __pgt_row_id would cause the next differential refresh to produce wrong results (double-counting, missed deletes, or merge conflicts).

  2. Frontier inconsistency. Each refresh records a frontier — a set of per-source LSN positions that represent "data up to this point has been materialized." A manual DML change is not tracked by any frontier. The next differential refresh would either overwrite the change (if the delta touches the same row) or leave the stream table in a state that doesn't match any consistent point-in-time snapshot of the source data.

  3. Change buffer desync. The CDC triggers on source tables write changes to buffer tables. The refresh engine reads these buffers and advances the frontier. Manual DML on the stream table bypasses this pipeline entirely — the buffer and frontier have no record of the change, so future refreshes cannot account for it.

If you need to post-process stream table data, create a view or a second stream table that references the first one.

Why can't I add foreign keys to or from a stream table?

Foreign key constraints require that referenced/referencing rows exist at the time of each DML statement. The refresh engine violates this assumption:

  1. Bulk MERGE ordering. A differential refresh executes a single MERGE INTO statement that applies all deltas (inserts and deletes) atomically. PostgreSQL evaluates FK constraints row-by-row within this MERGE. If a parent row is deleted and a new parent inserted in the same delta batch, the child FK check may fail because it sees the delete before the insert — even though the final state would be consistent.

  2. Full refresh uses TRUNCATE + INSERT. In FULL mode, the refresh engine truncates the stream table and re-inserts all rows. TRUNCATE does not fire individual DELETE triggers and bypasses FK cascade logic, which would leave referencing tables with dangling references.

  3. Cross-table refresh ordering. If stream table A has an FK referencing stream table B, both tables refresh independently (in topological order, but in separate transactions). There is no guarantee that A's refresh sees B's latest data — the FK constraint could transiently fail between refreshes.

Workaround: Enforce referential integrity in the consuming application or use a view that joins the stream tables and validates the relationship.

How do user-defined triggers work on stream tables?

When a DIFFERENTIAL mode stream table has user-defined row-level triggers, the refresh engine uses explicit DML decomposition instead of MERGE:

  1. Delta materialized once. The delta query result is stored in a temporary table (__pgt_delta_<id>) to avoid evaluating it three times.

  2. DELETE removed rows. Rows in the stream table whose __pgt_row_id is absent from the delta are deleted. AFTER DELETE triggers fire with correct OLD values.

  3. UPDATE changed rows. Rows whose __pgt_row_id exists in both the stream table and delta but whose values differ (checked via IS DISTINCT FROM) are updated. AFTER UPDATE triggers fire with correct OLD and NEW. No-op updates (where values are identical) are skipped, preventing spurious triggers.

  4. INSERT new rows. Rows in the delta whose __pgt_row_id is absent from the stream table are inserted. AFTER INSERT triggers fire with correct NEW values.

FULL refresh behavior: Row-level user triggers are automatically suppressed during FULL refresh via DISABLE TRIGGER USER / ENABLE TRIGGER USER. A NOTIFY pgtrickle_refresh is emitted so listeners know a FULL refresh occurred. Use REFRESH MODE DIFFERENTIAL for stream tables that need per-row trigger semantics.

Performance: The explicit DML path adds ~25–60% overhead compared to MERGE for triggered stream tables. Stream tables without user triggers have zero overhead (only a fast pg_trigger check, <0.1 ms).

Control: The pg_trickle.user_triggers GUC controls this behavior:

  • auto (default): detect user triggers automatically
  • off: always use MERGE, suppressing triggers
  • on: deprecated compatibility alias for auto
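
A minimal sketch of attaching an audit trigger to a DIFFERENTIAL stream table; the function audit_order_totals() is hypothetical:

CREATE OR REPLACE FUNCTION audit_order_totals() RETURNS trigger AS $$
BEGIN
    -- TG_OP is 'INSERT', 'UPDATE', or 'DELETE', set correctly by the explicit DML path
    RAISE LOG 'order_totals refresh applied %', TG_OP;
    RETURN NULL;  -- return value is ignored for AFTER row-level triggers
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER order_totals_audit
AFTER INSERT OR UPDATE OR DELETE ON pgtrickle.order_totals
FOR EACH ROW EXECUTE FUNCTION audit_order_totals();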

Why can't I ALTER TABLE a stream table directly?

Stream table metadata (defining query, schedule, refresh mode) is stored in the pg_trickle catalog (pgtrickle.pgt_stream_tables). A direct ALTER TABLE would change the physical table without updating the catalog, causing:

  1. Column mismatch. If you add or remove columns, the refresh engine's cached delta query and MERGE statement would reference columns that no longer exist (or miss new ones), causing runtime errors.

  2. __pgt_row_id invalidation. The row ID hash is computed from the defining query's output columns. Altering the table schema without updating the defining query would make existing row IDs inconsistent with the new column set.

Use pgtrickle.alter_stream_table() to change schedule, refresh mode, or status. To change the defining query or column structure, drop and recreate the stream table.

Why can't I TRUNCATE a stream table?

TRUNCATE removes all rows instantly but does not update the pg_trickle frontier or change buffers. After a TRUNCATE:

  1. Differential refresh sees no changes. The frontier still records the last-processed LSN. No new source changes may have occurred, so the next differential refresh produces an empty delta — leaving the stream table empty even though the source still has data.

  2. No recovery path for differential mode. The refresh engine has no way to detect that the stream table was externally truncated. It assumes the current contents match the frontier.

Use pgtrickle.refresh_stream_table('my_table') to force a full re-materialization, or drop and recreate the stream table if you need a clean slate.

What are the memory limits for delta processing?

The differential refresh path executes the delta query as a single SQL statement. For large batches (e.g., a bulk UPDATE of 10M rows), PostgreSQL may attempt to materialize the entire delta result set in memory. If the delta exceeds work_mem, PostgreSQL will spill to temporary files on disk, which is slower but safe. In extreme cases, OOM (out of memory) can occur if work_mem is set very high and the delta is enormous.

Mitigations:

  1. Adaptive fallback. The pg_trickle.differential_max_change_ratio GUC (default 0.15) automatically triggers a FULL refresh when the ratio of pending changes to total rows exceeds the threshold. This prevents large deltas from consuming excessive memory.

  2. work_mem tuning. PostgreSQL's work_mem setting controls how much memory each sort/hash operation uses before spilling to disk. For pg_trickle workloads, 64–256 MB is typical. Monitor temp_blks_written in pg_stat_statements to detect spilling.

  3. pg_trickle.merge_work_mem_mb GUC. Sets a session-level work_mem override during the MERGE execution (default: 0 = use global work_mem). This allows higher memory for refresh without affecting other queries.

  4. Monitoring. If pg_stat_statements is installed, pg_trickle logs a warning when the MERGE query writes temporary blocks to disk.
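
For example, mitigation 3 can be applied cluster-wide so refreshes get 256 MB for the MERGE while other queries keep the global work_mem:

ALTER SYSTEM SET pg_trickle.merge_work_mem_mb = 256;
SELECT pg_reload_conf();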

Why are refreshes processed sequentially by default?

The default (parallel_refresh_mode = 'off') is sequential because it is simple, correct, and efficient for most workloads. Topological ordering guarantees upstream stream tables refresh before downstream ones with no coordination overhead.

When to consider enabling parallel refresh:

  • Your database has many independent stream tables (no shared dependencies).
  • Total cycle time equals the sum of all refresh durations, and slow refreshes visibly block unrelated ones.
  • You have enough max_worker_processes headroom (each parallel worker uses one slot).

Enabling parallel refresh (v0.4.0+):

ALTER SYSTEM SET pg_trickle.parallel_refresh_mode = 'on';
SELECT pg_reload_conf();

With parallel_refresh_mode = 'on', the scheduler builds an execution-unit DAG and dispatches independent units to dynamic background workers. Atomic consistency groups and IMMEDIATE-trigger closures remain single-worker for correctness.

See CONFIGURATION.md — Parallel Refresh for tuning guidance including the worker-budget formula.

How many connections does pg_trickle use?

pg_trickle uses the following PostgreSQL connections:

Component               Connections   When
Background scheduler    1             Always (per database with STs)
WAL decoder polling     0 (shared)    Uses the scheduler's SPI connection
Manual refresh          1             Per-call (uses caller's session)

Total: 1 persistent connection per database. WAL decoder polling shares the scheduler's SPI connection rather than opening separate connections.

max_worker_processes: pg_trickle registers 1 background worker per database during _PG_init(). Ensure max_worker_processes (default 8) has room for the pg_trickle worker plus any other extensions.

Advisory locks: The scheduler holds a session-level advisory lock per actively-refreshing ST. These are released immediately after each refresh completes.
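
To see the pg_trickle backends for yourself (on v0.59.0 and later the workers set application_name — see the changelog below; on older releases, filter on query text instead):

SELECT pid, application_name, state
FROM pg_stat_activity
WHERE application_name LIKE 'pg_trickle%';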


See also: Troubleshooting · Error Reference · Configuration · SQL Reference · Glossary

What's New

A curated, plain-language summary of recent pg_trickle releases — the bits a human reader actually wants to see. For the full exhaustive list of changes per release, see the Changelog.


v0.34 — Citus self-driving (April 2026)

The Citus integration grew up. The per-worker WAL slot lifecycle — creation, polling, lease management, recovery from rebalances — now runs automatically. There is no manual wiring left for distributed sources.

  • Per-worker slot lifecycle fully automated (CITUS)
  • Shard-rebalance auto-recovery
  • Worker failure isolation with retry budget

v0.33 — DAG observability + worker-pool quotas

  • Per-database worker quotas keep one busy database from starving the rest (SCALING)
  • New cluster-wide health view (MULTI_DATABASE)

v0.32 — Citus distributed sources & outputs

  • Stream tables can read from Citus-distributed source tables
  • output_distribution_column produces co-located distributed stream tables

v0.28 — Transactional Outbox & Inbox

  • First-class outbox and inbox patterns built on stream tables (OUTBOX · INBOX)
  • Consumer groups, lag tracking, and dead-letter queues out of the box

v0.27 — Snapshots & SLA-based scheduling

  • Snapshots of stream-table contents — point-in-time copies for backup, replica bootstrap, and rollback
  • recommend_schedule and predicted-SLA-breach alerts
  • PITR alignment guidance for replica bootstrap

v0.22 — Downstream Publications

  • Any stream table can be exposed as a PostgreSQL logical publication. Debezium, Kafka Connect, Spark Structured Streaming, a downstream PostgreSQL replica — all subscribe to pg_trickle's incrementally-computed diffs without extra pipelines (PUBLICATIONS)
  • set_stream_table_sla introduces freshness deadlines

v0.14 — AUTO mode by default + ergonomic warnings

  • refresh_mode = 'AUTO' is the new default
  • create_stream_table warns on common anti-patterns (low-cardinality aggregates, non-deterministic queries)

v0.13 — Delta SQL profiling

  • pgtrickle.explain_delta, dedup_stats, shared_buffer_stats — visibility into what the engine is actually doing per refresh

v0.12 — Tiered scheduling on by default

v0.10 — Production-readiness floor

  • Crash recovery, fuse circuit breaker, monitoring views, structured errors with SQLSTATE codes (ERRORS · TROUBLESHOOTING)

v0.9 — Algebraic aggregate maintenance

  • AVG, STDDEV, and COUNT(DISTINCT) maintained from auxiliary state — no group-rescan needed in the common case

v0.7 — Watermarks and circular DAGs

v0.6 — Partitioned source tables and idempotent DDL

  • Stream tables can now read from partitioned source tables; all partitions are tracked automatically without extra configuration
  • create_stream_table and drop_stream_table are idempotent — safe to call from migration scripts and IF NOT EXISTS guards
  • Circular dependency detection with a hard gate: cycles in the DAG raise a clear error with the offending chain listed

v0.5 — Row-level security and ETL bootstrap gating

  • RLS policies on source tables are respected during the defining query's first FULL refresh; incremental refreshes maintain the same visibility contract
  • ETL bootstrap gate: a stream table can be held in SUSPENDED state until an external ETL load completes, then released atomically
  • pgtrickle.pgt_status() view expanded with per-table health indicators

v0.4 — Parallel refresh

  • parallel_refresh_mode = 'on' dispatches independent stream tables across a worker pool (SCALING)

v0.3 — HAVING, FULL OUTER JOIN, and correlated subqueries

  • HAVING clauses are now maintained differentially — no more falling back to FULL refresh when a GROUP BY result is post-filtered
  • FULL OUTER JOIN supported in DIFFERENTIAL mode using an 8-part UNION ALL delta strategy
  • Correlated subqueries in the SELECT-list maintained with a pre/post snapshot EXCEPT ALL diff

v0.2 — IMMEDIATE mode + TopK

  • IMMEDIATE refresh mode: maintain stream tables inside the source DML's transaction
  • TopK stream tables: ORDER BY x LIMIT N
  • ALTER QUERY — change the defining query online

v0.1 — Differential foundation

  • Trigger-based CDC captures every INSERT, UPDATE, and DELETE into per-table change buffers within the source DML transaction — zero committed-change loss
  • Differential (incremental) and full refresh, with automatic fallback when a query is not IVM-eligible
  • Background scheduler with per-database workers
  • Initial monitoring views: pgt_stream_tables, pgt_refresh_history

See also: Changelog (full detail) · Roadmap (what's coming)


Changelog

What's new in pg_trickle — written for everyone, not just developers.

For future plans and upcoming features, see ROADMAP.md.


[0.59.0] — Performance & Observability

What's New

v0.59.0 delivers seven hot-path performance improvements and six new observability features. There are no user-visible SQL API changes; the only schema change is a new defining_query_hash catalog column used internally.

PERF-1: Batched CDC Buffer-Growth Monitoring

check_change_buffer_sizes() previously issued one SELECT count(*) SPI call per source table, proportional to the number of CDC-enabled stream tables. It now builds a single UNION ALL query and executes it in one SPI round-trip, reducing latency and lock overhead for deployments with many stream tables.
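
The batched query has roughly this shape (an illustrative sketch — the change-buffer table names here are hypothetical, and the generated SQL may differ):

SELECT 'public.orders' AS source, count(*) AS pending FROM pgtrickle.chg_orders
UNION ALL
SELECT 'public.line_items', count(*) FROM pgtrickle.chg_line_items;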

PERF-2: Defining-Query Hash Cached in Catalog

A new defining_query_hash BIGINT column on pgtrickle.pgt_stream_tables caches the Rust DefaultHasher digest of each stream table's defining query. Refresh cycles skip recomputing the hash; any ALTER that changes the query updates it atomically in the same SPI transaction.

PERF-3: Arc Shared Templates

All eight SQL template fields inside CachedMergeTemplate were changed from String to Arc<str>. Cache reads now clone a reference-counted pointer instead of copying the string data, reducing heap allocations on every cache hit.

PERF-4: Single MERGE_TEMPLATE_CACHE Borrow

The two consecutive MERGE_TEMPLATE_CACHE.with() calls that were needed to check both the non_monotonic flag and the is_deduplicated flag have been merged into a single peek() call that returns both values in one borrow, halving the thread-local lock traffic on the hot cache-hit path.

PERF-5: WAL Decoder UPDATE Vec Pre-Allocation

The five Vec accumulators in the WAL decoder's UPDATE-row handler now call Vec::with_capacity(num_columns) up front, eliminating the incremental reallocations that previously occurred for each column.

PERF-6: Frontier Borrow Instead of Clone

has_stream_table_source_changes() cloned the entire Frontier (a HashMap<Oid, Lsn>) when no frontier was stored yet. It now borrows a static empty Frontier via Frontier::empty_ref(), avoiding the allocation on every scheduler tick for stream tables with no CDC sources.

PERF-7: Diamond Detection Short-Circuit

detect_diamonds() in the DAG module now performs a lazy .next().is_some() intersection check before collecting the full shared-ancestor list. Branches that share no ancestors — the common case — exit immediately without allocating the result Vec.

OBS-1: CDC Lag Percentile Metrics

A ring-buffer sampler (CdcLagSampler, 256 slots, protected by PgLwLock) records CDC-to-refresh lag in milliseconds. Three new Prometheus gauges expose rolling percentiles: pg_trickle_cdc_lag_p50_seconds, pg_trickle_cdc_lag_p95_seconds, and pg_trickle_cdc_lag_p99_seconds.

OBS-2: Parallel Worker Utilisation Metrics

Two new counters make pool-worker pressure visible:

  • pg_trickle_parallel_queue_depth — jobs currently waiting for a free worker
  • pg_trickle_worker_idle_time_seconds_total — cumulative idle time across all workers

OBS-3: WAL Decoder Pending-Record Gauge

pg_trickle_wal_decoder_pending_records reports the number of logical-replication records buffered in the last WAL poll that have not yet been written to the CDC change buffer, useful for detecting WAL consumer backpressure.

OBS-4: Refresh Mode Ratio Counters

pg_trickle_refresh_mode_total{mode="differential"} and pg_trickle_refresh_mode_total{mode="full"} count every refresh cycle by mode. The ratio surfaces differential-to-full degradation before it impacts latency.

OBS-5: pg_stat_activity Application Names

Every background-worker connection now sets application_name immediately after connecting to SPI, making pg_trickle workers trivially identifiable in pg_stat_activity:

Connection                          application_name
Database-discovery launcher         pg_trickle_launcher
Per-database scheduler              pg_trickle_scheduler
Parallel refresh pool worker (N)    pg_trickle_pool_N
Parallel refresh dispatcher         pg_trickle_dispatcher

OBS-6: Backup & Restore Documentation

INSTALL.md now includes a dedicated Backup & Restore section explaining which schemas to include in pg_dump, how to validate catalog integrity after restore with pgtrickle.health_check(), and how to handle OID re-assignment with repair_stream_table().

Upgrade

ALTER EXTENSION pg_trickle UPDATE TO '0.59.0';

The upgrade script adds the defining_query_hash column with DEFAULT 0. Existing stream tables will recompute their hash on the next refresh and write it back via ALTER STREAM TABLE — no manual intervention is needed.


[0.58.0] — Security & Correctness Hardening

What's New

v0.58.0 closes all HIGH-severity findings from the v0.57.0 overall assessment (Report 12). No new SQL API surface is added — every change is a targeted security fix or correctness fix.

SEC-1/2: Ownership Checks for Outbox and Publication APIs

attach_outbox(), detach_outbox(), attach_embedding_outbox(), stream_table_to_publication(), and drop_stream_table_publication() now call check_stream_table_ownership() immediately after resolving stream table metadata. Previously, any role with EXECUTE privilege on these pgtrickle functions could attach an outbox or create a publication for a stream table owned by a different role. Non-owner callers now receive ERROR: must be owner of stream table.

COR-1: Multi-Column NOT IN + NULL Row Handling

The v0.55.0 multi-column IN rewrite now detects NULL constants on either side of the row constructor in NOT IN expressions. When detected, the AntiJoin rewrite is skipped and the original subquery-based execution path is used, emitting a diagnostic NOTICE. See LIMITATIONS.md for details.
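
The skip follows from standard SQL three-valued logic: a NULL inside a NOT IN row comparison yields UNKNOWN rather than TRUE, which a plain anti-join rewrite would wrongly treat as a match. A self-contained illustration:

-- Returns zero rows: (1, 0) compared with (1, NULL) is UNKNOWN, not TRUE,
-- so the row fails the WHERE clause even though the tuples are not equal
SELECT x
FROM (VALUES (1)) AS t(x)
WHERE (x, 0) NOT IN (VALUES (1, NULL::int));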

COR-2: Recursive-CTE Depth Guard in DIFFERENTIAL Mode

pg_trickle.ivm_recursive_max_depth GUC now applies consistently to both DIFFERENTIAL and IMMEDIATE modes. Previously only IMMEDIATE mode enforced the depth limit.

COR-3: WAL Decoder TOCTOU Advisory Lock

poll_source_changes() now acquires a pg_advisory_xact_lock keyed on the source OID before calling poll_wal_changes(), serialising the eligibility check and WAL consumption into an atomic unit.

COR-4: Compact-Buffer Lock Contention Is Observable

compact_change_buffer() now returns CompactionResult::Contended instead of Ok(0) when it cannot acquire the advisory lock, increments the new shared-memory counter pg_trickle_cdc_compact_contended_total, and exposes it via the Prometheus /metrics endpoint.

SEC-3: DDL Hook Escalates on SPI Failure

handle_alter_table() now retries find_downstream_pgt_ids() once on SPI error and, if the retry also fails, raises pgrx::error!() to block the originating ALTER TABLE rather than silently returning.

SEC-4: Schema Identifier Quoted in CDC Buffer Names

buffer_qualified_name_for_oid() now uses sql_builder::qualified() to properly quote the schema identifier in the change-buffer table path.

Upgrade Notes

No SQL schema changes. No ALTER EXTENSION migration is required.


[0.57.0] — Documentation Excellence

What's New

v0.57.0 completes the Documentation Excellence Arc. It delivers four new end-to-end tutorials, resolves all P2/P3 quality gaps from the Round 2 documentation audit, and applies a full consistency pass across all 83 documentation files.

New tutorials (P1):

  • docs/tutorials/FIRST_DASHBOARD.md — Build a real-time analytics dashboard backend over an e-commerce dataset: revenue by region, hourly order counts, top-10 products chain, and optional Grafana integration.
  • docs/tutorials/EVENT_SOURCING.md — Stream tables as CQRS read-model projections over an event-sourced write model: current order state, customer lifetime value, and inventory levels maintained incrementally.
  • docs/tutorials/BACKFILL_AND_MIGRATION.md — Zero-downtime migration from REFRESH MATERIALIZED VIEW to a stream table: pre-migration assessment, validate_query() check, parallel running, verification, cutover, and rollback.
  • docs/tutorials/SECURITY_HARDENING.md — Role separation, CDC trigger ownership, change-buffer protection, and audit logging; copy-paste SQL templates for all GRANT statements and a verification checklist.

Quality improvements (P2):

  • docs/SECURITY_GUIDE.md: Added "Copy-Paste Templates" section with CREATE ROLE and GRANT statements for pgtrickle_admin, pgtrickle_user, and pgtrickle_readonly.
  • docs/WHATS_NEW.md: Backfilled user-impact summaries for v0.1 through v0.7.
  • docs/tutorials/HYBRID_SEARCH_PATTERNS.md: Expanded patterns 2 (RLS-scoped) and 3 (tiered storage) to match quality of pattern 1; documented pg_trickle.enable_vector_agg GUC.
  • docs/tutorials/PER_TENANT_ANN_PATTERNS.md: Documented partition_key => 'HASH:<col>:<buckets>' syntax with a partition-count guide; expanded patterns 2–3 with full step-by-step examples.
  • docs/QUICKSTART_5MIN.md: Fixed display-text inconsistency on Installation link.
  • docs/PERFORMANCE_COOKBOOK.md: Added three worked examples to §13: (a) max_diff_ctes hit and recovery, (b) detecting when FULL beats DIFFERENTIAL via recommend_refresh_mode(), (c) deep-join chain and max_differential_joins.
  • docs/SECURITY_MODEL.md: Resolved supply-chain TODO items — filled current status or marked "Planned for v1.0" with implementation notes.

Polish (P3):

  • docs/FAQ.md: Converted plain-text GUC cross-references to markdown links pointing to CONFIGURATION.md anchors; added link to SQL_REFERENCE.md.
  • docs/DVM_OPERATORS.md: Added quick-reference table at the top (operator name, mode support, section anchor).
  • docs/tutorials/VECTOR_RAG_STARTER.md: Added full parameter breakdown for pgtrickle.embedding_stream_table() with a parameter table and examples.
  • docs/tutorials/tuning-refresh-mode.md: Added prose explanation of composite score thresholds (+0.15/−0.15) and dead-zone tuning.
  • docs/research/multi_db_refresh_broker.md: Added implementation status banner.

Consistency pass (DOC-CONS-28..31):

  • Terminology sweep: enforced stream table, differential refresh, change buffer, refresh frontier, CDC, DVM, DAG across all 83 docs files.
  • Capitalisation sweep: enforced pg_trickle lowercase, PostgreSQL (not Postgres), pgtrickle schema, pgrx lowercase.
  • Code style sweep: SQL keywords uppercase; pgtrickle. prefix on all function calls; language hints added to unlabelled code blocks.
  • Cross-link audit: verified all internal [text](path.md) links; fixed 7 broken links (USE_CASES.md, integrations/multi-tenant.md, and added docs/ESSENCE.md mdbook include).

[0.56.0] — Documentation Foundation

What's New

v0.56.0 is the first release of the Documentation Excellence Arc, resolving all findings from the Round 2 documentation audit (2026-05-11). It fixes three P0 blockers, completes two reference documents, and adds three new conceptual guides that bring the documentation to world-class standard before v1.0.

P0 fixes (breaking inaccuracies):

  • Fixed scripts/gen_catalogs.py: GUC names now correctly resolve to pg_trickle.* names instead of (registration pending — PGS_*). Rust types are converted to PostgreSQL type names (int4, float8, text). Stale garbage rows at the end of GUC_CATALOG.md are eliminated. The catalog now shows all 115 GUCs with correct names and types.
  • Fixed docs/CONFIGURATION.md: pg_trickle.parallel_refresh_mode now correctly documents its default as 'on' (changed from the stale 'off' which was the pre-v0.11.0 default).
  • Completed docs/ERRORS.md: Added documentation for 18 previously missing error variants across 6 new categories (Publication, SLA, CDC, Diagnostic, Snapshot, Outbox/pg_tide, Placeholder, DVM engine). All 39 PgTrickleError variants are now documented with SQLSTATE codes, descriptions, causes, and fixes.

Reference completeness:

  • docs/SQL_REFERENCE.md: Added working code examples for all 10 outbox/inbox consumer API functions (poll_outbox, commit_offset, extend_lease, seek_offset, consumer_heartbeat, consumer_lag, drop_consumer_group, outbox_rows_consumed, replay_inbox_messages, inbox_ordering_gaps).
  • docs/SQL_REFERENCE.md: Added full column-schema tables for all 7 previously undocumented catalog tables (pgt_outbox_config, pgt_consumer_groups, pgt_consumer_offsets, pgt_consumer_leases, pgt_inbox_config, pgt_inbox_ordering_config, pgt_inbox_priority_config).
  • docs/research/: Added standalone 3-paragraph abstracts to the three previously stub-only research documents (CUSTOM_SQL_SYNTAX.md, PG_IVM_COMPARISON.md, TRIGGERS_VS_REPLICATION.md).
  • docs/DVM_REWRITE_RULES.md: Added concrete before/after SQL examples for all 5 rewrite passes (view inlining, grouping sets expansion, EXISTS→anti/semi-join, scalar sublink hoisting, delta key restriction).
  • docs/introduction.md: Added 3 paragraphs explaining how pg_trickle works conceptually (CDC → delta SQL → MERGE cycle), plus a link to INSTALL.md.

New documents:

  • docs/MENTAL_MODEL.md: 8-section conceptual guide for developers who know SQL but not IVM. Covers the problem of full recomputation, delta semantics, change capture, delta SQL generation, algebraic operator classification, row identity, the refresh cycle, and DAG chaining.
  • docs/LIMITATIONS.md: Comprehensive reference of unsupported SQL constructs, DIFFERENTIAL mode constraints, source table restrictions, operational anti-patterns, and a "Will this work?" decision tree.
  • docs/PERFORMANCE_CHEATSHEET.md: Single-page quick reference with the three golden rules, top-10 GUC quick wins, 5 FULL-fallback patterns with rewrites, and refresh latency diagnostics.

Upgrade Notes

No SQL migration is required. Run ALTER EXTENSION pg_trickle UPDATE TO '0.56.0' or reinstall from packages. All changes are documentation and tooling only.

After upgrading, regenerate docs/GUC_CATALOG.md with:

python3 scripts/gen_catalogs.py

[0.55.0] — Final Pre-1.0 Polish

What's New

v0.55.0 is a focused polish release that lowers technical debt and improves observability ahead of the 1.0 stable label. All nine milestones deliver better diagnostics, cleaner code structure, and more operator-friendly documentation — without any SQL schema changes.

Changes

  • M-1 — Wider invalidation ring (shmem.rs, config.rs): Maximum ring capacity raised from 1,024 to 4,096; the GUC default is now 1,024, so deployments with many concurrent stream tables no longer drop events.

  • M-2 — API module decomposition (src/api/): api/mod.rs split into create.rs, alter.rs, and refresh_ops.rs. Each sub-module is now independently readable and testable.

  • M-3 — Monitor module decomposition (src/monitor/): monitor.rs split into alert.rs, health.rs, and tree.rs. Alert emission, health checks, and DAG tree rendering are now in separate, focused units.

  • M-4 — Structured NOTIFY payloads: All pg_notify calls now emit structured serde_json values instead of hand-built strings, making it easier to parse alert events in downstream consumers.

  • M-5 — Multi-column IN rewrite (src/dvm/parser/sublinks.rs): Row expressions and multi-target sub-selects in IN / NOT IN predicates are now automatically rewritten to AND-chained equality rather than returning an unsupported-syntax error.

  • M-6 — DVM parse metrics (src/shmem.rs, src/dvm/mod.rs): Two new shared-memory counters track cumulative DVM parse time (pg_trickle_dvm_parse_ms) and total delta SQL template size (pg_trickle_delta_query_size_bytes). Both are exposed via the Prometheus /metrics endpoint.

  • M-7 — Reserved column-name prefix docs (docs/SQL_REFERENCE.md): New "Reserved Column-Name Prefixes" section documents __pgt_* and __pgs_* internal prefixes and explains the consequences of naming conflicts.

  • M-8 — GUC rationale comments (src/config.rs): Every magic-number GUC default now has an inline comment explaining why that value was chosen and when operators should raise or lower it.

  • M-9 — Codecov upload in PR gate (.github/workflows/ci.yml): The Linux unit-test job now uploads coverage data to Codecov after each run. fail_ci_if_error: false ensures that a Codecov outage never blocks merges.

Upgrade

No SQL migration is required. Run ALTER EXTENSION pg_trickle UPDATE TO '0.55.0' or reinstall to pick up the new extension version string.


[0.54.0] — DVM Engine Hardening

What's New

v0.54.0 hardens the DVM (Differential View Maintenance) engine across seven dimensions: depth-limit enforcement, CTE-count cap, snapshot fingerprint caching, expression visitor pattern, view-inlining relkind cache, upstream frontier validation, and O(V+E) diamond detection. Every change is targeted at correctness and performance; no user-visible API surface changes.

Changed

C-7: diff_node() Recursion Depth Guard

diff_node() in src/dvm/diff.rs now enforces a hard depth limit drawn from the pg_trickle.max_parse_depth GUC (default 64). Exceeding the limit returns a new PgTrickleError::DiffDepthExceeded(limit) error with a user-actionable hint instead of overflowing the call stack.

R-7: DiffContext CTE Count Cap (OOM Guard)

DiffContext tracks the number of CTEs emitted during a single differentiation pass. When the count reaches the new pg_trickle.max_diff_ctes GUC (default 1000, range 10–100,000), diff_node() returns PgTrickleError::DiffCteCountExceeded(limit) before allocating further memory. This prevents pathological queries from exhausting server memory.

P-4: Snapshot Fingerprint Two-Level Cache

get_or_register_snapshot_cte() now uses a two-level cache: a fast pointer identity check (same OpTree node, O(1)) and a structural fingerprint check (equal subtrees, O(k)). Identical subtrees share a single CTE, eliminating redundant snapshot SQL generation for diamond-shaped query plans.

P-5: Expr::to_sql() Visitor Pattern

Expr::to_sql() now delegates to a new to_sql_into(&self, buf: &mut String) method that writes SQL directly into a pre-allocated buffer using push / push_str. Intermediate heap allocations for nested expressions are eliminated, reducing allocation pressure on large queries.

P-6: View-Inlining Relkind Cache

rewrite_views_inline_once() now passes a mutable HashMap<(schema,name), Option<relkind>> through the call chain. Each relkind lookup is cached for the duration of the rewrite pass, preventing repeated SPI catalog queries for the same relation within a single inlining iteration.

C-4: Upstream Stream-Table Frontier Validation

generate_delta_query() now validates that every upstream stream-table source referenced in a query has a corresponding entry in the provided refresh frontier. Missing entries return PgTrickleError::StSourceFrontierMissing with a clear message and the affected pgt_id, allowing the scheduler to reinitialize rather than silently producing incorrect delta results.

S-1: O(V+E) Diamond Detection

detect_diamonds() in src/dag.rs previously called collect_ancestors() per fan-in branch (O(V) per branch, O(V²) total for dense graphs). It now calls the new compute_all_ancestors() which traverses the DAG once in forward topological order, building all ancestor sets in O(V+E) total work. Per-branch ancestor lookup is then O(1) via the precomputed map.


[0.53.0] — Unit Test Depth Sweep

What's New

v0.53.0 fills the unit-test coverage gaps identified in the v0.51.0 overall assessment (Report 11, findings T-2 through T-9). Six scheduler and parser submodules that previously had zero inline #[test] coverage now each have a #[cfg(test)] block covering their pure logic. Property-based testing is extended to the DAG cycle detection and topological sort invariants. Two fixed sleeps in the buffer-growth E2E tests are replaced with adaptive polling.

Changed

Scheduler Module Unit Tests

Five scheduler submodules previously had zero inline unit tests. New #[cfg(test)] blocks have been added to:

  • dispatch.rs — parse_worker_extra (format validation, edge cases, rejected zero/negative job IDs) and compute_adaptive_poll_ms (exponential backoff, completion reset, no-inflight fast path).
  • pool.rs — pool_size_from_config_value (negative GUC values clamped to zero, positive values preserved).
  • watermark.rs — should_emit_holdback_warning pure rate-limit helper: disabled threshold, age threshold, 60-second cooldown, saturating subtraction on clock skew.
  • citus.rs — record_worker_failure / reset_worker_failure thread-local failure counter: increment, per-key isolation, reset-to-zero, no-op on missing key.
  • scheduler_loop.rs — Structural compile-check test (module contains only BGW entry points; E2E coverage in tests/e2e_bgworker_tests.rs).

DVM Parser Unit Tests

dvm/parser/sublinks.rs had zero inline unit tests. New tests cover:

  • extract_bare_scalar_subquery_sql — parenthesised SELECT, missing parens, whitespace trimming, case-insensitive SELECT detection.
  • is_known_aggregate — known built-ins, statistical, ordered-set, and range aggregates; unknown function names.
  • is_star_only — bare *, qualified t.*, empty slice, multi-expression.
  • rewrite_having_expr — COUNT(*) and SUM rewrites, non-matching functions, recursive rewrite inside BinaryOp, literal pass-through.
  • split_exists_correlation — simple equality extraction, non-correlation remaining predicates, AND conjunction splitting.
  • collect_tree_source_aliases — single Scan, InnerJoin, Filter, Subquery.

Proptest Extension (T-2)

New proptest! blocks in src/dag.rs cover four invariants:

  • Acyclic invariant — randomly generated chain DAGs of length 1–20 always pass detect_cycles().
  • Cyclic invariant — adding a single back-edge to any chain of length 2–20 is always detected as a cycle by detect_cycles().
  • Topological order invariant — for any acyclic chain, topological_order() places every upstream node before its downstream successor.
  • Back-edge invariant — any single back-edge added to an acyclic DAG creates a cycle (parameterised over both chain length and back-edge position).

Buffer-Growth Sleep Removal (T-8, T-9)

tests/e2e_buffer_growth_tests.rs contained two long fixed sleeps in the sustained-write test:

  • 7-second sleep replaced with db.wait_for_auto_refresh("sustained_st", 30s).
  • 20-second sleep replaced with db.wait_for_condition(...) polling until the stream table count matches the source count, with a 60-second cap.

[0.52.0] — DVM Hot-Path Performance

What's New

v0.52.0 eliminates four measurable hot-path costs in the DVM differential refresh pipeline, all identified in the v0.51.0 overall assessment (Report 11).

P-1: O(1) Placeholder Resolution (aho-corasick)

resolve_delta_template() previously called .replace() twice per source table OID, scanning the full SQL string for each placeholder. For a 10-table join (~50 KB SQL), this was 20 full-string scans per refresh cycle. v0.52.0 replaces the loop with a single-pass Aho-Corasick multi-pattern replacer that resolves all __PGS_PREV_LSN_*__ and __PGS_NEW_LSN_*__ tokens in one traversal — O(template_length) regardless of the number of source tables.

P-2: Thread-Local Volatility Cache

lookup_function_volatility() and lookup_operator_volatility() previously issued one SPI round-trip to pg_proc / pg_operator for every function or operator name encountered during DVM parsing. A query referencing 50 functions triggered 50 round-trips (~50 ms overhead). v0.52.0 adds thread-local HashMap<String, char> caches so each name is resolved via SPI at most once per backend session. The caches are flushed by pgtrickle.clear_caches().

P-3: Lazy DiffContext Allocations

DiffContext::new() previously initialized all maps unconditionally. agg_sum_coalesce_defaults — only needed for queries with COALESCE-wrapped aggregates — is now Option<HashMap<String, String>> and allocated lazily on first use. Simple scan/filter/project queries never allocate it.

P-8: O(1) MERGE Template Cache LRU Eviction

The MERGE template cache previously stored entries in a plain HashMap and found the least-recently-used entry by scanning all entries for the minimum last_used counter — O(N) per eviction. v0.52.0 replaces this with lru::LruCache, which provides O(1) eviction automatically on put().

C-1: Safety Fix in filter.rs HAVING Path

Replaced a bare .expect("BUG: …") in the HAVING-filter delta path with a proper PgTrickleError::InternalError return. An invariant violation now returns a clean error rather than crashing the backend.

Upgrade Notes

No SQL schema changes. No configuration changes required.


[0.51.0] — Citus Chaos Resilience & Documentation Truth

Breaking Changes

  • pg_trickle.event_driven_wake has been removed. This GUC had no effect since v0.39.0 because PostgreSQL's LISTEN command is not permitted inside background worker processes. Remove it from postgresql.conf and any ALTER SYSTEM settings to avoid an "unrecognized configuration parameter" warning on upgrade. No behavioral change — the scheduler always used efficient latch-based polling regardless of this setting.

  • pg_trickle.wake_debounce_ms has been removed. This GUC was only meaningful when event_driven_wake was functional (it never was). Remove it from postgresql.conf as well.

What's New

FEAT-10-01: Citus Chaos Test Rig

Three new chaos resilience scenarios for the Citus distributed integration, proving correctness under real production failure modes:

  • CHAOS-5 — Coordinator restart during active refresh: Creates a distributed stream table, starts a refresh, restarts the coordinator mid-flight, and verifies that 5 subsequent cycles produce the correct result with no phantom or missing rows.

  • CHAOS-6 — Worker kill with shard redistribution: Kills a worker node, triggers rebalance_table_shards(), inserts new rows on the remaining workers, and verifies that DIFFERENTIAL refresh produces a consistent result post-recovery. Asserts that CDC change buffers contain no orphaned records.

  • CHAOS-7 — Network partition and recovery: Uses docker network disconnect to isolate one worker, inserts rows on the remaining workers, reconnects the isolated worker, and verifies that the stream table converges to the correct state within 3 refresh cycles with no data loss.

All three tests are marked #[ignore] and run nightly in the stability-tests.yml workflow alongside the existing G17-SOAK and G17-MDB tests. Use just citus-chaos-up && just test-citus-chaos to run them locally.

CQ-10-02: Remove Deprecated event_driven_wake GUC

Removed the non-functional event_driven_wake and wake_debounce_ms GUCs and all associated dead code paths from the scheduler loop. The code that emitted a WARNING when event_driven_wake = on is gone. The scheduler log message at startup no longer includes the GUC value.

DOC-10-01: ARCHITECTURE.md — pg_tide Integration Boundary

Added a new § pg_tide Integration section to docs/ARCHITECTURE.md that clearly describes the v0.46.0 extraction boundary: what remains in pg_trickle (attach_outbox() hook, change buffer subscription interface) vs what lives in the standalone pg_tide extension (outbox, inbox, consumer groups, relay binary). Updated the module layout diagram to reflect the extraction.

DOC-10-03: ARCHITECTURE.md — Recursive CTE Strategy Selection

Added a new § Recursive CTE Strategy Selection subsection to the DVM Engine section documenting the five-tier strategy selection logic (Tier 1 inline expansion → Tier 2 shared delta → Tier 3a semi-naive → Tier 3b DRed → Tier 3c recomputation), a selection criteria table, observability via explain_stream_table(), and a concrete Tier 3a example for hierarchical closure queries.

DOC-10-02 + COR-10-02: Configuration Documentation Truth

  • CONFIGURATION.md: event_driven_wake and wake_debounce_ms sections replaced with clear removal notices. All tuning profiles, interaction matrix entries, and example configs updated to remove these GUCs.
  • CONFIGURATION.md: Added deprecation ⚠️ callouts for merge_planner_hints (accepted, no effect) and user_triggers = 'on' (deprecated alias for 'auto').
  • CONFIGURATION.md: Added a Note on CDC triggers to the pg_trickle.enabled section explaining that CDC triggers continue to fire when the scheduler is disabled, why this is intentional, and how to fully quiesce CDC overhead during extended maintenance.

[0.50.0] — Performance, Security & Operational Hardening

What's New

PERF-10-01: Batch preflight source-table existence check

  • Replaced the N-query per-OID loop in execute_differential_refresh with a single batch SELECT ... FROM unnest(ARRAY[oid1, oid2, ...]) that returns all source-table existence checks in one SPI round-trip.
  • Reduces preflight overhead from O(N) queries to O(1) for stream tables with multiple sources.

PERF-10-02: CDC trigger SQL string-building micro-optimisation

  • build_stmt_trigger_fn_sql now uses String::with_capacity + direct push_str loops instead of Vec<String>::join, eliminating intermediate allocations in the column list builders (cn, ncr, ocr).
  • Noticeable on high-throughput workloads that re-register triggers frequently.

PERF-10-03: Single-query watermark computation (already present; documented)

  • Confirmed that compute_safe_upper_bound() in src/cdc.rs already consolidates pg_current_wal_lsn(), pg_stat_activity xmin probe, and pg_prepared_xacts into one compound CTE SELECT. Added an explanatory comment referencing PERF-10-03.

SEC-10-01: Replace manual SQL string escaping with pg_catalog.quote_literal

  • All dblink(...) call sites in src/citus.rs now escape connection strings and remote query strings via a new pg_quote_literal() helper that delegates to PostgreSQL's built-in pg_catalog.quote_literal($1) function.
  • Eliminates the risk of SQL injection through attacker-controlled hostnames or slot names in Citus distributed setups.
  • The manual .replace('\'', "''") pattern has been removed from worker_conn_string() and all four dblink call sites.

OPS-10-01: Kubernetes rolling-upgrade drain hook (CNPG)

  • Added lifecycle.preStop hook to cnpg/cluster-production.yaml that runs pgtrickle.drain(timeout_s => 120) before CloudNativePG shuts down a primary pod during rolling upgrades.
  • New docs/RUNBOOK_DRAIN.md section documents the Kubernetes rolling-upgrade procedure and post-upgrade verification steps.

OPS-10-02: Prometheus reliability counters

  • Three new shared-memory atomics in src/shmem.rs:
    • TEMPLATE_CACHE_STALE_EVICTIONS — incremented when a delta template cache entry is evicted because its defining_query_hash no longer matches.
    • DAG_CYCLES_DETECTED — incremented each time detect_cycles() returns Err(CycleDetected).
  • src/dvm/mod.rs: Hash-mismatch stale entries are now detected and counted before being evicted from DELTA_TEMPLATE_CACHE.
  • New pgtrickle.reliability_counters() SQL function (in src/monitor.rs) exposes all three reliability counters as a single-row table.
  • New pg_trickle_reliability query block in monitoring/prometheus/pg_trickle_queries.yml for postgres_exporter.

OPS-10-03: Docker base-image digest pinning

  • All three Dockerfiles (Dockerfile.demo, Dockerfile.ghcr, tests/Dockerfile.e2e) now pin postgres:18.3-bookworm to an exact SHA256 digest, providing supply-chain security and reproducible builds.
  • New scripts/update_base_image_digests.sh automates quarterly digest refreshes.
  • CONTRIBUTING.md documents the update process.

SCAL-10-01: Invalidation ring capacity documentation

  • New docs/CONFIGURATION.md section documents pg_trickle.invalidation_ring_capacity (default 128, hard ceiling 1024), overflow behaviour, the overflow counter, and capacity guidance for deployments with 1,000+ stream tables.

COR-10-01: Deep join chain threshold documentation

  • New docs/CONFIGURATION.md section documents pg_trickle.part3_max_scan_count (default 5), the Part 3 threshold trade-off between SQL complexity and delta correctness at depth, and recommendations for ≤6 vs. >6 table join chains.

SQL Upgrade

ALTER EXTENSION pg_trickle UPDATE TO '0.50.0';

[0.49.1] — Repository Migration to trickle-labs/pg-trickle

What's New

Repository Migration

  • pg_trickle has moved to its permanent home at trickle-labs/pg-trickle.
  • All CI/CD pipelines, Docker image publishing, and release artifacts now originate from the new repository.
  • GitHub Container Registry images are published under ghcr.io/trickle-labs/pg-trickle.
  • Docker Hub images are published under tricklehq/pg_trickle.
  • The PGXN distribution, dbt Hub package, and CloudNativePG plugin listings are updated to reflect the new repository URL.
  • No code changes — this is a pure packaging and infrastructure release.

SQL Upgrade

ALTER EXTENSION pg_trickle UPDATE TO '0.49.1';

[0.49.0] — Test Infrastructure Hardening & Scheduler Decomposition

What's New

TEST-10-01: Concurrency Test Synchronization Overhaul

  • Replaced all tokio::time::sleep busy-waits in tests/e2e_concurrent_tests.rs with pg_stat_activity-polling loops that wait until the target query is actually visible before proceeding.
  • Added wait_for_active_query helper with configurable timeout and a clear failure message so flakiness surfaces as a named error rather than a silent pass.
  • Affected tests: test_pb1_concurrent_refresh_skip_locked_no_corruption, test_concurrent_refresh_and_drop, test_conc1_alter_while_refresh, test_conc2_drop_while_refresh.

TEST-10-02: Unit Test Coverage Sweep

  • Added #[cfg(test)] modules to src/template_cache.rs, src/cdc/polling.rs, and src/cdc/rebuild.rs — modules that previously had zero unit test coverage.
  • New tests cover hash key derivation and round-trip correctness, CDC trigger naming conventions, CDC mode classification, replica identity sufficiency checks, and cache guard condition logic.

TEST-10-03: Fuzz Targets for Merge Codegen and Row Identity

  • Added fuzz/fuzz_targets/merge_sql_fuzz.rs — fuzzes the merge SQL construction pipeline (pg_quote_literal, parse_hash_bound_spec, extract_keyword_int, amplification ratio, build_content_hash_expr). Validates no panics, UTF-8 output, and deterministic results.
  • Added fuzz/fuzz_targets/row_id_fuzz.rs — fuzzes the row identity schema classifier (is_compatible_with, verify_pipeline). Validates reflexivity and that no byte sequence causes a panic.
  • Both targets registered in fuzz/Cargo.toml and the just fuzz-all recipe.

TEST-10-04: DDL During Concurrent Refresh E2E Test

  • Added test_ddl_during_concurrent_refresh to tests/e2e_concurrent_tests.rs. Fires ALTER STREAM TABLE concurrently with a running refresh and asserts either graceful completion or correct blocking — no torn state.

CI-10-02: Expanded e2e-Smoke Filter

  • The PR smoke test now also matches test_.*join.*, test_.*aggregate.*, test_.*window.*, and test_.*subquery.* patterns, catching operator-level regressions earlier.

CI-10-03: Consolidated Fuzz Recipe

  • Added just fuzz-all to the justfile — runs every fuzz target for a configurable duration (default 60 s each).
  • Documented all fuzz targets and corpus paths in CONTRIBUTING.md.

CQ-10-01: Scheduler Module Decomposition

  • src/scheduler/mod.rs was 6,700+ lines. Extracted into three focused submodules:
    • src/scheduler/dispatch.rs — parallel dispatch state, dynamic worker spawn, worker claiming, orphan reaping, adaptive poll-interval logic.
    • src/scheduler/scheduler_loop.rs — BGW registration, launcher main loop, per-database scheduler main loop.
    • src/scheduler/watermark.rs — tick watermark computation, xmin holdback, frontier advance helpers.
  • mod.rs is now a thin re-export façade. All existing public API is preserved with no behaviour change.

SQL Upgrade

ALTER EXTENSION pg_trickle UPDATE TO '0.49.0';

[0.48.0] — Complete Embedding Programme: Hybrid Search, Sparse Vectors & Ergonomic API

What's New

VH-1: Sparse and Half-Precision Vector Aggregates

  • avg(halfvec_col) and avg(sparsevec_col) stream tables now produce output columns typed halfvec(N) and sparsevec(N) respectively — no silent coercion to vector anymore.
  • The DVM engine correctly propagates vector type names through extract_vector_agg_output_dims.

VH-2: Reactive Distance Subscriptions

  • New functions: pgtrickle.subscribe_distance(stream_table, channel, vector_column, query_vector, op, threshold), pgtrickle.unsubscribe_distance(stream_table, channel), and pgtrickle.list_distance_subscriptions(stream_table).
  • After each refresh, the scheduler fires NOTIFY on registered channels when rows in the storage table satisfy the distance predicate.
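
A usage sketch (argument values are hypothetical; the signature comes from the list above, and pgvector's vector type is assumed):

-- Fire NOTIFY on 'near_query' when a refreshed row's embedding is within
-- L2 distance 0.25 of the probe vector
SELECT pgtrickle.subscribe_distance(
    'doc_embeddings', 'near_query', 'embedding',
    '[0.1, 0.2, 0.3]'::vector, '<->', 0.25
);
LISTEN near_query;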

VH-3: Hybrid-Search Cookbook

VH-4: Vector Benchmark Suite

  • New benchmark: benches/pgvector_bench.rs — measures OpTree construction, AggFunc dispatch, vector string encoding, and drift-detection overhead.

VA-1: embedding_stream_table() Ergonomic API

  • New function: pgtrickle.embedding_stream_table(name, source_table, vector_column, extra_columns, refresh_interval, index_type, dry_run).
  • Automatically generates a stream table, creates an HNSW or IVFFlat index, and configures post-refresh drift monitoring.
  • dry_run => true returns the generated SQL without executing it.

VA-2: Materialised k-NN Graph Research

VA-3: Multi-Tenant ANN Patterns

VA-4: Embedding Outbox

  • New function: pgtrickle.attach_embedding_outbox(stream_table, vector_column, retention_hours, inline_threshold_rows).
  • Extends outbox events with event_type: "embedding_change" and the vector_column name in event headers.

VA-5: Vector RAG Starter Guide

SQL Upgrade

ALTER EXTENSION pg_trickle UPDATE TO '0.48.0';

Direct upgrade scripts are provided from v0.40.0 onward.


[0.47.0] — Embedding Pipeline Infrastructure & ANN Maintenance

⚠ Upgrade support policy change (v0.47.0+)

Starting from v0.47.0, pg_trickle provides direct upgrade scripts only for v0.40.0 and later. If you are running v0.39.0 or older, you must first upgrade to v0.40.0 before upgrading to v0.47.0 or later:

-- Users on v0.39.x or older: upgrade to v0.40.0 first
ALTER EXTENSION pg_trickle UPDATE TO '0.40.0';
-- Then upgrade to the latest version
ALTER EXTENSION pg_trickle UPDATE;

Both steps can be issued in the same session. PostgreSQL handles the intermediate chain automatically. Users already on v0.40.0 or later are unaffected — a single ALTER EXTENSION pg_trickle UPDATE is all that is needed.

v0.47.0 resumes the deferred embedding programme with post-refresh action hooks, drift-based HNSW reindex scheduling, vector-aware monitoring, and the pgvector RAG cookbook.

Post-Refresh Actions (VP-1)

Stream tables can now specify what happens after a successful refresh that produces changed rows:

-- Run ANALYZE after each refresh (keep statistics fresh)
SELECT pgtrickle.alter_stream_table(
    'embedding_store',
    post_refresh_action => 'analyze'
);

-- Always REINDEX the storage table after each refresh
SELECT pgtrickle.alter_stream_table(
    'embedding_store',
    post_refresh_action => 'reindex'
);

-- REINDEX only when the drift threshold is exceeded
SELECT pgtrickle.alter_stream_table(
    'embedding_store',
    post_refresh_action     => 'reindex_if_drift',
    reindex_drift_threshold => 0.20   -- 20% of rows changed
);

The action runs outside the refresh transaction so it does not add latency to the critical refresh window. The four supported values are none (default), analyze, reindex, and reindex_if_drift.

Drift Detection (VP-2)

Two new catalog columns track ANN index freshness:

  • rows_changed_since_last_reindex — running count of rows changed since the last REINDEX, reset to 0 after each successful REINDEX.
  • last_reindex_at — timestamp of the last pg_trickle-triggered REINDEX.

A new GUC pg_trickle.reindex_drift_threshold (default 0.20) sets the global default fraction; per-table overrides via reindex_drift_threshold take precedence.

Vector Status View (VP-3)

SELECT * FROM pgtrickle.vector_status();

Returns one row per stream table with a non-none post_refresh_action:

Column                             Description
name                               Schema-qualified stream table name
post_refresh_action                Configured action
reindex_drift_threshold            Per-table threshold (NULL = global GUC)
rows_changed_since_last_reindex    Rows changed since last REINDEX
last_reindex_at                    When the last REINDEX completed
data_timestamp                     When the stream table data was last updated
embedding_lag                      Interval since last refresh
estimated_rows                     PostgreSQL reltuples estimate
drift_pct                          Percentage of rows changed (NULL if no estimate available)

pgvector RAG Cookbook (VP-4)

docs/tutorials/PGVECTOR_RAG_COOKBOOK.md — copy-paste patterns for:

  • Pre-computed embeddings with always-fresh search corpus
  • Tenant-isolated embedding corpus with RLS
  • Drift-aware HNSW reindexing
  • Centroid maintenance for cluster-aware search
  • Operational sizing guidance and monitoring queries

New SQL Functions

  • pgtrickle.vector_status() — embedding lag, ANN age, drift percentage

New Catalog Columns

pgtrickle.pgt_stream_tables:

  • post_refresh_action TEXT NOT NULL DEFAULT 'none'
  • reindex_drift_threshold DOUBLE PRECISION
  • rows_changed_since_last_reindex BIGINT NOT NULL DEFAULT 0
  • last_reindex_at TIMESTAMPTZ

New GUCs

  • pg_trickle.reindex_drift_threshold (default: 0.20) — global default drift fraction for drift-triggered REINDEX

Upgrade Notes

Existing stream tables keep post_refresh_action = 'none' after upgrade — no behaviour change unless explicitly configured.


[0.46.0] — Extract pg_tide: Standalone Outbox, Inbox & Relay

v0.46.0 is a focused extraction release. The full transactional outbox, inbox, and relay subsystem (~6,150 Rust LOC + ~2,500 SQL LOC) has been moved to the new standalone pg_tide extension (trickle-labs/pg-tide). pg_trickle now ships exactly one thing: incremental view maintenance.

The only remaining integration point is attach_outbox(), which registers a pg_tide outbox for a stream table. After attachment, every non-empty refresh calls tide.outbox_publish() inside the same transaction — preserving the ADR-001/ADR-002 single-transaction atomicity guarantee.

New SQL Functions

  • TIDE-7: pgtrickle.attach_outbox(stream_table, retention_hours=>24, inline_threshold_rows=>10000) — requires pg_tide to be installed; calls tide.outbox_create() and registers the mapping in pgtrickle.pgt_outbox_config. Every subsequent non-empty refresh writes a delta-summary row to the pg_tide outbox inside the same transaction.

  • TIDE-7: pgtrickle.detach_outbox(stream_table, if_exists=>false) — removes the pgt_outbox_config entry. The pg_tide outbox table itself is NOT dropped; use tide.outbox_drop() in pg_tide after detaching to also remove the outbox data.
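
A usage sketch of the integration point ('order_summary' is a placeholder stream table; pg_tide must already be installed):

-- Attach with the default retention and inline threshold, later detach
-- without dropping the underlying pg_tide outbox table
SELECT pgtrickle.attach_outbox('order_summary');
SELECT pgtrickle.detach_outbox('order_summary', if_exists => true);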

Removed SQL Functions

The following functions were moved to pg_tide (trickle-labs/pg-tide):

Outbox & Consumer Groups: enable_outbox, disable_outbox, outbox_status, outbox_rows_consumed, create_consumer_group, drop_consumer_group, poll_outbox, commit_offset, extend_lease, seek_offset, consumer_heartbeat, consumer_lag

Inbox: create_inbox, drop_inbox, enable_inbox_tracking, inbox_health, inbox_status, replay_inbox_messages, enable_inbox_ordering, disable_inbox_ordering, enable_inbox_priority, disable_inbox_priority, inbox_ordering_gaps, inbox_is_my_partition

Relay: set_relay_outbox, set_relay_inbox, enable_relay, disable_relay, delete_relay, get_relay_config, list_relay_configs

Removed Catalog Tables

Dropped as part of the extraction: relay_outbox_config, relay_inbox_config, relay_consumer_offsets, pgt_inbox_config, pgt_inbox_ordering_config, pgt_inbox_priority_config, pgt_consumer_groups, pgt_consumer_offsets, pgt_consumer_leases. The pgtrickle_relay role is also dropped. pgtrickle.pgt_outbox_config is replaced with a slim integration schema.

GUC Changes

The following GUCs are removed (all moved to pg_tide): pg_trickle.outbox_enabled, pg_trickle.outbox_retention_hours, pg_trickle.outbox_drain_batch_size, pg_trickle.outbox_inline_threshold_rows, pg_trickle.outbox_drain_interval_seconds, pg_trickle.outbox_storage_critical_mb, pg_trickle.outbox_skip_empty_delta, pg_trickle.outbox_force_retention, pg_trickle.inbox_enabled, pg_trickle.inbox_processed_retention_hours, pg_trickle.inbox_dlq_retention_hours, pg_trickle.inbox_drain_batch_size, pg_trickle.inbox_drain_interval_seconds, pg_trickle.inbox_dlq_alert_max_per_refresh, pg_trickle.consumer_dead_threshold_hours

Upgrade Notes

Run pg_trickle--0.45.0--0.46.0.sql to drop all removed objects and migrate pgt_outbox_config to the new schema. Base outbox payload tables (pgtrickle.outbox_<st>) are not dropped — they remain for manual data migration to pg_tide. See the pg_tide repository for migration guidance.

New: pg_tide Extension

The extracted functionality is now available as pg_tide, a standalone PostgreSQL extension at https://github.com/trickle-labs/pg-tide. It includes:

  • Transactional outbox with claim-check mode
  • Idempotent inbox with DLQ, priority, and ordering
  • The pg-tide relay binary (NATS, Kafka, SQS, webhooks, stdout)
  • Consumer group API (poll, commit, heartbeat, lag)

[0.45.0] — Operational Readiness, Scalability & CI Completeness

v0.45.0 is an operational and CI maturity release. It adds a first-class preflight() health-check function, enhances the worker pool status view, makes the invalidation ring capacity configurable, adds lag-aware scheduling, introduces incremental DAG rebuild for faster event propagation, completes dbt macro option parity, and substantially tightens CI coverage.

New SQL Functions

  • A46-4: pgtrickle.preflight() — returns a JSON health report with 7 system checks: shared_preload_libraries presence, scheduler running, max_worker_processes sufficiency, wal_level for WAL-CDC, replication slots availability, invalidation ring overflow count, and Citus worker failure total. Run this after install or after configuration changes to verify the environment is ready.
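
Usage is a single call:

-- Returns a JSON health report covering the seven checks listed above
SELECT pgtrickle.preflight();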

Enhanced SQL Functions

  • A46-5: pgtrickle.worker_pool_status() gains four new columns: idle_workers (free slots), last_scheduler_tick_unix (Unix timestamp of last scheduler wake), ring_overflow_count (invalidation ring overflows since startup), and citus_failure_total (Citus worker failures logged).

Configuration (GUCs)

  • A46-7: New GUC pg_trickle.invalidation_ring_capacity (integer, default 128, max 1024, postmaster scope). Configures the in-memory invalidation ring used for cross-backend event propagation. Requires a PostgreSQL restart when changed.
  • A46-10: New GUC pg_trickle.lag_aware_scheduling (boolean, default false, superuser scope). When enabled, the per-database refresh quota is boosted proportionally to refresh lag (up to 2×), accelerating catch-up without starving other databases.

Performance

  • A46-9: Incremental DAG schedule re-resolution — when upstream CDC events affect a subset of stream tables, the scheduler now recomputes only the affected CALCULATED-schedule nodes (O(affected)) instead of the full DAG (O(V)). Falls back to full resolution if more than 25% of the DAG is affected. The new resolve_calculated_schedule_incremental() method is benchmarked in benches/scheduler_bench.rs.
  • A46-11: Citus worker failure counter persisted in shared memory (pg_trickle_citus_fail_total), visible via worker_pool_status(). The counter increments when a Citus worker crosses the failure threshold, enabling operational dashboards to track distribution health over time.

Observability & Deployment

  • A46-1/A46-2: Dockerfile.hub, Dockerfile.ghcr, and Dockerfile.demo now carry the correct default ARG VERSION=0.45.0 and a HEALTHCHECK directive (pg_isready) for Docker Compose and Kubernetes readiness probes.
  • A46-3: cnpg/cluster-dev.yaml (single-instance) and cnpg/cluster-production.yaml (3-node HA) added as ready-to-use CloudNativePG cluster manifests, including the worker budget formula max_worker_processes = 8 + (2 × num_databases) + worker_pool_size.
  • A46-6: monitoring/production/README.md documents least-privilege role setup, TLS Prometheus scrape config, Kubernetes ServiceMonitor, and recommended alert thresholds for production deployments.
  • A46-16: docs/STORAGE_BACKENDS.md — reference page covering Heap, Unlogged, Citus columnar, and pg_mooncake backends with migration guidance.

CI & Developer Experience

  • A46-13: Windows compile failures are now blocking on scheduled CI runs (removed continue-on-error: true). A lightweight windows-compile-gate job also runs on every PR to catch Windows-specific compile errors early.
  • A46-14: New e2e-smoke CI job runs on every PR and push to main. It builds the full E2E Docker image and runs a representative subset of tests (DVM, CDC, scheduler), catching packaging/install regressions faster than the full E2E run (schedule/manual only).
  • A46-15: The Coverage workflow now runs on a weekly Monday schedule in addition to push-to-main and manual dispatch, providing consistent module-level coverage trend data.
  • A46-17: dbt macros fully synced with CreateStreamTableOptions — storage_backend, temporal, append_only, diamond_consistency, diamond_schedule_policy, pooler_compatibility_mode, max_differential_joins, max_delta_fraction, and output_distribution_column are now configurable from dbt model configs and are correctly passed to the underlying SQL functions.

Schema Changes

-- New function
pgtrickle.preflight() RETURNS text

-- worker_pool_status() return type extended (4 new columns):
--   idle_workers             integer
--   last_scheduler_tick_unix bigint
--   ring_overflow_count      bigint
--   citus_failure_total      bigint

-- New GUCs (set in postgresql.conf):
--   pg_trickle.invalidation_ring_capacity = 128  -- postmaster scope
--   pg_trickle.lag_aware_scheduling = false       -- superuser scope

Upgrade note: ALTER EXTENSION pg_trickle UPDATE will DROP and re-create worker_pool_status() automatically (return type changed). The migration script pg_trickle--0.44.0--0.45.0.sql handles this.


[0.44.0] — Security Hardening & Code Quality

v0.44.0 is a security and code-quality sprint. It hardens SECURITY DEFINER paths, centralizes dynamic SQL construction, adds RLS bypass warnings, decomposes large modules, consolidates API options, and strengthens the parser's unsafe FFI façade.

Security

  • A45-1: IVM trigger function SET search_path hardened. BEFORE trigger functions (advisory lock only) now use a restricted path with no public, preventing search_path shadowing of extension internals. AFTER trigger functions retain public so that user delta SQL can resolve unqualified source-table references; their PLPGSQL bodies call only schema-qualified pgtrickle.* functions, so the security boundary is maintained.
  • A45-3: A WARNING is now emitted when a stream table is created over a source table that has Row-Level Security (RLS) enabled, clarifying that source-table RLS does not protect stream-table contents.
  • A45-4: Monitoring docker-compose.yml credentials are now driven by environment variables with a monitoring/.env.example template. PostgreSQL and Grafana services bind to 127.0.0.1 by default.
  • A45-5: New scripts/check_security_definer.sh CI check validates that every SECURITY DEFINER occurrence in Rust and SQL files has a corresponding SET search_path and does not include public without justification. Added to just lint pipeline.
  • A45-6: docs/SECURITY_MODEL.md now documents why superuser = true and trusted = false are required, with a privilege table and guidance for managed environments (RDS, AlloyDB, CNPG).

Code Quality

  • A45-2: New src/sql_builder.rs module provides safe helpers for all dynamic SQL construction: ident, qualified, literal, regclass, spi_param, list_idents. Includes unit tests and a new fuzz target (FUZZ-6).
  • A45-7: src/cdc.rs split into three files — trigger-rebuild logic extracted to src/cdc/rebuild.rs and polling CDC extracted to src/cdc/polling.rs, reducing the main file from 4,259 to 3,386 lines.
  • A45-8: CreateStreamTableOptions struct introduced in src/api/mod.rs to centralize all create_stream_table parameters. All four create paths (create_stream_table, create_stream_table_if_not_exists, bulk_create, create_or_replace_stream_table) now construct this struct before calling the implementation.
  • A45-9: Extended the SAF-2 typed unsafe façade in src/dvm/parser/mod.rs with six additional safe wrapper functions (safe_deparse_sort_clause, safe_deparse_target_list, safe_node_contains_window_func, safe_collect_all_window_func_nodes, safe_extract_func_name, safe_extract_operator_name). Added FUZZ-6 fuzz target for sql_builder and parser volatility helpers.
  • A45-10: Scheduler background worker now emits structured pgrx::warning!() calls instead of silently discarding errors from pg_backend_pid(), SchedulerJob::claim(), and pg_current_wal_lsn() SPI calls.
  • A45-11: All milestone-ID comments audited; each ID is now accompanied by a human-readable invariant description and links to a live design document in plans/.

[0.43.0] — D+I Change-Buffer Schema, GUC Tuning & WAL Diagnostics

v0.43.0 delivers a fundamental change to how CDC change buffers are stored (D+I schema: flat column names, UPDATE decomposed into a D-row + I-row at write time), five new operator-tuning GUCs, a new wal_source_status() diagnostic view for per-source WAL CDC state, extended explain_stream_table() output, and a comprehensive microbenchmark suite for all new code paths.

A44-1 — Deep-Join Threshold GUCs

Two new GUCs let operators tune when the DVM planner switches from the fast L0-scan path to the full recursive join decomposition:

  • pg_trickle.part3_max_scan_count (default 10000) — Maximum number of source rows before the planner escalates from P3 (direct scan) to a deeper join strategy.
  • pg_trickle.deep_join_l0_scan_threshold (default 256) — Row count at which multi-level join decomposition uses an L0 pre-scan instead of a full plan.

-- Lower threshold to force deep-join path for testing
SET pg_trickle.deep_join_l0_scan_threshold = 1;

A44-2 — GROUP_RESCAN: Correct Incremental SUM(CASE …) Aggregates

The P5 aggregate differentiation path now produces correct incremental results for non-invertible expressions such as SUM(CASE WHEN status = 'active' THEN amount ELSE 0 END). The previous LATERAL VALUES decomposition has been replaced with direct c.action = 'I' / c.action = 'D' filtering against the D+I change buffer, eliminating the extra join overhead and fixing a correctness gap for UPDATE rows that cross a CASE boundary.
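
For example, a stream table like the following (table and column names are illustrative) is now maintained incrementally with correct results:

SELECT pgtrickle.create_stream_table(
    name     => 'active_revenue',
    query    => 'SELECT customer_id,
                        SUM(CASE WHEN status = ''active'' THEN amount ELSE 0 END) AS active_amount
                 FROM orders GROUP BY customer_id',
    schedule => '30s'
);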

A44-3 — WAL Poll GUCs

Two new GUCs for tuning the WAL logical replication decoder polling loop:

  • pg_trickle.wal_max_changes_per_poll (default 10000) — Maximum number of change messages to consume from a WAL slot in a single poll pass.
  • pg_trickle.wal_max_lag_bytes (default 104857600, i.e. 100 MiB) — WAL slot lag threshold (bytes) above which the decoder pauses to avoid slot saturation.
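
Both knobs can be changed at runtime with ALTER SYSTEM; the values below are illustrative, and a reload applies them provided the GUCs are reload-scoped:

ALTER SYSTEM SET pg_trickle.wal_max_changes_per_poll = 20000;
ALTER SYSTEM SET pg_trickle.wal_max_lag_bytes = 209715200;  -- 200 MiB
SELECT pg_reload_conf();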

A44-4 — Cost-Cache Capacity GUC

pg_trickle.cost_cache_capacity (default 4096) controls the maximum number of entries in the shared refresh-cost estimate cache. On deployments with thousands of stream tables, increasing this value avoids cold-cache fallback to full-plan estimation.
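
For large fleets the capacity can be raised the same way (sizing illustrative; a restart may be needed if the GUC is postmaster-scoped):

ALTER SYSTEM SET pg_trickle.cost_cache_capacity = 16384;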

A44-5 through A44-7 — Mandatory Microbenchmarks

Three new Criterion benchmark groups:

  • bench_a44_5_pool_vs_spawn — measures EU-DAG pool reuse vs. per-tick rebuild at n_sts ∈ {50, 200, 500, 1000}.
  • bench_a44_6_write_amplification — compares single-hash (pre-D+I wide schema) vs. double-hash (D+I) write overhead at cols ∈ {4, 10, 20, 50}.
  • bench_a44_7_join_codegen_by_depth and bench_a44_7_scan_agg_delta_sql — join chain depth 2–16 and P5 aggregate SQL generation at 1–5 group columns.

A44-8 — explain_stream_table() GUC Threshold Section

pgtrickle.explain_stream_table(name) now includes a GUC thresholds section in its output, showing the effective values of all tuning GUCs (deep-join threshold, WAL poll limits, cost-cache capacity) alongside the existing plan and mode information.

A44-9 — pgtrickle.wal_source_status() — Per-Source WAL Diagnostics

New SQL function returning one row per registered source table with WAL CDC diagnostics:

SELECT * FROM pgtrickle.wal_source_status();

  • source_relid — Source table OID.
  • source_name — Fully-qualified source table name.
  • cdc_mode — trigger, wal, or transitioning.
  • slot_name — Logical replication slot name (NULL if trigger-based).
  • slot_lag_bytes — Current WAL slot lag in bytes.
  • publication_name — Publication name (NULL if trigger-based).
  • blocked_reason — Human-readable reason why WAL CDC is unavailable (NULL if active).
  • transition_started_at — Timestamp when WAL transition began (NULL if not transitioning).
  • decoder_confirmed_lsn — Last LSN confirmed by the decoder (NULL if trigger-based).
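
A typical triage query filters to sources where WAL CDC is blocked or lagging (the lag threshold here is illustrative):

SELECT source_name, cdc_mode, slot_lag_bytes, blocked_reason
FROM pgtrickle.wal_source_status()
WHERE blocked_reason IS NOT NULL
   OR slot_lag_bytes > 100 * 1024 * 1024;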

A44-10 — D+I Change-Buffer Schema

Breaking internal change — the CDC change buffer table schema has been redesigned for correctness and performance.

Before (wide schema): Each source column was stored as two columns (new_<col> and old_<col>). UPDATE was stored as a single action = 'U' row; the DVM scan operator decomposed it at read time using a 5-CTE UNION ALL pipeline.

After (D+I schema): Source columns are stored with their original names ("col"). UPDATE is decomposed at write time into:

  • A D-row (action = 'D') carrying the old values.
  • An I-row (action = 'I') carrying the new values.

Both rows carry the same changed_cols VARBIT bitmask; genuine INSERT/DELETE rows have changed_cols = NULL.
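
Concretely, a single UPDATE now lands in the buffer as a delete/insert pair (values and column set illustrative):

-- UPDATE orders SET amount = 20 WHERE id = 1 is captured as:
--   action = 'D', id = 1, amount = 10, changed_cols = '0100…'   (old values)
--   action = 'I', id = 1, amount = 20, changed_cols = '0100…'   (new values)
-- A genuine INSERT or DELETE produces a single row with changed_cols = NULL.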

Benefits:

  • Scan SQL is significantly simpler (no UNION ALL decomposition at read time).
  • Aggregate differentiation eliminates the LATERAL VALUES join.
  • Write amplification is constant (2 rows per UPDATE regardless of column count).
  • Change buffer tables are compatible with standard SQL tooling.

The sync_change_buffer_columns() migration guard detects existing wide-schema buffers (any new_*/old_* columns) and performs a no-op, logging a warning. To migrate an existing deployment, use pgtrickle.repair_stream_table(name).

A44-11 — D+I Benchmark Suite

bench_a44_11_di_delta_scan exercises the full D+I Scan→Aggregate pipeline at cols ∈ {4, 10, 20, 50} to track differential scan performance as the column count grows.


[0.42.0] — Repair API, Docs Overhaul & Test Infrastructure

v0.42.0 delivers a new repair_stream_table SQL function for disaster recovery and self-healing after PITR restores, a comprehensive documentation overhaul (deprecated GUC appendix, RLS bypass warnings, updated architecture diagrams), security hardening of the WAL decoder via SQL parameterization, and a major test infrastructure uplift with state-polling helpers, new correctness property tests, and two new CI gates.

A42-1 — pgtrickle.repair_stream_table(name text) → text

New SQL-callable function for stream table repair and self-healing. Use after point-in-time recovery (pg_basebackup / PITR) or any operation that may have left CDC triggers, change buffer tables, or catalog state inconsistent.

Actions performed:

  1. Acquires an advisory lock on the stream table to prevent concurrent mutations.
  2. Verifies the stream table exists in pgtrickle.pgt_stream_tables.
  3. Resets the refresh frontier to NULL and sets needs_reinit = true, forcing a full refresh on the next scheduler cycle.
  4. Rebuilds any missing CDC triggers on all source tables.
  5. Recreates any missing change buffer tables in pgtrickle_changes.
  6. Resets error fuse state and stream table status to ACTIVE.
  7. Returns a text summary of all actions taken.
-- After a PITR restore, reinstall all CDC infrastructure
SELECT pgtrickle.repair_stream_table('order_totals');
-- → "repair_stream_table(order_totals): frontier reset; triggers OK; buffers rebuilt (1 recreated); status reset to ACTIVE"

A42-2 — Catalog Generator Accuracy Improvement

scripts/gen_catalogs.py regex now correctly captures non-pub #[pg_extern] functions (pgrx does not require pub). The SQL API catalog grew from 24 to 98 entries, including repair_stream_table. CI fails on catalog drift.

A42-3 — SQL Reference: repair_stream_table Signature

docs/SQL_REFERENCE.md now correctly documents → text (not → void) return type with full examples and parameter table.

A42-4 — Stale-Term Docs Linter (just docs-lint)

New just docs-lint recipe greps all docs/**/*.md for retired GUC names (pg_trickle.max_workers, pg_trickle.max_parallel_refresh_workers) and fails if any are found outside deprecated/compatibility sections. Also integrated into .github/workflows/docs-drift.yml as a CI gate.

A42-5 — Deprecated GUC Compatibility Appendix

docs/CONFIGURATION.md now has an Appendix: Deprecated / Compatibility GUCs section documenting event_driven_wake and wake_debounce_ms with migration guidance. Existing active references to the two retired GUCs were updated to their current replacements across PATTERNS.md, SCALING.md, PRE_DEPLOYMENT.md, and docs/integrations/multi-tenant.md.

A42-6 — ARCHITECTURE.md Module Diagram Updated

docs/ARCHITECTURE.md module layout now correctly reflects the src/dvm/parser/ subdirectory structure introduced in v0.39.0 (G13-PRF), with all five sub-modules (mod.rs, types.rs, validation.rs, rewrites.rs, sublinks.rs) listed.

A42-7 — RLS Bypass Prominence

docs/GETTING_STARTED.md and docs/PRE_DEPLOYMENT.md now include prominent security notices explaining that pg_trickle background workers execute with SET LOCAL row_security = off (matching PostgreSQL's own REFRESH MATERIALIZED VIEW semantics), and providing mitigation guidance.

A42-8 — Generated Docs Freshness CI Gate

.github/workflows/docs-drift.yml now runs both the catalog check (python3 scripts/gen_catalogs.py --check) and the stale-term linter on every PR targeting main, on every push to main, and on a weekly schedule.

A42-9 — State-Polling Test Helpers

tests/common/mod.rs now exports seven polling helpers:

  • wait_for_first_refresh / wait_for_refresh_history / wait_for_refresh_after
  • wait_for_cdc_mode
  • wait_for_stream_table_status
  • wait_for_scheduler_tick
  • wait_for_query_count

All new E2E test files created in this release use these helpers exclusively (zero tokio::time::sleep calls). Existing tests had their most egregious blind waits replaced.

A42-10 — Differential SUM(CASE) E2E Tests

New test file tests/e2e_sum_case_differential_tests.rs (5 tests) validating that SUM(CASE WHEN ... END) expressions correctly trigger full refresh mode instead of attempting algebraically incorrect incremental updates.

A42-11 — SUM(CASE) AST-Level Detection

src/dvm/operators/aggregate.rs: is_algebraically_invertible now calls the new expr_contains_case helper which recursively inspects the Expr AST for CASE expressions at any nesting depth, catching wrapped forms like SUM(CAST(CASE ... END AS numeric)).

A42-12 — FULL JOIN Aggregate Property Tests

New test file tests/e2e_full_join_aggregate_tests.rs (4 tests) including a test_full_join_diff_vs_full_property_10_cycles property test that runs 10 insert/delete cycles and asserts DIFFERENTIAL refresh produces identical output to FULL refresh after each cycle.

A42-13 — WAL Decoder SQL Parameterization

src/wal_decoder.rs: write_decoded_change now builds fully parameterized SPI queries using $N placeholders and Spi::run_with_args, eliminating all direct string interpolation of WAL values into SQL. This closes a class of SQL injection risks in the WAL CDC path.

A42-14 — Stale EC-06 Comment Cleanup

src/dvm/operators/scan.rs: Updated design comments from the outdated EC-06 reference to accurately describe the current net-counting strategy and point to the test_keyless_multiset_property test.

A42-15 — Keyless Multiset Property Tests

New test file tests/e2e_keyless_tests.rs (4 tests) validating that keyless (no primary key) tables maintain correct multiset semantics through 10 cycles of insert/delete/update operations.

A42-16 — Fuzz Smoke CI Job

New .github/workflows/fuzz-smoke.yml runs daily and on PRs that touch fuzz targets. On PRs: replays the corpus for each target (zero new crashes allowed). On schedule/dispatch: runs each target for 60 s and uploads crash artifacts. Targets: parser_fuzz, cron_fuzz, dag_fuzz, guc_fuzz, cdc_fuzz, wal_fuzz.


[0.41.0] — DVM Correctness: Structural Cache Keys, Placeholder Safety & WAL Transition Guards

v0.41.0 targets internal correctness of the Differential View Maintenance (DVM) engine: eliminating snapshot-CTE cache collisions on structurally different subtrees, making unresolved SQL placeholders hard errors, guarding WAL CDC transitions against concurrent DDL, and ensuring the pool worker obeys the global pg_trickle.enabled switch.

A41-1 — Structural Snapshot CTE Cache Key Fingerprint

The old snapshot_cache_key() concatenated leaf-table aliases, meaning two OpTrees with identical source tables but different join conditions, join types, predicates, projections, or grouping expressions mapped to the same key and could silently share a snapshot CTE.

The function now computes a 64-bit structural fingerprint via DefaultHasher, recursively encoding every operator type, join condition, predicate, projection, group-by expression, and child fingerprints before formatting the key as a 16-character hex string. With a 64-bit structural hash, collision probability is negligible for any realistic OpTree and is independent of alias names.

A41-2 — Placeholder Resolution Full-Validation Assertion

resolve_delta_template() and resolve_lsn_placeholders() now return Result<String, PgTrickleError> instead of String. After all substitutions a check_no_remaining_placeholders() call scans for any leftover __PGS_*__ or __PGT_*__ tokens. If any are found, PgTrickleError::UnresolvedPlaceholder is returned and propagated all the way to the SQL surface as a clear ERRCODE_INTERNAL_ERROR with a detail message naming the offending token and the calling context.

This converts a class of silent wrong-query bugs (where an unresolved placeholder was executed as literal SQL text) into an immediate, actionable server error.

A41-3 — WAL Transition Eligibility Recheck at Commit Point

Before committing the TRANSITIONING → WAL state change, the background worker now calls recheck_source_eligible_for_wal() to verify that:

  • pg_class.relkind = 'r' (table not dropped)
  • primary-key columns are still present
  • REPLICA IDENTITY = FULL is still set

If any check fails, the replication slot is immediately dropped, the catalog is reset to Trigger mode, and a WalTransitionError is returned. This closes a race window in which a concurrent DROP CONSTRAINT or ALTER TABLE … REPLICA IDENTITY DEFAULT could leave the CDC pipeline in an inconsistent WAL mode with stale slot resources.
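
The same three conditions can be checked by hand when diagnosing a failed transition; the query below is illustrative, with 'public.orders' standing in for the source table:

SELECT c.relkind = 'r'      AS is_plain_table,
       c.relreplident = 'f' AS replica_identity_full,
       EXISTS (SELECT 1 FROM pg_constraint k
               WHERE k.conrelid = c.oid AND k.contype = 'p') AS has_primary_key
FROM pg_class c
WHERE c.oid = 'public.orders'::regclass;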

A41-4 — Pool Worker pg_trickle.enabled Check

The persistent pool-worker main loop now checks config::pg_trickle_enabled() at the top of each iteration. When pg_trickle.enabled = off the worker sleeps 500 ms and skips all job claiming, ensuring that a live-reload of the GUC immediately quiesces all workers without requiring a process restart.

A41-5 — Document Isolation Invariants (All Execution Modes)

Isolation-invariant doc comments (A41-5) have been added to all five execution-mode functions in src/scheduler/mod.rs:

  • execute_worker_singleton — READ COMMITTED per-refresh; no cross-session writes visible.
  • execute_worker_atomic_group — READ COMMITTED with sub-transactions; repeatable-read group shares a snapshot.
  • execute_worker_immediate_closure — Single READ COMMITTED transaction; trigger-propagated and atomic.
  • execute_worker_cyclic_scc — Per-iteration READ COMMITTED; external observers see partial states between iterations.
  • execute_worker_fused_chain — Single READ COMMITTED transaction; bypass tables ON COMMIT DROP; externally atomic.

[0.40.0] — Operator Trust, Maintainability & Release Confidence

v0.40.0 focuses on building confidence for operators, maintainers, and adopters: auto-generated API/GUC catalogs to eliminate drift, a formal security model, drain-mode runbook with E2E proof, expanded alert rules, dbt/relay parity, strict unsafe-block gate, L0-cache documentation truthfulness, formal deprecation of event_driven_wake, and secret scanning in CI.

O40-1 — Auto-Generated GUC & SQL API Catalogs

scripts/gen_catalogs.py parses src/config.rs and src/**/*.rs to produce docs/GUC_CATALOG.md (125 GUCs) and docs/SQL_API_CATALOG.md (24 SQL-callable functions). Both are checked by a new .github/workflows/docs-drift.yml CI gate that fails if the catalogs fall out of sync with the source.

Run just gen-catalogs to regenerate; just check-docs-drift (or the CI gate) detects drift.

O40-2 — Security Model & Secret-Handling Guide

New docs/SECURITY_MODEL.md covers: SECURITY DEFINER scope, search_path hardening, RLS boundary semantics, CDC buffer access controls, TRUNCATE semantics, relay credential storage guide, background worker privilege model, incident response checklist, and v1.0 supply-chain preparation checklist.

O40-3 — Drain-Mode Runbook & E2E Proof

New docs/RUNBOOK_DRAIN.md provides a step-by-step operator runbook for controlled shutdown, rolling upgrade, and load-testing drain scenarios with observability guidance and troubleshooting steps.

Six new E2E tests in tests/e2e_drain_mode_tests.rs validate: idle drain returns true, is_drained() state reflection, post-resume catch-up, drain under active workload, timeout parameter semantics, and change-buffer accumulation during drain.

O40-4 — Expanded Alert Rules

monitoring/prometheus/alerts.yml gains eight new production-grade alert rules:

  • PgTrickleFreshnessLagHigh — staleness > 600 s for 10 min (warning).
  • PgTrickleRefreshP99High — avg_duration > 60 000 ms for 5 min (warning).
  • PgTrickleCdcBufferDepthHigh — pending_rows > 500 000 for 5 min (warning).
  • PgTrickleWalSlotLagHigh — retained_wal_mb > 200 for 5 min (warning).
  • PgTrickleWalSlotLagCritical — retained_wal_mb > 1 024 for 2 min (critical).
  • PgTrickleWorkerPoolSaturated — active ≥ 90 % pool_size for 5 min (warning).
  • PgTrickleCitusLeaseUnhealthy — lease_held == 0 for 5 min (critical).
  • PgTrickleOtelExportErrors — export_errors_total > 0 for 5 min (warning).

O40-5 — dbt & Relay Parity

New dbt macro pgtrickle_operational_status() returns scheduler health, drain state, CDC pause state, force-full mode, and back-pressure status. New pgtrickle_drain() macro for drain from dbt. stream_table_status() updated with cdc_paused, force_full, and is_drained fields.

O40-6 — Unsafe-Inventory Gate (Strict Mode)

.github/workflows/unsafe-inventory.yml changed from --report-only to strict mode: the workflow now exits 1 on unsafe-block regressions, making it a hard PR gate. Unsafe blocks that need to be added must update the baseline via an explicit PR that reviewers can audit.

O40-7 — L0-Cache Truthfulness

pg_trickle.template_cache GUC documentation updated to explain the full L0/L1/L2 cache architecture:

  • L0 (process-local RwLock<HashMap>) — fast, not shared across pooler connections; hit rate is low in PgBouncer transaction-pooling deployments.
  • L1 (thread-local delta template) — fastest, reset on each SPI connect.
  • L2 (UNLOGGED catalog table) — shared across all backends; the correct layer to rely on for cross-connection performance.

Operators using transaction-pooling should rely on L2 warm-up, not L0.

O40-8 — event_driven_wake Formal Deprecation

pg_trickle.event_driven_wake and pg_trickle.wake_debounce_ms are formally deprecated with full rationale in the GUC doc comments: LISTEN is not allowed in PostgreSQL background workers; the scheduler always uses latch-based polling. Both GUCs are preserved in v0.40.0 for upgrade compatibility and will be removed in v1.0. Setting them now emits a WARNING but does not break existing configurations.

O40-9 — Secret Scanning CI

New .github/workflows/secret-scan.yml runs gitleaks on all pull requests to main, on pushes to main, and weekly. .gitleaks.toml provides an allowlist for known example credentials in documentation and test fixtures.


[0.39.0] — Operational Truthfulness & Distributed Hardening

v0.39.0 focuses on making pg_trickle's operational behavior more honest and robust: CDC hold mode, enhanced diagnostics, SQLSTATE-aware retry, OpenTelemetry documentation, Citus chaos hardening, and a broader testing pyramid.

O39-1/O39-8 — CDC Hold Mode (cdc_capture_mode)

New GUC pg_trickle.cdc_capture_mode (default discard). When set to hold, captured change rows are buffered in the change table while CDC is paused rather than being silently discarded. The existing discard behavior is unchanged and remains the default to preserve backward compatibility.

New SQL function pgtrickle.cdc_pause_status() returns per-stream-table CDC pause state including paused, capture_mode, and an operator-guidance note.
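
A typical maintenance sequence pauses capture without losing changes (scope assumption: both GUCs accept ALTER SYSTEM; pg_trickle.cdc_paused is the kill-switch introduced in v0.35.0):

ALTER SYSTEM SET pg_trickle.cdc_capture_mode = 'hold';
ALTER SYSTEM SET pg_trickle.cdc_paused = on;
SELECT pg_reload_conf();
SELECT * FROM pgtrickle.cdc_pause_status();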

O39-2 — Wake Truthfulness

The scheduler no longer attempts LISTEN/NOTIFY in background worker contexts (PostgreSQL does not support this). Wake truthfulness is documented in the header of e2e_wake_tests.rs; tests now verify that the scheduler falls back to polling correctly rather than asserting sub-polling-interval wake latency.

O39-3 — Configuration Documentation

docs/CONFIGURATION.md gains three new sections covering GUCs introduced in v0.36.0 (WAL Backpressure), v0.37.0 (pgVectorMV & OpenTelemetry), and v0.39.0 (Operational Truthfulness: cdc_capture_mode). Each section includes an operator checklist and configuration examples.

O39-4 — Upgrade Guide

docs/UPGRADING.md gains upgrade sections for every version from 0.34.0 to 0.39.0, including schema change details, new GUCs, new functions, and known limitations per release.

O39-5 — OpenTelemetry Operator Guide

New docs/OPENTELEMETRY.md provides an end-to-end operator guide for the W3C Trace Context integration introduced in v0.37.0. Covers Jaeger/Tempo/OTEL Collector configuration, span attributes, failure behavior (best-effort; never blocks refresh), and verification steps.

Three new E2E tests (tests/e2e_otel_tests.rs) verify trace context capture, unreachable-endpoint graceful degradation, and disabled-tracing NULL context.

O39-6 — SQLSTATE-First SPI Retry

New GUC pg_trickle.use_sqlstate_classification (default off). When enabled, the scheduler uses a SQLSTATE integer class (40xxx = retryable, 23xxx = not retryable, etc.) before falling back to text pattern matching. The new unified classify_error_for_retry() function is used at both retry decision points in the scheduler.

O39-7 — Citus Chaos Test Harness

New tests/e2e_citus_chaos_tests.rs containing four #[ignore] chaos tests:

  • CHAOS-1: Worker death mid-refresh (graceful failure + recovery)
  • CHAOS-2: Coordinator restart during lease (lock invalidation + re-acquire)
  • CHAOS-3: Shard rebalance during active CDC (no row gaps)
  • CHAOS-4: Stale worker slot cleanup (topology change detection)

Tests require CITUS_COORDINATOR_URL and CITUS_*_CONTAINER env vars; they are skipped automatically when not set.

O39-9 — Enhanced explain_stream_table()

pgtrickle.explain_stream_table() now shows: Status, Populated, Refresh mode (with force_full_refresh GUC note), CDC status (paused/active + capture mode), Backpressure state, and the Defining query. This makes the function a one-stop diagnostic tool for operators.

O39-10 — TPC-H EXPLAIN Artifacts CI

New workflow .github/workflows/tpch-explain-artifacts.yml captures EXPLAIN ANALYZE BUFFERS output and p50/p99 timing for TPC-H queries Q04, Q05, Q07, Q08, Q09, Q20, Q22. Runs weekly (Sunday 06:00 UTC) and on manual dispatch. Artifacts are uploaded and retained for 90 days.

New test_tpch_explain_artifacts test function (#[ignore]) in tests/e2e_tpch_tests.rs performs the collection.

O39-11 — SQLancer Light PR Mode

Two new non-#[ignore] tests in tests/e2e_sqlancer_tests.rs:

  • test_sqlancer_crash_oracle_light: 50 random queries, crash oracle.
  • test_sqlancer_equivalence_oracle_light: 50 random queries, equivalence oracle.

Both use a fixed seed (SQLANCER_LIGHT_SEED) and bounded case count (SQLANCER_LIGHT_CASES, default 50) for fast, deterministic PR CI gates.

O39-12 — Fuzz Target Expansion

Two new libFuzzer targets:

  • fuzz/fuzz_targets/wal_fuzz.rs: SQLSTATE classifier + sqlstate_to_string invariants.
  • fuzz/fuzz_targets/dag_fuzz.rs: schedule parsing, cron validation, SELECT * detection.

Both verify no-panic and determinism properties for adversarial inputs.

O39-13 — Inbox/Outbox Reliability Property Tests

Unit-level property tests in src/api/inbox.rs (#[cfg(test)]) covering:

  • Partition exhaustiveness: every aggregate ID maps to exactly one worker.
  • Hash determinism: same inputs always produce the same assignment.
  • Negative total_workers degenerate case.
  • Known hash anchors for regression protection.

SQLSTATE classifier property tests in src/error.rs covering:

  • Retryable class detection.
  • Bracket-code extraction with malformed inputs.
  • sqlstate_to_string totality and determinism.

O39-14 — PR-Scoped Upgrade E2E Slice

New CI job upgrade-e2e-pr-slice in .github/workflows/ci.yml. Triggered on PRs that modify sql/, src/config.rs, src/cdc.rs, or src/api/. Runs the most recent N-1→N upgrade pair using a stock postgres:18.3 container (no custom Docker build). Tests filtered to smoke | basic | catalog labels for speed.

Upgrade: Run ALTER EXTENSION pg_trickle UPDATE TO '0.39.0'. The 0.38.0→0.39.0 migration creates pgtrickle.cdc_pause_status() and registers the cdc_capture_mode GUC comment. No existing tables or functions are removed.


[0.38.0] — EC-01 Join Correctness Sprint

v0.38.0 is a focused correctness release for EC-01, the join phantom-row class where non-deduplicated join deltas could leave stale row IDs behind across refresh cycles.

EC-01 — Unconditional PH-D1 Cleanup

Non-deduplicated keyed join deltas now run PH-D1 cross-cycle cleanup after every differential apply. The cleanup computes the current FULL-refresh row-id set and deletes stream-table row IDs that no longer exist in the correct result. This removes historical phantoms that are not present in the current delta and keeps DIFFERENTIAL output convergent with FULL output.

RowIdSchema Planner Guard

The dormant RowIdSchema model is now exercised during DVM planning. The planner infers row-id schemas for scans, transparent operators, joins, aggregates, set operations, CTEs, recursive plans, lateral plans, and scalar subqueries before generating delta SQL. If a row-id pipeline is internally inconsistent, planning fails with a clear RowIdSchema verification failed message rather than allowing silent refresh drift.

EC-01 Property Release Gate

Added e2e_ec01_property_tests, a DIFF-vs-FULL property test that runs a deterministic three-table join aggregate through 100 mixed-DML cycles by default. Each cycle includes inserts, updates, deletes on both sides of joins, and co-delete cases, then compares DIFFERENTIAL and FULL stream tables with multiset equality and row-id diagnostics.

Q07 and Q15 are no longer allowed in IMMEDIATE_SKIP_ALLOWLIST, so CI must prove those query shapes instead of accepting silent skips.

Removed

  • pgtrickle-tui — The terminal dashboard binary has been removed from this repository. All SQL-level monitoring functions (pgtrickle.health_check(), pgtrickle.list_stream_tables(), etc.) remain fully available in the extension.

Upgrade: The 0.37.0 → 0.38.0 migration has no SQL-object changes; the release changes Rust DVM/refresh behavior and test coverage only.


[0.37.0] — pgVector Incremental Aggregates & Distributed Trace Propagation

v0.37.0 adds two independent capability pillars: incremental vector aggregates for pgvector workloads, and W3C Trace Context propagation through the CDC → DVM → MERGE pipeline.

F4 — pgVector Incremental Aggregates

Stream tables can now maintain avg(embedding) and sum(embedding) over vector, halfvec, and sparsevec columns incrementally. The DVM planner detects vector-typed aggregate arguments at plan time and reclassifies them to use pgvector-native differential operators (VectorAvg, VectorSum) that maintain a running (count, sum_vector) auxiliary state instead of a full table scan on every change.

SQL usage:

CREATE EXTENSION pgvector;

CREATE TABLE products (
    id SERIAL PRIMARY KEY,
    category TEXT,
    embedding vector(3)
);

-- This stream table is maintained incrementally — no full scan on INSERT.
SELECT pgtrickle.create_stream_table(
    'category_centroids',
    'SELECT category, avg(embedding)::vector AS centroid
     FROM products GROUP BY category',
    schedule => '5s'
);

GUC: SET pg_trickle.enable_vector_agg = on; (session-level opt-in).

Distance operator fallback: <=>, <->, <#> operators in WHERE clauses trigger automatic full-refresh fallback because they are non-monotone. The planner emits a WARNING so operators know the mode downgrade occurred.

Criterion benchmarks are provided for vector_avg, vector_sum, and mixed workloads in benches/diff_operators.rs.

Documentation: docs/tutorials/PGVECTOR_EMBEDDING_PIPELINES.md.

F10 — W3C Trace Context Propagation

Every CDC change buffer table now contains a __pgt_trace_context TEXT column. When an application sets the pg_trickle.trace_id GUC before executing DML, the row-level and statement-level CDC triggers capture the W3C traceparent string into that column.

After each differential refresh, if pg_trickle.enable_trace_propagation = on, the extension reads the trace context from the change buffer and either:

  • exports an OTLP/JSON span to pg_trickle.otel_endpoint (Jaeger, Zipkin, OTEL Collector), or
  • logs the span at INFO level when no endpoint is configured.

The span covers the full CDC-drain → DVM-plan → merge-apply cycle, linking PostgreSQL refresh latency directly to application request traces.

GUCs added:

  • pg_trickle.enable_trace_propagation (BOOL, default false) — Enable W3C trace propagation.
  • pg_trickle.otel_endpoint (STRING, default '') — OTLP HTTP endpoint (e.g. http://localhost:4318).
  • pg_trickle.trace_id (STRING, default '') — W3C traceparent set by the application session.
  • pg_trickle.enable_vector_agg (BOOL, default false) — Enable incremental pgvector aggregates.
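
End to end, an application session tags its DML and the server exports a span after the next refresh. The traceparent value and endpoint below are illustrative:

-- Application session
SET pg_trickle.trace_id = '00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01';
INSERT INTO orders (id, status) VALUES (42, 'active');

-- Server configuration (once)
ALTER SYSTEM SET pg_trickle.enable_trace_propagation = on;
ALTER SYSTEM SET pg_trickle.otel_endpoint = 'http://localhost:4318';
SELECT pg_reload_conf();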

Upgrade: The 0.36.0 → 0.37.0 migration script adds __pgt_trace_context to all existing change buffer tables automatically.

Internal improvements

  • A15/A16: src/scheduler and src/refresh/merge each split into focused sub-modules (completed in v0.37.0 development cycle).

[0.36.0] — Structural Hardening, Performance & Temporal IVM

v0.36.0 closes structural and performance gaps accumulated since the Citus arc. The L0 process-local template cache is now constructed (was wired-but-empty since v0.31.0). WAL slot backpressure enforcement is available via the new pg_trickle.enforce_backpressure GUC. Structured JSON logging arrives for OpenTelemetry/Loki integration. The RowIdSchema type formalises cross-operator row-id compatibility, addressing the architectural root cause of EC-01 class bugs. Temporal IVM (SCD Type 2, AS OF TIMESTAMP ready) and columnar storage backend support are introduced. A drain mode API enables graceful quiesce before maintenance windows.

New features

  • A09 — L0 process-local template cache: Process-local RwLock<HashMap> keyed by (pgt_id, cache_generation) avoids ~45 ms cold-start penalty per backend for connection-pooler workloads. Invalidated automatically on generation bump. New API: shmem::l0_cache_lookup(), shmem::l0_cache_store(), shmem::invalidate_l0_cache().

  • A12 — WAL backpressure enforcement: When pg_trickle.enforce_backpressure = on, CDC trigger writes are suppressed once the WAL replication slot lag reaches slot_lag_critical_threshold_mb. Writes resume when lag drops below 50% of the threshold (hysteresis). Default: off.

  • A17 — Typed DDL event payload: Replaced string-tag matching in hooks.rs with a DdlCommandKind enum. CREATE OR REPLACE FUNCTION is now correctly classified as FunctionChange.

  • A18 — RowIdSchema type: Every DVM operator can now declare its row-id hash schema. A verify_pipeline() function asserts cross-operator compatibility at plan time, making EC-01-class bugs detectable before execution.

  • A20 — Structured JSON logging: New src/logging.rs module with PgtLogEvent struct and pgt_info! macro. When pg_trickle.log_format = json, events are emitted as structured JSON with fields event, pgt_id, cycle_id, duration_ms, refresh_reason, error_code, msg. Default: text.

  • A25 — Bulk alter / drop APIs: New SQL functions pgtrickle.bulk_alter_stream_tables(names TEXT[], params JSONB) and pgtrickle.bulk_drop_stream_tables(names TEXT[]) for dbt deployments managing many stream tables.

  • A35 — Drain mode: pgtrickle.drain(timeout_s INT DEFAULT 60) signals the scheduler to stop accepting new cycles and waits for all in-flight refreshes to complete. pgtrickle.is_drained() checks drain status. Useful before pg_upgrade, rolling restarts, and backup windows. A usage sketch follows this feature list.

  • CORR-1 / UX-1 — Temporal IVM: create_stream_table() and create_stream_table_if_not_exists() now accept temporal := true. When enabled, __pgt_valid_from TIMESTAMPTZ and __pgt_valid_to TIMESTAMPTZ columns are automatically added to the storage table. A temporal_mode column is recorded in pgtrickle.pgt_stream_tables.

  • CORR-2 / UX-3 — Columnar storage backend: create_stream_table() now accepts storage_backend := 'heap'|'citus'|'pg_mooncake' (default: 'heap'). The backend is recorded in pgtrickle.pgt_stream_tables.storage_backend and can be overridden globally via the pg_trickle.columnar_backend GUC.

  • F5 — Online schema evolution: When pg_trickle.online_schema_evolution = on, ALTER QUERY with only column additions (no removals) preserves the existing frontier and is_populated flag, enabling continuous differential refresh without a full reinit. Default: off.

  • F11 — CREATE STREAM TABLE SQL syntax: New function pgtrickle.exec_stream_ddl(TEXT) parses custom DDL strings such as CREATE STREAM TABLE name AS SELECT ... and CREATE OR REPLACE STREAM TABLE name AS SELECT ... and DROP STREAM TABLE name.

  • F12 — Column lineage: New function pgtrickle.stream_table_lineage(name TEXT) returns TABLE(output_col, source_table, source_col) from the column_lineage JSONB recorded in the catalog at creation time.
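
A minimal sketch of the A35 drain flow before a maintenance window (the timeout value is illustrative):

-- Quiesce: stop accepting new cycles, wait up to 120 s for in-flight refreshes
SELECT pgtrickle.drain(timeout_s => 120);
SELECT pgtrickle.is_drained();  -- true once fully quiesced
-- ... run pg_upgrade, rolling restart, or backup ...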

New GUCs

  • pg_trickle.enforce_backpressure (default off) — Pause CDC writes when slot lag exceeds the critical threshold.
  • pg_trickle.log_format (default text) — Log format: text or json.
  • pg_trickle.drain_timeout (default 60) — Default drain timeout (seconds).
  • pg_trickle.online_schema_evolution (default off) — Preserve frontier on compatible ALTER QUERY.
  • pg_trickle.columnar_backend (default none) — Default columnar backend: none, citus, pg_mooncake.
  • pg_trickle.temporal_stream_tables (default off) — Global temporal IVM flag.

Schema changes

  • pgtrickle.pgt_stream_tables gains three new columns:
    • temporal_mode BOOLEAN NOT NULL DEFAULT FALSE
    • storage_backend TEXT NOT NULL DEFAULT 'heap'
    • column_lineage JSONB

Upgrade

ALTER EXTENSION pg_trickle UPDATE TO '0.36.0';

The migration script (sql/pg_trickle--0.35.0--0.36.0.sql) is fully idempotent and adds the new columns with IF NOT EXISTS.


[0.35.0] — Hardening, Reactive Subscriptions & Relay Resilience

v0.35.0 is a focused correctness, operability, and resilience sprint. It adds reactive NOTIFY subscriptions, an SLA summary API, CDC kill-switch GUCs, and several operator-facing improvements. The relay gains exponential reconnect backoff and ${ENV:VAR_NAME} secret interpolation.

New features

  • Reactive subscriptions — pgtrickle.subscribe(stream_table, channel) / pgtrickle.unsubscribe() / pgtrickle.list_subscriptions(): NOTIFY-based reactive delivery after every non-empty refresh cycle (UX-SUB); an example follows this list.
  • SLA summary API — pgtrickle.sla_summary() returns p50/p99 latency, freshness lag, and error-budget remaining over a configurable window (pg_trickle.sla_window_hours, default 24 h) (F17).
  • Explain stream table — pgtrickle.explain_stream_table(name) returns the defining query and cached refresh metadata for a stream table (A23).
  • Shadow-build evolution status — pgtrickle.view_evolution_status() lists which stream tables are in a zero-downtime shadow build (UX-STATUS).
  • CDC kill-switch — new pg_trickle.cdc_paused GUC (boolean, default off) pauses all CDC capture at the trigger level without dropping triggers (A07).
  • Force-full-refresh GUC — pg_trickle.force_full_refresh (boolean, default off) forces all stream tables to use FULL refresh mode for a debugging/recovery window (A08).
  • FULL-fallback NOTICE — a NOTICE is emitted every time differential refresh falls back to FULL refresh, including the reason string (A22).
  • Shadow-ST catalog columns — in_shadow_build and shadow_table_name columns added to pgtrickle.pgt_stream_tables (UX-SHADOW).
  • History start_time index — pgt_refresh_history_start_time_idx (start_time DESC) for faster SLA queries and retention pruning (A11).
  • Relay ENV-var interpolation — connection strings support ${ENV:VAR_NAME} placeholders that are expanded from the process environment at startup (A30).
  • Relay reconnect backoff — the relay now retries failed PostgreSQL connections with exponential backoff (initial 100 ms, max 30 s, ±20 % jitter) (A38).
  • Relay backpressure — new sink_max_inflight config field (default 1 000 messages) that can be used to pause upstream polling (A39).
  • Notify coalesce GUC — pg_trickle.notify_coalesce_ms (integer, default 250 ms) reserved for future NOTIFY debounce (UX-GUC).
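
For example, a dashboard backend can subscribe a stream table to a NOTIFY channel (names illustrative):

SELECT pgtrickle.subscribe('active_orders', 'active_orders_changed');
LISTEN active_orders_changed;
-- Every non-empty refresh cycle now notifies this channel.
SELECT * FROM pgtrickle.list_subscriptions();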

Correctness fixes

  • A01 / EC-01: cross-cycle phantom-row cleanup now runs unconditionally after every join differential refresh cycle (batch 1 024 rows) instead of being deferred. This eliminates phantom residual rows that accumulated over multi-cycle windows (A01).
  • A05: join_and_predicates() no longer panics on an empty predicate list — now returns Result<Expr, PgTrickleError> (A05).

Performance

  • History prune batch size raised from 1 000 → 10 000 rows per transaction to reduce pruning lag on busy clusters (A10).
  • Citus lease jitter: try_acquire_st_lock() adds 50–500 ms random backoff on INSERT conflict to prevent coordinator thundering herd (A13).

Developer experience

  • Unit tests added for inbox_is_my_partition (A06) and outbox_table_name_for (A06).
  • Relay config tests added for ${ENV:VAR_NAME} expansion.

Upgrade notes

Run the upgrade migration:

ALTER EXTENSION pg_trickle UPDATE TO '0.35.0';

The migration adds pgt_refresh_history_start_time_idx, creates pgtrickle.pgt_subscriptions, and adds in_shadow_build / shadow_table_name columns to pgtrickle.pgt_stream_tables. All DDL is idempotent.


[0.34.0] — Citus: Automated Distributed CDC Scheduler & Shard Recovery

v0.33.0 shipped all the Citus distributed CDC infrastructure — per-worker WAL slots, pgt_st_locks coordination, poll_worker_slot_changes, and handle_vp_promoted. v0.34.0 closes the final gap: the scheduler is now fully aware of distributed sources and drives the per-worker slot lifecycle automatically, making distributed stream tables completely hands-off.

What's new

  • Automated scheduler integration (COORD-10, COORD-11, COORD-12): When a stream table source has source_placement = 'distributed', the scheduler now calls ensure_worker_slot() on the first tick (and after rebalances), calls poll_worker_slot_changes() to drain per-worker WAL changes into the local buffer, and acquires/extends/releases a pgt_st_locks lease around the entire operation.

  • Shard rebalance auto-recovery (COORD-13): The scheduler detects pg_dist_node topology changes by comparing active primaries against pgt_worker_slots. When a change is detected, stale slot entries are dropped, new worker slots are inserted, and the stream table is marked for a full refresh — no operator intervention needed.

  • Worker failure handling (COORD-14): If poll_worker_slot_changes() fails for a worker, the error is logged and the worker is skipped for that tick. After pg_trickle.citus_worker_retry_ticks consecutive failures, a WARNING is emitted in the PostgreSQL log for operator attention. Refreshes against healthy workers continue uninterrupted.

  • New GUC (COORD-15): pg_trickle.citus_worker_retry_ticks (default 5) — consecutive worker-poll failures before flagging in citus_status. Set to 0 to disable the alert.

  • Extended citus_status view (COORD-16): The view now includes last_polled_at (timestamp of the last successful poll for each worker slot), lease_holder, lease_acquired_at, lease_expires_at, and lease_health ('unlocked' / 'locked' / 'expired') columns for full operational visibility (example query below).
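
For instance, stalled workers or expired leases can be spotted directly in SQL (filter thresholds illustrative):

SELECT pgt_name, worker_name, last_polled_at, lease_holder, lease_health
FROM pgtrickle.citus_status
WHERE lease_health = 'expired'
   OR last_polled_at < now() - interval '5 minutes';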

Migration

No application-level changes required. The new scheduler behaviour activates automatically for stream tables with source_placement = 'distributed'. Operators using manual LISTEN + handle_vp_promoted() wiring can remove that code — it is now redundant (though harmless to leave in place).

Run ALTER EXTENSION pg_trickle UPDATE TO '0.34.0' to pick up the new last_polled_at column and extended citus_status view.


[0.33.0] — Citus: Distributed Source CDC & Stream Tables

This release delivers world-class incremental view maintenance over Citus distributed tables, and aligns with pg_ripple v0.58.0 Citus sharding support. pg_trickle can now track changes on distributed source tables and write results to distributed output tables, while leaving all non-Citus code paths completely unchanged.

pg_ripple Citus Co-location Helper

New: pgtrickle.handle_vp_promoted(payload TEXT) → BOOLEAN

Processes a pg_ripple.vp_promoted NOTIFY payload emitted by pg_ripple v0.58.0 when a VP table is distributed via Citus. Call this from any regular backend session that is LISTENing to pg_ripple.vp_promoted:

LISTEN "pg_ripple.vp_promoted";
-- … receive notification …
SELECT pgtrickle.handle_vp_promoted(:'NOTIFY_PAYLOAD');

The function:

  • Parses the payload JSON (table, shard_count, shard_table_prefix, predicate_id).
  • Logs the promotion details.
  • When the promoted table matches an active distributed CDC source in pgt_change_tracking, signals the scheduler to probe worker slots on the next tick without a full catalog scan.
  • Returns true if a matching source was found, false otherwise.

docs/integrations/citus.md gains a new pg_ripple Integration section covering co-location DDL, the vp_promoted notification contract, and guidance on aligning pgt_st_locks lease expiry with pg_ripple.merge_fence_timeout_ms.

Distributed stream table output

create_stream_table() gains a new optional parameter output_distribution_column. When provided, and Citus is installed, the output storage table is converted to a Citus distributed table on that column immediately after creation. Existing call sites without the parameter are unaffected.

-- Co-locate the stream table with the source shards
SELECT pgtrickle.create_stream_table(
    name                       => 'orders_summary',
    query                      => 'SELECT customer_id, count(*) FROM orders GROUP BY 1',
    output_distribution_column => 'customer_id'
);

Per-worker WAL slot tracking (pgt_worker_slots)

A new catalog table pgtrickle.pgt_worker_slots records the logical replication slot name and last-consumed frontier for each Citus worker node per source table. This enables per-worker CDC polling and accurate lag monitoring across all nodes in the cluster.

Cross-node refresh coordination (pgt_st_locks)

A new catalog table pgtrickle.pgt_st_locks provides lightweight distributed mutex semantics using INSERT … ON CONFLICT DO NOTHING. This replaces advisory locks for distributed stream table refreshes, ensuring that only one coordinator node applies changes at a time across a multi-coordinator Citus setup.
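
The pattern is first-writer-wins. Below is a sketch of the acquisition step, with illustrative column names rather than the actual catalog definition:

-- Attempt to acquire the refresh lease for one stream table
INSERT INTO pgtrickle.pgt_st_locks (pgt_id, holder, expires_at)
VALUES (42, 'coordinator-a', now() + interval '30 seconds')
ON CONFLICT DO NOTHING;
-- One row inserted: this node holds the lease. Zero rows: another node does.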

Citus observability view (citus_status)

SELECT * FROM pgtrickle.citus_status returns one row per (stream table, source, worker) combination, showing the coordinator slot, worker slot name, last consumed LSN, and source placement type. Use this view to monitor replication lag and detect unreachable workers.

SELECT pgt_name, worker_name, worker_port, worker_slot, worker_frontier
FROM pgtrickle.citus_status;

Correct apply path for distributed stream tables

Citus blocks cross-shard MERGE statements. pg_trickle now automatically detects distributed output stream tables and switches to a DELETE + INSERT … ON CONFLICT DO UPDATE apply path, which Citus supports natively. Single-node and reference-table stream tables continue to use the existing MERGE path.

Pre-flight checks for Citus clusters

Two new pre-flight check functions are available via the Rust API:

  • check_citus_version_compat() — verifies that all worker nodes are running the same pg_trickle version as the coordinator. Returns an error listing any mismatched workers.
  • check_worker_wal_levels() — verifies that wal_level = logical is configured on all worker nodes. Returns an error if any worker has a lower WAL level, preventing silent slot-creation failures.

Per-worker CDC helpers

The poll_worker_slot_changes() function drains a logical replication slot on a remote Citus worker via dblink and writes the decoded changes into the coordinator's local change buffer. The ensure_worker_slot() function creates the slot if it does not already exist, making the setup idempotent on every scheduler tick.

Citus integration guide

A new documentation page at docs/integrations/citus.md covers prerequisites, installation, placement options, the observability view, known failure modes (unreachable workers, recycled WAL slots, shard rebalancing), and performance considerations.

Upgrade

Run the standard extension upgrade. The migration script adds the three new catalog objects (pgt_st_locks, pgt_worker_slots, citus_status) and replaces the three create_stream_table function signatures with versions that include the new output_distribution_column parameter. Existing call sites without the new parameter continue to work without change.

ALTER EXTENSION pg_trickle UPDATE TO '0.33.0';

[0.32.0] — Citus: Stable Naming & Per-Source Frontier Foundation

This release lays the foundation for world-class Citus support by replacing OID-based internal object names with stable hash-derived names and adding Citus cluster detection helpers.

Stable internal object naming

pg_trickle now names every internal object (change buffer tables, trigger functions, WAL replication slots, publication names) using a short 16-character hex string derived from the schema-qualified source table name:

changes_a3f7b2c1d0e5f9a8       -- was: changes_12345
pgt_cdc_fn_a3f7b2c1d0e5f9a8    -- was: pgt_cdc_fn_12345
pgtrickle_a3f7b2c1d0e5f9a8     -- was: pgtrickle_12345

This name is identical on every Citus node, survives pg_dump/restore cycles, and survives OID reassignment after a major-version upgrade. Existing installations are upgraded automatically by the migration script — all existing objects are renamed in a single transaction with no downtime.

The change is invisible to end users: no SQL API changes, no configuration changes, no behaviour changes on single-node PostgreSQL.

Citus cluster detection

A new internal module (src/citus.rs) provides helpers to detect whether Citus is loaded and how a given source table is distributed (local, reference, or distributed). This information is stored in the catalog and will drive per-node CDC and apply strategies in v0.35.0.

New catalog columns

Three catalog tables gain new columns:

  • pgtrickle.pgt_stream_tables: st_placement TEXT DEFAULT 'local'
  • pgtrickle.pgt_dependencies: source_stable_name TEXT, source_placement TEXT DEFAULT 'local'
  • pgtrickle.pgt_change_tracking: source_stable_name TEXT, source_placement TEXT DEFAULT 'local', frontier_per_node JSONB

New SQL function

pgtrickle.source_stable_name(oid) → TEXT — returns the 16-character stable hash for any source relation by OID. Useful for diagnostics.
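
For example, to map a source table to the change-buffer and trigger-function names derived from it (hash value illustrative):

SELECT pgtrickle.source_stable_name('public.orders'::regclass);
-- → 'a3f7b2c1d0e5f9a8', backing changes_a3f7b2c1d0e5f9a8 and pgt_cdc_fn_a3f7b2c1d0e5f9a8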

Upgrade notes

The 0.31.0 → 0.32.0 migration script handles all object renames automatically. Replication slots are renamed if the PostgreSQL version is 15+; on older versions a manual rename step is logged as a NOTICE. Existing change buffer data is preserved — only the table and function names change.


[0.31.0] — Performance & Scheduler Intelligence

This release delivers measurable performance improvements for deployments with many stream tables, along with new tools for monitoring scheduler behaviour and reacting to processing backlogs before they become a problem.

Faster immediate-mode updates

Stream tables configured in immediate mode — which update on every data change rather than on a schedule — now handle those changes more efficiently. Previously, every single data change caused PostgreSQL to create and destroy a temporary table in the background, a fixed cost that adds up at high write rates. That overhead has been eliminated.

This improvement is opt-in. Enable it with pg_trickle.ivm_use_enr = true (requires PostgreSQL 18+).

Fewer database round-trips for shared sources

When multiple stream tables all read from the same source table, pg_trickle now scans their pending changes in a single database pass instead of once per stream table. If you have ten stream tables all watching the same orders table, pg_trickle makes one read instead of ten. The benefit scales with the number of stream tables. This is on by default.

Smarter update-strategy hints

Every refresh, pg_trickle chooses between two strategies for applying changes: a merge approach (efficient for small change sets) and a delete-then-reinsert approach (faster when large portions of the data have changed). Enabling pg_trickle.adaptive_merge_strategy now logs a suggestion after each refresh indicating whether the current strategy is optimal, based on the ratio of changes to total rows. This makes performance tuning straightforward — no restarts or code changes required.

Silent fallbacks are now visible

When pg_trickle encounters a problem analysing certain query types, it falls back to a slower, more conservative update mode. Previously this was invisible. The count of such fallbacks is now tracked and surfaced in pgtrickle.metrics_summary() under ivm_lock_parse_error_count, so you can spot and address the underlying cause.

Back-pressure alerts for overloaded pipelines

If data is arriving faster than pg_trickle can process it, the change buffer grows. pg_trickle now watches this and, after 3 consecutive cycles above the alert threshold (configurable via pg_trickle.backpressure_consecutive_limit), raises a change_buffer_backpressure alert. Applications or monitoring systems can listen for this event and respond — for example by slowing producers or adding consumers.

Coming soon: cross-database refresh coordination

A detailed design for a future cross-database refresh coordinator has been published in docs/research/multi_db_refresh_broker.md. Implementation is planned for v0.32.0.

What changed

  • Error messages are now categorised by standard SQL error code by default, making them easier to parse in automated monitoring. The previous behaviour can be restored with pg_trickle.use_sqlstate_classification = false.

New settings

  • pg_trickle.ivm_use_enr (default off) — Eliminate temporary-table overhead in immediate mode (PostgreSQL 18+ only).
  • pg_trickle.adaptive_batch_coalescing (default on) — Scan change buffers for shared sources in a single pass.
  • pg_trickle.adaptive_merge_strategy (default off) — Log update-strategy suggestions after each refresh.
  • pg_trickle.backpressure_consecutive_limit (default 3) — Consecutive over-threshold cycles before raising a back-pressure alert.

Upgrade

Run ALTER EXTENSION pg_trickle UPDATE TO '0.31.0'; — no manual changes required. The faster immediate-mode path is opt-in; set pg_trickle.ivm_use_enr = true to enable it.


[0.30.0] — Pre-GA Correctness & Stability Sprint

This release is focused entirely on correctness and stability in preparation for the 1.0 release. There are no new user-facing features — every change is a fix, a safety guard, or a memory efficiency improvement.

Fixed: phantom rows in join-based stream tables

Stream tables that join multiple source tables could silently accumulate stale rows over time when a refresh was interrupted part-way through. Those rows are now cleaned up automatically after every refresh, ensuring the result always converges to the correct answer.

Fixed: incorrect results for complex query patterns

Subqueries nested inside CASE expressions, COALESCE calls, and function arguments are now correctly detected and handled. Previously, stream tables using these patterns could produce wrong incremental refresh results.

Safer snapshots

Snapshot creation and restore are now fully atomic. If anything goes wrong mid-operation — a disk error, a timeout, a lost connection — the operation is cleanly rolled back and no partial tables are left behind.

Restoring from a snapshot no longer relies on PostgreSQL's internal column ordering, making restores safe across different PostgreSQL minor versions.

Bounded memory for in-flight update data

The internal cache that stores update data between steps was previously unbounded. On deployments with many stream tables, it could grow large over time. The cache now enforces a configurable maximum and evicts the oldest entries when full, keeping memory usage predictable.

Additionally, cached query templates now expire after a configurable age (default: 7 days). Old plans are automatically removed during background maintenance, preventing stale query plans from accumulating.

Complexity cap for queries

A new pg_trickle.max_parse_nodes setting lets you cap query complexity. Queries that exceed the limit are rejected immediately with a clear error instead of consuming unexpected memory.

New settings

  • pg_trickle.use_sqlstate_classification (default off) — Categorise errors by SQL error code (useful for automated retry logic).
  • pg_trickle.template_cache_max_age_hours (default 168, i.e. 7 days) — Evict cached query plans older than this.
  • pg_trickle.max_parse_nodes (default 0, disabled) — Reject queries that exceed this complexity limit.

Upgrade

No schema changes. Upgrade from v0.29.0 with:

ALTER EXTENSION pg_trickle UPDATE TO '0.30.0';

[0.29.0] — Relay CLI (pgtrickle-relay)

This release introduces pgtrickle-relay — a standalone companion tool that connects pg_trickle to the outside world.

What is pgtrickle-relay?

The relay bridges pg_trickle's inbox and outbox tables with external messaging systems, handling the reliable "last mile" of getting data in and out of your database.

  • Forward (outbox → external): Watches your pg_trickle outbox tables and forwards new records to external systems as they arrive. Supported destinations include Kafka, NATS, HTTP webhooks, Redis Streams, AWS SQS, RabbitMQ, and plain text output.
  • Reverse (external → inbox): Reads messages from external systems and writes them into your pg_trickle inbox tables, enabling fully bidirectional event-driven pipelines.

Configured entirely through SQL

There are no YAML files or config files to manage. You set up and manage relay pipelines with SQL:

  • pgtrickle.set_relay_outbox(...) — Configure an outbox-to-external pipeline.
  • pgtrickle.set_relay_inbox(...) — Configure an external-to-inbox pipeline.
  • pgtrickle.enable_relay(name) — Start a relay pipeline.
  • pgtrickle.disable_relay(name) — Pause a relay pipeline.
  • pgtrickle.delete_relay(name) — Remove a relay pipeline.
  • pgtrickle.list_relay_configs() — List all configured pipelines.

Built for reliability

  • No duplicate messages: Every destination uses a deduplication key to prevent the same message from being delivered more than once, even if the relay restarts mid-send.
  • High availability: Multiple relay instances can run simultaneously and coordinate automatically using database-level locks — no external coordination service such as ZooKeeper or Redis is needed.
  • Live config updates: Change relay configuration in SQL and it takes effect within seconds, with no restart.
  • Built-in monitoring: Health check at /health and Prometheus metrics at /metrics (port 9090 by default).

Upgrade notes

ALTER EXTENSION pg_trickle UPDATE TO '0.29.0';

The relay binary is distributed separately (see Dockerfile.relay). Existing stream tables, views, and outbox/inbox APIs are unchanged.


[0.28.0] — Transactional Inbox & Outbox Patterns

This release adds two complementary patterns for reliably integrating pg_trickle with external systems.

The problem these patterns solve

When you update a database and need to notify an external system — a message queue, an API, a downstream service — you face a reliability challenge: what happens if the database update succeeds but the notification fails? You can end up with data in your database that the external system never heard about, or a notification sent for a change that was rolled back.

The outbox pattern solves this: the notification is written in the same database transaction as the data change, so they either both succeed or both fail. pg_trickle then delivers the notification reliably once the transaction has committed.

The inbox pattern is the reverse: external messages arrive into a managed queue inside PostgreSQL, where they can be processed reliably, retried on failure, and replayed if needed.

Outbox

Enable the outbox on any stream table with pgtrickle.enable_outbox(). After each refresh, pg_trickle writes a record to a dedicated outbox table. Your application or the relay tool picks it up from there and forwards it to external consumers.

Consumers can work in named consumer groups — similar to Kafka consumer groups. Each consumer tracks its own position in the stream independently and can be replayed, paused, or have its lease extended without affecting others.

  • pgtrickle.enable_outbox(name, retention_hours) — start capturing refresh output for external delivery.
  • pgtrickle.disable_outbox(name) — stop capturing.
  • pgtrickle.outbox_status(name) — see the current outbox state.
  • pgtrickle.outbox_rows_consumed(stream_table, outbox_id) — acknowledge that records have been delivered.
  • pgtrickle.create_consumer_group(name, outbox, ...) — create a named group of consumers.
  • pgtrickle.drop_consumer_group(name) — remove a consumer group.
  • pgtrickle.poll_outbox(group, consumer, batch_size, ...) — claim the next batch of records.
  • pgtrickle.commit_offset(group, consumer, last_offset) — acknowledge processed records.
  • pgtrickle.extend_lease(group, consumer, ...) — hold onto a batch longer before it times out.
  • pgtrickle.seek_offset(group, consumer, new_offset) — jump to a specific position (for replay).
  • pgtrickle.consumer_heartbeat(group, consumer) — signal that a consumer is still alive.
  • pgtrickle.consumer_lag(group) — see how far behind each consumer is.
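
A minimal consumer-loop sketch using the functions above (the group, consumer, outbox, and offset values are illustrative):

SELECT pgtrickle.create_consumer_group('billing', 'order_summary');  -- hypothetical outbox name
SELECT * FROM pgtrickle.poll_outbox('billing', 'worker-1', 100);     -- claim up to 100 records
-- ...deliver the claimed records to the external system, then acknowledge:
SELECT pgtrickle.commit_offset('billing', 'worker-1', 100);          -- illustrative offset
SELECT * FROM pgtrickle.consumer_lag('billing');                     -- check the backlog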

Inbox

Create a named inbox with pgtrickle.create_inbox(). pg_trickle automatically sets up a pending queue, a dead-letter queue (for messages that could not be processed), and a stats table.

  • pgtrickle.create_inbox(name, ...) — create a managed inbox with pending queue and dead-letter queue.
  • pgtrickle.drop_inbox(name, ...) — remove an inbox.
  • pgtrickle.enable_inbox_tracking(name, ...) — attach inbox tracking to an existing table.
  • pgtrickle.inbox_health(name) — get a health summary for an inbox.
  • pgtrickle.inbox_status(name) — show queue depths and processing stats.
  • pgtrickle.replay_inbox_messages(name, event_ids) — reset specific messages for re-processing.

Additional inbox capabilities:

  • Ordered processing: pgtrickle.enable_inbox_ordering() ensures messages for the same entity (e.g. the same customer or order ID) are processed in sequence, eliminating race conditions without any extra coordination in your application.
  • Priority tiers: pgtrickle.enable_inbox_priority() marks messages as high or low priority so the scheduler processes urgent messages first.
  • Horizontal scaling: pgtrickle.inbox_is_my_partition() provides consistent hash-based partition assignment for multi-worker inbox processing. Multiple workers can safely share an inbox without an external coordinator.
  • Gap detection: pgtrickle.inbox_ordering_gaps() surfaces any sequence gaps per entity so you can detect and recover from missing messages.
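
A minimal sketch of creating and inspecting an inbox (the inbox name is illustrative; optional parameters are omitted):

SELECT pgtrickle.create_inbox('payment_events');
-- external messages land in the pending queue (via pgtrickle-relay or your own writer)
SELECT * FROM pgtrickle.inbox_status('payment_events');  -- queue depths and processing stats
SELECT * FROM pgtrickle.inbox_health('payment_events');  -- overall health summary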

New settings

  • pg_trickle.outbox_enabled (default: on) — enable the outbox subsystem.
  • pg_trickle.outbox_retention_hours (default: 24) — how long to keep delivered outbox records.
  • pg_trickle.outbox_drain_batch_size (default: 1000) — records to process per drain pass.
  • pg_trickle.outbox_skip_empty_delta (default: on) — skip writing an outbox record when there are no changes.
  • pg_trickle.consumer_dead_threshold_hours (default: 24) — hours before a silent consumer is considered dead.
  • pg_trickle.inbox_enabled (default: on) — enable the inbox subsystem.
  • pg_trickle.inbox_processed_retention_hours (default: 72) — how long to keep processed inbox records.
  • pg_trickle.inbox_dlq_alert_max_per_refresh (default: 10) — alert when this many messages land in the dead-letter queue in one cycle.

Upgrade

ALTER EXTENSION pg_trickle UPDATE TO '0.28.0';

[0.27.0] — Operability, Observability & DR

This release focuses on three areas: disaster recovery tooling, better visibility into multi-database deployments, and a more reliable built-in metrics server.

Snapshot and restore

You can now export a stream table's current data to an archive table and restore it later. This is useful for bootstrapping a new read replica without a full database dump, taking a point-in-time snapshot before a risky migration, or recovering a stream table to a known-good state.

  • pgtrickle.snapshot_stream_table(name, target) — export a stream table to an archive table.
  • pgtrickle.restore_from_snapshot(name, source) — restore from an archive table.
  • pgtrickle.list_snapshots(name) — list available snapshots with size and age.
  • pgtrickle.drop_snapshot(snapshot_table) — delete a snapshot.

Restore aligns the stream table's internal progress marker with the snapshot, so incremental refresh resumes correctly without any manual steps.
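
For example, a minimal sketch around a risky migration (the table names are illustrative):

SELECT pgtrickle.snapshot_stream_table('active_orders', 'active_orders_backup');
-- ...run the migration; if it goes wrong, return to the known-good state:
SELECT pgtrickle.restore_from_snapshot('active_orders', 'active_orders_backup');
SELECT * FROM pgtrickle.list_snapshots('active_orders');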

Predictive schedule recommendations

pg_trickle now analyses its own refresh history and recommends optimal refresh intervals for each stream table.

  • pgtrickle.recommend_schedule(name) returns a suggested interval and a confidence score. Confidence is low on new deployments and rises as history accumulates (at least 20 samples are needed before the score is meaningful). See the sketch after this list.
  • pgtrickle.schedule_recommendations() returns recommendations for all stream tables in one call.
  • A predicted_sla_breach alert fires when the model predicts the next refresh is likely to miss your freshness target by more than 20%. The alert fires at most once every 5 minutes by default, to avoid flooding.
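
A minimal usage sketch (the stream-table name is illustrative):

SELECT * FROM pgtrickle.recommend_schedule('active_orders');  -- one table
SELECT * FROM pgtrickle.schedule_recommendations();           -- all tables at once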

Cluster-wide worker visibility

In deployments that run pg_trickle across multiple databases, pgtrickle.cluster_worker_summary() shows which databases are consuming background workers. This makes it easy to diagnose situations where one database is crowding out others.

All Prometheus metrics now include database-level labels, so you can split a single Grafana panel by database.

Metrics server improvements

  • A new pgtrickle.metrics_summary() SQL function returns cluster-wide refresh and error counts — useful for monitoring without a Prometheus scraper.
  • Port conflicts now produce a clear error message instead of failing silently.
  • Malformed HTTP requests now return a proper 400 Bad Request response.

New settings

  • pg_trickle.schedule_recommendation_min_samples (default: 20) — minimum history samples before schedule confidence is meaningful.
  • pg_trickle.schedule_alert_cooldown_seconds (default: 300) — minimum seconds between consecutive predicted_sla_breach alerts.
  • pg_trickle.metrics_request_timeout_ms (default: 5000) — maximum time, in milliseconds, the metrics server waits for a request.

Upgrade

This release upgrades the internal pgrx library to 0.18.0. This is transparent to users. Run:

ALTER EXTENSION pg_trickle UPDATE TO '0.27.0';

[0.26.0] — Test & Concurrency Hardening

This release is all about making pg_trickle more reliable and battle-tested. There are no new SQL commands or user-facing features — every change is internal: more tests, safer concurrent operations, cleaner code structure, and better error messages.

Safer under concurrent load

Running multiple operations at the same time — such as modifying a stream table while it's actively refreshing, or dropping a table while its workers are still running — is now explicitly tested and guaranteed to be safe. These scenarios were handled before, but lacked the tests to prove it. That proof is now part of every build.

  • Simultaneous alter + refresh no longer risks a deadlock. The catalog stays consistent throughout.
  • Drop during refresh aborts cleanly — no orphaned change buffers or dangling catalog rows left behind.
  • Parallel scheduler workers are prevented from picking the same stream table for refresh at the same time — a hard guarantee, not just a convention.
  • Simultaneous buffer promotion — when two workers race to promote a change buffer, exactly one succeeds and the metadata stays consistent.

More stable SLA-based scheduling

The scheduler uses a predictive model to decide when to refresh stream tables, balancing your freshness targets against system load. That model now holds its ground under difficult workloads.

  • Bursty, sawtooth, and spike workloads are all validated in a new dedicated test suite.
  • No more tier flapping — the priority tier of a stream table (which controls how aggressively it is refreshed) now requires 3 consecutive breaches before downgrading and 3 consecutive successes before upgrading. This prevents the system from oscillating at the boundary, which caused unnecessary refresh churn in earlier releases.
  • A 10,000-iteration randomised stress test confirms the tier stays stable even under adversarial latency patterns.

Fuzz testing and extreme-scale validation

The extension is now tested against malformed, random, and adversarial inputs in three new fuzz test areas, preventing certain classes of unexpected input from crashing the extension:

  • Invalid cron schedule expressions
  • Unrecognised or malformed configuration values
  • Unexpected row shapes in change-capture triggers

Two new scale tests verify behaviour at extremes:

  • A source table with 1,000 partitions installs change-capture triggers and completes its first refresh within 60 seconds.
  • A flooded worker pool does not starve high-priority stream tables in a second database — multi-database fairness is enforced under load.

Cleaner internals: refresh module reorganised

The refresh orchestrator had grown into a single very large file. It has been split into three focused modules with no behaviour change:

  • orchestrator — decides when and how to refresh: timing, cost model, recovery.
  • codegen — builds the SQL queries and manages the query cache.
  • merge — executes the actual refresh: incremental, full, or TopK.

Better error messages

Error messages throughout the extension now include more context — table names, operation types, and hints such as "check system clock" on timestamp failures. This makes it easier to diagnose problems from logs alone.

A new crash-recovery test verifies that a publication subscriber that was active when the database was killed catches up with zero data loss after restart.


[0.25.0] — Scheduler Scalability & Pooler Performance

pg_trickle now comfortably manages thousands of stream tables on commodity hardware — a significant jump from the practical ceiling of a few hundred in earlier releases. The scheduler avoids reloading the full catalog on every tick, change detection is batched into far fewer database round-trips, and a new cache-sharing mechanism means connecting backends can skip expensive query re-parsing entirely. If you use a connection pooler such as PgBouncer, RDS Proxy, or Supabase Pooler, this release delivers the largest latency improvement to date.

Scales to thousands of stream tables

Previously, the scheduler queried the catalog on every tick — a process that grew slower as the stream table count increased. Metadata is now cached per backend and only reloaded when the dependency graph actually changes. Checking whether source tables have new rows is batched across an entire refresh group into a single query, down from one query per source per tick. Dependency-graph rebuilds now happen in the background without blocking ongoing refreshes, so you never get a stall when a stream table is created or dropped.

New GUC: pg_trickle.worker_pool_size (default 0 = spawn-per-task). Set this to a positive number to keep that many background workers running permanently, eliminating roughly 2 ms of spawn overhead per worker on high-throughput deployments.

Faster connections through poolers

A new shared-memory signal lets each connecting backend check whether the query-template cache is already warm. If it is, the backend skips query parsing entirely and jumps straight to the cached result. This matters most in pooled environments — PgBouncer, RDS Proxy, Supabase — where backends connect and disconnect frequently and re-parsing on every connection was a hidden cost.

The per-backend template cache is now bounded by pg_trickle.template_cache_max_entries (default 0 = unbounded). When the limit is reached, the least-recently-used entry is evicted automatically, keeping memory usage predictable on servers with many concurrent backends.

A new SQL function, pgtrickle.clear_caches(), flushes all cache levels in one call — useful after schema changes or when debugging unexpected behaviour.

Lower overhead on high-write workloads

Change fingerprinting — the hashing that identifies which rows changed — now streams values directly into the hash function instead of building a temporary string per row, eliminating one heap allocation per incoming change. SQL buffers in the query-projection step are pre-sized rather than repeatedly concatenated. Refresh timing data (how long full and incremental refreshes take) is stored in shared memory so parallel workers can read it without a catalog round-trip.

More conservative refresh-mode predictions

The predictive model that decides when to fall back from incremental to full refresh is now more stable. It waits for at least 60 seconds of history before making any prediction — preventing erratic switches on fresh deployments — removes statistical outliers before fitting, and keeps its output within a reasonable band around recent observed timings.

Subscriber lag tracking for downstream publications

If you use stream_table_to_publication() to feed a downstream system, pg_trickle now monitors how far behind each subscriber's replication slot has fallen. When a subscriber exceeds pg_trickle.publication_lag_warn_bytes, a warning is logged and change-buffer cleanup is paused for that slot until it catches up — preventing data loss for slow consumers.

A new SQL function, pgtrickle.worker_allocation_status(), returns per-database worker usage, quotas, and queue depth across the cluster. Useful for diagnosing scheduler starvation in multi-tenant deployments.

Upgrade notes

  • Row ID change: The internal hash function changed from xxh64 to xxh3. If your application relies on stable pg_trickle row ID values across versions, run SELECT pgtrickle.reinitialize('<schema>.<table>') on each affected stream table after upgrading.
  • No schema changes beyond two new SQL functions (clear_caches and worker_allocation_status). No data migration required.

[0.24.0] — Join Correctness & Durability Hardening

This release focuses on two themes: correctness — ensuring stream tables that join multiple source tables always give you the right answer — and durability — ensuring your data is never lost or skipped, even when the server crashes or long-running transactions are in flight.

More accurate results from multi-table joins

When a stream table combines rows from two or more source tables, pg_trickle now guarantees that an incremental refresh produces exactly the same result as a full recompute from scratch. A subtle bug in how rows were tracked across refresh cycles could previously cause phantom rows to accumulate silently over time. Those phantom rows are now detected automatically after every incremental refresh and cleaned up.

No data loss across crashes or restarts

pg_trickle now records its progress in a crash-safe sequence: it saves its intent before writing data, then marks completion afterwards. If the server goes down between those two steps, pg_trickle reconciles its position on restart — no changes are processed twice and none are silently dropped. The scheduler also persists its last known-safe position across restarts, closing a narrow gap that existed in earlier versions.

Long-running transactions no longer cause missed changes

If a transaction stays open while pg_trickle is running a refresh, changes committed by that transaction could previously be overlooked — captured before the refresh started, but not yet visible to it. pg_trickle now checks for open transactions before advancing its read position and waits for them to commit first. Two settings control this behaviour:

  • pg_trickle.frontier_holdback_mode — controls the holdback behaviour.
  • pg_trickle.frontier_holdback_warn_seconds (default 60) — logs a warning when a transaction has been blocking progress longer than this threshold.

Works correctly on managed cloud databases

AWS RDS, Cloud SQL, and Azure Database for PostgreSQL restrict access to certain monitoring views. pg_trickle now detects this automatically and tells you exactly what to do:

GRANT pg_monitor TO <your_pg_trickle_role>;

Without this grant, pg_trickle previously behaved as if no transactions were open — the same unsafe condition the holdback feature was built to prevent. See docs/TROUBLESHOOTING.md section 14 for full diagnosis steps.

Choose your durability level

The new pg_trickle.change_buffer_durability setting controls how carefully incoming changes are stored before processing:

  • unlogged (default) — fastest; change buffers do not survive a server crash.
  • logged — survives crashes and replicates to standby servers.
  • sync — maximum safety; every write is confirmed to disk before continuing.
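
For example, to keep change buffers across crashes at some write-performance cost (assuming the setting can be applied with a reload rather than a restart):

ALTER SYSTEM SET pg_trickle.change_buffer_durability = 'logged';
SELECT pg_reload_conf();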

Automatic history clean-up

Old refresh history rows are now pruned automatically in small background batches during idle time. Previously the history table grew without bound, which could become noticeable on busy deployments.

Alerts for frozen stream tables

The new pgtrickle.df_frozen_stream_tables view flags any stream table that has not refreshed within 5× its expected interval, and sends a notification on the pgtrickle_alert channel. Useful for catching a stuck or disabled stream table before users notice stale data.

New monitoring metrics

Two new Prometheus metrics expose holdback state:

  • pg_trickle_frontier_holdback_lsn_bytes — how far behind the read position is being held, in bytes of WAL.
  • pg_trickle_frontier_holdback_seconds — how long the oldest blocking transaction has been running.

Note: All metrics now use the pg_trickle_ prefix consistently. If your dashboards or alerting rules use the old pgtrickle_ prefix, update them before upgrading.


[0.23.0] — Performance Tuning & Diagnostics

This release gives you better tools to understand and control how pg_trickle performs, with new settings for memory tuning and new functions for inspecting what the extension is doing under the hood.

See exactly what SQL is running

Turn on pg_trickle.log_delta_sql and pg_trickle will log the SQL it generates for each incremental refresh. You can paste that SQL directly into EXPLAIN ANALYZE to understand why a particular refresh is taking longer than expected — no code changes required.

Tune memory for refreshes without restarting

pg_trickle.delta_work_mem lets you give incremental refresh queries more (or less) working memory without touching PostgreSQL's global settings or restarting the server. Apply it instantly with:

ALTER SYSTEM SET pg_trickle.delta_work_mem = 256;
SELECT pg_reload_conf();  -- take effect without a restart

Automatic statistics before each refresh

pg_trickle now runs a quick statistics pass on change buffers before executing an incremental refresh. This gives PostgreSQL's query planner accurate row counts and generally produces faster, more predictable query plans with no manual intervention. Controlled by pg_trickle.analyze_before_delta (on by default).

Warning when incremental is unexpectedly slower than full

If an incremental refresh takes longer than the last full refresh, pg_trickle now logs a warning that includes both timings. This surfaces scenarios where incremental refresh has become counterproductive so you can investigate and adjust thresholds.

Alert when too many changes pile up

Set pg_trickle.max_change_buffer_alert_rows to a row count and pg_trickle will warn you whenever any source table's pending change buffer exceeds that threshold. This is useful for catching unexpected write bursts before they slow down your refreshes.

Refresh timing statistics at a glance

The new pgtrickle.pgtrickle_refresh_stats() function returns per-stream-table refresh durations — average, 95th percentile, and 99th percentile — in a single query. No need to manually aggregate the history table.

Inspect generated SQL without running it

Call pgtrickle.explain_diff_sql(name) on any stream table to see the SQL pg_trickle would use for an incremental refresh — without actually executing it. Useful for understanding query structure and diagnosing performance issues.
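
For example (the stream-table name is illustrative):

SELECT pgtrickle.explain_diff_sql('active_orders');
-- paste the returned SQL into EXPLAIN ANALYZE to profile it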


[0.22.0] — Downstream CDC, Parallel Refresh & Predictive Cost Model

This release makes it easier to feed stream table changes to other systems, gives you a knob to control how many refreshes run at once, and adds automatic intelligence for choosing between incremental and full refresh.

Stream table changes can flow to other systems

stream_table_to_publication(name) creates a PostgreSQL logical replication publication for a stream table. Any downstream tool that understands PostgreSQL replication — Debezium, Kafka Connect, a read replica, or a custom consumer — can then subscribe and receive changes as they happen. Publications are removed automatically when the stream table is dropped. Use drop_stream_table_publication(name) to remove one manually.
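
A minimal sketch, assuming the function lives in the pgtrickle schema like the rest of the API. The subscription side is standard PostgreSQL logical replication; the connection string and publication name shown here are hypothetical (use whatever name stream_table_to_publication() reports):

SELECT pgtrickle.stream_table_to_publication('active_orders');
-- on the downstream PostgreSQL instance:
CREATE SUBSCRIPTION active_orders_sub
    CONNECTION 'host=primary dbname=app'
    PUBLICATION active_orders_pub;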

Control how many tables refresh at once

pg_trickle.max_parallel_workers caps the number of stream tables that can refresh simultaneously. The scheduler already runs independent refreshes in parallel; this setting gives you an explicit limit if you want to reserve database resources for your application.

Automatic mode switching based on predicted cost

pg_trickle now learns from your refresh history. Before each incremental refresh it predicts how long it will take based on recent timings. If that prediction exceeds 1.5× the cost of a full refresh, it switches to full refresh for that cycle automatically — no manual intervention needed. The lookback window, threshold, and minimum sample count are all configurable:

  • pg_trickle.prediction_window — how many recent refreshes to consider (default 60).
  • pg_trickle.prediction_ratio — how much more expensive incremental must be before switching to full (default 1.5).
  • pg_trickle.prediction_min_samples — minimum history before the model activates (default 5).

Set a freshness target and let pg_trickle handle the rest

Call set_stream_table_sla(name, interval) with your target maximum data age — for example '5 seconds' or '1 minute' — and pg_trickle assigns the most appropriate refresh tier automatically. It re-evaluates the assignment over time as real-world refresh performance changes.
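
For example, assuming the pgtrickle schema prefix used by the rest of the API (the name and target are illustrative):

SELECT pgtrickle.set_stream_table_sla('active_orders', '1 minute');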


[0.21.0] — Reliability, Safety & Operational Tools

This release focuses on making pg_trickle safer and easier to operate day-to-day. It eliminates hidden crash risks in the query analysis engine, adds new operational commands for maintenance windows, and introduces a built-in monitoring endpoint so you don't need extra software to observe pg_trickle.

The extension can no longer crash your database

When pg_trickle analyses a query internally, it previously had hidden error paths that could — in rare edge cases — abort a PostgreSQL backend process. All of those paths now return a structured error instead of crashing. Additionally, a compile-time rule now prevents production code from ever calling the Rust equivalent of an unchecked assertion, so this class of bug cannot be reintroduced silently.

Warning for queries that shouldn't use incremental refresh

If you create a stream table with a query that calls time-sensitive or non-deterministic functions such as now(), random(), or gen_random_uuid(), pg_trickle now warns you at creation time. Those functions produce a different result every time they run, which means incremental refresh would produce wrong answers — the warning lets you catch this before it becomes a data problem.

Pause and resume everything at once

Two new functions let you halt and restart all active stream tables with a single SQL call:

SELECT pgtrickle.pause_all();   -- stop all refreshes (e.g. before maintenance)
SELECT pgtrickle.resume_all();  -- restart them when you're done

Refresh only when the data is actually stale

pgtrickle.refresh_if_stale(name, max_age) triggers a refresh only if the stream table is older than your specified threshold. Returns TRUE when a refresh ran, FALSE when the data was already fresh enough. Useful for scripts and scheduled jobs that shouldn't over-refresh.
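
For example, in a cron job or script (the name and threshold are illustrative):

SELECT pgtrickle.refresh_if_stale('active_orders', '5 minutes');
-- returns TRUE if a refresh ran, FALSE if the data was already fresh enough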

Export a stream table's definition

pgtrickle.stream_table_definition(name) returns the complete CREATE STREAM TABLE statement for any stream table. Handy for documentation, disaster recovery playbooks, and migrations.

Test query changes safely before going live

A three-step canary workflow lets you try a new query on a shadow copy of your stream table and compare the results before committing to the change:

  1. canary_begin(name, new_query) — creates a shadow stream table running the new query in parallel with the original.
  2. canary_diff(name) — shows exactly which rows differ between the old and new queries.
  3. canary_promote(name) — atomically switches the live stream table to the new query once you are satisfied with the results.
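
A minimal sketch of the full workflow, assuming the pgtrickle schema prefix used by the rest of the API (the replacement query is illustrative):

SELECT pgtrickle.canary_begin('active_orders',
    'SELECT * FROM orders WHERE status IN (''active'', ''pending'')');
SELECT * FROM pgtrickle.canary_diff('active_orders');  -- inspect row-level differences
SELECT pgtrickle.canary_promote('active_orders');      -- switch over once satisfied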

Built-in monitoring endpoint

Set pg_trickle.metrics_port = 9188 and pg_trickle serves a Prometheus-compatible metrics endpoint directly — no extra exporter software needed. Metrics include total refreshes, failures, rows changed per refresh, and the number of active stream tables.

Visibility into recursive query fallbacks

When a query containing a recursive clause cannot be refreshed incrementally and falls back to a full refresh, pg_trickle now logs a notice and records the reason in refresh history. Previously this happened silently.

Upgrade

ALTER EXTENSION pg_trickle UPDATE TO '0.21.0';

[0.20.0] — Self Monitoring

pg_trickle now monitors itself. Instead of you having to check on pg_trickle's health manually, this release lets pg_trickle watch its own performance, spot problems early, and even fix some of them on its own. Five new stream tables sit in the pgtrickle schema and continuously analyse refresh history — the same technology you use for your own data, pointed inward. One SQL call sets everything up; one call tears it down.

We call this self monitoring — pg_trickle uses its own stream-table technology to keep an eye on itself, just like it keeps your data views up to date.
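
The whole lifecycle is three calls:

SELECT pgtrickle.setup_self_monitoring();          -- create the five monitoring stream tables
SELECT * FROM pgtrickle.self_monitoring_status();  -- check that they exist and are refreshing
SELECT pgtrickle.teardown_self_monitoring();       -- remove them again when no longer wanted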

What's new

  • One-click self-monitoring — run SELECT pgtrickle.setup_self_monitoring() and pg_trickle creates five monitoring stream tables that continuously track how well it is performing. Run teardown_self_monitoring() to remove them. Both are idempotent — safe to call as many times as you like, even during rolling upgrades.

  • Health at a glance — the new self_monitoring_status() function shows the status of all five monitoring views in one query: whether each one exists, its refresh mode, and the last time it refreshed. Quick to run from a monitoring script or dashboard.

  • Threshold recommendations — after enough refresh cycles accumulate (typically 10–20 minutes of activity), df_threshold_advice starts producing suggestions for each stream table. Each recommendation includes a confidence level (HIGH / MEDIUM / LOW) and a reason — for example, "DIFF is 73% faster — raise threshold to allow more DIFF". A sla_headroom_pct column shows exactly how much faster incremental refresh is versus full refresh for that table.

  • Automatic tuning — set pg_trickle.self_monitoring_auto_apply = 'threshold_only' and pg_trickle will apply HIGH-confidence threshold recommendations automatically. Changes are rate-limited to once per 10 minutes per stream table, and every adjustment is logged to pgt_refresh_history with initiated_by = 'SELF_MONITOR' so you have a full audit trail.

  • Real-time alerts — when pg_trickle detects an anomaly (duration spike exceeding 3× the baseline, or two or more recent failures), it sends a NOTIFY on the pgtrickle_alert channel with a JSON payload. Your application, Alertmanager webhook, or LISTEN client can act immediately without polling.

  • Scheduling interference detectiondf_scheduling_interference tracks pairs of stream tables that consistently overlap during refresh. When overlap is heavy, the scheduler automatically backs off its poll interval (up to 2× the configured base) to reduce contention.

  • Visual dependency graph — the new explain_dag() function renders your full refresh pipeline as a Mermaid or Graphviz DOT diagram. User stream tables appear in blue, self-monitoring tables in green, suspended tables in red. Paste the output into any Mermaid renderer or dot to see exactly how your tables depend on each other.

  • Scheduler overhead reportscheduler_overhead() returns metrics for the last hour: total refreshes, how many were self-monitoring, the fraction they represent, and average durations. Useful for confirming that self-monitoring adds negligible cost.

What pg_trickle watches

  • df_efficiency_rolling — rolling-window refresh speed, change ratio, DIFF vs FULL counts.
  • df_anomaly_signals — duration spikes (> 3× baseline), error bursts, mode oscillation.
  • df_threshold_advice — per-table threshold recommendations with confidence level and reasoning.
  • df_cdc_buffer_trends — change-capture buffer growth rate per source table; alerts on burst spikes.
  • df_scheduling_interference — refresh overlap patterns; pairs with 3+ concurrent refreshes in the last hour.

Faster and more reliable

  • A new index on pgt_refresh_history(pgt_id, start_time) speeds up all self-monitoring queries and general history lookups. Applied automatically during the 0.19.0 → 0.20.0 upgrade.
  • Old history records are now pruned in batches of 1,000 rows per transaction (previously one large DELETE), which avoids long lock holds on pgt_refresh_history during the nightly cleanup.
  • check_cdc_health() is enriched with spill-risk alerts: if a source table's max burst delta exceeds 10× its average, you get an early warning before the buffer fills.
  • explain_st() now shows two new properties: self_monitoring_coverage (none / partial / full) and recommended_refresh_mode, so diagnostics automatically surface self-monitoring data when it is available.

New documentation and tooling

  • SQL Reference — a new "Self-Monitoring" section covers all five stream tables, setup_self_monitoring(), teardown_self_monitoring(), confidence levels, and the sla_headroom_pct column.
  • Getting Started — a new "Day 2 Operations" section walks through enabling self-monitoring, reading recommendations, enabling auto-apply, and visualising the DAG.
  • Configurationpg_trickle.self_monitoring_auto_apply is fully documented with values, rate-limiting behaviour, and the audit trail.
  • A ready-made Grafana dashboard (pg_trickle_self_monitoring.json) with five panels covers refresh throughput, anomaly heatmap, threshold calibration, CDC buffer growth, and the scheduling interference matrix.
  • A dbt macro (pgtrickle_enable_monitoring) enables monitoring as a post-hook with one line in dbt_project.yml.
  • A quick-start SQL script at sql/self_monitoring_setup.sql walks through setup, auto-apply, alert listening, and status verification in six steps.

[0.19.0] — Security, Scheduler Performance & Operator Convenience

Safer, faster, easier to operate. This release closes several security and correctness gaps, adds new conveniences for operators and developers, and significantly improves performance for deployments with many stream tables. The background scheduler finds the next table to refresh 10–15× faster. Four breaking changes are included — all easy to adapt to, each one correcting behaviour that was a source of subtle bugs in production.

Breaking changes

  • Only owners can modify their own stream tables — other database users can no longer drop or alter a stream table they did not create. If shared access is intentional, grant superuser or explicitly add the user as owner. Superusers are unaffected.

  • Dropping a stream table no longer cascadesdrop_stream_table() now behaves like PostgreSQL's own DROP TABLE: it refuses to drop if dependent objects exist, unless you pass cascade => true explicitly. Previously it silently removed all dependents, which surprised operators after restructuring.

  • The refresh notification channel was renamed — change LISTEN pgtrickle_refresh to LISTEN pg_trickle_refresh (note the added underscore). The old name was inconsistent with every other channel in the extension.

  • The delete_insert refresh strategy was removed — this strategy could produce wrong results for queries containing aggregates or DISTINCT. If you had it configured, pg_trickle logs a warning and automatically switches to the safe auto strategy. No data is lost; the next refresh corrects any affected rows.

New features

  • Installation health checkversion_check() returns the installed extension version, the loaded library version, and the PostgreSQL server version in one row. If the extension was upgraded but the server has not been restarted, you get an explicit warning. Useful in deploy scripts and smoke tests.

  • Write and refresh in one step — write_and_refresh(sql, st_name) executes an arbitrary SQL statement and immediately refreshes the named stream table in the same transaction. Downstream readers see consistent results as soon as the transaction commits — no polling loop needed. See the sketch after this list.

  • Better connection-pooler support — the new pg_trickle.connection_pooler_mode GUC configures pg_trickle for PgBouncer, pgcat, or Supavisor at the cluster level. Previously each stream table had to be configured individually, which was error-prone on large deployments.

  • Automatic refresh history cleanuppgt_refresh_history is now trimmed automatically after 90 days (configurable with pg_trickle.history_retention_days; set to 0 to disable). Without this, the history table could grow by thousands of rows per day on busy deployments.

  • Schema migration tracking — pg_trickle now records which upgrade scripts have been applied in pgtrickle.pgt_schema_version. This makes it straightforward to verify that a deployment is fully up to date and simplifies the rollback story.

  • Clearer skip messages — when a refresh is skipped because another refresh of the same stream table is already running, you now see a NOTICE: skipping refresh of <name> — already running message instead of silence. Reduces confusion when debugging slow or stuck schedulers.

  • Deeper diagnosticsexplain_st() gains a with_analyze parameter. When set to true, it runs EXPLAIN (ANALYZE, BUFFERS) on the refresh query and returns actual row counts, timing, and buffer hit/miss ratios — the same information PostgreSQL's query planner provides for any query, but surfaced inside the stream-table diagnostic tool.

  • New deployment guides — step-by-step documentation for PgBouncer, pgcat, Supavisor, CNPG, and Kubernetes deployments, plus an operational runbook for common Kubernetes failure modes.
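
A minimal sketch of write_and_refresh(), assuming the pgtrickle schema prefix used by the rest of the API (the statement and table names are illustrative):

SELECT pgtrickle.write_and_refresh(
    'INSERT INTO orders (id, status) VALUES (43, ''active'')',
    'order_summary'
);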

Bug fixes

  • Fixed a constraint-validation inconsistency in databases upgraded from 0.11.0 or earlier where pgt_refresh_history had a duplicate check entry in the catalog. Affected databases could see spurious constraint errors on busy write paths.

  • Error messages throughout the extension now show human-readable table names (e.g. public.orders) instead of raw PostgreSQL OIDs. This affects "source table was dropped", "schema changed", and several other error paths that were previously unreadable without a catalog lookup.

Performance

  • 10–15× faster scheduler dispatch — the scheduler now finds the next stream table to process with a direct lookup instead of scanning the full list on every poll cycle. On a deployment with 500 stream tables this drops from ~650 µs to ~45 µs per poll, reducing background CPU overhead significantly at scale.

  • Single-query change detection — when the scheduler checks whether any source tables have changed, it now issues one query covering all sources at once instead of one query per source table. On deployments with 50+ source tables this meaningfully reduces the overhead of each scheduler cycle, especially under PgBouncer transaction pooling.


[0.18.0] — Hardening & Delta Performance

This release focuses on correctness, reliability, and giving operators better visibility into what pg_trickle is doing. Stream tables that group by columns containing NULL values now refresh correctly in all cases. A new memory safety net prevents runaway refreshes from consuming too much RAM. Error messages across the board now explain what went wrong and suggest how to fix it. Two new SQL functions — health_summary() and cache_stats() — give you a single-query overview of the entire system, and updated Grafana dashboards make monitoring plug-and-play. The TPC-H industry benchmark now runs as a nightly regression guard, and property-based tests mathematically verify the core delta engine's arithmetic.

Highlights

  • NULL values in GROUP BY now handled correctly — previous versions could produce wrong results when a stream table grouped by a column that contained NULL values and rows were deleted. The root cause was that NULL group keys broke the internal row-matching logic. This is now fixed: NULL keys are matched correctly during both inserts and deletes, so aggregate stream tables always return the right answer regardless of NULLs in the data.

  • Memory safety net for large deltas — if an unexpectedly large batch of changes arrives (for example, a bulk import into a source table), the incremental refresh could previously consume unbounded memory. A new configuration option (pg_trickle.delta_work_mem_cap_mb) lets you set a ceiling. When a refresh would exceed it, pg_trickle automatically falls back to a full refresh instead of risking an out-of-memory crash.

  • Early warning when refreshes spill to disk — when the incremental refresh engine runs low on memory, PostgreSQL may spill intermediate data to temporary files on disk, which is much slower. pg_trickle now detects this and sends a notification so you can investigate before performance degrades. If spilling happens repeatedly, the scheduler automatically switches the affected stream table to full refresh.

  • One-query system health check — the new pgtrickle.health_summary() function returns a single row with everything you need at a glance: how many stream tables are active, how many are in error or suspended state, the worst staleness across all tables, whether the scheduler is running, and the overall cache hit rate. Perfect for dashboards, alerting rules, or a quick manual check. See the sketch after this list.

  • Cache performance visibility — the new pgtrickle.cache_stats() function shows how effectively pg_trickle is reusing its internal query templates. You can see cache hit rates, eviction counts, and current cache size — useful for tuning pg_trickle.template_cache_size on busy systems.

  • Better error messages — every error pg_trickle can raise now includes a standard PostgreSQL error code (SQLSTATE), a DETAIL line explaining the context, and a HINT suggesting what to do. Instead of a cryptic internal error, you get actionable guidance like "Table 'orders' was dropped while stream table 'order_summary' depends on it — recreate the source table or drop the stream table."
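
Both are plain SQL calls:

SELECT * FROM pgtrickle.health_summary();  -- one-row system overview
SELECT * FROM pgtrickle.cache_stats();     -- template cache hit rate, evictions, size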

Monitoring & dashboards

  • Updated Grafana dashboards — the bundled pg_trickle_overview.json dashboard now includes panels for template cache hit rate, P99 and average refresh latency, hourly refresh success/failure counts, and cache eviction trends. Import it into Grafana and point it at your Prometheus instance for instant visibility.

  • Prometheus metric documentation — all 8 new metrics exposed by cache_stats() and health_summary() are now fully documented in the monitoring guide, with ready-to-use PromQL queries.

Correctness & testing

  • TPC-H regression guard — all 22 queries from the TPC-H industry benchmark now run nightly against known-good expected output. If a code change causes any query to return different results, CI fails immediately. This catches subtle correctness regressions that targeted tests might miss.

  • Mathematical proof of delta arithmetic — 6 property-based tests (2,000 random cases each) verify that the core engine's insert/delete accounting is correct: operations compose in the right order, groups cancel out properly, and no phantom rows appear after mixed workloads. An additional 4 end-to-end property tests exercise the full pipeline from change capture through to the final merged result.

  • CDC edge case coverage — new tests cover composite primary keys, generated (computed) columns, NULL values in non-key columns, and domain types — real-world schema patterns that were previously untested.

  • dbt integration tests — the dbt adapter now has regression tests for AUTO refresh mode, stream table health checks, and refresh history lifecycle — ensuring the dbt workflow stays reliable across releases.

Scalability

  • Scaling guide — a new docs/SCALING.md document covers how to configure pg_trickle for large deployments (200+ stream tables), including worker pool sizing, tiered scheduling, per-database quotas, and tuning profiles for different workload types.

  • Buffer growth stress tests — new tests verify that the max_buffer_rows safety limit works correctly under sustained high write rates, including automatic recovery back to incremental refresh after a burst subsides.

Testing infrastructure

  • Faster CI on pull requests — 19 additional test files (~197 tests) were moved to the lightweight test runner that does not require building a custom Docker image. Pull request CI is now faster without sacrificing coverage.

  • Upgrade path tested — the full upgrade chain from version 0.1.3 through every release up to 0.18.0 is verified automatically in CI, including function availability, schema integrity, and data survival.

Fixed

  • Upgrade script completeness — the 0.17.0→0.18.0 upgrade migration now includes all new and changed functions (pg_trickle_hash, cache_stats(), health_summary()), so ALTER EXTENSION pg_trickle UPDATE works correctly.

[0.17.0] — Query Intelligence & Stability

This release teaches pg_trickle to make smarter decisions about how to refresh each stream table, reduces unnecessary work when only a handful of columns actually changed, and proves correctness through 10,000 automated random mutations every night. Large deployments with hundreds of stream tables now handle schema changes much faster. Alongside these improvements, three new documentation resources make it easier to get started, troubleshoot problems, and migrate from pg_ivm.

Highlights

  • Query-aware refresh decisions — pg_trickle previously used a fixed threshold to decide between incremental and full refresh: if more than 50% of rows changed, switch to full. That works for simple queries but is poorly calibrated for joins or aggregates. The engine now classifies each query by its complexity (simple scan, filter, aggregate, join, or join+aggregate) and weights the cost estimate accordingly. Simple queries stay incremental even at high change rates; expensive join-heavy queries switch to full refresh sooner when the data is largely different. You can also pin a table to always use one strategy with the new pg_trickle.refresh_strategy setting ('auto' / 'differential' / 'full'), or tune the aggressiveness with pg_trickle.cost_model_safety_margin.

  • Skip columns that did not change — when a row is updated in a wide source table (say, 50 columns) but only 2 columns that the stream table actually uses are modified, pg_trickle previously processed the full change anyway. It now tracks exactly which columns were modified and skips updates that touch none of the relevant columns. For aggregate stream tables the savings go further: a value-only update that does not affect group membership is applied as a single lightweight correction instead of a delete-then-insert pair. On write-heavy workloads with wide tables, this reduces the volume of data flowing through the refresh pipeline by 50–90%.

  • Faster schema changes on large deployments — every time you create, alter, or drop a stream table, pg_trickle previously rebuilt the entire internal dependency graph from scratch. With 100 stream tables that takes only a few milliseconds, but at 1,000 it becomes noticeable. The graph is now updated incrementally — only the affected edges are touched, leaving everything else in place. At 1,000 stream tables the rebuild time drops from ~600 µs to ~116 µs and no longer scales with the total number of tables in the database.

  • Nightly correctness oracle — a new automated test runs 10,000 random data mutations every night against a broad set of query shapes. For each mutation it compares the result of incremental refresh against a full recompute and fails if they ever disagree. This catches subtle correctness bugs that only surface after unusual sequences of inserts, updates, and deletes — the kind that hand-written tests rarely reach.

  • ROWS FROM() fully supported — queries that use ROWS FROM() to call multiple set-returning functions side-by-side are now fully supported in incremental mode, including updates and deletes. This was previously restricted to insert-only workloads.

New documentation

  • Try it in 60 seconds — a new playground/ directory contains a docker compose up environment with PostgreSQL 18 + pg_trickle pre-wired, sample data loaded, and five stream tables ready to query. No installation required beyond Docker.

  • Troubleshooting runbookdocs/TROUBLESHOOTING.md covers 13 real-world failure scenarios: scheduler not running, stream table stuck in SUSPENDED state, CDC triggers missing, WAL slot problems, out-of-memory, disk full, circular dependency convergence issues, unexpected schema changes, worker pool exhaustion, and blown fuses. Each scenario lists symptoms, diagnostic queries, and step-by-step resolution.

  • Migrating from pg_ivmdocs/tutorials/MIGRATING_FROM_PG_IVM.md is a step-by-step guide for teams moving from the pg_ivm extension. It maps every pg_ivm API to its pg_trickle equivalent, explains behavioral differences, and includes ready-to-run SQL examples and a post-migration verification checklist.

  • New user FAQ — the top 15 common questions are now answered at the top of docs/FAQ.md so new users find answers before scrolling through the full document.

  • Post-install verification scriptscripts/verify_install.sql walks through the complete setup: checks that pg_trickle is loaded, creates a test stream table, runs a refresh, verifies the result, and cleans up. Useful for confirming a fresh installation or diagnosing environment issues.

Stability & code quality

  • Safer internal code — the number of unsafe Rust blocks in the query parser was reduced from 690 to 441 (a 36% drop) by introducing two helper macros that wrap the most common unsafe patterns. No behavior change; this makes the codebase easier to audit and maintain.

  • Cleaner internal structure — the largest source file (api.rs, ~9,400 lines) was split into three focused modules. This has no user-visible effect but makes the codebase significantly easier to work with and reduces the risk of regressions from unrelated code being in the same file.

  • Refresh logic extracted and tested — seven functions responsible for building the SQL used during refresh were extracted into standalone testable units and covered with 29 new unit tests. This catches regressions in generated SQL templates before they reach production.


[0.16.0] — Performance & Refresh Optimization

This release makes stream table refreshes significantly faster across the board. Small changes to large tables are now applied without expensive full-table scans. Tables that only receive new rows (no updates or deletes) use a streamlined path that skips unnecessary work. Aggregate queries like SUM and COUNT are refreshed with pinpoint updates instead of recalculating entire groups. A new template cache eliminates repeated startup work when database connections are recycled. An automated benchmark system now prevents future changes from accidentally slowing things down.

Highlights

  • Smarter refresh for small changes — when only a handful of rows change in a large stream table (less than 1% of total rows), pg_trickle now uses a faster strategy that skips the full-table comparison. This can reduce refresh time by up to 40% for common workloads where most data stays the same between refreshes. The system picks the best strategy automatically, but you can override it via the merge_strategy setting.

  • Insert-only fast path — stream tables backed by append-only data sources (like event logs or audit trails that never update or delete rows) are now detected automatically and refreshed using a much simpler, faster path. No configuration is needed — pg_trickle observes your data patterns and switches to the fast path on its own. If an update or delete is later detected, it safely falls back to the standard approach with a warning.

  • Faster aggregate refreshes — stream tables that use SUM, COUNT, AVG, or STDDEV aggregates now update individual groups directly instead of re-joining against the entire table. For queries with many distinct groups, this can be 5–20× faster. Non-invertible aggregates like MIN, MAX, and STRING_AGG continue using the standard path.

  • Template cache for faster cold starts — the first time a database connection refreshes a stream table, pg_trickle normally spends ~45 ms preparing the refresh query. A new cross-connection cache stores these prepared queries so that subsequent connections (including those from connection poolers like PgBouncer) start refreshing in about 1 ms instead.

  • Automated performance regression checks — every code change to pg_trickle is now automatically benchmarked before it can be merged. If any operation slows down by more than 10%, the change is blocked until the regression is fixed. This protects users from accidental performance degradation in future releases.

New features

  • Error reference guide — a new error reference page documents every error message pg_trickle can produce, explains what caused it, and suggests how to fix it. Useful when troubleshooting unexpected behavior in production.

  • Change buffer growth protection — if a stream table's refresh keeps failing, the backlog of unprocessed changes could previously grow without limit, consuming disk space. A new max_buffer_rows setting (default: 1,000,000 rows) caps this growth. When the limit is reached, pg_trickle performs a full refresh to clear the backlog and warns you about the situation.

  • Automatic index creation control — pg_trickle has always created helpful indexes on stream tables automatically. A new auto_index setting lets you disable this behavior when you want full control over indexing. Stream tables using SELECT DISTINCT now also get an automatic index on their distinct columns.

  • Compaction and predicate pushdown stats — the explain_st() diagnostics function now shows additional information about change buffer compaction thresholds, merge strategy selection, append-only mode, aggregate fast-path status, and template cache hit rates.

Improved

  • Configuration guidance — the documentation now includes detailed tuning advice for the planner_aggressive and cleanup_use_truncate settings, especially for environments using connection poolers like PgBouncer or running under memory pressure.

  • Terminal dashboard improvements — the pgtrickle TUI dashboard now shows the effective refresh mode for each stream table (e.g., when a table is temporarily downgraded from differential to full refresh). The Alerts tab has been restructured with a clearer table layout and better distinction between "stale data" and "no upstream changes" conditions.

Fixed

  • Append-only detection with chained stream tables — stream tables that feed into other stream tables (cascading dependencies) now correctly skip the append-only fast path to avoid data inconsistencies. Previously, a chained stream table could incorrectly use the insert-only path even when downstream tables needed the full change set.

  • Append-only heuristic accuracy — the automatic detection of insert-only data sources now also checks the stream table's own change buffer for non-insert operations, avoiding false positives.

  • Full refresh fallback for mixed changes — when both a stream table and its source table have pending changes in the same refresh cycle, pg_trickle now correctly falls back to a full refresh to avoid inconsistencies.

  • resume_stream_table() confirmed working — the function referenced in error messages when a stream table enters SUSPENDED state was verified to exist and work correctly (present since v0.2.0).

Testing & quality

  • 13 new end-to-end tests covering JOIN correctness across update/delete cycles, window function differential behavior, differential-vs-full equivalence validation, and source table schema evolution resilience.
  • 5 new benchmark scenarios covering semi-joins, anti-joins, multi-table join chains, and aggregate queries at varying group counts. Total: 22 benchmark functions.
  • 1,700 unit tests pass (up from 1,630 in v0.15.0).

[0.15.0] — Interactive TUI, Bulk Create & Runaway-Refresh Protection

0.15.0 brings the terminal dashboard to full operational capability, adds safety features that protect against runaway refreshes, and broadens the ecosystem with guides for popular migration and ORM frameworks. It also includes a major internal refactoring of the query parser and a new streaming benchmark suite.

Highlights

  • Interactive terminal dashboard — the pgtrickle TUI is no longer read-only. Refresh, pause, resume, and repair stream tables directly from the dashboard. A command palette (:) with fuzzy search makes common operations fast. The poller reconnects automatically after network interruptions.

  • Bulk creationpgtrickle.bulk_create() creates many stream tables in a single atomic transaction, ideal for CI/CD and dbt pipelines.

  • Runaway-refresh protection — two new safety nets prevent expensive merges from spiralling: a pre-flight row-count estimate that downgrades to FULL refresh when deltas are too large (max_delta_estimate_rows), and a spill detector that forces FULL refresh after repeated temp-file writes (spill_threshold_blocks).

  • Stuck-watermark alerting — if an upstream ETL pipeline stops advancing its watermark, pg_trickle now pauses affected stream tables and sends a watermark_stuck notification so the issue is surfaced immediately rather than silently producing stale data.

  • Integration guides — new documentation for Flyway, Liquibase, SQLAlchemy, Django, and dbt Hub helps teams adopt pg_trickle alongside their existing tooling.

New Features

  • Volatile function policy — a new volatile_function_policy setting lets you choose whether volatile functions (like random() or clock_timestamp()) should be rejected (the default), allowed with a warning, or allowed silently when creating stream tables.

  • Bulk create API — pgtrickle.bulk_create(definitions) accepts a JSON array of stream table definitions and creates them all in one transaction. If any definition fails, the entire batch is rolled back. See the sketch after this list.

  • Enhanced diagnosticspgtrickle.explain_st() now shows refresh timing statistics (min/max/average duration), partition info for partitioned source tables, and a dependency graph you can render with Graphviz.

  • Join strategy override — the merge_join_strategy setting lets you force a specific join method (hash_join, nested_loop, or merge_join) during delta merges, which can help when the automatic heuristic doesn't suit your workload.

  • Pre-flight delta estimation — when max_delta_estimate_rows is set, pg_trickle counts the delta rows before merging. If the count exceeds the limit, it falls back to a FULL refresh and logs a notice, preventing out-of-memory conditions on unexpectedly large change sets.

  • Spill-aware refresh — if differential merges spill to disk repeatedly (controlled by spill_threshold_blocks and spill_consecutive_limit), the scheduler switches to FULL refresh automatically.

  • Stuck watermark hold-back — the watermark_holdback_timeout setting detects watermarks that have not advanced within a configurable window. Downstream stream tables are paused and a watermark_stuck notification is emitted until the watermark advances again.

  • Cascade dropdrop_stream_table() now accepts an optional cascade parameter (default true). Setting it to false raises an error if dependent stream tables exist, matching PostgreSQL's RESTRICT behavior.

  • Nexmark benchmark suite — a 10-query streaming benchmark (modelled on an online auction system) validates correctness under sustained high-frequency inserts, updates, and deletes.

  • 17 new end-to-end tests — 7 tests for multi-level stream-table chains (3- and 4-level cascades with mixed refresh modes) and 10 tests for diamond/fan-in topologies with IMMEDIATE mode. No deadlocks were found.
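
As referenced in the bulk-create item above, a minimal sketch of the batch API and the cascade parameter; the JSON key names are an assumption, mirroring create_stream_table's named arguments:

SELECT pgtrickle.bulk_create('[
  {"name": "active_orders", "query": "SELECT * FROM orders WHERE status = ''active''", "schedule": "30s"},
  {"name": "daily_revenue", "query": "SELECT order_date, SUM(amount) AS total FROM orders GROUP BY 1", "schedule": "5m"}
]');

-- RESTRICT-style drop: raises an error if dependent stream tables exist.
SELECT pgtrickle.drop_stream_table('daily_revenue', cascade => false);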

Terminal Dashboard (TUI)

  • Write actions — refresh, pause, resume, repair, reset fuse, and gate/ungate operations can now be performed without leaving the dashboard.
  • Command palette — press : for fuzzy-matched command entry with tab-completion.
  • Automatic reconnection — the dashboard reconnects with exponential back-off (up to 15 s) after a connection loss, with a visual indicator.
  • Richer views — all 14 views now show additional live data (diagnostics, CDC health, refresh history with row-delta counts, error remediation hints, dependency-graph annotations, worker queue status, and watermark alignment).
  • Cross-view filtering — the / search filter now persists across all 10 list views.
  • Navigation re-fetch — moving between rows in the Detail view immediately fetches fresh data for the selected table.
  • Toast messages — write actions show confirmation and error toasts.
  • Sort cycling — press s / S on the Dashboard to cycle through 6 sort modes.
  • Mouse support — --mouse enables scroll-wheel navigation.
  • Theme toggle — t or --theme dark|light switches colour themes.
  • JSON export — Ctrl+E or :export writes the current view to a file.
  • TLS support — --sslmode and --sslrootcert flags.

Documentation & Ecosystem

  • Flyway / Liquibase guide — migration patterns for versioned and repeatable migrations, rollback blocks, and CI environments.
  • SQLAlchemy / Django guide — read-only model patterns, write-blocking safeguards, DRF viewsets, and freshness checking.
  • dbt Hub readiness — the dbt-pgtrickle package is version-synced and ready for dbt Hub submission.
  • Kubernetes / CNPG — updated probe configuration and a new deployment section in the Getting Started guide.
  • Full documentation review — configuration reference expanded from 23 to 40+ settings, missing SQL reference entries filled in, outdated FAQ answers corrected.

Internal Improvements

  • Parser modularisation — the 21,000-line query parser has been split into 5 focused sub-modules (types, validation, rewrites, sublinks, and the main entry point). No behavior change — all 1,687 unit tests pass.
  • Unsafe audit — every unsafe block in the codebase (~750 total) now has a // SAFETY: comment explaining why it is sound.
  • Shared-memory cache RFC — an RFC for a DSM-based MERGE template cache has been written, informing the v0.16.0 implementation plan.
  • TRUNCATE handling verified — TRUNCATE on source tables in trigger CDC mode already triggers a FULL refresh; this is now documented.
  • JOIN key-change fix verified — the v0.14.0 correctness fix for simultaneous JOIN key updates and DELETEs has been verified working and the former known-limitation note replaced with a description of the fix.

Bug Fixes

  • Fixed a panic in the TUI when deserializing health-check data that returned 64-bit integers where 32-bit was expected.
  • Fixed spurious "Error: db error" toasts in the TUI Detail view — background queries now degrade silently instead of surfacing transient errors.
  • Fixed incorrect integer type annotations in two E2E tests for IMMEDIATE mode diamond topologies.

[0.14.0] — Tiered Scheduling, Diagnostics & TUI

0.14.0 is the Tiered Scheduling, Diagnostics & TUI release. It gives you fine-grained control over how often each stream table refreshes, adds tools that recommend the best refresh strategy for your workload, introduces a full-screen terminal dashboard for managing stream tables without SQL, and includes important security and reliability fixes.

Terminal Dashboard (TUI)

A new pgtrickle command-line tool lets you monitor and manage stream tables from a terminal — no SQL required. Run it with no arguments to launch a live-updating full-screen dashboard (think htop for stream tables), or use one-shot subcommands like pgtrickle list, pgtrickle status, or pgtrickle refresh for scripting and CI.

The interactive dashboard includes:

  • Live overview — stream table statuses, refresh timing, and issue counts update every 2 seconds, with color-coded health indicators.
  • Dependency graph — see how stream tables relate to each other in an ASCII tree view.
  • Diagnostics — view refresh mode recommendations with confidence levels.
  • CDC health — monitor change buffer sizes with warnings when they grow too large.
  • Alert feed — real-time notification display with severity levels.
  • Issue detection — automatically spots broken dependency chains, growing buffers, blown fuses, and stale data, with a persistent badge showing the issue count from any view.
  • Watch mode — pgtrickle watch provides continuous non-interactive output suitable for log aggregation.
  • Output formats — all CLI subcommands support --format json, --format csv, and human-readable table output.

See docs/TUI.md for the full user guide.

Tiered Refresh Scheduling

Stream tables can now be assigned to refresh tiers — hot, warm, cold, or frozen — to control how frequently they refresh:

  • Hot (default) — refreshes at the configured interval.
  • Warm — refreshes at 2× the interval.
  • Cold — refreshes at 10× the interval, ideal for infrequently accessed reports.
  • Frozen — pauses automatic refresh entirely until promoted back.

Assign a tier with ALTER STREAM TABLE ... SET (tier = 'cold'). A NOTICE is emitted when demoting from Hot to Cold or Frozen so operators are aware of the change in refresh frequency.
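
For example (the stream table name is hypothetical):

ALTER STREAM TABLE daily_revenue SET (tier = 'cold');  -- emits a NOTICE about the demotion
ALTER STREAM TABLE daily_revenue SET (tier = 'hot');   -- promote back to the configured interval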

Smarter Refresh Recommendations

Two new diagnostic functions help you choose the most efficient refresh strategy for each stream table:

  • pgtrickle.recommend_refresh_mode(name) — analyzes a range of workload signals (change frequency, timing history, query complexity, table size, index coverage, and latency patterns) and recommends FULL or DIFFERENTIAL mode with a confidence level and plain-language explanation. Useful when you're unsure which mode will be faster for a particular table.

  • pgtrickle.refresh_efficiency(name) — shows per-table refresh performance: how many FULL vs. DIFFERENTIAL refreshes have run, average timing for each, and the speedup factor. Good for monitoring dashboards and alerting.

A new tutorial — Tuning Refresh Mode — walks through the process step by step.
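
A minimal usage sketch, assuming a hypothetical stream table named daily_revenue:

SELECT * FROM pgtrickle.recommend_refresh_mode('daily_revenue');
SELECT * FROM pgtrickle.refresh_efficiency('daily_revenue');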

Reduced Write Overhead with UNLOGGED Buffers

Enable pg_trickle.unlogged_buffers = true and newly created change buffer tables will skip write-ahead logging, reducing WAL volume by roughly 30%. This is ideal for workloads where you can tolerate a full re-sync after a crash (the extension detects the crash and re-syncs automatically).

A utility function — pgtrickle.convert_buffers_to_unlogged() — converts existing buffers in one call. Run it during a maintenance window since it briefly locks each buffer table.
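
A setup sketch: ALTER SYSTEM and pg_reload_conf() are standard PostgreSQL; the GUC and function names are those documented above:

ALTER SYSTEM SET pg_trickle.unlogged_buffers = true;
SELECT pg_reload_conf();

-- During a maintenance window (briefly locks each buffer table):
SELECT pgtrickle.convert_buffers_to_unlogged();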

Instant Error Detection

Previously, when a stream table's refresh hit a permanent error (for example, a function that doesn't exist for the column type), the extension would retry several times before giving up. Now it recognizes permanent errors immediately, sets the stream table status to ERROR with a clear error message, and stops retrying. You can see the error at a glance in the stream_tables_info view or the TUI dashboard, and fix it by altering the stream table's query.

Security Hardening

  • CDC trigger functions now use SECURITY DEFINER — change-data-capture trigger functions run with the privileges of the extension owner rather than the current user, preventing privilege escalation through modified search paths.
  • Explicit SET search_path — all CDC trigger functions now set search_path to pgtrickle_changes, pg_catalog to prevent search-path manipulation attacks.
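
The same pattern, illustrated on a hypothetical trigger function rather than the extension's actual generated code:

CREATE FUNCTION pgtrickle_changes.capture_change()
RETURNS trigger
LANGUAGE plpgsql
SECURITY DEFINER
SET search_path = pgtrickle_changes, pg_catalog
AS $$
BEGIN
  -- write the changed row into the change buffer here
  RETURN NEW;
END;
$$;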

Other Improvements

  • Export definitions — pgtrickle.export_definition(name) exports a stream table's full configuration as reproducible SQL (DROP + CREATE + ALTER statements), making it easy to version-control or migrate stream table definitions between environments (see the sketch at the end of this list).

  • Creation-time warnings — when creating a stream table with aggregates like MIN, MAX, or STRING_AGG in DIFFERENTIAL mode, a warning now suggests that FULL or AUTO mode may be more efficient. For algebraic aggregates (SUM/COUNT/AVG), the warning only appears when the estimated number of groups is below a configurable threshold.

  • Simplified settings — the merge_planner_hints and merge_work_mem_mb settings have been consolidated into a single planner_aggressive switch. The old setting names are still accepted for backwards compatibility but are ignored at runtime in favor of the new switch.

  • GHCR Docker image — a multi-architecture Docker image (ghcr.io/trickle-labs/pg_trickle) with PostgreSQL 18.3 and pg_trickle pre-installed is now published automatically on each release.

  • Pre-deployment checklist — new PRE_DEPLOYMENT.md with a 10-point checklist for production deployments.

  • Best-practice patterns guide — new PATTERNS.md with 6 common patterns: Bronze/Silver/Gold materialization, event sourcing, slowly-changing dimensions, high-fan-out topology, real-time dashboards, and tiered refresh strategies.

  • Keyless dedup fix — replaced MAX(col) with (array_agg(col))[1] for deduplicating keyless scan results, which also works for non-orderable types.
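
As referenced in the export-definitions item above, a minimal sketch (the stream table name is hypothetical):

SELECT pgtrickle.export_definition('daily_revenue');
-- Returns DROP + CREATE + ALTER statements suitable for version control.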

Bug Fixes

  • ST-on-ST differential refresh — manually refreshing a stream table that reads from another stream table now uses true incremental (DIFFERENTIAL) refresh instead of falling back to a full re-scan. This matches the behavior of the automatic scheduler and is significantly faster for large tables.

  • Staleness tracking — the staleness indicator now uses the actual last refresh time instead of an internal data timestamp, making the pg_stat_stream_tables view more accurate.

Testing & Reliability

  • Soak test — a new long-running stability test validates zero worker crashes, zero ERROR states, and stable memory usage under sustained mixed workload (configurable duration, default 10 minutes).

  • Multi-database isolation test — verifies that two databases in the same PostgreSQL cluster run pg_trickle independently without interference.

  • 140 TUI tests — comprehensive unit, snapshot, and interaction tests for the terminal dashboard.

  • 23 mixed-object E2E tests — validates stream tables alongside regular PostgreSQL views, materialized views, and other objects.

  • Scheduler race fixes — eliminated flaky test failures caused by scheduler timing races and GUC leakage between tests.

New SQL Functions

Function | Purpose
pgtrickle.recommend_refresh_mode(name) | Workload-based refresh mode recommendation
pgtrickle.refresh_efficiency(name) | Per-table refresh performance metrics
pgtrickle.export_definition(name) | Export stream table as reproducible DDL
pgtrickle.convert_buffers_to_unlogged() | Convert logged change buffers to UNLOGGED

New Settings

Setting | Default | Purpose
pg_trickle.planner_aggressive | true | Consolidated switch for MERGE planner hints
pg_trickle.unlogged_buffers | false | Create new change buffers as UNLOGGED
pg_trickle.agg_diff_cardinality_threshold | 1000 | Warn about DIFFERENTIAL mode below this group count

Deprecated

  • pg_trickle.merge_planner_hints — Use pg_trickle.planner_aggressive instead. Still accepted but ignored at runtime.
  • pg_trickle.merge_work_mem_mb — Same; use planner_aggressive instead.

Upgrading

Run ALTER EXTENSION pg_trickle UPDATE; after installing the new binaries. The upgrade adds new catalog columns, functions, and the TUI workspace member. No breaking changes — everything from v0.13.0 continues to work. See UPGRADING.md for details.


[0.13.0] — Scalability Foundations & Full TPC-H Coverage

0.13.0 is the Scalability Foundations release. It makes pg_trickle handle large tables, complex queries, and multi-tenant deployments much more efficiently — and it achieves a major milestone: all 22 TPC-H benchmark queries now run in incremental (DIFFERENTIAL) mode, meaning the engine no longer needs to fall back to slow full-refresh for any standard analytical query pattern.

Smarter Change Detection for Wide Tables

When you UPDATE a few columns in a large table — say, changing a status column in a 60-column table — pg_trickle used to treat every column as potentially changed, doing extra work to keep all downstream views up to date.

Now it knows the difference. Columns used in GROUP BY, JOIN, or WHERE clauses are "key columns"; everything else is a "value column." When only value columns change, the engine takes a shortcut: it sends a single correction row instead of a full delete-and-reinsert pair. For wide-table workloads, this can cut the volume of data processed by 50% or more.

Shared Change Buffers

If you have several stream tables watching the same source table, each one used to maintain its own private copy of the change log. That's wasteful. Now they share a single change buffer per source, and each consumer simply tracks how far it has read. The slowest reader protects the buffer for everyone.

You can inspect this with the new pgtrickle.shared_buffer_stats() function — it shows each buffer, who's reading from it, how many rows are queued, and whether it has been automatically partitioned for performance.

Automatic Buffer Partitioning

Set pg_trickle.buffer_partitioning = 'auto' and pg_trickle will start with simple, unpartitioned change buffers. If a buffer starts accumulating a lot of rows (high-throughput sources), it automatically converts to a partitioned layout where old data can be removed almost instantly instead of deleting rows one by one.

More Partitioning Options for Stream Tables

Building on the RANGE partitioning added in v0.11.0, you can now partition stream tables in three additional ways:

  • Multi-column keys — partition by a combination of columns (partition_by='region,year')
  • LIST partitioning — for low-cardinality columns like status or type (partition_by='LIST:status')
  • HASH partitioning — for even distribution across a fixed number of partitions (partition_by='HASH:customer_id:8')

You can also change the partition key of an existing stream table at runtime with alter_stream_table(partition_by => ...) — data is preserved automatically. If rows land in the default (catch-all) partition, a WARNING is emitted to prompt you to add explicit partitions.
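
A sketch of the new forms, using hypothetical table and column names:

SELECT pgtrickle.create_stream_table(
    name         => 'sales_summary',
    query        => 'SELECT region, year, SUM(amount) AS total FROM sales GROUP BY 1, 2',
    schedule     => '1m',
    partition_by => 'region,year'   -- multi-column key; or 'LIST:status', or 'HASH:customer_id:8'
);

-- Change the partition key at runtime; data is preserved automatically.
SELECT pgtrickle.alter_stream_table('sales_summary', partition_by => 'HASH:region:8');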

All 22 TPC-H Queries Now Run Incrementally

The DVM (differential view maintenance) engine received its most significant set of improvements yet, targeting the complex multi-table join patterns found in standard analytical benchmarks:

  • Smarter pre-image lookups — instead of reconstructing what the data looked like before a change by subtracting deltas (expensive for large tables), the engine now uses targeted index lookups that only touch the rows that actually changed.
  • Predicate pushdown — WHERE conditions from the original query are now pushed into the delta computation, preventing unnecessary cross-products in multi-table joins.
  • Deep-join optimizations — queries joining 5+ tables get automatic planner hints (more memory, smarter join strategies) to avoid spilling to disk.
  • Scan-count-aware strategy selector — queries that exceed configurable join complexity or delta volume thresholds automatically fall back to full refresh on a per-query basis rather than failing.

The result: all 22 TPC-H queries pass at SF=0.01 in DIFFERENTIAL mode with zero drift across 3 refresh cycles. The DIFFERENTIAL_SKIP_ALLOWLIST (queries that previously required full refresh) is now empty.

Refresh Performance Inspection Tools

Two new functions help you understand what pg_trickle is doing under the hood:

  • pgtrickle.explain_delta(name, format) — shows you the query plan for the auto-generated delta SQL, the same way EXPLAIN works for regular queries. Available in text, JSON, XML, or YAML format.
  • pgtrickle.dedup_stats() — reports how often concurrent writes produce duplicate entries that need pre-processing before the MERGE step.
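
For example, with a hypothetical stream table:

SELECT pgtrickle.explain_delta('sales_summary', 'text');
SELECT * FROM pgtrickle.dedup_stats();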

Multi-Tenant Worker Quotas

New setting: pg_trickle.per_database_worker_quota — if you run many databases on one PostgreSQL cluster, this prevents a busy database from monopolizing all the refresh workers. Workers are assigned by priority (immediate-mode tables first, then hot, warm, and cold), with burst capacity up to 150% when other databases are idle.
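
A configuration sketch; the quota value is illustrative:

ALTER SYSTEM SET pg_trickle.per_database_worker_quota = 4;
SELECT pg_reload_conf();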

TPC-H Benchmark Harness

You can now measure refresh performance across all 22 TPC-H queries in a structured way. Run just bench-tpch to get per-query timing, FULL vs. DIFFERENTIAL comparison, and P95 latency numbers. Five synthetic benchmarks (q01, q05, q08, q18, q21) also measure the pure Rust delta-SQL generation time without needing a database.

Broader SQL Support

  • IS JSON predicates (PG 16+) — expressions like expr IS JSON OBJECT now work in incremental mode.
  • SQL/JSON constructors (PG 16+) — JSON_OBJECT(...), JSON_ARRAY(...), JSON_OBJECTAGG(...), and JSON_ARRAYAGG(...) are now accepted.
  • Recursive CTEs — recursive queries with non-monotone operators (like EXCEPT) correctly fall back to full refresh instead of producing wrong results.

dbt Integration Updates

If you use dbt-pgtrickle, you can now set partitioning and fuse options directly from dbt model config:

  • {{ config(partition_by='customer_id') }} for partitioned stream tables
  • {{ config(fuse='auto', fuse_ceiling=100000, fuse_sensitivity=3) }} for circuit-breaker protection

Bug Fixes

  • Scheduler cascade fix — stream tables downstream of FULL-mode upstream tables now detect changes correctly via a last_refresh_at fallback, preventing stale data in chains where the upstream uses full refresh.
  • SUM(CASE WHEN ...) drift fix — aggregate expressions using CASE were occasionally producing slightly wrong incremental results; these are now correctly detected and processed via a group rescan.
  • Duplicate column DDL fix — removed a duplicate column definition in the pgt_stream_tables DDL that could cause issues on fresh installs.

Testing Improvements

  • New regression test suite targeting 9 structural weaknesses: join multi-cycle correctness (7 tests), differential-equals-full equivalence (11 tests), DVM operator execution, failure recovery, and MERGE template unit tests.
  • E2E test infrastructure now uses template databases, cutting per-test setup time significantly.

New SQL Functions

Function | Purpose
pgtrickle.explain_delta(name, format) | Show the query plan for the delta SQL
pgtrickle.dedup_stats() | MERGE deduplication frequency counters
pgtrickle.shared_buffer_stats() | Per-source change buffer status
pgtrickle.explain_refresh_mode(name) | Why a stream table uses its current refresh mode
pgtrickle.reset_fuse(name) | Reset a blown circuit-breaker fuse
pgtrickle.fuse_status() | Fuse state across all stream tables

New Catalog Columns

Ten new columns on pgtrickle.pgt_stream_tables:

Column | Purpose
effective_refresh_mode | The actual refresh mode after AUTO resolution
fuse_mode | Circuit-breaker configuration (off / auto / manual)
fuse_state | Current fuse state (armed / blown)
fuse_ceiling | Maximum change count before fuse blows
fuse_sensitivity | Consecutive cycles above ceiling before triggering
blown_at | When the fuse last blew
blow_reason | Why the fuse blew
st_partition_key | Partition key specification
max_differential_joins | Maximum join count for differential mode
max_delta_fraction | Maximum delta-to-table ratio for differential mode

Upgrading

Run ALTER EXTENSION pg_trickle UPDATE; after installing the new binaries. All new columns and functions are added automatically. No breaking changes — everything from v0.12.0 continues to work as before. See UPGRADING.md for details.


[0.12.0] — Join Correctness, Diagnostics & Reliability

0.12.0 is a correctness, reliability, and developer-experience release built on top of 0.11.0's major new features. It closes the last known wrong-answer bugs for complex join queries, adds tools to help you understand and debug stream table behavior, hardens the scheduler against several edge cases that could cause stale data or crashes, and backs it all with thousands of new automatically generated tests.

Stale Rows Fixed in Stream-Table Chains

What was the problem? When a stream table (B) reads from another stream table (A), each change in A is recorded as a small "what changed" entry — a row added or removed. But the identity key used for those entries was computed differently inside the change buffer than it was inside B's own storage. As a result, when A changed via an upstream UPDATE, B's refresh could silently fail to delete the old version of a row, leaving a stale duplicate.

What changed? The change buffer now computes row identity the same way B does — using a hash of all the data columns rather than the upstream source's primary key. Stale rows after UPDATE no longer appear in stream-table chains. This bug was found and confirmed by the new property-based test suite (see below).

Phantom Rows Fixed for Complex Joins (TPC-H Q7 / Q8 / Q9)

What was the problem? When a stream table's query joins three or more tables together and rows are deleted from more than one join side at the same time, the incremental engine could silently drop the correction — leaving rows in the stream table that should have been removed.

This affected TPC-H queries Q7, Q8, and Q9 (which all involve deep join trees), and any user query with a similar multi-table join structure. A temporary workaround (falling back to full refresh for wide joins) was in place since v0.11.0 and has now been lifted.

What changed? The incremental engine now takes an individual "before snapshot" for each leaf table in the join tree — each one cheaply computed from a single-table comparison — and re-joins them after the delete. This avoids writing multi-gigabyte temp files to disk (the root cause of the original workaround) and eliminates the phantom-row bug entirely. Q7, Q8, and Q9 now run in differential mode without any workarounds.

Type Errors Fixed in Parallel Refresh Chains

What was the problem? When a chain of stream tables is fused into a single execution unit for efficiency (the "bypass" optimisation added in v0.11.0), the internal bypass table used text for every column regardless of the actual column type. This caused an operator does not exist: text > integer error whenever a downstream stream table had a type-sensitive WHERE clause (e.g. WHERE amount > 100), making the parallel worker tests fail silently across all topologies that included a fused chain.

What changed? Bypass tables now use the real column types. The six parallel-worker benchmark tests now complete in 9–26 seconds rather than timing out after 120 seconds.

Scheduler Fixes for Diamond and ST-on-ST Topologies

Two scheduler bugs that caused incorrect refresh behavior with complex dependency graphs were fixed:

  • Diamond timeout. In a diamond topology (A → B, A → C, B+C → D), the L1 arm stream tables (B and C) were created with a 1-minute fixed interval rather than a calculated schedule. This meant D never received updates within the test window. The scheduler also had a bug loading stream table records by ID that caused silent failures in parallel worker paths. Both are fixed.

  • ST-on-ST parallel workers. When an upstream stream table changed, the parallel worker paths (singleton, atomic group, immediate closure, fused chain) were not forcing a full refresh on downstream stream tables the way the main scheduler loop did. This could leave downstream tables stale. The fix ensures all parallel paths treat upstream stream-table changes the same way.

Four New Diagnostic Functions

When stream table behavior is unexpected — wrong refresh mode, a query being rewritten in a surprising way, persistent errors — it previously required reading server logs or source code to understand why. Four new SQL functions expose that internal state directly in queries:

  • pgtrickle.explain_query_rewrite(query TEXT) — shows exactly how pg_trickle rewrites your query for incremental refresh: which operators were applied, how delta keys are injected, and how aggregates are classified. Useful for understanding why a query got a particular refresh mode.

  • pgtrickle.diagnose_errors(name TEXT) — shows the last 5 errors for a stream table, each classified by type (correctness, performance, configuration, infrastructure) with a suggested fix.

  • pgtrickle.list_auxiliary_columns(name TEXT) — lists the internal __pgt_* columns that pg_trickle injects into a stream table's query plan, with an explanation of each one's purpose. Helpful when SELECT * returns unexpected extra columns.

  • pgtrickle.validate_query(query TEXT) — analyses a SQL query and reports which refresh mode it would get, which SQL constructs were detected, and any warnings — all without creating a stream table.
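
For example, using a hypothetical query and stream table:

SELECT * FROM pgtrickle.validate_query('SELECT region, SUM(amount) FROM sales GROUP BY region');
SELECT * FROM pgtrickle.diagnose_errors('sales_summary');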

Multi-Column IN (subquery) Now Gives a Clear Error

What was the problem? A query like WHERE (col_a, col_b) IN (SELECT x, y FROM …) passed validation but produced silently wrong results — the engine was only matching on the first column and ignoring the second.

What changed? This construct is now detected at stream table creation time and rejected with a clear error message that recommends rewriting it as EXISTS (SELECT 1 FROM … WHERE col_a = x AND col_b = y).
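
Concretely, with hypothetical tables t and s:

-- Rejected at creation time (only the first column was being matched):
SELECT * FROM t WHERE (col_a, col_b) IN (SELECT x, y FROM s);

-- Accepted equivalent:
SELECT * FROM t WHERE EXISTS (SELECT 1 FROM s WHERE s.x = t.col_a AND s.y = t.col_b);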

IMMEDIATE Mode Proven Correct Under High Concurrency

IMMEDIATE mode (where the stream table updates inside the same transaction as the source table change) now has a dedicated concurrency stress test: 100–120 concurrent transactions firing simultaneously against the same source table, across five scenarios (all inserts, all updates to distinct rows, all updates to the same row, all deletes, and a mixed workload). Zero lost updates, zero phantom rows, and no deadlocks were observed in any run.

Protection Against Pathological Queries

A new guard prevents a particularly deep or convoluted query from consuming all available stack space and crashing the database backend. When the query analyser recurses more than 64 levels deep (configurable via pg_trickle.max_parse_depth), it now returns a clear QueryTooComplex error instead of crashing.

Tiered Scheduling Now On By Default

The tiered scheduling feature — which automatically slows down cold (infrequently-read) stream tables and speeds up hot ones — is now enabled by default. In large deployments this reduces the scheduler's CPU usage significantly. Stream tables you query often continue refreshing at full speed. Stream tables that nobody has read recently back off gracefully.

If you rely on all stream tables refreshing at the same rate regardless of read frequency, set pg_trickle.tiered_scheduling = off.

Thousands of Automatically Generated Tests

Two new automated testing systems were added to complement the hand-written test suite:

  • Property-based tests — the test framework automatically generates thousands of random DAG shapes, schedule combinations, and edge cases and checks that the scheduler's ordering guarantees hold for all of them. If any configuration would cause a table to refresh in the wrong order or get spuriously suspended, these tests catch it.

  • SQLancer fuzzing — SQLancer generates random SQL queries and checks that pg_trickle's incremental result matches the result of running the same query directly in PostgreSQL. Any mismatch is automatically saved as a permanent regression test. A weekly CI job runs this continuously. At time of release, zero mismatches have been found.

CDC Write-Side Benchmark Published

A new benchmark suite measures the overhead that pg_trickle's change capture triggers add to your write workload. Results across five scenarios (single-row INSERT, bulk INSERT, bulk UPDATE, bulk DELETE, concurrent writers) are published in docs/BENCHMARK.md. Use these numbers to estimate the impact before deploying pg_trickle on a write-heavy table.

MERGE Template Validation at Test Startup

The SQL templates that pg_trickle generates for applying incremental changes (the MERGE statements) are now validated with an EXPLAIN dry-run at every test startup. If a code change accidentally produces a malformed MERGE template, the tests catch it before any data is processed — rather than manifesting as a cryptic runtime error.


[0.11.0] — Event-Driven Latency, Chain IVM & Observability Stack

This is the biggest release since the initial launch. The headline features are 34× lower latency for real-time workloads, stream-table chains that now refresh incrementally (no more forced full recomputation when one stream table feeds another), declarative partitioning to cut I/O on large tables by up to 100×, a ready-to-use Prometheus and Grafana monitoring stack, and a circuit breaker to protect production databases from runaway change bursts.

34× Lower Latency — Changes Arrive Instantly

Previously, the background worker woke up on a fixed timer every ~500 ms to check for new data, even when nothing had changed. Every change had to wait up to half a second in the change buffer before being processed.

Now, when a source table is modified, the change capture trigger immediately wakes the background worker via a PostgreSQL notification channel. The worker starts processing within ~15 ms of the write committing — a 34× improvement for low-volume workloads. Under heavy DML, a 10 ms debounce window coalesces rapid notifications so the worker isn't flooded.

Event-driven wake is on by default. You can turn it off (pg_trickle.event_driven_wake = off) to revert to poll-based wake, and you can tune the debounce window with pg_trickle.wake_debounce_ms (default 10).
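
A tuning sketch; the values shown are the documented defaults:

ALTER SYSTEM SET pg_trickle.event_driven_wake = on;
ALTER SYSTEM SET pg_trickle.wake_debounce_ms = 10;
SELECT pg_reload_conf();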

Stream-Table-to-Stream-Table Chains Now Refresh Incrementally

Previously, when stream table B's query read from stream table A, pg_trickle had to do a full recomputation of B every time A changed — even if only a few rows in A actually changed. For long chains (A → B → C → D), every hop was a full re-scan.

Now, stream tables can read from other stream tables incrementally. When A refreshes, the rows it added and removed are recorded in a change buffer just like a base table. B wakes up, reads only the changed rows from A, and applies a delta — not a full recomputation. Even when A does a full refresh (e.g. because its query does not support differential mode), a before/after snapshot diff is captured automatically so downstream tables still receive a small insert/delete delta rather than cascading full refreshes through the chain.

Declaratively Partitioned Stream Tables

Stream tables can now be declared with a partition key:

SELECT create_stream_table(
  'monthly_sales',
  $$ SELECT month, region, SUM(amount) FROM orders GROUP BY 1, 2 $$,
  partition_by => 'month'
);

pg_trickle creates a range-partitioned storage table and, when refreshing, automatically restricts the MERGE operation to only the partitions that contain changed rows. For large tables where changes touch only 2–3 out of 100 monthly partitions, this can reduce the MERGE I/O from 10 million rows to ~100,000 — a 100× improvement.

Ready-to-Use Prometheus and Grafana Monitoring

A complete observability stack is now included in the monitoring/ directory:

  • monitoring/prometheus/pg_trickle_queries.yml — drop-in configuration for postgres_exporter that exports 14 metrics covering refresh performance, CDC buffer sizes, staleness, error rates, and per-table status.
  • monitoring/prometheus/alerts.yml — 8 alerting rules that page you when a stream table goes stale (> 5 min), starts error-looping (≥ 3 consecutive failures), is suspended, or when the CDC buffer exceeds 1 GB.
  • monitoring/grafana/dashboards/pg_trickle_overview.json — a pre-built Grafana dashboard with six sections: cluster overview, refresh latency time-series, staleness heatmap, CDC lag, per-table drill-down, and scheduler health.
  • monitoring/docker-compose.yml — brings up PostgreSQL + pg_trickle + postgres_exporter + Prometheus + Grafana with one command (docker compose up). Grafana opens at http://localhost:3000; the dashboard shows live metrics generated by a seed workload of stream tables continuously refreshing synthetic order and product data (see monitoring/init/01_demo.sql).

No code changes are needed to use this stack with an existing pg_trickle installation.

Circuit Breaker (Fuse) — Protection Against Runaway Change Bursts

A new circuit breaker mechanism halts refresh for a stream table when its pending change count exceeds a configurable threshold. This protects your database from accidental mass-delete scripts, runaway migrations, or data imports that would otherwise trigger an unexpectedly large and expensive refresh operation.

When the fuse blows, pg_trickle sends a pgtrickle_alert PostgreSQL notification that you can subscribe to, and suspends the affected stream table. You then choose how to recover using reset_fuse():

  • reset_fuse(name, action => 'apply') — process the backlog normally (default).
  • reset_fuse(name, action => 'reinitialize') — clear the change buffer and repopulate the stream table from scratch.
  • reset_fuse(name, action => 'skip_changes') — discard the pending changes and resume without reprocessing them.

Configure per-table with alter_stream_table(fuse => 'on', fuse_ceiling => 10000) or set a global default with pg_trickle.fuse_default_ceiling. Use fuse_status() to inspect the blown/active state of all stream tables at once.
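
A recovery sketch; the stream table name is hypothetical and schema qualification is assumed:

SELECT pgtrickle.alter_stream_table('orders_summary', fuse => 'on', fuse_ceiling => 10000);
LISTEN pgtrickle_alert;                                 -- blown-fuse notifications arrive here
SELECT * FROM pgtrickle.fuse_status();                  -- inspect fuse state across all tables
SELECT pgtrickle.reset_fuse('orders_summary', action => 'apply');  -- process the backlog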

Wider Column Bitmask — No More 63-Column Limit

pg_trickle's change capture tracks which columns were actually modified in each row so that stream tables that reference only a subset of columns can ignore irrelevant updates. Previously, this optimization silently stopped working for source tables with more than 63 columns — all updates were treated as touching every column.

The bitmask has been extended from a 64-bit integer to an arbitrary-width PostgreSQL VARBIT value, removing the column count cap entirely. Existing deployments are migrated automatically (the old column value becomes NULL, which the filter treats conservatively — no rows are silently dropped). Tables with fewer than 64 columns are unaffected at the data level.

Per-Database Worker Quotas

In multi-tenant environments where multiple databases share a single PostgreSQL instance, all stream-table refresh workers previously competed for the same concurrency pool. A single busy database could crowd out others.

A new GUC pg_trickle.per_database_worker_quota sets a soft concurrency limit per database. When the rest of the cluster is lightly loaded (< 80% of available capacity in use), a database can burst to 150% of its quota. When the cluster is busy, each database is held to its base quota.

Refresh work is also now dispatched in priority order: IMMEDIATE mode tables → atomic diamond groups → singleton tables.

DAG Scheduling Performance

For deployments with chains of stream tables (A → B → C), several improvements reduce end-to-end propagation latency:

  • Fused single-consumer chains. When a stream table chain has exactly one downstream consumer at each hop, the scheduler fuses the chain into a single execution unit in one background worker. Intermediate deltas are stored in temporary in-memory tables instead of persistent change buffers, eliminating the WAL writes, index maintenance, and cleanup that would normally occur at each hop.
  • Batch coalescing. Before a downstream table reads from an upstream change buffer, redundant insert/delete pairs for the same row are cancelled out. This prevents rapid-fire upstream refreshes from accumulating duplicate work for downstream tables.
  • Adaptive dispatch polling. The parallel dispatch loop now backs off exponentially (20 ms → 200 ms) instead of using a fixed 200 ms poll, and resets to 20 ms as soon as any worker finishes. Cheap refreshes no longer wait a full 200 ms for the next tick.
  • Delta amplification warnings. When a differential refresh produces many more output rows than input rows (default threshold: 100×), a WARNING is emitted with the table name, input and output counts, and a tuning hint. explain_st() now exposes amplification_stats from the last 20 refreshes.

Smarter Diagnostics and Warnings

Several improvements to make problems visible earlier and easier to diagnose:

  • Know which refresh mode is actually running. When a stream table is set to AUTO, pg_trickle now records which mode it actually chose at each refresh (DIFFERENTIAL, FULL, etc.) in a new effective_refresh_mode column on pgt_stream_tables. A new explain_refresh_mode(name) function reports the configured mode, the actual mode used, and the reason for any downgrade — all in one query.
  • Clearer warning when a stream table falls back to full refresh. If a stream table cannot use differential mode, pg_trickle now emits a WARNING message naming the affected table and the reason. Previously this happened silently.
  • Warning when using aggregates that require full group rescans. Aggregate functions like STRING_AGG, ARRAY_AGG, and JSON_AGG require re-aggregating the entire group whenever any member changes. pg_trickle now warns at stream table creation time when such aggregates are used in DIFFERENTIAL mode, and explain_st() classifies each aggregate's maintenance strategy (incremental, auxiliary-state, or group-rescan) so you can understand the cost.
  • Better error messages. Errors for unsupported query patterns, cycle detection, upstream schema changes, and query parse failures now include a DETAIL field explaining what went wrong and a HINT field suggesting how to fix it.
  • Invalid parameter combinations are rejected at creation time. For example, using diamond_schedule_policy='slowest' without diamond_consistency='atomic' now produces a clear error at create_stream_table / alter_stream_table time rather than silently doing the wrong thing at refresh time.
  • TopK queries validate their metadata on every refresh. Stream tables defined with ORDER BY ... LIMIT N now recheck that the stored LIMIT/OFFSET metadata still matches the actual query on each refresh. On mismatch, they fall back to a full refresh with a WARNING rather than silently producing wrong results.

Safety and Reliability Improvements

  • No more crashes from schema changes. If a source table's schema changes while a refresh is running (e.g. a column is dropped), pg_trickle now catches the error, emits a structured WARNING with the table name and error details, and continues refreshing all other stream tables. The scheduler never crashes due to an individual table's error.
  • Failure injection tests. New end-to-end tests deliberately drop columns and tables mid-refresh to verify that the scheduler stays alive and other stream tables continue processing correctly.
  • Safer defaults. Three default settings have been updated to reflect production-safe behavior:
    • parallel_refresh_mode now defaults to 'on' (was 'off'). Parallel refresh has been stable for several releases; serial mode is now opt-in.
    • block_source_ddl now defaults to true. Accidental ALTER TABLE on a source table while a stream table depends on it is now blocked by default, with clear instructions on how to temporarily disable the guard if needed.
    • The invalidation ring capacity has been doubled from 32 to 128 slots, reducing the risk of invalidation events being silently discarded under rapid DDL.

Getting Started Guide Restructured

docs/GETTING_STARTED.md has been reorganised into five progressive chapters:

  1. Hello World — create your first stream table and watch it update.
  2. Joins, Aggregates & Chains — multi-table dependencies and DAG patterns.
  3. Scheduling & Backpressure — controlling refresh frequency and auto-backoff.
  4. Monitoring In Depth — using the five key diagnostic functions and the Prometheus/Grafana stack.
  5. Advanced Topics — FUSE circuit breaker, partitioned stream tables, IMMEDIATE (in-transaction) IVM, and multi-tenant worker quotas.

TPC-H Correctness Gate Added to CI

Five queries derived from the TPC-H benchmark — covering single-table GROUP BY, filter-aggregate, CASE WHEN inside SUM, a three-way join, and LEFT OUTER JOIN with GROUP BY — now run in DIFFERENTIAL mode on every push to main and daily. Any correctness mismatch between pg_trickle's incremental output and plain PostgreSQL execution fails the CI build automatically.

Docker Hub Image Improvements

The Dockerfile.hub image that is published to Docker Hub has been expanded with a comprehensive set of GUC defaults fine-tuned for production use. A new just build-hub-image recipe builds the image locally for testing.

Bug Fixes

  • Scheduler crash after event-driven wake was enabled. The background worker crashed immediately after startup when event_driven_wake = on (the default) because the LISTEN command was being issued outside of a transaction. Fixed by issuing LISTEN inside a short-lived SPI transaction at startup. (#296)
  • Spurious full refresh for non-recursive CTEs. Stream tables containing WITH clauses that were not recursive (WITH foo AS (SELECT ...)) were being incorrectly forced to FULL refresh mode. Only truly recursive CTEs (WITH RECURSIVE) require this. Non-recursive CTEs now correctly use differential mode. (#298)
  • DISTINCT ON inside a CTE body caused a parse error. When a stream table's defining query contained a WITH clause whose body used DISTINCT ON (...), the DVM query analyser failed with a parse error. The DISTINCT ON clause is now rewritten before analysis so it no longer interferes. (#300)
  • Full-refresh fallback warning now names the affected table. When pg_trickle falls back from differential to full refresh, the emitted WARNING now includes the stream table name and the reason, making it straightforward to identify which table you need to investigate. (#301)

[0.10.0] — Cloud Deployment, PgBouncer & Query Engine Correctness

The headline features of 0.10.0 are cloud deployment compatibility, query engine correctness, refresh performance, and improved developer experience for auto_backoff. pg_trickle now works reliably behind PgBouncer — the connection pooler used by default on Supabase, Railway, Neon, and other managed PostgreSQL platforms. A broad set of correctness issues in the incremental query engine are fixed. And several performance optimizations cut refresh time for large tables and busy deployments.

auto_backoff Is Now Much Friendlier on Developer Machines

When pg_trickle.auto_backoff = true is enabled, the scheduler automatically slows down stream tables whose refresh cost exceeds their schedule budget — a good safeguard in production. This release makes the feature safe to use alongside short schedules (e.g. '1s') in developer and CI environments:

  • Trigger threshold raised from 80% → 95%. Backoff now only activates when a refresh consumes more than 95% of the schedule window. A 900 ms refresh on a 1-second schedule (90%) used to trigger backoff; it no longer does. EC-11 operator alerting continues to fire at 80% (unchanged) so you still get an early warning before the scheduler is actually stuck.

  • Maximum slowdown reduced from 64× → 8×. In the worst case, a stream table's effective refresh interval is now capped at 8× its configured schedule (e.g. 8 seconds for a '1s' table) instead of 64 seconds. The cap self-heals immediately: a single on-time refresh resets the factor to 1×.

  • Backoff events now emit WARNING instead of INFO. When the scheduler stretches or resets a stream table's effective interval, you will see a WARNING message in your PostgreSQL client, including the new effective interval — rather than a silent slowdown with no explanation.

  • auto_backoff now defaults to on. With the above improvements in place, the feature is safe in all environments. New installations get CPU runaway protection out of the box. To restore the old opt-in behaviour, set pg_trickle.auto_backoff = off.

Works Behind PgBouncer

PgBouncer is the most popular PostgreSQL connection pooler. In "transaction mode" — the default setting on most cloud PostgreSQL platforms — it hands a fresh database connection to every transaction, which breaks anything that assumes the same connection stays open between calls (session locks, prepared statements). pg_trickle previously relied on both. This release makes pg_trickle work correctly in such deployments.

  • Session locks replaced with row-level locking. The background scheduler now acquires a short-lived row-level lock on each stream table's catalog entry instead of a session-level advisory lock. Row-level locks are released automatically at transaction end — exactly what PgBouncer transaction mode requires. If a concurrent refresh is already running for a given stream table, the scheduler skips that cycle and retries, rather than blocking.

  • New pooler_compatibility_mode option per stream table. Setting pooler_compatibility_mode => true when creating or altering a stream table disables prepared statements and NOTIFY emissions for that table. Leave it off (the default) if you're not behind a pooler — behaviour is unchanged from v0.9.0. See the sketch after this list.

  • PgBouncer tested end-to-end. A new automated test suite boots PgBouncer in transaction-pool mode alongside pg_trickle and exercises the full lifecycle: create, refresh, alter, drop — all through the pooler. Run with just test-pgbouncer.
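
As referenced in the pooler_compatibility_mode item above, a minimal sketch (the named-argument form is an assumption; the stream table name is hypothetical):

SELECT pgtrickle.alter_stream_table('active_orders', pooler_compatibility_mode => true);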

Query Engine Correctness Fixes

Several SQL patterns that appeared to work correctly could produce wrong results silently under the incremental query engine. All of the following are now fixed:

  • Recursive queries (WITH RECURSIVE) update correctly when rows are deleted. Recursive queries are used for organisation hierarchies, bill-of-materials roll-ups, graph traversals, and similar structures. In DIFFERENTIAL mode, deleting a row from the source previously caused a full recomputation (correct, but expensive — O(n)). Now pg_trickle uses the Delete-and-Rederive algorithm, updating only affected rows at O(delta) cost. Computed expressions like ancestor.path || ' > ' || node.name update correctly when any ancestor is renamed or moved.

  • SUM over a FULL OUTER JOIN no longer returns 0 instead of NULL. When matched rows on both join sides transition to matched on one side only (creating null-padded rows), the incremental SUM formula previously returned 0 instead of NULL. pg_trickle now tracks how many non-null values exist in each group and produces the correct answer without any full-group rescan.

  • Multi-source delta merging is now correct for diamond-shaped queries. A "diamond" topology is when two separate paths through the dependency graph both feed into the same stream table (e.g. table A → both B and C → D). Simultaneous changes on both paths could previously cause some corrections to be silently discarded, leaving D with wrong values. Now uses proper weight aggregation (Z-set algebra) so every correction is applied. Six property-based tests verify this for different diamond shapes.

  • Statistical aggregates (CORR, COVAR, REGR_*) now update in constant time. All twelve SQL correlation and regression functions — CORR, COVAR_POP, COVAR_SAMP, and the nine REGR_* variants — now update incrementally using running totals (Welford-style accumulation) instead of rescanning the whole group. Each changed row is processed once regardless of group size.

  • LATERAL subqueries only re-examine correlated rows. When data changes in the inner part of a LATERAL JOIN, pg_trickle previously re-ran the subquery for every row in the outer table. Now it re-runs it only for outer rows that actually correlate with the changed inner data, reducing work from proportional-to-table-size to proportional-to-changes.

  • Materialized view sources now work in DIFFERENTIAL mode. Stream tables can use a PostgreSQL materialized view as their data source when pg_trickle.matview_polling = on is set. Changes are detected by comparing snapshots, the same mechanism used for foreign table sources.

  • Several correctness bugs in the query rewriting engine fixed. These all involved edge cases in how the incremental engine translates SQL:

    • SQL comment fragments such as /* unsupported ... */ that were being injected into generated SQL and causing runtime syntax errors are now replaced with clear extension-level errors.
    • When a column-rename step (e.g. EXTRACT(year FROM orderdate) AS o_year) sits between an aggregate and its source, GROUP BY and aggregate expressions now resolve correctly.
    • EXCEPT queries wrapped in a projection no longer silently lose their row multiplicity tracking.
    • A placeholder row identifier value of zero could collide with real row hashes; changed to a sentinel value (i64::MIN) outside the normal hash range.
    • Empty scalar subqueries now raise a clear error instead of silently emitting NULL.
  • Change capture (CDC) fixes. The UPDATE trigger now correctly handles rows with NULL values in their primary key columns (previously those rows were silently dropped from the change buffer). WAL logical replication publications are automatically rebuilt when a source table is converted to partitioned after the publication was set up — previously this caused the stream table to silently stop updating. TRUNCATE followed by INSERT is handled atomically so post-TRUNCATE inserts are never lost.

Faster Refreshes

  • Automatic covering index on stream table row IDs. Stream tables with eight or fewer output columns now automatically get a covering index with INCLUDE (col1, col2, ...) on the internal __pgt_row_id column. This lets the MERGE step use index-only scans — no heap lookups for matched rows — reducing refresh time by roughly 20–50% in small-delta / large-table scenarios.

  • Change buffer compaction. When the pending change buffer grows beyond pg_trickle.compact_threshold (default 100,000 rows), pg_trickle compacts it before the next refresh cycle. INSERT→DELETE pairs that cancel each other out are eliminated; multiple sequential changes to the same row are collapsed to a single net change. Reduces delta scan overhead by 50–90% for high-churn tables. Uses change_id (not ctid) for safe operation under concurrent VACUUM.

  • Tiered refresh scheduling. Large deployments can assign stream tables to one of four tiers: Hot (refresh at the configured interval), Warm (2× interval), Cold (10× interval), or Frozen (skip until manually promoted). Gate the feature with pg_trickle.tiered_scheduling = on (default off). Set per stream table via ALTER STREAM TABLE ... SET (tier => 'warm'). Frozen stream tables are entirely skipped by the scheduler until you promote them.

  • Incremental dependency-graph updates. When a stream table is created, altered, or dropped, the internal dependency graph now updates only the affected entries instead of rebuilding the entire graph from scratch. Reduces the latency impact of DDL operations from roughly 50 ms to roughly 1 ms in deployments with 1,000+ stream tables.

  • Smarter topo-sort caching inside a scheduler tick. The ordering in which stream tables are refreshed (topological order through the dependency graph) is now computed once per scheduler tick and reused across all internal callers, eliminating redundant work.

Better Visibility Into What pg_trickle Is Doing

Several behaviours that previously happened silently now produce a short, actionable message at the moment they occur:

  • ORDER BY without LIMIT warns you at creation time. Adding ORDER BY to a stream table's defining query without also adding LIMIT has no effect: stream table storage has no guaranteed row order. pg_trickle now emits a WARNING pointing you toward the TopK pattern or suggesting you remove the ORDER BY.

  • append_only mode reversions are visible. When pg_trickle automatically exits append-only mode (because deletions or updates were detected in the source), the notice is now emitted at WARNING level (was INFO, normally suppressed) and also dispatched as a pgtrickle_alert notification.

  • Cleanup failures escalate after 3 consecutive attempts. If the background worker fails to clean up a source table 3 times in a row, the message is promoted from DEBUG1 (normally invisible) to WARNING so it appears in the server log.

  • Diamond dependency with diamond_consistency='none' now advises you. When you create a stream table that forms a diamond in the dependency graph and explicitly set diamond_consistency='none', a NOTICE advises you to consider diamond_consistency='atomic' for consistent cross-branch reads.

  • diamond_consistency now defaults to 'atomic'. New stream tables get atomic group semantics by default, meaning all branches of a diamond are refreshed together in a single savepoint before the convergence node is updated. This prevents a read from the convergence node seeing one branch partially updated and the other stale. To restore the old independent behavior, pass diamond_consistency => 'none' explicitly.

  • Adaptive fallback is visible at the default log level. When a differential refresh falls back to a full refresh because the delta is too large, the message is now emitted at NOTICE level (the default client_min_messages threshold) instead of INFO (usually suppressed in the client session).

  • CALCULATED schedule without downstream dependents warns you. When a stream table is created with schedule='calculated' but no existing stream table references it as a downstream dependent, a NOTICE explains that the schedule will fall back to pg_trickle.default_schedule_seconds.

  • Internal __pgt_* auxiliary columns are now documented. The hidden columns that the refresh engine may add to stream table physical storage are described in a new section of SQL_REFERENCE.md. This covers all variants from the always-present __pgt_row_id primary key through the aggregate-specific auxiliary columns for AVG, STDDEV, CORR, COVAR, REGR_*, window functions, and recursive CTE depth.

Bug Fixes

  • Scheduler no longer permanently misses stream tables created under a stale snapshot. signal_dag_invalidation is called inside the creating transaction, before it commits. If the background scheduler happened to start a new tick and capture a catalog snapshot at that exact instant, the DAG rebuild query would not see the new stream table — yet the version counter was already advanced, so the scheduler would never rebuild again, and the affected stream table would never be scheduled for refresh. Fixed by verifying that every invalidated pgt_id is present in the rebuilt DAG after each rebuild; if any are missing, the scheduler signals a full rebuild for the next tick (which starts a fresh transaction that sees all committed data) rather than accepting the stale version. Fixes CI test test_autorefresh_diamond_cascade.

Upgrade Notes

  • New catalog columns. The 0.9.0 → 0.10.0 upgrade migration adds pooler_compatibility_mode BOOLEAN and refresh_tier TEXT to pgt_stream_tables. Run ALTER EXTENSION pg_trickle UPDATE TO '0.10.0' after replacing the extension files. Verification script: scripts/check_upgrade_completeness.sh.

  • Hidden auxiliary columns for statistical aggregates. Stream tables using CORR, COVAR_POP, COVAR_SAMP, or any REGR_* aggregate will get hidden __pgt_aux_* columns when created or altered under 0.10.0. These are invisible to normal queries (excluded by the NOT LIKE '__pgt_%' convention) and managed automatically.

  • pooler_compatibility_mode is off by default. Existing stream tables are unaffected. Enable it only for stream tables accessed through PgBouncer transaction-mode pooling.

Additional Bug Fixes (2026-03-24)

Scheduler stability:

  • Scheduler no longer crashes when concurrent refreshes compete. The internal function that decides whether to skip a refresh cycle was running a locking query outside a transaction boundary — a strict PostgreSQL requirement. It now runs inside a proper subtransaction, eliminating the crash.

  • Auto-backoff no longer causes a transaction conflict in the background worker. When the auto-backoff feature stretches a stream table's refresh interval, it previously tried to open a new transaction inside the background worker's already-open transaction. PostgreSQL does not allow this nesting; the code path is now restructured to avoid it.

Query engine correctness:

  • Queries that filter on hidden columns now produce correct results. For example, SELECT name FROM users WHERE internal_id > 5 — where internal_id is not part of the output — could return wrong rows during incremental updates. Fixed.

  • JOIN results are correct when both joined tables change at the same time. Simultaneous changes to two stream tables connected by a JOIN could leave the output with stale or duplicated rows. Fixed.

  • NULLIF(a, b) expressions now work in incremental queries. NULLIF returns NULL when its two arguments are equal. It was not recognised by the incremental parser, causing a fallback error. Fixed.

  • LIKE and ILIKE pattern matching now work in filter conditions. Filter expressions such as WHERE name LIKE 'A%' or WHERE description ILIKE '%widget%' were not handled by the incremental engine. Fixed.

  • Subqueries with ORDER BY, LIMIT, or OFFSET are now preserved correctly. When the incremental engine reconstructed a subquery, those clauses were silently dropped. The incremental result no longer differs from a full refresh for such queries.

  • Scalar subqueries using LIMIT or OFFSET are now handled gracefully. Rather than producing a runtime error, the engine falls back to a full refresh for those cases and continues.

SQL parser:

  • Wildcard column references (table.*) now work for qualified names. A two- or three-part column reference such as schema.table.* or alias.* caused a parser crash. Fixed.

Change capture and WAL:

  • State transitions no longer stall when the WAL replication slot is behind. When a stream table moves through the TRANSITIONING state, pg_trickle now advances the WAL replication slot up-front. This eliminates a lag-check stall that could cause the transition to hang indefinitely under write-heavy workloads.

Security:

  • Several low-severity code quality and security scanner alerts from Semgrep and CodeQL are resolved. No user-visible behaviour changes.

[0.9.0] — Incremental Aggregates & Smarter Scheduling

The headline feature of 0.9.0 is incremental aggregate maintenance: when a single row changes inside a group of 100,000 rows, pg_trickle no longer has to re-scan all 100,000 rows to update COUNT, SUM, AVG, STDDEV, or VAR results. Instead it keeps running totals and adjusts them in constant time. Only MIN/MAX still needs a rescan — and only when the deleted value happens to be the current extreme.

Beyond aggregates, this release contains a broad set of performance optimizations that reduce wasted I/O during every refresh cycle, two new configuration knobs, a refresh-group management API, and several bug fixes.

Faster Aggregates

  • Constant-time COUNT, SUM, AVG: Changed rows are now applied algebraically (new_sum = old_sum + inserted − deleted) instead of re-aggregating the whole group. AVG uses hidden auxiliary SUM and COUNT columns maintained automatically on the stream table.
  • Constant-time STDDEV and VAR: Standard-deviation and variance aggregates (STDDEV_POP, STDDEV_SAMP, VAR_POP, VAR_SAMP) now use a sum-of-squares decomposition with a hidden auxiliary column, achieving the same constant-time update as COUNT/SUM/AVG.
  • MIN/MAX safety guard: Deleting the row that currently holds the minimum (or maximum) value correctly triggers a rescan of that group. Property-based tests verify this boundary.
  • Floating-point drift reset: A new setting (pg_trickle.algebraic_drift_reset_cycles) periodically forces a full recomputation to correct any floating-point rounding drift that accumulates over many incremental cycles.
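
As a rough sketch of the algebraic update described above (illustrative SQL only: the table, the change-buffer shape, and the exact auxiliary column names are assumptions, and the engine generates its own MERGE statements):

-- Maintain AVG(amount) per region from a change buffer of (region, amount, op)
-- rows, where op is 'I' for an inserted row and 'D' for a deleted row.
UPDATE sales_by_region st
SET __pgt_aux_sum   = st.__pgt_aux_sum   + d.delta_sum,
    __pgt_aux_count = st.__pgt_aux_count + d.delta_count,
    avg_amount      = (st.__pgt_aux_sum + d.delta_sum)
                      / NULLIF(st.__pgt_aux_count + d.delta_count, 0)
FROM (
    SELECT region,
           SUM(CASE WHEN op = 'I' THEN amount ELSE -amount END) AS delta_sum,
           SUM(CASE WHEN op = 'I' THEN 1      ELSE -1      END) AS delta_count
    FROM change_buffer
    GROUP BY region
) d
WHERE st.region = d.region;
-- (Removing a group whose count reaches zero is handled separately.)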

Smarter Refresh Scheduling

  • Automatic backoff for overloaded streams: The pg_trickle.auto_backoff GUC was introduced here (default off at the time). See the v0.10.0 entry for the improved thresholds, reduced cap, and the flip to on by default.
  • Index-aware MERGE: A new threshold setting (pg_trickle.merge_seqscan_threshold, default 0.001) tells PostgreSQL to use an index lookup instead of a full table scan when only a tiny fraction of the stream table's rows are changing.

Less Wasted I/O

  • Skip unchanged columns: The scan operator now checks the CDC trigger's per-row bitmask to skip UPDATE rows where none of the columns your query actually uses were modified. For wide tables where you only reference a few columns, most UPDATE processing is eliminated.
  • Skip unchanged sources in joins: When a multi-source join query has three source tables but only one of them changed, the delta branches for the two unchanged sources are now replaced with FALSE at plan time. PostgreSQL's planner recognises those branches as empty and skips them entirely.
  • Push WHERE filters into the change scan: If your stream table's defining query has a WHERE clause (e.g. WHERE status = 'shipped'), that filter is now applied immediately after reading the change buffer — before rows enter the join or aggregate pipeline. Rows that don't match the filter are discarded right away.
  • Faster DISTINCT counting: The per-row multiplicity lookup for SELECT DISTINCT queries now uses an index-driven scalar subquery instead of a LEFT JOIN, guaranteeing I/O proportional to the number of changed rows regardless of stream table size.
  • Scalar subquery short-circuit: When a scalar subquery's inner source has no changes in the current cycle, the expensive outer-table snapshot reconstruction is skipped entirely.

Refresh Group Management

  • New SQL functions for grouping stream tables that should always be refreshed together (cross-source snapshot consistency):
    • pgtrickle.create_refresh_group(name, members, isolation)
    • pgtrickle.drop_refresh_group(name)
    • pgtrickle.refresh_groups() — lists all declared groups.
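
A hypothetical usage sketch (the group name, member names, and the 'snapshot' isolation value are illustrative; see the SQL Reference for the accepted isolation levels):

-- Refresh two stream tables together so reports never mix snapshots.
SELECT pgtrickle.create_refresh_group(
    name      => 'order_reports',
    members   => ARRAY['daily_orders', 'daily_order_lines'],
    isolation => 'snapshot'
);

SELECT * FROM pgtrickle.refresh_groups();        -- confirm the group exists

SELECT pgtrickle.drop_refresh_group('order_reports');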

Bug Fixes

  • Fixed a crash when internal status queries failed: The source_gates() and watermarks() SQL functions previously crashed the entire PostgreSQL backend process on any internal error. They now report a normal SQL error instead.
  • Clearer handling of window functions in expressions: Queries like CASE WHEN ROW_NUMBER() OVER (...) > 5 THEN ... were silently accepted but failed at refresh time with a confusing error. pg_trickle now automatically falls back to full refresh mode (in AUTO mode) or warns you at creation time (in explicit DIFFERENTIAL mode).

Documentation

  • Documented the known limitation that recursive CTE stream tables in DIFFERENTIAL mode fall back to full recomputation when rows are deleted or updated. Workaround: use refresh_mode = 'IMMEDIATE'.
  • Documented the pgt_refresh_groups catalog table schema and usage.
  • Documented the O(partition_size) cost of window function maintenance with mitigation strategies.

Deferred to v0.10.0

The following performance optimizations were evaluated and explicitly deferred. In every case the current behaviour is correct — these items would make certain workloads faster but carry enough implementation risk that they need more design work first:

  • Recursive CTE incremental delete/update in DIFFERENTIAL mode (P2-1)
  • SUM NULL-transition shortcut for FULL OUTER JOIN aggregates (P2-2)
  • Materialized view sources in IMMEDIATE mode (P2-4)
  • LATERAL subquery scoped re-execution (P2-6)
  • Welford auxiliary columns for CORR/COVAR/REGR_* aggregates (P3-2)
  • Merged-delta weight aggregation for multi-source deduplication (B3-2/B3-3)

Upgrade Notes

  • New SQL objects: The 0.8.0 → 0.9.0 upgrade migration adds the pgt_refresh_groups table and the restore_stream_tables function. Run ALTER EXTENSION pg_trickle UPDATE TO '0.9.0' after replacing the extension files.
  • Hidden auxiliary columns: Stream tables using AVG, STDDEV, or VAR aggregates will automatically get hidden __pgt_aux_* columns when created or altered. These columns are invisible to normal queries (filtered by the existing NOT LIKE '__pgt_%' convention) and are managed automatically.
  • PGXN publishing: Release artifacts are now automatically uploaded to PGXN via GitHub Actions.

[0.8.0] — Backup, Pooler Compatibility & Reliability

This release focuses on making your streams easier to back up, making them more reliable under complex scenarios, and solidifying the core engine through extensive testing improvements.

Added

  • Backup and Restore Support: You can now safely back up your database using standard pg_dump and pg_restore commands. After a restore, the system automatically reconnects all streams and change queues, so disaster recovery needs no manual rewiring.
  • Connection Pooler Opt-In: Replaced the global PgBouncer pooler compatibility setting with a per-stream option. You can now enable connection pooling optimizations selectively on a stream-by-stream basis.

Fixed

  • Cyclic Stream Reliability: Fixed internal bugs that occasionally caused streams referencing each other in a loop to get stuck refreshing forever. Streams now accurately detect when row changes stop and naturally settle.
  • Large Dependency Chains: Fixed a crash (stack overflow) that could happen if you attempted to drop an extremely large or heavily recursive chain of stream tables sequentially.
  • Special Character Support in SQL: Handled an edge case causing errors when multi-byte characters or special non-ASCII symbols were parsed inside certain SQL commands.
  • Mac Support for Developer Tooling: Addressed a minor internal tool error stopping test components from automatically building on Apple Silicon machines.

Under the Hood Code and Testing Enhancements

  • Testing Hardening: The internal test suite was substantially expanded with tens of thousands of automated checks that verify incremental query results match a full recomputation, no matter how complex the joins or updates.
  • Developer Tooling: Began adopting cargo nextest to speed up test runs and development iteration.

[0.7.0] — Watermark Gating, Circular Pipelines & SQL Broadening

0.7.0 makes pg_trickle easier to trust in real-world data pipelines. The big theme of this release is fewer surprises: the scheduler can now wait for late-arriving source data, some circular pipelines can run safely instead of being blocked, more queries stay on incremental refresh, and the system does a better job of deciding when incremental work is no longer worth it.

Added

Multi-source data can wait until it is actually ready

pg_trickle can now delay a refresh until related source tables have all caught up to roughly the same point in time. This is useful for ETL jobs where, for example, orders arrives before order_lines and refreshing too early would produce a half-finished report.

  • New watermark APIs: advance_watermark(source, watermark), create_watermark_group(name, sources[], tolerance_secs), and drop_watermark_group(name).
  • New status helpers: watermarks(), watermark_groups(), and watermark_status().
  • The scheduler now skips gated refreshes when grouped sources are too far apart and records the reason in refresh history.
  • New catalog tables store per-source watermarks and watermark group definitions.
  • 28 end-to-end tests cover normal operation, bad input, tolerance windows, and scheduler behavior.
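
A sketch of the intended ETL flow (the table names, the 60-second tolerance, and using now() as the watermark value are assumptions):

-- Gate refreshes until both sources are within 60 seconds of each other.
SELECT pgtrickle.create_watermark_group('orders_etl',
       ARRAY['orders', 'order_lines'], 60);

-- Each batch job advances its source's watermark when its load commits.
SELECT pgtrickle.advance_watermark('orders',      now());
SELECT pgtrickle.advance_watermark('order_lines', now());

-- Inspect gating state when a refresh seems held back.
SELECT * FROM pgtrickle.watermark_status();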

Some circular pipelines can now run safely

Stream tables that depend on each other in a loop are no longer always blocked. If the cycle is monotone and uses DIFFERENTIAL mode, pg_trickle can now keep refreshing the group until it stops changing.

  • Circular refreshes run to a fixed point, with pg_trickle.max_fixpoint_iterations as a safety limit.
  • Cycle creation and ALTER validation now check that every member is safe for convergence before allowing the loop.
  • pgtrickle.pgt_status() now reports scc_id, and pgtrickle.pgt_scc_status() shows per-cycle-group status.
  • pgtrickle.pgt_stream_tables now tracks last_fixpoint_iterations so it is easier to spot slow or unstable cycles.
  • 6 end-to-end tests cover convergence, rejection of unsafe cycles, non-convergence handling, and cleanup.

More queries stay on incremental refresh

Several query shapes that used to fall back to FULL refresh, or fail outright, now keep working in DIFFERENTIAL and AUTO mode.

  • User-defined aggregates created with CREATE AGGREGATE now work through the existing group-rescan strategy, including common extension-provided aggregates.
  • More complex OR plus subquery patterns are now rewritten correctly, including cases that need De Morgan normalization and multiple rewrite passes.
  • The rewrite pipeline has a guardrail to stop runaway branch explosion.
  • A dedicated 14-test end-to-end suite covers these previously missing cases.

Easier packaging ahead of 1.0

The release also adds infrastructure that makes evaluation and future distribution simpler.

  • Dockerfile.hub and a dedicated CI workflow can build and smoke-test a ready-to-run PostgreSQL 18 image with pg_trickle preinstalled.
  • META.json adds PGXN package metadata with release_status: "testing".
  • CNPG smoke testing is now part of the documented pre-1.0 packaging story.

Improved

Refresh strategy and performance decisions are smarter

The scheduler and refresh engine now make better choices when incremental work is likely to help and back off sooner when it is not.

  • Wide tables now use xxh64-based change detection instead of slower MD5-based comparisons.
  • Aggregate stream tables can skip expensive incremental work and jump straight to FULL refresh when the pending change set is obviously too large.
  • Strategy selection now combines a change-ratio signal with recent refresh history, which helps on workloads with uneven batch sizes.
  • DAG levels are extracted explicitly, enabling level-parallel refresh scheduling.
  • Small internal hot paths such as column-list building and LSN comparison were tightened to remove avoidable allocations.

Benchmarking is much easier to use and compare

The performance toolchain was expanded so regressions are easier to spot and large-scale behavior is easier to study.

  • Benchmarks now support per-cycle output, optional EXPLAIN ANALYZE capture, larger 1M-row runs, and more stable Criterion settings.
  • New tooling covers cross-run comparison, concurrent writers, and extra query shapes such as window, lateral, CTE, and UNION ALL workloads.
  • just bench-docker makes it easier to run Criterion inside the builder image when local linking is awkward.

Changed

Internal Code Quality: Integration Test Suite Hardening

Completed a full hardening pass of the integration test suite, bringing all items in PLAN_TEST_EVALS_INTEGRATION.md to done:

  • Multiset validation — Extracted assert_sets_equal() helper relying on EXCEPT/UNION ALL SQL logic and applied it to workflow tests to ensure storage table state correctly matches the defining query post-refresh.
  • Round-trip notifications — pg_trickle_alert notifications now verify receipt end-to-end via sqlx::PgListener.
  • DVM operators — Added unit coverage for complex semi/anti-join behaviors (multi-column, filtered, complementary), multi-table join chains for inner and full joins, and proptest! fuzz tests enforcing generated SQL invariants across INNER, SEMI, and ANTI joins.
  • Resilience and edge cases — Test coverage for ST drop cascades verifying dependent object removal, exact error escalation thresholds, and scheduler job lifecycles across queued mock states.
  • Cleanups — Standardized naming practices (test_workflow_*, test_infra_*) and eliminated clock-bound flakes by widening staleness assertions.

Internal low-level code is much safer to audit

This release cuts the amount of low-level unsafe Rust in half without changing behavior.

  • Unsafe blocks were reduced by 51%, from 1,309 to 641.
  • Repeated patterns were consolidated into a small set of documented helper functions.
  • 37 internal functions no longer need to be marked unsafe.
  • Existing unit tests continued to pass unchanged after the refactor.

[0.6.0] — Idempotent DDL, Partitioned Sources & dbt Integration

Added

Idempotent DDL (create_or_replace)

New one-call function for deploying stream tables without worrying about whether they already exist. Replaces the old "check if it exists, then drop and recreate" pattern.

  • create_or_replace_stream_table() — a single function that does the right thing automatically:
    • Creates the stream table if it doesn't exist yet.
    • Does nothing if the stream table already exists with the same query and settings (logs an INFO so you know it was a no-op).
    • Updates settings (schedule, refresh mode, etc.) if only config changed.
    • Replaces the query if the defining query changed — including automatic schema migration and a full refresh.
  • dbt uses it automatically. The stream_table materialization now calls create_or_replace_stream_table() when running against pg_trickle 0.6.0+, with automatic fallback for older versions.
  • Whitespace-insensitive. Cosmetic SQL differences (extra spaces, tabs, newlines) are correctly treated as no-ops — won't trigger unnecessary rebuilds.
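
A deployment sketch, assuming the function takes the same name/query/schedule parameters as create_stream_table:

-- Safe to run on every deploy: create, no-op, config update, or query
-- replacement is chosen automatically.
SELECT pgtrickle.create_or_replace_stream_table(
    name     => 'shipped_orders',
    query    => 'SELECT * FROM orders WHERE status = ''shipped''',
    schedule => '1m'
);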

dbt Integration Enhancements

  • Check stream table health from dbt. New pgtrickle_stream_table_status() macro returns whether a stream table is healthy, stale, erroring, or paused. Pair it with the new built-in stream_table_healthy test in your schema.yml to fail CI when a stream table is behind or broken.
  • Refresh everything in the right order. New refresh_all_stream_tables run-operation refreshes all dbt-managed stream tables in dependency order. Run it after dbt run and before dbt test in your CI pipeline.

Partitioned Source Tables

Stream tables now work with PostgreSQL's declarative table partitioning — RANGE, LIST, and HASH partitioned tables all work as sources out of the box.

  • Changes in any partition are captured automatically. CDC triggers fire on the parent table so inserts, updates, and deletes in any child partition are picked up.
  • ATTACH PARTITION triggers automatic rebuild. When you attach a new partition, pg_trickle detects the structural change and rebuilds affected stream tables to include the new partition's pre-existing data.
  • WAL mode works with partitions. Publications are configured with publish_via_partition_root = true, so all partitions report changes under the parent table's identity.
  • New tutorial covering partitioned source tables, ATTACH/DETACH behavior, and known caveats (docs/tutorials/PARTITIONED_TABLES.md).
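
A minimal sketch with a RANGE-partitioned source (the table and column names are illustrative); stream table creation itself is unchanged:

CREATE TABLE events (ts timestamptz NOT NULL, payload jsonb)
    PARTITION BY RANGE (ts);
CREATE TABLE events_2026_03 PARTITION OF events
    FOR VALUES FROM ('2026-03-01') TO ('2026-04-01');

SELECT pgtrickle.create_stream_table(
    name     => 'events_per_day',
    query    => 'SELECT ts::date AS day, count(*) AS n FROM events GROUP BY 1',
    schedule => '1m'
);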

Circular Dependency Foundation

Lays the groundwork for stream tables that reference each other in a cycle (A → B → A). The actual cyclic refresh execution is planned for v0.7.0 — this release adds the detection, validation, and safety infrastructure.

  • Cycle detection. pg_trickle can now identify groups of stream tables that form circular dependencies.
  • Safety checks at creation time. Queries that can't safely participate in a cycle (those using aggregates, EXCEPT, window functions, or NOT EXISTS) are rejected with a clear error explaining why.
  • New settings:
    • pg_trickle.allow_circular (default: off) — master switch for circular dependencies.
    • pg_trickle.max_fixpoint_iterations (default: 100) — prevents runaway loops.

Source Gating Improvements

  • bootstrap_gate_status() function. Shows which sources are currently gated, when they were gated, how long the gate has been active, and which stream tables are waiting. Useful for debugging "why isn't my stream table refreshing?"
  • ETL coordination cookbook. SQL Reference now includes five step-by-step recipes for common bulk-load patterns.

More SQL Patterns Supported

Two query patterns that previously required workarounds now just work:

  • Window functions inside expressions. Queries like CASE WHEN ROW_NUMBER() OVER (...) = 1 THEN 'top' ELSE 'other' END or COALESCE(SUM() OVER (...), 0) are now accepted and produce correct results. Use FULL refresh mode for these queries — incremental (DIFFERENTIAL) refresh of window-in-expression patterns is not yet supported. Previously, the query was rejected entirely at creation time.

  • ALL (subquery) comparisons. Queries like WHERE price < ALL (SELECT price FROM competitors) are now accepted in both FULL and DIFFERENTIAL modes. Supports all comparison operators (>, >=, <, <=, =, <>) and correctly handles NULL values per the SQL standard.

Operational Safety Improvements

  • Function changes detected automatically. If a stream table's query calls a user-defined function and you update that function with CREATE OR REPLACE FUNCTION, pg_trickle detects the change and automatically rebuilds the stream table on the next cycle. No manual intervention needed.

  • WAL mode explains why it isn't activating. When cdc_mode = 'auto' and the system stays on trigger-based tracking, the scheduler now periodically logs the exact reason (e.g., "wal_level is not logical") and check_cdc_health() reports the current mode so you can diagnose the issue.

  • WAL + keyless tables rejected early. Creating a stream table with cdc_mode = 'wal' on a table that has no primary key and no REPLICA IDENTITY FULL is now rejected at creation time with a clear error — instead of silently producing incomplete results later.

  • Automatic recovery after backup/restore. When a PostgreSQL server is restored from pg_basebackup, WAL replication slots are lost. pg_trickle now detects the missing slot, automatically falls back to trigger-based tracking, and logs a WARNING so you know what happened.
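
For the keyless-table check above, the standard PostgreSQL remedies apply (audit_log and its id column are illustrative):

-- Either give the source table a primary key ...
ALTER TABLE audit_log ADD PRIMARY KEY (id);
-- ... or have logical decoding ship entire old rows.
ALTER TABLE audit_log REPLICA IDENTITY FULL;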

Documentation

  • ALL (subquery) worked example in the SQL Reference with sample data and expected results.
  • Window-in-expression documentation showing before/after examples of the automatic rewrite.
  • Foreign table sources tutorial — step-by-step guide for using postgres_fdw foreign tables as stream table sources.

Fixed

  • create_or_replace whitespace handling. Extra spaces, tabs, and newlines in queries no longer trigger unnecessary rebuilds.
  • create_or_replace schema incompatibility detection. Incompatible column type changes (e.g., text → integer) are now properly detected and handled.

[0.5.0] — Row-Level Security, Source Gating & Append-Only Fast Path

Added

Row-Level Security (RLS) Support

Stream tables now work correctly with PostgreSQL's Row-Level Security feature, which lets you control which rows different users can see.

  • Refreshes always see all data. When a stream table is refreshed, it computes the full result regardless of RLS policies on the source tables. This matches how PostgreSQL's built-in materialized views work. You then add RLS policies directly on the stream table to control who can read what.
  • Internal tables are protected. The internal change-tracking tables used by pg_trickle are shielded from RLS interference, so refreshes won't silently fail if you turn on RLS at the schema level.
  • Real-time (IMMEDIATE) mode secured. Triggers that keep stream tables updated in real time now run with elevated privileges and a locked-down search path, preventing data corruption or security bypasses.
  • RLS changes are detected automatically. If you enable, disable, or force RLS on a source table, pg_trickle detects the change and marks affected stream tables for a full rebuild.
  • New tutorial. Step-by-step guide for setting up per-tenant RLS policies on stream tables (see docs/tutorials/ROW_LEVEL_SECURITY.md).

Source Gating for Bulk Loads

New pause/resume mechanism for large data imports. When you're loading a big batch of data into a source table, you can temporarily "gate" it to prevent the background scheduler from triggering refreshes mid-load. Once the load is done, ungate it and everything catches up in a single refresh.

  • gate_source('my_table') — pauses automatic refreshes for any stream table that depends on my_table.
  • ungate_source('my_table') — resumes automatic refreshes. All changes made during the gate are picked up in the next refresh cycle.
  • source_gates() — shows which source tables are currently gated, when they were gated, and by whom.
  • Manual refresh still works. Even while a source is gated, you can explicitly call refresh_stream_table() if needed.
  • Gating is idempotent — calling gate_source() twice is safe, and gating a source that's already gated is a no-op.
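
A bulk-load sketch using the functions above (the pgtrickle schema prefix and the file path are assumptions):

SELECT pgtrickle.gate_source('orders');      -- pause dependent refreshes

COPY orders FROM '/data/orders_batch.csv' WITH (FORMAT csv);

SELECT pgtrickle.ungate_source('orders');    -- dependents catch up in one refresh
SELECT * FROM pgtrickle.source_gates();      -- should now show no active gates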

Append-Only Fast Path

Significant performance improvement for tables that only receive INSERTs (event logs, audit trails, time-series data, etc.). When you mark a stream table as append_only, refreshes skip the expensive merge logic (checking for deletes, updates, and row comparisons) and use a simple, fast insert.

  • How to use: Pass append_only => true when creating or altering a stream table.
  • Safe fallback. If a DELETE or UPDATE is detected on a source table, the extension automatically falls back to the standard refresh path and logs a warning. It won't silently produce wrong results.
  • Restrictions. Append-only mode requires DIFFERENTIAL refresh mode and source tables with primary keys.
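
A sketch for an insert-only event log, assuming append_only sits alongside the other named parameters of create_stream_table:

SELECT pgtrickle.create_stream_table(
    name         => 'events_by_type',
    query        => 'SELECT type, count(*) AS n FROM events GROUP BY type',
    schedule     => '10s',
    refresh_mode => 'DIFFERENTIAL',   -- required for append-only mode
    append_only  => true
);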

Usability Improvements

  • Manual refresh history. When you manually call refresh_stream_table(), the result (success or failure, timing, rows affected) is now recorded in the refresh history, just like scheduled refreshes.
  • quick_health view. A single-row health summary showing how many stream tables you have, how many are in error or stale, whether the scheduler is running, and an overall status (OK, WARNING, CRITICAL). Easy to plug into monitoring dashboards.
  • create_stream_table_if_not_exists(). A convenience function that does nothing if the stream table already exists, instead of raising an error. Makes migration scripts and deployment automation simpler.
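
Two quick sketches (the if-not-exists signature is assumed to mirror create_stream_table, and quick_health's exact column set may differ from this comment):

SELECT pgtrickle.create_stream_table_if_not_exists(
    name     => 'orders_by_day',
    query    => 'SELECT created_at::date AS day, count(*) AS n FROM orders GROUP BY 1',
    schedule => '1m'
);

SELECT * FROM pgtrickle.quick_health;   -- one row: counts plus an OK/WARNING/CRITICAL status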

Smooth Upgrade from 0.4.0

  • Existing installations can upgrade with ALTER EXTENSION pg_trickle UPDATE TO '0.5.0'. All new features (source gating, append-only mode, quick health view, and the new convenience functions) are included in the upgrade script.
  • The upgrade has been verified with automated tests that confirm all 40 SQL objects survive the upgrade intact.

[0.4.0] — Parallel Refresh & Statement-Level CDC Triggers

Added

Parallel Refresh (opt-in)

Stream tables can now be refreshed in parallel, using multiple background workers instead of processing them one at a time. This can dramatically reduce end-to-end refresh latency when you have many independent stream tables.

  • Off by default. Set pg_trickle.parallel_refresh_mode = 'on' to enable. Use 'dry_run' to preview what the scheduler would do without changing behavior.
  • Automatic dependency awareness. The scheduler figures out which stream tables can safely refresh at the same time and which must wait for others. Stream tables connected by real-time (IMMEDIATE) triggers are always refreshed together to prevent race conditions.
  • Atomic groups. When a group of stream tables must succeed or fail together (e.g. diamond dependencies), all members are wrapped in a single transaction — if one fails, the whole group rolls back cleanly.
  • Worker pool controls:
    • pg_trickle.max_dynamic_refresh_workers (default 4) — cluster-wide cap on concurrent refresh workers.
    • pg_trickle.max_concurrent_refreshes — per-database dispatch cap.
  • Monitoring:
    • worker_pool_status() — shows how many workers are active and the current limits.
    • parallel_job_status(max_age_seconds) — lists recent and active refresh jobs with timing and status.
    • health_check() now warns when the worker pool is saturated or the job queue is backing up.
  • Self-healing. On startup, the scheduler automatically cleans up orphaned jobs and reclaims leaked worker slots from previous crashes.
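
A configuration sketch (the pgtrickle schema prefix on the monitoring functions is an assumption, and some settings may need a server restart rather than a reload):

-- Preview what the scheduler would parallelize, without changing behavior.
ALTER SYSTEM SET pg_trickle.parallel_refresh_mode = 'dry_run';
SELECT pg_reload_conf();

-- Then enable it for real, with a higher worker cap.
ALTER SYSTEM SET pg_trickle.parallel_refresh_mode = 'on';
ALTER SYSTEM SET pg_trickle.max_dynamic_refresh_workers = 8;
SELECT pg_reload_conf();

-- Watch the pool and recent jobs.
SELECT * FROM pgtrickle.worker_pool_status();
SELECT * FROM pgtrickle.parallel_job_status(300);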

Statement-Level CDC Triggers

Change tracking triggers have been upgraded from row-level to statement-level, reducing write-side overhead for bulk INSERT and UPDATE operations. This is now the default for all new and existing stream tables. A benchmark harness is included so you can measure the difference on your own hardware.

dbt Getting Started Example

New examples/dbt_getting_started/ project with a complete, runnable dbt example showing org-chart seed data, staging views, and three stream table models. Includes an automated test script.

Fixed

Refresh Lock Not Released After Errors

Fixed a bug where refresh_stream_table() could get permanently stuck after a PostgreSQL error (e.g. running out of temp file space). The internal lock was session-level and survived transaction rollback, causing all future refreshes for that stream table to report "another refresh is already in progress". Refresh locks are now transaction-level, so they are automatically released when the transaction ends — whether it succeeds or fails.

dbt Integration Fixes

  • Fixed query quoting in dbt macros that broke when queries contained single quotes.
  • Fixed schedule = none in dbt being incorrectly mapped to SQL NULL.
  • Fixed view inlining when the same view was referenced with different aliases.

Changed


  • Updated to PostgreSQL 18.3 across CI and test infrastructure.

  • Dependency updates: tokio 1.49 → 1.50 and several GitHub Actions bumps.

Breaking Changes

These behavioural changes shipped in v0.4.0. They improve usability but may require action from users upgrading from v0.3.0.

  • Schedule default changed from '1m' to 'calculated'. create_stream_table now defaults to schedule => 'calculated', which auto-computes the refresh interval from downstream dependents instead of refreshing every 1 minute. If you relied on the implicit 1-minute default, explicitly pass schedule => '1m' to preserve the old behaviour.

  • NULL schedule input rejected. Passing schedule => NULL to create_stream_table now returns an error. Use schedule => 'calculated' instead — it's explicit and self-documenting.

  • Diamond GUCs removed. The cluster-wide GUCs pg_trickle.diamond_consistency and pg_trickle.diamond_schedule_policy have been removed. Diamond behaviour is now controlled per-table via parameters on create_stream_table() / alter_stream_table(): diamond_consistency => 'atomic', diamond_schedule_policy => 'slowest'.


[0.3.0] — Incremental Correctness & Security Tooling

This is a correctness and hardening release. No new SQL functions, tables, or views were added — all changes are in the compiled extension code. ALTER EXTENSION pg_trickle UPDATE is safe and a no-op for schema objects.

Fixed

Incremental Correctness Fixes

All 18 previously-disabled correctness tests have been re-enabled (0 remaining). The following query patterns now produce correct results during incremental (non-full) refreshes:

  • HAVING clause threshold crossing. Queries with HAVING filters (e.g. HAVING SUM(amount) > 100) now produce correct totals when groups cross the threshold. Previously, a group gaining enough rows to meet the condition would show only the newly added values instead of the correct total.

  • FULL OUTER JOIN. Five bugs affecting incremental updates for FULL OUTER JOIN queries are fixed, including mismatched row identifiers, incorrect handling of compound GROUP BY expressions like COALESCE(left.col, right.col), and wrong NULL handling for SUM aggregates.

  • EXISTS with HAVING subqueries. Queries using WHERE EXISTS(... GROUP BY ... HAVING ...) now work correctly — the inner GROUP BY and HAVING were previously being silently discarded.

  • Correlated scalar subqueries. Correlated subqueries in SELECT like (SELECT MAX(e.salary) FROM emp e WHERE e.dept_id = d.id) are now automatically rewritten into LEFT JOINs so the incremental engine can handle them correctly.
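
Conceptually, the rewrite turns the subquery into a grouped LEFT JOIN; this is a sketch of the transformation, not the engine's literal output:

-- Original: correlated scalar subquery in the SELECT list.
SELECT d.name,
       (SELECT MAX(e.salary) FROM emp e WHERE e.dept_id = d.id) AS top_salary
FROM dept d;

-- Rewritten: an equivalent LEFT JOIN the incremental engine can maintain.
SELECT d.name, m.top_salary
FROM dept d
LEFT JOIN (
    SELECT dept_id, MAX(salary) AS top_salary
    FROM emp
    GROUP BY dept_id
) m ON m.dept_id = d.id;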

Background Worker Detection on PostgreSQL 18

Fixed a bug where health_check() and the scheduler reported zero active workers on PostgreSQL 18 due to a column name change in system views.

Scheduler Stability

Fixed a loop where the scheduler launcher could get stuck retrying failed database probes indefinitely instead of backing off properly.

Added

Security Tooling

Added static security analysis to the CI pipeline:

  • GitHub CodeQL — automated security scanning across all Rust source files. First scan: zero findings.
  • cargo deny — enforces a license allow-list and flags unmaintained or yanked dependencies.
  • Semgrep — custom rules that flag potentially dangerous patterns such as dynamic SQL construction and privilege escalation. Advisory-only (does not block merges).
  • Unsafe block inventory — CI tracks the count of unsafe code blocks per file and fails if any file exceeds its baseline, preventing unreviewed growth of low-level code.

[0.2.3] — Per-Table CDC Mode & WAL Lag Monitoring

Added

  • Unsafe function detection. Queries using non-deterministic functions like random() or clock_timestamp() are now rejected when creating incremental stream tables, because they can't produce reliable results. Functions like now() that return the same value within a transaction are allowed with a warning.

  • Per-table change tracking mode. You can now choose how each stream table tracks changes ('auto', 'trigger', or 'wal') via the cdc_mode parameter on create_stream_table() and alter_stream_table(), instead of relying only on the global setting. See the sketch after this list.

  • CDC status view. New pgtrickle.pgt_cdc_status view shows the change tracking mode, replication slot, and transition status for every source table in one place.

  • Configurable WAL lag thresholds. The warning and critical thresholds for replication slot lag are now configurable via pg_trickle.slot_lag_warning_threshold_mb (default 100 MB) and pg_trickle.slot_lag_critical_threshold_mb (default 1024 MB), instead of being hard-coded.

  • pg_trickle_dump backup tool. New standalone CLI that exports all your stream table definitions as replayable SQL, ordered by dependency. Useful for backups before upgrades or migrations.

  • Upgrade path. ALTER EXTENSION pg_trickle UPDATE picks up all new features from this release.
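
A sketch of the per-table switch described above (the named-argument form of alter_stream_table is an assumption):

SELECT pgtrickle.alter_stream_table(
    name     => 'orders_summary',
    cdc_mode => 'wal'
);

SELECT * FROM pgtrickle.pgt_cdc_status;   -- confirm the mode and slot per source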

Changed


  • After a full refresh, WAL replication slots are now advanced to the current position, preventing unnecessary WAL accumulation and false lag alarms.

  • Change buffers are now flushed after a full refresh, fixing a cycle where the scheduler would alternate endlessly between incremental and full refreshes on bulk-loaded tables.

  • IMMEDIATE mode now correctly rejects explicit WAL CDC requests with a clear error, since real-time mode uses its own trigger mechanism.

  • The pg_trickle.user_triggers setting is simplified to auto and off. The old on value still works as an alias for auto.

  • CI pipelines are faster on PRs — only essential tests run; the full suite runs on merge and daily schedule.


[0.2.2] — AUTO Refresh Mode & Query Alteration

Added

  • Change a stream table's query. alter_stream_table now accepts a query parameter, so you can change what a stream table computes without dropping and recreating it. If the new query's columns are compatible, the underlying storage table is preserved — existing views, policies, and publications continue to work.

  • AUTO refresh mode (new default). Stream tables now default to AUTO mode, which uses fast incremental updates when the query supports it and automatically falls back to a full recompute when it doesn't. You no longer need to think about whether your query is "incremental-compatible" — just create the stream table and it picks the best strategy.

  • Version mismatch warning. The background scheduler now warns if the installed extension version doesn't match the compiled library, making it easier to spot a half-finished upgrade.

  • ORDER BY + LIMIT + OFFSET. You can now page through top-N results, e.g. ORDER BY revenue DESC LIMIT 10 OFFSET 20 to get the third page of top earners.

  • Real-time mode: recursive queries. WITH RECURSIVE queries (e.g. org-chart hierarchies) now work in IMMEDIATE mode. A depth limit (default 100) prevents infinite loops.

  • Real-time mode: top-N queries. ORDER BY ... LIMIT N queries now work in IMMEDIATE mode — the top-N rows are recomputed on every data change. Maximum N is controlled by pg_trickle.ivm_topk_max_limit (default 1000).

  • Foreign table support. Stream tables can now use foreign tables as sources. Changes are detected by comparing snapshots since foreign tables don't support triggers. Enable with pg_trickle.foreign_table_polling = on.

  • Documentation reorganization. Configuration and SQL reference docs are reorganized around practical workflows. New sections cover DDL-during-refresh behavior, standby/replica limitations, and PgBouncer constraints.

Changed


  • Default refresh mode changed from 'DIFFERENTIAL' to 'AUTO'.

  • Default schedule changed from '1m' to 'calculated' (automatic).

  • Default change tracking mode changed from 'trigger' to 'auto' — WAL-based tracking starts automatically when available, with trigger-based as fallback.


[0.2.1] — Safe Upgrades & Scheduling Improvements

Added

  • Safe upgrades. New upgrade infrastructure ensures that ALTER EXTENSION pg_trickle UPDATE works correctly. A CI check detects missing functions or views in upgrade scripts, and automated tests verify that stream tables survive version-to-version upgrades intact. See docs/UPGRADING.md for the upgrade guide.

  • ORDER BY + LIMIT + OFFSET. You can now create stream tables over paged results, like "the second page of the top-100 products by revenue" (ORDER BY revenue DESC LIMIT 100 OFFSET 100).

  • 'calculated' schedule. Instead of passing SQL NULL to request automatic scheduling, you can now write schedule => 'calculated'. Passing NULL now gives a helpful error message.

  • Documentation expansion. Six new pages in the online book covering dbt integration, contributing guidelines, security policy, release process, and research comparisons with other projects.

  • Better warnings and safety checks:

    • Warning when a source table lacks a primary key (duplicate rows are handled safely but less efficiently).
    • Warning when using SELECT * (new columns added later can break incremental updates).
    • Alert when the refresh queue is falling behind (> 80% capacity).
    • Guard triggers prevent accidental direct writes to stream table storage.
    • Automatic fallback from WAL to trigger-based change tracking when the replication slot disappears.
    • Nested window functions and complex WHERE clauses with EXISTS are now handled automatically.
  • Change buffer partitioning. For high-throughput tables, change buffers can now be partitioned so that processed data is dropped efficiently.

  • Column pruning. The incremental engine now skips source columns not used in the query, reducing I/O for wide tables.

Changed


  • Default schedule changed from '1m' to 'calculated' (automatic).

  • Minimum schedule interval lowered from 60 s to 1 s.

  • Cluster-wide diamond consistency settings removed; per-table settings remain and now default to 'atomic' / 'fastest'.

Fixed

  • The 0.1.3 → 0.2.0 upgrade script was accidentally a no-op, silently skipping 11 new functions. Fixed.
  • Queries combining WITH (CTEs) and UNION ALL now parse correctly.

[0.2.0] — Monitoring, IMMEDIATE Mode & Diamond Consistency

Added

  • Monitoring & health checks. Six new functions for inspecting your stream tables at runtime (no superuser required):

    • change_buffer_sizes() — shows how much pending change data each stream table has queued up.
    • list_sources(name) — lists all base tables that feed a given stream table, with row counts and size estimates.
    • dependency_tree() — displays an ASCII tree of how your stream tables depend on each other.
    • health_check() — quick system triage that checks whether the scheduler is running, flags tables in error or stale, and warns about large change buffers or WAL lag.
    • refresh_timeline() — recent refresh history across all stream tables, showing timing, row counts, and any errors.
    • trigger_inventory() — verifies that all required change-tracking triggers are in place and enabled.
  • IMMEDIATE refresh mode (real-time updates). New 'IMMEDIATE' mode keeps stream tables updated within the same transaction as your data changes. There's no delay — the stream table reflects changes the instant they happen. Supports window functions, LATERAL joins, scalar subqueries, and aggregate queries. You can switch between IMMEDIATE and other modes at any time using alter_stream_table. See the sketch after this list.

  • Top-N queries (ORDER BY + LIMIT). Queries like SELECT ... ORDER BY score DESC LIMIT 10 are now supported. The stream table stores only the top N rows and updates efficiently.

  • Diamond dependency consistency. When multiple stream tables share common sources and feed into the same downstream table (a "diamond" pattern), they can now be refreshed as an atomic group — either all succeed or all roll back. This prevents inconsistent reads at convergence points. Controlled via the diamond_consistency parameter (default: 'atomic').

  • Multi-database auto-discovery. The background scheduler now automatically finds and services all databases on the server where pg_trickle is installed. No manual pg_trickle.database configuration required — just install the extension and the scheduler discovers it.
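
A sketch of an IMMEDIATE-mode stream table (the table and column names are illustrative):

SELECT pgtrickle.create_stream_table(
    name         => 'cart_totals',
    query        => 'SELECT user_id, SUM(price) AS total FROM cart_items GROUP BY user_id',
    refresh_mode => 'IMMEDIATE'
);

-- Changes are visible within the transaction that makes them.
BEGIN;
INSERT INTO cart_items (user_id, price) VALUES (7, 19.99);
SELECT total FROM cart_totals WHERE user_id = 7;   -- already reflects the insert
COMMIT;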

Fixed

  • Fixed IMMEDIATE mode incorrectly trying to read from change buffer tables (which don't exist in that mode) for certain aggregate queries.
  • Fixed type mismatches when join queries had unchanged source tables producing empty change sets.
  • Fixed join condition column order being swapped when the right-side table was written first in the ON clause (e.g. ON r.id = l.id).
  • Fixed dbt macros silently rolling back stream table creation because dbt wraps statements in a ROLLBACK by default.
  • Fixed LIMIT ALL being incorrectly rejected as an unsupported LIMIT clause.
  • Fixed false "query may produce incorrect incremental results" warnings on simple arithmetic like depth + 1 or path || name.
  • Fixed auto-created indexes using the wrong column name when the query had a column alias (e.g. SELECT id AS department_id).

[0.1.3] — TPC-H Correctness, Window Functions & Aggregate Fixes

Major hardening release with 50 improvements across correctness, robustness, operational safety, and test coverage.

Added

  • DDL change tracking expanded. ALTER TYPE, ALTER POLICY, and ALTER DOMAIN on source tables are now detected and trigger a rebuild of affected stream tables. Previously only column changes were tracked.
  • Recursive query safety guard. Recursive CTEs (WITH RECURSIVE) are now checked for non-monotonic terms that could produce incorrect incremental results.
  • Read replica awareness. The background scheduler detects when it's running on a read replica and skips refresh work, preventing errors.
  • Range aggregates rejected. RANGE_AGG and RANGE_INTERSECT_AGG are now properly rejected in incremental mode with a clear error.
  • Refresh history: row counts. Refresh history now records how many rows were inserted, updated, and deleted in each refresh cycle.
  • Change buffer alerts. New pg_trickle.buffer_alert_threshold setting lets you configure when to be warned about growing change buffers.
  • st_auto_threshold() function. Shows the current adaptive threshold that decides when to switch between incremental and full refresh.
  • Wide table optimization. Tables with more than 50 columns use a hash shortcut during refresh merges, improving performance.
  • Change buffer security. Internal change buffer tables are no longer accessible to PUBLIC.
  • Documentation. PgBouncer compatibility, keyless table limitations, delta memory bounds, sequential processing rationale, and connection overhead are all now documented in the FAQ.

TPC-H Correctness Suite: 22/22 Queries Passing

The TPC-H-derived correctness test suite (22 industry-standard analytical queries) now passes completely across multiple rounds of data changes. This validates that incremental refreshes produce identical results to full recomputation for complex real-world query patterns.

Fixed

Window Function Correctness

Fixed incremental maintenance of window functions (ROW_NUMBER, RANK, DENSE_RANK, NTILE, LAG/LEAD, SUM OVER, etc.) to correctly handle:

  • Non-RANGE frame types
  • Ranking functions over tied values
  • Window functions wrapping aggregates (e.g. RANK() OVER (ORDER BY SUM(x)))
  • Multiple window functions with different PARTITION BY clauses

INTERSECT / EXCEPT Correctness

Fixed incremental maintenance of INTERSECT and EXCEPT queries that produced wrong results due to invalid SQL generation.

EXISTS / IN with OR Correctness

Fixed EXISTS and IN subqueries combined with OR in WHERE clauses that produced wrong results.

Aggregate Correctness

  • MIN / MAX now correctly rescan the source table when the current minimum or maximum value is deleted.
  • STRING_AGG(... ORDER BY ...) and ARRAY_AGG(... ORDER BY ...) no longer silently drop the ORDER BY clause.

[0.1.2] — Incremental Correctness Fixes & Project Rename

Changed


Project Renamed from pg_stream to pg_trickle

Renamed the entire project from pg_stream to pg_trickle to avoid a naming collision with an unrelated project. If you were using the old name, all configuration prefixes changed from pg_stream.* to pg_trickle.*, and the SQL schemas changed from pgstream to pgtrickle. The "stream tables" terminology is unchanged.

Fixed

Fixed numerous incremental computation bugs discovered while building a comprehensive correctness test suite based on all 22 TPC-H analytical queries:

  • Inner join double-counting. When both sides of a join had changes in the same refresh cycle, some rows were counted twice.
  • Shared source cleanup. Cleaning up processed changes for one stream table could accidentally delete entries still needed by another stream table sharing the same source.
  • Scalar aggregate identity mismatch. Queries like SELECT SUM(amount) FROM orders could produce mismatched row identifiers between the incremental and merge phases. AVG also failed to recompute correctly after partial group changes.
  • EXISTS / NOT EXISTS snapshots. Incremental maintenance of EXISTS and NOT EXISTS subqueries missed pre-change state, producing wrong results.
  • Column resolution in complex joins. Several fixes for column name resolution in multi-table joins and nested subqueries.
  • COUNT(*) rendering. COUNT(*) was sometimes rendered as COUNT() (missing the star), causing SQL errors.
  • Subquery rewriting. Several subquery patterns (correlated vs non-correlated scalar subqueries, derived tables in FROM) were incorrectly rewritten, blocking certain queries from being created.
  • Cleanup worker crash. The background cleanup worker no longer crashes when it encounters entries for stream tables that were dropped mid-cycle.

Added

TPC-H Correctness Test Suite

Added a comprehensive correctness test suite based on all 22 TPC-H analytical queries. These tests verify that incremental refreshes produce identical results to a full recompute after INSERT, DELETE, and UPDATE mutations. 20 of 22 queries can be created as stream tables; 15 pass full correctness checks at this point (improved to 22/22 in v0.1.3).


[0.1.1] — CloudNativePG Image & Test Hardening

Changed


CloudNativePG Extension Image

Replaced the full PostgreSQL Docker image (~400 MB) with a minimal extension-only image (< 10 MB) following the CloudNativePG Image Volume Extensions specification. This means faster pulls and less disk usage in Kubernetes deployments. The image contains just the extension files — no full PostgreSQL server.


[0.1.0] — Initial Release

Initial release of pg_trickle — a PostgreSQL extension that keeps query results automatically up to date as your data changes.

Core Concept

Define a SQL query and a schedule. pg_trickle creates a stream table that stores the query's results and keeps them fresh — either on a schedule (every N seconds) or in real time. When data in your source tables changes, only the affected rows are recomputed instead of re-running the entire query.

What You Can Do

  • Create stream tables from SELECT queries — joins, aggregates, subqueries, CTEs, window functions, set operations, and more.
  • Automatic refresh — a background scheduler refreshes stream tables in dependency order. You can also trigger refreshes manually.
  • Incremental updates — the engine automatically figures out how to update only the rows that changed, instead of recomputing everything. This works for most query patterns including multi-table joins and aggregates.
  • Views as sources — views referenced in your query are automatically expanded so change tracking works on the underlying tables.
  • Tables without primary keys — supported via content hashing. Tables with primary keys get better performance.
  • Hybrid change tracking — starts with lightweight triggers (no special PostgreSQL configuration needed). Can automatically switch to WAL-based tracking for lower overhead when wal_level = logical is available.
  • Multi-database support — the scheduler automatically discovers all databases on the server where the extension is installed.
  • User triggers on stream tables — your own AFTER triggers on stream tables fire correctly during incremental refreshes.
  • DDL awareness — ALTER TABLE, DROP TABLE, CREATE OR REPLACE FUNCTION, and other DDL on source tables or functions used in your query are detected and handled automatically.

SQL Support

Broad coverage of SQL features:

  • Joins: INNER, LEFT, RIGHT, FULL OUTER, NATURAL, LATERAL subqueries, LATERAL set-returning functions (unnest, jsonb_array_elements, etc.)
  • Aggregates: 39 functions including COUNT, SUM, AVG, MIN, MAX, STRING_AGG, ARRAY_AGG, JSON_ARRAYAGG, JSON_OBJECTAGG, statistical aggregates (CORR, COVAR_*, REGR_*), and ordered-set aggregates (MODE, PERCENTILE_CONT, PERCENTILE_DISC)
  • Window functions: ROW_NUMBER, RANK, DENSE_RANK, NTILE, LAG, LEAD, SUM OVER, etc. with full frame clause support
  • Set operations: UNION, UNION ALL, INTERSECT, EXCEPT
  • Subqueries: in FROM, EXISTS/NOT EXISTS, IN/NOT IN, scalar subqueries
  • CTEs: WITH and WITH RECURSIVE
  • Special syntax: DISTINCT, DISTINCT ON, GROUPING SETS / CUBE / ROLLUP, CASE WHEN, COALESCE, JSON_TABLE (PostgreSQL 17+)
  • Unsafe function detection: queries using non-deterministic functions like random() are rejected with a clear error

Monitoring

  • explain_st() — shows the incremental computation plan
  • st_refresh_stats(), get_refresh_history(), get_staleness() — refresh performance and status
  • slot_health() — WAL replication slot health
  • check_cdc_health() — change tracking health per source table
  • stream_tables_info and pg_stat_stream_tables views
  • NOTIFY alerts for stale data, errors, and refresh events

Documentation

  • Architecture guide, SQL reference, configuration reference, FAQ, getting-started tutorial, and deep-dive tutorials.

Known Limitations

  • TABLESAMPLE, LIMIT / OFFSET, FOR UPDATE / FOR SHARE — not yet supported (clear error messages).
  • Window functions inside expressions (e.g. CASE WHEN ROW_NUMBER() ...) — not yet supported.
  • Circular stream table dependencies — not yet supported.

Viewing on GitHub? The roadmap lives in ROADMAP.md. This stub is served by the pg_trickle docs site — the include below renders there.

pg_trickle Roadmap

Audience: Product managers, stakeholders, and technically curious readers who want to understand what each release delivers and why it matters — without needing to read Rust code or SQL specifications.

Versions

Foundation (v0.1.x)

| Version | Theme | Status | Scope | Full details |
|---------|-------|--------|-------|--------------|
| v0.1.0 | The complete foundation — differential engine, CDC, scheduling, monitoring | ✅ Released | Very Large | Full details |
| v0.1.1 | Change capture correctness fixes (WAL decoder, UPDATE handling) | ✅ Released | Patch | Full details |
| v0.1.2 | DDL tracking improvements and PgBouncer compatibility | ✅ Released | Patch | Full details |
| v0.1.3 | SQL coverage completion, WAL hardening, TPC-H 22/22 | ✅ Released | Patch | Full details |

Early Feature Development (v0.2.x – v0.5.x)

| Version | Theme | Status | Scope | Full details |
|---------|-------|--------|-------|--------------|
| v0.2.0 | Top-N views, IMMEDIATE refresh mode, diamond dependency safety | ✅ Released | Medium | Full details |
| v0.2.1 | Upgrade infrastructure and documentation expansion | ✅ Released | Small | Full details |
| v0.2.2 | Paginated top-N, AUTO mode default, ALTER QUERY | ✅ Released | Medium | Full details |
| v0.2.3 | Non-determinism detection and operational polish | ✅ Released | Small | Full details |
| v0.3.0 | Correctness for HAVING, FULL OUTER JOIN, and correlated subqueries | ✅ Released | Medium | Full details |
| v0.4.0 | Parallel refresh, statement-level CDC triggers, cross-source consistency | ✅ Released | Medium | Full details |
| v0.5.0 | Row-level security, ETL bootstrap gating, API polish | ✅ Released | Medium | Full details |

Scalability and Robustness (v0.6.x – v0.9.x)

| Version | Theme | Status | Scope | Full details |
|---------|-------|--------|-------|--------------|
| v0.6.0 | Partitioned source tables, idempotent DDL, circular dependency foundation | ✅ Released | Medium | Full details |
| v0.7.0 | Circular DAG execution, watermarks, Prometheus/Grafana observability | ✅ Released | Large | Full details |
| v0.8.0 | pg_dump backup support and multiset invariant testing | ✅ Released | Small | Full details |
| v0.9.0 | Algebraic aggregate maintenance — AVG, STDDEV, COUNT(DISTINCT) | ✅ Released | Medium | Full details |

Production Readiness (v0.10.x – v0.14.x)

| Version | Theme | Status | Scope | Full details |
|---------|-------|--------|-------|--------------|
| v0.10.0 | DVM hardening, PgBouncer compatibility, "No Surprises" UX | ✅ Released | Medium | Full details |
| v0.11.0 | Partitioned stream tables, event-driven scheduler (34× latency), circuit breaker | ✅ Released | Large | Full details |
| v0.12.0 | Three-table join fix (EC-01), developer tools, SQLancer fuzzing | ✅ Released | Medium | Full details |
| v0.13.0 | Columnar change tracking, shared buffers, TPC-H 22/22 DIFFERENTIAL | ✅ Released | Large | Full details |
| v0.14.0 | Tiered scheduling, UNLOGGED buffers, diagnostics | ✅ Released | Medium | Full details |

Performance and Integration (v0.15.x – v0.19.x)

| Version | Theme | Status | Scope | Full details |
|---------|-------|--------|-------|--------------|
| v0.15.0 | Nexmark benchmark, bulk create API, watermark hold-back, dbt Hub | ✅ Released | Medium | Full details |
| v0.16.0 | Append-only fast path, algebraic aggregates, auto-indexing, benchmark CI | ✅ Released | Medium | Full details |
| v0.17.0 | Cost-based refresh strategy, incremental DAG rebuild, pg_ivm migration guide | ✅ Released | Large | Full details |
| v0.18.0 | Z-set delta engine, consistency enforcement, safety hardening | ✅ Released | Large | Full details |
| v0.19.0 | Security hardening, packaging (PGXN, Docker Hub, apt/rpm) | ✅ Released | Medium | Full details |

Self-Monitoring and Deep Correctness (v0.20.x – v0.27.x)

| Version | Theme | Status | Scope | Full details |
|---------|-------|--------|-------|--------------|
| v0.20.0 | pg_trickle monitors itself using its own stream tables | ✅ Released | Large | Full details |
| v0.21.0 | Correctness hardening, zero-crash guarantee, shadow/canary mode | ✅ Released | Large | Full details |
| v0.22.0 | Downstream CDC publication, parallel refresh pool, SLA tier auto-assignment | ✅ Released | Large | Full details |
| v0.23.0 | TPC-H DVM scaling performance — all 22 queries at O(Δ) | ✅ Released | Large | Full details |
| v0.24.0 | Join correctness complete fix, two-phase frontier, TOAST-aware CDC | ✅ Released | Large | Full details |
| v0.25.0 | Thousands of stream tables, pooler cold-start fix, predictive model | ✅ Released | Large | Full details |
| v0.26.0 | Concurrency testing, fuzz targets, refresh engine modularisation | ✅ Released | Large | Full details |
| v0.27.0 | Snapshot/PITR, schedule recommendations, cluster observability | ✅ Released | Medium | Full details |

Toward Stable (v0.28.x – v1.0)

| Version | Theme | Status | Scope | Full details |
|---------|-------|--------|-------|--------------|
| v0.28.0 | Reliable event messaging built into PostgreSQL | ✅ Released | Large | Full details |
| v0.29.0 | Off-the-shelf connector to Kafka, NATS, SQS, and more | ✅ Released | Large | Full details |
| v0.30.0 | Quality gate before 1.0 — correctness, stability, and docs | ✅ Released | Medium | Full details |
| v0.31.0 | Smarter scheduling and faster hot paths | ✅ Released | Medium | Full details |
| v0.32.0 | Citus: stable object naming and per-source frontier foundation | ✅ Released | Medium | Full details |
| v0.33.0 | Citus: world-class distributed source CDC and stream table support | ✅ Released | Large | Full details |
| v0.34.0 | Citus: automated distributed CDC scheduler wiring and shard rebalance auto-recovery | ✅ Released | Medium | Full details |
| v0.35.0 | EC-01 correctness closeout, Citus chaos hardening, reactive subscriptions, zero-downtime schema changes | ✅ Released | Large | Full details |
| v0.36.0 | Structural hardening, L0 cache, WAL backpressure, temporal IVM, columnar storage | ✅ Released | Large | Full details |
| v0.37.0 | Scheduler & merge modularisation, pgVectorMV (vector_avg/sum), OpenTelemetry trace propagation | ✅ Released | Medium | Full details |
| v0.38.0 | EC-01 Correctness Sprint (Hard Gate): join phantom rows, property-test convergence proof — BLOCKING release gate | ✅ Released | Medium | Full details |
| v0.39.0 | Operational Truthfulness & Distributed Hardening: backpressure/wake fix, generated docs, Citus chaos, SQLSTATE rollout, diagnostics | ✅ Released | Large | Full details |
| v0.40.0 | Operator trust and maintainability: generated references, alerting, drain-mode proof, secret hygiene, unsafe gating | ✅ Released | Large | Full details |
| v0.41.0 | DVM correctness: structural cache keys, placeholder safety, WAL transition guards | ✅ Released | Medium | Full details |
| v0.42.0 | Documentation truthfulness + test quality: repair_stream_table, catalog generator, SQL reference, sleep removal, fuzz CI | ✅ Released | Large | Full details |
| v0.43.0 | Performance tunability: deep-join GUCs, GROUP_RESCAN improvement, explain_stream_table diagnostics, D+I change buffer refactor | ✅ Released | Large | Full details |
| v0.44.0 | Security hardening: IVM search_path fix, centralized SQL builder, RLS warnings, module decomposition | ✅ Released | Large | Full details |
| v0.45.0 | Operational readiness: preflight functions, scalability infrastructure, CI completeness, CNPG production examples | ✅ Released | Large | Full details |

pg_tide Extraction (v0.46.0)

| Version | Theme | Status | Scope | Full details |
|---------|-------|--------|-------|--------------|
| v0.46.0 | Extract pg_tide: standalone transactional outbox, inbox, and relay into trickle-labs/pg-tide | ✅ Released | Large | Full details |

Embedding & AI Programme (v0.47.x – v0.48.x)

| Version | Theme | Status | Scope | Full details |
|---------|-------|--------|-------|--------------|
| v0.47.0 | Embedding pipeline infrastructure: post-refresh hooks, drift-based reindex, vector monitoring | ✅ Released | Medium | Full details |
| v0.48.0 | Complete embedding programme: sparse/half-precision vector aggregates, hybrid search, embedding_stream_table() API, per-tenant ANN, embedding outbox | ✅ Released | Large | Full details |

v1.0 Readiness Arc (v0.49.x – v0.51.x)

| Version | Theme | Status | Scope | Full details |
|---------|-------|--------|-------|--------------|
| v0.49.0 | Test infrastructure hardening: concurrency synchronization overhaul, 10-module unit test sweep, merge/row_id fuzz targets, DDL-during-refresh E2E, scheduler decomposition, CI smoke breadth | ✅ Released | Large | Full details |
| v0.49.1 | Repository migration to trickle-labs/pg-trickle: updated CI/CD, Docker, PGXN, dbt Hub, and CloudNativePG artifact publishing | ✅ Released | Patch | |
| v0.50.0 | Performance, security & operational hardening: SPI batching in differential refresh, dblink escaping fix, CNPG graceful-drain preStop hook, Docker image digest pinning, invalidation ring observability, deep-join drift monitoring, Prometheus secondary metrics | ✅ Released | Large | Full details |
| v0.51.0 | Citus chaos resilience & documentation truth: chaos test rig (node kill/rebalance/partition), deprecated GUC removal, ARCHITECTURE.md pg_tide boundary, recursive CTE strategy docs, CDC-enabled-flag documentation | ✅ Released | Large | Full details |

Assessment-Driven Final Hardening Arc (v0.52.x – v0.55.x)

Driven by the findings in the v0.51.0 overall assessment (plans/PLAN_OVERALL_ASSESSMENT_11.md). The assessment found 0 critical, 2 HIGH, and 22 MEDIUM findings across correctness, performance, scalability, test coverage, code quality, security, and feature completeness — all resolved in this four-release arc before v1.0.

| Version | Theme | Status | Scope | Full details |
|---------|-------|--------|-------|--------------|
| v0.52.0 | DVM hot-path performance: O(1) placeholder resolution (aho-corasick), thread-local volatility cache, lazy DiffContext allocations, O(1) template LRU eviction | ✅ Released | Large | Full details |
| v0.53.0 | Unit test depth sweep: dag, scheduler, CDC, parser, config — eleven modules with zero inline coverage — plus proptest extension and buffer-growth sleep removal | ✅ Released | Large | Full details |
| v0.54.0 | DVM engine hardening: diff_node depth limit, DiffContext CTE cap (OOM guard), snapshot fingerprint caching, Expr::to_sql() caching, view inlining fixpoint + batched relkind, ST source frontier validation, O(V+E) diamond detection | ✅ Released | Large | Full details |
| v0.55.0 | Final pre-1.0 polish: GUC-configurable invalidation ring, api/mod.rs and monitor.rs module decomposition, serde_json NOTIFY payloads, multi-column IN rewrite to EXISTS, DVM parse metrics, reserved-prefix docs, GUC rationale comments, PR coverage gate | ✅ Released | Large | Full details |

Documentation Excellence Arc (v0.56.x – v0.57.x)

Driven by the findings in the Round 2 documentation audit (plans/PLAN_DOCUMENTATION_GAPS_2.md, 2026-05-11). The audit found 3 P0 blockers (corrupted GUC_CATALOG.md, 54%-complete ERRORS.md, wrong GUC default), 8 P1 items, 7 P2 items, 5 P3 items, and 7 new documents that should exist before v1.0. This two-release arc resolves all findings and delivers the world-class documentation standard planned for the stable release.

| Version | Theme | Status | Scope | Full details |
|---------|-------|--------|-------|--------------|
| v0.56.0 | Documentation Foundation: fix GUC_CATALOG corruption, complete ERRORS.md (all 44 variants), correct parallel_refresh_mode default, complete SQL_REFERENCE outbox/inbox, add MENTAL_MODEL.md, LIMITATIONS.md, PERFORMANCE_CHEATSHEET.md | ✅ Released | Large | Full details |
| v0.57.0 | Documentation Excellence: four new tutorials (first dashboard, event sourcing, backfill/migration, security hardening), P2/P3 quality polish, full 83-file consistency sweep | ✅ Released | Large | Full details |

Assessment-Driven Hardening Arc (v0.58.x – v0.61.x)

Driven by the findings in the v0.57.0 overall assessment (plans/PLAN_OVERALL_ASSESSMENT_12.md). The assessment found 0 critical, 4 HIGH, 23 MEDIUM, and 20 LOW findings across security (ownership bypass in outbox/publication APIs); correctness (recursive-CTE depth guard in DIFFERENTIAL mode, multi-column NOT IN + NULL semantics, WAL decoder TOCTOU race); performance (per-source SPI fan-out in monitor, merge-template clone overhead, WAL decoder allocation patterns); observability (missing CDC-lag percentiles, worker queue-depth, WAL decoder queue, refresh-mode ratio counters); code quality (scheduler log levels, codegen decomposition, cdc.rs split); and test coverage (refresh orchestrator, CDC, hooks, remaining fixed sleeps). This four-release arc resolves all findings before v1.0.

| Version | Theme | Status | Scope | Full details |
|---------|-------|--------|-------|--------------|
| v0.58.0 | Security & Correctness Hardening: ownership checks for outbox/publication APIs, multi-column NOT IN + NULL fix, recursive CTE depth guard in DIFFERENTIAL mode, WAL decoder TOCTOU advisory lock, DDL hook escalation on SPI failure | ✅ Released | Medium | Full details |
| v0.59.0 | Performance & Observability: batched monitor buffer-growth SPI, query-hash caching, Arc merge templates, WAL decoder Vec pre-allocation, frontier borrow not clone, CDC-lag percentile metrics, worker queue-depth, WAL decoder queue, refresh-mode ratio counters, application_name in BGW, backup/restore docs | ✅ Released | Large | Full details |
| v0.60.0 | Code Quality, Test Coverage & CI: scheduler log levels, codegen decomposition, cdc.rs 4-way split, refresh orchestrator/merge/CDC/hooks unit tests, differential idempotence proptest, sleep removal, WAL OID filter, partition-attach rebuild, path-filtered full E2E on PRs, Dockerfile non-root, codecov module thresholds | Planned | Large | Full details |
| v0.61.0 | DX, Documentation & Final Pre-1.0 Polish: health_check() foreign-owner row, SQL_REFERENCE completeness, snapshot cache secondary equality, cte_counter reset, outbox name collision fix, sublinks.rs decomposition, ctid invariant comment, 3 foundational ADRs, LIMITATIONS.md NOT IN + NULL section, SEARCH/CYCLE clear error, LATERAL+DIFFERENTIAL docs | Planned | Large | Full details |

Scheduler Throughput Arc (v0.62.x – v0.63.x)

Two releases targeting scheduler throughput: eliminating redundant change-buffer scans via fan-out, adding the pause_scheduler / resume_scheduler / stream_table_spec SQL API required by the planned pg_aqueduct migration tool (plans/pg-aqueduct-plan.md), and implementing fused CTE refresh to reduce per-tick statement overhead for multi-node DAGs.
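
A hedged sketch of that planned API surface (the function names come from the arc description above; the argument shapes and the regclass cast are assumptions):

SELECT pgtrickle.pause_scheduler('active_orders');    -- per-node pause
-- ... run the migration or maintenance step ...
SELECT pgtrickle.resume_scheduler('active_orders');

SELECT pgtrickle.stream_table_spec('active_orders'::regclass);  -- stable JSON projection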

| Version | Theme | Status | Scope | Full details |
|---------|-------|--------|-------|--------------|
| v0.62.0 | Scheduler throughput: change-buffer fan-out (O(N)→O(1) scans for multi-consumer DAGs), pause_scheduler / resume_scheduler per-node SQL functions, stream_table_spec(oid) stable JSON projection | Planned | Medium | Full details |
| v0.63.0 | Fused multi-node refresh: CTE-chain composition of per-node delta SQL in a single statement, correctness property test, benchmark regression gate (≥ 20 % wall-time reduction on TPC-H 22-node DAG) | Planned | Large | Full details |

Beyond v1.0

| Version | Theme | Status | Scope | Full details |
|---------|-------|--------|-------|--------------|
| v1.0.0 | Stable release — PostgreSQL 19, package registries, signed artifacts, SBOMs, zero breaking changes | Planned | Large | Full details |
| v1.1.0 | PostgreSQL 17 support; WITH RECURSIVE … SEARCH/CYCLE clause; auto_explain integration hook | Planned | Medium | Full details |
| v1.2.0 | PGlite proof of concept; pg_partman automated partition scheduling integration | Planned | Medium | Full details |
| v1.3.0 | Core extraction (pg_trickle_core) | Planned | Large | Full details |
| v1.4.0 | PGlite WASM extension | Planned | Medium | Full details |
| v1.5.0 | PGlite reactive integration | Planned | Medium | Full details |

How these versions fit together

v0.1.0   ─── Foundation: differential engine, CDC, scheduling, 1300+ tests
    │
v0.2–0.5 ─── TopK, IMMEDIATE mode, RLS, partitioned sources, parallel refresh
    │
v0.6–0.9 ─── Circular DAGs, watermarks, Prometheus, algebraic aggregates
    │
v0.10–14 ─── PgBouncer compat, 34× latency, partitioned outputs, tiered scheduling
    │
v0.15–19 ─── Nexmark, append-only fast path, cost model, security, packaging
    │
v0.20–23 ─── Self-monitoring, zero-crash guarantee, downstream CDC, TPC-H at scale
    │
v0.24–27 ─── Join correctness complete, thousands of STs, snapshot/PITR
    │
v0.28–29 ─── Reliable event messaging (outbox + inbox) + relay CLI
    │
v0.30    ─── Quality gate: correctness, stability, docs (required for 1.0)
    │
v0.31    ─── Scheduler intelligence and performance hot paths
    │
v0.32    ─── Citus: stable naming foundation (additive, safe for all users)
    │
v0.33    ─── Citus: distributed CDC and stream table support
    │
v0.34    ─── Citus: automated distributed CDC scheduler wiring and shard rebalance auto-recovery
    │
v0.35    ─── EC-01 fix, Citus chaos rig, reactive subscriptions, shadow-ST, relay hardening
    │
v0.36    ─── L0 cache, WAL backpressure, api split, temporal IVM, columnar, RowIdSchema
    │
v0.37    ─── Scheduler split, pgVectorMV, OpenTelemetry, pg_partman compat
    │
v0.38    ─── Correctness closeout and truthfulness: EC-01, RowIdSchema planning, backpressure, wake/docs repair
    │
v0.39    ─── Distributed hardening and diagnostics: Citus chaos, durable CDC hold, TPC-H explain artifacts, fuzzing
    │
v0.40    ─── Operator trust and maintainability: generated docs, alerting, drain proof, secret hygiene, unsafe gating
    │
v0.41    ─── DVM correctness: structural cache keys, placeholder safety, WAL transition guards
    │
v0.42    ─── Docs truthfulness + test quality: repair_stream_table, catalog generator, sleep removal, fuzz CI
    │
v0.43    ─── Performance tunability: deep-join GUCs, GROUP_RESCAN improvement, explain diagnostics, D+I CB refactor
    │
v0.44    ─── Security hardening: IVM search_path, SQL builder, RLS warnings, module decomposition
    │
v0.45    ─── Operational readiness: preflight, scalability, CI completeness, CNPG production
    │
v0.46    ─── Extract pg_tide: standalone outbox/inbox/relay → trickle-labs/pg-tide; attach_outbox() integration
    │
v0.47    ─── Embedding infrastructure: post-refresh actions, drift-based reindex, vector monitoring
    │
v0.48    ─── Complete embedding programme: sparse vectors, hybrid search, embedding_stream_table(), per-tenant ANN
    │
v0.49    ─── Test infrastructure hardening: concurrency sync overhaul, 10-module unit sweep, merge fuzz, DDL E2E, scheduler split
    │
v0.50    ─── Performance, security & ops hardening: SPI batching, dblink fix, CNPG drain hook, digest pinning, ring observability
    │
v0.51    ─── Citus chaos resilience & doc truth: chaos rig, deprecated GUC removal, pg_tide boundary, CTE strategy docs
    │
v0.52    ─── DVM hot-path perf: O(1) placeholder resolution, volatility cache, lazy DiffContext, O(1) LRU eviction
    │
v0.53    ─── Unit test depth: dag/scheduler/CDC/parser/config sweep, proptest extension, sleep removal
    │
v0.54    ─── DVM hardening: diff_node depth limit, DiffContext OOM cap, snapshot fingerprint cache, view inlining fixpoint, O(V+E) diamond detection
    │
v0.55    ─── Final pre-1.0 polish: configurable ring, module decomposition, serde_json NOTIFY, multi-column IN rewrite, DVM metrics, docs
    │
v0.56    ─── Documentation Foundation: GUC_CATALOG fix, ERRORS.md complete (44 variants), MENTAL_MODEL.md, LIMITATIONS.md, PERFORMANCE_CHEATSHEET.md
    │
v0.57    ─── Documentation Excellence: 4 new tutorials, P2/P3 polish, full 83-file consistency sweep
    │
v0.58    ─── Security & correctness hardening: ownership checks (outbox/publication APIs), NOT IN + NULL fix, recursive CTE depth guard, WAL decoder TOCTOU lock, DDL hook escalation
    │
v0.59    ─── Performance & observability: batched monitor SPI, query-hash cache, Arc<str> templates, WAL decoder Vec pre-alloc, CDC-lag percentiles, worker queue metrics, app_name BGW, backup docs
    │
v0.60    ─── Code quality, test coverage & CI: cdc.rs split, codegen decompose, refresh/CDC/hooks unit tests, idempotence proptest, sleep removal, WAL OID filter, partition-attach rebuild, path-filtered E2E on PRs
    │
v0.61    ─── DX, docs & pre-1.0 polish: health_check foreign-owner row, SQL_REFERENCE complete, snapshot secondary equality, cte_counter reset, outbox name fix, sublinks decompose, 3 ADRs, LATERAL docs
    │
v0.62    ─── Scheduler throughput: change-buffer fan-out (O(N)→O(1)), pause/resume_scheduler SQL API, stream_table_spec() stable JSON projection
    │
v0.63    ─── Fused multi-node refresh: CTE-chain composition across ready nodes, ≥ 20 % wall-time reduction on TPC-H 22-node DAG
    │
v1.0.0   ─── Stable release, PostgreSQL 19, package registries, signed artifacts, SBOMs

v0.1.0 through v0.27.0 build the complete core engine and harden it for production use. v0.28.0 and v0.29.0 deliver the event-driven integration story. v0.30.0 is a mandatory correctness and polish gate before 1.0. v0.31.0 sharpens scheduler intelligence before new features are added. v0.32.0 is the first of two Citus releases, shipping stable object naming and detection helpers as an additive, non-breaking foundation. v0.33.0 delivers the full Citus integration immediately after — per-worker slot CDC, distributed ST placement, cross-node coordination, and the Citus test suite. Pulling v0.33.0 forward means users with Citus topologies (including billion-row all-distributed deployments) are unblocked two releases earlier. v0.35.0 was intended to be the single most important release before v1.0, but the v0.37.0 overall assessment shows several of those closeout items remain partially open or insufficiently proven. v0.36.0 and v0.37.0 still delivered substantial structural gains: L0 cache construction, temporal IVM, RowIdSchema, scheduler and merge splits, pgVectorMV, and OpenTelemetry trace capture. The next three releases now form a hardening programme rather than an immediate feature expansion.

v0.38.0 is a dedicated EC-01 correctness sprint with a hard release gate: This release will NOT ship until join phantom rows are proven closed with a comprehensive DIFF-vs-FULL property test suite covering Q07/Q15-style joins. EC-01 has been labeled critical since v0.20.0 (6+ releases) and deferred multiple times; v0.38.0 breaks that pattern by making EC-01 closure the sole release objective. No other features, no operational docs, no SQLSTATE rollout — just the join phantom-row fix and its proof.

v0.39.0 absorbs the operational truthfulness items that were originally planned for v0.38.0: backpressure hysteresis or deprecation, wake-truthfulness repair, generated configuration and upgrade docs, OpenTelemetry collector proof, SQLSTATE rollout on hot paths, and the full distributed/diagnostic coverage (Citus chaos testing, durable CDC hold semantics, per-query TPC-H explain artifacts, SQLancer light PR mode, targeted fuzzing, and inbox/outbox reliability tests).

v0.40.0 then focuses on operator trust and maintainability: generated SQL/GUC references, drain-mode proof, monitoring/alert rules, security-model and secret-handling docs, upgrade-gate coverage, unsafe-inventory PR gating, and continued decomposition of the largest files.

v0.41.0 through v0.45.0 form a second hardening arc driven by the findings in the v0.40 overall assessment (plans/PLAN_OVERALL_ASSESSMENT_9.md). These five releases systematically close every gap identified across 10 dimensions: correctness (P0 cache-key and placeholder fixes), documentation truthfulness (repair function implementation, catalog generator rewrite), test quality (sleep removal, property tests, fuzz CI — merged into v0.42.0), performance tunability (GUC-exposed thresholds, explain diagnostics), security (search_path hardening, centralized SQL building), and operational readiness (preflight functions, scalability infrastructure, CI completeness). Only after this arc does the roadmap resume the embedding programme in v0.47.0–v0.48.0, preserving the pgvector work while aligning the release order with the assessment's conclusion that closing correctness and operational gaps matters more than adding new surface area. The embedding programme itself is consolidated into two releases: v0.47.0 for infrastructure and ANN maintenance, and v0.48.0 completing the full feature set (sparse/half-precision aggregates, hybrid search, the ergonomic embedding_stream_table() API, per-tenant ANN patterns, and outbox-emitted embedding events). v0.46.0 precedes this arc with the extraction of pg_tide — moving the outbox, inbox, and relay subsystems into a standalone extension at trickle-labs/pg-tide.

v0.49.0 through v0.51.0 form the v1.0 readiness arc, driven by the findings in the v0.48.0 overall assessment (plans/PLAN_OVERALL_ASSESSMENT_10.md). The assessment confirmed that every P0 correctness issue from prior assessments is closed — EC-01 phantom rows, snapshot cache-key weakness, placeholder resolution, and WAL transition TOCTOU are all fixed. The project has transitioned from a capability problem to a coverage confidence problem. These three releases systematically close the remaining gaps across test reliability, performance, security hardening, operational polish, and documentation truth before v1.0.

v0.58.0 through v0.61.0 form the final assessment-driven hardening arc before v1.0, driven by the findings in the v0.57.0 overall assessment (plans/PLAN_OVERALL_ASSESSMENT_12.md). The assessment found 0 critical findings, 4 HIGH severity issues (ownership-check bypass in the outbox and publication APIs, recursive-CTE depth guard not applied in DIFFERENTIAL mode, multi-column NOT IN with NULL row semantics, and per-source SPI fan-out in the monitor health check), plus 23 MEDIUM and 20 LOW items spanning performance, observability, code quality, test coverage, and documentation. v0.58.0 closes all HIGH findings as a hard gate. v0.59.0 eliminates the performance and observability gaps. v0.60.0 completes the code quality and test coverage sweep. v0.61.0 delivers the final developer-experience and documentation polish, closing the last remaining items so that v1.0 is a clean, fully verified stable release.
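
The multi-column NOT IN + NULL finding is rooted in standard SQL three-valued logic, which the fix must preserve rather than change; a minimal reproduction:

-- One NULL in the subquery makes NOT IN evaluate to NULL (unknown)
-- rather than true, so a WHERE clause rejects the row.
CREATE TABLE t (x int, y int);
INSERT INTO t VALUES (1, NULL);

SELECT (1, 2) NOT IN (SELECT x, y FROM t);   -- NULL, not true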

v0.49.0 targets test infrastructure quality — the single highest-risk category from the v10 assessment. All concurrency tests currently rely on sleep(50ms) for synchronization, which provides false confidence: tests may pass locally while missing real race conditions on slow CI runners or under load. This release replaces sleep-based synchronization with pg_locks-polling patterns throughout tests/e2e_concurrent_tests.rs. Alongside, ten source modules with zero #[cfg(test)] unit test coverage are systematically addressed, including catalog.rs, template_cache.rs, ivm.rs, cdc/polling.rs, cdc/rebuild.rs, diagnostics.rs, logging.rs, metrics_server.rs, and otel.rs. New fuzz targets are added for the refresh merge SQL codegen (src/refresh/merge/) and row identity tracking (src/dvm/row_id.rs) — two high-value surfaces with no current fuzz coverage. An E2E test for concurrent DDL during active refresh (ALTER STREAM TABLE + in-flight refresh) is added. The src/scheduler/mod.rs monolith (6,700+ lines) is decomposed into focused submodule files: scheduling loop, parallel dispatch state, and cost model each become separate files. The e2e-smoke CI filter is widened to cover join, aggregate, and window operator regressions on every PR, and a consolidated just fuzz-all recipe is added to the justfile.
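
The polling idea, expressed in plain SQL for flavour (the real helpers live in the Rust test harness; the advisory-lock shape here is an assumption):

-- Wait for observable state instead of sleeping a fixed 50 ms.
DO $$
BEGIN
    FOR i IN 1..200 LOOP     -- bounded wait: up to ~10 s
        EXIT WHEN EXISTS (SELECT 1 FROM pg_locks WHERE locktype = 'advisory');
        PERFORM pg_sleep(0.05);
    END LOOP;
END $$;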

v0.50.0 targets performance, security, and operational hardening. The differential refresh hot path currently makes 3–4 separate SPI round-trips per refresh cycle — buffer existence check, change count per source, and table row estimate — that are consolidated into a single CTE query, saving 10–15ms per multi-source refresh. The CDC trigger SQL generation loop is tightened using String::with_capacity() to eliminate per-column heap allocations. The watermark computation in the scheduler tick is consolidated into a single compound query. On the security side, the src/citus.rs dblink calls that use manual single-quote doubling for escaping are replaced with pg_escape_literal() SPI calls for defense-in-depth. Operational gaps are closed: the CNPG cluster-production.yaml gains a preStop lifecycle hook that calls pgtrickle.drain(timeout_s => 120) before pod termination, preventing interrupted in-flight refreshes during rolling upgrades. All Docker base images are pinned to SHA256 digests for reproducible builds. The shared memory invalidation ring capacity limit (1,024 entries) is documented in docs/CONFIGURATION.md with a new pg_trickle_invalidation_ring_overflow Prometheus counter. Two additional Prometheus metrics are added: pg_trickle_dag_cycles_detected and pg_trickle_cache_stale_evictions. The deep join chain Part 3 correction threshold GUC and its trade-off between SQL complexity and correctness at >6 join tables is documented in the configuration reference with an associated soak-test assertion.
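
The consolidation pattern in schematic SQL (the change-buffer name is a hypothetical stand-in for the internal catalog, not the real object):

-- One round-trip instead of three: pending change count and a source
-- row estimate fetched in a single CTE query.
WITH pending AS (
    SELECT count(*) AS changes FROM pgtrickle_buf_orders
), size_est AS (
    SELECT reltuples::bigint AS est_rows
    FROM pg_class WHERE relname = 'orders'
)
SELECT pending.changes, size_est.est_rows
FROM pending, size_est;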

v0.51.0 closes the Citus resilience gap and brings documentation into full truth. The Citus distributed support shipped in v0.32–v0.34 has never had a chaos test suite — there are zero tests validating behaviour under node failure, shard rebalance, or network partition. This release delivers a docker-compose-based chaos rig with three scenarios: coordinator restart, worker node kill with automatic reconnect, and rolling shard rebalance during active refresh. The deprecated pg_trickle.event_driven_wake GUC (non-functional since background workers cannot use LISTEN) is removed entirely, along with all associated code paths and the runtime warning it emits. Documentation is brought to full truth: docs/ARCHITECTURE.md is updated to clearly describe the pg_tide boundary after the v0.46.0 extraction; docs/CONFIGURATION.md gains a deprecation header on the removed GUC entry; the recursive CTE strategy selection heuristic (semi-naive vs. DRed vs. recomputation fallback) is documented for the first time with an example EXPLAIN output; and a note is added to docs/CONFIGURATION.md clarifying that CDC triggers fire even when pg_trickle.enabled = false (by design, to keep buffers ready for re-enable).

v0.52.0 through v0.55.0 form the final pre-1.0 hardening arc, driven by the findings in the v0.51.0 overall assessment (plans/PLAN_OVERALL_ASSESSMENT_11.md). The assessment found no critical issues, two HIGH findings (both performance-class), and 22 MEDIUM findings across correctness, performance, scalability, test coverage, code quality, security, and feature completeness. These four releases close every one of them in priority order.

v0.52.0 targets the two HIGH-severity performance gaps on the DVM hot path. resolve_delta_template() currently resolves LSN placeholders by calling .replace() twice per source OID — an O(k×n) scan for k source tables in a SQL string of length n. This is replaced with a single aho-corasick pass that resolves all placeholders in one traversal, cutting multi-source refresh latency proportionally. Alongside, lookup_function_volatility() currently makes one SPI round-trip to pg_proc for every unknown function in a query — up to 50 ms overhead for function-heavy queries. A thread-local HashMap<String, char> cache pre-populated with all PostgreSQL built-in functions eliminates these trips on the hot path. Two further allocator improvements close the LOW-rated findings: DiffContext::new() switches from 12 unconditional HashMap::new() calls to Option<HashMap> with lazy initialization (saving 5–10 µs per refresh for simple queries), and the template cache eviction path is replaced with a proper LRU data structure for O(1) eviction instead of O(N) scanning.

v0.53.0 is the eleven-module unit test depth sweep. Five groups of source modules responsible for core algorithmic logic — dag.rs (cycle detection, topological sort, diamond detection, schedule resolution), the eight scheduler/ submodules (cost model, tier transitions, watermark computation), cdc.rs/cdc/polling.rs/cdc/rebuild.rs (buffer naming, column escaping, trigger SQL), all five dvm/parser/ files (Expr::to_sql(), AggFunc classification, strip_qualifier()), and config.rs (mode parsing, threshold conversion) — have zero inline #[cfg(test)] unit tests and are only exercised through full-stack E2E tests. This release adds focused #[cfg(test)] modules to every one of them using mock structures that require no PostgreSQL backend. proptest! coverage is extended to DAG cycle detection and schedule resolution. The two remaining fixed-sleep tests in e2e_buffer_growth_tests.rs (7s and 20s sleeps) are replaced with adaptive pg_locks-polling helpers.

v0.54.0 hardens the DVM engine against pathological queries and slow parsing. diff_node() gains a depth counter that errors on breach of max_parse_depth (default 64), preventing stack overflow on extreme nesting. DiffContext gains a configurable CTE count ceiling (default 1,000) that returns a clean error before OOM can occur. The snapshot cache fingerprint is computed and stored at OpTree construction time instead of re-traversing the tree on every diff cycle, and Expr::to_sql() caches its result string to eliminate redundant allocations. View inlining (rewrite_views_inline()) is refactored to batch all relkind lookups into a single SPI query and use a fixpoint check (no changes this iteration) instead of a hard counter, cutting 3-level view hierarchies from ~15 ms to a single parse + one SPI call. The ST-to-ST frontier resolver is hardened to return PgTrickleError::SourceNotFound instead of silently defaulting to "0/0" when a required source is missing. Finally, diamond detection is reimplemented with a BFS-based visited-set merge algorithm, reducing complexity from O(V²) pairwise comparisons to O(V+E) — critical for deployments with 500+ stream tables.

v0.55.0 delivers the final pre-1.0 polish pass across scalability, module structure, security, documentation, and one new SQL feature. The shared-memory invalidation ring capacity (currently hardcoded at 1,024) becomes a GUC with a default of 1,024 and a maximum of 4,096, preventing excessive full DAG rebuilds in DDL-burst environments. src/api/mod.rs (7,600+ lines) is decomposed into focused submodules (api/create.rs, api/alter.rs, api/refresh.rs), and src/monitor.rs (4,000+ lines) is split into monitor/alert.rs, monitor/health.rs, and monitor/tree.rs. NOTIFY alert payloads are switched from manual string escaping to serde_json::json!() to guarantee correct JSON for error messages containing backslashes or control characters. The DVM parser gains automatic rewriting of WHERE (a, b) IN (SELECT x, y FROM ...) multi-column row IN subqueries to equivalent EXISTS form, closing the last user-visible SQL coverage gap. DVM parse timing metrics (pg_trickle_dvm_parse_ms, pg_trickle_delta_query_size_bytes) are added to the Prometheus metrics endpoint. The __PGS_/__PGT_ reserved column-name prefixes are documented in docs/SQL_REFERENCE.md, rationale comments are added to all magic-number GUC defaults in src/config.rs, and coverage reporting is added to the PR gate so regressions are visible before merge.
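
The multi-column IN rewrite mentioned above, shown on an illustrative schema (the two forms are equivalent in a WHERE clause):

-- Before: multi-column row IN subquery
SELECT * FROM orders o
WHERE (o.customer_id, o.region) IN (SELECT c.id, c.region FROM customers c);

-- After: the EXISTS form the differential engine maintains incrementally
SELECT * FROM orders o
WHERE EXISTS (
    SELECT 1 FROM customers c
    WHERE c.id = o.customer_id AND c.region = o.region
);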

Release Process

This document describes how to create a release of pg_trickle.

Overview

Releases are fully automated via GitHub Actions. Pushing a version tag (v*) triggers the Release workflow, which:

  1. Runs a preflight version-sync check to ensure all version references match the tag
  2. Builds extension packages for Linux (amd64), macOS (arm64), and Windows (amd64)
  3. Smoke-tests the Linux artifact against a live PostgreSQL 18 instance
  4. Creates a GitHub Release with archives and SHA256 checksums
  5. Builds and pushes a multi-arch extension image to GHCR (for CNPG Image Volumes)

A separate PGXN workflow also fires on the same v* tag and publishes the source archive to the PostgreSQL Extension Network.

Prerequisites

  • Push access to the repository (or a PR merged by a maintainer)
  • All CI checks passing on main (verify the last run on the version-bump commit succeeded)
  • The version in Cargo.toml matches the tag you intend to push
  • Required GitHub secrets configured (see Required GitHub Secrets below)

Required GitHub Secrets

The release automation uses the following GitHub Actions secrets. Set them under Settings → Secrets and variables → Actions → New repository secret.

| Secret | Used by | Description |
|--------|---------|-------------|
| PGXN_USERNAME | pgxn.yml | Your PGXN account username. Used to authenticate the curl upload to PGXN Manager when publishing source archives to the PostgreSQL Extension Network. Register at pgxn.org. |
| PGXN_PASSWORD | pgxn.yml | Password for the PGXN account above. Never hardcode this — it must be stored as a secret so it is never exposed in logs or committed to the repository. |
| CODECOV_TOKEN | coverage.yml | Upload token for Codecov. Used to publish unit and E2E coverage reports. Obtain it from the Codecov dashboard after linking the repository. The workflow degrades gracefully (fail_ci_if_error: false) if absent. |
| BENCHER_API_TOKEN | benchmarks.yml | API token for Bencher, the continuous benchmarking platform. Used to track Criterion benchmark results on main and detect regressions on pull requests. The benchmark steps are skipped entirely when this secret is absent, so CI still passes without it. Create a project at bencher.dev and copy the token from the project settings. |

Note: The GITHUB_TOKEN secret is provided automatically by GitHub Actions and does not need to be configured manually. It is used by the release workflow to create GitHub Releases, by the Docker workflow to push images to GHCR, and by Bencher to post PR comments.

Step-by-Step

1. Decide the version number

Follow Semantic Versioning:

| Change type | Bump | Example |
|-------------|------|---------|
| Breaking SQL API or config change | Major | 1.0.0 → 2.0.0 |
| New feature, backward-compatible | Minor | 0.1.0 → 0.2.0 |
| Bug fix, no API change | Patch | 0.2.0 → 0.2.1 |
| Pre-release / release candidate | Suffix | 0.3.0-rc.1 |

2. Update the version

Three files must have their version bumped together:

# 1. Cargo.toml — the canonical version source for the extension
#    Change:  version = "0.7.0"  →  version = "0.8.0"

# 2. META.json — the PGXN package metadata
#    Change both top-level "version" and the nested "provides" version

# 3. CHANGELOG.md
#    Rename ## [Unreleased] → ## [0.8.0] — YYYY-MM-DD
#    Add a new empty ## [Unreleased] section at the top

The just check-version-sync script enforces version consistency across the workspace.

The extension control file (pg_trickle.control) uses default_version = '@CARGO_VERSION@', which pgrx substitutes automatically at build time — no manual edit needed there.

After editing, verify all version-related files are in sync:

just check-version-sync

3. Commit the version bump

git add Cargo.toml META.json CHANGELOG.md
git commit -m "release: v0.8.0"
git push origin main

4. Wait for CI to pass and verify upgrade completeness

Ensure the CI workflow passes on main with the version bump commit. All unit, integration, E2E, and pgrx tests must be green.

Critical: Before tagging, verify that the upgrade script covers all SQL schema changes:

# Run comprehensive upgrade completeness checks
just check-upgrade-all

# If any check fails (e.g. "ERROR: X new function(s) missing from upgrade script"),
# fix the issue by adding the missing SQL objects to:
#   sql/pg_trickle--<prev>--<new>.sql
#
# Then re-run until all checks pass:
just check-upgrade-all  # Should print "All 15 upgrade step(s) passed completeness checks."

Why this matters: New SQL functions, views, tables, and columns added in any prior release must be carried forward in the upgrade script, even if the current release doesn't change them. The upgrade script is the source of truth for what PostgreSQL applies when users run ALTER EXTENSION pg_trickle UPDATE.
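
Schematically, an upgrade script is laid out like this (function names and bodies are hypothetical; real carry-forward definitions are copied from sql/archive/pg_trickle--<prev>.sql):

-- sql/pg_trickle--<old>--<new>.sql

-- Section 1: carry forward. Objects introduced in earlier releases are
-- repeated so ALTER EXTENSION pg_trickle UPDATE recreates them.
CREATE OR REPLACE FUNCTION pgtrickle.helper_from_prior_release()
RETURNS int LANGUAGE sql AS $$ SELECT 1 $$;

-- Section 2: objects new or changed in this release.
CREATE OR REPLACE FUNCTION pgtrickle.helper_new_in_this_release()
RETURNS int LANGUAGE sql AS $$ SELECT 2 $$;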

Confirm the local and CI upgrade-E2E defaults were advanced to the new release:

just check-version-sync  # Verifies ci.yml, justfile, and test defaults

5. Create and push the tag

git tag -a v0.2.0 -m "Release v0.2.0"
git push origin v0.2.0

This triggers the Release workflow automatically.

6. Monitor the release

Watch the Actions tab for progress. The release workflow runs these jobs in order:

preflight  ──►  build-release (linux, macos, windows)
                      │
                      ▼
                test-release  ──►  publish-release
                              ──►  publish-docker-arch (linux/amd64 + linux/arm64)
                                         │
                                         ▼
                                   publish-docker (merge manifest + push :latest)

The PGXN workflow (pgxn.yml) runs independently and publishes the source archive to pgxn.org in parallel with the release workflow.

7. Make the GHCR package public (first release only)

When a package is pushed to GHCR for the first time it is private by default. Because this is an open-source project, packages linked to the public repository inherit public visibility — but you must make the package public once to unlock that:

  1. Go to github.com/⟨owner⟩ → Packages → pg_trickle-ext
  2. Click Package settings
  3. Scroll to Danger Zone → Change package visibility → set to Public

After that first change:

  • All future pushes keep the package public automatically
  • Unauthenticated docker pull ghcr.io/trickle-labs/pg_trickle-ext:... works
  • Storage and bandwidth are free (GHCR open-source advantage)
  • The package page shows the README, linked repository, license, and description from the OCI labels

8. Verify the release

Once both workflows complete:

  • Check the GitHub Releases page for the new release
  • Verify all three platform archives are attached (.tar.gz for Linux/macOS, .zip for Windows)
  • Verify SHA256SUMS.txt is present
  • Verify the extension image is available at ghcr.io/trickle-labs/pg_trickle-ext:<version>
  • Verify the PGXN upload succeeded: pgxn info pg_trickle should show the new version
  • Optionally verify the extension image layout:
docker pull ghcr.io/trickle-labs/pg_trickle-ext:<version>
ID=$(docker create ghcr.io/trickle-labs/pg_trickle-ext:<version>)
docker cp "$ID:/lib/" /tmp/ext-lib/
docker cp "$ID:/share/" /tmp/ext-share/
docker rm "$ID"
ls -la /tmp/ext-lib/ /tmp/ext-share/extension/

Post-Release Checklist

Complete these steps immediately after a release tag has been pushed and both the Release and PGXN workflows have finished successfully.

  • Create a post-release branch from main (e.g. post-release-<ver>-a)
  • Bump Cargo.toml version to the next development version (e.g. 0.12.0 → 0.13.0)
  • Bump META.json — both the top-level "version" and the nested "provides" → "pg_trickle" → "version" to match
  • Write plans/PLAN_0_<next>_0.md — initial planning document for the next milestone
  • Delete plans/PLAN_0_<released>_0.md — remove the now-completed plan
  • Wrap roadmap items — in ROADMAP.md, wrap all completed items from the old release with <details> tags to archive them
  • Add ## [Unreleased] stub to CHANGELOG.md above the just-released entry
  • Create sql/pg_trickle--<released>--<next>.sql — empty upgrade script stub for the next migration hop
  • Copy sql/archive/pg_trickle--<released>.sql → sql/archive/pg_trickle--<next>.sql — placeholder archive baseline for the next version
  • Update justfile — advance the build-upgrade-image and test-upgrade-to defaults to <next>; update the build-hub Docker image tag
  • Update tests/e2e_upgrade_tests.rs — advance all unwrap_or("<released>".into()) fallback strings to <next>
  • Update version numbers in README.md — search for occurrences of the released version (e.g. 0.17.0) and advance them to <next>: CNPG image reference (ghcr.io/trickle-labs/pg_trickle-ext:<version>), dbt revision tag, and any other hardcoded version strings. A quick check: grep -n '<released>' README.md
  • Run just check-version-sync — must exit 0 before opening the PR
  • Open a PR against main with the commit title chore: start v<next> development cycle

Preparing for the Next Release (Pre-Work Checklist)

Use this checklist at the start of each new release milestone to ensure the repository is properly set up before development begins. This maps directly to what just check-version-sync verifies.

| File / target | Action | check-version-sync check |
|---------------|--------|--------------------------|
| Cargo.toml | version = "<next>" | canonical version source |
| META.json | both "version" fields set to <next> | PGXN manifest |
| CHANGELOG.md | ## [Unreleased] section present | (manual hygiene) |
| sql/pg_trickle--<prev>--<next>.sql | stub file exists | upgrade SQL exists |
| sql/archive/pg_trickle--<next>.sql | placeholder file exists (copy of <prev>) | archive SQL exists |
| .github/workflows/ci.yml | upgrade matrix and chain end at <next> | CI matrix up to date |
| justfile | build-upgrade-image and test-upgrade-to defaults = <next> | justfile defaults |
| tests/e2e_upgrade_tests.rs | all unwrap_or fallbacks = "<next>" | e2e fallback strings |

Quick-verify with:

just check-version-sync
# Should print: All version references are in sync.

Release Artifacts

Each release produces:

| Artifact | Description |
|----------|-------------|
| pg_trickle-<ver>-pg18-linux-amd64.tar.gz | Extension files for Linux x86_64 |
| pg_trickle-<ver>-pg18-macos-arm64.tar.gz | Extension files for macOS Apple Silicon |
| pg_trickle-<ver>-pg18-windows-amd64.zip | Extension files for Windows x64 |
| SHA256SUMS.txt | SHA-256 checksums for all archives |
| ghcr.io/trickle-labs/pg_trickle-ext:<ver> | CNPG extension image for Image Volumes (amd64 + arm64) |

Installing from an archive

tar xzf pg_trickle-<version>-pg18-linux-amd64.tar.gz
cd pg_trickle-<version>-pg18-linux-amd64

sudo cp lib/*.so "$(pg_config --pkglibdir)/"
sudo cp extension/*.control extension/*.sql "$(pg_config --sharedir)/extension/"

Then add to postgresql.conf and restart:

shared_preload_libraries = 'pg_trickle'

See Installation for full installation details.

Pre-releases

Tags containing -rc, -beta, or -alpha (e.g., v0.3.0-rc.1) are automatically marked as pre-releases on GitHub. Pre-release extension images are tagged but do not update the latest tag.

Hotfix Releases

For urgent fixes on an older release:

# Branch from the tag
git checkout -b hotfix/v0.2.1 v0.2.0

# Apply fix, bump version to 0.2.1
git commit -am "fix: ..."
git push origin hotfix/v0.2.1

# Tag from the branch (CI will still run the release workflow)
git tag -a v0.2.1 -m "Release v0.2.1"
git push origin v0.2.1

Files to Update for Each Release

Every release requires manual updates to the files below. Missing any of them leads to version skew between the code, the docs, and the packages.

| File | What to change | Why |
|------|----------------|-----|
| Cargo.toml | version = "x.y.z" field | The canonical version source. pgrx reads this at build time and substitutes it into pg_trickle.control via @CARGO_VERSION@. The git tag must match. |
| META.json | Both "version" fields (top-level and inside "provides") | The PGXN package manifest. The pgxn.yml workflow uploads this file as part of the source archive; a stale version here means the wrong version appears on pgxn.org. |
| CHANGELOG.md | Rename ## [Unreleased] → ## [x.y.z] — YYYY-MM-DD; add a new empty ## [Unreleased] at the top | Keeps the public changelog accurate and gives downstream users a dated record of changes. |
| ROADMAP.md | Update the preamble's latest-release/current-milestone lines; mark the released milestone done; advance the "We are here" pointer to the next milestone | Keeps the forward-looking plan aligned with reality. Leaves no confusion about what just shipped versus what is next. |
| README.md | Update test-count line (~N unit tests + M E2E tests) if test counts changed significantly | The README is the first thing users read; stale numbers erode trust. |
| INSTALL.md | Update any version numbers in install commands or example URLs | Users copy-paste installation commands; stale versions cause failures. |
| docs/UPGRADING.md | Add the new version-specific migration notes and extend the supported upgrade-path table | Documents exactly what ALTER EXTENSION ... UPDATE will do and which chains are supported. |
| sql/pg_trickle--<old>--<new>.sql | Add or update the hand-authored upgrade script for every SQL-surface change (new objects, changed signatures, changed defaults, view changes). Also carry forward all functions/views/tables added in previous releases — the upgrade script is cumulative. | ALTER EXTENSION ... UPDATE only applies what is explicitly scripted; function defaults and signatures stored in pg_proc do not update themselves. Omitting a function that existed in <old> but is expected in <new> will break user upgrades. |
| sql/archive/pg_trickle--<new>.sql | Regenerate and commit the full-install SQL baseline for the new version. This file was created as a placeholder copy of <prev> at the start of the development cycle — it must be replaced with the actual generated SQL before tagging. Run cargo pgrx schema (or the equivalent just target) to produce the final schema, then overwrite the placeholder. | Future upgrade-completeness checks and upgrade E2E tests need an exact baseline for the released version. A stale placeholder from the start of the cycle will cause spurious failures. |
| .github/workflows/ci.yml, justfile, tests/build_e2e_upgrade_image.sh, tests/Dockerfile.e2e-upgrade | Advance the upgrade-check chain and default upgrade-E2E target version to the new release | Prevents release automation and local upgrade validation from getting stuck on the previous version after a new migration hop is added. |
| pg_trickle.control | No manual edit needed — default_version is set to '@CARGO_VERSION@' and pgrx substitutes it at build time. Verify the substitution in the built artifact. | Ensures the SQL CREATE EXTENSION command installs the right version. |

CRITICAL: After updating sql/pg_trickle--<old>--<new>.sql, always run just check-upgrade-all to verify that the upgrade script is complete. This checks not just the immediate hop to the new version, but the entire upgrade chain from v0.1.3 onwards. If the check fails (e.g. "ERROR: 3 new function(s) missing"), it means the upgrade script is missing one or more SQL objects that users will expect to have after upgrading. Fix all failures before tagging.

Checklist summary

[ ] Cargo.toml — version bumped
[ ] META.json — both "version" fields updated to match
[ ] CHANGELOG.md — [Unreleased] renamed to [x.y.z] with date; new empty [Unreleased] added
[ ] ROADMAP.md — preamble updated; released milestone marked done
[ ] README.md — test counts current (if materially changed)
[ ] INSTALL.md — version references current
[ ] docs/UPGRADING.md — latest migration notes and supported chains added
[ ] sql/pg_trickle--<old>--<new>.sql — covers every SQL-surface change AND carries forward all previous release functions
[ ] sql/archive/pg_trickle--<new>.sql — regenerated from final schema and committed (replaces the dev-cycle placeholder)
[ ] just check-upgrade-all — all upgrade steps pass completeness checks (not just the one-step hop)
[ ] Upgrade automation defaults — CI/local upgrade checks and E2E target the new version
[ ] just check-version-sync — all version references in sync
[ ] All CI checks on main have passed (verify the last run on the version-bump commit succeeded)
[ ] git tag matches Cargo.toml version

Troubleshooting

Release workflow failed

Go to the Actions tab and identify which job failed. Then follow the appropriate recovery path below.

Option A: Re-run (transient failure)

If the failure is transient — network timeout, registry hiccup, runner issue — you can re-run without changing anything:

  1. Open the failed workflow run in the Actions tab
  2. Click Re-run all jobs (or re-run just the failed job)

This works because the v* tag still points to the same commit, and the workflow uses cancel-in-progress: false so a re-run won't be cancelled.

Option B: Fix code and re-tag

If the failure is a real build or code issue:

# 1. Delete the remote tag
git push origin :refs/tags/v0.2.0

# 2. Delete the local tag
git tag -d v0.2.0

# 3. Fix the issue, commit, and push
git add <files>
git commit -m "fix: ..."
git push origin main

# 4. Re-tag on the new commit and push
git tag -a v0.2.0 -m "Release v0.2.0"
git push origin v0.2.0

This triggers a fresh release workflow run.

Option C: Clean up a partial GitHub Release

If the workflow created a draft or partial Release before failing:

  1. Go to Releases in the repository
  2. Delete the broken release (this does not delete the tag)
  3. Then follow Option A or Option B above

Upgrade script completeness check failed

If just check-upgrade-all reports errors like "ERROR: X new function(s) missing from upgrade script", it means the upgrade SQL script is incomplete:

# 1. Look at the error — it tells you exactly what's missing
just check-upgrade-all  # e.g. "ERROR: 3 new function(s) missing from upgrade script:
                        #        - pgtrickle.\"explain_refresh_mode\"
                        #        - pgtrickle.\"fuse_status\"
                        #        - pgtrickle.\"reset_fuse\""

# 2. Find where those objects are defined in the previous release
#    (they should already exist in sql/archive/pg_trickle--<prev>.sql)
grep -n "CREATE.*FUNCTION.*explain_refresh_mode" sql/archive/pg_trickle--*.sql

# 3. Copy the function definitions (CREATE OR REPLACE FUNCTION) to the
#    upgrade script you're fixing. They should go into:
#    sql/pg_trickle--<old>--<new>.sql
#    
#    Typically, carry-forward functions are grouped in their own section
#    at the top of the upgrade script with a comment explaining they're
#    from a prior release.

# 4. Re-run the check to verify it passes
just check-upgrade-all

Why this happens: When a new release (e.g. v0.11.0) adds SQL functions, those functions must be explicitly included in all subsequent upgrade scripts. The upgrade script is the ground truth — PostgreSQL only applies what is listed in the .sql file. If you skip a function that users expect, their upgraded extension will be missing that object.

Common failure causes

| Symptom | Cause | Fix |
|---------|-------|-----|
| Version mismatch error | Cargo.toml version doesn't match the git tag | Run just check-version-sync, fix any skew, commit, delete tag, re-tag (Option B) |
| Build failure | Compilation error in release profile | Fix on main, re-tag (Option B) |
| Docker push failed | Missing permissions | Verify packages: write is in the workflow and GITHUB_TOKEN has GHCR access, then re-run (Option A) |
| Smoke test failed | Extension doesn't load in PostgreSQL | Fix the issue, re-tag (Option B) |
| PGXN upload failed | Missing PGXN_USERNAME / PGXN_PASSWORD secrets, or META.json version not updated | Add the secrets in repository settings; verify META.json version matches the tag; re-run the pgxn.yml workflow from the Actions tab |
| just check-upgrade-all reports missing functions/views | Upgrade script is incomplete — new objects from prior releases not carried forward | See "Upgrade script completeness check failed" above for recovery steps |
| Rate limited | GitHub API or GHCR throttling | Wait a few minutes, then re-run (Option A) |

Yanking a release

If a release has a critical issue:

  1. Mark it as pre-release on the GitHub Releases page (uncheck "Set as the latest release")
  2. Add a warning to the release notes
  3. Publish a patch release with the fix

Project History

pg_trickle started with a practical goal. We were inspired by data platforms built around pipelines that keep themselves incrementally up to date, and we wanted to bring that same style of self-maintaining data flow directly into PostgreSQL. In particular, we needed support for recursive CTEs, which were essential to the kinds of pipelines we had in mind. We could not find an open-source incremental view-maintenance system that matched that requirement, so pg_trickle began as an attempt to close the gap.

It also became an experiment in what coding agents could realistically help build. We set out to develop pg_trickle without editing code by hand, while still holding it to the same bar we would expect from any other systems project: broad feature coverage, strong code quality, extensive tests, and thorough documentation. Skepticism toward AI-written software is reasonable; the right way to evaluate pg_trickle is by the codebase, the tests, and the docs.

That constraint changed how we worked. Agents can produce a lot of surface area quickly, but database systems are unforgiving of vague assumptions and hidden edge cases. To make the project hold together, we had to be unusually explicit about architecture, operator semantics, failure handling, and test coverage. In practice, that pushed us toward more written design, more reviewable behavior, and more verification than a quick prototype would normally get.

The result is a spec-driven development process, not vibe-coding. Every feature starts as a written plan — an architecture decision record, a gap analysis, or a phased implementation spec — before any code is generated. The plans/ directory contains over 110 documents covering operator semantics, CDC trade-offs, performance strategies, ecosystem comparisons, and edge-case catalogues. Agents work from these specs; the specs are reviewed and revised by humans. This is what makes it possible to maintain coherence across a large codebase without manually editing every line: the design is explicit, the invariants are written down, and the tests verify both.

We also do not think the use of AI should lower the standard for trust. If anything, it raises it. The point of the experiment was not to ask people to trust the toolchain; it was to see whether disciplined use of coding agents could help produce a serious, inspectable PostgreSQL extension. Whether that worked is for readers and users to judge, but the intent is simple: make the code, the tests, the documentation, and the trade-offs visible enough that the project can stand on its own merits.


Contributors


See also: Roadmap · Changelog · Contributing

Viewing on GitHub? The contributing guide lives in CONTRIBUTING.md. This stub is served by the pg_trickle docs site — the include below renders there.

Contributing to pg_trickle

Thank you for your interest in contributing! pg_trickle is an Apache 2.0-licensed open-source project and welcomes contributions of all kinds.

Before You Start

  • Check the open issues and discussions to avoid duplicating work.
  • For non-trivial changes, open an issue first to discuss the approach.
  • Read AGENTS.md — it is the authoritative guide for all coding conventions, error handling rules, module layout, and test requirements.
  • Read docs/ARCHITECTURE.md to understand the system.
  • Read ROADMAP.md to see what work is planned.

Ways to Contribute

| Type | Where to start |
|---|---|
| Bug report | Open an issue |
| Feature request | Open an issue or start a discussion |
| Documentation fix | Open a PR directly — no issue needed for typos/clarity |
| Code fix or feature | Open an issue first, then a PR |
| Performance improvement | Include benchmark numbers (see just bench) |

Development Setup

# Install pgrx
cargo install cargo-pgrx --version "=0.18.0"
cargo pgrx init --pg18 /usr/lib/postgresql/18/bin/pg_config

# Build
cargo build

# Format + lint (required before every PR)
just fmt
just lint

# Run tests
just test-unit          # fast, no DB
just test-integration   # Testcontainers
just test-light-e2e     # PR-equivalent Light E2E tier (stock postgres)
just test-e2e           # full E2E (builds Docker image)
just test-pgbouncer     # PgBouncer transaction-pool compatibility tests

Full setup instructions are in INSTALL.md.

Devcontainer / Containerized Development

If you are developing in a devcontainer, use the default non-root vscode user and run the normal commands from the workspace root:

just fmt
just lint
just test-unit

just test-unit uses scripts/run_unit_tests.sh, which selects a writable, cache-friendly target directory in this order:

  1. target/ (preferred)
  2. .cargo-target/ (project-local fallback)
  3. $HOME/.cache/pg_trickle-target
  4. ${TMPDIR:-/tmp}/pg_trickle-target (last resort)

This avoids permission failures on bind mounts and preserves incremental builds when source or test files change.

If you see permission errors in containerized runs, verify you are not forcing a different container user/UID than expected by your workspace mount.

Run E2E tests in devcontainer

E2E tests use Testcontainers and require Docker access from inside the devcontainer (provided by the Docker-in-Docker feature in .devcontainer/devcontainer.json).

Run from the workspace root inside the devcontainer:

just build-e2e-image
just test-e2e

Notes:

  • The E2E harness starts containers via testcontainers (tests/e2e/mod.rs).
  • The default E2E image is pg_trickle_e2e:latest (built by tests/build_e2e_image.sh).
  • A plain docker run of the dev image is not equivalent to a full VS Code devcontainer session with features/lifecycle hooks enabled.

Making a Pull Request

  1. Fork the repository and create a branch: git checkout -b fix/my-fix
  2. Make your changes following the conventions in AGENTS.md
  3. Run just fmt && just lint — both must pass with zero warnings
  4. Add or update tests — see AGENTS.md § Testing
  5. Open a PR against main

The PR template will walk you through the checklist.

CI Coverage on PRs

PR CI runs a three-tier gate:

  • Unit tests (Linux only)
  • Integration tests
  • Light E2E — curated PR-friendly end-to-end coverage split across three shards and executed against stock postgres:18.3

Full E2E, TPC-H tests, benchmarks, dbt, CNPG smoke, and the extra macOS / Windows unit jobs stay off the PR critical path and run on push-to-main, schedule, or manual dispatch. This keeps typical PR feedback closer to the single-digit-minute range while preserving broader scheduled coverage.

To trigger the full CI matrix on your PR branch (recommended for DVM engine, refresh, or CDC changes):

gh workflow run ci.yml --ref <your-branch>

To run all tests locally before pushing:

just test-all          # unit + integration + e2e

# PR-equivalent fast path:
just test-unit
just test-integration
just test-light-e2e

# TPC-H correctness tests (requires e2e Docker image):
cargo test --test e2e_tpch_tests -- --ignored --test-threads=1 --nocapture

See AGENTS.md § Testing for the full CI coverage matrix.

Coding Conventions (summary)

  • No unwrap() or panic!() in non-test code
  • All unsafe blocks require a // SAFETY: comment
  • Errors go through PgTrickleError in src/error.rs
  • New SQL functions use #[pg_extern(schema = "pgtrickle")]
  • Tests use Testcontainers — never a local PostgreSQL instance

Full details are in AGENTS.md.

Commit Messages

Use Conventional Commits:

fix: correct pgoutput action parsing for tables named INSERT_LOG
feat: add CUBE explosion guard (max 64 UNION ALL branches)
docs: document JOIN key change limitation in SQL_REFERENCE
test: add E2E test for keyless table duplicate-row behaviour

Fuzz Testing

pg_trickle uses cargo-fuzz (libFuzzer) to exercise core parser and pipeline logic. Fuzz targets live in fuzz/fuzz_targets/.

Available targets

| Target | What it exercises | Added in |
|---|---|---|
| parser_fuzz | DVM SQL parser (OpTree construction) | v0.1.0 |
| cron_fuzz | Cron expression parser | v0.26.0 |
| guc_fuzz | GUC string→enum coercion | v0.26.0 |
| cdc_fuzz | CDC trigger payload decoding | v0.26.0 |
| wal_fuzz | WAL/SQLSTATE error classifier | v0.39.0 |
| dag_fuzz | DAG/merge SQL and snapshot column-list | v0.39.0 |
| sql_builder_fuzz | SQL builder + typed parser facade | v0.44.0 |
| merge_sql_fuzz | Merge SQL codegen with random change streams | v0.49.0 |
| row_id_fuzz | Row identity tracking with random operator trees | v0.49.0 |

Running all fuzz targets

# Run each target for 60 s (requires nightly toolchain):
just fuzz-all

# Run for a custom duration (seconds):
just fuzz-all 120

# Run a single target:
cargo +nightly fuzz run merge_sql_fuzz -- -max_total_time=60

Corpus directories are at fuzz/corpus/<target_name>/. Regression cases are stored in proptest-regressions/.

Documentation

Every new feature ships with documentation. This is a hard requirement, not a nice-to-have.

  • New SQL functions → entry in docs/SQL_REFERENCE.md.
  • New GUC variable → entry in docs/CONFIGURATION.md.
  • New user-facing capability → at minimum a paragraph in the relevant chapter; for headline features, a dedicated page.
  • New page → add it to docs/SUMMARY.md in the correct chapter.

PRs that introduce a #[pg_extern] function or a new GUC without documentation will be asked to add it before merge.

Updating base image digests

Docker base images are pinned to exact SHA256 digests in all Dockerfiles (Dockerfile.demo, Dockerfile.ghcr, tests/Dockerfile.e2e) for reproducibility and supply-chain security (OPS-10-03).

To update the digests when a new PostgreSQL patch release is available:

# Requires docker with manifest support
scripts/update_base_image_digests.sh

The script resolves the current linux/amd64 digest for postgres:18.3-bookworm, patches all Dockerfiles in-place, and prints the commit command. Run it quarterly, or whenever a new PostgreSQL patch release needs to be picked up. Include the digest-update commit in the release PR.

If you are building for linux/arm64 or another platform, edit the TARGET_PLATFORM variable in the script or pin to the manifest index digest (returned by docker manifest inspect postgres:18.3-bookworm --verbose).

License

By contributing you agree that your contributions will be licensed under the Apache License 2.0.

Looking for the user security guide? See Security Guide for roles, grants, RLS interaction, SECURITY DEFINER semantics, and hardening checklists.

This page contains the project's vulnerability-reporting policy. The rendered version is served by the pg_trickle docs site; on GitHub the source is SECURITY.md.

Security Policy

Supported Versions

| Version | Supported |
|---|---|
| 0.13.x (current pre-release) | ✅ |

During pre-1.0 development, only the latest minor version receives security fixes. Once v1.0.0 is released, the two most recent minor versions will receive security fixes.

Reporting a Vulnerability

Please do not report security vulnerabilities via public GitHub Issues.

Use GitHub's built-in private vulnerability reporting:

  1. Go to the Security tab of this repository
  2. Click "Report a vulnerability"
  3. Fill in the details — affected version, description, reproduction steps, and potential impact

We aim to acknowledge reports within 48 hours and provide a fix or mitigation within 14 days for critical issues.

What to Include

A useful report includes:

  • PostgreSQL version and pg_trickle version
  • Minimal reproduction SQL or Rust code
  • Description of the unintended behaviour and its security impact
  • Whether the vulnerability requires a trusted (superuser) or untrusted role to trigger

Scope

In-scope:

  • SQL injection or privilege escalation via pgtrickle.* functions
  • Memory safety issues in the Rust extension code (buffer overflows, use-after-free, etc.)
  • Denial-of-service caused by a low-privilege user triggering runaway resource usage
  • Information disclosure through change buffers (pgtrickle_changes.*) or monitoring views

Out-of-scope:

Disclosure Policy

We follow coordinated disclosure. Once a fix is released we will publish a security advisory on GitHub with a CVE if applicable.

Architecture

This document describes the internal architecture of pg_trickle — a PostgreSQL 18 extension that implements stream tables with differential view maintenance. For a high-level description of what pg_trickle does and why, read ESSENCE.md. For release milestones and future plans, see Roadmap.


High-Level Overview

┌─────────────────────────────────────────────────────────────────┐
│                     PostgreSQL 18 Backend                       │
│                                                                 │
│  ┌──────────┐   ┌──────────┐   ┌──────────┐   ┌─────────────┐   │
│  │  Source  │   │  Source  │   │  Storage │   │  Storage    │   │
│  │  Table A │   │  Table B │   │  Table X │   │  Table Y    │   │
│  └────┬─────┘   └────┬─────┘   └────▲─────┘   └────▲────────┘   │
│       │              │              │              │            │
│  ═════╪══════════════╪══════════════╪══════════════╪════════    │
│       │              │              │              │            │
│  ┌────▼──────────────▼────┐   ┌────┴──────────────┴────┐        │
│  │  Hybrid CDC Layer      │   │  Delta Application     │        │
│  │  Triggers ──or── WAL   │   │  (INSERT/DELETE diffs) │        │
│  └────────────┬───────────┘   └────────────▲───────────┘        │
│               │                            │                    │
│  ┌────────────▼───────────┐   ┌────────────┴───────────┐        │
│  │   Change Buffer        │   │   DVM Engine           │        │
│  │   (pgtrickle_changes.*) │   │   (Operator Tree)      │        │
│  └────────────┬───────────┘   └────────────▲───────────┘        │
│               │                            │                    │
│               └────────────┬───────────────┘                    │
│                            │                                    │
│  ┌─────────────────────────▼─────────────────────────────┐      │
│  │              Refresh Engine                           │      │
│  │  ┌──────────┐  ┌──────────┐  ┌─────────────────────┐  │      │
│  │  │ Frontier │  │ DAG      │  │ Scheduler           │  │      │
│  │  │ Tracker  │  │ Resolver │  │ (canonical schedule)│  │      │
│  │  └──────────┘  └──────────┘  └─────────────────────┘  │      │
│  └───────────────────────────────────────────────────────┘      │
│                                                                 │
│  ┌────────────────────────────────────────────────────────┐     │
│  │                    Catalog (pgtrickle.*)                │     │
│  │  pgt_stream_tables │ pgt_dependencies │ pgt_refresh_history│  │
│  └────────────────────────────────────────────────────────┘     │
│                                                                 │
│  ┌──────────────────────────────────────────────────────┐       │
│  │                  Monitoring Layer                    │       │
│  │  st_refresh_stats │ slot_health │ check_cdc_health    │       │
│  │  explain_st │ views │ NOTIFY alerting               │       │
│  └──────────────────────────────────────────────────────┘       │
└─────────────────────────────────────────────────────────────────┘

Component Details

1. SQL API Layer (src/api/)

The public entry point for users. All operations are exposed as #[pg_extern] functions in the pgtrickle schema. The API module is split into focused sub-modules:

| File | Responsibility |
|---|---|
| src/api/mod.rs | Core lifecycle: create_stream_table, alter_stream_table, drop_stream_table, refresh_stream_table, bulk_create, repair_stream_table, pgt_status |
| src/api/diagnostics.rs | Inspection helpers: explain_st, explain_refresh_mode, dependency_tree, list_sources |
| src/api/outbox_hook.rs | pg_tide integration hook: attach_outbox() — calls into pg_tide after each successful refresh (v0.46.0+) |
| src/api/snapshot.rs | Stream table snapshots (v0.27.0): snapshot_stream_table, restore_from_snapshot, list_snapshots, drop_snapshot |
| src/api/self_monitoring.rs | Self-monitoring setup/teardown and auto-apply policy |
| src/api/cluster.rs | Multi-database cluster overview: cluster_worker_summary |
| src/api/publication.rs | Logical publication helpers and predictive cost model utilities |
| src/api/metrics_ext.rs | Extended Prometheus metrics |
| src/api/helpers.rs | Shared utilities (name resolution, table quoting) |
| src/api/planner.rs | Schedule recommendation API |

Core functions:

  • create_stream_table — Applies a chain of auto-rewrite passes (view inlining → DISTINCT ON → GROUPING SETS → scalar subquery in WHERE → correlated scalar subquery in SELECT → SubLinks in OR → multi-PARTITION BY windows), parses the defining query, builds an operator tree, creates the storage table, registers CDC slots, populates the catalog, and optionally performs an initial full refresh.
  • alter_stream_table — Modifies schedule, refresh mode, status (ACTIVE/SUSPENDED), or defining query. Query changes trigger schema migration, dependency updates, and a full refresh within a single transaction.
  • drop_stream_table — Removes the storage table, catalog entries, and cleans up CDC slots.
  • refresh_stream_table — Triggers a manual refresh (same path as automatic scheduling).
  • pgt_status — Returns a summary of all registered stream tables.
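
As a quick illustration, the lifecycle functions are ordinary SQL calls. The function names come from the table above; the call shapes and named-argument spellings (schedule, etc.) are assumptions for illustration, not a reference:

-- Hypothetical lifecycle session; argument names are illustrative.
SELECT pgtrickle.refresh_stream_table('orders_st');           -- manual refresh
SELECT pgtrickle.alter_stream_table('orders_st', schedule => '5 minutes');
SELECT * FROM pgtrickle.pgt_status();                         -- summary of all STs
SELECT pgtrickle.drop_stream_table('orders_st');              -- storage + catalog cleanup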

2. Catalog (src/catalog.rs)

The catalog manages persistent metadata stored in PostgreSQL tables within the pgtrickle schema:

| Table | Purpose |
|---|---|
| pgtrickle.pgt_stream_tables | Core metadata: name, query, schedule, status, frontier, etc. |
| pgtrickle.pgt_dependencies | DAG edges from ST to source tables |
| pgtrickle.pgt_refresh_history | Audit log of every refresh operation |
| pgtrickle.pgt_change_tracking | Per-source CDC slot metadata |

Schema creation is handled by extension_sql!() macros that run at CREATE EXTENSION time.

Entity-Relationship Diagram

erDiagram
    pgt_stream_tables {
        bigserial pgt_id PK
        oid pgt_relid UK "OID of materialized storage table"
        text pgt_name
        text pgt_schema
        text defining_query
        text original_query "User's original SQL (pre-inlining)"
        text schedule "Duration or cron expression"
        text refresh_mode "FULL | DIFFERENTIAL | IMMEDIATE"
        text status "INITIALIZING | ACTIVE | SUSPENDED | ERROR"
        boolean is_populated
        timestamptz data_timestamp "Freshness watermark"
        jsonb frontier "DBSP-style version frontier"
        timestamptz last_refresh_at
        int consecutive_errors
        boolean needs_reinit
        float8 auto_threshold
        float8 last_full_ms
        timestamptz created_at
        timestamptz updated_at
    }

    pgt_dependencies {
        bigint pgt_id PK,FK "References pgt_stream_tables.pgt_id"
        oid source_relid PK "OID of source table"
        text source_type "TABLE | STREAM_TABLE | VIEW"
        text_arr columns_used "Column-level lineage"
        text cdc_mode "TRIGGER | TRANSITIONING | WAL"
        text slot_name "Replication slot (WAL mode)"
        pg_lsn decoder_confirmed_lsn "WAL decoder progress"
        timestamptz transition_started_at "Trigger→WAL transition start"
    }

    pgt_refresh_history {
        bigserial refresh_id PK
        bigint pgt_id FK "References pgt_stream_tables.pgt_id"
        timestamptz data_timestamp
        timestamptz start_time
        timestamptz end_time
        text action "NO_DATA | FULL | DIFFERENTIAL | REINITIALIZE | SKIP"
        bigint rows_inserted
        bigint rows_deleted
        text error_message
        text status "RUNNING | COMPLETED | FAILED | SKIPPED"
        text initiated_by "SCHEDULER | MANUAL | INITIAL"
        timestamptz freshness_deadline
    }

    pgt_change_tracking {
        oid source_relid PK "OID of tracked source table"
        text slot_name "Trigger function name"
        pg_lsn last_consumed_lsn
        bigint_arr tracked_by_pgt_ids "ST IDs sharing this source"
    }

    pgt_stream_tables ||--o{ pgt_dependencies : "has sources"
    pgt_stream_tables ||--o{ pgt_refresh_history : "has refresh history"
    pgt_stream_tables }o--o{ pgt_change_tracking : "tracks via pgt_ids array"

Note: Change buffer tables (pgtrickle_changes.changes_<oid>) are created dynamically per source table OID and live in the separate pgtrickle_changes schema.

3. CDC / Change Data Capture (src/cdc.rs, src/wal_decoder.rs)

pg_trickle uses a hybrid CDC architecture that starts with triggers and optionally transitions to WAL-based (logical replication) capture for lower write-side overhead.

Trigger Mode (default)

  1. Trigger Management — Creates AFTER INSERT OR UPDATE OR DELETE row-level triggers (pg_trickle_cdc_<oid>) on each tracked source table. Each trigger fires a PL/pgSQL function (pg_trickle_cdc_fn_<oid>()) that writes changes to the buffer table.
  2. Change Buffering — Captured changes are written to per-source change buffer tables in the pgtrickle_changes schema. Each row captures the LSN (pg_current_wal_lsn()), transaction ID, action type (I/U/D), and the new/old row data as typed columns (new_<col> TYPE, old_<col> TYPE) — native PostgreSQL types, not JSONB.
  3. Cleanup — Consumed changes are deleted after each successful refresh via delete_consumed_changes(), bounded by the upper LSN to prevent unbounded scans.
  4. Lifecycle — Triggers and trigger functions are automatically created when a source table is first tracked and dropped when the last stream table referencing a source is removed.

The trigger approach was chosen as the default for transaction safety (triggers can be created in the same transaction as DDL), simplicity (no slot management, no wal_level = logical requirement), and immediate visibility (changes are visible in buffer tables as soon as the source transaction commits).
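
To make the buffer layout concrete, here is a hedged sketch of inspecting a trigger-mode change buffer for a source table with OID 16401 and columns id/status (the OID and columns are invented; the buffer column names follow the data-flow description later in this document):

SELECT change_id, lsn, action,   -- action is 'I', 'U', or 'D'
       new_id, new_status,       -- typed new-row columns
       old_id, old_status        -- typed old-row columns (UPDATE/DELETE)
FROM pgtrickle_changes.changes_16401
ORDER BY lsn, change_id;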

WAL Mode (optional, automatic transition)

When pg_trickle.cdc_mode is set to 'auto' or 'wal' and wal_level = logical is available, the system transitions from trigger-based to WAL-based CDC after the first successful refresh:

  1. WAL Availability Detection — At stream table creation, checks whether wal_level = logical is configured. If so, the source dependency is marked for WAL transition.
  2. WAL Decoder Background Worker — A dedicated background worker (src/wal_decoder.rs) polls logical replication slots and writes decoded changes into the same change buffer tables used by triggers, ensuring a uniform format for the DVM engine.
  3. Transition Orchestration — The transition is a three-step process: (a) create a replication slot, (b) wait for the decoder to catch up to the trigger's last confirmed LSN, (c) drop the trigger and switch the dependency to WAL mode. If the decoder doesn't catch up within pg_trickle.wal_transition_timeout (default 300s), the system falls back to triggers.
  4. CDC Mode Tracking — Each source dependency in pgt_dependencies carries a cdc_mode column (TRIGGER / TRANSITIONING / WAL) and WAL-specific metadata (slot_name, decoder_confirmed_lsn, transition_started_at).

See ADR-001 and ADR-002 in plans/adrs/PLAN_ADRS.md for the original design rationale and plans/sql/PLAN_HYBRID_CDC.md for the full implementation plan.

Immediate Mode / Transactional IVM (src/ivm.rs)

When refresh_mode = 'IMMEDIATE', pg_trickle uses statement-level AFTER triggers with transition tables instead of row-level CDC triggers. The stream table is maintained synchronously within the same transaction as the base table DML.

  1. BEFORE Triggers — Statement-level BEFORE triggers on each base table acquire an advisory lock on the stream table to prevent concurrent conflicting updates.
  2. AFTER Triggers — Statement-level AFTER triggers with REFERENCING NEW TABLE AS ... OLD TABLE AS ... copy the transition table data to temp tables, then call the Rust pgt_ivm_apply_delta() function.
  3. Delta Computation — The DVM engine's Scan operator reads from the temp tables (via DeltaSource::TransitionTable) instead of change buffer tables. No LSN filtering or net-effect computation is needed — each trigger invocation represents a single atomic statement.
  4. Delta Application — The computed delta is applied via explicit DML (DELETE + INSERT ON CONFLICT) to the stream table.
  5. TRUNCATE — A separate AFTER TRUNCATE trigger calls pgt_ivm_handle_truncate(), which truncates the stream table and re-populates from the defining query.

No change buffer tables, no scheduler involvement, and no WAL infrastructure is needed for IMMEDIATE mode. See plans/sql/PLAN_TRANSACTIONAL_IVM.md for the design plan.
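
A minimal sketch of what IMMEDIATE mode looks like from SQL, assuming create_stream_table accepts a refresh_mode argument (the parameter name is an assumption; table and column names are invented):

SELECT pgtrickle.create_stream_table(
    name         => 'inventory_live',
    query        => 'SELECT sku, sum(qty) AS on_hand FROM stock_moves GROUP BY sku',
    refresh_mode => 'IMMEDIATE'
);

BEGIN;
INSERT INTO stock_moves (sku, qty) VALUES ('A-1', 5);
-- Maintained synchronously: visible inside the same transaction.
SELECT on_hand FROM inventory_live WHERE sku = 'A-1';  -- 5, before COMMIT
COMMIT;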

ST-to-ST Change Capture (v0.11.0+)

When a stream table's defining query references another stream table (rather than a base table), neither triggers nor WAL capture apply — the upstream source is itself maintained by pg_trickle. A dedicated ST change buffer mechanism enables downstream stream tables to refresh differentially even when their source is another stream table.

  Base Table  ──trigger/WAL──▶  changes_<oid>       (base-table buffer)
  Stream Table A  ──refresh──▶  changes_pgt_<pgt_id>  (ST buffer for A's consumers)
  Stream Table B  reads from    changes_pgt_<pgt_id>  (B depends on A)

Buffer schema. ST change buffers are named pgtrickle_changes.changes_pgt_<pgt_id> (using the internal pgt_id rather than the OID). Unlike base-table buffers, they store only new_* columns — no old_* columns — because ST deltas are expressed as INSERT/DELETE pairs, not UPDATE rows.

Delta capture — DIFFERENTIAL path. When an upstream stream table refreshes in DIFFERENTIAL mode and has downstream consumers, the refresh engine captures the computed delta (the INSERT and DELETE rows applied to the upstream ST) into the ST change buffer via explicit DML. Downstream stream tables then read from this buffer exactly as they would read from a base-table change buffer.

Delta capture — FULL path. When an upstream stream table refreshes in FULL mode (e.g., due to a mode downgrade or full => true), the engine takes a pre-refresh snapshot, executes the full refresh, then computes an EXCEPT ALL diff between the old and new contents. The resulting INSERT/DELETE pairs are written to the ST change buffer. This prevents FULL refreshes from cascading through the entire dependency chain — downstream STs always receive a minimal delta regardless of how the upstream was refreshed.

Frontier tracking. ST source positions are tracked in the same frontier JSONB structure as base-table sources, using pgt_<upstream_pgt_id> as the key (e.g., {"pgt_42": 157}) rather than the OID-based keys used for base tables. The scheduler's has_stream_table_source_changes() function compares the downstream's last-consumed frontier position against the upstream buffer's current maximum LSN to decide whether a refresh is needed.

Lifecycle. ST change buffers are created automatically when a stream table gains its first downstream consumer (create_st_change_buffer_table()), and dropped when the last downstream consumer is removed (drop_st_change_buffer_table()). On upgrade from pre-v0.11.0, existing ST-to-ST dependencies have their buffers auto-created on the first scheduler tick. Consumed rows are cleaned up by cleanup_st_change_buffers_by_frontier() after each successful downstream refresh.
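
For example, stacking one stream table on another (all names invented; syntax as in create_stream_table):

-- Upstream ST over a base table, downstream ST over the upstream ST.
SELECT pgtrickle.create_stream_table(
    name     => 'orders_st',
    query    => 'SELECT id, status FROM orders',
    schedule => '60s'
);
SELECT pgtrickle.create_stream_table(
    name     => 'status_counts',
    query    => 'SELECT status, count(*) AS n FROM orders_st GROUP BY status',
    schedule => '60s'
);
-- Once status_counts exists, each refresh of orders_st also writes its delta
-- into pgtrickle_changes.changes_pgt_<pgt_id>, which status_counts consumes.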

Frontier Visibility Holdback (Issue #536)

The CDC frontier (pgt_stream_tables.frontier) is advanced based on LSN ordering while the change buffer is read under standard MVCC visibility. These two dimensions are orthogonal: a change buffer row may have an LSN below the new frontier yet still be invisible (uncommitted) at the moment the scheduler queries the buffer.

Failure scenario (trigger-based CDC only): Without holdback, a transaction that inserts into a tracked table and commits after the scheduler has captured the tick watermark (pg_current_wal_lsn()) will have its change-buffer row permanently skipped on the next tick, because the frontier advanced past the row's LSN while the row was still uncommitted.

Fix — frontier_holdback_mode = 'xmin' (default): Before computing the tick watermark, the scheduler probes pg_stat_activity and pg_prepared_xacts for the oldest in-progress transaction xmin. If any transaction from before the previous tick is still running, the frontier is held back to the previous tick's safe watermark rather than advancing to pg_current_wal_lsn(). This is a single cheap SPI round-trip per scheduler tick (~µs).

The holdback algorithm (cdc::classify_holdback) is purely functional and unit-tested independently of the backend.

Configuration:

  • pg_trickle.frontier_holdback_mode — 'xmin' (default, safe), 'none' (fast but can lose rows), 'lsn:<N>' (hold back by N bytes, for debugging).
  • pg_trickle.frontier_holdback_warn_seconds — emit a WARNING (at most once per minute) when holdback has been active longer than this many seconds (default: 60).
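
Both settings are ordinary GUCs; a minimal sketch of adjusting them at runtime (values illustrative, and whether each is reloadable without a restart is not covered here):

ALTER SYSTEM SET pg_trickle.frontier_holdback_warn_seconds = 120;
SELECT pg_reload_conf();
-- Debugging only: hold the frontier back by a fixed byte distance.
ALTER SYSTEM SET pg_trickle.frontier_holdback_mode = 'lsn:65536';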

Note: WAL/logical-replication CDC mode is immune to this issue (commit-LSN ordering is inherently safe). The holdback is skipped when cdc_mode = 'wal'.

Observability: Two Prometheus gauges are exposed:

  • pg_trickle_frontier_holdback_lsn_bytes — how many WAL bytes behind write_lsn the safe frontier currently is.
  • pg_trickle_frontier_holdback_seconds — age (in seconds) of the oldest in-progress transaction.

See plans/safety/PLAN_FRONTIER_VISIBILITY_HOLDBACK.md for the full design rationale.

4. DVM Engine (src/dvm/)

The Differential View Maintenance engine is the core of the system. It transforms the defining SQL query into an executable operator tree that can compute deltas efficiently.

Auto-Rewrite Pipeline (src/dvm/parser.rs)

Before the defining query is parsed into an operator tree, it passes through a chain of auto-rewrite passes that normalize SQL constructs the DVM parser doesn't handle directly:

| Pass | Function | Purpose |
|---|---|---|
| #0 | rewrite_views_inline() | Replace view references with (view_definition) AS alias subqueries |
| #1 | rewrite_distinct_on() | Convert DISTINCT ON to ROW_NUMBER() OVER (…) = 1 window subquery |
| #2 | rewrite_grouping_sets() | Decompose GROUPING SETS / CUBE / ROLLUP into UNION ALL of GROUP BY |
| #3 | rewrite_scalar_subquery_in_where() | Convert WHERE col > (SELECT …) to CROSS JOIN |
| #4 | rewrite_sublinks_in_or() | Split WHERE a OR EXISTS (…) into UNION branches |
| #5 | rewrite_multi_partition_windows() | Split multiple PARTITION BY clauses into joined subqueries |

The view inlining pass (#0) runs first so that view definitions containing DISTINCT ON, GROUPING SETS, etc. are further rewritten by downstream passes. Nested views are expanded via a fixpoint loop (max depth 10).
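
As an illustration of pass #1, here is the shape of the DISTINCT ON rewrite described above (the actual generated SQL may differ; table and columns are invented):

-- Before: DISTINCT ON
SELECT DISTINCT ON (customer_id) customer_id, total
FROM orders
ORDER BY customer_id, created_at DESC;

-- After (conceptually): ROW_NUMBER() window subquery filtered to row 1
SELECT customer_id, total
FROM (
    SELECT customer_id, total,
           ROW_NUMBER() OVER (PARTITION BY customer_id
                              ORDER BY created_at DESC) AS rn
    FROM orders
) w
WHERE rn = 1;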

Query Parser (src/dvm/parser.rs)

Parses the defining query using PostgreSQL's internal parser (via pgrx raw_parser) and extracts:

  • WITH clause — CTE definitions (non-recursive: inline expansion or shared delta; recursive: detected for mode gating)
  • Target list — output columns
  • FROM clause — source tables, joins, subqueries, and CTE references
  • WHERE clause — filters
  • GROUP BY / aggregate functions
  • DISTINCT / UNION ALL / INTERSECT / EXCEPT

The parser produces an OpTree — a tree of operator nodes. CTE handling follows a tiered approach:

  1. Tier 1 (Inline Expansion) — Non-recursive CTEs referenced once are expanded into Subquery nodes, equivalent to subqueries in FROM.
  2. Tier 2 (Shared Delta) — Non-recursive CTEs referenced multiple times produce CteScan nodes that share a single delta computation via a CTE registry and delta cache.
  3. Tier 3a/3b/3c (Recursive) — Recursive CTEs (WITH RECURSIVE) are detected via query_has_recursive_cte(). In FULL mode, the query executes as-is. In DIFFERENTIAL mode, the strategy is auto-selected: semi-naive evaluation for INSERT-only changes, Delete-and-Rederive (DRed) for mixed changes, or recomputation fallback when CTE columns don't match ST storage or when the recursive term contains non-monotone operators (EXCEPT, Aggregate, Window, DISTINCT, AntiJoin, INTERSECT SET). In IMMEDIATE mode, the same semi-naive / DRed machinery runs against statement transition tables and is bounded by pg_trickle.ivm_recursive_max_depth to guard against unbounded recursion.

§ Recursive CTE Strategy Selection

The DVM engine selects among five strategies for WITH RECURSIVE queries. The selection is logged at startup and visible via explain_stream_table().

| Tier | Condition | Strategy |
|---|---|---|
| Tier 1 | CTE is non-recursive and referenced once | Inline expansion — CTE is expanded inline; no differential overhead. |
| Tier 2 | CTE is non-recursive and referenced 2+ times | Shared delta — single delta computation reused across all reference sites. |
| Tier 3a | CTE is recursive with monotone operators only (UNION ALL, no NOT EXISTS / aggregation) | Semi-naive evaluation — frontier-bounded delta avoids full recomputation. |
| Tier 3b | CTE is recursive with non-monotone operators; base tables have primary keys | DRed (Delete-and-Rederive) — handles deletions by re-deriving affected tuples. |
| Tier 3c | CTE is recursive with non-monotone operators and no primary keys, or cycle in dependency graph | Full recomputation — most conservative; correct for all inputs. |

Observability: explain_stream_table(st_name) returns a recursive_cte_strategy field showing which tier was selected and the reason. Example output:

{
  "recursive_cte_strategy": "semi_naive",
  "recursive_cte_reason": "Tier 3a: monotone UNION ALL recursion with no aggregation or NOT EXISTS"
}

Example — Tier 3a (semi-naive) for hierarchical closure:

WITH RECURSIVE ancestors AS (
  SELECT id, parent_id FROM org_chart WHERE parent_id IS NULL
  UNION ALL
  SELECT c.id, c.parent_id
  FROM org_chart c
  JOIN ancestors a ON c.parent_id = a.id
)
SELECT * FROM ancestors;

Because the recursive term uses only UNION ALL and a plain JOIN (both monotone), pg_trickle selects Tier 3a (semi-naive): only newly reachable rows are computed per delta, not the full transitive closure.

Operators (src/dvm/operators/)

Each operator knows how to generate a delta query — given a set of changes to its inputs, it produces the corresponding changes to its output:

| Operator | Delta Strategy |
|---|---|
| Scan | Direct passthrough of CDC changes |
| Filter | Apply WHERE predicate to deltas |
| Project | Apply column projection to deltas |
| Join | Join deltas against the other side's current state |
| OuterJoin | LEFT/RIGHT outer join with NULL padding |
| FullJoin | FULL OUTER JOIN with 8-part delta (both sides may produce NULLs) |
| Aggregate | Recompute group values where affected keys changed |
| Distinct | COUNT-based duplicate tracking |
| UnionAll | Merge deltas from both branches |
| Intersect | Dual-count multiplicity with LEAST boundary crossing |
| Except | Dual-count multiplicity with GREATEST(0, L-R) boundary crossing |
| Subquery | Transparent delegation + optional column renaming (CTEs, subselects) |
| CteScan | Shared delta lookup from CTE cache (multi-reference CTEs) |
| RecursiveCte | Semi-naive / DRed / recomputation for WITH RECURSIVE |
| Window | Partition-based recomputation for window functions |
| LateralFunction | Row-scoped recomputation for SRFs in FROM (jsonb_array_elements, unnest, etc.) |
| LateralSubquery | Row-scoped recomputation for correlated subqueries in LATERAL FROM |
| SemiJoin | EXISTS / IN subquery delta via semi-join |
| AntiJoin | NOT EXISTS / NOT IN subquery delta via anti-join |
| ScalarSubquery | Correlated scalar subquery in SELECT list |

See DVM_OPERATORS.md for detailed descriptions.
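
To make the Join row concrete: the classic delta rule joins each side's delta against the other side's state. In SQL terms it looks roughly like the sketch below, where delta_a/delta_b stand for the signed change sets, b is the post-change state, and a_old is the pre-change snapshot (all names illustrative, not generated SQL):

-- Δ(A ⋈ B) = (ΔA ⋈ B_new) UNION ALL (A_old ⋈ ΔB), signs carried through.
SELECT d.sign, d.a_id, b.b_val
FROM delta_a AS d
JOIN b ON b.a_id = d.a_id          -- delta of A against B's current state
UNION ALL
SELECT d.sign, a.a_id, d.b_val
FROM a_old AS a
JOIN delta_b AS d ON d.a_id = a.a_id;  -- A's old state against delta of B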

Diff Engine (src/dvm/diff.rs)

Generates the final diff SQL that:

  1. Computes the delta from the operator tree
  2. Produces ('+', row) for inserts and ('-', row) for deletes
  3. Applies the diff via DELETE matching old rows and INSERT for new rows
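
A hedged sketch of the apply step for a stream table with columns id/status (the real SQL is generated; delta here is a stand-in for the computed diff relation):

-- 1. Remove rows flagged '-' by matching the internal row ID.
DELETE FROM orders_st AS st
USING delta AS d
WHERE d.op = '-' AND st.__pgt_row_id = d.__pgt_row_id;

-- 2. Insert rows flagged '+'.
INSERT INTO orders_st (id, status, __pgt_row_id)
SELECT d.id, d.status, d.__pgt_row_id
FROM delta AS d
WHERE d.op = '+';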

5. DAG / Dependency Graph (src/dag.rs)

Stream tables can depend on other stream tables (cascading), forming a Directed Acyclic Graph:

  • Cycle detection — Detects circular dependencies at creation time using Kahn's algorithm (BFS topological sort). When pg_trickle.allow_circular = true, monotone cycles (queries using only safe operators — joins, filters, UNION ALL, etc.) are allowed; non-monotone cycles (aggregates, EXCEPT, window functions, anti-joins) are rejected. SCC IDs are automatically assigned to cycle members and recomputed on drop/alter.
  • SCC decomposition — Tarjan's algorithm decomposes the graph into strongly connected components. Singleton SCCs are acyclic; multi-node SCCs contain cycles that are handled by fixed-point iteration in the scheduler.
  • Monotonicity analysis — Static check (check_monotonicity() in src/dvm/parser.rs) determines whether a query's operators are safe for cyclic fixed-point iteration. Non-monotone operators (Aggregate, EXCEPT, Window, NOT EXISTS) block cycle creation.
  • Topological ordering — Determines refresh order: upstream STs must be refreshed before downstream STs.
  • Condensation order — condensation_order() returns SCCs in topological order, grouping cyclic STs for fixed-point iteration. The scheduler's iterate_to_fixpoint() processes multi-node SCCs by refreshing all members repeatedly until convergence (zero net changes) or max_fixpoint_iterations is exceeded.
  • Cascade operations — When a source table changes, all transitive dependents are identified for refresh.

6. Version / Frontier Tracking (src/version.rs)

Implements a per-source frontier (JSONB map of source_oid → LSN) to track exactly how far each stream table has consumed changes:

  • Read frontier — Before refresh, read the frontier to know where to start consuming changes.
  • Advance frontier — After a successful refresh, the frontier is updated to the latest consumed LSN.
  • Consistent snapshots — The frontier ensures that each refresh processes a contiguous, non-overlapping window of changes.
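
The frontier is visible in the catalog; a hedged example (column names come from the ER diagram above, the JSON payload is illustrative):

SELECT pgt_name, frontier, data_timestamp
FROM pgtrickle.pgt_stream_tables
WHERE pgt_name = 'orders_st';
-- frontier might look like: {"16401": "0/1A2B3C4D", "pgt_7": 42}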

Delayed View Semantics (DVS) Guarantee

The contents of every stream table are logically equivalent to evaluating its defining query at some past point in time — the data_timestamp. The scheduler refreshes STs in topological order so that when ST B references upstream ST A, A has already been refreshed to the target data_timestamp before B runs its delta query against A's contents. The frontier lifecycle is:

  1. Created — on first full refresh; records the LSN of each source at that moment.
  2. Advanced — on each differential refresh; the old frontier becomes the lower bound and the new frontier (with fresh LSNs) the upper bound. The DVM engine reads changes in the half-open window (old, new], so consecutive refreshes never overlap.
  3. Reset — on reinitialize; a fresh frontier is created from scratch.

7. Refresh Engine (src/refresh.rs)

Orchestrates the complete refresh cycle:

┌──────────────┐
│  Check State │ → Is ST active? Has it been populated?
└──────┬───────┘
       │
 ┌─────▼──────┐
 │ Drain CDC  │ → Read WAL changes into change buffer tables
 └─────┬──────┘
       │
 ┌─────▼──────────────┐
 │ Determine Action   │ → FULL, DIFFERENTIAL, NO_DATA, REINITIALIZE, or SKIP?
 │                    │   (adaptive: if change ratio > pg_trickle.differential_max_change_ratio,
 │                    │    downgrade DIFFERENTIAL → FULL automatically)
 └─────┬──────────────┘
       │
 ┌─────▼──────┐
 │ Execute    │ → Full: TRUNCATE + INSERT ... SELECT
 │            │   Differential: Generate & apply delta SQL
 └─────┬──────┘
       │
 ┌─────▼──────────────┐
 │ Record History     │ → Write to pgtrickle.pgt_refresh_history
 └─────┬──────────────┘
       │
 ┌─────▼──────────────┐
 │ Advance Frontier   │ → Update JSONB frontier in catalog
 └─────┬──────────────┘
       │
 ┌─────▼──────────────┐
 │ Reset Error Count  │ → On success, reset consecutive_errors to 0
 └──────────────────────┘

8. Background Worker & Scheduling (src/scheduler.rs)

Registration & Lifecycle

pg_trickle registers one PostgreSQL background worker — the scheduler — during _PG_init() (extension load). Because it is registered at startup, pg_trickle must appear in shared_preload_libraries, which requires a server restart.

┌──────────────────────────────────────────────────────────────────┐
│                  PostgreSQL postmaster                           │
│                                                                  │
│  shared_preload_libraries = 'pg_trickle'                          │
│       │                                                          │
│       ▼                                                          │
│  _PG_init()                                                      │
│    ├─ Register GUCs (pg_trickle.enabled, scheduler_interval_ms …) │
│    ├─ Register shared memory (PgTrickleSharedState, atomics)      │
│    └─ BackgroundWorkerBuilder::new("pg_trickle scheduler")        │
│         .set_start_time(RecoveryFinished)                        │
│         .set_restart_time(5s)       ← auto-restart on crash      │
│         .load()                                                  │
│                                                                  │
│  After recovery finishes:                                        │
│       │                                                          │
│       ▼                                                          │
│  pg_trickle_scheduler_main()         ← background worker starts   │
│    ├─ Attach SIGHUP + SIGTERM handlers                           │
│    ├─ Connect to SPI (database = "postgres")                     │
│    ├─ Crash recovery: mark stale RUNNING records as FAILED       │
│    └─ Enter main loop ─────────────────────────┐                 │
│         │                                      │                 │
│         ▼                                      │                 │
│     wait_latch(scheduler_interval_ms)          │                 │
│         │                                      │                 │
│     ┌───▼───────────────────────────────┐      │                 │
│     │ SIGTERM? → log + break            │      │                 │
│     │ pg_trickle.enabled = false? → skip │      │                 │
│     │ Otherwise → scheduler tick        │      │                 │
│     └───┬───────────────────────────────┘      │                 │
│         │                                      │                 │
│         └──────────── loop ────────────────────┘                 │
└──────────────────────────────────────────────────────────────────┘

Key lifecycle properties:

| Property | Behaviour |
|---|---|
| Start condition | After PostgreSQL recovery finishes (RecoveryFinished) |
| Auto-restart | 5-second delay after an unexpected crash |
| Graceful shutdown | Handles SIGTERM — breaks the main loop and exits cleanly |
| Config reload | Handles SIGHUP — re-reads GUC values on the next latch wake |
| Crash recovery | On startup, any pgt_refresh_history rows stuck in RUNNING status are marked FAILED (the transaction that wrote them was rolled back by PostgreSQL, but the status row may have been committed in a prior transaction) |
| Database | Connects to the postgres database via SPI |
| Standby / replica | On standby servers (pg_is_in_recovery() = true), the worker enters a sleep loop and does not attempt refreshes. Stream tables are still readable on standbys — they are regular heap tables replicated via physical streaming replication. After promotion the scheduler resumes automatically. See the FAQ § Replication for details on logical replication and subscriber limitations. |

Scheduler Tick

Each tick of the main loop performs the following steps inside a single transaction:

  1. DAG rebuild — Compare the shared-memory DAG_REBUILD_SIGNAL counter against the local copy. If it advanced (a CREATE, ALTER, or DROP stream table occurred), rebuild the in-memory dependency graph (StDag) from the catalog.
  2. Topological traversal — Walk stream tables in dependency order (upstream before downstream). This ensures that when ST B references ST A, A is refreshed first.
  3. Per-ST evaluation — For each active ST:
    • Skip if in retry backoff (exponential, per-ST).
    • Skip if schedule/cron says not yet due.
    • Skip if a row-level lock on the catalog entry indicates a concurrent refresh.
    • Check upstream change buffers for pending rows.
  4. Execute refresh — Acquire a row-level lock on the catalog entry → record RUNNING in history → run FULL / DIFFERENTIAL / REINITIALIZE → store new frontier → release lock → record completion.
  5. WAL transitions — Advance any trigger→WAL CDC mode transitions (src/wal_decoder.rs).
  6. Slot health — Check replication slot health and emit NOTIFY alerts.
  7. Prune retry state — Remove backoff entries for STs that no longer exist.

Sequential Processing (Default)

By default (parallel_refresh_mode = 'off'), the scheduler processes stream tables sequentially within a single background worker. All STs are refreshed one at a time in topological order. pg_trickle.max_concurrent_refreshes (default 4) only prevents a manual pgtrickle.refresh_stream_table() call from overlapping with the scheduler on the same ST — it does not spawn additional workers.

The PostgreSQL GUC max_worker_processes (default 8) sets the server-wide budget for all background workers (autovacuum, parallel query, logical replication, extensions). In sequential mode pg_trickle consumes one slot from that budget.

Parallel Refresh (parallel_refresh_mode = 'on')

When enabled, the scheduler builds an execution-unit DAG from the stream-table dependency graph and dispatches independent units to dynamic background workers:

  1. Execution units — Each independent stream table becomes a singleton unit. Atomic consistency groups and IMMEDIATE-trigger closures are collapsed into composite units that run in a single worker for correctness.
  2. Ready queue — Units whose upstream dependencies have all completed enter the ready queue. The coordinator dispatches them subject to a per-database cap (max_concurrent_refreshes) and a cluster-wide cap (max_dynamic_refresh_workers).
  3. Dynamic workers — Each dispatched unit spawns a short-lived background worker via BackgroundWorkerBuilder::load_dynamic(). Workers claim a job from the pgtrickle.pgt_scheduler_jobs catalog table, execute the refresh, and exit.

The parallel path respects the same topological ordering as the sequential path — downstream units only become ready after all upstream units succeed. The worker-budget caps ensure pg_trickle does not exhaust max_worker_processes.

See PLAN_PARALLELISM.md for the full design and CONFIGURATION.md for tuning guidance.

Retry & Error Handling

Each ST maintains an in-memory RetryState (reset on scheduler restart):

  • Retryable errors (SPI failures, lock contention, slot issues) trigger exponential backoff.
  • Permanent errors (schema mismatch, user errors) skip backoff but increment consecutive_errors.
  • When consecutive_errors reaches pg_trickle.max_consecutive_errors (default 3), the ST is auto-suspended and a NOTIFY alert is emitted.
  • Schema errors additionally set needs_reinit, triggering a REINITIALIZE on the next successful cycle.

Scheduling Policy

Automatic refresh scheduling uses canonical periods (48·2ⁿ seconds, n = 0, 1, 2, …) snapped to the user's schedule:

  • Snaps to the largest canonical period ≤ the schedule, so refreshes are never less frequent than requested.
  • For DOWNSTREAM schedule (NULL schedule), the ST refreshes only when explicitly triggered or when a downstream ST needs it.
  • Advisory locks prevent concurrent refreshes of the same ST.
  • The scheduler is driven by the background worker polling at the pg_trickle.scheduler_interval_ms GUC interval.
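
A worked example of the snapping rule, assuming the snap-down interpretation described above:

-- Canonical periods: 48 · 2ⁿ seconds.
SELECT 48 * 2 ^ n AS period_s FROM generate_series(0, 4) AS n;
-- → 48, 96, 192, 384, 768
-- schedule '5 minutes' (300 s) snaps to 192 s; '15 minutes' (900 s) to 768 s.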

Shared Memory (src/shmem.rs)

The scheduler background worker and user sessions share a PgTrickleSharedState structure protected by a PgLwLock. Key fields:

| Field | Type | Purpose |
|---|---|---|
| dag_version | u64 | Incremented when the ST catalog changes; used by the scheduler to detect when the DAG needs rebuilding. |
| scheduler_pid | i32 | PID of the scheduler background worker (0 if not running). |
| scheduler_running | bool | Whether the scheduler is active. |
| last_scheduler_wake | i64 | Unix timestamp of the last scheduler wake cycle (for monitoring). |

A separate PgAtomic<AtomicU64> named DAG_REBUILD_SIGNAL is incremented by API functions (create, alter, drop) after catalog mutations. The scheduler compares its local copy against the atomic counter to detect when to rebuild its in-memory DAG without holding a lock.

A second PgAtomic<AtomicU64> named CACHE_GENERATION tracks DDL events that may invalidate cached delta or MERGE templates across backends. When DDL hooks fire (view change, ALTER TABLE, function change) or API functions mutate the catalog, CACHE_GENERATION is bumped. Each backend maintains a thread-local generation counter; on the next refresh, if the shared generation has advanced, the backend flushes its delta template cache, MERGE template cache, and explicitly DEALLOCATEs tracked __pgt_merge_* prepared statements before rebuilding local state.

9. DDL Tracking (src/hooks.rs)

Event triggers monitor DDL changes to source tables and functions:

  • _on_ddl_end — Fires on ALTER TABLE to detect column adds/drops/type changes. If a source table used by a ST is altered, the ST's needs_reinit flag is set. Also detects CREATE OR REPLACE FUNCTION / ALTER FUNCTION — if the function appears in a ST's functions_used catalog column, the ST is marked for reinit.
  • _on_sql_drop — Fires on DROP TABLE to set needs_reinit for affected STs. Also detects DROP FUNCTION and marks affected STs for reinit.
  • Function name extraction — object_identity strings (e.g., public.my_func(integer, text)) are parsed to extract the bare function name, which is matched against the functions_used TEXT[] column in pgt_stream_tables.

Reinitialization is deferred until the next refresh cycle, which then performs a REINITIALIZE action (drop and recreate the storage table from the updated query).
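
For example (names invented), altering a tracked source marks its dependents, and the flag is visible in the catalog:

ALTER TABLE stock_moves ADD COLUMN warehouse text;

SELECT pgt_name, needs_reinit
FROM pgtrickle.pgt_stream_tables;
-- STs reading stock_moves now show needs_reinit = true; the next cycle
-- performs REINITIALIZE (drop and recreate storage from the updated query).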

10. Error Handling (src/error.rs)

Centralized error types using thiserror:

  • PgTrickleError variants cover catalog access, SQL execution, CDC, DVM, DAG, and config errors.
  • Each refresh failure increments consecutive_errors.
  • When consecutive_errors reaches pg_trickle.max_consecutive_errors (default 3), the ST is moved to ERROR status and suspended from automatic refresh.
  • Manual intervention (ALTER ... status => 'ACTIVE') resets the counter.

11. Monitoring (src/monitor.rs)

Provides observability functions:

  • st_refresh_stats — Aggregate statistics (total/successful/failed refreshes, avg duration, staleness status).
  • get_refresh_history — Per-ST audit trail.
  • get_staleness — Current staleness in seconds.
  • slot_health — Checks replication slot state and WAL retention.
  • check_cdc_health — Per-source CDC health status including mode, slot lag, confirmed LSN, and alerts.
  • explain_st — Describes the DVM plan for a given ST.
  • diamond_groups — Lists detected diamond dependency groups, their members, convergence points, and epoch counters.
  • Views — pgtrickle.stream_tables_info (computed staleness) and pgtrickle.pg_stat_stream_tables (combined stats).

NOTIFY Alerting

Operational events are broadcast via PostgreSQL NOTIFY on the pg_trickle_alert channel. Clients can subscribe with LISTEN pg_trickle_alert; and receive JSON-formatted events:

| Event | Condition |
|---|---|
| stale | Data staleness exceeds 2× schedule |
| auto_suspended | ST suspended after pg_trickle.max_consecutive_errors failures |
| reinitialize_needed | Upstream DDL change detected |
| slot_lag_warning | Replication slot WAL retention exceeded pg_trickle.slot_lag_warning_threshold_mb |
| cdc_transition_complete | Source transitioned from trigger to WAL-based CDC |
| cdc_transition_failed | Trigger→WAL transition failed (fell back to triggers) |
| refresh_completed | Refresh completed successfully |
| refresh_failed | Refresh failed with an error |
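
Subscribing from psql or any driver with NOTIFY support is one statement; the payload fields sketched below are illustrative (the docs only specify JSON on the pg_trickle_alert channel):

LISTEN pg_trickle_alert;
-- Example received notification (field names are illustrative):
--   channel: pg_trickle_alert
--   payload: {"event": "auto_suspended", "stream_table": "orders_st"}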

12. Row ID Hashing (src/hash.rs)

Provides deterministic 64-bit row identifiers using xxHash (xxh64) with a fixed seed. Two SQL functions are exposed:

  • pgtrickle.pg_trickle_hash(text) — Hash a single text value; used for simple single-column row IDs.
  • pgtrickle.pg_trickle_hash_multi(text[]) — Hash multiple values (separated by a record-separator byte \x1E) for composite keys (join row IDs, GROUP BY keys).

Row IDs are written into every stream table's storage as an internal __pgt_row_id BIGINT column and are used by the delta application phase to match DELETE candidates precisely.
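
Both helpers are callable directly, which can be handy when debugging row-ID mismatches (input values invented; outputs are 64-bit integers):

SELECT pgtrickle.pg_trickle_hash('order-42');                -- single value
SELECT pgtrickle.pg_trickle_hash_multi(ARRAY['42', 'paid']); -- composite key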

13. Diamond Dependency Consistency (src/dag.rs)

When stream tables form diamond-shaped dependency graphs, a convergence (fan-in) node may read from multiple upstream STs that share a common ancestor:

        A (source table)
       / \
      B   C   (intermediate STs)
       \ /
        D     (convergence / fan-in ST)

If B refreshes successfully but C fails, D would read a fresh version of B's data alongside stale data from C — a split-version inconsistency.

Detection

StDag::detect_diamonds() walks all fan-in nodes (STs with multiple upstream ST dependencies) and computes transitive ancestor sets per branch. If two or more branches share ancestors, a diamond is detected. Overlapping diamonds are merged.

Consistency Groups

StDag::compute_consistency_groups() converts detected diamonds into consistency groups — topologically ordered sets of STs that must be refreshed atomically. Each group contains:

  • Members — All intermediate STs plus the convergence node, in refresh order.
  • Convergence points — The fan-in nodes where multiple paths meet.
  • Epoch counter — Advances on each successful atomic refresh.

STs not involved in any diamond are placed in singleton groups (no overhead).

Scheduler Wiring

When diamond_consistency = 'atomic' (per-ST or via the pg_trickle.diamond_consistency GUC):

  1. The scheduler wraps each multi-member group in a SAVEPOINT pgt_consistency_group.
  2. Each member is refreshed in topological order within the savepoint.
  3. If all succeed — RELEASE SAVEPOINT and advance the group epoch.
  4. If any member fails — ROLLBACK TO SAVEPOINT undoes all members' changes. The failure is logged and the group retries on the next scheduler tick.

With diamond_consistency = 'none', members refresh independently in topological order — matching pre-feature behavior.
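
A minimal sketch of the atomic wiring, using the savepoint name from step 1 (the member refreshes are internal calls, shown here as comments; this runs inside the scheduler's transaction):

SAVEPOINT pgt_consistency_group;
-- refresh B; refresh C; refresh D (topological order)
RELEASE SAVEPOINT pgt_consistency_group;          -- all members succeeded

-- On any member failure, instead:
-- ROLLBACK TO SAVEPOINT pgt_consistency_group;   -- undoes the whole group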

Schedule Policy

The diamond_schedule_policy setting (per-convergence-node or via the pg_trickle.diamond_schedule_policy GUC) controls when an atomic group fires:

| Policy | Trigger condition | Trade-off |
|---|---|---|
| 'fastest' (default) | Any member is due | Higher freshness, more refreshes |
| 'slowest' | All members are due | Lower resource cost, staler data |

The policy is set on the convergence (fan-in) node. When multiple convergence nodes exist in the same group (nested diamonds), the strictest policy wins (slowest > fastest). The GUC serves as a cluster-wide fallback for nodes without an explicit per-node setting.

Monitoring

The pgtrickle.diamond_groups() SQL function exposes detected groups for operational visibility. See SQL_REFERENCE.md for details.

14. pg_tide Integration

Extracted in v0.46.0. The outbox, inbox, and relay subsystems were moved to the standalone pg_tide extension to give event messaging its own focused release cadence and reduce the surface area of pg_trickle.

What Stays in pg_trickle

  • attach_outbox() integration hook — a lightweight hook that pg_tide calls after each successful refresh cycle to publish the delta summary to pg_tide's outbox table. pg_trickle itself never writes to the outbox; it only invokes the hook.
  • Change buffer subscription — pg_trickle exposes the internal change buffer (pgtrickle_changes.changes_<oid>) as a stable interface so pg_tide consumers can subscribe to raw CDC events without going through the refresh engine.

What Lives in pg_tide

  • enable_outbox() / poll_outbox() — outbox provisioning and polling API.
  • Consumer groups and visibility lease management.
  • Claim-check mode for large payloads.
  • create_inbox() / enable_inbox_ordering() — inbox provisioning.
  • FNV-1a consistent hashing (inbox_is_my_partition()) for horizontal scaling.
  • The pgtrickle-relay binary — forwards outbox rows to Kafka, NATS, SQS, and other transports.

API Documentation

See the pg_tide repository for the complete API reference, deployment guide, and relay architecture.

15. Stream Table Snapshots (src/api/snapshot.rs)

Added in v0.27.0.

snapshot_stream_table(name) exports the current content of a stream table into an archival table, capturing the extension version and current frontier in metadata columns (__pgt_snapshot_version, __pgt_frontier, __pgt_snapshotted_at).

restore_from_snapshot(name, source) truncates the stream table and reloads it from the snapshot, then restores the saved frontier. This ensures the next refresh cycle is DIFFERENTIAL — skipping the expensive full re-scan that would otherwise follow a blank stream table.

Primary use cases: replica bootstrap, PITR alignment, and historical archiving.
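
A hedged round trip using the functions above (the snapshot identifier passed to restore_from_snapshot is invented; consult list_snapshots for real names):

SELECT pgtrickle.snapshot_stream_table('orders_st');
SELECT * FROM pgtrickle.list_snapshots();
SELECT pgtrickle.restore_from_snapshot('orders_st', 'orders_st_snap_2025_01_01');
-- The saved frontier is restored, so the next refresh is DIFFERENTIAL.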

16. Configuration (src/config.rs)

Runtime behavior is controlled by a growing set of GUC (Grand Unified Configuration) variables. See CONFIGURATION.md for the complete, current list.

| GUC | Default | Purpose |
|---|---|---|
| pg_trickle.enabled | true | Master on/off switch for the scheduler |
| pg_trickle.scheduler_interval_ms | 1000 | Scheduler background worker wake interval (ms) |
| pg_trickle.min_schedule_seconds | 60 | Minimum allowed schedule |
| pg_trickle.max_consecutive_errors | 3 | Errors before auto-suspending a ST |
| pg_trickle.change_buffer_schema | pgtrickle_changes | Schema for change buffer tables |
| pg_trickle.max_concurrent_refreshes | 4 | Maximum parallel refresh workers |
| pg_trickle.differential_max_change_ratio | 0.15 | Change-to-table-size ratio above which DIFFERENTIAL falls back to FULL |
| pg_trickle.cleanup_use_truncate | true | Use TRUNCATE instead of DELETE for change buffer cleanup when the entire buffer is consumed |
| pg_trickle.user_triggers | 'auto' | User-defined trigger handling: auto / off (on accepted as deprecated alias for auto) |
| pg_trickle.block_source_ddl | false | Block column-affecting DDL on tracked source tables instead of reinit |
| pg_trickle.cdc_mode | 'auto' | CDC mechanism: auto / trigger / wal |
| pg_trickle.wal_transition_timeout | 300 | Max seconds to wait for WAL decoder catch-up during transition |
| pg_trickle.slot_lag_warning_threshold_mb | 100 | Warning threshold for WAL slot retention used by slot_lag_warning and health_check() |
| pg_trickle.slot_lag_critical_threshold_mb | 1024 | Critical threshold for WAL slot retention used by check_cdc_health() alerts |
| pg_trickle.diamond_consistency | 'atomic' | Diamond dependency consistency mode: atomic or none |
| pg_trickle.diamond_schedule_policy | 'fastest' | Schedule policy for atomic diamond groups: fastest or slowest |
| pg_trickle.merge_planner_hints | true | Inject SET LOCAL planner hints (disable nestloop, raise work_mem) before MERGE |
| pg_trickle.merge_work_mem_mb | 64 | work_mem (MB) applied when delta exceeds 10 000 rows and planner hints enabled |
| pg_trickle.use_prepared_statements | true | Use SQL PREPARE/EXECUTE for cached MERGE templates |
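
GUCs are set like any other PostgreSQL setting; a sketch (which GUCs are reloadable versus restart-only is documented in CONFIGURATION.md, not here):

ALTER SYSTEM SET pg_trickle.scheduler_interval_ms = 500;
ALTER SYSTEM SET pg_trickle.differential_max_change_ratio = 0.25;
SELECT pg_reload_conf();

-- Session-level inspection:
SHOW pg_trickle.cdc_mode;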

Data Flow: End-to-End Refresh

 Source Table INSERT/UPDATE/DELETE
           │
           ▼
 Hybrid CDC Layer:
   ┌─────────────────────────────────────────────┐
   │ TRIGGER mode: Row-Level AFTER Trigger        │
   │   pg_trickle_cdc_fn_<oid>() → buffer table    │
   │                                              │
   │ WAL mode: Logical Replication Slot           │
   │   wal_decoder bgworker → same buffer table   │
   │                                              │
   │ ST-to-ST: Refresh engine captures delta      │
   │   → changes_pgt_<pgt_id> buffer table        │
   └─────────────────────────────────────────────┘
           │
           ▼
 Change Buffer Table
   Base tables:   pgtrickle_changes.changes_<oid>
   ST sources:    pgtrickle_changes.changes_pgt_<pgt_id>
   Columns: change_id, lsn, action (I/U/D), pk_hash, new_<col>, old_<col> (typed)
           │
           ▼
 DVM Engine: generate delta SQL from operator tree
   - Scan operator reads from changes_<oid> or changes_pgt_<id>
   - Filter/Project/Join transform the deltas
   - Aggregate recomputes affected groups
           │
           ▼
 Diff Engine: produce (+/-) diff rows
           │
           ▼
 Delta Application:
   DELETE FROM storage WHERE __pgt_row_id IN (removed)
   INSERT INTO storage SELECT ... FROM (added)
           │
           ▼
 Frontier Update: advance per-source LSN
           │
           ▼
 History Record: log to pgtrickle.pgt_refresh_history

Module Map

src/
├── lib.rs           # Extension entry, module declarations, _PG_init
├── bin/
│   └── pgrx_embed.rs # pgrx SQL entity embedding (generated)
├── api/
│   ├── mod.rs       # Core lifecycle functions (create/alter/drop/refresh/status)
│   ├── diagnostics.rs   # explain_st, explain_refresh_mode, dependency_tree
│   ├── outbox_hook.rs   # pg_tide integration hook (attach_outbox, v0.46.0+)
│   ├── snapshot.rs  # Stream table snapshots (v0.27.0)
│   ├── self_monitoring.rs  # Self-monitoring setup/teardown
│   ├── cluster.rs   # cluster_worker_summary
│   ├── publication.rs   # Logical publication helpers
│   ├── metrics_ext.rs   # Extended Prometheus metrics
│   ├── planner.rs   # Schedule recommendation API
│   └── helpers.rs   # Shared utilities
├── catalog.rs       # Catalog CRUD operations
├── cdc.rs           # Change data capture (triggers + WAL transition)
├── config.rs        # GUC variable registration
├── dag.rs           # Dependency graph (cycle detection, SCC decomposition, topo sort)
├── error.rs         # Centralized error types
├── hash.rs          # xxHash row ID generation (pg_trickle_hash / pg_trickle_hash_multi)
├── hooks.rs         # DDL event trigger handlers (_on_ddl_end, _on_sql_drop)
├── ivm.rs           # Transactional IVM (IMMEDIATE mode: statement-level triggers)
├── shmem.rs         # Shared memory state (PgTrickleSharedState, DAG_REBUILD_SIGNAL, CACHE_GENERATION)
├── dvm/
│   ├── mod.rs       # DVM module root + recursive CTE orchestration
│   ├── parser/      # Query → OpTree converter (modularized, G13-PRF)
│   │   ├── mod.rs        # FFI helpers, macros, entry points, tests
│   │   ├── types.rs      # OpTree, Expr, Column, AggExpr, etc.
│   │   ├── validation.rs # Volatility, IVM support, IMMEDIATE, monotonicity
│   │   ├── rewrites.rs   # SQL rewrite passes (view inlining, grouping sets)
│   │   └── sublinks.rs   # SubLink extraction from WHERE clauses
│   ├── diff.rs      # Delta SQL generation (CTE delta cache)
│   ├── row_id.rs    # Row ID generation
│   └── operators/
│       ├── mod.rs           # Operator trait + registry
│       ├── scan.rs          # Table scan (CDC passthrough)
│       ├── filter.rs        # WHERE clause filtering
│       ├── project.rs       # Column projection
│       ├── join.rs          # Inner join
│       ├── join_common.rs   # Shared join utilities (snapshot subqueries, column disambiguation)
│       ├── outer_join.rs    # LEFT/RIGHT outer join
│       ├── full_join.rs     # FULL OUTER JOIN (8-part delta)
│       ├── aggregate.rs     # GROUP BY + aggregate functions (39 AggFunc variants)
│       ├── distinct.rs      # DISTINCT deduplication
│       ├── union_all.rs     # UNION ALL merging
│       ├── intersect.rs     # INTERSECT / INTERSECT ALL (dual-count LEAST)
│       ├── except.rs        # EXCEPT / EXCEPT ALL (dual-count GREATEST)
│       ├── subquery.rs      # Subquery / inlined CTE delegation
│       ├── cte_scan.rs      # Shared CTE delta (multi-reference)
│       ├── recursive_cte.rs # Recursive CTE (semi-naive + DRed + recomputation)
│       ├── window.rs        # Window function (partition recomputation)
│       ├── lateral_function.rs  # LATERAL SRF (row-scoped recomputation)
│       ├── lateral_subquery.rs  # LATERAL correlated subquery
│       ├── semi_join.rs     # EXISTS / IN subquery (semi-join delta)
│       ├── anti_join.rs     # NOT EXISTS / NOT IN subquery (anti-join delta)
│       └── scalar_subquery.rs   # Correlated scalar subquery in SELECT
├── monitor.rs       # Monitoring & observability functions
├── refresh.rs       # Refresh orchestration
├── scheduler.rs     # Automatic scheduling with canonical periods
├── version.rs       # Frontier / LSN tracking
└── wal_decoder.rs   # WAL-based CDC (logical replication slot polling, transitions)

Extension Control File (pg_trickle.control)

The pg_trickle.control file in the repository root is required by PostgreSQL's extension infrastructure. It declares the extension's description, default version, shared-library path, and privilege requirements. PostgreSQL reads this file when CREATE EXTENSION pg_trickle; is executed.

During packaging (cargo pgrx package), pgrx replaces the @CARGO_VERSION@ placeholder with the version from Cargo.toml and copies the file into the target's share/extension/ directory alongside the SQL migration scripts.


Note: The relay binary (pgtrickle-relay), outbox, and inbox subsystems were extracted to the standalone pg_tide extension in v0.46.0. See § 14 pg_tide Integration and the pg_tide repository for the relay architecture and deployment guide.

DVM Operators

This document describes the Differential View Maintenance (DVM) operators implemented by pg_trickle. Each operator transforms a stream of row-level changes (deltas) propagated from source tables through the operator tree.

Quick Reference

| Operator | FULL | DIFF | IMMED | Section |
|---|---|---|---|---|
| Simple SELECT / projection | ✅ | ✅ | ✅ | Scan & Project |
| WHERE filter | ✅ | ✅ | ✅ | Filter |
| DISTINCT | ✅ | ✅ | ✅ | Distinct |
| INNER JOIN | ✅ | ✅ | ✅ | Joins |
| LEFT / RIGHT OUTER JOIN | ✅ | ✅ | ✅ | Joins |
| FULL OUTER JOIN | ✅ | ✅ | ✅ | Joins |
| LATERAL JOIN | ✅ | ✅ | ✅ | Joins |
| Multi-table join (≥3 right scans) | ✅ | ⚠️ | ⚠️ | Joins |
| EXISTS / NOT EXISTS | ✅ | ✅ | ✅ | Subqueries |
| Scalar subquery | ✅ | ✅ | ✅ | Subqueries |
| UNION ALL | ✅ | ✅ | ✅ | Set Operations |
| INTERSECT / EXCEPT | ✅ | ✅ | ✅ | Set Operations |
| COUNT, SUM, AVG | ✅ | ✅ | ✅ | Aggregates |
| MIN / MAX | ✅ | ✅ | ✅ | Aggregates |
| COUNT(DISTINCT) / SUM(DISTINCT) | ✅ | ✅ | ✅ | Aggregates |
| STRING_AGG / ARRAY_AGG | ✅ | ⚠️ | ⚠️ | Aggregates |
| JSONB_AGG / JSONB_OBJECT_AGG | ✅ | ⚠️ | ⚠️ | Aggregates |
| Window functions | ✅ | ⚠️ | ⚠️ | Window Functions |
| ORDER BY … LIMIT (TopK) | ✅ | ✅ | ✅ | TopK |
| HAVING | ✅ | ✅ | ✅ | Having |
| GROUP BY ROLLUP / CUBE | ✅ | ✅ | ✅ | Grouping Sets |
| Recursive CTEs | ✅ | ⚠️ | ⚠️ | CTEs |
| vector_avg / halfvec_avg | ✅ | ✅ | ✅ | Vector Aggregates |
Legend: ✅ Supported · ⚠️ Partial (see section) · ❌ Not supported


Prior Art

  • Budiu, M. et al. (2023). "DBSP: Automatic Incremental View Maintenance." VLDB 2023.
  • Gupta, A. & Mumick, I.S. (1999). Materialized Views: Techniques, Implementations, and Applications. MIT Press.
  • Koch, C. et al. (2014). "DBToaster: Higher-order Delta Processing for Dynamic, Frequently Fresh Views." VLDB Journal.
  • PostgreSQL 9.4+ — Materialized views with REFRESH MATERIALIZED VIEW CONCURRENTLY.

Overview

When a stream table is created, the defining SQL query is parsed into a tree of DVM operators. During a differential refresh, changes flow bottom-up through this tree:

         Aggregate
            │
         Project
            │
          Filter
            │
    ┌───────┴───────┐
   Join             │
  ┌─┴─┐            │
Scan(A) Scan(B)   Scan(C)

Each operator implements a differentiation rule: given the delta (Δ) to its input(s), it produces the corresponding delta to its output. This is conceptually similar to automatic differentiation in calculus.

The general contract:

  • Input: a set of ('+', row) and ('-', row) tuples (inserts and deletes)
  • Output: a set of ('+', row) and ('-', row) tuples

Updates are modeled as a delete of the old row followed by an insert of the new row.

DIFFERENTIAL and IMMEDIATE maintenance require deterministic expressions. VOLATILE functions such as random() or clock_timestamp(), and custom operators backed by volatile functions, are rejected during stream table creation because re-evaluation would corrupt delta semantics. STABLE functions such as now() and current_timestamp are allowed with a warning; FULL mode accepts all volatility classes because it recomputes the full result on each refresh.
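
For example (table and column names are hypothetical), the first defining query below would be rejected in DIFFERENTIAL or IMMEDIATE mode, while the second is accepted with a warning:

-- Rejected: random() is VOLATILE, so re-running the delta would change the result
SELECT id, random() AS sample_weight FROM orders;

-- Accepted with a warning: now() is STABLE, but its value can differ between
-- the initial load and later delta evaluations
SELECT id FROM orders WHERE created_at > now() - interval '1 day';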


Operator Support Matrix

The following table shows which SQL constructs are supported under each refresh mode.

| SQL Construct | FULL | DIFFERENTIAL | IMMEDIATE | Notes |
|---|---|---|---|---|
| Basic |  |  |  |  |
| Simple SELECT / projection | ✅ | ✅ | ✅ |  |
| WHERE filter | ✅ | ✅ | ✅ |  |
| Column expressions / aliases | ✅ | ✅ | ✅ |  |
| DISTINCT | ✅ | ✅ | ✅ | Uses __pgt_dup_count reference counting |
| DISTINCT ON | ✅ | ✅ | ✅ |  |
| Joins |  |  |  |  |
| INNER JOIN | ✅ | ✅ | ✅ | Hybrid delta strategy |
| LEFT OUTER JOIN | ✅ | ✅ | ✅ | NULL-padding transitions tracked |
| RIGHT OUTER JOIN | ✅ | ✅ | ✅ |  |
| FULL OUTER JOIN | ✅ | ✅ | ✅ | 8-part UNION ALL delta |
| CROSS JOIN | ✅ | ✅ | ✅ |  |
| LATERAL JOIN | ✅ | ✅ | ✅ | Row-scoped re-execution |
| Multi-table join (≤2 right scans) | ✅ | ✅ | ✅ | Full phantom-row-after-DELETE fix |
| Multi-table join (≥3 right scans) | ✅ | ⚠️ | ⚠️ | Falls back to post-change snapshot for right subtree (EC-01 boundary, fix planned for v0.12.0) |
| Subqueries |  |  |  |  |
| EXISTS / IN (semi-join) | ✅ | ✅ | ✅ | Delta-key pre-filter on left side |
| NOT EXISTS / NOT IN (anti-join) | ✅ | ✅ | ✅ | Inverted semantics; two-part delta |
| Scalar subquery (SELECT-list) | ✅ | ✅ | ✅ | Pre/post snapshot EXCEPT ALL diff |
| Correlated LATERAL subquery | ✅ | ✅ | ✅ |  |
| Set Operations |  |  |  |  |
| UNION ALL | ✅ | ✅ | ✅ | Dual-branch merge |
| INTERSECT / INTERSECT ALL | ✅ | ✅ | ✅ | Dual-count tracking |
| EXCEPT / EXCEPT ALL | ✅ | ✅ | ✅ |  |
| Aggregates |  |  |  |  |
| COUNT, SUM, AVG | ✅ | ✅ | ✅ | Algebraic — fully invertible delta |
| MIN, MAX | ✅ | ✅ | ✅ | Semi-algebraic — group rescan on ambiguous delete |
| COUNT(DISTINCT), SUM(DISTINCT) | ✅ | ✅ | ✅ | Algebraic via auxiliary columns |
| BOOL_AND, BOOL_OR, BIT_AND, BIT_OR | ✅ | ✅ | ✅ | Algebraic via auxiliary columns |
| EVERY | ✅ | ✅ | ✅ | Algebraic via auxiliary columns |
| STRING_AGG, ARRAY_AGG | ✅ | ⚠️ | ⚠️ | Group-rescan strategy — warning emitted at creation time in DIFFERENTIAL mode |
| STDDEV, VARIANCE, STDDEV_POP, VAR_POP | ✅ | ✅ | ✅ | Algebraic via auxiliary M2/sum/count columns |
| COVAR_SAMP, COVAR_POP, CORR | ✅ | ✅ | ✅ | Algebraic via auxiliary columns |
| REGR_* (all 9 regression functions) | ✅ | ✅ | ✅ | Algebraic via auxiliary columns |
| PERCENTILE_CONT, PERCENTILE_DISC | ✅ | ⚠️ | ⚠️ | Group-rescan strategy |
| MODE | ✅ | ⚠️ | ⚠️ | Group-rescan strategy |
| XMLAGG, JSON_AGG, JSONB_AGG | ✅ | ⚠️ | ⚠️ | Group-rescan strategy |
| JSON_OBJECT_AGG, JSONB_OBJECT_AGG | ✅ | ⚠️ | ⚠️ | Group-rescan strategy |
| GROUP BY / HAVING |  |  |  |  |
| GROUP BY ROLLUP / CUBE / GROUPING SETS | ✅ | ✅ | ✅ | Branch count capped by max_grouping_set_branches (default 64) |
| Window Functions |  |  |  |  |
| ROW_NUMBER, RANK, DENSE_RANK | ✅ | ✅ | ✅ | Partition-scoped recompute |
| LAG, LEAD, FIRST_VALUE, LAST_VALUE | ✅ | ✅ | ✅ | Partition-scoped recompute |
| NTILE, CUME_DIST, PERCENT_RANK | ✅ | ✅ | ✅ | Partition-scoped recompute |
| Window frame clauses (ROWS, RANGE, GROUPS) | ✅ | ✅ | ✅ |  |
| CTEs |  |  |  |  |
| Non-recursive WITH | ✅ | ✅ | ✅ | Inlined or delta-cached (multi-ref) |
| WITH RECURSIVE (INSERT-only workload) | ✅ | ✅ | ✅ | Semi-naive evaluation |
| WITH RECURSIVE (mixed INSERT/DELETE/UPDATE) | ✅ | ✅ | ✅ | Delete-and-Rederive (DRed) strategy |
| TopK |  |  |  |  |
| ORDER BY … LIMIT N | ✅ | ✅ | ✅ | Scoped recomputation; metadata validated each refresh |
| ORDER BY … LIMIT N OFFSET M | ✅ | ✅ | ✅ |  |
| Lateral / SRF |  |  |  |  |
| LATERAL with set-returning function | ✅ | ✅ | ✅ | Row-scoped re-execution |
| JSON_TABLE | ✅ | ✅ | ✅ | Via lateral function operator |
| generate_series() | ✅ | ✅ | ✅ |  |
| unnest() | ✅ | ✅ | ✅ |  |
| ST-to-ST Dependencies |  |  |  |  |
| Stream table reading from another stream table | ✅ | ✅ | ✅ | Differential via changes_pgt_ buffers (v0.11.0); FULL upstream produces I/D diff so downstream stays differential |
| Multi-level ST chains | ✅ | ✅ | ✅ | Topological order; per-level delta propagation |
| Function Volatility |  |  |  |  |
| IMMUTABLE functions | ✅ | ✅ | ✅ |  |
| STABLE functions (now(), current_timestamp) | ✅ | ⚠️ | ⚠️ | Allowed with warning — value may differ between initial load and delta evaluation |
| VOLATILE functions (random(), clock_timestamp()) | ✅ | ❌ | ❌ | Rejected at creation time — re-evaluation corrupts delta semantics |

Legend: ✅ = fully supported — ⚠️ = supported with caveats (see Notes column) — ❌ = not supported (blocked at creation time)


Operators

Scan

Module: src/dvm/operators/scan.rs

The leaf operator. Reads CDC changes from a source table's change buffer.

Delta Rule:

$$\Delta(\text{Scan}(R)) = \Delta R$$

The scan operator is a direct passthrough — inserts in the source become inserts in the output, deletes become deletes.

SQL Generation:

SELECT op, row_data FROM pgtrickle_changes.changes_<oid>
WHERE xid >= <last_consumed_xid>

Notes:

  • Each source table has a dedicated change buffer table created by the CDC module.
  • Row data is stored as JSONB with column names as keys.
  • The __pgt_row_id column (xxHash of primary key) is included for deduplication.

Filter

Module: src/dvm/operators/filter.rs

Applies a WHERE clause predicate to the delta stream.

Delta Rule:

$$\Delta(\sigma_p(R)) = \sigma_p(\Delta R)$$

Filtering is applied to the deltas in the same way as to the base data — only rows satisfying the predicate pass through.

SQL Generation:

SELECT * FROM (<input_delta>) AS d
WHERE <predicate>

Example:

If the defining query is:

SELECT * FROM orders WHERE status = 'shipped'

And a new row (id=5, status='pending') is inserted, it does not appear in the delta output. If (id=3, status='shipped') is inserted, it passes through.

Edge Cases:

  • For updates that change the predicate column (e.g., status from 'pending' to 'shipped'), the CDC produces a delete of the old row and insert of the new row. The filter passes the insert (matches) and blocks the delete (doesn't match the old row against the predicate), correctly resulting in a net insert.
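
A self-contained sketch of that edge case (illustrative data only; this is not the generated SQL):

WITH delta(op, id, status) AS (
    VALUES ('-', 3, 'pending'),   -- old row: blocked by the predicate
           ('+', 3, 'shipped')    -- new row: passes the predicate
)
SELECT * FROM delta WHERE status = 'shipped';
-- Result: ('+', 3, 'shipped'), i.e. a net insert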

Project

Module: src/dvm/operators/project.rs

Applies column projection from the target list.

Delta Rule:

$$\Delta(\pi_L(R)) = \pi_L(\Delta R)$$

Projects the same columns from the delta that the query projects from the base data.

SQL Generation:

SELECT <target_columns> FROM (<input_delta>) AS d

Notes:

  • Projection is applied after filtering for efficiency.
  • Computed expressions in the target list (e.g., price * quantity AS total) are evaluated on the delta rows.

Join (Inner)

Module: src/dvm/operators/join.rs

Implements inner join between two inputs.

Delta Rule:

For $R \bowtie S$:

$$\Delta(R \bowtie S) = (\Delta R \bowtie S) \cup (R' \bowtie \Delta S)$$

Where $R' = R \cup \Delta R$ (the new state of R after applying deltas).

In practice, when only one side has changes (common case), the delta join simplifies to joining the changed rows against the current state of the other side.

SQL Generation:

-- Changes to left side joined with current right side
SELECT '+' AS op, l.*, r.*
FROM (<left_delta> WHERE op = '+') AS l
JOIN <right_table> AS r ON <join_condition>

UNION ALL

-- Current left side joined with changes to right side
SELECT '+' AS op, l.*, r.*
FROM <left_table> AS l
JOIN (<right_delta> WHERE op = '+') AS r ON <join_condition>

(And corresponding DELETE queries for op = '-'.)

Notes:

  • The join uses the current state of the non-changed side, not the change buffer.
  • For equi-joins, this is efficient — the join key narrows the scan.
  • Non-equi joins (theta joins) may require broader scans.

Outer Join

Module: src/dvm/operators/outer_join.rs (LEFT JOIN), src/dvm/operators/full_join.rs (FULL JOIN)

Implements LEFT, RIGHT, and FULL OUTER JOIN.

RIGHT JOIN Handling:

RIGHT JOIN is automatically converted to a LEFT JOIN with swapped left/right operands during query parsing. This normalization happens transparently — the user can write RIGHT JOIN and the parser rewrites it to an equivalent LEFT JOIN before the operator tree is constructed.

Delta Rule:

Similar to inner join, but additionally handles NULL-padded rows:

$$\Delta(R \text{ LEFT JOIN } S) = (\Delta R \bowtie_L S) \cup (R' \bowtie_L \Delta S)$$

With special handling for:

  • Rows in ΔR that have no match in S → emit ('+', row, NULLs)
  • Rows in ΔS that create a first match for an R row → emit ('-', row, NULLs) and ('+', row, s_data)
  • Rows in ΔS that remove the last match for an R row → emit ('-', row, s_data) and ('+', row, NULLs)

SQL Generation (LEFT JOIN):

Uses anti-join detection (via NOT EXISTS) to correctly handle the NULL padding transitions.

FULL OUTER JOIN Delta Rule:

FULL OUTER JOIN extends the LEFT JOIN delta with symmetric right-side handling. The delta is computed as an 8-part UNION ALL:

  1. Parts 1–5: Same as LEFT JOIN delta (inserted/deleted rows from both sides, with NULL-padding transitions)
  2. Parts 6–7: Symmetric anti-join transitions for the right side (rows in ΔL that remove/create the last/first match for an S row)
  3. Part 8: Right-side insertions that have no match in the left side → emit ('+', NULLs, s_data)

Each part uses pre-computed delta flags (__has_ins_*, __has_del_*) to efficiently detect first-match/last-match transitions without redundant subqueries.

Nested Join Support:

Module: src/dvm/operators/join_common.rs

All join operators (inner, left, full) support nested children — i.e., a join whose left or right operand is itself another join. The join_common module provides shared helpers:

  • build_snapshot_sql() — returns the table reference for simple (Scan) operands, or a parenthesized subquery with disambiguated columns for nested join operands
  • rewrite_join_condition() — rewrites column references in ON conditions to use the correct alias prefixes for nested children (e.g., o.cust_id → dl.o__cust_id)

This enables queries with 3 or more joined tables, e.g.:

SELECT o.id, c.name, p.title
FROM orders o
JOIN customers c ON o.cust_id = c.id
JOIN products p ON o.prod_id = p.id

Limitations:

  • FULL OUTER JOIN delta computation can be expensive due to dual-side NULL tracking (8 UNION ALL parts).
  • Performance degrades with high-cardinality join keys.
  • NATURAL JOIN is supported — common columns are resolved automatically and synthesized into an explicit equi-join condition.
  • EC-01 pre-change snapshot boundary (SF-5): The phantom-row-after-DELETE fix (EC-01) uses EXCEPT ALL to reconstruct the pre-change state of a join side. This is limited to join subtrees with ≤ 2 scan nodes to avoid PostgreSQL temporary file exhaustion on wide join chains. For queries with ≥ 3 base tables on one side of a join (e.g. TPC-H Q7/Q8/Q9), a simultaneous DELETE on both join sides may leave a phantom row in the stream table until the next full refresh. See use_pre_change_snapshot() in join_common.rs for the full rationale.

Aggregate

Module: src/dvm/operators/aggregate.rs

Handles GROUP BY with aggregate functions (COUNT, SUM, AVG, MIN, MAX, BOOL_AND, BOOL_OR, STRING_AGG, ARRAY_AGG, JSON_AGG, JSONB_AGG, BIT_AND, BIT_OR, BIT_XOR, JSON_OBJECT_AGG, JSONB_OBJECT_AGG, STDDEV_POP, STDDEV_SAMP, VAR_POP, VAR_SAMP, MODE, PERCENTILE_CONT, PERCENTILE_DISC, JSON_ARRAYAGG, JSON_OBJECTAGG) and the FILTER (WHERE …) and WITHIN GROUP (ORDER BY …) clauses.

Delta Rule:

$$\Delta(\gamma_{G, \text{agg}}(R)) = \gamma_{G, \text{agg}}(R' \text{ WHERE } G \in \text{affected_keys}) - \gamma_{G, \text{agg}}(R \text{ WHERE } G \in \text{affected_keys})$$

Where:

  • $G$ = grouping columns
  • affected_keys = the set of group key values that appear in ΔR
  • $R'$ = $R \cup \Delta R$ (the new state)

Strategy:

  1. Identify affected groups — Collect all group key values that appear in the delta (either inserted or deleted rows).
  2. Recompute old values — Query the storage table for current aggregate values of affected groups.
  3. Recompute new values — Query the updated source for new aggregate values of affected groups.
  4. Diff — For each affected group:
    • If old exists and new differs → emit ('-', old) and ('+', new)
    • If old exists and new is gone → emit ('-', old) (group eliminated)
    • If no old and new exists → emit ('+', new) (new group appeared)

Supported Aggregate Functions:

| Function | DVM Strategy | Notes |
|---|---|---|
| COUNT(*) | Algebraic | Fully differential |
| COUNT(expr) | Algebraic | Fully differential |
| SUM(expr) | Algebraic | Fully differential |
| AVG(expr) | Algebraic | Decomposed to SUM/COUNT internally |
| MIN(expr) | Semi-algebraic | Uses LEAST merge; falls back to per-group rescan when min row is deleted |
| MAX(expr) | Semi-algebraic | Uses GREATEST merge; falls back to per-group rescan when max row is deleted |
| BOOL_AND(expr) | Group-rescan | Affected groups are re-aggregated from source data |
| BOOL_OR(expr) | Group-rescan | Affected groups are re-aggregated from source data |
| STRING_AGG(expr, sep) | Group-rescan | Affected groups are re-aggregated from source data |
| ARRAY_AGG(expr) | Group-rescan | Affected groups are re-aggregated from source data |
| JSON_AGG(expr) | Group-rescan | Affected groups are re-aggregated from source data |
| JSONB_AGG(expr) | Group-rescan | Affected groups are re-aggregated from source data |
| BIT_AND(expr) | Group-rescan | Affected groups are re-aggregated from source data |
| BIT_OR(expr) | Group-rescan | Affected groups are re-aggregated from source data |
| BIT_XOR(expr) | Group-rescan | Affected groups are re-aggregated from source data |
| JSON_OBJECT_AGG(key, value) | Group-rescan | Affected groups are re-aggregated from source data |
| JSONB_OBJECT_AGG(key, value) | Group-rescan | Affected groups are re-aggregated from source data |
| STDDEV_POP(expr) / STDDEV(expr) | Group-rescan | Affected groups are re-aggregated from source data |
| STDDEV_SAMP(expr) | Group-rescan | Affected groups are re-aggregated from source data |
| VAR_POP(expr) | Group-rescan | Affected groups are re-aggregated from source data |
| VAR_SAMP(expr) / VARIANCE(expr) | Group-rescan | Affected groups are re-aggregated from source data |
| MODE() WITHIN GROUP (ORDER BY expr) | Group-rescan | Ordered-set aggregate; affected groups re-aggregated |
| PERCENTILE_CONT(frac) WITHIN GROUP (ORDER BY expr) | Group-rescan | Ordered-set aggregate; affected groups re-aggregated |
| PERCENTILE_DISC(frac) WITHIN GROUP (ORDER BY expr) | Group-rescan | Ordered-set aggregate; affected groups re-aggregated |
| CORR(Y, X) | Group-rescan | Regression aggregate; affected groups re-aggregated |
| COVAR_POP(Y, X) | Group-rescan | Regression aggregate; affected groups re-aggregated |
| COVAR_SAMP(Y, X) | Group-rescan | Regression aggregate; affected groups re-aggregated |
| REGR_AVGX(Y, X) | Group-rescan | Regression aggregate; affected groups re-aggregated |
| REGR_AVGY(Y, X) | Group-rescan | Regression aggregate; affected groups re-aggregated |
| REGR_COUNT(Y, X) | Group-rescan | Regression aggregate; affected groups re-aggregated |
| REGR_INTERCEPT(Y, X) | Group-rescan | Regression aggregate; affected groups re-aggregated |
| REGR_R2(Y, X) | Group-rescan | Regression aggregate; affected groups re-aggregated |
| REGR_SLOPE(Y, X) | Group-rescan | Regression aggregate; affected groups re-aggregated |
| REGR_SXX(Y, X) | Group-rescan | Regression aggregate; affected groups re-aggregated |
| REGR_SXY(Y, X) | Group-rescan | Regression aggregate; affected groups re-aggregated |
| REGR_SYY(Y, X) | Group-rescan | Regression aggregate; affected groups re-aggregated |
| ANY_VALUE(expr) | Group-rescan | PostgreSQL 16+; affected groups re-aggregated |
| JSON_ARRAYAGG(expr ...) | Group-rescan | SQL-standard JSON aggregation (PostgreSQL 16+); full deparsed SQL preserved |
| JSON_OBJECTAGG(key: value ...) | Group-rescan | SQL-standard JSON aggregation (PostgreSQL 16+); full deparsed SQL preserved |
| User-defined aggregates (CREATE AGGREGATE) | Group-rescan | Any custom aggregate is supported via group-rescan; full aggregate call SQL preserved verbatim |

FILTER Clause:

All aggregate functions support the FILTER (WHERE …) clause:

SELECT COUNT(*) FILTER (WHERE status = 'active') AS active_count FROM orders GROUP BY region

The filter predicate is applied within the delta computation — only rows matching the filter contribute to the aggregate delta. Filtered aggregates are excluded from the P5 direct-bypass optimization.

SQL Generation:

The aggregate operator uses a 3-CTE pipeline:

  1. Merge CTE — Joins affected group keys against old (storage) and new (source) aggregate values, producing __pgt_meta_action ('I' for new-only groups, 'D' for disappeared groups, 'U' for changed groups).
  2. LATERAL VALUES expansion — A single-pass LATERAL (VALUES ...) clause expands each merge row into insert and delete actions, avoiding a 4-branch UNION ALL:
FROM merge_cte m,
LATERAL (VALUES
    ('I', m.new_count, m.new_total),
    ('D', m.old_count, m.old_total)
) v(action, count_val, val_total)
WHERE (m.__pgt_meta_action = 'I' AND v.action = 'I')
   OR (m.__pgt_meta_action = 'D' AND v.action = 'D')
   OR (m.__pgt_meta_action = 'U')
  3. Final projection — Emits ('+', row) and ('-', row) tuples for the refresh engine.

MIN/MAX Merge Strategy:

MIN and MAX use a semi-algebraic strategy with two cases:

  1. Non-extremum deletion — When the deleted row is NOT the current minimum (or maximum), the merge uses LEAST(old_value, new_inserts) for MIN or GREATEST(old_value, new_inserts) for MAX. This is fully algebraic and requires no rescan.

  2. Extremum deletion — When the row holding the current minimum (or maximum) IS deleted, the new value cannot be computed from the delta alone. The merge expression returns NULL as a sentinel, which triggers the change-detection guard (IS DISTINCT FROM) to emit the group for re-aggregation. The MERGE layer treats this as a DELETE + INSERT pair, recomputing the group from source data. This is still more efficient than a full table refresh since only affected groups are rescanned.
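
A self-contained sketch of the non-extremum case (illustrative data, not the generated SQL): with inserts only, the new minimum is simply the LEAST of the stored value and the smallest inserted value.

WITH stored(grp, old_min) AS (VALUES ('a', 5), ('b', 9)),
     inserted(grp, val)   AS (VALUES ('a', 3), ('b', 12))
SELECT s.grp, LEAST(s.old_min, min(i.val)) AS new_min
FROM stored s
JOIN inserted i USING (grp)
GROUP BY s.grp, s.old_min;
-- Result: ('a', 3), ('b', 9); no group rescan is needed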


Distinct

Module: src/dvm/operators/distinct.rs

Implements SELECT DISTINCT using reference counting.

Delta Rule:

$$\Delta(\delta(R)) = \{ r \in \Delta R : \text{count}(r, R) = 0 \land \text{count}(r, R') > 0 \} - \{ r \in \Delta R : \text{count}(r, R) > 0 \land \text{count}(r, R') = 0 \}$$

In other words:

  • A row enters the output when its count transitions from 0 to ≥1
  • A row leaves the output when its count transitions from ≥1 to 0

Strategy:

Maintains a hidden __pgt_dup_count column in the storage table to track how many times each distinct row appears in the pre-distinct input.

  1. On insert: increment count. If count was 0, emit ('+', row).
  2. On delete: decrement count. If count becomes 0, emit ('-', row).
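
A self-contained sketch of the transition logic (illustrative data; the real counts live in the hidden __pgt_dup_count column):

WITH storage(row_val, dup_count) AS (VALUES ('a', 2), ('b', 1)),
     delta(row_val, net)         AS (VALUES ('a', -1), ('b', -1), ('c', 2))
SELECT d.row_val,
       CASE WHEN COALESCE(s.dup_count, 0) = 0 AND d.net > 0 THEN '+'
            WHEN COALESCE(s.dup_count, 0) > 0
                 AND COALESCE(s.dup_count, 0) + d.net <= 0 THEN '-'
       END AS op
FROM delta d
LEFT JOIN storage s USING (row_val);
-- Result: 'a' stays present (no emission), 'b' leaves ('-'), 'c' enters ('+')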

Notes:

  • The duplicate count is not visible in user queries against the storage table (projected away by the view layer).
  • Duplicate counting uses __pgt_row_id (xxHash) for efficient lookups.

Union All

Module: src/dvm/operators/union_all.rs

Merges deltas from two branches.

Delta Rule:

$$\Delta(R \cup_{\text{all}} S) = \Delta R \cup_{\text{all}} \Delta S$$

Simply concatenates the delta streams from both branches.

SQL Generation:

SELECT * FROM (<left_delta>)
UNION ALL
SELECT * FROM (<right_delta>)

Notes:

  • Column count and types must match between branches.
  • Each branch is independently processed through its own operator sub-tree.
  • This is the simplest operator since UNION ALL preserves all duplicates.

Intersect

Module: src/dvm/operators/intersect.rs

Implements INTERSECT and INTERSECT ALL using dual-count per-branch multiplicity tracking.

Delta Rule:

$$\Delta(R \cap S): \text{emit rows where } \min(\text{count}_L, \text{count}_R) \text{ crosses the 0 boundary}$$

  • INTERSECT (set): a row is present when both branches contain it.
  • INTERSECT ALL (bag): a row appears $\min(\text{count}_L, \text{count}_R)$ times.

SQL Generation (3-CTE chain):

  1. Delta CTE — tags rows from left/right child deltas with branch indicator ('L'/'R') and computes per-row net_count.
  2. Merge CTE — joins with the storage table to compute old and new per-branch counts (__pgt_count_l, __pgt_count_r).
  3. Final CTE — detects boundary crossings using LEAST(old_count_l, old_count_r) vs LEAST(new_count_l, new_count_r).

Notes:

  • Storage table requires hidden columns __pgt_count_l and __pgt_count_r for multiplicity tracking.
  • Both set and bag variants share the same 3-CTE structure and the same LEAST-based boundary logic; they differ only in how many copies of a row are emitted.
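
A self-contained sketch of the boundary test (per-branch counts inlined for illustration):

WITH counts(row_val, old_l, old_r, new_l, new_r) AS (
    VALUES ('x', 1, 0, 1, 1),   -- right branch gains its first copy
           ('y', 2, 1, 2, 0)    -- right branch loses its last copy
)
SELECT row_val,
       CASE WHEN LEAST(old_l, old_r) = 0 AND LEAST(new_l, new_r) > 0 THEN '+'
            WHEN LEAST(old_l, old_r) > 0 AND LEAST(new_l, new_r) = 0 THEN '-'
       END AS op
FROM counts;
-- Result: ('x', '+'), ('y', '-')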

Except

Module: src/dvm/operators/except.rs

Implements EXCEPT and EXCEPT ALL using dual-count per-branch multiplicity tracking.

Delta Rule:

$$\Delta(R - S): \text{emit rows where } \max(0, \text{count}_L - \text{count}_R) \text{ crosses the 0 boundary}$$

  • EXCEPT (set): a row is present when it exists in the left but not the right branch.
  • EXCEPT ALL (bag): a row appears $\max(0, \text{count}_L - \text{count}_R)$ times.

SQL Generation (3-CTE chain):

  1. Delta CTE — same as Intersect: tags rows from both child deltas with branch indicator.
  2. Merge CTE — joins with storage table for old/new per-branch counts.
  3. Final CTE — detects boundary crossings using GREATEST(0, old_count_l - old_count_r) vs GREATEST(0, new_count_l - new_count_r).

Notes:

  • EXCEPT is not commutative — left branch is the positive input, right is subtracted.
  • Storage table requires hidden columns __pgt_count_l and __pgt_count_r.
  • Same 3-CTE structure as Intersect with different effective-count function.

Subquery

Module: src/dvm/operators/subquery.rs

Handles both inlined CTEs and explicit subqueries in FROM ((SELECT ...) AS alias).

Delta Rule:

$$\Delta(\rho_{\text{alias}}(Q)) = \rho_{\text{alias}}(\Delta Q)$$

A subquery wrapper is transparent for differentiation — it delegates to its child's delta and optionally renames output columns to match the subquery's column aliases.

SQL Generation:

-- If column aliases differ from child output columns:
SELECT __pgt_row_id, __pgt_action, child_col1 AS alias_col1, child_col2 AS alias_col2
FROM (<child_delta>)

If the child columns already match the aliases, the subquery is a pure passthrough — no additional CTE is emitted.

Notes:

  • This operator enables both CTE support (Tier 1) and standalone subqueries in FROM.
  • Column aliases on subqueries (FROM (...) AS x(a, b)) are handled by emitting a thin renaming CTE.
  • The subquery body is fully differentiated as a normal operator sub-tree.

CTE Scan (Shared Delta)

Module: src/dvm/operators/cte_scan.rs

Handles multi-reference CTEs by computing the CTE body's delta once and reusing it across all references (Tier 2).

Delta Rule:

$$\Delta(\text{CteScan}(\text{id}, Q)) = \text{cache}[\text{id}] \quad \text{(computed once, reused)}$$

When a CTE is referenced multiple times in a query, each reference produces a CteScan node with the same cte_id. The diff engine differentiates the CTE body once and caches the result. Subsequent CteScan nodes for the same CTE reuse the cached delta.

SQL Generation:

-- First reference: differentiates the CTE body and stores result in cache
-- Subsequent references: point to the same system CTE name
SELECT __pgt_row_id, __pgt_action, <columns>
FROM __pgt_cte_<cte_name>_delta  -- shared across all references

If column aliases are present, a thin renaming CTE is added on top of the cached delta.

Notes:

  • Without CteScan (Tier 1), multi-reference CTEs are inlined: each reference duplicates the full operator sub-tree. CteScan (Tier 2) eliminates this duplication.
  • The CTE body is pre-differentiated in dependency order (earlier CTEs before later ones that reference them).
  • Column alias support follows the same pattern as the Subquery operator.
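
For example (illustrative query; in a real stream table the CTE would read base tables rather than inline VALUES), both references to totals below share one cached delta instead of duplicating the sub-tree:

WITH totals(region, total) AS (
    VALUES ('east', 15), ('west', 7)
)
SELECT cur.region, cur.total, best.max_total
FROM totals cur
CROSS JOIN (SELECT max(total) AS max_total FROM totals) best;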

Recursive CTEs

Recursive CTEs (WITH RECURSIVE) are supported in FULL, DIFFERENTIAL, and IMMEDIATE modes, each with its own execution path:

FULL Mode

Recursive CTEs work out-of-the-box with refresh_mode = 'FULL'. The defining query is executed as-is via INSERT INTO ... SELECT ..., and PostgreSQL handles the iterative evaluation internally.

DIFFERENTIAL Mode (Three-Strategy Incremental Maintenance)

Recursive CTEs with refresh_mode = 'DIFFERENTIAL' use an automatic three-strategy approach, selected based on column compatibility and change type:

Strategy 1: Semi-Naive Evaluation (INSERT-only changes)

When only INSERT changes are present in the change buffer, pg_trickle uses semi-naive evaluation — the standard technique for incremental fixpoint computation. The base case is differentiated normally through the DVM operator tree, then the resulting delta is propagated through the recursive term using a nested WITH RECURSIVE:

WITH RECURSIVE
  __pgt_base_delta AS (
    -- Normal DVM differentiation of the base case (INSERT rows only)
    <differentiated base case>
  ),
  __pgt_rec_delta AS (
    -- Seed: base case delta rows
    SELECT cols FROM __pgt_base_delta WHERE __pgt_action = 'I'
    UNION ALL
    -- Seed: new base rows joining existing ST storage
    SELECT cols FROM <recursive term with self_ref = ST_storage, base = change_buffer>
    UNION ALL
    -- Propagation: recursive term applied to growing delta
    SELECT cols FROM <recursive term with self_ref = __pgt_rec_delta, base = full>
  )
SELECT pgtrickle.pg_trickle_hash(...) AS __pgt_row_id, 'I' AS __pgt_action, cols
FROM __pgt_rec_delta

The cost is proportional to the number of new rows produced by the change, not the full result set.
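
A self-contained illustration of the idea on a reachability-style CTE (hand-rolled data; the real SQL is generated and handles more cases): paths created by a newly inserted edge are derived without recomputing existing paths.

WITH RECURSIVE
  old_paths(src, dst) AS (VALUES (1, 2), (2, 3), (1, 3)),  -- current ST contents
  new_edges(src, dst) AS (VALUES (3, 4)),                  -- INSERT-only delta
  delta(src, dst) AS (
      SELECT src, dst FROM new_edges                 -- seed: the new edges
      UNION ALL
      SELECT p.src, e.dst                            -- seed: old paths extended by a new edge
      FROM old_paths p JOIN new_edges e ON e.src = p.dst
      UNION ALL
      SELECT d.src, p.dst                            -- propagate over the growing delta
      FROM delta d JOIN old_paths p ON p.src = d.dst
  )
SELECT DISTINCT src, dst FROM delta;
-- Result: (3,4), (2,4), (1,4): only the new paths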

Strategy 2: Delete-and-Rederive / DRed (mixed INSERT/DELETE/UPDATE changes)

When the change buffer contains DELETE or UPDATE changes, simple propagation is insufficient — a deleted base row may have transitively derived many recursive rows, some of which may still be derivable from alternative paths. DRed handles this in four phases:

  1. Insert propagation — semi-naive evaluation for the INSERT portion (same as Strategy 1)
  2. Over-deletion cascade — propagate base-case deletions through the recursive term against ST storage to find all transitively-derived rows that might be invalidated
  3. Rederivation — re-execute the recursive CTE from the remaining (non-deleted) base rows to restore any over-deleted rows that have alternative derivations
  4. Combine — final delta = inserts + (over-deletions − rederived rows)

This avoids full recomputation while correctly handling deletions with alternative derivation paths.

IMMEDIATE Mode

Recursive CTEs with refresh_mode = 'IMMEDIATE' use the same semi-naive and Delete-and-Rederive machinery as DIFFERENTIAL mode, but the base changes come from PostgreSQL statement transition tables instead of the background change buffer. This keeps the stream table transactionally up to date within the same statement. To guard against cyclic data or unexpectedly deep recursion, the semi-naive SQL injects a depth counter capped by pg_trickle.ivm_recursive_max_depth (default 100; set to 0 to disable the guard).

Strategy 3: Recomputation Fallback

When the CTE defines more columns than the outer SELECT projects (column mismatch), the incremental strategies cannot be used because the ST storage table lacks columns needed for recursive self-joins. In this case, the full defining query is re-executed and anti-joined against current storage:

WITH __pgt_recomp_new AS (
    SELECT pgtrickle.pg_trickle_hash(row_to_json(sub)::text) AS __pgt_row_id, col1, col2, ...
    FROM (<defining_query>) sub
),
__pgt_recomp_ins AS (
    SELECT n.__pgt_row_id, 'I'::text AS __pgt_action, n.col1, n.col2, ...
    FROM __pgt_recomp_new n
    LEFT JOIN <storage_table> s ON s.__pgt_row_id = n.__pgt_row_id
    WHERE s.__pgt_row_id IS NULL
),
__pgt_recomp_del AS (
    SELECT s.__pgt_row_id, 'D'::text AS __pgt_action, s.col1, s.col2, ...
    FROM <storage_table> s
    LEFT JOIN __pgt_recomp_new n ON n.__pgt_row_id = s.__pgt_row_id
    WHERE n.__pgt_row_id IS NULL
)
SELECT * FROM __pgt_recomp_ins
UNION ALL
SELECT * FROM __pgt_recomp_del

The cost is proportional to the full result set size.

Strategy Selection

| CTE columns match ST? | Change type | refresh_mode / DeltaSource | Strategy |
|---|---|---|---|
| ✅ Match | INSERT-only | DIFFERENTIAL (ChangeBuffer) | Semi-naive (Strategy 1) |
| ✅ Match | Mixed (INSERT+DELETE/UPDATE) | DIFFERENTIAL (ChangeBuffer) | DRed (Strategy 2) |
| ✅ Match | INSERT-only | IMMEDIATE (TransitionTable) | Semi-naive (Strategy 1) |
| ✅ Match | Mixed (INSERT+DELETE/UPDATE) | IMMEDIATE (TransitionTable) | DRed (Strategy 2) |
| ❌ Mismatch | Any | Any | Recomputation (Strategy 3) |

DRed in DIFFERENTIAL mode (P2-1 -- implemented in v0.10.0)

DRed is now active in both DIFFERENTIAL and IMMEDIATE modes when CTE output columns match ST storage columns. Phase 1 propagates inserts via semi-naive evaluation; Phase 2 cascades deletions through ST storage; Phase 3 rederives over-deleted rows that have alternative derivation paths; Phase 4 combines the results. DRed correctly handles derived-column changes such as path rebuilds under a renamed ancestor node. Column-mismatch cases still use recomputation fallback.

Notes:

  • Non-linear recursion (multiple self-references in the recursive term) is rejected — PostgreSQL restricts the recursive term to reference the CTE at most once.
  • The __pgt_row_id column (xxHash of the JSON-serialized row) is used for row identity.
  • For write-heavy workloads on very large recursive result sets with frequent mixed changes, refresh_mode = 'FULL' may still be more efficient than DRed.

Window Functions

Module: src/dvm/operators/window.rs

Handles window functions (ROW_NUMBER, RANK, DENSE_RANK, SUM() OVER, etc.) using partition-based recomputation.

Delta Rule:

When any row in a partition changes (insert, update, or delete), the entire partition's window function output is recomputed:

$$\Delta(\omega_{f,P}(R)) = \omega_{f,P}(R'|_{\text{affected partitions}}) - \omega_{f,P}(R|_{\text{affected partitions}})$$

Where $P$ is the PARTITION BY key and $f$ is the window function.

Strategy:

  1. Identify affected partition keys from the child delta.
  2. Delete old window function results for affected partitions from storage.
  3. Build the current input for affected partitions by excluding changed rows via NOT EXISTS on pass-through columns.
  4. Recompute the window function on the current input for affected partitions.
  5. Compute unique row IDs via row_to_json + row_number (handles tied values in ranking functions).
  6. Emit the recomputed rows as inserts.

SQL Generation:

-- CTE 1: Affected partition keys from delta
WITH affected_partitions AS (
    SELECT DISTINCT <partition_cols> FROM (<child_delta>)
),
-- CTE 2: Current input (surviving rows not in delta) for affected partitions
current_input AS (
    SELECT * FROM <child_snapshot>
    WHERE (<partition_cols>) IN (SELECT * FROM affected_partitions)
    AND NOT EXISTS (
        SELECT 1 FROM (<child_delta>) d
        WHERE d.<col1> IS NOT DISTINCT FROM <child_alias>.<col1>
        AND   d.<col2> IS NOT DISTINCT FROM <child_alias>.<col2> ...
    )
),
-- CTE 3: Recompute window function with unique row IDs
recomputed AS (
    SELECT *, pgtrickle.pg_trickle_hash(
        row_to_json(w)::text || '/' || row_number() OVER ()::text
    ) AS __pgt_row_id
    FROM (
        SELECT *, <window_func> OVER (PARTITION BY <partition_cols> ORDER BY <order_cols>) AS <alias>
        FROM current_input
    ) w
)
-- Delete old results + insert recomputed results
SELECT 'D' AS __pgt_action, ...  -- old rows from affected partitions
UNION ALL
SELECT 'I' AS __pgt_action, ...  -- recomputed rows

Notes:

  • The cost is proportional to the size of affected partitions, not the full table. For workloads where changes spread across few partitions, this is efficient.
  • When multiple window functions use different PARTITION BY clauses, the parser accepts all of them. If they share the same partition key it is used directly; otherwise the operator falls back to un-partitioned (full) recomputation.
  • Without PARTITION BY, the entire table is treated as a single partition — any change triggers a full recomputation.
  • Window functions wrapping aggregates (e.g., RANK() OVER (ORDER BY SUM(x))) are supported: the window diff rewrites ORDER BY / PARTITION BY expressions to reference aggregate output aliases via build_agg_alias_map.
  • Row IDs are computed from the full row content (row_to_json) plus a positional disambiguator (row_number) to avoid hash collisions with tied ranking values (DENSE_RANK, RANK).

Known Limitation: O(partition_size) Recomputation Cost

Any single-row change within a window partition triggers recomputation of the entire partition. For queries with large partitions (e.g., PARTITION BY region where a region has 500K rows), a single INSERT into that partition causes all 500K rows to be recomputed and diffed. This is inherent to the partition-based delta strategy — window functions cannot be incrementally maintained at sub-partition granularity because a single row insertion can shift the rank, row number, or running aggregate of every other row in the same partition.

Mitigation strategies:

  • Use more granular PARTITION BY keys to keep partition sizes small.
  • For queries without PARTITION BY, consider restructuring as a GROUP BY aggregate if the window function is equivalent (e.g., SUM(x) OVER () → SUM(x) as a scalar subquery).
  • Accept the cost for low-change-frequency partitions; the recomputation is still cheaper than a full table refresh since only affected partitions are touched.
  • If partition sizes routinely exceed 100K rows and changes are frequent, consider the FULL refresh mode which bypasses the per-partition delta entirely.

Window Frame Clauses:

Window frame specifications are fully supported:

  • Modes: ROWS, RANGE, GROUPS
  • Bounds: UNBOUNDED PRECEDING, N PRECEDING, CURRENT ROW, N FOLLOWING, UNBOUNDED FOLLOWING
  • Between syntax: BETWEEN <start> AND <end>
  • Exclusion: EXCLUDE CURRENT ROW, EXCLUDE GROUP, EXCLUDE TIES, EXCLUDE NO OTHERS

Example: SUM(val) OVER (ORDER BY ts ROWS BETWEEN 3 PRECEDING AND CURRENT ROW)

Named WINDOW Clauses:

Named window definitions are resolved from the query-level WINDOW clause:

SELECT id, SUM(val) OVER w, AVG(val) OVER w
FROM data
WINDOW w AS (PARTITION BY category ORDER BY ts)

The parser resolves OVER w by looking up the window definition from the WINDOW clause and merging partition, order, and frame specifications.


Lateral Function (Set-Returning Functions in FROM)

Module: src/dvm/operators/lateral_function.rs

Handles set-returning functions (SRFs) used in the FROM clause with implicit LATERAL semantics: jsonb_array_elements, jsonb_each, jsonb_each_text, unnest, etc.

Delta Rule:

When a source row changes (insert, update, or delete), the SRF expansion is re-evaluated only for that source row:

$$\Delta(R \ltimes f(R.\text{col})) = (R' \ltimes f(R'.\text{col}))|_{\text{changed rows}} - (R \ltimes f(R.\text{col}))|_{\text{changed rows}}$$

Where $R$ is the source table, $f$ is the SRF, and changed rows are identified via the child delta.

Strategy (Row-Scoped Recomputation):

  1. Propagate the child delta to identify changed source rows.
  2. Find all existing ST rows derived from changed source rows (via column matching).
  3. Delete old SRF expansions for those source rows.
  4. Re-expand the SRF for inserted/updated source rows.
  5. Emit deletes + inserts as the final delta.

SQL Generation (4-CTE chain):

-- CTE 1: Changed source rows from child delta
WITH lat_changed AS (
    SELECT DISTINCT "__pgt_row_id", "__pgt_action", <child_cols>
    FROM <child_delta>
),
-- CTE 2: Old ST rows for changed source rows (to be deleted)
lat_old AS (
    SELECT st."__pgt_row_id", st.<all_output_cols>
    FROM <st_table> st
    WHERE EXISTS (
        SELECT 1 FROM lat_changed cs
        WHERE st.<col1> IS NOT DISTINCT FROM cs.<col1>
          AND st.<col2> IS NOT DISTINCT FROM cs.<col2>
          ...
    )
),
-- CTE 3: Re-expand SRF for inserted/updated source rows
lat_expand AS (
    SELECT pg_trickle_hash(<all_cols>::text) AS "__pgt_row_id",
           cs.<child_cols>, <srf_alias>.<srf_cols>
    FROM lat_changed cs,
         LATERAL <srf_function>(cs.<arg>) AS <srf_alias>
    WHERE cs."__pgt_action" = 'I'
),
-- CTE 4: Final delta
lat_final AS (
    SELECT "__pgt_row_id", 'D' AS "__pgt_action", <cols> FROM lat_old
    UNION ALL
    SELECT "__pgt_row_id", 'I' AS "__pgt_action", <cols> FROM lat_expand
)

Row Identity:

Content-based: hash(child_columns || srf_result_columns). This is stable as long as the same source row produces the same expanded values.

Supported SRFs:

| Function | Output Columns | Notes |
|---|---|---|
| jsonb_array_elements(jsonb) | value (jsonb) | Expands JSONB array to rows |
| jsonb_array_elements_text(jsonb) | value (text) | Text variant |
| jsonb_each(jsonb) | key (text), value (jsonb) | Expands JSONB object to key-value pairs |
| jsonb_each_text(jsonb) | key (text), value (text) | Text variant |
| unnest(anyarray) | Element type | Unnests PostgreSQL arrays |
| Custom SRFs | User-provided column aliases | AS alias(col1, col2) |

Notes:

  • The cost is proportional to the number of changed source rows × average SRF expansion size, not the full table.
  • WITH ORDINALITY is supported — adds a bigint ordinality column to the output.
  • ROWS FROM() with multiple functions is not supported (rejected at parse time).
  • Column aliases (e.g., AS child(value)) are used to determine output column names; for known SRFs without aliases, the alias name becomes the column name.
  • JSON_TABLE (PostgreSQL 17+) — JSON_TABLE(expr, path COLUMNS (...)) is modeled as a LateralFunction and uses the same row-scoped recomputation strategy. Supported column types: regular, EXISTS, formatted, and nested columns with ON ERROR/ON EMPTY behaviors and PASSING clauses.
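
A minimal runnable example of the query shape this operator handles (inline data in place of a real source table):

SELECT o.id, item.value ->> 'sku' AS sku
FROM (VALUES (1, '[{"sku": "A"}, {"sku": "B"}]'::jsonb)) AS o(id, items),
     jsonb_array_elements(o.items) AS item(value);
-- Two output rows derive from the single source row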

Lateral Subquery (Correlated Subqueries in FROM)

Module: src/dvm/operators/lateral_subquery.rs

Handles correlated subqueries used in the FROM clause with explicit or implicit LATERAL semantics: FROM t, LATERAL (SELECT ... WHERE ref = t.col) AS alias or FROM t LEFT JOIN LATERAL (...) AS alias ON true.

Delta Rule:

When an outer row changes, the correlated subquery is re-executed only for that row:

$$\Delta(R \ltimes Q(R)) = (R' \ltimes Q(R'))|_{\text{changed rows}} - (R \ltimes Q(R))|_{\text{changed rows}}$$

Where $R$ is the outer table, $Q(R)$ is the correlated subquery, and changed rows are identified via the child delta.

Strategy (Row-Scoped Recomputation):

  1. Propagate the child delta to identify changed outer rows.
  2. Find all existing ST rows derived from changed outer rows (via column matching with IS NOT DISTINCT FROM).
  3. Delete old subquery expansions for those outer rows.
  4. Re-execute the subquery for inserted/updated outer rows using the original outer alias.
  5. Emit deletes + inserts as the final delta.

SQL Generation (4-CTE chain):

-- CTE 1: Changed outer rows from child delta
WITH lat_sq_changed AS (
    SELECT DISTINCT "__pgt_row_id", "__pgt_action", <child_cols>
    FROM <child_delta>
),
-- CTE 2: Old ST rows for changed outer rows (to be deleted)
lat_sq_old AS (
    SELECT st."__pgt_row_id", st.<all_output_cols>
    FROM <st_table> st
    WHERE EXISTS (
        SELECT 1 FROM lat_sq_changed cs
        WHERE st.<col1> IS NOT DISTINCT FROM cs.<col1>
          AND st.<col2> IS NOT DISTINCT FROM cs.<col2>
          ...
    )
),
-- CTE 3: Re-execute subquery for inserted/updated outer rows
lat_sq_expand AS (
    SELECT pg_trickle_hash(<all_cols>::text) AS "__pgt_row_id",
           <outer_alias>.<child_cols>, <sub_alias>.<sub_cols>
    FROM lat_sq_changed AS <outer_alias>,      -- Original outer alias!
         LATERAL (<subquery_sql>) AS <sub_alias>
    WHERE <outer_alias>."__pgt_action" = 'I'
),
-- CTE 4: Final delta
lat_sq_final AS (
    SELECT "__pgt_row_id", 'D' AS "__pgt_action", <cols> FROM lat_sq_old
    UNION ALL
    SELECT "__pgt_row_id", 'I' AS "__pgt_action", <cols> FROM lat_sq_expand
)

LEFT JOIN LATERAL Handling:

For queries using LEFT JOIN LATERAL (...) ON true, the expand CTE uses LEFT JOIN LATERAL instead of comma syntax and wraps subquery columns in COALESCE for hash stability:

lat_sq_expand AS (
    SELECT pg_trickle_hash(<outer_cols>::text || '/' || COALESCE(<sub_cols>::text, '')) AS "__pgt_row_id",
           <outer_alias>.<child_cols>, <sub_alias>.<sub_cols>
    FROM lat_sq_changed AS <outer_alias>
    LEFT JOIN LATERAL (<subquery_sql>) AS <sub_alias> ON true
    WHERE <outer_alias>."__pgt_action" = 'I'
)

Row Identity:

Content-based: hash(outer_columns || '/' || subquery_result_columns). For LEFT JOIN with NULL results, COALESCE ensures a stable hash.

Supported Patterns:

| Pattern | Syntax | Notes |
|---|---|---|
| Top-N per group | LATERAL (SELECT ... ORDER BY ... LIMIT N) | Most common use case |
| Correlated aggregate | LATERAL (SELECT SUM(x) FROM t WHERE t.fk = p.pk) | Returns single row per outer row |
| Existence with data | LEFT JOIN LATERAL (...) ON true | Preserves outer rows with NULLs |
| Multi-column lookup | LATERAL (SELECT a, b FROM t WHERE t.fk = p.pk LIMIT 1) | Multiple derived values |
| GROUP BY inside subquery | LATERAL (SELECT type, COUNT(*) FROM t WHERE t.fk = p.pk GROUP BY type) | Multiple rows per outer row |

Key Design Decision: Outer Alias Rewriting

The subquery body contains column references to the outer table (e.g., WHERE li.order_id = o.id). In the expansion CTE, the changed-sources CTE is aliased with the original outer table alias (e.g., lat_sq_changed AS o) so that the subquery's column references resolve naturally without rewriting.

Notes:

  • The cost is proportional to the number of changed outer rows × average subquery result size, not the full table.
  • The subquery is stored as raw SQL (like LateralFunction) because it cannot be independently differentiated — it depends on outer row context.
  • Source table OIDs referenced by the subquery body are extracted at parse time for CDC trigger setup.
  • ORDER BY + LIMIT inside the subquery are valid (they apply per-outer-row, not to the stream table).

Semi-Join (EXISTS / IN Subquery)

Module: src/dvm/operators/semi_join.rs

Handles WHERE EXISTS (SELECT ... FROM ...) and WHERE col IN (SELECT ...) patterns. The parser transforms these into a SemiJoin operator with a left (outer) child, a right (inner) child, and a join condition.

Delta Rule:

$$\Delta(L \ltimes R) = \Delta L|_{R} + L|_{\Delta R \text{ causes existence change}}$$

  • Part 1: Outer rows that changed and still satisfy the semi-join condition.
  • Part 2: Existing outer rows whose semi-join result flipped due to inner changes (a matching inner row was inserted or deleted).

Strategy (Two-Part Delta):

  1. Part 1 (outer delta): Filter delta_left to rows that have at least one match in the current right-hand snapshot.
  2. Part 2 (inner delta): For each row in the left snapshot, check whether the existence of matching right-hand rows changed between the old and current state. Emit 'I' if a match appeared, 'D' if all matches disappeared.

The "old" right-hand state is reconstructed from the current state by reversing the delta: R_old = (R_current EXCEPT ALL delta_right(action='I')) UNION ALL delta_right(action='D').

Row Identity:

  • Part 1: Uses __pgt_row_id from the left delta.
  • Part 2: Content-based hash via pg_trickle_hash_multi on left-side columns.

Supported Patterns:

| Pattern | SQL | Notes |
|---|---|---|
| EXISTS | WHERE EXISTS (SELECT 1 FROM t WHERE t.fk = s.pk) | Direct semi-join |
| IN (subquery) | WHERE id IN (SELECT fk FROM t) | Rewritten to EXISTS with equality |
| Multiple conditions | WHERE EXISTS (... AND ...) | Additional predicates in subquery WHERE |

Anti-Join (NOT EXISTS / NOT IN Subquery)

Module: src/dvm/operators/anti_join.rs

Handles WHERE NOT EXISTS (SELECT ... FROM ...) and WHERE col NOT IN (SELECT ...) patterns. The inverse of the semi-join operator.

Delta Rule:

$$\Delta(L \triangleright R) = \Delta L|_{\neg R} + L|_{\Delta R \text{ causes existence change}}$$

  • Part 1: Outer rows that changed and have no match in the right-hand snapshot.
  • Part 2: Existing outer rows whose anti-join result flipped due to inner changes.

Strategy (Two-Part Delta):

  1. Part 1 (outer delta): Filter delta_left to rows with NOT EXISTS in the current right snapshot.
  2. Part 2 (inner delta): For each row in the left snapshot, detect existence changes. Emit 'D' if a match appeared (row no longer qualifies), 'I' if all matches disappeared (row now qualifies).

Note the inverted semantics compared to semi-join: a new match means deletion, losing all matches means insertion.

Row Identity: Same as semi-join.

Supported Patterns:

| Pattern | SQL | Notes |
|---|---|---|
| NOT EXISTS | WHERE NOT EXISTS (SELECT 1 FROM t WHERE t.fk = s.pk) | Direct anti-join |
| NOT IN (subquery) | WHERE id NOT IN (SELECT fk FROM t) | Rewritten to NOT EXISTS with equality |

Scalar Subquery (Correlated SELECT Subquery)

Module: src/dvm/operators/scalar_subquery.rs

Handles scalar subqueries appearing in the SELECT list, e.g., SELECT a, (SELECT max(x) FROM t) AS mx FROM s. The subquery must return exactly one row and one column.

Delta Rule:

$$\Delta(L \times q) = \Delta L \times q' + L \times (q' - q)$$

Where $q$ is the scalar subquery value and $q'$ is the updated value.

Strategy (Two-Part Delta):

  1. Part 1 (outer delta): Propagate the child delta, appending the current scalar subquery value to each row.
  2. Part 2 (scalar value change): When the scalar subquery's result changes, emit deletes for all existing outer rows (with the old scalar value) and re-inserts for all outer rows (with the new value). The old scalar value is reconstructed by reversing the inner delta.

SQL Generation (3 or 4 CTEs):

-- Part 1: child delta + current scalar value
WITH sq_outer AS (
    SELECT *, (<scalar_subquery>) AS "<alias>"
    FROM <child_delta>
),
-- Part 2a: DELETE all outer rows when scalar changed
sq_del AS (
    SELECT "__pgt_row_id", 'D' AS "__pgt_action", <cols>
    FROM <st_table>
    WHERE (<scalar_old>) IS DISTINCT FROM (<scalar_current>)
),
-- Part 2b: INSERT all outer rows with new scalar value
sq_ins AS (
    SELECT pg_trickle_hash_multi(...) AS "__pgt_row_id",
           'I' AS "__pgt_action", <cols>, (<scalar_current>) AS "<alias>"
    FROM <source_snapshot>
    WHERE (<scalar_old>) IS DISTINCT FROM (<scalar_current>)
)
-- Final: UNION ALL of all parts
SELECT * FROM sq_outer
UNION ALL SELECT * FROM sq_del
UNION ALL SELECT * FROM sq_ins

Row Identity:

  • Part 1: __pgt_row_id from the child delta.
  • Part 2: Content-based hash via pg_trickle_hash_multi on all output columns.

Notes:

  • The scalar subquery is stored as raw SQL (deparsed from the parse tree).
  • The old scalar value is approximated using the same EXCEPT ALL / UNION ALL reversal technique as semi/anti-join.
  • If the scalar subquery references a table that changes, all outer rows must be re-evaluated — the delta can be large.
  • Source OIDs used by the scalar subquery are captured at parse time for CDC trigger registration.

Operator Tree Construction

The DVM engine builds the operator tree by analyzing the parsed query:

  1. WITH clause → CTE definitions extracted into a name→body map (non-recursive) or CTE registry (multi-reference)
  2. FROM clause → Scan nodes for physical tables; Subquery nodes for inlined CTEs and subqueries in FROM; CteScan nodes for multi-reference CTEs; LateralFunction nodes for SRFs and JSON_TABLE in FROM; LateralSubquery nodes for correlated subqueries in FROM
  3. JOIN → Join or OuterJoin wrapping two sub-trees
  4. LATERAL SRFs → LateralFunction wrapping the left-hand FROM item as its child
  5. LATERAL subqueries → LateralSubquery wrapping the left-hand FROM item as its child (comma syntax or JOIN LATERAL)
  6. WHERE subqueries → SemiJoin for EXISTS/IN (subquery), AntiJoin for NOT EXISTS/NOT IN (subquery), extracted from the WHERE clause
  7. Scalar subqueries → ScalarSubquery for (SELECT ...) in the SELECT list, wrapping the child tree
  8. WHERE → Filter wrapping the scan/join tree (remaining non-subquery predicates)
  9. SELECT list → Project for column selection and expressions
  10. GROUP BY → Aggregate wrapping the filtered/projected tree
  11. DISTINCT → Distinct on top
  12. UNION ALL → UnionAll combining two complete sub-trees
  13. INTERSECT / EXCEPT → Intersect or Except combining two sub-trees with dual-count tracking
  14. Window functions → Window wrapping the sub-tree with PARTITION BY / ORDER BY metadata
  15. ORDER BY → silently discarded (storage row order is undefined)
  16. LIMIT / OFFSET → ORDER BY + LIMIT [+ OFFSET] is accepted as TopK (scoped recomputation); standalone LIMIT or OFFSET without ORDER BY is rejected

For recursive CTEs (WITH RECURSIVE), the query is parsed into an OpTree with RecursiveCte operator nodes. In DIFFERENTIAL mode, the strategy (semi-naive, DRed, or recomputation) is selected automatically based on column compatibility and change type — see the Recursive CTEs section above for details.

The tree is then traversed bottom-up during delta generation: each operator's generate_delta_sql() method composes its SQL fragment around the output of its child operator(s).
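
For example (hypothetical schema), the defining query below parses into Scan leaves under a Join, with Filter, Project, and Aggregate layered on top:

-- Aggregate(GROUP BY c.region)
--   └─ Project(c.region, o.amount)
--        └─ Filter(o.status = 'active')
--             └─ Join(o.cust_id = c.id)
--                  ├─ Scan(orders)
--                  └─ Scan(customers)
SELECT c.region, sum(o.amount) AS total
FROM orders o
JOIN customers c ON o.cust_id = c.id
WHERE o.status = 'active'
GROUP BY c.region;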



DVM SQL Rewrite Rules

This document describes the transformation pipeline in src/dvm/parser/rewrites.rs that prepares a defining query for differentiation by the DVM (Differential View Maintenance) engine.

Each rewrite pass targets a specific SQL pattern, transforms it into a form the DVM engine can differentiate, and has a formal algebraic correctness argument.


Rewrite Pipeline Order

The rewrite passes are applied in sequence. Each pass may be iterated until a fixed point (no further changes) is reached.

  1. View Inlining — Replace view references with their definitions
  2. Grouping Sets Expansion — Expand CUBE/ROLLUP into UNION ALL
  3. EXISTS → Anti/Semi-Join — Convert correlated EXISTS to join operators
  4. Scalar Sublink Hoisting — Lift scalar subqueries to CTEs
  5. Delta Key Restriction — Push join key filters into R_old snapshots

1. View Inlining (rewrite_views_inline)

Input Pattern: SELECT ... FROM my_view v WHERE ...

Transformation: Replace my_view with its pg_get_viewdef() body as a subquery: SELECT ... FROM (SELECT ... FROM base_tables) v WHERE ...

Correctness: A view is semantically equivalent to its definition. Inlining is required because the DVM engine needs to see the base tables to generate per-table change buffer references.

Before:

-- Defining query referencing a view
SELECT o.customer_id, SUM(o.amount) AS total
FROM order_summary_view o
GROUP BY o.customer_id

After:

-- View inlined; base tables are now visible for CDC binding
SELECT o.customer_id, SUM(o.amount) AS total
FROM (
    SELECT orders.customer_id,
           orders.amount,
           orders.created_at
    FROM public.orders
    WHERE orders.status = 'completed'
) o
GROUP BY o.customer_id

The inlined form allows the DVM engine to bind orders as the CDC source and generate delta SQL that reads from pgtrickle_changes.changes_<orders_oid> instead of the whole table.


2. Grouping Sets Expansion (rewrite_grouping_sets)

Input Pattern: SELECT ... GROUP BY CUBE(a, b) or GROUP BY ROLLUP(a, b)

Transformation: Expand into a UNION ALL of individual GROUP BY combinations. CUBE(a, b) → GROUP BY (a, b) UNION ALL GROUP BY (a) UNION ALL GROUP BY (b) UNION ALL GROUP BY ().

Correctness: CUBE/ROLLUP is algebraically equivalent to the union of all grouping combinations. The DVM engine differentiates each branch independently, and the UNION ALL operator merges the deltas.

Guard: pg_trickle.max_grouping_set_branches (default 64) limits explosion for high-dimensional CUBE expressions.

Before:

-- ROLLUP over region + product_type
SELECT region, product_type, SUM(revenue) AS total
FROM sales
GROUP BY ROLLUP(region, product_type)

After:

-- Expanded to three GROUP BY branches
SELECT region, product_type, SUM(revenue) AS total
FROM sales
GROUP BY region, product_type

UNION ALL

SELECT region, NULL AS product_type, SUM(revenue) AS total
FROM sales
GROUP BY region

UNION ALL

SELECT NULL AS region, NULL AS product_type, SUM(revenue) AS total
FROM sales

Each branch is an independent leaf node in the OpTree. The DVM engine differentiates each branch by computing delta rows from the change buffer, then merges the results via the UNION ALL parent node.


3. EXISTS → Anti/Semi-Join Conversion

Input Pattern:

SELECT ... FROM t1 WHERE EXISTS (SELECT 1 FROM t2 WHERE t2.key = t1.key)
SELECT ... FROM t1 WHERE NOT EXISTS (SELECT 1 FROM t2 WHERE t2.key = t1.key)

Transformation: Convert to OpTree::SemiJoin or OpTree::AntiJoin with the extracted condition as the join predicate.

Correctness: EXISTS (correlated subquery) is equivalent to a semi-join; NOT EXISTS is equivalent to an anti-join. The DVM engine has specialized delta operators for both.
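
Expressed in plain SQL for intuition only (the actual transformation produces OpTree nodes, not SQL, and this equivalence assumes non-NULL join keys):

-- Semi-join: keep t1 rows that have at least one match in t2
SELECT t1.* FROM t1 WHERE t1.key IN (SELECT t2.key FROM t2);

-- Anti-join: keep t1 rows that have no match in t2
SELECT t1.* FROM t1 LEFT JOIN t2 ON t2.key = t1.key WHERE t2.key IS NULL;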


4. Scalar Sublink Hoisting

Input Pattern: Scalar subqueries in SELECT or WHERE:

SELECT a, (SELECT max(b) FROM t2 WHERE t2.key = t1.key) FROM t1

Transformation: Hoist the scalar subquery to a CTE and replace with a reference:

WITH __pgt_scalar_1 AS (SELECT key, max(b) AS val FROM t2 GROUP BY key)
SELECT a, s.val FROM t1 LEFT JOIN __pgt_scalar_1 s ON s.key = t1.key

Correctness: A correlated scalar subquery is equivalent to a left join to its grouped equivalent. The CTE form allows the DVM engine to differentiate the subquery as a separate operator node.


5. Delta Key Restriction (DI-6)

Input Pattern: Anti-join / semi-join R_old snapshots that scan the full right table.

Transformation: Push equi-join key filters from the delta into the R_old snapshot to restrict it to only the changed keys.

Correctness: Only right-side rows matching changed keys can affect the anti/semi-join output. Restricting R_old to changed keys preserves correctness while reducing the scan from O(n) to O(Δ).

Before:

-- Anti-join delta: which left rows lost their right-side match?
-- R_old scans ALL of the right table (O(n))
SELECT l.*
FROM left_table l
WHERE NOT EXISTS (
    SELECT 1 FROM right_table r_old WHERE r_old.key = l.key
)
AND EXISTS (
    SELECT 1 FROM delta_right d WHERE d.key = l.key
)

After:

-- R_old restricted to only rows matching changed keys (O(Δ))
SELECT l.*
FROM left_table l
WHERE NOT EXISTS (
    SELECT 1 FROM right_table r_old
    WHERE r_old.key = l.key
      AND r_old.key IN (SELECT key FROM delta_right)  -- <-- restriction added
)
AND EXISTS (
    SELECT 1 FROM delta_right d WHERE d.key = l.key
)

This rewrite is critical for join-heavy queries: without it, every anti-join delta scan reads the full right table regardless of how many rows actually changed.


Adding New Rewrite Passes

To add a new rewrite pass:

  1. Add the function in src/dvm/parser/rewrites.rs
  2. Add unit tests asserting the expected SQL output for a reference input
  3. Insert the pass at the correct position in the pipeline
  4. Document the pass in this file with input pattern, transformation, and correctness argument


pg_trickle — Benchmark Guide

This document explains how the database-level refresh benchmarks work and how to interpret their output.


Overview

The benchmark suite in tests/e2e_bench_tests.rs measures wall-clock refresh time for FULL vs DIFFERENTIAL mode across a matrix of table sizes, change rates, and query complexities. Each benchmark spawns an isolated PostgreSQL 18.x container via Testcontainers, ensuring reproducible and interference-free measurements.

The core question the benchmarks answer:

How much faster is a DIFFERENTIAL refresh compared to a FULL refresh, given a specific workload?


Prerequisites

Build the E2E test Docker image before running any benchmarks:

./tests/build_e2e_image.sh

Docker must be running on the host.


Running Benchmarks

All benchmark tests are tagged #[ignore] so they are skipped during normal CI. The --nocapture flag is required to see the printed output tables.

Quick Spot Checks (~5–10 seconds each)

# Simple scan, 10K rows, 1% change rate
cargo test --test e2e_bench_tests --features pg18 -- --ignored --nocapture bench_scan_10k_1pct

# Aggregate query, 100K rows, 1% change rate
cargo test --test e2e_bench_tests --features pg18 -- --ignored --nocapture bench_aggregate_100k_1pct

# Join + aggregate, 100K rows, 10% change rate
cargo test --test e2e_bench_tests --features pg18 -- --ignored --nocapture bench_join_agg_100k_10pct

Zero-Change Latency (~5 seconds)

cargo test --test e2e_bench_tests --features pg18 -- --ignored --nocapture bench_no_data_refresh_latency

Full Matrix (~15–30 minutes)

Runs all 30 combinations and prints a consolidated summary:

cargo test --test e2e_bench_tests --features pg18 -- --ignored --nocapture bench_full_matrix

Run All Benchmarks in Parallel

cargo test --test e2e_bench_tests --features pg18 -- --ignored --nocapture

Note: each test starts its own container, so parallel execution requires sufficient Docker resources.


Benchmark Dimensions

Table Sizes

| Size   | Rows    | Purpose                                         |
|--------|---------|-------------------------------------------------|
| Small  | 10,000  | Fast iteration; measures per-row overhead       |
| Medium | 100,000 | More realistic; reveals scaling characteristics |

Change Rates

| Rate | Description                                            |
|------|--------------------------------------------------------|
| 1%   | Low churn — the sweet spot for incremental refresh     |
| 10%  | Moderate churn — tests delta query scalability         |
| 50%  | High churn — stress test; approaches full-refresh cost |

Query Complexities

| Scenario | Defining Query | Operators Tested |
|----------|----------------|------------------|
| scan | SELECT id, region, category, amount, score FROM src | Table scan only |
| filter | SELECT id, region, amount FROM src WHERE amount > 5000 | Scan + filter (WHERE) |
| aggregate | SELECT region, SUM(amount), COUNT(*) FROM src GROUP BY region | Scan + group-by aggregate |
| join | SELECT s.id, s.region, s.amount, d.region_name FROM src s JOIN dim d ON ... | Scan + inner join |
| join_agg | SELECT d.region_name, SUM(s.amount), COUNT(*) FROM src s JOIN dim d ON ... GROUP BY ... | Scan + join + aggregate |

DML Mix per Cycle

Each change cycle applies a realistic mix of operations:

| Operation | Fraction | Example at 10K rows, 10% rate    |
|-----------|----------|----------------------------------|
| UPDATE    | 70%      | 700 rows have amount incremented |
| DELETE    | 15%      | 150 rows removed                 |
| INSERT    | 15%      | 150 new rows added               |
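
A sketch of what one such cycle could look like at 10K rows / 10% change rate (the actual test generates DML programmatically; table and column names match the scan scenario above, value expressions are illustrative):

-- 700 updates: increment amount on random rows
UPDATE src SET amount = amount + 1
WHERE id IN (SELECT id FROM src ORDER BY random() LIMIT 700);

-- 150 deletes: remove random rows
DELETE FROM src
WHERE id IN (SELECT id FROM src ORDER BY random() LIMIT 150);

-- 150 inserts: add new rows
INSERT INTO src (id, region, category, amount, score)
SELECT 10000 + g, 'r' || (g % 5), 'c' || (g % 10), (g * 37) % 9000, random()
FROM generate_series(1, 150) AS g;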

What Each Benchmark Does

1. Start a fresh PostgreSQL 18.x container
2. Install the pg_trickle extension
3. Create and populate the source table (10K or 100K rows)
4. Create dimension table if needed (for join scenarios)
5. ANALYZE for stable query plans

── FULL mode ──
6. Create a Stream Table in FULL refresh mode
7. For each of 3 cycles:
   a. Apply random DML (updates + deletes + inserts)
   b. ANALYZE
   c. Time the FULL refresh (TRUNCATE + re-execute entire query)
   d. Record refresh_ms and ST row count
8. Drop the FULL-mode ST

── DIFFERENTIAL mode ──
9. Reset source table to same starting state
10. Create a Stream Table in DIFFERENTIAL refresh mode
11. For each of 3 cycles:
    a. Apply random DML (same parameters)
    b. ANALYZE
    c. Time the DIFFERENTIAL refresh (delta query + MERGE)
    d. Record refresh_ms and ST row count

12. Print results table and summary

Both modes start from the same data to ensure a fair comparison. The 3-cycle design captures warm-up effects (cycle 1 may be slower due to plan caching).


Reading the Output

Detail Table

╔══════════════════════════════════════════════════════════════════════════════════════╗
║                    pg_trickle Refresh Benchmark Results                      ║
╠════════════╤══════════╤════════╤═════════════╤═══════╤════════════╤═════════════════╣
║ Scenario   │ Rows     │ Chg %  │ Mode        │ Cycle │ Refresh ms │ ST Rows         ║
╠════════════╪══════════╪════════╪═════════════╪═══════╪════════════╪═════════════════╣
║ aggregate  │    10000 │     1% │ FULL        │     1 │       22.1 │               5 ║
║ aggregate  │    10000 │     1% │ FULL        │     2 │        4.8 │               5 ║
║ aggregate  │    10000 │     1% │ FULL        │     3 │        5.3 │               5 ║
║ aggregate  │    10000 │     1% │ DIFFERENTIAL │     1 │        8.4 │               5 ║
║ aggregate  │    10000 │     1% │ DIFFERENTIAL │     2 │        4.4 │               5 ║
║ aggregate  │    10000 │     1% │ DIFFERENTIAL │     3 │        4.6 │               5 ║
╚════════════╧══════════╧════════╧═════════════╧═══════╧════════════╧═════════════════╝

| Column | Meaning |
|--------|---------|
| Scenario | Query complexity level (scan, filter, aggregate, join, join_agg) |
| Rows | Number of rows in the base table |
| Chg % | Percentage of rows changed per cycle |
| Mode | FULL (truncate + recompute) or DIFFERENTIAL (delta + merge) |
| Cycle | Which of the 3 measurement rounds (cycle 1 often includes warm-up) |
| Refresh ms | Wall-clock time for the refresh operation |
| ST Rows | Row count in the Stream Table after refresh (sanity check) |

Summary Table

┌─────────────────────────────────────────────────────────────────────────┐
│                        Summary (avg ms per cycle)                       │
├────────────┬──────────┬────────┬─────────────────┬──────────────────────┤
│ Scenario   │ Rows     │ Chg %  │ FULL avg ms     │ DIFFERENTIAL avg ms   │
├────────────┼──────────┼────────┼─────────────────┼──────────────────────┤
│ aggregate  │    10000 │     1% │       10.7       │        5.8 (  1.8x) │
└────────────┴──────────┴────────┴─────────────────┴──────────────────────┘

The Speedup value in parentheses is FULL avg / DIFFERENTIAL avg — how many times faster the incremental refresh is compared to a full refresh.


Interpreting the Speedup

What to Expect

| Change Rate | Table Size | Expected Speedup | Explanation |
|-------------|------------|------------------|-------------|
| 1% | 10K | 1.5–5x | Small table; overhead is similar, delta is tiny |
| 1% | 100K | 5–50x | Larger table amplifies full-refresh cost |
| 10% | 100K | 2–10x | Moderate delta; still significantly faster |
| 50% | any | 1–2x | Delta is nearly as large as full table |

Rules of Thumb

| Speedup | Interpretation |
|---------|----------------|
| > 10x | Strong win for DIFFERENTIAL — typical at low change rates on larger tables |
| 5–10x | Clear advantage for DIFFERENTIAL |
| 2–5x | Moderate advantage — DIFFERENTIAL is the right choice |
| 1–2x | Marginal gain — either mode is acceptable |
| ~1x | Break-even — change rate is too high for incremental to help |
| < 1x | DIFFERENTIAL is slower — would indicate overhead exceeds savings (investigate) |

Key Patterns to Look For

  1. Scaling with table size: For the same change rate, speedup should increase with table size. FULL must re-process all rows; DIFFERENTIAL processes only the delta.

  2. Degradation with change rate: As the change rate rises from 1% → 50%, speedup should decrease. At 50%, DIFFERENTIAL processes half the table, which approaches FULL cost.

  3. Query complexity amplifies speedup: Aggregate and join queries benefit more from DIFFERENTIAL because they avoid expensive re-computation. A join_agg at 1% changes should show higher speedup than a simple scan at the same parameters.

  4. Cycle 1 warm-up: The first cycle in each mode may be slower due to PostgreSQL plan cache population. Use cycles 2–3 for the steadiest numbers.

  5. ST Rows consistency: The ST row count should be similar between FULL and DIFFERENTIAL for the same scenario (accounting for random DML). Large discrepancies indicate a correctness issue; a manual version of this check is sketched below.
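
A quick manual version of the row-count sanity check (all names illustrative): the stream table and a fresh run of its defining query should agree.

SELECT
  (SELECT count(*) FROM region_totals) AS st_rows,
  (SELECT count(*)
   FROM (SELECT region, SUM(amount) FROM src GROUP BY region) q) AS fresh_rows;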


Zero-Change Latency

The bench_no_data_refresh_latency test measures the overhead of a refresh when no data has changed — the NO_DATA code path.

┌──────────────────────────────────────────────┐
│ NO_DATA Refresh Latency (10 iterations)      │
├──────────────────────────────────────────────┤
│ Avg:     3.21 ms                             │
│ Max:     5.10 ms                             │
│ Target: < 10 ms                              │
│ Status: ✅ PASS                              │
└──────────────────────────────────────────────┘

| Metric | Meaning |
|--------|---------|
| Avg | Average wall-clock time across 10 no-op refreshes |
| Max | Worst-case single iteration |
| Target | The PLAN.md goal: < 10 ms per no-op refresh |
| Status | PASS if avg < 10 ms, SLOW otherwise |

A passing result confirms the scheduler's per-cycle overhead is negligible. Values > 10 ms in containerized environments may be acceptable due to Docker overhead; bare-metal PostgreSQL should comfortably meet the target.


Available Tests

Individual Tests (10K rows)

| Test Name | Scenario | Change Rate |
|-----------|----------|-------------|
| bench_scan_10k_1pct | scan | 1% |
| bench_scan_10k_10pct | scan | 10% |
| bench_scan_10k_50pct | scan | 50% |
| bench_filter_10k_1pct | filter | 1% |
| bench_aggregate_10k_1pct | aggregate | 1% |
| bench_join_10k_1pct | join | 1% |
| bench_join_agg_10k_1pct | join_agg | 1% |

Individual Tests (100K rows)

| Test Name | Scenario | Change Rate |
|-----------|----------|-------------|
| bench_scan_100k_1pct | scan | 1% |
| bench_scan_100k_10pct | scan | 10% |
| bench_scan_100k_50pct | scan | 50% |
| bench_aggregate_100k_1pct | aggregate | 1% |
| bench_aggregate_100k_10pct | aggregate | 10% |
| bench_join_agg_100k_1pct | join_agg | 1% |
| bench_join_agg_100k_10pct | join_agg | 10% |

Special Tests

| Test Name | Description |
|-----------|-------------|
| bench_full_matrix | All 30 combinations (5 queries × 2 sizes × 3 rates) |
| bench_no_data_refresh_latency | Zero-change overhead (10 iterations) |

Nexmark Streaming Benchmark

The Nexmark benchmark validates correctness against a sustained high-frequency DML workload modelling an online auction system. It is adapted from the Nexmark benchmark specification used by streaming systems like Flink, Feldera, and Materialize.

Data Model

| Table | Description | Default Size |
|-------|-------------|--------------|
| person | Registered users (sellers/bidders) | 100 rows |
| auction | Items listed for sale | 500 rows |
| bid | Bids placed on auctions | 2,000 rows |

Queries

| Query | Features | Description |
|-------|----------|-------------|
| Q0 | Passthrough | Identity projection of all bids |
| Q1 | Projection + arithmetic | Currency conversion |
| Q2 | Filter | Bids on specific auctions |
| Q3 | JOIN + filter | Local item suggestion (person-auction join) |
| Q4 | JOIN + GROUP BY + AVG | Average selling price by category |
| Q5 | GROUP BY + COUNT | Hot items (bid count per auction) |
| Q6 | JOIN + GROUP BY + AVG | Average bid price per seller |
| Q7 | Aggregate (MAX) | Highest bid price |
| Q8 | JOIN | Person-auction join (new users monitoring) |
| Q9 | JOIN + DISTINCT ON | Winning bid per auction with bidder info |

Running Nexmark Tests

# Default scale (100 persons, 500 auctions, 2000 bids, 3 cycles)
cargo test --test e2e_nexmark_tests -- --ignored --test-threads=1 --nocapture

# Larger scale
NEXMARK_PERSONS=1000 NEXMARK_AUCTIONS=5000 NEXMARK_BIDS=50000 NEXMARK_CYCLES=5 \
  cargo test --test e2e_nexmark_tests -- --ignored --test-threads=1 --nocapture

What Each Cycle Does

Each refresh cycle applies three mutation functions (RF1-RF3) then refreshes all stream tables and asserts multiset equality:

  1. RF1 (INSERT): New persons, auctions, and bids
  2. RF2 (DELETE): Remove oldest bids, orphaned auctions, orphaned persons
  3. RF3 (UPDATE): Price changes, reserve adjustments, city moves
  4. Refresh + Assert: Differential refresh → EXCEPT ALL correctness check

Correctness Validation

The test uses the same DBSP invariant as TPC-H: after every differential refresh, the stream table must be multiset-equal to re-executing the defining query from scratch (symmetric EXCEPT ALL). Additionally, negative __pgt_count values (over-retraction bugs) are detected.
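
A sketch of the symmetric EXCEPT ALL check, assuming a stream table q5_hot_items over a simplified Q5 defining query (both names illustrative):

-- Both directions must be empty for multiset equality.
(SELECT * FROM q5_hot_items
 EXCEPT ALL
 SELECT auction, count(*) FROM bid GROUP BY auction)
UNION ALL
(SELECT auction, count(*) FROM bid GROUP BY auction
 EXCEPT ALL
 SELECT * FROM q5_hot_items);
-- Expected result: zero rows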


DAG Topology Benchmarks

The DAG topology benchmark suite in tests/e2e_dag_bench_tests.rs measures end-to-end propagation latency and throughput through multi-level DAG topologies. While the single-ST benchmarks above measure per-operator refresh speed, these benchmarks measure how efficiently changes propagate through chains, fan-outs, diamonds, and mixed topologies with 5–100+ stream tables.

The core questions these benchmarks answer:

How long does it take for a source-table INSERT to propagate through an entire DAG to the leaf stream tables?

How does PARALLEL refresh mode compare to CALCULATED mode across different topology shapes?

Running DAG Benchmarks

# Full suite (rebuilds Docker image)
just test-dag-bench

# Skip Docker image rebuild
just test-dag-bench-fast

# Individual topology tests
cargo test --test e2e_dag_bench_tests --features pg18 -- --ignored bench_latency_linear_5 --test-threads=1 --nocapture
cargo test --test e2e_dag_bench_tests --features pg18 -- --ignored bench_throughput_diamond --test-threads=1 --nocapture

Topology Patterns

| Topology | Shape | Description |
|----------|-------|-------------|
| Linear Chain | src → st_1 → st_2 → ... → st_N | Sequential pipeline; L1 aggregate, L2+ alternating project/filter |
| Wide DAG | src → [W parallel chains × D deep] | W independent chains of depth D from a shared source; tests parallel refresh mode |
| Fan-Out Tree | src → root → [b children] → [b² grandchildren] → ... | Exponential fan-out; each parent spawns b children with filter/project variants |
| Diamond | src → [fan-out aggregates] → JOIN → [extension] | Fan-out to independent aggregates (SUM/COUNT/MAX/MIN/AVG) then converge via JOIN |
| Mixed | Two sources, 4 layers, ~15 STs | Realistic e-commerce scenario with chains, fan-out, cross-source joins, and alerts |

Measurement Modes

Latency benchmarks (auto-refresh): The scheduler is enabled with a 200 ms interval. The test INSERTs into the source table and polls pgt_refresh_history until the leaf stream table has a new COMPLETED entry. This measures the full propagation latency including scheduler overhead.

Throughput benchmarks (manual refresh): The scheduler is disabled. The test applies mixed DML (70% UPDATE, 15% DELETE, 15% INSERT) then manually refreshes all STs in topological order. This isolates pure refresh cost from scheduler overhead.

Theoretical Comparison

Each latency benchmark computes the theoretical prediction from PLAN_DAG_PERFORMANCE.md and reports the delta:

| Mode | Formula |
|------|---------|
| CALCULATED | L = I_s + N × T_r |
| PARALLEL(C) | L = Σ ⌈W_l / C⌉ × max(I_p, T_r) per level |

Where T_r is the measured average per-ST refresh time, I_s = 200 ms (scheduler interval), and C is the concurrency limit. For example, a 10-deep linear chain in CALCULATED mode with T_r ≈ 50 ms predicts L = 200 + 10 × 50 = 700 ms, which is the theory_ms value in the sample output below.

Reading the Output

Per-Cycle Machine-Parseable Lines (stderr)

[DAG_BENCH] topology=linear_chain mode=CALCULATED sts=10 depth=10 width=1 cycle=1 actual_ms=820.3 theory_ms=700.0 overhead_pct=17.2 per_hop_ms=82.0

ASCII Summary Table (stdout)

╔══════════════════════════════════════════════════════════════════════════════════════════════════════╗
║                         pg_trickle DAG Topology Benchmark Results                                 ║
╠═══════════════╤═══════════════╤══════╤═══════╤═══════╤════════════╤════════════╤═══════════════════╣
║ Topology      │ Mode          │ STs  │ Depth │ Width │ Actual ms  │ Theory ms  │ Overhead          ║
╠═══════════════╪═══════════════╪══════╪═══════╪═══════╪════════════╪════════════╪═══════════════════╣
║ linear_chain  │ CALCULATED    │   10 │    10 │     1 │      820.3 │      700.0 │ +17.2%            ║
║ wide_dag      │ PARALLEL_C8   │   60 │     3 │    20 │     2430.1 │     1800.0 │ +35.0%            ║
╚═══════════════╧═══════════════╧══════╧═══════╧═══════╧════════════╧════════════╧═══════════════════╝

Per-Level Breakdown

  Per-Level Breakdown (linear_chain D=10, CALCULATED):
  Level  1: avg  52.3ms  [st_lc_1]
  Level  2: avg  48.7ms  [st_lc_2]
  ...
  Level 10: avg  51.2ms  [st_lc_10]
  Total:       513.5ms  (scheduler overhead: 306.8ms)

JSON Export

Results are written to target/dag_bench_results/<timestamp>.json (overridable via PGS_DAG_BENCH_JSON_DIR env var) for cross-run comparison.

Available DAG Benchmark Tests

Latency Tests (Auto-Refresh)

| Test Name | Topology | Mode | STs |
|-----------|----------|------|-----|
| bench_latency_linear_5_calc | Linear, D=5 | CALCULATED | 5 |
| bench_latency_linear_10_calc | Linear, D=10 | CALCULATED | 10 |
| bench_latency_linear_20_calc | Linear, D=20 | CALCULATED | 20 |
| bench_latency_linear_10_par4 | Linear, D=10 | PARALLEL(4) | 10 |
| bench_latency_wide_3x20_calc | Wide, D=3 W=20 | CALCULATED | 60 |
| bench_latency_wide_3x20_par4 | Wide, D=3 W=20 | PARALLEL(4) | 60 |
| bench_latency_wide_3x20_par8 | Wide, D=3 W=20 | PARALLEL(8) | 60 |
| bench_latency_wide_5x20_calc | Wide, D=5 W=20 | CALCULATED | 100 |
| bench_latency_wide_5x20_par8 | Wide, D=5 W=20 | PARALLEL(8) | 100 |
| bench_latency_fanout_b2d5_calc | Fan-out, b=2 d=5 | CALCULATED | 31 |
| bench_latency_fanout_b2d5_par8 | Fan-out, b=2 d=5 | PARALLEL(8) | 31 |
| bench_latency_diamond_4_calc | Diamond, fan=4 | CALCULATED | 5 |
| bench_latency_mixed_calc | Mixed, ~15 STs | CALCULATED | ~15 |
| bench_latency_mixed_par8 | Mixed, ~15 STs | PARALLEL(8) | ~15 |

Throughput Tests (Manual Refresh)

| Test Name | Topology | STs | Delta Sizes |
|-----------|----------|-----|-------------|
| bench_throughput_linear_5 | Linear, D=5 | 5 | 10, 100, 1000 |
| bench_throughput_linear_10 | Linear, D=10 | 10 | 10, 100, 1000 |
| bench_throughput_linear_20 | Linear, D=20 | 20 | 10, 100, 1000 |
| bench_throughput_wide_3x20 | Wide, D=3 W=20 | 60 | 10, 100, 1000 |
| bench_throughput_fanout_b2d5 | Fan-out, b=2 d=5 | 31 | 10, 100, 1000 |
| bench_throughput_diamond_4 | Diamond, fan=4 | 5 | 10, 100, 1000 |
| bench_throughput_mixed | Mixed, ~15 STs | ~15 | 10, 100, 1000 |

What to Look For

  1. Linear chain: CALCULATED should beat PARALLEL. For width=1 DAGs, PARALLEL adds poll overhead without any parallelism benefit.

  2. Wide DAG: PARALLEL(C=8) speedup over CALCULATED. For width ≥ 20, PARALLEL should show measurable improvement — it refreshes up to C STs concurrently per level instead of sequentially.

  3. Overhead < 100%. Theoretical vs actual overhead should stay below 100% across all topologies — the formulas should be in the right ballpark.

  4. DIFFERENTIAL action in per-ST breakdown. ST-on-ST hops should show DIFFERENTIAL rather than FULL, confirming differential propagation is working.

  5. Throughput scaling with delta size. Smaller deltas (10 rows) should yield lower per-cycle wall-clock time than larger deltas (1000 rows).


In-Process Micro-Benchmarks (Criterion.rs)

In addition to the E2E database benchmarks, the project includes two Criterion.rs benchmark suites that measure pure Rust computation time without database overhead. These are useful for tracking performance regressions in the internal query-building and IVM differentiation logic.

Benchmark Suites

refresh_bench — Utility Functions

benches/refresh_bench.rs benchmarks the low-level helper functions used during refresh operations:

| Benchmark Group | What It Measures |
|-----------------|------------------|
| quote_ident | PostgreSQL identifier quoting speed |
| col_list | Column list SQL generation |
| prefixed_col_list | Prefixed column list generation (e.g., NEW.col) |
| expr_to_sql | AST expression → SQL string conversion |
| output_columns | Output column extraction from parsed queries |
| source_oids | Source table OID resolution |
| lsn_gt | LSN comparison expression generation |
| frontier_json | Frontier state JSON serialization |
| canonical_period | Interval parsing and canonicalization |
| dag_operations | DAG topological sort and cycle detection |
| xxh64 | xxHash-64 hashing throughput |

diff_operators — IVM Operator Differentiation

benches/diff_operators.rs benchmarks the delta SQL generation for every IVM operator. Each benchmark creates a realistic operator tree and measures differentiate() throughput:

| Benchmark Group | What It Measures |
|-----------------|------------------|
| diff_scan | Table scan differentiation (3, 10, 20 columns) |
| diff_filter | Filter (WHERE) differentiation |
| diff_project | Projection (SELECT subset) differentiation |
| diff_aggregate | GROUP BY aggregate differentiation (simple + complex) |
| diff_inner_join | Inner join differentiation |
| diff_left_join | Left outer join differentiation |
| diff_distinct | DISTINCT differentiation |
| diff_union_all | UNION ALL differentiation (2, 5, 10 children) |
| diff_window | Window function differentiation |
| diff_join_aggregate | Composite join + aggregate pipeline |
| differentiate_full | Full differentiate() call for scan-only and filter+scan trees |

Running Micro-Benchmarks

# Run all Criterion benchmarks
just bench

# Run only refresh utility benchmarks
cargo bench --bench refresh_bench --features pg18

# Run only IVM diff operator benchmarks
just bench-diff
# or equivalently:
cargo bench --bench diff_operators --features pg18

# Output in Bencher-compatible format (for CI integration)
just bench-bencher

Output and Reports

Criterion produces statistical analysis for each benchmark including:

  • Mean and standard deviation of execution time
  • Throughput (iterations/sec)
  • Comparison with previous run — reports improvements/regressions with confidence intervals

HTML reports are generated in target/criterion/ with interactive charts showing distributions and regression history. Open target/criterion/report/index.html to browse all results.

Sample output:

diff_scan/3_columns   time:   [11.834 µs 12.074 µs 12.329 µs]
diff_scan/10_columns  time:   [16.203 µs 16.525 µs 16.869 µs]
diff_aggregate/simple time:   [21.447 µs 21.862 µs 22.301 µs]
diff_inner_join       time:   [25.919 µs 26.421 µs 26.952 µs]

Continuous Benchmarking with Bencher

Bencher provides continuous benchmark tracking in CI, detecting performance regressions on pull requests before they merge.

How It Works

The .github/workflows/benchmarks.yml workflow:

  1. On main pushes — runs both Criterion suites and uploads results to Bencher as the baseline. This establishes the expected performance for each benchmark.

  2. On pull requests — runs the same benchmarks and compares against the main baseline using a Student's t-test with a 99% upper confidence boundary. If any benchmark regresses beyond the threshold, the PR check fails.

Setup

To enable Bencher for your fork or deployment:

  1. Create a Bencher account at bencher.dev and create a project.

  2. Add the API token as a GitHub Actions secret:

    • Go to Settings → Secrets and variables → Actions
    • Add BENCHER_API_TOKEN with your Bencher API token
  3. Update the project slug in .github/workflows/benchmarks.yml if your Bencher project name differs from pg-trickle.

The workflow gracefully degrades — if BENCHER_API_TOKEN is not set, benchmarks still run and upload artifacts but skip Bencher tracking.

Local Bencher-Format Output

To see what Bencher would receive from CI:

just bench-bencher

This runs both suites with --output-format bencher, producing JSON output compatible with bencher run.

Dashboard

Once configured, the Bencher dashboard shows:

  • Historical trends for every benchmark across commits
  • Statistical thresholds with configurable alerting
  • PR annotations highlighting which benchmarks regressed and by how much

Troubleshooting

| Issue | Resolution |
|-------|------------|
| docker: command not found | Install Docker Desktop and ensure it is running |
| Container startup timeout | Increase Docker memory allocation (≥ 4 GB recommended) |
| image not found | Run ./tests/build_e2e_image.sh to build the test image |
| Highly variable timings | Close other workloads; use --test-threads=1 to avoid container contention |
| SLOW status on latency test | Expected in Docker; bare-metal should pass < 10 ms |

CDC Write-Side Overhead Benchmarks

The CDC write-overhead benchmark suite in tests/e2e_cdc_write_overhead_tests.rs measures the DML throughput cost of pg_trickle's CDC triggers on source tables. This quantifies the "write amplification factor" — how much slower DML becomes when a stream table is attached.

The core question this benchmark answers:

How much write throughput do you sacrifice by attaching a stream table to a source table?

Running CDC Write Overhead Benchmarks

# Full suite (all 5 scenarios)
cargo test --test e2e_cdc_write_overhead_tests --features pg18 -- --ignored --nocapture bench_cdc_write_overhead_full

# Individual scenarios
cargo test --test e2e_cdc_write_overhead_tests --features pg18 -- --ignored --nocapture bench_cdc_single_row_insert
cargo test --test e2e_cdc_write_overhead_tests --features pg18 -- --ignored --nocapture bench_cdc_bulk_insert
cargo test --test e2e_cdc_write_overhead_tests --features pg18 -- --ignored --nocapture bench_cdc_bulk_update
cargo test --test e2e_cdc_write_overhead_tests --features pg18 -- --ignored --nocapture bench_cdc_bulk_delete
cargo test --test e2e_cdc_write_overhead_tests --features pg18 -- --ignored --nocapture bench_cdc_concurrent_writers

Scenarios

| Scenario | Description | Rows per Cycle |
|----------|-------------|----------------|
| Single-row INSERT | One INSERT statement per row, 1,000 rows total | 1,000 |
| Bulk INSERT | Single INSERT ... SELECT generate_series(...) | 10,000 |
| Bulk UPDATE | Single UPDATE ... WHERE id <= N | 10,000 |
| Bulk DELETE | Single DELETE ... WHERE id <= N | 10,000 |
| Concurrent writers | 4 parallel sessions each inserting 5,000 rows | 20,000 total |

Reading the Output

╔═══════════════════════════════════════════════════════════════════════════════════╗
║               pg_trickle CDC Write-Side Overhead Benchmark                       ║
╠═══════════════════════╤═══════════════╤═══════════════╤═════════════════════════╣
║ Scenario              │ Baseline (ms) │ With CDC (ms) │ Write Amplification     ║
╠═══════════════════════╪═══════════════╪═══════════════╪═════════════════════════╣
║ single-row INSERT     │         450.2 │         890.5 │       1.98×             ║
║ bulk INSERT (10K)     │          35.1 │          72.3 │       2.06×             ║
║ bulk UPDATE (10K)     │          48.7 │         105.2 │       2.16×             ║
║ bulk DELETE (10K)     │          22.4 │          51.8 │       2.31×             ║
║ concurrent (4×5K)     │          65.3 │         142.1 │       2.18×             ║
╚═══════════════════════╧═══════════════╧═══════════════╧═════════════════════════╝

| Column | Meaning |
|--------|---------|
| Scenario | DML pattern being measured |
| Baseline | Average wall-clock time with no stream table (no CDC trigger) |
| With CDC | Average wall-clock time with an active stream table (CDC trigger fires) |
| Write Amplification | With CDC / Baseline — how many times slower the write path becomes |

Machine-Readable Output

[CDC_BENCH] scenario=single-row_INSERT baseline_avg_ms=450.2 cdc_avg_ms=890.5 write_amplification=1.98

Interpreting Write Amplification

| Write Amplification | Interpretation |
|---------------------|----------------|
| 1.0–1.5× | Minimal overhead — triggers add negligible cost. Typical for bulk DML with statement-level triggers. |
| 1.5–2.5× | Expected range for statement-level CDC triggers. Each DML statement incurs one additional INSERT into the change buffer. |
| 2.5–4.0× | Moderate overhead — acceptable for most workloads. Common with row-level triggers or single-row DML. |
| 4.0–10× | High overhead — consider pg_trickle.cdc_trigger_mode = 'statement' if using row-level triggers, or reduce DML frequency. |
| > 10× | Investigate — may indicate lock contention on the change buffer or pathological trigger interaction. |
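
A minimal sketch of switching to statement-level capture, using the GUC name from the table above (whether a reload suffices, or a session-level SET is needed, may vary by deployment):

-- Switch CDC capture to statement-level triggers
ALTER SYSTEM SET pg_trickle.cdc_trigger_mode = 'statement';
SELECT pg_reload_conf();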

Key Patterns to Look For

  1. Statement-level triggers vs row-level: Statement-level triggers (default since v0.11.0) should show significantly lower overhead for bulk DML compared to row-level triggers.

  2. Bulk DML advantage: Bulk INSERT/UPDATE/DELETE should show lower write amplification than single-row INSERT because the trigger fires once per statement, not once per row.

  3. Concurrent writer safety: The concurrent scenario should complete without deadlocks or errors, and the write amplification should be similar to the serial bulk INSERT case.

  4. DELETE overhead: DELETE triggers tend to be slightly more expensive than INSERT triggers because the trigger must capture the OLD row values.


CI Benchmark Workflows

All benchmark jobs run only on a weekly schedule and via workflow_dispatch — never on PR or push — to avoid blocking the merge gate with long-running tests.

e2e-benchmarks.yml — E2E Benchmark Tracking

Produces the numbers in README.md and this document. Each job posts a summary table to the GitHub Actions run page and uploads artifacts at 90-day retention. Manual dispatch accepts a job input (refresh | latency | cdc | tpch | all) to re-run a single job.

| Job | Test(s) | README Section | Timeout | just command |
|-----|---------|----------------|---------|--------------|
| bench-refresh | bench_full_matrix | Differential vs Full Refresh | 60 min | just test-bench-e2e-fast |
| bench-latency | bench_no_data_refresh_latency | Zero-Change Latency | 20 min | just test-bench-e2e-fast |
| bench-cdc | bench_cdc_trigger_overhead | Write-Path Overhead | 30 min | just test-bench-e2e-fast |
| bench-tpch | test_tpch_performance_comparison | TPC-H per-query table | 30 min | just bench-tpch-fast |

ci.yml — Benchmark Jobs

Criterion micro-benchmarks and DAG topology benchmarks. Run on the daily schedule and workflow_dispatch.

| Job | Test Suite | What It Measures | Timeout | just command |
|-----|------------|------------------|---------|--------------|
| benchmarks | benches/refresh_bench.rs, benches/diff_operators.rs | In-process Rust: query building, delta SQL generation (sub-µs) | 20 min | just bench |
| dag-bench-calc | e2e_dag_bench_tests (excl. par*) | DAG propagation latency + throughput, CALCULATED mode | 30 min | just test-dag-bench-fast |
| dag-bench-parallel | e2e_dag_bench_tests (par*) | DAG propagation with 4–8 parallel workers | 120 min | just test-dag-bench-fast |

benchmarks.yml — Bencher Integration (opt-in)

Disabled by default (no scheduled trigger). Re-enable by restoring push/pull_request triggers and adding a BENCHER_API_TOKEN secret. When active, it annotates PRs with regressions detected via Student’s t-test at a 99% upper confidence boundary.

| Job | Test Suite | What It Measures | Tracking |
|-----|------------|------------------|----------|
| benchmark | benches/refresh_bench.rs, benches/diff_operators.rs | Same as ci.yml benchmarks job | Bencher (regression alert on PR) |

Artifact Retention Summary

| Workflow | Artifact | Retention |
|----------|----------|-----------|
| e2e-benchmarks.yml | bench-{refresh,latency,cdc,tpch}-results (stdout + JSON) | 90 days |
| ci.yml benchmarks | benchmark-results (Criterion HTML + JSON) | 7 days |
| benchmarks.yml | criterion-results (Criterion HTML + JSON) | 7 days |

pg_trickle vs. DBSP: Similarities and Differences

What They Share (Conceptual Foundation)

pg_trickle explicitly cites DBSP as its theoretical foundation (see PRIOR_ART.md). The key overlap:

| Concept | DBSP (paper) | pg_trickle (implementation) |
|---------|--------------|-----------------------------|
| Z-set / delta model | Rows annotated with weights (+1/−1) in an abelian group | __pgt_action = 'I'/'D' column on every delta row — effectively Z-sets restricted to {+1, −1} |
| Per-operator differentiation | Recursive Algorithm 4.6: Q^Δ = D ∘ Q ∘ I, decomposed per-operator via the chain rule (Q₁ ∘ Q₂)^Δ = Q₁^Δ ∘ Q₂^Δ | DiffContext::diff_node() walks the OpTree and calls per-operator differentiators (scan, filter, project, join, aggregate, distinct, union, etc.) — same recursive structural decomposition |
| Linear operators are self-incremental | Theorem 3.3: for LTI operator Q, Q^Δ = Q | Filter and Project pass deltas through unchanged (just apply predicate/projection to the delta stream) |
| Bilinear join rule | Theorem 3.4: Δ(a × b) = Δa × Δb + a × Δb + Δa × b | diff_inner_join generates exactly 3 UNION ALL parts: (delta_left ⋈ current_right), (current_left ⋈ delta_right), and optionally (delta_left ⋈ delta_right) |
| Aggregate auxiliary counters | §4.2: counting algorithm for maintaining aggregates with deletions | __pgt_count auxiliary column, LEFT JOIN back to stream table to read old counts and compute new counts |
| Recursive queries | §6: fixed-point iteration with z⁻¹ delay operator, semi-naive evaluation | diff_recursive_cte uses recomputation-diff (DRed-style), not DBSP's native fixed-point circuit |
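
A sketch of the bilinear join rule compiled to SQL (delta_left / delta_right / left_cur / right_cur are schematic names, not the identifiers pg_trickle actually generates):

-- Three-part inner-join delta per Theorem 3.4
SELECT dl.*, rc.*
FROM delta_left dl JOIN right_cur rc ON dl.k = rc.k      -- Δa ⋈ b
UNION ALL
SELECT lc.*, dr.*
FROM left_cur lc JOIN delta_right dr ON lc.k = dr.k      -- a ⋈ Δb
UNION ALL
SELECT dl.*, dr.*
FROM delta_left dl JOIN delta_right dr ON dl.k = dr.k;   -- Δa ⋈ Δb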

Key Differences

1. Execution model — standalone engine vs. embedded in PostgreSQL

DBSP is a standalone streaming runtime (Rust library, now Feldera). It compiles query plans into dataflow graphs that maintain in-memory state and process continuous micro-batches. Operators are long-lived stateful actors with their own memory.

pg_trickle is an extension inside PostgreSQL. It has no persistent dataflow graph. On each refresh, it generates a single SQL query (CTE chain) that PostgreSQL's own planner/executor evaluates. After execution, no operator state persists — auxiliary state lives in the stream table itself (__pgt_count columns) and change buffer tables.

2. Streams vs. periodic batches

DBSP operates on true infinite streams indexed by logical time t ∈ ℕ. Each "step" processes one micro-batch of changes, and operators carry integration state (I operator = running sum from t=0).

pg_trickle operates in discrete refresh cycles triggered by a lag-based scheduler. There is no integration operator — the "current state" is just the stream table's contents, and changes are consumed from CDC buffer tables between LSN boundaries. Each refresh is a self-contained transaction.

3. Z-set weights vs. binary actions

DBSP uses integer weights in ℤ — rows can have weights > 1 (bags) or < −1 (multiple deletions). This enables correct multiset semantics and composable group algebra.

pg_trickle uses binary actions ('I' insert, 'D' delete, sometimes 'U' update). It doesn't maintain true Z-set weights. For aggregates, the __pgt_count auxiliary column serves a similar purpose but is specific to the aggregate operator — it's not a general weight propagated through the operator tree.

4. Integration operator (I)

DBSP: The integration operator I(s)[t] = Σᵢ≤ₜ s[i] is an explicit first-class circuit element. It maintains running sums of changes and is the key mechanism for computing incremental joins (z⁻¹(I(a)) = "accumulated left side up to previous step").

pg_trickle: No explicit integration. The equivalent of I is just "read the current contents of the source/stream table." Join differentiation directly reads the current snapshot of the non-delta side (build_snapshot_sql() generates FROM "public"."orders" r), which implicitly includes all historical changes.

5. Recursion

DBSP: Native fixed-point circuits with z⁻¹ delay. Can incrementally maintain recursive queries (e.g., transitive closure) by iterating only on new changes within each step — semi-naive evaluation generalized to arbitrary recursion.

pg_trickle: Uses recomputation-diff for recursive CTEs — re-executes the full recursive query and anti-joins against current storage to compute the delta. This is correct but not truly incremental for the recursive part.

6. Correctness guarantees

DBSP: Proven correct in Lean. All theorems are machine-checked. The chain rule, cycle rule, and bilinear decomposition are formally verified.

pg_trickle: Verified empirically via property-based tests (the assert_invariant checks that Contents(ST) = Q(DB) after each mutation cycle). No formal proof, but the per-operator rules are direct translations of DBSP's rules.

7. Scope

DBSP: A general-purpose theory and streaming engine. Handles nested relations, streaming aggregation over windows, arbitrary compositions. The Feldera implementation supports a full SQL frontend.

pg_trickle: Focused on materialized views inside PostgreSQL. Supports a specific subset of SQL (scan, filter, project, inner/left/full join, aggregates, DISTINCT, UNION ALL, INTERSECT, EXCEPT, CTEs, window functions, lateral joins). It is not a general streaming engine — it leverages PostgreSQL's own query planner and executor.


Summary

pg_trickle applies DBSP's differentiation rules to generate delta queries, but it is not a DBSP implementation. It borrows the mathematical framework (per-operator differentiation, Z-set-like deltas, bilinear join decomposition) while making fundamentally different architectural choices: embedded in PostgreSQL, no persistent dataflow state, periodic batch execution, and PostgreSQL's planner as the optimizer. Think of it as "DBSP's differentiation algebra, compiled down to SQL CTEs and executed by PostgreSQL."

Research: pg_ivm Comparison

This document is a detailed technical comparison between pg_trickle and pg_ivm covering supported SQL features, refresh latency, and operational differences. It is research material for contributors and evaluators performing a deep-dive comparison.

A quick comparison table (pg_trickle vs pg_ivm and other systems) is in Comparisons.


Abstract

pg_ivm and pg_trickle both implement incremental view maintenance inside PostgreSQL, but they target different maturity levels and operational requirements. pg_ivm is a research prototype that proves the IVM concept within PostgreSQL's standard materialized-view infrastructure — it supports a subset of SQL (single-table aggregates, simple joins) and uses statement-level BEFORE triggers with immediate synchronous refresh. pg_trickle is a production extension that targets the full TPC-H benchmark at O(Δ) complexity, supports thousands of concurrent stream tables, and provides an asynchronous scheduled refresh model with CDC-based change capture.

The key architectural divergence is the change-capture layer: pg_ivm uses pg_ivm_immediate_trigger() with NEW TABLE/OLD TABLE transition tables and refreshes synchronously within the originating transaction (adding write latency), whereas pg_trickle separates write-path (CDC triggers writing to change-buffer tables) from read-path (background scheduler running differential refresh). This separation allows pg_trickle to absorb write bursts, coalesce changes, and maintain stream tables at configurable latencies without blocking the application transaction.

This document provides a feature-matrix comparison across SQL coverage, refresh strategies, operational tooling, performance characteristics, and migration path from pg_ivm to pg_trickle. Where pg_ivm supports a feature that pg_trickle does not yet support (e.g., some window function variants), the gap is documented with a planned resolution version.


pg_trickle vs pg_ivm — Comparison Report & Gap Analysis

Date: 2026-02-28 (merged 2026-03-01, updated 2026-03-20)
Author: Internal research
Status: Reference document


1. Executive Summary

Both pg_trickle and pg_ivm implement Incremental View Maintenance (IVM) as PostgreSQL extensions — the goal of keeping materialized query results up-to-date without full recomputation. Despite the shared objective they differ fundamentally in design philosophy, maintenance model, SQL coverage, operational model, and target audience.

pg_ivm is a mature, widely-deployed C extension (1.4k GitHub stars, 17 releases) focused on immediate, synchronous IVM that runs inside the same transaction as the base-table write. pg_trickle is a Rust extension (v0.9.0) offering both deferred (scheduled) and immediate (transactional) IVM with a richer SQL dialect, a dependency DAG, and built-in operational tooling.

pg_trickle is significantly ahead of pg_ivm in SQL coverage, operator support, aggregate support, and operational features. As of v0.2.1, pg_trickle also matches pg_ivm's core strength — immediate, in-transaction maintenance — via the IMMEDIATE refresh mode (all phases complete). pg_ivm's one remaining structural advantage is broader PostgreSQL version support (PG 13–18):

  • IMMEDIATE mode — fully implemented. Statement-level AFTER triggers with transition tables update stream tables within the same transaction as base-table DML. Window functions, LATERAL, scalar subqueries, cascading IMMEDIATE stream tables, WITH RECURSIVE (with a stack-depth warning), and TopK micro-refresh are all supported. See PLAN_TRANSACTIONAL_IVM.md.
  • AUTO refresh mode — new default for create_stream_table. Selects DIFFERENTIAL when the query supports it and transparently falls back to FULL otherwise, eliminating the need to choose a mode at creation time.
  • pg_ivm compatibility layer — postponed. The pgivm.create_immv() / pgivm.refresh_immv() / pgivm.pg_ivm_immv wrappers (Phase 2) are deferred to post-1.0.
  • PLAN_PG_BACKCOMPAT.md details backporting pg_trickle to PG 14–18 (recommended) or PG 16–18 (minimum viable), requiring ~2.5–3 weeks of effort primarily in #[cfg]-gating ~435 lines of JSON/SQL-standard parse-tree handling.

With IMMEDIATE mode fully implemented, Row Level Security support (v0.5.0), pg_dump/restore support (v0.8.0), algebraic aggregate maintenance (v0.9.0), parallel refresh (v0.4.0), circular pipeline support (v0.7.0), watermark APIs (v0.7.0), and 40+ unique features, pg_ivm's only remaining advantages are PG version breadth and production maturity.


2. Project Overview

| Attribute | pg_ivm | pg_trickle |
|-----------|--------|------------|
| Repository | sraoss/pg_ivm | trickle-labs/pg-trickle |
| Language | C | Rust (pgrx 0.17) |
| Latest release | 1.13 (2025-10-20) | 0.9.0 (2026-03-20) |
| Stars | ~1,400 | early stage |
| License | PostgreSQL License | Apache 2.0 |
| PG versions | 13–18 | 18 only; PG 14–18 planned |
| Schema | pgivm | pgtrickle / pgtrickle_changes |
| Shared library required | Yes (shared_preload_libraries or session_preload_libraries) | Yes (shared_preload_libraries, required for background worker) |
| Background worker | No | Yes (scheduler + optional WAL decoder) |

3. Maintenance Model

This is the most important design difference between the two extensions.

pg_ivm — Immediate Maintenance

pg_ivm updates its views synchronously inside the same transaction that modified the base table. When a row is inserted/updated/deleted, AFTER row triggers fire and update the IMMV before the transaction commits.

BEGIN;
  UPDATE base_table ...;   -- triggers fire here
  -- IMMV is updated before COMMIT
COMMIT;

Consequences:

  • The IMMV is always exactly consistent with the committed state of the base table — zero staleness.
  • Write latency increases by the cost of view maintenance. For large joins or aggregates on popular tables this can be significant.
  • Locking: ExclusiveLock is held on the IMMV during maintenance to prevent concurrent anomalies. In REPEATABLE READ or SERIALIZABLE isolation, errors are raised when conflicts are detected.
  • TRUNCATE on a base table triggers full IMMV refresh (for most view types).
  • Not compatible with logical replication (subscriber nodes are not updated).

pg_trickle — Deferred, Scheduled Maintenance

pg_trickle updates its stream tables asynchronously, driven by a background worker scheduler. Changes are captured by row-level triggers (or optionally by WAL decoding) into change-buffer tables and are applied in batch on the next refresh cycle.

-- Write path: only a trigger INSERT into change buffer
BEGIN;
  UPDATE base_table ...;   -- trigger captures delta into pgtrickle_changes.*
COMMIT;

-- Separate refresh cycle (background worker):
  apply_delta_to_stream_table(...)

Consequences:

  • Write latency is minimized — the trigger write into the change buffer is ~2–50 μs regardless of view complexity.
  • Stream tables are stale between refresh cycles. The staleness bound is configurable (e.g. '30s', '5m', '@hourly', or cron expressions).
  • Refresh can be triggered manually: pgtrickle.refresh_stream_table(...).
  • Multiple stream tables can share a refresh pipeline ordered by dependency (topological DAG scheduling).
  • The WAL-based CDC mode (pg_trickle.cdc_mode = 'wal') eliminates trigger overhead entirely when wal_level = logical is available.
  • Append-only fast path (v0.5.0): append_only => true skips merge for INSERT-only tables with auto-fallback if DELETE/UPDATE detected.
  • Source gating (v0.5.0): pause CDC during bulk loads via gate_source() and ungate_source() to avoid trigger overhead during large batch inserts.

Implemented: pg_trickle IMMEDIATE Mode

pg_trickle now offers an IMMEDIATE refresh mode (Phase 1 + Phase 3 complete) that uses statement-level AFTER triggers with transition tables — the same mechanism pg_ivm uses. Key implementation details:

  • Reuses the DVM engine — the Scan operator reads from transition tables (via temporary views) instead of change buffer tables.
  • Phase 1 (complete): core IMMEDIATE engine — INSERT/UPDATE/DELETE/TRUNCATE handling, advisory lock-based concurrency (IvmLockMode), mode switching via alter_stream_table, query restriction validation.
  • Phase 2 (postponed): pgivm.* compatibility layer for drop-in migration.
  • Phase 3 (complete): extended SQL support — window functions, LATERAL, scalar subqueries, cascading IMMEDIATE stream tables, WITH RECURSIVE (IM1: supported with a stack-depth warning), and TopK micro-refresh (IM2: recomputes top-K on every DML, gated by pg_trickle.ivm_topk_max_limit).
  • Phase 4 (complete): delta SQL template caching (IVM_DELTA_CACHE); ENR-based transition tables and C-level triggers deferred to post-1.0 as optimizations only.

-- Create an IMMEDIATE stream table (zero staleness)
SELECT pgtrickle.create_stream_table(
    'live_totals',
    'SELECT region, SUM(amount) AS total FROM orders GROUP BY region',
    NULL,          -- no schedule needed
    'IMMEDIATE'
);

-- Updates propagate within the same transaction
BEGIN;
  INSERT INTO orders (region, amount) VALUES ('EU', 100);
  SELECT * FROM live_totals;  -- already includes the new row
COMMIT;

4. SQL Feature Coverage — Summary

| Dimension | pg_ivm | pg_trickle | Winner |
|-----------|--------|------------|--------|
| Maintenance timing | Immediate (in-transaction triggers) | Deferred (scheduler/manual) and IMMEDIATE (in-transaction) | pg_trickle (offers both models) |
| PostgreSQL versions | 13–18 | 18 only; PG 14–18 planned | pg_ivm (today); planned parity |
| Aggregate functions | 5 (COUNT, SUM, AVG, MIN, MAX) | 60+ (all built-in aggregates incl. algebraic O(1) for COUNT/SUM/AVG/STDDEV/VAR) | pg_trickle |
| FILTER clause on aggregates | No | Yes | pg_trickle |
| HAVING clause | No | Yes | pg_trickle |
| Inner joins | Yes (including self-join) | Yes (including self-join, NATURAL, nested) | pg_trickle |
| Outer joins | Yes (limited — equijoin, single condition, many restrictions) | Yes (LEFT/RIGHT/FULL, nested, complex conditions) | pg_trickle |
| DISTINCT | Yes (reference-counted) | Yes (reference-counted) | Tie |
| DISTINCT ON | No | Yes (auto-rewritten to ROW_NUMBER) | pg_trickle |
| UNION / INTERSECT / EXCEPT | No | Yes (all 6 variants, bag + set) | pg_trickle |
| Window functions | No | Yes (partition recomputation) | pg_trickle |
| CTEs (non-recursive) | Simple only (no aggregates, no DISTINCT inside) | Full (aggregates, DISTINCT, multi-reference shared delta) | pg_trickle |
| CTEs (recursive) | No | Yes (semi-naive, DRed, recomputation; IMMEDIATE mode with stack-depth warning) | pg_trickle |
| Subqueries in FROM | Simple only (no aggregates/DISTINCT inside) | Full support | pg_trickle |
| EXISTS subqueries | Yes (WHERE only, AND only, no agg/DISTINCT) | Yes (WHERE + targetlist, AND/OR, agg/DISTINCT inside) | pg_trickle |
| NOT EXISTS / NOT IN | No | Yes (anti-join operator) | pg_trickle |
| IN (subquery) | No | Yes (semi-join operator) | pg_trickle |
| Scalar subquery in SELECT | No | Yes (scalar subquery operator) | pg_trickle |
| LATERAL subqueries | No | Yes (row-scoped recomputation) | pg_trickle |
| LATERAL SRFs | No | Yes (jsonb_array_elements, unnest, etc.) | pg_trickle |
| JSON_TABLE (PG 17+) | No | Yes | pg_trickle |
| GROUPING SETS / CUBE / ROLLUP | No | Yes (auto-rewritten to UNION ALL) | pg_trickle |
| Views as sources | No (simple tables only) | Yes (auto-inlined, nested) | pg_trickle |
| Partitioned tables | No | Yes | pg_trickle |
| Foreign tables | No | FULL mode only | pg_trickle |
| Cascading (view-on-view) | No | Yes (DAG-aware scheduling) | pg_trickle |
| Background scheduling | No (user must trigger) | Yes (cron + duration, background worker) | pg_trickle |
| Monitoring / observability | 1 catalog table | Extensive (stats, history, staleness, CDC health, NOTIFY) | pg_trickle |
| CDC mechanism | Triggers only | Hybrid (triggers + optional WAL) | pg_trickle |
| DDL tracking | No automatic handling | Yes (event triggers, auto-reinit) | pg_trickle |
| TRUNCATE handling | Yes (auto-truncate IMMV) | IMMEDIATE mode: full refresh in same txn; DEFERRED: queued full refresh | Tie (functionally equivalent in IMMEDIATE mode) |
| Auto-indexing | Yes (on GROUP BY / DISTINCT / PK columns) | No (user creates indexes) | pg_ivm |
| Row Level Security | Yes (with limitations) | Yes (refreshes see all data; RLS on stream table; IMMEDIATE mode secured) | pg_trickle (richer model) |
| Concurrency model | ExclusiveLock on IMMV during maintenance | Advisory locks, non-blocking reads, parallel refresh | pg_trickle |
| Data type restrictions | Must have btree opclass (no json, xml, point) | No documented type restrictions | pg_trickle |
| Maturity / ecosystem | 4 years, 1.4k stars, PGXN, yum packages | v0.9.0 released, 1,100+ unit tests + 900+ E2E tests, 22 TPC-H benchmarks, dbt integration | pg_ivm |

4.1 Areas Where pg_ivm Wins

Of the ~35 dimensions in the summary table above, pg_ivm holds an advantage in only 3 (down from 6 before IMMEDIATE mode and RLS were implemented): two are substantive, both with planned or possible resolutions, and one is a temporary maturity gap that closes with time.

1. PostgreSQL Version Support (substantive, planned resolution)

pg_ivm ships pre-built packages for PostgreSQL 13–18 across all major Linux distros via yum.postgresql.org and PGXN. pg_trickle currently targets PG 18 only.

This is the single largest remaining structural gap. PG 13 is EOL (Nov 2025), but PG 14–17 are widely deployed in production environments. Users on those versions simply cannot use pg_trickle today.

Planned resolution: PLAN_PG_BACKCOMPAT.md details backporting to PG 14–18 (~2.5–3 weeks). pgrx 0.17 already supports PG 14–18 via feature flags; ~435 lines in parser.rs need #[cfg] gating for JSON/SQL-standard parse-tree handling.

2. Auto-Indexing (substantive, low priority)

When pg_ivm creates an IMMV, it automatically adds indexes on columns used in GROUP BY, DISTINCT, and primary keys. This is a genuine usability advantage — new users get reasonable read performance without manual intervention.

pg_trickle leaves index creation entirely to the user. For DIFFERENTIAL mode stream tables, the DVM engine's MERGE-based delta application already uses the stream table's primary key (which is auto-created), and index-aware MERGE (pg_trickle.merge_seqscan_threshold, added v0.9.0) uses index lookups for tiny change ratios, but secondary indexes for read-side query patterns must be added manually.
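
A minimal sketch of the manual step this implies, for a stream table queried by region (names illustrative):

-- Read-side secondary indexes are the user's responsibility
CREATE INDEX IF NOT EXISTS order_totals_region_idx ON order_totals (region);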

Impact: Low — experienced users always create application-specific indexes anyway. Auto-indexing mostly helps onboarding and simple use-cases.

Planned resolution: Tracked as part of the pg_ivm compatibility layer (Phase 2, postponed to post-1.0). Could also be implemented independently as a CREATE INDEX IF NOT EXISTS step in create_stream_table.

3. Maturity / Ecosystem (temporary, closing over time)

pg_ivm has 4 years of production use, ~1,400 GitHub stars, 17 releases, and is distributed via PGXN, yum, and apt package repositories. It has a track record of stability and a community of users.

pg_trickle is a v0.9.0 series release with 1,100+ unit tests, 200+ integration tests, 570+ light E2E tests, 90+ full E2E tests, and 22 TPC-H correctness benchmarks—but no wide production deployments yet. It lacks the battle-testing that comes from years of real-world usage.

Impact: High for risk-averse organizations considering production adoption. Low for greenfield projects or teams willing to adopt early.

Resolution: This gap closes naturally with time, releases, and adoption. The dbt integration (dbt-pgtrickle) and CNPG/Kubernetes deployment support accelerate ecosystem development.


5. Detailed SQL Comparison

5.1 Aggregate Functions

| Function | pg_ivm | pg_trickle |
|----------|--------|------------|
| COUNT(*) / COUNT(expr) | ✅ Algebraic | ✅ Algebraic (O(1) running total, v0.9.0) |
| SUM | ✅ Algebraic | ✅ Algebraic (O(1) running total, v0.9.0) |
| AVG | ✅ Algebraic (via SUM/COUNT) | ✅ Algebraic (O(1) via SUM/COUNT decomposition, v0.9.0) |
| MIN | ✅ Semi-algebraic (rescan on extremum delete) | ✅ Semi-algebraic (O(1) unless extremum deleted, v0.9.0 safety guard) |
| MAX | ✅ Semi-algebraic (rescan on extremum delete) | ✅ Semi-algebraic (O(1) unless extremum deleted, v0.9.0 safety guard) |
| BOOL_AND / BOOL_OR | ❌ | ✅ Group-rescan |
| STRING_AGG | ❌ | ✅ Group-rescan |
| ARRAY_AGG | ❌ | ✅ Group-rescan |
| JSON_AGG / JSONB_AGG | ❌ | ✅ Group-rescan |
| BIT_AND / BIT_OR / BIT_XOR | ❌ | ✅ Group-rescan |
| JSON_OBJECT_AGG / JSONB_OBJECT_AGG | ❌ | ✅ Group-rescan |
| STDDEV / VARIANCE (all variants) | ❌ | ✅ Algebraic (O(1) sum-of-squares decomposition, v0.9.0) |
| MODE / PERCENTILE_CONT / PERCENTILE_DISC | ❌ | ✅ Group-rescan |
| CORR / COVAR / REGR_* (11 functions) | ❌ | ✅ Group-rescan |
| ANY_VALUE (PG 16+) | ❌ | ✅ Group-rescan |
| JSON_ARRAYAGG / JSON_OBJECTAGG (PG 16+) | ❌ | ✅ Group-rescan |
| User-defined aggregates (CREATE AGGREGATE) | ❌ | ✅ Group-rescan |
| FILTER (WHERE) clause | ❌ | ✅ |
| WITHIN GROUP (ORDER BY) | ❌ | ✅ Group-rescan |
| COUNT(DISTINCT expr) / SUM(DISTINCT expr) | ❌ | ? |
| Total | 5 | 60+ |

Gap for pg_ivm: Massive. Only 5 of ~60 built-in aggregate functions are supported. pg_trickle v0.9.0 also introduced algebraic (O(1)) maintenance for COUNT, SUM, AVG, STDDEV, and VARIANCE — meaning these aggregates update in constant time per changed row via running totals, whereas pg_ivm’s algebraic support is limited to COUNT, SUM, AVG. pg_trickle additionally supports user-defined aggregates via group-rescan and floating-point drift correction (pg_trickle.algebraic_drift_reset_cycles).
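
An illustrative query shape that pg_trickle can maintain but pg_ivm rejects (table and column names assumed):

SELECT region,
       COUNT(*) FILTER (WHERE amount > 100) AS big_orders,  -- FILTER clause
       STDDEV(amount) AS spread                             -- algebraic O(1) in v0.9.0
FROM orders
GROUP BY region;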

5.2 Joins

| Feature | pg_ivm | pg_trickle |
|---------|--------|------------|
| Inner join | ✅ | ✅ |
| Self-join | ✅ | ✅ |
| LEFT JOIN | ✅ (restricted) | ✅ (full) |
| RIGHT JOIN | ✅ (restricted) | ✅ (normalized to LEFT) |
| FULL OUTER JOIN | ✅ (restricted) | ✅ (8-part delta) |
| NATURAL JOIN | ? | ✅ |
| Cross join | ? | ✅ |
| Nested joins (3+ tables) | ✅ | ✅ |
| Non-equi joins (theta) | ? | ✅ |
| Outer join + aggregates | ❌ | ✅ |
| Outer join + subqueries | ❌ | ✅ |
| Outer join + CASE/non-strict | ❌ | ✅ |
| Outer join multi-condition | ❌ (single equality only) | ✅ |

Gap for pg_ivm: Outer joins are heavily restricted — single equijoin condition, no aggregates, no subqueries, no CASE expressions, no IS NULL in WHERE.

5.3 Subqueries

| Feature | pg_ivm | pg_trickle |
|---------|--------|------------|
| Simple subquery in FROM | ✅ (no aggregates/DISTINCT inside) | ✅ (full support) |
| EXISTS in WHERE | ✅ (AND only, no agg/DISTINCT inside) | ✅ (AND + OR, full SQL inside) |
| NOT EXISTS in WHERE | ❌ | ✅ (anti-join operator) |
| IN (subquery) | ❌ | ✅ (rewritten to semi-join) |
| NOT IN (subquery) | ❌ | ✅ (rewritten to anti-join) |
| ALL (subquery) | ❌ | ✅ (rewritten to anti-join) |
| Scalar subquery in SELECT | ❌ | ✅ (scalar subquery operator) |
| Scalar subquery in WHERE | ❌ | ✅ (auto-rewritten to CROSS JOIN) |
| LATERAL subquery in FROM | ❌ | ✅ (row-scoped recomputation) |
| LATERAL SRF in FROM | ❌ | ✅ (jsonb_array_elements, unnest, etc.) |
| Subqueries in OR | ❌ | ✅ (auto-rewritten to UNION) |
Gap for pg_ivm: Severely limited subquery support. No anti-joins, no scalar subqueries, no LATERAL, no SRFs.

5.4 CTEs

| Feature | pg_ivm | pg_trickle |
|---------|--------|------------|
| Simple non-recursive CTE | ✅ (no aggregates/DISTINCT inside) | ✅ (full SQL inside) |
| Multi-reference CTE | ? | ✅ (shared delta optimization) |
| Chained CTEs | ? | ✅ |
| WITH RECURSIVE | ❌ | ✅ (semi-naive, DRed, recomputation; IMMEDIATE mode with stack-depth warning) |

Gap for pg_ivm: No recursive CTEs, no aggregates/DISTINCT inside CTEs.

5.5 Set Operations

| Feature | pg_ivm | pg_trickle |
|---------|--------|------------|
| UNION ALL | ❌ | ✅ |
| UNION (set) | ❌ | ✅ (via DISTINCT + UNION ALL) |
| INTERSECT | ❌ | ✅ (dual-count multiplicity) |
| INTERSECT ALL | ❌ | ✅ |
| EXCEPT | ❌ | ✅ (dual-count multiplicity) |
| EXCEPT ALL | ❌ | ✅ |

Gap for pg_ivm: No set operations at all.

5.6 Window Functions

| Feature | pg_ivm | pg_trickle |
|---------|--------|------------|
| ROW_NUMBER, RANK, DENSE_RANK | ❌ | ✅ |
| SUM/AVG/COUNT OVER () | ❌ | ✅ |
| Frame clauses (ROWS/RANGE/GROUPS) | ❌ | ✅ |
| Named WINDOW clauses | ❌ | ✅ |
| PARTITION BY recomputation | ❌ | ✅ |

Gap for pg_ivm: Window functions are completely unsupported.

5.7 DISTINCT & Grouping

| Feature | pg_ivm | pg_trickle |
|---------|--------|------------|
| SELECT DISTINCT | ✅ | ✅ |
| DISTINCT ON (expr, ...) | ❌ | ✅ (auto-rewritten to ROW_NUMBER) |
| GROUP BY | ✅ | ✅ |
| GROUPING SETS | ❌ | ✅ (auto-rewritten to UNION ALL) |
| CUBE | ❌ | ✅ (auto-rewritten via GROUPING SETS) |
| ROLLUP | ❌ | ✅ (auto-rewritten via GROUPING SETS) |
| GROUPING() function | ❌ | ✅ |
| HAVING | ❌ | ✅ |

5.8 Source Table Types

Source type | pg_ivm | pg_trickle
Simple heap tables | ✅ | ✅
Views | ❌ | ✅ (auto-inlined)
Materialized views | ❌ | FULL mode only
Partitioned tables | ❌ | ✅
Partitions | ❌ | ✅ (via parent)
Foreign tables | ❌ | FULL mode only
Other IMMVs / stream tables | ❌ | ✅ (DAG cascading)

Gap for pg_ivm: Only simple heap tables. No views, no partitioned tables, no cascading.
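
A sketch of a view as a source (schema assumed): pg_trickle inlines the view definition into the stream table's plan, so changes to orders still flow through:

CREATE VIEW active_orders_v AS
    SELECT * FROM orders WHERE status = 'active';

SELECT pgtrickle.create_stream_table(
    'active_order_counts',
    'SELECT region, COUNT(*) AS n FROM active_orders_v GROUP BY region',
    schedule => '30s'
);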


6. API Comparison

pg_ivm API

-- Create an IMMV
SELECT pgivm.create_immv('myview', 'SELECT * FROM mytab');

-- Full refresh (emergency)
SELECT pgivm.refresh_immv('myview', true);   -- with data
SELECT pgivm.refresh_immv('myview', false);  -- disable maintenance

-- Inspect
SELECT immvrelid, pgivm.get_immv_def(immvrelid)
FROM pgivm.pg_ivm_immv;

-- Drop
DROP TABLE myview;

-- Rename
ALTER TABLE myview RENAME TO myview2;

pg_ivm IMMVs are standard PostgreSQL tables. They can be dropped with DROP TABLE and renamed with ALTER TABLE.

pg_trickle API

-- Create a stream table (AUTO mode: DIFFERENTIAL when possible, FULL fallback)
SELECT pgtrickle.create_stream_table(
    'order_totals',
    'SELECT region, SUM(amount) AS total FROM orders GROUP BY region'
    -- refresh_mode defaults to 'AUTO', schedule defaults to 'calculated'
);

-- Create a stream table (explicit deferred, scheduled)
SELECT pgtrickle.create_stream_table(
    'order_totals',
    'SELECT region, SUM(amount) AS total FROM orders GROUP BY region',
    schedule     => '2m',
    refresh_mode => 'DIFFERENTIAL'
);

-- Create a stream table (immediate, in-transaction)
SELECT pgtrickle.create_stream_table(
    'live_totals',
    'SELECT region, SUM(amount) AS total FROM orders GROUP BY region',
    schedule     => NULL,
    refresh_mode => 'IMMEDIATE'
);

-- Manual refresh
SELECT pgtrickle.refresh_stream_table('order_totals');

-- Alter schedule, mode, or defining query
SELECT pgtrickle.alter_stream_table('order_totals', schedule => '5m');
SELECT pgtrickle.alter_stream_table(
    'order_totals',
    query => 'SELECT region, SUM(amount) AS total FROM orders WHERE active GROUP BY region'
);

-- Drop
SELECT pgtrickle.drop_stream_table('order_totals');

-- Status and monitoring
SELECT * FROM pgtrickle.pgt_status();
SELECT * FROM pgtrickle.pg_stat_stream_tables;
SELECT * FROM pgtrickle.pgt_stream_tables;

-- DAG inspection
SELECT * FROM pgtrickle.pgt_dependencies;

-- Extended observability (added v0.2.0+)
SELECT * FROM pgtrickle.change_buffer_sizes();  -- CDC buffer health
SELECT * FROM pgtrickle.list_sources('order_totals');  -- source table stats
SELECT * FROM pgtrickle.dependency_tree();  -- ASCII DAG view
SELECT * FROM pgtrickle.health_check();  -- OK/WARN/ERROR triage
SELECT * FROM pgtrickle.refresh_timeline();  -- cross-stream history
SELECT * FROM pgtrickle.trigger_inventory();  -- CDC trigger audit
SELECT * FROM pgtrickle.diamond_groups();  -- diamond consistency groups

-- Source gating (v0.5.0)
SELECT pgtrickle.gate_source('orders');      -- pause CDC
SELECT pgtrickle.ungate_source('orders');    -- resume CDC
SELECT * FROM pgtrickle.source_gates();      -- gate status

-- Watermarks (v0.7.0)
SELECT pgtrickle.advance_watermark('orders', '2026-03-20 12:00:00');
SELECT pgtrickle.create_watermark_group('sync', ARRAY['orders','products'], 30);
SELECT * FROM pgtrickle.watermarks();
SELECT * FROM pgtrickle.watermark_status();

-- Parallel refresh monitoring (v0.4.0)
SELECT * FROM pgtrickle.worker_pool_status();
SELECT * FROM pgtrickle.parallel_job_status();

-- Refresh groups (v0.9.0)
SELECT pgtrickle.create_refresh_group('my_group', ARRAY['st1','st2']);
SELECT pgtrickle.drop_refresh_group('my_group');

-- Idempotent DDL (v0.6.0)
SELECT pgtrickle.create_or_replace_stream_table(
    'order_totals',
    'SELECT region, SUM(amount) AS total FROM orders GROUP BY region'
);

pg_trickle stream tables are regular PostgreSQL tables but are managed through the pgtrickle schema's API functions. They cannot be renamed with plain ALTER TABLE or dropped with plain DROP TABLE; use the API functions (alter_stream_table for schedule, mode, and query changes; drop_stream_table for removal).


7. Scheduling and Dependency Management

Capability | pg_ivm | pg_trickle
Automatic scheduling | ❌ (immediate only, no scheduler) | ✅ background worker
Manual refresh | refresh_immv() | refresh_stream_table()
Cron schedules | ❌ | ✅ (standard 5/6-field cron + aliases)
Duration-based staleness bounds | ❌ | ✅ ('30s', '5m', '1h', …)
Dependency DAG | ❌ | ✅ (stream tables can reference other stream tables)
Topological refresh ordering | ❌ | ✅ (upstream refreshes before downstream)
CALCULATED schedule propagation | ❌ | ✅ (consumers drive upstream schedules)
Parallel refresh | ❌ | ✅ (worker pool with database + cluster caps, v0.4.0)
Circular pipeline support | ❌ | ✅ (monotone cycles with fixed-point iteration, v0.7.0)
Watermark coordination | ❌ | ✅ (multi-source readiness gates, v0.7.0)
Refresh group management | ❌ | ✅ (atomic multi-ST refresh, v0.9.0)

pg_trickle's DAG scheduling is a significant differentiator: you can build multi-layer pipelines where each downstream stream table is automatically refreshed after its upstream dependencies.
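
A minimal two-layer sketch (orders schema assumed): weekly_totals reads the stream table daily_totals, and the scheduler refreshes them in topological order:

SELECT pgtrickle.create_stream_table(
    'daily_totals',
    'SELECT day, SUM(amount) AS total FROM orders GROUP BY day',
    schedule => '1m'
);

SELECT pgtrickle.create_stream_table(
    'weekly_totals',
    'SELECT date_trunc(''week'', day) AS week, SUM(total) AS total
     FROM daily_totals
     GROUP BY 1',
    schedule => '5m'
);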


8. Change Data Capture

Attribute | pg_ivm | pg_trickle
Mechanism | AFTER row triggers (inline, same txn) | AFTER row/statement triggers → change buffer
WAL-based CDC | ❌ | ✅ optional (pg_trickle.cdc_mode = 'wal')
Statement-level triggers | ❌ | ✅ (v0.4.0, reduced overhead for bulk operations)
Logical replication slots | Not used | Used in WAL mode only
Write-side overhead | Higher (view maintenance in txn) | Lower (small trigger insert only)
Change buffer tables | None (applied immediately) | pgtrickle_changes.changes_<oid>
TRUNCATE handling | IMMV truncated/refreshed synchronously | Change buffer cleared; full refresh queued
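
Switching capture modes is a configuration change, not a migration. A minimal sketch, assuming the pg_trickle.cdc_mode GUC from the table above can be set cluster-wide with ALTER SYSTEM:

ALTER SYSTEM SET pg_trickle.cdc_mode = 'wal';  -- or 'auto' to pick per source
SELECT pg_reload_conf();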

9. Concurrency and Isolation

pg_ivm

  • Holds ExclusiveLock on the IMMV during incremental update.
  • In READ COMMITTED: serializes concurrent updates to the same IMMV.
  • In REPEATABLE READ / SERIALIZABLE: raises an error when a concurrent transaction has already updated the IMMV.
  • Single-table INSERT-only IMMVs use the lighter RowExclusiveLock.

pg_trickle

  • Refresh operations acquire an advisory lock per stream table so only one refresh can run at a time.
  • Base table writes are never blocked by refresh operations.
  • Parallel refresh (v0.4.0): pg_trickle.parallel_refresh_mode = 'on' enables a worker pool with per-database (max_concurrent_refreshes, default 4) and cluster-wide (max_dynamic_refresh_workers) caps (see the configuration sketch after this list).
  • Atomic refresh groups for diamond dependencies.
  • Crash recovery: in-flight refreshes are marked failed on restart; the scheduler retries on the next cycle.
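
A minimal configuration sketch for that worker pool. The setting names follow this section; the pg_trickle. prefix on the two cap GUCs is an assumption, and the values are illustrative:

ALTER SYSTEM SET pg_trickle.parallel_refresh_mode = 'on';
ALTER SYSTEM SET pg_trickle.max_concurrent_refreshes = 4;      -- per-database cap (default 4)
ALTER SYSTEM SET pg_trickle.max_dynamic_refresh_workers = 16;  -- cluster-wide cap (assumed name)
SELECT pg_reload_conf();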

10. Observability

Feature | pg_ivm | pg_trickle
Catalog of managed views | pgivm.pg_ivm_immv | pgtrickle.pgt_stream_tables
Per-refresh timing/history | ❌ | pgtrickle.pgt_refresh_history
Staleness reporting | ❌ | stale column + get_staleness()
Scheduler status | ❌ | pgtrickle.pgt_status()
NOTIFY-based alerting | ❌ | pgtrickle_refresh channel (10+ alert types)
Error tracking | ❌ | ✅ consecutive error counter, last error message
dbt integration | ❌ | dbt-pgtrickle macro package
Explain/introspection | ❌ | explain_st
CDC buffer health | ❌ | pgtrickle.change_buffer_sizes() (v0.2.0)
Source table stats | ❌ | pgtrickle.list_sources() (v0.2.0)
Dependency tree view | ❌ | pgtrickle.dependency_tree() (v0.2.0)
Health triage | ❌ | pgtrickle.health_check() (v0.2.0)
Cross-stream refresh history | ❌ | pgtrickle.refresh_timeline() (v0.2.0)
CDC trigger audit | ❌ | pgtrickle.trigger_inventory() (v0.2.0)
Diamond group inspection | ❌ | pgtrickle.diamond_groups() (v0.2.0)
Quick health summary | ❌ | pgtrickle.quick_health view (v0.5.0)
Source gating status | ❌ | pgtrickle.source_gates() (v0.5.0)
Watermark monitoring | ❌ | pgtrickle.watermarks() / watermark_status() (v0.7.0)
Parallel worker status | ❌ | pgtrickle.worker_pool_status() / parallel_job_status() (v0.4.0)
SCC cycle status | ❌ | pgtrickle.pgt_scc_status() (v0.7.0)
Replication slot health | ❌ | pgtrickle.slot_health()
CDC mode per-source | ❌ | pgtrickle.pgt_cdc_status view

11. Installation and Deployment

Attribute | pg_ivm | pg_trickle
Pre-built packages | RPM via yum.postgresql.org | OCI image, tarball
CNPG / Kubernetes | ❌ (no OCI image) | ✅ OCI extension image + CNPG smoke tests
Docker local dev | Manual | ✅ documented + Docker Hub image
shared_preload_libraries | Required (or session_preload_libraries) | Required
Extension upgrade scripts | ✅ (1.0 → 1.1 → … → 1.13) | ✅ (0.1.3 → … → 0.9.0, CI completeness check, upgrade E2E tests)
pg_dump / restore | Manual IMMV recreation required | ✅ Standard pg_dump supported (v0.8.0)

12. Performance Characteristics

pg_ivm

  • Write path: slower — every DML statement triggers inline view maintenance. From the README example: a single row update on a 10M-row join IMMV takes ~15 ms vs ~9 ms for a plain table update.
  • Read path: instant — IMMV is always current, no refresh needed on read.
  • Refresh (full): comparable to REFRESH MATERIALIZED VIEW (~20 seconds for a 10M-row join in the example).

pg_trickle

  • Write path: minimal overhead — only a small trigger INSERT into the change buffer (~2–50 μs per row). In WAL mode, zero trigger overhead. Statement-level CDC triggers (v0.4.0) further reduce overhead for bulk ops.
  • Read path: instant from the materialized table (potentially stale).
  • Refresh (differential): proportional to the number of changed rows, not the total table size. A single-row change on a million-row aggregate touches one row's worth of computation. Algebraic aggregates (v0.9.0) like COUNT/SUM/AVG/STDDEV/VAR update in O(1) constant time per changed row.
  • Refresh (full): re-runs the entire query; comparable to REFRESH MATERIALIZED VIEW.
  • Parallel refresh (v0.4.0): linear speedup with worker pool size.
  • I/O optimizations (v0.9.0): column skipping, source skipping in joins, WHERE filter push-down, index-aware MERGE for tiny change ratios, scalar subquery short-circuit.

13. Known Limitations

pg_ivm Limitations

  • Adds latency to every write on tracked base tables.
  • Cannot track tables modified via logical replication (subscriber nodes are not updated).
  • pg_dump / pg_upgrade require manual recreation of all IMMVs.
  • Limited aggregate support (no user-defined aggregates, no window functions).
  • Column type restrictions (btree operator class required in target list).
  • No scheduler or background worker — refresh is immediate only.
  • On high-churn tables, min/max aggregates can trigger expensive rescans.

pg_trickle Limitations

  • In DIFFERENTIAL/FULL mode, data is stale between refresh cycles. Use IMMEDIATE mode for zero-staleness, in-transaction consistency.
  • Recursive CTEs in IMMEDIATE mode emit a stack-depth warning; very deep recursion may hit PostgreSQL's stack limit.
  • Recursive CTEs in DIFFERENTIAL mode fall back to full recomputation for mixed DELETE/UPDATE changes (DRed scheduled for v0.10.0+).
  • LIMIT without ORDER BY is not supported in defining queries.
  • OFFSET without ORDER BY … LIMIT is not supported. Paged TopK (ORDER BY … LIMIT N OFFSET M) is fully supported.
  • ORDER BY + LIMIT (TopK) without OFFSET uses scoped recomputation (MERGE).
  • Volatile SQL functions rejected in DIFFERENTIAL mode.
  • Materialized views as sources not supported in DIFFERENTIAL mode.
  • Window functions in expressions (e.g. CASE WHEN ROW_NUMBER() OVER (...) > 5) require FULL mode (see the sketch after this list).
  • Foreign tables as sources require FULL mode.
  • ALTER EXTENSION pg_trickle UPDATE migration scripts ship from v0.2.1; continuous upgrade path through v0.9.0.
  • Targets PostgreSQL 18 only for now; backports to PG 14–18 are planned, and PG 13 will not be supported.
  • v0.9.x series — extensive testing but not yet production-hardened at scale.
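
As a sketch of the FULL-mode escape hatch referenced above (orders schema assumed), a window function inside an expression is accepted once the refresh mode is forced:

SELECT pgtrickle.create_stream_table(
    'flagged_orders',
    'SELECT id, region, amount,
            CASE WHEN ROW_NUMBER() OVER (PARTITION BY region
                                         ORDER BY amount DESC) <= 5
                 THEN true ELSE false END AS in_top5
     FROM orders',
    schedule     => '5m',
    refresh_mode => 'FULL'
);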

14. PostgreSQL Version Support

PG version | pg_ivm | pg_trickle (current) | pg_trickle (planned)
PG 13 | ✅ | ❌ | ❌ (EOL Nov 2025)
PG 14 | ✅ | ❌ | ✅ (full plan)
PG 15 | ✅ | ❌ | ✅ (full plan)
PG 16 | ✅ | ❌ | ✅ (MVP target)
PG 17 | ✅ | ❌ | ✅ (MVP target)
PG 18 | ✅ | ✅ | ✅

Planned resolution (PLAN_PG_BACKCOMPAT.md):

  • Minimum viable (PG 16–18): ~1.5 weeks effort.
  • Full target (PG 14–18): ~2.5–3 weeks effort.
  • pgrx 0.17.0 already supports PG 14–18 via feature flags.
  • ~435 lines in src/dvm/parser.rs need #[cfg] gating (all in JSON/SQL-standard sections). The remaining ~13,500 lines compile unchanged.

Feature degradation matrix:

Feature | PG 14 | PG 15 | PG 16 | PG 17 | PG 18
Core streaming tables | ✅ | ✅ | ✅ | ✅ | ✅
Trigger-based CDC | ✅ | ✅ | ✅ | ✅ | ✅
Differential refresh | ✅ | ✅ | ✅ | ✅ | ✅
SQL/JSON constructors | ❌ | ❌ | ✅ | ✅ | ✅
JSON_TABLE | ❌ | ❌ | ❌ | ✅ | ✅
WAL-based CDC | Needs test | Needs test | Likely | Likely | ✅

15. Features Unique to Each System

Features Unique to pg_trickle (42 items, no pg_ivm equivalent)

  1. IMMEDIATE + deferred modes (pg_ivm is immediate-only; pg_trickle offers both)
  2. 60+ aggregate functions (vs 5), including algebraic O(1) for COUNT/SUM/AVG/STDDEV/VAR
  3. FILTER / HAVING / WITHIN GROUP on aggregates
  4. Window functions (partition recomputation)
  5. Set operations (UNION ALL, UNION, INTERSECT, EXCEPT — all 6 variants)
  6. Recursive CTEs (semi-naive, DRed, recomputation; including IMMEDIATE mode with stack-depth warning)
  7. LATERAL subqueries and SRFs (jsonb_array_elements, unnest, JSON_TABLE)
  8. Anti-join / semi-join operators (NOT EXISTS, NOT IN, IN, EXISTS with full SQL)
  9. Scalar subqueries in SELECT list
  10. Views as sources (auto-inlined with nested expansion)
  11. Partitioned table support (RANGE, LIST, HASH with auto-rebuild on ATTACH PARTITION)
  12. Cascading stream tables (ST referencing other STs via DAG)
  13. Background scheduler (cron + duration + canonical periods) with multi-database auto-discovery
  14. GROUPING SETS / CUBE / ROLLUP (auto-rewritten)
  15. DISTINCT ON (auto-rewritten to ROW_NUMBER)
  16. Hybrid CDC (trigger → WAL transition)
  17. DDL change detection and automatic reinitialization (including ALTER FUNCTION body changes)
  18. Monitoring suite (15+ observability functions: change_buffer_sizes, list_sources, dependency_tree, health_check, refresh_timeline, trigger_inventory, diamond_groups, source_gates, watermarks, watermark_groups, watermark_status, worker_pool_status, parallel_job_status, pgt_scc_status, slot_health, check_cdc_health)
  19. Auto-rewrite pipeline (6 transparent SQL rewrites)
  20. Volatile function detection
  21. AUTO refresh mode (smart DIFFERENTIAL/FULL selection with transparent fallback)
  22. ALTER QUERY — change the defining query of an existing stream table online, with schema-change classification and OID-preserving migration
  23. dbt macro package (materialization, status macro, health test, refresh operation)
  24. CNPG / Kubernetes deployment
  25. SQL/JSON constructors (JSON_OBJECT, JSON_ARRAY, etc.)
  26. JSON_TABLE support (PG 17+)
  27. TopK stream tables (ORDER BY + LIMIT, including IMMEDIATE mode via micro-refresh)
  28. Paged TopK (ORDER BY + LIMIT + OFFSET for server-side pagination)
  29. Diamond dependency consistency (multi-path refresh atomicity with SAVEPOINT)
  30. Extension upgrade infrastructure (SQL migration scripts, CI completeness check, upgrade E2E tests, per-release SQL baselines)
  31. Row Level Security (refreshes see all data; RLS policies on ST itself; IMMEDIATE mode secured; internal change buffers shielded from RLS interference) (v0.5.0)
  32. Source gating (pause/resume CDC for bulk loads: gate_source, ungate_source) (v0.5.0)
  33. Append-only fast path (append_only => true skips merge for INSERT-only tables) (v0.5.0)
  34. Parallel refresh (background worker pool with per-database and cluster-wide caps, atomic groups for diamond dependencies) (v0.4.0)
  35. Statement-level CDC triggers (reduced write-side overhead for bulk operations) (v0.4.0)
  36. Circular pipeline support (monotone cycles with fixed-point iteration, max_fixpoint_iterations safety limit, SCC status monitoring) (v0.7.0)
  37. Watermark APIs (delay refresh until multi-source data is ready: advance_watermark, create_watermark_group, tolerance-based readiness) (v0.7.0)
  38. pg_dump / pg_restore support (safe backup with auto-reconnect of streams) (v0.8.0)
  39. Algebraic aggregate maintenance (O(1) constant-time updates for COUNT/SUM/AVG/STDDEV/VAR with floating-point drift correction) (v0.9.0)
  40. Refresh group management (create_refresh_group, drop_refresh_group for atomic multi-ST refresh) (v0.9.0)
  41. Automatic backoff (exponential slowdown for overloaded streams) (v0.9.0)
  42. Index-aware MERGE (use index lookups for tiny change ratios) (v0.9.0)

Features Unique to pg_ivm (with planned resolutions)

# | Feature | Status | Ref
1 | Immediate (synchronous) maintenance | Closed — IMMEDIATE refresh mode fully implemented (all phases) | PLAN_TRANSACTIONAL_IVM
2 | Auto-index creation on GROUP BY / DISTINCT / PK | Postponed (Phase 2 of transactional IVM) | PLAN_TRANSACTIONAL_IVM §5.2
3 | TRUNCATE propagation (auto-truncate IMMV) | Closed — IMMEDIATE mode fires full refresh on TRUNCATE | PLAN_TRANSACTIONAL_IVM §3.2
4 | Row Level Security respect | Closed — v0.5.0: refreshes see all data; RLS on ST itself; IMMEDIATE mode secured; change buffers shielded | ROW_LEVEL_SECURITY.md
5 | PostgreSQL 13–17 support | PG 14–18 backcompat planned (~2.5–3 weeks) | PLAN_PG_BACKCOMPAT
6 | session_preload_libraries | Not applicable (background worker needs shared_preload) | —
7 | Rename via ALTER TABLE | Event trigger support (low effort) | —
8 | Drop via DROP TABLE | Postponed (Phase 2 of transactional IVM) | PLAN_TRANSACTIONAL_IVM §4.3
9 | Extension upgrade scripts | Closed — scripts ship from v0.2.1; CI completeness check and upgrade E2E tests in place | —
10 | pg_dump / pg_restore | Closed — v0.8.0: safe backup with pg_dump and pg_restore, auto-reconnect streams | —

Of the 10 items, 5 are now closed (immediate maintenance, TRUNCATE, RLS, upgrade scripts, pg_dump), 3 have concrete implementation plans, and 2 are low-priority or not applicable.


16. Use-Case Fit

Scenario | Recommended
Need views consistent within the same transaction | Either (pg_trickle IMMEDIATE mode or pg_ivm)
Application cannot tolerate any view staleness | Either (pg_trickle IMMEDIATE mode or pg_ivm)
High write throughput, views can be slightly stale | pg_trickle (DIFFERENTIAL mode)
Multi-layer summary pipelines with dependencies | pg_trickle
Time-based or cron-driven refresh schedules | pg_trickle
Views with complex SQL (window functions, CTEs, UNION) | pg_trickle
Simple aggregation with zero-staleness requirement | Either (pg_trickle has richer SQL coverage)
Kubernetes / CloudNativePG deployment | pg_trickle
dbt integration | pg_trickle
Circular / self-referencing pipelines | pg_trickle
Multi-source watermark coordination | pg_trickle
High-throughput bulk loading (append-only) | pg_trickle (append-only fast path)
Row Level Security on analytical summaries | pg_trickle (richer RLS model)
pg_dump / pg_restore workflow | pg_trickle
PostgreSQL 13–17 | pg_ivm
PostgreSQL 18 | pg_trickle (superset of pg_ivm)
Production-hardened, stable API | pg_ivm
Early adopter, rich SQL coverage needed | pg_trickle

17. Coexistence

The two extensions can be installed in the same database simultaneously — they use different schemas (pgivm vs pgtrickle/pgtrickle_changes) and do not interfere with each other. However, with pg_trickle's IMMEDIATE mode now available and its dramatically broader feature set (v0.9.0), there is little reason to use both:

  • Use pg_trickle IMMEDIATE for small, critical lookup tables that must be perfectly consistent within transactions — the use-case that previously required pg_ivm (see the sketch after this list).
  • Use pg_trickle DIFFERENTIAL/FULL for large analytical summary tables, multi-layer aggregation pipelines, circular pipelines, or views where slight staleness is acceptable.
  • Use pg_trickle AUTO (default) to let the system choose the best strategy.
  • Use pg_ivm only if you need PostgreSQL 13–17 support or prefer its mature, battle-tested codebase.
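
A sketch of that division of labour, using the documented API (the fx_rates and orders schemas are invented for the example):

-- Critical lookup: zero staleness, maintained inside the writing transaction.
SELECT pgtrickle.create_stream_table(
    'fx_rates_current',
    'SELECT currency, rate FROM fx_rates WHERE valid_to IS NULL',
    schedule     => NULL,
    refresh_mode => 'IMMEDIATE'
);

-- Analytical rollup: slight staleness acceptable, cheap deferred deltas.
SELECT pgtrickle.create_stream_table(
    'revenue_by_region',
    'SELECT region, SUM(amount) AS revenue FROM orders GROUP BY region',
    schedule     => '1m',
    refresh_mode => 'DIFFERENTIAL'
);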

18. Recommendations

Planned work that closes pg_ivm gaps

Priority | Item | Plan | Effort | Closes Gaps
✅ Done | IMMEDIATE refresh mode (all phases) | PLAN_TRANSACTIONAL_IVM | Complete | #1 (immediate maintenance), #3 (TRUNCATE)
✅ Done | Extension upgrade scripts | v0.2.1 release | Complete | #9 (upgrade scripts)
✅ Done | Row Level Security | v0.5.0 release | Complete | #4 (RLS)
✅ Done | pg_dump / pg_restore | v0.8.0 release | Complete | #10 (backup/restore)
Postponed | pg_ivm compatibility layer | PLAN_TRANSACTIONAL_IVM Phase 2 | Deferred to post-1.0 | #2 (auto-indexing), #7 (rename), #8 (DROP TABLE)
High | PG 16–18 backcompat (MVP) | PLAN_PG_BACKCOMPAT §11 | ~1.5 weeks | #5 (PG version support)
Medium | PG 14–18 backcompat (full) | PLAN_PG_BACKCOMPAT §5 | ~2.5–3 weeks | #5 (PG version support)

Remaining small gaps (no existing plan)

Priority | Item | Description | Effort
Low | ALTER TABLE RENAME | Detect rename via event trigger, update catalog | 2–4h

Not worth pursuing

Item | Reason
PG 13 support | EOL since November 2025. Incompatible raw_parser() API.
session_preload_libraries | Requires background worker, which needs shared_preload_libraries.

19. Conclusion

pg_trickle covers all of pg_ivm's SQL surface and extends it dramatically with 55+ additional aggregate functions (including algebraic O(1) maintenance for COUNT/SUM/AVG/STDDEV/VAR), window functions, set operations, recursive CTEs, LATERAL support, anti/semi-joins, circular pipeline support, watermark coordination, parallel refresh, Row Level Security, and a comprehensive operational layer.

The immediate maintenance gap is now fully closed: pg_trickle's IMMEDIATE refresh mode provides the same in-transaction consistency as pg_ivm, while also supporting window functions, LATERAL, scalar subqueries, WITH RECURSIVE (IM1), TopK micro-refresh (IM2), and cascading stream tables in IMMEDIATE mode — all of which pg_ivm cannot do.

The upgrade infrastructure gap is also closed: v0.2.1 ships SQL migration scripts with continuous upgrade path through v0.9.0, a CI completeness checker, and upgrade E2E tests, matching pg_ivm's upgrade path story.

The Row Level Security gap is closed (v0.5.0): refreshes see all data, RLS policies on the stream table itself control access, and IMMEDIATE mode is secured with shielded change buffers.

The pg_dump/restore gap is closed (v0.8.0): safe backup with standard PostgreSQL tools and automatic stream reconnection on restore.

The one remaining structural gap is PG version support:

  • PLAN_PG_BACKCOMPAT details backporting to PG 14–18 (or PG 16–18 as an MVP) in ~2.5–3 weeks, primarily by #[cfg]-gating ~435 lines of JSON/SQL-standard parse-tree code.

Once backcompat is implemented, pg_trickle will be a strict superset of pg_ivm in every dimension: same immediate maintenance model, comparable PG version support (14–18 vs 13–18, with PG 13 EOL), dramatically wider SQL coverage (60+ aggregates vs 5, 21 DVM operators, 42 unique features), and a complete operational layer that pg_ivm entirely lacks.

For users migrating from pg_ivm, the IMMEDIATE refresh mode already provides the same zero-staleness guarantee. A full compatibility layer (pgivm.create_immv, pgivm.refresh_immv, pgivm.pg_ivm_immv) is planned for post-1.0 to enable zero-change migration.



Research: Custom SQL Syntax Options

This document surveys custom-syntax extensions considered for pg_trickle (e.g. CREATE STREAM TABLE) and the tradeoffs against the current function-based API (pgtrickle.create_stream_table()). It is intended for contributors and language/parser research.

User documentation on SQL functions is in SQL Reference.


Abstract

pg_trickle deliberately chose a function-based API (pgtrickle.create_stream_table()) over custom DDL syntax (CREATE STREAM TABLE …) for three reasons: PostgreSQL offers no extension point for new top-level statement types short of patching the core parser; function calls are portable across every client library and ORM without driver-level changes; and they compose naturally with PL/pgSQL, transaction blocks, and conditional DDL patterns.

This research surveys the three realistic implementation routes — grammar patches, ProcessUtility hooks, and comment-driven DDL shims — and quantifies the upgrade maintenance burden of each against the stable, zero-patch approach of function calls. The findings strongly favour the current design for an extension targeting production deployments.

The document also catalogues prior art: pg_partman's run_maintenance() model, timescaledb's grammar extension (and its associated patch surface), and pg_ivm's function-only API. For pg_trickle's scale and lifecycle goals, the function-based API remains the correct long-term choice. A lightweight CREATE STREAM TABLE compatibility shim delivered via a PL/pgSQL wrapper is noted as a viable opt-in convenience without any parser modifications.


Custom SQL Syntax for PostgreSQL Extensions

Comprehensive Technical Research Report

Date: 2026-02-25
Context: pg_trickle extension — evaluating approaches to support CREATE STREAM TABLE syntax or equivalent native-feeling DDL.


Table of Contents

  1. Executive Summary
  2. PostgreSQL Parser Hooks / Utility Hooks
  3. The ProcessUtility_hook Approach
  4. Raw Parser Extension (gram.y)
  5. The Utility Command Approach
  6. Custom Access Methods (CREATE ACCESS METHOD)
  7. Table Access Method API (PostgreSQL 12+)
  8. Foreign Data Wrapper Approach
  9. Event Triggers
  10. TimescaleDB Continuous Aggregates Pattern
  11. Citus Distributed DDL Pattern
  12. PostgreSQL 18 New Features
  13. COMMENT / OPTIONS Abuse Pattern
  14. pg_ivm (Incremental View Maintenance) Pattern
  15. CREATE TABLE ... USING (Table Access Methods) Deep Dive
  16. Comparison Matrix
  17. Recommendations for pg_trickle

1. Executive Summary

PostgreSQL's parser is not extensible — there is no parser hook that allows extensions to add new grammar rules. This is a fundamental design constraint. Every approach to "custom DDL syntax" in extensions falls into one of two categories:

  1. Intercept existing syntax — Use ProcessUtility_hook or event triggers to intercept standard DDL (e.g., CREATE TABLE, CREATE VIEW) and augment its behavior.
  2. Use a SQL function as the DDL interface — Define SELECT my_extension.create_thing(...) as the user-facing API (this is what pg_trickle currently does).

No production PostgreSQL extension ships truly new SQL grammar without forking the PostgreSQL parser. TimescaleDB, Citus, pg_ivm, and others all work within existing syntax boundaries.


2. PostgreSQL Parser Hooks / Utility Hooks

Available Hook Points

PostgreSQL provides several hook function pointers that extensions can override in _PG_init():

Hook | Header | Purpose
ProcessUtility_hook | tcop/utility.h | Intercept utility (DDL) statement execution
post_parse_analyze_hook | parser/analyze.h | Inspect/modify the analyzed parse tree after semantic analysis
planner_hook | optimizer/planner.h | Replace or augment the query planner
ExecutorStart_hook | executor/executor.h | Intercept executor startup
ExecutorRun_hook | executor/executor.h | Intercept executor row processing
ExecutorFinish_hook | executor/executor.h | Intercept executor finish
ExecutorEnd_hook | executor/executor.h | Intercept executor cleanup
object_access_hook | catalog/objectaccess.h | Notifications when objects are created/modified/dropped
emit_log_hook | utils/elog.h | Intercept log messages

What's Missing: No Parser Hook

There is no parser_hook or raw_parser_hook. The raw parser (the scan.l Flex lexer feeding the gram.y Bison grammar) is compiled into the PostgreSQL server binary. Extensions cannot:

  • Add new keywords (e.g., STREAM)
  • Add new grammar productions (e.g., CREATE STREAM TABLE)
  • Modify the tokenizer/lexer
  • Intercept raw SQL text before parsing

The closest hook is post_parse_analyze_hook, which fires after the SQL has already been parsed and analyzed. By this point:

  • The SQL string has already been tokenized and parsed by gram.y
  • A parse tree (Query node) has been produced
  • If the SQL contains unknown syntax, a syntax error has already been raised

Technical Details of post_parse_analyze_hook

/* In src/backend/parser/analyze.c */
typedef void (*post_parse_analyze_hook_type)(ParseState *pstate,
                                             Query *query,
                                             JumbleState *jstate);
post_parse_analyze_hook_type post_parse_analyze_hook = NULL;

Extensions can set this in _PG_init():

static post_parse_analyze_hook_type prev_post_parse_analyze_hook = NULL;

static void my_post_parse_analyze(ParseState *pstate, Query *query, JumbleState *jstate) {
    /* inspect or rewrite the already-analyzed Query here, then chain */
    if (prev_post_parse_analyze_hook)
        prev_post_parse_analyze_hook(pstate, query, jstate);
}

void _PG_init(void) {
    prev_post_parse_analyze_hook = post_parse_analyze_hook;
    post_parse_analyze_hook = my_post_parse_analyze;
}

Use cases: Query rewriting after parsing (e.g., adding security predicates, row-level security), statistics collection, plan caching invalidation. Not usable for new syntax because parsing has already completed.

Pros/Cons

Aspect | Assessment
Native syntax | Impossible — cannot add new grammar
Intercept existing DDL | Yes, via ProcessUtility_hook
Modify parsed queries | Yes, via post_parse_analyze_hook
Complexity | Low for hooking, but limited in capability
PG version | All modern versions (hooks stable since PG 9.x)
Maintenance | Very low — hook signatures rarely change

3. The ProcessUtility_hook Approach

How It Works

ProcessUtility_hook is the most powerful DDL interception point. It fires for every "utility statement" (DDL, COPY, EXPLAIN, etc.) after parsing but before execution.

typedef void (*ProcessUtility_hook_type)(PlannedStmt *pstmt,
                                         const char *queryString,
                                         bool readOnlyTree,
                                         ProcessUtilityContext context,
                                         ParamListInfo params,
                                         QueryEnvironment *queryEnv,
                                         DestReceiver *dest,
                                         QueryCompletion *qc);

An extension can:

  1. Inspect the parse tree node — The PlannedStmt->utilityStmt field contains the parsed DDL node (e.g., CreateStmt, AlterTableStmt, ViewStmt).
  2. Modify the parse tree — Change fields before passing to the standard handler.
  3. Replace execution entirely — Skip calling the standard handler and do something else.
  4. Post-process — Call the standard handler first, then do additional work.
  5. Block execution — Raise an error to prevent the DDL.

What Extensions Use This

Extension | What they intercept | Purpose
TimescaleDB | CREATE TABLE, ALTER TABLE, DROP TABLE, CREATE INDEX, etc. | Convert regular tables to hypertables, distribute DDL
Citus | Most DDL statements | Propagate DDL to worker nodes
pg_partman | CREATE TABLE, partition DDL | Auto-manage partitioning
pg_stat_statements | All utility statements | Track DDL execution statistics
pgAudit | All utility statements | Audit logging
pg_hint_plan | — | Uses post_parse_analyze_hook instead
sepgsql | Object creation/modification | Security label enforcement

Can It Handle New Syntax?

No. It can only intercept DDL that PostgreSQL's parser already understands. You cannot use ProcessUtility_hook to handle CREATE STREAM TABLE because the parser will reject that syntax before the hook is ever called.

However, it can intercept and augment existing syntax:

  • CREATE TABLE ... (some_option) → Intercept CreateStmt, check for special markers, do extra work
  • CREATE VIEW ... WITH (custom_option = true) → Intercept ViewStmt, check reloptions
  • CREATE MATERIALIZED VIEW ... WITH (custom = true) → Same approach

Pattern: Intercepting CREATE TABLE

static void my_process_utility(PlannedStmt *pstmt, ...) {
    Node *parsetree = pstmt->utilityStmt;

    if (IsA(parsetree, CreateStmt)) {
        CreateStmt *stmt = (CreateStmt *) parsetree;
        // Check for a special reloption or table name pattern
        ListCell *lc;
        foreach(lc, stmt->options) {
            DefElem *opt = (DefElem *) lfirst(lc);
            if (strcmp(opt->defname, "stream") == 0) {
                // This is a stream table! Do custom logic.
                create_stream_table_from_ddl(stmt, queryString);
                return; // Don't call standard handler
            }
        }
    }

    // Pass through to standard handler
    if (prev_ProcessUtility)
        prev_ProcessUtility(pstmt, ...);
    else
        standard_ProcessUtility(pstmt, ...);
}

Pros/Cons

Aspect | Assessment
Native CREATE STREAM TABLE | No — parser rejects unknown syntax
CREATE TABLE ... WITH (stream=true) | Yes — feasible via reloptions
Complexity | Medium — must carefully chain with other extensions
PG version | All modern versions
Maintenance | Low — hook signature changes rarely (changed in PG14, PG15)
Risk | Must always chain prev_ProcessUtility — misbehaving can break other extensions

4. Raw Parser Extension (gram.y)

How It Works

PostgreSQL's SQL parser is a Bison-generated LALR(1) parser defined in:

  • src/backend/parser/gram.y — Grammar rules (~18,000 lines)
  • src/backend/parser/scan.l — Flex lexer (tokenizer)
  • src/include/parser/kwlist.h — Reserved/unreserved keyword list

To add CREATE STREAM TABLE, you would:

  1. Add STREAM to the keyword list (unreserved or reserved)
  2. Add grammar rules to gram.y:
    CreateStreamTableStmt:
        CREATE STREAM TABLE qualified_name '(' OptTableElementList ')'
        OptWith AS SelectStmt
        {
            CreateStreamTableStmt *n = makeNode(CreateStreamTableStmt);
            n->relation = $4;
            n->query = $9;
            /* ... */
            $$ = (Node *) n;
        }
    ;
    
  3. Add a new NodeTag for CreateStreamTableStmt
  4. Handle it in ProcessUtility
  5. Rebuild the PostgreSQL server

Implications

This requires forking PostgreSQL. The modified parser is compiled into the postgres binary. You cannot ship a grammar modification as a loadable extension (.so/.dylib).

Who Does This?

  • YugabyteDB — Fork of PG with custom grammar for distributed features
  • CockroachDB — Entirely custom parser (Go, not PG's Bison grammar)
  • Amazon Aurora (partially) — Custom grammar additions for Aurora-specific features
  • Greenplum — Fork of PG with added grammar for DISTRIBUTED BY, PARTITION BY etc.
  • ParadeDB — Fork of PG with some custom syntax additions

Pros/Cons

Aspect | Assessment
Native CREATE STREAM TABLE | Yes — full parser-level support
Complexity | Very high — must maintain a PG fork
PG version | Tied to a single PG version
Maintenance | Extremely high — must rebase on every PG release (gram.y changes significantly between major versions)
Distribution | Cannot use CREATE EXTENSION; must ship entire modified PostgreSQL
User adoption | Very low — users must replace their PostgreSQL installation
psql autocomplete | Would work, with matching psql modifications
pg_dump/pg_restore | Broken unless you also modify those tools
Verdict: Not viable for an extension. Only viable for a PostgreSQL fork/distribution.


5. The Utility Command Approach

How It Works

Some sources reference a "custom utility command" mechanism. In practice, this does not exist as a formal PostgreSQL extension point. What people sometimes mean is one of:

5a. Using DO Blocks as Custom Commands

DO $$ BEGIN PERFORM pgtrickle.create_stream_table('my_st', 'SELECT ...'); END $$;

This is just a wrapped function call — not a real custom command.

5b. Abusing COMMENT or SET for Command Dispatch

Some extensions parse custom commands from strings:

-- Using SET to pass commands
SET myext.command = 'CREATE STREAM TABLE my_st AS SELECT ...';
SELECT myext.execute_pending_command();

Or using post_parse_analyze_hook to intercept a specially-formatted query:

-- Extension intercepts this via post_parse_analyze_hook
SELECT * FROM myext.dispatch('CREATE STREAM TABLE ...');

5c. Overloading Existing Syntax

Some extensions overload SELECT or CALL:

CALL pgtrickle.create_stream_table('my_st', $$SELECT ...$$);

CALL was introduced in PostgreSQL 11 for stored procedures. Using it makes the DDL feel more "command-like" than SELECT function().

Pros/Cons

Aspect | Assessment
Native syntax | No — still a function call in disguise
User experience | Moderate — CALL is better than SELECT
Complexity | Low
PG version | PG 11+ for CALL
Maintenance | Very low

6. Custom Access Methods (CREATE ACCESS METHOD)

How It Works

PostgreSQL supports extension-defined access methods (index AMs and table AMs):

CREATE ACCESS METHOD my_am TYPE TABLE HANDLER my_am_handler;

This was introduced in PostgreSQL 9.6 for index AMs and extended to table AMs in PostgreSQL 12. The CREATE ACCESS METHOD statement shows PostgreSQL's philosophy: extensions can define new implementations of existing concepts (tables, indexes) but not new concepts (stream tables).

Table AM vs. Index AM

Type | Since | Handler Signature | Example
Index AM | PG 9.6 | IndexAmRoutine with scan/insert/delete callbacks | bloom, brin, GiST
Table AM | PG 12 | TableAmRoutine with 60+ callbacks | heap (default), columnar (Citus), zedstore (experimental)

Can We Use This for Stream Tables?

The table AM API defines how tuples are stored and retrieved, not how tables are created or maintained. A stream table's key features are:

  • Defining query — Not part of the table AM concept
  • Automatic refresh — Not part of the table AM concept
  • Change tracking — Could partially overlap with table AM's tuple modification callbacks
  • Storage — The actual storage could use heap (default) AM

You could theoretically create a custom table AM that:

  1. Uses heap storage underneath
  2. Intercepts INSERT/UPDATE/DELETE to maintain change buffers
  3. Adds custom metadata

But this would be an extreme abuse of the API. Table AMs are meant for storage engines, not for implementing materialized view semantics.

Pros/Cons

Aspect | Assessment
Native syntax | No — CREATE TABLE ... USING my_am is the closest
Complexity | Extremely high — 60+ callbacks to implement
Fitness | Poor — table AM is about storage, not view maintenance
PG version | PG 12+
Maintenance | High — AM API evolves between major versions

7. Table Access Method API (PostgreSQL 12+)

Deep Technical Details

The Table Access Method (AM) API was introduced in PostgreSQL 12 via commit c2fe139c20 by Andres Freund. It abstracts the storage layer, allowing extensions to replace the default heap storage with custom implementations.

The CREATE TABLE ... USING Syntax

-- Use default AM (heap)
CREATE TABLE normal_table (id int, data text);

-- Use custom AM
CREATE TABLE my_table (id int, data text) USING my_custom_am;

-- Set default for a database
SET default_table_access_method = 'my_custom_am';

TableAmRoutine Structure

The handler function must return a TableAmRoutine struct with callbacks:

typedef struct TableAmRoutine {
    NodeTag type;

    /* Slot callbacks */
    const TupleTableSlotOps *(*slot_callbacks)(Relation rel);

    /* Scan callbacks */
    TableScanDesc (*scan_begin)(Relation rel, Snapshot snap, int nkeys, ...);
    void (*scan_end)(TableScanDesc scan);
    void (*scan_rescan)(TableScanDesc scan, ...);
    bool (*scan_getnextslot)(TableScanDesc scan, ...);

    /* Parallel scan */
    Size (*parallelscan_estimate)(Relation rel);
    Size (*parallelscan_initialize)(Relation rel, ...);
    void (*parallelscan_reinitialize)(Relation rel, ...);

    /* Index fetch */
    IndexFetchTableData *(*index_fetch_begin)(Relation rel);
    void (*index_fetch_reset)(IndexFetchTableData *data);
    void (*index_fetch_end)(IndexFetchTableData *data);
    bool (*index_fetch_tuple)(IndexFetchTableData *data, ...);

    /* Tuple modification */
    void (*tuple_insert)(Relation rel, TupleTableSlot *slot, ...);
    void (*tuple_insert_speculative)(Relation rel, ...);
    void (*tuple_complete_speculative)(Relation rel, ...);
    void (*multi_insert)(Relation rel, TupleTableSlot **slots, int nslots, ...);
    TM_Result (*tuple_delete)(Relation rel, ItemPointer tid, ...);
    TM_Result (*tuple_update)(Relation rel, ItemPointer otid, ...);
    TM_Result (*tuple_lock)(Relation rel, ItemPointer tid, ...);

    /* DDL callbacks */
    void (*relation_set_new_filelocator)(Relation rel, ...);
    void (*relation_nontransactional_truncate)(Relation rel);
    void (*relation_copy_data)(Relation rel, const RelFileLocator *newrlocator);
    void (*relation_copy_for_cluster)(Relation rel, ...);
    void (*relation_vacuum)(Relation rel, VacuumParams *params, ...);
    bool (*scan_analyze_next_block)(TableScanDesc scan, ...);
    bool (*scan_analyze_next_tuple)(TableScanDesc scan, ...);

    /* Planner support */
    void (*relation_estimate_size)(Relation rel, int32 *attr_widths, ...);

    /* ... more callbacks */
} TableAmRoutine;

Hybrid Approach: Table AM + ProcessUtility_hook

A more practical pattern:

  1. Register a custom table AM (e.g., stream_am) that wraps heap
  2. Use ProcessUtility_hook to intercept CREATE TABLE ... USING stream_am
  3. When detected, perform stream table registration (catalog, CDC, etc.)
  4. The actual storage uses standard heap via delegation
-- User writes:
CREATE TABLE order_totals (region text, total numeric)
    USING stream_am
    WITH (query = 'SELECT region, SUM(amount) FROM orders GROUP BY region',
          schedule = '1m',
          refresh_mode = 'DIFFERENTIAL');

Problems with This Approach

  1. Column list is mandatory — CREATE TABLE ... USING requires explicit column definitions. Stream tables should derive columns from the query.
  2. Query in WITH clause — Storing a full SQL query in reloptions is hacky and has length limits.
  3. AS SELECT is one-shot — CREATE TABLE ... AS SELECT does accept a USING clause (PG 12+), but the query is executed once at creation; nothing in the table AM contract retains the defining query for ongoing maintenance.
  4. VACUUM, ANALYZE complexity — Must implement or delegate all maintenance callbacks.
  5. pg_dump compatibility — pg_dump would dump CREATE TABLE ... USING stream_am but not the associated metadata (query, schedule, etc.)

Pros/Cons

Aspect | Assessment
Native syntax | Partial — CREATE TABLE ... USING stream_am
Feels like a stream table | No — still looks like a regular table with options
Complexity | Very high
pg_dump | Broken — metadata in catalog tables won't be dumped
PG version | PG 12+
Maintenance | High — table AM API changes between versions

8. Foreign Data Wrapper Approach

How It Works

Foreign Data Wrappers (FDW) allow PostgreSQL to access external data sources via CREATE FOREIGN TABLE. An extension can register a custom FDW:

CREATE EXTENSION pg_trickle;
CREATE SERVER stream_server FOREIGN DATA WRAPPER pgtrickle_fdw;

CREATE FOREIGN TABLE order_totals (region text, total numeric)
    SERVER stream_server
    OPTIONS (
        query 'SELECT region, SUM(amount) FROM orders GROUP BY region',
        schedule '1m',
        refresh_mode 'DIFFERENTIAL'
    );

FDW API

The FDW API provides callbacks for:

  • GetForeignRelSize — Estimate relation size for planning
  • GetForeignPaths — Generate access paths
  • GetForeignPlan — Create a plan node
  • BeginForeignScan — Start scan
  • IterateForeignScan — Get next tuple
  • EndForeignScan — End scan
  • AddForeignUpdatePaths — Support INSERT/UPDATE/DELETE (optional)

How It Could Work for Stream Tables

  1. Define a custom FDW (pgtrickle_fdw)
  2. The FDW's scan callbacks read from the underlying storage table
  3. ProcessUtility_hook intercepts CREATE FOREIGN TABLE ... SERVER stream_server to set up CDC, catalog entries, etc.
  4. A background worker handles refresh scheduling

Problems

  1. Foreign tables have restrictions — Cannot have indexes, constraints, triggers, or participate in inheritance. This severely limits usability.
  2. Query planner limitations — Foreign tables use a separate planning path with potentially worse plan quality.
  3. No MVCC — Foreign tables typically don't provide snapshot isolation semantics.
  4. User model confusion — "Foreign table" implies external data, not a derived view.
  5. EXPLAIN output — Shows "Foreign Scan" instead of "Seq Scan", confusing users.
  6. pg_dump — Foreign tables are dumped, but server/FDW setup may not transfer correctly.
  7. Two-step creation — Requires CREATE SERVER before CREATE FOREIGN TABLE.

Pros/Cons

Aspect | Assessment
Native syntax | Partial — CREATE FOREIGN TABLE with options
Feels like a stream table | No — foreign tables have different semantics
Index support | No — major limitation
Trigger support | No — major limitation
Complexity | Medium
PG version | PG 9.1+
Maintenance | Low — FDW API is very stable

Verdict: Not suitable. The restrictions on foreign tables (no indexes, no triggers) make this impractical for stream tables that need to behave like regular tables.


9. Event Triggers

How It Works

Event triggers fire on DDL events at the database level:

CREATE EVENT TRIGGER my_trigger ON ddl_command_end
    WHEN TAG IN ('CREATE TABLE', 'ALTER TABLE', 'DROP TABLE')
    EXECUTE FUNCTION my_handler();

Available events:

  • ddl_command_start — Before DDL execution (PG 9.3+)
  • ddl_command_end — After DDL execution (PG 9.3+)
  • sql_drop — When objects are dropped (PG 9.3+)
  • table_rewrite — When a table is rewritten (PG 9.5+)

Inside the Handler

CREATE FUNCTION my_handler() RETURNS event_trigger AS $$
DECLARE
    obj record;
BEGIN
    FOR obj IN SELECT * FROM pg_event_trigger_ddl_commands()
    LOOP
        -- obj.objid, obj.object_type, obj.command_tag, etc.
        IF obj.command_tag = 'CREATE TABLE' AND obj.object_type = 'table' THEN
            -- Check if this table has a special marker
            -- (e.g., a specific reloption or comment)
        END IF;
    END LOOP;
END;
$$ LANGUAGE plpgsql;

Pattern: CREATE TABLE + Event Trigger

  1. User creates a table with a special comment or option:
    CREATE TABLE order_totals (region text, total numeric);
    COMMENT ON TABLE order_totals IS 'pgtrickle:query=SELECT region...;schedule=1m';
    
  2. Event trigger on ddl_command_end fires
  3. Handler parses the comment, detects stream table intent
  4. Handler registers the stream table in the catalog

Limitations

  1. Cannot modify the DDL — Event triggers observe DDL, they can't change what happened. On ddl_command_end, the table already exists.
  2. Cannot prevent DDL — On ddl_command_start, you can raise an error to prevent it, but you can't redirect it.
  3. Two-step process — User must first CREATE TABLE and then mark it somehow (comment, option, separate function call).
  4. No custom syntax — Event triggers watch existing DDL commands.
  5. pg_trickle already uses this — For DDL tracking on upstream tables (see hooks.rs).

Pros/Cons

Aspect | Assessment
Native syntax | No — watches existing DDL only
Complexity | Low
Can transform DDL | No — observe only
PG version | PG 9.3+
Maintenance | Very low
pg_trickle usage | Already used for upstream DDL tracking

10. TimescaleDB Continuous Aggregates Pattern

How It Works

TimescaleDB continuous aggregates (caggs) demonstrate the most sophisticated approach to custom DDL-like syntax in a PostgreSQL extension. Their evolution is instructive.

Phase 1: Pure Function API (early versions)

-- Create a view, then register it
CREATE VIEW daily_temps AS
SELECT time_bucket('1 day', time) AS day, AVG(temp)
FROM conditions GROUP BY 1;

SELECT add_continuous_aggregate_policy('daily_temps', ...);

Phase 2: CREATE MATERIALIZED VIEW WITH (introduced in TimescaleDB 2.0)

CREATE MATERIALIZED VIEW daily_temps
WITH (timescaledb.continuous) AS
SELECT time_bucket('1 day', time) AS day, device_id, AVG(temp)
FROM conditions
GROUP BY 1, 2;

How the Hook Chain Works

TimescaleDB's approach uses layered hooks:

  1. ProcessUtility_hook intercepts CREATE MATERIALIZED VIEW
  2. Checks reloptions for timescaledb.continuous in the WithClause
  3. If found:
    • Does NOT call standard ProcessUtility for the matview
    • Instead creates a regular hypertable (the materialization)
    • Creates an internal view (the user-facing query interface)
    • Registers refresh policies in the catalog
    • Sets up continuous aggregate metadata
  4. For REFRESH MATERIALIZED VIEW, intercepts and routes to their refresh engine
  5. For DROP MATERIALIZED VIEW, intercepts and cleans up all artifacts

The Magic: Reloptions as Extension Point

PostgreSQL's CREATE MATERIALIZED VIEW ... WITH (option = value) passes options as DefElem nodes in the parse tree. The parser treats these as generic key-value pairs — it does NOT validate the option names. This is the key insight: PostgreSQL's parser accepts arbitrary options in WITH clauses.

// In ProcessUtility_hook:
if (IsA(parsetree, CreateTableAsStmt)) {
    CreateTableAsStmt *stmt = (CreateTableAsStmt *) parsetree;
    if (stmt->objtype == OBJECT_MATVIEW) {
        // Check for our custom option in stmt->into->options
        bool is_continuous = false;
        ListCell *lc;
        foreach(lc, stmt->into->options) {   /* options live on the IntoClause, not the RangeVar */
            DefElem *opt = (DefElem *) lfirst(lc);
            if (strcmp(opt->defname, "timescaledb.continuous") == 0) {
                is_continuous = true;
                break;
            }
        }
        if (is_continuous) {
            // Handle as continuous aggregate
            return;
        }
    }
}

Refresh Policies

-- Add a refresh policy (function call, not DDL)
SELECT add_continuous_aggregate_policy('daily_temps',
    start_offset => INTERVAL '1 month',
    end_offset => INTERVAL '1 day',
    schedule_interval => INTERVAL '1 hour');

What pg_trickle Could Learn

The TimescaleDB pattern for pg_trickle would look like:

-- Option A: CREATE MATERIALIZED VIEW with custom option
CREATE MATERIALIZED VIEW order_totals
WITH (pgtrickle.stream = true, pgtrickle.schedule = '1m', pgtrickle.mode = 'DIFFERENTIAL')
AS SELECT region, SUM(amount) FROM orders GROUP BY region;

-- Option B: CREATE TABLE with custom option (less natural)
CREATE TABLE order_totals (region text, total numeric)
WITH (pgtrickle.stream = true);
-- Then separately: SELECT pgtrickle.set_query('order_totals', 'SELECT ...');

Pros/Cons

Aspect | Assessment
Native syntax | Good — CREATE MATERIALIZED VIEW ... WITH (pgtrickle.stream) looks natural
User experience | Very good — familiar DDL syntax with extension options
Complexity | High — must implement full ProcessUtility_hook chain
pg_dump | Partial — matview DDL is dumped, but custom metadata needs pg_dump extension or config tables
PG version | PG 9.3+ (matviews), PG 12+ (better option handling)
Maintenance | Medium — must track changes to matview creation internals
Shared preload | Required — ProcessUtility_hook needs shared_preload_libraries

11. Citus Distributed DDL Pattern

How It Works

Citus (now part of Microsoft) demonstrates another approach to extending DDL behavior:

ProcessUtility_hook Chain

Citus has one of the most comprehensive ProcessUtility_hook implementations:

void multi_ProcessUtility(PlannedStmt *pstmt, ...) {
    // 1. Classify the DDL
    Node *parsetree = pstmt->utilityStmt;

    // 2. Check if it affects distributed tables
    if (IsA(parsetree, AlterTableStmt)) {
        // Propagate ALTER TABLE to all worker nodes
        PropagateAlterTable((AlterTableStmt *)parsetree, queryString);
    }

    // 3. Call standard handler (or skip for intercepted commands)
    if (prev_ProcessUtility)
        prev_ProcessUtility(pstmt, ...);
    else
        standard_ProcessUtility(pstmt, ...);

    // 4. Post-processing
    if (IsA(parsetree, CreateStmt)) {
        // Check if we should auto-distribute this table
    }
}

Table Distribution via Function Calls

Citus does NOT add custom DDL syntax. Distribution is done via function calls:

-- Create a regular table
CREATE TABLE events (id bigint, data jsonb, created_at timestamptz);

-- Distribute it (function call, not DDL)
SELECT create_distributed_table('events', 'id');

-- Or create a reference table
SELECT create_reference_table('lookups');

Columnar Storage via Table AM

Citus also provides columnar storage as a table AM:

CREATE TABLE analytics_data (...)
    USING columnar;

This uses the table AM API (PostgreSQL 12+) — see Section 7.

What Citus Teaches Us

  • Function calls for complex operations — create_distributed_table() is analogous to pgtrickle.create_stream_table().
  • ProcessUtility_hook for DDL propagation — Intercept standard DDL and add behavior.
  • Table AM for storage — Separate concern from distribution logic.
  • No custom syntax — Even with Microsoft's resources, Citus doesn't fork the parser.

Pros/Cons

Aspect | Assessment
Native syntax | No — uses function calls like pg_trickle
Approach validated | Yes — Citus is used at massive scale with this pattern
Complexity | Medium (function API) to high (ProcessUtility_hook)
User adoption | Proven successful
Maintenance | Low for function API

12. PostgreSQL 18 New Features

Relevant Extension Points in PG 18

PostgreSQL 18 (released 2025) includes several features relevant to this analysis:

12a. Virtual Generated Columns

PG 18 adds GENERATED ALWAYS AS (expr) VIRTUAL columns. Not directly relevant to stream tables, but shows PostgreSQL's willingness to expand CREATE TABLE syntax incrementally.

12b. Improved Table AM API

PG 18 refines the table AM API with better TOAST handling and improved parallel scan support. This makes custom table AMs slightly more practical.

12c. Enhanced Event Trigger Information

PG 18 expands pg_event_trigger_ddl_commands() with additional metadata fields, making event-trigger-based approaches more capable.

12d. pg_stat_io Improvements

Enhanced I/O statistics infrastructure that could benefit monitoring of stream table refresh operations.

12e. No New Parser Extension Points

PostgreSQL 18 does not add any parser extension mechanism. The parser remains monolithic and non-extensible. There have been occasional discussions on pgsql-hackers about parser hooks, but no concrete proposals have been accepted.

12f. No Custom DDL Extension Points

No new general-purpose DDL extension points beyond the existing hook system.

Looking Forward: Discussion on pgsql-hackers

There have been recurring threads on pgsql-hackers about:

  • Extension-defined SQL syntax — Rejected due to complexity and parser architecture
  • Loadable parser modules — Theoretical discussions, no implementation
  • Extension catalogs — Some interest in allowing extensions to register custom catalogs

None of these are implemented in PG 18.

Pros/Cons

Aspect | Assessment
New syntax extension points | None in PG 18
Table AM improvements | Minor — slightly easier to implement
Event trigger improvements | Minor — more metadata available
Parser extensibility | Not planned for any upcoming PG release

13. COMMENT / OPTIONS Abuse Pattern

How It Works

Several extensions use table comments or reloptions as a "poor man's metadata" to tag tables with custom semantics.

Pattern 1: COMMENT-based

CREATE TABLE order_totals (region text, total numeric);
COMMENT ON TABLE order_totals IS '@pgtrickle {"query": "SELECT ...", "schedule": "1m"}';

An event trigger or background worker scans pg_description for tables with the @pgtrickle prefix and processes them.
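
A scanner for that convention needs only the standard catalogs; the '@pgtrickle' prefix itself is the hypothetical convention described above:

SELECT c.oid::regclass AS table_name,
       d.description   AS raw_tag
FROM pg_class c
JOIN pg_description d
  ON  d.objoid   = c.oid
  AND d.classoid = 'pg_class'::regclass
  AND d.objsubid = 0            -- table comment, not a column comment
WHERE c.relkind = 'r'
  AND d.description LIKE '@pgtrickle%';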

Pattern 2: Reloptions-based

CREATE TABLE order_totals (region text, total numeric)
    WITH (fillfactor = 70, pgtrickle.stream = true);

Problem: PostgreSQL validates reloptions against a known list at execution time. You cannot use arbitrary options in WITH (...) without registering them. Extensions can register custom reloptions for their own access methods via the add_*_reloption() family (access/reloptions.h), but this is a relatively obscure API.
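
A quick demonstration on stock PostgreSQL; the error below is what current releases emit for an unregistered option namespace:

CREATE TABLE t (i int) WITH (pgtrickle.stream = true);
-- ERROR:  unrecognized parameter namespace "pgtrickle"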

Pattern 3: GUC-based Tagging

-- Set a GUC that our ProcessUtility_hook reads
SET pgtrickle.next_create_is_stream = true;
SET pgtrickle.stream_query = 'SELECT region, SUM(amount) FROM orders GROUP BY region';

-- Hook intercepts this CREATE TABLE and registers it
CREATE TABLE order_totals (region text, total numeric);

-- Reset
RESET pgtrickle.next_create_is_stream;

This is extremely hacky but has been used in practice (some partitioning extensions used similar patterns before native partitioning).

Who Uses This?

  • pgmemcache — Uses comments to configure caching behavior
  • Some row-level security extensions — Comments to define policies
  • pg_partman — Uses a configuration table (not comments) but similar concept

Pros/Cons

Aspect | Assessment
Native syntax | No — abuses existing mechanisms
User experience | Poor — fragile, easy to break by editing comments
Complexity | Low
pg_dump | COMMENT is dumped — metadata survives pg_dump/restore
Robustness | Low — comments can be accidentally changed
PG version | All versions

14. pg_ivm (Incremental View Maintenance) Pattern

How It Works

pg_ivm is the most directly comparable extension to pg_trickle. It implements incremental view maintenance for PostgreSQL.

API Design

pg_ivm uses a pure function-call API:

-- Create an incrementally maintainable materialized view
SELECT create_immv('order_totals', 'SELECT region, SUM(amount) FROM orders GROUP BY region');

-- Refresh
SELECT refresh_immv('order_totals');

-- Drop
DROP TABLE order_totals;  -- Just drop the underlying table

Key function: create_immv(name, query) — Creates an "Incrementally Maintainable Materialized View" (IMMV).

Internal Implementation

  1. create_immv() is a SQL function (not a hook)
  2. It parses the query, creates a storage table, sets up triggers on source tables
  3. IMMVs are stored as regular tables with metadata in a custom catalog (pg_ivm_immv)
  4. Triggers on source tables automatically update the IMMV on DML

No ProcessUtility_hook

pg_ivm does not use ProcessUtility_hook. It operates entirely through:

  • SQL functions (create_immv, refresh_immv)
  • Row-level triggers for automatic maintenance
  • A custom catalog table for metadata

Why No Custom Syntax?

pg_ivm was developed as a proof-of-concept for PostgreSQL core IVM support. The authors explicitly chose function-call syntax to:

  1. Avoid shared_preload_libraries requirement (hooks need it)
  2. Keep the extension simple and portable
  3. Focus on the IVM algorithm, not the user interface

Eventually Merged to Core?

There was discussion about upstreaming IVM to PostgreSQL core. If merged, it would get proper syntax (CREATE INCREMENTAL MATERIALIZED VIEW). As an extension, it stays with function calls.

Relevance to pg_trickle

pg_trickle's current API (pgtrickle.create_stream_table()) follows the exact same pattern as pg_ivm. This is the established approach for IVM extensions.

Pros/Cons

| Aspect | Assessment |
|---|---|
| Native syntax | No — function calls |
| Complexity | Low — simple function API |
| shared_preload_libraries | Not required for basic function API |
| pg_dump | No — function calls are not dumped; must use custom dump/restore |
| User experience | Moderate — familiar to pg_ivm users |
| Community acceptance | Established pattern for IVM extensions |

15. CREATE TABLE ... USING (Table Access Methods) Deep Dive

Full Syntax

CREATE TABLE tablename (
    column1 datatype,
    column2 datatype,
    ...
) USING access_method_name
  WITH (storage_parameter = value, ...);

How the Parser Handles USING

In gram.y:

CreateStmt: CREATE OptTemp TABLE ...
    OptTableAccessMethod OptWith ...

OptTableAccessMethod:
    USING name    { $$ = $2; }
    | /* empty */ { $$ = NULL; }
    ;

The USING clause sets CreateStmt->accessMethod to the access method name string.

How ProcessUtility Handles It

In DefineRelation() (src/backend/commands/tablecmds.c):

  1. If accessMethod is specified, look it up in pg_am
  2. Verify it's a table AM (not an index AM)
  3. Store the AM OID in pg_class.relam
  4. Use the AM's callbacks for all subsequent operations
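
For context, registering a table AM in the first place is ordinary SQL (PG 12+); the handler function below is hypothetical and would come from the extension's shared library:

-- Hypothetical handler name; a real handler must return type table_am_handler.
CREATE FUNCTION stream_heap_handler(internal)
    RETURNS table_am_handler
    AS 'pg_trickle', 'stream_heap_handler'
    LANGUAGE C STRICT;

CREATE ACCESS METHOD stream_heap TYPE TABLE HANDLER stream_heap_handler;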

Custom Reloptions with Table AMs

Table AMs can define custom reloptions via:

static relopt_parse_elt stream_relopt_tab[] = {
    {"query", RELOPT_TYPE_STRING, offsetof(StreamOptions, query)},
    {"schedule", RELOPT_TYPE_STRING, offsetof(StreamOptions, schedule)},
    {"refresh_mode", RELOPT_TYPE_STRING, offsetof(StreamOptions, refresh_mode)},
};

This would allow:

CREATE TABLE order_totals (region text, total numeric)
    USING stream_heap
    WITH (query = 'SELECT ...', schedule = '1m', refresh_mode = 'DIFFERENTIAL');

Problems Specific to Stream Tables

  1. Column derivation — Stream tables derive columns from the query. CREATE TABLE ... USING requires explicit column definitions, creating redundancy and potential inconsistency.

  2. AS SELECT runs only once — the grammar does accept combining USING with AS SELECT (PG 12+):

    CREATE TABLE order_totals
        USING stream_heap
        AS SELECT region, SUM(amount) FROM orders GROUP BY region;

    but the query executes exactly once at creation time and is not retained anywhere the access method can read it, so a hook is still needed to capture the defining query for ongoing maintenance.
    
  3. Full AM implementation required — Even if you delegate to heap, you must implement all callbacks and handle edge cases.

  4. VACUUM/ANALYZE — Must properly delegate to heap for these to work.

  5. Replication — Logical replication assumes heap tuples; custom AMs may break.

Hybrid Practical Approach

If pursuing this route:

-- Step 1: Set default AM
SET default_table_access_method = 'stream_heap';

-- Step 2: Create with query in options
CREATE TABLE order_totals ()
    WITH (pgtrickle.query = 'SELECT region, SUM(amount) FROM orders GROUP BY region',
          pgtrickle.schedule = '1m');

-- ProcessUtility_hook would:
-- 1. Detect USING stream_heap (or detect our custom reloptions)
-- 2. Parse the query from options
-- 3. Derive columns from the query
-- 4. Create the actual table with proper columns using heap AM
-- 5. Register in pgtrickle catalog
-- 6. Set up CDC

Pros/Cons

| Aspect | Assessment |
|---|---|
| Native syntax | Partial — CREATE TABLE ... USING stream_heap WITH (...) |
| Column derivation | Not supported — must specify columns or use hook magic |
| Complexity | Very high |
| pg_dump | Good — CREATE TABLE ... USING is properly dumped |
| PG version | PG 12+ |
| Maintenance | High — AM API changes between versions |

16. Comparison Matrix

| Approach | Native Syntax | Complexity | pg_dump | PG Version | Maintenance | Recommended |
|---|---|---|---|---|---|---|
| Function API (current) | No | Low | No* | Any | Very Low | Yes |
| ProcessUtility_hook + MATVIEW WITH | Good | High | Partial | 9.3+ | Medium | Maybe |
| Raw parser fork | Perfect | Very High | No | Fork only | Very High | No |
| Table AM USING | Partial | Very High | Yes | 12+ | High | No |
| FDW FOREIGN TABLE | Partial | Medium | Yes | 9.1+ | Low | No |
| Event triggers alone | No | Low | No | 9.3+ | Low | No |
| COMMENT abuse | No | Low | Yes | Any | Low | No |
| GUC + CREATE TABLE hack | No | Medium | Partial | Any | Medium | No |
| TimescaleDB pattern (MATVIEW + WITH) | Good | High | Partial | 9.3+ | Medium | Best option |

* pg_dump support can be approximated with a wrapper script or an event-trigger-based re-registration step at restore time (see pg_dump / pg_restore Strategy below).


17. Recommendations for pg_trickle

Current Approach: Function API (Keep and Enhance)

pg_trickle's current approach (pgtrickle.create_stream_table('name', 'query', ...)) is:

  • Proven — Same pattern as pg_ivm, Citus, and many other extensions
  • Simple — No shared_preload_libraries required for basic usage
  • Maintainable — No hook chains to debug
  • Portable — Works on any PG version that supports pgrx

Enhancement opportunities:

-- Current
SELECT pgtrickle.create_stream_table('order_totals',
    'SELECT region, SUM(amount) FROM orders GROUP BY region', '1m');

-- Enhanced: CALL syntax for a more DDL-like feel (PG 11+; CALL requires a procedure)
CALL pgtrickle.create_stream_table('order_totals',
    $$SELECT region, SUM(amount) FROM orders GROUP BY region$$, '1m');
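
CALL only invokes procedures, and a procedure cannot share a name and signature with the existing function, so this enhancement implies a thin wrapper. A minimal sketch, with an assumed wrapper name:

CREATE PROCEDURE pgtrickle.create_stream_table_proc(
    st_name text, st_query text, st_schedule text)
LANGUAGE plpgsql
AS $$
BEGIN
    -- Delegate to the existing function API.
    PERFORM pgtrickle.create_stream_table(st_name, st_query, st_schedule);
END;
$$;

CALL pgtrickle.create_stream_table_proc('order_totals',
    $q$SELECT region, SUM(amount) FROM orders GROUP BY region$q$, '1m');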

Future Option: TimescaleDB-style Materialized View Integration

If user demand justifies the complexity, pg_trickle could add a second creation path via ProcessUtility_hook:

-- New native-feeling syntax (requires shared_preload_libraries)
CREATE MATERIALIZED VIEW order_totals
WITH (pgtrickle.stream = true, pgtrickle.schedule = '1m')
AS SELECT region, SUM(amount) FROM orders GROUP BY region
WITH NO DATA;

-- Original function API still works (no hook needed)
SELECT pgtrickle.create_stream_table('order_totals',
    'SELECT region, SUM(amount) FROM orders GROUP BY region', '1m');

Implementation plan for hook-based approach:

  1. Register ProcessUtility_hook in _PG_init() (already needed for shared_preload_libraries)
  2. Intercept CREATE MATERIALIZED VIEW → Check for pgtrickle.stream option
  3. If found: parse options, call create_stream_table_impl() internally, create standard storage table instead of matview
  4. Intercept DROP MATERIALIZED VIEW → Check if target is a stream table → Clean up
  5. Intercept REFRESH MATERIALIZED VIEW → Route to stream table refresh engine
  6. Intercept ALTER MATERIALIZED VIEW → Route to stream table alter logic

Estimated complexity: ~800-1200 lines of Rust hook code + tests.

Rejected alternatives:

  • Forking PostgreSQL for custom grammar — Maintenance cost is prohibitive
  • Table AM approach — Complexity without proportional benefit
  • FDW approach — Too many restrictions on foreign tables
  • COMMENT abuse — Fragile and poor UX

pg_dump / pg_restore Strategy

Regardless of approach, pg_dump is a challenge. Options:

  1. Custom dump/restore functions — pgtrickle.dump_config() and pgtrickle.restore_config()
  2. Migration script generation — pgtrickle.generate_migration() outputs SQL to recreate all stream tables
  3. Event trigger on restore — Detect when tables are restored and re-register them
  4. Sidecar file — Generate a companion SQL file alongside pg_dump
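
Illustrative use of option 2, assuming the proposed pgtrickle.generate_migration() returns plain SQL (psql session):

-- Before dumping, materialize the stream-table definitions as SQL:
\copy (SELECT pgtrickle.generate_migration()) TO 'stream_tables.sql'

-- After restoring the base schema:
\i stream_tables.sql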

Appendix A: Hook Registration in pgrx (Rust)

For reference, here's how ProcessUtility_hook registration works in pgrx:

use pgrx::prelude::*;
use pgrx::pg_sys;

static mut PREV_PROCESS_UTILITY_HOOK: pg_sys::ProcessUtility_hook_type = None;

#[pg_guard]
pub extern "C-unwind" fn my_process_utility(
    pstmt: *mut pg_sys::PlannedStmt,
    query_string: *const std::os::raw::c_char,
    read_only_tree: bool,
    context: pg_sys::ProcessUtilityContext,
    params: pg_sys::ParamListInfo,
    query_env: *mut pg_sys::QueryEnvironment,
    dest: *mut pg_sys::DestReceiver,
    qc: *mut pg_sys::QueryCompletion,
) {
    // SAFETY: pstmt is a valid pointer provided by PostgreSQL
    let stmt = unsafe { (*pstmt).utilityStmt };

    // Check if this is a CreateTableAsStmt (materialized view)
    if unsafe { pgrx::is_a(stmt, pg_sys::NodeTag::T_CreateTableAsStmt) } {
        // Check for our custom options...
    }

    // Chain to previous hook or standard handler
    unsafe {
        if let Some(prev) = PREV_PROCESS_UTILITY_HOOK {
            prev(pstmt, query_string, read_only_tree, context,
                 params, query_env, dest, qc);
        } else {
            pg_sys::standard_ProcessUtility(
                pstmt, query_string, read_only_tree, context,
                params, query_env, dest, qc);
        }
    }
}

pub fn register_hooks() {
    unsafe {
        PREV_PROCESS_UTILITY_HOOK = pg_sys::ProcessUtility_hook;
        pg_sys::ProcessUtility_hook = Some(my_process_utility);
    }
}

Appendix B: Key Source Files in PostgreSQL

| File | Purpose |
|---|---|
| src/backend/parser/gram.y | SQL grammar (~18,000 lines) |
| src/backend/parser/scan.l | Lexer/tokenizer |
| src/include/parser/kwlist.h | Keyword definitions |
| src/backend/tcop/utility.c | ProcessUtility() — DDL dispatcher |
| src/backend/commands/tablecmds.c | CREATE/ALTER/DROP TABLE implementation |
| src/backend/commands/createas.c | CREATE TABLE AS / CREATE MATVIEW AS |
| src/include/access/tableam.h | Table Access Method API |
| src/include/foreign/fdwapi.h | FDW API |
| src/backend/commands/event_trigger.c | Event trigger infrastructure |

Appendix C: References

  1. PostgreSQL Documentation — Table Access Method Interface
  2. PostgreSQL Documentation — Event Triggers
  3. PostgreSQL Documentation — Writing A Foreign Data Wrapper
  4. TimescaleDB Source — process_utility.c
  5. Citus Source — multi_utility.c
  6. pg_ivm Source — createas.c
  7. pgrx Documentation — Hooks
  8. PostgreSQL Wiki — CustomScanProviders

Research: Triggers vs. WAL Replication for CDC

This document analyses the architectural tradeoffs between trigger-based CDC (pg_trickle's default) and WAL logical-replication CDC. It provides the engineering rationale behind ADR-001 and ADR-002.

User-facing CDC documentation is in CDC Modes.


Abstract

Change-data capture (CDC) is the mechanism by which pg_trickle discovers which rows in a source table changed between refresh cycles. Two fundamentally different approaches exist inside PostgreSQL: row/statement-level AFTER triggers that fire synchronously inside the source transaction, and logical replication (WAL decoding) that reads the write-ahead log asynchronously after the transaction commits.

pg_trickle's default CDC mode is trigger-based (ADR-001, v0.1.0) for a single decisive reason: trigger-based CDC delivers transactional atomicity — the change is captured in the same transaction that made it, under the same locks, with no possibility of a committed write being invisible to the next refresh cycle. WAL-based CDC always has a decoding lag; a crash between transaction commit and WAL decode can lose a change window. For an IVM system where data correctness is non-negotiable, the trigger approach eliminates an entire class of consistency hazards.

The tradeoffs are real: trigger-based CDC adds 5–20% write-side latency on bulk DML (mitigated by statement-level triggers since v0.4.0), requires an additional change-buffer table per source, and cannot capture TRUNCATE or changes applied with triggers disabled (for example, pg_restore --disable-triggers or direct file-level manipulation). WAL-based CDC has zero write-side overhead but requires wal_level = logical, is affected by slot lag under write storms, and has a non-trivial failure-recovery surface. This document quantifies both approaches with benchmarks, defines the boundary conditions under which WAL mode is preferable (append-only high-throughput sources with relaxed latency requirements), and explains pg_trickle's hybrid auto mode, which starts with triggers and promotes to WAL when conditions are right.


Triggers vs Logical Replication for CDC in pg_trickle

Status: Evaluation Report (updated with implementation status)
Date: 2026-02-24
Context: ADR-001/ADR-002 in PLAN_ADRS.md · PLAN_USER_TRIGGERS_EXPLICIT_DML.md


Executive Summary

pg_trickle uses row-level AFTER triggers to capture changes on source tables. This report evaluates the trigger-based approach against logical replication (WAL-based CDC) across five dimensions: correctness, performance, operations, and two end-user features — user-defined triggers on stream tables and logical replication subscriptions from stream tables.

Conclusion: Triggers remain the correct choice for the current scope given operational simplicity and zero-config deployment. The hybrid approach — trigger bootstrap for creation with automatic WAL transition for steady-state — is now implemented (pg_trickle.cdc_mode GUC, src/wal_decoder.rs). User-defined triggers on stream tables are also implemented (pg_trickle.user_triggers GUC, DISABLE TRIGGER USER during refresh). These were previously recommendations (§6.2, §6.6); both are now shipped.

However, the atomicity constraint — the original reason for choosing triggers — is primarily a creation-time inconvenience, not a steady-state limitation. Once a stream table exists, logical replication has three significant runtime advantages:

  • No write-side overhead — With triggers, every INSERT/UPDATE/DELETE on a tracked source table does extra work before the application's transaction can commit: it runs a PL/pgSQL function, writes a row into a buffer table, and updates an index. This slows down the application. With logical replication, PostgreSQL already writes every change to its internal transaction log (WAL) regardless — the CDC layer simply reads that log after the fact, so the application's writes are not slowed down at all.

  • TRUNCATE capture — When someone runs TRUNCATE on a source table, row-level triggers do not fire (TRUNCATE replaces the entire file rather than deleting rows one-by-one). This leaves stream tables silently stale until a manual refresh. Logical replication captures TRUNCATE natively from the WAL, so pg_trickle would know immediately that all rows were removed.

  • Change ordering from the transaction log — With triggers, each trigger independently calls pg_current_wal_lsn() to timestamp its change. With logical replication, the ordering comes directly from the WAL — the authoritative, global record of all database changes — which means change ordering is guaranteed to match commit order, even across concurrent transactions.

The two end-user features (user triggers and logical replication FROM stream tables) are both achievable without changing the CDC mechanism. A hybrid approach (triggers for creation, logical replication for steady-state) deserves serious consideration. See §3 for the full analysis.


1. Background

Current Architecture

CDC triggers on each tracked source table write typed per-column rows into per-table buffer tables (pgtrickle_changes.changes_<oid>). Each buffer row captures:

| Column | Purpose |
|---|---|
| change_id | BIGSERIAL ordering within a source |
| lsn | pg_current_wal_lsn() at trigger time |
| action | 'I' / 'U' / 'D' |
| pk_hash | Content hash of PK columns (optional) |
| new_<col> | Per-column NEW values (INSERT/UPDATE) |
| old_<col> | Per-column OLD values (UPDATE/DELETE) |

A covering B-tree index (lsn, pk_hash, change_id) INCLUDE (action) supports the differential refresh's LSN-range scan.
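
An illustrative rendering of that shape for a two-column source (region/amount stand in for the source's real columns; the actual DDL is generated per source):

CREATE TABLE pgtrickle_changes.changes_12345 (
    change_id  bigserial,
    lsn        pg_lsn,         -- pg_current_wal_lsn() at trigger time
    action     "char",         -- 'I' / 'U' / 'D'
    pk_hash    bigint,         -- optional content hash of PK columns
    new_region text,           -- NEW values (INSERT/UPDATE)
    new_amount numeric,
    old_region text,           -- OLD values (UPDATE/DELETE)
    old_amount numeric
);

CREATE INDEX ON pgtrickle_changes.changes_12345 (lsn, pk_hash, change_id)
    INCLUDE (action);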

The Atomicity Constraint

create_stream_table() performs DDL (CREATE TABLE) and DML (catalog inserts) before setting up CDC. pg_create_logical_replication_slot() cannot execute inside a transaction that has already performed writes. This makes single-transaction atomic creation impossible with logical replication — the decisive factor in the original ADR.
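
The constraint is easy to reproduce; the error text below is PostgreSQL's own:

BEGIN;
CREATE TABLE t (x int);   -- the transaction has now performed a write
SELECT pg_create_logical_replication_slot('s1', 'pgoutput');
-- ERROR:  cannot create logical replication slot in transaction that has performed writes
ROLLBACK;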


2. Comparison Matrix

2.1 Correctness & Transactional Safety

| Aspect | Triggers | Logical Replication |
|---|---|---|
| Atomic creation | ✅ Same transaction as DDL+catalog | ❌ Slot creation requires separate transaction |
| Change visibility | ✅ Immediate (same transaction) | ⚠️ Asynchronous (after COMMIT + WAL decode) |
| TRUNCATE capture | ❌ Row-level triggers not fired | ✅ WAL emits TRUNCATE since PG 11 |
| Transaction ordering | ✅ Change buffer rows ordered by LSN | ✅ WAL stream preserves commit order |
| Crash recovery | ✅ Buffer tables are WAL-logged; no orphan state | ⚠️ Slot survives crash but may need re-sync |
| Schema change handling | ✅ DDL event hooks rebuild trigger in-place | ⚠️ Requires slot re-creation or output plugin awareness |

Key insight: The TRUNCATE gap is the most significant correctness limitation of the trigger approach. A statement-level AFTER TRUNCATE trigger that marks downstream STs for automatic FULL refresh would close this gap without changing the CDC architecture (see §6 Recommendation 3).

2.2 Performance

| Metric | Triggers | Logical Replication |
|---|---|---|
| Per-row write overhead | ~2–4 μs (narrow INSERT) to ~5–15 μs (wide UPDATE) | ~0 (WAL writes happen regardless) |
| Expected throughput reduction | 1.5–5× on tracked source tables | None on source tables |
| Write amplification | 2× (source WAL + buffer table WAL + index) | 1× (only source WAL) |
| Change buffer storage | Heap table + index per source | WAL segments (shared, recycled) |
| Sequence contention | BIGSERIAL per buffer (lightweight) | N/A |
| Throughput ceiling | ~5,000 writes/sec (estimated) | WAL throughput (much higher) |
| Decoding CPU cost | N/A | Non-trivial; output plugin runs in WAL sender |
| Zero-change refresh | ~3 ms (EXISTS check on empty buffer) | ~3 ms (no pending WAL changes) |

Key insight: Trigger overhead is synchronous — every committing transaction pays the cost. For applications with moderate write rates (<5,000 writes/sec) this is acceptable. For high-throughput OLTP workloads, logical replication's zero write-side overhead is a significant advantage.

2.3 Operational Complexity

| Aspect | Triggers | Logical Replication |
|---|---|---|
| PostgreSQL configuration | None required | wal_level = logical + restart |
| Managed PG compatibility | ✅ Works everywhere | ⚠️ Some providers restrict wal_level |
| WAL retention risk | None (buffer tables are independent) | Slots prevent WAL cleanup; disk exhaustion risk |
| Slot management | N/A | Create, monitor, drop; orphan detection |
| max_replication_slots | N/A | Must be sized for number of tracked sources |
| REPLICA IDENTITY config | N/A | Required on all tracked source tables |
| Monitoring | Buffer table row counts | Slot lag, WAL retention, decode rate |
| Extension dependencies | None | Output plugin (pgoutput, wal2json, or custom) |
| Upgrade path | CREATE OR REPLACE FUNCTION | Slot protocol version compatibility |

Key insight: Triggers are operationally simpler by a wide margin. Logical replication introduces a class of failure modes (stuck slots, WAL bloat, replica identity misconfiguration) that require dedicated monitoring and operational runbooks.

2.4 Feature: User Triggers on Stream Tables

This addresses end-user triggers on the output stream tables, not CDC triggers on source tables.

| Aspect | Current (Trigger CDC) | With Logical Replication CDC |
|---|---|---|
| Feasibility | ✅ Achievable via session_replication_role | ✅ Same mechanism applies |
| Refresh suppression | SET LOCAL session_replication_role = 'replica' | Same |
| Post-refresh notification | NOTIFY pg_trickle_refresh with metadata | Same |
| MERGE firing pattern | DELETE+INSERT (not UPDATE); must be suppressed | Same — refresh mechanism is independent of CDC |

Key insight: User trigger support on stream tables is orthogonal to the CDC mechanism and is now implemented. The solution uses ALTER TABLE ... DISABLE TRIGGER USER / ENABLE TRIGGER USER around FULL refresh (avoiding the session_replication_role conflict with logical replication publishing). In DIFFERENTIAL mode, explicit per-row DML (INSERT/UPDATE/DELETE) is used instead of MERGE so that user-defined AFTER triggers fire correctly. The implementation is controlled by the pg_trickle.user_triggers GUC (auto/on/off). See PLAN_USER_TRIGGERS_EXPLICIT_DML.md for the full design.

Note: Sections 2.1–2.5 compare creation-time and operational aspects. For a focused steady-state comparison (what matters once the ST exists), see §3.

2.5 Feature: Logical Replication FROM Stream Tables

This addresses end-users subscribing to stream table changes via PostgreSQL's built-in logical replication.

| Aspect | Status | Notes |
|---|---|---|
| Basic publishing | ✅ Works today | STs are regular heap tables; CREATE PUBLICATION works |
| __pgt_row_id column | ⚠️ Replicated by default | Use column list in PUBLICATION to exclude, or document as usable PK |
| Differential refresh | ✅ DELETE+INSERT via MERGE are replicated | Subscriber sees individual DELETEs and INSERTs, not UPDATEs |
| Full refresh | ✅ TRUNCATE + INSERT replicated | Subscriber needs replica identity set; receives TRUNCATE + mass INSERT |
| REPLICA IDENTITY | Needs configuration | __pgt_row_id could serve as unique index for identity |

The session_replication_role Conflict

If the refresh engine sets session_replication_role = 'replica' to suppress user triggers (Phase 1 of the user-trigger plan), this may also suppress publication of the DML to logical replication subscribers. When a session is in replica mode, PostgreSQL treats it as a replication subscriber — DML performed in that session may not be forwarded to downstream subscribers (depending on the publication's publish_via_partition_root and the subscriber's origin setting).

This is a potential conflict between the two features. Options:

| Option | User Triggers Suppressed? | Replication Published? | Drawback |
|---|---|---|---|
| session_replication_role = 'replica' | ✅ Yes | ❌ May not be published | Breaks logical replication from STs |
| ALTER TABLE ... DISABLE TRIGGER USER | ✅ Yes | ✅ Yes | Requires ACCESS EXCLUSIVE lock |
| pg_trickle.suppress_user_triggers GUC → DISABLE TRIGGER USER only when needed | ✅ Configurable | ✅ Yes | Lock overhead; crash safety concern (ENABLE on recovery) |
| tgisinternal flag manipulation | ✅ Yes | ✅ Yes | Non-portable; catalog-level hack |

Recommended resolution: Use ALTER TABLE ... DISABLE TRIGGER USER within a SAVEPOINT, restoring on error. The ACCESS EXCLUSIVE lock is brief (only held for the catalog update, not the entire refresh). If the user has enabled both user triggers AND logical replication on a stream table, this is the only approach that supports both simultaneously. If neither feature is in use, skip the overhead entirely.
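
A sketch of the recommended pattern (stream table name illustrative):

BEGIN;
SAVEPOINT suppress;
ALTER TABLE order_totals DISABLE TRIGGER USER;  -- brief ACCESS EXCLUSIVE lock
-- ... apply the refresh DML; rows still reach logical replication subscribers ...
ALTER TABLE order_totals ENABLE TRIGGER USER;
RELEASE SAVEPOINT suppress;
COMMIT;
-- On refresh failure, ROLLBACK TO SAVEPOINT suppress also undoes the DISABLE,
-- because ALTER TABLE is transactional.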


3. Separating Creation-Time from Steady-State

The original ADR chose triggers because pg_create_logical_replication_slot() cannot execute inside a transaction that has already performed writes. This report initially treated that constraint as "decisive." But it deserves scrutiny: the atomicity constraint only affects the create_stream_table() call — a one-time event. Once a stream table exists, CDC runs for hours, days, or months. The steady-state characteristics are what actually matter for performance, correctness, and user experience.

3.1 The Atomicity Constraint Is a Solvable Engineering Problem

The constraint is real but workable. Three approaches exist, all with well-understood trade-offs:

| Approach | How It Works | Downside |
|---|---|---|
| Two-phase creation | Phase 1: DDL + catalog in one transaction. Phase 2: slot creation in a separate transaction. Roll back Phase 1 artifacts on Phase 2 failure. | Brief window where catalog entry exists without CDC. Cleanup on failure adds ~50 lines of code. |
| Background worker handoff | Main transaction creates DDL + catalog + temporary trigger. Background worker creates slot asynchronously, then drops trigger. | Race window: changes between COMMIT and slot creation are captured by the temporary trigger, so no data is lost. Adds complexity (~100 lines). |
| Trigger bootstrap → slot transition | Create with triggers (current approach). After first successful refresh, migrate to logical replication in the background. | Trigger overhead during bootstrap period (minutes). Most natural hybrid approach. |

None of these are architecturally difficult. The two-phase approach is straightforward — if slot creation fails, drop the storage table and catalog entry. The temporary-trigger approach eliminates even the theoretical data-loss window. These are engineering inconveniences, not fundamental blockers.

3.2 Steady-State: Triggers vs Logical Replication (Honest Comparison)

Once the stream table exists and CDC is running, here is how the two approaches compare on their actual runtime merits.

In plain terms: With triggers, every time the application writes a row to a tracked source table, the database does extra work right then and there — calling a function, writing to a buffer table, updating an index — all before the application's transaction can finish. This is like a toll booth on a highway: every car (write) must stop and pay (trigger overhead) before continuing.

With logical replication, the database already writes every change to its internal transaction log (the WAL) as part of normal operation. CDC simply reads that log after the fact, in a separate background process. The application's writes pass through without stopping — there is no toll booth. The cost of reading the log is paid by the database server, but it happens asynchronously and never slows down the application.

Where Logical Replication Wins (Steady-State)

| Dimension | Trigger Impact | Logical Replication Advantage |
|---|---|---|
| Write-path latency | Every INSERT/UPDATE/DELETE on a tracked source pays ~2–15 μs synchronous overhead (PL/pgSQL dispatch, buffer INSERT, index update). This is inside the committing transaction's critical path. | Zero additional write-path cost. WAL writes happen regardless; decoding is asynchronous. Source table DML performance is completely unaffected. |
| Write amplification | Each source row change produces: (1) source table WAL, (2) buffer table heap write, (3) buffer table WAL, (4) buffer index update, (5) index WAL. ~2–3× total write amplification. | 1× — only the source table's normal WAL. No additional heap writes, no secondary indexes. |
| TRUNCATE capture | Cannot capture. Row-level triggers don't fire. Requires a separate statement-level AFTER TRUNCATE workaround (§4) that only marks for reinit — the actual row deletions are invisible to differential mode. | Native. WAL emits TRUNCATE events since PG 11. The decoder receives a clean signal that all rows were removed. |
| Throughput ceiling | Estimated ~5,000 writes/sec on tracked sources before trigger overhead dominates. PL/pgSQL function dispatch is the bottleneck. | Bounded by WAL throughput — typically 50,000–200,000+ writes/sec depending on hardware and wal_buffers. |
| Connection-pool pressure | Trigger executes in the application's connection. Long-running trigger INSERTs can increase connection hold time under load. | Decoding runs in a dedicated WAL sender process. Application connections are unaffected. |
| Vacuum pressure | Buffer tables accumulate dead tuples between cleanups. Each refresh cycle creates bloat that autovacuum must reclaim. | No buffer tables to vacuum. WAL segments are recycled by the WAL management subsystem. |
| Transaction ID consumption | Each trigger INSERT consumes sub-transaction resources within the outer transaction. High-volume batch operations can cause excessive subtransaction overhead. | No additional transaction work. |

Where Triggers Win (Steady-State)

| Dimension | Trigger Advantage | Logical Replication Impact |
|---|---|---|
| Operational simplicity | No external state to manage. Buffer tables are regular heap tables — queryable, monitorable, backed up normally. Drop the trigger and it's gone. | Replication slots are persistent server-side state. A stuck or crashed consumer prevents WAL recycling, potentially filling the disk. Requires monitoring, max_slot_wal_keep_size guards, and orphan-slot cleanup. |
| Zero configuration | Works with any wal_level (minimal, replica, logical). No restart required. No REPLICA IDENTITY configuration. | Requires wal_level = logical (server restart), max_replication_slots sizing, and REPLICA IDENTITY on every tracked source table. Many managed PostgreSQL providers default to wal_level = replica. |
| Schema evolution | DDL event hooks rebuild the trigger function via CREATE OR REPLACE FUNCTION. New columns are added to the buffer table with ADD COLUMN IF NOT EXISTS. Simple, same-transaction, no coordination. | Schema changes on tracked tables require careful handling. The output plugin must be aware of column additions/removals. Slot may need to be recreated. ALTER TABLE during active decoding can cause protocol errors. |
| Debugging & visibility | Change buffers are queryable tables: SELECT * FROM pgtrickle_changes.changes_12345 ORDER BY change_id DESC LIMIT 10. Immediate visibility into what was captured. | WAL is binary and opaque. Inspecting captured changes requires pg_logical_slot_peek_changes() (or pg_logical_slot_get_changes(), which consumes the slot) — disruptive in production. |
| Crash recovery | Buffer tables are WAL-logged and survive crashes. No special recovery needed — the refresh engine picks up from the last frontier LSN. | Slots survive crashes, but the decoding position may be ahead of what pg_trickle has consumed. Requires careful bookkeeping to avoid replaying or losing changes. |
| Multi-source coordination | Each source has an independent buffer table. The refresh engine reads from multiple buffers with independent LSN ranges. No coordination between sources. | Multiple sources could share a single slot (decoding all tables) or use per-source slots. Shared slots require demultiplexing; per-source slots multiply the slot management burden. |
| Isolation | Trigger failure (e.g., buffer table full) raises an error in the application transaction — visible and immediate. | Decoding failure is asynchronous. The application commits successfully, but changes may never reach the buffer. Silent data loss is possible unless monitored. |

Neutral (Roughly Equivalent)

| Dimension | Notes |
|---|---|
| Refresh-path performance | Both approaches populate the same buffer table schema. The MERGE/DVM pipeline is identical regardless of how buffers were filled. |
| Zero-change detection | Triggers: EXISTS check on empty buffer (~3 ms). Logical replication: check slot position vs current WAL LSN (~3 ms). Equivalent. |
| Memory footprint | Triggers: PL/pgSQL function cache per backend. Logical replication: WAL sender process + decoding context. Both are modest. |

3.3 When Does Logical Replication Become the Better Choice?

The crossover point depends on workload characteristics:

| Scenario | Better Choice | Why |
|---|---|---|
| < 1,000 writes/sec on tracked sources | Triggers | Overhead is negligible; operational simplicity dominates |
| 1,000–5,000 writes/sec | Either (triggers still acceptable) | Trigger overhead is measurable but unlikely to be the bottleneck |
| > 5,000 writes/sec | Logical Replication | Write-path overhead starts to matter; 2–3× write amplification compounds |
| ETL patterns (TRUNCATE + bulk INSERT) | Logical Replication | Native TRUNCATE capture; no stale-data gap |
| Wide tables (20+ columns) | Logical Replication | Trigger overhead scales with column count (~5–15 μs); WAL overhead does not |
| Managed PostgreSQL with wal_level restrictions | Triggers | No choice — logical replication may not be available |
| Many tracked sources (50+) | Logical Replication | Fewer moving parts than 50 triggers + 50 buffer tables + 50 indexes |
| Need logical replication FROM stream tables | Triggers (with caveats) | See §2.5 — session_replication_role conflict, with DISABLE TRIGGER USER as the workaround |

3.4 Reassessing the Decision

With the atomicity constraint properly scoped as a creation-time concern, the decision to use triggers rests on three remaining pillars:

  1. Operational simplicity — no wal_level change, no slot management, no REPLICA IDENTITY configuration. This is genuinely valuable for an early-stage extension that needs frictionless adoption.

  2. Debugging visibility — queryable buffer tables are a major developer experience advantage. Being able to SELECT * FROM changes_<oid> during debugging is invaluable.

  3. Zero-config deployment — works on any PostgreSQL 18 instance without server restarts or configuration changes. Critical for managed PostgreSQL environments.

However, these advantages are primarily about developer and operator experience, not about the fundamental capability of the system. A mature pg_trickle deployment that needs high write throughput, TRUNCATE support, or minimal source-table impact would be better served by logical replication in steady-state.

The honest assessment: Triggers are the right choice today for pragmatic reasons (simplicity, early-stage adoption, managed PG compatibility). But the report should not overstate the atomicity constraint as a fundamental blocker — it is a solvable problem. If pg_trickle grows to serve high-throughput production workloads, the migration to logical replication for steady-state CDC should be treated as a planned evolution, not a theoretical future.


4. TRUNCATE: The Gap and How to Close It

This limitation is one of the strongest arguments for logical replication in steady-state — see §3.2 for the comparison.

The TRUNCATE limitation is the most commonly cited drawback of trigger-based CDC. PostgreSQL does not fire row-level triggers for TRUNCATE because TRUNCATE operates at the file level (O(1)) — there are no individual rows to enumerate.

Current Behavior

  1. User runs TRUNCATE source_table
  2. CDC trigger does not fire — change buffer remains empty
  3. Scheduler sees zero changes → NO_DATA → stream table is stale
  4. Stream table shows data from rows that no longer exist

Proposed Fix: Statement-Level AFTER TRUNCATE Trigger

PostgreSQL supports statement-level AFTER TRUNCATE triggers. While they provide no OLD row data, they can mark downstream stream tables for reinitialization:

CREATE TRIGGER pg_trickle_truncate_<oid>
  AFTER TRUNCATE ON <source_table>
  FOR EACH STATEMENT
  EXECUTE FUNCTION pgtrickle.on_source_truncated('<source_oid>');

The trigger function would:

  1. Look up all stream tables that depend on this source
  2. Mark them needs_reinit = true in the catalog
  3. Cascade transitively to downstream STs

This closes the TRUNCATE gap without changing the CDC architecture. The next scheduler cycle would trigger a FULL refresh automatically.
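
A minimal sketch of that function, assuming a pgtrickle.pgt_stream_tables catalog with a needs_reinit flag and the pgt_dependencies table mentioned in §6 (st_id/source_oid column names assumed); the transitive cascade is omitted:

CREATE FUNCTION pgtrickle.on_source_truncated() RETURNS trigger AS $$
BEGIN
    -- Mark every stream table that reads from the truncated source.
    UPDATE pgtrickle.pgt_stream_tables
       SET needs_reinit = true
     WHERE st_id IN (SELECT st_id
                       FROM pgtrickle.pgt_dependencies
                      WHERE source_oid = TG_ARGV[0]::oid);
    RETURN NULL;  -- return value is ignored for statement-level AFTER triggers
END;
$$ LANGUAGE plpgsql;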

Effort estimate: ~2–4 hours (trigger creation in cdc.rs, PL/pgSQL or Rust function for on_source_truncated, cascade logic reuse from hooks.rs).


5. Migration Path: Trigger → Logical Replication (Now Implemented)

Status: Phase A (Hybrid Creation) is now implemented in src/wal_decoder.rs. The pg_trickle.cdc_mode GUC controls the behavior (trigger/auto/wal).

As discussed in §3, the atomicity constraint is a creation-time problem with known solutions. The buffer table schema and downstream IVM pipeline are decoupled from the capture mechanism, so migration is isolated to the CDC layer. This should be treated as a planned evolution for high-throughput deployments, not a theoretical future:

Phase A: Hybrid Creation

  1. create_stream_table() continues using triggers for atomic creation
  2. After first successful full refresh, a background worker creates a replication slot and transitions to WAL-based capture
  3. Trigger is dropped; buffer table continues to be populated from WAL decode

Phase B: Steady-State WAL Capture

  1. Background worker runs a logical decoding consumer per tracked source
  2. WAL changes are decoded and written to the same buffer table schema
  3. Downstream pipeline (DVM, MERGE, frontier) is unchanged
  4. TRUNCATE events are captured natively from WAL

Prerequisites

  • wal_level = logical (must be documented as optional upgrade path)
  • REPLICA IDENTITY on tracked sources (auto-configured or user-managed)
  • Custom output plugin or pgoutput + column mapping
  • Slot health monitoring (WAL retention alerts, orphan cleanup)
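
In SQL terms, the prerequisites amount to roughly the following (values illustrative; changing wal_level requires a restart):

ALTER SYSTEM SET wal_level = 'logical';
ALTER SYSTEM SET max_replication_slots = 16;       -- >= number of tracked sources
ALTER SYSTEM SET max_slot_wal_keep_size = '10GB';  -- bound WAL retention if a slot stalls

-- Per tracked source, ensure a usable replica identity:
ALTER TABLE orders REPLICA IDENTITY DEFAULT;       -- PK-based; use FULL if there is no PK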

Effort estimate: 3–5 weeks for a production-quality implementation.


6. Recommendations

Recommendation 1: Keep Trigger-Based CDC (For Now)

Operational simplicity and zero-config deployment are strong advantages for an early-stage extension. The performance ceiling (~5,000 writes/sec) is adequate for current target use cases. The atomicity constraint, while solvable (see §3.1), adds creation-time complexity that is not yet justified.

However: This decision should be revisited when any of these conditions is met: (a) users report write-path latency from CDC triggers, (b) TRUNCATE-based ETL patterns become a common pain point, or (c) pg_trickle targets environments where wal_level = logical is already the norm. The steady-state advantages of logical replication (§3.2) are substantial and should not be dismissed.

Recommendation 2: ✅ IMPLEMENTED — User Trigger Suppression

User-defined triggers on stream tables are now fully supported. The implementation uses ALTER TABLE ... DISABLE TRIGGER USER / ENABLE TRIGGER USER around FULL refresh, and explicit per-row DML (INSERT/UPDATE/DELETE) instead of MERGE during DIFFERENTIAL refresh so user AFTER triggers fire correctly. Controlled by pg_trickle.user_triggers GUC (auto/on/off). The session_replication_role approach from the original plan was rejected to avoid conflict with logical replication publishing (see §2.5).

Recommendation 3: Add TRUNCATE Capture Trigger

Add a statement-level AFTER TRUNCATE trigger on each tracked source table that marks downstream STs for reinitialization. This closes the most significant usability gap without changing the CDC architecture.

Recommendation 4: Document Logical Replication FROM Stream Tables

Add documentation and examples for CREATE PUBLICATION on stream tables, including:

  • Column filtering to exclude __pgt_row_id
  • REPLICA IDENTITY configuration using __pgt_row_id as unique index
  • Behavior during FULL vs DIFFERENTIAL refresh
  • Interaction with user trigger suppression
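
A hedged sketch of what those examples could look like (index and publication names assumed; publication column lists require PG 15+):

-- Treat __pgt_row_id as the usable key:
CREATE UNIQUE INDEX order_totals_row_id_idx ON order_totals (__pgt_row_id);
ALTER TABLE order_totals REPLICA IDENTITY USING INDEX order_totals_row_id_idx;
CREATE PUBLICATION order_totals_pub FOR TABLE order_totals;

-- Alternatively, hide __pgt_row_id with a column list. A publication that
-- replicates UPDATE/DELETE must include the replica identity columns, so this
-- variant needs a different identity (e.g. a natural key):
CREATE PUBLICATION order_totals_cols_pub FOR TABLE order_totals (region, total);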

Recommendation 5: Benchmark Trigger Overhead

Execute the benchmark plan in PLAN_TRIGGERS_OVERHEAD.md to establish data-driven thresholds for the logical replication migration crossover point. The results should feed directly into the §3.3 crossover analysis.

Recommendation 6: ✅ IMPLEMENTED — Hybrid CDC Approach

The "trigger bootstrap → slot transition" pattern is now implemented in src/wal_decoder.rs (1152 lines). The implementation includes:

  • Automatic transition: After stream table creation with triggers, a background worker creates a logical replication slot and transitions to WAL-based capture.
  • GUC control: pg_trickle.cdc_mode (trigger/auto/wal) and pg_trickle.wal_transition_timeout control the behavior.
  • Transition orchestration: Create slot → wait for catch-up → drop trigger. Automatic fallback to triggers if slot creation fails.
  • Catalog extension: pgt_dependencies gains cdc_mode, slot_name, decoder_confirmed_lsn, transition_started_at columns.
  • Health monitoring: pgtrickle.check_cdc_health() function and NOTIFY pg_trickle_cdc_transition notifications.
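
Illustrative usage of the shipped controls (GUC, function, and channel names from the list above):

ALTER SYSTEM SET pg_trickle.cdc_mode = 'auto';  -- trigger bootstrap, WAL steady-state
SELECT pg_reload_conf();

SELECT * FROM pgtrickle.check_cdc_health();     -- per-source CDC/transition status
LISTEN pg_trickle_cdc_transition;               -- notified when a transition completes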

7. Decision Log

| # | Decision | Rationale |
|---|---|---|
| D1 | Keep triggers for CDC on source tables — for now | Zero-config, operational simplicity, adequate for current scale |
| D2 | Atomicity constraint is solvable, not fundamental | Two-phase creation and hybrid bootstrap are proven patterns (§3.1) |
| D3 | Logical replication is superior in steady-state | Zero write overhead, TRUNCATE capture, higher throughput ceiling (§3.2) |
| D4 | User triggers on STs are orthogonal to CDC choice | session_replication_role / DISABLE TRIGGER USER works with either approach |
| D5 | Logical replication FROM STs works today | Regular heap tables; needs documentation, not code |
| D6 | TRUNCATE gap is closable with statement-level trigger | Low effort, high impact — but logical replication handles it natively |
| D7 | Hybrid approach is the optimal long-term target | Trigger bootstrap for creation + logical replication for steady-state |
| D8 | User trigger suppression uses DISABLE TRIGGER USER | Avoids session_replication_role conflict with logical replication publishing (§2.5) |
| D9 | Hybrid CDC implemented with auto-transition | pg_trickle.cdc_mode = 'auto' triggers → WAL transition after creation |
| D10 | Explicit DML for DIFFERENTIAL refresh with user triggers | INSERT/UPDATE/DELETE instead of MERGE so AFTER triggers fire correctly |

Prior Art

This document lists the academic papers, PostgreSQL commits, open-source tools, and standard algorithms whose techniques are reused in pg_trickle.

Maintaining this record serves two purposes:

  1. Attribution — credit the research and engineering work this project builds upon.
  2. Independent derivation — demonstrate that every core technique predates and is independent of any single vendor's commercial product.

Differential View Maintenance (DVM)

DBSP — Automatic Incremental View Maintenance

Budiu, M., Ryzhyk, L., McSherry, F., & Tannen, V. (2023). "DBSP: Automatic Incremental View Maintenance for Rich Query Languages." Proceedings of the VLDB Endowment (PVLDB), 16(7), 1601–1614. https://arxiv.org/abs/2203.16684

The Z-set abstraction (rows annotated with +1/−1 multiplicity) is the theoretical foundation for the __pgt_action column produced by the delta operators in src/dvm/operators/. The per-operator differentiation rules (scan, filter, project, join, aggregate, union) are direct applications of the lifting (↑) and differentiation (D) operators described in this paper.

See DBSP_COMPARISON.md for a detailed comparison of pg_trickle's architecture with the DBSP model.

Gupta & Mumick — Materialized Views Survey

Gupta, A. & Mumick, I.S. (1995). "Maintenance of Materialized Views: Problems, Techniques, and Applications." IEEE Data Engineering Bulletin, 18(2), 3–18.

Gupta, A. & Mumick, I.S. (1999). Materialized Views: Techniques, Implementations, and Applications. MIT Press. ISBN 978-0-262-57122-7.

The per-operator differentiation rules in src/dvm/operators/ follow the derivation given in section 3 of the 1995 survey. The counting algorithm for maintaining aggregates with deletions uses the approach described in the MIT Press book.

DBToaster — Higher-order Delta Processing

Koch, C., Ahmad, Y., Kennedy, O., Nikolic, M., Nötzli, A., Olteanu, D., & Zavodny, J. (2014). "DBToaster: Higher-order Delta Processing for Dynamic, Frequently Fresh Views." The VLDB Journal, 23(2), 253–278. https://doi.org/10.1007/s00778-013-0348-4

Inspiration for the recursive delta compilation strategy where the delta of a complex query is itself a query that can be differentiated.

DRed — Deletion and Re-derivation

Gupta, A., Mumick, I.S., & Subrahmanian, V.S. (1993). "Maintaining Views Incrementally." Proceedings of the 1993 ACM SIGMOD International Conference, 157–166.

The DRed algorithm for handling deletions in recursive views is the basis for the recursive CTE differential refresh strategy in src/dvm/operators/recursive_cte.rs.


Scheduling

Earliest-Deadline-First (EDF)

Liu, C.L. & Layland, J.W. (1973). "Scheduling Algorithms for Multiprogramming in a Hard-Real-Time Environment." Journal of the ACM, 20(1), 46–61. https://doi.org/10.1145/321738.321743

The schedule-based scheduling in src/scheduler.rs applies the classic EDF principle: the stream table whose freshness deadline expires soonest is refreshed first. EDF is optimal for uniprocessor preemptive scheduling and is a standard technique in operating systems and real-time databases.

Topological Sort — Kahn's Algorithm

Kahn, A.B. (1962). "Topological sorting of large networks." Communications of the ACM, 5(11), 558–562. https://doi.org/10.1145/368996.369025

The dependency DAG in src/dag.rs uses Kahn's algorithm for topological ordering and cycle detection. This is standard computer science curriculum and appears in every major algorithms textbook (Cormen et al., Sedgewick, Kleinberg & Tardos).


Change Data Capture (CDC)

PostgreSQL Row-Level Triggers

Row-level AFTER INSERT/UPDATE/DELETE triggers have been available in PostgreSQL since version 6.x (late 1990s). The trigger-based change capture pattern used in src/cdc.rs is a well-established PostgreSQL technique:

  • PostgreSQL documentation: CREATE TRIGGER — trigger-based CDC has been a standard pattern for decades.
  • PostgreSQL wiki: "Trigger-based Change Data Capture in PostgreSQL."

Debezium

Debezium project (Red Hat, open source since 2016). https://debezium.io/

Debezium implements WAL-based (logical decoding) CDC for PostgreSQL, alongside log-based connectors for other databases. The change buffer table pattern (pgtrickle_changes.changes_<oid>) follows a similar staging approach, modified for single-process consumption within the PostgreSQL backend.

pgaudit

pgaudit extension (2015). https://github.com/pgaudit/pgaudit

Captures DML for audit logging (via executor and utility hooks rather than triggers), demonstrating production-grade change capture in PostgreSQL since 2015; the equivalent AFTER-trigger audit pattern is documented on the PostgreSQL wiki.


Materialized View Refresh

PostgreSQL REFRESH MATERIALIZED VIEW CONCURRENTLY

PostgreSQL 9.4 (December 2014, commit 96ef3b8). src/backend/commands/matview.c

The snapshot-diff strategy used for recomputation-diff refreshes (where the full query is re-executed and anti-joined against current storage to compute inserts and deletes) mirrors the algorithm implemented in PostgreSQL's REFRESH MATERIALIZED VIEW CONCURRENTLY. This PostgreSQL feature predates all relevant patents and is publicly documented.

SQL MERGE Statement

ISO/IEC 9075:2003 (SQL:2003 standard) — MERGE statement. PostgreSQL 15 (October 2022, commit 7103eba).

The MERGE-based delta application in src/refresh.rs uses the ISO-standard MERGE statement, independently implemented by Oracle, SQL Server, DB2, and PostgreSQL. This is not derived from any vendor-specific implementation.


General Database Theory

Relational Algebra

Codd, E.F. (1970). "A Relational Model of Data for Large Shared Data Banks." Communications of the ACM, 13(6), 377–387.

The operator tree in src/dvm/parser.rs models standard relational algebra operators (select, project, join, aggregate, union). These are foundational database theory from 1970.

Semi-Naive Evaluation

Bancilhon, F. & Ramakrishnan, R. (1986). "An Amateur's Introduction to Recursive Query Processing Strategies." Proceedings ACM SIGMOD, 16–52.

General background for recursive CTE evaluation strategies. PostgreSQL's own WITH RECURSIVE implementation uses iterative fixpoint evaluation based on these principles.


This document is maintained for attribution and independent-derivation documentation purposes. It does not constitute legal advice.

Multi-Database Refresh Broker — Design Document

Implementation Status: Design only — not yet implemented. This feature is planned for a future release after v1.0. Track progress at ROADMAP.md.

The design below is a stable reference for contributors and reviewers. The API described here is aspirational and subject to change before implementation begins.

Status: Design only (v0.31.0 SCAL-2). Implementation planned for a future release.


Problem

When pg_trickle is installed in multiple databases on the same PostgreSQL cluster, each per-database scheduler independently scans its change buffers. For workloads where two databases reference the same upstream source — commonly via postgres_fdw foreign tables or logical replication — each scheduler pays the full scan cost independently:

DB A scheduler: SELECT * FROM pgtrickle_changes.changes_12345  (full scan)
DB B scheduler: SELECT * FROM pgtrickle_changes.changes_12345  (same scan, again)

At 100 stream tables across 10 databases with 5 shared sources, this is 10× the necessary I/O.


Goal

Introduce a "refresh broker" — a singleton background worker that:

  1. De-duplicates change-buffer scans across databases in the same cluster.
  2. Distributes scan results to per-database schedulers via shared memory.
  3. Reduces total change-buffer I/O proportionally to the number of databases sharing a source.

Design

Components

┌─────────────────────────────────────────────────────────────┐
│ PostgreSQL Cluster                                           │
│                                                             │
│  ┌────────────────┐     shared memory      ┌─────────────┐ │
│  │ Refresh Broker │ ──── scan results ────► │ DB-A sched  │ │
│  │  (singleton)   │                         │ DB-B sched  │ │
│  │                │ ◄─── scan requests ──── │ DB-C sched  │ │
│  └────────────────┘                         └─────────────┘ │
│           │                                                  │
│           │ single scan per source OID per tick              │
│           ▼                                                  │
│  pgtrickle_changes.changes_{oid}                             │
└─────────────────────────────────────────────────────────────┘

Broker Protocol

  1. Registration: Each per-database scheduler registers its interest in a set of source OIDs with the broker via shared memory (PgLwLock<BrokerRegistry>).

  2. Tick coordination: At the start of each tick, the broker scans each registered source OID once. Results (row counts + LSN watermarks) are written to a shared memory segment indexed by (database_oid, source_oid).

  3. Result consumption: Per-database schedulers read the broker's results instead of issuing their own SPI queries. The broker's scan is authoritative for the tick.

  4. Fallback: If the broker is not running (e.g. max_worker_processes exhausted), per-database schedulers fall back to their current direct-scan behaviour.

Shared Memory Layout

/// BRK-1: Broker registry entry for one (database, source) pair.
struct BrokerEntry {
    db_oid: pg_sys::Oid,
    source_oid: pg_sys::Oid,
    /// Last scan result: row count in the change buffer.
    pending_rows: i64,
    /// Last scan result: maximum LSN seen in the change buffer.
    max_lsn_u64: u64,
    /// Monotone tick counter when this entry was last updated.
    last_updated_tick: u64,
}

/// Maximum number of (db, source) pairs the broker tracks.
const BROKER_CAPACITY: usize = 4096;

Broker Worker Loop

loop:
  1. Sleep until next tick (shared scheduler_interval_ms).
  2. Lock BrokerRegistry (read) to collect unique source OIDs.
  3. For each unique source OID, run:
       SELECT COUNT(*), MAX(lsn) FROM pgtrickle_changes.changes_{oid}
  4. Write results to BrokerScanResults (lock-free CAS update).
  5. Advance broker tick counter.
  6. Per-DB schedulers wake and consume results.

Integration with Per-DB Schedulers

In src/scheduler.rs, has_table_source_changes would gain a fast path:

fn has_table_source_changes(st: &StreamTableMeta) -> bool {
    // Fast path: consult the broker's shared-memory scan results when the
    // broker is enabled (pg_trickle.enable_refresh_broker, see Next Steps).
    if config::pg_trickle_enable_refresh_broker() {
        // Field name on StreamTableMeta assumed for this sketch.
        if let Some(result) = broker::get_scan_result(st.source_oid) {
            return result.pending_rows > 0;
        }
    }
    // Fallback: direct SPI query (current behaviour).
    // ...
}

Open Questions

  1. Transaction isolation: The broker scans in its own transaction. Per-DB schedulers that read its results are using data from a different snapshot. Is this acceptable? (Short answer: yes — the existing behaviour already has a tick-window delay between when changes are written and when they are consumed.)

  2. Cross-database connectivity: The broker must connect to each database to read its change buffers. PostgreSQL background workers connect to a specific database. We may need a pool of broker workers, one per database, coordinated by a shared-memory rendezvous point.

  3. Authorization: The broker needs read access to pgtrickle_changes.* in each database. This is satisfied by shared_preload_libraries + SECURITY DEFINER wrapper functions.

  4. Failure isolation: If the broker crashes, per-DB schedulers must detect the absence of fresh results and fall back to direct scans within one tick.


Next Steps (v0.32.0+)

  • Implement BrokerRegistry shared memory struct + init_shared_memory hook
  • Implement broker background worker registration
  • Add broker::get_scan_result fast path in has_table_source_changes
  • Add pg_trickle.enable_refresh_broker GUC (default false until stable)
  • Add E2E test: two databases sharing a source, broker reduces SPI queries to 1×
  • Benchmark: 10 databases × 5 shared sources — measure scan reduction

Filed under v0.31.0 (SCAL-2). ADR reference: see plans/adrs/ for architectural rationale on trigger-based vs broker-based CDC.