pg_trickle
pg_trickle is a PostgreSQL 18 extension that adds self-maintaining materialized views — stream tables — and keeps them up to date incrementally as the underlying data changes. No external streaming engine, no sidecars, no bespoke refresh pipeline. Just install the extension and write SQL.
SELECT pgtrickle.create_stream_table(
name => 'active_orders',
query => 'SELECT * FROM orders WHERE status = ''active''',
schedule => '30s'
);
INSERT INTO orders (id, status) VALUES (42, 'active');
SELECT count(*) FROM active_orders; -- 1 after the next scheduled refresh (≤30s), no manual REFRESH needed
New here? Read What is pg_trickle? for the plain-language overview, or jump to the 5-Minute Quickstart to try it. First time installing? See the Installation Guide.
How it works
pg_trickle keeps stream tables current by tracking every change to the source tables — inserts, updates, and deletes — and recomputing only the parts of the view that are affected by those changes. This is called differential (or incremental) view maintenance. Instead of re-running the full query on every refresh cycle, pg_trickle applies a delta computation proportional to the number of changed rows, not the total table size. A stream table over a billion-row orders table refreshes in milliseconds when only a few rows changed.
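The core idea can be seen in a toy sketch (ours, not pg_trickle's internals): maintain a per-region total from a signed delta of changed rows instead of rescanning the table. Work is proportional to the delta, not the table size.

```python
def apply_delta(totals, delta):
    """totals: {region: revenue}; delta: [(region, amount, weight)],
    where weight is +1 for an inserted row and -1 for a deleted one
    (an UPDATE is a delete of the old row plus an insert of the new)."""
    for region, amount, weight in delta:
        totals[region] = totals.get(region, 0) + weight * amount
        if totals[region] == 0:
            del totals[region]  # drop groups that net out to empty
    return totals

totals = {"emea": 300, "apac": 120}            # state after the last refresh
delta = [("emea", 50, +1), ("apac", 120, -1)]  # two changed rows since then
apply_delta(totals, delta)
print(totals)  # {'emea': 350}
```

Two changed rows cost two dictionary updates, regardless of how many rows the source table holds.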
Change capture works through row-level AFTER triggers (the default) or
WAL-based logical decoding (cdc_mode = 'wal' or the automatic 'auto'
mode). Trigger-based capture writes changed rows into a per-source change-buffer
table within the same transaction, providing full atomicity with no possibility
of a committed change being missed. The background scheduler reads from the
change buffer, computes the delta SQL, and applies the result to the stream
table using MERGE in a separate transaction.
For queries that cannot be maintained incrementally (non-monotonic functions,
LATERAL with volatile sub-expressions, etc.), pg_trickle automatically falls
back to a full refresh — replacing the entire stream table contents in a
single transaction. You can also force full mode explicitly or let the
cost-based AUTO strategy choose per-refresh based on the change-to-table-size
ratio.
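A cost-based choice like the AUTO strategy can be sketched as a simple ratio test. This is illustrative only: the function name is ours, and the 0.5 threshold mirrors the adaptive-fallback default mentioned elsewhere in these docs, not a guaranteed constant.

```python
def choose_refresh_mode(changed_rows, table_rows, threshold=0.5):
    """Pick DIFFERENTIAL when the changed fraction is small enough
    that a delta pass beats rescanning the whole table."""
    if table_rows == 0:
        return "FULL"  # empty or unknown size: just rebuild
    ratio = changed_rows / table_rows
    return "DIFFERENTIAL" if ratio < threshold else "FULL"

print(choose_refresh_mode(1_000, 1_000_000))    # DIFFERENTIAL
print(choose_refresh_mode(700_000, 1_000_000))  # FULL
```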
Choose your path
| Persona | Start here |
|---|---|
| Curious / evaluator | What is pg_trickle? → Use Cases → Comparisons → Playground |
| Application developer | 5-Minute Quickstart → Getting Started tutorial → Patterns → SQL Reference |
| DBA / SRE | Pre-Deployment Checklist → Configuration → Troubleshooting → Capacity Planning |
| Data / analytics engineer | Use Cases → dbt integration → Migrating from materialized views |
| Building a dashboard backend | Real-Time Analytics Dashboard tutorial |
| Event-sourced architecture | Event Sourcing / CQRS tutorial |
| Migrating from REFRESH MATERIALIZED VIEW | Backfill and Migration tutorial |
| Hardening a production deployment | Security Hardening tutorial → Security Guide |
| Confused by jargon | Glossary |
What's new
See What's New for a curated summary of recent releases, or the Changelog for the full history.
Project & licensing
- Written in Rust using pgrx.
- Targets PostgreSQL 18.
- Apache 2.0 licensed.
- Repository: https://github.com/trickle-labs/pg-trickle
- Project history · Roadmap · Contributing · Security policy
What is pg_trickle?
pg_trickle is a PostgreSQL 18 extension that adds stream tables — tables that are defined by a SQL query and stay up to date automatically as the underlying data changes. No external process, no streaming engine, no pipeline to operate. Just install the extension and write SQL.
If you have ever wished CREATE MATERIALIZED VIEW would just keep
itself fresh, this is that.
The problem
PostgreSQL's materialized views are powerful but frustrating.
REFRESH MATERIALIZED VIEW re-runs the entire query from scratch,
even if only one row changed in a million-row table. Your choices
are:
- Burn CPU on full recomputation, on a schedule, and hope you refresh often enough.
- Accept stale data, and try to explain that to the dashboard user.
- Build a bespoke refresh pipeline — Debezium, Kafka Connect, a streaming engine, a separate read database. Now you have two systems to operate.
Most teams pick the third option and end up maintaining infrastructure that is more complex than the application it supports.
What pg_trickle does instead
You declare a stream table with a SQL query and a schedule:
SELECT pgtrickle.create_stream_table(
name => 'revenue_by_region',
query => $$
SELECT c.region,
COUNT(*) AS order_count,
SUM(o.quantity * p.price) AS total_revenue
FROM orders o
JOIN customers c ON c.id = o.customer_id
JOIN products p ON p.id = o.product_id
GROUP BY c.region
$$,
schedule => '1s'
);
-- Read it like any table — always fresh.
SELECT * FROM revenue_by_region ORDER BY total_revenue DESC;
Behind the scenes:
- pg_trickle parses your query into an operator tree (scans, joins, aggregates).
- It captures every INSERT, UPDATE, and DELETE on the source tables — by default with lightweight row-level triggers, with no replication slots required.
- On each refresh cycle (every 1s, in the example above) it derives a delta query from the operator tree — the SQL that computes only the change since the last refresh — and merges the result into the stream table.
When you insert one row into a million-row source table, pg_trickle processes one row's worth of computation. Not a million.
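A minimal sketch (ours, not pg_trickle's internals) of why one changed row stays one changed row: linear operators such as filters and projections apply to a delta exactly as they apply to the full table, so irrelevant changes drop out early.

```python
rows_delta = [  # (row, weight): +1 inserted, -1 deleted
    ({"id": 7, "status": "active"}, +1),
    ({"id": 8, "status": "cancelled"}, +1),
]

# Filter: WHERE status = 'active', applied to the delta, not the table.
filtered = [(r, w) for r, w in rows_delta if r["status"] == "active"]

# Projection: SELECT id.
projected = [(r["id"], w) for r, w in filtered]

print(projected)  # [(7, 1)] -- the cancelled row never reaches the view
```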
Why people care
A few things make this materially different from "just another IVM project":
No external infrastructure. No Kafka, no Flink, no Debezium, no sidecar. The extension lives inside PostgreSQL and uses only its built-in mechanisms.
Stream tables can depend on stream tables. A single write to a base table can ripple through a graph of derived tables, each refreshed in the right order, each doing only the work proportional to what actually changed.
Demand-driven scheduling. With the default CALCULATED schedule
mode, you only set a refresh interval on the consumer-facing
stream tables — the ones your application actually reads. Upstream
stream tables inherit the tightest cadence among their downstream
dependents. You declare freshness where it matters; the system
propagates it everywhere else.
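Cadence propagation can be sketched as a minimum over the dependency graph. The names and graph representation below are ours, for illustration only: each unscheduled table inherits the tightest interval among its downstream dependents.

```python
def propagate_cadence(deps, explicit):
    """deps maps table -> list of downstream dependents;
    explicit maps consumer-facing tables -> interval in seconds."""
    cadence = dict(explicit)

    def resolve(table):
        if table in cadence:
            return cadence[table]
        # inherit the minimum (tightest) interval of all dependents
        cadence[table] = min(resolve(d) for d in deps[table])
        return cadence[table]

    for table in deps:
        resolve(table)
    return cadence

deps = {"bronze": ["silver"], "silver": ["gold_a", "gold_b"],
        "gold_a": [], "gold_b": []}
explicit = {"gold_a": 60, "gold_b": 5}  # only the consumer tables are scheduled
print(propagate_cadence(deps, explicit))
# bronze and silver both inherit 5s, the tightest downstream cadence
```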
Hybrid change capture. It bootstraps with row-level triggers,
which always work. If wal_level = logical is available, it
transitions automatically to WAL-based capture for near-zero
write-side overhead. The transition is seamless. If anything goes
wrong, it falls back to triggers.
Broad SQL coverage. Joins (inner, left, right, full outer,
lateral), GROUP BY with 30+ aggregates, DISTINCT,
UNION/INTERSECT/EXCEPT, subqueries (EXISTS, IN, scalar),
CTEs including WITH RECURSIVE, window functions, and most of
standard SQL. See docs/SQL_REFERENCE.md for
the complete matrix.
Inside the same transaction, if you want it. IMMEDIATE refresh
mode maintains the stream table inside the same transaction as the
source DML, giving read-your-writes consistency without any
background worker.
Where the design comes from
The mathematical foundation is the DBSP differential dataflow framework (Budiu et al., 2022). Delta queries are derived automatically from your SQL's operator tree:
- joins produce the classic bilinear expansion,
- aggregates maintain auxiliary counters,
- linear operators like filters and projections pass deltas through unchanged.
You do not need to know any of this to use the extension; the rules are baked in. If you are curious, the DVM Operators reference and the DBSP Comparison explain how pg_trickle relates to the original.
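The bilinear join rule can be checked directly on toy Z-sets (multisets with signed multiplicities). This is a verification sketch, not library code, of the identity Δ(A ⋈ B) = ΔA ⋈ B + A ⋈ ΔB + ΔA ⋈ ΔB.

```python
from collections import Counter

def join(a, b):
    """Join two Z-sets of (key, payload) pairs on key; weights multiply."""
    out = Counter()
    for (ka, pa), wa in a.items():
        for (kb, pb), wb in b.items():
            if ka == kb:
                out[(ka, pa, pb)] += wa * wb
    return Counter({k: w for k, w in out.items() if w})

def add(*zs):
    out = Counter()
    for z in zs:
        out.update(z)  # Counter.update adds counts element-wise
    return Counter({k: w for k, w in out.items() if w})

def neg(z):
    return Counter({k: -w for k, w in z.items()})

A  = Counter({(1, "a1"): 1, (2, "a2"): 1})
B  = Counter({(1, "b1"): 2})
dA = Counter({(1, "a3"): 1})                  # one inserted row on the left
dB = Counter({(1, "b1"): -1, (2, "b2"): 1})   # one delete, one insert on the right

# Left side: the actual change in the join output.
lhs = add(join(add(A, dA), add(B, dB)), neg(join(A, B)))
# Right side: the bilinear expansion, computed from deltas only.
rhs = add(join(dA, B), join(A, dB), join(dA, dB))
assert lhs == rhs  # the identity holds
```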
Performance, briefly
Performance is a primary goal. Maximum throughput, low latency, and minimal overhead drive every design decision. Differential refresh is the default; full refresh is a fallback of last resort.
A few headline numbers from the TPC-H validation:
- All 22 standard analytical queries pass in DIFFERENTIAL, IMMEDIATE, and FULL modes, with identical results across modes.
- 5–90× measured speedup over FULL across the suite at 1% change rate.
- The bottleneck is PostgreSQL's own
MERGE, not pg_trickle's pipeline.
For workloads with high write volume, hybrid CDC and column-level change tracking keep the write-side overhead low (sub-microsecond per row in WAL mode).
What it's built on
- Language: Rust, using pgrx 0.18.
- Targets: PostgreSQL 18.
- License: Apache 2.0.
- Status: active development, approaching 1.0. APIs may still change between minor versions; see ROADMAP.md.
- Tests: thousands of unit, integration, and end-to-end tests. TPC-H 22/22 in all modes.
Try it
The fastest paths from zero to a working stream table:
# Option 1 — playground (sample data, dashboards)
cd playground && docker compose up -d
# Option 2 — minimal Docker image
docker run --rm -e POSTGRES_PASSWORD=secret -p 5432:5432 \
ghcr.io/trickle-labs/pg_trickle:latest
Then read the 5-Minute Quickstart.
Where to go next
| Audience | Start here |
|---|---|
| Curious / evaluator | Use Cases · Comparisons |
| Application developer | Quickstart (5 min) → Tutorial (in depth) → Patterns |
| DBA / SRE | Pre-Deployment Checklist → Configuration → Troubleshooting |
| Data / analytics engineer | Use Cases → dbt integration → Migrating from materialized views |
Source: https://github.com/trickle-labs/pg-trickle
Note on terminology. A few terms are used throughout the docs without further definition: stream table, differential, delta query, DAG, frontier. The Glossary defines all of them.
Use Cases
pg_trickle is for any team that has wanted PostgreSQL to keep a derived table up to date automatically — without writing a refresh cron, an external streaming pipeline, or a custom CDC consumer.
This page is a gallery of the most common things people build with stream tables. Each section tells you what the pattern looks like, gives you a minimal SQL example, and links to a deeper guide.
New to stream tables? Read What is pg_trickle? first, or jump straight to the 5-minute Quickstart.
At a glance
| Use case | Best refresh mode | Deeper guide |
|---|---|---|
| Real-time dashboards | DIFFERENTIAL with 1s–5s schedule | Patterns §5 |
| Operational read models / CQRS | IMMEDIATE | Patterns §1 |
| Fraud detection / alerting / anomaly | DIFFERENTIAL 1s | Demo |
| Leaderboards & TopK | DIFFERENTIAL or IMMEDIATE | SQL Reference – TopK |
| Bronze / Silver / Gold (medallion) | DIFFERENTIAL with chained STs | Patterns §1 |
| Event-driven services (outbox / inbox) | IMMEDIATE for the table; DIFFERENTIAL for the views | Outbox · Inbox |
| Cross-system replication | DIFFERENTIAL | Publications |
| Slowly-changing dimensions | DIFFERENTIAL | Patterns §3 |
| Multi-tenant analytics | DIFFERENTIAL with RLS | Multi-tenant |
| Citus distributed analytics | DIFFERENTIAL | Citus |
| dbt-managed warehouse models | AUTO | dbt integration |
1. Real-time dashboards
You want KPI tiles ("orders today", "revenue per region", "active users this hour") to update within a second or two of the underlying data changing. With pg_trickle you write the SQL once and any number of dashboards (Grafana, Metabase, Looker, a custom React app) just read from the stream table.
SELECT pgtrickle.create_stream_table(
'kpi_revenue_today',
$$SELECT region, SUM(amount) AS revenue
FROM orders
WHERE created_at >= date_trunc('day', now())
GROUP BY region$$,
schedule => '2s'
);
Why it's a fit: aggregates over high-cardinality groups are exactly where DIFFERENTIAL refresh delivers its biggest wins.
2. Operational read models / CQRS
A microservice writes to a normalised event/order/customer table; a read API needs the denormalised projection (one row per order with customer name, current status, latest payment). Most teams build this with a separate read database and a CDC pipeline. With pg_trickle the projection is just a stream table sitting next to the write tables.
SELECT pgtrickle.create_stream_table(
'order_view',
$$SELECT o.id, o.placed_at, c.name AS customer,
p.status AS payment_status,
s.shipped_at
FROM orders o
JOIN customers c ON c.id = o.customer_id
LEFT JOIN payments p ON p.order_id = o.id
LEFT JOIN shipments s ON s.order_id = o.id$$,
refresh_mode => 'IMMEDIATE'
);
Why it's a fit: IMMEDIATE mode gives you read-your-writes
consistency without a second database.
3. Fraud detection, alerting, anomaly
Define a stream table that flags suspicious activity (large
transactions, velocity rules, unusual geographies). Subscribe an
alerter to that stream table — either via PostgreSQL LISTEN/NOTIFY,
a downstream publication, or by polling.
SELECT pgtrickle.create_stream_table(
'high_velocity_accounts',
$$SELECT account_id, COUNT(*) AS txn_count, SUM(amount) AS total
FROM transactions
WHERE occurred_at >= now() - interval '5 minutes'
GROUP BY account_id
HAVING COUNT(*) > 20$$,
schedule => '1s'
);
The demo ships a full 9-node fraud-detection DAG you can
run locally with docker compose up.
4. Leaderboards & TopK
ORDER BY ... LIMIT N stream tables are a special case where
pg_trickle stores only the top N rows and updates them with scoped
recomputation when the changes affect the leaderboard.
SELECT pgtrickle.create_stream_table(
'top_10_customers',
$$SELECT customer_id, SUM(amount) AS lifetime_spend
FROM orders
GROUP BY customer_id
ORDER BY lifetime_spend DESC
LIMIT 10$$,
schedule => '5s'
);
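The "only when the changes affect the leaderboard" behaviour can be sketched as follows. This is our illustration of the idea, not pg_trickle's storage: keep the stored top N, and touch it only when a changed score could actually enter or leave it.

```python
import heapq

def maybe_update_topk(topk, scores, customer, new_score, n=10):
    """topk: list of (score, customer); scores: full score map."""
    scores[customer] = new_score
    floor = min(topk)[0] if len(topk) == n else float("-inf")
    in_topk = any(c == customer for _, c in topk)
    if not in_topk and new_score <= floor:
        return topk  # change cannot affect the leaderboard: skip entirely
    # recompute the stored N entries only when the change matters
    return heapq.nlargest(n, ((s, c) for c, s in scores.items()))

scores = {"a": 100, "b": 90, "c": 10}
topk = heapq.nlargest(2, ((s, c) for c, s in scores.items()))
topk = maybe_update_topk(topk, scores, "c", 20, n=2)  # still below floor: skipped
topk = maybe_update_topk(topk, scores, "c", 95, n=2)  # enters the top 2
print(topk)  # [(100, 'a'), (95, 'c')]
```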
5. Bronze / Silver / Gold
A medallion architecture in PostgreSQL alone:
- Bronze – raw ingest table (a regular table you write to).
- Silver – cleaned and deduplicated stream table.
- Gold – business-level aggregates stream table that depends on Silver.
Set the schedule only on Gold; pg_trickle propagates the cadence upstream automatically (CALCULATED scheduling).
See Patterns §1.
6. Event-driven services
The transactional outbox/inbox pattern, native to PostgreSQL:
- Outbox — write events in the same transaction as your business data; an external system drains them to Kafka / NATS / SQS / a webhook.
- Inbox — receive events idempotently from an external system; stream tables give you live views of pending work, retries, and a dead-letter queue.
See Transactional Outbox and Transactional Inbox.
7. Cross-system replication
Once a stream table exists, you can expose it as a standard PostgreSQL logical-replication publication. Anything that speaks logical replication — Debezium, Kafka Connect, a downstream PostgreSQL replica, a Spark Structured Streaming job, a custom WAL consumer — can subscribe to live changes.
SELECT pgtrickle.stream_table_to_publication('order_view');
-- Consumers can now subscribe to the publication pgt_pub_order_view
8. Slowly-changing dimensions (SCD)
Type 2 SCDs (one row per version, with valid_from / valid_to) fall
out naturally from a stream table over an event log. See
Patterns §3.
9. Multi-tenant analytics
If your application partitions its data by tenant_id, you can build
per-tenant aggregates as a single stream table grouped by tenant_id
and protect access with PostgreSQL Row-Level Security. See
Multi-tenant integration and the
Row-Level Security tutorial.
10. Citus distributed analytics
pg_trickle works on Citus-distributed source tables. The scheduler
polls per-worker WAL slots via dblink, merges changes on the
coordinator, and applies the delta — automatically and idempotently.
See Citus.
11. dbt-managed warehouse models
Use the stream_table materialization in dbt. No custom adapter
needed — works with the standard dbt-postgres adapter.
{{ config(materialized='stream_table', schedule='5m', refresh_mode='AUTO') }}
SELECT customer_id, SUM(amount) AS total
FROM {{ source('raw', 'orders') }}
GROUP BY customer_id
See dbt integration.
When pg_trickle is not the right fit
A short, honest list — knowing when to look elsewhere is a feature.
- Pure OLTP with no derived state. If you don't have anything to materialize, you don't need a materializer.
- Sub-millisecond-latency derived state. IMMEDIATE mode is fast, but it pays the differential cost inside your transaction. If you need every transaction to commit in under 1 ms, benchmark first.
- Stateless event transformation only. If you want "transform each Kafka event" with no stored state, a stream processor (Flink, Bytewax) is closer to that shape.
- Cross-database joins at scale. pg_trickle reads from local PostgreSQL tables (or Citus distributions). For federated joins across many heterogeneous databases, consider a streaming engine.
- Workloads where you cannot install extensions. Some managed PostgreSQL services don't allow third-party extensions. Check Installation for the support matrix.
See also: Patterns · Performance Cookbook · SQL Reference · Comparisons
Comparisons
This page compares pg_trickle to adjacent tools so you can decide whether it's the right fit. Each comparison is a short, honest summary — strengths, weaknesses, and "use this instead if…".
If you are evaluating pg_trickle from a specific tool you already run, jump to the relevant section. If you want a deeper academic comparison, see also DBSP Comparison, pg_ivm Comparison, and Prior Art.
At a glance
| Tool | Lives in PostgreSQL? | Incremental? | External infra? | Best for |
|---|---|---|---|---|
| pg_trickle | ✅ | ✅ | ✕ | Self-maintaining materialized views inside one PostgreSQL |
| REFRESH MATERIALIZED VIEW | ✅ | ✕ | ✕ | Periodic full recomputation, no automation |
| pg_ivm | ✅ | ✅ (limited) | ✕ | Incremental views with a smaller SQL surface |
| Materialize | ✕ (own engine) | ✅ | Whole new database | Cross-source streaming SQL |
| RisingWave | ✕ (own engine) | ✅ | Whole new database | Streaming SQL with PostgreSQL wire compat |
| Apache Flink | ✕ | ✅ | JVM cluster + state backend | Stateful event processing at scale |
| Debezium + sink | ✕ | (CDC only) | Kafka + Connect | Replicating change events out of PostgreSQL |
| ksqlDB | ✕ | ✅ | Kafka cluster | Streaming SQL on top of Kafka |
| Snowflake Dynamic Tables | ✕ | ✅ | Snowflake | Auto-refreshing tables in Snowflake |
| Custom cron + materialized view | ✅ | ✕ | ✕ | What teams build before they find pg_trickle |
vs. PostgreSQL REFRESH MATERIALIZED VIEW
The question this answers: "I'm already using materialized views — what would I gain?"
| | REFRESH MATERIALIZED VIEW | pg_trickle stream table |
|---|---|---|
| Refresh trigger | Manual (or your cron) | Schedule, transition, or in-transaction (IMMEDIATE) |
| Refresh cost | Always full recomputation | Incremental (delta only) for most queries |
| Cross-table dependencies | Manual coordination | DAG-aware topological refresh |
| Concurrency | CONCURRENTLY requires unique index | Always non-blocking; advisory locks coordinate |
| Read-your-writes | Not possible | IMMEDIATE mode |
| Operator coverage | Anything PostgreSQL supports | A large but explicit subset (see SQL Reference) |
Use vanilla materialized views if: you only refresh occasionally, your data is small, and you do not have a chain of dependent views.
Switch to pg_trickle if: any of those things stop being true.
vs. pg_ivm
The question this answers: "There's another PostgreSQL extension in this space — how do they relate?"
pg_ivm is an open-source IVM extension that pioneered much of the relevant work in PostgreSQL land. The two projects have different scopes.
| | pg_ivm | pg_trickle |
|---|---|---|
| Maturity | First released 2022 | First released 2024 |
| Refresh model | Trigger-driven, statement-by-statement | Trigger or WAL CDC + scheduler + DAG |
| SQL coverage | Aggregates, simple joins, sub-queries | Full DBSP-style coverage incl. WITH RECURSIVE, window functions, FULL OUTER JOIN, LATERAL, GROUPING SETS, scalar subqueries |
| Cross-table chains | Manual | DAG with topological refresh and CALCULATED schedules |
| Modes | Always immediate | AUTO / DIFFERENTIAL / FULL / IMMEDIATE |
| Distributed | — | Citus integration |
| Operations | Minimal tooling | Health-check, fuse, parallel refresh, snapshots, dbt |
There is a more thorough side-by-side at research/PG_IVM_COMPARISON.md.
If your queries are simple aggregates and you want the smallest possible install footprint, pg_ivm is a perfectly good choice. If you want broader SQL, multi-layer DAGs, or operational tooling, pg_trickle is closer to that shape.
vs. Materialize
Materialize is a cloud-native database built specifically for incremental view maintenance. It is the inspiration for much of this space.
| | Materialize | pg_trickle |
|---|---|---|
| Deployment | Separate cloud database (or self-hosted server) | Extension inside PostgreSQL |
| Source coverage | PostgreSQL, Kafka, S3, MySQL, … | PostgreSQL tables (incl. Citus, foreign tables) |
| Latency | Streaming, sub-second | Sub-second with 1s schedule; in-transaction with IMMEDIATE |
| Joins / aggregates / recursion | Yes, very mature | Yes |
| Pricing | Commercial cloud product | Open-source, runs anywhere PostgreSQL runs |
| Operational footprint | Managed service or significant self-hosted commitment | Add-on to existing PostgreSQL |
Use Materialize if: you want one engine to materialise across many heterogeneous sources, you want true streaming semantics, and you are happy operating a separate database.
Use pg_trickle if: your data lives in PostgreSQL and you want the materialisation to live there too.
vs. RisingWave
RisingWave is a PostgreSQL-wire-compatible streaming database in Rust. Like Materialize, it is its own engine that you deploy alongside (or instead of) PostgreSQL.
The same trade-off applies: RisingWave is a richer streaming engine; pg_trickle is the answer if you do not want to operate a second database.
vs. Apache Flink (or Spark Structured Streaming)
Flink is a general stateful stream processor. It can do everything pg_trickle can and a lot more — including state-machine workflows, event-time semantics, and complex windowing.
The trade-off is operational. Flink wants a JVM cluster, a state backend (RocksDB / S3), checkpointing, savepoint management, a schema registry, and so on. For "I want my materialized views to update themselves", that is overkill.
Use Flink if: you have stateful event processing that goes beyond derived tables — state machines, complex CEP, multi-source joins at high throughput.
Use pg_trickle if: you want stream-table semantics and you are already running PostgreSQL.
vs. Debezium + sink (Kafka Connect, etc.)
Debezium captures changes from PostgreSQL and emits them onto Kafka (or another stream). It is only the change-capture half of the problem — you still need a downstream consumer that turns those changes into a derived table.
| | Debezium | pg_trickle |
|---|---|---|
| Captures changes from PostgreSQL | ✅ | ✅ (built-in CDC) |
| Computes derived tables | ✕ (you write that) | ✅ |
| Kafka required | ✅ | ✕ |
| Downstream sinks | Many | Logical replication via downstream publications |
Use Debezium if: you need to fan changes out to many heterogeneous downstream systems (Elasticsearch, S3, Snowflake, a data lake).
Use pg_trickle if: you want the derived table to live in PostgreSQL itself. You can still expose stream-table changes via downstream publications — and even use Debezium to read those.
vs. ksqlDB
ksqlDB gives you streaming SQL on top of Kafka. Same trade-off as Materialize/RisingWave: another engine, another set of operational concerns.
If your data already lives in Kafka and you want SQL on it, ksqlDB is a fine choice. If your data lives in PostgreSQL, pg_trickle is closer to where it already is.
vs. Snowflake Dynamic Tables
Snowflake Dynamic Tables are auto-refreshing tables inside Snowflake. They occupy almost exactly the same conceptual slot as pg_trickle — but in a different database.
Use whichever matches the database you have.
vs. "cron + REFRESH MATERIALIZED VIEW"
This is what most teams build before they find a real IVM tool. It works, until:
- Refreshes start to overlap.
- A long refresh blocks readers.
- The refresh becomes too expensive to run as often as you'd like.
- A second view depends on the first and you start writing ordering logic.
- A failure leaves stale data and nobody notices.
When that happens, pg_trickle's quick start is ~5 minutes of setup.
See also: Use Cases · Migrating from materialized views · Migrating from pg_ivm · Research and prior art
Glossary
A plain-language reference for the terms used throughout the pg_trickle documentation. If a term isn't here, check the FAQ — and please open an issue so we can add it.
How to use this page. Most pg_trickle pages link the first use of a jargon term back to the matching entry below. You can also search this page directly (the entries are alphabetised within each section).
Core concepts
Stream table
A table whose contents are defined by a SQL query, and that pg_trickle keeps
up to date automatically as the underlying data changes. Think of it as a
materialized view that maintains itself — without you ever calling
REFRESH MATERIALIZED VIEW.
Defining query
The SQL SELECT statement you give to pgtrickle.create_stream_table().
It can use joins, aggregates, CTEs, window functions, and most of standard
SQL. The defining query is what pg_trickle differentiates to compute deltas.
Source table
A regular PostgreSQL table that a stream table reads from. Source tables
are written to in the normal way (INSERT, UPDATE, DELETE); pg_trickle
captures those writes and propagates them downstream.
Base table
Synonym for source table. Used interchangeably in older docs.
Schedule
How often a stream table refreshes. May be a duration ('5s', '10m'),
a cron expression ('@hourly', '0 * * * *'), the special value
'CALCULATED' (derived from downstream consumers), or NULL (only refresh
when called manually, or for IMMEDIATE mode).
Refresh
A single round of bringing a stream table up to date with its sources. Each refresh either rewrites the whole result (FULL) or applies only the incremental change (DIFFERENTIAL).
Refresh mode
Tells pg_trickle how to refresh. The four modes:
- AUTO — pick the cheapest mode each cycle (the default).
- DIFFERENTIAL — incremental: only changed rows are processed.
- FULL — re-run the entire defining query.
- IMMEDIATE — refresh inside the same transaction as the source DML (no scheduler involved).
Incremental View Maintenance (IVM)
The technique of updating a materialized view by computing only the change induced by recent edits, rather than re-running the whole query. pg_trickle is an IVM engine for PostgreSQL.
Differential
Synonym for "incremental" in the IVM sense. Also the name of the refresh mode that uses incremental computation. Inspired by differential dataflow and the DBSP framework — see also delta query below.
Delta query (ΔQ)
The SQL pg_trickle generates internally to compute the change in a stream
table given a change in its inputs. ΔQ is derived automatically from the
defining query's operator tree. You can inspect it with
pgtrickle.explain_st(name).
Operator tree
The internal representation of your defining query — a tree of nodes like scan, filter, join, aggregate. pg_trickle differentiates this tree operator by operator to derive the delta query.
DAG (directed acyclic graph)
The shape of your stream-table dependencies. If stream table B reads from stream table A, there is an edge A → B. pg_trickle refreshes the DAG in topological order so that downstream tables always see consistent upstream state.
Diamond (in a DAG)
A pattern where two parallel branches both depend on a common ancestor and both feed into a common descendant (A → B → D and A → C → D). pg_trickle refreshes diamonds atomically to prevent the descendant from seeing one branch updated and the other not.
SCC (strongly connected component)
A group of stream tables that all transitively depend on each other,
i.e. a cycle in the dependency graph. pg_trickle supports cycles only for
monotone queries and only when explicitly enabled
(pg_trickle.allow_circular). See
Circular Dependencies.
Change capture
CDC (Change Data Capture)
The mechanism that records every INSERT, UPDATE, and DELETE on a
source table so pg_trickle knows what changed since the last refresh.
pg_trickle has two CDC backends — see CDC Modes.
Trigger-based CDC
The default backend. Lightweight AFTER row-level triggers on each source
table write a single row to a change buffer per data change. Cost is
roughly 2–15 µs per row, paid by the writing transaction.
WAL-based CDC
The optional backend that uses PostgreSQL's logical replication to read
changes from the write-ahead log instead of via triggers. Requires
wal_level = logical. Adds near-zero write-side cost.
Hybrid CDC
The default behaviour: pg_trickle starts with triggers (which always work),
and if wal_level = logical is available, transitions automatically to WAL
once the first refresh succeeds. If anything goes wrong, it falls back to
triggers.
Change buffer
A small per-source table in the pgtrickle_changes.* schema that holds
captured changes between refreshes. Each refresh drains the relevant rows.
Compaction
Collapsing redundant entries in a change buffer — for example an
INSERT followed by a matching DELETE cancels out. pg_trickle compacts
buffers automatically when refreshes are batched.
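The cancellation rule can be sketched with net signed counts per row (the representation is ours, for illustration): a row inserted and then deleted between refreshes produces no work at all.

```python
from collections import Counter

def compact(buffer):
    """buffer: list of (op, row_key); returns net signed counts.
    An UPDATE would appear as a DELETE of the old row plus an INSERT."""
    net = Counter()
    for op, key in buffer:
        net[key] += 1 if op == "INSERT" else -1
    return {k: w for k, w in net.items() if w != 0}

buffer = [("INSERT", 1), ("INSERT", 2), ("DELETE", 1), ("INSERT", 3)]
print(compact(buffer))  # {2: 1, 3: 1} -- row 1 cancelled out
```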
Watermark
A timestamp or position published by an external loader that tells pg_trickle "you can safely consider data through here as complete". Downstream refreshes wait until all relevant watermarks are aligned. Used in ETL bootstrap patterns.
Frontier
A set of per-source positions (LSN or logical timestamps) that records where each input was up to at the moment of the last refresh. The next refresh reads from frontier→now. Frontiers are how pg_trickle guarantees correctness across multiple sources.
LSN (Log Sequence Number)
A PostgreSQL identifier for a position in the write-ahead log. Looks like
16/B374D8A0. pg_trickle uses LSNs to record frontiers in WAL CDC mode.
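Frontier bookkeeping, sketched in miniature: record a per-source position at each refresh and read only changes past it next time. Positions are plain integers here for illustration; in WAL mode they would be LSNs, and the function and data shapes are ours.

```python
def drain_since(changes, frontier):
    """changes: {source: [(pos, change), ...]} ordered by pos.
    Returns the batch past the frontier and the advanced frontier."""
    batch, new_frontier = [], dict(frontier)
    for source, entries in changes.items():
        for pos, change in entries:
            if pos > frontier.get(source, 0):
                batch.append((source, change))
                new_frontier[source] = pos
    return batch, new_frontier

changes = {"orders": [(5, "ins#41"), (9, "ins#42")],
           "customers": [(3, "upd#7")]}
frontier = {"orders": 5, "customers": 0}  # state after the last refresh
batch, frontier = drain_since(changes, frontier)
print(batch)     # [('orders', 'ins#42'), ('customers', 'upd#7')]
print(frontier)  # {'orders': 9, 'customers': 3}
```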
Refresh & scheduling
CALCULATED schedule
The default. Set an explicit refresh interval only on the consumer-facing stream tables (the ones your application queries). Upstream stream tables inherit the tightest cadence among their downstream dependents automatically.
Tier (Hot / Warm / Cold / Frozen)
A coarse refresh cadence bucket used for very large deployments
(pg_trickle.tiered_scheduling = on). Hot tables refresh as scheduled;
Frozen tables never refresh until manually thawed.
Adaptive fallback
The engine's automatic switch from DIFFERENTIAL to FULL when the change ratio exceeds a threshold (default 50%). It switches back when the rate drops.
Fuse (circuit breaker)
A safety mechanism: a stream table that fails repeatedly is automatically
suspended so it cannot block the scheduler. You re-enable it with
pgtrickle.reset_fuse(). See
Fuse Circuit Breaker.
MERGE
PostgreSQL's MERGE statement, which lets pg_trickle apply a delta
(insert / update / delete in one go) to the stream table's storage.
Most of a refresh's wall-clock time is spent in MERGE — i.e., in
PostgreSQL itself, not in pg_trickle.
Scoped recomputation
A delta-application strategy used for MIN, MAX, and TopK aggregates:
re-aggregate just the affected groups (rather than the whole result) by
reading only the rows that match the changed keys.
Group-rescan
Similar to scoped recomputation, used for "holistic" aggregates like
STRING_AGG, ARRAY_AGG, MODE, PERCENTILE_*.
Predicate pushdown
The optimisation that injects WHERE clauses from the defining query
directly into change-buffer scans, so irrelevant changes are filtered out
at read time.
Columnar tracking
A capture-side optimisation: CDC records only the columns referenced by the defining query, encoded as a bitmask. Updates that touch only unreferenced columns are skipped entirely.
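The bitmask test can be sketched as follows (the encoding is ours, for illustration): if the changed columns and the referenced columns do not intersect, the change is irrelevant to the stream table.

```python
def column_mask(all_columns, wanted):
    """Encode a set of column names as a bitmask over the table's columns."""
    mask = 0
    for i, col in enumerate(all_columns):
        if col in wanted:
            mask |= 1 << i
    return mask

columns = ["id", "status", "amount", "notes", "updated_at"]
referenced = column_mask(columns, {"id", "status", "amount"})  # used by the query
changed    = column_mask(columns, {"notes", "updated_at"})     # this UPDATE

print(bool(changed & referenced))  # False -- the change can be skipped
```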
Background workers
Launcher
The single per-server background worker that scans pg_database every few
seconds and spawns a scheduler in every database where pg_trickle is
installed.
Scheduler
The per-database background worker that wakes periodically, decides which stream tables are due for refresh, and (in parallel mode) dispatches refresh jobs to a worker pool.
BGW (BackGround Worker)
A PostgreSQL concept — a long-running process spawned by the postmaster. pg_trickle uses BGWs for the launcher, schedulers, and parallel refresh workers.
Parallel refresh
An execution mode (pg_trickle.parallel_refresh_mode = 'on') where
independent stream tables in the DAG are refreshed concurrently across a
pool of dynamic background workers.
Aggregates
Algebraic aggregate
An aggregate that can be maintained from previous state plus a delta (SUM, COUNT, AVG by tracking sum + count). Cheapest possible IVM.
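A minimal Python sketch of the idea, maintaining AVG as (sum, count); the class is illustrative, not pg_trickle's API:

```python
class IncrementalAvg:
    """Algebraic-aggregate sketch: AVG maintained purely from state
    (sum, count) plus signed deltas, with no rescan of the source."""
    def __init__(self):
        self.total = 0.0
        self.count = 0

    def apply(self, value, sign):
        # sign is +1 for an inserted row, -1 for a deleted row
        self.total += sign * value
        self.count += sign

    @property
    def avg(self):
        return self.total / self.count if self.count else None
```

Both inserts and deletes are O(1) state updates, which is why algebraic aggregates are the cheapest case for IVM.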
Semi-algebraic aggregate
An aggregate that can be maintained on inserts cheaply, but on a delete may need to rescan the affected group (MIN, MAX). pg_trickle handles this with scoped recomputation.
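The insert/delete asymmetry can be sketched in a few lines of Python (names are illustrative only):

```python
class ScopedMin:
    """Semi-algebraic sketch for MIN: inserts update state in O(1);
    deleting the current minimum rescans only the affected group."""
    def __init__(self):
        self.rows = {}   # group -> remaining values (stands in for a source scan)
        self.mins = {}   # group -> current MIN

    def insert(self, group, value):
        self.rows.setdefault(group, []).append(value)
        # Cheap on insert: new min from old state plus the delta.
        self.mins[group] = min(self.mins.get(group, value), value)

    def delete(self, group, value):
        self.rows[group].remove(value)
        if value == self.mins.get(group):
            # Scoped recomputation: re-aggregate just this group.
            self.mins[group] = min(self.rows[group]) if self.rows[group] else None
```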
Holistic aggregate
An aggregate that has no incremental form (PERCENTILE_*, MODE, STRING_AGG, ARRAY_AGG). pg_trickle re-aggregates the affected groups from source.
Stream-table features
Snapshot
A point-in-time copy of a stream table's contents (v0.27+). Useful for backups, replica bootstrap, or test fixtures. See Snapshots.
Outbox
A stream-table-backed implementation of the transactional outbox pattern: write events in the same transaction as your business data; external systems consume them with at-least-once delivery. See Transactional Outbox.
Inbox
The mirror of the outbox: receive events idempotently from an external system, with stream tables giving you live views of pending work and a dead-letter queue. See Transactional Inbox.
Publication (downstream)
A regular PostgreSQL logical-replication publication automatically created over a stream table's storage. Lets Debezium, Kafka Connect, Spark, etc. subscribe to stream-table changes without an extra pipeline. See Downstream Publications.
Relay
A standalone Rust binary (pg-tide-relay) that bridges outbox/inbox
tables with external messaging systems (NATS, Kafka, Redis Streams,
SQS, RabbitMQ, webhooks). Extracted to the
pg_tide project in v0.46.0.
TopK
Stream tables of the form SELECT … ORDER BY x LIMIT N (optionally with
OFFSET M). pg_trickle stores only the top N rows and recomputes them
incrementally when the changes affect the leaderboard.
IMMEDIATE mode
A refresh mode that maintains the stream table inside the same transaction as the source DML — no scheduler, no change buffers. Gives read-your-writes consistency at the cost of slightly heavier writes.
Engine internals
DVM (Differential View Maintenance)
The engine inside pg_trickle that turns operator trees into delta queries. The name is used informally; the academic name for the underlying technique is differential dataflow.
DBSP
The academic framework that pg_trickle's differentiation rules are based on. See the DBSP Comparison for the relationship between pg_trickle and the original DBSP runtime.
Bilinear expansion
The expansion of Δ(A ⋈ B) = ΔA ⋈ B + A ⋈ ΔB + ΔA ⋈ ΔB for joins. This
is the formal recipe for incrementally maintaining a join. You don't need
to know it to use pg_trickle.
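The identity can be checked numerically over multisets. A small Python sketch using insert-only deltas (so all multiplicities stay non-negative and Counter addition suffices):

```python
from collections import Counter

def join(a, b):
    """Multiset equi-join on the row's first field; multiplicities multiply."""
    out = Counter()
    for (ka, va), na in a.items():
        for (kb, vb), nb in b.items():
            if ka == kb:
                out[(ka, va, vb)] += na * nb
    return out

# Relations and their deltas: Counters over (key, payload) rows
A  = Counter({(1, "a1"): 1})
dA = Counter({(2, "a2"): 1})
B  = Counter({(1, "b1"): 1, (2, "b2"): 1})
dB = Counter({(1, "b3"): 1})

# Δ(A ⋈ B) = ΔA ⋈ B + A ⋈ ΔB + ΔA ⋈ ΔB
delta = join(dA, B) + join(A, dB) + join(dA, dB)
assert join(A + dA, B + dB) == join(A, B) + delta
```

The delta on the right-hand side touches only changed rows, which is what makes a join maintainable without re-scanning both inputs.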
Semi-naive evaluation
A classical algorithm for incremental recursive queries (WITH RECURSIVE).
pg_trickle uses it in IMMEDIATE and DIFFERENTIAL modes for insert-only
recursion.
DRed (Delete-and-Rederive)
A companion algorithm to semi-naive evaluation that handles deletions inside recursive queries. Also used in IMMEDIATE mode.
__pgt_row_id
A hidden column pg_trickle adds to every stream table to give each row a stable identity, even if the defining query has no primary key. You can ignore it in your queries; it is replicated correctly to logical-replication subscribers.
Auto-rewrite
A pipeline of small rewrites pg_trickle applies to your defining query
before differentiation — for example expanding DISTINCT ON to
ROW_NUMBER(), inlining views, or splitting GROUPING SETS into
UNION ALL. See
Auto-Rewrite Pipeline.
Operations & infrastructure
GUC (Grand Unified Configuration)
A PostgreSQL configuration variable, set in postgresql.conf, by
ALTER SYSTEM, or per-session with SET. All pg_trickle GUCs start with
pg_trickle.*. The full list is in Configuration.
Advisory lock
A lightweight lock you ask PostgreSQL to keep on your behalf. pg_trickle uses advisory locks to coordinate refreshes across processes — including across PgBouncer-pooled sessions, which is why pg_trickle is pooler-friendly.
Pooler
Connection pooler such as PgBouncer or pgcat, often deployed in front of PostgreSQL. pg_trickle's background workers connect directly (not through the pooler), so your app's pooler does not interact with refresh activity.
CNPG
CloudNativePG, a Kubernetes operator for
PostgreSQL. pg_trickle ships a minimal scratch-based OCI image suitable
for CNPG's Image Volume Extensions feature.
pgrx
The Rust framework pg_trickle is built with. Provides safe wrappers around PostgreSQL internals.
wal_level
The PostgreSQL setting that controls how much information the write-ahead
log contains. Values: minimal, replica (default), logical. WAL-based
CDC requires logical.
Replication slot
A PostgreSQL object that retains WAL until a consumer has read it. WAL CDC mode creates one slot per source table; trigger CDC mode does not.
Acronym key
| Acronym | Meaning |
|---|---|
| BGW | Background worker |
| CDC | Change Data Capture |
| CNPG | CloudNativePG |
| CTE | Common Table Expression (WITH …) |
| DAG | Directed Acyclic Graph |
| DBSP | Database Stream Processor (the framework pg_trickle is inspired by) |
| DDL | Data Definition Language (CREATE, ALTER, DROP) |
| DLQ | Dead-Letter Queue |
| DML | Data Manipulation Language (INSERT, UPDATE, DELETE) |
| DRed | Delete-and-Rederive (recursive query algorithm) |
| DVM | Differential View Maintenance (the engine inside pg_trickle) |
| GUC | Grand Unified Configuration (PostgreSQL setting) |
| IVM | Incremental View Maintenance |
| LSN | Log Sequence Number |
| OID | Object Identifier (PostgreSQL row ID for catalog objects) |
| ΔQ | Delta query — the SQL pg_trickle generates to compute changes |
| RLS | Row-Level Security |
| SCC | Strongly Connected Component (a cycle in the DAG) |
| SLA | Service Level Agreement |
| SLO | Service Level Objective |
| SPI | Server Programming Interface (PostgreSQL's in-process query API) |
| SRF | Set-Returning Function |
| ST | Stream Table |
| WAL | Write-Ahead Log |
See also: FAQ · SQL Reference · Architecture · Configuration
Playground
The quickest way to explore pg_trickle is the playground — a pre-configured Docker environment with sample data and stream tables ready to query. No installation, no configuration. One command and you're running.
Quick Start
git clone https://github.com/trickle-labs/pg-trickle.git
cd pg-trickle/playground
docker compose up -d
Then connect:
psql postgresql://postgres:playground@localhost:5432/playground
PostgreSQL 18+ note: The Docker image stores data in a versioned subdirectory (/var/lib/postgresql/18/main). The compose file mounts /var/lib/postgresql (not .../data) — this is intentional.
What's Pre-Loaded
The seed script creates three base tables and five stream tables that cover the most common pg_trickle patterns.
Base Tables
| Table | Description |
|---|---|
| products | Product catalog with categories and prices |
| orders | Order line items with quantities and timestamps |
| customers | Customer profiles with regions |
Stream Tables
| Stream Table | Query | Pattern demonstrated |
|---|---|---|
| sales_by_region | SUM(total) grouped by region | Basic aggregate, DIFFERENTIAL mode |
| top_products | SUM(quantity) ranked by category | Window function (RANK()) |
| customer_lifetime_value | Revenue + order count per customer | Multi-table join + aggregates |
| daily_revenue | Revenue per day | Time-series aggregation |
| active_products | Products with orders | EXISTS subquery |
Exercises
1. Watch an INSERT propagate
-- Current state
SELECT * FROM sales_by_region ORDER BY region;
-- Insert a new order
INSERT INTO orders (customer_id, product_id, quantity, order_date)
VALUES (1, 1, 10, CURRENT_DATE);
-- After ~1 s the stream table refreshes
SELECT * FROM sales_by_region ORDER BY region;
2. Inspect pg_trickle internals
-- Overall health
SELECT * FROM pgtrickle.health_check();
-- Status of all stream tables
SELECT name, status, refresh_mode, staleness
FROM pgtrickle.pgt_status()
ORDER BY name;
-- Recent refresh activity
SELECT start_time, stream_table, action, status, duration_ms
FROM pgtrickle.refresh_timeline(10);
-- Delta SQL for a stream table
SELECT pgtrickle.explain_st('sales_by_region');
-- Change buffer sizes
SELECT * FROM pgtrickle.change_buffer_sizes();
3. Update and Delete
-- Update a product price
UPDATE products SET price = 99.99 WHERE name = 'Widget';
-- customer_lifetime_value re-calculates
SELECT * FROM customer_lifetime_value ORDER BY total_revenue DESC LIMIT 5;
-- Delete a customer's orders
DELETE FROM orders WHERE customer_id = 3;
-- Stream tables reflect the removal
SELECT * FROM sales_by_region ORDER BY region;
4. Create your own stream table
SELECT pgtrickle.create_stream_table(
name => 'my_experiment',
query => $$
SELECT p.category,
COUNT(DISTINCT o.customer_id) AS unique_buyers,
SUM(o.quantity) AS total_units
FROM orders o
JOIN products p ON p.id = o.product_id
GROUP BY p.category
HAVING SUM(o.quantity) > 5
$$,
schedule => '2s'
);
SELECT * FROM my_experiment;
Tear Down
docker compose down -v
The -v flag removes the data volume. Omit it if you want to keep your changes.
Next Steps
- Getting Started Guide — full tutorial with an org-chart example
- SQL Reference — all functions and parameters
- Best-Practice Patterns — production-ready patterns
Real-time Demo
This demo shows pg_trickle doing real work: a continuous stream of events flows into PostgreSQL, and a DAG of stream tables keeps a live view of that data up to date — automatically, incrementally, and within seconds.
Three scenarios are available via the DEMO_SCENARIO environment variable:
| Scenario | Default? | Pipeline |
|---|---|---|
| fraud | — | Financial fraud detection — 9-node, 4-layer DAG over a transaction stream |
| ecommerce | ✅ | E-commerce analytics — 6-node DAG over a continuous order stream |
| finance | — | Financial risk analytics — 10-level deep DAG with only leaf schedules (CALCULATED throughout) |
Each scenario includes two purpose-built differential efficiency showcases: stream tables with sub-1.0 change ratios that demonstrate when DIFFERENTIAL mode is clearly the right choice.
It is the fastest way to see how stream tables, differential refresh, and DAG-aware scheduling work together on data you can watch moving.
Quick Start
cd demo
# E-commerce analytics (default)
docker compose up --build
# Fraud detection
DEMO_SCENARIO=fraud docker compose up --build
# Financial risk analytics (10-level DAG with deep calculated dependencies)
DEMO_SCENARIO=finance docker compose up --build
Open http://localhost:8080 — the dashboard refreshes every 2 seconds.
To stop and remove all data:
docker compose down -v
Switching scenarios requires removing the old data volume:
docker compose down -v
DEMO_SCENARIO=fraud docker compose up
Or use any of the three: fraud, ecommerce, finance.
What the Demo Does
Three Docker services start together:
| Service | Role |
|---|---|
| postgres | PostgreSQL 18 with pg_trickle; initialises the schema, seed data, and all stream tables on first boot |
| generator | Python script that continuously inserts events; periodically triggers bursts that stress-test differential refresh |
| dashboard | Flask web app served at http://localhost:8080; reads from stream tables and auto-refreshes every 2 seconds |
The DEMO_SCENARIO variable controls which SQL files are loaded on startup
and which generator/dashboard module is activated. Both scenarios share the
same Docker Compose services.
Scenario: fraud
The demo models the data pipeline a financial institution might build to spot suspicious activity as it happens, not hours later in a batch job.
Source Data
Four regular PostgreSQL tables hold the reference data:
| Table | Contents |
|---|---|
users | 30 users, each with a name, country, and account age |
merchants | 15 merchants across categories: Retail, Electronics, Travel, Food, Pharmacy, Gambling, Crypto |
transactions | The live stream — the generator inserts here continuously |
merchant_risk_tier | Slowly-changing risk tier (STANDARD / ELEVATED / HIGH) for each merchant; the generator rotates one merchant's tier every ~30 cycles |
transactions is the only table that grows continuously. merchant_risk_tier
changes occasionally (about one row per minute). Everything else is static.
Normal vs. Suspicious Traffic
The generator creates two kinds of transactions:
Normal traffic — a random user buys something from a random merchant at a plausible amount for that merchant's category. Inserted at roughly one per second.
Suspicious burst — every ~45 seconds, the generator picks one user and fires 6–14 rapid transactions (0.15–0.45 s apart) at Crypto or Gambling merchants, with amounts that escalate with each successive transaction. This pattern is designed to cross the risk thresholds and light up the HIGH-risk column on the dashboard.
The DAG of Stream Tables (fraud)
All nine stream tables are defined in demo/postgres/fraud/02_stream_tables.sql.
Base tables Layer 1 — Silver Layer 2 — Gold Layer 3 — Platinum
──────────── ────────────────────── ───────────────────── ──────────────────────
┌──────────┐ ┌──────────────────┐
│ users │───────────►│ user_velocity │──────────────────────────────────►┌──────────────┐
└──────────┘ │ DIFFERENTIAL 1s │ │ country_risk │
└──────┬───────────┘ │ DIFF, calc │
│ └──────────────┘
┌──────────────┐ │ ┌──────────────────┐
│ transactions │───────────────┼─►│ merchant_stats │
│ (stream) │ │ │ DIFFERENTIAL 1s │
└──────────────┘ │ └──────┬───────────┘
│ │ │
│ ┌───────────┴─────────┘ ← DIAMOND DEPENDENCY
│ │
│ ▼ ┌─────────────────┐
│ ┌────────────────────┐ │ alert_summary │
│ │ risk_scores │────────────────────────────────────────────►│ DIFF, calc │
│ │ FULL, calc │ └─────────────────┘
│ └────────────────────┘
│ ┌───────────────────────┐
│ │ top_risky_merchants │
└─────────────────────────────────────────────────────────────────────► │ DIFF, calc │
└───────────┬───────────┘
┌──────────┐ ┌──────────────────┐ │
│merchants │───────────►│ category_volume │ ┌──────────────────────────────────────▼──────┐
└──────────┘ │ DIFFERENTIAL 1s │ │ top_10_risky_merchants │
└──────────────────┘ │ DIFFERENTIAL 5s ← SHOWCASE #2 │
│ change ratio ≈ 0.25 (LIMIT 10) │
┌───────────────────┐ ┌──────────────────────┐ └─────────────────────────────────────────────┘
│ merchant_risk_tier│──►│ merchant_tier_stats │ ← SHOWCASE #1
│ (slowly-changing) │ │ DIFFERENTIAL 5s │ change ratio ≈ 0.07
└───────────────────┘ │ │
┌──────────┐ │ │
│merchants │───────────►│ │
└──────────┘ └──────────────────────┘
Layer 1 — Silver: Direct Aggregates
These three stream tables each read directly from the base tables and refresh every second using DIFFERENTIAL mode. pg_trickle calculates only what changed since the last refresh — if five new transactions arrived, it adjusts exactly the five affected aggregate buckets rather than recomputing the full table.
user_velocity — per-user transaction statistics
For each of the 30 users, keeps a running count of transactions, total spend, average transaction amount, and how many distinct merchants they have visited. This is the core input for detecting users who are suddenly transacting far more than usual.
SELECT u.id, u.name, u.country,
COUNT(t.id) AS txn_count,
SUM(t.amount) AS total_spent,
ROUND(AVG(t.amount), 2) AS avg_txn_amount,
COUNT(DISTINCT t.merchant_id) AS unique_merchants
FROM users u
LEFT JOIN transactions t ON t.user_id = u.id
GROUP BY u.id, u.name, u.country
merchant_stats — per-merchant baseline
Tracks how many transactions each merchant typically sees and at what amounts. A transaction that is 3× the merchant's own average is more suspicious than one that is 3× the user's average — this table supplies that context.
category_volume — industry-level view
Groups by merchant category (Crypto, Gambling, Retail, etc.) so the dashboard can show which sectors are hot at any moment. Refreshes every second and uses DIFFERENTIAL: a new transaction in Electronics updates only the Electronics row.
Layer 2 — Gold: Derived Metrics
These stream tables read from the Layer 1 stream tables (stream tables reading stream tables). pg_trickle's scheduler refreshes Layer 1 first, then triggers Layer 2 automatically — you never schedule this yourself.
risk_scores — the diamond convergence node
This is the most interesting table in the DAG. It joins:
- transactions — the raw event
- user_velocity — that user's accumulated behaviour (Layer 1)
- merchant_stats — that merchant's baseline (Layer 1)
Because transactions feeds both user_velocity and merchant_stats
independently, and risk_scores depends on both, this creates a classic
diamond dependency:
transactions ──→ user_velocity ──┐
├──→ risk_scores
transactions ──→ merchant_stats ──┘
pg_trickle detects this diamond and schedules both Layer 1 nodes before
triggering the Layer 2 refresh, so risk_scores always sees fresh context.
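The ordering constraint is ordinary topological sorting of the dependency DAG. A Python sketch using Kahn's algorithm over the fraud diamond (table names mirror the demo; the real scheduler's logic is internal):

```python
from collections import defaultdict

def refresh_order(upstreams):
    """Kahn's algorithm: every stream table refreshes only after all
    of its upstreams, so a diamond's convergence node sees fresh inputs."""
    indegree = defaultdict(int)
    downstream = defaultdict(list)
    nodes = set(upstreams)
    for node, ups in upstreams.items():
        for up in ups:
            nodes.add(up)
            downstream[up].append(node)
            indegree[node] += 1
    ready = sorted(n for n in nodes if indegree[n] == 0)
    order = []
    while ready:
        n = ready.pop(0)
        order.append(n)
        for d in downstream[n]:
            indegree[d] -= 1
            if indegree[d] == 0:
                ready.append(d)
    return order

order = refresh_order({
    "user_velocity":  ["transactions"],
    "merchant_stats": ["transactions"],
    "risk_scores":    ["user_velocity", "merchant_stats"],  # diamond tip
})
assert order.index("risk_scores") > order.index("user_velocity")
assert order.index("risk_scores") > order.index("merchant_stats")
```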
The risk scoring logic lives entirely in SQL:
CASE
WHEN uv.txn_count > 20
AND t.amount > 3 * uv.avg_txn_amount
THEN 'HIGH'
WHEN uv.txn_count > 10
OR t.amount > 2 * uv.avg_txn_amount
THEN 'MEDIUM'
ELSE 'LOW'
END AS risk_level
risk_scores uses refresh_mode => 'FULL' because it joins three sources
(including two stream tables) in a way that requires re-evaluating all rows to
maintain correctness. FULL mode is the right choice here — it is still fast
because it is triggered only when an upstream actually changes.
country_risk — geographic rollup
Reads user_velocity (Layer 1) and aggregates by country. Pure DIFFERENTIAL
because it is a simple GROUP BY country over the Silver layer.
Layer 3 — Platinum: Executive Roll-Ups
These stream tables read from risk_scores (Layer 2) and are defined with
schedule => 'calculated', meaning pg_trickle fires them automatically
whenever their upstream changes.
alert_summary — the primary KPI
Counts and totals by risk level (LOW / MEDIUM / HIGH). This is what drives the
four big counters at the top of the dashboard. Because it is a simple aggregate
over risk_scores, it uses DIFFERENTIAL mode and updates only the affected
risk-level row on each cycle.
top_risky_merchants — merchant triage list
Groups risk_scores by merchant name and category, counting how many HIGH and
MEDIUM transactions each merchant has seen, plus a risk-rate percentage.
Operationally this is where a fraud team would start when deciding which
merchants to review or block.
Differential Efficiency Showcases
Two additional stream tables sit outside the main fraud pipeline. Their purpose is to demonstrate that DIFFERENTIAL mode can achieve a meaningfully sub-1.0 change ratio when the output cardinality is constrained.
merchant_tier_stats — Showcase #1: slowly-changing lookup source
Joins merchants (static) with merchant_risk_tier (a 15-row lookup that the
generator updates one row per ~30 cycles). Because no fast-growing table is in
the query, only the one rotated merchant's row changes each cycle:
- Change ratio ≈ 1/15 ≈ 0.07
- Refresh Mode Advisor recommendation: ✓ KEEP DIFFERENTIAL
- Schedule: 5 s (independent of the main DAG)
This is the counterpoint to risk_scores. risk_scores correctly uses FULL
because its change ratio is ~1.0; merchant_tier_stats correctly uses
DIFFERENTIAL because its change ratio is ~0.07. Seeing both on the same
dashboard makes the advisor's logic concrete.
top_10_risky_merchants — Showcase #2: fixed-cardinality output
Reads top_risky_merchants (Layer 3) and applies LIMIT 10. Even though the
upstream changes heavily every cycle, only the merchants whose rank crosses the
top-10 boundary produce a net change in the output. Typically 2–3 merchants
enter or leave the top 10 per refresh cycle:
- Change ratio ≈ 0.2–0.3
- Refresh Mode Advisor recommendation: ✓ KEEP DIFFERENTIAL
- Schedule: 5 s
The Dashboard
Open http://localhost:8080. Panels update every 2 seconds.
KPI Row
Four counters: total transactions, LOW / MEDIUM / HIGH counts. Below them, a proportional colour bar that makes the risk mix visible at a glance.
Recent Alerts
A live table of the most recent HIGH and MEDIUM transactions — transaction ID, user name, merchant, category, amount, and risk badge. Rows turn red for HIGH and amber for MEDIUM. During a burst you will see this fill with HIGH rows as the generator fires rapid transactions.
Merchant Risk Leaderboard
Sorted by HIGH-risk count descending. Crypto and Gambling merchants will typically appear at the top because the generator targets them during bursts.
User Velocity, Country Overview, Category Volume
Three side-by-side tables driven by the Layer 1 stream tables. These are the most-refreshed tables in the DAG (every second); watching user velocity change in real time illustrates why DIFFERENTIAL mode matters — only the affected rows move.
Merchant Tier Stats and Tiers
Two panels driven by merchant_tier_stats. The left panel shows the full 15-row
output (merchant ID, name, category, current tier, risk score, and when the tier
last changed). The right panel shows a compact tier-only view. Tiers are
colour-coded: HIGH = red, ELEVATED = amber, STANDARD = green. Tiers rotate
visibly every ~30 generator cycles (roughly once per minute).
Top 10 Risky Merchants Leaderboard
A live leaderboard driven by top_10_risky_merchants. Shows rank, merchant
name, category, total transactions, HIGH and MEDIUM risk counts, and a
risk-rate percentage. The percentage column is coloured green (<25%), amber
(25–49%), or red (≥50%). Watch the rankings shift as the generator's burst
patterns accumulate.
Stream Table Status
A compact status panel showing each stream table's refresh mode, schedule, and
whether it is populated. This reads from pgtrickle.pgt_status().
DAG Topology
A collapsible ASCII diagram showing the full dependency graph. Useful as a reference while exploring the database directly.
Exploring the Database (fraud)
Connect directly to inspect the stream tables and pg_trickle internals:
docker compose exec postgres psql -U demo -d fraud_demo
Check all stream table status:
SELECT name, status, refresh_mode, is_populated, staleness
FROM pgtrickle.pgt_status();
Inspect the DAG dependency graph (full tree):
SELECT tree_line FROM pgtrickle.dependency_tree()
ORDER BY tree_line;
See the most recent refresh history:
SELECT st.pgt_name, rh.action, rh.status,
(rh.end_time - rh.start_time) AS duration, rh.start_time
FROM pgtrickle.pgt_refresh_history rh
JOIN pgtrickle.pgt_stream_tables st ON st.pgt_id = rh.pgt_id
ORDER BY rh.start_time DESC
LIMIT 20;
Spot the diamond groups (stream tables with shared sources):
SELECT group_id, member_name, is_convergence, schedule_policy
FROM pgtrickle.diamond_groups()
ORDER BY group_id, is_convergence DESC;
Watch a HIGH-risk alert appear:
-- In one terminal: watch for new HIGH rows
SELECT txn_id, user_name, merchant_name, amount
FROM risk_scores
WHERE risk_level = 'HIGH'
ORDER BY txn_id DESC
LIMIT 5;
-- Wait ~45 seconds for the next burst, then run the query again.
Force a manual refresh:
SELECT pgtrickle.refresh_stream_table('risk_scores');
How the Files Are Organised
demo/
├── docker-compose.yml # Service definitions; DEMO_SCENARIO selects the scenario
├── README.md # Quick start + scenario descriptions
│
├── postgres/
│ ├── fraud/
│ │ ├── 01_schema.sql # Base tables + seed data (30 users, 15 merchants,
│ │ │ # 40 initial transactions, merchant_risk_tier)
│ │ └── 02_stream_tables.sql# All 9 stream table definitions (Layers 1–3 + showcases)
│ ├── ecommerce/
│ │ ├── 01_schema.sql # Base tables + seed data (customers, products,
│ │ │ # categories, orders, product_catalog)
│ │ └── 02_stream_tables.sql# All 6 stream table definitions (Layers 1–3 + showcases)
│ └── finance/
│ ├── 01_schema.sql # Base tables + seed data (30 instruments, 50 accounts,
│ │ # 5 portfolios, 8 sectors, seed trades/prices)
│ └── 02_stream_tables.sql# All 10 stream table definitions (L1–L10 cascade)
│
├── generator/
│ ├── Dockerfile
│ ├── requirements.txt # psycopg2-binary only
│ ├── generate.py # Scenario dispatcher; reads DEMO_SCENARIO
│ └── scenarios/
│ ├── fraud.py # Transaction generator (normal + burst mode)
│ ├── ecommerce.py # Order generator (normal + flash sale mode)
│ └── finance.py # Trade + price tick generator (normal + algo burst mode)
│
└── dashboard/
├── Dockerfile
├── requirements.txt # flask + psycopg2-binary
├── app.py # Scenario dispatcher: / → HTML, /api/data → JSON,
│ # /api/internals → stream table metadata (shared)
└── scenarios/
├── fraud.py # Fraud HTML, DAG diagram, and data queries
├── ecommerce.py # E-commerce HTML, DAG diagram, and data queries
└── finance.py # Finance HTML, DAG diagram, and data queries
Why These Design Choices
Why seven pipeline stream tables instead of one big query?
Splitting the computation across three layers means each layer does a smaller,
cheaper differential computation. A single-query approach that joins users,
transactions, and merchants directly into risk_scores would force a FULL
refresh every cycle because no single source table captures all changes. With
the layered approach:
- L1 tables catch source changes differentially (they are cheap to maintain)
- L2/L3 tables read from already-aggregated L1 data (smaller join inputs)
- The scheduler only re-runs a layer when its inputs actually changed
Why is risk_scores FULL and not DIFFERENTIAL?
risk_scores is a 1:1 projection of transactions — one output row per
input transaction, with enrichment from the L1 stream tables. Since
transactions is append-only, the change ratio is ~1.0: every refresh adds
roughly as many new rows as existed before, making DIFFERENTIAL no more
efficient than FULL. Additionally, FULL mode simplifies the refresh logic
(avoiding complex multi-source delta algebra) while remaining fast because the
table is small and updates are infrequent (only triggered when L1 feeds new
data).
pg_trickle's diagnostic system (recommend_refresh_mode()) confirms this choice: it observed an average change ratio of 1.0 and recommended KEEP FULL.
The L3 tables (alert_summary, top_risky_merchants) are purely
single-upstream aggregates over risk_scores and therefore support
DIFFERENTIAL efficiently — the change ratio there is much lower.
Why schedule => 'calculated' for L2 and L3?
calculated means "refresh whenever an upstream stream table has new data."
This is the right choice for derived layers: there is no point refreshing
risk_scores if neither user_velocity nor merchant_stats has changed, and
there is no point waiting for a clock interval when they have.
Why triggers and not logical replication?
The demo uses the default CDC mode (row-level AFTER triggers). This works with
any PostgreSQL 18 installation out of the box — no replication slot
configuration, no wal_level = logical requirement. For production deployments
with very high write throughput, WAL-based CDC is more efficient. pg_trickle
can switch modes transparently; see CONFIGURATION.md for
details.
Empirical optimization: FULL vs DIFFERENTIAL by change ratio
The demo illustrates a practical rule of thumb: when a stream table's change
ratio (fraction of output rows that are inserted or deleted per refresh cycle)
is high (>0.5), FULL mode is often faster than DIFFERENTIAL because the delta
overhead dominates the benefit. Use pgtrickle.recommend_refresh_mode(table_name)
to check — it analyzes actual refresh history and recommends the best mode with
confidence scores.
The Refresh Mode Advisor computes change ratio as:
change_ratio = (rows_inserted + rows_deleted) / max(reltuples, 1)
where reltuples is the stream table's current row count from pg_class. This
gives a meaningful fraction: 0.07 means 7% of output rows changed last cycle;
1.0 means the entire output turned over.
The two showcase tables make this concrete:
| Table | Change ratio | Advisor says |
|---|---|---|
| merchant_tier_stats | ≈ 0.07 | ✓ KEEP DIFFERENTIAL |
| top_10_risky_merchants | ≈ 0.25 | ✓ KEEP DIFFERENTIAL |
| risk_scores | ≈ 1.0 | KEEP FULL (append-only source) |
| alert_summary | ≈ 1.0 | KEEP FULL (small table; delta overhead dominates) |
Scenario: ecommerce (default)
The e-commerce scenario models a real-time online store analytics pipeline with orders streaming in continuously.
cd demo
DEMO_SCENARIO=ecommerce docker compose up --build
Source Data
| Table | Contents |
|---|---|
| customers | 30 customers with name and country |
| products | 15 products across 8 categories (Electronics, Clothing, Sports, etc.) |
| categories | 8 product categories |
| orders | The live stream — the generator inserts here continuously |
| product_catalog | Slowly-changing current price per product; the generator reprices one product every ~30 cycles |
orders is the only table that grows continuously. product_catalog changes
occasionally (about one row per minute). Everything else is static.
Normal vs. Flash Sale Traffic
Normal orders — a random customer orders a random product in quantity 1–2 at roughly the catalog price (±15%). Inserted at roughly one per second.
Flash sale burst — every ~45 seconds, the generator picks one category and fires 8–18 rapid orders (0.10–0.35 s apart) at a 70–90% discount. This creates a visible revenue spike in the Category Revenue panel.
The DAG of Stream Tables (ecommerce)
All six stream tables are defined in demo/postgres/ecommerce/02_stream_tables.sql.
Base tables Layer 1 — Silver Layer 2 — Gold Layer 3 — Platinum
──────────── ────────────────────── ───────────────────── ──────────────────────
┌────────────┐ ┌──────────────────┐
│ customers │──────►│ customer_stats │──────────────────────────────►┌──────────────────┐
└────────────┘ │ DIFFERENTIAL 1s │ │ country_revenue │
└──────────────────┘ │ DIFF, calc │
│ └──────────────────┘
┌────────────┐ │
│ orders │────────────────┘
│ (stream) │────────────────────────────────►┌────────────────┐
└────────────┘ ┌────────────────┐ │ product_sales │
┌────────────┐ │category_revenue│ │ DIFFERENTIAL │
│ products │──────►│ DIFFERENTIAL │ │ 1s │
│ categories │──────►│ 1s │ └────────────────┘
└─────┬──────┘ └────────────────┘
│
┌─────▼──────────────┐ ┌──────────────────────┐
│ product_catalog │──►│ catalog_price_impact │ ← DIFFERENTIAL SHOWCASE #1
│ (slowly-changing) │ │ DIFFERENTIAL 5s │ change ratio ≈ 0.07
└────────────────────┘ └──────────────────────┘
┌──────────────────┐ ┌──────────────────┐
│ customer_stats │──►│ top_10_customers│ ← DIFFERENTIAL SHOWCASE #2
│ DIFFERENTIAL 1s │ │ DIFFERENTIAL │ change ratio ≈ 0.1–0.2
└──────────────────┘ │ calc, LIMIT 10 │
└──────────────────┘
Stream Tables (ecommerce)
| Name | Layer | Mode | Schedule | What it computes |
|---|---|---|---|---|
| product_sales | L1 | DIFFERENTIAL | 1 s | Per-product: units sold, revenue, avg selling price |
| customer_stats | L1 | DIFFERENTIAL | 1 s | Per-customer: order count, total spent, avg order value |
| category_revenue | L1 | DIFFERENTIAL | 1 s | Per-category: orders, units, revenue |
| country_revenue | L2 | DIFFERENTIAL | calculated | Per-country: roll-up from customer_stats |
| catalog_price_impact | showcase | DIFFERENTIAL | 5 s | Per-product: current vs. base price delta |
| top_10_customers | showcase | DIFFERENTIAL | calculated | Top 10 customers by total spend (LIMIT 10) |
Differential efficiency showcases (ecommerce)
catalog_price_impact — Showcase #1: slowly-changing price catalog
Joins products (static) with product_catalog (15 rows, one repriced per
~30 cycles). Only the repriced product's row changes each cycle:
- Change ratio ≈ 1/15 ≈ 0.07
- Refresh Mode Advisor recommendation: ✓ KEEP DIFFERENTIAL
- The price_delta and pct_change columns highlight repriced products in real time.
top_10_customers — Showcase #2: fixed-cardinality leaderboard
Reads customer_stats (all 30 customers) and applies LIMIT 10. Only rank
boundary crossings produce net output changes:
- Change ratio ≈ 0.1–0.2
- Refresh Mode Advisor recommendation: ✓ KEEP DIFFERENTIAL
Exploring the Database (ecommerce)
docker compose exec postgres psql -U demo -d ecommerce_demo
-- See category revenue live
SELECT category, order_count, units_sold, revenue
FROM category_revenue ORDER BY revenue DESC;
-- Top 10 customers leaderboard
SELECT * FROM top_10_customers;
-- Price changes vs. base price
SELECT product_name, base_price, current_price, pct_change
FROM catalog_price_impact ORDER BY ABS(pct_change) DESC;
-- Refresh efficiency comparison across all stream tables
SELECT pgt_name, avg_diff_ms, diff_speedup, avg_change_ratio
FROM pgtrickle.refresh_efficiency() ORDER BY pgt_name;
Scenario: finance
The finance scenario models a real-time financial risk analytics pipeline
demonstrating a 10-level deep DAG where only the leaf tables have fixed
schedules, and all downstream layers cascade with schedule => 'calculated'.
This showcases pg_trickle's strength in maintaining deep computation graphs
where derived data automatically flows upward through the layers.
cd demo
DEMO_SCENARIO=finance docker compose up --build
Source Data
| Table | Contents |
|---|---|
| sectors | 8 financial sectors (Technology, Healthcare, Finance, Energy, etc.) |
| instruments | 30 financial instruments (stocks, bonds, commodities) |
| accounts | 50 trading accounts |
| portfolios | 5 portfolios (institutional investors) |
| trades | The live stream — the generator inserts here continuously with buy/sell orders |
| market_prices | Slowly-changing OHLC prices per instrument; the generator updates one instrument per cycle with realistic bid/ask/mid prices |
trades is append-only; market_prices updates continuously (one instrument per
generator cycle). Everything else is static reference data.
Real-Time Market Activity
Normal trading — the generator randomly selects accounts and instruments, inserting buy/sell trades at current market prices (~1 trade per second). Buy quantities are positive; sell quantities are negative. This creates a natural, continuous flow of position changes.
Algo bursts — every ~60 seconds, the generator triggers an algorithmic burst (0.05–0.20s between trades, 8–20 trades) concentrated on a few instruments, simulating realistic high-frequency trading patterns that stress-test the differential calculation pipeline.
The DAG of Stream Tables (finance)
All 10 stream tables are defined in demo/postgres/finance/02_stream_tables.sql.
This is the deepest DAG in all three scenarios — a 10-level cascade where each
level reads only from the previous level (plus static reference data), creating
a linear dependency chain. The two leaf tables (price_snapshot at 2s and
net_positions at 1s) are the only ones with fixed schedules; all nine downstream
layers use schedule => 'calculated' to propagate changes automatically.
Base tables Layer 1 Layer 2 ... Layer 10
──────────── ──────── ──────── ─────────
┌──────────────┐ ┌──────────────────────┐
│market_prices │────►│ price_snapshot │
│(stream, 2s) │ │ DIFFERENTIAL 2s │
└──────────────┘ └─────────┬────────────┘
│
┌──────────────┐ │ ┌────────────────┐
│ trades │──────────┐ └────►│position_values │ L2 (calculated)
│(stream, 1s) │ │ │ DIFFERENTIAL │
└──────────────┘ │ └─────────┬──────┘
│ │ │
│ ▼ ▼
│ ┌────────────────┐ ┌───────────────┐
│ │net_positions │ │ account_pnl │ L3 (calculated)
│ │DIFFERENTIAL 1s │ │ DIFFERENTIAL │
│ └────────────────┘ └───────┬───────┘
│ │
│ ┌──────────────────────┤
│ │ │
│ ▼ ▼
└─────────────────►┌────────────────────────────────┐
│ portfolio_pnl, sector_exposure │ L4–L5 (calculated)
│ DIFFERENTIAL │
└────────────────┬───────────────┘
│
▼
┌────────────────────────────┐
│ var_contributions │ L6 (calculated)
│ account_var, portfolio_var│ L7–L8 (calculated)
│ DIFFERENTIAL throughout │
└────────────────┬───────────┘
│
▼
┌────────────────────────────┐
│ regulatory_capital │ L9 (calculated)
│ DIFFERENTIAL │
└────────────────┬───────────┘
│
▼
┌────────────────────────────┐
│ breach_dashboard │ L10 (calculated)
│ DIFFERENTIAL, LIMIT 10 │ Fixed cardinality
└────────────────────────────┘
Stream Tables (finance)
| Layer | Name | Mode | Schedule | What it computes |
|---|---|---|---|---|
| L1 | price_snapshot | DIFFERENTIAL | 2s (fixed) | Per-instrument: current bid/ask/mid, sector affinity |
| L1 | net_positions | DIFFERENTIAL | 1s (fixed) | Per-account-instrument: net quantity, position status |
| L2 | position_values | DIFFERENTIAL | calculated | Per-account-instrument: marked-to-market value (qty × mid) |
| L3 | account_pnl | DIFFERENTIAL | calculated | Per-account: total P&L, position count, exposure |
| L4 | portfolio_pnl | DIFFERENTIAL | calculated | Per-portfolio: aggregate P&L from accounts |
| L5 | sector_exposure | DIFFERENTIAL | calculated | Per-sector: total exposure, position count |
| L6 | var_contributions | DIFFERENTIAL | calculated | Per-position: parametric Value-at-Risk (95% & 99%) |
| L7 | account_var | DIFFERENTIAL | calculated | Per-account: aggregated VaR, diversification |
| L8 | portfolio_var | DIFFERENTIAL | calculated | Per-portfolio: total VaR + stressed scenario (99%) |
| L9 | regulatory_capital | DIFFERENTIAL | calculated | Per-portfolio: Basel simplified capital requirement |
| L10 | breach_dashboard | DIFFERENTIAL | calculated | Top 10 portfolios by capital utilization ratio (LIMIT 10) |
Why This DAG Demonstrates Differential Efficiency
The finance scenario is built specifically to show how DIFFERENTIAL mode excels in deep DAGs:
- High cardinality at L1 — The 30 × 50 instrument-account combinations yield up to 1,500 potential positions. A price tick affects ~30 positions (change ratio ≈ 0.02).
- Compressing cardinality downstream — As data aggregates up the layers (L2 → L3 → L4 ...), the output cardinality shrinks. By L5 (sector exposure), there are only 8 rows max. By L10 (top 10 portfolios), there are ≤10 rows.
- Sub-1.0 change ratios throughout — Because each layer reads only from the previous layer and applies aggregation or filtering:
  - L2 (`position_values`): change ratio ≈ 0.02 (only changed positions)
  - L5 (`sector_exposure`): ~1 row changed per tick (typically only 1–2 sectors affected)
  - L10 (`breach_dashboard`): change ratio ≈ 0–0.1 (typically 0–1 row changes per cycle; fixed cardinality means rank shifts are rare)
- All-DIFFERENTIAL cascade — Unlike the fraud scenario (which uses one FULL table), finance uses DIFFERENTIAL throughout. This is valid because each layer is a simple aggregate or filtered view of the previous layer (no complex diamond joins with multiple independent sources).
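As a rule of thumb, the change ratio bounds the achievable speedup: a differential refresh does roughly `changed / total` of a full refresh's row work, so the ideal speedup is its reciprocal. A back-of-the-envelope sketch using the L2 numbers quoted above (illustrative arithmetic only, not measurements):

```python
# Back-of-the-envelope change-ratio math for a differential layer.
# A price tick touches ~30 of the 1,500 potential positions at L2.

def change_ratio(changed_rows, total_rows):
    return changed_rows / total_rows

r_l2 = change_ratio(30, 1500)
print(f"L2 change ratio ≈ {r_l2:.2f}")     # ≈ 0.02, as quoted above

# Ideal-case speedup of differential over full refresh is ~1/ratio
print(f"ideal speedup ≈ {1 / r_l2:.0f}x")  # ≈ 50x, before fixed overhead
```

In practice the observed `diff_speedup` from `pgtrickle.refresh_efficiency()` will be lower than this ideal because each refresh also pays fixed scheduling and MERGE overhead.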
The Dashboard
Open http://localhost:8080. Panels update every 2 seconds.
Key metrics:
- Top KPIs — Total positions tracked, total book exposure (USD), portfolio P&L, aggregate portfolio VaR
- Breach Dashboard — Top 10 portfolios by capital utilization ratio (regulatory capital used / capital limit). Highlights portfolios approaching regulatory constraints.
- Sector Exposure — Per-sector breakdown of position count and net exposure.
- Portfolio Metrics — Per-portfolio P&L and 95% VaR.
- Market Snapshot — Current bid/ask/mid prices for all 30 instruments with sector labels.
- Stream Table Status — Refresh mode, schedule, and population status for all 10 tables.
- DAG Topology — ASCII diagram showing the 10-level cascade and dependency flow.
Exploring the Database (finance)
docker compose exec postgres psql -U demo -d finance_demo
-- See all stream table status
SELECT pgt_name, refresh_mode, schedule, is_populated
FROM pgtrickle.pgt_status()
ORDER BY pgt_id;
-- View the full 10-level dependency graph
SELECT tree_line FROM pgtrickle.dependency_tree()
ORDER BY tree_line;
-- Current market snapshot (L1)
SELECT symbol, sector, bid, ask, mid
FROM price_snapshot
ORDER BY symbol;
-- Net positions per account-instrument (L1)
SELECT account_id, instrument_symbol, net_quantity, position_value
FROM net_positions
WHERE net_quantity != 0
ORDER BY account_id, instrument_symbol;
-- Top-10 portfolio risk ranking (L10)
SELECT portfolio_name, capital_required, capital_limit, utilization_ratio
FROM breach_dashboard
ORDER BY utilization_ratio DESC;
-- Watch differential efficiency in action
-- Change ratios are low at L1 and shrink further as output cardinality compresses toward L10
SELECT pgt_name, avg_change_ratio, avg_diff_ms, diff_speedup
FROM pgtrickle.refresh_efficiency()
ORDER BY pgt_id;
Differential Efficiency at Work
Open a terminal, connect, and repeat the history query every two seconds with psql's `\watch` meta-command:
docker compose exec postgres psql -U demo -d finance_demo
Then, inside psql:
SELECT
  st.pgt_name,
  rh.action,
  round(EXTRACT(EPOCH FROM (rh.end_time - rh.start_time)) * 1000) AS duration_ms,
  rh.start_time
FROM pgtrickle.pgt_refresh_history rh
JOIN pgtrickle.pgt_stream_tables st ON st.pgt_id = rh.pgt_id
WHERE rh.start_time > now() - interval '10 seconds'
ORDER BY rh.start_time DESC
LIMIT 50 \watch 2
5-Minute Quickstart
The shortest possible introduction to pg_trickle. By the end of this
page you will have created a self-maintaining table, watched it update
in real time, and dropped it again — without leaving psql.
Prefer to see it first? Run the playground (`cd playground && docker compose up -d`) for a pre-loaded environment, or pull the prebuilt image:
docker run --rm -e POSTGRES_PASSWORD=secret -p 5432:5432 \
  ghcr.io/trickle-labs/pg_trickle:latest
Then connect with `psql postgres://postgres:secret@localhost:5432/postgres` and skip to Step 2 below.
Step 1 — Install the extension
If you already have a PostgreSQL 18 server with pg_trickle installed (via the playground, the Docker image, or a manual install — see Installation for full options), skip this step.
Otherwise, the shortest path on a developer machine is the prebuilt Docker image — one command, no configuration:
docker run --rm -e POSTGRES_PASSWORD=secret -p 5432:5432 \
ghcr.io/trickle-labs/pg_trickle:latest
Connect with psql:
psql postgres://postgres:secret@localhost:5432/postgres
Step 2 — Enable the extension
CREATE EXTENSION IF NOT EXISTS pg_trickle;
That's all the configuration you need. The extension auto-discovers every database where it's installed and starts a per-database scheduler.
Step 3 — Create a source table
CREATE TABLE orders (
id SERIAL PRIMARY KEY,
region TEXT NOT NULL,
amount NUMERIC NOT NULL
);
INSERT INTO orders (region, amount) VALUES
('US', 100),
('EU', 200),
('US', 300),
('APAC', 50);
This is a perfectly ordinary table. You will write to it the normal way.
Step 4 — Create a stream table
SELECT pgtrickle.create_stream_table(
name => 'revenue_by_region',
query => $$
SELECT region,
SUM(amount) AS total,
COUNT(*) AS order_count
FROM orders
GROUP BY region
$$,
schedule => '1s'
);
What just happened:
- pg_trickle parsed your query and built an internal operator tree.
- It created a new table `revenue_by_region` with the right columns.
- It installed lightweight `AFTER` triggers on `orders` to capture changes.
- It ran an initial full refresh, populating the new table.
- It registered a 1-second refresh schedule.
Query the stream table — it's already populated:
SELECT * FROM revenue_by_region ORDER BY region;
region | total | order_count
--------+-------+-------------
APAC | 50 | 1
EU | 200 | 1
US | 400 | 2
Step 5 — Watch it update
Insert a new order:
INSERT INTO orders (region, amount) VALUES ('US', 999);
Wait one second (or call SELECT pgtrickle.refresh_stream_table('revenue_by_region')
to refresh immediately):
SELECT * FROM revenue_by_region WHERE region = 'US';
region | total | order_count
--------+-------+-------------
US | 1399 | 3
Only the US group was recomputed — the other regions were not
touched at all. That is differential refresh in action.
Step 6 — Look around
A few useful built-ins worth knowing about right away:
-- Status of all stream tables in this database
SELECT * FROM pgtrickle.pgt_status();
-- A one-shot health triage (returns rows only when something is wrong)
SELECT * FROM pgtrickle.health_check() WHERE severity != 'OK';
-- See what delta SQL pg_trickle would run on the next refresh
SELECT pgtrickle.explain_st('revenue_by_region');
Step 7 — Clean up
SELECT pgtrickle.drop_stream_table('revenue_by_region');
DROP TABLE orders;
This removes the stream table, its catalog entries, and the CDC
triggers on orders.
Where to go next
| If you want to… | Read |
|---|---|
| See a multi-table tutorial with chains and aggregates | 15-Minute Tutorial |
| Walk through every feature in depth | In-Depth Tour |
| Browse common patterns and example apps | Use Cases · Patterns |
| Understand how it works underneath | Architecture |
| Look up a function, GUC, or operator | SQL Reference · Configuration |
| Deploy it to production | Pre-Deployment Checklist |
| Decode a piece of jargon | Glossary |
Getting Started with pg_trickle
What is pg_trickle?
pg_trickle adds stream tables to PostgreSQL — tables that are defined by a SQL query and kept automatically up to date as the underlying data changes. Think of them as materialized views that refresh themselves, but smarter: instead of re-running the entire query on every refresh, pg_trickle uses Incremental View Maintenance (IVM) to process only the rows that changed.
Traditional materialized views force a choice: either re-run the full query (expensive) or accept stale data. pg_trickle eliminates this trade-off. When you insert a single row into a million-row table, pg_trickle computes the effect of that one row on the query result — it doesn't touch the other 999,999.
How data flows
The key concept is that data flows downstream automatically — from your base tables through any chain of stream tables, without you writing a single line of orchestration code:
You write to base tables
│
▼
┌─────────────┐ triggers (or WAL) ┌─────────────────────┐
│ Base Tables │ ─────────────────────▶ │ Change Buffers │
│ (you write) │ │ (pgtrickle_changes.*) │
└─────────────┘ └──────────┬──────────┘
│
delta query (ΔQ) on refresh
│
▼
┌──────────────────────────────────────────────────────────────┐
│ Stream Table A ◀── depends on base tables │
└──────────────────────────┬───────────────────────────────────┘
│ change captured, buffer written
▼
┌──────────────────────────────────────────────────────────────┐
│ Stream Table B ◀── depends on Stream Table A │
└──────────────────────────────────────────────────────────────┘
One write to a base table can ripple through an entire DAG of stream tables — each layer refreshed in the correct topological order, each doing only the work proportional to what actually changed.
- You write to your base tables normally — `INSERT`, `UPDATE`, `DELETE`
- Lightweight `AFTER` row-level triggers capture each change into a buffer, atomically in the same transaction. No polling, no logical replication slots required by default.
- On each refresh cycle, pg_trickle derives a delta query (ΔQ) that reads only the buffered changes since the last refresh frontier
- The delta is merged into the stream table — only the affected rows are written
- If other stream tables depend on this one, they are scheduled next (topological order)
- Optionally: once `wal_level = logical` is available and the first refresh succeeds, pg_trickle automatically transitions from triggers to WAL-based CDC (near-zero write-path overhead compared to ~2–15 μs for triggers). The transition is seamless and transparent.
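The dependency-ordered scheduling described above boils down to a topological sort of the stream-table DAG: every table is refreshed only after its upstream inputs are fresh. A minimal illustrative sketch (table names are invented, not pg_trickle internals):

```python
# Sketch: refresh dirty stream tables in dependency order so each table
# always reads already-refreshed inputs. Names are illustrative only.
from graphlib import TopologicalSorter

# stream table -> the tables it reads from (its upstream dependencies)
deps = {
    "stream_a": {"base_orders"},
    "stream_b": {"stream_a"},
    "stream_c": {"stream_a", "base_customers"},
}

# static_order() yields dependencies before dependents; keep only streams
order = [t for t in TopologicalSorter(deps).static_order() if t in deps]
print(order)  # stream_a always precedes stream_b and stream_c
```

A single write to `base_orders` would mark all three dirty, and one scheduler pass refreshes them in exactly this order.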
This tutorial walks through a concrete org-chart example so you can see this flow end to end, including a chain of stream tables that propagates changes automatically.
Prerequisites
- PostgreSQL 18.x with pg_trickle installed (see Installation)
- `shared_preload_libraries = 'pg_trickle'` in `postgresql.conf`
- `max_worker_processes` raised to at least 32 (see Installation); the PostgreSQL default of 8 is often exhausted if you have several databases, causing stream tables to silently stop refreshing
- `psql` or any SQL client
Deploying to production? See the Pre-Deployment Checklist for a complete list of requirements, pooler compatibility, and recommended GUC values.
Playground: The fastest way to experiment is the playground — a Docker Compose environment with sample tables and stream tables pre-loaded. Run `cd playground && docker compose up -d` and you're running.
Quick start with Docker: Pull the pre-built GHCR image — PostgreSQL 18.3 + pg_trickle ready to run, no configuration needed:
docker run --rm -e POSTGRES_PASSWORD=secret -p 5432:5432 ghcr.io/trickle-labs/pg_trickle:latest
All GUC defaults (`wal_level`, `shared_preload_libraries`, scheduler settings) are pre-configured. See Installation for tag details and volume mounting.
Security — RLS bypass: The pg_trickle background worker executes refresh queries with `SET LOCAL row_security = off`. This mirrors the behaviour of PostgreSQL's own `REFRESH MATERIALIZED VIEW`, which also bypasses Row-Level Security. As a result, stream table output is always the full, unfiltered result set regardless of any RLS policies defined on source tables. Applications must not rely on RLS to restrict what data a stream table exposes — use view filters, column masking, or a separate per-role view on top of the stream table instead. See PRE_DEPLOYMENT.md for the full security checklist.
Connect to the database you want to use and enable the extension:
CREATE EXTENSION pg_trickle;
No additional configuration is needed. pg_trickle automatically discovers all databases on the server and starts a scheduler for each one where the extension is installed.
Chapter 1: Hello World — Your First Stream Table
Before diving into multi-table joins and recursive CTEs, start with the simplest possible stream table: a single-source aggregate with no joins.
1.1 Setup
Create one table and enable the extension:
CREATE EXTENSION IF NOT EXISTS pg_trickle;
CREATE TABLE products (
id SERIAL PRIMARY KEY,
category TEXT NOT NULL,
price NUMERIC(10,2) NOT NULL,
in_stock BOOLEAN NOT NULL DEFAULT true
);
INSERT INTO products (category, price) VALUES
('Electronics', 299.99),
('Electronics', 49.99),
('Books', 14.99),
('Books', 24.99),
('Books', 9.99);
1.2 Create the stream table
SELECT pgtrickle.create_stream_table(
name => 'category_summary',
query => $$
SELECT
category,
COUNT(*) AS product_count,
ROUND(AVG(price), 2) AS avg_price,
MIN(price) AS min_price,
MAX(price) AS max_price,
COUNT(*) FILTER (WHERE in_stock) AS in_stock_count
FROM products
GROUP BY category
$$,
schedule => '1s'
);
Query it immediately — it was populated by the initial full refresh:
SELECT category, product_count, avg_price, min_price, max_price, in_stock_count
FROM category_summary ORDER BY category;
category | product_count | avg_price | min_price | max_price | in_stock_count
-------------+---------------+-----------+-----------+-----------+----------------
Books | 3 | 16.66 | 9.99 | 24.99 | 3
Electronics | 2 | 174.99 | 49.99 | 299.99 | 2
(2 rows)
1.3 Watch an INSERT update one group
INSERT INTO products (category, price) VALUES ('Books', 39.99);
Within ~1 second (or call SELECT pgtrickle.refresh_stream_table('category_summary') to force it):
SELECT category, product_count, avg_price, min_price, max_price, in_stock_count
FROM category_summary WHERE category = 'Books';
category | product_count | avg_price | min_price | max_price | in_stock_count
----------+---------------+-----------+-----------+-----------+----------------
Books | 4 | 22.49 | 9.99 | 39.99 | 4
(1 row)
The Electronics row was not touched at all — pg_trickle read exactly 1
row from the change buffer, adjusted only the Books group.
1.4 Watch an UPDATE propagate
UPDATE products SET price = 19.99 WHERE price = 299.99;
After the next refresh:
SELECT category, product_count, avg_price, min_price, max_price, in_stock_count
FROM category_summary WHERE category = 'Electronics';
category | product_count | avg_price | min_price | max_price | in_stock_count
-------------+---------------+-----------+-----------+-----------+----------------
Electronics | 2 | 34.99 | 19.99 | 49.99 | 2
(1 row)
For AVG, pg_trickle maintains running sum and count columns internally, so
re-aggregating a group is O(1) regardless of group size.
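The running-sum/count trick is easy to model outside the database. Below is a minimal Python sketch (not pg_trickle's implementation) of per-group AVG maintenance: each group keeps a `(sum, count)` pair, an UPDATE is treated as a delete of the old row plus an insert of the new one, and AVG falls out in O(1) per touched group:

```python
# Minimal model of incremental AVG maintenance. sign=+1 for an inserted
# row, -1 for a deleted row; an UPDATE is a delete followed by an insert.
from collections import defaultdict

state = defaultdict(lambda: [0.0, 0])  # group -> [running_sum, count]

def apply_delta(group, amount, sign):
    s = state[group]
    s[0] += sign * amount
    s[1] += sign

def avg(group):
    s, n = state[group]
    return s / n if n else None

# Initial Books rows (9.99, 14.99, 24.99), then the INSERT from section 1.3
for price in (9.99, 14.99, 24.99):
    apply_delta("Books", price, +1)
apply_delta("Books", 39.99, +1)
print(round(avg("Books"), 2))          # 22.49 — matches the refreshed row

# The UPDATE from section 1.4: 299.99 -> 19.99 on the Electronics group
apply_delta("Electronics", 299.99, +1)
apply_delta("Electronics", 49.99, +1)
apply_delta("Electronics", 299.99, -1)  # delete old value
apply_delta("Electronics", 19.99, +1)   # insert new value
print(round(avg("Electronics"), 2))     # 34.99
```

Only the touched group's pair is read and written; every other group's state is untouched, which is exactly why group size doesn't matter.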
1.5 What you just saw
- A single function call created the storage table, installed CDC triggers, ran the initial full refresh, and registered a 1-second schedule.
- Every subsequent DML on `products` was captured by an `AFTER` trigger — no polling, no logical replication.
- Each refresh touched only the rows and groups that changed.
- The stream table is a real PostgreSQL table — you can `SELECT`, index, and join against `category_summary` like any other table.
Clean up:
SELECT pgtrickle.drop_stream_table('category_summary');
DROP TABLE products;
Chapter 2: Joins, Aggregates & Chains
What you'll build
An employee org-chart system with three stream tables:
- `department_tree` — a recursive CTE that flattens a department hierarchy into paths like `Company > Engineering > Backend`
- `department_stats` — a join + aggregation over `department_tree` (a stream table!) that computes headcount and salary budget, with the full path included
- `department_report` — a further aggregation that rolls up stats to top-level departments
The chain departments → department_tree → department_stats → department_report demonstrates automatic downstream propagation: modify a department name in the base table and all three stream tables update automatically, in the right order, without any manual orchestration.
By the end you will have:
- Seen how stream tables are created, queried, and refreshed
- Watched a single `UPDATE` in a base table cascade through three layers of stream tables automatically
- Understood the four refresh modes and IVM strategies
Prefer dbt? A runnable dbt companion project mirrors every step below. Clone the repo and run `./examples/dbt_getting_started/scripts/run_example.sh`. See examples/dbt_getting_started/ for full details.
2.1 Create the Base Tables
These are ordinary PostgreSQL tables — pg_trickle doesn't require any special column types, annotations, or schema conventions.
Tables without a primary key work, but pg_trickle will emit a WARNING at stream table creation time: change detection falls back to a content-based hash across all columns, which is slower for wide tables and cannot distinguish between identical duplicate rows. Adding a primary key gives the best performance and most reliable change detection. A primary key is also required for automatic transition to WAL-based CDC (cdc_mode = 'auto'); without one the source table stays on trigger-based CDC.
-- Department hierarchy (self-referencing tree)
CREATE TABLE departments (
id SERIAL PRIMARY KEY,
name TEXT NOT NULL,
parent_id INT REFERENCES departments(id)
);
-- Employees belong to a department
CREATE TABLE employees (
id SERIAL PRIMARY KEY,
name TEXT NOT NULL,
department_id INT NOT NULL REFERENCES departments(id),
salary NUMERIC(10,2) NOT NULL
);
Now insert some data — a three-level department tree and a handful of employees:
-- Top-level
INSERT INTO departments (id, name, parent_id) VALUES
(1, 'Company', NULL);
-- Second level
INSERT INTO departments (id, name, parent_id) VALUES
(2, 'Engineering', 1),
(3, 'Sales', 1),
(4, 'Operations', 1);
-- Third level (under Engineering)
INSERT INTO departments (id, name, parent_id) VALUES
(5, 'Backend', 2),
(6, 'Frontend', 2),
(7, 'Platform', 2);
-- Employees
INSERT INTO employees (name, department_id, salary) VALUES
('Alice', 5, 120000), -- Backend
('Bob', 5, 115000), -- Backend
('Charlie', 6, 110000), -- Frontend
('Diana', 7, 130000), -- Platform
('Eve', 3, 95000), -- Sales
('Frank', 3, 90000), -- Sales
('Grace', 4, 100000); -- Operations
At this point these are plain tables with no triggers, no change tracking, nothing special. The department tree looks like this:
Company (1)
├── Engineering (2)
│ ├── Backend (5) — Alice, Bob
│ ├── Frontend (6) — Charlie
│ └── Platform (7) — Diana
├── Sales (3) — Eve, Frank
└── Operations (4) — Grace
2.2 Create the First Stream Table — Recursive Hierarchy
Our first stream table flattens the department tree. For every department, it computes the full path from the root and the depth level. This uses WITH RECURSIVE — a SQL construct that can't be differentiated with simple algebraic rules (the recursion depends on itself), but pg_trickle handles it using incremental strategies (semi-naive evaluation for inserts, Delete-and-Rederive for mixed changes) that we'll explain later.
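Semi-naive evaluation is easy to see in miniature: each round joins only the rows derived in the previous round (the "delta") against the base relation, never re-deriving rows it already knows. A toy Python sketch of the same path computation (simplified data, not pg_trickle internals):

```python
# Toy semi-naive evaluation of the department-path query: start from the
# roots, then in each round extend only the paths discovered in the
# previous round, until no new rows appear.
edges = {1: None, 2: 1, 3: 1, 5: 2, 6: 2}  # id -> parent_id
names = {1: "Company", 2: "Engineering", 3: "Sales",
         5: "Backend", 6: "Frontend"}

paths = {i: names[i] for i, p in edges.items() if p is None}  # base case
delta = dict(paths)
while delta:
    # recursive step, restricted to children of the previous round's delta
    new = {i: paths[p] + " > " + names[i]
           for i, p in edges.items()
           if p in delta and i not in paths}
    paths.update(new)
    delta = new

print(paths[5])  # Company > Engineering > Backend
```

The full recursive query would re-derive every path on every change; restricting each round to the delta is what makes insert maintenance cheap.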
SELECT pgtrickle.create_stream_table(
name => 'department_tree',
query => $$
WITH RECURSIVE tree AS (
-- Base case: root departments (no parent)
SELECT id, name, parent_id, name AS path, 0 AS depth
FROM departments
WHERE parent_id IS NULL
UNION ALL
-- Recursive step: children join back to the tree
SELECT d.id, d.name, d.parent_id,
tree.path || ' > ' || d.name AS path,
tree.depth + 1
FROM departments d
JOIN tree ON d.parent_id = tree.id
)
SELECT id, name, parent_id, path, depth FROM tree
$$,
schedule => '1s'
);
Note on short schedules: A 1-second schedule is safe for development and production thanks to `auto_backoff` (on by default since v0.10.0). If a refresh takes more than 95% of the schedule window, the scheduler automatically stretches the effective interval (up to 8× the configured schedule) to prevent CPU runaway, then resets to 1× as soon as a refresh completes on time. You will see a `WARNING` message when backoff activates.

v0.2.0+: `create_stream_table` also accepts `diamond_consistency` (`'none'` or `'atomic'`) and `diamond_schedule_policy` (`'fastest'` or `'slowest'`) for diamond-shaped dependency graphs. Schedules can be cron expressions (e.g., `'*/5 * * * *'`, `'@hourly'`). Set `pooler_compatibility_mode => true` if you're connecting through PgBouncer or another transaction-mode connection pooler. See SQL_REFERENCE.md for the full parameter list.
What just happened?
That single function call did a lot of work atomically (all in one transaction):
- Parsed the defining query into an operator tree — identifying the recursive CTE, the scan on `departments`, the join, the union
- Created a storage table called `department_tree` in the `public` schema — a real PostgreSQL heap table with columns matching the SELECT output, plus an internal `__pgt_row_id` column (a hash used to track individual rows)
- Installed CDC triggers on the `departments` table — lightweight `AFTER INSERT OR UPDATE OR DELETE` row-level triggers that will capture every future change
- Created a change buffer table in the `pgtrickle_changes` schema — this is where the triggers write captured changes
- Ran an initial full refresh — executed the recursive query against the current data and populated the storage table
- Registered the stream table in pg_trickle's catalog with a 1-second refresh schedule
TRUNCATE caveat: Row-level triggers do not fire on `TRUNCATE`. If you `TRUNCATE` a base table, the change is not captured incrementally — the stream table will become stale. Use `DELETE FROM table` instead, or call `pgtrickle.refresh_stream_table('department_tree')` after a TRUNCATE. If the stream table uses DIFFERENTIAL mode, temporarily switch to FULL for a full recompute: `pgtrickle.alter_stream_table('department_tree', refresh_mode => 'FULL')`, refresh, then switch back.

Query it immediately — it's already populated:
SELECT id, name, parent_id, path, depth FROM department_tree ORDER BY path;
Expected output:
id | name | parent_id | path | depth
----+-------------+-----------+----------------------------------+-------
1 | Company | | Company | 0
2 | Engineering | 1 | Company > Engineering | 1
5 | Backend | 2 | Company > Engineering > Backend | 2
6 | Frontend | 2 | Company > Engineering > Frontend | 2
7 | Platform | 2 | Company > Engineering > Platform | 2
4 | Operations | 1 | Company > Operations | 1
3 | Sales | 1 | Company > Sales | 1
(7 rows)
This is a real PostgreSQL table — you can create indexes on it, join it in other queries, reference it in views, or even use it as a source for other stream tables. pg_trickle keeps it in sync automatically.
Key insight: The recursive query that computes paths and depths would normally need to be re-run manually (or via
REFRESH MATERIALIZED VIEW). With pg_trickle, it stays fresh — any change to thedepartmentstable is automatically reflected within the schedule bound (1 second here).
2.3 Chain Stream Tables — Build the Downstream Layers
Now create department_stats. The twist: instead of joining directly against departments, it joins against department_tree — the stream table we just created. This creates a chain: changes to departments update department_tree, whose changes then trigger department_stats to update.
This demonstrates how pg_trickle builds a DAG — a directed acyclic graph of stream tables — and automatically schedules refreshes in topological order.
SELECT pgtrickle.create_stream_table(
name => 'department_stats',
query => $$
SELECT
t.id AS department_id,
t.name AS department_name,
t.path AS full_path,
t.depth,
COUNT(e.id) AS headcount,
COALESCE(SUM(e.salary), 0) AS total_salary,
COALESCE(AVG(e.salary), 0) AS avg_salary
FROM department_tree t
LEFT JOIN employees e ON e.department_id = t.id
GROUP BY t.id, t.name, t.path, t.depth
$$,
schedule => 'calculated' -- CALCULATED: inherit schedule from downstream; see explanation below
);
What just happened — and why this one is different?
Like before, pg_trickle parsed the query, created a storage table, and set up CDC. But department_stats depends on department_tree, not a base table — so no new triggers were installed. Instead, pg_trickle registered department_tree as an upstream dependency in the DAG.
The schedule is 'calculated' (CALCULATED mode), which means: "don't give this table its own schedule — inherit the tightest schedule of any downstream table that queries it". Internally this stores NULL in the catalog, but you must pass the string 'calculated' — passing SQL NULL is an error. Since no other stream table has been created yet, it will be refreshed on demand or when a downstream dependent triggers it.
The query has no recursive CTE, so pg_trickle uses algebraic differentiation:
- Decomposed into operators: `Scan(department_tree)` → `LEFT JOIN` → `Scan(employees)` → `Aggregate(GROUP BY + COUNT/SUM/AVG)` → `Project`
- Derived a differentiation rule for each:
  - `Δ(Scan)` = read only change buffer rows (not the full table)
  - `Δ(LEFT JOIN)` = join change rows from one side against the full other side
  - `Δ(Aggregate)` = for COUNT/SUM/AVG, add or subtract per group — no rescan needed
- Composed these into a single delta query (ΔQ) that never touches unchanged rows
When one employee is inserted, the refresh reads one change buffer row, joins to find the department, and adjusts only that group's count and sum.
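The join rule is worth seeing concretely. A toy Python model — not the generated delta SQL — of what happens when one side of a join receives a single change: the delta of the changed side is joined against the full other side, and unchanged rows never get scanned.

```python
# Toy model of a join delta: when only relation A changes,
# Δ(A ⋈ B) = ΔA ⋈ B. Signs (+1 insert, -1 delete) flow through the join.

B = {5: "Backend", 6: "Frontend"}  # dept_id -> name (the full, unchanged side)

def delta_join(delta_a, b):
    """delta_a: list of (sign, (employee, dept_id)). Returns signed rows."""
    return [(sign, emp, b[dept])
            for sign, (emp, dept) in delta_a
            if dept in b]

# One inserted employee yields exactly one signed output row to merge:
out = delta_join([(+1, ("Heidi", 6))], B)
print(out)  # [(1, 'Heidi', 'Frontend')]
```

Downstream, the aggregate rule then applies that single signed row to one group's running count and sum, as described above.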
Query it:
SELECT department_name, full_path, headcount, total_salary
FROM department_stats
ORDER BY full_path;
Expected output:
department_name | full_path | headcount | total_salary
-----------------+----------------------------------+-----------+--------------
Company | Company | 0 | 0
Engineering | Company > Engineering | 0 | 0
Backend | Company > Engineering > Backend | 2 | 235000.00
Frontend | Company > Engineering > Frontend | 1 | 110000.00
Platform | Company > Engineering > Platform | 1 | 130000.00
Operations | Company > Operations | 1 | 100000.00
Sales | Company > Sales | 2 | 185000.00
(7 rows)
Notice that the full_path column comes from department_tree — this data already went through one layer of incremental maintenance before landing here.
Add a third layer: department_report
Now add a rollup that aggregates department_stats by top-level group (depth = 1):
SELECT pgtrickle.create_stream_table(
name => 'department_report',
query => $$
SELECT
split_part(full_path, ' > ', 2) AS division,
SUM(headcount) AS total_headcount,
SUM(total_salary) AS total_payroll
FROM department_stats
WHERE depth >= 1
GROUP BY 1
$$,
schedule => '1s' -- this is the only explicit schedule; CALCULATED tables above inherit it
);
The DAG is now:
departments (base) employees (base)
│ │
▼ │
department_tree ──────────┤
(DIFF, CALCULATED) │
│ ▼
└──────▶ department_stats
(DIFF, CALCULATED)
│
▼
department_report
(DIFF, 1s) ◀── only explicit schedule
department_report drives the whole pipeline. Because it has a 1-second schedule, pg_trickle automatically propagates that cadence upstream: department_stats and department_tree will also be refreshed within 1 second of a base table change, in topological order, with no manual configuration.
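This inheritance rule can be modelled in a few lines: a CALCULATED table's effective interval is the tightest (smallest) fixed schedule among its transitive dependents. An illustrative sketch, not the scheduler's actual code:

```python
# Illustrative model of CALCULATED schedules: a table without its own
# schedule inherits the minimum interval of any table that reads from it.

# downstream edges: table -> stream tables that read from it
dependents = {
    "department_tree": ["department_stats"],
    "department_stats": ["department_report"],
    "department_report": [],
}
fixed = {"department_report": 1.0}  # seconds; the only explicit schedule

def effective_schedule(table):
    if table in fixed:
        return fixed[table]
    return min(effective_schedule(d) for d in dependents[table])

print(effective_schedule("department_tree"))  # 1.0 — inherited from the report
```

Dropping `department_report` (or loosening its schedule) would automatically loosen the whole upstream chain, with no per-table reconfiguration.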
Query the report:
SELECT division, total_headcount, total_payroll FROM department_report ORDER BY division;
division | total_headcount | total_payroll
-------------+-----------------+---------------
Engineering | 4 | 475000.00
Operations | 1 | 100000.00
Sales | 2 | 185000.00
(3 rows)
2.4 Watch a Change Cascade Through All Three Layers
This is the heart of pg_trickle. We'll make four changes to the base tables and watch changes propagate automatically through the three-layer DAG — each layer doing only the minimum work.
The data flow pipeline (three layers)
Your SQL statement
│
▼
CDC trigger fires (same transaction)
Change buffer receives one row
│
▼
Background scheduler fires (within ~1 second)
│
├──▶ [Layer 1] Refresh department_tree
│ delta query reads change buffer
│ MERGE touches only affected rows in department_tree
│ department_tree's own change buffer is updated
│
├──▶ [Layer 2] Refresh department_stats
│ delta query reads department_tree's change buffer
│ MERGE touches only affected department groups
│
└──▶ [Layer 3] Refresh department_report
delta query reads department_stats' change buffer
MERGE touches only affected division rows
All change buffers cleaned up ✓
All three layers run in a single scheduled pass, in topological order.
2.4a: INSERT ripples through all three layers
INSERT INTO employees (name, department_id, salary) VALUES
('Heidi', 6, 105000); -- New Frontend engineer
What happened immediately (in your transaction): The AFTER INSERT trigger on employees fired and wrote one row to pgtrickle_changes.changes_<employees_oid>. The row contains the new values, action type I, and the LSN at the time of insert. Your transaction committed normally — no blocking.
The stream tables don't know about Heidi yet. The change is in the buffer, waiting for the next refresh.
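While the change is still pending, you can see it counted in the per-source buffer sizes using the monitoring function covered later in this guide:

```sql
-- One pending row should appear for the employees source
SELECT * FROM pgtrickle.change_buffer_sizes();
```

The buffer row disappears once the next scheduled refresh consumes it.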
The background scheduler handles this automatically. With a 1-second schedule, department_stats and department_report refresh within about a second. To confirm a refresh has happened, check data_timestamp in the monitoring view:
SELECT name, data_timestamp, staleness FROM pgtrickle.pgt_status();
To force an immediate synchronous refresh, wait a moment first (so the scheduler can finish its current tick), then call refresh_stream_table in topological order. Note that refresh_stream_table only refreshes the named table — it does not cascade upstream:
SELECT pg_sleep(2); -- let the scheduler finish any in-progress tick
SELECT pgtrickle.refresh_stream_table('department_stats');
SELECT pgtrickle.refresh_stream_table('department_report');
What happened across the three layers:
| Layer | What ran | Rows touched |
|---|---|---|
| department_tree | No change — employees is not a source for this ST | 0 |
| department_stats | Delta query: read 1 buffer row, join to Frontend, COUNT+1, SUM+105000 | 1 (Frontend group only) |
| department_report | Delta query: read 1 change from dept_stats, SUM += 1 headcount, += 105000 | 1 (Engineering row only) |
Check the result:
SELECT department_name, headcount, total_salary FROM department_stats
WHERE department_name = 'Frontend';
department_name | headcount | total_salary
-----------------+-----------+--------------
Frontend | 2 | 215000.00
The 6 other groups in department_stats were not touched at all.
Contrast with a standard materialized view:
REFRESH MATERIALIZED VIEW would re-scan all 8 employees, re-join with all 7 departments, re-aggregate, and update all 7 rows. With pg_trickle, the work was proportional to the 1 changed row — across all three layers.
2.4b: A department change cascades through the whole DAG
Now change the departments table — the root of the entire chain:
INSERT INTO departments (id, name, parent_id) VALUES
(8, 'DevOps', 2); -- New team under Engineering
What happened: The CDC trigger on departments fired. The change buffer for departments has one new row. None of the stream tables know about it yet.
The scheduler handles this automatically — all three tables will refresh within a second in the correct dependency order (upstream first). To force it synchronously, wait a moment first, then refresh each table in topological order (refresh_stream_table does not cascade upstream):
SELECT pg_sleep(2);
SELECT pgtrickle.refresh_stream_table('department_tree');
SELECT pgtrickle.refresh_stream_table('department_stats');
SELECT pgtrickle.refresh_stream_table('department_report');
What happened across all three layers:
| Layer | What ran | Rows touched |
|---|---|---|
| department_tree | Semi-naive evaluation: base case finds new dept, recursive term computes its path. Result: 1 new row | 1 inserted |
| department_stats | Delta query reads new row from dept_tree's change buffer; DevOps has 0 employees so delta is minimal | 1 inserted (headcount=0) |
| department_report | Delta on Engineering row: headcount stays the same (DevOps has 0 employees) | 0 effective changes |
How the recursive CTE refresh works — unlike department_stats, recursive CTEs can't be algebraically differentiated (the recursion references itself). pg_trickle uses incremental fixpoint strategies:
- INSERT → semi-naive evaluation: differentiate the base case, propagate the delta through the recursive term, stopping when no new rows are produced. Only new rows inserted.
- DELETE or UPDATE → Delete-and-Rederive (DRed): remove rows derived from deleted facts, re-derive rows that may have alternative derivation paths, handle cascades cleanly.
SELECT id, name, depth, path FROM department_tree WHERE name = 'DevOps';
id | name | depth | path
----+--------+-------+--------------------------------
8 | DevOps | 2 | Company > Engineering > DevOps
(1 row)
The recursive CTE automatically expanded to include the new department at the correct depth and path. One inserted row in departments produced one new row in the stream table.
2.4c: UPDATE — A single rename that cascades everywhere
Rename "Engineering" to "R&D":
UPDATE departments SET name = 'R&D' WHERE id = 2;
What happened in the change buffer: The CDC trigger captured the old row (name='Engineering') and the new row (name='R&D'). Both old and new values are stored so the delta can compute what to remove and what to add.
Wait a moment for the scheduler to propagate the rename through all layers. To force it synchronously, wait then refresh each table in topological order (refresh_stream_table does not cascade upstream):
SELECT pg_sleep(2);
SELECT pgtrickle.refresh_stream_table('department_tree');
SELECT pgtrickle.refresh_stream_table('department_stats');
SELECT pgtrickle.refresh_stream_table('department_report');
What happened across all three layers:
| Layer | Work done | Result |
|---|---|---|
| department_tree | DRed strategy: delete rows derived with old name, re-derive with new name. 5 rows updated (Engineering + 4 sub-teams) | Paths now say Company > R&D > … |
| department_stats | Delta reads 5 changed rows from dept_tree's buffer; updates full_path column for those 5 departments | 5 rows updated |
| department_report | Division name changed: "Engineering" row replaced by "R&D" row | 1 DELETE + 1 INSERT |
Query to verify the cascade:
SELECT name, path FROM department_tree WHERE path LIKE '%R&D%' ORDER BY depth, name;
name | path
----------+--------------------------
R&D | Company > R&D
Backend | Company > R&D > Backend
DevOps | Company > R&D > DevOps
Frontend | Company > R&D > Frontend
Platform | Company > R&D > Platform
(5 rows)
One UPDATE to a department name flowed through all three layers automatically — updating 5 + 5 + 2 rows across the chain.
2.4d: DELETE — Remove an employee
DELETE FROM employees WHERE name = 'Bob';
What happened: The AFTER DELETE trigger on employees fired, writing a change buffer row with action type D and Bob's old values (department_id=5, salary=115000). The delta query will use these old values to compute the correct aggregate adjustment — it knows to subtract 115000 from Backend's salary sum and decrement the count.
Important — refresh before querying: The background scheduler refreshes all three tables within ~1 second, in topological order. To see the result immediately, wait a moment then explicitly refresh in upstream-first order:
SELECT pg_sleep(2);
SELECT pgtrickle.refresh_stream_table('department_stats');
SELECT pgtrickle.refresh_stream_table('department_report');
Why call department_stats first? department_stats sources from both employees and department_tree. Refreshing in topological order ensures each layer processes its upstream changes before computing its own deltas. Even when department_tree has unprocessed changes from step 2.4c and a new employee change arrives simultaneously, pg_trickle's differential engine handles both correctly — using the pre-change left snapshot (L₀) to avoid double-counting.
Then verify the result:
SELECT department_name, headcount, total_salary, avg_salary
FROM department_stats WHERE department_name = 'Backend';
department_name | headcount | total_salary | avg_salary
-----------------+-----------+--------------+---------------------
Backend | 1 | 120000.00 | 120000.000000000000
(1 row)
Headcount dropped from 2 → 1 and the salary aggregates updated. Again, only the Backend group was touched — the other 7 department rows (including the new DevOps row) were untouched.
Chapter 3: Scheduling & Backpressure
Automatic Scheduling — Let the DAG Drive Itself
pg_trickle runs a background scheduler that automatically refreshes stale tables in topological order. In the Section 2.4 examples above, the scheduler handled every change within about a second. You can also call refresh_stream_table() directly when needed (e.g. in scripts or tests), but in normal operation the scheduler takes care of everything.
How schedules propagate
We gave department_report a '1s' schedule and the two upstream tables a NULL schedule (CALCULATED mode). This is the recommended pattern:
department_tree (CALCULATED → inherits 1s from downstream)
│
department_stats (CALCULATED → inherits 1s from downstream)
│
department_report (1s — the only explicit schedule)
CALCULATED mode (pass schedule => 'calculated') means: compute the tightest schedule across all downstream dependents. You declare freshness requirements at the tables your application queries — the system figures out how often each upstream table needs to refresh.
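For example, to put an existing upstream table into CALCULATED mode explicitly, a call like this should work (a sketch: alter_stream_table with a schedule parameter is shown later in this guide, and we assume it accepts 'calculated' the same way create_stream_table does):

```sql
-- Let downstream freshness requirements dictate this table's cadence
SELECT pgtrickle.alter_stream_table('department_tree', schedule => 'calculated');
```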
What the scheduler does every second
- Queries the catalog for stream tables past their freshness bound
- Sorts them topologically (upstream first) — department_tree refreshes before department_stats, which refreshes before department_report
- Runs each refresh (respecting pg_trickle.max_concurrent_refreshes)
- Updates the last-refresh frontier
Monitoring
-- Current status of all stream tables
SELECT name, status, refresh_mode, schedule, data_timestamp, staleness
FROM pgtrickle.pgt_status();
name | status | refresh_mode | schedule | data_timestamp | staleness
-----------------------------+--------+---------------+----------+-----------------------------+-----------------
public.department_tree | ACTIVE | DIFFERENTIAL | | 2026-02-26 10:30:00.123+01 | 00:00:00.877
public.department_stats | ACTIVE | DIFFERENTIAL | | 2026-02-26 10:30:00.456+01 | 00:00:00.544
public.department_report | ACTIVE | DIFFERENTIAL | 1s | 2026-02-26 10:30:00.789+01 | 00:00:00.211
-- Detailed performance stats
SELECT pgt_name, total_refreshes, avg_duration_ms, successful_refreshes
FROM pgtrickle.pg_stat_stream_tables;
-- Health check: quick triage of common issues
SELECT check_name, severity, detail FROM pgtrickle.health_check();
-- Visualize the dependency DAG
SELECT * FROM pgtrickle.dependency_tree();
-- Recent refresh timeline across all stream tables
SELECT * FROM pgtrickle.refresh_timeline(10);
-- Check CDC change buffer sizes (spotting buffer build-up)
SELECT * FROM pgtrickle.change_buffer_sizes();
See SQL_REFERENCE.md for the full list of monitoring functions including list_sources(), trigger_inventory(), and diamond_groups().
Chapter 4: Monitoring In Depth
All the monitoring capabilities from the monitoring quick reference above, expanded. For the five most important day-to-day introspection queries see the Monitoring Quick Reference at the end of this guide.
Optional: WAL-based CDC
By default pg_trickle uses triggers. If wal_level = logical is configured, set:
ALTER SYSTEM SET pg_trickle.cdc_mode = 'auto';
SELECT pg_reload_conf();
pg_trickle will automatically transition each stream table from trigger-based to WAL-based capture after the first successful refresh — reducing per-write overhead from ~2–15 μs (triggers) to near-zero (WAL-based capture adds no synchronous overhead to your DML). The transition is transparent; your queries and the refresh schedule are unaffected.
Optional: Parallel Refresh (v0.4.0+)
By default the scheduler refreshes stream tables sequentially in topological order within a single background worker. This is correct and efficient for most workloads.
For deployments with many independent stream tables, enable parallel refresh:
ALTER SYSTEM SET pg_trickle.parallel_refresh_mode = 'on';
ALTER SYSTEM SET pg_trickle.max_dynamic_refresh_workers = 4; -- cluster-wide cap
SELECT pg_reload_conf();
Independent stream tables at the same DAG level will then refresh concurrently in separate dynamic background workers. Refresh pairs with IMMEDIATE-trigger connections and atomic consistency groups still run in a single worker for correctness.
Before enabling, ensure max_worker_processes has enough room:
max_worker_processes >= 1 (launcher)
+ number of databases with stream tables
+ max_dynamic_refresh_workers (default 4)
+ autovacuum and other extension workers
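As an illustrative calculation: with one database hosting stream tables and the default cap of 4 dynamic workers, the pg_trickle budget is 1 + 1 + 4 = 6 workers, so a setting like the one below leaves headroom for autovacuum and other extensions (the value 16 is an example, not a recommendation):

```sql
-- 1 launcher + 1 per-database worker + 4 dynamic refresh workers = 6 for pg_trickle
ALTER SYSTEM SET max_worker_processes = 16;  -- requires a server restart
```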
Monitor parallel refresh:
SELECT * FROM pgtrickle.worker_pool_status(); -- live worker budget
SELECT * FROM pgtrickle.parallel_job_status(60); -- recent jobs
See CONFIGURATION.md — Parallel Refresh for the complete tuning reference.
Optional: PgBouncer / Connection Pooler Compatibility (v0.10.0+)
If you're connecting through PgBouncer or another connection pooler in transaction mode (the default on Supabase, Railway, Neon, and most managed PostgreSQL platforms), set pooler_compatibility_mode when creating or altering a stream table:
SELECT pgtrickle.create_stream_table(
name => 'live_headcount',
query => 'SELECT department_id, COUNT(*) FROM employees GROUP BY 1',
schedule => '1s',
pooler_compatibility_mode => true
);
This disables prepared statements and NOTIFY emissions for that table — the two features that break in transaction-pool mode. Leave it off (the default) if you connect directly to PostgreSQL.
Optional: Change Buffer Compaction (v0.10.0+)
For high-churn tables, pg_trickle automatically compacts the pending change buffer before each refresh cycle when it exceeds pg_trickle.compact_threshold (default 100,000 rows). INSERT→DELETE pairs that cancel each other out are eliminated, and multiple changes to the same row are collapsed to a single net change, reducing delta scan overhead by 50–90%.
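Conceptually, compaction reduces the buffer to one net change per key. Here is a hand-written sketch of the idea in plain SQL — changes_orders, pk, action, and seq are hypothetical names, not pg_trickle internals, and the real implementation also merges intermediate UPDATE values:

```sql
-- Keep only the latest change per key, then drop keys whose net effect is a
-- no-op: a row INSERTed and later DELETEd within the same window cancels out.
WITH latest AS (
  SELECT DISTINCT ON (pk) pk, action
  FROM changes_orders
  ORDER BY pk, seq DESC          -- last change wins
),
first_seen AS (
  SELECT DISTINCT ON (pk) pk, action AS first_action
  FROM changes_orders
  ORDER BY pk, seq ASC           -- first change per key
)
SELECT l.pk, l.action
FROM latest l
JOIN first_seen f USING (pk)
WHERE NOT (f.first_action = 'I' AND l.action = 'D');  -- I→…→D pair eliminated
```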
Chapter 5: Advanced Topics
Refresh Modes and IVM Strategies
You've now seen the IVM strategies pg_trickle uses for incremental view maintenance. Understanding the four refresh modes and when each strategy applies helps you write efficient stream table queries.
The Four Refresh Modes
| Mode | When it refreshes | Use case |
|---|---|---|
| AUTO (default) | On a schedule (background) | Most use cases — uses DIFFERENTIAL when possible, falls back to FULL automatically |
| DIFFERENTIAL | On a schedule (background) | Like AUTO but errors if the query can't be differentiated |
| FULL | On a schedule (background) | Forces full recompute every cycle |
| IMMEDIATE | Synchronously, in the same transaction as the DML | Real-time dashboards, audit tables — the stream table is always up-to-date |
When you omit refresh_mode, the default is 'AUTO' — it uses differential (delta-only) maintenance when the query supports it, and automatically falls back to full recomputation when it doesn't. You only need to specify a mode explicitly for advanced cases.
IMMEDIATE mode (new in v0.2.0) maintains stream tables synchronously within the same transaction as the base table DML. It uses statement-level AFTER triggers with transition tables — no change buffers, no scheduler. The stream table is always consistent with the current transaction.
-- Create a stream table that updates in real-time
SELECT pgtrickle.create_stream_table(
name => 'live_headcount',
query => $$
SELECT department_id, COUNT(*) AS headcount
FROM employees
GROUP BY department_id
$$,
refresh_mode => 'IMMEDIATE'
);
-- After any INSERT/UPDATE/DELETE on employees,
-- live_headcount is already up-to-date — no refresh needed!
IMMEDIATE mode supports joins, aggregates, window functions, LATERAL subqueries, and cascading IMMEDIATE stream tables. Recursive CTEs are not supported in IMMEDIATE mode (use DIFFERENTIAL instead).
You can switch between modes at any time:
-- Switch from DIFFERENTIAL to IMMEDIATE
SELECT pgtrickle.alter_stream_table('department_stats', refresh_mode => 'IMMEDIATE');
-- Switch back to DIFFERENTIAL with a schedule
SELECT pgtrickle.alter_stream_table('department_stats', refresh_mode => 'DIFFERENTIAL', schedule => '1s');
Algebraic Differentiation (used by department_stats)
For queries composed of scans, filters, joins, and algebraic aggregates (COUNT, SUM, AVG), pg_trickle can derive the IVM delta mathematically. The rules come from the theory of DBSP (Database Stream Processing):
| Operator | Delta Rule | Cost |
|---|---|---|
| Scan | Read only change buffer rows (not the full table) | O(changes) |
| Filter (WHERE) | Apply predicate to change rows | O(changes) |
| Join | Join change rows from one side against the full other side | O(changes × lookup) |
| Aggregate (COUNT/SUM/AVG) | Add or subtract deltas per group — no rescan | O(affected groups) |
| Project | Pass through | O(changes) |
The total cost is proportional to the number of changes, not the table size. For a million-row table with 10 changes, the delta query touches ~10 rows.
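To make the aggregate rule concrete, here is a hand-written sketch of a delta MERGE for a simple GROUP BY department_id with COUNT and SUM — dept_totals and changes_employees are illustrative names with a simplified schema, not the literal SQL pg_trickle generates:

```sql
MERGE INTO dept_totals st
USING (
  -- Signed deltas from the change buffer: +1 per insert, -1 per delete.
  -- (An UPDATE typically appears as a delete of the old row plus an
  -- insert of the new one, so it is covered by the same two cases.)
  SELECT department_id,
         SUM(CASE action WHEN 'I' THEN 1      ELSE -1      END) AS d_count,
         SUM(CASE action WHEN 'I' THEN salary ELSE -salary END) AS d_salary
  FROM changes_employees          -- only changed rows, never the full table
  GROUP BY department_id
) d ON st.department_id = d.department_id
WHEN MATCHED THEN
  UPDATE SET headcount    = st.headcount    + d.d_count,
             total_salary = st.total_salary + d.d_salary
WHEN NOT MATCHED THEN
  INSERT (department_id, headcount, total_salary)
  VALUES (d.department_id, d.d_count, d.d_salary);
```

AVG is maintained the same way, as a running SUM divided by a running COUNT, so no group ever needs to be rescanned.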
Incremental Strategies for Recursive CTEs (used by department_tree)
For recursive CTEs, pg_trickle can't derive an algebraic delta because the recursion references itself. Instead it uses two complementary strategies, chosen automatically based on what changed:
Semi-naive evaluation (for INSERT-only changes):
- Differentiate the base case — find the new seed rows
- Propagate the delta through the recursive term, iterating until no new rows are produced
- The result is only the new rows created by the change — not the whole tree
Delete-and-Rederive (DRed) (for DELETE or UPDATE):
- Remove all rows derived from the old fact
- Re-derive rows that had the old fact as one of their derivation paths (they may still be reachable via other paths)
- Insert the newly derived rows under the new fact
Both strategies are more efficient than full recomputation — they work on the affected portion of the result set, not the entire recursive query. The MERGE only modifies rows that actually changed.
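A hand-written sketch of the semi-naive idea for the department_tree case: seed the recursion from the newly inserted departments instead of the tree roots. new_departments stands in for the change buffer, a new root is not handled, and the generated SQL differs in detail:

```sql
WITH RECURSIVE delta AS (
  -- Delta base case: each new department, attached to its parent's
  -- already-materialized path in the existing stream table.
  SELECT n.id, n.name, t.depth + 1 AS depth,
         t.path || ' > ' || n.name AS path
  FROM new_departments n
  JOIN department_tree t ON t.id = n.parent_id
  UNION ALL
  -- Recursive term: descendants among the new rows themselves
  -- (relevant when a whole subtree is inserted in one batch).
  SELECT c.id, c.name, d.depth + 1, d.path || ' > ' || c.name
  FROM new_departments c
  JOIN delta d ON c.parent_id = d.id
)
INSERT INTO department_tree (id, name, depth, path)
SELECT id, name, depth, path FROM delta;
```

Iteration stops as soon as the recursive term produces no new rows, so the work is bounded by the size of the inserted subtree, not the whole hierarchy.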
When to use which strategy?
You don't choose — pg_trickle detects the strategy automatically based on the query structure:
| Query Pattern | Strategy | Performance |
|---|---|---|
| Scan + Filter + Join + algebraic Aggregate (COUNT/SUM/AVG) | Algebraic | Excellent — O(changes) |
| CORR, COVAR_POP/SAMP, REGR_* (12 functions) | Algebraic (Welford running totals) | O(changes) — running totals updated per changed row, no group rescan (v0.10.0+) |
| Non-recursive CTEs | Algebraic (inlined) | CTE body is differentiated inline |
| MIN / MAX aggregates | Semi-algebraic | Uses LEAST/GREATEST merge; per-group rescan only when an extremum is deleted |
| STRING_AGG, ARRAY_AGG, ordered-set aggregates | Group-rescan | Affected groups fully re-aggregated from source |
| GROUPING SETS / CUBE / ROLLUP | Algebraic (rewritten) | Auto-expanded to UNION ALL of GROUP BY queries; CUBE capped at 64 branches |
| Recursive CTEs (WITH RECURSIVE), INSERT | Semi-naive evaluation | O(new rows derived from the change) |
| Recursive CTEs (WITH RECURSIVE), DELETE/UPDATE | Delete-and-Rederive | Re-derives rows with alternative paths; O(affected subgraph) (v0.10.0+) |
| LATERAL subqueries | Correlated re-evaluation | Only outer rows correlated with changed inner data re-evaluated — O(correlated rows) (v0.10.0+) |
| Window functions | Partition recompute | Only affected partitions recomputed |
| ORDER BY … LIMIT N (TopK) | Scoped recomputation | Re-evaluates top-N via MERGE; stores exactly N rows |
| IMMEDIATE mode queries | In-transaction delta | Same algebraic strategies, applied synchronously via transition tables |
FUSE Circuit Breaker (v0.11.0+)
The fuse is a circuit breaker that stops a stream table from processing an unexpectedly large batch of changes — for example from a runaway script or mass-delete migration — without operator review.
-- Arm a fuse: blow when pending changes exceed 50 000 rows
SELECT pgtrickle.alter_stream_table(
'category_summary',
fuse => 'on',
fuse_ceiling => 50000
);
-- Check fuse status across all stream tables
SELECT name, fuse_mode, fuse_state, fuse_ceiling, blown_at
FROM pgtrickle.fuse_status();
-- After investigating and deciding to apply the batch:
SELECT pgtrickle.reset_fuse('category_summary', action => 'apply');
-- Or skip the oversized batch entirely and resume from current state:
SELECT pgtrickle.reset_fuse('category_summary', action => 'skip_changes');
reset_fuse supports three actions:
- 'apply' — process all pending changes and resume normal scheduling.
- 'reinitialize' — drop and repopulate the stream table from scratch.
- 'skip_changes' — discard pending changes and resume from the current frontier.
A pgtrickle_alert NOTIFY is emitted when the fuse blows, making it easy to
hook into alerting pipelines or LISTEN from application code.
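From psql or application code, subscribing is a one-liner (the channel name pgtrickle_alert comes from the text above; the payload format is not documented here):

```sql
LISTEN pgtrickle_alert;
-- Any blown fuse now raises an asynchronous notification on this session.
```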
Partitioned Stream Tables (v0.11.0+)
For large stream tables, declare a partition key at creation time so MERGE operations are scoped to only the relevant partitions:
SELECT pgtrickle.create_stream_table(
name => 'sales_by_month',
query => $$
SELECT
DATE_TRUNC('month', sale_date) AS month,
product_id,
SUM(amount) AS total_sales
FROM sales
GROUP BY 1, 2
$$,
schedule => '1m',
partition_by => 'month' -- partition key must be in the SELECT output
);
pg_trickle creates the storage table as PARTITION BY RANGE (month) with a
catch-all partition, then on each refresh:
- Inspects the delta to find the MIN and MAX of the partition key.
- Injects AND st.month BETWEEN min AND max into the MERGE ON clause.
- PostgreSQL prunes all partitions outside the range — giving ~100× I/O reduction for a 0.1% change rate on a 10M-row table.
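For a delta whose sale dates all fall in February 2026, the injected predicate would look roughly like this (an illustration of the pruning idea, not the literal generated SQL; delta_rows is a placeholder for the computed delta):

```sql
MERGE INTO sales_by_month st
USING delta_rows d
  ON  st.month = d.month AND st.product_id = d.product_id
  AND st.month BETWEEN DATE '2026-02-01' AND DATE '2026-02-01'  -- injected range
WHEN MATCHED THEN
  UPDATE SET total_sales = st.total_sales + d.d_sales
WHEN NOT MATCHED THEN
  INSERT (month, product_id, total_sales)
  VALUES (d.month, d.product_id, d.d_sales);
```

Because the BETWEEN predicate references the partition key, the planner prunes every monthly partition outside the range before the MERGE touches any data.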
See SQL_REFERENCE.md for full partitioning options.
IMMEDIATE Mode — Real-Time In-Transaction IVM
-- Create a stream table that updates in the same transaction as its source
SELECT pgtrickle.create_stream_table(
name => 'live_headcount',
query => $$
SELECT department_id, COUNT(*) AS headcount
FROM employees
GROUP BY department_id
$$,
refresh_mode => 'IMMEDIATE'
);
-- After any INSERT/UPDATE/DELETE on employees, live_headcount is already up-to-date:
INSERT INTO employees (name, department_id, salary) VALUES ('Zara', 2, 95000);
SELECT * FROM live_headcount WHERE department_id = 2; -- headcount already includes Zara
IMMEDIATE mode uses statement-level AFTER triggers with transition tables —
no change buffers, no scheduler, no background workers. The stream table is
always consistent with the current transaction. Ideal for audit tables,
real-time dashboards, and applications that need zero-latency reads.
Multi-Tenant Worker Quotas (v0.11.0+)
In deployments with multiple databases, one busy database can starve others
if all dynamic refresh workers are claimed. The per_database_worker_quota
GUC prevents this:
-- Limit one performance-critical database to 4 workers (with burst to 6)
ALTER DATABASE analytics SET pg_trickle.per_database_worker_quota = 4;
-- Allow a reporting database only 2 base workers
ALTER DATABASE reporting SET pg_trickle.per_database_worker_quota = 2;
-- Apply changes
SELECT pg_reload_conf();
When the cluster has spare capacity (active workers < 80% of
max_dynamic_refresh_workers), a database may temporarily burst to 150% of
its quota. Burst is reclaimed within 1 scheduler cycle once load rises.
Within each dispatch tick, IMMEDIATE-trigger closures are always dispatched
first, followed by atomic groups, singletons, and cyclic SCCs.
See CONFIGURATION.md for full quota tuning options.
Clean Up
When you're done experimenting, drop the stream tables. Drop dependents before their sources:
SELECT pgtrickle.drop_stream_table('department_report');
SELECT pgtrickle.drop_stream_table('department_stats');
SELECT pgtrickle.drop_stream_table('department_tree');
DROP TABLE employees;
DROP TABLE departments;
drop_stream_table atomically removes in a single transaction:
- The storage table (e.g., public.department_stats)
- CDC triggers on source tables (removed only if no other stream table references the same source)
- Change buffer tables in pgtrickle_changes
- Catalog entries in pgtrickle.pgt_stream_tables
Monitoring Quick Reference
pg_trickle ships several built-in monitoring functions and a ready-made Prometheus/Grafana stack. Here are the five most useful functions for day-to-day operations.
Stream Table Status
-- Overview of all stream tables: status, staleness, last refresh time, errors
SELECT name, status, staleness, last_refresh_at, last_error
FROM pgtrickle.pgt_status();
Health Check
-- Run all built-in health checks; returns severity (OK/WARNING/CRITICAL) per check
SELECT check_name, severity, detail FROM pgtrickle.health_check();
Change Buffer Sizes
-- Show CDC buffer row counts per source table — useful for spotting backlogs
SELECT * FROM pgtrickle.change_buffer_sizes();
Dependency Tree
-- Visualize the DAG: which stream tables depend on what
SELECT * FROM pgtrickle.dependency_tree();
Fuse Status
-- Check circuit breaker state for all stream tables (v0.11.0+)
SELECT * FROM pgtrickle.fuse_status();
Prometheus & Grafana
For production monitoring, pg_trickle ships a ready-made observability stack
in the monitoring/ directory:
cd monitoring && docker compose up
This starts PostgreSQL + postgres_exporter + Prometheus + Grafana with
pre-configured dashboards and alerting rules. Grafana is available at
http://localhost:3000 (admin/admin). See the monitoring README
for the full list of exported metrics and alert conditions.
Key Prometheus metrics:
| Metric | Description |
|---|---|
| pgtrickle_refresh_total | Cumulative refresh count per table |
| pgtrickle_refresh_duration_seconds | Last refresh duration per table |
| pgtrickle_staleness_seconds | Seconds since last successful refresh |
| pgtrickle_consecutive_errors | Current error streak per table |
| pgtrickle_cdc_buffer_rows | Pending change buffer rows per source table |
Pre-configured alerts: staleness > 5 min, ≥3 consecutive failures, table SUSPENDED, CDC buffer > 1 GB, scheduler down, high refresh duration.
Summary: What You Learned
| Concept | What you saw |
|---|---|
| Stream tables | Tables defined by a SQL query that stay automatically up to date |
| CDC triggers | Lightweight change capture in the same transaction — no logical replication or polling required |
| DAG scheduling | Stream tables can depend on other stream tables; refreshes run in topological order, schedules propagate upstream via CALCULATED mode |
| Algebraic IVM | Delta queries that process only changed rows — O(changes) regardless of table size |
| Semi-naive / DRed | Incremental strategies for WITH RECURSIVE — INSERT uses semi-naive, DELETE/UPDATE uses Delete-and-Rederive (v0.10.0+) |
| IMMEDIATE mode | Synchronous in-transaction IVM — stream tables updated within the same transaction as your DML, always consistent |
| TopK | ORDER BY … LIMIT N queries store exactly N rows, refreshed via scoped recomputation |
| Diamond consistency | Atomic refresh groups for diamond-shaped dependency graphs via diamond_consistency = 'atomic' |
| Downstream propagation | A single base table write cascades through an entire chain of stream tables, automatically, in the right order |
| Trigger-based CDC | Lightweight row-level triggers by default (no WAL configuration needed); optional transition to WAL-based capture via pg_trickle.cdc_mode = 'auto' |
| Parallel refresh | Independent stream tables refresh concurrently in dynamic background workers via pg_trickle.parallel_refresh_mode = 'on' (v0.4.0+, default off) |
| auto_backoff | Scheduler automatically stretches effective interval when refresh cost exceeds 95% of the schedule window, capped at 8× (on by default, v0.10.0+) |
| PgBouncer compatibility | Set pooler_compatibility_mode => true per stream table to work behind transaction-mode connection poolers (v0.10.0+) |
| Monitoring | pgt_status(), health_check(), dependency_tree(), pg_stat_stream_tables, and more for freshness, timing, and error history |
The key takeaway: you write to base tables — pg_trickle does the rest. Data flows downstream automatically, each layer doing the minimum work proportional to what changed, in dependency order.
Troubleshooting
Stream table is stale / not refreshing
Check the status view first:
SELECT name, status, last_error, last_refresh_at, staleness FROM pgtrickle.pgt_status();
A status of ERROR means the last refresh failed. last_error contains the message. Fix the underlying issue (e.g., a dropped column referenced in the query) then call:
SELECT pgtrickle.refresh_stream_table('your_table');
For a broader health check:
SELECT check_name, severity, detail FROM pgtrickle.health_check();
Change buffer growing large
If a stream table has status = 'PAUSED' or refreshes are falling behind:
SELECT * FROM pgtrickle.change_buffer_sizes(); -- find large buffers
Large buffers are normal under heavy load — auto_backoff slows the schedule to avoid CPU runaway and will self-correct once throughput stabilizes. If a buffer stays large indefinitely, check last_error in pgt_status() for a blocked refresh.
CDC triggers missing after restore / point-in-time recovery
PITR restores the heap table but not the triggers if the extension was installed after the base backup. Verify:
SELECT * FROM pgtrickle.trigger_inventory(); -- expected vs installed triggers
Any missing trigger can be reinstalled with:
SELECT pgtrickle.repair_stream_table('your_table');
Deployment Best Practices
Once you've built your stream tables interactively, you'll want to deploy them reliably — via SQL migration scripts, dbt, or GitOps pipelines.
Kubernetes Deployment (CloudNativePG)
pg_trickle integrates natively with CloudNativePG
using Image Volume Extensions (Kubernetes 1.33+). The extension is packaged
as a scratch-based OCI image containing only the .so, .control, and .sql
files — no custom PostgreSQL image required.
Prerequisites
- Kubernetes 1.33+ with the ImageVolume feature gate enabled
- CloudNativePG operator 1.28+
- pg_trickle extension image pushed to your cluster registry
Quick Start
- Deploy the Cluster with the extension mounted as an Image Volume:
# cnpg/cluster-example.yaml (abridged)
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
name: pg-trickle-demo
spec:
instances: 3
imageName: ghcr.io/cloudnative-pg/postgresql:18
postgresql:
shared_preload_libraries:
- pg_trickle
extensions:
- name: pg-trickle
image:
reference: ghcr.io/<owner>/pg_trickle-ext:<version>
parameters:
max_worker_processes: "8"
- Create the extension declaratively with a CNPG Database resource:
# cnpg/database-example.yaml
apiVersion: postgresql.cnpg.io/v1
kind: Database
metadata:
name: pg-trickle-app
spec:
name: app
owner: app
cluster:
name: pg-trickle-demo
extensions:
- name: pg_trickle
- Apply both resources:
kubectl apply -f cnpg/cluster-example.yaml
kubectl apply -f cnpg/database-example.yaml
Full example manifests are in the cnpg/ directory.
Health Monitoring
CNPG manages PostgreSQL liveness/readiness probes via its instance manager. For pg_trickle-specific health, use the built-in health check function:
-- Run against the primary or any replica:
SELECT * FROM pgtrickle.health_check();
This returns rows for scheduler status, error/suspended tables, stale tables, CDC buffer growth, WAL slot lag, and worker pool utilization. Integrate it into your monitoring stack:
- Prometheus: Use the CNPG monitoring integration to expose pgtrickle.health_check() results as custom metrics
- Kubernetes CronJob: Schedule periodic health checks and alert via your existing alerting pipeline
Probe Configuration
The example manifests include probe settings tuned for pg_trickle workloads:
probes:
startup:
periodSeconds: 10
failureThreshold: 60 # 10 min for shared_preload_libraries init
liveness:
periodSeconds: 10
failureThreshold: 6 # 60s before restart
readiness:
type: streaming
maximumLag: 64Mi # replicas must be streaming before serving reads
Why readiness: streaming? Stream tables are readable on replicas, but
a lagging replica serves stale stream table data. The maximumLag setting
ensures replicas are caught up before receiving traffic.
Failover Behavior
When the primary pod fails and CNPG promotes a replica:
- Scheduler: The new primary starts the pg_trickle scheduler background worker automatically (registered via shared_preload_libraries).
- Stream tables: All stream table definitions are stored in the pgtrickle.pgt_stream_tables catalog table, which is replicated to all replicas. The promoted replica has the complete catalog.
- CDC triggers: Trigger definitions are replicated as part of the WAL stream. The new primary's triggers fire normally on new writes.
- Change buffers: Uncommitted change buffer rows from in-flight transactions on the old primary are lost (standard PostgreSQL behavior). The next refresh cycle detects the gap and performs a FULL refresh to resynchronize.
- Refresh frontiers: Each stream table's last-refresh frontier is stored in the catalog. If the frontier is ahead of the available change buffer data (due to WAL replay lag), the scheduler falls back to FULL refresh once and then resumes DIFFERENTIAL.
No manual intervention is required after failover.
Idempotent SQL Migrations
Use create_or_replace_stream_table() in your migration scripts. It's safe to
run on every deploy:
-- migrations/V003__stream_tables.sql
-- Creates if absent, updates if definition changed, no-op if identical.
SELECT pgtrickle.create_or_replace_stream_table(
name => 'employee_salaries',
query => 'SELECT e.id, e.name, d.name AS department, e.salary
FROM employees e JOIN departments d ON e.department_id = d.id',
schedule => '30s',
refresh_mode => 'DIFFERENTIAL'
);
SELECT pgtrickle.create_or_replace_stream_table(
name => 'department_stats',
query => 'SELECT department, COUNT(*) AS headcount, AVG(salary) AS avg_salary
FROM employee_salaries GROUP BY department',
schedule => '30s',
refresh_mode => 'DIFFERENTIAL'
);
If someone changes the query in a later migration, create_or_replace detects
the difference and migrates the storage table in place — no need to drop and
recreate.
dbt Integration
With the dbt-pgtrickle
package, stream tables are just dbt models with materialized='stream_table':
-- models/department_stats.sql
{{ config(
materialized='stream_table',
schedule='30s',
refresh_mode='DIFFERENTIAL'
) }}
SELECT department, COUNT(*) AS headcount, AVG(salary) AS avg_salary
FROM {{ ref('employee_salaries') }}
GROUP BY department
Every dbt run calls create_or_replace_stream_table() under the hood,
so deployments are always idempotent.
Day 2 Operations
Added in v0.20.0 (UX-4).
Once your stream tables are running in production, pg_trickle can monitor itself using its own stream tables — a technique called self-monitoring.
Enabling Self-Monitoring
-- Create all five monitoring stream tables (idempotent, safe to repeat).
SELECT pgtrickle.setup_self_monitoring();
-- Check what was created.
SELECT * FROM pgtrickle.self_monitoring_status();
This creates five stream tables in the pgtrickle schema:
| Stream Table | Purpose |
|---|---|
| df_efficiency_rolling | Rolling-window refresh statistics (replaces manual refresh_efficiency() calls) |
| df_anomaly_signals | Detects duration spikes, error bursts, mode oscillation |
| df_threshold_advice | Recommends threshold adjustments based on multi-cycle analysis |
| df_cdc_buffer_trends | Tracks CDC buffer growth rates per source table |
| df_scheduling_interference | Detects concurrent refresh overlap patterns |
Checking Recommendations
After at least 10–20 refresh cycles have accumulated:
-- Which stream tables have poorly calibrated thresholds?
SELECT pgt_name, current_threshold, recommended_threshold, confidence, reason
FROM pgtrickle.df_threshold_advice
WHERE confidence IN ('HIGH', 'MEDIUM')
AND abs(recommended_threshold - current_threshold) > 0.05;
-- Are any stream tables experiencing anomalies?
SELECT pgt_name, duration_anomaly, recent_failures
FROM pgtrickle.df_anomaly_signals
WHERE duration_anomaly IS NOT NULL OR recent_failures >= 2;
Automatic Threshold Tuning
To let pg_trickle automatically apply threshold recommendations:
SET pg_trickle.self_monitoring_auto_apply = 'threshold_only';
This applies changes only when confidence is HIGH and the recommended threshold
differs by more than 5%. Changes are rate-limited to once per 10 minutes per
stream table and logged with initiated_by = 'SELF_MONITOR'.
Visualizing the DAG
-- See the full refresh graph (Mermaid format, paste into any Mermaid renderer).
SELECT pgtrickle.explain_dag();
Self-monitoring (dog-fooding) STs appear in green, user STs in blue, suspended STs in red.
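The returned text is a standard Mermaid flowchart. The exact node set depends on your stream tables; a hypothetical sketch of the shape of the output (table names invented for illustration):

```mermaid
flowchart TD
    orders --> daily_totals
    orders --> by_region
    daily_totals --> weekly_rollup
    events_cleaned --> user_purchase_summary
```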
Disabling Self-Monitoring
SELECT pgtrickle.teardown_self_monitoring();
This drops all monitoring stream tables. User stream tables are never affected. The control plane continues operating identically without self-monitoring.
What's Next?
- SQL_REFERENCE.md — Full API reference for all functions, views, and configuration
- ARCHITECTURE.md — Deep dive into the system architecture and data flow
- DVM_OPERATORS.md — How each SQL operator is differentiated for incremental maintenance
- CONFIGURATION.md — GUC variables for tuning schedule, concurrency, and cleanup behavior
- Flyway & Liquibase Integration — Migration patterns for Flyway and Liquibase
- ORM Integration — SQLAlchemy and Django ORM patterns for stream tables
- What Happens on INSERT — Detailed trace of a single INSERT through the entire pipeline
- What Happens on UPDATE — How UPDATEs are split into D+I, group key changes, and net-effect computation
- What Happens on DELETE — Reference counting, group deletion, and INSERT+DELETE cancellation
- What Happens on TRUNCATE — Why TRUNCATE bypasses triggers and how to recover
- dbt Getting Started example — Everything above, expressed as dbt models and seeds with a one-command Docker runner
Viewing on GitHub? The installation guide lives in INSTALL.md. This stub is served by the pg_trickle docs site — the include below renders there.
Installation Guide
Choose your installation path
| Environment | Recommended approach |
|---|---|
| Local development / quick evaluation | Docker sandbox — docker compose up -d in playground/ gets you a running pg_trickle instance in ~60 seconds with no build step. |
| Self-hosted Linux server | Pre-built release — download the .tar.gz, copy two directories, CREATE EXTENSION. |
| macOS development | Build from source using cargo pgrx install, or use the Docker sandbox above. |
| Kubernetes / CNPG | CloudNativePG — use the Dockerfile.ghcr image or the CNPG extension manifest. |
| Managed PostgreSQL | Check your provider's extension list. If pg_trickle is not available, the Docker or Kubernetes path is your best option. |
Prerequisites
| Requirement | Version |
|---|---|
| PostgreSQL | 18.x |
Building from source additionally requires Rust 1.85+ (edition 2024) and pgrx 0.18.x. Pre-built release artifacts only need a running PostgreSQL 18.x instance.
Installing from a Pre-built Release
1. Download the release archive
Download the archive for your platform from the GitHub Releases page:
| Platform | Archive |
|---|---|
| Linux x86_64 | pg_trickle-<ver>-pg18-linux-amd64.tar.gz |
| macOS Apple Silicon | pg_trickle-<ver>-pg18-macos-arm64.tar.gz |
| Windows x64 | pg_trickle-<ver>-pg18-windows-amd64.zip |
Optionally verify the checksum against SHA256SUMS.txt from the same release:
sha256sum -c SHA256SUMS.txt
2. Extract and install
Linux / macOS:
tar xzf pg_trickle-<ver>-pg18-linux-amd64.tar.gz
cd pg_trickle-<ver>-pg18-linux-amd64
sudo cp lib/*.so "$(pg_config --pkglibdir)/"
sudo cp extension/*.control extension/*.sql "$(pg_config --sharedir)/extension/"
Windows (PowerShell):
Expand-Archive pg_trickle-<ver>-pg18-windows-amd64.zip -DestinationPath .
cd pg_trickle-<ver>-pg18-windows-amd64
Copy-Item lib\*.dll "$(pg_config --pkglibdir)\"
Copy-Item extension\* "$(pg_config --sharedir)\extension\"
3. Using with CloudNativePG (Kubernetes)
pg_trickle is distributed as an OCI extension image for use with CloudNativePG Image Volume Extensions.
Requirements: Kubernetes 1.33+, CNPG 1.28+, PostgreSQL 18.
# Pull the extension image
docker pull ghcr.io/trickle-labs/pg_trickle-ext:<ver>
See cnpg/cluster-example.yaml and cnpg/database-example.yaml for complete Cluster and Database deployment examples.
4. GHCR Docker image (recommended for local dev)
pg_trickle is published as a ready-to-run Docker image on the GitHub Container
Registry. PostgreSQL 18.3 and pg_trickle are pre-installed and all sensible GUC
defaults (wal_level, shared_preload_libraries, memory, scheduler settings)
are baked in — no configuration file editing needed.
docker pull ghcr.io/trickle-labs/pg_trickle:latest
docker run --rm \
-e POSTGRES_PASSWORD=secret \
-p 5432:5432 \
ghcr.io/trickle-labs/pg_trickle:latest
CREATE EXTENSION pg_trickle; runs automatically on the default postgres
database at first startup.
Available tags:
| Tag | Meaning |
|---|---|
| latest | Most recent release |
| pg18 | Floating alias for the latest PostgreSQL 18 build |
| <version>-pg18.3 | Immutable tag, e.g. 0.13.0-pg18.3 |
Override any GUC at runtime without rebuilding:
docker run --rm \
-e POSTGRES_PASSWORD=secret \
-p 5432:5432 \
ghcr.io/trickle-labs/pg_trickle:latest \
-c shared_buffers=2GB -c work_mem=64MB -c effective_cache_size=6GB
For persistent data, mount a volume:
docker run -d \
--name pg_trickle \
-e POSTGRES_PASSWORD=secret \
-p 5432:5432 \
-v pg_trickle_data:/var/lib/postgresql/data \
ghcr.io/trickle-labs/pg_trickle:latest
Alternative — manual mount from a release archive:
If you prefer to use the stock postgres:18.3 image rather than the pre-built
image, extract the extension files from a release archive and mount them:
tar xzf pg_trickle-<ver>-pg18-linux-amd64.tar.gz
cd pg_trickle-<ver>-pg18-linux-amd64
docker run --rm \
-v $PWD/lib/pg_trickle.so:/usr/lib/postgresql/18/lib/pg_trickle.so:ro \
-v $PWD/extension/:/tmp/ext/:ro \
-e POSTGRES_PASSWORD=postgres \
postgres:18.3 \
sh -c 'cp /tmp/ext/* /usr/share/postgresql/18/extension/ && \
exec postgres -c shared_preload_libraries=pg_trickle'
Installing from PGXN
pg_trickle is published on the PostgreSQL Extension Network (PGXN). Installing via PGXN compiles the extension from source, so the Rust toolchain and pgrx are required.
1. Install prerequisites
# Rust toolchain
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
source "$HOME/.cargo/env"
# pgrx build tool
cargo install --locked cargo-pgrx --version 0.18.0
cargo pgrx init --pg18 "$(pg_config --bindir)/pg_config"
2. Install the pgxn client
pip install pgxnclient
3. Install pg_trickle
pgxn install pg_trickle
To install a specific version:
pgxn install pg_trickle=0.10.0
Note: After installation, follow the PostgreSQL Configuration and Extension Installation steps below.
Building from Source
1. Install Rust
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
2. Install pgrx
cargo install --locked cargo-pgrx --version 0.18.0
cargo pgrx init --pg18 $(pg_config --bindir)/pg_config
3. Build the Extension
# Development build (faster compilation)
cargo pgrx install --pg-config $(pg_config --bindir)/pg_config
# Release build (optimized, for production)
cargo pgrx install --release --pg-config $(pg_config --bindir)/pg_config
# Package for deployment (creates installable artifacts)
cargo pgrx package --pg-config $(pg_config --bindir)/pg_config
PostgreSQL Configuration
Add the following to postgresql.conf before starting PostgreSQL:
# Required — loads the extension shared library at server start
shared_preload_libraries = 'pg_trickle'
# Must accommodate the pg_trickle launcher + one scheduler per database
# with pg_trickle installed + optional parallel refresh workers.
#
# WARNING: when this limit is reached, the launcher silently skips
# databases it cannot spawn a scheduler for and retries every 5 minutes.
# Those databases stop refreshing without any visible error.
# Check PostgreSQL logs for:
# WARNING: pg_trickle launcher: could not spawn scheduler for database '...'
#
# Formula:
# 1 (launcher) + N (one scheduler per DB) + max_dynamic_refresh_workers
# + autovacuum_max_workers + parallel query workers + other extensions
#
# 32 is a safe starting point for most clusters:
max_worker_processes = 32
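The formula in the comment above can be sketched as a back-of-the-envelope calculation. All inputs below are illustrative assumptions (defaults vary by installation); the point is that 32 leaves healthy headroom for a typical cluster:

```python
# Worker-budget sizing sketch for max_worker_processes, following the
# formula documented in postgresql.conf above. Inputs are assumptions.
def worker_budget(databases_with_pg_trickle: int,
                  max_dynamic_refresh_workers: int = 4,
                  autovacuum_max_workers: int = 3,
                  max_parallel_workers: int = 8,
                  other_extensions: int = 0) -> int:
    launcher = 1                              # one pg_trickle launcher per cluster
    schedulers = databases_with_pg_trickle    # one scheduler per database
    return (launcher + schedulers + max_dynamic_refresh_workers
            + autovacuum_max_workers + max_parallel_workers + other_extensions)

# Example: 3 databases with pg_trickle installed, defaults elsewhere.
print(worker_budget(3))  # 1 + 3 + 4 + 3 + 8 + 0 = 19, comfortably under 32
```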
Note: wal_level = logical and max_replication_slots are not required for the default trigger-based CDC mode, which uses lightweight row-level triggers rather than logical replication.
Restart PostgreSQL after modifying these settings:
pg_ctl restart -D /path/to/data
# or
systemctl restart postgresql
Extension Installation
Connect to the target database and run:
CREATE EXTENSION pg_trickle;
This creates:
- The pgtrickle schema with catalog tables and SQL functions
- The pgtrickle_changes schema for change buffer tables
- Event triggers for DDL tracking
- The pgtrickle.pg_stat_stream_tables monitoring view
Verification
After installation, verify everything is working:
-- Check the extension version
SELECT extname, extversion FROM pg_extension WHERE extname = 'pg_trickle';
-- Or get a full status overview (includes version, scheduler state, stream table count)
SELECT * FROM pgtrickle.pgt_status();
Quick functional test
CREATE TABLE test_source (id INT PRIMARY KEY, val TEXT);
INSERT INTO test_source VALUES (1, 'hello');
SELECT pgtrickle.create_stream_table(
'test_st',
'SELECT id, val FROM test_source',
'1m',
'FULL'
);
SELECT * FROM test_st;
-- Should return: 1 | hello
-- Clean up
SELECT pgtrickle.drop_stream_table('test_st');
DROP TABLE test_source;
Upgrading
To upgrade pg_trickle to a newer version without losing data, follow the steps below. For comprehensive upgrade instructions, version-specific notes, troubleshooting, and rollback procedures, see docs/UPGRADING.md.
1. Install the new extension files
Follow the same steps as Installing from a Pre-built Release to overwrite the shared library and SQL files with the new version. You do not need to drop the extension from your databases first.
Linux / macOS:
tar xzf pg_trickle-<new-ver>-pg18-linux-amd64.tar.gz
cd pg_trickle-<new-ver>-pg18-linux-amd64
sudo cp lib/*.so "$(pg_config --pkglibdir)/"
sudo cp extension/*.control extension/*.sql "$(pg_config --sharedir)/extension/"
2. Restart PostgreSQL (when required)
If the shared library ABI has changed, restart PostgreSQL before proceeding so the
new .so/.dll is loaded. The release notes for each version will call this out
explicitly when a restart is required.
pg_ctl restart -D /path/to/data
# or
systemctl restart postgresql
3. Apply the schema migration in each database
Connect to every database where pg_trickle is installed and run:
-- Upgrade to the latest bundled version
ALTER EXTENSION pg_trickle UPDATE;
-- Or upgrade to a specific version
ALTER EXTENSION pg_trickle UPDATE TO '<new-version>';
PostgreSQL uses the versioned SQL migration scripts bundled with the release
(e.g. pg_trickle--0.2.3--0.3.0.sql, pg_trickle--0.3.0--0.4.0.sql) to
apply catalog and SQL-surface changes.
PostgreSQL automatically chains these scripts when you run ALTER EXTENSION pg_trickle UPDATE. The command is a no-op when no migration script is needed
for a given release.
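The chaining behavior can be illustrated with a small sketch: each `pg_trickle--A--B.sql` file is an edge from version A to version B, and the update walks a path from the installed version to the target. (This is a simplified model — PostgreSQL's actual resolver also considers full install scripts and picks a least-cost path.)

```python
# Simplified model of versioned-upgrade-script chaining.
def upgrade_path(scripts, current, target):
    # Parse "pg_trickle--0.2.3--0.3.0.sql" into an edge 0.2.3 -> 0.3.0.
    edges = {}
    for name in scripts:
        parts = name.removesuffix(".sql").split("--")
        if len(parts) == 3:
            edges[parts[1]] = parts[2]
    # Follow edges until the target version is reached.
    path = [current]
    while path[-1] != target:
        nxt = edges.get(path[-1])
        if nxt is None:
            raise ValueError(f"no upgrade script from version {path[-1]}")
        path.append(nxt)
    return path

scripts = ["pg_trickle--0.2.3--0.3.0.sql", "pg_trickle--0.3.0--0.4.0.sql"]
print(upgrade_path(scripts, "0.2.3", "0.4.0"))  # ['0.2.3', '0.3.0', '0.4.0']
```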
You can confirm the active version afterwards:
SELECT extversion FROM pg_extension WHERE extname = 'pg_trickle';
Coming soon: A future release will include a helper function (pgtrickle.upgrade()) that automates steps 2–3 across all databases in the cluster and validates catalog integrity after the migration. Until then, the manual steps above are the supported upgrade path.
Uninstallation
-- Drop all stream tables first
SELECT pgtrickle.drop_stream_table(pgt_schema || '.' || pgt_name)
FROM pgtrickle.pgt_stream_tables;
-- Drop the extension
DROP EXTENSION pg_trickle CASCADE;
Remove pg_trickle from shared_preload_libraries in postgresql.conf and restart PostgreSQL.
Troubleshooting
Unit tests crash on macOS 26+ (symbol not found in flat namespace)
macOS 26 (Tahoe) changed dyld to eagerly resolve all flat-namespace symbols
at binary load time. pgrx extensions reference PostgreSQL server-internal
symbols (e.g. CacheMemoryContext, SPI_connect) via the
-Wl,-undefined,dynamic_lookup linker flag. These symbols are normally
provided by the postgres executable when the extension is loaded as a shared
library — but for cargo test --lib there is no postgres process, so the test
binary aborts immediately:
dyld[66617]: symbol not found in flat namespace '_CacheMemoryContext'
This affects local development only — integration tests, E2E tests, and the extension itself running inside PostgreSQL are unaffected.
The fix is built into the just test-unit recipe. It automatically:
- Compiles a tiny C stub library (scripts/pg_stub.c → target/libpg_stub.dylib) that provides NULL/no-op definitions for the ~28 PostgreSQL symbols.
- Compiles the test binary with --no-run.
- Runs the binary with DYLD_INSERT_LIBRARIES pointing to the stub.
The stub is only built on macOS 26+. On Linux or older macOS, just test-unit
runs cargo test --lib directly with no changes.
Note: The stub symbols are never called — unit tests exercise pure Rust logic only. If a test accidentally calls a PostgreSQL function it will crash with a NULL dereference (the desired fail-fast behavior).
If you run unit tests without just (e.g. directly via cargo test --lib),
you can use the wrapper script instead:
./scripts/run_unit_tests.sh pg18
# With test name filter:
./scripts/run_unit_tests.sh pg18 -- test_parse_basic
Extension fails to load
Ensure shared_preload_libraries = 'pg_trickle' is set and PostgreSQL has been restarted (not just reloaded). The extension requires shared memory initialization at startup.
Background worker not starting
Check that max_worker_processes is high enough. In sequential mode (default) pg_trickle needs one slot per database with stream tables. With parallel refresh enabled (pg_trickle.parallel_refresh_mode = 'on') it additionally needs max_dynamic_refresh_workers slots (default 4) shared across all databases.
See the worker-budget formula in CONFIGURATION.md for sizing guidance.
Check logs for details
The extension logs at various levels. Enable debug logging for more detail:
SET client_min_messages TO debug1;
Backup & Restore
Backing Up a pg_trickle-Enabled Database
pg_trickle uses two schemas that must be included in any backup alongside your application schema:
- pgtrickle — stream table catalog, dependency graph, and refresh history
- pgtrickle_changes — change-buffer tables (one per CDC-enabled source)
Include both schemas explicitly when using pg_dump:
pg_dump \
--schema=public \
--schema=pgtrickle \
--schema=pgtrickle_changes \
mydb > mydb_backup.sql
If your application data lives in schemas other than public, include those
schemas too. Omitting pgtrickle_changes means any unprocessed CDC rows are
lost on restore, forcing the next differential refresh to fall back to FULL mode.
Restoring
Restore normally:
psql -d mydb_restored -f mydb_backup.sql
After restore, run the health check to validate catalog integrity:
SELECT * FROM pgtrickle.health_check();
OID Re-assignment After Restore
Change-buffer tables in pgtrickle_changes are named by storage-table OID (e.g.
changes_12345). OIDs may differ after restore if tables were created in a
different order. Run pgtrickle.repair_stream_table() on each stream table
immediately after restore to reconcile any OID mismatches:
SELECT pgtrickle.repair_stream_table(pgt_schema || '.' || pgt_name)
FROM pgtrickle.pgt_stream_tables;
Next Steps
- Getting Started — Create your first stream table in 5 minutes
- Pre-Deployment Checklist — Complete checklist for production deployments
- Best-Practice Patterns — Common data modeling patterns
- Configuration Reference — All GUC variables and tuning
Best-Practice Patterns for pg_trickle
This guide covers common data modeling patterns and recommended configurations for pg_trickle stream tables. Each pattern includes worked SQL examples, anti-patterns to avoid, and refresh mode recommendations.
Version: v0.14.0+. Some features require recent versions — check SQL_REFERENCE.md for per-feature availability.
Table of Contents
- Pattern 1: Bronze / Silver / Gold Materialization
- Pattern 2: Event Sourcing with Stream Tables
- Pattern 3: Slowly Changing Dimensions (SCD)
- Pattern 4: High-Fan-Out Topology
- Pattern 5: Real-Time Dashboards
- Pattern 6: Tiered Refresh Strategy
- General Guidelines
- Replica Bootstrap & PITR Alignment (v0.27.0)
- Pattern 7: Transactional Outbox (v0.28.0)
- Pattern 8: Transactional Inbox (v0.28.0)
Pattern 1: Bronze / Silver / Gold Materialization
A multi-layer approach where raw data flows through progressively refined stream tables, similar to a medallion architecture.
Architecture
[raw_events] ← Bronze: raw ingest table (regular table)
↓
[events_cleaned] ← Silver: filtered, deduplicated, typed
↓
[events_aggregated] ← Gold: business-level aggregates
SQL Example
-- Bronze: regular PostgreSQL table (source of truth)
CREATE TABLE raw_events (
event_id BIGSERIAL PRIMARY KEY,
user_id INT NOT NULL,
event_type TEXT NOT NULL,
payload JSONB,
received_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
-- Silver: cleaned and deduplicated events
SELECT pgtrickle.create_stream_table(
'events_cleaned',
$$SELECT DISTINCT ON (event_id)
event_id,
user_id,
event_type,
(payload->>'amount')::numeric AS amount,
received_at
FROM raw_events
WHERE event_type IN ('purchase', 'refund', 'subscription')$$,
schedule => '5s',
refresh_mode => 'DIFFERENTIAL'
);
-- Gold: per-user purchase summary
SELECT pgtrickle.create_stream_table(
'user_purchase_summary',
$$SELECT user_id,
COUNT(*) AS total_purchases,
SUM(amount) AS total_spent,
AVG(amount) AS avg_order
FROM events_cleaned
WHERE event_type = 'purchase'
GROUP BY user_id$$,
schedule => 'calculated',
refresh_mode => 'DIFFERENTIAL'
);
Recommended Configuration
| Layer | Refresh Mode | Schedule | Tier |
|---|---|---|---|
| Silver | DIFFERENTIAL | 5s – 30s | hot |
| Gold | DIFFERENTIAL | calculated | hot |
Anti-Patterns
- Don't use FULL refresh for Silver. With frequent small inserts, DIFFERENTIAL is 10–100x faster.
- Don't skip the Silver layer. Joining raw tables directly in Gold queries produces wider joins and slower deltas.
- Don't use IMMEDIATE mode for Gold. Aggregate maintenance on every DML row is expensive — batched DIFFERENTIAL is more efficient.
When NOT to use this pattern
- Your data never changes after insert — a single stream table is simpler.
- The Bronze source is external (foreign table, dblink) — CDC triggers cannot be attached to foreign tables; use WAL CDC mode or FULL refresh.
- You have fewer than ~10,000 rows in Silver — the overhead of three layers is not justified; use one or two tables instead.
Pattern 2: Event Sourcing with Stream Tables
Use stream tables as projections of an append-only event log. The source table is the event store; stream tables materialize different read models.
SQL Example
-- Event store (append-only source)
CREATE TABLE events (
event_id BIGSERIAL PRIMARY KEY,
aggregate_id UUID NOT NULL,
event_type TEXT NOT NULL,
payload JSONB NOT NULL,
created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
-- Projection 1: Current state per aggregate
SELECT pgtrickle.create_stream_table(
'aggregate_state',
$$SELECT DISTINCT ON (aggregate_id)
aggregate_id,
event_type AS last_event,
payload AS current_state,
created_at AS last_updated
FROM events
ORDER BY aggregate_id, created_at DESC$$,
schedule => '2s',
refresh_mode => 'DIFFERENTIAL'
);
-- Projection 2: Event counts by type per hour
SELECT pgtrickle.create_stream_table(
'hourly_event_counts',
$$SELECT date_trunc('hour', created_at) AS hour,
event_type,
COUNT(*) AS event_count
FROM events
GROUP BY 1, 2$$,
schedule => '10s',
refresh_mode => 'DIFFERENTIAL'
);
Recommended Configuration
| Projection | Refresh Mode | Why |
|---|---|---|
| Current state | DIFFERENTIAL | Small delta per cycle; DISTINCT ON supported |
| Hourly counts | DIFFERENTIAL | Algebraic aggregate (COUNT), efficient delta |
| String aggregations | AUTO | GROUP_RESCAN aggs may benefit from FULL |
Anti-Patterns
- Don't DELETE from the event store. pg_trickle tracks changes via triggers; mixing append and delete on the source creates unnecessary delta complexity. Archive old events to a separate table.
- Don't use append_only => true with UPDATE/DELETE patterns. The append_only flag skips DELETE tracking in the change buffer — only use it when the source truly never updates or deletes.
When NOT to use this pattern
- Your event log is consumed and processed in real time by application code — a stream table adds latency without benefit.
- You need strict per-event ordering guarantees within a transaction — use IMMEDIATE mode with a single-row projection instead.
- Your events are multi-gigabyte payloads — stream tables replicate the whole row; store only metadata in the event log, not the payload.
Pattern 3: Slowly Changing Dimensions (SCD)
SCD Type 1: Overwrite
The stream table always reflects the current state. Source updates overwrite previous values.
-- Source: customer dimension table (updated in place)
CREATE TABLE customers (
customer_id INT PRIMARY KEY,
name TEXT NOT NULL,
email TEXT,
tier TEXT DEFAULT 'standard',
updated_at TIMESTAMPTZ DEFAULT now()
);
-- SCD-1: current customer state enriched with order stats
SELECT pgtrickle.create_stream_table(
'customer_360',
$$SELECT c.customer_id,
c.name,
c.email,
c.tier,
COUNT(o.id) AS total_orders,
COALESCE(SUM(o.amount), 0) AS lifetime_value
FROM customers c
LEFT JOIN orders o ON o.customer_id = c.customer_id
GROUP BY c.customer_id, c.name, c.email, c.tier$$,
schedule => '30s',
refresh_mode => 'DIFFERENTIAL'
);
SCD Type 2: History Tracking
For SCD-2, maintain a history table with valid-from/valid-to ranges. The stream table provides the current snapshot.
-- Source: customer history with validity ranges
CREATE TABLE customer_history (
customer_id INT NOT NULL,
name TEXT NOT NULL,
tier TEXT NOT NULL,
valid_from TIMESTAMPTZ NOT NULL,
valid_to TIMESTAMPTZ, -- NULL = current
PRIMARY KEY (customer_id, valid_from)
);
-- Current active records only
SELECT pgtrickle.create_stream_table(
'customers_current',
$$SELECT customer_id, name, tier, valid_from
FROM customer_history
WHERE valid_to IS NULL$$,
schedule => '10s',
refresh_mode => 'DIFFERENTIAL'
);
Anti-Patterns
- Don't use FULL refresh for SCD-1 with large dimension tables. Customer tables with millions of rows but few changes per cycle are ideal for DIFFERENTIAL.
- Don't forget a partial index on valid_to IS NULL for SCD-2 sources. Without it, the delta scan touches all historical rows.
When NOT to use this pattern
- You already have a purpose-built slowly-changing-dimension ETL tool (e.g. dbt snapshots) — pg_trickle's SCD support is complementary, not a replacement, and duplicate ownership creates confusion.
- Your dimension table changes every row on every load — DIFFERENTIAL offers no benefit; use FULL refresh or rethink the source update pattern.
- You need Type 3 (add-a-column) or Type 6 SCD — those require schema evolution that pg_trickle does not automate today.
Pattern 4: High-Fan-Out Topology
When a single source table feeds many downstream stream tables.
Architecture
[orders]
↙ ↓ ↓ ↘
[daily_totals] [by_region] [by_product] [top_customers]
SQL Example
-- Single source feeding multiple views
CREATE TABLE orders (
id SERIAL PRIMARY KEY,
customer_id INT NOT NULL,
region TEXT NOT NULL,
product_id INT NOT NULL,
amount NUMERIC(10,2) NOT NULL,
order_date DATE NOT NULL DEFAULT CURRENT_DATE
);
-- Fan-out: 4 stream tables on 1 source
SELECT pgtrickle.create_stream_table('daily_totals',
'SELECT order_date, SUM(amount) AS daily_total, COUNT(*) AS order_count
FROM orders GROUP BY order_date',
schedule => '5s', refresh_mode => 'DIFFERENTIAL');
SELECT pgtrickle.create_stream_table('by_region',
'SELECT region, SUM(amount) AS total, COUNT(*) AS cnt
FROM orders GROUP BY region',
schedule => '5s', refresh_mode => 'DIFFERENTIAL');
SELECT pgtrickle.create_stream_table('by_product',
'SELECT product_id, SUM(amount) AS total, COUNT(*) AS cnt
FROM orders GROUP BY product_id',
schedule => '5s', refresh_mode => 'DIFFERENTIAL');
SELECT pgtrickle.create_stream_table('top_customers',
'SELECT customer_id, SUM(amount) AS lifetime_value, COUNT(*) AS order_count
FROM orders GROUP BY customer_id',
schedule => '10s', refresh_mode => 'DIFFERENTIAL');
Recommended Configuration
- All fan-out targets share the same source change buffer — CDC overhead is paid once regardless of how many stream tables read from orders.
- Use schedule => 'calculated' on downstream STs when they chain from other stream tables.
- Consider raising pg_trickle.max_concurrent_refreshes if fan-out exceeds 8 (default: 4 concurrent refreshes).
Anti-Patterns
- Don't use IMMEDIATE mode on high-fan-out sources. Each DML row triggers N refreshes (one per downstream ST). Use DIFFERENTIAL with a batched schedule instead.
- Don't set different schedules on STs that should be consistent. If daily_totals and by_region must agree, give them the same schedule or use diamond_consistency => 'atomic'.
When NOT to use this pattern
- You only have one or two downstream stream tables — the fan-out pattern adds planning overhead that isn't justified below ~4 targets.
- Downstream queries have incompatible refresh modes (e.g. one needs IMMEDIATE, another needs FULL) — prefer separate source tables.
- All downstream STs will always be queried together — a single wider stream table may be simpler and faster.
Pattern 5: Real-Time Dashboards
For dashboards that need sub-second refresh latency.
SQL Example
-- Live order monitor (sub-second freshness)
SELECT pgtrickle.create_stream_table(
'order_monitor',
$$SELECT
date_trunc('minute', order_date) AS minute,
region,
COUNT(*) AS orders,
SUM(amount) AS revenue
FROM orders
WHERE order_date >= CURRENT_DATE
GROUP BY 1, 2$$,
schedule => '1s',
refresh_mode => 'DIFFERENTIAL'
);
-- For truly real-time needs, use IMMEDIATE mode (triggers on each DML)
SELECT pgtrickle.create_stream_table(
'live_counter',
$$SELECT region, COUNT(*) AS cnt, SUM(amount) AS total
FROM orders GROUP BY region$$,
schedule => 'IMMEDIATE',
refresh_mode => 'DIFFERENTIAL'
);
When to Use IMMEDIATE vs Scheduled DIFFERENTIAL
| Scenario | Schedule | Why |
|---|---|---|
| Dashboard polls every 1s | 1s | Batched delta amortizes overhead |
| GraphQL subscription, < 100ms | IMMEDIATE | Triggers fire synchronously per DML |
| Aggregate with GROUP_RESCAN | 5s+ | Avoid per-row full rescans |
| High write throughput (>1K/s) | 2s–5s | IMMEDIATE adds latency to each INSERT |
Anti-Patterns
- Don't use IMMEDIATE for complex joins. Each INSERT/UPDATE/DELETE fires the full DVM delta SQL synchronously — multi-table joins in IMMEDIATE mode add significant latency to writes.
- Don't forget pooler_compatibility_mode with PgBouncer. Transaction pooling drops temp tables between transactions; enable this flag to avoid stale PREPARE statements.
When NOT to use this pattern
- The data source itself is the bottleneck (slow sensors, infrequent API polling) — a sub-second schedule on a stream table that changes once a minute burns CPU for nothing.
- You need consistency across several related tiles — schedule them together or use a single wider query rather than sub-second individual refreshes that can transiently disagree.
- Write throughput exceeds ~5,000 rows/s — IMMEDIATE mode adds latency to every write; profile with pg_trickle.latency_percentiles() first.
Pattern 6: Tiered Refresh Strategy
Assign refresh importance tiers to control scheduling priority.
-- Hot: real-time operational dashboard
SELECT pgtrickle.create_stream_table('live_metrics', ...);
SELECT pgtrickle.alter_stream_table('live_metrics', tier => 'hot');
-- Warm: hourly business reports (2x interval multiplier)
SELECT pgtrickle.create_stream_table('hourly_report', ...,
schedule => '1m');
SELECT pgtrickle.alter_stream_table('hourly_report', tier => 'warm');
-- Cold: daily analytics (10x interval multiplier)
SELECT pgtrickle.create_stream_table('daily_analytics', ...,
schedule => '5m');
SELECT pgtrickle.alter_stream_table('daily_analytics', tier => 'cold');
-- Frozen: archive/audit (skip refresh entirely)
SELECT pgtrickle.alter_stream_table('audit_log_summary', tier => 'frozen');
Tier Multipliers
| Tier | Schedule Multiplier | Use Case |
|---|---|---|
| hot | 1x | Operational dashboards, alerts |
| warm | 2x | Hourly reports, batch pipelines |
| cold | 10x | Daily analytics, low-priority STs |
| frozen | skip | Paused/archived, manual refresh |
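The multipliers compose with each stream table's own schedule. As a toy arithmetic sketch of the table above (an illustration only, not extension code):

```python
# Toy model of tier multipliers applied to a per-table schedule.
# 'frozen' tables are skipped by the scheduler entirely.
MULTIPLIERS = {"hot": 1, "warm": 2, "cold": 10}

def effective_interval_seconds(schedule_seconds: int, tier: str):
    """Effective refresh interval for a given base schedule and tier."""
    if tier == "frozen":
        return None  # never auto-refreshed; manual refresh only
    return schedule_seconds * MULTIPLIERS[tier]

assert effective_interval_seconds(60, "hot") == 60       # '1m' hot  -> 1m
assert effective_interval_seconds(60, "warm") == 120     # '1m' warm -> 2m
assert effective_interval_seconds(300, "cold") == 3000   # '5m' cold -> 50m
assert effective_interval_seconds(300, "frozen") is None
```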
When NOT to use this pattern
- All your stream tables are equally critical — don't introduce tier complexity just to have tiers; use a flat schedule instead.
- Your scheduler runs with a single worker — tiering helps multi-worker scheduling; it has no effect when `max_concurrent_refreshes = 1`.
- Tier multipliers change too frequently — tier is a static property; if freshness requirements change continuously, use SLA scheduling (`pg_trickle.sla_scheduling`) instead.
General Guidelines
Choosing a Refresh Mode
| Scenario | Recommended Mode |
|---|---|
| Source has < 5% change ratio per cycle | DIFFERENTIAL |
| Source changes > 50% per cycle | FULL |
| Query is a simple filter/projection | DIFFERENTIAL |
| Query has GROUP_RESCAN aggregates (MIN, MAX) | AUTO |
| Query joins 4+ tables | DIFFERENTIAL |
| Target table < 1000 rows | FULL |
| Need per-row latency guarantee | IMMEDIATE |
Use pgtrickle.recommend_refresh_mode() (v0.14.0+) for automated
analysis:
SELECT pgt_name, recommended_mode, confidence, reason
FROM pgtrickle.recommend_refresh_mode();
Monitoring Checklist
-- Check refresh efficiency across all stream tables
SELECT pgt_name, refresh_mode, diff_speedup, avg_change_ratio
FROM pgtrickle.refresh_efficiency()
ORDER BY total_refreshes DESC;
-- Find stream tables that might benefit from mode change
SELECT pgt_name, current_mode, recommended_mode, reason
FROM pgtrickle.recommend_refresh_mode()
WHERE recommended_mode != 'KEEP';
-- Check for error states
SELECT pgt_name, status, last_error_message
FROM pgtrickle.stream_tables_info
WHERE status IN ('ERROR', 'SUSPENDED');
-- Export definitions for backup
SELECT pgtrickle.export_definition(pgt_schema || '.' || pgt_name)
FROM pgtrickle.pgt_stream_tables;
Common Mistakes
- Using FULL refresh by default. Start with DIFFERENTIAL — it's correct for 80%+ of workloads. Switch to FULL only when `recommend_refresh_mode()` suggests it.
- Over-scheduling. A 1-second schedule on a table with 1-hour change cycles wastes CPU. Match the schedule to the actual data arrival rate.
- Ignoring `append_only`. If the source table is truly append-only (no UPDATEs, no DELETEs), set `append_only => true` to halve change-buffer writes.
- Not using `calculated` schedule for chained STs. When ST-B reads from ST-A, use `schedule => 'calculated'` on ST-B to avoid unnecessary refreshes. The scheduler automatically propagates ST-A changes downstream.
- Mixing IMMEDIATE and complex joins. IMMEDIATE mode fires delta SQL on every DML — an 8-table join in IMMEDIATE mode adds 50–200ms to each INSERT. Use scheduled DIFFERENTIAL for complex queries.
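For the chained-ST case above, the setup might look like this (table names and queries are illustrative):

```sql
-- ST-A: refreshed on its own schedule
SELECT pgtrickle.create_stream_table(
  name     => 'orders_by_customer',
  query    => 'SELECT customer_id, SUM(amount) AS total FROM orders GROUP BY customer_id',
  schedule => '30s'
);

-- ST-B: reads from ST-A; 'calculated' lets the scheduler refresh it
-- only when orders_by_customer actually changed
SELECT pgtrickle.create_stream_table(
  name     => 'top_customers',
  query    => 'SELECT customer_id, total FROM orders_by_customer ORDER BY total DESC LIMIT 10',
  schedule => 'calculated'
);
```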
Replica Bootstrap & PITR Alignment (v0.27.0)
When bootstrapping a new replica or performing point-in-time recovery, stream tables need special handling because their state is derived from source data at a specific frontier (LSN + timestamp).
The problem
After a pg_basebackup or logical restore, stream table rows are present
but their frontiers may be stale. The next refresh would trigger a FULL
re-scan of all source data, which is expensive for large stream tables.
Solution: use snapshots for replica bootstrap
-- On the primary: export the stream table state
SELECT pgtrickle.snapshot_stream_table(
'public.orders_agg',
'pgtrickle.orders_agg_replica_init'
);
-- Dump only the snapshot table to the replica
pg_dump -t 'pgtrickle.orders_agg_replica_init' mydb | psql replica_db
-- On the replica: restore and align the frontier
SELECT pgtrickle.restore_from_snapshot(
'public.orders_agg',
'pgtrickle.orders_agg_replica_init'
);
-- Clean up the bootstrap snapshot
SELECT pgtrickle.drop_snapshot('pgtrickle.orders_agg_replica_init');
After restore_from_snapshot(), the frontier is set to the snapshot's
frontier and the next refresh is DIFFERENTIAL — only changes after the
snapshot creation time are fetched.
PITR alignment workflow
When performing PITR to a specific LSN:
1. Take a snapshot immediately before the target LSN
2. Restore the database to the target LSN using `pg_basebackup` + WAL replay
3. Run `restore_from_snapshot()` on each stream table to align frontiers
-- Step 1: snapshot all stream tables (before PITR)
SELECT pgtrickle.snapshot_stream_table(
pgt_schema || '.' || pgt_name,
'pgtrickle.pitr_snapshot_' || pgt_name || '_' || extract(epoch from now())::bigint
)
FROM pgtrickle.pgt_stream_tables
WHERE status = 'ACTIVE';
-- Step 3 (after PITR): restore all snapshots
SELECT pgtrickle.restore_from_snapshot(
pgt_schema || '.' || pgt_name,
'pgtrickle.pitr_snapshot_' || pgt_name || '_<epoch>'
)
FROM pgtrickle.pgt_stream_tables;
Performance: Restoring a 1M-row stream table from a snapshot completes in < 5 seconds (bulk INSERT from local table). The frontier alignment ensures the first differential refresh fetches only new changes, not all rows.
Pattern 7: Transactional Outbox (v0.28.0)
Requires: v0.28.0+
The transactional outbox pattern reliably publishes stream table deltas to
external consumers — even if the consumer is temporarily offline. Each time the
stream table refreshes, pg_trickle writes a header row to a dedicated outbox
table. Consumers read from the outbox via poll_outbox(), process the delta,
then commit their offset.
Use this pattern when:
- You need to publish stream table changes to a message queue, webhook, or another service
- Consumers need at-least-once delivery guarantees
- Multiple independent consumers need to read the same stream independently
- You want replay / seek-to-offset for recovery
Architecture
orders (base table)
└─→ orders_agg (stream table)
└─→ pgt_outbox_orders_agg (outbox table)
├─→ Consumer group A: analytics pipeline
└─→ Consumer group B: notification service
SQL Example
-- 1. Create the stream table
SELECT pgtrickle.create_stream_table(
'public.orders_agg',
'SELECT customer_id, SUM(amount) AS total, COUNT(*) AS cnt FROM orders GROUP BY customer_id',
schedule_seconds => 5
);
-- 2. Enable the outbox
SELECT pgtrickle.enable_outbox('public.orders_agg', retention_hours => 48);
-- 3. Create consumer groups
SELECT pgtrickle.create_consumer_group('analytics', 'public.orders_agg', auto_offset_reset => 'latest');
SELECT pgtrickle.create_consumer_group('notifications', 'public.orders_agg', auto_offset_reset => 'latest');
-- 4. Consumer A polls and processes
DO $$
DECLARE
r RECORD;
last_id BIGINT := 0;
BEGIN
FOR r IN
SELECT * FROM pgtrickle.poll_outbox('analytics', 'worker-1', batch_size => 50)
LOOP
-- process r.payload (JSONB with inserted/deleted row arrays)
last_id := r.outbox_id;
END LOOP;
IF last_id > 0 THEN
PERFORM pgtrickle.commit_offset('analytics', 'worker-1', last_id);
END IF;
END;
$$;
-- 5. Check consumer lag
SELECT * FROM pgtrickle.consumer_lag('analytics');
Consumer Group Tips
| Scenario | Setting |
|---|---|
| Multiple competing workers sharing one offset | Put all workers in the same group |
| Independent pipelines that each need the full stream | Create a separate group per pipeline |
| Replay from the beginning | seek_offset('my_group', 'worker-1', 0) |
| Resume after a crash without re-processing | Commit offsets frequently; use extend_lease() for long processing |
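For example, replaying the full stream for one pipeline (group and worker names are illustrative):

```sql
-- Rewind the 'analytics' group to the start of the retained outbox
SELECT pgtrickle.seek_offset('analytics', 'worker-1', 0);

-- Subsequent polls re-deliver from the earliest retained entry
SELECT * FROM pgtrickle.poll_outbox('analytics', 'worker-1', batch_size => 50);
```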
Recommended Configuration
pg_trickle.outbox_enabled = true
pg_trickle.outbox_retention_hours = 48 # keep 2 days of history
pg_trickle.outbox_skip_empty_delta = true # don't write rows for no-op refreshes
pg_trickle.outbox_force_retention = true # keep rows until all groups commit
pg_trickle.consumer_dead_threshold_hours = 24 # mark workers dead after 24h silence
pg_trickle.consumer_cleanup_enabled = true
Anti-Patterns
- Polling without committing: If `commit_offset()` is never called, the lease expires and the rows are re-delivered. Always commit after successful processing.
- One group per worker: Use one group and multiple named consumers within it for competing-consumer parallelism. Use multiple groups only when pipelines are truly independent.
- Long processing without heartbeats: Call `consumer_heartbeat()` every 10–15 seconds for long-running processing to avoid being marked dead.
When NOT to use this pattern
- You only need to expose stream table changes to a single application that reads directly from PostgreSQL — a NOTIFY/LISTEN trigger or change table is simpler than a full outbox.
- Delivery guarantees are not required (analytics, dashboards) — the overhead of consumer groups and offset tracking is not justified.
- Your stream table refreshes every few seconds and consumers can tolerate a few seconds of lag — just poll the stream table directly.
Pattern 8: Transactional Inbox (v0.28.0)
Requires: v0.28.0+
The transactional inbox pattern provides a reliable, idempotent message receiver inside PostgreSQL. External producers write events to the inbox table; pg_trickle maintains stream tables that give you live views of pending, failed, and processed messages — all updated incrementally.
Use this pattern when:
- You receive events from external systems and need to process them exactly-once
- You want automatic dead-letter handling for failed messages
- Multiple workers need to process different aggregates without stepping on each other
- You need per-aggregate ordering guarantees
Architecture
external producer (Kafka / webhook / custom application)
└─→ pgtrickle.orders_inbox (raw event table)
├─→ orders_inbox_pending (stream table: awaiting processing)
├─→ orders_inbox_dlq (stream table: failed messages)
└─→ orders_inbox_stats (stream table: event counts by type)
SQL Example
-- 1. Create the inbox
SELECT pgtrickle.create_inbox(
'orders_inbox',
schema => 'pgtrickle',
max_retries => 3,
with_dead_letter => true,
with_stats => true,
schedule_seconds => 5
);
-- 2. External system inserts a message
INSERT INTO pgtrickle.orders_inbox (event_id, event_type, aggregate_id, payload)
VALUES (
gen_random_uuid()::text,
'order.placed',
'customer-123',
'{"order_id": 42, "amount": 99.50}'::jsonb
);
-- 3. Worker polls pending messages and processes
UPDATE pgtrickle.orders_inbox
SET processed_at = now()
WHERE event_id = '<event_id>'
AND processed_at IS NULL;
-- 4. Check inbox health
SELECT pgtrickle.inbox_health('orders_inbox');
-- 5. Replay failed messages
SELECT pgtrickle.replay_inbox_messages(
'orders_inbox',
ARRAY['event-id-1', 'event-id-2']
);
Per-Aggregate Ordering
When messages for the same customer / entity must be processed in sequence:
-- Enable ordering: only surface the next unprocessed message per aggregate
SELECT pgtrickle.enable_inbox_ordering(
'orders_inbox',
aggregate_id_col => 'aggregate_id',
seq_col => 'event_sequence'
);
-- Workers now read from next_orders_inbox (one row per aggregate)
SELECT * FROM pgtrickle.next_orders_inbox;
Multi-Worker Partitioning
Scale horizontally without external coordination:
-- Worker 0 of 4 handles its share of aggregates
SELECT * FROM pgtrickle.orders_inbox_pending
WHERE pgtrickle.inbox_is_my_partition(aggregate_id, 0, 4);
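The predicate above deterministically assigns each aggregate to exactly one worker. The idea can be sketched in Python — the modulo-hash scheme here is an assumption for illustration, not necessarily the hash pg_trickle uses:

```python
import hashlib

def is_my_partition(aggregate_id: str, worker: int, total_workers: int) -> bool:
    """Sketch of the idea behind pgtrickle.inbox_is_my_partition():
    hash the aggregate id and take it modulo the worker count."""
    digest = hashlib.sha256(aggregate_id.encode()).hexdigest()
    return int(digest, 16) % total_workers == worker

# Every aggregate belongs to exactly one of the 4 workers, so no two
# workers ever process messages for the same aggregate concurrently.
for agg in (f"customer-{i}" for i in range(100)):
    owners = [w for w in range(4) if is_my_partition(agg, w, 4)]
    assert len(owners) == 1
```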
Recommended Configuration
pg_trickle.inbox_enabled = true
pg_trickle.inbox_processed_retention_hours = 72 # keep 3 days of processed msgs
pg_trickle.inbox_dlq_retention_hours = 0 # keep DLQ forever for forensics
pg_trickle.inbox_dlq_alert_max_per_refresh = 10 # alert on DLQ growth
Anti-Patterns
- Not marking messages as processed: The `_pending` stream table will keep growing. Always set `processed_at = now()` after successful processing.
- Ignoring the DLQ: Monitor `orders_inbox_dlq` and replay or investigate failed messages regularly. Use `inbox_health()` in your alerting pipeline.
- Skipping idempotency: The inbox uses `event_id` for deduplication. Producers must supply stable, unique `event_id` values — typically a UUID derived from the source event.
When NOT to use this pattern
- Message volume is very high (>10,000/s) — the inbox table becomes a hot bottleneck; consider a dedicated message queue (Kafka, NATS) fronting a batch INSERT into the inbox.
- Processing is purely stateless and idempotency is guaranteed by the producer — writing to an inbox and querying `_pending` adds latency without benefit over a direct INSERT + trigger.
- The event source already provides at-least-once delivery with dedup — a second layer of dedup in the inbox wastes storage.
See also: Use Cases · Performance Cookbook · SQL Reference · Tutorials: What Happens on INSERT · Outbox Pattern · Inbox Pattern
Mental Model: How pg_trickle Works
This document explains the core concepts behind pg_trickle's differential view maintenance engine for developers who know SQL but have not studied incremental view maintenance (IVM) theory. Analogies come before formulas.
1. The Problem: Expensive Full Recomputation
A standard PostgreSQL materialized view is a snapshot. When the source data
changes, you call REFRESH MATERIALIZED VIEW and PostgreSQL re-runs the entire
defining query — scanning every row in every source table, applying all the
joins, filtering, and aggregating — every time.
For a billion-row orders table with 100 new orders since the last refresh, this is the equivalent of re-reading an entire library to update one paragraph.
pg_trickle solves this with differential maintenance: compute only the change in the view output caused by the change in the inputs.
2. The Key Insight: Deltas Are Just Rows
Think of a change to a table as a signed multiset of rows:
- `+1` for an inserted row
- `-1` for a deleted row
- `-1` for the old version of an updated row, `+1` for the new version
If the source table T changes by a delta ΔT, and the view V = f(T),
then the view output changes by ΔV = f(T + ΔT) - f(T).
For many SQL operators, ΔV can be computed without reading T at all —
only ΔT is needed. For a simple SELECT * FROM orders WHERE status = 'active':
ΔV = { new rows in ΔT where status = 'active' } - { deleted rows in ΔT where status = 'active' }
For a COUNT(*) aggregate, the delta is even simpler:
Δcount = (inserted active rows) - (deleted active rows)
This is why pg_trickle can refresh a stream table in milliseconds even when the source table has billions of rows.
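The arithmetic above can be made concrete with a toy model — a signed multiset of changed rows applied to a maintained count (illustrative Python, not extension code):

```python
# Toy model of differential maintenance for COUNT(*) over a filter.
# Deltas are signed rows: +1 for an insert, -1 for a delete.

def delta_count(delta_rows):
    """Sum of signs over delta rows that pass the view's filter."""
    return sum(sign for sign, row in delta_rows if row["status"] == "active")

view_count = 1_000_000  # current COUNT(*) WHERE status = 'active'
delta = [
    (+1, {"id": 1, "status": "active"}),    # INSERT
    (-1, {"id": 2, "status": "active"}),    # DELETE
    (+1, {"id": 3, "status": "inactive"}),  # INSERT (filtered out)
]
view_count += delta_count(delta)
assert view_count == 1_000_000  # +1 and -1 cancel; inactive row ignored
```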
3. Change Capture: The Change Buffer
Before pg_trickle can compute ΔV, it needs to know ΔT. It captures
changes using row-level AFTER triggers (the default) or WAL decoding.
Each source table gets a dedicated change buffer table:
pgtrickle_changes.changes_<source_table_oid>. The trigger writes every
inserted, updated, or deleted row into this buffer as part of the same
transaction as the DML. This gives you:
- Atomicity: A committed change is guaranteed to be in the buffer.
- No missed changes: There is no window between commit and capture.
- Snapshot isolation: The buffer holds the before/after images of each row.
The change buffer accumulates rows between refresh cycles. On each refresh, the DVM engine reads the buffer, computes the delta SQL, applies it to the stream table, and truncates the buffer.
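The buffer table for a given source can be located from the naming scheme above (the buffer's column layout is internal and not shown here):

```sql
-- Resolve the change-buffer table name for public.orders
SELECT format('pgtrickle_changes.changes_%s',
              'public.orders'::regclass::oid) AS change_buffer;
```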
4. The Delta SQL
For each stream table, pg_trickle pre-generates a delta SQL template at
creation time. This template is parameterized by the change buffer contents
and produces the ΔV rows to apply.
For a simple aggregation like:
SELECT customer_id, COUNT(*) AS order_count
FROM orders GROUP BY customer_id
The delta SQL looks roughly like:
-- Compute which customers changed in this refresh window
WITH changed_customers AS (
SELECT DISTINCT customer_id FROM pgtrickle_changes.changes_<oid>
),
-- Recompute count for only the affected customers
new_counts AS (
SELECT customer_id, COUNT(*) AS order_count
FROM orders
WHERE customer_id IN (SELECT customer_id FROM changed_customers)
GROUP BY customer_id
)
-- Apply: delete old rows, insert new rows for changed customers
MERGE INTO stream_table AS t
USING new_counts AS s ON t.customer_id = s.customer_id
WHEN MATCHED THEN UPDATE SET order_count = s.order_count
WHEN NOT MATCHED THEN INSERT VALUES (s.customer_id, s.order_count)
WHEN NOT MATCHED BY SOURCE AND t.customer_id IN (SELECT customer_id FROM changed_customers)
THEN DELETE;
The key property: the FROM orders scan is filtered to only the affected
customer IDs, not the full table. When 10 customers out of 10 million changed,
only 10 customer IDs are scanned.
5. Algebraic Operators: What Can Be Maintained Incrementally?
Not all SQL operators can be maintained in O(Δ). pg_trickle classifies them into categories:
✅ Fully Incremental (O(Δ))
- `SELECT` with filters, projections, casts
- `INNER JOIN`, `LEFT JOIN` (equi-join with indexed keys)
- `GROUP BY` with algebraic aggregates: `COUNT`, `SUM`, `MIN`, `MAX`, `AVG`
- `DISTINCT` (with reference counting)
- `UNION ALL`
- `WHERE EXISTS` / `WHERE NOT EXISTS` (converted to semi/anti-join)
- `HAVING` (filter on aggregate result)
⚠️ Conditionally Incremental
- `COUNT(DISTINCT x)` — incremental with algebraic Z-set counting
- `STDDEV`, `VARIANCE` — incremental using sum-of-squares decomposition
- Top-N (`ORDER BY ... LIMIT`) — incremental within the top-N window
- Multi-table joins — incremental, but delta SQL becomes larger with more tables
- `CUBE` / `ROLLUP` — expanded into UNION ALL branches, each incremental
❌ Not Incremental (falls back to FULL refresh)
- `TABLESAMPLE` — non-deterministic, cannot be differentiated
- `VOLATILE` functions (`random()`, `now()`, `nextval()`) in the SELECT list
- `ORDER BY` without `LIMIT` — full sort on every refresh
- `FETCH FIRST` without `ORDER BY` — non-deterministic
- Window functions in the output — planned for future support
- Recursive CTEs with `CYCLE` — non-terminating delta
When a query contains a non-incremental operator, pg_trickle automatically uses FULL refresh — replacing the stream table contents entirely. This is transparent to the application.
6. The Row Identity Problem
MERGE needs to know which rows in the stream table correspond to which rows
in the delta. This is the row identity problem.
For stream tables with a natural primary key in the output (e.g., customer_id),
the MERGE key is obvious. For aggregations without a natural key, or for queries
with complex output structures, pg_trickle computes a row identity hash
(__pgt_row_id) from the grouping keys or the query structure. This column is
maintained automatically and is invisible in normal SELECT * queries.
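The identity idea can be sketched as a stable hash over the grouping keys. The concrete hash and encoding behind `__pgt_row_id` are internal, so this Python sketch is illustrative only:

```python
import hashlib

def row_id(*group_keys) -> str:
    """Stable identity hash over grouping keys, sketching the role of
    __pgt_row_id. The real column's hash function is an internal detail."""
    material = "\x1f".join(repr(k) for k in group_keys)  # unit-separator join
    return hashlib.sha256(material.encode()).hexdigest()

# Same grouping keys -> same identity, so MERGE can match old and new rows:
assert row_id("customer-123", "2024-01") == row_id("customer-123", "2024-01")
assert row_id("customer-123", "2024-01") != row_id("customer-124", "2024-01")
```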
7. The Refresh Cycle
Source DML → CDC trigger → change buffer (same txn)
↓
Scheduler background worker (async)
↓
delta SQL → MERGE into stream table
↓
Truncate change buffer
The scheduler wakes every pg_trickle.scheduler_interval_ms (default 1s),
checks which stream tables are ready to refresh based on their schedule,
and runs the refresh in dependency (topological) order.
Key property: The application sees a consistent read of the stream table at all times. The MERGE either has fully committed or not. There is no partial update visible to readers.
8. DAG Chaining: Stream Tables as Sources
Stream tables can themselves be sources for other stream tables, forming a directed acyclic graph (DAG) of dependencies:
orders → orders_by_customer → customer_top10
↑
order_items
When orders changes, pg_trickle refreshes orders_by_customer first, then
uses its delta to refresh customer_top10. Each step is O(Δ), so the full
chain completes in time proportional to the number of changed rows — not the
total data size.
pg_trickle detects cycles and rejects stream table definitions that would
create them (unless pg_trickle.allow_circular = true, which enables fixpoint
iteration for convergent circular queries).
The scheduler runs refreshes in topological order and supports parallel
refresh (pg_trickle.parallel_refresh_mode = 'on', the default) to execute
independent branches of the DAG concurrently.
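The dependency ordering for the example DAG above can be sketched with a standard topological sort (Python's `graphlib`); this illustrates the ordering property, not the scheduler's implementation:

```python
from graphlib import TopologicalSorter

# node -> set of nodes it depends on (the example DAG above)
deps = {
    "orders_by_customer": {"orders", "order_items"},
    "customer_top10": {"orders_by_customer"},
}
order = list(TopologicalSorter(deps).static_order())

# Every stream table is refreshed only after all of its sources:
assert order.index("orders") < order.index("orders_by_customer")
assert order.index("order_items") < order.index("orders_by_customer")
assert order.index("orders_by_customer") < order.index("customer_top10")
```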
See Also
- LIMITATIONS.md — What pg_trickle cannot do and why
- ARCHITECTURE.md — Internal module structure
- DVM_REWRITE_RULES.md — SQL rewrite passes
- DVM_OPERATORS.md — Per-operator delta rules
- PERFORMANCE_CHEATSHEET.md — Quick performance guide
pg_trickle Limitations
This document covers what pg_trickle cannot do, the constraints of DIFFERENTIAL mode, source table restrictions, and operational anti-patterns. Use the decision tree at the end to quickly determine if your use case is supported.
Unsupported SQL Constructs
The following SQL features are not supported in the defining query of a
stream table. Attempting to use them will produce an UnsupportedOperator
error at creation time, or at the first refresh attempt.
In DIFFERENTIAL mode only
These constructs force a fallback to FULL refresh. The stream table is created successfully, but every refresh performs a full table recomputation:
| Construct | Reason | Workaround |
|---|---|---|
| `ORDER BY` without `LIMIT` | Result ordering is non-deterministic as a delta | Remove ORDER BY, or use `ORDER BY ... LIMIT N` for Top-N views |
| `TABLESAMPLE` | Non-deterministic sampling cannot be differentiated | Use FULL mode explicitly |
| `VOLATILE` functions in SELECT | `random()`, `now()`, `clock_timestamp()`, `nextval()` change on every call | Pre-compute volatile values in the source table, or use FULL mode |
| `STABLE` functions in GROUP BY keys | Key values change between refresh cycles | Use stable or immutable functions only |
| Window functions in output | `ROW_NUMBER()`, `RANK()`, `LEAD()`, `LAG()` require global reordering | Use FULL mode, or pre-aggregate and use Top-N views |
| `FETCH FIRST` without `ORDER BY` | Non-deterministic selection | Add a deterministic ORDER BY and use Top-N via LIMIT |
| `GROUPING SETS` beyond branch limit | Branch explosion prevents O(Δ) maintenance | Reduce dimensions, or raise `pg_trickle.max_grouping_set_branches` |
Not supported at all (any mode)
| Construct | Reason |
|---|---|
| `WITH RECURSIVE` | Recursive CTEs require convergence logic not yet implemented |
| DDL inside the defining query | `CREATE TABLE`, `CALL`, etc. are not valid in SELECT |
| `RETURNING` clauses | Not applicable to SELECT queries |
| `FOR UPDATE` / `FOR SHARE` | Locking hints cannot be used in defining queries |
| Subqueries with side effects | `INSERT ... RETURNING` in subqueries |
| `pg_catalog` internal tables as sources | Internal catalog tables are not tracked by CDC |
| Temp tables as sources | Temporary tables are session-scoped; CDC triggers cannot be installed |
DIFFERENTIAL Mode Constraints
Source table requirements
For DIFFERENTIAL mode to work correctly, each source table must:
- Have a primary key or unique index on the columns used as join keys. Without a reliable row identity, the MERGE step cannot match old and new versions of a row. pg_trickle can fall back to a hash-based row ID (`__pgt_row_id`) for sources without primary keys, but this adds overhead.
- Not use UNLOGGED or TEMPORARY storage for the stream table output. The stream table must survive a crash-recovery cycle. Source tables can be UNLOGGED (changes are still captured by triggers).
- Not be altered concurrently in ways that change column structure while a refresh is running. pg_trickle blocks source DDL by default (`pg_trickle.block_source_ddl = true`). Disabling this risks schema inconsistency between the change buffer and the stream table.
Multi-source join constraints
When a defining query joins multiple source tables:
- All join keys must be equi-joins (e.g., `t1.id = t2.id`). Range joins (`t1.ts BETWEEN t2.start AND t2.end`) force FULL mode.
- The number of delta CTEs grows with the number of sources. Queries joining 5+ large tables may hit the `pg_trickle.max_diff_ctes` limit. The default limit is 200; raise it or simplify the query.
- Left outer joins with nullable right-side keys add correctness complexity. pg_trickle handles them correctly, but the delta SQL is larger.
Aggregate constraints
| Aggregate | Supported? | Notes |
|---|---|---|
| `COUNT(*)` | ✅ Yes | Fully algebraic |
| `SUM(x)` | ✅ Yes | Fully algebraic |
| `MIN(x)`, `MAX(x)` | ✅ Yes | With reference counting |
| `AVG(x)` | ✅ Yes | Via sum + count decomposition |
| `STDDEV(x)`, `VARIANCE(x)` | ✅ Yes | Via sum-of-squares decomposition |
| `COUNT(DISTINCT x)` | ✅ Yes | Via Z-set algebraic counting |
| `ARRAY_AGG(x)` | ❌ No | Order-dependent; use FULL mode |
| `STRING_AGG(x, sep)` | ❌ No | Order-dependent; use FULL mode |
| `JSON_AGG(x)` | ❌ No | Order-dependent; use FULL mode |
| `PERCENTILE_CONT(f) WITHIN GROUP (ORDER BY x)` | ❌ No | Requires global sort |
| `MODE()` | ❌ No | Requires global frequency computation |
| Custom user-defined aggregates | ⚠️ Maybe | Supported if the aggregate provides an sfunc + finalfunc that pg_trickle can decompose; aggregates marked STRICT are rejected |
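As an illustration of the sum + count decomposition listed for `AVG(x)` above (a toy Python model, not extension code):

```python
# AVG maintained incrementally by tracking sum and count separately.
state = {"sum": 0.0, "count": 0}

def apply_delta(state, signed_rows):
    """signed_rows: (sign, value) pairs; +1 insert, -1 delete.
    Returns the new average without rescanning the base table."""
    for sign, x in signed_rows:
        state["sum"] += sign * x
        state["count"] += sign
    return state["sum"] / state["count"] if state["count"] else None

assert apply_delta(state, [(+1, 10.0), (+1, 30.0)]) == 20.0  # avg(10, 30)
assert apply_delta(state, [(-1, 30.0), (+1, 20.0)]) == 15.0  # avg(10, 20)
```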
Source Table Restrictions
Supported source types
| Source type | Supported? | Notes |
|---|---|---|
| Regular heap tables | ✅ Yes | Full CDC support |
| Partitioned tables (declarative) | ✅ Yes | Triggers installed on each partition |
| Foreign tables (postgres_fdw) | ✅ Yes | Snapshot-comparison mode |
| Materialized views | ✅ Yes | Snapshot-comparison mode |
| Other stream tables | ✅ Yes | DAG chaining supported |
| UNLOGGED tables | ✅ Yes (source) | Changes captured; stream table output must be logged |
| Temporary tables | ❌ No | Session-scoped; CDC triggers cannot persist |
| System catalogs (`pg_class`, etc.) | ❌ No | Not tracked by CDC |
| Views (non-materialized) | ❌ No | Automatically inlined; the base tables become the sources |
| Remote tables via dblink | ⚠️ Limited | Use foreign tables via postgres_fdw instead |
Column type constraints
- `text`-typed columns named as join keys work, but are less efficient than integer or UUID keys. Use an index on the join key columns.
- `jsonb` columns in GROUP BY are supported, but hash joins on JSONB are expensive. Consider extracting the key sub-field.
- `bytea` columns work in the output but cannot be used as GROUP BY keys in DIFFERENTIAL mode.
Operational Anti-Patterns
Anti-pattern 1: Very high write rates with low schedules
Problem: If a source table receives 100K inserts/second and the stream table schedule is 1 second, the change buffer accumulates 100K rows per cycle. The DIFFERENTIAL delta SQL must process all 100K rows on every refresh, which may take longer than 1 second — causing the scheduler to fall behind.
Fix:
- Increase the schedule to allow batching: `schedule => '10s'`
- Enable the adaptive fallback: `pg_trickle.differential_max_change_ratio = 0.15` (default: fall back to FULL when > 15% of the source table changed)
- Use `pg_trickle.max_delta_estimate_rows` to cap delta size
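Combining the first two mitigations might look like this (the stream table name is illustrative; the ratio is the documented default):

```sql
-- Batch more changes per refresh cycle
SELECT pgtrickle.alter_stream_table('public.orders_agg', schedule => '10s');

-- Fall back to FULL when more than 15% of the source changed
ALTER SYSTEM SET pg_trickle.differential_max_change_ratio = 0.15;
SELECT pg_reload_conf();
```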
Anti-pattern 2: Unbounded DAG depth
Problem: A chain of 20+ stream tables where each depends on the previous creates O(depth) sequential refresh latency on every cycle.
Fix: Flatten the DAG where possible. Use parallel refresh
(pg_trickle.parallel_refresh_mode = 'on') for independent branches.
Consider whether intermediate stream tables are necessary.
Anti-pattern 3: Schema changes on live sources
Problem: ALTER TABLE ... DROP COLUMN on a source table while pg_trickle
is running will break the change buffer schema and cause refresh errors.
Fix: Keep pg_trickle.block_source_ddl = true (the default). This causes
schema changes to fail with a descriptive error; you can then update the stream
table query explicitly before re-applying the schema change.
Anti-pattern 4: Treating stream tables as application write targets
Problem: Inserting or updating rows directly in a stream table bypasses pg_trickle's refresh logic. On the next refresh, the direct writes will be overwritten.
Fix: Stream tables are read-only from the application's perspective.
All writes must go through the source tables. Use the
pgtrickle.repair_stream_table() function if a stream table gets into an
inconsistent state.
Anti-pattern 5: Using pg_trickle.enabled = false in production
Problem: Setting pg_trickle.enabled = false globally stops all refreshes.
Change buffers accumulate indefinitely. Re-enabling causes a burst refresh of
all stream tables simultaneously.
Fix: Use pgtrickle.suspend_stream_table() to pause individual stream
tables, or pg_trickle.drain_mode = true to stop new work while completing
in-flight refreshes.
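The targeted alternatives look like this (the stream table name is illustrative):

```sql
-- Pause a single stream table; its change buffer keeps accumulating,
-- but no refreshes run until it is resumed
SELECT pgtrickle.suspend_stream_table('public.audit_log_summary');

-- Or: stop scheduling new refreshes while letting in-flight ones finish
ALTER SYSTEM SET pg_trickle.drain_mode = true;
SELECT pg_reload_conf();
```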
"Will This Work?" Decision Tree
Does your query use window functions in the output?
YES → Use FULL mode (refresh_mode => 'FULL')
NO ↓
Does your query use volatile functions (random(), now(), nextval())?
YES → Use FULL mode, or pre-compute the volatile value in the source
NO ↓
Does your query use ORDER BY without LIMIT?
YES → Remove ORDER BY, or use LIMIT N for a Top-N stream table
NO ↓
Does your query use WITH RECURSIVE?
YES → Not supported. Materialize the recursive query into a regular table first.
NO ↓
Do all join keys use equi-join conditions (= not BETWEEN / >=)?
NO → Use FULL mode, or rewrite the join condition
YES ↓
Does every source table have a primary key or unique index on the join key?
NO → pg_trickle will use hash-based row IDs (slightly less efficient, but works)
YES ↓
✅ Your query is a good candidate for DIFFERENTIAL mode.
Use: refresh_mode => 'DIFFERENTIAL' or 'AUTO' (default).
Multi-Column NOT IN with Nullable Elements (COR-1, v0.58.0)
When a defining query contains a multi-column NOT IN subquery such as:
SELECT a, b FROM t
WHERE (a, b) NOT IN (SELECT x, y FROM s)
pg_trickle v0.55.0 introduced an optimisation that rewrites (a, b) IN (SELECT x, y …) as a
SemiJoin and NOT IN as an AntiJoin. However, SQL semantics for NOT IN differ from
AntiJoin semantics when either side of the comparison can be NULL: SQL propagates
UNKNOWN (which excludes the outer row), whereas an AntiJoin keeps the outer row.
Behaviour: When any element on the left-hand side of the row constructor is a NULL
constant, or when any column in the subquery's SELECT list is a NULL literal,
pg_trickle v0.58.0 detects this condition and falls back to the subquery-based
(FULL refresh) execution path, emitting a NOTICE:
NOTICE: pg_trickle: multi-column NOT IN with nullable elements cannot be
rewritten to an anti-join; falling back to subquery-based delta computation.
Workaround: Rewrite using NOT EXISTS or add explicit IS NOT NULL guards to avoid
NULL-producing expressions in the row constructor.
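A sketch of the `NOT EXISTS` rewrite, with explicit guards so no NULL reaches the comparison (under these guards the anti-join and `NOT IN` semantics coincide):

```sql
SELECT a, b
FROM t
WHERE a IS NOT NULL
  AND b IS NOT NULL
  AND NOT EXISTS (
    SELECT 1 FROM s
    WHERE s.x = t.a
      AND s.y = t.b
  );
```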
Known Future Improvements
| Limitation | Planned in |
|---|---|
| Window functions in output | v1.1+ |
| `WITH RECURSIVE` support | v1.2+ |
| `STRING_AGG` / `ARRAY_AGG` incremental maintenance | Researching |
| Cross-database stream tables (without foreign tables) | Not planned |
See Also
- MENTAL_MODEL.md — Conceptual overview of how IVM works
- CONFIGURATION.md — GUC options to tune limits
- TROUBLESHOOTING.md — Diagnosing refresh problems
- ERRORS.md — Error reference
Performance Tuning Cookbook
This document is a practical, recipe-oriented guide to squeezing the best throughput and latency out of pg_trickle stream tables. Each recipe describes why a problem occurs, when to apply it, and how to implement the fix.
Table of Contents
- Choosing the Right Refresh Mode
- Tuning the Scheduler Interval
- Controlling Change-Buffer Growth
- Accelerating Wide-Join Queries
- Reducing Lock Contention
- Managing Spill-to-Disk in Large Deltas
- Speeding Up FULL Refresh with Parallelism
- Monitoring with Prometheus
- Partition-Aware Stream Tables
- Adaptive Threshold Tuning
- Canary Testing Query Changes
- Recovering from Stale Stream Tables
1. Choosing the Right Refresh Mode
Problem: DIFFERENTIAL refresh is slower than expected, or FULL refresh keeps being chosen by the adaptive engine when you expect DIFFERENTIAL.
Diagnosis: Run the diagnostics helper:
SELECT * FROM pgtrickle.diagnose_stream_table('public.orders_mv');
Look at recommended_mode, composite_score, and change_ratio_current.
Recipe — Force DIFFERENTIAL for low-churn tables:
SELECT pgtrickle.alter_stream_table(
'public.orders_mv',
refresh_mode => 'DIFFERENTIAL'
);
Use this when:
change_ratio_current < 0.05(less than 5% of rows change per tick)- The query has no DISTINCT, EXCEPT, or INTERSECT at the top level
- The table has a suitable covering index on the join/group-by columns
Recipe — Force FULL for high-churn or complex queries:
SELECT pgtrickle.alter_stream_table(
'public.summary_mv',
refresh_mode => 'FULL'
);
Use this when:
change_ratio_current > 0.30- The query contains
WITH RECURSIVE, complex GROUPING SETS, or multiple correlated subqueries
Recipe — Use AUTO (recommended default):
SELECT pgtrickle.alter_stream_table(
'public.orders_mv',
refresh_mode => 'AUTO'
);
AUTO switches between FULL and DIFFERENTIAL each cycle based on the
adaptive cost model (pg_trickle.cost_model_safety_margin).
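The safety margin creates a "dead zone" that prevents mode flapping: the other mode must win by a clear margin before AUTO switches. A minimal illustrative model (the function, scoring convention, and inputs below are hypothetical; only `cost_model_safety_margin` is a real setting):

```python
def choose_mode(diff_score: float, full_score: float,
                current: str, safety_margin: float = 0.20) -> str:
    """Pick FULL or DIFFERENTIAL for the next cycle, but only switch
    away from the current mode when the other mode is cheaper by more
    than the safety margin. Lower score = cheaper (toy convention)."""
    if current == "DIFFERENTIAL":
        # Stay differential unless FULL is cheaper by > margin.
        if full_score < diff_score * (1 - safety_margin):
            return "FULL"
        return "DIFFERENTIAL"
    # Symmetric rule when currently in FULL mode.
    if diff_score < full_score * (1 - safety_margin):
        return "DIFFERENTIAL"
    return "FULL"

# A small cost difference does not flip the mode (dead zone)...
print(choose_mode(1.0, 0.9, "DIFFERENTIAL"))  # DIFFERENTIAL
# ...a large one does.
print(choose_mode(1.0, 0.5, "DIFFERENTIAL"))  # FULL
```

Raising `cost_model_safety_margin` widens the dead zone, trading a little optimality for stability (see the Adaptive Threshold Tuning recipe).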
2. Tuning the Scheduler Interval
Problem: Stream tables are falling behind the source; refreshes are not running often enough. Or conversely, the scheduler is running too frequently, creating unnecessary load.
Diagnosis:
-- Check average staleness across all active stream tables
SELECT pgt_name, staleness_seconds
FROM pgtrickle.st_refresh_stats()
ORDER BY staleness_seconds DESC NULLS LAST;
Recipe — Reduce the poll interval for fresher data:
-- In postgresql.conf or via ALTER SYSTEM:
ALTER SYSTEM SET pg_trickle.scheduler_interval_ms = 250;
SELECT pg_reload_conf();
Minimum safe value: 250 ms. Below this, CPU overhead from the scheduler
loop becomes noticeable.
Recipe — Set a per-table schedule:
-- Refresh every 30 seconds
SELECT pgtrickle.alter_stream_table('public.orders_mv', schedule => '30s');
-- Refresh using a cron expression (every 5 minutes)
SELECT pgtrickle.alter_stream_table('public.daily_agg', schedule => '*/5 * * * *');
Per-table schedules override the global poll interval for that stream table.
3. Controlling Change-Buffer Growth
Problem: The change buffer schema (pgtrickle_changes.*) keeps growing
and consuming disk space.
Diagnosis:
SELECT schemaname, tablename,
pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) AS size
FROM pg_tables
WHERE schemaname = current_setting('pg_trickle.change_buffer_schema', true)
ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC
LIMIT 20;
Recipe — Reduce the WAL-to-buffer retention window:
-- Advance the frontier faster by refreshing more frequently
SELECT pgtrickle.alter_stream_table('public.orders_mv', schedule => '5s');
pg_trickle deletes change-buffer rows once every stream table that references the source has consumed them. Slow stream tables block cleanup.
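This cleanup rule is a classic low-watermark scheme: each consumer tracks the last change it has applied, and only rows below the minimum across all consumers can be reclaimed. A sketch of the idea (data structures and names are hypothetical, not the extension's internals):

```python
def cleanup_watermark(consumed: dict) -> int:
    """Change-buffer rows with seq <= watermark are safe to delete:
    every stream table reading this source has already applied them.
    `consumed` maps stream-table name -> last applied change sequence."""
    return min(consumed.values())

# Three stream tables read the same source; the slowest one (seq 120)
# pins the buffer, so rows with seq > 120 must be retained.
consumed = {"orders_mv": 500, "summary_mv": 480, "slow_mv": 120}
print(cleanup_watermark(consumed))  # 120
```

This is why one slow or suspended stream table can cause unbounded buffer growth for every other consumer of the same source.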
Recipe — Enable truncate-based cleanup (faster for large buffers):
ALTER SYSTEM SET pg_trickle.cleanup_use_truncate = on;
SELECT pg_reload_conf();
Uses TRUNCATE instead of DELETE when cleaning up entire partitioned
change-buffer tables. Avoids bloat from frequent deletes.
4. Accelerating Wide-Join Queries
Problem: DIFFERENTIAL refresh on a query with 5+ table joins is slow.
Diagnosis:
-- Check the join scan count
SELECT pgtrickle.validate_query($$ SELECT … FROM a JOIN b JOIN c … $$);
Recipe — Enable planner hints for wide joins:
ALTER SYSTEM SET pg_trickle.planner_aggressive = on;
ALTER SYSTEM SET pg_trickle.merge_planner_hints = on;
SELECT pg_reload_conf();
This sets SET LOCAL enable_seqscan = off and SET LOCAL join_collapse_limit = 1
before the MERGE execution, forcing the planner to use indexes.
Recipe — Limit differential join depth:
SELECT pgtrickle.alter_stream_table(
'public.complex_mv',
max_differential_joins => 4
);
When join count exceeds max_differential_joins, pg_trickle falls back to
FULL refresh instead of failing with a planning error.
Recipe — Add covering indexes on join keys:
-- The differential engine joins on __pgt_row_id; ensure the join keys
-- are indexed in both the storage table and source tables.
CREATE INDEX CONCURRENTLY ON orders (customer_id, order_date);
CREATE INDEX CONCURRENTLY ON customers (id) INCLUDE (name, region);
5. Reducing Lock Contention
Problem: lock timeout errors appear in pgt_refresh_history, or
queries against the stream table are blocked during refresh.
Diagnosis:
SELECT * FROM pgtrickle.diagnose_errors('public.orders_mv') LIMIT 10;
Look for error_type = 'performance' with lock timeout in error_message.
Recipe — Increase lock timeout:
ALTER SYSTEM SET pg_trickle.lock_timeout = '5s';
SELECT pg_reload_conf();
Recipe — Use APPEND_ONLY mode for insert-only pipelines:
SELECT pgtrickle.alter_stream_table(
'public.events_mv',
append_only => true
);
APPEND_ONLY skips the MERGE and uses a fast INSERT … SELECT which
holds locks for a much shorter time.
Recipe — Use pooler compatibility mode:
SELECT pgtrickle.alter_stream_table(
'public.orders_mv',
pooler_compatibility_mode => true
);
Disables prepared-statement reuse, which can cause issues with PgBouncer in transaction-pool mode.
6. Managing Spill-to-Disk in Large Deltas
Problem: Differential refresh writes large amounts of temp data, causing performance degradation.
Diagnosis:
SELECT pgt_name, last_temp_blks_written
FROM pgtrickle.st_refresh_stats();
Recipe — Increase work_mem for MERGE operations:
ALTER SYSTEM SET pg_trickle.merge_work_mem_mb = 256;
SELECT pg_reload_conf();
Recipe — Set a spill threshold to auto-switch to FULL:
-- Force FULL refresh after 3 consecutive spilling differentials
ALTER SYSTEM SET pg_trickle.spill_threshold_blocks = 10000;
ALTER SYSTEM SET pg_trickle.spill_consecutive_limit = 3;
SELECT pg_reload_conf();
After spill_consecutive_limit consecutive differential refreshes that
write more than spill_threshold_blocks temp blocks, pg_trickle switches
to FULL refresh for that stream table.
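The interaction of the two GUCs can be sketched as a simple counter (an illustrative model of the described behaviour, not the extension's code):

```python
def next_state(streak: int, temp_blks: int,
               threshold: int = 10000, limit: int = 3):
    """Track consecutive spilling differential refreshes.
    Returns (new_streak, mode_for_next_refresh). A refresh that stays
    under the threshold resets the streak."""
    streak = streak + 1 if temp_blks > threshold else 0
    mode = "FULL" if streak >= limit else "DIFFERENTIAL"
    return streak, mode

streak = 0
for blocks in [15000, 20000, 12000]:  # three spilling refreshes in a row
    streak, mode = next_state(streak, blocks)
print(mode)  # FULL once the consecutive limit is reached
```

Note that a single fast, non-spilling refresh resets the streak, so transient spikes do not force a permanent switch.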
7. Speeding Up FULL Refresh with Parallelism
Problem: FULL refresh is slow due to large source tables.
Recipe — Enable parallel query for FULL refresh:
-- Allow more parallel workers
ALTER SYSTEM SET max_parallel_workers_per_gather = 8;
ALTER SYSTEM SET parallel_tuple_cost = 0.01;
SELECT pg_reload_conf();
pg_trickle uses INSERT INTO … SELECT … which respects the standard
PostgreSQL parallel query settings.
Recipe — Enable partition-parallel refresh:
SELECT pgtrickle.alter_stream_table(
'public.orders_mv',
partition_by => 'region'
);
With partition_by, pg_trickle dispatches one refresh worker per partition,
running them in parallel.
8. Monitoring with Prometheus
Problem: You want to monitor pg_trickle metrics with Prometheus.
Recipe — Enable the built-in metrics endpoint (v0.21.0+):
ALTER SYSTEM SET pg_trickle.metrics_port = 9188;
SELECT pg_reload_conf();
Then configure Prometheus to scrape:
scrape_configs:
- job_name: pg_trickle
static_configs:
- targets: ['localhost:9188']
metrics_path: /metrics
Available metrics:
| Metric | Type | Description |
|---|---|---|
| pg_trickle_refreshes_total | counter | Successful refreshes per stream table |
| pg_trickle_refresh_failures_total | counter | Failed refreshes per stream table |
| pg_trickle_rows_changed_total | counter | Rows inserted + deleted per table |
| pg_trickle_consecutive_errors | gauge | Current error streak per table |
| pg_trickle_active | gauge | 1 if ACTIVE, 0 otherwise |
Recipe — Check staleness via SQL (for custom alerting):
SELECT pgt_name, staleness_seconds, stale
FROM pgtrickle.st_refresh_stats()
WHERE stale = true;
9. Partition-Aware Stream Tables
Problem: A stream table over a large partitioned source is slow to refresh.
Recipe — Mirror source partitioning:
-- Partition the stream table on a column from its output — ideally one
-- that aligns with the source table's partitioning key:
SELECT pgtrickle.create_stream_table(
'public.customer_totals',
'SELECT customer_id, SUM(total) AS total FROM orders GROUP BY customer_id',
partition_by => 'customer_id'
);
Recipe — Per-partition MERGE for HASH-partitioned targets:
pg_trickle automatically uses per-partition MERGE when the stream table is HASH-partitioned. No additional configuration is needed; the optimizer routes each row to the correct partition.
10. Adaptive Threshold Tuning
Problem: The adaptive engine keeps switching between FULL and DIFFERENTIAL unexpectedly.
Recipe — Widen the dead zone (less switching):
-- Require a 30% score difference before switching (default: 20%)
ALTER SYSTEM SET pg_trickle.cost_model_safety_margin = 0.30;
SELECT pg_reload_conf();
Recipe — Use self-monitoring analytics to auto-tune:
-- Let pg_trickle automatically apply threshold recommendations
ALTER SYSTEM SET pg_trickle.self_monitoring_auto_apply = 'threshold_only';
SELECT pg_reload_conf();
With threshold_only, pg_trickle applies max_delta_fraction changes
from pgtrickle.df_threshold_advice when confidence is HIGH.
11. Canary Testing Query Changes
Problem: You want to change a stream table's defining query safely without impacting production.
Recipe — Use canary/shadow mode (v0.21.0+):
-- 1. Create a canary table with the new query
SELECT pgtrickle.canary_begin(
'public.orders_mv',
'SELECT customer_id, COUNT(*), SUM(total) FROM orders GROUP BY customer_id'
);
-- 2. Wait for the canary to populate (check status)
SELECT status FROM pgtrickle.pgt_stream_tables
WHERE pgt_name = '__pgt_canary_orders_mv';
-- 3. Compare live vs canary output
SELECT * FROM pgtrickle.canary_diff('public.orders_mv');
-- 4. If diff is empty (or acceptable), promote the canary
SELECT pgtrickle.canary_promote('public.orders_mv');
The canary_diff result will be empty when both the old and new queries
produce identical output for the current source data.
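Conceptually, a canary diff is a multiset symmetric difference between the two query outputs: a row counts as a difference if it appears a different number of times in each. A hedged sketch of that comparison (illustrative only, not the extension's algorithm):

```python
from collections import Counter

def multiset_diff(live, canary):
    """Rows that occur a different number of times in the two outputs,
    reported as (row, live_count, canary_count). An empty result means
    the two queries produce identical output, duplicates included."""
    a, b = Counter(live), Counter(canary)
    return [(row, a[row], b[row]) for row in (a | b) if a[row] != b[row]]

live   = [(1, 10), (2, 20)]
canary = [(1, 10), (2, 25)]  # new query changed customer 2's total
print(multiset_diff(live, canary))
```

Multiset (rather than set) semantics matter because stream tables can legitimately contain duplicate rows when the defining query has no DISTINCT.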
12. Recovering from Stale Stream Tables
Problem: A stream table is SUSPENDED or has a large backlog of changes.
Recipe — Pause all tables, catch up, then resume:
SELECT pgtrickle.pause_all();
-- Investigate
SELECT pgt_name, status, consecutive_errors
FROM pgtrickle.pgt_stream_tables
ORDER BY consecutive_errors DESC;
-- Fix the root cause, then resume
SELECT pgtrickle.resume_all();
Recipe — Force immediate refresh on stale tables:
-- Refresh only if older than 10 minutes
SELECT pgtrickle.refresh_if_stale('public.orders_mv', '10 minutes');
Recipe — Full reinitialization after schema change:
-- If source schema changed, reinitialize to rebuild column metadata
SELECT pgtrickle.reinitialize_stream_table('public.orders_mv');
13. DVM Query Complexity Limits
Problem: Differential refresh is slower than full refresh for complex queries, especially at scale. Understanding when DIFFERENTIAL mode breaks down helps you choose the right strategy.
Three Failure Mode Categories
| Category | SQL Pattern | Symptom | Root Cause |
|---|---|---|---|
| Threshold Collapse | 4+ table JOINs with cascading EXCEPT ALL | Fast at small scale, 100–260× slower per data decade | Intermediate CTE cardinality blowup: O(n²) row generation from L₀ snapshot expansion |
| Early Collapse | EXISTS anti-join with non-equi predicates | 140× jump at first 10× scale step, then stable | Equi-join key filter not applied correctly; R_old EXCEPT ALL scans full table |
| Structural Bug | Doubly-nested correlated EXISTS / NOT EXISTS | Slow at all scales (constant ~2s overhead) | Inner R_old re-materialized per outer delta row: O(Δ_outer × n_inner) |
Which SQL Patterns Trigger Each Category
Threshold Collapse (queries like TPC-H Q05, Q07, Q08, Q09):
- Multi-table joins (4+ tables) using the cascading EXCEPT ALL delta strategy
- Queries with many intermediate join nodes generate exponential intermediate rows
- Diagnosis: pg_trickle.log_delta_sql = on + EXPLAIN (ANALYZE, BUFFERS)
Early Collapse (queries like TPC-H Q04):
- WHERE EXISTS (SELECT 1 FROM t WHERE t.key = outer.key AND t.col < t.col2)
- The non-equi predicates in the EXISTS clause can prevent key-filter extraction
- Diagnosis: check whether the R_old CTE scans the full right table
Structural Bug (queries like TPC-H Q20):
- WHERE EXISTS (SELECT 1 FROM t1 WHERE EXISTS (SELECT 1 FROM t2 WHERE ...))
- Inner snapshot CTEs are re-evaluated per outer row instead of being shared
- Diagnosis: look for repeated CTE evaluations in EXPLAIN ANALYZE
Recommended Scale Factors
| Pattern | Safe for DIFF | Use FULL above |
|---|---|---|
| Simple scan/filter | Any scale | — |
| 2-table JOIN | Up to ~10M rows | — |
| 3-table JOIN | Up to ~1M rows | ~10M rows |
| 4+ table JOIN | Up to ~100K rows | ~1M rows |
| EXISTS anti-join | Up to ~100K rows | ~1M rows |
| Nested EXISTS | Use FULL mode | — |
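The table above can be read as a simple lookup. A sketch of that decision as code (the pattern labels and helper are hypothetical, for illustration only):

```python
# Row-count ceilings taken from the scale-factor table above.
SAFE_DIFF_CEILING = {
    "simple_scan":   float("inf"),  # safe at any scale
    "join_2":        10_000_000,
    "join_3":         1_000_000,
    "join_4_plus":      100_000,
    "exists_anti":      100_000,
    "nested_exists":          0,    # always use FULL mode
}

def suggested_mode(pattern: str, rows: int) -> str:
    """Recommend DIFFERENTIAL below the ceiling, FULL above it."""
    return "DIFFERENTIAL" if rows <= SAFE_DIFF_CEILING[pattern] else "FULL"

print(suggested_mode("join_3", 500_000))         # DIFFERENTIAL
print(suggested_mode("join_4_plus", 2_000_000))  # FULL
```

These ceilings are rough guides; always confirm with the diagnostics in the next section before forcing a mode.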
Diagnosing Your Query
-- 1. Enable delta SQL logging
SET pg_trickle.log_delta_sql = on;
-- 2. Trigger a manual refresh
SELECT pgtrickle.refresh_stream_table('my_stream_table');
-- 3. Check the PostgreSQL log for the generated delta SQL
-- 4. Run EXPLAIN (ANALYZE, BUFFERS) on the captured SQL
-- 5. Look for:
-- - Nested Loop joins on large tables (threshold collapse)
-- - Sequential scans on R_old CTEs (early collapse)
-- - Repeated CTE evaluations (structural bug)
-- Use explain_diff_sql() to inspect without executing:
SELECT pgtrickle.explain_diff_sql('my_stream_table');
Mitigation GUCs
-- Increase work_mem for delta execution
SET pg_trickle.delta_work_mem = 256; -- MB
-- Disable nested loops for delta execution
SET pg_trickle.delta_enable_nestloop = off;
-- Run ANALYZE on change buffers (enabled by default)
SET pg_trickle.analyze_before_delta = on;
Worked Example A — max_diff_ctes Hit and Recovery
Symptom: EXPLAIN ANALYZE shows more than pg_trickle.max_diff_ctes
(default 64) CTEs in the generated delta SQL, and the refresh falls back to
FULL mode with a warning in pgt_refresh_history.
Diagnosis:
-- Check the warning in the refresh history
SELECT pgt_name, refresh_mode, rows_in_last_refresh, warning_message
FROM pgtrickle.pgt_refresh_history
WHERE pgt_name = 'my_complex_view'
ORDER BY started_at DESC
LIMIT 5;
-- Inspect the generated delta SQL
SELECT pgtrickle.explain_diff_sql('my_complex_view');
-- Count the CTE blocks in the output
Recovery steps:
-- Option 1: Raise the limit (accept higher delta execution cost)
ALTER SYSTEM SET pg_trickle.max_diff_ctes = 128;
SELECT pg_reload_conf();
-- Option 2: Simplify the query — split a complex view into two stream tables
-- First level: join + filter
SELECT pgtrickle.create_stream_table(
'orders_with_products',
'SELECT o.*, p.name AS product_name FROM orders o JOIN products p ON p.id = o.product_id',
'10s', 'DIFFERENTIAL'
);
-- Second level: aggregate over the first
SELECT pgtrickle.create_stream_table(
'revenue_summary',
'SELECT product_name, SUM(amount) AS total FROM orders_with_products GROUP BY product_name',
'15s', 'DIFFERENTIAL'
);
-- Option 3: Force FULL mode for queries that genuinely exceed complexity budget
SELECT pgtrickle.alter_stream_table('my_complex_view', refresh_mode => 'FULL');
Worked Example B — Detecting When FULL Beats DIFFERENTIAL
Symptom: The AUTO cost model keeps switching between FULL and DIFFERENTIAL
every few cycles, or diff_speedup from refresh_efficiency() is below 1.5×.
Diagnosis using recommend_refresh_mode():
-- Get the weighted signal breakdown for the table
SELECT
pgt_name,
current_mode,
recommended_mode,
confidence,
reason,
jsonb_pretty(signals) AS signals
FROM pgtrickle.recommend_refresh_mode('my_table');
Examine the signals output. Key indicators that FULL is better:
| Signal | Value that favours FULL |
|---|---|
| change_ratio_avg | > 0.30 (>30% of rows change per tick) |
| empirical_timing | DIFF and FULL latency within 10% |
| latency_variance | p95/p50 > 3 for DIFFERENTIAL |
| query_complexity | Score < 0 (many joins / CTEs) |
Apply the recommendation:
-- Switch to FULL when composite_score < -0.15
SELECT pgtrickle.alter_stream_table('my_table', refresh_mode => 'FULL');
-- Switch to AUTO and let the cost model decide going forward
SELECT pgtrickle.alter_stream_table('my_table', refresh_mode => 'AUTO');
-- Set the switching dead-zone wider to reduce oscillation
ALTER SYSTEM SET pg_trickle.cost_model_safety_margin = 0.25;
SELECT pg_reload_conf();
Worked Example C — Deep-Join Chain and max_differential_joins
Symptom: A stream table with a 6-way JOIN is slow in DIFFERENTIAL mode, even though individual tables are small.
Diagnosis:
-- Enable delta SQL logging and trigger a refresh
SET pg_trickle.log_delta_sql = on;
SELECT pgtrickle.refresh_stream_table('deep_join_view');
-- Check effective join count reported by the engine
SELECT pgt_name, query_join_depth, last_refresh_mode_reason
FROM pgtrickle.pgt_stream_tables
WHERE pgt_name = 'deep_join_view';
Understanding the GUC:
pg_trickle.max_differential_joins (default: 4) sets the maximum number of
right-side scan expansions the delta engine will attempt before falling back
to FULL mode. Each additional join roughly doubles the number of delta CTE
branches generated.
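Under that doubling assumption, a toy model shows why splitting the chain helps (illustrative arithmetic only, not the engine's actual branch count):

```python
def delta_branches(n_joins: int) -> int:
    """Rough delta-CTE branch count if each additional join doubles
    the expansion (simplified model of the cascading delta strategy)."""
    return 2 ** n_joins

# One 6-way join versus two layered 3-way joins:
monolithic = delta_branches(6)                      # ~64 branches
layered = delta_branches(3) + delta_branches(3)     # ~16 branches
print(monolithic, layered)
```

This is why the two-layer rewrite below keeps each step within the default `max_differential_joins = 4` budget while doing strictly less delta work overall.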
Tuning steps:
-- Option 1: Allow deeper join differentiation (accept higher delta cost)
ALTER SYSTEM SET pg_trickle.max_differential_joins = 6;
SELECT pg_reload_conf();
-- Option 2: Intermediate stream table to break the join chain
-- Split a 6-way join into two 3-way steps:
SELECT pgtrickle.create_stream_table(
'join_layer_1',
$$SELECT a.*, b.val AS b_val, c.val AS c_val
FROM table_a a
JOIN table_b b ON b.id = a.b_id
JOIN table_c c ON c.id = a.c_id$$,
'5s', 'DIFFERENTIAL'
);
SELECT pgtrickle.create_stream_table(
'join_layer_2',
$$SELECT l1.*, d.val AS d_val, e.val AS e_val, f.val AS f_val
FROM join_layer_1 l1
JOIN table_d d ON d.id = l1.d_id
JOIN table_e e ON e.id = l1.e_id
JOIN table_f f ON f.id = l1.f_id$$,
'10s', 'DIFFERENTIAL'
);
-- Option 3: Verify the deep-join fast-path is eligible for a given query
SELECT pgtrickle.validate_query(
$$<your deep-join query>$$
);
Breaking the join chain into two stream tables reduces each step to ≤3
right-side expansions, well within the default max_differential_joins = 4
limit.
See Also
- docs/CONFIGURATION.md — full GUC reference
- docs/SQL_REFERENCE.md — SQL function reference
- docs/TROUBLESHOOTING.md — common error messages and fixes
- docs/BENCHMARK.md — benchmark results and methodology
- docs/SCALING.md — guidance for large deployments
Performance Cheat Sheet
Quick reference for pg_trickle performance tuning. For in-depth explanations, see CONFIGURATION.md and PERFORMANCE_COOKBOOK.md.
Three Golden Rules
- Measure before tuning. Use pgtrickle.explain_st('my_table') and pgtrickle.refresh_history('my_table') to understand where time is spent before adjusting any GUC.
- DIFFERENTIAL is almost always faster for small deltas. Only switch to FULL mode when the change-to-table ratio is high (> 15%) or the delta SQL is genuinely slower than a full scan on your data distribution.
- The bottleneck is usually the change buffer or work_mem. If refreshes are slow, check pg_stat_statements for temp block writes before adjusting schedule frequency.
Top-10 GUC Quick Wins
| GUC | Default | Tune when... | Recommended value |
|---|---|---|---|
| pg_trickle.parallel_refresh_mode | 'on' | You have many independent stream tables | Keep 'on'; set 'off' only to debug |
| pg_trickle.max_concurrent_refreshes | 4 | Parallelism is bottlenecked or over-saturating I/O | Set to number of independent DAG branches (2–8 typical) |
| pg_trickle.differential_max_change_ratio | 0.15 | DIFF is slower than FULL at peak write rates | Raise to 0.30–0.50 on high-churn workloads |
| pg_trickle.delta_work_mem | 0 (inherit) | Refresh spills temp blocks | Set to 128 (MB) for complex joins; 256+ for large aggregates |
| pg_trickle.analyze_before_delta | true | Planner picks bad plans on stale stats | Keep true; set false only if ANALYZE overhead is measurable |
| pg_trickle.aggregate_fast_path | true | Aggregation refreshes are slow | Keep true (uses explicit DML instead of MERGE for simple aggregates) |
| pg_trickle.scheduler_interval_ms | 1000 | Scheduler CPU overhead is high | Raise to 5000–10000 on clusters with 100+ stream tables |
| pg_trickle.cleanup_use_truncate | true | Change buffer cleanup causes lock contention | Set false if TRUNCATE AccessExclusiveLock conflicts with source DML |
| pg_trickle.tiered_scheduling | true | Cold stream tables waste CPU cycles | Keep true (prevents cold STs from refreshing at full speed) |
| pg_trickle.max_delta_estimate_rows | 0 | OOM or excessive temp spill on large deltas | Set to 100000–500000 to cap delta size and trigger FULL fallback |
5 FULL-Fallback Patterns and How to Fix Them
These patterns cause pg_trickle to fall back to FULL refresh automatically. Each can often be rewritten for DIFFERENTIAL support.
Pattern 1: Volatile function in SELECT
-- ❌ Forces FULL: now() is volatile
SELECT id, created_at, now() - created_at AS age FROM orders;
-- ✅ DIFFERENTIAL: compute age in the source table or exclude it
SELECT id, created_at FROM orders;
-- Then compute age in the application or in a wrapper view
Pattern 2: ORDER BY without LIMIT
-- ❌ Forces FULL: full sort on every refresh
SELECT customer_id, SUM(amount) AS total
FROM orders GROUP BY customer_id
ORDER BY total DESC;
-- ✅ DIFFERENTIAL: remove ORDER BY (sort in the query layer)
SELECT customer_id, SUM(amount) AS total
FROM orders GROUP BY customer_id;
-- Sort in the SELECT: SELECT * FROM my_stream ORDER BY total DESC
Pattern 3: Non-equi join
-- ❌ Forces FULL: range join cannot be differentiated
SELECT e.*, s.salary_band
FROM employees e
JOIN salary_bands s ON e.salary BETWEEN s.min AND s.max;
-- ✅ DIFFERENTIAL: pre-classify salary_band in the source table
ALTER TABLE employees ADD COLUMN salary_band TEXT;
-- Update via trigger or background job
SELECT e.*, e.salary_band FROM employees e;
Pattern 4: ARRAY_AGG / STRING_AGG
-- ❌ Forces FULL: order-dependent aggregate
SELECT customer_id, STRING_AGG(product, ', ') AS products
FROM order_lines GROUP BY customer_id;
-- ✅ DIFFERENTIAL: use COUNT or a separate denormalized column
SELECT customer_id, COUNT(*) AS product_count
FROM order_lines GROUP BY customer_id;
-- If you need the array: maintain it in the source table
Pattern 5: Window function in output
-- ❌ Forces FULL: window function requires global ordering
SELECT customer_id, amount,
RANK() OVER (ORDER BY amount DESC) AS rank
FROM orders;
-- ✅ Use a Top-N stream table instead
SELECT pgtrickle.create_stream_table(
name => 'top_orders',
query => 'SELECT customer_id, amount FROM orders ORDER BY amount DESC LIMIT 100',
refresh_mode => 'DIFFERENTIAL' -- pg_trickle supports ORDER BY LIMIT
);
Refresh Latency Quick Diagnostics
-- See last N refresh durations for a stream table
SELECT started_at, duration_ms, mode, rows_changed
FROM pgtrickle.refresh_history('my_stream_table', limit => 20)
ORDER BY started_at DESC;
-- Check if delta is spilling to disk
SELECT query, temp_blks_written
FROM pg_stat_statements
WHERE query LIKE '%pgtrickle_changes%'
ORDER BY temp_blks_written DESC
LIMIT 10;
-- See the generated delta SQL
SELECT pgtrickle.explain_diff_sql('my_stream_table');
-- Check change buffer size
SELECT schemaname, tablename, n_live_tup
FROM pg_stat_user_tables
WHERE schemaname = 'pgtrickle_changes'
ORDER BY n_live_tup DESC
LIMIT 10;
See Also
- CONFIGURATION.md — Full GUC reference
- PERFORMANCE_COOKBOOK.md — In-depth recipes
- MENTAL_MODEL.md — Why DIFFERENTIAL is fast
- LIMITATIONS.md — What forces FULL mode
SQL Reference
Complete reference for all SQL functions, views, and catalog tables provided by pg_trickle.
Table of Contents
- Functions
- Expression Support
- Conditional Expressions
- Comparison Operators
- Boolean Tests
- SQL Value Functions
- Array and Row Expressions
- Subquery Expressions
- Auto-Rewrite Pipeline
- HAVING Clause
- Tables Without Primary Keys (Keyless Tables)
- Volatile Function Detection
- COLLATE Expressions
- IS JSON Predicate (PostgreSQL 16+)
- SQL/JSON Constructors (PostgreSQL 16+)
- JSON_TABLE (PostgreSQL 17+)
- Unsupported Expression Types
- Restrictions & Interoperability
- Referencing Other Stream Tables
- Views as Sources in Defining Queries
- Partitioned Tables as Sources
- Foreign Tables as Sources
- IMMEDIATE Mode Query Restrictions
- Logical Replication Targets
- Views on Stream Tables
- Materialized Views on Stream Tables
- Logical Replication of Stream Tables
- Known Delta Computation Limitations
- What Is NOT Allowed
- Row-Level Security (RLS)
- Views
- Catalog Tables
- Delta SQL Profiling (v0.13.0)
- dbt Integration (v0.13.0)
- Stream Table Snapshots (v0.27.0)
- Transactional Outbox & Consumer Groups (v0.28.0)
- Transactional Inbox (v0.28.0)
Functions
Core Lifecycle
Create, modify, and manage the lifecycle of stream tables.
pgtrickle.create_stream_table
Create a new stream table.
pgtrickle.create_stream_table(
name text,
query text,
schedule text DEFAULT 'calculated',
refresh_mode text DEFAULT 'AUTO',
initialize bool DEFAULT true,
diamond_consistency text DEFAULT NULL,
diamond_schedule_policy text DEFAULT NULL,
cdc_mode text DEFAULT NULL,
append_only bool DEFAULT false,
pooler_compatibility_mode bool DEFAULT false
) → void
Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| name | text | — | Name of the stream table. May be schema-qualified (myschema.my_st). Defaults to the public schema. |
| query | text | — | The defining SQL query. Must be a valid SELECT statement using supported operators. |
| schedule | text | 'calculated' | Refresh schedule as a Prometheus/GNU-style duration string (e.g., '30s', '5m', '1h', '1h30m', '1d') or a cron expression (e.g., '*/5 * * * *', '@hourly'). Use 'calculated' for CALCULATED mode (inherits schedule from downstream dependents). |
| refresh_mode | text | 'AUTO' | 'AUTO' (adaptive — uses DIFFERENTIAL when possible, falls back to FULL if the query is not differentiable), 'FULL' (truncate and reload), 'DIFFERENTIAL' (apply delta only — errors if the query is not differentiable), or 'IMMEDIATE' (synchronous in-transaction maintenance via statement-level triggers). |
| initialize | bool | true | If true, populates the table immediately via a full refresh. If false, creates the table empty. |
| diamond_consistency | text | NULL (defaults to 'atomic') | Diamond dependency consistency mode: 'atomic' (SAVEPOINT-based atomic group refresh) or 'none' (independent refresh). |
| diamond_schedule_policy | text | NULL (defaults to 'fastest') | Schedule policy for atomic diamond groups: 'fastest' (fire when any member is due) or 'slowest' (fire when all are due). Set on the convergence node. |
| cdc_mode | text | NULL (use pg_trickle.cdc_mode) | Optional per-stream-table CDC override: 'auto', 'trigger', or 'wal'. This affects all deferred TABLE sources of the stream table. |
| append_only | bool | false | When true, differential refreshes use a fast INSERT path instead of MERGE. Skips DELETE/UPDATE/IS DISTINCT FROM checks. If a DELETE or UPDATE is later detected in the change buffer, the flag is automatically reverted to false. Not compatible with FULL, IMMEDIATE, or keyless sources. |
| pooler_compatibility_mode | bool | false | When true, the refresh engine uses inline SQL instead of PREPARE/EXECUTE and suppresses all NOTIFY emissions for this stream table. Enable this when the stream table is accessed through a transaction-mode connection pooler (e.g. PgBouncer). |
When refresh_mode => 'IMMEDIATE', the cluster-wide pg_trickle.cdc_mode
setting is ignored. IMMEDIATE mode always uses statement-level IVM triggers
instead of CDC triggers or WAL replication slots. If you explicitly pass
cdc_mode => 'wal' together with refresh_mode => 'IMMEDIATE', pg_trickle
rejects the call because WAL CDC is asynchronous and incompatible with
in-transaction maintenance.
Duration format:
| Unit | Suffix | Example |
|---|---|---|
| Seconds | s | '30s' |
| Minutes | m | '5m' |
| Hours | h | '2h' |
| Days | d | '1d' |
| Weeks | w | '1w' |
| Compound | — | '1h30m', '2m30s' |
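The duration grammar above is simple enough to parse with a regular expression. An illustrative parser (not the extension's implementation) that converts a compound duration into seconds:

```python
import re

def parse_duration(s: str) -> int:
    """Parse a compound duration like '30s', '1h30m', or '1w' into
    seconds, following the unit table above. Rejects anything that is
    not an exact sequence of <number><unit> groups."""
    units = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800}
    parts = re.findall(r"(\d+)([smhdw])", s)
    # Reconstruct and compare to catch stray characters like '1h30x'.
    if not parts or "".join(n + u for n, u in parts) != s:
        raise ValueError(f"bad duration: {s!r}")
    return sum(int(n) * units[u] for n, u in parts)

print(parse_duration("30s"))    # 30
print(parse_duration("1h30m"))  # 5400
```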
Cron expression format:
schedule also accepts standard cron expressions for time-based scheduling. The scheduler refreshes the stream table when the cron schedule fires, rather than checking staleness.
| Format | Fields | Example | Description |
|---|---|---|---|
| 5-field | min hour dom mon dow | '*/5 * * * *' | Every 5 minutes |
| 6-field | sec min hour dom mon dow | '0 */5 * * * *' | Every 5 minutes at :00 seconds |
| Alias | — | '@hourly' | Every hour |
| Alias | — | '@daily' | Every day at midnight |
| Alias | — | '@weekly' | Every Sunday at midnight |
| Alias | — | '@monthly' | First of every month |
| Weekday range | — | '0 6 * * 1-5' | 6 AM on weekdays |
Note: Cron-scheduled stream tables do not participate in CALCULATED schedule resolution. The stale column in monitoring views returns NULL for cron-scheduled tables.
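To make the 5-field semantics concrete, here is a deliberately simplified cron matcher (an illustrative sketch, not the extension's scheduler; it omits aliases, seconds fields, and step-on-range forms):

```python
def field_matches(spec: str, value: int) -> bool:
    """Match one cron field: '*', '*/n', 'a-b', 'a,b,c', or a number."""
    for part in spec.split(","):
        if part == "*":
            return True
        if part.startswith("*/"):
            if value % int(part[2:]) == 0:
                return True
        elif "-" in part:
            lo, hi = map(int, part.split("-"))
            if lo <= value <= hi:
                return True
        elif int(part) == value:
            return True
    return False

def cron_due(expr: str, minute: int, hour: int,
             dom: int, mon: int, dow: int) -> bool:
    """True when a 5-field cron expression fires at the given time
    (dow 0 = Sunday)."""
    fields = expr.split()
    return all(field_matches(f, v) for f, v in
               zip(fields, (minute, hour, dom, mon, dow)))

print(cron_due("*/5 * * * *", minute=10, hour=3, dom=1, mon=1, dow=0))  # True
print(cron_due("0 6 * * 1-5", minute=0, hour=6, dom=14, mon=2, dow=6))  # False
```

The second call shows the weekday-range example from the table: '0 6 * * 1-5' does not fire on a Saturday (dow 6).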
Example:
-- Duration-based: refresh when data is staler than 2 minutes (refresh_mode defaults to 'AUTO')
SELECT pgtrickle.create_stream_table(
name => 'order_totals',
query => 'SELECT region, SUM(amount) AS total FROM orders GROUP BY region',
schedule => '2m'
);
-- Cron-based: refresh every hour
SELECT pgtrickle.create_stream_table(
name => 'hourly_summary',
query => 'SELECT date_trunc(''hour'', ts), COUNT(*) FROM events GROUP BY 1',
schedule => '@hourly',
refresh_mode => 'FULL'
);
-- Cron-based: refresh at 6 AM on weekdays
SELECT pgtrickle.create_stream_table(
name => 'daily_report',
query => 'SELECT region, SUM(revenue) AS total FROM sales GROUP BY region',
schedule => '0 6 * * 1-5',
refresh_mode => 'FULL'
);
-- Immediate mode: maintained synchronously within the same transaction
-- No schedule needed — updates happen automatically when base table changes
SELECT pgtrickle.create_stream_table(
name => 'live_totals',
query => 'SELECT region, SUM(amount) AS total FROM orders GROUP BY region',
refresh_mode => 'IMMEDIATE'
);
-- Force WAL CDC for this stream table even if the global GUC is 'trigger'
SELECT pgtrickle.create_stream_table(
name => 'wal_orders',
query => 'SELECT id, amount FROM orders',
schedule => '1s',
refresh_mode => 'DIFFERENTIAL',
cdc_mode => 'wal'
);
Aggregate Examples:
All supported aggregate functions work in AUTO mode (and all other modes).
Examples below omit refresh_mode — the default 'AUTO' selects DIFFERENTIAL automatically.
Explicit modes are shown only when the mode itself is being demonstrated.
-- Algebraic aggregates (fully differential — no rescan needed)
SELECT pgtrickle.create_stream_table(
name => 'sales_summary',
query => 'SELECT region, COUNT(*) AS cnt, SUM(amount) AS total, AVG(amount) AS avg_amount
FROM orders GROUP BY region',
schedule => '1m'
);
-- Semi-algebraic aggregates (MIN/MAX)
SELECT pgtrickle.create_stream_table(
name => 'salary_ranges',
query => 'SELECT department, MIN(salary) AS min_sal, MAX(salary) AS max_sal
FROM employees GROUP BY department',
schedule => '2m'
);
-- Group-rescan aggregates (BOOL_AND/OR, STRING_AGG, ARRAY_AGG, JSON_AGG, JSONB_AGG,
-- BIT_AND, BIT_OR, BIT_XOR, JSON_OBJECT_AGG, JSONB_OBJECT_AGG,
-- STDDEV, STDDEV_POP, STDDEV_SAMP, VARIANCE, VAR_POP, VAR_SAMP,
-- MODE, PERCENTILE_CONT, PERCENTILE_DISC,
-- CORR, COVAR_POP, COVAR_SAMP, REGR_AVGX, REGR_AVGY,
-- REGR_COUNT, REGR_INTERCEPT, REGR_R2, REGR_SLOPE,
-- REGR_SXX, REGR_SXY, REGR_SYY, ANY_VALUE)
SELECT pgtrickle.create_stream_table(
name => 'team_members',
query => 'SELECT department,
STRING_AGG(name, '', '' ORDER BY name) AS members,
ARRAY_AGG(employee_id) AS member_ids,
BOOL_AND(active) AS all_active,
JSON_AGG(name) AS members_json
FROM employees
GROUP BY department',
schedule => '1m'
);
-- Bitwise aggregates
SELECT pgtrickle.create_stream_table(
name => 'permission_summary',
query => 'SELECT department,
BIT_OR(permissions) AS combined_perms,
BIT_AND(permissions) AS common_perms,
BIT_XOR(flags) AS xor_flags
FROM employees
GROUP BY department',
schedule => '1m'
);
-- JSON object aggregates
SELECT pgtrickle.create_stream_table(
name => 'config_map',
query => 'SELECT department,
JSON_OBJECT_AGG(setting_name, setting_value) AS settings,
JSONB_OBJECT_AGG(key, value) AS metadata
FROM config
GROUP BY department',
schedule => '1m'
);
-- Statistical aggregates
SELECT pgtrickle.create_stream_table(
name => 'salary_stats',
query => 'SELECT department,
STDDEV_POP(salary) AS sd_pop,
STDDEV_SAMP(salary) AS sd_samp,
VAR_POP(salary) AS var_pop,
VAR_SAMP(salary) AS var_samp
FROM employees
GROUP BY department',
schedule => '1m'
);
-- Ordered-set aggregates (MODE, PERCENTILE_CONT, PERCENTILE_DISC)
SELECT pgtrickle.create_stream_table(
name => 'salary_percentiles',
query => 'SELECT department,
MODE() WITHIN GROUP (ORDER BY grade) AS most_common_grade,
PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY salary) AS median_salary,
PERCENTILE_DISC(0.9) WITHIN GROUP (ORDER BY salary) AS p90_salary
FROM employees
GROUP BY department',
schedule => '1m'
);
-- Regression / correlation aggregates (CORR, COVAR_*, REGR_*)
SELECT pgtrickle.create_stream_table(
name => 'regression_stats',
query => 'SELECT department,
CORR(salary, experience) AS sal_exp_corr,
COVAR_POP(salary, experience) AS covar_pop,
COVAR_SAMP(salary, experience) AS covar_samp,
REGR_SLOPE(salary, experience) AS slope,
REGR_INTERCEPT(salary, experience) AS intercept,
REGR_R2(salary, experience) AS r_squared,
REGR_COUNT(salary, experience) AS regr_n
FROM employees
GROUP BY department',
schedule => '1m'
);
-- ANY_VALUE aggregate (PostgreSQL 16+)
SELECT pgtrickle.create_stream_table(
name => 'dept_sample',
query => 'SELECT department, ANY_VALUE(office_location) AS sample_office
FROM employees GROUP BY department',
schedule => '1m'
);
-- FILTER clause on aggregates
SELECT pgtrickle.create_stream_table(
name => 'order_metrics',
query => 'SELECT region,
COUNT(*) AS total,
COUNT(*) FILTER (WHERE status = ''active'') AS active_count,
SUM(amount) FILTER (WHERE status = ''shipped'') AS shipped_total
FROM orders
GROUP BY region',
schedule => '1m'
);
-- PgBouncer compatibility (transaction-mode pooler)
SELECT pgtrickle.create_stream_table(
name => 'pooled_orders',
query => 'SELECT id, amount FROM orders',
schedule => '5m',
pooler_compatibility_mode => true
);
CTE Examples:
Non-recursive CTEs are fully supported in both FULL and DIFFERENTIAL modes:
-- Simple CTE
SELECT pgtrickle.create_stream_table(
name => 'active_order_totals',
query => 'WITH active_users AS (
SELECT id, name FROM users WHERE active = true
)
SELECT a.id, a.name, SUM(o.amount) AS total
FROM active_users a
JOIN orders o ON o.user_id = a.id
GROUP BY a.id, a.name',
schedule => '1m'
);
-- Chained CTEs (CTE referencing another CTE)
SELECT pgtrickle.create_stream_table(
name => 'top_regions',
query => 'WITH regional AS (
SELECT region, SUM(amount) AS total FROM orders GROUP BY region
),
ranked AS (
SELECT region, total FROM regional WHERE total > 1000
)
SELECT * FROM ranked',
schedule => '2m'
);
-- Multi-reference CTE (referenced twice in FROM — shared delta optimization)
SELECT pgtrickle.create_stream_table(
name => 'self_compare',
query => 'WITH totals AS (
SELECT user_id, SUM(amount) AS total FROM orders GROUP BY user_id
)
SELECT t1.user_id, t1.total, t2.total AS next_total
FROM totals t1
JOIN totals t2 ON t1.user_id = t2.user_id + 1',
schedule => '1m'
);
-- Append-only stream table (INSERT-only fast path)
SELECT pgtrickle.create_stream_table(
name => 'event_log_st',
query => 'SELECT id, event_type, payload, created_at FROM events',
schedule => '30s',
append_only => true
);
Recursive CTEs work with FULL, DIFFERENTIAL, and IMMEDIATE modes:
-- Recursive CTE (hierarchy traversal)
SELECT pgtrickle.create_stream_table(
name => 'category_tree',
query => 'WITH RECURSIVE cat_tree AS (
SELECT id, name, parent_id, 0 AS depth
FROM categories WHERE parent_id IS NULL
UNION ALL
SELECT c.id, c.name, c.parent_id, ct.depth + 1
FROM categories c
JOIN cat_tree ct ON c.parent_id = ct.id
)
SELECT * FROM cat_tree',
schedule => '5m',
refresh_mode => 'FULL' -- FULL mode: standard re-execution
);
-- Recursive CTE with DIFFERENTIAL mode (incremental semi-naive / DRed)
SELECT pgtrickle.create_stream_table(
name => 'org_chart',
query => 'WITH RECURSIVE reports AS (
SELECT id, name, manager_id FROM employees WHERE manager_id IS NULL
UNION ALL
SELECT e.id, e.name, e.manager_id
FROM employees e JOIN reports r ON e.manager_id = r.id
)
SELECT * FROM reports',
schedule => '2m',
refresh_mode => 'DIFFERENTIAL' -- Uses semi-naive, DRed, or recomputation (auto-selected)
);
-- Recursive CTE with IMMEDIATE mode (same-transaction maintenance)
SELECT pgtrickle.create_stream_table(
name => 'org_chart_live',
query => 'WITH RECURSIVE reports AS (
SELECT id, name, manager_id FROM employees WHERE manager_id IS NULL
UNION ALL
SELECT e.id, e.name, e.manager_id
FROM employees e JOIN reports r ON e.manager_id = r.id
)
SELECT * FROM reports',
refresh_mode => 'IMMEDIATE' -- Uses transition tables with semi-naive / DRed maintenance
);
Non-monotone recursive terms: If the recursive term contains operators like EXCEPT, aggregate functions, window functions, DISTINCT, INTERSECT (set semantics), or anti-joins, the system automatically falls back to recomputation to guarantee correctness. Semi-naive and DRed strategies require monotone recursive terms (JOIN, UNION ALL, filter/project only).
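As a sketch of this fallback, a recursive term containing DISTINCT is non-monotone, so a stream table like the following is accepted in DIFFERENTIAL mode but maintained by recomputation rather than semi-naive/DRed (the roots and edges tables are illustrative):

```sql
-- DISTINCT in the recursive term is non-monotone: DIFFERENTIAL mode is
-- accepted, but each refresh recomputes the recursive CTE instead of
-- propagating deltas.
SELECT pgtrickle.create_stream_table(
    name => 'reachable_nodes',
    query => 'WITH RECURSIVE reach AS (
        SELECT node_id FROM roots
        UNION ALL
        SELECT DISTINCT e.dst
        FROM edges e JOIN reach r ON e.src = r.node_id
    )
    SELECT * FROM reach',
    schedule => '2m',
    refresh_mode => 'DIFFERENTIAL'
);
```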
Set Operation Examples:
INTERSECT, INTERSECT ALL, EXCEPT, EXCEPT ALL, UNION, and UNION ALL are supported:
-- INTERSECT: customers who placed orders in BOTH regions
SELECT pgtrickle.create_stream_table(
name => 'bi_region_customers',
query => 'SELECT customer_id FROM orders_east
INTERSECT
SELECT customer_id FROM orders_west',
schedule => '2m'
);
-- INTERSECT ALL: preserves duplicates (bag semantics)
SELECT pgtrickle.create_stream_table(
name => 'common_items',
query => 'SELECT item_name FROM warehouse_a
INTERSECT ALL
SELECT item_name FROM warehouse_b',
schedule => '1m'
);
-- EXCEPT: orders not yet shipped
SELECT pgtrickle.create_stream_table(
name => 'unshipped_orders',
query => 'SELECT order_id FROM orders
EXCEPT
SELECT order_id FROM shipments',
schedule => '1m'
);
-- EXCEPT ALL: preserves duplicate counts (bag subtraction)
SELECT pgtrickle.create_stream_table(
name => 'excess_inventory',
query => 'SELECT sku FROM stock_received
EXCEPT ALL
SELECT sku FROM stock_shipped',
schedule => '5m'
);
-- UNION: deduplicated merge of two sources
SELECT pgtrickle.create_stream_table(
name => 'all_contacts',
query => 'SELECT email FROM customers
UNION
SELECT email FROM newsletter_subscribers',
schedule => '5m'
);
LATERAL Set-Returning Function Examples:
Set-returning functions (SRFs) in the FROM clause are supported in both FULL and DIFFERENTIAL modes. Common SRFs include jsonb_array_elements, jsonb_each, jsonb_each_text, and unnest:
-- Flatten JSONB arrays into rows
SELECT pgtrickle.create_stream_table(
name => 'flat_children',
query => 'SELECT p.id, child.value AS val
FROM parent_data p,
jsonb_array_elements(p.data->''children'') AS child',
schedule => '1m'
);
-- Expand JSONB key-value pairs (multi-column SRF)
SELECT pgtrickle.create_stream_table(
name => 'flat_properties',
query => 'SELECT d.id, kv.key, kv.value
FROM documents d,
jsonb_each(d.metadata) AS kv',
schedule => '2m'
);
-- Unnest arrays
SELECT pgtrickle.create_stream_table(
name => 'flat_tags',
query => 'SELECT t.id, tag.tag
FROM tagged_items t,
unnest(t.tags) AS tag(tag)',
schedule => '1m'
);
-- SRF with WHERE filter
SELECT pgtrickle.create_stream_table(
name => 'high_value_items',
query => 'SELECT p.id, (e.value)::int AS amount
FROM products p,
jsonb_array_elements(p.prices) AS e
WHERE (e.value)::int > 100',
schedule => '5m'
);
-- SRF combined with aggregation
SELECT pgtrickle.create_stream_table(
name => 'element_counts',
query => 'SELECT a.id, count(*) AS cnt
FROM arrays a,
jsonb_array_elements(a.data) AS e
GROUP BY a.id',
schedule => '1m',
refresh_mode => 'FULL'
);
LATERAL Subquery Examples:
LATERAL subqueries in the FROM clause are supported in both FULL and DIFFERENTIAL modes. Use them for top-N per group, correlated aggregation, and conditional expansion:
-- Top-N per group: latest item per order
SELECT pgtrickle.create_stream_table(
name => 'latest_items',
query => 'SELECT o.id, o.customer, latest.amount
FROM orders o,
LATERAL (
SELECT li.amount
FROM line_items li
WHERE li.order_id = o.id
ORDER BY li.created_at DESC
LIMIT 1
) AS latest',
schedule => '1m'
);
-- Correlated aggregate
SELECT pgtrickle.create_stream_table(
name => 'dept_summaries',
query => 'SELECT d.id, d.name, stats.total, stats.cnt
FROM departments d,
LATERAL (
SELECT SUM(e.salary) AS total, COUNT(*) AS cnt
FROM employees e
WHERE e.dept_id = d.id
) AS stats',
schedule => '1m'
);
-- LEFT JOIN LATERAL: preserve outer rows with NULLs when subquery returns no rows
SELECT pgtrickle.create_stream_table(
name => 'dept_stats_all',
query => 'SELECT d.id, d.name, stats.total
FROM departments d
LEFT JOIN LATERAL (
SELECT SUM(e.salary) AS total
FROM employees e
WHERE e.dept_id = d.id
) AS stats ON true',
schedule => '1m'
);
WHERE Subquery Examples:
Subqueries in the WHERE clause are automatically transformed into semi-join, anti-join, or scalar subquery operators in the DVM operator tree:
-- EXISTS subquery: customers who have placed orders
SELECT pgtrickle.create_stream_table(
name => 'active_customers',
query => 'SELECT c.id, c.name
FROM customers c
WHERE EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.id)',
schedule => '1m'
);
-- NOT EXISTS: customers with no orders
SELECT pgtrickle.create_stream_table(
name => 'inactive_customers',
query => 'SELECT c.id, c.name
FROM customers c
WHERE NOT EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.id)',
schedule => '1m'
);
-- IN subquery: products that have been ordered
SELECT pgtrickle.create_stream_table(
name => 'ordered_products',
query => 'SELECT p.id, p.name
FROM products p
WHERE p.id IN (SELECT product_id FROM order_items)',
schedule => '1m'
);
-- NOT IN subquery: products never ordered
SELECT pgtrickle.create_stream_table(
name => 'unordered_products',
query => 'SELECT p.id, p.name
FROM products p
WHERE p.id NOT IN (SELECT product_id FROM order_items)',
schedule => '1m'
);
-- Scalar subquery in SELECT list
SELECT pgtrickle.create_stream_table(
name => 'products_with_max_price',
query => 'SELECT p.id, p.name, (SELECT max(price) FROM products) AS max_price
FROM products p',
schedule => '1m'
);
Notes:
- The defining query is parsed into an operator tree and validated for DVM support.
- Views as sources — views referenced in the defining query are automatically inlined as subqueries (auto-rewrite pass #0). CDC triggers are created on the underlying base tables. Nested views (view → view → table) are fully expanded. The user's original query is preserved in original_query for reinit and introspection. Materialized views are rejected in DIFFERENTIAL mode (use FULL mode or the underlying query directly). Foreign tables are also rejected in DIFFERENTIAL mode.
- CDC triggers and change buffer tables are created automatically for each source table.
- TRUNCATE on source tables — when a source table is TRUNCATEd, a CDC trigger writes a marker row (action = 'T') into the change buffer. On the next refresh cycle, pg_trickle detects the marker and automatically falls back to a FULL refresh. For single-source stream tables where no subsequent DML occurred after the TRUNCATE, an optimized fast path deletes all ST rows directly without re-running the full defining query.
- The ST is registered in the dependency DAG; cycles are rejected.
- Non-recursive CTEs are inlined as subqueries during parsing (Tier 1). Multi-reference CTEs share delta computation (Tier 2).
- Recursive CTEs in DIFFERENTIAL mode use three strategies, auto-selected per refresh: semi-naive evaluation for INSERT-only changes, DRed (Delete-and-Rederive) for mixed DELETE/UPDATE changes, and recomputation fallback when CTE columns do not match ST storage columns. Non-monotone recursive terms (containing EXCEPT, Aggregate, Window, DISTINCT, AntiJoin, or INTERSECT SET) automatically fall back to recomputation to ensure correctness.
- DRed algorithm details — for mixed DELETE/UPDATE changes in DIFFERENTIAL mode, DRed (Delete-and-Rederive) proceeds in four steps: (1) semi-naive INSERT propagation; (2) over-deletion cascade from ST storage; (3) rederivation from the current source tables; (4) combination into net deletions. DRed correctly handles derived-column changes such as path rebuilds under a renamed ancestor node. When CTE output columns differ from ST storage columns, recomputation is used instead. Implemented in v0.10.0 (P2-1).
- LATERAL SRFs in DIFFERENTIAL mode use row-scoped recomputation: when a source row changes, only the SRF expansions for that row are re-evaluated.
- LATERAL subqueries in DIFFERENTIAL mode also use row-scoped recomputation: when an outer row changes, the correlated subquery is re-executed only for that row.
- WHERE subqueries (EXISTS, IN, scalar) are parsed into dedicated semi-join, anti-join, and scalar subquery operators with specialized delta computation. ALL (subquery) is the only subquery form that is currently rejected.
- ORDER BY is accepted but silently discarded — row order in the storage table is undefined (consistent with PostgreSQL's CREATE MATERIALIZED VIEW behavior). Apply ORDER BY when querying the stream table.
- TopK (ORDER BY + LIMIT) — when a top-level ORDER BY … LIMIT N is present (with a constant integer limit, optionally with OFFSET M), the query is recognized as a "TopK" pattern and accepted. TopK stream tables store exactly N rows (starting from position M+1 if OFFSET is specified) and are refreshed via a scoped-recomputation MERGE strategy. The DVM delta pipeline is bypassed; instead, each refresh re-evaluates the full ORDER BY + LIMIT [+ OFFSET] query and merges the result into the storage table. The catalog records topk_limit, topk_order_by, and optionally topk_offset for the stream table. TopK is not supported with set operations (UNION/INTERSECT/EXCEPT) or with GROUP BY ROLLUP/CUBE/GROUPING SETS.
- LIMIT / OFFSET without ORDER BY are rejected — stream tables materialize the full result set. Apply LIMIT when querying the stream table.
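A TopK stream table of the kind described in the notes might look like this (table and column names illustrative):

```sql
-- TopK: top 10 customers by total spend, refreshed via scoped recomputation
SELECT pgtrickle.create_stream_table(
    name => 'top_customers',
    query => 'SELECT customer_id, SUM(amount) AS total
              FROM orders
              GROUP BY customer_id
              ORDER BY total DESC
              LIMIT 10',
    schedule => '1m'
);
```

The stored result always contains at most 10 rows; querying the stream table does not require repeating the ORDER BY + LIMIT.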
pgtrickle.create_stream_table_if_not_exists
Create a stream table if it does not already exist. If a stream table with the
given name already exists, this is a no-op (an INFO message is logged).
The existing definition is never modified.
pgtrickle.create_stream_table_if_not_exists(
name text,
query text,
schedule text DEFAULT 'calculated',
refresh_mode text DEFAULT 'AUTO',
initialize bool DEFAULT true,
diamond_consistency text DEFAULT NULL,
diamond_schedule_policy text DEFAULT NULL,
cdc_mode text DEFAULT NULL,
append_only bool DEFAULT false,
pooler_compatibility_mode bool DEFAULT false
) → void
Parameters: Same as create_stream_table.
Example:
-- Safe to re-run in migrations:
SELECT pgtrickle.create_stream_table_if_not_exists(
'order_totals',
'SELECT customer_id, sum(amount) AS total FROM orders GROUP BY customer_id',
'1m',
'DIFFERENTIAL'
);
Notes:
- Useful for deployment / migration scripts that should be safe to re-run.
- If the stream table already exists, the provided query, schedule, and other parameters are ignored — the existing definition is preserved.
pgtrickle.create_or_replace_stream_table
Create a stream table if it does not exist, or replace the existing one if the definition changed. This is the declarative, idempotent API for deployment workflows (dbt, SQL migrations, GitOps).
pgtrickle.create_or_replace_stream_table(
name text,
query text,
schedule text DEFAULT 'calculated',
refresh_mode text DEFAULT 'AUTO',
initialize bool DEFAULT true,
diamond_consistency text DEFAULT NULL,
diamond_schedule_policy text DEFAULT NULL,
cdc_mode text DEFAULT NULL,
append_only bool DEFAULT false,
pooler_compatibility_mode bool DEFAULT false
) → void
Parameters: Same as create_stream_table.
Behavior:
| Current state | Action taken |
|---|---|
| Stream table does not exist | Create — identical to create_stream_table(...) |
| Stream table exists, query and all config identical | No-op — logs INFO, returns immediately |
| Stream table exists, query identical but config differs | Alter config — delegates to alter_stream_table(...) for schedule, refresh_mode, diamond settings, cdc_mode, append_only, pooler_compatibility_mode |
| Stream table exists, query differs | Replace query — in-place ALTER QUERY migration plus any config changes; a full refresh is applied |
The initialize parameter is honoured on create only. On replace, the stream table is always repopulated via a full refresh.
Query comparison uses the post-rewrite (normalized) form of the SQL. Cosmetic differences such as whitespace, casing, and extra parentheses are ignored.
Example:
-- Idempotent deployment — safe to run on every deploy:
SELECT pgtrickle.create_or_replace_stream_table(
name => 'order_totals',
query => 'SELECT region, SUM(amount) AS total FROM orders GROUP BY region',
schedule => '2m',
refresh_mode => 'DIFFERENTIAL'
);
-- If the query changed since last deploy, the stream table is
-- migrated in place (no data gap). If nothing changed, it's a no-op.
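Because comparison uses the normalized query, re-running the call with only cosmetic differences is a no-op. A sketch, reusing the order_totals example above:

```sql
-- Same query modulo casing and whitespace: logs INFO and returns immediately,
-- leaving the existing stream table untouched.
SELECT pgtrickle.create_or_replace_stream_table(
    name => 'order_totals',
    query => 'select region, sum(amount) as total from orders group by region',
    schedule => '2m',
    refresh_mode => 'DIFFERENTIAL'
);
```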
Notes:
- Mirrors PostgreSQL's CREATE OR REPLACE convention (CREATE OR REPLACE VIEW, CREATE OR REPLACE FUNCTION).
- Never drops the stream table — even for incompatible schema changes, the ALTER QUERY path rebuilds storage in place while preserving the catalog entry (pgt_id).
- For migration scripts that should not modify an existing definition, use create_stream_table_if_not_exists instead.
pgtrickle.bulk_create
Create multiple stream tables in a single transaction.
pgtrickle.bulk_create(
definitions jsonb -- Array of stream table definitions
) → jsonb -- Array of result objects
Each element in the definitions array must be a JSON object with at least name and query keys. All other keys match the parameters of create_stream_table (snake_case):
| Key | Type | Default | Description |
|---|---|---|---|
name | string | (required) | Stream table name (optionally schema-qualified). |
query | string | (required) | Defining SQL query. |
schedule | string | 'calculated' | Refresh schedule. |
refresh_mode | string | 'AUTO' | 'AUTO', 'FULL', 'DIFFERENTIAL', or 'IMMEDIATE'. |
initialize | boolean | true | Whether to populate immediately. |
diamond_consistency | string | NULL | 'atomic' or 'none'. |
diamond_schedule_policy | string | NULL | 'fastest' or 'slowest'. |
cdc_mode | string | NULL | 'auto', 'trigger', or 'wal'. |
append_only | boolean | false | Enable append-only fast path. |
pooler_compatibility_mode | boolean | false | PgBouncer compatibility. |
partition_by | string | NULL | Partition key. |
max_differential_joins | integer | NULL | Max join scan limit. |
max_delta_fraction | number | NULL | Max delta fraction (0.0–1.0). |
Returns a JSONB array of result objects:
[
{"name": "st1", "status": "created", "pgt_id": 42},
{"name": "st2", "status": "created", "pgt_id": 43}
]
On any error, the entire transaction is rolled back (standard PostgreSQL transactional semantics). The error message includes the index and name of the failing definition.
Example:
SELECT pgtrickle.bulk_create('[
{"name": "order_totals", "query": "SELECT customer_id, SUM(amount) AS total FROM orders GROUP BY customer_id", "schedule": "30s"},
{"name": "product_stats", "query": "SELECT product_id, COUNT(*) AS cnt FROM order_items GROUP BY product_id", "schedule": "1m"}
]'::jsonb);
pgtrickle.alter_stream_table
Alter properties of an existing stream table.
pgtrickle.alter_stream_table(
name text,
query text DEFAULT NULL,
schedule text DEFAULT NULL,
refresh_mode text DEFAULT NULL,
status text DEFAULT NULL,
diamond_consistency text DEFAULT NULL,
diamond_schedule_policy text DEFAULT NULL,
cdc_mode text DEFAULT NULL,
append_only bool DEFAULT NULL,
pooler_compatibility_mode bool DEFAULT NULL,
tier text DEFAULT NULL
) → void
Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
name | text | — | Name of the stream table (schema-qualified or unqualified). |
query | text | NULL | New defining query. Pass NULL to leave unchanged. When set, the function validates the new query, migrates the storage table schema if needed, updates catalog entries and dependencies, and runs a full refresh. Schema changes are classified as same (no DDL), compatible (ALTER TABLE ADD/DROP COLUMN), or incompatible (full storage rebuild with OID change). |
schedule | text | NULL | New schedule as a duration string (e.g., '5m'). Pass NULL to leave unchanged. Pass 'calculated' to switch to CALCULATED mode. |
refresh_mode | text | NULL | New refresh mode ('AUTO', 'FULL', 'DIFFERENTIAL', or 'IMMEDIATE'). Pass NULL to leave unchanged. Switching to/from 'IMMEDIATE' migrates trigger infrastructure (IVM triggers ↔ CDC triggers), clears or restores the schedule, and runs a full refresh. |
status | text | NULL | New status ('ACTIVE', 'SUSPENDED'). Pass NULL to leave unchanged. Resuming resets consecutive errors to 0. |
diamond_consistency | text | NULL | New diamond consistency mode ('none' or 'atomic'). Pass NULL to leave unchanged. |
diamond_schedule_policy | text | NULL | New schedule policy for atomic diamond groups ('fastest' or 'slowest'). Pass NULL to leave unchanged. |
cdc_mode | text | NULL | New requested CDC mode override ('auto', 'trigger', or 'wal'). Pass NULL to leave unchanged. |
append_only | bool | NULL | Enable or disable the append-only INSERT fast path. Pass NULL to leave unchanged. When true, rejected for FULL, IMMEDIATE, or keyless source stream tables. |
pooler_compatibility_mode | bool | NULL | Enable or disable pooler-safe mode. When true, prepared statements are bypassed and NOTIFY emissions are suppressed. Pass NULL to leave unchanged. |
tier | text | NULL | Refresh tier for tiered scheduling ('hot', 'warm', 'cold', or 'frozen'). Only effective when pg_trickle.tiered_scheduling GUC is enabled. Hot (1×), Warm (2×), Cold (10×), Frozen (skip). Pass NULL to leave unchanged. |
If you switch a stream table to refresh_mode => 'IMMEDIATE' while the
cluster-wide pg_trickle.cdc_mode GUC is set to 'wal', pg_trickle logs an
INFO and proceeds with IVM triggers. WAL CDC does not apply to IMMEDIATE mode.
If the stream table has an explicit cdc_mode => 'wal' override, switching to
IMMEDIATE is rejected until you change the requested CDC mode back to
'auto' or 'trigger'.
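For example, to move a stream table pinned to WAL CDC into IMMEDIATE mode, clear the override first (a sketch, using a hypothetical stream table name):

```sql
-- Rejected while the explicit cdc_mode => 'wal' override is in place:
-- SELECT pgtrickle.alter_stream_table('order_totals', refresh_mode => 'IMMEDIATE');

-- Clear the override, then switch:
SELECT pgtrickle.alter_stream_table('order_totals', cdc_mode => 'auto');
SELECT pgtrickle.alter_stream_table('order_totals', refresh_mode => 'IMMEDIATE');
```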
Examples:
-- Change the defining query (same output schema — fast path)
SELECT pgtrickle.alter_stream_table('order_totals',
query => 'SELECT customer_id, SUM(amount) AS total FROM orders WHERE status = ''active'' GROUP BY customer_id');
-- Change query and add a column (compatible schema migration)
SELECT pgtrickle.alter_stream_table('order_totals',
query => 'SELECT customer_id, SUM(amount) AS total, COUNT(*) AS cnt FROM orders GROUP BY customer_id');
-- Change query and mode simultaneously
SELECT pgtrickle.alter_stream_table('order_totals',
query => 'SELECT customer_id, SUM(amount) AS total FROM orders GROUP BY customer_id',
refresh_mode => 'FULL');
-- Change schedule
SELECT pgtrickle.alter_stream_table('order_totals', schedule => '5m');
-- Switch to full refresh mode
SELECT pgtrickle.alter_stream_table('order_totals', refresh_mode => 'FULL');
-- Switch to immediate (transactional) mode — installs IVM triggers, clears schedule
SELECT pgtrickle.alter_stream_table('order_totals', refresh_mode => 'IMMEDIATE');
-- Switch from immediate back to differential — re-creates CDC triggers, restores schedule
SELECT pgtrickle.alter_stream_table('order_totals',
refresh_mode => 'DIFFERENTIAL', schedule => '5m');
-- Pin a deferred stream table to trigger CDC even when the global GUC is 'auto'
SELECT pgtrickle.alter_stream_table('order_totals', cdc_mode => 'trigger');
-- Enable append-only INSERT fast path
SELECT pgtrickle.alter_stream_table('event_log_st', append_only => true);
-- Enable pooler compatibility mode (for PgBouncer transaction mode)
SELECT pgtrickle.alter_stream_table('order_totals', pooler_compatibility_mode => true);
-- Set refresh tier (requires pg_trickle.tiered_scheduling = on)
SELECT pgtrickle.alter_stream_table('order_totals', tier => 'warm');
SELECT pgtrickle.alter_stream_table('archive_stats', tier => 'frozen');
-- Suspend a stream table
SELECT pgtrickle.alter_stream_table('order_totals', status => 'SUSPENDED');
-- Resume a suspended stream table
SELECT pgtrickle.resume_stream_table('order_totals');
-- Or via alter_stream_table
SELECT pgtrickle.alter_stream_table('order_totals', status => 'ACTIVE');
Notes:
- When query is provided, the function runs the full query rewrite pipeline (view inlining, DISTINCT ON, GROUPING SETS, etc.) and validates the new query before applying changes.
- The entire ALTER QUERY operation runs within a single transaction. If any step fails, the stream table is left unchanged.
- For same-schema and compatible-schema changes, the storage table OID is preserved — views, policies, and publications referencing the stream table remain valid.
- For incompatible schema changes (e.g., changing a column from integer to text), the storage table is rebuilt and the OID changes. A WARNING is emitted.
- The stream table is temporarily suspended during query migration to prevent concurrent scheduler refreshes.
pgtrickle.drop_stream_table
Drop a stream table, removing the storage table and all catalog entries.
pgtrickle.drop_stream_table(name text) → void
Parameters:
| Parameter | Type | Description |
|---|---|---|
name | text | Name of the stream table to drop. |
Example:
SELECT pgtrickle.drop_stream_table('order_totals');
Notes:
- Drops the underlying storage table with CASCADE.
- Removes all catalog entries (metadata, dependencies, refresh history).
- Cleans up CDC triggers and change buffer tables for source tables that are no longer tracked by any ST.
- Automatically drops any downstream publication created by stream_table_to_publication().
pgtrickle.stream_table_to_publication
Create a PostgreSQL logical replication publication for a stream table, enabling downstream consumers (Debezium, Kafka Connect, standby replicas) to subscribe to changes.
pgtrickle.stream_table_to_publication(name text) → void
Parameters:
| Parameter | Type | Description |
|---|---|---|
name | text | Name of the stream table (schema-qualified or unqualified). |
Example:
SELECT pgtrickle.stream_table_to_publication('order_totals');
-- Creates publication 'pgt_pub_order_totals'
Notes:
- The publication is named pgt_pub_<table_name>.
- Only one publication per stream table is allowed.
- The publication is automatically dropped when the stream table is dropped.
pgtrickle.drop_stream_table_publication
Drop the logical replication publication for a stream table.
pgtrickle.drop_stream_table_publication(name text) → void
Parameters:
| Parameter | Type | Description |
|---|---|---|
name | text | Name of the stream table (schema-qualified or unqualified). |
Example:
SELECT pgtrickle.drop_stream_table_publication('order_totals');
pgtrickle.set_stream_table_sla
Assign a freshness deadline SLA to a stream table. The extension automatically assigns the appropriate refresh tier based on the SLA.
pgtrickle.set_stream_table_sla(name text, sla interval) → void
Parameters:
| Parameter | Type | Description |
|---|---|---|
name | text | Name of the stream table (schema-qualified or unqualified). |
sla | interval | Maximum acceptable data staleness. |
Tier assignment:
- SLA ≤ 5 seconds → Hot tier
- SLA ≤ 30 seconds → Warm tier
- SLA > 30 seconds → Cold tier
Example:
SELECT pgtrickle.set_stream_table_sla('order_totals', interval '10 seconds');
-- Assigns Warm tier
Notes:
- The scheduler periodically checks actual refresh performance and dynamically re-assigns tiers if the SLA is consistently breached or over-served.
pgtrickle.resume_stream_table
Resume a suspended stream table, clearing its consecutive error count and re-enabling automated and manual refreshes.
pgtrickle.resume_stream_table(name text) → void
Parameters:
| Parameter | Type | Description |
|---|---|---|
name | text | Name of the stream table to resume (schema-qualified or unqualified). |
Example:
-- Resume a stream table that was auto-suspended due to repeated errors
SELECT pgtrickle.resume_stream_table('order_totals');
Notes:
- Errors if the ST is not in SUSPENDED state.
- Resets consecutive_errors to 0 and sets status = 'ACTIVE'.
- Emits a resumed event on the pg_trickle_alert NOTIFY channel.
- After resuming, the scheduler will include the ST in its next cycle.
pgtrickle.refresh_stream_table
Manually trigger a synchronous refresh of a stream table.
pgtrickle.refresh_stream_table(name text) → void
Parameters:
| Parameter | Type | Description |
|---|---|---|
name | text | Name of the stream table to refresh. |
Example:
SELECT pgtrickle.refresh_stream_table('order_totals');
Notes:
- Blocked if the ST is SUSPENDED — use pgtrickle.resume_stream_table(name) first.
- Uses an advisory lock to prevent concurrent refreshes of the same ST.
- For DIFFERENTIAL mode, generates and applies a delta query. For FULL mode, truncates and reloads.
- Records the refresh in pgtrickle.pgt_refresh_history with initiated_by = 'MANUAL'.
pgtrickle.repair_stream_table
Repair a stream table by reinstalling any missing CDC triggers, validating catalog entries, and reconciling change buffer state.
pgtrickle.repair_stream_table(name text) → text
Parameters:
| Parameter | Type | Description |
|---|---|---|
name | text | Name of the stream table to repair. |
Example:
-- Reinstall missing CDC triggers after a point-in-time recovery
SELECT pgtrickle.repair_stream_table('order_totals');
Notes:
- Inspects all source tables in the stream table's dependency graph and reinstalls any missing or disabled CDC triggers.
- Validates that the stream table's catalog entry, storage table, and change buffer tables are consistent.
- Useful after pg_basebackup or PITR restores where triggers may not have been captured in the backup.
- Use pgtrickle.trigger_inventory() first to identify which triggers are missing.
- Safe to call on a healthy stream table — it is a no-op if everything is intact.
Status & Monitoring
Query the state of stream tables, view refresh statistics, and diagnose problems.
pgtrickle.pgt_status
Get the status of all stream tables.
pgtrickle.pgt_status() → SETOF record(
name text,
status text,
refresh_mode text,
is_populated bool,
consecutive_errors int,
schedule text,
data_timestamp timestamptz,
staleness interval
)
Example:
SELECT * FROM pgtrickle.pgt_status();
| name | status | refresh_mode | is_populated | consecutive_errors | schedule | data_timestamp | staleness |
|---|---|---|---|---|---|---|---|
| public.order_totals | ACTIVE | DIFFERENTIAL | true | 0 | 5m | 2026-02-21 12:00:00+00 | 00:02:30 |
pgtrickle.health_check
Run a set of health checks against the pg_trickle installation and return one row per check.
pgtrickle.health_check() → SETOF record(
check_name text, -- identifier for the check
severity text, -- 'OK', 'WARN', or 'ERROR'
detail text -- human-readable explanation
)
Filter to problems only:
SELECT check_name, severity, detail
FROM pgtrickle.health_check()
WHERE severity != 'OK';
Checks: scheduler_running, error_tables, stale_tables, needs_reinit,
consecutive_errors, buffer_growth (> 10 000 pending rows), slot_lag
(retained WAL above pg_trickle.slot_lag_warning_threshold_mb, default 100 MB),
worker_pool (all worker tokens in use — parallel mode only), job_queue
(> 10 jobs queued — parallel mode only).
pgtrickle.health_summary
Single-row summary of the entire pg_trickle deployment's health. Designed for monitoring dashboards that want one endpoint to poll instead of joining multiple views.
pgtrickle.health_summary() → SETOF record(
total_stream_tables int,
active_count int,
error_count int,
suspended_count int,
stale_count int,
reinit_pending int,
max_staleness_seconds float8, -- NULL if no stream tables
scheduler_status text, -- 'ACTIVE', 'STOPPED', or 'NOT_LOADED'
cache_hit_rate float8 -- NULL if no cache lookups yet
)
Example:
SELECT * FROM pgtrickle.health_summary();
| total_stream_tables | active_count | error_count | suspended_count | stale_count | reinit_pending | max_staleness_seconds | scheduler_status | cache_hit_rate |
|---|---|---|---|---|---|---|---|---|
| 12 | 11 | 0 | 1 | 0 | 0 | 45.2 | ACTIVE | 0.94 |
Tip: Use this in a Grafana single-stat panel or a Prometheus exporter to surface fleet-level health at a glance.
pgtrickle.refresh_timeline
Return recent refresh records across all stream tables in a single chronological view.
pgtrickle.refresh_timeline(
max_rows int DEFAULT 50
) → SETOF record(
start_time timestamptz,
stream_table text,
action text,
status text,
rows_inserted bigint,
rows_deleted bigint,
duration_ms float8,
error_message text
)
Example:
-- Most recent 20 events across all stream tables:
SELECT start_time, stream_table, action, status, round(duration_ms::numeric,1) AS ms
FROM pgtrickle.refresh_timeline(20);
-- Just failures in the last 100 events:
SELECT * FROM pgtrickle.refresh_timeline(100) WHERE status = 'ERROR';
pgtrickle.st_refresh_stats
Return per-ST refresh statistics aggregated from the refresh history.
pgtrickle.st_refresh_stats() → SETOF record(
pgt_name text,
pgt_schema text,
status text,
refresh_mode text,
is_populated bool,
total_refreshes bigint,
successful_refreshes bigint,
failed_refreshes bigint,
total_rows_inserted bigint,
total_rows_deleted bigint,
avg_duration_ms float8,
last_refresh_action text,
last_refresh_status text,
last_refresh_at timestamptz,
staleness_secs float8,
stale bool
)
Example:
SELECT pgt_name, status, total_refreshes, avg_duration_ms, stale
FROM pgtrickle.st_refresh_stats();
pgtrickle.get_refresh_history
Return refresh history for a specific stream table.
pgtrickle.get_refresh_history(
name text,
max_rows int DEFAULT 20
) → SETOF record(
refresh_id bigint,
data_timestamp timestamptz,
start_time timestamptz,
end_time timestamptz,
action text,
status text,
rows_inserted bigint,
rows_deleted bigint,
duration_ms float8,
error_message text
)
Example:
SELECT action, status, rows_inserted, duration_ms
FROM pgtrickle.get_refresh_history('order_totals', 5);
pgtrickle.get_staleness
Get the current staleness in seconds for a specific stream table.
pgtrickle.get_staleness(name text) → float8
Returns NULL if the ST has never been refreshed.
Example:
SELECT pgtrickle.get_staleness('order_totals');
-- Returns: 12.345 (seconds since last refresh)
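To turn this into an SLA check, compare against a target; the 60-second figure below is illustrative, and a NULL result (never refreshed) is treated as infinitely stale:

```sql
-- Flag a single stream table as outside a 60-second freshness target.
SELECT coalesce(pgtrickle.get_staleness('order_totals'), 'Infinity'::float8) > 60
       AS out_of_sla;
```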
pgtrickle.explain_refresh_mode
Added in v0.11.0
Explain the configured vs. effective refresh mode for a stream table, including the reason for any downgrade (e.g., AUTO choosing FULL).
pgtrickle.explain_refresh_mode(name text) → TABLE(
configured_mode text,
effective_mode text,
downgrade_reason text
)
Columns:
| Column | Type | Description |
|---|---|---|
| configured_mode | text | The refresh mode set on the stream table (e.g., DIFFERENTIAL, AUTO, FULL, IMMEDIATE) |
| effective_mode | text | The mode actually used on the most recent refresh. NULL for IMMEDIATE mode (handled by triggers) |
| downgrade_reason | text | Human-readable explanation when effective_mode differs from configured_mode, or informational note for IMMEDIATE / APPEND_ONLY |
Example:
SELECT * FROM pgtrickle.explain_refresh_mode('public.orders_summary');
| configured_mode | effective_mode | downgrade_reason |
|---|---|---|
| AUTO | FULL | The most recent refresh used FULL mode. Possible causes: defining query contains a CTE or unsupported operator, adaptive change-ratio threshold was exceeded, or aggregate saturation occurred. Check pgtrickle.pgt_refresh_history for details. |
pgtrickle.cache_stats
Return template cache statistics from shared memory.
Reports L1 (thread-local) hits, L2 (catalog table) hits, full misses (DVM re-parse), evictions (generation flushes), and the current L1 cache size for this backend.
pgtrickle.cache_stats() → SETOF record(
l1_hits bigint,
l2_hits bigint,
misses bigint,
evictions bigint,
l1_size integer
)
| Column | Description |
|---|---|
| l1_hits | Number of delta template cache hits in the thread-local (L1) cache. ~0 ns lookup. |
| l2_hits | Number of delta template cache hits in the catalog table (L2) cache. ~1 ms SPI lookup. |
| misses | Number of full cache misses requiring DVM re-parse (~45 ms). |
| evictions | Number of entries evicted from L1 due to DDL-triggered generation flushes. |
| l1_size | Current number of entries in this backend's L1 cache. |
Example:
SELECT * FROM pgtrickle.cache_stats();
| l1_hits | l2_hits | misses | evictions | l1_size |
|---|---|---|---|---|
| 142 | 3 | 5 | 10 | 8 |
Note: Counters are cluster-wide (shared memory) except l1_size, which is per-backend. Requires shared_preload_libraries = 'pg_trickle'; returns zeros when loaded dynamically.
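The counters can be reduced to an overall hit rate. Whether this arithmetic matches the cache_hit_rate reported by pgtrickle.health_summary() is an assumption, but it is a reasonable sketch:

```sql
-- Hit rate across both cache levels; NULL until the first lookup.
SELECT (l1_hits + l2_hits)::float8
       / nullif(l1_hits + l2_hits + misses, 0) AS hit_rate
FROM pgtrickle.cache_stats();
```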
CDC Diagnostics
Inspect CDC pipeline health, replication slots, change buffers, and trigger coverage.
pgtrickle.slot_health
Check replication slot health for all tracked CDC slots.
pgtrickle.slot_health() → SETOF record(
slot_name text,
source_relid bigint,
active bool,
retained_wal_bytes bigint,
wal_status text
)
Example:
SELECT * FROM pgtrickle.slot_health();
| slot_name | source_relid | active | retained_wal_bytes | wal_status |
|---|---|---|---|---|
| pg_trickle_slot_16384 | 16384 | false | 1048576 | reserved |
pgtrickle.check_cdc_health
Check CDC health for all tracked source tables. Returns per-source health status including the current CDC mode, replication slot details, estimated lag, and any alerts.
The alert column uses the critical threshold configured by
pg_trickle.slot_lag_critical_threshold_mb (default 1024 MB).
pgtrickle.check_cdc_health() → SETOF record(
source_relid bigint,
source_table text,
cdc_mode text,
slot_name text,
lag_bytes bigint,
confirmed_lsn text,
alert text
)
Columns:
| Column | Type | Description |
|---|---|---|
| source_relid | bigint | OID of the tracked source table |
| source_table | text | Resolved name of the source table (e.g., public.orders) |
| cdc_mode | text | Current CDC mode: TRIGGER, TRANSITIONING, or WAL |
| slot_name | text | Replication slot name (NULL for TRIGGER mode) |
| lag_bytes | bigint | Replication slot lag in bytes (NULL for TRIGGER mode) |
| confirmed_lsn | text | Last confirmed WAL position (NULL for TRIGGER mode) |
| alert | text | Alert message if unhealthy (e.g., slot_lag_exceeds_threshold, replication_slot_missing) |
Example:
SELECT * FROM pgtrickle.check_cdc_health();
| source_relid | source_table | cdc_mode | slot_name | lag_bytes | confirmed_lsn | alert |
|---|---|---|---|---|---|---|
| 16384 | public.orders | TRIGGER | | | | |
| 16390 | public.events | WAL | pg_trickle_slot_16390 | 524288 | 0/1A8B000 | |
pgtrickle.change_buffer_sizes
Show pending change counts and estimated on-disk sizes for all CDC-tracked source tables.
Returns one row per (stream_table, source_table) pair.
pgtrickle.change_buffer_sizes() → SETOF record(
stream_table text, -- qualified stream table name
source_table text, -- qualified source table name
source_oid bigint,
cdc_mode text, -- 'trigger', 'wal', or 'transitioning'
pending_rows bigint, -- rows in buffer not yet consumed
buffer_bytes bigint -- estimated buffer table size in bytes
)
Example:
SELECT * FROM pgtrickle.change_buffer_sizes()
ORDER BY pending_rows DESC;
Useful for spotting a source table whose CDC buffer is growing unexpectedly (which may indicate a stalled differential refresh or a high-write source that has outpaced the schedule).
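A periodic check might surface only the outliers; the row and byte cutoffs below are illustrative:

```sql
-- Show only buffers that look oversized (thresholds are examples).
SELECT stream_table, source_table, pending_rows,
       pg_size_pretty(buffer_bytes) AS buffer_size
FROM pgtrickle.change_buffer_sizes()
WHERE pending_rows > 100000
   OR buffer_bytes > 100 * 1024 * 1024  -- 100 MB
ORDER BY buffer_bytes DESC;
```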
pgtrickle.worker_pool_status
Snapshot of the parallel refresh worker pool. Returns a single row.
pgtrickle.worker_pool_status() → SETOF record(
active_workers int, -- workers currently executing refresh jobs
max_workers int, -- cluster-wide worker budget (GUC)
per_db_cap int, -- per-database dispatch cap (GUC)
parallel_mode text -- current parallel_refresh_mode value
)
Example:
SELECT * FROM pgtrickle.worker_pool_status();
Returns 0 active workers when parallel_refresh_mode = 'off'.
pgtrickle.parallel_job_status
Active and recently completed scheduler jobs from the pgt_scheduler_jobs
table. Shows jobs that are currently queued or running, plus jobs that
finished within the last max_age_seconds (default 300).
pgtrickle.parallel_job_status(
max_age_seconds int DEFAULT 300
) → SETOF record(
job_id bigint,
unit_key text, -- stable unit identifier (s:42, a:1,2, etc.)
unit_kind text, -- 'singleton', 'atomic_group', 'immediate_closure'
status text, -- 'QUEUED', 'RUNNING', 'SUCCEEDED', etc.
member_count int,
attempt_no int,
scheduler_pid int,
worker_pid int, -- NULL if not yet claimed
enqueued_at timestamptz,
started_at timestamptz, -- NULL if still queued
finished_at timestamptz, -- NULL if not finished
duration_ms float8 -- NULL if not finished
)
Example — show running and recently failed jobs:
SELECT job_id, unit_key, status, duration_ms
FROM pgtrickle.parallel_job_status(60)
WHERE status NOT IN ('SUCCEEDED');
pgtrickle.trigger_inventory
List all CDC triggers that pg_trickle should have installed, and verify each one exists and is enabled in pg_catalog.
pgtrickle.trigger_inventory() → SETOF record(
source_table text, -- qualified source table name
source_oid bigint,
trigger_name text, -- expected trigger name
trigger_type text, -- 'DML' or 'TRUNCATE'
present bool, -- trigger exists in pg_catalog
enabled bool -- trigger is not disabled
)
A present = false row means change capture is broken for that source.
Example:
-- Show only missing or disabled triggers:
SELECT source_table, trigger_type, trigger_name
FROM pgtrickle.trigger_inventory()
WHERE NOT present OR NOT enabled;
pgtrickle.fuse_status
Return the circuit-breaker (fuse) state for every stream table that has a fuse configured.
pgtrickle.fuse_status() → SETOF record(
name text, -- stream table name
fuse_mode text, -- 'off', 'on', or 'auto'
fuse_state text, -- 'armed' or 'blown'
fuse_ceiling bigint, -- change-count threshold
fuse_sensitivity int, -- consecutive over-ceiling cycles before blow
blown_at timestamptz, -- when the fuse last blew (NULL if armed)
blow_reason text -- reason the fuse blew (NULL if armed)
)
Example:
-- Check all fuse-enabled stream tables
SELECT name, fuse_mode, fuse_state, fuse_ceiling, blown_at
FROM pgtrickle.fuse_status();
-- Find blown fuses
SELECT name, blow_reason, blown_at
FROM pgtrickle.fuse_status()
WHERE fuse_state = 'blown';
Notes:
- Returns one row per stream table where fuse_mode != 'off'.
- A blown fuse suspends differential refreshes until cleared with pgtrickle.reset_fuse().
- A pgtrickle_alert NOTIFY with event fuse_blown is emitted when the fuse trips.
- See Configuration — fuse_default_ceiling for global defaults.
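Because a fuse trip is announced via NOTIFY, a monitoring session can subscribe directly; a minimal sketch:

```sql
-- Subscribe to pg_trickle alerts (e.g., in a long-lived psql session).
-- Notifications on this channel include the fuse_blown event.
LISTEN pgtrickle_alert;
```

After receiving a fuse_blown notification, inspect pgtrickle.fuse_status() and, once the cause is understood, clear the fuse with pgtrickle.reset_fuse().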
pgtrickle.reset_fuse
Clear a blown circuit-breaker fuse and resume scheduling for the stream table.
pgtrickle.reset_fuse(name text, action text DEFAULT 'apply') → void
Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| name | text | — | Name of the stream table whose fuse to reset. |
| action | text | 'apply' | How to handle the pending changes that caused the fuse to blow. |
Actions:
| Action | Behavior |
|---|---|
| 'apply' | Process all pending changes normally and resume scheduling. |
| 'reinitialize' | Drop and repopulate the stream table from scratch (full refresh from defining query). |
| 'skip_changes' | Discard the pending changes that triggered the fuse and resume from the current frontier. |
Example:
-- After investigating a bulk load, apply the changes:
SELECT pgtrickle.reset_fuse('category_summary', action => 'apply');
-- Or skip the oversized batch entirely:
SELECT pgtrickle.reset_fuse('category_summary', action => 'skip_changes');
-- Or rebuild from scratch:
SELECT pgtrickle.reset_fuse('category_summary', action => 'reinitialize');
Notes:
- Errors if the stream table's fuse is not in 'blown' state.
- After reset, the fuse returns to 'armed' state and the scheduler resumes normal operation.
- Use pgtrickle.fuse_status() to inspect the fuse state before resetting.
- The 'skip_changes' action advances the frontier past the pending changes without applying them — use only when you are certain the changes should be discarded.
Dependency & Inspection
Visualize dependencies, understand query plans, and audit source table relationships.
pgtrickle.dependency_tree
Render all stream table dependencies as an indented ASCII tree.
pgtrickle.dependency_tree() → SETOF record(
tree_line text, -- indented visual line (├──, └──, │ characters)
node text, -- qualified name (schema.table)
node_type text, -- 'stream_table' or 'source_table'
depth int,
status text, -- NULL for source_table nodes
refresh_mode text -- NULL for source_table nodes
)
Roots (stream tables with no stream-table parents) appear at depth 0. Each
dependent is indented beneath its parent. Plain source tables are rendered as
leaf nodes tagged [src].
Example:
SELECT tree_line, status, refresh_mode
FROM pgtrickle.dependency_tree();
tree_line status refresh_mode
----------------------------------------+---------+--------------
report_summary ACTIVE DIFFERENTIAL
├── orders_by_region ACTIVE DIFFERENTIAL
│ ├── public.orders [src]
│ └── public.customers [src]
└── revenue_totals ACTIVE DIFFERENTIAL
└── public.orders [src]
pgtrickle.diamond_groups
List all detected diamond dependency groups and their members.
When stream tables form diamond-shaped dependency graphs (multiple paths converge at a single fan-in node), the scheduler groups them for coordinated refresh. This function exposes those groups for monitoring and debugging.
pgtrickle.diamond_groups() → SETOF record(
group_id int4,
member_name text,
member_schema text,
is_convergence bool,
epoch int8,
schedule_policy text
)
Return columns:
| Column | Type | Description |
|---|---|---|
| group_id | int4 | Numeric identifier for the consistency group (1-based). |
| member_name | text | Name of the stream table in this group. |
| member_schema | text | Schema of the stream table. |
| is_convergence | bool | true if this member is a convergence (fan-in) node where multiple paths meet. |
| epoch | int8 | Group epoch counter — advances on each successful atomic refresh of the group. |
| schedule_policy | text | Effective schedule policy for this group ('fastest' or 'slowest'). Computed from convergence node settings with strictest-wins. |
Example:
SELECT * FROM pgtrickle.diamond_groups();
| group_id | member_name | member_schema | is_convergence | epoch | schedule_policy |
|---|---|---|---|---|---|
| 1 | st_b | public | false | 0 | fastest |
| 1 | st_c | public | false | 0 | fastest |
| 1 | st_d | public | true | 0 | fastest |
Notes:
- Singleton stream tables (not part of any diamond) are omitted.
- The DAG is rebuilt on each call from the catalog — results reflect the current dependency graph.
- Groups are only relevant when diamond_consistency = 'atomic' is set on the convergence node or globally via the pg_trickle.diamond_consistency GUC.
pgtrickle.pgt_scc_status
List all cyclic strongly connected components (SCCs) and their convergence status.
When stream tables form circular dependencies (with pg_trickle.allow_circular = true), they are grouped into SCCs and iterated to a fixed point. This function exposes those groups for monitoring and debugging.
pgtrickle.pgt_scc_status() → SETOF record(
scc_id int4,
member_count int4,
members text[],
last_iterations int4,
last_converged_at timestamptz
)
Return columns:
| Column | Type | Description |
|---|---|---|
| scc_id | int4 | SCC group identifier (1-based). |
| member_count | int4 | Number of stream tables in this SCC. |
| members | text[] | Array of schema.name for each member. |
| last_iterations | int4 | Number of fixpoint iterations in the last convergence (NULL if never iterated). |
| last_converged_at | timestamptz | Timestamp of the most recent refresh among SCC members (NULL if never refreshed). |
Example:
SELECT * FROM pgtrickle.pgt_scc_status();
| scc_id | member_count | members | last_iterations | last_converged_at |
|---|---|---|---|---|
| 1 | 2 | {public.reach_a,public.reach_b} | 3 | 2026-03-15 12:00:00+00 |
Notes:
- Only cyclic SCCs (with scc_id IS NOT NULL) are returned. Acyclic stream tables are omitted.
- last_iterations reflects the maximum last_fixpoint_iterations across SCC members.
- Results are queried from the catalog on each call.
pgtrickle.explain_st
Explain the DVM plan for a stream table's defining query.
pgtrickle.explain_st(name text) → SETOF record(
property text,
value text
)
Example:
SELECT * FROM pgtrickle.explain_st('order_totals');
| property | value |
|---|---|
| pgt_name | public.order_totals |
| defining_query | SELECT region, SUM(amount) ... |
| refresh_mode | DIFFERENTIAL |
| status | active |
| is_populated | true |
| dvm_supported | true |
| operator_tree | Aggregate → Scan(orders) |
| output_columns | region, total |
| source_oids | 16384 |
| delta_query | WITH ... SELECT ... |
| frontier | {"orders": "0/15A3B80"} |
| amplification_stats | {"samples":10,"min":1.0,...} |
| refresh_timing_stats | {"samples":10,"min_ms":12.3,...} |
| source_partitions | [{"source":"public.orders",...}] |
| dependency_graph_dot | digraph dependency_subgraph { ... } |
| spill_info | {"temp_blks_read":0,"temp_blks_written":1234,...} |
Output Fields
| Property | Description |
|---|---|
| pgt_name | Fully-qualified stream table name |
| defining_query | The SQL query that defines the stream table |
| refresh_mode | DIFFERENTIAL, FULL, or IMMEDIATE |
| status | Current status (active, suspended, etc.) |
| is_populated | Whether the stream table has been initially populated |
| dvm_supported | Whether the defining query supports differential view maintenance |
| operator_tree | Debug representation of the DVM operator tree |
| output_columns | Comma-separated list of output column names |
| source_oids | Comma-separated list of source table OIDs |
| aggregate_strategies | Per-aggregate maintenance strategies (JSON, if aggregates present) |
| delta_query | The generated delta SQL used for DIFFERENTIAL refresh |
| frontier | Current LSN/watermark frontier (JSON) |
| amplification_stats | Delta amplification ratio statistics over the last 20 refreshes (JSON) |
| refresh_timing_stats | Refresh duration statistics over the last 20 completed refreshes (JSON). Fields: samples, min_ms, max_ms, avg_ms, latest_ms, latest_action |
| source_partitions | Partition info for partitioned source tables (JSON array). Fields per entry: source, partition_key, partitions |
| dependency_graph_dot | Dependency sub-graph in DOT format. Shows immediate upstream sources (ellipses for base tables, boxes for stream tables) and downstream dependents. Paste into a Graphviz renderer to visualize. |
| spill_info | Temp file spill metrics from pg_stat_statements (JSON). Fields: temp_blks_read, temp_blks_written, threshold, exceeds_threshold. Only present when pg_trickle.spill_threshold_blocks > 0. |
Note: Properties are only included when data is available. For example,
source_partitions only appears when at least one source table is partitioned, and refresh_timing_stats only appears after at least one completed refresh.
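Since each property is returned as a row, a single field can be extracted directly — for example, pulling just the DOT graph for rendering with Graphviz:

```sql
-- Extract the dependency sub-graph in DOT format for an existing stream table.
SELECT value
FROM pgtrickle.explain_st('order_totals')
WHERE property = 'dependency_graph_dot';
```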
pgtrickle.list_sources
List the source tables that a stream table depends on.
pgtrickle.list_sources(name text) → SETOF record(
source_table text, -- qualified source table name
source_oid bigint,
source_type text, -- 'table', 'stream_table', etc.
cdc_mode text, -- 'trigger', 'wal', or 'transitioning'
columns_used text -- column-level dependency info (if available)
)
Example:
SELECT * FROM pgtrickle.list_sources('order_totals');
Returns the tables tracked by CDC for the given stream table, along with how they are being tracked. Useful when diagnosing why a stream table is not refreshing or to audit which source tables are being trigger-tracked.
Utilities
Utility functions for CDC management and row identity hashing.
pgtrickle.rebuild_cdc_triggers
Rebuild all CDC triggers (function body + trigger DDL) for every source table tracked by pg_trickle. This recreates trigger functions and re-attaches the trigger to each source table.
pgtrickle.rebuild_cdc_triggers() → text
Returns 'done' on success. Emits a WARNING per table on error and
continues processing remaining sources.
When to use:
- After changing pg_trickle.cdc_trigger_mode from row to statement (or vice versa).
- After ALTER EXTENSION pg_trickle UPDATE when the CDC trigger function body has changed.
- After restoring from a backup where triggers may have been lost.
Example:
-- Switch to statement-level triggers and rebuild
SET pg_trickle.cdc_trigger_mode = 'statement';
SELECT pgtrickle.rebuild_cdc_triggers();
Notes:
- Called automatically during the ALTER EXTENSION pg_trickle UPDATE (0.3.0 → 0.4.0) migration.
- Safe to call at any time — existing triggers are dropped and recreated.
- On error for a specific table, a WARNING is logged and processing continues with remaining sources.
pgtrickle.pg_trickle_hash
Compute a 64-bit xxHash row ID from a text value.
pgtrickle.pg_trickle_hash(input text) → bigint
Marked IMMUTABLE, PARALLEL SAFE.
Example:
SELECT pgtrickle.pg_trickle_hash('some_key');
-- Returns: 1234567890123456789
pgtrickle.pg_trickle_hash_multi
Compute a row ID by hashing multiple text values (composite keys).
pgtrickle.pg_trickle_hash_multi(inputs text[]) → bigint
Marked IMMUTABLE, PARALLEL SAFE. Uses \x1E (record separator) between values and \x00NULL\x00 for NULL entries.
Example:
SELECT pgtrickle.pg_trickle_hash_multi(ARRAY['key1', 'key2']);
Operator Support Matrix — Summary
pg_trickle supports 60+ SQL constructs across three refresh modes. The table below summarises broad categories. For the complete per-operator matrix (including notes on caveats, auxiliary columns and strategies), see DVM_OPERATORS.md.
| Category | FULL | DIFFERENTIAL | IMMEDIATE | Notes |
|---|---|---|---|---|
| Basic SELECT / WHERE / DISTINCT | ✅ | ✅ | ✅ | |
| Joins (INNER, LEFT, RIGHT, FULL, CROSS, LATERAL) | ✅ | ✅ | ✅ | Hybrid delta strategy |
| Subqueries (EXISTS, IN, NOT EXISTS, NOT IN, scalar) | ✅ | ✅ | ✅ | |
| Set operations (UNION ALL, INTERSECT, EXCEPT) | ✅ | ✅ | ✅ | |
| Algebraic aggregates (COUNT, SUM, AVG, STDDEV, …) | ✅ | ✅ | ✅ | Fully invertible delta |
| Semi-algebraic aggregates (MIN, MAX) | ✅ | ✅ | ✅ | Group rescan on ambiguous delete |
| Group-rescan aggregates (STRING_AGG, ARRAY_AGG, …) | ✅ | ⚠️ | ⚠️ | Warning emitted at creation time |
| Window functions (ROW_NUMBER, RANK, LAG, LEAD, …) | ✅ | ✅ | ✅ | Partition-scoped recompute |
| CTEs (non-recursive and WITH RECURSIVE) | ✅ | ✅ | ✅ | Semi-naive / DRed strategies |
| TopK (ORDER BY … LIMIT) | ✅ | ✅ | ✅ | Scoped recomputation |
| LATERAL / set-returning functions / JSON_TABLE | ✅ | ✅ | ✅ | Row-scoped re-execution |
| ST-to-ST dependencies | ✅ | ✅ | ✅ | Differential via change buffers |
| VOLATILE functions | ✅ | ❌ | ❌ | Rejected at creation time |
Legend: ✅ fully supported — ⚠️ supported with caveats — ❌ not supported
For details on each operator's delta strategy, auxiliary columns, and known limitations, see the full Operator Support Matrix.
Expression Support
pg_trickle's DVM parser supports a wide range of SQL expressions in defining queries. All expressions work in both FULL and DIFFERENTIAL modes.
Conditional Expressions
| Expression | Example | Notes |
|---|---|---|
| CASE WHEN … THEN … ELSE … END | CASE WHEN amount > 100 THEN 'high' ELSE 'low' END | Searched CASE |
| CASE <expr> WHEN … THEN … END | CASE status WHEN 1 THEN 'active' WHEN 2 THEN 'inactive' END | Simple CASE |
| COALESCE(a, b, …) | COALESCE(phone, email, 'unknown') | Returns first non-NULL argument |
| NULLIF(a, b) | NULLIF(divisor, 0) | Returns NULL if a = b |
| GREATEST(a, b, …) | GREATEST(score1, score2, score3) | Returns the largest value |
| LEAST(a, b, …) | LEAST(price, max_price) | Returns the smallest value |
Comparison Operators
| Expression | Example | Notes |
|---|---|---|
| IN (list) | category IN ('A', 'B', 'C') | Also supports NOT IN |
| BETWEEN a AND b | price BETWEEN 10 AND 100 | Also supports NOT BETWEEN |
| IS DISTINCT FROM | a IS DISTINCT FROM b | NULL-safe inequality |
| IS NOT DISTINCT FROM | a IS NOT DISTINCT FROM b | NULL-safe equality |
| SIMILAR TO | name SIMILAR TO '%pattern%' | SQL regex matching |
| op ANY(array) | id = ANY(ARRAY[1,2,3]) | Array comparison |
| op ALL(array) | score > ALL(ARRAY[50,60]) | Array comparison |
Boolean Tests
| Expression | Example |
|---|---|
| IS TRUE | active IS TRUE |
| IS NOT TRUE | flag IS NOT TRUE |
| IS FALSE | completed IS FALSE |
| IS NOT FALSE | valid IS NOT FALSE |
| IS UNKNOWN | result IS UNKNOWN |
| IS NOT UNKNOWN | flag IS NOT UNKNOWN |
SQL Value Functions
| Function | Description |
|---|---|
| CURRENT_DATE | Current date |
| CURRENT_TIME | Current time with time zone |
| CURRENT_TIMESTAMP | Current date and time with time zone |
| LOCALTIME | Current time without time zone |
| LOCALTIMESTAMP | Current date and time without time zone |
| CURRENT_ROLE | Current role name |
| CURRENT_USER | Current user name |
| SESSION_USER | Session user name |
| CURRENT_CATALOG | Current database name |
| CURRENT_SCHEMA | Current schema name |
Array and Row Expressions
| Expression | Example | Notes |
|---|---|---|
| ARRAY[…] | ARRAY[1, 2, 3] | Array constructor |
| ROW(…) | ROW(a, b, c) | Row constructor |
| Array subscript | arr[1] | Array element access |
| Field access | (rec).field | Composite type field access |
| Star indirection | (data).* | Expand all fields |
Subquery Expressions
Subqueries are supported in the WHERE clause and SELECT list. They are parsed into dedicated DVM operators with specialized delta computation for incremental maintenance.
| Expression | Example | DVM Operator |
|---|---|---|
| EXISTS (subquery) | WHERE EXISTS (SELECT 1 FROM orders WHERE orders.cid = c.id) | Semi-Join |
| NOT EXISTS (subquery) | WHERE NOT EXISTS (SELECT 1 FROM orders WHERE orders.cid = c.id) | Anti-Join |
| IN (subquery) | WHERE id IN (SELECT product_id FROM order_items) | Semi-Join (rewritten as equality) |
| NOT IN (subquery) | WHERE id NOT IN (SELECT product_id FROM order_items) | Anti-Join |
| ALL (subquery) | WHERE price > ALL (SELECT price FROM competitors) | Anti-Join (NULL-safe) |
| Scalar subquery (SELECT) | SELECT (SELECT max(price) FROM products) AS max_p | Scalar Subquery |
Notes:
- EXISTS and IN (subquery) in the WHERE clause are transformed into semi-join operators. NOT EXISTS and NOT IN (subquery) become anti-join operators.
- Multi-column IN (subquery) is not supported (e.g., WHERE (a, b) IN (SELECT x, y FROM ...)). Rewrite as WHERE EXISTS (SELECT 1 FROM ... WHERE a = x AND b = y) for equivalent semantics.
- Multiple subqueries in the same WHERE clause are supported when combined with AND. Subqueries combined with OR are also supported — they are automatically rewritten into UNION of separate filtered queries.
- Scalar subqueries in the SELECT list are supported as long as they return exactly one row and one column.
- ALL (subquery) is supported — see the worked example below.
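The multi-column IN rewrite described in the notes looks like this in practice (table and column names are illustrative):

```sql
-- Not supported: multi-column IN (subquery)
-- SELECT * FROM t WHERE (a, b) IN (SELECT x, y FROM u);

-- Supported equivalent using EXISTS:
SELECT *
FROM t
WHERE EXISTS (
  SELECT 1 FROM u
  WHERE u.x = t.a AND u.y = t.b
);
```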
ALL (subquery) — Worked Example
ALL (subquery) tests whether a comparison holds against every row returned
by the subquery. pg_trickle rewrites it to a NULL-safe anti-join so it can be
maintained incrementally.
Comparison operators supported: >, >=, <, <=, =, <>
Example — products cheaper than all competitors:
-- Source tables
CREATE TABLE products (
id INT PRIMARY KEY,
name TEXT,
price NUMERIC
);
CREATE TABLE competitor_prices (
id INT PRIMARY KEY,
product_id INT,
price NUMERIC
);
-- Sample data
INSERT INTO products VALUES (1, 'Widget', 9.99), (2, 'Gadget', 24.99), (3, 'Gizmo', 14.99);
INSERT INTO competitor_prices VALUES (1, 1, 12.99), (2, 1, 11.50), (3, 2, 19.99), (4, 3, 14.99);
-- Stream table: find products priced below ALL competitor prices
SELECT pgtrickle.create_stream_table(
name => 'cheapest_products',
query => $$
SELECT p.id, p.name, p.price
FROM products p
WHERE p.price < ALL (
SELECT cp.price
FROM competitor_prices cp
WHERE cp.product_id = p.id
)
$$,
schedule => '1m'
);
Result: Widget (9.99 < all of [12.99, 11.50]) is included. Gadget (24.99 ≮ 19.99) is excluded. Gizmo (14.99 ≮ 14.99) is excluded.
How pg_trickle handles it internally:
- WHERE price < ALL (SELECT ...) is parsed into an anti-join with a NULL-safe condition.
- The condition NOT (x op col) is wrapped as (col IS NULL OR NOT (x op col)) to correctly handle NULL values in the subquery — if any subquery row is NULL, the ALL comparison fails (standard SQL semantics).
- The anti-join uses the same incremental delta computation as NOT EXISTS, so changes to either products or competitor_prices are propagated efficiently.
Other common patterns:
-- Employees whose salary meets or exceeds all department maximums
WHERE salary >= ALL (SELECT max_salary FROM department_caps)
-- Orders with ratings better than all thresholds
WHERE rating > ALL (SELECT min_rating FROM quality_thresholds)
Auto-Rewrite Pipeline
pg_trickle transparently rewrites certain SQL constructs before parsing. These rewrites are applied automatically and require no user action:
| Order | Trigger | Rewrite |
|---|---|---|
| #0 | View references in FROM | Inline view body as subquery |
| #1 | DISTINCT ON (expr) | Convert to ROW_NUMBER() OVER (PARTITION BY expr ORDER BY ...) = 1 subquery |
| #2 | GROUPING SETS / CUBE / ROLLUP | Decompose into UNION ALL of separate GROUP BY queries |
| #3 | Scalar subquery in WHERE | Convert to CROSS JOIN with inline view |
| #4 | Correlated scalar subquery in SELECT | Convert to LEFT JOIN with grouped inline view |
| #5 | EXISTS/IN inside OR | Split into UNION of separate filtered queries |
| #6 | Multiple PARTITION BY clauses | Split into joined subqueries, one per distinct partitioning |
| #7 | Window functions inside expressions | Lift to inner subquery with synthetic __pgt_wf_N columns (see below) |
Window Functions in Expressions (Auto-Rewrite)
Window functions nested inside expressions (e.g., CASE WHEN ROW_NUMBER() ...,
ABS(RANK() OVER (...) - 5)) are automatically rewritten. pg_trickle lifts
each window function call into a synthetic column in an inner subquery, then
applies the original expression in the outer SELECT.
This rewrite is transparent — you write your query naturally and pg_trickle handles it:
Your query:
SELECT
id,
name,
CASE WHEN ROW_NUMBER() OVER (PARTITION BY dept ORDER BY salary DESC) = 1
THEN 'top earner'
ELSE 'other'
END AS rank_label
FROM employees
What pg_trickle generates internally:
SELECT
"__pgt_wf_inner".id,
"__pgt_wf_inner".name,
CASE WHEN "__pgt_wf_inner"."__pgt_wf_1" = 1
THEN 'top earner'
ELSE 'other'
END AS "rank_label"
FROM (
SELECT *, ROW_NUMBER() OVER (PARTITION BY dept ORDER BY salary DESC) AS "__pgt_wf_1"
FROM employees
) "__pgt_wf_inner"
The inner subquery produces the window function result as a plain column
(__pgt_wf_1), which the DVM engine can maintain incrementally using its
existing window function support. The outer expression is then a simple
column reference.
More examples:
-- Arithmetic with window functions
SELECT id, ABS(RANK() OVER (ORDER BY score) - 5) AS adjusted_rank
FROM players
-- COALESCE with window function
SELECT id, COALESCE(LAG(value) OVER (ORDER BY ts), 0) AS prev_value
FROM sensor_readings
-- Multiple window functions in expressions
SELECT id,
ROW_NUMBER() OVER (ORDER BY created_at) * 100 AS seq,
SUM(amount) OVER (ORDER BY created_at) / COUNT(*) OVER (ORDER BY created_at) AS running_avg
FROM transactions
All of these are handled automatically — each distinct window function call
is extracted to its own __pgt_wf_N synthetic column.
HAVING Clause
HAVING is fully supported. The filter predicate is applied on top of the aggregate delta computation — groups that pass the HAVING condition are included in the stream table.
SELECT pgtrickle.create_stream_table(
name => 'big_departments',
query => 'SELECT department, COUNT(*) AS cnt FROM employees GROUP BY department HAVING COUNT(*) > 10',
schedule => '1m'
);
Tables Without Primary Keys (Keyless Tables)
Tables without a primary key can be used as sources. pg_trickle generates a content-based row identity
by hashing all column values using pg_trickle_hash_multi(). This allows DIFFERENTIAL mode to work,
though at the cost of being unable to distinguish truly duplicate rows (rows with identical values in all columns).
-- No primary key — pg_trickle uses content hashing for row identity
CREATE TABLE events (ts TIMESTAMPTZ, payload JSONB);
SELECT pgtrickle.create_stream_table(
name => 'event_summary',
query => 'SELECT payload->>''type'' AS event_type, COUNT(*) FROM events GROUP BY 1',
schedule => '1m'
);
Known Limitation — Duplicate Rows in Keyless Tables (G7.1)
When a keyless table contains exact duplicate rows (identical values in every column), content-based hashing produces the same __pgt_row_id for each copy. Consequences:
- INSERT of a duplicate row may appear as a no-op (the hash already exists in the stream table).
- DELETE of one copy may delete all copies (the MERGE matches on __pgt_row_id, hitting every duplicate).
- Aggregate counts over keyless tables with duplicates may drift from the true query result.
Recommendation: Add a PRIMARY KEY or at least a UNIQUE constraint to source tables used in DIFFERENTIAL mode. This eliminates the ambiguity entirely. If duplicates are expected and correctness matters, use FULL refresh mode, which always recomputes from scratch.
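Following that recommendation, one low-impact fix for an append-only table like the events example above is a synthetic identity key (a sketch; adjust names for your schema):

```sql
-- Give the keyless table an unambiguous row identity for DIFFERENTIAL mode.
ALTER TABLE events
  ADD COLUMN event_id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY;
```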
Volatile Function Detection
pg_trickle checks all functions and operators in the defining query against pg_proc.provolatile:
- VOLATILE functions (e.g., random(), clock_timestamp(), gen_random_uuid()) are rejected in DIFFERENTIAL and IMMEDIATE modes because they produce different results on each evaluation, breaking delta correctness.
- VOLATILE operators — custom operators backed by volatile functions are also detected. The check resolves the operator's implementation function via pg_operator.oprcode and checks its volatility in pg_proc.
- STABLE functions (e.g., now(), current_timestamp, current_setting()) produce a warning in DIFFERENTIAL and IMMEDIATE modes — they are consistent within a single refresh but may differ between refreshes.
- IMMUTABLE functions are always safe and produce no warnings.
FULL mode accepts all volatility classes since it re-evaluates the entire query each time.
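The volatility class the checker consults is visible in the system catalogs. For instance, you can inspect how PostgreSQL classifies a few familiar functions:

```sql
-- provolatile: 'v' = VOLATILE, 's' = STABLE, 'i' = IMMUTABLE
SELECT proname, provolatile
FROM pg_proc
WHERE proname IN ('random', 'now', 'abs')
ORDER BY proname;
```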
Volatile Function Policy (VOL-1)
The pg_trickle.volatile_function_policy GUC controls how volatile functions are handled:
| Value | Behavior |
|---|---|
reject (default) | ERROR — volatile functions are rejected at creation time. |
warn | WARNING emitted but creation proceeds. Delta correctness is not guaranteed. |
allow | Silent — no warning or error. Use when you understand the implications. |
-- Allow volatile functions with a warning
SET pg_trickle.volatile_function_policy = 'warn';
-- Allow volatile functions silently
SET pg_trickle.volatile_function_policy = 'allow';
-- Restore default (reject volatile functions)
SET pg_trickle.volatile_function_policy = 'reject';
COLLATE Expressions
COLLATE clauses on expressions are supported:
SELECT pgtrickle.create_stream_table(
name => 'sorted_names',
query => 'SELECT name COLLATE "C" AS c_name FROM users',
schedule => '1m'
);
IS JSON Predicate (PostgreSQL 16+)
The IS JSON predicate validates whether a value is valid JSON. All variants are supported:
-- Filter rows with valid JSON
SELECT pgtrickle.create_stream_table(
name => 'valid_json_events',
query => 'SELECT id, payload FROM events WHERE payload::text IS JSON',
schedule => '1m'
);
-- Type-specific checks
SELECT pgtrickle.create_stream_table(
name => 'json_objects_only',
query => 'SELECT id, data IS JSON OBJECT AS is_obj,
data IS JSON ARRAY AS is_arr,
data IS JSON SCALAR AS is_scalar
FROM json_data',
schedule => '1m',
refresh_mode => 'FULL'
);
Supported variants: IS JSON, IS JSON OBJECT, IS JSON ARRAY, IS JSON SCALAR, IS NOT JSON (all forms), WITH UNIQUE KEYS.
SQL/JSON Constructors (PostgreSQL 16+)
SQL-standard JSON constructor functions are supported in both FULL and DIFFERENTIAL modes:
-- JSON_OBJECT: construct a JSON object from key-value pairs
SELECT pgtrickle.create_stream_table(
name => 'user_json',
query => 'SELECT id, JSON_OBJECT(''name'' : name, ''age'' : age) AS data FROM users',
schedule => '1m'
);
-- JSON_ARRAY: construct a JSON array from values
SELECT pgtrickle.create_stream_table(
name => 'value_arrays',
query => 'SELECT id, JSON_ARRAY(a, b, c) AS arr FROM measurements',
schedule => '1m',
refresh_mode => 'FULL'
);
-- JSON(): parse a text value as JSON
-- JSON_SCALAR(): wrap a scalar value as JSON
-- JSON_SERIALIZE(): serialize a JSON value to text
Note: JSON_ARRAYAGG() and JSON_OBJECTAGG() are SQL-standard aggregate functions fully recognized by the DVM engine. In DIFFERENTIAL mode, they use the group-rescan strategy (affected groups are re-aggregated from source data). The full deparsed SQL is preserved to handle the special key : value, ABSENT ON NULL, ORDER BY, and RETURNING clause syntax.
JSON_TABLE (PostgreSQL 17+)
JSON_TABLE() generates a relational table from JSON data. It is supported in the FROM clause in both FULL and DIFFERENTIAL modes. Internally, it is modeled as a LateralFunction.
-- Extract structured data from a JSON column
SELECT pgtrickle.create_stream_table(
name => 'user_phones',
query => $$SELECT u.id, j.phone_type, j.phone_number
FROM users u,
JSON_TABLE(u.contact_info, '$.phones[*]'
COLUMNS (
phone_type TEXT PATH '$.type',
phone_number TEXT PATH '$.number'
)
) AS j$$,
schedule => '1m'
);
Supported column types:
- Regular columns — name TYPE PATH '$.path' (with optional ON ERROR / ON EMPTY behaviors)
- EXISTS columns — name TYPE EXISTS PATH '$.path'
- Formatted columns — name TYPE FORMAT JSON PATH '$.path'
- Nested columns — NESTED PATH '$.path' COLUMNS (...)
The PASSING clause is also supported for passing named variables to path expressions.
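A sketch of the PASSING clause, using a hypothetical docs table with a JSONB body column; the named variable min is referenced inside the jsonpath filter as $min:

```sql
SELECT d.id, jt.sku, jt.qty
FROM docs d,
     JSON_TABLE(d.body, '$.items[*] ? (@.qty > $min)'
       PASSING 10 AS min
       COLUMNS (
         sku TEXT PATH '$.sku',
         qty INT  PATH '$.qty'
       )
     ) AS jt;
```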
Unsupported Expression Types
The following are rejected with clear error messages rather than producing broken SQL:
| Expression | Error Behavior | Suggested Rewrite |
|---|---|---|
TABLESAMPLE | Rejected — stream tables materialize the complete result set | Use WHERE random() < 0.1 if sampling is needed |
FOR UPDATE / FOR SHARE | Rejected — stream tables do not support row-level locking | Remove the locking clause |
| Unknown node types | Rejected with type information | — |
Note: Window functions inside expressions (e.g.,
CASE WHEN ROW_NUMBER() OVER (...) ...) were unsupported in earlier versions but are now automatically rewritten — see Auto-Rewrite Pipeline § Window Functions in Expressions.
Restrictions & Interoperability
Stream tables are standard PostgreSQL heap tables stored in the pgtrickle schema with an additional __pgt_row_id BIGINT PRIMARY KEY column managed by the refresh engine. This section describes what you can and cannot do with them.
Referencing Other Stream Tables
Stream tables can reference other stream tables in their defining query. This creates a dependency edge in the internal DAG, and the scheduler refreshes upstream tables before downstream ones. By default, cycles are detected and rejected at creation time.
When pg_trickle.allow_circular = true, circular dependencies are allowed for stream tables that use DIFFERENTIAL refresh mode and have monotone defining queries (no aggregates, EXCEPT, window functions, or NOT EXISTS/NOT IN). Cycle members are assigned an scc_id and the scheduler iterates them to a fixed point. Non-monotone operators are rejected because they prevent convergence.
-- ST1 reads from a base table
SELECT pgtrickle.create_stream_table(
name => 'order_totals',
query => 'SELECT customer_id, SUM(amount) AS total FROM orders GROUP BY customer_id',
schedule => '1m'
);
-- ST2 reads from ST1
SELECT pgtrickle.create_stream_table(
name => 'big_customers',
query => 'SELECT customer_id, total FROM pgtrickle.order_totals WHERE total > 1000',
schedule => '1m'
);
Views as Sources in Defining Queries
PostgreSQL views can be used as source tables in a stream table's defining query. Views are automatically inlined — replaced with their underlying SELECT definition as subqueries — so CDC triggers land on the actual base tables.
CREATE VIEW active_orders AS
SELECT * FROM orders WHERE status = 'active';
-- This works (views are auto-inlined):
SELECT pgtrickle.create_stream_table(
name => 'order_summary',
query => 'SELECT customer_id, COUNT(*) FROM active_orders GROUP BY customer_id',
schedule => '1m'
);
-- Internally, 'active_orders' is replaced with:
-- (SELECT ... FROM orders WHERE status = 'active') AS active_orders
Nested views (view → view → table) are fully expanded via a fixpoint loop. Column-renaming views (CREATE VIEW v(a, b) AS ...) work correctly — pg_get_viewdef() produces the proper column aliases.
When a view is inlined, the user's original SQL is stored in the original_query catalog column for reinit and introspection. The defining_query column contains the expanded (post-inlining) form.
DDL hooks: CREATE OR REPLACE VIEW on a view that was inlined into a stream table marks that ST for reinit. DROP VIEW sets affected STs to ERROR status.
Materialized views are rejected in DIFFERENTIAL mode — their stale-snapshot semantics prevent CDC triggers from tracking changes. Use the underlying query directly, or switch to FULL mode. In FULL mode, materialized views are allowed (no CDC needed).
Foreign tables are rejected in DIFFERENTIAL mode — row-level triggers cannot be created on foreign tables. Use FULL mode instead.
Partitioned Tables as Sources
Partitioned tables are fully supported as source tables in both FULL and DIFFERENTIAL modes. CDC triggers are installed on the partitioned parent table, and PostgreSQL 13+ ensures the trigger fires for all DML routed to child partitions. The change buffer uses the parent table's OID (pgtrickle_changes.changes_<parent_oid>).
CREATE TABLE orders (
id INT, region TEXT, amount NUMERIC
) PARTITION BY LIST (region);
CREATE TABLE orders_us PARTITION OF orders FOR VALUES IN ('US');
CREATE TABLE orders_eu PARTITION OF orders FOR VALUES IN ('EU');
-- Works — inserts into any partition are captured:
SELECT pgtrickle.create_stream_table(
name => 'order_summary',
query => 'SELECT region, SUM(amount) FROM orders GROUP BY region',
schedule => '1m'
);
ATTACH PARTITION detection: When a new partition is attached to a tracked
source table via ALTER TABLE parent ATTACH PARTITION child ..., pg_trickle's
DDL event trigger detects the change in partition structure and automatically
marks affected stream tables for reinitialize. This ensures pre-existing rows
in the newly attached partition are included on the next refresh. DETACH
PARTITION is also detected and triggers reinitialization.
WAL mode: When using WAL-based CDC (cdc_mode = 'wal'), publications for
partitioned source tables are created with publish_via_partition_root = true.
This ensures changes from child partitions are published under the parent
table's identity, matching trigger-mode CDC behavior.
Note: pg_trickle targets PostgreSQL 18. On PostgreSQL 12 or earlier (not supported), parent triggers do not fire for partition-routed rows, which would cause silent data loss.
Foreign Tables as Sources
Foreign tables (via postgres_fdw or other FDWs) can be used as stream table
sources with these constraints:
| CDC Method | Supported? | Why |
|---|---|---|
| Trigger-based | ❌ No | Foreign tables don't support row-level triggers |
| WAL-based | ❌ No | Foreign tables don't generate local WAL entries |
| FULL refresh | ✅ Yes | Re-executes the remote query each cycle |
| Polling-based | ✅ Yes | When pg_trickle.foreign_table_polling = on |
-- Foreign table source — FULL refresh only
SELECT pgtrickle.create_stream_table(
name => 'remote_summary',
query => 'SELECT region, SUM(amount) FROM remote_orders GROUP BY region',
schedule => '5m',
refresh_mode => 'FULL'
);
When pg_trickle detects a foreign table source, it emits an INFO message explaining the constraints. If you attempt to use DIFFERENTIAL mode without polling enabled, creation succeeds but every refresh falls back to FULL.
Polling-based CDC creates a local snapshot table and computes EXCEPT ALL
differences on each refresh. Enable with:
SET pg_trickle.foreign_table_polling = on;
For a complete step-by-step setup guide, see the Foreign Table Sources tutorial.
IMMEDIATE Mode Query Restrictions
The 'IMMEDIATE' refresh mode supports nearly all SQL constructs supported by 'DIFFERENTIAL' and 'FULL' modes. Queries are validated at stream table creation and when switching to IMMEDIATE mode via alter_stream_table.
Supported in IMMEDIATE mode:
- Simple SELECT ... FROM table scans, filters, projections
- JOIN (INNER, LEFT, FULL OUTER)
- GROUP BY with standard aggregates (COUNT, SUM, AVG, MIN, MAX, etc.)
- DISTINCT
- Non-recursive WITH (CTEs)
- UNION ALL, INTERSECT, EXCEPT
- EXISTS / IN subqueries (SemiJoin, AntiJoin)
- Subqueries in FROM
- Window functions (ROW_NUMBER, RANK, DENSE_RANK, etc.)
- LATERAL subqueries
- LATERAL set-returning functions (unnest(), jsonb_array_elements(), etc.)
- Scalar subqueries in SELECT
- Cascading IMMEDIATE stream tables (ST depending on another IMMEDIATE ST)
- Recursive CTEs (WITH RECURSIVE) — uses semi-naive evaluation (INSERT-only) or Delete-and-Rederive (DELETE/UPDATE); bounded by pg_trickle.ivm_recursive_max_depth (default 100) to guard against infinite loops from cyclic data
Not yet supported in IMMEDIATE mode:
None — all constructs that work in 'DIFFERENTIAL' mode are now also available in
'IMMEDIATE' mode.
Notes on WITH RECURSIVE in IMMEDIATE mode:
- A __pgt_depth counter is injected into the generated semi-naive SQL. Propagation stops when the counter reaches ivm_recursive_max_depth (default 100). Raise this GUC for deeper hierarchies or set it to 0 to disable the guard.
- A WARNING is emitted at stream table creation time reminding operators to monitor for stack depth limit exceeded errors on very deep hierarchies.
- Non-linear recursion (multiple self-references) is rejected — PostgreSQL itself enforces this restriction.
Attempting to create a stream table with an unsupported construct produces a clear error message.
Logical Replication Targets
Tables that receive data via logical replication require special consideration. Changes arriving via replication do not fire normal row-level triggers, which means CDC triggers will miss those changes.
pg_trickle emits a WARNING at stream table creation time if any source table is detected as a logical replication target (via pg_subscription_rel).
Workarounds:
- Use cdc_mode = 'wal' for WAL-based CDC that captures all changes regardless of origin.
- Use FULL refresh mode, which recomputes entirely from the current table state.
- Set a frequent refresh schedule with FULL mode to limit staleness.
Views on Stream Tables
PostgreSQL views can reference stream tables. The view reflects the data as of the most recent refresh.
CREATE VIEW top_customers AS
SELECT customer_id, total
FROM pgtrickle.order_totals
WHERE total > 500
ORDER BY total DESC;
Materialized Views on Stream Tables
Materialized views can reference stream tables, though this is typically redundant (both are physical snapshots of a query). The materialized view requires its own REFRESH MATERIALIZED VIEW — it does not auto-refresh when the stream table refreshes.
Logical Replication of Stream Tables
Stream tables can be published for logical replication like any ordinary table:
-- On publisher
CREATE PUBLICATION my_pub FOR TABLE pgtrickle.order_totals;
-- On subscriber
CREATE SUBSCRIPTION my_sub
CONNECTION 'host=... dbname=...'
PUBLICATION my_pub;
Caveats:
- The __pgt_row_id column is replicated (it is the primary key), which is an internal implementation detail.
- The subscriber receives materialized data, not the defining query. Refreshes on the publisher propagate as normal DML via logical replication.
- Do not install pg_trickle on the subscriber and attempt to refresh the replicated table — it will have no CDC triggers or catalog entries.
- The internal change buffer tables (pgtrickle_changes.changes_<oid>) and catalog tables are not published by default; subscribers only receive the final output.
Known Delta Computation Limitations
The following edge cases produce incorrect delta results in DIFFERENTIAL mode under specific data mutation patterns. They have no effect on FULL mode.
JOIN Key Column Change + Simultaneous Right-Side Delete — Fixed (EC-01)
Resolved in v0.14.0. This limitation no longer exists — the delta query now uses a pre-change right snapshot (R₀) for DELETE deltas, ensuring stale rows are correctly removed even when the join partner is simultaneously deleted.
The fix splits Part 1 of the JOIN delta into two arms:
- Part 1a (inserts): ΔL_inserts ⋈ R₁ — uses the current right state
- Part 1b (deletes): ΔL_deletes ⋈ R₀ — uses the pre-change right state
R₀ is reconstructed as R_current EXCEPT ALL ΔR_inserts UNION ALL ΔR_deletes (or via
NOT EXISTS anti-join for simple Scan nodes). This ensures the DELETE half always
finds the old join partner, even if that partner was deleted in the same cycle.
The fix applies to INNER JOIN, LEFT JOIN, and FULL OUTER JOIN delta operators. See DVM_OPERATORS.md for implementation details.
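In generated-SQL terms, the R₀ reconstruction reads roughly as follows; the table names are stand-ins for the current right side and its per-cycle change buffers:

```sql
-- R₀ = current right state minus this cycle's inserts, plus its deletes.
-- EXCEPT ALL and UNION ALL associate left-to-right, so this computes
-- (right_current EXCEPT ALL delta_r_inserts) UNION ALL delta_r_deletes.
SELECT * FROM right_current
EXCEPT ALL
SELECT * FROM delta_r_inserts
UNION ALL
SELECT * FROM delta_r_deletes;
```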
CUBE/ROLLUP Expansion Limit
CUBE(a, b, c, ...) on N columns generates $2^N$ grouping set branches (a UNION ALL of $2^N$ queries).
pg_trickle rejects CUBE/ROLLUP that would produce more than 64 branches to prevent runaway
memory usage during query generation. Use explicit GROUPING SETS(...) instead:
-- Rejected: CUBE(a, b, c, d, e, f, g) would generate 128 branches
-- Use instead:
SELECT pgtrickle.create_stream_table(
name => 'multi_dim',
query => 'SELECT a, b, c, SUM(v) FROM t
GROUP BY GROUPING SETS ((a, b, c), (a, b), (a), ())',
schedule => '5m'
);
What Is NOT Allowed
| Operation | Restriction | Reason |
|---|---|---|
Direct DML (INSERT, UPDATE, DELETE) | ❌ Not supported | Stream table contents are managed exclusively by the refresh engine. |
Direct DDL (ALTER TABLE) | ❌ Not supported | Use pgtrickle.alter_stream_table() to change the defining query or schedule. |
| Foreign keys referencing or from a stream table | ❌ Not supported | The refresh engine performs bulk MERGE operations that do not respect FK ordering. |
| User-defined triggers on stream tables | ✅ Supported (DIFFERENTIAL) | In DIFFERENTIAL mode, the refresh engine decomposes changes into explicit DELETE + UPDATE + INSERT statements so triggers fire with correct TG_OP, OLD, and NEW. Row-level triggers are suppressed during FULL refresh. Controlled by pg_trickle.user_triggers GUC (default: auto). |
TRUNCATE on a stream table | ❌ Not supported | Use pgtrickle.refresh_stream_table() to reset data. |
Tip: The __pgt_row_id column is visible but should be ignored by consuming queries — it is an implementation detail used for delta MERGE operations.
Internal __pgt_* Auxiliary Columns
Stream tables may contain additional hidden columns whose names begin with __pgt_. These are managed exclusively by the refresh engine — they are not part of the user-visible schema and should never be read or written by application queries.
__pgt_row_id — Row identity (always present)
Every stream table has a BIGINT PRIMARY KEY column named __pgt_row_id. It is a content hash of all output columns (xxHash3-128 with Fibonacci-mixing of multiple column hashes), updated by the refresh engine on every MERGE. It is used as the MERGE join key to detect inserts/updates/deletes.
__pgt_count — Group multiplicity (aggregates & DISTINCT)
Added when the defining query contains GROUP BY, DISTINCT, UNION ALL ... GROUP BY, or any aggregate expression that requires tracking how many source rows contribute to each output row.
| Type | Triggers |
|---|---|
BIGINT NOT NULL DEFAULT 0 | GROUP BY, DISTINCT, COUNT(*), SUM(...), AVG(...), STDDEV(...), VAR(...), UNION deduplication |
__pgt_count_l / __pgt_count_r — Dual multiplicity (INTERSECT / EXCEPT)
Added when the defining query contains INTERSECT or EXCEPT. Stores independently the left-branch and right-branch row counts for Z-set delta algebra.
| Type | Triggers |
|---|---|
BIGINT NOT NULL DEFAULT 0 each | INTERSECT, INTERSECT ALL, EXCEPT, EXCEPT ALL |
__pgt_aux_sum_<alias> / __pgt_aux_count_<alias> — Running totals for AVG
Pairs of auxiliary columns added for each AVG(expr) in the query. Instead of recomputing the average from scratch on each delta, the refresh engine maintains a running sum and count and derives the average algebraically.
| Type | Triggers |
|---|---|
NUMERIC NOT NULL DEFAULT 0 (sum), BIGINT NOT NULL DEFAULT 0 (count) | Any AVG(expr) in GROUP BY query |
Named __pgt_aux_sum_<output_alias> and __pgt_aux_count_<output_alias>, where <output_alias> is the column alias for the AVG expression in the SELECT list.
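Conceptually, after applying a delta to the accumulators, the engine derives the visible column algebraically, roughly like this (avg_amt is a hypothetical output alias following the naming scheme above):

```sql
-- AVG is never re-scanned: it is recomputed from the maintained
-- running sum and count (NULL when the group has no rows).
SELECT __pgt_aux_sum_avg_amt
         / NULLIF(__pgt_aux_count_avg_amt, 0) AS avg_amt
FROM pgtrickle.order_stats;
```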
__pgt_aux_sum2_<alias> — Sum-of-squares for STDDEV / VARIANCE
Added alongside the sum/count pair when the query contains STDDEV, STDDEV_POP, STDDEV_SAMP, VARIANCE, VAR_POP, or VAR_SAMP. Enables O(1) algebraic computation of variance from the sum-of-squares identity Var(X) = E[X²] − E[X]².
| Type | Triggers |
|---|---|
NUMERIC NOT NULL DEFAULT 0 | STDDEV(...), STDDEV_POP(...), STDDEV_SAMP(...), VARIANCE(...), VAR_POP(...), VAR_SAMP(...) |
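With the sum, count, and sum-of-squares accumulators, population variance follows directly from Var(X) = E[X²] − E[X]². A hedged sketch (the alias v is hypothetical):

```sql
-- var_pop = sum2/n - (sum/n)^2, with n taken from the companion count
SELECT (__pgt_aux_sum2_v / __pgt_aux_count_v)
       - POWER(__pgt_aux_sum_v / __pgt_aux_count_v, 2) AS var_pop
FROM pgtrickle.stats_st
WHERE __pgt_aux_count_v > 0;
```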
__pgt_aux_sumx_* / __pgt_aux_sumy_* / __pgt_aux_sumxy_* / __pgt_aux_sumx2_* / __pgt_aux_sumy2_* — Cross-product accumulators for regression aggregates
Five auxiliary columns per aggregate, used for O(1) algebraic maintenance of the twelve PostgreSQL regression and correlation aggregates.
| Type | Triggers |
|---|---|
NUMERIC NOT NULL DEFAULT 0 (five columns per aggregate) | CORR(Y,X), COVAR_POP(Y,X), COVAR_SAMP(Y,X), REGR_AVGX(Y,X), REGR_AVGY(Y,X), REGR_COUNT(Y,X), REGR_INTERCEPT(Y,X), REGR_R2(Y,X), REGR_SLOPE(Y,X), REGR_SXX(Y,X), REGR_SXY(Y,X), REGR_SYY(Y,X) |
The five columns are named with base prefix __pgt_aux_<kind>_<output_alias> where <kind> is sumx, sumy, sumxy, sumx2, or sumy2. The shared group count is stored in the companion __pgt_aux_count_<output_alias> column.
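For example, REGR_SLOPE can be derived from the accumulators plus the shared count with no rescan (the alias fit is hypothetical; column names follow the documented scheme):

```sql
-- slope = (n*sum(xy) - sum(x)*sum(y)) / (n*sum(x^2) - sum(x)^2)
SELECT (__pgt_aux_count_fit * __pgt_aux_sumxy_fit
          - __pgt_aux_sumx_fit * __pgt_aux_sumy_fit)
       / NULLIF(__pgt_aux_count_fit * __pgt_aux_sumx2_fit
          - POWER(__pgt_aux_sumx_fit, 2), 0) AS regr_slope
FROM pgtrickle.regression_st;
```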
__pgt_aux_nonnull_<alias> — Non-NULL count for SUM + FULL OUTER JOIN
Added when the query contains SUM(expr) inside a FULL OUTER JOIN aggregate. When matched rows transition to unmatched (null-padded), standard algebraic SUM would produce 0 instead of NULL. This counter tracks how many non-NULL argument values exist in each group; when it reaches zero the SUM is definitively NULL without a full rescan.
| Type | Triggers |
|---|---|
BIGINT NOT NULL DEFAULT 0 | SUM(expr) in a query with FULL OUTER JOIN at the top level |
__pgt_wf_<N> — Window function lift-out (query rewrite)
Added at query-rewrite time (before storage table creation) when the defining query contains window functions embedded inside larger expressions (e.g. CASE WHEN ROW_NUMBER() OVER (...) = 1 THEN ...). The engine lifts the window function to a synthetic inner-subquery column so the outer SELECT can reference it by alias.
| Type | Triggers |
|---|---|
| Inherits the window-function return type | Window function inside expression — e.g. RANK(), ROW_NUMBER(), DENSE_RANK(), LAG(), LEAD(), etc. |
__pgt_depth — Recursion depth counter (recursive CTE)
Present only inside the DVM-generated SQL for recursive CTE queries. Used to limit unbounded recursion in semi-naive evaluation. Not added as a permanent column to the storage table.
Rule of thumb: Unless you see an ALTER TABLE query mentioning one of these columns, they are transparent to consuming queries. Never SELECT __pgt_* columns in application code — their names, types, and presence may change across minor versions.
Row-Level Security (RLS)
Stream tables follow the same RLS model as PostgreSQL's built-in
MATERIALIZED VIEW: the refresh always materializes the full, unfiltered
result set. Access control is applied at read time via RLS policies on the
stream table itself.
How It Works
| Area | Behavior |
|---|---|
| RLS on source tables | Ignored during refresh. The scheduler runs as superuser; manual refresh_stream_table() and IMMEDIATE-mode triggers bypass RLS via SET LOCAL row_security = off / SECURITY DEFINER. The stream table always contains all rows. |
| RLS on the stream table | Works naturally. Enable RLS and create policies on the stream table to filter reads per role — exactly as you would on any regular table. |
| RLS policy changes on source tables | CREATE POLICY, ALTER POLICY, and DROP POLICY on a source table are detected by pg_trickle's DDL event trigger and mark the stream table for reinitialisation. |
| ENABLE/DISABLE RLS on source tables | ALTER TABLE … ENABLE ROW LEVEL SECURITY and DISABLE ROW LEVEL SECURITY on a source table mark the stream table for reinitialisation. |
| Change buffer tables | RLS is explicitly disabled on all change buffer tables (pgtrickle_changes.changes_*) so CDC trigger inserts always succeed regardless of schema-level RLS settings. |
| IMMEDIATE mode | IVM trigger functions are SECURITY DEFINER with a locked search_path, so the delta query always sees all rows. The DML issued by the calling user is still filtered by that user's RLS policies on the source table — only the stream table maintenance runs with elevated privileges. |
Recommended Pattern: RLS on the Stream Table
-- 1. Create a stream table (materializes all rows)
SELECT pgtrickle.create_stream_table(
name => 'order_totals',
query => 'SELECT tenant_id, SUM(amount) AS total FROM orders GROUP BY tenant_id'
);
-- 2. Enable RLS on the stream table
ALTER TABLE pgtrickle.order_totals ENABLE ROW LEVEL SECURITY;
-- 3. Create per-tenant policies
CREATE POLICY tenant_isolation ON pgtrickle.order_totals
USING (tenant_id = current_setting('app.tenant_id')::INT);
-- 4. Each role sees only its own rows
SET app.tenant_id = '42';
SELECT * FROM pgtrickle.order_totals; -- only tenant 42's rows
Note: This is identical to how you would apply RLS to a regular
MATERIALIZED VIEW. One stream table serves all tenants; per-tenant filtering happens at query time with zero storage duplication.
Views
pgtrickle.stream_tables_info
Status overview with computed staleness information.
SELECT * FROM pgtrickle.stream_tables_info;
Columns include all pgtrickle.pgt_stream_tables columns plus:
| Column | Type | Description |
|---|---|---|
staleness | interval | now() - last_refresh_at |
stale | bool | true when the scheduler itself is behind (last_refresh_at age exceeds schedule); false when the scheduler is healthy even if source tables have had no writes |
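For example, to surface the tables the scheduler has fallen behind on:

```sql
-- Stream tables whose last refresh is older than their schedule allows
SELECT pgt_schema, pgt_name, staleness
FROM pgtrickle.stream_tables_info
WHERE stale
ORDER BY staleness DESC;
```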
pgtrickle.pg_stat_stream_tables
Comprehensive monitoring view combining catalog metadata with aggregate refresh statistics.
SELECT * FROM pgtrickle.pg_stat_stream_tables;
Key columns:
| Column | Type | Description |
|---|---|---|
pgt_id | bigint | Stream table ID |
pgt_schema / pgt_name | text | Schema and name |
status | text | INITIALIZING, ACTIVE, SUSPENDED, ERROR |
refresh_mode | text | FULL or DIFFERENTIAL |
data_timestamp | timestamptz | Timestamp of last refresh |
staleness | interval | now() - last_refresh_at |
stale | bool | true when the scheduler is behind its schedule; false when the scheduler is healthy (quiet source tables do not count as stale) |
total_refreshes | bigint | Total refresh count |
successful_refreshes | bigint | Successful refresh count |
failed_refreshes | bigint | Failed refresh count |
avg_duration_ms | float8 | Average refresh duration |
consecutive_errors | int | Current error streak |
cdc_modes | text[] | Distinct CDC modes across TABLE-type sources (e.g. {wal}, {trigger,wal}, {transitioning,wal}) |
scc_id | int | SCC group identifier for circular dependencies (NULL if not in a cycle) |
last_fixpoint_iterations | int | Number of fixpoint iterations in the last SCC convergence (NULL if not cyclic) |
pgtrickle.quick_health
Single-row health summary for dashboards and alerting. Returns the overall health status of the pg_trickle extension at a glance.
SELECT * FROM pgtrickle.quick_health;
| Column | Type | Description |
|---|---|---|
total_stream_tables | bigint | Total number of stream tables |
error_tables | bigint | Stream tables with status = 'ERROR' or consecutive_errors > 0 |
stale_tables | bigint | Stream tables whose data is older than their schedule interval |
scheduler_running | boolean | Whether a pg_trickle scheduler backend is detected in pg_stat_activity |
status | text | Overall status: EMPTY, OK, WARNING, or CRITICAL |
Status values:
- EMPTY — No stream tables exist.
- OK — All stream tables are healthy and up-to-date.
- WARNING — Some tables have errors or are stale.
- CRITICAL — At least one stream table is SUSPENDED.
pgtrickle.pgt_cdc_status
Convenience view for inspecting the CDC mode and WAL slot state of every TABLE-type source for all stream tables. Useful for monitoring in-progress TRIGGER→WAL transitions.
SELECT * FROM pgtrickle.pgt_cdc_status;
| Column | Type | Description |
|---|---|---|
pgt_schema | text | Schema of the stream table |
pgt_name | text | Name of the stream table |
source_relid | oid | OID of the source table |
source_name | text | Name of the source table |
source_schema | text | Schema of the source table |
cdc_mode | text | Current CDC mode: trigger, transitioning, or wal |
slot_name | text | Replication slot name (NULL for trigger mode) |
decoder_confirmed_lsn | pg_lsn | Last WAL position decoded (NULL for trigger mode) |
transition_started_at | timestamptz | When the trigger→WAL transition began (NULL if not transitioning) |
Subscribe to the pgtrickle_cdc_transition NOTIFY channel to receive real-time
events when a source moves between CDC modes (payload is a JSON object with
source_oid, from, and to fields).
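For example, from psql or any driver that supports asynchronous notifications:

```sql
-- Receive CDC mode-transition events; each payload is a JSON object
-- with source_oid, from, and to fields, as described above.
LISTEN pgtrickle_cdc_transition;
```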
Catalog Tables
pgtrickle.pgt_stream_tables
Core metadata for each stream table.
| Column | Type | Description |
|---|---|---|
pgt_id | bigserial | Primary key |
pgt_relid | oid | OID of the storage table |
pgt_name | text | Table name |
pgt_schema | text | Schema name |
defining_query | text | The SQL query that defines the ST |
original_query | text | The user-supplied query before normalization |
schedule | text | Refresh schedule (duration or cron expression) |
refresh_mode | text | FULL, DIFFERENTIAL, or IMMEDIATE |
status | text | INITIALIZING, ACTIVE, SUSPENDED, ERROR |
is_populated | bool | Whether the table has been populated |
data_timestamp | timestamptz | Timestamp of the data in the ST |
frontier | jsonb | Per-source LSN positions (version tracking) |
last_refresh_at | timestamptz | When last refreshed |
consecutive_errors | int | Current error streak count |
needs_reinit | bool | Whether upstream DDL requires reinitialization |
auto_threshold | double precision | Per-ST adaptive fallback threshold (overrides GUC) |
last_full_ms | double precision | Last FULL refresh duration in milliseconds |
functions_used | text[] | Function names used in the defining query (for DDL tracking) |
topk_limit | int | LIMIT value for TopK stream tables (NULL if not TopK) |
topk_order_by | text | ORDER BY clause SQL for TopK stream tables |
topk_offset | int | OFFSET value for paged TopK queries (NULL if not paged) |
diamond_consistency | text | Diamond consistency mode: none or atomic |
diamond_schedule_policy | text | Diamond schedule policy: fastest or slowest |
has_keyless_source | bool | Whether any source table lacks a PRIMARY KEY (EC-06) |
function_hashes | text | MD5 hashes of referenced function bodies for change detection (EC-16) |
scc_id | int | SCC group identifier for circular dependencies (NULL if not in a cycle) |
last_fixpoint_iterations | int | Number of iterations in the last SCC fixpoint convergence (NULL if never iterated) |
created_at | timestamptz | Creation timestamp |
updated_at | timestamptz | Last modification timestamp |
pgtrickle.pgt_dependencies
DAG edges — records which source tables each ST depends on, including CDC mode metadata.
| Column | Type | Description |
|---|---|---|
pgt_id | bigint | FK to pgt_stream_tables |
source_relid | oid | OID of the source table |
source_type | text | TABLE, STREAM_TABLE, VIEW, MATVIEW, or FOREIGN_TABLE |
columns_used | text[] | Which columns are referenced |
column_snapshot | jsonb | Snapshot of source column metadata at creation time |
schema_fingerprint | text | SHA-256 fingerprint of column snapshot for fast equality checks |
cdc_mode | text | Current CDC mode: TRIGGER, TRANSITIONING, or WAL |
slot_name | text | Replication slot name (WAL/TRANSITIONING modes) |
decoder_confirmed_lsn | pg_lsn | WAL decoder's last confirmed position |
transition_started_at | timestamptz | When the trigger→WAL transition started |
pgtrickle.pgt_refresh_history
Audit log of all refresh operations.
| Column | Type | Description |
|---|---|---|
refresh_id | bigserial | Primary key |
pgt_id | bigint | FK to pgt_stream_tables |
data_timestamp | timestamptz | Data timestamp of the refresh |
start_time | timestamptz | When the refresh started |
end_time | timestamptz | When it completed |
action | text | NO_DATA, FULL, DIFFERENTIAL, REINITIALIZE, SKIP |
rows_inserted | bigint | Rows inserted |
rows_deleted | bigint | Rows deleted |
delta_row_count | bigint | Number of delta rows processed from change buffers |
merge_strategy_used | text | Which merge strategy was used (e.g. MERGE, DELETE+INSERT) |
was_full_fallback | bool | Whether the refresh fell back to FULL from DIFFERENTIAL |
error_message | text | Error message if failed |
status | text | RUNNING, COMPLETED, FAILED, SKIPPED |
initiated_by | text | What triggered: SCHEDULER, MANUAL, or INITIAL |
freshness_deadline | timestamptz | SLA deadline (duration schedules only; NULL for cron) |
fixpoint_iteration | int | Iteration of the fixed-point loop (NULL for non-cyclic refreshes) |
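For example, to review recent failures:

```sql
-- Ten most recent failed refreshes, newest first
SELECT pgt_id, start_time, action, error_message
FROM pgtrickle.pgt_refresh_history
WHERE status = 'FAILED'
ORDER BY start_time DESC
LIMIT 10;
```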
pgtrickle.pgt_change_tracking
CDC slot tracking per source table.
| Column | Type | Description |
|---|---|---|
source_relid | oid | OID of the tracked source table |
slot_name | text | Logical replication slot name |
last_consumed_lsn | pg_lsn | Last consumed WAL position |
tracked_by_pgt_ids | bigint[] | Array of ST IDs depending on this source |
pgtrickle.pgt_source_gates
Bootstrap source gate registry. One row per source table that has ever been
gated. Only sources with gated = true are actively blocking scheduler
refreshes.
| Column | Type | Description |
|---|---|---|
source_relid | oid | OID of the gated source table (PK) |
gated | boolean | true while the source is gated; false after ungate_source() |
gated_at | timestamptz | When the gate was most recently set |
ungated_at | timestamptz | When the gate was cleared (NULL if still active) |
gated_by | text | Actor that set the gate (e.g. 'gate_source') |
pgtrickle.pgt_refresh_groups
User-declared Cross-Source Snapshot Consistency groups (v0.9.0). A refresh
group guarantees that all member stream tables are refreshed against a snapshot
taken at the same point in time, preventing partial-update visibility (e.g.
orders and order_lines both reflecting the same transaction boundary).
| Column | Type | Description |
|---|---|---|
group_id | serial | Primary key |
group_name | text | Unique human-readable group name |
member_oids | oid[] | OIDs of the stream table storage relations that participate in this group |
isolation | text | Snapshot isolation level for the group: 'read_committed' (default) or 'repeatable_read' |
created_at | timestamptz | When the group was created |
Management API
-- Create a refresh group
SELECT pgtrickle.create_refresh_group(
'orders_snapshot',
ARRAY['public.orders_summary', 'public.order_lines_summary'],
'repeatable_read' -- or 'read_committed' (default)
);
-- List all groups:
SELECT * FROM pgtrickle.refresh_groups();
-- Remove a group:
SELECT pgtrickle.drop_refresh_group('orders_snapshot');
Validation rules:
- At least 2 member stream tables are required.
- All members must exist in pgt_stream_tables.
- No member can appear in more than one refresh group.
- Valid isolation levels: 'read_committed' (default), 'repeatable_read'.
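A sketch of how the first validation rule surfaces in practice (the error wording shown is illustrative, not taken from the extension):

```sql
-- Fails: a refresh group requires at least 2 member stream tables.
SELECT pgtrickle.create_refresh_group(
  'too_small',
  ARRAY['public.orders_summary']
);
-- ERROR (illustrative): refresh group must contain at least 2 stream tables
```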
Bootstrap Source Gating (v0.5.0)
These functions let operators pause and resume scheduler-driven refreshes for individual source tables — useful during large bulk loads or ETL windows.
pgtrickle.gate_source(source TEXT)
Mark a source table as gated. The scheduler will skip any stream table that
reads from this source until ungate_source() is called.
SELECT pgtrickle.gate_source('my_schema.big_source');
Manual refresh_stream_table() calls are not affected by gates.
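Because gates only affect the scheduler, an operator can still force a refresh while a source is gated. A sketch (the stream table name is hypothetical; refresh_stream_table() is the manual-refresh function referenced above):

```sql
-- 'my_schema.big_source' is gated, so the scheduler skips dependents,
-- but a manual refresh still runs:
SELECT pgtrickle.refresh_stream_table('my_schema.big_source_summary');
```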
pgtrickle.ungate_source(source TEXT)
Clear a gate set by gate_source(). After this call the scheduler resumes
normal refresh scheduling for dependent stream tables.
SELECT pgtrickle.ungate_source('my_schema.big_source');
pgtrickle.source_gates()
Table function returning the current gate status for all registered sources.
SELECT * FROM pgtrickle.source_gates();
-- source_table | schema_name | gated | gated_at | ungated_at | gated_by
| Column | Type | Description |
|---|---|---|
source_table | text | Relation name |
schema_name | text | Schema name |
gated | boolean | Whether the source is currently gated |
gated_at | timestamptz | When the gate was set |
ungated_at | timestamptz | When the gate was cleared (NULL if active) |
gated_by | text | Which function set the gate |
Typical workflow
-- 1. Gate the source before a bulk load.
SELECT pgtrickle.gate_source('orders');
-- 2. Load historical data (scheduler sits idle for orders-based STs).
COPY orders FROM '/data/historical_orders.csv';
-- 3. Ungate — the next scheduler tick refreshes everything cleanly.
SELECT pgtrickle.ungate_source('orders');
pgtrickle.bootstrap_gate_status() (v0.6.0)
Rich introspection of bootstrap gate lifecycle. Returns the same columns as
source_gates() plus computed fields for debugging.
SELECT * FROM pgtrickle.bootstrap_gate_status();
-- source_table | schema_name | gated | gated_at | ungated_at | gated_by | gate_duration | affected_stream_tables
| Column | Type | Description |
|---|---|---|
source_table | text | Relation name |
schema_name | text | Schema name |
gated | boolean | Whether the source is currently gated |
gated_at | timestamptz | When the gate was set (updated on re-gate) |
ungated_at | timestamptz | When the gate was cleared (NULL if active) |
gated_by | text | Which function set the gate |
gate_duration | interval | How long the gate has been active (gated: now() - gated_at; ungated: ungated_at - gated_at) |
affected_stream_tables | text | Comma-separated list of stream tables whose scheduler refreshes are blocked by this gate |
Rows are sorted with currently-gated sources first, then alphabetically.
ETL Coordination Cookbook (v0.6.0)
Step-by-step recipes for common bulk-load patterns using source gating.
Recipe 1 — Single Source Bulk Load
Gate one source table during a large data import. The scheduler pauses refreshes for all stream tables that depend on this source.
-- 1. Gate the source before loading.
SELECT pgtrickle.gate_source('orders');
-- 2. Load the data. The scheduler sits idle for orders-dependent STs.
COPY orders FROM '/data/orders_2026.csv' WITH (FORMAT csv, HEADER);
-- 3. Ungate. On the next tick the scheduler refreshes everything cleanly.
SELECT pgtrickle.ungate_source('orders');
Recipe 2 — Coordinated Multi-Source Load
When multiple sources feed into a shared downstream stream table, gate them all before loading so no intermediate refreshes occur.
-- 1. Gate all sources that will be loaded.
SELECT pgtrickle.gate_source('orders');
SELECT pgtrickle.gate_source('order_lines');
-- 2. Load each source (can be parallel, any order).
COPY orders FROM '/data/orders.csv' WITH (FORMAT csv, HEADER);
COPY order_lines FROM '/data/lines.csv' WITH (FORMAT csv, HEADER);
-- 3. Ungate all sources. The scheduler refreshes downstream STs once.
SELECT pgtrickle.ungate_source('orders');
SELECT pgtrickle.ungate_source('order_lines');
Recipe 3 — Gate + Deferred Initialization
Combine gating with initialize => false to prevent incomplete initial
population when sources are loaded asynchronously.
-- 1. Gate sources before creating any stream tables.
SELECT pgtrickle.gate_source('orders');
SELECT pgtrickle.gate_source('order_lines');
-- 2. Create stream tables without initial population.
SELECT pgtrickle.create_stream_table(
'order_summary',
'SELECT region, SUM(amount) FROM orders GROUP BY region',
'1m', initialize => false
);
SELECT pgtrickle.create_stream_table(
'order_report',
'SELECT s.region, s.total, l.line_count
FROM order_summary s
JOIN (SELECT region, COUNT(*) AS line_count FROM order_lines GROUP BY region) l
USING (region)',
'1m', initialize => false
);
-- 3. Run ETL processes (can be in separate transactions).
BEGIN;
COPY orders FROM 's3://warehouse/orders.parquet';
SELECT pgtrickle.ungate_source('orders');
COMMIT;
BEGIN;
COPY order_lines FROM 's3://warehouse/lines.parquet';
SELECT pgtrickle.ungate_source('order_lines');
COMMIT;
-- 4. Once all sources are ungated, the scheduler initializes and refreshes
-- all stream tables in dependency order.
Recipe 4 — Nightly Batch Pattern
For scheduled ETL that runs overnight, gate sources before the batch starts and ungate after the batch completes.
-- Nightly ETL script:
-- Gate all sources that will be refreshed.
SELECT pgtrickle.gate_source('sales');
SELECT pgtrickle.gate_source('inventory');
-- Truncate and reload (or use COPY, INSERT...SELECT, etc.).
TRUNCATE sales;
COPY sales FROM '/data/nightly/sales.csv' WITH (FORMAT csv, HEADER);
TRUNCATE inventory;
COPY inventory FROM '/data/nightly/inventory.csv' WITH (FORMAT csv, HEADER);
-- All data loaded — ungate and let the scheduler handle the rest.
SELECT pgtrickle.ungate_source('sales');
SELECT pgtrickle.ungate_source('inventory');
-- Verify: check the gate status to confirm everything is ungated.
SELECT * FROM pgtrickle.bootstrap_gate_status();
Recipe 5 — Monitoring During a Gated Load
Use bootstrap_gate_status() to monitor progress when streams appear stalled.
-- Check which sources are currently gated and how long they've been paused.
SELECT source_table, gate_duration, affected_stream_tables
FROM pgtrickle.bootstrap_gate_status()
WHERE gated = true;
-- If a gate has been active too long (e.g. ETL failed), ungate manually.
SELECT pgtrickle.ungate_source('stale_source');
Watermark Gating (v0.7.0)
Watermark gating is a scheduling control for ETL pipelines where multiple source tables are populated by separate jobs that finish at different times. Each ETL job declares "I'm done up to timestamp X", and the scheduler waits until all sources in a group are caught up within a configurable tolerance before refreshing downstream stream tables.
Catalog Tables
pgtrickle.pgt_watermarks
Per-source watermark state. One row per source table that has had a watermark advanced.
| Column | Type | Description |
|---|---|---|
source_relid | oid | Source table OID (primary key) |
watermark | timestamptz | Current watermark value |
updated_at | timestamptz | When the watermark was last advanced |
advanced_by | text | User/role that advanced the watermark |
wal_lsn_at_advance | text | WAL LSN at the time of advancement |
pgtrickle.pgt_watermark_groups
Watermark group definitions. Each group declares that a set of sources must be temporally aligned.
| Column | Type | Description |
|---|---|---|
group_id | serial | Auto-generated group ID (primary key) |
group_name | text | Unique group name |
source_relids | oid[] | Array of source table OIDs in the group |
tolerance_secs | float8 | Maximum allowed lag in seconds (default 0) |
created_at | timestamptz | When the group was created |
pgtrickle.pgt_template_cache
Added in v0.16.0. Cross-backend delta SQL template cache (UNLOGGED). Stores compiled delta query templates so new backends skip the ~45 ms DVM parse+differentiate step. Managed automatically — no user interaction required.
| Column | Type | Description |
|---|---|---|
pgt_id | bigint | Stream table ID (PK, FK → pgt_stream_tables) |
query_hash | bigint | Hash of the defining query (staleness detection) |
delta_sql | text | Delta SQL template with LSN placeholder tokens |
columns | text[] | Output column names |
source_oids | integer[] | Source table OIDs |
is_dedup | boolean | Whether the delta is deduplicated per row ID |
key_changed | boolean | Whether __pgt_key_changed column is present |
all_algebraic | boolean | Whether all aggregates are algebraically invertible |
cached_at | timestamptz | When the entry was last populated |
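Although the cache is managed automatically, it can be inspected for debugging. A sketch using only the columns listed above:

```sql
-- Which stream tables have a warm delta-SQL template, and how fresh it is.
SELECT pgt_id, all_algebraic, cached_at
FROM pgtrickle.pgt_template_cache
ORDER BY cached_at DESC;
```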
Functions
pgtrickle.advance_watermark(source TEXT, watermark TIMESTAMPTZ)
Signal that a source table's data is complete through the given timestamp.
- Monotonic: rejects watermarks that go backward (raises error).
- Idempotent: advancing to the same value is a silent no-op.
- Transactional: the watermark is part of the caller's transaction.
SELECT pgtrickle.advance_watermark('orders', '2026-03-01 12:05:00+00');
pgtrickle.create_watermark_group(group_name TEXT, sources TEXT[], tolerance_secs FLOAT8 DEFAULT 0)
Create a watermark group. Requires at least 2 sources.
tolerance_secs: maximum allowed lag between the most-advanced and least-advanced watermarks. The default 0 means strict alignment.
SELECT pgtrickle.create_watermark_group(
'order_pipeline',
ARRAY['orders', 'order_lines'],
0 -- strict alignment (default)
);
pgtrickle.drop_watermark_group(group_name TEXT)
Remove a watermark group by name.
SELECT pgtrickle.drop_watermark_group('order_pipeline');
pgtrickle.watermarks()
Return the current watermark state for all registered sources.
SELECT * FROM pgtrickle.watermarks();
| Column | Type | Description |
|---|---|---|
source_table | text | Source table name |
schema_name | text | Schema name |
watermark | timestamptz | Current watermark value |
updated_at | timestamptz | Last advancement time |
advanced_by | text | User that advanced it |
wal_lsn | text | WAL LSN at advancement |
pgtrickle.watermark_groups()
Return all watermark group definitions.
SELECT * FROM pgtrickle.watermark_groups();
pgtrickle.watermark_status()
Return live alignment status for each watermark group.
SELECT * FROM pgtrickle.watermark_status();
| Column | Type | Description |
|---|---|---|
group_name | text | Group name |
min_watermark | timestamptz | Least-advanced watermark |
max_watermark | timestamptz | Most-advanced watermark |
lag_secs | float8 | Lag in seconds between max and min |
aligned | boolean | Whether lag is within tolerance |
sources_with_watermark | int4 | Number of sources that have a watermark |
sources_total | int4 | Total sources in the group |
Recipes
Recipe 6 — Nightly ETL with Watermarks
-- Create a watermark group for the order pipeline.
SELECT pgtrickle.create_watermark_group(
'order_pipeline',
ARRAY['orders', 'order_lines']
);
-- Nightly ETL job 1: Load orders
BEGIN;
COPY orders FROM '/data/orders_20260301.csv';
SELECT pgtrickle.advance_watermark('orders', '2026-03-01');
COMMIT;
-- Nightly ETL job 2: Load order lines (may run later)
BEGIN;
COPY order_lines FROM '/data/lines_20260301.csv';
SELECT pgtrickle.advance_watermark('order_lines', '2026-03-01');
COMMIT;
-- order_report refreshes on the next tick after both watermarks align.
Recipe 7 — Micro-Batch Tolerance
-- Allow up to 30 seconds of skew between trades and quotes.
SELECT pgtrickle.create_watermark_group(
'realtime_pipeline',
ARRAY['trades', 'quotes'],
30 -- 30-second tolerance
);
-- External process advances watermarks every few seconds.
SELECT pgtrickle.advance_watermark('trades', '2026-03-01 12:00:05+00');
SELECT pgtrickle.advance_watermark('quotes', '2026-03-01 12:00:02+00');
-- Lag is 3s, within 30s tolerance → stream tables refresh normally.
Recipe 8 — Monitoring Watermark Alignment
-- Check which groups are currently misaligned.
SELECT group_name, lag_secs, aligned
FROM pgtrickle.watermark_status()
WHERE NOT aligned;
-- Check individual source watermarks.
SELECT source_table, watermark, updated_at
FROM pgtrickle.watermarks()
ORDER BY watermark;
Stuck Watermark Detection (WM-7, v0.15.0)
When pg_trickle.watermark_holdback_timeout is set to a positive value
(seconds), the scheduler periodically checks all watermark sources. If any
source in a watermark group has not been advanced within the timeout,
downstream stream tables in that group are paused (refresh is skipped)
and a pgtrickle_alert NOTIFY is emitted.
This protects against silent data staleness when an ETL pipeline breaks and stops advancing watermarks; without this guard, stream tables would continue refreshing with stale external data.
Behavior:
- Stuck detection: Every ~60 seconds, the scheduler checks updated_at for all watermark sources. If now() - updated_at > watermark_holdback_timeout, the source is stuck.
- Pause: Any stream table whose source set overlaps a group containing a stuck source is skipped. A SKIP record with "stuck" in the reason is logged to pgt_refresh_history.
- Alert: A pgtrickle_alert NOTIFY with event watermark_stuck is emitted (once per newly-stuck source, not repeated every check cycle).
- Auto-resume: When the stuck watermark is advanced via advance_watermark(), the next scheduler check detects the advancement, lifts the pause, and emits a watermark_resumed event.
Recipe 9 — Stuck Watermark Protection
-- Enable stuck-watermark detection with a 10-minute timeout.
ALTER SYSTEM SET pg_trickle.watermark_holdback_timeout = 600;
SELECT pg_reload_conf();
-- Listen for alerts in a monitoring process.
LISTEN pgtrickle_alert;
-- When the ETL pipeline breaks and stops calling advance_watermark(),
-- the scheduler will start skipping downstream STs after 10 minutes.
-- You'll receive a NOTIFY payload like:
-- {"event":"watermark_stuck","group":"order_pipeline","source_oid":16385,"age_secs":620}
-- When the ETL pipeline recovers and advances the watermark:
SELECT pgtrickle.advance_watermark('orders', '2026-03-02 00:00:00+00');
-- The scheduler automatically resumes, and you'll receive:
-- {"event":"watermark_resumed","source_oid":16385}
Developer Diagnostics (v0.12.0)
Four SQL-callable introspection functions that surface internal DVM state without side-effects. All functions are read-only — they never modify catalog tables or trigger refreshes.
pgtrickle.explain_query_rewrite(query TEXT)
Walk a query through the full DVM rewrite pipeline and report each pass.
Returns one row per rewrite pass. When a pass changes the query, changed = true
and sql_after contains the SQL after the transformation. Two synthetic rows
are appended: topk_detection (detects ORDER BY … LIMIT) and dvm_patterns
(lists detected DVM constructs such as aggregation strategy, join types, and
volatility).
SELECT pass_name, changed, sql_after
FROM pgtrickle.explain_query_rewrite(
'SELECT customer_id, SUM(amount) FROM orders GROUP BY customer_id'
);
Return columns:
| Column | Type | Description |
|---|---|---|
pass_name | text | Rewrite pass name (e.g. view_inlining, distinct_on, grouping_sets) |
changed | bool | Whether this pass modified the query |
sql_after | text | SQL text after this pass (NULL if unchanged) |
Rewrite passes (in order):
| Pass | Description |
|---|---|
view_inlining | Expand view references to their defining SQL |
nested_window_lift | Lift window functions out of expressions (e.g. CASE WHEN ROW_NUMBER() OVER (...) ...) |
distinct_on | Rewrite DISTINCT ON to a ROW_NUMBER() window |
grouping_sets | Expand GROUPING SETS / CUBE / ROLLUP to UNION ALL of GROUP BY |
scalar_subquery_in_where | Rewrite scalar subqueries in WHERE to CROSS JOIN |
correlated_scalar_in_select | Rewrite correlated scalar subqueries in SELECT to LEFT JOIN |
sublinks_in_or_demorgan | Apply De Morgan normalization and expand SubLinks inside OR |
rows_from | Rewrite ROWS FROM() multi-function expressions |
topk_detection | Detect ORDER BY … LIMIT n TopK pattern |
dvm_patterns | Detected DVM constructs: join types, aggregate strategies, volatility |
pgtrickle.diagnose_errors(name TEXT)
Return the last 5 FAILED refresh events for a stream table, with each error classified by type and supplied with a remediation hint.
SELECT event_time, error_type, error_message, remediation
FROM pgtrickle.diagnose_errors('my_stream_table');
Return columns:
| Column | Type | Description |
|---|---|---|
event_time | timestamptz | When the failed refresh started |
error_type | text | Classification: user, schema, correctness, performance, infrastructure |
error_message | text | Raw error text from pgt_refresh_history |
remediation | text | Suggested next step |
Error types:
| Type | Trigger patterns | Typical action |
|---|---|---|
user | query parse error, unsupported operator, type mismatch | Check query; run validate_query() |
schema | upstream table schema changed, upstream table dropped | Reinitialize; check pgt_dependencies |
correctness | phantom, EXCEPT ALL, row count mismatch | Switch to refresh_mode='FULL'; report bug |
performance | lock timeout, deadlock, serialization failure, spill | Tune lock_timeout; enable buffer_partitioning |
infrastructure | permission denied, SPI error, replication slot | Check role grants; verify slot config |
pgtrickle.list_auxiliary_columns(name TEXT)
List all __pgt_* internal columns on a stream table's storage relation,
with an explanation of each column's role.
These columns are normally hidden from SELECT * output. This function
surfaces them for debugging and operator visibility.
SELECT column_name, data_type, purpose
FROM pgtrickle.list_auxiliary_columns('my_stream_table');
Return columns:
| Column | Type | Description |
|---|---|---|
column_name | text | Internal column name (e.g. __pgt_row_id) |
data_type | text | PostgreSQL type (e.g. bigint, text) |
purpose | text | Human-readable description of the column's role |
Common auxiliary columns:
| Column | Purpose |
|---|---|
__pgt_row_id | Row identity hash — MERGE join key for delta application |
__pgt_count | Multiplicity counter for DISTINCT / aggregation / UNION dedup |
__pgt_count_l | Left-side multiplicity for INTERSECT / EXCEPT |
__pgt_count_r | Right-side multiplicity for INTERSECT / EXCEPT |
__pgt_aux_sum_<col> | Running SUM for algebraic AVG maintenance |
__pgt_aux_count_<col> | Running COUNT for algebraic AVG maintenance |
__pgt_aux_sum2_<col> | Sum-of-squares for STDDEV / VAR maintenance |
__pgt_aux_sum{x,y,xy,x2,y2}_<col> | Five-column set for CORR / COVAR / REGR_* |
__pgt_aux_nonnull_<col> | Non-null count for SUM-above-FULL-JOIN maintenance |
pgtrickle.validate_query(query TEXT)
Parse and validate a query through the DVM pipeline without creating a stream table. Returns detected SQL constructs, warnings, and the resolved refresh mode.
SELECT check_name, result, severity
FROM pgtrickle.validate_query(
'SELECT customer_id, COUNT(*) FROM orders GROUP BY customer_id'
);
Return columns:
| Column | Type | Description |
|---|---|---|
check_name | text | Name of the check or detected construct |
result | text | Resolved value or construct description |
severity | text | INFO, WARNING, or ERROR |
The first row always has check_name = 'resolved_refresh_mode' with the mode
that would be assigned under refresh_mode = 'AUTO': DIFFERENTIAL, FULL,
or TOPK.
Common check names:
| Check | Description |
|---|---|
resolved_refresh_mode | DIFFERENTIAL, FULL, or TOPK |
topk_pattern | Detected LIMIT + ORDER BY values |
unsupported_construct | Feature not supported for DIFFERENTIAL mode (→ WARNING) |
matview_or_foreign_table | Query references matview/foreign table (→ WARNING, FULL) |
ivm_support_check | DVM parse result (→ WARNING if DIFFERENTIAL not possible) |
aggregate | Aggregate with strategy: ALGEBRAIC_INVERTIBLE, ALGEBRAIC_VIA_AUX, SEMI_ALGEBRAIC, or GROUP_RESCAN |
join | Detected join type: INNER, LEFT_OUTER, FULL_OUTER, SEMI, ANTI |
set_op | Set operation: DISTINCT, UNION_ALL, INTERSECT, EXCEPT, EXCEPT_ALL |
window_function | Query contains window functions |
scalar_subquery | Query contains scalar subqueries |
lateral | Query contains LATERAL functions or subqueries |
recursive_cte | Query uses WITH RECURSIVE |
volatility | Worst-case volatility of functions used: immutable, stable, volatile |
needs_pgt_count | Multiplicity counter column will be added |
needs_dual_count | Left/right multiplicity counters will be added |
parse_warning | Advisory warning from the DVM parse phase |
Example output for a GROUP_RESCAN query:
SELECT check_name, result, severity
FROM pgtrickle.validate_query(
'SELECT grp, STRING_AGG(tag, '','') FROM events GROUP BY grp'
);
| check_name | result | severity |
|---|---|---|
resolved_refresh_mode | DIFFERENTIAL | INFO |
aggregate | STRING_AGG(GROUP_RESCAN) | WARNING |
needs_pgt_count | true — multiplicity counter column required | INFO |
volatility | immutable | INFO |
Note on GROUP_RESCAN:
STRING_AGG, ARRAY_AGG, BOOL_AND, and other non-algebraic aggregates use a group-rescan strategy: any change in a group triggers full re-aggregation from the source data for that group. This is still DIFFERENTIAL (only changed groups are rescanned), but has higher per-group cost than algebraic strategies. If this is performance-sensitive, consider pre-aggregating with a simpler aggregate and post-processing.
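One way to apply the pre-aggregation advice above, sketched with hypothetical table and column names: keep only an algebraic COUNT per (grp, tag) in the stream table, and assemble the string at read time over the much smaller pre-aggregate.

```sql
-- Stream table maintains only algebraic aggregates (cheap delta updates):
SELECT pgtrickle.create_stream_table(
  'tag_counts',
  'SELECT grp, tag, COUNT(*) AS n FROM events GROUP BY grp, tag',
  '30s'
);

-- Post-process at query time; STRING_AGG runs over the small pre-aggregate:
SELECT grp, STRING_AGG(tag, ',' ORDER BY tag) AS tags
FROM tag_counts
GROUP BY grp;
```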
Delta SQL Profiling (v0.13.0)
pgtrickle.explain_delta(st_name text, format text DEFAULT 'text')
Generate the delta SQL query plan for a stream table without executing a refresh.
explain_delta produces the differential delta SQL that would be used on the
next DIFFERENTIAL refresh, then runs EXPLAIN (ANALYZE false, FORMAT <format>)
on it and returns the plan lines. This function is useful for:
- Identifying slow joins or missing indexes in auto-generated delta SQL.
- Comparing plan complexity between different query forms.
- Monitoring how the size of change buffers affects plan shape.
The delta SQL is generated against a hypothetical "scan all changes" window
(LSN 0/0 → FF/FFFFFFFF) so the plan shows the full join/filter structure
even when the change buffer is currently empty.
Parameters:
| Name | Type | Description |
|---|---|---|
st_name | text | Qualified stream table name (e.g. 'public.orders_summary'). |
format | text | Plan format: 'text' (default), 'json', 'xml', or 'yaml'. |
Returns: SETOF text — one row per plan line (text format) or one row containing the full JSON/XML/YAML plan.
Example:
-- Show the text plan for the delta query
SELECT line FROM pgtrickle.explain_delta('public.orders_summary');
-- Get the JSON plan for programmatic analysis
SELECT line FROM pgtrickle.explain_delta('public.orders_summary', 'json');
Environment variable (PGS_PROFILE_DELTA=1): When the environment variable
PGS_PROFILE_DELTA=1 is set in the PostgreSQL server process, every
DIFFERENTIAL refresh automatically captures EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON)
for the resolved delta SQL and writes the plan to
/tmp/delta_plans/<schema>_<table>.json. This is intended for E2E test
diagnostics and local profiling sessions.
pgtrickle.dedup_stats()
Show MERGE deduplication profiling counters accumulated since server start.
When the delta cannot be guaranteed to contain at most one row per
__pgt_row_id (e.g. for aggregate queries or keyless sources), the MERGE
must group and aggregate the delta before merging. This is tracked as
dedup needed. A consistently high ratio indicates that pre-MERGE compaction
in the change buffer would reduce refresh latency.
Returns: one row with:
| Column | Type | Description |
|---|---|---|
total_diff_refreshes | bigint | Total DIFFERENTIAL refreshes executed since server start that processed at least one change. Resets on server restart. |
dedup_needed | bigint | Number of those refreshes where the delta required weight aggregation / deduplication in the MERGE USING clause. |
dedup_ratio_pct | float8 | dedup_needed / total_diff_refreshes × 100. 0 when total_diff_refreshes = 0. |
Example:
SELECT * FROM pgtrickle.dedup_stats();
-- total_diff_refreshes | dedup_needed | dedup_ratio_pct
-- ----------------------+--------------+-----------------
-- 1234 | 87 | 7.05
A dedup_ratio_pct of 10 or higher is the recommended threshold for investigating a
two-pass MERGE strategy. See plans/performance/REPORT_OVERALL_STATUS.md §14
for background.
pgtrickle.shared_buffer_stats()
Added in v0.13.0
D-4 observability function. Returns one row per shared change buffer (one per tracked source table), showing how many stream tables share the buffer, which columns are tracked, the safe cleanup frontier, and the current buffer size.
Return columns:
| Column | Type | Description |
|---|---|---|
source_oid | bigint | PostgreSQL OID of the source table |
source_table | text | Fully qualified source table name |
consumer_count | integer | Number of stream tables sharing this buffer |
consumers | text | Comma-separated list of consumer stream table names |
columns_tracked | integer | Number of new_* columns in the buffer (column superset) |
safe_frontier_lsn | text | MIN(frontier LSN) across all consumers — rows at or below this are safe to clean up |
buffer_rows | bigint | Current number of rows in the change buffer |
is_partitioned | boolean | Whether the buffer uses LSN-range partitioning |
Example:
SELECT * FROM pgtrickle.shared_buffer_stats();
-- source_oid | source_table | consumer_count | consumers | columns_tracked | safe_frontier_lsn | buffer_rows | is_partitioned
-- -----------+--------------------+----------------+------------------------------------+-----------------+-------------------+-------------+----------------
-- 16456 | public.orders | 3 | public.orders_by_region, public... | 5 | 0/1A2B3C4D | 142 | f
UNLOGGED Change Buffers (v0.14.0)
pgtrickle.convert_buffers_to_unlogged()
Converts all existing logged change buffer tables to UNLOGGED. This
eliminates WAL writes for trigger-inserted CDC rows, reducing WAL
amplification by ~30%.
Returns: bigint — the number of buffer tables converted.
SELECT pgtrickle.convert_buffers_to_unlogged();
-- convert_buffers_to_unlogged
-- ----------------------------
-- 5
Warning: Each conversion acquires an ACCESS EXCLUSIVE lock on the buffer table. Run this function during a low-traffic maintenance window to minimize lock contention.
After conversion: Buffer contents will be lost on crash recovery. The scheduler automatically detects this and enqueues a FULL refresh for affected stream tables. See pg_trickle.unlogged_buffers for the full trade-off discussion.
Refresh Mode Diagnostics (v0.14.0)
pgtrickle.recommend_refresh_mode(st_name TEXT DEFAULT NULL)
Analyze stream table workload characteristics and recommend the optimal
refresh mode (FULL vs DIFFERENTIAL). When st_name is NULL, returns one
row per stream table. When provided, returns a single row for the named
stream table.
The function evaluates seven weighted signals (current and historical change ratios, empirical timing, query complexity, target size, index coverage, and latency variance) and computes a composite score. Scores above +0.15 recommend DIFFERENTIAL; below −0.15 recommend FULL; in between, the function recommends KEEP (the current mode is near-optimal).
Parameters:
| Name | Type | Default | Description |
|---|---|---|---|
st_name | text | NULL | Optional stream table name. NULL = all stream tables. |
Return columns:
| Column | Type | Description |
|---|---|---|
pgt_schema | text | Stream table schema |
pgt_name | text | Stream table name |
current_mode | text | Currently configured refresh mode |
effective_mode | text | Mode actually used in the last refresh |
recommended_mode | text | DIFFERENTIAL, FULL, or KEEP |
confidence | text | high, medium, or low |
reason | text | Human-readable explanation of the recommendation |
signals | jsonb | Detailed signal breakdown with scores and weights |
Example:
-- Check all stream tables
SELECT pgt_name, current_mode, recommended_mode, confidence, reason
FROM pgtrickle.recommend_refresh_mode();
-- Check a specific stream table
SELECT recommended_mode, confidence, reason, signals
FROM pgtrickle.recommend_refresh_mode('public.orders_summary');
Signal weights:
| Signal | Base Weight | Description |
|---|---|---|
change_ratio_current | 0.25 | Current pending changes / source rows |
change_ratio_avg | 0.30 | Historical average change ratio |
empirical_timing | 0.35 | Observed DIFF vs FULL speed ratio |
query_complexity | 0.10 | JOIN/aggregate/window count |
target_size | 0.10 | Target relation + index size |
index_coverage | 0.05 | Whether __pgt_row_id index exists |
latency_variance | 0.05 | DIFF latency p95/p50 ratio |
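To see how those weights combined for a given table, the signals column can be pretty-printed directly. A sketch (jsonb_pretty() is a standard PostgreSQL function; the internal shape of the signals payload is not documented here):

```sql
-- Inspect the per-signal breakdown behind one recommendation.
SELECT recommended_mode,
       confidence,
       jsonb_pretty(signals) AS signal_breakdown
FROM pgtrickle.recommend_refresh_mode('public.orders_summary');
```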
pgtrickle.refresh_efficiency()
Per-table refresh efficiency metrics. Returns operational statistics for every stream table — useful for monitoring dashboards and Grafana alerts.
Return columns:
| Column | Type | Description |
|---|---|---|
pgt_schema | text | Stream table schema |
pgt_name | text | Stream table name |
refresh_mode | text | Current refresh mode |
total_refreshes | bigint | Total completed refresh count |
diff_count | bigint | DIFFERENTIAL refresh count |
full_count | bigint | FULL refresh count |
avg_diff_ms | float8 | Average DIFFERENTIAL duration (ms) |
avg_full_ms | float8 | Average FULL duration (ms) |
avg_change_ratio | float8 | Average change ratio from history |
diff_speedup | text | Speedup factor (e.g. 12.3x) of FULL / DIFF timing |
last_refresh_at | text | Timestamp of last data refresh |
Example:
SELECT pgt_name, refresh_mode, diff_count, full_count,
avg_diff_ms, avg_full_ms, diff_speedup
FROM pgtrickle.refresh_efficiency()
ORDER BY total_refreshes DESC;
Export API (v0.14.0)
pgtrickle.export_definition(st_name TEXT)
Export a stream table's configuration as reproducible DDL. Returns a SQL
script containing DROP STREAM TABLE IF EXISTS followed by
SELECT pgtrickle.create_stream_table(...) with all configured options,
plus any ALTER STREAM TABLE calls for post-creation settings (tier,
fuse mode, etc.).
Parameters:
| Name | Type | Description |
|---|---|---|
st_name | text | Fully qualified or search-path-resolved stream table name. |
Returns: text — SQL script that recreates the stream table.
Example:
-- Export a single definition
SELECT pgtrickle.export_definition('public.orders_summary');
-- Export all definitions
SELECT pgtrickle.export_definition(pgt_schema || '.' || pgt_name)
FROM pgtrickle.pgt_stream_tables;
dbt Integration (v0.13.0)
The dbt-pgtrickle package exposes the config(...) options added in
v0.13.0: partition_by and the fuse circuit-breaker settings. Use them directly
in any stream_table materialization model.
For full dbt documentation see dbt-pgtrickle/README.md.
partition_by config
Partition the stream table's underlying storage table using PostgreSQL
PARTITION BY RANGE. Only applied at creation time — changing it after the
stream table exists has no effect (use --full-refresh to recreate).
-- models/marts/events_by_day.sql
{{ config(
materialized='stream_table',
schedule='1m',
refresh_mode='DIFFERENTIAL',
partition_by='event_day'
) }}
SELECT
event_day,
user_id,
COUNT(*) AS event_count
FROM {{ source('raw', 'events') }}
GROUP BY event_day, user_id
pg_trickle creates a PARTITION BY RANGE (event_day) storage table with an
automatic default catch-all partition. Add named partitions via standard DDL:
CREATE TABLE analytics.events_by_day_2026
PARTITION OF analytics.events_by_day
FOR VALUES FROM ('2026-01-01') TO ('2027-01-01');
The partition_by value is stored in pgtrickle.pgt_stream_tables.st_partition_key
and visible via pgtrickle.stream_tables_info.
fuse config
The fuse circuit breaker suspends differential refreshes when the incoming
change volume exceeds a threshold, preventing runaway refresh cycles during
bulk ingestion. Fuse parameters are applied via alter_stream_table() on
every dbt run; they are a no-op if the values have not changed.
-- models/marts/order_totals.sql
{{ config(
materialized='stream_table',
schedule='5m',
refresh_mode='DIFFERENTIAL',
fuse='auto',
fuse_ceiling=50000,
fuse_sensitivity=3
) }}
SELECT customer_id, SUM(amount) AS total
FROM {{ source('raw', 'orders') }}
GROUP BY customer_id
| Config key | Type | Default | Description |
|---|---|---|---|
fuse | 'off', 'on', or 'auto' | null (no-op) | Fuse mode. 'auto' activates only when FULL refresh would be cheaper than DIFFERENTIAL. |
fuse_ceiling | integer | null | Change-count threshold (number of changed rows) that triggers the fuse. null uses the global pg_trickle.fuse_default_ceiling GUC. |
fuse_sensitivity | integer | null | Number of consecutive over-ceiling observations required before the fuse blows. null means 1 (blow immediately). |
Monitor fuse state via pgtrickle.dedup_stats() or check
pgtrickle.pgt_stream_tables.fuse_state directly:
SELECT pgt_name, fuse_mode, fuse_state, fuse_ceiling, fuse_sensitivity
FROM pgtrickle.pgt_stream_tables
WHERE fuse_mode != 'off';
Self-Monitoring (v0.20.0)
Added in v0.20.0.
pg_trickle can monitor itself using its own stream tables. Five self-monitoring stream tables maintain reactive analytics over the internal catalog, replacing repeated full-scan diagnostic queries with continuously maintained incremental views.
Quick Start
-- Create all five self-monitoring stream tables (idempotent).
SELECT pgtrickle.setup_self_monitoring();
-- Check status.
SELECT * FROM pgtrickle.self_monitoring_status();
-- View threshold recommendations (after 10+ refresh cycles).
SELECT * FROM pgtrickle.df_threshold_advice
WHERE confidence IN ('HIGH', 'MEDIUM');
-- View anomalies.
SELECT * FROM pgtrickle.df_anomaly_signals
WHERE duration_anomaly IS NOT NULL;
-- Enable auto-apply (optional).
SET pg_trickle.self_monitoring_auto_apply = 'threshold_only';
-- Clean up.
SELECT pgtrickle.teardown_self_monitoring();
pgtrickle.setup_self_monitoring()
Creates all five self-monitoring stream tables. Idempotent — safe to call multiple
times. Emits a warm-up warning if pgt_refresh_history has fewer than 50 rows.
Stream tables created:
| Name | Schedule | Mode | Purpose |
|---|---|---|---|
pgtrickle.df_efficiency_rolling | 48s | AUTO | Rolling-window refresh statistics |
pgtrickle.df_anomaly_signals | 48s | AUTO | Duration spikes, error bursts, mode oscillation |
pgtrickle.df_threshold_advice | 96s | AUTO | Multi-cycle threshold recommendations |
pgtrickle.df_cdc_buffer_trends | 48s | AUTO | CDC buffer growth rates per source |
pgtrickle.df_scheduling_interference | 96s | FULL | Concurrent refresh overlap detection |
pgtrickle.teardown_self_monitoring()
Drops all self-monitoring stream tables. Safe with partial setups — missing tables are silently skipped. User stream tables are never affected.
pgtrickle.self_monitoring_status()
Returns the status of all five expected self-monitoring stream tables:
| Column | Type | Description |
|---|---|---|
st_name | text | Stream table name |
exists | bool | Whether the ST exists |
status | text | Current status (ACTIVE, SUSPENDED, etc.) |
refresh_mode | text | Effective refresh mode |
last_refresh_at | text | Last successful refresh timestamp |
total_refreshes | bigint | Total completed refreshes |
pgtrickle.scheduler_overhead()
Returns scheduler efficiency metrics for the last hour:
| Column | Type | Description |
|---|---|---|
total_refreshes_1h | bigint | Total refreshes in the last hour |
df_refreshes_1h | bigint | Self-monitoring ("dogfooding", DF) refreshes in the last hour |
df_refresh_fraction | float | Fraction of refreshes that are self-monitoring |
avg_refresh_ms | float | Average refresh duration (ms) |
avg_df_refresh_ms | float | Average DF refresh duration (ms) |
total_refresh_time_s | float | Total time spent refreshing (seconds) |
df_refresh_time_s | float | Time spent on DF refreshes (seconds) |
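As a monitoring sketch built only from the return columns documented above, you can alert when the self-monitoring tables themselves consume a disproportionate share of refresh activity (the 10% threshold is illustrative):

```sql
-- Alert when self-monitoring exceeds 10% of refresh activity
SELECT df_refresh_fraction,
       total_refresh_time_s,
       df_refresh_time_s
FROM pgtrickle.scheduler_overhead()
WHERE df_refresh_fraction > 0.10;
```

An empty result means the self-monitoring overhead is within the chosen budget.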
pgtrickle.explain_dag(format)
Returns the full refresh DAG as a Mermaid markdown (default) or Graphviz DOT string. Node colours: user STs = blue, self-monitoring STs = green, suspended = red, fused = orange.
-- Mermaid format (default).
SELECT pgtrickle.explain_dag();
-- Graphviz DOT format.
SELECT pgtrickle.explain_dag('dot');
Auto-Apply Policy
The pg_trickle.self_monitoring_auto_apply GUC controls whether analytics can
automatically adjust stream table configuration:
| Value | Behaviour |
|---|---|
off (default) | Advisory only — no automatic changes |
threshold_only | Apply threshold recommendations when confidence is HIGH and delta > 5% |
full | Also apply scheduling hints from interference analysis |
Auto-apply is rate-limited to at most one threshold change per stream table
per 10 minutes. Changes are logged to pgt_refresh_history with
initiated_by = 'SELF_MONITOR'.
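To audit what auto-apply has done, the initiated_by marker can be filtered directly (a sketch; pgt_refresh_history's full column list is not reproduced here):

```sql
-- Review changes made by the self-monitoring auto-apply policy
SELECT *
FROM pgtrickle.pgt_refresh_history
WHERE initiated_by = 'SELF_MONITOR';
```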
Confidence Levels and Sparse History
df_threshold_advice assigns a confidence level to each recommendation:
| Confidence | Criteria | What to expect |
|---|---|---|
| HIGH | ≥ 20 total refreshes, ≥ 5 DIFFERENTIAL, ≥ 2 FULL | Reliable recommendation — auto-apply will act on this |
| MEDIUM | ≥ 10 total refreshes | Directionally useful, but may lack enough FULL/DIFF mix |
| LOW | < 10 total refreshes | Insufficient data — recommendation equals the current threshold |
When you see LOW confidence: This is normal during the first minutes after
setup_self_monitoring(). The stream tables need time to accumulate refresh
history. In typical deployments with a 1-minute schedule, expect:
- LOW for the first ~10 minutes
- MEDIUM after ~10 minutes
- HIGH after ~20 minutes (requires at least 2 FULL refreshes — these happen naturally when the auto-threshold triggers a mode switch)
If a stream table uses FULL mode exclusively, the advice will remain
at MEDIUM because no DIFFERENTIAL observations exist for comparison.
The sla_headroom_pct column shows how much faster DIFFERENTIAL is compared
to FULL as a percentage. A value of 70% means "DIFF is 70% faster than FULL".
This column is NULL when either FULL or DIFF observations are missing.
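A sketch of a dashboard query over df_threshold_advice, assuming the view also carries a pgt_name column identifying each stream table (confidence and sla_headroom_pct are documented above):

```sql
-- Surface actionable recommendations with their DIFF-vs-FULL headroom
SELECT pgt_name, confidence, sla_headroom_pct
FROM pgtrickle.df_threshold_advice
WHERE confidence = 'HIGH'
  AND sla_headroom_pct IS NOT NULL
ORDER BY sla_headroom_pct DESC;
```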
Stream Table Snapshots (v0.27.0)
Added in v0.27.0 (SNAP-1–3).
Snapshots let you export the current state of a stream table into an archival table, then restore from that snapshot on another node or after a PITR operation. The main use cases are:
- Replica bootstrap — populate a new standby without a full re-scan
- PITR alignment — re-align stream table frontiers after point-in-time recovery so the first refresh is DIFFERENTIAL, not a full re-scan
- Archiving — preserve a historical snapshot for audit or rollback
pgtrickle.snapshot_stream_table(name, target)
Export the current content of a stream table into a new archival table.
pgtrickle.snapshot_stream_table(
name TEXT, -- stream table (schema.name or plain name)
target TEXT DEFAULT NULL -- destination table name; auto-generated if NULL
) → TEXT -- returns the fully-qualified snapshot table name
The snapshot table is created in the pgtrickle schema with the naming
convention snapshot_<name>_<epoch_ms> unless you supply target. The table
includes three metadata columns added by pg_trickle: __pgt_snapshot_version,
__pgt_frontier, and __pgt_snapshotted_at.
-- Auto-named snapshot
SELECT pgtrickle.snapshot_stream_table('public.orders_agg');
-- → 'pgtrickle.snapshot_orders_agg_1745452800000'
-- Named snapshot (useful when targeting a replica)
SELECT pgtrickle.snapshot_stream_table(
'public.orders_agg',
'pgtrickle.orders_agg_replica_init'
);
pgtrickle.restore_from_snapshot(name, source)
Restore a stream table from an archival snapshot and realign its frontier.
pgtrickle.restore_from_snapshot(
name TEXT, -- stream table to restore into
source TEXT -- fully-qualified snapshot table created by snapshot_stream_table()
) → void
After restore_from_snapshot() completes:
- The stream table's rows are replaced with the snapshot contents.
- The frontier is set to the snapshot's frontier, so the next refresh cycle is DIFFERENTIAL — only changes made after the snapshot are fetched.
SELECT pgtrickle.restore_from_snapshot(
'public.orders_agg',
'pgtrickle.orders_agg_replica_init'
);
pgtrickle.list_snapshots(name)
List all archival snapshots for a stream table.
pgtrickle.list_snapshots(
name TEXT -- stream table name
) → SETOF record(
snapshot_table TEXT,
created_at TIMESTAMPTZ,
row_count BIGINT,
frontier JSONB,
size_bytes BIGINT
)
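Example, using the documented return columns (pg_size_pretty is standard PostgreSQL):

```sql
-- List snapshots for a stream table, newest first
SELECT snapshot_table, created_at, row_count,
       pg_size_pretty(size_bytes) AS size
FROM pgtrickle.list_snapshots('public.orders_agg')
ORDER BY created_at DESC;
```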
pgtrickle.drop_snapshot(snapshot_table)
Drop an archival snapshot table and remove it from the catalog.
pgtrickle.drop_snapshot(
snapshot_table TEXT -- fully-qualified snapshot table
) → void
SELECT pgtrickle.drop_snapshot('pgtrickle.orders_agg_replica_init');
Catalog Table
| Table | Description |
|---|---|
pgtrickle.pgt_snapshots | One row per snapshot: pgt_id, snapshot_schema, snapshot_table, snapshot_version, frontier, created_at |
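A retention sketch driven from the catalog, assuming the fully qualified snapshot name can be rebuilt from the snapshot_schema and snapshot_table columns listed above (the 30-day cutoff is illustrative):

```sql
-- Prune snapshots older than 30 days
SELECT pgtrickle.drop_snapshot(snapshot_schema || '.' || snapshot_table)
FROM pgtrickle.pgt_snapshots
WHERE created_at < now() - interval '30 days';
```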
Transactional Outbox & Consumer Groups (v0.28.0)
Added in v0.28.0 (OUTBOX-1–6, OUTBOX-B1–B6).
The outbox pattern lets you reliably publish stream table deltas to external consumers — even if the consumer is temporarily unavailable. Each refresh writes a header row to a dedicated outbox table. Consumers poll for new rows, process them, and commit their offset. The pattern provides:
- At-least-once delivery with explicit offset commits
- Kafka-style consumer groups for parallel consumption with independent offsets
- Visibility leases to prevent duplicate processing within a group
- Claim-check delivery for large deltas (automatic when delta exceeds a configurable row threshold)
- Consumer lag metrics (consumer_lag()) for monitoring
Quickstart
-- 1. Enable the outbox on a stream table
SELECT pgtrickle.enable_outbox('public.orders_agg');
-- 2. Create a consumer group
SELECT pgtrickle.create_consumer_group('my_group', 'public.orders_agg');
-- 3. Poll for new messages (returns rows since last committed offset)
SELECT * FROM pgtrickle.poll_outbox('my_group', 'worker-1');
-- 4. Process the rows, then commit the highest offset you processed
SELECT pgtrickle.commit_offset('my_group', 'worker-1', 42);
pgtrickle.enable_outbox(name, retention_hours)
Enable the outbox pattern for a stream table.
pgtrickle.enable_outbox(
name TEXT, -- stream table name
retention_hours INT DEFAULT 24 -- how long to keep outbox rows
) → void
Creates an outbox table pgtrickle.pgt_outbox_<st> and a convenience view
pgtrickle.pgt_outbox_latest_<st>. Records configuration in
pgtrickle.pgt_outbox_config.
Restriction: Not compatible with IMMEDIATE refresh mode — use SCHEDULED or AUTO instead.
pgtrickle.disable_outbox(name, if_exists)
Disable the outbox pattern and drop the associated outbox table.
pgtrickle.disable_outbox(
name TEXT,
if_exists BOOLEAN DEFAULT false
) → void
pgtrickle.outbox_status(name)
Return a JSONB summary of outbox state for a stream table.
pgtrickle.outbox_status(name TEXT) → JSONB
Returns: enabled, outbox_table, retention_hours, pending_rows,
oldest_row_age, consumer_groups.
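Since the result is JSONB, individual keys can be extracted with the standard ->> operator — for example, the pending backlog (key name documented above):

```sql
-- Extract the pending backlog for an alerting query
SELECT pgtrickle.outbox_status('public.orders_agg') ->> 'pending_rows'
       AS pending_rows;
```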
pgtrickle.outbox_rows_consumed(stream_table, outbox_id)
Mark an outbox row as consumed and release its claim-check rows (if any).
pgtrickle.outbox_rows_consumed(
stream_table TEXT,
outbox_id BIGINT
) → void
Use this when consuming outbox rows without a consumer group. For
consumer-group mode, use commit_offset() instead.
Example:
-- Simple (non-group) consumer: fetch latest, process, release
SELECT outbox_id, payload
FROM pgtrickle.pgt_outbox_latest_orders_agg
LIMIT 10;
-- After successful processing, release the outbox row:
SELECT pgtrickle.outbox_rows_consumed('public.orders_agg', 77);
Consumer Groups
Consumer groups give independent consumers their own offset pointer into the outbox. Multiple consumers in the same group share a single offset (competing consumers); multiple groups each get the full message stream.
pgtrickle.create_consumer_group(name, outbox, auto_offset_reset)
pgtrickle.create_consumer_group(
name TEXT,
outbox TEXT,
auto_offset_reset TEXT DEFAULT 'latest' -- 'latest' | 'earliest'
) → void
auto_offset_reset = 'latest' means a new group starts consuming from the
newest row. Use 'earliest' to replay from the beginning.
pgtrickle.drop_consumer_group(name, if_exists)
pgtrickle.drop_consumer_group(
name TEXT,
if_exists BOOLEAN DEFAULT false
) → void
Drops the group and all its offsets and leases.
Example:
-- Remove a consumer group (error if not found)
SELECT pgtrickle.drop_consumer_group('retired-group');
-- Idempotent removal
SELECT pgtrickle.drop_consumer_group('retired-group', if_exists => true);
pgtrickle.poll_outbox(group, consumer, batch_size, visibility_seconds)
Fetch the next batch of unprocessed messages for a consumer.
pgtrickle.poll_outbox(
group TEXT,
consumer TEXT,
batch_size INT DEFAULT 100,
visibility_seconds INT DEFAULT 30
) → SETOF record(
outbox_id BIGINT,
pgt_id UUID,
created_at TIMESTAMPTZ,
inserted_count BIGINT,
deleted_count BIGINT,
is_claim_check BOOLEAN,
payload JSONB
)
poll_outbox grants a visibility lease for visibility_seconds. The
consumer must call commit_offset() or extend_lease() before the lease
expires, otherwise the rows become visible again to other consumers.
When is_claim_check = true, the payload is NULL and the actual delta
rows are in a separate table (call outbox_rows_consumed() to release them
after processing).
Example:
-- Fetch up to 50 messages with a 60-second visibility window
SELECT outbox_id, inserted_count, deleted_count, payload
FROM pgtrickle.poll_outbox(
'analytics-group',
'worker-1',
batch_size => 50,
visibility_seconds => 60
);
pgtrickle.commit_offset(group, consumer, last_offset)
Commit the highest outbox offset the consumer has successfully processed.
pgtrickle.commit_offset(
group TEXT,
consumer TEXT,
last_offset BIGINT
) → void
Example:
-- After successfully processing messages up through offset 142:
SELECT pgtrickle.commit_offset('analytics-group', 'worker-1', 142);
pgtrickle.extend_lease(group, consumer, extension_seconds)
Extend the visibility lease when processing takes longer than expected.
pgtrickle.extend_lease(
group TEXT,
consumer TEXT,
extension_seconds INT DEFAULT 30
) → void
Example:
-- Extend the lease by 2 minutes when a large batch takes longer than expected
SELECT pgtrickle.extend_lease('analytics-group', 'worker-1', extension_seconds => 120);
pgtrickle.seek_offset(group, consumer, new_offset)
Jump to a specific offset for replay or recovery.
pgtrickle.seek_offset(
group TEXT,
consumer TEXT,
new_offset BIGINT
) → void
Example:
-- Rewind consumer to replay from offset 100 (disaster recovery)
SELECT pgtrickle.seek_offset('analytics-group', 'worker-1', 100);
-- Fast-forward past known-bad messages to offset 500
SELECT pgtrickle.seek_offset('analytics-group', 'worker-1', 500);
pgtrickle.consumer_heartbeat(group, consumer)
Signal that a consumer is still alive. Prevents the consumer from being marked
as dead (controlled by pg_trickle.consumer_dead_threshold_hours).
pgtrickle.consumer_heartbeat(
group TEXT,
consumer TEXT
) → void
Example:
-- Call periodically from a long-running consumer to stay alive
SELECT pgtrickle.consumer_heartbeat('analytics-group', 'worker-1');
pgtrickle.consumer_lag(group)
Return per-consumer lag metrics for a consumer group.
pgtrickle.consumer_lag(group TEXT) → SETOF record(
consumer TEXT,
committed_offset BIGINT,
latest_offset BIGINT,
lag BIGINT,
last_seen TIMESTAMPTZ
)
Example:
-- Monitor lag for all consumers in a group
SELECT consumer, lag, last_seen
FROM pgtrickle.consumer_lag('analytics-group')
ORDER BY lag DESC;
-- Alert if any consumer is more than 1000 messages behind
SELECT consumer, lag
FROM pgtrickle.consumer_lag('analytics-group')
WHERE lag > 1000;
Outbox Catalog Tables
pgtrickle.pgt_outbox_config
Maps stream tables to their pg_tide outbox names. Populated by
enable_outbox(); one row per stream table with an outbox enabled.
| Column | Type | Description |
|---|---|---|
stream_table_oid | OID | PostgreSQL OID of the stream table (PRIMARY KEY) |
stream_table_name | TEXT | Qualified name (schema.table) of the stream table |
tide_outbox_name | TEXT | Name of the corresponding pg_tide outbox |
created_at | TIMESTAMPTZ | When the outbox was attached |
pgtrickle.pgt_consumer_groups
Named consumer groups that track consumption progress on an outbox.
| Column | Type | Description |
|---|---|---|
group_name | TEXT | Consumer group name (PRIMARY KEY) |
outbox_name | TEXT | Name of the outbox being consumed |
auto_offset_reset | TEXT | Starting position for new groups: 'latest' or 'earliest' |
created_at | TIMESTAMPTZ | When the group was created |
pgtrickle.pgt_consumer_offsets
Per-consumer committed offsets and heartbeat tracking within a group.
| Column | Type | Description |
|---|---|---|
group_name | TEXT | Consumer group (FK → pgt_consumer_groups) |
consumer_id | TEXT | Consumer identifier within the group |
committed_offset | BIGINT | Highest outbox offset successfully committed |
last_committed_at | TIMESTAMPTZ | When the last commit occurred |
last_heartbeat_at | TIMESTAMPTZ | Last heartbeat signal timestamp |
Primary key: (group_name, consumer_id)
pgtrickle.pgt_consumer_leases
Visibility leases for in-flight outbox message batches (prevents duplicate delivery).
| Column | Type | Description |
|---|---|---|
group_name | TEXT | Consumer group (FK → pgt_consumer_offsets) |
consumer_id | TEXT | Consumer holding the lease |
batch_start | BIGINT | First offset in the leased batch |
batch_end | BIGINT | Last offset in the leased batch |
lease_expires | TIMESTAMPTZ | Lease expiry time; expired leases become visible again |
Primary key: (group_name, consumer_id)
Transactional Inbox (v0.28.0)
Added in v0.28.0 (INBOX-1–6, INBOX-B1–B4).
The inbox pattern provides a reliable, idempotent message receiver inside PostgreSQL. Incoming events are written to an inbox table; pg_trickle automatically creates stream tables that give you views of pending messages, dead-letter messages, and statistics — all updated incrementally.
What gets created
When you call create_inbox('orders_inbox', ...), pg_trickle creates:
| Table / View | Purpose |
|---|---|
pgtrickle.orders_inbox | The raw inbox table (one row per event) |
orders_inbox_pending stream table | Events with processed_at IS NULL and retry_count < max_retries |
orders_inbox_dlq stream table | Dead-letter events (retry_count >= max_retries) |
orders_inbox_stats stream table | Event counts grouped by event_type |
pgtrickle.create_inbox(name, ...)
Create a new transactional inbox with its associated stream tables.
pgtrickle.create_inbox(
name TEXT,
schema TEXT DEFAULT 'pgtrickle',
max_retries INT DEFAULT 3,
with_dead_letter BOOLEAN DEFAULT true,
with_stats BOOLEAN DEFAULT true,
schedule_seconds INT DEFAULT 5
) → void
SELECT pgtrickle.create_inbox('orders_inbox');
-- Creates: pgtrickle.orders_inbox, orders_inbox_pending, orders_inbox_dlq, orders_inbox_stats
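A minimal producer sketch. The event_id and event_type column names are the documented defaults; the payload column is illustrative — the rest of the inbox row shape is up to your schema:

```sql
-- Write an incoming event into the inbox in the same transaction
-- as the business write (payload column is hypothetical)
INSERT INTO pgtrickle.orders_inbox (event_id, event_type, payload)
VALUES ('evt-001', 'order.created', '{"order_id": 42}'::jsonb);
-- Within roughly schedule_seconds the event appears in orders_inbox_pending
```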
pgtrickle.drop_inbox(name, if_exists, cascade)
Drop an inbox and all associated stream tables.
pgtrickle.drop_inbox(
name TEXT,
if_exists BOOLEAN DEFAULT false,
cascade BOOLEAN DEFAULT false
) → void
pgtrickle.enable_inbox_tracking(name, table_ref, ...)
Bring-your-own-table (BYOT) mode: register an existing table as an inbox without creating a new one.
pgtrickle.enable_inbox_tracking(
name TEXT,
table_ref TEXT, -- fully-qualified existing table
max_retries INT DEFAULT 3,
with_dead_letter BOOLEAN DEFAULT true,
with_stats BOOLEAN DEFAULT true,
schedule_seconds INT DEFAULT 5
) → void
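A sketch of BYOT registration; app.webhook_events is a hypothetical pre-existing table:

```sql
-- Register an existing events table as an inbox (BYOT mode)
SELECT pgtrickle.enable_inbox_tracking(
  'webhook_inbox',
  'app.webhook_events',
  max_retries => 5
);
```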
pgtrickle.inbox_health(name)
Return a JSONB health summary for an inbox.
pgtrickle.inbox_health(name TEXT) → JSONB
Returns: inbox_name, pending_count, dlq_count, processed_24h,
oldest_pending_age, stream_table_statuses.
pgtrickle.inbox_status(name)
Return a tabular status summary for one or all inboxes.
pgtrickle.inbox_status(
name TEXT DEFAULT NULL -- NULL = all inboxes
) → SETOF record(
inbox_name TEXT,
pending BIGINT,
dlq BIGINT,
max_retries INT,
created_at TIMESTAMPTZ
)
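Example:

```sql
-- Status for all inboxes
SELECT * FROM pgtrickle.inbox_status();
-- Status for a single inbox
SELECT pending, dlq FROM pgtrickle.inbox_status('orders_inbox');
```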
pgtrickle.replay_inbox_messages(name, event_ids)
Reset specific messages back to pending state for re-processing.
pgtrickle.replay_inbox_messages(
name TEXT,
event_ids TEXT[] -- list of event_id values to replay
) → BIGINT -- number of messages reset
Example:
-- Replay two specific messages that failed processing
SELECT pgtrickle.replay_inbox_messages(
'orders_inbox',
ARRAY['evt-001', 'evt-002']
);
-- Returns: 2
-- Replay all dead-letter messages for manual retry
SELECT pgtrickle.replay_inbox_messages(
'orders_inbox',
ARRAY(SELECT event_id FROM orders_inbox_dlq)
);
Per-Aggregate Ordering (INBOX-B1)
By default, multiple workers can process inbox messages concurrently. If messages for the same aggregate must be processed in order, enable per-aggregate ordering:
pgtrickle.enable_inbox_ordering(inbox, aggregate_id_col, seq_col)
pgtrickle.enable_inbox_ordering(
inbox TEXT,
aggregate_id_col TEXT, -- column that identifies the aggregate (e.g. 'customer_id')
seq_col TEXT -- monotonic sequence column (e.g. 'event_sequence')
) → void
Creates a next_<inbox> stream table that surfaces only the lowest-sequence
unprocessed message per aggregate. Workers consume from next_<inbox> to
avoid concurrent processing of the same aggregate.
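A consumption sketch, assuming the ordering stream table for an inbox named orders_inbox is called next_orders_inbox and surfaces the raw inbox columns (customer_id and event_sequence follow the enable_inbox_ordering() example values):

```sql
-- Take the lowest-sequence pending event per aggregate
SELECT event_id, customer_id, event_sequence
FROM next_orders_inbox;

-- After successful processing, stamp the row in the raw inbox table
UPDATE pgtrickle.orders_inbox
SET processed_at = now()
WHERE event_id = 'evt-001';
```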
pgtrickle.disable_inbox_ordering(inbox, if_exists)
pgtrickle.disable_inbox_ordering(inbox TEXT, if_exists BOOLEAN DEFAULT false) → void
Priority Tiers (INBOX-B2)
pgtrickle.enable_inbox_priority(inbox, priority_col, tiers)
Register a priority column for cost-model–aware scheduling.
pgtrickle.enable_inbox_priority(
inbox TEXT,
priority_col TEXT, -- column name that holds the priority value
tiers INT DEFAULT 3
) → void
pgtrickle.disable_inbox_priority(inbox, if_exists)
pgtrickle.disable_inbox_priority(inbox TEXT, if_exists BOOLEAN DEFAULT false) → void
Sequence Gap Detection (INBOX-B3)
pgtrickle.inbox_ordering_gaps(inbox_name)
Detect gaps in the per-aggregate sequence — useful for identifying lost or out-of-order messages.
pgtrickle.inbox_ordering_gaps(inbox_name TEXT) → SETOF record(
aggregate_id TEXT,
expected_seq BIGINT,
actual_seq BIGINT,
gap_size BIGINT
)
Example:
-- Find any ordering gaps (missing events) across all aggregates
SELECT aggregate_id, expected_seq, actual_seq, gap_size
FROM pgtrickle.inbox_ordering_gaps('orders_inbox')
ORDER BY gap_size DESC;
-- Alert if any gap is larger than 1
DO $$
DECLARE gap RECORD;
BEGIN
FOR gap IN
SELECT * FROM pgtrickle.inbox_ordering_gaps('orders_inbox')
WHERE gap_size > 1
LOOP
RAISE WARNING 'Sequence gap for aggregate %: expected %, got % (gap=%)',
gap.aggregate_id, gap.expected_seq, gap.actual_seq, gap.gap_size;
END LOOP;
END;
$$;
Consistent-Hash Partitioning (INBOX-B4)
pgtrickle.inbox_is_my_partition(aggregate_id, worker_id, total_workers)
Distribute inbox processing across multiple workers without external
coordination. Returns true when this worker should process messages for the
given aggregate.
pgtrickle.inbox_is_my_partition(
aggregate_id TEXT,
worker_id INT, -- 0-based worker index
total_workers INT
) → BOOLEAN
Uses FNV-1a consistent hashing so the same aggregate always routes to the same worker, preventing concurrent processing.
-- Worker 2 of 4 processes only its assigned aggregates:
SELECT * FROM orders_inbox_pending
WHERE pgtrickle.inbox_is_my_partition(customer_id::text, 2, 4);
Inbox Catalog Tables
pgtrickle.pgt_inbox_config
Catalog of named transactional inbox configurations.
| Column | Type | Description |
|---|---|---|
inbox_name | TEXT | Inbox name (PRIMARY KEY) |
inbox_schema | TEXT | Schema where the inbox table is created (default: pgtrickle) |
max_retries | INT | Maximum retry attempts before a message moves to DLQ (default: 3) |
schedule | TEXT | Refresh schedule for associated stream tables (default: '1s') |
with_dead_letter | BOOL | Whether a dead-letter-queue stream table is created (default: true) |
with_stats | BOOL | Whether a stats stream table is created (default: true) |
retention_hours | INT | How long processed messages are retained (default: 72) |
id_column | TEXT | Column name for the unique event ID (default: 'event_id') |
processed_at_column | TEXT | Column name for the processing timestamp (default: 'processed_at') |
retry_count_column | TEXT | Column name for the retry counter (default: 'retry_count') |
error_column | TEXT | Column name for the last error message (default: 'error') |
received_at_column | TEXT | Column name for the receipt timestamp (default: 'received_at') |
event_type_column | TEXT | Column name for the event type (default: 'event_type') |
is_managed | BOOL | Whether pg_trickle manages the inbox lifecycle (default: true) |
created_at | TIMESTAMPTZ | When the inbox was created |
pgtrickle.pgt_inbox_ordering_config
Per-inbox ordering configuration for per-aggregate sequenced processing (INBOX-B1).
| Column | Type | Description |
|---|---|---|
inbox_name | TEXT | Inbox name (PK, FK → pgt_inbox_config) |
aggregate_id_col | TEXT | Column that identifies the aggregate (e.g., 'customer_id') |
sequence_num_col | TEXT | Monotonic sequence column (e.g., 'event_sequence') |
created_at | TIMESTAMPTZ | When ordering was enabled |
pgtrickle.pgt_inbox_priority_config
Priority tier configuration for inbox message scheduling (INBOX-B2).
| Column | Type | Description |
|---|---|---|
inbox_name | TEXT | Inbox name (PK, FK → pgt_inbox_config) |
priority_col | TEXT | Column that holds the priority value |
tiers | JSONB | Priority tier definitions (threshold → schedule mapping) |
created_at | TIMESTAMPTZ | When priority was enabled |
Note: The relay pipeline SQL API (set_relay_outbox, set_relay_inbox,
enable_relay, disable_relay, delete_relay, get_relay_config,
list_relay_configs) was moved to the pg_tide extension in v0.46.0.
Public API Stability Contract
Added in v0.19.0 (DB-6).
Stable (will not break without a major version bump)
| Surface | Guarantee |
|---|---|
All functions in the pgtrickle schema documented in this reference | Signature and return type preserved across minor releases. New optional parameters may be added with defaults that preserve existing behaviour. |
Catalog tables pgtrickle.pgt_stream_tables, pgtrickle.pgt_dependencies, pgtrickle.pgt_refresh_history | Existing columns are not renamed or removed. New columns may be added. |
NOTIFY channels pg_trickle_refresh, pgtrickle_alert, pgtrickle_wake | Channel names and JSON payload structure preserved. New keys may be added to JSON payloads. |
GUC names listed in docs/CONFIGURATION.md | Names preserved; default values may change between minor releases (documented in CHANGELOG). |
Unstable (may change in any release)
| Surface | Notes |
|---|---|
Functions prefixed with _ (e.g. _signal_launcher_rescan) | Internal use only. |
Catalog tables not listed above (e.g. pgt_scheduler_jobs, pgt_source_gates, pgt_watermarks) | Schema may change. |
The pgtrickle_changes schema and its changes_* tables | CDC implementation detail; format may change. |
SQL generated by the DVM engine (MERGE, delta CTEs) | Internal query structure is not an API. |
The pgtrickle.pgt_schema_version table | Migration infrastructure; rows and schema may change. |
Versioning Policy
- Patch releases (0.x.Y): Bug fixes only. No breaking changes.
- Minor releases (0.X.0): New features. Stable API preserved; unstable surfaces may change. Breaking changes to stable API only with a deprecation cycle (WARNING for one release, removal in the next).
- Major release (1.0.0): Stable API locked. Breaking changes require a major version bump.
See also: Configuration · Patterns · Performance Cookbook · Error Reference · Glossary · FAQ
Reserved Column-Name Prefixes (v0.55.0)
Added in v0.55.0 (M-7).
pg_trickle uses several internal column-name prefixes for synthetic columns it creates during query analysis and delta SQL generation. User-defined columns whose names begin with these prefixes will conflict with internal template tokens and produce incorrect query results.
__pgt_* — DVM engine columns
| Prefix | Purpose | Example |
|---|---|---|
__pgt_count | Weight column for aggregate deduplication (DIFF mode) | __pgt_count |
__pgt_row_id | Content-based row identity hash for tables without primary keys | __pgt_row_id |
__pgt_wf_N | Synthetic window-function lifting columns (rewrite pass #7) | __pgt_wf_1, __pgt_wf_2 |
__pgt_in_sub_* | Derived-table alias for multi-column IN → SemiJoin rewrite (M-5) | __pgt_in_sub_t |
__pgt_src_N | Source partition aliases in generated delta CTEs | __pgt_src_0 |
__pgs_* — Scheduler / shared-memory columns
| Prefix | Purpose | Example |
|---|---|---|
__pgs_* | Reserved for future scheduler-side synthetic columns | (none exposed today) |
Consequences of prefix collision
If your defining query produces a column whose name starts with __pgt_ or
__pgs_, the DVM engine may:
- Fail to generate correct delta SQL (silently wrong results in DIFFERENTIAL mode).
- Produce a parse error if the synthetic name is also used as a template token.
- Cause the MERGE statement to reference the wrong column in the ON clause.
Mitigation: rename the conflicting column using an alias:
-- Bad: __pgt_count collides with the weight column
SELECT id, sum(amount) AS __pgt_count FROM orders GROUP BY id;
-- Good: use any name that does not start with __pgt_ or __pgs_
SELECT id, sum(amount) AS order_total FROM orders GROUP BY id;
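To audit an existing schema for columns that would collide, a plain catalog query works (standard information_schema, no pg_trickle functions involved):

```sql
-- Find user columns that collide with reserved pg_trickle prefixes
SELECT table_schema, table_name, column_name
FROM information_schema.columns
WHERE column_name ~ '^__(pgt|pgs)_'
ORDER BY table_schema, table_name;
```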
Configuration
Complete reference for all pg_trickle GUC (Grand Unified Configuration) variables.
Quick-tuning by goal
Not sure which GUC to change? Start here.
| Goal | GUCs to adjust |
|---|---|
| Lower refresh latency | scheduler_interval_ms, min_schedule_seconds |
| Reduce write overhead on busy tables | compact_threshold, max_buffer_rows, cleanup_use_truncate, user_triggers |
| Handle larger DAGs without timeouts | max_workers, max_dynamic_refresh_workers, scheduler_interval_ms |
| Connection-pooler compatibility (PgBouncer) | pooler_compatibility_mode, use_prepared_statements |
| Lower memory usage during refresh | merge_work_mem_mb, max_delta_estimate_rows |
| Improve cost-model accuracy | cost_model_safety_margin, planner_aggressive, differential_max_change_ratio |
| Enable WAL-based CDC | cdc_mode, wal_transition_timeout, slot_lag_warning_threshold_mb |
| Prevent a runaway stream table | max_consecutive_errors, fuse_threshold, buffer_alert_threshold |
See the full reference below for each variable's defaults, valid values, and
notes. Use pgtrickle.recommend_refresh_mode() for per-table advice.
Table of Contents
- Overview
- GUC Variables
- Essential
- WAL CDC
- Refresh Performance
- pg_trickle.differential_max_change_ratio
- pg_trickle.refresh_strategy
- pg_trickle.cost_model_safety_margin
- pg_trickle.max_delta_estimate_rows
- pg_trickle.planner_aggressive
- pg_trickle.merge_join_strategy
- pg_trickle.merge_strategy
- pg_trickle.merge_strategy_threshold
- pg_trickle.merge_planner_hints (deprecated)
- pg_trickle.merge_work_mem_mb
- pg_trickle.merge_seqscan_threshold
- pg_trickle.auto_backoff
- pg_trickle.tiered_scheduling
- pg_trickle.cleanup_use_truncate
- pg_trickle.use_prepared_statements
- pg_trickle.user_triggers
- Guardrails & Limits
- pg_trickle.block_source_ddl
- pg_trickle.buffer_alert_threshold
- pg_trickle.compact_threshold
- pg_trickle.max_buffer_rows
- pg_trickle.auto_index
- pg_trickle.aggregate_fast_path
- pg_trickle.template_cache
- pg_trickle.buffer_partitioning
- pg_trickle.max_grouping_set_branches
- pg_trickle.max_parse_depth
- pg_trickle.ivm_topk_max_limit
- pg_trickle.ivm_recursive_max_depth
- Parallel Refresh
- Advanced / Internal
- Guardrails & Diagnostics
- Connection Pooler
- History & Retention
- Circular Dependencies
- Scheduler Scalability (v0.25.0)
- Operability, Observability & DR (v0.27.0)
- Transactional Outbox (v0.28.0)
- pg_trickle.outbox_enabled
- pg_trickle.outbox_retention_hours
- pg_trickle.outbox_drain_batch_size
- pg_trickle.outbox_drain_interval_seconds
- pg_trickle.outbox_inline_threshold_rows
- pg_trickle.outbox_skip_empty_delta
- pg_trickle.outbox_storage_critical_mb
- pg_trickle.outbox_force_retention
- pg_trickle.consumer_dead_threshold_hours
- pg_trickle.consumer_stale_offset_threshold_days
- pg_trickle.consumer_cleanup_enabled
- Transactional Inbox (v0.28.0)
- Pre-GA Correctness & Stability (v0.30.0)
- Citus Distributed Tables (v0.32.0+)
- GUC Interaction Matrix
- Tuning Profiles
- Complete postgresql.conf Example
- Runtime Configuration
- Further Reading
Overview
pg_trickle exposes over forty configuration variables in the pg_trickle namespace. All can be set in postgresql.conf or at runtime via SET / ALTER SYSTEM.
Required postgresql.conf settings:
shared_preload_libraries = 'pg_trickle'
The extension must be loaded via shared_preload_libraries because it registers GUC variables and a background worker at startup.
Note:
`wal_level = logical` and `max_replication_slots` are recommended but not required. The default CDC mode (`auto`) uses lightweight row-level triggers initially and transparently transitions to WAL-based capture if `wal_level = logical` is available. If `wal_level` is not `logical`, pg_trickle stays on triggers permanently — no degradation, no errors. Set `pg_trickle.cdc_mode = 'trigger'` to disable WAL transitions entirely (see pg_trickle.cdc_mode).
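To see which path your server can take, a quick sanity check using standard PostgreSQL settings (nothing pg_trickle-specific):

```sql
-- WAL-based CDC requires logical decoding
SHOW wal_level;              -- must be 'logical' for cdc_mode = 'wal'
SHOW max_replication_slots;  -- must leave room for pg_trickle's slots
```

If `wal_level` is not `logical`, the `auto` mode simply stays on triggers.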
GUC Variables
Essential
The settings most users configure at install time.
pg_trickle.enabled
Enable or disable the pg_trickle extension.
| Property | Value |
|---|---|
| Type | bool |
| Default | true |
| Context | SUSET (superuser) |
| Restart Required | No |
When set to false, the background scheduler stops processing refreshes. Existing stream tables remain in the catalog but are not refreshed. Manual pgtrickle.refresh_stream_table() calls still work.
Note on CDC triggers: Setting `enabled = false` stops the scheduler from refreshing stream tables but does not disable CDC trigger execution. Change buffers continue to accumulate. This is intentional: when the extension is re-enabled, stream tables can refresh immediately from the buffered changes rather than performing a full table scan.
To fully quiesce CDC overhead during extended maintenance, use `pgtrickle.drain()` before disabling, then `DROP TRIGGER` the CDC triggers manually and recreate them via `pgtrickle.repair_stream_table()` when re-enabling.
-- Disable automatic refreshes
SET pg_trickle.enabled = false;
-- Re-enable
SET pg_trickle.enabled = true;
pg_trickle.cdc_mode
CDC (Change Data Capture) mechanism selection.
| Value | Description |
|---|---|
'auto' | (default) Use triggers for creation; transition to WAL-based CDC if wal_level = logical. Falls back to triggers automatically on error. |
'trigger' | Always use row-level triggers for change capture |
'wal' | Require WAL-based CDC (fails if wal_level != logical) |
Default: 'auto'
pg_trickle.cdc_mode only affects deferred refresh modes ('AUTO', 'FULL',
and 'DIFFERENTIAL'). refresh_mode = 'IMMEDIATE' bypasses CDC entirely and
always uses statement-level IVM triggers. If the GUC is set to 'wal' when a
stream table is created or altered to IMMEDIATE, pg_trickle logs an INFO and
continues with IVM triggers instead of creating CDC triggers or WAL slots.
Per-stream-table overrides take precedence over the GUC when you pass
cdc_mode => 'auto' | 'trigger' | 'wal' to
pgtrickle.create_stream_table(...) or pgtrickle.alter_stream_table(...).
The override is stored in pgtrickle.pgt_stream_tables.requested_cdc_mode.
For shared source tables, pg_trickle resolves the effective source-level CDC
mechanism conservatively: any dependent stream table that requests 'trigger'
keeps the source on trigger CDC; otherwise 'wal' wins over 'auto'.
-- Enable automatic trigger → WAL transition (default)
SET pg_trickle.cdc_mode = 'auto';
-- Force trigger-only CDC (disable WAL transitions)
SET pg_trickle.cdc_mode = 'trigger';
-- Require WAL-based CDC (error if wal_level != logical)
SET pg_trickle.cdc_mode = 'wal';
pg_trickle.scheduler_interval_ms
How often the background scheduler checks for stream tables that need refreshing.
| Property | Value |
|---|---|
| Type | int |
| Default | 1000 (1 second) |
| Range | 100 – 60000 (100ms to 60s) |
| Context | SUSET |
| Restart Required | No |
Tuning Guidance:
- Low-latency workloads (sub-second schedules): Set to `100`–`500`.
- Standard workloads (schedules of minutes): The default `1000` is appropriate.
- Low-overhead workloads (many STs with long schedules): Increase to `5000`–`10000` to reduce scheduler overhead.
The scheduler interval does not determine refresh frequency — it determines how often the scheduler checks whether any ST's staleness exceeds its schedule (or whether a cron expression has fired). The actual refresh frequency is governed by schedule (duration or cron) and canonical period alignment.
SET pg_trickle.scheduler_interval_ms = 500;
pg_trickle.event_driven_wake
⚠️ Removed in v0.51.0 — This GUC has been removed. It had no effect since v0.39.0 because PostgreSQL's `LISTEN` command is not permitted inside background worker processes. The scheduler always uses efficient latch-based polling regardless of this setting.
Migration: Remove `pg_trickle.event_driven_wake` from postgresql.conf and any `ALTER SYSTEM` settings. The scheduler behavior is unchanged — it wakes at `pg_trickle.scheduler_interval_ms` intervals. To reduce latency, lower `scheduler_interval_ms` instead (e.g. `200` ms for sub-200 ms refresh latency).
pg_trickle.wake_debounce_ms
⚠️ Removed in v0.51.0 — This GUC has been removed together with `event_driven_wake`. It had no effect since `event_driven_wake` was always non-functional in background workers.
Migration: Remove `pg_trickle.wake_debounce_ms` from postgresql.conf and any `ALTER SYSTEM` settings. No replacement is needed.
pg_trickle.min_schedule_seconds
Minimum allowed schedule value (in seconds) when creating or altering a stream table with a duration-based schedule. This limit does not apply to cron expressions.
| Property | Value |
|---|---|
| Type | int |
| Default | 1 (1 second) |
| Range | 1 – 86400 (1 second to 24 hours) |
| Context | SUSET |
| Restart Required | No |
This acts as a safety guardrail to prevent users from setting impractically small schedules that would cause excessive refresh overhead.
Tuning Guidance:
- Development/testing: The default `1` allows sub-second testing.
- Production: Raise to `60` or higher to prevent excessive WAL consumption and CPU usage.
-- Restrict to 10-second minimum schedules
SET pg_trickle.min_schedule_seconds = 10;
pg_trickle.default_schedule_seconds
Default effective schedule (in seconds) for isolated CALCULATED stream tables that have no downstream dependents.
| Property | Value |
|---|---|
| Type | int |
| Default | 1 (1 second) |
| Range | 1 – 86400 (1 second to 24 hours) |
| Context | SUSET |
| Restart Required | No |
When a CALCULATED stream table (scheduled with 'calculated') has no downstream dependents to derive a schedule from, this value is used as its effective refresh interval. This is distinct from min_schedule_seconds, which is the validation floor for duration-based schedules.
Tuning Guidance:
- Development/testing: The default `1` allows rapid iteration.
- Production standalone CALCULATED tables: Raise to match your desired update cadence (e.g., `60` for once per minute).
-- Set default for isolated CALCULATED tables to 30 seconds
SET pg_trickle.default_schedule_seconds = 30;
pg_trickle.max_consecutive_errors
Maximum consecutive refresh failures before a stream table is moved to ERROR status.
| Property | Value |
|---|---|
| Type | int |
| Default | 3 |
| Range | 1 – 100 |
| Context | SUSET |
| Restart Required | No |
When a ST's consecutive_errors reaches this threshold:
- The ST status changes to `ERROR`.
- Automatic refreshes stop for this ST.
- Manual intervention is required: `SELECT pgtrickle.alter_stream_table('...', status => 'ACTIVE')`.
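The recovery call spelled out in full, using the `active_orders` table from the introduction as a stand-in name:

```sql
-- Clear the ERROR state and resume automatic refreshes
SELECT pgtrickle.alter_stream_table('active_orders', status => 'ACTIVE');

-- Optionally verify with a manual refresh before relying on the scheduler
SELECT pgtrickle.refresh_stream_table('active_orders');
```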
Tuning Guidance:
- Strict (production): `3` — fail fast to surface issues.
- Lenient (development): `10`–`20` — tolerate transient errors.
SET pg_trickle.max_consecutive_errors = 5;
WAL CDC
Settings specific to WAL-based CDC. Only relevant when pg_trickle.cdc_mode = 'auto' or 'wal'.
pg_trickle.wal_transition_timeout
Note: This setting is only relevant when `pg_trickle.cdc_mode = 'auto'` or `'wal'`. See ARCHITECTURE.md for the full CDC transition lifecycle.
Maximum time (seconds) to wait for the WAL decoder to catch up during the transition from trigger-based to WAL-based CDC. If the decoder has not caught up within this timeout, the system falls back to triggers.
Default: 300 (5 minutes)
Range: 10 – 3600
SET pg_trickle.wal_transition_timeout = 300;
pg_trickle.slot_lag_warning_threshold_mb
Warning threshold for retained WAL on pg_trickle replication slots.
| Property | Value |
|---|---|
| Type | int |
| Default | 100 (MB) |
| Range | 1 – 1048576 |
| Context | SUSET |
| Restart Required | No |
When retained WAL for a pg_trickle replication slot exceeds this threshold:
- The scheduler emits a `slot_lag_warning` event on `LISTEN pg_trickle_alert`.
- `pgtrickle.health_check()` reports `WARN` for the `slot_lag` check.
Raise this on high-throughput systems that intentionally tolerate larger WAL retention. Lower it if you want earlier warning before slots risk invalidation.
SET pg_trickle.slot_lag_warning_threshold_mb = 256;
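To consume the warning in practice, a monitoring session can subscribe to the alert channel or poll the health check (both names come from the list above):

```sql
-- Receive slot_lag_warning events asynchronously in this session
LISTEN pg_trickle_alert;

-- Or poll on demand; the slot_lag check reports WARN past the threshold
SELECT * FROM pgtrickle.health_check();
```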
pg_trickle.slot_lag_critical_threshold_mb
Critical threshold for retained WAL on pg_trickle replication slots.
| Property | Value |
|---|---|
| Type | int |
| Default | 1024 (MB) |
| Range | 1 – 1048576 |
| Context | SUSET |
| Restart Required | No |
When retained WAL for a pg_trickle replication slot exceeds this threshold,
pgtrickle.check_cdc_health() returns a per-source
slot_lag_exceeds_threshold alert.
This threshold is intentionally higher than the warning threshold so operators can separate early warning from source-level unhealthy state.
SET pg_trickle.slot_lag_critical_threshold_mb = 2048;
Refresh Performance
Fine-grained tuning for the differential refresh engine.
pg_trickle.differential_max_change_ratio
Maximum change-to-table ratio before DIFFERENTIAL refresh falls back to FULL refresh.
| Property | Value |
|---|---|
| Type | float |
| Default | 0.15 (15%) |
| Range | 0.0 – 1.0 |
| Context | SUSET |
| Restart Required | No |
When the number of pending change buffer rows exceeds this fraction of the source table's estimated row count, the refresh engine switches from DIFFERENTIAL (which uses JSONB parsing and window functions) to FULL refresh. At high change rates FULL refresh is cheaper because it avoids the per-row JSONB overhead.
Special Values:
- `0.0`: Disable adaptive fallback — always use DIFFERENTIAL.
- `1.0`: Always fall back to FULL (effectively forces FULL mode).
Tuning Guidance:
- OLTP with low change rates (< 5%): The default `0.15` is appropriate.
- Batch-load workloads (bulk inserts): Lower to `0.05`–`0.10` so large batches trigger FULL refresh sooner.
- Latency-sensitive (want deterministic refresh time): Set to `0.0` to always use DIFFERENTIAL.
-- Lower threshold for batch-heavy workloads
SET pg_trickle.differential_max_change_ratio = 0.10;
-- Disable adaptive fallback
SET pg_trickle.differential_max_change_ratio = 0.0;
pg_trickle.refresh_strategy
Cluster-wide refresh strategy override.
| Property | Value |
|---|---|
| Type | string |
| Default | 'auto' |
| Values | 'auto', 'differential', 'full' |
| Context | SUSET |
| Restart Required | No |
Controls the FULL vs. DIFFERENTIAL decision for all stream tables whose refresh_mode is DIFFERENTIAL:
- `'auto'` (default): Use the adaptive cost-based heuristic that considers `differential_max_change_ratio`, per-ST `auto_threshold`, refresh history, and spill detection to pick the optimal strategy per refresh cycle.
- `'differential'`: Always use DIFFERENTIAL refresh — skip the adaptive ratio check entirely. The BUF-LIMIT safety check (`max_buffer_rows`) still applies.
- `'full'`: Always use FULL refresh regardless of change volume. Useful for debugging or when you know DIFFERENTIAL is consistently slower for your workload.
Important: Per-ST refresh_mode in the catalog takes precedence. Stream tables explicitly configured as refresh_mode = 'FULL' always use FULL regardless of this GUC.
Tuning Guidance:
- Most workloads: Leave at `'auto'` — the adaptive heuristic learns from refresh history.
- Known low-churn workloads: Set to `'differential'` to eliminate the per-source capped-count query overhead.
- Debugging delta issues: Temporarily set to `'full'` to compare behavior.
-- Force DIFFERENTIAL for all stream tables (skip ratio check)
SET pg_trickle.refresh_strategy = 'differential';
-- Force FULL for all stream tables (debugging)
SET pg_trickle.refresh_strategy = 'full';
-- Reset to adaptive heuristic
SET pg_trickle.refresh_strategy = 'auto';
pg_trickle.cost_model_safety_margin
Added in v0.17.0. Safety margin for the predictive cost model that decides FULL vs. DIFFERENTIAL.
| Property | Value |
|---|---|
| Type | float |
| Default | 0.8 |
| Range | 0.1 – 2.0 |
| Context | SUSET |
| Restart Required | No |
When refresh_strategy = 'auto', the cost model estimates DIFFERENTIAL and FULL costs from recent refresh history. DIFFERENTIAL is chosen when:
estimated_diff_cost < estimated_full_cost × safety_margin
A value below 1.0 biases toward DIFFERENTIAL (which has lower lock contention and is generally preferred). A value above 1.0 biases toward FULL.
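As a worked example of the inequality above (the costs are hypothetical):

```sql
-- With safety_margin = 0.8 and estimated_full_cost ≈ 200 ms,
-- DIFFERENTIAL is chosen only when estimated_diff_cost < 200 × 0.8 = 160 ms;
-- otherwise the cycle falls back to FULL.
SET pg_trickle.cost_model_safety_margin = 0.8;  -- the default
```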
The cost model also classifies each stream table's query complexity (scan, filter, aggregate, join, or join+aggregate) and uses per-class coefficients learned from historical data.
Tuning Guidance:
- `0.8` (default): Prefer DIFFERENTIAL unless it's nearly as expensive as FULL.
- `0.5`: Strongly prefer DIFFERENTIAL — only fall back when it's clearly more expensive.
- `1.0`: Neutral — pick whichever is estimated to be cheaper.
- `1.2`: Slightly prefer FULL — useful when FULL is very fast and DIFFERENTIAL lock contention is a concern.
-- Strongly prefer DIFFERENTIAL
SET pg_trickle.cost_model_safety_margin = 0.5;
-- Neutral (pick the estimated cheapest)
SET pg_trickle.cost_model_safety_margin = 1.0;
pg_trickle.max_delta_estimate_rows
Added in v0.15.0. Maximum estimated delta output cardinality before falling back to FULL refresh.
| Property | Value |
|---|---|
| Type | int |
| Default | 0 (disabled) |
| Range | 0 – 10,000,000 |
| Context | SUSET |
| Restart Required | No |
Before executing the MERGE, the refresh executor extracts the delta subquery and runs a capped SELECT count(*) FROM (delta LIMIT N+1). If the count reaches the configured limit, the refresh emits a NOTICE and falls back to FULL refresh to prevent OOM or excessive temp-file spills from unexpectedly large delta output.
This is complementary to differential_max_change_ratio which checks input change buffer size as a ratio of source table size. max_delta_estimate_rows checks output cardinality — catching cases where a small number of input changes produce a large delta output after JOINs.
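A sketch of the capped count, with a hypothetical delta subquery standing in for the SQL pg_trickle generates internally:

```sql
-- With max_delta_estimate_rows = 100000 the executor counts at most N+1 rows:
SELECT count(*) FROM (
  SELECT o.id                        -- hypothetical delta subquery
  FROM orders o
  JOIN pending_changes c ON c.id = o.id
  LIMIT 100001                       -- N + 1
) AS capped_delta;
-- A count of 100001 means the delta exceeds the limit → NOTICE + FULL refresh.
```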
Special Values:
- `0` (default): Disable the estimation check entirely.
Tuning Guidance:
- Servers with 8–16 GB RAM: Start with `100000` and adjust based on observed refresh behavior.
- Large-memory servers (32+ GB): `500000` or higher.
- Complex multi-join queries: Lower to `50000`, since join fan-out can amplify small changes.
-- Enable delta output estimation with 100K row limit
SET pg_trickle.max_delta_estimate_rows = 100000;
-- Disable estimation (default)
SET pg_trickle.max_delta_estimate_rows = 0;
pg_trickle.cleanup_use_truncate
Use TRUNCATE instead of per-row DELETE for change buffer cleanup when the entire buffer is consumed by a refresh.
| Property | Value |
|---|---|
| Type | bool |
| Default | true |
| Context | SUSET |
| Restart Required | No |
After a differential refresh consumes all rows from the change buffer, the engine must clean up the buffer table. TRUNCATE is O(1) regardless of row count, versus DELETE which must update indexes row-by-row. This saves 3–5 ms per refresh at 10%+ change rates.
Trade-off: TRUNCATE acquires an AccessExclusiveLock on the change buffer table. If concurrent DML on the source table is actively inserting into the same change buffer via triggers, this lock can cause brief contention.
Tuning Guidance:
- Most workloads: Leave at `true` — the performance benefit outweighs the brief lock.
- High-concurrency OLTP with continuous writes during refresh: Set to `false` if you observe lock-wait timeouts on the change buffer.
- PgBouncer / connection poolers: The `AccessExclusiveLock` acquired by `TRUNCATE` is held only on the change buffer table (not the source table), but in transaction-pooling mode with frequent refreshes, even brief exclusive locks can cause connection queuing. If you observe elevated `pg_stat_activity` wait events on change buffer tables, switch to `false`.
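If you suspect the TRUNCATE lock is causing queuing, a standard catalog query (not pg_trickle-specific) shows which sessions are currently blocked on locks:

```sql
-- Sessions waiting on heavyweight locks right now
SELECT pid, wait_event_type, wait_event, left(query, 60) AS query
FROM pg_stat_activity
WHERE wait_event_type = 'Lock';
```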
-- Use per-row DELETE for change buffer cleanup
SET pg_trickle.cleanup_use_truncate = false;
pg_trickle.planner_aggressive
Added in v0.14.0. Consolidated switch for all MERGE planner hints. Replaces the deprecated merge_planner_hints GUC; the work_mem value applied for large deltas is still controlled by merge_work_mem_mb.
| Property | Value |
|---|---|
| Type | bool |
| Default | true |
| Context | SUSET |
| Restart Required | No |
When enabled, the refresh executor estimates the delta size and applies optimizer hints within the transaction:
- Delta ≥ 100 rows: `SET LOCAL enable_nestloop = off` — forces hash joins instead of nested-loop joins.
- Delta ≥ 10,000 rows: additionally `SET LOCAL work_mem = '<N>MB'` (see pg_trickle.merge_work_mem_mb).
Tuning Guidance:
- Most workloads: Leave at `true` — the hints improve tail latency without affecting small deltas.
- Custom plan overrides: Set to `false` if you manage planner settings yourself or if the hints conflict with your `pg_hint_plan` configuration.
- Memory-constrained environments: When enabled, large deltas (≥ 10,000 rows) raise `work_mem` to 64 MB (configurable via `merge_work_mem_mb`). If your server has limited RAM and runs many concurrent refreshes, this can cause unexpected memory pressure or temp-file spills. Monitor `temp_blks_written` in `pg_stat_statements` and consider lowering `merge_work_mem_mb` or disabling this GUC if spills are frequent.
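One way to monitor those spills, assuming the pg_stat_statements extension is installed:

```sql
-- Top statements by temp-file writes; refresh MERGEs appearing here suggest
-- lowering merge_work_mem_mb or disabling the planner hints
SELECT left(query, 60) AS query, calls, temp_blks_written
FROM pg_stat_statements
ORDER BY temp_blks_written DESC
LIMIT 10;
```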
-- Disable all planner hints
SET pg_trickle.planner_aggressive = false;
pg_trickle.merge_join_strategy
Added in v0.15.0. Manual override for the join strategy used during MERGE execution.
| Property | Value |
|---|---|
| Type | text |
| Default | 'auto' |
| Values | auto, hash_join, nested_loop, merge_join |
| Context | SUSET |
| Restart Required | No |
Controls which join strategy the refresh executor hints to PostgreSQL via SET LOCAL during differential refresh. Requires planner_aggressive to be enabled.
| Value | Behaviour |
|---|---|
auto (default) | Delta-size heuristics choose: nested-loop for tiny deltas, hash-join for larger ones |
hash_join | Always disable nested-loop joins and raise work_mem — best for medium-to-large deltas |
nested_loop | Always disable hash-join and merge-join — best for very small deltas against indexed tables |
merge_join | Always disable hash-join and nested-loop — useful if data is pre-sorted |
Tuning Guidance:
- Most workloads: Leave at `auto` — the built-in heuristic performs well.
- Consistently large deltas (1K+ rows): Setting to `hash_join` avoids heuristic overhead.
- Troubleshooting: If refresh is slow, try different strategies and compare with `explain_st()`.
-- Force hash joins for all MERGE operations
SET pg_trickle.merge_join_strategy = 'hash_join';
-- Revert to automatic heuristics
SET pg_trickle.merge_join_strategy = 'auto';
pg_trickle.merge_strategy
Added in v0.16.0. Controls how differential refresh applies deltas to stream tables.
| Property | Value |
|---|---|
| Type | text |
| Default | 'auto' |
| Values | auto, merge |
| Context | SUSET |
| Restart Required | No |
| Value | Behaviour |
|---|---|
auto (default) | Use DELETE+INSERT when delta_rows / target_rows is below merge_strategy_threshold; MERGE otherwise |
merge | Always use the PostgreSQL MERGE statement |
Breaking change (v0.19.0): The `delete_insert` value was removed in v0.19.0 (CORR-1) because it was semantically unsafe for aggregate and DISTINCT queries. Setting it now logs a WARNING and falls back to `auto`.
The DELETE+INSERT strategy avoids the MERGE join cost by executing two targeted statements:
a DELETE for removed rows (matched by __pgt_row_id), then an INSERT for new rows.
This is significantly cheaper for sub-1% deltas against large tables because it avoids
scanning the entire target for the MERGE join.
Tuning Guidance:
- Most workloads: Leave at `auto` — the heuristic picks DELETE+INSERT for small deltas automatically.
- Correctness concerns: The `merge` setting preserves the pre-v0.16.0 behaviour.
-- Force MERGE for all differential refreshes
SET pg_trickle.merge_strategy = 'merge';
-- Revert to automatic heuristics
SET pg_trickle.merge_strategy = 'auto';
pg_trickle.merge_strategy_threshold
Added in v0.16.0. Delta ratio threshold for the auto merge strategy.
| Property | Value |
|---|---|
| Type | float |
| Default | 0.01 (1%) |
| Range | 0.001 – 1.0 |
| Context | SUSET |
| Restart Required | No |
When merge_strategy is auto, DELETE+INSERT is used instead of
MERGE when delta_rows / target_rows is below this threshold. The target row count is estimated
from pg_class.reltuples.
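You can inspect the estimate the heuristic will use for a given stream table (the table name here is illustrative):

```sql
-- The target row count used as the denominator in delta_rows / target_rows
SELECT reltuples::bigint AS estimated_target_rows
FROM pg_class
WHERE oid = 'active_orders'::regclass;
```

Note that `reltuples` is a planner estimate refreshed by VACUUM and ANALYZE, so it can lag the true row count on rapidly changing tables.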
Tuning Guidance:
- Default (0.01): DELETE+INSERT for deltas under 1% of the target table size.
- Higher values (0.05–0.10): More aggressive use of DELETE+INSERT; useful for wide tables where MERGE join overhead is high.
- Lower values (0.001): Only use DELETE+INSERT for very tiny deltas.
-- Use DELETE+INSERT for deltas under 5% of target size
SET pg_trickle.merge_strategy_threshold = 0.05;
pg_trickle.merge_planner_hints
Deprecated in v0.14.0. Use `pg_trickle.planner_aggressive` instead. This GUC is still accepted for backward compatibility but is ignored at runtime.
Inject SET LOCAL planner hints before MERGE execution during differential refresh.
| Property | Value |
|---|---|
| Type | bool |
| Default | true |
| Context | SUSET |
| Restart Required | No |
When enabled, the refresh executor estimates the delta size and applies optimizer hints within the transaction:
- Delta ≥ 100 rows: `SET LOCAL enable_nestloop = off` — forces hash joins instead of nested-loop joins.
- Delta ≥ 10,000 rows: additionally `SET LOCAL work_mem = '<N>MB'` (see pg_trickle.merge_work_mem_mb).
This reduces P95 latency spikes caused by PostgreSQL choosing nested-loop plans for medium/large delta sizes.
Tuning Guidance:
- Most workloads: Leave at `true` — the hints improve tail latency without affecting small deltas.
- Custom plan overrides: Set to `false` if you manage planner settings yourself or if the hints conflict with your `pg_hint_plan` configuration.
-- Disable planner hints
SET pg_trickle.merge_planner_hints = false;
pg_trickle.merge_work_mem_mb
work_mem value (in MB) applied via SET LOCAL when the delta exceeds 10,000 rows and planner hints are enabled.
| Property | Value |
|---|---|
| Type | int |
| Default | 64 (64 MB) |
| Range | 8 – 4096 (8 MB to 4 GB) |
| Context | SUSET |
| Restart Required | No |
A higher value lets PostgreSQL use larger in-memory hash tables for the MERGE join, avoiding disk-spilling sort/merge strategies on large deltas. This setting is only applied when planner hints are enabled (`planner_aggressive = true`) and the delta exceeds 10,000 rows.
Tuning Guidance:
- Servers with ample RAM (32+ GB): Increase to `128`–`256` for faster large-delta refreshes.
- Memory-constrained: Lower to `16`–`32`, or disable planner hints entirely.
- Very large deltas (100K+ rows): Consider `256`–`512` if refresh latency matters.
SET pg_trickle.merge_work_mem_mb = 128;
pg_trickle.delta_work_mem_cap_mb
Maximum work_mem (in MB) that planner hints are allowed to set during delta MERGE execution. When the deep-join or large-delta path would set work_mem above this cap, the refresh falls back to FULL instead of risking OOM.
| Property | Value |
|---|---|
| Type | int |
| Default | 0 (disabled — no cap) |
| Range | 0 – 8192 (0 to 8 GB) |
| Context | SUSET |
| Restart Required | No |
Set to 0 to disable the cap entirely (default). When enabled, the cap is checked before any SET LOCAL work_mem in apply_planner_hints(). If the configured or computed work_mem exceeds the cap, the refresh emits a NOTICE and falls back to FULL refresh.
Tuning Guidance:
- Production servers with tight memory budgets: Set to `256`–`512` to prevent runaway hash joins.
- Servers with ample RAM (64+ GB): Leave at `0` (disabled) or set high (`2048`+).
- If you see SCAL-3 fallback notices: Either raise the cap or investigate why delta sizes are unexpectedly large.
SET pg_trickle.delta_work_mem_cap_mb = 512;
pg_trickle.merge_seqscan_threshold
Delta-to-ST row ratio below which sequential scans are disabled for the MERGE transaction. Requires planner hints to be enabled.
| Property | Value |
|---|---|
| Type | real |
| Default | 0.001 |
| Range | 0.0 – 1.0 |
| Context | SUSET |
| Restart Required | No |
When the estimated delta row count divided by the stream table's reltuples falls below this threshold, the refresh executor issues SET LOCAL enable_seqscan = off, coercing PostgreSQL into using the __pgt_row_id B-tree index instead of a full sequential scan.
Set to 0.0 to disable the feature entirely.
Tuning Guidance:
- Default (`0.001`): Suitable for most workloads. A 10M-row ST with fewer than 10K delta rows triggers the hint.
- High-throughput / small STs: Increase to `0.01` if your STs are small and you want more aggressive index usage.
- Disable: Set to `0.0` if index-only scans are not beneficial for your access pattern.
SET pg_trickle.merge_seqscan_threshold = 0.01;
pg_trickle.auto_backoff
Automatically back off the refresh schedule when a stream table is consistently falling behind.
| Property | Value |
|---|---|
| Type | bool |
| Default | on |
| Context | SUSET |
| Restart Required | No |
When enabled (the default), the scheduler tracks a per-stream-table backoff factor. If a
refresh cycle takes more than 95% of the scheduled interval, the backoff factor doubles
(capped at 8×), effectively stretching the schedule to avoid runaway refresh storms.
The factor resets to 1× on the first on-time completion, and a WARNING is emitted whenever
the factor changes so you always know why a stream table is refreshing more slowly than expected.
The 95% trigger threshold means that brief jitter on developer machines (e.g. a 950 ms refresh
on a 1-second schedule) will correctly engage backoff, while a 900 ms refresh on the same
schedule will not. The EC-11 operator alert (scheduler_falling_behind NOTIFY) continues to
fire at the lower 80% threshold, giving you advance warning before the scheduler is actually stuck.
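A worked example of the doubling behaviour described above, for a 10-second schedule:

```sql
-- schedule = 10 s, trigger threshold = 95% (9.5 s):
--   slow cycle 1: factor 1× → 2×, effective interval 20 s
--   slow cycle 2: factor 2× → 4×, effective interval 40 s
--   slow cycle 3: factor 4× → 8×, effective interval 80 s (cap)
--   first on-time completion: factor resets to 1× (10 s again)
-- Each factor change is accompanied by a WARNING in the server log.
```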
This is a safety net for overloaded systems — it prevents a single slow stream table from monopolizing the background worker when operators are not available to intervene.
Tuning Guidance:
- Leave on (the default) for both production and development environments.
- Disable only if you are deliberately running stream tables at the limit of their schedule budget and want the scheduler to keep trying at full speed regardless.
-- Disable if you want no backoff (not recommended for production)
SET pg_trickle.auto_backoff = off;
pg_trickle.tiered_scheduling
Enable tiered refresh scheduling (Hot/Warm/Cold/Frozen) for stream tables.
| Property | Value |
|---|---|
| Type | bool |
| Default | on |
| Context | SUSET |
| Restart Required | No |
When enabled, the scheduler applies a per-stream-table refresh tier multiplier
to duration-based schedules. Each stream table has a refresh_tier column
(default 'hot') that controls how often it is refreshed relative to its
configured schedule:
| Tier | Multiplier | Effect |
|---|---|---|
hot | 1× | Refresh at configured schedule (default) |
warm | 2× | Refresh at 2× the configured interval |
cold | 10× | Refresh at 10× the configured interval |
frozen | skip | Never refreshed until manually promoted |
Cron-based schedules are not affected by the tier multiplier.
Set the tier via:
SELECT pgtrickle.alter_stream_table('my_table', tier => 'warm');
SELECT pgtrickle.alter_stream_table('my_table', tier => 'frozen');
Design note: Tiers are user-assigned only. Automatic classification from
pg_stat_user_tables was rejected because pg_trickle's own MERGE scans
pollute the read counters, making auto-classification unreliable.
Tier Thresholds Reference
The following table summarizes the effective refresh behavior for each tier.
All multipliers apply to duration-based schedules only — cron-based
schedules are always honored as-is. New stream tables default to hot.
| Tier | Multiplier | Effective Schedule (1 s base) | Use Case |
|---|---|---|---|
hot | 1× | 1 s | Real-time dashboards, alerting tables, SLA-bound queries |
warm | 2× | 2 s | Important but not latency-critical tables; reduces CPU by 50% |
cold | 10× | 10 s | Reporting tables queried infrequently; saves significant CPU |
frozen | skip | never (until promoted) | Archival tables, tables under maintenance, or seasonal reports |
When to use each tier:
- Hot — default for all new stream tables. Appropriate when downstream consumers expect near-real-time freshness.
- Warm — set for tables where a few seconds of staleness is acceptable. Halves the refresh CPU cost compared to Hot.
- Cold — set for tables queried only by batch jobs or low-frequency dashboards. 10× reduction in refresh overhead.
- Frozen — set when a table should not be refreshed at all (e.g., during a maintenance window or when the upstream source is being migrated). Promote back to Hot/Warm/Cold when ready.
-- Promote a frozen table back to warm
SELECT pgtrickle.alter_stream_table('seasonal_report', tier => 'warm');
-- Freeze a table during maintenance
SELECT pgtrickle.alter_stream_table('my_table', tier => 'frozen');
Changed in v0.12.0: The default for `pg_trickle.tiered_scheduling` changed from `off` to `on`. Set `pg_trickle.tiered_scheduling = off` in postgresql.conf to restore pre-v0.12.0 behavior (all STs refresh at full speed regardless of tier assignment).
Diamond Schedule Policy (per-stream-table)
Controls how the scheduler fires diamond consistency groups — sets of stream tables that share upstream sources through a diamond-shaped DAG topology.
| Property | Value |
|---|---|
| Column | diamond_schedule_policy in pgt_stream_tables |
| Values | 'fastest' (default), 'slowest' |
| Set via | create_stream_table(..., diamond_schedule_policy => 'slowest') |
| Alter via | alter_stream_table('name', diamond_schedule_policy => 'slowest') |
Only meaningful when diamond_consistency = 'atomic' is also set.
fastest (default): The atomic group fires when any member is due.
This maximizes freshness but can cause CPU multiplication. In an asymmetric
diamond where stream table B refreshes every 1 s and stream table C every 5 s,
both feeding D with diamond_consistency = 'atomic': C refreshes 5× more
often than its schedule because B triggers the group every second. For N
members with schedules S₁ < S₂ < … < Sₙ, the total refresh count is
N × (cycle_time / S₁), meaning slower members do up to Sₙ/S₁ times more work
than their schedule implies.
slowest: The atomic group fires only when all members are due.
This minimizes CPU cost at the expense of freshness — faster members are held
back until the slowest member's schedule fires.
Tuning Guidance:
- Use `'fastest'` when freshness of the diamond tip matters and the cost of extra refreshes is acceptable.
- Use `'slowest'` when CPU budget is tight or members have very different schedules (e.g., 1 s vs 60 s) and the multiplication would be excessive.
-- Create with slowest policy to avoid CPU multiplication
SELECT pgtrickle.create_stream_table(
'my_diamond_tip',
'SELECT ... FROM a JOIN b ...',
diamond_consistency => 'atomic',
diamond_schedule_policy => 'slowest'
);
pg_trickle.use_prepared_statements
Use SQL PREPARE / EXECUTE for MERGE statements during differential refresh.
| Property | Value |
|---|---|
| Type | bool |
| Default | true |
| Context | SUSET |
| Restart Required | No |
When enabled, the refresh executor issues PREPARE __pgt_merge_{id} on the first cache-hit cycle, then uses EXECUTE on subsequent cycles. After approximately 5 executions, PostgreSQL switches from a custom plan to a generic plan, saving 1–2 ms of parse/plan overhead per refresh.
Tuning Guidance:
- Most workloads: Leave at true — the cumulative parse/plan savings are significant for frequently-refreshed stream tables.
- Highly skewed data: Set to false if prepared-statement parameter sniffing produces poor plans (e.g., highly skewed LSN distributions causing bad join estimates).
-- Disable prepared statements
SET pg_trickle.use_prepared_statements = false;
pg_trickle.user_triggers
Control how user-defined triggers on stream tables are handled during refresh.
| Property | Value |
|---|---|
| Type | text |
| Default | 'auto' |
| Values | 'auto', 'off' ('on' accepted as deprecated alias for 'auto') |
| Context | SUSET |
| Restart Required | No |
When a stream table has user-defined row-level triggers, the refresh engine can decompose the MERGE into explicit DELETE + UPDATE + INSERT statements so triggers fire with correct TG_OP, OLD, and NEW values.
Values:
- auto (default): Automatically detect user triggers on the stream table. If present, use the explicit DML path; otherwise use MERGE.
- off: Always use MERGE. User triggers are suppressed during refresh. This is the escape hatch if explicit DML causes issues.
- on: Deprecated compatibility alias for auto. Existing configs continue to work, but new configs should use auto.
Notes:
- Row-level triggers do not fire during FULL refresh regardless of this setting. FULL refresh uses DISABLE TRIGGER USER / ENABLE TRIGGER USER to suppress them.
- The explicit DML path adds ~25–60% overhead compared to MERGE for affected stream tables.
- Stream tables without user triggers have zero overhead when using auto (only a fast pg_trigger check).
-- Auto-detect (default)
SET pg_trickle.user_triggers = 'auto';
-- Suppress triggers, use MERGE
SET pg_trickle.user_triggers = 'off';
-- Backward-compatible legacy setting (treated the same as 'auto')
SET pg_trickle.user_triggers = 'on';
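For intuition, here is a sketch of the decomposition — illustrative only; the table, column, and delta-relation names below are hypothetical and the real generated SQL differs. The point is that each operation is issued as its own statement so user triggers see the correct TG_OP, OLD, and NEW:

```sql
-- Illustrative shape of the explicit DML path (names like __delta are hypothetical)
DELETE FROM my_st t
  USING __delta d
  WHERE t.__pgt_row_id = d.__pgt_row_id AND d.op = 'D';   -- DELETE triggers see OLD

UPDATE my_st t
  SET col_a = d.col_a, col_b = d.col_b
  FROM __delta d
  WHERE t.__pgt_row_id = d.__pgt_row_id AND d.op = 'U';   -- UPDATE triggers see OLD/NEW

INSERT INTO my_st (__pgt_row_id, col_a, col_b)
  SELECT d.__pgt_row_id, d.col_a, d.col_b
  FROM __delta d WHERE d.op = 'I';                        -- INSERT triggers see NEW
```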
Guardrails & Limits
Safety controls and hard limits.
pg_trickle.block_source_ddl
When enabled, column-affecting DDL (e.g., ALTER TABLE ... DROP COLUMN,
ALTER TABLE ... ALTER COLUMN ... TYPE) on source tables tracked by stream
tables is blocked with an ERROR instead of silently marking stream tables
for reinitialization.
This is useful in production environments where you want to prevent accidental schema changes that would trigger expensive full recomputation of downstream stream tables.
Default: false
Context: Superuser
-- Block column-affecting DDL on tracked source tables
SET pg_trickle.block_source_ddl = true;
-- Allow DDL (stream tables will be marked for reinit instead)
SET pg_trickle.block_source_ddl = false;
Note: Only column-affecting changes are blocked. Benign DDL (adding indexes, comments, constraints) is always allowed regardless of this setting.
pg_trickle.buffer_alert_threshold
When any source table's change buffer exceeds this number of rows, a
BufferGrowthWarning alert is emitted. Raise for high-throughput workloads,
lower for small tables.
Default: 1000000 (1 million rows)
Range: 1000 – 100000000
SET pg_trickle.buffer_alert_threshold = 500000;
pg_trickle.compact_threshold
When a source table's pending change buffer exceeds this many rows,
compaction is triggered before the next refresh cycle. Compaction eliminates
net-zero INSERT+DELETE pairs (rows inserted then deleted within the same
refresh window) and collapses multi-change groups to first+last rows per
pk_hash, reducing delta scan overhead by 50–90% for high-churn tables.
Set to 0 to disable compaction.
Default: 100000 (100K rows)
Range: 0 – 100000000
-- Trigger compaction at 50K pending rows
SET pg_trickle.compact_threshold = 50000;
-- Disable compaction
SET pg_trickle.compact_threshold = 0;
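A sketch of the first+last collapse semantics, assuming a buffer with pk_hash and lsn columns (illustrative only — the internal implementation differs):

```sql
-- Keep only the earliest and latest change per pk_hash; intermediate changes
-- are net-irrelevant for computing the delta over the refresh window.
SELECT *
FROM (
  SELECT b.*,
         row_number() OVER (PARTITION BY pk_hash ORDER BY lsn)      AS rn_asc,
         row_number() OVER (PARTITION BY pk_hash ORDER BY lsn DESC) AS rn_desc
  FROM change_buffer b
) s
WHERE rn_asc = 1 OR rn_desc = 1;
```

Rows whose first and last change cancel out (an INSERT followed by a DELETE of the same key) drop out of the delta entirely.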
pg_trickle.max_buffer_rows
Added in v0.16.0. Hard limit on change buffer rows per source table. When a source table's change buffer exceeds this limit at refresh time, pg_trickle forces a FULL refresh and truncates the buffer, preventing unbounded disk growth when differential refresh fails repeatedly.
| Property | Value |
|---|---|
| Type | integer |
| Default | 1000000 (1 million rows) |
| Range | 0 – 100000000 |
| Context | SUSET |
| Restart Required | No |
Set to 0 to disable the limit (not recommended for production).
Tuning Guidance:
- Most workloads: Leave at 1000000. This accommodates high-throughput tables while preventing runaway growth.
- High-throughput event tables: Raise to 5000000–10000000 if your source tables regularly accumulate large change buffers between refresh cycles.
- Small databases / tight disk budgets: Lower to 100000–500000 to limit change buffer disk usage.
-- Set buffer limit to 5 million rows
SET pg_trickle.max_buffer_rows = 5000000;
-- Disable the limit (not recommended)
SET pg_trickle.max_buffer_rows = 0;
pg_trickle.auto_index
Added in v0.16.0. Controls whether create_stream_table() automatically
creates performance indexes on stream tables.
| Property | Value |
|---|---|
| Type | bool |
| Default | true |
| Context | SUSET |
| Restart Required | No |
When enabled, the following indexes are created automatically:
- GROUP BY composite index — for aggregate queries in DIFFERENTIAL mode, a composite index on the GROUP BY columns is created to speed up group lookups during MERGE.
- DISTINCT composite index — for DISTINCT queries with ≤ 8 output columns, a composite index on all output columns is created.
- Covering __pgt_row_id index — for stream tables with ≤ 8 output columns, the __pgt_row_id index includes all user columns via INCLUDE, enabling index-only scans during MERGE (20–50% faster for small deltas against large targets).
The __pgt_row_id index itself is always created regardless of this setting
(it is required for correctness).
Tuning Guidance:
- Most workloads: Leave at true.
- Custom index strategies: Set to false if you prefer to manage indexes manually or if the auto-created indexes conflict with your workload patterns.
-- Disable automatic index creation
SET pg_trickle.auto_index = false;
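As an illustration, the covering index for a stream table with three user columns might take a shape like the following (index name and columns are hypothetical, per the INCLUDE behaviour described above):

```sql
CREATE INDEX my_st___pgt_row_id_idx
  ON my_st (__pgt_row_id)
  INCLUDE (col_a, col_b, col_c);  -- all user columns, enabling index-only scans during MERGE
```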
pg_trickle.aggregate_fast_path
Added in v0.16.0. Controls whether stream tables with all-algebraic aggregates use the explicit DML fast-path instead of MERGE.
| Property | Value |
|---|---|
| Type | bool |
| Default | true |
| Context | SUSET |
| Restart Required | No |
When enabled, stream tables whose aggregates are all algebraically invertible (COUNT, SUM, AVG, STDDEV, VAR, CORR, REGR_*, etc.) use the explicit DML path (DELETE + UPDATE + INSERT via a materialized temp table) instead of the generic MERGE statement. This avoids the MERGE hash-join cost, which dominates for aggregate queries with high group cardinality.
Not eligible:
- Queries with SEMI_ALGEBRAIC aggregates (MIN, MAX) — these may require group-rescan on extremum deletion
- Queries with GROUP_RESCAN aggregates (STRING_AGG, ARRAY_AGG, JSON_AGG, etc.)
- Queries with user-defined triggers on the stream table (already use explicit DML via the user-trigger path)
The explain_st() output shows the aggregate_path field:
- explicit_dml — fast-path is active
- merge — using the default MERGE path
- merge (fast-path disabled) — eligible but GUC is off
-- Disable aggregate fast-path
SET pg_trickle.aggregate_fast_path = false;
-- Check the current aggregate path for a stream table
SELECT * FROM pgtrickle.explain_st('my_agg_st');
pg_trickle.template_cache
Added in v0.16.0. Controls the cross-backend delta template cache backed by an UNLOGGED catalog table.
| Property | Value |
|---|---|
| Type | bool |
| Default | true |
| Context | SUSET |
| Restart Required | No |
When enabled, delta SQL templates generated by the DVM engine are persisted in
pgtrickle.pgt_template_cache so that new backends skip the ~45 ms
parse+differentiate step on their first refresh of each stream table (down to
~1 ms SPI lookup).
Templates are automatically invalidated when:
- A stream table's defining query changes (ALTER STREAM TABLE ... SET QUERY)
- A stream table is dropped
- A stream table is reinitialized
The explain_st() output includes template_cache (enabled/disabled) and
template_cache_stats with L2 hit and full miss counters.
-- Disable the template cache for debugging
SET pg_trickle.template_cache = false;
-- Check template cache stats
SELECT * FROM pgtrickle.explain_st('my_st')
WHERE property IN ('template_cache', 'template_cache_stats');
pg_trickle.buffer_partitioning
Controls whether change buffer tables use PARTITION BY RANGE (lsn) for
O(1) cleanup via partition detach instead of O(n) DELETE.
| Value | Behaviour |
|---|---|
| 'off' | (Default) Unpartitioned heap tables. Cleanup uses DELETE or TRUNCATE. Lowest DDL overhead per cycle. |
| 'on' | Always create partitioned change buffers. Old partitions are detached and dropped after consumption — O(1) cleanup regardless of buffer size. Best for high-throughput sources where buffers routinely exceed compact_threshold. |
| 'auto' | Start with unpartitioned buffers. If a buffer accumulates more rows than compact_threshold within a single refresh cycle, automatically promote it to RANGE(lsn) partitioned mode. Once promoted, the buffer stays partitioned. Combines low overhead for quiet sources with O(1) cleanup for hot ones. |
Default: 'off'
Context: SUSET (superuser session-level)
-- Always partition change buffers
SET pg_trickle.buffer_partitioning = 'on';
-- Auto-promote based on throughput
SET pg_trickle.buffer_partitioning = 'auto';
-- Disable partitioning (default)
SET pg_trickle.buffer_partitioning = 'off';
Interaction with compact_threshold: In 'auto' mode, the compact_threshold value serves double duty — it triggers both compaction and the auto-promotion decision. Lowering compact_threshold makes auto-promotion more sensitive.
pg_trickle.max_grouping_set_branches
Maximum allowed grouping set branches in CUBE/ROLLUP queries.
CUBE(n) produces $2^n$ branches — without a limit, large cubes cause
memory exhaustion during parsing. Users who genuinely need more than
64 branches can raise this GUC.
Default: 64
Range: 1 – 65536
-- Allow up to 128 grouping set branches
SET pg_trickle.max_grouping_set_branches = 128;
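For example, a seven-column CUBE produces $2^7 = 128$ branches and would be rejected at the default limit of 64; with the limit raised as above, creation succeeds (table and column names are illustrative):

```sql
SELECT pgtrickle.create_stream_table(
  'sales_cube',
  'SELECT a, b, c, d, e, f, g, sum(amount)
     FROM sales
    GROUP BY CUBE (a, b, c, d, e, f, g)'
);
```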
pg_trickle.volatile_function_policy
Controls how volatile functions in defining queries are handled for DIFFERENTIAL and IMMEDIATE modes.
| Value | Behaviour |
|---|---|
| reject | (Default) Volatile functions cause an ERROR at stream table creation time. |
| warn | Volatile functions emit a WARNING but creation proceeds. Delta correctness is not guaranteed. |
| allow | Volatile functions are permitted silently. Use only when you understand that delta computation may produce incorrect results. |
Default: reject
Context: SUSET (superuser session-level)
-- Allow volatile functions with a warning
SET pg_trickle.volatile_function_policy = 'warn';
-- Allow volatile functions silently
SET pg_trickle.volatile_function_policy = 'allow';
Note: Volatile functions (e.g., random(), clock_timestamp()) produce different values on each evaluation. In DIFFERENTIAL/IMMEDIATE modes, the delta computation assumes deterministic functions — volatile functions may cause stale or incorrect rows. FULL mode is unaffected since it recomputes from scratch every time.
pg_trickle.unlogged_buffers
Create new change buffer tables as UNLOGGED to reduce WAL amplification
from CDC trigger inserts.
| Value | Behaviour |
|---|---|
| false | (Default) Change buffers are WAL-logged. Crash-safe — no data loss on crash recovery. |
| true | New change buffers are created as UNLOGGED. Eliminates WAL writes for trigger-inserted rows, reducing WAL amplification by ~30%. Trade-off: buffers are truncated on crash recovery; affected stream tables automatically receive a FULL refresh on the next scheduler cycle. |
Default: false
Context: SUSET (superuser session-level)
-- Enable UNLOGGED buffers for new stream tables
SET pg_trickle.unlogged_buffers = true;
Crash recovery: After a PostgreSQL crash or standby restart, UNLOGGED buffer tables are automatically truncated by PostgreSQL. The pg_trickle scheduler detects this condition and enqueues a FULL refresh for each affected stream table on the next tick. During the window between crash recovery and FULL refresh completion, stream table data may be stale.
Standby replicas: UNLOGGED tables are not replicated to standbys. Stream tables on read replicas will be stale after any standby restart until the next FULL refresh completes on the primary.
Converting existing buffers: This GUC only affects newly created change buffer tables. To convert existing logged buffers, use:
SELECT pgtrickle.convert_buffers_to_unlogged();
This function acquires an ACCESS EXCLUSIVE lock on each buffer table. Run it during a low-traffic maintenance window.
pg_trickle.max_parse_depth
Maximum recursion depth for the query parser's tree visitors (G13-SD).
Prevents stack-overflow crashes on pathological queries with deeply nested
subqueries, CTEs, or set operations. When the limit is exceeded, the
parser returns a QueryTooComplex error instead of crashing.
Default: 64
Range: 1 – 10000
-- Raise the limit for deeply nested queries
SET pg_trickle.max_parse_depth = 128;
pg_trickle.ivm_topk_max_limit
Maximum LIMIT value for TopK stream tables in IMMEDIATE mode.
TopK queries exceeding this threshold are rejected because the inline
micro-refresh (recomputing top-K rows on every DML statement) adds
latency proportional to LIMIT. Set to 0 to disable TopK in
IMMEDIATE mode entirely.
Default: 1000
Range: 0 – 1000000
-- Allow TopK up to LIMIT 5000 in IMMEDIATE mode
SET pg_trickle.ivm_topk_max_limit = 5000;
pg_trickle.ivm_recursive_max_depth
Maximum recursion depth for WITH RECURSIVE queries in IMMEDIATE mode.
The semi-naive evaluation injects a __pgt_depth counter column into the
recursive SQL; iteration stops when the counter reaches this limit. Protects
against infinite recursion in pathological graphs.
Default: 100
Range: 1 – 10000
-- Allow deeper recursion for large hierarchies
SET pg_trickle.ivm_recursive_max_depth = 500;
Invalidation Ring & Deep-Join Tuning (v0.50.0)
SCAL-10-01 — Invalidation ring capacity ceiling
pg_trickle.invalidation_ring_capacity
Controls the number of slots in the per-backend invalidation ring buffer.
When a source table DDL change (e.g. ALTER TABLE) or schema reload is
detected, the extension marks affected stream-table OIDs in this ring so
background refresh workers can schedule a full DAG rebuild.
Default: 128
Range: 1 – 1024
Hard ceiling: 1024 entries (enforced at registration time; values above
1024 are clamped to 1024)
-- Increase for deployments with many simultaneously-modified source tables
SET pg_trickle.invalidation_ring_capacity = 512;
Overflow behaviour
When more than invalidation_ring_capacity unique OIDs are invalidated in a
single burst (e.g. during a schema migration touching hundreds of tables at
once), the ring overflows. An overflow causes:
- A full DAG rebuild to be triggered on the next scheduler tick, regardless of which individual OIDs were invalidated.
- The invalidation_ring_overflows counter (visible via pgtrickle.reliability_counters() and the pg_trickle_reliability_invalidation_ring_overflows_total Prometheus metric) to be incremented by 1.
Overflows are safe but expensive — a full DAG rebuild scans all registered
stream tables. A sustained non-zero overflow rate indicates that capacity
should be increased.
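To monitor for overflows, check the counter after a large schema migration (or scrape the Prometheus metric named above):

```sql
-- Sustained growth in invalidation_ring_overflows means
-- invalidation_ring_capacity is too small for your migration bursts
SELECT * FROM pgtrickle.reliability_counters();
```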
Guidance for large deployments (1,000+ stream tables)
| ST count | Recommended capacity |
|---|---|
| < 200 | 128 (default) |
| 200–500 | 256 |
| 500–1000 | 512 |
| > 1000 | 1024 (maximum) |
Note: Each ring slot consumes ~8 bytes of shared memory (allocated from the pg_trickle.max_shared_memory_kb budget). Increasing capacity by 896 slots (128 → 1024) uses an extra ~7 KB of shared memory, which is negligible.
| Setting | Value |
|---|---|
| Default | 128 |
| Range | 1 – 1024 |
| Context | postmaster (requires server restart) |
COR-10-01 — Deep join chain threshold
pg_trickle.part3_max_scan_count
Maximum number of source-table rows the differential engine scans in Part 3 (direct scan strategy) before it escalates to a full deep-join delta recomputation.
Part 3 applies when each side of the join chain contributes at most
part3_max_scan_count source rows. Beyond this threshold the planner falls
back to a more expensive multi-pass join, which is correct at any depth but
generates larger SQL plans.
Default: 5
Range: 1 – 10000
-- Tighten to Part 3 only for very small dimension tables (≤3 rows):
SET pg_trickle.part3_max_scan_count = 3;
-- Relax for moderately sized dimensions where Part 3 SQL is manageable:
SET pg_trickle.part3_max_scan_count = 20;
Trade-off: SQL complexity vs. delta correctness at depth
| Threshold | Effect |
|---|---|
| Low (1–5) | Part 3 used rarely; deeper join strategy always chosen for non-trivial chains; correct but larger delta SQL and higher planning time. |
| Default (5) | Balanced; Part 3 applies only to tiny lookup tables (static enums, one-row config tables). Recommended for most workloads. |
| High (50+) | Part 3 used aggressively; SQL is simpler but intermediate delta estimates may miss correlated rows at join depth > 6. |
Recommendation by join-chain depth
- ≤ 6 table join chains: Default (5) is safe and near-optimal.
- > 6 table join chains with small dimension tables: Increase to 10–20 only if you observe excessive delta SQL plan sizes in EXPLAIN output.
- Analytical workloads with 10+ table star schemas: Leave at default 5 and rely on the planner's GROUP_RESCAN fallback for correctness.
Diagnostic: Set pg_trickle.log_format = 'verbose' and look for part3_direct_scan tags in the scheduler log to see how often Part 3 is being selected.
| Setting | Value |
|---|---|
| Default | 5 |
| Range | 1 – 10000 |
| Context | superuser |
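A minimal diagnostic session, assuming the pg_trickle.log_format GUC referenced above:

```sql
SET pg_trickle.log_format = 'verbose';
-- then grep the server log for part3_direct_scan to count Part 3 selections
```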
Parallel Refresh
These settings control whether and how the scheduler dispatches refresh work to multiple dynamic background workers instead of processing stream tables sequentially. See PLAN_PARALLELISM.md for the design.
Note: Parallel refresh has been the default (on) since v0.11.0. Use pg_trickle.parallel_refresh_mode = 'off' to revert to sequential execution.
pg_trickle.parallel_refresh_mode
Controls whether the scheduler dispatches refresh work to dynamic background workers.
| Property | Value |
|---|---|
| Type | text |
| Default | 'on' |
| Values | 'off', 'dry_run', 'on' |
| Context | SUSET |
| Restart Required | No |
- on (default as of v0.11.0): True parallel refresh. The coordinator builds an execution-unit DAG, dispatches ready units to dynamic background workers, and respects both the per-database cap (max_concurrent_refreshes) and the cluster-wide cap (max_dynamic_refresh_workers).
- dry_run: The scheduler computes execution units and logs dispatch decisions (unit keys, ready-queue contents, budget), but still executes refreshes inline. Useful for previewing parallel behaviour without actually spawning workers.
- off: Sequential execution. All stream tables are refreshed one at a time in topological order by the single scheduler background worker.
-- Preview parallel dispatch decisions without changing runtime behaviour
SET pg_trickle.parallel_refresh_mode = 'dry_run';
-- Enable parallel refresh
SET pg_trickle.parallel_refresh_mode = 'on';
pg_trickle.max_dynamic_refresh_workers
Cluster-wide cap on concurrently active pg_trickle dynamic refresh workers.
| Property | Value |
|---|---|
| Type | int |
| Default | 4 |
| Range | 0 – 64 |
| Context | SUSET |
| Restart Required | No |
This is distinct from pg_trickle.max_concurrent_refreshes (per-database
cap). When multiple databases each have their own scheduler, this GUC
prevents them from overcommitting the shared PostgreSQL
max_worker_processes budget.
Worker-budget planning: Each dynamic refresh worker consumes one
max_worker_processes slot. In addition, pg_trickle uses one slot for
the launcher and one per-database scheduler. Ensure:
max_worker_processes >= pg_trickle launchers (1)
+ pg_trickle schedulers (1 per database)
+ max_dynamic_refresh_workers
+ autovacuum workers
+ parallel query workers
+ other extensions
A typical small deployment (1–2 databases, 4 parallel workers) needs at
least max_worker_processes = 16. The E2E test Docker image uses 128.
-- Allow up to 8 concurrent refresh workers cluster-wide
SET pg_trickle.max_dynamic_refresh_workers = 8;
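A worked instance of the budget arithmetic above for a two-database deployment with 8 dynamic workers (headroom figures are illustrative), set in postgresql.conf:

```
# 1 launcher + 2 schedulers + 8 dynamic refresh workers = 11 pg_trickle slots;
# leave headroom for autovacuum (3 by default) and parallel query workers.
max_worker_processes = 16
```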
pg_trickle.max_concurrent_refreshes
Per-database dispatch cap for parallel refresh workers.
| Property | Value |
|---|---|
| Type | int |
| Default | 4 |
| Range | 1 – 32 |
| Context | SUSET |
| Restart Required | No |
When parallel_refresh_mode = 'on', this limits how many execution units
a single database coordinator may have in-flight at the same time. In
sequential mode (parallel_refresh_mode = 'off'), this setting has no
effect.
The effective concurrent refreshes for a database is:
min(max_concurrent_refreshes, max_dynamic_refresh_workers - workers_used_by_other_dbs)
-- Allow up to 8 concurrent refreshes in this database
SET pg_trickle.max_concurrent_refreshes = 8;
pg_trickle.per_database_worker_quota
Per-database dynamic refresh worker quota for multi-tenant cluster isolation.
| Property | Value |
|---|---|
| Type | int |
| Default | 0 (disabled) |
| Range | 0 – 64 |
| Context | SUSET |
| Restart Required | No |
When greater than 0, each per-database scheduler limits itself to this many
concurrently active dynamic refresh workers drawn from the shared
max_dynamic_refresh_workers pool. This prevents a single busy database
from starving others in multi-tenant clusters.
Burst capacity: when the cluster is lightly loaded (active workers
< 80% of max_dynamic_refresh_workers), a database may temporarily
exceed its quota by up to 50% to absorb sudden change backlogs. The burst
is reclaimed automatically within 1 scheduler cycle once global load rises
back above the 80% threshold.
Priority dispatch: within each dispatch tick, IMMEDIATE-trigger closures are dispatched before all other unit kinds, ensuring transactional consistency requirements are always met first, even under quota pressure.
-- Limit the analytics DB to 4 base workers (bursts to 6 when cluster is idle)
ALTER DATABASE analytics SET pg_trickle.per_database_worker_quota = 4;
-- Give the reporting DB only 2 base workers
ALTER DATABASE reporting SET pg_trickle.per_database_worker_quota = 2;
SELECT pg_reload_conf();
When per_database_worker_quota = 0 (the default), this feature is
disabled and all databases share the max_dynamic_refresh_workers pool
on a first-come-first-served basis, bounded per coordinator by
max_concurrent_refreshes.
Note: Set this GUC per-database with ALTER DATABASE rather than globally with ALTER SYSTEM, so different databases can have different quotas.
Advanced / Internal
pg_trickle.change_buffer_schema
Schema name for change-buffer tables created by the trigger-based CDC pipeline.
Default: 'pgtrickle_changes'
Change buffer tables are named <schema>.changes_<oid> where <oid> is
the source table's OID. Placing them in a dedicated schema keeps them out
of the public namespace.
SET pg_trickle.change_buffer_schema = 'my_change_buffers';
pg_trickle.foreign_table_polling
Enable polling-based change detection for foreign table sources. When
enabled, the scheduler periodically re-executes the foreign table query
and computes deltas via snapshot comparison (EXCEPT ALL). Foreign tables
cannot use trigger or WAL-based CDC, so this is the only mechanism for
incremental maintenance.
Default: false
SET pg_trickle.foreign_table_polling = true;
pg_trickle.matview_polling
Enable polling-based CDC for materialized views. When enabled, materialized
views referenced in defining queries are supported via snapshot-comparison
(the same mechanism as foreign table polling). A local shadow table stores
the previous state; EXCEPT ALL computes the delta on each refresh cycle.
| Property | Value |
|---|---|
| Type | boolean |
| Default | false |
| Context | SUSET (superuser) |
| Restart required | No |
SET pg_trickle.matview_polling = true;
pg_trickle.cdc_trigger_mode
Controls the CDC trigger granularity: statement (default) or row.
statement uses statement-level AFTER triggers with transition tables
(NEW TABLE / OLD TABLE). A single invocation per DML statement processes
all affected rows in one bulk INSERT ... SELECT, giving 50–80% less
write-side overhead for bulk UPDATE/DELETE. Single-row DML is unaffected.
row uses the legacy per-row trigger approach (pg_trickle < 0.4.0 behavior).
Changing this setting takes effect for newly installed CDC triggers. Call
pgtrickle.rebuild_cdc_triggers() to migrate existing stream tables.
| Property | Value |
|---|---|
| Type | string |
| Default | 'statement' |
| Valid values | statement, row |
| Context | SUSET (superuser) |
| Restart required | No |
-- Switch to statement-level triggers (default, recommended)
SET pg_trickle.cdc_trigger_mode = 'statement';
-- After changing, rebuild existing triggers:
SELECT pgtrickle.rebuild_cdc_triggers();
pg_trickle.tick_watermark_enabled
Cap CDC consumption to the WAL LSN at scheduler tick start. When enabled
(default), each scheduler tick captures pg_current_wal_lsn() at its start
and prevents any refresh from consuming WAL changes beyond that LSN. This
bounds cross-source staleness without requiring user configuration.
Disable only if you need stream tables to always advance to the latest available LSN.
| Property | Value |
|---|---|
| Type | boolean |
| Default | true |
| Context | SUSET (superuser) |
| Restart required | No |
-- Disable tick watermark bounding
SET pg_trickle.tick_watermark_enabled = false;
pg_trickle.watermark_holdback_timeout
Maximum seconds a user-provided watermark may remain un-advanced before
being considered stuck. When a watermark group contains a source whose
watermark has not been advanced within this timeout, downstream stream
tables in that group are paused (refresh is skipped) and a
pgtrickle_alert NOTIFY with watermark_stuck event is emitted.
When the stuck watermark is advanced again (via advance_watermark()), the
pause is automatically lifted and a watermark_resumed event is emitted.
Set to 0 to disable stuck-watermark detection (default). Useful values
depend on your ETL pipeline cadence — for a pipeline that loads every 5
minutes, a timeout of 600 (10 min) gives a safety margin.
| Property | Value |
|---|---|
| Type | integer |
| Default | 0 (disabled) |
| Min | 0 |
| Max | 86400 (24 hours) |
| Context | SUSET (superuser) |
| Restart required | No |
-- Set stuck-watermark timeout to 10 minutes
ALTER SYSTEM SET pg_trickle.watermark_holdback_timeout = 600;
SELECT pg_reload_conf();
NOTIFY payloads:
{"event":"watermark_stuck","group":"order_pipeline","source_oid":16385,"age_secs":620}
{"event":"watermark_resumed","source_oid":16385}
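A monitoring session can subscribe to these events on the pgtrickle_alert channel:

```sql
LISTEN pgtrickle_alert;
-- advancing the stuck watermark (see advance_watermark() above)
-- lifts the pause and emits the watermark_resumed payload
```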
pg_trickle.spill_threshold_blocks
Temp blocks written threshold for spill detection. After each differential
MERGE, pg_trickle queries pg_stat_statements for the temp_blks_written
metric. If the value exceeds this threshold, the refresh is considered a
spill.
After spill_consecutive_limit consecutive spills, the scheduler forces a
FULL refresh for that stream table to prevent repeated expensive
differential merges.
Requires the pg_stat_statements extension to be installed. Set to 0 to
disable spill detection (default).
| Property | Value |
|---|---|
| Type | integer |
| Default | 0 (disabled) |
| Min | 0 |
| Max | 100000000 |
| Context | SUSET (superuser) |
| Restart required | No |
-- Enable spill detection: flag > 1000 temp blocks as a spill
ALTER SYSTEM SET pg_trickle.spill_threshold_blocks = 1000;
SELECT pg_reload_conf();
pg_trickle.spill_consecutive_limit
Number of consecutive spilling differential refreshes before the scheduler automatically forces a FULL refresh. Resets after any non-spilling refresh.
Only effective when spill_threshold_blocks > 0.
| Property | Value |
|---|---|
| Type | integer |
| Default | 3 |
| Min | 1 |
| Max | 100 |
| Context | SUSET (superuser) |
| Restart required | No |
-- Force FULL after 5 consecutive spills (default: 3)
ALTER SYSTEM SET pg_trickle.spill_consecutive_limit = 5;
SELECT pg_reload_conf();
pg_trickle.log_merge_sql
Log the generated MERGE SQL template on every refresh cycle. When enabled,
the MERGE SQL template built during differential refresh is emitted to the
PostgreSQL server log at LOG level.
Intended for debugging MERGE query generation only. Do not enable in production — the output is verbose and includes the full SQL for every refresh.
| Property | Value |
|---|---|
| Type | boolean |
| Default | false |
| Context | SUSET (superuser) |
| Restart required | No |
SET pg_trickle.log_merge_sql = true;
Guardrails & Diagnostics
These GUCs control safety thresholds and diagnostic warnings.
pg_trickle.fuse_default_ceiling
Global default change-count ceiling for the fuse circuit breaker. When a
stream table has fuse_mode = 'on' or 'auto' and no per-ST fuse_ceiling,
this value is used. If pending changes exceed this count, the fuse blows
and the stream table is suspended (status = SUSPENDED).
Set to 0 to disable the global default (per-ST ceilings still apply).
| Property | Value |
|---|---|
| Type | integer |
| Default | 0 (disabled) |
| Range | 0 - 2,000,000,000 |
| Context | SUSET (superuser) |
| Restart required | No |
-- Set global fuse ceiling to 1 million rows
SET pg_trickle.fuse_default_ceiling = 1000000;
pg_trickle.delta_amplification_threshold
Delta amplification detection threshold (output/input ratio). When a
DIFFERENTIAL refresh produces more than this multiple of the input delta
rows, a WARNING is emitted so operators can identify pathological join
fan-out or many-to-many amplification.
Set to 0.0 to disable.
| Property | Value |
|---|---|
| Type | float |
| Default | 0.0 (disabled) |
| Range | 0.0 - 100,000.0 |
| Context | SUSET (superuser) |
| Restart required | No |
-- Warn when delta output is 10x the input
SET pg_trickle.delta_amplification_threshold = 10.0;
pg_trickle.algebraic_drift_reset_cycles
Differential cycles between automatic full recomputes for algebraic
aggregates. After this many differential refresh cycles, stream tables
with algebraic aggregates (AVG, STDDEV, VAR) are automatically
reinitialized to reset accumulated floating-point drift in auxiliary
columns.
Set to 0 to disable automatic resets.
| Property | Value |
|---|---|
| Type | integer |
| Default | 0 (disabled) |
| Range | 0 - 100,000 |
| Context | SUSET (superuser) |
| Restart required | No |
-- Reset algebraic aggregates every 10,000 cycles
SET pg_trickle.algebraic_drift_reset_cycles = 10000;
pg_trickle.agg_diff_cardinality_threshold
Estimated GROUP BY cardinality threshold for algebraic aggregate warnings.
At create_stream_table time, if the defining query uses algebraic
aggregates (SUM, COUNT, AVG) in DIFFERENTIAL mode and the estimated
group cardinality is below this threshold, a WARNING is emitted suggesting
FULL or AUTO mode.
Set to 0 to disable the warning.
| Property | Value |
|---|---|
| Type | integer |
| Default | 0 (disabled) |
| Range | 0 - 100,000,000 |
| Context | SUSET (superuser) |
| Restart required | No |
-- Warn when GROUP BY cardinality is below 100
SET pg_trickle.agg_diff_cardinality_threshold = 100;
Connection Pooler
v0.19.0+ (STAB-1).
pg_trickle.connection_pooler_mode
Cluster-wide connection pooler compatibility override.
| Property | Value |
|---|---|
| Type | string |
| Default | 'off' |
| Valid values | 'off', 'transaction', 'session' |
| Context | SUSET |
| Value | Behaviour |
|---|---|
| off (default) | The per-ST pooler_compatibility_mode setting governs behaviour |
| transaction | Globally disable prepared-statement reuse and suppress NOTIFY emissions (PgBouncer transaction-pool compatibility) |
| session | Explicit opt-in to session mode (same as off today, reserved for future use) |
See Connection Pooler Compatibility for deployment guidance.
-- Enable transaction-mode pooler compatibility globally
SET pg_trickle.connection_pooler_mode = 'transaction';
History & Retention
v0.19.0+ (DB-5).
pg_trickle.history_retention_days
Number of days to retain rows in pgtrickle.pgt_refresh_history.
| Property | Value |
|---|---|
| Type | integer |
| Default | 90 |
| Min | 0 (disabled) |
| Max | 36500 (~100 years) |
| Context | SUSET |
The scheduler runs a daily background cleanup that deletes rows older than
this many days. Set to 0 to disable automatic cleanup (history grows
unbounded — monitor disk usage).
-- Keep 30 days of refresh history
SET pg_trickle.history_retention_days = 30;
Circular Dependencies
v0.7.0+ — Circular dependency support is now fully available for safe monotone cycles in DIFFERENTIAL mode. These settings control whether cycles are allowed at all and how many fixpoint iterations the scheduler will try before surfacing a non-convergence error.
pg_trickle.allow_circular
Master switch for circular (cyclic) stream table dependencies. When false
(default), creating a stream table that would introduce a cycle in the
dependency graph is rejected with a CycleDetected error. When true,
monotone cycles — those containing only safe operators (joins, filters,
projections, UNION ALL, INTERSECT, EXISTS) — are allowed.
Non-monotone operators (Aggregate, EXCEPT, Window functions, NOT EXISTS) always block cycle creation regardless of this setting, because they cannot guarantee convergence to a fixed point.
Default: false
SET pg_trickle.allow_circular = true;
pg_trickle.max_fixpoint_iterations
Maximum number of iterations per strongly connected component (SCC) before the scheduler declares non-convergence and marks all SCC members as ERROR. Prevents runaway loops in circular dependency chains.
For most practical use cases (transitive closure, graph reachability), convergence happens in 2–5 iterations. The default of 100 provides ample headroom.
Default: 100
Range: 1 – 10000
SET pg_trickle.max_fixpoint_iterations = 50;
pg_trickle.self_monitoring_auto_apply
Added in v0.20.0 (DF-G1).
Controls whether the self-monitoring analytics stream tables can automatically adjust stream table configuration.
| Value | Behaviour |
|---|---|
| off (default) | Advisory only — no automatic changes. Dog-feeding stream tables produce analytics that operators and dashboards can read, but nothing is applied automatically. |
| threshold_only | After each 10-minute auto-apply cycle, reads df_threshold_advice. If a recommendation has HIGH confidence and the recommended threshold differs from the current threshold by more than 5%, applies ALTER STREAM TABLE ... SET auto_threshold = <recommended>. Changes are logged with initiated_by = 'SELF_MONITOR'. |
| full | Same as threshold_only, plus applies scheduling hints from df_scheduling_interference (future enhancement). |
Default: off
-- Enable threshold auto-apply.
SET pg_trickle.self_monitoring_auto_apply = 'threshold_only';
-- Check current setting.
SHOW pg_trickle.self_monitoring_auto_apply;
Prerequisites: Dog-feeding stream tables must be created first via
SELECT pgtrickle.setup_self_monitoring(). If the stream tables do not exist,
the auto-apply worker is a no-op.
Rate limiting: At most one threshold change per stream table per 10 minutes.
Audit trail: All auto-apply changes are recorded in pgt_refresh_history
with initiated_by = 'SELF_MONITOR' and a SKIP action describing the old and new
threshold values.
Scheduler Scalability (v0.25.0)
pg_trickle.worker_pool_size
Added in v0.25.0 (SCAL-5).
Number of persistent background workers kept ready in a pool. When > 0,
the scheduler reuses these workers across refresh cycles instead of spawning
a new worker for each job, eliminating the ~2 ms per-worker startup cost.
| Property | Value |
|---|---|
| Type | integer |
| Default | 0 (spawn-per-task) |
| Range | 0 – 64 |
| Context | SUSET (superuser) |
| Restart required | Yes |
-- Keep 4 persistent workers ready at all times
ALTER SYSTEM SET pg_trickle.worker_pool_size = 4;
-- Requires a server restart to take effect (pg_reload_conf() is not sufficient)
Set to 0 to use the original spawn-per-task model (default, no change from
pre-v0.25.0 behavior).
pg_trickle.template_cache_max_entries
Added in v0.25.0 (CACHE-2).
Maximum number of entries in the per-backend L1 delta SQL template cache. When the cache reaches this limit, the least-recently-used entry is evicted.
| Property | Value |
|---|---|
| Type | integer |
| Default | 0 (unbounded) |
| Range | 0 – 65536 |
| Context | SUSET (superuser) |
| Restart required | No |
-- Cap the template cache at 200 entries per backend
SET pg_trickle.template_cache_max_entries = 200;
Operability, Observability & DR (v0.27.0)
pg_trickle.metrics_port
Added in v0.27.0 (OP-2).
TCP port for the Prometheus/OpenMetrics endpoint served by the per-database
background scheduler. When non-zero, GET /metrics returns all pg_trickle
monitoring metrics in Prometheus text format.
| Property | Value |
|---|---|
| Type | integer |
| Default | 0 (disabled) |
| Range | 0 – 65535 |
| Context | SUSET (superuser) |
| Restart required | Yes |
-- Expose metrics on port 9188 (per database)
ALTER DATABASE mydb SET pg_trickle.metrics_port = 9188;
-- Takes effect when the database's scheduler worker restarts (restart required)
Use 0 (the default) to disable the HTTP endpoint. Each database runs its
own scheduler, so the port must be unique per database on the same host.
pg_trickle.metrics_request_timeout_ms
Added in v0.27.0 (METR-2).
Maximum milliseconds the metrics HTTP handler is allowed to run. If a slow HTTP client holds the connection open longer, it is dropped. This protects the scheduler tick loop from being blocked by unresponsive Prometheus scrapers.
| Property | Value |
|---|---|
| Type | integer |
| Default | 5000 (5 seconds) |
| Range | 0 (no timeout) – 600,000 (10 min) |
| Context | SUSET (superuser) |
| Restart required | No |
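For example, to give slow scrapers more headroom:
-- Allow metrics requests to run for up to 10 seconds
SET pg_trickle.metrics_request_timeout_ms = 10000;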
pg_trickle.frontier_holdback_mode
Added in v0.27.0 (issue #536).
Controls how the scheduler prevents silent data loss from long-running transactions. When an uncommitted transaction has written rows to a source table, those change-buffer rows must not be included in a refresh until the transaction commits (or rolls back).
| Property | Value |
|---|---|
| Type | text |
| Default | 'xmin' |
| Values | 'xmin', 'none', 'lsn:<N>' |
| Context | SUSET (superuser) |
| Restart required | No |
| Value | Behaviour |
|---|---|
| 'xmin' | (Default) Probes pg_stat_activity + pg_prepared_xacts once per tick; caps the frontier to exclude rows from uncommitted transactions. |
| 'none' | No holdback — maximum performance but can skip rows from long-lived transactions. Not recommended for production. |
| 'lsn:<N>' | Hold back by exactly N bytes. Debugging use only. |
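For benchmarking only (not recommended for production):
-- Disable holdback entirely (rows from long-lived transactions may be skipped)
SET pg_trickle.frontier_holdback_mode = 'none';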
pg_trickle.frontier_holdback_warn_seconds
Added in v0.27.0 (issue #536).
Emit a WARNING when a holdback-causing transaction has been blocking frontier advancement for longer than this many seconds. The warning fires at most once per minute to avoid log spam. Useful for identifying forgotten long-running transactions.
| Property | Value |
|---|---|
| Type | integer |
| Default | 300 (5 minutes) |
| Range | 0 (disabled) – 3600 |
| Context | SUSET (superuser) |
| Restart required | No |
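For example, to surface blocking transactions sooner:
-- Warn after one minute of blocked frontier advancement
SET pg_trickle.frontier_holdback_warn_seconds = 60;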
pg_trickle.publication_lag_warn_bytes
Added in v0.27.0 (PUB-1).
Emit a WARNING and defer change-buffer truncation when a downstream logical replication subscriber's confirmed WAL position lags behind the current change buffer by more than this many bytes.
| Property | Value |
|---|---|
| Type | integer |
| Default | 0 (disabled) |
| Range | 0 – 2,147,483,647 |
| Context | SUSET (superuser) |
| Restart required | No |
This prevents data loss for subscribers that rely on the change buffer as part of their replication pipeline.
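For example:
-- Warn and defer truncation when a subscriber lags by more than 64 MB
SET pg_trickle.publication_lag_warn_bytes = 67108864;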
pg_trickle.schedule_recommendation_min_samples
Added in v0.27.0 (PLAN-4).
Minimum number of refresh-history observations before
pgtrickle.recommend_schedule() returns a recommendation with non-zero
confidence. Raise this for more reliable recommendations; lower it to get
early guidance on new stream tables.
| Property | Value |
|---|---|
| Type | integer |
| Default | 10 |
| Range | 1 – 1000 |
| Context | SUSET (superuser) |
| Restart required | No |
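For example, to require a larger sample before trusting recommendations:
-- Require 25 refresh-history observations before recommending a schedule
SET pg_trickle.schedule_recommendation_min_samples = 25;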
pg_trickle.schedule_alert_cooldown_seconds
Added in v0.27.0 (PLAN-3).
Minimum seconds between consecutive predicted_sla_breach alerts for the
same stream table. Prevents log spam when the cost model consistently predicts
an imminent SLA violation.
| Property | Value |
|---|---|
| Type | integer |
| Default | 300 (5 minutes) |
| Range | 0 (no cooldown) – 86,400 |
| Context | SUSET (superuser) |
| Restart required | No |
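For example:
-- Alert at most once per hour per stream table
SET pg_trickle.schedule_alert_cooldown_seconds = 3600;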
Transactional Outbox (v0.28.0)
These GUCs control the transactional outbox subsystem. See the SQL Reference
for the enable_outbox(), poll_outbox(), and consumer group functions.
pg_trickle.outbox_enabled
Master enable/disable switch for the outbox subsystem.
| Property | Value |
|---|---|
| Type | boolean |
| Default | true |
| Context | SUSET (superuser) |
| Restart required | No |
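For example:
-- Temporarily disable the outbox subsystem cluster-wide
SET pg_trickle.outbox_enabled = false;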
pg_trickle.outbox_retention_hours
Default retention period (in hours) for outbox rows. Rows older than this
threshold are eligible for the background drain sweep. Can be overridden
per stream table via enable_outbox(retention_hours => N).
| Property | Value |
|---|---|
| Type | integer |
| Default | 24 |
| Range | 1 – 87,600 (10 years) |
| Context | SUSET (superuser) |
| Restart required | No |
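For example (the stream-table argument to enable_outbox is illustrative; see the SQL Reference for the exact signature):
-- Keep outbox rows for one week by default
SET pg_trickle.outbox_retention_hours = 168;
-- Override for a single stream table
SELECT pgtrickle.enable_outbox('active_orders', retention_hours => 48);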
pg_trickle.outbox_drain_batch_size
Number of expired outbox rows deleted in a single background drain pass.
| Property | Value |
|---|---|
| Type | integer |
| Default | 1000 |
| Range | 1 – 1,000,000 |
| Context | SUSET (superuser) |
| Restart required | No |
pg_trickle.outbox_drain_interval_seconds
Seconds between background outbox drain sweeps. Set to 0 to disable
automatic draining (you would then drain manually with outbox_rows_consumed()).
| Property | Value |
|---|---|
| Type | integer |
| Default | 60 |
| Range | 0 (disabled) – 86,400 |
| Context | SUSET (superuser) |
| Restart required | No |
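For example:
-- Drain expired outbox rows every five minutes
SET pg_trickle.outbox_drain_interval_seconds = 300;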
pg_trickle.outbox_inline_threshold_rows
Maximum number of delta rows stored inline in the outbox payload. When a
refresh delta exceeds this count, pg_trickle switches to claim-check mode:
the payload is stored in a separate table and poll_outbox() returns
is_claim_check = true with a NULL payload.
| Property | Value |
|---|---|
| Type | integer |
| Default | 10000 |
| Range | 0 (always inline) – 10,000,000 |
| Context | SUSET (superuser) |
| Restart required | No |
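For example:
-- Switch to claim-check mode for deltas larger than 1,000 rows
SET pg_trickle.outbox_inline_threshold_rows = 1000;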
pg_trickle.outbox_skip_empty_delta
When true, no outbox row is written for refreshes that produce zero inserted
and zero deleted rows. This reduces outbox table growth for frequently scheduled
stream tables with sparse updates.
| Property | Value |
|---|---|
| Type | boolean |
| Default | true |
| Context | SUSET (superuser) |
| Restart required | No |
pg_trickle.outbox_storage_critical_mb
Size threshold (in MB) at which the outbox table is considered critically large. When exceeded, a WARNING is emitted on each refresh cycle.
| Property | Value |
|---|---|
| Type | integer |
| Default | 1024 (1 GB) |
| Range | 1 – 10,000,000 (10 TB) |
| Context | SUSET (superuser) |
| Restart required | No |
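For example:
-- Treat the outbox as critically large at 10 GB
SET pg_trickle.outbox_storage_critical_mb = 10240;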
pg_trickle.outbox_force_retention
When true, outbox rows are kept past their retention_hours expiry until
all consumer groups have committed an offset past them. Prevents consumers
that are temporarily offline from missing messages.
| Property | Value |
|---|---|
| Type | boolean |
| Default | false |
| Context | SUSET (superuser) |
| Restart required | No |
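For example:
-- Never expire rows that a consumer group has not yet consumed
SET pg_trickle.outbox_force_retention = true;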
pg_trickle.consumer_dead_threshold_hours
Hours of silence (no heartbeat) after which a consumer is marked as dead and
eligible for cleanup (when consumer_cleanup_enabled = true).
| Property | Value |
|---|---|
| Type | integer |
| Default | 24 |
| Range | 1 – 87,600 |
| Context | SUSET (superuser) |
| Restart required | No |
pg_trickle.consumer_stale_offset_threshold_days
Days of no offset progress after which a consumer's offset record is considered stale and eligible for cleanup.
| Property | Value |
|---|---|
| Type | integer |
| Default | 7 |
| Range | 1 – 3650 |
| Context | SUSET (superuser) |
| Restart required | No |
pg_trickle.consumer_cleanup_enabled
Enable automatic background cleanup of dead and stale consumer offsets and leases. When disabled, old records must be removed manually.
| Property | Value |
|---|---|
| Type | boolean |
| Default | true |
| Context | SUSET (superuser) |
| Restart required | No |
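For example, to reap dead consumers more aggressively:
-- Mark consumers dead after 6 hours of silence and let cleanup remove them
SET pg_trickle.consumer_dead_threshold_hours = 6;
SET pg_trickle.consumer_cleanup_enabled = true;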
Transactional Inbox (v0.28.0)
These GUCs control the transactional inbox subsystem. See the SQL Reference
for create_inbox() and related functions.
pg_trickle.inbox_enabled
Master enable/disable switch for the inbox subsystem.
| Property | Value |
|---|---|
| Type | boolean |
| Default | true |
| Context | SUSET (superuser) |
| Restart required | No |
pg_trickle.inbox_processed_retention_hours
Retention period (in hours) for successfully processed inbox messages
(processed_at IS NOT NULL). Rows older than this threshold are deleted
by the background drain sweep.
| Property | Value |
|---|---|
| Type | integer |
| Default | 72 (3 days) |
| Range | 1 – 87,600 |
| Context | SUSET (superuser) |
| Restart required | No |
pg_trickle.inbox_dlq_retention_hours
Retention period (in hours) for dead-letter queue rows. Set to 0 to
keep DLQ rows indefinitely (useful for forensics and manual replay).
| Property | Value |
|---|---|
| Type | integer |
| Default | 0 (keep forever) |
| Range | 0 (keep forever) – 87,600 |
| Context | SUSET (superuser) |
| Restart required | No |
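For example:
-- Purge dead-letter rows after 30 days instead of keeping them forever
SET pg_trickle.inbox_dlq_retention_hours = 720;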
pg_trickle.inbox_drain_batch_size
Number of expired inbox messages deleted in a single background drain pass.
| Property | Value |
|---|---|
| Type | integer |
| Default | 1000 |
| Range | 1 – 1,000,000 |
| Context | SUSET (superuser) |
| Restart required | No |
pg_trickle.inbox_drain_interval_seconds
Seconds between inbox background drain sweeps. Set to 0 to disable
automatic draining.
| Property | Value |
|---|---|
| Type | integer |
| Default | 60 |
| Range | 0 (disabled) – 86,400 |
| Context | SUSET (superuser) |
| Restart required | No |
pg_trickle.inbox_dlq_alert_max_per_refresh
Maximum number of DLQ alert events raised per refresh cycle. Limits log
volume when many messages are failing simultaneously. Set to 0 to disable
DLQ alerting.
| Property | Value |
|---|---|
| Type | integer |
| Default | 10 |
| Range | 0 (disabled) – 100 |
| Context | SUSET (superuser) |
| Restart required | No |
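For example:
-- Raise at most 3 DLQ alert events per refresh cycle
SET pg_trickle.inbox_dlq_alert_max_per_refresh = 3;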
Pre-GA Correctness & Stability (v0.30.0)
pg_trickle.use_sqlstate_classification
Added in v0.30.0 (SCAL-1).
When true, the retry classification for SPI errors uses the five-character
PostgreSQL SQLSTATE class rather than message-text pattern matching. This
makes retry decisions locale-independent (safe with lc_messages=fr_FR or
other non-English locales).
| Property | Value |
|---|---|
| Type | boolean |
| Default | false |
| Context | SUSET (superuser) |
| Restart required | No |
-- Enable locale-safe SQLSTATE-based retry classification
ALTER SYSTEM SET pg_trickle.use_sqlstate_classification = true;
SELECT pg_reload_conf();
Set to true in any deployment where lc_messages is not en_US.UTF-8, or
in mixed-locale clusters.
pg_trickle.template_cache_max_age_hours
Added in v0.30.0 (STAB-3).
Maximum age (in hours) for entries in the L2 catalog-backed template cache
(pgtrickle.pgt_template_cache). Entries older than this limit are purged
by the background scheduler on each tick. Keeping the cache fresh ensures
stale delta SQL templates do not persist after a stream table schema change.
| Property | Value |
|---|---|
| Type | integer |
| Default | 168 (7 days) |
| Range | 1 – 8760 (1 year) |
| Context | SUSET (superuser) |
| Restart required | No |
-- Purge L2 cache entries older than 24 hours
ALTER SYSTEM SET pg_trickle.template_cache_max_age_hours = 24;
SELECT pg_reload_conf();
Lower values reduce the risk of stale templates surviving a schema change. Higher values improve performance by retaining warm cache entries across long maintenance windows.
pg_trickle.max_parse_nodes
Added in v0.30.0 (PERF-2).
Maximum estimated number of parse-tree nodes allowed in a stream table
defining query. When > 0, queries whose estimated node count exceeds this
limit are rejected with a QueryTooComplex error before the OpTree builder
allocates memory. This guards against pathological queries such as
WHERE id IN (1, 2, …, 1,000,000) that would otherwise exhaust per-backend
memory during delta SQL generation.
| Property | Value |
|---|---|
| Type | integer |
| Default | 0 (disabled) |
| Range | 0 (disabled) – 10,000,000 |
| Context | SUSET (superuser) |
| Restart required | No |
-- Reject defining queries with more than 100,000 estimated nodes
ALTER SYSTEM SET pg_trickle.max_parse_nodes = 100000;
SELECT pg_reload_conf();
Node count is estimated conservatively as len(query) / 4. The default of
0 disables the check for backward compatibility. A value of 100,000 is
recommended for production deployments.
Citus Distributed Tables (v0.32.0+)
Configuration for Citus distributed-table CDC and stream table support. See docs/integrations/citus.md for the full setup guide.
pg_trickle.citus_st_lock_lease_ms
Duration in milliseconds of the pgtrickle.pgt_st_locks lease used for
cross-node refresh coordination in Citus clusters.
The lease prevents two coordinator nodes from applying changes to the same
distributed stream table simultaneously. When pg_ripple is deployed alongside
pg_trickle, set this value to be ≥ pg_ripple.merge_fence_timeout_ms to
prevent the lease from expiring mid-merge.
| Property | Value |
|---|---|
| Type | integer |
| Default | 60000 (60 seconds) |
| Range | 0 (disabled) – 600,000 ms (10 minutes) |
| Context | SUSET (superuser) |
| Restart required | No |
| Added in | v0.33.0 (COORD-2) |
-- Align with a 30-second pg_ripple merge fence
ALTER SYSTEM SET pg_trickle.citus_st_lock_lease_ms = 45000;
SELECT pg_reload_conf();
pg_trickle.citus_worker_retry_ticks
Number of consecutive per-worker poll failures before the scheduler emits a
WARNING in the PostgreSQL log and flags the worker in pgtrickle.citus_status.
The warning is for operator attention — healthy workers continue refreshing
uninterrupted while a failed worker is skipped.
Set to 0 to disable the alerting entirely (failures are still logged at
LOG level per tick).
| Property | Value |
|---|---|
| Type | integer |
| Default | 5 |
| Range | 0 (disabled) – 100 |
| Context | SUSET (superuser) |
| Restart required | No |
| Added in | v0.34.0 (COORD-15) |
-- Alert after 3 consecutive failures instead of 5
ALTER SYSTEM SET pg_trickle.citus_worker_retry_ticks = 3;
SELECT pg_reload_conf();
-- Disable alerting (failures logged at LOG level only)
ALTER SYSTEM SET pg_trickle.citus_worker_retry_ticks = 0;
SELECT pg_reload_conf();
WAL Backpressure & Logging (v0.36.0)
pg_trickle.enforce_backpressure
When true, CDC trigger writes are suppressed when the WAL replication slot
lag exceeds pg_trickle.slot_lag_critical_threshold_mb. Writes resume once
the lag drops below 50% of the threshold (hysteresis).
When false (default), pg_trickle only emits WARNING log messages when slot
lag is critical but does not suppress writes. Use enforce_backpressure = true only when disk exhaustion is an immediate risk.
Important:
enforce_backpressure = true operates in discard mode — changes that arrive while backpressure is active are dropped from the CDC buffer. Stream tables must be reinitialized after backpressure clears to recover from the data gap. See also pg_trickle.cdc_capture_mode.
| Property | Value |
|---|---|
| Type | boolean |
| Default | false |
| Context | SUSET (superuser) |
| Restart required | No |
| Added in | v0.36.0 (A12) |
-- Enable backpressure suppression (discard mode)
ALTER SYSTEM SET pg_trickle.enforce_backpressure = true;
SELECT pg_reload_conf();
-- After clearing: reinitialize affected stream tables
SELECT pgtrickle.refresh_stream_table('my_stream', 'FULL');
pg_trickle.log_format
Log format for pg_trickle structured log events.
"text"(default): Unstructured human-readable messages."json": Structured JSON with fieldsevent,pgt_id,cycle_id,duration_ms,refresh_reason,error_code. Suitable for log aggregation pipelines (Loki, OpenTelemetry).
| Property | Value |
|---|---|
| Type | string |
| Default | text |
| Valid values | text, json |
| Context | SUSET (superuser) |
| Restart required | No |
| Added in | v0.36.0 (A20) |
-- Switch to JSON structured logging
ALTER SYSTEM SET pg_trickle.log_format = 'json';
SELECT pg_reload_conf();
pgVectorMV & OpenTelemetry (v0.37.0)
pg_trickle.enable_vector_agg
When true, avg(vector_col) and sum(vector_col) in stream table defining
queries are handled by the DVM engine using incremental aggregate operators for
vector, halfvec, and sparsevec types. Requires the pgvector extension.
| Property | Value |
|---|---|
| Type | boolean |
| Default | false |
| Context | SUSET (superuser) |
| Restart required | No |
| Added in | v0.37.0 (F4) |
ALTER SYSTEM SET pg_trickle.enable_vector_agg = true;
SELECT pg_reload_conf();
pg_trickle.enable_trace_propagation
When true, pg_trickle reads pg_trickle.trace_id from the session GUC at
CDC capture time and stores the W3C traceparent in the change buffer column
__pgt_trace_context. At refresh time, spans are exported via OTLP/gRPC to
pg_trickle.otel_endpoint.
| Property | Value |
|---|---|
| Type | boolean |
| Default | false |
| Context | SUSET (superuser) |
| Restart required | No |
| Added in | v0.37.0 (F10) |
pg_trickle.otel_endpoint
OTLP/gRPC endpoint for OpenTelemetry span export. Empty string (default) disables span export.
| Property | Value |
|---|---|
| Type | string |
| Default | '' (disabled) |
| Example | 'http://jaeger:4317' |
| Context | SUSET (superuser) |
| Restart required | No |
| Added in | v0.37.0 (F10) |
-- Export spans to a local Jaeger instance
ALTER SYSTEM SET pg_trickle.otel_endpoint = 'http://localhost:4317';
ALTER SYSTEM SET pg_trickle.enable_trace_propagation = true;
SELECT pg_reload_conf();
pg_trickle.trace_id
Session-level W3C traceparent header for trace context propagation. Set this
in the application session before DML so CDC capture links the changes to the
initiating trace. Requires enable_trace_propagation = true.
| Property | Value |
|---|---|
| Type | string |
| Default | '' |
| Context | USERSET (any user) |
| Restart required | No |
| Added in | v0.37.0 (F10) |
-- Propagate a trace across CDC capture
SET pg_trickle.trace_id = '00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01';
INSERT INTO orders VALUES (42, 'widget', 5);
Operational Truthfulness (v0.39.0)
pg_trickle.cdc_capture_mode
Controls what happens to CDC writes when pg_trickle.cdc_paused = on.
"discard"(default): CDC trigger bodies returnNULL; changes arriving while paused are dropped. Stream tables must be reinitialized after un-pausing to recover from the data gap."hold": Reserved for a future durable capture-and-hold mode. Currently emits aWARNINGand falls back to"discard".
Operator checklist: Before setting cdc_paused = on, check pgtrickle.cdc_pause_status() to confirm the active mode. After un-pausing in discard mode, run pgtrickle.refresh_stream_table('<name>') with FULL mode for each affected stream table.
| Property | Value |
|---|---|
| Type | string |
| Default | discard |
| Valid values | discard, hold (reserved) |
| Context | SUSET (superuser) |
| Restart required | No |
| Added in | v0.39.0 (O39-8) |
-- Check the current CDC pause status
SELECT * FROM pgtrickle.cdc_pause_status();
-- Pause CDC (discard mode — changes arriving now are DROPPED)
ALTER SYSTEM SET pg_trickle.cdc_paused = on;
SELECT pg_reload_conf();
-- After maintenance, un-pause and reinitialize affected tables
ALTER SYSTEM SET pg_trickle.cdc_paused = off;
SELECT pg_reload_conf();
-- Full refresh to recover from the gap:
SELECT pgtrickle.refresh_stream_table('public.my_stream_table', 'FULL');
GUC Interaction Matrix
Some GUC variables interact with or depend on each other. The table below documents these cross-dependencies to help avoid misconfiguration.
| GUC A | GUC B | Interaction |
|---|---|---|
| auto_backoff | min_schedule_seconds | auto_backoff stretches the effective interval up to 8× the configured schedule, but never below min_schedule_seconds. If min_schedule_seconds is high, backoff has limited room to operate. |
| auto_backoff | default_schedule_seconds | The backoff multiplier is applied to default_schedule_seconds (or the per-ST override); raising this value gives backoff a wider range. |
| parallel_refresh_mode | max_concurrent_refreshes | parallel_refresh_mode = 'on' dispatches independent STs to parallel workers, up to max_concurrent_refreshes per database. Setting max_concurrent_refreshes = 1 effectively disables parallelism even when the mode is 'on'. |
| parallel_refresh_mode | max_dynamic_refresh_workers | max_dynamic_refresh_workers is a cluster-wide cap across all databases. If you have 4 databases each wanting 4 concurrent refreshes, set this to ≥16 (or accept queuing). |
| max_dynamic_refresh_workers | per_database_worker_quota | When per_database_worker_quota > 0, each database claims at most that many workers from the shared max_dynamic_refresh_workers pool. Set per_database_worker_quota to max_dynamic_refresh_workers / n_databases for equal sharing. Burst to 150% is allowed when the cluster is < 80% loaded. |
| differential_max_change_ratio | fuse_default_ceiling | Both guard against large change batches but at different levels: differential_max_change_ratio triggers a FULL refresh fallback (proportional to table size), while fuse_default_ceiling halts refresh entirely (absolute row count). The fuse fires first if the change count exceeds it, regardless of the ratio. |
| block_source_ddl | DDL operations | When true, DDL on source tables (ALTER TABLE, DROP COLUMN) is blocked by an event trigger. Disable temporarily with SET pg_trickle.block_source_ddl = false before schema migrations, then re-enable. |
| cdc_mode | cdc_trigger_mode | cdc_trigger_mode ('statement' / 'row') only applies when CDC is trigger-based. When cdc_mode = 'wal' (or after auto-transition to WAL), cdc_trigger_mode is irrelevant. |
| cdc_mode | wal_transition_timeout | wal_transition_timeout only applies when cdc_mode = 'auto'. It controls how many seconds to wait for the first WAL-based refresh to succeed before falling back to triggers. |
| cleanup_use_truncate | compact_threshold | cleanup_use_truncate = true uses TRUNCATE to clear consumed change buffers (fastest, acquires AccessExclusiveLock briefly). compact_threshold controls when fully-consumed buffers are compacted via DELETE — only relevant when TRUNCATE is disabled. |
| buffer_partitioning | compact_threshold | In 'auto' mode, compact_threshold serves as the promotion trigger: if a buffer exceeds this many rows in a single refresh cycle, it is promoted to RANGE(lsn) partitioned mode. Lowering compact_threshold makes auto-promotion more sensitive. |
| allow_circular | max_fixpoint_iterations | max_fixpoint_iterations is only evaluated when allow_circular = true. It caps the number of convergence iterations for circular dependency chains. |
| ivm_topk_max_limit | TopK queries | Queries with LIMIT > ivm_topk_max_limit fall back to FULL refresh instead of the optimized TopK path. Raise this if you have legitimate large TopK queries. |
| ivm_recursive_max_depth | Recursive CTEs | Recursive expansion beyond ivm_recursive_max_depth iterations is terminated with a warning and falls back to FULL refresh. Set to 0 to disable the guard (not recommended). |
Tuning Profiles
Three named profiles for common deployment patterns. Copy the relevant
settings into your postgresql.conf and adjust to taste.
Low-Latency Profile
Goal: Minimize end-to-end latency from base table write to stream table update. Best for dashboards, real-time analytics, and operational monitoring.
# Fast scheduling (polling-based, sub-200ms median latency)
pg_trickle.scheduler_interval_ms = 200 # poll interval
pg_trickle.min_schedule_seconds = 1
pg_trickle.default_schedule_seconds = 1
# Parallel refresh for independent STs
pg_trickle.parallel_refresh_mode = 'on'
pg_trickle.max_concurrent_refreshes = 4
# Lean merge
pg_trickle.merge_planner_hints = true
pg_trickle.merge_work_mem_mb = 128 # more memory = fewer disk sorts
pg_trickle.cleanup_use_truncate = true
pg_trickle.use_prepared_statements = true
# Guardrails
pg_trickle.auto_backoff = true # prevent CPU runaway
pg_trickle.fuse_default_ceiling = 0 # disabled — latency over safety
pg_trickle.block_source_ddl = true
High-Throughput Profile
Goal: Maximize rows-per-second processed across many stream tables under heavy write load. Accepts slightly higher latency in exchange for better batching and resource efficiency.
# Batched scheduling
pg_trickle.scheduler_interval_ms = 2000 # 2-second poll interval
pg_trickle.min_schedule_seconds = 2
pg_trickle.default_schedule_seconds = 5
# Heavy parallelism
pg_trickle.parallel_refresh_mode = 'on'
pg_trickle.max_concurrent_refreshes = 8
pg_trickle.max_dynamic_refresh_workers = 8
# Aggressive performance
pg_trickle.merge_planner_hints = true
pg_trickle.merge_work_mem_mb = 256 # large work_mem for big deltas
pg_trickle.merge_seqscan_threshold = 0.01 # allow seq scans for >1% changes
pg_trickle.cleanup_use_truncate = true
pg_trickle.use_prepared_statements = true
pg_trickle.auto_backoff = true
pg_trickle.buffer_partitioning = 'auto' # O(1) cleanup for hot buffers
# Safety for bulk workloads
pg_trickle.fuse_default_ceiling = 500000 # pause on >500K changes
pg_trickle.differential_max_change_ratio = 0.25 # FULL fallback at 25%
pg_trickle.block_source_ddl = true
Resource-Constrained Profile
Goal: Minimize CPU and memory footprint for small instances, shared hosting, or development environments. Accepts higher latency and slower throughput.
# Poll-based scheduling (conservative)
pg_trickle.scheduler_interval_ms = 5000 # 5-second poll
# Conservative scheduling
pg_trickle.min_schedule_seconds = 5
pg_trickle.default_schedule_seconds = 10
# Minimal parallelism
pg_trickle.parallel_refresh_mode = 'off' # single-threaded refresh
pg_trickle.max_concurrent_refreshes = 1
pg_trickle.max_dynamic_refresh_workers = 1
# Conservative memory
pg_trickle.merge_work_mem_mb = 32
pg_trickle.merge_planner_hints = true
pg_trickle.cleanup_use_truncate = true
# Tight guardrails
pg_trickle.auto_backoff = true
pg_trickle.fuse_default_ceiling = 100000
pg_trickle.differential_max_change_ratio = 0.10
pg_trickle.block_source_ddl = true
pg_trickle.buffer_alert_threshold = 500000
Complete postgresql.conf Example
# Required
shared_preload_libraries = 'pg_trickle'
# Essential
pg_trickle.enabled = true
pg_trickle.cdc_mode = 'auto'
pg_trickle.scheduler_interval_ms = 1000
pg_trickle.min_schedule_seconds = 1
pg_trickle.default_schedule_seconds = 1
pg_trickle.max_consecutive_errors = 3
# WAL CDC
pg_trickle.wal_transition_timeout = 300
pg_trickle.slot_lag_warning_threshold_mb = 100
pg_trickle.slot_lag_critical_threshold_mb = 1024
# Refresh performance
pg_trickle.differential_max_change_ratio = 0.15
pg_trickle.merge_planner_hints = true
pg_trickle.merge_work_mem_mb = 64
pg_trickle.cleanup_use_truncate = true
pg_trickle.use_prepared_statements = true
pg_trickle.user_triggers = 'auto'
# Guardrails & limits
pg_trickle.block_source_ddl = false
pg_trickle.buffer_alert_threshold = 1000000
pg_trickle.compact_threshold = 100000
pg_trickle.buffer_partitioning = 'off'
pg_trickle.max_grouping_set_branches = 64
pg_trickle.max_parse_depth = 64
pg_trickle.ivm_topk_max_limit = 1000
pg_trickle.ivm_recursive_max_depth = 100
# Circular dependencies (v0.7.0+)
pg_trickle.allow_circular = false # master switch
pg_trickle.max_fixpoint_iterations = 100 # convergence limit
# Parallel refresh (v0.11.0+, default 'on')
pg_trickle.parallel_refresh_mode = 'on' # 'off' | 'dry_run' | 'on'
pg_trickle.max_dynamic_refresh_workers = 4 # cluster-wide worker cap
pg_trickle.max_concurrent_refreshes = 4 # per-database dispatch cap
pg_trickle.max_parallel_workers = 0 # user-facing parallel cap (0 = use automatic sizing)
# Predictive cost model (v0.22.0+)
pg_trickle.prediction_window = 60 # minutes of history for regression
pg_trickle.prediction_ratio = 1.5 # diff/full cost ratio threshold
pg_trickle.prediction_min_samples = 5 # minimum samples before model activates
# DVM scaling & diagnostics (v0.23.0+)
pg_trickle.log_delta_sql = false # log delta SQL at DEBUG1 (diagnostic only)
pg_trickle.delta_work_mem = 0 # work_mem MB for delta execution (0 = inherit)
pg_trickle.delta_enable_nestloop = true # allow nested-loop joins in delta SQL
pg_trickle.analyze_before_delta = true # ANALYZE change buffers before delta SQL
pg_trickle.max_change_buffer_alert_rows = 0 # change buffer overflow alert threshold (0 = off)
pg_trickle.diff_output_format = 'split' # 'split' (DI-2 pairs) | 'merged' (compat)
# Scheduler scalability (v0.25.0+)
pg_trickle.worker_pool_size = 0 # 0 = spawn-per-task; >0 = persistent pool
pg_trickle.template_cache_max_entries = 0 # 0 = unbounded
# Operability & observability (v0.27.0+)
pg_trickle.metrics_port = 0 # 0 = disabled; set per-database
pg_trickle.frontier_holdback_mode = 'xmin' # xmin | none | lsn:<N>
pg_trickle.frontier_holdback_warn_seconds = 300 # warn after 5 min of blocked frontier
pg_trickle.publication_lag_warn_bytes = 0 # 0 = disabled
# Transactional outbox (v0.28.0+)
pg_trickle.outbox_enabled = true
pg_trickle.outbox_retention_hours = 24
pg_trickle.outbox_inline_threshold_rows = 10000
pg_trickle.outbox_skip_empty_delta = true
pg_trickle.outbox_force_retention = false
pg_trickle.consumer_dead_threshold_hours = 24
pg_trickle.consumer_cleanup_enabled = true
# Transactional inbox (v0.28.0+)
pg_trickle.inbox_enabled = true
pg_trickle.inbox_processed_retention_hours = 72
pg_trickle.inbox_dlq_retention_hours = 0 # 0 = keep forever
pg_trickle.inbox_dlq_alert_max_per_refresh = 10
# Citus distributed tables (v0.32.0+)
pg_trickle.citus_st_lock_lease_ms = 60000 # lease duration for cross-node coordination
pg_trickle.citus_worker_retry_ticks = 5 # failures before WARNING in citus_status
# Advanced / internal
pg_trickle.change_buffer_schema = 'pgtrickle_changes'
pg_trickle.foreign_table_polling = false
Runtime Configuration
All GUC variables can be changed at runtime by a superuser:
-- View current settings
SHOW pg_trickle.enabled;
SHOW pg_trickle.parallel_refresh_mode;
-- Enable parallel refresh for current session
SET pg_trickle.parallel_refresh_mode = 'on';
-- Change persistently (requires reload)
ALTER SYSTEM SET pg_trickle.scheduler_interval_ms = 500;
SELECT pg_reload_conf();
Further Reading
- Installation — Installation and initial configuration
- ARCHITECTURE.md — System architecture overview
- SQL_REFERENCE.md — Complete function reference
Appendix: Deprecated / Compatibility GUCs
The GUCs listed below are deprecated and retained only for backward compatibility (to allow rolling upgrades without configuration errors). They have no effect on extension behaviour and should be removed from new deployments.
pg_trickle.event_driven_wake
⚠️ Removed in v0.51.0 — This GUC has been fully removed from the extension. If it appears in postgresql.conf after upgrading to v0.51.0, PostgreSQL will emit an "unrecognized configuration parameter" warning at startup. Remove it to suppress the warning.
Migration: Remove pg_trickle.event_driven_wake from postgresql.conf and any ALTER SYSTEM settings. No replacement is needed — the scheduler always uses efficient latch-based polling.
pg_trickle.wake_debounce_ms
⚠️ Removed in v0.51.0 — This GUC has been fully removed together with event_driven_wake. Remove it from postgresql.conf to avoid an "unrecognized configuration parameter" warning at startup.
Migration: Remove pg_trickle.wake_debounce_ms from postgresql.conf. No replacement is needed.
pg_trickle.merge_planner_hints
⚠️ Deprecated — accepted for backwards compatibility but has no effect. Will be removed in a future major version.
pg_trickle.user_triggers (value 'on')
⚠️ Deprecated — The value 'on' is accepted as a deprecated alias for 'auto' and has no distinct behaviour. Use 'auto' (the default) or 'off' instead. The 'on' alias will be removed in a future major version.
Predictive Cost Model
pg_trickle's AUTO refresh mode does not just toggle between FULL and DIFFERENTIAL at hand-tuned thresholds. It runs a predictive cost model that estimates the expected cost of each mode for the next refresh, given the current change ratio and historical runtimes, and picks the cheaper one.
This page explains how the model works, the levers you can pull, and when it is safe to ignore the model and pin a mode by hand.
Why a cost model?
Differential refresh is dramatically faster than FULL refresh — when the change ratio is small. As the change ratio grows, the delta overhead (computing ΔQ, scanning change buffers, planning the MERGE) starts to dominate, and at some point a full recomputation wins.
A static threshold ("switch at 50%") is a reasonable default, but it is wrong in either direction for many real queries:
- Aggregates with a few groups recompute trivially in FULL — DIFF has to do work for nothing.
- Wide joins with selective filters benefit from DIFF even at very high change ratios.
- A query whose source has just doubled in size will see different trade-offs from yesterday's plan.
The cost model uses measured last_full_ms and last_diff_ms
together with the current change ratio to make a per-refresh
decision.
Inputs the model uses
For each stream table, on each scheduler tick:
| Input | Source |
|---|---|
| change_ratio_current | pending_changes / source_row_count |
| last_full_ms | most recent full refresh duration |
| last_diff_ms | most recent differential refresh duration |
| pending_rows | size of the change buffer |
| delta_amplification_factor | learned multiplier: estimated delta volume given pending changes |
| cost_model_safety_margin (GUC) | bias toward FULL or DIFF |
It then computes:
predicted_diff_ms = base_diff_overhead
+ per_row_diff_cost × pending_rows × delta_amplification_factor
predicted_full_ms = last_full_ms × source_growth_factor
AUTO chooses the cheaper one (after applying the safety margin).
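The decision logic can be sketched in a few lines of Python. This is an illustration only — the constant names (base_diff_overhead_ms, per_row_diff_cost_ms) and default values are assumptions for exposition; the extension fits its real coefficients from refresh history:

```python
# Illustrative sketch of the AUTO-mode decision, following the two
# prediction formulas above. All constants here are invented defaults;
# pg_trickle derives the real ones from measured refresh durations.

def choose_refresh_mode(pending_rows, last_full_ms,
                        change_ratio=0.0,
                        base_diff_overhead_ms=5.0,
                        per_row_diff_cost_ms=0.002,
                        delta_amplification_factor=1.0,
                        source_growth_factor=1.0,
                        safety_margin=1.2,
                        max_change_ratio=0.50):
    # Hard cap: never pick DIFFERENTIAL above the configured ratio.
    if change_ratio > max_change_ratio:
        return "FULL"
    predicted_diff_ms = (base_diff_overhead_ms
                         + per_row_diff_cost_ms * pending_rows
                           * delta_amplification_factor)
    predicted_full_ms = last_full_ms * source_growth_factor
    # The safety margin (> 1.0) biases toward FULL when DIFF is uncertain.
    if predicted_diff_ms * safety_margin < predicted_full_ms:
        return "DIFFERENTIAL"
    return "FULL"

# A small delta against an expensive full refresh picks DIFFERENTIAL;
# a huge delta or a high change ratio falls back to FULL.
print(choose_refresh_mode(pending_rows=1_000, last_full_ms=800,
                          change_ratio=0.01))       # DIFFERENTIAL
print(choose_refresh_mode(pending_rows=5_000_000, last_full_ms=800,
                          change_ratio=0.6))        # FULL (ratio cap)
```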
Inspect what the model would do
-- Recommendation and reasoning for a single stream table
SELECT * FROM pgtrickle.recommend_refresh_mode('order_totals');
-- recommended_mode | reason | composite_score
-- DIFFERENTIAL | change ratio 0.018, est. 22 ms | 0.31
-- Rolling efficiency: refresh durations vs source-table size
SELECT * FROM pgtrickle.refresh_efficiency('order_totals');
-- Or for the whole catalogue
SELECT pgt_name, recommended_mode, change_ratio, est_diff_ms, est_full_ms
FROM pgtrickle.recommend_refresh_mode_all();
Tuning levers
| GUC | Default | Effect |
|---|---|---|
| pg_trickle.cost_model_safety_margin | 1.20 | Multiplier on the predicted DIFF cost. > 1.0 biases toward FULL. |
| pg_trickle.differential_max_change_ratio | 0.50 | Hard cap: never pick DIFF above this ratio. |
| pg_trickle.adaptive_full_threshold | 0.50 | Force FULL when the change ratio exceeds this (legacy fallback). |
| pg_trickle.delta_amplification_threshold | 5.0 | When delta_volume / pending_changes > T, prefer FULL. |
| pg_trickle.max_delta_estimate_rows | 10000000 | Cap on the model's estimate; above this, prefer FULL. |
| pg_trickle.planner_aggressive | off | Allow the model to override per-table refresh-mode hints. |
A typical "trust the model" setup:
ALTER SYSTEM SET pg_trickle.cost_model_safety_margin = '1.0';
ALTER SYSTEM SET pg_trickle.planner_aggressive = 'on';
SELECT pg_reload_conf();
A typical "be conservative" setup (prefer FULL on uncertainty):
ALTER SYSTEM SET pg_trickle.cost_model_safety_margin = '1.5';
ALTER SYSTEM SET pg_trickle.differential_max_change_ratio = '0.30';
SELECT pg_reload_conf();
When to override the model
There are good reasons to pin a mode manually:
- Queries the model can't see well. Holistic aggregates (PERCENTILE_*, STRING_AGG) often plan badly under DIFF; pin to FULL.
- Workloads with large but cheap full refreshes. Tiny aggregates that re-aggregate from a small source — the FULL plan is so cheap that the model's DIFF estimate cannot beat it.
- Predictable bursts. If you know that 9–10 a.m. is your upload window, pre-emptively ALTER STREAM TABLE … SET refresh_mode = 'FULL' for that window.
Pin a mode with:
SELECT pgtrickle.alter_stream_table('order_totals', refresh_mode => 'FULL');
The model will not override a manually-pinned mode unless
planner_aggressive = on.
Verifying the model in production
-- Compare refresh durations by mode over the last hour
WITH recent AS (
SELECT pgt_name, refresh_mode, started_at, finished_at,
EXTRACT(epoch FROM (finished_at - started_at)) * 1000 AS ms
FROM pgtrickle.pgt_refresh_history
WHERE started_at > now() - interval '1 hour'
)
SELECT pgt_name,
refresh_mode,
AVG(ms) AS avg_ms,
COUNT(*) AS n
FROM recent
GROUP BY pgt_name, refresh_mode
ORDER BY pgt_name;
If a stream table is consistently slow in AUTO mode, look at the
distribution of modes chosen. Often the answer is "the model is
flapping" — increase cost_model_safety_margin or pin a mode.
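One way to quantify flapping is to count mode switches in the recent history. The helper below is hypothetical — it is not part of the extension — but shows the kind of signal to look for in pgtrickle.pgt_refresh_history:

```python
# Count how often the chosen refresh mode flips between consecutive
# refreshes. A high flip ratio means the model is oscillating near its
# break-even point: raise cost_model_safety_margin or pin a mode.

def flap_ratio(modes):
    """modes: chronological list of 'FULL' / 'DIFFERENTIAL' strings."""
    if len(modes) < 2:
        return 0.0
    flips = sum(1 for a, b in zip(modes, modes[1:]) if a != b)
    return flips / (len(modes) - 1)

# Stable history: no flapping.
print(flap_ratio(["DIFFERENTIAL"] * 10))          # 0.0
# Alternating history: the model flips on every refresh.
print(flap_ratio(["FULL", "DIFFERENTIAL"] * 5))   # 1.0
```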
See also: Performance Cookbook · tuning-refresh-mode · Configuration – Refresh Performance · SQL Reference – Diagnostics
Pre-Deployment Checklist
Complete this checklist before deploying pg_trickle to a new environment. Each item links to the relevant documentation for details.
Version: v0.14.0+. Earlier versions may have different requirements.
1. PostgreSQL Version
- PostgreSQL 18.x is required (pg_trickle is compiled against PG 18)
- Extension binary matches your exact PostgreSQL major version
SELECT version(); -- Must show PostgreSQL 18.x
2. shared_preload_libraries
pg_trickle must be loaded at server startup via shared_preload_libraries.
Without this, GUC variables and the background scheduler are not available.
# postgresql.conf
shared_preload_libraries = 'pg_trickle'
- shared_preload_libraries includes pg_trickle
- PostgreSQL has been restarted after changing this setting (reload is not sufficient)
SHOW shared_preload_libraries; -- Must include pg_trickle
Managed PostgreSQL: Some providers (Supabase, Neon) do not support custom
shared_preload_libraries. Check your provider's extension compatibility list. AWS RDS and Google Cloud SQL support custom shared libraries via parameter groups.
3. WAL Configuration (Optional but Recommended)
pg_trickle works without wal_level = logical — it uses trigger-based
CDC by default. However, WAL-based CDC provides lower overhead on
write-heavy workloads.
# postgresql.conf (optional — for WAL-based CDC)
wal_level = logical
max_replication_slots = 10 # At least 1 per tracked source table
- Decide: trigger-based CDC (default) or WAL-based CDC
- If WAL: wal_level = logical and server restarted
- If WAL: max_replication_slots is sufficient for your source table count
Note: CDC mode is configurable per stream table. The default cdc_mode = 'auto' starts with triggers and transitions to WAL automatically when wal_level = logical is detected. See CONFIGURATION.md for details.
4. Extension Installation
CREATE EXTENSION pg_trickle;
-- Verify installation
SELECT extname, extversion FROM pg_extension WHERE extname = 'pg_trickle';
- Extension created successfully
- Version matches expected release
5. Background Scheduler
The scheduler runs as a background worker and manages automatic refresh. Verify it's running:
SELECT pid, backend_type, state
FROM pg_stat_activity
WHERE backend_type = 'pg_trickle scheduler';
- Scheduler process is visible in pg_stat_activity
- pg_trickle.enabled = true (default; set to false to disable)
6. Connection Pooler Compatibility
PgBouncer (Transaction Mode)
PgBouncer in transaction pooling mode drops session state between transactions. pg_trickle needs special handling:
- Enable pooler_compatibility_mode on affected stream tables:
SELECT pgtrickle.alter_stream_table('my_st',
pooler_compatibility_mode => true);
- Or set it globally via the GUC: pg_trickle.pooler_compatibility_mode = true
PgBouncer (Session Mode)
Session mode preserves session state — no special configuration needed.
Supavisor / Other Poolers
Some poolers (Supavisor, pgcat) have their own compatibility
characteristics. Test with pgtrickle.validate_query() before deploying.
7. Recommended GUC Starting Values
These are sensible defaults for most workloads. Adjust based on monitoring data.
# Core settings (usually fine as defaults)
pg_trickle.enabled = true # Enable scheduler
pg_trickle.schedule_interval = '5s' # Global default refresh interval
pg_trickle.max_concurrent_refreshes = 4 # Parallel refresh limit
# Performance tuning
pg_trickle.planner_aggressive = true # Let the cost model override refresh-mode hints
pg_trickle.tiered_scheduling = true # Tier-aware scheduling
# CDC mode
pg_trickle.cdc_mode = 'auto' # auto | trigger | wal
# Safety
pg_trickle.unlogged_buffers = false # true = faster but not crash-safe
pg_trickle.fuse_default_ceiling = 10000 # Auto-fuse change threshold
- Review GUC values for your workload
- See CONFIGURATION.md for the full reference
8. Resource Planning
Memory
- Each background worker uses a separate PostgreSQL backend
- work_mem applies to each worker's delta SQL execution
- Monitor RSS growth via pg_stat_activity or OS-level tools
Storage
- Change buffer tables (pgtrickle_changes.changes_*) grow between refreshes
- Buffer size depends on DML rate × refresh interval
- Monitor via pgtrickle.shared_buffer_stats()
Connections
- The scheduler uses up to pg_trickle.max_concurrent_refreshes backend connections
- Ensure max_connections has headroom for workers + application
- max_connections is at least application connections + pg_trickle.max_concurrent_refreshes + 5
9. Monitoring Setup
Essential Queries
-- Stream table health overview
SELECT pgt_name, status, staleness, refresh_mode
FROM pgtrickle.stream_tables_info
ORDER BY staleness DESC NULLS LAST;
-- Refresh efficiency
SELECT pgt_name, diff_speedup, avg_change_ratio
FROM pgtrickle.refresh_efficiency();
-- Error states
SELECT pgt_name, status, last_error_message, last_error_at
FROM pgtrickle.pgt_stream_tables
WHERE status IN ('ERROR', 'SUSPENDED');
Grafana / Prometheus
See the monitoring/ directory for ready-to-use Grafana dashboards and Prometheus configuration.
- Monitoring configured for stream table health
- Alerting on ERROR/SUSPENDED status
10. Backup & Restore
pg_trickle stream tables are standard PostgreSQL tables and are included
in pg_dump / pg_restore. See BACKUP_AND_RESTORE.md
for details.
- Backup strategy accounts for both source tables and stream tables
- Restore procedure tested (stream tables may need re-initialization)
Quick Validation Script
Run this after deployment to verify everything is working:
-- 1. Extension loaded
SELECT extname, extversion FROM pg_extension WHERE extname = 'pg_trickle';
-- 2. Scheduler running
SELECT COUNT(*) > 0 AS scheduler_alive
FROM pg_stat_activity
WHERE backend_type = 'pg_trickle scheduler';
-- 3. Create a test stream table
CREATE TABLE _deploy_test_src (id INT PRIMARY KEY, val INT);
INSERT INTO _deploy_test_src VALUES (1, 100), (2, 200);
SELECT pgtrickle.create_stream_table(
'_deploy_test_st',
'SELECT id, val FROM _deploy_test_src',
refresh_mode => 'FULL'
);
SELECT pgtrickle.refresh_stream_table('_deploy_test_st');
-- 4. Verify data
SELECT * FROM _deploy_test_st ORDER BY id;
-- Expected: (1, 100), (2, 200)
-- 5. Cleanup
SELECT pgtrickle.drop_stream_table('_deploy_test_st');
DROP TABLE _deploy_test_src;
Connection Pooler Compatibility
Added in v0.19.0 (UX-4 / STAB-1).
pg_trickle uses prepared statements and NOTIFY internally. These features
require special handling when a connection pooler sits between the application
and PostgreSQL.
PgBouncer Transaction Mode
In PgBouncer transaction pooling mode, each transaction may land on a different server-side connection. Prepared statements and LISTEN/NOTIFY do not survive across transactions.
Recommended configuration:
# postgresql.conf
pg_trickle.connection_pooler_mode = 'transaction'
This cluster-wide GUC:
- Disables prepared-statement reuse for all stream tables.
- Suppresses NOTIFY pg_trickle_refresh emissions (listeners on other connections will not receive them anyway in transaction mode).
Alternatively, enable pooler compatibility per stream table:
SELECT pgtrickle.alter_stream_table('my_stream_table',
pooler_compatibility_mode => true);
PgBouncer Session Mode
Session pooling is fully compatible — no special configuration needed.
pgcat / Supavisor
These poolers generally support prepared statements and NOTIFY. Set
pg_trickle.connection_pooler_mode = 'off' (the default).
Kubernetes / CNPG
See Scaling — CNPG for connection pooler configuration in Kubernetes environments.
Row-Level Security
Important: pg_trickle background workers execute refresh queries with SET LOCAL row_security = off. This is intentional and matches the semantics of PostgreSQL's REFRESH MATERIALIZED VIEW.
Implications
- Stream table output always contains the full, unfiltered result set regardless of RLS policies on source tables.
- Row-Level Security policies on source tables do not filter what ends up in a stream table.
- If the source table has RLS and the defining query selects *, all rows (including those that would be hidden by RLS for normal roles) will be included in the stream table.
Mitigations
- Audit all stream table queries: ensure sensitive columns are excluded or aggregated.
- Do not expose stream tables directly to end-user roles if the source tables are protected by RLS.
- Use a per-role VIEW on top of the stream table to re-apply filtering: CREATE VIEW orders_view AS SELECT * FROM order_totals_st WHERE user_id = current_user_id().
- Consider column-level masking extensions (e.g., anon) on the stream table output view.
- Review pgtrickle.list_stream_tables() output for any stream tables selecting from RLS-protected sources.
Related Documentation
- Getting Started — First stream table in 5 minutes
- Configuration Reference — All GUC variables
- SQL Reference — Complete function reference
- Best-Practice Patterns — Common data modeling patterns
- Architecture — How pg_trickle works internally
- Backup & Restore — Backup considerations
Scaling Guide
This document provides guidance for scaling pg_trickle to hundreds of stream tables and beyond. It covers worker pool sizing, scheduler tuning, and diagnostic queries for identifying bottlenecks.
Architecture Overview
pg_trickle uses a two-tier background worker model:
- Launcher — one per server. Scans pg_database every 10 seconds, spawns per-database schedulers, and auto-restarts crashed workers.
- Per-database scheduler — one per database. Wakes every scheduler_interval_ms (default: 1 s), reads DAG changes from shared memory, consumes CDC buffers, and dispatches refreshes.
When parallel_refresh_mode = 'on', the scheduler dispatches refresh work to a
pool of dynamic background workers instead of running refreshes inline.
Worker Pool Sizing
| Deployment Size | Stream Tables | Recommended max_dynamic_refresh_workers | Notes |
|---|---|---|---|
| Small | 1–20 | 2–4 | Default (4) is usually sufficient |
| Medium | 20–100 | 4–8 | Monitor worker saturation |
| Large | 100–200 | 8–16 | Enable tiered scheduling |
| Very Large | 200+ | 16–32 | Tune per-database quotas |
Budget Formula
Worker slots are drawn from max_worker_processes, which is shared with
autovacuum, parallel queries, and other extensions:
max_worker_processes >= launchers(1)
+ schedulers(N_databases)
+ max_dynamic_refresh_workers
+ autovacuum_max_workers
+ max_parallel_workers
+ other_extensions
Example for 200 STs across 2 databases with 16 workers:
# postgresql.conf
max_worker_processes = 40
pg_trickle.max_dynamic_refresh_workers = 16
pg_trickle.max_concurrent_refreshes = 8
pg_trickle.per_database_worker_quota = 8
pg_trickle.parallel_refresh_mode = 'on'
Tiered Scheduling
For deployments with 50+ stream tables, enable tiered scheduling to reduce scheduler overhead:
pg_trickle.tiered_scheduling = on   # default since v0.12.0
The scheduler classifies stream tables into tiers based on change frequency:
| Tier | Schedule Multiplier | Behavior |
|---|---|---|
| Hot | 1× (base interval) | Tables with frequent changes |
| Warm | 2× | Tables with moderate changes |
| Cold | 10× | Tables with rare changes |
| Frozen | skip | Tables with no recent changes |
This reduces the CPU cost of the scheduling loop itself, which can become a bottleneck at 200+ STs when every table is polled every cycle.
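As a rough sketch, tier assignment maps observed change frequency to an effective refresh interval using the multipliers from the table above. The change-rate cutoffs below are invented for exposition — the extension's real classifier and thresholds are internal:

```python
# Illustrative tier classifier. Multipliers come from the tier table
# above; the changes-per-minute cutoffs are assumptions for exposition.

TIER_MULTIPLIER = {"hot": 1, "warm": 2, "cold": 10, "frozen": None}

def classify_tier(changes_per_minute):
    if changes_per_minute == 0:
        return "frozen"            # skipped until changes appear
    if changes_per_minute < 1:
        return "cold"
    if changes_per_minute < 60:
        return "warm"
    return "hot"

def effective_interval_ms(base_interval_ms, changes_per_minute):
    mult = TIER_MULTIPLIER[classify_tier(changes_per_minute)]
    return None if mult is None else base_interval_ms * mult

print(effective_interval_ms(1000, 500))   # hot    -> 1000
print(effective_interval_ms(1000, 10))    # warm   -> 2000
print(effective_interval_ms(1000, 0.5))   # cold   -> 10000
print(effective_interval_ms(1000, 0))     # frozen -> None (skipped)
```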
Dispatch Priority
When multiple stream tables are ready simultaneously, the scheduler dispatches in priority order:
- IMMEDIATE closures — time-critical refresh requests
- Atomic groups / Repeatable-read groups / Fused chains — multi-ST units
- Singletons — individual stream tables
- Cyclic SCCs — strongly-connected components
Within each priority band, the tier sort applies (Hot > Warm > Cold).
Per-Database Quotas and Burst
When per_database_worker_quota > 0, each database gets a guaranteed slice
of the worker pool:
- Normal load (cluster < 80% capacity): database can burst to 150% of its quota using idle capacity from other databases.
- High load (cluster ≥ 80% capacity): strict quota enforcement.
This prevents a single high-traffic database from starving others.
Monitoring
Worker Pool Status
SELECT * FROM pgtrickle.worker_pool_status();
-- Returns: active_workers, max_workers, per_db_cap, parallel_mode
Active Job Details
SELECT * FROM pgtrickle.parallel_job_status(300);
-- Returns recent jobs (last 300s): status, duration, worker PID, etc.
Health Summary
SELECT * FROM pgtrickle.health_summary();
-- Returns: total/active/error/suspended/stale counts, scheduler status, cache hit rate
Buffer Backlog Check
SELECT * FROM pgtrickle.change_buffer_sizes()
ORDER BY row_count DESC
LIMIT 20;
Identifying Bottlenecks
Is the scheduler loop the bottleneck?
-- If queue depth is consistently > 10 and workers are not saturated,
-- the scheduler loop is the bottleneck. Reduce scheduler_interval_ms.
SELECT active_workers, max_workers
FROM pgtrickle.worker_pool_status();
Are workers saturated?
-- If active_workers == max_workers consistently, increase the pool.
SELECT active_workers >= max_workers AS saturated
FROM pgtrickle.worker_pool_status();
Which STs take the longest?
SELECT st.pgt_schema, st.pgt_name,
AVG(EXTRACT(EPOCH FROM (h.end_time - h.start_time))) AS avg_sec,
MAX(EXTRACT(EPOCH FROM (h.end_time - h.start_time))) AS max_sec,
COUNT(*) AS refreshes
FROM pgtrickle.pgt_refresh_history h
JOIN pgtrickle.pgt_stream_tables st ON st.pgt_id = h.pgt_id
WHERE h.start_time > now() - interval '1 hour'
AND h.status = 'COMPLETED'
GROUP BY st.pgt_schema, st.pgt_name
ORDER BY avg_sec DESC
LIMIT 20;
Tuning Profiles
Low-Latency (< 50 ms P99)
pg_trickle.scheduler_interval_ms = 200
pg_trickle.parallel_refresh_mode = 'on'
pg_trickle.max_dynamic_refresh_workers = 8
pg_trickle.tiered_scheduling = on
High-Throughput (200+ STs)
pg_trickle.scheduler_interval_ms = 500
pg_trickle.parallel_refresh_mode = 'on'
pg_trickle.max_dynamic_refresh_workers = 16
pg_trickle.max_concurrent_refreshes = 8
pg_trickle.per_database_worker_quota = 8
pg_trickle.tiered_scheduling = on
pg_trickle.merge_work_mem_mb = 128
Resource-Constrained (4 CPU / 8 GB RAM)
pg_trickle.scheduler_interval_ms = 2000
pg_trickle.parallel_refresh_mode = 'on'
pg_trickle.max_dynamic_refresh_workers = 2
pg_trickle.max_concurrent_refreshes = 2
pg_trickle.tiered_scheduling = on
pg_trickle.delta_work_mem_cap_mb = 256
pg_trickle.merge_work_mem_mb = 32
Profiling Methodology
To profile worker utilization at scale, run a test with 200+ stream tables
and max_workers set to 4, 8, and 16 in turn. Collect the following metrics
at 1-second intervals:
-- Worker pool utilization over time
SELECT now() AS ts,
(SELECT active_workers FROM pgtrickle.worker_pool_status()) AS active,
(SELECT max_workers FROM pgtrickle.worker_pool_status()) AS pool_size,
(SELECT COUNT(*) FROM pgtrickle.parallel_job_status(5)
WHERE status = 'QUEUED') AS queue_depth;
Plot active / pool_size (utilization) and queue_depth over time.
If utilization is consistently > 90% with non-zero queue depth, the pool
is undersized. If utilization is < 50%, the pool is oversized and consuming
max_worker_processes slots unnecessarily.
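Once collected, the samples can be reduced to a sizing verdict. The helper below is hypothetical (not part of the extension) and simply encodes the 90%/50% rule of thumb above:

```python
# Reduce (active_workers, pool_size, queue_depth) samples to a sizing
# verdict: >90% mean utilization with queueing means the pool is
# undersized; <50% mean utilization means it is oversized.

def pool_verdict(samples):
    """samples: list of (active_workers, pool_size, queue_depth) tuples."""
    util = sum(a / p for a, p, _ in samples) / len(samples)
    queued = sum(q for _, _, q in samples)
    if util > 0.90 and queued > 0:
        return "undersized"
    if util < 0.50:
        return "oversized"
    return "ok"

print(pool_verdict([(8, 8, 3), (8, 8, 5), (7, 8, 1)]))   # undersized
print(pool_verdict([(2, 8, 0), (3, 8, 0), (1, 8, 0)]))   # oversized
```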
Known Scaling Limits
| Resource | Practical Limit | Bottleneck |
|---|---|---|
| Stream tables per DB | ~500 | Scheduler loop CPU |
| Worker pool size | 64 | GUC max |
| Change buffer rows | max_buffer_rows (default 1M) | Disk I/O |
| Template cache size | 128 entries (L1) | Evictions increase at >128 STs |
| DAG depth | ~20 levels | Topological sort + cascade latency |
Read Replicas & Hot Standby
Added in v0.19.0 (SCAL-1 / STAB-2).
pg_trickle is a primary-only extension. Stream tables are maintained by the background scheduler through DML (INSERT, DELETE, MERGE), which is only possible on the primary server.
Behaviour on Replicas
When the pg_trickle shared library is loaded on a read replica (physical standby or streaming replica):
- The launcher worker detects pg_is_in_recovery() = true and enters a sleep loop, checking every 30 seconds for promotion.
- Upon promotion (e.g. pg_promote()), the launcher resumes normal operation and spawns per-database schedulers.
- Manual refresh calls (pgtrickle.refresh_stream_table()) on a replica are rejected with a clear error message.
Recommended Setup
- Include pg_trickle in shared_preload_libraries on both primary and replicas. This ensures immediate availability after failover without a restart.
- Stream tables are read-queryable on replicas via physical replication — the storage tables are regular PostgreSQL tables that replicate normally.
- Monitor the replication lag to estimate stream table staleness on replicas.
CNPG & Kubernetes Operations
Added in v0.19.0 (SCAL-3).
CloudNativePG (CNPG) is the recommended Kubernetes operator for running pg_trickle. The extension is packaged as a custom container image that extends the official PostgreSQL image.
Container Image
Build the pg_trickle image using the provided Dockerfiles:
# GHCR image (multi-stage build)
docker build -f Dockerfile.ghcr -t pg-trickle:latest .
# Or use the CNPG-specific Dockerfile
docker build -f cnpg/Dockerfile.ext -t pg-trickle-cnpg:latest .
CNPG Cluster Configuration
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
name: pg-trickle-cluster
spec:
instances: 3
imageName: your-registry/pg-trickle:0.19.0
postgresql:
shared_preload_libraries:
- pg_trickle
parameters:
pg_trickle.enabled: "true"
pg_trickle.scheduler_interval_ms: "1000"
pg_trickle.max_concurrent_refreshes: "4"
# STAB-1: If using PgBouncer sidecar in transaction mode:
# pg_trickle.connection_pooler_mode: "transaction"
Operational Notes
- Failover: pg_trickle detects promotion automatically (see Read Replicas above). After CNPG promotes a replica, the launcher starts within 30 seconds.
- Scaling replicas: Stream table data replicates to all replicas via physical replication. No pg_trickle-specific configuration needed on replicas.
- Backup: Use CNPG's built-in Barman backup. pg_trickle's catalog tables are included automatically. See Backup & Restore.
- Monitoring: The Prometheus endpoint (pgtrickle.health_summary()) is compatible with CNPG's monitoring sidecar. See the Grafana dashboards in monitoring/grafana/.
Cluster-wide Worker Fairness (v0.27.0)
When pg_trickle is installed across multiple databases on the same PostgreSQL
instance, all scheduler background workers share a single worker pool bounded
by pg_trickle.max_dynamic_refresh_workers. Without care, high-throughput
databases can starve lower-priority databases of worker slots.
Quota allocation
Use the quota formula to distribute workers fairly:
per_db_quota = ceil(max_dynamic_refresh_workers / N_databases)
For high-priority databases, increase their individual quota via ALTER DATABASE SET:
ALTER DATABASE tenant_high SET pg_trickle.max_dynamic_refresh_workers = 4;
Monitoring cluster-wide allocation
Use pgtrickle.cluster_worker_summary() to monitor allocation in real time:
SELECT db_name, active_workers, total_active_workers
FROM pgtrickle.cluster_worker_summary()
ORDER BY active_workers DESC;
See docs/integrations/multi-tenant.md for the complete multi-tenant deployment guide, Prometheus configuration, and Grafana dashboard snippets.
Per-database Prometheus labels
From v0.27.0, all metrics include db_oid and db_name labels so Grafana
dashboards can filter by database without requiring separate scrape targets:
rate(pg_trickle_refreshes_total{db_name="tenant_a"}[5m])
See also: Capacity Planning · Configuration · Multi-Database Deployments · Performance Cookbook · Cost Model · Pre-Deployment Checklist
Capacity Planning
This page helps you estimate the resources pg_trickle will need before you put it in production: disk for change buffers and WAL, memory for refresh execution, CPU for the scheduler and refresh workers, and connection budget for background workers.
The rules of thumb here are starting points. The Performance Cookbook and Scaling Guide cover how to tune once you have real data to work with.
Quick sizing table
| Deployment size | Stream tables | Source tables | Sustained write rate | Recommended starting config |
|---|---|---|---|---|
| Small | 1–20 | 1–20 | < 100/s | All defaults |
| Medium | 20–100 | 20–100 | 100–1,000/s | parallel_refresh_mode=on, max_dynamic_refresh_workers=4 |
| Large | 100–500 | 50–500 | 1,000–10,000/s | tiered_scheduling=on, max_dynamic_refresh_workers=8, WAL CDC |
| Very large | 500+ | 500+ | > 10,000/s | Add per-database quotas; consider Citus |
Disk: change buffers
Each source table referenced by a stream table gets its own change
buffer (pgtrickle_changes.changes_<oid>). One row per captured
change.
Per-row size estimate (trigger CDC):
~ row_overhead (24 B)
+ key_columns (≈ 2 × avg_key_size)
+ referenced_columns (sum of referenced col sizes)
+ bitmap (1–2 B for narrow tables, ~ ncols/8 otherwise)
Rule of thumb: budget ~1.5 KB per captured change for a typical wide-row OLTP table, ~150 B for a narrow lookup table.
Steady-state size:
buffer_bytes ≈ writes_per_second × refresh_interval × per_row_size × (1 - compaction_ratio)
Compaction collapses cancelling INSERT/DELETE pairs and successive updates to the same row. Typical compaction ratios:
| Workload | Compaction ratio |
|---|---|
| Append-only event log | 0% |
| Mixed OLTP | 30–60% |
| High-churn (frequent UPDATEs to same key) | 70–95% |
Worked example. A source table doing 5,000 writes/s, refreshed every 5 s, with 50% compaction and 1 KB rows:
5000 × 5 × 1024 × 0.5 = 12,800,000 bytes ≈ 12.2 MiB per refresh cycle
That is the peak size of the buffer between refreshes. After the
refresh, the consumed rows are deleted (or TRUNCATEd depending on
pg_trickle.cleanup_use_truncate).
Alerts. Set pg_trickle.buffer_alert_threshold (default
100000 rows) so a WARNING is logged before a buffer becomes
unbounded.
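The steady-state formula can be wrapped in a small estimator. This is a sketch: the per-row size and compaction ratio are inputs you measure for your workload, not constants:

```python
# Steady-state change-buffer size estimate, from the formula above:
# writes/s × refresh interval × per-row size × (1 - compaction ratio).

def buffer_peak_bytes(writes_per_second, refresh_interval_s,
                      per_row_bytes, compaction_ratio):
    return (writes_per_second * refresh_interval_s
            * per_row_bytes * (1 - compaction_ratio))

# The worked example: 5,000 writes/s, 5 s interval, 1 KB rows,
# 50% compaction (mixed OLTP).
peak = buffer_peak_bytes(5000, 5, 1024, 0.5)
print(f"{peak / 1024**2:.1f} MiB")   # ≈ 12.2 MiB between refreshes
```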
Disk: WAL retention (WAL CDC mode)
If you use pg_trickle.cdc_mode = 'auto' or 'wal', each source
table gets a logical replication slot. PostgreSQL retains WAL until
every active slot has consumed it.
Worst-case retention = slot_lag_critical_threshold_mb (default
1024 MB). Add this per source to your WAL disk budget if you
expect occasional refresh delays.
Recommended monitoring:
SELECT * FROM pgtrickle.check_cdc_health()
WHERE severity != 'OK';
Memory: refresh execution
Each refresh runs a MERGE (DIFFERENTIAL) or full INSERT … SELECT
(FULL). Memory usage is dominated by hash tables and sorts:
peak_memory ≈ work_mem × (number of hash/sort nodes in the plan)
For most stream tables, work_mem = 64 MB is comfortable. Wide
joins or large GROUP BY may benefit from 256 MB. Use
pg_trickle.merge_work_mem_mb to set it per-refresh without
affecting the rest of the database.
pg_trickle.spill_threshold_blocks controls when intermediate
results spill to disk; raise it on memory-rich servers.
CPU and worker processes
The scheduler is one background worker per database. Refresh work
is either inline (default) or dispatched to a dynamic worker pool
(pg_trickle.parallel_refresh_mode = on).
max_worker_processes ≥ 1 (launcher)
+ N (one scheduler per pg_trickle database)
+ max_dynamic_refresh_workers
+ autovacuum_max_workers
+ max_parallel_workers
+ other extensions
A typical safe starting point:
# postgresql.conf
max_worker_processes = 32
max_parallel_workers = 8
pg_trickle.max_dynamic_refresh_workers = 4
The default max_worker_processes of 8 is usually too low; the Pre-Deployment Checklist calls this out as the most common silent misconfiguration.
DAG topology and scheduling overhead
The scheduler walks the dependency DAG every tick (default 1 s). Per-stream-table overhead is small (sub-millisecond) but not zero. For very large DAGs:
| Stream tables | Recommended scheduler interval |
|---|---|
| < 50 | 1000 ms (default) |
| 50–200 | 1000–2000 ms, plus tiered_scheduling=on |
| 200–1000 | 2000–5000 ms + Hot/Warm/Cold tiers |
| 1000+ | Consider splitting across databases |
The scheduler's zero-change overhead is documented in README – Zero-Change Latency (target < 10 ms).
Connection budget
Each parallel-refresh worker uses one PostgreSQL backend slot. The scheduler uses one. The launcher uses one. So:
backends_used_by_pg_trickle = 1 + databases + max_dynamic_refresh_workers
If you front PostgreSQL with PgBouncer, this is separate from your application's pool — pg_trickle's background workers connect directly, not through the pooler.
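You can watch the backend slots in use from `pg_stat_activity`. The exact `backend_type` strings pg_trickle registers for its workers are an assumption here; adjust the pattern to whatever your server reports:

```sql
-- Count backend slots currently held by pg_trickle background workers
SELECT backend_type, count(*)
FROM pg_stat_activity
WHERE backend_type ILIKE '%trickle%'
GROUP BY backend_type;
```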
Network (Citus & multi-node)
In Citus deployments, the coordinator polls each worker's WAL slot
on every scheduler tick via dblink. Bandwidth scales with the
delta volume, not the source-table size, but you still need a
fast and reliable coordinator-to-worker network.
Plan for:
- One TCP connection per worker per polling cycle.
- Short, frequent reads.
- Tolerance for individual worker failures — pg_trickle.citus_worker_retry_ticks controls when failures escalate to WARNING.
Forecasting growth
A workable rough model for a year of growth:
year_1_disk = current_buffer_peak × growth_factor
+ current_storage × growth_factor × number_of_stream_tables
+ WAL_retention_budget
Stream-table storage itself is just an ordinary heap table — its size is the size of the result set, no different from a materialized view.
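Plugging illustrative numbers into the model makes the shape of the budget concrete (all figures below are assumed for the example, not recommendations):

```sql
-- 13 MiB buffer peak, 2× expected growth, 4 GiB current storage,
-- 3 stream tables, 1 GiB WAL retention budget
SELECT pg_size_pretty((
    13 * 1024^2 * 2        -- current_buffer_peak × growth_factor
  + 4  * 1024^3 * 2 * 3    -- current_storage × growth_factor × n_stream_tables
  + 1  * 1024^3            -- WAL_retention_budget
)::bigint) AS year_1_disk;
```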
Sanity-check queries
-- Per-source change-buffer size
SELECT * FROM pgtrickle.change_buffer_sizes()
ORDER BY pending_rows DESC;
-- Per-stream-table refresh stats
SELECT pgt_name, last_full_ms, last_diff_ms, p95_ms
FROM pgtrickle.st_refresh_stats()
ORDER BY p95_ms DESC NULLS LAST;
-- Worker-pool saturation
SELECT * FROM pgtrickle.worker_pool_status();
See also: Scaling Guide · Performance Cookbook · Configuration · Pre-Deployment Checklist
Multi-Database Deployments
pg_trickle is multi-database aware. A single PostgreSQL server can host pg_trickle in any number of databases simultaneously, and the extension's background workers handle the fan-out automatically. You do not need to start anything per database; the launcher discovers them.
Architecture in one diagram
┌─────────────────────────────────────────────────────────────┐
│ PostgreSQL 18 server │
│ │
│ ┌────────────┐ │
│ │ Launcher │ ── scans pg_database every ~10 s │
│ └─────┬──────┘ │
│ │ spawns │
│ ┌────┼────────────────────┬────────────────────┐ │
│ ▼ ▼ ▼ ▼ │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │Scheduler │ │Scheduler │ │Scheduler │ │
│ │ db_a │ │ db_b │ │ db_etl │ │
│ └────┬─────┘ └────┬─────┘ └────┬─────┘ │
│ │ refresh jobs │ refresh jobs │ │
│ ▼ ▼ ▼ │
│ ┌─────────────────────────────────────────────────┐ │
│ │ Shared dynamic refresh worker pool │ │
│ │ (max_dynamic_refresh_workers, per-DB quotas) │ │
│ └─────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
- One launcher per PostgreSQL server.
- One scheduler per database that has pg_trickle installed.
- One shared dynamic worker pool for parallel refreshes.
The launcher restarts crashed schedulers automatically. Each scheduler is fully independent; failure in one database does not affect the others.
Enabling pg_trickle in additional databases
Just install the extension. The launcher will pick it up on the next discovery cycle (within ~10 s).
\c db_b
CREATE EXTENSION pg_trickle;
Verify the scheduler started:
SELECT * FROM pgtrickle.cluster_worker_summary();
Expected: one row per database with a column showing the scheduler PID and uptime.
Resource budgeting across databases
Background-worker slots are a finite, server-wide resource. The formula:
max_worker_processes ≥ launchers(1)
+ schedulers(N_databases)
+ max_dynamic_refresh_workers
+ autovacuum_max_workers
+ max_parallel_workers
+ other_extensions
For 4 databases each running pg_trickle, with 8 dynamic refresh workers and modest autovacuum:
max_worker_processes = 32
max_parallel_workers = 8
pg_trickle.max_dynamic_refresh_workers = 8
pg_trickle.per_database_worker_quota = 4
Without per_database_worker_quota, one busy database can starve
the others. Set it to max_dynamic_refresh_workers / N_databases
or higher.
Cluster-wide observability
-- One row per database with per-database stats
SELECT * FROM pgtrickle.cluster_worker_summary();
-- Combined health across every database
SELECT datname, severity, message
FROM pgtrickle.cluster_health_check() -- (where exposed)
WHERE severity != 'OK';
The pgtrickle workers command also aggregates
across databases when invoked against a server-level URL.
Common patterns
Per-tenant database
If you isolate tenants by database, each gets its own scheduler.
Combined with tiered_scheduling = on, low-traffic tenants pay
almost no scheduler cost.
App database + analytics database
A common topology: app_db runs OLTP and a few IMMEDIATE stream
tables; analytics_db (often a logical replica) runs the heavy
DIFFERENTIAL aggregates. The launcher handles both.
Tenant-of-tenants (Citus)
For very large multi-tenant deployments, see Citus — distributed sources can replace per-tenant databases.
Caveats
- pg_trickle does not create cross-database stream tables. Stream tables live in exactly one database; their sources must live there too. Use logical replication or downstream publications to bridge databases.
- shared_preload_libraries = 'pg_trickle' is server-wide. The launcher then discovers per-database state.
- pg_trickle.enabled = off disables the launcher (and therefore every scheduler); use it for maintenance windows.
See also: Scaling Guide · Capacity Planning · Configuration
Backup and Restore
pg_trickle plays nicely with every standard PostgreSQL backup
mechanism — pg_dump, pg_basebackup, pgBackRest, WAL archiving,
PITR, and pre-built tools like CloudNativePG and Crunchy Operator.
The catalog, change buffers, and stream-table contents are all
ordinary PostgreSQL relations, so they get backed up like anything
else.
This page walks through the recommended workflows, the gotchas, and how the v0.27 Snapshots API fits in.
TL;DR. Physical backups (pgBackRest, pg_basebackup) just work. pg_dump works too, with one small ordering rule. Snapshots are an application-level tool for derived state, not a backup replacement.
Choosing the right tool
| Tool | Best for | Notes |
|---|---|---|
| pgBackRest / WAL-G / pg_basebackup | Production backup & PITR | Full-fidelity; no special pg_trickle steps |
| pg_dump / pg_restore | Logical copies, dev environments, schema migration | Works; restore order matters slightly |
| Stream-table snapshots | Replica bootstrap, archival of derived state, fast rollback of one stream table | Not a substitute for a real backup |
Physical backups (pgBackRest, pg_basebackup, WAL-G)
Physical backups copy the data directory at the file-system level.
Everything is captured: source tables, stream-table storage, the
pgtrickle.* catalog, the pgtrickle_changes.* change buffers,
and (in WAL CDC mode) the replication slots' on-disk state.
Restore procedure:
- Restore the data directory exactly as you would for any PostgreSQL database.
- Start PostgreSQL.
- The pg_trickle launcher discovers each database on the next tick (~10 s) and resumes the per-database scheduler.
There is nothing pg_trickle-specific to do.
Point-in-time recovery (PITR). PITR works as expected. If you
recover to a point in the middle of a refresh, that refresh is
marked failed in pgtrickle.pgt_refresh_history on first start;
the next scheduler tick re-runs it. No data loss.
WAL CDC slots after restore. If you were running in
pg_trickle.cdc_mode = 'wal' and the restored cluster came up
without the original slots (e.g. a logical-decoding replica that
did not inherit slots), pg_trickle's scheduler detects the absence
and re-bootstraps trigger CDC for the affected sources. You will
see one WARNING per source; the system continues to work.
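After any restore in WAL CDC mode, it is worth confirming slot state with the same health function used for routine monitoring earlier in this guide:

```sql
-- Anything other than OK here indicates a source whose slot did not
-- survive the restore (pg_trickle will have fallen back to triggers)
SELECT * FROM pgtrickle.check_cdc_health()
WHERE severity != 'OK';
```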
Logical backups (pg_dump / pg_restore)
pg_dump produces a portable SQL script (or directory archive)
that can be replayed into a fresh database. pg_trickle objects are
included automatically because they are normal extension objects.
The one ordering rule: restore must follow the standard
PostgreSQL "schema, then data, then constraints/indexes" order.
pg_restore --section=pre-data --section=data --section=post-data
does this for you. Avoid hand-editing the dump to interleave
sections.
Recommended workflow
# Create the dump (custom or directory format)
pg_dump --format=custom --file=mydb.dump mydb
# Restore into a fresh database
createdb mydb_restored
pg_restore --dbname=mydb_restored --jobs=4 mydb.dump
Then, if you want to verify everything came back:
-- Should list every stream table
SELECT * FROM pgtrickle.pgt_status();
-- Force a refresh on each one to confirm CDC is wired
SELECT pgtrickle.refresh_stream_table(pgt_name)
FROM pgtrickle.stream_tables_info;
What pg_dump does and does not capture
| Object | Captured by pg_dump? |
|---|---|
| Source tables (your data) | ✅ |
| Stream-table storage (your derived data) | ✅ |
| pgtrickle.* catalog rows | ✅ |
| CDC trigger definitions | ✅ (recreated when the extension reapplies them) |
| pgtrickle_changes.* change buffers | ✅ — but typically empty after a clean dump |
| WAL replication slots (WAL CDC mode) | ✕ (slots are not dumpable; the scheduler recreates them) |
| Refresh history | ✅ |
If you do not need the audit history, you can shrink the dump with
pg_dump --exclude-table='pgtrickle.pgt_refresh_history'.
Stream-table snapshots vs. backups
Snapshots (v0.27+) are an application-level mechanism for capturing the contents of one stream table at a chosen point. They are great for:
- Bootstrapping a replica without re-running a slow full refresh.
- Archiving a slowly-changing dimension daily.
- Rolling one stream table back after a defining-query mistake.
They are not a backup of your database. Use them in addition to,
not instead of, pgBackRest / pg_dump.
A reasonable production posture:
- Daily pgBackRest backup.
- Snapshots of your most important stream tables on the cadence that matches your business RPO.
- WAL retention sized to PITR window.
Backup and restore on Kubernetes (CNPG)
CloudNativePG handles backup orchestration via Barman / object storage. pg_trickle is fully compatible:
- Use Cluster.spec.backup exactly as you would for any other PG cluster.
- After a Cluster.spec.bootstrap.recovery operation, the pg_trickle launcher resumes automatically.
- For very large stream tables, consider taking pre-backup snapshots and restoring them on the new cluster to skip an initial full refresh.
See CloudNativePG integration.
Disaster-recovery checklist
- Backup tool of choice configured (pgBackRest / WAL-G / CNPG / managed service).
- WAL retention window ≥ your PITR target.
- If using WAL CDC: alerting on pg_trickle.slot_lag_critical_threshold_mb.
- Periodic snapshot of business-critical stream tables.
- Documented restore procedure tested at least once (snapshot → fresh database → pgtrickle.health_check()).
- Off-site copy of backups (managed service, S3 with cross-region replication, etc.).
- Monitoring on pgtrickle.pgt_refresh_history for restore drift.
See also: Snapshots · High Availability and Replication · CloudNativePG integration · Capacity Planning
Snapshots
A snapshot is a point-in-time copy of a stream table's contents, stored as an ordinary PostgreSQL table. Snapshots let you back up derived state, bootstrap a replica, build deterministic test fixtures, or compare two refresh runs without having to re-derive the data.
Available since v0.27.0
Why snapshots?
A stream table's contents are derived — pg_trickle can always recompute them from the source tables. But recomputation is not free, and there are operational situations where having a frozen copy is cheaper, safer, or simpler:
- Replica bootstrap. When you stand up a new read replica or a fresh environment, you can restore from a snapshot in seconds instead of waiting for an initial full refresh that may take minutes or hours on a large dataset.
- Point-in-time forensics. Take a snapshot before a risky migration or a suspicious incident; compare it to the live stream table later.
- Test fixtures. Snapshot a stream table from a representative environment and check it into a test database.
- Cheap rollback. If a defining-query change goes wrong, restore from the most recent snapshot while you investigate.
Quickstart
Take a snapshot
SELECT pgtrickle.snapshot_stream_table('order_totals');
-- pgtrickle.snapshot_order_totals_1735689421000
The function returns the fully-qualified name of the new snapshot
table. By default snapshots live in the pgtrickle schema and are
named snapshot_<table>_<epoch_ms>.
You can choose your own name with the optional second argument:
SELECT pgtrickle.snapshot_stream_table(
'order_totals',
'archive.order_totals_2026_q1'
);
List snapshots
SELECT * FROM pgtrickle.list_snapshots();
Or filter to a single stream table:
SELECT * FROM pgtrickle.list_snapshots('order_totals');
Restore from a snapshot
SELECT pgtrickle.restore_from_snapshot(
'order_totals', -- stream table to restore into
'pgtrickle.snapshot_order_totals_1735689421000' -- snapshot table
);
After a restore, pg_trickle reinitialises the stream table's frontier so that the next refresh reads only changes that occurred after the snapshot was taken.
Drop an old snapshot
SELECT pgtrickle.drop_snapshot('pgtrickle.snapshot_order_totals_1735689421000');
What's in a snapshot
The snapshot table is a plain PostgreSQL heap table with the same
columns as the stream table, including the hidden __pgt_row_id
column. That is what allows a restore to map snapshot rows back to
their stable identities.
Because the snapshot is an ordinary table, you can:
- Back it up with pg_dump, copy it elsewhere with pg_dump -t, or move it across databases with \copy.
- Inspect it freely with regular SQL.
- Add indexes for read-side workloads (the snapshot is independent of the live stream table).
Operational patterns
Periodic archival
-- Every night, snapshot a slowly-changing dimension
SELECT pgtrickle.snapshot_stream_table(
'customer_360',
format('archive.customer_360_%s', to_char(now(), 'YYYY_MM_DD'))
);
-- Keep only the last 30 days
SELECT pgtrickle.drop_snapshot(snapshot_table)
FROM pgtrickle.list_snapshots('customer_360')
WHERE created_at < now() - interval '30 days';
Replica bootstrap
-- On the source: pg_dump the snapshot
pg_dump -t pgtrickle.snapshot_order_totals_1735689421000 mydb > snap.sql
-- On the replica: restore the snapshot, then reattach
psql replicadb < snap.sql
SELECT pgtrickle.restore_from_snapshot(
'order_totals',
'pgtrickle.snapshot_order_totals_1735689421000'
);
Disaster recovery
Combine snapshots with regular pg_dump of the source tables.
After a restore, pg_trickle's frontier tracking ensures the stream
table will catch up correctly when CDC resumes.
Caveats
- Snapshots are not coordinated across multiple stream tables. If you need a consistent view across several stream tables, take them inside a single transaction and rely on PostgreSQL's MVCC isolation.
- Snapshots do not freeze the source tables. The "as-of" time is determined by the most recent refresh of the stream table at the moment you take the snapshot.
- A restore reinitialises the frontier — if you want the stream table to replay changes between the snapshot time and now, the source CDC slots / change buffers must still hold those entries. Otherwise, expect a full refresh on the next cycle.
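The first caveat can be worked around as described: run the snapshot calls inside one transaction so both see the same MVCC view. A minimal sketch, using the snapshot function from the Quickstart above:

```sql
-- Both snapshots observe the same point in time because they run
-- inside a single REPEATABLE READ transaction
BEGIN ISOLATION LEVEL REPEATABLE READ;
SELECT pgtrickle.snapshot_stream_table('order_totals');
SELECT pgtrickle.snapshot_stream_table('customer_360');
COMMIT;
```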
See also: Backup & Restore · Replica Bootstrap & PITR Alignment (Patterns) · SQL Reference – Lifecycle
High Availability and Replication
This page covers running pg_trickle in production with PostgreSQL replication: how stream tables behave on physical (streaming) replicas, on logical replicas, during failover, and across read-write splits.
Looking for backups instead? See Backup & Restore. Looking for disaster-recovery snapshots? See Snapshots.
Quick answers
| Question | Answer |
|---|---|
| Can I run pg_trickle on a physical replica? | The extension can be installed, but the scheduler does not refresh on a hot standby. Stream tables on the standby reflect what was promoted from the primary. |
| Will my stream tables survive a failover? | Yes — they are ordinary heap tables. On the new primary, the scheduler resumes from the last persisted frontier. |
| Can I logically replicate a stream table? | Yes — including the hidden __pgt_row_id column. See Downstream Publications. |
| Does pg_trickle work on a logical replica? | Yes — the replica is its own database with its own pg_trickle scheduler. |
| What about CNPG (Kubernetes)? | Fully supported; see CloudNativePG integration. |
Physical (streaming) replication
PostgreSQL's streaming replication ships WAL from primary to replica. Stream tables, change buffers, and the pg_trickle catalog are all WAL-logged, so they are byte-for-byte identical on the replica.
On the primary: the scheduler runs and refreshes as normal.
On the replica (hot standby): the scheduler does not refresh — the database is read-only. Stream-table contents are exactly the contents that were on the primary at the replica's replay LSN. Reads from the replica are perfectly valid; writes (and refreshes) are not.
After a failover:
- The new primary's pg_trickle launcher detects the database is writable.
- The scheduler resumes refreshing from the last persisted frontier.
- Any change-buffer rows that arrived between the last refresh and the failover are processed in the next cycle.
Recommended GUCs on a streaming-replica role you might promote:
shared_preload_libraries = 'pg_trickle'
pg_trickle.enabled = on # safe to leave on; refresh is gated by writability
Logical replication
A logical replica is a separate database that subscribes to one or more publications on the primary. Each logical replica has its own pg_trickle catalog and its own scheduler.
This makes logical replication a good answer for an analytics replica: replicate the source tables to the analytics replica, install pg_trickle there, and define your stream tables on the replica. Heavy DIFFERENTIAL workloads are isolated from the OLTP primary.
┌──────────────┐ logical ┌────────────────────┐
│ primary │ publication: source tables │ analytics replica │
│ (writes) │ ─────────────────────────────▶ │ (heavy STs here) │
└──────────────┘ └────────────────────┘
Stream tables on the analytics replica are independent of any stream tables on the primary.
Replicating stream tables themselves
If you want a downstream system to receive stream-table changes (another PostgreSQL, Debezium, Kafka, …), use downstream publications:
SELECT pgtrickle.stream_table_to_publication('order_totals');
The publication exposes the storage table's INSERT/DELETE events,
including the __pgt_row_id column. Subscribers receive the
materialised data only — they do not need to know that pg_trickle
is generating it.
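On the downstream side, consumption is plain PostgreSQL logical replication. The publication name below is an assumption for illustration; use whatever name `stream_table_to_publication` actually reports:

```sql
-- On the subscribing PostgreSQL instance
CREATE SUBSCRIPTION order_totals_sub
    CONNECTION 'host=primary.example.com dbname=mydb user=replicator'
    PUBLICATION pgtrickle_order_totals;
```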
Failover behaviour in detail
When the primary fails and a replica is promoted:
| Stage | What happens |
|---|---|
| Promotion | Replica becomes writable; its pg_trickle launcher detects this. |
| Scheduler restart | Scheduler resumes from the last persisted frontier (catalog row). |
| Change-buffer rows | Any rows captured before the failover are still in the buffers (they're WAL-logged). They are processed in the next refresh. |
| In-flight refresh | An interrupted refresh is marked failed in pgt_refresh_history and retried automatically (subject to the fuse). |
| WAL CDC slots | If using WAL CDC, the slots exist on the replica (slots are WAL-logged in PG ≥ 17 with failover_slots). On older versions, the scheduler recreates them and falls back briefly to triggers. |
CNPG (Kubernetes) specifics
- Use the OCI extension image: ghcr.io/trickle-labs/pg_trickle-ext:<version>.
- CNPG's standby cluster topology is supported — pg_trickle behaves exactly as on bare-metal streaming replication.
- Cluster.spec.postgresql.shared_preload_libraries must include pg_trickle.
- Use Cluster.spec.postgresql.parameters for pg_trickle.* GUCs.
Full example: see
integrations/cloudnativepg.md and
the cnpg/
directory in the repository.
Read-write splits with PgBouncer
pg_trickle's background workers connect directly to PostgreSQL, not through a pooler. Your application can use PgBouncer (in transaction-pool mode, including Supabase / Railway / Neon) freely.
For a read-only replica behind a pooler:
- Reads from stream tables work exactly as reads from any table.
- IMMEDIATE stream tables only update on the primary; on the replica they reflect what's been replayed.
See PgBouncer integration for tuning.
Geographic / cross-region replication
The recommended pattern for cross-region:
- Stream tables on the regional primary (low-latency CDC).
- Downstream publications for the materialised results.
- PostgreSQL logical replication carries them to the remote region.
- Optional: pg_trickle in the remote region builds further stream tables on the replicated data.
This keeps the heavy DIFFERENTIAL maintenance close to its source data, and ships only the final materialised diffs over the WAN.
Caveats
- pg_trickle does not participate in synchronous replication decisions. It is data, not infrastructure.
- A logical replica that subscribes to source tables but not to the pg_trickle catalog will need to define its own stream tables — they are not auto-created.
- Promoting a standby with a stale pgtrickle_changes schema (e.g. after a long replication lag) is fine; the next refresh catches up. If the lag was very long, the model may pick FULL instead of DIFFERENTIAL for the catch-up — by design.
See also: Backup & Restore · Snapshots · Downstream Publications · CloudNativePG integration · PgBouncer integration
Upgrading pg_trickle
This guide covers upgrading pg_trickle from one version to another.
Quick Upgrade (Recommended)
-- 1. Check current version
SELECT extversion FROM pg_extension WHERE extname = 'pg_trickle';
-- 2. Replace the binary files (.so/.dylib, .control, .sql)
-- See the installation method below for your platform.
-- 3. Restart PostgreSQL (required for shared library changes)
-- sudo systemctl restart postgresql
-- 4. Run the upgrade in each database that has pg_trickle installed
ALTER EXTENSION pg_trickle UPDATE;
-- 5. Verify the upgrade
SELECT pgtrickle.version();
SELECT * FROM pgtrickle.health_check();
Step-by-Step Instructions
1. Check Current Version
SELECT extversion FROM pg_extension WHERE extname = 'pg_trickle';
-- Returns your current installed version, e.g. '0.9.0'
2. Install New Binary Files
Replace the extension files in your PostgreSQL installation directory. The method depends on how you originally installed pg_trickle.
From release tarball:
# Replace <new-version> with the target release, for example 0.2.3
curl -LO https://github.com/getretake/pg_trickle/releases/download/v<new-version>/pg_trickle-<new-version>-pg18-linux-amd64.tar.gz
tar xzf pg_trickle-<new-version>-pg18-linux-amd64.tar.gz
# Copy files to PostgreSQL directories
sudo cp pg_trickle-<new-version>-pg18-linux-amd64/lib/* $(pg_config --pkglibdir)/
sudo cp pg_trickle-<new-version>-pg18-linux-amd64/extension/* $(pg_config --sharedir)/extension/
From source (cargo-pgrx):
cargo pgrx install --release
3. Restart PostgreSQL
The shared library (.so / .dylib) is loaded at server start via
shared_preload_libraries. A restart is required for the new binary to
take effect.
sudo systemctl restart postgresql
# or on macOS with Homebrew:
brew services restart postgresql@18
4. Run ALTER EXTENSION UPDATE
Connect to each database where pg_trickle is installed and run:
ALTER EXTENSION pg_trickle UPDATE;
This executes the upgrade migration scripts in order (for example,
pg_trickle--0.5.0--0.6.0.sql → pg_trickle--0.6.0--0.7.0.sql).
PostgreSQL automatically determines the full upgrade chain from your current
version to the new default_version.
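You can inspect the chain PostgreSQL has computed before running the update, using the built-in `pg_extension_update_paths` function:

```sql
-- List every reachable target version and the script chain to get there
SELECT source, target, path
FROM pg_extension_update_paths('pg_trickle')
WHERE source = (SELECT extversion FROM pg_extension
                WHERE extname = 'pg_trickle');
```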
5. Verify the Upgrade
-- Check version
SELECT pgtrickle.version();
-- Run health check
SELECT * FROM pgtrickle.health_check();
-- Verify stream tables are intact
SELECT * FROM pgtrickle.stream_tables_info;
-- Test a refresh
SELECT pgtrickle.refresh_stream_table('your_stream_table');
Version-Specific Notes
0.1.3 → 0.2.0
New functions added:
- pgtrickle.list_sources(name) — list source tables for a stream table
- pgtrickle.change_buffer_sizes() — inspect CDC change buffer sizes
- pgtrickle.health_check() — diagnostic health checks
- pgtrickle.dependency_tree() — visualize the dependency DAG
- pgtrickle.trigger_inventory() — audit CDC triggers
- pgtrickle.refresh_timeline(max_rows) — refresh history
- pgtrickle.diamond_groups() — diamond dependency group info
- pgtrickle.version() — extension version string
- pgtrickle.pgt_ivm_apply_delta(...) — internal IVM delta application
- pgtrickle.pgt_ivm_handle_truncate(...) — internal TRUNCATE handler
- pgtrickle._signal_launcher_rescan() — internal launcher signal
No schema changes to pgtrickle.pgt_stream_tables or
pgtrickle.pgt_dependencies catalog tables.
No breaking changes. All v0.1.3 functions and views continue to work as before.
0.2.0 → 0.2.1
Three new catalog columns added to pgtrickle.pgt_stream_tables:
| Column | Type | Default | Purpose |
|---|---|---|---|
| topk_offset | INT | NULL | Pre-provisioned for paged TopK OFFSET (activated in v0.2.2) |
| has_keyless_source | BOOLEAN NOT NULL | FALSE | EC-06: keyless source flag; switches apply strategy from MERGE to counted DELETE |
| function_hashes | TEXT | NULL | EC-16: stores MD5 hashes of referenced function bodies for change detection |
The migration script (pg_trickle--0.2.0--0.2.1.sql) adds these columns
via ALTER TABLE … ADD COLUMN IF NOT EXISTS.
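The idempotent pattern the migration relies on can be sketched as follows (illustrative, not the verbatim script):

```sql
-- ADD COLUMN IF NOT EXISTS makes the migration safe to re-run
ALTER TABLE pgtrickle.pgt_stream_tables
    ADD COLUMN IF NOT EXISTS topk_offset INT,
    ADD COLUMN IF NOT EXISTS has_keyless_source BOOLEAN NOT NULL DEFAULT FALSE,
    ADD COLUMN IF NOT EXISTS function_hashes TEXT;
```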
No breaking changes. All v0.2.0 functions, views, and event triggers continue to work as before.
What's also new:
- Upgrade migration safety infrastructure (scripts, CI, E2E tests)
- GitHub Pages book expansion (6 new documentation pages)
- User-facing upgrade guide (this document)
0.2.1 → 0.2.2
No catalog table DDL changes. The topk_offset column needed for paged
TopK was already added in v0.2.1.
Two SQL function updates are applied by
pg_trickle--0.2.1--0.2.2.sql:
- pgtrickle.create_stream_table(...)
  - default schedule changes from '1m' to 'calculated'
  - default refresh_mode changes from 'DIFFERENTIAL' to 'AUTO'
- pgtrickle.alter_stream_table(...)
  - adds the optional query parameter used by ALTER QUERY support
Because PostgreSQL stores argument defaults and function signatures in
pg_proc, the migration script must DROP FUNCTION and recreate both
signatures during ALTER EXTENSION ... UPDATE.
Behavioral notes:
- Existing stream tables keep their current catalog values. The migration only changes the defaults used by future create_stream_table(...) calls.
- Existing applications can opt a table into the new defaults explicitly via pgtrickle.alter_stream_table(...) after the upgrade.
- After installing the new binary and restarting PostgreSQL, the scheduler now warns if the shared library version and SQL-installed extension version do not match. This helps detect stale .so/.dylib files after partial upgrades.
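The version comparison the scheduler performs can also be run by hand, using the catalog and the version function documented above:

```sql
-- A mismatch between these two columns indicates a stale shared library
SELECT extversion AS catalog_version,
       pgtrickle.version() AS library_version
FROM pg_extension
WHERE extname = 'pg_trickle';
```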
0.2.2 → 0.2.3
One new catalog column is added to pgtrickle.pgt_stream_tables:
| Column | Type | Default | Purpose |
|---|---|---|---|
| requested_cdc_mode | TEXT | NULL | Optional per-stream-table CDC override ('auto', 'trigger', 'wal') |
The upgrade script also recreates two SQL functions:
- pgtrickle.create_stream_table(...) — adds the optional cdc_mode parameter
- pgtrickle.alter_stream_table(...) — adds the optional cdc_mode parameter
Monitoring view updates:
- pgtrickle.pg_stat_stream_tables gains the cdc_modes column
- pgtrickle.pgt_cdc_status is added for per-source CDC visibility
Because PostgreSQL stores function signatures and defaults in pg_proc, the
upgrade script drops and recreates both lifecycle functions during
ALTER EXTENSION ... UPDATE.
0.6.0 → 0.7.0
One new catalog column is added to pgtrickle.pgt_stream_tables:
| Column | Type | Default | Purpose |
|---|---|---|---|
| last_fixpoint_iterations | INT | NULL | Records how many rounds the last circular-dependency fixpoint run required |
Two new catalog tables are added:
| Table | Purpose |
|---|---|
| pgtrickle.pgt_watermarks | Stores per-source watermark progress reported by external loaders |
| pgtrickle.pgt_watermark_groups | Stores groups of sources that must stay temporally aligned before refresh |
The upgrade script also updates and adds SQL functions:
- Recreates pgtrickle.pgt_status() so the result includes scc_id
- Adds pgtrickle.pgt_scc_status() for circular-dependency monitoring
- Adds pgtrickle.advance_watermark(source, watermark)
- Adds pgtrickle.create_watermark_group(name, sources[], tolerance_secs)
- Adds pgtrickle.drop_watermark_group(name)
- Adds pgtrickle.watermarks()
- Adds pgtrickle.watermark_groups()
- Adds pgtrickle.watermark_status()
Behavioral notes:
- Circular stream table dependencies can now run to convergence when pg_trickle.allow_circular = true and every member of the cycle is safe for monotone DIFFERENTIAL refresh.
- The scheduler can now hold back refreshes until related source tables are aligned within a configured watermark tolerance.
- Existing non-circular stream tables continue to work as before. The new catalog objects are additive.
0.7.0 → 0.8.0
No catalog schema changes. The upgrade migration script contains no DDL.
New operational features:
- pg_dump / pg_restore support: stream tables are now safely exported and re-connected after restore without manual intervention.
- Connection pooler opt-in was introduced at the per-stream level (superseded by the more comprehensive pooler_compatibility_mode added in v0.10.0).
No breaking changes. All v0.7.0 functions, views, and event triggers continue to work as before.
0.8.0 → 0.9.0
No catalog schema DDL changes to pgtrickle.pgt_stream_tables or the
dependency catalog.
New API function added:
- pgtrickle.restore_stream_tables() — re-installs CDC triggers and re-registers stream tables after a pg_restore from a pg_dump.
Hidden auxiliary columns for AVG / STDDEV / VAR aggregates. Stream tables
using these aggregates will automatically receive hidden __pgt_aux_*
columns on the next refresh after upgrading. No manual action is needed —
pg_trickle detects missing auxiliary columns and performs a single full
reinitialise to add them.
Behavioral notes:
- COUNT, SUM, and AVG now update in constant time (O(changed rows)) instead of rescanning the whole group.
- STDDEV and VAR variants likewise update in O(changed rows) via hidden sum-of-squares auxiliary columns.
- MIN/MAX still requires a group rescan only when the deleted value is the current extreme.
- Refresh groups (create_refresh_group, drop_refresh_group, refresh_groups()) are available starting from this version.
0.9.0 → 0.10.0
Two new catalog columns added to pgtrickle.pgt_stream_tables:
| Column | Type | Default | Purpose |
|---|---|---|---|
| pooler_compatibility_mode | BOOLEAN NOT NULL | FALSE | Disables prepared statements and NOTIFY for this stream table — required when accessed through PgBouncer in transaction-pool mode |
| refresh_tier | TEXT NOT NULL | 'hot' | Tiered scheduling tier: hot, warm, cold, or frozen |
One new catalog table is added:
| Table | Purpose |
|---|---|
| `pgtrickle.pgt_refresh_groups` | Stores refresh groups for snapshot-consistent multi-table refresh |
The upgrade script also updates and adds SQL functions:
- `pgtrickle.create_stream_table(...)` gains the `pooler_compatibility_mode` parameter
- `pgtrickle.create_stream_table_if_not_exists(...)` likewise
- `pgtrickle.create_or_replace_stream_table(...)` likewise
- `pgtrickle.alter_stream_table(...)` likewise
- Adds `pgtrickle.create_refresh_group(name, members, isolation)`
- Adds `pgtrickle.drop_refresh_group(name)`
- Adds `pgtrickle.refresh_groups()` — lists all declared groups
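Declaring a refresh group might look like the following sketch; the group name, member tables, and the `'snapshot'` isolation value are illustrative assumptions, not documented values:

```sql
-- Refresh two rollup tables together so readers never observe
-- one updated without the other:
SELECT pgtrickle.create_refresh_group(
  'daily_rollups',
  ARRAY['orders_daily', 'revenue_daily'],
  'snapshot'  -- illustrative isolation value
);

-- Inspect declared groups:
SELECT * FROM pgtrickle.refresh_groups();
```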
Behavioral notes:
- `pooler_compatibility_mode` defaults to `false`. Existing stream tables are unaffected. Enable it only for stream tables accessed through PgBouncer transaction-mode pooling.
- `pg_trickle.auto_backoff` now defaults to `on` (was `off`). The backoff threshold is raised from 80% to 95% and the maximum slowdown is capped at 8× (was 64×). If you relied on the old opt-in behaviour, set `pg_trickle.auto_backoff = off` explicitly.
- `diamond_consistency` now defaults to `'atomic'` for new stream tables (was `'none'`). Existing stream tables keep their current setting.
- The scheduler now uses row-level locking for concurrency control instead of session-level advisory locks, making pg_trickle compatible with PgBouncer transaction-pool and similar connection poolers.
- Statistical aggregates (`CORR`, `COVAR_*`, `REGR_*`) now update incrementally using Welford-style accumulation, no longer requiring a group rescan.
- Materialized view sources can now be used in DIFFERENTIAL mode when `pg_trickle.matview_polling = on` is set.
- Recursive CTE stream tables with DELETE/UPDATE now use the Delete-and-Rederive algorithm (O(delta) instead of O(n)).
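Enabling the new pooler compatibility flag on an existing stream table might look like this sketch (the named-parameter call form is assumed from the `create_stream_table` examples; the table name is illustrative):

```sql
-- Required for stream tables accessed through PgBouncer in
-- transaction-pool mode; disables prepared statements and NOTIFY:
SELECT pgtrickle.alter_stream_table(
  'active_orders',
  pooler_compatibility_mode => true
);
```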
0.10.0 → 0.11.0
New catalog columns added to pgtrickle.pgt_stream_tables:
| Column | Type | Default | Purpose |
|---|---|---|---|
| `effective_refresh_mode` | TEXT | NULL | Actual refresh mode used in the last cycle (FULL / DIFFERENTIAL / APPEND_ONLY / TOP_K / NO_DATA); populated by the scheduler after each completed refresh |
| `fuse_mode` | TEXT NOT NULL | 'off' | Circuit-breaker mode: off, on, or auto |
| `fuse_state` | TEXT NOT NULL | 'armed' | Circuit-breaker state: armed, blown, or disabled |
| `fuse_ceiling` | BIGINT | NULL | Maximum change-row count that can pass through in one refresh before the fuse blows; NULL = unlimited |
| `fuse_sensitivity` | INT | NULL | Sensitivity multiplier for auto-fuse detection |
| `blown_at` | TIMESTAMPTZ | NULL | Timestamp when the fuse last triggered |
| `blow_reason` | TEXT | NULL | Human-readable reason the fuse blew |
| `st_partition_key` | TEXT | NULL | Partition key column for declaratively partitioned stream tables; NULL = not partitioned |
Updated function signatures — existing calls continue to work because new parameters all have defaults:
- `pgtrickle.create_stream_table(...)` gains `partition_by TEXT DEFAULT NULL`
- `pgtrickle.create_stream_table_if_not_exists(...)` likewise
- `pgtrickle.create_or_replace_stream_table(...)` likewise
- `pgtrickle.alter_stream_table(...)` gains `fuse TEXT DEFAULT NULL`, `fuse_ceiling BIGINT DEFAULT NULL`, `fuse_sensitivity INT DEFAULT NULL`
New functions:
- `pgtrickle.reset_fuse(name TEXT, action TEXT DEFAULT 'apply')` — clear a blown fuse and resume scheduling
- `pgtrickle.fuse_status()` — returns circuit-breaker state for every stream table
- `pgtrickle.explain_stream_table(name TEXT)` — shows configured mode, effective mode, and the reason for any downgrade
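Day-to-day, the fuse functions might be used like this (the stream table name is illustrative):

```sql
-- Which fuses are armed or blown, and why?
SELECT * FROM pgtrickle.fuse_status();

-- After addressing the root cause, clear the fuse and resume scheduling:
SELECT pgtrickle.reset_fuse('active_orders');

-- Why did the scheduler downgrade the refresh mode?
SELECT * FROM pgtrickle.explain_refresh_mode('active_orders');
```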
Behavioral notes:
- Stream-table-to-stream-table chains now refresh incrementally — downstream tables receive a small insert/delete delta rather than cascading full refreshes.
- `pg_trickle.tiered_scheduling` now defaults to `on`.
- Declaratively partitioned stream tables are supported via `partition_by` — the refresh MERGE is automatically restricted to only the changed partitions.
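Creating a partitioned stream table might look like the following sketch; the defining query and partition column are illustrative:

```sql
-- Only partitions touched by the delta are written during the
-- refresh MERGE:
SELECT pgtrickle.create_stream_table(
  name         => 'orders_by_day',
  query        => 'SELECT order_day, count(*) AS n FROM orders GROUP BY order_day',
  schedule     => '1m',
  partition_by => 'order_day'
);
```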
0.11.0 → 0.12.0
No schema changes. This release adds four new diagnostic SQL functions only:
| Function | Returns | Purpose |
|---|---|---|
| `pgtrickle.explain_query_rewrite(query TEXT)` | TABLE(pass_name TEXT, changed BOOL, sql_after TEXT) | Walk a query through every DVM rewrite pass to see how pg_trickle transforms it |
| `pgtrickle.diagnose_errors(name TEXT)` | TABLE(event_time TIMESTAMPTZ, error_type TEXT, error_message TEXT, remediation TEXT) | Last 5 FAILED refresh events with error classification and suggested fixes |
| `pgtrickle.list_auxiliary_columns(name TEXT)` | TABLE(column_name TEXT, data_type TEXT, purpose TEXT) | List all hidden __pgt_* auxiliary columns on a stream table's storage relation |
| `pgtrickle.validate_query(query TEXT)` | TABLE(valid BOOL, mode TEXT, reason TEXT) | Parse and validate a query for stream-table compatibility without creating one |
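Using the diagnostics above, a pre-flight check might look like this (the candidate query is illustrative):

```sql
-- Check stream-table compatibility before creating anything:
SELECT * FROM pgtrickle.validate_query(
  'SELECT customer_id, sum(total) FROM orders GROUP BY customer_id'
);

-- And see how each DVM rewrite pass transforms it:
SELECT * FROM pgtrickle.explain_query_rewrite(
  'SELECT customer_id, sum(total) FROM orders GROUP BY customer_id'
);
```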
Behavioral notes:
- The incremental engine now handles multi-table join deletes correctly — phantom rows after simultaneous deletes from multiple join sides no longer occur.
- Stream-table-to-stream-table row identity is now computed consistently between the change buffer and the downstream table, eliminating stale duplicate rows after upstream UPDATEs.
- `pg_trickle.tiered_scheduling` defaults to `on` (same as 0.11.0 runtime behaviour; this release makes it the explicit default).
0.12.0 → 0.13.0
Ten new catalog columns added to pgtrickle.pgt_stream_tables:
| Column | Type | Default | Purpose |
|---|---|---|---|
| `effective_refresh_mode` | TEXT | NULL | Computed refresh mode after AUTO resolution |
| `fuse_mode` | TEXT NOT NULL | 'off' | Fuse configuration: off, auto, or manual |
| `fuse_state` | TEXT NOT NULL | 'armed' | Current fuse state: armed or blown |
| `fuse_ceiling` | BIGINT | NULL | Maximum change count before fuse blows |
| `fuse_sensitivity` | INT | NULL | Consecutive cycles above ceiling before triggering |
| `blown_at` | TIMESTAMPTZ | NULL | Timestamp when the fuse last blew |
| `blow_reason` | TEXT | NULL | Reason the fuse blew |
| `st_partition_key` | TEXT | NULL | Partition key specification (RANGE, LIST, or HASH) |
| `max_differential_joins` | INT | NULL | Maximum join count for differential mode (auto-fallback to FULL when exceeded) |
| `max_delta_fraction` | DOUBLE PRECISION | NULL | Maximum delta-to-table ratio for differential mode (auto-fallback to FULL when exceeded) |
All columns use `ADD COLUMN IF NOT EXISTS` for idempotent upgrades.
Ten new SQL functions (plus one replacement with a new signature):
| Function | Purpose |
|---|---|
| `pgtrickle.explain_delta(name, format)` | Delta SQL query plan inspection |
| `pgtrickle.dedup_stats()` | MERGE deduplication frequency counters |
| `pgtrickle.shared_buffer_stats()` | Per-source-buffer observability |
| `pgtrickle.explain_refresh_mode(name)` | Refresh mode decision explanation |
| `pgtrickle.reset_fuse(name)` | Reset a blown fuse |
| `pgtrickle.fuse_status()` | Fuse state across all stream tables |
| `pgtrickle.explain_query_rewrite(query)` | DVM rewrite pass inspection |
| `pgtrickle.diagnose_errors(name)` | Error classification and remediation |
| `pgtrickle.list_auxiliary_columns(name)` | Hidden __pgt_* column listing |
| `pgtrickle.validate_query(query)` | Query compatibility validation |
| `pgtrickle.alter_stream_table(...)` | (replaced) — new partition_by parameter |
New GUC variables:
| GUC | Default | Purpose |
|---|---|---|
| `pg_trickle.per_database_worker_quota` | 0 (auto) | Per-database parallel worker limit |
Behavioral notes:
- Shared change buffers: Multiple stream tables reading from the same source now automatically share a single change buffer. No migration action required — existing per-source buffers continue to work.
- Columnar change tracking: Wide-table UPDATEs that touch only value columns (not GROUP BY / JOIN / WHERE columns) now generate significantly less delta volume. This is fully automatic.
- Auto buffer partitioning: set `pg_trickle.buffer_partitioning = 'auto'` to let high-throughput buffers self-promote to partitioned mode for O(1) cleanup.
- dbt macros: if you use dbt-pgtrickle, update your macros to the matching v0.13.0 version. New config options: `partition_by`, `fuse`, `fuse_ceiling`, `fuse_sensitivity`.
No breaking changes. All v0.12.0 functions, views, and event triggers continue to work as before.
0.13.0 → 0.14.0
Two new catalog columns added to pgtrickle.pgt_stream_tables:
| Column | Type | Default | Purpose |
|---|---|---|---|
| `last_error_message` | TEXT | NULL | Error message from the last permanent refresh failure |
| `last_error_at` | TIMESTAMPTZ | NULL | Timestamp of the last permanent refresh failure |
Updated function signature (return type gained new columns):
- `pgtrickle.st_refresh_stats()` — gains `consecutive_errors`, `schedule`, `refresh_tier`, and `last_error_message` columns. The upgrade script drops and recreates the function. No behavior change for existing callers that ignore unknown columns.
New SQL functions (available immediately after ALTER EXTENSION ... UPDATE):
| Function | Purpose |
|---|---|
| `pgtrickle.recommend_refresh_mode(name)` | Workload-based refresh mode recommendation with confidence level |
| `pgtrickle.refresh_efficiency(name)` | Per-table FULL vs. DIFFERENTIAL performance metrics |
| `pgtrickle.export_definition(name)` | Export stream table as reproducible DROP+CREATE+ALTER DDL |
| `pgtrickle.convert_buffers_to_unlogged()` | Convert logged change buffers to UNLOGGED |
New GUC variables:
| GUC | Default | Purpose |
|---|---|---|
| `pg_trickle.planner_aggressive` | true | Consolidated switch replacing merge_planner_hints + merge_work_mem_mb |
| `pg_trickle.unlogged_buffers` | false | Create new change buffers as UNLOGGED (reduces WAL by ~30%) |
| `pg_trickle.agg_diff_cardinality_threshold` | 1000 | Warn at creation time when GROUP BY cardinality is below this |
Deprecated GUCs (still accepted but ignored at runtime):
- `pg_trickle.merge_planner_hints` → use `pg_trickle.planner_aggressive`
- `pg_trickle.merge_work_mem_mb` → use `pg_trickle.planner_aggressive`
Behavioral notes:
- Error-state circuit breaker: a single permanent refresh failure (e.g. a function that doesn't exist for the column type) now immediately sets the stream table status to `ERROR`, with a message stored in `last_error_message`. The scheduler skips `ERROR` tables. Use `pgtrickle.resume_stream_table(name)` followed by `pgtrickle.alter_stream_table(name, query => ...)` to recover.
- Tiered scheduling NOTICE: demoting a stream table from `hot` to `cold` or `frozen` now emits a NOTICE so operators are aware the effective refresh interval has changed (10× for cold, suspended for frozen).
- SECURITY DEFINER triggers: all CDC trigger functions now run with `SECURITY DEFINER` and an explicit `SET search_path`, hardening against privilege-escalation attacks. This is applied automatically on upgrade — no manual action needed.
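The recovery sequence for a stream table stuck in `ERROR` might look like this sketch (the table name and replacement query are illustrative):

```sql
-- Resume scheduling, then fix the defining query:
SELECT pgtrickle.resume_stream_table('broken_st');
SELECT pgtrickle.alter_stream_table(
  'broken_st',
  query => 'SELECT id, status FROM orders'
);
```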
No breaking changes. All v0.13.0 functions, views, and event triggers continue to work as before.
0.14.0 → 0.15.0
No schema changes. New features: interactive dashboard,
bulk create_stream_tables_from_schema(), and per-table runaway-refresh
protection (max_refresh_duration_ms).
New GUC variables:
| GUC | Default | Purpose |
|---|---|---|
| `pg_trickle.ivm_cache_max_entries` | 0 (unbounded) | Bound per-backend IVM delta cache |
0.15.0 → 0.16.0
No schema changes. Performance improvements to the delta pipeline and
refresh path. L2 catalog-backed template cache (pgtrickle.pgt_template_cache)
introduced.
0.16.0 → 0.17.0
No schema changes. Query intelligence improvements: window function differentiation, correlated-sublink rewriting.
0.17.0 → 0.18.0
No schema changes. Hardening pass: tightened unsafe blocks, improved error propagation, delta performance improvements (prepared-statement MERGE path).
0.18.0 → 0.19.0
No schema changes. Security enhancements: SECURITY DEFINER on all public-facing functions, improved RLS awareness in delta generation.
0.19.0 → 0.20.0
New catalog table: pgtrickle.pgt_self_monitoring for extension health
metrics. New function: pgtrickle.metrics_summary().
0.20.0 → 0.21.0
No schema changes. Reliability improvements: advisory-lock hardening, WAL-receiver retry, graceful SIGTERM in background workers.
0.21.0 → 0.22.0
No schema changes. New features: downstream CDC pipeline, parallel refresh scheduling, predictive cost model for FULL vs DIFFERENTIAL selection.
0.22.0 → 0.23.0
No schema changes. Performance tuning and diagnostics: delta amplification
detection, EXPLAIN capture (PGS_PROFILE_DELTA), adaptive threshold
auto-tuning.
New GUC variables:
| GUC | Default | Purpose |
|---|---|---|
| `pg_trickle.delta_amplification_threshold` | 10.0 | Warn when output/input delta ratio exceeds this |
| `pg_trickle.log_delta_sql` | false | Log resolved delta SQL at DEBUG1 |
0.23.0 → 0.24.0
No schema changes. Join correctness hardening: phantom-row detection infrastructure, durability improvements for committed change buffers.
0.24.0 → 0.25.0
No schema changes. Scheduler scalability: worker pool, L1 template cache with LRU eviction.
New GUC variables:
| GUC | Default | Purpose |
|---|---|---|
| `pg_trickle.worker_pool_size` | 0 | Persistent worker pool size |
| `pg_trickle.template_cache_max_entries` | 0 | L1 delta SQL template cache cap |
0.25.0 → 0.26.0
No schema changes. Concurrency hardening: improved lock ordering, stress test suite, fixed MERGE race under high concurrency.
0.26.0 → 0.27.0
New catalog columns added to pgtrickle.pgt_stream_tables:
| Column | Type | Default | Purpose |
|---|---|---|---|
| `last_full_ms` | FLOAT8 | NULL | Duration of last FULL refresh (ms) |
| `auto_threshold` | FLOAT8 | NULL | Adaptive FULL/DIFF cost-ratio threshold |
New catalog table: pgtrickle.pgt_template_cache for L2 cross-backend
delta SQL storage.
New SQL functions:
| Function | Purpose |
|---|---|
| `pgtrickle.snapshot_stream_table(name, dest)` | Consistent snapshot copy |
| `pgtrickle.restore_from_snapshot(name, source)` | Restore from snapshot |
| `pgtrickle.list_snapshots(name)` | List available snapshots |
| `pgtrickle.recommend_schedule(name)` | SLA-based scheduling recommendation |
| `pgtrickle.schedule_recommendations()` | Multi-table scheduling report |
| `pgtrickle.cluster_worker_summary()` | Cross-database scheduler health |
| `pgtrickle.metrics_summary()` | Prometheus-compatible extension metrics |
New GUC variables:
| GUC | Default | Purpose |
|---|---|---|
| `pg_trickle.metrics_port` | 9187 | Prometheus metrics port |
| `pg_trickle.metrics_request_timeout_ms` | 5000 | Metrics endpoint timeout |
| `pg_trickle.frontier_holdback_mode` | warn | Holdback action on stale frontier |
| `pg_trickle.frontier_holdback_warn_seconds` | 300 | Frontier holdback warning threshold |
| `pg_trickle.publication_lag_warn_bytes` | 104857600 | WAL lag warning threshold |
| `pg_trickle.schedule_recommendation_min_samples` | 20 | Min samples for schedule recommendation |
| `pg_trickle.schedule_alert_cooldown_seconds` | 300 | Min interval between schedule alerts |
| `pg_trickle.change_buffer_durability` | unlogged | Change buffer WAL level |
No breaking changes.
0.27.0 → 0.28.0
New catalog tables: pgtrickle.outbox_events, pgtrickle.inbox_messages,
pgtrickle.inbox_dead_letters for transactional outbox and inbox patterns.
New SQL functions: pgtrickle.enable_outbox(name, ...),
pgtrickle.enable_inbox(name, ...), and related management functions.
No breaking changes.
0.28.0 → 0.29.0
Added relay catalog tables and SQL functions (set_relay_outbox, set_relay_inbox,
enable_relay, disable_relay, delete_relay, get_relay_config,
list_relay_configs) and the standalone pgtrickle-relay binary. These were
later extracted to pg_tide in v0.46.0.
No breaking changes.
0.29.0 → 0.30.0
No schema changes. All improvements are confined to the Rust extension
binary. The migration file (sql/pg_trickle--0.29.0--0.30.0.sql) is empty
other than documentation comments.
New GUC variables:
| GUC | Default | Purpose |
|---|---|---|
pg_trickle.use_sqlstate_classification | false | Locale-safe SQLSTATE-based retry classification |
pg_trickle.template_cache_max_age_hours | 168 | Max age for L2 template-cache entries (hours) |
pg_trickle.max_parse_nodes | 0 | Parser node-count guard (0 = disabled) |
Behavioral changes:
- `restore_from_snapshot()` now returns a typed error (`SnapshotSchemaVersionMismatch`) when the snapshot has no `__pgt_snapshot_version` column (pre-v0.27 snapshots). Previously it silently treated the missing column as compatible.
- `snapshot_stream_table()` and `restore_from_snapshot()` now wrap critical operations in PostgreSQL subtransactions. A failed catalog INSERT rolls back the snapshot table creation, preventing orphan tables.
- Cross-cycle phantom rows are now cleaned up unconditionally after every differential refresh of a join query.
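Using the snapshot functions introduced in v0.27.0, a backup-and-restore cycle might look like the following sketch (table and destination names are illustrative):

```sql
-- Take a consistent snapshot before risky maintenance:
SELECT pgtrickle.snapshot_stream_table('active_orders', 'active_orders_backup');

-- See what snapshots exist for this stream table:
SELECT * FROM pgtrickle.list_snapshots('active_orders');

-- Roll back to the snapshot if the maintenance went wrong:
SELECT pgtrickle.restore_from_snapshot('active_orders', 'active_orders_backup');
```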
No breaking changes.
0.30.0 → 0.31.0
No schema changes. All improvements are confined to the Rust extension binary and scheduler logic.
New GUC variables:
| GUC | Default | Purpose |
|---|---|---|
| `pg_trickle.cost_model_miss_penalty` | 2.0 | Weight applied to the estimated cost when the planner's row count estimate is inaccurate |
| `pg_trickle.scheduler_hot_tier_interval_ms` | 500 | Effective polling interval (ms) for Hot-tier stream tables |
Behavioral changes:
- Scheduler now uses a predictive cost model to decide DIFFERENTIAL vs. FULL refresh per cycle; the model activates after `pg_trickle.prediction_min_samples` samples.
- Event-driven wake now debounces duplicate NOTIFY payloads within a single tick to avoid redundant wakeups on bulk writes.
No breaking changes.
0.31.0 → 0.32.0
No schema changes. Citus stable naming infrastructure is added without altering the public catalog schema.
Behavioral changes:
- `pgtrickle.source_stable_name(rel_oid)` introduced as a deterministic, version-stable WAL slot name for Citus distributed sources.
- Per-source `last_frontier` column added to `pgtrickle.pgt_stream_tables` via `ADD COLUMN IF NOT EXISTS` — existing rows receive `NULL`.
No breaking changes.
0.32.0 → 0.33.0
Schema additions — new catalog tables for Citus distributed CDC:
| Object | Type | Purpose |
|---|---|---|
| `pgtrickle.pgt_worker_slots` | Table | Tracks per-worker WAL slot name and last-consumed frontier for each Citus worker / source combination |
| `pgtrickle.pgt_st_locks` | Table | Lightweight distributed mutex for cross-coordinator refresh serialisation |
| `pgtrickle.citus_status` | View | Per-(stream table, source, worker) CDC health view |
New SQL functions:
- `pgtrickle.ensure_worker_slot(st_name, worker_host, worker_port)` — creates the WAL slot on a Citus worker if it does not exist.
- `pgtrickle.poll_worker_slot_changes(st_name, worker_host, worker_port)` — drains pending WAL changes from a worker slot into the coordinator change buffer.
- `pgtrickle.handle_vp_promoted(payload TEXT)` — processes a `pg_ripple.vp_promoted` NOTIFY payload and signals the scheduler.
- `pgtrickle.check_citus_version_compat()` — verifies that all worker nodes run the same pg_trickle version.
- `pgtrickle.check_worker_wal_level()` — verifies that `wal_level = logical` on every worker.
New create_stream_table() parameter:
- `output_distribution_column TEXT` — when provided (and Citus is installed), converts the output storage table to a Citus distributed table on that column immediately after creation.
New GUC:
| GUC | Default | Purpose |
|---|---|---|
| `pg_trickle.citus_st_lock_lease_ms` | 60000 | Duration (ms) of the pgt_st_locks lease for cross-node coordination |
No application-level breaking changes. Existing stream tables on non-Citus deployments are completely unaffected.
0.33.0 → 0.34.0
Schema additions — the pgtrickle.citus_status view gains five new columns:
| Column | Type | Description |
|---|---|---|
| `last_polled_at` | timestamptz | Timestamp of the last successful per-worker poll |
| `lease_holder` | text | Session holding the pgt_st_locks lease (NULL when unlocked) |
| `lease_acquired_at` | timestamptz | When the current lease was acquired |
| `lease_expires_at` | timestamptz | When the current lease expires |
| `lease_health` | text | 'unlocked' / 'locked' / 'expired' |
Behavioral changes:
- The scheduler now drives the full per-worker slot lifecycle automatically for stream tables with `source_placement = 'distributed'`: `ensure_worker_slot()` on first tick (and after topology changes), `poll_worker_slot_changes()` on every tick, and `pgt_st_locks` lease acquire/extend/release. Manual wiring via `LISTEN "pg_ripple.vp_promoted"` + `handle_vp_promoted()` is no longer required (though harmless if left in place).
- Shard rebalance auto-recovery: the scheduler detects `pg_dist_node` topology changes, prunes stale `pgt_worker_slots` rows, inserts new ones, and marks the stream table for a full refresh — no operator intervention required.
- Worker failure isolation: per-worker `poll_worker_slot_changes()` failures are caught, logged, and skipped for that tick; healthy workers continue uninterrupted.
New GUC:
| GUC | Default | Purpose |
|---|---|---|
| `pg_trickle.citus_worker_retry_ticks` | 5 | Consecutive per-worker poll failures before emitting a WARNING and flagging in citus_status. Set to 0 to disable. |
Migration note:
ALTER EXTENSION pg_trickle UPDATE TO '0.34.0';
The migration script adds the five new columns to citus_status via CREATE OR REPLACE VIEW. No data loss.
No breaking changes. Non-Citus deployments are completely unaffected.
0.34.0 → 0.35.0
New catalog columns added to pgtrickle.pgt_stream_tables:
| Column | Type | Default | Purpose |
|---|---|---|---|
| `in_shadow_build` | BOOLEAN NOT NULL | FALSE | Whether this stream table is currently undergoing zero-downtime schema evolution |
| `shadow_table_name` | TEXT | NULL | Name of the shadow table being built during schema evolution |
New catalog table:
| Table | Purpose |
|---|---|
| `pgtrickle.pgt_subscriptions` | Stores reactive subscription registrations (NOTIFY channel → stream table mappings) |
New SQL functions:
| Function | Purpose |
|---|---|
| `pgtrickle.subscribe(stream_table TEXT, channel TEXT)` | Register a NOTIFY channel to fire after each refresh cycle |
| `pgtrickle.unsubscribe(stream_table TEXT, channel TEXT)` | Remove a subscription |
| `pgtrickle.list_subscriptions()` | List all active subscriptions |
| `pgtrickle.sla_summary()` | Return p50/p99 latency, freshness lag, error rate, and budget over the SLA window |
| `pgtrickle.explain_stream_table(name TEXT)` | Human-readable DVM configuration and refresh mode explanation |
| `pgtrickle.view_evolution_status()` | Status of in-progress shadow table builds |
Behavioral notes:
- Zero-downtime schema evolution (`ALTER STREAM TABLE`) now builds a shadow table in the background and cuts over atomically. The `in_shadow_build` column tracks progress; check `pgtrickle.view_evolution_status()` to monitor.
- `pgtrickle.sla_summary()` queries `pgt_refresh_history` using the `pg_trickle.sla_window_hours` GUC (default 24 h).
- Reactive subscriptions emit `pg_notify(channel, '')` after each non-empty refresh cycle. Debounce interval is controlled by `pg_trickle.notify_coalesce_ms`.
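Wiring a reactive subscription might look like this (the channel name is illustrative):

```sql
-- Fire a NOTIFY after each non-empty refresh of 'active_orders':
SELECT pgtrickle.subscribe('active_orders', 'active_orders_changed');

-- In a consuming session:
LISTEN active_orders_changed;
-- ...re-query active_orders whenever a notification arrives.
```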
New GUCs:
| GUC | Default | Description |
|---|---|---|
| `pg_trickle.cdc_paused` | false | Pause CDC trigger writes (discard mode — see CONFIGURATION.md) |
| `pg_trickle.notify_coalesce_ms` | 250 | Debounce window (ms) for reactive subscription NOTIFY calls |
| `pg_trickle.sla_window_hours` | 24 | Reporting window (h) for sla_summary() |
| `pg_trickle.history_prune_interval_seconds` | 60 | Interval between pgt_refresh_history cleanup sweeps |
Migration note:
ALTER EXTENSION pg_trickle UPDATE TO '0.35.0';
No breaking changes. All v0.34.0 functions continue to work.
0.35.0 → 0.36.0
New catalog columns added to pgtrickle.pgt_stream_tables:
| Column | Type | Default | Purpose |
|---|---|---|---|
| `temporal_mode` | BOOLEAN NOT NULL | FALSE | Enable temporal IVM (SCD Type 2) tracking |
| `storage_backend` | TEXT NOT NULL | 'heap' | Storage backend: 'heap', 'citus', or 'pg_mooncake' |
| `column_lineage` | JSONB | NULL | Column-level lineage mapping output columns to source tables/columns |
New SQL functions:
| Function | Purpose |
|---|---|
| `pgtrickle.drain(timeout_s INT)` | Gracefully quiesce all in-flight refreshes |
| `pgtrickle.is_drained()` | Check whether the scheduler is fully drained |
| `pgtrickle.bulk_alter_stream_tables(names TEXT[], params JSONB)` | Alter multiple stream tables in one call |
| `pgtrickle.bulk_drop_stream_tables(names TEXT[])` | Drop multiple stream tables in one call |
| `pgtrickle.stream_table_lineage(name TEXT)` | Return column-level lineage for a stream table |
| `pgtrickle.exec_stream_ddl(cmd TEXT)` | Execute DDL in the stream-table DDL sandbox |
Updated function signatures:
- `pgtrickle.create_stream_table(...)` gains `temporal BOOLEAN DEFAULT FALSE` and `storage_backend TEXT DEFAULT 'heap'` parameters.
Behavioral notes:
- Temporal IVM (CORR-1): stream tables created with `temporal := true` maintain SCD Type 2 history. Each row carries `__pgt_valid_from TIMESTAMPTZ` and `__pgt_valid_to TIMESTAMPTZ`. Existing tables are unaffected.
- Alternative storage backends: `storage_backend = 'citus'` creates the stream table storage as a Citus distributed table; `'pg_mooncake'` uses columnar storage. Both require the respective extensions to be installed.
- Drain mode (A35): `pgtrickle.drain()` is a safety mechanism for maintenance windows. The scheduler completes all in-flight refreshes, then stops dispatching new ones until the drain is cancelled or the server restarts.
- WAL slot backpressure (A12): the `pg_trickle.enforce_backpressure` GUC is now wired — when slot lag exceeds `slot_lag_critical_threshold_mb`, CDC writes are suppressed. See `CONFIGURATION.md` for details and the discard semantics of `cdc_paused`.
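Creating a temporal stream table might look like the following sketch; the table name and defining query are illustrative, and the `pg_trickle.temporal_stream_tables` master switch is assumed to be enabled:

```sql
-- SCD Type 2 history: every version of a row is kept with
-- __pgt_valid_from / __pgt_valid_to validity timestamps.
SELECT pgtrickle.create_stream_table(
  name     => 'customer_status_history',
  query    => 'SELECT id, status FROM customers',
  schedule => '1m',
  temporal => true
);
```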
New GUCs:
| GUC | Default | Description |
|---|---|---|
| `pg_trickle.enforce_backpressure` | false | Suppress CDC writes when WAL slot lag exceeds critical threshold |
| `pg_trickle.log_format` | 'text' | Structured log format: 'text' or 'json' |
| `pg_trickle.temporal_stream_tables` | false | Master switch for temporal IVM support |
TRUNCATE and CDC semantics: when `pg_trickle.cdc_paused = on`, CDC trigger
bodies return `NULL` — changes are discarded. This is the discard mode.
After un-pausing, stream tables must be reinitialized (FULL refresh) to
recover from the gap. A future `cdc_capture_mode = 'hold'` option is planned.
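A maintenance-window sketch, assuming the `cdc_paused` GUC can be set at session level and using the `refresh_stream_table(name, 'FULL')` call documented in the v0.39.0 notes below (the table name is illustrative):

```sql
-- Pause CDC during a bulk load. Changes are DISCARDED while paused:
SET pg_trickle.cdc_paused = on;
-- ... bulk DML against source tables ...
SET pg_trickle.cdc_paused = off;

-- Recover each affected stream table with a full refresh:
SELECT pgtrickle.refresh_stream_table('active_orders', 'FULL');
```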
Migration note:
ALTER EXTENSION pg_trickle UPDATE TO '0.36.0';
No breaking changes. The column_lineage column is additive. The
temporal_mode and storage_backend columns have safe defaults.
0.36.0 → 0.37.0
Schema change: All existing change buffer tables in
pgtrickle_changes.* gain a __pgt_trace_context TEXT column via dynamic
ALTER TABLE. This is applied automatically by the upgrade script.
Behavioral notes:
- W3C Trace Context (F10): when `pg_trickle.enable_trace_propagation = true`, CDC triggers capture the session `pg_trickle.trace_id` GUC into the `__pgt_trace_context` column. At refresh time, the stored trace context is propagated to any OTLP span exported to `pg_trickle.otel_endpoint`.
- pgVectorMV (F4): `avg(vector_col)` and `sum(vector_col)` in defining queries are now handled incrementally when `pg_trickle.enable_vector_agg = true`. Requires pgvector ≥ 0.7.0.
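Tagging a session's writes with a trace might look like this sketch (assumes `enable_trace_propagation` is already on; the traceparent value is the W3C example value, not a real trace):

```sql
-- Tag subsequent CDC rows from this session with a trace context:
SET pg_trickle.trace_id = '00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01';
INSERT INTO orders (id, status) VALUES (43, 'active');
-- The CDC trigger stores the context in __pgt_trace_context; it is
-- attached to the OTLP span when the change is applied at refresh time.
```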
New GUCs:
| GUC | Default | Description |
|---|---|---|
| `pg_trickle.enable_vector_agg` | false | Enable incremental vector aggregate operators |
| `pg_trickle.enable_trace_propagation` | false | Enable W3C Trace Context propagation |
| `pg_trickle.otel_endpoint` | '' | OTLP/gRPC endpoint for span export (empty = off) |
| `pg_trickle.trace_id` | '' | Session-level W3C traceparent header |
Migration note:
ALTER EXTENSION pg_trickle UPDATE TO '0.37.0';
The upgrade script applies `ALTER TABLE ... ADD COLUMN IF NOT EXISTS __pgt_trace_context TEXT` to each existing change buffer table. This is
idempotent and safe for large installations. Expect a brief metadata lock
on each buffer table during the upgrade.
No breaking changes. The `__pgt_trace_context` column is NULL unless
`enable_trace_propagation = true` and a session `trace_id` is set.
0.37.0 → 0.38.0
No schema changes. This is a correctness and diagnostic release.
Behavioral notes:
- EC-01 correctness closeout: Join phantom row elimination is complete. Property tests now prove convergence across join patterns including three-way joins with simultaneous multi-side deletes.
- Fuzz regression fixes: All known fuzz corpus failures are resolved.
Migration note:
ALTER EXTENSION pg_trickle UPDATE TO '0.38.0';
No breaking changes.
0.38.0 → 0.39.0
New SQL function:
| Function | Purpose |
|---|---|
| `pgtrickle.cdc_pause_status()` | Return the active CDC pause state: paused flag, capture mode, and operator guidance |
Extended SQL function:
- `pgtrickle.explain_stream_table(name TEXT)` — now includes CDC status, backpressure state, and explicit DIFF/FULL reasoning.
New GUC:
| GUC | Default | Description |
|---|---|---|
| `pg_trickle.cdc_capture_mode` | 'discard' | CDC semantics when cdc_paused=on: 'discard' (default, drops changes) or 'hold' (reserved) |
Behavioral notes:
- O39-6 SQLSTATE-first retry: when `use_sqlstate_classification = true` (default), scheduler retry decisions use the bracketed SQLSTATE code from `PgTrickleError::SpiErrorCode` instead of English error message text, making retry behavior locale-independent.
- O39-8 CDC capture mode: the `cdc_paused` discard semantics are now explicitly documented and operator-visible via `pgtrickle.cdc_pause_status()`. The `cdc_capture_mode` GUC is reserved for a future hold mode.
TRUNCATE/CDC semantics (explicit):
When `pg_trickle.cdc_paused = on`:
- `cdc_capture_mode = 'discard'` (default): all DML against source tables passes through the CDC trigger, but the trigger returns `NULL` immediately without writing to the change buffer. Changes are permanently lost. After un-pausing, run a FULL refresh on any stream table that received DML during the pause: `SELECT pgtrickle.refresh_stream_table('my_stream', 'FULL');`
- `cdc_capture_mode = 'hold'`: not yet implemented. Emits a `WARNING` and falls back to `'discard'`.
When TRUNCATE occurs on a source table:
- In trigger-based CDC mode, the TRUNCATE trigger calls `pgtrickle.pgt_ivm_handle_truncate()`, which schedules a FULL refresh. If `cdc_paused = on`, the trigger returns `NULL` and the TRUNCATE is not recorded. After un-pausing, the stream table will not be aware that a TRUNCATE occurred — reinitialize explicitly.
- In WAL-based CDC mode, TRUNCATEs are detected during the next logical decoding poll and scheduled for FULL refresh.
Migration note:
ALTER EXTENSION pg_trickle UPDATE TO '0.39.0';
No catalog schema changes. The upgrade script is a no-op DDL-wise.
0.39.0 → 0.40.0
No SQL schema changes. This release contains internal improvements only.
ALTER EXTENSION pg_trickle UPDATE TO '0.40.0';
0.40.0 → 0.50.0
No SQL schema changes. This release contains internal improvements and new documentation only.
ALTER EXTENSION pg_trickle UPDATE TO '0.50.0';
0.50.0 → 0.51.0
Breaking changes:
- `pg_trickle.event_driven_wake` GUC has been removed (CQ-10-02). Remove any `ALTER SYSTEM SET pg_trickle.event_driven_wake ...` or `postgresql.conf` entries for this GUC before upgrading.
- `pg_trickle.wake_debounce_ms` GUC has been removed (CQ-10-02). Remove any references to this GUC from your configuration.
No SQL schema changes. Only code and documentation changes.
Migration:
```sql
-- Remove obsolete GUC settings before upgrading (run as superuser):
ALTER SYSTEM RESET pg_trickle.event_driven_wake;
ALTER SYSTEM RESET pg_trickle.wake_debounce_ms;
SELECT pg_reload_conf();

-- Then upgrade:
ALTER EXTENSION pg_trickle UPDATE TO '0.51.0';
```
Supported Upgrade Paths
The following migration hops are available. PostgreSQL chains them
automatically when you run ALTER EXTENSION pg_trickle UPDATE.
| From | To | Script |
|---|---|---|
| 0.1.3 | 0.2.0 | pg_trickle--0.1.3--0.2.0.sql |
| 0.2.0 | 0.2.1 | pg_trickle--0.2.0--0.2.1.sql |
| 0.2.1 | 0.2.2 | pg_trickle--0.2.1--0.2.2.sql |
| 0.2.2 | 0.2.3 | pg_trickle--0.2.2--0.2.3.sql |
| 0.2.3 | 0.3.0 | pg_trickle--0.2.3--0.3.0.sql |
| 0.3.0 | 0.4.0 | pg_trickle--0.3.0--0.4.0.sql |
| 0.4.0 | 0.5.0 | pg_trickle--0.4.0--0.5.0.sql |
| 0.5.0 | 0.6.0 | pg_trickle--0.5.0--0.6.0.sql |
| 0.6.0 | 0.7.0 | pg_trickle--0.6.0--0.7.0.sql |
| 0.7.0 | 0.8.0 | pg_trickle--0.7.0--0.8.0.sql |
| 0.8.0 | 0.9.0 | pg_trickle--0.8.0--0.9.0.sql |
| 0.9.0 | 0.10.0 | pg_trickle--0.9.0--0.10.0.sql |
| 0.10.0 | 0.11.0 | pg_trickle--0.10.0--0.11.0.sql |
| 0.11.0 | 0.12.0 | pg_trickle--0.11.0--0.12.0.sql |
| 0.12.0 | 0.13.0 | pg_trickle--0.12.0--0.13.0.sql |
| 0.13.0 | 0.14.0 | pg_trickle--0.13.0--0.14.0.sql |
| 0.14.0 | 0.15.0 | pg_trickle--0.14.0--0.15.0.sql |
| 0.15.0 | 0.16.0 | pg_trickle--0.15.0--0.16.0.sql |
| 0.16.0 | 0.17.0 | pg_trickle--0.16.0--0.17.0.sql |
| 0.17.0 | 0.18.0 | pg_trickle--0.17.0--0.18.0.sql |
| 0.18.0 | 0.19.0 | pg_trickle--0.18.0--0.19.0.sql |
| 0.19.0 | 0.20.0 | pg_trickle--0.19.0--0.20.0.sql |
| 0.20.0 | 0.21.0 | pg_trickle--0.20.0--0.21.0.sql |
| 0.21.0 | 0.22.0 | pg_trickle--0.21.0--0.22.0.sql |
| 0.22.0 | 0.23.0 | pg_trickle--0.22.0--0.23.0.sql |
| 0.23.0 | 0.24.0 | pg_trickle--0.23.0--0.24.0.sql |
| 0.24.0 | 0.25.0 | pg_trickle--0.24.0--0.25.0.sql |
| 0.25.0 | 0.26.0 | pg_trickle--0.25.0--0.26.0.sql |
| 0.26.0 | 0.27.0 | pg_trickle--0.26.0--0.27.0.sql |
| 0.27.0 | 0.28.0 | pg_trickle--0.27.0--0.28.0.sql |
| 0.28.0 | 0.29.0 | pg_trickle--0.28.0--0.29.0.sql |
| 0.29.0 | 0.30.0 | pg_trickle--0.29.0--0.30.0.sql |
| 0.30.0 | 0.31.0 | pg_trickle--0.30.0--0.31.0.sql |
| 0.31.0 | 0.32.0 | pg_trickle--0.31.0--0.32.0.sql |
| 0.32.0 | 0.33.0 | pg_trickle--0.32.0--0.33.0.sql |
| 0.33.0 | 0.34.0 | pg_trickle--0.33.0--0.34.0.sql |
| 0.34.0 | 0.35.0 | pg_trickle--0.34.0--0.35.0.sql |
| 0.35.0 | 0.36.0 | pg_trickle--0.35.0--0.36.0.sql |
| 0.36.0 | 0.37.0 | pg_trickle--0.36.0--0.37.0.sql |
| 0.37.0 | 0.38.0 | pg_trickle--0.37.0--0.38.0.sql |
| 0.38.0 | 0.39.0 | pg_trickle--0.38.0--0.39.0.sql |
| 0.39.0 | 0.40.0 | pg_trickle--0.39.0--0.40.0.sql |
| 0.40.0 | 0.50.0 | pg_trickle--0.40.0--0.50.0.sql |
| 0.50.0 | 0.51.0 | pg_trickle--0.50.0--0.51.0.sql |
Any installation from 0.1.3 onward can be upgraded to 0.51.0 in a single
ALTER EXTENSION pg_trickle UPDATE — PostgreSQL chains the hops automatically
after the new binaries are installed and the server has been restarted.
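To see exactly which hops PostgreSQL will chain for your installation, you can query the catalog directly. This uses the built-in `pg_extension_update_paths` function; the version numbers shown are illustrative:

```sql
-- List every upgrade path from the currently installed version.
-- 'path' is the chain of scripts PostgreSQL will apply, '--'-separated.
SELECT source, target, path
FROM pg_extension_update_paths('pg_trickle')
WHERE source = (SELECT extversion FROM pg_extension WHERE extname = 'pg_trickle')
ORDER BY target;
```

A `NULL` path means no script chain connects the two versions and `ALTER EXTENSION ... UPDATE` would fail for that target.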
Rollback / Downgrade
PostgreSQL does not support automatic extension downgrades. To roll back:
- Export stream table definitions (if you want to recreate them later):
  cargo run --bin pg_trickle_dump -- --output backup.sql
  Or, if the binary is already installed in your PATH:
  pg_trickle_dump --output backup.sql
  Use `--dsn '<connection string>'` or the standard `PG*` / `DATABASE_URL` environment variables when the default local connection parameters are not sufficient.
- Drop the extension (this destroys all stream tables):
  DROP EXTENSION pg_trickle CASCADE;
- Install the old version and restart PostgreSQL.
- Recreate the extension at the old version:
  CREATE EXTENSION pg_trickle VERSION '0.1.3';
- Recreate stream tables from your backup.
Troubleshooting
"function pgtrickle.xxx does not exist" after upgrade
This means the upgrade script is missing a function. Workaround:
-- Check what version PostgreSQL thinks is installed
SELECT extversion FROM pg_extension WHERE extname = 'pg_trickle';
-- If the version looks correct but functions are missing,
-- the upgrade script may be incomplete. Try a clean reinstall:
DROP EXTENSION pg_trickle CASCADE;
CREATE EXTENSION pg_trickle CASCADE;
-- Warning: this destroys all stream tables!
Report this as a bug — upgrade scripts should never silently drop functions.
"could not access file pg_trickle" after restart
The new shared library file was not installed correctly. Verify:
ls -la $(pg_config --pkglibdir)/pg_trickle*
ALTER EXTENSION UPDATE says "already at version X"
The binary files are already the new version but the SQL catalog wasn't
upgraded. This usually means the .control file's default_version
matches your current version. Check:
cat $(pg_config --sharedir)/extension/pg_trickle.control
Multi-Database Environments
ALTER EXTENSION UPDATE must be run in each database where pg_trickle
is installed. A common pattern:
for db in $(psql -t -c "SELECT datname FROM pg_database WHERE datname NOT IN ('template0', 'template1')"); do
psql -d "$db" -c "ALTER EXTENSION pg_trickle UPDATE;" 2>/dev/null || true
done
CloudNativePG (CNPG)
For CNPG deployments, see cnpg/README.md for upgrade instructions specific to the Kubernetes operator.
Upgrading to v0.23.0
New GUCs
| GUC | Default | Description |
|---|---|---|
| `pg_trickle.log_delta_sql` | off | Log generated delta SQL at DEBUG1 level for diagnosis |
| `pg_trickle.delta_work_mem` | 0 (inherit) | `work_mem` override (MB) for delta SQL execution |
| `pg_trickle.delta_enable_nestloop` | on | Allow nested-loop joins during delta execution |
| `pg_trickle.analyze_before_delta` | on | Run ANALYZE on change buffers before delta SQL |
| `pg_trickle.max_change_buffer_alert_rows` | 0 (disabled) | Alert threshold for change buffer overflow |
| `pg_trickle.diff_output_format` | split | DIFF output format: split or merged |
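As a sketch of how the diagnostic GUCs from the table combine in practice (GUC names are taken from the table above; the workflow itself is an assumption):

```sql
-- Temporarily surface the generated delta SQL while diagnosing a slow refresh.
ALTER SYSTEM SET pg_trickle.log_delta_sql = on;
SELECT pg_reload_conf();

-- ... reproduce the refresh, read the DEBUG1 lines in the server log ...

-- Then turn it back off; DEBUG1 logging is noisy in production.
ALTER SYSTEM SET pg_trickle.log_delta_sql = off;
SELECT pg_reload_conf();
```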
Behavioral Changes
DI-2 aggregate UPDATE-split: The DIFF output row format for aggregate stream tables changes from UPDATE rows to DELETE+INSERT pairs. This is the algebraically correct form that enables O(Δ) performance for multi-join queries.
Impact: Application code that reads the change buffer or outbox and
checks op = 'UPDATE' will silently produce incorrect results.
Migration path:
- Set `pg_trickle.diff_output_format = 'merged'` before upgrading
- Migrate application code to handle DELETE+INSERT pairs
- Switch to `pg_trickle.diff_output_format = 'split'` (the default)
Rollback Strategy
The DI-2/DI-6 code paths are gated by detecting UPDATE rows in the change buffer. Downgrading to v0.22.0 is safe if no writes have occurred to upgraded stream tables.
Pre-Upgrade Validation
# Verify version files are in sync
just check-version-sync
New SQL Functions
- `pgtrickle.explain_diff_sql(name TEXT)` — Returns the delta SQL template for a stream table (for inspection/EXPLAIN)
- `pgtrickle.pgtrickle_refresh_stats()` — Per-stream-table timing stats with avg/p95/p99 percentiles
Security Guide
This page is the practical security reference for operators of pg_trickle. It covers roles and grants, what privileges the extension needs, how stream tables interact with PostgreSQL Row-Level Security (RLS), how triggers behave under SECURITY DEFINER vs INVOKER, and what to lock down in production.
Reporting a vulnerability? See SECURITY.md in the repository root for the disclosure policy.
Threat model in one paragraph
pg_trickle runs inside PostgreSQL. Anyone who can connect as a superuser, or as a role that owns the relevant tables, can already read, modify, or destroy the data the extension manages — they do not need pg_trickle to do that. The threats this guide focuses on are: privilege escalation through stream tables (e.g., a low-privilege role gaining access to source data via a stream table), accidental exposure of source data through CDC change buffers, and operational mistakes (running everything as the postgres superuser).
Roles & grants
What pg_trickle needs
The extension installs into the pgtrickle and pgtrickle_changes
schemas. The role that runs CREATE EXTENSION pg_trickle must be a
superuser because the extension installs background workers, but
day-to-day usage can (and should) be done with a less-privileged
role.
The role that creates a stream table needs:
- `USAGE` on the schemas containing source tables.
- `SELECT` on the source tables referenced in the defining query.
- `CREATE` on the schema where the stream table will live.
- `EXECUTE` on the relevant `pgtrickle.*` functions.
Recommended split
-- Owner of stream tables (your application's "data engineer" role)
CREATE ROLE st_author NOINHERIT;
GRANT USAGE ON SCHEMA public TO st_author;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO st_author;
GRANT CREATE ON SCHEMA public TO st_author;
GRANT EXECUTE ON ALL FUNCTIONS IN SCHEMA pgtrickle TO st_author;
GRANT USAGE ON SCHEMA pgtrickle TO st_author;
-- Read-only consumer (your application)
CREATE ROLE app_reader;
GRANT USAGE ON SCHEMA public TO app_reader;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO app_reader;
app_reader can read stream tables exactly as it reads any other
table — the extension does not require special privileges for
reading a stream table.
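You can confirm the grants took effect with PostgreSQL's built-in privilege-inspection functions. `active_orders` here stands in for one of your own stream tables:

```sql
-- Should both return 't' after the grants above.
SELECT has_table_privilege('app_reader', 'active_orders', 'SELECT');
SELECT has_schema_privilege('st_author', 'pgtrickle', 'USAGE');
```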
Stream tables and Row-Level Security (RLS)
A stream table is the materialized result of its defining query. RLS policies on source tables are evaluated at the time the defining query runs, which is during refresh, under the owner's identity (not the consumer's).
This has two important consequences:
- Stream-table contents do not honour the consumer's RLS context. Two consumers with different RLS contexts will read the same rows from the stream table.
- You can apply RLS to the stream table itself to filter rows per consumer. pg_trickle does not interfere with RLS policies on stream tables (they are ordinary heap tables under the hood).
The recommended pattern is therefore:
-- Define the ST without RLS at the source level
SELECT pgtrickle.create_stream_table(
'order_summary',
$$SELECT tenant_id, customer_id, SUM(amount) AS total
FROM orders GROUP BY tenant_id, customer_id$$
);
-- Apply RLS to the stream table for per-tenant isolation
ALTER TABLE order_summary ENABLE ROW LEVEL SECURITY;
CREATE POLICY tenant_isolation ON order_summary
FOR SELECT USING (tenant_id = current_setting('app.tenant_id')::int);
See the Row-Level Security tutorial for a complete worked example.
CDC triggers — SECURITY DEFINER vs INVOKER
In trigger CDC mode, pg_trickle installs AFTER row-level triggers
on every source table. These triggers run as SECURITY DEFINER
under the role that owns the stream table — so they can write to
pgtrickle_changes.* regardless of who issued the source-table
write.
What this means for you:
- Any role that can write to a source table will indirectly write to the corresponding change buffer. That is by design.
- The change buffer table is owned by the stream-table owner. Other roles get no implicit access.
- If you revoke `INSERT` on the change buffer, the trigger keeps working (it runs as the owner).
In WAL CDC mode, no triggers are installed; capture happens in
PostgreSQL's logical decoding pipeline and is governed by the
max_replication_slots and wal_level settings.
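Before switching to WAL CDC mode, it is worth verifying the prerequisites just mentioned. The `pgt_%` slot-name prefix matches the naming used in the troubleshooting sections later in this document:

```sql
SHOW wal_level;              -- must be 'logical' for WAL CDC
SHOW max_replication_slots;  -- must leave headroom for pg_trickle's slots
-- Existing pg_trickle slots, if any:
SELECT slot_name, active FROM pg_replication_slots WHERE slot_name LIKE 'pgt_%';
```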
What change buffers contain
pgtrickle_changes.changes_<oid> tables contain the post-image
of each changed row, restricted to the columns referenced by the
defining query (columnar tracking). Two consequences:
- If your defining query references a sensitive column, that column ends up in the change buffer.
- The change buffer table inherits the same `tablespace` and disk-layout rules as ordinary tables. If you encrypt your data directory, the change buffers are encrypted at rest the same way.
You can lock change buffers down further:
REVOKE ALL ON ALL TABLES IN SCHEMA pgtrickle_changes FROM PUBLIC;
GRANT SELECT ON ALL TABLES IN SCHEMA pgtrickle_changes TO st_owner;
Lock down circular dependencies
pg_trickle.allow_circular is off by default and should generally
stay that way. Cycles in the DAG are accepted only when this GUC is
on, and only for monotone queries — but enabling it widens the
class of queries pg_trickle accepts, which deserves explicit
attention. Set it via ALTER SYSTEM and require a superuser to
flip it.
Audit & monitoring
pg_trickle records every refresh in
pgtrickle.pgt_refresh_history. For audit:
-- Last 100 refreshes across the whole installation
SELECT pgt_name, refresh_mode, started_at, finished_at,
success, rows_in, rows_out, error_message
FROM pgtrickle.pgt_refresh_history
ORDER BY started_at DESC
LIMIT 100;
-- Failed refreshes in the last hour
SELECT * FROM pgtrickle.pgt_refresh_history
WHERE NOT success AND started_at > now() - interval '1 hour';
Combine with pg_audit for full DDL/DML coverage. The
Monitoring & Alerting tutorial
includes recommended Prometheus alerts.
Copy-Paste Role Templates
The following SQL templates create the three standard pg_trickle roles and grant the minimum required privileges. Run these as a superuser immediately after installing the extension.
pgtrickle_admin — stream table author
CREATE ROLE pgtrickle_admin NOLOGIN NOINHERIT;
-- Extension function access
GRANT USAGE ON SCHEMA pgtrickle TO pgtrickle_admin;
GRANT USAGE ON SCHEMA pgtrickle_changes TO pgtrickle_admin;
GRANT EXECUTE ON ALL FUNCTIONS IN SCHEMA pgtrickle TO pgtrickle_admin;
-- Create stream tables in the public schema
GRANT CREATE ON SCHEMA public TO pgtrickle_admin;
GRANT USAGE ON SCHEMA public TO pgtrickle_admin;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO pgtrickle_admin;
-- Automatically grant SELECT on new source tables
ALTER DEFAULT PRIVILEGES IN SCHEMA public
GRANT SELECT ON TABLES TO pgtrickle_admin;
pgtrickle_user — application backend
CREATE ROLE pgtrickle_user NOLOGIN NOINHERIT;
GRANT USAGE ON SCHEMA pgtrickle TO pgtrickle_user;
-- Monitoring functions (read-only)
GRANT EXECUTE ON FUNCTION pgtrickle.pgt_status() TO pgtrickle_user;
GRANT EXECUTE ON FUNCTION pgtrickle.refresh_efficiency() TO pgtrickle_user;
GRANT EXECUTE ON FUNCTION pgtrickle.health_check() TO pgtrickle_user;
-- Per-stream-table SELECT (run after each create_stream_table call):
-- GRANT SELECT ON <stream_table_name> TO pgtrickle_user;
pgtrickle_readonly — BI and reporting tools
CREATE ROLE pgtrickle_readonly NOLOGIN NOINHERIT;
GRANT USAGE ON SCHEMA public TO pgtrickle_readonly;
-- Per-stream-table SELECT (run after each create_stream_table call):
-- GRANT SELECT ON <stream_table_name> TO pgtrickle_readonly;
Assign roles to login roles
-- Data engineer
CREATE ROLE de_alice LOGIN PASSWORD '...';
GRANT pgtrickle_admin TO de_alice;
-- Application backend
CREATE ROLE app_backend LOGIN PASSWORD '...';
GRANT pgtrickle_user TO app_backend;
-- BI tool
CREATE ROLE bi_tool LOGIN PASSWORD '...';
GRANT pgtrickle_readonly TO bi_tool;
For a complete worked example including CDC trigger ownership verification, see the Security Hardening tutorial.
Hardening checklist
- `pg_trickle.allow_circular = off` unless explicitly needed.
- Stream tables owned by a dedicated, non-superuser role.
- `REVOKE ... FROM PUBLIC` on `pgtrickle_changes` if change buffers contain sensitive columns.
- RLS policies applied to stream tables that present per-tenant data.
- Audit logging in place for `pgtrickle.pgt_refresh_history`.
- `pg_trickle.enabled = on` only in environments that should run refreshes (you can disable extension behaviour without uninstalling it).
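A quick way to audit the GUC-related items on this checklist is to dump the extension's effective settings from `pg_settings` (custom GUCs registered by a loaded extension appear there):

```sql
-- Review every pg_trickle setting, where it came from, and its current value.
SELECT name, setting, source
FROM pg_settings
WHERE name LIKE 'pg_trickle.%'
ORDER BY name;
```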
See also: Row-Level Security tutorial · Pre-Deployment Checklist · Configuration · SECURITY policy
Troubleshooting & Failure Mode Runbook
This document covers common failure scenarios, their symptoms, diagnosis steps, and resolution procedures. It is intended for operators and DBAs running pg_trickle in production.
Quick start: run `SELECT * FROM pgtrickle.health_check() WHERE severity != 'OK';` for a single-call triage of your installation.
See also:
- Error Reference — all `PgTrickleError` variants with causes and fixes
- FAQ — Troubleshooting section — common user questions
- Pre-Deployment Checklist — configuration verification
- Configuration — GUC reference
Table of Contents
- Diagnostic Toolkit
- Failure Scenarios
- 1. Scheduler Not Running
- 2. Stream Table Stuck in SUSPENDED Status
- 3. CDC Triggers Missing or Disabled
- 4. WAL Replication Slot Lag or Missing
- 5. Stream Table Stuck in INITIALIZING
- 6. Change Buffers Growing Without Refresh
- 7. Lock Contention Blocking Refresh
- 8. Out-of-Memory During Refresh
- 9. Disk Full / WAL Retention Exceeded
- 10. Circular Pipeline Convergence Failure
- 11. Schema Change Broke Stream Table
- 12. Worker Pool Exhaustion
- 13. Fuse Tripped (Circuit Breaker)
- 14. Stream Table Appears Stuck Behind a Long Transaction
- 15. Stale Data After High-Concurrency Writes (Sequence Cache Inversion)
Diagnostic Toolkit
These functions are your primary tools for diagnosing issues:
| Function | Purpose |
|---|---|
| `pgtrickle.health_check()` | Single-call overall health triage (OK/WARN/ERROR) |
| `pgtrickle.pgt_status()` | Status, staleness, error count for all stream tables |
| `pgtrickle.refresh_timeline(N)` | Last N refresh events across all stream tables |
| `pgtrickle.diagnose_errors('name')` | Last 5 failed events with classification and remediation |
| `pgtrickle.change_buffer_sizes()` | CDC pipeline: pending rows and buffer bytes per source |
| `pgtrickle.trigger_inventory()` | CDC trigger presence and enabled state |
| `pgtrickle.check_cdc_health()` | WAL replication slot health (WAL mode only) |
| `pgtrickle.dependency_tree()` | Dependency DAG visualization |
| `pgtrickle.worker_pool_status()` | Parallel refresh worker pool state |
| `pgtrickle.explain_st('name')` | DVM operator tree and generated delta SQL |
Quick health check script:
-- 1. Overall health
SELECT * FROM pgtrickle.health_check() WHERE severity != 'OK';
-- 2. Problem stream tables
SELECT name, status, refresh_mode, consecutive_errors, staleness
FROM pgtrickle.pgt_status()
WHERE status != 'ACTIVE' OR consecutive_errors > 0
ORDER BY consecutive_errors DESC;
-- 3. Recent failures
SELECT start_time, stream_table, action, status, duration_ms, error_message
FROM pgtrickle.refresh_timeline(20)
WHERE status = 'FAILED';
Failure Scenarios
1. Scheduler Not Running
Symptoms:
- No stream tables are refreshing
- `health_check()` reports `scheduler_running = false`
- No `pg_trickle scheduler` process in `pg_stat_activity`
Diagnosis:
-- Check for the scheduler process
SELECT pid, datname, backend_type, state
FROM pg_stat_activity
WHERE backend_type = 'pg_trickle scheduler';
-- Check GUC
SHOW pg_trickle.enabled;
-- Check shared_preload_libraries
SHOW shared_preload_libraries;
Resolution:
| Cause | Fix |
|---|---|
| `pg_trickle.enabled = off` | `ALTER SYSTEM SET pg_trickle.enabled = on; SELECT pg_reload_conf();` |
| Not in `shared_preload_libraries` | Add `pg_trickle` to `shared_preload_libraries` in `postgresql.conf` and restart PostgreSQL |
| `max_worker_processes` exhausted | Increase `max_worker_processes` and restart. The launcher retries every 5 minutes — check PostgreSQL logs for `WARNING: pg_trickle launcher: could not spawn scheduler` |
| Scheduler crashed | Check PostgreSQL logs for crash details. The launcher will auto-restart it. If recurring, check for OOM or resource limits |
2. Stream Table Stuck in SUSPENDED Status
Symptoms:
- Stream table status shows `SUSPENDED`
- `consecutive_errors` is at or above `pg_trickle.max_consecutive_errors`
- No refreshes are happening for this stream table
Diagnosis:
-- Check the stream table status
SELECT pgt_name, status, consecutive_errors, last_error_message
FROM pgtrickle.pg_stat_stream_tables
WHERE pgt_name = 'my_stream_table';
-- Get detailed error history
SELECT * FROM pgtrickle.diagnose_errors('my_stream_table');
Resolution:
- Fix the underlying error (check `last_error_message` and `diagnose_errors`)
- Resume the stream table: `SELECT pgtrickle.alter_stream_table('my_stream_table', enabled => true);`
- Trigger a manual refresh to verify: `SELECT pgtrickle.refresh_stream_table('my_stream_table');`
Prevention: Increase pg_trickle.max_consecutive_errors (default 3) if
transient errors are common in your environment:
ALTER SYSTEM SET pg_trickle.max_consecutive_errors = 5;
SELECT pg_reload_conf();
3. CDC Triggers Missing or Disabled
Symptoms:
- Stream table refreshes succeed but show no changes
- `change_buffer_sizes()` shows `pending_rows = 0` despite active DML
- Source tables have no pg_trickle triggers
Diagnosis:
-- Check trigger inventory
SELECT source_table, trigger_type, trigger_name, present, enabled
FROM pgtrickle.trigger_inventory()
WHERE NOT present OR NOT enabled;
-- Manual check on a specific source table
SELECT tgname, tgenabled
FROM pg_trigger
WHERE tgrelid = 'public.orders'::regclass
AND tgname LIKE 'pgt_%';
Resolution:
| Cause | Fix |
|---|---|
Triggers dropped by DDL (e.g., pg_dump + restore without triggers) | Drop and recreate the stream table, or reinitialize: SELECT pgtrickle.refresh_stream_table('my_st'); |
Triggers disabled (ALTER TABLE ... DISABLE TRIGGER) | ALTER TABLE source_table ENABLE TRIGGER ALL; |
| Source gating active | Check SELECT * FROM pgtrickle.source_gates(); and ungate: SELECT pgtrickle.ungate_source('source_table'); |
| WAL mode active but slot missing | See WAL Replication Slot Lag or Missing |
4. WAL Replication Slot Lag or Missing
Symptoms:
check_cdc_health()showsslot_lag_exceeds_thresholdorreplication_slot_missing- WAL disk usage growing unexpectedly
- Stream tables not receiving changes in WAL mode
Diagnosis:
-- Check CDC health
SELECT * FROM pgtrickle.check_cdc_health();
-- Check replication slots directly
SELECT slot_name, active, restart_lsn, confirmed_flush_lsn,
pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS lag
FROM pg_replication_slots
WHERE slot_name LIKE 'pgt_%';
Resolution:
| Cause | Fix |
|---|---|
| Slot dropped externally | pg_trickle will auto-fallback to trigger-based CDC. To recreate: drop and recreate the stream table |
| Slot lagging (WAL accumulation) | Check for long-running transactions: `SELECT pid, age(backend_xmin) FROM pg_stat_activity WHERE backend_xmin IS NOT NULL;`. Kill idle-in-transaction sessions |
| `wal_level != logical` | WAL CDC requires `wal_level = logical`. Set it and restart PostgreSQL |
| `max_replication_slots` exhausted | Increase `max_replication_slots` and restart |
Fallback: Force trigger-based CDC mode if WAL mode is problematic:
ALTER SYSTEM SET pg_trickle.cdc_mode = 'trigger';
SELECT pg_reload_conf();
5. Stream Table Stuck in INITIALIZING
Symptoms:
- Stream table status is `INITIALIZING` for an extended period
- The initial full refresh may have failed or may still be running
Diagnosis:
-- Check refresh history
SELECT * FROM pgtrickle.get_refresh_history('my_st', 5);
-- Check for active refresh
SELECT pid, state, query, now() - query_start AS running_for
FROM pg_stat_activity
WHERE query LIKE '%pgtrickle%' AND state = 'active';
Resolution:
| Cause | Fix |
|---|---|
| Initial refresh failed (check error in history) | Fix the error, then: SELECT pgtrickle.refresh_stream_table('my_st'); |
| Defining query is very slow | Optimize the query, add indexes on source tables, or increase work_mem |
| Lock contention during initial refresh | See Lock Contention |
6. Change Buffers Growing Without Refresh
Symptoms:
change_buffer_sizes()shows largepending_rowsand growingbuffer_bytes- Stream tables are stale
- Refreshes are not running or are failing
Diagnosis:
-- Check buffer sizes
SELECT stream_table, source_table, pending_rows, buffer_bytes
FROM pgtrickle.change_buffer_sizes()
ORDER BY pending_rows DESC;
-- Check if refreshes are happening
SELECT * FROM pgtrickle.refresh_timeline(10);
-- Check for blocked refresh processes
SELECT pid, wait_event_type, wait_event, state, query
FROM pg_stat_activity
WHERE query LIKE '%pgtrickle%';
Resolution:
| Cause | Fix |
|---|---|
| Scheduler not running | See Scheduler Not Running |
| All refreshes failing | Check diagnose_errors() for each affected stream table |
| Lock contention | See Lock Contention |
| Very large buffer causing slow MERGE | Consider lowering pg_trickle.differential_change_ratio_threshold to trigger FULL refresh for large batches |
Emergency: If buffers are dangerously large and you need immediate relief:
-- Force a full refresh (bypasses change buffers)
SELECT pgtrickle.refresh_stream_table('my_st', force_full => true);
7. Lock Contention Blocking Refresh
Symptoms:
- Refresh duration is much longer than usual
- `pg_stat_activity` shows refresh processes in a `Lock` wait state
- Long-running transactions on source or stream tables
Diagnosis:
-- Find blocking locks
SELECT blocked.pid AS blocked_pid,
blocked.query AS blocked_query,
blocking.pid AS blocking_pid,
blocking.query AS blocking_query
FROM pg_stat_activity blocked
JOIN pg_locks bl ON bl.pid = blocked.pid AND NOT bl.granted
JOIN pg_locks gl ON gl.locktype = bl.locktype
AND gl.database IS NOT DISTINCT FROM bl.database
AND gl.relation IS NOT DISTINCT FROM bl.relation
AND gl.page IS NOT DISTINCT FROM bl.page
AND gl.tuple IS NOT DISTINCT FROM bl.tuple
AND gl.pid != bl.pid
AND gl.granted
JOIN pg_stat_activity blocking ON blocking.pid = gl.pid
WHERE blocked.query LIKE '%pgtrickle%';
Resolution:
- Identify and terminate the blocking session if appropriate: `SELECT pg_terminate_backend(<blocking_pid>);`
- Investigate why the blocking transaction is long-running (idle in transaction, slow query, etc.)
- Consider adding `statement_timeout` or `idle_in_transaction_session_timeout` to prevent future occurrences
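A minimal sketch of those preventive timeouts, assuming an application role named `app_backend` (the values shown are placeholders to tune for your workload):

```sql
-- Cluster-wide cap on sessions left idle inside a transaction.
ALTER SYSTEM SET idle_in_transaction_session_timeout = '5min';
SELECT pg_reload_conf();

-- Or scope a statement timeout to just the application role.
ALTER ROLE app_backend SET statement_timeout = '30s';
```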
8. Out-of-Memory During Refresh
Symptoms:
- Refresh processes killed by OS OOM killer
- PostgreSQL logs show `out of memory` errors
- Stream tables fail with system-category errors
Diagnosis:
# Check OS OOM killer logs
dmesg | grep -i "oom\|killed process" | tail -20
# Check PostgreSQL logs for memory errors
grep -i "out of memory\|oom" /var/log/postgresql/postgresql-*.log | tail -10
-- Check which stream tables have large source data
SELECT stream_table, source_table, pending_rows
FROM pgtrickle.change_buffer_sizes()
ORDER BY pending_rows DESC;
Resolution:
| Cause | Fix |
|---|---|
| Large FULL refresh on big table | Reduce work_mem or maintenance_work_mem to limit per-query memory |
| Large change buffer accumulation | Refresh more frequently (shorter schedule) to keep buffers small |
| Complex query with many joins | Simplify the defining query or break into cascading stream tables |
| Parallel refresh amplifies memory | Reduce pg_trickle.max_concurrent_refreshes |
Tuning:
-- Limit per-refresh memory
SET work_mem = '64MB';
-- Limit concurrent refreshes to reduce peak memory
ALTER SYSTEM SET pg_trickle.max_concurrent_refreshes = 2;
SELECT pg_reload_conf();
9. Disk Full / WAL Retention Exceeded
Symptoms:
- PostgreSQL logs `No space left on device` errors
- WAL directory consuming excessive disk space
- Replication slots preventing WAL cleanup
Diagnosis:
# Check disk usage
df -h /var/lib/postgresql/data
du -sh /var/lib/postgresql/data/pg_wal/
-- Check replication slot WAL retention
SELECT slot_name, active,
pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained_wal
FROM pg_replication_slots
ORDER BY pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) DESC;
-- Check change buffer table sizes
SELECT stream_table, source_table,
pg_size_pretty(buffer_bytes::bigint) AS buffer_size
FROM pgtrickle.change_buffer_sizes()
ORDER BY buffer_bytes DESC;
Resolution:
| Cause | Fix |
|---|---|
| Inactive replication slot holding WAL | Drop the slot: SELECT pg_drop_replication_slot('pgt_...'); |
| Change buffer tables too large | Force full refresh to clear buffers, or refresh more frequently |
| WAL accumulation from long transactions | Terminate idle-in-transaction sessions |
| `max_wal_size` too low | Increase `max_wal_size` in `postgresql.conf` |
Emergency cleanup:
-- Drop inactive pg_trickle replication slots
SELECT pg_drop_replication_slot(slot_name)
FROM pg_replication_slots
WHERE slot_name LIKE 'pgt_%' AND NOT active;
10. Circular Pipeline Convergence Failure
Symptoms:
- Stream tables in a circular dependency hit the maximum iteration limit
- Refresh history shows repeated cycles without convergence
- Error messages mention `fixed_point_max_iterations`
Diagnosis:
-- Check for circular dependencies
SELECT * FROM pgtrickle.dependency_tree();
-- Check refresh history for iteration patterns
SELECT start_time, stream_table, action, status, error_message
FROM pgtrickle.refresh_timeline(50)
WHERE stream_table IN ('st_a', 'st_b') -- suspected cycle members
ORDER BY start_time DESC;
Resolution:
- Verify the cycle is intentional (see Circular Dependencies tutorial)
- Increase the iteration limit if convergence is slow: `ALTER SYSTEM SET pg_trickle.fixed_point_max_iterations = 20; SELECT pg_reload_conf();`
- If the cycle never converges, the defining queries may not be monotone. Restructure to eliminate the cycle or ensure monotonicity.
11. Schema Change Broke Stream Table
Symptoms:
- Stream table has `needs_reinit = true`
- Reinitialization keeps failing
- Error messages reference dropped or renamed columns
Diagnosis:
-- Check for pending reinit
SELECT pgt_name, needs_reinit, status, last_error_message
FROM pgtrickle.pg_stat_stream_tables
WHERE needs_reinit;
-- Get error details
SELECT * FROM pgtrickle.diagnose_errors('my_st');
Resolution:
If the defining query is still valid after the DDL change, force a reinit:
SELECT pgtrickle.refresh_stream_table('my_st');
If the defining query needs to be updated:
-- Option 1: Alter the defining query
SELECT pgtrickle.alter_stream_table('my_st',
query => 'SELECT new_column, SUM(amount) FROM orders GROUP BY new_column'
);
-- Option 2: Drop and recreate
SELECT pgtrickle.drop_stream_table('my_st');
SELECT pgtrickle.create_stream_table(
'my_st',
'SELECT new_column, SUM(amount) FROM orders GROUP BY new_column',
'1m'
);
12. Worker Pool Exhaustion
Symptoms:
- Refresh latency increases across the board
- Some stream tables refresh while others queue indefinitely
- `worker_pool_status()` shows all workers busy
Diagnosis:
-- Check worker pool
SELECT * FROM pgtrickle.worker_pool_status();
-- Check for long-running parallel jobs
SELECT job_id, unit_key, status, duration_ms
FROM pgtrickle.parallel_job_status(300)
WHERE status = 'RUNNING'
ORDER BY duration_ms DESC;
Resolution:
| Cause | Fix |
|---|---|
| Too few workers for workload | Increase pg_trickle.max_concurrent_refreshes and/or max_worker_processes |
| One stream table monopolizing workers | Check if a single slow refresh is blocking the pool. Consider splitting into smaller stream tables |
| Global worker cap reached | Increase pg_trickle.max_dynamic_refresh_workers |
13. Fuse Tripped (Circuit Breaker)
Symptoms:
- Stream table shows `fuse_state = 'BLOWN'` or refresh is paused
- `fuse_status()` reports a tripped fuse
- No refreshes happening despite an active scheduler
Diagnosis:
-- Check fuse status
SELECT * FROM pgtrickle.fuse_status();
Resolution:
Reset the fuse after investigating the root cause:
SELECT pgtrickle.reset_fuse('my_stream_table');
See the Fuse Circuit Breaker tutorial for details on fuse thresholds and configuration.
14. Stream Table Appears Stuck Behind a Long Transaction
Symptoms:
- A stream table's `data_timestamp` is not advancing even though the source table is receiving new inserts.
- The `pg_trickle_frontier_holdback_lsn_bytes` Prometheus gauge is non-zero.
- Server log contains: `pg_trickle: frontier holdback active — the oldest in-progress transaction is Ns old.`
Cause:
frontier_holdback_mode = 'xmin' (the default) prevents the scheduler from
advancing the frontier while any in-progress transaction exists that is older
than the previous tick's xmin baseline. A long-running or forgotten session
holding an open transaction will pause frontier advancement for all stream
tables on that PostgreSQL server.
This is intentional: without the holdback, a transaction that inserts into a
tracked source table and commits after the scheduler ticks would have its
change permanently lost (see Issue #536 and plans/safety/PLAN_FRONTIER_VISIBILITY_HOLDBACK.md).
Diagnosis:
-- Find the oldest in-progress transaction
SELECT pid, usename, state, application_name,
backend_xmin,
EXTRACT(EPOCH FROM (now() - xact_start))::int AS xact_age_secs,
query
FROM pg_stat_activity
WHERE backend_xmin IS NOT NULL
AND state <> 'idle'
ORDER BY xact_start;
-- Check for prepared (2PC) transactions
SELECT gid, prepared,
EXTRACT(EPOCH FROM (now() - prepared))::int AS age_secs,
owner, database
FROM pg_prepared_xacts
ORDER BY prepared;
Resolution:
- Identify and terminate the blocking session:
  SELECT pg_terminate_backend(pid)
  FROM pg_stat_activity
  WHERE state = 'idle in transaction' AND backend_xmin IS NOT NULL
  ORDER BY xact_start
  LIMIT 1;
- Roll back a forgotten 2PC transaction:
  ROLLBACK PREPARED 'gid_from_pg_prepared_xacts';
- For benchmark or known-safe workloads only, disable holdback to restore the pre-fix fast path (risks silent data loss):
  ALTER SYSTEM SET pg_trickle.frontier_holdback_mode = 'none';
  SELECT pg_reload_conf();
- Suppress the warning (while keeping holdback active) by raising the threshold:
  ALTER SYSTEM SET pg_trickle.frontier_holdback_warn_seconds = 300;
  SELECT pg_reload_conf();
- On managed PostgreSQL (RDS, Cloud SQL, Aiven, etc.) where `pg_stat_activity` is restricted to the current user's own sessions, the probe silently sees no other backends and never triggers a holdback. The server log will contain: `pg_trickle: frontier holdback probe cannot see other PostgreSQL backends.` Fix by granting the monitoring role to the pg_trickle service account:
  GRANT pg_monitor TO <pg_trickle_service_role>;
  Then restart the pg_trickle scheduler (or reload PostgreSQL) so the new privilege takes effect.
15. Stale Data After High-Concurrency Writes (Sequence Cache Inversion)
Symptoms:
- A stream table consistently shows an outdated value for a row that is being updated frequently by concurrent sessions.
- The issue is reproducible under high write concurrency but not with a single writer.
- pgtrickle.check_cdc_health() returns a CRITICAL: change buffer sequence(s) have CACHE > 1 alert.
Cause:
Someone manually altered a change buffer sequence to use CACHE > 1 (e.g.
to reduce sequence LWLock contention). The change buffer BIGSERIAL sequence
must use CACHE 1 — this is a hard correctness invariant, not a
performance knob.
With CACHE > 1, PostgreSQL backends pre-allocate blocks of sequence
values. Two concurrent transactions modifying the same row can commit in an
order that inverts their pre-cached change_id values:
- Backend A caches [16, 31] and starts updating row id=1.
- Backend B caches [33, 64], updates the same row with change_id=33, and commits first.
- Backend A commits last (true final state), but its change_id=16.
- The compaction/delta pipeline uses ORDER BY change_id DESC to find the final state → picks change_id=33 (Backend B's stale data).
The result is silent data corruption. See issue #536 for the full analysis.
Diagnosis:
-- check_cdc_health() surfaces the problem:
SELECT source_table, cdc_mode, alert
FROM pgtrickle.check_cdc_health()
WHERE alert IS NOT NULL;
-- Directly inspect all change buffer sequences:
SELECT schemaname, sequencename, cache_size
FROM pg_sequences
WHERE schemaname = 'pgtrickle_changes'
AND (sequencename LIKE 'changes_%_change_id_seq'
OR sequencename LIKE 'changes_pgt_%_change_id_seq')
AND cache_size > 1;
Resolution:
- Reset every affected sequence back to CACHE 1:
-- Replace <seq_name> with the sequencename from the query above.
ALTER SEQUENCE pgtrickle_changes.<seq_name> CACHE 1;
- Verify the alert is gone:
SELECT alert FROM pgtrickle.check_cdc_health() WHERE alert IS NOT NULL;
-- Should return zero rows.
- Do NOT increase CACHE to reduce LWLock contention. The only structural solution for high-concurrency change_id contention is to switch to the WAL/logical-decoding CDC backend, which uses commit-LSN ordering and has no sequence at all:
SELECT pgtrickle.alter_stream_table('my_st', cdc_mode => 'wal');
General Diagnostic Workflow
When investigating any issue, follow this sequence:
1. health_check() → identify which subsystem is unhealthy
2. pgt_status() → find specific affected stream tables
3. diagnose_errors('name') → get root cause for failures
4. refresh_timeline(20) → correlate with recent refresh events
5. change_buffer_sizes() → check CDC pipeline health
6. trigger_inventory() → verify change capture is working
7. dependency_tree() → confirm DAG wiring
8. PostgreSQL logs → low-level crash/resource details
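The sequence above can be run as one triage session. The function names come from the numbered list; the pgtrickle schema qualification and exact result columns are assumptions and may differ between versions:

```sql
-- Hypothetical triage session following the workflow above.
SELECT * FROM pgtrickle.health_check();                    -- 1. which subsystem?
SELECT * FROM pgtrickle.pgt_status();                      -- 2. which stream tables?
SELECT * FROM pgtrickle.diagnose_errors('active_orders');  -- 3. root cause for one table
SELECT * FROM pgtrickle.refresh_timeline(20);              -- 4. recent refresh events
SELECT * FROM pgtrickle.change_buffer_sizes();             -- 5. CDC pipeline backlog
SELECT * FROM pgtrickle.trigger_inventory();               -- 6. change capture wiring
SELECT * FROM pgtrickle.dependency_tree();                 -- 7. DAG wiring
```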
GUC Quick Reference for Troubleshooting
| GUC | Default | What to check |
|---|---|---|
| pg_trickle.enabled | on | Must be on for the scheduler to run |
| pg_trickle.max_consecutive_errors | 3 | Stream tables suspend after this many failures |
| pg_trickle.scheduler_interval_ms | 100 | Very high values cause refresh lag |
| pg_trickle.cdc_mode | auto | trigger for reliable fallback |
| pg_trickle.max_concurrent_refreshes | 4 | Per-database parallel refresh cap |
| pg_trickle.fixed_point_max_iterations | 10 | Circular pipeline iteration limit |
| pg_trickle.differential_change_ratio_threshold | 0.5 | Falls back to FULL above this ratio |
| pg_trickle.auto_backoff | on | Stretches intervals up to 8x under load |
| pg_trickle.frontier_holdback_mode | xmin | none disables holdback (unsafe); xmin = safe default |
| pg_trickle.frontier_holdback_warn_seconds | 60 | Warn after holding back for this many seconds |
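To see the live value and source of every pg_trickle setting at once, the standard pg_settings view works — custom GUCs appear there once the extension's library is loaded:

```sql
-- Current value, where it was set, and whether a restart is pending,
-- for every pg_trickle.* GUC. pg_settings is a stock PostgreSQL view.
SELECT name, setting, source, pending_restart
FROM pg_settings
WHERE name LIKE 'pg_trickle.%'
ORDER BY name;
```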
See also: Error Reference · Configuration · FAQ · Pre-Deployment Checklist · Capacity Planning · CDC Modes
pg_trickle Error Reference
This document lists all PgTrickleError variants with descriptions, common
causes, and suggested fixes. If you encounter an error not listed here, please
open an issue.
Tip: Most errors include context (table name, OID, or query fragment) in the message text. Use that context to narrow down the root cause.
SQLSTATE Code Reference
Every pg_trickle error includes a PostgreSQL SQLSTATE code for programmatic
error handling. Use SQLSTATE in PL/pgSQL EXCEPTION WHEN blocks or check
the error code in your client library.
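For example, a PL/pgSQL caller can branch on these codes by condition name. This is a hedged sketch: the stream table arguments are illustrative, while the condition names (feature_not_supported = 0A000, lock_not_available = 55P03) are standard PostgreSQL:

```sql
DO $$
BEGIN
  -- Illustrative call; the table and query are hypothetical.
  PERFORM pgtrickle.create_stream_table(
    name  => 'daily_totals',
    query => 'SELECT day, sum(amount) FROM sales GROUP BY day'
  );
EXCEPTION
  WHEN feature_not_supported THEN   -- SQLSTATE 0A000: UnsupportedOperator
    RAISE NOTICE 'query not incrementally maintainable; consider FULL mode';
  WHEN lock_not_available THEN      -- SQLSTATE 55P03: LockTimeout
    RAISE NOTICE 'transient lock failure, retry later: %', SQLERRM;
END $$;
```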
| Error Variant | SQLSTATE | Code Name |
|---|---|---|
| QueryParseError | 42000 | SYNTAX_ERROR_OR_ACCESS_RULE_VIOLATION |
| TypeMismatch | 42804 | DATATYPE_MISMATCH |
| UnsupportedOperator | 0A000 | FEATURE_NOT_SUPPORTED |
| CycleDetected | 3F000 | INVALID_SCHEMA_DEFINITION |
| NotFound | 42P01 | UNDEFINED_TABLE |
| AlreadyExists | 42P07 | DUPLICATE_TABLE |
| InvalidArgument | 22023 | INVALID_PARAMETER_VALUE |
| QueryTooComplex | 54000 | PROGRAM_LIMIT_EXCEEDED |
| PermissionDenied | 42501 | INSUFFICIENT_PRIVILEGE |
| UpstreamTableDropped | 42P01 | UNDEFINED_TABLE |
| UpstreamSchemaChanged | 42P17 | INVALID_TABLE_DEFINITION |
| LockTimeout | 55P03 | LOCK_NOT_AVAILABLE |
| ReplicationSlotError | 55000 | OBJECT_NOT_IN_PREREQUISITE_STATE |
| WalTransitionError | 55000 | OBJECT_NOT_IN_PREREQUISITE_STATE |
| SpiError | XX000 | INTERNAL_ERROR |
| SpiErrorCode | XX000 | INTERNAL_ERROR (SQLSTATE preserved from original) |
| SpiPermissionError | 42501 | INSUFFICIENT_PRIVILEGE |
| WatermarkBackwardMovement | 22000 | DATA_EXCEPTION |
| WatermarkGroupNotFound | 42704 | UNDEFINED_OBJECT |
| WatermarkGroupAlreadyExists | 42710 | DUPLICATE_OBJECT |
| RefreshSkipped | 55000 | OBJECT_NOT_IN_PREREQUISITE_STATE |
| PublicationAlreadyExists | 42710 | DUPLICATE_OBJECT |
| PublicationNotFound | 42704 | UNDEFINED_OBJECT |
| SlaTooSmall | 22023 | INVALID_PARAMETER_VALUE |
| ChangedColsBitmaskFailed | XX000 | INTERNAL_ERROR |
| PublicationRebuildFailed | XX000 | INTERNAL_ERROR |
| DiagnosticError | XX000 | INTERNAL_ERROR |
| SnapshotAlreadyExists | 42710 | DUPLICATE_OBJECT |
| SnapshotSourceNotFound | 42P01 | UNDEFINED_TABLE |
| SnapshotSchemaVersionMismatch | 42P17 | INVALID_TABLE_DEFINITION |
| OutboxAlreadyEnabled | 42710 | DUPLICATE_OBJECT |
| OutboxNotEnabled | 42704 | UNDEFINED_OBJECT |
| PgTideMissing | 0A000 | FEATURE_NOT_SUPPORTED |
| UnresolvedPlaceholder | XX000 | INTERNAL_ERROR |
| DiffDepthExceeded | 54000 | PROGRAM_LIMIT_EXCEEDED |
| DiffCteCountExceeded | 54000 | PROGRAM_LIMIT_EXCEEDED |
| StSourceFrontierMissing | 42P01 | UNDEFINED_TABLE |
| InternalError | XX000 | INTERNAL_ERROR |
Error Categories
pg_trickle classifies errors into four categories that determine retry behavior:
| Category | Retried by scheduler? | Description |
|---|---|---|
| User | No | Invalid queries, type mismatches, DAG cycles. Fix the input. |
| Schema | No (triggers reinitialize) | Upstream DDL changes. The stream table is reinitialized automatically. |
| System | Yes (with backoff) | Lock timeouts, replication slot problems, transient SPI failures. |
| Internal | No | Unexpected bugs. Please report these. |
User Errors
QueryParseError
Message: query parse error: <details>
Description: The defining query could not be parsed or validated by the pg_trickle query analyzer.
Common causes:
- Syntax error in the defining query
- Use of PostgreSQL syntax not yet supported by pgrx's query parser
- A CTE or subquery that cannot be analyzed
Suggested fix: Simplify the query. Check that it runs as a standalone
SELECT statement. Review SQL Reference — Expression Support
for supported syntax.
TypeMismatch
Message: type mismatch: <details>
Description: A type incompatibility was detected between the defining query output and the stream table schema, or between source columns and expected types.
Common causes:
- Column type changed on a source table after stream table creation
- Explicit cast to an incompatible type in the defining query
- UNION branches with mismatched column types
Suggested fix: Ensure column types match. Use explicit CAST() to align
types if needed. If the source table changed, use
pgtrickle.repair_stream_table() to reinitialize.
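For instance, when one UNION branch yields integer and the other numeric, casting both branches to a common type makes the output schema unambiguous (table and column names here are illustrative):

```sql
-- Align both branches to the same declared type before the stream
-- table schema is inferred from the defining query.
SELECT order_id, amount::numeric(12,2) AS amount FROM online_orders
UNION ALL
SELECT order_id, amount::numeric(12,2) AS amount FROM store_orders;
```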
UnsupportedOperator
Message: unsupported operator for DIFFERENTIAL mode: <operator>
Description: The defining query uses an SQL operator or construct that pg_trickle cannot maintain incrementally.
Common causes:
- TABLESAMPLE, GROUPING SETS beyond the branch limit, recursive CTEs with unsupported patterns, certain window function combinations
- Non-monotonic or volatile functions in positions that prevent differential maintenance
Suggested fix: Use refresh_mode => 'FULL' to fall back to full
recomputation:
SELECT pgtrickle.alter_stream_table('my_stream_table',
refresh_mode => 'FULL');
Or restructure the query to avoid the unsupported construct. See SQL Reference — Expression Support.
CycleDetected
Message: cycle detected in dependency graph: A -> B -> C -> A
Description: Adding or altering this stream table would create a circular dependency in the refresh DAG.
Common causes:
- Stream table A depends on stream table B, which depends on A
- Indirect cycles through chains of stream tables
Suggested fix: Restructure the stream table definitions to break the cycle.
Use pgtrickle.get_dependency_graph() to visualize the current DAG. If
circular dependencies are intentional, enable pg_trickle.allow_circular = true
(see Configuration).
NotFound
Message: stream table not found: <name>
Description: The specified stream table does not exist in the
pgtrickle.pgt_stream_tables catalog.
Common causes:
- Typo in the stream table name
- The stream table was already dropped
- Schema-qualified name required but not provided (e.g.,
myschema.my_st)
Suggested fix: Check the name with pgtrickle.list_stream_tables(). Use
the fully qualified name: schema.table_name.
AlreadyExists
Message: stream table already exists: <name>
Description: A create_stream_table() call was made for a stream table
name that is already registered.
Common causes:
- Re-running a migration or DDL script without
IF NOT EXISTS
Suggested fix: Use pgtrickle.create_stream_table_if_not_exists() or
pgtrickle.create_or_replace_stream_table() for idempotent creation.
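In a migration script, the idempotent variant could be used like this — a sketch assuming create_stream_table_if_not_exists() accepts the same named arguments as create_stream_table():

```sql
-- Safe to re-run: a no-op if the stream table is already registered.
SELECT pgtrickle.create_stream_table_if_not_exists(
  name     => 'active_orders',
  query    => 'SELECT * FROM orders WHERE status = ''active''',
  schedule => '30s'
);
```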
InvalidArgument
Message: invalid argument: <details>
Description: An invalid value was passed to a pg_trickle API function.
Common causes:
- Invalid refresh_mode value (must be 'DIFFERENTIAL', 'FULL', or 'AUTO')
- Calling resume_stream_table() on a table that is not suspended
- Invalid schedule interval or threshold value
- Empty or malformed table name
Suggested fix: Check the function signature in the SQL Reference and correct the argument.
QueryTooComplex
Message: query too complex: <details>
Description: The defining query exceeds the maximum parse depth, which protects against stack overflow during query analysis.
Common causes:
- Deeply nested subqueries (> 64 levels by default)
- Large UNION ALL chains
- Complex CTE hierarchies
Suggested fix: Simplify the query. If the depth limit is too restrictive,
increase pg_trickle.max_parse_depth (default: 64). See
Configuration.
PermissionDenied
Message: permission denied: <details>
Description: The current role does not own the stream table's storage table or lacks the necessary PostgreSQL privileges (SEC-1).
Common causes:
- Calling alter_stream_table() or drop_stream_table() as a role that does not own the underlying storage table
- Using SECURITY DEFINER functions that change the effective role
Suggested fix: Run the operation as the table owner, or grant ownership
with ALTER TABLE ... OWNER TO <role>.
UpstreamTableDropped
Message: upstream table dropped: OID <oid>
Description: A source table referenced by the stream table's defining query was dropped.
Common causes:
- DROP TABLE on a source table
- Table replaced via DROP + CREATE (new OID)
Suggested fix: Either recreate the source table with the same schema or
drop the stream table and recreate it. If pg_trickle.block_source_ddl = true
(default), the DROP would have been blocked in the first place.
UpstreamSchemaChanged
Message: upstream table schema changed: OID <oid>
Description: A source table's schema was altered (e.g., column added, dropped, or type changed) in a way that affects the defining query.
Common causes:
- ALTER TABLE ... ADD/DROP/ALTER COLUMN on a source table
- Type change on a column used in the defining query
Suggested fix: The stream table will be automatically reinitialized on the
next scheduler tick. If pg_trickle.block_source_ddl = true (default), most
schema changes are blocked proactively. Use
pgtrickle.alter_stream_table(..., query => '...') to update the defining
query if needed.
System Errors
LockTimeout
Message: lock timeout: <details>
Description: A lock required for refresh could not be acquired within the configured timeout.
Common causes:
- Long-running transactions holding locks on the stream table or source tables
- Concurrent ALTER TABLE or VACUUM FULL operations
- High contention on the change buffer tables
Suggested fix: This error is automatically retried with exponential backoff.
If persistent, investigate long-running transactions with pg_stat_activity.
Consider increasing lock_timeout or reducing refresh frequency.
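One way to see exactly who is blocking, using only built-in facilities (pg_blocking_pids() is standard PostgreSQL):

```sql
-- Pair each waiting backend with the session(s) blocking it.
SELECT waiting.pid                 AS waiting_pid,
       waiting.query               AS waiting_query,
       blocker.pid                 AS blocking_pid,
       blocker.state               AS blocking_state,
       now() - blocker.xact_start  AS blocking_xact_age
FROM pg_stat_activity AS waiting
JOIN LATERAL unnest(pg_blocking_pids(waiting.pid)) AS b(pid) ON true
JOIN pg_stat_activity AS blocker ON blocker.pid = b.pid;
```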
ReplicationSlotError
Message: replication slot error: <details>
Description: An error occurred with the logical replication slot used for WAL-based CDC.
Common causes:
- Replication slot dropped externally
- wal_level changed from logical to a lower level
- Slot lag exceeded max_slot_wal_keep_size
Suggested fix: Check replication slot status with
SELECT * FROM pg_replication_slots. Ensure wal_level = logical. If the slot
was dropped, pg_trickle will recreate it automatically. See
Configuration — WAL CDC.
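A lag check using the standard pg_replication_slots view — the filter on logical slots is generic; narrow it further to pg_trickle's own slot names on your system:

```sql
-- Show each logical slot, whether it is active, and how much WAL
-- it is forcing the server to retain.
SELECT slot_name, active, wal_status,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(),
                                      restart_lsn)) AS retained_wal
FROM pg_replication_slots
WHERE slot_type = 'logical';
```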
WalTransitionError
Message: WAL transition error: <details>
Description: An error occurred during the transition from trigger-based CDC to WAL-based CDC.
Common causes:
- wal_level is not logical when cdc_mode = 'auto'
- Transient connection issues during the transition
Suggested fix: Ensure wal_level = logical in postgresql.conf if you
want WAL-based CDC. Otherwise set pg_trickle.cdc_mode = 'trigger' to stay
on trigger-based CDC. This error is retried automatically.
SpiError
Message: SPI error: <details>
Description: A PostgreSQL Server Programming Interface (SPI) error occurred during an internal query.
Common causes:
- Transient serialization failures under high concurrency
- Deadlocks between refresh and concurrent DML
- Connection issues in background workers
- Permanent errors: missing columns, syntax errors in generated SQL
Suggested fix: Transient SPI errors (deadlocks, serialization failures) are
retried automatically. Permanent errors (permission denied, missing objects)
will suspend the stream table after max_consecutive_errors failures. Check
pgtrickle.check_health() for details.
SpiErrorCode
Message: SPI error [<sqlstate_code>]: <details>
Description: A PostgreSQL SPI error where the original SQLSTATE code has
been preserved for programmatic classification (SCAL-1, v0.30.0). Used when
pg_trickle.use_sqlstate_classification = true, which is the default.
Common causes: Same as SpiError above. The difference is that this
variant carries the 5-character SQLSTATE code for locale-safe retry decisions
rather than relying on English message pattern matching.
Suggested fix: Inspect the SQLSTATE code. 40001 (serialization failure)
and 40P01 (deadlock) are retried automatically. 42xxx (privilege/schema
errors) will suspend the stream table.
SpiPermissionError
Message: SPI permission error: <details>
Description: The background worker's role lacks required permissions.
Common causes:
- Missing SELECT privilege on a source table
- Missing INSERT/UPDATE/DELETE privilege on the stream table
- Role used by the background worker is not the table owner
Suggested fix: Grant the necessary privileges to the role running pg_trickle's background workers:
GRANT SELECT ON source_table TO pgtrickle_role;
GRANT ALL ON pgtrickle.my_stream_table TO pgtrickle_role;
This error does not count toward the consecutive error suspension limit.
Watermark Errors
WatermarkBackwardMovement
Message: watermark moved backward: <details>
Description: A watermark advancement was rejected because the new value is older than the current watermark, violating monotonicity.
Common causes:
- Clock skew in distributed systems
- Manual watermark manipulation with an incorrect value
- Bug in watermark tracking logic
Suggested fix: Ensure watermark values are monotonically increasing. Check
the current watermark with pgtrickle.get_watermark_groups().
WatermarkGroupNotFound
Message: watermark group not found: <details>
Description: The specified watermark group does not exist.
Common causes:
- Typo in the watermark group name
- The group was deleted or never created
Suggested fix: List existing groups with
pgtrickle.get_watermark_groups().
WatermarkGroupAlreadyExists
Message: watermark group already exists: <details>
Description: A watermark group with this name already exists.
Common causes:
- Re-running a setup script without idempotent guards
Suggested fix: Use a different name or delete the existing group first.
Transient Errors
RefreshSkipped
Message: refresh skipped: <details>
Description: A refresh was skipped because a previous refresh for the same stream table is still running.
Common causes:
- Slow refresh (large delta or complex query) overlapping with the next scheduled cycle
- Multiple manual refresh_stream_table() calls in parallel
Suggested fix: No action needed — the scheduler will retry on the next
cycle. If this happens frequently, increase the schedule interval or
investigate why refreshes are slow using pgtrickle.explain_st().
This error does not count toward the consecutive error suspension limit.
Publication Errors
PublicationAlreadyExists
Message: publication already exists for stream table: <name>
Description: pgtrickle.create_publication() was called for a stream table
that already has a downstream publication registered.
Common causes:
- Re-running publication setup without IF NOT EXISTS
- Concurrent setup in multi-process deployments
Suggested fix: Use pgtrickle.drop_publication() first if you want to
recreate it, or check the existing publication with
SELECT * FROM pgtrickle.pgt_stream_tables WHERE outbox_enabled = true.
PublicationNotFound
Message: no publication found for stream table: <name>
Description: A publication management call (e.g., drop_publication())
was made for a stream table that does not have a downstream publication.
Common causes:
- Calling drop_publication() on a stream table that never had one
- The publication was already dropped
Suggested fix: Check if the stream table has a publication before dropping
it. Use pgtrickle.list_publications() to see active publications.
SLA Errors
SlaTooSmall
Message: SLA interval too small for available tiers: <details>
Description: The requested SLA interval is smaller than the fastest available scheduling tier.
Common causes:
- Specifying a sub-second SLA (e.g., sla_seconds => 0.1) when the minimum schedule is 1 second (pg_trickle.min_schedule_seconds)
- No available tier can satisfy the requested latency budget
Suggested fix: Lower pg_trickle.min_schedule_seconds if the cluster
supports faster scheduling, or set a larger SLA interval. Check available
tiers with pgtrickle.list_sla_tiers().
CDC Errors
ChangedColsBitmaskFailed
Message: failed to build changed-columns bitmask: <details>
Description: CDC-1 (v0.24.0): The columnar change tracking system could not build the bitmask expression for changed-column detection. This indicates a table structure that prevents column-level CDC tracking.
Common causes:
- All columns of the source table are part of the primary key (no non-key columns to track changes for)
- Very wide tables exceeding the bitmask width limit
- Schema edge cases with generated or system columns
Suggested fix: Switch to whole-row change tracking with
pg_trickle.columnar_cdc = false, or restructure the source table to have
at least one non-primary-key column.
PublicationRebuildFailed
Message: publication rebuild failed: <details>
Description: CDC-2 (v0.24.0): The logical replication publication could not be rebuilt for a partitioned source table after a partition attach/detach or schema change.
Common causes:
- Insufficient privileges to manage publications
- The publication slot was already dropped
- Partition schema changed concurrently during the rebuild
Suggested fix: Ensure the pg_trickle background worker role has
CREATE PUBLICATION privilege. Check publication status with
SELECT * FROM pg_publication. If the issue persists, call
pgtrickle.reinitialize_stream_table() to force a clean rebuild.
Diagnostic Errors
DiagnosticError
Message: diagnostic error: <details>
Description: ERR-1 (v0.26.0): An error occurred inside a diagnostic or
monitoring function such as explain_refresh_mode(), source_gates(), or
watermarks(). These surface as user-visible PostgreSQL errors with context.
Common causes:
- The stream table was dropped between the diagnostic call and its execution
- Missing privileges on the internal catalog tables
- An internal consistency check found unexpected state
Suggested fix: Verify the stream table still exists. Check that the
calling role has access to pgtrickle.* catalog tables. If the error
message says "stream table not found", the table may have been dropped or
its catalog entry is corrupted — use pgtrickle.repair_stream_table().
Snapshot Errors
SnapshotAlreadyExists
Message: snapshot already exists: <name>
Description: SNAP-1 (v0.27.0): A snapshot with the given target name already exists in the snapshot catalog.
Common causes:
- Re-running snapshot creation without checking for existing snapshots
- Concurrent snapshot creation with the same name
Suggested fix: Use a unique name for each snapshot, or drop the existing
snapshot with pgtrickle.drop_snapshot() before creating a new one.
SnapshotSourceNotFound
Message: snapshot source not found: <name>
Description: SNAP-2 (v0.27.0): The stream table specified as the snapshot source was not found in the catalog.
Common causes:
- Typo in the stream table name
- The stream table was dropped before the snapshot was taken
Suggested fix: Verify the stream table name with
pgtrickle.list_stream_tables().
SnapshotSchemaVersionMismatch
Message: snapshot schema version mismatch: <details>
Description: SNAP-3 (v0.27.0): The snapshot's schema version does not match the current extension version, indicating the snapshot was taken with a different version of pg_trickle.
Common causes:
- Upgrading pg_trickle after taking a snapshot
- Restoring a snapshot from a different version of the extension
Suggested fix: Re-create the snapshot after the upgrade. Old snapshots cannot be used across major version boundaries. See Backup & Restore for migration guidance.
Outbox / pg_tide Errors
OutboxAlreadyEnabled
Message: outbox already attached for stream table: <name>
Description: v0.46.0: pgtrickle.attach_outbox() was called for a stream
table that already has an outbox registered via the pg_tide integration.
Common causes:
- Re-running outbox attachment without checking for existing configuration
- Concurrent calls to
attach_outbox()for the same stream table
Suggested fix: Check for existing outbox configuration with
SELECT * FROM pgtrickle.pgt_stream_tables WHERE outbox_enabled = true.
Use pgtrickle.detach_outbox() if you need to reconfigure.
OutboxNotEnabled
Message: outbox not attached for stream table: <name>
Description: v0.46.0: An outbox management operation was called on a stream
table that does not have a pg_tide outbox attached.
Common causes:
- Calling detach_outbox() or outbox-related functions on a stream table that never had an outbox configured
Suggested fix: Call pgtrickle.attach_outbox() first, or verify the
stream table name.
PgTideMissing
Message: attach_outbox() requires the pg_tide extension. Install it with: CREATE EXTENSION pg_tide;
Description: v0.46.0: The pg_tide extension is not installed in the
current database. The outbox/inbox functionality requires pg_tide to be
present.
Common causes:
- Calling pgtrickle.attach_outbox() before installing pg_tide
- The extension was dropped after outbox configuration
Suggested fix:
CREATE EXTENSION pg_tide;
See pg_tide on GitHub for installation instructions.
Placeholder Errors
UnresolvedPlaceholder
Message: unresolved placeholder '<token>' in SQL for <context>
Description: A41-2: A delta SQL template still contains an unresolved
__PGS_*__ or __PGT_*__ placeholder token after all substitution passes
have completed. Executing SQL with a raw token would cause an obscure
PostgreSQL syntax error; this error is raised early to give a clear,
actionable message.
Common causes:
- A source table OID or stream table ID that is referenced in the query but not present in the current refresh frontier
- A bug in the delta SQL template generation where a new placeholder type was introduced but not registered in the substitution map
- An upstream stream table was dropped while the delta SQL was cached
Suggested fix: Reinitialize the affected stream table with
pgtrickle.reinitialize_stream_table() to force a fresh template generation.
If this persists, please report the issue.
DVM Engine Errors
DiffDepthExceeded
Message: differential query depth exceeded limit of <N> levels; reduce query nesting or raise pg_trickle.max_parse_depth
Description: C-7 (v0.54.0): The diff_node() recursion depth exceeded
the configured limit during differential query generation. This prevents stack
overflows on pathologically deeply-nested queries.
Common causes:
- Queries with more than pg_trickle.max_parse_depth levels of nested subqueries, CTEs, or operator trees
- Highly chained view references that expand into deep nesting
Suggested fix: Simplify the query by reducing nesting depth. If the query
is legitimately deep, raise pg_trickle.max_parse_depth:
SET pg_trickle.max_parse_depth = 128;
Alternatively, use refresh_mode => 'FULL' to bypass the differential engine.
DiffCteCountExceeded
Message: differential query CTE count exceeded limit of <N>; simplify the query or raise pg_trickle.max_diff_ctes
Description: R-7 (v0.54.0): The number of CTEs generated during
differentiation exceeded the configured limit (pg_trickle.max_diff_ctes).
This prevents unbounded memory growth for queries that produce thousands of
intermediate CTEs.
Common causes:
- Multi-source queries with many join paths where each path generates independent delta CTEs
- Queries with many aggregation levels each requiring separate delta expressions
Suggested fix: Simplify the query or raise the CTE limit:
SET pg_trickle.max_diff_ctes = 500;
Alternatively, use refresh_mode => 'FULL'.
StSourceFrontierMissing
Message: upstream stream table (pgt_id=<id>) not found in refresh frontier; the source stream table may have been dropped — call pgtrickle.reinitialize_stream_table() to recover
Description: C-4 (v0.54.0): A stream-table-to-stream-table source frontier entry is missing from the refresh frontier, indicating the upstream stream table was dropped while a downstream stream table still references it.
Common causes:
- An upstream stream table was dropped directly (bypassing the dependency check) while a downstream stream table's delta SQL still references it
- Database restored from backup at a point before the upstream ST was recreated
Suggested fix:
SELECT pgtrickle.reinitialize_stream_table('downstream_stream_table');
If the upstream stream table was intentionally removed, drop and recreate the downstream one with an updated defining query.
Internal Errors
InternalError
Message: internal error: <details>
Description: An unexpected internal error that indicates a bug in pg_trickle.
Common causes:
- This should not happen in normal operation
Suggested fix: Please report the issue
with the full error message, your PostgreSQL version, and pg_trickle version.
Include the output of pgtrickle.check_health() and the relevant PostgreSQL
log entries.
v0.23.0 — DVM Scaling Errors
change_buffer_overflow Alert
Alert: pg_trickle_alert change_buffer_overflow
Description: A source table's change buffer exceeded the
pg_trickle.max_change_buffer_alert_rows threshold during refresh.
Common causes:
- High write rate on source tables
- Slow or blocked refresh cycles
- WAL accumulation during cross-query consistency checks
Suggested fix:
- Increase pg_trickle.max_change_buffer_alert_rows if the write rate is expected
- Check for long-running transactions blocking the refresh
- Consider increasing refresh frequency or using FULL mode for affected tables
DIFF-Slower-Than-FULL Warning
Warning: [pg_trickle] DIFF refresh for <table> took Xms vs last FULL Yms — DIFF is Nx slower
Description: Emitted when pg_trickle.log_delta_sql = on and a DIFF
refresh takes longer than the last recorded FULL refresh.
Common causes:
- Query complexity exceeds the DVM engine's O(Δ) capacity (see PERFORMANCE_COOKBOOK.md §13)
- Stale planner statistics on change buffer tables
- work_mem too low for hash joins in the delta SQL
Suggested fix:
- Check the delta SQL via pgtrickle.explain_diff_sql('<table>')
- Increase pg_trickle.delta_work_mem for the affected database
- Switch to AUTO or FULL mode for queries with known threshold-collapse patterns
See Also
Citus Distributed Tables
pg_trickle supports Citus distributed tables as sources for incremental view maintenance and as output targets for stream tables. Once configured, distribution is mostly invisible: you create stream tables exactly as you would on single-node PostgreSQL, and pg_trickle handles per-worker change capture and merging on the coordinator.
Available since v0.32.0 (sources, output targets); the fully automated per-worker scheduler arrived in v0.34.0.
This page is the canonical entry point for Citus support. The long-form reference (worker-slot lifecycle, troubleshooting, and internal architecture) lives at integrations/citus.md.
What you get
- Distributed sources. Define a stream table whose source is a Citus-distributed table. pg_trickle creates a logical replication slot on each worker, polls all slots from the coordinator via dblink, and merges the changes into the stream table's storage.
- Distributed output. Pass output_distribution_column to create_stream_table() and the resulting stream table is itself a Citus distributed table, co-located with your source shards.
- Automated scheduler. Since v0.34, the per-worker slot lifecycle (ensure_worker_slot, poll_worker_slot_changes, lease management) runs automatically — no manual wiring required.
- Shard-rebalance auto-recovery. Topology changes are detected by comparing pg_dist_node against pgt_worker_slots; stale slots are pruned and new ones inserted without operator intervention.
- Worker failure isolation. Per-worker poll failures are logged and skipped; healthy workers keep running. After pg_trickle.citus_worker_retry_ticks (default 5) consecutive failures, a WARNING is raised.
Prerequisites
- PostgreSQL 17 or 18 with wal_level = logical on every node (coordinator and workers).
- Citus 12.x or 13.x on the coordinator and all workers.
- The dblink extension on the coordinator.
- pg_trickle installed at the same version on every node.
- Each source distributed table must have REPLICA IDENTITY FULL.
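The REPLICA IDENTITY requirement can be verified from the standard catalogs — relreplident = 'f' means FULL; the schema name here is an assumption, so adjust it to where your source tables live:

```sql
-- List ordinary tables in the given schema whose replica identity
-- is NOT full ('d' default, 'n' nothing, 'i' index, 'f' full).
SELECT c.relname, c.relreplident
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE n.nspname = 'public'
  AND c.relkind = 'r'
  AND c.relreplident <> 'f';
```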
Quickstart
1. Verify prerequisites
-- Run on coordinator AND each worker:
SHOW wal_level; -- must be 'logical'
SELECT extname, extversion FROM pg_extension
WHERE extname IN ('citus', 'pg_trickle', 'dblink');
2. Create extensions on the coordinator
CREATE EXTENSION IF NOT EXISTS dblink;
CREATE EXTENSION IF NOT EXISTS pg_trickle;
3. Prepare a distributed source table
-- Distribute (or co-locate) the source
SELECT create_distributed_table('orders', 'customer_id');
-- Required for logical decoding to capture old values on UPDATE / DELETE
ALTER TABLE orders REPLICA IDENTITY FULL;
4. Create a stream table over distributed sources
SELECT pgtrickle.create_stream_table(
'order_totals',
$$SELECT customer_id, SUM(amount) AS total
FROM orders GROUP BY customer_id$$,
schedule => '5s'
);
That is it on the user side. pg_trickle:
- Detects that orders is distributed.
- Creates a per-worker logical replication slot.
- Records each slot in pgtrickle.pgt_worker_slots.
- Polls every slot on each scheduler tick via dblink.
- Merges decoded changes into the coordinator-local change buffer.
- Applies the delta to the stream table.
5. (Optional) make the output distributed too
-- (drop the stream table from step 4 first, or pick a new name)
SELECT pgtrickle.create_stream_table(
'order_totals',
$$SELECT customer_id, SUM(amount) AS total
FROM orders GROUP BY customer_id$$,
schedule => '5s',
output_distribution_column => 'customer_id'
);
The result table is now itself distributed on customer_id and
co-located with the source shards.
Observability
| Helper | Purpose |
|---|---|
| `SELECT * FROM pgtrickle.citus_status;` | Per-worker slot summary |
| `SELECT * FROM pgtrickle.pgt_worker_slots;` | Raw slot catalogue |
| `SELECT * FROM pgtrickle.check_cdc_health();` | WAL slot health (lag, status) |
| `SELECT * FROM pgtrickle.health_check();` | Whole-extension triage |
Caveats
- DDL on distributed sources is more involved than on local tables; see the long-form guide.
- Foreign keys across shards are restricted by Citus, not by pg_trickle.
- Co-location: if your stream table joins distributed tables, the join columns must be the distribution columns (a Citus requirement).
See also:
Long-form Citus reference (worker slots, lifecycle, internals) ·
CDC Modes ·
Configuration – pg_trickle.citus_* ·
CloudNativePG integration
CDC Modes
pg_trickle captures changes from source tables using Change Data Capture (CDC). Two mechanisms are available: row-level triggers and WAL-based logical decoding. Understanding both helps you choose the right setting for your workload.
Quick decision guide
| Situation | Recommended mode |
|---|---|
| Just getting started / unsure | auto (default) — triggers now, upgrades to WAL automatically |
| High-write tables where trigger overhead matters | auto or wal |
| `wal_level = logical` not available (managed PG, read replica) | trigger |
| You want strict control — no automatic transitions | trigger or wal |
| Per-table override (e.g. one hot table on WAL, rest on triggers) | Pass cdc_mode to create_stream_table |
How trigger-based CDC works
When you create a stream table, pg_trickle installs three AFTER row-level
triggers on every source table:
AFTER INSERT OR UPDATE OR DELETE FOR EACH ROW
Each trigger fires synchronously within the user's transaction and writes
one row per changed row to a buffer table (pgtrickle_changes.changes_<oid>).
The buffer row is in the same transaction as the user's change — if the
transaction rolls back, the buffer row also disappears.
User transaction:
INSERT INTO orders …
→ trigger fires
→ INSERT INTO pgtrickle_changes.changes_12345 (op, row_data)
COMMIT
│
▼
Scheduler picks up buffer rows → computes delta → refreshes stream table
Write-side cost: approximately 2–15 µs per changed row, depending on row width and table size. This is added directly to the user transaction's commit latency.
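As a rough worked example of what that overhead means for a transaction (a sketch; 8 µs is simply a midpoint of the 2–15 µs range quoted above, and the function name is hypothetical):

```python
def added_commit_latency_ms(rows_changed: int, per_row_us: float = 8.0) -> float:
    """Approximate extra commit latency from trigger-based change capture."""
    return rows_changed * per_row_us / 1000.0

# A single-row INSERT barely registers; a 10,000-row bulk load pays more:
print(added_commit_latency_ms(1))       # 0.008 ms
print(added_commit_latency_ms(10_000))  # 80.0 ms
```

This is why bulk-load-heavy workloads are the main candidates for statement-level triggers or WAL-based capture.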
How WAL-based CDC works
WAL-based CDC uses PostgreSQL's built-in logical decoding to capture changes asynchronously from the write-ahead log, eliminating trigger overhead entirely.
User transaction:
INSERT INTO orders …
COMMIT (no trigger overhead)
│
▼
WAL written to disk
│
▼
pg_trickle WAL decoder background worker
calls pg_logical_slot_get_changes()
│
▼
Decoded changes written to pgtrickle_changes.changes_<oid>
│
▼
Scheduler refreshes stream table
The change capture is decoupled from the user transaction. Users see no added latency on commits.
Trade-off: WAL decoding introduces a small additional replication lag (typically < 1 second). Changes committed by the user are visible to the stream table slightly later than with triggers.
Prerequisites for WAL-based CDC
- `wal_level = logical` in `postgresql.conf`
- Sufficient replication slots: `max_replication_slots` ≥ (number of tracked source tables) + existing slots
- Source table has `REPLICA IDENTITY DEFAULT` (primary key) or `REPLICA IDENTITY FULL`
- PostgreSQL 18.x (required for the pg_trickle extension)
The auto mode: transparent transition
The default cdc_mode = 'auto' starts with triggers and automatically upgrades
to WAL-based CDC when the prerequisites are met.
TRIGGER ──► TRANSITIONING ──► WAL
▲ │
└───────── (fallback) ──────┘
Transition lifecycle
- TRIGGER — pg_trickle installs row-level triggers on the source table.
- When `wal_level = logical` becomes available, pg_trickle starts the transition:
  - Creates a publication (`pgtrickle_cdc_<oid>`) and a replication slot (`pgtrickle_<oid>`)
  - Sets the source's CDC state to TRANSITIONING
  - Both the trigger and the WAL decoder write to the buffer (deduplication happens at refresh)
- WAL — once the WAL decoder confirms it has caught up, the trigger is dropped.
- Fallback — if the transition times out or errors (e.g. `wal_level` reverts to `replica`), the slot and publication are dropped and CDC reverts to triggers.
The transition is transparent — stream tables remain current throughout and there is no window of data loss.
Configuring CDC mode
Global setting
In postgresql.conf:
pg_trickle.cdc_mode = 'auto' # default: start with triggers, upgrade to WAL
pg_trickle.cdc_mode = 'trigger' # always use triggers; never create replication slots
pg_trickle.cdc_mode = 'wal' # require WAL; error if wal_level != logical
Apply without restart:
ALTER SYSTEM SET pg_trickle.cdc_mode = 'auto';
SELECT pg_reload_conf();
Per-stream-table override
Override for a single stream table at creation time:
SELECT pgtrickle.create_stream_table(
'public.order_totals',
$$SELECT customer_id, SUM(amount) AS total FROM orders GROUP BY customer_id$$,
cdc_mode => 'wal' -- force WAL for this table's sources only
);
Or after the fact:
SELECT pgtrickle.alter_stream_table('public.order_totals', p_cdc_mode => 'trigger');
The per-table override is stored in pgtrickle.pgt_stream_tables.requested_cdc_mode
and takes precedence over the global GUC.
Checking the current CDC mode
-- Per-stream-table CDC state for all sources
SELECT source_table, cdc_mode, pending_rows, buffer_bytes
FROM pgtrickle.change_buffer_sizes()
ORDER BY source_table;
-- Full health check including WAL lag
SELECT * FROM pgtrickle.check_cdc_health();
-- Which triggers are installed
SELECT source_table, trigger_type, trigger_name, present, enabled
FROM pgtrickle.trigger_inventory()
ORDER BY source_table;
check_cdc_health() returns one row per source table with:
| Column | Description |
|---|---|
| `source_table` | Qualified source table name |
| `cdc_mode` | Current effective mode: `trigger`, `wal`, or `transitioning` |
| `slot_lag_bytes` | WAL slot lag (NULL for trigger mode) |
| `slot_lag_warn` | `true` if lag exceeds `publication_lag_warn_bytes` |
| `alert` | Human-readable status / warning message |
Enabling WAL-based CDC
If you start with cdc_mode = 'trigger' and later want to switch to WAL:
Step 1 — Configure PostgreSQL
# postgresql.conf
wal_level = logical
max_replication_slots = 20 # allow enough slots for all tracked sources
Requires a PostgreSQL restart:
pg_ctl restart -D $PGDATA
Step 2 — Set the GUC
ALTER SYSTEM SET pg_trickle.cdc_mode = 'auto';
SELECT pg_reload_conf();
pg_trickle will automatically begin transitioning existing stream tables to WAL-based CDC over the next few scheduler ticks. No manual intervention is needed per stream table.
Step 3 — Monitor the transition
SELECT source_table, cdc_mode FROM pgtrickle.check_cdc_health();
Tables will cycle through trigger → transitioning → wal over the next
1–2 minutes depending on write volume.
Reverting to trigger-based CDC
To revert globally:
ALTER SYSTEM SET pg_trickle.cdc_mode = 'trigger';
SELECT pg_reload_conf();
pg_trickle will drop all CDC replication slots and publications on the next scheduler tick and reinstall row-level triggers. Stream tables remain current throughout — the transition is safe.
To revert a single table:
SELECT pgtrickle.alter_stream_table('public.order_totals', p_cdc_mode => 'trigger');
Trigger mode details
Statement-level vs. row-level triggers
By default, pg_trickle uses row-level AFTER triggers. On high-volume bulk
inserts (e.g. INSERT INTO orders SELECT … FROM staging), row-level triggers
fire once per row. You can switch to statement-level triggers to reduce
overhead at the cost of coarser change capture:
pg_trickle.cdc_trigger_mode = 'statement' # default: 'row'
Note: cdc_trigger_mode is ignored when WAL-based CDC is active.
REPLICA IDENTITY and triggers
Trigger-based CDC captures the full NEW and OLD row. For DELETE and
UPDATE to capture the old row values, the source table needs a primary key
or REPLICA IDENTITY FULL. Without a primary key, pg_trickle detects this
and may fall back to full refresh for affected stream tables.
WAL mode details
Replication slot naming
Each tracked source table gets its own replication slot:
pgtrickle_<source_table_oid>
And a publication:
pgtrickle_cdc_<source_table_oid>
These are internal to pg_trickle and should not be modified manually.
Slot lag management
If a subscriber (or pg_trickle itself) falls behind, the replication slot holds WAL on disk until it is consumed. This can grow unboundedly if pg_trickle is stopped for an extended period.
pg_trickle monitors slot lag and warns when it exceeds
pg_trickle.publication_lag_warn_bytes (default: 64 MB). In auto mode,
change-buffer cleanup is paused for lagging slots to prevent data loss.
If a slot grows dangerously large while pg_trickle is down, you can drop and recreate it:
-- 1. Temporarily switch to trigger mode
ALTER SYSTEM SET pg_trickle.cdc_mode = 'trigger';
SELECT pg_reload_conf();
-- 2. Manually drop the stale slot if needed
SELECT pg_drop_replication_slot('pgtrickle_12345');
-- 3. Switch back to auto (pg_trickle recreates the slot)
ALTER SYSTEM SET pg_trickle.cdc_mode = 'auto';
SELECT pg_reload_conf();
Partitioned source tables
WAL-based CDC for partitioned tables uses publish_via_partition_root = true
so that child partition changes are published under the parent table name.
This matches trigger-mode behaviour and ensures the stream table sees a
unified change stream.
If a table is converted to partitioned after CDC is set up, pg_trickle detects the inconsistency on the next health check and rebuilds the publication with the correct setting automatically.
Monitoring slot lag in Prometheus
If you use the Prometheus & Grafana integration, pg_trickle exports per-source slot lag as:
pgtrickle_replication_slot_lag_bytes{slot_name="pgtrickle_12345", source_table="orders"}
Set an alert at 80% of your disk space budget for WAL retention.
Performance comparison
| | Trigger | WAL |
|---|---|---|
| Write-side overhead | ~2–15 µs per row | Zero (async) |
| Change latency | Sub-millisecond | Up to ~1 second |
| Prerequisites | None | wal_level = logical, replication slot |
| Works on managed PG (e.g. RDS without logical replication) | Yes | No |
| Works on physical read replicas | No | No |
| Handles bulk inserts efficiently | Statement mode optional | Yes (batch decoded) |
| Replication slot disk usage | None | Yes — grows if consumer lags |
Troubleshooting
Trigger CDC: changes not appearing
- Verify triggers are installed:
  `SELECT * FROM pgtrickle.trigger_inventory() WHERE NOT present OR NOT enabled;`
- If missing, rebuild:
  `SELECT pgtrickle.rebuild_cdc_triggers('public.source_table');`
WAL CDC: slot not advancing
- Check slot lag:
  `SELECT slot_name, active, pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) AS lag_bytes FROM pg_replication_slots WHERE slot_name LIKE 'pgtrickle_%';`
- Check that the pg_trickle background worker is running:
  `SELECT * FROM pg_stat_activity WHERE application_name LIKE 'pg_trickle%';`
- Check that `pg_trickle.cdc_mode` is set to `'auto'` or `'wal'`.
Stuck in TRANSITIONING state
If a source table stays in transitioning for more than a few minutes:
SELECT source_table, cdc_mode FROM pgtrickle.check_cdc_health();
The transition has a timeout (wal_transition_timeout, default: 300 s). After
the timeout it falls back to triggers automatically. If it keeps failing:
- Check that `wal_level = logical` is still set.
- Check that `max_replication_slots` has not been exceeded.
- Force revert: `ALTER SYSTEM SET pg_trickle.cdc_mode = 'trigger'`.
See also
- Configuration: pg_trickle.cdc_mode
- Architecture Overview — CDC architecture and WAL decoder design
- Downstream Publications — expose stream table output via logical replication
- Tutorials: What Happens on INSERT — trigger-mode CDC deep dive
SLA-based Smart Scheduling
Instead of manually tuning refresh schedules, you can tell pg_trickle what your data freshness requirement is and let it figure out the rest.
Set a target — "this stream table must never be more than 10 seconds stale" — and pg_trickle assigns the right scheduling tier, monitors whether it can meet the target based on real refresh history, and alerts you before an SLA breach happens.
`set_stream_table_sla` available since v0.22.0 · `recommend_schedule` and `predicted_sla_breach` alerts available since v0.27.0
The problem with manual scheduling
A manually configured schedule like schedule => '5s' works when your source
tables are quiet, but it can easily become wrong over time:
- Source tables grow → refreshes take longer → the 5-second schedule no longer completes in time.
- A brief write surge hits → a single refresh takes 4× the normal time → the SLA is quietly broken with no warning.
- You add a complex JOIN → differential refresh cost jumps → you never notice until a user complains about stale data.
SLA-based scheduling solves this by tying the refresh schedule to an observable outcome (data freshness) instead of an assumed refresh duration.
Quickstart
Set an SLA on a stream table
SELECT pgtrickle.set_stream_table_sla('public.order_totals', interval '10 seconds');
This does two things immediately:
- Stores 10000 ms as `freshness_deadline_ms` in the catalog.
- Assigns a tier based on the SLA value (see Tier assignment).
pg_trickle will then actively monitor whether each refresh is on track to meet the target, and alert you if it predicts a breach.
Check the current SLA
SELECT pgt_name, freshness_deadline_ms, refresh_tier, staleness
FROM pgtrickle.stream_tables_info
WHERE pgt_name = 'order_totals';
Tier assignment
set_stream_table_sla maps your freshness target to one of three scheduler
tiers:
| SLA target | Tier assigned | Description |
|---|---|---|
| ≤ 5 seconds | Hot | Maximum priority; refreshes as fast as the worker pool allows |
| 6–30 seconds | Warm | Standard priority |
| > 30 seconds | Cold | Background priority; other tables take precedence |
You can still override the tier manually after setting an SLA:
-- Force to hot regardless of SLA
SELECT pgtrickle.set_stream_table_tier('public.order_totals', 'hot');
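The tier mapping in the table above amounts to a simple threshold function. A minimal sketch (illustrative Python; the function name is hypothetical — only the thresholds and tier names come from the table):

```python
def assign_tier(sla_seconds: float) -> str:
    """Map a freshness SLA (in seconds) to a scheduler tier,
    following the thresholds in the table above."""
    if sla_seconds <= 5:
        return "hot"     # maximum priority
    elif sla_seconds <= 30:
        return "warm"    # standard priority
    return "cold"        # background priority

print(assign_tier(10))   # a 10-second SLA lands in the warm tier
```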
Schedule recommendations
Once a stream table has accumulated enough refresh history, pg_trickle can recommend an optimal schedule based on observed refresh durations using a median+MAD (Median Absolute Deviation) statistical model.
Single table recommendation
SELECT pgtrickle.recommend_schedule('public.order_totals');
Returns JSONB:
{
"recommended_interval_seconds": 2.7,
"current_interval_seconds": 5.0,
"delta_pct": -46.0,
"peak_window_cron": null,
"confidence": 0.87,
"reasoning": "median=1247ms mad=183ms p95_estimate=1796ms recommended=2.7s confidence=0.87"
}
| Field | Meaning |
|---|---|
| `recommended_interval_seconds` | Suggested new schedule, with a 1.5× headroom over the p95 refresh duration |
| `current_interval_seconds` | Current configured schedule |
| `delta_pct` | How much the recommendation differs from the current schedule (negative = speed up) |
| `confidence` | 0.0–1.0; reflects how consistent refresh times are; 0.0 means insufficient history |
| `reasoning` | Human-readable explanation of how the recommendation was computed |
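The sample reasoning string implies the arithmetic behind the recommendation. A sketch (the 3×MAD p95 proxy is inferred from the sample numbers, 1247 + 3×183 = 1796, and the 1.5× headroom comes from the field table; the function name is hypothetical):

```python
def recommend_interval_seconds(median_ms: float, mad_ms: float) -> float:
    """Median+MAD schedule recommendation, per the description above."""
    p95_estimate_ms = median_ms + 3 * mad_ms       # robust p95 proxy
    return round(1.5 * p95_estimate_ms / 1000, 1)  # 1.5x headroom, in seconds

# Using the figures from the sample reasoning string:
print(recommend_interval_seconds(1247, 183))  # → 2.7
```

The MAD makes the estimate resistant to occasional outlier refreshes, which a plain mean + standard deviation would not be.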
All tables at once
SELECT name, current_interval_seconds, recommended_interval_seconds, delta_pct, confidence
FROM pgtrickle.schedule_recommendations()
ORDER BY ABS(delta_pct) DESC;
This is particularly useful for a periodic review of your entire deployment.
Sort by `delta_pct` DESC to find tables whose schedule is too aggressive (the recommendation is longer, so relaxing the schedule saves unnecessary CPU), or by `delta_pct` ASC to find tables whose schedule is too relaxed (refreshes take too long to keep the table within its SLA).
Minimum sample threshold
The planner requires at least pg_trickle.schedule_recommendation_min_samples
completed refreshes (default: 20) before computing a non-zero confidence score.
Until then, confidence = 0.0 and the recommendation reflects the last known
full refresh duration. You can lower this during initial setup:
ALTER SYSTEM SET pg_trickle.schedule_recommendation_min_samples = 10;
SELECT pg_reload_conf();
Predictive SLA breach alerts
After every refresh, the scheduler checks whether the predicted next refresh
duration will exceed the stream table's freshness_deadline_ms by more than
20%. If so, a predicted_sla_breach alert is emitted via LISTEN/NOTIFY on
the pg_trickle_alert channel.
This gives you advance warning before the breach happens — not after.
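The breach check itself is a simple threshold comparison. A sketch (illustrative Python; the function name is hypothetical — the 20% margin and the sample figures come from the text and payload above):

```python
def predicts_breach(predicted_duration_ms: float, deadline_ms: float,
                    threshold_pct: float = 20.0) -> bool:
    """True when the predicted refresh duration exceeds the freshness
    deadline by more than threshold_pct (20% per the text above)."""
    return predicted_duration_ms > deadline_ms * (1 + threshold_pct / 100)

# Matches the sample alert payload: 12800 ms vs a 10000 ms deadline (28% over)
print(predicts_breach(12800, 10000))  # → True
```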
Listening for alerts
LISTEN pg_trickle_alert;
A breach alert payload looks like:
{
"event": "predicted_sla_breach",
"stream_table": "public.order_totals",
"predicted_duration_ms": 12800,
"deadline_ms": 10000,
"overage_pct": 28.0,
"timestamp": "2025-04-23T14:32:00Z"
}
Debouncing
To avoid flooding your alerting system during a temporary spike, alerts are debounced per stream table:
pg_trickle.schedule_alert_cooldown_seconds = 300 # 5 minutes (default)
Only one predicted_sla_breach alert fires per stream table per cooldown
window, even if every refresh during that window predicts a breach.
Bridging alerts to external systems
See Monitoring & Alerting
for examples of routing pg_trickle_alert notifications to PagerDuty, Slack,
Prometheus alertmanager, and other systems.
Workflow: setting up SLA-based scheduling from scratch
1. Create the stream table with a rough initial schedule
SELECT pgtrickle.create_stream_table(
'public.order_totals',
$$SELECT customer_id, SUM(amount) AS total FROM orders GROUP BY customer_id$$,
schedule => '5s'
);
2. Let it run for a while to build history
Wait for at least 20 refreshes (typically a minute or two with a 5-second schedule):
SELECT COUNT(*) FROM pgtrickle.pgt_refresh_history
WHERE pgt_id = (SELECT pgt_id FROM pgtrickle.pgt_stream_tables WHERE pgt_name = 'order_totals')
AND status = 'COMPLETED';
3. Set an SLA
SELECT pgtrickle.set_stream_table_sla('public.order_totals', interval '8 seconds');
4. Get a data-driven recommendation
SELECT pgtrickle.recommend_schedule('public.order_totals');
5. Apply the recommendation
SELECT pgtrickle.alter_stream_table(
'public.order_totals',
p_schedule => '3s' -- use the recommended value
);
6. Monitor for predicted breaches
LISTEN pg_trickle_alert;
Or query the alert history:
SELECT event_type, stream_table, payload, created_at
FROM pgtrickle.pgt_alert_history
WHERE event_type = 'predicted_sla_breach'
ORDER BY created_at DESC
LIMIT 10;
Checking current SLA status across all tables
SELECT
pgt_name,
freshness_deadline_ms,
staleness,
CASE WHEN staleness > (freshness_deadline_ms || ' milliseconds')::interval
THEN 'BREACHED' ELSE 'OK' END AS sla_status
FROM pgtrickle.stream_tables_info
WHERE freshness_deadline_ms IS NOT NULL
ORDER BY sla_status DESC, staleness DESC;
Removing an SLA
To remove an SLA target without changing the schedule:
UPDATE pgtrickle.pgt_stream_tables
SET freshness_deadline_ms = NULL
WHERE pgt_name = 'order_totals';
No predictive breach alerts will fire after this.
When recommendations have low confidence
A low confidence score (< 0.5) means refresh durations are highly variable.
Common causes:
| Cause | Fix |
|---|---|
| Not enough history | Wait for more refreshes, or lower schedule_recommendation_min_samples |
| Highly variable write load | Widen the prediction window; consider a cron schedule for peak hours |
| Source table growing rapidly | The current schedule may already be too slow; reduce it manually |
| Mix of FULL and DIFFERENTIAL refreshes | Check that the differential threshold is tuned correctly |
See also
- Tiered Scheduling — manual tier assignment and freeze controls
- Monitoring & Alerting — full NOTIFY-based alerting setup
- Tuning Refresh Mode — when to use FULL vs. DIFFERENTIAL
- SQL Reference: set_stream_table_sla
- Configuration: schedule_recommendation_min_samples
Downstream Publications
pg_trickle can expose the live content of any stream table as a PostgreSQL logical replication publication. This lets any tool that understands PostgreSQL logical replication — Debezium, Kafka Connect, Spark Structured Streaming, a read replica, a custom consumer — subscribe to stream table changes in real time, without needing to poll the table or set up a separate CDC pipeline.
Available since v0.22.0
Why use downstream publications?
Stream tables are already the result of incremental view maintenance — every refresh produces a well-defined diff of inserted and deleted rows. Exposing that diff via logical replication means external systems get exactly the same granular change events that pg_trickle computes internally, without extra work.
| Use case | Tool |
|---|---|
| Push stream table changes to Kafka | Debezium, Kafka Connect |
| Replicate to a read replica or standby | PostgreSQL physical/logical replica |
| Build event-driven microservices | Any logical replication consumer |
| Feed a data warehouse incrementally | Spark, Flink, Airbyte |
| Archive change history | Custom WAL consumer |
How it works
When you call stream_table_to_publication, pg_trickle creates a standard
PostgreSQL publication named pgt_pub_<stream_table_name> that covers the
stream table's underlying storage table.
Stream table refresh (MERGE)
│
▼
Rows inserted / deleted in stream table storage
│
▼
PostgreSQL logical replication
│
▼
Subscribers receive INSERT / DELETE events
(standard pgoutput protocol)
The publication is owned by the same role that created the stream table.
Quickstart
Step 1 — Verify PostgreSQL is configured
Logical replication requires wal_level = logical in postgresql.conf:
SHOW wal_level;
-- Should return: logical
If it returns replica or minimal, update postgresql.conf:
wal_level = logical
Then restart PostgreSQL. You also need enough replication slots:
max_replication_slots = 10 # at least 1 per subscriber
Step 2 — Create the publication
SELECT pgtrickle.stream_table_to_publication('public.order_totals');
-- INFO: pg_trickle: created publication 'pgt_pub_order_totals' for stream table 'public.order_totals'
This creates the publication immediately. Any subscriber can connect right away.
Step 3 — Create a subscriber
PostgreSQL logical replication subscriber
-- On a downstream PostgreSQL instance:
CREATE SUBSCRIPTION order_totals_sub
CONNECTION 'host=primary port=5432 dbname=mydb user=replicator password=secret'
PUBLICATION pgt_pub_order_totals;
Debezium (via Kafka Connect)
{
"name": "order-totals-connector",
"config": {
"connector.class": "io.debezium.connector.postgresql.PostgresConnector",
"database.hostname": "primary",
"database.port": "5432",
"database.user": "replicator",
"database.password": "secret",
"database.dbname": "mydb",
"publication.name": "pgt_pub_order_totals",
"table.include.list": "public.order_totals",
"plugin.name": "pgoutput"
}
}
Kafka Connect (without Debezium)
The plain Confluent JDBC source connector does not consume logical replication; it polls the stream table directly, so the publication is not involved:
{
"name": "order-totals-source",
"config": {
"connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
"connection.url": "jdbc:postgresql://primary:5432/mydb",
"connection.user": "replicator",
"connection.password": "secret",
"table.whitelist": "order_totals",
"mode": "bulk",
"topic.prefix": "pgt_"
}
}
Checking whether a publication exists
-- Via pg_trickle catalog
SELECT pgt_name, downstream_publication_name
FROM pgtrickle.pgt_stream_tables
WHERE pgt_name = 'order_totals';
-- Via PostgreSQL catalog
SELECT pubname, puballtables, pubinsert, pubupdate, pubdelete
FROM pg_publication
WHERE pubname = 'pgt_pub_order_totals';
Monitoring subscriber lag
Slow or stalled subscribers can cause the WAL to grow unboundedly. Monitor replication slot lag:
SELECT slot_name, database, active, pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS lag
FROM pg_replication_slots
WHERE slot_name LIKE 'pgt_%'
ORDER BY restart_lsn;
pg_trickle also watches subscriber lag automatically via
pg_trickle.publication_lag_warn_bytes (v0.25.0). When a slot exceeds the
configured byte lag:
- A warning is logged.
- Change-buffer cleanup is paused for that slot until it catches up — preventing data loss for slow consumers.
Configure the threshold:
pg_trickle.publication_lag_warn_bytes = 67108864 # 64 MB
Removing a publication
SELECT pgtrickle.drop_stream_table_publication('public.order_totals');
Publications are also automatically dropped when the stream table is dropped:
SELECT pgtrickle.drop_stream_table('public.order_totals');
-- Also drops pgt_pub_order_totals
Multiple subscribers on the same publication
A single publication can support multiple subscribers (e.g. both Debezium and a PostgreSQL logical replica). Each subscriber gets its own replication slot and offset — they progress independently.
-- One publication, multiple consumers:
-- Consumer 1: Debezium → Kafka
-- Consumer 2: PostgreSQL read replica
-- Consumer 3: Spark Structured Streaming
SELECT pgtrickle.stream_table_to_publication('public.order_totals');
-- All three consumers can subscribe to pgt_pub_order_totals
Partitioned stream tables
If your stream table is backed by a partitioned source, pg_trickle
automatically sets publish_via_partition_root = true on the publication so
that child partition changes are published under the parent table's identity.
This matches the behaviour of trigger-based CDC and ensures subscribers see a
consistent stream regardless of partitioning scheme.
Permissions
The role consuming the publication needs the REPLICATION attribute (or
superuser):
CREATE ROLE replicator WITH REPLICATION LOGIN PASSWORD 'secret';
For Debezium and Kafka Connect, grant SELECT on the stream table too:
GRANT SELECT ON public.order_totals TO replicator;
Limitations
- Only one publication per stream table. Calling `stream_table_to_publication` twice returns an error. Use a single publication with multiple subscribers instead.
- `wal_level = logical` is required. This is not the default in all managed PostgreSQL providers — check your provider's documentation.
- Subscribers must be able to handle `INSERT` and `DELETE` events (stream tables do not use `UPDATE` — every change is expressed as a delete + insert pair in the logical replication stream).
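A subscriber that wants conventional update events can reconstruct them from the pairs. A sketch of such a consumer-side helper (illustrative Python; `coalesce_pairs` and the key function are hypothetical, not part of pg_trickle):

```python
def coalesce_pairs(events, key=lambda row: row["customer_id"]):
    """Fold a DELETE followed by an INSERT with the same key into one
    synthetic UPDATE, since stream tables express updates as pairs."""
    pending_deletes = {}   # key -> DELETE event awaiting a matching INSERT
    out = []
    for ev in events:
        k = key(ev["row"])
        if ev["op"] == "DELETE":
            pending_deletes[k] = ev
        elif ev["op"] == "INSERT" and k in pending_deletes:
            old = pending_deletes.pop(k)
            out.append({"op": "UPDATE", "old": old["row"], "new": ev["row"]})
        else:
            out.append(ev)
    out.extend(pending_deletes.values())   # true deletes with no matching insert
    return out
```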
Relationship to WAL-based CDC
Downstream publications are a separate feature from pg_trickle's own
WAL-based CDC mode. pg_trickle uses WAL internally (when cdc_mode = 'wal')
to capture source table changes — the downstream publication feature exposes
the output (stream table) to external consumers.
See CDC Modes for an explanation of how pg_trickle captures changes from source tables.
See also
- SQL Reference: stream_table_to_publication
- CDC Modes — WAL-based change capture for source tables
- Prometheus & Grafana integration — monitor replication lag
Transactional Outbox
The transactional outbox pattern solves the dual-write problem: how to atomically update your database and publish an event to an external system without risking inconsistency if one side fails.
pg_trickle's outbox implementation builds on top of stream tables. Every time a
stream table refresh produces a non-empty delta, a summary row is written to an
outbox table in the same transaction as the MERGE. Consumers are notified
via pg_notify the moment the commit lands.
Available since v0.28.0
How it works
Source tables (INSERT / UPDATE / DELETE)
│
▼
CDC trigger fires → pgtrickle_changes buffer
│
▼
Stream table refresh (MERGE)
│ ← same transaction ─────────────────────────────┐
▼ │
Delta rows applied to stream table outbox row written
(inserted_count / deleted_count recorded) to pgtrickle.outbox_<st>
│
pg_notify fired
│
Consumer polls / listens
The outbox row is guaranteed to exist if and only if the stream table was updated. There is no window where the stream table changes but no outbox row exists, or an outbox row exists but the stream table did not change.
Inline vs. claim-check mode
| Condition | Mode | What the consumer receives |
|---|---|---|
| `delta_rows ≤ outbox_inline_threshold_rows` (default: 1000) | Inline | Full delta serialized as JSONB in `payload` |
| `delta_rows > outbox_inline_threshold_rows` | Claim-check | `is_claim_check = true`, `payload` is NULL; delta rows in `pgtrickle.outbox_delta_rows_<st>` |
Inline mode is simpler — the consumer reads one row and gets everything. Claim-check mode avoids storing very large payloads in the outbox table, at the cost of an extra query to fetch the delta rows.
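The mode selection is a single threshold comparison. A sketch (illustrative Python; the function name is hypothetical — the default of 1000 comes from `outbox_inline_threshold_rows`):

```python
def outbox_mode(delta_rows: int, inline_threshold: int = 1000) -> str:
    """Pick the delivery mode per the table above."""
    return "inline" if delta_rows <= inline_threshold else "claim-check"

print(outbox_mode(1000))  # inline (exactly at the threshold)
print(outbox_mode(1001))  # claim-check
```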
Quickstart
1. Create a stream table
SELECT pgtrickle.create_stream_table(
'public.order_totals',
$$SELECT customer_id, SUM(amount) AS total
FROM orders
GROUP BY customer_id$$
);
2. Enable the outbox
SELECT pgtrickle.enable_outbox('public.order_totals');
This creates:
- `pgtrickle.outbox_order_totals` — outbox header table
- `pgtrickle.outbox_delta_rows_order_totals` — claim-check delta rows
- `pgtrickle.pgt_outbox_latest_order_totals` — convenience view pointing to the most recent outbox row
3. Create consumer groups
Each independent consumer needs its own group. Groups track their own offset into the outbox table so they never interfere with each other.
SELECT pgtrickle.create_consumer_group(
'shipping_service',
'public.order_totals'
);
SELECT pgtrickle.create_consumer_group(
'analytics_pipeline',
'public.order_totals'
);
4. Poll for messages
A consumer loop looks like this:
-- Claim up to 50 unprocessed rows, hold the lease for 30 seconds
SELECT * FROM pgtrickle.poll_outbox(
'public.order_totals',
'shipping_service',
batch_size => 50,
lease_seconds => 30
);
poll_outbox returns outbox rows that this consumer has not yet committed.
Each row is leased — no other worker sharing the same consumer group can claim
it until the lease expires.
5. Process and commit
After successfully processing each batch:
SELECT pgtrickle.commit_offset('shipping_service', 'public.order_totals', last_id);
last_id is the highest id value from the batch you just processed.
Committed rows are never returned by poll_outbox again.
Reading the payload
Inline mode
SELECT
id,
created_at,
inserted_count,
deleted_count,
payload -> 'inserted' AS inserted_rows,
payload -> 'deleted' AS deleted_rows
FROM pgtrickle.outbox_order_totals
ORDER BY id DESC
LIMIT 5;
Claim-check mode
-- Get the outbox row
SELECT id, is_claim_check FROM pgtrickle.pgt_outbox_latest_order_totals;
-- Fetch the actual delta rows for a claim-check outbox row
SELECT row_op, row_data
FROM pgtrickle.outbox_delta_rows_order_totals
WHERE outbox_id = <outbox_id>
ORDER BY row_num;
Multiple workers (parallel consumption)
Multiple workers in the same consumer group share the workload. pg_trickle assigns non-overlapping leases, so each row is processed by exactly one worker at a time.
-- Worker 1
SELECT * FROM pgtrickle.poll_outbox('public.order_totals', 'shipping_service');
-- Worker 2 (concurrent, gets a different batch)
SELECT * FROM pgtrickle.poll_outbox('public.order_totals', 'shipping_service');
Workers should register their presence so the system can detect dead workers:
-- Call periodically (e.g. every 30 s) while the worker is alive
SELECT pgtrickle.consumer_heartbeat('shipping_service', 'worker-1');
Workers that miss their heartbeat deadline are removed from the consumer group.
Any leases held by a dead worker expire automatically after lease_seconds,
returning those rows to the available pool.
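Dead-worker detection reduces to comparing heartbeat ages against a deadline. A sketch (illustrative Python; the function name and the 90-second default are assumptions, not documented values):

```python
def dead_workers(last_heartbeat: dict[str, float], now: float,
                 deadline_seconds: float = 90.0) -> set[str]:
    """Workers whose most recent heartbeat is older than the deadline."""
    return {w for w, t in last_heartbeat.items() if now - t > deadline_seconds}
```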
Lease management
Extending a lease
If processing is taking longer than expected:
SELECT pgtrickle.extend_lease(
'shipping_service',
'public.order_totals',
outbox_id => 42,
extra_seconds => 60
);
Seeking to a specific position
For replay or recovery scenarios:
-- Replay from the beginning
SELECT pgtrickle.seek_offset('shipping_service', 'public.order_totals', 0);
-- Skip ahead to the current tip
SELECT pgtrickle.seek_offset(
'shipping_service', 'public.order_totals',
(SELECT MAX(id) FROM pgtrickle.outbox_order_totals)
);
Monitoring
Check outbox health
SELECT pgtrickle.outbox_status('public.order_totals');
Returns JSONB:
{
"enabled": true,
"stream_table": "public.order_totals",
"outbox_table": "pgtrickle.outbox_order_totals",
"row_count": 1247,
"oldest_row": "2025-04-20T10:00:00Z",
"newest_row": "2025-04-23T14:32:00Z",
"retention_hours": 24
}
Consumer lag
-- Per consumer group
SELECT pgtrickle.consumer_lag('shipping_service', 'public.order_totals');
Returns the number of outbox rows that the consumer group has not yet committed. A large or growing lag means the consumer is falling behind.
Global outbox overview
SELECT * FROM pgtrickle.pgt_outbox_config;
Catalog tables
| Table | Contents |
|---|---|
| pgtrickle.pgt_outbox_config | One row per enabled outbox: ST OID, outbox table name, retention hours |
| pgtrickle.pgt_consumer_groups | One row per consumer group: name, stream table, created_at |
| pgtrickle.pgt_consumer_offsets | Per-group committed offsets and lease state |
| pgtrickle.outbox_<st> | Outbox header rows (auto-created per stream table) |
| pgtrickle.outbox_delta_rows_<st> | Claim-check delta rows (auto-created per stream table) |
Retention and cleanup
Outbox rows are automatically deleted after outbox_retention_hours (default:
24). Claim-check delta rows are removed when commit_offset is called or when
the retention period expires.
Configure retention per stream table at enable time:
SELECT pgtrickle.enable_outbox('public.order_totals', p_retention_hours => 48);
Or globally in postgresql.conf:
pg_trickle.outbox_retention_hours = 48
Disabling the outbox
SELECT pgtrickle.disable_outbox('public.order_totals');
This drops the outbox table, delta-rows table, and latest view, and removes the catalog entry. Consumer groups must be dropped separately:
SELECT pgtrickle.drop_consumer_group('shipping_service', 'public.order_totals');
Recommended configuration
| GUC | Recommended value | Notes |
|---|---|---|
| pg_trickle.outbox_enabled | on | Must be on for the outbox background worker to run |
| pg_trickle.outbox_retention_hours | 24–72 | Balance storage cost vs. replay window |
| pg_trickle.outbox_drain_batch_size | 500–2000 | Larger batches improve throughput |
| pg_trickle.outbox_inline_threshold_rows | 500–2000 | Tune based on typical delta size |
| pg_trickle.outbox_skip_empty_delta | on | Skip writing outbox rows when delta is empty |
| pg_trickle.consumer_cleanup_enabled | on | Auto-remove dead consumer workers |
| pg_trickle.consumer_dead_threshold_hours | 1 | Mark worker dead after 1 h of silence |
Anti-patterns
Do not poll without committing. If your consumer processes messages but never calls commit_offset, lag grows without bound and every worker restart replays the entire backlog.
Do not use a single consumer group for independent services. Each service that needs to process outbox events independently must have its own consumer group. Sharing a group means one service's consumption blocks the other's.
Do not delete outbox rows manually. Let the retention mechanism handle cleanup. Manual deletes can cause consumer group offsets to point to non-existent rows.
Do not enable the outbox on IMMEDIATE-mode stream tables. The outbox requires DIFFERENTIAL or FULL refresh mode to detect which rows changed.
See also
- Transactional Inbox — receive events from external systems
- SQL Reference: Transactional Outbox
- Configuration
- Pattern 7: Transactional Outbox
Transactional Inbox
The transactional inbox pattern solves the duplicate-processing problem: when messages arrive from an external system, your service needs a guarantee that each message is processed exactly once, even if the message broker delivers it more than once or your service restarts mid-batch.
pg_trickle's inbox works by writing incoming messages to a PostgreSQL table and using stream tables to present live views of pending work, dead letters, and per-type statistics. Because the inbox table is ordinary PostgreSQL, your application's processing step and the "mark as processed" step can be wrapped in a single transaction — making the entire operation atomic.
Available since v0.28.0
How it works
External system (Kafka / NATS / webhook / custom consumer)
│
▼
INSERT into pgtrickle.<inbox_name>
(idempotent: ON CONFLICT DO NOTHING on event_id)
│
▼
Stream tables refresh automatically:
├─ <inbox_name>_pending ← WHERE processed_at IS NULL AND retry_count < max_retries
├─ <inbox_name>_dlq ← WHERE processed_at IS NULL AND retry_count >= max_retries
└─ <inbox_name>_stats ← GROUP BY event_type (counts)
│
▼
Your application queries <inbox_name>_pending,
processes each message, then:
UPDATE <inbox_name> SET processed_at = now() WHERE event_id = $1
The stream tables are differential: when a row's processed_at is set, the
change propagates to _pending and _stats in the next refresh cycle
(typically within 1 second).
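The filters that route a message into `_pending` or `_dlq` can be sketched in Python. This is a simplified model of the predicates shown above; `classify` is a hypothetical helper, not part of pg_trickle.

```python
def classify(msg, max_retries=3):
    """Route an inbox message the way the _pending / _dlq filters do."""
    if msg["processed_at"] is not None:
        return "processed"          # leaves _pending on the next refresh
    if msg["retry_count"] >= max_retries:
        return "dlq"                # retries exhausted: visible in _dlq
    return "pending"                # still eligible for processing

print(classify({"processed_at": None, "retry_count": 0}))   # pending
print(classify({"processed_at": None, "retry_count": 3}))   # dlq
```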
Quickstart
1. Create an inbox
SELECT pgtrickle.create_inbox('order_events');
This creates:
- pgtrickle.order_events — the inbox table (one row per message)
- pgtrickle.order_events_pending — stream table: unprocessed messages
- pgtrickle.order_events_dlq — stream table: messages that exhausted retries
- pgtrickle.order_events_stats — stream table: per-event-type counts
2. Write messages (sender side)
The inbox table has a standard schema:
| Column | Type | Description |
|---|---|---|
| event_id | TEXT PK | Globally unique message ID (idempotency key) |
| event_type | TEXT | Message type / topic (e.g. order.placed) |
| source | TEXT | Originating system or service |
| aggregate_id | TEXT | Business entity ID (e.g. order ID) |
| payload | JSONB | Message body |
| received_at | TIMESTAMPTZ | Set to now() on insert |
| processed_at | TIMESTAMPTZ | Set by your application after processing |
| error | TEXT | Last error message, if any |
| retry_count | INT | Number of failed attempts |
| trace_id | TEXT | Distributed trace ID for observability |
Write messages with conflict protection to guarantee idempotency:
INSERT INTO pgtrickle.order_events
(event_id, event_type, source, aggregate_id, payload)
VALUES
('evt-001', 'order.placed', 'shop-api', 'ORD-123', '{"amount": 49.99}')
ON CONFLICT (event_id) DO NOTHING;
3. Process messages (receiver side)
-- Read pending messages
SELECT event_id, event_type, aggregate_id, payload
FROM pgtrickle.order_events_pending
LIMIT 100;
Process each message in a transaction:
BEGIN;
-- Do your business logic here
-- (e.g. publish to downstream service, update application tables)
-- Mark as processed atomically with your business logic
UPDATE pgtrickle.order_events
SET processed_at = now()
WHERE event_id = 'evt-001';
COMMIT;
If the transaction rolls back, processed_at stays NULL and the message
remains in _pending for retry.
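The same atomicity guarantee can be demonstrated with any transactional store. A minimal sketch using Python's built-in sqlite3, standing in for PostgreSQL purely to show the rollback behavior (the table and event ID are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inbox (event_id TEXT PRIMARY KEY, processed_at TEXT)")
conn.execute("INSERT INTO inbox VALUES ('evt-001', NULL)")
conn.commit()

# Simulate "do work + mark processed" in one transaction that fails mid-way.
try:
    with conn:  # opens a transaction; rolls back if the body raises
        conn.execute(
            "UPDATE inbox SET processed_at = datetime('now') "
            "WHERE event_id = 'evt-001'"
        )
        raise RuntimeError("business logic failed")
except RuntimeError:
    pass

# The rollback left processed_at NULL, so the message stays pending.
row = conn.execute(
    "SELECT processed_at FROM inbox WHERE event_id = 'evt-001'"
).fetchone()
print(row)  # (None,)
```

If the business-logic step and the UPDATE were in separate transactions, the crash would instead leave the message marked processed or processed twice, which is exactly what the inbox pattern prevents.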
Using an existing table (bring-your-own-table)
If you already have a messages table, point pg_trickle at it instead of creating a new one:
SELECT pgtrickle.enable_inbox_tracking(
'my_inbox', -- logical name
'app.incoming_events', -- your existing table
p_id_column => 'msg_id',
p_processed_at_column => 'done_at',
p_event_type_column => 'type'
);
pg_trickle validates that the required columns exist, then creates the standard stream tables on top of your table. The underlying table is not modified.
Ordering guarantees (per-aggregate)
By default, multiple workers can process messages for the same aggregate_id
concurrently. If your business logic requires strictly sequential processing per
aggregate (e.g. events for the same order must be handled in order), enable
ordering:
SELECT pgtrickle.enable_inbox_ordering(
'order_events',
p_aggregate_column => 'aggregate_id',
p_sequence_column => 'received_at'
);
This creates a fourth stream table:
- pgtrickle.next_order_events — one row per aggregate_id, always the next unprocessed message for that aggregate (DISTINCT ON semantics)
Workers that need ordered processing should query next_order_events instead
of order_events_pending:
-- Only the next message per aggregate — safe for parallel workers
SELECT event_id, event_type, aggregate_id, payload
FROM pgtrickle.next_order_events
LIMIT 50;
A worker processing aggregate_id = 'ORD-123' blocks any other message for
that order until it commits. Different aggregates are processed in parallel.
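The DISTINCT ON semantics of next_order_events can be sketched in Python. This is a simplified model: `next_per_aggregate` is a hypothetical helper, and `received_at` is reduced to an integer sequence number.

```python
def next_per_aggregate(messages):
    """Earliest unprocessed message per aggregate_id (DISTINCT ON sketch)."""
    best = {}
    for m in sorted(messages, key=lambda m: m["received_at"]):
        if m["processed_at"] is None and m["aggregate_id"] not in best:
            best[m["aggregate_id"]] = m
    return list(best.values())

msgs = [
    {"event_id": "e1", "aggregate_id": "ORD-1", "received_at": 1, "processed_at": "done"},
    {"event_id": "e2", "aggregate_id": "ORD-1", "received_at": 2, "processed_at": None},
    {"event_id": "e3", "aggregate_id": "ORD-1", "received_at": 3, "processed_at": None},
    {"event_id": "e4", "aggregate_id": "ORD-2", "received_at": 1, "processed_at": None},
]
# Only e2 (next for ORD-1) and e4 (next for ORD-2) are visible;
# e3 stays hidden until e2 is processed.
print([m["event_id"] for m in next_per_aggregate(msgs)])
```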
Checking for ordering gaps
-- Returns aggregate IDs where messages are out of sequence or missing
SELECT * FROM pgtrickle.inbox_ordering_gaps('order_events');
Priority processing
If some message types should be processed before others, enable priority scheduling:
SELECT pgtrickle.enable_inbox_priority(
'order_events',
p_priority_column => 'event_type',
p_priority_map => '{"order.cancelled": 1, "order.placed": 2, "order.shipped": 3}'::jsonb
);
Lower priority values are processed first. Messages without an entry in the priority map default to priority 999 (processed last).
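The priority ordering can be sketched in Python (hypothetical helper; the default of 999 mirrors the behavior described above):

```python
DEFAULT_PRIORITY = 999  # types absent from the map are processed last

def by_priority(messages, priority_map):
    """Order messages by mapped priority; unmapped types sort last."""
    return sorted(
        messages,
        key=lambda m: priority_map.get(m["event_type"], DEFAULT_PRIORITY),
    )

pmap = {"order.cancelled": 1, "order.placed": 2, "order.shipped": 3}
msgs = [{"event_type": t} for t in ["order.shipped", "order.other", "order.cancelled"]]
print([m["event_type"] for m in by_priority(msgs, pmap)])
# cancellations first, unmapped types last
```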
Multi-worker partitioning
When many workers process the same inbox concurrently, you can partition the workload by aggregate ID using consistent hashing:
-- Worker 0 of 4: only process messages assigned to partition 0
SELECT event_id, aggregate_id, payload
FROM pgtrickle.order_events_pending
WHERE pgtrickle.inbox_is_my_partition('order_events', aggregate_id, 0, 4);
-- Worker 1 of 4
SELECT event_id, aggregate_id, payload
FROM pgtrickle.order_events_pending
WHERE pgtrickle.inbox_is_my_partition('order_events', aggregate_id, 1, 4);
The hash function is deterministic — the same aggregate_id always maps to
the same partition — so you can scale the worker pool without rebalancing.
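A sketch of the idea in Python. The hash function pg_trickle uses internally is not documented here; SHA-256 below is an assumption chosen only because it is stable across processes (unlike Python's built-in `hash`).

```python
import hashlib

def partition_for(aggregate_id, num_partitions):
    """Deterministic partition assignment: same aggregate_id, same slot."""
    digest = hashlib.sha256(aggregate_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions

# A worker with index k of n processes only rows where
# partition_for(aggregate_id, n) == k.
print(partition_for("ORD-123", 4))
print(partition_for("ORD-123", 4))  # always the same value
```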
Dead-letter queue
Messages that exceed max_retries (default: 3) are automatically visible in
the DLQ stream table:
-- View dead letters
SELECT event_id, event_type, aggregate_id, error, retry_count
FROM pgtrickle.order_events_dlq
ORDER BY received_at;
Replaying DLQ messages
After fixing the root cause:
-- Reset retry count so the message is picked up again
SELECT pgtrickle.replay_inbox_messages(
'order_events',
p_event_ids => ARRAY['evt-001', 'evt-002']
);
-- Or replay all DLQ messages of a specific type
SELECT pgtrickle.replay_inbox_messages(
'order_events',
p_event_type => 'order.placed'
);
Monitoring
Health check
SELECT pgtrickle.inbox_health('order_events');
Returns a JSONB object:
{
"inbox": "order_events",
"pending_count": 42,
"dlq_count": 3,
"oldest_pending_age_seconds": 12,
"throughput_per_minute": 180,
"status": "healthy"
}
A status of "degraded" means the DLQ count or pending age is above
configured thresholds.
Detailed status
SELECT pgtrickle.inbox_status('order_events');
Returns richer JSONB including processing rates, error breakdown, and stream table refresh counts.
Global inbox overview
SELECT * FROM pgtrickle.pgt_inbox_config;
Catalog tables
| Table | Contents |
|---|---|
| pgtrickle.pgt_inbox_config | One row per inbox: name, schema, max_retries, schedule |
| pgtrickle.pgt_inbox_ordering_config | Ordering settings per inbox |
| pgtrickle.pgt_inbox_priority_config | Priority map per inbox |
| pgtrickle.<name> | The inbox message table (auto-created) |
| pgtrickle.<name>_pending | Stream table: unprocessed messages |
| pgtrickle.<name>_dlq | Stream table: dead letters |
| pgtrickle.<name>_stats | Stream table: per-event-type counts |
| pgtrickle.next_<name> | Stream table: next message per aggregate (ordering only) |
Retention and cleanup
Processed messages are automatically deleted after inbox_processed_retention_hours
(default: 72). DLQ rows are held for inbox_dlq_retention_hours (default: 168
= 7 days) to give operators time to inspect and replay them.
Configure globally in postgresql.conf:
pg_trickle.inbox_processed_retention_hours = 72
pg_trickle.inbox_dlq_retention_hours = 168
Dropping an inbox
-- Drop the inbox and its stream tables, but keep the underlying table
SELECT pgtrickle.drop_inbox('order_events');
-- Drop everything including the backing table
SELECT pgtrickle.drop_inbox('order_events', p_cascade => true);
Recommended configuration
| GUC | Recommended value | Notes |
|---|---|---|
| pg_trickle.inbox_enabled | on | Must be on for inbox background workers to run |
| pg_trickle.inbox_processed_retention_hours | 24–72 | Adjust based on audit requirements |
| pg_trickle.inbox_dlq_retention_hours | 168 | Keep DLQ items for at least 7 days |
| pg_trickle.inbox_drain_batch_size | 500–2000 | Tune for throughput vs. latency |
| pg_trickle.inbox_dlq_alert_max_per_refresh | 100 | Alert when DLQ grows rapidly |
Anti-patterns
Do not mark messages as processed outside a transaction with your business logic. The atomic combination of "do work + mark processed" is what prevents duplicate processing. If you process first and then mark processed in a separate transaction, a crash between the two steps causes duplicate processing.
Do not share a single inbox across unrelated services. Each service should have its own inbox so they can fail, replay, and scale independently.
Do not ignore the DLQ. A growing DLQ is a signal that something is
consistently broken. Set up an alert on inbox_dlq_alert_max_per_refresh and
review DLQ items regularly.
Do not delete inbox rows manually. Let the retention mechanism handle cleanup. Manual deletes can confuse the stream table refresh cycle.
See also
- Transactional Outbox — publish events from your database to external systems
- SQL Reference: Transactional Inbox
- Configuration
- Pattern 8: Transactional Inbox
What Happens When You INSERT a Row?
This tutorial traces the complete lifecycle of a single INSERT statement on a base table that is referenced by a stream table — from the moment the row is written to the moment the stream table reflects the change.
Setup: A Real-World Example
Suppose you run an e-commerce platform. You have an orders table and a stream table that maintains a running total per customer:
-- Base table
CREATE TABLE orders (
id SERIAL PRIMARY KEY,
customer TEXT NOT NULL,
amount NUMERIC(10,2) NOT NULL
);
-- Stream table: always-fresh customer totals
SELECT pgtrickle.create_stream_table(
name => 'customer_totals',
query => $$
SELECT customer, SUM(amount) AS total, COUNT(*) AS order_count
FROM orders GROUP BY customer
$$,
schedule => '1m' -- refresh when data is staler than 1 minute
-- refresh_mode defaults to 'AUTO' (differential with full-refresh fallback)
);
After creation, customer_totals is a real PostgreSQL table:
SELECT * FROM customer_totals;
-- (empty — no orders yet)
Phase 1: The INSERT
A new order arrives:
INSERT INTO orders (customer, amount) VALUES ('alice', 49.99);
What happens inside PostgreSQL
When create_stream_table() was called, pg_trickle installed an AFTER INSERT OR UPDATE OR DELETE trigger on the orders table. The trigger fires transparently as part of the user's INSERT statement.
The trigger function (pgtrickle_changes.pg_trickle_cdc_fn_<oid>()) executes inside the same transaction as the INSERT and writes a single row into the change buffer table:
pgtrickle_changes.changes_16384 (where 16384 = orders table OID)
┌───────────┬─────────────┬────────┬─────────┬──────────┬──────────┬────────────┐
│ change_id │ lsn │ action │ pk_hash │ new_id │ new_cust │ new_amount │
├───────────┼─────────────┼────────┼─────────┼──────────┼──────────┼────────────┤
│ 1 │ 0/1A3F2B80 │ I │ -837291 │ 1 │ alice │ 49.99 │
└───────────┴─────────────┴────────┴─────────┴──────────┴──────────┴────────────┘
Key details:
- lsn: The current WAL Log Sequence Number (pg_current_wal_lsn()), used to bound which changes belong to which refresh cycle.
- action: 'I' for INSERT, 'U' for UPDATE, 'D' for DELETE.
- pk_hash: A pre-computed hash of the primary key (orders.id), used later for efficient row matching.
- new_* columns: The actual column values from NEW, stored as native PostgreSQL types (not JSONB). There are no old_* values for INSERTs.
The only overhead the trigger adds to the user's transaction is this single INSERT into the buffer table. There is no JSONB serialization, no logical replication slot, and no external process involved.
Phase 2: The Scheduler Wakes Up
A background worker called the scheduler runs inside PostgreSQL (registered via shared_preload_libraries). It wakes up every pg_trickle.scheduler_interval_ms milliseconds (default: 1000ms) and performs a tick:
- Rebuild the DAG (if any stream tables were created/dropped since last tick) — a dependency graph of all stream tables and their source tables.
- Topological sort — determine the refresh order so that stream tables depending on other stream tables are refreshed after their dependencies.
- For each stream table, check: has its staleness exceeded its schedule?
For customer_totals with a '1m' schedule, the scheduler compares:
- now() minus data_timestamp (the freshness watermark from the last refresh)
- Against the schedule: 60 seconds
If more than 60 seconds have elapsed and the stream table isn't already being refreshed, the scheduler begins a refresh.
Phase 3: Frontier Advancement
Before executing the refresh, the scheduler creates a new frontier — a snapshot of how far to read changes from each source table:
Previous frontier: { orders(16384): lsn = 0/1A3F2A00 }
New frontier: { orders(16384): lsn = 0/1A3F2C00 }
The frontier is a DBSP-inspired version vector. Each source table has its own LSN cursor. The refresh will process all changes in the buffer table where lsn > previous_frontier_lsn AND lsn <= new_frontier_lsn.
This means:
- Changes committed before the previous refresh are already reflected.
- Changes committed after the new frontier will be picked up in the next cycle.
- The INSERT we made (lsn = 0/1A3F2B80) falls within this window.
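The window predicate can be sketched in Python, modeling LSNs as plain integers (a simplification; real LSNs are pg_lsn values):

```python
def changes_in_window(changes, prev_lsn, new_lsn):
    """Select buffer rows with prev_lsn < lsn <= new_lsn."""
    return [c for c in changes if prev_lsn < c["lsn"] <= new_lsn]

# The hex values mirror the LSNs used in this walkthrough.
buf = [
    {"lsn": 0x1A3F2900},  # before the previous frontier: already applied
    {"lsn": 0x1A3F2B80},  # our INSERT: inside the window
    {"lsn": 0x1A3F2D00},  # after the new frontier: next cycle
]
win = changes_in_window(buf, 0x1A3F2A00, 0x1A3F2C00)
print([hex(c["lsn"]) for c in win])  # only the middle change qualifies
```

The half-open interval (exclusive lower bound, inclusive upper bound) is what guarantees each change is consumed exactly once across consecutive refreshes.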
Phase 4: Change Detection — Is There Anything to Do?
Before running the full delta query, the scheduler runs a short-circuit check: does the change buffer actually have any rows in the LSN window?
SELECT count(*)::bigint FROM (
SELECT 1 FROM pgtrickle_changes.changes_16384
WHERE lsn > '0/1A3F2A00'::pg_lsn
AND lsn <= '0/1A3F2C00'::pg_lsn
LIMIT <threshold>
) __pgt_capped
This query also checks the adaptive threshold: if the number of changes exceeds a percentage of the source table size (default: 10%), the scheduler falls back to a FULL refresh instead of DIFFERENTIAL, because applying thousands of individual deltas would be slower than a bulk reload.
For our single INSERT, the count is 1 — well below the threshold. The scheduler proceeds with a DIFFERENTIAL refresh.
Phase 5: Delta Query Generation (DVM Engine)
This is where the Differential View Maintenance (DVM) engine does its work. The defining query:
SELECT customer, SUM(amount) AS total, COUNT(*) AS order_count
FROM orders GROUP BY customer
is parsed into an operator tree:
Aggregate(GROUP BY customer, SUM(amount), COUNT(*))
└── Scan(orders)
The DVM engine differentiates each operator — converting it from "compute the full result" to "compute only what changed":
Step 1: Differentiate the Scan
The Scan(orders) operator becomes a read from the change buffer:
-- Reads only changes in the LSN window, splitting UPDATEs into DELETE+INSERT
WITH __pgt_raw AS (
SELECT c.pk_hash, c.action,
c."new_customer", c."old_customer",
c."new_amount", c."old_amount"
FROM pgtrickle_changes.changes_16384 c
WHERE c.lsn > '0/1A3F2A00'::pg_lsn
AND c.lsn <= '0/1A3F2C00'::pg_lsn
)
-- INSERT rows: take new_* values
SELECT pk_hash AS __pgt_row_id, 'I' AS __pgt_action,
"new_customer" AS customer, "new_amount" AS amount
FROM __pgt_raw WHERE action IN ('I', 'U')
UNION ALL
-- DELETE rows: take old_* values
SELECT pk_hash AS __pgt_row_id, 'D' AS __pgt_action,
"old_customer" AS customer, "old_amount" AS amount
FROM __pgt_raw WHERE action IN ('D', 'U')
For our single INSERT, this produces:
__pgt_row_id | __pgt_action | customer | amount
-------------|--------------|----------|-------
-837291 | I | alice | 49.99
Step 2: Differentiate the Aggregate
The Aggregate differentiation is the heart of incremental maintenance. Instead of re-computing SUM(amount) over the entire orders table, it computes:
-- Delta for SUM: add new values, subtract deleted values
SELECT customer,
SUM(CASE WHEN __pgt_action = 'I' THEN amount
WHEN __pgt_action = 'D' THEN -amount END) AS total,
SUM(CASE WHEN __pgt_action = 'I' THEN 1
WHEN __pgt_action = 'D' THEN -1 END) AS order_count,
pgtrickle.pg_trickle_hash(customer::text) AS __pgt_row_id,
'I' AS __pgt_action
FROM <scan_delta>
GROUP BY customer
For our INSERT of ('alice', 49.99), this yields:
customer | total | order_count | __pgt_row_id | __pgt_action
---------|--------|-------------|--------------|-------------
alice | +49.99 | +1 | 7283194 | I
The stream table uses reference counting: it tracks __pgt_count (how many source rows contribute to each group). When __pgt_count reaches 0, the group row is deleted.
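The reference-counting behavior can be sketched in Python. This is a simplified model: `apply_events` is a hypothetical helper, and `count` stands in for `__pgt_count`.

```python
def apply_events(groups, events):
    """Apply scan-delta events to reference-counted aggregate state.

    groups: {key: {"total": float, "count": int}}; events: (action, key, amount).
    """
    for action, key, amount in events:
        g = groups.setdefault(key, {"total": 0.0, "count": 0})
        sign = 1 if action == "I" else -1   # I adds, D subtracts
        g["total"] += sign * amount
        g["count"] += sign
        if g["count"] == 0:                 # no contributing rows left
            del groups[key]                 # the group row is deleted
    return groups

g = apply_events({}, [("I", "alice", 49.99)])
print(g)  # alice appears with count 1
# An UPDATE arrives as D(old) + I(new); the group's value nets out:
g = apply_events(g, [("D", "alice", 49.99), ("I", "alice", 59.99)])
print(g)  # total 59.99, count still 1
# Deleting the last order removes the group entirely:
g = apply_events(g, [("D", "alice", 59.99)])
print(g)  # {}
```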
Phase 6: MERGE Into the Stream Table
The delta is applied to the customer_totals storage table using a single SQL MERGE statement:
MERGE INTO public.customer_totals AS st
USING (<delta_query>) AS d
ON st.__pgt_row_id = d.__pgt_row_id
WHEN MATCHED AND d.__pgt_action = 'D' THEN DELETE
WHEN MATCHED AND d.__pgt_action = 'I' THEN
UPDATE SET customer = d.customer, total = d.total, order_count = d.order_count
WHEN NOT MATCHED AND d.__pgt_action = 'I' THEN
INSERT (__pgt_row_id, customer, total, order_count)
VALUES (d.__pgt_row_id, d.customer, d.total, d.order_count)
Since alice didn't exist before, this is a NOT MATCHED → INSERT. The stream table now contains:
SELECT * FROM customer_totals;
customer | total | order_count
----------|-------|------------
alice | 49.99 | 1
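The three MERGE arms can be sketched as a dictionary update in Python (a toy model keyed on `__pgt_row_id`; `merge` is a hypothetical helper, not pg_trickle code):

```python
def merge(st, delta):
    """Mimic the MERGE: match on __pgt_row_id, then delete/update/insert."""
    for row in delta:
        rid, action = row["__pgt_row_id"], row["__pgt_action"]
        values = {k: v for k, v in row.items() if not k.startswith("__pgt_")}
        if rid in st and action == "D":
            del st[rid]              # WHEN MATCHED AND action = 'D' THEN DELETE
        elif rid in st and action == "I":
            st[rid] = values         # WHEN MATCHED AND action = 'I' THEN UPDATE
        elif rid not in st and action == "I":
            st[rid] = values         # WHEN NOT MATCHED AND action = 'I' THEN INSERT
    return st

delta = [{"__pgt_row_id": 7283194, "__pgt_action": "I",
          "customer": "alice", "total": 49.99, "order_count": 1}]
st = merge({}, delta)   # alice did not exist: NOT MATCHED -> INSERT
print(st)
```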
Phase 7: Cleanup and Bookkeeping
After the MERGE succeeds:
- Consumed changes are deleted from the buffer table:
  DELETE FROM pgtrickle_changes.changes_16384
  WHERE lsn > '0/1A3F2A00'::pg_lsn AND lsn <= '0/1A3F2C00'::pg_lsn
- The frontier is saved to the catalog as JSONB, so the next refresh knows where to start.
- The refresh is recorded in pgtrickle.pgt_refresh_history:
  refresh_id | pgt_id | action       | rows_inserted | rows_deleted | delta_row_count | status    | initiated_by
  1          | 1      | DIFFERENTIAL | 1             | 0            | 1               | COMPLETED | SCHEDULER
  The delta_row_count column (new in v0.2.0) records the total number of change buffer rows consumed during this refresh cycle.
- The data timestamp on the stream table is advanced, resetting the staleness clock.
- The MERGE template is cached in thread-local storage. The next refresh for this stream table skips SQL parsing, operator tree construction, and differentiation — it only substitutes LSN values into the cached template. This saves ~45ms per refresh cycle.
What About UPDATE and DELETE?
UPDATE
UPDATE orders SET amount = 59.99 WHERE id = 1;
The trigger writes a single row with action = 'U', capturing both OLD and NEW values:
action | new_amount | old_amount | new_customer | old_customer
-------|------------|------------|--------------|-------------
U | 59.99 | 49.99 | alice | alice
The scan differentiation splits this into:
- DELETE old: (alice, 49.99) with action 'D'
- INSERT new: (alice, 59.99) with action 'I'
The aggregate differentiation computes: +59.99 - 49.99 = +10.00 for alice's total. The MERGE updates the existing row.
DELETE
DELETE FROM orders WHERE id = 1;
The trigger writes action = 'D' with the OLD values. The aggregate differentiation computes -49.99 for the total and -1 for the count. If the __pgt_count reaches 0 (no more orders for alice), the MERGE deletes alice's row from the stream table entirely.
Performance: Why This Is Fast
| Step | What it avoids |
|---|---|
| Trigger-based CDC | No logical replication slot, no WAL parsing, no external process |
| Typed columns | No JSONB serialization in the trigger, no jsonb_populate_record in the delta query |
| Pre-computed pk_hash | No per-row hash computation during the delta query |
| LSN-bounded reads | Index scan on the change buffer, not a full table scan |
| Algebraic differentiation | Processes only changed rows — O(changes) not O(table size) |
| MERGE statement | Single SQL round-trip for all inserts, updates, and deletes |
| Cached templates | After the first refresh, delta SQL generation is skipped entirely |
| Adaptive fallback | Automatically switches to FULL refresh when changes exceed a threshold |
For a table with 10 million rows and 100 changed rows, a DIFFERENTIAL refresh processes only those 100 rows. A FULL refresh would need to scan all 10 million.
What About IMMEDIATE Mode?
Everything described above applies to the default AUTO mode — changes accumulate in a buffer and are applied on a schedule using differential (delta-only) maintenance. As of v0.2.0, pg_trickle also supports IMMEDIATE mode, which takes a fundamentally different path.
With IMMEDIATE mode, there are no change buffers, no scheduler, and no waiting:
SELECT pgtrickle.create_stream_table(
name => 'customer_totals_live',
query => $$
SELECT customer, SUM(amount) AS total, COUNT(*) AS order_count
FROM orders GROUP BY customer
$$,
refresh_mode => 'IMMEDIATE'
);
How IMMEDIATE Mode Differs for INSERT
| Phase | DIFFERENTIAL | IMMEDIATE |
|---|---|---|
| Trigger type | Row-level AFTER trigger | Statement-level AFTER trigger with REFERENCING NEW TABLE |
| What's captured | One buffer row per INSERT | A transition table containing all inserted rows |
| When delta runs | Next scheduler tick (up to schedule bound) | Immediately, in the same transaction |
| Delta source | Change buffer table (pgtrickle_changes.*) | Temp table copied from transition table |
| Concurrency | No locking between writers | Advisory lock per stream table |
When you run INSERT INTO orders ...:
- A BEFORE INSERT statement-level trigger acquires an advisory lock on the stream table
- The AFTER INSERT trigger captures the transition table (
NEW TABLE AS __pgt_newtable) into a temp table - The DVM engine generates the same delta query, but reads from the temp table instead of the change buffer
- The delta is applied to the stream table via INSERT/DELETE DML (not MERGE)
- The stream table is immediately up-to-date — within the same transaction
BEGIN;
INSERT INTO orders (customer, amount) VALUES ('alice', 49.99);
-- customer_totals_live already shows alice with total=49.99 here!
SELECT * FROM customer_totals_live;
COMMIT;
The delta SQL template is cached per (pgt_id, source_oid, has_new, has_old) combination, so subsequent trigger invocations skip query parsing entirely.
Next in This Series
- What Happens When You UPDATE a Row? — D+I split, group key changes, net-effect for multiple UPDATEs
- What Happens When You DELETE a Row? — Reference counting, group deletion, INSERT+DELETE cancellation
- What Happens When You TRUNCATE a Table? — Why TRUNCATE bypasses triggers and how to recover
What Happens When You UPDATE a Row?
This tutorial traces what happens when an UPDATE statement hits a base table that is referenced by a stream table. It covers the trigger capture, the scan-level decomposition into DELETE + INSERT, and how each DVM operator propagates the change — including cases where the group key changes, where JOINs are involved, and where multiple UPDATEs happen within a single refresh window.
Prerequisite: Read WHAT_HAPPENS_ON_INSERT.md first — it introduces the full 7-phase lifecycle. This tutorial focuses on how UPDATE differs.
Setup
Same e-commerce example:
CREATE TABLE orders (
id SERIAL PRIMARY KEY,
customer TEXT NOT NULL,
amount NUMERIC(10,2) NOT NULL
);
SELECT pgtrickle.create_stream_table(
name => 'customer_totals',
query => $$
SELECT customer, SUM(amount) AS total, COUNT(*) AS order_count
FROM orders GROUP BY customer
$$,
schedule => '1m'
);
-- Seed some data
INSERT INTO orders (customer, amount) VALUES
('alice', 49.99),
('alice', 30.00),
('bob', 75.00);
After the first refresh, the stream table contains:
customer | total | order_count
---------|-------|------------
alice | 79.99 | 2
bob | 75.00 | 1
Case 1: Simple Value UPDATE (Same Group Key)
UPDATE orders SET amount = 59.99 WHERE id = 1;
Alice's first order changes from 49.99 to 59.99. The customer (group key) stays the same.
Phase 1: Trigger Capture
The AFTER UPDATE trigger fires and writes one row to the change buffer with both OLD and NEW values:
pgtrickle_changes.changes_16384
┌───────────┬─────────────┬────────┬──────────┬──────────┬────────────┬──────────┬────────────┐
│ change_id │ lsn │ action │ new_cust │ new_amt │ old_cust │ old_amt │ pk_hash │
├───────────┼─────────────┼────────┼──────────┼──────────┼────────────┼──────────┼────────────┤
│ 4 │ 0/1A3F3000 │ U │ alice │ 59.99 │ alice │ 49.99 │ -837291 │
└───────────┴─────────────┴────────┴──────────┴──────────┴────────────┴──────────┴────────────┘
Key difference from INSERT: the trigger writes both new_* and old_* columns. The pk_hash is computed from NEW.id.
Phase 2–4: Scheduler, Frontier, Change Detection
Identical to the INSERT flow. The scheduler detects one change row in the LSN window.
Phase 5: Scan Differentiation — The U → D+I Split
This is where UPDATE handling diverges fundamentally. The scan delta operator decomposes the UPDATE into two events:
__pgt_row_id | __pgt_action | customer | amount
-------------|--------------|----------|-------
-837291 | D | alice | 49.99 ← old values (DELETE)
-837291 | I | alice | 59.99 ← new values (INSERT)
Why split into D+I? This is a core IVM principle. Downstream operators (aggregates, joins, filters) don't have special "update" logic — they only understand insertions and deletions. By decomposing the UPDATE:
- The DELETE event subtracts the old values from running aggregates
- The INSERT event adds the new values
This algebraic approach handles arbitrary operator trees without operator-specific update logic.
Phase 5 (continued): Aggregate Differentiation
The aggregate operator processes both events against the alice group:
-- DELETE event: subtract old values
alice: total += CASE WHEN action='D' THEN -49.99 END → -49.99
alice: count += CASE WHEN action='D' THEN -1 END → -1
-- INSERT event: add new values
alice: total += CASE WHEN action='I' THEN +59.99 END → +59.99
alice: count += CASE WHEN action='I' THEN +1 END → +1
Net effect on alice's group:
total delta: -49.99 + 59.99 = +10.00
count delta: -1 + 1 = 0
The aggregate emits this as an INSERT (because the group still exists and its value changed):
customer | total | order_count | __pgt_row_id | __pgt_action
---------|--------|-------------|--------------|-------------
alice | +10.00 | 0 | 7283194 | I
Phase 6: MERGE
The MERGE replaces the matched row; it does not add deltas to it. To make that work, the aggregate delta query computes the new absolute values by combining the stored state (including the reference count) with the delta before the MERGE runs:
COALESCE(existing.total, 0) + delta.total → 79.99 + 10.00 = 89.99
COALESCE(existing.__pgt_count, 0) + delta.__pgt_count → 2 + 0 = 2
The WHEN MATCHED AND action = 'I' arm of the MERGE then writes these final values into alice's row.
Result:
SELECT * FROM customer_totals;
customer | total | order_count
----------|-------|------------
alice | 89.99 | 2 ← was 79.99
bob | 75.00 | 1
Case 2: Group Key Change (Customer Reassignment)
UPDATE orders SET customer = 'bob' WHERE id = 2;
Alice's second order (amount=30.00) is reassigned to Bob. The group key itself changes.
Trigger Capture
change_id | lsn | action | new_cust | new_amt | old_cust | old_amt | pk_hash
5 | 0/1A3F3100 | U | bob | 30.00 | alice | 30.00 | 4521038
The old and new customer values differ.
Scan Delta: D+I Split
__pgt_row_id | __pgt_action | customer | amount
-------------|--------------|----------|-------
4521038 | D | alice | 30.00 ← removes from alice's group
4521038 | I | bob | 30.00 ← adds to bob's group
Aggregate Delta
The aggregate groups by customer, so the DELETE and INSERT land in different groups:
Group "alice":
total delta: -30.00
count delta: -1
Group "bob":
total delta: +30.00
count delta: +1
After MERGE
SELECT * FROM customer_totals;
customer | total | order_count
----------|--------|------------
alice | 59.99 | 1 ← lost one order (-30.00)
bob | 105.00 | 2 ← gained one order (+30.00)
This is why the D+I decomposition is essential. Without it, you'd need special "move between groups" logic. With it, the standard aggregate differentiation handles group key changes naturally.
Case 3: UPDATE That Deletes a Group
-- Alice only has one order left. Reassign it to bob.
UPDATE orders SET customer = 'bob' WHERE id = 1;
Aggregate Delta
Group "alice":
total delta: -59.99
count delta: -1
new __pgt_count: 1 - 1 = 0 → group vanishes!
Group "bob":
total delta: +59.99
count delta: +1
When __pgt_count reaches 0, the aggregate emits a DELETE for alice's group:
customer | total | __pgt_row_id | __pgt_action
---------|-------|--------------|-------------
alice | — | 7283194 | D ← group removed
bob | ... | 9182734 | I ← group updated
The MERGE deletes alice's row entirely:
SELECT * FROM customer_totals;
customer | total | order_count
----------|--------|------------
bob | 165.00 | 3
Case 4: Multiple UPDATEs on the Same Row (Within One Refresh Window)
What if a row is updated multiple times before the next refresh?
UPDATE orders SET amount = 10.00 WHERE id = 3; -- bob: 75 → 10
UPDATE orders SET amount = 20.00 WHERE id = 3; -- bob: 10 → 20
UPDATE orders SET amount = 30.00 WHERE id = 3; -- bob: 20 → 30
The change buffer now has 3 rows for pk_hash of order #3:
change_id | action | old_amt | new_amt
6 | U | 75.00 | 10.00
7 | U | 10.00 | 20.00
8 | U | 20.00 | 30.00
Net-Effect Computation
The scan delta uses a split fast-path design. Since order #3 has multiple changes (cnt > 1), it takes the multi-change path with window functions:
FIRST_VALUE(action) OVER (PARTITION BY pk_hash ORDER BY change_id) → 'U'
LAST_VALUE(action) OVER (...) → 'U'
Both first and last actions are 'U', so:
- DELETE: emits using old values from the earliest change (change_id=6): old_amt = 75.00
- INSERT: emits using new values from the latest change (change_id=8): new_amt = 30.00
Net delta:
__pgt_row_id | __pgt_action | amount
-------------|--------------|-------
pk_hash_3 | D | 75.00 ← original value before all changes
pk_hash_3 | I | 30.00 ← final value after all changes
The aggregate sees -75.00 + 30.00 = -45.00. This is correct regardless of the intermediate values. The intermediate rows (10.00, 20.00) are never seen.
Case 5: INSERT + UPDATE in Same Window
INSERT INTO orders (customer, amount) VALUES ('charlie', 100.00);
UPDATE orders SET amount = 200.00 WHERE customer = 'charlie';
Both happen before the next refresh. The buffer has:
change_id | action | old_amt | new_amt
9 | I | NULL | 100.00
10 | U | 100.00 | 200.00
Net-effect analysis:
- first_action = 'I' (row didn't exist before this window)
- last_action = 'U' (row exists after)
Result:
- No DELETE emitted (first_action = 'I' means the row was born in this window)
- INSERT with final values:
(charlie, 200.00)
The aggregate sees a pure insertion of (charlie, 200.00) — the intermediate value of 100.00 never appears.
Case 6: UPDATE + DELETE in Same Window
UPDATE orders SET amount = 999.99 WHERE id = 3;
DELETE FROM orders WHERE id = 3;
Net-effect:
- first_action = 'U' (row existed before)
- last_action = 'D' (row no longer exists)
Result:
- DELETE with original old values from the first change
- No INSERT (last_action = 'D')
The aggregate correctly sees only a removal.
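Cases 4 through 6 all apply one filtering rule based on the first and last action for a primary key. A minimal Python model of that rule (the `net_effect` function is invented for illustration and mirrors the FIRST_VALUE/LAST_VALUE logic, not the extension's actual SQL):

```python
def net_effect(changes):
    """changes: buffered changes for one PK, ordered by change_id,
    each a tuple (action, old_value, new_value)."""
    first_action = changes[0][0]
    last_action = changes[-1][0]
    events = []
    if first_action != "I":   # row existed before the window: emit D with the earliest old values
        events.append(("D", changes[0][1]))
    if last_action != "D":    # row exists after the window: emit I with the latest new values
        events.append(("I", changes[-1][2]))
    return events

# Case 4: three UPDATEs collapse to D(first old) + I(last new)
print(net_effect([("U", 75.00, 10.00), ("U", 10.00, 20.00), ("U", 20.00, 30.00)]))
# [('D', 75.0), ('I', 30.0)]
# Case 5: INSERT + UPDATE -> a pure insert of the final value
print(net_effect([("I", None, 100.00), ("U", 100.00, 200.00)]))  # [('I', 200.0)]
# Case 6: UPDATE + DELETE -> only the removal of the original value
print(net_effect([("U", 75.00, 999.99), ("D", 999.99, None)]))   # [('D', 75.0)]
```

Note that the same rule also yields zero events for an INSERT followed by a DELETE, which is the cancellation case covered in the DELETE tutorial.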
Case 7: UPDATE with JOINs
Consider a stream table that joins two tables:
CREATE TABLE customers (
id SERIAL PRIMARY KEY,
name TEXT NOT NULL,
tier TEXT NOT NULL DEFAULT 'standard'
);
CREATE TABLE orders (
id SERIAL PRIMARY KEY,
customer_id INT REFERENCES customers(id),
amount NUMERIC(10,2)
);
SELECT pgtrickle.create_stream_table(
name => 'order_details',
query => $$
SELECT c.name, c.tier, o.amount
FROM orders o
JOIN customers c ON o.customer_id = c.id
$$,
schedule => '1m'
);
Now update a customer's tier:
UPDATE customers SET tier = 'premium' WHERE name = 'alice';
How the JOIN Delta Works
The join differentiation follows the formula:
$$\Delta(L \bowtie R) = (\Delta L \bowtie R) \cup (L \bowtie \Delta R) - (\Delta L \bowtie \Delta R)$$
Since only the customers table changed:
- $\Delta L$ = changes to orders (empty)
- $\Delta R$ = changes to customers (alice's tier: standard → premium)
So:
- Part 1: $\Delta\text{orders} \bowtie \text{customers}$ = empty (no order changes)
- Part 2: $\text{orders} \bowtie \Delta\text{customers}$ = all of alice's orders joined with her tier change
- Part 3: $\Delta\text{orders} \bowtie \Delta\text{customers}$ = empty (no order changes)
Part 2 produces the delta: for each of alice's orders, DELETE the old row (with tier='standard') and INSERT a new row (with tier='premium').
The stream table is updated to reflect the new tier across all of alice's order rows.
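The join-delta formula can be checked with a toy signed-multiset model. Everything below (the relation encoding, the function names, joining on the first attribute) is invented for illustration; pg_trickle generates SQL rather than executing anything like this:

```python
from collections import defaultdict

def join(A, B):
    """Join two signed row multisets {row_tuple: count} on the first
    attribute of each side; multiplicities multiply."""
    out = defaultdict(int)
    for a, ca in A.items():
        for b, cb in B.items():
            if a[0] == b[0]:
                out[a + b[1:]] += ca * cb
    return dict(out)

def combine(*terms):
    """Signed union: sum (coefficient, relation) terms, dropping zero rows."""
    out = defaultdict(int)
    for coeff, rel in terms:
        for row, c in rel.items():
            out[row] += coeff * c
    return {row: c for row, c in out.items() if c != 0}

def delta_join(L_new, dL, R_new, dR):
    # Delta(L |><| R) = (dL |><| R) + (L |><| dR) - (dL |><| dR),
    # with L_new/R_new the post-change states
    return combine((1, join(dL, R_new)), (1, join(L_new, dR)), (-1, join(dL, dR)))

# alice's tier change: dL (orders) is empty; dR removes the old customer
# row and adds the new one, so every alice order picks up a -1/+1 pair.
orders_new = {(1, "50.00"): 1, (1, "30.00"): 1}   # (customer_id, amount)
customers_new = {(1, "alice", "premium"): 1}       # (id, name, tier)
d_customers = {(1, "alice", "standard"): -1, (1, "alice", "premium"): 1}
print(delta_join(orders_new, {}, customers_new, d_customers))
```

The result contains, per order, a negative row with tier='standard' (the DELETE) and a positive row with tier='premium' (the INSERT), matching Part 2 of the formula.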
Performance Summary
| Scenario | Buffer rows | Delta rows emitted | Work |
|---|---|---|---|
| Simple value change | 1 | 2 (D+I) | O(1) per group |
| Group key change | 1 | 2 (D+I, different groups) | O(1) per affected group |
| Group deletion | 1 | 1 (D) + 1 (I) or 1 (D) | O(1) |
| N updates same row | N | 2 (D first-old + I last-new) | O(N) scan, O(1) aggregate |
| INSERT+UPDATE same window | 2 | 1 (I only) | O(1) |
| UPDATE+DELETE same window | 2 | 1 (D only) | O(1) |
In all cases, the work is proportional to the number of changed rows, not the total table size. A single UPDATE on a billion-row table produces the same delta cost as on a 10-row table.
What About IMMEDIATE Mode?
Everything above describes DIFFERENTIAL mode — changes accumulate in a buffer and are applied on a schedule. As of v0.2.0, pg_trickle also supports IMMEDIATE mode, where the stream table is updated synchronously within the same transaction as your UPDATE.
How IMMEDIATE Mode Differs for UPDATE
| Phase | DIFFERENTIAL | IMMEDIATE |
|---|---|---|
| Trigger type | Row-level AFTER trigger | Statement-level AFTER trigger with REFERENCING OLD TABLE, NEW TABLE |
| What's captured | One buffer row with old_* and new_* | Two transition tables: __pgt_oldtable and __pgt_newtable |
| When delta runs | Next scheduler tick | Immediately, in the same transaction |
| D+I decomposition | In the scan delta CTE | Same algebra, but reading from transition temp tables |
| Concurrency | No locking between writers | Advisory lock per stream table |
When you run UPDATE orders SET amount = 59.99 WHERE id = 1:
- A BEFORE UPDATE trigger acquires an advisory lock on the stream table
- The AFTER UPDATE trigger captures both OLD TABLE AS __pgt_oldtable and NEW TABLE AS __pgt_newtable into temp tables
- The DVM engine generates the same D+I decomposition, reading old values from the old-table and new values from the new-table
- The delta is applied to the stream table immediately
- Any query within the same transaction sees the updated stream table
BEGIN;
UPDATE orders SET amount = 59.99 WHERE id = 1;
-- customer_totals already reflects the new amount here!
SELECT * FROM customer_totals WHERE customer = 'alice';
COMMIT;
The same D+I split, aggregate differentiation, and net-effect logic applies — the only difference is the data source (transition tables vs change buffer) and timing (synchronous vs scheduled).
Next in This Series
- What Happens When You INSERT a Row? — The full 7-phase lifecycle (start here if you haven't already)
- What Happens When You DELETE a Row? — Reference counting, group deletion, INSERT+DELETE cancellation
- What Happens When You TRUNCATE a Table? — Why TRUNCATE bypasses triggers and how to recover
What Happens When You DELETE a Row?
This tutorial traces what happens when a DELETE statement hits a base table that is referenced by a stream table. It covers the trigger capture, how the scan delta emits a single DELETE event, and how each DVM operator propagates the removal — including group deletion, partial group reduction, JOINs, cascading deletes within a single refresh window, and the important edge case where a DELETE cancels a prior INSERT.
Prerequisite: Read WHAT_HAPPENS_ON_INSERT.md first — it introduces the full 7-phase lifecycle (trigger → scheduler → frontier → change detection → DVM delta → MERGE → cleanup). This tutorial focuses on how DELETE differs.
Setup
Same e-commerce example used throughout the series:
CREATE TABLE orders (
id SERIAL PRIMARY KEY,
customer TEXT NOT NULL,
amount NUMERIC(10,2) NOT NULL
);
SELECT pgtrickle.create_stream_table(
name => 'customer_totals',
query => $$
SELECT customer, SUM(amount) AS total, COUNT(*) AS order_count
FROM orders GROUP BY customer
$$,
schedule => '1m'
);
-- Seed some data
INSERT INTO orders (customer, amount) VALUES
('alice', 50.00),
('alice', 30.00),
('bob', 75.00),
('bob', 25.00);
After the first refresh, the stream table contains:
customer | total | order_count
---------|--------|------------
alice | 80.00 | 2
bob | 100.00 | 2
Case 1: Delete One Row (Group Survives)
DELETE FROM orders WHERE id = 2; -- alice's 30.00 order
Alice still has one remaining order (id=1, amount=50.00). The group shrinks but doesn't vanish.
Phase 1: Trigger Capture
The AFTER DELETE trigger fires and writes one row to the change buffer with only OLD values:
pgtrickle_changes.changes_16384
┌───────────┬─────────────┬────────┬──────────┬──────────┬────────────┬──────────┬────────────┐
│ change_id │ lsn │ action │ new_cust │ new_amt │ old_cust │ old_amt │ pk_hash │
├───────────┼─────────────┼────────┼──────────┼──────────┼────────────┼──────────┼────────────┤
│ 5 │ 0/1A3F3000 │ D │ NULL │ NULL │ alice │ 30.00 │ 4521038 │
└───────────┴─────────────┴────────┴──────────┴──────────┴────────────┴──────────┴────────────┘
Key difference from INSERT and UPDATE:
- new_* columns are all NULL — the row no longer exists, so there are no NEW values
- old_* columns contain the deleted row's data — this is what gets subtracted
- pk_hash is computed from OLD.id (the deleted row's primary key)
Phase 2–4: Scheduler, Frontier, Change Detection
Identical to the INSERT flow. The scheduler detects one change row in the LSN window.
Phase 5: Scan Differentiation — Pure DELETE
Unlike UPDATE (which splits into D+I), a DELETE produces a single event:
__pgt_row_id | __pgt_action | customer | amount
-------------|--------------|----------|-------
4521038 | D | alice | 30.00
The scan delta applies the net-effect filtering rule:
- first_action = 'D' → row existed before the refresh window
- last_action = 'D' → row does not exist after
Result: emit a DELETE using old values. No INSERT is emitted (because last_action = 'D').
This is the simplest path through the scan delta — one change, one PK, one DELETE event.
Phase 5 (continued): Aggregate Differentiation
The aggregate operator processes the DELETE event against the alice group:
-- DELETE event: subtract old values from alice's group
__ins_count = 0 -- no inserts
__del_count = 1 -- one deletion
__ins_total = 0 -- no amount added
__del_total = 30.00 -- 30.00 removed
The merge CTE joins this delta with the existing stream table state:
new_count = old_count + ins_count - del_count = 2 + 0 - 1 = 1 (still > 0)
Since new_count > 0 and the group already existed (old_count = 2), the action is classified as 'U' (update). The aggregate emits the group with its new values:
customer | total | order_count | __pgt_row_id | __pgt_action
---------|-------|-------------|--------------|-------------
alice | 50.00 | 1 | 7283194 | I
Note: the 'U' meta-action is emitted as __pgt_action = 'I' because the MERGE treats it as an update-via-INSERT (see aggregate final CTE: CASE WHEN __pgt_meta_action = 'D' THEN 'D' ELSE 'I' END).
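The classification rule just described can be condensed into a short sketch. This is a hypothetical simplification of the generated merge CTE, not actual extension code; `classify` is an invented name:

```python
def classify(old_count, ins_count, del_count):
    """Decide the MERGE action for one aggregate group from its delta counts."""
    new_count = old_count + ins_count - del_count
    if new_count <= 0:
        return "D"   # reference count hit zero: the group row is deleted
    # Both brand-new groups and surviving groups are emitted as 'I';
    # a matched 'I' becomes an UPDATE in the MERGE, an unmatched one an INSERT.
    return "I"

print(classify(old_count=2, ins_count=0, del_count=1))  # I  (alice survives with count 1)
print(classify(old_count=1, ins_count=0, del_count=1))  # D  (last row gone: group vanishes)
print(classify(old_count=0, ins_count=1, del_count=0))  # I  (new group created)
```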
Phase 6: MERGE
The MERGE statement matches alice's existing row and updates it:
MERGE INTO customer_totals AS st
USING (...delta...) AS d
ON st.__pgt_row_id = d.__pgt_row_id
WHEN MATCHED AND d.__pgt_action = 'I' THEN
UPDATE SET customer = d.customer, total = d.total, order_count = d.order_count, ...
Result:
SELECT * FROM customer_totals;
customer | total | order_count
----------|--------|------------
alice | 50.00 | 1 ← was 80.00 / 2
bob | 100.00 | 2
Phase 7: Cleanup
The change buffer rows in the consumed LSN window are deleted:
DELETE FROM pgtrickle_changes.changes_16384
WHERE lsn > '0/1A3F2FFF'::pg_lsn AND lsn <= '0/1A3F3000'::pg_lsn;
Case 2: Delete Last Row in Group (Group Vanishes)
-- Alice has one order left (id=1, amount=50.00). Delete it.
DELETE FROM orders WHERE id = 1;
Trigger Capture
change_id | lsn | action | old_cust | old_amt | pk_hash
6 | 0/1A3F3100 | D | alice | 50.00 | -837291
Scan Delta
Single DELETE event:
__pgt_row_id | __pgt_action | customer | amount
-------------|--------------|----------|-------
-837291 | D | alice | 50.00
Aggregate Delta
Group "alice":
ins_count = 0
del_count = 1
new_count = old_count + 0 - 1 = 1 - 1 = 0 → group vanishes!
When new_count drops to 0 (or below), the aggregate classifies this as action 'D' (delete). The reference count has reached zero — no rows contribute to this group anymore.
The aggregate emits a DELETE for alice's group:
customer | __pgt_row_id | __pgt_action
---------|--------------|-------------
alice | 7283194 | D
MERGE
The MERGE matches alice's existing row and deletes it:
WHEN MATCHED AND d.__pgt_action = 'D' THEN DELETE
Result:
SELECT * FROM customer_totals;
customer | total | order_count
----------|--------|------------
bob | 100.00 | 2
Alice's row is completely removed from the stream table. This is the correct behavior — with zero contributing rows, the group should not exist.
Case 3: Delete Multiple Rows (Same Group, Same Window)
-- Delete both of bob's orders before the next refresh
DELETE FROM orders WHERE id = 3; -- bob, 75.00
DELETE FROM orders WHERE id = 4; -- bob, 25.00
The change buffer has two rows with different pk_hash values (different PKs):
change_id | action | old_cust | old_amt | pk_hash
7 | D | bob | 75.00 | pk_hash_3
8 | D | bob | 25.00 | pk_hash_4
Scan Delta
Each PK has exactly one change, so both take the single-change fast path:
__pgt_row_id | __pgt_action | customer | amount
-------------|--------------|----------|-------
pk_hash_3 | D | bob | 75.00
pk_hash_4 | D | bob | 25.00
Two DELETE events, both targeting bob's group.
Aggregate Delta
The aggregate sums both deletions:
Group "bob":
ins_count = 0
del_count = 2
del_total = 75.00 + 25.00 = 100.00
new_count = 2 + 0 - 2 = 0 → group vanishes!
The aggregate emits a DELETE for bob's group.
MERGE
Bob's row is deleted from the stream table. With both alice and bob gone (from Cases 1+2+3), the stream table is now empty.
Case 4: INSERT + DELETE in Same Window (Cancellation)
What if a row is inserted and then deleted before the next refresh?
INSERT INTO orders (customer, amount) VALUES ('charlie', 200.00);
DELETE FROM orders WHERE customer = 'charlie';
The change buffer has:
change_id | action | new_cust | new_amt | old_cust | old_amt | pk_hash
9 | I | charlie | 200.00 | NULL | NULL | pk_hash_new
10 | D | NULL | NULL | charlie | 200.00 | pk_hash_new
Net-Effect Computation
Both changes share the same pk_hash. The pk_stats CTE finds cnt = 2, so this goes through the multi-change path:
first_action = FIRST_VALUE(action) OVER (...) → 'I'
last_action = LAST_VALUE(action) OVER (...) → 'D'
The scan delta applies the net-effect filtering:
- DELETE branch: requires first_action != 'I' → FAILS (first_action = 'I')
- INSERT branch: requires last_action != 'D' → FAILS (last_action = 'D')
Result: zero events emitted. The INSERT and DELETE completely cancel each other out.
The aggregate never sees charlie. The stream table is unchanged. This is correct — the row was born and died within the same refresh window, so it should have no visible effect.
Case 5: UPDATE + DELETE in Same Window
UPDATE orders SET amount = 999.99 WHERE id = 3; -- bob: 75 → 999.99
DELETE FROM orders WHERE id = 3;
The change buffer:
change_id | action | old_amt | new_amt
11 | U | 75.00 | 999.99
12 | D | 999.99 | NULL
Net-Effect Computation
Same pk_hash, cnt = 2:
first_action = 'U' (row existed before this window)
last_action = 'D' (row no longer exists)
Filtering:
- DELETE branch: first_action != 'I' → OK. Emit DELETE with old values from the earliest change: old_amt = 75.00
- INSERT branch: last_action != 'D' → FAILS. No INSERT emitted.
Net delta:
__pgt_row_id | __pgt_action | amount
-------------|--------------|-------
pk_hash_3 | D | 75.00
The intermediate value of 999.99 never appears. The aggregate sees only the removal of the original value (75.00), which is correct — that's the value that was previously accounted for in the stream table.
Case 6: DELETE with JOINs
Consider a stream table that joins two tables:
CREATE TABLE customers (
id SERIAL PRIMARY KEY,
name TEXT NOT NULL,
tier TEXT NOT NULL DEFAULT 'standard'
);
CREATE TABLE orders (
id SERIAL PRIMARY KEY,
customer_id INT REFERENCES customers(id),
amount NUMERIC(10,2)
);
SELECT pgtrickle.create_stream_table(
name => 'order_details',
query => $$
SELECT c.name, c.tier, o.amount
FROM orders o
JOIN customers c ON o.customer_id = c.id
$$,
schedule => '1m'
);
Seed data:
INSERT INTO customers VALUES (1, 'alice', 'premium'), (2, 'bob', 'standard');
INSERT INTO orders VALUES (1, 1, 50.00), (2, 1, 30.00), (3, 2, 75.00);
After refresh, the stream table has:
name | tier | amount
------|----------|-------
alice | premium | 50.00
alice | premium | 30.00
bob | standard | 75.00
Now delete an order:
DELETE FROM orders WHERE id = 2; -- alice's 30.00 order
How the JOIN Delta Works
The join differentiation formula:
$$\Delta(L \bowtie R) = (\Delta L \bowtie R) \cup (L \bowtie \Delta R) - (\Delta L \bowtie \Delta R)$$
Since only the orders table changed:
- $\Delta L$ = changes to orders (one DELETE: order #2)
- $\Delta R$ = changes to customers (empty)
So:
- Part 1: $\Delta\text{orders} \bowtie \text{customers}$ = the deleted order joined with its customer
- Part 2: $\text{orders} \bowtie \Delta\text{customers}$ = empty (no customer changes)
- Part 3: $\Delta\text{orders} \bowtie \Delta\text{customers}$ = empty (customers unchanged)
Part 1 produces:
name | tier | amount | __pgt_action
------|---------|--------|-------------
alice | premium | 30.00 | D
The deleted order is joined with alice's customer record to produce a DELETE delta row with the complete joined values.
MERGE
The MERGE matches the row (alice, premium, 30.00) and deletes it:
SELECT * FROM order_details;
name | tier | amount
-------|----------|-------
alice | premium | 50.00 ← alice's remaining order
bob | standard | 75.00
What About Deleting From the Dimension Table?
DELETE FROM customers WHERE id = 2; -- remove bob entirely
Now $\Delta R$ has a DELETE for bob, while $\Delta L$ is empty:
- Part 1: $\Delta\text{orders} \bowtie \text{customers}$ = empty
- Part 2: $\text{orders} \bowtie \Delta\text{customers}$ = bob's order(s) joined with deleted customer record
Part 2 produces DELETE events for every order that referenced bob:
name | tier | amount | __pgt_action
-----|----------|--------|-------------
bob | standard | 75.00 | D
After MERGE, bob's rows vanish from the stream table.
Note: This assumes referential integrity — if orders still references customer #2, a foreign key constraint would prevent the DELETE in practice. But from the IVM perspective, the join delta correctly handles the removal regardless.
Case 7: Bulk DELETE
DELETE FROM orders WHERE amount < 50.00;
This deletes multiple rows across potentially multiple groups. The trigger fires once per row (it's a FOR EACH ROW trigger), writing one change buffer entry per deleted row:
change_id | action | old_cust | old_amt | pk_hash
13 | D | alice | 30.00 | pk_hash_2
14 | D | bob | 25.00 | pk_hash_4
Scan Delta
Each deleted PK is independent (different pk_hash values), so each takes the single-change fast path. Two DELETE events:
__pgt_row_id | __pgt_action | customer | amount
-------------|--------------|----------|-------
pk_hash_2 | D | alice | 30.00
pk_hash_4 | D | bob | 25.00
Aggregate Delta
The aggregate groups these by customer:
Group "alice":
del_count = 1, del_total = 30.00
new_count = 2 - 1 = 1 (survives)
Group "bob":
del_count = 1, del_total = 25.00
new_count = 2 - 1 = 1 (survives)
Both groups survive (count > 0), so the aggregate emits UPDATE (as 'I') events with new values:
customer | total | order_count
---------|-------|------------
alice | 50.00 | 1
bob | 75.00 | 1
The MERGE updates both rows. All work is proportional to the number of deleted rows (2), not the total table size.
Case 8: TRUNCATE (Automatic Full Refresh)
TRUNCATE orders;
TRUNCATE does not fire row-level triggers. However, as of v0.2.0, pg_trickle installs a statement-level AFTER TRUNCATE trigger that writes a 'T' marker to the change buffer. On the next refresh cycle, the scheduler detects this marker and automatically performs a full refresh — truncating the stream table and recomputing from the defining query.
No manual intervention is required. For details on how TRUNCATE is handled across all three refresh modes (DIFFERENTIAL, IMMEDIATE, FULL), see What Happens When You TRUNCATE a Table?.
How DELETE Differs From INSERT and UPDATE — A Summary
| Aspect | INSERT | UPDATE | DELETE |
|---|---|---|---|
| Trigger writes | new_* columns only | Both new_* and old_* | old_* columns only |
| new_* columns | Row values | New values | NULL |
| old_* columns | NULL | Old values | Row values |
| pk_hash source | NEW.pk | NEW.pk | OLD.pk |
| Scan delta output | 1 INSERT event | 2 events (D+I split) | 1 DELETE event |
| Aggregate effect | Adds to group count/sum | Subtracts old, adds new | Subtracts from group |
| Can delete a group? | No (only creates/grows) | Yes (if group key changes) | Yes (if count reaches 0) |
| MERGE action | INSERT new row | UPDATE existing row | DELETE matched row |
The Reference Counting Principle
The core insight behind incremental DELETE handling is reference counting. Every aggregate group in the stream table maintains an internal counter (__pgt_count) that tracks how many source rows contribute to the group:
Stream table internal state:
customer | total | order_count | __pgt_count (hidden)
---------|-------|-------------|---------------------
alice | 80.00 | 2 | 2
bob | 100.00| 2 | 2
- INSERT → __pgt_count += 1
- DELETE → __pgt_count -= 1
- UPDATE → __pgt_count += 0 (D cancels I for same-group updates)
When __pgt_count reaches 0:
- The group has zero contributing rows
- The aggregate emits a DELETE event
- The MERGE removes the row from the stream table
This is mathematically rigorous — the stream table always reflects the correct result of the defining query over the current base table contents, incrementally maintained through algebraic delta operations.
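The reference-counting principle can be modeled directly. This sketch is illustrative only; the state shapes and the `apply_events` function are invented for the example, with `Decimal` standing in for NUMERIC:

```python
from decimal import Decimal

def apply_events(groups, events):
    """Apply (action, customer, amount) delta events to per-group state,
    tracking a reference count and dropping groups that reach zero."""
    for action, customer, amount in events:
        g = groups.setdefault(customer, {"total": Decimal("0"), "count": 0})
        sign = 1 if action == "I" else -1
        g["total"] += sign * amount
        g["count"] += sign
        if g["count"] == 0:        # zero contributing rows: remove the group row
            del groups[customer]
    return groups

state = {"alice": {"total": Decimal("80.00"), "count": 2}}
apply_events(state, [("D", "alice", Decimal("30.00"))])
print(state)  # alice survives: total 50.00, count 1
apply_events(state, [("D", "alice", Decimal("50.00"))])
print(state)  # {} -- alice's group vanished
```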
Performance Summary
| Scenario | Buffer rows | Delta rows emitted | Work |
|---|---|---|---|
| Single row DELETE (group survives) | 1 | 1 (D) | O(1) per group |
| Single row DELETE (group vanishes) | 1 | 1 (D) | O(1) |
| N deletes same group | N | N (D) → 1 group delta | O(N) scan, O(1) per group |
| INSERT+DELETE same window | 2 | 0 (cancels) | O(1) |
| UPDATE+DELETE same window | 2 | 1 (D original) | O(1) |
| Bulk DELETE across M groups | N | N (D) → M group deltas | O(N) scan, O(M) aggregate |
| JOIN table DELETE | 1 | K (one per matched join row) | O(K) join |
In all cases, the work is proportional to the number of changed rows, not the total table size. Deleting 3 rows from a billion-row table produces the same delta cost as from a 10-row table.
What About IMMEDIATE Mode?
Everything above describes DIFFERENTIAL mode — changes accumulate in a buffer and are applied on a schedule. As of v0.2.0, pg_trickle also supports IMMEDIATE mode, where the stream table is updated synchronously within the same transaction as your DELETE.
How IMMEDIATE Mode Differs for DELETE
| Phase | DIFFERENTIAL | IMMEDIATE |
|---|---|---|
| Trigger type | Row-level AFTER trigger | Statement-level AFTER trigger with REFERENCING OLD TABLE |
| What's captured | One buffer row with old_* columns per deleted row | A transition table containing all deleted rows |
| When delta runs | Next scheduler tick | Immediately, in the same transaction |
| Delta source | Change buffer rows with action='D' | Temp table copied from transition table |
| Concurrency | No locking between writers | Advisory lock per stream table |
When you run DELETE FROM orders WHERE id = 2:
- A BEFORE DELETE trigger acquires an advisory lock on the stream table
- The AFTER DELETE trigger captures OLD TABLE AS __pgt_oldtable into a temp table
- The DVM engine generates the same aggregate delta, reading deleted values from the old-table
- The delta is applied to the stream table immediately — groups are decremented, and groups reaching count=0 are removed
- Any query within the same transaction sees the updated stream table
BEGIN;
DELETE FROM orders WHERE id = 2; -- alice's 30.00 order
-- customer_totals already reflects the deletion here!
SELECT * FROM customer_totals WHERE customer = 'alice';
-- Shows: alice | 50.00 | 1
COMMIT;
The same reference counting, group deletion, and net-effect logic applies — the only difference is the data source (transition tables vs change buffer) and timing (synchronous vs scheduled).
Next in This Series
- What Happens When You INSERT a Row? — The full 7-phase lifecycle (start here if you haven't already)
- What Happens When You UPDATE a Row? — D+I split, group key changes, net-effect for multiple UPDATEs
- What Happens When You TRUNCATE a Table? — Why TRUNCATE bypasses triggers and how to recover
What Happens When You TRUNCATE a Table?
This tutorial explains what happens when a TRUNCATE statement hits a base table that is referenced by a stream table. Unlike INSERT, UPDATE, and DELETE — which are fully tracked by the CDC trigger — TRUNCATE is a special case that bypasses row-level triggers entirely. Understanding this gap is essential for operating pg_trickle correctly.
Prerequisite: Read WHAT_HAPPENS_ON_INSERT.md first — it introduces the 7-phase lifecycle. This tutorial explains why TRUNCATE breaks that lifecycle and how to recover.
Setup
Same e-commerce example used throughout the series:
CREATE TABLE orders (
id SERIAL PRIMARY KEY,
customer TEXT NOT NULL,
amount NUMERIC(10,2) NOT NULL
);
SELECT pgtrickle.create_stream_table(
name => 'customer_totals',
query => $$
SELECT customer, SUM(amount) AS total, COUNT(*) AS order_count
FROM orders GROUP BY customer
$$,
schedule => '1m'
);
-- Seed some data
INSERT INTO orders (customer, amount) VALUES
('alice', 50.00),
('alice', 30.00),
('bob', 75.00),
('bob', 25.00);
After the first refresh, the stream table contains:
customer | total | order_count
---------|--------|------------
alice | 80.00 | 2
bob | 100.00 | 2
Case 1: TRUNCATE the Base Table (DIFFERENTIAL Mode)
TRUNCATE orders;
All four rows are removed instantly.
What Happens at the Trigger Level: TRUNCATE Marker
Updated in v0.2.0: pg_trickle now installs a statement-level AFTER TRUNCATE trigger on tracked source tables. This trigger writes a single marker row to the change buffer with action = 'T'.
Unlike the per-row DML triggers, the TRUNCATE trigger cannot capture individual row data (PostgreSQL's TRUNCATE does not provide OLD records). Instead, it writes a sentinel:
pgtrickle_changes.changes_16384
┌───────────┬─────────────┬────────┬──────────┬──────────┐
│ change_id │ lsn │ action │ new_* │ old_* │
├───────────┼─────────────┼────────┼──────────┼──────────┤
│ 5 │ 0/1A3F4000 │ T │ NULL │ NULL │
└───────────┴─────────────┴────────┴──────────┴──────────┘
The 'T' action marker tells the refresh engine: "a TRUNCATE happened — a full refresh is required."
What Happens at the Scheduler: Automatic Full Refresh
On the next refresh cycle, the scheduler:
- Checks the change buffer for rows in the LSN window
- Finds the action = 'T' marker row
- Falls back to a FULL refresh — regardless of the stream table's configured refresh_mode
- TRUNCATEs the stream table
- Re-executes the defining query against the current base table state
- Inserts all results
Since the orders table is now empty, the defining query returns zero rows:
-- After the next scheduled refresh:
SELECT * FROM customer_totals;
customer | total | order_count
----------|-------|------------
(0 rows) ← correct: orders is empty
No manual intervention required. The TRUNCATE marker ensures the stream table is automatically brought back into consistency on the next refresh cycle.
Note: In versions before v0.2.0, TRUNCATE was not captured at all — the change buffer stayed empty and the stream table became silently stale. If you're running an older version, you still need to call pgtrickle.refresh_stream_table() manually after a TRUNCATE.
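The scheduler's per-cycle decision can be summarized in a few lines. This is a simplified model, not the actual scheduler code; in particular, the behavior for an empty window ("SKIP") is an assumption of this sketch:

```python
def choose_refresh(buffered_actions):
    """Pick a refresh strategy for one cycle from the actions seen in the
    LSN window ('I'/'U'/'D' are per-row events, 'T' the TRUNCATE marker)."""
    if "T" in buffered_actions:
        # A TRUNCATE marker forces a full refresh regardless of the configured
        # mode; per-row events in the same window are subsumed by the recompute.
        return "FULL"
    return "DIFFERENTIAL" if buffered_actions else "SKIP"

print(choose_refresh(["T"]))                 # FULL
print(choose_refresh(["T", "I", "I", "I"]))  # FULL (truncate-and-reload ETL pattern)
print(choose_refresh(["I", "D"]))            # DIFFERENTIAL
```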
Case 2: Manual Refresh (Explicit Recovery)
Although TRUNCATE is now automatically handled on the next refresh cycle, you can force an immediate recovery without waiting:
SELECT pgtrickle.refresh_stream_table('customer_totals');
This executes a full refresh regardless of the stream table's configured refresh mode:
- TRUNCATE the stream table itself (clearing the stale data)
- Re-execute the defining query
- INSERT the results into the stream table
- Update the frontier so future differential refreshes start from the current LSN
This is useful when you can't wait for the next scheduled refresh cycle and need the stream table consistent immediately.
Case 3: TRUNCATE Then INSERT (Common ETL Pattern)
A common data loading pattern is:
BEGIN;
TRUNCATE orders;
INSERT INTO orders (customer, amount) VALUES
('charlie', 100.00),
('charlie', 200.00),
('dave', 150.00);
COMMIT;
What the Change Buffer Sees
- TRUNCATE: 1 marker event (action = 'T') — captured by the statement-level trigger
- INSERT charlie 100.00: 1 event (captured)
- INSERT charlie 200.00: 1 event (captured)
- INSERT dave 150.00: 1 event (captured)
The change buffer has 4 rows — the TRUNCATE marker plus 3 INSERT events.
What the Scheduler Does
The scheduler sees the action = 'T' marker and triggers a full refresh, ignoring the individual INSERT events. The full refresh re-executes the defining query against the current state of orders, which now contains only charlie and dave:
-- After the next scheduled refresh:
SELECT * FROM customer_totals;
customer | total | order_count
----------|--------|------------
charlie | 300.00 | 2 ← correct
dave | 150.00 | 1 ← correct
The old data (alice, bob) is gone because the full refresh recomputed from scratch. This is correct — the TRUNCATE marker ensures consistency regardless of what other changes occurred in the same window.
Case 4: TRUNCATE a Dimension Table in a JOIN
Consider a stream table that joins two tables:
CREATE TABLE customers (
id SERIAL PRIMARY KEY,
name TEXT NOT NULL,
tier TEXT NOT NULL DEFAULT 'standard'
);
CREATE TABLE orders (
id SERIAL PRIMARY KEY,
customer_id INT REFERENCES customers(id),
amount NUMERIC(10,2)
);
SELECT pgtrickle.create_stream_table(
name => 'order_details',
query => $$
SELECT c.name, c.tier, o.amount
FROM orders o
JOIN customers c ON o.customer_id = c.id
$$,
schedule => '1m'
);
Now truncate the dimension table:
TRUNCATE customers CASCADE;
The CASCADE also truncates orders (due to the foreign key). Both tables have TRUNCATE triggers installed, so both write a 'T' marker to their respective change buffers.
On the next refresh cycle, the scheduler detects the TRUNCATE markers and performs a full refresh. The stream table is recomputed from the now-empty base tables:
-- After the next scheduled refresh:
SELECT * FROM order_details;
-- (0 rows) — correct
Case 5: FULL Mode Stream Tables Are Immune
If the stream table uses FULL refresh mode instead of DIFFERENTIAL:
SELECT pgtrickle.create_stream_table(
name => 'customer_totals_full',
query => $$
SELECT customer, SUM(amount) AS total, COUNT(*) AS order_count
FROM orders GROUP BY customer
$$,
schedule => '1m',
refresh_mode => 'FULL'
);
A FULL-mode stream table doesn't use the change buffer at all. Every refresh cycle:
- TRUNCATEs the stream table
- Re-executes the defining query
- Inserts all results
So after a TRUNCATE of the base table, the next scheduled refresh automatically picks up the correct state — no manual intervention needed. The trade-off is that every refresh recomputes from scratch, which is more expensive for large result sets.
Why PostgreSQL Doesn't Fire Row Triggers on TRUNCATE
Understanding the PostgreSQL internals helps explain why per-row capture is impossible:
| Operation | Mechanism | Row triggers fired? | Statement triggers fired? |
|---|---|---|---|
| DELETE FROM t | Scans and removes rows one by one | Yes — AFTER DELETE per row | Yes |
| TRUNCATE t | Removes all heap files and reinitializes the table storage | No — no per-row processing | Yes — AFTER TRUNCATE |
| DELETE FROM t WHERE true | Same as DELETE FROM t (full scan) | Yes — AFTER DELETE per row | Yes |
TRUNCATE is fundamentally different from DELETE. It's an O(1) operation that replaces the table's storage files, while DELETE is O(N) — scanning every row and recording each removal in WAL.
pg_trickle uses a statement-level AFTER TRUNCATE trigger to detect the event and write a 'T' marker to the change buffer. This marker does not contain per-row data (PostgreSQL's TRUNCATE trigger doesn't provide OLD records), but it's sufficient to signal that a full refresh is needed.
Alternative: DELETE FROM Instead of TRUNCATE
For DIFFERENTIAL mode, TRUNCATE is now handled automatically (via the 'T' marker and full refresh fallback). However, using DELETE FROM instead of TRUNCATE has its own advantages:
-- Instead of: TRUNCATE orders;
DELETE FROM orders;
This fires the row-level DELETE trigger for every row. The change buffer captures all removals, and the next differential refresh correctly decrements all reference counts through the standard algebraic delta path — avoiding the need for a full refresh fallback.
| Approach | Speed | Stream table consistent? | Refresh type |
|---|---|---|---|
| TRUNCATE orders | O(1) — instant | Yes — automatic full refresh on next cycle | FULL (fallback) |
| DELETE FROM orders | O(N) — scans all rows | Yes — per-row triggers fire | DIFFERENTIAL |
| TRUNCATE + manual refresh | O(1) + O(query) | Yes — immediately | FULL (manual) |
For tables with millions of rows, DELETE FROM can be slow and generate significant WAL. TRUNCATE is generally the better choice — the automatic full refresh fallback makes it safe to use.
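As a rough illustration of the per-row delta path, here is a toy Python model of reference-counted group maintenance for an aggregate like SELECT customer, SUM(amount), COUNT(*). It is illustrative only, not pg_trickle's internal representation:

```python
# Each group carries [running_total, row_count].  A per-row DELETE event
# decrements both; when the count reaches zero the group itself is removed.

def apply_delete_delta(state, deleted_rows):
    """state: {customer: [total, count]}; deleted_rows: dicts with customer/amount."""
    for row in deleted_rows:
        key = row['customer']
        total, count = state[key]
        total -= row['amount']
        count -= 1
        if count == 0:
            del state[key]       # last row gone: the group disappears
        else:
            state[key] = [total, count]
    return state

state = {'alice': [300.0, 2], 'bob': [50.0, 1]}
apply_delete_delta(state, [{'customer': 'bob', 'amount': 50.0},
                           {'customer': 'alice', 'amount': 100.0}])
print(state)   # {'alice': [200.0, 1]} -- bob's group was removed entirely
```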
Best Practices
1. TRUNCATE Is Safe to Use
As of v0.2.0, TRUNCATE on tracked source tables is automatically detected and triggers a full refresh on the next scheduler cycle. No manual intervention is required for standard operation.
2. Use Manual Refresh for Immediate Consistency
If you need the stream table to be consistent immediately (not on the next cycle), call refresh explicitly:
TRUNCATE orders;
SELECT pgtrickle.refresh_stream_table('customer_totals');
3. Consider IMMEDIATE Mode for Real-Time Needs
For stream tables that need to reflect TRUNCATE instantly (within the same transaction), use IMMEDIATE mode. The TRUNCATE trigger automatically performs a full refresh synchronously.
4. Consider FULL Mode for ETL-Heavy Tables
If a table is routinely truncated and reloaded, FULL refresh mode may be simpler than DIFFERENTIAL — it naturally handles TRUNCATE because it recomputes from scratch every cycle.
5. Use trigger_inventory() to Verify Triggers
You can verify that both the DML trigger and the TRUNCATE trigger are installed and enabled:
SELECT * FROM pgtrickle.trigger_inventory();
This shows one row per (source table, trigger type) confirming both pg_trickle_cdc_<oid> (DML) and pg_trickle_cdc_truncate_<oid> (TRUNCATE) triggers are present.
How TRUNCATE Compares to Other Operations
| Aspect | INSERT | UPDATE | DELETE | TRUNCATE |
|---|---|---|---|---|
| Row trigger fires? | Yes (per row) | Yes (per row) | Yes (per row) | No |
| Statement trigger fires? | Yes | Yes | Yes | Yes (writes 'T' marker) |
| Change buffer | 1 row per INSERT | 1 row per UPDATE | 1 row per DELETE | 1 marker row (action='T') |
| Stream table updated? | Yes (next refresh) | Yes (next refresh) | Yes (next refresh) | Yes (full refresh on next cycle) |
| Recovery | Automatic (differential) | Automatic (differential) | Automatic (differential) | Automatic (full refresh fallback) |
| FULL mode affected? | N/A (recomputes) | N/A (recomputes) | N/A (recomputes) | N/A (recomputes) |
| IMMEDIATE mode? | Synchronous delta | Synchronous delta | Synchronous delta | Synchronous full refresh |
| Speed | O(1) per row | O(1) per row | O(1) per row | O(1) + O(query) for refresh |
What About IMMEDIATE Mode?
In IMMEDIATE mode, TRUNCATE is handled synchronously within the same transaction:
- The BEFORE TRUNCATE trigger acquires an advisory lock on the stream table
- The AFTER TRUNCATE trigger calls pgt_ivm_handle_truncate(pgt_id)
- This function TRUNCATEs the stream table and re-populates it by re-executing the defining query
- The stream table is immediately consistent — within the same transaction
SELECT pgtrickle.create_stream_table(
name => 'customer_totals_live',
query => $$
SELECT customer, SUM(amount) AS total, COUNT(*) AS order_count
FROM orders GROUP BY customer
$$,
refresh_mode => 'IMMEDIATE'
);
BEGIN;
TRUNCATE orders;
-- customer_totals_live is already empty here!
SELECT * FROM customer_totals_live; -- (0 rows)
COMMIT;
No waiting for a scheduler cycle, no stale data — TRUNCATE is fully handled in real-time.
Summary
As of v0.2.0, TRUNCATE is fully tracked by pg_trickle across all three refresh modes. While it cannot be captured as per-row DELETE events (PostgreSQL's TRUNCATE doesn't process individual rows), pg_trickle uses a statement-level trigger to detect the event and respond appropriately.
The key takeaways:
- TRUNCATE is automatically handled — a statement-level AFTER TRUNCATE trigger writes a 'T' marker to the change buffer
- DIFFERENTIAL mode: automatic full refresh — the scheduler detects the marker and falls back to a full refresh on the next cycle
- IMMEDIATE mode: synchronous full refresh — the stream table is rebuilt within the same transaction
- FULL mode: naturally immune — every refresh recomputes from scratch regardless
- Manual refresh for instant consistency — call pgtrickle.refresh_stream_table() if you can't wait for the next cycle
- DELETE FROM remains an alternative — fires per-row triggers, enabling incremental delta processing instead of the full refresh fallback
Next in This Series
- What Happens When You INSERT a Row? — The full 7-phase lifecycle (start here if you haven't already)
- What Happens When You UPDATE a Row? — D+I split, group key changes, net-effect for multiple UPDATEs
- What Happens When You DELETE a Row? — Reference counting, group deletion, INSERT+DELETE cancellation
Tutorial: Build a Real-Time Analytics Dashboard
DOC-NEW-24 (v0.57.0) — End-to-end tutorial: build the backend for a real-time analytics dashboard over a sample e-commerce dataset.
What You Will Build
A real-time analytics backend that powers three dashboard panels:
- Revenue by region — running totals updated within seconds of each order
- Hourly order counts — time-bucketed activity feed for trend charts
- Top 10 products — a continuously-maintained leaderboard by revenue
All three panels are backed by pg_trickle stream tables, so they refresh incrementally — only the rows that actually changed are recomputed.
Prerequisites
- PostgreSQL 18 with pg_trickle installed (see Installation)
- psql or any SQL client
Step 1 — Create the Source Tables
-- Orders: the core transaction table
CREATE TABLE orders (
id BIGSERIAL PRIMARY KEY,
region TEXT NOT NULL,
product_id BIGINT NOT NULL,
amount NUMERIC(12,2) NOT NULL,
placed_at TIMESTAMPTZ DEFAULT now()
);
-- Products: the product catalogue
CREATE TABLE products (
id BIGSERIAL PRIMARY KEY,
name TEXT NOT NULL,
category TEXT NOT NULL
);
-- Seed products
INSERT INTO products (name, category) VALUES
('Laptop Pro 15', 'Electronics'),
('Wireless Keyboard', 'Electronics'),
('Standing Desk', 'Furniture'),
('Ergonomic Chair', 'Furniture'),
('USB-C Hub', 'Electronics');
Step 2 — Enable pg_trickle
CREATE EXTENSION IF NOT EXISTS pg_trickle;
Step 3 — Revenue by Region
This panel shows the total revenue in each region, updated automatically as orders arrive.
SELECT pgtrickle.create_stream_table(
name => 'revenue_by_region',
query => $$
SELECT
region,
COUNT(*) AS order_count,
SUM(amount) AS total_revenue,
AVG(amount) AS avg_order_value,
MAX(placed_at) AS last_order_at
FROM orders
GROUP BY region
$$,
schedule => '5s',
refresh_mode => 'DIFFERENTIAL'
);
-- Index for fast dashboard lookups
CREATE INDEX ON revenue_by_region (region);
Dashboard query:
SELECT region,
order_count,
total_revenue,
avg_order_value,
last_order_at
FROM revenue_by_region
ORDER BY total_revenue DESC;
Step 4 — Hourly Order Counts
A time-series stream table that aggregates orders into one-hour buckets.
date_trunc('hour', ...) is a STABLE function, so DIFFERENTIAL mode works.
SELECT pgtrickle.create_stream_table(
name => 'hourly_order_counts',
query => $$
SELECT
date_trunc('hour', placed_at) AS hour,
region,
COUNT(*) AS order_count,
SUM(amount) AS hourly_revenue
FROM orders
GROUP BY date_trunc('hour', placed_at), region
$$,
schedule => '10s',
refresh_mode => 'DIFFERENTIAL'
);
CREATE INDEX ON hourly_order_counts (hour DESC, region);
Dashboard query — last 24 hours:
SELECT hour,
region,
order_count,
hourly_revenue
FROM hourly_order_counts
WHERE hour >= now() - interval '24 hours'
ORDER BY hour DESC, region;
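The incremental behaviour can be pictured with a toy model: newly inserted orders touch only their own (hour, region) buckets, leaving every other group untouched. This is an illustrative Python sketch with a simplified stand-in for date_trunc:

```python
from datetime import datetime

def hour_bucket(ts):
    # Simplified analogue of date_trunc('hour', placed_at)
    return ts.replace(minute=0, second=0, microsecond=0)

def apply_insert_delta(buckets, new_orders):
    """buckets: {(hour, region): [order_count, hourly_revenue]}"""
    for o in new_orders:
        key = (hour_bucket(o['placed_at']), o['region'])
        count, revenue = buckets.get(key, [0, 0.0])
        buckets[key] = [count + 1, revenue + o['amount']]
    return buckets

buckets = {}
apply_insert_delta(buckets, [
    {'placed_at': datetime(2024, 1, 1, 9, 15), 'region': 'US', 'amount': 100.0},
    {'placed_at': datetime(2024, 1, 1, 9, 45), 'region': 'US', 'amount': 50.0},
    {'placed_at': datetime(2024, 1, 1, 10, 5), 'region': 'EU', 'amount': 25.0},
])
print(buckets[(datetime(2024, 1, 1, 9), 'US')])   # [2, 150.0]
```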
Step 5 — Top 10 Products by Revenue
A leaderboard of the top-selling products, joining orders to products
to include the product name.
SELECT pgtrickle.create_stream_table(
name => 'top_products',
query => $$
SELECT
p.id AS product_id,
p.name AS product_name,
p.category,
COUNT(o.id) AS order_count,
SUM(o.amount) AS total_revenue
FROM orders o
JOIN products p ON p.id = o.product_id
GROUP BY p.id, p.name, p.category
ORDER BY SUM(o.amount) DESC
LIMIT 10
$$,
schedule => '5s',
refresh_mode => 'DIFFERENTIAL'
);
Dashboard query:
SELECT product_name,
category,
order_count,
total_revenue
FROM top_products
ORDER BY total_revenue DESC;
Note: LIMIT N stream tables use differential TOP-K maintenance — pg_trickle tracks the rank boundary and recomputes only when a row enters or exits the top 10.
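The rank-boundary idea can be sketched as a toy Python model (illustrative only; the actual maintenance details are internal to pg_trickle):

```python
# Maintain a top-k list of (revenue, product) pairs.  A new candidate only
# forces a re-rank when it crosses the current boundary (the #k revenue).

def maintain_topk(topk, boundary, candidate, k=10):
    """topk: list of (revenue, product) sorted descending; boundary = topk[-1][0]."""
    if len(topk) < k:
        topk.append(candidate)          # list not full yet: always admit
    elif candidate[0] > boundary:
        topk.append(candidate)          # crosses the boundary:
        topk.sort(reverse=True)         # re-rank and evict the old #k
        topk.pop()
    # else: candidate is below the boundary -- nothing to recompute
    topk.sort(reverse=True)
    return topk, topk[-1][0]

topk = [(100.0, 'a'), (90.0, 'b'), (80.0, 'c')]
topk, boundary = maintain_topk(topk, 80.0, (85.0, 'd'), k=3)
print(topk)   # [(100.0, 'a'), (90.0, 'b'), (85.0, 'd')] -- 'c' fell out
```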
Step 6 — Insert Some Sample Data and Watch It Update
-- Simulate a batch of orders
INSERT INTO orders (region, product_id, amount) VALUES
('US', 1, 1299.00),
('EU', 2, 149.99),
('US', 3, 799.00),
('APAC', 1, 1299.00),
('US', 4, 599.00),
('EU', 5, 49.99),
('US', 1, 1299.00);
-- Wait a few seconds, then query the stream tables
SELECT * FROM revenue_by_region ORDER BY total_revenue DESC;
SELECT * FROM top_products;
Step 7 — Chain the Stream Tables (Optional)
You can build derived stream tables on top of other stream tables.
For example, compute a "daily summary" that reads from hourly_order_counts:
SELECT pgtrickle.create_stream_table(
name => 'daily_revenue_summary',
query => $$
SELECT
date_trunc('day', hour) AS day,
region,
SUM(order_count) AS total_orders,
SUM(hourly_revenue) AS total_revenue
FROM hourly_order_counts
GROUP BY date_trunc('day', hour), region
$$,
schedule => '30s',
refresh_mode => 'DIFFERENTIAL'
);
pg_trickle automatically builds a dependency DAG: when orders changes,
it refreshes hourly_order_counts first, then daily_revenue_summary
in topological order.
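The ordering can be illustrated with Python's standard graphlib (the graph below is hand-written for illustration; pg_trickle derives its own DAG from the defining queries):

```python
from graphlib import TopologicalSorter

# Map each stream table to the tables its defining query reads from.
deps = {
    'hourly_order_counts': {'orders'},
    'daily_revenue_summary': {'hourly_order_counts'},
}

# static_order() yields each node after all of its dependencies,
# so upstream stream tables refresh before the ones that read them.
order = [t for t in TopologicalSorter(deps).static_order() if t != 'orders']
print(order)   # ['hourly_order_counts', 'daily_revenue_summary']
```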
Step 8 — Optional: Grafana Data-Source Configuration
If you use Grafana with the PostgreSQL data source plugin, point it at the database and use the stream tables directly as query targets:
-- Grafana time-series panel query (hourly revenue per region)
SELECT
$__timeGroupAlias(hour, '1h'),
region,
SUM(hourly_revenue) AS revenue
FROM hourly_order_counts
WHERE $__timeFilter(hour)
GROUP BY 1, 2
ORDER BY 1;
Set Refresh to 5s in the Grafana panel options to poll for updates.
Monitor the Stream Tables
-- Check that all three stream tables are ACTIVE and refreshing
SELECT pgt_name, status, refresh_mode,
last_refresh_at,
consecutive_errors
FROM pgtrickle.pgt_stream_tables
WHERE pgt_name IN ('revenue_by_region', 'hourly_order_counts',
'top_products', 'daily_revenue_summary')
ORDER BY pgt_name;
Clean Up
SELECT pgtrickle.drop_stream_table('daily_revenue_summary');
SELECT pgtrickle.drop_stream_table('top_products');
SELECT pgtrickle.drop_stream_table('hourly_order_counts');
SELECT pgtrickle.drop_stream_table('revenue_by_region');
DROP TABLE orders;
DROP TABLE products;
Next Steps
- Add per-tenant isolation with RLS — see ROW_LEVEL_SECURITY.md
- Expose stream tables as a downstream publication — see PUBLICATIONS.md
- Tune refresh intervals for your load — see tuning-refresh-mode.md
- Performance optimisation — see PERFORMANCE_COOKBOOK.md
Tutorial: Stream Tables as Event-Sourced Read Models
DOC-NEW-25 (v0.57.0) — End-to-end tutorial: use stream tables as read-model projections over an event-sourced write model. Models an order-processing domain with CQRS pattern and event-replay guidance.
What You Will Build
An event-sourced order-processing system where:
- Writes go to an immutable order_events table (the event log)
- Reads are served by three stream tables (the read models):
  - current_order_state — current status of each order
  - customer_lifetime_value — rolling spend and order count per customer
  - inventory_levels — current stock count derived from reservation events
This is the CQRS (Command Query Responsibility Segregation) pattern: the write model is append-only events; the read models are incrementally maintained projections.
Prerequisites
- PostgreSQL 18 with pg_trickle installed (see Installation)
- Basic familiarity with event sourcing concepts
Step 1 — The Event Log (Write Model)
The event log is a single append-only table. Every mutation to an order is recorded as a row. The table is never updated or deleted from — only new events are appended.
CREATE TYPE order_event_type AS ENUM (
'ORDER_PLACED',
'PAYMENT_RECEIVED',
'PAYMENT_FAILED',
'SHIPPED',
'DELIVERED',
'CANCELLED',
'REFUNDED',
'ITEM_RESERVED',
'ITEM_RELEASED'
);
CREATE TABLE order_events (
id BIGSERIAL PRIMARY KEY,
event_type order_event_type NOT NULL,
order_id UUID NOT NULL,
customer_id UUID NOT NULL,
product_id BIGINT,
quantity INT,
amount NUMERIC(12,2),
payload JSONB,
occurred_at TIMESTAMPTZ DEFAULT now()
);
-- Immutability enforced: no UPDATE or DELETE allowed
CREATE RULE no_update_order_events AS ON UPDATE TO order_events DO INSTEAD NOTHING;
CREATE RULE no_delete_order_events AS ON DELETE TO order_events DO INSTEAD NOTHING;
Step 2 — Enable pg_trickle
CREATE EXTENSION IF NOT EXISTS pg_trickle;
Step 3 — Current Order State (Read Model)
This stream table folds all events for each order into its current state.
FILTER (WHERE ...) aggregates extract the latest relevant event data per
event type.
SELECT pgtrickle.create_stream_table(
name => 'current_order_state',
query => $$
SELECT
order_id,
customer_id,
MAX(occurred_at)
FILTER (WHERE event_type = 'ORDER_PLACED') AS placed_at,
MAX(occurred_at)
FILTER (WHERE event_type = 'PAYMENT_RECEIVED') AS paid_at,
MAX(occurred_at)
FILTER (WHERE event_type = 'SHIPPED') AS shipped_at,
MAX(occurred_at)
FILTER (WHERE event_type = 'DELIVERED') AS delivered_at,
MAX(occurred_at)
FILTER (WHERE event_type = 'CANCELLED') AS cancelled_at,
SUM(amount)
FILTER (WHERE event_type = 'ORDER_PLACED') AS order_total,
CASE
WHEN BOOL_OR(event_type = 'CANCELLED') THEN 'cancelled'
WHEN BOOL_OR(event_type = 'DELIVERED') THEN 'delivered'
WHEN BOOL_OR(event_type = 'SHIPPED') THEN 'shipped'
WHEN BOOL_OR(event_type = 'PAYMENT_RECEIVED') THEN 'paid'
WHEN BOOL_OR(event_type = 'PAYMENT_FAILED') THEN 'payment_failed'
ELSE 'placed'
END AS status
FROM order_events
WHERE event_type IN (
'ORDER_PLACED', 'PAYMENT_RECEIVED', 'PAYMENT_FAILED',
'SHIPPED', 'DELIVERED', 'CANCELLED'
)
GROUP BY order_id, customer_id
$$,
schedule => '5s',
refresh_mode => 'DIFFERENTIAL'
);
CREATE INDEX ON current_order_state (order_id);
CREATE INDEX ON current_order_state (customer_id, placed_at DESC);
CREATE INDEX ON current_order_state (status, placed_at DESC);
Read-model query — active orders for a customer:
SELECT order_id,
status,
order_total,
placed_at,
shipped_at
FROM current_order_state
WHERE customer_id = $1
AND status NOT IN ('delivered', 'cancelled')
ORDER BY placed_at DESC;
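The status derivation above is a priority fold over the set of event types seen for an order. A toy Python restatement of that CASE logic (illustrative only):

```python
# Highest-priority event type present wins, mirroring the CASE branch order.
STATUS_PRIORITY = ['CANCELLED', 'DELIVERED', 'SHIPPED',
                   'PAYMENT_RECEIVED', 'PAYMENT_FAILED']
STATUS_NAME = {'CANCELLED': 'cancelled', 'DELIVERED': 'delivered',
               'SHIPPED': 'shipped', 'PAYMENT_RECEIVED': 'paid',
               'PAYMENT_FAILED': 'payment_failed'}

def fold_status(event_types):
    for et in STATUS_PRIORITY:           # same order as the CASE branches
        if et in event_types:
            return STATUS_NAME[et]
    return 'placed'                      # the ELSE branch

print(fold_status({'ORDER_PLACED'}))                          # placed
print(fold_status({'ORDER_PLACED', 'PAYMENT_RECEIVED'}))      # paid
print(fold_status({'ORDER_PLACED', 'SHIPPED', 'CANCELLED'}))  # cancelled
```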
Step 4 — Customer Lifetime Value (Read Model)
Track rolling spend and order count per customer.
SELECT pgtrickle.create_stream_table(
name => 'customer_lifetime_value',
query => $$
SELECT
customer_id,
COUNT(DISTINCT order_id) AS total_orders,
SUM(amount)
FILTER (WHERE event_type = 'PAYMENT_RECEIVED') AS total_spent,
SUM(amount)
FILTER (WHERE event_type = 'REFUNDED') AS total_refunded,
SUM(amount)
FILTER (WHERE event_type = 'PAYMENT_RECEIVED') -
COALESCE(SUM(amount)
FILTER (WHERE event_type = 'REFUNDED'), 0) AS net_revenue,
MIN(occurred_at)
FILTER (WHERE event_type = 'ORDER_PLACED') AS first_order_at,
MAX(occurred_at)
FILTER (WHERE event_type = 'ORDER_PLACED') AS last_order_at
FROM order_events
WHERE event_type IN ('ORDER_PLACED', 'PAYMENT_RECEIVED', 'REFUNDED')
GROUP BY customer_id
$$,
schedule => '10s',
refresh_mode => 'DIFFERENTIAL'
);
CREATE INDEX ON customer_lifetime_value (customer_id);
CREATE INDEX ON customer_lifetime_value (net_revenue DESC);
Read-model query — top customers by net revenue:
SELECT customer_id,
total_orders,
net_revenue,
last_order_at
FROM customer_lifetime_value
ORDER BY net_revenue DESC
LIMIT 20;
Step 5 — Inventory Levels (Read Model)
Derive current stock counts from ITEM_RESERVED and ITEM_RELEASED events.
SELECT pgtrickle.create_stream_table(
name => 'inventory_levels',
query => $$
SELECT
product_id,
SUM(CASE
WHEN event_type = 'ITEM_RESERVED' THEN -quantity
WHEN event_type = 'ITEM_RELEASED' THEN quantity
ELSE 0
END) AS reserved_delta,
SUM(quantity)
FILTER (WHERE event_type = 'ITEM_RESERVED') AS total_reserved,
SUM(quantity)
FILTER (WHERE event_type = 'ITEM_RELEASED') AS total_released,
COUNT(DISTINCT order_id)
FILTER (WHERE event_type = 'ITEM_RESERVED') AS active_reservations
FROM order_events
WHERE event_type IN ('ITEM_RESERVED', 'ITEM_RELEASED')
AND product_id IS NOT NULL
GROUP BY product_id
$$,
schedule => '5s',
refresh_mode => 'DIFFERENTIAL'
);
CREATE INDEX ON inventory_levels (product_id);
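The signed reservation sum can be restated as a small fold (illustrative Python sketch, not extension internals):

```python
# ITEM_RESERVED subtracts quantity from available stock;
# ITEM_RELEASED adds it back -- exactly the CASE expression above.

def reserved_delta(events):
    delta = 0
    for event_type, qty in events:
        if event_type == 'ITEM_RESERVED':
            delta -= qty
        elif event_type == 'ITEM_RELEASED':
            delta += qty
    return delta

events = [('ITEM_RESERVED', 2), ('ITEM_RESERVED', 1), ('ITEM_RELEASED', 1)]
print(reserved_delta(events))   # -2 (net two units still reserved)
```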
Step 6 — Try It with Sample Events
-- A customer places an order
INSERT INTO order_events (event_type, order_id, customer_id, product_id, quantity, amount) VALUES
('ORDER_PLACED', '11111111-1111-1111-1111-111111111111'::uuid, 'aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa'::uuid, 1, 2, 2598.00),
('ITEM_RESERVED', '11111111-1111-1111-1111-111111111111'::uuid, 'aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa'::uuid, 1, 2, NULL);
-- Payment succeeds
INSERT INTO order_events (event_type, order_id, customer_id, amount) VALUES
('PAYMENT_RECEIVED', '11111111-1111-1111-1111-111111111111'::uuid, 'aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa'::uuid, 2598.00);
-- Order ships
INSERT INTO order_events (event_type, order_id, customer_id) VALUES
('SHIPPED', '11111111-1111-1111-1111-111111111111'::uuid, 'aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa'::uuid);
-- Wait for the scheduler, then read the projections
SELECT * FROM current_order_state WHERE order_id = '11111111-1111-1111-1111-111111111111'::uuid;
SELECT * FROM customer_lifetime_value WHERE customer_id = 'aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa'::uuid;
SELECT * FROM inventory_levels WHERE product_id = 1;
Step 7 — CQRS Pattern Summary
Write path: Read path:
────────────────── ──────────────────────────────────
Application layer Dashboard / API queries
│ │
│ INSERT INTO order_events │ SELECT FROM current_order_state
│ │ SELECT FROM customer_lifetime_value
▼ │ SELECT FROM inventory_levels
order_events (event log) ▲
│ │
│ pg_trickle CDC triggers │ pg_trickle differential refresh
└──────────────────────────────────────►│
(incremental, per schedule)
The application layer writes only to order_events. pg_trickle handles
all projection maintenance automatically.
Step 8 — Event Replay and Backfill
If you need to rebuild a projection from scratch (e.g., after changing the defining query), use the reinitialize API:
-- Force a full rebuild of the current_order_state projection
SELECT pgtrickle.reinitialize_stream_table('current_order_state');
This triggers a FULL refresh from the event log, rebuilding the projection from all historical events. Once complete, pg_trickle switches back to differential maintenance automatically.
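Conceptually, a full rebuild folds over every historical event, while differential maintenance folds only the new events into the existing state. A toy Python contrast (illustrative only):

```python
# The same fold function serves both paths: replay feeds it the whole log,
# incremental maintenance feeds it just the new events.

def fold(state, events):
    for customer, amount in events:
        state[customer] = state.get(customer, 0.0) + amount
    return state

history = [('A', 100.0), ('B', 50.0), ('A', 25.0)]
projection = fold({}, history)                 # full replay / reinitialize
new_events = [('B', 10.0)]
projection = fold(projection, new_events)      # differential maintenance
print(projection)   # {'A': 125.0, 'B': 60.0}
```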
Backfill workflow for a new projection:
-- 1. Create the new projection in FULL mode for the initial backfill,
--    then switch to DIFFERENTIAL
SELECT pgtrickle.create_stream_table(
name => 'new_projection',
query => '...',
schedule => '5s',
refresh_mode => 'FULL' -- use FULL for initial backfill
);
-- 2. Wait for the first full cycle to complete
SELECT pgt_name, status, last_refresh_at
FROM pgtrickle.pgt_stream_tables
WHERE pgt_name = 'new_projection';
-- 3. Once status = 'ACTIVE', switch to DIFFERENTIAL
SELECT pgtrickle.alter_stream_table('new_projection',
refresh_mode => 'DIFFERENTIAL'
);
Monitor the Projections
SELECT pgt_name, status, refresh_mode,
last_refresh_at,
consecutive_errors,
rows_in_last_refresh
FROM pgtrickle.pgt_stream_tables
WHERE pgt_name IN ('current_order_state', 'customer_lifetime_value',
'inventory_levels')
ORDER BY pgt_name;
Clean Up
SELECT pgtrickle.drop_stream_table('inventory_levels');
SELECT pgtrickle.drop_stream_table('customer_lifetime_value');
SELECT pgtrickle.drop_stream_table('current_order_state');
DROP TABLE order_events;
DROP TYPE order_event_type;
Next Steps
- Chain projections for derived aggregates — see FIRST_DASHBOARD.md Step 7
- Add downstream publication for external consumers — see PUBLICATIONS.md
- Secure projections with RLS — see ROW_LEVEL_SECURITY.md
- Backfill and migration guide — see BACKFILL_AND_MIGRATION.md
Tutorial: Zero-Downtime Migration from Materialized Views
DOC-NEW-26 (v0.57.0) — Step-by-step guide: migrate a manually-maintained materialized view to a stream table with zero downtime.
Overview
This tutorial walks through migrating an existing REFRESH MATERIALIZED VIEW
workflow to a pg_trickle stream table without any downtime or data loss.
The process runs the old view and the new stream table in parallel so you can
verify correctness before cutting over consumers.
Prerequisites
- PostgreSQL 18 with pg_trickle installed (see Installation)
- An existing MATERIALIZED VIEW with a known refresh schedule
- At least SELECT access to the materialized view
Step 1 — Pre-Migration Assessment
Before migrating, assess whether the defining query is IVM-eligible.
-- Check the current materialized view definition
SELECT schemaname, matviewname, definition
FROM pg_matviews
WHERE matviewname = 'my_view';
-- Validate the query against pg_trickle's IVM compatibility checker
SELECT pgtrickle.validate_query(
$$<paste your view definition here>$$
);
Example output:
result | detail
---------+--------------------------------------------------------------
ok | Query is IVM-eligible. Recommended mode: DIFFERENTIAL
If validate_query returns ok, the migration is straightforward.
If it returns warnings or not_eligible, check the detail column —
common reasons include volatile functions or unsupported SQL patterns.
See LIMITATIONS.md for the full list.
For non-eligible queries, use refresh_mode => 'FULL' — pg_trickle
will maintain a full-refresh stream table automatically, eliminating the
manual REFRESH MATERIALIZED VIEW calls.
Step 2 — Create the Stream Table in Parallel
Do not drop the old materialized view yet. Create the stream table alongside it, pointing at the same source tables.
-- Example: migrating this materialized view
-- CREATE MATERIALIZED VIEW orders_summary AS
-- SELECT region, COUNT(*) AS order_count, SUM(amount) AS total
-- FROM orders GROUP BY region;
-- Create the equivalent stream table
SELECT pgtrickle.create_stream_table(
name => 'orders_summary_st',
query => $$
SELECT region,
COUNT(*) AS order_count,
SUM(amount) AS total
FROM orders
GROUP BY region
$$,
schedule => '10s',
refresh_mode => 'DIFFERENTIAL'
);
The stream table will populate within one refresh cycle. Check status:
SELECT pgt_name, status, last_refresh_at, rows_in_last_refresh
FROM pgtrickle.pgt_stream_tables
WHERE pgt_name = 'orders_summary_st';
Step 3 — Verify Output Parity
Compare the outputs of both the old view and the new stream table:
-- Rows in old view but not in stream table
SELECT * FROM orders_summary
EXCEPT
SELECT * FROM orders_summary_st;
-- Rows in stream table but not in old view
SELECT * FROM orders_summary_st
EXCEPT
SELECT * FROM orders_summary;
Both queries should return zero rows. If there are differences:
- Check that the stream table has had at least one full refresh cycle.
- Verify the defining query is identical (column order and aliases matter).
- Run SELECT pgtrickle.reinitialize_stream_table('orders_summary_st') to force a clean full refresh if there is any doubt.
For long-running parallel validation, write DML to the source table and verify both targets update correctly:
INSERT INTO orders (region, amount) VALUES ('TEST', 99.99);
-- Both should show the TEST region
SELECT * FROM orders_summary WHERE region = 'TEST';
SELECT * FROM orders_summary_st WHERE region = 'TEST';
Step 4 — Create a Compatibility View (Optional)
If consumers reference orders_summary by name and you cannot update them
before cutover, create a compatibility view that points to the stream table:
-- 1. Rename the old materialized view
ALTER MATERIALIZED VIEW orders_summary RENAME TO orders_summary_old;
-- 2. Create a regular view with the original name, reading from the ST
CREATE VIEW orders_summary AS SELECT * FROM orders_summary_st;
Consumers now read from the stream table transparently. The old materialized view remains as a fallback.
Step 5 — Consumer Cutover
Once parallel validation passes, cut over consumers:
Option A — Direct table reference (recommended):
Update consumer queries/code to reference orders_summary_st directly.
This is the cleanest path and exposes pg_trickle's automatic freshness.
Option B — Keep the compatibility view: If you created the compatibility view in Step 4, consumers already read from the stream table. No further changes needed.
Step 6 — Remove the Old Materialized View
After confirming all consumers use the stream table:
-- Remove the old materialized view
DROP MATERIALIZED VIEW orders_summary_old;
-- If you kept the compatibility view, optionally rename the stream table
-- to match the original name:
-- SELECT pgtrickle.drop_stream_table('orders_summary_st');
-- SELECT pgtrickle.create_stream_table('orders_summary', ...)
-- Or rename the view to align naming conventions.
Also remove any cron jobs or application code that called
REFRESH MATERIALIZED VIEW orders_summary.
Step 7 — Rollback Procedure
If problems arise after cutover:
-- 1. Stop the stream table from refreshing
SELECT pgtrickle.pause_stream_table('orders_summary_st');
-- 2. Revert consumers to the old materialized view
-- (if you renamed it in Step 4)
ALTER MATERIALIZED VIEW orders_summary_old RENAME TO orders_summary;
-- 3. Drop the compatibility view if you created one
DROP VIEW IF EXISTS orders_summary;
-- 4. Resume your manual refresh schedule
-- (add back the cron job / pg_cron entry for REFRESH MATERIALIZED VIEW)
-- 5. Optionally drop the stream table
SELECT pgtrickle.drop_stream_table('orders_summary_st');
Common Migration Patterns
Non-IVM-eligible queries (use FULL mode)
-- Query uses a volatile function; pg_trickle will use FULL refresh
SELECT pgtrickle.create_stream_table(
name => 'hourly_snapshot',
query => $$
SELECT *, now() AS snapshot_at FROM large_table
$$,
schedule => '1h',
refresh_mode => 'FULL'
);
Concurrently-refreshed materialized views
If the old view used REFRESH MATERIALIZED VIEW CONCURRENTLY, note that
pg_trickle's MERGE-based update is also non-blocking for readers. No special
configuration is needed.
Views with WITH DATA at creation
pg_trickle always populates the stream table on the first cycle, equivalent
to WITH DATA. The WITHOUT DATA option does not apply.
Post-Migration Checklist
- pgtrickle.validate_query() returned ok, or the migration is in FULL mode
- Stream table reached status = 'ACTIVE'
- EXCEPT diff queries return zero rows
- Manual REFRESH MATERIALIZED VIEW calls removed from cron/pg_cron
- Old materialized view dropped (or retained as read-only archive)
- Consumer queries point to the stream table
Next Steps
- Tune the refresh interval and mode — see tuning-refresh-mode.md
- Add monitoring and alerts — see MONITORING_AND_ALERTING.md
- Performance optimisation — see PERFORMANCE_COOKBOOK.md
Tutorial: Security Hardening for pg_trickle
DOC-NEW-27 (v0.57.0) — Step-by-step security hardening guide: dedicated roles, CDC trigger ownership, change-buffer protection, and audit logging.
Overview
This guide hardens a pg_trickle installation following the principle of least privilege. After completing these steps:
- Stream tables are owned by a dedicated non-superuser role.
- Application users can read (but not write) stream tables.
- Change buffers are protected from direct application access.
- DDL operations against stream tables are audit-logged.
Prerequisites
- PostgreSQL 18 with pg_trickle installed as a superuser
- psql or an admin SQL client
Step 1 — Create Dedicated Roles
Run these statements as a superuser (e.g., postgres).
-- ─── pgtrickle_admin ──────────────────────────────────────────────────────
-- Manages stream tables: create, alter, drop, reinitialize.
-- Intended for DBAs and data engineers.
CREATE ROLE pgtrickle_admin NOLOGIN NOINHERIT;
GRANT USAGE ON SCHEMA pgtrickle TO pgtrickle_admin;
GRANT USAGE ON SCHEMA pgtrickle_changes TO pgtrickle_admin;
GRANT EXECUTE ON ALL FUNCTIONS IN SCHEMA pgtrickle TO pgtrickle_admin;
-- Allow creating stream tables in the public schema
GRANT CREATE ON SCHEMA public TO pgtrickle_admin;
-- Allow reading source tables (add schemas as required)
GRANT USAGE ON SCHEMA public TO pgtrickle_admin;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO pgtrickle_admin;
-- Future tables in public schema (run once per schema)
ALTER DEFAULT PRIVILEGES IN SCHEMA public
GRANT SELECT ON TABLES TO pgtrickle_admin;
-- ─── pgtrickle_user ───────────────────────────────────────────────────────
-- Reads stream tables and calls monitoring functions.
-- Intended for application backends.
CREATE ROLE pgtrickle_user NOLOGIN NOINHERIT;
GRANT USAGE ON SCHEMA pgtrickle TO pgtrickle_user;
-- Read-only access to stream tables (granted per-table below)
-- Monitoring functions
GRANT EXECUTE ON FUNCTION pgtrickle.pgt_status() TO pgtrickle_user;
GRANT EXECUTE ON FUNCTION pgtrickle.refresh_efficiency() TO pgtrickle_user;
GRANT EXECUTE ON FUNCTION pgtrickle.health_check() TO pgtrickle_user;
-- ─── pgtrickle_readonly ───────────────────────────────────────────────────
-- Pure read access to stream tables only; no extension function access.
-- Intended for reporting tools and BI consumers.
CREATE ROLE pgtrickle_readonly NOLOGIN NOINHERIT;
GRANT USAGE ON SCHEMA public TO pgtrickle_readonly;
-- Per-table GRANT added below after stream tables are created.
Step 2 — Grant Roles to Login Roles
-- Example: your data engineer login
CREATE ROLE de_alice LOGIN PASSWORD '...';
GRANT pgtrickle_admin TO de_alice;
-- Example: your application backend login
CREATE ROLE app_backend LOGIN PASSWORD '...';
GRANT pgtrickle_user TO app_backend;
-- Example: your BI tool login
CREATE ROLE bi_tool LOGIN PASSWORD '...';
GRANT pgtrickle_readonly TO bi_tool;
Step 3 — Create Stream Tables Under the Admin Role
Connect as the pgtrickle_admin role (or SET ROLE pgtrickle_admin) and
create stream tables. The admin role becomes the owner, not the superuser.
SET ROLE pgtrickle_admin;
SELECT pgtrickle.create_stream_table(
name => 'order_summary',
query => $$SELECT region, SUM(amount) AS total FROM orders GROUP BY region$$,
schedule => '10s'
);
RESET ROLE;
Verify ownership:
SELECT tablename, tableowner
FROM pg_tables
WHERE tablename = 'order_summary';
Step 4 — Grant Read Access to Consumer Roles
-- pgtrickle_user: reads stream tables and calls monitoring functions
GRANT SELECT ON order_summary TO pgtrickle_user;
-- pgtrickle_readonly: pure read access
GRANT SELECT ON order_summary TO pgtrickle_readonly;
-- For future stream tables, set default privileges so new tables are
-- automatically accessible:
ALTER DEFAULT PRIVILEGES FOR ROLE pgtrickle_admin IN SCHEMA public
GRANT SELECT ON TABLES TO pgtrickle_user;
ALTER DEFAULT PRIVILEGES FOR ROLE pgtrickle_admin IN SCHEMA public
GRANT SELECT ON TABLES TO pgtrickle_readonly;
Step 5 — Protect Change Buffers
Change buffers in pgtrickle_changes should never be directly accessible
to application users. Revoke all access and grant only to the extension owner:
-- Revoke PUBLIC access (if not already revoked during extension install)
REVOKE ALL ON SCHEMA pgtrickle_changes FROM PUBLIC;
-- Application roles must not see change buffer tables
REVOKE ALL ON ALL TABLES IN SCHEMA pgtrickle_changes FROM pgtrickle_user;
REVOKE ALL ON ALL TABLES IN SCHEMA pgtrickle_changes FROM pgtrickle_readonly;
REVOKE ALL ON ALL TABLES IN SCHEMA pgtrickle_changes FROM pgtrickle_admin;
-- Verify: this query should return zero rows for non-superuser roles
SELECT table_name
FROM information_schema.role_table_grants
WHERE table_schema = 'pgtrickle_changes'
AND grantee IN ('pgtrickle_user', 'pgtrickle_readonly', 'pgtrickle_admin');
Step 6 — Secure CDC Trigger Ownership
CDC triggers on source tables are owned by the stream table owner
(pgtrickle_admin). Verify this:
-- CDC triggers should be owned by pgtrickle_admin, not a superuser
SELECT trigger_name, event_object_table, action_statement
FROM information_schema.triggers
WHERE trigger_name LIKE 'pgt_cdc_%'
ORDER BY event_object_table;
-- Verify trigger function ownership
SELECT proname, rolname AS owner
FROM pg_proc
JOIN pg_roles ON pg_roles.oid = pg_proc.proowner
WHERE proname LIKE 'pgt_cdc_%';
If triggers are owned by postgres (the superuser), recreate the stream
tables under pgtrickle_admin (drop and recreate via SET ROLE pgtrickle_admin).
Step 7 — Enable Audit Logging for Stream Table DDL
Use PostgreSQL's built-in log_statement setting or the pgaudit extension (if
installed) to capture DDL events against pg_trickle objects.
Using log_statement (built-in)
-- Log all DDL operations (creates, alters, drops)
ALTER SYSTEM SET log_statement = 'ddl';
SELECT pg_reload_conf();
DDL against stream tables — including pgtrickle.create_stream_table(),
pgtrickle.drop_stream_table(), and pgtrickle.alter_stream_table() —
will appear in the PostgreSQL log.
Using pgaudit (recommended for production)
-- Install pg_audit extension (if available)
CREATE EXTENSION IF NOT EXISTS pgaudit;
-- Audit all DDL and function calls in the pgtrickle schema
ALTER SYSTEM SET pgaudit.log = 'DDL, FUNCTION';
SELECT pg_reload_conf();
Query the pg_trickle audit catalogs
pg_trickle records every stream table DDL operation in pgtrickle.pgt_ddl_history
and every refresh in pgtrickle.pgt_refresh_history. To query these audit trails:
-- All stream table DDL operations (create, alter, drop)
SELECT pgt_name, action, performed_by, performed_at
FROM pgtrickle.pgt_ddl_history
ORDER BY performed_at DESC
LIMIT 50;
-- Recent refresh failures
SELECT pgt_name, refresh_mode, started_at, error_message
FROM pgtrickle.pgt_refresh_history
WHERE NOT success
AND started_at > now() - interval '24 hours'
ORDER BY started_at DESC;
Step 8 — Disable Extension Behaviour in Non-Refresh Environments
If you have replica databases or analysis environments where you do not want pg_trickle running refreshes:
-- Disable the scheduler without uninstalling the extension
ALTER SYSTEM SET pg_trickle.enabled = off;
SELECT pg_reload_conf();
Verification Checklist
After completing all steps, verify the hardened state:
-- 1. pgtrickle_admin can create stream tables
SET ROLE pgtrickle_admin;
SELECT pgtrickle.validate_query('SELECT 1');
RESET ROLE;
-- 2. pgtrickle_user can read stream tables but cannot modify them
SET ROLE pgtrickle_user;
SELECT * FROM order_summary LIMIT 1; -- should succeed
-- INSERT INTO order_summary VALUES (...); -- should fail with permission denied
RESET ROLE;
-- 3. pgtrickle_readonly cannot call extension functions
SET ROLE pgtrickle_readonly;
-- SELECT pgtrickle.refresh_stream_table('order_summary'); -- should fail
RESET ROLE;
-- 4. No application role can see change buffers
SELECT COUNT(*)
FROM information_schema.role_table_grants
WHERE table_schema = 'pgtrickle_changes'
AND grantee NOT IN ('postgres', 'pg_trickle');
-- Expected: 0
Security Hardening Checklist
- pgtrickle_admin role created with NOLOGIN NOINHERIT
- pgtrickle_user role created for application backends
- pgtrickle_readonly role created for BI / reporting tools
- Stream tables owned by pgtrickle_admin, not a superuser
- REVOKE ALL ON SCHEMA pgtrickle_changes FROM PUBLIC
- Application roles have no access to pgtrickle_changes.*
- Audit logging enabled (log_statement = 'ddl' or pgaudit)
- pg_trickle.allow_circular = off (default)
- pg_trickle.enabled = off on replica / analysis environments
Next Steps
- Full security model reference — see SECURITY_MODEL.md
- RLS patterns for per-tenant stream tables — see ROW_LEVEL_SECURITY.md
- Security Guide (threat model, CDC triggers, hardening checklist) — see SECURITY_GUIDE.md
Row-Level Security (RLS) on Stream Tables
This tutorial shows how to apply PostgreSQL Row-Level Security to stream tables so that different database roles see only the rows they are permitted to access.
Background
Stream tables materialize the full result set of their defining query,
regardless of any RLS policies on the source tables. This matches the behavior
of PostgreSQL's built-in MATERIALIZED VIEW — the cache contains everything,
and access control is enforced at read time.
The recommended pattern is:
- Source tables: may or may not have RLS. Stream tables always see all rows.
- Stream table: enable RLS on the stream table and create per-role policies so each role sees only its permitted rows.
Setup: Multi-Tenant Orders
-- Source table: all tenant orders
CREATE TABLE orders (
id SERIAL PRIMARY KEY,
tenant_id INT NOT NULL,
product TEXT NOT NULL,
amount NUMERIC(10,2) NOT NULL
);
INSERT INTO orders (tenant_id, product, amount) VALUES
(1, 'Widget A', 19.99),
(1, 'Widget B', 9.50),
(2, 'Gadget X', 49.00),
(2, 'Gadget Y', 25.00),
(3, 'Doohickey', 5.00);
-- Stream table: per-tenant spend summary
SELECT pgtrickle.create_stream_table(
name => 'tenant_spend',
query => $$
SELECT tenant_id,
COUNT(*) AS order_count,
SUM(amount) AS total_spend
FROM orders
GROUP BY tenant_id
$$,
schedule => '1m'
);
After the first refresh, tenant_spend contains all three tenants:
SELECT * FROM pgtrickle.tenant_spend ORDER BY tenant_id;
-- tenant_id | order_count | total_spend
-- -----------+-------------+-------------
-- 1 | 2 | 29.49
-- 2 | 2 | 74.00
-- 3 | 1 | 5.00
Step 1: Enable RLS on the Stream Table
ALTER TABLE pgtrickle.tenant_spend ENABLE ROW LEVEL SECURITY;
Once RLS is enabled, non-superuser roles see zero rows unless a policy grants access. Superusers, and the table owner (unless FORCE ROW LEVEL SECURITY is set), bypass RLS by default.
Step 2: Create Per-Tenant Roles
CREATE ROLE tenant_1 LOGIN;
CREATE ROLE tenant_2 LOGIN;
GRANT USAGE ON SCHEMA pgtrickle TO tenant_1, tenant_2;
GRANT SELECT ON pgtrickle.tenant_spend TO tenant_1, tenant_2;
Step 3: Create RLS Policies
-- Tenant 1 sees only tenant_id = 1
CREATE POLICY tenant_1_policy ON pgtrickle.tenant_spend
FOR SELECT
TO tenant_1
USING (tenant_id = 1);
-- Tenant 2 sees only tenant_id = 2
CREATE POLICY tenant_2_policy ON pgtrickle.tenant_spend
FOR SELECT
TO tenant_2
USING (tenant_id = 2);
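Before testing as each tenant role, you can confirm that RLS is enabled and the policies landed, using PostgreSQL's standard catalogs (nothing pg_trickle-specific here):

```sql
-- Is row-level security enabled on the stream table?
SELECT relrowsecurity
FROM pg_class
WHERE oid = 'pgtrickle.tenant_spend'::regclass;
-- relrowsecurity = t

-- Which policies exist, for which roles, with which row filters?
SELECT policyname, roles, cmd, qual
FROM pg_policies
WHERE schemaname = 'pgtrickle' AND tablename = 'tenant_spend';
```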
Step 4: Verify Filtering
Connect as each tenant role and query:
-- As tenant_1:
SET ROLE tenant_1;
SELECT * FROM pgtrickle.tenant_spend;
-- tenant_id | order_count | total_spend
-- -----------+-------------+-------------
-- 1 | 2 | 29.49
RESET ROLE;
-- As tenant_2:
SET ROLE tenant_2;
SELECT * FROM pgtrickle.tenant_spend;
-- tenant_id | order_count | total_spend
-- -----------+-------------+-------------
-- 2 | 2 | 74.00
RESET ROLE;
Each tenant sees only their own data. The underlying stream table still contains all rows — the filtering happens at query time via RLS.
How Refresh Works with RLS
Both scheduled and manual refreshes run with superuser-equivalent privileges, so RLS on source tables is always bypassed during refresh. This ensures:
- The stream table always contains the complete result set.
- A refresh_stream_table() call produces the same result regardless of who calls it.
- IMMEDIATE mode (IVM triggers) also bypasses RLS via SECURITY DEFINER trigger functions.
Policy Change Detection
pg_trickle automatically detects RLS-related DDL on source tables:
| DDL on source table | Effect |
|---|---|
| CREATE POLICY / ALTER POLICY / DROP POLICY | Stream table marked for reinit |
| ALTER TABLE ... ENABLE ROW LEVEL SECURITY | Stream table marked for reinit |
| ALTER TABLE ... DISABLE ROW LEVEL SECURITY | Stream table marked for reinit |
| ALTER TABLE ... FORCE ROW LEVEL SECURITY | Stream table marked for reinit |
| ALTER TABLE ... NO FORCE ROW LEVEL SECURITY | Stream table marked for reinit |
Since the stream table always sees all rows (bypassing RLS), these reinits serve as a confirmation that the materialized data remains consistent after the security posture of the source table changed.
Tips
- One stream table, many roles: A single stream table can serve all tenants. Each role's RLS policy filters at read time — no per-tenant duplication needed.
- Write policies: Stream tables are maintained by pg_trickle. Restrict writes by creating only FOR SELECT policies.
- Default deny: Once RLS is enabled, roles without a matching policy see zero rows. Always test with a non-superuser role.
- FORCE ROW LEVEL SECURITY: By default, table owners bypass RLS. Use ALTER TABLE ... FORCE ROW LEVEL SECURITY if the owner should also be subject to policies.
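Applying the last tip to this tutorial's stream table is a one-liner; a sketch using standard PostgreSQL DDL:

```sql
-- Make the table owner subject to the RLS policies as well.
ALTER TABLE pgtrickle.tenant_spend FORCE ROW LEVEL SECURITY;

-- Revert if the owner should bypass policies again:
ALTER TABLE pgtrickle.tenant_spend NO FORCE ROW LEVEL SECURITY;
```

Note that superusers still bypass RLS regardless of FORCE, so pg_trickle's refresh path (which runs with superuser-equivalent privileges) is unaffected.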
Partitioned Tables as Sources
This tutorial shows how pg_trickle works with PostgreSQL's declarative table partitioning. It covers RANGE, LIST, and HASH partitioned source tables, explains what happens when you add or remove partitions, and documents known caveats.
Background
PostgreSQL lets you split large tables into smaller "partitions" — for
example one partition per month for an orders table. This is a common
technique for managing very large datasets. pg_trickle handles partitioned
source tables transparently:
- CDC triggers fire on all partitions. PostgreSQL 13+ automatically clones row-level triggers from the parent to every child partition. All DML (INSERT, UPDATE, DELETE) on any partition is captured in a single change buffer keyed by the parent table's OID.
- ATTACH PARTITION is detected automatically. When you add a new partition with pre-existing data, pg_trickle's DDL event trigger detects the change and marks affected stream tables for reinitialization. No manual intervention required.
- WAL-based CDC works correctly. When using WAL mode, publications are created with publish_via_partition_root = true so all partition changes appear under the parent table's identity.
Example: Monthly Sales Partitions (RANGE)
-- Create a RANGE-partitioned source table
CREATE TABLE sales (
id SERIAL,
sale_date DATE NOT NULL,
region TEXT NOT NULL,
amount NUMERIC NOT NULL,
PRIMARY KEY (id, sale_date)
) PARTITION BY RANGE (sale_date);
-- Create partitions for each half of the year
CREATE TABLE sales_h1_2025 PARTITION OF sales
FOR VALUES FROM ('2025-01-01') TO ('2025-07-01');
CREATE TABLE sales_h2_2025 PARTITION OF sales
FOR VALUES FROM ('2025-07-01') TO ('2026-01-01');
-- Insert data across partitions
INSERT INTO sales (sale_date, region, amount) VALUES
('2025-02-15', 'US', 100.00),
('2025-05-20', 'EU', 250.00),
('2025-08-10', 'US', 175.00),
('2025-11-30', 'EU', 300.00);
-- Create a stream table over the partitioned source
SELECT pgtrickle.create_stream_table(
name => 'regional_sales',
query => $$
SELECT region, SUM(amount) AS total, COUNT(*) AS cnt
FROM sales
GROUP BY region
$$,
schedule => '1 minute',
refresh_mode => 'DIFFERENTIAL'
);
-- Refresh to populate
SELECT pgtrickle.refresh_stream_table('regional_sales');
-- Verify — aggregates span all partitions:
SELECT * FROM regional_sales ORDER BY region;
-- region | total | cnt
-- --------+--------+-----
-- EU | 550.00 | 2
-- US | 275.00 | 2
Adding New Partitions
When you add a new partition, any new rows inserted through the parent are automatically captured by CDC triggers. The trigger on the parent is cloned to the new partition by PostgreSQL.
-- Add a new partition for 2026
CREATE TABLE sales_h1_2026 PARTITION OF sales
FOR VALUES FROM ('2026-01-01') TO ('2026-07-01');
-- Inserts into the new partition are captured normally
INSERT INTO sales (sale_date, region, amount)
VALUES ('2026-03-15', 'US', 400.00);
-- Next refresh picks up the new row
SELECT pgtrickle.refresh_stream_table('regional_sales');
SELECT * FROM regional_sales ORDER BY region;
-- region | total | cnt
-- --------+--------+-----
-- EU | 550.00 | 2
-- US | 675.00 | 3
ATTACH PARTITION with Pre-Existing Data
The most important edge case: attaching a table that already contains rows. These rows were never seen by CDC triggers, so the stream table would be stale. pg_trickle detects this automatically.
-- Create a standalone table with existing data
CREATE TABLE sales_h2_2026 (
id SERIAL,
sale_date DATE NOT NULL,
region TEXT NOT NULL,
amount NUMERIC NOT NULL,
PRIMARY KEY (id, sale_date)
);
INSERT INTO sales_h2_2026 (sale_date, region, amount) VALUES
('2026-08-01', 'EU', 500.00),
('2026-09-15', 'US', 200.00);
-- Attach it to the partitioned table
ALTER TABLE sales ATTACH PARTITION sales_h2_2026
FOR VALUES FROM ('2026-07-01') TO ('2027-01-01');
-- pg_trickle detects the partition change and marks the stream table
-- for reinitialize. Check:
SELECT pgt_name, needs_reinit
FROM pgtrickle.pgt_stream_tables
WHERE pgt_name = 'regional_sales';
-- pgt_name | needs_reinit
-- -----------------+--------------
-- regional_sales | t
-- The next refresh reinitializes — re-reading all data from scratch:
SELECT pgtrickle.refresh_stream_table('regional_sales');
SELECT * FROM regional_sales ORDER BY region;
-- region | total | cnt
-- --------+---------+-----
-- EU | 1050.00 | 3
-- US | 875.00 | 4
DETACH PARTITION
When you detach a partition, the detached table's data is no longer visible through the parent. pg_trickle detects this too and marks stream tables for reinitialize.
-- Archive the old partition
ALTER TABLE sales DETACH PARTITION sales_h1_2025;
-- Stream table is marked for reinit:
SELECT pgt_name, needs_reinit
FROM pgtrickle.pgt_stream_tables
WHERE pgt_name = 'regional_sales';
-- pgt_name | needs_reinit
-- -----------------+--------------
-- regional_sales | t
-- After refresh, the detached partition's rows are gone:
SELECT pgtrickle.refresh_stream_table('regional_sales');
SELECT * FROM regional_sales ORDER BY region;
-- (only rows from remaining partitions)
LIST Partitioning
LIST partitioning splits rows by discrete values. It works identically:
CREATE TABLE events (
id SERIAL,
region TEXT NOT NULL,
payload TEXT,
PRIMARY KEY (id, region)
) PARTITION BY LIST (region);
CREATE TABLE events_us PARTITION OF events FOR VALUES IN ('US');
CREATE TABLE events_eu PARTITION OF events FOR VALUES IN ('EU');
CREATE TABLE events_ap PARTITION OF events FOR VALUES IN ('AP');
SELECT pgtrickle.create_stream_table(
name => 'event_counts',
query => 'SELECT region, count(*) AS cnt FROM events GROUP BY region',
schedule => '1 minute'
);
HASH Partitioning
HASH partitioning distributes rows across a fixed number of partitions. Useful for spreading write load evenly:
CREATE TABLE metrics (
id SERIAL PRIMARY KEY,
sensor_id INT NOT NULL,
value DOUBLE PRECISION
) PARTITION BY HASH (id);
CREATE TABLE metrics_0 PARTITION OF metrics
FOR VALUES WITH (MODULUS 4, REMAINDER 0);
CREATE TABLE metrics_1 PARTITION OF metrics
FOR VALUES WITH (MODULUS 4, REMAINDER 1);
CREATE TABLE metrics_2 PARTITION OF metrics
FOR VALUES WITH (MODULUS 4, REMAINDER 2);
CREATE TABLE metrics_3 PARTITION OF metrics
FOR VALUES WITH (MODULUS 4, REMAINDER 3);
SELECT pgtrickle.create_stream_table(
name => 'sensor_avg',
query => $$
SELECT sensor_id, AVG(value) AS avg_val, COUNT(*) AS cnt
FROM metrics GROUP BY sensor_id
$$,
schedule => '1 minute'
);
Foreign Tables
Tables from other databases (via postgres_fdw) can be used as sources,
but with restrictions:
- No trigger-based CDC — foreign tables don't support row-level triggers.
- No WAL-based CDC — foreign tables don't generate local WAL.
- FULL refresh works — SELECT * executes a remote query each time.
- Polling-based CDC works — when pg_trickle.foreign_table_polling is enabled, pg_trickle creates a local snapshot table and detects changes via an EXCEPT ALL comparison.
When you use a foreign table as a source, pg_trickle emits an info message explaining the limitations:
CREATE EXTENSION postgres_fdw;
CREATE SERVER remote_db
FOREIGN DATA WRAPPER postgres_fdw
OPTIONS (host 'remote-host', dbname 'analytics');
CREATE USER MAPPING FOR CURRENT_USER
SERVER remote_db OPTIONS (user 'reader');
CREATE FOREIGN TABLE remote_orders (
id INT,
amount NUMERIC
) SERVER remote_db OPTIONS (table_name 'orders');
-- Only FULL refresh is available:
SELECT pgtrickle.create_stream_table(
name => 'remote_totals',
query => 'SELECT SUM(amount) AS total FROM remote_orders',
schedule => '5 minutes',
refresh_mode => 'FULL'
);
-- INFO: pg_trickle: source table remote_orders is a foreign table.
-- Foreign tables cannot use trigger-based or WAL-based CDC —
-- only FULL refresh mode or polling-based change detection is supported.
Known Caveats
| Caveat | Description |
|---|---|
| PostgreSQL 13+ required | Parent-table triggers only propagate to child partitions on PG 13+. pg_trickle targets PostgreSQL 18, so this is always satisfied. |
| Partition key in PRIMARY KEY | PostgreSQL requires the partition key to be part of any unique constraint. This means your PRIMARY KEY must include the partition column. |
| ATTACH with data = reinitialize | Attaching a partition with pre-existing rows triggers a full reinitialize on the next refresh. For very large tables, this may be slow. Consider gating the source with pgtrickle.gate_source() during bulk partition operations. |
| Sub-partitioning | Multi-level partitioning (partitions of partitions) works in principle because triggers propagate through the entire hierarchy, but it is not extensively tested. |
| pg_partman compatibility | pg_partman dynamically creates and drops partitions. Since pg_trickle detects ATTACH/DETACH via DDL event triggers, it should work, but this combination is not yet tested. |
| Partitioned storage tables | Using a partitioned table as the stream table's storage is not supported. This is tracked for a future release. |
| DETACH PARTITION CONCURRENTLY | DETACH PARTITION ... CONCURRENTLY is a two-phase operation. The DDL event trigger fires after the first phase; the partition is not fully detached until the second phase commits. The stream table may briefly reflect the old partition count. |
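The "Partition key in PRIMARY KEY" caveat above is standard PostgreSQL behavior, easy to observe without pg_trickle involved:

```sql
-- Fails: a unique constraint on a partitioned table must include
-- every column of the partition key.
CREATE TABLE bad_sales (
    id SERIAL PRIMARY KEY,
    sale_date DATE NOT NULL
) PARTITION BY RANGE (sale_date);
-- ERROR:  unique constraint on partitioned table must include all partitioning columns

-- Works: the partition key is folded into a composite primary key.
CREATE TABLE good_sales (
    id SERIAL,
    sale_date DATE NOT NULL,
    PRIMARY KEY (id, sale_date)
) PARTITION BY RANGE (sale_date);
```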
Foreign Table Sources
This tutorial shows how to use a postgres_fdw foreign table as a source for
a stream table. Foreign tables let you aggregate data from remote PostgreSQL
databases into a local stream table that refreshes automatically.
Background
PostgreSQL's Foreign Data Wrapper
(postgres_fdw) lets you define tables that transparently query a remote
database. pg_trickle can use these foreign tables as stream table sources,
but with different change-detection semantics than regular tables.
Key difference: Foreign tables cannot use trigger-based or WAL-based CDC. Changes are detected either by re-scanning the entire remote table (FULL refresh) or by comparing a local snapshot to the remote data (polling-based CDC).
Step 1 — Set Up the Foreign Server
-- Enable the foreign data wrapper extension
CREATE EXTENSION IF NOT EXISTS postgres_fdw;
-- Create a connection to the remote database
CREATE SERVER warehouse_db
FOREIGN DATA WRAPPER postgres_fdw
OPTIONS (host 'warehouse.example.com', dbname 'analytics', port '5432');
-- Map the current user to a remote user
CREATE USER MAPPING FOR CURRENT_USER
SERVER warehouse_db
OPTIONS (user 'readonly_user', password 'secret');
Step 2 — Define the Foreign Table
CREATE FOREIGN TABLE remote_orders (
id INT,
customer_id INT,
amount NUMERIC(12,2),
region TEXT,
created_at TIMESTAMP
) SERVER warehouse_db
OPTIONS (schema_name 'public', table_name 'orders');
Alternatively, import an entire remote schema:
IMPORT FOREIGN SCHEMA public
LIMIT TO (orders, customers)
FROM SERVER warehouse_db
INTO public;
Step 3 — Create a Stream Table with FULL Refresh
The simplest approach uses FULL refresh mode — pg_trickle re-executes the query against the remote table on every refresh cycle:
SELECT pgtrickle.create_stream_table(
name => 'orders_by_region',
query => $$
SELECT
region,
COUNT(*) AS order_count,
SUM(amount) AS total_revenue,
AVG(amount) AS avg_order_value
FROM remote_orders
GROUP BY region
$$,
schedule => '5m',
refresh_mode => 'FULL'
);
pg_trickle will emit an informational message:
INFO: pg_trickle: source table remote_orders is a foreign table.
Foreign tables cannot use trigger-based or WAL-based CDC —
only FULL refresh mode or polling-based change detection is supported.
How FULL refresh works with foreign tables:
- Every 5 minutes, pg_trickle executes the defining query.
- The query is sent to the remote database via postgres_fdw.
- The complete result set replaces the stream table contents.
- This is equivalent to a MATERIALIZED VIEW refresh, but automated.
Step 4 — Polling-Based CDC (Optional)
If the remote table is large and changes are small, FULL refresh becomes expensive because it transfers the entire result set every cycle. Polling-based CDC provides a more efficient alternative:
-- Enable polling globally (or per-session)
SET pg_trickle.foreign_table_polling = on;
-- Now create with DIFFERENTIAL mode — pg_trickle will use polling
SELECT pgtrickle.create_stream_table(
name => 'orders_by_region_polling',
query => $$
SELECT
region,
COUNT(*) AS order_count,
SUM(amount) AS total_revenue,
AVG(amount) AS avg_order_value
FROM remote_orders
GROUP BY region
$$,
schedule => '5m',
refresh_mode => 'DIFFERENTIAL'
);
How polling works:
- On the first refresh, pg_trickle creates a local snapshot table that mirrors the remote table's data.
- On subsequent refreshes, it fetches the current remote data and computes an EXCEPT ALL difference against the snapshot.
- Only the changed rows are written to the change buffer and processed through the incremental delta pipeline.
- The snapshot table is updated to reflect the new remote state.
- When the stream table is dropped, the snapshot table is cleaned up.
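The diff step can be sketched in plain SQL. The snapshot table name below (remote_orders_snapshot) is illustrative — the snapshot table pg_trickle actually creates is internal and its name may differ:

```sql
-- Rows now present remotely but absent from the snapshot → inserts
SELECT * FROM remote_orders
EXCEPT ALL
SELECT * FROM remote_orders_snapshot;

-- Rows in the snapshot that no longer exist remotely → deletes
SELECT * FROM remote_orders_snapshot
EXCEPT ALL
SELECT * FROM remote_orders;
```

EXCEPT ALL (rather than EXCEPT) preserves duplicate multiplicity, so repeated rows are compared correctly; an updated row surfaces as one delete plus one insert.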
Trade-offs:
| Aspect | FULL Refresh | Polling CDC |
|---|---|---|
| Network transfer | Full result set every cycle | Full remote scan, but only diffs applied |
| Local storage | Stream table only | Stream table + snapshot table |
| Best for | Small remote tables | Large remote tables with small change rates |
| GUC required | No | pg_trickle.foreign_table_polling = on |
Step 5 — Verify and Monitor
-- Check stream table status
SELECT * FROM pgtrickle.pgt_status('orders_by_region');
-- Check CDC health (will show foreign table constraints)
SELECT * FROM pgtrickle.check_cdc_health();
-- View refresh history
SELECT * FROM pgtrickle.get_refresh_history('orders_by_region', 5);
-- Monitor staleness
SELECT * FROM pgtrickle.get_staleness('orders_by_region');
Worked Example — Remote Inventory Dashboard
This example aggregates inventory data from a remote warehouse database into a local dashboard table:
-- Remote table definition
CREATE FOREIGN TABLE remote_inventory (
sku TEXT,
warehouse TEXT,
quantity INT,
updated_at TIMESTAMP
) SERVER warehouse_db
OPTIONS (schema_name 'inventory', table_name 'stock_levels');
-- Dashboard: inventory summary by warehouse
SELECT pgtrickle.create_stream_table(
name => 'inventory_dashboard',
query => $$
SELECT
warehouse,
COUNT(DISTINCT sku) AS unique_products,
SUM(quantity) AS total_units,
MIN(updated_at) AS oldest_update,
MAX(updated_at) AS newest_update
FROM remote_inventory
GROUP BY warehouse
$$,
schedule => '10m',
refresh_mode => 'FULL'
);
After the first refresh:
SELECT * FROM inventory_dashboard;
warehouse | unique_products | total_units | oldest_update | newest_update
-----------+-----------------+-------------+---------------------+---------------------
east | 142 | 23500 | 2026-03-14 08:00:00 | 2026-03-14 09:15:00
west | 98 | 15200 | 2026-03-14 07:30:00 | 2026-03-14 09:10:00
central | 215 | 41000 | 2026-03-14 06:00:00 | 2026-03-14 09:20:00
Constraints and Caveats
| Constraint | Details |
|---|---|
| No trigger CDC | Foreign tables don't support PostgreSQL row-level triggers. |
| No WAL CDC | Foreign tables don't generate local WAL entries. |
| Network latency | Each refresh cycle queries the remote database. Schedule accordingly. |
| Remote availability | If the remote database is down, the refresh will fail (logged in pgt_refresh_history). The stream table retains its last successful data. |
| Authentication | CREATE USER MAPPING credentials must remain valid. Use .pgpass or environment variables in production. |
| Snapshot storage | Polling CDC creates a snapshot table sized proportionally to the remote table. Monitor disk usage. |
FAQ
Q: Why does my foreign table stream table only work in FULL mode?
Foreign tables cannot install row-level triggers (the mechanism pg_trickle uses
for trigger-based CDC) and don't generate local WAL records (used by WAL-based
CDC). FULL refresh works because it simply re-executes the remote query.
Enable pg_trickle.foreign_table_polling if you need differential-style
change detection.
Q: Can I mix foreign and local tables in the same defining query?
Yes. If your query joins a foreign table with a local table, pg_trickle uses trigger/WAL CDC for the local table and FULL-rescan or polling for the foreign table. The refresh mode must be FULL unless polling is enabled for the foreign table sources.
Q: What happens if the remote database is temporarily unavailable?
The refresh attempt fails, is logged in pgt_refresh_history with status
FAILED, and the consecutive_errors counter increments. The stream table
retains its last successful data. When the remote database recovers, the next
scheduled refresh succeeds and the error counter resets.
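A monitoring query along these lines can surface tables that are currently failing. The consecutive_errors column name is taken from the answer above — verify the exact column in your pgtrickle.pgt_stream_tables catalog:

```sql
-- Sketch: list stream tables with at least one recent failed refresh
SELECT pgt_name, consecutive_errors
FROM pgtrickle.pgt_stream_tables
WHERE consecutive_errors > 0
ORDER BY consecutive_errors DESC;
```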
Tutorial: Tiered Scheduling
Tiered scheduling (v0.12.0+) lets you assign refresh priorities to stream tables using four tiers: Hot, Warm, Cold, and Frozen. This reduces CPU and I/O overhead by refreshing less-critical tables less frequently.
When to Use It
- You have many stream tables (50+) and want to reduce scheduler load
- Some tables power real-time dashboards (need hot refresh) while others serve weekly reports (can be cold)
- You want to freeze tables during maintenance windows without dropping them
Tier Overview
| Tier | Multiplier | Effect |
|---|---|---|
| hot | 1× | Refresh at the configured schedule (default) |
| warm | 2× | Refresh at 2× the configured interval |
| cold | 10× | Refresh at 10× the configured interval |
| frozen | skip | Never refreshed until manually promoted |
For a stream table with schedule => '1m':
| Tier | Effective Interval |
|---|---|
| hot | 1 minute |
| warm | 2 minutes |
| cold | 10 minutes |
| frozen | never |
Note: Cron-based schedules are not affected by the tier multiplier. They always fire at the configured cron time.
Step-by-Step Example
1. Enable tiered scheduling
Tiered scheduling is enabled by default since v0.12.0. Verify:
SHOW pg_trickle.tiered_scheduling;
-- Should return: on
2. Create stream tables with different priorities
-- Real-time dashboard — stays hot (default)
SELECT pgtrickle.create_stream_table(
name => 'live_order_count',
query => 'SELECT COUNT(*) AS total FROM orders WHERE status = ''active''',
schedule => '30s'
);
-- Important but not latency-critical
SELECT pgtrickle.create_stream_table(
name => 'daily_revenue',
query => 'SELECT DATE_TRUNC(''day'', created_at) AS day, SUM(amount) AS revenue
FROM orders GROUP BY 1',
schedule => '1m'
);
-- Weekly report — rarely queried
SELECT pgtrickle.create_stream_table(
name => 'customer_lifetime_value',
query => 'SELECT customer_id, SUM(amount) AS lifetime_value
FROM orders GROUP BY customer_id',
schedule => '5m'
);
3. Assign tiers
-- live_order_count stays at 'hot' (default) — refreshes every 30s
-- daily_revenue: 2× multiplier → effective interval = 2 minutes
SELECT pgtrickle.alter_stream_table('daily_revenue', tier => 'warm');
-- customer_lifetime_value: 10× multiplier → effective interval = 50 minutes
SELECT pgtrickle.alter_stream_table('customer_lifetime_value', tier => 'cold');
4. Verify effective schedules
SELECT pgt_name, schedule, refresh_tier,
CASE refresh_tier
WHEN 'hot' THEN schedule
WHEN 'warm' THEN schedule || ' ×2'
WHEN 'cold' THEN schedule || ' ×10'
WHEN 'frozen' THEN 'never'
END AS effective
FROM pgtrickle.pgt_stream_tables
ORDER BY refresh_tier;
5. Freeze a table during maintenance
-- Freeze before a schema migration
SELECT pgtrickle.alter_stream_table('customer_lifetime_value', tier => 'frozen');
-- ... perform migration ...
-- Promote back when ready
SELECT pgtrickle.alter_stream_table('customer_lifetime_value', tier => 'warm');
Choosing the Right Tier
| Use Case | Recommended Tier |
|---|---|
| Real-time dashboards, alerting tables | hot |
| Operational reports queried hourly | warm |
| Weekly/monthly analytics, batch consumers | cold |
| Tables under maintenance, seasonal reports | frozen |
Rules of thumb:
- Start with everything at hot (the default). Move tables to warm or cold as you identify which ones can tolerate more staleness.
- Warm halves the refresh CPU cost compared to hot.
- Cold reduces refresh overhead by 90%.
- Use frozen sparingly — changes accumulate in the buffer and will be processed when you promote the table back.
Monitoring Tiers
-- Check which tables are in which tier
SELECT pgt_name, refresh_tier, status, staleness
FROM pgtrickle.stream_tables_info
ORDER BY refresh_tier, staleness DESC;
-- Find frozen tables (these are NOT being refreshed)
SELECT pgt_name, refresh_tier
FROM pgtrickle.pgt_stream_tables
WHERE refresh_tier = 'frozen';
Troubleshooting
All tables are frozen and nothing is refreshing:
If every stream table is set to frozen, the scheduler has nothing to do.
Promote at least one table back to hot or warm.
Staleness exceeds expectations for cold tables:
Remember that cold applies a 10× multiplier. A 5-minute schedule becomes
a 50-minute effective interval. If this is too stale, use warm instead.
Tutorial: Fuse Circuit Breaker
The fuse circuit breaker (v0.11.0+) suspends differential refreshes when the incoming change volume exceeds a threshold. This protects your database from runaway refresh cycles during bulk data loads, accidental mass-deletes, or migration scripts.
When to Use It
- Bulk ETL loads — loading millions of rows that would overwhelm a differential refresh
- Data migration scripts — large schema or data changes that temporarily spike the change buffer
- Protection against accidents — an errant
DELETE FROM ordersshouldn't silently cascade through all downstream stream tables
How It Works
Normal operation Fuse blows After reset
───────────────── ───────────────── ─────────────────
Source DML ──▶ CDC ──▶ Refresh Source DML ──▶ CDC ──▶ BLOCKED Source DML ──▶ CDC ──▶ Refresh
│ (resumed)
▼
NOTIFY alert
(fuse_blown)
- Each refresh cycle, the scheduler counts pending changes in the buffer.
- If the count exceeds fuse_ceiling for fuse_sensitivity consecutive cycles, the fuse blows.
- The stream table enters a paused state — no refreshes occur.
- A fuse_blown alert is emitted via NOTIFY pg_trickle_alert.
- An operator investigates and calls reset_fuse() to resume.
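The alert in step 4 can be consumed from any ordinary session with LISTEN; the notification payload shown is illustrative, not a documented format:

```sql
-- In psql or an application connection:
LISTEN pg_trickle_alert;

-- The session now receives an asynchronous notification whenever a fuse
-- blows, e.g. (payload shape illustrative):
--   Asynchronous notification "pg_trickle_alert" received
```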
Step-by-Step Example
1. Create a stream table with fuse protection
SELECT pgtrickle.create_stream_table(
name => 'category_summary',
query => 'SELECT category, COUNT(*) AS cnt, SUM(price) AS total
FROM products GROUP BY category',
schedule => '1m',
refresh_mode => 'DIFFERENTIAL'
);
-- Arm the fuse: blow when pending changes exceed 50,000 rows
SELECT pgtrickle.alter_stream_table(
'category_summary',
fuse => 'on',
fuse_ceiling => 50000,
fuse_sensitivity => 3 -- require 3 consecutive over-ceiling cycles
);
2. Observe normal operation
-- Insert a small batch — well under the ceiling
INSERT INTO products (name, category, price)
SELECT 'Product ' || i, 'Electronics', 9.99
FROM generate_series(1, 100) i;
-- After the next refresh cycle, the stream table is updated normally
SELECT * FROM pgtrickle.category_summary;
3. Trigger a bulk load
-- Simulate a large ETL load — 100,000 rows
INSERT INTO products (name, category, price)
SELECT 'Bulk ' || i, 'Imported', 4.99
FROM generate_series(1, 100000) i;
After fuse_sensitivity scheduler cycles (3 in our example), the fuse
blows. The stream table stops refreshing.
4. Inspect the fuse state
SELECT name, fuse_mode, fuse_state, fuse_ceiling, blown_at, blow_reason
FROM pgtrickle.fuse_status();
name | fuse_mode | fuse_state | fuse_ceiling | blown_at | blow_reason
-------------------+-----------+------------+--------------+----------------------------+---------------------------
category_summary | on | blown | 50000 | 2026-03-31 14:22:01.123+00 | change_count_exceeded
5. Decide how to recover
You have three options:
-- Option A: Apply the changes (process the bulk load normally)
SELECT pgtrickle.reset_fuse('category_summary', action => 'apply');
-- Option B: Skip the changes (discard the batch, resume from current state)
SELECT pgtrickle.reset_fuse('category_summary', action => 'skip_changes');
-- Option C: Reinitialize (full rebuild from the defining query)
SELECT pgtrickle.reset_fuse('category_summary', action => 'reinitialize');
After resetting, the fuse returns to 'armed' state and the scheduler
resumes.
Fuse Modes
| Mode | Behavior |
|---|---|
| 'off' | No fuse protection (default) |
| 'on' | Always armed — blows when changes exceed fuse_ceiling |
| 'auto' | Blows only when a FULL refresh would be cheaper than DIFFERENTIAL |
'auto' mode is recommended for most use cases — it protects against
bulk loads while allowing large-but-efficient differential refreshes to
proceed.
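The 'auto' decision can be sketched as a cost comparison. This is a simplified illustration, not the extension's real cost model; the per-row timing input is an assumption for the example:

```python
def auto_fuse_should_blow(pending_changes: int,
                          avg_diff_ms_per_row: float,
                          avg_full_ms: float) -> bool:
    """Blow only when the estimated differential refresh would cost more
    than simply rebuilding the table with a FULL refresh."""
    estimated_diff_ms = pending_changes * avg_diff_ms_per_row
    return estimated_diff_ms > avg_full_ms
```

A 100,000-row batch at 0.05 ms/row (~5,000 ms of delta work) against an 850 ms full refresh would blow the fuse; a 100-row batch would not.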
Using with dbt
In dbt models, configure the fuse via the stream_table materialization:
-- models/marts/category_summary.sql
{{ config(
materialized='stream_table',
schedule='5m',
refresh_mode='DIFFERENTIAL',
fuse='auto',
fuse_ceiling=50000,
fuse_sensitivity=3
) }}
SELECT category, COUNT(*) AS cnt, SUM(price) AS total
FROM {{ source('raw', 'products') }}
GROUP BY category
Global Defaults
Set a cluster-wide default ceiling via the pg_trickle.fuse_default_ceiling
GUC. Stream tables with fuse_ceiling = NULL inherit this value:
ALTER SYSTEM SET pg_trickle.fuse_default_ceiling = 100000;
SELECT pg_reload_conf();
Monitoring
- pgtrickle.fuse_status() — inspect fuse state for all stream tables
- LISTEN pg_trickle_alert — receive real-time fuse_blown notifications
- pgtrickle.dedup_stats() — includes fuse-related counters
- pgtrickle.pgt_stream_tables.fuse_state — direct catalog query
Further Reading
- SQL Reference — fuse_status()
- SQL Reference — reset_fuse()
- Configuration — fuse_default_ceiling
- Tutorial: ETL & Bulk Load Patterns
Tutorial: Circular Dependencies
pg_trickle supports circular (cyclic) stream table dependencies (v0.7.0+) for queries that use only monotone operators. The scheduler groups circular dependencies into Strongly Connected Components (SCCs) and iterates them to a fixed point.
When to Use It
- Transitive closure — computing all reachable nodes in a graph
- Graph reachability — finding all paths between nodes
- Iterative convergence — mutual dependencies that stabilize after a few iterations
Prerequisites
Circular dependencies are disabled by default. Enable them:
SET pg_trickle.allow_circular = true;
Monotone Operator Requirement
Only monotone operators are allowed in circular dependency chains. Monotone operators guarantee convergence — the result set grows (or stays the same) with each iteration until a fixed point is reached.
| Allowed (Monotone) | Blocked (Non-Monotone) |
|---|---|
| Joins (INNER, LEFT, RIGHT, FULL) | Aggregates (SUM, COUNT, etc.) |
| Filters (WHERE) | EXCEPT |
| Projections (SELECT) | Window functions |
| UNION ALL | NOT EXISTS / NOT IN |
| INTERSECT | |
| EXISTS | |
Creating a circular dependency with non-monotone operators is rejected
with a clear error message, regardless of the allow_circular setting.
Step-by-Step Example: Transitive Closure
Suppose you have a graph of relationships:
CREATE TABLE edges (src INT, dst INT);
INSERT INTO edges VALUES
(1, 2), (2, 3), (3, 4), (4, 5),
(1, 3), (2, 5);
1. Create the base reachability table
-- Direct edges: all nodes directly connected
SELECT pgtrickle.create_stream_table(
name => 'reachable_direct',
query => 'SELECT src, dst FROM edges',
schedule => '1m',
refresh_mode => 'DIFFERENTIAL'
);
2. Create the transitive closure with a self-reference
-- Transitive closure: if A→B and B→C, then A→C
-- This creates a circular dependency (reachable depends on itself via the join)
SELECT pgtrickle.create_stream_table(
name => 'reachable',
query => 'SELECT DISTINCT r1.src, r2.dst
FROM pgtrickle.reachable_direct r1
JOIN pgtrickle.reachable_direct r2 ON r1.dst = r2.src
UNION ALL
SELECT src, dst FROM edges',
schedule => '1m',
refresh_mode => 'DIFFERENTIAL'
);
Note: This example uses the reachable_direct table for the join rather than self-referencing reachable directly. For a true self-referencing cycle, pg_trickle detects the SCC and iterates.
3. Observe the fixed-point iteration
When the scheduler processes an SCC, it iterates until no new rows are produced (the fixed point):
-- Check SCC status
SELECT * FROM pgtrickle.pgt_scc_status();
Output:
scc_id | members | iteration | converged
--------+----------------------------------+-----------+-----------
1 | {reachable_direct,reachable} | 3 | true
4. Add new edges and watch convergence
INSERT INTO edges VALUES (5, 1); -- creates a cycle in the graph
On the next refresh cycle, the scheduler re-iterates the SCC until the transitive closure stabilizes with the new edge.
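The fixed-point loop the scheduler runs for an SCC can be sketched in plain Python as semi-naive iteration: only newly derived rows are re-joined each pass, and the loop stops when a pass produces nothing new. This is an illustrative model, not the extension's code; max_iterations mirrors the role of pg_trickle.max_fixpoint_iterations:

```python
def transitive_closure(edges, max_iterations=100):
    """Semi-naive fixed-point: edges is a set of (src, dst) pairs."""
    reachable = set(edges)   # iteration 0: the direct edges
    frontier = set(edges)    # only new facts are joined on later passes
    iterations = 0
    while frontier and iterations < max_iterations:
        iterations += 1
        # join the new facts against everything known so far, in both directions
        new = {(a, d) for (a, b) in frontier for (c, d) in reachable if b == c}
        new |= {(a, d) for (a, b) in reachable for (c, d) in frontier if b == c}
        frontier = new - reachable   # fixed point when nothing new appears
        reachable |= frontier
    return reachable, iterations

# The edges from the example above
edges = {(1, 2), (2, 3), (3, 4), (4, 5), (1, 3), (2, 5)}
closure, n = transitive_closure(edges)
```

Adding the (5, 1) edge from step 4 turns the graph into a single cycle, so every node reaches every node and the loop still converges, just after more passes.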
Monitoring SCCs
-- View all SCCs and their convergence status
SELECT * FROM pgtrickle.pgt_scc_status();
-- Check which stream tables belong to which SCC
SELECT pgt_name, scc_id
FROM pgtrickle.pgt_stream_tables
WHERE scc_id IS NOT NULL;
Controlling Iteration Limits
The pg_trickle.max_fixpoint_iterations GUC limits how many iterations
the scheduler attempts before declaring non-convergence:
-- Default: 100 (generous headroom)
SHOW pg_trickle.max_fixpoint_iterations;
-- Lower it for fast-converging workloads
SET pg_trickle.max_fixpoint_iterations = 20;
If convergence is not reached within the limit, all SCC members are marked
as ERROR. This prevents runaway infinite loops.
Limitations
- Non-monotone operators are always rejected — aggregates, EXCEPT, window functions, and NOT EXISTS/NOT IN cannot appear in circular chains because they prevent convergence.
- Performance scales with iteration count — each iteration runs a full differential refresh cycle for all SCC members. Keep cycles small.
- All SCC members must use DIFFERENTIAL mode — FULL and IMMEDIATE modes are not supported for circular dependencies.
Further Reading
- Configuration — pg_trickle.allow_circular
- Configuration — pg_trickle.max_fixpoint_iterations
- SQL Reference — pgt_scc_status()
- FAQ — Cycle Detection
Tutorial: Tuning Refresh Mode
This tutorial walks you through using pg_trickle's built-in diagnostics to determine whether your stream tables are running in the most efficient refresh mode (FULL vs DIFFERENTIAL), and how to act on the recommendations.
Prerequisites
- pg_trickle v0.14.0 or later
- At least one stream table with several completed refresh cycles (the diagnostics become more accurate with more history)
Step 1: Check Current Refresh Efficiency
Start by reviewing how your stream tables are performing with their current refresh mode:
SELECT pgt_name, refresh_mode, diff_count, full_count,
avg_diff_ms, avg_full_ms, diff_speedup
FROM pgtrickle.refresh_efficiency();
Example output:
| pgt_name | refresh_mode | diff_count | full_count | avg_diff_ms | avg_full_ms | diff_speedup |
|---|---|---|---|---|---|---|
| order_totals | DIFFERENTIAL | 142 | 3 | 12.4 | 850.2 | 68.6x |
| user_stats | FULL | 0 | 145 | — | 320.1 | — |
| daily_metrics | DIFFERENTIAL | 98 | 47 | 425.8 | 410.3 | 1.0x |
Key observations:
- order_totals: DIFFERENTIAL is 68× faster — this is a great fit.
- user_stats: Running in FULL mode with no DIFFERENTIAL history — worth checking if DIFFERENTIAL would be faster.
- daily_metrics: DIFFERENTIAL and FULL take about the same time (1.0× speedup). FULL might actually be simpler and more predictable here.
Step 2: Get Recommendations
Use recommend_refresh_mode() to get recommendations computed from weighted signals:
SELECT pgt_name, current_mode, recommended_mode, confidence, reason
FROM pgtrickle.recommend_refresh_mode();
Example output:
| pgt_name | current_mode | recommended_mode | confidence | reason |
|---|---|---|---|---|
| order_totals | DIFFERENTIAL | KEEP | high | DIFFERENTIAL is 68.6× faster than FULL with low latency variance |
| user_stats | FULL | DIFFERENTIAL | medium | Query is simple (no complex joins), change ratio is low (2.1%), target table is large |
| daily_metrics | DIFFERENTIAL | FULL | medium | DIFFERENTIAL shows no speedup over FULL (1.0×); high latency variance (p95/p50 = 4.2) suggests unstable performance |
For a single table with full signal details:
SELECT recommended_mode, confidence, reason,
jsonb_pretty(signals) AS signal_details
FROM pgtrickle.recommend_refresh_mode('daily_metrics');
Step 3: Understand the Signals
The signals JSONB column contains the detailed breakdown of all seven
weighted signals that contributed to the recommendation:
{
"composite_score": -0.22,
"signals": [
{ "name": "change_ratio_avg", "score": -0.1, "weight": 0.30 },
{ "name": "empirical_timing", "score": -0.3, "weight": 0.35 },
{ "name": "change_ratio_current", "score": -0.2, "weight": 0.25 },
{ "name": "query_complexity", "score": 0.0, "weight": 0.10 },
{ "name": "target_size", "score": 0.1, "weight": 0.10 },
{ "name": "index_coverage", "score": 0.0, "weight": 0.05 },
{ "name": "latency_variance", "score": -0.4, "weight": 0.05 }
]
}
Positive scores favour DIFFERENTIAL; negative scores favour FULL. A composite score above +0.15 recommends DIFFERENTIAL; below −0.15 recommends FULL; in between, the current mode is near-optimal (KEEP).
Why ±0.15? The thresholds create a dead zone between −0.15 and +0.15
where the engine considers the two modes equivalent. Without this dead zone,
small fluctuations in the change_ratio signal would cause the engine to
oscillate between FULL and DIFFERENTIAL every few cycles — burning scheduling
overhead with no net benefit. The +0.15 threshold means DIFFERENTIAL needs
a clear edge (roughly a 15% advantage in combined signal weight) before the
engine switches away from FULL, and vice versa.
You can widen or narrow the dead zone:
-- Wider dead zone (less switching) — good for stable, predictable workloads
ALTER SYSTEM SET pg_trickle.cost_model_safety_margin = 0.25;
SELECT pg_reload_conf();
-- Narrower dead zone (faster mode switching) — good for highly variable workloads
ALTER SYSTEM SET pg_trickle.cost_model_safety_margin = 0.05;
SELECT pg_reload_conf();
The default is 0.15. If you see frequent mode oscillation in
pgtrickle.pgt_refresh_history, increase the margin.
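The composite-score decision with its dead zone can be sketched as follows. The function is an illustration of the rule described above, not the extension's internal code; the signal dict shape mirrors the example JSON:

```python
def recommend(signals, margin=0.15):
    """signals: list of {'score': float, 'weight': float} dicts.
    Positive scores favour DIFFERENTIAL, negative favour FULL."""
    composite = sum(s["score"] * s["weight"] for s in signals)
    if composite > margin:
        return "DIFFERENTIAL", composite
    if composite < -margin:
        return "FULL", composite
    return "KEEP", composite  # dead zone: the current mode is near-optimal
```

With the default margin of 0.15, a composite of -0.22 recommends FULL; widening the margin to 0.25 (as in the "wider dead zone" setting above) turns the same score into KEEP.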
Confidence levels:
| Level | Meaning |
|---|---|
| high | 10+ completed refresh cycles; strong signal agreement |
| medium | 5–10 cycles or mixed signals |
| low | Fewer than 5 cycles; recommendation is speculative |
Step 4: Apply the Recommendation
If you decide to follow a recommendation, use ALTER STREAM TABLE:
-- Switch daily_metrics from DIFFERENTIAL to FULL
SELECT pgtrickle.alter_stream_table('daily_metrics',
refresh_mode => 'FULL'
);
Or switch a table to DIFFERENTIAL:
-- Switch user_stats to DIFFERENTIAL mode
SELECT pgtrickle.alter_stream_table('user_stats',
refresh_mode => 'DIFFERENTIAL'
);
The change takes effect on the next refresh cycle. No data is lost during the transition.
Step 5: Monitor After the Change
After switching modes, wait for several refresh cycles and re-check:
-- Wait a few minutes, then re-check efficiency
SELECT pgt_name, refresh_mode, diff_count, full_count,
avg_diff_ms, avg_full_ms, diff_speedup
FROM pgtrickle.refresh_efficiency()
WHERE pgt_name = 'daily_metrics';
Run the recommendation function again to verify the change was beneficial:
SELECT recommended_mode, confidence, reason
FROM pgtrickle.recommend_refresh_mode('daily_metrics');
If the recommendation now says KEEP, the new mode is working well.
Common Scenarios
High-cardinality aggregates
Stream tables with SUM/COUNT/AVG over high-cardinality GROUP BY keys
(1000+ groups) are almost always better in DIFFERENTIAL mode. pg_trickle
warns about low-cardinality groups at creation time (DIAG-2).
Small tables with frequent full rewrites
If the source table is small (< 10,000 rows) and changes affect > 30% of rows per cycle, FULL refresh is often faster because it avoids the overhead of change tracking and delta application.
Complex multi-join queries
Queries with 4+ JOINs may have high DIFFERENTIAL overhead due to the
delta propagation rules. If diff_speedup is below 2×, consider FULL mode.
Tables with volatile functions
Stream tables using volatile functions (e.g., now(), random()) must use
FULL mode. pg_trickle rejects volatile functions in DIFFERENTIAL mode at
creation time.
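The scenarios above amount to a few rules of thumb. A hypothetical helper encoding them might look like this (thresholds are taken from the text; the function itself is illustrative and not part of pg_trickle):

```python
def suggest_mode(table_rows, change_ratio, join_count, diff_speedup,
                 uses_volatile_functions=False):
    """Rule-of-thumb refresh-mode suggestion from the scenarios above."""
    if uses_volatile_functions:
        return "FULL"   # volatile functions require FULL mode
    if table_rows < 10_000 and change_ratio > 0.30:
        return "FULL"   # small table, heavy churn: skip change-tracking overhead
    if join_count >= 4 and diff_speedup is not None and diff_speedup < 2.0:
        return "FULL"   # delta propagation overhead dominates
    return "DIFFERENTIAL"
```

In practice, prefer the built-in recommend_refresh_mode(), which weighs these signals against measured timings.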
See Also
- SQL Reference: recommend_refresh_mode() — Full function documentation
- SQL Reference: refresh_efficiency() — Efficiency metrics documentation
- Configuration: agg_diff_cardinality_threshold — Cardinality warning threshold
- DVM Operators — Full operator support matrix
Tutorial: Monitoring & Alerting
This guide consolidates all pg_trickle monitoring capabilities into a single reference: built-in SQL views, NOTIFY-based alerts, and the Prometheus/Grafana observability stack.
Quick Health Check
The fastest way to verify pg_trickle is healthy:
SELECT * FROM pgtrickle.health_check() WHERE severity != 'OK';
If this returns no rows, everything is working. Any WARN or ERROR rows
tell you where to investigate.
Built-in Monitoring Views
Stream table status
-- Overview: name, status, mode, staleness
SELECT name, status, refresh_mode, staleness, stale
FROM pgtrickle.stream_tables_info;
-- Detailed stats: refresh counts, duration, error streaks
SELECT pgt_name, total_refreshes, avg_duration_ms, consecutive_errors, stale
FROM pgtrickle.pg_stat_stream_tables;
-- Live status with error counts
SELECT * FROM pgtrickle.pgt_status();
Refresh history
-- Last 10 refreshes for a specific stream table
SELECT start_time, action, status, duration_ms, rows_inserted, rows_deleted, error_message
FROM pgtrickle.get_refresh_history('order_totals', 10);
-- Global refresh timeline (last 20 events across all stream tables)
SELECT start_time, stream_table, action, status, duration_ms, error_message
FROM pgtrickle.refresh_timeline(20);
-- Aggregate refresh statistics
SELECT * FROM pgtrickle.st_refresh_stats();
CDC pipeline health
-- Per-source CDC mode, WAL lag, and alerts
SELECT * FROM pgtrickle.check_cdc_health();
-- Change buffer sizes (pending changes not yet consumed)
SELECT stream_table, source_table, cdc_mode, pending_rows, buffer_bytes
FROM pgtrickle.change_buffer_sizes()
ORDER BY pending_rows DESC;
-- Verify all CDC triggers are installed and enabled
SELECT source_table, trigger_type, trigger_name
FROM pgtrickle.trigger_inventory()
WHERE NOT present OR NOT enabled;
Dependencies
-- ASCII tree view of the entire dependency graph
SELECT tree_line, status, refresh_mode
FROM pgtrickle.dependency_tree();
-- Diamond consistency groups
SELECT * FROM pgtrickle.diamond_groups();
Fuse circuit breaker
-- Check fuse state for all stream tables
SELECT name, fuse_mode, fuse_state, fuse_ceiling, blown_at
FROM pgtrickle.fuse_status();
Parallel workers
-- Worker pool status (when parallel_refresh_mode = 'on')
SELECT * FROM pgtrickle.worker_pool_status();
-- Recent parallel job history
SELECT job_id, unit_key, status, duration_ms
FROM pgtrickle.parallel_job_status(60);
NOTIFY-Based Alerting
pg_trickle emits real-time events via PostgreSQL's NOTIFY system:
LISTEN pg_trickle_alert;
Event Types
| Event | Trigger | Severity |
|---|---|---|
| stale_data | Source tables have changed but the scheduler is behind — view is genuinely out of date | Warning |
| no_upstream_changes | Scheduler is healthy but source tables have had no writes — view is correct | Info |
| auto_suspended | Stream table suspended after max consecutive errors | Critical |
| resumed | Stream table resumed after suspension | Info |
| reinitialize_needed | Upstream DDL change detected | Warning |
| buffer_growth_warning | Change buffer growing unexpectedly | Warning |
| slot_lag_warning | WAL replication slot retaining excessive data | Warning |
| fuse_blown | Circuit breaker tripped | Warning |
| refresh_completed | Refresh completed successfully | Info |
| refresh_failed | Refresh failed | Error |
| diamond_partial_failure | One member of an atomic diamond group failed | Warning |
| scheduler_falling_behind | Refresh duration approaching the schedule interval | Warning |
| spill_threshold_exceeded | Delta MERGE spilled to temp files for consecutive refreshes, forcing FULL | Warning |
Notification Payload
Each notification carries a JSON payload:
{
"event": "auto_suspended",
"stream_table": "order_totals",
"consecutive_errors": 3,
"last_error": "column \"deleted_column\" does not exist",
"timestamp": "2026-03-31T14:22:01.123Z"
}
Bridging to External Systems
To forward NOTIFY events to external alerting systems (PagerDuty, Slack, OpsGenie), use a listener process:
# Example: Python listener using psycopg
import psycopg
import json
conn = psycopg.connect("postgresql://user:pass@host/db", autocommit=True)
conn.execute("LISTEN pg_trickle_alert")
for notify in conn.notifies():
payload = json.loads(notify.payload)
event = payload["event"]
# no_upstream_changes is informational — source tables are quiet but healthy.
# Only page on actionable events.
if event in ("auto_suspended", "fuse_blown", "refresh_failed"):
send_to_pagerduty(payload)
elif event == "stale_data": # scheduler itself is falling behind
send_to_pagerduty(payload)
Prometheus & Grafana Stack
For production deployments, use the pre-built observability stack in the
monitoring/ directory:
cd monitoring/
docker compose up -d
This gives you:
- Prometheus scraping pg_trickle metrics via postgres_exporter
- Grafana with a pre-provisioned dashboard
- Alerting rules for staleness, errors, CDC lag, and scheduler health
See Prometheus & Grafana Integration for full setup details.
Diagnostic Workflow
When something is wrong, follow this systematic workflow:
Step 1 — Global health
SELECT * FROM pgtrickle.health_check() WHERE severity != 'OK';
Step 2 — Status and staleness
SELECT name, status, consecutive_errors, staleness
FROM pgtrickle.pgt_status()
ORDER BY staleness DESC NULLS FIRST;
Step 3 — Recent refresh activity
SELECT start_time, stream_table, action, status, error_message
FROM pgtrickle.refresh_timeline(20);
Step 4 — Error details for a specific stream table
SELECT * FROM pgtrickle.diagnose_errors('my_stream_table');
Step 5 — CDC pipeline
SELECT stream_table, source_table, pending_rows, buffer_bytes
FROM pgtrickle.change_buffer_sizes()
ORDER BY pending_rows DESC;
Step 6 — Trigger verification
SELECT source_table, trigger_type, trigger_name
FROM pgtrickle.trigger_inventory()
WHERE NOT present OR NOT enabled;
Common Alert Responses
| Alert | Likely Cause | Action |
|---|---|---|
| stale_data | Scheduler behind, long refresh, or lock contention | Check pgt_status() and refresh_timeline() |
| auto_suspended | Repeated refresh failures | Fix root cause, then resume_stream_table() |
| fuse_blown | Bulk load exceeded fuse ceiling | Investigate, then reset_fuse() |
| buffer_growth_warning | Scheduler not consuming buffers fast enough | Check scheduler status and refresh errors |
| reinitialize_needed | Source table DDL changed | Verify schema compatibility; scheduler handles automatically |
Further Reading
- Prometheus & Grafana Integration
- SQL Reference — Monitoring Functions
- Configuration Reference
- FAQ — Troubleshooting
Tutorial: ETL & Bulk Load Patterns
pg_trickle provides source gating (v0.5.0+) and watermark gating (v0.7.0+) to coordinate stream table refreshes with ETL pipelines and bulk data loads. This tutorial covers common patterns for pausing refreshes during loads and resuming them safely afterward.
The Problem
When you bulk-load data into a source table (e.g., a nightly ETL job), the change buffer fills rapidly. Without coordination:
- A differential refresh mid-load sees a partial batch, producing incomplete results
- The adaptive fallback may trigger repeated FULL refreshes during the load
- The fuse circuit breaker may blow, requiring manual intervention
Source gating solves this by telling pg_trickle to skip refreshes for gated sources until the load completes.
Recipe 1 — Single Source Bulk Load
The simplest pattern: gate the source, load data, ungate.
-- 1. Gate the source table — all dependent stream tables pause
SELECT pgtrickle.gate_source('public.orders');
-- 2. Perform the bulk load
COPY orders FROM '/data/orders_20260331.csv' WITH (FORMAT csv, HEADER);
-- or: INSERT INTO orders SELECT ... FROM staging_orders;
-- 3. Ungate — stream tables resume and process the full batch
SELECT pgtrickle.ungate_source('public.orders');
While gated, the scheduler skips all stream tables that depend on the gated source. Changes still accumulate in the CDC buffer and are processed in a single batch after ungating.
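From application code, the gate/load/ungate sequence is naturally expressed as a context manager so the sources are ungated even when the load fails. A hedged sketch using the gate_source()/ungate_source() API shown above (a psycopg-style connection with an execute() method is assumed):

```python
from contextlib import contextmanager

@contextmanager
def gated_sources(conn, sources):
    """Gate every source for the duration of a bulk load, then ungate."""
    for src in sources:
        conn.execute("SELECT pgtrickle.gate_source(%s)", (src,))
    try:
        yield
    finally:
        # Ungate even on failure: the scheduler then drains whatever
        # accumulated in the change buffers.
        for src in sources:
            conn.execute("SELECT pgtrickle.ungate_source(%s)", (src,))

# Usage sketch:
# with gated_sources(conn, ["public.orders"]):
#     run_bulk_load(conn)
```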
Recipe 2 — Coordinated Multi-Source Load
When your ETL loads multiple tables that feed into the same stream table:
-- Gate all sources involved in the load
SELECT pgtrickle.gate_source('public.orders');
SELECT pgtrickle.gate_source('public.customers');
SELECT pgtrickle.gate_source('public.products');
-- Load all tables
COPY orders FROM '/data/orders.csv' WITH (FORMAT csv, HEADER);
COPY customers FROM '/data/customers.csv' WITH (FORMAT csv, HEADER);
COPY products FROM '/data/products.csv' WITH (FORMAT csv, HEADER);
-- Ungate all at once — stream tables see a consistent snapshot
SELECT pgtrickle.ungate_source('public.orders');
SELECT pgtrickle.ungate_source('public.customers');
SELECT pgtrickle.ungate_source('public.products');
Recipe 3 — Gate + Deferred Stream Table Creation
For initial deployments where data must be loaded before stream tables are created:
-- 1. Gate the source before any stream tables exist
SELECT pgtrickle.gate_source('public.orders');
-- 2. Load the initial data
COPY orders FROM '/data/historical_orders.csv' WITH (FORMAT csv, HEADER);
-- 3. Create stream tables — they won't refresh yet (source is gated)
SELECT pgtrickle.create_stream_table(
name => 'order_totals',
query => 'SELECT customer_id, SUM(amount) AS total FROM orders GROUP BY customer_id',
schedule => '1m'
);
-- 4. Ungate — the first refresh processes all data cleanly
SELECT pgtrickle.ungate_source('public.orders');
Recipe 4 — Nightly Batch Pattern
A common production pattern using a scheduled batch job:
-- Run nightly at 02:00 UTC
-- Step 1: Gate all ETL sources
DO $$
DECLARE
src TEXT;
BEGIN
FOR src IN SELECT DISTINCT source_table
FROM pgtrickle.list_sources('daily_report')
LOOP
PERFORM pgtrickle.gate_source(src);
END LOOP;
END;
$$;
-- Step 2: Run the ETL pipeline
CALL etl.load_daily_data();
-- Step 3: Ungate all sources
DO $$
DECLARE
gated RECORD;
BEGIN
FOR gated IN SELECT source_name FROM pgtrickle.source_gates()
WHERE is_gated = true
LOOP
PERFORM pgtrickle.ungate_source(gated.source_name);
END LOOP;
END;
$$;
Monitoring During a Gated Load
While sources are gated, verify the gate status:
-- Check which sources are currently gated
SELECT * FROM pgtrickle.source_gates();
-- Bootstrap gate status (v0.6.0+)
SELECT * FROM pgtrickle.bootstrap_gate_status();
Combining with the Fuse Circuit Breaker
For extra safety, combine gating with the fuse circuit breaker:
-- Arm the fuse as a safety net
SELECT pgtrickle.alter_stream_table('order_totals',
fuse => 'on',
fuse_ceiling => 500000
);
-- Gate for controlled loads
SELECT pgtrickle.gate_source('public.orders');
-- ... load data ...
SELECT pgtrickle.ungate_source('public.orders');
-- The fuse catches any unexpected bulk changes outside the gated window
Watermark Gating (v0.7.0+)
Watermark gating extends source gating with LSN-based coordination for more precise control:
-- Set a watermark — refreshes only consume changes up to this LSN
SELECT pgtrickle.set_watermark('public.orders', pg_current_wal_lsn());
-- Load new data (changes accumulate beyond the watermark)
COPY orders FROM '/data/new_orders.csv' WITH (FORMAT csv, HEADER);
-- Advance the watermark to include the new data
SELECT pgtrickle.advance_watermark('public.orders', pg_current_wal_lsn());
-- Or clear the watermark entirely
SELECT pgtrickle.clear_watermark('public.orders');
See the SQL Reference — Watermark Gating for the complete API.
Further Reading
- SQL Reference — Bootstrap Source Gating
- SQL Reference — Watermark Gating
- SQL Reference — ETL Coordination Cookbook
- Tutorial: Fuse Circuit Breaker
Tutorial: Migrating from Materialized Views
This guide shows how to incrementally migrate existing PostgreSQL
MATERIALIZED VIEW + manual REFRESH workflows to pg_trickle stream
tables.
Coming from a different background?
The step-by-step guide below covers the PostgreSQL materialized view path. If you are migrating from a different system, start here:
| You are migrating from | Jump to |
|---|---|
| PostgreSQL MATERIALIZED VIEW + REFRESH | This guide |
| pg_ivm | Migrating from pg_ivm |
| Cron-based REFRESH (pg_cron, OS cron) | Step 6 — Remove external refresh jobs |
| Application-level refresh (manual SQL in code) | Step 2 — Create the stream table |
| Debezium + Materialize / RisingWave | Port your queries to PostgreSQL SQL, then follow this guide. See Comparisons for feature mapping. |
| Looker PDTs (Persistent Derived Tables) | PDTs map closely to stream tables. Translate the PDT SQL to a create_stream_table() call; the schedule replaces the PDT caching strategy. |
| Snowflake Dynamic Tables | The concepts are nearly identical. Map TARGET_LAG to a pg_trickle schedule; DOWNSTREAM is schedule => 'calculated'. See Comparisons. |
| Homemade ETL pipeline (INSERT ... SELECT) | Replace the periodic ETL job with a stream table using the same SELECT query. |
Why Migrate?
| | Materialized View | Stream Table |
|---|---|---|
| Refresh | Manual (REFRESH MATERIALIZED VIEW) | Automatic (scheduler) or manual |
| Incremental refresh | Not supported | Built-in differential mode |
| Blocking reads | REFRESH without CONCURRENTLY blocks readers | Never blocks readers |
| Dependency ordering | Manual | Automatic (DAG-aware topological refresh) |
| Monitoring | None | Built-in views, stats, NOTIFY alerts |
| Scheduling | External (cron, pg_cron) | Native (duration, cron, CALCULATED) |
Step-by-Step Migration
1. Identify materialized views to migrate
-- List all materialized views with their defining queries
SELECT schemaname, matviewname, definition
FROM pg_matviews
ORDER BY schemaname, matviewname;
2. Create the stream table
Take the materialized view's defining query and pass it to
create_stream_table():
Before (materialized view):
CREATE MATERIALIZED VIEW order_totals AS
SELECT customer_id, SUM(amount) AS total, COUNT(*) AS order_count
FROM orders
GROUP BY customer_id;
-- Refreshed via cron or pg_cron:
-- */5 * * * * psql -c "REFRESH MATERIALIZED VIEW CONCURRENTLY order_totals"
After (stream table):
SELECT pgtrickle.create_stream_table(
name => 'order_totals',
query => 'SELECT customer_id, SUM(amount) AS total, COUNT(*) AS order_count
FROM orders GROUP BY customer_id',
schedule => '5m'
);
3. Update application queries
Stream tables live in the pgtrickle schema by default. Update your
application queries to reference the new location:
-- Before:
SELECT * FROM public.order_totals WHERE total > 1000;
-- After:
SELECT * FROM pgtrickle.order_totals WHERE total > 1000;
Or create a view in the original schema for backward compatibility:
CREATE VIEW public.order_totals AS
SELECT customer_id, total, order_count
FROM pgtrickle.order_totals;
4. Recreate indexes
Stream tables are regular heap tables — you can add indexes just like any other table. Recreate the indexes your queries depend on:
-- Before (on materialized view):
CREATE UNIQUE INDEX ON order_totals (customer_id);
-- After (on stream table):
CREATE INDEX ON pgtrickle.order_totals (customer_id);
Note: The __pgt_row_id column is the primary key on stream tables. You cannot add a separate UNIQUE primary key, but you can add regular or unique indexes on your business columns.
5. Remove the old materialized view
Once you've verified the stream table is working correctly:
DROP MATERIALIZED VIEW IF EXISTS public.order_totals;
6. Remove external refresh jobs
Delete any cron jobs, pg_cron entries, or application-level refresh triggers that were maintaining the old materialized view.
Migrating Concurrent Refresh Patterns
If you use REFRESH MATERIALIZED VIEW CONCURRENTLY (which requires a
unique index), the stream table equivalent is simpler — differential
refresh never blocks readers and doesn't require a unique index:
Before:
CREATE MATERIALIZED VIEW active_users AS
SELECT user_id, MAX(login_at) AS last_login
FROM logins
WHERE login_at > NOW() - INTERVAL '30 days'
GROUP BY user_id;
CREATE UNIQUE INDEX ON active_users (user_id);
REFRESH MATERIALIZED VIEW CONCURRENTLY active_users;
After:
SELECT pgtrickle.create_stream_table(
name => 'active_users',
query => 'SELECT user_id, MAX(login_at) AS last_login
FROM logins
WHERE login_at > NOW() - INTERVAL ''30 days''
GROUP BY user_id',
schedule => '1m'
);
-- No unique index needed. No manual refresh needed.
Migrating Cascading Materialized Views
If you have materialized views that depend on other materialized views, the migration is straightforward — pg_trickle handles dependency ordering automatically:
Before:
CREATE MATERIALIZED VIEW order_totals AS
SELECT customer_id, SUM(amount) AS total FROM orders GROUP BY customer_id;
CREATE MATERIALIZED VIEW big_customers AS
SELECT customer_id, total FROM order_totals WHERE total > 1000;
-- Must refresh in order:
REFRESH MATERIALIZED VIEW order_totals;
REFRESH MATERIALIZED VIEW big_customers;
After:
SELECT pgtrickle.create_stream_table(
name => 'order_totals',
query => 'SELECT customer_id, SUM(amount) AS total FROM orders GROUP BY customer_id',
schedule => '1m'
);
SELECT pgtrickle.create_stream_table(
name => 'big_customers',
query => 'SELECT customer_id, total FROM pgtrickle.order_totals WHERE total > 1000',
schedule => '1m'
);
-- Dependency ordering is automatic. No manual refresh needed.
Idempotent Deployment
For CI/CD pipelines, use create_or_replace_stream_table() so your
migration scripts are safe to re-run:
SELECT pgtrickle.create_or_replace_stream_table(
name => 'order_totals',
query => 'SELECT customer_id, SUM(amount) AS total FROM orders GROUP BY customer_id',
schedule => '5m',
refresh_mode => 'DIFFERENTIAL'
);
Choosing the Right Refresh Mode
| Scenario | Mode |
|---|---|
| Most migrations (default) | DIFFERENTIAL — only processes changes |
| Volatile functions (NOW(), RANDOM()) in the query | FULL — the query result changes even without source DML |
| Need real-time consistency within a transaction | IMMEDIATE |
| Unsure | AUTO (default) — pg_trickle picks the best mode per cycle |
Migration Checklist
- Identify all materialized views and their refresh schedules
- Create equivalent stream tables with matching queries
- Recreate any required indexes on the stream tables
- Update application queries to reference the pgtrickle schema
- Verify data correctness (compare stream table vs. materialized view)
- Remove external refresh jobs (cron, pg_cron)
- Drop the old materialized views
- Set up monitoring (Prometheus/Grafana or built-in views)
Further Reading
- Getting Started
- SQL Reference — create_stream_table()
- SQL Reference — create_or_replace_stream_table()
- FAQ — Materialized View vs Stream Table
Tutorial: Migrating from pg_ivm to pg_trickle
This guide walks through migrating existing pg_ivm IMMVs (Incrementally
Maintained Materialized Views) to pg_trickle stream tables. It covers API
mapping, behavioral differences, and a step-by-step migration checklist.
See also: plans/ecosystem/GAP_PG_IVM_COMPARISON.md for the full feature comparison and gap analysis between the two extensions.
Why Migrate?
| | pg_ivm (IMMV) | pg_trickle (Stream Table) |
|---|---|---|
| Maintenance model | Immediate only (in-transaction) | Deferred (scheduler) and Immediate |
| Aggregate functions | 5 (COUNT, SUM, AVG, MIN, MAX) | 60+ (all built-in + user-defined) |
| Window functions | Not supported | Full support |
| CTEs (recursive) | Not supported | Semi-naive, DRed, recomputation |
| Subqueries | Very limited | Full (EXISTS, NOT EXISTS, IN, LATERAL, scalar) |
| Set operations | Not supported | UNION, INTERSECT, EXCEPT (bag + set) |
| HAVING clause | Not supported | Supported |
| GROUPING SETS / CUBE / ROLLUP | Not supported | Auto-rewritten to UNION ALL |
| DISTINCT ON | Not supported | Auto-rewritten to ROW_NUMBER |
| Views as sources | Not supported | Auto-inlined |
| Cascading views | Not supported | DAG-aware topological scheduling |
| Background scheduling | None (manual only) | Native cron, duration, CALCULATED |
| Monitoring | 1 catalog table | 15+ diagnostic functions |
| Concurrency | ExclusiveLock during maintenance | Advisory locks, non-blocking reads |
| Parallel refresh | Not supported | Worker pool with caps |
Concept Mapping
| pg_ivm Concept | pg_trickle Equivalent | Notes |
|---|---|---|
| IMMV (Incrementally Maintained Materialized View) | Stream table | Same idea — a query result kept incrementally up to date |
| pgivm.create_immv(name, query) | pgtrickle.create_stream_table(name, query) | pg_trickle adds optional schedule and refresh_mode parameters |
| pgivm.refresh_immv(name, true) | pgtrickle.refresh_stream_table(name) | Manual refresh |
| pgivm.refresh_immv(name, false) | No direct equivalent | pg_trickle has pgtrickle.alter_stream_table(name, enabled => false) to suspend |
| pgivm.pg_ivm_immv catalog | pgtrickle.pgt_stream_tables | Plus pgt_status(), refresh_timeline(), etc. |
| DROP TABLE immv_name | pgtrickle.drop_stream_table(name) | Stream tables must be dropped via the API |
| ALTER TABLE immv RENAME TO ... | pgtrickle.alter_stream_table(old, name => new) | Rename via API |
| In-transaction maintenance (AFTER row triggers) | refresh_mode => 'IMMEDIATE' | Same model — triggers fire in the writing transaction |
| (not available) | refresh_mode => 'DIFFERENTIAL' | Deferred incremental refresh via change buffers |
| (not available) | refresh_mode => 'AUTO' | Picks DIFFERENTIAL or FULL automatically |
| Auto-created indexes on GROUP BY / PK | Manual CREATE INDEX | pg_trickle auto-creates the primary key but not secondary indexes |
Step-by-Step Migration
1. Inventory existing IMMVs
List all pg_ivm IMMVs in your database:
-- pg_ivm catalog
SELECT immvrelid::regclass AS immv_name,
pgivm.get_immv_def(immvrelid) AS defining_query
FROM pgivm.pg_ivm_immv
ORDER BY immvrelid::regclass::text;
Record each IMMV's name, defining query, and any indexes you have created on it.
2. Check query compatibility
pg_trickle supports a superset of pg_ivm's SQL dialect, so any query that works with pg_ivm will work with pg_trickle. However, there are a few things to verify:
- Data types: pg_ivm requires btree operator classes for all columns (excluding json, xml, point, etc.). pg_trickle has no such restriction.
- Outer joins: If your IMMV uses outer joins, pg_trickle removes pg_ivm's restrictions (single equijoin, no aggregates, no CASE). Your query may work unchanged, or you may be able to simplify workarounds you added for pg_ivm.
3. Choose a refresh mode
For each IMMV, decide which pg_trickle refresh mode to use:
| pg_ivm behavior | pg_trickle refresh mode | When to choose |
|---|---|---|
| Zero staleness required | IMMEDIATE | Same in-transaction behavior as pg_ivm |
| Some staleness acceptable | DIFFERENTIAL with schedule | Lower write latency, batched refresh |
| Let pg_trickle decide | AUTO (default) | Recommended for most cases |
4. Create stream tables
For each IMMV, create the corresponding stream table:
pg_ivm (before):
SELECT pgivm.create_immv(
'order_totals',
'SELECT customer_id, SUM(amount) AS total FROM orders GROUP BY customer_id'
);
pg_trickle — IMMEDIATE mode (same behavior as pg_ivm):
SELECT pgtrickle.create_stream_table(
'order_totals',
'SELECT customer_id, SUM(amount) AS total FROM orders GROUP BY customer_id',
NULL, -- no schedule needed for IMMEDIATE
'IMMEDIATE'
);
pg_trickle — deferred mode (lower write latency):
SELECT pgtrickle.create_stream_table(
'order_totals',
'SELECT customer_id, SUM(amount) AS total FROM orders GROUP BY customer_id',
'30s' -- refresh every 30 seconds; mode defaults to AUTO
);
5. Recreate indexes
pg_ivm auto-creates indexes on GROUP BY, DISTINCT, and primary key columns.
pg_trickle auto-creates the primary key (pgt_row_id) but not secondary indexes.
Recreate any indexes that your read queries depend on:
-- Example: index on the GROUP BY column for lookup queries
CREATE INDEX ON pgtrickle.order_totals (customer_id);
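To see which secondary indexes need recreating, you can list the definitions on the old IMMV with the standard pg_indexes view (schema and table names here are illustrative):

```sql
-- List index definitions on the old IMMV so they can be recreated
-- on the stream table (adjust the schema name in each indexdef).
SELECT indexdef
FROM pg_indexes
WHERE schemaname = 'public'
  AND tablename = 'order_totals_immv';
```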
6. Update application queries
pg_ivm IMMVs live in the schema where they were created (usually public).
pg_trickle stream tables default to the pgtrickle schema.
-- Before (pg_ivm):
SELECT * FROM public.order_totals WHERE customer_id = 42;
-- After (pg_trickle):
SELECT * FROM pgtrickle.order_totals WHERE customer_id = 42;
To avoid changing application code, create a compatibility view:
CREATE VIEW public.order_totals AS
SELECT * FROM pgtrickle.order_totals;
7. Verify correctness
After creating the stream table and running a refresh, compare results:
-- Compare row counts
SELECT 'immv' AS source, COUNT(*) FROM public.order_totals_immv
UNION ALL
SELECT 'stream_table', COUNT(*) FROM pgtrickle.order_totals;
-- Full diff (should return zero rows)
(SELECT * FROM public.order_totals_immv EXCEPT SELECT * FROM pgtrickle.order_totals)
UNION ALL
(SELECT * FROM pgtrickle.order_totals EXCEPT SELECT * FROM public.order_totals_immv);
8. Drop the old IMMV
Once you have verified the stream table is correct and applications are updated:
DROP TABLE public.order_totals_immv;
9. (Optional) Remove pg_ivm
After all IMMVs are migrated:
DROP EXTENSION pg_ivm CASCADE;
Remove pg_ivm from shared_preload_libraries if it was listed there and
restart PostgreSQL.
Behavioral Differences to Be Aware Of
Locking
- pg_ivm: Holds ExclusiveLock on the IMMV during maintenance. In REPEATABLE READ/SERIALIZABLE, concurrent writes to the same IMMV's base tables may raise serialization errors.
- pg_trickle (IMMEDIATE): Uses advisory locks. Concurrent reads of the stream table are never blocked.
- pg_trickle (deferred): Base table writes only insert into change buffers (~2–50 μs). No lock contention with refresh.
TRUNCATE
- pg_ivm: Synchronously truncates or fully refreshes the IMMV.
- pg_trickle (IMMEDIATE): Performs a full refresh within the same transaction.
- pg_trickle (deferred): Clears the change buffer and queues a full refresh on the next cycle.
Logical Replication
- pg_ivm: Not compatible with logical replication — subscriber nodes do not have triggers that fire for replicated changes.
- pg_trickle (deferred): Supports WAL-based CDC (pg_trickle.cdc_mode = 'wal'), which reads from the WAL directly. Trigger-based CDC also works with logical replication if triggers are created on the subscriber.
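To opt into WAL-based capture cluster-wide, the pg_trickle.cdc_mode GUC referenced above can be set and reloaded (a sketch; requires wal_level = logical, and applies to all databases as written):

```sql
-- Enable WAL-based CDC globally
ALTER SYSTEM SET pg_trickle.cdc_mode = 'wal';
SELECT pg_reload_conf();
```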
Schema Changes
- pg_ivm: No automatic DDL tracking. If a base table column is altered, the IMMV may break silently.
- pg_trickle: Event triggers detect DDL changes on source tables and automatically reinitialize affected stream tables.
Upgrading Queries That pg_ivm Couldn't Handle
pg_ivm's SQL restrictions often force users to create workarounds. With pg_trickle, many of these workarounds can be simplified:
HAVING clauses
-- pg_ivm workaround: filter in application or wrap in a view
SELECT pgivm.create_immv('big_customers',
'SELECT customer_id, SUM(amount) AS total
FROM orders GROUP BY customer_id'
);
-- Then: SELECT * FROM big_customers WHERE total > 1000;
-- pg_trickle: use HAVING directly
SELECT pgtrickle.create_stream_table('big_customers',
'SELECT customer_id, SUM(amount) AS total
FROM orders GROUP BY customer_id
HAVING SUM(amount) > 1000'
);
NOT EXISTS / anti-joins
-- pg_ivm: not supported — manual workaround required
-- pg_trickle: works directly
SELECT pgtrickle.create_stream_table('orphan_orders',
'SELECT o.* FROM orders o
WHERE NOT EXISTS (SELECT 1 FROM customers c WHERE c.id = o.customer_id)'
);
Window functions
-- pg_ivm: not supported
-- pg_trickle: works directly
SELECT pgtrickle.create_stream_table('ranked_products',
'SELECT product_id, category, revenue,
RANK() OVER (PARTITION BY category ORDER BY revenue DESC) AS rnk
FROM product_revenue'
);
UNION ALL pipelines
-- pg_ivm: not supported — requires separate IMMVs + application-side UNION
-- pg_trickle: works directly
SELECT pgtrickle.create_stream_table('all_events',
'SELECT id, ts, ''order'' AS type FROM order_events
UNION ALL
SELECT id, ts, ''return'' AS type FROM return_events'
);
Monitoring After Migration
pg_trickle provides extensive monitoring that pg_ivm does not offer:
-- Overall health
SELECT * FROM pgtrickle.health_check();
-- Status of all stream tables (includes staleness, last refresh, error count)
SELECT * FROM pgtrickle.pgt_status();
-- Recent refresh history across all stream tables
SELECT * FROM pgtrickle.refresh_timeline(20);
-- CDC pipeline health
SELECT * FROM pgtrickle.change_buffer_sizes();
-- Diagnose errors for a specific stream table
SELECT * FROM pgtrickle.diagnose_errors('order_totals');
See SQL Reference for the complete list of monitoring functions.
dbt-pgtrickle
dbt-pgtrickle is the official dbt adapter for pg_trickle. It lets
you define stream tables as dbt models using standard {{ config() }}
blocks, manage them through dbt run / dbt build, and run
incremental refreshes as part of your dbt pipeline.
Quick example
-- models/orders_agg.sql
{{ config(
materialized = 'stream_table',
schedule = '5m',
refresh_mode = 'DIFFERENTIAL'
) }}
SELECT customer_id,
COUNT(*) AS order_count,
SUM(amount) AS total_spent
FROM {{ ref('orders') }}
GROUP BY customer_id
Run it:
dbt run --select orders_agg
The model is created as a stream table and refreshed automatically.
Subsequent dbt run invocations update the defining query if it changed
(via ALTER QUERY), without dropping and recreating the table.
Installation
pip install dbt-pgtrickle
Requires dbt-postgres 1.7+ and pg_trickle v0.30+.
The full configuration reference, supported materializations, macros, testing guide, and CI setup are in the dbt-pgtrickle README.
dbt-pgtrickle
A dbt package that integrates
pg_trickle stream tables into your dbt
project via a custom stream_table materialization.
No custom Python adapter required — works with the standard dbt-postgres
adapter. Just Jinja SQL macros that call pg_trickle's SQL API.
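Conceptually, the stream_table materialization compiles a model down to a call against pg_trickle's SQL API, roughly like this (a sketch, not the macro's literal output):

```sql
SELECT pgtrickle.create_stream_table(
  name         => 'order_totals',
  query        => 'SELECT customer_id, SUM(amount) AS total FROM orders GROUP BY customer_id',
  schedule     => '5m',
  refresh_mode => 'DIFFERENTIAL'
);
```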
Prerequisites
| Requirement | Minimum Version |
|---|---|
| dbt Core | ≥ 1.9 |
| dbt-postgres adapter | Matching dbt Core version |
| PostgreSQL | 18.x |
| pg_trickle extension | ≥ 0.1.0 (CREATE EXTENSION pg_trickle;) |
Installation
From Git (recommended until dbt Hub listing is live)
Add to your packages.yml:
packages:
- git: "https://github.com/trickle-labs/pg-trickle.git"
revision: v0.15.0
subdirectory: "dbt-pgtrickle"
From dbt Hub (once published)
After the package is listed on dbt Hub, you can install by package name:
packages:
- package: grove/dbt_pgtrickle
version: [">=0.15.0", "<1.0.0"]
Note: dbt Hub listing requires a separate GitHub repository for the package. See docs/integrations/dbt-hub-submission.md for the submission checklist and steps.
Then run:
dbt deps
Quick Start
Create a model with materialized='stream_table':
-- models/marts/order_totals.sql
{{
config(
materialized='stream_table',
schedule='5m',
refresh_mode='DIFFERENTIAL'
)
}}
SELECT
customer_id,
SUM(amount) AS total_amount,
COUNT(*) AS order_count
FROM {{ source('raw', 'orders') }}
GROUP BY customer_id
dbt run --select order_totals # Creates the stream table
dbt test --select order_totals # Tests work normally (it's a real table)
Configuration Reference
| Key | Type | Default | Description |
|---|---|---|---|
| materialized | string | — | Must be 'stream_table' |
| schedule | string/null | '1m' | Refresh schedule (e.g., '5m', '1h', cron). null for pg_trickle's CALCULATED schedule. |
| refresh_mode | string | 'DIFFERENTIAL' | 'FULL', 'DIFFERENTIAL', 'AUTO', or 'IMMEDIATE' |
| initialize | bool | true | Populate on creation |
| status | string/null | null | 'ACTIVE' or 'PAUSED'. When set, applies on subsequent runs via alter_stream_table(). |
| stream_table_name | string | model name | Override stream table name |
| stream_table_schema | string | target schema | Override schema |
| cdc_mode | string/null | null | CDC mode override: 'auto', 'trigger', or 'wal'. null uses the GUC default. |
| partition_by | string/null | null | Column name for RANGE partitioning of the storage table (v0.13.0+). Cannot be changed after creation. |
| fuse | string/null | null | Fuse circuit-breaker mode: 'off', 'on', or 'auto' (v0.13.0+). Applied via alter_stream_table() on every run; no-op if unchanged. |
| fuse_ceiling | int/null | null | Change-count threshold that triggers the fuse (v0.13.0+). null uses the global GUC default. |
| fuse_sensitivity | int/null | null | Number of consecutive over-ceiling observations before the fuse blows (v0.13.0+). null means 1 (immediate). |
partition_by — RANGE partitioning
Partition the stream table's storage table by a column value. pg_trickle creates a PARTITION BY RANGE (<col>) storage table with a default catch-all partition. Add your own date/integer range partitions via standard PostgreSQL DDL after dbt run.
-- models/marts/events_by_day.sql
{{ config(
materialized='stream_table',
schedule='1m',
refresh_mode='DIFFERENTIAL',
partition_by='event_day'
) }}
SELECT
event_day,
user_id,
COUNT(*) AS event_count
FROM {{ source('raw', 'events') }}
GROUP BY event_day, user_id
Note: partition_by is applied only at creation time. Changing it after the stream table exists has no effect. Use dbt run --full-refresh to recreate with a new partition key.
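Once dbt run has created the partitioned storage table, range partitions are added with ordinary DDL. A minimal sketch, assuming the stream table resolves to pgtrickle.events_by_day and event_day is a date (the partition name is illustrative; rows already captured by the default partition for that range would need to be moved first):

```sql
CREATE TABLE pgtrickle.events_by_day_2024_06
  PARTITION OF pgtrickle.events_by_day
  FOR VALUES FROM ('2024-06-01') TO ('2024-07-01');
```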
fuse — Circuit breaker
The fuse circuit breaker suspends refreshes when the change volume exceeds a threshold, protecting against runaway refresh cycles during bulk ingestion.
-- models/marts/order_totals.sql
{{ config(
materialized='stream_table',
schedule='5m',
refresh_mode='DIFFERENTIAL',
fuse='auto',
fuse_ceiling=50000,
fuse_sensitivity=3
) }}
SELECT customer_id, SUM(amount) AS total
FROM {{ source('raw', 'orders') }}
GROUP BY customer_id
| fuse value | Behaviour |
|---|---|
| 'off' | Fuse disabled (default) |
| 'on' | Fuse always active; blows when ceiling is exceeded |
| 'auto' | Fuse activates only when the delta is large enough to make FULL refresh cheaper than DIFFERENTIAL |
Fuse parameters are applied on every dbt run via alter_stream_table(); the materialization only calls the SQL function when the values have actually changed from the catalog state.
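The interplay of fuse_ceiling and fuse_sensitivity can be illustrated with a short Python sketch (this models the rule as documented, not pg_trickle's actual implementation):

```python
def fuse_blows(change_counts, ceiling, sensitivity):
    """Return True once `sensitivity` consecutive refresh cycles observe
    more than `ceiling` buffered changes (illustrative sketch only)."""
    consecutive = 0
    for n in change_counts:
        if n > ceiling:
            consecutive += 1
            if consecutive >= sensitivity:
                return True
        else:
            consecutive = 0  # a quiet cycle resets the streak
    return False

# With fuse_ceiling=50000, fuse_sensitivity=3: three consecutive
# over-ceiling cycles blow the fuse; a quiet cycle in between resets it.
print(fuse_blows([60000, 70000, 80000], 50000, 3))       # True
print(fuse_blows([60000, 100, 60000, 70000], 50000, 3))  # False
```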
Project-level defaults
# dbt_project.yml
models:
my_project:
marts:
+materialized: stream_table
+schedule: '5m'
+refresh_mode: DIFFERENTIAL
Operations
pgtrickle_refresh — Manual refresh
dbt run-operation pgtrickle_refresh --args '{"model_name": "order_totals"}'
refresh_all_stream_tables — Refresh all in dependency order
Refreshes all dbt-managed stream tables in topological (dependency) order.
Upstream tables are refreshed before downstream ones. Designed for CI pipelines:
run after dbt run and before dbt test to ensure all data is current.
# Refresh all dbt-managed stream tables
dbt run-operation refresh_all_stream_tables
# Refresh only stream tables in a specific schema
dbt run-operation refresh_all_stream_tables --args '{"schema": "analytics"}'
drop_all_stream_tables — Drop dbt-managed stream tables
Drops only stream tables defined as dbt models (safe in shared environments):
dbt run-operation drop_all_stream_tables
drop_all_stream_tables_force — Drop ALL stream tables
Drops everything from the pg_trickle catalog, including non-dbt stream tables:
dbt run-operation drop_all_stream_tables_force
pgtrickle_check_cdc_health — CDC pipeline health
dbt run-operation pgtrickle_check_cdc_health
Raises an error (non-zero exit) if any CDC source is unhealthy.
Freshness Monitoring
Native dbt source freshness is not supported (the last_refresh_at column lives in
the catalog, not on the stream table). Use the pgtrickle_check_freshness run-operation
instead:
# Check all active stream tables (defaults: warn=600s, error=1800s)
dbt run-operation pgtrickle_check_freshness
# Custom thresholds
dbt run-operation pgtrickle_check_freshness \
--args '{model_name: order_totals, warn_seconds: 300, error_seconds: 900}'
Exits non-zero when any stream table exceeds the error threshold — safe for CI.
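The warn/error decision reduces to simple threshold logic, sketched here with the defaults quoted above (whether the boundaries are inclusive is an assumption, not documented behavior):

```python
def freshness_status(staleness_seconds, warn_seconds=600, error_seconds=1800):
    """Classify a stream table's staleness against the documented thresholds."""
    if staleness_seconds >= error_seconds:
        return "error"   # the run-operation exits non-zero
    if staleness_seconds >= warn_seconds:
        return "warn"
    return "ok"

print(freshness_status(120))   # ok
print(freshness_status(700))   # warn
print(freshness_status(2000))  # error
```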
Useful dbt Commands
# List all stream table models
dbt ls --select config.materialized:stream_table
# Full refresh (drop + recreate)
dbt run --select order_totals --full-refresh
# Build models + tests in DAG order
dbt build --select order_totals
Note: dbt build runs stream table models early in the DAG. If downstream models
depend on a stream table with initialize: false, the table may not be populated yet.
Testing
Stream tables are standard PostgreSQL heap tables — all dbt tests work normally:
models:
- name: order_totals
columns:
- name: customer_id
tests:
- not_null
- unique
Stream Table Health Test
Use the built-in stream_table_healthy generic test to fail your dbt test suite
when a stream table is stale, erroring, or paused:
models:
- name: order_totals
tests:
- dbt_pgtrickle.stream_table_healthy:
warn_seconds: 300 # fail if stale for more than 5 minutes
The test queries pgtrickle.pg_stat_stream_tables and returns rows for any
unhealthy condition. An empty result set means the stream table is healthy.
Stream Table Status Macro
For more programmatic control, use the pgtrickle_stream_table_status() macro
directly in custom tests or run-operations:
{%- set st = dbt_pgtrickle.pgtrickle_stream_table_status('order_totals', warn_seconds=300) -%}
{# st.status is one of: 'healthy', 'stale', 'erroring', 'paused', 'not_found' #}
{# st.staleness_seconds, st.consecutive_errors, st.total_refreshes, etc. #}
__pgt_row_id Column
pg_trickle adds an internal __pgt_row_id column to stream tables for row identity
tracking. This column:
- Appears in SELECT * and dbt docs generate
- Does not affect dbt test unless you check column counts
- Can be documented to reduce confusion:
columns:
- name: __pgt_row_id
description: "Internal pg_trickle row identity hash. Ignore this column."
Limitations
| Limitation | Workaround |
|---|---|
| No in-place query alteration | Materialization auto-drops and recreates when query changes |
| __pgt_row_id visible | Document it; exclude in downstream SELECT |
| No native dbt source freshness | Use pgtrickle_check_freshness run-operation |
| No dbt snapshot support | Snapshot the stream table as a regular table |
| Query change detection is whitespace-sensitive | dbt compiles deterministically; unnecessary recreations are safe |
| PostgreSQL 18 required | Extension requirement |
| Shared version tags with pg_trickle extension | Pin to specific git revision |
Contributing
See AGENTS.md for development guidelines and the implementation plan for design rationale.
Running tests locally
The quickest way (requires Docker and dbt installed):
# Full run — builds Docker image, starts container, runs tests, cleans up
just test-dbt
# Fast run — reuses existing Docker image (run after first build)
just test-dbt-fast
Or use the script directly with options:
cd dbt-pgtrickle/integration_tests/scripts
# Default: builds image, runs tests with dbt 1.9, cleans up
./run_dbt_tests.sh
# Skip image rebuild (faster iteration)
./run_dbt_tests.sh --skip-build
# Keep the container running after tests (for debugging)
./run_dbt_tests.sh --skip-build --keep-container
# Use a custom port (avoids conflicts with local PostgreSQL)
PGPORT=25432 ./run_dbt_tests.sh
Manual testing against an existing pg_trickle instance
If you already have PostgreSQL 18 + pg_trickle running locally:
export PGHOST=localhost PGPORT=5432 PGUSER=postgres PGPASSWORD=postgres PGDATABASE=postgres
cd dbt-pgtrickle/integration_tests
dbt deps
dbt seed
dbt run
./scripts/wait_for_populated.sh order_totals 30
dbt test
dbt run-operation drop_all_stream_tables
License
Apache 2.0 — see LICENSE.
CloudNativePG / Kubernetes
pg_trickle is designed to work with CloudNativePG (CNPG) — the Kubernetes operator for PostgreSQL. The extension is loaded via Image Volume Extensions, meaning no custom PostgreSQL image is needed.
Prerequisites
- Kubernetes 1.33+ with the ImageVolume feature gate enabled
- CloudNativePG operator 1.28+
- The pg_trickle-ext OCI image available in your cluster registry
Architecture
┌─────────────────────────────────────┐
│ CNPG Cluster (3 pods) │
│ │
│ ┌──────────┐ ┌──────────────────┐ │
│ │ Primary │ │ pg_trickle-ext │ │
│ │ PG 18 │◄─┤ (ImageVolume) │ │
│ │ │ │ .so + .sql only │ │
│ └──────────┘ └──────────────────┘ │
│ ┌──────────┐ ┌──────────┐ │
│ │ Replica 1│ │ Replica 2│ │
│ │ (standby)│ │ (standby)│ │
│ └──────────┘ └──────────┘ │
└─────────────────────────────────────┘
- The scheduler runs on the primary pod only. Replica pods detect recovery mode (pg_is_in_recovery() = true) and sleep.
- Stream tables are replicated to standbys via physical streaming replication like any other heap table.
- Pod restarts are safe — the scheduler resumes from the stored frontier with no data loss.
Deploying pg_trickle on CNPG
1. Build the extension image
The cnpg/Dockerfile.ext builds a scratch-based OCI image containing
only the shared library, control file, and SQL migrations:
# From the dist/ directory with pre-built artifacts:
docker build -t ghcr.io/<owner>/pg_trickle-ext:0.13.0 -f cnpg/Dockerfile.ext dist/
docker push ghcr.io/<owner>/pg_trickle-ext:0.13.0
2. Deploy the Cluster
Apply the Cluster manifest with pg_trickle configured as an Image Volume extension:
# cnpg/cluster-example.yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
name: pg-trickle-demo
spec:
instances: 3
imageName: ghcr.io/cloudnative-pg/postgresql:18
postgresql:
shared_preload_libraries:
- pg_trickle
extensions:
- name: pg-trickle
image:
reference: ghcr.io/<owner>/pg_trickle-ext:0.13.0
parameters:
max_worker_processes: "8"
bootstrap:
initdb:
database: app
owner: app
storage:
size: 10Gi
storageClass: standard
kubectl apply -f cnpg/cluster-example.yaml
3. Enable the extension
Use the CNPG Database resource for declarative extension management:
# cnpg/database-example.yaml
apiVersion: postgresql.cnpg.io/v1
kind: Database
metadata:
name: app
spec:
cluster:
name: pg-trickle-demo
name: app
owner: app
extensions:
- name: pg_trickle
kubectl apply -f cnpg/database-example.yaml
4. Verify
kubectl exec -it pg-trickle-demo-1 -- psql -U postgres -d app -c \
"SELECT pgtrickle.version();"
Key Considerations
Worker processes
Each database with pg_trickle needs one background worker slot. Set
max_worker_processes in the Cluster manifest to accommodate the launcher
(1) + one scheduler per database + any parallel refresh workers:
parameters:
max_worker_processes: "16"
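The arithmetic behind a value like 16 can be sketched as follows (only the pg_trickle demand is computed; PostgreSQL's own background workers need additional slots on top):

```python
def pgtrickle_worker_slots(databases, parallel_refresh_workers):
    """Background-worker slots pg_trickle itself needs:
    1 launcher + one scheduler per database + parallel refresh workers."""
    return 1 + databases + parallel_refresh_workers

# e.g. 3 databases with up to 4 parallel refresh workers -> 8 slots
# for pg_trickle, before PostgreSQL's own worker requirements.
print(pgtrickle_worker_slots(3, 4))  # 8
```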
Persistent volumes
Catalog tables (pgtrickle.pgt_stream_tables) and change buffers
(pgtrickle_changes.*) are stored in regular PostgreSQL tablespaces.
Persistent volume claims preserve them across pod rescheduling.
Backups
pg_trickle state (catalog, change buffers, stream table data) is included in CNPG's Barman object-store backups automatically. After a restore, the scheduler detects frontier inconsistencies and performs a full refresh on the first cycle. See Backup and Restore for details.
Failover
When the primary pod fails and a replica is promoted, the new primary's scheduler starts automatically. Since stream tables were replicated via streaming replication, they are already up-to-date (minus replication lag). The scheduler resumes refreshing from the stored frontier.
Resource limits
For production deployments, set resource requests and limits in the Cluster manifest to prevent the scheduler from starving other workloads:
resources:
requests:
memory: 512Mi
cpu: 500m
limits:
memory: 2Gi
cpu: 2000m
Example manifests
The repository includes ready-to-use manifests in the cnpg/ directory:
| File | Purpose |
|---|---|
| cnpg/Dockerfile.ext | Build the scratch-based extension image |
| cnpg/Dockerfile.ext-build | Multi-stage build for CI/CD pipelines |
| cnpg/cluster-example.yaml | Complete Cluster manifest with pg_trickle |
| cnpg/database-example.yaml | Database resource with declarative extension management |
Further reading
- CloudNativePG Image Volume Extensions
- CloudNativePG Declarative Database Management
- Backup and Restore
- Configuration Reference
Citus Distributed Tables
pg_trickle supports Citus distributed tables as sources for incremental view maintenance and as output targets for stream tables.
Prerequisites
- PostgreSQL 18 with wal_level = logical on every node (coordinator and all workers).
- Citus 12.x or 13.x installed on the coordinator and all workers.
- The dblink extension installed on the coordinator (CREATE EXTENSION IF NOT EXISTS dblink).
- pg_trickle installed at the same version on every node.
- Each source distributed table must have REPLICA IDENTITY FULL: ALTER TABLE my_distributed_table REPLICA IDENTITY FULL;
Architecture Overview
┌───────────────────────────────────────────────────────────┐
│ Citus Coordinator │
│ │
│ pg_trickle scheduler │
│ ├─ reads coordinator WAL slot (local sources) │
│ └─ polls worker WAL slots via dblink (distributed) │
│ │
│ pgtrickle.pgt_worker_slots ← tracks per-worker slots │
│ pgtrickle.citus_status ← observability view │
└─────────────┬────────────┬────────────────────────────────┘
│ │
dblink│ dblink│
▼ ▼
┌─────────────────┐ ┌─────────────────┐
│ Citus Worker 1 │ │ Citus Worker 2 │
│ WAL slot: │ │ WAL slot: │
│ pgtrickle_... │ │ pgtrickle_... │
└─────────────────┘ └─────────────────┘
pg_trickle creates a logical replication slot on each worker for every
distributed source table. The coordinator scheduler polls these slots via
dblink on every tick, merges the decoded changes into the coordinator-local
change buffer, and then applies them to the stream table output.
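Conceptually, one per-worker poll is a dblink round-trip like the following hand-written sketch (the connection string and slot name are assumptions, as is pg_trickle's internal function; pg_logical_slot_get_changes is PostgreSQL's standard logical-decoding interface):

```sql
-- Pull pending decoded changes from one worker's slot via dblink
SELECT lsn, xid, data
FROM dblink(
  'host=worker-1 port=5432 dbname=app',
  $$SELECT lsn, xid, data
    FROM pg_logical_slot_get_changes('pgtrickle_orders_slot', NULL, NULL)$$
) AS changes(lsn pg_lsn, xid xid, data text);
```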
Installation
1. Verify prerequisites on every node
-- Run on coordinator AND each worker:
SHOW wal_level; -- must be 'logical'
SELECT extname, extversion FROM pg_extension WHERE extname IN ('citus', 'pg_trickle', 'dblink');
2. Create the extension on the coordinator
CREATE EXTENSION IF NOT EXISTS dblink;
CREATE EXTENSION IF NOT EXISTS pg_trickle;
3. Run pre-flight checks
pg_trickle provides two pre-flight helpers that verify worker readiness:
-- COORD-7: Verify pg_trickle version matches on all workers
SELECT pgtrickle.source_stable_name(0::oid); -- triggers version check on startup
-- COORD-8: Verify wal_level=logical on all workers
-- (checked automatically when a distributed CDC source is set up)
4. Prepare your distributed source table
-- Distribute your source table if not already distributed
SELECT create_distributed_table('orders', 'customer_id');
-- REPLICA IDENTITY FULL is required for CDC on distributed tables
ALTER TABLE orders REPLICA IDENTITY FULL;
5. Create a stream table over a distributed source
-- Basic stream table (output stored on coordinator)
CALL pgtrickle.create_stream_table(
name => 'orders_summary',
query => 'SELECT customer_id, count(*) AS order_count, sum(amount) AS total
FROM orders GROUP BY customer_id'
);
-- Distributed output: co-locate the stream table with the source
CALL pgtrickle.create_stream_table(
name => 'orders_summary',
query => 'SELECT customer_id, count(*) AS order_count,
sum(amount) AS total
FROM orders GROUP BY customer_id',
output_distribution_column => 'customer_id'
);
The output_distribution_column parameter (added in v0.33.0) converts the
output storage table into a Citus distributed table on that column immediately
after creation. If Citus is not loaded and you pass this parameter, an error
is raised.
Placement Options
| Placement | When to use | Created by |
|---|---|---|
| local (default) | Small result sets, coordinator-only queries | create_stream_table() without output_distribution_column |
| distributed | Large result sets, co-location with source shards | output_distribution_column => 'col' |
| reference | Lookup tables replicated to all workers | create_reference_table(st) after creation |
Monitoring
The pgtrickle.citus_status view shows per-worker CDC slot health:
SELECT
pgt_schema || '.' || pgt_name AS stream_table,
source_stable_name,
source_placement,
worker_name,
worker_port,
worker_slot,
worker_frontier,
last_polled_at, -- v0.34.0+
lease_health -- v0.34.0+: 'unlocked' | 'locked' | 'expired'
FROM pgtrickle.citus_status
ORDER BY pgt_name, worker_name;
| Column | Description |
|---|---|
| coordinator_slot | Local WAL slot name on the coordinator |
| source_placement | distributed, reference, or local |
| worker_name | Hostname of the Citus worker |
| worker_port | Port of the Citus worker |
| worker_slot | WAL slot name on the worker |
| worker_frontier | Last consumed LSN on the worker |
| last_polled_at | Timestamp of the last successful poll for each worker slot (v0.34.0+) |
| lease_holder | Session that currently holds the pgt_st_locks lease, if any (v0.34.0+) |
| lease_acquired_at | When the current lease was acquired (v0.34.0+) |
| lease_expires_at | When the current lease expires (v0.34.0+) |
| lease_health | 'unlocked', 'locked', or 'expired' (v0.34.0+) |
Worker-failure alerting GUC (v0.34.0)
| GUC | Default | Description |
|---|---|---|
| pg_trickle.citus_worker_retry_ticks | 5 | Consecutive per-worker poll failures before raising a WARNING in the PostgreSQL log. Set to 0 to disable. |
| pg_trickle.citus_st_lock_lease_ms | 60000 | Duration (ms) of the pgt_st_locks distributed-refresh lease. Must be ≥ pg_ripple.merge_fence_timeout_ms when pg_ripple is in use. |
Failure Modes
Worker unreachable
If a worker becomes unreachable, poll_worker_slot_changes() returns an error.
pg_trickle logs the failure and skips that worker's changes for the current
tick. Refresh resumes automatically once the worker is reachable again.
Action: Monitor pgtrickle.citus_status and alert on gaps in
worker_frontier.
WAL slot recycled (slot missing or lag too high)
If the coordinator stops polling a worker slot for too long, PostgreSQL may
recycle the WAL and invalidate the slot. pg_trickle will log a
WalTransitionError and fall back to a full refresh for that stream table.
Prevention: Set pg_trickle.citus_slot_max_lag_bytes (default: 1 GB)
and ensure the coordinator restarts within the slot retention window.
Recovery:
-- Drop the stale slot on the worker (via dblink if needed)
SELECT pg_drop_replication_slot('pgtrickle_<stable_name>');
-- pg_trickle will re-create it on the next scheduler tick
Shard rebalance
Citus shard rebalancing changes which worker holds which shards. Since v0.34.0,
pg_trickle detects a topology change automatically (by comparing pg_dist_node
active primaries against pgt_worker_slots) and recovers without operator
intervention:
- Stale slot entries for removed workers are dropped.
- New pgt_worker_slots rows are inserted for the incoming workers.
- The affected stream table is marked for a full refresh on the next tick.
No manual DROP + CREATE of stream tables is required after a rebalance.
Version mismatch across nodes
If pg_trickle versions differ between the coordinator and workers,
check_citus_version_compat() raises an error during CDC setup. Install the
same pg_trickle version on all nodes before creating distributed stream tables.
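To compare versions across the cluster in one pass, Citus's run_command_on_workers() helper can be paired with a local query (a sketch; only the version-match requirement is pg_trickle-specific):

```sql
-- Coordinator version
SELECT extversion FROM pg_extension WHERE extname = 'pg_trickle';

-- Worker versions
SELECT nodename, nodeport, result
FROM run_command_on_workers(
  $$SELECT extversion FROM pg_extension WHERE extname = 'pg_trickle'$$
);
```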
Known Limitations
- MERGE is not supported for distributed stream tables. pg_trickle automatically uses the DELETE + INSERT … ON CONFLICT DO UPDATE path for distributed output tables.
- Cross-shard JOINs in the stream table query follow normal Citus pushdown rules. If the plan is not pushable, the query runs on the coordinator.
- Citus reference tables work as sources with trigger-based CDC only (per-worker WAL slots are not needed for reference tables).
- Worker failure alerting: configure pg_trickle.citus_worker_retry_ticks (default 5) to control how many consecutive poll failures trigger a WARNING. Set to 0 to disable the alert entirely.
pg_ripple Integration (v0.58.0+)
pg_trickle v0.33.0 and pg_ripple v0.58.0 can be deployed together on a Citus
cluster. pg_ripple stores its RDF triples in Vertical Partitioning (VP)
tables that are distributed by subject hash (s BIGINT). pg_trickle can track
changes to these tables and materialize downstream stream tables.
Co-location Contract
VP tables are distributed on s (the XXH3-128 subject ID encoded as BIGINT).
Downstream stream tables consuming VP data should use the same distribution
column to avoid coordinator fan-out:
CALL pgtrickle.create_stream_table(
name => 'rdf_subjects',
query => 'SELECT s, count(*) AS triple_count
FROM _pg_ripple.vp_42_delta GROUP BY s',
output_distribution_column => 's' -- co-locate with VP shards
);
The natural row identity for such a stream table is (s, predicate_hash, g) —
the triple's encoded subject, predicate, and named-graph. Configure pg_trickle
with this composite key so the DELETE WHERE row_id IN (…) apply path targets
the correct shard.
VP Table Promotion Notifications
When pg_ripple distributes a new VP table it emits a
pg_ripple.vp_promoted NOTIFY with the following JSON payload:
{
"table": "_pg_ripple.vp_42_delta",
"shard_count": 32,
"shard_table_prefix":"_pg_ripple.vp_42_delta_",
"predicate_id": 42
}
pg_trickle ships a helper function that processes this payload. LISTEN on the channel from any regular backend session; when the notification arrives in your application, pass its payload to the helper:
LISTEN "pg_ripple.vp_promoted";
-- … wait for pg_notify, then pass the received payload directly:
SELECT pgtrickle.handle_vp_promoted(
'{"table":"_pg_ripple.vp_42_delta","shard_count":32,'
'"shard_table_prefix":"_pg_ripple.vp_42_delta_","predicate_id":42}'
);
handle_vp_promoted() logs the promotion and, when the VP table is already
tracked as a distributed CDC source, signals the scheduler that worker-slot
probing should run on the next tick.
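Before handing a received payload to handle_vp_promoted(), it can be worth validating its shape client-side, since a malformed notification would otherwise surface only as a server-side error. The following is a minimal sketch; the field names and types are taken from the example payload above, and the function itself is an illustrative helper, not part of pg_trickle:

```python
import json

# Required fields of the pg_ripple.vp_promoted payload, per the example above.
REQUIRED_FIELDS = {
    "table": str,
    "shard_count": int,
    "shard_table_prefix": str,
    "predicate_id": int,
}

def validate_vp_promoted(payload: str) -> dict:
    """Parse a vp_promoted NOTIFY payload and check its shape before
    passing it to pgtrickle.handle_vp_promoted()."""
    data = json.loads(payload)
    for field, ftype in REQUIRED_FIELDS.items():
        if field not in data:
            raise ValueError(f"vp_promoted payload missing field: {field}")
        if not isinstance(data[field], ftype):
            raise TypeError(f"vp_promoted field {field!r} must be {ftype.__name__}")
    return data
```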
Merge Fencing and pgt_st_locks Lease Alignment
pg_ripple's merge worker emits pg_ripple.merge_start / merge_end NOTIFY
signals as observability hints — the TRUNCATE+INSERT merge is a single 2PC
transaction so no inconsistent state is visible to pg_trickle's per-worker WAL
decoders even without these signals.
pg_trickle uses pgtrickle.pgt_st_locks (catalog-based leases) for cross-node
coordination. Set the pgt_st_locks lease expiry ≥
pg_ripple.merge_fence_timeout_ms to prevent a lease from expiring mid-merge:
-- pg_ripple side (postgresql.conf or SET):
SET pg_ripple.merge_fence_timeout_ms = 30000; -- 30 seconds
-- pg_trickle side:
SET pg_trickle.citus_st_lock_lease_ms = 45000; -- 45 seconds (≥ 30s fence)
Monitor both together:
SELECT
r.predicate_id,
r.cycle_duration_ms,
c.stream_table,
c.worker_frontier
FROM pg_ripple.merge_status() r
JOIN pgtrickle.citus_status c
ON c.source_stable_name LIKE '_pg_ripple_vp_' || r.predicate_id || '_%';
Prerequisites
- pg_ripple ≥ 0.58.0 (Citus support)
- pg_trickle ≥ 0.33.0 (distributed CDC + stream tables)
- Citus 12.x on all nodes
- pg_ripple.citus_sharding_enabled = on
- pg_ripple.citus_trickle_compat = on (sets colocate_with = 'none' on VP tables, avoiding cross-shard tombstone deletes during CDC apply)
Performance Considerations
- dblink polling adds round-trip latency per worker per tick. On a loopback network, throughput exceeds 50 k rows/s (see benches/bench_remote_slot_poll). If your workload requires higher throughput, consider batching slot polls or increasing the scheduler poll interval.
- For large distributed stream tables, co-locating the output with the source shards (output_distribution_column) avoids data movement during apply.
See Also
- SQL Reference — create_stream_table
- Architecture — Citus distributed CDC
- Monitoring — citus_status view
- CHANGELOG — v0.32.0 (stable naming and frontier foundations)
- CHANGELOG — v0.33.0 (distributed CDC and stream tables)
- CHANGELOG — v0.34.0 (automated distributed CDC scheduler & shard rebalance auto-recovery)
Prometheus & Grafana Monitoring
pg_trickle ships with a complete observability stack based on
postgres_exporter, Prometheus, and Grafana. The monitoring/
directory in the repository contains everything you need.
Quick Start
cd monitoring/
docker compose up -d
Open Grafana at http://localhost:3000 (default: admin / admin).
The pg_trickle Overview dashboard is pre-provisioned.
Architecture
PostgreSQL + pg_trickle
│
│ custom SQL queries
▼
postgres_exporter (:9187)
│
│ /metrics (Prometheus format)
▼
Prometheus (:9090)
│
│ data source
▼
Grafana (:3000)
postgres_exporter runs custom SQL queries defined in
prometheus/pg_trickle_queries.yml against the pg_trickle monitoring views
(pgtrickle.stream_tables_info, pgtrickle.pg_stat_stream_tables, etc.)
and exposes them as Prometheus metrics.
Connecting to an Existing Database
If you already have PostgreSQL + pg_trickle running, configure the exporter to point at your instance:
export PG_HOST=your-pg-host
export PG_PORT=5432
export PG_USER=postgres
export PG_PASSWORD=yourpassword
export PG_DATABASE=yourdb
docker compose up -d
Or edit the DATA_SOURCE_NAME in docker-compose.yml directly.
Metrics Exposed
All metrics are prefixed pg_trickle_.
| Metric | Type | Description |
|---|---|---|
| pg_trickle_stream_tables_total | gauge | Total stream tables by status |
| pg_trickle_stale_tables_total | gauge | Tables with data older than schedule |
| pg_trickle_consecutive_errors | gauge | Per-table consecutive error count |
| pg_trickle_refresh_duration_ms | gauge | Average refresh duration (ms) |
| pg_trickle_total_refreshes | counter | Total refresh count per table |
| pg_trickle_failed_refreshes | counter | Failed refresh count per table |
| pg_trickle_rows_inserted_total | counter | Rows inserted per table |
| pg_trickle_rows_deleted_total | counter | Rows deleted per table |
| pg_trickle_staleness_seconds | gauge | Seconds since last successful refresh |
| pg_trickle_cdc_pending_rows | gauge | Pending rows in CDC change buffer |
| pg_trickle_cdc_buffer_bytes | gauge | CDC change buffer size in bytes |
| pg_trickle_scheduler_running | gauge | 1 if scheduler background worker is alive |
| pg_trickle_health_status | gauge | Overall health: 0=OK, 1=WARNING, 2=CRITICAL |
Pre-configured Alerts
Alerting rules are defined in prometheus/alerts.yml:
| Alert | Condition | Severity |
|---|---|---|
| PgTrickleTableStale | Staleness > 5 min past schedule | warning |
| PgTrickleConsecutiveErrors | ≥ 3 consecutive refresh failures | warning |
| PgTrickleTableSuspended | Any table in SUSPENDED status | critical |
| PgTrickleCdcBufferLarge | CDC buffer > 1 GB | warning |
| PgTrickleSchedulerDown | Scheduler not running for > 2 min | critical |
| PgTrickleHighRefreshDuration | Avg refresh > 30 s | warning |
NOTIFY-Based Alerting
In addition to Prometheus alerts, pg_trickle emits real-time PostgreSQL
NOTIFY events on the pg_trickle_alert channel:
LISTEN pg_trickle_alert;
Events include stale_data, auto_suspended, reinitialize_needed,
buffer_growth_warning, fuse_blown, refresh_completed, and
refresh_failed. Each notification carries a JSON payload with the stream
table name and relevant details.
You can bridge NOTIFY events to external alerting systems (PagerDuty, Slack,
etc.) using tools like pgnotify or a
simple LISTEN loop in your application.
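Once a listener receives a notification, routing it to the right destination is just a matter of inspecting the JSON payload's event name. The sketch below classifies payloads for an external hook; the severity mapping is an illustrative policy choice (not something pg_trickle prescribes), and only the event names and the "event" payload key come from the documentation above:

```python
import json

# Event names documented for the pg_trickle_alert channel.
# The page/chat split below is an example policy, not a pg_trickle default.
CRITICAL_EVENTS = {"auto_suspended", "fuse_blown", "reinitialize_needed"}
WARNING_EVENTS = {"stale_data", "buffer_growth_warning", "refresh_failed"}

def route_alert(payload: str) -> tuple[str, dict]:
    """Classify a pg_trickle_alert NOTIFY payload for an external alerting hook."""
    data = json.loads(payload)
    event = data.get("event", "")
    if event in CRITICAL_EVENTS:
        return ("page", data)   # e.g. forward to PagerDuty
    if event in WARNING_EVENTS:
        return ("chat", data)   # e.g. post to a Slack channel
    return ("ignore", data)     # informational events like refresh_completed
```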
Grafana Dashboard
The pre-provisioned pg_trickle Overview dashboard
(grafana/dashboards/pg_trickle_overview.json) includes panels for:
- Stream table status distribution (active / suspended / error)
- Refresh rate and duration over time
- Staleness heatmap
- CDC buffer sizes
- Consecutive error counts
- Scheduler uptime
Built-in SQL Monitoring Views
pg_trickle also provides built-in monitoring accessible without Prometheus:
-- Quick health overview (returns warnings and errors)
SELECT * FROM pgtrickle.health_check() WHERE severity != 'OK';
-- Stream table status and staleness
SELECT name, status, refresh_mode, staleness
FROM pgtrickle.stream_tables_info;
-- Detailed refresh statistics
SELECT * FROM pgtrickle.pg_stat_stream_tables;
-- CDC health per source table
SELECT * FROM pgtrickle.check_cdc_health();
-- Change buffer sizes
SELECT * FROM pgtrickle.change_buffer_sizes()
ORDER BY pending_rows DESC;
See the SQL Reference for the complete list of monitoring functions.
Files Reference
| File | Purpose |
|---|---|
| monitoring/docker-compose.yml | Demo stack: PG + exporter + Prometheus + Grafana |
| monitoring/prometheus/prometheus.yml | Prometheus scrape configuration |
| monitoring/prometheus/pg_trickle_queries.yml | Custom SQL queries for postgres_exporter |
| monitoring/prometheus/alerts.yml | Alerting rules |
| monitoring/grafana/provisioning/ | Auto-provisioned data source + dashboard |
| monitoring/grafana/dashboards/pg_trickle_overview.json | Overview dashboard |
Requirements
- Docker 24+ with Compose v2
- pg_trickle 0.10.0+ installed in the target database
- PostgreSQL user with SELECT on the pgtrickle.* schema objects
PgBouncer & Connection Poolers
pg_trickle's background scheduler uses session-level PostgreSQL features. This page explains how to configure pg_trickle alongside connection poolers like PgBouncer, Supavisor (Supabase), and PgCat.
Compatibility Matrix
| Pooling Mode | Compatible? | Notes |
|---|---|---|
| Session mode (pool_mode = session) | ✅ Fully | All features work. |
| Direct connection (no pooler for scheduler) | ✅ Fully | Application queries can still go through a pooler. |
| Transaction mode (pool_mode = transaction) | ❌ Not supported | Advisory locks, prepared statements, and LISTEN/NOTIFY are session-scoped. |
| Statement mode (pool_mode = statement) | ❌ Not supported | Same session-scoped limitations. |
Why Transaction Mode Breaks
The pg_trickle scheduler relies on three session-level features:
| Feature | Problem in Transaction Mode |
|---|---|
| pg_advisory_lock() | Session lock released when connection returns to pool — concurrent refreshes become possible |
| PREPARE / EXECUTE | Prepared statements vanish on connection hop — "prepared statement does not exist" errors |
| LISTEN / NOTIFY | Listener loses notifications when assigned a different backend connection |
Recommended Setup
Route the pg_trickle background worker through a direct connection while keeping application traffic on the pooler:
┌─────────────────┐ ┌──────────────┐
│ Application │────▶│ PgBouncer │──┐
│ (transaction │ │ (txn mode) │ │
│ mode OK) │ └──────────────┘ │
└─────────────────┘ │
▼
┌─────────────────┐ ┌─────────────┐
│ pg_trickle │───────────────▶│ PostgreSQL │
│ scheduler │ direct conn │ │
│ (session mode) │ └─────────────┘
└─────────────────┘
The scheduler connects directly to PostgreSQL as a background worker — it does not go through the pooler at all. No special configuration is needed for this; the scheduler always uses an internal SPI connection.
The pooler only matters for application queries that read from stream
tables or call pg_trickle functions (e.g., refresh_stream_table()).
Platform-Specific Notes
Supabase
Supabase uses Supavisor in transaction mode by default. pg_trickle's
scheduler works because it runs as a background worker (bypasses the
pooler). Application queries against stream tables work normally through
the pooler since they are regular SELECT statements.
If you call pgtrickle.refresh_stream_table() from application code,
use the direct connection string (port 5432) rather than the pooled
connection (port 6543).
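In application code, this routing decision can be centralized so management calls never accidentally go through the pooled port. The sketch below is illustrative: the hostnames and credentials are placeholders, only the 5432/6543 port split comes from the text above, and the set of "management" functions is an assumption based on the functions this documentation mentions:

```python
# Hypothetical DSNs — replace host and credentials with your project's values.
POOLED_DSN = "postgresql://app@db.example.supabase.co:6543/postgres"  # Supavisor, txn mode
DIRECT_DSN = "postgresql://app@db.example.supabase.co:5432/postgres"  # direct connection

# pg_trickle management calls need a session-mode (direct) connection;
# plain reads of stream tables are safe through the pooler.
MANAGEMENT_FUNCTIONS = {
    "refresh_stream_table",
    "create_stream_table",
    "alter_stream_table",
    "drop_stream_table",
}

def dsn_for(operation: str) -> str:
    """Route pg_trickle management operations to the direct connection."""
    return DIRECT_DSN if operation in MANAGEMENT_FUNCTIONS else POOLED_DSN
```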
Neon
Neon uses a custom proxy that supports both session and transaction modes. Use the session-mode connection string for any pg_trickle management calls. The scheduler runs as a background worker and is unaffected by the proxy.
AWS RDS Proxy
RDS Proxy only supports transaction-mode pooling. The pg_trickle scheduler runs as a background worker inside the RDS instance and is unaffected. Application queries reading stream tables work normally through the proxy.
Manual refresh_stream_table() calls through the proxy may fail due to
advisory lock issues. Use a direct connection for management operations.
Pooler Compatibility Mode
pg_trickle includes a pooler_compatibility_mode setting (v0.10.0+) that
adjusts internal behavior for environments where the scheduler's SPI
connection may be affected by pooler-like middleware:
-- Usually not needed — the scheduler bypasses external poolers
SHOW pg_trickle.pooler_compatibility_mode;
This GUC is primarily for edge cases in managed PostgreSQL services. For standard deployments, the default setting works correctly.
Flyway & Liquibase Migration Frameworks
pg_trickle stream tables are managed through SQL function calls, not standard
DDL (CREATE TABLE / ALTER TABLE). This page documents patterns for
integrating pg_trickle with Flyway and Liquibase migration frameworks.
Key Principle
Stream tables are created and managed via pgtrickle.create_stream_table(),
pgtrickle.alter_stream_table(), and pgtrickle.drop_stream_table(). These
are regular SQL function calls that can be embedded in any migration script.
CDC triggers are automatically installed on source tables during stream table creation — no manual trigger management is needed.
Flyway
Creating Stream Tables in Migrations
Place stream table definitions in versioned migration files alongside your regular schema changes:
-- V3__create_order_stream_tables.sql
-- 1. Create the source tables first (standard DDL)
CREATE TABLE IF NOT EXISTS orders (
id SERIAL PRIMARY KEY,
region TEXT NOT NULL,
amount NUMERIC(10,2) NOT NULL,
created_at TIMESTAMPTZ DEFAULT now()
);
-- 2. Create stream tables via pg_trickle API
SELECT pgtrickle.create_stream_table(
'order_totals',
$$SELECT region, COUNT(*) AS order_count, SUM(amount) AS total
FROM orders GROUP BY region$$,
schedule => '5s',
refresh_mode => 'DIFFERENTIAL'
);
Altering Stream Tables
Use pgtrickle.alter_stream_table() in a new migration:
-- V5__update_order_totals_schedule.sql
SELECT pgtrickle.alter_stream_table(
'order_totals',
schedule => '10s'
);
Altering the Defining Query
Use alter_query to change the SQL without dropping and recreating:
-- V7__add_avg_to_order_totals.sql
SELECT pgtrickle.alter_stream_table(
'order_totals',
alter_query => $$SELECT region,
COUNT(*) AS order_count,
SUM(amount) AS total,
AVG(amount) AS avg_amount
FROM orders GROUP BY region$$
);
Dropping Stream Tables
-- V9__remove_legacy_stream_tables.sql
SELECT pgtrickle.drop_stream_table('legacy_report');
Bulk Creation
For environments with many stream tables, use bulk_create to create
them atomically:
-- V4__create_all_stream_tables.sql
SELECT pgtrickle.bulk_create('[
{
"name": "order_totals",
"query": "SELECT region, COUNT(*) AS cnt, SUM(amount) AS total FROM orders GROUP BY region",
"schedule": "5s",
"refresh_mode": "DIFFERENTIAL"
},
{
"name": "daily_revenue",
"query": "SELECT date_trunc(''day'', created_at) AS day, SUM(amount) AS revenue FROM orders GROUP BY 1",
"schedule": "30s",
"refresh_mode": "DIFFERENTIAL"
}
]'::jsonb);
Ordering: Source Tables Before Stream Tables
Flyway executes migrations in version order. Ensure source tables are created in an earlier migration than their dependent stream tables:
V1__create_schema.sql -- CREATE TABLE orders, products, ...
V2__create_indexes.sql -- CREATE INDEX ...
V3__create_stream_tables.sql -- SELECT pgtrickle.create_stream_table(...)
Repeatable Migrations
If you want stream table definitions to be re-applied on every Flyway run (for development environments), use repeatable migrations:
-- R__stream_tables.sql
-- Drop and recreate all stream tables
SELECT pgtrickle.drop_stream_table('order_totals')
WHERE EXISTS (
SELECT 1 FROM pgtrickle.pgt_stream_tables
WHERE pgt_name = 'order_totals'
);
SELECT pgtrickle.create_stream_table(
'order_totals',
$$SELECT region, COUNT(*) AS cnt FROM orders GROUP BY region$$,
schedule => '5s',
refresh_mode => 'DIFFERENTIAL'
);
Or use create_or_replace_stream_table for idempotent definitions:
-- R__stream_tables.sql (idempotent)
SELECT pgtrickle.create_or_replace_stream_table(
'order_totals',
$$SELECT region, COUNT(*) AS cnt FROM orders GROUP BY region$$,
schedule => '5s',
refresh_mode => 'DIFFERENTIAL'
);
Handling ALTER TABLE on Source Tables
When a Flyway migration alters a source table (e.g., adding a column), pg_trickle's DDL event trigger detects the change and suspends affected stream tables. After the schema change, stream tables resume automatically on the next refresh cycle.
If the source table change invalidates the stream table's defining query (e.g., removing a referenced column), you must update or drop the stream table in the same or a subsequent migration.
Liquibase
Creating Stream Tables in Changesets
Use Liquibase's <sql> tag to call pg_trickle functions:
<!-- changelog-3.0.xml -->
<changeSet id="create-order-stream-tables" author="dev">
<sql>
SELECT pgtrickle.create_stream_table(
'order_totals',
$pgt$SELECT region, COUNT(*) AS order_count, SUM(amount) AS total
FROM orders GROUP BY region$pgt$,
schedule => '5s',
refresh_mode => 'DIFFERENTIAL'
);
</sql>
<rollback>
<sql>SELECT pgtrickle.drop_stream_table('order_totals');</sql>
</rollback>
</changeSet>
Rollback Support
Always include <rollback> blocks that drop the stream table:
<changeSet id="add-daily-revenue-st" author="dev">
<sql>
SELECT pgtrickle.create_stream_table(
'daily_revenue',
$pgt$SELECT date_trunc('day', created_at) AS day,
SUM(amount) AS revenue
FROM orders GROUP BY 1$pgt$,
schedule => '30s',
refresh_mode => 'DIFFERENTIAL'
);
</sql>
<rollback>
<sql>SELECT pgtrickle.drop_stream_table('daily_revenue');</sql>
</rollback>
</changeSet>
Altering Stream Tables
<changeSet id="update-order-totals-schedule" author="dev">
<sql>
SELECT pgtrickle.alter_stream_table(
'order_totals',
schedule => '10s'
);
</sql>
<rollback>
<sql>
SELECT pgtrickle.alter_stream_table(
'order_totals',
schedule => '5s'
);
</sql>
</rollback>
</changeSet>
Preconditions
Use Liquibase preconditions to check whether pg_trickle is available:
<changeSet id="create-stream-tables" author="dev">
<preConditions onFail="MARK_RAN">
<sqlCheck expectedResult="1">
SELECT COUNT(*) FROM pg_extension WHERE extname = 'pg_trickle'
</sqlCheck>
</preConditions>
<sql>
SELECT pgtrickle.create_stream_table(...);
</sql>
</changeSet>
Common Patterns
Environment-Specific Schedules
Use different schedules for development vs. production:
-- Use a function to parameterize schedules
SELECT pgtrickle.create_stream_table(
'order_totals',
$$SELECT region, COUNT(*) AS cnt FROM orders GROUP BY region$$,
schedule => CASE
WHEN current_setting('pg_trickle.enabled', true) = 'on'
THEN '5s'
ELSE '1m'
END,
refresh_mode => 'DIFFERENTIAL'
);
CI/Test Environments
In CI, set pg_trickle.enabled = off in postgresql.conf to prevent the
background scheduler from running during schema migrations. Stream tables
will still be created correctly — they just won't auto-refresh until the
scheduler is enabled.
Extension Dependency
Ensure CREATE EXTENSION pg_trickle runs before any stream table migration.
In Flyway, use an early versioned migration:
-- V0__extensions.sql
CREATE EXTENSION IF NOT EXISTS pg_trickle;
In Liquibase:
<changeSet id="install-extensions" author="dev" runOnChange="true">
<sql>CREATE EXTENSION IF NOT EXISTS pg_trickle;</sql>
</changeSet>
Further Reading
- SQL Reference — Complete function reference
- Configuration — GUC variables for schedule tuning
- Getting Started — First stream table walkthrough
ORM Integration Guides
pg_trickle stream tables are read-only materialized views that refresh automatically. This page documents how to use stream tables from popular Python ORMs — SQLAlchemy and Django ORM.
Key Principles
- Stream tables are read-only. All writes go to the source tables; pg_trickle refreshes stream tables in the background.
- Model stream tables as views, not regular tables. ORMs should never attempt INSERT, UPDATE, or DELETE on a stream table.
- Internal columns are hidden. The __pgt_row_id column used for incremental maintenance is excluded from SELECT * queries.
SQLAlchemy
Read-Only Model Definition
Map a stream table as a read-only model using __table_args__:
from sqlalchemy import Column, Integer, Numeric, String, BigInteger
from sqlalchemy.orm import DeclarativeBase
class Base(DeclarativeBase):
pass
class OrderTotals(Base):
"""Read-only model backed by pg_trickle stream table."""
__tablename__ = "order_totals"
# Map the stream table's row ID as primary key for ORM identity
pgt_row_id = Column("__pgt_row_id", BigInteger, primary_key=True)  # avoid Python name mangling of __-prefixed class attributes
region = Column(String, nullable=False)
order_count = Column(BigInteger, nullable=False)
total = Column(Numeric(10, 2), nullable=False)
__table_args__ = {
"info": {"readonly": True}, # Convention marker
}
Querying
Query stream tables like any other SQLAlchemy model:
from sqlalchemy import select
# All regions
stmt = select(OrderTotals).order_by(OrderTotals.total.desc())
results = session.execute(stmt).scalars().all()
# Filtered
stmt = (
select(OrderTotals)
.where(OrderTotals.order_count > 10)
.where(OrderTotals.region == "East")
)
row = session.execute(stmt).scalar_one_or_none()
Preventing Accidental Writes
Use SQLAlchemy events to block write operations:
from sqlalchemy import event
from sqlalchemy.orm import Session
READONLY_TABLES = {"order_totals", "daily_revenue", "customer_stats"}
@event.listens_for(Session, "before_flush")
def block_stream_table_writes(session, flush_context, instances):
for obj in session.new | session.dirty | session.deleted:
table_name = obj.__class__.__tablename__
if table_name in READONLY_TABLES:
raise RuntimeError(
f"Cannot write to stream table '{table_name}'. "
f"Write to the source table instead."
)
Reflecting Stream Tables
If you prefer reflection over explicit models:
from sqlalchemy import MetaData, Table, create_engine
engine = create_engine("postgresql://...")
metadata = MetaData()
# Reflect the stream table (treated as a regular table by PostgreSQL)
order_totals = Table("order_totals", metadata, autoload_with=engine)
# Query
with engine.connect() as conn:
result = conn.execute(order_totals.select().limit(10))
for row in result:
print(row)
Checking Freshness
Query the stream table's metadata to check when it was last refreshed:
from sqlalchemy import text
def get_staleness(session, st_name: str) -> dict:
"""Return freshness info for a stream table."""
result = session.execute(
text("SELECT * FROM pgtrickle.get_staleness(:name)"),
{"name": st_name},
).mappings().one()
return dict(result)
# Usage
staleness = get_staleness(session, "order_totals")
print(f"Last refresh: {staleness['data_timestamp']}")
print(f"Stale for: {staleness['staleness_seconds']}s")
Async SQLAlchemy (2.0+)
Works identically with async_session:
from sqlalchemy.ext.asyncio import AsyncSession
async def get_top_regions(session: AsyncSession, limit: int = 10):
stmt = (
select(OrderTotals)
.order_by(OrderTotals.total.desc())
.limit(limit)
)
result = await session.execute(stmt)
return result.scalars().all()
Django ORM
Read-Only Model Definition
Use managed = False so Django never creates, alters, or drops the table:
# models.py
from django.db import models
class OrderTotals(models.Model):
"""Read-only model backed by pg_trickle stream table."""
region = models.CharField(max_length=255)
order_count = models.BigIntegerField()
total = models.DecimalField(max_digits=10, decimal_places=2)
class Meta:
managed = False # Django will not create/alter this table
db_table = "order_totals"
def save(self, *args, **kwargs):
raise NotImplementedError("Stream tables are read-only")
def delete(self, *args, **kwargs):
raise NotImplementedError("Stream tables are read-only")
Querying
Standard Django QuerySet operations work:
# All regions sorted by total
OrderTotals.objects.all().order_by("-total")
# Filtered
OrderTotals.objects.filter(
order_count__gt=10,
region="East"
).first()
# Aggregation (on the stream table itself)
from django.db.models import Sum, Avg
OrderTotals.objects.aggregate(
total_revenue=Sum("total"),
avg_orders=Avg("order_count"),
)
Django Migrations
Since managed = False, Django migrations won't touch stream tables.
Create stream tables in a custom migration using RunSQL:
# migrations/0003_create_stream_tables.py
from django.db import migrations
class Migration(migrations.Migration):
dependencies = [
("myapp", "0002_create_orders_table"),
]
operations = [
migrations.RunSQL(
sql="""
SELECT pgtrickle.create_stream_table(
'order_totals',
$pgt$SELECT region,
COUNT(*) AS order_count,
SUM(amount) AS total
FROM orders GROUP BY region$pgt$,
schedule => '5s',
refresh_mode => 'DIFFERENTIAL'
);
""",
reverse_sql="""
SELECT pgtrickle.drop_stream_table('order_totals');
""",
),
]
Read-Only Mixin
Create a reusable mixin for all stream table models:
class StreamTableMixin(models.Model):
"""Base class for pg_trickle stream table models."""
class Meta:
abstract = True
managed = False
def save(self, *args, **kwargs):
raise NotImplementedError(
f"{self.__class__.__name__} is a read-only stream table. "
f"Write to the source table instead."
)
def delete(self, *args, **kwargs):
raise NotImplementedError(
f"{self.__class__.__name__} is a read-only stream table."
)
# Usage
class OrderTotals(StreamTableMixin):
region = models.CharField(max_length=255)
order_count = models.BigIntegerField()
total = models.DecimalField(max_digits=10, decimal_places=2)
class Meta(StreamTableMixin.Meta):
db_table = "order_totals"
class DailyRevenue(StreamTableMixin):
day = models.DateField()
revenue = models.DecimalField(max_digits=12, decimal_places=2)
class Meta(StreamTableMixin.Meta):
db_table = "daily_revenue"
Checking Freshness
Use raw SQL to query pg_trickle diagnostics:
from django.db import connection
def get_staleness(st_name: str) -> dict:
"""Return freshness info for a stream table."""
with connection.cursor() as cursor:
cursor.execute(
"SELECT * FROM pgtrickle.get_staleness(%s)", [st_name]
)
columns = [col.name for col in cursor.description]
row = cursor.fetchone()
return dict(zip(columns, row)) if row else {}
Django REST Framework
Stream table models work with DRF serializers and viewsets:
from rest_framework import serializers, viewsets
class OrderTotalsSerializer(serializers.ModelSerializer):
class Meta:
model = OrderTotals
fields = ["region", "order_count", "total"]
class OrderTotalsViewSet(viewsets.ReadOnlyModelViewSet):
"""Read-only API endpoint for order totals stream table."""
queryset = OrderTotals.objects.all()
serializer_class = OrderTotalsSerializer
Common Patterns
Write to Source, Read from Stream
The fundamental pattern: all writes go to source tables (normal ORM models), reads come from stream tables (read-only models).
# Write to source table (normal ORM)
order = Order(region="East", amount=Decimal("99.99"))
session.add(order)
session.commit()
# Read from stream table (auto-refreshed by pg_trickle)
totals = session.execute(
select(OrderTotals).where(OrderTotals.region == "East")
).scalar_one()
print(f"East: {totals.order_count} orders, ${totals.total}")
Handling Eventual Consistency
Stream tables refresh on a schedule (e.g., every 5 seconds). After writing to a source table, the stream table may be briefly stale. Options:
- Accept staleness — suitable for dashboards and reports.
- Force refresh — call pgtrickle.refresh_stream_table() after critical writes.
- Use IMMEDIATE mode — stream table refreshes within the same transaction.
# Option 2: Force refresh after a critical write
session.execute(text(
"SELECT pgtrickle.refresh_stream_table('order_totals')"
))
Further Reading
- SQL Reference — Complete function reference
- Configuration — Schedule tuning and refresh modes
- Getting Started — First stream table walkthrough
- dbt Integration — Using pg_trickle with dbt
Multi-tenant Deployment Guide
This guide covers recommended deployment patterns for running pg_trickle across multiple PostgreSQL databases on the same instance, including worker quota allocation, per-database observability, and Grafana dashboard configuration.
Architecture Overview
In a multi-tenant setup, each PostgreSQL database gets its own pg_trickle
background worker scheduler. All schedulers share a single worker pool via
PostgreSQL shared memory (ACTIVE_REFRESH_WORKERS counter). The total number
of concurrent refresh workers is bounded by pg_trickle.max_dynamic_refresh_workers.
┌─────────────────────────────────────────────┐
│ PostgreSQL instance │
│ │
│ ┌──────────────┐ ┌──────────────┐ │
│ │ tenant_a DB │ │ tenant_b DB │ ... │
│ │ scheduler │ │ scheduler │ │
│ └──────┬───────┘ └──────┬───────┘ │
│ │ │ │
│ └────────┬────────┘ │
│ ▼ │
│ Shared worker pool (shmem) │
│ ACTIVE_REFRESH_WORKERS atomic │
└─────────────────────────────────────────────┘
Worker Quota Formula
When running N databases on the same instance, the recommended per-database
worker quota is:
per_db_quota = ceil(max_dynamic_refresh_workers / N_databases)
For example, with pg_trickle.max_dynamic_refresh_workers = 8 and 4 databases:
per_db_quota = ceil(8 / 4) = 2 workers per database
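The formula above can be sketched as a small helper for capacity planning (for example, in a provisioning script). This is a straightforward restatement of the ceiling division, not a pg_trickle API:

```python
import math

def per_db_quota(max_workers: int, n_databases: int) -> int:
    """Recommended per-database refresh-worker quota: ceil(max / N).

    max_workers corresponds to pg_trickle.max_dynamic_refresh_workers;
    n_databases is the number of databases running a scheduler.
    """
    if n_databases < 1:
        raise ValueError("n_databases must be >= 1")
    return math.ceil(max_workers / n_databases)
```

Note that with numbers that do not divide evenly (e.g. 8 workers across 3 databases), the per-database quotas sum to more than the global limit; the shared-memory counter still enforces the cluster-wide cap.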
Set this in postgresql.conf or in each database's ALTER DATABASE SET:
-- Global limit (applies to all databases)
ALTER SYSTEM SET pg_trickle.max_dynamic_refresh_workers = 8;
-- Per-database override (optional, for high-priority tenants)
\c tenant_a
ALTER DATABASE tenant_a SET pg_trickle.max_dynamic_refresh_workers = 4;
Monitoring with cluster_worker_summary()
The pgtrickle.cluster_worker_summary() function returns a real-time view of
worker allocation across all databases visible from the current connection:
SELECT * FROM pgtrickle.cluster_worker_summary();
Example output:
| db_oid | db_name | active_workers | scheduler_pid | scheduler_running | total_active_workers |
|---|---|---|---|---|---|
| 16384 | tenant_a | 2 | 12345 | true | 5 |
| 16385 | tenant_b | 1 | 12346 | true | 5 |
| 16386 | tenant_c | 2 | 12347 | true | 5 |
The total_active_workers column shows the cluster-wide total from shared memory.
Per-Database Prometheus Labels (CLUS-2)
All pg_trickle metrics emitted by the /metrics endpoint include db_oid and
db_name labels from v0.27.0 onwards. This enables per-database Grafana panels
and alerting rules without requiring separate Prometheus scrape targets.
Example metric with labels:
pg_trickle_refreshes_total{schema="public",name="orders_agg",db_oid="16384",db_name="tenant_a"} 1247
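For ad-hoc tooling that consumes the exporter output directly (outside Prometheus), a labelled exposition line like the one above can be split into its name, labels, and value. This is a minimal sketch for simple labelled samples, not a full parser for the Prometheus text format (it ignores comments, escapes, and timestamps):

```python
import re

# One labelled sample per line: name{label="value",...} value
METRIC_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')

def parse_metric(line: str) -> tuple[str, dict, float]:
    """Split one Prometheus exposition line into (name, labels, value)."""
    m = METRIC_RE.match(line.strip())
    if not m:
        raise ValueError(f"not a labelled metric line: {line!r}")
    labels = dict(LABEL_RE.findall(m.group("labels")))
    return m.group("name"), labels, float(m.group("value"))
```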
Configuring Prometheus scrape targets
In a multi-tenant setup, configure one scrape job per database, each pointing to its own scheduler's metrics port:
scrape_configs:
- job_name: 'pg_trickle_tenant_a'
static_configs:
- targets: ['localhost:9101']
labels:
instance: 'pg-primary'
- job_name: 'pg_trickle_tenant_b'
static_configs:
- targets: ['localhost:9102']
labels:
instance: 'pg-primary'
Configure each database's metrics port:
\c tenant_a
ALTER DATABASE tenant_a SET pg_trickle.metrics_port = 9101;
\c tenant_b
ALTER DATABASE tenant_b SET pg_trickle.metrics_port = 9102;
Grafana Dashboard Snippets
Per-tenant refresh rate panel
rate(pg_trickle_refreshes_total{db_name=~"$tenant"}[5m])
Variable $tenant should be a Grafana template variable sourcing from:
label_values(pg_trickle_refreshes_total, db_name)
Cluster-wide worker utilisation
sum(pg_trickle_active_workers) by (db_name)
/ scalar(pg_trickle_max_concurrent_refreshes)
Refresh failure rate heatmap
sum by (db_name, le) (
rate(pg_trickle_refresh_failures_total{db_name=~"$tenant"}[1h])
)
SLA breach prediction alerts (PLAN-3)
pg_trickle emits predicted SLA breach events as NOTIFY on the pg_trickle_alert channel; bridge the payload into your alerting system to fire an alert:
-- pg_trickle emits NOTIFY pg_trickle_alert with JSON payload.
-- Parse in your alerting system:
-- {"event":"predicted_sla_breach","pgt_schema":"...","pgt_name":"...",
-- "predicted_ms":...,"sla_ms":...,"pct_over":...}
LISTEN pg_trickle_alert;
See Also
- docs/SCALING.md — cluster-wide fairness and worker budgeting
- docs/CONFIGURATION.md — GUC reference
- docs/SQL_REFERENCE.md — cluster_worker_summary() and metrics_summary() API
dbt Hub Submission Guide
This document describes how to publish dbt-pgtrickle to
dbt Hub so users can install it with a simple
package name instead of a git URL.
Background
dbt Hub is a package registry maintained by dbt Labs. Packages are indexed
by the hubcap automation, which runs
hourly and scans listed GitHub repositories for new tagged releases containing
a dbt_project.yml at the repository root.
Current Status
dbt-pgtrickle lives in the dbt-pgtrickle/ subdirectory of the
trickle-labs/pg-trickle monorepo. Because
hubcap expects dbt_project.yml at the repository root, a monorepo layout
requires one of the approaches below.
Submission Approaches
Option A: Separate Repository (recommended)
Create a standalone repository (e.g., grove/dbt-pgtrickle) that mirrors the
dbt-pgtrickle/ directory. This is the standard pattern used by most Hub
packages (Fivetran, Snowplow, etc.).
- Create the grove/dbt-pgtrickle repository on GitHub.
- Copy (or subtree-split) the dbt-pgtrickle/ contents into the repo root.
- Tag a release matching the version in dbt_project.yml (e.g., v0.15.0).
- Submit a PR to dbt-labs/hubcap adding "grove": ["dbt-pgtrickle"] to hub.json.
- Once merged, hubcap will automatically index new tags and publish versions.
After listing, users install with:
packages:
- package: grove/dbt_pgtrickle
version: [">=0.15.0", "<1.0.0"]
Option B: Keep Monorepo, Git Install Only
Continue using the git-based install with subdirectory:. This is fully
functional but requires users to specify a git URL and revision:
packages:
- git: "https://github.com/trickle-labs/pg-trickle.git"
revision: v0.15.0
subdirectory: "dbt-pgtrickle"
Submission Checklist
- dbt_project.yml has name, version, config-version, require-dbt-version
- dbt_project.yml version synced with pg_trickle release (0.15.0)
- README.md documents both git and Hub installation methods
- Macros are in macros/ directory
- Tests are in tests/ directory
- Package has been tested with dbt deps && dbt run && dbt test
- Separate grove/dbt-pgtrickle repository created (if using Option A)
- Tagged release published on the standalone repo
- PR submitted to dbt-labs/hubcap adding "grove": ["dbt-pgtrickle"]
- Hub listing verified at https://hub.getdbt.com/grove/dbt_pgtrickle/latest
Hub.json Entry Format
The PR to hubcap adds an entry to hub.json:
{
"grove": [
"dbt-pgtrickle"
]
}
The key is the GitHub organization name (grove), and the value is an array of
repository names. Hubcap will scan grove/dbt-pgtrickle for tags matching
semantic versioning and index each version automatically.
Version Syncing
The dbt_project.yml version should track the pg_trickle extension version to
avoid confusion. When releasing a new pg_trickle version:
- Update dbt-pgtrickle/dbt_project.yml version.
- If using a separate repo, sync the changes and tag a new release.
- Hubcap will pick up the new tag within ~1 hour.
pg_trickle Blog
Note: This blog directory is an experiment. All posts were generated with AI assistance (GitHub Copilot / Claude) as a way to explore how well LLM-generated technical writing holds up for a niche systems engineering topic. The technical content has been reviewed for accuracy, but treat the posts as drafts, not as officially reviewed documentation. The blog is informational: a showcase of use cases in the context of pg_trickle rather than a definitive reference. For authoritative material, see the pg_trickle documentation.
Posts
Core Concepts & Theory
| Post | Summary |
|---|---|
| Why Your Materialized Views Are Always Stale | Explains why REFRESH MATERIALIZED VIEW fails at scale — locking, cost, and the full-scan ceiling — and how switching to a stream table with DIFFERENTIAL mode fixes staleness in 5 lines of SQL. |
| Differential Dataflow for the Rest of Us | A plain-language walkthrough of the mathematics behind incremental view maintenance: delta rules for filters, joins, aggregates, the MERGE application step, and why some aggregates (MEDIAN, RANK) can't be made incremental. |
| Incremental Aggregates in PostgreSQL: No ETL Required | How SUM, COUNT, AVG, and (in v0.37) vector_avg are maintained as running algebraic state rather than full scans. Covers multi-table aggregates, conditional aggregates, and the non-differentiable cases. |
| The Z-Set: The Data Structure That Makes IVM Correct | A concrete tour of the integer-weighted multiset that underlies pg_trickle's differential engine — how inserts are +1, deletes are -1, updates are both, and why commutativity eliminates an entire class of ordering bugs. |
| The Cost Model: How pg_trickle Decides Whether to Refresh Differentially | Inside AUTO mode: the decision inputs (delta ratio, query complexity, historical timings), the learned cost model, and when the engine switches between DIFFERENTIAL and FULL refresh mid-flight. |
SQL Operator Deep Dives
| Post | Summary |
|---|---|
| Recursive CTEs That Update Themselves | Semi-naive evaluation for insert-only tables and Delete-and-Rederive for mixed DML — how pg_trickle maintains WITH RECURSIVE queries incrementally for org charts, BOMs, and graph reachability. |
| Window Functions Without the Full Recompute | Partition-scoped recomputation for ROW_NUMBER, RANK, LAG, LEAD, and all standard window functions. Change one partition, leave the rest untouched. |
| GROUPING SETS, ROLLUP, and CUBE — Incrementally | Multi-dimensional aggregation decomposed into UNION ALL branches, each maintained with algebraic delta rules. Drill-down dashboards that refresh in milliseconds. |
| EXISTS and NOT EXISTS: The Delta Rules Nobody Talks About | Semi-joins and anti-joins maintained via reference counting on the join key. Delta-key pre-filtering, inverted semantics for NOT EXISTS, SubLink extraction from WHERE clauses. |
| DISTINCT That Doesn't Recount | Reference counting (__pgt_dup_count) for incremental deduplication. Insert increments, delete decrements, row removed when count hits zero. DISTINCT ON with tie-breaking. |
| Scalar Subqueries in the SELECT List — Incrementally | Pre/post snapshot diff for correlated subqueries. Only groups affected by the delta are re-evaluated — O(affected groups), not O(all rows). |
| LATERAL Joins in a Stream Table | Row-scoped re-execution for JSON_TABLE, unnest(), generate_series(), and correlated set-returning functions. Cost proportional to changed left-side rows. |
| Set Operations Done Right: UNION, INTERSECT, EXCEPT | Dual-count multiplicity tracking for all set operations. UNION uses reference counting, INTERSECT requires both-side presence, EXCEPT removes when the right side gains a match. |
Refresh Modes & Scheduling
| Post | Summary |
|---|---|
| IMMEDIATE Mode: When "Good Enough Freshness" Isn't Good Enough | Synchronous IVM inside the source transaction — zero lag, no background worker. Account balances, inventory tracking, and the trade-offs vs. DIFFERENTIAL mode. |
| How pg_trickle Handles Diamond Dependencies | When two branches of a DAG share a source and converge downstream, naively refreshing can cause double-counting. How the frontier tracker and diamond-group scheduling ensure correctness. |
| Temporal Stream Tables: Time-Windowed Views That Update Themselves | The "last 7 days" problem — results that change because time passes, not because data changed. Sliding-window eviction, the temporal_mode parameter, and when fixed windows don't need it. |
| Declare Freshness Once: CALCULATED Scheduling | Upstream tables derive their refresh cadence from downstream consumers. Set the SLA on the dashboard; the pipeline adjusts automatically. |
| Cycles in Your Dependency Graph? That's Fine. | Fixed-point iteration for monotone queries. allow_circular = on, SCC detection, convergence guarantees, the iteration limit, and when cycles are a legitimate design choice. |
| Hot, Warm, Cold, Frozen: Tiered Scheduling at Scale | Automatic tier classification by change frequency. The scheduler checks hot tables every cycle, frozen tables every ~60 cycles — 80%+ overhead reduction at 500+ stream tables. |
CDC & Change Tracking
| Post | Summary |
|---|---|
| The CDC Mode You Never Have to Choose | Hybrid CDC starts with triggers, silently graduates to WAL. Three-step transition orchestration, automatic fallback on failure, WAL backpressure, and why AUTO is the right default. |
| IVM Without Primary Keys | Content-based hashing (xxHash64) generates synthetic row identity for keyless tables. Multiplicity counting for duplicates, collision probability, and when to add a PK anyway. |
| Foreign Tables as Stream Table Sources | IVM over postgres_fdw, file_fdw, and parquet_fdw sources using polling-based change detection. Mixed local/foreign source queries, performance trade-offs, and the materialize-first optimization. |
Architecture & Data Patterns
| Post | Summary |
|---|---|
| The Medallion Architecture Lives Inside PostgreSQL | Bronze/Silver/Gold without Spark or Airflow. Chained stream tables propagate from raw ingest to business aggregates in under 5 seconds, with DAG-aware scheduling and transactional consistency. |
| CQRS Without a Second Database | Command Query Responsibility Segregation using stream tables as the read model — same PostgreSQL instance, no CDC pipeline, read-your-writes with IMMEDIATE mode. |
| Slowly Changing Dimensions in Real Time | SCD Type 2 (historical attribute tracking with valid_from/valid_to) maintained continuously by a stream table — no nightly ETL, no Airflow DAG. |
| The Append-Only Fast Path | Why insert-only tables (event logs, sensor data, clickstreams) get a 2–3× faster refresh: no delete-side delta, no inverse computation, no before-image lookups. |
Use Cases & Migration
| Post | Summary |
|---|---|
| Real-Time Leaderboards That Don't Lie | Top-N stream tables for games, sales dashboards, and coding challenges — tied scores, multi-category boards, the pagination problem, and why you might not need Redis. |
| The Hidden Cost of Trigger-Based Denormalization | Four failure modes of hand-rolled trigger sync — blind UPDATE divergence, statement vs. row trigger semantics, invisible deletes, and multi-row races — and how declarative IVM avoids all of them. |
| How We Replaced a Celery Pipeline with 3 SQL Statements | A before/after case study of a Celery + Elasticsearch product search pipeline across three generations of growing complexity, and the pg_trickle stream table that replaced it. Includes benchmark numbers. |
| Migrating from pg_ivm to pg_trickle | Feature gap table, SQL syntax differences, step-by-step migration procedure, and when staying on pg_ivm is the right call. |
Integrations & Ecosystem
| Post | Summary |
|---|---|
| Streaming to Kafka Without Kafka Expertise | pgtrickle-relay bridges stream table deltas to Kafka, NATS, SQS, and webhooks — a single binary with TOML config, advisory-lock HA, subject routing, and Prometheus metrics. |
| The Relay Deep Dive: NATS, Redis Streams, and RabbitMQ | Beyond Kafka: per-backend architecture for NATS JetStream, Redis Streams, RabbitMQ, SQS, and HTTP webhooks. Subject templates, consumer groups, multi-sink pipelines, and a decision tree for choosing a backend. |
| The Inbox Pattern: Receiving Events from Kafka into PostgreSQL | Idempotent, ordered event ingestion via the inbox table — deduplication by event ID, dead-letter queue, and stream tables that aggregate incoming events incrementally. |
| The Outbox You Don't Have to Build | pg_trickle's built-in outbox API: enable_outbox(), consumer groups, poll_outbox(), offset tracking, exactly-once delivery, consumer lag monitoring, and cleanup. |
| dbt + pg_trickle: The Analytics Engineer's Stack | The pgtrickle dbt materialization: continuously-fresh models that are also version-controlled, tested, and documented. DAG alignment, freshness checks, and mixing materializations. |
| Distributed IVM with Citus | Incremental view maintenance across sharded PostgreSQL: per-worker CDC, shard-aware delta routing, co-located join push-down, and automatic recovery after shard rebalances. |
| pg_trickle on CloudNativePG | Production Kubernetes deployment using the CloudNativePG operator: Dockerfile, Cluster manifest, GUC configuration, HA failover behaviour, Prometheus metrics ConfigMap, alerting rules, upgrade procedure, and sizing guidance. |
| Making pg_trickle Work Through PgBouncer | Connection pooling modes, the background-worker bypass, LISTEN/NOTIFY caveats in transaction mode, and a configuration checklist for PgBouncer + pg_trickle. |
| Publishing Stream Tables via Logical Replication | Stream tables as standard publication sources for downstream PostgreSQL instances. Replication identity, multi-region distribution, and feeding Debezium/Kafka with clean aggregated events. |
| One PostgreSQL, Five Databases, One Worker Pool | Multi-database architecture: one launcher per server, one scheduler per database, shared worker pool with per-database quotas. Failure isolation and the database-per-tenant SaaS pattern. |
pgvector Integration
| Post | Summary |
|---|---|
| Your pgvector Index Is Lying to You | Four silent failure modes of unmanaged pgvector deployments: stale embedding corpora, drifting aggregates, IVFFlat recall loss, and over-fetching. How pg_trickle's differential IVM and drift-aware reindexing closes each gap. |
| Incremental Vector Aggregates: Building Recommendation Engines in Pure SQL | How vector_avg (v0.37) turns user taste vectors, category centroids, and cluster representatives into live algebraic aggregates — O(new interactions) cost, not O(history). Comparison with batch recomputation, feature stores, and application-level updates. |
| Deploying RAG at Scale: pg_trickle as Your Embedding Infrastructure | Production operations for pgvector + pg_trickle: drift-aware HNSW reindexing (reindex_if_drift), vector_status() monitoring, multi-tenant tiered indexing patterns, sparse/half-precision aggregates, reactive distance subscriptions, and the embedding_stream_table() ergonomic API. |
| HNSW Recall Is a Lie: Distribution Drift Explained | Deep dive on IVFFlat centroid staleness and HNSW tombstone accumulation — how to measure drift, what the right threshold is, and how post_refresh_action => 'reindex_if_drift' (v0.38) automates the fix. |
| The pgvector Tooling Landscape in 2026 | Honest comparison of pg_trickle against pgai (archived Feb 2026), pg_vectorize, DIY batch pipelines, and Debezium. Introduces the two-layer model: Layer 1 = embedding generation, Layer 2 = derived-state maintenance. |
| Multi-Tenant Vector Search with Row-Level Security | Zero cross-tenant data leakage using RLS policies on stream tables, tiered tenancy (large / medium / small tenant strategies), per-tenant partial HNSW indexes, and drift-aware reindexing per partition. |
Operations & Observability
| Post | Summary |
|---|---|
| Stop Rebuilding Your Search Index at 3am | How pg_trickle's scheduler, SLA tiers (critical / standard / background), backpressure, and parallel workers let you tune refresh behaviour per workload — and why the 3am maintenance window disappears with continuous incremental refresh. |
| pg_trickle Monitors Itself | Since v0.20, the extension's own health metrics are maintained as stream tables. How self-monitoring works, what it tracks, and the recursion question ("who monitors the monitor?"). |
| How to Change a Stream Table Query Without Taking It Offline | ALTER STREAM TABLE ... QUERY performs online schema evolution — the stream table stays queryable during migration, with atomic swap and cascade-safe dependency checking. |
| Backup and Restore for Stream Tables | pg_dump, PITR, selective restore, and the repair_stream_table procedure. What to do (and what breaks) when you restore a database with active stream tables. |
| Testing Stream Tables: Shadow Mode and Correctness Fuzzing | Shadow mode runs DIFFERENTIAL and FULL refresh in parallel and compares. SQLancer fuzzing generates random schemas and DML to find delta engine bugs. The multiset invariant and what it caught. |
| Snapshots: Time Travel for Stream Tables | snapshot_stream_table() captures point-in-time copies for pre-migration safety, replica bootstrap, forensic comparison, and test fixtures. Restore, list, and clean up with one function call each. |
| Drain Mode: Zero-Downtime Upgrades for Stream Tables | pgtrickle.drain() quiesces in-flight refreshes before maintenance. Safe upgrade workflow, CloudNativePG integration, HA failover, and the resume path. |
| Column-Level Lineage in One Function Call | stream_table_lineage() maps output columns to source columns. Impact analysis before ALTER TABLE, GDPR column-deletion audit, documentation generation, and recursive DAG tracing. |
| Error Budgets for Stream Tables | SRE-style freshness monitoring: sla_summary() with p50/p99 latency, staleness tracking, error budget consumption, alerting thresholds, and Prometheus integration. |
| Structured Logging and OpenTelemetry for Stream Tables | log_format = json emits structured events with cycle_id correlation. Event taxonomy, log aggregator integration (Loki, Datadog, Elasticsearch), and OpenTelemetry compatibility. |
Analytics & Feature Engineering
| Post | Summary |
|---|---|
| Funnel Analysis and Cohort Retention at Scale | Computing conversion funnels, retention matrices, and session aggregates incrementally — keeping product analytics live without billion-row scans. |
| Incremental ML Feature Engineering in PostgreSQL | Replace nightly feature store batch jobs with continuously fresh features: rolling windows, lag features, cross-entity comparisons, all maintained as stream tables. |
| Time-Series Downsampling Without TimescaleDB | Hourly, daily, and monthly rollups maintained incrementally from raw sensor data — cascading stream tables as a lightweight alternative to a dedicated TSDB. |
| Incremental Statistical Aggregates: stddev, Percentiles, and Histograms | Which higher-order statistics (variance, correlation, histograms) can be maintained exactly, which need approximations, and the space-accuracy trade-offs. |
Data Patterns & Domain Applications
| Post | Summary |
|---|---|
| Event Sourcing Read Models Without Replay | Project live read-optimized views from an append-only event store without replaying history — order status, revenue analytics, and inventory projections as stream tables. |
| Soft Deletes and Tombstone Management in Differential IVM | How deleted_at patterns interact with delta propagation, ghost row pitfalls, cascading visibility, and best practices for correct stream tables over soft-deletable data. |
| Compliance and Audit Trails with Append-Only Stream Tables | GDPR-compliant, tamper-evident audit logs: right-to-erasure reconciliation, hash chains, access pattern monitoring, and retention policies — all incrementally maintained. |
| Incremental Full-Text Search with tsvector | Maintain ranked search results incrementally as documents change — tracked queries, faceted counts, and top-K ranking without re-indexing the corpus. |
| Incremental PageRank and Graph Analytics in SQL | Live PageRank, connected components, and shortest-path metrics maintained inside PostgreSQL as stream tables — no graph database required. |
| PostGIS + pg_trickle: Incremental Geospatial Aggregates | Heatmaps, geofencing, spatial clustering, and distance-based aggregation that update in milliseconds as new points arrive. |
Deployment & Multi-Tenancy
| Post | Summary |
|---|---|
| High Availability Failover with pg_trickle and Patroni | How stream table state survives primary switchover, WAL replay semantics for change buffers, split-brain prevention, and zero-data-loss configuration. |
| Parameterized Stream Tables: Building a SQL View Library | Patterns for reusable, tenant-scoped, and versionable stream table definitions: single-table multi-tenant, template functions, schema isolation, and composable building blocks. |
Performance Internals
| Post | Summary |
|---|---|
| The 45ms Cold-Start Tax and How L0 Cache Eliminates It | Connection poolers recycle backends, paying a template-parse penalty. The L0 process-local RwLock<HashMap> cache keyed by (pgt_id, cache_generation) drops p99 from 48ms to 6ms. |
| Spill-to-Disk and the Auto-Fallback Safety Net | When delta queries exceed work_mem, pg_trickle detects consecutive spills and auto-switches to FULL refresh. Tuning merge_work_mem_mb, spill_threshold_blocks, and the self-healing recovery path. |
Benchmarks & Advanced Patterns
| Post | Summary |
|---|---|
| TPC-H at 1GB in 40ms | Reproducible benchmark of differential vs. full refresh across five TPC-H queries (Q1, Q3, Q5, Q6, Q12). Results: 13–22× faster per refresh cycle, with differential lag under 2.5 seconds vs. 186 seconds at 5,000 rows/second sustained write load. |
| From Nexmark to Production: Benchmarking Stream Processing in PostgreSQL | pg_trickle on the Nexmark streaming benchmark: per-query throughput, latency percentiles, and how the numbers compare to Flink, Materialize, and a cron job. |
| Reactive Alerts Without Polling | How pg_trickle's reactive subscriptions (v0.39) replace polling loops: SLA breach detection, inventory alerts, fraud velocity checks, and vector distance subscriptions. Covers OLD.*/NEW.* transition semantics and PostgreSQL LISTEN. |
| The Outbox Pattern, Turbocharged | Using stream tables as transactionally consistent event sources for the outbox pattern — derived aggregate events, fat payloads, transition-based routing, and why stream tables naturally debounce high-frequency changes into fewer events. |
Contributing
These posts are deliberately rough-edged — they're drafts exploring how the extension works, not polished marketing copy. If you spot a technical inaccuracy, open an issue or PR. If you want to write a post, open a discussion first to avoid duplication.
Frequently Asked Questions
This FAQ covers everything from core concepts and getting started, through SQL support details, to operational topics like deployment, monitoring, and troubleshooting. Use the table of contents below to jump to a specific topic.
New User FAQ — Top 15 Questions
New to pg_trickle? Start here. Each answer is a short summary with a link to the full explanation further down.
1. What is pg_trickle?
A PostgreSQL 18 extension that adds stream tables — materialized views that refresh themselves incrementally, processing only changed rows instead of re-running the entire query. Full answer →
2. How is this different from a materialized view?
Stream tables refresh automatically on a schedule, support incremental
(differential) refresh, track changes via CDC triggers, and propagate updates
through dependency chains — none of which REFRESH MATERIALIZED VIEW provides.
Full answer →
3. How do I install pg_trickle?
Install from the Docker image, PGXN, or build from source. Add
shared_preload_libraries = 'pg_trickle' to postgresql.conf, then
CREATE EXTENSION pg_trickle; in each database. Full answer →
4. How do I create my first stream table?
One function call: SELECT pgtrickle.create_stream_table(name => 'my_st', query => 'SELECT ...', schedule => '5s');
See the Getting Started guide for a walkthrough.
Full answer →
5. What is the difference between FULL and DIFFERENTIAL refresh?
FULL re-runs the entire defining query. DIFFERENTIAL reads only the changed rows from the change buffer and computes the delta — orders of magnitude faster for small changes on large tables. AUTO mode picks the best strategy per cycle. Full answer →
6. Which refresh mode should I use?
Use AUTO (the default) — it selects DIFFERENTIAL when possible and falls back to FULL when needed. Use IMMEDIATE for same-transaction consistency. Use FULL only when the defining query uses volatile functions or is not IVM-eligible. Full answer →
7. What SQL features are supported?
Joins (INNER, LEFT, RIGHT, FULL OUTER, CROSS, LATERAL), aggregates (60+ functions including SUM, COUNT, AVG, array_agg, jsonb_agg), CTEs (including recursive), window functions, UNION/INTERSECT/EXCEPT, subqueries, CASE, COALESCE, DISTINCT, GROUP BY with ROLLUP/CUBE/GROUPING SETS, and more. Full answer →
8. How fresh is my stream table data?
As fresh as the refresh schedule allows. With a 1s schedule, data is typically
< 2 seconds stale. With IMMEDIATE mode, data is updated within the same
transaction as the source write. Full answer →
9. Can I chain stream tables (ST reads from another ST)?
Yes — stream tables can reference other stream tables. pg_trickle builds a dependency DAG and refreshes them in topological order automatically. Full answer →
10. How does change data capture work?
Lightweight row-level AFTER triggers capture every INSERT, UPDATE, and DELETE
into per-table change buffers. If wal_level = logical is available,
pg_trickle can automatically transition to WAL-based CDC for near-zero
write-path overhead. Full answer →
11. Do I need wal_level = logical?
No. pg_trickle works with the default wal_level = replica using trigger-based
CDC. WAL-based CDC is optional and provides lower write-path overhead.
Full answer →
12. Can I use pg_trickle with PgBouncer / connection poolers?
Yes. pg_trickle's background workers use direct connections, not pooled ones. Your application can use any pooler for reads and writes — the scheduler operates independently. Full answer →
13. How do I monitor stream table health?
Built-in views (pgtrickle.pgt_status, pgtrickle.pgt_refresh_history),
Prometheus metrics endpoint, Grafana dashboard, and NOTIFY-based alerts.
Full answer →
14. What happens if a refresh fails?
The stream table is marked SUSPENDED after exceeding the fuse threshold (default
5 consecutive failures). Data in the change buffer is preserved. Use
pgtrickle.reset_fuse('my_st') to resume after fixing the issue.
Full answer →
15. Can I use pg_trickle with dbt?
Yes — the dbt-pgtrickle package provides a stream_table materialization.
dbt run creates/alters stream tables, dbt source freshness checks staleness.
Full answer →
Table of Contents
Getting started
- General — What pg_trickle is, how IVM works, key concepts
- Installation & Setup — Installing, configuring, uninstalling
- Creating & Managing Stream Tables — Create, alter, drop, schedules
Consistency & refresh modes
- Data Freshness & Consistency — Staleness, read-your-writes, DVS
- IMMEDIATE Mode (Transactional IVM) — Same-transaction refresh
SQL features
- SQL Support — Supported and unsupported SQL constructs
- Aggregates & Group-By — Incremental aggregates, HAVING, auxiliary columns
- Joins — Multi-table delta computation, FULL OUTER JOIN
- CTEs & Recursive Queries — Semi-naive, DRed, recomputation strategies
- Window Functions & LATERAL — Partition-based recomputation, SRFs
- TopK (ORDER BY … LIMIT) — Bounded result sets
- Tables Without Primary Keys — Content-based row identity
Internals & architecture
- Change Data Capture (CDC) — Triggers, WAL transition, why auto is the default, change buffers
- Diamond Dependencies & DAG Scheduling — Topological ordering, atomic groups
- Schema Changes & DDL Events — Reinitialize, event triggers
Operations
- Performance & Tuning — Scheduler tuning, min schedule risks, disk space, adaptive fallback
- Interoperability — Views, replication, connection poolers, triggers, pgvector
- dbt Integration — Materialization, commands, freshness checks
- Row-Level Security (RLS) — Source vs stream table policies, SECURITY DEFINER triggers
- Deployment & Operations — Workers, upgrades, replicas, Kubernetes
- Monitoring & Alerting — Views, NOTIFY alerts, failure handling
- Configuration Reference — All GUC parameters
Troubleshooting & reference
- Troubleshooting — Common problems and debugging
- Why Are These SQL Features Not Supported? — Technical explanations for each limitation
- Why Are These Stream Table Operations Restricted? — Why direct DML, ALTER TABLE, and TRUNCATE are disallowed
General
These questions cover fundamental concepts — what pg_trickle is, how incremental view maintenance works, and the key building blocks (frontiers, row IDs, the auto-rewrite pipeline) that power the extension.
What is pg_trickle?
pg_trickle is a PostgreSQL 18 extension that implements stream tables — declarative, automatically-refreshing materialized views with Differential View Maintenance (DVM). You define a SQL query and a refresh schedule; the extension handles change capture, delta computation, and incremental refresh automatically.
It is inspired by the DBSP differential dataflow framework. See DBSP_COMPARISON.md for a detailed comparison.
What is incremental view maintenance (IVM) and why does it matter?
Incremental View Maintenance means updating a materialized view by processing only the changes (deltas) to the source data, rather than re-executing the entire defining query from scratch.
Consider a stream table defined as SELECT customer_id, SUM(amount) FROM orders GROUP BY customer_id over a 10-million-row orders table. When you insert 5 new rows:
- Without IVM (FULL refresh): Re-scans all 10 million rows and recomputes every group. Cost: O(total rows).
- With IVM (DIFFERENTIAL refresh): Reads only the 5 new rows from the change buffer, identifies the affected groups, and updates just those groups. Cost: O(changed rows × affected groups).
pg_trickle's DVM engine implements IVM using differentiation rules for each SQL operator (Scan, Filter, Join, Aggregate, etc.), generating a delta query that computes the exact changes to the stream table from the exact changes to the source.
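The cost argument can be made concrete with a toy sketch (pure Python, not pg_trickle internals): maintain the per-customer sums as state and apply a small delta batch, touching only the affected groups. Delta rows carry a weight, +1 for an insert and -1 for a delete:

```python
# Toy sketch of incremental SUM(amount) GROUP BY customer_id.
# Materialized state: one entry per group (millions of rows in practice).
totals = {"c1": 100.0, "c2": 250.0}

# Delta batch: (customer_id, amount, weight); +1 = insert, -1 = delete.
delta = [("c1", 30.0, +1), ("c3", 10.0, +1), ("c2", 50.0, -1)]

# Cost is O(changed rows), independent of the total table size.
for customer_id, amount, weight in delta:
    totals[customer_id] = totals.get(customer_id, 0.0) + weight * amount

print(totals)  # {'c1': 130.0, 'c2': 200.0, 'c3': 10.0}
```

A real engine also tracks a per-group row count so it knows when a group has no remaining rows and must be deleted (a sum of zero alone does not imply an empty group).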
What is the difference between a stream table and a regular materialized view, in practice?
| Feature | Materialized Views | Stream Tables |
|---|---|---|
| Refresh | Manual (REFRESH MATERIALIZED VIEW) | Automatic (scheduler) or manual |
| Incremental refresh | Not supported natively | Built-in differential mode |
| Change detection | None — always full recompute | CDC triggers track row-level changes |
| Dependency ordering | None | DAG-aware topological refresh |
| Monitoring | None | Built-in views, stats, NOTIFY alerts |
| Schedule | None | Duration strings (5m) or cron (*/5 * * * *) |
| Transactional IVM | No | Yes (IMMEDIATE mode) |
In practice, stream tables are regular PostgreSQL heap tables under the hood — you can query them, create indexes on them, join them with other tables, and reference them from views. The key difference is that pg_trickle manages their contents automatically.
What happens behind the scenes when I INSERT a row into a table tracked by a stream table?
The full data flow for a DIFFERENTIAL-mode stream table:
1. Your INSERT completes normally. The row is written to the source table.
2. A CDC trigger fires (row-level AFTER INSERT). It writes a change record (action=I, the new row data as JSONB, the current WAL LSN) into the source's change buffer table (pgtrickle_changes.changes_<oid>). This happens within your transaction — if you roll back, the change record is also rolled back.
3. You commit. Both the source row and the change record become visible.
4. The scheduler wakes up (every pg_trickle.scheduler_interval_ms, default 1 second). It checks whether the stream table's schedule says a refresh is due.
5. If due, the refresh engine runs. It reads the change buffer for rows with LSN > the stream table's current frontier, generates a delta query from the DVM operator tree, and applies the result via MERGE.
6. Frontier advances. The stream table's frontier is updated to the new LSN, and the consumed change buffer rows are cleaned up.
For IMMEDIATE-mode stream tables, steps 2–6 are replaced: a statement-level AFTER trigger computes and applies the delta within your transaction, so the stream table is updated before your transaction commits.
What does "differential" mean in the context of pg_trickle?
"Differential" refers to the mathematical approach of computing differences (deltas) rather than absolute values. Given a query Q and a set of changes ΔR to source table R, the DVM engine computes ΔQ(R, ΔR) — the change to the query result caused by the change to the source. This delta is then applied (merged) into the stream table.
Each SQL operator has its own differentiation rule. For example:
- Filter: ΔFilter(R, ΔR) = Filter(ΔR) — just apply the filter to the changes.
- Join: ΔJoin(R, S, ΔR) = Join(ΔR, S) — join the changes against the other side's current state.
- Aggregate: Recompute only the groups whose keys appear in the changes.
See DVM_OPERATORS.md for the complete set of differentiation rules.
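The filter and join rules above can be sketched over integer-weighted rows (+1 for an insert, -1 for a delete). This is an illustration of the rules, not the engine's code; function and variable names are ours:

```python
# Rows carry integer weights: +1 insert, -1 delete (a Z-set).

# ΔFilter(R, ΔR) = Filter(ΔR): apply the predicate to the changes only.
def delta_filter(delta_r, pred):
    return [(row, w) for row, w in delta_r if pred(row)]

# ΔJoin(R, S, ΔR) = Join(ΔR, S): join the changes against the other
# side's current state, multiplying weights.
def delta_join(delta_r, s, key_r, key_s):
    return [((r, t), wr * ws)
            for r, wr in delta_r
            for t, ws in s
            if key_r(r) == key_s(t)]

delta_r = [({"id": 1, "status": "active"}, +1),
           ({"id": 2, "status": "done"},   +1),
           ({"id": 3, "status": "active"}, -1)]

active = delta_filter(delta_r, lambda r: r["status"] == "active")
# Two rows propagate: ({'id': 1, ...}, +1) and ({'id': 3, ...}, -1) —
# one insert and one delete flow downstream; the 'done' row is dropped.
```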
What is a frontier, and why does pg_trickle track LSNs?
A frontier is a per-source map of {source_oid → LSN} that records exactly how far each stream table has consumed changes from each of its source tables. It is stored as JSONB in the pgtrickle.pgt_stream_tables catalog.
Why LSNs? PostgreSQL's Write-Ahead Log Sequence Number (LSN) provides a globally ordered, monotonically increasing position in the change stream. By recording the LSN at which each source was last consumed, the frontier ensures:
- No missed changes. The next refresh reads changes with LSN > frontier, ensuring contiguous, non-overlapping windows.
- No duplicate processing. Changes at or below the frontier are never re-read.
- Consistent snapshots. When a stream table depends on multiple source tables, the frontier tracks each source independently, enabling consistent multi-source delta computation.
Lifecycle: Created on first full refresh → Advanced on each differential refresh → Reset on reinitialize.
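To look at a frontier directly, you can unpack the JSONB map with standard functions. The column name `frontier` below is an assumption — verify it with `\d pgtrickle.pgt_stream_tables`:

```sql
-- One row per (stream table, source) pair, assuming the per-source
-- {source_oid → LSN} map is stored in a JSONB column named "frontier":
SELECT st.pgt_name,
       f.key::oid AS source_oid,
       f.value    AS consumed_up_to_lsn
FROM   pgtrickle.pgt_stream_tables AS st,
       jsonb_each_text(st.frontier) AS f;
```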
What is the __pgt_row_id column and why does it appear in my stream tables?
Every stream table has a __pgt_row_id BIGINT PRIMARY KEY column. It stores a 64-bit xxHash of the row's group-by key (for aggregate queries) or all output columns (for non-aggregate queries). The refresh engine uses it to match incoming deltas against existing rows during the MERGE operation.
You should ignore this column in your queries. It is an implementation detail. If it bothers you, exclude it explicitly:
SELECT customer_id, total FROM order_totals; -- omit __pgt_row_id
What is the auto-rewrite pipeline and how does it affect my queries?
Before parsing a defining query into the DVM operator tree, pg_trickle runs six automatic rewrite passes:
| # | Pass | What it does |
|---|---|---|
| 0 | View inlining | Replaces view references with (view_definition) AS alias subqueries (fixpoint, max depth 10) |
| 1 | DISTINCT ON | Converts to ROW_NUMBER() OVER (PARTITION BY … ORDER BY …) = 1 subquery |
| 2 | GROUPING SETS / CUBE / ROLLUP | Decomposes into UNION ALL of separate GROUP BY queries |
| 3 | Scalar subquery in WHERE | Rewrites WHERE col > (SELECT …) to CROSS JOIN |
| 4 | Correlated scalar subquery in SELECT | Rewrites to LEFT JOIN with grouped inline view |
| 5 | SubLinks in OR | Splits WHERE a OR EXISTS (…) into UNION branches |
The rewrites are transparent — your original query is preserved in the catalog (original_query column) while the rewritten version is stored in defining_query. The DVM engine only sees standard SQL operators after rewriting.
See ARCHITECTURE.md for details on each pass.
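For intuition, here is roughly what pass 1 does to a `DISTINCT ON` query. The exact SQL stored in `defining_query` may differ; this is a sketch of the documented transformation:

```sql
-- Original defining query:
SELECT DISTINCT ON (customer_id) customer_id, amount
FROM   orders
ORDER  BY customer_id, created_at DESC;

-- Approximate rewrite: a ROW_NUMBER() = 1 subquery that the DVM
-- engine can maintain with its standard window-function rules.
SELECT customer_id, amount
FROM (
  SELECT customer_id, amount,
         ROW_NUMBER() OVER (PARTITION BY customer_id
                            ORDER BY created_at DESC) AS rn
  FROM orders
) s
WHERE rn = 1;
```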
How does pg_trickle compare to DBSP (the academic framework)?
pg_trickle is inspired by DBSP but is not a direct implementation. Key differences:
- DBSP is a general-purpose differential dataflow framework with a Rust runtime (Feldera). It models computation as circuits over Z-sets (multisets with integer weights).
- pg_trickle implements the same mathematical principles (delta queries, frontier tracking) but embedded inside PostgreSQL as an extension. It generates SQL delta queries rather than running a separate computation engine.
- Trade-off: pg_trickle leverages PostgreSQL's optimizer, indexes, and storage engine but is limited to what can be expressed as SQL queries. DBSP can implement arbitrary dataflow computations.
See DBSP_COMPARISON.md for a detailed comparison.
How does pg_trickle compare to pg_ivm?
| Feature | pg_ivm | pg_trickle |
|---|---|---|
| Refresh timing | Immediate (same transaction) only | Immediate, Deferred (scheduled), or Manual |
| Incremental strategy | Transition tables + query rewriting | DVM operator tree + delta SQL generation |
| Supported SQL | Inner joins, simple outer joins, COUNT/SUM/AVG/MIN/MAX, EXISTS, DISTINCT | All of the above + window functions, recursive CTEs, LATERAL, UNION/INTERSECT/EXCEPT, 37 aggregates, TopK, GROUPING SETS |
| Cascading (view-on-view) | No | Yes (DAG-aware topological refresh) |
| Scheduling | None (always immediate) | Duration, cron, CALCULATED, or NULL |
| Monitoring | None | Built-in views, stats, NOTIFY alerts |
| PostgreSQL version | 14–17 | 18 only (until v0.4.0) |
pg_trickle's IMMEDIATE mode is designed as a migration path for pg_ivm users — it uses the same statement-level trigger approach with transition tables.
What PostgreSQL versions are supported?
PostgreSQL 18.x exclusively. pg_trickle uses PostgreSQL 18 features such as enhanced MERGE syntax with NOT MATCHED BY SOURCE and improved event trigger payloads. These features are not available in earlier versions.
Backward compatibility with PostgreSQL 16–17 is planned for a future release (tracked in the roadmap).
Does pg_trickle require wal_level = logical?
No. By default, pg_trickle uses lightweight row-level triggers for change data capture instead of logical replication. This means you do not need to set wal_level = logical, configure max_replication_slots, or create publications.
If you later enable the hybrid CDC mode (pg_trickle.cdc_mode = 'auto'), WAL-based capture becomes an option — but this is opt-in and not required for normal operation.
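A minimal sketch of checking the prerequisite and opting in. Whether `pg_trickle.cdc_mode` can be changed with a reload alone is not specified here — treat the `pg_reload_conf()` step as an assumption:

```sql
SHOW wal_level;           -- the default 'replica' is sufficient for trigger CDC
ALTER SYSTEM SET pg_trickle.cdc_mode = 'auto';  -- opt in to hybrid capture
SELECT pg_reload_conf();  -- may require a restart instead; check the GUC docs
```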
Is pg_trickle production-ready?
pg_trickle is under active development and approaching production readiness. It has a comprehensive test suite with 700+ unit tests and 290+ end-to-end tests covering correctness, failure recovery, and concurrency scenarios.
That said, as with any new extension, you should evaluate it against your specific workloads before deploying to production. Start with non-critical dashboards or reporting tables, monitor refresh performance and data correctness, and gradually expand usage as confidence grows.
Installation & Setup
How do I install pg_trickle?
1. Add `pg_trickle` to `shared_preload_libraries` in `postgresql.conf`:
   `shared_preload_libraries = 'pg_trickle'`
2. Restart PostgreSQL.
3. Run:
   `CREATE EXTENSION pg_trickle;`
See Installation for platform-specific instructions and pre-built release artifacts.
What are the minimum configuration requirements?
The only mandatory setting is adding pg_trickle to shared_preload_libraries in postgresql.conf (this requires a PostgreSQL restart):
shared_preload_libraries = 'pg_trickle'
All other GUC parameters have sensible defaults and can be tuned later. However, max_worker_processes often needs to be raised from its default of 8 — see the next question.
Can I install pg_trickle on a managed PostgreSQL service (RDS, Cloud SQL, etc.)?
It depends on whether the service allows custom extensions and shared_preload_libraries modifications. Many managed services restrict these. However, pg_trickle has one advantage over replication-based extensions: it does not require wal_level = logical, which avoids one of the most common restrictions on managed PostgreSQL services.
Check your provider's documentation for custom extension support. Services that support custom extensions (e.g., some tiers of Azure Flexible Server, Supabase, Neon) are more likely to work.
How do I uninstall pg_trickle?
1. Drop all stream tables first (or they will be cascade-dropped):
   `SELECT pgtrickle.drop_stream_table(pgt_name) FROM pgtrickle.pgt_stream_tables;`
2. Drop the extension:
   `DROP EXTENSION pg_trickle CASCADE;`
3. Remove `pg_trickle` from `shared_preload_libraries` and restart PostgreSQL.
Creating & Managing Stream Tables
Do I need to choose a refresh mode?
No. The default mode ('AUTO') is adaptive: it uses differential (delta-only)
maintenance when efficient, and automatically falls back to full
recomputation when the change volume is high or the query cannot be
differentiated. This works well for the vast majority of queries.
You only need to specify a mode explicitly when:
- You want FULL mode to force recomputation every time (rare).
- You want IMMEDIATE mode for sub-second, in-transaction updates (adds overhead to every write on source tables).
- You want strict DIFFERENTIAL mode and prefer an error over silent fallback when the query isn't differentiable.
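When you do want an explicit mode, pass `refresh_mode` at creation time. The `cart_items` table in this sketch is illustrative:

```sql
SELECT pgtrickle.create_stream_table(
  name         => 'cart_totals',
  query        => 'SELECT cart_id, SUM(price) AS total
                   FROM cart_items GROUP BY cart_id',
  refresh_mode => 'IMMEDIATE'   -- or 'FULL' / 'DIFFERENTIAL'; default 'AUTO'
);
```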
How do I create a stream table?
-- Minimal: just name and query. Refreshes on a calculated schedule
-- using adaptive differential maintenance.
SELECT pgtrickle.create_stream_table(
'order_totals',
'SELECT customer_id, SUM(amount) AS total
FROM orders GROUP BY customer_id'
);
-- With custom schedule:
SELECT pgtrickle.create_stream_table(
name => 'order_totals',
query => 'SELECT customer_id, SUM(amount) AS total
FROM orders GROUP BY customer_id',
schedule => '5m'
);
What is the difference between FULL and DIFFERENTIAL refresh mode?
- FULL — Truncates the stream table and re-runs the entire defining query every refresh cycle. Simple but expensive for large result sets.
- DIFFERENTIAL — Computes only the delta (changes since the last refresh) using the DVM engine and applies it via a `MERGE` statement. Much faster when only a small fraction of source data changes between refreshes. When the change ratio exceeds `pg_trickle.differential_max_change_ratio` (default 15%), DIFFERENTIAL automatically falls back to FULL for that cycle.
- IMMEDIATE — Maintains the stream table synchronously within the same transaction as the base table DML. Uses statement-level triggers with transition tables — no change buffers, no scheduler. The stream table is always up-to-date.
Why does FULL mode exist if DIFFERENTIAL can fall back to it automatically?
DIFFERENTIAL mode with adaptive fallback covers most user needs — it uses incremental deltas when changes are small and automatically switches to a full recompute when the change ratio is high. However, explicit FULL mode still has its place:
- No CDC overhead. FULL mode installs CDC triggers on source tables (for DAG tracking), but the refresh itself ignores the change buffers entirely. If your workload has very high write throughput and you know you'll always do a full recompute, FULL mode avoids the per-row trigger overhead of writing change records that will never be consumed incrementally.
- Simpler debugging. When investigating data correctness issues, FULL mode is a clean baseline — it re-runs the defining query with no delta computation, no frontier tracking, and no MERGE logic. If FULL produces correct results but DIFFERENTIAL doesn't, the bug is in the delta pipeline.
- Predictable performance. DIFFERENTIAL refresh time varies with the number of changes, which can be unpredictable. FULL refresh time is proportional to the total result set size, which is stable. For SLA-sensitive workloads where you'd rather have consistent 500ms refreshes than variable 5ms–500ms refreshes, FULL provides that predictability.
- Unsupported-but-planned constructs. Some queries may parse correctly in DIFFERENTIAL mode but produce suboptimal deltas. Using FULL mode explicitly is a safe fallback while the DVM engine matures.
For most users, DIFFERENTIAL is the right default. Use FULL when you have a specific reason.
When should I use FULL vs. DIFFERENTIAL vs. IMMEDIATE?
Use DIFFERENTIAL (default) when:
- Source tables are large and changes between refreshes are small
- The defining query uses supported operators (most common SQL is supported)
- Some staleness (seconds to minutes) is acceptable
Use FULL when:
- The defining query uses unsupported aggregates (`CORR`, `COVAR_*`, `REGR_*`)
- Source tables are small and a full recompute is cheap
- You see frequent adaptive fallbacks to FULL (check refresh history)
Use IMMEDIATE when:
- The stream table must always reflect the latest committed data
- You need transactional consistency (reads within the same transaction see updated data)
- Write-side overhead per DML statement is acceptable
- The defining query does not use materialized views or foreign tables as sources
What are the advantages and disadvantages of IMMEDIATE vs. deferred (FULL/DIFFERENTIAL) refresh modes?
IMMEDIATE mode
| Aspect | Detail |
|---|---|
| ✅ Read-your-writes consistency | The stream table is updated within the same transaction as the base table DML — always current from the writer's perspective. |
| ✅ No lag | No background worker, no schedule interval. The view is never stale. |
| ✅ No change buffers | pgtrickle_changes.* tables are not used, reducing write overhead on source tables. |
| ✅ pg_ivm compatibility | Drop-in migration path for existing pg_ivm / IMMV users. |
| ❌ Write amplification | Every DML statement on a base table also executes IVM trigger logic, adding latency to the original transaction. |
| ❌ Serialized concurrent writes | An ExclusiveLock is taken on the stream table during maintenance, serializing writers. |
| ❌ Source-type restrictions | Materialized views and foreign tables cannot be used as sources — use DIFFERENTIAL or FULL instead. |
| ❌ Cascading limitations | Cascading IMMEDIATE stream tables work but may require manual refresh for deep chains. |
| ❌ No throttling | The refresh cannot be delayed or rate-limited. |
Deferred mode (FULL / DIFFERENTIAL)
| Aspect | Detail |
|---|---|
| ✅ Decoupled write path | Base table writes are fast; view maintenance runs later via the scheduler or manual refresh. |
| ✅ Broadest SQL support | Window functions, recursive CTEs, LATERAL, UNION, user-defined aggregates, TopK, cascading stream tables, and more. |
| ✅ Adaptive cost control | DIFFERENTIAL automatically falls back to FULL when the change ratio exceeds pg_trickle.differential_max_change_ratio. |
| ✅ Concurrency-friendly | Writers never block on view maintenance. |
| ❌ Staleness | The stream table lags by up to one schedule interval (e.g. 1m). |
| ❌ No read-your-writes | A writer querying the stream table immediately after a write may see the pre-change data. |
| ❌ Infrastructure overhead | Requires change buffer tables, a background worker, and frontier tracking. |
Rule of thumb: use IMMEDIATE when the query is simple and freshness within the transaction matters. Use DIFFERENTIAL (or FULL) for complex queries, high concurrency, or when you want to decouple write latency from view maintenance.
What happens if I have an IMMEDIATE stream table between two DIFFERENTIAL stream tables in a dependency chain?
Consider the chain: source → ST_A (DIFFERENTIAL) → ST_B (IMMEDIATE) → ST_C (DIFFERENTIAL). This is a valid but unusual configuration with important behavioral consequences:
- ST_A refreshes on its schedule (e.g., every 1 minute) via the background scheduler.
- ST_B is IMMEDIATE, so it has no CDC triggers on ST_A — it uses statement-level IVM triggers. But ST_A is updated by the scheduler (not by user DML), and the scheduler's `MERGE` operation does fire statement-level triggers on ST_A's dependents. So ST_B updates within the scheduler's transaction when ST_A refreshes.
- ST_C is DIFFERENTIAL and depends on ST_B. Since ST_B is a stream table, ST_C's CDC triggers fire when ST_B is modified. The scheduler refreshes ST_C on its own schedule.
The practical concern: write latency stacking. When the scheduler refreshes ST_A, ST_B's IVM triggers fire synchronously within that same transaction, adding IVM overhead to ST_A's refresh. If ST_B's delta computation is expensive, it slows down the entire scheduler cycle.
Recommendation: Avoid mixing IMMEDIATE into the middle of a deferred chain. Either make the entire chain IMMEDIATE (for small, simple queries) or keep it entirely DIFFERENTIAL. If you need read-your-writes for one specific step, consider making that the terminal (leaf) stream table in the chain.
What schedule formats are supported?
Duration strings:
| Unit | Suffix | Example |
|---|---|---|
| Seconds | s | 30s |
| Minutes | m | 5m |
| Hours | h | 2h |
| Days | d | 1d |
| Weeks | w | 1w |
| Compound | — | 1h30m |
Cron expressions:
| Format | Example | Description |
|---|---|---|
| 5-field | */5 * * * * | Every 5 minutes |
| Aliases | @hourly, @daily | Built-in shortcuts |
CALCULATED mode: Pass NULL (or the string 'calculated') as the schedule to inherit the schedule from downstream dependents.
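Assuming cron expressions and aliases are passed through the same `schedule` parameter as duration strings, the formats look like:

```sql
SELECT pgtrickle.alter_stream_table('order_totals', schedule => '1h30m');        -- compound duration
SELECT pgtrickle.alter_stream_table('order_totals', schedule => '*/5 * * * *');  -- 5-field cron (UTC)
SELECT pgtrickle.alter_stream_table('order_totals', schedule => '@daily');       -- cron alias
```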
How do cron schedules handle timezones? What does @daily really mean?
pg_trickle evaluates cron expressions in UTC. The underlying croner crate computes the next occurrence from a UTC timestamp, and the scheduler compares this against chrono::Utc::now(). There is no per-stream-table timezone setting.
This means:
- `@daily` (equivalent to `0 0 * * *`) fires at midnight UTC, not midnight in your local timezone.
- `@hourly` (equivalent to `0 * * * *`) fires at the top of each UTC hour.
- `0 9 * * 1-5` fires at 09:00 UTC on weekdays — if your server is in `America/New_York`, that's 04:00 or 05:00 local time depending on DST.
If you need a schedule aligned to a local timezone, convert the desired local time to UTC and write the cron expression accordingly. For example, to refresh at 08:00 Europe/Oslo (UTC+1 in winter, UTC+2 in summer), use 0 6 * * * in summer and 0 7 * * * in winter — or accept the 1-hour seasonal shift and pick one.
Tip: For most analytics workloads, UTC-based schedules are preferable because they don't shift with daylight saving transitions.
What is the minimum allowed schedule?
The pg_trickle.min_schedule_seconds GUC (default: 60 seconds) sets the shortest allowed refresh schedule. Any create_stream_table or alter_stream_table call with a schedule shorter than this floor is rejected with a clear error message.
This guard exists to prevent accidentally creating stream tables that refresh too frequently, which could overload the scheduler or the source tables. During development and testing, you can lower it:
ALTER SYSTEM SET pg_trickle.min_schedule_seconds = 1;
SELECT pg_reload_conf();
What happens if all stream tables in the DAG have a CALCULATED schedule?
When every stream table uses a CALCULATED schedule (schedule => 'calculated'), there
are no explicit schedules for the resolution algorithm to derive from. The
CALCULATED logic works by propagating MIN(effective_schedule) from downstream
dependents upward through the DAG. If no node has an explicit duration:
- Leaf nodes (no downstream dependents) have no schedules to take the minimum of, so they fall back to the `pg_trickle.min_schedule_seconds` GUC (default: 60 seconds).
- Upstream nodes then resolve to `MIN(fallback) = fallback`.
- The result: every stream table in the DAG gets the fallback schedule (60 s by default).
This is safe but usually not what you want — the whole DAG refreshes at the same generic interval. Best practice is to set an explicit schedule on at least the leaf (most-downstream) stream tables so that upstream CALCULATED schedules resolve to something meaningful:
-- Leaf ST with an explicit schedule
SELECT pgtrickle.create_stream_table(
name => 'daily_summary',
query => 'SELECT region, SUM(total) FROM pgtrickle.order_totals GROUP BY region',
schedule => '10m'
);
-- Upstream ST inherits that 10 m schedule via CALCULATED
SELECT pgtrickle.create_stream_table(
name => 'order_totals',
query => 'SELECT customer_id, SUM(amount) AS total FROM orders GROUP BY customer_id',
schedule => 'calculated'
);
You can inspect the resolved effective schedules with:
SELECT pgt_name, schedule, effective_schedule
FROM pgtrickle.pgt_stream_tables;
Can a stream table reference another stream table?
Yes. Stream tables can depend on other stream tables. The scheduler automatically refreshes them in topological order (upstream first). Circular dependencies are detected and rejected at creation time.
-- ST1: aggregates orders
SELECT pgtrickle.create_stream_table(
name => 'order_totals',
query => 'SELECT customer_id, SUM(amount) AS total FROM orders GROUP BY customer_id',
schedule => '1m',
refresh_mode => 'DIFFERENTIAL'
);
-- ST2: filters ST1
SELECT pgtrickle.create_stream_table(
name => 'big_customers',
query => 'SELECT customer_id, total FROM pgtrickle.order_totals WHERE total > 1000',
schedule => '1m',
refresh_mode => 'DIFFERENTIAL'
);
How do I change a stream table's schedule or mode?
-- Change schedule
SELECT pgtrickle.alter_stream_table('order_totals', schedule => '10m');
-- Switch refresh mode
SELECT pgtrickle.alter_stream_table('order_totals', refresh_mode => 'FULL');
-- Suspend
SELECT pgtrickle.alter_stream_table('order_totals', status => 'SUSPENDED');
-- Resume
SELECT pgtrickle.alter_stream_table('order_totals', status => 'ACTIVE');
Can I change the defining query of a stream table?
Yes — use the query parameter of alter_stream_table():
SELECT pgtrickle.alter_stream_table('order_totals',
query => 'SELECT customer_id, SUM(amount) AS total, COUNT(*) AS order_count
FROM orders GROUP BY customer_id');
The ALTER QUERY operation validates the new query, migrates the storage table schema if needed, updates catalog entries and source dependencies, and runs a full refresh — all within a single transaction. Concurrent readers see either the old data or the new data, never an empty table.
Schema migration behavior:
| Schema change | Behavior |
|---|---|
| Same columns | Fast path — no storage DDL, just catalog update + full refresh |
| Columns added or removed | Compatible migration via ALTER TABLE ADD/DROP COLUMN — storage table OID preserved |
| Column type incompatible | Full rebuild — storage table dropped and recreated (OID changes, WARNING emitted) |
You can also change the query and other parameters simultaneously:
SELECT pgtrickle.alter_stream_table('order_totals',
query => 'SELECT customer_id, SUM(amount) AS total FROM orders GROUP BY customer_id',
refresh_mode => 'FULL');
How do I deploy stream tables idempotently?
Use create_or_replace_stream_table() — one function call that does the right
thing automatically:
-- Safe to run on every deploy — creates, updates, or no-ops as needed:
SELECT pgtrickle.create_or_replace_stream_table(
name => 'order_totals',
query => 'SELECT region, SUM(amount) AS total FROM orders GROUP BY region',
schedule => '2m',
refresh_mode => 'DIFFERENTIAL'
);
What happens on each deploy:
| Situation | Action |
|---|---|
| First deploy (stream table doesn't exist) | Creates it, populates data |
| Nothing changed since last deploy | No-op — logs INFO, returns instantly |
| You changed the schedule or mode | Updates config in place (no data loss) |
| You changed the query | Migrates storage schema + runs a full refresh |
This mirrors PostgreSQL's CREATE OR REPLACE VIEW / CREATE OR REPLACE FUNCTION pattern.
When to use which function:
| Function | Use case |
|---|---|
| `create_or_replace_stream_table()` | Recommended for most deployments. Declarative, idempotent — handles all cases automatically. |
| `create_stream_table_if_not_exists()` | Safe re-run, but never modifies an existing definition. Good for one-time seed migrations. |
| `create_stream_table()` | Strict mode — errors if the stream table already exists. Use when you want an explicit failure on duplicates. |
How do I trigger a manual refresh?
Call refresh_stream_table() to immediately refresh a stream table without waiting for the next scheduled cycle:
SELECT pgtrickle.refresh_stream_table('order_totals');
This runs a synchronous refresh in your current session and returns when complete. It works even when the background scheduler is disabled (pg_trickle.enabled = false), making it useful for testing, debugging, or one-off data refreshes.
To force a full refresh regardless of the stream table's configured mode, temporarily change the refresh mode:
SELECT pgtrickle.alter_stream_table('order_totals', refresh_mode => 'FULL');
SELECT pgtrickle.refresh_stream_table('order_totals');
-- Switch back to the original mode when done:
SELECT pgtrickle.alter_stream_table('order_totals', refresh_mode => 'DIFFERENTIAL');
Data Freshness & Consistency
Understanding when and how stream tables become current is the #1 conceptual hurdle for users coming from synchronous materialized views. This section explains staleness guarantees, read-your-writes behavior, and Delayed View Semantics (DVS).
How stale can a stream table be?
For deferred modes (FULL / DIFFERENTIAL): A stream table can be at most one schedule interval behind the source data, plus the time it takes to execute the refresh itself. For example, with schedule => '1m', the maximum staleness is approximately 1 minute + refresh duration.
In practice, staleness is often less than the schedule interval because the scheduler continuously checks for due refreshes at pg_trickle.scheduler_interval_ms (default: 1 second).
For IMMEDIATE mode: The stream table is always current within the transaction that modified the source data. There is zero staleness.
Check current staleness:
SELECT pgtrickle.get_staleness('order_totals'); -- returns seconds, NULL if never refreshed
-- Or check all stream tables:
SELECT pgt_name, staleness, stale FROM pgtrickle.stream_tables_info;
Can I read my own writes immediately after an INSERT?
It depends on the refresh mode:
- IMMEDIATE mode: Yes. The stream table is updated within the same transaction as your INSERT. You can query it immediately and see the updated data.
- DIFFERENTIAL / FULL mode: No. The stream table is updated by the background scheduler in a separate transaction. Your INSERT is captured by the CDC trigger, but the stream table won't reflect it until the next scheduled refresh (or a manual `refresh_stream_table()` call).
If read-your-writes consistency is a requirement, use refresh_mode => 'IMMEDIATE'.
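The difference is easy to demonstrate, assuming an IMMEDIATE-mode stream table `active_orders` defined over `orders` (as in the introductory example):

```sql
BEGIN;
INSERT INTO orders (id, status) VALUES (99, 'active');
-- IMMEDIATE: the IVM trigger already ran, so the new row is visible here.
-- DIFFERENTIAL/FULL: this query would still show the pre-insert state.
SELECT count(*) FROM active_orders WHERE id = 99;
COMMIT;
```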
What consistency guarantees does pg_trickle provide?
pg_trickle provides Delayed View Semantics (DVS): the contents of every stream table are logically equivalent to evaluating its defining query at some past point in time — the data_timestamp. This means:
- The data is always internally consistent — it corresponds to a valid snapshot of the source data.
- The data may be stale — it reflects the source state at `data_timestamp`, not necessarily the current state.
- For cascading stream tables, the scheduler refreshes in topological order so that when ST B references upstream ST A, A has already been refreshed before B runs its delta query against A's contents.
For IMMEDIATE mode, the guarantee is stronger: the stream table always reflects the state of the source data as of the current transaction.
What are "Delayed View Semantics" (DVS)?
DVS is the formal consistency guarantee: a stream table's contents are equivalent to evaluating its defining query at a specific past time (the data_timestamp). This is analogous to how a materialized view captured at a point in time is always internally consistent, even if the source data has since changed.
The data_timestamp is recorded in the catalog and advanced after each successful refresh:
SELECT pgt_name, data_timestamp FROM pgtrickle.pgt_stream_tables;
What happens if the scheduler is behind — does data get lost?
No. Change data is never lost, even if the scheduler falls behind. Changes accumulate in the change buffer tables (pgtrickle_changes.changes_<oid>) until consumed by a refresh. The frontier ensures that each refresh picks up exactly where the last one left off.
However, a growing change buffer increases:
- Disk usage (change buffer tables grow)
- Refresh time (more changes to process per cycle)
- Risk of adaptive fallback to FULL (if the change ratio exceeds `pg_trickle.differential_max_change_ratio`)
The monitoring system emits a buffer_growth_warning NOTIFY alert if buffers grow unexpectedly.
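Standard catalog queries are enough to watch buffer growth. This sketch lists the change-buffer tables (named `changes_<oid>` per this document) in the `pgtrickle_changes` schema, largest first:

```sql
SELECT c.relname,
       pg_size_pretty(pg_total_relation_size(c.oid)) AS total_size
FROM   pg_class     c
JOIN   pg_namespace n ON n.oid = c.relnamespace
WHERE  n.nspname = 'pgtrickle_changes'
ORDER  BY pg_total_relation_size(c.oid) DESC;
```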
How does pg_trickle ensure deltas are applied in the right order across cascading stream tables?
The scheduler uses topological ordering from the dependency DAG. When ST B depends on ST A:
- ST A is refreshed first — its data is brought up to date and its frontier advances.
- ST A's refresh writes are captured by CDC triggers (since ST A is a source for ST B).
- ST B is refreshed next — its delta query reads ST A's current (just-refreshed) data and the change buffer.
This ensures that downstream stream tables always see consistent upstream data. Circular dependencies are rejected at creation time.
IMMEDIATE Mode (Transactional IVM)
IMMEDIATE mode maintains the stream table synchronously — within the same transaction as the source DML. This section covers when to use it, what SQL it supports, locking behavior, and how to switch between modes.
When should I use IMMEDIATE mode instead of DIFFERENTIAL?
Use IMMEDIATE when:
- Your application requires read-your-writes consistency — e.g., a user inserts an order and immediately queries a dashboard that must include that order.
- The defining query is relatively simple (single-table aggregation, joins, filters).
- The source table write rate is moderate (IMMEDIATE adds latency to every DML statement).
Stick with DIFFERENTIAL when:
- Staleness of a few seconds to minutes is acceptable.
- The defining query uses unsupported IMMEDIATE constructs (materialized-view sources, foreign-table sources).
- Write-side performance is critical (high-throughput OLTP).
- You need to decouple write latency from view maintenance.
What SQL features are NOT supported in IMMEDIATE mode?
IMMEDIATE mode supports all constructs that DIFFERENTIAL supports, with two source-type exceptions:
| Feature | Status | Notes |
|---|---|---|
| `WITH RECURSIVE` | ✅ Supported (IM1) | Semi-naive evaluation inside the trigger. A depth counter guards against infinite loops (`pg_trickle.ivm_recursive_max_depth`, default 100). A warning is emitted at create time for very deep hierarchies. |
| TopK (`ORDER BY … LIMIT N [OFFSET M]`) | ✅ Supported (IM2) | Micro-refresh: recomputes the top-N rows on every DML statement. Gated by `pg_trickle.ivm_topk_max_limit` to prevent unbounded scans. |
| Materialized views as sources | ❌ Rejected | Stale-snapshot prevents trigger-based capture — use the underlying query instead. |
| Foreign tables as sources | ❌ Rejected | No triggers on foreign tables — use FULL mode instead. |
Attempting to create or switch to IMMEDIATE mode with an unsupported construct produces a clear error message.
What happens when I TRUNCATE a source table in IMMEDIATE mode?
A statement-level AFTER TRUNCATE trigger fires and truncates the stream table, then re-populates it by executing a full refresh from the defining query — all within the same transaction. The stream table remains consistent.
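Because the repopulation happens inside the same transaction, the TRUNCATE is never observable as a half-applied state. A sketch, assuming an IMMEDIATE stream table `active_orders` over `orders`:

```sql
BEGIN;
TRUNCATE orders;
SELECT count(*) FROM active_orders;  -- 0: repopulated from the now-empty source
COMMIT;
```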
Can I have cascading IMMEDIATE stream tables (ST A → ST B)?
Yes. When ST A is IMMEDIATE and ST B depends on ST A and is also IMMEDIATE, changes propagate through the chain within the same transaction. The IVM triggers on the base table update ST A, and since that write is visible within the transaction, ST B's triggers fire and update ST B.
What locking does IMMEDIATE mode use?
IMMEDIATE mode acquires statement-level locks on the stream table during delta application:
- Simple queries (single-table scan/filter without aggregates or DISTINCT): `RowExclusiveLock` — allows concurrent readers, blocks other writers.
- Complex queries (joins, aggregates, DISTINCT, window functions): `ExclusiveLock` — blocks both readers and writers to ensure delta consistency.
This means concurrent writes to the same base table are serialized through the stream table lock. For high-concurrency write workloads, DIFFERENTIAL mode avoids this bottleneck.
How do I switch an existing DIFFERENTIAL stream table to IMMEDIATE?
SELECT pgtrickle.alter_stream_table('order_totals', refresh_mode => 'IMMEDIATE');
This:
- Validates the defining query against IMMEDIATE mode restrictions.
- Removes the row-level CDC triggers from source tables.
- Installs statement-level IVM triggers (BEFORE + AFTER with transition tables).
- Clears the schedule (IMMEDIATE mode has no schedule).
- Performs a full refresh to establish a consistent baseline.
To switch back:
SELECT pgtrickle.alter_stream_table('order_totals', refresh_mode => 'DIFFERENTIAL');
This reverses the process: removes IVM triggers, installs CDC triggers, restores the schedule (default 1m), and performs a full refresh.
What happens to IMMEDIATE mode during a manual refresh_stream_table() call?
For IMMEDIATE mode stream tables, refresh_stream_table() performs a FULL refresh — truncates and re-populates from the defining query. This is useful for recovering from edge cases or forcing a clean baseline. It is equivalent to pg_ivm's refresh_immv(name, true).
How much write-side overhead does IMMEDIATE mode add?
Each DML statement on a base table tracked by an IMMEDIATE stream table incurs:
- BEFORE trigger: Advisory lock acquisition + pre-state setup (~0.1–0.5 ms).
- AFTER trigger: Transition table copy to temp tables + delta SQL generation + delta application (~1–50 ms depending on query complexity and delta size).
For a simple single-table aggregate, expect 2–10 ms overhead per statement. For multi-table joins or window functions, overhead is higher. The overhead scales with the number of IMMEDIATE stream tables that depend on the same source table.
SQL Support
pg_trickle supports a broad range of SQL in defining queries. This section
covers what’s supported, what’s rejected (with rewrites), and how specific
constructs like aggregates and ORDER BY are handled. The subsections that
follow dive deeper into aggregates, joins, CTEs, window functions, and TopK.
What SQL features are supported in defining queries?
Most common SQL is supported in both FULL and DIFFERENTIAL modes:
- Table scans, projections, `WHERE`/`HAVING` filters
- `INNER`, `LEFT`, `RIGHT`, `FULL OUTER JOIN` (including multi-table joins)
- `GROUP BY` with 25+ aggregate functions (`COUNT`, `SUM`, `AVG`, `MIN`, `MAX`, `BOOL_AND/OR`, `STRING_AGG`, `ARRAY_AGG`, `JSON_AGG`, `JSONB_AGG`, `BIT_AND/OR/XOR`, `STDDEV`, `VARIANCE`, `MODE`, `PERCENTILE_CONT/DISC`, and more)
- `FILTER (WHERE ...)` on aggregates
- `DISTINCT`
- Set operations: `UNION ALL`, `UNION`, `INTERSECT`, `INTERSECT ALL`, `EXCEPT`, `EXCEPT ALL`
- Subqueries: `EXISTS`, `NOT EXISTS`, `IN (subquery)`, `NOT IN (subquery)`, scalar subqueries
- Non-recursive and recursive CTEs
- Window functions (`ROW_NUMBER`, `RANK`, `SUM OVER`, etc.)
- `LATERAL` joins with set-returning functions and correlated subqueries
- `CASE`, `COALESCE`, `NULLIF`, `GREATEST`, `LEAST`, `BETWEEN`, `IS DISTINCT FROM`
See DVM Operators for the complete list.
What SQL features are NOT supported?
The following are rejected with clear error messages and suggested rewrites:
| Feature | Reason | Suggested Rewrite |
|---|---|---|
| `TABLESAMPLE` | Stream tables materialize the full result set | Use `WHERE random() < fraction` in the consuming query |
| Window functions in expressions | Cannot be differentially maintained | Move the window function to a separate column |
| `LIMIT` / `OFFSET` (without `ORDER BY`) | Stream tables materialize the full result set; `ORDER BY … LIMIT N [OFFSET M]` is supported as TopK | Apply when querying the stream table, or add `ORDER BY` + `LIMIT` to use the TopK pattern |
| `FOR UPDATE` / `FOR SHARE` | Row-level locking is not applicable to materialized output | Remove the locking clause |
| `RANGE_AGG` / `RANGE_INTERSECT_AGG` | No incremental delta decomposition exists for range aggregates | Use FULL mode, or compute range unions in the consuming query |
Each rejected feature is explained in detail in the Why Are These SQL Features Not Supported? section below.
What happens to ORDER BY in defining queries?
ORDER BY in the defining query is accepted but silently discarded. This is consistent with how PostgreSQL handles CREATE MATERIALIZED VIEW AS SELECT ... ORDER BY ... — the ordering only affects the initial INSERT, not the stored data.
Stream tables are heap tables with no guaranteed row order. Apply ORDER BY when querying the stream table instead:
-- Don't rely on ORDER BY in the defining query:
-- 'SELECT region, SUM(amount) AS total FROM orders GROUP BY region ORDER BY total DESC'
-- Instead, order when reading:
SELECT * FROM regional_totals ORDER BY total DESC;
Exception: When ORDER BY is paired with LIMIT N (with or without OFFSET M), pg_trickle recognizes the TopK pattern and preserves the ordering, limit, and offset.
Which aggregates support DIFFERENTIAL mode?
Algebraic (O(changes), fully incremental): COUNT, SUM, AVG
Semi-algebraic (incremental with occasional group rescan): MIN, MAX
Group-rescan (affected groups re-aggregated from source): STRING_AGG, ARRAY_AGG, JSON_AGG, JSONB_AGG, BOOL_AND, BOOL_OR, BIT_AND, BIT_OR, BIT_XOR, JSON_OBJECT_AGG, JSONB_OBJECT_AGG, STDDEV, STDDEV_POP, STDDEV_SAMP, VARIANCE, VAR_POP, VAR_SAMP, MODE, PERCENTILE_CONT, PERCENTILE_DISC, CORR, COVAR_POP, COVAR_SAMP, REGR_AVGX, REGR_AVGY, REGR_COUNT, REGR_INTERCEPT, REGR_R2, REGR_SLOPE, REGR_SXX, REGR_SXY, REGR_SYY
37 aggregate function variants are supported in total.
Aggregates & Group-By
Aggregate handling is one of the most complex parts of incremental view maintenance. This section explains how pg_trickle categorizes aggregates by their incremental cost, how hidden auxiliary columns work, and what happens when groups are created or destroyed.
Which aggregates are fully incremental (O(1) per change) vs. group-rescan?
pg_trickle categorizes aggregates into three tiers:
| Tier | Cost per change | Aggregates | Mechanism |
|---|---|---|---|
| Algebraic | O(1) | COUNT, SUM, AVG | Hidden auxiliary columns (__pgt_count, __pgt_sum_x) track running totals. Delta updates these columns arithmetically. |
| Semi-algebraic | O(1) normally, O(group) on extremum deletion | MIN, MAX | Maintained via LEAST/GREATEST. If the current MIN/MAX is deleted, the group is rescanned to find the new extremum. |
| Group-rescan | O(group size) per affected group | All others (32 functions) | Affected groups are re-aggregated from source data. A NULL sentinel marks stale groups for rescan. |
For most workloads, the algebraic tier (COUNT/SUM/AVG) covers the majority of aggregations and is the fastest.
Why do some aggregates have hidden auxiliary columns?
For algebraic aggregates (COUNT, SUM, AVG), the DVM engine adds hidden __pgt_count and __pgt_sum_x columns to the stream table's storage. These store running totals that can be updated with O(1) arithmetic per change instead of rescanning the entire group.
For example, a stream table defined as SELECT dept, AVG(salary) FROM employees GROUP BY dept internally stores:
- `dept` — the group-by key
- `avg` — the user-visible average (computed as `__pgt_sum_x / __pgt_count`)
- `__pgt_count` — running count of rows in the group
- `__pgt_sum_x` — running sum of salary values
- `__pgt_row_id` — row identity hash
When a new employee is inserted, the refresh updates __pgt_count += 1, __pgt_sum_x += new_salary, and recomputes avg. No rescan of the source table is needed.
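The arithmetic above can be sketched as a toy simulation. This is an illustrative Python model of the O(1) update against hidden running totals, not pg_trickle's actual implementation (which does this in SQL against the `__pgt_count`/`__pgt_sum_x` columns); the function names are invented for the sketch:

```python
# Illustrative simulation of algebraic AVG maintenance via hidden
# running totals (mirroring __pgt_count / __pgt_sum_x). Each change
# is applied with O(1) arithmetic -- no rescan of the source group.

groups = {}  # dept -> {"count": n, "sum": s}

def apply_change(dept, salary, action):
    g = groups.setdefault(dept, {"count": 0, "sum": 0.0})
    if action == "I":          # insert: bump running totals
        g["count"] += 1
        g["sum"] += salary
    elif action == "D":        # delete: decrement running totals
        g["count"] -= 1
        g["sum"] -= salary
        if g["count"] == 0:    # last row gone -> drop the group
            del groups[dept]

def avg(dept):
    # The user-visible column is derived from the hidden totals.
    return groups[dept]["sum"] / groups[dept]["count"]

apply_change("eng", 100.0, "I")
apply_change("eng", 200.0, "I")
print(avg("eng"))  # 150.0
apply_change("eng", 100.0, "D")
print(avg("eng"))  # 200.0
```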
How does HAVING work with incremental refresh?
HAVING is fully supported in DIFFERENTIAL mode. The DVM engine tracks threshold transitions — groups entering or exiting the HAVING condition:
- Group crosses the threshold upward: a previously excluded group (e.g., `HAVING COUNT(*) > 5`) gains enough members → the group is inserted into the stream table.
- Group crosses the threshold downward: a group that was included drops below the threshold → the group is deleted from the stream table.
- Group stays above the threshold: normal delta update (adjust aggregate values).
This means the stream table always reflects only the groups that satisfy the HAVING clause, even as group membership changes.
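The transition logic can be illustrated with a small sketch. This is a hypothetical simplification (the real engine works in delta SQL, and `apply_delta` is an invented name) showing how a group's row is inserted, deleted, or updated as its count crosses a `HAVING COUNT(*) > 5` bound:

```python
# Sketch of HAVING threshold transitions for HAVING COUNT(*) > 5.
# counts plays the role of the hidden per-group __pgt_count; only
# groups satisfying the HAVING clause appear in the stream table.

THRESHOLD = 5
counts = {}        # full per-group counts
stream_table = {}  # materialized output: groups passing HAVING

def apply_delta(key, delta):
    old = counts.get(key, 0)
    new = old + delta
    counts[key] = new
    if old <= THRESHOLD < new:      # crossed upward -> INSERT group
        stream_table[key] = new
    elif new <= THRESHOLD < old:    # crossed downward -> DELETE group
        stream_table.pop(key, None)
    elif new > THRESHOLD:           # stayed above -> UPDATE in place
        stream_table[key] = new

for _ in range(6):
    apply_delta("a", 1)
print(stream_table)   # {'a': 6}  -- group entered the view
apply_delta("a", -1)
print(stream_table)   # {}       -- group dropped back out
```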
What happens to a group when all its rows are deleted?
When the last row of a group is deleted from the source table, the DVM engine detects that __pgt_count drops to zero and deletes the group row from the stream table. The hidden auxiliary columns are cleaned up along with it.
If a new row for the same group-by key is later inserted, a fresh group row is created from scratch.
Why are CORR, COVAR_*, and REGR_* limited to FULL mode?
Regression aggregates like CORR, COVAR_POP, COVAR_SAMP, and the REGR_* family require maintaining running sums of products and squares across the entire group. Unlike COUNT/SUM/AVG (where deltas can be computed from the change alone), regression aggregates:
- Lack algebraic delta rules. There is no closed-form way to update a correlation coefficient from a single row change without access to the full group's data.
- Would degrade to group-rescan anyway. Even if supported, the implementation would need to rescan the full group from source — identical to FULL mode for most practical group sizes.
These aggregates work fine in FULL refresh mode, which re-runs the entire query from scratch each cycle.
Joins
Join delta computation can produce surprising results when both sides change simultaneously. This section covers the standard IVM join rule, FULL OUTER JOIN support, and known edge cases.
How does a DIFFERENTIAL refresh handle a join when both sides changed?
When both tables in a join have changes since the last refresh, the DVM engine computes the join delta using the standard IVM join rule:
$$\Delta(R \bowtie S) = (\Delta R \bowtie S) \cup (R \bowtie \Delta S) \cup (\Delta R \bowtie \Delta S)$$
In practice, this means:
- Join the changes from the left against the current state of the right.
- Join the current state of the left against the changes from the right.
- Join the changes from both sides (handles simultaneous changes to matching keys).
All three parts are combined into a single CTE-based delta query that PostgreSQL executes in one pass.
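The three-part rule can be checked on a toy multiset example. This is an insert-only illustration of the algebra (the real engine also handles deletes and emits a single CTE-based SQL delta); the `join` helper is invented for the sketch:

```python
# Toy multiset illustration of the IVM join rule:
#   delta(R join S) = (dR join S_old) + (R_old join dS) + (dR join dS)
# Insert-only deltas for brevity.
from collections import Counter

def join(r, s):
    # Equi-join on the first field of each (key, payload) tuple.
    return Counter((k1, a, b) for (k1, a) in r.elements()
                              for (k2, b) in s.elements() if k1 == k2)

R  = Counter({(1, "r1"): 1}); dR = Counter({(2, "r2"): 1})
S  = Counter({(1, "s1"): 1}); dS = Counter({(2, "s2"): 1})

delta    = join(dR, S) + join(R, dS) + join(dR, dS)
new_view = join(R, S) + delta            # incrementally maintained
full     = join(R + dR, S + dS)          # recomputed from scratch
print(new_view == full)  # True -- the delta rule matches recomputation
```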
Does pg_trickle support FULL OUTER JOIN incrementally?
Yes. FULL OUTER JOIN is supported in DIFFERENTIAL mode with an 8-part delta computation. This handles all four cases: matched rows on both sides, left-only rows, right-only rows, and rows that transition between matched and unmatched states as data changes.
The 8 parts cover: new left matches, removed left matches, new right matches, removed right matches, newly matched from left-only, newly matched from right-only, newly unmatched to left-only, and newly unmatched to right-only.
What happens when a join key is updated and the joined row is simultaneously deleted?
This is a known edge case. When a join key column is updated in the same refresh cycle as the joined-side row is deleted, the delta may miss the required DELETE, potentially leaving a stale row in the stream table.
Mitigations:
- The adaptive FULL fallback (triggered when the change ratio exceeds `pg_trickle.differential_max_change_ratio`) catches most high-change-rate scenarios where this is likely.
- You can stagger changes across refresh cycles.
- Use FULL mode for tables where this pattern is common.
How does NATURAL JOIN work?
NATURAL JOIN is fully supported. At parse time, pg_trickle resolves the common columns between the two tables and synthesizes explicit equi-join conditions. The internal __pgt_row_id column is excluded from common column resolution, so NATURAL JOINs between stream tables also work correctly.
CTEs & Recursive Queries
Recursive CTE support is a key differentiator for pg_trickle. This section explains the three maintenance strategies (semi-naive, DRed, recomputation) and when each is used.
Do recursive CTEs work in DIFFERENTIAL mode?
Yes. pg_trickle supports WITH RECURSIVE in DIFFERENTIAL mode with three auto-selected strategies:
| Strategy | When used | How it works |
|---|---|---|
| Semi-naive evaluation | INSERT-only changes to the base case | Iteratively evaluates new derivations from the inserted rows without touching existing rows. Fastest path. |
| Delete-and-Rederive (DRed) | Mixed changes (INSERT + DELETE/UPDATE) | Deletes potentially affected derived rows, then rederives them from scratch to determine the true delta. |
| Recomputation fallback | Column mismatch or non-monotone recursive terms | Falls back to full recomputation of the recursive CTE. Used when the recursive term contains EXCEPT, Aggregate, Window, DISTINCT, AntiJoin, or INTERSECT SET operators. |
The strategy is selected automatically based on the type of changes and the recursive term's structure.
What are the three strategies for recursive CTE maintenance?
See the table above. In brief:
- Semi-naive is the fast path for append-only workloads (e.g., adding nodes to a tree). It's O(new derivations) — much cheaper than a full re-evaluation.
- DRed handles deletions and updates correctly by first removing potentially invalidated rows and then rederiving them. More expensive than semi-naive, but still incremental.
- Recomputation is the safe fallback that re-executes the entire recursive CTE. Used when the recursive term's structure is too complex for incremental processing.
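The semi-naive idea is easiest to see on transitive closure. This is a generic textbook sketch, not pg_trickle's implementation: each iteration joins only the frontier (newly derived rows) against the base edges, so work is proportional to new derivations rather than the whole relation:

```python
# Semi-naive evaluation sketch: transitive closure of an edge set.
# Only the frontier (rows derived in the previous round) is joined
# against the base relation each iteration.
def seminaive_tc(edges):
    closure, frontier = set(edges), set(edges)
    while frontier:
        new = {(a, d) for (a, b) in frontier for (c, d) in edges
               if b == c and (a, d) not in closure}
        closure |= new       # accumulate newly derived facts
        frontier = new       # next round starts from the new facts only
    return closure

edges = {(1, 2), (2, 3), (3, 4)}
tc = seminaive_tc(edges)
print(sorted(tc))  # [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
```

DRed extends this by first retracting facts whose derivations may depend on deleted rows, then re-running the same derivation loop to see which of them survive.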
What triggers a fallback from semi-naive to recomputation?
A recomputation fallback is triggered when:
- The recursive term contains non-monotone operators — `EXCEPT`, `Aggregate`, `Window`, `DISTINCT`, `AntiJoin`, or `INTERSECT SET`. These operators can "un-derive" rows when inputs change, which semi-naive evaluation cannot handle.
- Column mismatch — the CTE's output columns don't match the stream table's storage schema (e.g., after a schema change).
- Mixed DML with non-monotone terms — DELETE or UPDATE changes combined with non-monotone recursive terms always trigger recomputation.
Check which strategy was used in the refresh history:
SELECT action, rows_inserted, rows_deleted
FROM pgtrickle.get_refresh_history('my_recursive_st', 5);
What happens when a CTE is referenced multiple times in the same query?
When a non-recursive CTE is referenced more than once, pg_trickle uses shared delta computation — the CTE's delta is computed once and cached, then reused by each reference. This is tracked via CteScan operator nodes that look up the shared delta from an internal CTE registry.
For single-reference CTEs, pg_trickle simply inlines them as subqueries (no overhead).
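The registry behaviour can be sketched as simple memoization. This is a hypothetical simplification (the `CteRegistry` class is invented; the real lookup happens via CteScan operator nodes inside the delta plan):

```python
# Sketch of shared delta computation for a multiply-referenced CTE:
# the delta is computed once, cached, and reused by every reference.
class CteRegistry:
    def __init__(self):
        self.cache = {}
        self.computations = 0

    def delta(self, cte_name, compute):
        if cte_name not in self.cache:   # first reference computes
            self.cache[cte_name] = compute()
            self.computations += 1
        return self.cache[cte_name]      # later references reuse it

reg = CteRegistry()
compute = lambda: [("row", 1)]
d1 = reg.delta("totals", compute)   # reference 1: computes the delta
d2 = reg.delta("totals", compute)   # reference 2: cache hit
print(reg.computations, d1 is d2)   # 1 True
```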
Window Functions & LATERAL
Window functions are maintained via partition-based recomputation rather than row-level deltas. This section covers what’s supported, the expression restriction, and LATERAL constructs.
How are window functions maintained incrementally?
pg_trickle uses partition-based recomputation for window functions. When source data changes, the DVM engine:
1. Identifies which partitions are affected by the changes (based on the `PARTITION BY` key).
2. Recomputes the window function for only the affected partitions.
3. Replaces the old partition results with the new ones in the stream table.
This is more efficient than a full recomputation when changes affect a small number of partitions.
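As an illustration (not the engine's implementation), the partition-scoped strategy for `ROW_NUMBER() OVER (PARTITION BY dept ORDER BY salary DESC)` looks like this; `rank_partition` is an invented helper:

```python
# Sketch of partition-scoped recomputation: when a change touches
# only the 'eng' partition, only that partition is re-ranked; the
# untouched 'ops' partition keeps its stored rows.
def rank_partition(rows, dept):
    part = sorted((r for r in rows if r["dept"] == dept),
                  key=lambda r: -r["salary"])
    return {r["id"]: i + 1 for i, r in enumerate(part)}

rows = [{"id": 1, "dept": "eng", "salary": 120},
        {"id": 2, "dept": "eng", "salary": 100},
        {"id": 3, "dept": "ops", "salary": 90}]

view = {}
for d in ("eng", "ops"):          # initial population, all partitions
    view.update(rank_partition(rows, d))

rows.append({"id": 4, "dept": "eng", "salary": 150})  # change hits 'eng'
view.update(rank_partition(rows, "eng"))  # recompute only 'eng'
print(view)  # {1: 2, 2: 3, 3: 1, 4: 1}
```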
Why can't I use a window function inside a CASE or COALESCE expression?
Window functions like ROW_NUMBER() OVER (…) are supported as standalone columns but cannot be embedded in expressions (e.g., CASE WHEN ROW_NUMBER() OVER (...) = 1 THEN ...).
This restriction exists because the DVM engine handles window functions by recomputing entire partitions. When a window function is buried inside an expression, the engine cannot isolate the window computation from the surrounding expression.
Rewrite: Move the window function to a separate column in one stream table, then reference it in a second stream table:
-- ST1: compute the window function
SELECT id, dept, salary,
ROW_NUMBER() OVER (PARTITION BY dept ORDER BY salary DESC) AS rn
FROM employees
-- ST2: use it in an expression (references ST1)
SELECT id, CASE WHEN rn = 1 THEN 'top' ELSE 'other' END AS rank_label
FROM st1
What LATERAL constructs are supported?
pg_trickle supports three kinds of LATERAL constructs:
| Construct | Example | Delta strategy |
|---|---|---|
| Set-returning functions | LATERAL jsonb_array_elements(data) | Row-scoped recomputation — only affected parent rows are re-expanded |
| Correlated subqueries | LATERAL (SELECT ... WHERE t.id = s.id) | Row-scoped recomputation |
| JSON_TABLE (PG 17+) | JSON_TABLE(data, '$.items[*]' ...) | Modeled as LateralFunction |
Additional supported SRFs: jsonb_each, jsonb_each_text, unnest, generate_series, and others.
What happens when a row moves between window partitions during a refresh?
When a row's PARTITION BY key changes (e.g., an employee moves departments), the DVM engine recomputes both the old partition (to remove the row) and the new partition (to add it). Both partitions are re-evaluated from the source data, ensuring window function results are correct.
TopK (ORDER BY … LIMIT)
TopK queries (ORDER BY ... LIMIT N, optionally with OFFSET M) are handled via a
specialized MERGE-based strategy that re-executes the bounded query each cycle.
This section explains how it works and its limitations.
How does ORDER BY … LIMIT N work in a stream table?
When a defining query has a top-level ORDER BY … LIMIT N (with a constant integer N), pg_trickle recognizes it as a TopK pattern. An optional OFFSET M (constant integer) selects a "page" within the ranked result. The stream table stores exactly N rows and is refreshed via a MERGE-based scoped-recomputation strategy:
- On each refresh, the full query (with ORDER BY + LIMIT, and OFFSET if present) is re-executed against the source tables.
- The result is merged into the stream table using `MERGE` with `NOT MATCHED BY SOURCE` for deletes.
- The catalog records `topk_limit`, `topk_order_by`, and optionally `topk_offset` for the stream table.
TopK bypasses the DVM delta pipeline — it always re-executes the bounded query. This is efficient because the result set is bounded by N.
SELECT pgtrickle.create_stream_table(
name => 'top_customers',
query => 'SELECT customer_id, total FROM order_totals ORDER BY total DESC LIMIT 100',
schedule => '1m',
refresh_mode => 'DIFFERENTIAL'
);
-- With OFFSET — "page 2" of the leaderboard (rows 101–200):
SELECT pgtrickle.create_stream_table(
name => 'next_customers',
query => 'SELECT customer_id, total FROM order_totals ORDER BY total DESC LIMIT 100 OFFSET 100',
schedule => '1m',
refresh_mode => 'DIFFERENTIAL'
);
Does OFFSET work with TopK?
Yes. ORDER BY … LIMIT N OFFSET M is fully supported. The stream table stores exactly N rows starting from position M+1 in the ranked result. This is useful for:
- Paginated dashboards: Each page is a separate stream table with a different OFFSET.
- Excluding outliers: `OFFSET 5 LIMIT 50` skips the top 5 and shows the next 50.
- Windowed leaderboards: `OFFSET 10 LIMIT 10` shows the "second tier."
Caveat: When source data changes, the "page" can shift — a row on page 3 may move to page 2 or 4. The stream table always reflects the current state of the page at the time of the last refresh.
OFFSET 0 is treated as no offset.
What happens when a row below the top-N cutoff rises above it?
On the next refresh, the full ORDER BY … LIMIT N query is re-executed. The newly qualifying row appears in the result, and the row that fell out of the top-N is removed. The MERGE operation handles this by:
- INSERT the newly qualifying row
- DELETE the row that fell below the cutoff
- UPDATE any rows whose values changed but remained in the top-N
Since TopK always re-executes the bounded query, it correctly detects all ranking changes.
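The reconciliation can be sketched as a diff between the fresh bounded result and the stored rows. This toy model (with an invented `refresh_topk` helper) mirrors the INSERT/DELETE/UPDATE branches of the MERGE, not its SQL:

```python
# Toy reconciliation mirroring the TopK MERGE: re-run the bounded
# query, then diff against stored rows -- INSERT new entrants,
# DELETE dropped rows, UPDATE survivors whose values changed.
def refresh_topk(stored, source, n):
    fresh = dict(sorted(source.items(), key=lambda kv: -kv[1])[:n])
    inserts = {k: v for k, v in fresh.items() if k not in stored}
    deletes = [k for k in stored if k not in fresh]
    updates = {k: v for k, v in fresh.items()
               if k in stored and stored[k] != v}
    return fresh, inserts, deletes, updates

stored = {"a": 90, "b": 80}              # current top-2
source = {"a": 90, "b": 80, "c": 95}     # c rises above the cutoff
fresh, ins, dels, upd = refresh_topk(stored, source, 2)
print(fresh)            # {'c': 95, 'a': 90}
print(ins, dels, upd)   # {'c': 95} ['b'] {}
```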
Can I use TopK with aggregates or joins?
Yes. The defining query can contain any SQL that pg_trickle supports, plus ORDER BY … LIMIT N:
-- TopK over an aggregate
SELECT dept, SUM(salary) AS total_salary
FROM employees GROUP BY dept
ORDER BY total_salary DESC LIMIT 10
-- TopK over a join
SELECT e.name, d.name AS dept, e.salary
FROM employees e JOIN departments d ON e.dept_id = d.id
ORDER BY e.salary DESC LIMIT 20
The only restriction is that TopK cannot be combined with set operations (UNION/INTERSECT/EXCEPT) or GROUPING SETS/CUBE/ROLLUP.
Tables Without Primary Keys
While primary keys are not required, their absence changes how pg_trickle identifies rows. This section explains the content-based hashing fallback and its limitations with duplicate rows.
Do source tables need a primary key?
No, but it is strongly recommended. When a source table has a primary key, pg_trickle uses it to generate a deterministic __pgt_row_id for each row — this is the most reliable way to track row identity across refreshes.
Without a primary key, pg_trickle falls back to content-based hashing — an xxHash of all column values. This works correctly for tables where every row is unique, but has known issues with exact duplicate rows. See What are the risks of using tables without primary keys? for details.
What are the risks of using tables without primary keys?
Content-based row identity has known limitations with exact duplicate rows (rows where every column value is identical):
- INSERT as no-op: If a row identical to an existing one is inserted, both have the same `__pgt_row_id` hash, so the MERGE treats it as a no-op (the row already exists).
- DELETE removes all copies: Deleting one of N identical rows generates a DELETE delta, but the MERGE removes all rows with that `__pgt_row_id`.
- Aggregate drift: Over time, these mismatches can cause aggregate values to drift from the true result.
Recommendation: Add a primary key or unique constraint to source tables, or use FULL mode for tables with frequent exact-duplicate rows.
How does content-based row identity work for duplicate rows?
For tables without a primary key, __pgt_row_id is computed as pg_trickle_hash_multi(ARRAY[col1::text, col2::text, ...]) — an xxHash of all column values. Rows with identical content produce identical hashes.
The hash uses \x1E (record separator) between values and \x00NULL\x00 for NULL values, minimizing collision risk for rows with different content. However, truly identical rows (same values in every column) will always hash to the same value — this is inherent to content-based identity.
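The framing can be sketched in a few lines. pg_trickle uses xxHash in C; `sha256` stands in here purely to show the separator and NULL sentinels and the inherent duplicate-row collision, so `row_id` is an invented illustration, not the extension's function:

```python
# Sketch of content-based row identity with separator sentinels.
# \x1e separates column values; \x00NULL\x00 encodes SQL NULL so it
# cannot collide with the string 'NULL'.
import hashlib

def row_id(values):
    parts = [b"\x00NULL\x00" if v is None else str(v).encode()
             for v in values]
    return hashlib.sha256(b"\x1e".join(parts)).hexdigest()

# Sentinels keep NULL distinct from the string 'NULL':
print(row_id([1, None]) != row_id([1, "NULL"]))   # True
# Exact duplicate rows collide by design -- the documented limitation:
print(row_id([1, "x"]) == row_id([1, "x"]))       # True
```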
Change Data Capture (CDC)
This section explains how pg_trickle captures changes to your source tables, the trade-offs between trigger-based and WAL-based CDC, and operational topics like backup/restore and buffer inspection.
How does pg_trickle capture changes to source tables?
pg_trickle installs AFTER INSERT/UPDATE/DELETE row-level PL/pgSQL triggers on each source table referenced by a stream table. Whenever a row in the source table is modified, the trigger writes a change record into a per-source buffer table in the pgtrickle_changes schema.
Each change record contains:
- Action — `I` (insert), `U` (update), `D` (delete), or `T` (truncate marker)
- Row data — old and/or new row values serialized as JSONB
- LSN — the current WAL log sequence number, used for frontier tracking
- Transaction ID — links the change to its originating transaction
The trigger fires within your transaction, so if you roll back, the change record is also rolled back. This guarantees that only committed changes appear in the buffer.
What is the overhead of CDC triggers?
The per-row overhead is approximately 20–55 μs, which covers the PL/pgSQL function dispatch, row_to_json() serialization, and the buffer table INSERT.
At typical write rates (fewer than 1,000 writes per second per source table), this adds less than 5% additional DML latency. For most OLTP workloads, the overhead is negligible — a single network round-trip to the database is usually 10–100× more expensive.
If you have very high-throughput source tables (>10K writes/sec), consider enabling the hybrid CDC mode (pg_trickle.cdc_mode = 'auto') which can automatically transition to WAL-based capture for lower per-row overhead (~5–15 μs).
What happens when I TRUNCATE a source table?
TRUNCATE is captured via a statement-level AFTER TRUNCATE trigger that writes a T marker row to the change buffer. When the differential refresh engine detects this marker, it automatically falls back to a full refresh for that cycle, ensuring the stream table stays consistent. Both FULL and DIFFERENTIAL mode stream tables handle TRUNCATE correctly.
Are CDC triggers automatically cleaned up?
Yes. pg_trickle tracks which source tables are referenced by which stream tables in the pgt_dependencies catalog. When the last stream table referencing a particular source table is dropped, pg_trickle automatically:
- Removes the CDC triggers from the source table.
- Drops the associated change buffer table (`pgtrickle_changes.changes_<oid>`).
You do not need to manually clean up triggers or buffer tables.
What happens if a source table is dropped or altered?
pg_trickle has DDL event triggers that listen for ALTER TABLE and DROP TABLE on source tables. When a change is detected, pg_trickle responds automatically:
- All stream tables that depend on the altered source are marked with `needs_reinit = true` in the catalog.
- On the next scheduler cycle, each affected stream table is reinitialized — the existing storage table is dropped, recreated from the current defining query schema, and re-populated with a full refresh.
- A `reinitialize_needed` NOTIFY alert is sent so your monitoring can detect the event.
If the DDL change breaks the defining query (e.g., a column referenced in the query was dropped), the reinitialization will fail and the stream table will enter ERROR status. In that case, you need to drop and recreate the stream table with an updated query.
How do I check if a source table has switched from trigger-based CDC to WAL-based CDC?
When you enable hybrid CDC (pg_trickle.cdc_mode = 'auto'), pg_trickle starts capturing changes with triggers and can automatically transition to WAL-based logical replication once conditions are met. There are several ways to check the current CDC mode for each source table:
1. Query the dependency catalog directly:
SELECT d.source_relid, c.relname AS source_table, d.cdc_mode,
d.slot_name, d.decoder_confirmed_lsn, d.transition_started_at
FROM pgtrickle.pgt_dependencies d
JOIN pg_class c ON c.oid = d.source_relid;
The cdc_mode column shows one of three values:
- `TRIGGER` — changes are captured via row-level triggers (the default)
- `TRANSITIONING` — the system is in the process of switching from triggers to WAL
- `WAL` — changes are captured via logical replication
2. Use the built-in health check function:
SELECT source_table, cdc_mode, slot_name, lag_bytes, alert
FROM pgtrickle.check_cdc_health();
This returns a row per source table with the current mode, replication slot lag (for WAL-mode sources), and any alert conditions such as slot_lag_exceeds_threshold or replication_slot_missing.
3. Listen for real-time transition notifications:
LISTEN pg_trickle_cdc_transition;
pg_trickle sends a NOTIFY with a JSON payload whenever a transition starts, completes, or is rolled back. Example payload:
{
"event": "transition_complete",
"source_table": "public.orders",
"old_mode": "TRANSITIONING",
"new_mode": "WAL",
"slot_name": "pg_trickle_slot_16384"
}
This lets you integrate CDC mode changes into your monitoring stack without polling.
4. Check the global GUC setting:
SHOW pg_trickle.cdc_mode;
This shows the desired global behavior (trigger, auto, or wal), not the per-table actual state. The per-table state lives in pgt_dependencies.cdc_mode as described above.
See CONFIGURATION.md for details on the pg_trickle.cdc_mode, pg_trickle.wal_transition_timeout, pg_trickle.slot_lag_warning_threshold_mb, and pg_trickle.slot_lag_critical_threshold_mb GUCs.
Is it safe to add triggers to a stream table while the source table is switching CDC modes?
Yes, this is completely safe. CDC mode transitions and user-defined triggers operate on different tables and do not interfere with each other:
- CDC transitions affect how changes are captured from source tables (e.g., `orders`). The transition switches the capture mechanism from row-level triggers on the source table to WAL-based logical replication.
- User-defined triggers live on stream tables (e.g., `order_totals`) and control how the refresh engine applies changes to the materialized output.
Because these are independent concerns, you can freely add, modify, or remove triggers on a stream table at any point — including during an active CDC transition on its source tables.
How it works in practice:
- The refresh engine checks for user-defined triggers on the stream table at the start of each refresh cycle (via a fast `pg_trigger` lookup, <0.1 ms).
- If user triggers are detected, the engine uses explicit `DELETE`/`UPDATE`/`INSERT` statements instead of `MERGE`, so your triggers fire with correct `TG_OP`, `OLD`, and `NEW` values.
- The change data consumed by the refresh engine has the same format regardless of whether it came from CDC triggers or WAL decoding — so the trigger detection and the CDC mode are fully decoupled.
A trigger added between two refresh cycles will simply be picked up on the next cycle. The only (theoretical) edge case is adding a trigger in the tiny window during a single refresh transaction, between the trigger-detection check and the MERGE execution — but since both happen within the same transaction, this is virtually impossible in practice.
Why does pg_trickle use triggers instead of logical replication for initial CDC?
pg_trickle always bootstraps CDC with row-level AFTER triggers because they provide single-transaction atomicity — the change record is written in the same transaction as the source DML, so:
- No commit-order ambiguity. The change buffer always reflects committed data; rolled-back transactions never produce partial change records.
- No replication slot management at creation time. Logical replication requires creating and monitoring replication slots, which can bloat WAL if the subscriber falls behind. Trigger-based bootstrap avoids this complexity.
- Works on all hosting providers. Some managed PostgreSQL services restrict `wal_level = logical` or limit the number of replication slots. Trigger bootstrap works everywhere, with no configuration changes.
- Simpler initial deployment. No need for `wal_level = logical`, no publication/subscription setup, and no extra connections for WAL senders.
With pg_trickle.cdc_mode = 'auto' (the default since v0.3.0), pg_trickle uses triggers initially and then transparently transitions to WAL-based CDC if wal_level = logical is available. If WAL is not available, triggers are kept permanently — no degradation, no errors. Set pg_trickle.cdc_mode = 'trigger' if you want to disable WAL transitions entirely. See ADR-001 and ADR-002 in the architecture documentation for the full rationale.
Why is auto the default pg_trickle.cdc_mode?
As of v0.3.0, auto is the default CDC mode. This was changed from trigger based on the following considerations:
1. Safe no-op on standard installs.
PostgreSQL ships with wal_level = replica by default. In this configuration, auto simply stays on trigger-based CDC permanently — it does not create replication slots, publications, or any WAL infrastructure. There is no error, warning, or user-visible difference from the old trigger default. auto only activates the WAL transition path when wal_level = logical is explicitly configured by the operator.
2. Automatic fallback hardening. The WAL transition and steady-state polling now include robust automatic fallback:
- Consecutive poll errors (5 failures) trigger an automatic revert to triggers.
- `check_decoder_health()` validates slot existence, WAL lag, and `wal_level` on every tick.
- The `TRANSITIONING` phase has a progressive timeout with informative warnings.
- Post-restart health checks (`check_cdc_transition_health()`) automatically clean up stale transitions.
3. Zero overhead for trigger-only deployments.
When wal_level != logical, the auto scheduler branch takes a fast-path exit after a single GUC check and pg_replication_slots query. The overhead compared to trigger mode is negligible (<1 ms per scheduler tick).
4. Progressive optimisation without config changes.
When an operator later enables wal_level = logical (e.g., for other replication needs), pg_trickle automatically benefits from lower per-row CDC overhead (~5–15 μs vs ~20–55 μs) without any configuration change. This aligns with the principle of least surprise.
When to use trigger instead: Set pg_trickle.cdc_mode = 'trigger' if you want fully deterministic trigger-only behaviour, need to minimize any replication slot management, or are on a restricted managed PostgreSQL that caps replication slots. This reverts to the pre-v0.3.0 default.
Caveats to be aware of in auto mode:
- Keyless tables (no PRIMARY KEY) stay on triggers permanently — WAL mode requires a PK for `pk_hash` computation.
- Replication slots prevent WAL recycling: if the decoder falls behind, WAL accumulates. pg_trickle warns at `pg_trickle.slot_lag_warning_threshold_mb` (default 100 MB) and marks per-source CDC health unhealthy at `pg_trickle.slot_lag_critical_threshold_mb` (default 1024 MB).
- The `TRANSITIONING` phase runs both the trigger and the WAL decoder simultaneously; LSN-based deduplication handles correctness. If anything goes wrong, the system rolls back to triggers.
How does the trigger-to-WAL automatic transition work?
When pg_trickle.cdc_mode = 'auto', pg_trickle monitors each source table's write rate. When the rate exceeds an internal threshold, the transition proceeds in three phases:
1. Slot creation. A logical replication slot is created for the source table's OID (e.g., `pg_trickle_slot_16384`).
2. Dual capture. For a brief period, both triggers and WAL decoding capture changes. The system uses LSN comparison to deduplicate, ensuring no changes are lost or double-counted.
3. Trigger removal. Once the WAL decoder confirms it has caught up (its confirmed LSN ≥ the frontier LSN), the row-level triggers are dropped and the source transitions fully to WAL mode.
The transition is tracked in pgt_dependencies.cdc_mode (values: TRIGGER → TRANSITIONING → WAL). If the transition times out (pg_trickle.wal_transition_timeout, default 5 minutes), it is rolled back and triggers are kept.
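The dedup step during dual capture can be sketched as follows. This is a hypothetical simplification (the `merge_dual_capture` helper is invented): records at or below the trigger-captured frontier LSN are dropped from the WAL stream so nothing is double-counted:

```python
# Sketch of LSN-based deduplication during the dual-capture phase:
# both paths may observe the same change at the same LSN, so WAL
# records at or below the trigger frontier are discarded.
def merge_dual_capture(trigger_changes, wal_changes):
    # trigger_changes / wal_changes: lists of (lsn, payload)
    frontier = max((lsn for lsn, _ in trigger_changes), default=0)
    deduped_wal = [(l, p) for l, p in wal_changes if l > frontier]
    return sorted(trigger_changes + deduped_wal)

trig = [(100, "ins a"), (110, "upd a")]
wal  = [(110, "upd a"), (120, "del a")]   # LSN 110 seen by both paths
print(merge_dual_capture(trig, wal))
# [(100, 'ins a'), (110, 'upd a'), (120, 'del a')]
```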
What happens to CDC if I restore a database backup?
After restoring a backup (pg_dump, pg_basebackup, or PITR), the CDC state depends on the backup type:
| Backup type | Triggers | Change buffers | Frontier | Action needed |
|---|---|---|---|---|
| pg_dump (logical) | Preserved (in DDL) | Buffer rows included | Catalog restored | Usually none — next refresh detects stale frontier and does a full refresh |
| pg_basebackup (physical) | Preserved | Buffer rows preserved (committed at backup time) | Catalog restored | Replication slots may be invalid — WAL-mode sources may need manual transition back to TRIGGER mode |
| PITR (point-in-time) | Preserved | Only committed buffer rows at the recovery target | Catalog restored | Similar to pg_basebackup; frontier may point ahead of actual buffer content → first refresh does a full refresh to reconcile |
In all cases, the pg_trickle scheduler automatically detects frontier inconsistencies and falls back to a full refresh for the first cycle after restore. No manual intervention is required for trigger-mode sources.
For full guidelines on disaster recovery strategies, see our dedicated Backup and Restore chapter.
For WAL-mode sources, replication slots created after the backup point will not exist in the restored state. Set pg_trickle.cdc_mode = 'trigger' temporarily, or let the auto transition recreate slots.
Do CDC triggers fire for rows inserted via logical replication (subscribers)?
Yes. PostgreSQL fires row-level triggers on the subscriber side for rows applied via logical replication. This means if you have a subscriber database with pg_trickle installed, the CDC triggers will capture replicated changes into the local change buffers.
Implication: You can run stream tables on a subscriber database that tracks replicated tables — the change capture works transparently. However, be careful about:
- Double-counting. If the same table is tracked by pg_trickle on both the publisher and subscriber, changes are captured twice (once on each side). This is fine if the stream tables are independent, but confusing if you expect them to be identical.
- Replication lag. The stream table on the subscriber will be delayed by both the replication lag and the pg_trickle refresh schedule.
Can I inspect the change buffer tables directly?
Yes. Change buffers are ordinary tables in the pgtrickle_changes schema, named changes_<source_oid>:
-- List all change buffer tables
SELECT tablename FROM pg_tables WHERE schemaname = 'pgtrickle_changes';
-- Inspect recent changes for a source table (find OID first)
SELECT c.oid FROM pg_class c JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relname = 'orders' AND n.nspname = 'public';
-- Then query the buffer
SELECT action, lsn, txid, old_data, new_data
FROM pgtrickle_changes.changes_16384
ORDER BY lsn DESC LIMIT 10;
The action column contains: I (insert), U (update), D (delete), or T (truncate).
Warning: Do not modify buffer tables directly. The refresh engine manages buffer cleanup (truncation) after each successful refresh. Manual changes will corrupt the frontier tracking.
How does pg_trickle prevent its own refresh writes from re-triggering CDC?
When the refresh engine writes to a stream table (via MERGE or explicit DML), it does not trigger CDC capture on that stream table, even if the stream table is itself a source for a downstream stream table. This is because:
- CDC triggers are only installed on source tables, not on stream tables. The refresh engine writes directly to the stream table's storage without going through any change-capture mechanism.
- Downstream change propagation uses a different path. When stream table A is a source for stream table B, changes to A are detected at B's refresh time by re-reading A's data (not via triggers on A). The topological ordering ensures A is refreshed before B.
This design prevents infinite loops (A triggers B triggers A) and avoids the overhead of capturing changes to materialized output that will be recomputed anyway.
Diamond Dependencies & DAG Scheduling
When multiple stream tables form a diamond-shaped dependency graph, careful coordination is needed to avoid inconsistent snapshots. This section covers atomic consistency, schedule policies, and topological ordering.
What is a diamond dependency and why does it matter?
A diamond dependency occurs when two (or more) intermediate stream tables both depend on the same source, and a downstream stream table depends on both of them:
          Source: orders
           /           \
    ST: totals     ST: counts
           \           /
       ST: combined_report
Without coordination, combined_report might be refreshed after totals is updated but before counts is updated (or vice versa), producing a temporarily inconsistent snapshot — totals reflects the latest data but counts is stale.
What does diamond_consistency = 'atomic' do?
When diamond_consistency = 'atomic' is set on the downstream stream table (e.g., combined_report), pg_trickle ensures that all upstream stream tables in the diamond are refreshed within the same scheduler cycle before the downstream table is refreshed. This guarantees a consistent point-in-time snapshot.
If any upstream refresh in the atomic group fails, the downstream refresh is skipped for that cycle to avoid inconsistency. The failed upstream will be retried on the next cycle.
SELECT pgtrickle.alter_stream_table('combined_report',
diamond_consistency => 'atomic');
What is the difference between 'fastest' and 'slowest' schedule policy?
When a stream table has multiple upstream dependencies with different schedules, pg_trickle needs a policy for when to refresh the downstream table:
| Policy | Behavior | Best for |
|---|---|---|
| `fastest` | Refresh downstream whenever any upstream refreshes | Low-latency dashboards where partial freshness is acceptable |
| `slowest` | Refresh downstream only after all upstreams have refreshed | Reports requiring all-or-nothing consistency |
The default is fastest. Use slowest with diamond_consistency = 'atomic' for the strongest consistency guarantees.
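In pseudocode terms, the two policies reduce to an any/all check over the upstreams that refreshed in the current cycle. A sketch (the function name and data shapes are illustrative, not pg_trickle's internal API):

```python
def should_refresh_downstream(policy: str, upstream_refreshed: dict) -> bool:
    """Gate a downstream refresh on its upstreams' status this cycle.

    upstream_refreshed maps upstream stream-table name -> whether it
    refreshed during the current scheduler cycle.
    """
    if policy == "fastest":
        return any(upstream_refreshed.values())  # any fresh upstream triggers it
    if policy == "slowest":
        return all(upstream_refreshed.values())  # wait for every upstream
    raise ValueError(f"unknown schedule policy: {policy}")

status = {"totals": True, "counts": False}
assert should_refresh_downstream("fastest", status) is True
assert should_refresh_downstream("slowest", status) is False
```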
What happens when an atomic diamond group partially fails?
When diamond_consistency = 'atomic' is set and one upstream stream table in the diamond fails to refresh:
- The downstream refresh is skipped for that cycle (it reads stale-but-consistent data from the previous successful cycle).
- The failed upstream follows the normal retry logic (exponential backoff, up to `max_consecutive_errors`).
- Other non-failing upstreams in the diamond are still refreshed normally — their data is fresh, but the downstream won't consume it until all upstreams succeed.
- A `NOTIFY pg_trickle_alert` with event `diamond_partial_failure` is sent so your monitoring can detect the situation.
How does pg_trickle determine topological refresh order?
The scheduler builds a directed acyclic graph (DAG) of stream table dependencies at startup and after any create_stream_table / drop_stream_table call. The algorithm:
- Edge discovery. For each stream table, the defining query's source tables are extracted. If a source table is itself a stream table, a dependency edge is added.
- Cycle detection. The DAG is checked for cycles. If a cycle is detected, the offending `create_stream_table` call is rejected with a clear error message listing the cycle path.
- Topological sort. A topological sort (Kahn's algorithm) produces the refresh order — leaf nodes (no stream-table dependencies) are refreshed first, then their dependents, and so on.
- Level assignment. Each stream table is assigned a "level" (0 for leaves, max(parent levels) + 1 for dependents). Stream tables at the same level are refreshed concurrently when `pg_trickle.parallel_refresh_mode = 'on'`.
The topological order is recalculated whenever the DAG changes. You can inspect it with:
SELECT pgt_name, depends_on, topo_level
FROM pgtrickle.stream_tables_info
ORDER BY topo_level, pgt_name;
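The sort and level assignment can be sketched in a few lines of Python. This is a simplified model that identifies stream tables by name; the real scheduler works on catalog OIDs:

```python
from collections import deque

def topo_levels(deps):
    """Kahn's algorithm with level assignment.

    deps: stream table -> set of upstream stream tables it depends on.
    Returns {name: level}; 0 for leaves, max(parent levels) + 1 otherwise.
    """
    indeg = {name: len(d) for name, d in deps.items()}
    dependents = {name: [] for name in deps}
    for name, d in deps.items():
        for upstream in d:
            dependents[upstream].append(name)

    queue = deque(name for name, k in indeg.items() if k == 0)
    level = {name: 0 for name in queue}
    visited = 0
    while queue:
        name = queue.popleft()
        visited += 1
        for child in dependents[name]:
            level[child] = max(level.get(child, 0), level[name] + 1)
            indeg[child] -= 1
            if indeg[child] == 0:
                queue.append(child)
    if visited != len(deps):
        raise ValueError("cycle detected")  # create_stream_table would reject this
    return level

# The diamond from the previous section: two level-0 STs feeding one level-1 ST.
deps = {"totals": set(), "counts": set(), "combined_report": {"totals", "counts"}}
assert topo_levels(deps) == {"totals": 0, "counts": 0, "combined_report": 1}
```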
Schema Changes & DDL Events
pg_trickle detects source table schema changes via PostgreSQL’s DDL event trigger system and reacts automatically. This section explains what happens for various DDL operations and how to handle them.
What happens when I add a column to a source table?
Adding a column to a source table is safe and non-disruptive if the stream table's defining query does not use SELECT *:
- Named columns: If the defining query explicitly lists columns (e.g., `SELECT id, name, amount FROM orders`), the new column is simply not captured by CDC and has no effect on the stream table.
- `SELECT *`: If the defining query uses `SELECT *`, pg_trickle detects the schema mismatch at the next refresh and marks the stream table with `needs_reinit = true`. The next scheduler cycle performs a full reinitialization — drops the storage table, recreates it with the new column set, and does a full refresh.
CDC triggers capture the full row as JSONB regardless of which columns the stream table uses, so no trigger changes are needed.
What happens when I drop a column used in a stream table's query?
Dropping a column that is referenced in a stream table's defining query will cause the next refresh to fail because the column no longer exists in the source table. pg_trickle handles this via:
- The DDL event trigger detects the `ALTER TABLE ... DROP COLUMN` and marks all affected stream tables with `needs_reinit = true`.
- On the next refresh cycle, the scheduler attempts reinitialization — but the defining query fails with a PostgreSQL error (e.g., `column "amount" does not exist`).
- The stream table moves to ERROR status after `max_consecutive_errors` failures.
- A `reinitialize_needed` NOTIFY alert is sent.
Resolution: Drop and recreate the stream table with an updated defining query:
SELECT pgtrickle.drop_stream_table('order_totals');
SELECT pgtrickle.create_stream_table(
name => 'order_totals',
query => 'SELECT id, name FROM orders', -- updated query without dropped column
schedule => '1m',
refresh_mode => 'DIFFERENTIAL'
);
What happens when I CREATE OR REPLACE a view used by a stream table?
PostgreSQL event triggers fire on CREATE OR REPLACE VIEW, so pg_trickle detects the change and marks dependent stream tables with needs_reinit = true. On the next refresh:
- If the new view definition is compatible (same output columns, same types), reinitialization succeeds transparently — the stream table is repopulated with the new query logic.
- If the new view definition changes the output schema (different columns or types), the delta query will fail and the stream table enters ERROR status.
Tip: To avoid disruption, use pgtrickle.alter_stream_table() to pause the stream table before replacing the view, then resume after verifying compatibility.
What happens when I alter or drop a function used in a stream table's query?
If a stream table's defining query calls a user-defined function (e.g., SELECT my_func(amount) FROM orders) and that function is altered or dropped:
- ALTER FUNCTION (changing the body): pg_trickle does not detect this automatically — PostgreSQL does not fire DDL event triggers for function body changes. The stream table continues refreshing with the new function behavior. If this is intentional, no action is needed. If you want a full rebase to the new logic, temporarily switch to FULL mode and refresh:
SELECT pgtrickle.alter_stream_table('my_st', refresh_mode => 'FULL');
SELECT pgtrickle.refresh_stream_table('my_st');
SELECT pgtrickle.alter_stream_table('my_st', refresh_mode => 'DIFFERENTIAL');
- DROP FUNCTION: The next refresh fails because the function no longer exists. The stream table enters ERROR status. Recreate the function, or drop and recreate the stream table.
What is reinitialize and when does it trigger?
Reinitialize is pg_trickle's mechanism for handling structural changes to source tables. When a stream table is marked with needs_reinit = true, the next scheduler cycle performs:
- Drop the existing storage table (the physical heap table backing the stream table).
- Recreate the storage table from the defining query's current output schema.
- Full refresh — run the defining query against current source data and populate the new storage table.
- Reset the frontier to the current LSN.
- Clear the `needs_reinit` flag.
Reinitialize triggers automatically when DDL event triggers detect `ALTER TABLE`, `DROP TABLE`, or `CREATE OR REPLACE VIEW` on source tables or intermediate views. When a stream table is marked, a `needs_reinit` NOTIFY alert is sent.
You can also trigger it manually:
UPDATE pgtrickle.pgt_stream_tables SET needs_reinit = true WHERE pgt_name = 'my_st';
Can I block DDL on tracked source tables?
pg_trickle does not currently block DDL on source tables — it only reacts to DDL changes via event triggers. If you want to prevent accidental schema changes on critical source tables, use PostgreSQL's built-in mechanisms:
-- Revoke ALTER/DROP from application roles
REVOKE ALL ON TABLE orders FROM app_user;
GRANT SELECT, INSERT, UPDATE, DELETE ON TABLE orders TO app_user;
-- Only the table owner (or superuser) can now ALTER/DROP
Alternatively, create a custom event trigger that raises an exception when DDL targets tracked source tables:
CREATE OR REPLACE FUNCTION prevent_source_ddl() RETURNS event_trigger AS $$
BEGIN
IF EXISTS (
SELECT 1 FROM pg_event_trigger_ddl_commands() cmd
JOIN pgtrickle.pgt_dependencies d ON d.source_relid = cmd.objid
) THEN
RAISE EXCEPTION 'Cannot ALTER/DROP a table tracked by pg_trickle';
END IF;
END;
$$ LANGUAGE plpgsql;
CREATE EVENT TRIGGER guard_source_ddl ON ddl_command_end
EXECUTE FUNCTION prevent_source_ddl();
What happens if I run DDL on a source table during an active refresh?
PostgreSQL's locking mechanism prevents most conflicts. The refresh transaction acquires a ShareLock on source tables before reading them. Since ALTER TABLE (including ADD COLUMN, DROP COLUMN, ALTER TYPE) requires an AccessExclusiveLock, the DDL statement blocks until the refresh transaction completes.
In practice:
- During a refresh: The ALTER TABLE waits for the refresh to finish, then proceeds. pg_trickle's DDL event trigger then detects the change and marks the stream table for reinitialization.
- Between refreshes: DDL proceeds immediately. The next refresh picks up the reinitialization flag.
There is a tiny theoretical window between lock acquisition and the first read where DDL could sneak in, but this is prevented by PostgreSQL's MVCC — the refresh's snapshot was taken before the DDL committed, so it reads the old schema regardless.
If pg_trickle.block_source_ddl = true: Column-affecting DDL on tracked source tables is rejected entirely with an ERROR, regardless of whether a refresh is running.
Do stream tables work with logical replication?
Stream tables are replicated to standbys via physical (streaming) replication like any other heap table. However, they are not automatically maintained by pg_trickle on the subscriber:
| Aspect | Primary | Physical standby | Logical subscriber |
|---|---|---|---|
| Scheduler runs | Yes | No (read-only) | No (no pg_trickle catalog) |
| Stream tables readable | Yes | Yes (replicated) | Only if published |
| Refreshes occur | Yes | No (standby is read-only) | No |
| Change buffers | Managed by pg_trickle | Replicated but not consumed | Not available |
Key limitations:
- Change buffer tables (`pgtrickle_changes.*`) are not published through logical replication — they are internal, transient data.
- The pg_trickle catalog (`pgtrickle.pgt_stream_tables`) is not replicated through logical replication.
- On a physical standby, stream tables receive updates through streaming replication with the usual replication lag.
Recommended pattern: Run pg_trickle on the primary only. Read stream tables from any physical standby.
Performance & Tuning
This section covers scheduler tuning, the adaptive FULL fallback, disk space management, and guidance on when to use DIFFERENTIAL vs. FULL mode.
How do I tune the scheduler interval?
The pg_trickle.scheduler_interval_ms GUC controls how often the scheduler checks for stale stream tables (default: 1000 ms).
| Workload | Recommended Value |
|---|---|
| Low-latency (near real-time) | 100–500 |
| Standard | 1000 (default) |
| Low-overhead (many STs, long schedules) | 5000–10000 |
Is there any risk in setting min_schedule_seconds very low?
Yes. pg_trickle.min_schedule_seconds (default: 60) is a safety guardrail, not an arbitrary limit. Setting it very low — especially in production — can cause several problems:
WAL amplification. Every differential refresh writes a MERGE to the WAL. At 1-second intervals across many stream tables, WAL generation rises sharply, increasing replication lag and storage costs.
Lock contention. Each refresh acquires locks on the change buffer table. With cleanup_use_truncate = true (the default), this is an AccessExclusiveLock. Sub-second schedules can starve concurrent INSERT/UPDATE/DELETE statements on the source tables.
Cascading refresh load. If a refresh takes longer than the schedule interval (e.g., an 800 ms refresh on a 1-second schedule), the next refresh fires almost immediately upon completion. With chained or diamond-shaped ST graphs, the entire topological chain must complete within the interval to avoid falling behind.
Autovacuum pressure. Rapid MERGE operations produce dead tuples in the stream table faster than autovacuum can clean them up, bloating the table and degrading query performance over time.
Adaptive fallback triggering. At high change rates, pg_trickle.differential_max_change_ratio may trigger a FULL refresh instead of DIFFERENTIAL. A FULL refresh at 1-second intervals is very expensive and defeats the purpose of differential maintenance.
Practical guidance:
| Environment | Recommended minimum |
|---|---|
| Development / testing | 1 s — fine for fast iteration |
| Lightly loaded production | 10–30 s |
| Standard production | 60 s (default) |
| High-throughput OLTP | 120+ s — let change buffers accumulate for efficient batch merging |
If you need near-real-time results, consider IMMEDIATE mode (refresh_mode => 'DIFFERENTIAL' with same-transaction refresh) instead of a very short schedule — it avoids the scheduler overhead entirely and updates the stream table within your transaction.
What is the adaptive fallback to FULL?
When the number of pending changes exceeds pg_trickle.differential_max_change_ratio (default: 15%) of the source table size, DIFFERENTIAL mode automatically falls back to FULL for that refresh cycle. This prevents pathological delta queries on bulk changes.
- Set to `1.0` to effectively always use DIFFERENTIAL (the pending changes would have to exceed the table size)
- Set to `0.0` to effectively always use FULL (any pending change exceeds the ratio)
- The default `0.15` (15%) is a good balance
How many concurrent refreshes can run?
By default (parallel_refresh_mode = 'off') refreshes are processed sequentially within the scheduler's single background worker. This is safe and efficient for most deployments.
Starting in v0.4.0, true parallel refresh is available via:
ALTER SYSTEM SET pg_trickle.parallel_refresh_mode = 'on';
ALTER SYSTEM SET pg_trickle.max_dynamic_refresh_workers = 4; -- cluster-wide cap
ALTER SYSTEM SET pg_trickle.max_concurrent_refreshes = 4; -- per-database cap
SELECT pg_reload_conf();
When enabled, independent stream tables at the same DAG level are refreshed concurrently in separate dynamic background workers. Each worker uses one max_worker_processes slot — see the worker-budget formula before enabling.
Monitor parallel refresh with:
SELECT * FROM pgtrickle.worker_pool_status();
SELECT * FROM pgtrickle.parallel_job_status(60);
For most deployments with fewer than 100 stream tables, sequential processing is still efficient (each differential refresh typically takes 5–50 ms).
How do I check if my stream tables are keeping up?
-- Quick overview
SELECT pgt_name, status, staleness, stale
FROM pgtrickle.stream_tables_info;
-- Detailed statistics
SELECT pgt_name, total_refreshes, avg_duration_ms, consecutive_errors, stale
FROM pgtrickle.pg_stat_stream_tables;
-- Recent refresh history for a specific ST
SELECT * FROM pgtrickle.get_refresh_history('order_totals', 10);
What is __pgt_row_id?
Every stream table has a __pgt_row_id BIGINT PRIMARY KEY column that stores a 64-bit xxHash of the row's identity key. The refresh engine uses it to match incoming deltas against existing rows during MERGE operations.
For a detailed explanation of how this column is computed and why it exists, see What is the __pgt_row_id column and why does it appear in my stream tables? in the General section.
You should ignore this column in your queries. It is an implementation detail.
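For intuition, the mapping can be sketched like this, using Python's standard `blake2b` as a stand-in for the 64-bit xxHash the extension uses. The specific hash and key encoding here are assumptions; the point is producing a stable BIGINT per identity key:

```python
import hashlib
import struct

def pgt_row_id(identity_key: tuple) -> int:
    """Derive a stable signed 64-bit id from a row's identity key (sketch).

    Joining key parts with an unlikely separator avoids ("ab","c") colliding
    with ("a","bc"); the real extension's encoding may differ.
    """
    payload = "\x1f".join(map(str, identity_key)).encode()
    digest = hashlib.blake2b(payload, digest_size=8).digest()
    return struct.unpack(">q", digest)[0]  # signed, so it fits a BIGINT column

a = pgt_row_id(("cust_42",))
assert a == pgt_row_id(("cust_42",))  # deterministic: MERGE can match the row later
assert a != pgt_row_id(("cust_43",))
```

Because the id is a pure function of the identity key, the refresh engine can recompute it for an incoming delta row and match it against the stored row without scanning.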
How much disk space do change buffer tables consume?
Each change buffer table stores one row per source-table change (INSERT, UPDATE, DELETE, or TRUNCATE marker). The row size depends on the source table's column count and data types:
| Component | Approximate size |
|---|---|
| `action` column (char) | 1 byte |
| `old_data` / `new_data` (JSONB) | 1–10 KB per row (depends on source columns) |
| `lsn` (pg_lsn) | 8 bytes |
| `txid` (xid8) | 8 bytes |
| Index (on `lsn`) | ~40 bytes per row |
Rule of thumb: Buffer tables consume roughly 2–3× the raw row size of the source change, because both OLD and NEW values are stored as JSONB.
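Using the component sizes above, a back-of-envelope estimator (illustrative only; actual on-disk size also depends on tuple headers, TOAST compression, and page fill):

```python
def buffer_bytes_estimate(n_changes: int, avg_jsonb_bytes: int = 2048) -> int:
    """Rough footprint of a change buffer holding n_changes rows.

    avg_jsonb_bytes is assumed to cover old_data + new_data combined
    (~2 KB here; measure your own rows, it depends on the source columns).
    """
    per_row = 1 + 8 + 8 + avg_jsonb_bytes + 40  # action + lsn + txid + JSONB + index
    return n_changes * per_row

# 100k buffered changes at ~2 KB of JSONB each is on the order of 200 MB
assert 150_000_000 < buffer_bytes_estimate(100_000) < 250_000_000
```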
Buffer tables are cleaned up (truncated or deleted) after each successful refresh. If you suspect buffer bloat, check:
SELECT relname, pg_size_pretty(pg_total_relation_size(oid)) AS size
FROM pg_class
WHERE relnamespace = (SELECT oid FROM pg_namespace WHERE nspname = 'pgtrickle_changes')
ORDER BY pg_total_relation_size(oid) DESC;
What determines whether DIFFERENTIAL or FULL is faster for a given workload?
The breakeven point depends on the change ratio — the number of changed rows relative to the total source table size:
| Change ratio | Recommended mode | Why |
|---|---|---|
| < 5% | DIFFERENTIAL | Delta query touches few rows; much cheaper than re-reading everything |
| 5–15% | DIFFERENTIAL (usually) | Still faster, but approaching the crossover |
| 15–50% | FULL | The delta query scans a large fraction of the source anyway; FULL avoids the overhead of delta computation |
| > 50% | FULL | Bulk load scenario — TRUNCATE + INSERT is simpler and faster |
Additional factors:
- Query complexity: Queries with many joins or window functions have more expensive delta computation. The crossover shifts lower.
- Source table size: For small tables (<10K rows), FULL is nearly always faster because the overhead is negligible.
- Index presence: DIFFERENTIAL uses indexes to look up changed rows. Missing indexes on join keys or GROUP BY columns can make delta queries slow.
The adaptive fallback (pg_trickle.differential_max_change_ratio, default 0.15) automates this decision per-cycle.
What are the planner hints and when should I disable them?
Before executing a delta query, pg_trickle sets several session-level planner parameters to guide PostgreSQL toward efficient delta plans:
SET LOCAL enable_seqscan = off; -- Prefer index scans for small deltas
SET LOCAL enable_nestloop = on; -- Nested loops are good for small delta × large table joins
SET LOCAL enable_mergejoin = off; -- Merge joins are worse for skewed delta sizes
These hints are active only during the refresh transaction and are reset afterward.
When to disable hints: If you notice that a particular stream table's refresh is slow (check avg_duration_ms in pg_stat_stream_tables), the planner hints may be suboptimal for that specific query. You can disable them by setting:
SET pg_trickle.planner_hints = off;
This allows PostgreSQL's planner to choose its own strategy. Test both settings and compare avg_duration_ms.
How do prepared statements help refresh performance?
The refresh engine uses PostgreSQL prepared statements (PREPARE / EXECUTE) for the delta and MERGE queries. On the first refresh, the statement is prepared; subsequent refreshes reuse the cached plan. Benefits:
- Reduced planning overhead. For complex delta queries with many joins and CTEs, planning can take 5–50 ms. Prepared statements skip this on subsequent refreshes.
- Stable plans. The planner switches to a generic plan after the 5th execution (the PostgreSQL default), avoiding plan instability caused by fluctuating statistics.
Prepared statements are stored per-session and are invalidated when:
- The stream table is reinitialized (schema change)
- The shared cache generation advances after DDL or stream-table metadata changes
- The PostgreSQL connection is recycled
- The session ends
How does the adaptive FULL fallback threshold work in practice?
The pg_trickle.differential_max_change_ratio GUC (default: 0.15) is evaluated per source table, per refresh cycle:
- Before each differential refresh, the engine counts pending changes in the buffer table: `pending_changes = COUNT(*) FROM pgtrickle_changes.changes_<oid>`.
- It estimates the source table size from `pg_class.reltuples`.
- If `pending_changes / reltuples > differential_max_change_ratio`, the engine falls back to FULL for that cycle.
Edge cases:
- If the source table has `reltuples = 0` (freshly created, no ANALYZE yet), the engine always uses FULL until statistics are available.
- For multi-source stream tables (joins), each source is evaluated independently. If any source exceeds the threshold, the entire refresh falls back to FULL.
- The threshold applies to the current cycle only — the next cycle re-evaluates.
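Putting the rule and its edge cases together, the per-cycle decision can be modeled as follows (the function names are illustrative, not part of the extension's API):

```python
def choose_mode(pending_changes: int, reltuples: float, max_ratio: float = 0.15) -> str:
    """Per-source decision: DIFFERENTIAL unless the change ratio exceeds the cap."""
    if reltuples <= 0:  # no statistics yet -> FULL until ANALYZE runs
        return "FULL"
    if pending_changes / reltuples > max_ratio:
        return "FULL"
    return "DIFFERENTIAL"

def refresh_mode(sources, max_ratio: float = 0.15) -> str:
    """Multi-source rule: any source over threshold forces FULL for the cycle.

    sources: iterable of (pending_changes, reltuples) pairs, one per source.
    """
    modes = [choose_mode(p, r, max_ratio) for p, r in sources]
    return "FULL" if "FULL" in modes else "DIFFERENTIAL"

assert choose_mode(100, 10_000) == "DIFFERENTIAL"   # 1% changed
assert choose_mode(2_000, 10_000) == "FULL"          # 20% > 15% cap
assert refresh_mode([(100, 10_000), (2_000, 10_000)]) == "FULL"
```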
How many stream tables can a single PostgreSQL instance handle?
There is no hard limit. Practical limits depend on:
| Factor | Guideline |
|---|---|
| Scheduler overhead | Each cycle iterates all STs; at 1000 STs with 1ms overhead per check, the cycle takes ~1s |
| Background connections | 1 per database (the scheduler) + 1 per manual refresh call |
| Change buffer bloat | Each source table gets its own buffer table — many sources = many tables in pgtrickle_changes |
| Catalog size | pgt_stream_tables and pgt_dependencies grow linearly |
| Refresh throughput | Sequential processing means total cycle time = sum of individual refresh times |
Tested benchmarks: Up to 500 stream tables on a single instance with <2s total cycle time for DIFFERENTIAL refreshes averaging 3ms each.
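A simple sequential-throughput model, consistent with the guideline figures above (the helper function is hypothetical):

```python
def cycle_time_ms(num_sts: int, check_ms: float = 1.0, refresh_ms: float = 3.0,
                  due_fraction: float = 1.0) -> float:
    """Sequential scheduler model: every ST is checked each cycle; the STs
    that are due are then refreshed one after another."""
    return num_sts * check_ms + num_sts * due_fraction * refresh_ms

# 500 STs, all due, 3 ms average refresh -> about 2 s per cycle,
# on the order of the benchmark figure quoted above
assert cycle_time_ms(500) == 2000.0
```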
What is the TRUNCATE vs DELETE cleanup trade-off for change buffers?
After each successful refresh, the engine cleans up processed change records from the buffer table. The pg_trickle.cleanup_use_truncate GUC (default: true) controls the method:
| Method | Pros | Cons |
|---|---|---|
| `TRUNCATE` (default) | Instant — O(1) regardless of row count. Reclaims disk space immediately. | Takes an ACCESS EXCLUSIVE lock on the buffer table, briefly blocking concurrent INSERTs from CDC triggers (~0.1 ms typical). |
| `DELETE` | Row-level locks only — no blocking of concurrent CDC writes. | O(N) — proportional to the number of processed rows. Dead tuples require VACUUM to reclaim space. |
When to switch to DELETE: If your source table has extremely high write throughput (>10K writes/sec) and you observe brief stalls in DML latency during refresh cleanup, switch to DELETE:
ALTER SYSTEM SET pg_trickle.cleanup_use_truncate = false;
SELECT pg_reload_conf();
For most workloads, TRUNCATE is the better choice because buffer tables are typically emptied completely after each refresh.
Interoperability
Stream tables are standard PostgreSQL heap tables, which means they work with most PostgreSQL features. This section clarifies what’s compatible (views, replication, triggers) and what’s not (direct DML, foreign keys).
Can PostgreSQL views reference stream tables?
Yes. Since stream tables are standard PostgreSQL heap tables, you can create views on top of them just like any other table. The view will return whatever data is currently in the stream table, reflecting the most recent refresh:
CREATE VIEW high_value_customers AS
SELECT customer_id, total FROM pgtrickle.order_totals WHERE total > 1000;
This is a common pattern for adding per-user filters or formatting on top of a shared stream table.
Can materialized views reference stream tables?
Yes, though this is usually redundant — both materialized views and stream tables are physical snapshots of query results. The key difference is that the materialized view requires its own manual REFRESH MATERIALIZED VIEW call; it does not auto-refresh when the underlying stream table refreshes.
A more idiomatic approach is to create a second stream table that references the first one. This way, pg_trickle handles the dependency ordering and refresh scheduling for both automatically.
Can I replicate stream tables with logical replication?
Yes. Stream tables can be published like any ordinary table:
CREATE PUBLICATION my_pub FOR TABLE pgtrickle.order_totals;
Important caveats:
- The `__pgt_row_id` column is replicated (it is the primary key)
- Subscribers receive materialized data, not the defining query
- Do not install pg_trickle on the subscriber and attempt to refresh the replicated table — it will have no CDC triggers or catalog entries
- Internal change buffer tables are not published by default
Can I INSERT, UPDATE, or DELETE rows in a stream table directly?
No. Stream table contents are managed exclusively by the refresh engine, and direct DML will corrupt the internal state (row IDs, frontier tracking, and change buffer consistency). See Why can't I INSERT, UPDATE, or DELETE rows in a stream table? for a detailed explanation of what goes wrong.
If you need to post-process stream table data, create a view or a second stream table that references the first one.
Can I add foreign keys to or from stream tables?
No. Foreign key constraints are incompatible with how the refresh engine operates. The engine uses bulk MERGE operations that apply inserts and deletes atomically, without guaranteeing the row-by-row ordering that foreign key checks require. Full refreshes also use TRUNCATE + INSERT, which bypasses cascade logic entirely.
See Why can't I add foreign keys? for details. If you need referential integrity, enforce it in your application or in a view that joins the stream tables.
Can I add my own triggers to stream tables?
Yes, for DIFFERENTIAL mode stream tables. When user-defined row-level triggers are detected, the refresh engine automatically switches from MERGE to explicit DELETE + UPDATE + INSERT statements. This ensures triggers fire with the correct TG_OP, OLD, and NEW values. Legacy configs that still set pg_trickle.user_triggers = 'on' are treated the same as auto.
Limitations:
- Row-level triggers do not fire during FULL refresh (they are automatically suppressed via `DISABLE TRIGGER USER`). Use DIFFERENTIAL refresh mode for stream tables with triggers.
- The `IS DISTINCT FROM` guard prevents no-op `UPDATE` triggers from firing when the aggregate result is unchanged.
- `BEFORE` triggers that modify `NEW` will affect the stored value — the next refresh may "correct" it back, causing oscillation.
Can I ALTER TABLE a stream table directly?
No. Direct ALTER TABLE would change the physical table without updating pg_trickle's catalog, causing column mismatches and __pgt_row_id invalidation on the next refresh. See Why can't I ALTER TABLE a stream table directly? for details.
Instead, use the pg_trickle API:
-- Change schedule, mode, or status:
SELECT pgtrickle.alter_stream_table('order_totals', schedule => '10m');
-- To change the defining query or column structure, drop and recreate:
SELECT pgtrickle.drop_stream_table('order_totals');
SELECT pgtrickle.create_stream_table(
name => 'order_totals',
query => '...',
schedule => '5m',
refresh_mode => 'DIFFERENTIAL'
);
Does pg_trickle work with PgBouncer or other connection poolers?
It depends on the pooling mode. pg_trickle's background scheduler uses session-level features that are incompatible with transaction-mode connection pooling:
| Feature | Issue with Transaction-Mode Pooling |
|---|---|
| `pg_advisory_lock()` | Session-level lock is released when the connection returns to the pool — concurrent refreshes become possible |
| `PREPARE` / `EXECUTE` | Prepared statements are session-scoped — "does not exist" errors on different connections |
| `LISTEN` / `NOTIFY` | Notifications are lost when listeners change connections |
Recommended configurations:
- Session-mode pooling (pool_mode = session): Fully compatible. The scheduler holds a dedicated connection.
- Direct connection (no pooler for the scheduler): Fully compatible. Application queries can still go through a pooler.
- Transaction-mode pooling (pool_mode = transaction): Not supported. The scheduler requires a persistent session.
Tip: If your infrastructure requires transaction-mode pooling (e.g., AWS RDS Proxy, Supabase), route the pg_trickle background worker through a direct connection while keeping application traffic on the pooler. Most connection poolers support per-database or per-user routing rules.
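One way to implement this split with PgBouncer is a per-database pool_mode override. This is a sketch with illustrative host and database names, not a complete configuration:

```ini
; pgbouncer.ini sketch — names are illustrative
[databases]
; Application traffic: transaction pooling is fine here
appdb = host=db-primary port=5432 dbname=appdb pool_mode=transaction
; Session-pooled alias for anything that needs a persistent session
appdb_session = host=db-primary port=5432 dbname=appdb pool_mode=session
```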
Does pg_trickle work with pgvector?
Partially — it depends on the refresh mode and what the defining query does.
What works:
- Source tables with vector columns. CDC triggers are generated using PostgreSQL's format_type(), which returns the full type name (e.g. vector(1536)). Change buffer tables mirror the source schema correctly, so inserts, updates, and deletes on pgvector tables are captured and replayed without issue.
- Passing vector columns through in DIFFERENTIAL mode. Stream tables that select, filter (on non-vector columns), or join sources that happen to contain vector columns work correctly — the vector data is treated as an opaque value and copied through unchanged.
- FULL mode with any pgvector expression. Because FULL mode re-executes the entire defining query, all pgvector operators (<->, <=>, <#>) and functions (cosine_distance, l2_normalize, etc.) work exactly as they do in a regular query.
What does not work:
- DIFFERENTIAL mode with pgvector distance operators in the query. The DVM engine needs a differentiation rule for every SQL operator it encounters. Custom operators like <-> (L2 distance) or <=> (cosine distance) are not in the built-in rule set. The engine will fall back automatically to FULL mode if such operators appear in the delta query path. Set refresh_mode => 'FULL' explicitly to make this intent clear.
- Incremental aggregation over vector columns. There is no meaningful incremental form for aggregates over vector values (e.g. averaging embeddings). Use FULL mode for any aggregate that involves vector arithmetic.
Recommended pattern for a nearest-neighbour cache or semantic search result set:
CREATE EXTENSION IF NOT EXISTS vector;
SELECT pgtrickle.create_stream_table(
name => 'top_similar_docs',
query => $$
SELECT d.id, d.title, d.embedding,
d.embedding <=> '[0.1, 0.2, 0.3]'::vector AS distance
FROM documents d
ORDER BY distance
LIMIT 100
$$,
schedule => '5m',
refresh_mode => 'FULL'
);
For use-cases that only carry vector columns through without computing on them, DIFFERENTIAL mode works fine:
-- Vectors are not used in the delta computation — DIFFERENTIAL is safe here
SELECT pgtrickle.create_stream_table(
name => 'active_doc_embeddings',
query => $$
SELECT id, embedding
FROM documents
WHERE status = 'published'
$$,
schedule => '1m',
refresh_mode => 'DIFFERENTIAL'
);
dbt Integration
The dbt-pgtrickle package provides a stream_table materialization that
lets you manage stream tables through dbt’s standard workflow. This section
covers setup, commands, freshness checks, and query change handling.
How do I use pg_trickle with dbt?
Install the dbt-pgtrickle package (a pure Jinja SQL macro package — no Python dependencies):
# packages.yml
packages:
- package: pg_trickle/dbt_pgtrickle
version: ">=0.2.0"
Then define a stream table model using the stream_table materialization:
-- models/order_totals.sql
{{ config(
materialized='stream_table',
schedule='1m',
refresh_mode='DIFFERENTIAL'
) }}
SELECT customer_id, SUM(amount) AS total
FROM {{ source('public', 'orders') }}
GROUP BY customer_id
The stream_table materialization calls pgtrickle.create_stream_table() on the first run and pgtrickle.alter_stream_table() on subsequent runs (if the schedule or mode changes).
What dbt commands work with stream tables?
| Command | Behavior |
|---|---|
| dbt run | Creates stream tables that don't exist; updates schedule/mode if changed; does not alter the defining query of existing STs |
| dbt run --full-refresh | Drops and recreates all stream tables from scratch (new defining query, fresh data) |
| dbt test | Works normally — tests query the stream table as a regular table |
| dbt source freshness | Works if you configure a freshness block on the stream table source |
| dbt docs generate | Documents stream tables like any other model |
How does dbt run --full-refresh work with stream tables?
When --full-refresh is passed, the stream_table materialization:
- Calls pgtrickle.drop_stream_table('model_name') to remove the existing stream table, CDC triggers, and change buffers.
- Calls pgtrickle.create_stream_table(...) with the current defining query from the model file.
- The new stream table starts in INITIALIZING status and performs its first full refresh.
This is the correct way to update a stream table's defining query in dbt. Without --full-refresh, dbt will not detect query changes (it only compares schedule and mode).
How do I check stream table freshness in dbt?
Use dbt's built-in source freshness feature by adding a freshness block to your source definition:
# models/sources.yml
sources:
- name: pgtrickle
schema: pgtrickle
tables:
- name: order_totals
loaded_at_field: "last_refreshed_at" # from stream_tables_info
freshness:
warn_after: {count: 5, period: minute}
error_after: {count: 15, period: minute}
Then run dbt source freshness to check.
Alternatively, query the pg_trickle monitoring views directly in a dbt test:
-- tests/check_freshness.sql
SELECT pgt_name FROM pgtrickle.stream_tables_info WHERE stale = true
What happens when the defining query changes in dbt?
If you modify the SQL in a stream table model file and run dbt run without --full-refresh:
- The stream_table materialization detects that the stream table already exists.
- It compares the schedule and refresh mode — if either changed, it calls alter_stream_table() to update them.
- It does not compare the defining query text. The existing defining query remains in effect.
To apply a new defining query, you must run dbt run --full-refresh. This drops and recreates the stream table with the new query.
Recommendation: After changing a model's SQL, always run dbt run --full-refresh -s model_name to apply the change.
Can I use dbt snapshot with stream tables?
Yes, with caveats. dbt snapshots work by tracking changes to a source table over time using updated_at or check strategies. You can snapshot a stream table like any other table.
However, keep in mind:
- Stream tables are refreshed periodically, not on every write. The snapshot will only capture changes at refresh boundaries, not at the granularity of individual source-table writes.
- The __pgt_row_id column will appear in the snapshot. You may want to exclude it with check_cols or a select in the snapshot configuration.
- FULL refresh mode replaces all rows each cycle, which will appear as "updates" to the snapshot strategy even if the data hasn't changed. Use DIFFERENTIAL mode for stream tables that are snapshotted.
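A snapshot that sidesteps both caveats could select explicit columns (leaving out __pgt_row_id) and use the check strategy. The snapshot name, schema, and columns here are illustrative:

```sql
-- snapshots/order_totals_snapshot.sql (names are illustrative)
{% snapshot order_totals_snapshot %}
{{ config(
    target_schema='snapshots',
    unique_key='customer_id',
    strategy='check',
    check_cols=['total']   -- __pgt_row_id never enters change detection
) }}
SELECT customer_id, total
FROM {{ source('pgtrickle', 'order_totals') }}
{% endsnapshot %}
```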
What dbt versions are supported?
dbt-pgtrickle is a pure Jinja SQL macro package that works with:
- dbt-core 1.7+ (the stream_table materialization uses standard Jinja patterns)
- dbt-postgres adapter (required for PostgreSQL connection)
There are no Python dependencies beyond dbt-core and dbt-postgres. The package is tested against dbt 1.7.x and 1.8.x in CI.
Row-Level Security (RLS)
Does RLS on source tables affect stream table content?
No. Stream tables always materialize the full, unfiltered result set,
regardless of any RLS policies on source tables. This matches the behavior of
PostgreSQL's built-in REFRESH MATERIALIZED VIEW.
The scheduled refresh runs as a superuser background worker. Manual calls to
refresh_stream_table() and IMMEDIATE-mode IVM triggers also bypass RLS
internally (SET LOCAL row_security = off / SECURITY DEFINER trigger
functions), ensuring the stream table content is always complete and
deterministic.
Can I use RLS on a stream table to filter reads per role?
Yes. Stream tables are regular PostgreSQL tables, so ALTER TABLE … ENABLE ROW LEVEL SECURITY and CREATE POLICY work exactly as expected.
This is the recommended pattern for multi-tenant filtering:
ALTER TABLE pgtrickle.order_totals ENABLE ROW LEVEL SECURITY;
CREATE POLICY tenant_isolation ON pgtrickle.order_totals
USING (tenant_id = current_setting('app.tenant_id')::INT);
One stream table serves all tenants. Per-tenant filtering happens at query time with zero storage duplication.
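At query time, each session sets its tenant before reading; the app.tenant_id setting name matches the policy above:

```sql
-- Reads are filtered by the policy; refreshes are unaffected because the
-- scheduler runs with elevated privileges
SET app.tenant_id = '42';
SELECT * FROM pgtrickle.order_totals;  -- only rows where tenant_id = 42
```

Note that table owners and superusers bypass RLS by default; use ALTER TABLE … FORCE ROW LEVEL SECURITY if the owning role also reads the table.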
What happens when I ENABLE or DISABLE RLS on a source table?
pg_trickle's DDL event trigger detects ALTER TABLE … ENABLE ROW LEVEL SECURITY, DISABLE ROW LEVEL SECURITY, FORCE ROW LEVEL SECURITY, and
NO FORCE ROW LEVEL SECURITY on source tables and marks all dependent
stream tables for reinitialization. The same applies to CREATE POLICY,
ALTER POLICY, and DROP POLICY.
Why are IVM trigger functions SECURITY DEFINER?
In IMMEDIATE mode, the IVM trigger fires in the DML-issuing user's context.
If that user has restricted RLS visibility, the delta query could see only a
subset of the base table rows, producing a corrupt stream table. Making the
trigger function SECURITY DEFINER (owned by the extension installer, typically
a superuser) ensures the delta query always has full visibility. The DML itself
is still subject to the user's own RLS policies — only the stream table
maintenance runs with elevated privileges.
The trigger functions also set search_path = pg_catalog, pgtrickle, pgtrickle_changes, public to prevent search_path hijacking — a security best
practice for all SECURITY DEFINER functions. The public schema is included
because the delta SQL references user tables that typically reside there.
Deployment & Operations
This section covers the operational aspects of running pg_trickle in production: background workers, upgrades, restarts, replicas, Kubernetes, partitioned tables, and multi-database deployments.
How many background workers does pg_trickle use?
pg_trickle uses a two-tier background worker model:
- Launcher (pg_trickle launcher) — one per cluster, static. Scans pg_database every ~10 seconds and spawns a per-database scheduler for every database where pg_trickle is installed. Automatically re-spawns schedulers that exit.
- Per-database scheduler (pg_trickle scheduler) — one dynamic worker per database with pg_trickle installed.
| Component | Workers | Notes |
|---|---|---|
| Launcher | 1 (static) | Cluster-wide; connects to postgres database |
| Scheduler | 1 per database (dynamic) | Persistent per database; drives all refreshes |
| Parallel refresh workers | 0–N per database | Only when pg_trickle.parallel_refresh_mode = 'on' |
| WAL decoder | 0 (shared) | Shares the scheduler's SPI connection |
| Manual refresh | 0 | Runs in the caller's session |
How do I size max_worker_processes?
When max_worker_processes is too low, the launcher silently fails to spawn schedulers for some databases and retries every 5 minutes. Those databases stop refreshing with no error in the stream table itself — you only see it in the PostgreSQL log:
WARNING: pg_trickle launcher: could not spawn scheduler for database 'mydb'
The minimum formula:
max_worker_processes ≥
1 (pg_trickle launcher)
+ N (one scheduler per database with pg_trickle installed)
+ max_dynamic_refresh_workers (only if parallel_refresh_mode = 'on'; default 4)
+ autovacuum_max_workers (default 3)
+ parallel query workers (max_parallel_workers_per_gather × concurrent queries)
+ slots for other extensions (logical replication launcher, etc.)
A practical starting point for a cluster with a handful of databases:
max_worker_processes = 32
This value requires a full PostgreSQL restart (not just reload).
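To see how much of the budget a running cluster already consumes, you can count non-client backends. This is a rough check (not every listed process occupies a worker slot), using only standard catalog views:

```sql
SHOW max_worker_processes;

-- Background processes currently running, grouped by type
SELECT backend_type, count(*)
FROM pg_stat_activity
WHERE backend_type <> 'client backend'
GROUP BY backend_type
ORDER BY count(*) DESC;
```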
How do I upgrade pg_trickle to a new version?
- Install the new shared library (replace the .so/.dylib file in PostgreSQL's lib directory).
- Run the upgrade SQL: ALTER EXTENSION pg_trickle UPDATE; This applies migration scripts (e.g., pg_trickle--0.2.1--0.2.2.sql) that update catalog tables, add new functions, and migrate data as needed.
- Restart PostgreSQL if the shared library changed (required for shared_preload_libraries changes).
- Verify: SELECT pgtrickle.version();
Zero-downtime upgrades are possible for minor versions (patch releases) that don't change the shared library. Just run ALTER EXTENSION pg_trickle UPDATE — no restart needed.
For detailed instructions, version-specific notes, rollback procedures, and troubleshooting, see the full Upgrading Guide.
How do I know if my shared library and SQL extension versions match?
The background worker checks for version mismatches at startup and logs a
WARNING if the compiled .so version differs from the installed SQL extension
version. You can also check manually:
-- Compiled .so version:
SELECT pgtrickle.version();
-- Installed SQL extension version:
SELECT extversion FROM pg_extension WHERE extname = 'pg_trickle';
If these differ, run ALTER EXTENSION pg_trickle UPDATE; and restart
PostgreSQL if prompted.
Are stream tables preserved during an upgrade?
Yes. ALTER EXTENSION pg_trickle UPDATE applies only additive schema
migrations (new columns, updated function signatures). Existing stream tables,
their data, refresh history, and CDC infrastructure are preserved. The
scheduler resumes normal operation after the upgrade completes.
For version-specific migration notes, see the Upgrading Guide — Version-Specific Notes.
What happens to stream tables during a PostgreSQL restart?
During a restart:
- The scheduler stops. No refreshes occur while PostgreSQL is down.
- CDC triggers are inactive. Source table writes during the restart window are captured when PostgreSQL comes back up (triggers are persistent DDL objects).
- On startup, the scheduler background worker starts, reads the catalog, rebuilds the DAG, and resumes refresh cycles from where it left off.
- Frontier reconciliation. The scheduler detects any gap between the stored frontier LSN and the current WAL position. Source changes that occurred between the last successful refresh and the restart are in the change buffers (for trigger-mode CDC) and will be processed in the first refresh cycle.
Net effect: Stream tables may be stale for the duration of the downtime, but no data is lost. The first refresh cycle after restart catches up automatically.
Can I use pg_trickle on a read replica / standby?
The scheduler does not run on standby servers. When pg_trickle detects it is running in recovery mode (pg_is_in_recovery() = true), the background worker enters a sleep loop and does not attempt any refreshes.
However, stream tables replicated from the primary are readable on the standby — they are regular heap tables and are replicated via physical (streaming) replication like any other table.
Pattern for read-heavy workloads:
- Run pg_trickle on the primary — it performs all refreshes.
- Query stream tables on the standby — read replicas get the latest refreshed data via streaming replication, with replication lag as the only additional delay.
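To confirm which role a given node plays, a quick check on any connection:

```sql
-- true on a standby: the scheduler sleeps there, and stream tables are
-- read-only copies maintained by the primary via streaming replication
SELECT pg_is_in_recovery();
```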
How does pg_trickle work with CloudNativePG / Kubernetes?
pg_trickle is compatible with CloudNativePG. The cnpg/ directory in the repository contains example manifests:
- Dockerfile.ext — builds a PostgreSQL image with pg_trickle pre-installed
- cluster-example.yaml — CloudNativePG Cluster manifest with
shared_preload_libraries = 'pg_trickle'
Key considerations:
- Include pg_trickle in shared_preload_libraries in the Cluster's postgresql configuration.
- The scheduler runs on the primary pod only. Replica pods detect recovery mode and sleep.
- Pod restarts are handled the same way as PostgreSQL restarts (see above).
- Persistent volume claims preserve catalog and change buffers across pod rescheduling.
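A minimal Cluster manifest sketch showing the relevant setting. The image name and storage size are placeholders; see cluster-example.yaml in the repository for the full version:

```yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: pg-with-trickle
spec:
  instances: 3
  imageName: ghcr.io/example/postgres-pg-trickle:18   # placeholder image
  postgresql:
    shared_preload_libraries:
      - pg_trickle
  storage:
    size: 10Gi
```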
Does pg_trickle work with partitioned source tables?
Yes. pg_trickle installs CDC triggers on the partitioned parent table, which PostgreSQL automatically propagates to all existing and future partitions. When a row is inserted into any partition, the trigger fires and writes the change to the buffer table.
Caveats:
- TRUNCATE on individual partitions fires the partition-level trigger, which is also captured.
- Attaching or detaching partitions (ALTER TABLE ... ATTACH/DETACH PARTITION) fires DDL event triggers, which may mark the stream table for reinitialization.
- Row movement between partitions (when the partition key is updated) is captured as a DELETE from the old partition and an INSERT into the new partition.
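A minimal sketch (table and column names are illustrative): the stream table is defined against the parent, and partitions are transparent to it:

```sql
CREATE TABLE events (
    id      BIGINT,
    day     DATE NOT NULL,
    payload TEXT
) PARTITION BY RANGE (day);

CREATE TABLE events_2025_01 PARTITION OF events
    FOR VALUES FROM ('2025-01-01') TO ('2025-02-01');

-- CDC triggers land on the parent and propagate to every partition,
-- including ones attached later
SELECT pgtrickle.create_stream_table(
    name => 'daily_event_counts',
    query => 'SELECT day, count(*) AS n FROM events GROUP BY day',
    schedule => '1m'
);
```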
Can I run pg_trickle in multiple databases on the same cluster?
Yes. Each database gets its own independent scheduler background worker, its own catalog tables, and its own change buffers. Stream tables in different databases do not interact.
Resource planning: Each database with stream tables requires 1 background worker slot in max_worker_processes. If you have many databases, the default of 8 is easily exhausted.
Important: When max_worker_processes is exhausted, the launcher silently skips databases it cannot spawn a scheduler for and retries every 5 minutes. This means stream tables in those databases stop refreshing with no visible error — they just go stale. Check the PostgreSQL log for:
WARNING: pg_trickle launcher: could not spawn scheduler for database 'mydb'
If you see this, increase max_worker_processes and restart PostgreSQL.
See How do I size max_worker_processes? for the full formula.
-- On each database where you want pg_trickle:
CREATE EXTENSION pg_trickle;
The extension must be created separately in each database — shared_preload_libraries loads the shared library cluster-wide, but the SQL objects (catalog tables, functions) are per-database.
Monitoring & Alerting
pg_trickle provides built-in monitoring views and NOTIFY-based alerting. This section explains the available views, alert events, and failure handling.
How do I list all stream tables in my database?
Several options depending on how much detail you need:
-- Quickest: name + status + mode + staleness
SELECT name, status, refresh_mode, is_populated, staleness
FROM pgtrickle.stream_tables_info;
-- Full stats: refresh counts, rows inserted/deleted, avg duration, error streaks
SELECT * FROM pgtrickle.pg_stat_stream_tables;
-- Live status including consecutive_errors and data_timestamp
SELECT * FROM pgtrickle.pgt_status();
-- Raw catalog (all persisted properties, no computed fields)
SELECT * FROM pgtrickle.pgt_stream_tables;
How do I inspect what pg_trickle is doing right now?
Quick status snapshot:
SELECT name, status, refresh_mode, consecutive_errors, staleness
FROM pgtrickle.pgt_status();
Deep dive into a specific stream table — shows the defining query, DVM operator tree, source tables, generated delta SQL, and current WAL frontier:
SELECT * FROM pgtrickle.explain_st('my_table');
Key properties returned:
| Property | Description |
|---|---|
| dvm_supported | Whether differential maintenance is possible for this query |
| operator_tree | How the DVM engine has decomposed the query |
| delta_query | The actual SQL executed during a differential refresh |
| frontier | Per-source LSN positions flushed at last refresh |
Recent refresh activity:
-- Last 10 refreshes for a stream table (action, status, rows, duration):
SELECT * FROM pgtrickle.get_refresh_history('my_table', 10);
-- Aggregate refresh stats for all stream tables:
SELECT * FROM pgtrickle.st_refresh_stats();
CDC and slot health:
-- Per-source CDC mode, WAL lag, and alerts:
SELECT * FROM pgtrickle.check_cdc_health();
-- Replication slot health (slot_name, active, lag_bytes):
SELECT * FROM pgtrickle.slot_health();
Real-time event stream:
LISTEN pg_trickle_alert;
-- Receives JSON payloads for: stale_data, auto_suspended, resumed,
-- reinitialize_needed, buffer_growth_warning, refresh_completed, refresh_failed
Pending change buffers (rows not yet consumed by a differential refresh):
SELECT stream_table, source_table, cdc_mode, pending_rows, buffer_bytes
FROM pgtrickle.change_buffer_sizes()
ORDER BY pending_rows DESC;
Are there convenience functions for inspecting source tables and CDC buffers?
Yes. pg_trickle provides two convenience functions that complement the existing monitoring suite:
pgtrickle.list_sources(name) — shows every source table a stream table depends on, the CDC mode each uses, and any column-level usage metadata:
SELECT * FROM pgtrickle.list_sources('order_totals');
-- Returns: source_table, source_oid, source_type, cdc_mode, columns_used
pgtrickle.change_buffer_sizes() — shows, for every tracked source table, how many CDC rows are pending (not yet consumed by a differential refresh) and the estimated on-disk size of the change buffer:
SELECT * FROM pgtrickle.change_buffer_sizes()
ORDER BY pending_rows DESC;
-- Returns: stream_table, source_table, source_oid, cdc_mode, pending_rows, buffer_bytes
A large pending_rows value for a source table means a differential refresh is overdue or stalled — use pgtrickle.get_refresh_history() to investigate.
Can I see a tree view of all stream table dependencies?
Yes. pgtrickle.dependency_tree() walks the dependency DAG and renders it as an indented ASCII tree:
SELECT tree_line, status, refresh_mode
FROM pgtrickle.dependency_tree();
Example output:
tree_line | status | refresh_mode
------------------------------------------+--------+--------------
report_summary | ACTIVE | DIFFERENTIAL
├── orders_by_region | ACTIVE | DIFFERENTIAL
│ ├── public.orders [src] | |
│ └── public.customers [src] | |
└── revenue_totals | ACTIVE | DIFFERENTIAL
└── public.orders [src] | |
Each row has node (qualified name), node_type (stream_table or source_table), depth, status, and refresh_mode. Source tables are shown as leaves tagged with [src].
What monitoring views are available?
| View | Description |
|---|---|
| pgtrickle.stream_tables_info | Status overview with computed staleness |
| pgtrickle.pg_stat_stream_tables | Comprehensive stats (refresh counts, avg duration, error streaks) |
How do I get alerted when something goes wrong?
pg_trickle sends PostgreSQL NOTIFY messages on the pg_trickle_alert channel with JSON payloads:
| Event | When |
|---|---|
| stale_data | Staleness exceeds 2× the schedule |
| auto_suspended | Stream table suspended after max consecutive errors |
| reinitialize_needed | Upstream DDL change detected |
| buffer_growth_warning | Change buffer growing unexpectedly |
| refresh_completed | Refresh completed successfully |
| refresh_failed | Refresh failed |
Listen with:
LISTEN pg_trickle_alert;
What happens when a stream table keeps failing?
After pg_trickle.max_consecutive_errors (default: 3) consecutive failures, the stream table moves to ERROR status and automatic refreshes stop. An auto_suspended NOTIFY alert is sent.
To recover:
-- Fix the underlying issue (e.g., restore a dropped source table), then:
SELECT pgtrickle.alter_stream_table('my_table', status => 'ACTIVE');
Retries use exponential backoff (base 1s, max 60s, ±25% jitter, up to 5 retries before counting as a real failure).
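Ignoring jitter, the retry delays can be sketched directly in SQL; this mirrors the documented policy, not pg_trickle's internal code:

```sql
-- base 1s, doubling per attempt, capped at 60s;
-- the real scheduler also applies ±25% jitter
SELECT attempt,
       LEAST(60, 2 ^ (attempt - 1)) AS delay_seconds
FROM generate_series(1, 5) AS attempt;
-- delays: 1, 2, 4, 8, 16 seconds
```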
Configuration Reference
All pg_trickle settings are configured via PostgreSQL GUC parameters. The table below lists the most-used parameters; for the complete reference see CONFIGURATION.md.
| GUC | Type | Default | Description |
|---|---|---|---|
| pg_trickle.enabled | bool | true | Enable/disable the scheduler. Manual refreshes still work when false. |
| pg_trickle.scheduler_interval_ms | int | 1000 | Scheduler wake interval in milliseconds (100–60000) |
| pg_trickle.min_schedule_seconds | int | 60 | Minimum allowed schedule duration (1–86400) |
| pg_trickle.max_consecutive_errors | int | 3 | Failures before auto-suspending (1–100) |
| pg_trickle.change_buffer_schema | text | pgtrickle_changes | Schema for CDC buffer tables |
| pg_trickle.max_concurrent_refreshes | int | 4 | Max parallel refresh workers (1–32) |
| pg_trickle.user_triggers | text | auto | User trigger handling: auto (detect), off (suppress), on (deprecated alias for auto) |
| pg_trickle.differential_max_change_ratio | float | 0.15 | Change ratio threshold for adaptive FULL fallback (0.0–1.0) |
| pg_trickle.cleanup_use_truncate | bool | true | Use TRUNCATE instead of DELETE for buffer cleanup |
All GUCs are SUSET context (superuser SET) and take effect without restart,
except shared_preload_libraries, which requires a PostgreSQL restart.
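A postgresql.conf fragment illustrating a typical setup. The values are examples, not recommendations:

```
# Load the shared library cluster-wide (restart required)
shared_preload_libraries = 'pg_trickle'

# Wake the scheduler twice per second
pg_trickle.scheduler_interval_ms = 500

# Tolerate a few transient failures before auto-suspending
pg_trickle.max_consecutive_errors = 5
```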
For SQL function signatures and descriptions see SQL_REFERENCE.md.
Troubleshooting
This section covers common problems and how to debug them. If your issue isn’t listed here, check the refresh history for error messages and the monitoring views for status information.
How do I diagnose stalled data flow through stream tables?
See also: Error Reference — comprehensive guide to all pg_trickle error variants with causes and fixes.
If data seems to have stopped flowing -- stream tables show stale results despite DML on the source tables -- follow this systematic diagnostic workflow. Each step narrows the problem from broad health checks down to specific root causes.
Step 0 -- Verify GUC configuration:
Misconfigured GUCs are a common and easy-to-miss cause of stalled or severely throttled data flow. Check all pg_trickle settings in one query:
SELECT name, setting, unit
FROM pg_settings
WHERE name LIKE 'pg_trickle.%'
OR name = 'max_worker_processes'
ORDER BY name;
Key values to check:
| GUC | Safe value | Problem if set to |
|---|---|---|
| pg_trickle.enabled | on | off -- stops all automatic refreshes |
| pg_trickle.tiered_scheduling | on (fine) | on with all STs at tier = 'frozen' -- silently skips them |
| pg_trickle.max_consecutive_errors | 3-10 | 1 -- one transient error suspends the ST permanently |
| pg_trickle.scheduler_interval_ms | 100-1000 | Very high (e.g. 60000) -- scheduler only wakes every 60 s |
| pg_trickle.auto_backoff | on | Fine normally, but if refreshes take >95% of schedule it silently stretches intervals up to 8x |
| pg_trickle.default_schedule_seconds | 1-60 | Very high -- isolated CALCULATED tables refresh very infrequently |
| max_worker_processes | >= 16 (typical) | Too low -- workers cannot be spawned; parallel mode silently stalls |
Also check whether any stream tables are frozen:
SELECT pgt_name, refresh_tier
FROM pgtrickle.pgt_stream_tables
WHERE refresh_tier = 'frozen';
Step 1 -- Quick health overview:
SELECT * FROM pgtrickle.health_check() WHERE severity != 'OK';
This single call checks scheduler status, error tables, stale tables, buffer
growth, replication slot lag, and the worker pool. Any WARN or ERROR row
tells you where to look next.
Step 2 -- Check stream table status and staleness:
SELECT name, status, refresh_mode, consecutive_errors, staleness
FROM pgtrickle.pgt_status()
ORDER BY staleness DESC NULLS FIRST;
Look for SUSPENDED status (auto-suspended after repeated errors), high
consecutive_errors, or unexpectedly large staleness.
Step 3 -- Check recent refresh activity:
SELECT start_time, stream_table, action, status, duration_ms, error_message
FROM pgtrickle.refresh_timeline(20);
If no recent rows appear, the scheduler may not be running. If rows show
ERROR, the error messages explain why refreshes are failing.
Step 4 -- Inspect errors for a specific stream table:
SELECT * FROM pgtrickle.diagnose_errors('my_stream_table');
Returns the last 5 FAILED refresh events with error classification and
suggested remediation steps.
Step 5 -- Check the CDC pipeline (are changes being captured?):
SELECT stream_table, source_table, cdc_mode, pending_rows, buffer_bytes
FROM pgtrickle.change_buffer_sizes()
ORDER BY pending_rows DESC;
- pending_rows = 0 everywhere: either no DML is happening on the source tables, or CDC triggers are missing.
- pending_rows growing but stream tables are not refreshing: scheduler or refresh problem (go back to Steps 1-3).
Step 6 -- Verify CDC triggers exist and are enabled:
SELECT source_table, trigger_type, trigger_name
FROM pgtrickle.trigger_inventory()
WHERE NOT present OR NOT enabled;
Any rows returned here mean change capture is broken for that source table -- DML changes are not being recorded.
Step 7 -- Check CDC slot health (WAL mode only):
SELECT * FROM pgtrickle.check_cdc_health();
Look for alert values like slot_lag_exceeds_threshold or
replication_slot_missing.
Step 8 -- Verify the dependency DAG:
SELECT tree_line, status, refresh_mode
FROM pgtrickle.dependency_tree();
Confirms the dependency graph is wired as expected. A missing edge means upstream changes will not propagate to downstream stream tables.
Step 9 -- Check the parallel worker pool (if using parallel mode):
SELECT * FROM pgtrickle.worker_pool_status();
SELECT job_id, unit_key, status, duration_ms
FROM pgtrickle.parallel_job_status(300)
WHERE status NOT IN ('SUCCEEDED');
Common root causes at a glance:
| Symptom | Diagnostic function | Likely root cause |
|---|---|---|
| No refreshes happening at all | health_check -> scheduler_running | Background worker not running or pg_trickle.enabled = off |
| Stream table in SUSPENDED status | pgt_status | Repeated errors hit max_consecutive_errors threshold |
| Zero pending changes despite DML | trigger_inventory | CDC trigger was dropped or disabled by DDL |
| WAL slot missing or lagging | check_cdc_health, slot_health | Replication slot dropped, or WAL retention exceeded |
| Buffers growing but no refreshes | change_buffer_sizes + refresh_timeline | Scheduler stalled, refresh failing, or lock contention |
| Upstream changes not propagating | dependency_tree | Upstream ST not connected in the DAG |
Unit tests crash with symbol not found in flat namespace on macOS 26+
macOS 26 (Tahoe) changed the dynamic linker (dyld) to eagerly resolve all
flat-namespace symbols at binary load time. pgrx extensions link PostgreSQL
server symbols (e.g. CacheMemoryContext, SPI_connect) with
-Wl,-undefined,dynamic_lookup, which previously resolved lazily. Since
cargo test --lib runs outside the postgres process, those symbols are
missing and the test binary aborts:
dyld[66617]: symbol not found in flat namespace '_CacheMemoryContext'
Use just test-unit — it automatically detects macOS 26+ and injects a
stub library (libpg_stub.dylib) via DYLD_INSERT_LIBRARIES. The stub
provides NULL/no-op definitions for the ~28 PostgreSQL symbols; they are never
called during unit tests (pure Rust logic only).
This does not affect integration tests, E2E tests, just lint,
just build, or the extension running inside PostgreSQL.
See the Installation Guide for details and manual usage.
My stream table is stuck in INITIALIZING status
The initial full refresh may have failed. Check:
SELECT * FROM pgtrickle.get_refresh_history('my_table', 5);
If the error is transient, retry with:
SELECT pgtrickle.refresh_stream_table('my_table');
My stream table shows stale data but the scheduler is running
Common causes:
- TRUNCATE on source table — bypasses CDC triggers. Manual refresh needed.
- Too many errors — check consecutive_errors in pgtrickle.pg_stat_stream_tables. Resume with ALTER ... status => 'ACTIVE'.
- Long-running refresh — check for lock contention or slow defining queries.
- Scheduler disabled — verify SHOW pg_trickle.enabled; returns on.
I get "cycle detected" when creating a stream table
Stream tables cannot have circular dependencies. If stream table A depends on stream table B and B depends on A (either directly or through a chain of intermediate stream tables), pg_trickle rejects the creation with a clear error message listing the cycle path.
To resolve this, restructure your queries to eliminate the circular reference. Common patterns:
- Extract the shared logic into a single base stream table that both A and B reference.
- Use a regular view instead of a stream table for one side of the dependency.
- Merge the two queries into a single stream table if possible.
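For example, the first pattern replaces the A↔B cycle with a shared base (names illustrative):

```sql
-- Shared base both sides read from; the cycle becomes a simple fan-out
SELECT pgtrickle.create_stream_table(
    name => 'base_orders',
    query => 'SELECT id, customer_id, amount, status FROM orders',
    schedule => '1m'
);
-- Stream tables A and B are then each defined over base_orders only.
```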
A source table was altered and my stream table stopped refreshing
pg_trickle detects DDL changes (column additions, drops, type changes) via event triggers and marks affected stream tables with needs_reinit = true. The next scheduler cycle will attempt to reinitialize the stream table — drop the storage table, recreate it from the current defining query schema, and perform a full refresh.
If the schema change breaks the defining query (e.g., a column referenced in the query was dropped or renamed), the reinitialization will fail repeatedly until the stream table hits max_consecutive_errors and enters ERROR status.
To fix it: Update the defining query and recreate the stream table:
SELECT pgtrickle.drop_stream_table('order_totals');
SELECT pgtrickle.create_stream_table(
name => 'order_totals',
query => 'SELECT id, name FROM orders', -- updated query reflecting new schema
schedule => '1m',
refresh_mode => 'DIFFERENTIAL'
);
Check the refresh history for the specific error message:
SELECT * FROM pgtrickle.get_refresh_history('order_totals', 5);
How do I see the delta query generated for a stream table?
SELECT pgtrickle.explain_st('order_totals');
This shows the DVM operator tree, source tables, and the generated delta SQL.
How do I interpret the refresh history?
The pgtrickle.get_refresh_history() function returns the most recent refresh records for a stream table:
SELECT * FROM pgtrickle.get_refresh_history('order_totals', 10);
Key columns:
| Column | Meaning |
|---|---|
| `action` | Refresh type: FULL, DIFFERENTIAL, TOPK, IMMEDIATE, or REINITIALIZE |
| `rows_inserted` | Rows added to the stream table in this cycle |
| `rows_deleted` | Rows removed from the stream table in this cycle |
| `rows_updated` | Rows modified in the stream table (for the explicit DML path) |
| `duration_ms` | Wall-clock time for the refresh |
| `error_message` | NULL for success; error text for failures |
| `source_changes` | Number of pending change records processed |
| `fallback_reason` | If DIFFERENTIAL fell back to FULL: `change_ratio_exceeded`, `truncate_detected`, or `reinitialize` |
Patterns to look for:
- High `rows_inserted` + `rows_deleted` with low `source_changes` → possible duplicate rows (keyless source tables)
- `fallback_reason = 'change_ratio_exceeded'` frequently → consider lowering the threshold or switching to FULL mode
- Increasing `duration_ms` over time → index maintenance or buffer bloat; consider VACUUM or checking for missing indexes
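A quick roll-up over recent history can surface these patterns — a sketch using the columns above (the stream-table name is illustrative):

```sql
-- Count refresh outcomes and average duration over the last 50 cycles:
SELECT action,
       fallback_reason,
       count(*)                AS refreshes,
       round(avg(duration_ms)) AS avg_ms
FROM pgtrickle.get_refresh_history('order_totals', 50)
GROUP BY action, fallback_reason
ORDER BY refreshes DESC;
```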
How can I tell if the scheduler is running?
Several ways to verify:
1. Check the background worker:
SELECT pid, datname, backend_type, state, query
FROM pg_stat_activity
WHERE backend_type = 'pg_trickle scheduler';
If no rows are returned, the scheduler is not running. Common causes:
- `pg_trickle.enabled = false`
- Extension not in `shared_preload_libraries`
- `max_worker_processes` exhausted — the launcher silently skips databases it cannot accommodate and retries every 5 minutes. Check the PostgreSQL log for `WARNING: pg_trickle launcher: could not spawn scheduler for database '...'`.
2. Check recent refresh activity:
SELECT MAX(refreshed_at) AS last_refresh
FROM pgtrickle.pgt_stream_tables
WHERE status = 'ACTIVE';
If the last refresh was long ago relative to the shortest schedule, the scheduler may be stuck.
3. Check PostgreSQL logs:
The scheduler logs startup and shutdown messages at LOG level:
LOG: pg_trickle scheduler started for database "mydb"
LOG: pg_trickle scheduler shutting down (SIGTERM)
How do I debug a stream table that shows stale data?
Follow this diagnostic checklist:
- Is the scheduler running? (See above)
- Is the stream table active?
  SELECT pgt_name, status, consecutive_errors FROM pgtrickle.pg_stat_stream_tables WHERE pgt_name = 'my_st';
  If status is `ERROR` or `SUSPENDED`, the stream table has been auto-suspended after repeated failures.
- Are there pending changes?
  SELECT COUNT(*) FROM pgtrickle_changes.changes_<source_oid>;
  If zero, the source table may not have CDC triggers (check `SELECT tgname FROM pg_trigger WHERE tgrelid = '<source_oid>'`).
- Is the refresh failing silently?
  SELECT * FROM pgtrickle.get_refresh_history('my_st', 5);
  Check for error messages.
- Is there lock contention? Long-running transactions holding locks on the source or stream table can block refreshes. Check `pg_locks` and `pg_stat_activity`.
What does the needs_reinit flag mean and how do I clear it?
The needs_reinit flag in pgtrickle.pgt_stream_tables indicates that the stream table's physical storage needs to be rebuilt — typically because a source table's schema changed.
When needs_reinit = true:
- The scheduler skips normal differential/full refresh.
- Instead, it performs a reinitialize: drop the storage table, recreate it from the current defining query schema, and populate with a full refresh.
- If reinitialization succeeds, `needs_reinit` is cleared automatically.
If reinitialization keeps failing (e.g., the defining query references a dropped column):
-- Fix the underlying issue first, then clear manually:
UPDATE pgtrickle.pgt_stream_tables SET needs_reinit = false WHERE pgt_name = 'my_st';
-- Or drop and recreate:
SELECT pgtrickle.drop_stream_table('my_st');
SELECT pgtrickle.create_stream_table(
name => 'my_st',
query => 'SELECT ...',
schedule => '1m',
refresh_mode => 'DIFFERENTIAL'
);
Why Are These SQL Features Not Supported?
This section gives detailed technical explanations for each SQL limitation. pg_trickle follows the principle of "fail loudly rather than produce wrong data" — every unsupported feature is detected at stream-table creation time and rejected with a clear error message and a suggested rewrite.
For all of these, returning an explicit error is a deliberate design choice: the alternative would be silently producing incorrect results after a refresh, which is far harder to diagnose.
How does NATURAL JOIN work?
NATURAL JOIN is fully supported. At parse time, pg_trickle resolves the common columns between the two tables (using OpTree::output_columns()) and synthesizes explicit equi-join conditions. This supports INNER, LEFT, RIGHT, and FULL NATURAL JOIN variants.
Internally, NATURAL JOIN is converted to an explicit JOIN ... ON before the DVM engine builds its operator tree, so delta computation works identically to a manually specified equi-join.
Note: The internal `__pgt_row_id` column is excluded from common column resolution, so NATURAL JOINs between stream tables work correctly.
How do GROUPING SETS, CUBE, and ROLLUP work?
GROUPING SETS, CUBE, and ROLLUP are fully supported via an automatic parse-time rewrite. pg_trickle decomposes these constructs into a UNION ALL of separate GROUP BY queries before the DVM engine processes the query.
Explosion guard:
`CUBE(N)` generates $2^N$ branches. pg_trickle rejects CUBE/ROLLUP combinations that would produce more than 64 branches to prevent runaway memory usage. Use explicit `GROUPING SETS (...)` instead.
For example:
-- This defining query:
SELECT dept, region, SUM(amount) FROM sales GROUP BY CUBE(dept, region)
-- Is automatically rewritten to:
SELECT dept, region, SUM(amount) FROM sales GROUP BY dept, region
UNION ALL
SELECT dept, NULL::text, SUM(amount) FROM sales GROUP BY dept
UNION ALL
SELECT NULL::text, region, SUM(amount) FROM sales GROUP BY region
UNION ALL
SELECT NULL::text, NULL::text, SUM(amount) FROM sales
GROUPING() function calls are replaced with integer literal constants corresponding to the grouping level. The rewrite is transparent — the DVM engine sees only standard GROUP BY + UNION ALL operators and can apply incremental delta computation to each branch independently.
How does DISTINCT ON (…) work?
DISTINCT ON is fully supported via an automatic parse-time rewrite. pg_trickle transparently transforms DISTINCT ON into a ROW_NUMBER() window function subquery:
-- This defining query:
SELECT DISTINCT ON (dept) dept, employee, salary
FROM employees ORDER BY dept, salary DESC
-- Is automatically rewritten to:
SELECT dept, employee, salary FROM (
SELECT dept, employee, salary,
ROW_NUMBER() OVER (PARTITION BY dept ORDER BY salary DESC) AS rn
FROM employees
) sub WHERE rn = 1
The rewrite happens before DVM parsing, so the operator tree sees a standard window function query and can apply partition-based recomputation for incremental delta maintenance.
Why is TABLESAMPLE rejected?
TABLESAMPLE returns a random subset of rows from a table (e.g., FROM orders TABLESAMPLE BERNOULLI(10) gives ~10% of rows).
Stream tables materialize the complete result set of the defining query and keep it up-to-date across refreshes. Baking a random sample into the defining query is not meaningful because:
- **Non-determinism.** Each refresh would sample different rows, making the stream table contents unstable and unpredictable. The delta between refreshes would be dominated by sampling noise, not actual data changes.
- **CDC incompatibility.** The trigger-based change-capture system tracks specific row-level changes (inserts, updates, deletes). A `TABLESAMPLE` defining query has no stable row identity — the "changed rows" concept doesn't apply when the entire sample shifts each cycle.
Rewrite:
-- Instead of sampling in the defining query:
SELECT * FROM orders TABLESAMPLE BERNOULLI(10)
-- Materialize the full result and sample when querying:
SELECT * FROM order_stream_table WHERE random() < 0.1
Why is LIMIT / OFFSET rejected?
Stream tables materialize the complete result set and keep it synchronized with source data. Bare LIMIT/OFFSET (without a recognized pattern) would truncate the result:
- **Undefined ordering.** `LIMIT` without `ORDER BY` returns an arbitrary subset.
- **Delta instability.** When source rows change, the boundary between "in the LIMIT" and "out of the LIMIT" shifts. A single INSERT could evict one row and admit another, requiring the refresh to track the full ordered position of every row.
- **Semantic mismatch.** Users who write `LIMIT 100` typically want to limit what they read, not what is stored.
Exception — TopK pattern: When the defining query has a top-level ORDER BY … LIMIT N (constant integer, optionally with OFFSET M), pg_trickle recognizes this as a TopK query and accepts it. The ORDER BY clause is required — bare LIMIT without ORDER BY is always rejected because it selects an arbitrary subset. With ORDER BY, the top-N boundary is well-defined and the stream table stores exactly those N rows (starting from position M+1 if OFFSET is specified). See the TopK section for details.
Rewrite (when TopK doesn't apply):
-- Instead of:
'SELECT * FROM orders ORDER BY created_at DESC LIMIT 100'
-- Omit LIMIT from the defining query, apply when reading:
SELECT * FROM orders_stream_table ORDER BY created_at DESC LIMIT 100
Why are window functions in expressions rejected?
Window functions like ROW_NUMBER() OVER (…) are supported as standalone columns in stream tables. However, embedding a window function inside an expression — such as CASE WHEN ROW_NUMBER() OVER (...) = 1 THEN ... or SUM(x) OVER (...) + 1 — is rejected.
This restriction exists because:
- **Partition-based recomputation.** pg_trickle's differential mode handles window functions by recomputing entire partitions that were affected by changes. When a window function is buried inside an expression, the DVM engine cannot isolate the window computation from the surrounding expression, making it impossible to correctly identify which partitions to recompute.
- **Expression tree ambiguity.** The DVM parser would need to differentiate the outer expression (arithmetic, `CASE`, etc.) while treating the inner window function specially. This creates a combinatorial explosion of differentiation rules for every possible expression type × window function combination.
Rewrite:
-- Instead of:
SELECT id, CASE WHEN ROW_NUMBER() OVER (PARTITION BY dept ORDER BY salary DESC) = 1
THEN 'top' ELSE 'other' END AS rank_label
FROM employees
-- Move window function to a separate column, then use a wrapping stream table:
-- ST1:
SELECT id, dept, salary,
ROW_NUMBER() OVER (PARTITION BY dept ORDER BY salary DESC) AS rn
FROM employees
-- ST2 (references ST1):
SELECT id, CASE WHEN rn = 1 THEN 'top' ELSE 'other' END AS rank_label
FROM pgtrickle.employees_ranked
Why is FOR UPDATE / FOR SHARE rejected?
FOR UPDATE and related locking clauses (FOR SHARE, FOR NO KEY UPDATE, FOR KEY SHARE) acquire row-level locks on selected rows. This is incompatible with stream tables because:
- **Refresh semantics.** Stream table contents are managed by the refresh engine using bulk `MERGE` operations. Row-level locks taken during the defining query would conflict with the refresh engine's own locking strategy.
- **No direct DML.** Since users cannot directly modify stream table rows, there is no use case for locking rows inside the defining query. The locks would be held for the duration of the refresh transaction and then released, serving no purpose.
How does ALL (subquery) work?
ALL (subquery) comparisons (e.g., WHERE x > ALL (SELECT y FROM t)) are supported via an automatic rewrite to NOT EXISTS. For example, x > ALL (SELECT y FROM t) is rewritten to NOT EXISTS (SELECT 1 FROM t WHERE y >= x), which pg_trickle handles via its anti-join operator.
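Concretely, the rewrite looks like this (table and column names are hypothetical):

```sql
-- Defining query as written:
SELECT * FROM products p
WHERE p.price > ALL (SELECT price FROM competitors);

-- Equivalent form pg_trickle maintains internally (anti-join via NOT EXISTS):
SELECT * FROM products p
WHERE NOT EXISTS (SELECT 1 FROM competitors c WHERE c.price >= p.price);
```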
Why is ORDER BY silently discarded?
ORDER BY in the defining query is accepted but ignored. This is consistent with how PostgreSQL treats CREATE MATERIALIZED VIEW AS SELECT ... ORDER BY ... — the ordering is not preserved in the stored data.
Stream tables are heap tables with no guaranteed row order. The ORDER BY in the defining query would only affect the order of the initial INSERT, which has no lasting effect. Apply ordering when querying the stream table:
-- This ORDER BY is meaningless in the defining query:
'SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY total DESC'
-- Instead, order when reading:
SELECT * FROM regional_totals ORDER BY total DESC
Why are unsupported aggregates (CORR, COVAR_*, REGR_*) limited to FULL mode?
Regression aggregates like CORR, COVAR_POP, COVAR_SAMP, and the REGR_* family require maintaining running sums of products and squares across the entire group. Unlike COUNT/SUM/AVG (where deltas can be computed from the change alone) or group-rescan aggregates (where only affected groups are re-read), regression aggregates:
- **Lack algebraic delta rules.** There is no closed-form way to update a correlation coefficient from a single row change without access to the full group's data.
- **Would degrade to group-rescan anyway.** Even if supported, the implementation would need to rescan the full group from source — identical to FULL mode for most practical group sizes.
These aggregates work fine in FULL refresh mode, which re-runs the entire query from scratch each cycle.
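For example, a correlation aggregate can still be materialized by forcing FULL mode explicitly (table and column names are hypothetical):

```sql
SELECT pgtrickle.create_stream_table(
    name => 'price_volume_corr',
    query => 'SELECT region, CORR(price, volume) AS pv_corr
              FROM trades GROUP BY region',
    schedule => '5m',
    refresh_mode => 'FULL'  -- regression aggregates require full refresh
);
```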
Why Are These Stream Table Operations Restricted?
Stream tables are regular PostgreSQL heap tables under the hood, but their contents are managed exclusively by the refresh engine. This section explains why certain operations that work on ordinary tables are disallowed or unsupported on stream tables, and what to do instead.
Why can't I INSERT, UPDATE, or DELETE rows in a stream table?
Stream table contents are the output of the refresh engine — they represent the materialized result of the defining query at a specific point in time. Direct DML would corrupt this contract in several ways:
- **Row ID integrity.** Every row has a `__pgt_row_id` (a 64-bit xxHash of the group-by key or all columns). The refresh engine uses this for delta `MERGE` — matching incoming deltas against existing rows. A manually inserted row with an incorrect or duplicate `__pgt_row_id` would cause the next differential refresh to produce wrong results (double-counting, missed deletes, or merge conflicts).
- **Frontier inconsistency.** Each refresh records a frontier — a set of per-source LSN positions that represent "data up to this point has been materialized." A manual DML change is not tracked by any frontier. The next differential refresh would either overwrite the change (if the delta touches the same row) or leave the stream table in a state that doesn't match any consistent point-in-time snapshot of the source data.
- **Change buffer desync.** The CDC triggers on source tables write changes to buffer tables. The refresh engine reads these buffers and advances the frontier. Manual DML on the stream table bypasses this pipeline entirely — the buffer and frontier have no record of the change, so future refreshes cannot account for it.
If you need to post-process stream table data, create a view or a second stream table that references the first one.
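For example, instead of updating rows in place, a second stream table can derive the post-processed form (names and thresholds are hypothetical):

```sql
SELECT pgtrickle.create_stream_table(
    name => 'order_totals_tiered',
    query => 'SELECT *, CASE WHEN total > 1000 THEN ''vip''
                             ELSE ''standard'' END AS tier
              FROM order_totals',
    schedule => '1m'
);
```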
Why can't I add foreign keys to or from a stream table?
Foreign key constraints require that referenced/referencing rows exist at the time of each DML statement. The refresh engine violates this assumption:
- **Bulk `MERGE` ordering.** A differential refresh executes a single `MERGE INTO` statement that applies all deltas (inserts and deletes) atomically. PostgreSQL evaluates FK constraints row-by-row within this `MERGE`. If a parent row is deleted and a new parent inserted in the same delta batch, the child FK check may fail because it sees the delete before the insert — even though the final state would be consistent.
- **Full refresh uses `TRUNCATE` + `INSERT`.** In FULL mode, the refresh engine truncates the stream table and re-inserts all rows. `TRUNCATE` does not fire individual `DELETE` triggers and bypasses FK cascade logic, which would leave referencing tables with dangling references.
- **Cross-table refresh ordering.** If stream table A has an FK referencing stream table B, both tables refresh independently (in topological order, but in separate transactions). There is no guarantee that A's refresh sees B's latest data — the FK constraint could transiently fail between refreshes.
Workaround: Enforce referential integrity in the consuming application or use a view that joins the stream tables and validates the relationship.
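One hedged sketch of such a validation view — table and column names are hypothetical:

```sql
-- Rows returned here are "children" whose parent row is missing,
-- i.e. transient referential-integrity violations between refreshes:
CREATE VIEW orphaned_items AS
SELECT i.*
FROM item_totals i
LEFT JOIN order_totals o ON o.order_id = i.order_id
WHERE o.order_id IS NULL;
```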
How do user-defined triggers work on stream tables?
When a DIFFERENTIAL mode stream table has user-defined row-level triggers, the refresh engine uses explicit DML decomposition instead of MERGE:
- **Delta materialized once.** The delta query result is stored in a temporary table (`__pgt_delta_<id>`) to avoid evaluating it three times.
- **DELETE removed rows.** Rows in the stream table whose `__pgt_row_id` is absent from the delta are deleted. `AFTER DELETE` triggers fire with correct `OLD` values.
- **UPDATE changed rows.** Rows whose `__pgt_row_id` exists in both the stream table and delta but whose values differ (checked via `IS DISTINCT FROM`) are updated. `AFTER UPDATE` triggers fire with correct `OLD` and `NEW`. No-op updates (where values are identical) are skipped, preventing spurious triggers.
- **INSERT new rows.** Rows in the delta whose `__pgt_row_id` is absent from the stream table are inserted. `AFTER INSERT` triggers fire with correct `NEW` values.
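Conceptually, the three statements are equivalent to the following simplified sketch — a hypothetical stream table `my_st` with a single value column `val`; the real implementation handles arbitrary columns internally:

```sql
-- 1. DELETE rows no longer present in the delta:
DELETE FROM my_st s
WHERE NOT EXISTS (SELECT 1 FROM __pgt_delta_1 d
                  WHERE d.__pgt_row_id = s.__pgt_row_id);

-- 2. UPDATE rows whose values actually changed (no-op updates skipped):
UPDATE my_st s
SET val = d.val
FROM __pgt_delta_1 d
WHERE d.__pgt_row_id = s.__pgt_row_id
  AND s.val IS DISTINCT FROM d.val;

-- 3. INSERT rows that are new in the delta:
INSERT INTO my_st
SELECT d.* FROM __pgt_delta_1 d
WHERE NOT EXISTS (SELECT 1 FROM my_st s
                  WHERE s.__pgt_row_id = d.__pgt_row_id);
```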
FULL refresh behavior: Row-level user triggers are automatically suppressed during FULL refresh via DISABLE TRIGGER USER / ENABLE TRIGGER USER. A NOTIFY pgtrickle_refresh is emitted so listeners know a FULL refresh occurred. Use REFRESH MODE DIFFERENTIAL for stream tables that need per-row trigger semantics.
Performance: The explicit DML path adds ~25–60% overhead compared to MERGE for triggered stream tables. Stream tables without user triggers have zero overhead (only a fast pg_trigger check, <0.1 ms).
Control: The pg_trickle.user_triggers GUC controls this behavior:
- `auto` (default): detect user triggers automatically
- `off`: always use MERGE, suppressing triggers
- `on`: deprecated compatibility alias for `auto`
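For example, to force the `MERGE` path regardless of user triggers:

```sql
ALTER SYSTEM SET pg_trickle.user_triggers = 'off';
SELECT pg_reload_conf();
```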
Why can't I ALTER TABLE a stream table directly?
Stream table metadata (defining query, schedule, refresh mode) is stored in the pg_trickle catalog (pgtrickle.pgt_stream_tables). A direct ALTER TABLE would change the physical table without updating the catalog, causing:
- **Column mismatch.** If you add or remove columns, the refresh engine's cached delta query and `MERGE` statement would reference columns that no longer exist (or miss new ones), causing runtime errors.
- **`__pgt_row_id` invalidation.** The row ID hash is computed from the defining query's output columns. Altering the table schema without updating the defining query would make existing row IDs inconsistent with the new column set.
Use pgtrickle.alter_stream_table() to change schedule, refresh mode, or status. To change the defining query or column structure, drop and recreate the stream table.
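For example, to change only the schedule — a sketch; the parameter names follow the `create_stream_table` convention and may differ:

```sql
SELECT pgtrickle.alter_stream_table(
    name => 'order_totals',
    schedule => '5m'
);
```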
Why can't I TRUNCATE a stream table?
TRUNCATE removes all rows instantly but does not update the pg_trickle frontier or change buffers. After a TRUNCATE:
- **Differential refresh sees no changes.** The frontier still records the last-processed LSN. If no new source changes have occurred, the next differential refresh produces an empty delta — leaving the stream table empty even though the source still has data.
- **No recovery path for differential mode.** The refresh engine has no way to detect that the stream table was externally truncated. It assumes the current contents match the frontier.
Use pgtrickle.refresh_stream_table('my_table') to force a full re-materialization, or drop and recreate the stream table if you need a clean slate.
What are the memory limits for delta processing?
The differential refresh path executes the delta query as a single SQL statement. For large batches (e.g., a bulk UPDATE of 10M rows), PostgreSQL may attempt to materialize the entire delta result set in memory. If the delta exceeds work_mem, PostgreSQL will spill to temporary files on disk, which is slower but safe. In extreme cases, OOM (out of memory) can occur if work_mem is set very high and the delta is enormous.
Mitigations:
- **Adaptive fallback.** The `pg_trickle.differential_max_change_ratio` GUC (default 0.15) automatically triggers a FULL refresh when the ratio of pending changes to total rows exceeds the threshold. This prevents large deltas from consuming excessive memory.
- **`work_mem` tuning.** PostgreSQL's `work_mem` setting controls how much memory each sort/hash operation uses before spilling to disk. For pg_trickle workloads, 64–256 MB is typical. Monitor `temp_blks_written` in `pg_stat_statements` to detect spilling.
- **`pg_trickle.merge_work_mem_mb` GUC.** Sets a session-level `work_mem` override during MERGE execution (default: 0 = use the global `work_mem`). This allows higher memory for refresh without affecting other queries.
- **Monitoring.** If `pg_stat_statements` is installed, pg_trickle logs a warning when the MERGE query writes temporary blocks to disk.
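An illustrative tuning pass might look like this (the values are examples, not recommendations):

```sql
-- Fall back to FULL refresh sooner for large change batches:
ALTER SYSTEM SET pg_trickle.differential_max_change_ratio = 0.10;
-- Give the MERGE step its own memory budget (in MB):
ALTER SYSTEM SET pg_trickle.merge_work_mem_mb = 256;
SELECT pg_reload_conf();
```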
Why are refreshes processed sequentially by default?
The default (parallel_refresh_mode = 'off') is sequential because it is simple, correct, and efficient for most workloads. Topological ordering guarantees upstream stream tables refresh before downstream ones with no coordination overhead.
When to consider enabling parallel refresh:
- Your database has many independent stream tables (no shared dependencies).
- Total cycle time equals the sum of all refresh durations, and some refreshes visibly block unrelated ones.
- You have enough `max_worker_processes` headroom (each parallel worker uses one slot).
Enabling parallel refresh (v0.4.0+):
ALTER SYSTEM SET pg_trickle.parallel_refresh_mode = 'on';
SELECT pg_reload_conf();
With parallel_refresh_mode = 'on', the scheduler builds an execution-unit DAG and dispatches independent units to dynamic background workers. Atomic consistency groups and IMMEDIATE-trigger closures remain single-worker for correctness.
See CONFIGURATION.md — Parallel Refresh for tuning guidance including the worker-budget formula.
How many connections does pg_trickle use?
pg_trickle uses the following PostgreSQL connections:
| Component | Connections | When |
|---|---|---|
| Background scheduler | 1 | Always (per database with STs) |
| WAL decoder polling | 0 (shared) | Uses the scheduler's SPI connection |
| Manual refresh | 1 | Per-call (uses caller's session) |
Total: 1 persistent connection per database. WAL decoder polling shares the scheduler's SPI connection rather than opening separate connections.
`max_worker_processes`: pg_trickle registers 1 background worker per database during `_PG_init()`. Ensure `max_worker_processes` (default 8) has room for the pg_trickle worker plus any other extensions.
Advisory locks: The scheduler holds a session-level advisory lock per actively-refreshing ST. These are released immediately after each refresh completes.
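Advisory locks are visible in the standard catalog:

```sql
-- Session-level advisory locks currently held (including the scheduler's):
SELECT locktype, classid, objid, pid, granted
FROM pg_locks
WHERE locktype = 'advisory';
```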
See also: Troubleshooting · Error Reference · Configuration · SQL Reference · Glossary
What's New
A curated, plain-language summary of recent pg_trickle releases — the bits a human reader actually wants to see. For the full exhaustive list of changes per release, see the Changelog.
v0.34 — Citus self-driving (April 2026)
The Citus integration grew up. The per-worker WAL slot lifecycle — creation, polling, lease management, recovery from rebalances — now runs automatically. There is no manual wiring left for distributed sources.
- Per-worker slot lifecycle fully automated (CITUS)
- Shard-rebalance auto-recovery
- Worker failure isolation with retry budget
v0.33 — DAG observability + worker-pool quotas
- Per-database worker quotas keep one busy database from starving the rest (SCALING)
- New cluster-wide health view (MULTI_DATABASE)
v0.32 — Citus distributed sources & outputs
- Stream tables can read from Citus-distributed source tables
- `output_distribution_column` produces co-located distributed stream tables
v0.28 — Transactional Outbox & Inbox
- First-class outbox and inbox patterns built on stream tables (OUTBOX · INBOX)
- Consumer groups, lag tracking, and dead-letter queues out of the box
v0.27 — Snapshots & SLA-based scheduling
- Snapshots of stream-table contents — point-in-time copies for backup, replica bootstrap, and rollback
- `recommend_schedule` and predicted-SLA-breach alerts
- PITR alignment guidance for replica bootstrap
v0.22 — Downstream Publications
- Any stream table can be exposed as a PostgreSQL logical publication. Debezium, Kafka Connect, Spark Structured Streaming, a downstream PostgreSQL replica — all subscribe to pg_trickle's incrementally-computed diffs without extra pipelines (PUBLICATIONS)
- `set_stream_table_sla` introduces freshness deadlines
v0.14 — AUTO mode by default + ergonomic warnings
- `refresh_mode = 'AUTO'` is the new default
- `create_stream_table` warns on common anti-patterns (low-cardinality aggregates, non-deterministic queries)
v0.13 — Delta SQL profiling
- `pgtrickle.explain_delta`, `dedup_stats`, `shared_buffer_stats` — visibility into what the engine is actually doing per refresh
v0.12 — Tiered scheduling on by default
- Hot/Warm/Cold/Frozen tiers, enabled by default, dramatically reduce scheduler overhead at scale (Tiered Scheduling tutorial)
v0.10 — Production-readiness floor
- Crash recovery, fuse circuit breaker, monitoring views, structured errors with SQLSTATE codes (ERRORS · TROUBLESHOOTING)
v0.9 — Algebraic aggregate maintenance
- `AVG`, `STDDEV`, and `COUNT(DISTINCT)` maintained from auxiliary state — no group-rescan needed in the common case
v0.7 — Watermarks and circular DAGs
- Watermark gating for ETL pipelines
- Monotone cycles supported with explicit `pg_trickle.allow_circular` (Circular Dependencies)
- Prometheus / Grafana observability (integrations/prometheus)
v0.6 — Partitioned source tables and idempotent DDL
- Stream tables can now read from partitioned source tables; all partitions are tracked automatically without extra configuration
- `create_stream_table` and `drop_stream_table` are idempotent — safe to call from migration scripts and `IF NOT EXISTS` guards
- Circular dependency detection with a hard gate: cycles in the DAG raise a clear error with the offending chain listed
v0.5 — Row-level security and ETL bootstrap gating
- RLS policies on source tables are respected during the defining query's first FULL refresh; incremental refreshes maintain the same visibility contract
- ETL bootstrap gate: a stream table can be held in SUSPENDED state until an external ETL load completes, then released atomically
- `pgtrickle.pgt_status()` view expanded with per-table health indicators
v0.4 — Parallel refresh
- `parallel_refresh_mode = 'on'` dispatches independent stream tables across a worker pool (SCALING)
v0.3 — HAVING, FULL OUTER JOIN, and correlated subqueries
- `HAVING` clauses are now maintained differentially — no more falling back to FULL refresh when a GROUP BY result is post-filtered
- `FULL OUTER JOIN` supported in DIFFERENTIAL mode using an 8-part UNION ALL delta strategy
- Correlated subqueries in the SELECT list maintained with a pre/post snapshot EXCEPT ALL diff
v0.2 — IMMEDIATE mode + TopK
- `IMMEDIATE` refresh mode: maintain stream tables inside the source DML's transaction
- TopK stream tables: `ORDER BY x LIMIT N`
- `ALTER QUERY` — change the defining query online
v0.1 — Differential foundation
- Trigger-based CDC captures every INSERT, UPDATE, and DELETE into per-table change buffers within the source DML transaction — zero committed-change loss
- Differential (incremental) and full refresh, with automatic fallback when a query is not IVM-eligible
- Background scheduler with per-database workers
- Initial monitoring views: `pgt_stream_tables`, `pgt_refresh_history`
See also: Changelog (full detail) · Roadmap (what's coming)
Viewing on GitHub? The rendered changelog lives in CHANGELOG.md. This stub is served by the pg_trickle docs site — the include below renders there.
Changelog
What's new in pg_trickle — written for everyone, not just developers.
For future plans and upcoming features, see ROADMAP.md.
Table of Contents
- 0.59.0 — Performance & Observability
- 0.58.0 — Security & Correctness Hardening
- 0.57.0 — Documentation Excellence
- 0.56.0 — Documentation Foundation
- 0.55.0 — Final Pre-1.0 Polish
- 0.54.0 — DVM Engine Hardening
- 0.53.0 — Unit Test Depth Sweep
- 0.52.0 — DVM Hot-Path Performance
- 0.51.0 — Citus Chaos Resilience & Documentation Truth
- 0.50.0 — Performance, Security & Operational Hardening
- 0.49.1 — Repository Migration to trickle-labs/pg-trickle
- 0.49.0 — Test Infrastructure Hardening & Scheduler Decomposition
- 0.48.0 — Complete Embedding Programme: Hybrid Search, Sparse Vectors & Ergonomic API
- 0.47.0 — Embedding Pipeline Infrastructure & ANN Maintenance
- 0.46.0 — Extract `pg_tide`: Standalone Outbox, Inbox & Relay
- 0.45.0 — Operational Readiness, Scalability & CI Completeness
- 0.44.0 — Security Hardening & Code Quality
- 0.43.0 — D+I Change-Buffer Schema, GUC Tuning & WAL Diagnostics
- 0.42.0 — Repair API, Docs Overhaul & Test Infrastructure
- 0.41.0 — DVM Correctness: Structural Cache Keys, Placeholder Safety & WAL Transition Guards
- 0.40.0 — Operator Trust, Maintainability & Release Confidence
- 0.39.0 — Operational Truthfulness & Distributed Hardening
- 0.38.0 — EC-01 Join Correctness Sprint
- 0.37.0 — pgVector Incremental Aggregates & Distributed Trace Propagation
- 0.36.0 — Structural Hardening, Performance & Temporal IVM
- 0.35.0 — Hardening, Reactive Subscriptions & Relay Resilience
- 0.34.0 — Citus: Automated Distributed CDC Scheduler & Shard Recovery
- 0.33.0 — Citus: Distributed Source CDC & Stream Tables
- 0.32.0 — Citus: Stable Naming & Per-Source Frontier Foundation
- 0.31.0 — Performance & Scheduler Intelligence
- 0.30.0 — Pre-GA Correctness & Stability Sprint
- 0.29.0 — Relay CLI (pgtrickle-relay)
- 0.28.0 — Transactional Inbox & Outbox Patterns
- 0.27.0 — Operability, Observability & DR
- 0.26.0 — Test & Concurrency Hardening
- 0.25.0 — Scheduler Scalability & Pooler Performance
- 0.24.0 — Join Correctness & Durability Hardening
- 0.23.0 — Performance Tuning & Diagnostics
- 0.22.0 — Downstream CDC, Parallel Refresh & Predictive Cost Model
- 0.21.0 — Reliability, Safety & Operational Tools
- 0.20.0 — Self Monitoring
- 0.19.0 — Security, Scheduler Performance & Operator Convenience
- 0.18.0 — Hardening & Delta Performance
- 0.17.0 — Query Intelligence & Stability
- 0.16.0 — Performance & Refresh Optimization
- 0.15.0 — Interactive TUI, Bulk Create & Runaway-Refresh Protection
- 0.14.0 — Tiered Scheduling, Diagnostics & TUI
- 0.13.0 — Scalability Foundations & Full TPC-H Coverage
- 0.12.0 — Join Correctness, Diagnostics & Reliability
- 0.11.0 — Event-Driven Latency, Chain IVM & Observability Stack
- 0.10.0 — Cloud Deployment, PgBouncer & Query Engine Correctness
- 0.9.0 — Incremental Aggregates & Smarter Scheduling
- 0.8.0 — Backup, Pooler Compatibility & Reliability
- 0.7.0 — Watermark Gating, Circular Pipelines & SQL Broadening
- 0.6.0 — Idempotent DDL, Partitioned Sources & dbt Integration
- 0.5.0 — Row-Level Security, Source Gating & Append-Only Fast Path
- 0.4.0 — Parallel Refresh & Statement-Level CDC Triggers
- 0.3.0 — Incremental Correctness & Security Tooling
- 0.2.3 — Per-Table CDC Mode & WAL Lag Monitoring
- 0.2.2 — AUTO Refresh Mode & Query Alteration
- 0.2.1 — Safe Upgrades & Scheduling Improvements
- 0.2.0 — Monitoring, IMMEDIATE Mode & Diamond Consistency
- 0.1.3 — TPC-H Correctness, Window Functions & Aggregate Fixes
- 0.1.2 — Incremental Correctness Fixes & Project Rename
- 0.1.1 — CloudNativePG Image & Test Hardening
- 0.1.0 — Initial Release
[0.59.0] — Performance & Observability
What's New
v0.59.0 delivers seven hot-path performance improvements and six new
observability features. There are no user-visible SQL API changes; the only
schema change is a new `defining_query_hash` catalog column used internally.
PERF-1: Batched CDC Buffer-Growth Monitoring
check_change_buffer_sizes() previously issued one SELECT count(*) SPI call
per source table, proportional to the number of CDC-enabled stream tables.
It now builds a single UNION ALL query and executes it in one SPI round-trip,
reducing latency and lock overhead for deployments with many stream tables.
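A stand-alone sketch of the batched query shape (buffer table names here are illustrative; the real function assembles them from the CDC catalog):

```rust
// Collapse N per-table count queries into one UNION ALL statement so the
// monitor pays a single SPI round-trip. Callers pass already-quoted
// change-buffer table names.
fn build_buffer_size_query(buffer_tables: &[&str]) -> String {
    buffer_tables
        .iter()
        .map(|t| format!("SELECT '{t}' AS buffer, count(*) AS rows FROM {t}"))
        .collect::<Vec<_>>()
        .join(" UNION ALL ")
}
```

The resulting statement returns one row per buffer, so the growth check walks a single result set instead of issuing one query per CDC-enabled source.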
PERF-2: Defining-Query Hash Cached in Catalog
A new defining_query_hash BIGINT column on pgtrickle.pgt_stream_tables caches
the Rust DefaultHasher digest of each stream table's defining query. Refresh
cycles skip recomputing the hash; any ALTER that changes the query updates it
atomically in the same SPI transaction.
PERF-3: Arc Shared Templates
All eight SQL template fields inside CachedMergeTemplate were changed from
String to Arc<str>. Cache reads now clone a reference-counted pointer
instead of copying the string data, reducing heap allocations on every cache hit.
PERF-4: Single MERGE_TEMPLATE_CACHE Borrow
The two consecutive MERGE_TEMPLATE_CACHE.with() calls that were needed to
check both the non_monotonic flag and the is_deduplicated flag have been
merged into a single peek() call that returns both values in one borrow, halving
the thread-local lock traffic on the hot cache-hit path.
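A minimal sketch of the single-borrow pattern (the struct and field names are simplified stand-ins for the real cache entry):

```rust
use std::cell::RefCell;
use std::collections::HashMap;

struct CachedTemplate {
    non_monotonic: bool,
    is_deduplicated: bool,
}

thread_local! {
    static TEMPLATE_CACHE: RefCell<HashMap<u32, CachedTemplate>> =
        RefCell::new(HashMap::new());
}

// One thread-local borrow returns both flags; a miss returns None.
fn peek_flags(pgt_id: u32) -> Option<(bool, bool)> {
    TEMPLATE_CACHE.with(|c| {
        c.borrow()
            .get(&pgt_id)
            .map(|t| (t.non_monotonic, t.is_deduplicated))
    })
}

fn insert_template(pgt_id: u32, t: CachedTemplate) {
    TEMPLATE_CACHE.with(|c| {
        c.borrow_mut().insert(pgt_id, t);
    })
}
```

Reading both flags inside one closure is what halves the borrow traffic: the `RefCell` is locked once per cache hit instead of twice.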
PERF-5: WAL Decoder UPDATE Vec Pre-Allocation
The five Vec accumulators in the WAL decoder's UPDATE-row handler now call
Vec::with_capacity(num_columns) up front, eliminating the incremental
reallocations that previously occurred for each column.
PERF-6: Frontier Borrow Instead of Clone
has_stream_table_source_changes() cloned the entire Frontier (a
HashMap<Oid, Lsn>) when no frontier was stored yet. It now borrows a static
empty Frontier via Frontier::empty_ref(), avoiding the allocation on every
scheduler tick for stream tables with no CDC sources.
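The borrow-instead-of-clone idea can be sketched with a `OnceLock` (the `Frontier` alias below is a simplification of the real type):

```rust
use std::collections::HashMap;
use std::sync::OnceLock;

// Simplified stand-in for the real Frontier (Oid -> Lsn map).
type Frontier = HashMap<u32, u64>;

// Lazily initialise one shared empty Frontier and hand out a &'static
// reference, instead of cloning a HashMap on every scheduler tick.
fn empty_frontier_ref() -> &'static Frontier {
    static EMPTY: OnceLock<Frontier> = OnceLock::new();
    EMPTY.get_or_init(HashMap::new)
}

// Borrow the stored frontier when present, else the shared empty one.
fn frontier_or_empty<'a>(stored: Option<&'a Frontier>) -> &'a Frontier {
    stored.unwrap_or_else(|| empty_frontier_ref())
}
```

Every caller that previously paid for a `HashMap` clone now receives the same static reference, so the no-CDC-source path allocates nothing.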
PERF-7: Diamond Detection Short-Circuit
detect_diamonds() in the DAG module now performs a lazy .next().is_some()
intersection check before collecting the full shared-ancestor list. Branches that
share no ancestors — the common case — exit immediately without allocating the
result Vec.
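The short-circuit amounts to probing the intersection lazily before materialising it, roughly:

```rust
use std::collections::HashSet;

// `a` and `b` are the ancestor sets of two fan-in branches. Probe for
// *any* shared ancestor first; only pay for the full collect on overlap.
fn shared_ancestors(a: &HashSet<u32>, b: &HashSet<u32>) -> Vec<u32> {
    // Fast path: no overlap at all. Vec::new() performs no heap allocation.
    if a.intersection(b).next().is_none() {
        return Vec::new();
    }
    // Slow path: materialise the full shared-ancestor list.
    a.intersection(b).copied().collect()
}
```

`intersection(..).next()` stops at the first common element, so disjoint branches (the common case) cost at most one probe per element of the smaller set and never allocate.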
OBS-1: CDC Lag Percentile Metrics
A ring-buffer sampler (CdcLagSampler, 256 slots, protected by PgLwLock)
records CDC-to-refresh lag in milliseconds. Three new Prometheus gauges expose
rolling percentiles: pg_trickle_cdc_lag_p50_seconds,
pg_trickle_cdc_lag_p95_seconds, and pg_trickle_cdc_lag_p99_seconds.
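A stand-alone sketch of the sampler shape (simplified: no `PgLwLock`, nearest-rank percentiles, names illustrative):

```rust
// Fixed-capacity ring buffer of lag samples with rolling percentiles.
struct LagSampler {
    slots: Vec<u64>, // lag samples in milliseconds
    next: usize,     // next slot to overwrite
    filled: usize,   // number of valid slots (saturates at capacity)
}

impl LagSampler {
    fn new(capacity: usize) -> Self {
        Self { slots: vec![0; capacity], next: 0, filled: 0 }
    }

    fn record(&mut self, lag_ms: u64) {
        self.slots[self.next] = lag_ms;
        self.next = (self.next + 1) % self.slots.len();
        self.filled = (self.filled + 1).min(self.slots.len());
    }

    // Nearest-rank percentile over the currently filled slots.
    fn percentile(&self, p: f64) -> Option<u64> {
        if self.filled == 0 {
            return None;
        }
        let mut sorted: Vec<u64> = self.slots[..self.filled].to_vec();
        sorted.sort_unstable();
        let rank = ((p / 100.0) * self.filled as f64).ceil() as usize;
        Some(sorted[rank.saturating_sub(1)])
    }
}
```

A 256-slot ring keeps the percentile computation bounded regardless of refresh volume; old samples simply age out as new ones overwrite them.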
OBS-2: Parallel Worker Utilisation Metrics
Two new counters make pool-worker pressure visible:
- `pg_trickle_parallel_queue_depth` — jobs currently waiting for a free worker
- `pg_trickle_worker_idle_time_seconds_total` — cumulative idle time across all workers
OBS-3: WAL Decoder Pending-Record Gauge
pg_trickle_wal_decoder_pending_records reports the number of logical-replication
records buffered in the last WAL poll that have not yet been written to the CDC
change buffer, useful for detecting WAL consumer backpressure.
OBS-4: Refresh Mode Ratio Counters
pg_trickle_refresh_mode_total{mode="differential"} and
pg_trickle_refresh_mode_total{mode="full"} count every refresh cycle by mode.
The ratio surfaces differential-to-full degradation before it impacts latency.
OBS-5: pg_stat_activity Application Names
Every background-worker connection now sets application_name immediately after
connecting to SPI, making pg_trickle workers trivially identifiable in
pg_stat_activity:
| Connection | application_name |
|---|---|
| Database-discovery launcher | pg_trickle_launcher |
| Per-database scheduler | pg_trickle_scheduler |
| Parallel refresh pool worker (N) | pg_trickle_pool_N |
| Parallel refresh dispatcher | pg_trickle_dispatcher |
OBS-6: Backup & Restore Documentation
INSTALL.md now includes a dedicated Backup & Restore section explaining
which schemas to include in pg_dump, how to validate catalog integrity after
restore with pgtrickle.health_check(), and how to handle OID re-assignment
with repair_stream_table().
Upgrade
ALTER EXTENSION pg_trickle UPDATE TO '0.59.0';
The upgrade script adds the defining_query_hash column with DEFAULT 0.
Existing stream tables will recompute their hash on the next refresh and write
it back via ALTER STREAM TABLE — no manual intervention is needed.
[0.58.0] — Security & Correctness Hardening
What's New
v0.58.0 closes all HIGH-severity findings from the v0.57.0 overall assessment (Report 12). No new SQL API surface is added — every change is a targeted security fix or correctness fix.
SEC-1/2: Ownership Checks for Outbox and Publication APIs
attach_outbox(), detach_outbox(), attach_embedding_outbox(),
stream_table_to_publication(), and drop_stream_table_publication() now call
check_stream_table_ownership() immediately after resolving stream table metadata.
Previously, any role with EXECUTE on the pgtrickle schema could attach an
outbox or create a publication for a stream table owned by a different role.
Non-owner callers now receive ERROR: must be owner of stream table.
COR-1: Multi-Column NOT IN + NULL Row Handling
The v0.55.0 multi-column IN rewrite now detects NULL constants on either
side of the row constructor in NOT IN expressions. When detected, the AntiJoin
rewrite is skipped and the original subquery-based execution path is used,
emitting a diagnostic NOTICE. See LIMITATIONS.md for
details.
COR-2: Recursive-CTE Depth Guard in DIFFERENTIAL Mode
The `pg_trickle.ivm_recursive_max_depth` GUC now applies consistently to both
DIFFERENTIAL and IMMEDIATE modes. Previously only IMMEDIATE mode enforced the
depth limit.
COR-3: WAL Decoder TOCTOU Advisory Lock
poll_source_changes() now acquires a pg_advisory_xact_lock keyed on the
source OID before calling poll_wal_changes(), serialising the eligibility
check and WAL consumption into an atomic unit.
COR-4: Compact-Buffer Lock Contention Is Observable
compact_change_buffer() now returns CompactionResult::Contended instead of
Ok(0) when it cannot acquire the advisory lock, increments the new shared-memory
counter pg_trickle_cdc_compact_contended_total, and exposes it via the
Prometheus /metrics endpoint.
SEC-3: DDL Hook Escalates on SPI Failure
handle_alter_table() now retries find_downstream_pgt_ids() once on SPI error
and, if the retry also fails, raises pgrx::error!() to block the originating
ALTER TABLE rather than silently returning.
SEC-4: Schema Identifier Quoted in CDC Buffer Names
buffer_qualified_name_for_oid() now uses sql_builder::qualified() to properly
quote the schema identifier in the change-buffer table path.
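The quoting rule being applied is the standard one for SQL identifiers; a minimal sketch (the real code delegates to the extension's `sql_builder` helpers):

```rust
// Wrap an identifier in double quotes and double any embedded quote, so a
// hostile schema or table name cannot break out of the qualified name.
fn quote_ident(ident: &str) -> String {
    format!("\"{}\"", ident.replace('"', "\"\""))
}

fn qualified(schema: &str, table: &str) -> String {
    format!("{}.{}", quote_ident(schema), quote_ident(table))
}
```

Before the fix, a schema name containing a quote or dot could produce a change-buffer path that referenced the wrong relation or failed to parse.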
Upgrade Notes
No SQL schema changes. No ALTER EXTENSION migration is required.
[0.57.0] — Documentation Excellence
What's New
v0.57.0 completes the Documentation Excellence Arc. It delivers four new end-to-end tutorials, resolves all P2/P3 quality gaps from the Round 2 documentation audit, and applies a full consistency pass across all 83 documentation files.
New tutorials (P1):
- `docs/tutorials/FIRST_DASHBOARD.md` — Build a real-time analytics dashboard backend over an e-commerce dataset: revenue by region, hourly order counts, top-10 products chain, and optional Grafana integration.
- `docs/tutorials/EVENT_SOURCING.md` — Stream tables as CQRS read-model projections over an event-sourced write model: current order state, customer lifetime value, and inventory levels maintained incrementally.
- `docs/tutorials/BACKFILL_AND_MIGRATION.md` — Zero-downtime migration from `REFRESH MATERIALIZED VIEW` to a stream table: pre-migration assessment, `validate_query()` check, parallel running, verification, cutover, and rollback.
- `docs/tutorials/SECURITY_HARDENING.md` — Role separation, CDC trigger ownership, change-buffer protection, and audit logging; copy-paste SQL templates for all `GRANT` statements and a verification checklist.
Quality improvements (P2):
- `docs/SECURITY_GUIDE.md`: Added a "Copy-Paste Templates" section with `CREATE ROLE` and `GRANT` statements for `pgtrickle_admin`, `pgtrickle_user`, and `pgtrickle_readonly`.
- `docs/WHATS_NEW.md`: Backfilled user-impact summaries for v0.1 through v0.7.
- `docs/tutorials/HYBRID_SEARCH_PATTERNS.md`: Expanded patterns 2 (RLS-scoped) and 3 (tiered storage) to match the quality of pattern 1; documented the `pg_trickle.enable_vector_agg` GUC.
- `docs/tutorials/PER_TENANT_ANN_PATTERNS.md`: Documented the `partition_key => 'HASH:<col>:<buckets>'` syntax with a partition-count guide; expanded patterns 2–3 with full step-by-step examples.
- `docs/QUICKSTART_5MIN.md`: Fixed a display-text inconsistency in the Installation link.
- `docs/PERFORMANCE_COOKBOOK.md`: Added three worked examples to §13: (a) `max_diff_ctes` hit and recovery, (b) detecting when FULL beats DIFFERENTIAL via `recommend_refresh_mode()`, (c) deep-join chain and `max_differential_joins`.
- `docs/SECURITY_MODEL.md`: Resolved supply-chain TODO items — filled in current status or marked "Planned for v1.0" with implementation notes.
Polish (P3):
- `docs/FAQ.md`: Converted plain-text GUC cross-references to markdown links pointing to `CONFIGURATION.md` anchors; added a link to `SQL_REFERENCE.md`.
- `docs/DVM_OPERATORS.md`: Added a quick-reference table at the top (operator name, mode support, section anchor).
- `docs/tutorials/VECTOR_RAG_STARTER.md`: Added a full parameter breakdown for `pgtrickle.embedding_stream_table()` with a parameter table and examples.
- `docs/tutorials/tuning-refresh-mode.md`: Added a prose explanation of composite score thresholds (+0.15/−0.15) and dead-zone tuning.
- `docs/research/multi_db_refresh_broker.md`: Added an implementation status banner.
Consistency pass (DOC-CONS-28..31):
- Terminology sweep: enforced `stream table`, `differential refresh`, `change buffer`, `refresh frontier`, `CDC`, `DVM`, and `DAG` across all 83 docs files.
- Capitalisation sweep: enforced `pg_trickle` lowercase, `PostgreSQL` (not `Postgres`), `pgtrickle` schema, `pgrx` lowercase.
- Code style sweep: SQL keywords uppercase; `pgtrickle.` prefix on all function calls; language hints added to unlabelled code blocks.
- Cross-link audit: verified all internal `[text](path.md)` links; fixed 7 broken links (`USE_CASES.md`, `integrations/multi-tenant.md`, and added the `docs/ESSENCE.md` mdbook include).
[0.56.0] — Documentation Foundation
What's New
v0.56.0 is the first release of the Documentation Excellence Arc, resolving all findings from the Round 2 documentation audit (2026-05-11). It fixes three P0 blockers, completes two reference documents, and adds three new conceptual guides that bring the documentation to world-class standard before v1.0.
P0 fixes (breaking inaccuracies):
- Fixed `scripts/gen_catalogs.py`: GUC names now correctly resolve to `pg_trickle.*` names instead of `(registration pending — PGS_*)`. Rust types are converted to PostgreSQL type names (`int4`, `float8`, `text`). Stale garbage rows at the end of `GUC_CATALOG.md` are eliminated. The catalog now shows all 115 GUCs with correct names and types.
- Fixed `docs/CONFIGURATION.md`: `pg_trickle.parallel_refresh_mode` now correctly documents its default as `'on'` (changed from the stale `'off'`, which was the pre-v0.11.0 default).
- Completed `docs/ERRORS.md`: Added documentation for 18 previously missing error variants across 6 new categories (Publication, SLA, CDC, Diagnostic, Snapshot, Outbox/pg_tide, Placeholder, DVM engine). All 39 `PgTrickleError` variants are now documented with SQLSTATE codes, descriptions, causes, and fixes.
Reference completeness:
- `docs/SQL_REFERENCE.md`: Added working code examples for all 10 outbox/inbox consumer API functions (`poll_outbox`, `commit_offset`, `extend_lease`, `seek_offset`, `consumer_heartbeat`, `consumer_lag`, `drop_consumer_group`, `outbox_rows_consumed`, `replay_inbox_messages`, `inbox_ordering_gaps`).
- `docs/SQL_REFERENCE.md`: Added full column-schema tables for all 7 previously undocumented catalog tables (`pgt_outbox_config`, `pgt_consumer_groups`, `pgt_consumer_offsets`, `pgt_consumer_leases`, `pgt_inbox_config`, `pgt_inbox_ordering_config`, `pgt_inbox_priority_config`).
- `docs/research/`: Added standalone 3-paragraph abstracts to the three previously stub-only research documents (`CUSTOM_SQL_SYNTAX.md`, `PG_IVM_COMPARISON.md`, `TRIGGERS_VS_REPLICATION.md`).
- `docs/DVM_REWRITE_RULES.md`: Added concrete before/after SQL examples for all 5 rewrite passes (view inlining, grouping sets expansion, EXISTS→anti/semi-join, scalar sublink hoisting, delta key restriction).
- `docs/introduction.md`: Added 3 paragraphs explaining how pg_trickle works conceptually (CDC → delta SQL → MERGE cycle), plus a link to INSTALL.md.
New documents:
- `docs/MENTAL_MODEL.md`: 8-section conceptual guide for developers who know SQL but not IVM. Covers the problem of full recomputation, delta semantics, change capture, delta SQL generation, algebraic operator classification, row identity, the refresh cycle, and DAG chaining.
- `docs/LIMITATIONS.md`: Comprehensive reference of unsupported SQL constructs, DIFFERENTIAL mode constraints, source table restrictions, operational anti-patterns, and a "Will this work?" decision tree.
- `docs/PERFORMANCE_CHEATSHEET.md`: Single-page quick reference with the three golden rules, top-10 GUC quick wins, 5 FULL-fallback patterns with rewrites, and refresh latency diagnostics.
Upgrade Notes
No SQL migration is required. Run ALTER EXTENSION pg_trickle UPDATE TO '0.56.0'
or reinstall from packages. All changes are documentation and tooling only.
After upgrading, regenerate docs/GUC_CATALOG.md with:
python3 scripts/gen_catalogs.py
[0.55.0] — Final Pre-1.0 Polish
What's New
v0.55.0 is a focused polish release that lowers technical debt and improves observability ahead of the 1.0 stable label. All nine milestones deliver better diagnostics, cleaner code structure, and more operator-friendly documentation — without any SQL schema changes.
Changes
- M-1 — Wider invalidation ring (`shmem.rs`, `config.rs`): Maximum ring capacity raised from 1 024 to 4 096; the GUC default is now 1 024, so deployments with many concurrent stream tables no longer drop events.
- M-2 — API module decomposition (`src/api/`): `api/mod.rs` split into `create.rs`, `alter.rs`, and `refresh_ops.rs`. Each sub-module is now independently readable and testable.
- M-3 — Monitor module decomposition (`src/monitor/`): `monitor.rs` split into `alert.rs`, `health.rs`, and `tree.rs`. Alert emission, health checks, and DAG tree rendering are now in separate, focused units.
- M-4 — Structured NOTIFY payloads: All `pg_notify` calls now emit structured `serde_json` values instead of hand-built strings, making it easier to parse alert events in downstream consumers.
- M-5 — Multi-column `IN` rewrite (`src/dvm/parser/sublinks.rs`): Row expressions and multi-target sub-selects in `IN`/`NOT IN` predicates are now automatically rewritten to AND-chained equality rather than returning an unsupported-syntax error.
- M-6 — DVM parse metrics (`src/shmem.rs`, `src/dvm/mod.rs`): Two new shared-memory counters track cumulative DVM parse time (`pg_trickle_dvm_parse_ms`) and total delta SQL template size (`pg_trickle_delta_query_size_bytes`). Both are exposed via the Prometheus `/metrics` endpoint.
- M-7 — Reserved column-name prefix docs (`docs/SQL_REFERENCE.md`): New "Reserved Column-Name Prefixes" section documents the `__pgt_*` and `__pgs_*` internal prefixes and explains the consequences of naming conflicts.
- M-8 — GUC rationale comments (`src/config.rs`): Every magic-number GUC default now has an inline comment explaining why that value was chosen and when operators should raise or lower it.
- M-9 — Codecov upload in PR gate (`.github/workflows/ci.yml`): The Linux unit-test job now uploads coverage data to Codecov after each run. `fail_ci_if_error: false` ensures that a Codecov outage never blocks merges.
Upgrade
No SQL migration is required. Run ALTER EXTENSION pg_trickle UPDATE TO '0.55.0'
or reinstall to pick up the new extension version string.
[0.54.0] — DVM Engine Hardening
What's New
v0.54.0 hardens the DVM (Differential View Maintenance) engine across seven dimensions: depth-limit enforcement, CTE-count cap, snapshot fingerprint caching, expression visitor pattern, view-inlining relkind cache, upstream frontier validation, and O(V+E) diamond detection. Every change is targeted at correctness and performance; no user-visible API surface changes.
Changed
C-7: diff_node() Recursion Depth Guard
diff_node() in src/dvm/diff.rs now enforces a hard depth limit drawn from
the pg_trickle.max_parse_depth GUC (default 64). Exceeding the limit returns
a new PgTrickleError::DiffDepthExceeded(limit) error with a user-actionable
hint instead of overflowing the call stack.
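The guard pattern is simple to show in isolation (`Node` below is a stand-in for the real operator tree):

```rust
// Stand-in for the DVM operator tree.
enum Node {
    Leaf,
    Child(Box<Node>),
}

#[derive(Debug, PartialEq)]
enum DiffError {
    DepthExceeded(usize),
}

// Every recursive step threads the current depth; the limit is checked on
// entry, so the error fires before the call stack can overflow.
fn diff_node(node: &Node, depth: usize, limit: usize) -> Result<(), DiffError> {
    if depth > limit {
        return Err(DiffError::DepthExceeded(limit));
    }
    match node {
        Node::Leaf => Ok(()),
        Node::Child(inner) => diff_node(inner, depth + 1, limit),
    }
}
```

Returning a typed error instead of recursing unchecked converts a backend crash into an actionable message that names the limit that was hit.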
R-7: DiffContext CTE Count Cap (OOM Guard)
DiffContext tracks the number of CTEs emitted during a single differentiation
pass. When the count reaches the new pg_trickle.max_diff_ctes GUC (default
1000, range 10–100 000), diff_node() returns
PgTrickleError::DiffCteCountExceeded(limit) before allocating further
memory. This prevents pathological queries from exhausting server memory.
P-4: Snapshot Fingerprint Two-Level Cache
get_or_register_snapshot_cte() now uses a two-level cache: a fast pointer
identity check (same OpTree node, O(1)) and a structural fingerprint check
(equal subtrees, O(k)). Identical subtrees share a single CTE, eliminating
redundant snapshot SQL generation for diamond-shaped query plans.
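The two-level check can be sketched with `Rc` pointer identity plus a structural hash (both stand-ins for the real OpTree node and fingerprint):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::rc::Rc;

// Stand-in structural fingerprint (the real one hashes an OpTree subtree).
fn fingerprint(tree: &str) -> u64 {
    let mut h = DefaultHasher::new();
    tree.hash(&mut h);
    h.finish()
}

// Level 1: pointer identity, O(1). Level 2: structural fingerprint, O(k).
fn same_subtree(a: &Rc<String>, b: &Rc<String>) -> bool {
    if Rc::ptr_eq(a, b) {
        return true; // same allocation, no comparison needed
    }
    fingerprint(a) == fingerprint(b)
}
```

Diamond-shaped plans reuse the same subtree object on both branches, so the O(1) fast path covers them; structurally equal but distinct subtrees still deduplicate via the fingerprint.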
P-5: Expr::to_sql() Visitor Pattern
Expr::to_sql() now delegates to a new to_sql_into(&self, buf: &mut String)
method that writes SQL directly into a pre-allocated buffer using push/
push_str. Intermediate heap allocations for nested expressions are
eliminated, reducing allocation pressure on large queries.
P-6: View-Inlining Relkind Cache
rewrite_views_inline_once() now passes a mutable HashMap<(schema,name), Option<relkind>> through the call chain. Each relkind lookup is cached for
the duration of the rewrite pass, preventing repeated SPI catalog queries for
the same relation within a single inlining iteration.
C-4: Upstream Stream-Table Frontier Validation
generate_delta_query() now validates that every upstream stream-table source
referenced in a query has a corresponding entry in the provided refresh
frontier. Missing entries return PgTrickleError::StSourceFrontierMissing
with a clear message and the affected pgt_id, allowing the scheduler to
reinitialize rather than silently producing incorrect delta results.
S-1: O(V+E) Diamond Detection
detect_diamonds() in src/dag.rs previously called collect_ancestors()
per fan-in branch (O(V) per branch, O(V²) total for dense graphs). It now
calls the new compute_all_ancestors() which traverses the DAG once in forward
topological order, building all ancestor sets in O(V+E) total work. Per-branch
ancestor lookup is then O(1) via the precomputed map.
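The single-pass construction can be sketched as follows (node IDs and the parent map are simplifications of the real DAG representation):

```rust
use std::collections::{HashMap, HashSet};

// One forward pass over a topologically ordered DAG builds every node's
// ancestor set: a node's ancestors are its parents plus each parent's
// (already-computed) ancestors. Total work is O(V+E) set insertions.
fn compute_all_ancestors(
    topo: &[u32],
    parents: &HashMap<u32, Vec<u32>>,
) -> HashMap<u32, HashSet<u32>> {
    let mut ancestors: HashMap<u32, HashSet<u32>> = HashMap::new();
    for &node in topo {
        let mut set = HashSet::new();
        if let Some(ps) = parents.get(&node) {
            for &p in ps {
                set.insert(p);
                if let Some(pa) = ancestors.get(&p) {
                    set.extend(pa.iter().copied());
                }
            }
        }
        ancestors.insert(node, set);
    }
    ancestors
}
```

Once the map exists, checking whether two fan-in branches share an ancestor is a pair of O(1) lookups rather than a fresh traversal per branch.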
[0.53.0] — Unit Test Depth Sweep
What's New
v0.53.0 fills the unit-test coverage gaps identified in the v0.51.0 overall
assessment (Report 11, findings T-2 through T-9). Six scheduler and parser
submodules that previously had zero inline #[test] coverage now each have a
#[cfg(test)] block covering their pure logic. Property-based testing is
extended to the DAG cycle detection and topological sort invariants. Two fixed
sleeps in the buffer-growth E2E tests are replaced with adaptive polling.
Changed
Scheduler Module Unit Tests
Five scheduler submodules previously had zero inline unit tests. New
#[cfg(test)] blocks have been added to:
- `dispatch.rs` — `parse_worker_extra` (format validation, edge cases, rejected zero/negative job IDs) and `compute_adaptive_poll_ms` (exponential backoff, completion reset, no-inflight fast path).
- `pool.rs` — `pool_size_from_config_value` (negative GUC values clamped to zero, positive values preserved).
- `watermark.rs` — `should_emit_holdback_warning` pure rate-limit helper: disabled threshold, age threshold, 60-second cooldown, saturating subtraction on clock skew.
- `citus.rs` — `record_worker_failure` / `reset_worker_failure` thread-local failure counter: increment, per-key isolation, reset-to-zero, no-op on missing key.
- `scheduler_loop.rs` — Structural compile-check test (module contains only BGW entry points; E2E coverage in `tests/e2e_bgworker_tests.rs`).
DVM Parser Unit Tests
dvm/parser/sublinks.rs had zero inline unit tests. New tests cover:
- `extract_bare_scalar_subquery_sql` — parenthesised SELECT, missing parens, whitespace trimming, case-insensitive SELECT detection.
- `is_known_aggregate` — known built-ins, statistical, ordered-set, and range aggregates; unknown function names.
- `is_star_only` — bare `*`, qualified `t.*`, empty slice, multi-expression.
- `rewrite_having_expr` — COUNT(*) and SUM rewrites, non-matching functions, recursive rewrite inside `BinaryOp`, literal pass-through.
- `split_exists_correlation` — simple equality extraction, non-correlation remaining predicates, AND conjunction splitting.
- `collect_tree_source_aliases` — single Scan, InnerJoin, Filter, Subquery.
Proptest Extension (T-2)
Two new proptest! blocks in src/dag.rs:
- Acyclic invariant — randomly generated chain DAGs of length 1–20 always pass `detect_cycles()`.
- Cyclic invariant — adding a single back-edge to any chain of length 2–20 is always detected as a cycle by `detect_cycles()`.
- Topological order invariant — for any acyclic chain, `topological_order()` places every upstream node before its downstream successor.
- Back-edge invariant — any single back-edge added to an acyclic DAG creates a cycle (parameterised over both chain length and back-edge position).
Buffer-Growth Sleep Removal (T-8, T-9)
tests/e2e_buffer_growth_tests.rs contained two long fixed sleeps in the
sustained-write test:
- The 7-second sleep was replaced with `db.wait_for_auto_refresh("sustained_st", 30s)`.
- The 20-second sleep was replaced with `db.wait_for_condition(...)` polling until the stream table count matches the source count, with a 60-second cap.
[0.52.0] — DVM Hot-Path Performance
What's New
v0.52.0 eliminates four measurable hot-path costs in the DVM differential refresh pipeline, all identified in the v0.51.0 overall assessment (Report 11).
P-1: O(1) Placeholder Resolution (aho-corasick)
resolve_delta_template() previously called .replace() twice per source
table OID, scanning the full SQL string for each placeholder. For a 10-table
join (~50 KB SQL), this was 20 full-string scans per refresh cycle. v0.52.0
replaces the loop with a single-pass Aho-Corasick
multi-pattern replacer that resolves all __PGS_PREV_LSN_*__ and
__PGS_NEW_LSN_*__ tokens in one traversal — O(template_length) regardless
of the number of source tables.
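The production code uses the aho-corasick crate; a plain-Rust stand-in conveys the shape of the single-pass replacement (placeholder and value strings below are illustrative):

```rust
use std::collections::HashMap;

// One left-to-right scan resolves every placeholder token, instead of one
// full .replace() pass per token. (The real implementation builds an
// Aho-Corasick automaton so the scan itself is O(template_length); this
// sketch re-probes per match and only illustrates the single-output-pass idea.)
fn resolve_placeholders(template: &str, subs: &HashMap<&str, &str>) -> String {
    let mut out = String::with_capacity(template.len());
    let mut rest = template;
    while !rest.is_empty() {
        // Earliest placeholder occurrence from the current position.
        let hit = subs
            .iter()
            .filter_map(|(pat, val)| rest.find(pat).map(|i| (i, *pat, *val)))
            .min_by_key(|&(i, _, _)| i);
        match hit {
            Some((i, pat, val)) => {
                out.push_str(&rest[..i]);
                out.push_str(val);
                rest = &rest[i + pat.len()..];
            }
            None => {
                out.push_str(rest);
                break;
            }
        }
    }
    out
}
```

Building the output once, rather than rewriting the whole SQL string per placeholder, is what removes the 20-scans-per-refresh cost for wide joins.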
P-2: Thread-Local Volatility Cache
lookup_function_volatility() and lookup_operator_volatility() previously
issued one SPI round-trip to pg_proc / pg_operator for every function or
operator name encountered during DVM parsing. A query referencing 50 functions
triggered 50 round-trips (~50 ms overhead). v0.52.0 adds thread-local
HashMap<String, char> caches so each name is resolved via SPI at most once
per backend session. The caches are flushed by pgtrickle.clear_caches().
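The memoisation pattern, sketched with a closure standing in for the SPI lookup against `pg_proc`:

```rust
use std::cell::RefCell;
use std::collections::HashMap;

thread_local! {
    static VOLATILITY_CACHE: RefCell<HashMap<String, char>> =
        RefCell::new(HashMap::new());
}

// `lookup` stands in for the SPI catalog query; each name hits it at most
// once per thread, after which the cached volatility class is returned.
fn cached_volatility<F>(name: &str, lookup: F) -> char
where
    F: FnOnce(&str) -> char,
{
    VOLATILITY_CACHE.with(|c| {
        if let Some(&v) = c.borrow().get(name) {
            return v;
        }
        let v = lookup(name);
        c.borrow_mut().insert(name.to_string(), v);
        v
    })
}

// Mirrors the flush performed by pgtrickle.clear_caches().
fn clear_volatility_cache() {
    VOLATILITY_CACHE.with(|c| c.borrow_mut().clear());
}
```

A query referencing the same function fifty times now pays for one catalog round-trip instead of fifty.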
P-3: Lazy DiffContext Allocations
DiffContext::new() previously initialized all maps unconditionally.
agg_sum_coalesce_defaults — only needed for queries with COALESCE-wrapped
aggregates — is now Option<HashMap<String, String>> and allocated lazily on
first use. Simple scan/filter/project queries never allocate it.
P-8: O(1) MERGE Template Cache LRU Eviction
The MERGE template cache previously stored entries in a plain HashMap and
found the least-recently-used entry by scanning all entries for the minimum
last_used counter — O(N) per eviction. v0.52.0 replaces this with
lru::LruCache, which provides O(1) eviction automatically on put().
C-1: Safety Fix in filter.rs HAVING Path
Replaced a bare .expect("BUG: …") in the HAVING-filter delta path with a
proper PgTrickleError::InternalError return. An invariant violation now
returns a clean error rather than crashing the backend.
Upgrade Notes
No SQL schema changes. No configuration changes required.
[0.51.0] — Citus Chaos Resilience & Documentation Truth
Breaking Changes
- `pg_trickle.event_driven_wake` has been removed. This GUC had no effect since v0.39.0 because PostgreSQL's `LISTEN` command is not permitted inside background worker processes. Remove it from `postgresql.conf` and any `ALTER SYSTEM` settings to avoid an "unrecognized configuration parameter" warning on upgrade. No behavioral change — the scheduler always used efficient latch-based polling regardless of this setting.
- `pg_trickle.wake_debounce_ms` has been removed. This GUC was only meaningful when `event_driven_wake` was functional (it never was). Remove it from `postgresql.conf` as well.
What's New
FEAT-10-01: Citus Chaos Test Rig
Three new chaos resilience scenarios for the Citus distributed integration, proving correctness under real production failure modes:
- CHAOS-5 — Coordinator restart during active refresh: Creates a distributed stream table, starts a refresh, restarts the coordinator mid-flight, and verifies that 5 subsequent cycles produce the correct result with no phantom or missing rows.
- CHAOS-6 — Worker kill with shard redistribution: Kills a worker node, triggers `rebalance_table_shards()`, inserts new rows on the remaining workers, and verifies that DIFFERENTIAL refresh produces a consistent result post-recovery. Asserts that CDC change buffers contain no orphaned records.
- CHAOS-7 — Network partition and recovery: Uses `docker network disconnect` to isolate one worker, inserts rows on the remaining workers, reconnects the isolated worker, and verifies that the stream table converges to the correct state within 3 refresh cycles with no data loss.
All three tests are marked #[ignore] and run nightly in the stability-tests.yml
workflow alongside the existing G17-SOAK and G17-MDB tests. Use
just citus-chaos-up && just test-citus-chaos to run them locally.
CQ-10-02: Remove Deprecated event_driven_wake GUC
Removed the non-functional event_driven_wake and wake_debounce_ms GUCs
and all associated dead code paths from the scheduler loop. The code that
emitted a WARNING when event_driven_wake = on is gone. The scheduler log
message at startup no longer includes the GUC value.
DOC-10-01: ARCHITECTURE.md — pg_tide Integration Boundary
Added a new § pg_tide Integration section to docs/ARCHITECTURE.md that
clearly describes the v0.46.0 extraction boundary: what remains in pg_trickle
(attach_outbox() hook, change buffer subscription interface) vs what lives in
the standalone pg_tide extension (outbox, inbox, consumer groups, relay binary).
Updated the module layout diagram to reflect the extraction.
DOC-10-03: ARCHITECTURE.md — Recursive CTE Strategy Selection
Added a new § Recursive CTE Strategy Selection subsection to the DVM Engine
section documenting the five-tier strategy selection logic (Tier 1 inline
expansion → Tier 2 shared delta → Tier 3a semi-naive → Tier 3b DRed → Tier 3c
recomputation), a selection criteria table, observability via
explain_stream_table(), and a concrete Tier 3a example for hierarchical
closure queries.
DOC-10-02 + COR-10-02: Configuration Documentation Truth
- CONFIGURATION.md: The `event_driven_wake` and `wake_debounce_ms` sections were replaced with clear removal notices. All tuning profiles, interaction matrix entries, and example configs were updated to remove these GUCs.
- CONFIGURATION.md: Added ⚠️ deprecation callouts for `merge_planner_hints` (accepted, no effect) and `user_triggers = 'on'` (deprecated alias for `'auto'`).
- CONFIGURATION.md: Added a note on CDC triggers to the `pg_trickle.enabled` section explaining that CDC triggers continue to fire when the scheduler is disabled, why this is intentional, and how to fully quiesce CDC overhead during extended maintenance.
[0.50.0] — Performance, Security & Operational Hardening
What's New
PERF-10-01: Batch preflight source-table existence check
- Replaced the N-query per-OID loop in `execute_differential_refresh` with a single batch `SELECT ... FROM unnest(ARRAY[oid1, oid2, ...])` that returns all source-table existence checks in one SPI round-trip.
- Reduces preflight overhead from O(N) queries to O(1) for stream tables with multiple sources.
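The batch query can be assembled with a small helper; this sketch uses hypothetical OIDs and a simplified column list to show the shape:

```rust
// Fold all source OIDs into one array literal so the preflight existence
// check is a single SPI round-trip instead of one query per OID.
fn build_existence_check(oids: &[u32]) -> String {
    let list = oids
        .iter()
        .map(|o| o.to_string())
        .collect::<Vec<_>>()
        .join(", ");
    format!(
        "SELECT o AS oid, EXISTS (SELECT 1 FROM pg_class WHERE oid = o) AS present \
         FROM unnest(ARRAY[{list}]::oid[]) AS t(o)"
    )
}
```

The scheduler then scans the result set once, flagging any source whose `present` column is false before the differential refresh begins.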
PERF-10-02: CDC trigger SQL string-building micro-optimisation
- `build_stmt_trigger_fn_sql` now uses `String::with_capacity` + direct `push_str` loops instead of `Vec<String>::join`, eliminating intermediate allocations in the column list builders (`cn`, `ncr`, `ocr`).
- Noticeable on high-throughput workloads that re-register triggers frequently.
PERF-10-03: Single-query watermark computation (already present; documented)
- Confirmed that `compute_safe_upper_bound()` in `src/cdc.rs` already consolidates `pg_current_wal_lsn()`, the `pg_stat_activity` xmin probe, and `pg_prepared_xacts` into one compound CTE `SELECT`. Added an explanatory comment referencing PERF-10-03.
SEC-10-01: Replace manual SQL string escaping with pg_catalog.quote_literal
- All `dblink(...)` call sites in `src/citus.rs` now escape connection strings and remote query strings via a new `pg_quote_literal()` helper that delegates to PostgreSQL's built-in `pg_catalog.quote_literal($1)` function.
- Eliminates the risk of SQL injection through attacker-controlled hostnames or slot names in Citus distributed setups.
- The manual `.replace('\'', "''")` pattern has been removed from `worker_conn_string()` and all four `dblink` call sites.
OPS-10-01: Kubernetes rolling-upgrade drain hook (CNPG)
- Added a `lifecycle.preStop` hook to `cnpg/cluster-production.yaml` that runs `pgtrickle.drain(timeout_s => 120)` before CloudNativePG shuts down a primary pod during rolling upgrades.
- A new `docs/RUNBOOK_DRAIN.md` section documents the Kubernetes rolling-upgrade procedure and post-upgrade verification steps.
OPS-10-02: Prometheus reliability counters
- Three new shared-memory atomics in `src/shmem.rs`, including:
  - `TEMPLATE_CACHE_STALE_EVICTIONS` — incremented when a delta template cache entry is evicted because its `defining_query_hash` no longer matches.
  - `DAG_CYCLES_DETECTED` — incremented each time `detect_cycles()` returns `Err(CycleDetected)`.
- `src/dvm/mod.rs`: Hash-mismatch stale entries are now detected and counted before being evicted from `DELTA_TEMPLATE_CACHE`.
- A new `pgtrickle.reliability_counters()` SQL function (in `src/monitor.rs`) exposes all three reliability counters as a single-row table.
- A new `pg_trickle_reliability` query block in `monitoring/prometheus/pg_trickle_queries.yml` for postgres_exporter.
OPS-10-03: Docker base-image digest pinning
- All three Dockerfiles (`Dockerfile.demo`, `Dockerfile.ghcr`, `tests/Dockerfile.e2e`) now pin `postgres:18.3-bookworm` to an exact SHA256 digest, providing supply-chain security and reproducible builds.
- A new `scripts/update_base_image_digests.sh` script automates quarterly digest refreshes. `CONTRIBUTING.md` documents the update process.
SCAL-10-01: Invalidation ring capacity documentation
- A new `docs/CONFIGURATION.md` section documents `pg_trickle.invalidation_ring_capacity` (default 128, hard ceiling 1024), overflow behaviour, the overflow counter, and capacity guidance for deployments with 1,000+ stream tables.
COR-10-01: Deep join chain threshold documentation
- A new `docs/CONFIGURATION.md` section documents `pg_trickle.part3_max_scan_count` (default 5), the Part 3 threshold trade-off between SQL complexity and delta correctness at depth, and recommendations for ≤6 vs. >6 table join chains.
SQL Upgrade
ALTER EXTENSION pg_trickle UPDATE TO '0.50.0';
[0.49.1] — Repository Migration to trickle-labs/pg-trickle
What's New
Repository Migration
- pg_trickle has moved to its permanent home at trickle-labs/pg-trickle.
- All CI/CD pipelines, Docker image publishing, and release artifacts now originate from the new repository.
- GitHub Container Registry images are published under `ghcr.io/trickle-labs/pg-trickle`.
- Docker Hub images are published under `tricklehq/pg_trickle`.
- The PGXN distribution, dbt Hub package, and CloudNativePG plugin listings are updated to reflect the new repository URL.
- No code changes — this is a pure packaging and infrastructure release.
SQL Upgrade
ALTER EXTENSION pg_trickle UPDATE TO '0.49.1';
[0.49.0] — Test Infrastructure Hardening & Scheduler Decomposition
What's New
TEST-10-01: Concurrency Test Synchronization Overhaul
- Replaced all `tokio::time::sleep` busy-waits in `tests/e2e_concurrent_tests.rs` with `pg_stat_activity`-polling loops that wait until the target query is actually visible before proceeding.
- Added `wait_for_active_query` helper with configurable timeout and a clear failure message so flakiness surfaces as a named error rather than a silent pass.
- Affected tests: `test_pb1_concurrent_refresh_skip_locked_no_corruption`, `test_concurrent_refresh_and_drop`, `test_conc1_alter_while_refresh`, `test_conc2_drop_while_refresh`.
TEST-10-02: Unit Test Coverage Sweep
- Added `#[cfg(test)]` modules to `src/template_cache.rs`, `src/cdc/polling.rs`, and `src/cdc/rebuild.rs` — modules that previously had zero unit test coverage.
- New tests cover hash key derivation and round-trip correctness, CDC trigger naming conventions, CDC mode classification, replica identity sufficiency checks, and cache guard condition logic.
TEST-10-03: Fuzz Targets for Merge Codegen and Row Identity
- Added `fuzz/fuzz_targets/merge_sql_fuzz.rs` — fuzzes the merge SQL construction pipeline (`pg_quote_literal`, `parse_hash_bound_spec`, `extract_keyword_int`, amplification ratio, `build_content_hash_expr`). Validates no panics, UTF-8 output, and deterministic results.
- Added `fuzz/fuzz_targets/row_id_fuzz.rs` — fuzzes the row identity schema classifier (`is_compatible_with`, `verify_pipeline`). Validates reflexivity and that no byte sequence causes a panic.
- Both targets registered in `fuzz/Cargo.toml` and the `just fuzz-all` recipe.
TEST-10-04: DDL During Concurrent Refresh E2E Test
- Added `test_ddl_during_concurrent_refresh` to `tests/e2e_concurrent_tests.rs`. Fires `ALTER STREAM TABLE` concurrently with a running refresh and asserts either graceful completion or correct blocking — no torn state.
CI-10-02: Expanded e2e-Smoke Filter
- The PR smoke test now also matches `test_.*join.*`, `test_.*aggregate.*`, `test_.*window.*`, and `test_.*subquery.*` patterns, catching operator-level regressions earlier.
CI-10-03: Consolidated Fuzz Recipe
- Added `just fuzz-all` to the `justfile` — runs every fuzz target for a configurable duration (default 60 s each).
- Documented all fuzz targets and corpus paths in `CONTRIBUTING.md`.
CQ-10-01: Scheduler Module Decomposition
- `src/scheduler/mod.rs` was 6,700+ lines. Extracted into three focused submodules:
  - `src/scheduler/dispatch.rs` — parallel dispatch state, dynamic worker spawn, worker claiming, orphan reaping, adaptive poll-interval logic.
  - `src/scheduler/scheduler_loop.rs` — BGW registration, launcher main loop, per-database scheduler main loop.
  - `src/scheduler/watermark.rs` — tick watermark computation, xmin holdback, frontier advance helpers.
- `mod.rs` is now a thin re-export façade. All existing public API is preserved with no behaviour change.
SQL Upgrade
ALTER EXTENSION pg_trickle UPDATE TO '0.49.0';
[0.48.0] — Complete Embedding Programme: Hybrid Search, Sparse Vectors & Ergonomic API
What's New
VH-1: Sparse and Half-Precision Vector Aggregates
- `avg(halfvec_col)` and `avg(sparsevec_col)` stream tables now produce output columns typed `halfvec(N)` and `sparsevec(N)` respectively — no silent coercion to `vector` anymore.
- The DVM engine correctly propagates vector type names through `extract_vector_agg_output_dims`.
VH-2: Reactive Distance Subscriptions
- New functions: `pgtrickle.subscribe_distance(stream_table, channel, vector_column, query_vector, op, threshold)`, `pgtrickle.unsubscribe_distance(stream_table, channel)`, and `pgtrickle.list_distance_subscriptions(stream_table)`.
- After each refresh, the scheduler fires NOTIFY on registered channels when rows in the storage table satisfy the distance predicate.
VH-3: Hybrid-Search Cookbook
- New doc: docs/tutorials/HYBRID_SEARCH_PATTERNS.md — three hybrid search patterns with worked SQL examples.
VH-4: Vector Benchmark Suite
- New benchmark: `benches/pgvector_bench.rs` — measures OpTree construction, AggFunc dispatch, vector string encoding, and drift-detection overhead.
VA-1: embedding_stream_table() Ergonomic API
- New function: `pgtrickle.embedding_stream_table(name, source_table, vector_column, extra_columns, refresh_interval, index_type, dry_run)`.
- Automatically generates a stream table, creates an HNSW or IVFFlat index, and configures post-refresh drift monitoring.
- `dry_run => true` returns the generated SQL without executing it.
VA-2: Materialised k-NN Graph Research
- New doc: docs/research/KNN_GRAPH_TRADEOFFS.md — storage/latency/maintenance analysis for materialised k-NN graphs.
VA-3: Multi-Tenant ANN Patterns
- New doc: docs/tutorials/PER_TENANT_ANN_PATTERNS.md — per-tenant ANN stream tables with RLS, tenant isolation, and security checklist.
VA-4: Embedding Outbox
- New function: `pgtrickle.attach_embedding_outbox(stream_table, vector_column, retention_hours, inline_threshold_rows)`.
- Extends outbox events with `event_type: "embedding_change"` and the `vector_column` name in event headers.
VA-5: Vector RAG Starter Guide
- New doc: docs/tutorials/VECTOR_RAG_STARTER.md — quick-start guide for building a RAG pipeline with pg_trickle and pgvector.
SQL Upgrade
ALTER EXTENSION pg_trickle UPDATE TO '0.48.0';
Direct upgrade scripts are provided from v0.40.0 onward.
[0.47.0] — Embedding Pipeline Infrastructure & ANN Maintenance
⚠ Upgrade support policy change (v0.47.0+)
Starting from v0.47.0, pg_trickle provides direct upgrade scripts only for v0.40.0 and later. If you are running v0.39.0 or older, you must first upgrade to v0.40.0 before upgrading to v0.47.0 or later:
-- Users on v0.39.x or older: upgrade to v0.40.0 first
ALTER EXTENSION pg_trickle UPDATE TO '0.40.0';
-- Then upgrade to the latest version
ALTER EXTENSION pg_trickle UPDATE;
Both steps can be issued in the same session. PostgreSQL handles the intermediate chain automatically. Users already on v0.40.0 or later are unaffected — a single `ALTER EXTENSION pg_trickle UPDATE` is all that is needed.
v0.47.0 resumes the deferred embedding programme with post-refresh action hooks, drift-based HNSW reindex scheduling, vector-aware monitoring, and the pgvector RAG cookbook.
Post-Refresh Actions (VP-1)
Stream tables can now specify what happens after a successful refresh that produces changed rows:
-- Run ANALYZE after each refresh (keep statistics fresh)
SELECT pgtrickle.alter_stream_table(
'embedding_store',
post_refresh_action => 'analyze'
);
-- Always REINDEX the storage table after each refresh
SELECT pgtrickle.alter_stream_table(
'embedding_store',
post_refresh_action => 'reindex'
);
-- REINDEX only when the drift threshold is exceeded
SELECT pgtrickle.alter_stream_table(
'embedding_store',
post_refresh_action => 'reindex_if_drift',
reindex_drift_threshold => 0.20 -- 20% of rows changed
);
The action runs outside the refresh transaction so it does not add latency
to the critical refresh window. The four supported values are `none` (default),
`analyze`, `reindex`, and `reindex_if_drift`.
Drift Detection (VP-2)
Two new catalog columns track ANN index freshness:
- `rows_changed_since_last_reindex` — running count of rows changed since the last REINDEX, reset to 0 after each successful REINDEX.
- `last_reindex_at` — timestamp of the last pg_trickle-triggered REINDEX.
A new GUC pg_trickle.reindex_drift_threshold (default 0.20) sets the global
default fraction; per-table overrides via reindex_drift_threshold take
precedence.
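The threshold precedence and drift comparison described above can be sketched in a few lines. This is an illustrative model, not the extension's internal API — the function name and signature are assumptions:

```rust
/// Decide whether a drift-triggered REINDEX should fire (illustrative sketch).
/// `per_table_threshold` models the stream table's reindex_drift_threshold;
/// `None` falls back to the pg_trickle.reindex_drift_threshold GUC default.
fn should_reindex(
    rows_changed_since_last_reindex: u64,
    estimated_rows: u64,
    per_table_threshold: Option<f64>,
    guc_default: f64,
) -> bool {
    if estimated_rows == 0 {
        return false; // no reltuples estimate: drift percentage is undefined
    }
    // Per-table override takes precedence over the global GUC.
    let threshold = per_table_threshold.unwrap_or(guc_default);
    let drift_pct = rows_changed_since_last_reindex as f64 / estimated_rows as f64;
    drift_pct >= threshold
}

fn main() {
    // 200 of 1,000 rows changed = 20% drift, global default 0.20 → fires.
    assert!(should_reindex(200, 1_000, None, 0.20));
    // A per-table override of 0.50 suppresses it at the same drift level.
    assert!(!should_reindex(200, 1_000, Some(0.50), 0.20));
}
```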
Vector Status View (VP-3)
SELECT * FROM pgtrickle.vector_status();
Returns one row per stream table with a non-none post_refresh_action:
| Column | Description |
|---|---|
| `name` | Schema-qualified stream table name |
| `post_refresh_action` | Configured action |
| `reindex_drift_threshold` | Per-table threshold (NULL = global GUC) |
| `rows_changed_since_last_reindex` | Rows changed since last REINDEX |
| `last_reindex_at` | When the last REINDEX completed |
| `data_timestamp` | When the stream table data was last updated |
| `embedding_lag` | Interval since last refresh |
| `estimated_rows` | PostgreSQL `reltuples` estimate |
| `drift_pct` | Percentage of rows changed (NULL if no estimate available) |
pgvector RAG Cookbook (VP-4)
docs/tutorials/PGVECTOR_RAG_COOKBOOK.md — copy-paste patterns for:
- Pre-computed embeddings with always-fresh search corpus
- Tenant-isolated embedding corpus with RLS
- Drift-aware HNSW reindexing
- Centroid maintenance for cluster-aware search
- Operational sizing guidance and monitoring queries
New SQL Functions
- `pgtrickle.vector_status()` — embedding lag, ANN age, drift percentage
New Catalog Columns
pgtrickle.pgt_stream_tables:
- `post_refresh_action TEXT NOT NULL DEFAULT 'none'`
- `reindex_drift_threshold DOUBLE PRECISION`
- `rows_changed_since_last_reindex BIGINT NOT NULL DEFAULT 0`
- `last_reindex_at TIMESTAMPTZ`
New GUCs
- `pg_trickle.reindex_drift_threshold` (default: `0.20`) — global default drift fraction for drift-triggered REINDEX
Upgrade Notes
Existing stream tables keep post_refresh_action = 'none' after upgrade —
no behaviour change unless explicitly configured.
[0.46.0] — Extract pg_tide: Standalone Outbox, Inbox & Relay
v0.46.0 is a focused extraction release. The full transactional outbox, inbox,
and relay subsystem (~6,150 Rust LOC + ~2,500 SQL LOC) has been moved to the
new standalone pg_tide extension (trickle-labs/pg-tide). pg_trickle now
ships exactly one thing: incremental view maintenance.
The only remaining integration point is attach_outbox(), which registers a
pg_tide outbox for a stream table. After attachment, every non-empty refresh
calls tide.outbox_publish() inside the same transaction — preserving the
ADR-001/ADR-002 single-transaction atomicity guarantee.
New SQL Functions
- TIDE-7: `pgtrickle.attach_outbox(stream_table, retention_hours=>24, inline_threshold_rows=>10000)` — requires `pg_tide` to be installed; calls `tide.outbox_create()` and registers the mapping in `pgtrickle.pgt_outbox_config`. Every subsequent non-empty refresh writes a delta-summary row to the `pg_tide` outbox inside the same transaction.
- TIDE-7: `pgtrickle.detach_outbox(stream_table, if_exists=>false)` — removes the `pgt_outbox_config` entry. The `pg_tide` outbox table itself is NOT dropped; use `tide.outbox_drop()` in `pg_tide` after detaching to also remove the outbox data.
Removed SQL Functions
The following functions were moved to pg_tide (trickle-labs/pg-tide):
Outbox & Consumer Groups:
enable_outbox, disable_outbox, outbox_status, outbox_rows_consumed,
create_consumer_group, drop_consumer_group, poll_outbox, commit_offset,
extend_lease, seek_offset, consumer_heartbeat, consumer_lag
Inbox:
create_inbox, drop_inbox, enable_inbox_tracking, inbox_health,
inbox_status, replay_inbox_messages, enable_inbox_ordering,
disable_inbox_ordering, enable_inbox_priority, disable_inbox_priority,
inbox_ordering_gaps, inbox_is_my_partition
Relay:
set_relay_outbox, set_relay_inbox, enable_relay, disable_relay,
delete_relay, get_relay_config, list_relay_configs
Removed Catalog Tables
Dropped as part of the extraction: relay_outbox_config, relay_inbox_config,
relay_consumer_offsets, pgt_inbox_config, pgt_inbox_ordering_config,
pgt_inbox_priority_config, pgt_consumer_groups, pgt_consumer_offsets,
pgt_consumer_leases. The pgtrickle_relay role is also dropped.
pgtrickle.pgt_outbox_config is replaced with a slim integration schema.
GUC Changes
The following GUCs are removed (all moved to pg_tide):
pg_trickle.outbox_enabled, pg_trickle.outbox_retention_hours,
pg_trickle.outbox_drain_batch_size, pg_trickle.outbox_inline_threshold_rows,
pg_trickle.outbox_drain_interval_seconds, pg_trickle.outbox_storage_critical_mb,
pg_trickle.outbox_skip_empty_delta, pg_trickle.outbox_force_retention,
pg_trickle.inbox_enabled, pg_trickle.inbox_processed_retention_hours,
pg_trickle.inbox_dlq_retention_hours, pg_trickle.inbox_drain_batch_size,
pg_trickle.inbox_drain_interval_seconds, pg_trickle.inbox_dlq_alert_max_per_refresh,
pg_trickle.consumer_dead_threshold_hours
Upgrade Notes
Run pg_trickle--0.45.0--0.46.0.sql to drop all removed objects and migrate
pgt_outbox_config to the new schema. Base outbox payload tables
(pgtrickle.outbox_<st>) are not dropped — they remain for manual data
migration to pg_tide. See the pg_tide repository for migration guidance.
New: pg_tide Extension
The extracted functionality is now available as pg_tide, a standalone
PostgreSQL extension at https://github.com/trickle-labs/pg-tide. It includes:
- Transactional outbox with claim-check mode
- Idempotent inbox with DLQ, priority, and ordering
- The `pg-tide` relay binary (NATS, Kafka, SQS, webhooks, stdout)
- Consumer group API (poll, commit, heartbeat, lag)
[0.45.0] — Operational Readiness, Scalability & CI Completeness
v0.45.0 is an operational and CI maturity release. It adds a first-class
preflight() health-check function, enhances the worker pool status view,
makes the invalidation ring capacity configurable, adds lag-aware scheduling,
introduces incremental DAG rebuild for faster event propagation, completes
dbt macro option parity, and substantially tightens CI coverage.
New SQL Functions
- A46-4: `pgtrickle.preflight()` — returns a JSON health report with 7 system checks: `shared_preload_libraries` presence, scheduler running, `max_worker_processes` sufficiency, `wal_level` for WAL-CDC, replication slots availability, invalidation ring overflow count, and Citus worker failure total. Run this after install or after configuration changes to verify the environment is ready.
Enhanced SQL Functions
- A46-5: `pgtrickle.worker_pool_status()` gains four new columns: `idle_workers` (free slots), `last_scheduler_tick_unix` (Unix timestamp of last scheduler wake), `ring_overflow_count` (invalidation ring overflows since startup), and `citus_failure_total` (Citus worker failures logged).
Configuration (GUCs)
- A46-7: New GUC `pg_trickle.invalidation_ring_capacity` (integer, default 128, max 1024, postmaster scope). Configures the in-memory invalidation ring used for cross-backend event propagation. Requires a PostgreSQL restart when changed.
- A46-10: New GUC `pg_trickle.lag_aware_scheduling` (boolean, default false, superuser scope). When enabled, the per-database refresh quota is boosted proportionally to refresh lag (up to 2×), accelerating catch-up without starving other databases.
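The lag-aware quota boost can be modelled as a small sketch. The exact scaling curve and the `target_secs` normalisation parameter are assumptions for illustration — the release notes state only "proportional to lag, capped at 2×":

```rust
/// Boost a per-database refresh quota proportionally to lag, capped at 2x.
/// `lag_secs` is the database's current refresh lag; `target_secs` is the
/// assumed lag at which the full 2x boost is reached (illustrative only).
fn boosted_quota(base_quota: u32, lag_secs: f64, target_secs: f64) -> u32 {
    let factor = (1.0 + lag_secs / target_secs).min(2.0); // cap at 2x
    (base_quota as f64 * factor).round() as u32
}

fn main() {
    assert_eq!(boosted_quota(10, 0.0, 60.0), 10);   // no lag: base quota
    assert_eq!(boosted_quota(10, 30.0, 60.0), 15);  // halfway to target: 1.5x
    assert_eq!(boosted_quota(10, 600.0, 60.0), 20); // deep lag: capped at 2x
}
```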
Performance
- A46-9: Incremental DAG schedule re-resolution — when upstream CDC events affect a subset of stream tables, the scheduler now recomputes only the affected CALCULATED-schedule nodes (O(affected)) instead of the full DAG (O(V)). Falls back to full resolution if more than 25% of the DAG is affected. The new `resolve_calculated_schedule_incremental()` method is benchmarked in `benches/scheduler_bench.rs`.
- A46-11: Citus worker failure counter persisted in shared memory (`pg_trickle_citus_fail_total`), visible via `worker_pool_status()`. The counter increments when a Citus worker crosses the failure threshold, enabling operational dashboards to track distribution health over time.
Observability & Deployment
- A46-1/A46-2: `Dockerfile.hub`, `Dockerfile.ghcr`, and `Dockerfile.demo` now carry the correct default `ARG VERSION=0.45.0` and a `HEALTHCHECK` directive (`pg_isready`) for Docker Compose and Kubernetes readiness probes.
- A46-3: `cnpg/cluster-dev.yaml` (single-instance) and `cnpg/cluster-production.yaml` (3-node HA) added as ready-to-use CloudNativePG cluster manifests, including the worker budget formula `max_worker_processes = 8 + (2 × num_databases) + worker_pool_size`.
- A46-6: `monitoring/production/README.md` documents least-privilege role setup, TLS Prometheus scrape config, Kubernetes ServiceMonitor, and recommended alert thresholds for production deployments.
- A46-16: `docs/STORAGE_BACKENDS.md` — reference page covering Heap, Unlogged, Citus columnar, and pg_mooncake backends with migration guidance.
CI & Developer Experience
- A46-13: Windows compile failures are now blocking on scheduled CI runs (removed `continue-on-error: true`). A lightweight `windows-compile-gate` job also runs on every PR to catch Windows-specific compile errors early.
- A46-14: New `e2e-smoke` CI job runs on every PR and push to main. It builds the full E2E Docker image and runs a representative subset of tests (DVM, CDC, scheduler), catching packaging/install regressions faster than the full E2E run (schedule/manual only).
- A46-15: The Coverage workflow now runs on a weekly Monday schedule in addition to push-to-main and manual dispatch, providing consistent module-level coverage trend data.
- A46-17: dbt macros fully synced with `CreateStreamTableOptions` — `storage_backend`, `temporal`, `append_only`, `diamond_consistency`, `diamond_schedule_policy`, `pooler_compatibility_mode`, `max_differential_joins`, `max_delta_fraction`, and `output_distribution_column` are now configurable from dbt model configs and are correctly passed to the underlying SQL functions.
Schema Changes
-- New function
pgtrickle.preflight() RETURNS text
-- worker_pool_status() return type extended (4 new columns):
-- idle_workers integer
-- last_scheduler_tick_unix bigint
-- ring_overflow_count bigint
-- citus_failure_total bigint
-- New GUCs (set in postgresql.conf):
-- pg_trickle.invalidation_ring_capacity = 128 -- postmaster scope
-- pg_trickle.lag_aware_scheduling = false -- superuser scope
Upgrade note: `ALTER EXTENSION pg_trickle UPDATE` will DROP and re-create `worker_pool_status()` automatically (return type changed). The migration script `pg_trickle--0.44.0--0.45.0.sql` handles this.
[0.44.0] — Security Hardening & Code Quality
v0.44.0 is a security and code-quality sprint. It hardens SECURITY DEFINER paths, centralizes dynamic SQL construction, adds RLS bypass warnings, decomposes large modules, consolidates API options, and strengthens the parser's unsafe FFI façade.
Security
- A45-1: IVM trigger function `SET search_path` hardened. BEFORE trigger functions (advisory lock only) now use a restricted path with no `public`, preventing search_path shadowing of extension internals. AFTER trigger functions retain `public` so that user delta SQL can resolve unqualified source-table references; their PL/pgSQL bodies call only schema-qualified `pgtrickle.*` functions, so the security boundary is maintained.
- A45-3: A `WARNING` is now emitted when a stream table is created over a source table that has Row-Level Security (RLS) enabled, clarifying that source-table RLS does not protect stream-table contents.
- A45-4: Monitoring `docker-compose.yml` credentials are now driven by environment variables with a `monitoring/.env.example` template. PostgreSQL and Grafana services bind to `127.0.0.1` by default.
- A45-5: New `scripts/check_security_definer.sh` CI check validates that every `SECURITY DEFINER` occurrence in Rust and SQL files has a corresponding `SET search_path` and does not include `public` without justification. Added to the `just lint` pipeline.
- A45-6: `docs/SECURITY_MODEL.md` now documents why `superuser = true` and `trusted = false` are required, with a privilege table and guidance for managed environments (RDS, AlloyDB, CNPG).
Code Quality
- A45-2: New `src/sql_builder.rs` module provides safe helpers for all dynamic SQL construction: `ident`, `qualified`, `literal`, `regclass`, `spi_param`, `list_idents`. Includes unit tests and a new fuzz target (FUZZ-6).
- A45-7: `src/cdc.rs` split into three files — trigger-rebuild logic extracted to `src/cdc/rebuild.rs` and polling CDC extracted to `src/cdc/polling.rs`, reducing the main file from 4,259 to 3,386 lines.
- A45-8: `CreateStreamTableOptions` struct introduced in `src/api/mod.rs` to centralize all `create_stream_table` parameters. All four create paths (`create_stream_table`, `create_stream_table_if_not_exists`, `bulk_create`, `create_or_replace_stream_table`) now construct this struct before calling the implementation.
- A45-9: Extended the SAF-2 typed unsafe façade in `src/dvm/parser/mod.rs` with six additional safe wrapper functions (`safe_deparse_sort_clause`, `safe_deparse_target_list`, `safe_node_contains_window_func`, `safe_collect_all_window_func_nodes`, `safe_extract_func_name`, `safe_extract_operator_name`). Added `FUZZ-6` fuzz target for sql_builder and parser volatility helpers.
- A45-10: Scheduler background worker now emits structured `pgrx::warning!()` calls instead of silently discarding errors from `pg_backend_pid()`, `SchedulerJob::claim()`, and `pg_current_wal_lsn()` SPI calls.
- A45-11: All milestone-ID comments audited; each ID is now accompanied by a human-readable invariant description and links to a live design document in `plans/`.
[0.43.0] — D+I Change-Buffer Schema, GUC Tuning & WAL Diagnostics
v0.43.0 delivers a fundamental change to how CDC change buffers are stored
(D+I schema: flat column names, UPDATE decomposed into a D-row + I-row at
write time), five new operator-tuning GUCs, a new wal_source_status()
diagnostic view for per-source WAL CDC state, extended explain_stream_table()
output, and a comprehensive microbenchmark suite for all new code paths.
A44-1 — Deep-Join Threshold GUCs
Two new GUCs let operators tune when the DVM planner switches from the fast L0-scan path to the full recursive join decomposition:
| GUC | Default | Description |
|---|---|---|
| `pg_trickle.part3_max_scan_count` | 10000 | Maximum number of source rows before the planner escalates from P3 (direct scan) to a deeper join strategy. |
| `pg_trickle.deep_join_l0_scan_threshold` | 256 | Row count at which multi-level join decomposition uses an L0 pre-scan instead of a full plan. |
-- Lower threshold to force deep-join path for testing
SET pg_trickle.deep_join_l0_scan_threshold = 1;
A44-2 — GROUP_RESCAN: Correct Incremental SUM(CASE …) Aggregates
The P5 aggregate differentiation path now produces correct incremental results
for non-invertible expressions such as SUM(CASE WHEN status = 'active' THEN amount ELSE 0 END). The previous LATERAL VALUES decomposition has been
replaced with direct c.action = 'I' / c.action = 'D' filtering against
the D+I change buffer, eliminating the extra join overhead and fixing a
correctness gap for UPDATE rows that cross a CASE boundary.
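The D/I filtering strategy for `SUM(CASE …)` can be sketched as a fold over change-buffer rows: evaluate the CASE expression on each row, add it for I-rows and subtract it for D-rows. This is an illustrative model of the generated delta SQL, with hypothetical types — not the engine's internal representation:

```rust
/// One change-buffer row in the D+I schema (illustrative types).
struct ChangeRow {
    action: char, // 'I' or 'D'
    status: &'static str,
    amount: i64,
}

/// Delta for SUM(CASE WHEN status = 'active' THEN amount ELSE 0 END):
/// add the CASE value of every I-row, subtract it for every D-row.
fn sum_case_delta(changes: &[ChangeRow]) -> i64 {
    changes
        .iter()
        .map(|c| {
            let v = if c.status == "active" { c.amount } else { 0 };
            match c.action {
                'I' => v,
                'D' => -v,
                _ => 0,
            }
        })
        .sum()
}

fn main() {
    // An UPDATE that crosses the CASE boundary arrives as a D-row + I-row:
    // status 'active' -> 'closed' removes 100 from the sum, adds nothing.
    let changes = [
        ChangeRow { action: 'D', status: "active", amount: 100 },
        ChangeRow { action: 'I', status: "closed", amount: 100 },
    ];
    assert_eq!(sum_case_delta(&changes), -100);
}
```

This is exactly the case the previous LATERAL VALUES decomposition got wrong: the boundary-crossing UPDATE must net to −100, not 0.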
A44-3 — WAL Poll GUCs
Two new GUCs for tuning the WAL logical replication decoder polling loop:
| GUC | Default | Description |
|---|---|---|
| `pg_trickle.wal_max_changes_per_poll` | 10000 | Maximum number of change messages to consume from a WAL slot in a single poll pass. |
| `pg_trickle.wal_max_lag_bytes` | 104857600 (100 MiB) | WAL slot lag threshold (bytes) above which the decoder pauses to avoid slot saturation. |
A44-4 — Cost-Cache Capacity GUC
pg_trickle.cost_cache_capacity (default 4096) controls the maximum number
of entries in the shared refresh-cost estimate cache. On deployments with
thousands of stream tables, increasing this value avoids cold-cache fallback to
full-plan estimation.
A44-5 through A44-7 — Mandatory Microbenchmarks
Three new Criterion benchmark groups:
- `bench_a44_5_pool_vs_spawn` — measures EU-DAG pool reuse vs. per-tick rebuild at `n_sts ∈ {50, 200, 500, 1000}`.
- `bench_a44_6_write_amplification` — compares single-hash (pre-D+I wide schema) vs. double-hash (D+I) write overhead at `cols ∈ {4, 10, 20, 50}`.
- `bench_a44_7_join_codegen_by_depth` and `bench_a44_7_scan_agg_delta_sql` — join chain depth 2–16 and P5 aggregate SQL generation at 1–5 group columns.
A44-8 — explain_stream_table() GUC Threshold Section
pgtrickle.explain_stream_table(name) now includes a GUC thresholds
section in its output, showing the effective values of all tuning GUCs
(deep-join threshold, WAL poll limits, cost-cache capacity) alongside the
existing plan and mode information.
A44-9 — pgtrickle.wal_source_status() — Per-Source WAL Diagnostics
New SQL function returning one row per registered source table with WAL CDC diagnostics:
SELECT * FROM pgtrickle.wal_source_status();
| Column | Description |
|---|---|
| `source_relid` | Source table OID |
| `source_name` | Fully-qualified source table name |
| `cdc_mode` | `trigger`, `wal`, or `transitioning` |
| `slot_name` | Logical replication slot name (NULL if trigger-based) |
| `slot_lag_bytes` | Current WAL slot lag in bytes |
| `publication_name` | Publication name (NULL if trigger-based) |
| `blocked_reason` | Human-readable reason why WAL CDC is unavailable (NULL if active) |
| `transition_started_at` | Timestamp when WAL transition began (NULL if not transitioning) |
| `decoder_confirmed_lsn` | Last LSN confirmed by the decoder (NULL if trigger-based) |
A44-10 — D+I Change-Buffer Schema
Breaking internal change — the CDC change buffer table schema has been redesigned for correctness and performance.
Before (wide schema): Each source column was stored as two columns
(new_<col> and old_<col>). UPDATE was stored as a single action = 'U'
row; the DVM scan operator decomposed it at read time using a 5-CTE UNION ALL
pipeline.
After (D+I schema): Source columns are stored with their original names
("col"). UPDATE is decomposed at write time into:
- A D-row (`action = 'D'`) carrying the old values.
- An I-row (`action = 'I'`) carrying the new values.
Both rows carry the same changed_cols VARBIT bitmask; genuine
INSERT/DELETE rows have changed_cols = NULL.
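The write-time decomposition can be sketched as follows. The struct and function names are illustrative stand-ins for the change-buffer writer, not the extension's actual types:

```rust
#[derive(Debug, PartialEq)]
struct BufferRow {
    action: char,                    // 'I' or 'D'
    values: Vec<i64>,                // source columns under original names
    changed_cols: Option<Vec<bool>>, // VARBIT bitmask; None for plain INSERT/DELETE
}

/// Decompose an UPDATE into a D-row (old values) + I-row (new values),
/// both carrying the same changed-columns bitmask — constant write
/// amplification of 2 rows per UPDATE regardless of column count.
fn decompose_update(old: Vec<i64>, new: Vec<i64>) -> [BufferRow; 2] {
    let mask: Vec<bool> = old.iter().zip(&new).map(|(o, n)| o != n).collect();
    [
        BufferRow { action: 'D', values: old, changed_cols: Some(mask.clone()) },
        BufferRow { action: 'I', values: new, changed_cols: Some(mask) },
    ]
}

fn main() {
    let [d, i] = decompose_update(vec![1, 10], vec![1, 20]);
    assert_eq!(d.action, 'D');
    assert_eq!(i.values, vec![1, 20]);
    // Only the second column changed, and both rows share the mask.
    assert_eq!(d.changed_cols, Some(vec![false, true]));
    assert_eq!(d.changed_cols, i.changed_cols);
}
```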
Benefits:
- Scan SQL is significantly simpler (no UNION ALL decomposition at read time).
- Aggregate differentiation eliminates the LATERAL VALUES join.
- Write amplification is constant (2 rows per UPDATE regardless of column count).
- Change buffer tables are compatible with standard SQL tooling.
The sync_change_buffer_columns() migration guard detects existing wide-schema
buffers (any new_*/old_* columns) and performs a no-op, logging a warning.
To migrate an existing deployment, use pgtrickle.repair_stream_table(name).
A44-11 — D+I Benchmark Suite
bench_a44_11_di_delta_scan exercises the full D+I Scan→Aggregate pipeline
at cols ∈ {4, 10, 20, 50} to track differential scan performance as the
column count grows.
[0.42.0] — Repair API, Docs Overhaul & Test Infrastructure
v0.42.0 delivers a new repair_stream_table SQL function for disaster recovery
and self-healing after PITR restores, a comprehensive documentation overhaul
(deprecated GUC appendix, RLS bypass warnings, updated architecture diagrams),
security hardening of the WAL decoder via SQL parameterization, and a major
test infrastructure uplift with state-polling helpers, new correctness property
tests, and two new CI gates.
A42-1 — pgtrickle.repair_stream_table(name text) → text
New SQL-callable function for stream table repair and self-healing. Use after
point-in-time recovery (pg_basebackup / PITR) or any operation that may have
left CDC triggers, change buffer tables, or catalog state inconsistent.
Actions performed:
- Acquires an advisory lock on the stream table to prevent concurrent mutations.
- Verifies the stream table exists in `pgtrickle.pgt_stream_tables`.
- Resets the refresh frontier to `NULL` and sets `needs_reinit = true`, forcing a full refresh on the next scheduler cycle.
- Rebuilds any missing CDC triggers on all source tables.
- Recreates any missing change buffer tables in `pgtrickle_changes`.
- Resets error fuse state and stream table status to `ACTIVE`.
- Returns a text summary of all actions taken.
-- After a PITR restore, reinstall all CDC infrastructure
SELECT pgtrickle.repair_stream_table('order_totals');
-- → "repair_stream_table(order_totals): frontier reset; triggers OK; buffers rebuilt (1 recreated); status reset to ACTIVE"
A42-2 — Catalog Generator Accuracy Improvement
scripts/gen_catalogs.py regex now correctly captures non-pub #[pg_extern]
functions (pgrx does not require pub). The SQL API catalog grew from 24 to 98
entries, including repair_stream_table. CI fails on catalog drift.
A42-3 — SQL Reference: repair_stream_table Signature
docs/SQL_REFERENCE.md now correctly documents → text (not → void) return
type with full examples and parameter table.
A42-4 — Stale-Term Docs Linter (just docs-lint)
New just docs-lint recipe greps all docs/**/*.md for retired GUC names
(pg_trickle.max_workers, pg_trickle.max_parallel_refresh_workers) and fails
if any are found outside deprecated/compatibility sections. Also integrated into
.github/workflows/docs-drift.yml as a CI gate.
A42-5 — Deprecated GUC Compatibility Appendix
docs/CONFIGURATION.md now has an Appendix: Deprecated / Compatibility
GUCs section documenting event_driven_wake and wake_debounce_ms with
migration guidance. Existing active references to the two retired GUCs were
updated to their current replacements across PATTERNS.md, SCALING.md,
PRE_DEPLOYMENT.md, and docs/integrations/multi-tenant.md.
A42-6 — ARCHITECTURE.md Module Diagram Updated
docs/ARCHITECTURE.md module layout now correctly reflects the src/dvm/parser/
subdirectory structure introduced in v0.39.0 (G13-PRF), with all five sub-modules
(mod.rs, types.rs, validation.rs, rewrites.rs, sublinks.rs) listed.
A42-7 — RLS Bypass Prominence
docs/GETTING_STARTED.md and docs/PRE_DEPLOYMENT.md now include prominent
security notices explaining that pg_trickle background workers execute with
SET LOCAL row_security = off (matching PostgreSQL's own REFRESH MATERIALIZED VIEW semantics), and providing mitigation guidance.
A42-8 — Generated Docs Freshness CI Gate
.github/workflows/docs-drift.yml now runs both the catalog check
(python3 scripts/gen_catalogs.py --check) and the stale-term linter on every
PR targeting main, on every push to main, and on a weekly schedule.
A42-9 — State-Polling Test Helpers
tests/common/mod.rs now exports seven polling helpers:
- `wait_for_first_refresh` / `wait_for_refresh_history` / `wait_for_refresh_after`
- `wait_for_cdc_mode`
- `wait_for_stream_table_status`
- `wait_for_scheduler_tick`
- `wait_for_query_count`
All new E2E test files created in this release use these helpers exclusively
(zero tokio::time::sleep calls). Existing tests had their most egregious
blind waits replaced.
A42-10 — Differential SUM(CASE) E2E Tests
New test file tests/e2e_sum_case_differential_tests.rs (5 tests) validating
that SUM(CASE WHEN ... END) expressions correctly trigger full refresh mode
instead of attempting algebraically incorrect incremental updates.
A42-11 — SUM(CASE) AST-Level Detection
src/dvm/operators/aggregate.rs: is_algebraically_invertible now calls the
new expr_contains_case helper which recursively inspects the Expr AST for
CASE expressions at any nesting depth, catching wrapped forms like
SUM(CAST(CASE ... END AS numeric)).
A42-12 — FULL JOIN Aggregate Property Tests
New test file tests/e2e_full_join_aggregate_tests.rs (4 tests) including a
test_full_join_diff_vs_full_property_10_cycles property test that runs 10
insert/delete cycles and asserts DIFFERENTIAL refresh produces identical output
to FULL refresh after each cycle.
A42-13 — WAL Decoder SQL Parameterization
src/wal_decoder.rs: write_decoded_change now builds fully parameterized
SPI queries using $N placeholders and Spi::run_with_args, eliminating
all direct string interpolation of WAL values into SQL. This closes a class of
SQL injection risks in the WAL CDC path.
A42-14 — Stale EC-06 Comment Cleanup
src/dvm/operators/scan.rs: Updated design comments from the outdated EC-06
reference to accurately describe the current net-counting strategy and point to
the test_keyless_multiset_property test.
A42-15 — Keyless Multiset Property Tests
New test file tests/e2e_keyless_tests.rs (4 tests) validating that keyless
(no primary key) tables maintain correct multiset semantics through 10 cycles
of insert/delete/update operations.
A42-16 — Fuzz Smoke CI Job
New .github/workflows/fuzz-smoke.yml runs daily and on PRs that touch fuzz
targets. On PRs: replays the corpus for each target (zero new crashes allowed).
On schedule/dispatch: runs each target for 60 s and uploads crash artifacts.
Targets: parser_fuzz, cron_fuzz, dag_fuzz, guc_fuzz, cdc_fuzz,
wal_fuzz.
[0.41.0] — DVM Correctness: Structural Cache Keys, Placeholder Safety & WAL Transition Guards
v0.41.0 targets internal correctness of the Differential View Maintenance (DVM)
engine: eliminating snapshot-CTE cache collisions on structurally different
subtrees, making unresolved SQL placeholders hard errors, guarding WAL CDC
transitions against concurrent DDL, and ensuring the pool worker obeys the
global pg_trickle.enabled switch.
A41-1 — Structural Snapshot CTE Cache Key Fingerprint
The old snapshot_cache_key() concatenated leaf-table aliases, meaning two
OpTrees with identical source tables but different join conditions, join types,
predicates, projections, or grouping expressions mapped to the same key and
could silently share a snapshot CTE.
The function now computes a 64-bit structural fingerprint via DefaultHasher,
recursively encoding every operator type, join condition, predicate, projection,
group-by expression, and child fingerprints before formatting the key as a
16-character hex string. For a 64-bit fingerprint, collisions are negligible
for any realistic OpTree, and the key no longer depends on alias names.
A41-2 — Placeholder Resolution Full-Validation Assertion
resolve_delta_template() and resolve_lsn_placeholders() now return
Result<String, PgTrickleError> instead of String. After all substitutions
a check_no_remaining_placeholders() call scans for any leftover __PGS_*__
or __PGT_*__ tokens. If any are found, PgTrickleError::UnresolvedPlaceholder
is returned and propagated all the way to the SQL surface as a clear
ERRCODE_INTERNAL_ERROR with a detail message naming the offending token and
the calling context.
This converts a class of silent wrong-query bugs (where an unresolved placeholder was executed as literal SQL text) into an immediate, actionable server error.
A41-3 — WAL Transition Eligibility Recheck at Commit Point
Before committing the TRANSITIONING → WAL state change, the background
worker now calls recheck_source_eligible_for_wal() to verify that:
- pg_class.relkind = 'r' (table not dropped)
- primary-key columns are still present
- REPLICA IDENTITY FULL is still set
If any check fails, the replication slot is immediately dropped, the catalog is
reset to Trigger mode, and a WalTransitionError is returned. This closes a
race window in which a concurrent DROP CONSTRAINT or ALTER TABLE … REPLICA IDENTITY DEFAULT could leave the CDC pipeline in an inconsistent WAL mode with
stale slot resources.
A41-4 — Pool Worker pg_trickle.enabled Check
The persistent pool-worker main loop now checks config::pg_trickle_enabled()
at the top of each iteration. When pg_trickle.enabled = off the worker sleeps
500 ms and skips all job claiming, ensuring that a live-reload of the GUC
immediately quiesces all workers without requiring a process restart.
A41-5 — Document Isolation Invariants (All Execution Modes)
Isolation-invariant doc comments have been added to all five execution-mode
functions in src/scheduler/mod.rs:
| Mode | Invariant |
|---|---|
| execute_worker_singleton | READ COMMITTED per-refresh; no cross-session writes visible |
| execute_worker_atomic_group | READ COMMITTED with sub-transactions; repeatable-read group shares a snapshot |
| execute_worker_immediate_closure | Single READ COMMITTED transaction; trigger-propagated and atomic |
| execute_worker_cyclic_scc | Per-iteration READ COMMITTED; external observers see partial states between iterations |
| execute_worker_fused_chain | Single READ COMMITTED transaction; bypass tables ON COMMIT DROP; externally atomic |
[0.40.0] — Operator Trust, Maintainability & Release Confidence
v0.40.0 focuses on building confidence for operators, maintainers, and adopters:
auto-generated API/GUC catalogs to eliminate drift, a formal security model,
drain-mode runbook with E2E proof, expanded alert rules, dbt/relay parity,
strict unsafe-block gate, L0-cache documentation truthfulness, formal deprecation
of event_driven_wake, and secret scanning in CI.
O40-1 — Auto-Generated GUC & SQL API Catalogs
scripts/gen_catalogs.py parses src/config.rs and src/**/*.rs to produce
docs/GUC_CATALOG.md (125 GUCs) and docs/SQL_API_CATALOG.md (24 SQL-callable
functions). Both are checked by a new .github/workflows/docs-drift.yml CI gate
that fails if the catalogs fall out of sync with the source.
Run just gen-catalogs to regenerate; just check-docs-drift (or the CI gate)
detects drift.
O40-2 — Security Model & Secret-Handling Guide
New docs/SECURITY_MODEL.md covers: SECURITY DEFINER scope, search_path
hardening, RLS boundary semantics, CDC buffer access controls, TRUNCATE
semantics, relay credential storage guide, background worker privilege model,
incident response checklist, and v1.0 supply-chain preparation checklist.
O40-3 — Drain-Mode Runbook & E2E Proof
New docs/RUNBOOK_DRAIN.md provides a step-by-step operator runbook for
controlled shutdown, rolling upgrade, and load-testing drain scenarios with
observability guidance and troubleshooting steps.
Six new E2E tests in tests/e2e_drain_mode_tests.rs validate: idle drain
returns true, is_drained() state reflection, post-resume catch-up,
drain under active workload, timeout parameter semantics, and change-buffer
accumulation during drain.
O40-4 — Expanded Alert Rules
monitoring/prometheus/alerts.yml gains eight new production-grade alert
rules:
| Alert | Threshold | Severity |
|---|---|---|
| PgTrickleFreshnessLagHigh | staleness > 600 s for 10 min | warning |
| PgTrickleRefreshP99High | avg_duration > 60 000 ms for 5 min | warning |
| PgTrickleCdcBufferDepthHigh | pending_rows > 500 000 for 5 min | warning |
| PgTrickleWalSlotLagHigh | retained_wal_mb > 200 for 5 min | warning |
| PgTrickleWalSlotLagCritical | retained_wal_mb > 1 024 for 2 min | critical |
| PgTrickleWorkerPoolSaturated | active ≥ 90 % pool_size for 5 min | warning |
| PgTrickleCitusLeaseUnhealthy | lease_held == 0 for 5 min | critical |
| PgTrickleOtelExportErrors | export_errors_total > 0 for 5 min | warning |
O40-5 — dbt & Relay Parity
New dbt macro pgtrickle_operational_status() returns scheduler health,
drain state, CDC pause state, force-full mode, and back-pressure status.
New pgtrickle_drain() macro for drain from dbt. stream_table_status()
updated with cdc_paused, force_full, and is_drained fields.
O40-6 — Unsafe-Inventory Gate (Strict Mode)
.github/workflows/unsafe-inventory.yml changed from --report-only to
strict mode: the workflow now exits 1 on unsafe-block regressions, making it
a hard PR gate. Unsafe blocks that need to be added must update the baseline
via an explicit PR that reviewers can audit.
O40-7 — L0-Cache Truthfulness
pg_trickle.template_cache GUC documentation updated to explain the full
L0/L1/L2 cache architecture:
- L0 (process-local RwLock<HashMap>) — fast, but not shared across pooler connections; hit rate is low in PgBouncer transaction-pooling deployments.
- L1 (thread-local delta template) — fastest, reset on each SPI connect.
- L2 (UNLOGGED catalog table) — shared across all backends; the correct layer to rely on for cross-connection performance.
Operators using transaction-pooling should rely on L2 warm-up, not L0.
O40-8 — event_driven_wake Formal Deprecation
pg_trickle.event_driven_wake and pg_trickle.wake_debounce_ms are
formally deprecated with full rationale in the GUC doc comments:
LISTEN is not allowed in PostgreSQL background workers; the scheduler
always uses latch-based polling. Both GUCs are preserved in v0.40.0 for
upgrade compatibility and will be removed in v1.0. Setting them now
emits a WARNING but does not break existing configurations.
O40-9 — Secret Scanning CI
New .github/workflows/secret-scan.yml runs gitleaks on all pull
requests to main, on pushes to main, and weekly. .gitleaks.toml
provides an allowlist for known example credentials in documentation
and test fixtures.
[0.39.0] — Operational Truthfulness & Distributed Hardening
v0.39.0 focuses on making pg_trickle's operational behavior more honest and robust: CDC hold mode, enhanced diagnostics, SQLSTATE-aware retry, OpenTelemetry documentation, Citus chaos hardening, and a broader testing pyramid.
O39-1/O39-8 — CDC Hold Mode (cdc_capture_mode)
New GUC pg_trickle.cdc_capture_mode (default discard). When set to hold,
captured change rows are buffered in the change table while CDC is paused rather
than being silently discarded. The existing discard behavior is unchanged and
remains the default to preserve backward compatibility.
New SQL function pgtrickle.cdc_pause_status() returns per-stream-table CDC
pause state including paused, capture_mode, and an operator-guidance note.
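Together with the CDC kill-switch GUC introduced in v0.35.0 (pg_trickle.cdc_paused), this gives a hold-and-inspect workflow. A sketch, assuming a session with permission to set these GUCs:

```sql
-- Buffer captured changes instead of discarding them while CDC is paused.
SET pg_trickle.cdc_capture_mode = 'hold';

-- Pause capture at the trigger level, do maintenance, then inspect:
SET pg_trickle.cdc_paused = on;
SELECT * FROM pgtrickle.cdc_pause_status();  -- paused, capture_mode, guidance note

-- Resume; rows held in the change buffer are applied on a later refresh cycle.
SET pg_trickle.cdc_paused = off;
```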
O39-2 — Wake Truthfulness
The scheduler no longer attempts LISTEN/NOTIFY in background worker contexts
(PostgreSQL does not support this). Wake truthfulness is documented in the
header of e2e_wake_tests.rs; tests now verify that the scheduler falls back
to polling correctly rather than asserting sub-polling-interval wake latency.
O39-3 — Configuration Documentation
docs/CONFIGURATION.md gains three new sections covering GUCs introduced in
v0.36.0 (WAL Backpressure), v0.37.0 (pgVectorMV & OpenTelemetry), and v0.39.0
(Operational Truthfulness: cdc_capture_mode). Each section includes an
operator checklist and configuration examples.
O39-4 — Upgrade Guide
docs/UPGRADING.md gains upgrade sections for every version from 0.34.0 to
0.39.0, including schema change details, new GUCs, new functions, and known
limitations per release.
O39-5 — OpenTelemetry Operator Guide
New docs/OPENTELEMETRY.md provides an end-to-end operator guide for the W3C
Trace Context integration introduced in v0.37.0. Covers Jaeger/Tempo/OTEL
Collector configuration, span attributes, failure behavior (best-effort; never
blocks refresh), and verification steps.
Three new E2E tests (tests/e2e_otel_tests.rs) verify trace context capture,
unreachable-endpoint graceful degradation, and disabled-tracing NULL context.
O39-6 — SQLSTATE-First SPI Retry
New GUC pg_trickle.use_sqlstate_classification (default off). When enabled,
the scheduler uses a SQLSTATE integer class (40xxx = retryable, 23xxx = not
retryable, etc.) before falling back to text pattern matching. The new unified
classify_error_for_retry() function is used at both retry decision points in
the scheduler.
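As an illustration of the classification order described above (SQLSTATE class first, text patterns only as fallback; the retry decision itself is internal to the scheduler):

```sql
-- Opt in to SQLSTATE-first retry classification.
SET pg_trickle.use_sqlstate_classification = on;

-- With this enabled, a refresh failure with SQLSTATE 40001
-- (serialization_failure, class 40xxx) is classified retryable, while
-- 23505 (unique_violation, class 23xxx) is classified non-retryable,
-- before any text pattern matching is consulted.
```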
O39-7 — Citus Chaos Test Harness
New tests/e2e_citus_chaos_tests.rs containing four #[ignore] chaos tests:
- CHAOS-1: Worker death mid-refresh (graceful failure + recovery)
- CHAOS-2: Coordinator restart during lease (lock invalidation + re-acquire)
- CHAOS-3: Shard rebalance during active CDC (no row gaps)
- CHAOS-4: Stale worker slot cleanup (topology change detection)
Tests require CITUS_COORDINATOR_URL and CITUS_*_CONTAINER env vars; they
are skipped automatically when not set.
O39-9 — Enhanced explain_stream_table()
pgtrickle.explain_stream_table() now shows: Status, Populated, Refresh mode
(with force_full_refresh GUC note), CDC status (paused/active + capture mode),
Backpressure state, and the Defining query. This makes the function a one-stop
diagnostic tool for operators.
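For example, against the active_orders stream table from the quickstart (the fields are those listed above; exact output formatting may differ):

```sql
SELECT pgtrickle.explain_stream_table('active_orders');
-- Reports: Status, Populated, Refresh mode (with force_full_refresh GUC note),
-- CDC status (paused/active + capture mode), Backpressure state,
-- and the Defining query.
```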
O39-10 — TPC-H EXPLAIN Artifacts CI
New workflow .github/workflows/tpch-explain-artifacts.yml captures EXPLAIN
ANALYZE BUFFERS output and p50/p99 timing for TPC-H queries Q04, Q05, Q07, Q08,
Q09, Q20, Q22. Runs weekly (Sunday 06:00 UTC) and on manual dispatch. Artifacts
are uploaded and retained for 90 days.
New test_tpch_explain_artifacts test function (#[ignore]) in
tests/e2e_tpch_tests.rs performs the collection.
O39-11 — SQLancer Light PR Mode
Two new non-#[ignore] tests in tests/e2e_sqlancer_tests.rs:
- test_sqlancer_crash_oracle_light: 50 random queries, crash oracle.
- test_sqlancer_equivalence_oracle_light: 50 random queries, equivalence oracle.
Both use a fixed seed (SQLANCER_LIGHT_SEED) and bounded case count
(SQLANCER_LIGHT_CASES, default 50) for fast, deterministic PR CI gates.
O39-12 — Fuzz Target Expansion
Two new libFuzzer targets:
- fuzz/fuzz_targets/wal_fuzz.rs: SQLSTATE classifier + sqlstate_to_string invariants.
- fuzz/fuzz_targets/dag_fuzz.rs: schedule parsing, cron validation, SELECT * detection.
Both verify no-panic and determinism properties for adversarial inputs.
O39-13 — Inbox/Outbox Reliability Property Tests
Unit-level property tests in src/api/inbox.rs (#[cfg(test)]) covering:
- Partition exhaustiveness: every aggregate ID maps to exactly one worker.
- Hash determinism: same inputs always produce the same assignment.
- Negative total_workers degenerate case.
- Known hash anchors for regression protection.
SQLSTATE classifier property tests in src/error.rs covering:
- Retryable class detection.
- Bracket-code extraction with malformed inputs.
- sqlstate_to_string totality and determinism.
O39-14 — PR-Scoped Upgrade E2E Slice
New CI job upgrade-e2e-pr-slice in .github/workflows/ci.yml. Triggered on
PRs that modify sql/, src/config.rs, src/cdc.rs, or src/api/. Runs the
most recent N-1→N upgrade pair using a stock postgres:18.3 container (no
custom Docker build). Tests filtered to smoke | basic | catalog labels for
speed.
Upgrade: Run ALTER EXTENSION pg_trickle UPDATE TO '0.39.0'. The
0.38.0→0.39.0 migration creates pgtrickle.cdc_pause_status() and registers
the cdc_capture_mode GUC comment. No existing tables or functions are removed.
[0.38.0] — EC-01 Join Correctness Sprint
v0.38.0 is a focused correctness release for EC-01, the join phantom-row class where non-deduplicated join deltas could leave stale row IDs behind across refresh cycles.
EC-01 — Unconditional PH-D1 Cleanup
Non-deduplicated keyed join deltas now run PH-D1 cross-cycle cleanup after every differential apply. The cleanup computes the current FULL-refresh row-id set and deletes stream-table row IDs that no longer exist in the correct result. This removes historical phantoms that are not present in the current delta and keeps DIFFERENTIAL output convergent with FULL output.
RowIdSchema Planner Guard
The dormant RowIdSchema model is now exercised during DVM planning. The
planner infers row-id schemas for scans, transparent operators, joins,
aggregates, set operations, CTEs, recursive plans, lateral plans, and scalar
subqueries before generating delta SQL. If a row-id pipeline is internally
inconsistent, planning fails with a clear RowIdSchema verification failed
message rather than allowing silent refresh drift.
EC-01 Property Release Gate
Added e2e_ec01_property_tests, a DIFF-vs-FULL property test that runs a
deterministic three-table join aggregate through 100 mixed-DML cycles by
default. Each cycle includes inserts, updates, deletes on both sides of joins,
and co-delete cases, then compares DIFFERENTIAL and FULL stream tables with
multiset equality and row-id diagnostics.
Q07 and Q15 are no longer allowed in IMMEDIATE_SKIP_ALLOWLIST, so CI must
prove those query shapes instead of accepting silent skips.
Removed
pgtrickle-tui — The terminal dashboard binary has been removed from this repository. All SQL-level monitoring functions (pgtrickle.health_check(), pgtrickle.list_stream_tables(), etc.) remain fully available in the extension.
Upgrade: The 0.37.0 → 0.38.0 migration has no SQL-object changes; the
release changes Rust DVM/refresh behavior and test coverage only.
[0.37.0] — pgVector Incremental Aggregates & Distributed Trace Propagation
v0.37.0 adds two independent capability pillars: incremental vector aggregates for pgvector workloads, and W3C Trace Context propagation through the CDC → DVM → MERGE pipeline.
F4 — pgVector Incremental Aggregates
Stream tables can now maintain avg(embedding) and sum(embedding) over
vector, halfvec, and sparsevec columns incrementally. The DVM planner
detects vector-typed aggregate arguments at plan time and reclassifies them to
use pgvector-native differential operators (VectorAvg, VectorSum) that
maintain a running (count, sum_vector) auxiliary state instead of a full
table scan on every change.
SQL usage:
CREATE EXTENSION vector;  -- pgvector's extension name is 'vector'
CREATE TABLE products (
id SERIAL PRIMARY KEY,
category TEXT,
embedding vector(3)
);
-- This stream table is maintained incrementally — no full scan on INSERT.
SELECT pgtrickle.create_stream_table(
'category_centroids',
'SELECT category, avg(embedding)::vector AS centroid
FROM products GROUP BY category',
schedule => '5s'
);
GUC: SET pg_trickle.enable_vector_agg = on; (session-level opt-in).
Distance operator fallback: <=>, <->, <#> operators in WHERE clauses
trigger automatic full-refresh fallback because they are non-monotone. The
planner emits a WARNING so operators know the mode downgrade occurred.
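A query shape that triggers the fallback, for illustration (the products table matches the example above; the WARNING text is paraphrased, not verbatim):

```sql
-- Distance operators in the predicate are non-monotone, so this stream
-- table is maintained by FULL refresh, and creation emits a WARNING.
SELECT pgtrickle.create_stream_table(
  'near_origin_products',
  'SELECT id, category FROM products
   WHERE embedding <-> ''[0,0,0]''::vector < 0.5',
  schedule => '30s'
);
```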
Criterion benchmarks are provided for vector_avg, vector_sum, and mixed
workloads in benches/diff_operators.rs.
Documentation: docs/tutorials/PGVECTOR_EMBEDDING_PIPELINES.md.
F10 — W3C Trace Context Propagation
Every CDC change buffer table now contains a __pgt_trace_context TEXT column.
When an application sets the pg_trickle.trace_id GUC before executing DML,
the row-level and statement-level CDC triggers capture the W3C traceparent
string into that column.
After each differential refresh, if pg_trickle.enable_trace_propagation = on,
the extension reads the trace context from the change buffer and either:
- exports an OTLP/JSON span to pg_trickle.otel_endpoint (Jaeger, Zipkin, OTEL Collector), or
- logs the span at INFO level when no endpoint is configured.
The span covers the full CDC-drain → DVM-plan → merge-apply cycle, linking PostgreSQL refresh latency directly to application request traces.
GUCs added:
| GUC | Type | Default | Description |
|---|---|---|---|
| pg_trickle.enable_trace_propagation | BOOL | false | Enable W3C trace propagation |
| pg_trickle.otel_endpoint | STRING | '' | OTLP HTTP endpoint (e.g. http://localhost:4318) |
| pg_trickle.trace_id | STRING | '' | W3C traceparent set by the application session |
| pg_trickle.enable_vector_agg | BOOL | false | Enable incremental pgvector aggregates |
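End to end, an application session might wire this up as follows (the traceparent value is the illustrative example from the W3C Trace Context specification, not a real trace):

```sql
SET pg_trickle.enable_trace_propagation = on;
SET pg_trickle.otel_endpoint = 'http://localhost:4318';

-- Set by the application before DML; captured into __pgt_trace_context
-- by the CDC triggers.
SET pg_trickle.trace_id = '00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01';
INSERT INTO orders (id, status) VALUES (42, 'active');
-- The next differential refresh exports (or logs) a span linked to this trace.
```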
Upgrade: The 0.36.0 → 0.37.0 migration script adds __pgt_trace_context
to all existing change buffer tables automatically.
Internal improvements
- A15/A16: src/scheduler and src/refresh/merge each split into focused sub-modules (completed in the v0.37.0 development cycle).
[0.36.0] — Structural Hardening, Performance & Temporal IVM
v0.36.0 closes structural and performance gaps accumulated since the Citus arc.
The L0 process-local template cache is now constructed (was wired-but-empty
since v0.31.0). WAL slot backpressure enforcement is available via the new
pg_trickle.enforce_backpressure GUC. Structured JSON logging arrives for
OpenTelemetry/Loki integration. The RowIdSchema type formalises cross-operator
row-id compatibility, addressing the architectural root cause of EC-01 class bugs.
Temporal IVM (SCD Type 2, AS OF TIMESTAMP ready) and columnar storage backend
support are introduced. A drain mode API enables graceful quiesce before
maintenance windows.
New features
- A09 — L0 process-local template cache: Process-local RwLock<HashMap> keyed by (pgt_id, cache_generation) avoids a ~45 ms cold-start penalty per backend for connection-pooler workloads. Invalidated automatically on generation bump. New API: shmem::l0_cache_lookup(), shmem::l0_cache_store(), shmem::invalidate_l0_cache().
- A12 — WAL backpressure enforcement: When pg_trickle.enforce_backpressure = on, CDC trigger writes are suppressed once the WAL replication slot lag reaches slot_lag_critical_threshold_mb. Writes resume when lag drops below 50% of the threshold (hysteresis). Default: off.
- A17 — Typed DDL event payload: Replaced string-tag matching in hooks.rs with a DdlCommandKind enum. CREATE OR REPLACE FUNCTION is now correctly classified as FunctionChange.
- A18 — RowIdSchema type: Every DVM operator can now declare its row-id hash schema. A verify_pipeline() function asserts cross-operator compatibility at plan time, making EC-01-class bugs detectable before execution.
- A20 — Structured JSON logging: New src/logging.rs module with a PgtLogEvent struct and pgt_info! macro. When pg_trickle.log_format = json, events are emitted as structured JSON with fields event, pgt_id, cycle_id, duration_ms, refresh_reason, error_code, msg. Default: text.
- A25 — Bulk alter / drop APIs: New SQL functions pgtrickle.bulk_alter_stream_tables(names TEXT[], params JSONB) and pgtrickle.bulk_drop_stream_tables(names TEXT[]) for dbt deployments managing many stream tables.
- A35 — Drain mode: pgtrickle.drain(timeout_s INT DEFAULT 60) signals the scheduler to stop accepting new cycles and waits for all in-flight refreshes to complete. pgtrickle.is_drained() checks drain status. Useful before pg_upgrade, rolling restarts, and backup windows.
- CORR-1 / UX-1 — Temporal IVM: create_stream_table() and create_stream_table_if_not_exists() now accept temporal := true. When enabled, __pgt_valid_from TIMESTAMPTZ and __pgt_valid_to TIMESTAMPTZ columns are automatically added to the storage table. A temporal_mode column is recorded in pgtrickle.pgt_stream_tables.
- CORR-2 / UX-3 — Columnar storage backend: create_stream_table() now accepts storage_backend := 'heap' | 'citus' | 'pg_mooncake' (default: 'heap'). The backend is recorded in pgtrickle.pgt_stream_tables.storage_backend and can be overridden globally via the pg_trickle.columnar_backend GUC.
- F5 — Online schema evolution: When pg_trickle.online_schema_evolution = on, ALTER QUERY with only column additions (no removals) preserves the existing frontier and is_populated flag, enabling continuous differential refresh without a full reinit. Default: off.
- F11 — CREATE STREAM TABLE SQL syntax: New function pgtrickle.exec_stream_ddl(TEXT) parses custom DDL strings such as CREATE STREAM TABLE name AS SELECT ..., CREATE OR REPLACE STREAM TABLE name AS SELECT ..., and DROP STREAM TABLE name.
- F12 — Column lineage: New function pgtrickle.stream_table_lineage(name TEXT) returns TABLE(output_col, source_table, source_col) from the column_lineage JSONB recorded in the catalog at creation time.
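For instance, the drain-mode API (A35) can bracket a maintenance window like this (a sketch; the maintenance steps themselves are elided):

```sql
-- Stop accepting new refresh cycles; wait up to 30 s for in-flight work.
SELECT pgtrickle.drain(timeout_s => 30);
SELECT pgtrickle.is_drained();  -- verify quiesce before maintenance

-- ... perform pg_upgrade / rolling restart / backup ...
```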
New GUCs
| GUC | Default | Description |
|---|---|---|
| pg_trickle.enforce_backpressure | off | Pause CDC writes when slot lag exceeds critical threshold |
| pg_trickle.log_format | text | Log format: text or json |
| pg_trickle.drain_timeout | 60 | Default drain timeout (seconds) |
| pg_trickle.online_schema_evolution | off | Preserve frontier on compatible ALTER QUERY |
| pg_trickle.columnar_backend | none | Default columnar backend: none, citus, pg_mooncake |
| pg_trickle.temporal_stream_tables | off | Global temporal IVM flag |
Schema changes
pgtrickle.pgt_stream_tables gains three new columns:
- temporal_mode BOOLEAN NOT NULL DEFAULT FALSE
- storage_backend TEXT NOT NULL DEFAULT 'heap'
- column_lineage JSONB
Upgrade
ALTER EXTENSION pg_trickle UPDATE TO '0.36.0';
The migration script (sql/pg_trickle--0.35.0--0.36.0.sql) is fully
idempotent and adds the new columns with IF NOT EXISTS.
[0.35.0] — Hardening, Reactive Subscriptions & Relay Resilience
v0.35.0 is a focused correctness, operability, and resilience sprint. It adds reactive NOTIFY subscriptions, an SLA summary API, CDC kill-switch GUCs, and several operator-facing improvements. The relay gains exponential reconnect backoff and ${ENV:VAR_NAME} secret interpolation.
New features
- Reactive subscriptions — pgtrickle.subscribe(stream_table, channel) / pgtrickle.unsubscribe() / pgtrickle.list_subscriptions(): NOTIFY-based reactive delivery after every non-empty refresh cycle (UX-SUB).
- SLA summary API — pgtrickle.sla_summary() returns p50/p99 latency, freshness lag, and error-budget remaining over a configurable window (pg_trickle.sla_window_hours, default 24 h) (F17).
- Explain stream table — pgtrickle.explain_stream_table(name) returns the defining query and cached refresh metadata for a stream table (A23).
- Shadow-build evolution status — pgtrickle.view_evolution_status() lists which stream tables are in a zero-downtime shadow build (UX-STATUS).
- CDC kill-switch — new pg_trickle.cdc_paused GUC (boolean, default off) pauses all CDC capture at the trigger level without dropping triggers (A07).
- Force-full-refresh GUC — pg_trickle.force_full_refresh (boolean, default off) forces all stream tables to use FULL refresh mode for a debugging/recovery window (A08).
- FULL-fallback NOTICE — a NOTICE is emitted every time differential refresh falls back to FULL refresh, including the reason string (A22).
- Shadow-ST catalog columns — in_shadow_build and shadow_table_name columns added to pgtrickle.pgt_stream_tables (UX-SHADOW).
- History start_time index — pgt_refresh_history_start_time_idx (start_time DESC) for faster SLA queries and retention pruning (A11).
- Relay ENV-var interpolation — connection strings support ${ENV:VAR_NAME} placeholders that are expanded from the process environment at startup (A30).
- Relay reconnect backoff — the relay now retries failed PostgreSQL connections with exponential backoff (initial 100 ms, max 30 s, ±20 % jitter) (A38).
- Relay backpressure — new sink_max_inflight config field (default 1 000 messages) that can be used to pause upstream polling (A39).
- Notify coalesce GUC — pg_trickle.notify_coalesce_ms (integer, default 250 ms) reserved for future NOTIFY debounce (UX-GUC).
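The reactive-subscription functions compose with plain LISTEN. A minimal sketch using the active_orders stream table from the quickstart (the channel name is arbitrary):

```sql
SELECT pgtrickle.subscribe('active_orders', 'active_orders_changed');
LISTEN active_orders_changed;
-- Every non-empty refresh cycle now sends NOTIFY on this channel.
SELECT * FROM pgtrickle.list_subscriptions();
```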
Correctness fixes
- A01 / EC-01: cross-cycle phantom-row cleanup now runs unconditionally after every join differential refresh cycle (batched at 1 024 rows) instead of being deferred. This eliminates phantom residual rows that accumulated over multi-cycle windows.
- A05: join_and_predicates() no longer panics on an empty predicate list — it now returns Result<Expr, PgTrickleError>.
Performance
- History prune batch size raised from 1 000 → 10 000 rows per transaction to reduce pruning lag on busy clusters (A10).
- Citus lease jitter: try_acquire_st_lock() adds 50–500 ms random backoff on INSERT conflict to prevent a coordinator thundering herd (A13).
Developer experience
- Unit tests added for inbox_is_my_partition and outbox_table_name_for (A06).
- Relay config tests added for ${ENV:VAR_NAME} expansion.
Upgrade notes
Run the upgrade migration:
ALTER EXTENSION pg_trickle UPDATE TO '0.35.0';
The migration adds pgt_refresh_history_start_time_idx, creates pgtrickle.pgt_subscriptions, and adds in_shadow_build / shadow_table_name columns to pgtrickle.pgt_stream_tables. All DDL is idempotent.
[0.34.0] — Citus: Automated Distributed CDC Scheduler & Shard Recovery
v0.33.0 shipped all the Citus distributed CDC infrastructure — per-worker WAL
slots, pgt_st_locks coordination, poll_worker_slot_changes, and
handle_vp_promoted. v0.34.0 closes the final gap: the scheduler is now fully
aware of distributed sources and drives the per-worker slot lifecycle
automatically, making distributed stream tables completely hands-off.
What's new
- Automated scheduler integration (COORD-10, COORD-11, COORD-12): When a stream table source has source_placement = 'distributed', the scheduler now calls ensure_worker_slot() on the first tick (and after rebalances), calls poll_worker_slot_changes() to drain per-worker WAL changes into the local buffer, and acquires/extends/releases a pgt_st_locks lease around the entire operation.
- Shard rebalance auto-recovery (COORD-13): The scheduler detects pg_dist_node topology changes by comparing active primaries against pgt_worker_slots. When a change is detected, stale slot entries are dropped, new worker slots are inserted, and the stream table is marked for a full refresh — no operator intervention needed.
- Worker failure handling (COORD-14): If poll_worker_slot_changes() fails for a worker, the error is logged and the worker is skipped for that tick. After pg_trickle.citus_worker_retry_ticks consecutive failures, a WARNING is emitted in the PostgreSQL log for operator attention. Refreshes against healthy workers continue uninterrupted.
- New GUC (COORD-15): pg_trickle.citus_worker_retry_ticks (default 5) — consecutive worker-poll failures before flagging in citus_status. Set to 0 to disable the alert.
- Extended citus_status view (COORD-16): The view now includes last_polled_at (timestamp of the last successful poll for each worker slot), lease_holder, lease_acquired_at, lease_expires_at, and lease_health ('unlocked' / 'locked' / 'expired') columns for full operational visibility.
Migration
No application-level changes required. The new scheduler behaviour activates
automatically for stream tables with source_placement = 'distributed'.
Operators using manual LISTEN + handle_vp_promoted() wiring can remove that
code — it is now redundant (though harmless to leave in place).
Run ALTER EXTENSION pg_trickle UPDATE TO '0.34.0' to pick up the new
last_polled_at column and extended citus_status view.
[0.33.0] — Citus: Distributed Source CDC & Stream Tables
This release delivers world-class incremental view maintenance over Citus distributed tables, and aligns with pg_ripple v0.58.0 Citus sharding support. pg_trickle can now track changes on distributed source tables and write results to distributed output tables, while leaving all non-Citus code paths completely unchanged.
pg_ripple Citus Co-location Helper
New: pgtrickle.handle_vp_promoted(payload TEXT) → BOOLEAN
Processes a pg_ripple.vp_promoted NOTIFY payload emitted by pg_ripple
v0.58.0 when a VP table is distributed via Citus. Call this from any
regular backend session that is LISTENing to pg_ripple.vp_promoted:
LISTEN "pg_ripple.vp_promoted";
-- … receive notification …
SELECT pgtrickle.handle_vp_promoted(:'NOTIFY_PAYLOAD');
The function:
- Parses the payload JSON (table, shard_count, shard_table_prefix, predicate_id).
- Logs the promotion details.
- When the promoted table matches an active distributed CDC source in pgt_change_tracking, signals the scheduler to probe worker slots on the next tick without a full catalog scan.
- Returns true if a matching source was found, false otherwise.
docs/integrations/citus.md gains a new pg_ripple Integration section
covering co-location DDL, the vp_promoted notification contract, and
guidance on aligning pgt_st_locks lease expiry with
pg_ripple.merge_fence_timeout_ms.
Distributed stream table output
create_stream_table() gains a new optional parameter
output_distribution_column. When provided, and Citus is installed, the
output storage table is converted to a Citus distributed table on that column
immediately after creation. Existing call sites without the parameter are
unaffected.
-- Co-locate the stream table with the source shards
SELECT pgtrickle.create_stream_table(
name => 'orders_summary',
query => 'SELECT customer_id, count(*) FROM orders GROUP BY 1',
output_distribution_column => 'customer_id'
);
Per-worker WAL slot tracking (pgt_worker_slots)
A new catalog table pgtrickle.pgt_worker_slots records the logical
replication slot name and last-consumed frontier for each Citus worker node
per source table. This enables per-worker CDC polling and accurate lag
monitoring across all nodes in the cluster.
Cross-node refresh coordination (pgt_st_locks)
A new catalog table pgtrickle.pgt_st_locks provides lightweight distributed
mutex semantics using INSERT … ON CONFLICT DO NOTHING. This replaces
advisory locks for distributed stream table refreshes, ensuring that only one
coordinator node applies changes at a time across a multi-coordinator Citus
setup.
Citus observability view (citus_status)
SELECT * FROM pgtrickle.citus_status returns one row per
(stream table, source, worker) combination, showing the coordinator slot,
worker slot name, last consumed LSN, and source placement type. Use this view
to monitor replication lag and detect unreachable workers.
SELECT pgt_name, worker_name, worker_port, worker_slot, worker_frontier
FROM pgtrickle.citus_status;
Correct apply path for distributed stream tables
Citus blocks cross-shard MERGE statements. pg_trickle now automatically
detects distributed output stream tables and switches to a
DELETE + INSERT … ON CONFLICT DO UPDATE apply path, which Citus supports
natively. Single-node and reference-table stream tables continue to use the
existing MERGE path.
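Schematically, the generated apply statements look roughly like this (an illustrative sketch only: the pgt_delta source and the column names are hypothetical, and the real SQL is generated internally):

```sql
-- Instead of one cross-shard MERGE, the distributed path issues:
DELETE FROM orders_summary t
USING pgt_delta d
WHERE t.customer_id = d.customer_id
  AND d.__pgt_action = 'delete';

INSERT INTO orders_summary (customer_id, count)
SELECT d.customer_id, d.count
FROM pgt_delta d
WHERE d.__pgt_action IN ('insert', 'update')
ON CONFLICT (customer_id) DO UPDATE
  SET count = EXCLUDED.count;
```

Both statements route per-shard in Citus because they carry the distribution column (customer_id), which is why this shape works where MERGE does not.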
Pre-flight checks for Citus clusters
Two new pre-flight check functions are available via the Rust API:
- check_citus_version_compat() — verifies that all worker nodes are running the same pg_trickle version as the coordinator. Returns an error listing any mismatched workers.
- check_worker_wal_levels() — verifies that wal_level = logical is configured on all worker nodes. Returns an error if any worker has a lower WAL level, preventing silent slot-creation failures.
Per-worker CDC helpers
The poll_worker_slot_changes() function drains a logical replication slot on
a remote Citus worker via dblink and writes the decoded changes into the
coordinator's local change buffer. The ensure_worker_slot() function creates
the slot if it does not already exist, making the setup idempotent on every
scheduler tick.
Citus integration guide
A new documentation page at docs/integrations/citus.md covers prerequisites,
installation, placement options, the observability view, known failure modes
(unreachable workers, recycled WAL slots, shard rebalancing), and performance
considerations.
Upgrade
Run the standard extension upgrade. The migration script adds the three new
catalog objects (pgt_st_locks, pgt_worker_slots, citus_status) and
replaces the three create_stream_table function signatures with versions that
include the new output_distribution_column parameter. Existing call sites
without the new parameter continue to work without change.
ALTER EXTENSION pg_trickle UPDATE TO '0.33.0';
[0.32.0] — Citus: Stable Naming & Per-Source Frontier Foundation
This release lays the foundation for world-class Citus support by replacing OID-based internal object names with stable hash-derived names and adding Citus cluster detection helpers.
Stable internal object naming
pg_trickle now names every internal object (change buffer tables, trigger functions, WAL replication slots, publication names) using a short 16-character hex string derived from the schema-qualified source table name:
changes_a3f7b2c1d0e5f9a8 -- was: changes_12345
pgt_cdc_fn_a3f7b2c1d0e5f9a8 -- was: pgt_cdc_fn_12345
pgtrickle_a3f7b2c1d0e5f9a8 -- was: pgtrickle_12345
This name is identical on every Citus node, survives pg_dump/restore cycles,
and survives OID reassignment after a major-version upgrade. Existing
installations are upgraded automatically by the migration script — all existing
objects are renamed in a single transaction with no downtime.
The change is invisible to end users: no SQL API changes, no configuration changes, no behaviour changes on single-node PostgreSQL.
Citus cluster detection
A new internal module (src/citus.rs) provides helpers to detect whether Citus
is loaded and how a given source table is distributed (local, reference, or
distributed). This information is stored in the catalog and will drive per-node
CDC and apply strategies in v0.35.0.
New catalog columns
Three catalog tables gain new columns:
- pgtrickle.pgt_stream_tables: st_placement TEXT DEFAULT 'local'
- pgtrickle.pgt_dependencies: source_stable_name TEXT, source_placement TEXT DEFAULT 'local'
- pgtrickle.pgt_change_tracking: source_stable_name TEXT, source_placement TEXT DEFAULT 'local', frontier_per_node JSONB
New SQL function
pgtrickle.source_stable_name(oid) → TEXT — returns the 16-character stable
hash for any source relation by OID. Useful for diagnostics.
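For example, to find the stable hash for a given source table (and therefore the name of its changes_&lt;hash&gt; buffer table under the naming scheme above):

```sql
-- Look up the 16-character stable hash for the orders table.
SELECT pgtrickle.source_stable_name('public.orders'::regclass::oid);
```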
Upgrade notes
The 0.31.0 → 0.32.0 migration script handles all object renames
automatically. Replication slots are renamed if the PostgreSQL version is 15+;
on older versions a manual rename step is logged as a NOTICE. Existing change
buffer data is preserved — only the table and function names change.
[0.31.0] — Performance & Scheduler Intelligence
This release delivers measurable performance improvements for deployments with many stream tables, along with new tools for monitoring scheduler behaviour and reacting to processing backlogs before they become a problem.
Faster immediate-mode updates
Stream tables configured in immediate mode — which update on every data change rather than on a schedule — now handle those changes more efficiently. Previously, every single data change caused PostgreSQL to create and destroy a temporary table in the background, a fixed cost that adds up at high write rates. That overhead has been eliminated.
This improvement is opt-in. Enable it with pg_trickle.ivm_use_enr = true
(requires PostgreSQL 18+).
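A minimal way to enable it cluster-wide, using standard PostgreSQL configuration commands and assuming the setting is reloadable without a restart:

```sql
-- Requires PostgreSQL 18+.
ALTER SYSTEM SET pg_trickle.ivm_use_enr = true;
SELECT pg_reload_conf();
```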
Fewer database round-trips for shared sources
When multiple stream tables all read from the same source table, pg_trickle
now scans their pending changes in a single database pass instead of once per
stream table. If you have ten stream tables all watching the same orders
table, pg_trickle makes one read instead of ten. The benefit scales with the
number of stream tables. This is on by default.
Smarter update-strategy hints
Every refresh, pg_trickle chooses between two strategies for applying changes:
a merge approach (efficient for small change sets) and a delete-then-reinsert
approach (faster when large portions of the data have changed). Enabling
pg_trickle.adaptive_merge_strategy now logs a suggestion after each refresh
indicating whether the current strategy is optimal, based on the ratio of
changes to total rows. This makes performance tuning straightforward — no
restarts or code changes required.
Silent fallbacks are now visible
When pg_trickle encounters a problem analysing certain query types, it falls
back to a slower, more conservative update mode. Previously this was invisible.
The count of such fallbacks is now tracked and surfaced in
pgtrickle.metrics_summary() under ivm_lock_parse_error_count, so you can
spot and address the underlying cause.
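A quick way to check, assuming metrics_summary() can be queried as a set-returning function (its exact return shape is not documented here):

```sql
-- Inspect the metrics summary; look for ivm_lock_parse_error_count
-- in the output to see how often the conservative fallback fired.
SELECT * FROM pgtrickle.metrics_summary();
```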
Back-pressure alerts for overloaded pipelines
If data is arriving faster than pg_trickle can process it, the change buffer
grows. pg_trickle now watches this and, after 3 consecutive cycles above the
alert threshold (configurable via pg_trickle.backpressure_consecutive_limit),
raises a change_buffer_backpressure alert. Applications or monitoring systems
can listen for this event and respond — for example by slowing producers or
adding consumers.
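A sketch of a listener, assuming the alert is delivered as a NOTIFY on the pgtrickle_alert channel (the channel pg_trickle uses for other alerts in this changelog); verify the channel name in your version:

```sql
-- Subscribe in any session; payloads identifying the
-- change_buffer_backpressure alert arrive asynchronously.
LISTEN pgtrickle_alert;
```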
Coming soon: cross-database refresh coordination
A detailed design for a future cross-database refresh coordinator has been
published in docs/research/multi_db_refresh_broker.md. Implementation is
planned for v0.32.0.
What changed
- Error messages are now categorised by standard SQL error code by default,
making them easier to parse in automated monitoring. The previous behaviour
can be restored with
pg_trickle.use_sqlstate_classification = false.
New settings
| Setting | Default | What it does |
|---|---|---|
| pg_trickle.ivm_use_enr | off | Eliminate temporary-table overhead in immediate mode (PostgreSQL 18+ only) |
| pg_trickle.adaptive_batch_coalescing | on | Scan change buffers for shared sources in a single pass |
| pg_trickle.adaptive_merge_strategy | off | Log update-strategy suggestions after each refresh |
| pg_trickle.backpressure_consecutive_limit | 3 | Consecutive over-threshold cycles before raising a back-pressure alert |
Upgrade
Run ALTER EXTENSION pg_trickle UPDATE TO '0.31.0'; — no manual changes
required. The faster immediate-mode path is opt-in; set
pg_trickle.ivm_use_enr = true to enable it.
[0.30.0] — Pre-GA Correctness & Stability Sprint
This release is focused entirely on correctness and stability in preparation for the 1.0 release. There are no new user-facing features — every change is a fix, a safety guard, or a memory efficiency improvement.
Fixed: phantom rows in join-based stream tables
Stream tables that join multiple source tables could silently accumulate stale rows over time when a refresh was interrupted part-way through. Those rows are now cleaned up automatically after every refresh, ensuring the result always converges to the correct answer.
Fixed: incorrect results for complex query patterns
Subqueries nested inside CASE expressions, COALESCE calls, and function
arguments are now correctly detected and handled. Previously, stream tables
using these patterns could produce wrong incremental refresh results.
Safer snapshots
Snapshot creation and restore are now fully atomic. If anything goes wrong mid-operation — a disk error, a timeout, a lost connection — the operation is cleanly rolled back and no partial tables are left behind.
Restoring from a snapshot no longer relies on PostgreSQL's internal column ordering, making restores safe across different PostgreSQL minor versions.
Bounded memory for in-flight update data
The internal cache that stores update data between steps was previously unbounded. On deployments with many stream tables, it could grow large over time. The cache now enforces a configurable maximum and evicts the oldest entries when full, keeping memory usage predictable.
Additionally, cached query templates now expire after a configurable age (default: 7 days). Old plans are automatically removed during background maintenance, preventing stale query plans from accumulating.
Complexity cap for queries
A new pg_trickle.max_parse_nodes setting lets you cap query complexity.
Queries that exceed the limit are rejected immediately with a clear error
instead of consuming unexpected memory.
New settings
| Setting | Default | What it does |
|---|---|---|
| pg_trickle.use_sqlstate_classification | off | Categorise errors by SQL error code (useful for automated retry logic) |
| pg_trickle.template_cache_max_age_hours | 168 (7 days) | Evict cached query plans older than this |
| pg_trickle.max_parse_nodes | 0 (disabled) | Reject queries that exceed this complexity limit |
Upgrade
No schema changes. Upgrade from v0.29.0 with:
ALTER EXTENSION pg_trickle UPDATE TO '0.30.0';
[0.29.0] — Relay CLI (pgtrickle-relay)
This release introduces pgtrickle-relay — a standalone companion tool that
connects pg_trickle to the outside world.
What is pgtrickle-relay?
The relay bridges pg_trickle's inbox and outbox tables with external messaging systems, handling the reliable "last mile" of getting data in and out of your database.
- Forward (outbox → external): Watches your pg_trickle outbox tables and forwards new records to external systems as they arrive. Supported destinations include Kafka, NATS, HTTP webhooks, Redis Streams, AWS SQS, RabbitMQ, and plain text output.
- Reverse (external → inbox): Reads messages from external systems and writes them into your pg_trickle inbox tables, enabling fully bidirectional event-driven pipelines.
Configured entirely through SQL
There are no YAML files or config files to manage. You set up and manage relay pipelines with SQL:
| Function | What it does |
|---|---|
| pgtrickle.set_relay_outbox(...) | Configure an outbox-to-external pipeline |
| pgtrickle.set_relay_inbox(...) | Configure an external-to-inbox pipeline |
| pgtrickle.enable_relay(name) | Start a relay pipeline |
| pgtrickle.disable_relay(name) | Pause a relay pipeline |
| pgtrickle.delete_relay(name) | Remove a relay pipeline |
| pgtrickle.list_relay_configs() | List all configured pipelines |
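A hypothetical setup sequence. The option names passed to set_relay_outbox() below (destination, topic) are illustrative, not the documented signature:

```sql
-- Illustrative only: forward an outbox to Kafka, then start it.
SELECT pgtrickle.set_relay_outbox(
    name        => 'orders_out',
    destination => 'kafka://broker-1:9092',
    topic       => 'orders'
);
SELECT pgtrickle.enable_relay('orders_out');
```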
Built for reliability
- No duplicate messages: Every destination uses a deduplication key to prevent the same message from being delivered more than once, even if the relay restarts mid-send.
- High availability: Multiple relay instances can run simultaneously and coordinate automatically using database-level locks — no external coordination service such as ZooKeeper or Redis is needed.
- Live config updates: Change relay configuration in SQL and it takes effect within seconds, with no restart.
- Built-in monitoring: Health check at /health and Prometheus metrics at /metrics (port 9090 by default).
Upgrade notes
ALTER EXTENSION pg_trickle UPDATE TO '0.29.0';
The relay binary is distributed separately (see Dockerfile.relay). Existing
stream tables, views, and outbox/inbox APIs are unchanged.
[0.28.0] — Transactional Inbox & Outbox Patterns
This release adds two complementary patterns for reliably integrating pg_trickle with external systems.
The problem these patterns solve
When you update a database and need to notify an external system — a message queue, an API, a downstream service — you face a reliability challenge: what happens if the database update succeeds but the notification fails? You can end up with data in your database that the external system never heard about, or a notification sent for a change that was rolled back.
The outbox pattern solves this: the notification is written in the same database transaction as the data change, so they either both succeed or both fail. pg_trickle then delivers the notification reliably once the transaction has committed.
The inbox pattern is the reverse: external messages arrive into a managed queue inside PostgreSQL, where they can be processed reliably, retried on failure, and replayed if needed.
Outbox
Enable the outbox on any stream table with pgtrickle.enable_outbox(). After
each refresh, pg_trickle writes a record to a dedicated outbox table. Your
application or the relay tool picks it up from there and forwards it to
external consumers.
Consumers can work in named consumer groups — similar to Kafka consumer groups. Each consumer tracks its own position in the stream independently and can be replayed, paused, or have its lease extended without affecting others.
| Function | What it does |
|---|---|
| pgtrickle.enable_outbox(name, retention_hours) | Start capturing refresh output for external delivery |
| pgtrickle.disable_outbox(name) | Stop capturing |
| pgtrickle.outbox_status(name) | See the current outbox state |
| pgtrickle.outbox_rows_consumed(stream_table, outbox_id) | Acknowledge that records have been delivered |
| pgtrickle.create_consumer_group(name, outbox, ...) | Create a named group of consumers |
| pgtrickle.drop_consumer_group(name) | Remove a consumer group |
| pgtrickle.poll_outbox(group, consumer, batch_size, ...) | Claim the next batch of records |
| pgtrickle.commit_offset(group, consumer, last_offset) | Acknowledge processed records |
| pgtrickle.extend_lease(group, consumer, ...) | Hold onto a batch longer before it times out |
| pgtrickle.seek_offset(group, consumer, new_offset) | Jump to a specific position (for replay) |
| pgtrickle.consumer_heartbeat(group, consumer) | Signal that a consumer is still alive |
| pgtrickle.consumer_lag(group) | See how far behind each consumer is |
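A sketch of one consumer cycle using these functions; in practice the offset passed to commit_offset() would come from the claimed batch, and the value shown here is illustrative:

```sql
-- One-time setup: a consumer group over the active_orders outbox.
SELECT pgtrickle.create_consumer_group('billing', 'active_orders');

-- Worker loop: claim up to 100 records, process them externally,
-- then acknowledge the last offset seen in the batch.
SELECT * FROM pgtrickle.poll_outbox('billing', 'worker-1', 100);
SELECT pgtrickle.commit_offset('billing', 'worker-1', 1234);
```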
Inbox
Create a named inbox with pgtrickle.create_inbox(). pg_trickle automatically
sets up a pending queue, a dead-letter queue (for messages that could not be
processed), and a stats table.
| Function | What it does |
|---|---|
| pgtrickle.create_inbox(name, ...) | Create a managed inbox with pending queue and dead-letter queue |
| pgtrickle.drop_inbox(name, ...) | Remove an inbox |
| pgtrickle.enable_inbox_tracking(name, ...) | Attach inbox tracking to an existing table |
| pgtrickle.inbox_health(name) | Get a health summary for an inbox |
| pgtrickle.inbox_status(name) | Show queue depths and processing stats |
| pgtrickle.replay_inbox_messages(name, event_ids) | Reset specific messages for re-processing |
Additional inbox capabilities:
- Ordered processing: pgtrickle.enable_inbox_ordering() ensures messages for the same entity (e.g. the same customer or order ID) are processed in sequence, eliminating race conditions without any extra coordination in your application.
- Priority tiers: pgtrickle.enable_inbox_priority() marks messages as high or low priority so the scheduler processes urgent messages first.
- Horizontal scaling: pgtrickle.inbox_is_my_partition() provides consistent hash-based partition assignment for multi-worker inbox processing. Multiple workers can safely share an inbox without an external coordinator.
- Gap detection: pgtrickle.inbox_ordering_gaps() surfaces any sequence gaps per entity so you can detect and recover from missing messages.
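A minimal end-to-end sketch. Optional parameters are elided, and the event-ID array type is an assumption:

```sql
-- Create an inbox, check its health, and replay two failed messages
-- (the IDs are illustrative).
SELECT pgtrickle.create_inbox('payments');
SELECT * FROM pgtrickle.inbox_health('payments');
SELECT pgtrickle.replay_inbox_messages('payments', ARRAY[101, 102]);
```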
New settings
| Setting | Default | What it does |
|---|---|---|
| pg_trickle.outbox_enabled | on | Enable the outbox subsystem |
| pg_trickle.outbox_retention_hours | 24 | How long to keep delivered outbox records |
| pg_trickle.outbox_drain_batch_size | 1000 | Records to process per drain pass |
| pg_trickle.outbox_skip_empty_delta | on | Skip writing an outbox record when there are no changes |
| pg_trickle.consumer_dead_threshold_hours | 24 | Hours before a silent consumer is considered dead |
| pg_trickle.inbox_enabled | on | Enable the inbox subsystem |
| pg_trickle.inbox_processed_retention_hours | 72 | How long to keep processed inbox records |
| pg_trickle.inbox_dlq_alert_max_per_refresh | 10 | Alert when this many messages land in the dead-letter queue in one cycle |
Upgrade
ALTER EXTENSION pg_trickle UPDATE TO '0.28.0';
[0.27.0] — Operability, Observability & DR
This release focuses on three areas: disaster recovery tooling, better visibility into multi-database deployments, and a more reliable built-in metrics server.
Snapshot and restore
You can now export a stream table's current data to an archive table and restore it later. This is useful for bootstrapping a new read replica without a full database dump, taking a point-in-time snapshot before a risky migration, or recovering a stream table to a known-good state.
| Function | What it does |
|---|---|
| pgtrickle.snapshot_stream_table(name, target) | Export a stream table to an archive table |
| pgtrickle.restore_from_snapshot(name, source) | Restore from an archive table |
| pgtrickle.list_snapshots(name) | List available snapshots with size and age |
| pgtrickle.drop_snapshot(snapshot_table) | Delete a snapshot |
Restore aligns the stream table's internal progress marker with the snapshot, so incremental refresh resumes correctly without any manual steps.
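A typical pre-migration workflow, using the functions above (the archive table name is illustrative):

```sql
-- Snapshot before the risky change...
SELECT pgtrickle.snapshot_stream_table('active_orders', 'active_orders_backup');

-- ...and, if the migration goes wrong, roll the stream table back.
SELECT pgtrickle.restore_from_snapshot('active_orders', 'active_orders_backup');
```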
Predictive schedule recommendations
pg_trickle now analyses its own refresh history and recommends optimal refresh intervals for each stream table.
- pgtrickle.recommend_schedule(name) returns a suggested interval and a confidence score. Confidence is low on new deployments and rises as history accumulates (at least 20 samples are needed before the score is meaningful).
- pgtrickle.schedule_recommendations() returns recommendations for all stream tables in one call.
- A predicted_sla_breach alert fires when the model predicts the next refresh is likely to miss your freshness target by more than 20%. The alert fires at most once every 5 minutes by default, to avoid flooding.
Cluster-wide worker visibility
In deployments that run pg_trickle across multiple databases,
pgtrickle.cluster_worker_summary() shows which databases are consuming
background workers. This makes it easy to diagnose situations where one
database is crowding out others.
All Prometheus metrics now include database-level labels, so you can split a single Grafana panel by database.
Metrics server improvements
- A new pgtrickle.metrics_summary() SQL function returns cluster-wide refresh and error counts — useful for monitoring without a Prometheus scraper.
- Port conflicts now produce a clear error message instead of failing silently.
- Malformed HTTP requests now return a proper 400 Bad Request response.
New settings
| Setting | Default | What it does |
|---|---|---|
| pg_trickle.schedule_recommendation_min_samples | 20 | Minimum history samples before schedule confidence is meaningful |
| pg_trickle.schedule_alert_cooldown_seconds | 300 | Minimum seconds between consecutive predicted_sla_breach alerts |
| pg_trickle.metrics_request_timeout_ms | 5000 | Maximum time the metrics server waits for a request (ms) |
Upgrade
This release upgrades the internal pgrx library to 0.18.0. This is transparent to users. Run:
ALTER EXTENSION pg_trickle UPDATE TO '0.27.0';
[0.26.0] — Test & Concurrency Hardening
This release is all about making pg_trickle more reliable and battle-tested. There are no new SQL commands or user-facing features — every change is internal: more tests, safer concurrent operations, cleaner code structure, and better error messages.
Safer under concurrent load
Running multiple operations at the same time — such as modifying a stream table while it's actively refreshing, or dropping a table while its workers are still running — is now explicitly tested and guaranteed to be safe. These scenarios were handled before, but lacked the tests to prove it. That proof is now part of every build.
- Simultaneous alter + refresh no longer risks a deadlock. The catalog stays consistent throughout.
- Drop during refresh aborts cleanly — no orphaned change buffers or dangling catalog rows left behind.
- Parallel scheduler workers are prevented from picking the same stream table for refresh at the same time — a hard guarantee, not just a convention.
- Simultaneous buffer promotion — when two workers race to promote a change buffer, exactly one succeeds and the metadata stays consistent.
More stable SLA-based scheduling
The scheduler uses a predictive model to decide when to refresh stream tables, balancing your freshness targets against system load. That model now holds its ground under difficult workloads.
- Bursty, sawtooth, and spike workloads are all validated in a new dedicated test suite.
- No more tier flapping — the priority tier of a stream table (which controls how aggressively it is refreshed) now requires 3 consecutive breaches before downgrading and 3 consecutive successes before upgrading. This prevents the system from oscillating at the boundary, which caused unnecessary refresh churn in earlier releases.
- A 10,000-iteration randomised stress test confirms the tier stays stable even under adversarial latency patterns.
Fuzz testing and extreme-scale validation
The extension is now tested against malformed, random, and adversarial inputs in three new fuzz test areas, preventing certain classes of unexpected input from crashing the extension:
- Invalid cron schedule expressions
- Unrecognised or malformed configuration values
- Unexpected row shapes in change-capture triggers
Two new scale tests verify behaviour at extremes:
- A source table with 1,000 partitions installs change-capture triggers and completes its first refresh within 60 seconds.
- A flooded worker pool does not starve high-priority stream tables in a second database — multi-database fairness is enforced under load.
Cleaner internals: refresh module reorganised
The refresh orchestrator had grown into a single very large file. It has been split into three focused modules with no behaviour change:
| Module | What it handles |
|---|---|
| orchestrator | Deciding when and how to refresh — timing, cost model, recovery |
| codegen | Building the SQL queries and managing the query cache |
| merge | Executing the actual refresh — incremental, full, or TopK |
Better error messages
Error messages throughout the extension now include more context — table names, operation types, and hints such as "check system clock" on timestamp failures. This makes it easier to diagnose problems from logs alone.
A new crash-recovery test verifies that a publication subscriber that was active when the database was killed catches up with zero data loss after restart.
[0.25.0] — Scheduler Scalability & Pooler Performance
pg_trickle now comfortably manages thousands of stream tables on commodity hardware — a significant jump from the practical ceiling of a few hundred in earlier releases. The scheduler avoids reloading the full catalog on every tick, change detection is batched into far fewer database round-trips, and a new cache-sharing mechanism means connecting backends can skip expensive query re-parsing entirely. If you use a connection pooler such as PgBouncer, RDS Proxy, or Supabase Pooler, this release delivers the largest latency improvement to date.
Scales to thousands of stream tables
Previously, the scheduler queried the catalog on every tick — a process that grew slower as the stream table count increased. Metadata is now cached per backend and only reloaded when the dependency graph actually changes. Checking whether source tables have new rows is batched across an entire refresh group into a single query, down from one query per source per tick. Dependency-graph rebuilds now happen in the background without blocking ongoing refreshes, so you never get a stall when a stream table is created or dropped.
New GUC: pg_trickle.worker_pool_size (default 0 = spawn-per-task).
Set this to a positive number to keep that many background workers running
permanently, eliminating roughly 2 ms of spawn overhead per worker on
high-throughput deployments.
Faster connections through poolers
A new shared-memory signal lets each connecting backend check whether the query-template cache is already warm. If it is, the backend skips query parsing entirely and jumps straight to the cached result. This matters most in pooled environments — PgBouncer, RDS Proxy, Supabase — where backends connect and disconnect frequently and re-parsing on every connection was a hidden cost.
The per-backend template cache is now bounded by
pg_trickle.template_cache_max_entries (default 0 = unbounded). When
the limit is reached, the least-recently-used entry is evicted automatically,
keeping memory usage predictable on servers with many concurrent backends.
A new SQL function, pgtrickle.clear_caches(), flushes all cache levels
in one call — useful after schema changes or when debugging unexpected
behaviour.
Lower overhead on high-write workloads
Change fingerprinting — the hashing that identifies which rows changed — now streams values directly into the hash function instead of building a temporary string per row, eliminating one heap allocation per incoming change. SQL buffers in the query-projection step are pre-sized rather than repeatedly concatenated. Refresh timing data (how long full and incremental refreshes take) is stored in shared memory so parallel workers can read it without a catalog round-trip.
More conservative refresh-mode predictions
The predictive model that decides when to fall back from incremental to full refresh is now more stable. It waits for at least 60 seconds of history before making any prediction — preventing erratic switches on fresh deployments — removes statistical outliers before fitting, and keeps its output within a reasonable band around recent observed timings.
Subscriber lag tracking for downstream publications
If you use stream_table_to_publication() to feed a downstream system,
pg_trickle now monitors how far behind each subscriber's replication slot has
fallen. When a subscriber exceeds pg_trickle.publication_lag_warn_bytes,
a warning is logged and change-buffer cleanup is paused for that slot until it
catches up — preventing data loss for slow consumers.
A new SQL function, pgtrickle.worker_allocation_status(), returns
per-database worker usage, quotas, and queue depth across the cluster. Useful
for diagnosing scheduler starvation in multi-tenant deployments.
Upgrade notes
- Row ID change: The internal hash function changed from xxh64 to xxh3. If your application relies on stable pg_trickle row ID values across versions, run SELECT pgtrickle.reinitialize('<schema>.<table>') on each affected stream table after upgrading.
- No schema changes beyond two new SQL functions (clear_caches and worker_allocation_status). No data migration required.
[0.24.0] — Join Correctness & Durability Hardening
This release focuses on two themes: correctness — ensuring stream tables that join multiple source tables always give you the right answer — and durability — ensuring your data is never lost or skipped, even when the server crashes or long-running transactions are in flight.
More accurate results from multi-table joins
When a stream table combines rows from two or more source tables, pg_trickle now guarantees that an incremental refresh produces exactly the same result as a full recompute from scratch. A subtle bug in how rows were tracked across refresh cycles could previously cause phantom rows to accumulate silently over time. Those phantom rows are now detected automatically after every incremental refresh and cleaned up.
No data loss across crashes or restarts
pg_trickle now records its progress in a crash-safe sequence: it saves its intent before writing data, then marks completion afterwards. If the server goes down between those two steps, pg_trickle reconciles its position on restart — no changes are processed twice and none are silently dropped. The scheduler also persists its last known-safe position across restarts, closing a narrow gap that existed in earlier versions.
Long-running transactions no longer cause missed changes
If a database transaction stays open while pg_trickle is running a refresh, the changes it is writing could previously be overlooked — they were captured before the refresh started but not yet visible to it. pg_trickle now checks for open transactions before advancing its read position and waits for them to commit first.
- pg_trickle.frontier_holdback_mode — controls the holdback behaviour.
- pg_trickle.frontier_holdback_warn_seconds (default 60) — logs a warning when a transaction has been blocking progress longer than this threshold.
Works correctly on managed cloud databases
AWS RDS, Cloud SQL, and Azure Database for PostgreSQL restrict access to certain monitoring views. pg_trickle now detects this automatically and tells you exactly what to do:
GRANT pg_monitor TO <your_pg_trickle_role>;
Without this grant, pg_trickle previously behaved as if no transactions were
open — the same unsafe condition the holdback feature was built to prevent.
See docs/TROUBLESHOOTING.md section 14 for full diagnosis steps.
Choose your durability level
The new pg_trickle.change_buffer_durability setting controls how
carefully incoming changes are stored before processing:
- unlogged (default) — fastest; change buffers do not survive a server crash.
- logged — survives crashes and replicates to standby servers.
- sync — maximum safety; every write is confirmed to disk before continuing.
Automatic history clean-up
Old refresh history rows are now pruned automatically in small background batches during idle time. Previously the history table grew without bound, which could become noticeable on busy deployments.
Alerts for frozen stream tables
The new pgtrickle.df_frozen_stream_tables view flags any stream table
that has not refreshed within 5× its expected interval, and sends a
notification on the pgtrickle_alert channel. Useful for catching a stuck or
disabled stream table before users notice stale data.
New monitoring metrics
Two new Prometheus metrics expose holdback state:
- pg_trickle_frontier_holdback_lsn_bytes — how far behind the read position is being held, in bytes of WAL.
- pg_trickle_frontier_holdback_seconds — how long the oldest blocking transaction has been running.
Note: All metrics now use the pg_trickle_ prefix consistently. If your dashboards or alerting rules use the old pgtrickle_ prefix, update them before upgrading.
[0.23.0] — Performance Tuning & Diagnostics
This release gives you better tools to understand and control how pg_trickle performs, with new settings for memory tuning and new functions for inspecting what the extension is doing under the hood.
See exactly what SQL is running
Turn on pg_trickle.log_delta_sql and pg_trickle will log the SQL it
generates for each incremental refresh. You can paste that SQL directly
into EXPLAIN ANALYZE to understand why a particular refresh is taking
longer than expected — no code changes required.
Tune memory for refreshes without restarting
pg_trickle.delta_work_mem lets you give incremental refresh queries more
(or less) working memory without touching PostgreSQL's global settings or
restarting the server. Apply it instantly with:
ALTER SYSTEM SET pg_trickle.delta_work_mem = 256;
Automatic statistics before each refresh
pg_trickle now runs a quick statistics pass on change buffers before
executing an incremental refresh. This gives PostgreSQL's query planner
accurate row counts and generally produces faster, more predictable query
plans with no manual intervention. Controlled by pg_trickle.analyze_before_delta
(on by default).
Warning when incremental is unexpectedly slower than full
If an incremental refresh takes longer than the last full refresh, pg_trickle now logs a warning that includes both timings. This surfaces scenarios where incremental refresh has become counterproductive so you can investigate and adjust thresholds.
Alert when too many changes pile up
Set pg_trickle.max_change_buffer_alert_rows to a row count and pg_trickle
will warn you whenever any source table's pending change buffer exceeds
that threshold. This is useful for catching unexpected write bursts before
they slow down your refreshes.
Refresh timing statistics at a glance
The new pgtrickle.pgtrickle_refresh_stats() function returns per-stream-table
refresh durations — average, 95th percentile, and 99th percentile — in a
single query. No need to manually aggregate the history table.
Inspect generated SQL without running it
Call pgtrickle.explain_diff_sql(name) on any stream table to see the SQL
pg_trickle would use for an incremental refresh — without actually executing
it. Useful for understanding query structure and diagnosing performance issues.
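For example:

```sql
-- Print the delta SQL pg_trickle would generate for the next
-- incremental refresh of active_orders, without executing it.
SELECT pgtrickle.explain_diff_sql('active_orders');
```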
[0.22.0] — Downstream CDC, Parallel Refresh & Predictive Cost Model
This release makes it easier to feed stream table changes to other systems, gives you a knob to control how many refreshes run at once, and adds automatic intelligence for choosing between incremental and full refresh.
Stream table changes can flow to other systems
stream_table_to_publication(name) creates a PostgreSQL logical replication
publication for a stream table. Any downstream tool that understands
PostgreSQL replication — Debezium, Kafka Connect, a read replica, or a
custom consumer — can then subscribe and receive changes as they happen.
Publications are removed automatically when the stream table is dropped.
Use drop_stream_table_publication(name) to remove one manually.
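A sketch, assuming both functions live in the pgtrickle schema like the rest of the API:

```sql
-- Expose active_orders changes to logical-replication consumers
SELECT pgtrickle.stream_table_to_publication('active_orders');

-- Remove the publication manually if needed
SELECT pgtrickle.drop_stream_table_publication('active_orders');
```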
Control how many tables refresh at once
pg_trickle.max_parallel_workers caps the number of stream tables that
can refresh simultaneously. The scheduler already runs independent refreshes
in parallel; this setting gives you an explicit limit if you want to reserve
database resources for your application.
Automatic mode switching based on predicted cost
pg_trickle now learns from your refresh history. Before each incremental refresh it predicts how long it will take based on recent timings. If that prediction exceeds 1.5× the cost of a full refresh, it switches to full refresh for that cycle automatically — no manual intervention needed. The lookback window, threshold, and minimum sample count are all configurable:
- `pg_trickle.prediction_window` — how many recent refreshes to consider (default 60).
- `pg_trickle.prediction_ratio` — how much more expensive incremental must be before switching to full (default 1.5).
- `pg_trickle.prediction_min_samples` — minimum history before the model activates (default 5).
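For instance, a more conservative configuration might look like this (values illustrative):

```sql
-- Look back over the last 100 refreshes, require 10 samples, and only
-- switch to full refresh when incremental is predicted to be 2x slower
ALTER SYSTEM SET pg_trickle.prediction_window = 100;
ALTER SYSTEM SET pg_trickle.prediction_min_samples = 10;
ALTER SYSTEM SET pg_trickle.prediction_ratio = 2.0;
SELECT pg_reload_conf();
```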
Set a freshness target and let pg_trickle handle the rest
Call set_stream_table_sla(name, interval) with your target maximum data
age — for example '5 seconds' or '1 minute' — and pg_trickle assigns
the most appropriate refresh tier automatically. It re-evaluates the
assignment over time as real-world refresh performance changes.
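A sketch, assuming the function lives in the pgtrickle schema:

```sql
-- Keep active_orders at most 5 seconds stale; pg_trickle picks the
-- appropriate refresh tier and re-evaluates it over time
SELECT pgtrickle.set_stream_table_sla('active_orders', interval '5 seconds');
```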
[0.21.0] — Reliability, Safety & Operational Tools
This release focuses on making pg_trickle safer and easier to operate day-to-day. It eliminates hidden crash risks in the query analysis engine, adds new operational commands for maintenance windows, and introduces a built-in monitoring endpoint so you don't need extra software to observe pg_trickle.
The extension can no longer crash your database
When pg_trickle analyses a query internally, it previously had hidden error paths that could — in rare edge cases — abort a PostgreSQL backend process. All of those paths now return a structured error instead of crashing. Additionally, a compile-time rule now prevents production code from ever calling the Rust equivalent of an unchecked assertion, so this class of bug cannot be reintroduced silently.
Warning for queries that shouldn't use incremental refresh
If you create a stream table with a query that calls time-sensitive or
non-deterministic functions such as now(), random(), or
gen_random_uuid(), pg_trickle now warns you at creation time. Those
functions produce a different result every time they run, which means
incremental refresh would produce wrong answers — the warning lets you
catch this before it becomes a data problem.
Pause and resume everything at once
Two new functions let you halt and restart all active stream tables with a single SQL call:
SELECT pgtrickle.pause_all(); -- stop all refreshes (e.g. before maintenance)
SELECT pgtrickle.resume_all(); -- restart them when you're done
Refresh only when the data is actually stale
pgtrickle.refresh_if_stale(name, max_age) triggers a refresh only if the
stream table is older than your specified threshold. Returns TRUE when a
refresh ran, FALSE when the data was already fresh enough. Useful for
scripts and scheduled jobs that shouldn't over-refresh.
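For example, in a scheduled job (pgtrickle schema qualification assumed):

```sql
-- Refresh only if the data is more than 1 minute old;
-- returns TRUE if a refresh ran, FALSE otherwise
SELECT pgtrickle.refresh_if_stale('active_orders', interval '1 minute');
```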
Export a stream table's definition
pgtrickle.stream_table_definition(name) returns the complete
CREATE STREAM TABLE statement for any stream table. Handy for
documentation, disaster recovery playbooks, and migrations.
Test query changes safely before going live
A three-step canary workflow lets you try a new query on a shadow copy of your stream table and compare the results before committing to the change:
- `canary_begin(name, new_query)` — creates a shadow stream table running the new query in parallel with the original.
- `canary_diff(name)` — shows exactly which rows differ between the old and new queries.
- `canary_promote(name)` — atomically switches the live stream table to the new query once you are satisfied with the results.
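Put together, a canary run might look like this (the replacement query and schema qualification are illustrative):

```sql
-- 1. Start a shadow stream table running the candidate query
SELECT pgtrickle.canary_begin('active_orders',
  'SELECT * FROM orders WHERE status IN (''active'', ''pending'')');

-- 2. Compare old vs new results row by row
SELECT * FROM pgtrickle.canary_diff('active_orders');

-- 3. Switch over atomically once the diff looks right
SELECT pgtrickle.canary_promote('active_orders');
```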
Built-in monitoring endpoint
Set pg_trickle.metrics_port = 9188 and pg_trickle serves a Prometheus-
compatible metrics endpoint directly — no extra exporter software needed.
Metrics include total refreshes, failures, rows changed per refresh, and
the number of active stream tables.
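A sketch of enabling it:

```sql
-- Serve Prometheus-compatible metrics on port 9188
ALTER SYSTEM SET pg_trickle.metrics_port = 9188;
SELECT pg_reload_conf();
```

Once enabled, any Prometheus-compatible scraper can poll the endpoint (for example `http://localhost:9188/metrics`, assuming the conventional path).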
Visibility into recursive query fallbacks
When a query containing a recursive clause cannot be refreshed incrementally and falls back to a full refresh, pg_trickle now logs a notice and records the reason in refresh history. Previously this happened silently.
Upgrade
ALTER EXTENSION pg_trickle UPDATE TO '0.21.0';
[0.20.0] — Self Monitoring
pg_trickle now monitors itself. Instead of you having to check on
pg_trickle's health manually, this release lets pg_trickle watch its own
performance, spot problems early, and even fix some of them on its own. Five
new stream tables sit in the pgtrickle schema and continuously analyse
refresh history — the same technology you use for your own data, pointed
inward. One SQL call sets everything up; one call tears it down.
We call this self monitoring — pg_trickle uses its own stream-table technology to keep an eye on itself, just like it keeps your data views up to date.
What's new
- **One-click self-monitoring** — run `SELECT pgtrickle.setup_self_monitoring()` and pg_trickle creates five monitoring stream tables that continuously track how well it is performing. Run `teardown_self_monitoring()` to remove them. Both are idempotent — safe to call as many times as you like, even during rolling upgrades.
- **Health at a glance** — the new `self_monitoring_status()` function shows the status of all five monitoring views in one query: whether each one exists, its refresh mode, and the last time it refreshed. Quick to run from a monitoring script or dashboard.
- **Threshold recommendations** — after enough refresh cycles accumulate (typically 10–20 minutes of activity), `df_threshold_advice` starts producing suggestions for each stream table. Each recommendation includes a confidence level (HIGH / MEDIUM / LOW) and a reason — for example, "DIFF is 73% faster — raise threshold to allow more DIFF". An `sla_headroom_pct` column shows exactly how much faster incremental refresh is versus full refresh for that table.
- **Automatic tuning** — set `pg_trickle.self_monitoring_auto_apply = 'threshold_only'` and pg_trickle will apply HIGH-confidence threshold recommendations automatically. Changes are rate-limited to once per 10 minutes per stream table, and every adjustment is logged to `pgt_refresh_history` with `initiated_by = 'SELF_MONITOR'` so you have a full audit trail.
- **Real-time alerts** — when pg_trickle detects an anomaly (duration spike exceeding 3× the baseline, or two or more recent failures), it sends a `NOTIFY` on the `pgtrickle_alert` channel with a JSON payload. Your application, Alertmanager webhook, or `LISTEN` client can act immediately without polling.
- **Scheduling interference detection** — `df_scheduling_interference` tracks pairs of stream tables that consistently overlap during refresh. When overlap is heavy, the scheduler automatically backs off its poll interval (up to 2× the configured base) to reduce contention.
- **Visual dependency graph** — the new `explain_dag()` function renders your full refresh pipeline as a Mermaid or Graphviz DOT diagram. User stream tables appear in blue, self-monitoring tables in green, suspended tables in red. Paste the output into any Mermaid renderer or `dot` to see exactly how your tables depend on each other.
- **Scheduler overhead report** — `scheduler_overhead()` returns metrics for the last hour: total refreshes, how many were self-monitoring, the fraction they represent, and average durations. Useful for confirming that self-monitoring adds negligible cost.
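The one-call setup and teardown described above, as SQL (assuming the functions live in the pgtrickle schema and that `self_monitoring_status()` is set-returning):

```sql
-- Create the five monitoring stream tables (idempotent)
SELECT pgtrickle.setup_self_monitoring();

-- Check existence, refresh mode, and last refresh time for each
SELECT * FROM pgtrickle.self_monitoring_status();

-- Remove them again (also idempotent)
SELECT pgtrickle.teardown_self_monitoring();
```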
What pg_trickle watches
| Monitoring view | What it tracks |
|---|---|
| `df_efficiency_rolling` | Rolling-window refresh speed, change ratio, DIFF vs FULL counts |
| `df_anomaly_signals` | Duration spikes (> 3× baseline), error bursts, mode oscillation |
| `df_threshold_advice` | Per-table threshold recommendations with confidence level and reasoning |
| `df_cdc_buffer_trends` | Change-capture buffer growth rate per source table; alerts on burst spikes |
| `df_scheduling_interference` | Refresh overlap patterns; pairs with 3+ concurrent refreshes in the last hour |
Faster and more reliable
- A new index on `pgt_refresh_history(pgt_id, start_time)` speeds up all self-monitoring queries and general history lookups. Applied automatically during the 0.19.0 → 0.20.0 upgrade.
- Old history records are now pruned in batches of 1,000 rows per transaction (previously one large DELETE), which avoids long lock holds on `pgt_refresh_history` during the nightly cleanup.
- `check_cdc_health()` is enriched with spill-risk alerts: if a source table's max burst delta exceeds 10× its average, you get an early warning before the buffer fills.
- `explain_st()` now shows two new properties: `self_monitoring_coverage` (none / partial / full) and `recommended_refresh_mode`, so diagnostics automatically surface self-monitoring data when it is available.
New documentation and tooling
- SQL Reference — a new "Self Monitoring" section covers all five stream tables, `setup_self_monitoring()`, `teardown_self_monitoring()`, confidence levels, and the `sla_headroom_pct` column.
- Getting Started — a new "Day 2 Operations" section walks through enabling self-monitoring, reading recommendations, enabling auto-apply, and visualising the DAG.
- Configuration — `pg_trickle.self_monitoring_auto_apply` is fully documented with values, rate-limiting behaviour, and the audit trail.
- A ready-made Grafana dashboard (`pg_trickle_self_monitoring.json`) with five panels covers refresh throughput, anomaly heatmap, threshold calibration, CDC buffer growth, and the scheduling interference matrix.
- A dbt macro (`pgtrickle_enable_monitoring`) enables monitoring as a post-hook with one line in `dbt_project.yml`.
- A quick-start SQL script at `sql/self_monitoring_setup.sql` walks through setup, auto-apply, alert listening, and status verification in six steps.
[0.19.0] — Security, Scheduler Performance & Operator Convenience
Safer, faster, easier to operate. This release closes several security and correctness gaps, adds new conveniences for operators and developers, and significantly improves performance for deployments with many stream tables. The background scheduler finds the next table to refresh 10–15× faster. Four breaking changes are included — all easy to adapt to, each one correcting behaviour that was a source of subtle bugs in production.
Breaking changes
- **Only owners can modify their own stream tables** — other database users can no longer drop or alter a stream table they did not create. If shared access is intentional, grant superuser or explicitly add the user as owner. Superusers are unaffected.
- **Dropping a stream table no longer cascades** — `drop_stream_table()` now behaves like PostgreSQL's own `DROP TABLE`: it refuses to drop if dependent objects exist, unless you pass `cascade => true` explicitly. Previously it silently removed all dependents, which surprised operators after restructuring.
- **The refresh notification channel was renamed** — change `LISTEN pgtrickle_refresh` to `LISTEN pg_trickle_refresh` (note the added underscore). The old name was inconsistent with every other channel in the extension.
- **The `delete_insert` refresh strategy was removed** — this strategy could produce wrong results for queries containing aggregates or `DISTINCT`. If you had it configured, pg_trickle logs a warning and automatically switches to the safe `auto` strategy. No data is lost; the next refresh corrects any affected rows.
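A migration sketch covering the renamed channel and the new cascade behaviour (names are taken from the notes above; the table name and pgtrickle schema qualification are illustrative):

```sql
-- Update listeners to the renamed channel
LISTEN pg_trickle_refresh;  -- was: LISTEN pgtrickle_refresh

-- Dropping no longer cascades; opt in explicitly when dependents exist
SELECT pgtrickle.drop_stream_table('my_table', cascade => true);
```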
New features
- **Installation health check** — `version_check()` returns the installed extension version, the loaded library version, and the PostgreSQL server version in one row. If the extension was upgraded but the server has not been restarted, you get an explicit warning. Useful in deploy scripts and smoke tests.
- **Write and refresh in one step** — `write_and_refresh(sql, st_name)` executes an arbitrary SQL statement and immediately refreshes the named stream table in the same transaction. Downstream readers see consistent results as soon as the transaction commits — no polling loop needed.
- **Better connection-pooler support** — the new `pg_trickle.connection_pooler_mode` GUC configures pg_trickle for PgBouncer, pgcat, or Supavisor at the cluster level. Previously each stream table had to be configured individually, which was error-prone on large deployments.
- **Automatic refresh history cleanup** — `pgt_refresh_history` is now trimmed automatically after 90 days (configurable with `pg_trickle.history_retention_days`; set to `0` to disable). Without this, the history table could grow by thousands of rows per day on busy deployments.
- **Schema migration tracking** — pg_trickle now records which upgrade scripts have been applied in `pgtrickle.pgt_schema_version`. This makes it straightforward to verify that a deployment is fully up to date and simplifies the rollback story.
- **Clearer skip messages** — when a refresh is skipped because another refresh of the same stream table is already running, you now see a `NOTICE: skipping refresh of <name> — already running` message instead of silence. Reduces confusion when debugging slow or stuck schedulers.
- **Deeper diagnostics** — `explain_st()` gains a `with_analyze` parameter. When set to `true`, it runs `EXPLAIN (ANALYZE, BUFFERS)` on the refresh query and returns actual row counts, timing, and buffer hit/miss ratios — the same information PostgreSQL's query planner provides for any query, but surfaced inside the stream-table diagnostic tool.
- **New deployment guides** — step-by-step documentation for PgBouncer, pgcat, Supavisor, CNPG, and Kubernetes deployments, plus an operational runbook for common Kubernetes failure modes.
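Two of the features above in a quick sketch (pgtrickle schema qualification assumed; the INSERT statement is illustrative):

```sql
-- Verify that extension, library, and server versions agree
SELECT * FROM pgtrickle.version_check();

-- Write and refresh in one transaction so readers never see a stale view
SELECT pgtrickle.write_and_refresh(
  'INSERT INTO orders (id, status) VALUES (43, ''active'')',
  'active_orders'
);
```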
Bug fixes
- Fixed a constraint-validation inconsistency in databases upgraded from 0.11.0 or earlier where `pgt_refresh_history` had a duplicate check entry in the catalog. Affected databases could see spurious constraint errors on busy write paths.
- Error messages throughout the extension now show human-readable table names (e.g. `public.orders`) instead of raw PostgreSQL OIDs. This affects "source table was dropped", "schema changed", and several other error paths that were previously unreadable without a catalog lookup.
Performance
- **10–15× faster scheduler dispatch** — the scheduler now finds the next stream table to process with a direct lookup instead of scanning the full list on every poll cycle. On a deployment with 500 stream tables this drops from ~650 µs to ~45 µs per poll, reducing background CPU overhead significantly at scale.
- **Single-query change detection** — when the scheduler checks whether any source tables have changed, it now issues one query covering all sources at once instead of one query per source table. On deployments with 50+ source tables this meaningfully reduces the overhead of each scheduler cycle, especially under PgBouncer transaction pooling.
[0.18.0] — Hardening & Delta Performance
This release focuses on correctness,
reliability, and giving operators better visibility into what pg_trickle is
doing. Stream tables that group by columns containing NULL values now refresh
correctly in all cases. A new memory safety net prevents runaway refreshes
from consuming too much RAM. Error messages across the board now explain what
went wrong and suggest how to fix it. Two new SQL functions —
health_summary() and cache_stats() — give you a single-query overview of
the entire system, and updated Grafana dashboards make monitoring plug-and-play.
The TPC-H industry benchmark now runs as a nightly regression guard, and
property-based tests mathematically verify the core delta engine's arithmetic.
Highlights
- **NULL values in GROUP BY now handled correctly** — previous versions could produce wrong results when a stream table grouped by a column that contained NULL values and rows were deleted. The root cause was that NULL group keys broke the internal row-matching logic. This is now fixed: NULL keys are matched correctly during both inserts and deletes, so aggregate stream tables always return the right answer regardless of NULLs in the data.
- **Memory safety net for large deltas** — if an unexpectedly large batch of changes arrives (for example, a bulk import into a source table), the incremental refresh could previously consume unbounded memory. A new configuration option (`pg_trickle.delta_work_mem_cap_mb`) lets you set a ceiling. When a refresh would exceed it, pg_trickle automatically falls back to a full refresh instead of risking an out-of-memory crash.
- **Early warning when refreshes spill to disk** — when the incremental refresh engine runs low on memory, PostgreSQL may spill intermediate data to temporary files on disk, which is much slower. pg_trickle now detects this and sends a notification so you can investigate before performance degrades. If spilling happens repeatedly, the scheduler automatically switches the affected stream table to full refresh.
- **One-query system health check** — the new `pgtrickle.health_summary()` function returns a single row with everything you need at a glance: how many stream tables are active, how many are in error or suspended state, the worst staleness across all tables, whether the scheduler is running, and the overall cache hit rate. Perfect for dashboards, alerting rules, or a quick manual check.
- **Cache performance visibility** — the new `pgtrickle.cache_stats()` function shows how effectively pg_trickle is reusing its internal query templates. You can see cache hit rates, eviction counts, and current cache size — useful for tuning `pg_trickle.template_cache_size` on busy systems.
- **Better error messages** — every error pg_trickle can raise now includes a standard PostgreSQL error code (SQLSTATE), a DETAIL line explaining the context, and a HINT suggesting what to do. Instead of a cryptic internal error, you get actionable guidance like "Table 'orders' was dropped while stream table 'order_summary' depends on it — recreate the source table or drop the stream table."
Monitoring & dashboards
- **Updated Grafana dashboards** — the bundled `pg_trickle_overview.json` dashboard now includes panels for template cache hit rate, P99 and average refresh latency, hourly refresh success/failure counts, and cache eviction trends. Import it into Grafana and point it at your Prometheus instance for instant visibility.
- **Prometheus metric documentation** — all 8 new metrics exposed by `cache_stats()` and `health_summary()` are now fully documented in the monitoring guide, with ready-to-use PromQL queries.
Correctness & testing
- **TPC-H regression guard** — all 22 queries from the TPC-H industry benchmark now run nightly against known-good expected output. If a code change causes any query to return different results, CI fails immediately. This catches subtle correctness regressions that targeted tests might miss.
- **Property-based verification of delta arithmetic** — 6 property-based tests (2,000 random cases each) verify that the core engine's insert/delete accounting is correct: operations compose in the right order, groups cancel out properly, and no phantom rows appear after mixed workloads. An additional 4 end-to-end property tests exercise the full pipeline from change capture through to the final merged result.
- **CDC edge case coverage** — new tests cover composite primary keys, generated (computed) columns, NULL values in non-key columns, and domain types — real-world schema patterns that were previously untested.
- **dbt integration tests** — the dbt adapter now has regression tests for AUTO refresh mode, stream table health checks, and refresh history lifecycle — ensuring the dbt workflow stays reliable across releases.
Scalability
- **Scaling guide** — a new `docs/SCALING.md` document covers how to configure pg_trickle for large deployments (200+ stream tables), including worker pool sizing, tiered scheduling, per-database quotas, and tuning profiles for different workload types.
- **Buffer growth stress tests** — new tests verify that the `max_buffer_rows` safety limit works correctly under sustained high write rates, including automatic recovery back to incremental refresh after a burst subsides.
Testing infrastructure
- **Faster CI on pull requests** — 19 additional test files (~197 tests) were moved to the lightweight test runner that does not require building a custom Docker image. Pull request CI is now faster without sacrificing coverage.
- **Upgrade path tested** — the full upgrade chain from version 0.1.3 through every release up to 0.18.0 is verified automatically in CI, including function availability, schema integrity, and data survival.
Fixed
- **Upgrade script completeness** — the 0.17.0 → 0.18.0 upgrade migration now includes all new and changed functions (`pg_trickle_hash`, `cache_stats()`, `health_summary()`), so `ALTER EXTENSION pg_trickle UPDATE` works correctly.
[0.17.0] — Query Intelligence & Stability
This release teaches pg_trickle to make smarter decisions about how to refresh each stream table, reduces unnecessary work when only a handful of columns actually changed, and proves correctness through 10,000 automated random mutations every night. Large deployments with hundreds of stream tables now handle schema changes much faster. Alongside these improvements, three new documentation resources make it easier to get started, troubleshoot problems, and migrate from pg_ivm.
Highlights
- **Query-aware refresh decisions** — pg_trickle previously used a fixed threshold to decide between incremental and full refresh: if more than 50% of rows changed, switch to full. That works for simple queries but is poorly calibrated for joins or aggregates. The engine now classifies each query by its complexity (simple scan, filter, aggregate, join, or join+aggregate) and weights the cost estimate accordingly. Simple queries stay incremental even at high change rates; expensive join-heavy queries switch to full refresh sooner when the data is largely different. You can also pin a table to always use one strategy with the new `pg_trickle.refresh_strategy` setting (`'auto'` / `'differential'` / `'full'`), or tune the aggressiveness with `pg_trickle.cost_model_safety_margin`.
- **Skip columns that did not change** — when a row is updated in a wide source table (say, 50 columns) but only 2 columns that the stream table actually uses are modified, pg_trickle previously processed the full change anyway. It now tracks exactly which columns were modified and skips updates that touch none of the relevant columns. For aggregate stream tables the savings go further: a value-only update that does not affect group membership is applied as a single lightweight correction instead of a delete-then-insert pair. On write-heavy workloads with wide tables, this reduces the volume of data flowing through the refresh pipeline by 50–90%.
- **Faster schema changes on large deployments** — every time you create, alter, or drop a stream table, pg_trickle previously rebuilt the entire internal dependency graph from scratch. With 100 stream tables that takes only a few milliseconds, but at 1,000 it becomes noticeable. The graph is now updated incrementally — only the affected edges are touched, leaving everything else in place. At 1,000 stream tables the rebuild time drops from ~600 µs to ~116 µs and no longer scales with the total number of tables in the database.
- **Nightly correctness oracle** — a new automated test runs 10,000 random data mutations every night against a broad set of query shapes. For each mutation it compares the result of incremental refresh against a full recompute and fails if they ever disagree. This catches subtle correctness bugs that only surface after unusual sequences of inserts, updates, and deletes — the kind that hand-written tests rarely reach.
- **`ROWS FROM()` fully supported** — queries that use `ROWS FROM()` to call multiple set-returning functions side-by-side are now fully supported in incremental mode, including updates and deletes. This was previously restricted to insert-only workloads.
New documentation
- **Try it in 60 seconds** — a new `playground/` directory contains a `docker compose up` environment with PostgreSQL 18 + pg_trickle pre-wired, sample data loaded, and five stream tables ready to query. No installation required beyond Docker.
- **Troubleshooting runbook** — `docs/TROUBLESHOOTING.md` covers 13 real-world failure scenarios: scheduler not running, stream table stuck in SUSPENDED state, CDC triggers missing, WAL slot problems, out-of-memory, disk full, circular dependency convergence issues, unexpected schema changes, worker pool exhaustion, and blown fuses. Each scenario lists symptoms, diagnostic queries, and step-by-step resolution.
- **Migrating from pg_ivm** — `docs/tutorials/MIGRATING_FROM_PG_IVM.md` is a step-by-step guide for teams moving from the pg_ivm extension. It maps every pg_ivm API to its pg_trickle equivalent, explains behavioral differences, and includes ready-to-run SQL examples and a post-migration verification checklist.
- **New user FAQ** — the top 15 common questions are now answered at the top of `docs/FAQ.md` so new users find answers before scrolling through the full document.
- **Post-install verification script** — `scripts/verify_install.sql` walks through the complete setup: checks that pg_trickle is loaded, creates a test stream table, runs a refresh, verifies the result, and cleans up. Useful for confirming a fresh installation or diagnosing environment issues.
Stability & code quality
- **Safer internal code** — the number of `unsafe` Rust blocks in the query parser was reduced from 690 to 441 (a 36% drop) by introducing two helper macros that wrap the most common unsafe patterns. No behavior change; this makes the codebase easier to audit and maintain.
- **Cleaner internal structure** — the largest source file (`api.rs`, ~9,400 lines) was split into three focused modules. This has no user-visible effect but makes the codebase significantly easier to work with and reduces the risk of regressions from unrelated code being in the same file.
- **Refresh logic extracted and tested** — seven functions responsible for building the SQL used during refresh were extracted into standalone testable units and covered with 29 new unit tests. This catches regressions in generated SQL templates before they reach production.
[0.16.0] — Performance & Refresh Optimization
This release makes stream table
refreshes significantly faster across the board. Small changes to large
tables are now applied without expensive full-table scans. Tables that only
receive new rows (no updates or deletes) use a streamlined path that skips
unnecessary work. Aggregate queries like SUM and COUNT are refreshed
with pinpoint updates instead of recalculating entire groups. A new template
cache eliminates repeated startup work when database connections are recycled.
An automated benchmark system now prevents future changes from accidentally
slowing things down.
Highlights
- **Smarter refresh for small changes** — when only a handful of rows change in a large stream table (less than 1% of total rows), pg_trickle now uses a faster strategy that skips the full-table comparison. This can reduce refresh time by up to 40% for common workloads where most data stays the same between refreshes. The system picks the best strategy automatically, but you can override it via the `merge_strategy` setting.
- **Insert-only fast path** — stream tables backed by append-only data sources (like event logs or audit trails that never update or delete rows) are now detected automatically and refreshed using a much simpler, faster path. No configuration is needed — pg_trickle observes your data patterns and switches to the fast path on its own. If an update or delete is later detected, it safely falls back to the standard approach with a warning.
- **Faster aggregate refreshes** — stream tables that use `SUM`, `COUNT`, `AVG`, or `STDDEV` aggregates now update individual groups directly instead of re-joining against the entire table. For queries with many distinct groups, this can be 5–20× faster. Non-invertible aggregates like `MIN`, `MAX`, and `STRING_AGG` continue using the standard path.
- **Template cache for faster cold starts** — the first time a database connection refreshes a stream table, pg_trickle normally spends ~45 ms preparing the refresh query. A new cross-connection cache stores these prepared queries so that subsequent connections (including those from connection poolers like PgBouncer) start refreshing in about 1 ms instead.
- **Automated performance regression checks** — every code change to pg_trickle is now automatically benchmarked before it can be merged. If any operation slows down by more than 10%, the change is blocked until the regression is fixed. This protects users from accidental performance degradation in future releases.
New features
- **Error reference guide** — a new error reference page documents every error message pg_trickle can produce, explains what caused it, and suggests how to fix it. Useful when troubleshooting unexpected behavior in production.
- **Change buffer growth protection** — if a stream table's refresh keeps failing, the backlog of unprocessed changes could previously grow without limit, consuming disk space. A new `max_buffer_rows` setting (default: 1,000,000 rows) caps this growth. When the limit is reached, pg_trickle performs a full refresh to clear the backlog and warns you about the situation.
- **Automatic index creation control** — pg_trickle has always created helpful indexes on stream tables automatically. A new `auto_index` setting lets you disable this behavior when you want full control over indexing. Stream tables using `SELECT DISTINCT` now also get an automatic index on their distinct columns.
- **Compaction and predicate pushdown stats** — the `explain_st()` diagnostics function now shows additional information about change buffer compaction thresholds, merge strategy selection, append-only mode, aggregate fast-path status, and template cache hit rates.
Improved
- **Configuration guidance** — the documentation now includes detailed tuning advice for the `planner_aggressive` and `cleanup_use_truncate` settings, especially for environments using connection poolers like PgBouncer or running under memory pressure.
- **Terminal dashboard improvements** — the `pgtrickle` TUI dashboard now shows the effective refresh mode for each stream table (e.g., when a table is temporarily downgraded from differential to full refresh). The Alerts tab has been restructured with a clearer table layout and better distinction between "stale data" and "no upstream changes" conditions.
Fixed
- **Append-only detection with chained stream tables** — stream tables that feed into other stream tables (cascading dependencies) now correctly skip the append-only fast path to avoid data inconsistencies. Previously, a chained stream table could incorrectly use the insert-only path even when downstream tables needed the full change set.
- **Append-only heuristic accuracy** — the automatic detection of insert-only data sources now also checks the stream table's own change buffer for non-insert operations, avoiding false positives.
- **Full refresh fallback for mixed changes** — when both a stream table and its source table have pending changes in the same refresh cycle, pg_trickle now correctly falls back to a full refresh to avoid inconsistencies.
- **`resume_stream_table()` confirmed working** — the function referenced in error messages when a stream table enters `SUSPENDED` state was verified to exist and work correctly (present since v0.2.0).
Testing & quality
- 13 new end-to-end tests covering JOIN correctness across update/delete cycles, window function differential behavior, differential-vs-full equivalence validation, and source table schema evolution resilience.
- 5 new benchmark scenarios covering semi-joins, anti-joins, multi-table join chains, and aggregate queries at varying group counts. Total: 22 benchmark functions.
- 1,700 unit tests pass (up from 1,630 in v0.15.0).
[0.15.0] — Interactive TUI, Bulk Create & Runaway-Refresh Protection
0.15.0 brings the terminal dashboard to full operational capability, adds safety features that protect against runaway refreshes, and broadens the ecosystem with guides for popular migration and ORM frameworks. It also includes a major internal refactoring of the query parser and a new streaming benchmark suite.
Highlights
- Interactive terminal dashboard — the `pgtrickle` TUI is no longer read-only. Refresh, pause, resume, and repair stream tables directly from the dashboard. A command palette (`:`) with fuzzy search makes common operations fast. The poller reconnects automatically after network interruptions.
- Bulk creation — `pgtrickle.bulk_create()` creates many stream tables in a single atomic transaction, ideal for CI/CD and dbt pipelines.
- Runaway-refresh protection — two new safety nets prevent expensive merges from spiralling: a pre-flight row-count estimate that downgrades to FULL refresh when deltas are too large (`max_delta_estimate_rows`), and a spill detector that forces FULL refresh after repeated temp-file writes (`spill_threshold_blocks`).
- Stuck-watermark alerting — if an upstream ETL pipeline stops advancing its watermark, pg_trickle now pauses affected stream tables and sends a `watermark_stuck` notification so the issue is surfaced immediately rather than silently producing stale data.
- Integration guides — new documentation for Flyway, Liquibase, SQLAlchemy, Django, and dbt Hub helps teams adopt pg_trickle alongside their existing tooling.
New Features
- Volatile function policy — a new `volatile_function_policy` setting lets you choose whether volatile functions (like `random()` or `clock_timestamp()`) should be rejected (the default), allowed with a warning, or allowed silently when creating stream tables.
- Bulk create API — `pgtrickle.bulk_create(definitions)` accepts a JSON array of stream table definitions and creates them all in one transaction. If any definition fails, the entire batch is rolled back.
- Enhanced diagnostics — `pgtrickle.explain_st()` now shows refresh timing statistics (min/max/average duration), partition info for partitioned source tables, and a dependency graph you can render with Graphviz.
- Join strategy override — the `merge_join_strategy` setting lets you force a specific join method (`hash_join`, `nested_loop`, or `merge_join`) during delta merges, which can help when the automatic heuristic doesn't suit your workload.
- Pre-flight delta estimation — when `max_delta_estimate_rows` is set, pg_trickle counts the delta rows before merging. If the count exceeds the limit, it falls back to a FULL refresh and logs a notice, preventing out-of-memory conditions on unexpectedly large change sets.
- Spill-aware refresh — if differential merges spill to disk repeatedly (controlled by `spill_threshold_blocks` and `spill_consecutive_limit`), the scheduler switches to FULL refresh automatically.
- Stuck watermark hold-back — the `watermark_holdback_timeout` setting detects watermarks that have not advanced within a configurable window. Downstream stream tables are paused and a `watermark_stuck` notification is emitted until the watermark advances again.
- Cascade drop — `drop_stream_table()` now accepts an optional `cascade` parameter (default `true`). Setting it to `false` raises an error if dependent stream tables exist, matching PostgreSQL's RESTRICT behavior.
- Nexmark benchmark suite — a 10-query streaming benchmark (modelled on an online auction system) validates correctness under sustained high-frequency inserts, updates, and deletes.
- 17 new end-to-end tests — 7 tests for multi-level stream-table chains (3- and 4-level cascades with mixed refresh modes) and 10 tests for diamond/fan-in topologies with IMMEDIATE mode. No deadlocks were found.
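The bulk-create API above can be sketched as follows. This is a hedged example: the JSON field names (`name`, `query`, `schedule`) are assumed to mirror the `create_stream_table()` parameters from the quickstart, and the `jsonb` cast is illustrative — check the SQL reference for the exact signature.

```sql
-- Create two stream tables atomically; if either definition fails,
-- the whole batch is rolled back (field names are assumptions):
SELECT pgtrickle.bulk_create('[
  {"name": "active_orders",
   "query": "SELECT * FROM orders WHERE status = ''active''",
   "schedule": "30s"},
  {"name": "daily_revenue",
   "query": "SELECT order_date, sum(amount) AS revenue FROM orders GROUP BY order_date",
   "schedule": "5m"}
]'::jsonb);
```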
Terminal Dashboard (TUI)
- Write actions — refresh, pause, resume, repair, reset fuse, and gate/ungate operations can now be performed without leaving the dashboard.
- Command palette — press `:` for fuzzy-matched command entry with tab-completion.
- Automatic reconnection — the dashboard reconnects with exponential back-off (up to 15 s) after a connection loss, with a visual indicator.
- Richer views — all 14 views now show additional live data (diagnostics, CDC health, refresh history with row-delta counts, error remediation hints, dependency-graph annotations, worker queue status, and watermark alignment).
- Cross-view filtering — the `/` search filter now persists across all 10 list views.
- Navigation re-fetch — moving between rows in the Detail view immediately fetches fresh data for the selected table.
- Toast messages — write actions show confirmation and error toasts.
- Sort cycling — press `s`/`S` on the Dashboard to cycle through 6 sort modes.
- Mouse support — `--mouse` enables scroll-wheel navigation.
- Theme toggle — `t` or `--theme dark|light` switches colour themes.
- JSON export — `Ctrl+E` or `:export` writes the current view to a file.
- TLS support — `--sslmode` and `--sslrootcert` flags.
Documentation & Ecosystem
- Flyway / Liquibase guide — migration patterns for versioned and repeatable migrations, rollback blocks, and CI environments.
- SQLAlchemy / Django guide — read-only model patterns, write-blocking safeguards, DRF viewsets, and freshness checking.
- dbt Hub readiness — the `dbt-pgtrickle` package is version-synced and ready for dbt Hub submission.
- Kubernetes / CNPG — updated probe configuration and a new deployment section in the Getting Started guide.
- Full documentation review — configuration reference expanded from 23 to 40+ settings, missing SQL reference entries filled in, outdated FAQ answers corrected.
Internal Improvements
- Parser modularisation — the 21,000-line query parser has been split into 5 focused sub-modules (`types`, `validation`, `rewrites`, `sublinks`, and the main entry point). No behavior change — all 1,687 unit tests pass.
- Unsafe audit — every `unsafe` block in the codebase (~750 total) now has a `// SAFETY:` comment explaining why it is sound.
- Shared-memory cache RFC — an RFC for a DSM-based MERGE template cache has been written, informing the v0.16.0 implementation plan.
- TRUNCATE handling verified — TRUNCATE on source tables in trigger CDC mode already triggers a FULL refresh; this is now documented.
- JOIN key-change fix verified — the v0.14.0 correctness fix for simultaneous JOIN key updates and DELETEs has been verified working and the former known-limitation note replaced with a description of the fix.
Bug Fixes
- Fixed a panic in the TUI when deserializing health-check data that returned 64-bit integers where 32-bit was expected.
- Fixed spurious "Error: db error" toasts in the TUI Detail view — background queries now degrade silently instead of surfacing transient errors.
- Fixed incorrect integer type annotations in two E2E tests for IMMEDIATE mode diamond topologies.
[0.14.0] — Tiered Scheduling, Diagnostics & TUI
0.14.0 is the Tiered Scheduling, Diagnostics & TUI release. It gives you fine-grained control over how often each stream table refreshes, adds tools that recommend the best refresh strategy for your workload, introduces a full-screen terminal dashboard for managing stream tables without SQL, and includes important security and reliability fixes.
Terminal Dashboard (TUI)
A new pgtrickle command-line tool lets you monitor and manage stream tables
from a terminal — no SQL required. Run it with no arguments to launch a
live-updating full-screen dashboard (think htop for stream tables), or use
one-shot subcommands like pgtrickle list, pgtrickle status, or
pgtrickle refresh for scripting and CI.
The interactive dashboard includes:
- Live overview — stream table statuses, refresh timing, and issue counts update every 2 seconds, with color-coded health indicators.
- Dependency graph — see how stream tables relate to each other in an ASCII tree view.
- Diagnostics — view refresh mode recommendations with confidence levels.
- CDC health — monitor change buffer sizes with warnings when they grow too large.
- Alert feed — real-time notification display with severity levels.
- Issue detection — automatically spots broken dependency chains, growing buffers, blown fuses, and stale data, with a persistent badge showing the issue count from any view.
- Watch mode — `pgtrickle watch` provides continuous non-interactive output suitable for log aggregation.
- Output formats — all CLI subcommands support `--format json`, `--format csv`, and human-readable table output.
See docs/TUI.md for the full user guide.
Tiered Refresh Scheduling
Stream tables can now be assigned to refresh tiers — hot, warm, cold, or frozen — to control how frequently they refresh:
- Hot (default) — refreshes at the configured interval.
- Warm — refreshes at 2× the interval.
- Cold — refreshes at 10× the interval, ideal for infrequently accessed reports.
- Frozen — pauses automatic refresh entirely until promoted back.
Assign a tier with `ALTER STREAM TABLE ... SET (tier = 'cold')`. A NOTICE is emitted when demoting from Hot to Cold or Frozen so operators are aware of the change in refresh frequency.
Smarter Refresh Recommendations
Two new diagnostic functions help you choose the most efficient refresh strategy for each stream table:
- `pgtrickle.recommend_refresh_mode(name)` — analyzes seven workload signals (including change frequency, timing history, query complexity, table size, index coverage, and latency patterns) and recommends FULL or DIFFERENTIAL mode with a confidence level and a plain-language explanation. Useful when you're unsure which mode will be faster for a particular table.
- `pgtrickle.refresh_efficiency(name)` — shows per-table refresh performance: how many FULL vs. DIFFERENTIAL refreshes have run, average timing for each, and the speedup factor. Good for monitoring dashboards and alerting.
A new tutorial — Tuning Refresh Mode — walks through the process step by step.
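In practice, the workflow looks something like this — a hedged sketch that assumes both functions are set-returning (the exact output columns are described in the SQL reference):

```sql
-- Ask pg_trickle which refresh mode it recommends for a table,
-- with a confidence level and explanation:
SELECT * FROM pgtrickle.recommend_refresh_mode('active_orders');

-- Then confirm the choice against observed performance
-- (FULL vs. DIFFERENTIAL counts, timings, speedup factor):
SELECT * FROM pgtrickle.refresh_efficiency('active_orders');
```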
Reduced Write Overhead with UNLOGGED Buffers
Enable pg_trickle.unlogged_buffers = true and newly created change buffer
tables will skip write-ahead logging, reducing WAL volume by roughly 30%.
This is ideal for workloads where you can tolerate a full re-sync after a
crash (the extension detects the crash and re-syncs automatically).
A utility function — pgtrickle.convert_buffers_to_unlogged() — converts
existing buffers in one call. Run it during a maintenance window since it
briefly locks each buffer table.
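A minimal adoption sequence, assuming `unlogged_buffers` is a reloadable GUC and that you have superuser access plus a maintenance window for the conversion step:

```sql
-- New change buffers skip write-ahead logging from now on:
ALTER SYSTEM SET pg_trickle.unlogged_buffers = true;
SELECT pg_reload_conf();

-- Convert existing change buffers in one call
-- (briefly locks each buffer table — run during a maintenance window):
SELECT pgtrickle.convert_buffers_to_unlogged();
```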
Instant Error Detection
Previously, when a stream table's refresh hit a permanent error (for example,
a function that doesn't exist for the column type), the extension would retry
several times before giving up. Now it recognizes permanent errors immediately,
sets the stream table status to ERROR with a clear error message, and
stops retrying. You can see the error at a glance in the stream_tables_info
view or the TUI dashboard, and fix it by altering the stream table's query.
Security Hardening
- CDC trigger functions now use `SECURITY DEFINER` — change-data-capture trigger functions run with the privileges of the extension owner rather than the current user, preventing privilege escalation through modified search paths.
- Explicit `SET search_path` — all CDC trigger functions now set `search_path` to `pgtrickle_changes, pg_catalog` to prevent search-path manipulation attacks.
Other Improvements
- Export definitions — `pgtrickle.export_definition(name)` exports a stream table's full configuration as reproducible SQL (`DROP` + `CREATE` + `ALTER` statements), making it easy to version-control or migrate stream table definitions between environments.
- Creation-time warnings — when creating a stream table with aggregates like `MIN`, `MAX`, or `STRING_AGG` in DIFFERENTIAL mode, a warning now suggests that FULL or AUTO mode may be more efficient. For algebraic aggregates (SUM/COUNT/AVG), the warning only appears when the estimated number of groups is below a configurable threshold.
- Simplified settings — the `merge_planner_hints` and `merge_work_mem_mb` settings have been consolidated into a single `planner_aggressive` switch. The old setting names are still accepted but are ignored in favor of the new one.
- GHCR Docker image — a multi-architecture Docker image (`ghcr.io/trickle-labs/pg_trickle`) with PostgreSQL 18.3 and pg_trickle pre-installed is now published automatically on each release.
- Pre-deployment checklist — new PRE_DEPLOYMENT.md with a 10-point checklist for production deployments.
- Best-practice patterns guide — new PATTERNS.md with 6 common patterns: Bronze/Silver/Gold materialization, event sourcing, slowly-changing dimensions, high-fan-out topology, real-time dashboards, and tiered refresh strategies.
- Keyless dedup fix — replaced `MAX(col)` with `(array_agg(col))[1]` for deduplicating keyless scan results, which also works for non-orderable types.
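For example, `export_definition()` can feed a version-controlled migration file or an environment sync. A brief sketch using the quickstart table:

```sql
-- Capture a stream table's definition as reproducible DDL
-- (a script of DROP + CREATE + ALTER statements):
SELECT pgtrickle.export_definition('active_orders');
```

The returned script can be committed to a repository or replayed in a staging environment to recreate the stream table with identical settings.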
Bug Fixes
- ST-on-ST differential refresh — manually refreshing a stream table that reads from another stream table now uses true incremental (DIFFERENTIAL) refresh instead of falling back to a full re-scan. This matches the behavior of the automatic scheduler and is significantly faster for large tables.
- Staleness tracking — the staleness indicator now uses the actual last refresh time instead of an internal data timestamp, making the `pg_stat_stream_tables` view more accurate.
Testing & Reliability
- Soak test — a new long-running stability test validates zero worker crashes, zero ERROR states, and stable memory usage under a sustained mixed workload (configurable duration, default 10 minutes).
- Multi-database isolation test — verifies that two databases in the same PostgreSQL cluster run pg_trickle independently without interference.
- 140 TUI tests — comprehensive unit, snapshot, and interaction tests for the terminal dashboard.
- 23 mixed-object E2E tests — validates stream tables alongside regular PostgreSQL views, materialized views, and other objects.
- Scheduler race fixes — eliminated flaky test failures caused by scheduler timing races and GUC leaks between tests.
New SQL Functions
| Function | Purpose |
|---|---|
| `pgtrickle.recommend_refresh_mode(name)` | Workload-based refresh mode recommendation |
| `pgtrickle.refresh_efficiency(name)` | Per-table refresh performance metrics |
| `pgtrickle.export_definition(name)` | Export stream table as reproducible DDL |
| `pgtrickle.convert_buffers_to_unlogged()` | Convert logged change buffers to UNLOGGED |
New Settings
| Setting | Default | Purpose |
|---|---|---|
| `pg_trickle.planner_aggressive` | true | Consolidated switch for MERGE planner hints |
| `pg_trickle.unlogged_buffers` | false | Create new change buffers as UNLOGGED |
| `pg_trickle.agg_diff_cardinality_threshold` | 1000 | Warn about DIFFERENTIAL mode below this group count |
Deprecated
- `pg_trickle.merge_planner_hints` — Use `pg_trickle.planner_aggressive` instead. Still accepted but ignored at runtime.
- `pg_trickle.merge_work_mem_mb` — Same; use `planner_aggressive` instead.
Upgrading
Run ALTER EXTENSION pg_trickle UPDATE; after installing the new binaries.
The upgrade adds new catalog columns, functions, and the TUI workspace member.
No breaking changes — everything from v0.13.0 continues to work. See
UPGRADING.md for details.
[0.13.0] — Scalability Foundations & Full TPC-H Coverage
0.13.0 is the Scalability Foundations release. It makes pg_trickle handle large tables, complex queries, and multi-tenant deployments much more efficiently — and it achieves a major milestone: all 22 TPC-H benchmark queries now run in incremental (DIFFERENTIAL) mode, meaning the engine no longer needs to fall back to slow full-refresh for any standard analytical query pattern.
Smarter Change Detection for Wide Tables
When you UPDATE a few columns in a large table — say, changing a status
column in a 60-column table — pg_trickle used to treat every column as
potentially changed, doing extra work to keep all downstream views up to date.
Now it knows the difference. Columns used in GROUP BY, JOIN, or WHERE clauses are "key columns"; everything else is a "value column." When only value columns change, the engine takes a shortcut: it sends a single correction row instead of a full delete-and-reinsert pair. For wide-table workloads, this can cut the volume of data processed by 50% or more.
Shared Change Buffers
If you have several stream tables watching the same source table, each one used to maintain its own private copy of the change log. That's wasteful. Now they share a single change buffer per source, and each consumer simply tracks how far it has read. The slowest reader protects the buffer for everyone.
You can see how this is working with the new pgtrickle.shared_buffer_stats()
function — it shows each buffer, who's reading from it, how many rows are
queued, and whether it's been automatically partitioned for performance.
Automatic Buffer Partitioning
Set pg_trickle.buffer_partitioning = 'auto' and pg_trickle will start with
simple, unpartitioned change buffers. If a buffer starts accumulating a lot of
rows (high-throughput sources), it automatically converts to a partitioned
layout where old data can be removed almost instantly instead of deleting rows
one by one.
More Partitioning Options for Stream Tables
Building on the RANGE partitioning added in v0.11.0, you can now partition stream tables in three additional ways:
- Multi-column keys — partition by a combination of columns (`partition_by='region,year'`)
- LIST partitioning — for low-cardinality columns like `status` or `type` (`partition_by='LIST:status'`)
- HASH partitioning — for even distribution across a fixed number of partitions (`partition_by='HASH:customer_id:8'`)
You can also change the partition key of an existing stream table at runtime
with alter_stream_table(partition_by => ...) — data is preserved
automatically. If rows land in the default (catch-all) partition, a WARNING
is emitted to prompt you to add explicit partitions.
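Assuming `partition_by` is accepted at creation time alongside the parameters shown in the overview (this is a sketch, not the confirmed signature), a HASH-partitioned stream table might look like:

```sql
SELECT pgtrickle.create_stream_table(
  name         => 'orders_by_customer',
  query        => 'SELECT customer_id, count(*) AS order_count
                   FROM orders GROUP BY customer_id',
  schedule     => '1m',
  partition_by => 'HASH:customer_id:8'  -- 8 hash partitions on customer_id
);
```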
All 22 TPC-H Queries Now Run Incrementally
The DVM (differential view maintenance) engine received its most significant set of improvements yet, targeting the complex multi-table join patterns found in standard analytical benchmarks:
- Smarter pre-image lookups — instead of reconstructing what the data looked like before a change by subtracting deltas (expensive for large tables), the engine now uses targeted index lookups that only touch the rows that actually changed.
- Predicate pushdown — WHERE conditions from the original query are now pushed into the delta computation, preventing unnecessary cross-products in multi-table joins.
- Deep-join optimizations — queries joining 5+ tables get automatic planner hints (more memory, smarter join strategies) to avoid spilling to disk.
- Scan-count-aware strategy selector — queries that exceed configurable join complexity or delta volume thresholds automatically fall back to full refresh on a per-query basis rather than failing.
The result: all 22 TPC-H queries pass at SF=0.01 in DIFFERENTIAL mode
with zero drift across 3 refresh cycles. The DIFFERENTIAL_SKIP_ALLOWLIST
(queries that previously required full refresh) is now empty.
Refresh Performance Inspection Tools
Two new functions help you understand what pg_trickle is doing under the hood:
- `pgtrickle.explain_delta(name, format)` — shows you the query plan for the auto-generated delta SQL, the same way `EXPLAIN` works for regular queries. Available in text, JSON, XML, or YAML format.
- `pgtrickle.dedup_stats()` — reports how often concurrent writes produce duplicate entries that need pre-processing before the MERGE step.
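For instance, to inspect what the engine does for the quickstart table (format values as listed above; the result shapes are assumptions based on the descriptions):

```sql
-- Text-format plan for the auto-generated delta SQL:
SELECT pgtrickle.explain_delta('active_orders', 'text');

-- How often concurrent writes required MERGE-side deduplication:
SELECT * FROM pgtrickle.dedup_stats();
```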
Multi-Tenant Worker Quotas
New setting: pg_trickle.per_database_worker_quota — if you run many
databases on one PostgreSQL cluster, this prevents a busy database from
monopolizing all the refresh workers. Workers are assigned by priority
(immediate-mode tables first, then hot, warm, and cold), with burst capacity
up to 150% when other databases are idle.
TPC-H Benchmark Harness
You can now measure refresh performance across all 22 TPC-H queries in a
structured way. Run just bench-tpch to get per-query timing, FULL vs.
DIFFERENTIAL comparison, and P95 latency numbers. Five synthetic benchmarks
(q01, q05, q08, q18, q21) also measure the pure Rust delta-SQL
generation time without needing a database.
Broader SQL Support
- `IS JSON` predicates (PG 16+) — expressions like `expr IS JSON OBJECT` now work in incremental mode.
- SQL/JSON constructors (PG 16+) — `JSON_OBJECT(...)`, `JSON_ARRAY(...)`, `JSON_OBJECTAGG(...)`, and `JSON_ARRAYAGG(...)` are now accepted.
- Recursive CTEs — recursive queries with non-monotone operators (like `EXCEPT`) correctly fall back to full refresh instead of producing wrong results.
dbt Integration Updates
If you use dbt-pgtrickle, you can now set partitioning and fuse options directly from dbt model config:
- `{{ config(partition_by='customer_id') }}` for partitioned stream tables
- `{{ config(fuse='auto', fuse_ceiling=100000, fuse_sensitivity=3) }}` for circuit-breaker protection
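Put together in a dbt model file, this might look like the following sketch. The `materialized='stream_table'` value and the model name are assumptions about typical `dbt-pgtrickle` usage — consult the package documentation for the exact config keys:

```sql
-- models/orders_by_customer.sql (illustrative dbt model)
{{ config(
    materialized='stream_table',   -- assumed dbt-pgtrickle materialization name
    partition_by='customer_id',
    fuse='auto',
    fuse_ceiling=100000,
    fuse_sensitivity=3
) }}

select customer_id, count(*) as order_count
from {{ ref('orders') }}
group by customer_id
```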
Bug Fixes
- Scheduler cascade fix — stream tables downstream of FULL-mode upstream tables now detect changes correctly via a `last_refresh_at` fallback, preventing stale data in chains where the upstream uses full refresh.
- SUM(CASE WHEN ...) drift fix — aggregate expressions using CASE were occasionally producing slightly wrong incremental results; these are now correctly detected and processed via a group rescan.
- Duplicate column DDL fix — removed a duplicate column definition in the `pgt_stream_tables` DDL that could cause issues on fresh installs.
Testing Improvements
- New regression test suite targeting 9 structural weaknesses: join multi-cycle correctness (7 tests), differential-equals-full equivalence (11 tests), DVM operator execution, failure recovery, and MERGE template unit tests.
- E2E test infrastructure now uses template databases, cutting per-test setup time significantly.
New SQL Functions
| Function | Purpose |
|---|---|
| `pgtrickle.explain_delta(name, format)` | Show the query plan for the delta SQL |
| `pgtrickle.dedup_stats()` | MERGE deduplication frequency counters |
| `pgtrickle.shared_buffer_stats()` | Per-source change buffer status |
| `pgtrickle.explain_refresh_mode(name)` | Why a stream table uses its current refresh mode |
| `pgtrickle.reset_fuse(name)` | Reset a blown circuit-breaker fuse |
| `pgtrickle.fuse_status()` | Fuse state across all stream tables |
New Catalog Columns
Ten new columns on pgtrickle.pgt_stream_tables:
| Column | Purpose |
|---|---|
| `effective_refresh_mode` | The actual refresh mode after AUTO resolution |
| `fuse_mode` | Circuit-breaker configuration (off / auto / manual) |
| `fuse_state` | Current fuse state (armed / blown) |
| `fuse_ceiling` | Maximum change count before the fuse blows |
| `fuse_sensitivity` | Consecutive cycles above ceiling before triggering |
| `blown_at` | When the fuse last blew |
| `blow_reason` | Why the fuse blew |
| `st_partition_key` | Partition key specification |
| `max_differential_joins` | Maximum join count for differential mode |
| `max_delta_fraction` | Maximum delta-to-table ratio for differential mode |
Upgrading
Run ALTER EXTENSION pg_trickle UPDATE; after installing the new binaries.
All new columns and functions are added automatically. No breaking changes —
everything from v0.12.0 continues to work as before. See
UPGRADING.md for details.
[0.12.0] — Join Correctness, Diagnostics & Reliability
0.12.0 is a correctness, reliability, and developer-experience release built on top of 0.11.0's major new features. It closes the last known wrong-answer bugs for complex join queries, adds tools to help you understand and debug stream table behavior, hardens the scheduler against several edge cases that could cause stale data or crashes, and backs it all with thousands of new automatically generated tests.
Stale Rows Fixed in Stream-Table Chains
What was the problem? When a stream table (B) reads from another stream table (A), each change in A is recorded as a small "what changed" entry — a row added or removed. But the identity key used for those entries was computed differently inside the change buffer than it was inside B's own storage. As a result, when A changed via an upstream UPDATE, B's refresh could silently fail to delete the old version of a row, leaving a stale duplicate.
What changed? The change buffer now computes row identity the same way B does — using a hash of all the data columns rather than the upstream source's primary key. Stale rows after UPDATE no longer appear in stream-table chains. This bug was found and confirmed by the new property-based test suite (see below).
Phantom Rows Fixed for Complex Joins (TPC-H Q7 / Q8 / Q9)
What was the problem? When a stream table's query joins three or more tables together and rows are deleted from more than one join side at the same time, the incremental engine could silently drop the correction — leaving rows in the stream table that should have been removed.
This affected TPC-H queries Q7, Q8, and Q9 (which all involve deep join trees), and any user query with a similar multi-table join structure. A temporary workaround (falling back to full refresh for wide joins) was in place since v0.11.0 and has now been lifted.
What changed? The incremental engine now takes an individual "before snapshot" for each leaf table in the join tree — each one cheaply computed from a single-table comparison — and re-joins them after the delete. This avoids writing multi-gigabyte temp files to disk (the root cause of the original workaround) and eliminates the phantom-row bug entirely. Q7, Q8, and Q9 now run in differential mode without any workarounds.
Type Errors Fixed in Parallel Refresh Chains
What was the problem? When a chain of stream tables is fused into a single
execution unit for efficiency (the "bypass" optimisation added in v0.11.0),
the internal bypass table used text for every column regardless of the
actual column type. This caused an operator does not exist: text > integer
error whenever a downstream stream table had a type-sensitive WHERE clause
(e.g. WHERE amount > 100), making the parallel worker tests fail silently
across all topologies that included a fused chain.
What changed? Bypass tables now use the real column types. The six parallel-worker benchmark tests now complete in 9–26 seconds rather than timing out after 120 seconds.
Scheduler Fixes for Diamond and ST-on-ST Topologies
Two scheduler bugs that caused incorrect refresh behavior with complex dependency graphs were fixed:
- Diamond timeout. In a diamond topology (A → B, A → C, B+C → D), the L1 arm stream tables (B and C) were created with a 1-minute fixed interval rather than a calculated schedule. This meant D never received updates within the test window. The scheduler also had a bug loading stream table records by ID that caused silent failures in parallel worker paths. Both are fixed.
- ST-on-ST parallel workers. When an upstream stream table changed, the parallel worker paths (singleton, atomic group, immediate closure, fused chain) were not forcing a full refresh on downstream stream tables the way the main scheduler loop did. This could leave downstream tables stale. The fix ensures all parallel paths treat upstream stream-table changes the same way.
Four New Diagnostic Functions
When stream table behavior is unexpected — wrong refresh mode, a query being rewritten in a surprising way, persistent errors — it previously required reading server logs or source code to understand why. Four new SQL functions expose that internal state directly in queries:
- `pgtrickle.explain_query_rewrite(query TEXT)` — shows exactly how pg_trickle rewrites your query for incremental refresh: which operators were applied, how delta keys are injected, and how aggregates are classified. Useful for understanding why a query got a particular refresh mode.
- `pgtrickle.diagnose_errors(name TEXT)` — shows the last 5 errors for a stream table, each classified by type (correctness, performance, configuration, infrastructure) with a suggested fix.
- `pgtrickle.list_auxiliary_columns(name TEXT)` — lists the internal `__pgt_*` columns that pg_trickle injects into a stream table's query plan, with an explanation of each one's purpose. Helpful when `SELECT *` returns unexpected extra columns.
- `pgtrickle.validate_query(query TEXT)` — analyses a SQL query and reports which refresh mode it would get, which SQL constructs were detected, and any warnings — all without creating a stream table.
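A typical pre-flight check with `validate_query()` before committing to a stream table (the query is illustrative):

```sql
-- Dry-run analysis: reports the refresh mode the query would get,
-- the SQL constructs detected, and any warnings.
-- No stream table is created.
SELECT * FROM pgtrickle.validate_query(
  'SELECT status, count(*) FROM orders GROUP BY status'
);
```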
Multi-Column IN (subquery) Now Gives a Clear Error
What was the problem? A query like WHERE (col_a, col_b) IN (SELECT x, y FROM …) passed validation but produced silently wrong results — the engine
was only matching on the first column and ignoring the second.
What changed? This construct is now detected at stream table creation time
and rejected with a clear error message that recommends rewriting it as
EXISTS (SELECT 1 FROM … WHERE col_a = x AND col_b = y).
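A concrete before/after, with illustrative table and column names:

```sql
-- Rejected at creation time: multi-column IN against a subquery
-- (before v0.12.0 this passed validation but matched only the first column):
--   SELECT * FROM orders o
--   WHERE (o.customer_id, o.region) IN
--         (SELECT v.customer_id, v.region FROM vip_customers v);

-- Recommended rewrite using EXISTS, which pg_trickle maintains correctly:
SELECT o.*
FROM orders o
WHERE EXISTS (
  SELECT 1
  FROM vip_customers v
  WHERE o.customer_id = v.customer_id
    AND o.region      = v.region
);
```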
IMMEDIATE Mode Proven Correct Under High Concurrency
IMMEDIATE mode (where the stream table updates inside the same transaction as the source table change) now has a dedicated concurrency stress test: 100–120 concurrent transactions firing simultaneously against the same source table, across five scenarios (all inserts, all updates to distinct rows, all updates to the same row, all deletes, and a mixed workload). Zero lost updates, zero phantom rows, and no deadlocks were observed in any run.
Protection Against Pathological Queries
A new guard prevents a particularly deep or convoluted query from consuming
all available stack space and crashing the database backend. When the query
analyser recurses more than 64 levels deep (configurable via
pg_trickle.max_parse_depth), it now returns a clear QueryTooComplex error
instead of crashing.
Tiered Scheduling Now On By Default
The tiered scheduling feature — which automatically slows down cold (infrequently-read) stream tables and speeds up hot ones — is now enabled by default. In large deployments this reduces the scheduler's CPU usage significantly. Stream tables you query often continue refreshing at full speed. Stream tables that nobody has read recently back off gracefully.
If you rely on all stream tables refreshing at the same rate regardless of
read frequency, set pg_trickle.tiered_scheduling = off.
Thousands of Automatically Generated Tests
Two new automated testing systems were added to complement the hand-written test suite:
- Property-based tests — the test framework automatically generates thousands of random DAG shapes, schedule combinations, and edge cases and checks that the scheduler's ordering guarantees hold for all of them. If any configuration would cause a table to refresh in the wrong order or get spuriously suspended, these tests catch it.
- SQLancer fuzzing — SQLancer generates random SQL queries and checks that pg_trickle's incremental result matches the result of running the same query directly in PostgreSQL. Any mismatch is automatically saved as a permanent regression test. A weekly CI job runs this continuously. At the time of release, zero mismatches had been found.
CDC Write-Side Benchmark Published
A new benchmark suite measures the overhead that pg_trickle's change capture triggers add to your write workload. Results across five scenarios (single-row INSERT, bulk INSERT, bulk UPDATE, bulk DELETE, concurrent writers) are published in docs/BENCHMARK.md. Use these numbers to estimate the impact before deploying pg_trickle on a write-heavy table.
MERGE Template Validation at Test Startup
The SQL templates that pg_trickle generates for applying incremental changes
(the MERGE statements) are now validated with an EXPLAIN dry-run at every
test startup. If a code change accidentally produces a malformed MERGE
template, the tests catch it before any data is processed — rather than
manifesting as a cryptic runtime error.
[0.11.0] — Event-Driven Latency, Chain IVM & Observability Stack
This is the biggest release since the initial launch. The headline features are 34× lower latency for real-time workloads, stream-table chains that now refresh incrementally (no more forced full recomputation when one stream table feeds another), declarative partitioning to cut I/O on large tables by up to 100×, a ready-to-use Prometheus and Grafana monitoring stack, and a circuit breaker to protect production databases from runaway change bursts.
34× Lower Latency — Changes Arrive Instantly
Previously, the background worker woke up on a fixed timer every ~500 ms to check for new data, even when nothing had changed. Every change had to wait up to half a second in the change buffer before being processed.
Now, when a source table is modified, the change capture trigger immediately wakes the background worker via a PostgreSQL notification channel. The worker starts processing within ~15 ms of the write committing — a 34× improvement for low-volume workloads. Under heavy DML, a 10 ms debounce window coalesces rapid notifications so the worker isn't flooded.
Event-driven wake is on by default. You can turn it off
(pg_trickle.event_driven_wake = off) to revert to poll-based wake, and you can
tune the debounce window with pg_trickle.wake_debounce_ms (default 10).
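As a sketch, both knobs can be set cluster-wide (the values below are illustrative, not recommendations, and `ALTER SYSTEM` requires superuser):

```sql
-- Revert to poll-based wake and widen the debounce window.
-- Defaults are on / 10 ms; 25 ms here is purely illustrative.
ALTER SYSTEM SET pg_trickle.event_driven_wake = off;
ALTER SYSTEM SET pg_trickle.wake_debounce_ms = 25;
SELECT pg_reload_conf();
```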
Stream-Table-to-Stream-Table Chains Now Refresh Incrementally
Previously, when stream table B's query read from stream table A, pg_trickle had to do a full recomputation of B every time A changed — even if only a few rows in A actually changed. For long chains (A → B → C → D), every hop was a full re-scan.
Now, stream tables can read from other stream tables incrementally. When A refreshes, the rows it added and removed are recorded in a change buffer just like a base table. B wakes up, reads only the changed rows from A, and applies a delta — not a full recomputation. Even when A does a full refresh (e.g. because its query does not support differential mode), a before/after snapshot diff is captured automatically so downstream tables still receive a small insert/delete delta rather than cascading full refreshes through the chain.
Declaratively Partitioned Stream Tables
Stream tables can now be declared with a partition key:
SELECT create_stream_table(
'monthly_sales',
$$ SELECT month, region, SUM(amount) FROM orders GROUP BY 1, 2 $$,
partition_by => 'month'
);
pg_trickle creates a range-partitioned storage table and, when refreshing, automatically restricts the MERGE operation to only the partitions that contain changed rows. For large tables where changes touch only 2–3 out of 100 monthly partitions, this can reduce the MERGE I/O from 10 million rows to ~100,000 — a 100× improvement.
Ready-to-Use Prometheus and Grafana Monitoring
A complete observability stack is now included in the monitoring/ directory:
- `monitoring/prometheus/pg_trickle_queries.yml` — drop-in configuration for `postgres_exporter` that exports 14 metrics covering refresh performance, CDC buffer sizes, staleness, error rates, and per-table status.
- `monitoring/prometheus/alerts.yml` — 8 alerting rules that page you when a stream table goes stale (> 5 min), starts error-looping (≥ 3 consecutive failures), is suspended, or when the CDC buffer exceeds 1 GB.
- `monitoring/grafana/dashboards/pg_trickle_overview.json` — a pre-built Grafana dashboard with six sections: cluster overview, refresh latency time-series, staleness heatmap, CDC lag, per-table drill-down, and scheduler health.
- `monitoring/docker-compose.yml` — brings up PostgreSQL + pg_trickle + postgres_exporter + Prometheus + Grafana with one command (`docker compose up`). Grafana opens at http://localhost:3000; the dashboard shows live metrics generated by a seed workload of stream tables continuously refreshing synthetic order and product data (see `monitoring/init/01_demo.sql`).
No code changes are needed to use this stack with an existing pg_trickle installation.
Circuit Breaker (Fuse) — Protection Against Runaway Change Bursts
A new circuit breaker mechanism halts refresh for a stream table when its pending change count exceeds a configurable threshold. This protects your database from accidental mass-delete scripts, runaway migrations, or data imports that would otherwise trigger an unexpectedly large and expensive refresh operation.
When the fuse blows, pg_trickle sends a pgtrickle_alert PostgreSQL notification
that you can subscribe to, and suspends the affected stream table. You then choose
how to recover using reset_fuse():
- `reset_fuse(name, action => 'apply')` — process the backlog normally (default).
- `reset_fuse(name, action => 'reinitialize')` — clear the change buffer and repopulate the stream table from scratch.
- `reset_fuse(name, action => 'skip_changes')` — discard the pending changes and resume without reprocessing them.
Configure per-table with alter_stream_table(fuse => 'on', fuse_ceiling => 10000)
or set a global default with pg_trickle.fuse_default_ceiling. Use
fuse_status() to inspect the blown/active state of all stream tables at once.
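A minimal end-to-end sketch of the fuse workflow, assuming the functions live in the `pgtrickle` schema as in the quickstart example (the threshold value is illustrative):

```sql
-- Arm the fuse on one stream table.
SELECT pgtrickle.alter_stream_table('active_orders',
                                    fuse         => 'on',
                                    fuse_ceiling => 10000);

-- After a runaway import blows the fuse, inspect state...
SELECT * FROM pgtrickle.fuse_status();

-- ...then recover, here by discarding the pending backlog.
SELECT pgtrickle.reset_fuse('active_orders', action => 'skip_changes');
```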
Wider Column Bitmask — No More 63-Column Limit
pg_trickle's change capture tracks which columns were actually modified in each row so that stream tables that reference only a subset of columns can ignore irrelevant updates. Previously, this optimization silently stopped working for source tables with more than 63 columns — all updates were treated as touching every column.
The bitmask has been extended from a 64-bit integer to an arbitrary-width
PostgreSQL VARBIT value, removing the column count cap entirely. Existing
deployments are migrated automatically (the old column value becomes NULL,
which the filter treats conservatively — no rows are silently dropped). Tables
with fewer than 64 columns are unaffected at the data level.
Per-Database Worker Quotas
In multi-tenant environments where multiple databases share a single PostgreSQL instance, all stream-table refresh workers previously competed for the same concurrency pool. A single busy database could crowd out others.
A new GUC pg_trickle.per_database_worker_quota sets a soft concurrency limit
per database. When the rest of the cluster is lightly loaded (< 80% of available
capacity in use), a database can burst to 150% of its quota. When the cluster is
busy, each database is held to its base quota.
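To make the burst arithmetic concrete: with a quota of 4, a database may run up to 6 workers (150 %) while the cluster is under 80 % capacity, and is held to 4 otherwise. A hedged configuration sketch:

```sql
-- Soft per-database limit; with this quota a database may run up to
-- 4 concurrent refresh workers when the cluster is busy, or burst
-- to 6 (150%) when the cluster is lightly loaded.
ALTER SYSTEM SET pg_trickle.per_database_worker_quota = 4;
SELECT pg_reload_conf();
```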
Refresh work is also now dispatched in priority order: IMMEDIATE mode tables → atomic diamond groups → singleton tables.
DAG Scheduling Performance
For deployments with chains of stream tables (A → B → C), several improvements reduce end-to-end propagation latency:
- Fused single-consumer chains. When a stream table chain has exactly one downstream consumer at each hop, the scheduler fuses the chain into a single execution unit in one background worker. Intermediate deltas are stored in temporary in-memory tables instead of persistent change buffers, eliminating the WAL writes, index maintenance, and cleanup that would normally occur at each hop.
- Batch coalescing. Before a downstream table reads from an upstream change buffer, redundant insert/delete pairs for the same row are cancelled out. This prevents rapid-fire upstream refreshes from accumulating duplicate work for downstream tables.
- Adaptive dispatch polling. The parallel dispatch loop now backs off exponentially (20 ms → 200 ms) instead of using a fixed 200 ms poll, and resets to 20 ms as soon as any worker finishes. Cheap refreshes no longer wait a full 200 ms for the next tick.
- Delta amplification warnings. When a differential refresh produces many more output rows than input rows (default threshold: 100×), a `WARNING` is emitted with the table name, input and output counts, and a tuning hint. `explain_st()` now exposes `amplification_stats` from the last 20 refreshes.
Smarter Diagnostics and Warnings
Several improvements to make problems visible earlier and easier to diagnose:
- Know which refresh mode is actually running. When a stream table is set to `AUTO`, pg_trickle now records which mode it actually chose at each refresh (`DIFFERENTIAL`, `FULL`, etc.) in a new `effective_refresh_mode` column on `pgt_stream_tables`. A new `explain_refresh_mode(name)` function reports the configured mode, the actual mode used, and the reason for any downgrade — all in one query.
- Clearer warning when a stream table falls back to full refresh. If a stream table cannot use differential mode, pg_trickle now emits a `WARNING` message naming the affected table and the reason. Previously this happened silently.
- Warning when using aggregates that require full group rescans. Aggregate functions like `STRING_AGG`, `ARRAY_AGG`, and `JSON_AGG` require re-aggregating the entire group whenever any member changes. pg_trickle now warns at stream table creation time when such aggregates are used in `DIFFERENTIAL` mode, and `explain_st()` classifies each aggregate's maintenance strategy (incremental, auxiliary-state, or group-rescan) so you can understand the cost.
- Better error messages. Errors for unsupported query patterns, cycle detection, upstream schema changes, and query parse failures now include a `DETAIL` field explaining what went wrong and a `HINT` field suggesting how to fix it.
- Invalid parameter combinations are rejected at creation time. For example, using `diamond_schedule_policy = 'slowest'` without `diamond_consistency = 'atomic'` now produces a clear error at `create_stream_table`/`alter_stream_table` time rather than silently doing the wrong thing at refresh time.
- TopK queries validate their metadata on every refresh. Stream tables defined with `ORDER BY ... LIMIT N` now recheck that the stored LIMIT/OFFSET metadata still matches the actual query on each refresh. On mismatch, they fall back to a full refresh with a `WARNING` rather than silently producing wrong results.
Safety and Reliability Improvements
- No more crashes from schema changes. If a source table's schema changes while a refresh is running (e.g. a column is dropped), pg_trickle now catches the error, emits a structured `WARNING` with the table name and error details, and continues refreshing all other stream tables. The scheduler never crashes due to an individual table's error.
- Failure injection tests. New end-to-end tests deliberately drop columns and tables mid-refresh to verify that the scheduler stays alive and other stream tables continue processing correctly.
- Safer defaults. Three default settings have been updated to reflect production-safe behavior:
  - `parallel_refresh_mode` now defaults to `'on'` (was `'off'`). Parallel refresh has been stable for several releases; serial mode is now opt-in.
  - `block_source_ddl` now defaults to `true`. Accidental `ALTER TABLE` on a source table while a stream table depends on it is now blocked by default, with clear instructions on how to temporarily disable the guard if needed.
  - The invalidation ring capacity has been doubled from 32 to 128 slots, reducing the risk of invalidation events being silently discarded under rapid DDL.
Getting Started Guide Restructured
docs/GETTING_STARTED.md has been reorganised into five progressive chapters:
- Hello World — create your first stream table and watch it update.
- Joins, Aggregates & Chains — multi-table dependencies and DAG patterns.
- Scheduling & Backpressure — controlling refresh frequency and auto-backoff.
- Monitoring In Depth — using the five key diagnostic functions and the Prometheus/Grafana stack.
- Advanced Topics — FUSE circuit breaker, partitioned stream tables, IMMEDIATE (in-transaction) IVM, and multi-tenant worker quotas.
TPC-H Correctness Gate Added to CI
Five queries derived from the TPC-H benchmark — covering single-table
GROUP BY, filter-aggregate, CASE WHEN inside SUM, a three-way join, and LEFT
OUTER JOIN with GROUP BY — now run in DIFFERENTIAL mode on every push to main
and daily. Any correctness mismatch between pg_trickle's incremental output and
plain PostgreSQL execution fails the CI build automatically.
Docker Hub Image Improvements
The Dockerfile.hub image that is published to Docker Hub has been expanded
with a comprehensive set of GUC defaults fine-tuned for production use. A new
just build-hub-image recipe builds the image locally for testing.
Bug Fixes
- Scheduler crash after event-driven wake was enabled. The background worker crashed immediately after startup when `event_driven_wake = on` (the default) because the `LISTEN` command was being issued outside of a transaction. Fixed by issuing `LISTEN` inside a short-lived SPI transaction at startup. (#296)
- Spurious full refresh for non-recursive CTEs. Stream tables containing `WITH` clauses that were not recursive (`WITH foo AS (SELECT ...)`) were being incorrectly forced to FULL refresh mode. Only truly recursive CTEs (`WITH RECURSIVE`) require this. Non-recursive CTEs now correctly use differential mode. (#298)
- `DISTINCT ON` inside a CTE body caused a parse error. When a stream table's defining query contained a `WITH` clause whose body used `DISTINCT ON (...)`, the DVM query analyser failed with a parse error. The `DISTINCT ON` clause is now rewritten before analysis so it no longer interferes. (#300)
- Full-refresh fallback warning now names the affected table. When pg_trickle falls back from differential to full refresh, the emitted `WARNING` now includes the stream table name and the reason, making it straightforward to identify which table you need to investigate. (#301)
[0.10.0] — Cloud Deployment, PgBouncer & Query Engine Correctness
The headline features of 0.10.0 are cloud deployment compatibility, query
engine correctness, refresh performance, and improved developer
experience for auto_backoff. pg_trickle now works reliably
behind PgBouncer — the connection pooler used by default on Supabase, Railway,
Neon, and other managed PostgreSQL platforms. A broad set of correctness issues
in the incremental query engine is fixed. And several performance optimizations
cut refresh time for large tables and busy deployments.
auto_backoff Is Now Much Friendlier on Developer Machines
When pg_trickle.auto_backoff = true is enabled, the scheduler automatically
slows down stream tables whose refresh cost exceeds their schedule budget — a
good safeguard in production. This release makes the feature safe to use
alongside short schedules (e.g. '1s') in developer and CI environments:
- Trigger threshold raised from 80 % → 95 %. Backoff now only activates when a refresh consumes more than 95 % of the schedule window. A 900 ms refresh on a 1-second schedule (90 %) used to trigger backoff; it no longer does. EC-11 operator alerting continues to fire at 80 % (unchanged), so you still get an early warning before the scheduler is actually stuck.
- Maximum slowdown reduced from 64× → 8×. In the worst case, a stream table's effective refresh interval is now capped at 8× its configured schedule (e.g. 8 seconds for a `'1s'` table) instead of 64 seconds. The cap self-heals immediately: a single on-time refresh resets the factor to 1×.
- Backoff events now emit `WARNING` instead of `INFO`. When the scheduler stretches or resets a stream table's effective interval, you will see a `WARNING` message in your PostgreSQL client, including the new effective interval — rather than a silent slowdown with no explanation.
- `auto_backoff` now defaults to `on`. With the above improvements in place, the feature is safe in all environments. New installations get CPU runaway protection out of the box. To restore the old opt-in behaviour, set `pg_trickle.auto_backoff = off`.
Works Behind PgBouncer
PgBouncer is the most popular PostgreSQL connection pooler. In "transaction mode" — the default setting on most cloud PostgreSQL platforms — it hands a fresh database connection to every transaction, which breaks anything that assumes the same connection stays open between calls (session locks, prepared statements). pg_trickle previously relied on both. This release makes pg_trickle work correctly in such deployments.
- Session locks replaced with row-level locking. The background scheduler now acquires a short-lived row-level lock on each stream table's catalog entry instead of a session-level advisory lock. Row-level locks are released automatically at transaction end — exactly what PgBouncer transaction mode requires. If a concurrent refresh is already running for a given stream table, the scheduler skips that cycle and retries, rather than blocking.
- New `pooler_compatibility_mode` option per stream table. Setting `pooler_compatibility_mode => true` when creating or altering a stream table disables prepared statements and NOTIFY emissions for that table. Leave it off (the default) if you're not behind a pooler — behaviour is unchanged from v0.9.0.
- PgBouncer tested end-to-end. A new automated test suite boots PgBouncer in transaction-pool mode alongside pg_trickle and exercises the full lifecycle: create, refresh, alter, drop — all through the pooler. Run with `just test-pgbouncer`.
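A sketch of enabling the option for a single table, assuming the `pgtrickle` schema qualification used elsewhere in these notes:

```sql
-- Disable prepared statements and NOTIFY for a table reached via
-- PgBouncer transaction pooling; other stream tables are unaffected.
SELECT pgtrickle.alter_stream_table('active_orders',
                                    pooler_compatibility_mode => true);
```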
Query Engine Correctness Fixes
Several SQL patterns that appeared to work correctly could produce wrong results silently under the incremental query engine. All of the following are now fixed:
- Recursive queries (WITH RECURSIVE) update correctly when rows are deleted. Recursive queries are used for organisation hierarchies, bill-of-materials roll-ups, graph traversals, and similar structures. In DIFFERENTIAL mode, deleting a row from the source previously caused a full recomputation (correct, but expensive — O(n)). Now pg_trickle uses the Delete-and-Rederive algorithm, updating only affected rows at O(delta) cost. Computed expressions like `ancestor.path || ' > ' || node.name` update correctly when any ancestor is renamed or moved.
- SUM over a FULL OUTER JOIN no longer returns 0 instead of NULL. When matched rows on both join sides transition to matched on one side only (creating null-padded rows), the incremental SUM formula previously returned 0 instead of NULL. pg_trickle now tracks how many non-null values exist in each group and produces the correct answer without any full-group rescan.
- Multi-source delta merging is now correct for diamond-shaped queries. A "diamond" topology is when two separate paths through the dependency graph both feed into the same stream table (e.g. table A → both B and C → D). Simultaneous changes on both paths could previously cause some corrections to be silently discarded, leaving D with wrong values. pg_trickle now uses proper weight aggregation (Z-set algebra) so every correction is applied. Six property-based tests verify this for different diamond shapes.
- Statistical aggregates (CORR, COVAR, REGR_*) now update in constant time. All twelve SQL correlation and regression functions — `CORR`, `COVAR_POP`, `COVAR_SAMP`, and the ten `REGR_*` variants — now update incrementally using running totals (Welford-style accumulation) instead of rescanning the whole group. Each changed row is processed once regardless of group size.
- LATERAL subqueries only re-examine correlated rows. When data changes in the inner part of a LATERAL JOIN, pg_trickle previously re-ran the subquery for every row in the outer table. Now it re-runs it only for outer rows that actually correlate with the changed inner data, reducing work from proportional-to-table-size to proportional-to-changes.
- Materialized view sources now work in DIFFERENTIAL mode. Stream tables can use a PostgreSQL materialized view as their data source when `pg_trickle.matview_polling = on` is set. Changes are detected by comparing snapshots, the same mechanism used for foreign table sources.
- Six correctness bugs in the query rewriting engine fixed. These all involved edge cases in how the incremental engine translates SQL:
  - SQL comment fragments such as `/* unsupported ... */` that were being injected into generated SQL and causing runtime syntax errors are now replaced with clear extension-level errors.
  - When a column-rename step (e.g. `EXTRACT(year FROM orderdate) AS o_year`) sits between an aggregate and its source, GROUP BY and aggregate expressions now resolve correctly.
  - `EXCEPT` queries wrapped in a projection no longer silently lose their row multiplicity tracking.
  - A placeholder row identifier value of zero could collide with real row hashes; changed to a sentinel value (`i64::MIN`) outside the normal hash range.
  - Empty scalar subqueries now raise a clear error instead of silently emitting NULL.
- Change capture (CDC) fixes. The UPDATE trigger now correctly handles rows with NULL values in their primary key columns (previously those rows were silently dropped from the change buffer). WAL logical replication publications are automatically rebuilt when a source table is converted to partitioned after the publication was set up — previously this caused the stream table to silently stop updating. TRUNCATE followed by INSERT is handled atomically so post-TRUNCATE inserts are never lost.
Faster Refreshes
- Automatic covering index on stream table row IDs. Stream tables with eight or fewer output columns now automatically get a covering index with `INCLUDE (col1, col2, ...)` on the internal `__pgt_row_id` column. This lets the MERGE step use index-only scans — no heap lookups for matched rows — reducing refresh time by roughly 20–50% in small-delta / large-table scenarios.
- Change buffer compaction. When the pending change buffer grows beyond `pg_trickle.compact_threshold` (default 100,000 rows), pg_trickle compacts it before the next refresh cycle. INSERT→DELETE pairs that cancel each other out are eliminated; multiple sequential changes to the same row are collapsed to a single net change. This reduces delta scan overhead by 50–90% for high-churn tables. Compaction uses `change_id` (not `ctid`) for safe operation under concurrent VACUUM.
- Tiered refresh scheduling. Large deployments can assign stream tables to one of four tiers: Hot (refresh at the configured interval), Warm (2× interval), Cold (10× interval), or Frozen (skip until manually promoted). Gate the feature with `pg_trickle.tiered_scheduling = on` (default off). Set per stream table via `ALTER STREAM TABLE ... SET (tier => 'warm')`. Frozen stream tables are entirely skipped by the scheduler until you promote them.
- Incremental dependency-graph updates. When a stream table is created, altered, or dropped, the internal dependency graph now updates only the affected entries instead of rebuilding the entire graph from scratch. This reduces the latency impact of DDL operations from roughly 50 ms to roughly 1 ms in deployments with 1,000+ stream tables.
- Smarter topo-sort caching inside a scheduler tick. The order in which stream tables are refreshed (topological order through the dependency graph) is now computed once per scheduler tick and reused across all internal callers, eliminating redundant work.
Better Visibility Into What pg_trickle Is Doing
Several behaviours that previously happened silently now produce a short, actionable message at the moment they occur:
- `ORDER BY` without `LIMIT` warns you at creation time. Adding `ORDER BY` to a stream table's defining query without also adding `LIMIT` has no effect: stream table storage has no guaranteed row order. pg_trickle now emits a `WARNING` pointing you toward the TopK pattern or suggesting you remove the `ORDER BY`.
- `append_only` mode reversions are visible. When pg_trickle automatically exits append-only mode (because deletions or updates were detected in the source), the notice is now emitted at `WARNING` level (was `INFO`, normally suppressed) and also dispatched as a `pgtrickle_alert` notification.
- Cleanup failures escalate after 3 consecutive attempts. If the background worker fails to clean up a source table 3 times in a row, the message is promoted from `DEBUG1` (normally invisible) to `WARNING` so it appears in the server log.
- Diamond dependency with `diamond_consistency = 'none'` now advises you. When you create a stream table that forms a diamond in the dependency graph and explicitly set `diamond_consistency = 'none'`, a `NOTICE` advises you to consider `diamond_consistency = 'atomic'` for consistent cross-branch reads.
- `diamond_consistency` now defaults to `'atomic'`. New stream tables get atomic group semantics by default, meaning all branches of a diamond are refreshed together in a single savepoint before the convergence node is updated. This prevents a read from the convergence node seeing one branch partially updated and the other stale. To restore the old independent behavior, pass `diamond_consistency => 'none'` explicitly.
- Adaptive fallback is visible at the default log level. When a differential refresh falls back to a full refresh because the delta is too large, the message is now emitted at `NOTICE` level (the default `client_min_messages` threshold) instead of `INFO` (usually suppressed in the client session).
- `CALCULATED` schedule without downstream dependents warns you. When a stream table is created with `schedule => 'calculated'` but no existing stream table references it as a downstream dependent, a `NOTICE` explains that the schedule will fall back to `pg_trickle.default_schedule_seconds`.
- Internal `__pgt_*` auxiliary columns are now documented. The hidden columns that the refresh engine may add to stream table physical storage are described in a new section of SQL_REFERENCE.md. This covers all variants, from the always-present `__pgt_row_id` primary key through the aggregate-specific auxiliary columns for AVG, STDDEV, CORR, COVAR, REGR_*, window functions, and recursive CTE depth.
Bug Fixes
- Scheduler no longer permanently misses stream tables created under a stale snapshot. `signal_dag_invalidation` is called inside the creating transaction before it commits. If the background scheduler happened to start a new tick and capture a catalog snapshot at that exact instant, the DAG rebuild query would not see the new stream table — yet the version counter was already advanced, so the scheduler would never rebuild again. The affected stream table would then never be scheduled for refresh. Fixed by verifying that every invalidated `pgt_id` is present in the rebuilt DAG after each rebuild. If any are missing, the scheduler signals a full rebuild for the next tick (which starts a fresh transaction that includes all committed data) rather than accepting the stale version. Fixes CI test `test_autorefresh_diamond_cascade`.
Upgrade Notes
- New catalog columns. The 0.9.0 → 0.10.0 upgrade migration adds `pooler_compatibility_mode BOOLEAN` and `refresh_tier TEXT` to `pgt_stream_tables`. Run `ALTER EXTENSION pg_trickle UPDATE TO '0.10.0'` after replacing the extension files. Verification script: `scripts/check_upgrade_completeness.sh`.
- Hidden auxiliary columns for statistical aggregates. Stream tables using `CORR`, `COVAR_POP`, `COVAR_SAMP`, or any `REGR_*` aggregate will get hidden `__pgt_aux_*` columns when created or altered under 0.10.0. These are invisible to normal queries (excluded by the `NOT LIKE '__pgt_%'` convention) and managed automatically.
- `pooler_compatibility_mode` is off by default. Existing stream tables are unaffected. Enable it only for stream tables accessed through PgBouncer transaction-mode pooling.
Additional Bug Fixes (2026-03-24)
Scheduler stability:
- Scheduler no longer crashes when concurrent refreshes compete. The internal function that decides whether to skip a refresh cycle was running a locking query outside a transaction boundary, violating a strict PostgreSQL requirement. It now runs inside a proper subtransaction, eliminating the crash.
- Auto-backoff no longer causes a transaction conflict in the background worker. When the auto-backoff feature stretches a stream table's refresh interval, it previously tried to open a new transaction inside the background worker's already-open transaction. PostgreSQL does not allow this nesting; the code path is now restructured to avoid it.
Query engine correctness:
- Queries that filter on hidden columns now produce correct results. For example, `SELECT name FROM users WHERE internal_id > 5` — where `internal_id` is not part of the output — could return wrong rows during incremental updates. Fixed.
- JOIN results are correct when both joined tables change at the same time. Simultaneous changes to two stream tables connected by a JOIN could leave the output with stale or duplicated rows. Fixed.
- `NULLIF(a, b)` expressions now work in incremental queries. `NULLIF` returns NULL when its two arguments are equal. It was not recognised by the incremental parser, causing a fallback error. Fixed.
- `LIKE` and `ILIKE` pattern matching now work in filter conditions. Filter expressions such as `WHERE name LIKE 'A%'` or `WHERE description ILIKE '%widget%'` were not handled by the incremental engine. Fixed.
- Subqueries with `ORDER BY`, `LIMIT`, or `OFFSET` are now preserved correctly. When the incremental engine reconstructed a subquery, those clauses were silently dropped. The incremental result no longer differs from a full refresh for such queries.
- Scalar subqueries using `LIMIT` or `OFFSET` are now handled gracefully. Rather than producing a runtime error, the engine falls back to a full refresh for those cases and continues.
SQL parser:
- Wildcard column references (`table.*`) now work for qualified names. A two- or three-part column reference such as `schema.table.*` or `alias.*` caused a parser crash. Fixed.
Change capture and WAL:
- State transitions no longer stall when the WAL replication slot is behind. When a stream table moves through the TRANSITIONING state, pg_trickle now advances the WAL replication slot up-front. This eliminates a lag-check stall that could cause the transition to hang indefinitely under write-heavy workloads.
Security:
- Several low-severity code quality and security scanner alerts from Semgrep and CodeQL are resolved. No user-visible behaviour changes.
[0.9.0] — Incremental Aggregates & Smarter Scheduling
The headline feature of 0.9.0 is incremental aggregate maintenance: when a single row changes inside a group of 100,000 rows, pg_trickle no longer has to re-scan all 100,000 rows to update COUNT, SUM, AVG, STDDEV, or VAR results. Instead it keeps running totals and adjusts them in constant time. Only MIN/MAX still needs a rescan — and only when the deleted value happens to be the current extreme.
Beyond aggregates, this release contains a broad set of performance optimizations that reduce wasted I/O during every refresh cycle, two new configuration knobs, a refresh-group management API, and several bug fixes.
Faster Aggregates
- Constant-time COUNT, SUM, AVG: Changed rows are now applied algebraically (`new_sum = old_sum + inserted − deleted`) instead of re-aggregating the whole group. AVG uses hidden auxiliary SUM and COUNT columns maintained automatically on the stream table.
- Constant-time STDDEV and VAR: Standard-deviation and variance aggregates (`STDDEV_POP`, `STDDEV_SAMP`, `VAR_POP`, `VAR_SAMP`) now use a sum-of-squares decomposition with a hidden auxiliary column, achieving the same constant-time update as COUNT/SUM/AVG.
- MIN/MAX safety guard: Deleting the row that currently holds the minimum (or maximum) value correctly triggers a rescan of that group. Property-based tests verify this boundary.
- Floating-point drift reset: A new setting (`pg_trickle.algebraic_drift_reset_cycles`) periodically forces a full recomputation to correct any floating-point rounding drift that accumulates over many incremental cycles.
Smarter Refresh Scheduling
- Automatic backoff for overloaded streams: The `pg_trickle.auto_backoff` GUC was introduced here (default off at the time). See the v0.10.0 entry for the improved thresholds, reduced cap, and the flip to `on` by default.
- Index-aware MERGE: A new threshold setting (`pg_trickle.merge_seqscan_threshold`, default 0.001) tells PostgreSQL to use an index lookup instead of a full table scan when only a tiny fraction of the stream table's rows are changing.
Less Wasted I/O
- Skip unchanged columns: The scan operator now checks the CDC trigger's per-row bitmask to skip UPDATE rows where none of the columns your query actually uses were modified. For wide tables where you only reference a few columns, most UPDATE processing is eliminated.
- Skip unchanged sources in joins: When a multi-source join query has three source tables but only one of them changed, the delta branches for the two unchanged sources are now replaced with `FALSE` at plan time. PostgreSQL's planner recognises those branches as empty and skips them entirely.
- Push WHERE filters into the change scan: If your stream table's defining query has a WHERE clause (e.g. `WHERE status = 'shipped'`), that filter is now applied immediately after reading the change buffer, before rows enter the join or aggregate pipeline. Rows that don't match the filter are discarded right away.
- Faster DISTINCT counting: The per-row multiplicity lookup for `SELECT DISTINCT` queries now uses an index-driven scalar subquery instead of a LEFT JOIN, guaranteeing I/O proportional to the number of changed rows regardless of stream table size.
- Scalar subquery short-circuit: When a scalar subquery's inner source has no changes in the current cycle, the expensive outer-table snapshot reconstruction is skipped entirely.
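As an illustration of the unchanged-source pruning, consider a hypothetical two-way join where only `orders` changed this cycle. All names here are invented and the real generated delta SQL is more involved:

```sql
-- Branch driven by changed orders rows: still evaluated this cycle.
SELECT d.order_id, c.name
FROM delta_orders d
JOIN customers c ON c.id = d.customer_id
UNION ALL
-- Branch driven by customer changes: constant-folded to FALSE because
-- customers had no changes, so the planner prunes it without any I/O.
SELECT o.id, c.name
FROM orders o
JOIN delta_customers c ON c.id = o.customer_id
WHERE FALSE;
```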
Refresh Group Management
- New SQL functions for grouping stream tables that should always be refreshed together (cross-source snapshot consistency):
  - `pgtrickle.create_refresh_group(name, members, isolation)`
  - `pgtrickle.drop_refresh_group(name)`
  - `pgtrickle.refresh_groups()` — lists all declared groups.
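A hedged usage sketch of the new API; the stream table names are hypothetical, and the `isolation` value shown is an assumption (this entry does not list the accepted values):

```sql
-- Refresh these two stream tables together so cross-table reads are consistent.
SELECT pgtrickle.create_refresh_group(
    name      => 'reporting',
    members   => ARRAY['active_orders', 'order_totals'],  -- hypothetical tables
    isolation => 'snapshot'                               -- assumed value
);

SELECT * FROM pgtrickle.refresh_groups();  -- list all declared groups
```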
Bug Fixes
- Fixed a crash when internal status queries failed: The `source_gates()` and `watermarks()` SQL functions previously crashed the entire PostgreSQL backend process on any internal error. They now report a normal SQL error instead.
- Clearer handling of window functions in expressions: Queries like `CASE WHEN ROW_NUMBER() OVER (...) > 5 THEN ...` were silently accepted but failed at refresh time with a confusing error. pg_trickle now automatically falls back to full refresh mode (in AUTO mode) or warns you at creation time (in explicit DIFFERENTIAL mode).
Documentation
- Documented the known limitation that recursive CTE stream tables in DIFFERENTIAL mode fall back to full recomputation when rows are deleted or updated. Workaround: use `refresh_mode = 'IMMEDIATE'`.
- Documented the `pgt_refresh_groups` catalog table schema and usage.
- Documented the O(partition_size) cost of window function maintenance, with mitigation strategies.
Deferred to v0.10.0
The following performance optimizations were evaluated and explicitly deferred. In every case the current behaviour is correct — these items would make certain workloads faster but carry enough implementation risk that they need more design work first:
- Recursive CTE incremental delete/update in DIFFERENTIAL mode (P2-1)
- SUM NULL-transition shortcut for FULL OUTER JOIN aggregates (P2-2)
- Materialized view sources in IMMEDIATE mode (P2-4)
- LATERAL subquery scoped re-execution (P2-6)
- Welford auxiliary columns for CORR/COVAR/REGR_* aggregates (P3-2)
- Merged-delta weight aggregation for multi-source deduplication (B3-2/B3-3)
Upgrade Notes
- New SQL objects: The 0.8.0 → 0.9.0 upgrade migration adds the `pgt_refresh_groups` table and the `restore_stream_tables` function. Run `ALTER EXTENSION pg_trickle UPDATE TO '0.9.0'` after replacing the extension files.
- Hidden auxiliary columns: Stream tables using AVG, STDDEV, or VAR aggregates will automatically get hidden `__pgt_aux_*` columns when created or altered. These columns are invisible to normal queries (filtered by the existing `NOT LIKE '__pgt_%'` convention) and are managed automatically.
- PGXN publishing: Release artifacts are now automatically uploaded to PGXN via GitHub Actions.
[0.8.0] — Backup, Pooler Compatibility & Reliability
This release makes your streams easier to back up and far more reliable in complex scenarios, and solidifies the core engine through major testing improvements.
Added
- Backup and Restore Support: You can now safely back up your database using the standard `pg_dump` and `pg_restore` commands. The system automatically reconnects all streams and data queues, eliminating downtime during disaster recovery.
- Connection Pooler Opt-In: Replaced the global PgBouncer compatibility setting with a per-stream option. You can now enable connection pooling optimizations selectively, stream by stream.
Fixed
- Cyclic Stream Reliability: Fixed internal bugs that occasionally caused streams referencing each other in a loop to get stuck refreshing forever. Streams now accurately detect when row changes stop and naturally settle.
- Large Dependency Chains: Fixed a crash (stack overflow) that could happen if you attempted to drop an extremely large or heavily recursive chain of stream tables sequentially.
- Special Character Support in SQL: Handled an edge case causing errors when multi-byte characters or special non-ASCII symbols were parsed inside certain SQL commands.
- Mac Support for Developer Tooling: Addressed a minor internal tool error that prevented test components from building automatically on Apple Silicon machines.
Under the Hood Code and Testing Enhancements
- Massive Testing Hardening: The internal test suite was fundamentally overhauled and now runs tens of thousands of continuous automated checks verifying that query answers stay correct, no matter how complex the joins or updates get.
- Performance Migrations: Began adopting new tooling (`cargo nextest`) to speed up test runs and background development iteration.
[0.7.0] — Watermark Gating, Circular Pipelines & SQL Broadening
0.7.0 makes pg_trickle easier to trust in real-world data pipelines. The big theme of this release is fewer surprises: the scheduler can now wait for late-arriving source data, some circular pipelines can run safely instead of being blocked, more queries stay on incremental refresh, and the system does a better job of deciding when incremental work is no longer worth it.
Added
Multi-source data can wait until it is actually ready
pg_trickle can now delay a refresh until related source tables have all caught up to roughly the same point in time. This is useful for ETL jobs where, for example, `orders` arrives before `order_lines` and refreshing too early would produce a half-finished report.
- New watermark APIs: `advance_watermark(source, watermark)`, `create_watermark_group(name, sources[], tolerance_secs)`, and `drop_watermark_group(name)`.
- New status helpers: `watermarks()`, `watermark_groups()`, and `watermark_status()`.
- The scheduler now skips gated refreshes when grouped sources are too far apart and records the reason in refresh history.
- New catalog tables store per-source watermarks and watermark group definitions.
- 28 end-to-end tests cover normal operation, bad input, tolerance windows, and scheduler behavior.
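Putting the APIs together, a hedged ETL sketch for the `orders` / `order_lines` example above. The `pgtrickle` schema qualification and the timestamp watermark format are assumptions:

```sql
-- Group the two sources with a 60-second tolerance window.
SELECT pgtrickle.create_watermark_group(
    name           => 'orders_pipeline',
    sources        => ARRAY['orders', 'order_lines'],
    tolerance_secs => 60
);

-- The loader advances each source's watermark after committing a batch.
SELECT pgtrickle.advance_watermark('orders',      '2025-06-01 12:00:05+00');
SELECT pgtrickle.advance_watermark('order_lines', '2025-06-01 12:00:40+00');

-- 35 s apart, within tolerance: dependent refreshes may proceed.
SELECT * FROM pgtrickle.watermark_status();
```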
Some circular pipelines can now run safely
Stream tables that depend on each other in a loop are no longer always blocked. If the cycle is monotone and uses DIFFERENTIAL mode, pg_trickle can now keep refreshing the group until it stops changing.
- Circular refreshes run to a fixed point, with `pg_trickle.max_fixpoint_iterations` as a safety limit.
- Cycle creation and ALTER validation now check that every member is safe for convergence before allowing the loop.
- `pgtrickle.pgt_status()` now reports `scc_id`, and `pgtrickle.pgt_scc_status()` shows per-cycle-group status.
- `pgtrickle.pgt_stream_tables` now tracks `last_fixpoint_iterations`, so it is easier to spot slow or unstable cycles.
- 6 end-to-end tests cover convergence, rejection of unsafe cycles, non-convergence handling, and cleanup.
More queries stay on incremental refresh
Several query shapes that used to fall back to FULL refresh, or fail outright, now keep working in DIFFERENTIAL and AUTO mode.
- User-defined aggregates created with `CREATE AGGREGATE` now work through the existing group-rescan strategy, including common extension-provided aggregates.
- More complex `OR`-plus-subquery patterns are now rewritten correctly, including cases that need De Morgan normalization and multiple rewrite passes.
- The rewrite pipeline has a guardrail to stop runaway branch explosion.
- A dedicated 14-test end-to-end suite covers these previously missing cases.
Easier packaging ahead of 1.0
The release also adds infrastructure that makes evaluation and future distribution simpler.
- `Dockerfile.hub` and a dedicated CI workflow can build and smoke-test a ready-to-run PostgreSQL 18 image with pg_trickle preinstalled.
- `META.json` adds PGXN package metadata with `release_status: "testing"`.
- CNPG smoke testing is now part of the documented pre-1.0 packaging story.
Improved
Refresh strategy and performance decisions are smarter
The scheduler and refresh engine now make better choices when incremental work is likely to help and back off sooner when it is not.
- Wide tables now use xxh64-based change detection instead of slower MD5-based comparisons.
- Aggregate stream tables can skip expensive incremental work and jump straight to FULL refresh when the pending change set is obviously too large.
- Strategy selection now combines a change-ratio signal with recent refresh history, which helps on workloads with uneven batch sizes.
- DAG levels are extracted explicitly, enabling level-parallel refresh scheduling.
- Small internal hot paths such as column-list building and LSN comparison were tightened to remove avoidable allocations.
Benchmarking is much easier to use and compare
The performance toolchain was expanded so regressions are easier to spot and large-scale behavior is easier to study.
- Benchmarks now support per-cycle output, optional `EXPLAIN ANALYZE` capture, larger 1M-row runs, and more stable Criterion settings.
- New tooling covers cross-run comparison, concurrent writers, and extra query shapes such as window, lateral, CTE, and `UNION ALL` workloads.
- `just bench-docker` makes it easier to run Criterion inside the builder image when local linking is awkward.
Changed
Internal Code Quality: Integration Test Suite Hardening
Completed a full hardening pass of the integration test suite, bringing all items in PLAN_TEST_EVALS_INTEGRATION.md to done:
- Multiset validation — Extracted an `assert_sets_equal()` helper relying on EXCEPT/UNION ALL SQL logic and applied it to workflow tests to ensure storage table state correctly matches the defining query post-refresh.
- Round-trip notifications — `pg_trickle_alert` notifications now verify receipt end-to-end via `sqlx::PgListener`.
- DVM operators — Added unit coverage for complex semi/anti-join behaviors (multi-column, filtered, complementary), multi-table join chains for inner and full joins, and `proptest!` fuzz tests enforcing generated SQL invariants across INNER, SEMI, and ANTI joins.
- Resilience and edge cases — Test coverage for ST drop cascades verifying dependent object removal, exact error escalation thresholds, and scheduler job lifecycles across queued mock states.
- Cleanups — Standardized naming practices (`test_workflow_*`, `test_infra_*`) and eliminated clock-bound flakes by widening staleness assertions.
Internal low-level code is much safer to audit
This release cuts the amount of low-level unsafe Rust in half without
changing behavior.
- Unsafe blocks were reduced by 51%, from 1,309 to 641.
- Repeated patterns were consolidated into a small set of documented helper functions.
- 37 internal functions no longer need to be marked `unsafe`.
- Existing unit tests continued to pass unchanged after the refactor.
[0.6.0] — Idempotent DDL, Partitioned Sources & dbt Integration
Added
Idempotent DDL (create_or_replace)
New one-call function for deploying stream tables without worrying about whether they already exist. Replaces the old "check if it exists, then drop and recreate" pattern.
- `create_or_replace_stream_table()` — a single function that does the right thing automatically:
  - Creates the stream table if it doesn't exist yet.
  - Does nothing if the stream table already exists with the same query and settings (logs an INFO so you know it was a no-op).
  - Updates settings (schedule, refresh mode, etc.) if only the config changed.
  - Replaces the query if the defining query changed, including automatic schema migration and a full refresh.
- dbt uses it automatically. The `stream_table` materialization now calls `create_or_replace_stream_table()` when running against pg_trickle 0.6.0+, with automatic fallback for older versions.
- Whitespace-insensitive. Cosmetic SQL differences (extra spaces, tabs, newlines) are correctly treated as no-ops and won't trigger unnecessary rebuilds.
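For deployment scripts, a minimal sketch, assuming `create_or_replace_stream_table()` accepts the same parameters as `create_stream_table()` (this entry does not spell out the full signature):

```sql
-- Safe to run on every deploy: no-op, settings update, or full replace as needed.
SELECT pgtrickle.create_or_replace_stream_table(
    name     => 'active_orders',
    query    => 'SELECT * FROM orders WHERE status = ''active''',
    schedule => '30s'
);
```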
dbt Integration Enhancements
- Check stream table health from dbt. The new `pgtrickle_stream_table_status()` macro returns whether a stream table is healthy, stale, erroring, or paused. Pair it with the new built-in `stream_table_healthy` test in your `schema.yml` to fail CI when a stream table is behind or broken.
- Refresh everything in the right order. The new `refresh_all_stream_tables` run-operation refreshes all dbt-managed stream tables in dependency order. Run it after `dbt run` and before `dbt test` in your CI pipeline.
Partitioned Source Tables
Stream tables now work with PostgreSQL's declarative table partitioning — RANGE, LIST, and HASH partitioned tables all work as sources out of the box.
- Changes in any partition are captured automatically. CDC triggers fire on the parent table so inserts, updates, and deletes in any child partition are picked up.
- ATTACH PARTITION triggers automatic rebuild. When you attach a new partition, pg_trickle detects the structural change and rebuilds affected stream tables to include the new partition's pre-existing data.
- WAL mode works with partitions. Publications are configured with `publish_via_partition_root = true`, so all partitions report changes under the parent table's identity.
- New tutorial covering partitioned source tables, ATTACH/DETACH behavior, and known caveats (`docs/tutorials/PARTITIONED_TABLES.md`).
Circular Dependency Foundation
Lays the groundwork for stream tables that reference each other in a cycle (A → B → A). The actual cyclic refresh execution is planned for v0.7.0 — this release adds the detection, validation, and safety infrastructure.
- Cycle detection. pg_trickle can now identify groups of stream tables that form circular dependencies.
- Safety checks at creation time. Queries that can't safely participate in a cycle (those using aggregates, EXCEPT, window functions, or NOT EXISTS) are rejected with a clear error explaining why.
- New settings:
  - `pg_trickle.allow_circular` (default: off) — master switch for circular dependencies.
  - `pg_trickle.max_fixpoint_iterations` (default: 100) — prevents runaway loops.
Source Gating Improvements
- `bootstrap_gate_status()` function. Shows which sources are currently gated, when they were gated, how long the gate has been active, and which stream tables are waiting. Useful for debugging "why isn't my stream table refreshing?"
- ETL coordination cookbook. The SQL Reference now includes five step-by-step recipes for common bulk-load patterns.
More SQL Patterns Supported
Two query patterns that previously required workarounds now just work:
- Window functions inside expressions. Queries like `CASE WHEN ROW_NUMBER() OVER (...) = 1 THEN 'top' ELSE 'other' END` or `COALESCE(SUM() OVER (...), 0)` are now accepted and produce correct results. Use FULL refresh mode for these queries; incremental (DIFFERENTIAL) refresh of window-in-expression patterns is not yet supported. Previously, the query was rejected entirely at creation time.
- `ALL (subquery)` comparisons. Queries like `WHERE price < ALL (SELECT price FROM competitors)` are now accepted in both FULL and DIFFERENTIAL modes. Supports all comparison operators (`>`, `>=`, `<`, `<=`, `=`, `<>`) and correctly handles NULL values per the SQL standard.
Operational Safety Improvements
- Function changes detected automatically. If a stream table's query calls a user-defined function and you update that function with `CREATE OR REPLACE FUNCTION`, pg_trickle detects the change and automatically rebuilds the stream table on the next cycle. No manual intervention needed.
- WAL mode explains why it isn't activating. When `cdc_mode = 'auto'` and the system stays on trigger-based tracking, the scheduler now periodically logs the exact reason (e.g., "`wal_level` is not `logical`"), and `check_cdc_health()` reports the current mode so you can diagnose the issue.
- WAL + keyless tables rejected early. Creating a stream table with `cdc_mode = 'wal'` on a table that has no primary key and no `REPLICA IDENTITY FULL` is now rejected at creation time with a clear error, instead of silently producing incomplete results later.
- Automatic recovery after backup/restore. When a PostgreSQL server is restored from `pg_basebackup`, WAL replication slots are lost. pg_trickle now detects the missing slot, automatically falls back to trigger-based tracking, and logs a WARNING so you know what happened.
Documentation
- ALL (subquery) worked example in the SQL Reference with sample data and expected results.
- Window-in-expression documentation showing before/after examples of the automatic rewrite.
- Foreign table sources tutorial — step-by-step guide for using `postgres_fdw` foreign tables as stream table sources.
Fixed
- `create_or_replace` whitespace handling. Extra spaces, tabs, and newlines in queries no longer trigger unnecessary rebuilds.
- `create_or_replace` schema incompatibility detection. Incompatible column type changes (e.g., text → integer) are now properly detected and handled.
[0.5.0] — Row-Level Security, Source Gating & Append-Only Fast Path
Added
Row-Level Security (RLS) Support
Stream tables now work correctly with PostgreSQL's Row-Level Security feature, which lets you control which rows different users can see.
- Refreshes always see all data. When a stream table is refreshed, it computes the full result regardless of RLS policies on the source tables. This matches how PostgreSQL's built-in materialized views work. You then add RLS policies directly on the stream table to control who can read what.
- Internal tables are protected. The internal change-tracking tables used by pg_trickle are shielded from RLS interference, so refreshes won't silently fail if you turn on RLS at the schema level.
- Real-time (IMMEDIATE) mode secured. Triggers that keep stream tables updated in real time now run with elevated privileges and a locked-down search path, preventing data corruption or security bypasses.
- RLS changes are detected automatically. If you enable, disable, or force RLS on a source table, pg_trickle detects the change and marks affected stream tables for a full rebuild.
- New tutorial. Step-by-step guide for setting up per-tenant RLS policies on stream tables (see `docs/tutorials/ROW_LEVEL_SECURITY.md`).
Source Gating for Bulk Loads
New pause/resume mechanism for large data imports. When you're loading a big batch of data into a source table, you can temporarily "gate" it to prevent the background scheduler from triggering refreshes mid-load. Once the load is done, ungate it and everything catches up in a single refresh.
- `gate_source('my_table')` — pauses automatic refreshes for any stream table that depends on `my_table`.
- `ungate_source('my_table')` — resumes automatic refreshes. All changes made during the gate are picked up in the next refresh cycle.
- `source_gates()` — shows which source tables are currently gated, when they were gated, and by whom.
- Manual refresh still works. Even while a source is gated, you can explicitly call `refresh_stream_table()` if needed.
- Gating is idempotent — calling `gate_source()` twice is safe, and gating a source that's already gated is a no-op.
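A typical bulk-load sequence might look like this sketch (the table name and file path are illustrative; the functions are shown unqualified, as in the notes above):

```sql
SELECT gate_source('orders');                   -- pause dependent refreshes
COPY orders FROM '/data/orders_batch.csv' CSV;  -- big import, no mid-load refreshes
SELECT ungate_source('orders');                 -- changes caught up next cycle
```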
Append-Only Fast Path
Significant performance improvement for tables that only receive INSERTs (event logs, audit trails, time-series data, etc.). When you mark a stream table as `append_only`, refreshes skip the expensive merge logic (checking for deletes, updates, and row comparisons) and use a simple, fast insert.
- How to use: Pass `append_only => true` when creating or altering a stream table.
- Safe fallback. If a DELETE or UPDATE is detected on a source table, the extension automatically falls back to the standard refresh path and logs a warning. It won't silently produce wrong results.
- Restrictions. Append-only mode requires DIFFERENTIAL refresh mode and source tables with primary keys.
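A hedged creation sketch for an insert-only event log; the table names are illustrative, and `refresh_mode` as the parameter name is an assumption based on the restriction above:

```sql
SELECT pgtrickle.create_stream_table(
    name         => 'recent_events',
    query        => 'SELECT id, created_at, payload FROM events',
    schedule     => '1m',
    refresh_mode => 'DIFFERENTIAL',  -- required for append-only (assumed spelling)
    append_only  => true             -- fast insert path, no merge logic
);
```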
Usability Improvements
- Manual refresh history. When you manually call `refresh_stream_table()`, the result (success or failure, timing, rows affected) is now recorded in the refresh history, just like scheduled refreshes.
- `quick_health` view. A single-row health summary showing how many stream tables you have, how many are in error or stale, whether the scheduler is running, and an overall status (`OK`, `WARNING`, `CRITICAL`). Easy to plug into monitoring dashboards.
- `create_stream_table_if_not_exists()`. A convenience function that does nothing if the stream table already exists, instead of raising an error. Makes migration scripts and deployment automation simpler.
Smooth Upgrade from 0.4.0
- Existing installations can upgrade with `ALTER EXTENSION pg_trickle UPDATE TO '0.5.0'`. All new features (source gating, append-only mode, the quick health view, and the new convenience functions) are included in the upgrade script.
- The upgrade has been verified with automated tests that confirm all 40 SQL objects survive the upgrade intact.
[0.4.0] — Parallel Refresh & Statement-Level CDC Triggers
Added
Parallel Refresh (opt-in)
Stream tables can now be refreshed in parallel, using multiple background workers instead of processing them one at a time. This can dramatically reduce end-to-end refresh latency when you have many independent stream tables.
- Off by default. Set `pg_trickle.parallel_refresh_mode = 'on'` to enable. Use `'dry_run'` to preview what the scheduler would do without changing behavior.
- Automatic dependency awareness. The scheduler figures out which stream tables can safely refresh at the same time and which must wait for others. Stream tables connected by real-time (IMMEDIATE) triggers are always refreshed together to prevent race conditions.
- Atomic groups. When a group of stream tables must succeed or fail together (e.g. diamond dependencies), all members are wrapped in a single transaction — if one fails, the whole group rolls back cleanly.
- Worker pool controls:
  - `pg_trickle.max_dynamic_refresh_workers` (default 4) — cluster-wide cap on concurrent refresh workers.
  - `pg_trickle.max_concurrent_refreshes` — per-database dispatch cap.
- Monitoring:
  - `worker_pool_status()` — shows how many workers are active and the current limits.
  - `parallel_job_status(max_age_seconds)` — lists recent and active refresh jobs with timing and status.
  - `health_check()` now warns when the worker pool is saturated or the job queue is backing up.
- Self-healing. On startup, the scheduler automatically cleans up orphaned jobs and reclaims leaked worker slots from previous crashes.
Statement-Level CDC Triggers
Change tracking triggers have been upgraded from row-level to statement-level, reducing write-side overhead for bulk INSERT and UPDATE operations. This is now the default for all new and existing stream tables. A benchmark harness is included so you can measure the difference on your own hardware.
dbt Getting Started Example
New `examples/dbt_getting_started/` project with a complete, runnable dbt example showing org-chart seed data, staging views, and three stream table models. Includes an automated test script.
Fixed
Refresh Lock Not Released After Errors
Fixed a bug where `refresh_stream_table()` could get permanently stuck after
a PostgreSQL error (e.g. running out of temp file space). The internal lock
was session-level and survived transaction rollback, causing all future
refreshes for that stream table to report "another refresh is already in
progress". Refresh locks are now transaction-level, so they are automatically
released when the transaction ends — whether it succeeds or fails.
dbt Integration Fixes
- Fixed query quoting in dbt macros that broke when queries contained single quotes.
- Fixed `schedule = none` in dbt being incorrectly mapped to SQL NULL.
- Fixed view inlining when the same view was referenced with different aliases.
Changed
Updated to PostgreSQL 18.3 across CI and test infrastructure.
-
Dependency updates:
tokio1.49 → 1.50 and several GitHub Actions bumps.
Breaking Changes
These behavioural changes shipped in v0.4.0. They improve usability but may require action from users upgrading from v0.3.0.
- Schedule default changed from `'1m'` to `'calculated'`. `create_stream_table` now defaults to `schedule => 'calculated'`, which auto-computes the refresh interval from downstream dependents instead of refreshing every minute. If you relied on the implicit 1-minute default, explicitly pass `schedule => '1m'` to preserve the old behaviour.
- `NULL` schedule input rejected. Passing `schedule => NULL` to `create_stream_table` now returns an error. Use `schedule => 'calculated'` instead; it's explicit and self-documenting.
- Diamond GUCs removed. The cluster-wide GUCs `pg_trickle.diamond_consistency` and `pg_trickle.diamond_schedule_policy` have been removed. Diamond behaviour is now controlled per table via parameters on `create_stream_table()` / `alter_stream_table()`: `diamond_consistency => 'atomic'`, `diamond_schedule_policy => 'slowest'`.
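A migration sketch for the removed GUCs, using the per-table parameters named above (the stream table name and the `name =>` argument form are illustrative):

```sql
-- Replaces: SET pg_trickle.diamond_consistency / diamond_schedule_policy
SELECT pgtrickle.alter_stream_table(
    name                    => 'active_orders',
    diamond_consistency     => 'atomic',
    diamond_schedule_policy => 'slowest'
);
```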
[0.3.0] — Incremental Correctness & Security Tooling
This is a correctness and hardening release. No new SQL functions, tables, or views were added; all changes are in the compiled extension code. `ALTER EXTENSION pg_trickle UPDATE` is safe and a no-op for schema objects.
Fixed
Incremental Correctness Fixes
All 18 previously-disabled correctness tests have been re-enabled (0 remaining). The following query patterns now produce correct results during incremental (non-full) refreshes:
- HAVING clause threshold crossing. Queries with `HAVING` filters (e.g. `HAVING SUM(amount) > 100`) now produce correct totals when groups cross the threshold. Previously, a group gaining enough rows to meet the condition would show only the newly added values instead of the correct total.
- FULL OUTER JOIN. Five bugs affecting incremental updates for `FULL OUTER JOIN` queries are fixed: mismatched row identifiers, incorrect handling of compound GROUP BY expressions like `COALESCE(left.col, right.col)`, and wrong NULL handling for SUM aggregates.
- EXISTS with HAVING subqueries. Queries using `WHERE EXISTS(... GROUP BY ... HAVING ...)` now work correctly; the inner GROUP BY and HAVING were previously being silently discarded.
- Correlated scalar subqueries. Correlated subqueries in SELECT like `(SELECT MAX(e.salary) FROM emp e WHERE e.dept_id = d.id)` are now automatically rewritten into LEFT JOINs so the incremental engine can handle them correctly.
Background Worker Detection on PostgreSQL 18
Fixed a bug where `health_check()` and the scheduler reported zero active workers on PostgreSQL 18 due to a column name change in system views.
Scheduler Stability
Fixed a loop where the scheduler launcher could get stuck retrying failed database probes indefinitely instead of backing off properly.
Added
Security Tooling
Added static security analysis to the CI pipeline:
- GitHub CodeQL — automated security scanning across all Rust source files. First scan: zero findings.
cargo deny— enforces a license allow-list and flags unmaintained or yanked dependencies.- Semgrep — custom rules that flag potentially dangerous patterns such as dynamic SQL construction and privilege escalation. Advisory-only (does not block merges).
- Unsafe block inventory — CI tracks the count of unsafe code blocks per file and fails if any file exceeds its baseline, preventing unreviewed growth of low-level code.
[0.2.3] — Per-Table CDC Mode & WAL Lag Monitoring
Added
- Unsafe function detection. Queries using non-deterministic functions like `random()` or `clock_timestamp()` are now rejected when creating incremental stream tables, because they can't produce reliable results. Functions like `now()` that return the same value within a transaction are allowed with a warning.
- Per-table change tracking mode. You can now choose how each stream table tracks changes (`'auto'`, `'trigger'`, or `'wal'`) via the `cdc_mode` parameter on `create_stream_table()` and `alter_stream_table()`, instead of relying only on the global setting.
- CDC status view. The new `pgtrickle.pgt_cdc_status` view shows the change tracking mode, replication slot, and transition status for every source table in one place.
- Configurable WAL lag thresholds. The warning and critical thresholds for replication slot lag are now configurable via `pg_trickle.slot_lag_warning_threshold_mb` (default 100 MB) and `pg_trickle.slot_lag_critical_threshold_mb` (default 1024 MB), instead of being hard-coded.
- `pg_trickle_dump` backup tool. A new standalone CLI that exports all your stream table definitions as replayable SQL, ordered by dependency. Useful for backups before upgrades or migrations.
- Upgrade path. `ALTER EXTENSION pg_trickle UPDATE` picks up all new features from this release.
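For example, to pin one stream table to WAL-based capture while the rest of the cluster stays on the global default (table names are illustrative):

```sql
SELECT pgtrickle.create_stream_table(
    name     => 'orders_live',
    query    => 'SELECT * FROM orders',
    schedule => '10s',
    cdc_mode => 'wal'   -- per-table override: 'auto', 'trigger', or 'wal'
);

SELECT * FROM pgtrickle.pgt_cdc_status;  -- confirm the tracking mode in use
```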
Changed
- After a full refresh, WAL replication slots are now advanced to the current position, preventing unnecessary WAL accumulation and false lag alarms.
- Change buffers are now flushed after a full refresh, fixing a cycle where the scheduler would alternate endlessly between incremental and full refreshes on bulk-loaded tables.
- IMMEDIATE mode now correctly rejects explicit WAL CDC requests with a clear error, since real-time mode uses its own trigger mechanism.
- The `pg_trickle.user_triggers` setting is simplified to `auto` and `off`. The old `on` value still works as an alias for `auto`.
- CI pipelines are faster on PRs: only essential tests run; the full suite runs on merge and on a daily schedule.
[0.2.2] — AUTO Refresh Mode & Query Alteration
Added
- Change a stream table's query. `alter_stream_table` now accepts a `query` parameter, so you can change what a stream table computes without dropping and recreating it. If the new query's columns are compatible, the underlying storage table is preserved — existing views, policies, and publications continue to work.
- AUTO refresh mode (new default). Stream tables now default to `AUTO` mode, which uses fast incremental updates when the query supports them and automatically falls back to a full recompute when it doesn't. You no longer need to think about whether your query is "incremental-compatible" — just create the stream table and it picks the best strategy.
- Version mismatch warning. The background scheduler now warns if the installed extension version doesn't match the compiled library, making it easier to spot a half-finished upgrade.
- ORDER BY + LIMIT + OFFSET. You can now page through top-N results, e.g. `ORDER BY revenue DESC LIMIT 10 OFFSET 20` to get the third page of top earners.
- Real-time mode: recursive queries. `WITH RECURSIVE` queries (e.g. org-chart hierarchies) now work in IMMEDIATE mode. A depth limit (default 100) prevents infinite loops.
- Real-time mode: top-N queries. `ORDER BY ... LIMIT N` queries now work in IMMEDIATE mode — the top-N rows are recomputed on every data change. Maximum N is controlled by `pg_trickle.ivm_topk_max_limit` (default 1000).
- Foreign table support. Stream tables can now use foreign tables as sources. Changes are detected by comparing snapshots, since foreign tables don't support triggers. Enable with `pg_trickle.foreign_table_polling = on`.
- Documentation reorganization. Configuration and SQL reference docs are reorganized around practical workflows. New sections cover DDL-during-refresh behavior, standby/replica limitations, and PgBouncer constraints.
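As a sketch of the query-alteration call described above — the stream table name and replacement query are illustrative, and the `name`/`query` parameter names are assumed to mirror `create_stream_table`:

```sql
-- Repoint an existing stream table at a broader query.
-- Column-compatible changes preserve the underlying storage table,
-- so dependent views, policies, and publications keep working.
SELECT pgtrickle.alter_stream_table(
  name  => 'active_orders',
  query => 'SELECT * FROM orders WHERE status IN (''active'', ''pending'')'
);
```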
Changed
Internal Code Quality: Integration Test Suite Hardening
Completed a full hardening pass of the integration test suite, bringing all items in PLAN_TEST_EVALS_INTEGRATION.md to done:
- Multiset validation — Extracted `assert_sets_equal()` helper relying on EXCEPT/UNION ALL SQL logic and applied it to workflow tests to ensure storage table state correctly matches the defining query post-refresh.
- Round-trip notifications — `pg_trickle_alert` notifications now verify receipt end-to-end via `sqlx::PgListener`.
- DVM operators — Added unit coverage for complex semi/anti-join behaviors (multi-column, filtered, complementary), multi-table join chains for inner and full joins, and `proptest!` fuzz tests enforcing generated SQL invariants across INNER, SEMI, and ANTI joins.
- Resilience and edge cases — Test coverage for ST drop cascades verifying dependent object removal, exact error escalation thresholds, and scheduler job lifecycles across queued mock states.
- Cleanups — Standardized naming practices (`test_workflow_*`, `test_infra_*`) and eliminated clock-bound flakes by widening staleness assertions.
- Default refresh mode changed from `'DIFFERENTIAL'` to `'AUTO'`.
- Default schedule changed from `'1m'` to `'calculated'` (automatic).
- Default change tracking mode changed from `'trigger'` to `'auto'` — WAL-based tracking starts automatically when available, with trigger-based capture as the fallback.
[0.2.1] — Safe Upgrades & Scheduling Improvements
Added
- Safe upgrades. New upgrade infrastructure ensures that `ALTER EXTENSION pg_trickle UPDATE` works correctly. A CI check detects missing functions or views in upgrade scripts, and automated tests verify that stream tables survive version-to-version upgrades intact. See docs/UPGRADING.md for the upgrade guide.
- ORDER BY + LIMIT + OFFSET. You can now create stream tables over paged results, like "the second page of the top-100 products by revenue" (`ORDER BY revenue DESC LIMIT 100 OFFSET 100`).
- `'calculated'` schedule. Instead of passing SQL `NULL` to request automatic scheduling, you can now write `schedule => 'calculated'`. Passing `NULL` now gives a helpful error message.
- Documentation expansion. Six new pages in the online book covering dbt integration, contributing guidelines, security policy, release process, and research comparisons with other projects.
- Better warnings and safety checks:
  - Warning when a source table lacks a primary key (duplicate rows are handled safely but less efficiently).
  - Warning when using `SELECT *` (new columns added later can break incremental updates).
  - Alert when the refresh queue is falling behind (> 80% capacity).
  - Guard triggers prevent accidental direct writes to stream table storage.
  - Automatic fallback from WAL to trigger-based change tracking when the replication slot disappears.
  - Nested window functions and complex `WHERE` clauses with `EXISTS` are now handled automatically.
- Change buffer partitioning. For high-throughput tables, change buffers can now be partitioned so that processed data is dropped efficiently.
- Column pruning. The incremental engine now skips source columns not used in the query, reducing I/O for wide tables.
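The `'calculated'` schedule is requested at creation time; a minimal sketch (table and column names are illustrative, the call signature follows the quickstart example):

```sql
-- Let pg_trickle pick the refresh interval instead of passing NULL.
SELECT pgtrickle.create_stream_table(
  name     => 'top_products_page2',
  query    => 'SELECT * FROM products ORDER BY revenue DESC LIMIT 100 OFFSET 100',
  schedule => 'calculated'
);
```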
Changed
Internal Code Quality: Integration Test Suite Hardening
Completed a full hardening pass of the integration test suite, bringing all items in PLAN_TEST_EVALS_INTEGRATION.md to done:
- Multiset validation — Extracted `assert_sets_equal()` helper relying on EXCEPT/UNION ALL SQL logic and applied it to workflow tests to ensure storage table state correctly matches the defining query post-refresh.
- Round-trip notifications — `pg_trickle_alert` notifications now verify receipt end-to-end via `sqlx::PgListener`.
- DVM operators — Added unit coverage for complex semi/anti-join behaviors (multi-column, filtered, complementary), multi-table join chains for inner and full joins, and `proptest!` fuzz tests enforcing generated SQL invariants across INNER, SEMI, and ANTI joins.
- Resilience and edge cases — Test coverage for ST drop cascades verifying dependent object removal, exact error escalation thresholds, and scheduler job lifecycles across queued mock states.
- Cleanups — Standardized naming practices (`test_workflow_*`, `test_infra_*`) and eliminated clock-bound flakes by widening staleness assertions.
- Default `schedule` changed from `'1m'` to `'calculated'` (automatic).
- Minimum schedule interval lowered from 60 s to 1 s.
- Cluster-wide diamond consistency settings removed; per-table settings remain and now default to `'atomic'`/`'fastest'`.
Fixed
- The 0.1.3 → 0.2.0 upgrade script was accidentally a no-op, silently skipping 11 new functions. Fixed.
- Queries combining `WITH` (CTEs) and `UNION ALL` now parse correctly.
[0.2.0] — Monitoring, IMMEDIATE Mode & Diamond Consistency
Added
- Monitoring & health checks. Six new functions for inspecting your stream tables at runtime (no superuser required):
  - `change_buffer_sizes()` — shows how much pending change data each stream table has queued up.
  - `list_sources(name)` — lists all base tables that feed a given stream table, with row counts and size estimates.
  - `dependency_tree()` — displays an ASCII tree of how your stream tables depend on each other.
  - `health_check()` — quick system triage that checks whether the scheduler is running, flags tables in error or stale states, and warns about large change buffers or WAL lag.
  - `refresh_timeline()` — recent refresh history across all stream tables, showing timing, row counts, and any errors.
  - `trigger_inventory()` — verifies that all required change-tracking triggers are in place and enabled.
- IMMEDIATE refresh mode (real-time updates). New `'IMMEDIATE'` mode keeps stream tables updated within the same transaction as your data changes. There's no delay — the stream table reflects changes the instant they happen. Supports window functions, LATERAL joins, scalar subqueries, and aggregate queries. You can switch between IMMEDIATE and other modes at any time using `alter_stream_table`.
- Top-N queries (ORDER BY + LIMIT). Queries like `SELECT ... ORDER BY score DESC LIMIT 10` are now supported. The stream table stores only the top N rows and updates efficiently.
- Diamond dependency consistency. When multiple stream tables share common sources and feed into the same downstream table (a "diamond" pattern), they can now be refreshed as an atomic group — either all succeed or all roll back. This prevents inconsistent reads at convergence points. Controlled via the `diamond_consistency` parameter (default: `'atomic'`).
- Multi-database auto-discovery. The background scheduler now automatically finds and services all databases on the server where pg_trickle is installed. No manual `pg_trickle.database` configuration required — just install the extension and the scheduler discovers it.
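The monitoring functions above are plain SQL calls; a hedged sketch, assuming they live in the `pgtrickle` schema like the rest of the API and that `list_sources` takes the stream table name shown in the quickstart:

```sql
-- Quick triage: is the scheduler running, is anything stale or erroring?
SELECT * FROM pgtrickle.health_check();

-- How much pending change data is queued per stream table?
SELECT * FROM pgtrickle.change_buffer_sizes();

-- Which base tables feed this stream table?
SELECT * FROM pgtrickle.list_sources('active_orders');
```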
Fixed
- Fixed IMMEDIATE mode incorrectly trying to read from change buffer tables (which don't exist in that mode) for certain aggregate queries.
- Fixed type mismatches when join queries had unchanged source tables producing empty change sets.
- Fixed join condition column order being swapped when the right-side table was written first in the `ON` clause (e.g. `ON r.id = l.id`).
- Fixed dbt macros silently rolling back stream table creation because dbt wraps statements in a `ROLLBACK` by default.
- Fixed `LIMIT ALL` being incorrectly rejected as an unsupported LIMIT clause.
- Fixed false "query may produce incorrect incremental results" warnings on simple arithmetic like `depth + 1` or `path || name`.
- Fixed auto-created indexes using the wrong column name when the query had a column alias (e.g. `SELECT id AS department_id`).
[0.1.3] — TPC-H Correctness, Window Functions & Aggregate Fixes
Major hardening release with 50 improvements across correctness, robustness, operational safety, and test coverage.
Added
- DDL change tracking expanded. `ALTER TYPE`, `ALTER POLICY`, and `ALTER DOMAIN` on source tables are now detected and trigger a rebuild of affected stream tables. Previously only column changes were tracked.
- Recursive query safety guard. Recursive CTEs (`WITH RECURSIVE`) are now checked for non-monotonic terms that could produce incorrect incremental results.
- Read replica awareness. The background scheduler detects when it's running on a read replica and skips refresh work, preventing errors.
- Range aggregates rejected. `RANGE_AGG` and `RANGE_INTERSECT_AGG` are now properly rejected in incremental mode with a clear error.
- Refresh history: row counts. Refresh history now records how many rows were inserted, updated, and deleted in each refresh cycle.
- Change buffer alerts. New `pg_trickle.buffer_alert_threshold` setting lets you configure when to be warned about growing change buffers.
- `st_auto_threshold()` function. Shows the current adaptive threshold that decides when to switch between incremental and full refresh.
- Wide table optimization. Tables with more than 50 columns use a hash shortcut during refresh merges, improving performance.
- Change buffer security. Internal change buffer tables are no longer accessible to `PUBLIC`.
- Documentation. PgBouncer compatibility, keyless table limitations, delta memory bounds, sequential processing rationale, and connection overhead are all now documented in the FAQ.
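A sketch of the alerting knob and the new introspection function — the threshold value and its unit (pending rows) are assumptions, not documented values:

```sql
-- Warn once a change buffer exceeds ~100k pending rows (assumed unit).
ALTER SYSTEM SET pg_trickle.buffer_alert_threshold = 100000;
SELECT pg_reload_conf();

-- Inspect the adaptive incremental-vs-full switch-over threshold.
SELECT pgtrickle.st_auto_threshold();
```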
TPC-H Correctness Suite: 22/22 Queries Passing
The TPC-H-derived correctness test suite (22 industry-standard analytical queries) now passes completely across multiple rounds of data changes. This validates that incremental refreshes produce identical results to full recomputation for complex real-world query patterns.
Fixed
Window Function Correctness
Fixed incremental maintenance of window functions (ROW_NUMBER, RANK, DENSE_RANK, NTILE, LAG/LEAD, SUM OVER, etc.) to correctly handle:
- Non-RANGE frame types
- Ranking functions over tied values
- Window functions wrapping aggregates (e.g. `RANK() OVER (ORDER BY SUM(x))`)
- Multiple window functions with different PARTITION BY clauses
INTERSECT / EXCEPT Correctness
Fixed incremental maintenance of INTERSECT and EXCEPT queries that
produced wrong results due to invalid SQL generation.
EXISTS / IN with OR Correctness
Fixed EXISTS and IN subqueries combined with OR in WHERE clauses that
produced wrong results.
Aggregate Correctness
- `MIN`/`MAX` now correctly rescan the source table when the current minimum or maximum value is deleted.
- `STRING_AGG(... ORDER BY ...)` and `ARRAY_AGG(... ORDER BY ...)` no longer silently drop the ORDER BY clause.
[0.1.2] — Incremental Correctness Fixes & Project Rename
Changed
Internal Code Quality: Integration Test Suite Hardening
Completed a full hardening pass of the integration test suite, bringing all items in PLAN_TEST_EVALS_INTEGRATION.md to done:
- Multiset validation — Extracted `assert_sets_equal()` helper relying on EXCEPT/UNION ALL SQL logic and applied it to workflow tests to ensure storage table state correctly matches the defining query post-refresh.
- Round-trip notifications — `pg_trickle_alert` notifications now verify receipt end-to-end via `sqlx::PgListener`.
- DVM operators — Added unit coverage for complex semi/anti-join behaviors (multi-column, filtered, complementary), multi-table join chains for inner and full joins, and `proptest!` fuzz tests enforcing generated SQL invariants across INNER, SEMI, and ANTI joins.
- Resilience and edge cases — Test coverage for ST drop cascades verifying dependent object removal, exact error escalation thresholds, and scheduler job lifecycles across queued mock states.
- Cleanups — Standardized naming practices (`test_workflow_*`, `test_infra_*`) and eliminated clock-bound flakes by widening staleness assertions.
Project Renamed from pg_stream to pg_trickle
Renamed the entire project from pg_stream to pg_trickle to avoid a naming collision with an unrelated project. If you were using the old name, all configuration prefixes changed from `pg_stream.*` to `pg_trickle.*`, and the SQL schemas changed from `pgstream` to `pgtrickle`. The "stream tables" terminology is unchanged.
Fixed
Fixed numerous incremental computation bugs discovered while building a comprehensive correctness test suite based on all 22 TPC-H analytical queries:
- Inner join double-counting. When both sides of a join had changes in the same refresh cycle, some rows were counted twice.
- Shared source cleanup. Cleaning up processed changes for one stream table could accidentally delete entries still needed by another stream table sharing the same source.
- Scalar aggregate identity mismatch. Queries like `SELECT SUM(amount) FROM orders` could produce mismatched row identifiers between the incremental and merge phases. AVG also failed to recompute correctly after partial group changes.
- EXISTS / NOT EXISTS snapshots. Incremental maintenance of `EXISTS` and `NOT EXISTS` subqueries missed pre-change state, producing wrong results.
- Column resolution in complex joins. Several fixes for column name resolution in multi-table joins and nested subqueries.
- COUNT(*) rendering. `COUNT(*)` was sometimes rendered as `COUNT()` (missing the star), causing SQL errors.
- Subquery rewriting. Several subquery patterns (correlated vs non-correlated scalar subqueries, derived tables in FROM) were incorrectly rewritten, blocking certain queries from being created.
- Cleanup worker crash. The background cleanup worker no longer crashes when it encounters entries for stream tables that were dropped mid-cycle.
Added
TPC-H Correctness Test Suite
Added a comprehensive correctness test suite based on all 22 TPC-H analytical queries. These tests verify that incremental refreshes produce identical results to a full recompute after INSERT, DELETE, and UPDATE mutations. 20 of 22 queries can be created as stream tables; 15 pass full correctness checks at this point (improved to 22/22 in v0.1.3).
[0.1.1] — CloudNativePG Image & Test Hardening
Changed
Internal Code Quality: Integration Test Suite Hardening
Completed a full hardening pass of the integration test suite, bringing all items in PLAN_TEST_EVALS_INTEGRATION.md to done:
- Multiset validation — Extracted `assert_sets_equal()` helper relying on EXCEPT/UNION ALL SQL logic and applied it to workflow tests to ensure storage table state correctly matches the defining query post-refresh.
- Round-trip notifications — `pg_trickle_alert` notifications now verify receipt end-to-end via `sqlx::PgListener`.
- DVM operators — Added unit coverage for complex semi/anti-join behaviors (multi-column, filtered, complementary), multi-table join chains for inner and full joins, and `proptest!` fuzz tests enforcing generated SQL invariants across INNER, SEMI, and ANTI joins.
- Resilience and edge cases — Test coverage for ST drop cascades verifying dependent object removal, exact error escalation thresholds, and scheduler job lifecycles across queued mock states.
- Cleanups — Standardized naming practices (`test_workflow_*`, `test_infra_*`) and eliminated clock-bound flakes by widening staleness assertions.
CloudNativePG Extension Image
Replaced the full PostgreSQL Docker image (~400 MB) with a minimal extension-only image (< 10 MB) following the CloudNativePG Image Volume Extensions specification. This means faster pulls and less disk usage in Kubernetes deployments. The image contains just the extension files — no full PostgreSQL server.
[0.1.0] — Initial Release
Initial release of pg_trickle — a PostgreSQL extension that keeps query results automatically up to date as your data changes.
Core Concept
Define a SQL query and a schedule. pg_trickle creates a stream table that stores the query's results and keeps them fresh — either on a schedule (every N seconds) or in real time. When data in your source tables changes, only the affected rows are recomputed instead of re-running the entire query.
What You Can Do
- Create stream tables from `SELECT` queries — joins, aggregates, subqueries, CTEs, window functions, set operations, and more.
- Automatic refresh — a background scheduler refreshes stream tables in dependency order. You can also trigger refreshes manually.
- Incremental updates — the engine automatically figures out how to update only the rows that changed, instead of recomputing everything. This works for most query patterns including multi-table joins and aggregates.
- Views as sources — views referenced in your query are automatically expanded so change tracking works on the underlying tables.
- Tables without primary keys — supported via content hashing. Tables with primary keys get better performance.
- Hybrid change tracking — starts with lightweight triggers (no special PostgreSQL configuration needed). Can automatically switch to WAL-based tracking for lower overhead when `wal_level = logical` is available.
- Multi-database support — the scheduler automatically discovers all databases on the server where the extension is installed.
- User triggers on stream tables — your own `AFTER` triggers on stream tables fire correctly during incremental refreshes.
- DDL awareness — `ALTER TABLE`, `DROP TABLE`, `CREATE OR REPLACE FUNCTION`, and other DDL on source tables or functions used in your query are detected and handled automatically.
SQL Support
Broad coverage of SQL features:
- Joins: INNER, LEFT, RIGHT, FULL OUTER, NATURAL, LATERAL subqueries, LATERAL set-returning functions (`unnest`, `jsonb_array_elements`, etc.)
- Aggregates: 39 functions including COUNT, SUM, AVG, MIN, MAX, STRING_AGG, ARRAY_AGG, JSON_ARRAYAGG, JSON_OBJECTAGG, statistical regression functions (CORR and the COVAR_/REGR_ families), and ordered-set aggregates (MODE, PERCENTILE_CONT, PERCENTILE_DISC)
- Window functions: ROW_NUMBER, RANK, DENSE_RANK, NTILE, LAG, LEAD, SUM OVER, etc. with full frame clause support
- Set operations: UNION, UNION ALL, INTERSECT, EXCEPT
- Subqueries: in FROM, EXISTS/NOT EXISTS, IN/NOT IN, scalar subqueries
- CTEs: `WITH` and `WITH RECURSIVE`
- Special syntax: DISTINCT, DISTINCT ON, GROUPING SETS / CUBE / ROLLUP, CASE WHEN, COALESCE, JSON_TABLE (PostgreSQL 17+)
- Unsafe function detection: queries using non-deterministic functions like `random()` are rejected with a clear error
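Several of these features compose in a single defining query; a hedged sketch using the quickstart's call signature (table and column names are illustrative):

```sql
SELECT pgtrickle.create_stream_table(
  name     => 'daily_region_revenue',
  query    => $$
    SELECT o.order_date,
           c.region,
           SUM(o.amount) AS revenue,   -- incrementally maintained aggregate
           COUNT(*)      AS order_count
    FROM orders o
    JOIN customers c ON c.id = o.customer_id
    GROUP BY o.order_date, c.region
  $$,
  schedule => '30s'
);
```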
Monitoring
- `explain_st()` — shows the incremental computation plan
- `st_refresh_stats()`, `get_refresh_history()`, `get_staleness()` — refresh performance and status
- `slot_health()` — WAL replication slot health
- `check_cdc_health()` — change tracking health per source table
- `stream_tables_info` and `pg_stat_stream_tables` views
- NOTIFY alerts for stale data, errors, and refresh events
Documentation
- Architecture guide, SQL reference, configuration reference, FAQ, getting-started tutorial, and deep-dive tutorials.
Known Limitations
- `TABLESAMPLE`, `LIMIT`/`OFFSET`, `FOR UPDATE`/`FOR SHARE` — not yet supported (clear error messages).
- Window functions inside expressions (e.g. `CASE WHEN ROW_NUMBER() ...`) — not yet supported.
- Circular stream table dependencies — not yet supported.
Viewing on GitHub? The roadmap lives in ROADMAP.md. This stub is served by the pg_trickle docs site — the include below renders there.
pg_trickle Roadmap
Audience: Product managers, stakeholders, and technically curious readers who want to understand what each release delivers and why it matters — without needing to read Rust code or SQL specifications.
Versions
Foundation (v0.1.x)
| Version | Theme | Status | Scope | Full details |
|---|---|---|---|---|
| v0.1.0 | The complete foundation — differential engine, CDC, scheduling, monitoring | ✅ Released | Very Large | Full details |
| v0.1.1 | Change capture correctness fixes (WAL decoder, UPDATE handling) | ✅ Released | Patch | Full details |
| v0.1.2 | DDL tracking improvements and PgBouncer compatibility | ✅ Released | Patch | Full details |
| v0.1.3 | SQL coverage completion, WAL hardening, TPC-H 22/22 | ✅ Released | Patch | Full details |
Early Feature Development (v0.2.x – v0.5.x)
| Version | Theme | Status | Scope | Full details |
|---|---|---|---|---|
| v0.2.0 | Top-N views, IMMEDIATE refresh mode, diamond dependency safety | ✅ Released | Medium | Full details |
| v0.2.1 | Upgrade infrastructure and documentation expansion | ✅ Released | Small | Full details |
| v0.2.2 | Paginated top-N, AUTO mode default, ALTER QUERY | ✅ Released | Medium | Full details |
| v0.2.3 | Non-determinism detection and operational polish | ✅ Released | Small | Full details |
| v0.3.0 | Correctness for HAVING, FULL OUTER JOIN, and correlated subqueries | ✅ Released | Medium | Full details |
| v0.4.0 | Parallel refresh, statement-level CDC triggers, cross-source consistency | ✅ Released | Medium | Full details |
| v0.5.0 | Row-level security, ETL bootstrap gating, API polish | ✅ Released | Medium | Full details |
Scalability and Robustness (v0.6.x – v0.9.x)
| Version | Theme | Status | Scope | Full details |
|---|---|---|---|---|
| v0.6.0 | Partitioned source tables, idempotent DDL, circular dependency foundation | ✅ Released | Medium | Full details |
| v0.7.0 | Circular DAG execution, watermarks, Prometheus/Grafana observability | ✅ Released | Large | Full details |
| v0.8.0 | pg_dump backup support and multiset invariant testing | ✅ Released | Small | Full details |
| v0.9.0 | Algebraic aggregate maintenance — AVG, STDDEV, COUNT(DISTINCT) | ✅ Released | Medium | Full details |
Production Readiness (v0.10.x – v0.14.x)
| Version | Theme | Status | Scope | Full details |
|---|---|---|---|---|
| v0.10.0 | DVM hardening, PgBouncer compatibility, "No Surprises" UX | ✅ Released | Medium | Full details |
| v0.11.0 | Partitioned stream tables, event-driven scheduler (34× latency), circuit breaker | ✅ Released | Large | Full details |
| v0.12.0 | Three-table join fix (EC-01), developer tools, SQLancer fuzzing | ✅ Released | Medium | Full details |
| v0.13.0 | Columnar change tracking, shared buffers, TPC-H 22/22 DIFFERENTIAL | ✅ Released | Large | Full details |
| v0.14.0 | Tiered scheduling, UNLOGGED buffers, diagnostics | ✅ Released | Medium | Full details |
Performance and Integration (v0.15.x – v0.19.x)
| Version | Theme | Status | Scope | Full details |
|---|---|---|---|---|
| v0.15.0 | Nexmark benchmark, bulk create API, watermark hold-back, dbt Hub | ✅ Released | Medium | Full details |
| v0.16.0 | Append-only fast path, algebraic aggregates, auto-indexing, benchmark CI | ✅ Released | Medium | Full details |
| v0.17.0 | Cost-based refresh strategy, incremental DAG rebuild, pg_ivm migration guide | ✅ Released | Large | Full details |
| v0.18.0 | Z-set delta engine, consistency enforcement, safety hardening | ✅ Released | Large | Full details |
| v0.19.0 | Security hardening, packaging (PGXN, Docker Hub, apt/rpm) | ✅ Released | Medium | Full details |
Self-Monitoring and Deep Correctness (v0.20.x – v0.27.x)
| Version | Theme | Status | Scope | Full details |
|---|---|---|---|---|
| v0.20.0 | pg_trickle monitors itself using its own stream tables | ✅ Released | Large | Full details |
| v0.21.0 | Correctness hardening, zero-crash guarantee, shadow/canary mode | ✅ Released | Large | Full details |
| v0.22.0 | Downstream CDC publication, parallel refresh pool, SLA tier auto-assignment | ✅ Released | Large | Full details |
| v0.23.0 | TPC-H DVM scaling performance — all 22 queries at O(Δ) | ✅ Released | Large | Full details |
| v0.24.0 | Join correctness complete fix, two-phase frontier, TOAST-aware CDC | ✅ Released | Large | Full details |
| v0.25.0 | Thousands of stream tables, pooler cold-start fix, predictive model | ✅ Released | Large | Full details |
| v0.26.0 | Concurrency testing, fuzz targets, refresh engine modularisation | ✅ Released | Large | Full details |
| v0.27.0 | Snapshot/PITR, schedule recommendations, cluster observability | ✅ Released | Medium | Full details |
Toward Stable (v0.28.x – v1.0)
| Version | Theme | Status | Scope | Full details |
|---|---|---|---|---|
| v0.28.0 | Reliable event messaging built into PostgreSQL | ✅ Released | Large | Full details |
| v0.29.0 | Off-the-shelf connector to Kafka, NATS, SQS, and more | ✅ Released | Large | Full details |
| v0.30.0 | Quality gate before 1.0 — correctness, stability, and docs | ✅ Released | Medium | Full details |
| v0.31.0 | Smarter scheduling and faster hot paths | ✅ Released | Medium | Full details |
| v0.32.0 | Citus: stable object naming and per-source frontier foundation | ✅ Released | Medium | Full details |
| v0.33.0 | Citus: world-class distributed source CDC and stream table support | ✅ Released | Large | Full details |
| v0.34.0 | Citus: automated distributed CDC scheduler wiring and shard rebalance auto-recovery | ✅ Released | Medium | Full details |
| v0.35.0 | EC-01 correctness closeout, Citus chaos hardening, reactive subscriptions, zero-downtime schema changes | ✅ Released | Large | Full details |
| v0.36.0 | Structural hardening, L0 cache, WAL backpressure, temporal IVM, columnar storage | ✅ Released | Large | Full details |
| v0.37.0 | Scheduler & merge modularisation, pgVectorMV (vector_avg/sum), OpenTelemetry trace propagation | ✅ Released | Medium | Full details |
| v0.38.0 | EC-01 Correctness Sprint (Hard Gate): join phantom rows, property-test convergence proof — BLOCKING release gate | ✅ Released | Medium | Full details |
| v0.39.0 | Operational Truthfulness & Distributed Hardening: backpressure/wake fix, generated docs, Citus chaos, SQLSTATE rollout, diagnostics | ✅ Released | Large | Full details |
| v0.40.0 | Operator trust and maintainability: generated references, alerting, drain-mode proof, secret hygiene, unsafe gating | ✅ Released | Large | Full details |
| v0.41.0 | DVM correctness: structural cache keys, placeholder safety, WAL transition guards | ✅ Released | Medium | Full details |
| v0.42.0 | Documentation truthfulness + test quality: repair_stream_table, catalog generator, SQL reference, sleep removal, fuzz CI | ✅ Released | Large | Full details |
| v0.43.0 | Performance tunability: deep-join GUCs, GROUP_RESCAN improvement, explain_stream_table diagnostics, D+I change buffer refactor | ✅ Released | Large | Full details |
| v0.44.0 | Security hardening: IVM search_path fix, centralized SQL builder, RLS warnings, module decomposition | ✅ Released | Large | Full details |
| v0.45.0 | Operational readiness: preflight functions, scalability infrastructure, CI completeness, CNPG production examples | ✅ Released | Large | Full details |
pg_tide Extraction (v0.46.0)
| Version | Theme | Status | Scope | Full details |
|---|---|---|---|---|
| v0.46.0 | Extract pg_tide: standalone transactional outbox, inbox, and relay into trickle-labs/pg-tide | ✅ Released | Large | Full details |
Embedding & AI Programme (v0.47.x – v0.48.x)
| Version | Theme | Status | Scope | Full details |
|---|---|---|---|---|
| v0.47.0 | Embedding pipeline infrastructure: post-refresh hooks, drift-based reindex, vector monitoring | ✅ Released | Medium | Full details |
| v0.48.0 | Complete embedding programme: sparse/half-precision vector aggregates, hybrid search, embedding_stream_table() API, per-tenant ANN, embedding outbox | ✅ Released | Large | Full details |
v1.0 Readiness Arc (v0.49.x – v0.51.x)
| Version | Theme | Status | Scope | Full details |
|---|---|---|---|---|
| v0.49.0 | Test infrastructure hardening: concurrency synchronization overhaul, 10-module unit test sweep, merge/row_id fuzz targets, DDL-during-refresh E2E, scheduler decomposition, CI smoke breadth | ✅ Released | Large | Full details |
| v0.49.1 | Repository migration to trickle-labs/pg-trickle: updated CI/CD, Docker, PGXN, dbt Hub, and CloudNativePG artifact publishing | ✅ Released | Patch | — |
| v0.50.0 | Performance, security & operational hardening: SPI batching in differential refresh, dblink escaping fix, CNPG graceful-drain preStop hook, Docker image digest pinning, invalidation ring observability, deep-join drift monitoring, Prometheus secondary metrics | ✅ Released | Large | Full details |
| v0.51.0 | Citus chaos resilience & documentation truth: chaos test rig (node kill/rebalance/partition), deprecated GUC removal, ARCHITECTURE.md pg_tide boundary, recursive CTE strategy docs, CDC-enabled-flag documentation | ✅ Released | Large | Full details |
Assessment-Driven Final Hardening Arc (v0.52.x – v0.55.x)
Driven by the findings in the v0.51.0 overall assessment (plans/PLAN_OVERALL_ASSESSMENT_11.md). The assessment found 0 critical, 2 HIGH, and 22 MEDIUM findings across correctness, performance, scalability, test coverage, code quality, security, and feature completeness — all resolved in this four-release arc before v1.0.
| Version | Theme | Status | Scope | Full details |
|---|---|---|---|---|
| v0.52.0 | DVM hot-path performance: O(1) placeholder resolution (aho-corasick), thread-local volatility cache, lazy DiffContext allocations, O(1) template LRU eviction | ✅ Released | Large | Full details |
| v0.53.0 | Unit test depth sweep: dag, scheduler, CDC, parser, config — eleven modules with zero inline coverage — plus proptest extension and buffer-growth sleep removal | ✅ Released | Large | Full details |
| v0.54.0 | DVM engine hardening: diff_node depth limit, DiffContext CTE cap (OOM guard), snapshot fingerprint caching, Expr::to_sql() caching, view inlining fixpoint + batched relkind, ST source frontier validation, O(V+E) diamond detection | ✅ Released | Large | Full details |
| v0.55.0 | Final pre-1.0 polish: GUC-configurable invalidation ring, api/mod.rs and monitor.rs module decomposition, serde_json NOTIFY payloads, multi-column IN rewrite to EXISTS, DVM parse metrics, reserved-prefix docs, GUC rationale comments, PR coverage gate | ✅ Released | Large | Full details |
Documentation Excellence Arc (v0.56.x – v0.57.x)
Driven by the findings in the Round 2 documentation audit (plans/PLAN_DOCUMENTATION_GAPS_2.md, 2026-05-11). The audit found 3 P0 blockers (corrupted GUC_CATALOG.md, 54%-complete ERRORS.md, wrong GUC default), 8 P1 items, 7 P2 items, 5 P3 items, and 7 new documents that should exist before v1.0. This two-release arc resolves all findings and delivers the world-class documentation standard planned for the stable release.
| Version | Theme | Status | Scope | Full details |
|---|---|---|---|---|
| v0.56.0 | Documentation Foundation: fix GUC_CATALOG corruption, complete ERRORS.md (all 44 variants), correct parallel_refresh_mode default, complete SQL_REFERENCE outbox/inbox, add MENTAL_MODEL.md, LIMITATIONS.md, PERFORMANCE_CHEATSHEET.md | ✅ Released | Large | Full details |
| v0.57.0 | Documentation Excellence: four new tutorials (first dashboard, event sourcing, backfill/migration, security hardening), P2/P3 quality polish, full 83-file consistency sweep | ✅ Released | Large | Full details |
Assessment-Driven Hardening Arc (v0.58.x – v0.61.x)
Driven by the findings in the v0.57.0 overall assessment (plans/PLAN_OVERALL_ASSESSMENT_12.md). The assessment found 0 critical, 4 HIGH, 23 MEDIUM, and 20 LOW findings across security (ownership bypass in outbox/publication APIs), correctness (recursive-CTE depth guard in DIFFERENTIAL mode, multi-column NOT IN + NULL semantics, WAL decoder TOCTOU race), performance (per-source SPI fan-out in monitor, merge-template clone overhead, WAL decoder allocation patterns), observability (missing CDC-lag percentiles, worker queue-depth, WAL decoder queue, refresh-mode ratio counters), code quality (scheduler log levels, codegen decomposition, cdc.rs split), and test coverage (refresh orchestrator, CDC, hooks, remaining fixed sleeps). This four-release arc resolves all findings before v1.0.
| Version | Theme | Status | Scope | Full details |
|---|---|---|---|---|
| v0.58.0 | Security & Correctness Hardening: ownership checks for outbox/publication APIs, multi-column NOT IN + NULL fix, recursive CTE depth guard in DIFFERENTIAL mode, WAL decoder TOCTOU advisory lock, DDL hook escalation on SPI failure | ✅ Released | Medium | Full details |
| v0.59.0 | Performance & Observability: batched monitor buffer-growth SPI, query-hash caching, Arc<str> merge templates, WAL decoder Vec pre-allocation, CDC-lag percentiles, worker queue-depth metrics, app_name BGW labels, backup docs | ✅ Released | Large | Full details |
| v0.60.0 | Code Quality, Test Coverage & CI: scheduler log levels, codegen decomposition, cdc.rs 4-way split, refresh orchestrator/merge/CDC/hooks unit tests, differential idempotence proptest, sleep removal, WAL OID filter, partition-attach rebuild, path-filtered full E2E on PRs, Dockerfile non-root, codecov module thresholds | Planned | Large | Full details |
| v0.61.0 | DX, Documentation & Final Pre-1.0 Polish: health_check() foreign-owner row, SQL_REFERENCE completeness, snapshot cache secondary equality, cte_counter reset, outbox name collision fix, sublinks.rs decomposition, ctid invariant comment, 3 foundational ADRs, LIMITATIONS.md NOT IN + NULL section, SEARCH/CYCLE clear error, LATERAL+DIFFERENTIAL docs | Planned | Large | Full details |
Scheduler Throughput Arc (v0.62.x – v0.63.x)
Two releases targeting scheduler throughput: eliminating redundant change-buffer
scans via fan-out, adding the pause_scheduler / resume_scheduler /
stream_table_spec SQL API required by the planned pg_aqueduct migration tool
(plans/pg-aqueduct-plan.md), and implementing
fused CTE refresh to reduce per-tick statement overhead for multi-node DAGs.
| Version | Theme | Status | Scope | Full details |
|---|---|---|---|---|
| v0.62.0 | Scheduler throughput: change-buffer fan-out (O(N)→O(1) scans for multi-consumer DAGs), pause_scheduler / resume_scheduler per-node SQL functions, stream_table_spec(oid) stable JSON projection | Planned | Medium | Full details |
| v0.63.0 | Fused multi-node refresh: CTE-chain composition of per-node delta SQL in a single statement, correctness property test, benchmark regression gate (≥ 20 % wall-time reduction on TPC-H 22-node DAG) | Planned | Large | Full details |
Beyond v1.0
| Version | Theme | Status | Scope | Full details |
|---|---|---|---|---|
| v1.0.0 | Stable release — PostgreSQL 19, package registries, signed artifacts, SBOMs, zero breaking changes | Planned | Large | Full details |
| v1.1.0 | PostgreSQL 17 support; WITH RECURSIVE … SEARCH/CYCLE clause; auto_explain integration hook | Planned | Medium | Full details |
| v1.2.0 | PGlite proof of concept; pg_partman automated partition scheduling integration | Planned | Medium | Full details |
| v1.3.0 | Core extraction (pg_trickle_core) | Planned | Large | Full details |
| v1.4.0 | PGlite WASM extension | Planned | Medium | Full details |
| v1.5.0 | PGlite reactive integration | Planned | Medium | Full details |
How these versions fit together
v0.1.0 ─── Foundation: differential engine, CDC, scheduling, 1300+ tests
│
v0.2–0.5 ─── TopK, IMMEDIATE mode, RLS, partitioned sources, parallel refresh
│
v0.6–0.9 ─── Circular DAGs, watermarks, Prometheus, algebraic aggregates
│
v0.10–14 ─── PgBouncer compat, 34× latency reduction, partitioned outputs, tiered scheduling
│
v0.15–19 ─── Nexmark, append-only fast path, cost model, security, packaging
│
v0.20–23 ─── Self-monitoring, zero-crash guarantee, downstream CDC, TPC-H at scale
│
v0.24–27 ─── Join correctness complete, thousands of STs, snapshot/PITR
│
v0.28–29 ─── Reliable event messaging (outbox + inbox) + relay CLI
│
v0.30 ─── Quality gate: correctness, stability, docs (required for 1.0)
│
v0.31 ─── Scheduler intelligence and performance hot paths
│
v0.32 ─── Citus: stable naming foundation (additive, safe for all users)
│
v0.33 ─── Citus: distributed CDC and stream table support
│
v0.35 ─── EC-01 fix, Citus chaos rig, reactive subscriptions, shadow-ST, relay hardening
│
v0.36 ─── L0 cache, WAL backpressure, api split, temporal IVM, columnar, RowIdSchema
│
v0.37 ─── Scheduler split, pgVectorMV, OpenTelemetry, pg_partman compat
│
v0.38 ─── Correctness closeout and truthfulness: EC-01, RowIdSchema planning, backpressure, wake/docs repair
│
v0.39 ─── Distributed hardening and diagnostics: Citus chaos, durable CDC hold, TPC-H explain artifacts, fuzzing
│
v0.40 ─── Operator trust and maintainability: generated docs, alerting, drain proof, secret hygiene, unsafe gating
│
v0.41 ─── DVM correctness: structural cache keys, placeholder safety, WAL transition guards
│
v0.42 ─── Docs truthfulness + test quality: repair_stream_table, catalog generator, sleep removal, fuzz CI
│
v0.43 ─── Performance tunability: deep-join GUCs, GROUP_RESCAN improvement, explain diagnostics, D+I CB refactor
│
v0.44 ─── Security hardening: IVM search_path, SQL builder, RLS warnings, module decomposition
│
v0.45 ─── Operational readiness: preflight, scalability, CI completeness, CNPG production
│
v0.46 ─── Extract pg_tide: standalone outbox/inbox/relay → trickle-labs/pg-tide; attach_outbox() integration
│
v0.47 ─── Embedding infrastructure: post-refresh actions, drift-based reindex, vector monitoring
│
v0.48 ─── Complete embedding programme: sparse vectors, hybrid search, embedding_stream_table(), per-tenant ANN
│
v0.49 ─── Test infrastructure hardening: concurrency sync overhaul, 10-module unit sweep, merge fuzz, DDL E2E, scheduler split
│
v0.50 ─── Performance, security & ops hardening: SPI batching, dblink fix, CNPG drain hook, digest pinning, ring observability
│
v0.51 ─── Citus chaos resilience & doc truth: chaos rig, deprecated GUC removal, pg_tide boundary, CTE strategy docs
│
v0.52 ─── DVM hot-path perf: O(1) placeholder resolution, volatility cache, lazy DiffContext, O(1) LRU eviction
│
v0.53 ─── Unit test depth: dag/scheduler/CDC/parser/config sweep, proptest extension, sleep removal
│
v0.54 ─── DVM hardening: diff_node depth limit, DiffContext OOM cap, snapshot fingerprint cache, view inlining fixpoint, O(V+E) diamond detection
│
v0.55 ─── Final pre-1.0 polish: configurable ring, module decomposition, serde_json NOTIFY, multi-column IN rewrite, DVM metrics, docs
│
v0.56 ─── Documentation Foundation: GUC_CATALOG fix, ERRORS.md complete (44 variants), MENTAL_MODEL.md, LIMITATIONS.md, PERFORMANCE_CHEATSHEET.md
│
v0.57 ─── Documentation Excellence: 4 new tutorials, P2/P3 polish, full 83-file consistency sweep
│
v0.58 ─── Security & correctness hardening: ownership checks (outbox/publication APIs), NOT IN + NULL fix, recursive CTE depth guard, WAL decoder TOCTOU lock, DDL hook escalation
│
v0.59 ─── Performance & observability: batched monitor SPI, query-hash cache, Arc<str> templates, WAL decoder Vec pre-alloc, CDC-lag percentiles, worker queue metrics, app_name BGW, backup docs
│
v0.60 ─── Code quality, test coverage & CI: cdc.rs split, codegen decompose, refresh/CDC/hooks unit tests, idempotence proptest, sleep removal, WAL OID filter, partition-attach rebuild, path-filtered E2E on PRs
│
v0.61 ─── DX, docs & pre-1.0 polish: health_check foreign-owner row, SQL_REFERENCE complete, snapshot secondary equality, cte_counter reset, outbox name fix, sublinks decompose, 3 ADRs, LATERAL docs
│
v0.62 ─── Scheduler throughput: change-buffer fan-out (O(N)→O(1)), pause/resume_scheduler SQL API, stream_table_spec() stable JSON projection
│
v0.63 ─── Fused multi-node refresh: CTE-chain composition across ready nodes, ≥ 20 % wall-time reduction on TPC-H 22-node DAG
│
v1.0.0 ─── Stable release, PostgreSQL 19, package registries, signed artifacts, SBOMs
v0.1.0 through v0.27.0 build the complete core engine and harden it for
production use. v0.28.0 and v0.29.0 deliver the event-driven integration
story. v0.30.0 is a mandatory correctness and polish gate before 1.0.
v0.31.0 sharpens scheduler intelligence before new features are added.
v0.32.0 is the first of two Citus releases, shipping stable object naming
and detection helpers as an additive, non-breaking foundation. v0.33.0
delivers the full Citus integration immediately after — per-worker slot CDC,
distributed ST placement, cross-node coordination, and the Citus test suite.
Pulling v0.33.0 forward means users with Citus topologies (including
billion-row all-distributed deployments) are unblocked two releases earlier.
v0.35.0 was intended to be the single most important release before v1.0, but
the v0.37.0 overall assessment shows several of those closeout items remain
partially open or insufficiently proven. v0.36.0 and v0.37.0 still delivered
substantial structural gains: L0 cache construction, temporal IVM,
RowIdSchema, scheduler and merge splits, pgVectorMV, and OpenTelemetry trace
capture. The next three releases now form a hardening programme rather than an
immediate feature expansion.
v0.38.0 is a dedicated EC-01 correctness sprint with a hard release gate: This release will NOT ship until join phantom rows are proven closed with a comprehensive DIFF-vs-FULL property test suite covering Q07/Q15-style joins. EC-01 has been labeled critical since v0.20.0 (6+ releases) and deferred multiple times; v0.38.0 breaks that pattern by making EC-01 closure the sole release objective. No other features, no operational docs, no SQLSTATE rollout — just the join phantom-row fix and its proof.
v0.39.0 absorbs the operational truthfulness items that were originally planned for v0.38.0: backpressure hysteresis or deprecation, wake-truthfulness repair, generated configuration and upgrade docs, OpenTelemetry collector proof, SQLSTATE rollout on hot paths, and the full distributed/diagnostic coverage (Citus chaos testing, durable CDC hold semantics, per-query TPC-H explain artifacts, SQLancer light PR mode, targeted fuzzing, and inbox/outbox reliability tests).
v0.40.0 then focuses on operator trust and maintainability: generated SQL/GUC references, drain-mode proof, monitoring/alert rules, security-model and secret-handling docs, upgrade-gate coverage, unsafe-inventory PR gating, and continued decomposition of the largest files.
v0.41.0 through v0.45.0 form a second hardening arc driven by the findings
in the v0.40 overall assessment (plans/PLAN_OVERALL_ASSESSMENT_9.md). These
five releases systematically close every gap identified across 10 dimensions:
correctness (P0 cache-key and placeholder fixes), documentation truthfulness
(repair function implementation, catalog generator rewrite), test quality
(sleep removal, property tests, fuzz CI — merged into v0.42.0), performance
tunability (GUC-exposed thresholds, explain diagnostics), security
(search_path hardening, centralized SQL building), and operational readiness
(preflight functions, scalability infrastructure, CI completeness). Only after
this arc does the roadmap resume the embedding programme in v0.47.0–v0.48.0,
preserving the pgvector work while aligning the release order with the
assessment's conclusion that closing correctness and operational gaps matters
more than adding new surface area. The embedding programme itself is
consolidated into two releases: v0.47.0 for infrastructure and ANN maintenance,
and v0.48.0 completing the full feature set (sparse/half-precision aggregates,
hybrid search, the ergonomic embedding_stream_table() API, per-tenant ANN
patterns, and outbox-emitted embedding events). v0.46.0 precedes this arc
with the extraction of pg_tide — moving the outbox, inbox, and relay
subsystems into a standalone extension at trickle-labs/pg-tide.
v0.49.0 through v0.51.0 form the v1.0 readiness arc, driven by the findings in the v0.48.0 overall assessment (plans/PLAN_OVERALL_ASSESSMENT_10.md). The assessment confirmed that every P0 correctness issue from prior assessments is closed — EC-01 phantom rows, snapshot cache-key weakness, placeholder resolution, and WAL transition TOCTOU are all fixed. The project has transitioned from a capability problem to a coverage confidence problem. These three releases systematically close the remaining gaps across test reliability, performance, security hardening, operational polish, and documentation truth before v1.0.
v0.58.0 through v0.61.0 form the final assessment-driven hardening arc before v1.0, driven by the findings in the v0.57.0 overall assessment (plans/PLAN_OVERALL_ASSESSMENT_12.md). The assessment found 0 critical findings, 4 HIGH severity issues (ownership-check bypass in the outbox and publication APIs, recursive-CTE depth guard not applied in DIFFERENTIAL mode, multi-column NOT IN with NULL row semantics, and per-source SPI fan-out in the monitor health check), plus 23 MEDIUM and 20 LOW items spanning performance, observability, code quality, test coverage, and documentation. v0.58.0 closes all HIGH findings as a hard gate. v0.59.0 eliminates the performance and observability gaps. v0.60.0 completes the code quality and test coverage sweep. v0.61.0 delivers the final developer-experience and documentation polish, closing the last remaining items so that v1.0 is a clean, fully verified stable release.
v0.49.0 targets test infrastructure quality — the single highest-risk
category from the v10 assessment. All concurrency tests currently rely on
sleep(50ms) for synchronization, which provides false confidence: tests may
pass locally while missing real race conditions on slow CI runners or under
load. This release replaces sleep-based synchronization with pg_locks-polling
patterns throughout tests/e2e_concurrent_tests.rs. Alongside, ten source
modules that have zero #[cfg(test)] unit test coverage are systematically
addressed: catalog.rs, template_cache.rs, ivm.rs, cdc/polling.rs,
cdc/rebuild.rs, diagnostics.rs, logging.rs, metrics_server.rs, and
otel.rs. New fuzz targets are added for the refresh merge SQL codegen
(src/refresh/merge/) and row identity tracking (src/dvm/row_id.rs) — two
high-value surfaces with no current fuzz coverage. An E2E test for concurrent
DDL during active refresh (ALTER STREAM TABLE + in-flight refresh) is added.
The src/scheduler/mod.rs monolith (6,700+ lines) is decomposed into focused
submodule files: scheduling loop, parallel dispatch state, and cost model
each become separate files. The e2e-smoke CI filter is widened to cover join,
aggregate, and window operator regressions on every PR, and a consolidated
just fuzz-all recipe is added to the justfile.
v0.50.0 targets performance, security, and operational hardening. The
differential refresh hot path currently makes 3–4 separate SPI round-trips per
refresh cycle — buffer existence check, change count per source, and table row
estimate — that are consolidated into a single CTE query, saving 10–15ms per
multi-source refresh. The CDC trigger SQL generation loop is tightened using
String::with_capacity() to eliminate per-column heap allocations. The
watermark computation in the scheduler tick is consolidated into a single
compound query. On the security side, the src/citus.rs dblink calls that
use manual single-quote doubling for escaping are replaced with
pg_escape_literal() SPI calls for defense-in-depth. Operational gaps are
closed: the CNPG cluster-production.yaml gains a preStop lifecycle hook
that calls pgtrickle.drain(timeout_s => 120) before pod termination,
preventing interrupted in-flight refreshes during rolling upgrades. All Docker
base images are pinned to SHA256 digests for reproducible builds. The shared
memory invalidation ring capacity limit (1,024 entries) is documented in
docs/CONFIGURATION.md with a new pg_trickle_invalidation_ring_overflow
Prometheus counter. Two additional Prometheus metrics are added:
pg_trickle_dag_cycles_detected and pg_trickle_cache_stale_evictions.
The deep join chain Part 3 correction threshold GUC and its trade-off
between SQL complexity and correctness at >6 join tables is documented in the
configuration reference with an associated soak-test assertion.
v0.51.0 closes the Citus resilience gap and brings documentation into full truth — the chaos test rig (node kill, rebalance, and network-partition scenarios) proves that every Citus failure mode is handled, while deprecated GUC removals, ARCHITECTURE.md boundary updates, recursive CTE strategy documentation, and CDC-enabled-flag documentation eliminate the last documentation inaccuracies identified in the v10 assessment.
v0.52.0 through v0.55.0 form the final pre-1.0 hardening arc, driven by the findings in the v0.51.0 overall assessment (plans/PLAN_OVERALL_ASSESSMENT_11.md). The assessment found no critical issues, two HIGH findings (both performance-class), and 22 MEDIUM findings across correctness, performance, scalability, test coverage, code quality, security, and feature completeness. These four releases close every one of them in priority order.
v0.52.0 targets the two HIGH-severity performance gaps on the DVM hot
path. resolve_delta_template() currently resolves LSN placeholders by
calling .replace() twice per source OID — an O(k×n) scan for k source
tables in a SQL string of length n. This is replaced with a single
aho-corasick pass that resolves all placeholders in one traversal, cutting
multi-source refresh latency proportionally. Alongside, lookup_function_volatility()
currently makes one SPI round-trip to pg_proc for every unknown function in
a query — up to 50 ms overhead for function-heavy queries. A thread-local
HashMap<String, char> cache pre-populated with all PostgreSQL built-in
functions eliminates these trips on the hot path. Two further allocator
improvements close the LOW-rated findings: DiffContext::new() switches from
12 unconditional HashMap::new() calls to Option<HashMap> with lazy
initialization (saving 5–10 µs per refresh for simple queries), and the
template cache eviction path is replaced with a proper LRU data structure for
O(1) eviction instead of O(N) scanning.
v0.53.0 is the eleven-module unit test depth sweep. Five groups of source modules
that are responsible for core algorithmic logic — dag.rs (cycle detection,
topological sort, diamond detection, schedule resolution), the eight
scheduler/ submodules (cost model, tier transitions, watermark computation),
cdc.rs/cdc/polling.rs/cdc/rebuild.rs (buffer naming, column escaping,
trigger SQL), all five dvm/parser/ files (Expr::to_sql(), AggFunc
classification, strip_qualifier()), and config.rs (mode parsing, threshold
conversion) — have zero inline #[cfg(test)] unit tests and are only
exercised through full-stack E2E tests. This release adds focused
#[cfg(test)] modules to every one of them using mock structures that require
no PostgreSQL backend. proptest! coverage is extended to DAG cycle detection
and schedule resolution. The two remaining fixed-sleep tests in
e2e_buffer_growth_tests.rs (7s and 20s sleeps) are replaced with adaptive
pg_locks-polling helpers.
v0.54.0 hardens the DVM engine against pathological queries and slow
parsing. diff_node() gains a depth counter that errors on breach of
max_parse_depth (default 64), preventing stack overflow on extreme nesting.
DiffContext gains a configurable CTE count ceiling (default 1,000) that
returns a clean error before OOM can occur. The snapshot cache fingerprint is
computed and stored at OpTree construction time instead of re-traversing the
tree on every diff cycle, and Expr::to_sql() caches its result string to
eliminate redundant allocations. View inlining (rewrite_views_inline()) is
refactored to batch all relkind lookups into a single SPI query and use a
fixpoint check (no changes this iteration) instead of a hard counter, cutting
3-level view hierarchies from ~15 ms to a single parse + one SPI call. The
ST-to-ST frontier resolver is hardened to return PgTrickleError::SourceNotFound
instead of silently defaulting to "0/0" when a required source is missing.
Finally, diamond detection is reimplemented with a BFS-based visited-set merge
algorithm, reducing complexity from O(V²) pairwise comparisons to O(V+E)
— critical for deployments with 500+ stream tables.
v0.55.0 delivers the final pre-1.0 polish pass across scalability,
module structure, security, documentation, and one new SQL feature. The
shared-memory invalidation ring capacity (currently hardcoded at 1,024) becomes
a GUC with a default of 1,024 and a maximum of 4,096, preventing excessive full
DAG rebuilds in DDL-burst environments. src/api/mod.rs (7,600+ lines) is
decomposed into focused submodules (api/create.rs, api/alter.rs,
api/refresh.rs), and src/monitor.rs (4,000+ lines) is split into
monitor/alert.rs, monitor/health.rs, and monitor/tree.rs. NOTIFY alert
payloads are switched from manual string escaping to serde_json::json!()
to guarantee correct JSON for error messages containing backslashes or control
characters. The DVM parser gains automatic rewriting of multi-column row IN
subqueries of the form WHERE (a, b) IN (SELECT x, y FROM ...) into the equivalent
EXISTS form, closing the last user-visible SQL coverage gap. DVM parse
timing metrics (pg_trickle_dvm_parse_ms, pg_trickle_delta_query_size_bytes)
are added to the Prometheus metrics endpoint. The __PGS_/__PGT_ reserved
column-name prefixes are documented in docs/SQL_REFERENCE.md, rationale
comments are added to all magic-number GUC defaults in src/config.rs, and
coverage reporting is added to the PR gate so regressions are visible before
merge.
v0.51.0 closes the Citus resilience gap and brings documentation into full
truth. The Citus distributed support shipped in v0.32–v0.34 has never had
a chaos test suite — there are zero tests validating behaviour under node
failure, shard rebalance, or network partition. This release delivers a
docker-compose-based chaos rig with three scenarios: coordinator restart,
worker node kill with automatic reconnect, and rolling shard rebalance during
active refresh. The deprecated pg_trickle.event_driven_wake GUC (non-functional
since background workers cannot use LISTEN) is removed entirely along with
all associated code paths and the runtime warning it emits. Documentation is
brought to full truth: docs/ARCHITECTURE.md is updated to clearly describe
the pg_tide boundary after v0.46.0 extraction; docs/CONFIGURATION.md gains
a deprecation header on the removed GUC entry; the recursive CTE strategy
selection heuristic (semi-naive vs. DRed vs. recomputation fallback) is
documented for the first time with an example EXPLAIN output; and a note is
added to docs/CONFIGURATION.md clarifying that CDC triggers fire even when
pg_trickle.enabled = false (by design, to keep buffers ready for re-enable).
Release Process
This document describes how to create a release of pg_trickle.
Overview
Releases are fully automated via GitHub Actions. Pushing a version tag (v*)
triggers the Release workflow, which:
- Runs a preflight version-sync check to ensure all version references match the tag
- Builds extension packages for Linux (amd64), macOS (arm64), and Windows (amd64)
- Smoke-tests the Linux artifact against a live PostgreSQL 18 instance
- Creates a GitHub Release with archives and SHA256 checksums
- Builds and pushes a multi-arch extension image to GHCR (for CNPG Image Volumes)
A separate PGXN workflow also fires on the same
v* tag and publishes the source archive to the PostgreSQL Extension Network.
Prerequisites
- Push access to the repository (or a PR merged by a maintainer)
- All CI checks passing on main (verify the last run on the version-bump commit succeeded)
- The version in Cargo.toml matches the tag you intend to push
- Required GitHub secrets configured (see Required GitHub Secrets below)
Required GitHub Secrets
The release automation uses the following GitHub Actions secrets. Set them under Settings → Secrets and variables → Actions → New repository secret.
| Secret | Used by | Description |
|---|---|---|
| PGXN_USERNAME | pgxn.yml | Your PGXN account username. Used to authenticate the curl upload to PGXN Manager when publishing source archives to the PostgreSQL Extension Network. Register at pgxn.org. |
| PGXN_PASSWORD | pgxn.yml | Password for the PGXN account above. Never hardcode this — it must be stored as a secret so it is never exposed in logs or committed to the repository. |
| CODECOV_TOKEN | coverage.yml | Upload token for Codecov. Used to publish unit and E2E coverage reports. Obtain it from the Codecov dashboard after linking the repository. The workflow degrades gracefully (fail_ci_if_error: false) if absent. |
| BENCHER_API_TOKEN | benchmarks.yml | API token for Bencher, the continuous benchmarking platform. Used to track Criterion benchmark results on main and detect regressions on pull requests. The benchmark steps are skipped entirely when this secret is absent, so CI still passes without it. Create a project at bencher.dev and copy the token from the project settings. |
Note: The GITHUB_TOKEN secret is provided automatically by GitHub Actions and does not need to be configured manually. It is used by the release workflow to create GitHub Releases, by the Docker workflow to push images to GHCR, and by Bencher to post PR comments.
Step-by-Step
1. Decide the version number
Follow Semantic Versioning:
| Change type | Bump | Example |
|---|---|---|
| Breaking SQL API or config change | Major | 1.0.0 → 2.0.0 |
| New feature, backward-compatible | Minor | 0.1.0 → 0.2.0 |
| Bug fix, no API change | Patch | 0.2.0 → 0.2.1 |
| Pre-release / release candidate | Suffix | 0.3.0-rc.1 |
2. Update the version
Three files must have their version bumped together:
# 1. Cargo.toml — the canonical version source for the extension
# Change: version = "0.7.0" → version = "0.8.0"
# 2. META.json — the PGXN package metadata
# Change both top-level "version" and the nested "provides" version
# 3. CHANGELOG.md
# Rename ## [Unreleased] → ## [0.8.0] — YYYY-MM-DD
# Add a new empty ## [Unreleased] section at the top
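The CHANGELOG step can also be scripted; a minimal sketch using GNU sed, where the file contents and version number are stand-ins (the real release flow may simply edit the file by hand):

```shell
# Minimal sketch of the CHANGELOG stamping step using GNU sed.
# The file contents and version below are stand-ins for this example.
set -eu
f=$(mktemp)
printf '## [Unreleased]\n\n- something new\n' > "$f"
ver=0.8.0
today=$(date +%Y-%m-%d)
# Rename "## [Unreleased]" to "## [<ver>] — <date>" and re-add a fresh
# empty "## [Unreleased]" section above it.
sed -i "s/^## \[Unreleased\]$/## [Unreleased]\n\n## [$ver] — $today/" "$f"
cat "$f"
```

After this, the pending entries sit under the dated version heading and a fresh Unreleased section is ready for the next cycle.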
The just check-version-sync script enforces version consistency across the workspace.
The extension control file (pg_trickle.control) uses
default_version = '@CARGO_VERSION@', which pgrx substitutes automatically at
build time — no manual edit needed there.
After editing, verify all version-related files are in sync:
just check-version-sync
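To make the check concrete, here is a hedged sketch of the kind of comparison such a recipe can perform over stand-in files; the actual just check-version-sync implementation covers many more files and may work differently:

```shell
# Hedged sketch of a version-sync check over stand-in files; the real
# `just check-version-sync` recipe also covers ci.yml, the justfile, and tests.
set -eu
dir=$(mktemp -d)
printf '[package]\nname = "pg_trickle"\nversion = "0.8.0"\n' > "$dir/Cargo.toml"
printf '{ "version": "0.8.0", "provides": { "pg_trickle": { "version": "0.8.0" } } }\n' > "$dir/META.json"

# Extract the Cargo.toml version and every "version" value in META.json.
cargo_ver=$(sed -n 's/^version = "\([^"]*\)".*/\1/p' "$dir/Cargo.toml")
meta_vers=$(grep -o '"version": "[^"]*"' "$dir/META.json" | cut -d'"' -f4 | sort -u)

if [ "$meta_vers" = "$cargo_ver" ]; then
  echo "All version references are in sync."
else
  echo "Version skew detected: Cargo.toml=$cargo_ver META.json=$meta_vers" >&2
  exit 1
fi
```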
3. Commit the version bump
git add Cargo.toml META.json CHANGELOG.md
git commit -m "release: v0.8.0"
git push origin main
4. Wait for CI to pass and verify upgrade completeness
Ensure the CI workflow passes on main with
the version bump commit. All unit, integration, E2E, and pgrx tests must be
green.
Critical: Before tagging, verify that the upgrade script covers all SQL schema changes:
# Run comprehensive upgrade completeness checks
just check-upgrade-all
# If any check fails (e.g. "ERROR: X new function(s) missing from upgrade script"),
# fix the issue by adding the missing SQL objects to:
# sql/pg_trickle--<prev>--<new>.sql
#
# Then re-run until all checks pass:
just check-upgrade-all # Should print "All 15 upgrade step(s) passed completeness checks."
Why this matters: New SQL functions, views, tables, and columns added in any prior
release must be carried forward in the upgrade script, even if the current release
doesn't change them. The upgrade script is the source of truth for what PostgreSQL
applies when users run ALTER EXTENSION pg_trickle UPDATE.
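As an illustration of the failure mode, a simplified stand-in for this kind of completeness check follows; the real just check-upgrade-all recipe is more thorough, and the file names and function names below are invented for the example:

```shell
# Simplified stand-in for an upgrade-completeness check: every CREATE FUNCTION
# in the full install script must also appear in the upgrade script.
# File names and function names below are invented for this example.
set -eu
dir=$(mktemp -d)
cat > "$dir/pg_trickle--0.8.0.sql" <<'EOF'
CREATE FUNCTION pgtrickle.create_stream_table() RETURNS void LANGUAGE sql AS '';
CREATE FUNCTION pgtrickle.drop_stream_table() RETURNS void LANGUAGE sql AS '';
EOF
# Upgrade script that forgot to carry one function forward:
cat > "$dir/pg_trickle--0.7.0--0.8.0.sql" <<'EOF'
CREATE FUNCTION pgtrickle.create_stream_table() RETURNS void LANGUAGE sql AS '';
EOF

# Collect every function name present in the full script but absent
# from the upgrade script.
missing=$(grep -o 'CREATE FUNCTION [a-z_.]*' "$dir/pg_trickle--0.8.0.sql" | awk '{print $3}' |
  while read -r fn; do
    grep -q "CREATE FUNCTION $fn" "$dir/pg_trickle--0.7.0--0.8.0.sql" || echo "$fn"
  done)
if [ -n "$missing" ]; then
  echo "ERROR: function(s) missing from upgrade script: $missing"
fi
```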
Confirm the local and CI upgrade-E2E defaults were advanced to the new release:
just check-version-sync # Verifies ci.yml, justfile, and test defaults
5. Create and push the tag
git tag -a v0.2.0 -m "Release v0.2.0"
git push origin v0.2.0
This triggers the Release workflow automatically.
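Since the preflight job fails the workflow when the tag and Cargo.toml disagree, a local guard before pushing can save a wasted run; a hypothetical sketch using a stand-in Cargo.toml:

```shell
# Hypothetical pre-tag guard: refuse to tag unless the tag matches the
# Cargo.toml version (a stand-in Cargo.toml is created for illustration).
set -eu
tag="v0.8.0"
dir=$(mktemp -d)
printf '[package]\nversion = "0.8.0"\n' > "$dir/Cargo.toml"
ver=$(sed -n 's/^version = "\([^"]*\)".*/\1/p' "$dir/Cargo.toml")
if [ "$tag" = "v$ver" ]; then
  echo "tag $tag matches Cargo.toml version $ver"
else
  echo "refusing to tag: $tag does not match v$ver" >&2
  exit 1
fi
```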
6. Monitor the release
Watch the Actions tab for progress. The release workflow runs these jobs in order:
preflight ──► build-release (linux, macos, windows)
                      │
                      ▼
                test-release ──► publish-release
                             ──► publish-docker-arch (linux/amd64 + linux/arm64)
                                          │
                                          ▼
                                  publish-docker (merge manifest + push :latest)
The PGXN workflow (pgxn.yml) runs independently and publishes the source
archive to pgxn.org in parallel with the release workflow.
7. Make the GHCR package public (first release only)
When a package is pushed to GHCR for the first time it is private by default. Because this is an open-source project, packages linked to the public repository are eligible for public visibility, but you must flip the package to public once to unlock that:
- Go to github.com/⟨owner⟩ → Packages → pg_trickle-ext
- Click Package settings
- Scroll to Danger Zone → Change package visibility → set to Public
After that first change:
- All future pushes keep the package public automatically
- Unauthenticated docker pull ghcr.io/trickle-labs/pg_trickle-ext:... works
- Storage and bandwidth are free (GHCR open-source advantage)
- The package page shows the README, linked repository, license, and description from the OCI labels
8. Verify the release
Once both workflows complete:
- Check the GitHub Releases page for the new release
- Verify all three platform archives are attached (.tar.gz for Linux/macOS, .zip for Windows)
- Verify SHA256SUMS.txt is present
- Verify the extension image is available at ghcr.io/trickle-labs/pg_trickle-ext:<version>
- Verify the PGXN upload succeeded: pgxn info pg_trickle should show the new version
- Optionally verify the extension image layout:
docker pull ghcr.io/trickle-labs/pg_trickle-ext:<version>
ID=$(docker create ghcr.io/trickle-labs/pg_trickle-ext:<version>)
docker cp "$ID:/lib/" /tmp/ext-lib/
docker cp "$ID:/share/" /tmp/ext-share/
docker rm "$ID"
ls -la /tmp/ext-lib/ /tmp/ext-share/extension/
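The checksum file can also be verified mechanically; a sketch using a stand-in archive (with a real release, run sha256sum -c against the files downloaded from the Releases page):

```shell
# Sketch: verify release archives against SHA256SUMS.txt.
# The archive here is a stand-in file created for the example.
set -eu
dir=$(mktemp -d)
echo "stand-in archive bytes" > "$dir/pg_trickle-0.8.0-pg18-linux-amd64.tar.gz"
(cd "$dir" && sha256sum pg_trickle-0.8.0-pg18-linux-amd64.tar.gz > SHA256SUMS.txt)
# Prints "<file>: OK" for each archive whose checksum matches.
(cd "$dir" && sha256sum -c SHA256SUMS.txt)
```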
Post-Release Checklist
Complete these steps immediately after a release tag has been pushed and both the Release and PGXN workflows have finished successfully.
- Create a post-release branch from main (e.g. post-release-<ver>-a)
- Bump Cargo.toml version to the next development version (e.g. 0.12.0 → 0.13.0)
- Bump META.json — both the top-level "version" and the nested "provides" → "pg_trickle" → "version" to match
- Write plans/PLAN_0_<next>_0.md — initial planning document for the next milestone
- Delete plans/PLAN_0_<released>_0.md — remove the now-completed plan
- Wrap roadmap items — in ROADMAP.md, wrap all completed items from the old release with <details> tags to archive them
- Add ## [Unreleased] stub to CHANGELOG.md above the just-released entry
- Create sql/pg_trickle--<released>--<next>.sql — empty upgrade script stub for the next migration hop
- Copy sql/archive/pg_trickle--<released>.sql → sql/archive/pg_trickle--<next>.sql — placeholder archive baseline for the next version
- Update justfile — advance the "to" defaults of build-upgrade-image and test-upgrade to <next>; update the build-hub Docker image tag
- Update tests/e2e_upgrade_tests.rs — advance all unwrap_or("<released>".into()) fallback strings to <next>
- Update version numbers in README.md — search for occurrences of the released version (e.g. 0.17.0) and advance them to <next>: the CNPG image reference (ghcr.io/trickle-labs/pg_trickle-ext:<version>), the dbt revision tag, and any other hardcoded version strings. A quick check: grep -n '<released>' README.md
- Run just check-version-sync — must exit 0 before opening the PR
- Open a PR against main with the commit title chore: start v<next> development cycle
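The SQL-stub items in the checklist can be sketched as follows, using 0.8.0 as the just-released version and 0.9.0 as the next development version (both are example values):

```shell
# Sketch of the SQL stub steps, with 0.8.0 as the just-released version and
# 0.9.0 as the next development version (example values only).
set -eu
root=$(mktemp -d)
mkdir -p "$root/sql/archive"
echo '-- full 0.8.0 baseline (stand-in)' > "$root/sql/archive/pg_trickle--0.8.0.sql"
# Empty upgrade script stub for the next migration hop:
: > "$root/sql/pg_trickle--0.8.0--0.9.0.sql"
# Placeholder archive baseline for the next version (copy of the released one):
cp "$root/sql/archive/pg_trickle--0.8.0.sql" "$root/sql/archive/pg_trickle--0.9.0.sql"
ls "$root/sql" "$root/sql/archive"
```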
Preparing for the Next Release (Pre-Work Checklist)
Use this checklist at the start of each new release milestone to ensure the repository is properly set up before development begins. This maps directly to what just check-version-sync verifies.
| File / target | Action | check-version-sync check |
|---|---|---|
| Cargo.toml | version = "<next>" | canonical version source |
| META.json | both "version" fields set to <next> | PGXN manifest |
| CHANGELOG.md | ## [Unreleased] section present | (manual hygiene) |
| sql/pg_trickle--<prev>--<next>.sql | stub file exists | upgrade SQL exists |
| sql/archive/pg_trickle--<next>.sql | placeholder file exists (copy of <prev>) | archive SQL exists |
| .github/workflows/ci.yml | upgrade matrix and chain end at <next> | CI matrix up to date |
| justfile | build-upgrade-image and test-upgrade "to" defaults = <next> | justfile defaults |
| tests/e2e_upgrade_tests.rs | all unwrap_or fallbacks = "<next>" | e2e fallback strings |
Quick-verify with:
just check-version-sync
# Should print: All version references are in sync.
Release Artifacts
Each release produces:
| Artifact | Description |
|---|---|
| `pg_trickle-<ver>-pg18-linux-amd64.tar.gz` | Extension files for Linux x86_64 |
| `pg_trickle-<ver>-pg18-macos-arm64.tar.gz` | Extension files for macOS Apple Silicon |
| `pg_trickle-<ver>-pg18-windows-amd64.zip` | Extension files for Windows x64 |
| `SHA256SUMS.txt` | SHA-256 checksums for all archives |
| `ghcr.io/trickle-labs/pg_trickle-ext:<ver>` | CNPG extension image for Image Volumes (amd64 + arm64) |
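Before installing, it is worth verifying a downloaded archive against `SHA256SUMS.txt`. The snippet below fabricates a stand-in archive so it is runnable anywhere; for a real release, run the same `sha256sum -c` in the directory containing the downloaded files.

```shell
# Sketch: verify a release archive against the published checksums file.
# The archive here is a fabricated stand-in, not a real release artifact.
set -eu
dir=$(mktemp -d)
cd "$dir"

printf 'demo payload\n' > pg_trickle-0.0.0-pg18-linux-amd64.tar.gz
sha256sum pg_trickle-0.0.0-pg18-linux-amd64.tar.gz > SHA256SUMS.txt

# Prints "<file>: OK" and exits 0 when the checksum matches.
sha256sum -c SHA256SUMS.txt
```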
Installing from an archive
tar xzf pg_trickle-<version>-pg18-linux-amd64.tar.gz
cd pg_trickle-<version>-pg18-linux-amd64
sudo cp lib/*.so "$(pg_config --pkglibdir)/"
sudo cp extension/*.control extension/*.sql "$(pg_config --sharedir)/extension/"
Then add to postgresql.conf and restart:
shared_preload_libraries = 'pg_trickle'
See Installation for full installation details.
Pre-releases
Tags containing `-rc`, `-beta`, or `-alpha` (e.g., `v0.3.0-rc.1`) are automatically marked as pre-releases on GitHub. Pre-release extension images are tagged but do not update the `latest` tag.
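The classification rule above can be sketched as a small shell function. This is illustrative only; the actual check lives in the release workflow, and the function name is invented here.

```shell
# Sketch of the pre-release rule: a tag containing -rc, -beta, or -alpha
# is a pre-release; everything else is stable. (Illustrative helper.)
is_prerelease() {
    case "$1" in
        *-rc*|*-beta*|*-alpha*) echo "prerelease" ;;
        *) echo "stable" ;;
    esac
}

is_prerelease v0.3.0-rc.1   # → prerelease
is_prerelease v0.3.0        # → stable
```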
Hotfix Releases
For urgent fixes on an older release:
# Branch from the tag
git checkout -b hotfix/v0.2.1 v0.2.0
# Apply fix, bump version to 0.2.1
git commit -am "fix: ..."
git push origin hotfix/v0.2.1
# Tag from the branch (CI will still run the release workflow)
git tag -a v0.2.1 -m "Release v0.2.1"
git push origin v0.2.1
Files to Update for Each Release
Every release requires manual updates to the files below. Missing any of them leads to version skew between the code, the docs, and the packages.
| File | What to change | Why |
|---|---|---|
| `Cargo.toml` | `version = "x.y.z"` field | The canonical version source. pgrx reads this at build time and substitutes it into `pg_trickle.control` via `@CARGO_VERSION@`. The git tag must match. |
| `META.json` | Both `"version"` fields (top-level and inside `"provides"`) | The PGXN package manifest. The `pgxn.yml` workflow uploads this file as part of the source archive; a stale version here means the wrong version appears on pgxn.org. |
| `CHANGELOG.md` | Rename `## [Unreleased]` → `## [x.y.z] — YYYY-MM-DD`; add a new empty `## [Unreleased]` at the top | Keeps the public changelog accurate and gives downstream users a dated record of changes. |
| `ROADMAP.md` | Update the preamble's latest-release/current-milestone lines; mark the released milestone done; advance the "We are here" pointer to the next milestone | Keeps the forward-looking plan aligned with reality. Leaves no confusion about what just shipped versus what is next. |
| `README.md` | Update the test-count line (~N unit tests + M E2E tests) if test counts changed significantly | The README is the first thing users read; stale numbers erode trust. |
| `INSTALL.md` | Update any version numbers in install commands or example URLs | Users copy-paste installation commands; stale versions cause failures. |
| `docs/UPGRADING.md` | Add the new version-specific migration notes and extend the supported upgrade-path table | Documents exactly what `ALTER EXTENSION ... UPDATE` will do and which chains are supported. |
| `sql/pg_trickle--<old>--<new>.sql` | Add or update the hand-authored upgrade script for every SQL-surface change (new objects, changed signatures, changed defaults, view changes). Also carry forward all functions/views/tables added in previous releases — the upgrade script is cumulative. | `ALTER EXTENSION ... UPDATE` only applies what is explicitly scripted; function defaults and signatures stored in `pg_proc` do not update themselves. Omitting a function that existed in `<old>` but is expected in `<new>` will break user upgrades. |
| `sql/archive/pg_trickle--<new>.sql` | Regenerate and commit the full-install SQL baseline for the new version. This file was created as a placeholder copy of `<prev>` at the start of the development cycle — it must be replaced with the actual generated SQL before tagging. Run `cargo pgrx schema` (or the equivalent `just` target) to produce the final schema, then overwrite the placeholder. | Future upgrade-completeness checks and upgrade E2E tests need an exact baseline for the released version. A stale placeholder from the start of the cycle will cause spurious failures. |
| `.github/workflows/ci.yml`, `justfile`, `tests/build_e2e_upgrade_image.sh`, `tests/Dockerfile.e2e-upgrade` | Advance the upgrade-check chain and the default upgrade-E2E target version to the new release | Prevents release automation and local upgrade validation from getting stuck on the previous version after a new migration hop is added. |
| `pg_trickle.control` | No manual edit needed — `default_version` is set to `'@CARGO_VERSION@'` and pgrx substitutes it at build time. Verify the substitution in the built artifact. | Ensures the SQL `CREATE EXTENSION` command installs the right version. |
CRITICAL: After updating `sql/pg_trickle--<old>--<new>.sql`, always run `just check-upgrade-all` to verify that the upgrade script is complete. This checks not just the immediate hop to the new version, but the entire upgrade chain from v0.1.3 onwards. If the check fails (e.g. "ERROR: 3 new function(s) missing"), it means the upgrade script is missing one or more SQL objects that users will expect to have after upgrading. Fix all failures before tagging.
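The `pg_trickle.control` verification step from the table above can be sketched as a quick grep-and-extract. The control file below is fabricated for the demo; in a real checkout you would point this at the built `pg_trickle.control` artifact instead.

```shell
# Sketch: confirm pgrx replaced the @CARGO_VERSION@ placeholder in the
# built control file. Control-file contents are fabricated for the demo.
set -eu
ctl=$(mktemp)
cat > "$ctl" <<'EOF'
comment = 'self-maintaining materialized views for PostgreSQL'
default_version = '0.18.0'
module_pathname = '$libdir/pg_trickle'
EOF

# The placeholder must not survive the build...
! grep -q '@CARGO_VERSION@' "$ctl"
# ...and default_version is what CREATE EXTENSION will install.
default_ver=$(sed -n "s/^default_version = '\(.*\)'$/\1/p" "$ctl")
echo "default_version = $default_ver"
```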
Checklist summary
- [ ] `Cargo.toml` — version bumped
- [ ] `META.json` — both `"version"` fields updated to match
- [ ] `CHANGELOG.md` — `[Unreleased]` renamed to `[x.y.z]` with date; new empty `[Unreleased]` added
- [ ] `ROADMAP.md` — preamble updated; released milestone marked done
- [ ] `README.md` — test counts current (if materially changed)
- [ ] `INSTALL.md` — version references current
- [ ] `docs/UPGRADING.md` — latest migration notes and supported chains added
- [ ] `sql/pg_trickle--<old>--<new>.sql` — covers every SQL-surface change AND carries forward all previous release functions
- [ ] `sql/archive/pg_trickle--<new>.sql` — regenerated from final schema and committed (replaces the dev-cycle placeholder)
- [ ] `just check-upgrade-all` — all upgrade steps pass completeness checks (not just the one-step hop)
- [ ] Upgrade automation defaults — CI/local upgrade checks and E2E target the new version
- [ ] `just check-version-sync` — all version references in sync
- [ ] All CI checks on main have passed (verify the last run on the version-bump commit succeeded)
- [ ] git tag matches `Cargo.toml` version
Troubleshooting
Release workflow failed
Go to the Actions tab and identify which job failed. Then follow the appropriate recovery path below.
Option A: Re-run (transient failure)
If the failure is transient — network timeout, registry hiccup, runner issue — you can re-run without changing anything:
- Open the failed workflow run in the Actions tab
- Click Re-run all jobs (or re-run just the failed job)
This works because the `v*` tag still points to the same commit, and the workflow uses `cancel-in-progress: false` so a re-run won't be cancelled.
Option B: Fix code and re-tag
If the failure is a real build or code issue:
# 1. Delete the remote tag
git push origin :refs/tags/v0.2.0
# 2. Delete the local tag
git tag -d v0.2.0
# 3. Fix the issue, commit, and push
git add <files>
git commit -m "fix: ..."
git push origin main
# 4. Re-tag on the new commit and push
git tag -a v0.2.0 -m "Release v0.2.0"
git push origin v0.2.0
This triggers a fresh release workflow run.
Option C: Clean up a partial GitHub Release
If the workflow created a draft or partial Release before failing:
- Go to Releases in the repository
- Delete the broken release (this does not delete the tag)
- Then follow Option A or Option B above
Upgrade script completeness check failed
If just check-upgrade-all reports errors like "ERROR: X new function(s) missing from upgrade script", it means the upgrade SQL script is incomplete:
# 1. Look at the error — it tells you exactly what's missing
just check-upgrade-all # e.g. "ERROR: 3 new function(s) missing from upgrade script:
# - pgtrickle.\"explain_refresh_mode\"
# - pgtrickle.\"fuse_status\"
# - pgtrickle.\"reset_fuse\""
# 2. Find where those objects are defined in the previous release
# (they should already exist in sql/archive/pg_trickle--<prev>.sql)
grep -n "CREATE.*FUNCTION.*explain_refresh_mode" sql/archive/pg_trickle--*.sql
# 3. Copy the function definitions (CREATE OR REPLACE FUNCTION) to the
# upgrade script you're fixing. They should go into:
# sql/pg_trickle--<old>--<new>.sql
#
# Typically, carry-forward functions are grouped in their own section
# at the top of the upgrade script with a comment explaining they're
# from a prior release.
# 4. Re-run the check to verify it passes
just check-upgrade-all
Why this happens: When a new release (e.g. v0.11.0) adds SQL functions, those
functions must be explicitly included in all subsequent upgrade scripts. The upgrade
script is the ground truth — PostgreSQL only applies what is listed in the .sql file.
If you skip a function that users expect, their upgraded extension will be missing
that object.
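The set comparison behind a completeness check like this can be sketched in shell: extract function names from the previous release baseline and from the upgrade script, and report what the upgrade script fails to carry forward. Both SQL files below are fabricated fragments for illustration, and the extraction regex is a simplification of whatever the real check does.

```shell
# Sketch: find functions in the previous baseline that are missing from
# the upgrade script. SQL contents are fabricated for the demo.
set -eu
dir=$(mktemp -d)

cat > "$dir/baseline.sql" <<'EOF'
CREATE OR REPLACE FUNCTION pgtrickle.fuse_status() RETURNS text LANGUAGE sql AS $$ SELECT 'ok' $$;
CREATE OR REPLACE FUNCTION pgtrickle.reset_fuse() RETURNS void LANGUAGE sql AS $$ SELECT $$;
EOF

cat > "$dir/upgrade.sql" <<'EOF'
CREATE OR REPLACE FUNCTION pgtrickle.fuse_status() RETURNS text LANGUAGE sql AS $$ SELECT 'ok' $$;
EOF

# Extract sorted function names from each file, then diff the sets.
extract_fns() { sed -n 's/^CREATE OR REPLACE FUNCTION \([a-z_.]*\)(.*/\1/p' "$1" | sort; }
extract_fns "$dir/baseline.sql" > "$dir/baseline_fns"
extract_fns "$dir/upgrade.sql"  > "$dir/upgrade_fns"

# Names only in the baseline = objects not carried forward.
missing=$(comm -23 "$dir/baseline_fns" "$dir/upgrade_fns")
echo "missing from upgrade script: $missing"
```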
Common failure causes
| Symptom | Cause | Fix |
|---|---|---|
| Version mismatch error | Cargo.toml version doesn't match the git tag | Run just check-version-sync, fix any skew, commit, delete tag, re-tag (Option B) |
| Build failure | Compilation error in release profile | Fix on main, re-tag (Option B) |
| Docker push failed | Missing permissions | Verify packages: write is in the workflow and GITHUB_TOKEN has GHCR access, then re-run (Option A) |
| Smoke test failed | Extension doesn't load in PostgreSQL | Fix the issue, re-tag (Option B) |
| PGXN upload failed | Missing PGXN_USERNAME / PGXN_PASSWORD secrets, or META.json version not updated | Add the secrets in repository settings; verify META.json version matches the tag; re-run the pgxn.yml workflow from the Actions tab |
| `just check-upgrade-all` reports missing functions/views | Upgrade script is incomplete — new objects from prior releases not carried forward | See "Upgrade script completeness check failed" above for recovery steps |
| Rate limited | GitHub API or GHCR throttling | Wait a few minutes, then re-run (Option A) |
Yanking a release
If a release has a critical issue:
- Mark it as pre-release on the GitHub Releases page (uncheck "Set as the latest release")
- Add a warning to the release notes
- Publish a patch release with the fix
Project History
pg_trickle started with a practical goal. We were inspired by data platforms built around pipelines that keep themselves incrementally up to date, and we wanted to bring that same style of self-maintaining data flow directly into PostgreSQL. In particular, we needed support for recursive CTEs, which were essential to the kinds of pipelines we had in mind. We could not find an open-source incremental view-maintenance system that matched that requirement, so pg_trickle began as an attempt to close the gap.
It also became an experiment in what coding agents could realistically help build. We set out to develop pg_trickle without editing code by hand, while still holding it to the same bar we would expect from any other systems project: broad feature coverage, strong code quality, extensive tests, and thorough documentation. Skepticism toward AI-written software is reasonable; the right way to evaluate pg_trickle is by the codebase, the tests, and the docs.
That constraint changed how we worked. Agents can produce a lot of surface area quickly, but database systems are unforgiving of vague assumptions and hidden edge cases. To make the project hold together, we had to be unusually explicit about architecture, operator semantics, failure handling, and test coverage. In practice, that pushed us toward more written design, more reviewable behavior, and more verification than a quick prototype would normally get.
The result is a spec-driven development process, not vibe-coding. Every feature starts as a written plan — an architecture decision record, a gap analysis, or a phased implementation spec — before any code is generated. The `plans/` directory contains over 110 documents covering operator semantics, CDC trade-offs, performance strategies, ecosystem comparisons, and edge-case catalogues. Agents work from these specs; the specs are reviewed and revised by humans. This is what makes it possible to maintain coherence across a large codebase without manually editing every line: the design is explicit, the invariants are written down, and the tests verify both.
We also do not think the use of AI should lower the standard for trust. If anything, it raises it. The point of the experiment was not to ask people to trust the toolchain; it was to see whether disciplined use of coding agents could help produce a serious, inspectable PostgreSQL extension. Whether that worked is for readers and users to judge, but the intent is simple: make the code, the tests, the documentation, and the trade-offs visible enough that the project can stand on its own merits.
Contributors
- Geir O. Grønmo
- Baard H. Rehn Johansen
- GitHub Copilot — AI pair programmer
See also: Roadmap · Changelog · Contributing
Viewing on GitHub? The contributing guide lives in CONTRIBUTING.md. This stub is served by the pg_trickle docs site — the include below renders there.
Contributing to pg_trickle
Thank you for your interest in contributing! pg_trickle is an Apache 2.0-licensed open-source project and welcomes contributions of all kinds.
Before You Start
- Check the open issues and discussions to avoid duplicating work.
- For non-trivial changes, open an issue first to discuss the approach.
- Read AGENTS.md — it is the authoritative guide for all coding conventions, error handling rules, module layout, and test requirements.
- Read docs/ARCHITECTURE.md to understand the system.
- Read ROADMAP.md to see what work is planned.
Ways to Contribute
| Type | Where to start |
|---|---|
| Bug report | Open an issue |
| Feature request | Open an issue or start a discussion |
| Documentation fix | Open a PR directly — no issue needed for typos/clarity |
| Code fix or feature | Open an issue first, then a PR |
| Performance improvement | Include benchmark numbers (see just bench) |
Development Setup
# Install pgrx
cargo install cargo-pgrx --version "=0.18.0"
cargo pgrx init --pg18 /usr/lib/postgresql/18/bin/pg_config
# Build
cargo build
# Format + lint (required before every PR)
just fmt
just lint
# Run tests
just test-unit # fast, no DB
just test-integration # Testcontainers
just test-light-e2e # PR-equivalent Light E2E tier (stock postgres)
just test-e2e # full E2E (builds Docker image)
just test-pgbouncer # PgBouncer transaction-pool compatibility tests
Full setup instructions are in INSTALL.md.
Devcontainer / Containerized Development
If you are developing in a devcontainer, use the default non-root vscode user
and run the normal commands from the workspace root:
just fmt
just lint
just test-unit
`just test-unit` uses `scripts/run_unit_tests.sh`, which now selects a writable and cache-friendly target directory in this order:

1. `target/` (preferred)
2. `.cargo-target/` (project-local fallback)
3. `$HOME/.cache/pg_trickle-target`
4. `${TMPDIR:-/tmp}/pg_trickle-target` (last resort)
This avoids permission failures on bind mounts and preserves incremental builds when source or test files change.
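The fallback order above can be sketched as a small shell function that picks the first candidate it can create and write to. This is illustrative, not the actual logic of `scripts/run_unit_tests.sh`, and the function name is invented here.

```shell
# Sketch of the fallback order: pick the first candidate target directory
# that can be created and is writable. (Illustrative helper only.)
pick_target_dir() {
    for dir in target .cargo-target "$HOME/.cache/pg_trickle-target" \
               "${TMPDIR:-/tmp}/pg_trickle-target"; do
        if mkdir -p "$dir" 2>/dev/null && [ -w "$dir" ]; then
            echo "$dir"
            return 0
        fi
    done
    return 1
}

CARGO_TARGET_DIR=$(pick_target_dir)
echo "using CARGO_TARGET_DIR=$CARGO_TARGET_DIR"
```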
If you see permission errors in containerized runs, verify you are not forcing a different container user/UID than expected by your workspace mount.
Run E2E tests in devcontainer
E2E tests use Testcontainers and require Docker access from inside the
devcontainer (provided by the Docker-in-Docker feature in
.devcontainer/devcontainer.json).
Run from the workspace root inside the devcontainer:
just build-e2e-image
just test-e2e
Notes:
- The E2E harness starts containers via `testcontainers` (`tests/e2e/mod.rs`).
- The default E2E image is `pg_trickle_e2e:latest` (built by `tests/build_e2e_image.sh`).
- A plain `docker run` of the dev image is not equivalent to a full VS Code devcontainer session with features/lifecycle hooks enabled.
Making a Pull Request
- Fork the repository and create a branch: `git checkout -b fix/my-fix`
- Make your changes following the conventions in AGENTS.md
- Run `just fmt && just lint` — both must pass with zero warnings
- Add or update tests — see AGENTS.md § Testing
- Open a PR against `main`
The PR template will walk you through the checklist.
CI Coverage on PRs
PR CI runs a three-tier gate:
- Unit tests (Linux only)
- Integration tests
- Light E2E — curated PR-friendly end-to-end coverage split across three shards and executed against stock `postgres:18.3`
Full E2E, TPC-H tests, benchmarks, dbt, CNPG smoke, and the extra macOS / Windows unit jobs stay off the PR critical path and run on push-to-main, schedule, or manual dispatch. This keeps typical PR feedback closer to the single-digit-minute range while preserving broader scheduled coverage.
To trigger the full CI matrix on your PR branch (recommended for DVM engine, refresh, or CDC changes):
gh workflow run ci.yml --ref <your-branch>
To run all tests locally before pushing:
just test-all # unit + integration + e2e
# PR-equivalent fast path:
just test-unit
just test-integration
just test-light-e2e
# TPC-H correctness tests (requires e2e Docker image):
cargo test --test e2e_tpch_tests -- --ignored --test-threads=1 --nocapture
See AGENTS.md § Testing for the full CI coverage matrix.
Coding Conventions (summary)
- No `unwrap()` or `panic!()` in non-test code
- All `unsafe` blocks require a `// SAFETY:` comment
- Errors go through `PgTrickleError` in `src/error.rs`
- New SQL functions use `#[pg_extern(schema = "pgtrickle")]`
- Tests use Testcontainers — never a local PostgreSQL instance
Full details are in AGENTS.md.
Commit Messages
Use Conventional Commits:
fix: correct pgoutput action parsing for tables named INSERT_LOG
feat: add CUBE explosion guard (max 64 UNION ALL branches)
docs: document JOIN key change limitation in SQL_REFERENCE
test: add E2E test for keyless table duplicate-row behaviour
Fuzz Testing
pg_trickle uses cargo-fuzz (libFuzzer) to exercise core parser and pipeline
logic. Fuzz targets live in fuzz/fuzz_targets/.
Available targets
| Target | What it exercises | Added in |
|---|---|---|
| `parser_fuzz` | DVM SQL parser (OpTree construction) | v0.1.0 |
| `cron_fuzz` | Cron expression parser | v0.26.0 |
| `guc_fuzz` | GUC string→enum coercion | v0.26.0 |
| `cdc_fuzz` | CDC trigger payload decoding | v0.26.0 |
| `wal_fuzz` | WAL/SQLSTATE error classifier | v0.39.0 |
| `dag_fuzz` | DAG/merge SQL and snapshot column-list | v0.39.0 |
| `sql_builder_fuzz` | SQL builder + typed parser facade | v0.44.0 |
| `merge_sql_fuzz` | Merge SQL codegen with random change streams | v0.49.0 |
| `row_id_fuzz` | Row identity tracking with random operator trees | v0.49.0 |
Running all fuzz targets
# Run each target for 60 s (requires nightly toolchain):
just fuzz-all
# Run for a custom duration (seconds):
just fuzz-all 120
# Run a single target:
cargo +nightly fuzz run merge_sql_fuzz -- -max_total_time=60
Corpus directories are at fuzz/corpus/<target_name>/. Regression cases are
stored in proptest-regressions/.
Every new feature ships with documentation. This is a hard requirement, not a nice-to-have.
- New SQL functions → entry in `docs/SQL_REFERENCE.md`.
- New GUC variable → entry in `docs/CONFIGURATION.md`.
- New user-facing capability → at minimum a paragraph in the relevant chapter; for headline features, a dedicated page.
- New page → add it to `docs/SUMMARY.md` in the correct chapter.
PRs that introduce a #[pg_extern] function or a new GUC without
documentation will be asked to add it before merge.
Updating base image digests
Docker base images are pinned to exact SHA256 digests in all Dockerfiles
(Dockerfile.demo, Dockerfile.ghcr, tests/Dockerfile.e2e) for
reproducibility and supply-chain security (OPS-10-03).
To update the digests when a new PostgreSQL patch release is available:
# Requires docker with manifest support
scripts/update_base_image_digests.sh
The script resolves the current linux/amd64 digest for
postgres:18.3-bookworm, patches all Dockerfiles in-place, and prints the
commit command. Run this script quarterly or when a PostgreSQL patch release
is needed. Include the digest-update commit in the release PR.
If you are building for linux/arm64 or another platform, edit the
TARGET_PLATFORM variable in the script or pin to the manifest index digest
(returned by docker manifest inspect postgres:18.3-bookworm --verbose).
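The in-place patching step the script performs can be sketched with `sed` against a pinned `FROM` line. The Dockerfile contents and digests below are fabricated for the demo; the real script first resolves the current digest with `docker manifest inspect`.

```shell
# Sketch: rewrite a digest-pinned FROM line to a new digest in place.
# Dockerfile and digest values are fabricated for the demo.
set -eu
df=$(mktemp)
cat > "$df" <<'EOF'
FROM postgres:18.3-bookworm@sha256:aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
EOF

new_digest="sha256:bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb"
sed -i "s|\(postgres:18.3-bookworm@\)sha256:[0-9a-f]*|\1$new_digest|" "$df"
head -n1 "$df"
```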
License
By contributing you agree that your contributions will be licensed under the Apache License 2.0.
Looking for the user security guide? See Security Guide for roles, grants, RLS interaction, SECURITY DEFINER semantics, and hardening checklists.
This page contains the project's vulnerability-reporting policy. The rendered version is served by the pg_trickle docs site; on GitHub the source is SECURITY.md.
Security Policy
Supported Versions
| Version | Supported |
|---|---|
| 0.13.x (current pre-release) | ✅ |
During pre-1.0 development, only the latest minor version receives security fixes. Once v1.0.0 is released, the two most recent minor versions will receive security fixes.
Reporting a Vulnerability
Please do not report security vulnerabilities via public GitHub Issues.
Use GitHub's built-in private vulnerability reporting:
- Go to the Security tab of this repository
- Click "Report a vulnerability"
- Fill in the details — affected version, description, reproduction steps, and potential impact
We aim to acknowledge reports within 48 hours and provide a fix or mitigation within 14 days for critical issues.
What to Include
A useful report includes:
- PostgreSQL version and `pg_trickle` version
- Minimal reproduction SQL or Rust code
- Description of the unintended behaviour and its security impact
- Whether the vulnerability requires a trusted (superuser) or untrusted role to trigger
Scope
In-scope:
- SQL injection or privilege escalation via `pgtrickle.*` functions
- Memory safety issues in the Rust extension code (buffer overflows, use-after-free, etc.)
- Denial-of-service caused by a low-privilege user triggering runaway resource usage
- Information disclosure through change buffers (`pgtrickle_changes.*`) or monitoring views
Out-of-scope:
- Vulnerabilities in PostgreSQL itself (report to the PostgreSQL security team)
- Vulnerabilities in pgrx (report to pgcentralfoundation/pgrx)
- Issues requiring physical access to the database host
Disclosure Policy
We follow coordinated disclosure. Once a fix is released we will publish a security advisory on GitHub with a CVE if applicable.
Architecture
This document describes the internal architecture of pg_trickle — a PostgreSQL 18 extension that implements stream tables with differential view maintenance. For a high-level description of what pg_trickle does and why, read ESSENCE.md. For release milestones and future plans, see Roadmap.
High-Level Overview
┌─────────────────────────────────────────────────────────────────┐
│ PostgreSQL 18 Backend │
│ │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌─────────────┐ │
│ │ Source │ │ Source │ │ Storage │ │ Storage │ │
│ │ Table A │ │ Table B │ │ Table X │ │ Table Y │ │
│ └────┬─────┘ └────┬─────┘ └────▲─────┘ └────▲────────┘ │
│ │ │ │ │ │
│ ═════╪══════════════╪══════════════╪══════════════╪════════ │
│ │ │ │ │ │
│ ┌────▼──────────────▼────┐ ┌────┴──────────────┴────┐ │
│ │ Hybrid CDC Layer │ │ Delta Application │ │
│ │ Triggers ──or── WAL │ │ (INSERT/DELETE diffs) │ │
│ └────────────┬───────────┘ └────────────▲───────────┘ │
│ │ │ │
│ ┌────────────▼───────────┐ ┌────────────┴───────────┐ │
│ │ Change Buffer │ │ DVM Engine │ │
│ │ (pgtrickle_changes.*) │ │ (Operator Tree) │ │
│ └────────────┬───────────┘ └────────────▲───────────┘ │
│ │ │ │
│ └────────────┬───────────────┘ │
│ │ │
│ ┌─────────────────────────▼─────────────────────────────┐ │
│ │ Refresh Engine │ │
│ │ ┌──────────┐ ┌──────────┐ ┌─────────────────────┐ │ │
│ │ │ Frontier │ │ DAG │ │ Scheduler │ │ │
│ │ │ Tracker │ │ Resolver │ │ (canonical schedule)│ │ │
│ │ └──────────┘ └──────────┘ └─────────────────────┘ │ │
│ └───────────────────────────────────────────────────────┘ │
│ │
│ ┌────────────────────────────────────────────────────────┐ │
│ │ Catalog (pgtrickle.*) │ │
│ │ pgt_stream_tables │ pgt_dependencies │ pgt_refresh_history│ │
│ └────────────────────────────────────────────────────────┘ │
│ │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ Monitoring Layer │ │
│ │ st_refresh_stats │ slot_health │ check_cdc_health │ │
│ │ explain_st │ views │ NOTIFY alerting │ │
│ └──────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
Component Details
1. SQL API Layer (src/api/)
The public entry point for users. All operations are exposed as #[pg_extern] functions in the pgtrickle schema. The API module is split into focused sub-modules:
| File | Responsibility |
|---|---|
| `src/api/mod.rs` | Core lifecycle: `create_stream_table`, `alter_stream_table`, `drop_stream_table`, `refresh_stream_table`, `bulk_create`, `repair_stream_table`, `pgt_status` |
| `src/api/diagnostics.rs` | Inspection helpers: `explain_st`, `explain_refresh_mode`, `dependency_tree`, `list_sources` |
| `src/api/outbox_hook.rs` | pg_tide integration hook: `attach_outbox()` — calls into pg_tide after each successful refresh (v0.46.0+) |
| `src/api/snapshot.rs` | Stream table snapshots (v0.27.0): `snapshot_stream_table`, `restore_from_snapshot`, `list_snapshots`, `drop_snapshot` |
| `src/api/self_monitoring.rs` | Self-monitoring setup/teardown and auto-apply policy |
| `src/api/cluster.rs` | Multi-database cluster overview: `cluster_worker_summary` |
| `src/api/publication.rs` | Logical publication helpers and predictive cost model utilities |
| `src/api/metrics_ext.rs` | Extended Prometheus metrics |
| `src/api/helpers.rs` | Shared utilities (name resolution, table quoting) |
| `src/api/planner.rs` | Schedule recommendation API |
Core functions:
- create_stream_table — Applies a chain of auto-rewrite passes (view inlining → DISTINCT ON → GROUPING SETS → scalar subquery in WHERE → correlated scalar subquery in SELECT → SubLinks in OR → multi-PARTITION BY windows), parses the defining query, builds an operator tree, creates the storage table, registers CDC slots, populates the catalog, and optionally performs an initial full refresh.
- alter_stream_table — Modifies schedule, refresh mode, status (ACTIVE/SUSPENDED), or defining query. Query changes trigger schema migration, dependency updates, and a full refresh within a single transaction.
- drop_stream_table — Removes the storage table, catalog entries, and cleans up CDC slots.
- refresh_stream_table — Triggers a manual refresh (same path as automatic scheduling).
- pgt_status — Returns a summary of all registered stream tables.
2. Catalog (src/catalog.rs)
The catalog manages persistent metadata stored in PostgreSQL tables within the pgtrickle schema:
| Table | Purpose |
|---|---|
| `pgtrickle.pgt_stream_tables` | Core metadata: name, query, schedule, status, frontier, etc. |
| `pgtrickle.pgt_dependencies` | DAG edges from ST to source tables |
| `pgtrickle.pgt_refresh_history` | Audit log of every refresh operation |
| `pgtrickle.pgt_change_tracking` | Per-source CDC slot metadata |
Schema creation is handled by extension_sql!() macros that run at CREATE EXTENSION time.
Entity-Relationship Diagram
erDiagram
pgt_stream_tables {
bigserial pgt_id PK
oid pgt_relid UK "OID of materialized storage table"
text pgt_name
text pgt_schema
text defining_query
text original_query "User's original SQL (pre-inlining)"
text schedule "Duration or cron expression"
text refresh_mode "FULL | DIFFERENTIAL | IMMEDIATE"
text status "INITIALIZING | ACTIVE | SUSPENDED | ERROR"
boolean is_populated
timestamptz data_timestamp "Freshness watermark"
jsonb frontier "DBSP-style version frontier"
timestamptz last_refresh_at
int consecutive_errors
boolean needs_reinit
float8 auto_threshold
float8 last_full_ms
timestamptz created_at
timestamptz updated_at
}
pgt_dependencies {
bigint pgt_id PK,FK "References pgt_stream_tables.pgt_id"
oid source_relid PK "OID of source table"
text source_type "TABLE | STREAM_TABLE | VIEW"
text_arr columns_used "Column-level lineage"
text cdc_mode "TRIGGER | TRANSITIONING | WAL"
text slot_name "Replication slot (WAL mode)"
pg_lsn decoder_confirmed_lsn "WAL decoder progress"
timestamptz transition_started_at "Trigger→WAL transition start"
}
pgt_refresh_history {
bigserial refresh_id PK
bigint pgt_id FK "References pgt_stream_tables.pgt_id"
timestamptz data_timestamp
timestamptz start_time
timestamptz end_time
text action "NO_DATA | FULL | DIFFERENTIAL | REINITIALIZE | SKIP"
bigint rows_inserted
bigint rows_deleted
text error_message
text status "RUNNING | COMPLETED | FAILED | SKIPPED"
text initiated_by "SCHEDULER | MANUAL | INITIAL"
timestamptz freshness_deadline
}
pgt_change_tracking {
oid source_relid PK "OID of tracked source table"
text slot_name "Trigger function name"
pg_lsn last_consumed_lsn
bigint_arr tracked_by_pgt_ids "ST IDs sharing this source"
}
pgt_stream_tables ||--o{ pgt_dependencies : "has sources"
pgt_stream_tables ||--o{ pgt_refresh_history : "has refresh history"
pgt_stream_tables }o--o{ pgt_change_tracking : "tracks via pgt_ids array"
Note: Change buffer tables (`pgtrickle_changes.changes_<oid>`) are created dynamically per source table OID and live in the separate `pgtrickle_changes` schema.
3. CDC / Change Data Capture (src/cdc.rs, src/wal_decoder.rs)
pg_trickle uses a hybrid CDC architecture that starts with triggers and optionally transitions to WAL-based (logical replication) capture for lower write-side overhead.
Trigger Mode (default)
- Trigger Management — Creates `AFTER INSERT OR UPDATE OR DELETE` row-level triggers (`pg_trickle_cdc_<oid>`) on each tracked source table. Each trigger fires a PL/pgSQL function (`pg_trickle_cdc_fn_<oid>()`) that writes changes to the buffer table.
- Change Buffering — Decoded changes are written to per-source change buffer tables in the `pgtrickle_changes` schema. Each row captures the LSN (`pg_current_wal_lsn()`), transaction ID, action type (I/U/D), and the new/old row data as typed columns (`new_<col> TYPE`, `old_<col> TYPE`) — native PostgreSQL types, not JSONB.
- Cleanup — Consumed changes are deleted after each successful refresh via `delete_consumed_changes()`, bounded by the upper LSN to prevent unbounded scans.
- Lifecycle — Triggers and trigger functions are automatically created when a source table is first tracked and dropped when the last stream table referencing a source is removed.
The trigger approach was chosen as the default for transaction safety (triggers can be created in the same transaction as DDL), simplicity (no slot management, no wal_level = logical requirement), and immediate visibility (changes are visible in buffer tables as soon as the source transaction commits).
WAL Mode (optional, automatic transition)
When pg_trickle.cdc_mode is set to 'auto' or 'wal' and wal_level = logical is available, the system transitions from trigger-based to WAL-based CDC after the first successful refresh:
- WAL Availability Detection — At stream table creation, checks whether `wal_level = logical` is configured. If so, the source dependency is marked for WAL transition.
- WAL Decoder Background Worker — A dedicated background worker (`src/wal_decoder.rs`) polls logical replication slots and writes decoded changes into the same change buffer tables used by triggers, ensuring a uniform format for the DVM engine.
- Transition Orchestration — The transition is a three-step process: (a) create a replication slot, (b) wait for the decoder to catch up to the trigger's last confirmed LSN, (c) drop the trigger and switch the dependency to WAL mode. If the decoder doesn't catch up within `pg_trickle.wal_transition_timeout` (default 300s), the system falls back to triggers.
- CDC Mode Tracking — Each source dependency in `pgt_dependencies` carries a `cdc_mode` column (TRIGGER / TRANSITIONING / WAL) and WAL-specific metadata (`slot_name`, `decoder_confirmed_lsn`, `transition_started_at`).
See ADR-001 and ADR-002 in plans/adrs/PLAN_ADRS.md for the original design rationale and plans/sql/PLAN_HYBRID_CDC.md for the full implementation plan.
Immediate Mode / Transactional IVM (src/ivm.rs)
When refresh_mode = 'IMMEDIATE', pg_trickle uses statement-level AFTER triggers with transition tables instead of row-level CDC triggers. The stream table is maintained synchronously within the same transaction as the base table DML.
- BEFORE Triggers — Statement-level BEFORE triggers on each base table acquire an advisory lock on the stream table to prevent concurrent conflicting updates.
- AFTER Triggers — Statement-level AFTER triggers with REFERENCING NEW TABLE AS ... OLD TABLE AS ... copy the transition table data to temp tables, then call the Rust pgt_ivm_apply_delta() function.
- Delta Computation — The DVM engine's Scan operator reads from the temp tables (via DeltaSource::TransitionTable) instead of change buffer tables. No LSN filtering or net-effect computation is needed — each trigger invocation represents a single atomic statement.
- Delta Application — The computed delta is applied via explicit DML (DELETE + INSERT ON CONFLICT) to the stream table.
- TRUNCATE — A separate AFTER TRUNCATE trigger calls pgt_ivm_handle_truncate(), which truncates the stream table and re-populates it from the defining query.
No change buffer tables, no scheduler involvement, and no WAL infrastructure is needed for IMMEDIATE mode. See plans/sql/PLAN_TRANSACTIONAL_IVM.md for the design plan.
ST-to-ST Change Capture (v0.11.0+)
When a stream table's defining query references another stream table (rather than a base table), neither triggers nor WAL capture apply — the upstream source is itself maintained by pg_trickle. A dedicated ST change buffer mechanism enables downstream stream tables to refresh differentially even when their source is another stream table.
Base Table ──trigger/WAL──▶ changes_<oid> (base-table buffer)
Stream Table A ──refresh──▶ changes_pgt_<pgt_id> (ST buffer for A's consumers)
Stream Table B reads from changes_pgt_<pgt_id> (B depends on A)
Buffer schema. ST change buffers are named pgtrickle_changes.changes_pgt_<pgt_id> (using the internal pgt_id rather than the OID). Unlike base-table buffers, they store only new_* columns — no old_* columns — because ST deltas are expressed as INSERT/DELETE pairs, not UPDATE rows.
Delta capture — DIFFERENTIAL path. When an upstream stream table refreshes in DIFFERENTIAL mode and has downstream consumers, the refresh engine captures the computed delta (the INSERT and DELETE rows applied to the upstream ST) into the ST change buffer via explicit DML. Downstream stream tables then read from this buffer exactly as they would read from a base-table change buffer.
Delta capture — FULL path. When an upstream stream table refreshes in FULL mode (e.g., due to a mode downgrade or full => true), the engine takes a pre-refresh snapshot, executes the full refresh, then computes an EXCEPT ALL diff between the old and new contents. The resulting INSERT/DELETE pairs are written to the ST change buffer. This prevents FULL refreshes from cascading through the entire dependency chain — downstream STs always receive a minimal delta regardless of how the upstream was refreshed.
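The FULL-path capture above is, in essence, a multiset diff between the pre- and post-refresh snapshots. An illustrative Python sketch (not the engine's SQL) of that EXCEPT ALL-style computation:

```python
# Illustrative sketch of the FULL-path delta capture: an EXCEPT ALL-style
# multiset diff between old and new stream-table contents, emitted as
# INSERT/DELETE pairs for the ST change buffer.
from collections import Counter

def snapshot_diff(old_rows, new_rows):
    """Return (inserts, deletes) such that old - deletes + inserts == new,
    respecting duplicate multiplicities like EXCEPT ALL does."""
    old_c, new_c = Counter(old_rows), Counter(new_rows)
    inserts = list((new_c - old_c).elements())  # rows gained by the refresh
    deletes = list((old_c - new_c).elements())  # rows lost by the refresh
    return inserts, deletes
```

Counter subtraction discards non-positive counts, which is exactly multiset difference, so a downstream consumer receives a minimal delta even after an upstream full refresh.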
Frontier tracking. ST source positions are tracked in the same frontier JSONB structure as base-table sources, using pgt_<upstream_pgt_id> as the key (e.g., {"pgt_42": 157}) rather than the OID-based keys used for base tables. The scheduler's has_stream_table_source_changes() function compares the downstream's last-consumed frontier position against the upstream buffer's current maximum LSN to decide whether a refresh is needed.
Lifecycle. ST change buffers are created automatically when a stream table gains its first downstream consumer (create_st_change_buffer_table()), and dropped when the last downstream consumer is removed (drop_st_change_buffer_table()). On upgrade from pre-v0.11.0, existing ST-to-ST dependencies have their buffers auto-created on the first scheduler tick. Consumed rows are cleaned up by cleanup_st_change_buffers_by_frontier() after each successful downstream refresh.
Frontier Visibility Holdback (Issue #536)
The CDC frontier (pgt_stream_tables.frontier) is advanced based on LSN ordering while the change buffer is read under standard MVCC visibility. These two dimensions are orthogonal: a change buffer row may have an LSN below the new frontier yet still be invisible (uncommitted) at the moment the scheduler queries the buffer.
Failure scenario (trigger-based CDC only):
Without holdback, a transaction that inserts into a tracked table and commits after the scheduler has captured the tick watermark (pg_current_wal_lsn()) will have its change-buffer row permanently skipped on the next tick, because the frontier advanced past the row's LSN while the row was still uncommitted.
Fix — frontier_holdback_mode = 'xmin' (default):
Before computing the tick watermark, the scheduler probes pg_stat_activity and pg_prepared_xacts for the oldest in-progress transaction xmin. If any transaction from before the previous tick is still running, the frontier is held back to the previous tick's safe watermark rather than advancing to pg_current_wal_lsn(). This is a single cheap SPI round-trip per scheduler tick (~µs).
The holdback algorithm (cdc::classify_holdback) is purely functional and unit-tested independently of the backend.
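The rule can be stated as a small pure function. The real logic lives in cdc::classify_holdback; the Python below is a hedged analogue with illustrative parameter names, not the actual signature.

```python
# Hedged sketch of the xmin holdback rule described above (illustrative only).
def classify_holdback(oldest_running_xact_start, prev_tick_time,
                      prev_safe_lsn, current_lsn):
    """If any transaction that started before the previous tick is still
    running, hold the frontier at the previous safe watermark; otherwise
    advance to the current WAL position (pg_current_wal_lsn())."""
    if oldest_running_xact_start is not None and oldest_running_xact_start < prev_tick_time:
        return prev_safe_lsn  # held back: an old transaction may still commit
    return current_lsn        # safe to advance
```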
Configuration:
- pg_trickle.frontier_holdback_mode — 'xmin' (default, safe), 'none' (fast but can lose rows), 'lsn:<N>' (hold back by N bytes, for debugging).
- pg_trickle.frontier_holdback_warn_seconds — emit a WARNING (at most once per minute) when holdback has been active longer than this many seconds (default: 60).
Note: WAL/logical-replication CDC mode is immune to this issue (commit-LSN ordering is inherently safe). The holdback is skipped when cdc_mode = 'wal'.
Observability: Two Prometheus gauges are exposed:
- pg_trickle_frontier_holdback_lsn_bytes — how many WAL bytes behind write_lsn the safe frontier currently is.
- pg_trickle_frontier_holdback_seconds — age (in seconds) of the oldest in-progress transaction.
See plans/safety/PLAN_FRONTIER_VISIBILITY_HOLDBACK.md for the full design rationale.
4. DVM Engine (src/dvm/)
The Differential View Maintenance engine is the core of the system. It transforms the defining SQL query into an executable operator tree that can compute deltas efficiently.
Auto-Rewrite Pipeline (src/dvm/parser.rs)
Before the defining query is parsed into an operator tree, it passes through a chain of auto-rewrite passes that normalize SQL constructs the DVM parser doesn't handle directly:
| Pass | Function | Purpose |
|---|---|---|
| #0 | rewrite_views_inline() | Replace view references with (view_definition) AS alias subqueries |
| #1 | rewrite_distinct_on() | Convert DISTINCT ON to ROW_NUMBER() OVER (…) = 1 window subquery |
| #2 | rewrite_grouping_sets() | Decompose GROUPING SETS / CUBE / ROLLUP into UNION ALL of GROUP BY |
| #3 | rewrite_scalar_subquery_in_where() | Convert WHERE col > (SELECT …) to CROSS JOIN |
| #4 | rewrite_sublinks_in_or() | Split WHERE a OR EXISTS (…) into UNION branches |
| #5 | rewrite_multi_partition_windows() | Split multiple PARTITION BY clauses into joined subqueries |
The view inlining pass (#0) runs first so that view definitions containing DISTINCT ON, GROUPING SETS, etc. are further rewritten by downstream passes. Nested views are expanded via a fixpoint loop (max depth 10).
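Pass #1 depends on a semantic equivalence: DISTINCT ON keeps the first row per key under the query's ORDER BY, which is the same as keeping rows where ROW_NUMBER() OVER (PARTITION BY key ORDER BY …) = 1. A small Python simulation of that equivalence (illustrative, not the rewriter's code):

```python
# Simulates the equivalence behind rewrite_distinct_on():
# DISTINCT ON (k) ... ORDER BY k, o  ==  filter ROW_NUMBER() = 1 per partition.
def distinct_on(rows, key, order):
    """Pick the first row per key under the given sort order."""
    seen, out = set(), []
    for r in sorted(rows, key=lambda r: (key(r), order(r))):
        if key(r) not in seen:
            seen.add(key(r))
            out.append(r)
    return out

def row_number_eq_1(rows, key, order):
    """ROW_NUMBER() OVER (PARTITION BY key ORDER BY order) = 1."""
    best = {}
    for r in sorted(rows, key=order):
        best.setdefault(key(r), r)  # first row seen per partition wins
    return sorted(best.values(), key=lambda r: (key(r), order(r)))
```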
Query Parser (src/dvm/parser.rs)
Parses the defining query using PostgreSQL's internal parser (via pgrx raw_parser) and extracts:
- WITH clause — CTE definitions (non-recursive: inline expansion or shared delta; recursive: detected for mode gating)
- Target list — output columns
- FROM clause — source tables, joins, subqueries, and CTE references
- WHERE clause — filters
- GROUP BY / aggregate functions
- DISTINCT / UNION ALL / INTERSECT / EXCEPT
The parser produces an OpTree — a tree of operator nodes. CTE handling follows a tiered approach:
- Tier 1 (Inline Expansion) — Non-recursive CTEs referenced once are expanded into Subquery nodes, equivalent to subqueries in FROM.
- Tier 2 (Shared Delta) — Non-recursive CTEs referenced multiple times produce CteScan nodes that share a single delta computation via a CTE registry and delta cache.
- Tier 3a/3b/3c (Recursive) — Recursive CTEs (WITH RECURSIVE) are detected via query_has_recursive_cte(). In FULL mode, the query executes as-is. In DIFFERENTIAL mode, the strategy is auto-selected: semi-naive evaluation for INSERT-only changes, Delete-and-Rederive (DRed) for mixed changes, or recomputation fallback when CTE columns don't match ST storage or when the recursive term contains non-monotone operators (EXCEPT, Aggregate, Window, DISTINCT, AntiJoin, INTERSECT SET). In IMMEDIATE mode, the same semi-naive / DRed machinery runs against statement transition tables and is bounded by pg_trickle.ivm_recursive_max_depth to guard against unbounded recursion.
Recursive CTE Strategy Selection
The DVM engine selects among five strategies for WITH RECURSIVE queries. The
selection is logged at startup and visible via explain_stream_table().
| Tier | Condition | Strategy |
|---|---|---|
| Tier 1 | CTE is non-recursive and referenced once | Inline expansion — CTE is expanded inline; no differential overhead. |
| Tier 2 | CTE is non-recursive and referenced 2+ times | Shared delta — single delta computation reused across all reference sites. |
| Tier 3a | CTE is recursive with monotone operators only (UNION ALL, no NOT EXISTS / aggregation) | Semi-naive evaluation — frontier-bounded delta avoids full recomputation. |
| Tier 3b | CTE is recursive with non-monotone operators; base tables have primary keys | DRed (Delete and Re-derive) — handles deletions by re-deriving affected tuples. |
| Tier 3c | CTE is recursive with non-monotone operators and no primary keys, or cycle in dependency graph | Full recomputation — most conservative; correct for all inputs. |
Observability: explain_stream_table(st_name) returns a recursive_cte_strategy
field showing which tier was selected and the reason. Example output:
{
"recursive_cte_strategy": "semi_naive",
"recursive_cte_reason": "Tier 3a: monotone UNION ALL recursion with no aggregation or NOT EXISTS"
}
Example — Tier 3a (semi-naive) for hierarchical closure:
WITH RECURSIVE ancestors AS (
SELECT id, parent_id FROM org_chart WHERE parent_id IS NULL
UNION ALL
SELECT c.id, c.parent_id
FROM org_chart c
JOIN ancestors a ON c.parent_id = a.id
)
SELECT * FROM ancestors;
Because the recursive term uses only UNION ALL and a plain JOIN (both
monotone), pg_trickle selects Tier 3a (semi-naive): only newly reachable
rows are computed per delta, not the full transitive closure.
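The semi-naive idea can be sketched in a few lines of Python (illustrative, not the engine's generated SQL): each iteration joins only the previous iteration's newly derived rows (the delta) against the edge table, rather than re-deriving the whole closure.

```python
# Minimal semi-naive fixpoint for the ancestors example above.
# Rows are (id, parent_id); the recursive term joins c.parent_id = a.id.
def semi_naive_closure(edges, base):
    """edges: set of (id, parent_id) rows; base: the non-recursive term's
    rows. Returns the full recursive result."""
    total = set(base)
    delta = set(base)
    while delta:
        # Join ONLY the last iteration's new rows against org_chart,
        # then discard anything already derived.
        new = {(cid, cpar) for (cid, cpar) in edges
               for (aid, _ap) in delta if cpar == aid} - total
        total |= new
        delta = new
    return total
```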
Operators (src/dvm/operators/)
Each operator knows how to generate a delta query — given a set of changes to its inputs, it produces the corresponding changes to its output:
| Operator | Delta Strategy |
|---|---|
| Scan | Direct passthrough of CDC changes |
| Filter | Apply WHERE predicate to deltas |
| Project | Apply column projection to deltas |
| Join | Join deltas against the other side's current state |
| OuterJoin | LEFT/RIGHT outer join with NULL padding |
| FullJoin | FULL OUTER JOIN with 8-part delta (both sides may produce NULLs) |
| Aggregate | Recompute group values where affected keys changed |
| Distinct | COUNT-based duplicate tracking |
| UnionAll | Merge deltas from both branches |
| Intersect | Dual-count multiplicity with LEAST boundary crossing |
| Except | Dual-count multiplicity with GREATEST(0, L-R) boundary crossing |
| Subquery | Transparent delegation + optional column renaming (CTEs, subselects) |
| CteScan | Shared delta lookup from CTE cache (multi-reference CTEs) |
| RecursiveCte | Semi-naive / DRed / recomputation for WITH RECURSIVE |
| Window | Partition-based recomputation for window functions |
| LateralFunction | Row-scoped recomputation for SRFs in FROM (jsonb_array_elements, unnest, etc.) |
| LateralSubquery | Row-scoped recomputation for correlated subqueries in LATERAL FROM |
| SemiJoin | EXISTS / IN subquery delta via semi-join |
| AntiJoin | NOT EXISTS / NOT IN subquery delta via anti-join |
| ScalarSubquery | Correlated scalar subquery in SELECT list |
See DVM_OPERATORS.md for detailed descriptions.
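The dual-count strategy for Intersect and Except in the table above corresponds to standard bag semantics: INTERSECT ALL keeps LEAST(l, r) copies of a row and EXCEPT ALL keeps GREATEST(0, l − r); a delta only affects the output when it crosses those boundaries. An illustrative Python model:

```python
# Bag semantics behind the Intersect/Except delta strategies (illustrative).
from collections import Counter

def intersect_all(left, right):
    """INTERSECT ALL: each row appears LEAST(l, r) times."""
    l, r = Counter(left), Counter(right)
    return Counter({k: min(l[k], r[k]) for k in l if min(l[k], r[k]) > 0})

def except_all(left, right):
    """EXCEPT ALL: each row appears GREATEST(0, l - r) times."""
    l, r = Counter(left), Counter(right)
    return Counter({k: l[k] - r[k] for k in l if l[k] - r[k] > 0})
```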
Diff Engine (src/dvm/diff.rs)
Generates the final diff SQL that:
- Computes the delta from the operator tree
- Produces ('+', row) for inserts and ('-', row) for deletes
- Applies the diff via DELETE matching old rows and INSERT for new rows
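The application step above can be modeled in Python as follows (an illustrative sketch of the semantics, not the generated SQL): deletes match and remove one existing row each, then inserts append the new rows.

```python
# Sketch of applying a signed diff: ('-', row) as a DELETE matching one old
# row, ('+', row) as an INSERT of a new row (illustrative semantics only).
def apply_diff(table, diff):
    """table: list of rows; diff: list of ('+'|'-', row) pairs."""
    rows = list(table)
    for sign, row in diff:
        if sign == '-':
            rows.remove(row)   # DELETE one matching old row
    for sign, row in diff:
        if sign == '+':
            rows.append(row)   # INSERT the new row
    return rows
```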
5. DAG / Dependency Graph (src/dag.rs)
Stream tables can depend on other stream tables (cascading), forming a Directed Acyclic Graph:
- Cycle detection — Detects circular dependencies at creation time using Kahn's algorithm (BFS topological sort). When pg_trickle.allow_circular = true, monotone cycles (queries using only safe operators — joins, filters, UNION ALL, etc.) are allowed; non-monotone cycles (aggregates, EXCEPT, window functions, anti-joins) are rejected. SCC IDs are automatically assigned to cycle members and recomputed on drop/alter.
- SCC decomposition — Tarjan's algorithm decomposes the graph into strongly connected components. Singleton SCCs are acyclic; multi-node SCCs contain cycles that are handled by fixed-point iteration in the scheduler.
- Monotonicity analysis — Static check (check_monotonicity() in src/dvm/parser.rs) determines whether a query's operators are safe for cyclic fixed-point iteration. Non-monotone operators (Aggregate, EXCEPT, Window, NOT EXISTS) block cycle creation.
- Topological ordering — Determines refresh order: upstream STs must be refreshed before downstream STs.
- Condensation order — condensation_order() returns SCCs in topological order, grouping cyclic STs for fixed-point iteration. The scheduler's iterate_to_fixpoint() processes multi-node SCCs by refreshing all members repeatedly until convergence (zero net changes) or max_fixpoint_iterations is exceeded.
- Cascade operations — When a source table changes, all transitive dependents are identified for refresh.
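Kahn's algorithm doubles as the cycle detector: if the BFS topological sort cannot consume every node, some node never reached in-degree zero, which means a cycle exists. A self-contained sketch (illustrative Python, not the dag.rs code):

```python
# Cycle detection via Kahn's algorithm (BFS topological sort), illustrative.
from collections import deque

def topo_order(nodes, edges):
    """edges: (upstream, downstream) pairs. Returns a refresh order, or None
    if the graph contains a cycle."""
    indeg = {n: 0 for n in nodes}
    out = {n: [] for n in nodes}
    for u, d in edges:
        out[u].append(d)
        indeg[d] += 1
    q = deque(n for n in nodes if indeg[n] == 0)
    order = []
    while q:
        n = q.popleft()
        order.append(n)
        for d in out[n]:
            indeg[d] -= 1
            if indeg[d] == 0:
                q.append(d)
    # Leftover nodes never reached in-degree zero => cycle detected.
    return order if len(order) == len(nodes) else None
```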
6. Version / Frontier Tracking (src/version.rs)
Implements a per-source frontier (JSONB map of source_oid → LSN) to track exactly how far each stream table has consumed changes:
- Read frontier — Before refresh, read the frontier to know where to start consuming changes.
- Advance frontier — After a successful refresh, the frontier is updated to the latest consumed LSN.
- Consistent snapshots — The frontier ensures that each refresh processes a contiguous, non-overlapping window of changes.
Delayed View Semantics (DVS) Guarantee
The contents of every stream table are logically equivalent to evaluating its defining query at some past point in time — the data_timestamp. The scheduler refreshes STs in topological order so that when ST B references upstream ST A, A has already been refreshed to the target data_timestamp before B runs its delta query against A's contents. The frontier lifecycle is:
- Created — on first full refresh; records the LSN of each source at that moment.
- Advanced — on each differential refresh; the old frontier becomes the lower bound and the new frontier (with fresh LSNs) the upper bound. The DVM engine reads changes in [old, new].
- Reset — on reinitialize; a fresh frontier is created from scratch.
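The lifecycle above amounts to computing a per-source window of LSN bounds from the stored JSONB map. A minimal sketch, with illustrative names (sources keyed by OID or pgt_<id> string, missing sources defaulting to the beginning):

```python
# Illustrative per-source refresh window: lower bound from the stored
# frontier, upper bound from each source's current maximum LSN.
def refresh_window(old_frontier: dict, current_lsns: dict) -> dict:
    """Map each source key to (lower, upper) LSN bounds for this refresh;
    the upper bounds become the new stored frontier on success."""
    return {src: (old_frontier.get(src, 0), lsn)
            for src, lsn in current_lsns.items()}
```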
7. Refresh Engine (src/refresh.rs)
Orchestrates the complete refresh cycle:
┌──────────────┐
│ Check State │ → Is ST active? Has it been populated?
└──────┬───────┘
│
┌─────▼──────┐
│ Drain CDC │ → Read WAL changes into change buffer tables
└─────┬──────┘
│
┌─────▼──────────────┐
│ Determine Action │ → FULL, DIFFERENTIAL, NO_DATA, REINITIALIZE, or SKIP?
│ │ (adaptive: if change ratio > pg_trickle.differential_max_change_ratio,
│ │ downgrade DIFFERENTIAL → FULL automatically)
└─────┬──────────────┘
│
┌─────▼──────┐
│ Execute │ → Full: TRUNCATE + INSERT ... SELECT
│ │ Differential: Generate & apply delta SQL
└─────┬──────┘
│
┌─────▼──────────────┐
│ Record History │ → Write to pgtrickle.pgt_refresh_history
└─────┬──────────────┘
│
┌─────▼──────────────┐
│ Advance Frontier │ → Update JSONB frontier in catalog
└─────┬──────────────┘
│
┌─────▼──────────────┐
│ Reset Error Count │ → On success, reset consecutive_errors to 0
└──────────────────────┘
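The adaptive "Determine Action" step in the diagram can be sketched as a decision function (illustrative Python; the string action names follow the diagram, the function name is hypothetical):

```python
# Sketch of the "Determine Action" step, including the adaptive downgrade
# when the change ratio exceeds pg_trickle.differential_max_change_ratio.
def determine_action(changed_rows: int, table_rows: int,
                     needs_reinit: bool = False,
                     max_change_ratio: float = 0.15) -> str:
    if needs_reinit:
        return "REINITIALIZE"
    if changed_rows == 0:
        return "NO_DATA"
    if table_rows > 0 and changed_rows / table_rows > max_change_ratio:
        return "FULL"          # delta would touch too much of the table
    return "DIFFERENTIAL"
```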
8. Background Worker & Scheduling (src/scheduler.rs)
Registration & Lifecycle
pg_trickle registers one PostgreSQL background worker — the scheduler — during _PG_init() (extension load). Because it is registered at startup, pg_trickle must appear in shared_preload_libraries, which requires a server restart.
┌──────────────────────────────────────────────────────────────────┐
│ PostgreSQL postmaster │
│ │
│ shared_preload_libraries = 'pg_trickle' │
│ │ │
│ ▼ │
│ _PG_init() │
│ ├─ Register GUCs (pg_trickle.enabled, scheduler_interval_ms …) │
│ ├─ Register shared memory (PgTrickleSharedState, atomics) │
│ └─ BackgroundWorkerBuilder::new("pg_trickle scheduler") │
│ .set_start_time(RecoveryFinished) │
│ .set_restart_time(5s) ← auto-restart on crash │
│ .load() │
│ │
│ After recovery finishes: │
│ │ │
│ ▼ │
│ pg_trickle_scheduler_main() ← background worker starts │
│ ├─ Attach SIGHUP + SIGTERM handlers │
│ ├─ Connect to SPI (database = "postgres") │
│ ├─ Crash recovery: mark stale RUNNING records as FAILED │
│ └─ Enter main loop ─────────────────────────┐ │
│ │ │ │
│ ▼ │ │
│ wait_latch(scheduler_interval_ms) │ │
│ │ │ │
│ ┌───▼───────────────────────────────┐ │ │
│ │ SIGTERM? → log + break │ │ │
│ │ pg_trickle.enabled = false? → skip │ │ │
│ │ Otherwise → scheduler tick │ │ │
│ └───┬───────────────────────────────┘ │ │
│ │ │ │
│ └──────────── loop ────────────────────┘ │
└──────────────────────────────────────────────────────────────────┘
Key lifecycle properties:
| Property | Behaviour |
|---|---|
| Start condition | After PostgreSQL recovery finishes (RecoveryFinished) |
| Auto-restart | 5-second delay after an unexpected crash |
| Graceful shutdown | Handles SIGTERM — breaks the main loop and exits cleanly |
| Config reload | Handles SIGHUP — re-reads GUC values on the next latch wake |
| Crash recovery | On startup, any pgt_refresh_history rows stuck in RUNNING status are marked FAILED (the transaction that wrote them was rolled back by PostgreSQL, but the status row may have been committed in a prior transaction) |
| Database | Connects to the postgres database via SPI |
| Standby / replica | On standby servers (pg_is_in_recovery() = true), the worker enters a sleep loop and does not attempt refreshes. Stream tables are still readable on standbys — they are regular heap tables replicated via physical streaming replication. After promotion the scheduler resumes automatically. See the FAQ § Replication for details on logical replication and subscriber limitations. |
Scheduler Tick
Each tick of the main loop performs the following steps inside a single transaction:
- DAG rebuild — Compare the shared-memory DAG_REBUILD_SIGNAL counter against the local copy. If it advanced (a CREATE, ALTER, or DROP stream table occurred), rebuild the in-memory dependency graph (StDag) from the catalog.
- Topological traversal — Walk stream tables in dependency order (upstream before downstream). This ensures that when ST B references ST A, A is refreshed first.
- Per-ST evaluation — For each active ST:
  - Skip if in retry backoff (exponential, per-ST).
  - Skip if schedule/cron says not yet due.
  - Skip if a row-level lock on the catalog entry indicates a concurrent refresh.
  - Check upstream change buffers for pending rows.
- Execute refresh — Acquire a row-level lock on the catalog entry → record RUNNING in history → run FULL / DIFFERENTIAL / REINITIALIZE → store new frontier → release lock → record completion.
- WAL transitions — Advance any trigger→WAL CDC mode transitions (src/wal_decoder.rs).
- Slot health — Check replication slot health and emit NOTIFY alerts.
- Prune retry state — Remove backoff entries for STs that no longer exist.
Sequential Processing (Default)
By default (parallel_refresh_mode = 'off'), the scheduler processes
stream tables sequentially within a single background worker. All STs
are refreshed one at a time in topological order.
pg_trickle.max_concurrent_refreshes (default 4) only prevents a manual
pgtrickle.refresh_stream_table() call from overlapping with the
scheduler on the same ST — it does not spawn additional workers.
The PostgreSQL GUC max_worker_processes (default 8) sets the server-wide budget for all background workers (autovacuum, parallel query, logical replication, extensions). In sequential mode pg_trickle consumes one slot from that budget.
Parallel Refresh (parallel_refresh_mode = 'on')
When enabled, the scheduler builds an execution-unit DAG from the stream-table dependency graph and dispatches independent units to dynamic background workers:
- Execution units — Each independent stream table becomes a singleton unit. Atomic consistency groups and IMMEDIATE-trigger closures are collapsed into composite units that run in a single worker for correctness.
- Ready queue — Units whose upstream dependencies have all completed enter the ready queue. The coordinator dispatches them subject to a per-database cap (max_concurrent_refreshes) and a cluster-wide cap (max_dynamic_refresh_workers).
- Dynamic workers — Each dispatched unit spawns a short-lived background worker via BackgroundWorkerBuilder::load_dynamic(). Workers claim a job from the pgtrickle.pgt_scheduler_jobs catalog table, execute the refresh, and exit.
The parallel path respects the same topological ordering as the
sequential path — downstream units only become ready after all upstream
units succeed. The worker-budget caps ensure pg_trickle does not exhaust
max_worker_processes.
See PLAN_PARALLELISM.md for the full design and CONFIGURATION.md for tuning guidance.
Retry & Error Handling
Each ST maintains an in-memory RetryState (reset on scheduler restart):
- Retryable errors (SPI failures, lock contention, slot issues) trigger exponential backoff.
- Permanent errors (schema mismatch, user errors) skip backoff but increment consecutive_errors.
- When consecutive_errors reaches pg_trickle.max_consecutive_errors (default 3), the ST is auto-suspended and a NOTIFY alert is emitted.
- Schema errors additionally set needs_reinit, triggering a REINITIALIZE on the next refresh cycle.
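The rules above can be modeled as a small state machine. This is an illustrative Python sketch, not src/scheduler.rs; the base delay and the returned action strings are assumptions for illustration.

```python
# Illustrative retry-state sketch: exponential backoff for retryable errors,
# auto-suspend at max_consecutive_errors (default 3), reset on success.
class RetryState:
    def __init__(self, base_delay_s: float = 5.0, max_errors: int = 3):
        self.consecutive_errors = 0
        self.base = base_delay_s          # assumed base delay, for illustration
        self.max_errors = max_errors

    def on_failure(self, retryable: bool):
        self.consecutive_errors += 1
        if self.consecutive_errors >= self.max_errors:
            return ("SUSPEND", None)      # auto-suspend + NOTIFY alert
        if retryable:
            delay = self.base * 2 ** (self.consecutive_errors - 1)
            return ("BACKOFF", delay)     # exponential backoff
        return ("RETRY_NEXT_TICK", None)  # permanent error: no backoff

    def on_success(self):
        self.consecutive_errors = 0       # reset on a successful refresh
```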
Scheduling Policy
Automatic refresh scheduling uses canonical periods (48·2ⁿ seconds, n = 0, 1, 2, …) snapped to the user's schedule:
- Picks the smallest canonical period ≤ schedule.
- For a DOWNSTREAM schedule (NULL schedule), the ST refreshes only when explicitly triggered or when a downstream ST needs it.
- Advisory locks prevent concurrent refreshes of the same ST.
- The scheduler is driven by the background worker polling at the pg_trickle.scheduler_interval_ms GUC interval.
Shared Memory (src/shmem.rs)
The scheduler background worker and user sessions share a PgTrickleSharedState structure protected by a PgLwLock. Key fields:
| Field | Type | Purpose |
|---|---|---|
| dag_version | u64 | Incremented when the ST catalog changes; used by the scheduler to detect when the DAG needs rebuilding. |
| scheduler_pid | i32 | PID of the scheduler background worker (0 if not running). |
| scheduler_running | bool | Whether the scheduler is active. |
| last_scheduler_wake | i64 | Unix timestamp of the last scheduler wake cycle (for monitoring). |
A separate PgAtomic<AtomicU64> named DAG_REBUILD_SIGNAL is incremented by API functions (create, alter, drop) after catalog mutations. The scheduler compares its local copy against the atomic counter to detect when to rebuild its in-memory DAG without holding a lock.
A second PgAtomic<AtomicU64> named CACHE_GENERATION tracks DDL events that may invalidate cached delta or MERGE templates across backends. When DDL hooks fire (view change, ALTER TABLE, function change) or API functions mutate the catalog, CACHE_GENERATION is bumped. Each backend maintains a thread-local generation counter; on the next refresh, if the shared generation has advanced, the backend flushes its delta template cache, MERGE template cache, and explicitly DEALLOCATEs tracked __pgt_merge_* prepared statements before rebuilding local state.
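The generation-counter pattern is simple to state in code. A hedged sketch (a plain dict stands in for shared memory; names are illustrative, not the extension's API):

```python
# Generation-based cache invalidation: a shared counter is bumped on DDL;
# each backend flushes its local template caches when it observes a newer
# generation than the one it last saw (illustrative sketch).
class BackendCache:
    def __init__(self, shared):
        self.shared = shared                       # stand-in for shared memory
        self.local_generation = shared["cache_generation"]
        self.templates = {}

    def get_template(self, key, build):
        if self.local_generation != self.shared["cache_generation"]:
            self.templates.clear()                 # flush stale templates
            self.local_generation = self.shared["cache_generation"]
        if key not in self.templates:
            self.templates[key] = build()
        return self.templates[key]

def bump_generation(shared):
    shared["cache_generation"] += 1                # DDL hook / API mutation
```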
9. DDL Tracking (src/hooks.rs)
Event triggers monitor DDL changes to source tables and functions:
- _on_ddl_end — Fires on ALTER TABLE to detect column adds/drops/type changes. If a source table used by a ST is altered, the ST's needs_reinit flag is set. Also detects CREATE OR REPLACE FUNCTION / ALTER FUNCTION — if the function appears in a ST's functions_used catalog column, the ST is marked for reinit.
- _on_sql_drop — Fires on DROP TABLE to set needs_reinit for affected STs. Also detects DROP FUNCTION and marks affected STs for reinit.
- Function name extraction — object_identity strings (e.g., public.my_func(integer, text)) are parsed to extract the bare function name, which is matched against the functions_used TEXT[] column in pgt_stream_tables.
Reinitialization is deferred until the next refresh cycle, which then performs a REINITIALIZE action (drop and recreate the storage table from the updated query).
10. Error Handling (src/error.rs)
Centralized error types using thiserror:
- PgTrickleError variants cover catalog access, SQL execution, CDC, DVM, DAG, and config errors.
- Each refresh failure increments consecutive_errors.
- When consecutive_errors reaches pg_trickle.max_consecutive_errors (default 3), the ST is moved to ERROR status and suspended from automatic refresh.
- Manual intervention (ALTER ... status => 'ACTIVE') resets the counter.
11. Monitoring (src/monitor.rs)
Provides observability functions:
- st_refresh_stats — Aggregate statistics (total/successful/failed refreshes, avg duration, staleness status).
- get_refresh_history — Per-ST audit trail.
- get_staleness — Current staleness in seconds.
- slot_health — Checks replication slot state and WAL retention.
- check_cdc_health — Per-source CDC health status including mode, slot lag, confirmed LSN, and alerts.
- explain_st — Describes the DVM plan for a given ST.
- diamond_groups — Lists detected diamond dependency groups, their members, convergence points, and epoch counters.
- Views — pgtrickle.stream_tables_info (computed staleness) and pgtrickle.pg_stat_stream_tables (combined stats).
NOTIFY Alerting
Operational events are broadcast via PostgreSQL NOTIFY on the pg_trickle_alert channel. Clients can subscribe with LISTEN pg_trickle_alert; and receive JSON-formatted events:
| Event | Condition |
|---|---|
| stale | data staleness exceeds 2× schedule |
| auto_suspended | ST suspended after pg_trickle.max_consecutive_errors failures |
| reinitialize_needed | Upstream DDL change detected |
| slot_lag_warning | Replication slot WAL retention exceeded pg_trickle.slot_lag_warning_threshold_mb |
| cdc_transition_complete | Source transitioned from trigger to WAL-based CDC |
| cdc_transition_failed | Trigger→WAL transition failed (fell back to triggers) |
| refresh_completed | Refresh completed successfully |
| refresh_failed | Refresh failed with an error |
12. Row ID Hashing (src/hash.rs)
Provides deterministic 64-bit row identifiers using xxHash (xxh64) with a fixed seed. Two SQL functions are exposed:
- pgtrickle.pg_trickle_hash(text) — Hash a single text value; used for simple single-column row IDs.
- pgtrickle.pg_trickle_hash_multi(text[]) — Hash multiple values (separated by a record-separator byte \x1E) for composite keys (join row IDs, GROUP BY keys).
Row IDs are written into every stream table's storage as an internal __pgt_row_id BIGINT column and are used by the delta application phase to match DELETE candidates precisely.
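The record separator matters because, without a delimiter, the composite inputs ("ab", "c") and ("a", "bc") would concatenate to the same bytes and collide. The sketch below illustrates the framing with Python's stdlib hashlib as a stand-in (the extension uses xxh64 with a fixed seed, which is not in the stdlib), truncated to 64 bits to mimic the BIGINT row ID:

```python
# Why the \x1E separator matters for composite row IDs (illustrative; a
# blake2b stand-in replaces the actual xxh64 algorithm).
import hashlib

SEP = b"\x1e"  # ASCII record separator between composite key parts

def hash_multi(values) -> int:
    """Deterministic 64-bit identifier for a composite key."""
    data = SEP.join(v.encode() for v in values)
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")
```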
13. Diamond Dependency Consistency (src/dag.rs)
When stream tables form diamond-shaped dependency graphs, a convergence (fan-in) node may read from multiple upstream STs that share a common ancestor:
A (source table)
/ \
B C (intermediate STs)
\ /
D (convergence / fan-in ST)
If B refreshes successfully but C fails, D would read a fresh version of B's data alongside stale data from C — a split-version inconsistency.
Detection
StDag::detect_diamonds() walks all fan-in nodes (STs with multiple upstream ST dependencies) and computes transitive ancestor sets per branch. If two or more branches share ancestors, a diamond is detected. Overlapping diamonds are merged.
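The ancestor-set comparison can be sketched directly (illustrative Python; function names are hypothetical, not the dag.rs API):

```python
# Illustrative diamond detection: per-branch transitive ancestor sets of a
# fan-in node; any shared ancestor across branches indicates a diamond.
def ancestors(node, parents):
    """parents: node -> list of direct upstreams. Transitive ancestor set."""
    out = set()
    stack = list(parents.get(node, []))
    while stack:
        n = stack.pop()
        if n not in out:
            out.add(n)
            stack.extend(parents.get(n, []))
    return out

def is_diamond(fan_in, parents):
    branches = [ancestors(u, parents) | {u} for u in parents.get(fan_in, [])]
    # Two or more branches sharing an ancestor => diamond.
    return any(s1 & s2 for i, s1 in enumerate(branches) for s2 in branches[i + 1:])
```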
Consistency Groups
StDag::compute_consistency_groups() converts detected diamonds into consistency groups — topologically ordered sets of STs that must be refreshed atomically. Each group contains:
- Members — All intermediate STs plus the convergence node, in refresh order.
- Convergence points — The fan-in nodes where multiple paths meet.
- Epoch counter — Advances on each successful atomic refresh.
STs not involved in any diamond are placed in singleton groups (no overhead).
Scheduler Wiring
When diamond_consistency = 'atomic' (per-ST or via the pg_trickle.diamond_consistency GUC):
- The scheduler wraps each multi-member group in a SAVEPOINT pgt_consistency_group.
- Each member is refreshed in topological order within the savepoint.
- If all succeed — RELEASE SAVEPOINT and advance the group epoch.
- If any member fails — ROLLBACK TO SAVEPOINT undoes all members' changes. The failure is logged and the group retries on the next scheduler tick.
With diamond_consistency = 'none', members refresh independently in topological order — matching pre-feature behavior.
Schedule Policy
The diamond_schedule_policy setting (per-convergence-node or via the pg_trickle.diamond_schedule_policy GUC) controls when an atomic group fires:
| Policy | Trigger condition | Trade-off |
|---|---|---|
| 'fastest' (default) | Any member is due | Higher freshness, more refreshes |
| 'slowest' | All members are due | Lower resource cost, staler data |
The policy is set on the convergence (fan-in) node. When multiple convergence nodes exist in the same group (nested diamonds), the strictest policy wins (slowest > fastest). The GUC serves as a cluster-wide fallback for nodes without an explicit per-node setting.
Monitoring
The pgtrickle.diamond_groups() SQL function exposes detected groups for operational visibility. See SQL_REFERENCE.md for details.
14. pg_tide Integration
Extracted in v0.46.0. The outbox, inbox, and relay subsystems were moved to the standalone pg_tide extension to give event messaging its own focused release cadence and reduce the surface area of pg_trickle.
What Stays in pg_trickle
- attach_outbox() integration hook — a lightweight hook that pg_tide calls after each successful refresh cycle to publish the delta summary to pg_tide's outbox table. pg_trickle itself never writes to the outbox; it only invokes the hook.
- Change buffer subscription — pg_trickle exposes the internal change buffer (pgtrickle_changes.changes_<oid>) as a stable interface so pg_tide consumers can subscribe to raw CDC events without going through the refresh engine.
What Lives in pg_tide
- enable_outbox() / poll_outbox() — outbox provisioning and polling API.
- Consumer groups and visibility lease management.
- Claim-check mode for large payloads.
- create_inbox() / enable_inbox_ordering() — inbox provisioning.
- FNV-1a consistent hashing (inbox_is_my_partition()) for horizontal scaling.
- The pgtrickle-relay binary — forwards outbox rows to Kafka, NATS, SQS, and other transports.
API Documentation
See the pg_tide repository for the complete API reference, deployment guide, and relay architecture.
15. Stream Table Snapshots (src/api/snapshot.rs)
Added in v0.27.0.
snapshot_stream_table(name) exports the current content of a stream table into an archival table, capturing the extension version and current frontier in metadata columns (__pgt_snapshot_version, __pgt_frontier, __pgt_snapshotted_at).
restore_from_snapshot(name, source) truncates the stream table and reloads it from the snapshot, then restores the saved frontier. This ensures the next refresh cycle is DIFFERENTIAL — skipping the expensive full re-scan that would otherwise follow a blank stream table.
Primary use cases: replica bootstrap, PITR alignment, and historical archiving.
16. Configuration (src/config.rs)
Runtime behavior is controlled by a growing set of GUC (Grand Unified Configuration) variables. See CONFIGURATION.md for the complete, current list.
| GUC | Default | Purpose |
|---|---|---|
| pg_trickle.enabled | true | Master on/off switch for the scheduler |
| pg_trickle.scheduler_interval_ms | 1000 | Scheduler background worker wake interval (ms) |
| pg_trickle.min_schedule_seconds | 60 | Minimum allowed schedule |
| pg_trickle.max_consecutive_errors | 3 | Errors before auto-suspending a ST |
| pg_trickle.change_buffer_schema | pgtrickle_changes | Schema for change buffer tables |
| pg_trickle.max_concurrent_refreshes | 4 | Maximum parallel refresh workers |
| pg_trickle.differential_max_change_ratio | 0.15 | Change-to-table-size ratio above which DIFFERENTIAL falls back to FULL |
| pg_trickle.cleanup_use_truncate | true | Use TRUNCATE instead of DELETE for change buffer cleanup when the entire buffer is consumed |
| pg_trickle.user_triggers | 'auto' | User-defined trigger handling: auto / off (on accepted as a deprecated alias for auto) |
| pg_trickle.block_source_ddl | false | Block column-affecting DDL on tracked source tables instead of reinit |
| pg_trickle.cdc_mode | 'auto' | CDC mechanism: auto / trigger / wal |
| pg_trickle.wal_transition_timeout | 300 | Max seconds to wait for WAL decoder catch-up during transition |
| pg_trickle.slot_lag_warning_threshold_mb | 100 | Warning threshold for WAL slot retention used by slot_lag_warning and health_check() |
| pg_trickle.slot_lag_critical_threshold_mb | 1024 | Critical threshold for WAL slot retention used by check_cdc_health() alerts |
| pg_trickle.diamond_consistency | 'atomic' | Diamond dependency consistency mode: atomic or none |
| pg_trickle.diamond_schedule_policy | 'fastest' | Schedule policy for atomic diamond groups: fastest or slowest |
| pg_trickle.merge_planner_hints | true | Inject SET LOCAL planner hints (disable nestloop, raise work_mem) before MERGE |
| pg_trickle.merge_work_mem_mb | 64 | work_mem (MB) applied when the delta exceeds 10 000 rows and planner hints are enabled |
| pg_trickle.use_prepared_statements | true | Use SQL PREPARE/EXECUTE for cached MERGE templates |
Data Flow: End-to-End Refresh
Source Table INSERT/UPDATE/DELETE
│
▼
Hybrid CDC Layer:
┌─────────────────────────────────────────────┐
│ TRIGGER mode: Row-Level AFTER Trigger │
│ pg_trickle_cdc_fn_<oid>() → buffer table │
│ │
│ WAL mode: Logical Replication Slot │
│ wal_decoder bgworker → same buffer table │
│ │
│ ST-to-ST: Refresh engine captures delta │
│ → changes_pgt_<pgt_id> buffer table │
└─────────────────────────────────────────────┘
│
▼
Change Buffer Table
Base tables: pgtrickle_changes.changes_<oid>
ST sources: pgtrickle_changes.changes_pgt_<pgt_id>
Columns: change_id, lsn, action (I/U/D), pk_hash, new_<col>, old_<col> (typed)
│
▼
DVM Engine: generate delta SQL from operator tree
- Scan operator reads from changes_<oid> or changes_pgt_<id>
- Filter/Project/Join transform the deltas
- Aggregate recomputes affected groups
│
▼
Diff Engine: produce (+/-) diff rows
│
▼
Delta Application:
DELETE FROM storage WHERE __pgt_row_id IN (removed)
INSERT INTO storage SELECT ... FROM (added)
│
▼
Frontier Update: advance per-source LSN
│
▼
History Record: log to pgtrickle.pgt_refresh_history
Module Map
src/
├── lib.rs # Extension entry, module declarations, _PG_init
├── bin/
│   └── pgrx_embed.rs # pgrx SQL entity embedding (generated)
├── api/
│ ├── mod.rs # Core lifecycle functions (create/alter/drop/refresh/status)
│ ├── diagnostics.rs # explain_st, explain_refresh_mode, dependency_tree
│ ├── outbox_hook.rs # pg_tide integration hook (attach_outbox, v0.46.0+)
│ ├── snapshot.rs # Stream table snapshots (v0.27.0)
│ ├── self_monitoring.rs # Self-monitoring setup/teardown
│ ├── cluster.rs # cluster_worker_summary
│ ├── publication.rs # Logical publication helpers
│ ├── metrics_ext.rs # Extended Prometheus metrics
│ ├── planner.rs # Schedule recommendation API
│ └── helpers.rs # Shared utilities
├── catalog.rs # Catalog CRUD operations
├── cdc.rs # Change data capture (triggers + WAL transition)
├── config.rs # GUC variable registration
├── dag.rs # Dependency graph (cycle detection, SCC decomposition, topo sort)
├── error.rs # Centralized error types
├── hash.rs # xxHash row ID generation (pg_trickle_hash / pg_trickle_hash_multi)
├── hooks.rs # DDL event trigger handlers (_on_ddl_end, _on_sql_drop)
├── ivm.rs # Transactional IVM (IMMEDIATE mode: statement-level triggers)
├── shmem.rs # Shared memory state (PgTrickleSharedState, DAG_REBUILD_SIGNAL, CACHE_GENERATION)
├── dvm/
│ ├── mod.rs # DVM module root + recursive CTE orchestration
│ ├── parser/ # Query → OpTree converter (modularized, G13-PRF)
│ │ ├── mod.rs # FFI helpers, macros, entry points, tests
│ │ ├── types.rs # OpTree, Expr, Column, AggExpr, etc.
│ │ ├── validation.rs # Volatility, IVM support, IMMEDIATE, monotonicity
│ │ ├── rewrites.rs # SQL rewrite passes (view inlining, grouping sets)
│ │ └── sublinks.rs # SubLink extraction from WHERE clauses
│ ├── diff.rs # Delta SQL generation (CTE delta cache)
│ ├── row_id.rs # Row ID generation
│ └── operators/
│ ├── mod.rs # Operator trait + registry
│ ├── scan.rs # Table scan (CDC passthrough)
│ ├── filter.rs # WHERE clause filtering
│ ├── project.rs # Column projection
│ ├── join.rs # Inner join
│ ├── join_common.rs # Shared join utilities (snapshot subqueries, column disambiguation)
│ ├── outer_join.rs # LEFT/RIGHT outer join
│ ├── full_join.rs # FULL OUTER JOIN (8-part delta)
│ ├── aggregate.rs # GROUP BY + aggregate functions (39 AggFunc variants)
│ ├── distinct.rs # DISTINCT deduplication
│ ├── union_all.rs # UNION ALL merging
│ ├── intersect.rs # INTERSECT / INTERSECT ALL (dual-count LEAST)
│ ├── except.rs # EXCEPT / EXCEPT ALL (dual-count GREATEST)
│ ├── subquery.rs # Subquery / inlined CTE delegation
│ ├── cte_scan.rs # Shared CTE delta (multi-reference)
│ ├── recursive_cte.rs # Recursive CTE (semi-naive + DRed + recomputation)
│ ├── window.rs # Window function (partition recomputation)
│ ├── lateral_function.rs # LATERAL SRF (row-scoped recomputation)
│ ├── lateral_subquery.rs # LATERAL correlated subquery
│ ├── semi_join.rs # EXISTS / IN subquery (semi-join delta)
│ ├── anti_join.rs # NOT EXISTS / NOT IN subquery (anti-join delta)
│ └── scalar_subquery.rs # Correlated scalar subquery in SELECT
├── monitor.rs # Monitoring & observability functions
├── refresh.rs # Refresh orchestration
├── scheduler.rs # Automatic scheduling with canonical periods
├── version.rs # Frontier / LSN tracking
└── wal_decoder.rs # WAL-based CDC (logical replication slot polling, transitions)
Extension Control File (pg_trickle.control)
The pg_trickle.control file in the repository root is required by PostgreSQL's
extension infrastructure. It declares the extension's description, default
version, shared-library path, and privilege requirements. PostgreSQL reads this
file when CREATE EXTENSION pg_trickle; is executed.
During packaging (cargo pgrx package), pgrx replaces the @CARGO_VERSION@
placeholder with the version from Cargo.toml and copies the file into the
target's share/extension/ directory alongside the SQL migration scripts.
Note: The relay binary (`pgtrickle-relay`), outbox, and inbox subsystems were extracted to the standalone `pg_tide` extension in v0.46.0. See § 14 pg_tide Integration and the `pg_tide` repository for the relay architecture and deployment guide.
DVM Operators
This document describes the Differential View Maintenance (DVM) operators implemented by pg_trickle. Each operator transforms a stream of row-level changes (deltas) propagated from source tables through the operator tree.
Quick Reference
| Operator | FULL | DIFF | IMMED | Section |
|---|---|---|---|---|
| Simple `SELECT` / projection | ✅ | ✅ | ✅ | Scan & Project |
| `WHERE` filter | ✅ | ✅ | ✅ | Filter |
| `DISTINCT` | ✅ | ✅ | ✅ | Distinct |
| `INNER JOIN` | ✅ | ✅ | ✅ | Joins |
| `LEFT` / `RIGHT OUTER JOIN` | ✅ | ✅ | ✅ | Joins |
| `FULL OUTER JOIN` | ✅ | ✅ | ✅ | Joins |
| `LATERAL` JOIN | ✅ | ✅ | ✅ | Joins |
| Multi-table join (≥3 right scans) | ✅ | ⚠️ | ⚠️ | Joins |
| `EXISTS` / `NOT EXISTS` | ✅ | ✅ | ✅ | Subqueries |
| Scalar subquery | ✅ | ✅ | ✅ | Subqueries |
| `UNION ALL` | ✅ | ✅ | ✅ | Set Operations |
| `INTERSECT` / `EXCEPT` | ✅ | ✅ | ✅ | Set Operations |
| `COUNT`, `SUM`, `AVG` | ✅ | ✅ | ✅ | Aggregates |
| `MIN` / `MAX` | ✅ | ✅ | ✅ | Aggregates |
| `COUNT(DISTINCT)` / `SUM(DISTINCT)` | ✅ | ✅ | ✅ | Aggregates |
| `STRING_AGG` / `ARRAY_AGG` | ✅ | ⚠️ | ⚠️ | Aggregates |
| `JSONB_AGG` / `JSONB_OBJECT_AGG` | ✅ | ⚠️ | ⚠️ | Aggregates |
| Window functions | ✅ | ⚠️ | ⚠️ | Window Functions |
| `ORDER BY … LIMIT` (TopK) | ✅ | ✅ | ✅ | TopK |
| `HAVING` | ✅ | ✅ | ✅ | Having |
| `GROUP BY ROLLUP` / `CUBE` | ✅ | ✅ | ✅ | Grouping Sets |
| Recursive CTEs | ✅ | ⚠️ | ⚠️ | CTEs |
| `vector_avg` / `halfvec_avg` | ✅ | ✅ | ✅ | Vector Aggregates |
Legend: ✅ Supported · ⚠️ Partial (see section) · ❌ Not supported
Prior Art
- Budiu, M. et al. (2023). "DBSP: Automatic Incremental View Maintenance." VLDB 2023. (comparison)
- Gupta, A. & Mumick, I.S. (1999). Materialized Views: Techniques, Implementations, and Applications. MIT Press.
- Koch, C. et al. (2014). "DBToaster: Higher-order Delta Processing for Dynamic, Frequently Fresh Views." VLDB Journal.
- PostgreSQL 9.4+ — Materialized views with `REFRESH MATERIALIZED VIEW CONCURRENTLY`.
Overview
When a stream table is created, the defining SQL query is parsed into a tree of DVM operators. During a differential refresh, changes flow bottom-up through this tree:
Aggregate
│
Project
│
Filter
│
┌───────┴───────┐
Join │
┌─┴─┐ │
Scan(A) Scan(B) Scan(C)
Each operator implements a differentiation rule: given the delta (Δ) to its input(s), it produces the corresponding delta to its output. This is conceptually similar to automatic differentiation in calculus.
The general contract:
- Input: a set of `('+', row)` and `('-', row)` tuples (inserts and deletes)
- Output: a set of `('+', row)` and `('-', row)` tuples
Updates are modeled as a delete of the old row followed by an insert of the new row.
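As a toy model of this contract (plain Python, invented helper names — the extension itself works in Rust-generated SQL, not Python), deltas are signed rows over a multiset, and an UPDATE decomposes into a delete plus an insert:

```python
from collections import Counter

def as_delta(op, old=None, new=None):
    """Model one CDC event as signed ('+'/'-') row tuples.
    An UPDATE becomes a delete of the old row plus an insert of the new row."""
    if op == "I":
        return [("+", new)]
    if op == "D":
        return [("-", old)]
    if op == "U":
        return [("-", old), ("+", new)]
    raise ValueError(op)

def apply_delta(state, delta):
    """Apply signed rows to a multiset state."""
    result = Counter(state)
    for sign, row in delta:
        result[row] += 1 if sign == "+" else -1
    return +result  # unary + drops zero and negative counts

orders = Counter({(1, "active"): 1})
orders = apply_delta(orders, as_delta("U", old=(1, "active"), new=(1, "shipped")))
# the old row is retracted and the new row is present once
```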
DIFFERENTIAL and IMMEDIATE maintenance require deterministic expressions. VOLATILE functions and custom operators such as random() or clock_timestamp() are rejected during stream table creation because re-evaluation would corrupt delta semantics. STABLE functions such as now() and current_timestamp are allowed with a warning; FULL mode accepts all volatility classes because it recomputes the full result on each refresh.
Operator Support Matrix
The following table shows which SQL constructs are supported under each refresh mode.
| SQL Construct | FULL | DIFFERENTIAL | IMMEDIATE | Notes |
|---|---|---|---|---|
| Basic | ||||
| Simple `SELECT` / projection | ✅ | ✅ | ✅ | |
| `WHERE` filter | ✅ | ✅ | ✅ | |
| Column expressions / aliases | ✅ | ✅ | ✅ | |
| `DISTINCT` | ✅ | ✅ | ✅ | Uses `__pgt_dup_count` reference counting |
| `DISTINCT ON` | ✅ | ✅ | ✅ | |
| Joins | | | | |
| `INNER JOIN` | ✅ | ✅ | ✅ | Hybrid delta strategy |
| `LEFT OUTER JOIN` | ✅ | ✅ | ✅ | NULL-padding transitions tracked |
| `RIGHT OUTER JOIN` | ✅ | ✅ | ✅ | |
| `FULL OUTER JOIN` | ✅ | ✅ | ✅ | 8-part UNION ALL delta |
| `CROSS JOIN` | ✅ | ✅ | ✅ | |
| `LATERAL` JOIN | ✅ | ✅ | ✅ | Row-scoped re-execution |
| Multi-table join (≤2 right scans) | ✅ | ✅ | ✅ | Full phantom-row-after-DELETE fix |
| Multi-table join (≥3 right scans) | ✅ | ⚠️ | ⚠️ | Falls back to post-change snapshot for right subtree (EC-01 boundary, fix planned for v0.12.0) |
| Subqueries | | | | |
| `EXISTS` / `IN` (semi-join) | ✅ | ✅ | ✅ | Delta-key pre-filter on left side |
| `NOT EXISTS` / `NOT IN` (anti-join) | ✅ | ✅ | ✅ | Inverted semantics; two-part delta |
| Scalar subquery (SELECT-list) | ✅ | ✅ | ✅ | Pre/post snapshot EXCEPT ALL diff |
| Correlated `LATERAL` subquery | ✅ | ✅ | ✅ | |
| Set Operations | | | | |
| `UNION ALL` | ✅ | ✅ | ✅ | Dual-branch merge |
| `INTERSECT` / `INTERSECT ALL` | ✅ | ✅ | ✅ | Dual-count tracking |
| `EXCEPT` / `EXCEPT ALL` | ✅ | ✅ | ✅ | |
| Aggregates | | | | |
| `COUNT`, `SUM`, `AVG` | ✅ | ✅ | ✅ | Algebraic — fully invertible delta |
| `MIN`, `MAX` | ✅ | ✅ | ✅ | Semi-algebraic — group rescan on ambiguous delete |
| `COUNT(DISTINCT)`, `SUM(DISTINCT)` | ✅ | ✅ | ✅ | Algebraic via auxiliary columns |
| `BOOL_AND`, `BOOL_OR`, `BIT_AND`, `BIT_OR` | ✅ | ✅ | ✅ | Algebraic via auxiliary columns |
| `EVERY` | ✅ | ✅ | ✅ | Algebraic via auxiliary columns |
| `STRING_AGG`, `ARRAY_AGG` | ✅ | ⚠️ | ⚠️ | Group-rescan strategy — warning emitted at creation time in DIFFERENTIAL mode |
| `STDDEV`, `VARIANCE`, `STDDEV_POP`, `VAR_POP` | ✅ | ✅ | ✅ | Algebraic via auxiliary M2/sum/count columns |
| `COVAR_SAMP`, `COVAR_POP`, `CORR` | ✅ | ✅ | ✅ | Algebraic via auxiliary columns |
| `REGR_*` (all 9 regression functions) | ✅ | ✅ | ✅ | Algebraic via auxiliary columns |
| `PERCENTILE_CONT`, `PERCENTILE_DISC` | ✅ | ⚠️ | ⚠️ | Group-rescan strategy |
| `MODE` | ✅ | ⚠️ | ⚠️ | Group-rescan strategy |
| `XMLAGG`, `JSON_AGG`, `JSONB_AGG` | ✅ | ⚠️ | ⚠️ | Group-rescan strategy |
| `JSON_OBJECT_AGG`, `JSONB_OBJECT_AGG` | ✅ | ⚠️ | ⚠️ | Group-rescan strategy |
| `GROUP BY` / `HAVING` | ✅ | ✅ | ✅ | |
| `GROUP BY ROLLUP` / `CUBE` / `GROUPING SETS` | ✅ | ✅ | ✅ | Branch count capped by `max_grouping_set_branches` (default 64) |
| Window Functions | | | | |
| `ROW_NUMBER`, `RANK`, `DENSE_RANK` | ✅ | ✅ | ✅ | Partition-scoped recompute |
| `LAG`, `LEAD`, `FIRST_VALUE`, `LAST_VALUE` | ✅ | ✅ | ✅ | Partition-scoped recompute |
| `NTILE`, `CUME_DIST`, `PERCENT_RANK` | ✅ | ✅ | ✅ | Partition-scoped recompute |
| Window frame clauses (`ROWS`, `RANGE`, `GROUPS`) | ✅ | ✅ | ✅ | |
| CTEs | | | | |
| Non-recursive `WITH` | ✅ | ✅ | ✅ | Inlined or delta-cached (multi-ref) |
| `WITH RECURSIVE` (INSERT-only workload) | ✅ | ✅ | ✅ | Semi-naive evaluation |
| `WITH RECURSIVE` (mixed INSERT/DELETE/UPDATE) | ✅ | ✅ | ✅ | Delete-and-Rederive (DRed) strategy |
| TopK | | | | |
| `ORDER BY … LIMIT N` | ✅ | ✅ | ✅ | Scoped recomputation; metadata validated each refresh |
| `ORDER BY … LIMIT N OFFSET M` | ✅ | ✅ | ✅ | |
| Lateral / SRF | | | | |
| `LATERAL` with set-returning function | ✅ | ✅ | ✅ | Row-scoped re-execution |
| `JSON_TABLE` | ✅ | ✅ | ✅ | Via lateral function operator |
| `generate_series()` | ✅ | ✅ | ✅ | |
| `unnest()` | ✅ | ✅ | ✅ | |
| ST-to-ST Dependencies | | | | |
| Stream table reading from another stream table | ✅ | ✅ | ✅ | Differential via changes_pgt_ buffers (v0.11.0); FULL upstream produces I/D diff so downstream stays differential |
| Multi-level ST chains | ✅ | ✅ | ✅ | Topological order; per-level delta propagation |
| Function Volatility | | | | |
| `IMMUTABLE` functions | ✅ | ✅ | ✅ | |
| `STABLE` functions (`now()`, `current_timestamp`) | ✅ | ⚠️ | ⚠️ | Allowed with warning — value may differ between initial load and delta evaluation |
| `VOLATILE` functions (`random()`, `clock_timestamp()`) | ✅ | ❌ | ❌ | Rejected at creation time — re-evaluation corrupts delta semantics |
Legend: ✅ = fully supported — ⚠️ = supported with caveats (see Notes column) — ❌ = not supported (blocked at creation time)
Operators
Scan
Module: src/dvm/operators/scan.rs
The leaf operator. Reads CDC changes from a source table's change buffer.
Delta Rule:
$$\Delta(\text{Scan}(R)) = \Delta R$$
The scan operator is a direct passthrough — inserts in the source become inserts in the output, deletes become deletes.
SQL Generation:
SELECT op, row_data FROM pgtrickle_changes.changes_<oid>
WHERE xid >= <last_consumed_xid>
Notes:
- Each source table has a dedicated change buffer table created by the CDC module.
- Row data is stored as JSONB with column names as keys.
- The `__pgt_row_id` column (xxHash of primary key) is included for deduplication.
Filter
Module: src/dvm/operators/filter.rs
Applies a WHERE clause predicate to the delta stream.
Delta Rule:
$$\Delta(\sigma_p(R)) = \sigma_p(\Delta R)$$
Filtering is applied to the deltas in the same way as to the base data — only rows satisfying the predicate pass through.
SQL Generation:
SELECT * FROM (<input_delta>) AS d
WHERE <predicate>
Example:
If the defining query is:
SELECT * FROM orders WHERE status = 'shipped'
And a new row (id=5, status='pending') is inserted, it does not appear in the delta output. If (id=3, status='shipped') is inserted, it passes through.
Edge Cases:
- For updates that change the predicate column (e.g., `status` from `'pending'` to `'shipped'`), the CDC produces a delete of the old row and insert of the new row. The filter passes the insert (matches) and blocks the delete (doesn't match the old row against the predicate), correctly resulting in a net insert.
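This edge case is easy to check with a toy Python model of the filter rule (illustrative only — the real operator emits SQL, and the helper name is invented here):

```python
def filter_delta(delta, pred):
    """sigma_p(delta R): a filter applies the same predicate to each signed row."""
    return [(sign, row) for sign, row in delta if pred(row)]

is_shipped = lambda row: row["status"] == "shipped"

# UPDATE status 'pending' -> 'shipped' arrives as delete(old) + insert(new):
delta = [("-", {"id": 3, "status": "pending"}),
         ("+", {"id": 3, "status": "shipped"})]
result = filter_delta(delta, is_shipped)
# the delete is blocked (the old row never matched), leaving a net insert
```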
Project
Module: src/dvm/operators/project.rs
Applies column projection from the target list.
Delta Rule:
$$\Delta(\pi_L(R)) = \pi_L(\Delta R)$$
Projects the same columns from the delta that the query projects from the base data.
SQL Generation:
SELECT <target_columns> FROM (<input_delta>) AS d
Notes:
- Projection is applied after filtering for efficiency.
- Computed expressions in the target list (e.g., `price * quantity AS total`) are evaluated on the delta rows.
Join (Inner)
Module: src/dvm/operators/join.rs
Implements inner join between two inputs.
Delta Rule:
For $R \bowtie S$:
$$\Delta(R \bowtie S) = (\Delta R \bowtie S) \cup (R' \bowtie \Delta S)$$
Where $R' = R \cup \Delta R$ (the new state of R after applying deltas).
In practice, when only one side has changes (common case), the delta join simplifies to joining the changed rows against the current state of the other side.
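The common one-side-changed case can be sketched in toy Python (illustrative names; the actual operator joins the change buffer against the live table in SQL):

```python
def delta_join(delta_left, right_state, on):
    """Delta(R join S) when only R changed: join each signed changed row
    of R against the current state of S; the sign carries through."""
    return [(sign, {**l, **r})
            for sign, l in delta_left
            for r in right_state
            if on(l, r)]

customers = [{"cust_id": 7, "name": "Ada"}]
delta_orders = [("+", {"order_id": 1, "cust_id": 7})]
joined = delta_join(delta_orders, customers,
                    on=lambda l, r: l["cust_id"] == r["cust_id"])
# one '+' row combining the new order with its matching customer
```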
SQL Generation:
-- Changes to left side joined with current right side
SELECT '+' AS op, l.*, r.*
FROM (<left_delta> WHERE op = '+') AS l
JOIN <right_table> AS r ON <join_condition>
UNION ALL
-- Current left side joined with changes to right side
SELECT '+' AS op, l.*, r.*
FROM <left_table> AS l
JOIN (<right_delta> WHERE op = '+') AS r ON <join_condition>
(And corresponding DELETE queries for op = '-'.)
Notes:
- The join uses the current state of the non-changed side, not the change buffer.
- For equi-joins, this is efficient — the join key narrows the scan.
- Non-equi joins (theta joins) may require broader scans.
Outer Join
Module: src/dvm/operators/outer_join.rs (LEFT JOIN), src/dvm/operators/full_join.rs (FULL JOIN)
Implements LEFT, RIGHT, and FULL OUTER JOIN.
RIGHT JOIN Handling:
RIGHT JOIN is automatically converted to a LEFT JOIN with swapped left/right operands during query parsing. This normalization happens transparently — the user can write RIGHT JOIN and the parser rewrites it to an equivalent LEFT JOIN before the operator tree is constructed.
Delta Rule:
Similar to inner join, but additionally handles NULL-padded rows:
$$\Delta(R \text{ LEFT JOIN } S) = (\Delta R \bowtie_L S) \cup (R' \bowtie_L \Delta S)$$
With special handling for:
- Rows in ΔR that have no match in S → emit `('+', row, NULLs)`
- Rows in ΔS that create a first match for an R row → emit `('-', row, NULLs)` and `('+', row, s_data)`
- Rows in ΔS that remove the last match for an R row → emit `('-', row, s_data)` and `('+', row, NULLs)`
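The first-match/last-match transitions boil down to counting matches for a left row before and after the right-side delta. A toy Python sketch (invented helper; `None` stands in for the NULL padding):

```python
def right_delta_transitions(left_row, right_state, right_delta, on):
    """For one row of R in R LEFT JOIN S, emit the signed output rows
    caused by right-side changes, including NULL-padding transitions."""
    before = sum(1 for r in right_state if on(left_row, r))
    after = before + sum(1 if sign == "+" else -1
                         for sign, r in right_delta if on(left_row, r))
    out = []
    if before == 0 and after > 0:              # first match appears
        out.append(("-", (left_row, None)))    # retract NULL-padded row
    if before > 0 and after == 0:              # last match removed
        out.append(("+", (left_row, None)))    # re-emit NULL-padded row
    out += [(sign, (left_row, r)) for sign, r in right_delta if on(left_row, r)]
    return out

on = lambda l, r: r["fk"] == l["id"]
# S is currently empty; inserting a first matching row retracts the padded row:
right_delta_transitions({"id": 1}, [], [("+", {"fk": 1})], on)
```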
SQL Generation (LEFT JOIN):
Uses anti-join detection (via NOT EXISTS) to correctly handle the NULL padding transitions.
FULL OUTER JOIN Delta Rule:
FULL OUTER JOIN extends the LEFT JOIN delta with symmetric right-side handling. The delta is computed as an 8-part UNION ALL:
- Parts 1–5: Same as LEFT JOIN delta (inserted/deleted rows from both sides, with NULL-padding transitions)
- Parts 6–7: Symmetric anti-join transitions for the right side (rows in ΔL that remove/create the last/first match for an S row)
- Part 8: Right-side insertions that have no match in the left side → emit `('+', NULLs, s_data)`

Each part uses pre-computed delta flags (`__has_ins_*`, `__has_del_*`) to efficiently detect first-match/last-match transitions without redundant subqueries.
Nested Join Support:
Module: src/dvm/operators/join_common.rs
All join operators (inner, left, full) support nested children — i.e., a join whose left or right operand is itself another join. The join_common module provides shared helpers:
- `build_snapshot_sql()` — returns the table reference for simple (Scan) operands, or a parenthesized subquery with disambiguated columns for nested join operands
- `rewrite_join_condition()` — rewrites column references in ON conditions to use the correct alias prefixes for nested children (e.g., `o.cust_id` → `dl.o__cust_id`)
This enables queries with 3 or more joined tables, e.g.:
SELECT o.id, c.name, p.title
FROM orders o
JOIN customers c ON o.cust_id = c.id
JOIN products p ON o.prod_id = p.id
Limitations:
- FULL OUTER JOIN delta computation can be expensive due to dual-side NULL tracking (8 UNION ALL parts).
- Performance degrades with high-cardinality join keys.
- `NATURAL JOIN` is supported — common columns are resolved automatically and synthesized into an explicit equi-join condition.
- EC-01 pre-change snapshot boundary (SF-5): The phantom-row-after-DELETE fix (EC-01) uses `EXCEPT ALL` to reconstruct the pre-change state of a join side. This is limited to join subtrees with ≤ 2 scan nodes to avoid PostgreSQL temporary file exhaustion on wide join chains. For queries with ≥ 3 base tables on one side of a join (e.g. TPC-H Q7/Q8/Q9), a simultaneous DELETE on both join sides may leave a phantom row in the stream table until the next full refresh. See `use_pre_change_snapshot()` in `join_common.rs` for the full rationale.
Aggregate
Module: src/dvm/operators/aggregate.rs
Handles GROUP BY with aggregate functions (COUNT, SUM, AVG, MIN, MAX, BOOL_AND, BOOL_OR, STRING_AGG, ARRAY_AGG, JSON_AGG, JSONB_AGG, BIT_AND, BIT_OR, BIT_XOR, JSON_OBJECT_AGG, JSONB_OBJECT_AGG, STDDEV_POP, STDDEV_SAMP, VAR_POP, VAR_SAMP, MODE, PERCENTILE_CONT, PERCENTILE_DISC, JSON_ARRAYAGG, JSON_OBJECTAGG) and the FILTER (WHERE …) and WITHIN GROUP (ORDER BY …) clauses.
Delta Rule:
$$\Delta(\gamma_{G, \text{agg}}(R)) = \gamma_{G, \text{agg}}(R' \text{ WHERE } G \in \text{affected_keys}) - \gamma_{G, \text{agg}}(R \text{ WHERE } G \in \text{affected_keys})$$
Where:
- $G$ = grouping columns
- `affected_keys` = the set of group key values that appear in ΔR
- $R'$ = $R \cup \Delta R$ (the new state)
Strategy:
- Identify affected groups — Collect all group key values that appear in the delta (either inserted or deleted rows).
- Recompute old values — Query the storage table for current aggregate values of affected groups.
- Recompute new values — Query the updated source for new aggregate values of affected groups.
- Diff — For each affected group:
  - If old exists and new differs → emit `('-', old)` and `('+', new)`
  - If old exists and new is gone → emit `('-', old)` (group eliminated)
  - If no old and new exists → emit `('+', new)` (new group appeared)
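The whole strategy is "recompute-and-diff, restricted to affected groups", which a toy Python sketch makes concrete (invented helper names; the real engine emits a single SQL delta instead of looping):

```python
from collections import defaultdict

def aggregate_delta(old_rows, new_rows, delta, key, agg=len):
    """Re-aggregate only the groups whose key appears in the delta,
    then diff old vs new per-group values into signed output rows."""
    affected = {key(row) for _, row in delta}

    def per_group(rows):
        groups = defaultdict(list)
        for row in rows:
            if key(row) in affected:
                groups[key(row)].append(row)
        return {k: agg(v) for k, v in groups.items()}

    old, new = per_group(old_rows), per_group(new_rows)
    out = []
    for k in sorted(affected):
        if k in old and k in new and old[k] != new[k]:
            out += [("-", (k, old[k])), ("+", (k, new[k]))]
        elif k in old and k not in new:
            out.append(("-", (k, old[k])))   # group eliminated
        elif k not in old and k in new:
            out.append(("+", (k, new[k])))   # new group appeared
    return out

old = [("eu", 10), ("eu", 20), ("us", 30)]
delta = [("+", ("eu", 90))]
result = aggregate_delta(old, old + [("eu", 90)], delta, key=lambda r: r[0])
# COUNT for group 'eu' goes 2 -> 3; the untouched 'us' group is never scanned
```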
Supported Aggregate Functions:
| Function | DVM Strategy | Notes |
|---|---|---|
| `COUNT(*)` | Algebraic | Fully differential |
| `COUNT(expr)` | Algebraic | Fully differential |
| `SUM(expr)` | Algebraic | Fully differential |
| `AVG(expr)` | Algebraic | Decomposed to SUM/COUNT internally |
| `MIN(expr)` | Semi-algebraic | Uses LEAST merge; falls back to per-group rescan when min row is deleted |
| `MAX(expr)` | Semi-algebraic | Uses GREATEST merge; falls back to per-group rescan when max row is deleted |
| `BOOL_AND(expr)` | Group-rescan | Affected groups are re-aggregated from source data |
| `BOOL_OR(expr)` | Group-rescan | Affected groups are re-aggregated from source data |
| `STRING_AGG(expr, sep)` | Group-rescan | Affected groups are re-aggregated from source data |
| `ARRAY_AGG(expr)` | Group-rescan | Affected groups are re-aggregated from source data |
| `JSON_AGG(expr)` | Group-rescan | Affected groups are re-aggregated from source data |
| `JSONB_AGG(expr)` | Group-rescan | Affected groups are re-aggregated from source data |
| `BIT_AND(expr)` | Group-rescan | Affected groups are re-aggregated from source data |
| `BIT_OR(expr)` | Group-rescan | Affected groups are re-aggregated from source data |
| `BIT_XOR(expr)` | Group-rescan | Affected groups are re-aggregated from source data |
| `JSON_OBJECT_AGG(key, value)` | Group-rescan | Affected groups are re-aggregated from source data |
| `JSONB_OBJECT_AGG(key, value)` | Group-rescan | Affected groups are re-aggregated from source data |
| `STDDEV_POP(expr)` / `STDDEV(expr)` | Group-rescan | Affected groups are re-aggregated from source data |
| `STDDEV_SAMP(expr)` | Group-rescan | Affected groups are re-aggregated from source data |
| `VAR_POP(expr)` | Group-rescan | Affected groups are re-aggregated from source data |
| `VAR_SAMP(expr)` / `VARIANCE(expr)` | Group-rescan | Affected groups are re-aggregated from source data |
| `MODE() WITHIN GROUP (ORDER BY expr)` | Group-rescan | Ordered-set aggregate; affected groups re-aggregated |
| `PERCENTILE_CONT(frac) WITHIN GROUP (ORDER BY expr)` | Group-rescan | Ordered-set aggregate; affected groups re-aggregated |
| `PERCENTILE_DISC(frac) WITHIN GROUP (ORDER BY expr)` | Group-rescan | Ordered-set aggregate; affected groups re-aggregated |
| `CORR(Y, X)` | Group-rescan | Regression aggregate; affected groups re-aggregated |
| `COVAR_POP(Y, X)` | Group-rescan | Regression aggregate; affected groups re-aggregated |
| `COVAR_SAMP(Y, X)` | Group-rescan | Regression aggregate; affected groups re-aggregated |
| `REGR_AVGX(Y, X)` | Group-rescan | Regression aggregate; affected groups re-aggregated |
| `REGR_AVGY(Y, X)` | Group-rescan | Regression aggregate; affected groups re-aggregated |
| `REGR_COUNT(Y, X)` | Group-rescan | Regression aggregate; affected groups re-aggregated |
| `REGR_INTERCEPT(Y, X)` | Group-rescan | Regression aggregate; affected groups re-aggregated |
| `REGR_R2(Y, X)` | Group-rescan | Regression aggregate; affected groups re-aggregated |
| `REGR_SLOPE(Y, X)` | Group-rescan | Regression aggregate; affected groups re-aggregated |
| `REGR_SXX(Y, X)` | Group-rescan | Regression aggregate; affected groups re-aggregated |
| `REGR_SXY(Y, X)` | Group-rescan | Regression aggregate; affected groups re-aggregated |
| `REGR_SYY(Y, X)` | Group-rescan | Regression aggregate; affected groups re-aggregated |
| `ANY_VALUE(expr)` | Group-rescan | PostgreSQL 16+; affected groups re-aggregated |
| `JSON_ARRAYAGG(expr ...)` | Group-rescan | SQL-standard JSON aggregation (PostgreSQL 16+); full deparsed SQL preserved |
| `JSON_OBJECTAGG(key: value ...)` | Group-rescan | SQL-standard JSON aggregation (PostgreSQL 16+); full deparsed SQL preserved |
| User-defined aggregates (`CREATE AGGREGATE`) | Group-rescan | Any custom aggregate is supported via group-rescan; full aggregate call SQL preserved verbatim |
FILTER Clause:
All aggregate functions support the FILTER (WHERE …) clause:
SELECT COUNT(*) FILTER (WHERE status = 'active') AS active_count FROM orders GROUP BY region
The filter predicate is applied within the delta computation — only rows matching the filter contribute to the aggregate delta. Filtered aggregates are excluded from the P5 direct-bypass optimization.
SQL Generation:
The aggregate operator uses a 3-CTE pipeline:
- Merge CTE — Joins affected group keys against old (storage) and new (source) aggregate values, producing `__pgt_meta_action` (`'I'` for new-only groups, `'D'` for disappeared groups, `'U'` for changed groups).
- LATERAL VALUES expansion — A single-pass `LATERAL (VALUES ...)` clause expands each merge row into insert and delete actions, avoiding a 4-branch UNION ALL:
FROM merge_cte m,
LATERAL (VALUES
('I', m.new_count, m.new_total),
('D', m.old_count, m.old_total)
) v(action, count_val, val_total)
WHERE (m.__pgt_meta_action = 'I' AND v.action = 'I')
OR (m.__pgt_meta_action = 'D' AND v.action = 'D')
OR (m.__pgt_meta_action = 'U')
- Final projection — Emits `('+', row)` and `('-', row)` tuples for the refresh engine.
MIN/MAX Merge Strategy:
MIN and MAX use a semi-algebraic strategy with two cases:
- Non-extremum deletion — When the deleted row is NOT the current minimum (or maximum), the merge uses `LEAST(old_value, new_inserts)` for MIN or `GREATEST(old_value, new_inserts)` for MAX. This is fully algebraic and requires no rescan.
- Extremum deletion — When the row holding the current minimum (or maximum) IS deleted, the new value cannot be computed from the delta alone. The merge expression returns `NULL` as a sentinel, which triggers the change-detection guard (`IS DISTINCT FROM`) to emit the group for re-aggregation. The MERGE layer treats this as a DELETE + INSERT pair, recomputing the group from source data. This is still more efficient than a full table refresh since only affected groups are rescanned.
Distinct
Module: src/dvm/operators/distinct.rs
Implements SELECT DISTINCT using reference counting.
Delta Rule:
$$\Delta(\delta(R)) = \{ r \in \Delta R : \text{count}(r, R) = 0 \land \text{count}(r, R') > 0 \} - \{ r \in \Delta R : \text{count}(r, R) > 0 \land \text{count}(r, R') = 0 \}$$
In other words:
- A row enters the output when its count transitions from 0 to ≥1
- A row leaves the output when its count transitions from ≥1 to 0
Strategy:
Maintains a hidden __pgt_dup_count column in the storage table to track how many times each distinct row appears in the pre-distinct input.
- On insert: increment count. If count was 0, emit `('+', row)`.
- On delete: decrement count. If count becomes 0, emit `('-', row)`.
Notes:
- The duplicate count is not visible in user queries against the storage table (projected away by the view layer).
- Duplicate counting uses `__pgt_row_id` (xxHash) for efficient lookups.
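The reference-counting rule fits in a small Python model (illustrative only — in the extension the count lives in the hidden `__pgt_dup_count` column, not in memory):

```python
from collections import Counter

class DistinctOp:
    """Emit '+' when a row's count goes 0 -> 1 and '-' when it goes 1 -> 0."""
    def __init__(self):
        self.counts = Counter()

    def apply(self, delta):
        out = []
        for sign, row in delta:
            before = self.counts[row]
            self.counts[row] += 1 if sign == "+" else -1
            after = self.counts[row]
            if before == 0 and after == 1:
                out.append(("+", row))
            elif before == 1 and after == 0:
                out.append(("-", row))
        return out

d = DistinctOp()
d.apply([("+", "a"), ("+", "a")])   # only the first 'a' is emitted
d.apply([("-", "a")])               # count 2 -> 1: no output
d.apply([("-", "a")])               # count 1 -> 0: emits ('-', 'a')
```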
Union All
Module: src/dvm/operators/union_all.rs
Merges deltas from two branches.
Delta Rule:
$$\Delta(R \cup_{\text{all}} S) = \Delta R \cup_{\text{all}} \Delta S$$
Simply concatenates the delta streams from both branches.
SQL Generation:
SELECT * FROM (<left_delta>)
UNION ALL
SELECT * FROM (<right_delta>)
Notes:
- Column count and types must match between branches.
- Each branch is independently processed through its own operator sub-tree.
- This is the simplest operator since `UNION ALL` preserves all duplicates.
Intersect
Module: src/dvm/operators/intersect.rs
Implements INTERSECT and INTERSECT ALL using dual-count per-branch multiplicity tracking.
Delta Rule:
$$\Delta(R \cap S): \text{emit rows where } \min(\text{count}_L, \text{count}_R) \text{ crosses the 0 boundary}$$
- INTERSECT (set): a row is present when both branches contain it.
- INTERSECT ALL (bag): a row appears $\min(\text{count}_L, \text{count}_R)$ times.
SQL Generation (3-CTE chain):
- Delta CTE — tags rows from left/right child deltas with branch indicator (`'L'`/`'R'`) and computes per-row net_count.
- Merge CTE — joins with the storage table to compute old and new per-branch counts (`__pgt_count_l`, `__pgt_count_r`).
- Final CTE — detects boundary crossings using `LEAST(old_count_l, old_count_r)` vs `LEAST(new_count_l, new_count_r)`.
Notes:
- Storage table requires hidden columns `__pgt_count_l` and `__pgt_count_r` for multiplicity tracking.
- Both set and bag variants share the same 3-CTE structure and boundary logic (both use LEAST); they differ only in how many copies of a row are emitted.
Except
Module: src/dvm/operators/except.rs
Implements EXCEPT and EXCEPT ALL using dual-count per-branch multiplicity tracking.
Delta Rule:
$$\Delta(R - S): \text{emit rows where } \max(0, \text{count}_L - \text{count}_R) \text{ crosses the 0 boundary}$$
- EXCEPT (set): a row is present when it exists in the left but not the right branch.
- EXCEPT ALL (bag): a row appears $\max(0, \text{count}_L - \text{count}_R)$ times.
SQL Generation (3-CTE chain):
- Delta CTE — same as Intersect: tags rows from both child deltas with branch indicator.
- Merge CTE — joins with storage table for old/new per-branch counts.
- Final CTE — detects boundary crossings using `GREATEST(0, old_count_l - old_count_r)` vs `GREATEST(0, new_count_l - new_count_r)`.
Notes:
- EXCEPT is not commutative — left branch is the positive input, right is subtracted.
- Storage table requires hidden columns `__pgt_count_l` and `__pgt_count_r`.
- Same 3-CTE structure as Intersect with a different effective-count function.
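Both set operators reduce to the same 0-boundary test with different effective-count functions, which a toy Python sketch makes concrete (invented helper; the real counts live in `__pgt_count_l` / `__pgt_count_r`):

```python
def boundary_delta(row, old_l, old_r, new_l, new_r, effective):
    """Set-semantics boundary test: emit a signed row only when the
    effective count crosses 0. `effective` is min(l, r) for INTERSECT
    and max(0, l - r) for EXCEPT."""
    old_n, new_n = effective(old_l, old_r), effective(new_l, new_r)
    if old_n == 0 and new_n > 0:
        return [("+", row)]
    if old_n > 0 and new_n == 0:
        return [("-", row)]
    return []

intersect = lambda l, r: min(l, r)       # LEAST
except_ = lambda l, r: max(0, l - r)     # GREATEST(0, l - r)

enters = boundary_delta("x", 1, 0, 1, 1, intersect)  # right gains its first copy
leaves = boundary_delta("x", 2, 1, 2, 2, except_)    # right catches up to left
```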
Subquery
Module: src/dvm/operators/subquery.rs
Handles both inlined CTEs and explicit subqueries in FROM ((SELECT ...) AS alias).
Delta Rule:
$$\Delta(\rho_{\text{alias}}(Q)) = \rho_{\text{alias}}(\Delta Q)$$
A subquery wrapper is transparent for differentiation — it delegates to its child's delta and optionally renames output columns to match the subquery's column aliases.
SQL Generation:
-- If column aliases differ from child output columns:
SELECT __pgt_row_id, __pgt_action, child_col1 AS alias_col1, child_col2 AS alias_col2
FROM (<child_delta>)
If the child columns already match the aliases, the subquery is a pure passthrough — no additional CTE is emitted.
Notes:
- This operator enables both CTE support (Tier 1) and standalone subqueries in FROM.
- Column aliases on subqueries (`FROM (...) AS x(a, b)`) are handled by emitting a thin renaming CTE.
- The subquery body is fully differentiated as a normal operator sub-tree.
CTE Scan (Shared Delta)
Module: src/dvm/operators/cte_scan.rs
Handles multi-reference CTEs by computing the CTE body's delta once and reusing it across all references (Tier 2).
Delta Rule:
$$\Delta(\text{CteScan}(\text{id}, Q)) = \text{cache}[\text{id}] \quad \text{(computed once, reused)}$$
When a CTE is referenced multiple times in a query, each reference produces a CteScan node with the same cte_id. The diff engine differentiates the CTE body once and caches the result. Subsequent CteScan nodes for the same CTE reuse the cached delta.
SQL Generation:
-- First reference: differentiates the CTE body and stores result in cache
-- Subsequent references: point to the same system CTE name
SELECT __pgt_row_id, __pgt_action, <columns>
FROM __pgt_cte_<cte_name>_delta -- shared across all references
If column aliases are present, a thin renaming CTE is added on top of the cached delta.
Notes:
- Without CteScan (Tier 1), multi-reference CTEs are inlined: each reference duplicates the full operator sub-tree. CteScan (Tier 2) eliminates this duplication.
- The CTE body is pre-differentiated in dependency order (earlier CTEs before later ones that reference them).
- Column alias support follows the same pattern as the Subquery operator.
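The computed-once, reused-everywhere behavior amounts to memoizing differentiation by `cte_id`. A minimal Python mock-up (node shapes here are hypothetical, not pg_trickle's actual OpTree):

```python
class DiffEngine:
    """Tier-2 shared-delta sketch: the CTE body is differentiated once per
    cte_id; every later CteScan reference reuses the cached system CTE name."""
    def __init__(self):
        self.cache = {}
        self.body_diffs = 0  # counts how many times a body was differentiated

    def differentiate(self, node):
        if node['kind'] == 'cte_scan':
            cte_id = node['cte_id']
            if cte_id not in self.cache:
                self.body_diffs += 1  # stand-in for differentiating the body
                self.cache[cte_id] = f"__pgt_cte_{cte_id}_delta"
            return self.cache[cte_id]
        if node['kind'] == 'union_all':
            left = self.differentiate(node['left'])
            right = self.differentiate(node['right'])
            return f"({left} UNION ALL {right})"
        raise ValueError(node['kind'])

# A CTE referenced twice: the body is differentiated once, not twice.
engine = DiffEngine()
tree = {'kind': 'union_all',
        'left':  {'kind': 'cte_scan', 'cte_id': 'totals'},
        'right': {'kind': 'cte_scan', 'cte_id': 'totals'}}
print(engine.differentiate(tree))
print(engine.body_diffs)  # 1
```

Without the cache (Tier 1 inlining), each reference would duplicate the body's entire delta sub-tree.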
Recursive CTEs
Recursive CTEs (WITH RECURSIVE) are supported in FULL, DIFFERENTIAL, and IMMEDIATE modes, with different execution paths depending on the refresh mode:
FULL Mode
Recursive CTEs work out-of-the-box with refresh_mode = 'FULL'. The defining query is executed as-is via INSERT INTO ... SELECT ..., and PostgreSQL handles the iterative evaluation internally.
DIFFERENTIAL Mode (Three-Strategy Incremental Maintenance)
Recursive CTEs with refresh_mode = 'DIFFERENTIAL' use an automatic three-strategy approach, selected based on column compatibility and change type:
Strategy 1: Semi-Naive Evaluation (INSERT-only changes)
When only INSERT changes are present in the change buffer, pg_trickle uses semi-naive evaluation — the standard technique for incremental fixpoint computation. The base case is differentiated normally through the DVM operator tree, then the resulting delta is propagated through the recursive term using a nested WITH RECURSIVE:
WITH RECURSIVE
__pgt_base_delta AS (
-- Normal DVM differentiation of the base case (INSERT rows only)
<differentiated base case>
),
__pgt_rec_delta AS (
-- Seed: base case delta rows
SELECT cols FROM __pgt_base_delta WHERE __pgt_action = 'I'
UNION ALL
-- Seed: new base rows joining existing ST storage
SELECT cols FROM <recursive term with self_ref = ST_storage, base = change_buffer>
UNION ALL
-- Propagation: recursive term applied to growing delta
SELECT cols FROM <recursive term with self_ref = __pgt_rec_delta, base = full>
)
SELECT pgtrickle.pg_trickle_hash(...) AS __pgt_row_id, 'I' AS __pgt_action, cols
FROM __pgt_rec_delta
The cost is proportional to the number of new rows produced by the change, not the full result set.
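Semi-naive evaluation can be illustrated on a transitive-closure query in plain Python. This is a generic sketch of the technique, not the generated SQL; note how each iteration joins only the newly derived frontier against the edges, never the full result:

```python
def seminaive_insert_delta(closure, edges, new_edges):
    """Given the existing result (closure) and newly inserted base edges,
    derive only the new pairs -- cost scales with the delta, not the result."""
    delta = set(new_edges) - closure
    all_edges = edges | set(new_edges)
    # Seed: new edges joined against the existing result, in both directions.
    frontier = set(delta)
    frontier |= {(a, d) for (a, b) in closure for (c, d) in new_edges if b == c}
    frontier |= {(a, d) for (a, b) in new_edges for (c, d) in closure if b == c}
    frontier -= closure
    delta |= frontier
    while frontier:
        # Propagate: extend only the newly derived pairs.
        step = {(a, d) for (a, b) in frontier for (c, d) in all_edges if b == c}
        frontier = step - closure - delta
        delta |= frontier
    return delta

edges = {(1, 2), (2, 3)}
closure = {(1, 2), (2, 3), (1, 3)}  # existing ST storage
print(sorted(seminaive_insert_delta(closure, edges, {(3, 4)})))
# [(1, 4), (2, 4), (3, 4)]
```

Inserting edge (3, 4) derives three new pairs without revisiting the paths already in storage.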
Strategy 2: Delete-and-Rederive / DRed (mixed INSERT/DELETE/UPDATE changes)
When the change buffer contains DELETE or UPDATE changes, simple propagation is insufficient — a deleted base row may have transitively derived many recursive rows, some of which may still be derivable from alternative paths. DRed handles this in four phases:
- Insert propagation — semi-naive evaluation for the INSERT portion (same as Strategy 1)
- Over-deletion cascade — propagate base-case deletions through the recursive term against ST storage to find all transitively-derived rows that might be invalidated
- Rederivation — re-execute the recursive CTE from the remaining (non-deleted) base rows to restore any over-deleted rows that have alternative derivations
- Combine — final delta = inserts + (over-deletions − rederived rows)
This avoids full recomputation while correctly handling deletions with alternative derivation paths.
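The over-delete/rederive interplay is the subtle part. A compact Python sketch of phases 2–4 for transitive closure (illustrative only — the real phases run as generated SQL against ST storage):

```python
def closure(edges):
    """Full fixpoint over the recursive term (transitive closure)."""
    result = set(edges)
    while True:
        step = {(a, d) for (a, b) in result for (c, d) in edges if b == c}
        if step <= result:
            return result
        result |= step

def dred(edges, deleted):
    """Phases 2-4: over-delete everything possibly derived from the deleted
    edges, then rederive rows that still have an alternative derivation."""
    old = closure(edges)
    remaining = edges - deleted
    # Phase 2: over-deletion cascade.
    over = set(deleted)
    while True:
        step = {(a, d) for (a, b) in over for (c, d) in edges if b == c}
        step |= {(a, d) for (a, b) in old for (c, d) in over if b == c}
        step &= old
        if step <= over:
            break
        over |= step
    # Phase 3: rederive from the remaining base rows.
    rederived = closure(remaining) & over
    # Phase 4: net deletions = over-deletions minus rederived rows.
    return over - rederived

# (1, 3) is derivable both via (1, 2)+(2, 3) and directly as a base edge,
# so deleting (1, 2) must not delete (1, 3).
edges = {(1, 2), (2, 3), (1, 3)}
print(dred(edges, {(1, 2)}))  # {(1, 2)}
```

The rederivation phase is exactly what protects rows with alternative derivation paths from being dropped.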
IMMEDIATE Mode
Recursive CTEs with refresh_mode = 'IMMEDIATE' use the same semi-naive and
Delete-and-Rederive machinery as DIFFERENTIAL mode, but the base changes come
from PostgreSQL statement transition tables instead of the background change
buffer. This keeps the stream table transactionally up to date within the same
statement. To guard against cyclic data or unexpectedly deep recursion, the
semi-naive SQL injects a depth counter capped by
pg_trickle.ivm_recursive_max_depth (default 100; set to 0 to disable the
guard).
Strategy 3: Recomputation Fallback
When the CTE defines more columns than the outer SELECT projects (column mismatch), the incremental strategies cannot be used because the ST storage table lacks columns needed for recursive self-joins. In this case, the full defining query is re-executed and anti-joined against current storage:
WITH __pgt_recomp_new AS (
SELECT pgtrickle.pg_trickle_hash(row_to_json(sub)::text) AS __pgt_row_id, col1, col2, ...
FROM (<defining_query>) sub
),
__pgt_recomp_ins AS (
SELECT n.__pgt_row_id, 'I'::text AS __pgt_action, n.col1, n.col2, ...
FROM __pgt_recomp_new n
LEFT JOIN <storage_table> s ON s.__pgt_row_id = n.__pgt_row_id
WHERE s.__pgt_row_id IS NULL
),
__pgt_recomp_del AS (
SELECT s.__pgt_row_id, 'D'::text AS __pgt_action, s.col1, s.col2, ...
FROM <storage_table> s
LEFT JOIN __pgt_recomp_new n ON n.__pgt_row_id = s.__pgt_row_id
WHERE n.__pgt_row_id IS NULL
)
SELECT * FROM __pgt_recomp_ins
UNION ALL
SELECT * FROM __pgt_recomp_del
The cost is proportional to the full result set size.
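The two anti-joins above implement a hash-based set diff. A Python model of the same idea (sha256 over sorted JSON stands in for `pgtrickle.pg_trickle_hash(row_to_json(...))`, which actually uses xxHash):

```python
import hashlib
import json

def row_id(row):
    """Content hash standing in for pg_trickle_hash(row_to_json(...))."""
    return hashlib.sha256(json.dumps(row, sort_keys=True).encode()).hexdigest()

def recompute_delta(new_rows, storage):
    """Anti-join both ways by row hash: rows only in the fresh result are
    inserts, rows only in current storage are deletes."""
    new_ids = {row_id(r): r for r in new_rows}
    old_ids = {row_id(r): r for r in storage}
    ins = [('I', r) for h, r in new_ids.items() if h not in old_ids]
    dels = [('D', r) for h, r in old_ids.items() if h not in new_ids]
    return ins + dels

storage = [{'id': 1, 'depth': 0}, {'id': 2, 'depth': 1}]
fresh = [{'id': 1, 'depth': 0}, {'id': 2, 'depth': 2}]
print(recompute_delta(fresh, storage))
# [('I', {'id': 2, 'depth': 2}), ('D', {'id': 2, 'depth': 1})]
```

Unchanged rows hash identically and drop out of both anti-joins, so only real differences reach the MERGE.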
Strategy Selection
| CTE columns match ST? | Change type | refresh_mode / DeltaSource | Strategy |
|---|---|---|---|
| Match | INSERT-only | DIFFERENTIAL (ChangeBuffer) | Semi-naive (Strategy 1) |
| Match | Mixed (INSERT+DELETE/UPDATE) | DIFFERENTIAL (ChangeBuffer) | DRed (Strategy 2) |
| Match | INSERT-only | IMMEDIATE (TransitionTable) | Semi-naive (Strategy 1) |
| Match | Mixed (INSERT+DELETE/UPDATE) | IMMEDIATE (TransitionTable) | DRed (Strategy 2) |
| Mismatch | Any | Any | Recomputation (Strategy 3) |
DRed in DIFFERENTIAL mode (P2-1 -- implemented in v0.10.0)
DRed is now active in both DIFFERENTIAL and IMMEDIATE modes when CTE output columns match ST storage columns. Phase 1 propagates inserts via semi-naive evaluation; Phase 2 cascades deletions through ST storage; Phase 3 rederives over-deleted rows that have alternative derivation paths; Phase 4 combines the results. DRed correctly handles derived-column changes such as path rebuilds under a renamed ancestor node. Column-mismatch cases still use recomputation fallback.
Notes:
- Non-linear recursion (multiple self-references in the recursive term) is rejected — PostgreSQL restricts the recursive term to reference the CTE at most once.
- The `__pgt_row_id` column (xxHash of the JSON-serialized row) is used for row identity.
- For write-heavy workloads on very large recursive result sets with frequent mixed changes, `refresh_mode = 'FULL'` may still be more efficient than DRed.
Window Functions
Module: src/dvm/operators/window.rs
Handles window functions (ROW_NUMBER, RANK, DENSE_RANK, SUM() OVER, etc.) using partition-based recomputation.
Delta Rule:
When any row in a partition changes (insert, update, or delete), the entire partition's window function output is recomputed:
$$\Delta(\omega_{f, P}(R)) = \omega_{f, P}(R'|_{\text{affected partitions}}) - \omega_{f, P}(R|_{\text{affected partitions}})$$
Where $P$ is the PARTITION BY key and $f$ is the window function.
Strategy:
- Identify affected partition keys from the child delta.
- Delete old window function results for affected partitions from storage.
- Build the current input for affected partitions by excluding changed rows via NOT EXISTS on pass-through columns.
- Recompute the window function on the current input for affected partitions.
- Compute unique row IDs via `row_to_json` + `row_number` (handles tied values in ranking functions).
- Emit the recomputed rows as inserts.
SQL Generation:
-- CTE 1: Affected partition keys from delta
WITH affected_partitions AS (
SELECT DISTINCT <partition_cols> FROM (<child_delta>)
),
-- CTE 2: Current input (surviving rows not in delta) for affected partitions
current_input AS (
SELECT * FROM <child_snapshot>
WHERE (<partition_cols>) IN (SELECT * FROM affected_partitions)
AND NOT EXISTS (
SELECT 1 FROM (<child_delta>) d
WHERE d.<col1> IS NOT DISTINCT FROM <child_alias>.<col1>
AND d.<col2> IS NOT DISTINCT FROM <child_alias>.<col2> ...
)
),
-- CTE 3: Recompute window function with unique row IDs
recomputed AS (
SELECT *, pgtrickle.pg_trickle_hash(
row_to_json(w)::text || '/' || row_number() OVER ()::text
) AS __pgt_row_id
FROM (
SELECT *, <window_func> OVER (PARTITION BY <partition_cols> ORDER BY <order_cols>) AS <alias>
FROM current_input
) w
)
-- Delete old results + insert recomputed results
SELECT 'D' AS __pgt_action, ... -- old rows from affected partitions
UNION ALL
SELECT 'I' AS __pgt_action, ... -- recomputed rows
Notes:
- The cost is proportional to the size of affected partitions, not the full table. For workloads where changes spread across few partitions, this is efficient.
- When multiple window functions use different PARTITION BY clauses, the parser accepts all of them. If they share the same partition key it is used directly; otherwise the operator falls back to un-partitioned (full) recomputation.
- Without PARTITION BY, the entire table is treated as a single partition — any change triggers a full recomputation.
- Window functions wrapping aggregates (e.g., `RANK() OVER (ORDER BY SUM(x))`) are supported: the window diff rewrites ORDER BY / PARTITION BY expressions to reference aggregate output aliases via `build_agg_alias_map`.
- Row IDs are computed from the full row content (`row_to_json`) plus a positional disambiguator (`row_number`) to avoid hash collisions with tied ranking values (DENSE_RANK, RANK).
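The partition-scoped strategy can be modeled in a few lines of Python (an illustrative sketch with made-up row shapes, not the generated SQL):

```python
def window_delta(storage, new_input, changed_rows, part_key, order_key):
    """Drop old window output for partitions touched by the delta, then
    re-rank the surviving rows of just those partitions."""
    affected = {r[part_key] for r in changed_rows}
    deletes = [('D', r) for r in storage if r[part_key] in affected]
    inserts = []
    for part in sorted(affected):
        rows = sorted((r for r in new_input if r[part_key] == part),
                      key=lambda r: r[order_key])
        for rank, r in enumerate(rows, start=1):
            inserts.append(('I', {**r, 'rnk': rank}))
    return deletes + inserts

# Inserting v=3 into partition 'a' shifts the existing row's rank from 1 to 2;
# partition 'b' is untouched and costs nothing.
storage = [{'p': 'a', 'v': 5, 'rnk': 1}, {'p': 'b', 'v': 1, 'rnk': 1}]
new_input = [{'p': 'a', 'v': 5}, {'p': 'a', 'v': 3}, {'p': 'b', 'v': 1}]
print(window_delta(storage, new_input, [{'p': 'a', 'v': 3}], 'p', 'v'))
```

The work is proportional to the size of partition `'a'` only, which is exactly the cost profile described above.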
Known Limitation: O(partition_size) Recomputation Cost
Any single-row change within a window partition triggers recomputation of the entire partition. For queries with large partitions (e.g., `PARTITION BY region` where a region has 500K rows), a single INSERT into that partition causes all 500K rows to be recomputed and diffed. This is inherent to the partition-based delta strategy — window functions cannot be incrementally maintained at sub-partition granularity because a single row insertion can shift the rank, row number, or running aggregate of every other row in the same partition.

Mitigation strategies:
- Use more granular `PARTITION BY` keys to keep partition sizes small.
- For queries without `PARTITION BY`, consider restructuring as a `GROUP BY` aggregate if the window function is equivalent (e.g., `SUM(x) OVER ()` → `SUM(x)` as a scalar subquery).
- Accept the cost for low-change-frequency partitions; the recomputation is still cheaper than a full table refresh since only affected partitions are touched.
- If partition sizes routinely exceed 100K rows and changes are frequent, consider the FULL refresh mode which bypasses the per-partition delta entirely.
Window Frame Clauses:
Window frame specifications are fully supported:
- Modes: `ROWS`, `RANGE`, `GROUPS`
- Bounds: `UNBOUNDED PRECEDING`, `N PRECEDING`, `CURRENT ROW`, `N FOLLOWING`, `UNBOUNDED FOLLOWING`
- Between syntax: `BETWEEN <start> AND <end>`
- Exclusion: `EXCLUDE CURRENT ROW`, `EXCLUDE GROUP`, `EXCLUDE TIES`, `EXCLUDE NO OTHERS`
Example: SUM(val) OVER (ORDER BY ts ROWS BETWEEN 3 PRECEDING AND CURRENT ROW)
Named WINDOW Clauses:
Named window definitions are resolved from the query-level WINDOW clause:
SELECT id, SUM(val) OVER w, AVG(val) OVER w
FROM data
WINDOW w AS (PARTITION BY category ORDER BY ts)
The parser resolves OVER w by looking up the window definition from the WINDOW clause and merging partition, order, and frame specifications.
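A sketch of that merge step in Python (the dict shapes are hypothetical — the real resolver works on parse-tree nodes): fields the `OVER` reference leaves unset are inherited from the named definition, and explicit parts win.

```python
def resolve_window(over, named):
    """Merge an `OVER w`-style reference with the query-level WINDOW clause
    entry it names; explicit fields on the reference take precedence."""
    base = named.get(over.get('name'), {})
    return {key: over.get(key) or base.get(key)
            for key in ('partition_by', 'order_by', 'frame')}

named = {'w': {'partition_by': ['category'], 'order_by': ['ts']}}
print(resolve_window({'name': 'w', 'frame': 'ROWS 3 PRECEDING'}, named))
# {'partition_by': ['category'], 'order_by': ['ts'], 'frame': 'ROWS 3 PRECEDING'}
```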
Lateral Function (Set-Returning Functions in FROM)
Module: src/dvm/operators/lateral_function.rs
Handles set-returning functions (SRFs) used in the FROM clause with implicit LATERAL semantics: jsonb_array_elements, jsonb_each, jsonb_each_text, unnest, etc.
Delta Rule:
When a source row changes (insert, update, or delete), the SRF expansion is re-evaluated only for that source row:
$$\Delta(R \ltimes f(R.\text{col})) = (R' \ltimes f(R'.\text{col}))|_{\text{changed rows}} - (R \ltimes f(R.\text{col}))|_{\text{changed rows}}$$
Where $R$ is the source table, $f$ is the SRF, and changed rows are identified via the child delta.
Strategy (Row-Scoped Recomputation):
- Propagate the child delta to identify changed source rows.
- Find all existing ST rows derived from changed source rows (via column matching).
- Delete old SRF expansions for those source rows.
- Re-expand the SRF for inserted/updated source rows.
- Emit deletes + inserts as the final delta.
SQL Generation (4-CTE chain):
-- CTE 1: Changed source rows from child delta
WITH lat_changed AS (
SELECT DISTINCT "__pgt_row_id", "__pgt_action", <child_cols>
FROM <child_delta>
),
-- CTE 2: Old ST rows for changed source rows (to be deleted)
lat_old AS (
SELECT st."__pgt_row_id", st.<all_output_cols>
FROM <st_table> st
WHERE EXISTS (
SELECT 1 FROM lat_changed cs
WHERE st.<col1> IS NOT DISTINCT FROM cs.<col1>
AND st.<col2> IS NOT DISTINCT FROM cs.<col2>
...
)
),
-- CTE 3: Re-expand SRF for inserted/updated source rows
lat_expand AS (
SELECT pg_trickle_hash(<all_cols>::text) AS "__pgt_row_id",
cs.<child_cols>, <srf_alias>.<srf_cols>
FROM lat_changed cs,
LATERAL <srf_function>(cs.<arg>) AS <srf_alias>
WHERE cs."__pgt_action" = 'I'
),
-- CTE 4: Final delta
lat_final AS (
SELECT "__pgt_row_id", 'D' AS "__pgt_action", <cols> FROM lat_old
UNION ALL
SELECT "__pgt_row_id", 'I' AS "__pgt_action", <cols> FROM lat_expand
)
Row Identity:
Content-based: hash(child_columns || srf_result_columns). This is stable as long as the same source row produces the same expanded values.
Supported SRFs:
| Function | Output Columns | Notes |
|---|---|---|
| `jsonb_array_elements(jsonb)` | `value` (jsonb) | Expands JSONB array to rows |
| `jsonb_array_elements_text(jsonb)` | `value` (text) | Text variant |
| `jsonb_each(jsonb)` | `key` (text), `value` (jsonb) | Expands JSONB object to key-value pairs |
| `jsonb_each_text(jsonb)` | `key` (text), `value` (text) | Text variant |
| `unnest(anyarray)` | Element type | Unnests PostgreSQL arrays |
| Custom SRFs | User-provided column aliases | `AS alias(col1, col2)` |
Notes:
- The cost is proportional to the number of changed source rows × average SRF expansion size, not the full table.
- `WITH ORDINALITY` is supported — adds a `bigint` ordinality column to the output.
- `ROWS FROM()` with multiple functions is not supported (rejected at parse time).
- Column aliases (e.g., `AS child(value)`) are used to determine output column names; for known SRFs without aliases, the alias name becomes the column name.
- JSON_TABLE (PostgreSQL 17+) — `JSON_TABLE(expr, path COLUMNS (...))` is modeled as a `LateralFunction` and uses the same row-scoped recomputation strategy. Supported column types: regular, EXISTS, formatted, and nested columns with `ON ERROR` / `ON EMPTY` behaviors and `PASSING` clauses.
Lateral Subquery (Correlated Subqueries in FROM)
Module: src/dvm/operators/lateral_subquery.rs
Handles correlated subqueries used in the FROM clause with explicit or implicit LATERAL semantics: FROM t, LATERAL (SELECT ... WHERE ref = t.col) AS alias or FROM t LEFT JOIN LATERAL (...) AS alias ON true.
Delta Rule:
When an outer row changes, the correlated subquery is re-executed only for that row:
$$\Delta(R \ltimes Q(R)) = (R' \ltimes Q(R'))|_{\text{changed rows}} - (R \ltimes Q(R))|_{\text{changed rows}}$$
Where $R$ is the outer table, $Q(R)$ is the correlated subquery, and changed rows are identified via the child delta.
Strategy (Row-Scoped Recomputation):
- Propagate the child delta to identify changed outer rows.
- Find all existing ST rows derived from changed outer rows (via column matching with `IS NOT DISTINCT FROM`).
- Delete old subquery expansions for those outer rows.
- Re-execute the subquery for inserted/updated outer rows using the original outer alias.
- Emit deletes + inserts as the final delta.
SQL Generation (4-CTE chain):
-- CTE 1: Changed outer rows from child delta
WITH lat_sq_changed AS (
SELECT DISTINCT "__pgt_row_id", "__pgt_action", <child_cols>
FROM <child_delta>
),
-- CTE 2: Old ST rows for changed outer rows (to be deleted)
lat_sq_old AS (
SELECT st."__pgt_row_id", st.<all_output_cols>
FROM <st_table> st
WHERE EXISTS (
SELECT 1 FROM lat_sq_changed cs
WHERE st.<col1> IS NOT DISTINCT FROM cs.<col1>
AND st.<col2> IS NOT DISTINCT FROM cs.<col2>
...
)
),
-- CTE 3: Re-execute subquery for inserted/updated outer rows
lat_sq_expand AS (
SELECT pg_trickle_hash(<all_cols>::text) AS "__pgt_row_id",
<outer_alias>.<child_cols>, <sub_alias>.<sub_cols>
FROM lat_sq_changed AS <outer_alias>, -- Original outer alias!
LATERAL (<subquery_sql>) AS <sub_alias>
WHERE <outer_alias>."__pgt_action" = 'I'
),
-- CTE 4: Final delta
lat_sq_final AS (
SELECT "__pgt_row_id", 'D' AS "__pgt_action", <cols> FROM lat_sq_old
UNION ALL
SELECT "__pgt_row_id", 'I' AS "__pgt_action", <cols> FROM lat_sq_expand
)
LEFT JOIN LATERAL Handling:
For queries using LEFT JOIN LATERAL (...) ON true, the expand CTE uses LEFT JOIN LATERAL instead of comma syntax and wraps subquery columns in COALESCE for hash stability:
lat_sq_expand AS (
SELECT pg_trickle_hash(<outer_cols>::text || '/' || COALESCE(<sub_cols>::text, '')) AS "__pgt_row_id",
<outer_alias>.<child_cols>, <sub_alias>.<sub_cols>
FROM lat_sq_changed AS <outer_alias>
LEFT JOIN LATERAL (<subquery_sql>) AS <sub_alias> ON true
WHERE <outer_alias>."__pgt_action" = 'I'
)
Row Identity:
Content-based: hash(outer_columns || '/' || subquery_result_columns). For LEFT JOIN with NULL results, COALESCE ensures a stable hash.
Supported Patterns:
| Pattern | Syntax | Notes |
|---|---|---|
| Top-N per group | LATERAL (SELECT ... ORDER BY ... LIMIT N) | Most common use case |
| Correlated aggregate | LATERAL (SELECT SUM(x) FROM t WHERE t.fk = p.pk) | Returns single row per outer row |
| Existence with data | LEFT JOIN LATERAL (...) ON true | Preserves outer rows with NULLs |
| Multi-column lookup | LATERAL (SELECT a, b FROM t WHERE t.fk = p.pk LIMIT 1) | Multiple derived values |
| GROUP BY inside subquery | LATERAL (SELECT type, COUNT(*) FROM t WHERE t.fk = p.pk GROUP BY type) | Multiple rows per outer row |
Key Design Decision: Outer Alias Rewriting
The subquery body contains column references to the outer table (e.g., WHERE li.order_id = o.id). In the expansion CTE, the changed-sources CTE is aliased with the original outer table alias (e.g., lat_sq_changed AS o) so that the subquery's column references resolve naturally without rewriting.
Notes:
- The cost is proportional to the number of changed outer rows × average subquery result size, not the full table.
- The subquery is stored as raw SQL (like `LateralFunction`) because it cannot be independently differentiated — it depends on outer row context.
- Source table OIDs referenced by the subquery body are extracted at parse time for CDC trigger setup.
- ORDER BY + LIMIT inside the subquery are valid (they apply per-outer-row, not to the stream table).
Semi-Join (EXISTS / IN Subquery)
Module: src/dvm/operators/semi_join.rs
Handles WHERE EXISTS (SELECT ... FROM ...) and WHERE col IN (SELECT ...) patterns. The parser transforms these into a SemiJoin operator with a left (outer) child, a right (inner) child, and a join condition.
Delta Rule:
$$\Delta(L \ltimes R) = \Delta L|_{R} + L|_{\Delta R \text{ causes existence change}}$$
- Part 1: Outer rows that changed and still satisfy the semi-join condition.
- Part 2: Existing outer rows whose semi-join result flipped due to inner changes (a matching inner row was inserted or deleted).
Strategy (Two-Part Delta):
- Part 1 (outer delta): Filter `delta_left` to rows that have at least one match in the current right-hand snapshot.
- Part 2 (inner delta): For each row in the left snapshot, check whether the existence of matching right-hand rows changed between the old and current state. Emit `'I'` if a match appeared, `'D'` if all matches disappeared.
The "old" right-hand state is reconstructed from the current state by reversing the delta: R_old = (R_current EXCEPT ALL delta_right(action='I')) UNION ALL delta_right(action='D').
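That reversal has simple bag semantics: subtract the inserts, add back the deletes. A Python model using counters (illustrative — the generated SQL does the same with `EXCEPT ALL` / `UNION ALL`):

```python
from collections import Counter

def reconstruct_old(current, delta):
    """R_old = (R_current EXCEPT ALL inserts) UNION ALL deletes,
    expressed over multiset counts."""
    old = Counter(current)
    for action, row in delta:
        if action == 'I':
            old[row] -= 1   # undo a committed insert
        else:
            old[row] += 1   # restore a deleted row
    return +old             # unary + drops zero/negative counts

current = Counter({'x': 2, 'y': 1})
delta = [('I', 'x'), ('D', 'z')]
print(reconstruct_old(current, delta))
```

With the old and current right-hand states in hand, Part 2 only has to compare existence per left row.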
Row Identity:
- Part 1: Uses `__pgt_row_id` from the left delta.
- Part 2: Content-based hash via `pg_trickle_hash_multi` on left-side columns.
Supported Patterns:
| Pattern | SQL | Notes |
|---|---|---|
| `EXISTS` | `WHERE EXISTS (SELECT 1 FROM t WHERE t.fk = s.pk)` | Direct semi-join |
| `IN (subquery)` | `WHERE id IN (SELECT fk FROM t)` | Rewritten to EXISTS with equality |
| Multiple conditions | `WHERE EXISTS (... AND ...)` | Additional predicates in subquery WHERE |
Anti-Join (NOT EXISTS / NOT IN Subquery)
Module: src/dvm/operators/anti_join.rs
Handles WHERE NOT EXISTS (SELECT ... FROM ...) and WHERE col NOT IN (SELECT ...) patterns. The inverse of the semi-join operator.
Delta Rule:
$$\Delta(L \triangleright R) = \Delta L|_{\neg R} + L|_{\Delta R \text{ causes existence change}}$$
- Part 1: Outer rows that changed and have no match in the right-hand snapshot.
- Part 2: Existing outer rows whose anti-join result flipped due to inner changes.
Strategy (Two-Part Delta):
- Part 1 (outer delta): Filter `delta_left` to rows with `NOT EXISTS` in the current right snapshot.
- Part 2 (inner delta): For each row in the left snapshot, detect existence changes. Emit `'D'` if a match appeared (row no longer qualifies), `'I'` if all matches disappeared (row now qualifies).
Note the inverted semantics compared to semi-join: a new match means deletion, losing all matches means insertion.
Row Identity: Same as semi-join.
Supported Patterns:
| Pattern | SQL | Notes |
|---|---|---|
| `NOT EXISTS` | `WHERE NOT EXISTS (SELECT 1 FROM t WHERE t.fk = s.pk)` | Direct anti-join |
| `NOT IN (subquery)` | `WHERE id NOT IN (SELECT fk FROM t)` | Rewritten to NOT EXISTS with equality |
Scalar Subquery (Correlated SELECT Subquery)
Module: src/dvm/operators/scalar_subquery.rs
Handles scalar subqueries appearing in the SELECT list, e.g., SELECT a, (SELECT max(x) FROM t) AS mx FROM s. The subquery must return exactly one row and one column.
Delta Rule:
$$\Delta(L \times q) = \Delta L \times q' + L \times (q' - q)$$
Where $q$ is the scalar subquery value and $q'$ is the updated value.
Strategy (Two-Part Delta):
- Part 1 (outer delta): Propagate the child delta, appending the current scalar subquery value to each row.
- Part 2 (scalar value change): When the scalar subquery's result changes, emit deletes for all existing outer rows (with the old scalar value) and re-inserts for all outer rows (with the new value). The old scalar value is reconstructed by reversing the inner delta.
SQL Generation (3 or 4 CTEs):
-- Part 1: child delta + current scalar value
WITH sq_outer AS (
SELECT *, (<scalar_subquery>) AS "<alias>"
FROM <child_delta>
),
-- Part 2a: DELETE all outer rows when scalar changed
sq_del AS (
SELECT "__pgt_row_id", 'D' AS "__pgt_action", <cols>
FROM <st_table>
WHERE (<scalar_old>) IS DISTINCT FROM (<scalar_current>)
),
-- Part 2b: INSERT all outer rows with new scalar value
sq_ins AS (
SELECT pg_trickle_hash_multi(...) AS "__pgt_row_id",
'I' AS "__pgt_action", <cols>, (<scalar_current>) AS "<alias>"
FROM <source_snapshot>
WHERE (<scalar_old>) IS DISTINCT FROM (<scalar_current>)
)
-- Final: UNION ALL of all parts
SELECT * FROM sq_outer
UNION ALL SELECT * FROM sq_del
UNION ALL SELECT * FROM sq_ins
Row Identity:
- Part 1: `__pgt_row_id` from the child delta.
- Part 2: Content-based hash via `pg_trickle_hash_multi` on all output columns.
Notes:
- The scalar subquery is stored as raw SQL (deparsed from the parse tree).
- The old scalar value is approximated using the same `EXCEPT ALL` / `UNION ALL` reversal technique as semi/anti-join.
- If the scalar subquery references a table that changes, all outer rows must be re-evaluated — the delta can be large.
- Source OIDs used by the scalar subquery are captured at parse time for CDC trigger registration.
Operator Tree Construction
The DVM engine builds the operator tree by analyzing the parsed query:
- WITH clause → CTE definitions extracted into a name→body map (non-recursive) or CTE registry (multi-reference)
- FROM clause → `Scan` nodes for physical tables; `Subquery` nodes for inlined CTEs and subqueries in FROM; `CteScan` nodes for multi-reference CTEs; `LateralFunction` nodes for SRFs and JSON_TABLE in FROM; `LateralSubquery` nodes for correlated subqueries in FROM
- JOIN → `Join` or `OuterJoin` wrapping two sub-trees
- LATERAL SRFs → `LateralFunction` wrapping the left-hand FROM item as its child
- LATERAL subqueries → `LateralSubquery` wrapping the left-hand FROM item as its child (comma syntax or JOIN LATERAL)
- WHERE subqueries → `SemiJoin` for `EXISTS` / `IN (subquery)`, `AntiJoin` for `NOT EXISTS` / `NOT IN (subquery)`, extracted from the WHERE clause
- Scalar subqueries → `ScalarSubquery` for `(SELECT ...)` in the SELECT list, wrapping the child tree
- WHERE → `Filter` wrapping the scan/join tree (remaining non-subquery predicates)
- SELECT list → `Project` for column selection and expressions
- GROUP BY → `Aggregate` wrapping the filtered/projected tree
- DISTINCT → `Distinct` on top
- UNION ALL → `UnionAll` combining two complete sub-trees
- INTERSECT / EXCEPT → `Intersect` or `Except` combining two sub-trees with dual-count tracking
- Window functions → `Window` wrapping the sub-tree with PARTITION BY / ORDER BY metadata
- ORDER BY → silently discarded (storage row order is undefined)
- LIMIT / OFFSET → `ORDER BY + LIMIT [+ OFFSET]` is accepted as TopK (scoped recomputation); standalone `LIMIT` or `OFFSET` without `ORDER BY` is rejected
For recursive CTEs (WITH RECURSIVE), the query is parsed into an OpTree with RecursiveCte operator nodes. In DIFFERENTIAL mode, the strategy (semi-naive, DRed, or recomputation) is selected automatically based on column compatibility and change type — see the Recursive CTEs section above for details.
The tree is then traversed bottom-up during delta generation: each operator's generate_delta_sql() method composes its SQL fragment around the output of its child operator(s).
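A Python mock-up of this bottom-up composition (the real operators are Rust structs, and the change-buffer table is keyed by the source table's OID — a table name stands in here for readability):

```python
class Scan:
    """Leaf operator: its delta reads from the per-table change buffer,
    never from the full source table."""
    def __init__(self, table):
        self.table = table

    def generate_delta_sql(self):
        return f"SELECT * FROM pgtrickle_changes.changes_{self.table}"

class Filter:
    """Filter operator: the delta rule is Δ(σ_p(R)) = σ_p(ΔR), so it just
    wraps the child's delta SQL in its own predicate."""
    def __init__(self, child, predicate):
        self.child, self.predicate = child, predicate

    def generate_delta_sql(self):
        return (f"SELECT * FROM ({self.child.generate_delta_sql()}) d "
                f"WHERE {self.predicate}")

tree = Filter(Scan("orders"), "status = 'active'")
print(tree.generate_delta_sql())
# SELECT * FROM (SELECT * FROM pgtrickle_changes.changes_orders) d WHERE status = 'active'
```

Each operator only needs to know how to wrap its child's output, which is what keeps the per-operator rules in this document composable.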
Further Reading
- ARCHITECTURE.md — System-wide component overview
- SQL_REFERENCE.md — Complete function reference
- CONFIGURATION.md — GUC tuning guide
DVM SQL Rewrite Rules
This document describes the transformation pipeline in
src/dvm/parser/rewrites.rs that prepares a defining query for
differentiation by the DVM (Differential View Maintenance) engine.
Each rewrite pass targets a specific SQL pattern, transforms it into a form the DVM engine can differentiate, and has a formal algebraic correctness argument.
Rewrite Pipeline Order
The rewrite passes are applied in sequence. Each pass may be iterated until a fixed point (no further changes) is reached.
- View Inlining — Replace view references with their definitions
- Grouping Sets Expansion — Expand CUBE/ROLLUP into UNION ALL
- EXISTS → Anti/Semi-Join — Convert correlated EXISTS to join operators
- Scalar Sublink Hoisting — Lift scalar subqueries to CTEs
- Delta Key Restriction — Push join key filters into R_old snapshots
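The pipeline's control flow — apply a pass, repeat it until nothing changes, then move to the next — can be sketched in a few lines of Python. The passes here are toy string rewrites for illustration; the real passes transform parse trees:

```python
def run_pipeline(sql, passes):
    """Apply each rewrite pass repeatedly until it reaches a fixed point
    (no further changes), then advance to the next pass in order."""
    for rewrite in passes:
        while True:
            rewritten = rewrite(sql)
            if rewritten == sql:
                break  # fixed point reached for this pass
            sql = rewritten
    return sql

# Toy pass: strip one redundant double negation per application.
strip_not_not = lambda q: q.replace("NOT NOT ", "", 1)
print(run_pipeline("SELECT * FROM t WHERE NOT NOT NOT NOT flag",
                   [strip_not_not]))
# SELECT * FROM t WHERE flag
```

Iterating to a fixed point matters because one pass's output (e.g., an inlined view body) can expose new instances of the same pattern.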
1. View Inlining (rewrite_views_inline)
Input Pattern: SELECT ... FROM my_view v WHERE ...
Transformation: Replace my_view with its pg_get_viewdef() body
as a subquery: SELECT ... FROM (SELECT ... FROM base_tables) v WHERE ...
Correctness: A view is semantically equivalent to its definition. Inlining is required because the DVM engine needs to see the base tables to generate per-table change buffer references.
Before:
-- Defining query referencing a view
SELECT o.customer_id, SUM(o.amount) AS total
FROM order_summary_view o
GROUP BY o.customer_id
After:
-- View inlined; base tables are now visible for CDC binding
SELECT o.customer_id, SUM(o.amount) AS total
FROM (
SELECT orders.customer_id,
orders.amount,
orders.created_at
FROM public.orders
WHERE orders.status = 'completed'
) o
GROUP BY o.customer_id
The inlined form allows the DVM engine to bind orders as the CDC source
and generate delta SQL that reads from pgtrickle_changes.changes_<orders_oid>
instead of the whole table.
2. Grouping Sets Expansion (rewrite_grouping_sets)
Input Pattern: SELECT ... GROUP BY CUBE(a, b) or GROUP BY ROLLUP(a, b)
Transformation: Expand into a UNION ALL of individual GROUP BY
combinations. CUBE(a, b) → GROUP BY (a, b) UNION ALL GROUP BY (a)
UNION ALL GROUP BY (b) UNION ALL GROUP BY ().
Correctness: CUBE/ROLLUP is algebraically equivalent to the union of all grouping combinations. The DVM engine differentiates each branch independently, and the UNION ALL operator merges the deltas.
Guard: pg_trickle.max_grouping_set_branches (default 64) limits
explosion for high-dimensional CUBE expressions.
Before:
-- ROLLUP over region + product_type
SELECT region, product_type, SUM(revenue) AS total
FROM sales
GROUP BY ROLLUP(region, product_type)
After:
-- Expanded to three GROUP BY branches
SELECT region, product_type, SUM(revenue) AS total
FROM sales
GROUP BY region, product_type
UNION ALL
SELECT region, NULL AS product_type, SUM(revenue) AS total
FROM sales
GROUP BY region
UNION ALL
SELECT NULL AS region, NULL AS product_type, SUM(revenue) AS total
FROM sales
Each branch is an independent leaf node in the OpTree. The DVM engine
differentiates each branch by computing delta rows from the change buffer,
then merges the results via the UNION ALL parent node.
3. EXISTS → Anti/Semi-Join Conversion
Input Pattern:
SELECT ... FROM t1 WHERE EXISTS (SELECT 1 FROM t2 WHERE t2.key = t1.key)
SELECT ... FROM t1 WHERE NOT EXISTS (SELECT 1 FROM t2 WHERE t2.key = t1.key)
Transformation: Convert to OpTree::SemiJoin or OpTree::AntiJoin
with the extracted condition as the join predicate.
Correctness: EXISTS (correlated subquery) is equivalent to a
semi-join; NOT EXISTS is equivalent to an anti-join. The DVM engine
has specialized delta operators for both.
4. Scalar Sublink Hoisting (rewrite_scalar_subqueries)
Input Pattern: Scalar subqueries in SELECT or WHERE:
SELECT a, (SELECT max(b) FROM t2 WHERE t2.key = t1.key) FROM t1
Transformation: Hoist the scalar subquery to a CTE and replace with a reference:
WITH __pgt_scalar_1 AS (SELECT key, max(b) AS val FROM t2 GROUP BY key)
SELECT a, s.val FROM t1 LEFT JOIN __pgt_scalar_1 s ON s.key = t1.key
Correctness: A correlated scalar subquery is equivalent to a left join to its grouped equivalent. The CTE form allows the DVM engine to differentiate the subquery as a separate operator node.
5. Delta Key Restriction (DI-6)
Input Pattern: Anti-join / semi-join R_old snapshots that scan the full right table.
Transformation: Push equi-join key filters from the delta into the R_old snapshot to restrict it to only the changed keys.
Correctness: Only right-side rows matching changed keys can affect the anti/semi-join output. Restricting R_old to changed keys preserves correctness while reducing the scan from O(n) to O(Δ).
Before:
-- Anti-join delta: which left rows lost their right-side match?
-- R_old scans ALL of the right table (O(n))
SELECT l.*
FROM left_table l
WHERE NOT EXISTS (
SELECT 1 FROM right_table r_old WHERE r_old.key = l.key
)
AND EXISTS (
SELECT 1 FROM delta_right d WHERE d.key = l.key
)
After:
-- R_old restricted to only rows matching changed keys (O(Δ))
SELECT l.*
FROM left_table l
WHERE NOT EXISTS (
SELECT 1 FROM right_table r_old
WHERE r_old.key = l.key
AND r_old.key IN (SELECT key FROM delta_right) -- <-- restriction added
)
AND EXISTS (
SELECT 1 FROM delta_right d WHERE d.key = l.key
)
This rewrite is critical for join-heavy queries: without it, every anti-join delta scan reads the full right table regardless of how many rows actually changed.
Adding New Rewrite Passes
To add a new rewrite pass:
- Add the function in `src/dvm/parser/rewrites.rs`
- Add unit tests asserting the expected SQL output for a reference input
- Insert the pass at the correct position in the pipeline
- Document the pass in this file with input pattern, transformation, and correctness argument
See Also
- docs/DVM_OPERATORS.md — Per-operator differentiation rules
- docs/PERFORMANCE_COOKBOOK.md — Performance tuning
- src/dvm/parser/rewrites.rs — Implementation
pg_trickle — Benchmark Guide
This document explains how the database-level refresh benchmarks work and how to interpret their output.
Overview
The benchmark suite in tests/e2e_bench_tests.rs measures wall-clock refresh time for FULL vs DIFFERENTIAL mode across a matrix of table sizes, change rates, and query complexities. Each benchmark spawns an isolated PostgreSQL 18.x container via Testcontainers, ensuring reproducible and interference-free measurements.
The core question the benchmarks answer:
How much faster is a DIFFERENTIAL refresh compared to a FULL refresh, given a specific workload?
Prerequisites
Build the E2E test Docker image before running any benchmarks:
./tests/build_e2e_image.sh
Docker must be running on the host.
Running Benchmarks
All benchmark tests are tagged #[ignore] so they are skipped during normal CI. The --nocapture flag is required to see the printed output tables.
Quick Spot Checks (~5–10 seconds each)
# Simple scan, 10K rows, 1% change rate
cargo test --test e2e_bench_tests --features pg18 -- --ignored --nocapture bench_scan_10k_1pct
# Aggregate query, 100K rows, 1% change rate
cargo test --test e2e_bench_tests --features pg18 -- --ignored --nocapture bench_aggregate_100k_1pct
# Join + aggregate, 100K rows, 10% change rate
cargo test --test e2e_bench_tests --features pg18 -- --ignored --nocapture bench_join_agg_100k_10pct
Zero-Change Latency (~5 seconds)
cargo test --test e2e_bench_tests --features pg18 -- --ignored --nocapture bench_no_data_refresh_latency
Full Matrix (~15–30 minutes)
Runs all 30 combinations and prints a consolidated summary:
cargo test --test e2e_bench_tests --features pg18 -- --ignored --nocapture bench_full_matrix
Run All Benchmarks in Parallel
cargo test --test e2e_bench_tests --features pg18 -- --ignored --nocapture
Note: each test starts its own container, so parallel execution requires sufficient Docker resources.
Benchmark Dimensions
Table Sizes
| Size | Rows | Purpose |
|---|---|---|
| Small | 10,000 | Fast iteration; measures per-row overhead |
| Medium | 100,000 | More realistic; reveals scaling characteristics |
Change Rates
| Rate | Description |
|---|---|
| 1% | Low churn — the sweet spot for incremental refresh |
| 10% | Moderate churn — tests delta query scalability |
| 50% | High churn — stress test; approaches full-refresh cost |
Query Complexities
| Scenario | Defining Query | Operators Tested |
|---|---|---|
| scan | SELECT id, region, category, amount, score FROM src | Table scan only |
| filter | SELECT id, region, amount FROM src WHERE amount > 5000 | Scan + filter (WHERE) |
| aggregate | SELECT region, SUM(amount), COUNT(*) FROM src GROUP BY region | Scan + group-by aggregate |
| join | SELECT s.id, s.region, s.amount, d.region_name FROM src s JOIN dim d ON ... | Scan + inner join |
| join_agg | SELECT d.region_name, SUM(s.amount), COUNT(*) FROM src s JOIN dim d ON ... GROUP BY ... | Scan + join + aggregate |
DML Mix per Cycle
Each change cycle applies a realistic mix of operations:
| Operation | Fraction | Example at 10K rows, 10% rate |
|---|---|---|
| UPDATE | 70% | 700 rows have amount incremented |
| DELETE | 15% | 150 rows removed |
| INSERT | 15% | 150 new rows added |
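Concretely, one change cycle at 10K rows and a 10% rate touches 1,000 rows: 700 updates, 150 deletes, 150 inserts. A minimal SQL sketch of such a cycle is shown below; the column names come from the scenario table above, but the exact row-selection logic the harness uses is an assumption here, not the actual test code.

```sql
-- Illustrative DML cycle (10K rows, 10% rate). Assumes src has a
-- serial id column; the harness's real selection logic may differ.
UPDATE src SET amount = amount + 1
WHERE id IN (SELECT id FROM src ORDER BY random() LIMIT 700);

DELETE FROM src
WHERE id IN (SELECT id FROM src ORDER BY random() LIMIT 150);

INSERT INTO src (region, category, amount, score)
SELECT 'r' || (g % 5), 'c' || (g % 10), (random() * 10000)::int, random()
FROM generate_series(1, 150) AS g;
```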
What Each Benchmark Does
1. Start a fresh PostgreSQL 18.x container
2. Install the pg_trickle extension
3. Create and populate the source table (10K or 100K rows)
4. Create dimension table if needed (for join scenarios)
5. ANALYZE for stable query plans
── FULL mode ──
6. Create a Stream Table in FULL refresh mode
7. For each of 3 cycles:
a. Apply random DML (updates + deletes + inserts)
b. ANALYZE
c. Time the FULL refresh (TRUNCATE + re-execute entire query)
d. Record refresh_ms and ST row count
8. Drop the FULL-mode ST
── DIFFERENTIAL mode ──
9. Reset source table to same starting state
10. Create a Stream Table in DIFFERENTIAL refresh mode
11. For each of 3 cycles:
a. Apply random DML (same parameters)
b. ANALYZE
c. Time the DIFFERENTIAL refresh (delta query + MERGE)
d. Record refresh_ms and ST row count
12. Print results table and summary
Both modes start from the same data to ensure a fair comparison. The 3-cycle design captures warm-up effects (cycle 1 may be slower due to plan caching).
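Conceptually, the two timed operations (steps 7c and 11c) differ as sketched below. This is an illustration only — the real SQL is generated by the extension and carries internal bookkeeping columns; the table and column names here are illustrative.

```sql
-- FULL refresh: discard and recompute everything in one transaction
TRUNCATE st_agg;
INSERT INTO st_agg
SELECT region, SUM(amount) AS sum_amount, COUNT(*) AS cnt
FROM src GROUP BY region;

-- DIFFERENTIAL refresh: per-group signed deltas applied via MERGE.
-- "delta" stands in for the generated delta query over changed rows.
MERGE INTO st_agg t
USING delta d ON t.region = d.region
WHEN MATCHED THEN
  UPDATE SET sum_amount = t.sum_amount + d.sum_delta,
             cnt        = t.cnt + d.cnt_delta
WHEN NOT MATCHED THEN
  INSERT (region, sum_amount, cnt)
  VALUES (d.region, d.sum_delta, d.cnt_delta);
```

The FULL cost scales with the table; the MERGE cost scales with the number of affected groups.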
Reading the Output
Detail Table
╔══════════════════════════════════════════════════════════════════════════════════════╗
║                         pg_trickle Refresh Benchmark Results                          ║
╠════════════╤══════════╤════════╤══════════════╤═══════╤════════════╤═════════════════╣
║ Scenario   │     Rows │  Chg % │ Mode         │ Cycle │ Refresh ms │         ST Rows ║
╠════════════╪══════════╪════════╪══════════════╪═══════╪════════════╪═════════════════╣
║ aggregate  │    10000 │     1% │ FULL         │     1 │       22.1 │               5 ║
║ aggregate  │    10000 │     1% │ FULL         │     2 │        4.8 │               5 ║
║ aggregate  │    10000 │     1% │ FULL         │     3 │        5.3 │               5 ║
║ aggregate  │    10000 │     1% │ DIFFERENTIAL │     1 │        8.4 │               5 ║
║ aggregate  │    10000 │     1% │ DIFFERENTIAL │     2 │        4.4 │               5 ║
║ aggregate  │    10000 │     1% │ DIFFERENTIAL │     3 │        4.6 │               5 ║
╚════════════╧══════════╧════════╧══════════════╧═══════╧════════════╧═════════════════╝
| Column | Meaning |
|---|---|
| Scenario | Query complexity level (scan, filter, aggregate, join, join_agg) |
| Rows | Number of rows in the base table |
| Chg % | Percentage of rows changed per cycle |
| Mode | FULL (truncate + recompute) or DIFFERENTIAL (delta + merge) |
| Cycle | Which of the 3 measurement rounds (cycle 1 often includes warm-up) |
| Refresh ms | Wall-clock time for the refresh operation |
| ST Rows | Row count in the Stream Table after refresh (sanity check) |
Summary Table
┌─────────────────────────────────────────────────────────────────────────┐
│ Summary (avg ms per cycle) │
├────────────┬──────────┬────────┬─────────────────┬──────────────────────┤
│ Scenario │ Rows │ Chg % │ FULL avg ms │ DIFFERENTIAL avg ms │
├────────────┼──────────┼────────┼─────────────────┼──────────────────────┤
│ aggregate │ 10000 │ 1% │ 10.7 │ 5.8 ( 1.8x) │
└────────────┴──────────┴────────┴─────────────────┴──────────────────────┘
The Speedup value in parentheses is FULL avg / DIFFERENTIAL avg — how many times faster the incremental refresh is compared to a full refresh. For the sample row above: 10.7 / 5.8 ≈ 1.8×.
Interpreting the Speedup
What to Expect
| Change Rate | Table Size | Expected Speedup | Explanation |
|---|---|---|---|
| 1% | 10K | 1.5–5x | Small table; overhead is similar, delta is tiny |
| 1% | 100K | 5–50x | Larger table amplifies full-refresh cost |
| 10% | 100K | 2–10x | Moderate delta; still significantly faster |
| 50% | any | 1–2x | Delta is nearly as large as full table |
Rules of Thumb
| Speedup | Interpretation |
|---|---|
| > 10x | Strong win for DIFFERENTIAL — typical at low change rates on larger tables |
| 5–10x | Clear advantage for DIFFERENTIAL |
| 2–5x | Moderate advantage — DIFFERENTIAL is the right choice |
| 1–2x | Marginal gain — either mode is acceptable |
| ~1x | Break-even — change rate is too high for incremental to help |
| < 1x | DIFFERENTIAL is slower — would indicate overhead exceeds savings (investigate) |
Key Patterns to Look For
- Scaling with table size: For the same change rate, speedup should increase with table size. FULL must re-process all rows; DIFFERENTIAL processes only the delta.
- Degradation with change rate: As the change rate rises from 1% → 50%, speedup should decrease. At 50%, DIFFERENTIAL processes half the table, which approaches FULL cost.
- Query complexity amplifies speedup: Aggregate and join queries benefit more from DIFFERENTIAL because they avoid expensive re-computation. A join_agg at 1% changes should show a higher speedup than a simple scan at the same parameters.
- Cycle 1 warm-up: The first cycle in each mode may be slower due to PostgreSQL plan cache population. Use cycles 2–3 for the steadiest numbers.
- ST Rows consistency: The ST row count should be similar between FULL and DIFFERENTIAL for the same scenario (accounting for random DML). Large discrepancies indicate a correctness issue.
Zero-Change Latency
The bench_no_data_refresh_latency test measures the overhead of a refresh when no data has changed — the NO_DATA code path.
┌──────────────────────────────────────────────┐
│ NO_DATA Refresh Latency (10 iterations) │
├──────────────────────────────────────────────┤
│ Avg: 3.21 ms │
│ Max: 5.10 ms │
│ Target: < 10 ms │
│ Status: ✅ PASS │
└──────────────────────────────────────────────┘
| Metric | Meaning |
|---|---|
| Avg | Average wall-clock time across 10 no-op refreshes |
| Max | Worst-case single iteration |
| Target | The PLAN.md goal: < 10 ms per no-op refresh |
| Status | PASS if avg < 10 ms, SLOW otherwise |
A passing result confirms the scheduler's per-cycle overhead is negligible. Values > 10 ms in containerized environments may be acceptable due to Docker overhead; bare-metal PostgreSQL should comfortably meet the target.
Available Tests
Individual Tests (10K rows)
| Test Name | Scenario | Change Rate |
|---|---|---|
| bench_scan_10k_1pct | scan | 1% |
| bench_scan_10k_10pct | scan | 10% |
| bench_scan_10k_50pct | scan | 50% |
| bench_filter_10k_1pct | filter | 1% |
| bench_aggregate_10k_1pct | aggregate | 1% |
| bench_join_10k_1pct | join | 1% |
| bench_join_agg_10k_1pct | join_agg | 1% |
Individual Tests (100K rows)
| Test Name | Scenario | Change Rate |
|---|---|---|
| bench_scan_100k_1pct | scan | 1% |
| bench_scan_100k_10pct | scan | 10% |
| bench_scan_100k_50pct | scan | 50% |
| bench_aggregate_100k_1pct | aggregate | 1% |
| bench_aggregate_100k_10pct | aggregate | 10% |
| bench_join_agg_100k_1pct | join_agg | 1% |
| bench_join_agg_100k_10pct | join_agg | 10% |
Special Tests
| Test Name | Description |
|---|---|
| bench_full_matrix | All 30 combinations (5 queries × 2 sizes × 3 rates) |
| bench_no_data_refresh_latency | Zero-change overhead (10 iterations) |
Nexmark Streaming Benchmark
The Nexmark benchmark validates correctness against a sustained high-frequency DML workload modelling an online auction system. It is adapted from the Nexmark benchmark specification used by streaming systems like Flink, Feldera, and Materialize.
Data Model
| Table | Description | Default Size |
|---|---|---|
| person | Registered users (sellers/bidders) | 100 rows |
| auction | Items listed for sale | 500 rows |
| bid | Bids placed on auctions | 2,000 rows |
Queries
| Query | Features | Description |
|---|---|---|
| Q0 | Passthrough | Identity projection of all bids |
| Q1 | Projection + arithmetic | Currency conversion |
| Q2 | Filter | Bids on specific auctions |
| Q3 | JOIN + filter | Local item suggestion (person-auction join) |
| Q4 | JOIN + GROUP BY + AVG | Average selling price by category |
| Q5 | GROUP BY + COUNT | Hot items (bid count per auction) |
| Q6 | JOIN + GROUP BY + AVG | Average bid price per seller |
| Q7 | Aggregate (MAX) | Highest bid price |
| Q8 | JOIN | Person-auction join (new users monitoring) |
| Q9 | JOIN + DISTINCT ON | Winning bid per auction with bidder info |
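As a flavor of what these queries look like, Q9's "winning bid per auction" shape can be expressed with DISTINCT ON roughly as follows. This is a hypothetical sketch — the column names in the actual test schema may differ.

```sql
-- Hypothetical shape of Nexmark Q9: highest bid per auction,
-- earliest bid winning ties. Column names are illustrative.
SELECT DISTINCT ON (b.auction) b.auction, b.bidder, b.price, b.date_time
FROM bid b
ORDER BY b.auction, b.price DESC, b.date_time ASC;
```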
Running Nexmark Tests
# Default scale (100 persons, 500 auctions, 2000 bids, 3 cycles)
cargo test --test e2e_nexmark_tests -- --ignored --test-threads=1 --nocapture
# Larger scale
NEXMARK_PERSONS=1000 NEXMARK_AUCTIONS=5000 NEXMARK_BIDS=50000 NEXMARK_CYCLES=5 \
cargo test --test e2e_nexmark_tests -- --ignored --test-threads=1 --nocapture
What Each Cycle Does
Each refresh cycle applies three mutation functions (RF1-RF3) then refreshes all stream tables and asserts multiset equality:
- RF1 (INSERT): New persons, auctions, and bids
- RF2 (DELETE): Remove oldest bids, orphaned auctions, orphaned persons
- RF3 (UPDATE): Price changes, reserve adjustments, city moves
- Refresh + Assert: Differential refresh → EXCEPT ALL correctness check
Correctness Validation
The test uses the same DBSP invariant as TPC-H: after every differential
refresh, the stream table must be multiset-equal to re-executing the
defining query from scratch (symmetric EXCEPT ALL). Additionally, negative
__pgt_count values (over-retraction bugs) are detected.
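Using the active_orders example from the introduction, the invariant amounts to the following symmetric check returning zero rows — every row surplus in either direction (including duplicate counts) is a failure:

```sql
-- Symmetric EXCEPT ALL: empty result iff the stream table is
-- multiset-equal to re-executing its defining query.
(SELECT * FROM active_orders
 EXCEPT ALL
 SELECT * FROM orders WHERE status = 'active')
UNION ALL
(SELECT * FROM orders WHERE status = 'active'
 EXCEPT ALL
 SELECT * FROM active_orders);
```

EXCEPT ALL (rather than EXCEPT) is what makes this a multiset comparison: a row present twice in one relation but once in the other survives the difference.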
DAG Topology Benchmarks
The DAG topology benchmark suite in tests/e2e_dag_bench_tests.rs measures end-to-end propagation latency and throughput through multi-level DAG topologies. While the single-ST benchmarks above measure per-operator refresh speed, these benchmarks measure how efficiently changes propagate through chains, fan-outs, diamonds, and mixed topologies with 5–100+ stream tables.
The core questions these benchmarks answer:
How long does it take for a source-table INSERT to propagate through an entire DAG to the leaf stream tables?
How does PARALLEL refresh mode compare to CALCULATED mode across different topology shapes?
Running DAG Benchmarks
# Full suite (rebuilds Docker image)
just test-dag-bench
# Skip Docker image rebuild
just test-dag-bench-fast
# Individual topology tests
cargo test --test e2e_dag_bench_tests --features pg18 -- --ignored bench_latency_linear_5 --test-threads=1 --nocapture
cargo test --test e2e_dag_bench_tests --features pg18 -- --ignored bench_throughput_diamond --test-threads=1 --nocapture
Topology Patterns
| Topology | Shape | Description |
|---|---|---|
| Linear Chain | src → st_1 → st_2 → ... → st_N | Sequential pipeline; L1 aggregate, L2+ alternating project/filter |
| Wide DAG | src → [W parallel chains × D deep] | W independent chains of depth D from a shared source; tests parallel refresh mode |
| Fan-Out Tree | src → root → [b children] → [b² grandchildren] → ... | Exponential fan-out; each parent spawns b children with filter/project variants |
| Diamond | src → [fan-out aggregates] → JOIN → [extension] | Fan-out to independent aggregates (SUM/COUNT/MAX/MIN/AVG) then converge via JOIN |
| Mixed | Two sources, 4 layers, ~15 STs | Realistic e-commerce scenario with chains, fan-out, cross-source joins, and alerts |
Measurement Modes
Latency benchmarks (auto-refresh): The scheduler is enabled with a 200 ms interval. The test INSERTs into the source table and polls pgt_refresh_history until the leaf stream table has a new COMPLETED entry. This measures the full propagation latency including scheduler overhead.
Throughput benchmarks (manual refresh): The scheduler is disabled. The test applies mixed DML (70% UPDATE, 15% DELETE, 15% INSERT) then manually refreshes all STs in topological order. This isolates pure refresh cost from scheduler overhead.
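A latency probe of roughly this shape illustrates the polling step. The pgt_refresh_history table name comes from the text above, but the schema qualification and column names here are guesses for illustration — consult the actual catalog for the real schema.

```sql
-- Poll until the leaf ST records a COMPLETED refresh newer than the
-- INSERT's timestamp. Column names are illustrative, not authoritative.
SELECT max(finished_at)
FROM pgtrickle.pgt_refresh_history
WHERE st_name = 'st_leaf'
  AND status  = 'COMPLETED';
```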
Theoretical Comparison
Each latency benchmark computes the theoretical prediction from PLAN_DAG_PERFORMANCE.md and reports the delta:
| Mode | Formula |
|---|---|
| CALCULATED | L = I_s + N × T_r |
| PARALLEL(C) | L = Σ ⌈W_l / C⌉ × max(I_p, T_r) per level |
Where T_r is the measured average per-ST refresh time, I_s = 200 ms (the scheduler interval), C is the concurrency limit, W_l is the number of stream tables at level l, and I_p is the parallel mode's poll interval (see PLAN_DAG_PERFORMANCE.md for the full derivation).
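As a worked example, a 10-level linear chain in CALCULATED mode with a measured T_r of 50 ms predicts:

```
L = I_s + N × T_r
  = 200 ms + 10 × 50 ms
  = 700 ms
```

This is the theory_ms=700.0 value shown in the sample [DAG_BENCH] output line in the next section.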
Reading the Output
Per-Cycle Machine-Parseable Lines (stderr)
[DAG_BENCH] topology=linear_chain mode=CALCULATED sts=10 depth=10 width=1 cycle=1 actual_ms=820.3 theory_ms=700.0 overhead_pct=17.2 per_hop_ms=82.0
ASCII Summary Table (stdout)
╔══════════════════════════════════════════════════════════════════════════════════════════════════════╗
║ pg_trickle DAG Topology Benchmark Results ║
╠═══════════════╤═══════════════╤══════╤═══════╤═══════╤════════════╤════════════╤═══════════════════╣
║ Topology │ Mode │ STs │ Depth │ Width │ Actual ms │ Theory ms │ Overhead ║
╠═══════════════╪═══════════════╪══════╪═══════╪═══════╪════════════╪════════════╪═══════════════════╣
║ linear_chain │ CALCULATED │ 10 │ 10 │ 1 │ 820.3 │ 700.0 │ +17.2% ║
║ wide_dag │ PARALLEL_C8 │ 60 │ 3 │ 20 │ 2430.1 │ 1800.0 │ +35.0% ║
╚═══════════════╧═══════════════╧══════╧═══════╧═══════╧════════════╧════════════╧═══════════════════╝
Per-Level Breakdown
Per-Level Breakdown (linear_chain D=10, CALCULATED):
Level 1: avg 52.3ms [st_lc_1]
Level 2: avg 48.7ms [st_lc_2]
...
Level 10: avg 51.2ms [st_lc_10]
Total: 513.5ms (scheduler overhead: 306.8ms)
JSON Export
Results are written to target/dag_bench_results/<timestamp>.json (overridable via PGS_DAG_BENCH_JSON_DIR env var) for cross-run comparison.
Available DAG Benchmark Tests
Latency Tests (Auto-Refresh)
| Test Name | Topology | Mode | STs |
|---|---|---|---|
| bench_latency_linear_5_calc | Linear, D=5 | CALCULATED | 5 |
| bench_latency_linear_10_calc | Linear, D=10 | CALCULATED | 10 |
| bench_latency_linear_20_calc | Linear, D=20 | CALCULATED | 20 |
| bench_latency_linear_10_par4 | Linear, D=10 | PARALLEL(4) | 10 |
| bench_latency_wide_3x20_calc | Wide, D=3 W=20 | CALCULATED | 60 |
| bench_latency_wide_3x20_par4 | Wide, D=3 W=20 | PARALLEL(4) | 60 |
| bench_latency_wide_3x20_par8 | Wide, D=3 W=20 | PARALLEL(8) | 60 |
| bench_latency_wide_5x20_calc | Wide, D=5 W=20 | CALCULATED | 100 |
| bench_latency_wide_5x20_par8 | Wide, D=5 W=20 | PARALLEL(8) | 100 |
| bench_latency_fanout_b2d5_calc | Fan-out, b=2 d=5 | CALCULATED | 31 |
| bench_latency_fanout_b2d5_par8 | Fan-out, b=2 d=5 | PARALLEL(8) | 31 |
| bench_latency_diamond_4_calc | Diamond, fan=4 | CALCULATED | 5 |
| bench_latency_mixed_calc | Mixed, ~15 STs | CALCULATED | ~15 |
| bench_latency_mixed_par8 | Mixed, ~15 STs | PARALLEL(8) | ~15 |
Throughput Tests (Manual Refresh)
| Test Name | Topology | STs | Delta Sizes |
|---|---|---|---|
| bench_throughput_linear_5 | Linear, D=5 | 5 | 10, 100, 1000 |
| bench_throughput_linear_10 | Linear, D=10 | 10 | 10, 100, 1000 |
| bench_throughput_linear_20 | Linear, D=20 | 20 | 10, 100, 1000 |
| bench_throughput_wide_3x20 | Wide, D=3 W=20 | 60 | 10, 100, 1000 |
| bench_throughput_fanout_b2d5 | Fan-out, b=2 d=5 | 31 | 10, 100, 1000 |
| bench_throughput_diamond_4 | Diamond, fan=4 | 5 | 10, 100, 1000 |
| bench_throughput_mixed | Mixed, ~15 STs | ~15 | 10, 100, 1000 |
What to Look For
- Linear chain: CALCULATED faster than PARALLEL. For width=1 DAGs, PARALLEL adds poll overhead without any parallelism benefit, so CALCULATED should win.
- Wide DAG: PARALLEL(C=8) speedup over CALCULATED. For width ≥ 20, PARALLEL should show a measurable improvement — it refreshes up to C STs concurrently per level instead of sequentially.
- Overhead < 100%: The gap between theoretical and actual latency should stay below 100% across all topologies — the formulas should be in the right ballpark.
- DIFFERENTIAL action in per-ST breakdown: ST-on-ST hops should show DIFFERENTIAL rather than FULL, confirming differential propagation is working.
- Throughput scaling with delta size: Smaller deltas (10 rows) should yield lower per-cycle wall-clock time than larger deltas (1000 rows).
In-Process Micro-Benchmarks (Criterion.rs)
In addition to the E2E database benchmarks, the project includes two Criterion.rs benchmark suites that measure pure Rust computation time without database overhead. These are useful for tracking performance regressions in the internal query-building and IVM differentiation logic.
Benchmark Suites
refresh_bench — Utility Functions
benches/refresh_bench.rs benchmarks the low-level helper functions used during refresh operations:
| Benchmark Group | What It Measures |
|---|---|
| quote_ident | PostgreSQL identifier quoting speed |
| col_list | Column list SQL generation |
| prefixed_col_list | Prefixed column list generation (e.g., NEW.col) |
| expr_to_sql | AST expression → SQL string conversion |
| output_columns | Output column extraction from parsed queries |
| source_oids | Source table OID resolution |
| lsn_gt | LSN comparison expression generation |
| frontier_json | Frontier state JSON serialization |
| canonical_period | Interval parsing and canonicalization |
| dag_operations | DAG topological sort and cycle detection |
| xxh64 | xxHash-64 hashing throughput |
diff_operators — IVM Operator Differentiation
benches/diff_operators.rs benchmarks the delta SQL generation for every IVM operator. Each benchmark creates a realistic operator tree and measures differentiate() throughput:
| Benchmark Group | What It Measures |
|---|---|
| diff_scan | Table scan differentiation (3, 10, 20 columns) |
| diff_filter | Filter (WHERE) differentiation |
| diff_project | Projection (SELECT subset) differentiation |
| diff_aggregate | GROUP BY aggregate differentiation (simple + complex) |
| diff_inner_join | Inner join differentiation |
| diff_left_join | Left outer join differentiation |
| diff_distinct | DISTINCT differentiation |
| diff_union_all | UNION ALL differentiation (2, 5, 10 children) |
| diff_window | Window function differentiation |
| diff_join_aggregate | Composite join + aggregate pipeline |
| differentiate_full | Full differentiate() call for scan-only and filter+scan trees |
Running Micro-Benchmarks
# Run all Criterion benchmarks
just bench
# Run only refresh utility benchmarks
cargo bench --bench refresh_bench --features pg18
# Run only IVM diff operator benchmarks
just bench-diff
# or equivalently:
cargo bench --bench diff_operators --features pg18
# Output in Bencher-compatible format (for CI integration)
just bench-bencher
Output and Reports
Criterion produces statistical analysis for each benchmark including:
- Mean and standard deviation of execution time
- Throughput (iterations/sec)
- Comparison with previous run — reports improvements/regressions with confidence intervals
HTML reports are generated in target/criterion/ with interactive charts showing distributions and regression history. Open target/criterion/report/index.html to browse all results.
Sample output:
diff_scan/3_columns time: [11.834 µs 12.074 µs 12.329 µs]
diff_scan/10_columns time: [16.203 µs 16.525 µs 16.869 µs]
diff_aggregate/simple time: [21.447 µs 21.862 µs 22.301 µs]
diff_inner_join time: [25.919 µs 26.421 µs 26.952 µs]
Continuous Benchmarking with Bencher
Bencher provides continuous benchmark tracking in CI, detecting performance regressions on pull requests before they merge.
How It Works
The .github/workflows/benchmarks.yml workflow:
- On main pushes — runs both Criterion suites and uploads results to Bencher as the baseline. This establishes the expected performance for each benchmark.
- On pull requests — runs the same benchmarks and compares against the main baseline using a Student's t-test with a 99% upper confidence boundary. If any benchmark regresses beyond the threshold, the PR check fails.
Setup
To enable Bencher for your fork or deployment:
- Create a Bencher account at bencher.dev and create a project.
- Add the API token as a GitHub Actions secret:
  - Go to Settings → Secrets and variables → Actions
  - Add BENCHER_API_TOKEN with your Bencher API token
- Update the project slug in .github/workflows/benchmarks.yml if your Bencher project name differs from pg-trickle.
The workflow gracefully degrades — if BENCHER_API_TOKEN is not set, benchmarks still run and upload artifacts but skip Bencher tracking.
Local Bencher-Format Output
To see what Bencher would receive from CI:
just bench-bencher
This runs both suites with --output-format bencher, producing JSON output compatible with bencher run.
Dashboard
Once configured, the Bencher dashboard shows:
- Historical trends for every benchmark across commits
- Statistical thresholds with configurable alerting
- PR annotations highlighting which benchmarks regressed and by how much
Troubleshooting
| Issue | Resolution |
|---|---|
| docker: command not found | Install Docker Desktop and ensure it is running |
| Container startup timeout | Increase Docker memory allocation (≥ 4 GB recommended) |
| image not found | Run ./tests/build_e2e_image.sh to build the test image |
| Highly variable timings | Close other workloads; use --test-threads=1 to avoid container contention |
| SLOW status on latency test | Expected in Docker; bare-metal should pass < 10 ms |
CDC Write-Side Overhead Benchmarks
The CDC write-overhead benchmark suite in tests/e2e_cdc_write_overhead_tests.rs measures the DML throughput cost of pg_trickle's CDC triggers on source tables. This quantifies the "write amplification factor" — how much slower DML becomes when a stream table is attached.
The core question this benchmark answers:
How much write throughput do you sacrifice by attaching a stream table to a source table?
Running CDC Write Overhead Benchmarks
# Full suite (all 5 scenarios)
cargo test --test e2e_cdc_write_overhead_tests --features pg18 -- --ignored --nocapture bench_cdc_write_overhead_full
# Individual scenarios
cargo test --test e2e_cdc_write_overhead_tests --features pg18 -- --ignored --nocapture bench_cdc_single_row_insert
cargo test --test e2e_cdc_write_overhead_tests --features pg18 -- --ignored --nocapture bench_cdc_bulk_insert
cargo test --test e2e_cdc_write_overhead_tests --features pg18 -- --ignored --nocapture bench_cdc_bulk_update
cargo test --test e2e_cdc_write_overhead_tests --features pg18 -- --ignored --nocapture bench_cdc_bulk_delete
cargo test --test e2e_cdc_write_overhead_tests --features pg18 -- --ignored --nocapture bench_cdc_concurrent_writers
Scenarios
| Scenario | Description | Rows per Cycle |
|---|---|---|
| Single-row INSERT | One INSERT statement per row, 1,000 rows total | 1,000 |
| Bulk INSERT | Single INSERT ... SELECT generate_series(...) | 10,000 |
| Bulk UPDATE | Single UPDATE ... WHERE id <= N | 10,000 |
| Bulk DELETE | Single DELETE ... WHERE id <= N | 10,000 |
| Concurrent writers | 4 parallel sessions each inserting 5,000 rows | 20,000 total |
Reading the Output
╔═══════════════════════════════════════════════════════════════════════════════════╗
║ pg_trickle CDC Write-Side Overhead Benchmark ║
╠═══════════════════════╤═══════════════╤═══════════════╤═════════════════════════╣
║ Scenario │ Baseline (ms) │ With CDC (ms) │ Write Amplification ║
╠═══════════════════════╪═══════════════╪═══════════════╪═════════════════════════╣
║ single-row INSERT │ 450.2 │ 890.5 │ 1.98× ║
║ bulk INSERT (10K) │ 35.1 │ 72.3 │ 2.06× ║
║ bulk UPDATE (10K) │ 48.7 │ 105.2 │ 2.16× ║
║ bulk DELETE (10K) │ 22.4 │ 51.8 │ 2.31× ║
║ concurrent (4×5K) │ 65.3 │ 142.1 │ 2.18× ║
╚═══════════════════════╧═══════════════╧═══════════════╧═════════════════════════╝
| Column | Meaning |
|---|---|
| Scenario | DML pattern being measured |
| Baseline | Average wall-clock time with no stream table (no CDC trigger) |
| With CDC | Average wall-clock time with an active stream table (CDC trigger fires) |
| Write Amplification | With CDC / Baseline — how many times slower the write path becomes |
Machine-Readable Output
[CDC_BENCH] scenario=single-row_INSERT baseline_avg_ms=450.2 cdc_avg_ms=890.5 write_amplification=1.98
Interpreting Write Amplification
| Write Amplification | Interpretation |
|---|---|
| 1.0–1.5× | Minimal overhead — triggers add negligible cost. Typical for bulk DML with statement-level triggers. |
| 1.5–2.5× | Expected range for statement-level CDC triggers. Each DML statement incurs one additional INSERT into the change buffer. |
| 2.5–4.0× | Moderate overhead — acceptable for most workloads. Common with row-level triggers or single-row DML. |
| 4.0–10× | High overhead — consider pg_trickle.cdc_trigger_mode = 'statement' if using row-level triggers, or reduce DML frequency. |
| > 10× | Investigate — may indicate lock contention on the change buffer or pathological trigger interaction. |
Key Patterns to Look For
- Statement-level vs row-level triggers: Statement-level triggers (default since v0.11.0) should show significantly lower overhead for bulk DML than row-level triggers.
- Bulk DML advantage: Bulk INSERT/UPDATE/DELETE should show lower write amplification than single-row INSERT because the trigger fires once per statement, not once per row.
- Concurrent writer safety: The concurrent scenario should complete without deadlocks or errors, and its write amplification should be similar to the serial bulk INSERT case.
- DELETE overhead: DELETE triggers tend to be slightly more expensive than INSERT triggers because the trigger must capture the OLD row values.
CI Benchmark Workflows
All benchmark jobs run only on a weekly schedule and via workflow_dispatch — never on PRs or pushes — to avoid blocking the merge gate with long-running tests.
e2e-benchmarks.yml — E2E Benchmark Tracking
Produces the numbers in README.md and this document. Each job posts a summary table to the GitHub Actions run page and uploads artifacts at 90-day retention. Manual dispatch accepts a job input (refresh | latency | cdc | tpch | all) to re-run a single job.
| Job | Test(s) | README Section | Timeout | just command |
|---|---|---|---|---|
| bench-refresh | bench_full_matrix | Differential vs Full Refresh | 60 min | just test-bench-e2e-fast |
| bench-latency | bench_no_data_refresh_latency | Zero-Change Latency | 20 min | just test-bench-e2e-fast |
| bench-cdc | bench_cdc_trigger_overhead | Write-Path Overhead | 30 min | just test-bench-e2e-fast |
| bench-tpch | test_tpch_performance_comparison | TPC-H per-query table | 30 min | just bench-tpch-fast |
ci.yml — Benchmark Jobs
Criterion micro-benchmarks and DAG topology benchmarks. Run on the daily schedule and workflow_dispatch.
| Job | Test Suite | What It Measures | Timeout | just command |
|---|---|---|---|---|
| benchmarks | benches/refresh_bench.rs, benches/diff_operators.rs | In-process Rust: query building, delta SQL generation (sub-µs) | 20 min | just bench |
| dag-bench-calc | e2e_dag_bench_tests (excl. par*) | DAG propagation latency + throughput, CALCULATED mode | 30 min | just test-dag-bench-fast |
| dag-bench-parallel | e2e_dag_bench_tests (par*) | DAG propagation with 4–8 parallel workers | 120 min | just test-dag-bench-fast |
benchmarks.yml — Bencher Integration (opt-in)
Disabled by default (no scheduled trigger). Re-enable by restoring push/pull_request triggers and adding a BENCHER_API_TOKEN secret. When active, it annotates PRs with regressions detected via Student’s t-test at a 99% upper confidence boundary.
| Job | Test Suite | What It Measures | Tracking |
|---|---|---|---|
| benchmark | benches/refresh_bench.rs, benches/diff_operators.rs | Same as ci.yml benchmarks job | Bencher (regression alert on PR) |
Artifact Retention Summary
| Workflow | Artifact | Retention |
|---|---|---|
| e2e-benchmarks.yml | bench-{refresh,latency,cdc,tpch}-results (stdout + JSON) | 90 days |
| ci.yml benchmarks | benchmark-results (Criterion HTML + JSON) | 7 days |
| benchmarks.yml | criterion-results (Criterion HTML + JSON) | 7 days |
pg_trickle vs. DBSP: Similarities and Differences
What They Share (Conceptual Foundation)
pg_trickle explicitly cites DBSP as its theoretical foundation (see PRIOR_ART.md). The key overlap:
| Concept | DBSP (paper) | pg_trickle (implementation) |
|---|---|---|
| Z-set / delta model | Rows annotated with weights (+1/−1) in an abelian group | __pgt_action = 'I'/'D' column on every delta row — effectively Z-sets restricted to {+1, −1} |
| Per-operator differentiation | Recursive Algorithm 4.6: Q^Δ = D ∘ Q ∘ I, decomposed per-operator via the chain rule (Q₁ ∘ Q₂)^Δ = Q₁^Δ ∘ Q₂^Δ | DiffContext::diff_node() walks the OpTree and calls per-operator differentiators (scan, filter, project, join, aggregate, distinct, union, etc.) — same recursive structural decomposition |
| Linear operators are self-incremental | Theorem 3.3: for LTI operator Q, Q^Δ = Q | Filter and Project pass deltas through unchanged (just apply predicate/projection to the delta stream) |
| Bilinear join rule | Theorem 3.4: Δ(a × b) = Δa × Δb + a × Δb + Δa × b | diff_inner_join generates exactly 3 UNION ALL parts: (delta_left ⋈ current_right), (current_left ⋈ delta_right), and optionally (delta_left ⋈ delta_right) |
| Aggregate auxiliary counters | §4.2: counting algorithm for maintaining aggregates with deletions | __pgt_count auxiliary column, LEFT JOIN back to stream table to read old counts and compute new counts |
| Recursive queries | §6: fixed-point iteration with z⁻¹ delay operator, semi-naive evaluation | diff_recursive_cte uses recomputation-diff (DRed-style), not DBSP's native fixed-point circuit |
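The bilinear join row above maps to delta SQL of roughly this shape. The relation and column names are illustrative, not the extension's generated identifiers:

```sql
-- Δ(L ⋈ R): three UNION ALL branches, mirroring Theorem 3.4
SELECT dl.k, dl.v AS lv, r.v  AS rv
FROM delta_left dl JOIN right_snapshot r ON r.k = dl.k
UNION ALL
SELECT l.k, l.v AS lv, dr.v AS rv
FROM left_snapshot l JOIN delta_right dr ON dr.k = l.k
UNION ALL
SELECT dl.k, dl.v AS lv, dr.v AS rv
FROM delta_left dl JOIN delta_right dr ON dr.k = dl.k;
-- Each branch also carries combined __pgt_action markers (not shown);
-- the third branch covers keys that changed on both sides in one cycle.
```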
Key Differences
1. Execution model — standalone engine vs. embedded in PostgreSQL
DBSP is a standalone streaming runtime (Rust library, now Feldera). It compiles query plans into dataflow graphs that maintain in-memory state and process continuous micro-batches. Operators are long-lived stateful actors with their own memory.
pg_trickle is an extension inside PostgreSQL. It has no persistent dataflow graph. On each refresh, it generates a single SQL query (CTE chain) that PostgreSQL's own planner/executor evaluates. After execution, no operator state persists — auxiliary state lives in the stream table itself (__pgt_count columns) and change buffer tables.
2. Streams vs. periodic batches
DBSP operates on true infinite streams indexed by logical time t ∈ ℕ. Each "step" processes one micro-batch of changes, and operators carry integration state (I operator = running sum from t=0).
pg_trickle operates in discrete refresh cycles triggered by a lag-based scheduler. There is no integration operator — the "current state" is just the stream table's contents, and changes are consumed from CDC buffer tables between LSN boundaries. Each refresh is a self-contained transaction.
3. Z-set weights vs. binary actions
DBSP uses integer weights in ℤ — rows can have weights > 1 (bags) or < −1 (multiple deletions). This enables correct multiset semantics and composable group algebra.
pg_trickle uses binary actions ('I' insert, 'D' delete, sometimes 'U' update). It doesn't maintain true Z-set weights. For aggregates, the __pgt_count auxiliary column serves a similar purpose but is specific to the aggregate operator — it's not a general weight propagated through the operator tree.
4. Integration operator (I)
DBSP: The integration operator I(s)[t] = Σᵢ≤ₜ s[i] is an explicit first-class circuit element. It maintains running sums of changes and is the key mechanism for computing incremental joins (z⁻¹(I(a)) = "accumulated left side up to previous step").
pg_trickle: No explicit integration. The equivalent of I is just "read the current contents of the source/stream table." Join differentiation directly reads the current snapshot of the non-delta side (build_snapshot_sql() generates FROM "public"."orders" r), which implicitly includes all historical changes.
5. Recursion
DBSP: Native fixed-point circuits with z⁻¹ delay. Can incrementally maintain recursive queries (e.g., transitive closure) by iterating only on new changes within each step — semi-naive evaluation generalized to arbitrary recursion.
pg_trickle: Uses recomputation-diff for recursive CTEs — re-executes the full recursive query and anti-joins against current storage to compute the delta. This is correct but not truly incremental for the recursive part.
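The recomputation-diff shape can be sketched as plain SQL (names are illustrative; reach_st stands for the stream table's current storage):

```sql
-- Re-run the recursive query in full, then anti-join against current storage.
WITH RECURSIVE reach(src, dst) AS (
  SELECT src, dst FROM edges
  UNION
  SELECT r.src, e.dst FROM reach r JOIN edges e ON r.dst = e.src
),
to_insert AS (          -- in the fresh result, missing from storage
  SELECT src, dst FROM reach
  EXCEPT
  SELECT src, dst FROM reach_st
),
to_delete AS (          -- in storage, gone from the fresh result
  SELECT src, dst FROM reach_st
  EXCEPT
  SELECT src, dst FROM reach
)
SELECT 'I' AS action, src, dst FROM to_insert
UNION ALL
SELECT 'D' AS action, src, dst FROM to_delete;
```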
6. Correctness guarantees
DBSP: Proven correct in Lean. All theorems are machine-checked. The chain rule, cycle rule, and bilinear decomposition are formally verified.
pg_trickle: Verified empirically via property-based tests (the assert_invariant checks that Contents(ST) = Q(DB) after each mutation cycle). No formal proof, but the per-operator rules are direct translations of DBSP's rules.
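The same invariant is easy to spot-check by hand on any stream table: an empty symmetric difference means the materialization is exact. Using the active_orders example from the introduction:

```sql
-- Contents(ST) = Q(DB): both EXCEPT ALL directions must be empty.
(SELECT id, status FROM active_orders
 EXCEPT ALL
 SELECT id, status FROM orders WHERE status = 'active')
UNION ALL
(SELECT id, status FROM orders WHERE status = 'active'
 EXCEPT ALL
 SELECT id, status FROM active_orders);
-- 0 rows means the stream table matches its defining query
```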
7. Scope
DBSP: A general-purpose theory and streaming engine. Handles nested relations, streaming aggregation over windows, arbitrary compositions. The Feldera implementation supports a full SQL frontend.
pg_trickle: Focused on materialized views inside PostgreSQL. Supports a specific subset of SQL (scan, filter, project, inner/left/full join, aggregates, DISTINCT, UNION ALL, INTERSECT, EXCEPT, CTEs, window functions, lateral joins). It is not a general streaming engine — it leverages PostgreSQL's own query planner and executor.
Summary
pg_trickle applies DBSP's differentiation rules to generate delta queries, but it is not a DBSP implementation. It borrows the mathematical framework (per-operator differentiation, Z-set-like deltas, bilinear join decomposition) while making fundamentally different architectural choices: embedded in PostgreSQL, no persistent dataflow state, periodic batch execution, and PostgreSQL's planner as the optimizer. Think of it as "DBSP's differentiation algebra, compiled down to SQL CTEs and executed by PostgreSQL."
Research: pg_ivm Comparison
This document is a detailed technical comparison between pg_trickle and pg_ivm covering supported SQL features, refresh latency, and operational differences. It is research material for contributors and evaluators performing a deep-dive comparison.
Quick comparison table (pg_trickle vs pg_ivm and other systems) is in Comparisons.
Abstract
pg_ivm and pg_trickle both implement incremental view maintenance inside PostgreSQL, but they target different maturity levels and operational requirements. pg_ivm is a mature, focused extension that implements IVM within PostgreSQL's standard materialized-view infrastructure — it supports a subset of SQL (single-table aggregates, simple joins) and uses statement-level AFTER triggers with immediate synchronous refresh. pg_trickle is a production-oriented extension that targets the full TPC-H benchmark at O(Δ) complexity, supports thousands of concurrent stream tables, and provides an asynchronous scheduled refresh model with CDC-based change capture.
The key architectural divergence is the change-capture layer: pg_ivm uses pg_ivm_immediate_trigger() with NEW TABLE/OLD TABLE transition tables and refreshes synchronously within the originating transaction (adding write latency), whereas pg_trickle separates write-path (CDC triggers writing to change-buffer tables) from read-path (background scheduler running differential refresh). This separation allows pg_trickle to absorb write bursts, coalesce changes, and maintain stream tables at configurable latencies without blocking the application transaction.
This document provides a feature-matrix comparison across SQL coverage, refresh strategies, operational tooling, performance characteristics, and migration path from pg_ivm to pg_trickle. Where pg_ivm supports a feature that pg_trickle does not yet support (e.g., some window function variants), the gap is documented with a planned resolution version.
pg_trickle vs pg_ivm — Comparison Report & Gap Analysis
Date: 2026-02-28 (merged 2026-03-01, updated 2026-03-20)
Author: Internal research
Status: Reference document
1. Executive Summary
Both pg_trickle and pg_ivm implement Incremental View Maintenance (IVM) as
PostgreSQL extensions — the goal of keeping materialized query results up-to-date
without full recomputation. Despite the shared objective, they differ fundamentally
in design philosophy, maintenance model, SQL coverage, operational model, and
target audience.
pg_ivm is a mature, widely-deployed C extension (1.4k GitHub stars, 17 releases)
focused on immediate, synchronous IVM that runs inside the same transaction as
the base-table write. pg_trickle is a Rust extension (v0.9.0) offering
both deferred (scheduled) and immediate (transactional) IVM with a richer SQL
dialect, a dependency DAG, and built-in operational tooling.
pg_trickle is significantly ahead of pg_ivm in SQL coverage, operator support,
aggregate support, and operational features. As of v0.2.1, pg_trickle also
matches pg_ivm's core strength — immediate, in-transaction maintenance — via
the IMMEDIATE refresh mode (Phases 1, 3, and 4 complete; Phase 2 postponed).
pg_ivm's one remaining structural advantage is broader PostgreSQL version
support (PG 13–18). Current status highlights:
- IMMEDIATE mode — fully implemented. Statement-level AFTER triggers with transition tables update stream tables within the same transaction as base-table DML. Window functions, LATERAL, scalar subqueries, cascading IMMEDIATE stream tables, WITH RECURSIVE (with a stack-depth warning), and TopK micro-refresh are all supported. See PLAN_TRANSACTIONAL_IVM.md.
- AUTO refresh mode — new default for create_stream_table. Selects DIFFERENTIAL when the query supports it and transparently falls back to FULL otherwise, eliminating the need to choose a mode at creation time.
- pg_ivm compatibility layer — postponed. The pgivm.create_immv() / pgivm.refresh_immv() / pgivm.pg_ivm_immv wrappers (Phase 2) are deferred to post-1.0.
- PLAN_PG_BACKCOMPAT.md details backporting pg_trickle to PG 14–18 (recommended) or PG 16–18 (minimum viable), requiring ~2.5–3 weeks of effort, primarily in #[cfg]-gating ~435 lines of JSON/SQL-standard parse-tree handling.
With IMMEDIATE mode fully implemented, Row Level Security support (v0.5.0), pg_dump/restore support (v0.8.0), algebraic aggregate maintenance (v0.9.0), parallel refresh (v0.4.0), circular pipeline support (v0.7.0), watermark APIs (v0.7.0), and 40+ unique features, pg_ivm's only remaining advantages are PG version breadth and production maturity.
2. Project Overview
| Attribute | pg_ivm | pg_trickle |
|---|---|---|
| Repository | sraoss/pg_ivm | trickle-labs/pg-trickle |
| Language | C | Rust (pgrx 0.17) |
| Latest release | 1.13 (2025-10-20) | 0.9.0 (2026-03-20) |
| Stars | ~1,400 | early stage |
| License | PostgreSQL License | Apache 2.0 |
| PG versions | 13 – 18 | 18 only; PG 14–18 planned |
| Schema | pgivm | pgtrickle / pgtrickle_changes |
| Shared library required | Yes (shared_preload_libraries or session_preload_libraries) | Yes (shared_preload_libraries, required for background worker) |
| Background worker | No | Yes (scheduler + optional WAL decoder) |
3. Maintenance Model
This is the most important design difference between the two extensions.
pg_ivm — Immediate Maintenance
pg_ivm updates its views synchronously inside the same transaction that
modified the base table. When rows are inserted, updated, or deleted,
statement-level AFTER triggers (with transition tables) fire and update the
IMMV before the transaction commits.
BEGIN;
UPDATE base_table ...; -- triggers fire here
-- IMMV is updated before COMMIT
COMMIT;
Consequences:
- The IMMV is always exactly consistent with the committed state of the base table — zero staleness.
- Write latency increases by the cost of view maintenance. For large joins or aggregates on popular tables this can be significant.
- Locking: ExclusiveLock is held on the IMMV during maintenance to prevent concurrent anomalies. In REPEATABLE READ or SERIALIZABLE isolation, errors are raised when conflicts are detected.
- TRUNCATE on a base table triggers a full IMMV refresh (for most view types).
- Not compatible with logical replication (subscriber nodes are not updated).
pg_trickle — Deferred, Scheduled Maintenance
pg_trickle updates its stream tables asynchronously, driven by a background worker scheduler. Changes are captured by row-level triggers (or optionally by WAL decoding) into change-buffer tables and are applied in batch on the next refresh cycle.
-- Write path: only a trigger INSERT into change buffer
BEGIN;
UPDATE base_table ...; -- trigger captures delta into pgtrickle_changes.*
COMMIT;
-- Separate refresh cycle (background worker):
apply_delta_to_stream_table(...)
Consequences:
- Write latency is minimized — the trigger write into the change buffer is ~2–50 μs regardless of view complexity.
- Stream tables are stale between refresh cycles. The staleness bound is configurable (e.g. '30s', '5m', '@hourly', or cron expressions).
- Refresh can be triggered manually: pgtrickle.refresh_stream_table(...).
- Multiple stream tables can share a refresh pipeline ordered by dependency (topological DAG scheduling).
- The WAL-based CDC mode (pg_trickle.cdc_mode = 'wal') eliminates trigger overhead entirely when wal_level = logical is available.
- Append-only fast path (v0.5.0): append_only => true skips MERGE for INSERT-only tables, with auto-fallback if a DELETE/UPDATE is detected.
- Source gating (v0.5.0): pause CDC during bulk loads via gate_source() and ungate_source() to avoid trigger overhead during large batch inserts.
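A typical bulk-load pattern combining these (the CSV path is a placeholder; whether a manual refresh is needed after ungating depends on your pipeline, so one is shown explicitly here):

```sql
SELECT pgtrickle.gate_source('orders');        -- pause CDC: no trigger overhead
COPY orders FROM '/path/to/orders.csv' WITH (FORMAT csv);
SELECT pgtrickle.ungate_source('orders');      -- resume change capture
SELECT pgtrickle.refresh_stream_table('order_totals');  -- resynchronize
```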
Implemented: pg_trickle IMMEDIATE Mode
pg_trickle now offers an IMMEDIATE refresh mode (Phases 1, 3, and 4 complete;
Phase 2 postponed) that uses statement-level AFTER triggers with transition
tables — the same mechanism pg_ivm uses. Key implementation details:
- Reuses the DVM engine — the Scan operator reads from transition tables (via temporary views) instead of change-buffer tables.
- Phase 1 (complete): core IMMEDIATE engine — INSERT/UPDATE/DELETE/TRUNCATE handling, advisory lock-based concurrency (IvmLockMode), mode switching via alter_stream_table, query restriction validation.
- Phase 2 (postponed): pgivm.* compatibility layer for drop-in migration.
- Phase 3 (complete): extended SQL support — window functions, LATERAL, scalar subqueries, cascading IMMEDIATE stream tables, WITH RECURSIVE (IM1: supported with a stack-depth warning), and TopK micro-refresh (IM2: recomputes top-K on every DML, gated by pg_trickle.ivm_topk_max_limit).
- Phase 4 (complete): delta SQL template caching (IVM_DELTA_CACHE); ENR-based transition tables and C-level triggers deferred to post-1.0 as optimizations only.
-- Create an IMMEDIATE stream table (zero staleness)
SELECT pgtrickle.create_stream_table(
'live_totals',
'SELECT region, SUM(amount) AS total FROM orders GROUP BY region',
NULL, -- no schedule needed
'IMMEDIATE'
);
-- Updates propagate within the same transaction
BEGIN;
INSERT INTO orders (region, amount) VALUES ('EU', 100);
SELECT * FROM live_totals; -- already includes the new row
COMMIT;
4. SQL Feature Coverage — Summary
| Dimension | pg_ivm | pg_trickle | Winner |
|---|---|---|---|
| Maintenance timing | Immediate (in-transaction triggers) | Deferred (scheduler/manual) and IMMEDIATE (in-transaction) | pg_trickle (offers both models) |
| PostgreSQL versions | 13–18 | 18 only; PG 14–18 planned | pg_ivm (today); planned parity |
| Aggregate functions | 5 (COUNT, SUM, AVG, MIN, MAX) | 60+ (all built-in aggregates incl. algebraic O(1) for COUNT/SUM/AVG/STDDEV/VAR) | pg_trickle |
| FILTER clause on aggregates | No | Yes | pg_trickle |
| HAVING clause | No | Yes | pg_trickle |
| Inner joins | Yes (including self-join) | Yes (including self-join, NATURAL, nested) | pg_trickle |
| Outer joins | Yes (limited — equijoin, single condition, many restrictions) | Yes (LEFT/RIGHT/FULL, nested, complex conditions) | pg_trickle |
| DISTINCT | Yes (reference-counted) | Yes (reference-counted) | Tie |
| DISTINCT ON | No | Yes (auto-rewritten to ROW_NUMBER) | pg_trickle |
| UNION / INTERSECT / EXCEPT | No | Yes (all 6 variants, bag + set) | pg_trickle |
| Window functions | No | Yes (partition recomputation) | pg_trickle |
| CTEs (non-recursive) | Simple only (no aggregates, no DISTINCT inside) | Full (aggregates, DISTINCT, multi-reference shared delta) | pg_trickle |
| CTEs (recursive) | No | Yes (semi-naive, DRed, recomputation; IMMEDIATE mode with stack-depth warning) | pg_trickle |
| Subqueries in FROM | Simple only (no aggregates/DISTINCT inside) | Full support | pg_trickle |
| EXISTS subqueries | Yes (WHERE only, AND only, no agg/DISTINCT) | Yes (WHERE + targetlist, AND/OR, agg/DISTINCT inside) | pg_trickle |
| NOT EXISTS / NOT IN | No | Yes (anti-join operator) | pg_trickle |
| IN (subquery) | No | Yes (semi-join operator) | pg_trickle |
| Scalar subquery in SELECT | No | Yes (scalar subquery operator) | pg_trickle |
| LATERAL subqueries | No | Yes (row-scoped recomputation) | pg_trickle |
| LATERAL SRFs | No | Yes (jsonb_array_elements, unnest, etc.) | pg_trickle |
| JSON_TABLE (PG 17+) | No | Yes | pg_trickle |
| GROUPING SETS / CUBE / ROLLUP | No | Yes (auto-rewritten to UNION ALL) | pg_trickle |
| Views as sources | No (simple tables only) | Yes (auto-inlined, nested) | pg_trickle |
| Partitioned tables | No | Yes | pg_trickle |
| Foreign tables | No | FULL mode only | pg_trickle |
| Cascading (view-on-view) | No | Yes (DAG-aware scheduling) | pg_trickle |
| Background scheduling | No (user must trigger) | Yes (cron + duration, background worker) | pg_trickle |
| Monitoring / observability | 1 catalog table | Extensive (stats, history, staleness, CDC health, NOTIFY) | pg_trickle |
| CDC mechanism | Triggers only | Hybrid (triggers + optional WAL) | pg_trickle |
| DDL tracking | No automatic handling | Yes (event triggers, auto-reinit) | pg_trickle |
| TRUNCATE handling | Yes (auto-truncate IMMV) | IMMEDIATE mode: full refresh in same txn; DEFERRED: queued full refresh | Tie (functionally equivalent in IMMEDIATE mode) |
| Auto-indexing | Yes (on GROUP BY / DISTINCT / PK columns) | No (user creates indexes) | pg_ivm |
| Row Level Security | Yes (with limitations) | Yes (refreshes see all data; RLS on stream table; IMMEDIATE mode secured) | pg_trickle (richer model) |
| Concurrency model | ExclusiveLock on IMMV during maintenance | Advisory locks, non-blocking reads, parallel refresh | pg_trickle |
| Data type restrictions | Must have btree opclass (no json, xml, point) | No documented type restrictions | pg_trickle |
| Maturity / ecosystem | 4 years, 1.4k stars, PGXN, yum packages | v0.9.0 released, 1,100+ unit tests + 900+ E2E tests, 22 TPC-H benchmarks, dbt integration | pg_ivm |
4.1 Areas Where pg_ivm Wins
Of the ~35 dimensions in the summary table above, pg_ivm holds an advantage in only 3 (down from 6 before IMMEDIATE mode and RLS were implemented). Two are substantive (PostgreSQL version support and auto-indexing, both with planned resolutions), and one — maturity — is a temporary gap that closes with time.
1. PostgreSQL Version Support (substantive, planned resolution)
pg_ivm ships pre-built packages for PostgreSQL 13–18 across all major Linux distros via yum.postgresql.org and PGXN. pg_trickle currently targets PG 18 only.
This is the single largest remaining structural gap. PG 13 is EOL (Nov 2025), but PG 14–17 are widely deployed in production environments. Users on those versions simply cannot use pg_trickle today.
Planned resolution: PLAN_PG_BACKCOMPAT.md
details backporting to PG 14–18 (~2.5–3 weeks). pgrx 0.17 already supports
PG 14–18 via feature flags; ~435 lines in parser.rs need #[cfg] gating
for JSON/SQL-standard parse-tree handling.
2. Auto-Indexing (substantive, low priority)
When pg_ivm creates an IMMV, it automatically adds indexes on columns used in
GROUP BY, DISTINCT, and primary keys. This is a genuine usability advantage
— new users get reasonable read performance without manual intervention.
pg_trickle leaves index creation entirely to the user. For DIFFERENTIAL mode
stream tables, the DVM engine's MERGE-based delta application already uses the
stream table's primary key (which is auto-created), and index-aware MERGE
(pg_trickle.merge_seqscan_threshold, added v0.9.0) uses index lookups for
tiny change ratios, but secondary indexes for read-side query patterns must
be added manually.
Impact: Low — experienced users always create application-specific indexes anyway. Auto-indexing mostly helps onboarding and simple use-cases.
Planned resolution: Tracked as part of the pg_ivm compatibility layer
(Phase 2, postponed to post-1.0). Could also be implemented independently as
a CREATE INDEX IF NOT EXISTS step in create_stream_table.
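In the meantime, the workaround is a one-line follow-up to create_stream_table (a sketch mirroring what pg_ivm would create automatically for a GROUP BY column):

```sql
SELECT pgtrickle.create_stream_table(
  'order_totals',
  'SELECT region, SUM(amount) AS total FROM orders GROUP BY region'
);
-- Read-side index that pg_ivm would have added automatically:
CREATE INDEX IF NOT EXISTS order_totals_region_idx ON order_totals (region);
```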
3. Maturity / Ecosystem (temporary, closing over time)
pg_ivm has 4 years of production use, ~1,400 GitHub stars, 17 releases, and is distributed via PGXN, yum, and apt package repositories. It has a track record of stability and a community of users.
pg_trickle is a v0.9.0 series release with 1,100+ unit tests, 200+ integration tests, 570+ light E2E tests, 90+ full E2E tests, and 22 TPC-H correctness benchmarks—but no wide production deployments yet. It lacks the battle-testing that comes from years of real-world usage.
Impact: High for risk-averse organizations considering production adoption. Low for greenfield projects or teams willing to adopt early.
Resolution: This gap closes naturally with time, releases, and adoption.
The dbt integration (dbt-pgtrickle) and CNPG/Kubernetes deployment support
accelerate ecosystem development.
5. Detailed SQL Comparison
5.1 Aggregate Functions
| Function | pg_ivm | pg_trickle |
|---|---|---|
| COUNT(*) / COUNT(expr) | ✅ Algebraic | ✅ Algebraic (O(1) running total, v0.9.0) |
| SUM | ✅ Algebraic | ✅ Algebraic (O(1) running total, v0.9.0) |
| AVG | ✅ Algebraic (via SUM/COUNT) | ✅ Algebraic (O(1) via SUM/COUNT decomposition, v0.9.0) |
| MIN | ✅ Semi-algebraic (rescan on extremum delete) | ✅ Semi-algebraic (O(1) unless extremum deleted, v0.9.0 safety guard) |
| MAX | ✅ Semi-algebraic (rescan on extremum delete) | ✅ Semi-algebraic (O(1) unless extremum deleted, v0.9.0 safety guard) |
| BOOL_AND / BOOL_OR | ❌ | ✅ Group-rescan |
| STRING_AGG | ❌ | ✅ Group-rescan |
| ARRAY_AGG | ❌ | ✅ Group-rescan |
| JSON_AGG / JSONB_AGG | ❌ | ✅ Group-rescan |
| BIT_AND / BIT_OR / BIT_XOR | ❌ | ✅ Group-rescan |
| JSON_OBJECT_AGG / JSONB_OBJECT_AGG | ❌ | ✅ Group-rescan |
| STDDEV / VARIANCE (all variants) | ❌ | ✅ Algebraic (O(1) sum-of-squares decomposition, v0.9.0) |
| MODE / PERCENTILE_CONT / PERCENTILE_DISC | ❌ | ✅ Group-rescan |
| CORR / COVAR / REGR_* (11 functions) | ❌ | ✅ Group-rescan |
| ANY_VALUE (PG 16+) | ❌ | ✅ Group-rescan |
| JSON_ARRAYAGG / JSON_OBJECTAGG (PG 16+) | ❌ | ✅ Group-rescan |
| User-defined aggregates (CREATE AGGREGATE) | ❌ | ✅ Group-rescan |
| FILTER (WHERE) clause | ❌ | ✅ |
| WITHIN GROUP (ORDER BY) | ❌ | ✅ |
| COUNT(DISTINCT expr) / SUM(DISTINCT expr) | ❌ | ✅ |
| Total | 5 | 60+ |
Gap for pg_ivm: Massive. Only 5 of ~60 built-in aggregate functions are supported.
pg_trickle v0.9.0 also introduced algebraic (O(1)) maintenance for COUNT,
SUM, AVG, STDDEV, and VARIANCE — meaning these aggregates update in constant
time per changed row via running totals, whereas pg_ivm’s algebraic support
is limited to COUNT, SUM, AVG. pg_trickle additionally supports user-defined
aggregates via group-rescan and floating-point drift correction
(pg_trickle.algebraic_drift_reset_cycles).
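The sum-of-squares decomposition behind O(1) STDDEV/VARIANCE can be written out directly. The query below is a hand-rolled illustration of the identity, not pg_trickle's internal state layout:

```sql
-- Maintain n, Σx, Σx² per group; variance is reconstructed from the totals.
SELECT region,
       count(*)             AS n,
       sum(amount)          AS s1,   -- Σx
       sum(amount * amount) AS s2,   -- Σx²
       (sum(amount * amount) - sum(amount)^2 / count(*))
         / NULLIF(count(*) - 1, 0)   AS var_samp_from_totals
FROM orders
GROUP BY region;
-- Inserting x updates the group as (n+1, s1+x, s2+x²); deleting x as
-- (n-1, s1-x, s2-x²): constant work per changed row, no group rescan.
```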
5.2 Joins
| Feature | pg_ivm | pg_trickle |
|---|---|---|
| Inner join | ✅ | ✅ |
| Self-join | ✅ | ✅ |
| LEFT JOIN | ✅ (restricted) | ✅ (full) |
| RIGHT JOIN | ✅ (restricted) | ✅ (normalized to LEFT) |
| FULL OUTER JOIN | ✅ (restricted) | ✅ (8-part delta) |
| NATURAL JOIN | ? | ✅ |
| Cross join | ? | ✅ |
| Nested joins (3+ tables) | ✅ | ✅ |
| Non-equi joins (theta) | ? | ✅ |
| Outer join + aggregates | ❌ | ✅ |
| Outer join + subqueries | ❌ | ✅ |
| Outer join + CASE/non-strict | ❌ | ✅ |
| Outer join multi-condition | ❌ (single equality only) | ✅ |
Gap for pg_ivm: Outer joins are heavily restricted — single equijoin condition, no aggregates, no subqueries, no CASE expressions, no IS NULL in WHERE.
5.3 Subqueries
| Feature | pg_ivm | pg_trickle |
|---|---|---|
| Simple subquery in FROM | ✅ (no aggregates/DISTINCT inside) | ✅ (full support) |
| EXISTS in WHERE | ✅ (AND only, no agg/DISTINCT inside) | ✅ (AND + OR, full SQL inside) |
| NOT EXISTS in WHERE | ❌ | ✅ (anti-join operator) |
| IN (subquery) | ❌ | ✅ (rewritten to semi-join) |
| NOT IN (subquery) | ❌ | ✅ (rewritten to anti-join) |
| ALL (subquery) | ❌ | ✅ (rewritten to anti-join) |
| Scalar subquery in SELECT | ❌ | ✅ (scalar subquery operator) |
| Scalar subquery in WHERE | ❌ | ✅ (auto-rewritten to CROSS JOIN) |
| LATERAL subquery in FROM | ❌ | ✅ (row-scoped recomputation) |
| LATERAL SRF in FROM | ❌ | ✅ (jsonb_array_elements, unnest, etc.) |
| Subqueries in OR | ❌ | ✅ (auto-rewritten to UNION) |
Gap for pg_ivm: Severely limited subquery support. No anti-joins, no scalar subqueries, no LATERAL, no SRFs.
5.4 CTEs
| Feature | pg_ivm | pg_trickle |
|---|---|---|
| Simple non-recursive CTE | ✅ (no aggregates/DISTINCT inside) | ✅ (full SQL inside) |
| Multi-reference CTE | ? | ✅ (shared delta optimization) |
| Chained CTEs | ? | ✅ |
| WITH RECURSIVE | ❌ | ✅ (semi-naive, DRed, recomputation; IMMEDIATE mode with stack-depth warning) |
Gap for pg_ivm: No recursive CTEs, no aggregates/DISTINCT inside CTEs.
5.5 Set Operations
| Feature | pg_ivm | pg_trickle |
|---|---|---|
| UNION ALL | ❌ | ✅ |
| UNION (set) | ❌ | ✅ (via DISTINCT + UNION ALL) |
| INTERSECT | ❌ | ✅ (dual-count multiplicity) |
| INTERSECT ALL | ❌ | ✅ |
| EXCEPT | ❌ | ✅ (dual-count multiplicity) |
| EXCEPT ALL | ❌ | ✅ |
Gap for pg_ivm: No set operations at all.
5.6 Window Functions
| Feature | pg_ivm | pg_trickle |
|---|---|---|
| ROW_NUMBER, RANK, DENSE_RANK | ❌ | ✅ |
| SUM/AVG/COUNT OVER () | ❌ | ✅ |
| Frame clauses (ROWS/RANGE/GROUPS) | ❌ | ✅ |
| Named WINDOW clauses | ❌ | ✅ |
| PARTITION BY recomputation | ❌ | ✅ |
Gap for pg_ivm: Window functions are completely unsupported.
5.7 DISTINCT & Grouping
| Feature | pg_ivm | pg_trickle |
|---|---|---|
| SELECT DISTINCT | ✅ | ✅ |
| DISTINCT ON (expr, ...) | ❌ | ✅ (auto-rewritten to ROW_NUMBER) |
| GROUP BY | ✅ | ✅ |
| GROUPING SETS | ❌ | ✅ (auto-rewritten to UNION ALL) |
| CUBE | ❌ | ✅ (auto-rewritten via GROUPING SETS) |
| ROLLUP | ❌ | ✅ (auto-rewritten via GROUPING SETS) |
| GROUPING() function | ❌ | ✅ |
| HAVING | ❌ | ✅ |
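The GROUPING SETS rewrite can be illustrated by hand (a simplified sketch; the actual rewrite also carries GROUPING() marker columns to disambiguate NULLs):

```sql
-- Original:
--   SELECT region, status, SUM(amount) FROM orders
--   GROUP BY GROUPING SETS ((region), (status));
-- Maintained as a UNION ALL of two ordinary aggregates:
SELECT region, NULL::text AS status, SUM(amount) AS total
FROM orders GROUP BY region
UNION ALL
SELECT NULL::text AS region, status, SUM(amount) AS total
FROM orders GROUP BY status;
```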
5.8 Source Table Types
| Source type | pg_ivm | pg_trickle |
|---|---|---|
| Simple heap tables | ✅ | ✅ |
| Views | ❌ | ✅ (auto-inlined) |
| Materialized views | ❌ | FULL mode only |
| Partitioned tables | ❌ | ✅ |
| Partitions | ❌ | ✅ (via parent) |
| Foreign tables | ❌ | FULL mode only |
| Other IMMVs / stream tables | ❌ | ✅ (DAG cascading) |
Gap for pg_ivm: Only simple heap tables. No views, no partitioned tables, no cascading.
6. API Comparison
pg_ivm API
-- Create an IMMV
SELECT pgivm.create_immv('myview', 'SELECT * FROM mytab');
-- Full refresh (emergency)
SELECT pgivm.refresh_immv('myview', true); -- with data
SELECT pgivm.refresh_immv('myview', false); -- disable maintenance
-- Inspect
SELECT immvrelid, pgivm.get_immv_def(immvrelid)
FROM pgivm.pg_ivm_immv;
-- Drop
DROP TABLE myview;
-- Rename
ALTER TABLE myview RENAME TO myview2;
pg_ivm IMMVs are standard PostgreSQL tables. They can be dropped with
DROP TABLE and renamed with ALTER TABLE.
pg_trickle API
-- Create a stream table (AUTO mode: DIFFERENTIAL when possible, FULL fallback)
SELECT pgtrickle.create_stream_table(
'order_totals',
'SELECT region, SUM(amount) AS total FROM orders GROUP BY region'
-- refresh_mode defaults to 'AUTO', schedule defaults to 'calculated'
);
-- Create a stream table (explicit deferred, scheduled)
SELECT pgtrickle.create_stream_table(
'order_totals',
'SELECT region, SUM(amount) AS total FROM orders GROUP BY region',
schedule => '2m',
refresh_mode => 'DIFFERENTIAL'
);
-- Create a stream table (immediate, in-transaction)
SELECT pgtrickle.create_stream_table(
'live_totals',
'SELECT region, SUM(amount) AS total FROM orders GROUP BY region',
schedule => NULL,
refresh_mode => 'IMMEDIATE'
);
-- Manual refresh
SELECT pgtrickle.refresh_stream_table('order_totals');
-- Alter schedule, mode, or defining query
SELECT pgtrickle.alter_stream_table('order_totals', schedule => '5m');
SELECT pgtrickle.alter_stream_table(
'order_totals',
query => 'SELECT region, SUM(amount) AS total FROM orders WHERE active GROUP BY region'
);
-- Drop
SELECT pgtrickle.drop_stream_table('order_totals');
-- Status and monitoring
SELECT * FROM pgtrickle.pgt_status();
SELECT * FROM pgtrickle.pg_stat_stream_tables;
SELECT * FROM pgtrickle.pgt_stream_tables;
-- DAG inspection
SELECT * FROM pgtrickle.pgt_dependencies;
-- Extended observability (added v0.2.0+)
SELECT * FROM pgtrickle.change_buffer_sizes(); -- CDC buffer health
SELECT * FROM pgtrickle.list_sources('order_totals'); -- source table stats
SELECT * FROM pgtrickle.dependency_tree(); -- ASCII DAG view
SELECT * FROM pgtrickle.health_check(); -- OK/WARN/ERROR triage
SELECT * FROM pgtrickle.refresh_timeline(); -- cross-stream history
SELECT * FROM pgtrickle.trigger_inventory(); -- CDC trigger audit
SELECT * FROM pgtrickle.diamond_groups(); -- diamond consistency groups
-- Source gating (v0.5.0)
SELECT pgtrickle.gate_source('orders'); -- pause CDC
SELECT pgtrickle.ungate_source('orders'); -- resume CDC
SELECT * FROM pgtrickle.source_gates(); -- gate status
-- Watermarks (v0.7.0)
SELECT pgtrickle.advance_watermark('orders', '2026-03-20 12:00:00');
SELECT pgtrickle.create_watermark_group('sync', ARRAY['orders','products'], 30);
SELECT * FROM pgtrickle.watermarks();
SELECT * FROM pgtrickle.watermark_status();
-- Parallel refresh monitoring (v0.4.0)
SELECT * FROM pgtrickle.worker_pool_status();
SELECT * FROM pgtrickle.parallel_job_status();
-- Refresh groups (v0.9.0)
SELECT pgtrickle.create_refresh_group('my_group', ARRAY['st1','st2']);
SELECT pgtrickle.drop_refresh_group('my_group');
-- Idempotent DDL (v0.6.0)
SELECT pgtrickle.create_or_replace_stream_table(
'order_totals',
'SELECT region, SUM(amount) AS total FROM orders GROUP BY region'
);
pg_trickle stream tables are regular PostgreSQL tables but managed through the
pgtrickle schema's API functions. They cannot be renamed with ALTER TABLE
(use alter_stream_table).
7. Scheduling and Dependency Management
| Capability | pg_ivm | pg_trickle |
|---|---|---|
| Automatic scheduling | ❌ (immediate only, no scheduler) | ✅ background worker |
| Manual refresh | ✅ refresh_immv() | ✅ refresh_stream_table() |
| Cron schedules | ❌ | ✅ (standard 5/6-field cron + aliases) |
| Duration-based staleness bounds | ❌ | ✅ ('30s', '5m', '1h', …) |
| Dependency DAG | ❌ | ✅ (stream tables can reference other stream tables) |
| Topological refresh ordering | ❌ | ✅ (upstream refreshes before downstream) |
| CALCULATED schedule propagation | ❌ | ✅ (consumers drive upstream schedules) |
| Parallel refresh | ❌ | ✅ (worker pool with database + cluster caps, v0.4.0) |
| Circular pipeline support | ❌ | ✅ (monotone cycles with fixed-point iteration, v0.7.0) |
| Watermark coordination | ❌ | ✅ (multi-source readiness gates, v0.7.0) |
| Refresh group management | ❌ | ✅ (atomic multi-ST refresh, v0.9.0) |
pg_trickle's DAG scheduling is a significant differentiator: you can build multi-layer pipelines where each downstream stream table is automatically refreshed after its upstream dependencies.
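A minimal two-layer pipeline (column names are illustrative):

```sql
SELECT pgtrickle.create_stream_table(
  'order_totals',
  'SELECT region, order_day, SUM(amount) AS total
     FROM orders GROUP BY region, order_day',
  schedule => '1m'
);
-- daily_totals reads from another stream table; the scheduler refreshes
-- order_totals first on every cycle.
SELECT pgtrickle.create_stream_table(
  'daily_totals',
  'SELECT order_day, SUM(total) AS grand_total
     FROM order_totals GROUP BY order_day',
  schedule => '1m'
);
SELECT * FROM pgtrickle.pgt_dependencies;
```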
8. Change Data Capture
| Attribute | pg_ivm | pg_trickle |
|---|---|---|
| Mechanism | Statement-level AFTER triggers with transition tables (inline, same txn) | AFTER row/statement triggers → change buffer |
| WAL-based CDC | ❌ | ✅ optional (pg_trickle.cdc_mode = 'wal') |
| Statement-level triggers | ❌ | ✅ (v0.4.0, reduced overhead for bulk operations) |
| Logical replication slots | Not used | Used in WAL mode only |
| Write-side overhead | Higher (view maintenance in txn) | Lower (small trigger insert only) |
| Change buffer tables | None (applied immediately) | pgtrickle_changes.changes_<oid> |
| TRUNCATE handling | IMMV truncated/refreshed synchronously | Change buffer cleared; full refresh queued |
9. Concurrency and Isolation
pg_ivm
- Holds ExclusiveLock on the IMMV during incremental update.
- In READ COMMITTED: serializes concurrent updates to the same IMMV.
- In REPEATABLE READ / SERIALIZABLE: raises an error when a concurrent transaction has already updated the IMMV.
- Single-table INSERT-only IMMVs use the lighter RowExclusiveLock.
pg_trickle
- Refresh operations acquire an advisory lock per stream table so only one refresh can run at a time.
- Base table writes are never blocked by refresh operations.
- Parallel refresh (v0.4.0): pg_trickle.parallel_refresh_mode = 'on' enables a worker pool with per-database (max_concurrent_refreshes, default 4) and cluster-wide (max_dynamic_refresh_workers) caps.
- Atomic refresh groups for diamond dependencies.
- Crash recovery: in-flight refreshes are marked failed on restart; the scheduler retries on the next cycle.
10. Observability
| Feature | pg_ivm | pg_trickle |
|---|---|---|
| Catalog of managed views | pgivm.pg_ivm_immv | pgtrickle.pgt_stream_tables |
| Per-refresh timing/history | ❌ | ✅ pgtrickle.pgt_refresh_history |
| Staleness reporting | ❌ | ✅ stale column + get_staleness() |
| Scheduler status | ❌ | ✅ pgtrickle.pgt_status() |
| NOTIFY-based alerting | ❌ | ✅ pgtrickle_refresh channel (10+ alert types) |
| Error tracking | ❌ | ✅ consecutive error counter, last error message |
| dbt integration | ❌ | ✅ dbt-pgtrickle macro package |
| Explain/introspection | ❌ | ✅ explain_st |
| CDC buffer health | ❌ | ✅ pgtrickle.change_buffer_sizes() (v0.2.0) |
| Source table stats | ❌ | ✅ pgtrickle.list_sources() (v0.2.0) |
| Dependency tree view | ❌ | ✅ pgtrickle.dependency_tree() (v0.2.0) |
| Health triage | ❌ | ✅ pgtrickle.health_check() (v0.2.0) |
| Cross-stream refresh history | ❌ | ✅ pgtrickle.refresh_timeline() (v0.2.0) |
| CDC trigger audit | ❌ | ✅ pgtrickle.trigger_inventory() (v0.2.0) |
| Diamond group inspection | ❌ | ✅ pgtrickle.diamond_groups() (v0.2.0) |
| Quick health summary | ❌ | ✅ pgtrickle.quick_health view (v0.5.0) |
| Source gating status | ❌ | ✅ pgtrickle.source_gates() (v0.5.0) |
| Watermark monitoring | ❌ | ✅ pgtrickle.watermarks() / watermark_status() (v0.7.0) |
| Parallel worker status | ❌ | ✅ pgtrickle.worker_pool_status() / parallel_job_status() (v0.4.0) |
| SCC cycle status | ❌ | ✅ pgtrickle.pgt_scc_status() (v0.7.0) |
| Replication slot health | ❌ | ✅ pgtrickle.slot_health() |
| CDC mode per-source | ❌ | ✅ pgtrickle.pgt_cdc_status view |
11. Installation and Deployment
| Attribute | pg_ivm | pg_trickle |
|---|---|---|
| Pre-built packages | RPM via yum.postgresql.org | OCI image, tarball |
| CNPG / Kubernetes | ❌ (no OCI image) | ✅ OCI extension image + CNPG smoke tests |
| Docker local dev | Manual | ✅ documented + Docker Hub image |
| shared_preload_libraries | Required (or session_preload_libraries) | Required |
| Extension upgrade scripts | ✅ (1.0 → 1.1 → … → 1.13) | ✅ (0.1.3 → … → 0.9.0, CI completeness check, upgrade E2E tests) |
| pg_dump / restore | Manual IMMV recreation required | ✅ Standard pg_dump supported (v0.8.0) |
12. Performance Characteristics
pg_ivm
- Write path: slower — every DML statement triggers inline view maintenance. From the README example: a single row update on a 10M-row join IMMV takes ~15 ms vs ~9 ms for a plain table update.
- Read path: instant — IMMV is always current, no refresh needed on read.
- Refresh (full): comparable to `REFRESH MATERIALIZED VIEW` (~20 seconds for a 10M-row join in the example).
pg_trickle
- Write path: minimal overhead — only a small trigger INSERT into the change buffer (~2–50 μs per row). In WAL mode, zero trigger overhead. Statement-level CDC triggers (v0.4.0) further reduce overhead for bulk ops.
- Read path: instant from the materialized table (potentially stale).
- Refresh (differential): proportional to the number of changed rows, not the total table size. A single-row change on a million-row aggregate touches one row's worth of computation. Algebraic aggregates (v0.9.0) like COUNT/SUM/AVG/STDDEV/VAR update in O(1) constant time per changed row.
- Refresh (full): re-runs the entire query; comparable to `REFRESH MATERIALIZED VIEW`.
- Parallel refresh (v0.4.0): linear speedup with worker pool size.
- I/O optimizations (v0.9.0): column skipping, source skipping in joins, WHERE filter push-down, index-aware MERGE for tiny change ratios, scalar subquery short-circuit.
13. Known Limitations
pg_ivm Limitations
- Adds latency to every write on tracked base tables.
- Cannot track tables modified via logical replication (subscriber nodes are not updated).
- `pg_dump`/`pg_upgrade` require manual recreation of all IMMVs.
- Limited aggregate support (no user-defined aggregates, no window functions).
- Column type restrictions (btree operator class required in target list).
- No scheduler or background worker — refresh is immediate only.
- On high-churn tables, `min`/`max` aggregates can trigger expensive rescans.
pg_trickle Limitations
- In DIFFERENTIAL/FULL mode, data is stale between refresh cycles. Use IMMEDIATE mode for zero-staleness, in-transaction consistency.
- Recursive CTEs in IMMEDIATE mode emit a stack-depth warning; very deep recursion may hit PostgreSQL's stack limit.
- Recursive CTEs in DIFFERENTIAL mode fall back to full recomputation for mixed DELETE/UPDATE changes (DRed scheduled for v0.10.0+).
- `LIMIT` without `ORDER BY` is not supported in defining queries.
- `OFFSET` without `ORDER BY … LIMIT` is not supported. Paged TopK (`ORDER BY … LIMIT N OFFSET M`) is fully supported.
- `ORDER BY` + `LIMIT` (TopK) without OFFSET uses scoped recomputation (MERGE).
- Volatile SQL functions are rejected in DIFFERENTIAL mode.
- Materialized views as sources not supported in DIFFERENTIAL mode.
- Window functions in expressions (e.g. `CASE WHEN ROW_NUMBER() OVER (...) > 5`) require FULL mode.
- Foreign tables as sources require FULL mode.
- `ALTER EXTENSION pg_trickle UPDATE` migration scripts ship from v0.2.1; continuous upgrade path through v0.9.0.
- Targets PostgreSQL 18 only; no backport to PG 13–17 (planned for PG 14–18).
- v0.9.x series — extensive testing but not yet production-hardened at scale.
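To make the TopK rules concrete, a sketch of what is rejected versus accepted (argument names follow the `create_stream_table` examples elsewhere in these docs):

```sql
-- Rejected in a defining query: LIMIT with no ORDER BY (non-deterministic result set)
SELECT pgtrickle.create_stream_table(
  name     => 'bad_topk',
  query    => 'SELECT * FROM orders LIMIT 10',
  schedule => '1m');

-- Supported: deterministic TopK, maintained via scoped recomputation (MERGE)
SELECT pgtrickle.create_stream_table(
  name     => 'top_orders',
  query    => 'SELECT * FROM orders ORDER BY amount DESC LIMIT 10',
  schedule => '1m');
```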
14. PostgreSQL Version Support
| Version | pg_ivm | pg_trickle (current) | pg_trickle (planned) |
|---|---|---|---|
| PG 13 | ✅ | ❌ | ❌ (EOL Nov 2025) |
| PG 14 | ✅ | ❌ | ✅ (full plan) |
| PG 15 | ✅ | ❌ | ✅ (full plan) |
| PG 16 | ✅ | ❌ | ✅ (MVP target) |
| PG 17 | ✅ | ❌ | ✅ (MVP target) |
| PG 18 | ✅ | ✅ | ✅ |
Planned resolution (PLAN_PG_BACKCOMPAT.md):
- Minimum viable (PG 16–18): ~1.5 weeks effort.
- Full target (PG 14–18): ~2.5–3 weeks effort.
- pgrx 0.17.0 already supports PG 14–18 via feature flags.
- ~435 lines in `src/dvm/parser.rs` need `#[cfg]` gating (all in JSON/SQL-standard sections). The remaining ~13,500 lines compile unchanged.
Feature degradation matrix:
| Feature | PG 14 | PG 15 | PG 16 | PG 17 | PG 18 |
|---|---|---|---|---|---|
| Core streaming tables | ✅ | ✅ | ✅ | ✅ | ✅ |
| Trigger-based CDC | ✅ | ✅ | ✅ | ✅ | ✅ |
| Differential refresh | ✅ | ✅ | ✅ | ✅ | ✅ |
| SQL/JSON constructors | — | — | ✅ | ✅ | ✅ |
| JSON_TABLE | — | — | — | ✅ | ✅ |
| WAL-based CDC | Needs test | Needs test | Likely | Likely | ✅ |
15. Features Unique to Each System
Features Unique to pg_trickle (42 items, no pg_ivm equivalent)
- IMMEDIATE + deferred modes (pg_ivm is immediate-only; pg_trickle offers both)
- 60+ aggregate functions (vs 5), including algebraic O(1) for COUNT/SUM/AVG/STDDEV/VAR
- FILTER / HAVING / WITHIN GROUP on aggregates
- Window functions (partition recomputation)
- Set operations (UNION ALL, UNION, INTERSECT, EXCEPT — all 6 variants)
- Recursive CTEs (semi-naive, DRed, recomputation; including IMMEDIATE mode with stack-depth warning)
- LATERAL subqueries and SRFs (jsonb_array_elements, unnest, JSON_TABLE)
- Anti-join / semi-join operators (NOT EXISTS, NOT IN, IN, EXISTS with full SQL)
- Scalar subqueries in SELECT list
- Views as sources (auto-inlined with nested expansion)
- Partitioned table support (RANGE, LIST, HASH with auto-rebuild on ATTACH PARTITION)
- Cascading stream tables (ST referencing other STs via DAG)
- Background scheduler (cron + duration + canonical periods) with multi-database auto-discovery
- GROUPING SETS / CUBE / ROLLUP (auto-rewritten)
- DISTINCT ON (auto-rewritten to ROW_NUMBER)
- Hybrid CDC (trigger → WAL transition)
- DDL change detection and automatic reinitialization (including ALTER FUNCTION body changes)
- Monitoring suite (15+ observability functions: `change_buffer_sizes`, `list_sources`, `dependency_tree`, `health_check`, `refresh_timeline`, `trigger_inventory`, `diamond_groups`, `source_gates`, `watermarks`, `watermark_groups`, `watermark_status`, `worker_pool_status`, `parallel_job_status`, `pgt_scc_status`, `slot_health`, `check_cdc_health`)
- Auto-rewrite pipeline (6 transparent SQL rewrites)
- Volatile function detection
- AUTO refresh mode (smart DIFFERENTIAL/FULL selection with transparent fallback)
- ALTER QUERY — change the defining query of an existing stream table online, with schema-change classification and OID-preserving migration
- dbt macro package (materialization, status macro, health test, refresh operation)
- CNPG / Kubernetes deployment
- SQL/JSON constructors (JSON_OBJECT, JSON_ARRAY, etc.)
- JSON_TABLE support (PG 17+)
- TopK stream tables (ORDER BY + LIMIT, including IMMEDIATE mode via micro-refresh)
- Paged TopK (ORDER BY + LIMIT + OFFSET for server-side pagination)
- Diamond dependency consistency (multi-path refresh atomicity with SAVEPOINT)
- Extension upgrade infrastructure (SQL migration scripts, CI completeness check, upgrade E2E tests, per-release SQL baselines)
- Row Level Security (refreshes see all data; RLS policies on ST itself; IMMEDIATE mode secured; internal change buffers shielded from RLS interference) (v0.5.0)
- Source gating (pause/resume CDC for bulk loads: `gate_source`, `ungate_source`) (v0.5.0)
- Append-only fast path (`append_only => true` skips merge for INSERT-only tables) (v0.5.0)
- Parallel refresh (background worker pool with per-database and cluster-wide caps, atomic groups for diamond dependencies) (v0.4.0)
- Statement-level CDC triggers (reduced write-side overhead for bulk operations) (v0.4.0)
- Circular pipeline support (monotone cycles with fixed-point iteration, `max_fixpoint_iterations` safety limit, SCC status monitoring) (v0.7.0)
- Watermark APIs (delay refresh until multi-source data is ready: `advance_watermark`, `create_watermark_group`, tolerance-based readiness) (v0.7.0)
- pg_dump / pg_restore support (safe backup with auto-reconnect of streams) (v0.8.0)
- Algebraic aggregate maintenance (O(1) constant-time updates for COUNT/SUM/AVG/STDDEV/VAR with floating-point drift correction) (v0.9.0)
- Refresh group management (`create_refresh_group`, `drop_refresh_group` for atomic multi-ST refresh) (v0.9.0)
- Automatic backoff (exponential slowdown for overloaded streams) (v0.9.0)
- Index-aware MERGE (use index lookups for tiny change ratios) (v0.9.0)
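As an example of how these fit into a workflow, a bulk load can be bracketed with the gating calls named in the list above — a hedged sketch (the argument form, a table name as text, is assumed; see the v0.5.0 docs for exact signatures):

```sql
SELECT pgtrickle.gate_source('orders');     -- pause CDC so the COPY is not change-tracked row by row
COPY orders FROM '/data/orders.csv' WITH (FORMAT csv);
SELECT pgtrickle.ungate_source('orders');   -- resume; the next refresh catches up
```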
Features Unique to pg_ivm (with planned resolutions)
| # | Feature | Status | Ref |
|---|---|---|---|
| 1 | Immediate (synchronous) maintenance | ✅ Closed — IMMEDIATE refresh mode fully implemented (all phases) | PLAN_TRANSACTIONAL_IVM |
| 2 | Auto-index creation on GROUP BY / DISTINCT / PK | Postponed (Phase 2 of transactional IVM) | PLAN_TRANSACTIONAL_IVM §5.2 |
| 3 | TRUNCATE propagation (auto-truncate IMMV) | ✅ Closed — IMMEDIATE mode fires full refresh on TRUNCATE | PLAN_TRANSACTIONAL_IVM §3.2 |
| 4 | Row Level Security respect | ✅ Closed — v0.5.0: refreshes see all data; RLS on ST itself; IMMEDIATE mode secured; change buffers shielded | ROW_LEVEL_SECURITY.md |
| 5 | PostgreSQL 13–17 support | PG 14–18 backcompat planned (~2.5–3 weeks) | PLAN_PG_BACKCOMPAT |
| 6 | session_preload_libraries | Not applicable (background worker needs shared_preload) | — |
| 7 | Rename via ALTER TABLE | Event trigger support (low effort) | — |
| 8 | Drop via DROP TABLE | Postponed (Phase 2 of transactional IVM) | PLAN_TRANSACTIONAL_IVM §4.3 |
| 9 | Extension upgrade scripts | ✅ Closed — Scripts ship from v0.2.1; CI completeness check and upgrade E2E tests in place | — |
| 10 | pg_dump / pg_restore | ✅ Closed — v0.8.0: safe backup with pg_dump and pg_restore, auto-reconnect streams | — |
Of the 10 items, 5 are now closed (immediate maintenance, TRUNCATE, RLS, upgrade scripts, pg_dump), 3 have concrete implementation plans, and 2 are low-priority or not applicable.
16. Use-Case Fit
| Scenario | Recommended |
|---|---|
| Need views consistent within the same transaction | Either (pg_trickle IMMEDIATE mode or pg_ivm) |
| Application cannot tolerate any view staleness | Either (pg_trickle IMMEDIATE mode or pg_ivm) |
| High write throughput, views can be slightly stale | pg_trickle (DIFFERENTIAL mode) |
| Multi-layer summary pipelines with dependencies | pg_trickle |
| Time-based or cron-driven refresh schedules | pg_trickle |
| Views with complex SQL (window functions, CTEs, UNION) | pg_trickle |
| Simple aggregation with zero-staleness requirement | Either (pg_trickle has richer SQL coverage) |
| Kubernetes / CloudNativePG deployment | pg_trickle |
| dbt integration | pg_trickle |
| Circular / self-referencing pipelines | pg_trickle |
| Multi-source watermark coordination | pg_trickle |
| High-throughput bulk loading (append-only) | pg_trickle (append-only fast path) |
| Row Level Security on analytical summaries | pg_trickle (richer RLS model) |
| pg_dump / pg_restore workflow | pg_trickle |
| PostgreSQL 13–17 | pg_ivm |
| PostgreSQL 18 | pg_trickle (superset of pg_ivm) |
| Production-hardened, stable API | pg_ivm |
| Early adopter, rich SQL coverage needed | pg_trickle |
17. Coexistence
The two extensions can be installed in the same database simultaneously — they
use different schemas (pgivm vs pgtrickle/pgtrickle_changes) and do not
interfere with each other. However, with pg_trickle's IMMEDIATE mode now
available and its dramatically broader feature set (v0.9.0), there is little
reason to use both:
- Use pg_trickle IMMEDIATE for small, critical lookup tables that must be perfectly consistent within transactions (the use-case that previously required pg_ivm).
- Use pg_trickle DIFFERENTIAL/FULL for large analytical summary tables, multi-layer aggregation pipelines, circular pipelines, or views where slight staleness is acceptable.
- Use pg_trickle AUTO (default) to let the system choose the best strategy.
- Use pg_ivm only if you need PostgreSQL 13–17 support or prefer its mature, battle-tested codebase.
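A sketch of the mode choices above (the `refresh_mode` parameter name is assumed from this document's reloptions examples; check the SQL Reference for exact signatures):

```sql
-- Small, critical lookup table: in-transaction consistency
SELECT pgtrickle.create_stream_table(
  name         => 'critical_lookup',
  query        => 'SELECT id, status FROM orders WHERE status = ''active''',
  refresh_mode => 'IMMEDIATE');

-- Large analytical summary: slight staleness acceptable
SELECT pgtrickle.create_stream_table(
  name         => 'daily_rollup',
  query        => 'SELECT region, sum(amount) AS total FROM orders GROUP BY region',
  schedule     => '1m',
  refresh_mode => 'DIFFERENTIAL');
```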
18. Recommendations
Planned work that closes pg_ivm gaps
| Priority | Item | Plan | Effort | Closes Gaps |
|---|---|---|---|---|
| ✅ Done | IMMEDIATE refresh mode (all phases) | PLAN_TRANSACTIONAL_IVM | Complete | #1 (immediate maintenance), #3 (TRUNCATE) |
| ✅ Done | Extension upgrade scripts | v0.2.1 release | Complete | #9 (upgrade scripts) |
| ✅ Done | Row Level Security | v0.5.0 release | Complete | #4 (RLS) |
| ✅ Done | pg_dump / pg_restore | v0.8.0 release | Complete | #10 (backup/restore) |
| Postponed | pg_ivm compatibility layer | PLAN_TRANSACTIONAL_IVM Phase 2 | Deferred to post-1.0 | #2 (auto-indexing), #7 (rename), #8 (DROP TABLE) |
| High | PG 16–18 backcompat (MVP) | PLAN_PG_BACKCOMPAT §11 | ~1.5 weeks | #5 (PG version support) |
| Medium | PG 14–18 backcompat (full) | PLAN_PG_BACKCOMPAT §5 | ~2.5–3 weeks | #5 (PG version support) |
Remaining small gaps (no existing plan)
| Priority | Item | Description | Effort |
|---|---|---|---|
| Low | ALTER TABLE RENAME | Detect rename via event trigger, update catalog | 2–4h |
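The low estimate is credible because PostgreSQL event triggers already surface renames at `ddl_command_end`. A hypothetical sketch (all names here are illustrative, and the catalog-update body is elided to a NOTICE):

```sql
CREATE OR REPLACE FUNCTION pgtrickle_track_rename() RETURNS event_trigger
LANGUAGE plpgsql AS $$
DECLARE
  cmd record;
BEGIN
  FOR cmd IN SELECT * FROM pg_event_trigger_ddl_commands() LOOP
    IF cmd.command_tag = 'ALTER TABLE' THEN
      -- Illustrative: look up cmd.objid in the pg_trickle catalog by OID
      -- (OIDs survive renames) and refresh the stored source-table name.
      RAISE NOTICE 'table % altered (possibly renamed)', cmd.objid::regclass;
    END IF;
  END LOOP;
END $$;

CREATE EVENT TRIGGER pgtrickle_on_rename ON ddl_command_end
  WHEN TAG IN ('ALTER TABLE')
  EXECUTE FUNCTION pgtrickle_track_rename();
```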
Not worth pursuing
| Item | Reason |
|---|---|
| PG 13 support | EOL since November 2025. Incompatible raw_parser() API. |
| session_preload_libraries | Requires background worker, which needs shared_preload_libraries. |
19. Conclusion
pg_trickle covers all of pg_ivm's SQL surface and extends it dramatically with 55+ additional aggregate functions (including algebraic O(1) maintenance for COUNT/SUM/AVG/STDDEV/VAR), window functions, set operations, recursive CTEs, LATERAL support, anti/semi-joins, circular pipeline support, watermark coordination, parallel refresh, Row Level Security, and a comprehensive operational layer.
The immediate maintenance gap is now fully closed: pg_trickle's IMMEDIATE
refresh mode provides the same in-transaction consistency as pg_ivm, while also
supporting window functions, LATERAL, scalar subqueries, WITH RECURSIVE (IM1),
TopK micro-refresh (IM2), and cascading stream tables in IMMEDIATE mode — all
of which pg_ivm cannot do.
The upgrade infrastructure gap is also closed: v0.2.1 ships SQL migration scripts with continuous upgrade path through v0.9.0, a CI completeness checker, and upgrade E2E tests, matching pg_ivm's upgrade path story.
The Row Level Security gap is closed (v0.5.0): refreshes see all data, RLS policies on the stream table itself control access, and IMMEDIATE mode is secured with shielded change buffers.
The pg_dump/restore gap is closed (v0.8.0): safe backup with standard PostgreSQL tools and automatic stream reconnection on restore.
The one remaining structural gap is PG version support:
- PLAN_PG_BACKCOMPAT details backporting to PG 14–18 (or PG 16–18 as MVP) in ~2.5–3 weeks, primarily by `#[cfg]`-gating ~435 lines of JSON/SQL-standard parse-tree code.
Once backcompat is implemented, pg_trickle will be a strict superset of pg_ivm in every dimension: same immediate maintenance model, comparable PG version support (14–18 vs 13–18, with PG 13 EOL), dramatically wider SQL coverage (60+ aggregates vs 5, 21 DVM operators, 42 unique features), and a complete operational layer that pg_ivm entirely lacks.
For users migrating from pg_ivm, the IMMEDIATE refresh mode already provides
the same zero-staleness guarantee. A full compatibility layer (pgivm.create_immv,
pgivm.refresh_immv, pgivm.pg_ivm_immv) is planned for post-1.0 to enable
zero-change migration.
References
- pg_ivm repository: https://github.com/sraoss/pg_ivm
- pg_trickle repository: https://github.com/trickle-labs/pg-trickle
- DBSP differential dataflow paper: https://arxiv.org/abs/2203.16684
- pg_trickle ESSENCE.md: ../../ESSENCE.md
- pg_trickle DVM operators: ../../docs/DVM_OPERATORS.md
- pg_trickle architecture: ../../docs/ARCHITECTURE.md
Research: Custom SQL Syntax Options
This document surveys custom-syntax extensions considered for pg_trickle
(e.g. CREATE STREAM TABLE) and the tradeoffs against the
current function-based API (pgtrickle.create_stream_table()). It is
intended for contributors and language/parser research.
User documentation on SQL functions is in SQL Reference.
Abstract
pg_trickle deliberately chose a function-based API (pgtrickle.create_stream_table()) over custom DDL syntax (CREATE STREAM TABLE …) for three reasons: PostgreSQL's pg_catalog has no extension point for new top-level statement types without patching the core parser; function calls are portable across every client library and ORMs without any driver-level changes; and they compose naturally with PL/pgSQL, transaction blocks, and conditional DDL patterns.
This research surveys the three realistic implementation routes — grammar patches, ProcessUtility hooks, and comment-driven DDL shims — and quantifies the upgrade maintenance burden of each against the stable, zero-patch approach of function calls. The findings strongly favour the current design for an extension targeting production deployments.
The document also catalogues prior art: pg_partman's run_maintenance() model, timescaledb's grammar extension (and its associated patch surface), and pg_ivm's function-only API. For pg_trickle's scale and lifecycle goals, the function-based API remains the correct long-term choice. A lightweight CREATE STREAM TABLE compatibility shim delivered via a PL/pgSQL wrapper is noted as a viable opt-in convenience without any parser modifications.
Custom SQL Syntax for PostgreSQL Extensions
Comprehensive Technical Research Report
Date: 2026-02-25
Context: pg_trickle extension — evaluating approaches to support CREATE STREAM TABLE syntax or equivalent native-feeling DDL.
Table of Contents
- Executive Summary
- PostgreSQL Parser Hooks / Utility Hooks
- The ProcessUtility_hook Approach
- Raw Parser Extension (gram.y)
- The Utility Command Approach
- Custom Access Methods (CREATE ACCESS METHOD)
- Table Access Method API (PostgreSQL 12+)
- Foreign Data Wrapper Approach
- Event Triggers
- TimescaleDB Continuous Aggregates Pattern
- Citus Distributed DDL Pattern
- PostgreSQL 18 New Features
- COMMENT / OPTIONS Abuse Pattern
- pg_ivm (Incremental View Maintenance) Pattern
- CREATE TABLE ... USING (Table Access Methods) Deep Dive
- Comparison Matrix
- Recommendations for pg_trickle
1. Executive Summary
PostgreSQL's parser is not extensible — there is no parser hook that allows extensions to add new grammar rules. This is a fundamental design constraint. Every approach to "custom DDL syntax" in extensions falls into one of two categories:
- Intercept existing syntax — Use `ProcessUtility_hook` or event triggers to intercept standard DDL (e.g., `CREATE TABLE`, `CREATE VIEW`) and augment its behavior.
- Use a SQL function as the DDL interface — Define `SELECT my_extension.create_thing(...)` as the user-facing API (this is what pg_trickle currently does).
No production PostgreSQL extension ships truly new SQL grammar without forking the PostgreSQL parser. TimescaleDB, Citus, pg_ivm, and others all work within existing syntax boundaries.
2. PostgreSQL Parser Hooks / Utility Hooks
Available Hook Points
PostgreSQL provides several hook function pointers that extensions can override in _PG_init():
| Hook | Header | Purpose |
|---|---|---|
| `ProcessUtility_hook` | tcop/utility.h | Intercept utility (DDL) statement execution |
| `post_parse_analyze_hook` | parser/analyze.h | Inspect/modify the analyzed parse tree after semantic analysis |
| `planner_hook` | optimizer/planner.h | Replace or augment the query planner |
| `ExecutorStart_hook` | executor/executor.h | Intercept executor startup |
| `ExecutorRun_hook` | executor/executor.h | Intercept executor row processing |
| `ExecutorFinish_hook` | executor/executor.h | Intercept executor finish |
| `ExecutorEnd_hook` | executor/executor.h | Intercept executor cleanup |
| `object_access_hook` | catalog/objectaccess.h | Notifications when objects are created/modified/dropped |
| `emit_log_hook` | utils/elog.h | Intercept log messages |
What's Missing: No Parser Hook
There is no `parser_hook` or `raw_parser_hook`. The raw parser (the `scan.l` Flex lexer feeding the `gram.y` Bison grammar) is compiled into the PostgreSQL server binary. Extensions cannot:
- Add new keywords (e.g., `STREAM`)
- Add new grammar productions (e.g., `CREATE STREAM TABLE`)
- Modify the tokenizer/lexer
- Intercept raw SQL text before parsing
The closest hook is post_parse_analyze_hook, which fires after the SQL has already been parsed and analyzed. By this point:
- The SQL string has already been tokenized and parsed by gram.y
- A parse tree (`Query` node) has been produced
- If the SQL contains unknown syntax, a syntax error has already been raised
Technical Details of post_parse_analyze_hook
/* In src/backend/parser/analyze.c */
typedef void (*post_parse_analyze_hook_type)(ParseState *pstate,
Query *query,
JumbleState *jstate);
post_parse_analyze_hook_type post_parse_analyze_hook = NULL;
Extensions can set this in _PG_init():
static post_parse_analyze_hook_type prev_post_parse_analyze_hook = NULL;
void _PG_init(void) {
prev_post_parse_analyze_hook = post_parse_analyze_hook;
post_parse_analyze_hook = my_post_parse_analyze;
/* my_post_parse_analyze must itself call prev_post_parse_analyze_hook
   (if non-NULL) so that other extensions keep working. */
}
Use cases: Query rewriting after parsing (e.g., adding security predicates, row-level security), statistics collection, plan caching invalidation. Not usable for new syntax because parsing has already completed.
Pros/Cons
| Aspect | Assessment |
|---|---|
| Native syntax | Impossible — cannot add new grammar |
| Intercept existing DDL | Yes via ProcessUtility_hook |
| Modify parsed queries | Yes via post_parse_analyze_hook |
| Complexity | Low for hooking, but limited in capability |
| PG version | All modern versions (hooks stable since PG 9.x) |
| Maintenance | Very low — hook signatures rarely change |
3. The ProcessUtility_hook Approach
How It Works
ProcessUtility_hook is the most powerful DDL interception point. It fires for every "utility statement" (DDL, COPY, EXPLAIN, etc.) after parsing but before execution.
typedef void (*ProcessUtility_hook_type)(PlannedStmt *pstmt,
const char *queryString,
bool readOnlyTree,
ProcessUtilityContext context,
ParamListInfo params,
QueryEnvironment *queryEnv,
DestReceiver *dest,
QueryCompletion *qc);
An extension can:
- Inspect the parse tree node — The
PlannedStmt->utilityStmtfield contains the parsed DDL node (e.g.,CreateStmt,AlterTableStmt,ViewStmt). - Modify the parse tree — Change fields before passing to the standard handler.
- Replace execution entirely — Skip calling the standard handler and do something else.
- Post-process — Call the standard handler first, then do additional work.
- Block execution — Raise an error to prevent the DDL.
What Extensions Use This
| Extension | What they intercept | Purpose |
|---|---|---|
| TimescaleDB | CREATE TABLE, ALTER TABLE, DROP TABLE, CREATE INDEX, etc. | Convert regular tables to hypertables, distribute DDL |
| Citus | Most DDL statements | Propagate DDL to worker nodes |
| pg_partman | CREATE TABLE, partition DDL | Auto-manage partitioning |
| pg_stat_statements | All utility statements | Track DDL execution statistics |
| pgAudit | All utility statements | Audit logging |
| pg_hint_plan | — | Uses post_parse_analyze_hook instead |
| sepgsql | Object creation/modification | Security label enforcement |
Can It Handle New Syntax?
No. It can only intercept DDL that PostgreSQL's parser already understands. You cannot use ProcessUtility_hook to handle CREATE STREAM TABLE because the parser will reject that syntax before the hook is ever called.
However, it can intercept and augment existing syntax:
- `CREATE TABLE ... (some_option)` → Intercept `CreateStmt`, check for special markers, do extra work
- `CREATE VIEW ... WITH (custom_option = true)` → Intercept `ViewStmt`, check `reloptions`
- `CREATE MATERIALIZED VIEW ... WITH (custom = true)` → Same approach
Pattern: Intercepting CREATE TABLE
static void my_process_utility(PlannedStmt *pstmt, ...) {
Node *parsetree = pstmt->utilityStmt;
if (IsA(parsetree, CreateStmt)) {
CreateStmt *stmt = (CreateStmt *) parsetree;
// Check for a special reloption or table name pattern
ListCell *lc;
foreach(lc, stmt->options) {
DefElem *opt = (DefElem *) lfirst(lc);
if (strcmp(opt->defname, "stream") == 0) {
// This is a stream table! Do custom logic.
create_stream_table_from_ddl(stmt, queryString);
return; // Don't call standard handler
}
}
}
// Pass through to standard handler
if (prev_ProcessUtility)
prev_ProcessUtility(pstmt, ...);
else
standard_ProcessUtility(pstmt, ...);
}
Pros/Cons
| Aspect | Assessment |
|---|---|
| Native `CREATE STREAM TABLE` | No — parser rejects unknown syntax |
| `CREATE TABLE ... WITH (stream=true)` | Yes — feasible via reloptions |
| Complexity | Medium — must carefully chain with other extensions |
| PG version | All modern versions |
| Maintenance | Low — hook signature changes rarely (changed in PG14, PG15) |
| Risk | Must always chain prev_ProcessUtility — misbehaving can break other extensions |
4. Raw Parser Extension (gram.y)
How It Works
PostgreSQL's SQL parser is a Bison-generated LALR(1) parser defined in:
- `src/backend/parser/gram.y` — Grammar rules (~18,000 lines)
- `src/backend/parser/scan.l` — Flex lexer (tokenizer)
- `src/include/parser/kwlist.h` — Reserved/unreserved keyword list
To add CREATE STREAM TABLE, you would:
- Add `STREAM` to the keyword list (unreserved or reserved)
- Add grammar rules to `gram.y`:

CreateStreamTableStmt:
    CREATE STREAM TABLE qualified_name '(' OptTableElementList ')' OptWith AS SelectStmt
        {
            CreateStreamTableStmt *n = makeNode(CreateStreamTableStmt);
            n->relation = $4;
            n->query = $9;
            /* ... */
            $$ = (Node *) n;
        }
    ;

- Add a new `NodeTag` for `CreateStreamTableStmt`
- Handle it in `ProcessUtility`
- Rebuild the PostgreSQL server
Implications
This requires forking PostgreSQL. The modified parser is compiled into postgres binary. You cannot ship a grammar modification as a loadable extension (.so/.dylib).
Who Does This?
- YugabyteDB — Fork of PG with custom grammar for distributed features
- CockroachDB — Entirely custom parser (Go, not PG's Bison grammar)
- Amazon Aurora (partially) — Custom grammar additions for Aurora-specific features
- Greenplum — Fork of PG with added grammar for `DISTRIBUTED BY`, `PARTITION BY`, etc.
- ParadeDB — Fork of PG with some custom syntax additions
Pros/Cons
| Aspect | Assessment |
|---|---|
| Native `CREATE STREAM TABLE` | Yes — full parser-level support |
| Complexity | Very high — must maintain a PG fork |
| PG version | Tied to a single PG version |
| Maintenance | Extremely high — must rebase on every PG release (gram.y changes significantly between major versions) |
| Distribution | Cannot use CREATE EXTENSION; must ship entire modified PostgreSQL |
| User adoption | Very low — users must replace their PostgreSQL installation |
| psql autocomplete | Would work with matching psql modifications |
| pg_dump/pg_restore | Broken unless you also modify those tools |
Verdict: Not viable for an extension. Only viable for a PostgreSQL fork/distribution.
5. The Utility Command Approach
How It Works
Some sources reference a "custom utility command" mechanism. In practice, this does not exist as a formal PostgreSQL extension point. What people sometimes mean is one of:
5a. Using DO Blocks as Custom Commands
DO $$ BEGIN PERFORM pgtrickle.create_stream_table('my_st', 'SELECT ...'); END $$;
This is just a wrapped function call — not a real custom command.
5b. Abusing COMMENT or SET for Command Dispatch
Some extensions parse custom commands from strings:
-- Using SET to pass commands
SET myext.command = 'CREATE STREAM TABLE my_st AS SELECT ...';
SELECT myext.execute_pending_command();
Or using post_parse_analyze_hook to intercept a specially-formatted query:
-- Extension intercepts this via post_parse_analyze_hook
SELECT * FROM myext.dispatch('CREATE STREAM TABLE ...');
5c. Overloading Existing Syntax
Some extensions overload SELECT or CALL:
CALL pgtrickle.create_stream_table('my_st', $$SELECT ...$$);
CALL was introduced in PostgreSQL 11 for stored procedures. Using it makes the DDL feel more "command-like" than SELECT function().
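Building on that, the `CREATE STREAM TABLE` compatibility shim mentioned in the abstract can be approximated today as a PL/pgSQL procedure — a hypothetical opt-in wrapper, not something the extension ships:

```sql
CREATE OR REPLACE PROCEDURE create_stream_table(
  st_name text, st_query text, st_schedule text DEFAULT '1m')
LANGUAGE plpgsql AS $$
BEGIN
  -- Delegate to the real API; only this wrapper procedure is hypothetical.
  PERFORM pgtrickle.create_stream_table(
    name => st_name, query => st_query, schedule => st_schedule);
END $$;

-- Usage feels more command-like than a SELECT:
CALL create_stream_table('order_totals',
  'SELECT region, sum(amount) AS total FROM orders GROUP BY region', '30s');
```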
Pros/Cons
| Aspect | Assessment |
|---|---|
| Native syntax | No — still a function call in disguise |
| User experience | Moderate — CALL is better than SELECT |
| Complexity | Low |
| PG version | PG11+ for CALL |
| Maintenance | Very low |
6. Custom Access Methods (CREATE ACCESS METHOD)
How It Works
PostgreSQL supports extension-defined access methods (index AMs and table AMs):
CREATE ACCESS METHOD my_am TYPE TABLE HANDLER my_am_handler;
This was introduced in PostgreSQL 9.6 for index AMs and extended to table AMs in PostgreSQL 12. The CREATE ACCESS METHOD statement shows PostgreSQL's philosophy: extensions can define new implementations of existing concepts (tables, indexes) but not new concepts (stream tables).
Table AM vs. Index AM
| Type | Since | Handler Signature | Example |
|---|---|---|---|
| Index AM | PG 9.6 | IndexAmRoutine with scan/insert/delete callbacks | bloom, brin, GiST |
| Table AM | PG 12 | TableAmRoutine with 60+ callbacks | heap (default), columnar (Citus), zedstore (experimental) |
Can We Use This for Stream Tables?
The table AM API defines how tuples are stored and retrieved, not how tables are created or maintained. A stream table's key features are:
- Defining query — Not part of the table AM concept
- Automatic refresh — Not part of the table AM concept
- Change tracking — Could partially overlap with table AM's tuple modification callbacks
- Storage — The actual storage could use heap (default) AM
You could theoretically create a custom table AM that:
- Uses heap storage underneath
- Intercepts INSERT/UPDATE/DELETE to maintain change buffers
- Adds custom metadata
But this would be an extreme abuse of the API. Table AMs are meant for storage engines, not for implementing materialized view semantics.
Pros/Cons
| Aspect | Assessment |
|---|---|
| Native syntax | No — CREATE TABLE ... USING my_am is the closest |
| Complexity | Extremely high — 60+ callbacks to implement |
| Fitness | Poor — table AM is about storage, not view maintenance |
| PG version | PG 12+ |
| Maintenance | High — AM API evolves between major versions |
7. Table Access Method API (PostgreSQL 12+)
Deep Technical Details
The Table Access Method (AM) API was introduced in PostgreSQL 12 via commit c2fe139c20 by Andres Freund. It abstracts the storage layer, allowing extensions to replace the default heap storage with custom implementations.
The CREATE TABLE ... USING Syntax
-- Use default AM (heap)
CREATE TABLE normal_table (id int, data text);
-- Use custom AM
CREATE TABLE my_table (id int, data text) USING my_custom_am;
-- Set default for a database
SET default_table_access_method = 'my_custom_am';
TableAmRoutine Structure
The handler function must return a TableAmRoutine struct with callbacks:
typedef struct TableAmRoutine {
NodeTag type;
/* Slot callbacks */
const TupleTableSlotOps *(*slot_callbacks)(Relation rel);
/* Scan callbacks */
TableScanDesc (*scan_begin)(Relation rel, Snapshot snap, int nkeys, ...);
void (*scan_end)(TableScanDesc scan);
void (*scan_rescan)(TableScanDesc scan, ...);
bool (*scan_getnextslot)(TableScanDesc scan, ...);
/* Parallel scan */
Size (*parallelscan_estimate)(Relation rel);
Size (*parallelscan_initialize)(Relation rel, ...);
void (*parallelscan_reinitialize)(Relation rel, ...);
/* Index fetch */
IndexFetchTableData *(*index_fetch_begin)(Relation rel);
void (*index_fetch_reset)(IndexFetchTableData *data);
void (*index_fetch_end)(IndexFetchTableData *data);
bool (*index_fetch_tuple)(IndexFetchTableData *data, ...);
/* Tuple modification */
void (*tuple_insert)(Relation rel, TupleTableSlot *slot, ...);
void (*tuple_insert_speculative)(Relation rel, ...);
void (*tuple_complete_speculative)(Relation rel, ...);
void (*multi_insert)(Relation rel, TupleTableSlot **slots, int nslots, ...);
TM_Result (*tuple_delete)(Relation rel, ItemPointer tid, ...);
TM_Result (*tuple_update)(Relation rel, ItemPointer otid, ...);
TM_Result (*tuple_lock)(Relation rel, ItemPointer tid, ...);
/* DDL callbacks */
void (*relation_set_new_filelocator)(Relation rel, ...);
void (*relation_nontransactional_truncate)(Relation rel);
void (*relation_copy_data)(Relation rel, const RelFileLocator *newrlocator);
void (*relation_copy_for_cluster)(Relation rel, ...);
void (*relation_vacuum)(Relation rel, VacuumParams *params, ...);
bool (*scan_analyze_next_block)(TableScanDesc scan, ...);
bool (*scan_analyze_next_tuple)(TableScanDesc scan, ...);
/* Planner support */
void (*relation_estimate_size)(Relation rel, int32 *attr_widths, ...);
/* ... more callbacks */
} TableAmRoutine;
Hybrid Approach: Table AM + ProcessUtility_hook
A more practical pattern:
- Register a custom table AM (e.g., `stream_am`) that wraps heap
- Use `ProcessUtility_hook` to intercept `CREATE TABLE ... USING stream_am`
- When detected, perform stream table registration (catalog, CDC, etc.)
- The actual storage uses standard heap via delegation
-- User writes:
CREATE TABLE order_totals (region text, total numeric)
USING stream_am
WITH (query = 'SELECT region, SUM(amount) FROM orders GROUP BY region',
schedule = '1m',
refresh_mode = 'DIFFERENTIAL');
Problems with This Approach
- Column list is mandatory — `CREATE TABLE ... USING` requires explicit column definitions. Stream tables should derive columns from the query.
- Query in WITH clause — Storing a full SQL query in `reloptions` is hacky and has length limits.
- AS SELECT loses the query — `CREATE TABLE ... USING am AS SELECT` is accepted by the grammar (PG 12+), but unlike a materialized view the query text is not stored anywhere after creation, so the extension would still need a hook to capture it.
- VACUUM, ANALYZE complexity — Must implement or delegate all maintenance callbacks.
- pg_dump compatibility — pg_dump would dump `CREATE TABLE ... USING stream_am` but not the associated metadata (query, schedule, etc.)
Pros/Cons
| Aspect | Assessment |
|---|---|
| Native syntax | Partial — CREATE TABLE ... USING stream_am |
| Feels like a stream table | No — still looks like a regular table with options |
| Complexity | Very high |
| pg_dump | Broken — metadata in catalog tables won't be dumped |
| PG version | PG 12+ |
| Maintenance | High — table AM API changes between versions |
8. Foreign Data Wrapper Approach
How It Works
Foreign Data Wrappers (FDW) allow PostgreSQL to access external data sources via CREATE FOREIGN TABLE. An extension can register a custom FDW:
CREATE EXTENSION pg_trickle;
CREATE SERVER stream_server FOREIGN DATA WRAPPER pgtrickle_fdw;
CREATE FOREIGN TABLE order_totals (region text, total numeric)
SERVER stream_server
OPTIONS (
query 'SELECT region, SUM(amount) FROM orders GROUP BY region',
schedule '1m',
refresh_mode 'DIFFERENTIAL'
);
FDW API
The FDW API provides callbacks for:
- `GetForeignRelSize` — Estimate relation size for planning
- `GetForeignPaths` — Generate access paths
- `GetForeignPlan` — Create a plan node
- `BeginForeignScan` — Start scan
- `IterateForeignScan` — Get next tuple
- `EndForeignScan` — End scan
- `AddForeignUpdateTargets` / `PlanForeignModify` — Support INSERT/UPDATE/DELETE (optional)
How It Could Work for Stream Tables
- Define a custom FDW (`pgtrickle_fdw`)
- The FDW's scan callbacks read from the underlying storage table
- `ProcessUtility_hook` intercepts `CREATE FOREIGN TABLE ... SERVER stream_server` to set up CDC, catalog entries, etc.
- A background worker handles refresh scheduling
Problems
- Foreign tables have restrictions — Cannot have indexes, constraints, triggers, or participate in inheritance. This severely limits usability.
- Query planner limitations — Foreign tables use a separate planning path with potentially worse plan quality.
- No MVCC — Foreign tables typically don't provide snapshot isolation semantics.
- User model confusion — "Foreign table" implies external data, not a derived view.
- EXPLAIN output — Shows "Foreign Scan" instead of "Seq Scan", confusing users.
- pg_dump — Foreign tables are dumped, but server/FDW setup may not transfer correctly.
- Two-step creation — Requires `CREATE SERVER` before `CREATE FOREIGN TABLE`.
Pros/Cons
| Aspect | Assessment |
|---|---|
| Native syntax | Partial — CREATE FOREIGN TABLE with options |
| Feels like a stream table | No — foreign tables have different semantics |
| Index support | No — major limitation |
| Trigger support | No — major limitation |
| Complexity | Medium |
| PG version | PG 9.1+ |
| Maintenance | Low — FDW API is very stable |
Verdict: Not suitable. The restrictions on foreign tables (no indexes, no triggers) make this impractical for stream tables that need to behave like regular tables.
9. Event Triggers
How It Works
Event triggers fire on DDL events at the database level:
CREATE EVENT TRIGGER my_trigger ON ddl_command_end
WHEN TAG IN ('CREATE TABLE', 'ALTER TABLE', 'DROP TABLE')
EXECUTE FUNCTION my_handler();
Available events:
- `ddl_command_start` — Before DDL execution (PG 9.3+)
- `ddl_command_end` — After DDL execution (PG 9.3+)
- `sql_drop` — When objects are dropped (PG 9.3+)
- `table_rewrite` — When a table is rewritten (PG 9.5+)
Inside the Handler
CREATE FUNCTION my_handler() RETURNS event_trigger AS $$
DECLARE
obj record;
BEGIN
FOR obj IN SELECT * FROM pg_event_trigger_ddl_commands()
LOOP
-- obj.objid, obj.object_type, obj.command_tag, etc.
IF obj.command_tag = 'CREATE TABLE' AND obj.object_type = 'table' THEN
-- Check if this table has a special marker
-- (e.g., a specific reloption or comment)
END IF;
END LOOP;
END;
$$ LANGUAGE plpgsql;
Pattern: CREATE TABLE + Event Trigger
- User creates a table with a special comment or option:
  `CREATE TABLE order_totals (region text, total numeric);`
  `COMMENT ON TABLE order_totals IS 'pgtrickle:query=SELECT region...;schedule=1m';`
- Event trigger on `ddl_command_end` fires
- Handler parses the comment, detects stream table intent
- Handler registers the stream table in the catalog
Limitations
- Cannot modify the DDL — Event triggers observe DDL; they can't change what happened. On `ddl_command_end`, the table already exists.
- Cannot redirect DDL — On `ddl_command_start`, you can raise an error to block a command, but you can't rewrite it into something else.
- Two-step process — User must `CREATE TABLE` and then mark it somehow (comment, option, separate function call).
- No custom syntax — Event triggers watch existing DDL commands.
- pg_trickle already uses this — For DDL tracking on upstream tables (see `hooks.rs`).
Pros/Cons
| Aspect | Assessment |
|---|---|
| Native syntax | No — watches existing DDL only |
| Complexity | Low |
| Can transform DDL | No — observe only |
| PG version | PG 9.3+ |
| Maintenance | Very low |
| pg_trickle usage | Already used for upstream DDL tracking |
10. TimescaleDB Continuous Aggregates Pattern
How It Works
TimescaleDB continuous aggregates (caggs) demonstrate the most sophisticated approach to custom DDL-like syntax in a PostgreSQL extension. Their evolution is instructive.
Phase 1: Pure Function API (early versions)
-- Create a view, then register it
CREATE VIEW daily_temps AS
SELECT time_bucket('1 day', time) AS day, AVG(temp)
FROM conditions GROUP BY 1;
SELECT add_continuous_aggregate_policy('daily_temps', ...);
Phase 2: CREATE MATERIALIZED VIEW WITH (introduced in TimescaleDB 2.0)
CREATE MATERIALIZED VIEW daily_temps
WITH (timescaledb.continuous) AS
SELECT time_bucket('1 day', time) AS day, device_id, AVG(temp)
FROM conditions
GROUP BY 1, 2;
How the Hook Chain Works
TimescaleDB's approach uses layered hooks:
- `ProcessUtility_hook` intercepts `CREATE MATERIALIZED VIEW`
- Checks `reloptions` for `timescaledb.continuous` in the `WithClause`
- If found:
  - Does NOT call standard ProcessUtility for the matview
  - Instead creates a regular hypertable (the materialization)
  - Creates an internal view (the user-facing query interface)
  - Registers refresh policies in the catalog
  - Sets up continuous aggregate metadata
- For `REFRESH MATERIALIZED VIEW`, intercepts and routes to their refresh engine
- For `DROP MATERIALIZED VIEW`, intercepts and cleans up all artifacts
The Magic: Reloptions as Extension Point
PostgreSQL's CREATE MATERIALIZED VIEW ... WITH (option = value) passes options as DefElem nodes in the parse tree. The parser treats these as generic key-value pairs — it does NOT validate the option names. This is the key insight: PostgreSQL's parser accepts arbitrary options in WITH clauses.
// In ProcessUtility_hook:
if (IsA(parsetree, CreateTableAsStmt)) {
    CreateTableAsStmt *stmt = (CreateTableAsStmt *) parsetree;
    if (stmt->objtype == OBJECT_MATVIEW) {
        // Check for our custom option in stmt->into->options
        bool is_continuous = false;
        ListCell *lc;
        foreach(lc, stmt->into->options) {
            DefElem *opt = (DefElem *) lfirst(lc);
            // "timescaledb.continuous" parses as defnamespace + defname
            if (opt->defnamespace &&
                strcmp(opt->defnamespace, "timescaledb") == 0 &&
                strcmp(opt->defname, "continuous") == 0) {
                is_continuous = true;
                break;
            }
        }
        if (is_continuous) {
            // Handle as continuous aggregate
            return;
        }
    }
}
Refresh Policies
-- Add a refresh policy (function call, not DDL)
SELECT add_continuous_aggregate_policy('daily_temps',
start_offset => INTERVAL '1 month',
end_offset => INTERVAL '1 day',
schedule_interval => INTERVAL '1 hour');
What pg_trickle Could Learn
The TimescaleDB pattern for pg_trickle would look like:
-- Option A: CREATE MATERIALIZED VIEW with custom option
CREATE MATERIALIZED VIEW order_totals
WITH (pgtrickle.stream = true, pgtrickle.schedule = '1m', pgtrickle.mode = 'DIFFERENTIAL')
AS SELECT region, SUM(amount) FROM orders GROUP BY region;
-- Option B: CREATE TABLE with custom option (less natural)
CREATE TABLE order_totals (region text, total numeric)
WITH (pgtrickle.stream = true);
-- Then separately: SELECT pgtrickle.set_query('order_totals', 'SELECT ...');
Pros/Cons
| Aspect | Assessment |
|---|---|
| Native syntax | Good — CREATE MATERIALIZED VIEW ... WITH (pgtrickle.stream) looks natural |
| User experience | Very good — familiar DDL syntax with extension options |
| Complexity | High — must implement full ProcessUtility_hook chain |
| pg_dump | Partial — matview DDL is dumped, but custom metadata needs pg_dump extension or config tables |
| PG version | PG 9.3+ (matviews), PG 12+ (better option handling) |
| Maintenance | Medium — must track changes to matview creation internals |
| Shared preload | Required — ProcessUtility_hook needs shared_preload_libraries |
11. Citus Distributed DDL Pattern
How It Works
Citus (now part of Microsoft) demonstrates another approach to extending DDL behavior:
ProcessUtility_hook Chain
Citus has one of the most comprehensive ProcessUtility_hook implementations:
void multi_ProcessUtility(PlannedStmt *pstmt, ...) {
// 1. Classify the DDL
Node *parsetree = pstmt->utilityStmt;
// 2. Check if it affects distributed tables
if (IsA(parsetree, AlterTableStmt)) {
// Propagate ALTER TABLE to all worker nodes
PropagateAlterTable((AlterTableStmt *)parsetree, queryString);
}
// 3. Call standard handler (or skip for intercepted commands)
if (prev_ProcessUtility)
prev_ProcessUtility(pstmt, ...);
else
standard_ProcessUtility(pstmt, ...);
// 4. Post-processing
if (IsA(parsetree, CreateStmt)) {
// Check if we should auto-distribute this table
}
}
Table Distribution via Function Calls
Citus does NOT add custom DDL syntax. Distribution is done via function calls:
-- Create a regular table
CREATE TABLE events (id bigint, data jsonb, created_at timestamptz);
-- Distribute it (function call, not DDL)
SELECT create_distributed_table('events', 'id');
-- Or create a reference table
SELECT create_reference_table('lookups');
Columnar Storage via Table AM
Citus also provides columnar storage as a table AM:
CREATE TABLE analytics_data (...)
USING columnar;
This uses the table AM API (PostgreSQL 12+) — see Section 7.
What Citus Teaches Us
- Function calls for complex operations — `create_distributed_table()` is analogous to `pgtrickle.create_stream_table()`.
- ProcessUtility_hook for DDL propagation — Intercept standard DDL and add behavior.
- Table AM for storage — Separate concern from distribution logic.
- No custom syntax — Even with Microsoft's resources, Citus doesn't fork the parser.
Pros/Cons
| Aspect | Assessment |
|---|---|
| Native syntax | No — uses function calls like pg_trickle |
| Approach validated | Yes — Citus is used at massive scale with this pattern |
| Complexity | Medium (function API) to High (ProcessUtility_hook) |
| User adoption | Proven successful |
| Maintenance | Low for function API |
12. PostgreSQL 18 New Features
Relevant Extension Points in PG 18
PostgreSQL 18 (released 2025) includes several features relevant to this analysis:
12a. Virtual Generated Columns
PG 18 adds GENERATED ALWAYS AS (expr) VIRTUAL columns. Not directly relevant to stream tables, but shows PostgreSQL's willingness to expand CREATE TABLE syntax incrementally.
12b. Improved Table AM API
PG 18 refines the table AM API with better TOAST handling and improved parallel scan support. This makes custom table AMs slightly more practical.
12c. Enhanced Event Trigger Information
PG 18 expands pg_event_trigger_ddl_commands() with additional metadata fields, making event-trigger-based approaches more capable.
12d. pg_stat_io Improvements
Enhanced I/O statistics infrastructure that could benefit monitoring of stream table refresh operations.
12e. No New Parser Extension Points
PostgreSQL 18 does not add any parser extension mechanism. The parser remains monolithic and non-extensible. There have been occasional discussions on pgsql-hackers about parser hooks, but no concrete proposals have been accepted.
12f. No Custom DDL Extension Points
No new general-purpose DDL extension points beyond the existing hook system.
Looking Forward: Discussion on pgsql-hackers
There have been recurring threads on pgsql-hackers about:
- Extension-defined SQL syntax — Rejected due to complexity and parser architecture
- Loadable parser modules — Theoretical discussions, no implementation
- Extension catalogs — Some interest in allowing extensions to register custom catalogs
None of these are implemented in PG 18.
Pros/Cons
| Aspect | Assessment |
|---|---|
| New syntax extension points | None in PG 18 |
| Table AM improvements | Minor — slightly easier to implement |
| Event trigger improvements | Minor — more metadata available |
| Parser extensibility | Not planned for any upcoming PG release |
13. COMMENT / OPTIONS Abuse Pattern
How It Works
Several extensions use table comments or reloptions as a "poor man's metadata" to tag tables with custom semantics.
Pattern 1: COMMENT-based
CREATE TABLE order_totals (region text, total numeric);
COMMENT ON TABLE order_totals IS '@pgtrickle {"query": "SELECT ...", "schedule": "1m"}';
An event trigger or background worker scans pg_description for tables with the @pgtrickle prefix and processes them.
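The scanning side of Pattern 1 is easy to model. Below is a minimal sketch (Python, with a hypothetical `parse_stream_comment` helper; the `@pgtrickle ` prefix and JSON payload follow the example above, not any real pg_trickle format):

```python
import json

PREFIX = "@pgtrickle "

def parse_stream_comment(comment):
    """Parse an '@pgtrickle {json}' table comment into a metadata dict.

    Returns None for comments that don't carry stream-table metadata,
    mirroring how a scanner would skip ordinary table comments.
    """
    if comment is None or not comment.startswith(PREFIX):
        return None
    try:
        meta = json.loads(comment[len(PREFIX):])
    except json.JSONDecodeError:
        return None  # malformed metadata is skipped, not fatal
    # Require the one field a stream table registration cannot live without
    if "query" not in meta:
        return None
    return meta

meta = parse_stream_comment('@pgtrickle {"query": "SELECT 1", "schedule": "1m"}')
print(meta["schedule"])  # -> 1m
```

The fragility the Pros/Cons table calls out is visible here: any edit to the comment that breaks the JSON silently unregisters the table.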
Pattern 2: Reloptions-based
CREATE TABLE order_totals (region text, total numeric)
WITH (fillfactor = 70, pgtrickle.stream = true);
Problem: PostgreSQL validates reloptions against a known list. You cannot add arbitrary options to WITH (...) without registering them. Extensions can register custom reloptions via the add_reloption_kind() / add_*_reloption() family, but this is a relatively obscure API.
Pattern 3: GUC-based Tagging
-- Set a GUC that our ProcessUtility_hook reads
SET pgtrickle.next_create_is_stream = true;
SET pgtrickle.stream_query = 'SELECT region, SUM(amount) FROM orders GROUP BY region';
-- Hook intercepts this CREATE TABLE and registers it
CREATE TABLE order_totals (region text, total numeric);
-- Reset
RESET pgtrickle.next_create_is_stream;
This is extremely hacky but has been used in practice (some partitioning extensions used similar patterns before native partitioning).
Who Uses This?
- pgmemcache — Uses comments to configure caching behavior
- Some row-level security extensions — Comments to define policies
- pg_partman — Uses a configuration table (not comments) but similar concept
Pros/Cons
| Aspect | Assessment |
|---|---|
| Native syntax | No — abuses existing mechanisms |
| User experience | Poor — fragile, easy to break by editing comments |
| Complexity | Low |
| pg_dump | COMMENT is dumped — metadata survives pg_dump/restore |
| Robustness | Low — comments can be accidentally changed |
| PG version | All versions |
14. pg_ivm (Incremental View Maintenance) Pattern
How It Works
pg_ivm is the most directly comparable extension to pg_trickle. It implements incremental view maintenance for PostgreSQL.
API Design
pg_ivm uses a pure function-call API:
-- Create an incrementally maintainable materialized view
SELECT create_immv('order_totals', 'SELECT region, SUM(amount) FROM orders GROUP BY region');
-- Refresh
SELECT refresh_immv('order_totals');
-- Drop
DROP TABLE order_totals; -- Just drop the underlying table
Key function: create_immv(name, query) — Creates an "Incrementally Maintainable Materialized View" (IMMV).
Internal Implementation
- `create_immv()` is a SQL function (not a hook)
- It parses the query, creates a storage table, sets up triggers on source tables
- IMMVs are stored as regular tables with metadata in a custom catalog (`pg_ivm_immv`)
- Triggers on source tables automatically update the IMMV on DML
No ProcessUtility_hook
pg_ivm does not use ProcessUtility_hook. It operates entirely through:
- SQL functions (`create_immv`, `refresh_immv`)
- Row-level triggers for automatic maintenance
- A custom catalog table for metadata
Why No Custom Syntax?
pg_ivm was developed as a proof-of-concept for PostgreSQL core IVM support. The authors explicitly chose function-call syntax to:
- Avoid the `shared_preload_libraries` requirement (hooks need it)
- Keep the extension simple and portable
- Focus on the IVM algorithm, not the user interface
Eventually Merged to Core?
There was discussion about upstreaming IVM to PostgreSQL core. If merged, it would get proper syntax (CREATE INCREMENTAL MATERIALIZED VIEW). As an extension, it stays with function calls.
Relevance to pg_trickle
pg_trickle's current API (pgtrickle.create_stream_table()) follows the exact same pattern as pg_ivm. This is the established approach for IVM extensions.
Pros/Cons
| Aspect | Assessment |
|---|---|
| Native syntax | No — function calls |
| Complexity | Low — simple function API |
| shared_preload_libraries | Not required for basic function API |
| pg_dump | No — function calls are not dumped; must use custom dump/restore |
| User experience | Moderate — familiar to pg_ivm users |
| Community acceptance | Established pattern for IVM extensions |
15. CREATE TABLE ... USING (Table Access Methods) Deep Dive
Full Syntax
CREATE TABLE tablename (
column1 datatype,
column2 datatype,
...
) USING access_method_name
WITH (storage_parameter = value, ...);
How the Parser Handles USING
In gram.y:
CreateStmt: CREATE OptTemp TABLE ...
OptTableAccessMethod OptWith ...
OptTableAccessMethod:
USING name { $$ = $2; }
| /* empty */ { $$ = NULL; }
;
The USING clause sets CreateStmt->accessMethod to the access method name string.
How ProcessUtility Handles It
In DefineRelation() (src/backend/commands/tablecmds.c):
- If `accessMethod` is specified, look it up in `pg_am`
- Verify it's a table AM (not an index AM)
- Store the AM OID in `pg_class.relam`
- Use the AM's callbacks for all subsequent operations
Custom Reloptions with Table AMs
Table AMs can define custom reloptions via:
static relopt_parse_elt stream_relopt_tab[] = {
{"query", RELOPT_TYPE_STRING, offsetof(StreamOptions, query)},
{"schedule", RELOPT_TYPE_STRING, offsetof(StreamOptions, schedule)},
{"refresh_mode", RELOPT_TYPE_STRING, offsetof(StreamOptions, refresh_mode)},
};
This would allow:
CREATE TABLE order_totals (region text, total numeric)
USING stream_heap
WITH (query = 'SELECT ...', schedule = '1m', refresh_mode = 'DIFFERENTIAL');
Problems Specific to Stream Tables
- Column derivation — Stream tables derive columns from the query. `CREATE TABLE ... USING` requires explicit column definitions, creating redundancy and potential inconsistency.
- AS SELECT loses the query — `CREATE TABLE order_totals USING stream_heap AS SELECT ...` is accepted by the grammar (PG 12+), but unlike a materialized view the query text is not stored anywhere after creation, so the extension would still need a hook to capture it.
- Full AM implementation required — Even if you delegate to heap, you must implement all callbacks and handle edge cases.
- VACUUM/ANALYZE — Must properly delegate to heap for these to work.
- Replication — Logical replication assumes heap tuples; custom AMs may break.
Hybrid Practical Approach
If pursuing this route:
-- Step 1: Set default AM
SET default_table_access_method = 'stream_heap';
-- Step 2: Create with query in options
CREATE TABLE order_totals ()
WITH (pgtrickle.query = 'SELECT region, SUM(amount) FROM orders GROUP BY region',
pgtrickle.schedule = '1m');
-- ProcessUtility_hook would:
-- 1. Detect USING stream_heap (or detect our custom reloptions)
-- 2. Parse the query from options
-- 3. Derive columns from the query
-- 4. Create the actual table with proper columns using heap AM
-- 5. Register in pgtrickle catalog
-- 6. Set up CDC
Pros/Cons
| Aspect | Assessment |
|---|---|
| Native syntax | Partial — CREATE TABLE ... USING stream_heap WITH (...) |
| Column derivation | Not supported — must specify columns or use hook magic |
| Complexity | Very high |
| pg_dump | Good — CREATE TABLE ... USING is properly dumped |
| PG version | PG 12+ |
| Maintenance | High — AM API changes between versions |
16. Comparison Matrix
| Approach | Native Syntax | Complexity | pg_dump | PG Version | Maintenance | Recommended |
|---|---|---|---|---|---|---|
| Function API (current) | No | Low | No* | Any | Very Low | Yes |
| ProcessUtility_hook + MATVIEW WITH | Good | High | Partial | 9.3+ | Medium | Maybe |
| Raw parser fork | Perfect | Very High | No | Fork only | Very High | No |
| Table AM USING | Partial | Very High | Yes | 12+ | High | No |
| FDW FOREIGN TABLE | Partial | Medium | Yes | 9.1+ | Low | No |
| Event triggers alone | No | Low | No | 9.3+ | Low | No |
| COMMENT abuse | No | Low | Yes | Any | Low | No |
| GUC + CREATE TABLE hack | No | Medium | Partial | Any | Medium | No |
| TimescaleDB pattern (MATVIEW + WITH) | Good | High | Partial | 9.3+ | Medium | Best option |
* Custom pg_dump support can be added via pg_dump hook or wrapper script.
17. Recommendations for pg_trickle
Current Approach: Function API (Keep and Enhance)
pg_trickle's current approach (pgtrickle.create_stream_table('name', 'query', ...)) is:
- Proven — Same pattern as pg_ivm, Citus, and many other extensions
- Simple — No
shared_preload_librariesrequired for basic usage - Maintainable — No hook chains to debug
- Portable — Works on any PG version that supports pgrx
Enhancement opportunities:
-- Current
SELECT pgtrickle.create_stream_table('order_totals',
'SELECT region, SUM(amount) FROM orders GROUP BY region', '1m');
-- Enhanced: a procedure wrapper enables CALL for a more DDL-like feel (PG 11+)
CALL pgtrickle.create_stream_table('order_totals',
$$SELECT region, SUM(amount) FROM orders GROUP BY region$$, '1m');
Future Option: TimescaleDB-style Materialized View Integration
If user demand justifies the complexity, pg_trickle could add a second creation path via ProcessUtility_hook:
-- New native-feeling syntax (requires shared_preload_libraries)
CREATE MATERIALIZED VIEW order_totals
WITH (pgtrickle.stream = true, pgtrickle.schedule = '1m')
AS SELECT region, SUM(amount) FROM orders GROUP BY region
WITH NO DATA;
-- Original function API still works (no hook needed)
SELECT pgtrickle.create_stream_table('order_totals',
'SELECT region, SUM(amount) FROM orders GROUP BY region', '1m');
Implementation plan for hook-based approach:
- Register `ProcessUtility_hook` in `_PG_init()` (requires `shared_preload_libraries`)
- Intercept `CREATE MATERIALIZED VIEW` → Check for the `pgtrickle.stream` option
- If found: parse options, call `create_stream_table_impl()` internally, create a standard storage table instead of a matview
- Intercept `DROP MATERIALIZED VIEW` → Check if target is a stream table → Clean up
- Intercept `REFRESH MATERIALIZED VIEW` → Route to stream table refresh engine
- Intercept `ALTER MATERIALIZED VIEW` → Route to stream table alter logic
Estimated complexity: ~800-1200 lines of Rust hook code + tests.
Not Recommended
- Forking PostgreSQL for custom grammar — Maintenance cost is prohibitive
- Table AM approach — Complexity without proportional benefit
- FDW approach — Too many restrictions on foreign tables
- COMMENT abuse — Fragile and poor UX
pg_dump / pg_restore Strategy
Regardless of approach, pg_dump is a challenge. Options:
- Custom dump/restore functions — `pgtrickle.dump_config()` and `pgtrickle.restore_config()`
- Migration script generation — `pgtrickle.generate_migration()` outputs SQL to recreate all stream tables
- Event trigger on restore — Detect when tables are restored and re-register them
- Sidecar file — Generate a companion SQL file alongside pg_dump
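The migration-script option could be as simple as iterating the extension's catalog and emitting function calls. A sketch, assuming a simplified (name, query, schedule) catalog shape rather than pg_trickle's real catalog:

```python
def generate_migration(catalog_rows):
    """Emit SQL that recreates each stream table via the function API.

    catalog_rows: iterable of (name, query, schedule) tuples, a simplified
    stand-in for whatever the extension's catalog actually stores.
    Dollar-quoting sidesteps single-quote escaping inside the query text.
    """
    lines = []
    for name, query, schedule in catalog_rows:
        lines.append(
            f"SELECT pgtrickle.create_stream_table('{name}',\n"
            f"    $pgt${query}$pgt$, '{schedule}');"
        )
    return "\n".join(lines)

sql = generate_migration([
    ("order_totals", "SELECT region, SUM(amount) FROM orders GROUP BY region", "1m"),
])
print(sql)
```

Running the emitted file after a plain pg_restore would re-register every stream table, which is why this option pairs naturally with the sidecar-file approach.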
Appendix A: Hook Registration in pgrx (Rust)
For reference, here's how ProcessUtility_hook registration works in pgrx:
use pgrx::prelude::*;
use pgrx::pg_sys;

static mut PREV_PROCESS_UTILITY_HOOK: pg_sys::ProcessUtility_hook_type = None;

#[pg_guard]
pub extern "C-unwind" fn my_process_utility(
    pstmt: *mut pg_sys::PlannedStmt,
    query_string: *const std::os::raw::c_char,
    read_only_tree: bool,
    context: pg_sys::ProcessUtilityContext,
    params: pg_sys::ParamListInfo,
    query_env: *mut pg_sys::QueryEnvironment,
    dest: *mut pg_sys::DestReceiver,
    qc: *mut pg_sys::QueryCompletion,
) {
    // SAFETY: pstmt is a valid pointer provided by PostgreSQL
    let stmt = unsafe { (*pstmt).utilityStmt };

    // Check if this is a CreateTableAsStmt (materialized view)
    if unsafe { pgrx::is_a(stmt, pg_sys::NodeTag::T_CreateTableAsStmt) } {
        // Check for our custom options...
    }

    // Chain to previous hook or standard handler
    unsafe {
        if let Some(prev) = PREV_PROCESS_UTILITY_HOOK {
            prev(pstmt, query_string, read_only_tree, context,
                 params, query_env, dest, qc);
        } else {
            pg_sys::standard_ProcessUtility(pstmt, query_string, read_only_tree,
                                            context, params, query_env, dest, qc);
        }
    }
}

pub fn register_hooks() {
    unsafe {
        PREV_PROCESS_UTILITY_HOOK = pg_sys::ProcessUtility_hook;
        pg_sys::ProcessUtility_hook = Some(my_process_utility);
    }
}
Appendix B: Key Source Files in PostgreSQL
| File | Purpose |
|---|---|
| src/backend/parser/gram.y | SQL grammar (~18,000 lines) |
| src/backend/parser/scan.l | Lexer/tokenizer |
| src/include/parser/kwlist.h | Keyword definitions |
| src/backend/tcop/utility.c | ProcessUtility() — DDL dispatcher |
| src/backend/commands/tablecmds.c | CREATE/ALTER/DROP TABLE implementation |
| src/backend/commands/createas.c | CREATE TABLE AS / CREATE MATVIEW AS |
| src/include/access/tableam.h | Table Access Method API |
| src/include/foreign/fdwapi.h | FDW API |
| src/backend/commands/event_trigger.c | Event trigger infrastructure |
Appendix C: References
- PostgreSQL Documentation — Table Access Method Interface
- PostgreSQL Documentation — Event Triggers
- PostgreSQL Documentation — Writing A Foreign Data Wrapper
- TimescaleDB Source — process_utility.c
- Citus Source — multi_utility.c
- pg_ivm Source — createas.c
- pgrx Documentation — Hooks
- PostgreSQL Wiki — CustomScanProviders
Research: Triggers vs. WAL Replication for CDC
This document analyses the architectural tradeoffs between trigger-based CDC (pg_trickle's default) and WAL logical-replication CDC. It provides the engineering rationale behind ADR-001 and ADR-002.
User-facing CDC documentation is in CDC Modes.
Abstract
Change-data capture (CDC) is the mechanism by which pg_trickle discovers which rows in a source table changed between refresh cycles. Two fundamentally different approaches exist inside PostgreSQL: row/statement-level AFTER triggers that fire synchronously inside the source transaction, and logical replication (WAL decoding) that reads the write-ahead log asynchronously after the transaction commits.
pg_trickle's default CDC mode is trigger-based (ADR-001, v0.1.0) for a single decisive reason: trigger-based CDC delivers transactional atomicity — the change is captured in the same transaction that made it, under the same locks, with no possibility of a committed write being invisible to the next refresh cycle. WAL-based CDC always has a decoding lag; a crash between transaction commit and WAL decode can lose a change window. For an IVM system where data correctness is non-negotiable, the trigger approach eliminates an entire class of consistency hazards.
The tradeoffs are real: trigger-based CDC adds 5–20% write-side latency on bulk DML (mitigated by statement-level triggers since v0.4.0), requires an additional change-buffer table per source, and cannot capture changes made by pg_dump or direct file-level manipulation. WAL-based CDC has zero write-side overhead but requires wal_level = logical, is affected by slot lag under write storms, and has a non-trivial failure-recovery surface. This document quantifies both approaches with benchmarks, defines the boundary conditions under which WAL-mode is preferable (append-only high-throughput sources with relaxed latency requirements), and explains pg_trickle's hybrid auto mode that starts with triggers and promotes to WAL when conditions are right.
Triggers vs Logical Replication for CDC in pg_trickle
Status: Evaluation Report (updated with implementation status)
Date: 2026-02-24
Context: ADR-001/ADR-002 in PLAN_ADRS.md · PLAN_USER_TRIGGERS_EXPLICIT_DML.md
Executive Summary
pg_trickle uses row-level AFTER triggers to capture changes on source tables. This report evaluates the trigger-based approach against logical replication (WAL-based CDC) across five dimensions: correctness, performance, operations, and two end-user features — user-defined triggers on stream tables and logical replication subscriptions from stream tables.
Conclusion: Triggers remain the correct choice for the current scope given
operational simplicity and zero-config deployment. The hybrid approach —
trigger bootstrap for creation with automatic WAL transition for steady-state —
is now implemented (pg_trickle.cdc_mode GUC, src/wal_decoder.rs). User-
defined triggers on stream tables are also implemented (pg_trickle.user_triggers
GUC, DISABLE TRIGGER USER during refresh). These were previously recommendations
(§6.2, §6.6); both are now shipped.
However, the atomicity constraint — the original reason for choosing triggers — is primarily a creation-time inconvenience, not a steady-state limitation. Once a stream table exists, logical replication has three significant runtime advantages:
- No write-side overhead — With triggers, every INSERT/UPDATE/DELETE on a tracked source table does extra work before the application's transaction can commit: it runs a PL/pgSQL function, writes a row into a buffer table, and updates an index. This slows down the application. With logical replication, PostgreSQL already writes every change to its internal transaction log (WAL) regardless — the CDC layer simply reads that log after the fact, so the application's writes are not slowed down at all.
- TRUNCATE capture — When someone runs TRUNCATE on a source table, row-level triggers do not fire (TRUNCATE replaces the entire file rather than deleting rows one-by-one). This leaves stream tables silently stale until a manual refresh. Logical replication captures TRUNCATE natively from the WAL, so pg_trickle would know immediately that all rows were removed.
- Change ordering from the transaction log — With triggers, each trigger independently calls `pg_current_wal_lsn()` to timestamp its change. With logical replication, the ordering comes directly from the WAL — the authoritative, global record of all database changes — which means change ordering is guaranteed to match commit order, even across concurrent transactions.
The two end-user features (user triggers and logical replication FROM stream tables) are both achievable without changing the CDC mechanism. A hybrid approach (triggers for creation, logical replication for steady-state) deserves serious consideration. See §3 for the full analysis.
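The hybrid's mode-selection logic can be sketched in a few lines. This is NOT pg_trickle's actual implementation of the `pg_trickle.cdc_mode` GUC, just the shape the analysis above suggests (triggers for atomic creation, WAL decoding for steady-state once the server prerequisites hold):

```python
def choose_cdc_mode(cdc_mode_guc, wal_level, creating_now):
    """Hypothetical decision logic for a trigger/WAL hybrid.

    cdc_mode_guc: 'trigger', 'wal', or 'auto' (mirroring the GUC values).
    wal_level:    the server's wal_level setting.
    creating_now: True while create_stream_table() is still running.
    """
    if cdc_mode_guc in ("trigger", "wal"):
        return cdc_mode_guc  # explicit setting wins
    # 'auto': creation must be trigger-based to stay atomic; afterwards,
    # promote to WAL only if the server supports logical decoding.
    if creating_now or wal_level != "logical":
        return "trigger"
    return "wal"

print(choose_cdc_mode("auto", "logical", creating_now=False))  # -> wal
print(choose_cdc_mode("auto", "replica", creating_now=False))  # -> trigger
```

The real promotion decision would also weigh slot availability and provider restrictions (§2.3), but the two-phase structure is the essential point.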
1. Background
Current Architecture
CDC triggers on each tracked source table write typed per-column rows into
per-table buffer tables (pgtrickle_changes.changes_<oid>). Each buffer row
captures:
| Column | Purpose |
|---|---|
| change_id | BIGSERIAL ordering within a source |
| lsn | pg_current_wal_lsn() at trigger time |
| action | 'I' / 'U' / 'D' |
| pk_hash | Content hash of PK columns (optional) |
| new_&lt;col&gt; | Per-column NEW values (INSERT/UPDATE) |
| old_&lt;col&gt; | Per-column OLD values (UPDATE/DELETE) |
A covering B-tree index (lsn, pk_hash, change_id) INCLUDE (action) supports
the differential refresh's LSN-range scan.
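That range scan can be modeled in a few lines. The sketch below is an in-memory stand-in using the buffer columns described above, not the actual implementation; in particular, whether the refresh further collapses multiple changes per key is not shown:

```python
def lsn_range_scan(buffer_rows, from_lsn, to_lsn):
    """Model of the differential refresh's LSN-range scan.

    buffer_rows: dicts with 'lsn', 'change_id', 'pk_hash', 'action',
    mirroring the change-buffer columns. Returns the changes with
    from_lsn < lsn <= to_lsn in the order the covering index yields
    them: (lsn, pk_hash, change_id).
    """
    in_range = [r for r in buffer_rows if from_lsn < r["lsn"] <= to_lsn]
    return sorted(in_range, key=lambda r: (r["lsn"], r["pk_hash"], r["change_id"]))

buf = [
    {"lsn": 10, "change_id": 1, "pk_hash": "a", "action": "I"},
    {"lsn": 12, "change_id": 2, "pk_hash": "a", "action": "U"},
    {"lsn": 15, "change_id": 3, "pk_hash": "b", "action": "D"},
]
# A refresh cycle covering LSNs (10, 15] picks up the UPDATE and the DELETE
print([r["action"] for r in lsn_range_scan(buf, 10, 15)])  # -> ['U', 'D']
```

The half-open interval is what lets consecutive refresh cycles partition the change stream without double-processing a row captured exactly at the previous cycle's high-water mark.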
The Atomicity Constraint
create_stream_table() performs DDL (CREATE TABLE) and DML (catalog inserts)
before setting up CDC. pg_create_logical_replication_slot() cannot execute
inside a transaction that has already performed writes. This makes
single-transaction atomic creation impossible with logical replication — the
decisive factor in the original ADR.
2. Comparison Matrix
2.1 Correctness & Transactional Safety
| Aspect | Triggers | Logical Replication |
|---|---|---|
| Atomic creation | ✅ Same transaction as DDL+catalog | ❌ Slot creation requires separate transaction |
| Change visibility | ✅ Immediate (same transaction) | ⚠️ Asynchronous (after COMMIT + WAL decode) |
| TRUNCATE capture | ❌ Row-level triggers not fired | ✅ WAL emits TRUNCATE since PG 11 |
| Transaction ordering | ✅ Change buffer rows ordered by LSN | ✅ WAL stream preserves commit order |
| Crash recovery | ✅ Buffer tables are WAL-logged; no orphan state | ⚠️ Slot survives crash but may need re-sync |
| Schema change handling | ✅ DDL event hooks rebuild trigger in-place | ⚠️ Requires slot re-creation or output plugin awareness |
Key insight: The TRUNCATE gap is the most significant correctness
limitation of the trigger approach. A statement-level AFTER TRUNCATE trigger
that marks downstream STs for automatic FULL refresh would close this gap
without changing the CDC architecture (see §6 Recommendation 3).
2.2 Performance
| Metric | Triggers | Logical Replication |
|---|---|---|
| Per-row write overhead | ~2–4 μs (narrow INSERT) to ~5–15 μs (wide UPDATE) | ~0 (WAL writes happen regardless) |
| Expected throughput reduction | 1.5–5× on tracked source tables | None on source tables |
| Write amplification | 2× (source WAL + buffer table WAL + index) | 1× (only source WAL) |
| Change buffer storage | Heap table + index per source | WAL segments (shared, recycled) |
| Sequence contention | BIGSERIAL per buffer (lightweight) | N/A |
| Throughput ceiling | ~5,000 writes/sec (estimated) | WAL throughput (much higher) |
| Decoding CPU cost | N/A | Non-trivial; output plugin runs in WAL sender |
| Zero-change refresh | ~3 ms (EXISTS check on empty buffer) | ~3 ms (no pending WAL changes) |
Key insight: Trigger overhead is synchronous — every committing transaction pays the cost. For applications with moderate write rates (<5,000 writes/sec) this is acceptable. For high-throughput OLTP workloads, logical replication's zero write-side overhead is a significant advantage.
2.3 Operational Complexity
| Aspect | Triggers | Logical Replication |
|---|---|---|
| PostgreSQL configuration | None required | wal_level = logical + restart |
| Managed PG compatibility | ✅ Works everywhere | ⚠️ Some providers restrict wal_level |
| WAL retention risk | None (buffer tables are independent) | Slots prevent WAL cleanup; disk exhaustion risk |
| Slot management | N/A | Create, monitor, drop; orphan detection |
| max_replication_slots | N/A | Must be sized for number of tracked sources |
| REPLICA IDENTITY config | N/A | Required on all tracked source tables |
| Monitoring | Buffer table row counts | Slot lag, WAL retention, decode rate |
| Extension dependencies | None | Output plugin (pgoutput, wal2json, or custom) |
| Upgrade path | CREATE OR REPLACE FUNCTION | Slot protocol version compatibility |
Key insight: Triggers are operationally simpler by a wide margin. Logical replication introduces a class of failure modes (stuck slots, WAL bloat, replica identity misconfiguration) that require dedicated monitoring and operational runbooks.
2.4 Feature: User Triggers on Stream Tables
This addresses end-user triggers on the output stream tables, not CDC triggers on source tables.
| Aspect | Current (Trigger CDC) | With Logical Replication CDC |
|---|---|---|
| Feasibility | ✅ Achievable via session_replication_role | ✅ Same mechanism applies |
| Refresh suppression | SET LOCAL session_replication_role = 'replica' | Same |
| Post-refresh notification | NOTIFY pg_trickle_refresh with metadata | Same |
| MERGE firing pattern | DELETE+INSERT (not UPDATE); must be suppressed | Same — refresh mechanism is independent of CDC |
Key insight: User trigger support on stream tables is orthogonal to the
CDC mechanism and is now implemented. The solution uses ALTER TABLE ... DISABLE TRIGGER USER / ENABLE TRIGGER USER around FULL refresh (avoiding
the session_replication_role conflict with logical replication publishing).
In DIFFERENTIAL mode, explicit per-row DML (INSERT/UPDATE/DELETE) is used
instead of MERGE so that user-defined AFTER triggers fire correctly. The
implementation is controlled by the pg_trickle.user_triggers GUC (auto/
on/off). See PLAN_USER_TRIGGERS_EXPLICIT_DML.md
for the full design.
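To illustrate why explicit DML matters here, a simplified set-based sketch (active_orders and pending_delta are illustrative names; the real engine emits per-row statements): MERGE applies updates as DELETE+INSERT, so a user's AFTER UPDATE trigger would never fire, whereas an explicit UPDATE preserves the expected event.

```sql
-- A row id that appears in the delta with both -1 and +1 is an update:
-- apply it as an UPDATE so user AFTER UPDATE triggers fire.
UPDATE active_orders t
   SET status = d.status
  FROM pending_delta d
 WHERE t.__pgt_row_id = d.__pgt_row_id
   AND d.__pgt_action = +1
   AND EXISTS (SELECT 1 FROM pending_delta x
                WHERE x.__pgt_row_id = d.__pgt_row_id
                  AND x.__pgt_action = -1);
-- Pure deletions and insertions are then applied with plain DELETE and
-- INSERT, each firing the corresponding user-defined AFTER trigger.
```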
Note: Sections 2.1–2.5 compare creation-time and operational aspects. For a focused steady-state comparison (what matters once the ST exists), see §3.
2.5 Feature: Logical Replication FROM Stream Tables
This addresses end-users subscribing to stream table changes via PostgreSQL's built-in logical replication.
| Aspect | Status | Notes |
|---|---|---|
| Basic publishing | ✅ Works today | STs are regular heap tables; CREATE PUBLICATION works |
| __pgt_row_id column | ⚠️ Replicated by default | Use column list in PUBLICATION to exclude, or document as usable PK |
| Differential refresh | ✅ DELETE+INSERT via MERGE are replicated | Subscriber sees individual DELETEs and INSERTs, not UPDATEs |
| Full refresh | ✅ TRUNCATE + INSERT replicated | Subscriber needs replica_identity set; receives TRUNCATE + mass INSERT |
| REPLICA IDENTITY | Needs configuration | __pgt_row_id could serve as unique index for identity |
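The table above can be put into practice as follows (publication and index names are illustrative):

```sql
-- Option 1: exclude the internal column via a publication column list (PG 15+).
CREATE PUBLICATION active_orders_pub FOR TABLE active_orders (id, status);

-- Option 2: keep __pgt_row_id and use it as the replica identity
-- (requires the column to be NOT NULL and the index to be unique).
CREATE UNIQUE INDEX active_orders_row_id_idx ON active_orders (__pgt_row_id);
ALTER TABLE active_orders REPLICA IDENTITY USING INDEX active_orders_row_id_idx;
```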
The session_replication_role Conflict
If the refresh engine sets session_replication_role = 'replica' to suppress
user triggers (Phase 1 of the user-trigger plan), this may also suppress
publication of the DML to logical replication subscribers. When a session
is in replica mode, PostgreSQL treats it as a replication subscriber — DML
performed in that session may not be forwarded to downstream subscribers
(depending on the publication's publish_via_partition_root and the
subscriber's origin setting).
This is a potential conflict between the two features. Options:
| Option | User Triggers Suppressed? | Replication Published? | Drawback |
|---|---|---|---|
| session_replication_role = 'replica' | ✅ Yes | ❌ May not be published | Breaks logical replication from STs |
| ALTER TABLE ... DISABLE TRIGGER USER | ✅ Yes | ✅ Yes | Requires SHARE ROW EXCLUSIVE lock |
| pg_trickle.suppress_user_triggers GUC → DISABLE TRIGGER USER only when needed | ✅ Configurable | ✅ Yes | Lock overhead; crash safety concern (ENABLE on recovery) |
| tgisinternal flag manipulation | ✅ Yes | ✅ Yes | Non-portable; catalog-level hack |
Recommended resolution: Use ALTER TABLE ... DISABLE TRIGGER USER within
a SAVEPOINT, restoring on error. The SHARE ROW EXCLUSIVE lock is brief (only
held for the catalog update, not the entire refresh). If the user has enabled
both user triggers AND logical replication on a stream table, this is the only
approach that supports both simultaneously. If neither feature is in use, skip
the overhead entirely.
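A sketch of the resolved pattern, with an illustrative stream table name (the extension performs this internally during refresh):

```sql
BEGIN;
SAVEPOINT before_refresh;
ALTER TABLE active_orders DISABLE TRIGGER USER;
-- ... FULL refresh: TRUNCATE active_orders; INSERT INTO active_orders SELECT ...;
ALTER TABLE active_orders ENABLE TRIGGER USER;
RELEASE SAVEPOINT before_refresh;
COMMIT;
-- On any error: ROLLBACK TO SAVEPOINT before_refresh also rolls back the
-- DISABLE, so user triggers are never left disabled.
```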
3. Separating Creation-Time from Steady-State
The original ADR chose triggers because pg_create_logical_replication_slot()
cannot execute inside a transaction that has already performed writes. This
report initially treated that constraint as "decisive." But it deserves
scrutiny: the atomicity constraint only affects the create_stream_table()
call — a one-time event. Once a stream table exists, CDC runs for hours,
days, or months. The steady-state characteristics are what actually matter for
performance, correctness, and user experience.
3.1 The Atomicity Constraint Is a Solvable Engineering Problem
The constraint is real but workable. Three approaches exist, all with well-understood trade-offs:
| Approach | How It Works | Downside |
|---|---|---|
| Two-phase creation | Phase 1: DDL + catalog in one transaction. Phase 2: slot creation in a separate transaction. Rollback Phase 1 artifacts on Phase 2 failure. | Brief window where catalog entry exists without CDC. Cleanup on failure adds ~50 lines of code. |
| Background worker handoff | Main transaction creates DDL + catalog + temporary trigger. Background worker creates slot asynchronously, then drops trigger. | Race window between COMMIT and slot creation is covered by the temporary trigger, so no data is lost; adds complexity (~100 lines). |
| Trigger bootstrap → slot transition | Create with triggers (current approach). After first successful refresh, migrate to logical replication in the background. | Trigger overhead during bootstrap period (minutes). Most natural hybrid approach. |
None of these are architecturally difficult. The two-phase approach is straightforward — if slot creation fails, drop the storage table and catalog entry. The temporary-trigger approach eliminates even the theoretical data-loss window. These are engineering inconveniences, not fundamental blockers.
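A two-phase creation sequence might look like this at the SQL level (table and slot names illustrative; the catalog writes are internal to the extension):

```sql
-- Phase 1: DDL + catalog inserts in one ordinary transaction.
BEGIN;
CREATE TABLE active_orders (id bigint, status text, __pgt_row_id bigint);
-- INSERT INTO <pg_trickle catalog> ...  (internal)
COMMIT;

-- Phase 2: slot creation in its own transaction —
-- pg_create_logical_replication_slot() refuses to run after writes.
SELECT pg_create_logical_replication_slot('pgt_active_orders', 'pgoutput');

-- If Phase 2 fails, roll back the Phase 1 artifacts:
--   DROP TABLE active_orders;  plus deleting the catalog rows.
```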
3.2 Steady-State: Triggers vs Logical Replication (Honest Comparison)
Once the stream table exists and CDC is running, here is how the two approaches compare on their actual runtime merits.
In plain terms: With triggers, every time the application writes a row to a tracked source table, the database does extra work right then and there — calling a function, writing to a buffer table, updating an index — all before the application's transaction can finish. This is like a toll booth on a highway: every car (write) must stop and pay (trigger overhead) before continuing.
With logical replication, the database already writes every change to its internal transaction log (the WAL) as part of normal operation. CDC simply reads that log after the fact, in a separate background process. The application's writes pass through without stopping — there is no toll booth. The cost of reading the log is paid by the database server, but it happens asynchronously and never slows down the application.
Where Logical Replication Wins (Steady-State)
| Dimension | Trigger Impact | Logical Replication Advantage |
|---|---|---|
| Write-path latency | Every INSERT/UPDATE/DELETE on a tracked source pays ~2–15 μs synchronous overhead (PL/pgSQL dispatch, buffer INSERT, index update). This is inside the committing transaction's critical path. | Zero additional write-path cost. WAL writes happen regardless; decoding is asynchronous. Source table DML performance is completely unaffected. |
| Write amplification | Each source row change produces: (1) source table WAL, (2) buffer table heap write, (3) buffer table WAL, (4) buffer index update, (5) index WAL. ~2–3× total write amplification. | 1× — only the source table's normal WAL. No additional heap writes, no secondary indexes. |
| TRUNCATE capture | Cannot capture. Row-level triggers don't fire. Requires a separate statement-level AFTER TRUNCATE workaround (§4) that only marks for reinit — the actual row deletions are invisible to differential mode. | Native. WAL emits TRUNCATE events since PG 11. The decoder receives a clean signal that all rows were removed. |
| Throughput ceiling | Estimated ~5,000 writes/sec on tracked sources before trigger overhead dominates. PL/pgSQL function dispatch is the bottleneck. | Bounded by WAL throughput — typically 50,000–200,000+ writes/sec depending on hardware and wal_buffers. |
| Connection-pool pressure | Trigger executes in the application's connection. Long-running trigger INSERTs can increase connection hold time under load. | Decoding runs in a dedicated WAL sender process. Application connections are unaffected. |
| Vacuum pressure | Buffer tables accumulate dead tuples between cleanups. Each refresh cycle creates bloat that autovacuum must reclaim. | No buffer tables to vacuum. WAL segments are recycled by the WAL management subsystem. |
| Transaction ID consumption | Each trigger INSERT consumes sub-transaction resources within the outer transaction. High-volume batch operations can cause excessive subtransaction overhead. | No additional transaction work. |
Where Triggers Win (Steady-State)
| Dimension | Trigger Advantage | Logical Replication Impact |
|---|---|---|
| Operational simplicity | No external state to manage. Buffer tables are regular heap tables — queryable, monitorable, backed up normally. Drop the trigger and it's gone. | Replication slots are persistent server-side state. A stuck or crashed consumer prevents WAL recycling, potentially filling the disk. Requires monitoring, max_slot_wal_keep_size guards, and orphan-slot cleanup. |
| Zero configuration | Works with any wal_level (minimal, replica, logical). No restart required. No REPLICA IDENTITY configuration. | Requires wal_level = logical (server restart), max_replication_slots sizing, and REPLICA IDENTITY on every tracked source table. Many managed PostgreSQL providers default to wal_level = replica. |
| Schema evolution | DDL event hooks rebuild the trigger function via CREATE OR REPLACE FUNCTION. New columns are added to the buffer table with ADD COLUMN IF NOT EXISTS. Simple, same-transaction, no coordination. | Schema changes on tracked tables require careful handling. The output plugin must be aware of column additions/removals. Slot may need to be recreated. ALTER TABLE during active decoding can cause protocol errors. |
| Debugging & visibility | Change buffers are queryable tables: SELECT * FROM pgtrickle_changes.changes_12345 ORDER BY change_id DESC LIMIT 10. Immediate visibility into what was captured. | WAL is binary and opaque. Inspecting captured changes requires pg_logical_slot_peek_changes() which advances or peeks the slot — disruptive in production. |
| Crash recovery | Buffer tables are WAL-logged and survive crashes. No special recovery needed — the refresh engine picks up from the last frontier LSN. | Slots survive crashes, but the decoding position may be ahead of what pg_trickle has consumed. Requires careful bookkeeping to avoid replaying or losing changes. |
| Multi-source coordination | Each source has an independent buffer table. The refresh engine reads from multiple buffers with independent LSN ranges. No coordination between sources. | Multiple sources could share a single slot (decoding all tables) or use per-source slots. Shared slots require demultiplexing; per-source slots multiply the slot management burden. |
| Isolation | Trigger failure (e.g., buffer table full) raises an error in the application transaction — visible and immediate. | Decoding failure is asynchronous. The application commits successfully, but changes may never reach the buffer. Silent data loss is possible unless monitored. |
Neutral (Roughly Equivalent)
| Dimension | Notes |
|---|---|
| Refresh-path performance | Both approaches populate the same buffer table schema. The MERGE/DVM pipeline is identical regardless of how buffers were filled. |
| Zero-change detection | Triggers: EXISTS check on empty buffer (~3 ms). Logical replication: check slot position vs current WAL LSN (~3 ms). Equivalent. |
| Memory footprint | Triggers: PL/pgSQL function cache per backend. Logical replication: WAL sender process + decoding context. Both are modest. |
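The two zero-change checks compared in the last rows can be sketched as follows (buffer OID and slot name illustrative):

```sql
-- Trigger CDC: any rows pending in the change buffer?
SELECT EXISTS (SELECT 1 FROM pgtrickle_changes.changes_12345) AS has_pending;

-- Logical replication: is the slot behind the current WAL position?
SELECT confirmed_flush_lsn < pg_current_wal_lsn() AS has_pending
  FROM pg_replication_slots
 WHERE slot_name = 'pgt_active_orders';
```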
3.3 When Does Logical Replication Become the Better Choice?
The crossover point depends on workload characteristics:
| Scenario | Better Choice | Why |
|---|---|---|
| < 1,000 writes/sec on tracked sources | Triggers | Overhead is negligible; operational simplicity dominates |
| 1,000–5,000 writes/sec | Either / Triggers still acceptable | Trigger overhead is measurable but unlikely to be the bottleneck |
| > 5,000 writes/sec | Logical Replication | Write-path overhead starts to matter; 2–3× write amplification compounds |
| ETL patterns (TRUNCATE + bulk INSERT) | Logical Replication | Native TRUNCATE capture; no stale-data gap |
| Wide tables (20+ columns) | Logical Replication | Trigger overhead scales with column count (~5–15 μs); WAL overhead does not |
| Managed PostgreSQL with wal_level restrictions | Triggers | No choice — logical replication may not be available |
| Many tracked sources (50+) | Logical Replication | Fewer moving parts than 50 triggers + 50 buffer tables + 50 indexes |
| Need logical replication FROM stream tables | Triggers (with caveats) | see §2.5 — session_replication_role conflict with DISABLE TRIGGER USER as workaround |
3.4 Reassessing the Decision
With the atomicity constraint properly scoped as a creation-time concern, the decision to use triggers rests on three remaining pillars:
- Operational simplicity — no wal_level change, no slot management, no REPLICA IDENTITY configuration. This is genuinely valuable for an early-stage extension that needs frictionless adoption.
- Debugging visibility — queryable buffer tables are a major developer-experience advantage. Being able to SELECT * FROM changes_<oid> during debugging is invaluable.
- Zero-config deployment — works on any PostgreSQL 18 instance without server restarts or configuration changes. Critical for managed PostgreSQL environments.
However, these advantages are primarily about developer and operator experience, not about the fundamental capability of the system. A mature pg_trickle deployment that needs high write throughput, TRUNCATE support, or minimal source-table impact would be better served by logical replication in steady-state.
The honest assessment: Triggers are the right choice today for pragmatic reasons (simplicity, early-stage adoption, managed PG compatibility). But the report should not overstate the atomicity constraint as a fundamental blocker — it is a solvable problem. If pg_trickle grows to serve high-throughput production workloads, the migration to logical replication for steady-state CDC should be treated as a planned evolution, not a theoretical future.
4. TRUNCATE: The Gap and How to Close It
This limitation is one of the strongest arguments for logical replication in steady-state — see §3.2 for the comparison.
The TRUNCATE limitation is the most commonly cited drawback of trigger-based CDC. PostgreSQL does not fire row-level triggers for TRUNCATE because TRUNCATE operates at the file level (O(1)) — there are no individual rows to enumerate.
Current Behavior
- User runs TRUNCATE source_table
- CDC trigger does not fire — change buffer remains empty
- Scheduler sees zero changes → NO_DATA → stream table is stale
- Stream table shows data from rows that no longer exist
Proposed Fix: Statement-Level AFTER TRUNCATE Trigger
PostgreSQL supports statement-level AFTER TRUNCATE triggers. While they
provide no OLD row data, they can mark downstream stream tables for
reinitialization:
CREATE TRIGGER pg_trickle_truncate_<oid>
AFTER TRUNCATE ON <source_table>
FOR EACH STATEMENT
EXECUTE FUNCTION pgtrickle.on_source_truncated('<source_oid>');
The trigger function would:
- Look up all stream tables that depend on this source
- Mark them needs_reinit = true in the catalog
- Cascade transitively to downstream STs
This closes the TRUNCATE gap without changing the CDC architecture. The next scheduler cycle would trigger a FULL refresh automatically.
Effort estimate: ~2–4 hours (trigger creation in cdc.rs, PL/pgSQL or
Rust function for on_source_truncated, cascade logic reuse from hooks.rs).
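A PL/pgSQL sketch of on_source_truncated under assumed catalog names (pgt_stream_tables, st_id, and needs_reinit are illustrative; the transitive cascade to downstream STs is omitted for brevity):

```sql
CREATE OR REPLACE FUNCTION pgtrickle.on_source_truncated()
RETURNS trigger LANGUAGE plpgsql AS $$
BEGIN
    -- Mark every stream table that depends on the truncated source
    -- (the source OID is passed as the trigger's first argument).
    UPDATE pgtrickle.pgt_stream_tables
       SET needs_reinit = true
     WHERE st_id IN (SELECT st_id
                       FROM pgtrickle.pgt_dependencies
                      WHERE source_oid = TG_ARGV[0]::oid);
    RETURN NULL;  -- return value is ignored for statement-level AFTER triggers
END;
$$;
```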
5. Migration Path: Trigger → Logical Replication (Now Implemented)
Status: Phase A (Hybrid Creation) is now implemented in
src/wal_decoder.rs. The pg_trickle.cdc_mode GUC controls the behavior (trigger/auto/wal).
As discussed in §3, the atomicity constraint is a creation-time problem with known solutions. The buffer table schema and downstream IVM pipeline are decoupled from the capture mechanism, so migration is isolated to the CDC layer. This should be treated as a planned evolution for high-throughput deployments, not a theoretical future:
Phase A: Hybrid Creation
- create_stream_table() continues using triggers for atomic creation
- After first successful full refresh, a background worker creates a replication slot and transitions to WAL-based capture
- Trigger is dropped; buffer table continues to be populated from WAL decode
Phase B: Steady-State WAL Capture
- Background worker runs a logical decoding consumer per tracked source
- WAL changes are decoded and written to the same buffer table schema
- Downstream pipeline (DVM, MERGE, frontier) is unchanged
- TRUNCATE events are captured natively from WAL
Prerequisites
- wal_level = logical (must be documented as optional upgrade path)
- REPLICA IDENTITY on tracked sources (auto-configured or user-managed)
- Custom output plugin or pgoutput + column mapping
- Slot health monitoring (WAL retention alerts, orphan cleanup)
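These prerequisites can be checked and applied with standard commands (source table name illustrative):

```sql
SHOW wal_level;                                 -- must report 'logical'
SHOW max_replication_slots;                     -- size for the number of tracked sources
ALTER TABLE orders REPLICA IDENTITY DEFAULT;    -- identity from the primary key

-- Slot health: how much WAL is each slot holding back?
SELECT slot_name, active, wal_status,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained
  FROM pg_replication_slots;
```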
Effort estimate: 3–5 weeks for a production-quality implementation.
6. Recommendations
Recommendation 1: Keep Trigger-Based CDC (For Now)
Operational simplicity and zero-config deployment are strong advantages for an early-stage extension. The performance ceiling (~5,000 writes/sec) is adequate for current target use cases. The atomicity constraint, while solvable (see §3.1), adds creation-time complexity that is not yet justified.
However: This decision should be revisited when any of these triggers are
hit: (a) users report write-path latency from CDC triggers, (b) TRUNCATE-based
ETL patterns become a common pain point, (c) pg_trickle targets environments
where wal_level = logical is already the norm. The steady-state advantages of
logical replication (§3.2) are substantial and should not be dismissed.
Recommendation 2: ✅ IMPLEMENTED — User Trigger Suppression
User-defined triggers on stream tables are now fully supported. The
implementation uses ALTER TABLE ... DISABLE TRIGGER USER / ENABLE TRIGGER USER around FULL refresh, and explicit per-row DML (INSERT/UPDATE/DELETE)
instead of MERGE during DIFFERENTIAL refresh so user AFTER triggers fire
correctly. Controlled by pg_trickle.user_triggers GUC (auto/on/off).
The session_replication_role approach from the original plan was rejected to
avoid conflict with logical replication publishing (see §2.5).
Recommendation 3: Add TRUNCATE Capture Trigger
Add a statement-level AFTER TRUNCATE trigger on each tracked source table
that marks downstream STs for reinitialization. This closes the most
significant usability gap without changing the CDC architecture.
Recommendation 4: Document Logical Replication FROM Stream Tables
Add documentation and examples for CREATE PUBLICATION on stream tables,
including:
- Column filtering to exclude __pgt_row_id
- REPLICA IDENTITY configuration using __pgt_row_id as unique index
- Behavior during FULL vs DIFFERENTIAL refresh
- Interaction with user trigger suppression
Recommendation 5: Benchmark Trigger Overhead
Execute the benchmark plan in PLAN_TRIGGERS_OVERHEAD.md to establish data-driven thresholds for the logical replication migration crossover point. The results should feed directly into the §3.3 crossover analysis.
Recommendation 6: ✅ IMPLEMENTED — Hybrid CDC Approach
The "trigger bootstrap → slot transition" pattern is now implemented in
src/wal_decoder.rs (1152 lines). The implementation includes:
- Automatic transition: After stream table creation with triggers, a background worker creates a logical replication slot and transitions to WAL-based capture.
- GUC control: pg_trickle.cdc_mode (trigger/auto/wal) and pg_trickle.wal_transition_timeout control the behavior.
- Transition orchestration: Create slot → wait for catch-up → drop trigger. Automatic fallback to triggers if slot creation fails.
- Catalog extension: pgt_dependencies gains cdc_mode, slot_name, decoder_confirmed_lsn, transition_started_at columns.
- Health monitoring: pgtrickle.check_cdc_health() function and NOTIFY pg_trickle_cdc_transition notifications.
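Typical usage of the monitoring hooks named above (function and channel names are from this document; the result shape is not specified here):

```sql
SELECT * FROM pgtrickle.check_cdc_health();
LISTEN pg_trickle_cdc_transition;   -- fires when a source transitions trigger → WAL
```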
7. Decision Log
| # | Decision | Rationale |
|---|---|---|
| D1 | Keep triggers for CDC on source tables — for now | Zero-config, operational simplicity, adequate for current scale |
| D2 | Atomicity constraint is solvable, not fundamental | Two-phase creation and hybrid bootstrap are proven patterns (§3.1) |
| D3 | Logical replication is superior in steady-state | Zero write overhead, TRUNCATE capture, higher throughput ceiling (§3.2) |
| D4 | User triggers on STs are orthogonal to CDC choice | session_replication_role / DISABLE TRIGGER USER works with either approach |
| D5 | Logical replication FROM STs works today | Regular heap tables; needs documentation, not code |
| D6 | TRUNCATE gap is closable with statement-level trigger | Low effort, high impact — but logical replication handles it natively |
| D7 | Hybrid approach is the optimal long-term target | Trigger bootstrap for creation + logical replication for steady-state |
| D8 | User trigger suppression uses DISABLE TRIGGER USER | Avoids session_replication_role conflict with logical replication publishing (§2.5) |
| D9 | Hybrid CDC implemented with auto-transition | pg_trickle.cdc_mode = 'auto' triggers → WAL transition after creation |
| D10 | Explicit DML for DIFFERENTIAL refresh with user triggers | INSERT/UPDATE/DELETE instead of MERGE so AFTER triggers fire correctly |
Prior Art
This document lists the academic papers, PostgreSQL commits, open-source tools,
and standard algorithms whose techniques are reused in pg_trickle.
Maintaining this record serves two purposes:
- Attribution — credit the research and engineering work this project builds upon.
- Independent derivation — demonstrate that every core technique predates and is independent of any single vendor's commercial product.
Differential View Maintenance (DVM)
DBSP — Automatic Incremental View Maintenance
Budiu, M., Ryzhyk, L., McSherry, F., & Tannen, V. (2023). "DBSP: Automatic Incremental View Maintenance for Rich Query Languages." Proceedings of the VLDB Endowment (PVLDB), 16(7), 1601–1614. https://arxiv.org/abs/2203.16684
The Z-set abstraction (rows annotated with +1/−1 multiplicity) is the
theoretical foundation for the __pgt_action column produced by the delta
operators in src/dvm/operators/. The per-operator differentiation rules
(scan, filter, project, join, aggregate, union) are direct applications of
the DBSP lifting operator (D) described in this paper.
See DBSP_COMPARISON.md for a detailed comparison of pg_trickle's architecture with the DBSP model.
Gupta & Mumick — Materialized Views Survey
Gupta, A. & Mumick, I.S. (1995). "Maintenance of Materialized Views: Problems, Techniques, and Applications." IEEE Data Engineering Bulletin, 18(2), 3–18.
Gupta, A. & Mumick, I.S. (1999). Materialized Views: Techniques, Implementations, and Applications. MIT Press. ISBN 978-0-262-57122-7.
The per-operator differentiation rules in src/dvm/operators/ follow the
derivation given in section 3 of the 1995 survey. The counting algorithm
for maintaining aggregates with deletions uses the approach described in
the MIT Press book.
DBToaster — Higher-order Delta Processing
Koch, C., Ahmad, Y., Kennedy, O., Nikolic, M., Nötzli, A., Olteanu, D., & Zavodny, J. (2014). "DBToaster: Higher-order Delta Processing for Dynamic, Frequently Fresh Views." The VLDB Journal, 23(2), 253–278. https://doi.org/10.1007/s00778-013-0348-4
Inspiration for the recursive delta compilation strategy where the delta of a complex query is itself a query that can be differentiated.
DRed — Deletion and Re-derivation
Gupta, A., Mumick, I.S., & Subrahmanian, V.S. (1993). "Maintaining Views Incrementally." Proceedings of the 1993 ACM SIGMOD International Conference, 157–166.
The DRed algorithm for handling deletions in recursive views is the basis for
the recursive CTE differential refresh strategy in src/dvm/operators/recursive_cte.rs.
Scheduling
Earliest-Deadline-First (EDF)
Liu, C.L. & Layland, J.W. (1973). "Scheduling Algorithms for Multiprogramming in a Hard-Real-Time Environment." Journal of the ACM, 20(1), 46–61. https://doi.org/10.1145/321738.321743
The schedule-based scheduling in src/scheduler.rs applies the classic
EDF principle: the stream table whose freshness deadline expires soonest is
refreshed first. EDF is optimal for uniprocessor preemptive scheduling and is
a standard technique in operating systems and real-time databases.
Topological Sort — Kahn's Algorithm
Kahn, A.B. (1962). "Topological sorting of large networks." Communications of the ACM, 5(11), 558–562. https://doi.org/10.1145/368996.369025
The dependency DAG in src/dag.rs uses Kahn's algorithm for topological
ordering and cycle detection. This is standard computer science curriculum
and appears in every major algorithms textbook (Cormen et al., Sedgewick,
Kleinberg & Tardos).
Change Data Capture (CDC)
PostgreSQL Row-Level Triggers
Row-level AFTER INSERT/UPDATE/DELETE triggers have been available in
PostgreSQL since version 6.x (late 1990s). The trigger-based change capture
pattern used in src/cdc.rs is a well-established PostgreSQL technique:
- PostgreSQL documentation: CREATE TRIGGER — trigger-based CDC has been a standard pattern for decades.
- PostgreSQL wiki: "Trigger-based Change Data Capture in PostgreSQL."
Debezium
Debezium project (Red Hat, open source since 2016). https://debezium.io/
Debezium implements trigger-based and WAL-based CDC for PostgreSQL and other
databases. The change buffer table pattern (pgtrickle_changes.changes_<oid>)
follows a similar approach, modified for single-process consumption within
the PostgreSQL backend.
pgaudit
pgaudit extension (2015). https://github.com/pgaudit/pgaudit
Captures DML via AFTER row-level triggers for audit logging, demonstrating
the same trigger-based change-capture technique in production since 2015.
Materialized View Refresh
PostgreSQL REFRESH MATERIALIZED VIEW CONCURRENTLY
PostgreSQL 9.4 (December 2014, commit 96ef3b8). src/backend/commands/matview.c
The snapshot-diff strategy used for recomputation-diff refreshes (where the
full query is re-executed and anti-joined against current storage to compute
inserts and deletes) mirrors the algorithm implemented in PostgreSQL's
REFRESH MATERIALIZED VIEW CONCURRENTLY. This PostgreSQL feature predates
all relevant patents and is publicly documented.
SQL MERGE Statement
ISO/IEC 9075:2003 (SQL:2003 standard) — MERGE statement. PostgreSQL 15 (October 2022, commit 7103eba).
The MERGE-based delta application in src/refresh.rs uses the
ISO-standard MERGE statement, independently implemented by Oracle, SQL
Server, DB2, and PostgreSQL. This is not derived from any vendor-specific
implementation.
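For illustration, a delta carrying +1/−1 actions can be applied with a single standard MERGE (stream table and delta names illustrative):

```sql
MERGE INTO active_orders AS t
USING pending_delta AS d
   ON t.__pgt_row_id = d.__pgt_row_id
WHEN MATCHED AND d.__pgt_action = -1 THEN
    DELETE
WHEN NOT MATCHED AND d.__pgt_action = +1 THEN
    INSERT (id, status, __pgt_row_id)
    VALUES (d.id, d.status, d.__pgt_row_id);
```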
General Database Theory
Relational Algebra
Codd, E.F. (1970). "A Relational Model of Data for Large Shared Data Banks." Communications of the ACM, 13(6), 377–387.
The operator tree in src/dvm/parser.rs models standard relational algebra
operators (select, project, join, aggregate, union). These are foundational
database theory from 1970.
Semi-Naive Evaluation
Bancilhon, F. & Ramakrishnan, R. (1986). "An Amateur's Introduction to Recursive Query Processing Strategies." Proceedings ACM SIGMOD, 16–52.
General background for recursive CTE evaluation strategies. PostgreSQL's own
WITH RECURSIVE implementation uses iterative fixpoint evaluation based on
these principles.
This document is maintained for attribution and independent-derivation documentation purposes. It does not constitute legal advice.
Multi-Database Refresh Broker — Design Document
Implementation Status: Design only — not yet implemented. This feature is planned for a future release after v1.0. Track progress at ROADMAP.md.
The design below is a stable reference for contributors and reviewers. The API described here is aspirational and subject to change before implementation begins.
Status: Design only (v0.31.0 SCAL-2). Implementation planned for a future release.
Problem
When pg_trickle is installed in multiple databases on the same PostgreSQL cluster, each
per-database scheduler independently scans its change buffers. For workloads where two
databases reference the same upstream source — commonly via postgres_fdw foreign tables
or logical replication — each scheduler pays the full scan cost independently:
DB A scheduler: SELECT * FROM pgtrickle_changes.changes_12345 (full scan)
DB B scheduler: SELECT * FROM pgtrickle_changes.changes_12345 (same scan, again)
At 100 stream tables across 10 databases with 5 shared sources, this is 10× the necessary I/O.
Goal
Introduce a "refresh broker" — a singleton background worker that:
- De-duplicates change-buffer scans across databases in the same cluster.
- Distributes scan results to per-database schedulers via shared memory.
- Reduces total change-buffer I/O proportionally to the number of databases sharing a source.
Design
Components
┌─────────────────────────────────────────────────────────────┐
│ PostgreSQL Cluster │
│ │
│ ┌────────────────┐ shared memory ┌─────────────┐ │
│ │ Refresh Broker │ ──── scan results ────► │ DB-A sched │ │
│ │ (singleton) │ │ DB-B sched │ │
│ │ │ ◄─── scan requests ──── │ DB-C sched │ │
│ └────────────────┘ └─────────────┘ │
│ │ │
│ │ single scan per source OID per tick │
│ ▼ │
│ pgtrickle_changes.changes_{oid} │
└─────────────────────────────────────────────────────────────┘
Broker Protocol
- Registration: each per-database scheduler registers its interest in a set of source OIDs with the broker via shared memory (`PgLwLock<BrokerRegistry>`).
- Tick coordination: at the start of each tick, the broker scans each registered source OID once. Results (row counts + LSN watermarks) are written to a shared memory segment indexed by `(database_oid, source_oid)`.
- Result consumption: per-database schedulers read the broker's results instead of issuing their own SPI queries. The broker's scan is authoritative for the tick.
- Fallback: if the broker is not running (e.g. `max_worker_processes` exhausted), per-database schedulers fall back to their current direct-scan behaviour.
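The registration and tick-coordination steps can be modelled in plain Rust. This is a sketch only: the real registry would live in shared memory behind `PgLwLock<BrokerRegistry>`, and OIDs are modelled here as bare `u32` values:

```rust
use std::collections::BTreeSet;

/// Plain-Rust model of the broker registry (illustrative, not the
/// shared-memory implementation).
#[derive(Default)]
struct BrokerRegistry {
    /// (database_oid, source_oid) pairs registered by per-DB schedulers.
    interests: BTreeSet<(u32, u32)>,
}

impl BrokerRegistry {
    /// Registration: a per-database scheduler declares interest in a source.
    fn register(&mut self, db_oid: u32, source_oid: u32) {
        self.interests.insert((db_oid, source_oid));
    }

    /// Tick coordination: collect unique source OIDs so each is scanned
    /// exactly once per tick, however many databases share it.
    fn unique_sources(&self) -> BTreeSet<u32> {
        self.interests.iter().map(|&(_, src)| src).collect()
    }
}

fn main() {
    let mut reg = BrokerRegistry::default();
    reg.register(16384, 12345); // DB A, source 12345
    reg.register(16385, 12345); // DB B, same source
    reg.register(16385, 67890); // DB B, a second source
    // Two unique sources -> two scans per tick instead of three.
    assert_eq!(reg.unique_sources().len(), 2);
}
```

The de-duplication is what turns per-database scan cost into per-source scan cost.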
Shared Memory Layout
```rust
/// BRK-1: Broker registry entry for one (database, source) pair.
struct BrokerEntry {
    db_oid: pg_sys::Oid,
    source_oid: pg_sys::Oid,
    /// Last scan result: row count in the change buffer.
    pending_rows: i64,
    /// Last scan result: maximum LSN seen in the change buffer.
    max_lsn_u64: u64,
    /// Monotone tick counter when this entry was last updated.
    last_updated_tick: u64,
}

/// Maximum number of (db, source) pairs the broker tracks.
const BROKER_CAPACITY: usize = 4096;
```
Broker Worker Loop
loop:
1. Sleep until next tick (shared scheduler_interval_ms).
2. Lock BrokerRegistry (read) to collect unique source OIDs.
3. For each unique source OID, run:
SELECT COUNT(*), MAX(lsn) FROM pgtrickle_changes.changes_{oid}
4. Write results to BrokerScanResults (lock-free CAS update).
5. Advance broker tick counter.
6. Per-DB schedulers wake and consume results.
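Steps 2–4 of the loop can be sketched as a single tick function. The `scan` closure stands in for the real SPI query (`SELECT COUNT(*), MAX(lsn) FROM pgtrickle_changes.changes_{oid}`); the `broker_tick` name and `HashMap` result store are illustrative assumptions, not the shared-memory implementation:

```rust
use std::collections::HashMap;

/// Result of one change-buffer scan, as the broker would publish it.
#[derive(Clone, Copy)]
struct ScanResult {
    pending_rows: i64,
    max_lsn_u64: u64,
    last_updated_tick: u64,
}

/// One broker tick over the set of unique source OIDs collected from the
/// registry. Each source is scanned exactly once, then the result is
/// published tagged with the current tick.
fn broker_tick(
    tick: u64,
    sources: &[u32],
    scan: impl Fn(u32) -> (i64, u64),
    results: &mut HashMap<u32, ScanResult>,
) {
    for &oid in sources {
        let (pending_rows, max_lsn_u64) = scan(oid);
        results.insert(oid, ScanResult { pending_rows, max_lsn_u64, last_updated_tick: tick });
    }
}

fn main() {
    let mut results = HashMap::new();
    // Fake scan: source 12345 has 7 pending rows, max LSN 1000.
    broker_tick(1, &[12345], |_| (7, 1000), &mut results);
    assert_eq!(results[&12345].pending_rows, 7);
    assert_eq!(results[&12345].last_updated_tick, 1);
}
```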
Integration with Per-DB Schedulers
In src/scheduler.rs, `has_table_source_changes` would gain a fast path:

```rust
fn has_table_source_changes(st: &StreamTableMeta) -> bool {
    // Fast path: consume the broker's published result when the
    // pg_trickle.enable_refresh_broker GUC is on.
    if config::pg_trickle_enable_refresh_broker() {
        if let Some(result) = broker::get_scan_result(st.source_oid) {
            return result.pending_rows > 0;
        }
    }
    // Fallback: direct SPI query (current behaviour).
    // ...
}
```
Open Questions
- Transaction isolation: the broker scans in its own transaction, so per-DB schedulers that read its results are using data from a different snapshot. Is this acceptable? (Short answer: yes — the existing behaviour already has a tick-window delay between when changes are written and when they are consumed.)
- Cross-database connectivity: the broker must connect to each database to read its change buffers, but PostgreSQL background workers connect to a specific database. We may need a pool of broker workers, one per database, coordinated by a shared-memory rendezvous point.
- Authorization: the broker needs read access to `pgtrickle_changes.*` in each database. This is satisfied by `shared_preload_libraries` + `SECURITY DEFINER` wrapper functions.
- Failure isolation: if the broker crashes, per-DB schedulers must detect the absence of fresh results and fall back to direct scans within one tick.
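The failure-isolation point suggests a simple freshness check: a result is only trusted if the broker updated it within the last tick. A minimal sketch, assuming a `last_updated_tick` field as in `BrokerEntry` (the `should_use_broker` name is hypothetical):

```rust
/// A published broker result carries the tick at which it was written.
struct BrokerResult {
    pending_rows: i64,
    last_updated_tick: u64,
}

/// A per-DB scheduler uses the broker's result only if it was refreshed
/// within the last tick; otherwise it falls back to a direct SPI scan.
fn should_use_broker(result: Option<&BrokerResult>, current_tick: u64) -> bool {
    match result {
        Some(r) => current_tick.saturating_sub(r.last_updated_tick) <= 1,
        None => false, // broker never ran for this source
    }
}

fn main() {
    let fresh = BrokerResult { pending_rows: 3, last_updated_tick: 41 };
    let stale = BrokerResult { pending_rows: 3, last_updated_tick: 30 };
    assert!(should_use_broker(Some(&fresh), 42));
    assert!(!should_use_broker(Some(&stale), 42)); // fall back to direct scan
    assert!(!should_use_broker(None, 42));
}
```

Because the fallback triggers on staleness rather than on an explicit crash signal, a dead broker degrades to the current direct-scan behaviour within one tick with no coordination required.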
Next Steps (v0.32.0+)
- Implement `BrokerRegistry` shared memory struct + `init_shared_memory` hook
- Implement broker background worker registration
- Add `broker::get_scan_result` fast path in `has_table_source_changes`
- Add `pg_trickle.enable_refresh_broker` GUC (default `false` until stable)
- Add E2E test: two databases sharing a source, broker reduces SPI queries to 1×
- Benchmark: 10 databases × 5 shared sources — measure scan reduction
Filed under v0.31.0 (SCAL-2). ADR reference: see plans/adrs/ for architectural rationale on trigger-based vs broker-based CDC.