# pg_trickle Limitations
This document covers what pg_trickle cannot do, the constraints of DIFFERENTIAL mode, source table restrictions, and operational anti-patterns. Use the decision tree at the end to quickly determine if your use case is supported.
## Unsupported SQL Constructs

The following SQL features are restricted in the defining query of a stream table. Constructs in the first table below force a fallback to FULL refresh; constructs in the second produce an `UnsupportedOperator` error at creation time or at the first refresh attempt.
### In DIFFERENTIAL mode only

These constructs force a fallback to FULL refresh. The stream table is created successfully, but every refresh performs a full table recomputation:

| Construct | Reason | Workaround |
|---|---|---|
| `ORDER BY` without `LIMIT` | Result ordering is non-deterministic as a delta | Remove `ORDER BY`, or use `ORDER BY ... LIMIT N` for Top-N views |
| `TABLESAMPLE` | Non-deterministic sampling cannot be differentiated | Use FULL mode explicitly |
| Volatile functions in SELECT | `random()`, `clock_timestamp()`, `nextval()` change on every call (`now()`, though STABLE, changes between refreshes and is treated the same) | Pre-compute volatile values in the source table, or use FULL mode |
| STABLE functions in GROUP BY keys | Key values can change between refresh cycles | Use IMMUTABLE functions in grouping keys |
| Window functions in output | `ROW_NUMBER()`, `RANK()`, `LEAD()`, `LAG()` require global reordering | Use FULL mode, or pre-aggregate and use Top-N views |
| `FETCH FIRST` without `ORDER BY` | Non-deterministic selection | Add a deterministic `ORDER BY` and use Top-N via `LIMIT` |
| GROUPING SETS beyond branch limit | Branch explosion prevents O(Δ) maintenance | Reduce dimensions, or raise `pg_trickle.max_grouping_set_branches` |
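The `ORDER BY` fallback has a cheap escape hatch: bounding the result turns the query into a Top-N view that DIFFERENTIAL mode can maintain. A minimal sketch, assuming a hypothetical `orders(customer_id, amount)` source table:

```sql
-- Forces FULL refresh: ordered but unbounded output.
-- SELECT customer_id, SUM(amount) AS total
-- FROM orders GROUP BY customer_id ORDER BY total DESC;

-- Maintainable as a Top-N view in DIFFERENTIAL mode:
SELECT customer_id, SUM(amount) AS total
FROM orders
GROUP BY customer_id
ORDER BY total DESC
LIMIT 10;
```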
### Not supported at all (any mode)

| Construct | Reason |
|---|---|
| `WITH RECURSIVE` | Recursive CTEs require convergence logic not yet implemented |
| DDL inside the defining query | `CREATE TABLE`, `CALL`, etc. are not valid in SELECT |
| `RETURNING` clauses | Not applicable to SELECT queries |
| `FOR UPDATE` / `FOR SHARE` | Locking clauses cannot be used in defining queries |
| Subqueries with side effects | e.g. `INSERT ... RETURNING` in subqueries |
| `pg_catalog` internal tables as sources | Internal catalog tables are not tracked by CDC |
| Temp tables as sources | Temporary tables are session-scoped; CDC triggers cannot be installed |
## DIFFERENTIAL Mode Constraints

### Source table requirements

For DIFFERENTIAL mode to work correctly, each source table must:

- Have a primary key or unique index on the columns used as join keys. Without a reliable row identity, the MERGE step cannot match old and new versions of a row. pg_trickle can fall back to a hash-based row ID (`__pgt_row_id`) for sources without primary keys, but this adds overhead.
- Not use UNLOGGED or TEMPORARY storage for the stream table output. The stream table must survive a crash-recovery cycle. Source tables can be UNLOGGED (changes are still captured by triggers).
- Not be altered concurrently in ways that change column structure while a refresh is running. pg_trickle blocks source DDL by default (`pg_trickle.block_source_ddl = true`). Disabling this risks schema inconsistency between the change buffer and the stream table.
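To avoid the `__pgt_row_id` fallback, give key-less sources a real identity before creating the stream table. A sketch with hypothetical table and column names:

```sql
-- Option 1: add a surrogate primary key.
ALTER TABLE events ADD COLUMN id bigint GENERATED ALWAYS AS IDENTITY;
ALTER TABLE events ADD CONSTRAINT events_pkey PRIMARY KEY (id);

-- Option 2: declare an existing natural key unique, so the MERGE step
-- can match old and new versions of a row.
CREATE UNIQUE INDEX CONCURRENTLY events_natural_key
    ON events (device_id, recorded_at);
```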
### Multi-source join constraints

When a defining query joins multiple source tables:

- All join keys must be equi-joins (e.g., `t1.id = t2.id`). Range joins (`t1.ts BETWEEN t2.start AND t2.end`) force FULL mode.
- The number of delta CTEs grows with the number of sources. Queries joining 5+ large tables may hit the `pg_trickle.max_diff_ctes` limit. The default limit is 200; raise it or simplify the query.
- Left outer joins with nullable right-side keys add correctness complexity. pg_trickle handles them correctly, but the delta SQL is larger.
### Aggregate constraints

| Aggregate | Supported? | Notes |
|---|---|---|
| `COUNT(*)` | ✅ Yes | Fully algebraic |
| `SUM(x)` | ✅ Yes | Fully algebraic |
| `MIN(x)`, `MAX(x)` | ✅ Yes | With reference counting |
| `AVG(x)` | ✅ Yes | Via sum + count decomposition |
| `STDDEV(x)`, `VARIANCE(x)` | ✅ Yes | Via sum-of-squares decomposition |
| `COUNT(DISTINCT x)` | ✅ Yes | Via Z-set algebraic counting |
| `ARRAY_AGG(x)` | ❌ No | Order-dependent; use FULL mode |
| `STRING_AGG(x, sep)` | ❌ No | Order-dependent; use FULL mode |
| `JSON_AGG(x)` | ❌ No | Order-dependent; use FULL mode |
| `PERCENTILE_CONT(f) WITHIN GROUP (ORDER BY x)` | ❌ No | Requires global sort |
| `MODE()` | ❌ No | Requires global frequency computation |
| Custom user-defined aggregates | ⚠️ Maybe | Supported if the aggregate provides an `sfunc` + `finalfunc` pair that pg_trickle can decompose; aggregates marked STRICT are rejected |
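For the "Maybe" row, the shape pg_trickle inspects is the standard PostgreSQL `CREATE AGGREGATE` definition. A sketch of a non-STRICT sum-of-squares aggregate with the `sfunc` + `finalfunc` pair described above; whether pg_trickle accepts any given aggregate still depends on its own decomposition analysis:

```sql
-- State transition function: deliberately not STRICT (see table above).
CREATE FUNCTION sumsq_sfunc(state double precision, x double precision)
RETURNS double precision
LANGUAGE sql IMMUTABLE
AS $$ SELECT coalesce(state, 0) + coalesce(x * x, 0) $$;

-- Final function: here it simply passes the state through.
CREATE FUNCTION sumsq_final(state double precision)
RETURNS double precision
LANGUAGE sql IMMUTABLE
AS $$ SELECT state $$;

CREATE AGGREGATE sumsq(double precision) (
    SFUNC     = sumsq_sfunc,
    STYPE     = double precision,
    FINALFUNC = sumsq_final,
    INITCOND  = '0'
);
```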
## Source Table Restrictions

### Supported source types
| Source type | Supported? | Notes |
|---|---|---|
| Regular heap tables | ✅ Yes | Full CDC support |
| Partitioned tables (declarative) | ✅ Yes | Triggers installed on each partition |
| Foreign tables (postgres_fdw) | ✅ Yes | Snapshot-comparison mode |
| Materialized views | ✅ Yes | Snapshot-comparison mode |
| Other stream tables | ✅ Yes | DAG chaining supported |
| UNLOGGED tables | ✅ Yes (source) | Changes captured; stream table output must be logged |
| Temporary tables | ❌ No | Session-scoped; CDC triggers cannot persist |
| System catalogs (`pg_class`, etc.) | ❌ No | Not tracked by CDC |
| Views (non-materialized) | ❌ No | Automatically inlined; the base tables become the sources |
| Remote tables via dblink | ⚠️ Limited | Use foreign tables via postgres_fdw instead |
### Column type constraints

- `text`-typed columns work as join keys, but are less efficient than integer or UUID keys. Use an index on the join key columns.
- `jsonb` columns in GROUP BY are supported, but hash joins on JSONB are expensive. Consider extracting the key sub-field.
- `bytea` columns work in the output but cannot be used as GROUP BY keys in DIFFERENTIAL mode.
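One way to apply the `jsonb` advice is a generated column, so the grouping key becomes a plain indexed scalar. Table and field names here are illustrative:

```sql
-- Extract the grouping key once, at write time:
ALTER TABLE events
    ADD COLUMN customer_id text
    GENERATED ALWAYS AS (payload->>'customer_id') STORED;
CREATE INDEX ON events (customer_id);

-- The defining query then groups on the scalar, not the jsonb value:
SELECT customer_id, count(*) AS n
FROM events
GROUP BY customer_id;
```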
## Operational Anti-Patterns
### Anti-pattern 1: Very high write rates with short schedules

**Problem:** If a source table receives 100K inserts/second and the stream table schedule is 1 second, the change buffer accumulates 100K rows per cycle. The DIFFERENTIAL delta SQL must process all 100K rows on every refresh, which may take longer than 1 second, causing the scheduler to fall behind.

**Fix:**

- Increase the schedule to allow batching: `schedule => '10s'`
- Enable the adaptive fallback: `pg_trickle.differential_max_change_ratio = 0.15` (default: fall back to FULL when more than 15% of the source table changed)
- Use `pg_trickle.max_delta_estimate_rows` to cap the delta size
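The GUC-based fixes might look like this; the values shown are illustrative, not recommendations:

```sql
-- Fall back to FULL when more than 15% of the source changed (the default):
ALTER SYSTEM SET pg_trickle.differential_max_change_ratio = 0.15;

-- Cap the estimated delta size per refresh (example value):
ALTER SYSTEM SET pg_trickle.max_delta_estimate_rows = 500000;

SELECT pg_reload_conf();
```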
### Anti-pattern 2: Unbounded DAG depth

**Problem:** A chain of 20+ stream tables where each depends on the previous creates O(depth) sequential refresh latency on every cycle.

**Fix:** Flatten the DAG where possible. Use parallel refresh (`pg_trickle.parallel_refresh_mode = 'on'`) for independent branches. Consider whether intermediate stream tables are necessary.
### Anti-pattern 3: Schema changes on live sources

**Problem:** `ALTER TABLE ... DROP COLUMN` on a source table while pg_trickle is running will break the change buffer schema and cause refresh errors.

**Fix:** Keep `pg_trickle.block_source_ddl = true` (the default). This causes schema changes to fail with a descriptive error; you can then update the stream table query explicitly before re-applying the schema change.
### Anti-pattern 4: Treating stream tables as application write targets

**Problem:** Inserting or updating rows directly in a stream table bypasses pg_trickle's refresh logic. On the next refresh, the direct writes will be overwritten.

**Fix:** Stream tables are read-only from the application's perspective. All writes must go through the source tables. Use the `pgtrickle.repair_stream_table()` function if a stream table gets into an inconsistent state.
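If direct writes have already happened, the repair function named above can rebuild the stream table. The argument form shown here is an assumption; check the function's actual signature:

```sql
-- Rebuild a stream table that direct writes left inconsistent.
-- 'daily_totals' is an example name; the single-name argument is a guess
-- (inspect with \df pgtrickle.* in psql).
SELECT pgtrickle.repair_stream_table('daily_totals');
```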
### Anti-pattern 5: Using `pg_trickle.enabled = false` in production

**Problem:** Setting `pg_trickle.enabled = false` globally stops all refreshes. Change buffers accumulate indefinitely. Re-enabling causes a burst refresh of all stream tables simultaneously.

**Fix:** Use `pgtrickle.suspend_stream_table()` to pause individual stream tables, or `pg_trickle.drain_mode = true` to stop new work while completing in-flight refreshes.
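A sketch of the targeted alternatives, using the function and GUC named above ('daily_totals' is an example name):

```sql
-- Pause refreshes for one stream table only:
SELECT pgtrickle.suspend_stream_table('daily_totals');

-- Or stop scheduling new refreshes while in-flight ones finish:
SET pg_trickle.drain_mode = true;
```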
## "Will This Work?" Decision Tree

Work through these questions in order; the first "stop" answer decides the mode:

1. Does your query use window functions in the output?
   - YES → Use FULL mode (`refresh_mode => 'FULL'`)
2. Does your query use volatile functions (`random()`, `now()`, `nextval()`)?
   - YES → Use FULL mode, or pre-compute the volatile value in the source
3. Does your query use `ORDER BY` without `LIMIT`?
   - YES → Remove `ORDER BY`, or use `LIMIT N` for a Top-N stream table
4. Does your query use `WITH RECURSIVE`?
   - YES → Not supported. Materialize the recursive query into a regular table first.
5. Do all join keys use equi-join conditions (`=`, not `BETWEEN` / `>=`)?
   - NO → Use FULL mode, or rewrite the join condition
6. Does every source table have a primary key or unique index on the join key?
   - NO → pg_trickle will use hash-based row IDs (slightly less efficient, but works)

If nothing above stopped you: ✅ your query is a good candidate for DIFFERENTIAL mode. Use `refresh_mode => 'DIFFERENTIAL'` or `'AUTO'` (the default).
## Multi-Column NOT IN with Nullable Elements (COR-1, v0.58.0)

When a defining query contains a multi-column `NOT IN` subquery such as:

```sql
SELECT a, b FROM t
WHERE (a, b) NOT IN (SELECT x, y FROM s);
```

pg_trickle v0.55.0 introduced an optimisation that rewrites `(a, b) IN (SELECT x, y ...)` as a SemiJoin and `NOT IN` as an AntiJoin. However, SQL semantics for `NOT IN` differ from AntiJoin semantics when either side of the comparison can be NULL: SQL propagates UNKNOWN (which excludes the outer row), whereas an AntiJoin keeps the outer row.
**Behaviour:** When any element on the left-hand side of the row constructor is a NULL constant, or when any column in the subquery's SELECT list is a NULL literal, pg_trickle v0.58.0 detects this condition and falls back to the subquery-based (FULL refresh) execution path, emitting a NOTICE:

```
NOTICE:  pg_trickle: multi-column NOT IN with nullable elements cannot be
rewritten to an anti-join; falling back to subquery-based delta computation.
```
**Workaround:** Rewrite using `NOT EXISTS`, or add explicit `IS NOT NULL` guards to avoid NULL-producing expressions in the row constructor.
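Both workarounds applied to the example query above. Note that `NOT EXISTS` treats NULLs as non-matching instead of propagating UNKNOWN, which is usually the behaviour you want:

```sql
-- Workaround 1: NOT EXISTS, which can be planned as an anti-join.
SELECT t.a, t.b
FROM t
WHERE NOT EXISTS (
    SELECT 1 FROM s WHERE s.x = t.a AND s.y = t.b
);

-- Workaround 2: keep NOT IN, but guard every NULL-producing side.
SELECT a, b
FROM t
WHERE a IS NOT NULL AND b IS NOT NULL
  AND (a, b) NOT IN (
      SELECT x, y FROM s
      WHERE x IS NOT NULL AND y IS NOT NULL
  );
```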
## Known Future Improvements

| Limitation | Planned in |
|---|---|
| Window functions in output | v1.1+ |
| `WITH RECURSIVE` support | v1.2+ |
| `STRING_AGG` / `ARRAY_AGG` incremental maintenance | Researching |
| Cross-database stream tables (without foreign tables) | Not planned |
## See Also
- MENTAL_MODEL.md — Conceptual overview of how IVM works
- CONFIGURATION.md — GUC options to tune limits
- TROUBLESHOOTING.md — Diagnosing refresh problems
- ERRORS.md — Error reference