# pg_trickle Limitations
This document covers what pg_trickle cannot do, the constraints of DIFFERENTIAL mode, source table restrictions, and operational anti-patterns. Use the decision tree at the end to quickly determine if your use case is supported.
## Unsupported SQL Constructs

The following SQL features are restricted in the defining query of a stream table. Constructs in the first table below force a fallback to FULL refresh; constructs in the second produce an `UnsupportedOperator` error at creation time or at the first refresh attempt.
### In DIFFERENTIAL mode only

These constructs force a fallback to FULL refresh. The stream table is created successfully, but every refresh performs a full table recomputation:

| Construct | Reason | Workaround |
|---|---|---|
| `ORDER BY` without `LIMIT` | Result ordering is non-deterministic as a delta | Remove `ORDER BY`, or use `ORDER BY ... LIMIT N` for Top-N views |
| `TABLESAMPLE` | Non-deterministic sampling cannot be differentiated | Use FULL mode explicitly |
| Volatile functions in SELECT | `random()`, `clock_timestamp()`, `nextval()` change on every call (`now()`, though STABLE, changes between refreshes and is treated the same) | Pre-compute volatile values in the source table, or use FULL mode |
| STABLE functions in GROUP BY keys | Key values can change between refresh cycles | Use IMMUTABLE functions in grouping keys |
| Window functions in output | `ROW_NUMBER()`, `RANK()`, `LEAD()`, `LAG()` require global reordering | Use FULL mode, or pre-aggregate and use Top-N views |
| `FETCH FIRST` without `ORDER BY` | Non-deterministic selection | Add a deterministic `ORDER BY` and use Top-N via `LIMIT` |
| GROUPING SETS beyond branch limit | Branch explosion prevents O(Δ) maintenance | Reduce dimensions, or raise `pg_trickle.max_grouping_set_branches` |
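The `ORDER BY` fallback has a cheap escape hatch: bounding the result turns the query into a Top-N view that DIFFERENTIAL mode can maintain. A minimal sketch, assuming a hypothetical `orders(customer_id, amount)` source table:

```sql
-- Forces FULL refresh: ordered but unbounded output.
-- SELECT customer_id, SUM(amount) AS total
-- FROM orders GROUP BY customer_id ORDER BY total DESC;

-- Maintainable as a Top-N view in DIFFERENTIAL mode:
SELECT customer_id, SUM(amount) AS total
FROM orders
GROUP BY customer_id
ORDER BY total DESC
LIMIT 10;
```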
### Not supported at all (any mode)

| Construct | Reason |
|---|---|
| `WITH RECURSIVE` | Recursive CTEs require convergence logic not yet implemented |
| DDL inside the defining query | `CREATE TABLE`, `CALL`, etc. are not valid in SELECT |
| `RETURNING` clauses | Not applicable to SELECT queries |
| `FOR UPDATE` / `FOR SHARE` | Locking clauses cannot be used in defining queries |
| Subqueries with side effects | e.g. `INSERT ... RETURNING` in subqueries |
| `pg_catalog` internal tables as sources | Internal catalog tables are not tracked by CDC |
| Temp tables as sources | Temporary tables are session-scoped; CDC triggers cannot be installed |
## DIFFERENTIAL Mode Constraints

### Source table requirements

For DIFFERENTIAL mode to work correctly, each source table must:

- Have a primary key or unique index on the columns used as join keys. Without a reliable row identity, the MERGE step cannot match old and new versions of a row. pg_trickle can fall back to a hash-based row ID (`__pgt_row_id`) for sources without primary keys, but this adds overhead.
- Not use UNLOGGED or TEMPORARY storage for the stream table output. The stream table must survive a crash-recovery cycle. Source tables can be UNLOGGED (changes are still captured by triggers).
- Not be altered concurrently in ways that change column structure while a refresh is running. pg_trickle blocks source DDL by default (`pg_trickle.block_source_ddl = true`). Disabling this risks schema inconsistency between the change buffer and the stream table.
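To avoid the `__pgt_row_id` fallback, give key-less sources a real identity before creating the stream table. A sketch with hypothetical table and column names:

```sql
-- Option 1: add a surrogate primary key.
ALTER TABLE events ADD COLUMN id bigint GENERATED ALWAYS AS IDENTITY;
ALTER TABLE events ADD CONSTRAINT events_pkey PRIMARY KEY (id);

-- Option 2: declare an existing natural key unique, so the MERGE step
-- can match old and new versions of a row.
CREATE UNIQUE INDEX CONCURRENTLY events_natural_key
    ON events (device_id, recorded_at);
```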
### Multi-source join constraints

When a defining query joins multiple source tables:

- All join keys must be equi-joins (e.g., `t1.id = t2.id`). Range joins (`t1.ts BETWEEN t2.start AND t2.end`) force FULL mode.
- The number of delta CTEs grows with the number of sources. Queries joining 5+ large tables may hit the `pg_trickle.max_diff_ctes` limit. The default limit is 200; raise it or simplify the query.
- Left outer joins with nullable right-side keys add correctness complexity. pg_trickle handles them correctly, but the delta SQL is larger.
### Aggregate constraints

| Aggregate | Supported? | Notes |
|---|---|---|
| `COUNT(*)` | ✅ Yes | Fully algebraic |
| `SUM(x)` | ✅ Yes | Fully algebraic |
| `MIN(x)`, `MAX(x)` | ✅ Yes | With reference counting |
| `AVG(x)` | ✅ Yes | Via sum + count decomposition |
| `STDDEV(x)`, `VARIANCE(x)` | ✅ Yes | Via sum-of-squares decomposition |
| `COUNT(DISTINCT x)` | ✅ Yes | Via Z-set algebraic counting |
| `ARRAY_AGG(x)` | ❌ No | Order-dependent; use FULL mode |
| `STRING_AGG(x, sep)` | ❌ No | Order-dependent; use FULL mode |
| `JSON_AGG(x)` | ❌ No | Order-dependent; use FULL mode |
| `PERCENTILE_CONT(f) WITHIN GROUP (ORDER BY x)` | ❌ No | Requires global sort |
| `MODE()` | ❌ No | Requires global frequency computation |
| Custom user-defined aggregates | ⚠️ Maybe | Supported if the aggregate provides an `sfunc` + `finalfunc` pair that pg_trickle can decompose; aggregates marked STRICT are rejected |
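For the "Maybe" row, the shape pg_trickle inspects is the standard PostgreSQL `CREATE AGGREGATE` definition. A sketch of a non-STRICT sum-of-squares aggregate with the `sfunc` + `finalfunc` pair described above; whether pg_trickle accepts any given aggregate still depends on its own decomposition analysis:

```sql
-- State transition function: deliberately not STRICT (see table above).
CREATE FUNCTION sumsq_sfunc(state double precision, x double precision)
RETURNS double precision
LANGUAGE sql IMMUTABLE
AS $$ SELECT coalesce(state, 0) + coalesce(x * x, 0) $$;

-- Final function: here it simply passes the state through.
CREATE FUNCTION sumsq_final(state double precision)
RETURNS double precision
LANGUAGE sql IMMUTABLE
AS $$ SELECT state $$;

CREATE AGGREGATE sumsq(double precision) (
    SFUNC     = sumsq_sfunc,
    STYPE     = double precision,
    FINALFUNC = sumsq_final,
    INITCOND  = '0'
);
```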
## Source Table Restrictions

### Supported source types
| Source type | Supported? | Notes |
|---|---|---|
| Regular heap tables | ✅ Yes | Full CDC support |
| Partitioned tables (declarative) | ✅ Yes | Triggers installed on each partition |
| Foreign tables (postgres_fdw) | ✅ Yes | Snapshot-comparison mode |
| Materialized views | ✅ Yes | Snapshot-comparison mode |
| Other stream tables | ✅ Yes | DAG chaining supported |
| UNLOGGED tables | ✅ Yes (source) | Changes captured; stream table output must be logged |
| Temporary tables | ❌ No | Session-scoped; CDC triggers cannot persist |
| System catalogs (`pg_class`, etc.) | ❌ No | Not tracked by CDC |
| Views (non-materialized) | ❌ No | Automatically inlined; the base tables become the sources |
| Remote tables via dblink | ⚠️ Limited | Use foreign tables via postgres_fdw instead |
### Column type constraints

- `text`-typed columns work as join keys, but are less efficient than integer or UUID keys. Use an index on the join key columns.
- `jsonb` columns in GROUP BY are supported, but hash joins on JSONB are expensive. Consider extracting the key sub-field.
- `bytea` columns work in the output but cannot be used as GROUP BY keys in DIFFERENTIAL mode.
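One way to apply the `jsonb` advice is a generated column, so the grouping key becomes a plain indexed scalar. Table and field names here are illustrative:

```sql
-- Extract the grouping key once, at write time:
ALTER TABLE events
    ADD COLUMN customer_id text
    GENERATED ALWAYS AS (payload->>'customer_id') STORED;
CREATE INDEX ON events (customer_id);

-- The defining query then groups on the scalar, not the jsonb value:
SELECT customer_id, count(*) AS n
FROM events
GROUP BY customer_id;
```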
## Operational Anti-Patterns
### Anti-pattern 1: Very high write rates with short schedules

**Problem:** If a source table receives 100K inserts/second and the stream table schedule is 1 second, the change buffer accumulates 100K rows per cycle. The DIFFERENTIAL delta SQL must process all 100K rows on every refresh, which may take longer than 1 second, causing the scheduler to fall behind.

**Fix:**

- Increase the schedule to allow batching: `schedule => '10s'`
- Enable the adaptive fallback: `pg_trickle.differential_max_change_ratio = 0.15` (default: fall back to FULL when more than 15% of the source table changed)
- Use `pg_trickle.max_delta_estimate_rows` to cap the delta size
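The GUC-based fixes might look like this; the values shown are illustrative, not recommendations:

```sql
-- Fall back to FULL when more than 15% of the source changed (the default):
ALTER SYSTEM SET pg_trickle.differential_max_change_ratio = 0.15;

-- Cap the estimated delta size per refresh (example value):
ALTER SYSTEM SET pg_trickle.max_delta_estimate_rows = 500000;

SELECT pg_reload_conf();
```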
### Anti-pattern 2: Unbounded DAG depth

**Problem:** A chain of 20+ stream tables where each depends on the previous creates O(depth) sequential refresh latency on every cycle.

**Fix:** Flatten the DAG where possible. Use parallel refresh (`pg_trickle.parallel_refresh_mode = 'on'`) for independent branches. Consider whether intermediate stream tables are necessary.
### Anti-pattern 3: Schema changes on live sources

**Problem:** `ALTER TABLE ... DROP COLUMN` on a source table while pg_trickle is running will break the change buffer schema and cause refresh errors.

**Fix:** Keep `pg_trickle.block_source_ddl = true` (the default). This causes schema changes to fail with a descriptive error; you can then update the stream table query explicitly before re-applying the schema change.
### Anti-pattern 4: Treating stream tables as application write targets

**Problem:** Inserting or updating rows directly in a stream table bypasses pg_trickle's refresh logic. On the next refresh, the direct writes will be overwritten.

**Fix:** Stream tables are read-only from the application's perspective. All writes must go through the source tables. Use the `pgtrickle.repair_stream_table()` function if a stream table gets into an inconsistent state.
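If direct writes have already happened, the repair function named above can rebuild the stream table. The argument form shown here is an assumption; check the function's actual signature:

```sql
-- Rebuild a stream table that direct writes left inconsistent.
-- 'daily_totals' is an example name; the single-name argument is a guess
-- (inspect with \df pgtrickle.* in psql).
SELECT pgtrickle.repair_stream_table('daily_totals');
```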
### Anti-pattern 5: Using `pg_trickle.enabled = false` in production

**Problem:** Setting `pg_trickle.enabled = false` globally stops all refreshes. Change buffers accumulate indefinitely. Re-enabling causes a burst refresh of all stream tables simultaneously.

**Fix:** Use `pgtrickle.suspend_stream_table()` to pause individual stream tables, or `pg_trickle.drain_mode = true` to stop new work while completing in-flight refreshes.
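A sketch of the targeted alternatives, using the function and GUC named above ('daily_totals' is an example name):

```sql
-- Pause refreshes for one stream table only:
SELECT pgtrickle.suspend_stream_table('daily_totals');

-- Or stop scheduling new refreshes while in-flight ones finish:
SET pg_trickle.drain_mode = true;
```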
## "Will This Work?" Decision Tree

Work through these questions in order; the first "stop" answer decides the mode:

1. Does your query use window functions in the output?
   - YES → Use FULL mode (`refresh_mode => 'FULL'`)
2. Does your query use volatile functions (`random()`, `now()`, `nextval()`)?
   - YES → Use FULL mode, or pre-compute the volatile value in the source
3. Does your query use `ORDER BY` without `LIMIT`?
   - YES → Remove `ORDER BY`, or use `LIMIT N` for a Top-N stream table
4. Does your query use `WITH RECURSIVE`?
   - YES → Not supported. Materialize the recursive query into a regular table first.
5. Do all join keys use equi-join conditions (`=`, not `BETWEEN` / `>=`)?
   - NO → Use FULL mode, or rewrite the join condition
6. Does every source table have a primary key or unique index on the join key?
   - NO → pg_trickle will use hash-based row IDs (slightly less efficient, but works)

If nothing above stopped you: ✅ your query is a good candidate for DIFFERENTIAL mode. Use `refresh_mode => 'DIFFERENTIAL'` or `'AUTO'` (the default).
## Multi-Column NOT IN with Nullable Elements (COR-1, v0.58.0)

When a defining query contains a multi-column `NOT IN` subquery such as:

```sql
SELECT a, b FROM t
WHERE (a, b) NOT IN (SELECT x, y FROM s);
```

pg_trickle v0.55.0 introduced an optimisation that rewrites `(a, b) IN (SELECT x, y ...)` as a SemiJoin and `NOT IN` as an AntiJoin. However, SQL semantics for `NOT IN` differ from AntiJoin semantics when either side of the comparison can be NULL: SQL propagates UNKNOWN (which excludes the outer row), whereas an AntiJoin keeps the outer row.
**Behaviour:** When any element on the left-hand side of the row constructor is a NULL constant, or when any column in the subquery's SELECT list is a NULL literal, pg_trickle v0.58.0 detects this condition and falls back to the subquery-based (FULL refresh) execution path, emitting a NOTICE:

```
NOTICE:  pg_trickle: multi-column NOT IN with nullable elements cannot be
rewritten to an anti-join; falling back to subquery-based delta computation.
```
**Workaround:** Rewrite using `NOT EXISTS`, or add explicit `IS NOT NULL` guards to avoid NULL-producing expressions in the row constructor.
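Both workarounds applied to the example query above. Note that `NOT EXISTS` treats NULLs as non-matching instead of propagating UNKNOWN, which is usually the behaviour you want:

```sql
-- Workaround 1: NOT EXISTS, which can be planned as an anti-join.
SELECT t.a, t.b
FROM t
WHERE NOT EXISTS (
    SELECT 1 FROM s WHERE s.x = t.a AND s.y = t.b
);

-- Workaround 2: keep NOT IN, but guard every NULL-producing side.
SELECT a, b
FROM t
WHERE a IS NOT NULL AND b IS NOT NULL
  AND (a, b) NOT IN (
      SELECT x, y FROM s
      WHERE x IS NOT NULL AND y IS NOT NULL
  );
```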
## Known Future Improvements

| Limitation | Planned in |
|---|---|
| Window functions in output | v1.1+ |
| `WITH RECURSIVE` support | v1.2+ |
| `STRING_AGG` / `ARRAY_AGG` incremental maintenance | Researching |
| Cross-database stream tables (without foreign tables) | Not planned |
## See Also
- MENTAL_MODEL.md — Conceptual overview of how IVM works
- CONFIGURATION.md — GUC options to tune limits
- TROUBLESHOOTING.md — Diagnosing refresh problems
- ERRORS.md — Error reference