pg_trickle Limitations

This document covers what pg_trickle cannot do, the constraints of DIFFERENTIAL mode, source table restrictions, and operational anti-patterns. Use the decision tree at the end to quickly determine if your use case is supported.


Unsupported SQL Constructs

The following SQL features are not supported in the defining query of a stream table. Attempting to use them will produce an UnsupportedOperator error at creation time, or at the first refresh attempt.

In DIFFERENTIAL mode only

These constructs force a fallback to FULL refresh. The stream table is created successfully, but every refresh performs a full table recomputation:

| Construct | Reason | Workaround |
|---|---|---|
| ORDER BY without LIMIT | Result ordering is non-deterministic as a delta | Remove the ORDER BY, or use ORDER BY ... LIMIT N for Top-N views |
| TABLESAMPLE | Non-deterministic sampling cannot be differentiated | Use FULL mode explicitly |
| VOLATILE functions in SELECT | random(), now(), clock_timestamp(), nextval() change on every call | Pre-compute volatile values in the source table, or use FULL mode |
| STABLE functions in GROUP BY keys | Key values can change between refresh cycles | Use IMMUTABLE functions in grouping keys |
| Window functions in output | ROW_NUMBER(), RANK(), LEAD(), LAG() require global reordering | Use FULL mode, or pre-aggregate and use Top-N views |
| FETCH FIRST without ORDER BY | Non-deterministic row selection | Add a deterministic ORDER BY and use Top-N via LIMIT |
| GROUPING SETS beyond branch limit | Branch explosion prevents O(Δ) maintenance | Reduce dimensions, or raise pg_trickle.max_grouping_set_branches |
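The volatile-function workaround from the table above can be sketched as follows; the table and column names are illustrative, not part of pg_trickle:

```sql
-- Calling now() inside the defining query would force FULL mode.
-- Capture the timestamp once, at write time, in the source instead:
ALTER TABLE orders
    ADD COLUMN created_at timestamptz NOT NULL DEFAULT now();

-- The defining query then reads the stored value, which never changes:
--   SELECT order_id, created_at FROM orders WHERE ...
```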

Not supported at all (any mode)

| Construct | Reason |
|---|---|
| WITH RECURSIVE | Recursive CTEs require convergence logic not yet implemented |
| DDL inside the defining query | CREATE TABLE, CALL, etc. are not valid in SELECT |
| RETURNING clauses | Not applicable to SELECT queries |
| FOR UPDATE / FOR SHARE | Locking clauses cannot be used in defining queries |
| Subqueries with side effects | e.g. INSERT ... RETURNING inside a subquery |
| pg_catalog internal tables as sources | Internal catalog tables are not tracked by CDC |
| Temp tables as sources | Temporary tables are session-scoped; CDC triggers cannot be installed |

DIFFERENTIAL Mode Constraints

Source table requirements

For DIFFERENTIAL mode to work correctly, each source table must:

  1. Have a primary key or unique index on the columns used as join keys. Without a reliable row identity, the MERGE step cannot match old and new versions of a row. pg_trickle can fall back to a hash-based row ID (__pgt_row_id) for sources without primary keys, but this adds overhead.

  2. Not use UNLOGGED or TEMPORARY storage for the stream table output. The stream table must survive a crash-recovery cycle. Source tables can be UNLOGGED (changes are still captured by triggers).

  3. Not be altered concurrently in ways that change column structure while a refresh is running. pg_trickle blocks source DDL by default (pg_trickle.block_source_ddl = true). Disabling this risks schema inconsistency between the change buffer and the stream table.
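Requirement 1 in practice: giving each source an explicit primary key keeps the MERGE step on the fast path. A minimal sketch, with illustrative table and column names:

```sql
-- With a primary key, MERGE matches old and new row versions directly,
-- avoiding the slower hash-based __pgt_row_id fallback.
CREATE TABLE events (
    event_id   bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    account_id bigint  NOT NULL,
    amount     numeric NOT NULL
);
```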

Multi-source join constraints

When a defining query joins multiple source tables:

  • All join keys must be equi-joins (e.g., t1.id = t2.id). Range joins (t1.ts BETWEEN t2.start AND t2.end) force FULL mode.
  • The number of delta CTEs grows with the number of sources. Queries joining 5+ large tables may hit the pg_trickle.max_diff_ctes limit. The default limit is 200; raise it or simplify the query.
  • Left outer joins with nullable right-side keys add correctness complexity. pg_trickle handles them correctly, but the delta SQL is larger.
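The equi-join rule above, illustrated with hypothetical tables:

```sql
-- Stays in DIFFERENTIAL mode: every join predicate is an equi-join.
SELECT o.order_id, c.region, o.amount
FROM orders o
JOIN customers c ON c.customer_id = o.customer_id;

-- Forces FULL mode: the join predicate is a range condition.
SELECT o.order_id, r.rate
FROM orders o
JOIN fx_rates r ON o.ts BETWEEN r.valid_from AND r.valid_to;
```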

Aggregate constraints

| Aggregate | Supported? | Notes |
|---|---|---|
| COUNT(*) | ✅ Yes | Fully algebraic |
| SUM(x) | ✅ Yes | Fully algebraic |
| MIN(x), MAX(x) | ✅ Yes | With reference counting |
| AVG(x) | ✅ Yes | Via sum + count decomposition |
| STDDEV(x), VARIANCE(x) | ✅ Yes | Via sum-of-squares decomposition |
| COUNT(DISTINCT x) | ✅ Yes | Via Z-set algebraic counting |
| ARRAY_AGG(x) | ❌ No | Order-dependent; use FULL mode |
| STRING_AGG(x, sep) | ❌ No | Order-dependent; use FULL mode |
| JSON_AGG(x) | ❌ No | Order-dependent; use FULL mode |
| PERCENTILE_CONT(f) WITHIN GROUP (ORDER BY x) | ❌ No | Requires a global sort |
| MODE() | ❌ No | Requires global frequency computation |
| Custom user-defined aggregates | ⚠️ Maybe | Supported if the aggregate provides an sfunc + finalfunc that pg_trickle can decompose; aggregates marked STRICT are rejected |
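The sum + count decomposition used for AVG(x) can be written out by hand. pg_trickle maintains the accumulators internally, but an equivalent manual query looks like this (table names are illustrative):

```sql
-- Each delta row adjusts two algebraic accumulators; the average is
-- re-derived from them at read time instead of being stored directly.
SELECT account_id,
       SUM(amount)                            AS sum_amount,
       COUNT(amount)                          AS n,
       SUM(amount) / NULLIF(COUNT(amount), 0) AS avg_amount
FROM transactions
GROUP BY account_id;
```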

Source Table Restrictions

Supported source types

| Source type | Supported? | Notes |
|---|---|---|
| Regular heap tables | ✅ Yes | Full CDC support |
| Partitioned tables (declarative) | ✅ Yes | Triggers installed on each partition |
| Foreign tables (postgres_fdw) | ✅ Yes | Snapshot-comparison mode |
| Materialized views | ✅ Yes | Snapshot-comparison mode |
| Other stream tables | ✅ Yes | DAG chaining supported |
| UNLOGGED tables | ✅ Yes (source) | Changes captured; stream table output must be logged |
| Temporary tables | ❌ No | Session-scoped; CDC triggers cannot persist |
| System catalogs (pg_class, etc.) | ❌ No | Not tracked by CDC |
| Views (non-materialized) | ❌ No | Automatically inlined; the base tables become the sources |
| Remote tables via dblink | ⚠️ Limited | Use foreign tables via postgres_fdw instead |

Column type constraints

  • text-typed columns named as join keys work, but are less efficient than integer or UUID keys. Use an index on the join key columns.
  • jsonb columns in GROUP BY are supported but hash joins on JSONB are expensive. Consider extracting the key sub-field.
  • bytea columns work in the output but cannot be used as GROUP BY keys in DIFFERENTIAL mode.
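Extracting a scalar key from a jsonb column, as suggested above (table and field names are illustrative):

```sql
-- Instead of GROUP BY payload (hashing the whole jsonb document),
-- group on the extracted text sub-field:
SELECT payload->>'tenant_id' AS tenant_id,
       COUNT(*)              AS event_count
FROM api_events
GROUP BY payload->>'tenant_id';
```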

Operational Anti-Patterns

Anti-pattern 1: Very high write rates with short refresh schedules

Problem: If a source table receives 100K inserts/second and the stream table schedule is 1 second, the change buffer accumulates 100K rows per cycle. The DIFFERENTIAL delta SQL must process all 100K rows on every refresh, which may take longer than 1 second — causing the scheduler to fall behind.

Fix:

  • Increase the schedule to allow batching: schedule => '10s'
  • Enable the adaptive fallback: pg_trickle.differential_max_change_ratio = 0.15 (default: fall back to FULL when > 15% of the source table changed)
  • Use pg_trickle.max_delta_estimate_rows to cap delta size
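The fixes above map onto settings this document already names. The schedule => '10s' parameter is set where the stream table is defined rather than as a GUC; the row cap below is an illustrative value, not a documented default:

```sql
-- Fall back to a FULL refresh once more than 15% of a source changed
-- in one cycle (the documented default ratio).
SET pg_trickle.differential_max_change_ratio = 0.15;

-- Cap how many buffered changes a single delta computation will attempt
-- (illustrative value).
SET pg_trickle.max_delta_estimate_rows = 500000;
```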

Anti-pattern 2: Unbounded DAG depth

Problem: A chain of 20+ stream tables where each depends on the previous creates O(depth) sequential refresh latency on every cycle.

Fix: Flatten the DAG where possible. Use parallel refresh (pg_trickle.parallel_refresh_mode = 'on') for independent branches. Consider whether intermediate stream tables are necessary.

Anti-pattern 3: Schema changes on live sources

Problem: ALTER TABLE ... DROP COLUMN on a source table while pg_trickle is running will break the change buffer schema and cause refresh errors.

Fix: Keep pg_trickle.block_source_ddl = true (the default). This causes schema changes to fail with a descriptive error; you can then update the stream table query explicitly before re-applying the schema change.
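With the default in place, a destructive schema change fails fast instead of silently desynchronising the change buffer. A sketch with illustrative table and column names:

```sql
-- The default; shown here only for explicitness.
SET pg_trickle.block_source_ddl = true;

-- Fails with a descriptive error while any stream table reads orders;
-- update the stream table's defining query first, then retry.
ALTER TABLE orders DROP COLUMN legacy_flag;
```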

Anti-pattern 4: Treating stream tables as application write targets

Problem: Inserting or updating rows directly in a stream table bypasses pg_trickle's refresh logic. On the next refresh, the direct writes will be overwritten.

Fix: Stream tables are read-only from the application's perspective. All writes must go through the source tables. Use the pgtrickle.repair_stream_table() function if a stream table gets into an inconsistent state.
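If direct writes have already happened, the repair function named above can rebuild the stream table from its defining query. The argument form is an assumption, and the stream-table name is illustrative:

```sql
-- Discard out-of-band writes and re-synchronise from the sources:
SELECT pgtrickle.repair_stream_table('daily_totals');
```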

Anti-pattern 5: Using pg_trickle.enabled = false in production

Problem: Setting pg_trickle.enabled = false globally stops all refreshes. Change buffers accumulate indefinitely. Re-enabling causes a burst refresh of all stream tables simultaneously.

Fix: Use pgtrickle.suspend_stream_table() to pause individual stream tables, or pg_trickle.drain_mode = true to stop new work while completing in-flight refreshes.
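Both alternatives use names this document already references; the stream-table name passed to suspend_stream_table is illustrative:

```sql
-- Pause a single stream table instead of disabling refreshes globally:
SELECT pgtrickle.suspend_stream_table('daily_totals');

-- Or stop scheduling new refreshes while in-flight ones finish:
SET pg_trickle.drain_mode = true;
```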


"Will This Work?" Decision Tree

Does your query use window functions in the output?
  YES → Use FULL mode (refresh_mode => 'FULL')
  NO  ↓

Does your query use volatile functions (random(), now(), nextval())?
  YES → Use FULL mode, or pre-compute the volatile value in the source
  NO  ↓

Does your query use ORDER BY without LIMIT?
  YES → Remove ORDER BY, or use LIMIT N for a Top-N stream table
  NO  ↓

Does your query use WITH RECURSIVE?
  YES → Not supported. Materialize the recursive query into a regular table first.
  NO  ↓

Do all join keys use equi-join conditions (=, not BETWEEN or >=)?
  NO  → Use FULL mode, or rewrite the join condition
  YES ↓

Does every source table have a primary key or unique index on the join key?
  NO  → pg_trickle will use hash-based row IDs (slightly less efficient, but works)
  YES ↓

✅ Your query is a good candidate for DIFFERENTIAL mode.
   Use: refresh_mode => 'DIFFERENTIAL' or 'AUTO' (default).
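This document shows the refresh_mode and schedule parameters but not the creation API itself, so the function name below is an assumption for illustration only:

```sql
-- Hypothetical creation call; pg_trickle's actual DDL may differ.
SELECT pgtrickle.create_stream_table(
    'daily_totals',
    $$ SELECT account_id, SUM(amount) AS total
       FROM transactions
       GROUP BY account_id $$,
    refresh_mode => 'DIFFERENTIAL',
    schedule     => '10s'
);
```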

Multi-Column NOT IN with Nullable Elements (COR-1, v0.58.0)

When a defining query contains a multi-column NOT IN subquery such as:

```sql
SELECT a, b FROM t
WHERE (a, b) NOT IN (SELECT x, y FROM s);
```

pg_trickle v0.55.0 introduced an optimisation that rewrites (a, b) IN (SELECT x, y …) as a SemiJoin and NOT IN as an AntiJoin. However, SQL semantics for NOT IN differ from AntiJoin semantics when either side of the comparison can be NULL: SQL propagates UNKNOWN (which excludes the outer row), whereas an AntiJoin keeps the outer row.

Behaviour: When any element on the left-hand side of the row constructor is a NULL constant, or when any column in the subquery's SELECT list is a NULL literal, pg_trickle v0.58.0 detects this condition and falls back to the subquery-based (FULL refresh) execution path, emitting a NOTICE:

NOTICE: pg_trickle: multi-column NOT IN with nullable elements cannot be
rewritten to an anti-join; falling back to subquery-based delta computation.

Workaround: Rewrite using NOT EXISTS or add explicit IS NOT NULL guards to avoid NULL-producing expressions in the row constructor.
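Assuming the same t and s tables as above, a rewrite that combines both suggestions looks like this. Note that NOT EXISTS alone keeps outer rows whose a or b is NULL (which NOT IN would exclude), so the explicit guards are what preserve the original semantics for NULL-free s:

```sql
-- NULL guards plus NOT EXISTS: comparisons can no longer yield UNKNOWN
-- on the outer side, so the anti-join rewrite applies cleanly.
SELECT a, b
FROM t
WHERE a IS NOT NULL
  AND b IS NOT NULL
  AND NOT EXISTS (
      SELECT 1 FROM s
      WHERE s.x = t.a
        AND s.y = t.b
  );
```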


Known Future Improvements

| Limitation | Planned in |
|---|---|
| Window functions in output | v1.1+ |
| WITH RECURSIVE support | v1.2+ |
| STRING_AGG / ARRAY_AGG incremental maintenance | Researching |
| Cross-database stream tables (without foreign tables) | Not planned |

See Also