Snowflake
Snowflake is a cloud-native data warehouse that separates compute from storage, allowing you to scale query processing independently of data volume. It runs on AWS, Azure, and GCP, providing a single platform for data warehousing, data lakes, and data sharing across clouds. When pg_tide delivers messages to Snowflake, your PostgreSQL events flow directly into your data warehouse for analytics, reporting, and machine learning without requiring intermediate ETL pipelines.
When to Use This Sink
Choose Snowflake when your organization uses it as the primary data warehouse for analytics and BI, when you need to combine PostgreSQL event data with other data sources already in Snowflake, or when you want zero-maintenance analytical storage that scales automatically. Snowflake's semi-structured data support (VARIANT type) handles JSON event payloads natively, making it easy to query nested event data without predefined schemas.
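Once events land in the RAW_EVENTS table shown under Table Schema below, the VARIANT payload column can be queried with Snowflake's path syntax. A minimal sketch, assuming hypothetical order_id and customer fields in the payload:
-- Hypothetical payload fields; adjust the path expressions to your event shape
SELECT
  event_id,
  payload:order_id::NUMBER          AS order_id,
  payload:customer.country::STRING  AS customer_country
FROM ANALYTICS.EVENTS.RAW_EVENTS
WHERE outbox_name = 'events'
  AND ingested_at > DATEADD('day', -1, CURRENT_TIMESTAMP());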
Configuration
Production Configuration
SELECT tide.relay_set_outbox(
'events-to-snowflake',
'events',
'snowflake-relay',
'{
"sink_type": "snowflake",
"account": "${env:SNOWFLAKE_ACCOUNT}",
"database": "ANALYTICS",
"schema": "EVENTS",
"table": "RAW_EVENTS",
"warehouse": "INGEST_WH",
"username": "${env:SNOWFLAKE_USER}",
"private_key_path": "${env:SNOWFLAKE_KEY_PATH}",
"batch_size": 500,
"stage": "pg_tide_stage"
}'::jsonb
);
Configuration Reference
| Parameter | Type | Default | Description |
|---|---|---|---|
| sink_type | string | — | Must be "snowflake" |
| account | string | — | Snowflake account identifier |
| database | string | — | Target database |
| schema | string | — | Target schema |
| table | string | — | Target table |
| warehouse | string | — | Compute warehouse for COPY operations |
| username | string | — | Authentication username |
| password | string | null | Password (use key-pair auth instead for production) |
| private_key_path | string | null | Path to RSA private key for key-pair authentication |
| stage | string | null | Internal stage name for COPY operations |
| batch_size | int | 500 | Rows per micro-batch |
| role | string | null | Snowflake role to assume |
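The password and role parameters are typically only useful outside production, for example against a development account. A minimal sketch reusing the same call as above, with illustrative account, role, and database values (prefer key-pair authentication in production):
SELECT tide.relay_set_outbox(
'events-to-snowflake-dev',
'events',
'snowflake-relay',
'{
"sink_type": "snowflake",
"account": "xy12345.us-east-1",
"database": "ANALYTICS_DEV",
"schema": "EVENTS",
"table": "RAW_EVENTS",
"warehouse": "INGEST_WH",
"username": "relay_user",
"password": "${env:SNOWFLAKE_PASSWORD}",
"role": "RELAY_ROLE",
"batch_size": 500
}'::jsonb
);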
Authentication
For production, use key-pair authentication rather than passwords:
- Generate a key pair:
openssl genrsa 2048 | openssl pkcs8 -topk8 -inform PEM -out snowflake_key.p8 -nocrypt
openssl rsa -in snowflake_key.p8 -pubout -out snowflake_key.pub
- Assign the public key to your Snowflake user:
ALTER USER relay_user SET RSA_PUBLIC_KEY='...';
- Reference the private key in your config:
"private_key_path": "/etc/snowflake/key.p8"
How It Works
The relay uses a stage-based approach for efficient loading:
- Messages are accumulated into micro-batches
- Each batch is written as a compressed file to an internal Snowflake stage
- A COPY INTO command loads the staged file into the target table
- The stage file is removed after successful loading
This approach is more cost-effective than streaming individual inserts: Snowflake bills warehouse compute per second, so loading each batch with a single short COPY keeps warehouse run time, and therefore cost, low.
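The exact statements are internal to the relay, but one micro-batch cycle against the configuration above is roughly equivalent to the following sketch (the file name, file format options, and column mapping are assumptions):
-- Illustrative only; the relay manages file naming and format options itself
PUT file:///tmp/batch_000123.json.gz @pg_tide_stage AUTO_COMPRESS = FALSE;

COPY INTO ANALYTICS.EVENTS.RAW_EVENTS
FROM @pg_tide_stage/batch_000123.json.gz
FILE_FORMAT = (TYPE = 'JSON')
MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE;

REMOVE @pg_tide_stage/batch_000123.json.gz;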
Table Schema
CREATE TABLE ANALYTICS.EVENTS.RAW_EVENTS (
event_id VARCHAR,
outbox_name VARCHAR,
payload VARIANT, -- Stores JSON natively
dedup_key VARCHAR,
operation VARCHAR,
ingested_at TIMESTAMP_NTZ DEFAULT CURRENT_TIMESTAMP()
);
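The internal stage named in the configuration (pg_tide_stage) lives in the target schema. If the relay does not create it for you, a minimal definition might look like this (the JSON file format is an assumption):
-- Assumes the stage does not already exist in ANALYTICS.EVENTS
CREATE STAGE IF NOT EXISTS ANALYTICS.EVENTS.pg_tide_stage
FILE_FORMAT = (TYPE = 'JSON');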
Cost Optimization
- Use an X-Small warehouse for ingestion (sufficient for most event volumes)
- Set warehouse auto-suspend to 60 seconds to minimize idle costs (see the warehouse definition sketched after this list)
- Batch sizes of 500-1000 reduce the number of COPY operations
- Consider using Snowpipe for continuous micro-batch loading in very high volume scenarios
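The first two recommendations map directly onto the warehouse definition. A minimal sketch, assuming INGEST_WH is created specifically for ingestion:
CREATE WAREHOUSE IF NOT EXISTS INGEST_WH
WAREHOUSE_SIZE = 'XSMALL'
AUTO_SUSPEND = 60           -- seconds of inactivity before suspending
AUTO_RESUME = TRUE          -- resume automatically when the relay issues COPY
INITIALLY_SUSPENDED = TRUE;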
Troubleshooting
- "Warehouse not running" — Ensure the warehouse is set to auto-resume, or start it manually
- "Insufficient privileges" — Grant USAGE on warehouse, INSERT on table, WRITE on stage
- "Key-pair authentication failed" — Verify the public key is assigned to the Snowflake user and the private key path is correct
Further Reading
- BigQuery — Google Cloud alternative
- ClickHouse — Self-hosted real-time analytics alternative
- Object Storage — Raw file-based data lake landing