Backup and restore
riverbank's state lives entirely in PostgreSQL. Standard PostgreSQL backup strategies apply.
What to back up
- The `_riverbank` schema (catalog: sources, fragments, profiles, runs, audit log)
- The pg-ripple graph store (compiled triples, SHACL shapes, provenance)
- pg-trickle materialized views (these can be regenerated, but restoring them from a backup avoids the downtime of a rebuild)
pg_dump (logical backup)
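```bash
# Dump the riverbank catalog and the pg-ripple graph store in custom format,
# which pg_restore can read selectively.
pg_dump -h localhost -U riverbank -d riverbank \
  --schema=_riverbank \
  --schema=pg_ripple \
  -F custom -f riverbank_backup.dump
```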
Restore the archive with `pg_restore`, which reads the custom format produced by `-F custom`.
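A minimal restore sketch, assuming bash, the connection settings from the `pg_dump` example, and an empty target database in which any extensions riverbank relies on are already installed. `restore_riverbank` and its `DRY_RUN` switch are hypothetical, not part of riverbank. It first checks that the file is a readable archive, then restores it inside a single transaction.

```bash
#!/usr/bin/env bash
# Hypothetical helper: restore a riverbank custom-format dump into an empty
# database. Connection settings mirror the pg_dump example.
set -euo pipefail

# Print commands instead of executing them when DRY_RUN=1.
run() {
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

restore_riverbank() {
  local dump="${1:?usage: restore_riverbank DUMP_FILE}"

  # Listing the table of contents fails fast if the file is not a readable
  # pg_dump archive. It does not read table data, so it cannot prove that
  # a full restore will succeed.
  run pg_restore --list "$dump" > /dev/null

  # --single-transaction applies the whole restore or nothing, so an error
  # does not leave a half-restored catalog behind.
  run pg_restore -h localhost -U riverbank -d riverbank \
    --single-transaction "$dump"
}
```

Call it as `restore_riverbank riverbank_backup.dump`, or prefix `DRY_RUN=1` to print the commands without touching the database.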
Point-in-time recovery (PITR)
For production, use continuous WAL archiving:
- Configure `archive_mode = on` and `archive_command` in `postgresql.conf`
- Take periodic base backups with `pg_basebackup`
- Restore to any point in time with `recovery_target_time`
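The steps above can be sketched as two shell helpers, assuming PostgreSQL 12 or later (where recovery settings live in the regular configuration files and a `recovery.signal` file triggers targeted recovery), a default layout with `postgresql.conf` inside the data directory, and a hypothetical local archive directory. The function names are illustrative. The `archive_command` follows the pattern in the PostgreSQL documentation, which refuses to overwrite a segment that is already archived.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for WAL-archive point-in-time recovery (PostgreSQL 12+).
# ARCHIVE_DIR is an illustrative path and must not contain spaces or quotes,
# because the server pastes it verbatim into the shell commands it runs.
set -euo pipefail

# Append archiving settings to postgresql.conf. Changes to wal_level and
# archive_mode only take effect after a server restart. The archive_command
# refuses to overwrite a segment that already exists in the archive.
enable_wal_archiving() {
  local conf="${1:?usage: enable_wal_archiving CONF_FILE ARCHIVE_DIR}"
  local archive_dir="${2:?usage: enable_wal_archiving CONF_FILE ARCHIVE_DIR}"
  cat >> "$conf" <<EOF
wal_level = replica
archive_mode = on
archive_command = 'test ! -f ${archive_dir}/%f && cp %p ${archive_dir}/%f'
EOF
}

# Prepare a data directory that already holds a restored base backup to
# replay archived WAL up to TARGET_TIME. recovery.signal makes PostgreSQL
# start in targeted recovery. recovery_target_action keeps its default
# ('pause'), so the data can be inspected before ending recovery with
# SELECT pg_wal_replay_resume();
prepare_pitr() {
  local data_dir="${1:?usage: prepare_pitr DATA_DIR ARCHIVE_DIR TARGET_TIME}"
  local archive_dir="${2:?usage: prepare_pitr DATA_DIR ARCHIVE_DIR TARGET_TIME}"
  local target_time="${3:?usage: prepare_pitr DATA_DIR ARCHIVE_DIR TARGET_TIME}"
  cat >> "${data_dir}/postgresql.conf" <<EOF
restore_command = 'cp ${archive_dir}/%f %p'
recovery_target_time = '${target_time}'
EOF
  touch "${data_dir}/recovery.signal"
}
```

Base backups can be scheduled with a command such as `pg_basebackup -h localhost -U riverbank -D /backups/base -Ft -z -P`, assuming the role has the REPLICATION attribute. To recover, stop the server, restore a base backup into the data directory, run `prepare_pitr`, and start PostgreSQL.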
Kubernetes (volume snapshots)
If you use a CSI driver that supports volume snapshots, you can snapshot the database's PersistentVolumeClaim:
```yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: riverbank-db-snapshot
spec:
  volumeSnapshotClassName: csi-snapclass
  source:
    persistentVolumeClaimName: data-postgres-0
```
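To restore, Kubernetes creates a new PersistentVolumeClaim whose `dataSource` points at the snapshot. A minimal sketch, assuming bash and the snapshot above; `render_restore_pvc`, the claim name, the size, the access mode, and the storage class are hypothetical placeholders. A snapshot of a running database is crash-consistent: on startup PostgreSQL replays WAL as it would after a power loss, which is only safe if the data directory and WAL live on the same volume.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print a PersistentVolumeClaim manifest that is
# populated from a VolumeSnapshot. Pipe the output to `kubectl apply -f -`.
set -euo pipefail

render_restore_pvc() {
  local usage="usage: render_restore_pvc PVC_NAME SNAPSHOT_NAME SIZE STORAGE_CLASS"
  local pvc_name="${1:?$usage}"
  local snapshot="${2:?$usage}"
  local size="${3:?$usage}"            # must be >= the snapshot's restoreSize
  local storage_class="${4:?$usage}"   # should use the CSI driver that took the snapshot
  # ReadWriteOnce is an assumption that suits a single PostgreSQL pod.
  cat <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ${pvc_name}
spec:
  storageClassName: ${storage_class}
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: ${size}
  dataSource:
    name: ${snapshot}
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
EOF
}
```

For example, `render_restore_pvc data-postgres-restore riverbank-db-snapshot 20Gi csi-storageclass | kubectl apply -f -` creates the claim, after which the PostgreSQL pod can be pointed at it.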
What can be regenerated
If you lose the backup, these can be reconstructed:
- Compiled triples: re-ingest the corpus (`riverbank ingest`)
- pg-trickle views: refresh automatically after graph writes
- Rendered pages: re-render (`riverbank render`)
What cannot be regenerated:
- Audit log: append-only history is lost
- Run history: cost/token accounting is lost
- Review decisions: human judgments from Label Studio are lost