Backup and restore¶

riverbank's state lives entirely in PostgreSQL. Standard PostgreSQL backup strategies apply.

What to back up¶

The _riverbank schema (catalog: sources, fragments, profiles, runs, audit log)
The pg-ripple graph store (compiled triples, SHACL shapes, provenance)
pg-trickle materialized views (can be regenerated, but backing them up avoids downtime while they rebuild)

pg_dump (logical backup)¶

pg_dump -h localhost -U riverbank -d riverbank \
  --schema=_riverbank \
  --schema=pg_ripple \
  -F custom -f riverbank_backup.dump
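For scheduled backups, the command above can be wrapped so that each run writes a timestamped file and keeps it only if the archive is readable. The following is a minimal sketch, not part of riverbank: the BACKUP_DIR location, the file naming scheme, and the function names backup_path and run_backup are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch: timestamped logical backup of the riverbank schemas.
# BACKUP_DIR and the naming scheme are assumptions, not riverbank defaults.
BACKUP_DIR="${BACKUP_DIR:-/var/backups/riverbank}"

# Build the dump path for a given timestamp string.
backup_path() {
  printf '%s/riverbank_%s.dump\n' "$BACKUP_DIR" "$1"
}

# Dump both schemas to a temporary file, check that the archive's
# table of contents can be read, then move the file into place.
run_backup() {
  stamp="$(date -u +%Y%m%dT%H%M%SZ)"
  out="$(backup_path "$stamp")"
  mkdir -p "$BACKUP_DIR" || return 1
  pg_dump -h localhost -U riverbank -d riverbank \
    --schema=_riverbank \
    --schema=pg_ripple \
    -F custom -f "$out.tmp" || return 1
  # pg_restore --list only reads the archive header and table of contents;
  # it is a cheap sanity check, not a full restore test.
  pg_restore --list "$out.tmp" > /dev/null || return 1
  mv "$out.tmp" "$out" || return 1
  echo "$out"
}
```

Calling run_backup from a cron job or systemd timer prints the path of the finished dump. A periodic test restore into a scratch database remains the only complete check that a backup is usable.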

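Restore (--clean --if-exists drops each existing object before recreating it, so run this only against the database you intend to overwrite):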

pg_restore -h localhost -U riverbank -d riverbank \
  --clean --if-exists \
  riverbank_backup.dump
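Because --clean drops existing objects, it can be worth confirming that the archive actually contains both schemas before restoring. The sketch below inspects the table of contents printed by pg_restore --list. The function names toc_has_schema and verify_archive are hypothetical, and the match relies on the assumption that schema entries appear in the listing as "SCHEMA - <name> <owner>", which is how recent pg_restore versions print them.

```shell
#!/bin/sh
# Read a pg_restore --list table of contents on stdin and succeed
# if it contains an entry for the schema named in $1.
# Assumes TOC lines of the form "...; ... SCHEMA - <name> <owner>".
toc_has_schema() {
  grep -F -q -- " SCHEMA - $1 "
}

# Check that the archive in $1 lists both schemas used by the backup command.
verify_archive() {
  archive="$1"
  for schema in _riverbank pg_ripple; do
    if ! pg_restore --list "$archive" | toc_has_schema "$schema"; then
      echo "schema $schema not found in $archive" >&2
      return 1
    fi
  done
}
```

For example, `verify_archive riverbank_backup.dump && pg_restore ...` runs the restore only when both schemas are listed.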

Point-in-time recovery (PITR)¶

For production, use continuous WAL archiving:

Configure archive_mode = on and archive_command in postgresql.conf
Take periodic base backups with pg_basebackup
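Restore to any point in time covered by the archive: restore a base backup, set restore_command and recovery_target_time, create recovery.signal in the data directory (PostgreSQL 12 and later), and start the server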
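The steps above can be sketched as a few shell helpers. This is an illustration under stated assumptions: the archive directory is a locally mounted path, the helper names archive_settings, take_base_backup and recovery_settings are hypothetical, and the riverbank role is assumed to have the REPLICATION privilege that pg_basebackup needs. The archive_command and restore_command values follow the cp-based examples in the PostgreSQL documentation; production setups often use a dedicated archiving tool instead.

```shell
#!/bin/sh
# Print postgresql.conf lines that archive WAL to the directory in $1.
# Changing archive_mode requires a server restart.
archive_settings() {
  dir="$1"
  cat <<EOF
wal_level = replica
archive_mode = on
archive_command = 'test ! -f $dir/%f && cp %p $dir/%f'
EOF
}

# Take a compressed tar-format base backup into the directory in $1.
take_base_backup() {
  pg_basebackup -h localhost -U riverbank -D "$1" -F tar -z -P
}

# Print postgresql.conf lines for recovering to the timestamp in $2,
# reading archived WAL from the directory in $1 (PostgreSQL 12 and later).
recovery_settings() {
  dir="$1"
  target="$2"
  cat <<EOF
restore_command = 'cp $dir/%f %p'
recovery_target_time = '$target'
recovery_target_action = 'promote'
EOF
}
```

To recover, stop the server, unpack a base backup taken before the target time into an empty data directory, append the output of recovery_settings to postgresql.conf, create an empty recovery.signal file in the data directory, and start the server. PostgreSQL replays archived WAL up to the target time and then promotes.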

Kubernetes (volume snapshots)¶

If your cluster uses a CSI driver that supports volume snapshots, you can snapshot the PostgreSQL data volume:

apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: riverbank-db-snapshot
spec:
  volumeSnapshotClassName: csi-snapclass
  source:
    persistentVolumeClaimName: data-postgres-0
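A snapshot is only usable once the CSI driver reports it as ready. The sketch below applies the manifest and polls status.readyToUse. The function names snapshot_ready and create_snapshot, the manifest filename, and the polling interval are assumptions for illustration, and the commands act on whatever namespace kubectl is currently configured to use.

```shell
#!/bin/sh
# Succeed when the VolumeSnapshot named in $1 reports status.readyToUse = true.
snapshot_ready() {
  [ "$(kubectl get volumesnapshot "$1" -o jsonpath='{.status.readyToUse}')" = "true" ]
}

# Apply the manifest in $1, then poll the snapshot named in $2
# up to $3 times (default 30), sleeping 2 seconds between attempts.
create_snapshot() {
  manifest="$1"
  name="$2"
  attempts="${3:-30}"
  kubectl apply -f "$manifest" || return 1
  while [ "$attempts" -gt 0 ]; do
    if snapshot_ready "$name"; then
      return 0
    fi
    attempts=$((attempts - 1))
    sleep 2
  done
  echo "snapshot $name is not ready" >&2
  return 1
}
```

For example, with the manifest saved as snapshot.yaml (a hypothetical filename), `create_snapshot snapshot.yaml riverbank-db-snapshot` returns once the snapshot can be used as a restore source. A snapshot of a running database is crash-consistent at best, so PostgreSQL performs WAL replay when started from it; this assumes the data directory and WAL live on the same volume.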

What can be regenerated¶

If a backup is lost or incomplete, these can be reconstructed:

Compiled triples — re-ingest the corpus (riverbank ingest)
pg-trickle views — refresh automatically after graph writes
Rendered pages — re-render (riverbank render)

What cannot be regenerated:

Audit log — append-only history is lost
Run history — cost/token accounting is lost
Review decisions — human judgments from Label Studio are lost