Observability

riverbank has three observability layers: LLM traces (Langfuse), pipeline spans (OpenTelemetry), and aggregate metrics (Prometheus).

Layer 1: Langfuse (LLM observability)

Every LLM extraction call is traced in Langfuse with the following fields (a minimal instrumentation sketch follows the list):

  • Input prompt (fragment text + system prompt)
  • Output (extracted triples)
  • Token counts (prompt + completion)
  • Latency
  • Cost estimate
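The sketch below shows how such a call can be recorded with the Langfuse Python SDK's v2-style low-level client; the function, model name, and token counts are illustrative assumptions, not riverbank's actual internals.

from langfuse import Langfuse

# The client reads LANGFUSE_PUBLIC_KEY / LANGFUSE_SECRET_KEY / LANGFUSE_HOST from the
# environment; riverbank presumably maps its RIVERBANK_LANGFUSE__* settings onto these.
langfuse = Langfuse()

def extract_triples(fragment_text: str, system_prompt: str) -> list[dict]:
    """Hypothetical stand-in for riverbank's internal extraction call."""
    trace = langfuse.trace(name="riverbank.extract", input=fragment_text)
    generation = trace.generation(
        name="triple-extraction",
        model="gpt-4o",  # assumed model name
        input={"system": system_prompt, "fragment": fragment_text},
    )
    triples = [{"s": "ex:Widget", "p": "rdf:type", "o": "ex:Product"}]  # placeholder output
    # Latency is derived from the time between generation start and end();
    # cost is estimated by Langfuse from the model and token usage.
    generation.end(output=triples, usage={"input": 1200, "output": 300})
    trace.update(output=triples)
    langfuse.flush()
    return triples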

Access traces via riverbank runs:

riverbank runs --since 1h

The "Langfuse" column contains deep-links to individual trace views.

Configuration

export RIVERBANK_LANGFUSE__ENABLED=true
export RIVERBANK_LANGFUSE__PUBLIC_KEY="pk-lf-..."
export RIVERBANK_LANGFUSE__SECRET_KEY="sk-lf-..."
export RIVERBANK_LANGFUSE__HOST="http://langfuse:3000"
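The double underscore in these variable names follows the nested-settings convention used by pydantic-settings. A minimal sketch of how such a config section could be modeled, assuming that library; the class and field names are illustrative, not riverbank's actual code:

from pydantic import BaseModel
from pydantic_settings import BaseSettings, SettingsConfigDict

class LangfuseSettings(BaseModel):
    enabled: bool = False
    public_key: str = ""
    secret_key: str = ""
    host: str = "https://cloud.langfuse.com"

class Settings(BaseSettings):
    # RIVERBANK_LANGFUSE__PUBLIC_KEY -> settings.langfuse.public_key
    model_config = SettingsConfigDict(env_prefix="RIVERBANK_", env_nested_delimiter="__")
    langfuse: LangfuseSettings = LangfuseSettings()

settings = Settings()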

Layer 2: OpenTelemetry (pipeline spans)

The pipeline emits spans for each stage:

  • riverbank.ingest — top-level span for an ingest run
  • riverbank.parse — document parsing
  • riverbank.fragment — fragmentation
  • riverbank.extract — LLM extraction
  • riverbank.validate — SHACL validation
  • riverbank.write — graph write

Configuration

export OTEL_EXPORTER_OTLP_ENDPOINT="http://tempo:4317"
export OTEL_SERVICE_NAME="riverbank"

Both gRPC and HTTP OTLP exporters are supported, and traces can be sent to Jaeger, Tempo, Honeycomb, or any other OTLP-compatible backend.
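A minimal sketch of how this span tree can be produced with the OpenTelemetry Python SDK and the gRPC OTLP exporter; the stage bodies are placeholders standing in for riverbank's internal pipeline code:

from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

# The exporter reads OTEL_EXPORTER_OTLP_ENDPOINT from the environment.
provider = TracerProvider(resource=Resource.create({"service.name": "riverbank"}))
provider.add_span_processor(BatchSpanProcessor(OTLPSpanExporter()))
trace.set_tracer_provider(provider)

tracer = trace.get_tracer("riverbank")

def ingest(document: str) -> None:
    # Each stage span is nested under riverbank.ingest, producing the tree shown in Jaeger/Tempo.
    with tracer.start_as_current_span("riverbank.ingest"):
        with tracer.start_as_current_span("riverbank.parse"):
            pass  # placeholder: document parsing
        with tracer.start_as_current_span("riverbank.fragment"):
            pass  # placeholder: fragmentation
        with tracer.start_as_current_span("riverbank.extract"):
            pass  # placeholder: LLM extraction
        with tracer.start_as_current_span("riverbank.validate"):
            pass  # placeholder: SHACL validation
        with tracer.start_as_current_span("riverbank.write"):
            pass  # placeholder: graph write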

Viewing traces

In Jaeger/Tempo, search for service riverbank to see the full span tree for a single fragment compilation.

Layer 3: Prometheus metrics

Six metrics are emitted at /metrics:

Metric                               Type       What to monitor
riverbank_runs_total                 Counter    Error rate, throughput
riverbank_run_duration_seconds       Histogram  Latency percentiles
riverbank_llm_cost_usd_total         Counter    Daily spend
riverbank_shacl_score                Gauge      Quality regression
riverbank_review_queue_depth         Gauge      Reviewer backlog
riverbank_context_efficiency_ratio   Gauge      RAG relevance

See Metrics reference for full details.
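A sketch of how these metrics map onto prometheus_client types. The outcome label comes from the alert rules below; the port and any other labels are assumptions.

from prometheus_client import Counter, Gauge, Histogram, start_http_server

RUNS_TOTAL = Counter("riverbank_runs_total", "Completed runs", ["outcome"])
RUN_DURATION = Histogram("riverbank_run_duration_seconds", "End-to-end run duration")
LLM_COST = Counter("riverbank_llm_cost_usd_total", "Cumulative LLM spend in USD")
SHACL_SCORE = Gauge("riverbank_shacl_score", "Latest SHACL conformance score")
REVIEW_QUEUE_DEPTH = Gauge("riverbank_review_queue_depth", "Pending review items")
CONTEXT_EFFICIENCY = Gauge("riverbank_context_efficiency_ratio", "Share of retrieved context judged relevant")

start_http_server(9090)  # serves /metrics; the port is an assumption

# Example updates during a run:
RUNS_TOTAL.labels(outcome="success").inc()
RUN_DURATION.observe(12.4)
LLM_COST.inc(0.031)
SHACL_SCORE.set(0.92)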

Alert rules (example)

groups:
  - name: riverbank
    rules:
      - alert: SHACLScoreDrop
        expr: riverbank_shacl_score < 0.7
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "SHACL score dropped below threshold"

      - alert: ReviewQueueBacklog
        expr: riverbank_review_queue_depth > 100
        for: 1h
        labels:
          severity: warning
        annotations:
          summary: "Review queue has >100 pending items"

      - alert: HighErrorRate
        expr: rate(riverbank_runs_total{outcome="error"}[5m]) > 0.1
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "High extraction error rate"

Perses dashboards

The perses/riverbank-overview.json file contains a pre-built dashboard showing:

  • Runs per hour (success vs. error)
  • LLM cost accumulation
  • SHACL scores over time
  • Review queue depth
  • Run duration percentiles

Import into Perses or convert to Grafana format.

Nightly flows

Two Prefect flows run on a schedule (a scheduling sketch follows the list):

  1. snapshot_shacl_scores() — records SHACL scores for all named graphs into _riverbank.shacl_score_history
  2. run_nightly_lint() — runs the full lint pass and records findings as pgc:LintFinding triples
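A minimal sketch of how these flows could be defined and scheduled with Prefect's serve API; the deployment names, cron expressions, and flow bodies are assumptions, not riverbank's actual deployments.

from prefect import flow, serve

@flow
def snapshot_shacl_scores() -> None:
    """Placeholder: record SHACL scores for all named graphs into _riverbank.shacl_score_history."""
    ...

@flow
def run_nightly_lint() -> None:
    """Placeholder: run the full lint pass and record findings as pgc:LintFinding triples."""
    ...

if __name__ == "__main__":
    # Serve both flows as deployments with nightly cron schedules (times are assumptions).
    serve(
        snapshot_shacl_scores.to_deployment(name="nightly-shacl-snapshot", cron="0 2 * * *"),
        run_nightly_lint.to_deployment(name="nightly-lint", cron="30 2 * * *"),
    )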