Fragment and artifact model¶
riverbank's data model has two core entities: fragments (units of source content) and artifacts (compiled knowledge derived from fragments).
Fragments¶
A fragment is a stable section of a source document — typically one heading and its content. Fragments are the unit of compilation: each fragment is independently extractable, hashable, and cacheable.
Source document
├── Fragment A (## Introduction)
├── Fragment B (## Architecture)
└── Fragment C (## Deployment)
Fragment properties¶
| Property | Type | Purpose |
|---|---|---|
id |
UUID | Internal primary key |
source_id |
FK | Reference to parent source |
fragment_key |
string | Stable heading path (e.g., ## Architecture) |
content_hash |
bytes | xxh3_128 of fragment text — enables skip logic |
char_start |
int | Start offset in source document |
char_end |
int | End offset in source document |
tenant_id |
string | Tenant scope (nullable) |
Hash-based skip logic¶
The content hash is the foundation of incremental compilation:
- On ingest, compute
xxh3_128(fragment_text) - Compare to stored hash from previous run
- If identical → skip (zero LLM cost)
- If different → recompile this fragment
Artifacts¶
An artifact is a compiled fact (an RDF triple) written to the knowledge graph. Each artifact traces back to its source fragment via provenance edges.
Artifact properties (in the graph)¶
| Property | Type | Purpose |
|---|---|---|
| Subject IRI | URI | The entity or concept |
| Predicate | URI | The relationship |
| Object | URI or literal | The value |
pgc:confidence |
float | Extraction confidence [0.0, 1.0] |
pgc:fromFragment |
URI | Source fragment reference |
pgc:byProfile |
URI | Compiler profile used |
pgc:compiledAt |
dateTime | When this fact was compiled |
pgc:epistemicStatus |
string | Epistemic status tag |
pgc:evidenceSpan |
JSON | Character range + excerpt |
The dependency graph¶
The _riverbank.artifact_deps table records edges between artifacts and their dependencies:
flowchart BT
F1[Fragment A] --> A1[Artifact: Acme produces widgets]
F1 --> A2[Artifact: Acme founded 1995]
F2[Fragment B] --> A3[Artifact: Acme HQ in NYC]
A1 --> R[Rendered page: Acme]
A2 --> R
A3 --> R
When Fragment A changes: - Artifacts A1 and A2 are invalidated - The rendered page for Acme is marked stale - Fragment B and Artifact A3 are untouched
Named graphs¶
Artifacts live in named graphs that separate concerns:
| Graph | Contents |
|---|---|
<trusted> |
High-confidence facts (above threshold) |
<draft> |
Low-confidence facts pending review |
<vocab> |
SKOS concepts from the vocabulary pass |
<human-review> |
Corrections from Label Studio |
Multi-tenant deployments use per-tenant graph prefixes (e.g., <http://riverbank.example/graph/acme/trusted>).