Compile a policy corpus¶
This tutorial uses the docs-policy-v1 profile to compile the example Markdown corpus into a governed knowledge graph, then validates the result with competency questions.
Scenario¶
You have a set of internal documentation files and want to compile them into structured facts that your team can query with SPARQL. The compiler profile defines what gets extracted, what quality threshold is acceptable, and what questions the compiled graph must be able to answer.
Prerequisites¶
- riverbank installed with
[dev]extras - Docker Compose stack running (
docker compose up -d) riverbank initcompleted
Step 1: Register the profile¶
This registers the docs-policy-v1 profile in the _riverbank.profiles table. The profile specifies:
- Extractor:
noop(for CI; swap toinstructorfor real extraction) - Model:
llama3.2via Ollama - Confidence threshold: 0.7
- Named graph:
http://riverbank.example/graph/trusted
Step 2: Review the profile¶
The profile at examples/profiles/docs-policy-v1.yaml contains:
name: docs-policy-v1
version: 1
extractor: noop
model_provider: ollama
model_name: llama3.2
embed_model: nomic-embed-text
max_fragment_tokens: 2000
named_graph: "http://riverbank.example/graph/trusted"
prompt_text: |
Extract factual claims from documentation as RDF triples.
For each claim provide: subject, predicate, object_value, confidence, evidence.
Only extract claims directly supported by the text. Do NOT fabricate evidence.
editorial_policy:
min_fragment_length: 50
max_fragment_length: 8000
min_heading_depth: 0
confidence_threshold: 0.7
allowed_languages: [en]
competency_questions:
- id: cq-01
description: "The corpus defines a concept called 'Confidence'"
sparql: ASK { ?s ?p "Confidence" . }
- id: cq-02
description: "The corpus mentions evidence spans with character offsets"
sparql: ASK { ?s ?p ?o . FILTER(CONTAINS(STR(?o), "character")) }
- id: cq-03
description: "The corpus contains at least one subject–predicate–object triple"
sparql: ASK { ?s ?p ?o . }
The competency questions are SPARQL assertions that the compiled graph must satisfy. They function as regression tests for knowledge quality.
Step 3: Ingest the corpus¶
The pipeline will:
- Discover the three Markdown files
- Fragment each at heading boundaries
- Apply the editorial policy gate (min 50 chars, max 8000 chars)
- Skip unchanged fragments (hash check)
- Extract triples using the configured extractor
- Validate via SHACL and write to the named graph
Step 4: Query the result¶
To check a specific competency question:
Step 5: Run the quality gate¶
This exits with code 0 if the SHACL score meets or exceeds 0.7.
Step 6: Validate in CI¶
The golden corpus tests validate the same pipeline in CI:
These tests register the profile, ingest the example corpus, and assert the competency questions all pass.
What you learned¶
- Compiler profiles control what gets extracted and how quality is measured
- Competency questions are SPARQL-based regression tests for your knowledge graph
- The
riverbank lintcommand enforces quality gates in CI - The golden corpus test suite validates the full pipeline end-to-end