Writing a plugin
riverbank supports five plugin extension points. This guide covers all of them.
Entry-point groups
| Group | When to use |
|---|---|
| riverbank.parsers | Support a new document format |
| riverbank.fragmenters | Custom splitting logic |
| riverbank.extractors | Alternative LLM or rule-based extraction |
| riverbank.connectors | Pull documents from APIs, S3, queues |
| riverbank.reviewers | Route to alternative review systems |
Step 1: Implement the base class
Each group has a base class in riverbank.<group>.base:
# Example: custom extractor
from riverbank.extractors.base import BaseExtractor, ExtractionResult, EvidenceSpan


class MyExtractor(BaseExtractor):
    name = "my-extractor"

    def extract(self, fragment_text: str, profile: dict) -> ExtractionResult:
        triples = []
        # Your extraction logic goes here. This placeholder emits one triple
        # backed by the first 20 characters of the fragment (fewer if it is shorter),
        # so char_end never points past the end of the text.
        end = min(20, len(fragment_text))
        triples.append({
            "subject": "http://example.org/entity/X",
            "predicate": "http://example.org/relatedTo",
            "object_value": "Y",
            "confidence": 0.85,
            "evidence": EvidenceSpan(
                char_start=0,
                char_end=end,
                excerpt=fragment_text[:end],
            ),
        })
        return ExtractionResult(triples=triples)
Step 2: Register the entry point
In your pyproject.toml:
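A minimal sketch of the registration, assuming a standard `[project.entry-points]` table. The module path `my_package.extractors` is taken from the test in Step 3 and the entry name from the `name` attribute in Step 1. Whether riverbank requires the entry name to match `name` is an assumption here, not something this guide states.

```toml
# Assumed layout: MyExtractor lives in my_package/extractors.py.
# The quoted group name must be one of the groups listed in the table above.
[project.entry-points."riverbank.extractors"]
my-extractor = "my_package.extractors:MyExtractor"
```

For another extension point, the same shape would apply with the matching group name, for example `riverbank.parsers`.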
Step 3: Write tests
Use the existing conftest.py fixtures:
def test_my_extractor():
    from my_package.extractors import MyExtractor

    text = "Sample text for extraction."
    extractor = MyExtractor()
    result = extractor.extract(text, {})
    assert len(result.triples) >= 1
    for triple in result.triples:
        evidence = triple["evidence"]
        assert triple["confidence"] > 0
        assert evidence.char_start >= 0
        assert evidence.char_end > evidence.char_start
        # The excerpt must match the offsets (see the extractor contract)
        assert evidence.excerpt == text[evidence.char_start:evidence.char_end]
Step 4: Install and verify
pip install -e .
python -c "from importlib.metadata import entry_points; print([e.name for e in entry_points(group='riverbank.extractors')])"
Your extractor should appear in the list.
Step 5: Use in a profile
Base class contracts
Parser contract
- parse(file_path) → ParsedDocument(content, headings, metadata)
- Must preserve character offsets (evidence spans depend on this)
- supported_extensions determines file matching
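To illustrate what offset preservation means in practice, here is a standalone sketch of a plain-text parser. `ParsedDocument` below is a hypothetical stand-in with the three fields named in the contract; the shape of `headings`, the `parse_text` helper, and the all-caps heading rule are assumptions made for the example. A real plugin would subclass the parser base class instead.

```python
from dataclasses import dataclass, field


@dataclass
class ParsedDocument:
    # Hypothetical stand-in; field names follow the contract above.
    content: str
    headings: list = field(default_factory=list)  # (char_offset, title) pairs, an assumed shape
    metadata: dict = field(default_factory=dict)


class PlainTextParser:
    name = "plain-text"
    supported_extensions = (".txt",)

    def parse_text(self, text: str) -> ParsedDocument:
        headings = []
        offset = 0
        for line in text.splitlines(keepends=True):
            stripped = line.strip()
            # Illustrative rule only: short all-caps lines count as headings
            if stripped and stripped.isupper() and len(stripped) < 80:
                headings.append((offset + line.index(stripped), stripped))
            offset += len(line)
        # content is returned unmodified, so every offset indexes into it directly
        return ParsedDocument(content=text, headings=headings)

    def parse(self, file_path: str) -> ParsedDocument:
        # newline="" disables newline translation, keeping offsets aligned with the file
        with open(file_path, encoding="utf-8", newline="") as fh:
            doc = self.parse_text(fh.read())
        doc.metadata["source"] = str(file_path)
        return doc
```

The key design choice is that `content` is never normalized (no stripped whitespace, no rewritten line endings), because any such change would shift the offsets that evidence spans rely on.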
Fragmenter contract
- fragment(document, policy) → list[Fragment]
- Fragments must have non-overlapping character ranges
- Each fragment gets a stable fragment_key
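The two fragmenter requirements can be sketched as follows. `Fragment` here is a hypothetical stand-in, the function takes a plain string and a source IRI rather than the real `document` and `policy` arguments, and deriving the key from a SHA-256 hash is one possible reading of "stable", not riverbank's documented scheme.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    # Hypothetical stand-in; the real Fragment class may carry more fields.
    fragment_key: str
    char_start: int
    char_end: int
    text: str


def fragment_by_paragraph(content: str, source_iri: str) -> list[Fragment]:
    """Split on blank lines, keeping half-open [char_start, char_end) ranges."""
    fragments = []
    pos = 0
    for block in content.split("\n\n"):
        start, end = pos, pos + len(block)
        pos = end + 2  # skip the two-character separator
        if not block.strip():
            continue
        # Key derived from source and text, so re-running on unchanged input gives the same key
        digest = hashlib.sha256(f"{source_iri}\x00{block}".encode("utf-8")).hexdigest()
        fragments.append(Fragment(digest[:16], start, end, block))
    return fragments


def ranges_overlap(fragments: list[Fragment]) -> bool:
    """Return True if any two fragments share a character position."""
    ordered = sorted(fragments, key=lambda f: f.char_start)
    return any(a.char_end > b.char_start for a, b in zip(ordered, ordered[1:]))
```

Hashing only the source and text means two identical paragraphs in one document get the same key. Mixing `char_start` into the hash would separate them, at the cost of keys changing whenever earlier text is edited.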
Extractor contract
- extract(fragment_text, profile) → ExtractionResult(triples)
- Every triple must include an EvidenceSpan with valid offsets
- The excerpt must match fragment_text[char_start:char_end]
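A small checker makes the span rules concrete. The `EvidenceSpan` dataclass below is a stand-in with the three fields used in Step 1, so the sketch runs without riverbank installed, and `span_problems` is a hypothetical helper name. Treating empty spans as invalid follows the `char_end > char_start` assertion in the Step 3 test.

```python
from dataclasses import dataclass


@dataclass
class EvidenceSpan:
    # Stand-in for the class imported from riverbank.extractors.base in Step 1.
    char_start: int
    char_end: int
    excerpt: str


def span_problems(fragment_text: str, span: EvidenceSpan) -> list[str]:
    """Return a list of contract violations; an empty list means the span is valid."""
    problems = []
    if not (0 <= span.char_start < span.char_end <= len(fragment_text)):
        problems.append("offsets out of range")
    elif fragment_text[span.char_start:span.char_end] != span.excerpt:
        problems.append("excerpt does not match the offsets")
    return problems
```

Running a check like this inside an extractor's own test suite catches the common failure where an LLM returns a paraphrased excerpt instead of the exact substring.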
Connector contract
- discover(config) → list[SourceDocument]
- Each document needs a stable IRI for deduplication
- Configuration comes from the profile or environment
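As a sketch of the connector shape, the function below discovers `.txt` files under a local directory and uses each file's absolute `file://` URI as its IRI. `SourceDocument` is a hypothetical stand-in, and the `root` config key and `RIVERBANK_DOCS_ROOT` environment variable are invented names for illustration, not documented riverbank settings.

```python
import os
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class SourceDocument:
    # Hypothetical stand-in; only the IRI matters for this sketch.
    iri: str
    path: str


def discover(config: dict) -> list[SourceDocument]:
    # Assumed lookup order: profile-supplied config first, then the environment
    root = Path(config.get("root") or os.environ.get("RIVERBANK_DOCS_ROOT", ".")).resolve()
    docs = []
    for path in sorted(root.rglob("*.txt")):
        if path.is_file():
            # The absolute file URI is identical across runs as long as the file does not move
            docs.append(SourceDocument(iri=path.as_uri(), path=str(path)))
    return docs
```

For remote sources, the equivalent would be an identifier the upstream system guarantees not to change, such as an object key or record ID, rather than anything derived from fetch time.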
Reviewer contract
- enqueue(task) → task_id | None
- collect() → yields ReviewDecision objects
- Decisions include: accepted, corrected, or rejected
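The enqueue and collect cycle can be sketched with an in-memory reviewer. Everything here is illustrative: `ReviewDecision` is a stand-in whose field names are assumptions, `decide` is a helper that simulates a human decision and is not part of the contract, and returning `None` for a task without a triple is only one guess at what `None` signals.

```python
import uuid
from collections import deque
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass
class ReviewDecision:
    # Hypothetical stand-in; field names are assumptions.
    task_id: str
    status: str  # "accepted", "corrected", or "rejected"
    corrected_value: Optional[str] = None


class InMemoryReviewer:
    name = "in-memory"

    def __init__(self) -> None:
        self._pending = {}
        self._decided = deque()

    def enqueue(self, task: dict) -> Optional[str]:
        # Assumed meaning of None: the task was not accepted for review
        if not task.get("triple"):
            return None
        task_id = uuid.uuid4().hex
        self._pending[task_id] = task
        return task_id

    def decide(self, task_id: str, status: str, corrected_value: Optional[str] = None) -> None:
        # Helper outside the contract: records the outcome for a pending task
        if status not in {"accepted", "corrected", "rejected"}:
            raise ValueError(f"unknown status: {status}")
        self._pending.pop(task_id)
        self._decided.append(ReviewDecision(task_id, status, corrected_value))

    def collect(self) -> Iterator[ReviewDecision]:
        # Each decision is yielded once, then removed from the queue
        while self._decided:
            yield self._decided.popleft()
```

A reviewer backed by an external system would replace `decide` with polling or a webhook, but the drain-once behavior of `collect` is a reasonable default to keep decisions from being applied twice.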
Review process for core inclusion
To get a plugin included in the riverbank core package:
- Open an issue describing the plugin's purpose
- Implement with full test coverage
- Add an entry in pyproject.toml under the appropriate group
- Submit a PR with the implementation, tests, and a docs page
- Maintainers review for API contract compliance and test quality