
Use Batch Extraction

Batch extraction groups multiple document fragments into a single LLM call, reducing API costs and overhead.

Enable Batch Extraction

Add batch_size to your profile's extraction_strategy:

extraction_strategy:
  batch_size: 3  # Group 3 fragments per LLM call

Cost Model

Setup                        Per-Fragment Calls   Per-Batch Calls   Savings
10 fragments, no batching    10                   —                 —
10 fragments, batch_size=2   10                   5                 50%
10 fragments, batch_size=5   10                   2                 80%
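
The savings follow directly from the call count: with n fragments and batch size b, extraction needs ceil(n / b) calls instead of n. A quick sketch (plain Python, not part of the library) that reproduces the table:

import math

def call_count(n_fragments: int, batch_size: int) -> int:
    # Each batched LLM call covers up to batch_size fragments.
    return math.ceil(n_fragments / batch_size)

for batch_size in (2, 5):
    calls = call_count(10, batch_size)
    savings = 1 - calls / 10
    print(f"batch_size={batch_size}: {calls} calls ({savings:.0%} savings)")
# batch_size=2: 5 calls (50% savings)
# batch_size=5: 2 calls (80% savings)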

Trade-offs

Without Batching              With Batching
✓ Minimal hallucination       ✗ Risk of cross-fragment connections
✗ More LLM calls              ✓ Fewer calls
✓ Per-token efficiency        ✗ More tokens/call

Quick Examples

Use a pre-built batched profile

riverbank evaluate-wikidata --article "Marie Curie" \
  --profile examples/profiles/wikidata-eval-v1-llm-batched.yaml

Set batch_size in your profile

extraction_strategy:
  mode: "permissive"
  batch_size: 2  # Batch 2 fragments per call

Disable batching (default)

extraction_strategy:
  mode: "permissive"
  # batch_size: 0 or omitted

When to Use Batching

Use batching if:

  • Using OpenAI or hosted APIs (per-call overhead is high)
  • Fragments are semantically independent
  • Cost per call is significant

Don't use batching if:

  • Using Ollama locally (per-call overhead is minimal)
  • Fragments are closely related (risk of hallucination)
  • Paying per token (fewer tokens is better)

Implementation Details

  • Method: InstructorExtractor.extract_batch(fragments: list)
  • Return: Dict mapping fragment_key → ExtractionResult
  • Automatic: No pipeline changes needed (optional in extractor)
  • Backward compatible: Falls back to per-fragment if batch_size not set
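
For orientation, here is a minimal sketch of how the fallback plays out. extract_batch and its return type come from the list above; load_fragments, extractor.extract, and the .key attribute are hypothetical stand-ins for whatever your pipeline actually uses:

batch_size = profile["extraction_strategy"].get("batch_size", 0)
fragments = load_fragments(article)  # hypothetical helper

if batch_size and batch_size > 1:
    # One LLM call per group of batch_size fragments; returns a
    # dict mapping fragment_key -> ExtractionResult.
    results = extractor.extract_batch(fragments)
else:
    # Default path: one LLM call per fragment (illustrative method name).
    results = {f.key: extractor.extract(f) for f in fragments}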

Supported LLM Backends

Batch extraction works with LLM backends that properly support JSON structured output:

  • ✅ OpenAI (JSON mode)
  • ✅ Claude/Copilot (JSON mode)
  • ❌ Ollama/Gemma (limited structured output support)

If you're using Ollama locally, batch extraction may not work reliably. Per-fragment extraction is recommended.
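
The reason structured-output support matters: a batched call must return a single well-formed JSON object covering every fragment in the batch, which is harder for a model than emitting one small object per call. A hedged sketch of the shape involved, using Pydantic-style models with illustrative field names (the real ExtractionResult schema differs):

from pydantic import BaseModel

class ExtractionResult(BaseModel):
    claims: list[str]  # illustrative field, not the library's actual schema

class BatchResponse(BaseModel):
    # One entry per fragment key; backends with weak structured-output
    # support tend to truncate or malform this as the batch grows.
    results: dict[str, ExtractionResult]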