# Ingestion
CRP's ingest() lets you feed external data into a session WITHOUT making
an LLM call. Facts are extracted using the graduated extraction pipeline
(stages 1–5, purely statistical/ML — no LLM).
## Why Ingest?
| Scenario | Use dispatch() | Use ingest() |
|---|---|---|
| Generate a response | ✓ | |
| Pre-load reference material | | ✓ |
| Feed API responses | | ✓ |
| Load documentation | | ✓ |
| Process search results | | ✓ |
ingest() is free — no LLM tokens consumed. It populates the warm state
with facts that future dispatch() calls will use.
## Basic Usage

```python
import crp

session = crp.init(provider="ollama", model="qwen3-4b")

# Ingest external data
article = """
TLS 1.3 reduces the handshake from 2-RTT to 1-RTT, eliminating
an entire round trip. It removes support for vulnerable cipher
suites like RC4 and 3DES. Forward secrecy is now mandatory via
ephemeral Diffie-Hellman. The 0-RTT resumption mode enables
instant reconnection but is vulnerable to replay attacks.
"""

result = session.ingest(text=article, label="tls-1.3-overview")
print(f"Facts extracted: {result.facts_extracted}")
# Facts extracted: 6

# Now dispatch — the TLS facts are automatically in the envelope
response = session.dispatch(
    task="Write a security assessment of TLS 1.3 migration risks"
)
print(f"Quality: {response.quality_tier}")
# Quality: A (grounded in the ingested facts)
```
## The Extraction Pipeline
ingest() runs stages 1–5 of the extraction pipeline:
| Stage | Method | What It Does | Cost |
|---|---|---|---|
| 1 | Regex patterns | Extract structured data (dates, URLs, emails, IPs) | ~1ms |
| 2 | TextRank | Graph-based keyword extraction | ~5ms |
| 3 | GLiNER | Zero-shot NER with task-derived labels | ~50ms |
| 4 | Sentence scoring | Identify key sentences by TF-IDF + position | ~10ms |
| 5 | Fact consolidation | Deduplicate, merge, score confidence | ~5ms |
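Stage 1's behavior can be pictured with plain regexes. This is an illustrative sketch, not CRP's internal code; the patterns and the `extract_structured` helper are assumptions for demonstration only:

```python
import re

# Illustrative patterns for the kinds of structured data stage 1 targets
# (dates, URLs, emails, IPs). Not CRP's actual pattern set.
PATTERNS = {
    "url": re.compile(r"https?://\S+"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "iso_date": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
}

def extract_structured(text):
    """Return {kind: [matches]} for every pattern that fires on the text."""
    return {
        kind: rx.findall(text)
        for kind, rx in PATTERNS.items()
        if rx.findall(text)
    }

sample = "Released 2024-03-18, see https://example.com/tls13 or mail sec@example.com"
print(extract_structured(sample))
```

Because this stage is pure pattern matching, it runs in about a millisecond and never needs a model in memory.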
Stage 6 (LLM-based extraction) is NOT used during ingestion — only during
dispatch(), which has the LLM available.
## Multiple Ingestions
You can ingest multiple documents:
```python
# Ingest from different sources
session.ingest(text=api_docs, label="api-reference")
session.ingest(text=changelog, label="recent-changes")
session.ingest(text=user_feedback, label="user-reports")

# All facts are available for the next dispatch
response = session.dispatch(
    task="Summarize the current state of the API and recent user feedback"
)
```
## Ingestion with Labels
Labels help CRP organize facts by source:
```python
session.ingest(text=article_1, label="source-a")
session.ingest(text=article_2, label="source-b")

# Facts are tagged with their source label
# The provenance engine can trace claims back to specific sources
```
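Label-based provenance can be sketched as a tagged fact store. The `Fact` dataclass and `trace` helper below are assumptions for illustration; CRP's real fact store and provenance engine are internal:

```python
from dataclasses import dataclass

@dataclass
class Fact:
    text: str
    label: str  # source label supplied at ingest time (assumed field name)

# A toy store: each fact carries the label of the ingestion that produced it
store = [
    Fact("TLS 1.3 reduces the handshake to 1-RTT", label="source-a"),
    Fact("0-RTT resumption is vulnerable to replay attacks", label="source-b"),
]

def trace(keyword):
    """Return the source labels of facts mentioning a keyword."""
    return [f.label for f in store if keyword.lower() in f.text.lower()]

print(trace("replay"))
```

When a claim in a response is matched to a stored fact, the fact's label identifies the ingestion call (and thus the document) it came from.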
## Best Practices

**Ingest before dispatch**

Always ingest reference material BEFORE the dispatch call that needs it. Facts are available immediately after ingestion.

**Use labels**

Labels make provenance tracking clearer. When the provenance engine traces a claim back to a fact, the label tells you which source it came from.

**Token limits still apply**

Ingested text is processed into facts, and facts are packed into envelopes with a token budget. Ingesting a 50-page document produces many facts, but only the most relevant will make it into any given envelope.
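The budget behavior can be pictured as a greedy pack by relevance score. This is a sketch, not CRP's actual packer; the 4-characters-per-token estimate and the `pack_facts` helper are assumptions:

```python
def pack_facts(facts, budget_tokens):
    """Greedy sketch: take highest-relevance facts until the token
    budget is spent. Tokens are approximated as len(text) // 4."""
    chosen, used = [], 0
    for text, score in sorted(facts, key=lambda f: -f[1]):
        cost = max(1, len(text) // 4)
        if used + cost <= budget_tokens:
            chosen.append(text)
            used += cost
    return chosen

facts = [
    ("TLS 1.3 handshake is 1-RTT", 0.9),
    ("RC4 and 3DES are removed", 0.7),
    ("0-RTT is vulnerable to replay attacks", 0.8),
]
print(pack_facts(facts, budget_tokens=15))
```

Under a tight budget the lowest-scoring facts are dropped first, which is why ingesting a huge document does not guarantee every extracted fact reaches the LLM.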