Ingestion

CRP's ingest() lets you feed external data into a session without making an LLM call. Facts are extracted by stages 1–5 of the graduated extraction pipeline, which are purely statistical/ML and involve no LLM.

Why Ingest?

| Scenario | Use dispatch() | Use ingest() |
|---|---|---|
| Generate a response | Yes | No |
| Pre-load reference material | No | Yes |
| Feed API responses | No | Yes |
| Load documentation | No | Yes |
| Process search results | No | Yes |

ingest() consumes no LLM tokens; its only cost is the local extraction time. It populates the session's warm state with facts that later dispatch() calls will use.

Basic Usage

import crp

session = crp.init(provider="ollama", model="qwen3-4b")

# Ingest external data
article = """
TLS 1.3 reduces the handshake from 2-RTT to 1-RTT, eliminating
an entire round trip. It removes support for vulnerable cipher
suites like RC4 and 3DES. Forward secrecy is now mandatory via
ephemeral Diffie-Hellman. The 0-RTT resumption mode enables
instant reconnection but is vulnerable to replay attacks.
"""

result = session.ingest(text=article, label="tls-1.3-overview")
print(f"Facts extracted: {result.facts_extracted}")
# Facts extracted: 6

# Now dispatch — the TLS facts are automatically in the envelope
response = session.dispatch(
    task="Write a security assessment of TLS 1.3 migration risks"
)
print(f"Quality: {response.quality_tier}")
# Quality: A  (grounded in the ingested facts)

The Extraction Pipeline

ingest() runs stages 1–5 of the extraction pipeline:

| Stage | Method | What It Does | Cost |
|---|---|---|---|
| 1 | Regex patterns | Extract structured data (dates, URLs, emails, IPs) | ~1ms |
| 2 | TextRank | Graph-based keyword extraction | ~5ms |
| 3 | GLiNER | Zero-shot NER with task-derived labels | ~50ms |
| 4 | Sentence scoring | Identify key sentences by TF-IDF + position | ~10ms |
| 5 | Fact consolidation | Deduplicate, merge, score confidence | ~5ms |

Stage 6 (LLM-based extraction) is NOT used during ingestion — only during dispatch(), which has the LLM available.
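
To make stage 1 more concrete, here is a minimal sketch of regex-based extraction for the four data types named in the table. The patterns, the PATTERNS dictionary, and the extract_structured() helper are illustrative assumptions, not CRP's actual implementation, and the patterns are deliberately simple (ISO-style dates and IPv4 only).

```python
import re

# Illustrative patterns only; CRP's real stage-1 patterns are not documented here.
PATTERNS = {
    "date": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),              # ISO dates such as 2024-03-15
    "url": re.compile(r"https?://[^\s<>\"')]+"),               # http(s) URLs up to whitespace
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),  # simple email addresses
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),        # dotted-quad IPv4 (no range check)
}


def extract_structured(text: str) -> dict[str, list[str]]:
    """Return every match for each pattern, keyed by pattern name."""
    return {name: pattern.findall(text) for name, pattern in PATTERNS.items()}


sample = (
    "Deployed 2024-03-15 from 10.0.0.12; "
    "see https://example.com/changelog or mail ops@example.com."
)
found = extract_structured(sample)
for kind, matches in found.items():
    print(kind, matches)
```

Pattern matching like this needs no model at all, which is consistent with stage 1 being the cheapest row in the table.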

Multiple Ingestions

You can call ingest() several times before a dispatch; the facts from every call remain available in the session:

# Ingest from different sources
session.ingest(text=api_docs, label="api-reference")
session.ingest(text=changelog, label="recent-changes")
session.ingest(text=user_feedback, label="user-reports")

# All facts are available for the next dispatch
response = session.dispatch(
    task="Summarize the current state of the API and recent user feedback"
)
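
When the sources already live in a dictionary, the repeated calls can be folded into a loop. The sketch below assumes only the ingest(text=..., label=...) call shown above; ingest_all() and RecordingSession are hypothetical names introduced for illustration, and RecordingSession is a stand-in that lets the helper run without a real CRP session.

```python
def ingest_all(session, sources):
    """Ingest each text under its label and return the per-label results."""
    return {
        label: session.ingest(text=text, label=label)
        for label, text in sources.items()
    }


class RecordingSession:
    """Hypothetical stand-in for a CRP session, used only to dry-run the helper."""

    def __init__(self):
        self.calls = []

    def ingest(self, text, label):
        self.calls.append(label)
        return len(text)  # placeholder; a real session returns a result object


stub = RecordingSession()
results = ingest_all(
    stub,
    {"api-reference": "GET /users", "recent-changes": "v2 adds paging"},
)
print(stub.calls)
```

With a real session you would pass the object returned by crp.init() in place of the stand-in.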

Ingestion with Labels

Labels help CRP organize facts by source:

session.ingest(text=article_1, label="source-a")
session.ingest(text=article_2, label="source-b")

# Facts are tagged with their source label
# The provenance engine can trace claims back to specific sources
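
Conceptually, tagging each fact with its source label is what makes tracing a claim back to a source possible. The sketch below illustrates the idea with a hypothetical Fact record and a group_by_label() helper; neither is part of CRP's API, and CRP's internal fact representation may differ.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    """Hypothetical fact record: the extracted text plus its source label."""
    text: str
    label: str  # the label passed to ingest()


def group_by_label(facts):
    """Group fact texts by the source label they were ingested under."""
    grouped = defaultdict(list)
    for fact in facts:
        grouped[fact.label].append(fact.text)
    return dict(grouped)


facts = [
    Fact("TLS 1.3 uses a 1-RTT handshake", "source-a"),
    Fact("RC4 and 3DES are removed", "source-a"),
    Fact("0-RTT resumption is vulnerable to replay attacks", "source-b"),
]
by_source = group_by_label(facts)
print(by_source["source-b"])
```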

Best Practices

Ingest before dispatch

Always ingest reference material BEFORE the dispatch call that needs it. Facts are available immediately after ingestion.

Use labels

Labels make provenance tracking clearer. When the provenance engine traces a claim back to a fact, the label tells you which source it came from.

Token limits still apply

Ingested text is processed into facts, and facts are packed into envelopes with a token budget. Ingesting a 50-page document produces many facts, but only the most relevant will make it into any given envelope.
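
The effect of a token budget can be sketched as a greedy packing step: take facts in order of relevance and keep each one that still fits. This is an assumption about the general shape of the selection, not CRP's actual packing algorithm. ScoredFact, estimate_tokens(), and pack_envelope() are hypothetical names, and the whitespace word count is a crude stand-in for a real tokenizer.

```python
from dataclasses import dataclass


@dataclass
class ScoredFact:
    """Hypothetical fact with a relevance score for the current task."""
    text: str
    relevance: float


def estimate_tokens(text: str) -> int:
    """Crude token estimate: count whitespace-separated words."""
    return len(text.split())


def pack_envelope(facts, budget):
    """Greedily keep the most relevant facts that fit within the token budget."""
    packed = []
    used = 0
    for fact in sorted(facts, key=lambda f: f.relevance, reverse=True):
        cost = estimate_tokens(fact.text)
        if used + cost <= budget:
            packed.append(fact)
            used += cost
    return packed


candidates = [
    ScoredFact("TLS 1.3 handshake is 1-RTT", 0.9),
    ScoredFact("RC4 and 3DES are removed", 0.7),
    ScoredFact("0-RTT resumption is vulnerable to replay attacks", 0.8),
]
envelope = pack_envelope(candidates, budget=12)
print([fact.text for fact in envelope])
```

Under this sketch, the lowest-scoring fact is dropped once the budget is used up, which mirrors why a large ingested document does not appear in full in any single envelope.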