# Core Protocol
CRP is built on 10 design axioms that define its behavior. Every implementation decision traces back to these principles.
## Design Axioms
| # | Axiom | Meaning |
|---|---|---|
| 1 | Task Isolation | Every LLM operation gets its own context window |
| 2 | Maximum Context Saturation | Envelope fills ALL remaining space: $E = C - S - T - G$ |
| 3 | Zero Interpretation Overhead | Pre-digested facts, not raw data |
| 4 | Model Ignorance | The LLM doesn't know CRP exists — no protocol metadata leaks |
| 5 | Unbounded Capacity | Throughput = $N \times C$, with honest quality tier degradation |
| 6 | Portability | Language, model, and framework independent |
| 7 | Window Provenance | DAG tracking of all window lineage |
| 8 | Hardware-Adaptive | Self-configures to VRAM / RAM / CPU |
| 9 | Output Integrity | dispatch() returns unmodified LLM output; extraction is read-only |
| 10 | LLM Amplification | CRP amplifies the LLM, never replaces it |
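Axiom 2's budget arithmetic can be sketched directly. The token counts below are illustrative, not CRP defaults:

```python
def envelope_budget(context_window: int, system_tokens: int,
                    task_tokens: int, generation_reserve: int) -> int:
    """E = C - S - T - G: the envelope fills every token of context not
    claimed by the system prompt (S), task input (T), or the space
    reserved for generation (G)."""
    return context_window - system_tokens - task_tokens - generation_reserve

# Example: 128k window, 1k system prompt, 2k task, 8k reserved for output
print(envelope_budget(128_000, 1_000, 2_000, 8_000))  # → 117000
```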
## Session Lifecycle

```mermaid
sequenceDiagram
    participant App
    participant CRP as CRP Orchestrator
    participant LLM
    App->>CRP: create Client(provider, config)
    App->>CRP: ingest(raw_text, source_label)
    CRP->>CRP: 6-stage extraction → facts
    App->>CRP: dispatch(system_prompt, task_input)
    CRP->>CRP: Build context envelope
    CRP->>LLM: envelope + task
    LLM-->>CRP: output + finish_reason
    CRP->>CRP: Extract facts from output
    alt finish_reason == "length"
        CRP->>CRP: Build continuation envelope
        CRP->>LLM: continuation + task gap
        LLM-->>CRP: more output
        CRP->>CRP: Stitch windows
    end
    CRP-->>App: (output, QualityReport)
```
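The continuation branch in the diagram (when `finish_reason == "length"`) can be sketched as a loop. Here `dispatch_fn` and the tail-excerpt prompt are stand-ins, not CRP's actual continuation envelope:

```python
def run_with_continuations(dispatch_fn, prompt, max_continuations=10):
    """Keep dispatching continuation windows while the model is truncated
    (finish_reason == "length"), then stitch the windows together."""
    output, finish_reason = dispatch_fn(prompt)
    windows = [output]
    for _ in range(max_continuations):
        if finish_reason != "length":
            break  # model finished naturally; no continuation needed
        # CRP builds a continuation envelope here; a tail excerpt stands in.
        output, finish_reason = dispatch_fn("continue: " + output[-200:])
        windows.append(output)
    return "".join(windows)  # stitched output
```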
## TaskIntent

Every dispatch call creates a TaskIntent — CRP's declarative task specification.
All fields are optional with sensible defaults:

```python
# These are all controlled via dispatch() kwargs:
client.dispatch(
    system_prompt="You are a technical writer.",
    task_input="Write a Kubernetes guide.",
    temperature=0.7,
    max_output_tokens=4096,
    max_continuations=10,
)
```
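A TaskIntent built from these kwargs might look like the dataclass below. The field names mirror the documented kwargs, but the exact class shape is an assumption:

```python
from dataclasses import dataclass

@dataclass
class TaskIntent:
    """Hypothetical shape of the TaskIntent built from dispatch() kwargs;
    only the documented kwargs are shown, with their example defaults."""
    system_prompt: str
    task_input: str
    temperature: float = 0.7
    max_output_tokens: int = 4096
    max_continuations: int = 10

intent = TaskIntent(
    system_prompt="You are a technical writer.",
    task_input="Write a Kubernetes guide.",
)
print(intent.temperature)  # defaults apply → 0.7
```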
## Window DAG
CRP tracks every generation window in a Directed Acyclic Graph (DAG):
- Continuation: Window A → Window B (extend output)
- Fan-out: Window A → [B₁, B₂, B₃] (parallel dispatch)
- Fan-in: [B₁, B₂, B₃] → Window C (merge results)
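All three relationships reduce to edge patterns over parent pointers. A minimal sketch of that bookkeeping (the real DAG store is not specified here):

```python
# Each window records its parent window ids; continuation, fan-out,
# and fan-in are all just different edge patterns in the same DAG.
dag: dict[str, list[str]] = {}

def add_window(window_id: str, parents: tuple[str, ...] = ()) -> None:
    dag[window_id] = list(parents)

add_window("A")
add_window("B", parents=("A",))         # continuation: A → B
add_window("B1", parents=("A",))        # fan-out: A → B1, B2, B3
add_window("B2", parents=("A",))
add_window("B3", parents=("A",))
add_window("C", parents=("B1", "B2", "B3"))  # fan-in: [B1, B2, B3] → C
```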
## Four-Tier Memory Hierarchy

CRP manages context across 4 memory tiers with different lifetimes:

```mermaid
graph TB
    subgraph "Tier 0 — Active Context"
        A[LLM KV Cache<br/>Lifetime: one call]
    end
    subgraph "Tier 1 — Hot State"
        B[Envelope facts<br/>Lifetime: one window]
    end
    subgraph "Tier 2 — Warm State"
        C[All session facts<br/>In-memory + async persist]
    end
    subgraph "Tier 3 — Cold State (CKF)"
        D[Cross-session knowledge<br/>SQLite + vector store]
    end
    A --> B --> C --> D
```
| Tier | Name | Lifetime | Storage | Size |
|---|---|---|---|---|
| 0 | Active Context | One LLM call | KV cache | Model context window |
| 1 | Hot State | One window | App memory | Adaptive |
| 2 | Warm State | One session | Memory + async disk | MB scale |
| 3 | Cold State (CKF) | Cross-session | SQLite + vectors | Up to 500 MB |
!!! note "Performance"
    The warm-state hot path is in-memory only. Persistence is asynchronous, with a batch flush every 5 windows, so there is zero I/O latency on the critical path.
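That write path can be sketched as follows. The `FLUSH_EVERY` constant and the persistence hook are assumptions based on the note above, not CRP's actual internals:

```python
class WarmState:
    """Sketch of the tier-2 write path: facts live in memory and are
    handed to a persistence hook in batches every FLUSH_EVERY windows."""
    FLUSH_EVERY = 5

    def __init__(self, persist_fn):
        self.facts = []                 # in-memory hot path
        self._pending = []              # facts awaiting persistence
        self._windows_since_flush = 0
        self._persist = persist_fn      # async/batched persistence hook

    def record_window(self, new_facts):
        self.facts.extend(new_facts)    # reads never touch disk
        self._pending.extend(new_facts)
        self._windows_since_flush += 1
        if self._windows_since_flush >= self.FLUSH_EVERY:
            self._persist(self._pending)  # batch flush, off the hot path
            self._pending.clear()
            self._windows_since_flush = 0
```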
## Orchestration Flow

For each dispatch call, CRP follows this pipeline:

1. Receive TaskIntent — Parse the system prompt, task input, and kwargs
2. Construct Envelope — Query warm/cold state, rank facts, pack the envelope
3. Assemble Window — Create window metadata in the DAG
4. Dispatch to LLM — Send envelope + task via the provider adapter
5. Run Extraction — Run the 6-stage pipeline on the LLM output
6. Measure Info Flow — Calculate the quality score $Q(t, w)$
7. Check Continuation — If truncated AND the task is unfulfilled → continue
8. Update DAG — Record window relationships and provenance
9. Return Output — Stitched text + QualityReport
## Configuration

```python
from crp import Client

client = Client(
    provider="openai",  # or "anthropic", "ollama", etc.
    model="gpt-4o",
    config={
        "max_continuations": 10,
        "extraction_stages": [1, 2, 3, 4, 5],  # Skip LLM-assisted
        "ckf_enabled": True,
        "ckf_path": "./my_knowledge_base",
    },
)
```
See Providers for provider-specific configuration.