Why CRP?¶
The Context Problem Nobody Has Solved¶
Every AI application faces the same fundamental limitation: LLMs have finite context and finite output. Ask for a comprehensive document and it truncates. Build a multi-turn agent and context degrades. Deploy to production and there's no audit trail.
Existing solutions each solve one piece:
- RAG retrieves documents but doesn't manage output or continuation
- MemGPT/Letta pages memory but burns tokens on self-management
- LangChain/LlamaIndex chains prompts but has no structured context lifecycle
- MCP gives agents tools but not context
- A2A lets agents communicate but does nothing to improve how each agent reasons
CRP is the first protocol that manages the complete context lifecycle — ingestion, extraction, packing, dispatch, continuation, quality assessment, and cross-session persistence — as a single, coherent system.
How CRP Manages Context¶
The Core Insight¶
Instead of cramming everything into one massive context window and hoping for the best, CRP uses dedicated, optimized windows where every token earns its place:
```mermaid
graph TB
    subgraph "Traditional Approach"
        A1["Raw text + instructions + history + tools<br/>all dumped in one window"] --> A2["Attention degrades<br/>Output truncates<br/>No continuation"]
    end
    subgraph "CRP Approach"
        B1["6-Stage Extraction"] --> B2["Scored Fact Graph"]
        B2 --> B3["Maximally-Saturated Envelope"]
        B3 --> B4["Dedicated Task Window"]
        B4 -->|"wall hit"| B5["Continuation Engine"]
        B5 --> B1
        B4 -->|"complete"| B6["Quality-Assessed Output"]
    end
```
The 10 Axioms¶
Every design decision in CRP flows from 10 non-negotiable axioms:
| # | Axiom | What It Means |
|---|---|---|
| 1 | Task Isolation | Every LLM call gets its own dedicated window. No cross-task contamination |
| 2 | Maximum Context Saturation | Fill every available token with scored, relevant facts: $E = C - S - T - G$ |
| 3 | Zero Interpretation Overhead | Pre-digested facts, not raw data. The LLM acts immediately, doesn't parse |
| 4 | Model Ignorance | The LLM never knows CRP exists. All intelligence lives in the orchestrator |
| 5 | Unbounded Capacity | Total throughput = $N_{windows} \times C_{tokens}$. No hard ceiling |
| 6 | Portability | Works with any model, any provider, any application. Zero lock-in |
| 7 | Window Provenance | Every fact is a node in a DAG. Full lineage from output back to source |
| 8 | Hardware-Adaptive | Adapts to available VRAM, RAM, and CPU. No hardcoded assumptions |
| 9 | Output Integrity | CRP NEVER modifies LLM output. Extraction is read-only |
| 10 | LLM Amplification | CRP amplifies the LLM, never replaces it. The LLM does all thinking |
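Axioms 2 and 5 are just arithmetic, and can be sketched directly. All numeric values below are illustrative assumptions, not CRP defaults:

```python
# Axiom 2: envelope budget E = C - S - T - G, where
#   C = context window size, S = system prompt tokens,
#   T = task text tokens, G = reserved generation headroom.
def envelope_budget(C: int, S: int, T: int, G: int) -> int:
    """Tokens left for scored facts in one dedicated window."""
    return C - S - T - G

# Axiom 5: total throughput = N_windows * C_tokens (no hard ceiling).
def total_capacity(n_windows: int, c_tokens: int) -> int:
    return n_windows * c_tokens

budget = envelope_budget(C=8192, S=600, T=400, G=2048)   # 5144 tokens for facts
capacity = total_capacity(n_windows=20, c_tokens=8192)   # 163840 tokens overall
```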
CRP vs Everything Else¶
Detailed Comparison¶
| Capability | CRP | RAG | MemGPT | LangChain | MCP | A2A |
|---|---|---|---|---|---|---|
| Context management | Full lifecycle | Retrieval only | Virtual paging | Chain-based | None | None |
| Output continuation | Automatic multi-window | No | No | Manual chains | No | No |
| Knowledge extraction | 6-stage pipeline | Chunk embedding | LLM-managed | Manual | No | No |
| Quality assessment | S/A/B/C/D tiers | No | No | No | No | No |
| Provenance tracking | Full DAG lineage | Document source | No | No | No | No |
| Hallucination detection | 6-signal DPE | No | No | No | No | No |
| Cross-session memory | CKF (graph + vector + SQL) | Vector DB | Archival storage | Manual | No | No |
| In-window overhead | Zero | Low (chunks) | High (paging) | Medium | Very High (schemas) | Varies |
| Model coupling | Any model | Any model | Needs function calling | Any model | Any model | Any model |
| Compliance | 33/35 EU AI Act controls | No | No | No | No | No |
| Meta-learning | ORC + ICML + RTL | No | No | No | No | No |
The Token Efficiency Gap¶
MCP in a typical 20-step agentic loop with 50 tools:
$$20 \text{ steps} \times 10{,}000 \text{ schema tokens} = 200{,}000 \text{ tokens on tool definitions alone}$$
CRP puts tool schemas only in tool-selection windows → ~90% fewer protocol tokens, ~70% lower cloud API cost.
Why Not Just Use a 128K Context Window?¶
Even with infinite context, CRP adds irreplaceable value:
| Value | Why Native Context Can't Provide It |
|---|---|
| Context quality | CRP scores and ranks facts. Raw text has no ranking |
| Attention optimization | Critical facts at the start, not buried at position 50K |
| Cost efficiency | $O(N)$ total tokens vs $O(N^2)$ for growing context |
| Cross-session knowledge | Persists across sessions, machines, and time |
| Structured knowledge | Typed fact graph, not flat text |
| Observability | Full provenance for every claim |
| Reasoning amplification | Small models gain multi-step ability |
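The cost-efficiency row follows from how tokens accumulate. A growing conversation re-sends its full history on every call, while CRP sends one fixed-size envelope per window. The per-turn sizes below are hypothetical:

```python
# Growing context: call N re-sends all N previous turns -> quadratic total.
def growing_context_cost(turns: int, tokens_per_turn: int) -> int:
    return sum(i * tokens_per_turn for i in range(1, turns + 1))

# CRP: each call sends one fixed-size saturated envelope -> linear total.
def crp_window_cost(turns: int, window_tokens: int) -> int:
    return turns * window_tokens

growing = growing_context_cost(turns=50, tokens_per_turn=500)  # O(N^2): 637,500
fixed = crp_window_cost(turns=50, window_tokens=4_000)         # O(N):   200,000
```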
"Retrieval can significantly improve the performance of LLMs regardless of their extended context window sizes" — Xu et al., ICLR 2024
The 9 Innovations¶
1. Chained Generation Windows¶
Instead of one truncated output, CRP produces multiple high-quality segments across dedicated windows. Each window gets fresh attention, a re-injected system prompt, and an extraction-built envelope carrying full semantic state.
Result
11.8x more content from the same model. See benchmarks →
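The window-chaining loop can be sketched as follows. `llm`, `extract_facts`, and `build_envelope` are hypothetical stand-ins for the real pipeline components; only the control flow is the point:

```python
def generate_chained(task: str, llm, extract_facts, build_envelope,
                     max_windows: int = 8) -> str:
    """Produce output across dedicated windows until completion is detected."""
    segments, envelope = [], ""
    for _ in range(max_windows):
        # Every window gets a fresh system prompt plus the fact envelope.
        out = llm(system="You are writing one segment.", facts=envelope, task=task)
        segments.append(out.text)
        if out.complete:            # completion detector says we're done
            break
        # Wall hit: extract semantic state and carry it into the next window.
        envelope = build_envelope(extract_facts(out.text))
    return "\n".join(segments)
```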
2. 6-Stage Graduated Extraction¶
Not text chunking — structured knowledge extraction: regex → statistical NLP → GLiNER NER → UIE relations → RST discourse → LLM-assisted relational. Stages self-gate based on content complexity.
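Self-gating means the expensive later stages only run when the content warrants them. A minimal sketch, where the stage names mirror the pipeline above but the gates and the complexity heuristic are invented for illustration:

```python
STAGES = [
    ("regex",           0.00),  # always runs
    ("statistical_nlp", 0.20),
    ("gliner_ner",      0.40),
    ("uie_relations",   0.60),
    ("rst_discourse",   0.70),
    ("llm_relational",  0.85),  # most expensive, gated hardest
]

def complexity(text: str) -> float:
    # Toy heuristic: more varied vocabulary scores as more complex.
    return min(1.0, len(set(text.split())) / 200)

def stages_to_run(text: str) -> list[str]:
    c = complexity(text)
    return [name for name, gate in STAGES if c >= gate]
```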
3. Maximally-Saturated Context Envelopes¶
The old approach: a ~100 token baton between windows. CRP fills the entire remaining context with semantically scored, priority-packed, DAG-tracked atomic facts. Multi-phase scoring: bi-encoder → cross-encoder → graph-aware packing.
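The final packing phase reduces to a budgeted greedy fill: highest-scoring facts first, until the envelope budget is spent. This sketch omits the bi-encoder/cross-encoder scoring and graph-aware ordering; all names are illustrative:

```python
def pack_envelope(facts: list[tuple[str, float, int]], budget: int) -> list[str]:
    """facts: (text, relevance_score, token_cost). Returns packed fact texts."""
    packed, used = [], 0
    # Greedy by score: every token in the envelope earns its place.
    for text, score, cost in sorted(facts, key=lambda f: f[1], reverse=True):
        if used + cost <= budget:
            packed.append(text)
            used += cost
    return packed
```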
4. Contextual Knowledge Fabric (CKF)¶
4-mode retrieval: graph walk + pattern query + semantic fallback + community summaries (Leiden detection). Not flat vectors — a typed knowledge graph with temporal history and event sourcing.
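One way the four modes can compose, with semantic search as the named fallback and community summaries as the last resort. The mode functions here are hypothetical callables; only the ordering logic is shown:

```python
def ckf_retrieve(query, graph_walk, pattern_query, semantic_search,
                 community_summaries):
    """Try structured retrieval first, fall back to broader modes."""
    for mode in (graph_walk, pattern_query, semantic_search):
        hits = mode(query)
        if hits:
            return hits
    # Last resort: answer from Leiden community summaries.
    return community_summaries(query)
```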
5. Multi-Signal Completion Detection¶
Four independent signals: fact flow, structural flow, vocabulary novelty, structural completion — weighted by content type. No arbitrary "max iterations."
6. Zero In-Window Protocol Overhead¶
The LLM receives system prompt + scored facts + task. No protocol metadata, no memory management instructions, no function call schemas. The model never knows CRP exists.
7. Reasoning Amplification (Meta-Learning)¶
ORC, ICML, and RTL enable small models (2B–7B) to perform multi-step reasoning. A 770M model with CRP scaffolding outperforms a 540B model on structured tasks (Hsieh et al., 2023).
8. Security as Architecture¶
8-layer defense-in-depth, every control < 1ms: HMAC-SHA256 session binding (~2μs), BLAKE3 fact integrity (~5μs), AES-256-GCM encryption, quantum-resistant symmetric crypto. Security is structural, not bolted on.
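Two of these controls can be sketched with Python's standard library alone. HMAC-SHA256 session binding maps directly onto `hmac`; for the keyed fact-integrity tag, `hashlib.blake2b` stands in here for the BLAKE3 used by the real stack:

```python
import hmac, hashlib

def bind_session(key: bytes, session_id: str, window_id: str) -> str:
    """HMAC-SHA256 tag binding a window to its session."""
    msg = f"{session_id}:{window_id}".encode()
    return hmac.new(key, msg, hashlib.sha256).hexdigest()

def verify_binding(key: bytes, session_id: str, window_id: str, tag: str) -> bool:
    # Constant-time comparison to avoid timing side channels.
    return hmac.compare_digest(bind_session(key, session_id, window_id), tag)

def fact_tag(key: bytes, fact: str) -> str:
    """Keyed integrity tag for a single fact (blake2b as a BLAKE3 stand-in)."""
    return hashlib.blake2b(fact.encode(), key=key, digest_size=16).hexdigest()
```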
9. Agentic Cognitive Architecture¶
The LLM operates inside CRP as its cognitive engine — task analysis, strategy routing, fact synthesis, output evaluation, and memory curation. Single _cognitive_call() bottleneck for budget control.
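The value of a single bottleneck is that budget enforcement lives in exactly one place. A minimal sketch: the method name `_cognitive_call` comes from the text, but the budget logic and the word-count token estimate are illustrative:

```python
class Orchestrator:
    """All LLM interactions funnel through one budget-enforcing method."""

    def __init__(self, llm, token_budget: int):
        self.llm = llm
        self.budget = token_budget
        self.spent = 0

    def _cognitive_call(self, prompt: str) -> str:
        est = len(prompt.split())          # crude token estimate
        if self.spent + est > self.budget:
            raise RuntimeError("cognitive budget exhausted")
        self.spent += est
        return self.llm(prompt)
```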
Where CRP Sits in the AI Stack¶
```
┌─────────────────────────────────────────────┐
│ Layer 3: A2A                                │
│ Agent-to-Agent Communication                │
│ "How agents talk to each other"             │
├─────────────────────────────────────────────┤
│ Layer 2: MCP                                │
│ Model Context Protocol                      │
│ "How agents access tools"                   │
├─────────────────────────────────────────────┤
│ Layer 1: CRP          ◀── THE FOUNDATION    │
│ Context Relay Protocol                      │
│ "How each agent manages its own context"    │
│ Unbounded context · Unbounded generation    │
│ Amplified reasoning · Full provenance       │
└─────────────────────────────────────────────┘
```
MCP gives agents tools. A2A lets agents talk. CRP gives every agent the context foundation both protocols assume but neither provides.
Proof: Real Numbers¶
| Metric | Without CRP | With CRP | Multiplier |
|---|---|---|---|
| Words produced | 592 | 6,993 | 11.8x |
| Sections completed | 8/30 | 25/30 | 3.1x |
| Task completed? | No | Yes | — |
| Quality tier | — | A | — |
| Protocol overhead | — | 6.1% | — |
| Throughput | 4.9 w/s | 4.9 w/s | Same |
Same model, same hardware, same task. CRP takes ~12x longer but produces ~12x more content at identical throughput. The difference: CRP finishes the task.