
Why CRP?

The Context Problem Nobody Has Solved

Every AI application faces the same fundamental limitation: LLMs have finite context and finite output. Ask for a comprehensive document and it truncates. Build a multi-turn agent and context degrades. Deploy to production and there's no audit trail.

Existing solutions each solve one piece:

  • RAG retrieves documents but doesn't manage output or continuation
  • MemGPT/Letta pages memory but burns tokens on self-management
  • LangChain/LlamaIndex chains prompts but has no structured context lifecycle
  • MCP gives agents tools but not context
  • A2A lets agents communicate but not reason effectively

CRP is the first protocol that manages the complete context lifecycle — ingestion, extraction, packing, dispatch, continuation, quality assessment, and cross-session persistence — as a single, coherent system.


How CRP Manages Context

The Core Insight

Instead of cramming everything into one massive context window and hoping for the best, CRP uses dedicated, optimized windows where every token earns its place:

```mermaid
graph TB
    subgraph "Traditional Approach"
        A1["Raw text + instructions + history + tools<br/>all dumped in one window"] --> A2["Attention degrades<br/>Output truncates<br/>No continuation"]
    end

    subgraph "CRP Approach"
        B1["6-Stage Extraction"] --> B2["Scored Fact Graph"]
        B2 --> B3["Maximally-Saturated Envelope"]
        B3 --> B4["Dedicated Task Window"]
        B4 -->|"wall hit"| B5["Continuation Engine"]
        B5 --> B1
        B4 -->|"complete"| B6["Quality-Assessed Output"]
    end
```

The 10 Axioms

Every design decision in CRP flows from 10 non-negotiable axioms:

| #  | Axiom | What It Means |
|----|-------|---------------|
| 1  | Task Isolation | Every LLM call gets its own dedicated window. No cross-task contamination |
| 2  | Maximum Context Saturation | Fill every available token with scored, relevant facts: $E = C - S - T - G$ |
| 3  | Zero Interpretation Overhead | Pre-digested facts, not raw data. The LLM acts immediately, doesn't parse |
| 4  | Model Ignorance | The LLM never knows CRP exists. All intelligence lives in the orchestrator |
| 5  | Unbounded Capacity | Total throughput = $N_{windows} \times C_{tokens}$. No hard ceiling |
| 6  | Portability | Works with any model, any provider, any application. Zero lock-in |
| 7  | Window Provenance | Every fact is a node in a DAG. Full lineage from output back to source |
| 8  | Hardware-Adaptive | Adapts to available VRAM, RAM, and CPU. No hardcoded assumptions |
| 9  | Output Integrity | CRP NEVER modifies LLM output. Extraction is read-only |
| 10 | LLM Amplification | CRP amplifies the LLM, never replaces it. The LLM does all thinking |
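Axiom 2's budget formula can be made concrete with a few lines of arithmetic. This is a hypothetical sketch (the function name and token figures are illustrative, not CRP's API): the envelope gets every token of window capacity $C$ not claimed by the system prompt $S$, the task $T$, or the generation reserve $G$.

```python
# Illustrative sketch of Axiom 2's envelope budget: E = C - S - T - G.
# The fact envelope gets every token not claimed by the system prompt,
# the task, or the reserve held back for generation.
def envelope_budget(capacity: int, system: int, task: int, reserve: int) -> int:
    """Return E, the tokens available for scored facts."""
    budget = capacity - system - task - reserve
    if budget <= 0:
        raise ValueError("window too small to hold any facts")
    return budget

# e.g. an 8K window with a 600-token system prompt, 400-token task,
# and 2,000 tokens reserved for output leaves 5,192 fact tokens:
print(envelope_budget(8192, 600, 400, 2000))  # → 5192
```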

CRP vs Everything Else

Detailed Comparison

| | CRP | RAG | MemGPT | LangChain | MCP | A2A |
|---|---|---|---|---|---|---|
| Context management | Full lifecycle | Retrieval only | Virtual paging | Chain-based | None | None |
| Output continuation | Automatic multi-window | No | No | Manual chains | No | No |
| Knowledge extraction | 6-stage pipeline | Chunk embedding | LLM-managed | Manual | No | No |
| Quality assessment | S/A/B/C/D tiers | No | No | No | No | No |
| Provenance tracking | Full DAG lineage | Document source | No | No | No | No |
| Hallucination detection | 6-signal DPE | No | No | No | No | No |
| Cross-session memory | CKF (graph + vector + SQL) | Vector DB | Archival storage | Manual | No | No |
| In-window overhead | Zero | Low (chunks) | High (paging) | Medium | Very high (schemas) | Varies |
| Model coupling | Any model | Any model | Needs function calling | Any model | Any model | Any model |
| Compliance | 33/35 EU AI Act controls | No | No | No | No | No |
| Meta-learning | ORC + ICML + RTL | No | No | No | No | No |

The Token Efficiency Gap

MCP in a typical 20-step agentic loop with 50 tools:

$$20 \text{ steps} \times 10{,}000 \text{ schema tokens} = 200{,}000 \text{ tokens on tool definitions alone}$$

CRP puts tool schemas only in tool-selection windows → ~90% fewer protocol tokens, ~70% lower cloud API cost.
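The arithmetic behind these figures is easy to check. The sketch below assumes, for illustration, that CRP confines schemas to two dedicated tool-selection windows; the exact window count depends on the task.

```python
# Back-of-envelope check of the schema-token figures above: 20 agent
# steps, 10,000 tokens of tool schemas per step. MCP resends schemas
# every step; CRP (in this illustrative scenario) includes them only
# in 2 dedicated tool-selection windows.
steps, schema_tokens = 20, 10_000
mcp_total = steps * schema_tokens      # 200,000 tokens on definitions alone
crp_total = 2 * schema_tokens          # schemas only where tools are chosen
savings = 1 - crp_total / mcp_total
print(mcp_total, crp_total, f"{savings:.0%}")  # 200000 20000 90%
```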

Why Not Just Use a 128K Context Window?

Even with infinite context, CRP adds irreplaceable value:

| Value | Why Native Context Can't Provide It |
|-------|-------------------------------------|
| Context quality | CRP scores and ranks facts. Raw text has no ranking |
| Attention optimization | Critical facts at the start, not buried at position 50K |
| Cost efficiency | $O(N)$ total tokens vs $O(N^2)$ for growing context |
| Cross-session knowledge | Persists across sessions, machines, and time |
| Structured knowledge | Typed fact graph, not flat text |
| Observability | Full provenance for every claim |
| Reasoning amplification | Small models gain multi-step ability |
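The $O(N)$ vs $O(N^2)$ claim follows from a simple model: if every turn re-sends the whole history, turn $i$ processes roughly $i \cdot k$ tokens, so $N$ turns cost $k \cdot N(N+1)/2$. Fixed-size windows cost $k \cdot N$. A sketch under that simplifying assumption (not a benchmark):

```python
# Simplified cost model: k tokens added per turn. Growing context
# reprocesses the whole history each turn (quadratic); fixed-size CRP
# windows process a constant amount per turn (linear).
def growing_context_cost(turns: int, k: int) -> int:
    return sum(i * k for i in range(1, turns + 1))  # k * N(N+1)/2

def fixed_window_cost(turns: int, k: int) -> int:
    return turns * k                                 # k * N

print(growing_context_cost(100, 500))  # 2,525,000 tokens processed
print(fixed_window_cost(100, 500))     # 50,000 tokens processed
```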

> "Retrieval can significantly improve the performance of LLMs regardless of their extended context window sizes" — Xu et al., ICLR 2024


The 9 Innovations

1. Chained Generation Windows

Instead of one truncated output, CRP produces multiple high-quality segments across dedicated windows. Each window gets fresh attention, a re-injected system prompt, and an extraction-built envelope carrying full semantic state.

Result

11.8x more content from the same model. See benchmarks →
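The window-chaining loop can be illustrated with a toy, self-contained sketch. Here the "model" just emits fixed-size chunks of a long answer, and an integer offset stands in for the extraction-built envelope; real CRP would call an LLM and run extraction between windows.

```python
# Toy sketch of chained generation: each iteration is one dedicated
# window with fresh attention; `state` stands in for the envelope that
# carries semantic state from one window to the next.
def chained_generation(full_answer: str, window_size: int) -> list[str]:
    segments, state = [], 0
    while state < len(full_answer):
        segment = full_answer[state:state + window_size]  # one window's output
        segments.append(segment)
        state += window_size       # "envelope" records where we left off
    return segments

parts = chained_generation("a" * 25, window_size=10)
print(len(parts))  # → 3 windows instead of one truncated output
```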

2. 6-Stage Graduated Extraction

Not text chunking — structured knowledge extraction: regex → statistical NLP → GLiNER NER → UIE relations → RST discourse → LLM-assisted relational. Stages self-gate based on content complexity.
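Self-gating can be sketched as a staged pipeline where each stage runs only if a cheap gate says the content warrants it. The stage names mirror the list above; the gate heuristics here are invented for illustration and are far simpler than real complexity detection.

```python
# Illustrative self-gating pipeline: cheap stages always run, expensive
# stages run only when a heuristic gate fires. Gates here are toy
# stand-ins, not CRP's actual complexity detection.
def run_extraction(text: str) -> list[str]:
    stages = [
        ("regex",          lambda t: True),                          # always
        ("statistical",    lambda t: len(t.split()) > 5),
        ("gliner_ner",     lambda t: any(w[0].isupper() for w in t.split())),
        ("uie_relations",  lambda t: " and " in t or "," in t),
        ("rst_discourse",  lambda t: t.count(".") > 1),
        ("llm_relational", lambda t: len(t) > 200),                  # costly, rare
    ]
    return [name for name, gate in stages if gate(text)]

print(run_extraction("Alice met Bob in Paris."))  # short text gates out
```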

3. Maximally-Saturated Context Envelopes

The old approach: a ~100-token baton passed between windows. CRP fills the entire remaining context with semantically scored, priority-packed, DAG-tracked atomic facts. Multi-phase scoring: bi-encoder → cross-encoder → graph-aware packing.
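Priority packing under a token budget can be sketched greedily: each fact carries a relevance score and a token cost, and the highest-scoring facts are packed first until the envelope is saturated. (A greedy stand-in for the multi-phase scoring described above; names and figures are illustrative.)

```python
# Greedy sketch of envelope packing: sort facts by score, admit each
# one whose token cost still fits the budget.
def pack_envelope(facts: list[dict], budget: int) -> tuple[list[str], int]:
    packed, used = [], 0
    for fact in sorted(facts, key=lambda f: f["score"], reverse=True):
        if used + fact["tokens"] <= budget:
            packed.append(fact["text"])
            used += fact["tokens"]
    return packed, used

facts = [
    {"text": "deadline is 2025-06-01", "score": 0.95, "tokens": 8},
    {"text": "budget capped at $40k",  "score": 0.90, "tokens": 7},
    {"text": "minor formatting note",  "score": 0.20, "tokens": 6},
]
print(pack_envelope(facts, budget=15))  # low-score fact doesn't fit
```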

4. Contextual Knowledge Fabric (CKF)

4-mode retrieval: graph walk + pattern query + semantic fallback + community summaries (Leiden detection). Not flat vectors — a typed knowledge graph with temporal history and event sourcing.

5. Multi-Signal Completion Detection

Four independent signals: fact flow, structural flow, vocabulary novelty, structural completion — weighted by content type. No arbitrary "max iterations."
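One way to combine such signals, sketched below: each signal scores in [0, 1], content-type weights mix them, and generation stops when the weighted sum crosses a threshold. The weights and threshold here are illustrative placeholders, not CRP's tuned values.

```python
# Illustrative weighted combination of completion signals. Real
# weights would vary by content type (prose vs code vs tables).
def is_complete(signals: dict, weights: dict, threshold: float = 0.75) -> bool:
    score = sum(weights[k] * signals[k] for k in weights)
    return score >= threshold

weights = {"fact_flow": 0.3, "structural_flow": 0.3,
           "vocab_novelty": 0.2, "structural_completion": 0.2}
signals = {"fact_flow": 0.9, "structural_flow": 0.8,
           "vocab_novelty": 0.7, "structural_completion": 1.0}
print(is_complete(signals, weights))  # True: weighted sum 0.85 ≥ 0.75
```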

6. Zero In-Window Protocol Overhead

The LLM receives system prompt + scored facts + task. No protocol metadata, no memory management instructions, no function call schemas. The model never knows CRP exists.

7. Reasoning Amplification (Meta-Learning)

ORC, ICML, and RTL enable small models (2B–7B) to perform multi-step reasoning. A 770M model with CRP scaffolding outperforms a 540B model on structured tasks (Hsieh et al., 2023).

8. Security as Architecture

8-layer defense-in-depth, every control < 1ms: HMAC-SHA256 session binding (~2μs), BLAKE3 fact integrity (~5μs), AES-256-GCM encryption, quantum-resistant symmetric crypto. Security is structural, not bolted on.
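The session-binding idea can be demonstrated with the standard library alone: tag each fact with an HMAC-SHA256 keyed to the session, so a fact replayed into a different session fails verification. (A minimal sketch of the concept, not CRP's implementation; BLAKE3 and AES-256-GCM, also mentioned above, require third-party packages.)

```python
import hmac
import hashlib

# Bind a fact to a session by MACing it under a per-session key.
def bind(session_key: bytes, fact: bytes) -> bytes:
    return hmac.new(session_key, fact, hashlib.sha256).digest()

# Constant-time verification rejects facts bound to another session.
def verify(session_key: bytes, fact: bytes, tag: bytes) -> bool:
    return hmac.compare_digest(bind(session_key, fact), tag)

key_a, key_b = b"session-a-key", b"session-b-key"
tag = bind(key_a, b"deadline is 2025-06-01")
print(verify(key_a, b"deadline is 2025-06-01", tag))  # True
print(verify(key_b, b"deadline is 2025-06-01", tag))  # False: wrong session
```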

9. Agentic Cognitive Architecture

The LLM operates inside CRP as its cognitive engine — task analysis, strategy routing, fact synthesis, output evaluation, and memory curation. Single _cognitive_call() bottleneck for budget control.
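The single-bottleneck idea can be sketched as one gate that every model call must pass through, enforcing a token budget so spend stays observable and capped. The class shape and cost estimate below are illustrative, not CRP's actual API.

```python
# Illustrative single-bottleneck gate: all LLM calls route through
# _cognitive_call(), which tracks and caps token spend.
class CognitiveGate:
    def __init__(self, budget_tokens: int):
        self.budget = budget_tokens
        self.spent = 0

    def _cognitive_call(self, prompt: str) -> str:
        cost = len(prompt)              # toy stand-in for a token count
        if self.spent + cost > self.budget:
            raise RuntimeError("cognitive budget exhausted")
        self.spent += cost
        return f"<response to {cost}-token prompt>"  # real code calls the LLM

gate = CognitiveGate(budget_tokens=100)
gate._cognitive_call("analyze the task")
print(gate.spent)  # → 16
```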


Where CRP Sits in the AI Stack

```
┌─────────────────────────────────────────────┐
│  Layer 3:  A2A                              │
│  Agent-to-Agent Communication               │
│  "How agents talk to each other"            │
├─────────────────────────────────────────────┤
│  Layer 2:  MCP                              │
│  Model Context Protocol                     │
│  "How agents access tools"                  │
├─────────────────────────────────────────────┤
│  Layer 1:  CRP  ◀── THE FOUNDATION         │
│  Context Relay Protocol                     │
│  "How each agent manages its own context"   │
│  Unbounded context · Unbounded generation   │
│  Amplified reasoning · Full provenance      │
└─────────────────────────────────────────────┘
```

MCP gives agents tools. A2A lets agents talk. CRP gives every agent the context foundation both protocols assume but neither provides.


Proof: Real Numbers

| Metric | Without CRP | With CRP | Multiplier |
|--------|-------------|----------|------------|
| Words produced | 592 | 6,993 | 11.8x |
| Sections completed | 8/30 | 25/30 | 3.1x |
| Task completed? | No | Yes | |
| Quality tier | | A | |
| Protocol overhead | | 6.1% | |
| Throughput | 4.9 w/s | 4.9 w/s | Same |

Same model, same hardware, same task. CRP takes roughly 12x longer but produces 11.8x more content at identical throughput. The difference: CRP finishes the task.

See full benchmark results → Try the demo app →