
Context Envelope

The context envelope is CRP's core innovation — a dynamically packed payload that fills all available space in the LLM's context window with the most relevant facts.

Maximum Context Saturation

$$E_{\max} = C - S - T - G$$

| Symbol | Meaning | Example (128K) |
|---|---|---|
| $C$ | Context window size | 131,072 |
| $S$ | System prompt tokens | ~500 |
| $T$ | Task input tokens | ~8,756 |
| $G$ | Generation reserve | 16,384 |
| $E_{\max}$ | Envelope capacity | 105,432 |

$G$ is determined automatically via a fallback chain: the user's `max_output_tokens`, else the provider's reported maximum output, else `min(C // 4, 16384)`.
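The capacity arithmetic and the fallback chain for $G$ can be sketched as follows (a minimal sketch; the function names are illustrative, not CRP's actual API):

```python
from typing import Optional

def generation_reserve(context_window: int,
                       user_max_output: Optional[int] = None,
                       provider_max_output: Optional[int] = None) -> int:
    """Resolve G: user's max_output_tokens -> provider max -> min(C // 4, 16384)."""
    if user_max_output is not None:
        return user_max_output
    if provider_max_output is not None:
        return provider_max_output
    return min(context_window // 4, 16_384)

def envelope_capacity(context_window: int, system_tokens: int,
                      task_tokens: int, gen_reserve: int) -> int:
    # E_max = C - S - T - G
    return context_window - system_tokens - task_tokens - gen_reserve

C = 131_072
G = generation_reserve(C)                    # no overrides -> min(C // 4, 16384) = 16384
print(envelope_capacity(C, 500, 8_756, G))   # → 105432
```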

In practice

CRP achieves 0.939–1.021 saturation (mean 0.994) — virtually every available token is used for relevant context.

Envelope Sections

The envelope is structured with 11 priority-ordered sections. Higher-priority sections survive when space is limited:

| Priority | Section | Tokens | Purpose |
|---|---|---|---|
| 1 | Critical State | 100–500 | GOAL, PHASE, BLOCKER, CONSTRAINT, WINDOW |
| 2 | LLM Synthesis | Adaptive | LLM's own curated understanding |
| 3 | Task Brief | Varies | What to do + output format |
| 4 | Discoveries | Bulk | Atomic facts with graph edges |
| 5 | Source Passages | Variable | Verbatim text for high-relevance facts |
| 6 | Decisions & Plan | Variable | Reasoning trail with justifications |
| 7 | Error Log | Small | What failed and why |
| 8 | Tool History | Small | Compact execution summaries |
| 9 | Expanded Context | Overflow | Full-fidelity data from warm state |
| 10 | CKF Retrievals | Variable | Cross-session knowledge |
| 11 | Reasoning Scaffold | Small | Step-by-step templates (weak models) |

Fact Selection Algorithm

CRP uses a 3-phase pipeline to select which facts go into the envelope:

Phase 1: Multi-Aspect Task Decomposition

The task is decomposed into noun-phrase aspects; a fact matching any single aspect scores highly:

$$\text{score}(f) = \max_{a \in \text{aspects}} \cos\bigl(\text{embed}(f),\; \text{embed}(a)\bigr)$$
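A minimal sketch of the max-over-aspects scoring, using toy 2-D vectors in place of real embeddings (the helpers are illustrative, not CRP's API):

```python
import math

def cosine(u, v):
    # cosine similarity between two dense vectors
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def aspect_score(fact_vec, aspect_vecs):
    # score(f) = max over aspects of cos(embed(f), embed(a))
    return max(cosine(fact_vec, a) for a in aspect_vecs)

aspects = [[1.0, 0.0], [0.0, 1.0]]   # toy "embeddings" for two task aspects
fact = [0.9, 0.1]                    # a fact close to the first aspect
print(round(aspect_score(fact, aspects), 3))   # → 0.994
```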

Phase 2: Bi-Encoder Fast Scoring

All facts are scored using all-MiniLM-L6-v2 embeddings. For more than 1,000 facts, an HNSW ANN index provides $O(\log N)$ retrieval.

Composite score:

$$\text{final}(f) = \text{sim}(f) \times \text{recency}(f) \times \text{novelty}(f) + \text{dep\_bonus}(f)$$

| Factor | Formula | Range |
|---|---|---|
| Recency | $e^{-0.1 \times \text{age\_in\_windows}}$ | 0 → 1 |
| Novelty | Unseen: 1.5×; <3 uses: 1.0×; 3+ uses: 0.5× | 0.5 → 1.5 |
| Dependency | Graph-connected facts inherit relevance | 0 → 0.5 |
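The composite score follows directly from these factors. A minimal sketch (helper names are illustrative):

```python
import math

def recency(age_in_windows: float) -> float:
    # exponential decay per context window: e^(-0.1 * age)
    return math.exp(-0.1 * age_in_windows)

def novelty(times_used: int) -> float:
    # unseen facts are boosted; overused facts are dampened
    if times_used == 0:
        return 1.5
    return 1.0 if times_used < 3 else 0.5

def final_score(sim: float, age_in_windows: float, times_used: int,
                dep_bonus: float) -> float:
    # final(f) = sim(f) * recency(f) * novelty(f) + dep_bonus(f)
    return sim * recency(age_in_windows) * novelty(times_used) + dep_bonus
```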

Phase 3: Cross-Encoder Reranking

The top 200 candidates are re-scored with ms-marco-MiniLM-L6-v2:

$$\text{blended} = 0.6 \times \text{CE\_score} + 0.4 \times \text{BE\_score}$$

Cache hit rate: 50–80% in continuation chains (saves 200–320 ms/window).
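The blending step can be sketched with the cross-encoder abstracted as an arbitrary callable; the 0.6/0.4 weights come from the formula above, and the function names are illustrative:

```python
def blended_score(ce_score: float, be_score: float,
                  ce_weight: float = 0.6) -> float:
    # blended = 0.6 * CE_score + 0.4 * BE_score
    return ce_weight * ce_score + (1.0 - ce_weight) * be_score

def rerank(candidates, cross_encode, top_k=200):
    """Re-score the top-k bi-encoder candidates with a cross-encoder.

    `candidates` is a list of (fact, be_score) sorted best-first;
    `cross_encode` is any callable returning a CE relevance score.
    """
    rescored = [(fact, blended_score(cross_encode(fact), be))
                for fact, be in candidates[:top_k]]
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)
```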

Packing Strategy

After scoring, facts are packed using greedy bin-packing with:

  • Dependency-aware graph pulling — up to 2 hops of connected facts
  • Bookend strategy — top 3 facts duplicated at envelope end (counters "lost in the middle" attention bias)
  • Progressive compression — truncation → summarization → tabular → reference replacement
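The greedy pass plus the bookend duplication can be sketched as follows (a simplified sketch that omits dependency pulling and compression; names are illustrative):

```python
def pack_envelope(scored_facts, budget, token_len, bookend_n=3):
    """Greedy bin-packing: take facts best-first until the budget is spent,
    then duplicate the top-N at the end (bookend strategy, countering the
    "lost in the middle" attention bias).

    `scored_facts` is a list of (fact, score); `token_len(fact)` counts tokens.
    """
    ranked = sorted(scored_facts, key=lambda pair: pair[1], reverse=True)
    tail = [fact for fact, _ in ranked[:bookend_n]]
    reserve = sum(token_len(f) for f in tail)   # keep room for the bookend copies
    packed, used = [], 0
    for fact, _ in ranked:
        cost = token_len(fact)
        if used + cost <= budget - reserve:
            packed.append(fact)
            used += cost
    return packed + tail
```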

Continuation Envelopes

When output is truncated, CRP builds a continuation envelope containing:

| Component | Purpose |
|---|---|
| Extracted facts | From the truncated output |
| Structural state | Open blocks, list position, section headers |
| Task gap | Missing items from the original task |
| Style anchor | Last natural paragraph, for voice consistency |
| Voice profile | Sentence length, vocabulary, tone markers |
| Document map | Running TOC with section completion status |
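These components could be modeled as a simple container; the field names below are illustrative, not CRP's actual structure:

```python
from dataclasses import dataclass, field

@dataclass
class ContinuationEnvelope:
    """Sketch of a continuation envelope's components."""
    extracted_facts: list        # facts mined from the truncated output
    structural_state: dict       # open blocks, list position, section headers
    task_gap: list               # items from the original task not yet covered
    style_anchor: str            # last natural paragraph, for voice consistency
    voice_profile: dict = field(default_factory=dict)  # sentence length, tone
    document_map: dict = field(default_factory=dict)   # running TOC w/ status
```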

Note

Continuation envelopes use extraction results, not raw text overlap. This is key to CRP's quality preservation across windows.

Preview

Before dispatching, you can preview what the envelope will look like:

preview = client.preview_envelope(
    system_prompt="You are a technical writer.",
    task_input="Explain Kubernetes networking.",
)
print(f"Total tokens:      {preview.total_tokens}")
print(f"Envelope tokens:   {preview.envelope_tokens}")
print(f"Generation reserve: {preview.generation_reserve}")
print(f"Facts included:    {preview.facts_included}")
print(f"Facts available:   {preview.facts_available}")
print(f"Saturation:        {preview.saturation:.1%}")