Contextual Knowledge Fabric (CKF)
The CKF is CRP's Tier 3 cold storage: a persistent knowledge base that survives across sessions. It stores facts and their relationships as a graph and exposes four retrieval modes.
Architecture
graph TB
subgraph "CKF Storage"
A[SQLite<br/>Fact nodes + edges]
B[Vector Store<br/>HNSW embeddings]
C[Community Index<br/>Leiden clusters]
end
subgraph "Retrieval Modes"
D[Graph Walk<br/>BFS traversal]
E[Pattern Query<br/>Structural matching]
F[Semantic Search<br/>ANN / cosine]
G[Community Summary<br/>Cluster-level]
end
D --> A
E --> A
F --> B
G --> C
4 Retrieval Modes
1. Graph Walk
Breadth-first traversal from a seed fact, following typed edges:
# CKF is used automatically during envelope construction.
# Facts from CKF appear in Section 10 (CKF Retrievals) of the envelope.
- Starts from facts most relevant to the current task
- Follows `depends_on`, `elaborates`, and `cause_effect` edges
- Configurable depth (default: 2 hops)
- Results scored by graph distance + relevance
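The traversal described above can be sketched as a depth-limited breadth-first search. This is a minimal illustration, not CRP's implementation: the in-memory `EDGES` dictionary, the `graph_walk` and `score` helpers, and the per-hop decay formula are all assumptions made for the example. The source only states that results are scored by graph distance plus relevance.

```python
from collections import deque

# Hypothetical in-memory graph: fact id -> list of (edge_type, neighbour id).
EDGES = {
    "f1": [("depends_on", "f2"), ("elaborates", "f3")],
    "f2": [("cause_effect", "f4")],
    "f3": [("mentions", "f5")],      # edge type the walk does not follow
    "f4": [("depends_on", "f6")],    # three hops from f1, beyond the default depth
}
FOLLOWED = {"depends_on", "elaborates", "cause_effect"}


def graph_walk(seed, edges, max_depth=2, followed=FOLLOWED):
    """Return {fact_id: hop distance} for facts within max_depth of the seed."""
    distance = {seed: 0}
    queue = deque([seed])
    while queue:
        current = queue.popleft()
        if distance[current] == max_depth:
            continue  # do not expand past the hop limit
        for edge_type, neighbour in edges.get(current, []):
            if edge_type in followed and neighbour not in distance:
                distance[neighbour] = distance[current] + 1
                queue.append(neighbour)
    return distance


def score(distance, relevance, decay=0.5):
    """Assumed scoring rule: relevance discounted by `decay` for every hop."""
    return {f: relevance.get(f, 0.0) * decay ** d for f, d in distance.items()}


hops = graph_walk("f1", EDGES)
scores = score(hops, {"f1": 1.0, "f2": 0.6, "f3": 0.9, "f4": 0.8})
```

Because BFS visits facts in order of increasing hop count, the first distance recorded for a fact is also its shortest, so no fact needs to be revisited.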
2. Pattern Query
Structural pattern matching against the fact graph:
- Find all facts matching a template pattern
- Supports wildcard nodes and edge types
- Efficient for "find all causes of X" queries
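As a rough illustration of template matching with wildcards, the sketch below treats the fact graph as a list of (source, edge type, target) triples. The triple representation, the `ANY` marker, and the `match` helper are hypothetical; CRP's actual pattern syntax is not shown in this page.

```python
# Hypothetical fact graph as (source, edge_type, target) triples.
TRIPLES = [
    ("cache_miss", "cause_effect", "slow_response"),
    ("db_lock", "cause_effect", "slow_response"),
    ("slow_response", "cause_effect", "timeout"),
    ("retry_policy", "depends_on", "timeout"),
]

ANY = None  # wildcard marker (an assumption of this sketch)


def match(pattern, triples):
    """Return every triple whose fields equal the pattern's non-wildcard fields."""
    return [
        triple for triple in triples
        if all(p is ANY or p == v for p, v in zip(pattern, triple))
    ]


# "Find all causes of slow_response": wildcard source, fixed edge type and target.
causes = [src for src, _, _ in match((ANY, "cause_effect", "slow_response"), TRIPLES)]

# Wildcard edge type: everything that points at "timeout", whatever the relation.
into_timeout = match((ANY, ANY, "timeout"), TRIPLES)
```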
3. Semantic Search
ANN (Approximate Nearest Neighbor) vector search:
- Uses shared `all-MiniLM-L6-v2` embeddings
- HNSW index for $O(\log N)$ retrieval
- Cosine similarity scoring
- Fallback when graph structure doesn't capture the relationship
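To make the scoring concrete, here is an exact (brute-force) cosine ranking in plain Python. It is deliberately not an HNSW index: it scans every vector, which is the result an ANN index approximates in far less time. The three-dimensional toy vectors and the `cosine` and `top_k` helpers are assumptions for illustration; real `all-MiniLM-L6-v2` embeddings have 384 dimensions.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query, vectors, k=2):
    """Rank fact ids by cosine similarity to the query and keep the best k."""
    ranked = sorted(vectors, key=lambda fid: cosine(query, vectors[fid]), reverse=True)
    return ranked[:k]


# Toy embeddings keyed by fact id.
VECTORS = {
    "f1": [1.0, 0.0, 0.0],
    "f2": [0.9, 0.1, 0.0],
    "f3": [0.0, 1.0, 0.0],
}

nearest = top_k([1.0, 0.0, 0.0], VECTORS)
```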
4. Community Detection
Leiden algorithm clusters related facts into communities:
- Identifies topic clusters automatically
- Community summaries provide high-level context
- Useful for understanding the "shape" of accumulated knowledge
Multi-Mode Merge
When multiple retrieval modes return results, CRP merges them:
- Deduplicate across modes
- Blend scores (configurable weights per mode)
- Respect token budget for CKF section of envelope
- Prioritize facts that appear in multiple modes
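One way the four merge steps could fit together is sketched below. The `merge` function, the additive multi-mode bonus, and the greedy budget fill are assumptions chosen for clarity; the source does not specify the blending formula or how the token budget is enforced.

```python
def merge(results, weights, token_cost, budget, multi_mode_bonus=0.1):
    """Blend per-mode scores, then greedily fill the token budget.

    results:    {mode: {fact_id: score}}
    weights:    {mode: weight} (modes without an entry default to 1.0)
    token_cost: {fact_id: tokens needed to include the fact}
    """
    blended, hits = {}, {}
    for mode, scored in results.items():
        for fact_id, s in scored.items():
            # Keying by fact id deduplicates across modes.
            blended[fact_id] = blended.get(fact_id, 0.0) + weights.get(mode, 1.0) * s
            hits[fact_id] = hits.get(fact_id, 0) + 1

    # Assumed rule: a flat bonus for each additional mode that returned the fact.
    for fact_id, n in hits.items():
        blended[fact_id] += multi_mode_bonus * (n - 1)

    selected, used = [], 0
    for fact_id in sorted(blended, key=blended.get, reverse=True):
        cost = token_cost[fact_id]
        if used + cost <= budget:
            selected.append(fact_id)
            used += cost
    return selected


selected = merge(
    results={
        "graph_walk": {"f1": 0.9, "f2": 0.4},
        "semantic": {"f2": 0.8, "f3": 0.7},
    },
    weights={"graph_walk": 0.5, "semantic": 0.5},
    token_cost={"f1": 40, "f2": 30, "f3": 50},
    budget=80,
)
```

In this example `f2` ranks first because both modes returned it, and `f3` is dropped because adding it would exceed the 80-token budget. The greedy loop skips a fact that does not fit and keeps trying cheaper ones further down the ranking.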
Storage
| Component | Backend | Purpose |
|---|---|---|
| Facts + edges | SQLite | Durable, ACID-compliant graph storage |
| Embeddings | HNSW index | Fast vector similarity search |
| Communities | In-memory (rebuilt) | Topic clustering |
| Size limit | 500 MB default | Configurable |
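For readers unfamiliar with storing a graph in SQLite, a minimal sketch using Python's standard `sqlite3` module follows. The table and column names are hypothetical and are not CRP's actual schema; the example uses an in-memory database so that it leaves nothing on disk.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # a real store would open a file instead

# Hypothetical two-table layout: one row per fact, one row per typed edge.
conn.executescript("""
CREATE TABLE facts (
    id   INTEGER PRIMARY KEY,
    text TEXT NOT NULL
);
CREATE TABLE edges (
    src       INTEGER NOT NULL REFERENCES facts(id),
    dst       INTEGER NOT NULL REFERENCES facts(id),
    edge_type TEXT NOT NULL,
    PRIMARY KEY (src, dst, edge_type)
);
""")

# The connection context manager commits on success and rolls back on error,
# so the facts and the edge between them are stored atomically.
with conn:
    conn.executemany(
        "INSERT INTO facts (id, text) VALUES (?, ?)",
        [(1, "Service A calls service B"), (2, "Service B uses Postgres")],
    )
    conn.execute("INSERT INTO edges (src, dst, edge_type) VALUES (1, 2, 'depends_on')")

# One-hop lookup: which facts does fact 1 depend on?
neighbours = conn.execute(
    "SELECT f.text FROM edges e JOIN facts f ON f.id = e.dst "
    "WHERE e.src = ? AND e.edge_type = ?",
    (1, "depends_on"),
).fetchall()
```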
Encryption
CKF data is encrypted at rest with AES-256-GCM using HKDF-derived keys. See Security for details.
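The key-derivation half of this scheme can be illustrated with the standard library alone. The sketch below implements HKDF with HMAC-SHA256 as specified in RFC 5869 and derives a 32-byte key, the size AES-256 requires. The master secret, salt, and `info` labels are placeholders invented for the example, and the AES-256-GCM step itself is omitted because it needs a third-party cryptography library. How CRP actually chooses its inputs is covered in Security, not here.

```python
import hashlib
import hmac


def hkdf_sha256(ikm, salt, info, length=32):
    """HKDF (RFC 5869) with HMAC-SHA256: extract a PRK, then expand it."""
    # Extract: an empty salt is replaced by a block of zero bytes, per the RFC.
    prk = hmac.new(salt or b"\x00" * 32, ikm, hashlib.sha256).digest()

    # Expand: T(i) = HMAC(PRK, T(i-1) || info || i), concatenated until long enough.
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


# Placeholder inputs (assumptions for illustration only).
master_secret = b"example-master-secret"
salt = b"example-salt"

facts_key = hkdf_sha256(master_secret, salt, info=b"ckf/facts")
index_key = hkdf_sha256(master_secret, salt, info=b"ckf/index")
```

Varying `info` yields independent keys from one master secret, which is the usual reason to derive keys with HKDF rather than use the secret directly.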
Garbage Collection
The CKF includes a garbage collector that manages fact lifecycle:
- Staleness detection: facts not accessed within a configurable window
- Relevance decay: facts with zero cross-session references
- Deduplication: merge semantically equivalent facts
- Compaction: rebuild indexes after significant deletions
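The first two checks above reduce to simple predicates over per-fact metadata. The sketch below is illustrative only: the `Fact` fields, the `gc_report` helper, and the seven-day window are assumptions, and it deliberately reports the two criteria separately because the source does not say how they are combined when deciding what to delete.

```python
from dataclasses import dataclass


@dataclass
class Fact:
    fact_id: str
    last_accessed: float      # Unix timestamp of the most recent retrieval
    cross_session_refs: int   # sessions other than the origin that used the fact


def gc_report(facts, now, stale_after_s):
    """Flag facts by criterion without deciding what to delete."""
    stale = [f.fact_id for f in facts if now - f.last_accessed > stale_after_s]
    unreferenced = [f.fact_id for f in facts if f.cross_session_refs == 0]
    return {"stale": stale, "unreferenced": unreferenced}


WEEK_S = 7 * 24 * 60 * 60  # hypothetical staleness window

facts = [
    Fact("a", last_accessed=999_000, cross_session_refs=3),
    Fact("b", last_accessed=100_000, cross_session_refs=0),
    Fact("c", last_accessed=100_000, cross_session_refs=2),
]
report = gc_report(facts, now=1_000_000, stale_after_s=WEEK_S)
```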
Configuration
client = Client(
    provider="openai",
    model="gpt-4o",
    config={
        "ckf_enabled": True,                  # turn CKF retrieval on
        "ckf_path": "./my_knowledge_base",    # location of the persistent store
        "ckf_max_size_mb": 500,               # size limit (default: 500 MB)
    },
)
Event System
CKF emits events via a pub/sub bus for observability:
| Event Type | Trigger |
|---|---|
| `FACT_STORED` | New fact persisted to CKF |
| `FACT_RETRIEVED` | Fact pulled from CKF into envelope |
| `GC_RUN` | Garbage collection cycle |
| `COMMUNITY_UPDATED` | Community structure changed |
| `INDEX_REBUILT` | HNSW index rebuilt |
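The pub/sub pattern behind these events can be sketched in a few lines. The `EventBus` class, its `subscribe` and `publish` methods, and the payload shapes are hypothetical stand-ins; the event names are taken from the table, but CRP's real subscription API is not documented on this page.

```python
from collections import defaultdict


class EventBus:
    """Minimal synchronous pub/sub bus: handlers are called in subscription order."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, event_type, handler):
        self._subscribers[event_type].append(handler)

    def publish(self, event_type, payload):
        # .get avoids creating empty entries for events nobody listens to.
        for handler in self._subscribers.get(event_type, []):
            handler(payload)


bus = EventBus()
stored = []

# Observe every stored fact, for example to feed a metrics counter.
bus.subscribe("FACT_STORED", stored.append)

bus.publish("FACT_STORED", {"fact_id": "f1"})
bus.publish("GC_RUN", {"removed": 0})  # no subscriber, so nothing happens
```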
When does CKF activate?
CKF retrieval runs during envelope construction (Phase 3 of the
packing algorithm). Facts from CKF are placed in Section 10 of the
envelope. CKF is not used if ckf_enabled=False (default: True when
a ckf_path is configured).