Running Tests
Prerequisites
# Install CRP with dev dependencies
pip install -e ".[dev]"
# Verify pytest is available
python -m pytest --version
Dev dependencies include: pytest>=7.4, pytest-asyncio>=0.21, pytest-cov>=4.1.
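To confirm that the installed tools meet these minimums, a small check along the following lines can help. It is a sketch, not part of CRP: the MINIMUMS table restates the versions listed above, the function names are made up for illustration, and the version parsing is deliberately naive (it reads only the leading numeric components).

```python
from importlib import metadata

# Minimum versions restated from the dev dependency list on this page.
MINIMUMS = {"pytest": (7, 4), "pytest-asyncio": (0, 21), "pytest-cov": (4, 1)}


def version_tuple(text):
    """Parse the leading numeric components of a version string into a tuple of ints."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def check_dev_dependencies(minimums=MINIMUMS):
    """Return a list of human-readable problems; an empty list means all minimums are met."""
    problems = []
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name} is not installed")
            continue
        if version_tuple(installed) < minimum:
            wanted = ".".join(str(n) for n in minimum)
            problems.append(f"{name} {installed} is older than {wanted}")
    return problems
```

Calling check_dev_dependencies() and printing the returned list gives a quick report; an empty list means nothing needs upgrading.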
Test Configuration
From pyproject.toml:
Running Tests
One File at a Time (Recommended)
# Run a specific test file
python -m pytest tests/test_smoke.py -v --tb=short
# Run a specific test class
python -m pytest tests/test_phase1.py::TestErrorCodes -v
# Run a specific test function
python -m pytest tests/test_phase2.py::TestExtractionPipeline::test_stage1_regex -v
# Run with coverage
python -m pytest tests/test_phase3.py --cov=crp --cov-report=term
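The selectors above follow pytest's node ID syntax: a file path, optionally followed by ::Class and ::function. As an illustration, the helpers below (hypothetical names, not part of CRP or pytest) assemble such an ID and the matching command line.

```python
import sys


def node_id(path, cls=None, func=None):
    """Join a test file path with optional class and function names using '::'."""
    return "::".join(part for part in (path, cls, func) if part)


def pytest_command(target, *extra):
    """Build the argument list for running one pytest target with the current interpreter."""
    return [sys.executable, "-m", "pytest", target, *extra]


target = node_id("tests/test_phase2.py", "TestExtractionPipeline", "test_stage1_regex")
print(target)  # prints tests/test_phase2.py::TestExtractionPipeline::test_stage1_regex
```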
Never run all tests in parallel
Running the full suite simultaneously will max out CPU and memory. Always run one file at a time.
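One way to cover the whole suite while respecting this rule is to loop over the test files and start each pytest process only after the previous one has exited. The script below is a minimal sketch under that assumption. The function names and the tests/test_*.py glob are illustrative, and the glob would also pick up the live test files, which need a running LLM.

```python
import subprocess
import sys
from pathlib import Path


def discover_test_files(test_dir="tests"):
    """Return the top-level test files in a stable, sorted order."""
    return sorted(Path(test_dir).glob("test_*.py"))


def run_sequentially(files, runner=subprocess.run):
    """Run pytest on each file in turn; return the files whose run exited non-zero."""
    failed = []
    for path in files:
        # subprocess.run blocks until the process exits, so only one file runs at a time.
        result = runner([sys.executable, "-m", "pytest", str(path), "-v", "--tb=short"])
        if result.returncode != 0:
            failed.append(path)
    return failed


def main():
    """Run every discovered file and return a process exit code (0 means all passed)."""
    failures = run_sequentially(discover_test_files())
    for path in failures:
        print(f"FAILED: {path}")
    return 1 if failures else 0
```

Saved as a script, it can be started by ending the file with sys.exit(main()). The runner parameter exists only so the loop can be exercised with a fake in place of subprocess.run.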
Shared Fixtures
Defined in tests/conftest.py:
| Fixture | Type | Purpose |
|---|---|---|
| sample_task_intent | TaskIntent | Minimal task with system prompt and input |
| sample_system_prompt | str | "You are a helpful assistant." |
| sample_task_input | str | "What is CRP?" |
Individual test files define additional fixtures inline for their specific needs.
Test Categories
Smoke Tests
File: test_smoke.py (6 tests)
Fast sanity checks that CRP imports correctly:
- Package version is a valid string
- CRPError exists and is an Exception
- TaskIntent has correct default values
- Core modules are importable
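An importability check of the kind listed above can be written with importlib. This is a sketch, not the contents of test_smoke.py: the helper name is hypothetical, and standard-library modules stand in for CRP's own modules so that the example runs anywhere.

```python
import importlib


def find_unimportable(module_names):
    """Try to import each module and return the names that fail."""
    failures = []
    for name in module_names:
        try:
            importlib.import_module(name)
        except ImportError:
            failures.append(name)
    return failures


# Stand-in module names; a real smoke test would list the package's own modules.
missing = find_unimportable(["json", "pathlib", "no_such_module_for_demo"])
print(missing)  # prints ['no_such_module_for_demo']
```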
Unit Tests: Core Phases
9 files covering all SDK phases, 636 tests total:
| File | Tests | Phase | What It Tests |
|---|---|---|---|
| test_phase1.py | 102 | Errors & Config | Error codes, session config, window sizing, orchestrator lifecycle, provider registration |
| test_phase2.py | 96 | Extraction | All 6 extraction stages, quality gate, contradiction detection, complexity analysis |
| test_phase3.py | 59 | Envelope | All 6 envelope building phases, token budgeting, fact packing, saturation |
| test_phase4.py | 65 | State & CKF | StateFact lifecycle, WarmStateStore, snapshots, compaction, cold storage, graph serialization |
| test_phase5.py | 58 | Continuation | Wall detection, gap analysis, stitch algorithm, echo detection, voice profiles, termination |
| test_phase6.py | 67 | Security | Session binding, fact HMAC integrity, encryption, input validation, injection detection, RBAC, rate limiting |
| test_phase7.py | 91 | Advanced | Auto-ingest, scale mode, hierarchical/parallel strategies, curator, meta-learning, batch, idempotency, cost model |
| test_phase8.py | 38 | CLI & Deploy | CLI commands, startup sequence, event emitter, deployment configuration |
| test_phase9.py | 60 | Observability | Metrics collection, audit trail, quality scores, telemetry, overhead measurement |
# Run Phase 2 (extraction)
python -m pytest tests/test_phase2.py -v --tb=short
# Run just the envelope tests
python -m pytest tests/test_phase3.py -v --tb=short
Unit Tests: Specialized
14 files with deep module-level coverage, 734 tests total:
| File | Tests | Module |
|---|---|---|
| test_adaptive_allocator.py | 57 | Resource allocator, hardware detection, EWMA, model unloading |
| test_adversarial_provenance.py | 41 | Edge cases: empty strings, unicode attacks, HTML injection, null bytes |
| test_agentic.py | 84 | Agentic architecture (§22): facilitator, task analysis, strategy routing |
| test_ckf_gate.py | 11 | CKF gate threshold, budget, retriever |
| test_compliance_security.py | 75 | Privacy, consent, audit trail, GDPR (§7.12–§7.15) |
| test_compliance_wiring.py | 44 | Audit entries for session/dispatch/ingest, PII scanning, lineage |
| test_decision_provenance.py | 40 | Envelope audit, LLM call audit, fact extraction audit |
| test_decision_provenance_engine.py | 79 | DPE: claim detection, attribution, provenance chains, reports |
| test_entailment_risk.py | 62 | NLI verification, hallucination risk scoring, heuristic fallback |
| test_fidelity_verification.py | 63 | Distortion, omission, fabrication, contradiction detection |
| test_relay_strategies.py | 61 | Reflexive, progressive, stream-augmented strategies (§21) |
| test_resource_manager.py | 38 | Resource lifecycle, meta-learning calibration, WindowMetrics |
| test_security_modules.py | 45 | Audit trail, privacy, injection, RBAC, encryption, integrity modules |
| test_tool_relay.py | 34 | Tool-mediated relay (§20), pull architecture, tool loop, fallback |
# Run provenance engine tests
python -m pytest tests/test_decision_provenance_engine.py -v
# Run security module tests
python -m pytest tests/test_security_modules.py -v
Integration Tests
File: test_integration.py (57 tests)
Cross-module end-to-end tests that use CustomProvider with a controlled
generate_fn. No external APIs are needed; everything runs locally with
mock responses.
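A controlled generate function is typically a small deterministic stub. The sketch below shows one way to script canned replies. The (prompt) -> str signature and the helper name are assumptions made for illustration, so check CustomProvider's actual interface before wiring anything like this in.

```python
def make_scripted_generate_fn(replies):
    """Return a generate function that yields the given replies in order and records prompts."""
    remaining = list(replies)
    seen_prompts = []

    def generate_fn(prompt):
        # Assumed signature: takes the prompt text and returns the reply text.
        seen_prompts.append(prompt)
        if not remaining:
            raise AssertionError("generate_fn called more times than scripted")
        return remaining.pop(0)

    # Exposed so a test can inspect exactly what was sent to the stub.
    generate_fn.seen_prompts = seen_prompts
    return generate_fn


generate = make_scripted_generate_fn(["first reply", "second reply"])
print(generate("hello"))  # prints first reply
print(generate("again"))  # prints second reply
```

Because the replies are fixed up front, a test can assert on both the output and the recorded prompts without any network access.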
Production Hardening
File: test_production_hardening.py (40 tests)
Tests for production reliability:
- Circuit breaker behavior
- Configuration validation
- Retry logic with backoff
- Session cleanup on crash
- Structured logging format
- Key rotation
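As background for the retry item above, exponential backoff is usually tested by injecting the sleep function so that no real time passes. The following is a generic sketch of that technique, not CRP's retry implementation; all names and default values are illustrative.

```python
import time


def retry_with_backoff(operation, max_attempts=4, base_delay=0.1, factor=2.0, sleep=time.sleep):
    """Call operation until it succeeds, waiting base_delay * factor**n after the n-th failure."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * factor ** attempt)


# Usage in a test: record the requested delays instead of actually sleeping.
delays = []
outcomes = iter([ValueError("boom"), ValueError("boom"), "ok"])


def flaky():
    item = next(outcomes)
    if isinstance(item, Exception):
        raise item
    return item


result = retry_with_backoff(flaky, sleep=delays.append)
print(result, delays)  # prints ok [0.1, 0.2]
```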
Performance Benchmarks
File: test_benchmarks.py (12 tests)
Performance regression tests with specific targets:
| Benchmark | Target |
|---|---|
| Cold session init | < 200ms |
| Warm session init | < 50ms |
| Dispatch overhead | < 100ms |
| Envelope assembly | < 50ms |
| Ingest throughput | > 100 facts/sec |
| Cache hit | < 1ms |
| Event emission | < 5ms |
| Metrics export | < 10ms |
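Latency targets like these are commonly enforced by timing an operation several times and comparing a robust statistic with the budget. The helpers below are a sketch under that assumption. Their names, the use of the median, and the stand-in workload are illustrative and do not reflect how test_benchmarks.py is written.

```python
import statistics
import time


def median_ms(operation, repeats=20):
    """Run operation repeatedly and return the median wall-clock duration in milliseconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        operation()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


def assert_within_budget(operation, budget_ms, repeats=20):
    """Fail with a descriptive message if the median duration is not below budget_ms."""
    elapsed = median_ms(operation, repeats)
    assert elapsed < budget_ms, f"median {elapsed:.3f} ms exceeds budget of {budget_ms} ms"
    return elapsed


# Stand-in workload; a real benchmark would call the code under test here.
assert_within_budget(lambda: sum(range(1000)), budget_ms=50)
```

The median is used here because a single slow sample (for example, from a garbage collection pause) would otherwise make the check flaky.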
Live Tests
4 files, 52 tests. These require a running LLM (LM Studio or Ollama):
| File | Tests | What It Verifies |
|---|---|---|
| test_gap_fixes_live.py | 30 | Gap fixes A–E against real LLM |
| test_live_comprehensive.py | 11 | Full protocol verification |
| test_live_full_capture.py | 11 | Output capture and analysis |
| test_live_long_generation.py | n/a | Long generation (standalone script) |
LM Studio connection
Live tests connect to LM Studio at http://192.168.0.6:1234 by default.
Update the connection URL in the test file if your setup differs.
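Before starting a live run, it can save time to confirm that the server is reachable. The sketch below performs a plain TCP connection check with the standard library. The function names are hypothetical, and the URL is the default quoted above.

```python
import socket
from urllib.parse import urlsplit


def host_and_port(url):
    """Extract (host, port) from a URL, defaulting to 443 for https and 80 otherwise."""
    parts = urlsplit(url)
    port = parts.port or (443 if parts.scheme == "https" else 80)
    return parts.hostname, port


def is_reachable(url, timeout=2.0):
    """Return True if a TCP connection to the URL's host and port succeeds within timeout."""
    try:
        with socket.create_connection(host_and_port(url), timeout=timeout):
            return True
    except OSError:
        return False


print(host_and_port("http://192.168.0.6:1234"))  # prints ('192.168.0.6', 1234)
```

A successful TCP connection only shows that something is listening on that port; it does not confirm that a model is loaded or that the API responds.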
Killer Test Suite
Standalone adversarial/stress test scripts in tests/killer_test/:
- crp_killer_test.py: Comprehensive stress test
- debug_gap.py, debug_gap2.py: Gap debugging utilities
Run directly with Python: