Provenance & Hallucination Detection¶
CRP includes a full Decision Provenance Engine (DPE) — a 13-module pipeline that traces every claim in the LLM's output back to its source facts, detects hallucinations, contradictions, fabrications, and distortions, and produces audit-ready reports.
Why Provenance Matters¶
LLMs generate plausible-sounding text, but they routinely:
- Fabricate entities, statistics, and citations that don't exist
- Distort source facts (change numbers, flip negations, broaden scope)
- Omit critical information the context provided
- Contradict themselves across long outputs
- Hallucinate confident claims with zero grounding
CRP's provenance engine catches all of these automatically.
Pipeline Overview¶
```mermaid
graph LR
    A[LLM Output] --> B[Claim Detection]
    B --> C[Attribution Scoring]
    C --> D[Fidelity Verification]
    D --> E[Entailment Verification]
    E --> F[Hallucination Scoring]
    F --> G[Report Generation]
```
The full pipeline runs in DecisionProvenanceEngine.analyse():
```python
from crp.provenance import DecisionProvenanceEngine

engine = DecisionProvenanceEngine()

report = engine.analyse(
    output_text=result.output,
    facts=result.facts_extracted,
    task_intent=task,
)
```
Stage 1: Claim Detection¶
Splits the LLM output into individual sentences and classifies each:
| Claim Type | Description | Example |
|---|---|---|
| `FACTUAL_CLAIM` | Verifiable assertion | "Python 3.12 added f-string nesting" |
| `OPINION` | Subjective judgment | "React is the best framework" |
| `PROCEDURAL` | Instructions/steps | "Run `pip install crprotocol`" |
| `HEDGE` | Qualified statement | "It may improve performance" |
| `CONNECTIVE` | Structural text | "As mentioned above..." |
Only FACTUAL_CLAIM entries proceed to attribution scoring. Classification
uses pattern-based heuristics — no LLM call, runs in <5ms.
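The kind of pattern-based heuristic described above can be sketched as a cascade of regex checks. This is an illustrative stand-in, not CRP's actual classifier; the function name, word lists, and check order are assumptions.

```python
import re

# Hypothetical pattern sets; the real CRP heuristics may differ.
CONNECTIVE = re.compile(r"^(as mentioned|in summary|furthermore|however)\b", re.I)
PROCEDURAL = re.compile(r"^\s*(run|install|execute|type|click)\b", re.I)
HEDGE_WORDS = re.compile(r"\b(may|might|could|possibly|perhaps|likely)\b", re.I)
OPINION_WORDS = re.compile(r"\b(best|worst|great|terrible|prefer)\b", re.I)

def classify_claim(sentence: str) -> str:
    """Classify one sentence; anything that survives the filters
    is treated as a verifiable factual claim."""
    if CONNECTIVE.search(sentence):
        return "CONNECTIVE"
    if PROCEDURAL.search(sentence):
        return "PROCEDURAL"
    if HEDGE_WORDS.search(sentence):
        return "HEDGE"
    if OPINION_WORDS.search(sentence):
        return "OPINION"
    return "FACTUAL_CLAIM"
```

Because the cascade is pure regex, classification cost is microseconds per sentence, which is how the whole stage stays under 5ms without an LLM call.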
Stage 2: Attribution Scoring¶
Each factual claim is scored against ALL envelope facts using two signals:
- Semantic similarity — Dense embeddings (sentence-transformers, all-MiniLM-L6-v2, 384-dim) or hash-projected bag-of-words fallback
- Lexical overlap — Jaccard similarity on word sets
The composite score determines the attribution type:
| Attribution Type | Meaning | Composite Score |
|---|---|---|
| `CONTEXT_GROUNDED` | Claim supported by envelope facts | ≥ similarity threshold |
| `MIXED` | Partial support | Between thresholds |
| `PARAMETRIC` | From model's training data only | Below threshold |
| `UNCERTAIN` | Cannot determine | Inconclusive |
Aiming for CONTEXT_GROUNDED
High CONTEXT_GROUNDED percentages indicate the model is using the
envelope facts you provided. Low scores mean the model is relying on its
training data — which may be outdated or wrong.
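The two-signal composite can be sketched as follows. This is a simplified illustration: the weights, thresholds, and the use of Jaccard as a stand-in for the embedding similarity (the lexical fallback mentioned above) are assumptions, not CRP's actual values.

```python
def jaccard(a: str, b: str) -> float:
    """Lexical overlap: Jaccard similarity on lowercased word sets."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0

def attribute(claim: str, facts: list[str],
              semantic_sim=jaccard,          # swap in an embedding model here
              grounded_t: float = 0.6,
              parametric_t: float = 0.3) -> tuple[str, float]:
    """Score a claim against every envelope fact and keep the best match."""
    if not facts:
        return "UNCERTAIN", 0.0
    best = max(0.7 * semantic_sim(claim, f) + 0.3 * jaccard(claim, f)
               for f in facts)
    if best >= grounded_t:
        return "CONTEXT_GROUNDED", best
    if best >= parametric_t:
        return "MIXED", best
    return "PARAMETRIC", best
```

Taking the maximum over all facts means a claim only needs one strong supporting fact to count as grounded.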
Stage 3: Fidelity Verification¶
Four parallel checks catch different failure modes:
Distortion Detection¶
Catches when context-grounded claims subtly misrepresent their source:
| Distortion Type | Example |
|---|---|
| `NUMBER_CHANGED` | Fact: "99.9% uptime" → Claim: "99.99% uptime" |
| `NEGATION_FLIP` | Fact: "does not support X" → Claim: "supports X" |
| `QUALIFIER_DROPPED` | Fact: "may improve by 10%" → Claim: "improves by 10%" |
| `QUALIFIER_ADDED` | Fact: "improves by 10%" → Claim: "only improves by 10%" |
| `SCOPE_CHANGED` | Fact: "in Python 3.12" → Claim: "in all Python versions" |
| `ENTITY_SUBSTITUTED` | Fact: "PostgreSQL" → Claim: "MySQL" |
| `SEMANTIC_DRIFT` | Embedding similarity drops below threshold |
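The `NUMBER_CHANGED` check, for example, reduces to comparing the numeric tokens in a grounded claim against those in its source fact. This is an illustrative sketch; the function name and regex are assumptions, and the real detector covers more distortion types.

```python
import re

NUM = re.compile(r"\d+(?:\.\d+)?%?")  # integers, decimals, percentages

def detect_number_change(fact: str, claim: str) -> list[str]:
    """Return numbers that appear in the claim but not in the source fact."""
    changed = set(NUM.findall(claim)) - set(NUM.findall(fact))
    return sorted(changed)
```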
Fabrication Detection¶
Catches invented entities not present in any source fact:
- Extracts: percentages, numbers, dates, proper nouns, citations
- Cross-references against ALL facts using word-boundary regex
- Any entity with zero matches → fabrication alert
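The word-boundary cross-reference can be sketched like this. The entity regex here is a crude stand-in (numbers plus capitalised tokens); CRP's extractor covers more entity classes.

```python
import re

def extract_entities(text: str) -> set[str]:
    """Crude stand-in: numbers and capitalised tokens as candidate entities."""
    return set(re.findall(r"\d+(?:\.\d+)?|\b[A-Z][a-zA-Z0-9]+\b", text))

def fabricated_entities(claim: str, facts: list[str]) -> set[str]:
    """Entities in the claim with zero word-boundary matches in any fact."""
    corpus = " ".join(facts)
    return {e for e in extract_entities(claim)
            if not re.search(rf"\b{re.escape(e)}\b", corpus)}
```

Word-boundary matching matters here: it stops "SQL" inside "PostgreSQL" from accidentally vouching for a claim about a different database.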
Contradiction Detection¶
Three signals for self-contradictions:
- Negation conflicts — "X is fast" then later "X is not fast"
- Number conflicts — "costs $10" then later "costs $50"
- Semantic conflicts — Embedding similarity on opposing claims
Checks both intra-window and cross-window contradictions.
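The first two signals can be sketched with pure string processing. These are illustrative helpers with assumed names and word lists, not CRP's implementation; the third signal (semantic conflicts) would need an embedding model.

```python
import re

NEGATORS = re.compile(r"\b(not|never|cannot)\b", re.I)
NUM = re.compile(r"\$?\d+(?:\.\d+)?")

def negation_conflict(a: str, b: str) -> bool:
    """Same statement with negation polarity flipped, e.g.
    'X is fast' vs 'X is not fast'."""
    strip = lambda s: NEGATORS.sub("", s.lower()).split()
    polarity = lambda s: bool(NEGATORS.search(s))
    return strip(a) == strip(b) and polarity(a) != polarity(b)

def number_conflict(a: str, b: str) -> bool:
    """Same sentence shape but different figures, e.g.
    'costs $10' vs 'costs $50'."""
    norm = lambda s: NUM.sub("<N>", s.lower())
    return norm(a) == norm(b) and NUM.findall(a) != NUM.findall(b)
```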
Omission Analysis¶
Detects important facts the model silently ignored:
- Ranks each fact by relevance to the task
- Facts with high relevance and zero coverage → omission
- Severity levels: `CRITICAL` / `HIGH` / `MEDIUM` / `LOW` (quartile-based)
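Quartile-based severity means an omitted fact's severity follows from where it sits in the relevance ranking. A minimal sketch, with an assumed function name:

```python
def omission_severity(rank: int, total: int) -> str:
    """Severity for an omitted fact ranked by task relevance
    (rank 0 = most relevant of `total` facts)."""
    quartile = rank * 4 // total  # 0..3
    return ["CRITICAL", "HIGH", "MEDIUM", "LOW"][min(quartile, 3)]
```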
Stage 4: Entailment Verification¶
ML-powered Natural Language Inference using a cross-encoder model:
```text
Premise (fact):     "TLS 1.3 reduces handshake to 1-RTT"
Hypothesis (claim): "TLS 1.3 greatly improves handshake speed"
Result:             ENTAILED ✓
```
Three outcomes:
| Result | Meaning |
|---|---|
| `ENTAILED` | Claim logically follows from fact |
| `NEUTRAL` | Claim is unrelated to fact |
| `CONTRADICTION` | Claim opposes the fact |
Catches meaning-level drift that regex can't detect:
- Specificity loss ("1-RTT" → "improved")
- Causation inflation ("correlates with" → "causes")
- Scope generalisation ("in Python" → "in all languages")
Graceful degradation: falls back to heuristic when the ML model is unavailable.
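The heuristic fallback might look something like the sketch below: use lexical overlap to decide whether two sentences are even about the same thing, then check negation polarity. This is an assumed implementation for illustration, far weaker than a real NLI cross-encoder.

```python
import re

NEG = re.compile(r"\b(not|never|no)\b", re.I)

def heuristic_entailment(premise: str, hypothesis: str) -> str:
    """Crude fallback used when the cross-encoder NLI model is unavailable."""
    p_words = set(re.findall(r"\w+", premise.lower()))
    h_words = set(re.findall(r"\w+", hypothesis.lower()))
    overlap = len(p_words & h_words) / len(h_words) if h_words else 0.0
    if overlap >= 0.5:
        # Topically related; flipped negation suggests contradiction.
        if bool(NEG.search(premise)) != bool(NEG.search(hypothesis)):
            return "CONTRADICTION"
        return "ENTAILED"
    return "NEUTRAL"
```

Note what this fallback misses: it cannot catch specificity loss or causation inflation, which is exactly why the ML model is preferred when available.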
Stage 5: Hallucination Risk Scoring¶
Fuses all signals into ONE risk score per claim:
| Signal | Weight |
|---|---|
| Attribution | 0.30 |
| Fidelity | 0.25 |
| Entailment | 0.30 |
| Specificity | 0.15 |
Risk levels:
| Level | Score Range | Meaning |
|---|---|---|
| `LOW` | < 0.25 | Well-grounded claim |
| `MEDIUM` | 0.25 – 0.50 | Some support, verify manually |
| `HIGH` | 0.50 – 0.75 | Weak grounding, likely parametric |
| `CRITICAL` | ≥ 0.75 | Probable hallucination |
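The fusion step is a straightforward weighted sum followed by bucketing. The weights and thresholds below come from the tables above; the function name and the convention that each input signal is already a risk in [0, 1] are assumptions.

```python
WEIGHTS = {"attribution": 0.30, "fidelity": 0.25,
           "entailment": 0.30, "specificity": 0.15}

def hallucination_risk(signals: dict[str, float]) -> tuple[float, str]:
    """Fuse per-signal risks (each in [0, 1], higher = worse grounding)
    into one score and a risk level."""
    score = sum(WEIGHTS[k] * signals[k] for k in WEIGHTS)
    if score < 0.25:
        level = "LOW"
    elif score < 0.50:
        level = "MEDIUM"
    elif score < 0.75:
        level = "HIGH"
    else:
        level = "CRITICAL"
    return score, level
```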
Stage 6: Report Generation¶
Two output formats:
Markdown Report¶
Structured for human review, follows EU AI Act Article 12 format:
- Attribution table with grounding percentages
- Fidelity verification results
- Entailment analysis
- Hallucination risk assessment per claim
- Omissions list with severity
JSON Report¶
Machine-readable for CI/CD pipelines:
```python
report = engine.analyse(output_text, facts, task_intent)

# Access programmatically
print(report.grounding_percentage)   # 0.87
print(report.hallucination_count)    # 2
print(report.critical_fabrications)  # []
print(report.omitted_facts)          # [Fact(...), ...]
```
Provenance Chain¶
Every claim carries a full provenance chain.
This allows tracing any claim back to its origin — which window produced it, what facts were in the envelope at the time, and what the original task was.
Integration with Quality Tiers¶
Hallucination risk feeds directly into Quality Tiers:
| Tier | Max Hallucination Risk |
|---|---|
| S | < 0.10 |
| A | < 0.25 |
| B | < 0.50 |
| C | < 0.75 |
| D | ≥ 0.75 |
EU AI Act Compliance¶
The DPE was designed with EU AI Act Article 12 in mind:
- Full traceability from output to source
- Automated risk scoring
- Human-readable audit reports
- Machine-readable JSON for regulatory submission
- Omission detection (what the model chose to ignore)
See EU AI Act Compliance for full regulatory details.