Meta-Learning¶
CRP includes three meta-learning mechanisms (§19) that help small models punch above their weight — and help all models improve over time.
The Problem¶
Small models (0.5B–4B parameters) struggle with complex tasks:
- They forget instructions mid-generation
- They produce shallow, generic outputs
- They don't break problems into steps
CRP's meta-learning addresses this by providing external scaffolding that compensates for model limitations.
Three Mechanisms¶
```mermaid
graph TD
    A[Task Arrives] --> B{Model Capable?}
    B -->|Yes| C[Standard Dispatch]
    B -->|No| D[ORC: Break into Steps]
    D --> E[Execute Each Step]
    E --> F[Synthesize Result]
    A --> G[ICML: Build Envelope]
    G --> H{Trace Library?}
    H -->|Yes| I[Add Few-Shot Examples]
    H -->|No| J[Add Reasoning Scaffold]
    I --> C
    J --> C
    C --> K[Result]
    K --> L{Quality ≥ 0.7?}
    L -->|Yes| M[RTL: Store Trace]
    L -->|No| N[Discard]
```
ORC: Orchestrated Reasoning Chains¶
ORC decomposes complex tasks into micro-steps when the model can't handle them in one shot.
When ORC Activates¶
All three conditions must be true:
| Condition | Check |
|---|---|
| Resource pressure | Must be < HIGH (won't add overhead under pressure) |
| Model capability < task complexity | Model needs help |
| Probe quality < 0.7 | Initial attempt was poor |
If the model can handle it alone, ORC stays out of the way.
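The three-way gate above can be sketched as a single predicate. This is an illustrative reconstruction, not CRP's actual internals: the `Pressure` enum and `should_activate_orc` name are assumptions; the thresholds (`HIGH` pressure, 0.7 quality) come from the table.

```python
from enum import IntEnum

class Pressure(IntEnum):
    """Hypothetical resource-pressure levels (ordering is what matters)."""
    LOW = 0
    MEDIUM = 1
    HIGH = 2

def should_activate_orc(pressure: Pressure,
                        model_capability: int,
                        task_complexity: int,
                        probe_quality: float) -> bool:
    """All three conditions from the table must hold for ORC to step in."""
    return (
        pressure < Pressure.HIGH                 # won't add overhead under pressure
        and model_capability < task_complexity   # model needs help
        and probe_quality < 0.7                  # initial attempt was poor
    )
```

Because every condition must hold, a capable model or a decent first attempt short-circuits ORC entirely.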
How ORC Works¶
1. Decompose — Split the task into 3–10 micro-steps based on task type
2. Execute — Dispatch each step individually (each gets its own envelope)
3. Synthesize — Combine step outputs into a coherent final answer
4. Retry — If a step fails, retry with heavier scaffolding
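The decompose → execute → synthesize loop can be sketched as follows. This is a minimal illustration, assuming a `dispatch_fn(step, scaffold_level)` callable that returns a string (or `None` on failure); CRP's real retry and synthesis logic is richer.

```python
from typing import Callable, List, Optional

def run_orc(steps: List[str],
            dispatch_fn: Callable[[str, int], Optional[str]],
            max_retries: int = 1) -> str:
    """Execute each micro-step, escalating scaffolding on failure,
    then synthesize the step outputs into one answer."""
    outputs = []
    for step in steps:
        scaffold_level = 1                       # start with light scaffolding
        result = dispatch_fn(step, scaffold_level)
        retries = 0
        while result is None and retries < max_retries:
            scaffold_level += 1                  # retry with heavier scaffolding
            result = dispatch_fn(step, scaffold_level)
            retries += 1
        outputs.append(result or "")
    return "\n\n".join(outputs)                  # naive synthesis: concatenate
```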
Scaffold Levels¶
| Level | Name | What It Does |
|---|---|---|
| 0 | None | No scaffolding |
| 1 | Light | Key factors to consider |
| 2 | Medium | Step-by-step template |
| 3 | Heavy | 5-step structural template with explicit output format |
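One way to realize these levels is a lookup that prepends scaffold text to the step prompt. The prompt wording below is invented for illustration (only the heavy 5-step template paraphrases the document); the `SCAFFOLDS` table and `apply_scaffold` helper are assumptions, not CRP's API.

```python
SCAFFOLDS = {
    0: "",                                            # none
    1: "Consider the key factors before answering.",  # light (hypothetical wording)
    2: "Work step by step: plan, then answer.",       # medium (hypothetical wording)
    3: ("Follow this 5-step template:\n"              # heavy
        "1. Identify key concepts\n"
        "2. List relevant facts\n"
        "3. Organize by theme\n"
        "4. Draft response\n"
        "5. Review for completeness\n"
        "Output format: bullet points."),
}

def apply_scaffold(prompt: str, level: int) -> str:
    """Prepend the scaffold for the given level; level 0 leaves the prompt as-is."""
    scaffold = SCAFFOLDS.get(level, "")
    return f"{scaffold}\n\n{prompt}" if scaffold else prompt
```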
Example¶
A 2B model asked to "Compare React, Vue, and Angular":
Without ORC: Produces a shallow 3-paragraph summary missing key dimensions.
With ORC:
- Step 1: "List the core architecture of React" → focused output
- Step 2: "List the core architecture of Vue" → focused output
- Step 3: "List the core architecture of Angular" → focused output
- Step 4: "Compare these architectures on performance, learning curve, ecosystem" → structured comparison
- Synthesis: Combine into coherent comparison article
Each step gets the model's full attention on a manageable subtask.
ICML: In-Context Meta-Learning¶
ICML enriches the dispatch envelope with reasoning scaffolds and few-shot examples from the trace library.
Adaptive Scaffolding¶
The scaffold level adapts to model capability:
| Model Capability | Scaffold | What's Added |
|---|---|---|
| ≤ 1 (0.5B–1B) | Heavy | 5-step template: "1. Identify key concepts, 2. List relevant facts, 3. Organize by theme, 4. Draft response, 5. Review for completeness" |
| ≤ 2 (2B–7B) | Light | Key factors prompt: "Consider these factors: ..." |
| > 2 (7B+) | None | Model doesn't need scaffolding |
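The capability-to-scaffold mapping above is a simple threshold function. A sketch, with the function name assumed:

```python
def select_scaffold(model_capability: int) -> str:
    """Map model capability (0-3) to a scaffold level, per the table above."""
    if model_capability <= 1:
        return "heavy"   # 0.5B-1B: full 5-step template
    if model_capability <= 2:
        return "light"   # 2B-7B: key-factors prompt
    return "none"        # 7B+: model doesn't need scaffolding
```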
Few-Shot Examples¶
If the trace library has similar past tasks with quality ≥ 0.7:
- Retrieve top-3 matching traces (by word overlap)
- Include their step descriptions as examples in the envelope
- Model sees "here's how a similar task was solved before"
Metacognitive Envelope¶
The final envelope combines:
Base Envelope (facts, task, system prompt)
+ Reasoning Scaffold (if model needs it)
+ Few-Shot Traces (if available)
= Metacognitive Envelope
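The composition above can be sketched as a dict merge. Field names (`reasoning_scaffold`, `few_shot_examples`) are illustrative assumptions; only the top-3 cap comes from the document.

```python
from typing import List, Optional

def build_metacognitive_envelope(base: dict,
                                 scaffold: Optional[str],
                                 traces: List[str]) -> dict:
    """Combine the base envelope with an optional scaffold and few-shot traces."""
    envelope = dict(base)                            # base: facts, task, system prompt
    if scaffold:
        envelope["reasoning_scaffold"] = scaffold    # only if the model needs it
    if traces:
        envelope["few_shot_examples"] = traces[:3]   # top-3 matching traces
    return envelope
```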
RTL: Reasoning Template Library¶
RTL stores successful reasoning traces for future reuse.
Storage Criteria¶
A trace is stored only if:
- Quality score ≥ 0.7 (configurable via `rtl_min_quality_for_storage`)
- The trace is complete (all steps executed)
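Both criteria can be checked in one predicate. The function name and completeness representation are assumptions for illustration:

```python
def should_store_trace(quality_score: float,
                       steps_completed: int,
                       steps_total: int,
                       min_quality: float = 0.7) -> bool:
    """Store a trace only if quality clears the bar AND every step executed."""
    return quality_score >= min_quality and steps_completed == steps_total
```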
Trace Structure¶
```python
ReasoningTrace(
    trace_id="abc-123",
    task_type="comparison",
    task_summary="Compare React, Vue, Angular",
    steps=[
        ReasoningStep(
            step_description="Analyze React architecture",
            system_prompt_template="...",
            expected_output_format="bullet_points",
            scaffold_level=1
        ),
        # ...
    ],
    model_class="2B-7B",
    quality_score=0.82,
    usage_count=0
)
```
Retrieval¶
When a new task arrives, RTL retrieves traces by word overlap:
- Tokenize task description
- Compare against stored trace summaries
- Return top-k matches (default: 3)
- Increment `usage_count` on retrieved traces
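A word-overlap retriever following these four steps might look like this. It is a simplified sketch (traces modeled as dicts, case-insensitive whitespace tokenization); CRP's actual matching may differ.

```python
from typing import List

def retrieve_traces(task: str, traces: List[dict], k: int = 3) -> List[dict]:
    """Rank stored traces by shared words with the task, return top-k,
    and bump usage_count on each hit (used later by curation)."""
    task_words = set(task.lower().split())

    def overlap(trace: dict) -> int:
        return len(task_words & set(trace["task_summary"].lower().split()))

    ranked = sorted(traces, key=overlap, reverse=True)[:k]
    for trace in ranked:
        trace["usage_count"] += 1
    return ranked
```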
Curation¶
Every `curation_interval` requests (default: 5), the library prunes:
- Traces with quality below threshold
- Traces that haven't been used
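Pruning on both criteria reduces to a filter. A sketch, assuming the dict-based trace shape used above:

```python
from typing import List

def curate(traces: List[dict], min_quality: float = 0.7) -> List[dict]:
    """Keep only traces that clear the quality bar and have been used."""
    return [t for t in traces
            if t["quality_score"] >= min_quality and t["usage_count"] > 0]
```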
Configuration¶
```python
from crp.advanced.meta_learning import MetaLearningEngine, MetaLearningConfig

config = MetaLearningConfig(
    enabled=True,
    orc_enabled=True,
    orc_max_steps=10,
    orc_min_model_capability=1,
    icml_enabled=True,
    icml_max_examples=3,
    rtl_enabled=True,
    rtl_min_quality_for_storage=0.7,
    scaffold_level="auto",  # "auto" | "none" | "light" | "heavy"
    curation_interval=5
)

engine = MetaLearningEngine(
    dispatch_fn=my_dispatch_function,
    model_capability=2,  # 0=tiny, 1=small, 2=medium, 3=large
    config=config
)
```
Model Capability Levels¶
| Level | Parameter Range | Examples |
|---|---|---|
| 0 | < 0.5B | TinyLlama |
| 1 | 0.5B – 1B | Phi-2, Qwen-0.5B |
| 2 | 2B – 7B | Qwen3-4B, Llama-3.1-8B |
| 3 | 7B+ | Llama-3.1-70B, GPT-4 |
Research Foundations¶
Meta-learning in CRP is grounded in:
- MAML (Finn 2017) — Learning to learn with few examples
- ICL as Implicit Gradient Descent (Dai 2023) — In-context learning IS learning
- STaR (Zelikman 2022) — Self-taught reasoning via bootstrapping
- Distilling Step-by-Step (Hsieh 2023) — 770M model outperforming 540B with scaffolding
The key insight: you don't need a bigger model if you give a smaller model better structure. CRP's meta-learning provides that structure automatically.
See Research Foundations for the full academic basis.