# Context Engineering
A context-injection system that assembles, filters, and compresses context on every turn.
## Overview
LLMs are stateless. Attention decays over position. Identity drifts within 8 turns. The context engineering system solves this by treating context as environment, not memory.
The system does not rely on the model remembering. It injects what it knows, scores whether the response used it, and adjusts for the next turn.
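The inject → score → adjust loop can be sketched as follows. This is a minimal illustration, not the real implementation: `ContextStore`, the weight values, and the substring-based usage check are all assumptions standing in for the actual components.

```python
# Hypothetical sketch of the inject -> score -> adjust turn loop.
class ContextStore:
    def __init__(self, facts):
        self.facts = facts  # fact text -> injection weight

    def assemble(self):
        # Inject what the system knows; the model never "remembers" on its own.
        return [f for f, w in sorted(self.facts.items(), key=lambda kv: -kv[1])]

    def adjust(self, used):
        # Boost facts the response actually used, decay the rest (factors assumed).
        for f in self.facts:
            self.facts[f] *= 1.2 if f in used else 0.9

def run_turn(store, generate):
    injected = store.assemble()                  # inject
    response = generate(injected)
    used = {f for f in injected if f.lower() in response.lower()}  # score usage
    store.adjust(used)                           # adjust for the next turn
    return response

store = ContextStore({"user prefers metric units": 1.0, "order #1234 shipped": 1.0})
reply = run_turn(store, lambda ctx: "Noted: user prefers metric units.")
```

After one turn, the fact the response used outweighs the one it ignored, so the next assembly ranks it first.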
## Working Memory
Context is organized into typed, immutable regions:
| Region | Contents |
|---|---|
| Core | System instructions, constraints |
| Persona | RICE definition, voice patterns |
| Context | User profile, session state |
| History | Conversation turns (compressed) |
| Reasoning | Internal monologue, planning |
| Web context | Search results (when triggered) |
| Capsule context | Domain-specific knowledge packs |
| Summary | Compressed prior conversation |
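One way to model typed, immutable regions is a frozen dataclass keyed by a region enum, assembled in a fixed order. The names and ordering here are assumptions for illustration.

```python
# Sketch: typed, immutable context regions (names and order assumed).
from dataclasses import dataclass
from enum import Enum

class Region(Enum):
    CORE = 0
    PERSONA = 1
    CONTEXT = 2
    HISTORY = 3
    REASONING = 4
    WEB_CONTEXT = 5
    CAPSULE_CONTEXT = 6
    SUMMARY = 7

@dataclass(frozen=True)  # frozen=True makes each block immutable once built
class ContextBlock:
    region: Region
    content: str

def assemble(blocks):
    # Emit regions in a fixed order, Core first.
    return "\n\n".join(
        b.content for b in sorted(blocks, key=lambda b: b.region.value)
    )
```

Immutability means a provider cannot rewrite another region's contents mid-turn; each turn rebuilds the regions from scratch.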
## Four-Layer Filtering Pipeline
Raw context is noisy. The filtering pipeline reduces it to signal:
1. **Provider Activation**: selects which context providers fire based on detected intent. Not every turn needs every context source.
2. **Remote Relevance Scoring**: a lightweight model scores each block's relevance, reducing raw context to a focused signal for the current turn.
3. **Sanitization**: strips injection patterns and suspicious code blocks, and clamps any block exceeding size limits.
4. **Focus Compression**: compresses the filtered context into a single targeting directive: what the model should focus on for this specific turn.
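The four layers above can be sketched as a chain of functions. Everything here is assumed for illustration: the threshold, the size limit, the injection pattern, and the lexical overlap scorer standing in for the lightweight relevance model.

```python
# Sketch of the four-layer filtering pipeline (all parameters assumed).
import re

MAX_BLOCK_CHARS = 2000  # assumed size limit

def activate_providers(providers, intent):
    # Layer 1: only providers registered for the detected intent fire.
    return [p for p in providers if intent in p["intents"]]

def score_relevance(blocks, query):
    # Layer 2: naive lexical overlap as a stand-in for the lightweight model.
    q = set(query.lower().split())
    return [(b, len(q & set(b.lower().split())) / max(len(q), 1)) for b in blocks]

def sanitize(block):
    # Layer 3: strip an example injection pattern, clamp oversized blocks.
    block = re.sub(r"(?i)ignore (all )?previous instructions", "", block)
    return block[:MAX_BLOCK_CHARS]

def compress_focus(scored, threshold=0.3):
    # Layer 4: reduce surviving blocks to one targeting directive.
    kept = [sanitize(b) for b, s in scored if s >= threshold]
    return "Focus on: " + "; ".join(kept) if kept else "No extra context this turn."
```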
## Cognitive Pipeline
Each turn runs through a six-step reasoning pipeline:
1. **Detect Intent**: identifies intent, topics, and confidence level.
2. **Internal Monologue**: surfaces assumptions, questions, and risks before responding.
3. **Plan Context Retrieval**: determines which providers to query and what to look for.
4. **Plan Response**: sets tone, structure, and key points.
5. **Generate Response**: produces the final output using persona plus context.
6. **Summarize**: compresses conversation history when it exceeds capacity.
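The six steps above chain naturally as a pipeline of functions over a shared turn state. The stubs below are placeholders for the real components, and the history-capacity threshold is an assumption.

```python
# Sketch: the six-step cognitive pipeline as a function chain (stubs throughout).
def detect_intent(state):      state["intent"] = "order_status"; return state
def internal_monologue(state): state["risks"] = ["order id unverified"]; return state
def plan_retrieval(state):     state["providers"] = ["orders_api"]; return state
def plan_response(state):      state["tone"] = "concise"; return state
def generate(state):
    state["response"] = f"[{state['tone']}] intent={state['intent']}"
    return state
def summarize(state):
    # Compress history when it exceeds capacity (threshold of 10 is assumed).
    if len(state.get("history", [])) > 10:
        state["history"] = state["history"][-5:]
    return state

PIPELINE = [detect_intent, internal_monologue, plan_retrieval,
            plan_response, generate, summarize]

def run(state):
    for step in PIPELINE:
        state = step(state)
    return state
```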
## Context Packs
A context pack is the unit of composability. Each pack declares:
- **Providers**: where the data comes from (APIs, knowledge bases, search indexes).
- **Access Rules**: who can use it, under what conditions, with what permissions.
- **Capabilities**: what it enables the persona to do (answer medical questions, check order status, recommend products).
- **Cache Policies**: TTL, invalidation triggers, freshness requirements.
Vertical specialization is composition. A healthcare persona is: RICE definition + patient chart pack + medications pack + lab results pack + clinical guidelines pack.
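A pack declaration and the composition idea can be sketched with a dataclass. The schema (field names, the 300-second TTL default, the example packs) is assumed, not the real format.

```python
# Sketch: a context pack declaration and persona-by-composition (schema assumed).
from dataclasses import dataclass

@dataclass
class ContextPack:
    name: str
    providers: list        # where the data comes from
    access_roles: set      # who may use it
    capabilities: list     # what it lets the persona do
    cache_ttl_s: int = 300 # freshness requirement (default assumed)

# Vertical specialization is composition: the persona is RICE + a list of packs.
healthcare_packs = [
    ContextPack("patient_chart", ["ehr_api"], {"clinician"},
                ["answer chart questions"]),
    ContextPack("medications", ["formulary_db"], {"clinician"},
                ["check interactions"]),
]

def capabilities(packs):
    # The persona's capability surface is the union of its packs' capabilities.
    return [c for p in packs for c in p.capabilities]
```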
## Scoring Context Usage
Context Fidelity Score (CFS) measures the gap between stored and applied context. It catches the case where the system injects context but the model ignores it: "knows but doesn't use."
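A toy version of CFS: the fraction of injected facts the response actually reflects. The naive substring match here is an assumption standing in for whatever matching the real scorer uses.

```python
# Toy Context Fidelity Score: injected facts that appear in the response,
# over all injected facts (substring matching is a simplifying assumption).
def context_fidelity_score(injected_facts, response):
    if not injected_facts:
        return 1.0  # nothing was injected, so nothing could be ignored
    used = sum(1 for f in injected_facts if f.lower() in response.lower())
    return used / len(injected_facts)
```

A low score flags the "knows but doesn't use" failure: the pipeline delivered the fact, but the generated response did not apply it.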
## Prompt Assembly
Six layers compose the final prompt sent to the LLM:
| Layer | Purpose |
|---|---|
| RICE base | Persona definition and constraints |
| Style modifier | Adjustments based on EQ assessment |
| Conversation context | Injected facts, session state |
| Deflection override | Prevents off-topic drift |
| Goal steering | State-appropriate steering prompts |
| Search context | Web or capsule results (when triggered) |
Each layer is independently testable. The assembled prompt is logged for auditability.
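Assembly can be sketched as joining the layers in a fixed order, skipping any layer empty this turn, and logging the result. The layer keys and the join format are assumptions.

```python
# Sketch: six-layer prompt assembly with audit logging (keys assumed).
LAYER_ORDER = ["rice_base", "style_modifier", "conversation_context",
               "deflection_override", "goal_steering", "search_context"]

def assemble_prompt(layers, audit_log):
    # Keep only layers present this turn, in the fixed order above.
    parts = [layers[k] for k in LAYER_ORDER if layers.get(k)]
    prompt = "\n\n".join(parts)
    audit_log.append(prompt)  # assembled prompt is logged for auditability
    return prompt
```

Because each layer is a separate string with a fixed slot, a layer can be unit-tested (or swapped out) without touching the others.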