Architecture
Decision loop, measurement layer, and product primitives.
Decision Architecture
A closed loop runs once per user message through a five-state conversation machine. The state machine runs independently of the LLM. EQ assessment opens each pass. Measurement closes it. Scripts adjust based on scores before the next message arrives.
EQ Assessment
Evaluates emotional state and communication style of the incoming message.
Intent Detection
Determines what the user wants, filtered through how they feel.
Listening Layer
Extracts explicit facts and implicit signals (frustration, trust, confusion).
State Machine
Routes through INIT, GREETED, ENGAGED, CONTEXT_RICH, or RESOLVED states.
Persona + LLM
Applies the RICE definition for model-agnostic generation.
Measurement
Scores ICS, VCS, MCS, CFS, and DR. Feeds back into the next loop.
Each state gates available scripts and constrains persona behavior. Transitions fire on signals from the listening layer (fact density, trust indicators, resolution markers), not turn count. Every transition is logged with triggering evidence.
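A minimal sketch of signal-driven transitions with evidence logging. The five state names come from the list above; the `Signals` fields, the numeric thresholds, and the script names are assumptions made for illustration, not the shipped values.

```python
from dataclasses import asdict, dataclass, field
from enum import Enum
from typing import Dict, List


class State(Enum):
    INIT = "INIT"
    GREETED = "GREETED"
    ENGAGED = "ENGAGED"
    CONTEXT_RICH = "CONTEXT_RICH"
    RESOLVED = "RESOLVED"


@dataclass
class Signals:
    """Hypothetical listening-layer output for one turn (scales assumed 0..1)."""
    greeted: bool = False
    fact_density: float = 0.0
    trust: float = 0.0
    resolution_marker: bool = False


# Assumed script names; each state gates what the persona may run.
SCRIPTS: Dict[State, List[str]] = {
    State.INIT: ["greeting"],
    State.GREETED: ["discovery"],
    State.ENGAGED: ["discovery", "clarify"],
    State.CONTEXT_RICH: ["recommend", "clarify"],
    State.RESOLVED: ["wrap_up"],
}


@dataclass
class ConversationMachine:
    state: State = State.INIT
    log: List[dict] = field(default_factory=list)

    def _next_state(self, s: Signals) -> State:
        # Thresholds are illustrative assumptions, not documented values.
        if self.state is State.INIT and s.greeted:
            return State.GREETED
        if self.state is State.GREETED and s.fact_density >= 0.2:
            return State.ENGAGED
        if self.state in (State.ENGAGED, State.CONTEXT_RICH) and s.resolution_marker:
            return State.RESOLVED
        if self.state is State.ENGAGED and s.fact_density >= 0.6 and s.trust >= 0.5:
            return State.CONTEXT_RICH
        return self.state

    def step(self, signals: Signals) -> State:
        nxt = self._next_state(signals)
        if nxt is not self.state:
            # Log every transition with the evidence that triggered it.
            self.log.append(
                {"from": self.state.name, "to": nxt.name, "evidence": asdict(signals)}
            )
            self.state = nxt
        return self.state

    def available_scripts(self) -> List[str]:
        return SCRIPTS[self.state]
```

Because `step` looks only at signals, a turn that carries no new evidence leaves the state unchanged, however many turns have passed.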
Measurement
No standard benchmark measures whether a persona holds across turns, models, or adversarial pressure. We built PersonaPersistBench to close this gap. Five metrics run inside the loop, scored every turn.
| Metric | Function |
|---|---|
| ICS Identity Coherence | Embedding similarity against persona definition. Below 0.70 triggers regeneration. |
| VCS Voice Consistency | Signature phrase presence, forbidden pattern absence. |
| MCS Memory Continuity | Facts from prior sessions applied downstream. Binary per fact. |
| CFS Context Fidelity | Gap between stored and applied context. Catches "knows but ignores." |
| DR Drift Rate | EWMA across turns. Separates noise from systematic decline. |
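
One way to read the DR row is as an exponentially weighted moving average of turn-to-turn score changes. The sketch below assumes the input is a per-turn score such as ICS and uses a smoothing factor of 0.3; both choices, and the function name `ewma_drift`, are illustrative assumptions rather than the documented formula.

```python
from typing import Sequence


def ewma_drift(scores: Sequence[float], alpha: float = 0.3) -> float:
    """EWMA of turn-to-turn score changes; negative values mean decline.

    alpha is an assumed smoothing factor: higher reacts faster to recent turns.
    """
    drift = 0.0
    for prev, cur in zip(scores, scores[1:]):
        drift = alpha * (cur - prev) + (1 - alpha) * drift
    return drift


# Alternating scores: the up and down moves largely cancel.
noisy = [0.90, 0.80, 0.90, 0.80, 0.90]
# Steady loss every turn: the moves accumulate in one direction.
declining = [0.90, 0.85, 0.80, 0.75, 0.70]
```

Alternating scores keep the value near zero, while a steady decline pushes it persistently negative, which is the noise-versus-decline separation the table describes.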
Product Primitives
Personas SDK. A persona is a deployable unit: RICE definition, decision architecture, measurement loop, and context injection. The full end-to-end system ships with the persona.
How it validates. Every RICE definition gets pre-computed into a high-dimensional embedding. Every model response gets embedded and scored against that vector in real time. Cosine similarity above 0.70 passes. Below 0.70 triggers regeneration.
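The scoring gate can be sketched in plain Python. The 0.70 threshold comes from the text; the `generate` and `embed` callables, the retry limit, and the choice to treat a score of exactly 0.70 as a pass are assumptions for illustration.

```python
import math
from typing import Callable, Sequence, Tuple

ICS_THRESHOLD = 0.70  # threshold stated in the text


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector carries no direction to compare
    return dot / (norm_a * norm_b)


def passes(persona_vec: Sequence[float], response_vec: Sequence[float]) -> bool:
    # Assumption: a score of exactly 0.70 counts as a pass.
    return cosine_similarity(persona_vec, response_vec) >= ICS_THRESHOLD


def generate_with_gate(
    generate: Callable[[], str],              # hypothetical: returns one model response
    embed: Callable[[str], Sequence[float]],  # hypothetical: text to vector
    persona_vec: Sequence[float],
    max_attempts: int = 3,                    # assumed retry limit
) -> Tuple[str, float]:
    """Regenerate until a response clears the gate; return the best one seen."""
    best_text, best_score = "", -1.0
    for _ in range(max_attempts):
        text = generate()
        score = cosine_similarity(persona_vec, embed(text))
        if score > best_score:
            best_text, best_score = text, score
        if score >= ICS_THRESHOLD:
            break
    return best_text, best_score
```

The persona vector is computed once and reused, so the per-turn cost is one embedding call and one dot product per candidate response.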
Model switch. Swap models mid-conversation. The persona layer rebuilds context each turn. The incoming model never knows it replaced another. ICS holds above 0.70 at every swap.
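A rough sketch of why a swap is invisible to the incoming model: the session owns the persona definition and history, and rebuilds the full prompt on every turn. `PersonaSession`, the prompt layout, and the `Model` callable type are hypothetical names introduced here, not the SDK's API.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

Model = Callable[[str], str]  # hypothetical: prompt in, completion out


@dataclass
class PersonaSession:
    persona_definition: str
    model: Model
    history: List[Tuple[str, str]] = field(default_factory=list)

    def build_context(self, user_message: str) -> str:
        # Rebuilt from scratch each turn, so no state lives inside the model.
        lines = [self.persona_definition]
        for user, reply in self.history:
            lines.append(f"User: {user}")
            lines.append(f"Persona: {reply}")
        lines.append(f"User: {user_message}")
        return "\n".join(lines)

    def turn(self, user_message: str) -> str:
        reply = self.model(self.build_context(user_message))
        self.history.append((user_message, reply))
        return reply

    def swap_model(self, new_model: Model) -> None:
        # Only the callable changes; persona and history stay with the session.
        self.model = new_model
```

After `swap_model`, the next call to `turn` hands the new model the same persona definition and the complete history, including replies produced by the previous model.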
Not fine-tuning
Fine-tuning costs thousands per persona, freezes behavior at training time, locks to one model. RICE definitions adjust at runtime, run on any model, iterate in minutes. Not RAG: RAG gives retrieval, not voice. Not vanilla prompts: no structure, no scoring, no state.
Context Engineering
The context injection system filters context through a four-layer pipeline:
Provider activation
Selects which context providers fire based on detected intent.
Relevance scoring
A lightweight model scores relevance, reducing raw context to a focused signal.
Sanitization
Strips injection patterns and code blocks, and clamps oversized blocks.
Focus compression
Summarizes to a single targeting directive for the current turn.
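The four layers above can be sketched as four small functions run in order. Everything concrete here is an assumption: word overlap stands in for the lightweight relevance model, one regular expression stands in for injection detection, and string joining stands in for summarization.

```python
import re
from typing import Callable, Dict, List

FENCE = "`" * 3  # built at runtime so this example does not contain a literal fence
CODE_BLOCK = re.compile(re.escape(FENCE) + r".*?" + re.escape(FENCE), re.DOTALL)
# Assumed single pattern; a real filter would cover many more phrasings.
INJECTION = re.compile(r"ignore (all )?(previous|prior) instructions[^.]*\.?", re.IGNORECASE)


def activate_providers(intent: str, activation: Dict[str, List[str]],
                       providers: Dict[str, Callable[[], str]]) -> List[str]:
    """Layer 1: fire only the providers mapped to the detected intent."""
    return [providers[name]() for name in activation.get(intent, []) if name in providers]


def score_relevance(chunks: List[str], query: str, keep: int = 2) -> List[str]:
    """Layer 2: word overlap as a stand-in for a lightweight scoring model."""
    query_words = set(query.lower().split())
    ranked = sorted(chunks, key=lambda c: len(query_words & set(c.lower().split())),
                    reverse=True)
    return ranked[:keep]


def sanitize(chunk: str, max_chars: int = 200) -> str:
    """Layer 3: strip code blocks and injection patterns, then clamp length."""
    chunk = CODE_BLOCK.sub("", chunk)
    chunk = INJECTION.sub("", chunk)
    chunk = " ".join(chunk.split())  # collapse whitespace left by removals
    return chunk[:max_chars]


def compress(chunks: List[str], query: str) -> str:
    """Layer 4: collapse to one directive line (joining stands in for summarizing)."""
    facts = " | ".join(c for c in chunks if c)
    return f"Focus on: {query}. Use: {facts}"


def build_directive(intent: str, query: str, activation: Dict[str, List[str]],
                    providers: Dict[str, Callable[[], str]]) -> str:
    raw = activate_providers(intent, activation, providers)
    relevant = score_relevance(raw, query)
    clean = [sanitize(c) for c in relevant]
    return compress(clean, query)
```

Sanitization runs after relevance scoring, following the order listed above, so only the chunks that survive scoring pay the cost of cleaning.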
Context packs are the unit of composability. Each declares its providers, access rules, capabilities, and cache policies. A healthcare persona composes RICE + patient chart + medications + lab results + clinical guidelines.
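As an illustration of composability, a context pack can be modeled as a small declarative record that merges with others. The field names, the `compose` helper, and the merge rules (deduplicated lists, shortest cache lifetime wins) are assumptions for this sketch, not the actual pack schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class ContextPack:
    name: str
    providers: List[str]
    access_rules: List[str] = field(default_factory=list)
    capabilities: List[str] = field(default_factory=list)
    cache_ttl_seconds: Dict[str, int] = field(default_factory=dict)


def _merge_unique(lists: List[List[str]]) -> List[str]:
    """Concatenate while dropping duplicates and keeping first-seen order."""
    seen: Dict[str, None] = {}
    for items in lists:
        for item in items:
            seen.setdefault(item, None)
    return list(seen)


def compose(name: str, *packs: ContextPack) -> ContextPack:
    ttl: Dict[str, int] = {}
    for pack in packs:
        for provider, seconds in pack.cache_ttl_seconds.items():
            # Assumed rule: when packs disagree, the shorter lifetime wins.
            ttl[provider] = min(ttl.get(provider, seconds), seconds)
    return ContextPack(
        name=name,
        providers=_merge_unique([p.providers for p in packs]),
        access_rules=_merge_unique([p.access_rules for p in packs]),
        capabilities=_merge_unique([p.capabilities for p in packs]),
        cache_ttl_seconds=ttl,
    )


# Hypothetical packs mirroring the healthcare example in the text.
healthcare = compose(
    "healthcare",
    ContextPack("rice", providers=["rice_definition"]),
    ContextPack("patient_chart", providers=["chart"], access_rules=["clinician_only"],
                cache_ttl_seconds={"chart": 300}),
    ContextPack("medications", providers=["medications"], access_rules=["clinician_only"]),
    ContextPack("lab_results", providers=["labs"], cache_ttl_seconds={"labs": 60}),
    ContextPack("clinical_guidelines", providers=["guidelines"], capabilities=["cite_source"]),
)
```

Keeping packs declarative means a persona's context surface can be inspected and audited without running any provider.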