# PersonaPersistBench
Evaluation framework scoring identity persistence across turns and models.
## Overview
No standard benchmark measures whether a persona holds across turns, models, or adversarial pressure. CharacterBench (AAAI 2025) evaluates character customization across 11 dimensions but assumes a single model and does not test identity persistence through model swaps. Existing commercial metrics (CSAT, deflection rate, resolution rate) measure outcomes, not coherence.
PersonaPersistBench closes this gap. Five metrics run inside the decision loop, scored every turn.
## Metrics

- **ICS (Identity Coherence):** embedding cosine similarity between response and persona definition. Measures whether the AI stayed in character.
- **VCS (Voice Consistency):** signature phrase presence, forbidden pattern absence. Measures whether the AI sounds like itself.
- **MCS (Memory Continuity):** fact extraction and downstream recall verification. Measures whether the AI remembers what it learned.
- **CFS (Context Fidelity):** gap between stored and applied context. Catches "knows but ignores."
- **DR (Drift Rate):** EWMA of ICS across turns. Separates noise from systematic identity decline.
### Identity Coherence (ICS)
Every RICE definition is pre-computed into a high-dimensional embedding. Every model response is embedded and scored against that vector in real time.
| Range | Interpretation |
|---|---|
| > 0.85 | Strong alignment |
| 0.70-0.85 | Acceptable, minor drift |
| < 0.70 | Triggers regeneration |
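A minimal sketch of per-turn ICS scoring, assuming an embedding function is available elsewhere (the embedding model itself is an implementation detail; `ics_verdict` is a hypothetical helper name):

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def ics_verdict(score):
    # Map an ICS score onto the threshold table above.
    if score > 0.85:
        return "strong"
    if score >= 0.70:
        return "acceptable"
    return "regenerate"
```

In practice the persona embedding is computed once and cached, so the per-turn cost is one response embedding plus one dot product.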
#### Automatic recovery
When ICS drops below 0.70, the system automatically regenerates the response with reinforced persona context. The user never sees the failed response.
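The recovery loop can be sketched as follows. The `generate` and `score_ics` callables and the retry cap are assumptions for illustration, not part of the benchmark definition:

```python
def respond_with_recovery(generate, score_ics, max_retries=2):
    # generate(reinforce: bool) -> str produces a candidate response;
    # score_ics(text) -> float scores it. Both are hypothetical
    # signatures standing in for the real pipeline.
    response = generate(reinforce=False)
    for _ in range(max_retries):
        if score_ics(response) >= 0.70:
            return response
        # Below threshold: regenerate with reinforced persona context.
        # The failed draft is discarded and never shown to the user.
        response = generate(reinforce=True)
    return response
```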
### Voice Consistency (VCS)
Pattern matching runs against the RICE Communication layer every turn:
| Match Type | Score Impact |
|---|---|
| Signature phrase present | +0.1 |
| Forbidden pattern detected | -0.2 |
| Stylistic rule compliance | +0.05 |
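A sketch of how the table's adjustments could compose into a single score. The neutral baseline of 0.5 and the clamp to [0, 1] are assumptions the table does not specify:

```python
import re

def vcs_score(text, signatures, forbidden, style_rules):
    # Apply the per-match adjustments from the table above.
    score = 0.5  # assumed neutral baseline
    lower = text.lower()
    for phrase in signatures:
        if phrase.lower() in lower:
            score += 0.1   # signature phrase present
    for pattern in forbidden:
        if re.search(pattern, text, re.IGNORECASE):
            score -= 0.2   # forbidden pattern detected
    for rule in style_rules:  # each rule: callable text -> bool
        if rule(text):
            score += 0.05  # stylistic rule compliance
    return max(0.0, min(1.0, score))
```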
### Memory Continuity (MCS)
Facts are extracted from each turn and stored. MCS verifies downstream recall: did the system use a fact the user shared three turns ago? Binary per fact, aggregated per session.
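The binary-per-fact, aggregate-per-session scheme can be sketched like this. Substring matching stands in for the real recall-verification step, which would be semantic:

```python
def mcs_score(stored_facts, later_responses):
    # Binary recall per fact: did any later response use it?
    # Session score is the fraction of stored facts recalled.
    if not stored_facts:
        return 1.0  # assumed convention: nothing to recall
    recalled = sum(
        1 for fact in stored_facts
        if any(fact.lower() in r.lower() for r in later_responses)
    )
    return recalled / len(stored_facts)
```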
### Context Fidelity (CFS)
Measures the gap between stored and applied context. Catches the case where the system injects context but the model ignores it: "knows but doesn't use."
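A sketch of the stored-versus-applied comparison. As with MCS, substring matching here is a stand-in for semantic matching:

```python
def cfs_score(injected_facts, response):
    # Fraction of injected context the model actually applied.
    # A low score with successful injection is the
    # "knows but doesn't use" failure mode.
    if not injected_facts:
        return 1.0  # assumed convention: nothing was injected
    used = sum(
        1 for fact in injected_facts
        if fact.lower() in response.lower()
    )
    return used / len(injected_facts)
```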
### Drift Rate (DR)
Uses Exponentially Weighted Moving Average (EWMA) to separate noise from systematic decline. This catches gradual persona erosion that single-turn scoring would miss.
| Range | Interpretation |
|---|---|
| < 0.10 | Stable identity |
| 0.10-0.15 | Minor drift, acceptable |
| > 0.15 | Systematic decline, intervention needed |
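One plausible reading of the table, sketched below: DR as the EWMA of per-turn drift (1 − ICS), so that high, stable ICS yields a low DR. The smoothing factor `alpha` is an assumed value; the document does not specify one:

```python
def drift_rate(ics_scores, alpha=0.3):
    # EWMA over per-turn drift (1 - ICS). Higher alpha weights
    # recent turns more heavily; alpha=0.3 is an assumption.
    ewma = 0.0
    for s in ics_scores:
        ewma = alpha * (1.0 - s) + (1.0 - alpha) * ewma
    return ewma
```

Because the average is exponentially weighted, a single noisy low-ICS turn decays out of DR quickly, while a sustained decline accumulates past the 0.15 intervention threshold.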
## Success Criteria
| Condition | Threshold |
|---|---|
| ICS across all model switches | > 0.70 |
| VCS regardless of underlying model | Consistent |
| DR across full conversation | < 0.15 |
| Adversarial resistance | Robust to prompt injection |
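The two quantitative criteria can be checked directly; VCS consistency and adversarial robustness need their own harnesses. A minimal sketch, with `meets_quantitative_criteria` and the pre-computed `drift` input as hypothetical names:

```python
def meets_quantitative_criteria(ics_scores, drift):
    # ICS must stay above 0.70 at every turn (including model
    # switches) and DR must stay below 0.15 for the session.
    return min(ics_scores) > 0.70 and drift < 0.15
```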
## Test Scenarios
PersonaPersistBench evaluates across three dimensions:
- Single session: sustained identity over 20+ turns
- Multi-turn with model switches: multiple LLMs within one conversation
- Adversarial pressure: character break attempts, prompt injection, role confusion
## References
- Li, K. et al. (2024). Measuring and Controlling Instruction (In)Stability in Language Model Dialogs. COLM 2024.
- Choi, J. et al. (2024). Examining Identity Drift in Conversations of LLM Agents.
- Zhou, J. et al. (2025). CharacterBench: Benchmarking Character Customization of Large Language Models. AAAI 2025.