Opinion Coherence
How models fail at holding opinions, and how personas fix those failures.
Overview
We ran 57 evaluation runs across 3 controversial topics and 5 model families to test whether LLMs can hold consistent opinions under conversational pressure. The results reveal three distinct failure modes that all disappear when a structured persona is applied.
Three Failure Modes
Base models fail in predictable, model-specific ways:
| Failure Mode | Behavior |
|---|---|
| Drift | Starts with a position, gradually adopts whatever direction the interviewer pushes. |
| Refusal | Refuses to engage with positions. Stays neutral to the point of being unhelpful. |
| Capture | Takes maximum positions immediately. Swings multiple points depending on interviewer framing. |
Steering vulnerability
One model swung 3.47 points in position score (scored from -3 to +3) on a geopolitical topic, depending on whether the interviewer script was pro or contra. Same model, same topic, and a completely different output based on conversational pressure alone.
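As a rough illustration of how a swing like this could be computed, the sketch below takes the absolute gap between the mean position score under a pro script and under a contra script. The function name `steering_swing` and the sample scores are hypothetical, and the exact formula used in the evaluation is not stated here, so treat this as one plausible reading rather than the actual scoring code.

```python
from statistics import mean


def steering_swing(pro_scores, contra_scores):
    """Absolute gap between the mean position score under a pro script
    and under a contra script (assumed definition, not from the source)."""
    return abs(mean(pro_scores) - mean(contra_scores))


# Hypothetical per-turn position scores on the -3..+3 scale.
pro_run = [2.0, 2.5, 3.0]       # interviewer pushes "pro"
contra_run = [-1.0, -0.5, 0.0]  # interviewer pushes "contra"

swing = steering_swing(pro_run, contra_run)
print(f"swing: {swing:.2f} points")
```

A swing near zero would mean the interviewer framing barely moved the model; the larger the value, the more the script, not the model, determined the answer.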
Persona Fixes All Three
When the same models run with a structured persona, all three failure modes resolve:
| Condition | Base Behavior | With Persona | Change |
|---|---|---|---|
| Model A (polarizing topic) | Captured, maximum position | Independent, moderated | +2.33 shift |
| Model B (polarizing topic) | Captured, maximum position | Moderated response | +1.07 shift |
| Model C (polarizing topic) | Drifting toward interviewer | Stable position | +0.37 shift |
Slope Analysis
The clearest signal is the slope of opinion scores across conversation turns:
- Base models show negative slopes: they drift toward the interviewer's position over time
- Persona models show near-zero or positive slopes: they resist steering and maintain their initial position
This is measurable evidence that persona persistence works under adversarial conversational pressure.
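One way to obtain such a slope is an ordinary least-squares fit of position score against turn index. The sketch below assumes that approach; the function name `turn_slope` and the sample conversations are hypothetical, and the evaluation may have computed its slopes differently.

```python
def turn_slope(scores):
    """Least-squares slope of position score versus turn index (0, 1, 2, ...).

    Assumed method: the source reports slopes but not how they were fitted.
    """
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two turns to fit a slope")
    mean_turn = (n - 1) / 2
    mean_score = sum(scores) / n
    # Covariance of (turn, score) divided by variance of turn.
    cov = sum((t - mean_turn) * (s - mean_score) for t, s in enumerate(scores))
    var = sum((t - mean_turn) ** 2 for t in range(n))
    return cov / var


# Hypothetical position scores per turn on the -3..+3 scale.
base_scores = [2.0, 1.5, 1.0, 0.5]     # position erodes turn by turn
persona_scores = [1.0, 1.0, 1.0, 1.0]  # position held steady

base_slope = turn_slope(base_scores)
persona_slope = turn_slope(persona_scores)
print(base_slope, persona_slope)
```

Under this convention, the eroding conversation yields a negative slope and the steady one yields a slope of zero, which matches the base versus persona pattern described above.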
Directness Metrics
Personas don't just hold positions. They also add a human voice:
| Condition | Base Directness | Persona Directness |
|---|---|---|
| Model A | 2.07 | 2.57 |
| Model B (diffusion) | 0.54 (structured output) | 1.75 (conversational) |
| Model C | 2.73 (captured, not direct) | 1.76 (resistant, measured) |
One diffusion model outputs structured tables by default. With a persona, it outputs first-person conversational text. The structured identity spec changes the output modality entirely.
What This Proves
- Base models are unreliable opinion holders: they drift, refuse, or get captured
- Persona persistence is measurable: slope analysis quantifies resistance to steering
- All three failure modes resolve with structure: RICE definitions don't just constrain, they add coherent voice
- Model-agnostic: the same persona spec works across all tested model families
Methodology
- 57 completed runs across 5 model families
- 3 topics: polarizing public figures and geopolitical subjects
- Matched pairs: same model, same topic, with and without persona
- Scoring: position strength (-3 to +3), directness (0-3), slope across turns
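To make the matched-pair design concrete, here is a minimal sketch of how runs might be recorded and paired. The `Run` record, its field names, and the sample values are all hypothetical; only the score ranges (-3 to +3 for position, 0 to 3 for directness) and the pairing rule (same model, same topic, with and without persona) come from the list above.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    """One evaluation run (hypothetical record layout)."""
    model: str
    topic: str
    persona: bool      # True if the structured persona was applied
    position: float    # position strength, -3 to +3
    directness: float  # directness, 0 to 3

    def __post_init__(self):
        if not -3 <= self.position <= 3:
            raise ValueError("position must be within -3..+3")
        if not 0 <= self.directness <= 3:
            raise ValueError("directness must be within 0..3")


def matched_pairs(runs):
    """Group runs by (model, topic) and keep only keys that have both
    a base run and a persona run. Returns {key: (base, persona)}."""
    by_key = defaultdict(dict)
    for run in runs:
        by_key[(run.model, run.topic)][run.persona] = run
    return {
        key: (group[False], group[True])
        for key, group in by_key.items()
        if False in group and True in group
    }


# Hypothetical runs; the values are illustrative, not measured results.
runs = [
    Run("model-a", "topic-1", persona=False, position=2.5, directness=2.0),
    Run("model-a", "topic-1", persona=True, position=1.0, directness=2.5),
    Run("model-b", "topic-1", persona=False, position=3.0, directness=0.5),
]

pairs = matched_pairs(runs)
for (model, topic), (base, with_persona) in pairs.items():
    print(model, topic, base.position - with_persona.position)
```

In this example `model-b` has no persona run, so it is dropped from the comparison instead of being compared against an unrelated run.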