Opinion Coherence

Overview

We completed 57 evaluation runs across 3 controversial topics and 5 model families to test whether LLMs can hold consistent opinions under conversational pressure. The results reveal three distinct failure modes, all of which disappeared in our runs when a structured persona was applied.

Three Failure Modes

Base models fail in predictable, model-specific ways:

Failure Mode	Behavior
Drift	Starts with a position, then gradually shifts in whatever direction the interviewer pushes.
Refusal	Declines to take any position, staying neutral to the point of being unhelpful.
Capture	Takes a maximum position immediately, then swings by several points depending on how the interviewer frames the question.

Steering Vulnerability

One model's position score swung by 3.47 points (on the -3 to +3 scale) on a geopolitical topic depending on whether the interviewer script argued pro or con. Same model, same topic: a completely different output driven by conversational pressure alone.
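A minimal sketch of how a steering swing like this could be computed. It assumes each run yields a list of per-turn position scores on the -3 to +3 scale and that the swing is the absolute difference between the mean score under the pro script and under the con script. The helper name `steering_swing` and the mean-over-turns aggregation are hypothetical; the source does not say how the 3.47 figure was derived.

```python
from statistics import mean


def steering_swing(pro_scores: list[float], con_scores: list[float]) -> float:
    """Absolute gap between mean position scores under pro vs. con interviewer scripts.

    Hypothetical helper: assumes per-turn position ratings on the -3..+3 scale.
    """
    for s in pro_scores + con_scores:
        if not -3 <= s <= 3:
            raise ValueError(f"score {s} is outside the -3..+3 position scale")
    return abs(mean(pro_scores) - mean(con_scores))


# Illustrative scores only, not data from the evaluation runs.
pro = [1.0, 2.0, 2.5, 3.0]
con = [0.0, -1.0, -1.5, -2.0]
print(steering_swing(pro, con))  # 3.25
```

On a -3 to +3 scale the largest possible swing is 6, so a 3.47-point swing covers more than half the range.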

Persona Fixes All Three

When the same models run with a structured persona, all three failure modes resolve:

Condition	Base Behavior	With Persona	Change
Model A (polarizing topic)	Captured, maximum position	Independent, moderated	+2.33 shift
Model B (polarizing topic)	Captured, maximum position	Moderated response	+1.07 shift
Model C (polarizing topic)	Drifting toward interviewer	Stable position	+0.37 shift

Slope Analysis

The clearest signal is the slope of opinion scores across conversation turns:

Base models show negative slopes: they drift toward the interviewer's position over time
Persona models show near-zero or positive slopes: they resist steering, holding or even reinforcing their initial position

This is measurable evidence that persona persistence works under adversarial conversational pressure.
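A minimal sketch of the slope measurement, assuming "slope" means the ordinary least-squares slope of per-turn position scores against turn index. It also assumes scores are oriented so that movement toward the interviewer lowers the score, which is what makes a negative slope read as drift. The function name `turn_slope` and the example scores are hypothetical.

```python
def turn_slope(scores: list[float]) -> float:
    """Ordinary least-squares slope of position score vs. turn index (0, 1, 2, ...)."""
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two turns to fit a slope")
    t_mean = (n - 1) / 2          # mean of turn indices 0..n-1
    s_mean = sum(scores) / n      # mean position score
    num = sum((t - t_mean) * (s - s_mean) for t, s in enumerate(scores))
    den = sum((t - t_mean) ** 2 for t in range(n))
    return num / den


# Illustrative per-turn scores, not data from the evaluation runs.
drifting = [2.0, 1.5, 1.0, 0.5]   # gives way a little each turn
holding = [2.0, 2.0, 2.5, 2.5]    # holds, then slightly reinforces
print(turn_slope(drifting))  # -0.5
print(turn_slope(holding))   # roughly 0.2
```

A single slope per run summarizes the whole conversation, which makes base and persona runs directly comparable even when their starting positions differ.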

Directness Metrics

Personas don't just hold positions; they also add a human voice:

Condition	Base Directness (0-3)	Persona Directness (0-3)
Model A	2.07	2.57
Model B (diffusion)	0.54 (structured output)	1.75 (conversational)
Model C	2.73 (captured, not direct)	1.76 (resistant, measured)

One diffusion model outputs structured tables by default. With a persona, it produces first-person conversational text instead. The structured identity spec changes the output format entirely.

What This Proves

Base models are unreliable opinion holders: they drift, refuse, or get captured
Persona persistence is measurable: slope analysis quantifies resistance to steering
All three failure modes resolve with structure: RICE definitions don't just constrain; they add a coherent voice
Model-agnostic: the same persona spec works across all tested model families

Methodology

57 completed runs across 5 model families
3 topics: polarizing public figures and geopolitical subjects
Matched pairs: same model, same topic, with and without persona
Scoring: position strength (-3 to +3), directness (0-3), and slope of position scores across turns
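The matched-pair design can be sketched as one record per run plus a persona-minus-base difference for each (model, topic) pair. This is a minimal sketch under stated assumptions: each run is summarized by a mean position score, a directness score, and a slope, and there is one run per model, topic, and condition. The names `RunResult` and `paired_deltas` are hypothetical, and the signed difference used here may not match the convention behind the "Change" column reported earlier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunResult:
    model: str
    topic: str
    persona: bool       # True = run with the structured persona
    position: float     # mean position strength, -3..+3
    directness: float   # 0..3
    slope: float        # per-turn slope of position score


def paired_deltas(runs: list[RunResult]) -> dict[tuple[str, str], tuple[float, float, float]]:
    """Map (model, topic) to persona-minus-base (position, directness, slope).

    Assumes one run per (model, topic, condition); a later duplicate overwrites
    an earlier one. Pairs missing either condition are skipped.
    """
    base: dict[tuple[str, str], RunResult] = {}
    with_persona: dict[tuple[str, str], RunResult] = {}
    for r in runs:
        (with_persona if r.persona else base)[(r.model, r.topic)] = r
    deltas = {}
    for key in base.keys() & with_persona.keys():
        b, p = base[key], with_persona[key]
        deltas[key] = (p.position - b.position, p.directness - b.directness, p.slope - b.slope)
    return deltas


# Illustrative values only, not data from the evaluation runs.
runs = [
    RunResult("model-x", "topic-1", False, 3.0, 2.0, -0.5),
    RunResult("model-x", "topic-1", True, 1.0, 2.5, 0.0),
    RunResult("model-y", "topic-1", False, -1.0, 1.0, -0.2),  # no persona run: unpaired
]
print(paired_deltas(runs))  # {('model-x', 'topic-1'): (-2.0, 0.5, 0.5)}
```

Keying on (model, topic) means every comparison holds both the model and the topic fixed, which is the point of the matched-pair design.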