Cultured Computer
Evaluation

Opinion Coherence

How models fail at holding opinions, and how personas fix those failures.

Overview

We ran 57 evaluation runs across 3 controversial topics and 5 model families to test whether LLMs can hold consistent opinions under conversational pressure. The results reveal three distinct failure modes that all disappear when a structured persona is applied.

Three Failure Modes

Base models fail in predictable, model-specific ways:

| Failure Mode | Behavior |
|---|---|
| Drift | Starts with a position, then gradually adopts whatever direction the interviewer pushes. |
| Refusal | Refuses to engage with positions. Stays neutral to the point of being unhelpful. |
| Capture | Takes maximum positions immediately. Swings multiple points depending on interviewer framing. |

Steering vulnerability

One model swung 3.47 points (position strength is scored from -3 to +3) on a geopolitical topic depending on whether the interviewer script was pro or contra. The model and the topic were identical; only the conversational pressure changed, and the output changed with it.
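The swing figure above can be read as a gap between two runs of the same model on the same topic. The following is a minimal sketch, assuming the swing is the absolute difference between the mean position score under a pro script and the mean under a contra script; the source does not state the exact formula, and the function name and sample scores are hypothetical.

```python
from statistics import mean


def steering_swing(pro_scores, contra_scores):
    """Absolute gap between mean position under pro and contra scripts.

    Assumption: each list holds per-turn position scores on the
    -3..+3 scale for one run of the same model on the same topic.
    """
    return abs(mean(pro_scores) - mean(contra_scores))


# Hypothetical scores, not data from the evaluation.
pro_run = [2.0, 2.5, 3.0]       # interviewer argues in favor
contra_run = [-1.0, -0.5, 0.0]  # interviewer argues against

swing = steering_swing(pro_run, contra_run)
```

A model that holds its position regardless of framing would produce a swing near zero under this definition.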

Persona Fixes All Three

When the same models run with a structured persona, all three failure modes resolve:

| Condition | Base Behavior | With Persona | Change |
|---|---|---|---|
| Model A (polarizing topic) | Captured, maximum position | Independent, moderated | +2.33 shift |
| Model B (polarizing topic) | Captured, maximum position | Moderated response | +1.07 shift |
| Model C (polarizing topic) | Drifting toward interviewer | Stable position | +0.37 shift |

Slope Analysis

The clearest signal is the slope of opinion scores across conversation turns:

  • Base models show negative slopes: they drift toward the interviewer's position over time
  • Persona models show near-zero or positive slopes: they resist steering and maintain their initial position

This is measurable evidence that persona persistence works under adversarial conversational pressure.
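To make the slope measure concrete, here is a minimal sketch that fits an ordinary least-squares line to per-turn opinion scores and returns its slope. It assumes scores are oriented so that movement toward the interviewer's position is negative, which matches the sign convention described above; the fitting method, function name, and sample scores are assumptions, since the source does not say how the slope was computed.

```python
def turn_slope(scores):
    """Least-squares slope of opinion score against turn index (0, 1, 2, ...)."""
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two turns to fit a slope")
    turn_mean = (n - 1) / 2
    score_mean = sum(scores) / n
    # Covariance of turn index and score, divided by variance of turn index.
    numerator = sum((t - turn_mean) * (s - score_mean) for t, s in enumerate(scores))
    denominator = sum((t - turn_mean) ** 2 for t in range(n))
    return numerator / denominator


# Hypothetical runs, not data from the evaluation.
drifting_base = [2.0, 1.5, 1.0, 0.5]   # loses half a point per turn
stable_persona = [1.0, 1.0, 1.0, 1.0]  # holds its initial position

base_slope = turn_slope(drifting_base)
persona_slope = turn_slope(stable_persona)
```

Under these assumptions the drifting run yields a negative slope and the stable run yields a slope of zero, which is the contrast the analysis relies on.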

Directness Metrics

Personas don't just hold positions. They add a human voice, which shows up in the directness scores (scale 0-3):

| Condition | Base Directness | Persona Directness |
|---|---|---|
| Model A | 2.07 | 2.57 |
| Model B (diffusion) | 0.54 (structured output) | 1.75 (conversational) |
| Model C | 2.73 (captured, not direct) | 1.76 (resistant, measured) |

One diffusion model outputs structured tables by default. With a persona, it outputs first-person conversational text. The structured identity spec changes the output modality entirely.

What This Proves

  1. Base models are unreliable opinion holders: they drift, refuse, or get captured
  2. Persona persistence is measurable: slope analysis quantifies resistance to steering
  3. All three failure modes resolve with structure: RICE definitions don't just constrain, they add coherent voice
  4. Model-agnostic: the same persona spec works across all tested model families

Methodology

  • 57 completed runs across 5 model families
  • 3 topics: polarizing public figures and geopolitical subjects
  • Matched pairs: same model, same topic, with and without persona
  • Scoring: position strength (-3 to +3), directness (0-3), slope across turns
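The matched-pair design can be sketched as a small data structure: each run is keyed by model and topic, and a pair exists only when both a base run and a persona run are present. This is an illustrative sketch, not the evaluation's actual harness; the `Run` record, its field names, the one-run-per-condition assumption, and the shift definition (difference of mean position) are all hypothetical.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Run:
    model: str
    topic: str
    persona: bool     # True if the structured persona was applied
    positions: tuple  # per-turn position strength, -3..+3


def matched_pairs(runs):
    """Group runs into (base, persona) pairs keyed by (model, topic).

    Assumption: at most one run per (model, topic, persona) cell;
    a later run with the same key replaces an earlier one.
    """
    base, with_persona = {}, {}
    for run in runs:
        target = with_persona if run.persona else base
        target[(run.model, run.topic)] = run
    shared = base.keys() & with_persona.keys()
    return {key: (base[key], with_persona[key]) for key in shared}


def position_shift(base_run, persona_run):
    """Mean base position minus mean persona position."""
    return mean(base_run.positions) - mean(persona_run.positions)


# Hypothetical runs, not data from the evaluation.
runs = [
    Run("model-a", "topic-1", False, (3.0, 3.0, 3.0)),
    Run("model-a", "topic-1", True, (1.0, 0.5, 0.0)),
    Run("model-b", "topic-1", False, (2.0, 2.0)),  # no persona partner
]
pairs = matched_pairs(runs)
shift = position_shift(*pairs[("model-a", "topic-1")])
```

Runs without a partner are dropped from the comparison, so every reported change compares the same model on the same topic.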
