
Mercury

Diffusion-based LLM with burst generation and TTS pipeline.

Overview

Mercury is a diffusion language model by Inception Labs. Unlike autoregressive models that generate tokens one at a time, Mercury generates all tokens simultaneously using a burst generation pattern.

This changes the architecture for voice-enabled personas. Instead of streaming tokens as they arrive, the system uses a two-call pattern optimized for time-to-first-audio.

Two-Call Pattern

Mercury's burst generation would otherwise mean waiting for the full response before any speech can begin. The two-call pattern solves this:

| Call | Purpose | Search |
| --- | --- | --- |
| Call 1 | Short acknowledgment, fires immediately to TTS | Skipped |
| Call 2 | Full response with search context, generates while Call 1 is speaking | Included |

The user hears a natural acknowledgment almost instantly while the substantive response generates in parallel.
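
The two-call pattern can be sketched with `asyncio`: Call 1 is fired to TTS as soon as it returns, while Call 2 generates concurrently. The `generate` and `speak` functions here are hypothetical stand-ins, not the actual Mercury or TTS APIs.

```python
import asyncio

async def generate(prompt: str, max_tokens: int) -> str:
    """Stand-in for a Mercury burst-generation call."""
    await asyncio.sleep(0.01)  # simulate model latency
    return f"response to: {prompt}"

async def speak(text: str) -> None:
    """Stand-in for the TTS pipeline."""
    await asyncio.sleep(0.01)

async def two_call_respond(user_msg: str, search_context: str) -> str:
    # Call 1: short acknowledgment, no search, fired to TTS immediately.
    ack_task = asyncio.create_task(
        generate(f"Briefly acknowledge: {user_msg}", max_tokens=16))
    # Call 2: full response with search context, runs in parallel.
    full_task = asyncio.create_task(
        generate(f"{search_context}\n\n{user_msg}", max_tokens=512))
    await speak(await ack_task)  # user hears this almost instantly
    full = await full_task       # Call 2 has been generating meanwhile
    await speak(full)
    return full
```

Because both calls are created before the first `await`, Call 2's generation overlaps Call 1's synthesis and playback.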

TTS Pipeline

Sentence Detection

Response is split at sentence boundaries for natural speech chunking.
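
A minimal sentence splitter for this step might use a regex boundary heuristic; a production pipeline could substitute a proper sentence tokenizer.

```python
import re

def split_sentences(text: str) -> list[str]:
    """Split text at sentence-ending punctuation (., !, ?) followed by
    whitespace, producing natural chunks for TTS."""
    parts = re.split(r'(?<=[.!?])\s+', text.strip())
    return [p for p in parts if p]
```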

Audio Queue

Sentences are queued with a mutex to prevent overlap between chunks.
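
The mutex-guarded queue can be sketched as a lock that serializes the playback path, so one chunk finishes before the next starts. The `play_fn` callback is a hypothetical playback hook.

```python
import asyncio

class AudioQueue:
    """Serializes audio playback so chunks never overlap."""

    def __init__(self) -> None:
        self._lock = asyncio.Lock()  # the mutex between chunks

    async def play(self, chunk: bytes, play_fn) -> None:
        # Only one chunk may hold the playback path at a time.
        async with self._lock:
            await play_fn(chunk)
```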

Audio Synthesis

Each sentence is converted to audio with lip-sync alignment data.

Avatar Rendering

3D avatar renders synchronized mouth movements in real time.

Garble Detection

Diffusion models at low temperatures can produce garbled output: very short responses, missing punctuation, repeated words. The system applies a temperature floor with automatic garble detection and retry.
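
The detect-and-retry loop can be sketched with the three symptoms named above. The threshold values and the temperature floor constant are illustrative assumptions, not documented parameters.

```python
import re

TEMPERATURE_FLOOR = 0.4  # assumed value; the source does not specify one

def looks_garbled(text: str) -> bool:
    """Heuristics for diffusion garble: very short output, no terminal
    punctuation, or an immediately repeated word."""
    words = text.split()
    if len(words) < 3:
        return True
    if not re.search(r'[.!?]', text):
        return True
    if any(a.lower() == b.lower() for a, b in zip(words, words[1:])):
        return True
    return False

def generate_with_retry(generate, prompt, temperature, retries=2):
    """Clamp temperature to the floor, retry on garbled output."""
    temp = max(temperature, TEMPERATURE_FLOOR)
    for _ in range(retries + 1):
        out = generate(prompt, temp)
        if not looks_garbled(out):
            return out
        temp = min(temp + 0.2, 1.0)  # nudge temperature up each retry
    return out
```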

Voice Consistency

The VCS scoring system runs on Mercury responses just as it does for any other model:

  • Signature phrase detection against RICE Communication layer
  • Forbidden pattern matching with score penalties
  • Sentence distribution validation (short/medium/long ratios)
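
The three checks above can be sketched as a single scoring function. The weights, the short/medium/long cutoffs, and the 0-to-1 scale are illustrative assumptions; the real VCS weighting is not specified here.

```python
import re

def vcs_score(response: str,
              signature_phrases: list[str],
              forbidden_patterns: list[str]) -> float:
    """Toy Voice Consistency Score: reward signature phrases from the
    RICE Communication layer, penalize forbidden patterns, and flag a
    monotone sentence-length distribution."""
    score = 0.5
    low = response.lower()
    score += 0.1 * sum(p.lower() in low for p in signature_phrases)
    score -= 0.2 * sum(p.lower() in low for p in forbidden_patterns)
    # Sentence distribution: penalize responses stuck in one length class.
    sentences = [s for s in re.split(r'(?<=[.!?])\s+', response) if s]
    classes = {('short' if n <= 5 else 'medium' if n <= 15 else 'long')
               for n in (len(s.split()) for s in sentences)}
    if len(classes) == 1 and len(sentences) > 2:
        score -= 0.1  # monotone rhythm penalty
    return max(0.0, min(1.0, score))
```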

Without a persona, Mercury outputs structured tables. With a persona, it outputs first-person conversational text. The structured identity spec changes the output modality entirely.

Search Integration

Search operates in two tiers:

| Tier | Method | Used By |
| --- | --- | --- |
| Tier 1 | Heuristic intent classification | All models |
| Tier 2 | AI-powered upgrade | Non-Mercury models |

Mercury skips Tier 2: the burst generation pattern can't wait for search classification. Call 1 fires without search; Call 2 integrates results if available.
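
Tier 1 can be sketched as a cheap regex classifier. The trigger patterns below are hypothetical examples; the actual heuristics are not documented here.

```python
import re

# Illustrative trigger patterns, not the production heuristics.
SEARCH_TRIGGERS = [
    r'\b(latest|today|current|news|price|weather)\b',
    r'\b(who|what|when|where)\b.*\?',
]

def needs_search(message: str) -> bool:
    """Tier 1: regex-based intent classification, cheap enough for all
    models. Tier 2 (non-Mercury only) would refine this verdict with an
    AI classifier, but Mercury fires Call 1 without waiting for it."""
    low = message.lower()
    return any(re.search(p, low) for p in SEARCH_TRIGGERS)
```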

Prompt Assembly

Six layers compose the final prompt:

| Layer | Purpose |
| --- | --- |
| RICE base | Persona definition and constraints |
| Style modifier | EQ-adjusted tone parameters |
| Conversation context | Injected facts, session state |
| Deflection override | Prevents off-topic drift |
| Goal steering | State-appropriate steering prompts |
| Search context | Web or capsule results (Call 2 only) |

Model Comparison

The persona layer treats all models identically: same RICE definition, same scoring, same context injection. Identity coherence holds across all backends.

| Model Type | Generation Pattern | Persona Integration |
| --- | --- | --- |
| Autoregressive (streaming) | Token-by-token | Full pipeline, standard streaming |
| Diffusion (burst) | All tokens at once | Two-call pattern for voice latency |
