Turn-taking controls the rhythm of a voice conversation: when the agent decides the caller has finished speaking, how it reacts when the caller talks over it, and how long it waits before responding. These settings live on the agent’s stt_llm_tts pipeline, alongside the voice and model config, and they trade response latency against the risk of cutting the caller off. Three blocks govern this behavior:
  • turn_detection — which model decides the caller is done speaking.
  • interruption — how the agent reacts when the caller speaks while it is talking.
  • endpointing — how long the agent waits after speech ends before responding.

Turn detection

turn_detection carries a models list (at least one entry). Each entry names a provider and a model. The model returns a probability that the caller has finished; the endpointing delays below decide how that probability becomes a pause.

Providers

Provider livekit uses LiveKit’s native turn-detection models. Pick one model by name:
| name | Use it for |
| --- | --- |
| english | English-only calls |
| multilingual | Calls that may switch or run in other languages |
{
  "turn_detection": {
    "models": [
      { "provider": "livekit", "model": { "name": "english" } }
    ]
  }
}
The defaults above are the detector’s own runtime defaults. When a parameter is left unset on the agent config, the detector applies these values.
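For calls that may not stay in English, the same block can name the other LiveKit model from the table above. This is a minimal sketch that only swaps the name value in the earlier example; nothing else about the block is assumed to change.

```json
{
  "turn_detection": {
    "models": [
      { "provider": "livekit", "model": { "name": "multilingual" } }
    ]
  }
}
```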

When the detector is unavailable

If inference times out (after timeout seconds) or errors, the detector returns fallback_threshold (0.8) instead of a fresh prediction. A high fallback keeps the conversation moving when the model is briefly slow, at the cost of occasionally ending a turn early.

Interruptions

interruption decides what happens when the caller speaks while the agent is talking. The whole block is optional; omit it to use the agent’s defaults.
| Field | Type | Default | Description |
| --- | --- | --- | --- |
| mode | enum | adaptive | adaptive or vad. adaptive weighs speech content; vad triggers on detected voice activity alone. |
| enabled | bool | true | Whether the caller can interrupt the agent at all. |
| min_duration | float | unset | Minimum seconds of caller speech required to count as an interruption. |
| min_words | integer | unset | Minimum number of caller words required to count as an interruption. |
| false_interruption_timeout | float | 2.0 | Seconds the agent waits before deciding an interruption was false (for example, a stray “mm-hmm”). |
| resume_false_interruption | bool | true | Resume the agent’s response after a false interruption is detected. |
Use min_duration and min_words to filter out backchannels like “yeah” or “okay” so the agent keeps talking through them. When a brief sound does stop the agent, false_interruption_timeout plus resume_false_interruption let it pick the response back up instead of dropping it.
Set enabled to false only for scripted segments where the agent must finish, such as a required disclosure. For normal conversation, keep interruptions on so callers can talk naturally.
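As an illustration of the backchannel filtering described above, an interruption block might look like the sketch below. The field names and the mode, enabled, false_interruption_timeout, and resume_false_interruption values come from the table; the min_duration of 0.5 and min_words of 2 are illustrative assumptions, not recommended settings, and the top-level placement simply mirrors the other examples on this page.

```json
{
  "interruption": {
    "mode": "adaptive",
    "enabled": true,
    "min_duration": 0.5,
    "min_words": 2,
    "false_interruption_timeout": 2.0,
    "resume_false_interruption": true
  }
}
```

Because every field here is optional, a config that only needs the backchannel filter could set min_duration or min_words alone and leave the rest at their defaults.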

Endpointing

endpointing sets how long the agent waits after the caller stops before it responds. The turn-detection probability selects which bound applies: a high probability (the caller seems done) uses min_delay; a low one (the caller may continue) uses max_delay.
| Field | Type | Default | Description |
| --- | --- | --- | --- |
| min_delay | float | 0.05 | Shortest pause before responding, in seconds. |
| max_delay | float | 1.5 | Longest pause before responding, in seconds. |
{
  "endpointing": { "min_delay": 0.05, "max_delay": 1.5 }
}

Changing endpointing mid-call

The update_endpointing action adjusts these bounds while a call is in progress. Use it to tighten responsiveness during quick back-and-forth and loosen it when the caller is likely to give a long answer. See abilities and actions for how actions are configured.
| Field | Type | Description |
| --- | --- | --- |
| type | enum | Always update_endpointing. |
| min_endpointing_delay | float | New minimum delay, in seconds. |
| max_endpointing_delay | float | New maximum delay, in seconds. |
| condition | string | Optional condition for when to apply. |
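A single update_endpointing action, using only the fields in the table above, could be sketched as follows. The delay values and the wording of condition are illustrative assumptions; this page does not specify the condition syntax or where the action sits in the agent config, so check abilities and actions before copying it.

```json
{
  "type": "update_endpointing",
  "min_endpointing_delay": 0.3,
  "max_endpointing_delay": 3.0,
  "condition": "The caller has been asked to describe their issue in detail"
}
```

The idea in this sketch is to loosen both bounds just before an open-ended question so that a long answer is less likely to be cut off.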

How these settings affect call feel

The detector, interruption rules, and endpointing delays together set the conversation’s latency and how forgiving it feels.
            caller stops speaking
                      │
                      ▼
     turn detector returns a probability
                      │
        ┌─────────────┴──────────────┐
   high │ likely done            low │ may continue
        ▼                            ▼
  wait min_delay                wait max_delay
        │                            │
        └──────────► agent responds ◄┘
| Goal | Adjust |
| --- | --- |
| Faster, snappier replies | Lower max_delay; keep interruptions enabled. |
| Fewer cut-offs on long answers | Raise max_delay; set min_words or min_duration. |
| Better non-English handling | Use the multilingual LiveKit model or the anyreach detector. |
| Resilience to slow inference | Keep a sensible timeout and fallback_threshold on the anyreach detector. |
Very low endpointing delays can make the agent respond before a caller finishes a thought, especially on noisy lines. Very high delays add dead air. Start from the defaults (0.05 / 1.5) and adjust in small steps.
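As one example of a small step, the “fewer cut-offs on long answers” goal could be approached by raising max_delay slightly above its default and adding a word-count floor for interruptions. The values 1.8 and 2 are illustrative assumptions, and showing the two blocks side by side assumes they sit at the same level of the pipeline config.

```json
{
  "endpointing": { "min_delay": 0.05, "max_delay": 1.8 },
  "interruption": { "min_words": 2 }
}
```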

Voice and model config

Configure the STT, LLM, TTS, and VAD stack that turn-taking sits on.

Abilities and actions

Use the update_endpointing action and other in-call behaviors.