Turn-taking, interruptions, and endpointing

Turn-taking controls the rhythm of a voice conversation: when the agent decides the caller has finished speaking, how it reacts when the caller talks over it, and how long it waits before responding. These settings live on the agent’s stt_llm_tts pipeline, alongside the voice and model config, and they trade response latency against the risk of cutting the caller off. Three blocks govern this behavior:

turn_detection — which model decides the caller is done speaking.
interruption — how the agent reacts when the caller speaks while it is talking.
endpointing — how long the agent waits after speech ends before responding.

Turn detection

turn_detection carries a models list (at least one entry). Each entry names a provider and a model. The model returns a probability that the caller has finished; the endpointing delays below decide how that probability becomes a pause.

Providers

Provider livekit uses LiveKit’s native turn-detection models. Pick one model by name:

`name`	Use it for
`english`	English-only calls
`multilingual`	Calls that may switch or run in other languages

{
  "turn_detection": {
    "models": [
      { "provider": "livekit", "model": { "name": "english" } }
    ]
  }
}

Provider anyreach uses a custom multilingual detector backed by a Cerebras model. It analyzes the partial transcript and recent conversation context to judge semantic completeness, then returns a probability. It supports any language. Tune it with parameters:

Parameter	Type	Default	Description
`timeout`	float	`3.0`	Seconds to wait for inference before falling back.
`fallback_threshold`	float	`0.8`	Probability returned when inference times out or errors.
`max_context_items`	integer	`10`	Maximum recent conversation items included as context.
`instructions`	string	built-in	Prompt that defines how completeness is judged. Omit to use the default.
`api_key`	string	env	Cerebras key. Falls back to the `CEREBRAS_API_KEY` environment variable.

{
  "turn_detection": {
    "models": [
      {
        "provider": "anyreach",
        "model": {
          "name": "multilingual",
          "parameters": { "timeout": 3.0, "max_context_items": 10 }
        }
      }
    ]
  }
}

The defaults above are the detector’s own runtime defaults. When a parameter is left unset on the agent config, the detector applies these values.
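As a rough illustration of how unset parameters fall back to the detector defaults, the sketch below merges an agent's `parameters` object over the documented default values. The names `DETECTOR_DEFAULTS` and `effective_parameters` are hypothetical, and the key-by-key merge is an assumption about how the detector resolves values, not its actual implementation.

```python
# Documented runtime defaults for the anyreach detector.
# `instructions` and `api_key` are left out because their defaults are
# "built-in" and the CEREBRAS_API_KEY environment variable, respectively.
DETECTOR_DEFAULTS = {
    "timeout": 3.0,
    "fallback_threshold": 0.8,
    "max_context_items": 10,
}


def effective_parameters(parameters=None):
    """Return the values in effect for a given `parameters` object.

    ASSUMPTION: values set on the agent config override defaults key by key.
    """
    return {**DETECTOR_DEFAULTS, **(parameters or {})}


# Only `timeout` is set on the agent; the other defaults still apply.
print(effective_parameters({"timeout": 1.5}))
```

Leaving `parameters` out entirely yields the defaults unchanged under this assumption.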

When the detector is unavailable

If inference times out (after timeout seconds) or errors, the detector returns fallback_threshold (0.8 by default) instead of a fresh prediction. A high fallback keeps the conversation moving when the model is briefly slow, at the cost of occasionally ending a turn early.
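The fallback behavior described above can be sketched as a timeout wrapper around the inference call. This is an illustration only: `predict_end_of_turn` and the `infer` callable are hypothetical stand-ins, not the detector's real interface.

```python
import asyncio

TIMEOUT_S = 3.0           # documented default for `timeout`
FALLBACK_THRESHOLD = 0.8  # documented default for `fallback_threshold`


async def predict_end_of_turn(infer, timeout=TIMEOUT_S, fallback=FALLBACK_THRESHOLD):
    """Return the inferred probability, or `fallback` on timeout or error.

    `infer` is a hypothetical zero-argument coroutine function that stands
    in for the real inference request.
    """
    try:
        return await asyncio.wait_for(infer(), timeout=timeout)
    except Exception:
        # asyncio.wait_for raises a timeout error (an Exception subclass)
        # when the deadline passes; inference errors land here as well.
        return fallback


async def slow_inference():
    # Simulates a detector that takes longer than the allowed timeout.
    await asyncio.sleep(0.2)
    return 0.1


# With a 0.01 s timeout the slow call is abandoned and the fallback is used.
print(asyncio.run(predict_end_of_turn(slow_inference, timeout=0.01)))
```

Because the fallback is high, a timed-out prediction is treated as "the caller is probably done", which is why the agent keeps responding promptly when inference is slow.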

Interruptions

interruption decides what happens when the caller speaks while the agent is talking. The whole block is optional; omit it to use the agent’s defaults.

Field	Type	Default	Description
`mode`	enum	`adaptive`	`adaptive` or `vad`. `adaptive` weighs speech content; `vad` triggers on detected voice activity alone.
`enabled`	bool	`true`	Whether the caller can interrupt the agent at all.
`min_duration`	float	unset	Minimum seconds of caller speech required to count as an interruption.
`min_words`	integer	unset	Minimum number of caller words required to count as an interruption.
`false_interruption_timeout`	float	`2.0`	Seconds the agent waits before deciding an interruption was false (for example, a stray “mm-hmm”).
`resume_false_interruption`	bool	`true`	Resume the agent’s response after a false interruption is detected.

Use min_duration and min_words to filter out backchannels like “yeah” or “okay” so the agent keeps talking through them. When a brief sound does stop the agent, false_interruption_timeout plus resume_false_interruption let it pick the response back up instead of dropping it.

Set enabled to false only for scripted segments where the agent must finish, such as a required disclosure. For normal conversation, keep interruptions on so callers can talk naturally.
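To make the backchannel filtering concrete, here is a small sketch that builds an interruption block from the documented field names and applies the two minimums to a burst of caller speech. The threshold values are illustrative, `counts_as_interruption` is a hypothetical helper, and the rule that both minimums must be met when both are set is an assumption; this page does not state how the runtime combines them.

```python
import json

# Example interruption block using the documented field names.
# The min_duration and min_words values are illustrative, not recommendations.
interruption = {
    "mode": "adaptive",
    "enabled": True,
    "min_duration": 0.5,
    "min_words": 2,
    "false_interruption_timeout": 2.0,
    "resume_false_interruption": True,
}


def counts_as_interruption(cfg, speech_seconds, word_count):
    """Hypothetical filter; unset thresholds are treated as 'no minimum'.

    ASSUMPTION: when both thresholds are set, both must be met.
    """
    if not cfg.get("enabled", True):
        return False
    min_duration = cfg.get("min_duration")
    min_words = cfg.get("min_words")
    if min_duration is not None and speech_seconds < min_duration:
        return False
    if min_words is not None and word_count < min_words:
        return False
    return True


# The block as it would appear in the agent config.
print(json.dumps({"interruption": interruption}, indent=2))

# A quick "yeah" (0.3 s, one word) is filtered out; a full sentence is not.
print(counts_as_interruption(interruption, 0.3, 1))
print(counts_as_interruption(interruption, 1.2, 4))
```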

Endpointing

endpointing sets how long the agent waits after the caller stops before it responds. The turn-detection probability selects which bound applies: a high probability (the caller seems done) uses min_delay; a low one (the caller may continue) uses max_delay.

Field	Type	Default	Description
`min_delay`	float	`0.05`	Shortest pause before responding, in seconds.
`max_delay`	float	`1.5`	Longest pause before responding, in seconds.

{
  "endpointing": { "min_delay": 0.05, "max_delay": 1.5 }
}
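The selection between the two bounds can be sketched as a single function. The `threshold` argument is a hypothetical cutoff between a "high" and a "low" probability; the real cutoff is not documented on this page, so treat the 0.5 value as a placeholder.

```python
def endpointing_delay(probability, min_delay=0.05, max_delay=1.5, threshold=0.5):
    """Pick how long to wait, in seconds, before the agent responds.

    ASSUMPTION: `threshold` is a hypothetical cutoff; probabilities at or
    above it count as "the caller seems done".
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if min_delay > max_delay:
        raise ValueError("min_delay must not exceed max_delay")
    return min_delay if probability >= threshold else max_delay


# Likely finished: respond after the short bound.
print(endpointing_delay(0.9))
# May continue: hold for the long bound.
print(endpointing_delay(0.2))
```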

Changing endpointing mid-call

The update_endpointing action adjusts these bounds while a call is in progress. Use it to tighten responsiveness during quick back-and-forth and loosen it when the caller is likely to give a long answer. See abilities and actions for how actions are configured.

Field	Type	Description
`type`	enum	Always `update_endpointing`.
`min_endpointing_delay`	float	New minimum delay, in seconds.
`max_endpointing_delay`	float	New maximum delay, in seconds.
`condition`	string	Optional condition for when to apply.
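A minimal sketch of assembling such an action from the fields in the table above. The helper name `update_endpointing_action`, the ordering check, and the free-text form of the `condition` string are assumptions for illustration; see abilities and actions for where the action is actually declared.

```python
import json


def update_endpointing_action(min_delay, max_delay, condition=None):
    """Build an update_endpointing action using the documented field names.

    ASSUMPTION: the check that 0 <= min_delay <= max_delay is a sanity
    check added for this sketch, not a documented platform rule.
    """
    if min_delay < 0 or max_delay < min_delay:
        raise ValueError("expected 0 <= min_delay <= max_delay")
    action = {
        "type": "update_endpointing",
        "min_endpointing_delay": min_delay,
        "max_endpointing_delay": max_delay,
    }
    if condition is not None:
        action["condition"] = condition  # optional field
    return action


# Loosen endpointing ahead of a question that invites a long answer.
# The condition text is a hypothetical example.
action = update_endpointing_action(
    0.3, 2.5, condition="caller is describing their issue"
)
print(json.dumps(action, indent=2))
```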

How these settings affect call feel

The detector, interruption rules, and endpointing delays together set the conversation’s latency and how forgiving it feels.

caller stops speaking
        │
        ▼
turn detector returns a probability
        │
   high │ likely done            low │ may continue
        ▼                            ▼
  wait min_delay                wait max_delay
        │                            │
        └──────────► agent responds ◄┘

Goal	Adjust
Faster, snappier replies	Lower `max_delay`; keep interruptions enabled.
Fewer cut-offs on long answers	Raise `max_delay`; set `min_words` or `min_duration`.
Better non-English handling	Use the `multilingual` LiveKit model or the `anyreach` detector.
Resilience to slow inference	Keep a sensible `timeout` and `fallback_threshold` on the `anyreach` detector.

Very low endpointing delays can make the agent respond before a caller finishes a thought, especially on noisy lines. Very high delays add dead air. Start from the defaults (min_delay 0.05, max_delay 1.5) and adjust in small steps.

Related

Voice and model config

Configure the STT, LLM, TTS, and VAD stack that turn-taking sits on.

Abilities and actions

Use the update_endpointing action and other in-call behaviors.