Tune how the agent decides when the caller is done speaking and how it handles interruptions.
Turn-taking controls the rhythm of a voice conversation: when the agent decides the caller has finished speaking, how it reacts when the caller talks over it, and how long it waits before responding. These settings live on the agent’s stt_llm_tts pipeline, alongside the voice and model config, and they trade response latency against the risk of cutting the caller off.
Three blocks govern this behavior:
turn_detection — which model decides the caller is done speaking.
interruption — how the agent reacts when the caller speaks while it is talking.
endpointing — how long the agent waits after speech ends before responding.
turn_detection carries a models list (at least one entry). Each entry names a provider and a model. The model returns a probability that the caller has finished; the endpointing delays below decide how that probability becomes a pause.
Provider anyreach uses a custom multilingual detector backed by a Cerebras model. It analyzes the partial transcript and recent conversation context to judge semantic completeness, then returns a probability. It supports any language. Tune it with parameters:
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| timeout | float | 3.0 | Seconds to wait for inference before falling back. |
| fallback_threshold | float | 0.8 | Probability returned when inference times out or errors. |
| max_context_items | integer | 10 | Maximum recent conversation items included as context. |
| instructions | string | built-in | Prompt that defines how completeness is judged. Omit to use the default. |
| api_key | string | env | Cerebras key. Falls back to the CEREBRAS_API_KEY environment variable. |
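As a rough illustration, a turn_detection block with one anyreach entry could be assembled and sanity-checked in Python as follows. This is a sketch under assumptions: the key names models, provider, model, and parameters are inferred from the prose rather than taken from a confirmed schema, build_turn_detection is a hypothetical helper, and the model name is a placeholder.

```python
# Defaults copied from the parameter table above.
ANYREACH_DEFAULTS = {
    "timeout": 3.0,
    "fallback_threshold": 0.8,
    "max_context_items": 10,
}
# Parameters with no numeric default (built-in prompt, key from the environment).
OPTIONAL_PARAMETERS = {"instructions", "api_key"}


def build_turn_detection(model_name, **overrides):
    """Return a turn_detection block with a single anyreach entry.

    Hypothetical helper: the dict layout is an assumption, not a confirmed schema.
    """
    unknown = set(overrides) - (set(ANYREACH_DEFAULTS) | OPTIONAL_PARAMETERS)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")

    params = {**ANYREACH_DEFAULTS, **overrides}
    if not 0.0 <= params["fallback_threshold"] <= 1.0:
        raise ValueError("fallback_threshold is a probability and must be in [0, 1]")
    if params["timeout"] <= 0:
        raise ValueError("timeout must be a positive number of seconds")

    # The models list needs at least one entry; this sketch emits exactly one.
    return {
        "models": [
            {"provider": "anyreach", "model": model_name, "parameters": params}
        ]
    }


# "example-model" is a placeholder, not a real model identifier.
turn_detection = build_turn_detection("example-model", timeout=1.5)
```

Omitted parameters keep the table's defaults, so only the values being tuned need to be passed.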
If inference times out (after timeout seconds) or errors, the detector returns fallback_threshold (0.8) instead of a fresh prediction. A high fallback keeps the conversation moving when the model is briefly slow, at the cost of occasionally ending a turn early.
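The fallback behavior can be sketched in a few lines of Python. This is a minimal illustration, not the detector's actual implementation: end_of_turn_probability and the infer callable are hypothetical names, and only the timeout and fallback_threshold defaults come from the table above.

```python
import asyncio


async def end_of_turn_probability(infer, transcript,
                                  timeout=3.0, fallback_threshold=0.8):
    """Return the model's probability, or fallback_threshold on timeout or error.

    `infer` is a hypothetical async callable that takes the partial
    transcript and returns a probability between 0 and 1.
    """
    try:
        # wait_for cancels the inference call if it exceeds `timeout` seconds.
        return await asyncio.wait_for(infer(transcript), timeout=timeout)
    except Exception:
        # Covers both asyncio.TimeoutError and any inference failure.
        return fallback_threshold


async def slow_model(transcript):
    # Stand-in for a model that is briefly slow.
    await asyncio.sleep(0.2)
    return 0.1


# The slow call exceeds the 0.01 s budget, so the fallback is returned.
result = asyncio.run(
    end_of_turn_probability(slow_model, "so what I wanted to ask", timeout=0.01)
)
```

Here the slow model would have returned 0.1 (caller probably not done), but the caller of this function sees 0.8 instead, which is why a high fallback can occasionally end a turn early.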
interruption decides what happens when the caller speaks while the agent is talking. The whole block is optional; omit it to use the agent’s defaults.
| Field | Type | Default | Description |
| --- | --- | --- | --- |
| mode | enum | adaptive | adaptive or vad. adaptive weighs speech content; vad triggers on detected voice activity alone. |
| enabled | bool | true | Whether the caller can interrupt the agent at all. |
| min_duration | float | unset | Minimum seconds of caller speech required to count as an interruption. |
| min_words | integer | unset | Minimum number of caller words required to count as an interruption. |
| false_interruption_timeout | float | 2.0 | Seconds the agent waits before deciding an interruption was false (for example, a stray “mm-hmm”). |
| resume_false_interruption | bool | true | Resume the agent’s response after a false interruption is detected. |
Use min_duration and min_words to filter out backchannels like “yeah” or “okay” so the agent keeps talking through them. When a brief sound does stop the agent, false_interruption_timeout plus resume_false_interruption let it pick the response back up instead of dropping it.
Set enabled to false only for scripted segments where the agent must finish, such as a required disclosure. For normal conversation, keep interruptions on so callers can talk naturally.
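To make the filtering concrete, here is a small Python model of how min_duration and min_words might gate an interruption. It is a sketch under assumptions: the Interruption class and counts_as_interruption function are hypothetical, and it assumes that when both thresholds are set, caller speech must meet both. The real pipeline may combine them differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Interruption:
    # Defaults mirror the field table; None stands for "unset".
    mode: str = "adaptive"
    enabled: bool = True
    min_duration: Optional[float] = None
    min_words: Optional[int] = None
    false_interruption_timeout: float = 2.0
    resume_false_interruption: bool = True


def counts_as_interruption(cfg, speech_seconds, word_count):
    """Decide whether caller speech should stop the agent.

    Assumption: every threshold that is set must be met.
    """
    if not cfg.enabled:
        return False
    if cfg.min_duration is not None and speech_seconds < cfg.min_duration:
        return False
    if cfg.min_words is not None and word_count < cfg.min_words:
        return False
    return True


cfg = Interruption(min_duration=0.5, min_words=2)
backchannel = counts_as_interruption(cfg, speech_seconds=0.3, word_count=1)
real_question = counts_as_interruption(cfg, speech_seconds=1.2, word_count=4)
```

With these example thresholds, a 0.3-second “yeah” is ignored and the agent keeps talking, while a four-word question lasting 1.2 seconds stops it.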
endpointing sets how long the agent waits after the caller stops before it responds. The turn-detection probability selects which bound applies: a high probability (the caller seems done) uses min_delay; a low one (the caller may continue) uses max_delay.
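The selection rule can be written as a one-line decision. The sketch below is illustrative only: endpointing_delay is a hypothetical function, the 0.5 cut-off between "high" and "low" is an assumption (the source does not state the threshold), and the 0.05 and 1.5 defaults are read as min_delay and max_delay.

```python
def endpointing_delay(probability, min_delay=0.05, max_delay=1.5, threshold=0.5):
    """Pick how long to wait after the caller stops speaking.

    `threshold` is an assumed cut-off; the platform's actual value is not
    documented here.
    """
    if min_delay > max_delay:
        raise ValueError("min_delay must not exceed max_delay")
    # High probability: the caller seems done, so respond quickly.
    # Low probability: the caller may continue, so wait longer.
    return min_delay if probability >= threshold else max_delay


likely_done = endpointing_delay(0.9)      # uses min_delay
may_continue = endpointing_delay(0.2)     # uses max_delay
```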
The update_endpointing action adjusts these bounds while a call is in progress. Use it to tighten responsiveness during quick back-and-forth and loosen it when the caller is likely to give a long answer. See abilities and actions for how actions are configured.
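The effect of a mid-call adjustment can be modeled locally as follows. This is not the update_endpointing action itself, whose payload is not described here: Endpointing and apply_endpointing_update are hypothetical stand-ins, and the specific delay values are examples only.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Endpointing:
    # Assumed to be the default bounds.
    min_delay: float = 0.05
    max_delay: float = 1.5


def apply_endpointing_update(current, min_delay=None, max_delay=None):
    """Return new bounds, leaving any bound passed as None unchanged.

    Local stand-in for what an update_endpointing call might change.
    """
    changes = {}
    if min_delay is not None:
        changes["min_delay"] = min_delay
    if max_delay is not None:
        changes["max_delay"] = max_delay
    updated = replace(current, **changes)
    if updated.min_delay > updated.max_delay:
        raise ValueError("min_delay must not exceed max_delay")
    return updated


defaults = Endpointing()
# Quick yes/no exchange: tighten the upper bound.
quick = apply_endpointing_update(defaults, max_delay=0.8)
# Open-ended question: give the caller more room to pause.
long_answer = apply_endpointing_update(defaults, max_delay=3.0)
```

Returning a new object rather than mutating the current one keeps the previous bounds available, which makes it easy to restore them once the long answer is over.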
The detector, interruption rules, and endpointing delays together set the conversation’s latency and how forgiving it feels.
            caller stops speaking
                      │
                      ▼
    turn detector returns a probability
          │                       │
  high: likely done       low: may continue
          │                       │
          ▼                       ▼
    wait min_delay          wait max_delay
          │                       │
          └───► agent responds ◄──┘
| Goal | Adjust |
| --- | --- |
| Faster, snappier replies | Lower max_delay; keep interruptions enabled. |
| Fewer cut-offs on long answers | Raise max_delay; set min_words or min_duration. |
| Better non-English handling | Use the multilingual LiveKit model or the anyreach detector. |
| Resilience to slow inference | Keep a sensible timeout and fallback_threshold on the anyreach detector. |
Very low endpointing delays can make the agent respond before a caller finishes a thought, especially on noisy lines. Very high delays add dead air. Start from the defaults (min_delay 0.05, max_delay 1.5) and adjust in small steps.