> ## Documentation Index
> Fetch the complete documentation index at: https://docs.anyreach.ai/llms.txt
> Use this file to discover all available pages before exploring further.

# Turn-taking, interruptions, and endpointing

> Tune how the agent decides when the caller is done speaking and how it handles interruptions.

Turn-taking controls the rhythm of a voice conversation: when the agent decides the caller has finished speaking, how it reacts when the caller talks over it, and how long it waits before responding. These settings live on the agent's `stt_llm_tts` pipeline, alongside the [voice and model config](/agents/voice-and-model-config), and they trade response latency against the risk of cutting the caller off.

Three blocks govern this behavior:

* `turn_detection` — which model decides the caller is done speaking.
* `interruption` — how the agent reacts when the caller speaks while it is talking.
* `endpointing` — how long the agent waits after speech ends before responding.

## Turn detection

`turn_detection` carries a `models` list (at least one entry). Each entry names a provider and a model. The model returns a probability that the caller has finished; the endpointing delays below decide how that probability becomes a pause.

### Providers

<Tabs>
  <Tab title="LiveKit">
    Provider `livekit` uses LiveKit's native turn-detection models. Pick one model by `name`:

    | `name`         | Use it for                                      |
    | -------------- | ----------------------------------------------- |
    | `english`      | English-only calls                              |
    | `multilingual` | Calls that may switch or run in other languages |

    ```json theme={null}
    {
      "turn_detection": {
        "models": [
          { "provider": "livekit", "model": { "name": "english" } }
        ]
      }
    }
    ```
  </Tab>

  <Tab title="Anyreach">
    Provider `anyreach` uses a custom multilingual detector backed by a Cerebras model. It analyzes the partial transcript and recent conversation context to judge semantic completeness, then returns a probability. It supports any language. Tune it with `parameters`:

    | Parameter            | Type    | Default  | Description                                                              |
    | -------------------- | ------- | -------- | ------------------------------------------------------------------------ |
    | `timeout`            | float   | `3.0`    | Seconds to wait for inference before falling back.                       |
    | `fallback_threshold` | float   | `0.8`    | Probability returned when inference times out or errors.                 |
    | `max_context_items`  | integer | `10`     | Maximum recent conversation items included as context.                   |
    | `instructions`       | string  | built-in | Prompt that defines how completeness is judged. Omit to use the default. |
    | `api_key`            | string  | env      | Cerebras key. Falls back to the `CEREBRAS_API_KEY` environment variable. |

    ```json theme={null}
    {
      "turn_detection": {
        "models": [
          {
            "provider": "anyreach",
            "model": {
              "name": "multilingual",
              "parameters": { "timeout": 3.0, "max_context_items": 10 }
            }
          }
        ]
      }
    }
    ```
  </Tab>
</Tabs>

<Note>
  The defaults above are the detector's own runtime defaults. When a parameter is left unset on the agent config, the detector applies these values.
</Note>

### When the detector is unavailable

If inference times out (after `timeout` seconds) or errors, the detector returns `fallback_threshold` (`0.8`) instead of a fresh prediction. A high fallback keeps the conversation moving when the model is briefly slow, at the cost of occasionally ending a turn early.

## Interruptions

`interruption` decides what happens when the caller speaks while the agent is talking. The whole block is optional; omit it to use the agent's defaults.

| Field                        | Type    | Default    | Description                                                                                             |
| ---------------------------- | ------- | ---------- | ------------------------------------------------------------------------------------------------------- |
| `mode`                       | enum    | `adaptive` | `adaptive` or `vad`. `adaptive` weighs speech content; `vad` triggers on detected voice activity alone. |
| `enabled`                    | bool    | `true`     | Whether the caller can interrupt the agent at all.                                                      |
| `min_duration`               | float   | unset      | Minimum seconds of caller speech required to count as an interruption.                                  |
| `min_words`                  | integer | unset      | Minimum number of caller words required to count as an interruption.                                    |
| `false_interruption_timeout` | float   | `2.0`      | Seconds the agent waits before deciding an interruption was false (for example, a stray "mm-hmm").      |
| `resume_false_interruption`  | bool    | `true`     | Resume the agent's response after a false interruption is detected.                                     |

Use `min_duration` and `min_words` to filter out backchannels like "yeah" or "okay" so the agent keeps talking through them. When a brief sound does stop the agent, `false_interruption_timeout` plus `resume_false_interruption` let it pick the response back up instead of dropping it.

<Tip>
  Set `enabled` to `false` only for scripted segments where the agent must finish, such as a required disclosure. For normal conversation, keep interruptions on so callers can talk naturally.
</Tip>

## Endpointing

`endpointing` sets how long the agent waits after the caller stops before it responds. The turn-detection probability selects which bound applies: a high probability (the caller seems done) uses `min_delay`; a low one (the caller may continue) uses `max_delay`.

| Field       | Type  | Default | Description                                   |
| ----------- | ----- | ------- | --------------------------------------------- |
| `min_delay` | float | `0.05`  | Shortest pause before responding, in seconds. |
| `max_delay` | float | `1.5`   | Longest pause before responding, in seconds.  |

```json theme={null}
{
  "endpointing": { "min_delay": 0.05, "max_delay": 1.5 }
}
```

### Changing endpointing mid-call

The `update_endpointing` action adjusts these bounds while a call is in progress. Use it to tighten responsiveness during quick back-and-forth and loosen it when the caller is likely to give a long answer. See [abilities and actions](/agents/abilities-and-actions) for how actions are configured.

| Field                   | Type   | Description                           |
| ----------------------- | ------ | ------------------------------------- |
| `type`                  | enum   | Always `update_endpointing`.          |
| `min_endpointing_delay` | float  | New minimum delay, in seconds.        |
| `max_endpointing_delay` | float  | New maximum delay, in seconds.        |
| `condition`             | string | Optional condition for when to apply. |

## How these settings affect call feel

The detector, interruption rules, and endpointing delays together set the conversation's latency and how forgiving it feels.

```
caller stops speaking
        │
        ▼
turn detector returns a probability
        │
   high │ likely done            low │ may continue
        ▼                            ▼
  wait min_delay                wait max_delay
        │                            │
        └──────────► agent responds ◄┘
```

| Goal                           | Adjust                                                                         |
| ------------------------------ | ------------------------------------------------------------------------------ |
| Faster, snappier replies       | Lower `max_delay`; keep interruptions enabled.                                 |
| Fewer cut-offs on long answers | Raise `max_delay`; set `min_words` or `min_duration`.                          |
| Better non-English handling    | Use the `multilingual` LiveKit model or the `anyreach` detector.               |
| Resilience to slow inference   | Keep a sensible `timeout` and `fallback_threshold` on the `anyreach` detector. |

<Warning>
  Very low endpointing delays can make the agent respond before a caller finishes a thought, especially on noisy lines. Very high delays add dead air. Start from the defaults (`0.05` / `1.5`) and adjust in small steps.
</Warning>

## Related

<CardGroup cols={2}>
  <Card title="Voice and model config" icon="waveform-lines" href="/agents/voice-and-model-config">
    Configure the STT, LLM, TTS, and VAD stack that turn-taking sits on.
  </Card>

  <Card title="Abilities and actions" icon="bolt" href="/agents/abilities-and-actions">
    Use the update\_endpointing action and other in-call behaviors.
  </Card>
</CardGroup>
