The agent’s language block declares which languages it can understand and speak. It accepts a list of language codes, and the platform validates that list against the rules below and against the language support of the speech provider you select. Speech-to-text, text-to-speech, and turn detection all read from this configuration.

The language block

language is an optional object on the agent. When set, it requires a non-empty languages array.
| Field | Type | Default | Description |
| --- | --- | --- | --- |
| languages | string[] | none | List of language codes. At least one entry is required. |
{
  "language": {
    "languages": ["en-US"]
  }
}
A single-language agent may use either a base code (en) or a locale-specific code (en-US). A multi-language agent must use base codes only.

Base codes vs locale codes

A base code names a language (en, es, fr). A locale code adds a region with a hyphen (en-US, pt-BR, es-419). The rule is enforced by the agent language validator: if languages contains more than one entry, no entry may contain a hyphen. A locale-specific code is only valid when exactly one language is selected.
| Selection | Allowed | Example |
| --- | --- | --- |
| Single language | Base or locale code | ["en"] or ["en-US"] |
| Multiple languages | Base codes only | ["en", "es", "fr"] |
Mixing a locale code into a multi-language list is rejected. For example, ["en-US", "es"] fails validation with the message "Locale-specific code 'en-US' not allowed with multiple languages. Use base language code instead." Use ["en", "es"] in its place.
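The rule can be sketched as a small client-side check. This is a minimal illustration, not the platform's validator: the function name validate_languages is hypothetical, the wording of the empty-list error is an assumption, and only the locale-code message is taken from the text above.

```python
def validate_languages(languages: list[str]) -> list[str]:
    """Return the list unchanged if it satisfies the rules, else raise ValueError."""
    if not languages:
        # The language block requires a non-empty languages array.
        # The wording of this message is an assumption.
        raise ValueError("languages must contain at least one entry")
    if len(languages) > 1:
        # With more than one entry, no code may contain a hyphen.
        for code in languages:
            if "-" in code:
                raise ValueError(
                    f"Locale-specific code '{code}' not allowed with multiple "
                    "languages. Use base language code instead."
                )
    return languages
```

Calling validate_languages(["en-US"]) or validate_languages(["en", "es", "fr"]) returns the list, while validate_languages(["en-US", "es"]) raises ValueError.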

Per-provider language support

Each speech provider enforces its own supported-language set. Pick languages that the provider you configure on the agent’s stack actually supports. See /agents/voice-and-model-config for how providers are assigned.
| Provider | Role | Supported languages |
| --- | --- | --- |
| Deepgram | STT | 48 base codes (ar, en, es, fr, de, hi, ja, zh-HK, and more). Nova-3 also accepts locale variants such as en-US, pt-BR, es-419, and the special multi value. |
| Gladia (Solaria-1) | STT | 101 base codes. Each entry in languages is validated against the supported set; an unsupported code raises "Unsupported Gladia language code". |
| Cartesia (Sonic-3) | TTS | 42 languages. Defaults to en. |
The Deepgram and Gladia language lists are validated server-side. Submitting a code outside a provider’s supported set is rejected at save time, not silently dropped.
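A pre-save check along these lines can surface unsupported codes before the platform rejects them. It is a sketch under assumptions: the function name unsupported_codes is hypothetical, and the sample set contains only the handful of Deepgram codes named in the table above, not the provider's full list.

```python
# Partial, illustrative set: only the Deepgram codes named in this page's table.
# The real supported set is larger and should come from the provider's documentation.
DEEPGRAM_SAMPLE_CODES = frozenset({"ar", "en", "es", "fr", "de", "hi", "ja", "zh-HK"})


def unsupported_codes(languages, supported):
    """Return the codes in `languages` that are missing from `supported`, in order."""
    return [code for code in languages if code not in supported]
```

An empty result means every requested code is in the set you supplied; any returned code would be rejected at save time if the set you passed matches the provider's real list.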

Deepgram multi-language mode

Deepgram Nova-3 supports a multi language value that lets a single STT stream recognize multiple languages. Key terms are disallowed in this mode: providing keyterms together with language: "multi" raises the error "keyterms can only be provided when language is not specified as 'multi'". Set keyterms only when the language is a specific code.
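The constraint can be expressed as a guard that runs before a configuration is submitted. This is a minimal sketch: the function name check_deepgram_keyterms is hypothetical, and treating an empty or missing keyterms list as acceptable in multi mode is an assumption.

```python
from typing import Optional, Sequence


def check_deepgram_keyterms(language: str, keyterms: Optional[Sequence[str]] = None) -> None:
    """Raise ValueError if key terms are combined with the 'multi' language value."""
    # Assumption: None or an empty sequence counts as "no key terms provided".
    if language == "multi" and keyterms:
        raise ValueError(
            "keyterms can only be provided when language is not specified as 'multi'"
        )
```

check_deepgram_keyterms("en", ["LiveKit"]) passes silently, while check_deepgram_keyterms("multi", ["LiveKit"]) raises ValueError.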

Gladia code-switching

Gladia Solaria-1 accepts a languages list and a code_switching flag (default true) that lets recognition switch between the listed languages mid-utterance. An empty languages list enables auto-detection.
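The two Gladia settings and their defaults can be modeled as follows. The class name GladiaLanguageSettings and the auto_detect property are hypothetical helpers for illustration; only the languages and code_switching names, the default of true, and the empty-list behavior come from the description above.

```python
from dataclasses import dataclass, field


@dataclass
class GladiaLanguageSettings:
    """Illustrative container for the Gladia language options described above."""

    # An empty list means the engine auto-detects the language.
    languages: list[str] = field(default_factory=list)
    # Allows switching between the listed languages mid-utterance.
    code_switching: bool = True

    @property
    def auto_detect(self) -> bool:
        """True when no languages are listed, so auto-detection applies."""
        return not self.languages
```

GladiaLanguageSettings() represents auto-detection with code-switching on, and GladiaLanguageSettings(languages=["en", "fr"]) restricts recognition to the two listed languages.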

The anyreach meta provider

The anyreach STT provider is a meta provider: it does not bind to one engine. At runtime it resolves to Deepgram or Gladia based on the agent’s configured language, so you do not have to choose the underlying STT engine per language yourself. Its default model takes an optional vocabulary list of domain-specific terms. That vocabulary maps to Deepgram keyterms or Gladia custom_vocabulary depending on which engine is resolved.
{
  "provider": "anyreach",
  "model": {
    "name": "default",
    "parameters": {
      "vocabulary": ["AnyReach", "PostgREST", "LiveKit"]
    }
  }
}
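The vocabulary mapping can be pictured as a small translation step. This sketch is an assumption about the shape of that step, not the platform's code: the function name and the "deepgram" and "gladia" engine labels are hypothetical, and the engine is passed in because the resolution rule itself is not described here.

```python
def engine_vocabulary_params(engine: str, vocabulary: list[str]) -> dict:
    """Map an anyreach vocabulary list to the parameter name the resolved engine uses."""
    if engine == "deepgram":
        # Deepgram receives the terms as keyterms.
        return {"keyterms": list(vocabulary)}
    if engine == "gladia":
        # Gladia receives the terms as custom_vocabulary.
        return {"custom_vocabulary": list(vocabulary)}
    raise ValueError(f"unknown engine: {engine}")
```

With the vocabulary from the example above, engine_vocabulary_params("gladia", ["AnyReach", "PostgREST", "LiveKit"]) places the three terms under custom_vocabulary.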

Multilingual turn detection

Turn detection decides when the caller has finished speaking. The anyreach turn-detection provider offers a multilingual model that adapts to the agent’s language rather than assuming English, so end-of-turn timing works across the languages the agent supports. LiveKit’s turn detection exposes both an english and a multilingual model for the same reason. Pair multilingual turn detection with a multi-language agent and a multilingual-capable STT provider so every stage of the pipeline handles the same set of languages.
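One way to keep turn detection aligned with the language list is to derive the model choice from it. The heuristic below is an assumption for illustration, not a platform rule: the function name is hypothetical, and it picks the english model only when every configured code has the base language en.

```python
def pick_turn_detection_model(languages: list[str]) -> str:
    """Return "english" for English-only agents, otherwise "multilingual"."""
    # Reduce locale codes such as en-US to their base language (en).
    bases = {code.split("-")[0].lower() for code in languages}
    return "english" if bases == {"en"} else "multilingual"
```

Under this heuristic, ["en-US"] selects english, while ["en", "es"] or ["fr"] selects multilingual.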

Putting it together

1. Decide single or multi-language. One language lets you use a locale code (en-US) for region-specific recognition and speech. Multiple languages require base codes (en, es).

2. Check provider support. Confirm each code is in the supported set for the STT and TTS providers on the agent’s stack. Use the anyreach STT provider to let the platform resolve the engine for you.

3. Choose turn detection. For multi-language agents, use a multilingual turn-detection model so end-of-turn timing adapts to the spoken language.

4. Set key terms carefully. Add keyterms (Deepgram) or vocabulary (anyreach) only when targeting a specific language. Deepgram multi mode rejects key terms.

Voice and model config

Assign STT, TTS, and turn-detection providers to the agent stack.