An agent's language block declares which languages it can understand and speak. It accepts a list of language codes, and the platform validates that list against the rules below and against the language support of the speech provider you select. Speech-to-text, text-to-speech, and turn detection all read from this configuration.
The language block
language is an optional object on the agent. When set, it requires a non-empty languages array.
| Field | Type | Default | Description |
|---|---|---|---|
| languages | string[] | none | List of language codes. At least one entry is required. |
Each entry is either a base code (en) or a locale-specific code (en-US). A multi-language agent must use base codes only.
Base codes vs locale codes
A base code names a language (en, es, fr). A locale code adds a region with a hyphen (en-US, pt-BR, es-419).
The rule is enforced by the agent language validator: if languages contains more than one entry, no entry may contain a hyphen. A locale-specific code is only valid when exactly one language is selected.
| Selection | Allowed | Example |
|---|---|---|
| Single language | Base or locale code | ["en"] or ["en-US"] |
| Multiple languages | Base codes only | ["en", "es", "fr"] |
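The rule in the table above can be sketched as a small check. This is a minimal illustration, not the platform's validator: the function name validate_languages and the error wording are assumptions made for this example.

```python
def validate_languages(languages: list[str]) -> list[str]:
    """Apply the documented rules and return the list unchanged if it is valid.

    Hypothetical helper: the name and error messages are illustrative only.
    """
    # The languages array must be non-empty.
    if not languages:
        raise ValueError("languages must contain at least one entry")

    # With more than one entry, no entry may contain a hyphen (no locale codes).
    if len(languages) > 1:
        locale_codes = [code for code in languages if "-" in code]
        if locale_codes:
            raise ValueError(
                f"multi-language agents must use base codes only, got: {locale_codes}"
            )
    return languages


validate_languages(["en-US"])           # single language: locale code allowed
validate_languages(["en", "es", "fr"])  # multiple languages: base codes only

try:
    validate_languages(["en-US", "es"])  # locale code mixed into a multi-language list
except ValueError as err:
    print(err)
```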
Per-provider language support
Each speech provider enforces its own supported-language set. Pick languages that the provider you configure on the agent’s stack actually supports. See /agents/voice-and-model-config for how providers are assigned.
| Provider | Role | Supported languages |
|---|---|---|
| Deepgram | STT | 48 base codes (ar, en, es, fr, de, hi, ja, zh-HK, and more). Nova-3 also accepts locale variants such as en-US, pt-BR, es-419, and the special multi value. |
| Gladia (Solaria-1) | STT | 101 base codes. Each entry in languages is validated against the supported set; an unsupported code raises "Unsupported Gladia language code". |
| Cartesia (Sonic-3) | TTS | 42 languages. Defaults to en. |
The Deepgram and Gladia language lists are validated server-side. Submitting a code outside a provider’s supported set is rejected at save time, not silently dropped.
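The reject-rather-than-drop behavior can be sketched as follows. Everything here is an assumption made for illustration: the helper name check_supported, the generalized error wording, and the example set, which holds only the Deepgram codes named in the table above rather than the full supported list.

```python
# Partial, illustrative set: only the Deepgram codes named in the table,
# not the provider's complete supported list.
DEEPGRAM_EXAMPLE_CODES = frozenset({"ar", "en", "es", "fr", "de", "hi", "ja", "zh-HK"})


def check_supported(languages, supported, provider):
    """Raise if any code is outside the supported set (hypothetical helper).

    Unsupported codes cause an error; they are never filtered out silently.
    """
    unsupported = [code for code in languages if code not in supported]
    if unsupported:
        raise ValueError(
            f"Unsupported {provider} language code: {', '.join(unsupported)}"
        )
    return list(languages)


check_supported(["en", "es"], DEEPGRAM_EXAMPLE_CODES, "Deepgram")  # passes

try:
    check_supported(["en", "xx"], DEEPGRAM_EXAMPLE_CODES, "Deepgram")
except ValueError as err:
    print(err)
```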
Deepgram multi-language mode
Deepgram Nova-3 supports a multi language value that lets a single STT stream recognize multiple languages. In this mode, key terms are disallowed: providing keyterms together with language: "multi" raises "keyterms can only be provided when language is not specified as 'multi'". Set keyterms only when the language is a specific code, not multi.
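A minimal sketch of that constraint, assuming the STT options are assembled as a plain dictionary. The function name build_deepgram_options and the dictionary shape are hypothetical; only the language and keyterms field names and the error text come from the description above.

```python
from typing import Optional, Sequence


def build_deepgram_options(
    language: str, keyterms: Optional[Sequence[str]] = None
) -> dict:
    """Assemble STT options, rejecting keyterms in multi mode (hypothetical helper)."""
    if language == "multi" and keyterms:
        raise ValueError(
            "keyterms can only be provided when language is not specified as 'multi'"
        )
    options = {"language": language}
    if keyterms:
        options["keyterms"] = list(keyterms)
    return options


build_deepgram_options("en-US", keyterms=["Anyreach"])  # specific code: keyterms allowed
build_deepgram_options("multi")                         # multi mode without keyterms

try:
    build_deepgram_options("multi", keyterms=["Anyreach"])
except ValueError as err:
    print(err)
```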
Gladia code-switching
Gladia Solaria-1 accepts a languages list and a code_switching flag (default true) that lets recognition switch between the listed languages mid-utterance. An empty languages list enables auto-detection.
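The two settings and their defaults can be modeled as below. This is a sketch under assumptions: the class name GladiaLanguageOptions and the auto_detect property are invented for illustration, and defaulting languages to an empty list is a choice made for this example; only the languages and code_switching names and the code_switching default come from the description above.

```python
from dataclasses import dataclass, field


@dataclass
class GladiaLanguageOptions:
    """Illustrative model of the Gladia language settings (hypothetical class)."""

    # An empty list means the engine auto-detects the language.
    # Defaulting to empty is an assumption of this sketch.
    languages: list[str] = field(default_factory=list)
    # Defaults to true: recognition may switch languages mid-utterance.
    code_switching: bool = True

    @property
    def auto_detect(self) -> bool:
        return not self.languages


bilingual = GladiaLanguageOptions(languages=["en", "es"])
detect_any = GladiaLanguageOptions()

print(bilingual.code_switching, bilingual.auto_detect)
print(detect_any.auto_detect)
```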
The anyreach meta provider
The anyreach STT provider is a meta provider: it does not bind to one engine. At runtime it resolves to Deepgram or Gladia based on the agent’s configured language, so you do not have to choose the underlying STT engine per language yourself.
Its default model takes an optional vocabulary list of domain-specific terms. That vocabulary maps to Deepgram keyterms or Gladia custom_vocabulary depending on which engine is resolved.
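The vocabulary mapping can be pictured with a short sketch. The function name map_vocabulary and the engine identifiers "deepgram" and "gladia" are assumptions for this example; how the platform actually picks the engine for a given language is not described here, so the resolved engine is passed in as an argument.

```python
def map_vocabulary(engine: str, vocabulary: list[str]) -> dict:
    """Translate a vocabulary list into the resolved engine's parameter.

    Hypothetical helper: the engine resolution itself is not modeled.
    """
    if engine == "deepgram":
        return {"keyterms": list(vocabulary)}
    if engine == "gladia":
        return {"custom_vocabulary": list(vocabulary)}
    raise ValueError(f"unknown engine: {engine}")


terms = ["Anyreach", "Solaria"]
print(map_vocabulary("deepgram", terms))
print(map_vocabulary("gladia", terms))
```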
Multilingual turn detection
Turn detection decides when the caller has finished speaking. The anyreach turn-detection provider offers a multilingual model that adapts to the agent’s language rather than assuming English, so end-of-turn timing works across the languages the agent supports. LiveKit’s turn detection exposes both an english and a multilingual model for the same reason.
Pair multilingual turn detection with a multi-language agent and a multilingual-capable STT provider so every stage of the pipeline handles the same set of languages.
Putting it together
Decide single or multi-language
One language lets you use a locale code (en-US) for region-specific recognition and speech. Multiple languages require base codes (en, es).
Check provider support
Confirm each code is in the supported set for the STT and TTS providers on the agent’s stack. Use the anyreach STT provider to let the platform resolve the engine for you.
Choose turn detection
For multi-language agents, use a multilingual turn-detection model so end-of-turn timing adapts to the spoken language.
Voice and model config
Assign STT, TTS, and turn-detection providers to the agent stack.

