When to use one
Use a KB when:

- Callers ask factual questions whose answers live in static documents (FAQs, manuals, policy pages)
- The content is too long to fit in the agent’s system prompt
- The content changes infrequently — daily or less

Don't use a KB for:

- Per-call state (the caller’s name, prior answers in this call): that lives in conversation context
- Real-time data (today’s stock price, current order status): use a workflow tool instead
- Tiny content (a 2-paragraph product description): just put it in the prompt
How it works
A source attached to a KB moves through these processing states in order: pending → converting_to_markdown → chunking → embedding → ready.
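
The state sequence above can be treated as a simple ordered pipeline. The sketch below is illustrative only: `get_status` stands in for whatever call returns a source's current status (no such function is defined in this document), and the polling interval and timeout are arbitrary assumptions. Any status outside the five listed states is treated as an error, since this page does not describe failure states.

```python
import time
from typing import Callable

# Processing states, in the order listed above.
STATES = ["pending", "converting_to_markdown", "chunking", "embedding", "ready"]


def progress(status: str) -> float:
    """Fraction of the pipeline completed: 0.0 for pending, 1.0 for ready."""
    return STATES.index(status) / (len(STATES) - 1)


def wait_until_ready(
    get_status: Callable[[], str],   # hypothetical: returns the current status string
    poll_seconds: float = 2.0,       # assumed polling interval
    timeout_seconds: float = 300.0,  # assumed upper bound on processing time
) -> None:
    """Poll get_status() until it reports 'ready', or raise on timeout."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = get_status()
        if status not in STATES:
            # Failure states are not documented here, so surface anything unknown.
            raise ValueError(f"unexpected status: {status!r}")
        if status == "ready":
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still {status!r} after {timeout_seconds} seconds")
        time.sleep(poll_seconds)
```

In practice `get_status` would wrap a request for the dataset source's status; passing it in as a callable keeps the loop testable without a network.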
Datasets and sources
The dashboard says “Knowledge Base,” but the API and database use three related nouns:

| Term | What it is |
|---|---|
| Dataset | A knowledge base. Identified by a dataset_id. |
| Source | A file or URL you ingested. Has its own upload state. |
| Dataset source | The attachment of a source to a KB. Carries its own per-KB processing status (converting_to_markdown → chunking → embedding → ready) and a dataset_source_id you use to detach it. |
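
One way to picture the three nouns is as a small data model. This is a sketch for intuition, not the actual schema: only `dataset_id` and `dataset_source_id` come from the table above, and every other field name, the `attach` and `detach` helpers, and the example IDs are hypothetical. It also assumes one source can be attached to more than one KB, which the per-KB status suggests but this page does not state outright.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Source:
    source_id: str      # hypothetical field name
    location: str       # file name or URL that was ingested
    upload_state: str   # the source's own upload state


@dataclass
class DatasetSource:
    dataset_source_id: str   # used to detach the source from the KB
    source_id: str
    status: str = "pending"  # per-KB processing status


@dataclass
class Dataset:
    dataset_id: str
    attachments: List[DatasetSource] = field(default_factory=list)

    def attach(self, source: Source, dataset_source_id: str) -> DatasetSource:
        """Record that a source is attached to this KB."""
        link = DatasetSource(dataset_source_id, source.source_id)
        self.attachments.append(link)
        return link

    def detach(self, dataset_source_id: str) -> None:
        """Remove an attachment by its dataset_source_id; the Source itself is untouched."""
        self.attachments = [
            a for a in self.attachments if a.dataset_source_id != dataset_source_id
        ]
```

The point of the separate `DatasetSource` record is that processing status belongs to the attachment, so the same file could be ready in one KB and still embedding in another.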
Model choices
Pick an embedding model and dimension when you create the KB:

| Model | Dimensions | Use when |
|---|---|---|
| text-embedding-3-small | 768 (default) | Most use cases. Faster, cheaper. |
| text-embedding-3-large | 1536 or 3072 | Long-tail factual recall, technical content where the small model misses nuance. |
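
Because the model and dimension are chosen at creation time, it can help to validate the pair before sending a request. The sketch below encodes only what the table lists; the function name and the `embedding_model` and `dimensions` keys are hypothetical, and the service may accept combinations not shown here. Since the table names no default dimension for the large model, the sketch requires one explicitly.

```python
from typing import Dict, Optional, Tuple

# Model/dimension pairs taken from the table above.
ALLOWED_DIMENSIONS: Dict[str, Tuple[int, ...]] = {
    "text-embedding-3-small": (768,),
    "text-embedding-3-large": (1536, 3072),
}

# Only the small model has a default dimension listed.
DEFAULT_DIMENSIONS: Dict[str, int] = {"text-embedding-3-small": 768}


def resolve_embedding_config(model: str, dimensions: Optional[int] = None) -> dict:
    """Return a validated model/dimension pair, or raise ValueError."""
    if model not in ALLOWED_DIMENSIONS:
        raise ValueError(f"unknown embedding model: {model!r}")
    if dimensions is None:
        if model not in DEFAULT_DIMENSIONS:
            raise ValueError(f"{model} has no listed default; pass dimensions explicitly")
        dimensions = DEFAULT_DIMENSIONS[model]
    if dimensions not in ALLOWED_DIMENSIONS[model]:
        allowed = ", ".join(str(d) for d in ALLOWED_DIMENSIONS[model])
        raise ValueError(f"{model} is listed with dimensions: {allowed}")
    # Key names are hypothetical; adapt them to the real create-KB payload.
    return {"embedding_model": model, "dimensions": dimensions}
```

For example, `resolve_embedding_config("text-embedding-3-large", 3072)` passes, while asking for 3072 dimensions on the small model raises a `ValueError`.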
Supported content
- File types: .pdf, .txt, .csv, .json, .md. Other types (including .html) are rejected; ingest web content as a URL source instead.
- URLs: crawled, optionally following links. See Crawling URLs.
- Sources per KB: governed by your plan.
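
A client-side pre-check can catch unsupported inputs before upload. This is a minimal sketch under stated assumptions: `classify_input` is a hypothetical helper, extension matching is assumed to be case-insensitive, and only http and https are treated as URL sources. The service's own validation remains authoritative.

```python
from pathlib import PurePath
from urllib.parse import urlparse

# File extensions listed above as supported.
SUPPORTED_EXTENSIONS = {".pdf", ".txt", ".csv", ".json", ".md"}


def classify_input(value: str) -> str:
    """Return 'url' for a web address, 'file' for a supported file, else raise ValueError."""
    parsed = urlparse(value)
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return "url"
    # Assumption: extensions are compared case-insensitively.
    suffix = PurePath(value).suffix.lower()
    if suffix in SUPPORTED_EXTENSIONS:
        return "file"
    raise ValueError(
        f"unsupported file type {suffix or '(none)'!r}; "
        "ingest web content as a URL source instead"
    )
```

With this check, a local `page.html` is rejected, while `https://example.com/page.html` is classified as a URL source.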

