The three levers
1. What you index
Garbage in, garbage out. Before tuning chunking or top_n, look at the actual chunks being retrieved (KB → source → view chunks) and ask:
- Is there boilerplate (nav, footers, copyright notices) competing with the real content?
- Are there outdated articles still in the index?
- Is the same fact present in multiple slightly-different versions, splitting the retrieval signal?
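The checks above can be partly automated once the chunks are exported as plain strings. The following is a minimal sketch, not a built-in feature: the function names, thresholds, and sample chunks are made up for illustration, and it uses only the Python standard library.

```python
from collections import Counter
from difflib import SequenceMatcher


def find_boilerplate_lines(chunks, min_share=0.5):
    """Return lines that appear in at least `min_share` of all chunks."""
    counts = Counter()
    for chunk in chunks:
        # Count each distinct non-empty line once per chunk.
        lines = {line.strip() for line in chunk.splitlines() if line.strip()}
        counts.update(lines)
    threshold = min_share * len(chunks)
    return sorted(line for line, n in counts.items() if n >= threshold)


def find_near_duplicates(chunks, min_ratio=0.9):
    """Return (i, j, ratio) for chunk pairs whose text is nearly identical."""
    pairs = []
    for i in range(len(chunks)):
        for j in range(i + 1, len(chunks)):
            ratio = SequenceMatcher(None, chunks[i], chunks[j]).ratio()
            if ratio >= min_ratio:
                pairs.append((i, j, round(ratio, 2)))
    return pairs


# Hypothetical sample chunks: a shared nav line plus two near-identical facts.
chunks = [
    "Home | Docs | Contact\nAll products carry a 2-year limited warranty.",
    "Home | Docs | Contact\nAll products carry a 2 year limited warranty.",
    "Home | Docs | Contact\nReturns are accepted within 30 days.",
]

print(find_boilerplate_lines(chunks))
print(find_near_duplicates(chunks))
```

The pairwise comparison is quadratic in the number of chunks, so it suits a spot check on a few hundred chunks rather than a full index.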
2. How you chunk
See Chunking and embeddings. Quick guide:
- Switch to structure-based for any source where answers naturally live inside a section or list.
- Lower the chunk size (e.g. 500 chars) if your content is dense and you want more focused retrieval.
- Raise the chunk size (e.g. 1500 chars) if answers span multiple sentences and the LLM keeps missing context.
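To make the difference between the two approaches concrete, here is a rough sketch of size-based and structure-based chunking. It is an illustration only: the platform's actual chunker is not described here, and the function names, the sentence-splitting rule, and the assumption that sections start with Markdown headings are all hypothetical.

```python
import re


def chunk_by_size(text, max_chars=500):
    """Greedy size-based chunking: pack whole sentences up to max_chars."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would overflow, so close the current chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def chunk_by_structure(markdown_text):
    """Structure-based chunking: one chunk per Markdown heading section."""
    sections = re.split(r"(?m)^(?=#{1,6} )", markdown_text)
    return [section.strip() for section in sections if section.strip()]


doc = "# Warranty\nTwo years.\n\n# Returns\nThirty days.\n"
print(chunk_by_structure(doc))
print(chunk_by_size("A b. C d. E f.", max_chars=9))
```

In this sketch a single sentence longer than max_chars is kept whole rather than cut mid-sentence, so max_chars acts as a target, not a hard limit.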
3. How many chunks (top_n)
Default is 5. Adjustments:
| top_n | Use when |
|---|---|
| 3 | Content is highly precise and you don't want the LLM distracted by lower-ranked matches |
| 5 (default) | Most use cases |
| 8-10 | Answers often require context from multiple chunks (e.g. a process spread over several sections) |
| 15+ | Rarely useful; the LLM starts to lose focus |
Set top_n per agent attachment, or per explicit knowledge_base tool.
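As a mental model for what top_n controls, the sketch below ranks chunks by similarity to the query and keeps the first top_n. It assumes cosine similarity over embedding vectors, which this page does not confirm; the function names and the toy two-dimensional vectors are illustrative only.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec, chunk_vecs, top_n=5):
    """Return (chunk_id, score) for the top_n most similar chunks."""
    scored = [(cid, cosine(query_vec, vec)) for cid, vec in chunk_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Toy embeddings: "a" and "b" point roughly the same way as the query.
chunk_vecs = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0]}
top = retrieve([1.0, 0.0], chunk_vecs, top_n=2)
print([cid for cid, _ in top])
```

Raising top_n only appends lower-scored chunks to the end of this list, which is why large values tend to add noise rather than new answers.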
Testing retrieval directly
Use the API to test queries without involving an agent. This lets you:
- Validate retrieval quality outside the agent loop
- Build automated regression tests for KB changes
- Wire KB queries into workflows directly via an HTTP step
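A regression test for KB changes can be as simple as a list of queries paired with text that must show up in the retrieved chunks. The sketch below deliberately avoids guessing the API's endpoint or response shape: `search` is a hypothetical callable that you would implement as a thin wrapper around your real KB query call, and `fake_search` exists only so the example runs.

```python
def run_retrieval_checks(search, cases, top_n=5):
    """Return the cases whose expected text is missing from the top_n chunks.

    `search(query, top_n)` is a hypothetical callable returning a list of
    chunk strings; wrap your real KB query call to match this shape.
    """
    failures = []
    for query, expected in cases:
        chunks = search(query, top_n)
        if not any(expected.lower() in chunk.lower() for chunk in chunks):
            failures.append((query, expected))
    return failures


def fake_search(query, top_n):
    """Stand-in search used only to make the sketch runnable."""
    corpus = [
        "All products carry a 2-year limited warranty from the purchase date.",
    ]
    return corpus[:top_n]


cases = [
    ("is there a warranty", "2-year limited warranty"),
    ("what is the shipping cost", "shipping"),
]
failures = run_retrieval_checks(fake_search, cases)
print(failures)
```

Running the same case list before and after a KB change shows which queries regressed without involving the agent or the LLM.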
Adding rewrites for hard cases
If callers ask a question in many different phrasings, add an FAQ-style rewrite source:
Q: How long is the warranty?
A: All products carry a 2-year limited warranty from the purchase date. See [warranty.md] for full details.
This dense Q&A chunk will match casual phrasings (“how long do I get to return”, “is there a warranty”, “what about defects”) much better than the original policy doc. A small “FAQ rewrites” source with 50-100 high-traffic Q&As often outperforms ten times the volume of original docs.
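If the Q&As live in a spreadsheet or ticketing export, a small script can render them into the Q/A layout used above. This is a sketch under assumptions: the function name is hypothetical, and separating entries with a blank line is a guess at what helps a chunker keep each pair together, not a documented requirement.

```python
def render_faq_source(pairs):
    """Render (question, answer) pairs as one Q/A block per entry."""
    blocks = [f"Q: {q.strip()}\nA: {a.strip()}" for q, a in pairs]
    # A blank line between blocks keeps each pair visually and structurally separate.
    return "\n\n".join(blocks) + "\n"


faq_text = render_faq_source([
    (
        "How long is the warranty?",
        "All products carry a 2-year limited warranty from the purchase date.",
    ),
])
print(faq_text)
```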
Multi-KB strategies
When content spans clearly distinct domains, splitting into multiple KBs and attaching all of them to one agent works better than one large KB:
- The agent retrieves from each KB with the same top_n
- Each KB has more focused embeddings (less semantic crowding)
- You can swap one KB’s content without recomputing others
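One consequence of querying every KB with the same top_n is that the agent can receive up to top_n chunks per attached KB. The sketch below illustrates that fan-out; it is not the platform's implementation, and `kbs` maps made-up KB names to hypothetical search callables.

```python
def retrieve_from_all(kbs, query, top_n=5):
    """Query every KB with the same top_n and tag results with the KB name.

    `kbs` maps a KB name to a hypothetical callable
    `search(query, top_n) -> list[str]`.
    """
    results = []
    for name, search in kbs.items():
        for chunk in search(query, top_n):
            results.append((name, chunk))
    return results


# Stand-in KBs with canned results, for illustration only.
kbs = {
    "billing": lambda query, top_n: ["b1", "b2", "b3"][:top_n],
    "product": lambda query, top_n: ["p1"][:top_n],
}
combined = retrieve_from_all(kbs, "how do refunds work", top_n=2)
print(combined)
```

With three KBs attached and top_n at 5, the LLM may see up to 15 chunks, so a lower top_n is worth testing when you split content across several KBs.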
When tuning isn’t enough
If you’ve tuned top_n, switched chunking, cleaned up sources, and still miss key answers, the underlying problem is usually:
- The answer literally isn’t in the source content. Add it.
- The query’s phrasing is so different from the doc that no embedding model bridges the gap. Add Q&A rewrites.
- Your agent’s prompt is suppressing KB-grounded answers. Check the system prompt for instructions like “answer concisely” — the LLM may be skipping context to be brief.

