Retrieval tuning - Anyreach

Once you have a working knowledge base (KB), three levers control retrieval quality: what you index, how you chunk it, and how many chunks you pull at query time.

The three levers

1. What you index

Garbage in, garbage out. Before tuning chunking or `top_n`, look at the actual chunks being retrieved (KB → source → view chunks) and ask:

Is there boilerplate (nav, footers, copyright notices) competing with the real content?
Are there outdated articles still in the index?
Is the same fact present in multiple slightly-different versions, splitting the retrieval signal?

Cleaning up sources usually moves the needle more than any other tuning.
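The checks above can be partly automated once you have copied the chunk text out of the chunk viewer. The following is a minimal sketch, not an Anyreach feature: the marker phrases, the 0.8 similarity threshold, and the sample chunks are all illustrative assumptions, and it uses only Python's standard `difflib` module.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical boilerplate phrases; replace with whatever your own sources contain.
BOILERPLATE_MARKERS = ("all rights reserved", "cookie policy", "skip to content")


def is_boilerplate(chunk: str) -> bool:
    """Flag a chunk that contains any of the known boilerplate phrases."""
    lowered = chunk.lower()
    return any(marker in lowered for marker in BOILERPLATE_MARKERS)


def near_duplicates(chunks: list, threshold: float = 0.8) -> list:
    """Return (i, j, ratio) for every pair of chunks whose text similarity meets the threshold."""
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(chunks), 2):
        ratio = SequenceMatcher(None, a.lower(), b.lower()).ratio()
        if ratio >= threshold:
            pairs.append((i, j, round(ratio, 2)))
    return pairs


# Illustrative chunks: two versions of the same fact plus a footer line.
chunks = [
    "All products carry a 2-year limited warranty from the purchase date.",
    "All products carry a 2 year limited warranty from the date of purchase.",
    "Copyright 2024 Example Co. All rights reserved.",
]

boilerplate = [i for i, chunk in enumerate(chunks) if is_boilerplate(chunk)]
duplicates = near_duplicates(chunks)
print("boilerplate chunk indexes:", boilerplate)
print("near-duplicate pairs:", duplicates)
```

Character-level similarity only catches lightly reworded copies. Two articles that state the same fact in very different words still need a manual review.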

2. How you chunk

See Chunking and embeddings. Quick guide:

Switch to structure-based chunking for any source where answers naturally live inside a section or list.
Lower the chunk size (e.g. 500 chars) if your content is dense and you want more focused retrieval.
Raise the chunk size (e.g. 1500 chars) if answers span multiple sentences and the LLM keeps missing context.
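To get a feel for what the two example sizes mean in practice, the sketch below splits the same text at 500 and 1500 characters. It is a simplified stand-in, not the platform's chunker: `chunk_by_chars` is a hypothetical helper that breaks on whitespace, and the sample text is a placeholder.

```python
def chunk_by_chars(text: str, size: int) -> list:
    """Split text on whitespace into chunks of up to `size` characters.

    A single word longer than `size` is kept whole rather than cut.
    """
    chunks, current = [], ""
    for word in text.split():
        candidate = f"{current} {word}".strip()
        if len(candidate) > size and current:
            # Adding this word would overflow the chunk, so close it and start a new one.
            chunks.append(current)
            current = word
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


# Placeholder text standing in for a real article (roughly 3,000 characters).
sample = " ".join(["Warranty claims must include proof of purchase."] * 62)

for size in (500, 1500):
    pieces = chunk_by_chars(sample, size)
    print(f"size={size}: {len(pieces)} chunks, longest={max(map(len, pieces))} chars")
```

Smaller chunks give more, narrower retrieval targets. Larger chunks give fewer targets that each carry more surrounding context.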

3. How many chunks (`top_n`)

Default is 5. Adjustments:

`top_n`	Use when
`3`	Content is highly precise and you don't want lower-ranked matches distracting the LLM
`5` (default)	Most use cases
`8-10`	Answers often require context from multiple chunks (e.g. a process spread over several sections)
`15+`	Rarely useful; the LLM starts to lose focus

Set `top_n` per agent attachment, or per explicit `knowledge_base` tool.
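One way to pick a value is to measure it. For a set of test questions, note the rank at which the chunk holding the answer comes back, then see how many questions each `top_n` would cover. The sketch below assumes you have collected those ranks yourself; the numbers are made up for illustration.

```python
from typing import Optional


def hit_rate(ranks: list, top_n: int) -> float:
    """Fraction of test queries whose answer chunk is ranked within the first top_n results."""
    hits = sum(1 for rank in ranks if rank is not None and rank <= top_n)
    return hits / len(ranks)


# Hypothetical measurements: 1-based rank of the answer chunk for each test query.
# None means the answer chunk was not retrieved at all.
ranks: list = [1, 2, 1, 4, 7, 3, None, 1, 6, 2]

for top_n in (3, 5, 8, 10):
    print(f"top_n={top_n}: {hit_rate(ranks, top_n):.0%} of queries covered")
```

If the hit rate stops improving as `top_n` grows, the remaining misses are not a `top_n` problem. In this made-up data the query marked `None` never surfaces, so it needs a source or chunking fix instead.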

Testing retrieval directly

Use the API to test queries without involving an agent:

curl -X POST https://api.anyreach.ai/knowledge-base/datasets/$DATASET_ID/query \
  -H "Authorization: Bearer $ANYREACH_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "what is the warranty period?",
    "results_count": 5
  }'

The response includes the matching chunks and their similarity scores. Use this to:

Validate retrieval quality outside the agent loop
Build automated regression tests for KB changes
Wire KB queries into workflows directly via an HTTP step
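For the regression-test use case, a small script can send the same request as the curl command above and check where the expected answer ranks. This is a sketch under stated assumptions: the response shape (`results`, `text`, `score`) is assumed for illustration and is not confirmed by this page, so adjust the field names to whatever the API actually returns. `answer_rank` and the canned response are hypothetical.

```python
import json
import urllib.request
from typing import Optional

API_BASE = "https://api.anyreach.ai/knowledge-base/datasets"


def build_request(dataset_id: str, token: str, query: str, results_count: int = 5) -> urllib.request.Request:
    """Build the same POST request as the curl command above."""
    body = json.dumps({"query": query, "results_count": results_count}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/{dataset_id}/query",
        data=body,
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )


def query_kb(dataset_id: str, token: str, query: str, results_count: int = 5) -> dict:
    """Send the query and return the parsed JSON response (needs network access)."""
    request = build_request(dataset_id, token, query, results_count)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def answer_rank(response: dict, expected: str) -> Optional[int]:
    """Return the 1-based rank of the first chunk containing `expected`, or None.

    ASSUMPTION: the response looks like {"results": [{"text": ..., "score": ...}]}.
    """
    for rank, chunk in enumerate(response.get("results", []), start=1):
        if expected.lower() in chunk.get("text", "").lower():
            return rank
    return None


# Canned response in the ASSUMED shape, so the check can run offline.
sample_response = {
    "results": [
        {"text": "Warranty claims require proof of purchase.", "score": 0.81},
        {"text": "All products carry a 2-year limited warranty from the purchase date.", "score": 0.78},
    ]
}
print(answer_rank(sample_response, "2-year limited warranty"))
```

In a real test you would replace `sample_response` with `query_kb(dataset_id, token, "what is the warranty period?")` and fail the test when the rank is `None` or falls below the agent's `top_n`.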

Adding rewrites for hard cases

If callers ask a question in many different phrasings, add an FAQ-style rewrite source:

Q: How long is the warranty? A: All products carry a 2-year limited warranty from the purchase date. See [warranty.md] for full details.

This dense Q&A chunk will match casual phrasings (“how long do I get to return”, “is there a warranty”, “what about defects”) much better than the original policy doc. A small “FAQ rewrites” source with 50-100 high-traffic Q&As often outperforms ten times the volume of original docs.
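If the Q&As live in a spreadsheet or a list, a few lines of code can render them in the format shown above. This is only a sketch: `format_faq` is a hypothetical helper, and separating entries with a blank line is an assumption about what helps a chunker keep each Q&A together.

```python
def format_faq(pairs: list) -> str:
    """Render (question, answer) pairs as 'Q: ... A: ...' entries, one per paragraph."""
    entries = []
    for question, answer in pairs:
        # Collapse stray whitespace so each entry stays on a single line.
        q = " ".join(question.split())
        a = " ".join(answer.split())
        entries.append(f"Q: {q} A: {a}")
    return "\n\n".join(entries) + "\n"


WARRANTY_ANSWER = (
    "All products carry a 2-year limited warranty from the purchase date. "
    "See [warranty.md] for full details."
)

# Two phrasings of the same question, both pointing at the same answer.
pairs = [
    ("How long is the warranty?", WARRANTY_ANSWER),
    ("Is there a warranty?", WARRANTY_ANSWER),
]

faq_text = format_faq(pairs)
print(faq_text)
```

Write `faq_text` to a file and upload it as its own source so it can be reviewed and updated separately from the original docs.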

Multi-KB strategies

When content spans clearly distinct domains, splitting into multiple KBs and attaching all of them to one agent works better than one large KB:

The agent retrieves from each KB with the same `top_n`
Each KB has more focused embeddings (less semantic crowding)
You can swap one KB’s content without recomputing others

Use this when domains are unambiguous (Product / Policy / Pricing). Don’t use this if the LLM would have to guess which KB to draw from for a single question: it doesn’t choose, because every attached KB is queried.
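Because every attached KB is queried with the same `top_n`, the number of chunks reaching the LLM grows with the number of KBs. The sketch below illustrates that arithmetic. It assumes the per-KB results are simply concatenated; how Anyreach actually merges or re-ranks them is not described here, and the KB contents and scores are invented.

```python
def retrieve_from_all(results_by_kb: dict, top_n: int) -> list:
    """Take the top_n highest-scoring chunks from every KB and return them as one list."""
    merged = []
    for kb_name, results in results_by_kb.items():
        ranked = sorted(results, key=lambda item: item[1], reverse=True)[:top_n]
        merged.extend((kb_name, text, score) for text, score in ranked)
    return merged


# Hypothetical (chunk, similarity score) results for a pricing question.
results_by_kb = {
    "Product": [("Model X specs", 0.41), ("Model X dimensions", 0.38), ("Model Y specs", 0.30)],
    "Policy": [("Warranty terms", 0.35), ("Return policy", 0.33)],
    "Pricing": [("Model X price list", 0.88), ("Volume discounts", 0.72), ("Regional pricing", 0.55)],
}

merged = retrieve_from_all(results_by_kb, top_n=2)
print(f"{len(merged)} chunks from {len(results_by_kb)} KBs")
for kb_name, text, score in merged:
    print(f"{kb_name:8} {score:.2f}  {text}")
```

Under this assumption, three KBs at the default `top_n` of 5 would hand the LLM up to 15 chunks, which is the range the table above calls rarely useful. Consider a lower `top_n` when attaching several KBs.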

When tuning isn’t enough

If you’ve tuned `top_n`, switched chunking, cleaned up sources, and still miss key answers, the underlying problem is usually:

The answer literally isn’t in the source content. Add it.
The query’s phrasing is so different from the doc that no embedding model bridges the gap. Add Q&A rewrites.
Your agent’s prompt is suppressing KB-grounded answers. Check the system prompt for instructions like “answer concisely” — the LLM may be skipping context to be brief.
