Programmatic KB management is essential for any team with more than a handful of sources. Use it to:
  • Sync content from your CMS on a schedule
  • Build pattern-based ingestion (one KB per product, populated from canonical URLs)
  • Run KB queries from workflows without going through an agent
Full reference: Knowledge Base API. All paths are under the /knowledge-base prefix.

Create a KB

curl -X POST https://api.anyreach.ai/knowledge-base/datasets \
  -H "Authorization: Bearer $ANYREACH_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Product FAQ",
    "description": "Top customer questions about our flagship product.",
    "embedding_model": "text-embedding-3-small",
    "embedding_dimension": 768
  }'
The response has the dataset_id you’ll use for all subsequent calls. (In the API and database a knowledge base is called a dataset.)
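If you are scripting this from Python rather than the shell, the same call might look like the sketch below. It uses only the standard library and mirrors the curl example. The helper names are hypothetical, and reading `dataset_id` as a top-level field of the JSON response is an assumption, so check the Knowledge Base API reference for the exact response shape.

```python
import json
import urllib.request

BASE_URL = "https://api.anyreach.ai/knowledge-base"


def build_create_dataset_request(token: str, name: str, description: str) -> urllib.request.Request:
    """Build (but do not send) the POST /datasets request shown in the curl example."""
    payload = {
        "name": name,
        "description": description,
        "embedding_model": "text-embedding-3-small",
        "embedding_dimension": 768,
    }
    return urllib.request.Request(
        f"{BASE_URL}/datasets",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def create_dataset(token: str, name: str, description: str) -> str:
    """Send the request and return the new dataset's ID."""
    request = build_create_dataset_request(token, name, description)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    # Assumption: the ID is exposed as a top-level "dataset_id" field.
    return body["dataset_id"]
```

Keeping request construction separate from sending makes the payload easy to inspect or unit-test without touching the network.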

Add sources

One file at a time

curl -X POST https://api.anyreach.ai/knowledge-base/datasets/$DATASET_ID/sources \
  -H "Authorization: Bearer $ANYREACH_TOKEN" \
  -F "file=@./faq.md" \
  -F 'metadata={"name":"FAQ","chunking_strategy":"structure_based"}'

One URL at a time

curl -X POST https://api.anyreach.ai/knowledge-base/datasets/$DATASET_ID/sources \
  -H "Authorization: Bearer $ANYREACH_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "type": "URL",
    "url": "https://docs.example.com/getting-started",
    "name": "Getting Started"
  }'
See Crawling URLs for the crawl options you can pass with a URL source.

Pattern-based bulk add

curl -X POST https://api.anyreach.ai/knowledge-base/datasets/$DATASET_ID/sources/pattern \
  -H "Authorization: Bearer $ANYREACH_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "url_pattern": "https://docs.example.com/help/{slug}",
    "slugs": ["billing", "returns", "shipping"]
  }'
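The request above suggests that each entry in slugs is substituted for the {slug} placeholder in url_pattern. That substitution rule is an assumption, not something this page confirms. Under that assumption, a small local helper (hypothetical name) lets you preview which URLs a pattern request would cover before you send it:

```python
def expand_url_pattern(url_pattern: str, slugs: list[str]) -> list[str]:
    """Preview the URLs a pattern request would cover.

    Assumes the server replaces the literal "{slug}" placeholder with each slug.
    """
    return [url_pattern.replace("{slug}", slug) for slug in slugs]


if __name__ == "__main__":
    for url in expand_url_pattern(
        "https://docs.example.com/help/{slug}",
        ["billing", "returns", "shipping"],
    ):
        print(url)
```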

Poll source readiness

A source is queryable when its processing status reaches ready. Poll the dataset’s sources:
curl https://api.anyreach.ai/knowledge-base/datasets/$DATASET_ID/sources \
  -H "Authorization: Bearer $ANYREACH_TOKEN"
The response includes each source's status (one of pending, converting_to_markdown, chunking, embedding, ready, or failed), plus total_chunks and processed_chunks for in-flight sources.
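A polling loop over those statuses could be sketched as follows. The function takes a `fetch_sources` callable so you can plug in whatever HTTP client you use. It assumes that callable returns a list of dicts, each with a `status` key, which is a guess at the response shape and not a documented contract. The function name, interval, and timeout are illustrative.

```python
import time
from typing import Callable

# Statuses after which a source will not change again.
TERMINAL_STATUSES = {"ready", "failed"}


def wait_for_sources(
    fetch_sources: Callable[[], list[dict]],
    interval_s: float = 5.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, list[dict]]:
    """Poll until every source is ready or failed, then return them grouped.

    Raises TimeoutError if sources are still processing after timeout_s.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        sources = fetch_sources()
        in_flight = [s for s in sources if s["status"] not in TERMINAL_STATUSES]
        if not in_flight:
            return {
                "ready": [s for s in sources if s["status"] == "ready"],
                "failed": [s for s in sources if s["status"] == "failed"],
            }
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{len(in_flight)} source(s) still processing")
        sleep(interval_s)
```

Treating failed as terminal matters: without it, a single failed source would keep the loop spinning until the timeout.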

Query a KB

Run retrieval without going through an agent:
curl -X POST https://api.anyreach.ai/knowledge-base/datasets/$DATASET_ID/query \
  -H "Authorization: Bearer $ANYREACH_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "what is the return window?",
    "results_count": 5
  }'
The response is a list of matching chunks with similarity scores. HTTP steps in workflows can call your KB the same way: point them at the /knowledge-base/datasets/{id}/query endpoint.
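For callers outside a workflow, the query call can be wrapped in a small Python helper. This is a sketch using the standard library and the same request body as the curl example. The function names are hypothetical, and because this page does not specify the field names of the response, the parsed JSON is returned unmodified.

```python
import json
import urllib.request

BASE_URL = "https://api.anyreach.ai/knowledge-base"


def build_query_request(
    token: str, dataset_id: str, query: str, results_count: int = 5
) -> urllib.request.Request:
    """Build (but do not send) the POST /datasets/{id}/query request."""
    payload = {"query": query, "results_count": results_count}
    return urllib.request.Request(
        f"{BASE_URL}/datasets/{dataset_id}/query",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def query_kb(token: str, dataset_id: str, query: str, results_count: int = 5):
    """Send the query and return the parsed JSON response as-is."""
    request = build_query_request(token, dataset_id, query, results_count)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```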

Sync pattern: idempotent CMS → KB

There is no in-place “refresh” of a source — re-ingesting means deleting and re-adding. A typical nightly sync workflow:
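# The helper functions below are placeholders for your own CMS and KB API wrappers.

# 1. List canonical URLs from your CMS
canonical = set(fetch_cms_canonical_urls())

# 2. List existing sources in the KB, keyed by URL
#    (assumes each source record exposes "url" and "id" fields)
existing = {s["url"]: s["id"] for s in list_kb_sources(dataset_id)}
existing_urls = set(existing)

# 3. Compute the diff
to_add     = canonical - existing_urls
to_remove  = existing_urls - canonical
to_refresh = canonical & existing_urls   # re-ingest = delete + re-add

# 4. Apply
for url in to_add:
    add_url_source(dataset_id, url)
for url in to_remove:
    delete_source(dataset_id, existing[url])
for url in to_refresh:
    # The source is briefly missing from the KB between these two calls
    delete_source(dataset_id, existing[url])
    add_url_source(dataset_id, url)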
Wire this as a workflow on a daily cron trigger using a code step.

Limits and rate limits

  • The add-source endpoint accepts an array of sources; use /sources/pattern for large URL batches
  • Embedding throughput is bounded by your OpenAI/Azure quota
  • Query rate is bounded by your plan