# Programmatic KBs

> Manage knowledge bases and sources via the API.

Programmatic KB management is essential for any team with more than a handful of sources. Use it to:

* Sync content from your CMS on a schedule
* Build pattern-based ingestion (one KB per product, populated from canonical URLs)
* Run KB queries from workflows without going through an agent

Full reference: [Knowledge Base API](/api-reference/knowledge-bases/overview). All paths are under the `/knowledge-base` prefix.

## Create a KB

```bash theme={null}
curl -X POST https://api.anyreach.ai/knowledge-base/datasets \
  -H "Authorization: Bearer $ANYREACH_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Product FAQ",
    "description": "Top customer questions about our flagship product.",
    "embedding_model": "text-embedding-3-small",
    "embedding_dimension": 768
  }'
```

The response includes the `dataset_id` that every subsequent call uses. (In the API and the database, a knowledge base is called a **dataset**.)

## Add sources

### One file at a time

```bash theme={null}
curl -X POST https://api.anyreach.ai/knowledge-base/datasets/$DATASET_ID/sources \
  -H "Authorization: Bearer $ANYREACH_TOKEN" \
  -F "file=@./faq.md" \
  -F 'metadata={"name":"FAQ","chunking_strategy":"structure_based"}'
```

### One URL at a time

```bash theme={null}
curl -X POST https://api.anyreach.ai/knowledge-base/datasets/$DATASET_ID/sources \
  -H "Authorization: Bearer $ANYREACH_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "type": "URL",
    "url": "https://docs.example.com/getting-started",
    "name": "Getting Started"
  }'
```

See [Crawling URLs](/knowledge-bases/crawling-urls) for the crawl options you can pass with a URL source.

### Pattern-based bulk add

```bash theme={null}
curl -X POST https://api.anyreach.ai/knowledge-base/datasets/$DATASET_ID/sources/pattern \
  -H "Authorization: Bearer $ANYREACH_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "url_pattern": "https://docs.example.com/help/{slug}",
    "slugs": ["billing", "returns", "shipping"]
  }'
```
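As a rough illustration of what the request above describes, the sketch below expands a pattern and a slug list into concrete URLs locally. It assumes each slug replaces the `{slug}` placeholder verbatim, which this page does not spell out; `expand_pattern` is a hypothetical helper, not part of the API.

```python
def expand_pattern(url_pattern: str, slugs: list[str]) -> list[str]:
    """Substitute each slug into the {slug} placeholder (assumed semantics)."""
    if "{slug}" not in url_pattern:
        raise ValueError("url_pattern must contain a {slug} placeholder")
    # str.replace is used instead of str.format so that any other braces
    # in the URL are left untouched.
    return [url_pattern.replace("{slug}", slug) for slug in slugs]


urls = expand_pattern(
    "https://docs.example.com/help/{slug}",
    ["billing", "returns", "shipping"],
)
for url in urls:
    print(url)
```

Previewing the expanded list this way can catch a malformed pattern before the batch is submitted.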

## Poll source readiness

A source is queryable when its processing status reaches `ready`. Poll the dataset's sources:

```bash theme={null}
curl https://api.anyreach.ai/knowledge-base/datasets/$DATASET_ID/sources \
  -H "Authorization: Bearer $ANYREACH_TOKEN"
```

The response includes each source's status (`pending`, `converting_to_markdown`, `chunking`, `embedding`, `ready`, `failed`), plus `total_chunks` and `processed_chunks` for in-flight sources.
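A polling loop over that response might look like the sketch below. It assumes each source is a dict with a `status` key holding one of the values listed above; that field name, the `fetch_sources` callable, and the timing defaults are all assumptions. `fetch_sources` stands in for whatever function performs the GET request and returns the parsed list of sources.

```python
import time
from typing import Callable

# Statuses after which a source will not change any further.
TERMINAL = {"ready", "failed"}


def wait_for_sources(
    fetch_sources: Callable[[], list[dict]],
    interval_s: float = 5.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> list[dict]:
    """Poll until every source is `ready` or `failed`, then return the list."""
    deadline = time.monotonic() + timeout_s
    while True:
        sources = fetch_sources()
        if all(s["status"] in TERMINAL for s in sources):
            return sources
        if time.monotonic() >= deadline:
            raise TimeoutError("sources still processing after timeout")
        sleep(interval_s)
```

The function returns as soon as nothing is in flight, so callers should still inspect the result for `failed` entries before querying.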

## Query a KB

Run retrieval without going through an agent:

```bash theme={null}
curl -X POST https://api.anyreach.ai/knowledge-base/datasets/$DATASET_ID/query \
  -H "Authorization: Bearer $ANYREACH_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "what is the return window?",
    "results_count": 5
  }'
```

The response is a list of matching chunks, each with a similarity score.

This is also how [HTTP steps in workflows](/workflows/steps/http) can call your KB: point them at the `/knowledge-base/datasets/{id}/query` endpoint.
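For scripts or code steps that prefer Python over curl, the same query request can be sketched with the standard library. `build_query_request` is a hypothetical helper that mirrors the curl call above; it only builds the request and leaves the response unparsed, since the exact response fields are defined in the API reference rather than on this page.

```python
import json
import urllib.request

BASE_URL = "https://api.anyreach.ai/knowledge-base"


def build_query_request(dataset_id: str, token: str, query: str,
                        results_count: int = 5) -> urllib.request.Request:
    """Build (but do not send) the POST request for the query endpoint."""
    body = json.dumps({"query": query, "results_count": results_count})
    return urllib.request.Request(
        f"{BASE_URL}/datasets/{dataset_id}/query",
        data=body.encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder ID and token, for illustration only.
req = build_query_request("my-dataset-id", "my-token", "what is the return window?")
print(req.get_method(), req.full_url)
# To send it: json.load(urllib.request.urlopen(req, timeout=30))
```

Keeping request construction separate from sending makes the call easy to inspect or unit-test without network access.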

## Sync pattern: idempotent CMS → KB

There is no in-place "refresh" of a source; re-ingesting means deleting the source and adding it again. A typical nightly sync workflow:

```python theme={null}
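# Helper functions here are placeholders for your own CMS client and thin
# wrappers around the endpoints above. The "url" and "id" field names are
# assumptions; check the API reference for the actual source schema.

# 1. List canonical URLs from your CMS
canonical = set(fetch_cms_canonical_urls())

# 2. List existing sources in the KB, indexed as {url: source_id}
existing = {s["url"]: s["id"] for s in list_kb_sources(dataset_id)}
existing_urls = set(existing)

# 3. Compute the diff
to_add     = canonical - existing_urls
to_remove  = existing_urls - canonical
to_refresh = canonical & existing_urls   # re-ingest = delete + re-add

# 4. Apply
for url in to_add:
    add_url_source(dataset_id, url)
for url in to_remove:
    delete_source(dataset_id, existing[url])
for url in to_refresh:
    delete_source(dataset_id, existing[url])
    add_url_source(dataset_id, url)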
```

Wire this as a workflow on a daily cron trigger using a code step.

## Limits and rate

* The add-source endpoint accepts an array of sources; use `/sources/pattern` for large URL batches
* Embedding throughput is bounded by your OpenAI/Azure quota
* Query rate is bounded by your plan
