Knowledge Bases
Manage knowledge bases and their sources for RAG.
Base path
`https://api.aiconnectos.com/v1/bases`

A knowledge base is a collection of documents (PDFs, URLs, transcripts, scraped pages) that has been chunked, embedded, and stored for retrieval. Bases plug into voice agents and chat assistants as their RAG corpus.
Source types
| Type | Notes |
|---|---|
| `file` | Pre-uploaded file. The dashboard handles direct uploads; the API accepts a `storageUrl`. |
| `url` | Public URL to a PDF, HTML page, or other text-extractable document. |
| `youtube` | YouTube URL. The transcript is extracted. |
| `scrape` | Entire site crawled from a root URL. Respects `robots.txt` and a max-page cap. |
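As an illustration of how the four source types might map onto a request body, here is a minimal sketch. Only the type names and `storageUrl` come from this page; the `type` and `url` field names and the `build_source_payload` helper are assumptions made for the example, not the documented request schema.

```python
from urllib.parse import urlparse

# Source types listed in the table above.
SOURCE_TYPES = {"file", "url", "youtube", "scrape"}


def build_source_payload(source_type: str, location: str) -> dict:
    """Build a body for adding a source (hypothetical field names)."""
    if source_type not in SOURCE_TYPES:
        raise ValueError(f"unknown source type: {source_type!r}")
    if not location:
        raise ValueError("location must not be empty")

    if source_type == "file":
        # The page states that the API accepts a storageUrl for files.
        return {"type": "file", "storageUrl": location}

    # For url, youtube, and scrape we assume a plain http(s) URL
    # carried in a field named "url" (an assumption of this sketch).
    parsed = urlparse(location)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"expected an http(s) URL, got: {location!r}")
    return {"type": source_type, "url": location}
```

Validating the type and URL locally is only a convenience; the API remains the authority on what it accepts.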
Processing pipeline
- Source ingested → uploaded to storage.
- Trigger.dev `process-source` task picks it up.
- Docling extracts text (PDFs, HTML, Office docs).
- Text is chunked using the base's `ragSettings.chunkingStrategy`.
- Chunks are embedded with OpenAI `text-embedding-3-small` (1536 dimensions).
- Chunks are written to `source_chunks` with pgvector embeddings.
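To make the chunking step concrete, the sketch below shows one plausible reading of the `"fixed"` strategy: a sliding window of `chunkSize` tokens that advances by `chunkSize - chunkOverlap`. The `chunk_fixed` helper is hypothetical, and the service's actual tokenizer and boundary handling are not documented here; tokens are represented as a plain list of strings.

```python
def chunk_fixed(
    tokens: list[str], chunk_size: int = 512, chunk_overlap: int = 64
) -> list[list[str]]:
    """Split tokens into fixed-size windows that overlap by chunk_overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        # Stop once a window has reached the end of the input, so the
        # tail is not emitted again as a shorter duplicate chunk.
        if start + chunk_size >= len(tokens):
            break
    return chunks
```

With the default values, each chunk would share its last 64 tokens with the start of the next one, which helps keep sentences that straddle a boundary retrievable from either side.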
Processing can take anywhere from a few seconds (short PDF) to a few minutes (long YouTube video). For production workflows, use webhooks instead of polling.
RAG settings
Each base has a `ragSettings` JSON object that controls chunking and retrieval:
| Field | Default | Notes |
|---|---|---|
| `chunkingStrategy` | `"semantic"` | One of `"semantic"`, `"sentence"`, `"fixed"`. |
| `chunkSize` | `512` | Tokens per chunk (for `fixed`). |
| `chunkOverlap` | `64` | Token overlap between adjacent chunks. |
| `topK` | `10` | Chunks returned per query. |
| `similarityThreshold` | `0.7` | Cosine similarity floor. |
| `hybridSearch` | `true` | Combine vector similarity with full-text BM25. |
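The interaction between `topK` and `similarityThreshold` can be sketched as follows. This is an in-memory illustration only: it assumes the threshold is applied before the top-K cut, ignores the BM25 half of `hybridSearch`, and uses a hypothetical `select_chunks` helper rather than the pgvector query the service presumably runs.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_chunks(
    query_vec: list[float],
    chunks: list[tuple[str, list[float]]],
    top_k: int = 10,
    similarity_threshold: float = 0.7,
) -> list[tuple[str, float]]:
    """Return up to top_k (chunk_id, score) pairs at or above the threshold."""
    scored = [(cid, cosine_similarity(query_vec, emb)) for cid, emb in chunks]
    # Drop everything below the similarity floor (assumed to happen first).
    kept = [(cid, score) for cid, score in scored if score >= similarity_threshold]
    # Highest similarity first, then cap at top_k.
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:top_k]
```

Under this reading, a query can return fewer than `topK` chunks, or none at all, when few chunks clear the floor.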
To override these settings for a single source, pass `ragSettingsOverride` when adding the source.