Knowledge Bases

Base path

https://api.aiconnectos.com/v1/bases

A knowledge base is a collection of documents (PDFs, URLs, transcripts, scraped pages) that has been chunked, embedded, and stored for retrieval. Bases plug into voice agents and chat assistants as their RAG corpus.

Source types

Type	Notes
`file`	Pre-uploaded file. The dashboard handles direct uploads; the API accepts a `storageUrl`.
`url`	Public URL to a PDF, HTML page, or other text-extractable document.
`youtube`	YouTube URL. The transcript is extracted.
`scrape`	Entire site crawled from a root URL. Respects `robots.txt` and a max-page cap.
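
As a rough illustration of the four source types, the sketch below builds and validates a source descriptor before it is sent to the API. Only the type names and `storageUrl` come from the table above; the `url` key, the dictionary shape, and the `build_source` helper are assumptions made for illustration, not the documented request schema.

```python
from urllib.parse import urlparse

# Source types listed in the table above.
SOURCE_TYPES = {"file", "url", "youtube", "scrape"}


def build_source(source_type: str, location: str) -> dict:
    """Return a hypothetical source descriptor for the given type.

    `storageUrl` is named in the docs for `file` sources; the `url` key
    used for the other types is an assumption.
    """
    if source_type not in SOURCE_TYPES:
        raise ValueError(f"unknown source type: {source_type!r}")

    # Every source type points at some absolute URL.
    parsed = urlparse(location)
    if not parsed.scheme or not parsed.netloc:
        raise ValueError(f"expected an absolute URL, got: {location!r}")

    key = "storageUrl" if source_type == "file" else "url"
    return {"type": source_type, key: location}
```

Validating the type and URL on the client side catches obvious mistakes before a processing job is queued.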

Processing pipeline

1. The source is ingested and uploaded to storage.
2. The Trigger.dev `process-source` task picks it up.
3. Docling extracts the text (PDFs, HTML, Office docs).
4. The text is chunked using the base's `ragSettings.chunkingStrategy`.
5. The chunks are embedded with OpenAI `text-embedding-3-small` (1536 dimensions).
6. The chunks are written to `source_chunks` with pgvector embeddings.
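
The chunking step of the pipeline can be sketched for the `"fixed"` strategy as a sliding window with overlap. This is a minimal sketch of how such a chunker might behave, not the service's implementation: the `chunk_fixed` helper is hypothetical, and a whitespace split stands in for the real tokenizer, which this page does not describe. The default values mirror the `chunkSize` and `chunkOverlap` defaults listed under RAG settings.

```python
def chunk_fixed(
    tokens: list[str], chunk_size: int = 512, chunk_overlap: int = 64
) -> list[list[str]]:
    """Split tokens into windows of chunk_size that share chunk_overlap tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        # Stop once a window has reached the end of the input.
        if start + chunk_size >= len(tokens):
            break
    return chunks


# Whitespace split is a stand-in for a real tokenizer (assumption).
tokens = "a b c d e f g h i j".split()
chunks = chunk_fixed(tokens, chunk_size=4, chunk_overlap=1)
```

With ten tokens, a window of four, and an overlap of one, each chunk begins with the last token of the previous chunk, so text that straddles a boundary appears in both.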

Processing can take anywhere from a few seconds (a short PDF) to a few minutes (a long YouTube video). For production workflows, use webhooks rather than polling for status.

RAG settings

Each base has a `ragSettings` JSON object that controls chunking and retrieval:

Field	Default	Notes
`chunkingStrategy`	`"semantic"`	One of `"semantic"`, `"sentence"`, or `"fixed"`.
`chunkSize`	`512`	Tokens per chunk (for `fixed`).
`chunkOverlap`	`64`	Token overlap between adjacent chunks.
`topK`	`10`	Chunks returned per query.
`similarityThreshold`	`0.7`	Cosine similarity floor.
`hybridSearch`	`true`	Combine vector similarity with full-text BM25.
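
To make `topK` and `similarityThreshold` concrete, here is a minimal in-memory sketch of the vector side of retrieval: score each chunk by cosine similarity, drop anything below the floor, and return the best `topK`. The `retrieve` and `cosine_similarity` helpers are hypothetical, the toy two-dimensional vectors stand in for real 1536-dimensional embeddings, and the BM25 half of `hybridSearch` is not modeled.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(
    query: list[float],
    chunks: list[tuple[str, list[float]]],
    top_k: int = 10,
    similarity_threshold: float = 0.7,
) -> list[tuple[float, str]]:
    """Return up to top_k (score, text) pairs scoring at or above the threshold."""
    scored = [(cosine_similarity(query, emb), text) for text, emb in chunks]
    kept = [pair for pair in scored if pair[0] >= similarity_threshold]
    kept.sort(key=lambda pair: pair[0], reverse=True)  # best match first
    return kept[:top_k]


# Toy embeddings for illustration only.
chunks = [("a", [1.0, 0.0]), ("b", [0.0, 1.0]), ("c", [1.0, 1.0])]
results = retrieve([1.0, 0.0], chunks)
```

Raising `similarityThreshold` trades recall for precision, while `topK` caps how many chunks reach the model regardless of how many pass the floor.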

To override these settings for a single source, pass `ragSettingsOverride` when adding the source.
