Knowledge Bases

Manage knowledge bases and their sources for RAG.

Base path

https://api.aiconnectos.com/v1/bases

A knowledge base is a collection of documents (PDFs, URLs, transcripts, scraped pages) that has been chunked, embedded, and stored for retrieval. Bases plug into voice agents and chat assistants as their RAG corpus.

Source types

| Type | Notes |
| --- | --- |
| file | Pre-uploaded file. The dashboard handles direct uploads; the API accepts a storageUrl. |
| url | Public URL to a PDF, HTML page, or other text-extractable document. |
| youtube | YouTube URL; the transcript is extracted. |
| scrape | Entire site crawled from a root URL. Respects robots.txt and a max-page cap. |
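
As a rough illustration of how the four source types might map onto a request body, the sketch below validates the type and picks the field that carries the location. Only storageUrl and the type names come from this page; the "type" and "url" keys and the build_source_payload helper are hypothetical, and the endpoint for adding a source is not shown here.

```python
from urllib.parse import urlparse

# The four source types documented on this page.
SOURCE_TYPES = {"file", "url", "youtube", "scrape"}


def build_source_payload(source_type: str, location: str) -> dict:
    """Build a request body for adding a source (key names partly assumed)."""
    if source_type not in SOURCE_TYPES:
        raise ValueError(f"unknown source type: {source_type!r}")
    if not location:
        raise ValueError("location must be a non-empty string")
    if source_type != "file":
        # url, youtube, and scrape all start from a web address.
        parsed = urlparse(location)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"expected an http(s) URL, got: {location!r}")
    # "storageUrl" is named on this page; "type" and "url" are assumed keys.
    key = "storageUrl" if source_type == "file" else "url"
    return {"type": source_type, key: location}
```

For example, build_source_payload("scrape", "https://example.com") would produce a body whose "url" field is the crawl root, while a "file" source carries its location in storageUrl.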

Processing pipeline

  1. Source ingested → uploaded to storage.
  2. Trigger.dev process-source task picks it up.
  3. Docling extracts text (PDFs, HTML, Office docs).
  4. Text is chunked using the base's ragSettings.chunkingStrategy.
  5. Chunks are embedded with OpenAI text-embedding-3-small (1536 dimensions).
  6. Chunks are written to source_chunks with pgvector embeddings.
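
To make step 4 more concrete, here is a minimal sketch of what a "fixed" chunking strategy with overlap could look like. It is not the service's implementation: the chunk_fixed helper is hypothetical, and it operates on a pre-split token list because the tokenizer used by the pipeline is not documented on this page.

```python
def chunk_fixed(
    tokens: list[str], chunk_size: int = 512, chunk_overlap: int = 64
) -> list[list[str]]:
    """Split tokens into windows of chunk_size that share chunk_overlap tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far each window advances
    chunks: list[list[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        # Stop once a window reaches the end, so no tail-only chunk is emitted.
        if start + chunk_size >= len(tokens):
            break
    return chunks
```

With ten tokens, chunk_size=4, and chunk_overlap=1, this yields three chunks, each starting with the last token of the previous one.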

A source can remain in the processing status for anywhere from a few seconds (short PDF) to a few minutes (long YouTube video). For production workflows, use webhooks rather than polling.

RAG settings

Each base has a ragSettings JSON object that controls retrieval:

| Field | Default | Notes |
| --- | --- | --- |
| chunkingStrategy | "semantic" | One of "semantic", "sentence", or "fixed". |
| chunkSize | 512 | Tokens per chunk (for fixed). |
| chunkOverlap | 64 | Token overlap between adjacent chunks. |
| topK | 10 | Chunks returned per query. |
| similarityThreshold | 0.7 | Cosine similarity floor. |
| hybridSearch | true | Combine vector similarity with full-text BM25. |
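
The interaction between topK and similarityThreshold can be sketched as follows. This is an in-memory illustration of the vector side only, assuming the threshold is applied before the top-K cut; the retrieve helper is hypothetical, and the BM25 half of hybridSearch and the actual pgvector query are not modeled.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(
    query: list[float],
    chunks: list[tuple[str, list[float]]],
    top_k: int = 10,
    similarity_threshold: float = 0.7,
) -> list[tuple[float, str]]:
    """Return up to top_k (score, text) pairs at or above the similarity floor."""
    scored = [(cosine_similarity(query, emb), text) for text, emb in chunks]
    # Assumed order of operations: drop weak matches first, then keep the best.
    kept = [pair for pair in scored if pair[0] >= similarity_threshold]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return kept[:top_k]
```

Under this reading, a query can return fewer than topK chunks when few chunks clear the similarity floor.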

To override these settings for a single source, pass ragSettingsOverride when adding the source.