What it does

Haiku RAG is a retrieval-augmented generation (RAG) system that indexes documents and answers questions with citations, including page numbers. It combines vector search and full-text search via Reciprocal Rank Fusion, supports multimodal search (text queries over figures, image queries over documents), and bundles agentic skills for question answering, vision-aware QA, and analysis in sandboxed Python. It is built on LanceDB (local vector storage), Pydantic AI (agent logic), and Docling (document structure).
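
To make the fusion step concrete, here is a minimal, generic sketch of Reciprocal Rank Fusion in Python. It is not Haiku RAG's own code: the function name, the document IDs, and the smoothing constant k=60 (a commonly used default) are assumptions chosen for illustration. Each document scores 1 / (k + rank) in every result list it appears in, and the scores are summed across lists.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into a single ranking.

    Illustrative helper, not part of any library. `k` dampens the
    influence of top ranks; 60 is a conventional choice, assumed here.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based: the best hit in each list contributes 1 / (k + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID for a stable order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists from the two retrievers.
vector_hits = ["doc_a", "doc_b", "doc_c"]
fulltext_hits = ["doc_b", "doc_d", "doc_a"]

fused = reciprocal_rank_fusion([vector_hits, fulltext_hits])
print(fused)
```

Documents returned by both retrievers (doc_a and doc_b here) rise above documents returned by only one, which is why the technique suits hybrid retrieval: it needs only ranks, not comparable scores.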

Who it's for

Backend engineers and developers building document-aware AI applications, particularly those who need retrieval that works with any supported embedding provider (Ollama, OpenAI, VoyageAI) without standing up external servers. It also suits teams building conversational interfaces over knowledge bases (internal docs, research papers, PDFs) or running complex document-analysis tasks.

Common use cases

Query a local PDF or web-content collection and retrieve answers with citations
Build a multi-turn conversational assistant over internal documentation or research archives
Analyze documents programmatically via sandboxed Python (cross-document aggregation, metric extraction)
Search documents by image and retrieve matching figures with multimodal embeddings
Continuously ingest and index documents from filesystem, S3, HTTP, and WebDAV sources

Setup pitfalls

Requires a pre-configured embedding provider (Ollama, OpenAI, VoyageAI) before indexing
Python 3.12+ required
Reads from and writes to the local filesystem; ensure proper permissions and consider an S3/GCS backend for production deployments
Multimodal and vision features require additional vLLM setup and vision-capable models
Use haiku.rag-slim for minimal installs; install only the extras you need
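
Two of the pitfalls above (the Python version and filesystem permissions) can be checked before installing anything. The following is a rough preflight sketch using only the standard library; the helper names and the choice of data directory are hypothetical and not part of Haiku RAG.

```python
import sys
import tempfile

MIN_PYTHON = (3, 12)  # minimum version stated in the pitfalls above


def python_ok(version=sys.version_info[:2]):
    """Return True if the (major, minor) version meets the minimum."""
    return tuple(version) >= MIN_PYTHON


def dir_writable(path):
    """Return True if a file can be created (and removed) inside `path`."""
    try:
        # TemporaryFile is deleted automatically when the block exits.
        with tempfile.TemporaryFile(dir=path):
            return True
    except OSError:
        return False


# Stand-in location; substitute the directory the index will live in.
data_dir = tempfile.gettempdir()

print(f"Python >= 3.12: {python_ok()}")
print(f"{data_dir} writable: {dir_writable(data_dir)}")
```

Running this on the target machine shows whether the interpreter and the storage directory are suitable before any documents are indexed.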