What it does
Haiku RAG is a retrieval-augmented generation system that indexes documents and answers questions with citations and page numbers. It combines vector search and full-text retrieval via Reciprocal Rank Fusion, supports multimodal search (text queries over figures, image queries over documents), and provides agent skills for question answering, vision-aware QA, and analysis via sandboxed Python. Built on LanceDB (local vector storage), Pydantic AI (agent logic), and Docling (document structure).
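As an illustration of how Reciprocal Rank Fusion merges two result lists, here is a minimal sketch of the general technique. It is not Haiku RAG's own code: the function name `reciprocal_rank_fusion`, the document IDs, and the constant `k=60` (a commonly used default) are assumptions made for this example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. The value k=60 is an assumed default.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by ID so the output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical results from the two retrievers (best match first).
vector_hits = ["doc_a", "doc_b", "doc_c"]
fulltext_hits = ["doc_b", "doc_d", "doc_a"]

fused = reciprocal_rank_fusion([vector_hits, fulltext_hits])
print(fused)  # ['doc_b', 'doc_a', 'doc_d', 'doc_c']
```

Documents that rank well in both lists (here `doc_b` and `doc_a`) rise to the top, while documents found by only one retriever still appear further down. Because only ranks are used, the raw scores of the two retrievers never need to be put on a common scale.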
Who it's for
Backend engineers and developers building document-aware AI applications, particularly those who need retrieval that is not tied to one embedding provider (Ollama, OpenAI, VoyageAI) and runs without external servers. Also teams building conversational interfaces over knowledge bases (internal docs, research papers, PDFs) or running complex document analysis tasks.
Common use cases
- Query a local PDF or web-content collection and retrieve answers with citations
- Build a multi-turn conversational assistant over internal documentation or research archives
- Analyze documents programmatically via sandboxed Python (cross-document aggregation, metric extraction)
- Search documents by image and retrieve matching figures with multimodal embeddings
- Continuously ingest and index documents from filesystem, S3, HTTP, and WebDAV sources
Setup pitfalls
- Requires a pre-configured embedding provider (Ollama, OpenAI, VoyageAI) before indexing
- Python 3.12+ required
- Reads from and writes to the local filesystem; ensure the process has the required permissions, and consider an S3/GCS backend for production deployments
- Multimodal and vision features require additional vLLM setup and vision-capable models
- Use haiku.rag-slim for minimal installs; install only the extras you need