What it does
Semble is a code search library built for AI agents. It takes natural language queries and returns exact code snippets from a codebase, using approximately 98% fewer tokens than traditional grep-and-read approaches. The tool indexes and searches full codebases end-to-end in under a second, achieving roughly 200x faster indexing and 10x faster queries than code-specialized transformer models while maintaining 99% of their retrieval quality. It runs entirely on CPU with no external API keys, GPU, or remote services required.
Who it's for
Semble is built for coding agents—Claude Code, Cursor, Codex, OpenCode, and similar tools—and the developers who rely on them to navigate large codebases. It's useful anywhere an agent needs instant, precise code references without iterative file reading and grepping.
Common use cases
- Query a codebase in natural language to locate specific implementation patterns (e.g., "authentication flow," "error handling").
- Search remote repositories cloned on demand via git URL.
- Filter results by content type—code (default), docs, config, or all files.
- Find code snippets related to a known file and line number using semantic similarity.
- Integrate code search directly into agent workflows as an MCP server, CLI tool, or dedicated sub-agent.
Setup pitfalls
- Caches indexes automatically, which requires filesystem write access; provide a persistent cache directory if running sandboxed.
- Makes network calls to fetch remote repositories; ensure outbound git and HTTPS connectivity is available.
- Respects `.gitignore` and `.sembleignore` patterns; verify that key files aren't accidentally excluded from indexing.
- The interactive installer (`semble install`) autodetects agents; manual MCP configuration is required if auto-detection fails.
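
The cache-directory and ignore-file pitfalls above can be checked before indexing with a small preflight script. The sketch below is not part of Semble and does not call its API. The helper names (`cache_dir_is_writable`, `read_patterns`, `matching_patterns`) are hypothetical, and the pattern matching uses Python's `fnmatch` as a rough approximation rather than full gitignore semantics, so treat any hit as a prompt for manual review.

```python
import fnmatch
import os
import tempfile
from pathlib import Path


def cache_dir_is_writable(cache_dir: str) -> bool:
    """Return True if a file can be created and removed inside cache_dir."""
    try:
        os.makedirs(cache_dir, exist_ok=True)
        # The temporary file is deleted automatically when the block exits.
        with tempfile.NamedTemporaryFile(dir=cache_dir):
            pass
        return True
    except OSError:
        return False


def read_patterns(ignore_file: Path) -> list[str]:
    """Read non-empty, non-comment lines from an ignore file, if it exists."""
    if not ignore_file.is_file():
        return []
    lines = ignore_file.read_text(encoding="utf-8").splitlines()
    stripped = (line.strip() for line in lines)
    return [line for line in stripped if line and not line.startswith("#")]


def matching_patterns(rel_path: str, patterns: list[str]) -> list[str]:
    """Approximate check: which patterns match the path or one of its parts.

    Assumption: this ignores gitignore negation ('!') and anchoring rules.
    It only flags likely exclusions; it is not how Semble evaluates patterns.
    """
    parts = Path(rel_path).parts
    hits = []
    for pattern in patterns:
        if pattern.startswith("!"):
            continue  # negation is out of scope for this sketch
        bare = pattern.strip("/")
        if fnmatch.fnmatch(rel_path, bare) or any(
            fnmatch.fnmatch(part, bare) for part in parts
        ):
            hits.append(pattern)
    return hits
```

A possible use: collect patterns from both ignore files in the repository root, then run `matching_patterns` over the handful of files an agent must be able to find. If any of them returns a non-empty list, inspect the listed patterns before indexing.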