Semble

Created by Minish, 3 months ago

Fast, accurate, local code search for agents. Indexes any local path or GitHub repo on demand in ~250 ms and answers queries in ~1.5 ms. Runs on CPU with no API keys or external services.

Fast and Accurate Code Search for Agents

Semble is a code search library built for agents. It returns the exact code snippets they need instantly, cutting both token usage and waiting time on every step. Indexing and searching a full codebase end-to-end takes under a second, with ~200x faster indexing and ~10x faster queries than a code-specialized transformer, at 99% of its retrieval quality. Everything runs on CPU with no API keys, GPU, or external services. Run it as an MCP server and any agent (Claude Code, Cursor, Codex, OpenCode, etc.) gets instant access to any repo, cloned and indexed on demand.

MCP Server

Semble can run as an MCP server so agents can search any codebase directly. Repos are cloned and indexed on demand, and indexes are cached for the lifetime of the session.
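
The on-demand indexing and per-session caching described above can be pictured with a small sketch. This is not Semble's server code: `SessionIndexCache` and the `build_index` callable are hypothetical names, and the builder stands in for whatever clones and indexes a repo.

```python
from typing import Callable, Dict, Generic, TypeVar

T = TypeVar("T")


class SessionIndexCache(Generic[T]):
    """Hypothetical per-session cache: build an index once per repo, then reuse it."""

    def __init__(self, build_index: Callable[[str], T]) -> None:
        # Assumed builder: clones the repo if it is remote, then indexes it.
        self._build_index = build_index
        self._indexes: Dict[str, T] = {}

    def get(self, repo: str) -> T:
        # Index on first use; later calls for the same repo reuse the cached index.
        if repo not in self._indexes:
            self._indexes[repo] = self._build_index(repo)
        return self._indexes[repo]


# Toy builder that records how often it runs.
build_calls = []


def fake_build(repo: str) -> dict:
    build_calls.append(repo)
    return {"repo": repo}


cache = SessionIndexCache(fake_build)
first = cache.get("./my-project")
second = cache.get("./my-project")  # served from the cache, no second build
```

Because the cache lives in memory, it disappears when the session ends, which matches the "lifetime of the session" behaviour described above.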

Setup

Requires uv to be installed.

Claude Code

claude mcp add semble -s user -- uvx --from "semble[mcp]" semble

Codex

Add to ~/.codex/config.toml:

[mcp_servers.semble]
command = "uvx"
args = ["--from", "semble[mcp]", "semble"]

OpenCode

Add to ~/.opencode/config.json:

{
  "mcp": {
    "semble": {
      "type": "local",
      "command": ["uvx", "--from", "semble[mcp]", "semble"]
    }
  }
}

Cursor

Add to ~/.cursor/mcp.json (or .cursor/mcp.json in your project):

{
  "mcpServers": {
    "semble": {
      "command": "uvx",
      "args": ["--from", "semble[mcp]", "semble"]
    }
  }
}

Tools

Tool	Description
`search`	Search a codebase with a natural-language or code query. Pass `repo` as a git URL or local path.
`find_related`	Given a file path and line number, return chunks semantically similar to the code at that location.
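
Agents normally issue these calls for you, but it can help to see roughly what goes over the wire. The sketch below builds an MCP `tools/call` request (JSON-RPC 2.0) for the `search` tool. Only `repo` is named in the table above; the `query` argument name is an assumption, and the `tool_call` helper is purely illustrative.

```python
import json


def tool_call(request_id: int, name: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    })


search_request = tool_call(1, "search", {
    "repo": "https://github.com/MinishLab/model2vec",  # git URL or local path
    "query": "save model to disk",  # assumed argument name, not confirmed by the docs
})
print(search_request)
```

The exact argument schema is reported by the server itself when a client lists its tools, so treat the names here as placeholders.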

Main Features

Fast: indexes a repo in ~250 ms and answers queries in ~1.5 ms, all on CPU.
Accurate: NDCG@10 of 0.854 on our benchmarks, on par with code-specialized transformer models, at a fraction of the size and cost.
Local and remote: pass a local path or a git URL.
MCP server: drop-in tool for Claude Code, Cursor, Codex, OpenCode, and any other MCP-compatible agent.
Zero setup: runs on CPU with no API keys, GPU, or external services required.

Quickstart

pip install semble  # Install with pip
uv add semble       # Install with uv

from semble import SembleIndex

# Index a local directory
index = SembleIndex.from_path("./my-project")

# Or index a remote git repository
index = SembleIndex.from_git("https://github.com/MinishLab/model2vec")

# Search the index with a natural-language or code query
results = index.search("save model to disk", top_k=3)

# Find code similar to a specific result
related = index.find_related(results[0], top_k=3)

# Each result exposes the matched chunk (example values shown)
result = results[0]
result.chunk.file_path   # "model2vec/model.py"
result.chunk.start_line  # 127
result.chunk.end_line    # 150
result.chunk.content     # "def save_pretrained(self, path: PathLike, ..."
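
When handing results to an agent, a compact `path:start-end` header followed by the snippet keeps token usage low. A minimal sketch, relying only on the four `chunk` attributes shown above; `format_result` is a hypothetical helper, and the stand-in object below just mimics a real result so the example runs without an index.

```python
from types import SimpleNamespace


def format_result(result) -> str:
    """Render one search result as `path:start-end` followed by the snippet."""
    chunk = result.chunk
    header = f"{chunk.file_path}:{chunk.start_line}-{chunk.end_line}"
    return f"{header}\n{chunk.content}"


# Stand-in with the same attributes as the result shown in the quickstart.
demo = SimpleNamespace(chunk=SimpleNamespace(
    file_path="model2vec/model.py",
    start_line=127,
    end_line=150,
    content="def save_pretrained(self, path: PathLike, ...",
))
formatted = format_result(demo)
print(formatted)
```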

How it works

Semble splits each file into code-aware chunks using Chonkie, then scores every query against the chunks with two complementary retrievers: static Model2Vec embeddings using the code-specialized potion-code-16M model for semantic similarity, and BM25 for lexical matches on identifiers and API names. The two score lists are fused with Reciprocal Rank Fusion (RRF).
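
To make the fusion step concrete, here is a minimal sketch of textbook Reciprocal Rank Fusion: each chunk earns 1 / (k + rank) from every list it appears in, and the sums decide the final order. The constant k = 60 is the commonly used default and an assumption here, not a value taken from Semble; the chunk ids are invented.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse ranked lists of chunk ids; higher summed 1 / (k + rank) ranks first."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Sort by descending score, breaking ties by id so the output is deterministic.
    return sorted(scores, key=lambda chunk_id: (-scores[chunk_id], chunk_id))


semantic = ["a", "b", "c"]  # hypothetical ranking from the embedding retriever
lexical = ["b", "d", "a"]   # hypothetical ranking from BM25
fused = reciprocal_rank_fusion([semantic, lexical])
print(fused)  # ['b', 'a', 'd', 'c']
```

Since only ranks matter, the two retrievers do not need comparable score scales, which is what makes RRF a convenient way to combine BM25 with cosine similarity.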

Results are then reranked with code-aware signals: adaptive lexical/semantic weighting for symbol-like queries, definition boosts, identifier stem matching, file coherence, and noise penalties for tests and legacy shims. Because the embedding model is static with no transformer forward pass at query time, all of this runs in milliseconds on CPU.
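
The point about static embeddings can be illustrated with a toy example: embedding a text is a table lookup plus an average, with no neural network evaluated at query time. This sketch assumes simple mean pooling over token vectors; the whitespace tokenizer and the 3-dimensional vectors are invented and are not the real potion-code-16M model.

```python
import math
from typing import Dict, List

# Toy token table; a real static model stores one learned vector per token.
TOKEN_VECTORS: Dict[str, List[float]] = {
    "save": [1.0, 0.0, 0.0],
    "write": [0.9, 0.1, 0.0],
    "model": [0.0, 1.0, 0.0],
    "parse": [0.0, 0.0, 1.0],
}


def embed(text: str) -> List[float]:
    """Mean-pool the vectors of known tokens; unknown tokens are skipped."""
    vectors = [TOKEN_VECTORS[t] for t in text.lower().split() if t in TOKEN_VECTORS]
    if not vectors:
        return [0.0, 0.0, 0.0]
    return [sum(dim) / len(vectors) for dim in zip(*vectors)]


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0


query = embed("save model")
close = cosine(query, embed("write model"))  # near-synonym, high similarity
far = cosine(query, embed("parse"))          # unrelated, zero similarity
```

The cost per query is a handful of lookups and additions, which is why this style of retriever stays in the millisecond range on CPU.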

Benchmarks

Speed vs quality

We benchmark quality and speed across all methods on ~1,250 queries over 63 repositories in 19 languages.

Method	NDCG@10	Index time	Query p50
CodeRankEmbed Hybrid	0.862	57 s	16 ms
semble	0.854	263 ms	1.5 ms
CodeRankEmbed	0.765	57 s	16 ms
ColGREP	0.693	5.8 s	124 ms
BM25	0.673	263 ms	0.02 ms
ripgrep	0.126	—	12 ms

Semble achieves 99% of the NDCG@10 of the 137M-parameter CodeRankEmbed Hybrid, while indexing 218x faster and answering queries 11x faster.
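
For readers unfamiliar with the metric, the sketch below computes NDCG@10 for a single query and checks the quality ratio from the table. It assumes the common linear-gain formulation and takes the ideal ordering from the retrieved list itself; the benchmark's exact variant is not stated here, so treat this as illustrative.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int = 10) -> float:
    """Discounted cumulative gain: relevance at position p is divided by log2(p + 1)."""
    return sum(rel / math.log2(pos + 1) for pos, rel in enumerate(relevances[:k], start=1))


def ndcg_at_k(relevances: Sequence[float], k: int = 10) -> float:
    """DCG normalized by the DCG of the best possible ordering of the same items."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


perfect = ndcg_at_k([1, 1, 0, 0])  # both relevant chunks ranked first
swapped = ndcg_at_k([0, 1, 1, 0])  # relevant chunks pushed down one position

# Quality ratio between semble and CodeRankEmbed Hybrid, from the table above.
quality_ratio = 0.854 / 0.862
print(round(perfect, 3), round(swapped, 3), round(quality_ratio, 3))
```

A benchmark score such as 0.854 is the average of this per-query value over all queries.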

License

MIT

Citing

If you use Semble in your research, please cite the following:

@software{minishlab2026semble,
  author       = {{van Dongen}, Thomas and Tulkens, Stephan},
  title        = {Semble: Fast and Accurate Code Search for Agents},
  year         = {2026},
  publisher    = {Zenodo},
  doi          = {10.5281/zenodo.19785932},
  url          = {https://github.com/MinishLab/semble},
  license      = {MIT}
}
