Show HN: Semble – Code search for agents that uses 98% fewer tokens than grep
434 points
• 1 day ago
Semble is a code search library designed specifically for AI agents, providing fast and accurate retrieval of relevant code snippets from repositories. It addresses the inefficiency of traditional methods like grep and full-file reading, which consume excessive tokens and add latency. By returning only the precise chunks of code needed, Semble uses approximately 98% fewer tokens than grep-based approaches while maintaining high retrieval quality.
The tool operates entirely on CPU without requiring API keys, GPUs, or external services, making it lightweight and easy to deploy. It indexes an average repository in about 250 milliseconds and answers queries in roughly 1.5 milliseconds. In comparative testing across 1,250 queries over 63 repositories in 19 languages, Semble achieved an NDCG@10 score of 0.854, matching the retrieval quality of specialized transformer models while being significantly faster at both indexing and query time.
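For readers unfamiliar with the metric, here is a minimal sketch of how NDCG@10 is commonly computed for a single query. The function names and the example relevance labels are illustrative assumptions; the summary does not say how the benchmark labels relevance or which DCG variant it uses. A benchmark score such as 0.854 would then be the mean of this value over all queries.

```python
import math

def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the first k results (ranks are 1-based)."""
    return sum(
        rel / math.log2(rank + 1)
        for rank, rel in enumerate(relevances[:k], start=1)
    )

def ndcg_at_k(ranked_relevances, k=10):
    """DCG of the returned order divided by the DCG of the ideal order.

    Assumption: the ideal order is built from the same relevance labels,
    sorted from most to least relevant.
    """
    ideal = dcg_at_k(sorted(ranked_relevances, reverse=True), k)
    if ideal == 0:
        return 0.0  # no relevant result at all for this query
    return dcg_at_k(ranked_relevances, k) / ideal

# Hypothetical query: results 1 and 3 are relevant, result 2 is not.
score = ndcg_at_k([1, 0, 1])
```

Because the discount grows with rank, a relevant snippet at position 3 contributes less than one at position 2, which is why the example scores below 1.0 even though both relevant results were returned.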
Semble works by splitting files into code-aware chunks using Chonkie, then scoring queries against these chunks through two complementary methods: semantic similarity via Model2Vec embeddings using the potion-code-16M model, and lexical matching via BM25 for identifiers and API names. These scores are combined using Reciprocal Rank Fusion, followed by reranking with code-aware signals such as adaptive weighting based on query type, definition boosts for symbol queries, identifier stem matching, file coherence bonuses, and noise penalties for test files and legacy code.
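The fusion and reranking steps described above can be sketched roughly as follows. This is not Semble's implementation: the constant `k=60` is the value conventionally used with Reciprocal Rank Fusion, and the `definition_boost` and `test_penalty` multipliers, the function names, and the chunk ids are all hypothetical values chosen for illustration.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of chunk ids.

    Each list contributes 1 / (k + rank) for every chunk it contains,
    so chunks ranked highly by both retrievers end up on top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    return dict(scores)

def rerank(fused_scores, definition_ids=frozenset(), test_ids=frozenset(),
           definition_boost=1.5, test_penalty=0.5):
    """Apply illustrative code-aware adjustments, then sort best-first.

    Assumption: boosts and penalties are simple multipliers; the real
    signals and their weights are not given in the summary.
    """
    adjusted = {}
    for chunk_id, score in fused_scores.items():
        if chunk_id in definition_ids:
            score *= definition_boost   # symbol query hit its definition
        if chunk_id in test_ids:
            score *= test_penalty       # test files treated as noise
        adjusted[chunk_id] = score
    return sorted(adjusted, key=adjusted.get, reverse=True)

# Hypothetical ranked outputs of the embedding and BM25 retrievers.
semantic = ["auth.py:login", "tests/test_auth.py:test_login", "session.py:create"]
lexical = ["auth.py:login", "session.py:create", "utils.py:hash"]

fused = reciprocal_rank_fusion([semantic, lexical])
ranked = rerank(fused, definition_ids={"auth.py:login"},
                test_ids={"tests/test_auth.py:test_login"})
```

Rank fusion only looks at positions, not raw scores, which sidesteps the problem that cosine similarities and BM25 scores live on different scales.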
The library offers multiple integration paths for different use cases. It can run as an MCP server compatible with Claude Code, Cursor, Codex, OpenCode, and other MCP-compatible agents, allowing direct codebase searching from within development environments. Alternatively, it provides bash integration through AGENTS.md or CLAUDE.md configuration files, enabling sub-agents that cannot access MCP tools to still leverage its capabilities. A standalone CLI is also available for scripting and direct command-line usage.
Key features include support for both local paths and remote git repositories, automatic file watching and re-indexing for local directories, and a savings tracker that estimates token savings across searches. A Python API allows programmatic integration with custom tooling. Semble is open source under the MIT license and has gathered over 1,100 GitHub stars since its release.
144 comments
• Models heavily trained with grep often distrust alternative tools and repeatedly retry or reread results, negating token savings from more efficient tools.
• A global CLAUDE.md file instructing Claude to use LSP instead of grep resolved trust issues with tool results.
• RTK works well with Codex CLI on GPT 5.5 xhigh, but error messages for unsupported flags cause token waste or prevent execution.
• Forcing global memory for RTK and a custom AI memory system ensured consistent tool usage unless the tool itself failed.
• The team is interested in optimizing prompts and tool descriptions to improve model adoption of semantic search over grep.
• Semantic code search is valuable for humans, not just agents, and can index non-code documents like markdown and JSON.
• Hooks are an effective mechanism to enforce tool usage rules within a harness.
• The benchmark measures retrieval accuracy using NDCG, not end-to-end coding agent performance.
• Fewer semantic search queries are needed compared to grep+readfile calls to achieve the same outcome.
• Grep requires additional context flags or follow-up reads, while semantic search provides more relevant snippets directly.
• NDCG scores for grep are notably poor compared to semantic alternatives.
The discussion centers on improving AI coding agent efficiency by replacing grep with semantic search tools like Semble or LSP, addressing model distrust of non-grep results due to training biases. Participants share strategies like global configuration files and hooks to enforce tool adoption, noting that semantic search reduces token usage by providing more relevant context in fewer queries. Benchmarks focus on retrieval accuracy, with grep performing poorly compared to semantic alternatives, which also benefit human developers. The conversation highlights ongoing efforts to optimize prompts and tool integration for better agent performance.