Documentation Crawler & MCP Server

Created By
alizdavoodi, a year ago
This project provides a toolset that crawls websites, wikis, and tool or library documentation, converts the pages to Markdown, and makes that Markdown searchable through a Model Context Protocol (MCP) server designed for integration with tools like Cursor.
Overview

What is MCPDocSearch?

MCPDocSearch is a toolset designed to crawl websites, generate Markdown documentation, and make that documentation searchable via a Model Context Protocol (MCP) server, facilitating integration with tools like Cursor.

How to use MCPDocSearch?

To use MCPDocSearch, you first run the crawler_cli to crawl a website and generate a Markdown file. Then, you run the mcp_server to load and serve the documentation, allowing clients like Cursor to query the content.

Key features of MCPDocSearch

  • Web Crawler (crawler_cli): Configurable crawling of websites with options for depth, URL patterns, and HTML cleaning.
  • MCP Server (mcp_server): Loads Markdown files, parses them into semantic chunks, and exposes tools for searching and retrieving documentation.
  • Cursor Integration: Designed for seamless operation with Cursor, allowing for easy querying of documentation.
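
The "load, chunk, search" flow described for mcp_server can be illustrated with a minimal sketch. It is not the project's implementation: the `Chunk` class, the one-chunk-per-heading rule, and the keyword-count ranking are all assumptions made for illustration, and plain keyword matching stands in for whatever ranking the real server uses.

```python
import re
from dataclasses import dataclass


@dataclass
class Chunk:
    heading: str  # heading text that opens the chunk ("" for any preamble)
    body: str     # text under that heading, up to the next heading


# Matches ATX-style Markdown headings such as "# Title" or "### Section".
HEADING_RE = re.compile(r"^(#{1,6})\s+(.*)$")


def split_markdown(text: str) -> list[Chunk]:
    """Split Markdown into one chunk per heading (hypothetical chunking rule)."""
    chunks: list[Chunk] = []
    heading, lines = "", []

    def flush() -> None:
        # Store the chunk collected so far, skipping empty preambles.
        body = "\n".join(lines).strip()
        if heading or body:
            chunks.append(Chunk(heading, body))

    for line in text.splitlines():
        match = HEADING_RE.match(line)
        if match:
            flush()
            heading, lines = match.group(2).strip(), []
        else:
            lines.append(line)
    flush()
    return chunks


def search(chunks: list[Chunk], query: str) -> list[Chunk]:
    """Rank chunks by how many query terms they contain (case-insensitive)."""
    terms = query.lower().split()
    scored = []
    for chunk in chunks:
        haystack = f"{chunk.heading}\n{chunk.body}".lower()
        score = sum(1 for term in terms if term in haystack)
        if score:
            scored.append((score, chunk))
    # Stable sort keeps document order among chunks with equal scores.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored]


doc = "# Install\nRun pip install.\n\n## Usage\nCall the search tool.\n"
chunks = split_markdown(doc)
results = search(chunks, "search tool")
```

One known limitation of this sketch: a line starting with `#` inside a fenced code block would be treated as a heading, so a real parser would need to track fences.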

Use cases of MCPDocSearch

  1. Crawling and documenting API references from various websites.
  2. Creating searchable documentation for internal company resources.
  3. Querying crawled documentation from within tools like Cursor.

FAQ about MCPDocSearch

  • Can MCPDocSearch crawl any website?

Yes, as long as the website permits crawling under its robots.txt rules.

  • Is there a limit to the crawl depth?

Yes, the maximum crawl depth is configurable, typically between 1 and 5.
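
To make the two answers above concrete, here is a small sketch of a breadth-first crawl that checks robots.txt rules and stops at a maximum depth. It runs against a made-up in-memory link graph rather than the network, and it is not the crawler_cli implementation. The `SITE` map, the `example-bot` user agent, and the convention that the start page counts as depth 0 are assumptions; only `urllib.robotparser` is a real standard-library module.

```python
from collections import deque
from urllib.robotparser import RobotFileParser

# Hypothetical in-memory site: each URL maps to the links found on that page.
SITE = {
    "https://example.com/": [
        "https://example.com/docs",
        "https://example.com/private/a",
    ],
    "https://example.com/docs": ["https://example.com/docs/api"],
    "https://example.com/docs/api": ["https://example.com/docs/api/v2"],
    "https://example.com/docs/api/v2": [],
    "https://example.com/private/a": [],
}

# Parse robots.txt rules from a list of lines instead of fetching them.
robots = RobotFileParser()
robots.parse(["User-agent: *", "Disallow: /private/"])


def crawl(start: str, max_depth: int, user_agent: str = "example-bot") -> list[str]:
    """Breadth-first crawl that does not follow links beyond max_depth."""
    visited: list[str] = []
    seen = {start}
    queue = deque([(start, 0)])  # the start page counts as depth 0
    while queue:
        url, depth = queue.popleft()
        if not robots.can_fetch(user_agent, url):
            continue  # skip pages the site disallows
        visited.append(url)
        if depth == max_depth:
            continue  # depth limit reached: record the page, ignore its links
        for link in SITE.get(url, []):
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return visited


pages = crawl("https://example.com/", max_depth=2)
```

With a depth limit of 2, the page three links away from the start is never visited, and the disallowed `/private/` page is skipped regardless of depth.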

  • How do I integrate MCPDocSearch with Cursor?

You need to configure a .cursor/mcp.json file in the project root with the appropriate settings for the MCP server.
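
As a rough sketch of what that file could contain, the helper below builds a config in the `mcpServers` layout that Cursor reads and writes it to `.cursor/mcp.json`. The server name `doc-search` and the launch command `python -m mcp_server.main` are placeholders, not taken from the project; use the command given in the project's own README.

```python
import json
from pathlib import Path


def build_cursor_config(command: str, args: list[str], name: str = "doc-search") -> dict:
    """Return a config dict in the `mcpServers` layout (name is a placeholder)."""
    return {"mcpServers": {name: {"command": command, "args": args}}}


def write_cursor_config(project_root: Path, config: dict) -> Path:
    """Write the config to <project_root>/.cursor/mcp.json and return the path."""
    path = project_root / ".cursor" / "mcp.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return path


# Placeholder launch command; replace it with the one from the project's README.
config = build_cursor_config("python", ["-m", "mcp_server.main"])
```

Calling `write_cursor_config(Path("."), config)` from the project root would create the file; nothing is written until that call is made.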

Project Info
Created At: a year ago
Updated At: a year ago
Author Name: alizdavoodi
Stars: 8
Language: Python
License: MIT license
