crawl4-mcp

Created By
ShiDuLin
a year ago
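This project is a crawler MCP server built on crawl4ai that provides advanced web crawling. With this MCP server, you can scrape any content, save it as local markdown files, and then use that knowledge anywhere for RAG.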
Overview

What is crawl4-mcp?

crawl4-mcp is an MCP server for advanced web scraping, built on crawl4ai. It lets users scrape content from the web and save it as local markdown files for use in Retrieval-Augmented Generation (RAG).

How to use crawl4-mcp?

To use crawl4-mcp, clone the repository, set up a virtual environment, install dependencies, and run the server. Once the server is running, you can connect to it using the provided configuration.
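As a rough illustration of the last step, the sketch below builds the kind of client configuration entry many MCP clients accept for an SSE server. The host, port `8051`, the `/sse` path, and the `transport`/`url` key names are assumptions chosen for illustration, not values confirmed by the project; check the repository's own configuration snippet before using them.

```python
import json


def build_sse_client_config(
    host: str = "localhost",   # assumed host, not confirmed by the project
    port: int = 8051,          # assumed port, not confirmed by the project
    name: str = "crawl4-mcp",
) -> dict:
    """Return a client config entry pointing at an SSE endpoint.

    The "mcpServers" layout follows a convention used by several MCP
    clients; the exact keys may differ for your client.
    """
    return {
        "mcpServers": {
            name: {
                "transport": "sse",
                "url": f"http://{host}:{port}/sse",  # "/sse" path is an assumption
            }
        }
    }


if __name__ == "__main__":
    # Print the entry so it can be pasted into a client's config file.
    print(json.dumps(build_sse_client_config(), indent=2))
```

Generating the entry from a function rather than hand-editing JSON makes it easy to change the host or port if the server is started with different settings.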

Key features of crawl4-mcp?

  • Advanced web scraping capabilities
  • Ability to save scraped content as markdown files
  • Integration with MCP clients via SSE (Server-Sent Events)
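To make the "save as markdown" feature concrete, here is a minimal sketch of how scraped markdown could be written to a local file named after its source URL. It is not the project's implementation: the function names `markdown_path_for` and `save_markdown`, the slug scheme, and the source comment at the top of the file are all hypothetical choices.

```python
import re
from pathlib import Path
from urllib.parse import urlparse


def markdown_path_for(url: str, out_dir: Path) -> Path:
    """Derive a filesystem-safe .md path from a URL (hypothetical naming scheme)."""
    parsed = urlparse(url)
    raw = f"{parsed.netloc}{parsed.path}"
    # Collapse every run of non-alphanumeric characters into a single hyphen.
    slug = re.sub(r"[^A-Za-z0-9]+", "-", raw).strip("-").lower() or "index"
    return out_dir / f"{slug}.md"


def save_markdown(url: str, markdown: str, out_dir: Path) -> Path:
    """Write already-scraped markdown to out_dir and return the file path."""
    out_dir.mkdir(parents=True, exist_ok=True)
    target = markdown_path_for(url, out_dir)
    # Record the source URL so a later RAG step can cite where the text came from.
    target.write_text(f"<!-- source: {url} -->\n\n{markdown}", encoding="utf-8")
    return target
```

Keeping the source URL inside each file is one way to preserve provenance when the files are later chunked and indexed for RAG.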

Use cases of crawl4-mcp?

  1. Scraping data for research purposes
  2. Collecting content for knowledge management systems
  3. Automating data collection for machine learning models

FAQ from crawl4-mcp?

  • What are the environment requirements?

Python 3.12 or higher and the uv package manager are required.
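Before installing, the stated requirements can be checked with a short script. This is a sketch only: the helper name `environment_problems` is hypothetical, and it assumes `uv` is installed under that executable name on `PATH`.

```python
import shutil
import sys

MIN_PYTHON = (3, 12)  # minimum version stated in the requirements above


def environment_problems(python_version: tuple, uv_path) -> list:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if tuple(python_version) < MIN_PYTHON:
        found = ".".join(str(part) for part in python_version)
        problems.append(f"Python 3.12 or higher is required, found {found}")
    if uv_path is None:
        problems.append("uv package manager not found on PATH")
    return problems


if __name__ == "__main__":
    # shutil.which returns None when the executable cannot be found.
    issues = environment_problems(sys.version_info[:2], shutil.which("uv"))
    print("\n".join(issues) or "Environment looks OK")
```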

  • How do I install crawl4-mcp?

Clone the repository, create a virtual environment, install dependencies, and run the server.

  • Can I integrate crawl4-mcp with other applications?

Yes. Any MCP client that supports SSE can connect to the server using the provided SSE configuration.

Project Info
Created At
a year ago
Updated At
a year ago
Author Name
ShiDuLin
Star
0
Language
Python
License
-
