mcp-server-webcrawl

Created By
pragmar, a year ago
Bridge the gap between your web crawler and AI language models using the Model Context Protocol (MCP). With mcp-server-webcrawl, your AI client filters and analyzes crawled web content, either under your direction or autonomously. WARC, wget, InterroBot, Katana, and SiteOne crawlers are supported out of the box. The server includes a full-text search interface with boolean support and resource filtering by type, HTTP status, and more.
Overview

What is mcp-server-webcrawl?

mcp-server-webcrawl is an open-source server that bridges the gap between web crawlers and AI language models using the Model Context Protocol (MCP). It allows AI clients to filter and analyze web content, extracting insights either under user direction or autonomously.

How to use mcp-server-webcrawl?

To use mcp-server-webcrawl, install it with pip:

  pip install mcp-server-webcrawl

Then run the server, naming the crawler and the location of its data:

  mcp-server-webcrawl --crawler wget --datasrc /path/to/wget/archives/
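The run command above can also be assembled from a script, which helps when the archive path varies between machines. The following is a minimal sketch, not part of the project: `build_command` is a hypothetical helper, and only the `--crawler` and `--datasrc` flags and the `wget` value shown in this listing are used.

```python
import shlex
import shutil


def build_command(crawler: str, datasrc: str) -> list[str]:
    """Return the argv list for launching mcp-server-webcrawl.

    Hypothetical helper; it only mirrors the flags shown in this listing.
    """
    return ["mcp-server-webcrawl", "--crawler", crawler, "--datasrc", datasrc]


argv = build_command("wget", "/path/to/wget/archives/")

# Print the command as a shell-safe string for copy and paste.
print(shlex.join(argv))

# Warn if the executable is not installed yet; nothing is launched here.
if shutil.which(argv[0]) is None:
    print("mcp-server-webcrawl not found on PATH; run: pip install mcp-server-webcrawl")
```

Keeping the command as a list rather than a single string avoids quoting problems if the data path contains spaces.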

Key features of mcp-server-webcrawl

  • Compatibility with Claude Desktop
  • Full-text search interface with boolean support
  • Resource filtering by type and HTTP status
  • Support for various crawlers including wget, WARC, and more
  • Ability to augment your LLM knowledge base
  • ChatGPT support is coming soon

Use cases of mcp-server-webcrawl

  1. Analyzing web content for research purposes
  2. Extracting insights from large datasets collected by web crawlers
  3. Enhancing AI language models with data from your own web crawls

FAQ for mcp-server-webcrawl

  • Is mcp-server-webcrawl free to use?

Yes! mcp-server-webcrawl is free and open-source.

  • What are the system requirements?

It requires Claude Desktop and Python version 3.10 or higher.

  • Which crawlers are supported?

It supports wget, WARC, InterroBot, Katana, and SiteOne crawlers.

Server Config

{
  "mcpServers": {
    "webcrawl": {
      "command": "mcp-server-webcrawl",
      "args": [
        "--crawler",
        "wget",
        "--datasrc",
        "/path/to/wget/archives/"
      ]
    }
  }
}
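If the MCP client already has other servers configured, the entry above has to be merged into the existing "mcpServers" object rather than pasted over it. A minimal sketch of that merge follows. The `add_webcrawl_server` helper and the `other` / `other-server` entry are hypothetical and used only for illustration; where the config file lives depends on the client and is not covered here.

```python
import json


def add_webcrawl_server(config: dict, crawler: str, datasrc: str) -> dict:
    """Return a copy of an MCP client config with a 'webcrawl' entry added.

    Hypothetical helper; the entry mirrors the Server Config shown above.
    """
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))  # copy so the input is untouched
    servers["webcrawl"] = {
        "command": "mcp-server-webcrawl",
        "args": ["--crawler", crawler, "--datasrc", datasrc],
    }
    merged["mcpServers"] = servers
    return merged


# Illustrative existing config with one unrelated, made-up server entry.
existing = {"mcpServers": {"other": {"command": "other-server", "args": []}}}

updated = add_webcrawl_server(existing, "wget", "/path/to/wget/archives/")
print(json.dumps(updated, indent=2))
```

The function returns a new dictionary instead of editing the input, so the original config can be kept as a backup before anything is written to disk.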
Project Info
Created At: a year ago
Updated At: a year ago
Author Name: pragmar
