Web Scraper MCP

Created By
navin4078, a year ago
Scrape websites and let them talk to your LLM
Overview

What is MCP Web Scraper?

MCP Web Scraper is a lightweight web scraping server. It scrapes websites and makes the extracted data available to an LLM through the Model Context Protocol (MCP).

How to use MCP Web Scraper?

There are two ways to set up MCP Web Scraper. For an automated setup, clone the repository and run the setup script. For a manual setup, create a virtual environment and install the required dependencies into it.
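
The manual route could look roughly like the following sketch. It assumes the dependency list from the FAQ (requests, beautifulsoup4, lxml) is complete, which the listing does not confirm; the helper names and the `venv` directory name are hypothetical and not part of the project.

```python
import os
import subprocess
import venv
from pathlib import Path

# Package names taken from the FAQ; the project may pin versions or need more.
PACKAGES = ["requests", "beautifulsoup4", "lxml"]


def venv_python(venv_dir: Path) -> Path:
    """Return the interpreter path inside a virtual environment."""
    if os.name == "nt":
        return venv_dir / "Scripts" / "python.exe"
    return venv_dir / "bin" / "python"


def pip_install_command(venv_dir: Path, packages: list[str]) -> list[str]:
    """Build the pip command that installs packages into the environment."""
    return [str(venv_python(venv_dir)), "-m", "pip", "install", *packages]


def manual_setup(venv_dir: Path) -> None:
    """Create the environment, then install the dependencies into it."""
    venv.create(venv_dir, with_pip=True)
    subprocess.run(pip_install_command(venv_dir, PACKAGES), check=True)


# Example call (hypothetical directory name; needs network access for pip):
# manual_setup(Path("venv"))
```

The interpreter path returned by `venv_python` is the kind of value that goes into the `command` field of the server config.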

Key features of MCP Web Scraper

  • Text, link, image, and table data extraction with CSS selectors.
  • Comprehensive metadata extraction including Open Graph and Twitter Cards.
  • Integration with Claude Desktop for seamless operation.
  • Configurable result limits and error handling.
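
To illustrate the metadata feature, here is a minimal sketch of collecting Open Graph and Twitter Card tags from a page. It uses only the standard library's `html.parser` so that it runs without extra packages; the project itself lists beautifulsoup4 as a dependency and presumably does this differently. The class and function names are hypothetical.

```python
from html.parser import HTMLParser


class SocialMetaParser(HTMLParser):
    """Collect Open Graph (og:*) and Twitter Card (twitter:*) meta tags."""

    def __init__(self) -> None:
        super().__init__()
        self.metadata: dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = dict(attrs)
        # Open Graph uses the "property" attribute; Twitter Cards use "name".
        key = attr_map.get("property") or attr_map.get("name") or ""
        content = attr_map.get("content")
        if content is not None and key.startswith(("og:", "twitter:")):
            self.metadata[key] = content


def extract_social_metadata(html: str) -> dict[str, str]:
    """Return a mapping of og:/twitter: keys to their content values."""
    parser = SocialMetaParser()
    parser.feed(html)
    return parser.metadata


sample = """
<html><head>
  <meta property="og:title" content="Example headline">
  <meta name="twitter:card" content="summary">
  <meta name="viewport" content="width=device-width">
</head><body></body></html>
"""
print(extract_social_metadata(sample))
# {'og:title': 'Example headline', 'twitter:card': 'summary'}
```

The `viewport` tag is skipped because its name carries neither prefix.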

Use cases of MCP Web Scraper

  1. Extracting text content from various websites.
  2. Gathering headlines and metadata for news articles.
  3. Scraping images and tables for data analysis.

FAQ from MCP Web Scraper

  • Can MCP Web Scraper handle all types of websites?

It can scrape a wide variety of websites, provided the site's robots.txt file permits it.

  • Is there a limit to the number of results I can scrape?

Yes, you can configure the maximum number of results to prevent overload.

  • What dependencies does MCP Web Scraper require?

It requires libraries like requests, beautifulsoup4, and lxml for web scraping.
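
The robots.txt condition mentioned in the first answer can be checked before scraping a URL. The sketch below uses the standard library's `urllib.robotparser`; whether MCP Web Scraper performs this check itself is not stated in the listing, so treat this as an illustration only. The function name `is_allowed` and the sample rules are hypothetical.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Report whether robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


rules = """\
User-agent: *
Disallow: /private/
"""
print(is_allowed(rules, "https://example.com/articles/1"))    # True
print(is_allowed(rules, "https://example.com/private/data"))  # False
```

For a live site, `RobotFileParser.set_url()` followed by `read()` fetches the file instead of parsing a string.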

Server Config

{
  "mcpServers": {
    "web-scraper": {
      "command": "/full/path/to/your/venv/bin/python",
      "args": [
        "/full/path/to/your/app_mcp.py"
      ]
    }
  }
}
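
The placeholders in the config above suggest that both paths should be absolute. A small sketch for generating the same structure with that check, assuming absolute paths are indeed required (the listing does not say so explicitly); `build_server_config` is a hypothetical helper, not part of the project.

```python
import json
from pathlib import PurePosixPath, PureWindowsPath


def _is_absolute(path: str) -> bool:
    """Accept absolute paths in either POSIX or Windows form."""
    return PurePosixPath(path).is_absolute() or PureWindowsPath(path).is_absolute()


def build_server_config(python_path: str, script_path: str) -> dict:
    """Build the mcpServers entry shown in the Server Config section."""
    for label, value in (("python_path", python_path), ("script_path", script_path)):
        if not _is_absolute(value):
            raise ValueError(f"{label} must be an absolute path: {value}")
    return {
        "mcpServers": {
            "web-scraper": {
                "command": python_path,
                "args": [script_path],
            }
        }
    }


config = build_server_config(
    "/full/path/to/your/venv/bin/python",  # placeholder from the listing
    "/full/path/to/your/app_mcp.py",       # placeholder from the listing
)
print(json.dumps(config, indent=2))
```

Replace both placeholder paths with the real locations on your machine before using the output.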

Project Info

Created At: a year ago
Updated At: a year ago
Author Name: navin4078
