Web Scraping with Anthropic’s MCP

Created By
luminati-io, a year ago
Example MCP server and instructions for connecting Anthropic LLMs to external web scraping tools, with real-world examples and Bright Data integration.
Overview

What is Web Scraping with Anthropic’s MCP?

Web Scraping with Anthropic’s MCP is a project that provides an example server and instructions for connecting Anthropic's large language models (LLMs) to external web scraping tools, enabling real-time data extraction from websites.

How to use the project?

To use the project, set up an MCP server by following the provided instructions, and connect it with tools like Claude Desktop or Cursor. You can then send prompts to extract data from web pages.
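As a rough illustration of the connection step, the sketch below builds the kind of `mcpServers` entry that Claude Desktop reads from its `claude_desktop_config.json` file. The server name, script path, and environment variable name are placeholders invented for this example, not values taken from the project; check the project's own instructions for the real ones.

```python
import json


def build_mcp_config(server_name, command, args, env=None):
    """Return a dict in the `mcpServers` layout used by Claude Desktop."""
    entry = {"command": command, "args": list(args)}
    if env:
        # Only include "env" when there is something to pass to the server.
        entry["env"] = dict(env)
    return {"mcpServers": {server_name: entry}}


config = build_mcp_config(
    server_name="web-scraper",                   # hypothetical server name
    command="python",
    args=["/path/to/server.py"],                 # placeholder path
    env={"BRIGHT_DATA_API_KEY": "<your-key>"},   # hypothetical variable name
)

# Paste the printed JSON into the client's MCP configuration file.
print(json.dumps(config, indent=2))
```

After the client restarts and launches the server, its tools should become available to prompts in that client.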

What are the key features of the project?

  • Integration with Bright Data for web scraping.
  • Standardized communication protocol for LLMs to interact with external tools.
  • Ability to fetch and extract structured data from web pages.
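To make the "standardized communication protocol" point more concrete: MCP messages are JSON-RPC 2.0, and a client invokes a server tool with a `tools/call` request. The sketch below only builds example request and response payloads; the tool name `fetch_page` and its `url` argument are assumptions made for illustration, not names confirmed by the project.

```python
import json
from itertools import count

_ids = count(1)  # simple source of unique request ids


def make_tool_call(tool_name, arguments):
    """Build a JSON-RPC 2.0 request asking an MCP server to run a tool."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


def make_tool_result(request, text):
    """Build the matching response, carrying the tool output as text content."""
    return {
        "jsonrpc": "2.0",
        "id": request["id"],  # a response echoes the id of its request
        "result": {"content": [{"type": "text", "text": text}], "isError": False},
    }


# "fetch_page" is a hypothetical tool name used only for this example.
request = make_tool_call("fetch_page", {"url": "https://example.com"})
response = make_tool_result(request, "<html>...</html>")

print(json.dumps(request, indent=2))
print(json.dumps(response, indent=2))
```

In practice an MCP SDK generates and parses these messages, so server code usually only defines the tool function itself.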

What are the use cases of the project?

  1. Extracting product information from e-commerce sites like Amazon.
  2. Gathering real-time data from news websites.
  3. Automating data collection for research purposes.
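For the first use case, the extraction step might look roughly like the sketch below, which pulls a title and price out of fetched HTML using only the Python standard library. The sample markup, the `id="title"` and `class="price"` selectors, and the function names are all made up for illustration; real product pages use different markup, and the project may extract data differently.

```python
from html.parser import HTMLParser

# Hypothetical markup standing in for a fetched product page.
SAMPLE_HTML = """
<html><body>
  <span id="title">Example Widget</span>
  <span class="price">$19.99</span>
</body></html>
"""


class ProductParser(HTMLParser):
    """Collect the text of the elements marked as title and price."""

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._current = None  # field whose text we are waiting for

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if attrs.get("id") == "title":
            self._current = "title"
        elif attrs.get("class") == "price":
            self._current = "price"

    def handle_data(self, data):
        if self._current and data.strip():
            self.fields[self._current] = data.strip()
            self._current = None

    def handle_endtag(self, tag):
        # Stop waiting once the element closes, even if it was empty.
        self._current = None


def extract_product(html):
    """Return a dict of the product fields found in the HTML."""
    parser = ProductParser()
    parser.feed(html)
    return parser.fields


product = extract_product(SAMPLE_HTML)
print(product)
```

A structured dict like this is what the server would hand back to the LLM instead of raw HTML.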

Frequently asked questions

  • Can I use this with any LLM?
    Yes, provided the application hosting the LLM supports MCP. The protocol is designed to work with various LLMs that support external tool integration.

  • Is there a cost associated with using Bright Data?
    Bright Data offers a range of pricing plans, including free credits for new users.

  • What programming languages are supported?
    The project's example server is written primarily in Python, but MCP servers can be implemented in other languages as well.

Project Info
Created At: a year ago
Updated At: a year ago
Author Name: luminati-io
Star: 0
Language: -
License: -
