Webscraper MCP

Created By
saishridhar, a year ago
An MCP server that transcribes web pages for LLMs, given the URL of the page.
Overview

What is Webscraper MCP?

Webscraper MCP is a server that transcribes web pages for large language models (LLMs), given the URL of the content to scrape. It can also extract transcripts from YouTube videos and convert PDF documents into markdown text.

How do I use Webscraper MCP?

To use Webscraper MCP, provide the URL of the web page, YouTube video, or PDF document you want to scrape. The server returns the page text, the video transcript, or the PDF converted to markdown, depending on the kind of link.
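
As a rough illustration of how a server like this might decide which of the three kinds of link it was given, here is a minimal sketch in Python. The function name `classify_url`, the host list, and the `.pdf` suffix rule are assumptions made for this example; they are not taken from the Webscraper MCP source, which may route requests differently.

```python
from urllib.parse import urlparse

# Hypothetical host list for this sketch; the real server may recognise others.
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com", "youtu.be"}


def classify_url(url: str) -> str:
    """Return "youtube", "pdf", or "webpage" for an http(s) URL."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        raise ValueError(f"unsupported URL scheme: {parsed.scheme!r}")

    host = (parsed.hostname or "").lower()
    if host in YOUTUBE_HOSTS:
        return "youtube"

    # Assumption: a PDF is detected by its path suffix only. A real server
    # would more likely check the Content-Type header of the response.
    if parsed.path.lower().endswith(".pdf"):
        return "pdf"

    # Anything else is treated as an ordinary web page.
    return "webpage"
```

Anything that is not an `http` or `https` link is rejected here, which matches the FAQ's statement that not every type of URL is supported.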

What are the key features of Webscraper MCP?

  • Extracts text content from web pages.
  • Retrieves transcripts from YouTube videos.
  • Converts PDF files into markdown text.
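
To make the first feature concrete, the sketch below shows one simple way to pull visible text out of an HTML page using only the Python standard library. It is an illustration under stated assumptions, not the project's actual extraction code: the names `TextExtractor` and `html_to_text` are hypothetical, and the listing does not say which parsing library Webscraper MCP uses.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0          # > 0 while inside a skipped element
        self.chunks: list[str] = []   # non-empty text fragments, in order

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            text = data.strip()
            if text:
                self.chunks.append(text)


def html_to_text(html: str) -> str:
    """Return the visible text of an HTML document, one fragment per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)
```

For example, `html_to_text("<h1>Title</h1><script>var x = 1;</script><p>Hello</p>")` keeps the heading and paragraph text and drops the script body. Fetching the page, retrieving YouTube transcripts, and converting PDFs would need additional libraries and are left out of this sketch.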

What are the use cases of Webscraper MCP?

  1. Scraping text from articles for research purposes.
  2. Extracting transcripts from educational YouTube videos for study materials.
  3. Converting PDF reports into editable markdown format for easier manipulation.

Webscraper MCP FAQ

  • Can Webscraper MCP handle all types of URLs?

No, it primarily supports web pages, YouTube links, and PDF files.

  • Is there a limit to the size of the content that can be scraped?

The server can handle standard content sizes, but very large documents may require additional processing time.

  • Is Webscraper MCP free to use?

Yes! Webscraper MCP is free to use for everyone.

Project Info
Created At
a year ago
Updated At
a year ago
Author Name
saishridhar
Star
0
Language
Python
License
-
