AI-First Scraper

Created by yubinkim444, 2 months ago

Ad-free web scraping and search, exposed as three MCP tools: `fetch_page`, `fetch_pages_batch`, and `search_web`. Works with Claude Desktop, Cursor, and Cline.

Tags: web-scraping, markdown


Overview

ai-first-scraper-mcp

Plug Claude Desktop, Cursor, or Cline straight into an ad-free web scraper + search engine. Three tools, one line of config.

What it does

Adds three tools to any MCP-compatible agent:

Tool	What it does
`fetch_page`	Fetch one URL → return clean Markdown (HTML or PDF).
`fetch_pages_batch`	Fetch up to 25 URLs in parallel → return Markdown for each.
`search_web`	Run a web search and return the top-k result pages already converted to Markdown.
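An MCP client invokes each of these tools with a JSON-RPC 2.0 `tools/call` request. You normally never write these by hand because the client does it for you; the Python sketch below only shows the shape of the messages and how a URL list longer than 25 could be split into several `fetch_pages_batch` calls. The argument names `url` and `urls` are assumptions, not the server's documented schema; the server's `tools/list` response has the real input schemas.

```python
import json
from itertools import count

MAX_BATCH_URLS = 25  # fetch_pages_batch limit from the table above
_next_id = count(1)  # JSON-RPC request ids must be unique within a session


def tool_call(name: str, arguments: dict) -> dict:
    """Build an MCP `tools/call` request (JSON-RPC 2.0)."""
    return {
        "jsonrpc": "2.0",
        "id": next(_next_id),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


def batch_calls(urls: list[str]) -> list[dict]:
    """Split any number of URLs into fetch_pages_batch calls of <= 25 URLs each."""
    return [
        # "urls" is a hypothetical argument name; check tools/list for the real one.
        tool_call("fetch_pages_batch", {"urls": urls[i:i + MAX_BATCH_URLS]})
        for i in range(0, len(urls), MAX_BATCH_URLS)
    ]


if __name__ == "__main__":
    # "url" is likewise a hypothetical argument name.
    print(json.dumps(tool_call("fetch_page", {"url": "https://example.com"})))
    calls = batch_calls([f"https://example.com/{n}" for n in range(60)])
    print([len(c["params"]["arguments"]["urls"]) for c in calls])  # [25, 25, 10]
```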

No more "the model called curl and then tried to parse 80kB of ad HTML." Your agent receives clean Markdown ready to reason about.

Backed by the ai-first-scraper and ai-first-search APIs.

Install

Fastest — `uvx` (no install, runs from PyPI on demand)

Add this to `claude_desktop_config.json`, `cline_mcp_settings.json`, or `~/.cursor/mcp.json` (JSON does not allow comments, so paste only the object):

```json
{
  "mcpServers": {
    "ai-first-scraper": {
      "command": "uvx",
      "args": ["ai-first-scraper-mcp"]
    }
  }
}
```

Restart your client (Claude Desktop / Cursor / Cline). The three tools above will appear automatically.
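If the config file already lists other servers, pasting the snippet over it would drop them. A minimal Python helper (hypothetical, not shipped with the package) that merges the entry into whatever `mcpServers` block is already there:

```python
import copy
import json
from pathlib import Path

# The entry from the uvx snippet above.
SERVER_NAME = "ai-first-scraper"
SERVER_ENTRY = {"command": "uvx", "args": ["ai-first-scraper-mcp"]}


def add_server(config: dict, name: str = SERVER_NAME, entry: dict = SERVER_ENTRY) -> dict:
    """Return a copy of `config` with `entry` registered under mcpServers[name].

    Servers already listed are kept; an existing entry with the same name
    is replaced. The input dict is not modified.
    """
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers[name] = copy.deepcopy(entry)
    merged["mcpServers"] = servers
    return merged


def update_config_file(path: Path) -> None:
    """Read the client config (or start empty), add the server, write it back."""
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(add_server(config), indent=2) + "\n", encoding="utf-8")
```

For example, `update_config_file(Path.home() / ".cursor" / "mcp.json")` would register the server for Cursor. Back up the file before running it.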

Alternative — pip install

```shell
pip install ai-first-scraper-mcp
```

Then point the config at the installed command:

```json
{
  "mcpServers": {
    "ai-first-scraper": {
      "command": "ai-first-scraper-mcp"
    }
  }
}
```
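The pip variant relies on the client finding `ai-first-scraper-mcp` on its `PATH`. If the client cannot find it, one option is to put the command's absolute path in `command`. A small sketch that resolves it with `shutil.which`, assuming you run it in the same environment where you installed the package (the helper name is our own):

```python
import shutil


def pip_config(command: str = "ai-first-scraper-mcp") -> dict:
    """Build the pip-install config entry using the command's absolute path.

    Raises FileNotFoundError if the command is not on PATH, e.g. when the
    package went into a virtualenv that is not currently active.
    """
    resolved = shutil.which(command)
    if resolved is None:
        raise FileNotFoundError(f"{command!r} not found on PATH; is the package installed?")
    return {"mcpServers": {"ai-first-scraper": {"command": resolved}}}


if __name__ == "__main__":
    import json
    print(json.dumps(pip_config(), indent=2))
```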

Where the config file lives

Client	Config path
Claude Desktop (macOS)	`~/Library/Application Support/Claude/claude_desktop_config.json`
Claude Desktop (Windows)	`%APPDATA%\Claude\claude_desktop_config.json`
Cursor	`~/.cursor/mcp.json`
Cline (VS Code)	`~/Library/Application Support/Code/User/globalStorage/saoudrizwan.claude-dev/settings/cline_mcp_settings.json`
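The same lookup as a small Python function, a sketch that hard-codes the paths from the table. The table gives Claude Desktop paths for macOS and Windows and the Cline path for macOS only, so the sketch raises an error for other platforms rather than guessing. The `client` keys (`"claude"`, `"cursor"`, `"cline"`) are our own naming.

```python
import os
import sys
from pathlib import Path


def config_path(client: str, platform: str = sys.platform) -> Path:
    """Return the MCP config file for `client` ("claude", "cursor" or "cline").

    Paths are copied from the table above; combinations the table does not
    list raise ValueError.
    """
    home = Path.home()
    if client == "cursor":
        return home / ".cursor" / "mcp.json"
    if client == "claude" and platform == "darwin":
        return home / "Library" / "Application Support" / "Claude" / "claude_desktop_config.json"
    if client == "claude" and platform == "win32":
        appdata = os.environ.get("APPDATA")  # %APPDATA% in the table
        if not appdata:
            raise ValueError("APPDATA is not set")
        return Path(appdata) / "Claude" / "claude_desktop_config.json"
    if client == "cline" and platform == "darwin":
        return (home / "Library" / "Application Support" / "Code" / "User" / "globalStorage"
                / "saoudrizwan.claude-dev" / "settings" / "cline_mcp_settings.json")
    raise ValueError(f"no path listed for client={client!r} on platform={platform!r}")
```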

Point at your own backend (optional)

By default this server calls the public ai-first-scraper.onrender.com and ai-first-search.onrender.com instances. To use your own deployment instead, set these environment variables in your MCP config:

```json
{
  "mcpServers": {
    "ai-first-scraper": {
      "command": "uvx",
      "args": ["ai-first-scraper-mcp"],
      "env": {
        "SCRAPER_URL": "https://your-scraper.example.com",
        "SEARCH_URL":  "https://your-search.example.com",
        "AFS_TIMEOUT": "60"
      }
    }
  }
}
```
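The README names these variables but does not say how they are parsed. The sketch below shows one plausible way a server could read them, falling back to the public instances when they are unset. It is not the package's code: the `https://` scheme for the defaults, treating `AFS_TIMEOUT` as seconds, and the 30-second fallback are all assumptions.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass
from typing import Optional

# Public instances named in the README; the https:// scheme is assumed.
DEFAULT_SCRAPER_URL = "https://ai-first-scraper.onrender.com"
DEFAULT_SEARCH_URL = "https://ai-first-search.onrender.com"
DEFAULT_TIMEOUT_S = 30.0  # hypothetical default; the README does not state one


@dataclass(frozen=True)
class BackendSettings:
    scraper_url: str
    search_url: str
    timeout_s: float


def load_settings(env: Optional[Mapping[str, str]] = None) -> BackendSettings:
    """Read SCRAPER_URL, SEARCH_URL and AFS_TIMEOUT, falling back to defaults."""
    env = os.environ if env is None else env
    timeout = float(env.get("AFS_TIMEOUT", DEFAULT_TIMEOUT_S))  # assumed to be seconds
    if timeout <= 0:
        raise ValueError("AFS_TIMEOUT must be a positive number")
    return BackendSettings(
        # Strip a trailing slash so request paths can be appended uniformly.
        scraper_url=env.get("SCRAPER_URL", DEFAULT_SCRAPER_URL).rstrip("/"),
        search_url=env.get("SEARCH_URL", DEFAULT_SEARCH_URL).rstrip("/"),
        timeout_s=timeout,
    )
```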

Verify it works

Open your MCP client and ask the agent:

"Use the search_web tool to find the top 3 recent articles about MCP and summarize them in 5 bullets each."

You should see the agent call search_web, get back Markdown for each result, and produce the summary without ever touching raw HTML.

Companion projects

ai-first-scraper — the per-URL Markdown cleaner this MCP server fans out to.
ai-first-search — search → scrape → markdown pipeline.
mcp-rec — record & replay any MCP server's traffic for tests and bug reports.
llm-cache-proxy — local cache for OpenAI/Anthropic API calls.
promptlocker — lockfile for prompts.
context-diff — see what blew up your Claude Code context window.
agentwatch — overlay for browser AI agents.

Develop locally

```shell
git clone https://github.com/yubinkim444/ai-first-scraper-mcp.git
cd ai-first-scraper-mcp

uv sync                       # or: pip install -e .
uv run ai-first-scraper-mcp   # speaks MCP over stdio; after pip install, run ai-first-scraper-mcp directly
```

To test against a local client, point its MCP config at the same command.

License

MIT © yubinkim444



Project Info

Created: 2 months ago
Updated: 2 months ago
Author: yubinkim444

