MCP Web Extractor

Created By
iemong, a year ago
MCP server that extracts web content using Readability.js
Overview

What is MCP Web Extractor?

MCP Web Extractor is a Model Context Protocol (MCP) server that extracts web content using Readability.js, allowing users to fetch and save clean, readable versions of articles.

How to use MCP Web Extractor

To use MCP Web Extractor, clone the repository, install dependencies, build the project, and start the server. You can then extract content from a URL using the provided client example or integrate it with Obsidian.
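
As a rough illustration of what a client sends once the server is running: MCP clients invoke tools with a JSON-RPC 2.0 request whose method is `tools/call`. The sketch below only builds such a request. The tool name `extract` and the `url` argument are assumptions made for illustration, not names confirmed by this project; check the server's tool listing for the real ones.

```typescript
// Shape of a JSON-RPC 2.0 request that invokes an MCP tool.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Hypothetical tool name (assumption); the real server may expose a different one.
const EXTRACT_TOOL = "extract";

// Build a request asking the server to extract the page at `url`.
// The `url` argument name is also an assumption.
function buildExtractRequest(id: number, url: string): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: EXTRACT_TOOL, arguments: { url } },
  };
}

// Example: a request for a single article.
const exampleRequest = buildExtractRequest(1, "https://example.com/article");
```

In practice an MCP client library would construct and send this message for you; the sketch is only meant to show what travels over the wire.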

Key features of MCP Web Extractor

  • Extracts readable content from a given URL
  • Removes ads, sidebars, and other distractions
  • Returns clean text along with metadata (title, excerpt, etc.)
  • Easy integration with Obsidian via MCP

Use cases of MCP Web Extractor

  1. Saving clean versions of articles for note-taking in Obsidian.
  2. Extracting content for research purposes.
  3. Creating a distraction-free reading experience by removing unnecessary elements from web pages.
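
For the Obsidian use case, one possible way to turn an extraction result into a note is sketched below. The `ExtractedArticle` shape and the `toObsidianNote` helper are hypothetical: the listing only says the server returns clean text plus metadata such as title and excerpt, so the exact field names here are assumptions.

```typescript
// Assumed shape of an extraction result (field names are illustrative).
interface ExtractedArticle {
  title: string;
  excerpt: string;
  textContent: string;
}

// Format an article as a Markdown note with YAML front matter.
function toObsidianNote(article: ExtractedArticle, sourceUrl: string): string {
  // JSON.stringify produces a double-quoted, escaped string,
  // which is also a valid YAML scalar.
  const frontMatter = [
    "---",
    `title: ${JSON.stringify(article.title)}`,
    `source: ${JSON.stringify(sourceUrl)}`,
    `excerpt: ${JSON.stringify(article.excerpt)}`,
    "---",
  ].join("\n");

  // Heading plus the trimmed article body, ending with a single newline.
  return `${frontMatter}\n\n# ${article.title}\n\n${article.textContent.trim()}\n`;
}
```

The returned string can be written to a `.md` file inside a vault; quoting the front matter values keeps titles containing colons or quotes from breaking the YAML.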

FAQ for MCP Web Extractor

  • Can MCP Web Extractor extract content from any website?

It accepts any URL, but Readability.js is designed for article-style pages, so results are best on pages with a clear main body of text.

  • Is there a way to integrate it with other applications?

Yes. It can be integrated with Obsidian and with other applications that support MCP.

  • What technologies does MCP Web Extractor use?

It uses Readability.js for content extraction and is built with TypeScript.

Project Info
Created At: a year ago
Updated At: a year ago
Author Name: iemong
Stars: 0
Language: TypeScript
License: -
