Crawlio Mcp

Created By
3 months ago
AI-powered website crawling, analysis, and export via MCP. 38 tools for crawl control, browser enrichment capture, WARC/ZIP/single-HTML export, observation timeline, and evidence-backed findings. 4 resources, 4 prompts. Dual-mode: full (38 tools) or code (6 tools). Install: npx crawlio-mcp
Overview

crawlio-mcp

1.3.4 • Public • Published

crawlio-mcp

MCP server for Crawlio — the AI-native macOS website crawler.

Exposes 36 tools and 4 resources over stdio transport, compatible with any MCP client (Claude Code, Cursor, Windsurf, VS Code, Zed, etc.).

Install

npx crawlio-mcp init

Downloads the binary automatically and configures all detected MCP clients.

Homebrew

brew install crawlio-app/tap/crawlio-mcp
crawlio-mcp init

Bundled with Crawlio.app

brew install --cask crawlio-app/tap/crawlio

The desktop app bundles the MCP server — the cask symlinks it to your PATH automatically.

Quick Setup

Run init to auto-detect and configure your AI clients:

npx crawlio-mcp init          # Configure all detected clients
npx crawlio-mcp init --full   # Enable all 36 tools (default: 6 code-mode tools)
npx crawlio-mcp init --portal # HTTP transport + launchd auto-start
npx crawlio-mcp init --dry-run # Preview changes without writing

Manual configuration

Add to your MCP client config (.mcp.json, mcp.json, etc.):

{
  "mcpServers": {
    "crawlio": {
      "command": "npx",
      "args": ["-y", "crawlio-mcp"]
    }
  }
}

Prerequisites

  • macOS 15 (Sequoia) or later
  • Crawlio.app must be running for control tools (read-only tools work offline)

How it works

This npm package is a thin wrapper that locates or downloads the native CrawlioMCP binary and forwards stdio through it. Binary resolution order:

  1. $CRAWLIO_MCP_BINARY environment variable
  2. ~/.crawlio/bin/CrawlioMCP (npm auto-download cache)
  3. /Applications/Crawlio.app/Contents/Helpers/CrawlioMCP
  4. ~/Applications/Crawlio.app/Contents/Helpers/CrawlioMCP
  5. /opt/homebrew/bin/crawlio-mcp (Homebrew)
  6. /usr/local/bin/crawlio-mcp

If no binary is found, it downloads from GitHub Releases on first run.

Tools

36 tools across 8 categories:

CategoryCountTools
Status & Monitoring6 get_crawl_status, get_crawl_logs, get_errors, get_downloads, get_failed_urls, get_site_tree
Control4 start_crawl, stop_crawl, pause_crawl, resume_crawl
Settings & Config3 get_settings, update_settings, recrawl_urls
Projects5 list_projects, save_project, load_project, delete_project, get_project
Export & Extraction5 export_site, get_export_status, extract_site, get_extraction_status, trigger_capture
Composite Analysis2 analyze_page (capture + poll + status in one call), compare_pages (two-site comparison with summary)
Enrichment6 get_enrichment, submit_enrichment_bundle, submit_enrichment_framework, submit_enrichment_network, submit_enrichment_console, submit_enrichment_dom
Observations & Findings4 get_observations, create_finding, get_findings, get_crawled_urls
OCR1extract_text_from_image

Plus 3 HTTP-only endpoints accessible via execute_api: get_health, get_debug_metrics, dump_state.

Resources

URIDescription
crawlio://statusEngine state and progress
crawlio://settingsCurrent crawl settings
crawlio://site-treeDownloaded file tree
crawlio://enrichmentBrowser enrichment data

Server Config

{
  "mcpServers": {
    "crawlio": {
      "command": "npx",
      "args": [
        "-y",
        "crawlio-mcp"
      ]
    }
  }
}
Project Info
Created At
3 months ago
Updated At
3 months ago
Author Name
-
Star
-
Language
-
License
-
Category

Recommend Servers

View All
Bring your real authenticated browser session to AI coding agents. Local-first MCP server + Chrome MV3 extension. No cloud. No telemetry.
@Cubenest

peek records the user's actual logged-in browser (DOM via rrweb, console events, network metadata, optional response bodies via opt-in Deep capture) through a Chrome MV3 extension. The extension ships events through a native-messaging stdio bridge to a local MCP server (peek-mcp), which persists them to a SQLite database at ~/.peek/sessions.db. AI coding agents (Claude Code, Cursor, Cline, Windsurf) read sessions from the database via 10 MCP tools: Tool What it does list_recent_sessions List recently recorded sessions (id, origin, ts, event count). get_session_summary LLM-readable narrative summary of a session. get_session_console_errors Console errors recorded in a session. get_session_network_errors Failed/notable network requests in a session. get_user_action_before_error Last N user actions before a console error. generate_playwright_repro Generate a runnable Playwright test from a session. get_dom_snapshot Reconstruct the DOM at a given timestamp. query_dom_history Timeline of attribute/text changes for a selector. request_authorization Side-panel consent for write actions (Level 3). execute_action Dispatch a UI action (gated by permission level + destructive blocklist). Why local-first matters Every other "browser session for AI" tool ships to a vendor cloud. peek's SQLite + extension live on the user's machine — no remote endpoints, no telemetry. The privacy policy (docs/peek/PRIVACY_POLICY.md) is the source of truth. Install # 1. Add the MCP server to Claude Code claude mcp add peek -- npx -y @peekdev/mcp # 2. Install the Chrome extension from the Chrome Web Store # (link added once the CWS listing is approved)

a day ago