UniCrawler

Created By
6 months ago
Stop writing selectors. Start describing data. UniCrawler ships an MCP (Model Context Protocol) server that exposes UniCrawler’s crawling, parsing, and storage capabilities to MCP-capable clients.
Overview

What is UniCrawler?

UniCrawler is a web crawling and data extraction tool that allows users to describe the data they want to extract using natural language, eliminating the need for complex selectors.
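This page doesn't document UniCrawler's request format. One common way to turn a plain-English description of the data into structured output is to compile it into a JSON Schema that the model's answer must satisfy. The sketch below assumes that approach. `FieldSpec`, `to_json_schema`, and the product fields are hypothetical illustrations, not UniCrawler's API.

```python
import json
from dataclasses import dataclass


@dataclass
class FieldSpec:
    """One field of the data you want, described in plain English (hypothetical)."""
    name: str
    description: str
    json_type: str = "string"  # JSON Schema type: "string", "number", "integer", "boolean"


def to_json_schema(fields: list[FieldSpec]) -> dict:
    """Build a JSON Schema for a list of records, one property per described field."""
    item = {
        "type": "object",
        "properties": {
            f.name: {"type": f.json_type, "description": f.description} for f in fields
        },
        "required": [f.name for f in fields],
        "additionalProperties": False,  # reject fields nobody asked for
    }
    return {"type": "array", "items": item}


# Describe *what* you want, not *where* it lives in the DOM.
product_fields = [
    FieldSpec("title", "the product name as shown to shoppers"),
    FieldSpec("price", "the current sale price, without the currency symbol", "number"),
    FieldSpec("in_stock", "whether the product can be ordered right now", "boolean"),
]

if __name__ == "__main__":
    print(json.dumps(to_json_schema(product_fields), indent=2))
```

The descriptions do the work a CSS or XPath selector would otherwise do, while the schema keeps the extracted records predictable enough to store.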

How do I use UniCrawler?

To use UniCrawler, create a virtual environment, install the package from PyPI, and start the MCP server. An MCP-capable client can then call the server's tools to crawl URLs and extract structured data.
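MCP carries JSON-RPC 2.0 messages, and with the stdio transport each message is a single line of JSON written to the server's stdin. The sketch below only builds the messages a client would send, to show the handshake and a tool call. The tool name `crawl` and its `url`/`query` arguments are hypothetical placeholders, since this page doesn't list UniCrawler's tools; a real client reads the actual names from the `tools/list` response. In practice an MCP client or the official SDK handles this framing for you.

```python
import json
from itertools import count
from typing import Any, Optional

_ids = count(1)  # JSON-RPC request ids only need to be unique within a session


def request(method: str, params: Optional[dict] = None) -> str:
    """Serialize one JSON-RPC 2.0 request as a single newline-terminated line."""
    msg: dict[str, Any] = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        msg["params"] = params
    # json.dumps (without indent) escapes newlines inside strings, so each
    # message stays on one line as the stdio transport requires.
    return json.dumps(msg) + "\n"


def notification(method: str) -> str:
    """A notification has no id and gets no response."""
    return json.dumps({"jsonrpc": "2.0", "method": method}) + "\n"


def session_messages(tool: str, arguments: dict) -> list[str]:
    """Messages a client writes to the server's stdin, in order.

    A real client waits for each response (read line by line from the
    server's stdout) before sending the next request; this only shows the sequence.
    """
    return [
        request("initialize", {
            "protocolVersion": "2024-11-05",  # one published MCP revision
            "capabilities": {},
            "clientInfo": {"name": "example-client", "version": "0.1.0"},
        }),
        notification("notifications/initialized"),
        request("tools/list"),
        request("tools/call", {"name": tool, "arguments": arguments}),
    ]


if __name__ == "__main__":
    # Hypothetical tool name and arguments; take the real ones from tools/list.
    for line in session_messages("crawl", {
        "url": "https://example.com/products",
        "query": "every product's name and price",
    }):
        print(line, end="")
```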

What are the key features of UniCrawler?

  • Natural language driven data extraction
  • AI-powered parsing of messy HTML/DOM
  • Browser automation for dynamic page handling
  • One-click storage to PostgreSQL
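This page doesn't say how the one-click PostgreSQL storage works internally. As a hedged illustration of what storing extracted records involves, the sketch below turns a list of records into one parameterized INSERT statement. The table name, the record shape, and `build_insert` are hypothetical.

```python
from typing import Any


def quote_ident(name: str) -> str:
    """Quote a PostgreSQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_insert(table: str, records: list[dict[str, Any]]) -> tuple[str, list[tuple]]:
    """Build one parameterized INSERT plus a parameter tuple per record.

    Placeholders are %s (the DB-API "format" style psycopg uses), so values are
    sent separately from the SQL text instead of being pasted into it.
    """
    if not records:
        raise ValueError("no records to insert")
    columns = list(records[0])
    for record in records:
        # Every record must have the same fields, in any order.
        if set(record) != set(columns):
            raise ValueError(f"record fields {sorted(record)} differ from {sorted(columns)}")
    column_sql = ", ".join(quote_ident(c) for c in columns)
    placeholders = ", ".join(["%s"] * len(columns))
    sql = f"INSERT INTO {quote_ident(table)} ({column_sql}) VALUES ({placeholders})"
    # Order each tuple by the column list, not by each record's own key order.
    params = [tuple(record[c] for c in columns) for record in records]
    return sql, params


if __name__ == "__main__":
    # Hypothetical records, as an extraction step might return them.
    sql, params = build_insert("products", [
        {"title": "Enamel mug", "price": 9.5},
        {"title": "Wool cap", "price": 12.0},
    ])
    print(sql)
    print(params)
```

With psycopg, the result would typically be run as `cur.executemany(sql, params)` inside a transaction, so a failed batch leaves the table unchanged.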

What are the use cases of UniCrawler?

  1. Extracting product information from e-commerce sites.
  2. Gathering data from news articles or blogs.
  3. Automating data collection for research purposes.

FAQ about UniCrawler

  • Can UniCrawler handle dynamic web pages?

Yes! UniCrawler uses browser automation to work with dynamic content.

  • Is there a learning curve for using UniCrawler?

No. It is designed to be user-friendly: you describe the data you want in natural language instead of writing selectors.

  • What databases can I use with UniCrawler?

UniCrawler supports PostgreSQL for data storage.

Project Info
Created At: 6 months ago
Updated At: 6 months ago
Author Name: -
Star: -
Language: -
License: -
Category: -

Recommended Servers

Bring your real authenticated browser session to AI coding agents. Local-first MCP server + Chrome MV3 extension. No cloud. No telemetry.
@Cubenest

peek records the user's actual logged-in browser (DOM via rrweb, console events, network metadata, optional response bodies via opt-in Deep capture) through a Chrome MV3 extension. The extension ships events through a native-messaging stdio bridge to a local MCP server (peek-mcp), which persists them to a SQLite database at ~/.peek/sessions.db. AI coding agents (Claude Code, Cursor, Cline, Windsurf) read sessions from the database via 10 MCP tools:

  • list_recent_sessions: List recently recorded sessions (id, origin, ts, event count).
  • get_session_summary: LLM-readable narrative summary of a session.
  • get_session_console_errors: Console errors recorded in a session.
  • get_session_network_errors: Failed/notable network requests in a session.
  • get_user_action_before_error: Last N user actions before a console error.
  • generate_playwright_repro: Generate a runnable Playwright test from a session.
  • get_dom_snapshot: Reconstruct the DOM at a given timestamp.
  • query_dom_history: Timeline of attribute/text changes for a selector.
  • request_authorization: Side-panel consent for write actions (Level 3).
  • execute_action: Dispatch a UI action (gated by permission level + destructive blocklist).

Why local-first matters

Every other "browser session for AI" tool ships to a vendor cloud. peek's SQLite database and extension live on the user's machine: no remote endpoints, no telemetry. The privacy policy (docs/peek/PRIVACY_POLICY.md) is the source of truth.

Install

    # 1. Add the MCP server to Claude Code
    claude mcp add peek -- npx -y @peekdev/mcp
    # 2. Install the Chrome extension from the Chrome Web Store
    # (link added once the CWS listing is approved)
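The hop from the extension to peek-mcp goes through Chrome native messaging, which frames every message as a 32-bit length in native byte order followed by UTF-8 JSON. peek's own bridge code isn't shown here, so the Python sketch below only illustrates that framing; the event fields in the example are made up.

```python
import io
import json
import struct
from typing import BinaryIO, Optional

# "=I": unsigned 32-bit integer, native byte order, standard 4-byte size.
_HEADER = struct.Struct("=I")


def write_message(stream: BinaryIO, message: dict) -> None:
    """Frame one message: 4-byte native-order length, then UTF-8 JSON."""
    body = json.dumps(message, separators=(",", ":")).encode("utf-8")
    stream.write(_HEADER.pack(len(body)))
    stream.write(body)
    stream.flush()


def read_message(stream: BinaryIO) -> Optional[dict]:
    """Read one framed message; return None on a clean EOF (the browser closed the pipe)."""
    header = stream.read(_HEADER.size)
    if len(header) == 0:
        return None
    if len(header) < _HEADER.size:
        raise EOFError("truncated length header")
    (length,) = _HEADER.unpack(header)
    body = stream.read(length)
    if len(body) < length:
        raise EOFError("truncated message body")
    return json.loads(body.decode("utf-8"))


if __name__ == "__main__":
    # A real host would use sys.stdin.buffer / sys.stdout.buffer; an in-memory
    # buffer stands in for the pipe here.
    pipe = io.BytesIO()
    write_message(pipe, {"type": "console", "level": "error", "text": "boom"})
    pipe.seek(0)
    print(read_message(pipe))
```

Reading exactly the announced number of bytes, rather than reading until a newline, is what lets the bridge carry DOM snapshots and response bodies that contain arbitrary text.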

a day ago
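peek's `list_recent_sessions` tool returns each session's id, origin, timestamp, and event count from ~/.peek/sessions.db. The real table layout isn't documented on this page, so the sketch below assumes a hypothetical two-table schema (`sessions`, `events`) purely to show the kind of query such a tool could run.

```python
import sqlite3

# Hypothetical schema: peek's real layout in ~/.peek/sessions.db may differ.
SCHEMA = """
CREATE TABLE sessions (id TEXT PRIMARY KEY, origin TEXT NOT NULL, ts INTEGER NOT NULL);
CREATE TABLE events (
    session_id TEXT NOT NULL REFERENCES sessions(id),
    ts INTEGER NOT NULL,
    kind TEXT NOT NULL
);
"""


def list_recent_sessions(conn: sqlite3.Connection, limit: int = 10) -> list[tuple]:
    """Newest sessions first, one (id, origin, ts, event_count) row each."""
    return conn.execute(
        """
        SELECT s.id, s.origin, s.ts, COUNT(e.session_id) AS event_count
        FROM sessions AS s
        LEFT JOIN events AS e ON e.session_id = s.id  -- keep sessions with no events
        GROUP BY s.id, s.origin, s.ts
        ORDER BY s.ts DESC
        LIMIT ?
        """,
        (limit,),
    ).fetchall()


def demo() -> list[tuple]:
    """Populate an in-memory database with made-up rows and run the query."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO sessions VALUES (?, ?, ?)", [
        ("a", "https://app.example.com", 100),
        ("b", "https://shop.example.com", 200),
    ])
    conn.executemany("INSERT INTO events VALUES (?, ?, ?)", [
        ("a", 101, "console"), ("a", 102, "network"), ("b", 201, "dom"),
    ])
    return list_recent_sessions(conn)


if __name__ == "__main__":
    for row in demo():
        print(row)
```

Against a real database file you would open it read-only, for example `sqlite3.connect(f"file:{path}?mode=ro", uri=True)`, and adapt the query to the actual schema.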