Peekaboo MCP – lightning-fast macOS screenshots for AI agents

Created By
steipete, a year ago
## What Peekaboo Can Do

Peekaboo provides three main tools that give AI agents visual capabilities:

- **`image`** - Capture screenshots of screens or specific applications
- **`analyze`** - Ask AI questions about captured images using vision models
- **`list`** - Enumerate available screens and windows for targeted captures

Each tool is designed to be powerful and flexible. The most powerful feature is visual question answering - agents can ask questions about screenshots like "What do you see in this window?" or "Is the submit button visible?" and get accurate answers. This saves context space since asking specific questions is much more efficient than returning raw image data. Peekaboo supports both cloud and local vision models, letting you choose between accuracy and privacy.
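To make the three tools concrete, here is a minimal sketch of the requests an MCP client might send for them. The `tools/call` method and the JSON-RPC 2.0 envelope come from the MCP protocol itself; the argument names (`item_type`, `app`, `app_target`, `image_path`, `question`) and the file path are assumptions for illustration and should be checked against the Peekaboo README.

```typescript
// JSON-RPC 2.0 envelope that MCP uses to invoke a tool on a server.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

type PeekabooTool = "image" | "analyze" | "list";

let nextId = 1;

// Build a tools/call request for one of the three Peekaboo tools.
function toolCall(name: PeekabooTool, args: Record<string, unknown>): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id: nextId++,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

// NOTE: every argument name below is hypothetical, not confirmed by the source.

// 1. Enumerate the windows of an application to find a capture target.
const listWindows = toolCall("list", { item_type: "application_windows", app: "Safari" });

// 2. Capture that application and ask a question in the same call.
const capture = toolCall("image", {
  app_target: "Safari",
  question: "Is the submit button visible?",
});

// 3. Ask a follow-up question about an image that is already on disk.
const analyze = toolCall("analyze", {
  image_path: "/tmp/safari.png",
  question: "What do you see in this window?",
});
```

Asking the question alongside the capture means the agent receives a short text answer rather than raw image data, which is the context saving described above.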
Overview

My MCP Ecosystem

Peekaboo is part of a growing collection of MCP servers I'm building. Each serves a specific purpose in building autonomous AI workflows.

Technical Architecture

Peekaboo combines TypeScript and Swift for the best of both worlds. TypeScript provides excellent MCP support and easy distribution via npm, while Swift enables direct access to Apple's ScreenCaptureKit for capturing windows without focus changes.

My initial AppleScript prototype had a fatal flaw: it required focus changes to capture windows. The Swift rewrite uses ScreenCaptureKit to access the window manager directly - no focus changes, no user disruption.

The system uses a Swift CLI that communicates with a Node.js MCP server, supporting both local models and cloud providers with automatic fallback. Built with Swift 6 and the new Swift Testing framework (now that I have experience with it!), Peekaboo delivers fast, non-intrusive screenshot capture with intelligent window matching.
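The automatic fallback between local models and cloud providers can be sketched as a loop that tries each provider in order and returns the first successful answer. This is an illustration under stated assumptions, not Peekaboo's actual implementation: the `VisionProvider` interface, the `analyzeWithFallback` function, and the two stub providers are hypothetical names introduced here.

```typescript
// Hypothetical abstraction over a vision model backend (local or cloud).
interface VisionProvider {
  name: string;
  analyze(imagePath: string, question: string): Promise<string>;
}

interface AnalysisResult {
  provider: string;
  answer: string;
}

// Try each provider in order; return the first answer, or fail with all errors.
async function analyzeWithFallback(
  providers: VisionProvider[],
  imagePath: string,
  question: string,
): Promise<AnalysisResult> {
  const errors: string[] = [];
  for (const provider of providers) {
    try {
      const answer = await provider.analyze(imagePath, question);
      return { provider: provider.name, answer };
    } catch (err) {
      // Record the failure and fall through to the next provider.
      const message = err instanceof Error ? err.message : String(err);
      errors.push(`${provider.name}: ${message}`);
    }
  }
  throw new Error(`All providers failed: ${errors.join("; ")}`);
}

// Stub providers for illustration: a local model that is offline, and a
// cloud model that answers. Neither talks to a real service.
const offlineLocal: VisionProvider = {
  name: "local",
  analyze: async () => {
    throw new Error("model server not reachable");
  },
};

const cloud: VisionProvider = {
  name: "cloud",
  analyze: async (_imagePath, question) => `stub answer to: ${question}`,
};

// Resolves with the cloud provider's answer because the local one fails first.
const pending = analyzeWithFallback(
  [offlineLocal, cloud],
  "/tmp/shot.png",
  "Is the submit button visible?",
);
```

Collecting the per-provider errors instead of discarding them keeps the final failure message useful when every provider is down.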

For detailed testing instructions using the MCP Inspector, see the Peekaboo README.

The Vision: Autonomous Agent Debugging

Peekaboo is one puzzle piece in a larger set of MCPs I'm building to help agents stay in the loop. The goal is simple: if an agent can answer questions by itself, you don't have to intervene, and it can simply continue and debug itself. This is the holy grail for building applications with CI - you want to give the agent everything it needs so it can loop and keep working until the task is done.

When your build fails, when your UI doesn't look right, when something breaks - instead of stopping and asking you "what do you see?", the agent can take a screenshot, analyze it, and continue fixing the problem autonomously. That's the power of giving agents their eyes.

👻 Peekaboo MCP is available now - ⭐ the repo if this saves you a debug session!

Server Config

{
  "mcpServers": {
    "peekaboo": {
      "command": "npx",
      "args": [
        "-y",
        "@steipete/peekaboo-mcp"
      ],
      "env": {
        "PEEKABOO_AI_PROVIDERS": "ollama/llava:latest"
      }
    }
  }
}
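The `PEEKABOO_AI_PROVIDERS` value in the config above has the shape `provider/model`. A minimal sketch of how such a value could be parsed follows. It assumes that several entries may be given as a comma-separated list, which the config shown here does not confirm; `parseProviders` and `ProviderSpec` are hypothetical names, not part of Peekaboo.

```typescript
// One parsed entry, e.g. "ollama/llava:latest".
interface ProviderSpec {
  provider: string;
  model: string;
}

// Parse a provider string into ordered specs.
// ASSUMPTION: multiple entries are comma-separated; only a single entry
// appears in the config above.
function parseProviders(value: string): ProviderSpec[] {
  return value
    .split(",")
    .map((entry) => entry.trim())
    .filter((entry) => entry.length > 0)
    .map((entry) => {
      // Split on the first slash only, so tags such as ":latest" and any
      // later slashes stay part of the model name.
      const slash = entry.indexOf("/");
      if (slash <= 0 || slash === entry.length - 1) {
        throw new Error(`Expected "provider/model", got "${entry}"`);
      }
      return { provider: entry.slice(0, slash), model: entry.slice(slash + 1) };
    });
}

// The value from the server config above.
const specs = parseProviders("ollama/llava:latest");
// specs holds one entry: { provider: "ollama", model: "llava:latest" }
```

Rejecting entries without a slash surfaces a misconfigured value at startup instead of at the first analysis request.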
Project Info
Created At
a year ago
Updated At
a year ago
Author Name
steipete