Iris

Created By
iris-eval, 3 months ago
The first MCP-native eval and observability tool for AI agents. Any MCP-compatible agent discovers and uses Iris automatically — no SDK, no code changes. Log traces with hierarchical span trees, evaluate output quality with 12 built-in rules (PII detection, prompt injection, cost thresholds), and track what your agents are actually doing and costing you. Real-time dark-mode dashboard, OpenTelemetry-compatible span structure, self-hosted with SQLite. MIT licensed.
Overview

Iris — MCP-Native Agent Eval & Observability

See what your AI agents are actually doing. Iris is an open-source MCP server that logs every trace, evaluates output quality, and tracks costs across all your agents. Any MCP-compatible agent discovers and uses it automatically — no SDK, no code changes.

The Problem

Your agents are running in production. Traditional monitoring sees 200 OK and moves on. It has no idea the agent just:

  • Leaked a social security number in its response
  • Hallucinated an answer with zero factual grounding
  • Burned $0.47 on a single query — 4.7x your budget threshold
  • Made 6 tool calls when 2 would have sufficed

Iris sees all of it.

What You Get

  • Trace Logging: Hierarchical span trees with per-tool-call latency, token usage, and cost in USD. Stored in SQLite, queryable instantly.
  • Output Evaluation: 12 built-in rules across 4 categories: completeness, relevance, safety, cost. PII detection, prompt injection patterns, hallucination markers. Add custom rules with Zod schemas.
  • Cost Visibility: Aggregate cost across all agents over any time window. Set budget thresholds. Get flagged when agents overspend.
  • Web Dashboard: Real-time dark-mode UI with trace visualization, eval results, and cost breakdowns.
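
To make the span-tree and budget ideas concrete, here is a minimal sketch of how cost could roll up through a hierarchical trace and be compared against a threshold. The `Span` shape, its field names, and the $0.10 budget are assumptions for illustration only; they are not Iris's actual schema or defaults.

```typescript
// Hypothetical span shape: field names are illustrative, not Iris's schema.
interface Span {
  name: string;
  latencyMs: number;
  costUsd: number;
  children: Span[];
}

// Sum the cost of a span and all of its descendants.
function totalCostUsd(span: Span): number {
  return span.children.reduce(
    (sum, child) => sum + totalCostUsd(child),
    span.costUsd,
  );
}

// A small two-level trace: one root span with two child spans.
const trace: Span = {
  name: "answer_query",
  latencyMs: 2400,
  costUsd: 0,
  children: [
    { name: "llm_call", latencyMs: 1800, costUsd: 0.4, children: [] },
    { name: "tool:search", latencyMs: 600, costUsd: 0.07, children: [] },
  ],
};

const budgetUsd = 0.1; // assumed per-query budget threshold
const cost = totalCostUsd(trace);
const overBudget = cost > budgetUsd;

console.log(`cost=$${cost.toFixed(2)} overBudget=${overBudget}`);
```

With these made-up numbers the trace lands at roughly 4.7 times the assumed budget, which mirrors the overspend scenario described under The Problem.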

Quickstart

Add Iris to your Claude Desktop (or Cursor, Claude Code, Windsurf) MCP config:

{
  "mcpServers": {
    "iris-eval": {
      "command": "npx",
      "args": ["@iris-eval/mcp-server"]
    }
  }
}

That's it. Your agent discovers Iris and starts logging traces automatically.

Want the dashboard?

npx @iris-eval/mcp-server --dashboard
# Open http://localhost:6920

Other Install Methods

# Global install
npm install -g @iris-eval/mcp-server
iris-mcp --dashboard

# Docker
docker run -p 3000:3000 -v iris-data:/data ghcr.io/iris-eval/mcp-server

MCP Tools

Iris registers three tools that any MCP-compatible agent can invoke:

  • log_trace — Log an agent execution with spans, tool calls, token usage, and cost
  • evaluate_output — Score output quality against completeness, relevance, safety, and cost rules
  • get_traces — Query stored traces with filtering, pagination, and time-range support
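
For readers curious what a tool invocation looks like on the wire, the sketch below builds the JSON-RPC 2.0 `tools/call` request that MCP clients send. The envelope follows the MCP specification; the `limit` argument is a hypothetical placeholder, so consult the published tool schemas for the real field names.

```typescript
// JSON-RPC 2.0 envelope that MCP uses for tool invocation.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Build a tools/call request for a named tool.
function buildToolCall(
  id: number,
  name: string,
  args: Record<string, unknown>,
): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

// "limit" is a hypothetical argument name, not taken from Iris's schema.
const request = buildToolCall(1, "get_traces", { limit: 10 });

console.log(JSON.stringify(request, null, 2));
```

In practice an MCP-compatible agent constructs this request itself after discovering the tools, so nothing like this needs to be written by hand.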

Full tool schemas and configuration: iris-eval.com

Cloud Tier (Coming Soon)

Self-hosted Iris runs on your machine with SQLite. As your team grows, the cloud tier adds PostgreSQL, team dashboards, alerting, and managed infrastructure.

Join the waitlist to get early access.

Configuration & Security

CLI Arguments

Flag              Default               Description
--transport       stdio                 Transport type: stdio or http
--port            3000                  HTTP transport port
--db-path         ~/.iris/iris.db       SQLite database path
--config          ~/.iris/config.json   Config file path
--api-key         (none)                API key for HTTP authentication
--dashboard       false                 Enable web dashboard
--dashboard-port  6920                  Dashboard port

Environment Variables

Variable              Description
IRIS_TRANSPORT        Transport type
IRIS_PORT             HTTP port
IRIS_DB_PATH          Database path
IRIS_LOG_LEVEL        Log level: debug, info, warn, error
IRIS_DASHBOARD        Enable dashboard (true/false)
IRIS_API_KEY          API key for HTTP authentication
IRIS_ALLOWED_ORIGINS  Comma-separated allowed CORS origins
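
As an illustration of the comma-separated format used by `IRIS_ALLOWED_ORIGINS`, the sketch below shows one plausible way such a value could be split into a list. The `parseAllowedOrigins` helper and the example origins are hypothetical; how Iris itself parses the variable is not documented here.

```typescript
// Hypothetical helper: split a comma-separated origin list,
// trimming whitespace and dropping empty entries.
function parseAllowedOrigins(raw: string | undefined): string[] {
  if (!raw) return [];
  return raw
    .split(",")
    .map((origin) => origin.trim())
    .filter((origin) => origin.length > 0);
}

// In a real process this string would come from the environment.
const origins = parseAllowedOrigins(
  "http://localhost:6920, https://app.example.com",
);

console.log(origins);
```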

Security

When using HTTP transport, Iris includes:

  • API key authentication with timing-safe comparison
  • CORS restricted to localhost by default
  • Rate limiting (100 req/min API, 20 req/min MCP)
  • Helmet security headers
  • Zod input validation on all routes
  • ReDoS-safe regex for custom eval rules
  • 1MB request body limits

# Production deployment
iris-mcp --transport http --port 3000 --api-key "$(openssl rand -hex 32)" --dashboard
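
The timing-safe comparison mentioned in the list above can be sketched with Node's built-in `crypto` module. This is one common pattern, not necessarily how Iris implements it: both keys are hashed first so the buffers passed to `timingSafeEqual` always have equal length, which that function requires. The `apiKeyMatches` name is an assumption for illustration.

```typescript
import { createHash, timingSafeEqual } from "node:crypto";

// Hypothetical helper: compare a presented API key with the expected one
// without leaking, through timing, how many leading characters match.
function apiKeyMatches(presented: string, expected: string): boolean {
  // Hashing yields fixed-length (32-byte) buffers for both inputs.
  const a = createHash("sha256").update(presented).digest();
  const b = createHash("sha256").update(expected).digest();
  return timingSafeEqual(a, b);
}

console.log(apiKeyMatches("secret", "secret"));
console.log(apiKeyMatches("secret", "secreT"));
```

A plain `===` comparison can return early on the first mismatching character, which is the side channel a constant-time comparison is meant to close.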

If Iris is useful to you, consider starring the repo — it helps others find it.

MIT Licensed.

Server Config

{
  "mcpServers": {
    "iris": {
      "command": "npx",
      "args": [
        "-y",
        "@iris-eval/mcp-server"
      ]
    }
  }
}

Project Info

Created At: 3 months ago
Updated At: 3 months ago
Author Name: iris-eval