MCP Evals

Created by mclenhard, a year ago

A Node.js package and GitHub Action for evaluating MCP (Model Context Protocol) tool implementations using LLM-based scoring. This helps ensure your MCP server's tools are working correctly and performing well.

Tags: ai, mcp

Overview

What is MCP Evals?

MCP Evals is a Node.js package and GitHub Action that evaluates Model Context Protocol (MCP) tool implementations using LLM-based scoring. It helps you check that your MCP server's tools work correctly and perform well.

How to use MCP Evals?

You can install MCP Evals as a Node.js package or integrate it into a GitHub Actions workflow. For Node.js, run npm install mcp-evals. For GitHub Actions, add the action's YAML configuration to your workflow file.

Key features of MCP Evals

Evaluates MCP tool implementations using LLM-based scoring.
Provides detailed evaluation results including accuracy, completeness, relevance, clarity, and reasoning scores.
Automatically posts evaluation results as comments on pull requests in GitHub.
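
The five score dimensions listed above could be modeled as a simple record. The following is a minimal sketch only: the `EvalScores` type, the `averageScore` helper, and the sample values are hypothetical illustrations, and are not taken from the package's actual API or output format.

```typescript
// Hypothetical shape for one tool evaluation. Field names mirror the
// dimensions listed above, not necessarily the package's real output.
interface EvalScores {
  accuracy: number;
  completeness: number;
  relevance: number;
  clarity: number;
  reasoning: number;
}

// Average the five dimensions into a single overall score.
function averageScore(scores: EvalScores): number {
  const values = [
    scores.accuracy,
    scores.completeness,
    scores.relevance,
    scores.clarity,
    scores.reasoning,
  ];
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

// Sample values are made up for illustration.
const example: EvalScores = {
  accuracy: 4,
  completeness: 5,
  relevance: 4,
  clarity: 3,
  reasoning: 4,
};

console.log(averageScore(example)); // 4
```

A single averaged number is convenient for pass/fail thresholds, while the per-dimension values remain useful for diagnosing which aspect of a tool's response is weak.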

Use cases of MCP Evals

Ensuring the accuracy of tool implementations in MCP servers.
Automating evaluations during pull requests to maintain code quality.
Providing feedback on tool performance to developers.

FAQ about MCP Evals

Can MCP Evals be used with any MCP tool?

Yes! MCP Evals is designed to work with any tool that follows the Model Context Protocol.

Is there a specific Node.js version required?

It is recommended to use Node.js version 20 or higher.

How do I view the evaluation results?

When MCP Evals runs as a GitHub Action, the results are posted as comments on the pull request where the evaluations are run.
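
To make the idea of a results comment concrete, here is a rough sketch of how per-tool scores might be rendered as a Markdown table for a pull request comment. The `ToolResult` type, the `formatComment` function, and the `get_weather` tool name are all assumptions made for illustration; the actual comment layout produced by the GitHub Action may differ.

```typescript
// Hypothetical result record: a tool name plus named scores.
type ToolResult = { tool: string; scores: Record<string, number> };

// Render results as a Markdown table suitable for a pull request comment.
function formatComment(results: ToolResult[]): string {
  const lines = ["| Tool | Dimension | Score |", "| --- | --- | --- |"];
  for (const { tool, scores } of results) {
    for (const [dimension, score] of Object.entries(scores)) {
      lines.push(`| ${tool} | ${dimension} | ${score} |`);
    }
  }
  return lines.join("\n");
}

// Sample data is made up for illustration.
const body = formatComment([
  { tool: "get_weather", scores: { accuracy: 4, clarity: 5 } },
]);
console.log(body);
```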

Project Info

Created At: a year ago
Updated At: a year ago
Author Name: mclenhard
Language: TypeScript
License: MIT license
