MCP Evals

Created By
mclenhard, a year ago
A Node.js package and GitHub Action for evaluating MCP (Model Context Protocol) tool implementations using LLM-based scoring. This helps ensure your MCP server's tools are working correctly and performing well.
Overview

What is MCP Evals?

MCP Evals is a Node.js package and GitHub Action that scores Model Context Protocol (MCP) tool implementations with an LLM. It helps you check that your MCP server's tools work correctly and perform well.

How to use MCP Evals?

You can install MCP Evals as a Node.js package or integrate it into a GitHub Actions workflow. For Node.js, run: npm install mcp-evals. For GitHub Actions, add the YAML configuration documented by the project to your workflow file.

Key features of MCP Evals

  • Evaluates MCP tool implementations using LLM-based scoring.
  • Provides detailed evaluation results including accuracy, completeness, relevance, clarity, and reasoning scores.
  • Automatically posts evaluation results as comments on pull requests in GitHub.
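To make the score categories above concrete, here is a minimal sketch of how a set of results could be turned into a pull request comment. Only the five category names (accuracy, completeness, relevance, clarity, reasoning) come from the feature list; the `EvalScores` and `EvalResult` types, the numeric scale, the `averageScore` and `formatComment` helpers, and the table layout are all hypothetical and are not taken from the mcp-evals API.

```typescript
// Hypothetical result shape: the five category names come from the feature
// list; the field types, numeric scale, and helper names are assumptions.
interface EvalScores {
  accuracy: number;
  completeness: number;
  relevance: number;
  clarity: number;
  reasoning: number;
}

interface EvalResult {
  name: string; // label of the evaluated tool or scenario (illustrative)
  scores: EvalScores;
}

// Arithmetic mean of the five category scores.
function averageScore(scores: EvalScores): number {
  const values = [
    scores.accuracy,
    scores.completeness,
    scores.relevance,
    scores.clarity,
    scores.reasoning,
  ];
  return values.reduce((sum, value) => sum + value, 0) / values.length;
}

// Renders the results as a Markdown table suitable for a PR comment body.
function formatComment(results: EvalResult[]): string {
  const header =
    "| Eval | Accuracy | Completeness | Relevance | Clarity | Reasoning | Average |";
  const divider = "| --- | --- | --- | --- | --- | --- | --- |";
  const rows = results.map(({ name, scores }) => {
    const avg = averageScore(scores).toFixed(2);
    return `| ${name} | ${scores.accuracy} | ${scores.completeness} | ${scores.relevance} | ${scores.clarity} | ${scores.reasoning} | ${avg} |`;
  });
  return [header, divider, ...rows].join("\n");
}

// Made-up sample data, purely for illustration.
const sample: EvalResult[] = [
  {
    name: "weather_lookup",
    scores: { accuracy: 4, completeness: 5, relevance: 5, clarity: 4, reasoning: 3 },
  },
];

console.log(formatComment(sample));
```

A per-category table keeps a weak score in one area (for example, reasoning) visible even when the average looks acceptable.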

Use cases of MCP Evals

  1. Ensuring the accuracy of tool implementations in MCP servers.
  2. Automating evaluations during pull requests to maintain code quality.
  3. Providing feedback on tool performance to developers.

FAQ about MCP Evals

  • Can MCP Evals be used with any MCP tool?

Yes! MCP Evals is designed to work with any tool that follows the Model Context Protocol.

  • Is there a specific Node.js version required?

It is recommended to use Node.js version 20 or higher.
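If you want to guard against running on an older runtime, a small check along these lines may help. This is a sketch, not part of mcp-evals: `meetsMinimumNode` is a hypothetical helper, and the default threshold of 20 simply mirrors the recommendation above.

```typescript
// Hypothetical helper: returns true when a Node.js version string such as
// "20.11.1" or "v18.19.0" has a major version of at least `minMajor`.
function meetsMinimumNode(version: string, minMajor: number = 20): boolean {
  // parseInt reads the leading digits, so "20.11.1" yields 20.
  const major = Number.parseInt(version.trim().replace(/^v/, ""), 10);
  // An unparseable string yields NaN, and NaN >= minMajor is false.
  return major >= minMajor;
}

console.log(meetsMinimumNode("20.11.1")); // true
console.log(meetsMinimumNode("v18.19.0")); // false
console.log(meetsMinimumNode("not-a-version")); // false
```

In a Node.js script you could pass `process.versions.node` as the argument and exit early with a clear message when the check fails.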

  • How do I view the evaluation results?

The results are posted as comments on the pull request where the evaluations are run.

Project Info
Created At: a year ago
Updated At: a year ago
Author Name: mclenhard
Star: 30
Language: TypeScript
License: MIT license
