Search Cases · Compare Models

Created By

Luoxiaoshan, 5 months ago

Search and compare LLM test cases with real benchmark data. Find use cases by scenario, view model rankings, and evaluate performance—all through natural language.

Tags: MCP, LLM


Overview

XSCT MCP

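Let AI pick your AI: make an LLM selection decision with a single question.

Once XSCT MCP is configured, just ask your AI assistant a question. The assistant calls the tools it needs based on your question, looks up the data, calculates costs, runs comparisons, and finishes with actionable recommendations.

How to use

Ask a one-sentence question in natural language, for example:

"Which models are good for text polishing?"
"For code generation, which model offers the best value for money?"
"How do Qwen3-Max and Claude differ in creative writing?"
"Which image generation model handles Chinese best?"

The AI decides which tools to call on its own; you don't need to care about the implementation details.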

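Tool list

| Tool | Description |
| --- | --- |
| `get_leaderboard` | Query the leaderboard |
| `get_model_scores` | Query one model's scores across all dimensions |
| `compare_models` | Compare two models |
| `search_testcases` | Search test cases |
| `get_model_case_result` | Query a model's result on a specific test case |
| `get_dimensions` | List all evaluation dimensions |
| `calculate_cost` | Calculate model cost |
| `get_testcase_curl` | Generate a reproducible curl command |

These 8 tools cover leaderboard lookup, score inspection, scenario search, model comparison, and cost calculation. There is nothing to memorize: the AI calls them automatically based on your question.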
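For readers curious about what a tool invocation looks like on the wire, here is a minimal sketch of the JSON-RPC 2.0 message an MCP client sends for a `tools/call` request. The `tools/call` method and the `name`/`arguments` fields come from the MCP specification; the argument name `dimension` and its value are hypothetical, since this page does not document the tools' input schemas (a client would normally discover them via `tools/list`).

```python
import json
from itertools import count

# JSON-RPC 2.0 requests each carry an id; a simple counter is enough here.
_request_ids = count(1)


def build_tool_call(name: str, arguments: dict) -> dict:
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# "dimension" is a hypothetical argument name, used only for illustration.
request = build_tool_call("get_leaderboard", {"dimension": "polishing"})
print(json.dumps(request, ensure_ascii=False, indent=2))
```

In normal use the AI assistant's MCP client builds and sends these messages for you; the sketch only shows the shape of the payload.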

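Usage examples

Scenario 1: Simple model selection

Question: "Which models are good for text polishing?"

The AI will automatically:

1. Call search_testcases to find polishing-related test cases
2. Call get_leaderboard to fetch the leaderboard
3. Present the scenario categories and an initial recommendation

Scenario 2: Enterprise cost analysis

Question: "With 5,000 input tokens, 2,000 output tokens, 300 calls per day, and 80% of calls hitting the KV cache, which models are good choices?"

The AI will automatically:

1. Break down the calculation (cache hit rate, token cost)
2. Call calculate_cost in batch for multiple models
3. Generate a complete cost analysis report
4. Give tiered recommendations (first choice / alternative / not recommended)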
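The arithmetic behind scenario 2 can be sketched as below. This is a minimal illustration, not the logic of `calculate_cost`, which this page does not describe. It assumes that on a cache hit the whole input is billed at a cheaper cached rate, that a month has 30 days, and that prices are quoted per million tokens; the prices themselves are made-up placeholders, not real vendor rates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Hypothetical prices in USD per million tokens (placeholders only)."""
    input_per_m: float
    cached_input_per_m: float
    output_per_m: float


def monthly_cost(pricing: Pricing, input_tokens: int, output_tokens: int,
                 calls_per_day: int, cache_hit_rate: float, days: int = 30) -> float:
    """Estimate monthly spend, billing cache-hit input at the cached rate."""
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be between 0 and 1")
    calls = calls_per_day * days
    total_input = input_tokens * calls
    cached_input = total_input * cache_hit_rate   # billed at the cached rate
    fresh_input = total_input - cached_input      # billed at the full input rate
    total_output = output_tokens * calls
    return (fresh_input * pricing.input_per_m
            + cached_input * pricing.cached_input_per_m
            + total_output * pricing.output_per_m) / 1_000_000


example = Pricing(input_per_m=1.0, cached_input_per_m=0.1, output_per_m=4.0)
cost = monthly_cost(example, input_tokens=5000, output_tokens=2000,
                    calls_per_day=300, cache_hit_rate=0.8)
print(f"{cost:.2f}")  # prints 84.60
```

With these placeholder prices, output tokens dominate the bill (72 of the 84.60), so the cache hit rate matters less than the output price. That is the kind of breakdown the cost report is meant to surface.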

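Scenario 3: In-depth comparison

Question: "Compare how MIMO V2 Flash and Qwen3-Max perform on polishing test cases"

The AI will automatically call compare_models and pick representative test cases for an in-depth comparison.

Scenario 4: Generating runnable code

Question: "Generate the curl command for this test case"

The AI will call get_testcase_curl to generate a ready-to-run curl command. Swap in your own KEY and you can test it in a terminal.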

Platform

Website: xsct.ai
System prompt reference:

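# Role: XSCT-Bench Model Selection Advisor

You are an AI model selection expert grounded in real evaluation data from XSCT Arena. You use MCP tools to fetch the latest data in real time and give users precise, data-driven recommendations.

## Core principle

Never recommend on a hunch. Every suggestion you make must be backed by a tool call that fetches live data. Let the data speak.

## Tool usage strategy

**Self-discovery at startup**

At the start of each conversation, if the user asks what you can do, first call getDimensions() to learn all evaluation dimensions and test types currently supported. This tells you where the platform's evaluation coverage currently ends.

**Explore data on demand**

Do not assume any model ranking or score. When the user asks "which model is best", call getLeaderboard to get the live ranking. When the user asks about a specific model's capabilities, call getModelScores for details. The data is updated continuously, so always defer to what the tools return.

**Scenario matching strategy**

When the user describes a use case, first call searchTestcases to search for related test cases by keyword. The search results tell you which dimension and test type the scenario maps to, and that guides the leaderboard query that follows.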

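## Service workflows

**General selection workflow**

When the user says "recommend a model": first ask about the task type, usage frequency, and budget; call getLeaderboard to get the ranking for that task; call calculateCost on the candidate models to estimate cost; then recommend based on both performance and cost.

**Model comparison workflow**

When the user asks "which is better, A or B": confirm the task type being compared; call compareModels for a per-dimension comparison; if needed, call getModelScores to dig into each model's strengths and weaknesses; then advise which to choose for each scenario.

**Scenario validation workflow**

When the user wants to see real results: call searchTestcases to find relevant test cases; call getModelCaseResult to show actual performance; offer the command generated by getTestcaseCurl so the user can test it themselves.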

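**Cost optimization workflow**

When the user has a budget limit: get an estimate of their usage volume; call calculateCost in batch; filter down to the options within budget; recommend them sorted by performance.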

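## Output principles

**When recommending**: state the data source ("According to the latest leaderboard..."), give the key metrics, include a cost estimate, and offer a way to verify.

**When comparing**: show the differences on key dimensions, analyze which scenarios each model suits, and give a recommendation.

**When validating**: show the actual generated output, provide the score and the reasoning behind it, and attach a runnable test command.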

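## Cost-effectiveness calculation

Cost-effectiveness index = (overall score - baseline score) × scenario weight / log10(monthly cost + 1)

Determine the baseline score and the weight dynamically from the score distribution returned by getLeaderboard. Do not use fixed values.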

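## Honesty principle

If a tool returns empty data or an error, tell the user plainly that the information cannot be retrieved right now. Do not fabricate data, and do not answer from outdated memory. If a scenario has no corresponding evaluation, explain the limitation instead of forcing a recommendation.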
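The cost optimization workflow and the cost-effectiveness formula in the reference prompt above can be sketched together as follows. This is an illustration under stated assumptions, not the platform's implementation. The model names, scores, and costs are placeholders, and the baseline of 70 is an arbitrary stand-in, because the prompt only says the baseline and weight should be derived from the leaderboard's score distribution without saying how. One thing the formula implies: at a monthly cost of 0 the denominator log10(1) is 0, so the sketch rejects non-positive costs.

```python
import math
from typing import NamedTuple


class Candidate(NamedTuple):
    model: str           # placeholder name, not a real leaderboard entry
    score: float         # overall score, as a leaderboard tool might return it
    monthly_cost: float  # estimated monthly cost, as a cost tool might return it


def value_index(score: float, baseline: float, weight: float,
                monthly_cost: float) -> float:
    """(score - baseline) * weight / log10(monthly_cost + 1)."""
    if monthly_cost <= 0:
        # log10(0 + 1) == 0, so the index is undefined for a zero-cost model.
        raise ValueError("monthly_cost must be positive")
    return (score - baseline) * weight / math.log10(monthly_cost + 1)


def rank_within_budget(candidates: list[Candidate], budget: float) -> list[Candidate]:
    """Keep candidates within budget, then sort by score, best first."""
    affordable = [c for c in candidates if c.monthly_cost <= budget]
    return sorted(affordable, key=lambda c: c.score, reverse=True)


candidates = [
    Candidate("model-a", 88.0, 999.0),
    Candidate("model-b", 84.0, 99.0),
    Candidate("model-c", 76.0, 9.0),
]
BASELINE = 70.0  # hypothetical; the prompt asks for a dynamically derived value

for c in rank_within_budget(candidates, budget=100.0):
    index = value_index(c.score, BASELINE, weight=1.0, monthly_cost=c.monthly_cost)
    print(f"{c.model}: score={c.score:.1f}, index={index:.2f}")
```

Because the cost sits inside a logarithm, a tenfold increase in spend only adds about 1 to the denominator, so the index penalizes expensive models far more gently than a plain score-per-dollar ratio would.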

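About

XSCT Bench is an LLM evaluation platform. It aggregates evaluation data and exposes it over the MCP protocol so that AI assistants can query it directly. Choosing an LLM is essentially "information retrieval + data analysis + decision reasoning", and every one of those steps is something LLMs are good at.

Author: 洛小山 (Luoxiaoshan)

You just ask. Leave the rest to the AI.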


Server Config

{
  "mcpServers": {
    "xsct-bench": {
      "url": "https://xsct.ai/mcp"
    }
  }
}

Project Info

Created At

5 months ago

Updated At

5 months ago

Author Name

Luoxiaoshan
