Sail MCP Server for Spark SQL

Created By
lakehq, a year ago
Sail is an open-source computation framework that serves as a drop-in replacement for Apache Spark (SQL and DataFrame API) in both single-host and distributed settings. The built-in MCP server in Sail exposes tools for LLM agents to register datasets and execute Spark SQL queries.
Overview

What is Sail?

Sail is a unified platform designed for stream processing, batch processing, and compute-intensive workloads, including AI tasks. It serves as a drop-in replacement for Spark SQL and the Spark DataFrame API, functioning in both single-host and distributed environments.

How to use Sail?

To use Sail, install it from PyPI with pip install "pysail[spark]", or build it from source for optimized performance. Then start the Sail server from the command line or through the Python API, or deploy it on Kubernetes for distributed processing.
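
The Python API route can be sketched as follows. This is a minimal example, not a definitive recipe: it assumes pysail and PySpark are installed, that the SparkConnectServer class in pysail.spark behaves as described in the Sail README (this may differ between releases), and that port 50051 is free. The remote_url and run_demo names are illustrative helpers, not part of Sail.

```python
def remote_url(host: str = "localhost", port: int = 50051) -> str:
    """Build a Spark Connect URL (sc:// scheme) pointing at a Sail server."""
    return f"sc://{host}:{port}"


def run_demo(port: int = 50051) -> None:
    """Start a local Sail server in the background and run one query."""
    # Imported lazily so remote_url() works without pysail/pyspark installed.
    # Assumption: these imports match the installed pysail release.
    from pysail.spark import SparkConnectServer
    from pyspark.sql import SparkSession

    server = SparkConnectServer(port=port)
    server.start(background=True)  # keep the server running in the background

    # Existing PySpark code only needs the remote URL to target Sail.
    spark = SparkSession.builder.remote(remote_url(port=port)).getOrCreate()
    try:
        spark.sql("SELECT 1 + 1 AS two").show()
    finally:
        spark.stop()
        server.stop()
```

Calling run_demo() starts the server, runs a single query, and shuts everything down. For a long-running server, the command-line route described above is likely the better fit.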

Key features of Sail

  • Unified processing for stream, batch, and AI workloads.
  • Drop-in replacement for Spark SQL and DataFrame API.
  • Supports local and distributed server setups.
  • Easy integration with PySpark.

Use cases of Sail

  1. Real-time data analytics and processing.
  2. Batch processing of large datasets.
  3. AI model training and inference in a distributed environment.

FAQ about Sail

  • Is Sail compatible with existing Spark applications?

Yes! Sail is designed to be a drop-in replacement for Spark SQL and the Spark DataFrame API.

  • Can I run Sail on Kubernetes?

Yes! Sail can be deployed on Kubernetes for distributed processing.

  • What support options are available for Sail?

LakeSail offers flexible enterprise support options for Sail.

Server Config

{
  "mcpServers": {
    "sail": {
      "command": "sail",
      "args": [
        "spark",
        "mcp-server",
        "--transport",
        "stdio"
      ]
    }
  }
}
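
MCP clients usually keep several servers in one JSON file, so the entry above is often merged into an existing config rather than pasted over it. The sketch below shows one way to do that with the Python standard library. The add_sail_server helper and the config path are hypothetical; where the file lives depends on the client.

```python
import json
from pathlib import Path

# Mirrors the "sail" entry from the Server Config above.
SAIL_ENTRY = {
    "command": "sail",
    "args": ["spark", "mcp-server", "--transport", "stdio"],
}


def add_sail_server(config_path: Path, name: str = "sail") -> dict:
    """Merge the Sail entry into an MCP client config, keeping other servers."""
    # Start from the existing file if there is one, else from an empty config.
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    config.setdefault("mcpServers", {})[name] = dict(SAIL_ENTRY)
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

For example, add_sail_server(Path("mcp.json")) creates or updates mcp.json in the current directory. The file name is only a placeholder for whatever path the client actually reads.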
Project Info
Created At
a year ago
Updated At
a year ago
Author Name
lakehq