Local Speech-to-Text MCP Server

Created By
SmartLittleApps, a year ago
A high-performance Model Context Protocol (MCP) server providing local speech-to-text transcription using whisper.cpp, optimized for Apple Silicon.
Overview

What is Local Speech-to-Text MCP Server?

Local Speech-to-Text MCP Server is a high-performance Model Context Protocol (MCP) server that provides local speech-to-text transcription using whisper.cpp, specifically optimized for Apple Silicon devices.

How to use Local Speech-to-Text MCP Server?

To use the server, clone the repository from GitHub, install the necessary dependencies, and configure your MCP client to connect to the server. Once connected, the client can request transcriptions of audio files in formats such as WAV, FLAC, MP3, and M4A.
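As a rough illustration of the client-configuration step: many MCP clients (Claude Desktop, for example) read a JSON file containing an `mcpServers` map, where each entry names a command and its arguments. The sketch below builds such an entry in TypeScript. The server key `local-stt` and the entry-point path are hypothetical placeholders, not values taken from this project; check the repository's README for the real ones.

```typescript
// Shape of a stdio server entry as used by many MCP client config files.
interface StdioServerEntry {
  command: string;
  args: string[];
}

interface McpClientConfig {
  mcpServers: Record<string, StdioServerEntry>;
}

// Build a client config that launches the server with Node.
// "local-stt" is a hypothetical key chosen for this example.
function buildClientConfig(entryPoint: string): McpClientConfig {
  return {
    mcpServers: {
      "local-stt": {
        command: "node",
        args: [entryPoint],
      },
    },
  };
}

// Placeholder path: replace it with the built entry point of your clone.
const config = buildClientConfig("/path/to/server/dist/index.js");
console.log(JSON.stringify(config, null, 2));
```

The printed JSON can be pasted into the client's configuration file, merging it with any `mcpServers` entries already present.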

What are the key features of Local Speech-to-Text MCP Server?

  • 100% Local Processing for complete privacy
  • Optimized for Apple Silicon with 15x+ real-time transcription speed
  • Speaker Diarization to identify and separate multiple speakers
  • Universal Audio Support with automatic conversion from various formats
  • Multiple Output Formats including txt, json, vtt, srt, csv
  • Low Memory Footprint of less than 2 GB
  • Full TypeScript support for modern development
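Of the output formats listed above, srt and vtt are both subtitle formats and differ mainly in their header and timestamp syntax: SRT numbers each cue and uses a comma before the milliseconds, while WebVTT starts with a `WEBVTT` line and uses a period. The following is a minimal sketch of that difference, assuming a hypothetical `Segment` type with millisecond offsets. It is not the server's actual implementation.

```typescript
// Hypothetical segment shape; the server's real output type may differ.
interface Segment {
  startMs: number;
  endMs: number;
  text: string;
}

// Left-pad a non-negative integer with zeros to the given width.
function pad(n: number, width: number): string {
  let s = String(n);
  while (s.length < width) s = "0" + s;
  return s;
}

// Format milliseconds as HH:MM:SS<sep>mmm ("," for SRT, "." for WebVTT).
function formatTimestamp(ms: number, separator: "," | "."): string {
  const total = Math.max(0, Math.round(ms));
  const hours = Math.floor(total / 3600000);
  const minutes = Math.floor((total % 3600000) / 60000);
  const seconds = Math.floor((total % 60000) / 1000);
  const millis = total % 1000;
  return `${pad(hours, 2)}:${pad(minutes, 2)}:${pad(seconds, 2)}${separator}${pad(millis, 3)}`;
}

// SRT: numbered cues separated by blank lines.
function toSrt(segments: Segment[]): string {
  return segments
    .map(
      (s, i) =>
        `${i + 1}\n${formatTimestamp(s.startMs, ",")} --> ${formatTimestamp(s.endMs, ",")}\n${s.text}\n`
    )
    .join("\n");
}

// WebVTT: a "WEBVTT" header, a blank line, then unnumbered cues.
function toVtt(segments: Segment[]): string {
  const cues = segments.map(
    (s) => `${formatTimestamp(s.startMs, ".")} --> ${formatTimestamp(s.endMs, ".")}\n${s.text}\n`
  );
  return ["WEBVTT\n", ...cues].join("\n");
}

const demo: Segment[] = [
  { startMs: 0, endMs: 1500, text: "Hello" },
  { startMs: 1500, endMs: 4000, text: "World" },
];
console.log(toSrt(demo));
console.log(toVtt(demo));
```

Keeping the timestamp formatter shared between the two writers means a single segment list can feed both subtitle formats without duplicating the time arithmetic.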

What are the use cases of Local Speech-to-Text MCP Server?

  1. Transcribing meetings or lectures for documentation.
  2. Creating subtitles for videos from audio content.
  3. Assisting in accessibility by providing text for spoken content.

FAQ about Local Speech-to-Text MCP Server

  • Is the transcription process cloud-based?

No, all processing is done locally, ensuring privacy.

  • What audio formats are supported?

The server supports WAV, FLAC, MP3, M4A, and more, with automatic conversion capabilities.

  • Do I need a HuggingFace account for speaker diarization?

Yes, a HuggingFace token is required for speaker diarization functionality.

Project Info

  • Created At: a year ago
  • Updated At: a year ago
  • Author Name: SmartLittleApps
  • Stars: 0
  • Language: TypeScript
  • License: MIT license