Claude Desktop Real-time Audio MCP Server (Python Implementation)

Created By
joelfuller2016 (a year ago)
A Python-based Model Context Protocol (MCP) server that provides real-time microphone input to Claude Desktop on Windows. It combines FastMCP, sounddevice, and multiple speech-to-text (STT) engines, targeting sub-500 ms latency for voice conversations.
Overview

What is Claude Desktop Real-time Audio MCP Server?

Claude Desktop Real-time Audio MCP Server is a Python server that captures microphone audio in real time and makes it available to Claude Desktop on Windows, enabling low-latency voice conversations.

How do I use Claude Desktop Real-time Audio MCP Server?

To use the server, clone the repository, set up a virtual environment, install dependencies, configure your audio settings and STT engines, and run the server.
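As a rough illustration of the configuration step, the sketch below layers built-in defaults, an optional JSON file, and environment variables, with environment variables taking the highest priority. It uses only the standard library; YAML support would need a third-party parser and is omitted. All option names (`sample_rate`, `block_size`, `channels`, `stt_engine`) and the `AUDIO_MCP_*` variable names are hypothetical and are not taken from the project itself.

```python
import json
import os
from typing import Any, Dict, Mapping, Optional

# Hypothetical defaults; the real project's option names may differ.
DEFAULTS: Dict[str, Any] = {
    "sample_rate": 16000,  # Hz
    "block_size": 1024,    # frames per audio callback
    "channels": 1,
    "stt_engine": "whisper",
}

# Hypothetical environment variables, mapped to (config key, type converter).
ENV_OVERRIDES = {
    "AUDIO_MCP_SAMPLE_RATE": ("sample_rate", int),
    "AUDIO_MCP_BLOCK_SIZE": ("block_size", int),
    "AUDIO_MCP_CHANNELS": ("channels", int),
    "AUDIO_MCP_STT_ENGINE": ("stt_engine", str),
}


def load_config(path: Optional[str] = None,
                env: Optional[Mapping[str, str]] = None) -> Dict[str, Any]:
    """Merge defaults < JSON file (if it exists) < environment variables."""
    config = dict(DEFAULTS)  # copy so DEFAULTS is never mutated
    if path is not None and os.path.exists(path):
        with open(path, "r", encoding="utf-8") as fh:
            config.update(json.load(fh))
    env = os.environ if env is None else env
    for var, (key, cast) in ENV_OVERRIDES.items():
        if var in env:
            config[key] = cast(env[var])  # env values are strings; convert them
    return config
```

For example, `load_config("config.json")` returns the defaults when the file is missing, and setting `AUDIO_MCP_STT_ENGINE=azure` in the environment would override whatever engine the file selects.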

What are the key features of Claude Desktop Real-time Audio MCP Server?

  • Real-time audio capture with sub-500ms latency.
  • Supports multiple speech-to-text engines including OpenAI Whisper, Azure Speech, and Google Speech-to-Text.
  • Easy configuration through JSON/YAML files and environment variables.
  • Comprehensive logging and performance monitoring.
  • Async architecture for non-blocking operations.
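The async architecture and latency target can be illustrated with a common pattern: an audio driver delivers blocks on its own thread, and each block is handed to an asyncio event loop with `loop.call_soon_threadsafe`, so downstream processing never blocks capture. In the sketch below, a plain thread producing placeholder data stands in for the real capture callback (for example, a callback passed to `sounddevice.InputStream`) so that it runs without a microphone. The sample rate, block size, and function names are assumptions, not the project's actual code. The helper `block_latency_ms` shows why block size matters for a sub-500 ms budget: one 1024-frame block at 16 kHz already represents 64 ms of audio.

```python
import asyncio
import threading
import time
from typing import List

SAMPLE_RATE = 16000  # Hz (assumed)
BLOCK_SIZE = 1024    # frames per block (assumed)
BYTES_PER_FRAME = 2  # 16-bit mono PCM (assumed)


def block_latency_ms(block_size: int, sample_rate: int) -> float:
    """Duration of one audio block in milliseconds, a floor on capture latency."""
    return 1000.0 * block_size / sample_rate


def fake_capture_thread(loop: asyncio.AbstractEventLoop,
                        queue: "asyncio.Queue[bytes]", n_blocks: int) -> None:
    """Stands in for an audio driver's callback thread."""
    for i in range(n_blocks):
        block = bytes([i % 256]) * (BLOCK_SIZE * BYTES_PER_FRAME)  # placeholder payload
        # asyncio.Queue is not thread-safe, so schedule the put on the loop's thread.
        loop.call_soon_threadsafe(queue.put_nowait, block)
        time.sleep(0.001)
    loop.call_soon_threadsafe(queue.put_nowait, b"")  # empty bytes = end of stream


async def consume(queue: "asyncio.Queue[bytes]") -> List[int]:
    """Consumer coroutine; here it only records block sizes instead of running STT."""
    sizes = []
    while True:
        block = await queue.get()
        if not block:
            break
        sizes.append(len(block))
    return sizes


async def main(n_blocks: int = 5) -> List[int]:
    loop = asyncio.get_running_loop()
    queue: "asyncio.Queue[bytes]" = asyncio.Queue()
    producer = threading.Thread(target=fake_capture_thread,
                                args=(loop, queue, n_blocks), daemon=True)
    producer.start()
    sizes = await consume(queue)
    producer.join()  # the thread has already sent the sentinel, so this returns quickly
    return sizes


if __name__ == "__main__":
    print(f"block latency: {block_latency_ms(BLOCK_SIZE, SAMPLE_RATE):.0f} ms")
    print(asyncio.run(main()))
```

Smaller blocks reduce the per-block delay at the cost of more frequent callbacks; the STT engine's own processing time comes on top of it.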

What are the use cases of Claude Desktop Real-time Audio MCP Server?

  1. Enabling voice-driven interactions with Claude Desktop.
  2. Real-time transcription of spoken language into text.
  3. Voice activity detection to separate speech from silence before further processing.
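Voice activity detection can be implemented in many ways, and the project may rely on a dedicated VAD library. The sketch below swaps in the simplest approach, an RMS energy threshold with a short "hangover", purely to illustrate the idea. It operates on frames of 16-bit PCM samples given as integer sequences; the -40 dBFS threshold and the three-frame hangover are arbitrary assumptions.

```python
import math
from typing import List, Sequence


def rms_dbfs(samples: Sequence[int]) -> float:
    """RMS level of 16-bit PCM samples in dB relative to full scale (32768)."""
    if not samples:
        return float("-inf")
    mean_sq = sum(s * s for s in samples) / len(samples)
    if mean_sq == 0:
        return float("-inf")  # digital silence
    return 20.0 * math.log10(math.sqrt(mean_sq) / 32768.0)


def detect_speech(frames: Sequence[Sequence[int]],
                  threshold_dbfs: float = -40.0,
                  hangover: int = 3) -> List[bool]:
    """Mark a frame as speech if it is loud enough, and keep the flag on for
    `hangover` quiet frames afterwards so short pauses don't split an utterance."""
    flags = []
    quiet_run = hangover + 1  # start in the "silent" state
    for frame in frames:
        if rms_dbfs(frame) >= threshold_dbfs:
            quiet_run = 0
        else:
            quiet_run += 1
        flags.append(quiet_run <= hangover)
    return flags
```

Only frames flagged `True` would then be forwarded to an STT engine, which saves both latency and API cost on silent input.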

Frequently asked questions about Claude Desktop Real-time Audio MCP Server

  • What platforms does it support?

    It supports Windows 10/11 and requires Python 3.8 or higher.

  • Is it free to use?

    Yes, it is open-source and available under the MIT License.

  • How can I contribute?

    Contributions are welcome, especially in areas like additional STT engines and cross-platform support.
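For contributors considering a new STT engine, one common way to structure pluggable backends is an abstract base class plus a registry keyed by the engine name used in configuration. The sketch below shows that pattern; `STTEngine`, `register_engine`, `create_engine`, and the toy `EchoLengthEngine` are hypothetical names, not the project's actual interface, so check the repository before writing a real engine.

```python
from abc import ABC, abstractmethod
from typing import Dict, Type


class STTEngine(ABC):
    """Hypothetical base class for a pluggable speech-to-text backend."""

    name: str = ""  # identifier used to select the engine, e.g. from config

    @abstractmethod
    def transcribe(self, pcm16: bytes, sample_rate: int) -> str:
        """Return the transcript for a chunk of 16-bit mono PCM audio."""


_REGISTRY: Dict[str, Type[STTEngine]] = {}


def register_engine(cls: Type[STTEngine]) -> Type[STTEngine]:
    """Class decorator that makes an engine selectable by its `name`."""
    if not cls.name:
        raise ValueError(f"{cls.__name__} must define a non-empty `name`")
    _REGISTRY[cls.name] = cls
    return cls


def create_engine(name: str) -> STTEngine:
    """Instantiate a registered engine, with a helpful error for unknown names."""
    try:
        return _REGISTRY[name]()
    except KeyError:
        raise ValueError(
            f"unknown STT engine {name!r}; available: {sorted(_REGISTRY)}"
        ) from None


@register_engine
class EchoLengthEngine(STTEngine):
    """Toy engine for tests: reports how much audio it received, not real text."""

    name = "echo-length"

    def transcribe(self, pcm16: bytes, sample_rate: int) -> str:
        seconds = len(pcm16) / 2 / sample_rate  # 2 bytes per 16-bit sample
        return f"<{seconds:.2f}s of audio>"
```

A real engine would wrap a vendor SDK inside `transcribe`; keeping the interface this small makes it easy to swap engines through configuration.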

Project Info
Created At
a year ago
Updated At
a year ago
Author Name
joelfuller2016
Star
0
Language
Python
License
MIT license
