Gemini OCR MCP
This project provides an OCR (Optical Character Recognition) service through a FastMCP server, backed by the Google Gemini API. It extracts text from an image supplied either as a file path or as a base64-encoded string.
Overview
Gemini OCR MCP Server is written in Python and runs as a FastMCP server. It exposes its OCR functionality as tools that an MCP client can call programmatically.
Key Features
- File-based OCR: Extract text from local image files.
- Base64 OCR: Extract text from images provided as base64-encoded strings.
- Google Gemini Integration: Uses Google Gemini models for text recognition; the model is selected through an environment variable.
- Easy Integration: Can be configured as a server in a parent MCP application.
How It Works
- The server exposes two main tools:
  - ocr_image_file: Accepts a file path, reads the image, and returns the extracted text.
  - ocr_image_base64: Accepts a base64-encoded image string and returns the extracted text.
- Both tools call the Google Gemini API to perform OCR, so a valid API key (GEMINI_API_KEY) and a model name (GEMINI_MODEL) must be set as environment variables.
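The input handling described above can be sketched with the standard library alone. This is a minimal illustration, not the project's actual code: the helper names `load_settings`, `image_bytes_from_file`, and `image_bytes_from_base64` are hypothetical, and the Gemini call itself is deliberately left out. It only shows how both tools could reduce their input to raw image bytes and check the two environment variables before any API request is made.

```python
import base64
import binascii
import os
from pathlib import Path


def load_settings(env=None):
    """Return (api_key, model) from the environment, failing early if unset."""
    env = os.environ if env is None else env
    required = ("GEMINI_API_KEY", "GEMINI_MODEL")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return env["GEMINI_API_KEY"], env["GEMINI_MODEL"]


def image_bytes_from_file(path):
    """Hypothetical helper for the file-based tool: read raw image bytes."""
    file_path = Path(path)
    if not file_path.is_file():
        raise FileNotFoundError(f"No image file at {file_path}")
    return file_path.read_bytes()


def image_bytes_from_base64(data):
    """Hypothetical helper for the base64 tool: decode a string to image bytes."""
    # Tolerate an optional data URL prefix such as "data:image/png;base64,".
    if data.startswith("data:") and "," in data:
        data = data.split(",", 1)[1]
    try:
        # validate=True rejects characters outside the base64 alphabet.
        return base64.b64decode(data, validate=True)
    except binascii.Error as exc:
        raise ValueError("Input is not valid base64") from exc
```

Under these assumptions, both tools would hand the same kind of value (image bytes) to a single OCR routine, so the Gemini request would only need to be written once.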
Usage
- Clone the repository and install dependencies.
- Set up your environment with the required Google Gemini API credentials.
- Run the server or integrate it into your MCP configuration.
- Use the provided tools to extract text from images either by file path or base64 string.
Example
Extract the text from an image (e.g., a CAPTCHA) and convert it to plain text using the provided tools.
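As a rough illustration of what a client sends for each tool, the sketch below builds MCP `tools/call` requests as plain dictionaries. The tool names come from this README; the argument keys `image_path` and `base64_image`, the sample file path, and the helper names are assumptions made for the example, so check the tool schemas the server actually reports before relying on them.

```python
import base64
import json


def encode_image(image_bytes):
    """Encode raw image bytes as an ASCII base64 string."""
    return base64.b64encode(image_bytes).decode("ascii")


def build_tool_call(tool_name, arguments, request_id=1):
    """Build an MCP `tools/call` JSON-RPC request as a dict."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Placeholder bytes (the PNG file signature) stand in for a real image.
sample_bytes = b"\x89PNG\r\n\x1a\n"

# Argument keys are hypothetical; the real server may name them differently.
file_request = build_tool_call(
    "ocr_image_file", {"image_path": "/path/to/captcha.png"}
)
b64_request = build_tool_call(
    "ocr_image_base64", {"base64_image": encode_image(sample_bytes)}, request_id=2
)

print(json.dumps(file_request, indent=2))
print(json.dumps(b64_request, indent=2))
```

In practice an MCP client library or host application builds and sends these requests for you; the dictionaries are shown only to make the two input styles concrete.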
Server Config
{
  "mcpServers": {
    "gemini-ocr-mcp": {
      "command": "uv",
      "args": [
        "--directory",
        "/path/to/your/project/gemini-ocr-mcp",
        "run",
        "gemini-ocr-mcp.py"
      ],
      "env": {
        "GEMINI_MODEL": "gemini-2.5-flash-preview-05-20",
        "GEMINI_API_KEY": "YOUR_GEMINI_API_KEY"
      }
    }
  }
}
Project Info
Created At: a year ago
Updated At: a year ago
Author Name: WindoCStar