Sifter - Turn a folder of documents into typed records you can query

Created By
Bruno Fortunato - sifter-ai, 4 hours ago
Sifter extracts structured, typed records from your documents (PDFs, scans, contracts, invoices) using a natural-language field spec, then lets an agent query and aggregate them: exact counts, sums, and filters, with citations back to the source page. Unlike RAG, it answers collection-wide questions, not just "find the passage."
Overview

Sifter MCP Server

Turn a folder of documents into a database your agent can query.

RAG is great at finding a passage. It can't answer the questions people actually ask about a pile of documents — "how many invoices are unpaid", "total billed to this client this year", "which contracts expire in the next 90 days". Those are aggregations over the whole collection, and top-k retrieval only ever sees a handful of docs.

Sifter takes a different path: it extracts every document into a typed record (you describe the fields in plain language, the schema is inferred), then exposes them over MCP so your agent can query and aggregate them — exact counts, sums, filters, group-bys — with every field cited back to its source page. Not a paragraph. A figure.
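To make the contrast with top-k retrieval concrete, here is a minimal sketch of what "aggregate over typed records" means. It does not call Sifter; the `Invoice` fields and the sample data are made up for illustration, and the real record shape may differ.

```python
from collections import defaultdict
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Invoice:
    """Hypothetical typed record; field names are illustrative only."""
    client: str
    total: Decimal
    paid: bool


# Made-up sample data standing in for extracted records.
RECORDS = [
    Invoice("Acme", Decimal("1200.00"), paid=True),
    Invoice("Acme", Decimal("300.50"), paid=False),
    Invoice("Globex", Decimal("980.00"), paid=False),
    Invoice("Initech", Decimal("45.25"), paid=True),
]


def unpaid_summary(records):
    """Return (count, sum) over every unpaid record, not a sample."""
    unpaid = [r for r in records if not r.paid]
    return len(unpaid), sum((r.total for r in unpaid), Decimal("0"))


def billed_per_client(records):
    """Group by client and sum totals, highest first."""
    totals = defaultdict(Decimal)  # Decimal() is Decimal("0")
    for r in records:
        totals[r.client] += r.total
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
```

Because every record is visited, the count and the sum are exact. A retriever that returns only the top few passages has no way to produce either number.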

What the agent can do

  • Create a sift — define an extraction in natural language (e.g. "from invoices: client, date, total — skip anything that isn't an invoice").
  • Upload documents — PDFs, scans, contracts, receipts, images.
  • List & filter records — typed fields, real filters.
  • Aggregate — counts, sums, group-bys over all records, not a sample.
  • Get citations — trace any value back to its source document, page, and bounding box.
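One way to picture the citation capability is a value that carries its provenance with it. The sketch below is an assumption about shape only: `Citation`, `CitedValue`, the 1-based page number, and the bounding-box ordering are hypothetical names and conventions, not Sifter's documented schema.

```python
from dataclasses import dataclass
from typing import Generic, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class Citation:
    """Hypothetical provenance for one extracted field."""
    document: str  # source file name
    page: int      # assumed to be 1-based
    bbox: tuple[float, float, float, float]  # assumed (x0, y0, x1, y1)


@dataclass(frozen=True)
class CitedValue(Generic[T]):
    """A typed value paired with where it was found."""
    value: T
    citation: Citation


def describe(field: str, cited: CitedValue) -> str:
    """Render a field together with its source location."""
    c = cited.citation
    return f"{field} = {cited.value!r} (from {c.document}, page {c.page}, box {c.bbox})"
```

With a structure like this, an agent can report a figure and point at the exact region of the page it came from, rather than quoting a paragraph.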

Connect

Remote (hosted, zero install — Starter+)

{
  "mcpServers": {
    "sifter": {
      "url": "https://api.sifter.run/mcp",
      "headers": { "Authorization": "Bearer sk-..." }
    }
  }
}

Get an API key at sifter.run under API Keys. The remote endpoint is a Starter+ feature; free-plan keys receive HTTP 402 (Payment Required) on tool calls.

Local (self-host, free, MIT — bring your own model)

{
  "mcpServers": {
    "sifter": {
      "command": "uvx",
      "args": ["sifter-mcp", "--base-url", "http://localhost:8000"],
      "env": { "SIFTER_API_KEY": "sk-..." }
    }
  }
}

Run the open-source engine with docker compose up -d and point the server at your instance. Local models work — the LLM is only the extractor, so nothing has to leave your machine.
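If you would rather not paste a key into a config file by hand, the two configurations above can be generated. This is a small sketch: `remote_config` and `local_config` are hypothetical helper names, and the script only reproduces the JSON shown in this section, reading the key from the `SIFTER_API_KEY` environment variable.

```python
import json
import os


def remote_config(api_key: str) -> dict:
    """Build the hosted-endpoint config shown above."""
    return {"mcpServers": {"sifter": {
        "url": "https://api.sifter.run/mcp",
        "headers": {"Authorization": f"Bearer {api_key}"},
    }}}


def local_config(api_key: str, base_url: str = "http://localhost:8000") -> dict:
    """Build the self-hosted config shown above."""
    return {"mcpServers": {"sifter": {
        "command": "uvx",
        "args": ["sifter-mcp", "--base-url", base_url],
        "env": {"SIFTER_API_KEY": api_key},
    }}}


if __name__ == "__main__":
    # Falls back to the placeholder from the examples if the variable is unset.
    key = os.environ.get("SIFTER_API_KEY", "sk-...")
    print(json.dumps(local_config(key), indent=2))
```

Where the printed JSON goes depends on your MCP client, so the script writes to standard output rather than guessing a path.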

Try it

  • "How much have we invoiced per client this year, highest first?"
  • "What's the total unpaid across all invoices?"
  • "Which contracts expire in the next 90 days?"

Each runs as a real query over every record and returns an exact answer, traceable to the source.
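As an illustration of what "a real query over every record" amounts to, the contract-expiry prompt above reduces to a date-range filter. The sketch below assumes a hypothetical `Contract` record with an `expires` date; it is not Sifter's query engine, only the logic such a query has to implement.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Contract:
    """Hypothetical typed record; field names are illustrative only."""
    name: str
    expires: date


def expiring_within(contracts, today, days=90):
    """Return contracts expiring from today through today + days, soonest first."""
    cutoff = today + timedelta(days=days)
    # Already-expired contracts are excluded; the cutoff day itself is included.
    hits = [c for c in contracts if today <= c.expires <= cutoff]
    return sorted(hits, key=lambda c: c.expires)
```

Passing `today` explicitly, instead of calling `date.today()` inside the function, keeps the result reproducible and makes the boundary cases easy to check.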

Tags: document-extraction · structured-data · rag · pdf · ocr · data · agents · invoices · self-hosted

Server Config

{
  "mcpServers": {
    "sifter": {
      "command": "uvx",
      "args": [
        "sifter-mcp",
        "--base-url",
        "https://api.sifter.run/api"
      ],
      "env": {
        "SIFTER_API_KEY": "sk-..."
      }
    }
  }
}
Project Info
Created At: 4 hours ago
Updated At: 4 hours ago
Author Name: Bruno Fortunato - sifter-ai