flatten-mcp

Created by shayaShav, a day ago

An MCP server that flattens Claude Code sessions — keeping every prompt and event verbatim while reclaiming context tokens, so you resume the exact same raw conversation at a lower token count instead of compacting it into a lossy summary. It moves bulky tool output (large file reads, command logs, base64 screenshots) into a sidecar file, leaving a tiny retrievable reference in its place. Crash-safe, idempotent, and fully reversible. Real example from the README: a 317,236-token session flattened to 182,287 tokens.

Tags: mcp, claude-code

Resume the exact same conversation at a lower token cost — without compacting it into a lossy summary.

flatten-mcp is a Model Context Protocol server for Claude Code. It shrinks a session's token footprint by moving bulky tool output (large file reads, command logs, base64 screenshots) out of the conversation and into a sidecar file — leaving a tiny, retrievable reference in its place. Your prompts and the chronological flow of the session are preserved verbatim — those lines are never rewritten. You resume the same raw conversation; it just costs less to carry.

See how 317,236 tokens turned into 182,287:

https://github.com/user-attachments/assets/4672b3cd-f78f-4146-97ba-e0077b655381

Why flatten instead of compact?

The standard answer to a full context window is compaction: the model reads the whole conversation and rewrites it into a shorter summary. That summary is lossy by construction — an interpretation of your history, and interpretations drift, smooth over the awkward parts, and quietly drop the detail you didn't know you'd need. But the history is exactly what's worth keeping verbatim: the words you typed at 2 a.m., the precise order of events, the dead ends and the decisions. A fuzzy, half-formed prompt carries more raw truth about your intent than any tidy paragraph written about it after the fact — and preserving it untouched is the foundation of trust in a coding agent.

Flattening is the opposite move. It changes nothing about what was said. In most sessions the model reads a lot — large files, long logs, multiple sources — and keeps every byte of it in context, even though it has nearly always already written down the conclusion in plain prose: the one line that mattered in a 2 MB log, the finding distilled from five files, the running tally of open tasks. The raw source has done its job. Flattening lifts those already-summarized blocks out and swaps each for a lightweight reference ID — so starting cold from a flattened session is usually smooth sailing, and on the rare occasion the raw bytes are needed, they're one retrieve_flattened call away.

What sits in the context window:

   USER         "fix the crash"
   ASSISTANT    reading the logs…
   TOOL_RESULT  ▓▓▓ 2 MB log dump ▓▓▓        ← bulk; already summarized in prose below
   ASSISTANT    "the OOM is at line 88,402 — the fix is …"

After flatten — same words, only the bulk set aside:

   USER         "fix the crash"
   ASSISTANT    reading the logs…
   TOOL_RESULT  [FLATTENED id=… → sidecar]   ← one marker; fetch the full dump on demand
   ASSISTANT    "the OOM is at line 88,402 — the fix is …"
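The before/after picture above can be sketched in a few lines. This is a simplified in-memory model, not the server's actual code: the `Entry` shape, the `flatten` and `unflatten` function names, and the shortened marker text are all assumptions made for illustration (the real session format is described in docs/ARCHITECTURE.md).

```typescript
// Hypothetical, simplified session entry; the real JSONL schema differs.
interface Entry {
  role: "user" | "assistant" | "tool_result";
  id?: string; // tool-use id, only present on tool results
  text: string;
}

const MARKER_PREFIX = "[FLATTENED ";

// Move tool results of at least minSize characters into the sidecar map,
// leaving a short marker behind. Prompts and assistant prose are never touched.
function flatten(
  entries: Entry[],
  minSize: number,
  sidecar: Map<string, string> = new Map<string, string>(),
): { session: Entry[]; sidecar: Map<string, string> } {
  const session = entries.map((e) => {
    if (e.role !== "tool_result" || e.id === undefined) return e;
    if (e.text.startsWith(MARKER_PREFIX)) return e; // already flattened: skip
    if (e.text.length < minSize) return e; // too small to be worth moving
    sidecar.set(e.id, e.text); // store the original before replacing it
    return { ...e, text: `${MARKER_PREFIX}id=${e.id} | text ${e.text.length} chars]` };
  });
  return { session, sidecar };
}

// Reverse the operation: put every stored original back in place.
function unflatten(entries: Entry[], sidecar: Map<string, string>): Entry[] {
  return entries.map((e) => {
    if (e.role !== "tool_result" || e.id === undefined) return e;
    const original = sidecar.get(e.id);
    if (original === undefined || !e.text.startsWith(MARKER_PREFIX)) return e;
    return { ...e, text: original };
  });
}
```

Running `flatten` twice over the same entries yields the same result as running it once, and `unflatten` returns the original entries unchanged, which mirrors the idempotent and reversible behavior the README describes.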

What you'll actually save

Token reduction depends entirely on what the session did:

Read-heavy sessions (lots of large files, logs, or screenshots in context) — expect reductions up to ~50%.
Prose-heavy sessions (little external data ingested) — savings are negligible. There's simply not much bulk to move.
It varies a lot — often a pleasant surprise, and once in a while a touch underwhelming.
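As a worked example of the arithmetic, the README's own figures (317,236 tokens before, 182,287 after) come out to roughly a 42.5% reduction. The `reduction` helper below is a hypothetical name used only for this calculation.

```typescript
// Compute absolute and relative token savings for one flatten run.
function reduction(before: number, after: number): { saved: number; percent: number } {
  if (before <= 0) throw new Error("before must be positive");
  const saved = before - after;
  return { saved, percent: (saved / before) * 100 };
}

const readmeExample = reduction(317_236, 182_287);
console.log(readmeExample.saved, readmeExample.percent.toFixed(1)); // prints: 134949 42.5
```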

When to reach for it. A common point to flatten is around 200k tokens. For critical sessions where you want the model at its sharpest and most context-aware, flattening around 250k–300k is where the most dramatic reductions tend to show up.

Flatten smartly: just as you wouldn't compact mid-way through a large reading task, don't flatten while the model is still working through freshly read material. That said, nothing is ever lost, so flattening everything and then cherry-picking the few blocks you still need is a perfectly legitimate strategy.

Quick start

Requires Node.js ≥ 18 and Claude Code.

One command — installs from npm and registers it user-wide:

claude mcp add flatten -s user -- npx -y flatten-mcp@latest

Or register it manually (in ~/.claude.json, or your project's .mcp.json):

{
  "mcpServers": {
    "flatten": {
      "command": "npx",
      "args": ["-y", "flatten-mcp@latest"]
    }
  }
}

Recommended — install the /flatten slash command:

curl -fsSL https://raw.githubusercontent.com/shayaShav/flatten-mcp/main/commands/flatten.md -o ~/.claude/commands/flatten.md

From source (for development)

git clone https://github.com/shayaShav/flatten-mcp.git
cd flatten-mcp
npm install      # builds automatically via the "prepare" script
cp commands/flatten.md ~/.claude/commands/   # optional: installs the /flatten command

Then register the local build (in ~/.claude.json, or your project's .mcp.json):

{
  "mcpServers": {
    "flatten": {
      "command": "node",
      "args": ["/absolute/path/to/flatten-mcp/dist/index.js"]
    }
  }
}

Configuration

By default the server operates on the project the CLI runs in (its current working directory). Pass project_dir explicitly on any call to target a different project.

| Env var | Required | Purpose |
| --- | --- | --- |
| `ANTHROPIC_API_KEY` | no | If set, token savings are counted exactly via Anthropic's free `count_tokens` endpoint instead of estimated locally. |
| `FLATTEN_COUNT_MODEL` | no | Model id used for the exact token count (default: `claude-haiku-4-5-20251001`). |
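The way these two variables interact can be sketched as below. The `resolveCountConfig` function and the `CountConfig` type are hypothetical names, and treating an empty value as unset is an assumption; only the variable names and the default model id come from the table above.

```typescript
type Env = Record<string, string | undefined>;

interface CountConfig {
  mode: "exact" | "estimate"; // exact = count_tokens endpoint, estimate = local
  model: string;
}

const DEFAULT_COUNT_MODEL = "claude-haiku-4-5-20251001";

// Decide how tokens will be counted from the environment.
// Assumption: a blank variable is treated the same as a missing one.
function resolveCountConfig(env: Env): CountConfig {
  const apiKey = (env["ANTHROPIC_API_KEY"] ?? "").trim();
  const model = (env["FLATTEN_COUNT_MODEL"] ?? "").trim();
  return {
    mode: apiKey !== "" ? "exact" : "estimate",
    model: model !== "" ? model : DEFAULT_COUNT_MODEL,
  };
}
```

In a Node.js process this would be called as `resolveCountConfig(process.env)`.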

Usage

CAUTION

Always exit the session you want to flatten with Ctrl-C, then flatten it from a different window. Rewriting a live session's file out from under Claude Code corrupts its in-memory state and bricks the session.

1. Exit the session you want to flatten with Ctrl-C. This is mandatory: a 10-second live-write guard refuses to touch a recently-modified session unless you force it, but exiting is the safe path.
2. In a new Claude Code window, type /flatten latest or /flatten <session-id>, or ask:

   "Flatten the latest session." · or · "Flatten session <session-id>."

   /flatten latest (or bare /flatten) flattens the larger of the two most recent sessions: the smaller, seconds-old one is almost always the window doing the flattening itself, and the session worth flattening is the big one. It never forces past the live-write guard.
3. Resume your original session and send a prompt. When Claude starts outputting text, you'll see the token count drop.

To preview without touching anything, ask for a dry run first. To undo, ask to unflatten the session — every original block is restored to its exact original value.
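The live-write guard and the /flatten latest selection rule described in the steps above can be modeled roughly as follows. This is a sketch under assumptions: the function names, the `SessionInfo` shape, and the use of file modification time as the "recently modified" signal are illustrative, not taken from the server's source.

```typescript
const LIVE_WRITE_WINDOW_MS = 10_000; // the 10-second window mentioned above

// Refuse to touch a session that was modified within the guard window,
// unless the caller explicitly forces it.
function checkLiveWriteGuard(
  lastModifiedMs: number,
  nowMs: number,
  force = false,
): { allowed: boolean; reason: string } {
  const ageMs = nowMs - lastModifiedMs;
  if (ageMs >= LIVE_WRITE_WINDOW_MS) return { allowed: true, reason: "session is idle" };
  if (force) return { allowed: true, reason: "forced past the live-write guard" };
  return { allowed: false, reason: `session modified ${ageMs} ms ago; exit it first` };
}

interface SessionInfo {
  id: string;
  sizeBytes: number;
  modifiedMs: number;
}

// "latest" = the larger of the two most recently modified sessions, since the
// newest one is usually the small window that is doing the flattening.
function pickLatest(sessions: SessionInfo[]): SessionInfo | undefined {
  const recent = [...sessions].sort((a, b) => b.modifiedMs - a.modifiedMs).slice(0, 2);
  const [newest, second] = recent;
  if (!newest) return undefined;
  if (!second) return newest;
  return second.sizeBytes > newest.sizeBytes ? second : newest;
}
```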

TIP

Flattening needs no model intelligence — park a second window on a fast, inexpensive model (/model haiku) as a dedicated flattening station and just type /flatten latest.

Tools

| Tool | What it does |
| --- | --- |
| `flatten_session` | Move bulky tool results into a sidecar, leaving `[FLATTENED …]` markers. Crash-safe and reversible. Supports `dry_run`, `min_size`, `force`, and `include_tool_use_result`. |
| `retrieve_flattened` | Fetch one original block back by its id: returns the original text, or re-renders a flattened screenshot as a real image. |
| `unflatten_session` | Reverse a flatten completely: re-inline every block from the sidecar, restoring each flattened result to its exact original value. |
| `prune_flatten_artifacts` | Reclaim disk by deleting leftover `.bak` / `.tmp` files (and, opt-in, sidecars). Defaults to a safe dry run. |
| `list_sessions` | List a project's sessions with branch, message count, size, and first prompt. |
| `search_sessions` | Keyword / branch / date search across past sessions. Scans prose, tool I/O, and flatten sidecars so nothing goes dark after flattening. |

When a session is flattened, the model sees compact markers like this in place of the original output:

[FLATTENED id=toolu_01AbC… tool=Read file_path=/src/server.ts | text 48213B/612L | session=2f9c… | retrieve_flattened(id,session) for raw content]

Everything the model needs to fetch the original — the id and the session — is right there in the marker.
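To illustrate how little is needed to act on a marker, here is a small parser that pulls out the id and session. The field layout is inferred from the single example above and may not cover every marker the server emits; `parseMarker` and `MarkerInfo` are hypothetical names.

```typescript
interface MarkerInfo {
  id: string;
  session: string;
  tool: string | null;
  bytes: number | null; // from the "text 48213B/612L" style size field
  lines: number | null;
}

// Extract the fields needed for a retrieve_flattened call from a marker string.
// Returns null when the text does not look like a marker.
function parseMarker(marker: string): MarkerInfo | null {
  if (!marker.startsWith("[FLATTENED ") || !marker.endsWith("]")) return null;
  const id = /\bid=(\S+)/.exec(marker)?.[1];
  const session = /\bsession=(\S+)/.exec(marker)?.[1];
  if (!id || !session) return null;
  const tool = /\btool=(\S+)/.exec(marker)?.[1] ?? null;
  const size = /\btext (\d+)B\/(\d+)L/.exec(marker);
  return {
    id,
    session,
    tool,
    bytes: size ? Number(size[1]) : null,
    lines: size ? Number(size[2]) : null,
  };
}
```

Applied to the example marker, this yields the tool name `Read`, a size of 48213 bytes over 612 lines, and the (elided) id and session values exactly as they appear in the marker.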

How it works

Sidecar, not deletion. Each extracted block is written verbatim to <session>.flat.jsonl next to the session. The original session file is backed up once to <session>.jsonl.bak before the first rewrite.
Crash-safe. Originals are persisted to the sidecar before they're removed from the session, and the session is rewritten via an atomic temp-file-and-rename, so an interrupted run can never leave a half-written, irreplaceable session file.
Idempotent. Re-running flatten skips already-flattened blocks and never double-writes a sidecar entry.
Lossless & reversible. Text and base64 images are stored exactly as they appeared, so unflatten_session restores each flattened block to its exact original value (byte-identical for Claude Code's canonical JSON). Your prompts and untouched lines were never altered to begin with.
Disk vs. context tokens. Claude Code stores each tool result twice on disk (once in the API message, once in a toolUseResult mirror) and only one copy is ever sent to the model. flatten reports both diskBytesSaved (affects --resume parse speed) and contextTokensSaved out of contextTokensTotal (the number that actually matters for the context window and compaction) — they differ a lot, and the tool is explicit about which is which.
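The ordering described above (persist to the sidecar first, back up once, then swap the session file in atomically) can be sketched with Node's `fs` module. This is a minimal illustration, not the project's implementation: the function names and the `{ id, original }` sidecar record are assumptions, and the real sidecar schema is documented in docs/ARCHITECTURE.md.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// <session>.jsonl -> <session>.flat.jsonl, next to the session file.
function sidecarPathFor(sessionPath: string): string {
  const base = path.basename(sessionPath, ".jsonl");
  return path.join(path.dirname(sessionPath), `${base}.flat.jsonl`);
}

// Append one original block to the sidecar, skipping ids already stored
// so that re-running never double-writes an entry.
function appendToSidecar(sidecarPath: string, id: string, original: string): void {
  const existing = fs.existsSync(sidecarPath) ? fs.readFileSync(sidecarPath, "utf8") : "";
  const alreadyStored = existing
    .split("\n")
    .filter((line) => line !== "")
    .some((line) => JSON.parse(line).id === id);
  if (!alreadyStored) {
    fs.appendFileSync(sidecarPath, JSON.stringify({ id, original }) + "\n");
  }
}

// Replace the session file via temp file + rename. The rename swaps the file
// in one step on the same filesystem, so readers see either the old or the
// new contents, never a partial write.
function rewriteSessionAtomically(sessionPath: string, newContents: string): void {
  const backupPath = `${sessionPath}.bak`;
  if (!fs.existsSync(backupPath)) fs.copyFileSync(sessionPath, backupPath); // back up once
  const tmpPath = `${sessionPath}.tmp`;
  const fd = fs.openSync(tmpPath, "w");
  try {
    fs.writeSync(fd, newContents);
    fs.fsyncSync(fd); // flush to disk before the rename makes it visible
  } finally {
    fs.closeSync(fd);
  }
  fs.renameSync(tmpPath, sessionPath);
}
```

A caller would invoke `appendToSidecar` for every extracted block before calling `rewriteSessionAtomically`, so that a crash between the two steps leaves the session untouched and the originals already safe on disk.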

See docs/ARCHITECTURE.md for the session JSONL format, the sidecar schema, and the marker protocol.

Compatibility & roadmap

Claude Code only, for now. flatten-mcp reads Claude Code's session store at ~/.claude/projects/<encoded-project-dir>/*.jsonl. It has been tested against Claude Code exclusively; the paths and the JSONL schema are specific to it, so flatten-mcp will not work with other agents or LLM CLIs as-is.
Planned — a pluggable session backend. Porting to other agents means abstracting the storage location and the on-disk message format behind a small adapter. Contributions welcome.

Contributing

Issues and PRs are welcome. To develop locally:

npm install
npm run dev        # tsc --watch
npm run build      # one-off compile to dist/

License

MIT © Shaya Shaviv


