Agentmem

Created by Thezenmonster, 2 months ago
Governed memory for coding agents with trust lifecycle, conflict detection, staleness tracking, and health scoring. SQLite + FTS5, zero infrastructure. Works with Claude Code, Cursor, Codex, Windsurf.

Shared memory for Claude Code, Cursor, and Codex that knows what's still true. Save sessions, catch stale and conflicting rules, and stop your agent from repeating old mistakes.

The Problem

Your AI coding assistant forgets everything between sessions. It repeats old mistakes. It can't tell current rules from outdated ones. When the context window is compacted, recovering the lost state is painful.

Most memory tools solve storage. agentmem solves trust.

Get Started (Claude Code / Cursor / Codex)

pip install "quilmem[mcp]"
agentmem init --tool claude --project myapp

That's it. Restart your editor. Your agent now has 13 memory tools. Run memory_health to confirm.

Python-only? pip install quilmem works without the MCP extra. See the Python API below.

60-Second Demo

from agentmem import Memory

mem = Memory()

# Store typed memories
mem.add(type="bug", title="loudnorm undoes SFX levels",
        content="Never apply loudnorm to final mix. It re-normalizes everything.",
        status="validated")

mem.add(type="decision", title="Use per-line atempo",
        content="Bake speed into per-line TTS. No global pass.",
        status="active")

# Something you're not sure about yet
hypothesis = mem.add(type="decision", title="Maybe try 2-second gaps before CTA",
        content="Hypothesis from last session. Needs testing.",
        status="hypothesis")

# Search — validated and active memories rank highest.
# Deprecated and superseded memories are excluded automatically.
results = mem.search("audio mixing")

# Context-budgeted recall — fits the best memories into your token limit
context = mem.recall("building a narration track", max_tokens=2000)

# Lifecycle — promote what's proven, deprecate what's not
mem.promote(hypothesis.id)                # hypothesis -> active -> validated
mem.deprecate(hypothesis.id, reason="Disproven by data")

# Supersede: replace an outdated memory with a newer one
replacement = mem.add(type="decision", title="Use 1-second gaps before CTA",
        content="Confirmed by A/B test.", status="active")
mem.supersede(hypothesis.id, replacement.id)  # old points to replacement

# Health check — is your memory system trustworthy?
from agentmem import health_check
report = health_check(mem._conn)
# Health: 85/100 | Conflicts: 0 | Stale: 2 | Validated: 14

What Makes This Different

Other memory tools store things. agentmem knows what's still true.

| Feature | Mem0 | Letta | Mengram | agentmem |
|---|---|---|---|---|
| Memory storage | Yes | Yes | Yes | Yes |
| Full-text search | Vector | Agent-driven | Knowledge graph | FTS5 |
| Memory lifecycle states | No | Partial | No | hypothesis -> active -> validated -> deprecated -> superseded |
| Conflict detection | No | No | Partial | Built-in |
| Staleness detection | No | No | No | Built-in |
| Health scoring | No | No | No | Built-in |
| Provenance tracking | No | No | No | source_path + source_hash |
| Trust-ranked recall | No | No | No | Validated > active > hypothesis |
| Human-readable source files | No | No | No | Canonical markdown |
| Local-first, zero infrastructure | No | Self-host option | Self-host option | Yes, always |
| MCP server | Separate | Separate | Yes | Built-in |

Truth Governance

The core idea: every memory has a status that tracks how much you should trust it.

hypothesis    New observation. Not yet confirmed. Lowest trust in recall.
    |
  active      Default. Currently believed true. Normal trust.
    |
 validated    Explicitly confirmed. Highest trust in recall.

 deprecated   Was true, no longer. Excluded from recall. Kept for history.
 superseded   Replaced by a newer memory. Points to replacement.

Why this matters: Without governance, your agent's memory accumulates stale rules, contradictions, and outdated decisions. It doesn't know that the voice setting from January was overridden in March. It retrieves both, and the LLM has no way to tell which one is current. Governed memory solves this.
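
The lifecycle above can be pictured as a small state machine. The following is a minimal sketch of that idea, not agentmem's internals: the `Status` enum and the `promote` and `is_recallable` helpers are hypothetical names, and raising on an invalid promotion is an assumption.

```python
from enum import Enum


class Status(Enum):
    HYPOTHESIS = "hypothesis"
    ACTIVE = "active"
    VALIDATED = "validated"
    DEPRECATED = "deprecated"
    SUPERSEDED = "superseded"


# Promotion moves one step up the trust ladder (hypothetical rule set).
_PROMOTIONS = {
    Status.HYPOTHESIS: Status.ACTIVE,
    Status.ACTIVE: Status.VALIDATED,
}


def promote(status: Status) -> Status:
    """Return the next-higher trust status, or raise if there is none."""
    if status not in _PROMOTIONS:
        raise ValueError(f"cannot promote a {status.value} memory")
    return _PROMOTIONS[status]


def is_recallable(status: Status) -> bool:
    """Deprecated and superseded memories are kept for history but not recalled."""
    return status not in (Status.DEPRECATED, Status.SUPERSEDED)
```

Keeping deprecated and superseded as terminal states outside the promotion table is what lets history survive without leaking into recall.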

Conflict Detection

from agentmem import detect_conflicts

conflicts = detect_conflicts(mem._conn)
# Found 2 conflict(s):
#   !! [decision] "Always apply loudnorm to voice"
#      vs [decision] "NEVER apply loudnorm to voice"
#      Contradiction on shared topic (voice, loudnorm, audio)

agentmem finds memories that contradict each other:

  • Detects topic overlap (Jaccard similarity)
  • Separates duplicates from contradictions
  • Sentence-level negation matching (not just keyword scanning)
  • Severity: critical (both active) vs warning (one deprecated)
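
To make the bullets above concrete, here is a rough sketch of topic overlap plus per-sentence negation checks. It is an illustration under assumptions, not agentmem's detector: the word lists, the 0.5 threshold, and every function name are hypothetical, and the negation test is a crude cue-word check applied sentence by sentence.

```python
import re

# Hypothetical cue words; a real detector would need a richer list.
NEGATIONS = {"never", "not", "no", "don't", "avoid"}
STOPWORDS = {"a", "an", "the", "to", "of", "on", "in", "and", "always"} | NEGATIONS


def tokens(text: str) -> set:
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def topic_words(text: str) -> set:
    """Content words only, so polarity words do not affect topic overlap."""
    return tokens(text) - STOPWORDS


def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def is_negated(sentence: str) -> bool:
    return bool(tokens(sentence) & NEGATIONS)


def sentences(text: str) -> list:
    return [s for s in re.split(r"[.!?]+\s*", text) if s.strip()]


def classify(a: str, b: str, threshold: float = 0.5) -> str:
    """Return 'unrelated', 'duplicate', or 'contradiction' for two memories."""
    if jaccard(topic_words(a), topic_words(b)) < threshold:
        return "unrelated"
    # Compare sentence pairs: same topic but opposite polarity is a contradiction.
    for sa in sentences(a):
        for sb in sentences(b):
            same_topic = jaccard(topic_words(sa), topic_words(sb)) >= threshold
            if same_topic and is_negated(sa) != is_negated(sb):
                return "contradiction"
    return "duplicate"
```

Separating topic words from polarity words is the key design choice: two memories about the same subject overlap heavily whether they agree or disagree, and only the polarity check tells them apart.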

Staleness Detection

from agentmem import detect_stale

stale = detect_stale(mem._conn, stale_days=30)
# [decision] "Use atempo 0.90" — Source changed since import (hash mismatch)
# [bug] "Firewall blocks port" — Not updated in 45 days

Finds outdated memories by:

  • Age (not updated in N days)
  • Source file missing (referenced file was deleted)
  • Hash drift (source file content changed but memory wasn't updated)
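
The three checks above could be sketched as one function. This is a minimal illustration, not the library's code: `staleness_reason` and `file_hash` are hypothetical helpers, SHA-256 is an assumed hash algorithm, and the check order (missing file, then hash drift, then age) is a design guess.

```python
import hashlib
from datetime import datetime, timedelta, timezone
from pathlib import Path
from typing import Optional


def file_hash(path: Path) -> str:
    """Hex digest of the file's bytes (SHA-256 is an assumption)."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def staleness_reason(
    updated_at: datetime,
    source_path: Optional[str],
    source_hash: Optional[str],
    stale_days: int = 30,
    now: Optional[datetime] = None,
) -> Optional[str]:
    """Return why a memory looks stale, or None if it looks fresh."""
    now = now or datetime.now(timezone.utc)
    if source_path is not None:
        path = Path(source_path)
        if not path.exists():
            return "source file missing"
        if source_hash is not None and file_hash(path) != source_hash:
            return "source changed since import (hash mismatch)"
    age = now - updated_at
    if age > timedelta(days=stale_days):
        return f"not updated in {age.days} days"
    return None
```

Checking provenance before age means a recently touched memory is still flagged when the file it came from has changed underneath it.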

Health Check

from agentmem import health_check

report = health_check(mem._conn)
print(f"Health: {report.health_score}/100")
print(f"Conflicts: {len(report.conflicts)}")
print(f"Stale: {len(report.stale)}")

Scores your memory system 0-100 based on: conflicts, stale percentage, orphaned references, deprecated weight, and whether you have any validated memories.
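
As a rough picture of how those five factors might combine, here is a sketch that starts at 100 and subtracts penalties. Every number in it is invented for illustration; the source does not publish agentmem's actual weights, and `health_score` is a hypothetical function.

```python
def health_score(
    total: int,
    conflicts: int,
    stale: int,
    orphaned: int,
    deprecated: int,
    validated: int,
) -> int:
    """Start at 100 and subtract penalties. All weights are hypothetical."""
    if total == 0:
        return 100  # an empty store has nothing wrong with it
    score = 100.0
    score -= min(30, conflicts * 5)      # each conflict costs 5, capped at 30
    score -= 30 * (stale / total)        # share of memories that are stale
    score -= min(15, orphaned * 3)       # references pointing at nothing
    score -= 15 * (deprecated / total)   # dead weight kept for history
    if validated == 0:
        score -= 10                      # nothing has been explicitly confirmed
    return max(0, round(score))
```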

Provenance-Aware Sync

Sync canonical markdown files into the DB with source tracking:

# Each memory tracks where it came from
mem.add(type="bug", title="loudnorm lifts noise",
        content="...",
        source_path="/docs/errors.md",
        source_section="Audio Bugs",
        source_hash="a1b2c3d4e5f6")

The sync engine:

  • Same hash = skip (idempotent, re-running changes nothing)
  • Different hash = update (source file changed)
  • Section removed = deprecate (with reason)
  • Section restored = resurrect (reactivates deprecated memory)
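
The four rules above reduce to a small decision function. This sketch is an assumption about the shape of the logic, not the actual sync engine: `sync_action` is a hypothetical name, and the "add" case for a never-seen section is inferred rather than stated in the list.

```python
from typing import Optional


def sync_action(
    existing_hash: Optional[str],
    existing_status: Optional[str],
    incoming_hash: Optional[str],
) -> str:
    """Decide what to do with one source section.

    existing_* describe the stored memory (None if never imported).
    incoming_hash is the section's current hash (None if the section is gone).
    """
    if incoming_hash is None:
        # Section no longer exists in the source file.
        if existing_hash is None or existing_status == "deprecated":
            return "skip"
        return "deprecate"
    if existing_hash is None:
        return "add"          # assumed behavior for a brand-new section
    if existing_status == "deprecated":
        return "resurrect"    # section came back, reactivate the memory
    if existing_hash == incoming_hash:
        return "skip"         # idempotent: nothing changed
    return "update"
```

Because an unchanged hash always maps to "skip", running the sync twice in a row produces no changes on the second pass.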

Three Interfaces

Python API

from agentmem import Memory

mem = Memory("./my-agent.db", project="frontend")

# CRUD
record = mem.add(type="decision", title="Use TypeScript", content="...")
mem.get(record.id)
mem.update(record.id, content="Updated reasoning.")
mem.delete(record.id)
mem.list(type="bug", limit=20)

# Search + recall
results = mem.search("typescript migration", type="decision")
context = mem.recall("setting up the build", max_tokens=3000)

# Governance
mem.promote(record.id)              # hypothesis -> active -> validated
mem.deprecate(record.id, reason="No longer relevant")
replacement = mem.add(type="decision", title="Use v2 approach", content="...")
mem.supersede(record.id, replacement.id)  # links old to replacement

# Session persistence
mem.save_session("Working on auth refactor. Blocked on token refresh.")
mem.load_session()                  # picks up where last instance left off

# Health
mem.stats()

CLI

# Get started in 30 seconds
agentmem init --tool claude --project myapp

# Check if everything's working
agentmem doctor

# Core
agentmem add --type bug --title "CSS grid issue" "Flexbox fallback needed"
agentmem search "grid layout"
agentmem recall "frontend styling" --tokens 2000

# Governance
agentmem promote <id>
agentmem deprecate <id> --reason "Fixed in v2.3"
agentmem health
agentmem conflicts
agentmem stale --days 14

# Import + sessions
agentmem import ./errors.md --type bug
agentmem save-session "Finished auth module, starting tests"
agentmem load-session

# MCP server
agentmem serve

MCP Server

Built-in Model Context Protocol server for Claude Code, Cursor, and any MCP client.

pip install "quilmem[mcp]"

Claude Code config (.claude/settings.json):

{
  "mcpServers": {
    "agentmem": {
      "command": "agentmem",
      "args": ["--db", "./memory.db", "--project", "myproject", "serve"],
      "type": "stdio"
    }
  }
}

MCP tools: add_memory, search_memory, recall_memory, update_memory, delete_memory, list_memories, save_session, load_session, promote_memory, deprecate_memory, supersede_memory, memory_health, memory_conflicts

Tell your agent how to use memory: Copy the agent instructions into your CLAUDE.md, .cursorrules, or AGENTS.md. This teaches your agent the session protocol, trust hierarchy, and when to search vs add.

Typed Memory

Seven types that cover real agent workflows:

| Type | What it stores | Example |
|---|---|---|
| setting | Configuration, parameters | "Voice speed: atempo 1.08" |
| bug | Errors and their fixes | "loudnorm lifts noise floor" |
| decision | Rules, policies, choices | "3rd-person narration banned" |
| procedure | Workflows, pipelines | "TTS -> speed -> 48kHz -> mix" |
| context | Background knowledge | "Project uses FFmpeg + Python 3.11" |
| feedback | User corrections | "Always pick, don't ask" |
| session | Current work state | "Working on auth. Blocked on tokens." |

Trust-Ranked Recall

recall() doesn't just find relevant memories. It finds the most trustworthy relevant memories:

  1. FTS5 search returns candidates
  2. Each scored: relevance (25%) + trust status (20%) + provenance (20%) + recency (15%) + frequency (10%) + confidence (10%)
  3. Validated canonical memories rank above unprovenanced hypothesis memories
  4. Deprecated and superseded memories are excluded entirely
  5. Packed greedily into your token budget
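
The steps above can be sketched as a weighted sum followed by greedy packing. The weights come from the list; everything else is an assumption made for illustration: the `Candidate` fields, the per-status trust values, the four-characters-per-token estimate, and the function names are all hypothetical.

```python
from dataclasses import dataclass

# Weights taken from the scoring breakdown above.
WEIGHTS = {
    "relevance": 0.25, "trust": 0.20, "provenance": 0.20,
    "recency": 0.15, "frequency": 0.10, "confidence": 0.10,
}
# Hypothetical trust values; statuses missing here are excluded from recall.
TRUST = {"validated": 1.0, "active": 0.6, "hypothesis": 0.3}


@dataclass
class Candidate:
    title: str
    content: str
    status: str
    relevance: float   # each signal is assumed to be normalized to [0, 1]
    provenance: float
    recency: float
    frequency: float
    confidence: float


def score(c: Candidate) -> float:
    parts = {
        "relevance": c.relevance, "trust": TRUST[c.status],
        "provenance": c.provenance, "recency": c.recency,
        "frequency": c.frequency, "confidence": c.confidence,
    }
    return sum(WEIGHTS[k] * parts[k] for k in WEIGHTS)


def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)  # crude heuristic, not a real tokenizer


def recall(candidates: list, max_tokens: int) -> list:
    """Drop untrusted statuses, rank by score, pack greedily into the budget."""
    eligible = [c for c in candidates if c.status in TRUST]
    picked, used = [], 0
    for c in sorted(eligible, key=score, reverse=True):
        cost = estimate_tokens(c.content)
        if used + cost <= max_tokens:
            picked.append(c)
            used += cost
    return picked
```

Greedy packing is simple and predictable: the highest-scoring memory always gets in first, and a memory that is too large for the remaining budget is skipped rather than truncated.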

Project Scoping

frontend = Memory("./shared.db", project="frontend")
backend = Memory("./shared.db", project="backend")

frontend.search("bug")  # Only frontend bugs
backend.search("bug")   # Only backend bugs

Battle-Tested

This isn't theoretical. agentmem was built under production pressure over 2+ months of daily use:

  • 65+ YouTube Shorts produced with zero repeated production bugs
  • 330+ memories governing voice generation, FFmpeg assembly, image prompting, upload workflows
  • Every bug caught once, fixed once, never repeated
  • Conflict detection went from reporting 1,848 false positives to 11 real findings

How It Works

  • Storage: SQLite with WAL mode (concurrent reads, thread-safe)
  • Search: FTS5 with porter stemming and unicode61 tokenizer
  • Ranking: Composite score: text relevance + trust status + provenance + recency + frequency + confidence
  • Governance: Status lifecycle, conflict detection, staleness detection, health scoring
  • Sync: Provenance-aware with source hashing and resurrection
  • Zero infrastructure: No API keys, no cloud, no vector DB. Just a .db file.
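
For readers unfamiliar with the storage layer, this sketch shows the two SQLite features named above using only Python's standard library. The table layout and sample rows are hypothetical and do not reflect agentmem's real schema; the sketch also assumes your Python's SQLite build includes FTS5, which most do.

```python
import os
import sqlite3
import tempfile

# WAL needs a file-backed database, so use a temporary file.
db_path = os.path.join(tempfile.mkdtemp(), "memory.db")
conn = sqlite3.connect(db_path)
mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]

# FTS5 table with the porter stemmer wrapped around the unicode61 tokenizer.
conn.execute(
    "CREATE VIRTUAL TABLE memories "
    "USING fts5(title, content, tokenize='porter unicode61')"
)
conn.executemany(
    "INSERT INTO memories (title, content) VALUES (?, ?)",
    [
        ("loudnorm undoes SFX levels", "Never apply loudnorm to the final mix."),
        ("Use TypeScript", "Prefer typed modules for the frontend."),
    ],
)
conn.commit()

# Stemming lets the query "mixing" match the stored word "mix".
rows = conn.execute(
    "SELECT title FROM memories WHERE memories MATCH ? ORDER BY rank",
    ("mixing",),
).fetchall()
conn.close()
```

Here `mode` should come back as "wal" and `rows` should contain only the loudnorm memory, since the TypeScript row shares no stem with the query.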

License

MIT

Project Info
Created: 2 months ago
Updated: a month ago
Author: Thezenmonster