Agent Memory: Persistent Memory for AI Coding Agents

Agent Memory (agentmemory) is a persistent memory engine that runs alongside your AI coding agent and lets it carry context across sessions. Without it, your agent forgets your codebase, your architectural decisions, your naming conventions, and the bugs you just fixed the moment a session ends. With Agent Memory running, your agent silently captures what it does, compresses those observations into searchable memory, and recalls the relevant context when the next session starts, automatically and with no changes to your workflow.

Agent Memory works with Claude Code, Cursor, GitHub Copilot CLI, Codex CLI, Gemini CLI, and 13+ other agents via the Model Context Protocol (MCP). One local server, one install, and memories are shared across every agent you use.

How it works

Agent Memory installs as a local server that sits silently in the background. It hooks into your agent’s tool-use lifecycle and captures observations — file reads, edits, shell commands, test results, error messages — without you lifting a finger. Those raw observations are compressed into structured facts and indexed using a three-stream hybrid search engine (BM25 keyword matching, vector similarity, and knowledge graph traversal). When your next session starts, Agent Memory injects only the most relevant context within a token budget, so your agent already knows where you left off.
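The text above does not say how the three search streams are combined into one ranking. As a rough illustration only, the sketch below uses reciprocal rank fusion (RRF), a common way to merge ranked lists. RRF, the function name, the constant k=60, and the memory IDs are all assumptions made for this example, not Agent Memory's documented behavior.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several best-first ranked lists of memory IDs into one ranking.

    An item's fused score is the sum of 1 / (k + rank) over every list that
    contains it, where rank starts at 1. Items found by several streams
    accumulate a higher score than items found by only one.
    """
    scores = defaultdict(float)
    for ranked_ids in rankings:
        for rank, memory_id in enumerate(ranked_ids, start=1):
            scores[memory_id] += 1.0 / (k + rank)
    # Highest score first; ties are broken by ID so the output is stable.
    return sorted(scores, key=lambda memory_id: (-scores[memory_id], memory_id))


# Hypothetical results from the three streams (IDs are made up).
bm25_hits = ["auth-bug", "db-schema", "naming"]
vector_hits = ["db-schema", "auth-bug", "ci-setup"]
graph_hits = ["db-schema", "naming"]

fused = reciprocal_rank_fusion([bm25_hits, vector_hits, graph_hits])
print(fused)  # ['db-schema', 'auth-bug', 'naming', 'ci-setup']
```

In this toy input, "db-schema" ranks first because all three streams return it near the top, which is the intuition behind combining keyword, vector, and graph results.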

Session ends  →  Raw observations compressed into facts, concepts, and narratives
                 Indexed via BM25 + vector + knowledge graph

Session starts →  Hybrid search runs against your current project
                  Top-K results injected into context (default: 2,000 tokens)
                  Your agent starts working immediately — no re-explaining

You do not need to configure this pipeline. It runs automatically from the moment you start the server and connect your agent.
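To make the idea of injecting results "within a token budget" concrete, here is a minimal sketch of greedy budget packing. It assumes results arrive already ranked and already carry a token count. The function name, the tuple layout, and the sample memories are hypothetical, and the actual selection logic inside Agent Memory may differ.

```python
def select_within_budget(ranked_memories, token_budget=2000):
    """Greedily keep the best-ranked memories that fit in the budget.

    ranked_memories: list of (text, token_count) tuples, best first.
    Returns (selected_texts, tokens_used).
    """
    selected = []
    used = 0
    for text, token_count in ranked_memories:
        if used + token_count > token_budget:
            continue  # Does not fit in what is left; try the next candidate.
        selected.append(text)
        used += token_count
    return selected, used


# Made-up memories with made-up token counts, ordered best first.
ranked = [
    ("Auth tokens are refreshed in the session middleware", 900),
    ("The users table uses soft deletes", 800),
    ("Full transcript of last week's debugging session", 1500),
    ("Tests must run serially because they share a database", 250),
]

context, used = select_within_budget(ranked)
print(len(context), used)  # 3 1950
```

The large transcript is skipped because it would exceed the 2,000-token default, while the smaller, lower-ranked note still fits.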

Key stats

Metric	Result
Retrieval recall (R@5)	95.2% on LongMemEval-S (500 questions, ICLR 2025)
Token reduction	~92% fewer tokens vs. loading full context (~170K tokens/year vs. 19.5M+)
Estimated annual cost	~$10/year with a cloud LLM provider, $0 with local embeddings
MCP tools exposed	53 tools (8 visible by default, all 53 with `AGENTMEMORY_TOOLS=all`)
Auto-capture hooks	12 hooks covering the full session lifecycle
External dependencies	Zero — no Postgres, no Redis, no vector database to manage

Start here

Quickstart

Install Agent Memory, start the server, connect your first agent, and verify recall is working — all in under 2 minutes.

Connect Agents

Wire up Claude Code, Cursor, Copilot, Codex, Gemini CLI, and 12+ other MCP-compatible agents in one command.

How Memory Works

Understand the 4-tier consolidation pipeline, hybrid search, memory lifecycle, and what gets captured automatically.

Configuration

Set your LLM provider, tune search weights, enable auto-compression, and explore every environment variable.

What Agent Memory is not

Before you dive in, it helps to know what Agent Memory deliberately avoids being.

It is not a RAG system you manage. You do not write ingestion scripts, define chunking strategies, or maintain an embedding pipeline. Agent Memory captures observations from your agent’s live tool use and handles everything behind the scenes.

It is not a vector database. There is no Qdrant, pgvector, Pinecone, or Weaviate to provision, configure, or pay for. Agent Memory uses SQLite and an in-process vector index bundled inside the iii engine, so no external infrastructure is required.

It is not another CLAUDE.md workaround. Static files like CLAUDE.md, .cursorrules, or Cursor notepads cap out around 200 lines, go stale, and load their entire contents into every session. Agent Memory is the searchable database that feeds those files: it recalls only what is relevant to the current session, keeps token usage around 1,900 tokens per session, and stays accurate as your codebase evolves.

It is not a cloud service. Agent Memory runs entirely on your machine. Your code observations never leave your local environment unless you explicitly deploy the server to a remote host.

Requirements

- Node.js 18+ and npm (Node.js 20+ recommended)
- macOS or Linux for the one-command install path. Windows works via WSL2; native Windows engine setup requires a manual step (downloading the iii binary separately).
- One of the following for LLM-backed compression and summarization (all optional; Agent Memory runs in zero-LLM mode without any key):
  - ANTHROPIC_API_KEY: Anthropic Claude (recommended for Claude Code users)
  - OPENAI_API_KEY: OpenAI GPT models
  - GEMINI_API_KEY: Google Gemini
  - OPENROUTER_API_KEY: any model via OpenRouter
  - Local Ollama or LM Studio: set OPENAI_BASE_URL to your local endpoint
- Ports 3111 (REST API), 3112 (streams), 3113 (viewer), and 49134 (engine WebSocket) available on your machine
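If you want to confirm the four ports are free before installing, a small pre-flight check along these lines may help. It is not part of Agent Memory. It assumes the server listens on localhost (127.0.0.1), and the helper name is made up for this sketch.

```python
import socket

# Ports taken from the requirements list above.
REQUIRED_PORTS = {
    3111: "REST API",
    3112: "streams",
    3113: "viewer",
    49134: "engine WebSocket",
}


def port_is_free(port, host="127.0.0.1"):
    """Return True if a TCP socket can bind to host:port right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False  # Something else already holds the port.
        return True


for port, purpose in REQUIRED_PORTS.items():
    status = "free" if port_is_free(port) else "IN USE"
    print(f"{port} ({purpose}): {status}")
```

A port reported as "IN USE" means another process, possibly an already running Agent Memory server, is bound to it.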

You can run Agent Memory with no API key at all by setting EMBEDDING_PROVIDER=local. This uses the all-MiniLM-L6-v2 model on-device for embeddings (free, offline) and synthetic BM25 compression for observations. Hybrid search and recall still work — you only lose LLM-backed summarization and graph extraction.
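As a quick way to see which of the optional settings above are present in an environment, the following sketch summarizes them. It only reads the variable names listed in this section. The function name and the summary layout are invented for illustration, and this is not how Agent Memory itself chooses a provider.

```python
PROVIDER_KEYS = [
    "ANTHROPIC_API_KEY",
    "OPENAI_API_KEY",
    "GEMINI_API_KEY",
    "OPENROUTER_API_KEY",
]


def describe_setup(env):
    """Summarize which optional settings are present in a mapping of env vars."""
    return {
        # Names of provider keys that are set to a non-empty value.
        "provider_keys": [name for name in PROVIDER_KEYS if env.get(name)],
        # Local Ollama or LM Studio endpoint, if one is configured.
        "local_endpoint": env.get("OPENAI_BASE_URL"),
        # True when on-device embeddings are requested.
        "local_embeddings": env.get("EMBEDDING_PROVIDER") == "local",
    }


# Example: a no-key setup that relies on local embeddings only.
summary = describe_setup({"EMBEDDING_PROVIDER": "local"})
print(summary)
```

To inspect your real shell environment instead of the example mapping, pass os.environ (after import os) to describe_setup.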
