Agent Memory (`agentmemory`) is a persistent memory engine that runs alongside your AI coding agent and lets it remember what happened across sessions. Without it, your agent forgets your codebase, your architectural decisions, your naming conventions, and the bugs you just fixed as soon as a session ends. With Agent Memory running, your agent silently captures what it does, compresses those observations into searchable memory, and recalls the relevant context when the next session starts. All of this happens automatically, with no changes to your workflow.
Agent Memory works with Claude Code, Cursor, GitHub Copilot CLI, Codex CLI, Gemini CLI, and 13+ other agents via the Model Context Protocol (MCP). One local server, one install, and memories are shared across every agent you use.
How it works
Agent Memory installs as a local server that sits silently in the background. It hooks into your agent’s tool-use lifecycle and captures observations (file reads, edits, shell commands, test results, error messages) without you lifting a finger. Those raw observations are compressed into structured facts and indexed using a three-stream hybrid search engine (BM25 keyword matching, vector similarity, and knowledge graph traversal). When your next session starts, Agent Memory injects only the most relevant context within a token budget, so your agent already knows where you left off.
Key stats
| Metric | Result |
|---|---|
| Retrieval recall (R@5) | 95.2% on LongMemEval-S (500 questions, ICLR 2025) |
| Token reduction | ~92% fewer tokens vs. loading full context (~170K tokens/year vs. 19.5M+) |
| Estimated annual cost | ~$0 with local embeddings |
| MCP tools exposed | 53 tools (8 visible by default; all 53 with `AGENTMEMORY_TOOLS=all`) |
| Auto-capture hooks | 12 hooks covering the full session lifecycle |
| External dependencies | Zero — no Postgres, no Redis, no vector database to manage |
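The hybrid search described under How it works has to merge three ranked result lists into one. This page does not say how Agent Memory combines them, so the sketch below uses reciprocal rank fusion (RRF), a common technique for merging rankings, purely as an illustration. It assumes each stream returns memory IDs best-first; `fuseRankings`, the optional per-stream `weights`, and the constant `k = 60` are hypothetical choices, not Agent Memory's API.

```typescript
// Illustrative only: reciprocal rank fusion (RRF) over three ranked lists,
// e.g. BM25, vector similarity, and graph traversal. Agent Memory's real
// fusion method is not documented here; all names below are hypothetical.

type Ranking = string[]; // memory IDs, best match first

function fuseRankings(
  streams: Ranking[],
  weights: number[] = streams.map(() => 1), // hypothetical per-stream weights
  k = 60, // damping constant commonly used with RRF
): string[] {
  const scores = new Map<string, number>();
  streams.forEach((ranking, s) => {
    const w = weights[s] ?? 1;
    ranking.forEach((id, rank) => {
      // rank is 0-based, so a stream's top hit adds w / (k + 1)
      scores.set(id, (scores.get(id) ?? 0) + w / (k + rank + 1));
    });
  });
  // Highest fused score first
  return Array.from(scores.entries())
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}

const bm25: Ranking = ["a", "b", "c"];
const vector: Ranking = ["b", "a"];
const graph: Ranking = ["b", "d"];
console.log(fuseRankings([bm25, vector, graph])); // [ 'b', 'a', 'd', 'c' ]
```

RRF only looks at ranks, not raw scores, which sidesteps the fact that BM25 scores and cosine similarities live on different scales.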
Start here
Quickstart
Install Agent Memory, start the server, connect your first agent, and verify recall is working — all in under 2 minutes.
Connect Agents
Wire up Claude Code, Cursor, Copilot, Codex, Gemini CLI, and 12+ other MCP-compatible agents in one command.
How Memory Works
Understand the 4-tier consolidation pipeline, hybrid search, memory lifecycle, and what gets captured automatically.
Configuration
Set your LLM provider, tune search weights, enable auto-compression, and explore every environment variable.
What Agent Memory is not
Before you dive in, it helps to know what Agent Memory deliberately avoids being.
It is not a RAG system you manage. You do not write ingestion scripts, define chunking strategies, or maintain an embedding pipeline. Agent Memory captures observations from your agent’s live tool use and handles everything behind the scenes.
It is not a vector database. There is no Qdrant, pgvector, Pinecone, or Weaviate to provision, configure, or pay for. Agent Memory uses SQLite and an in-process vector index bundled inside the `iii` engine, so no external infrastructure is required.
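To make "in-process vector index" concrete: the embeddings live in the same process as the server and are searched directly, with no network hop to a separate database. The sketch below is a brute-force cosine-similarity scan over a hypothetical `StoredVector` array, using toy 3-dimensional vectors (all-MiniLM-L6-v2 actually produces 384-dimensional ones). It is not the `iii` engine's real index, whose structure is not described here.

```typescript
// Illustrative only: an exact, in-memory cosine-similarity search.
// The StoredVector shape and topK helper are hypothetical; the iii engine's
// actual index is not described in this document.

interface StoredVector {
  id: string;
  embedding: number[]; // all vectors assumed to share one dimension
}

function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  // Treat zero vectors as dissimilar instead of dividing by zero
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

function topK(index: StoredVector[], query: number[], k: number): string[] {
  return index
    .map((v) => ({ id: v.id, score: cosine(v.embedding, query) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((r) => r.id);
}

// Toy 3-dimensional embeddings for readability
const index: StoredVector[] = [
  { id: "auth-bug-fix", embedding: [0.9, 0.1, 0.0] },
  { id: "naming-convention", embedding: [0.0, 1.0, 0.0] },
  { id: "db-migration", embedding: [0.1, 0.0, 0.9] },
];
console.log(topK(index, [1, 0, 0], 2)); // [ 'auth-bug-fix', 'db-migration' ]
```

A linear scan like this is simple and exact; larger stores usually trade exactness for speed with an approximate nearest-neighbor index.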
It is not another CLAUDE.md workaround. Static files like `CLAUDE.md`, `.cursorrules`, or Cursor notepads cap out around 200 lines, go stale, and load their entire contents into every session. Agent Memory is the searchable database that feeds those files: it recalls only what is relevant to the current session, keeps injected context to around 1,900 tokens per session, and stays accurate as your codebase evolves.
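Recalling only what is relevant within a fixed budget (around 1,900 tokens per session, per the paragraph above) is essentially a packing problem. A minimal greedy sketch, assuming each recalled memory already carries a relevance `score` and a precomputed `tokens` count; the `Memory` shape and `packWithinBudget` are hypothetical names, and Agent Memory's actual selection logic may differ.

```typescript
// Illustrative only: greedy selection of recalled memories under a token
// budget. Memory and packWithinBudget are hypothetical; token counts are
// assumed to be precomputed by whatever tokenizer your model uses.

interface Memory {
  id: string;
  tokens: number; // size of the memory when injected into context
  score: number; // relevance from retrieval, higher is better
}

function packWithinBudget(memories: Memory[], budgetTokens: number): Memory[] {
  const chosen: Memory[] = [];
  let used = 0;
  // Most relevant first; copy before sorting so the caller's array is untouched
  const byScore = memories.slice().sort((a, b) => b.score - a.score);
  for (const m of byScore) {
    // Skip (rather than stop at) a memory that would overflow the budget,
    // so smaller, lower-ranked memories can still use the remaining space
    if (used + m.tokens <= budgetTokens) {
      chosen.push(m);
      used += m.tokens;
    }
  }
  return chosen;
}

const recalled: Memory[] = [
  { id: "auth-refactor", tokens: 5, score: 0.9 },
  { id: "test-command", tokens: 7, score: 0.8 },
  { id: "naming-rules", tokens: 4, score: 0.5 },
];
console.log(packWithinBudget(recalled, 10).map((m) => m.id)); // [ 'auth-refactor', 'naming-rules' ]
```

With a realistic budget you would pass something like `1900`. Greedy-by-score is not an optimal knapsack solution, but it is cheap and always keeps the most relevant memory that fits.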
It is not a cloud service. Agent Memory runs entirely on your machine. Your code observations never leave your local environment unless you explicitly deploy the server to a remote host.
Requirements
- Node.js 18+ and npm (Node.js 20+ recommended)
- macOS or Linux for the one-command install path. Windows works via WSL2; native Windows engine setup requires a manual step (downloading the `iii` binary separately)
- One of the following for LLM-backed compression and summarization (all optional; Agent Memory runs in zero-LLM mode without any key):
  - `ANTHROPIC_API_KEY`: Anthropic Claude (recommended for Claude Code users)
  - `OPENAI_API_KEY`: OpenAI GPT models
  - `GEMINI_API_KEY`: Google Gemini
  - `OPENROUTER_API_KEY`: any model via OpenRouter
  - Local Ollama or LM Studio: set `OPENAI_BASE_URL` to your local endpoint
- Ports 3111 (REST API), 3112 (streams), 3113 (viewer), and 49134 (engine WebSocket) available on your machine
You can run Agent Memory with no API key at all by setting `EMBEDDING_PROVIDER=local`. This uses the all-MiniLM-L6-v2 model on-device for embeddings (free, offline) and synthetic BM25 compression for observations. Hybrid search and recall still work; you only lose LLM-backed summarization and graph extraction.
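The requirements above can be checked before installing. The helper below is hypothetical and not part of Agent Memory: it takes a Node.js version string and an environment map, reports whether Node.js meets the 18+ minimum (and the recommended 20+), lists which of the provider variables named above are set, and flags zero-LLM mode when none are. In a real script you would pass `process.version` and `process.env`. Counting `OPENAI_BASE_URL` alone as a configured provider is an assumption for local Ollama or LM Studio setups.

```typescript
// Hypothetical preflight check derived from the requirements list; it is not
// part of Agent Memory and does not call into it.

type Env = Record<string, string | undefined>;

// Provider variables named in the requirements. Treating OPENAI_BASE_URL on
// its own as "a provider is configured" is an assumption for local models.
const PROVIDER_VARS: Record<string, string> = {
  ANTHROPIC_API_KEY: "Anthropic Claude",
  OPENAI_API_KEY: "OpenAI",
  GEMINI_API_KEY: "Google Gemini",
  OPENROUTER_API_KEY: "OpenRouter",
  OPENAI_BASE_URL: "Local endpoint (Ollama / LM Studio)",
};

interface PreflightReport {
  nodeOk: boolean; // Node.js 18+ (minimum)
  nodeRecommended: boolean; // Node.js 20+ (recommended)
  providers: string[]; // configured LLM providers, possibly none
  zeroLlmMode: boolean; // no provider: no LLM summarization or graph extraction
  localEmbeddings: boolean; // EMBEDDING_PROVIDER=local (on-device all-MiniLM-L6-v2)
}

function preflight(nodeVersion: string, env: Env): PreflightReport {
  // Version strings look like "v20.11.1"; keep only the major number
  const major = parseInt(nodeVersion.replace(/^v/, "").split(".")[0], 10);
  const providers = Object.keys(PROVIDER_VARS)
    .filter((name) => (env[name] ?? "").trim() !== "")
    .map((name) => PROVIDER_VARS[name]);
  return {
    nodeOk: major >= 18,
    nodeRecommended: major >= 20,
    providers,
    zeroLlmMode: providers.length === 0,
    localEmbeddings: env.EMBEDDING_PROVIDER === "local",
  };
}

// Example: Node.js 18 with no API keys and local embeddings
console.log(preflight("v18.19.0", { EMBEDDING_PROVIDER: "local" }));
```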