All configuration is stored in ~/.agentmemory/.env. Create it by running agentmemory init, then uncomment the variables you want to activate. Restart agentmemory after making changes — the server reads the file at startup and does not hot-reload.
Every variable is optional. Without any keys set, agentmemory runs in a safe no-LLM mode: observations are indexed via synthetic compression, hybrid BM25 search still works, but LLM-backed summarisation, consolidation, and reflection are disabled.
Minimal Config
The smallest useful configuration — an LLM provider key plus the two most impactful features:
```bash
# LLM provider (pick one)
ANTHROPIC_API_KEY=your-key-here
# Enable key features
CONSOLIDATION_ENABLED=true
AGENTMEMORY_INJECT_CONTEXT=true
```
Run agentmemory doctor after editing to verify the daemon sees your changes.
LLM Providers
agentmemory uses a single primary LLM provider for compression, summarisation, consolidation, and reflection. Set one provider key. If several keys are present, the first one found in this detection order becomes the primary provider: OPENAI_API_KEY → MINIMAX_API_KEY → ANTHROPIC_API_KEY → GEMINI_API_KEY → OPENROUTER_API_KEY → noop (the no-LLM mode).
| Variable | Provider | Default Model |
|---|---|---|
| ANTHROPIC_API_KEY | Anthropic | claude-sonnet-4-20250514 |
| OPENAI_API_KEY | OpenAI | gpt-4o-mini |
| GEMINI_API_KEY / GOOGLE_API_KEY | Google Gemini | gemini-2.5-flash |
| OPENROUTER_API_KEY | OpenRouter | anthropic/claude-sonnet-4-20250514 |
| MINIMAX_API_KEY | MiniMax | MiniMax-M2.7 |
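A sketch of the detection order in practice (the key values are placeholders). With both keys below present, OpenAI is selected because OPENAI_API_KEY is checked before ANTHROPIC_API_KEY:

```bash
# Both keys are present. OPENAI_API_KEY comes first in the detection order,
# so OpenAI becomes the primary provider, not Anthropic.
OPENAI_API_KEY=your-openai-key
ANTHROPIC_API_KEY=your-anthropic-key
```

To switch to Anthropic, comment out OPENAI_API_KEY (and MINIMAX_API_KEY, if set), restart agentmemory, and check the detected provider with agentmemory status.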
Additional LLM variables:
| Variable | Description |
|---|---|
| ANTHROPIC_MODEL | Override the Anthropic model (e.g. claude-opus-4-5) |
| GEMINI_MODEL | Override the Gemini model |
| OPENROUTER_MODEL | Override the OpenRouter model |
| MINIMAX_MODEL | Override the MiniMax model |
| OPENAI_MODEL | Override the OpenAI model |
| OPENAI_BASE_URL | Override the OpenAI-compatible endpoint; use this for Ollama, vLLM, LM Studio, DeepSeek, or Azure |
| ANTHROPIC_BASE_URL | Override the Anthropic-compatible endpoint for proxies or Azure AI Foundry |
| MAX_TOKENS | Cap completion tokens for LLM calls (default: 4096) |
| AGENTMEMORY_LLM_TIMEOUT_MS | Outbound LLM request timeout in milliseconds (default: 60000) |
| FALLBACK_PROVIDERS | Comma-separated list of providers to try after the primary returns an error, e.g. anthropic,gemini |
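For example, a sketch that routes LLM calls to a local Ollama server through OPENAI_BASE_URL, with Anthropic as a fallback. Assumptions: Ollama serves its OpenAI-compatible API at its default address, http://localhost:11434/v1; OPENAI_API_KEY must still be set so the OpenAI provider is detected, although Ollama ignores the value; the fallback provider uses its own key; and qwen2.5-coder is a hypothetical placeholder for whichever model you have pulled.

```bash
# Primary: OpenAI-compatible endpoint served by a local Ollama instance
OPENAI_API_KEY=ollama
OPENAI_BASE_URL=http://localhost:11434/v1
# Hypothetical model name; use a model you have pulled locally
OPENAI_MODEL=qwen2.5-coder
# Local models can be slow: allow two minutes per request instead of one
AGENTMEMORY_LLM_TIMEOUT_MS=120000

# Try Anthropic if the local endpoint returns an error
FALLBACK_PROVIDERS=anthropic
ANTHROPIC_API_KEY=your-key-here

# Without this, OPENAI_API_KEY would also select OpenAI embeddings
# through the embedding detection order; keep embeddings offline instead
EMBEDDING_PROVIDER=local
```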
OpenRouter's default model (anthropic/claude-sonnet-4-20250514) is already a premium model, and so are choices like gpt-4o for OPENROUTER_MODEL. With any of them, background compression can cost $5+ per day under active use. Cheaper alternatives with comparable quality for memory compression: deepseek/deepseek-v4-pro, deepseek/deepseek-chat, qwen/qwen3-coder.
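A sketch of an OpenRouter setup that swaps the premium default for one of the cheaper alternatives listed above. Because OPENROUTER_API_KEY is also in the embedding detection order, an OpenRouter-only config gets its embeddings through OpenRouter too. The embedding line below just makes the documented default explicit.

```bash
OPENROUTER_API_KEY=your-key-here
# Replace the premium default with a cheaper compression model
OPENROUTER_MODEL=deepseek/deepseek-chat
# With no other provider keys, embeddings also go through OpenRouter;
# this is the documented default, shown explicitly
OPENROUTER_EMBEDDING_MODEL=openai/text-embedding-3-small
```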
Embedding Providers
Embeddings power the vector leg of agentmemory’s hybrid search. Without an embedding provider, search falls back to BM25-only mode. The detection order is: EMBEDDING_PROVIDER override → GEMINI_API_KEY → OPENAI_API_KEY → VOYAGE_API_KEY → COHERE_API_KEY → OPENROUTER_API_KEY → local (offline).
| Variable | Provider | Default Model |
|---|---|---|
| EMBEDDING_PROVIDER=local | Local (offline, no API key needed) | all-MiniLM-L6-v2 (384-dim) |
| OPENAI_API_KEY | OpenAI | text-embedding-3-small |
| VOYAGE_API_KEY | Voyage AI (optimised for code) | voyage-code-3 |
| COHERE_API_KEY | Cohere | embed-english-v3.0 |
| GEMINI_API_KEY | Google Gemini | gemini-embedding-001 |
| OPENROUTER_API_KEY | OpenRouter | openai/text-embedding-3-small |
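Anthropic does not appear in the embedding detection order. An Anthropic-only config therefore gets embeddings from the next embedding key it finds, or from the local model if there is none. A sketch pairing Anthropic for the LLM with Voyage AI for code-oriented embeddings (key values are placeholders):

```bash
# LLM: Anthropic is the only LLM provider key, so it is detected as primary
ANTHROPIC_API_KEY=your-anthropic-key
# Embeddings: Voyage is the first embedding key found, so voyage-code-3 is used
VOYAGE_API_KEY=your-voyage-key
```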
Additional embedding variables:
| Variable | Description |
|---|---|
| EMBEDDING_PROVIDER | Force a specific provider: local, openai, voyage, cohere, gemini, or openrouter |
| OPENAI_EMBEDDING_MODEL | Override the OpenAI embedding model |
| OPENAI_EMBEDDING_DIMENSIONS | Vector dimensions of the OpenAI embedding model; required when OPENAI_EMBEDDING_MODEL names a model that is not in agentmemory's built-in known-models table |
| OPENROUTER_EMBEDDING_MODEL | Embedding model when using OpenRouter (default: openai/text-embedding-3-small) |
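A sketch of pinning OpenAI embeddings to a larger model. text-embedding-3-large outputs 3072-dimensional vectors. This page does not say whether it is in agentmemory's known-models table, so the dimensions are set explicitly here, on the assumption that doing so is harmless if the model is already known. EMBEDDING_PROVIDER=openai keeps OpenAI selected even if GEMINI_API_KEY, which comes first in the detection order, is also set.

```bash
OPENAI_API_KEY=your-key-here
# Force OpenAI embeddings regardless of other keys
EMBEDDING_PROVIDER=openai
OPENAI_EMBEDDING_MODEL=text-embedding-3-large
# Must match the model's output size (3072 for text-embedding-3-large)
OPENAI_EMBEDDING_DIMENSIONS=3072
```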
EMBEDDING_PROVIDER=local computes embeddings on your own machine with all-MiniLM-L6-v2. The first use is slower because the model has to be downloaded. After that it needs no API key or network access, so it also works in air-gapped environments once the model is in place.
Feature Flags
All feature flags default to false unless noted. Enable them by setting the variable to true in ~/.agentmemory/.env.
| Variable | Default | Description |
|---|---|---|
| AGENTMEMORY_AUTO_COMPRESS | false | Run LLM compression on every observation batch as it is captured. Requires a provider key. Disabled by default because synthetic compression handles most cases without burning API tokens. |
| AGENTMEMORY_INJECT_CONTEXT | false | Inject recalled memories into the agent’s conversation at session start. When disabled, hooks still capture observations for background indexing but do not modify the conversation. |
| CONSOLIDATION_ENABLED | auto | Run the 4-tier consolidation pipeline (observations → memories → semantic → procedural) at session end. Defaults to true when any LLM provider key is set; set it to false to disable consolidation even with a key. |
| GRAPH_EXTRACTION_ENABLED | false | Extract knowledge-graph entities and relationships on every remember call. Powers the graph-traversal recall path and the graph stats shown in agentmemory status. |
| AGENTMEMORY_SLOTS | false | Enable pinned, editable memory slots that persist across sessions. |
| AGENTMEMORY_REFLECT | false | Automatically synthesise lessons from memories at the end of each session. Requires a provider key. |
| SNAPSHOT_ENABLED | false | Periodically export a git-versioned snapshot of the memory state and the BM25/vector indexes to ~/.agentmemory/snapshots/ (or SNAPSHOT_DIR). |
| CLAUDE_MEMORY_BRIDGE | false | Sync compressed memories bidirectionally with the CLAUDE.md file in your project. Requires CLAUDE_PROJECT_PATH to be set as well. |
| AGENTMEMORY_TOOLS | all | Tool surface exposed to MCP clients: all enables all 53 tools; core limits it to the 8 essential tools for a lighter footprint. |
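A sketch of a fuller feature set built on the minimal config. The selection of flags is illustrative. CONSOLIDATION_ENABLED is left unset because it already defaults to true once a provider key is present.

```bash
ANTHROPIC_API_KEY=your-key-here
# Recall memories into the conversation at session start
AGENTMEMORY_INJECT_CONTEXT=true
# Build the knowledge graph used by graph-traversal recall
GRAPH_EXTRACTION_ENABLED=true
# Synthesise lessons at the end of each session
AGENTMEMORY_REFLECT=true
# Pinned, editable slots that persist across sessions
AGENTMEMORY_SLOTS=true
# Expose only the 8 core MCP tools
AGENTMEMORY_TOOLS=core
# CONSOLIDATION_ENABLED is omitted: the provider key turns it on automatically
```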
Additional behaviour variables:
| Variable | Description |
|---|---|
| CONSOLIDATION_DECAY_DAYS | Age in days after which non-reinforced memories decay during consolidation (default: 30) |
| GRAPH_EXTRACTION_BATCH_SIZE | Memories processed per graph-extraction batch (default: 8, tuned for the default LLM context window) |
| SNAPSHOT_DIR | Directory for periodic snapshots (default: ~/.agentmemory/snapshots) |
| SNAPSHOT_INTERVAL | Seconds between snapshots (default: 3600) |
| CLAUDE_PROJECT_PATH | Absolute path to your project; required when CLAUDE_MEMORY_BRIDGE=true |
| CLAUDE_MEMORY_LINE_BUDGET | Maximum number of lines the bridge writes into CLAUDE.md (default: 200) |
| SUMMARIZE_CHUNK_SIZE | Observations per chunk during large-session summarisation (default: 400). Mainly relevant for bulk-imported JSONL sessions. |
| SUMMARIZE_CHUNK_CONCURRENCY | Parallel LLM calls during chunked summarisation (default: 6) |
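A sketch combining snapshots and the CLAUDE.md bridge. The project path is a hypothetical example (it must be absolute), and the interval and line budget are illustrative values.

```bash
# Git-versioned snapshots every 30 minutes instead of hourly
SNAPSHOT_ENABLED=true
SNAPSHOT_INTERVAL=1800
# Sync compressed memories with a project's CLAUDE.md
CLAUDE_MEMORY_BRIDGE=true
# Hypothetical path; it must be absolute
CLAUDE_PROJECT_PATH=/home/alice/projects/my-app
# Keep the bridged content shorter than the 200-line default
CLAUDE_MEMORY_LINE_BUDGET=100
```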
Ports
| Variable | Default | Purpose |
|---|---|---|
| III_REST_PORT | 3111 | REST API and MCP HTTP endpoint |
| AGENTMEMORY_VIEWER_PORT | 3113 | Real-time web viewer |
Overriding III_REST_PORT automatically shifts all derived ports, so a single variable is all you need to run a second instance on the same machine.
Additional port and URL variables:
| Variable | Description |
|---|---|
| AGENTMEMORY_URL | Full REST base URL (e.g. http://localhost:3111). Honoured by status, doctor, and the MCP shim. |
| AGENTMEMORY_VIEWER_URL | Override the viewer URL printed by agentmemory status |
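A sketch of pointing the CLI at a second instance that was started with III_REST_PORT=3211 (an arbitrary example port). This assumes AGENTMEMORY_URL is read from the shell environment as well as from ~/.agentmemory/.env.

```bash
# Query the second instance instead of the default one on port 3111
AGENTMEMORY_URL=http://localhost:3211 agentmemory status
AGENTMEMORY_URL=http://localhost:3211 agentmemory doctor
```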
Search Tuning
Adjust the balance between keyword and semantic search, and control how much context is injected per session.
| Variable | Default | Description |
|---|---|---|
| BM25_WEIGHT | 0.4 | Weight for the BM25 keyword search stream in hybrid ranking |
| VECTOR_WEIGHT | 0.6 | Weight for the vector embedding search stream in hybrid ranking |
| AGENTMEMORY_GRAPH_WEIGHT | 0.2 | Bonus weight applied to results found via knowledge graph traversal |
| TOKEN_BUDGET | 2000 | Maximum tokens injected as context per session via mem::context |
| MAX_OBS_PER_SESSION | 500 | Maximum observations captured per session before consolidation is triggered |
BM25_WEIGHT and VECTOR_WEIGHT are independent — they do not need to sum to 1.0. The graph weight is an additive bonus on top of the hybrid score, not a separate stream.
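A worked illustration of how the two weights and the graph bonus could interact, using the default weights. It assumes that each stream yields a relevance score normalised to [0, 1], that the streams are combined as a weighted sum, and that the graph bonus is a flat w_graph for any result reached via graph traversal. agentmemory's exact fusion formula is not documented here, and results A and B with their scores are hypothetical.

$$
\begin{aligned}
\text{score}(d) &= w_{\text{bm25}}\, s_{\text{bm25}}(d) + w_{\text{vec}}\, s_{\text{vec}}(d) + w_{\text{graph}}\, g(d), \qquad g(d) = \begin{cases} 1 & \text{if } d \text{ was reached via graph traversal} \\ 0 & \text{otherwise} \end{cases} \\
\text{score}(A) &= 0.4 \times 0.9 + 0.6 \times 0.2 + 0 = 0.36 + 0.12 = 0.48 \\
\text{score}(B) &= 0.4 \times 0.1 + 0.6 \times 0.7 + 0.2 = 0.04 + 0.42 + 0.2 = 0.66
\end{aligned}
$$

Under this model, the ratio of BM25_WEIGHT to VECTOR_WEIGHT sets the balance between keyword and semantic matches. Their absolute size sets how much the graph bonus matters: doubling both to 0.8 and 1.2 raises B's hybrid part to 0.08 + 0.84 = 0.92 while its graph bonus stays at 0.2.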
Multi-Agent Scoping
When you run multiple agents or users against the same memory server, use these variables to namespace memories and control whether agents share or isolate their recall.
| Variable | Description |
|---|---|
| TEAM_ID | Team namespace: memories are scoped to this identifier when it is set alongside USER_ID |
| USER_ID | Individual user identity within the team |
| AGENT_ID | Agent identity for per-agent memory scoping. Truncated to 128 characters. |
| AGENTMEMORY_AGENT_SCOPE | shared (default): tag memories with AGENT_ID but do not filter recall. isolated: tag and filter, so each agent only recalls its own memories. |
Shared team memory
All agents see all memories. Use this when you want every agent to benefit from the team’s accumulated context.

```bash
TEAM_ID=acme-eng
USER_ID=alice
AGENTMEMORY_AGENT_SCOPE=shared
```

Isolated agent memory
Each agent only recalls its own memories. Use this for multi-agent pipelines where agents should not cross-contaminate each other's context.

```bash
AGENT_ID=planner-agent
AGENTMEMORY_AGENT_SCOPE=isolated
```
Security
| Variable | Description |
|---|---|
| AGENTMEMORY_SECRET | Bearer token required on all API and viewer requests when set. Without it, the REST endpoints are open on loopback. Set it when you expose agentmemory beyond localhost or run it behind a reverse proxy. |
When AGENTMEMORY_SECRET is set, all CLI commands (status, doctor, import-jsonl, etc.) automatically attach the Authorization: Bearer <secret> header so they continue to work without any extra configuration.
Do not commit ~/.agentmemory/.env to version control. It may contain API keys and your bearer token.
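One way to create a strong bearer token, assuming OpenSSL is installed: append a random 64-character hex value to the config file, then restrict the file to your user, since it holds API keys. ENV_FILE is a hypothetical helper variable that defaults to the standard location. If the file already contains an uncommented AGENTMEMORY_SECRET line, edit that line instead of appending a second one.

```bash
# Path to the agentmemory config (set ENV_FILE to target another file)
ENV_FILE="${ENV_FILE:-$HOME/.agentmemory/.env}"
mkdir -p "$(dirname "$ENV_FILE")"
# Append a random 32-byte (64 hex character) token as the bearer secret
echo "AGENTMEMORY_SECRET=$(openssl rand -hex 32)" >> "$ENV_FILE"
# The file holds API keys and the token: make it readable and writable only by you
chmod 600 "$ENV_FILE"
```

Restart agentmemory afterwards. The CLI commands attach the token automatically.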
Use agentmemory status to verify which features are active after editing your config. The status panel shows the detected LLM provider, embedding provider, and a checklist of every enabled feature flag.