Connect an LLM Provider to Agent Memory

Agent Memory uses an LLM for three things: compressing observations into structured memories, running the consolidation pipeline at session end, and extracting entities and relationships into the knowledge graph. None of these requires a key: without one, Agent Memory runs in BM25-only mode, synthetic compression handles indexing, and recall still works. Adding an LLM key improves the quality of long-term memory and semantic search.

Supported Providers

Anthropic

Default model: claude-sonnet-4-20250514. High quality for both compression and knowledge graph extraction.

OpenAI

Default model: gpt-4o-mini. Cost-effective for continuous background compression. Override with OPENAI_MODEL=gpt-4o.

Google Gemini

Default model: gemini-2.5-flash. Also auto-enables Gemini embeddings (gemini-embedding-001). Supports a free tier.

OpenRouter

Default model: anthropic/claude-sonnet-4-20250514. Routes to any model in the OpenRouter catalog — useful for cost optimization.

MiniMax

Default model: MiniMax-M2.7. Anthropic-compatible API. Good alternative for high-volume compression workloads.

Local / Ollama

Uses any OpenAI-compatible server. Zero API cost, fully offline. Works with Ollama, LM Studio, vLLM, and llama.cpp.

Setup for Each Provider

Add the relevant key to ~/.agentmemory/.env, then restart Agent Memory.
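
Before restarting, it can be useful to confirm which variables the file actually contains. The following is a minimal Python sketch, not Agent Memory's own loader: the function names parse_env_text and load_env_file are hypothetical, and the parsing rules (one KEY=VALUE per line, # comments, optional quotes) are assumed from common dotenv conventions rather than taken from Agent Memory.

```python
from pathlib import Path
from typing import Optional


def parse_env_text(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines (assumed dotenv-style rules)."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines, full-line comments, and lines without '='.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Assumption: whitespace followed by '#' starts an inline comment.
        value = value.split(" #", 1)[0].strip().strip("'\"")
        values[key.strip()] = value
    return values


def load_env_file(path: Optional[str] = None) -> dict[str, str]:
    """Read the env file; return an empty dict if it does not exist."""
    env_path = Path(path) if path else Path.home() / ".agentmemory" / ".env"
    if not env_path.exists():
        return {}
    return parse_env_text(env_path.read_text(encoding="utf-8"))
```

Calling sorted(load_env_file()) lists the variable names that are set without printing their values.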

Anthropic

# ~/.agentmemory/.env
ANTHROPIC_API_KEY=sk-ant-...

To pin a specific model:

ANTHROPIC_API_KEY=sk-ant-...
ANTHROPIC_MODEL=claude-sonnet-4-20250514

To route through an Anthropic-compatible proxy or Azure AI Foundry:

ANTHROPIC_API_KEY=sk-ant-...
ANTHROPIC_BASE_URL=https://your-proxy.example.com

OpenAI

# ~/.agentmemory/.env
OPENAI_API_KEY=sk-...

To use a more capable model:

OPENAI_API_KEY=sk-...
OPENAI_MODEL=gpt-4o

Setting OPENAI_API_KEY activates both the OpenAI LLM provider and the OpenAI embedding provider. If you only want OpenAI for embeddings, add OPENAI_API_KEY_FOR_LLM=false.

Gemini

# ~/.agentmemory/.env
GEMINI_API_KEY=...

GOOGLE_API_KEY also works as an alias, though GEMINI_API_KEY takes precedence when both are set.

To pin a specific Gemini model:

GEMINI_API_KEY=...
GEMINI_MODEL=gemini-2.5-flash

OpenRouter

# ~/.agentmemory/.env
OPENROUTER_API_KEY=sk-or-...

To select a specific model through OpenRouter:

OPENROUTER_API_KEY=sk-or-...
OPENROUTER_MODEL=anthropic/claude-sonnet-4-20250514

Premium-tier models like claude-sonnet-4 and gpt-4o can cost $5 or more per day when they run continuous background compression during active use. Agent Memory warns you at startup when it detects a premium model. Consider cost-optimized alternatives such as deepseek/deepseek-v4-pro or qwen/qwen3-coder for equivalent compression quality at roughly 10× lower cost. Set AGENTMEMORY_SUPPRESS_COST_WARNING=1 to silence the warning once you’ve made an informed choice.

Local / Ollama

Run Agent Memory fully offline with no API costs using any OpenAI-compatible local server.

Ollama (runs on port 11434 by default):

ollama pull qwen2.5-coder:7b
ollama serve

# ~/.agentmemory/.env
OPENAI_API_KEY=local
OPENAI_BASE_URL=http://localhost:11434/v1
OPENAI_MODEL=qwen2.5-coder:7b
EMBEDDING_PROVIDER=local

LM Studio (runs on port 1234 by default):

Open LM Studio → Local Server tab → Start Server. Then:

# ~/.agentmemory/.env
OPENAI_API_KEY=lmstudio
OPENAI_BASE_URL=http://localhost:1234/v1
OPENAI_MODEL=qwen2.5-coder-7b-instruct
EMBEDDING_PROVIDER=local

Setting EMBEDDING_PROVIDER=local alongside a local LLM gives you a fully offline setup: no requests leave your machine.

Recommended models for memory work (compression tasks are short and don’t need large models):

Model	Size	Notes
`qwen2.5-coder:7b`	~4.7 GB	Best for code-heavy sessions
`llama3.2:3b`	~2 GB	Smallest viable option
`mistral:7b-instruct`	~4.4 GB	Good general-purpose baseline
`deepseek-r1:7b`	~4.7 GB	Cleaner knowledge graph extractions
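
Before pointing Agent Memory at a local server, it can help to confirm that the endpoint responds and that the name in OPENAI_MODEL matches what the server reports. The sketch below assumes the server exposes the OpenAI-style GET /models route under the base URL, which OpenAI-compatible servers such as Ollama and LM Studio generally provide. The helper names are hypothetical and not part of Agent Memory.

```python
import json
import urllib.request


def models_url(base_url: str) -> str:
    """Build the model-listing URL from an OPENAI_BASE_URL value."""
    return base_url.rstrip("/") + "/models"


def parse_model_ids(payload: object) -> list[str]:
    """Extract model ids from an OpenAI-style {"data": [{"id": ...}]} body."""
    if not isinstance(payload, dict):
        return []
    data = payload.get("data")
    if not isinstance(data, list):
        return []
    return [
        entry["id"]
        for entry in data
        if isinstance(entry, dict) and isinstance(entry.get("id"), str)
    ]


def list_local_models(base_url: str, timeout: float = 3.0) -> list[str]:
    """Return the model ids the server reports, or [] if it is unreachable."""
    try:
        with urllib.request.urlopen(models_url(base_url), timeout=timeout) as resp:
            payload = json.load(resp)
    except (OSError, ValueError):
        # OSError covers connection and timeout errors; ValueError covers bad JSON.
        return []
    return parse_model_ids(payload)
```

For the Ollama configuration above, list_local_models("http://localhost:11434/v1") should include qwen2.5-coder:7b once the model has been pulled and the server is running. An empty list suggests the server is not reachable at that address.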

Embedding Providers

Embeddings are configured separately from the LLM provider. Agent Memory auto-detects the embedding provider from your available keys, or you can set EMBEDDING_PROVIDER explicitly.

Provider	Variable	Model	Notes
Local (default)	`EMBEDDING_PROVIDER=local`	`all-MiniLM-L6-v2` (384-dim)	Free, offline, no key required. Ships bundled via `@xenova/transformers`.
Voyage AI	`VOYAGE_API_KEY=pa-...`	`voyage-code-3`	Recommended for code projects. Optimized for code semantics and retrieval.
OpenAI	`OPENAI_API_KEY=sk-...`	`text-embedding-3-small` (1536-dim)	Enabled automatically when `OPENAI_API_KEY` is set. Override model with `OPENAI_EMBEDDING_MODEL`.
Gemini	`GEMINI_API_KEY=...`	`gemini-embedding-001`	Enabled automatically when `GEMINI_API_KEY` is set. Supports 100+ languages.
Cohere	`COHERE_API_KEY=...`	`embed-english-v3.0`	General-purpose embeddings with a free trial tier.
OpenRouter	`OPENROUTER_API_KEY=...`	configurable	Set `OPENROUTER_EMBEDDING_MODEL` to select the model.

Provider Auto-Detection

Agent Memory checks for API keys in a fixed priority order and activates the first one it finds. You don’t need to set EMBEDDING_PROVIDER or any provider name explicitly; setting your API key is enough.

Detection order for LLM providers:

OPENAI_API_KEY → MINIMAX_API_KEY → ANTHROPIC_API_KEY → GEMINI_API_KEY → OPENROUTER_API_KEY → noop

Detection order for embedding providers:

EMBEDDING_PROVIDER (explicit) → GEMINI_API_KEY → OPENAI_API_KEY → VOYAGE_API_KEY → COHERE_API_KEY → OPENROUTER_API_KEY → local
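
The two detection orders above can be sketched as a first-match lookup over the environment. This is an illustration of the documented priority only, not Agent Memory's implementation. The function names are hypothetical, and the provider labels other than openai, gemini, local, and noop are assumed. The GOOGLE_API_KEY alias and the OPENAI_API_KEY_FOR_LLM switch are left out for brevity.

```python
from typing import Mapping

# Priority order copied from the lists above: (environment variable, provider label).
LLM_ORDER = [
    ("OPENAI_API_KEY", "openai"),
    ("MINIMAX_API_KEY", "minimax"),
    ("ANTHROPIC_API_KEY", "anthropic"),
    ("GEMINI_API_KEY", "gemini"),
    ("OPENROUTER_API_KEY", "openrouter"),
]
EMBEDDING_ORDER = [
    ("GEMINI_API_KEY", "gemini"),
    ("OPENAI_API_KEY", "openai"),
    ("VOYAGE_API_KEY", "voyage"),
    ("COHERE_API_KEY", "cohere"),
    ("OPENROUTER_API_KEY", "openrouter"),
]


def _first_match(env: Mapping[str, str], order: list, default: str) -> str:
    """Return the label of the first variable that is set and non-empty."""
    for variable, provider in order:
        if env.get(variable, "").strip():
            return provider
    return default


def detect_llm_provider(env: Mapping[str, str]) -> str:
    """No key at all falls through to noop (BM25-only mode)."""
    return _first_match(env, LLM_ORDER, "noop")


def detect_embedding_provider(env: Mapping[str, str]) -> str:
    """An explicit EMBEDDING_PROVIDER wins; otherwise the first key found."""
    explicit = env.get("EMBEDDING_PROVIDER", "").strip()
    if explicit:
        return explicit
    return _first_match(env, EMBEDDING_ORDER, "local")
```

Under this reading, the local Ollama configuration (OPENAI_API_KEY=local with EMBEDDING_PROVIDER=local) resolves to the openai LLM provider and the local embedding provider. Without the explicit EMBEDDING_PROVIDER, the placeholder OPENAI_API_KEY would select OpenAI embeddings instead.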

Fallback Chain

If your primary LLM provider returns an error (for example, a rate limit or temporary outage), Agent Memory can automatically retry with a secondary provider:

# ~/.agentmemory/.env
ANTHROPIC_API_KEY=sk-ant-...
FALLBACK_PROVIDERS=openai,gemini
OPENAI_API_KEY=sk-...
GEMINI_API_KEY=...

Agent Memory tries each fallback in the order listed. If all providers fail, the operation is skipped and retried on the next session.
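
The retry behavior described above can be sketched as follows. This is a simplified model under stated assumptions, not Agent Memory's code: provider_chain and call_with_fallback are hypothetical names, call stands in for whatever function sends a request to a named provider, and any raised exception is treated as a provider failure.

```python
from typing import Callable, Optional


def provider_chain(primary: str, fallback_providers: str) -> list[str]:
    """Combine the primary provider with a FALLBACK_PROVIDERS-style list."""
    chain = [primary]
    for name in fallback_providers.split(","):
        name = name.strip()
        if name and name not in chain:
            chain.append(name)
    return chain


def call_with_fallback(
    chain: list[str], call: Callable[[str], str]
) -> Optional[str]:
    """Try each provider in order; return None if every provider fails."""
    for provider in chain:
        try:
            return call(provider)
        except Exception as exc:  # e.g. a rate limit or temporary outage
            print(f"{provider} failed ({exc}); trying next provider")
    # All providers failed: the caller skips the operation and retries later.
    return None
```

With the configuration above, provider_chain("anthropic", "openai,gemini") yields anthropic, openai, gemini in that order, so OpenAI is tried only after Anthropic returns an error.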

Recommended Setup for Code Projects

For the best recall quality on code-heavy projects:

# ~/.agentmemory/.env
# LLM provider (OPENAI_API_KEY also works)
ANTHROPIC_API_KEY=sk-ant-...
# voyage-code-3: best code embeddings
VOYAGE_API_KEY=pa-...
CONSOLIDATION_ENABLED=true
GRAPH_EXTRACTION_ENABLED=true

voyage-code-3 is specifically trained on code and significantly outperforms general-purpose embedding models on code retrieval tasks. Pair it with any LLM provider for consolidation and graph extraction.
