Memory
AI-in-a-Box stores long-lived agent context in the memory service. The service persists typed memories through Mem0 into Qdrant, scopes every operation by tenant and user, and records memory operations as turn events when a request is inside a tracked agent turn.
The current agent runtime uses memory in two ways:
- Before the model is constructed, the chat handler prefetches relevant memories and injects them into the prompt as recent relevant notes.
- During a turn, the agent may explicitly call `memory_store`, `memory_recall`, or `memory_get` when the injected notes are not enough.
Agents are instructed to recall memory only when the user asks about remembered or past context, when there is a substantial topic shift, or when the injected notes are insufficient. They store memories for durable preferences, decisions, corrections, and references.
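The prefetch-and-inject flow described above can be sketched as follows. This is a minimal illustration, not the runtime's actual code: the note formatting and the `format_recalled_notes` helper are hypothetical stand-ins for the chat handler's internals.

```python
def format_recalled_notes(memories: list[dict]) -> str:
    """Render prefetched memories as a 'recent relevant notes' prompt block.

    `memories` mimics the shape of /v1/memory/search results; the exact
    injection format used by the real chat handler is an assumption here.
    """
    if not memories:
        return ""
    lines = ["Recent relevant notes:"]
    for mem in memories:
        lines.append(f"- ({mem['memory_type']}) {mem['content']}")
    return "\n".join(lines)


notes = format_recalled_notes([
    {"memory_type": "user", "content": "User prefers concise TypeScript examples"},
])
# The chat handler would prepend `notes` to the prompt before
# constructing the model request.
```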
Memory Types
| Type | Purpose |
|---|---|
| `user` | User preferences and personal working style. |
| `feedback` | Corrections the agent should learn from. |
| `project` | Architecture decisions, project state, and implementation context. |
| `reference` | Links, external references, and durable factual notes. |
Scope
Every memory request carries a `MemoryScope`:

```json
{
  "tenant_id": "acme",
  "user_id": "alice",
  "agent_id": "assistant",
  "session_id": "optional-session"
}
```
The tenant and user are folded into the Mem0 user id as `{tenant_id}:{user_id}`. Optional agent and session fields become narrower filters. In the gateway path, `X-Tenant-ID`, `X-User-ID`, and roles are derived from the authenticated principal; direct service calls still accept explicit scope fields for internal and development use.
Non-admin callers can operate only on their own scope. `tenant_admin` can operate within its tenant, and platform roles can operate across tenants.
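The scope folding is simple to state in code. This sketch follows the documented `{tenant_id}:{user_id}` rule and treats the optional fields as extra filters; the function names are illustrative, not the service's internals:

```python
def mem0_user_id(tenant_id: str, user_id: str) -> str:
    # Tenant and user are folded into a single Mem0 user id.
    return f"{tenant_id}:{user_id}"


def scope_filters(scope: dict) -> dict:
    """Build narrower filters from the optional agent/session fields."""
    filters = {}
    if scope.get("agent_id"):
        filters["agent_id"] = scope["agent_id"]
    if scope.get("session_id"):
        filters["session_id"] = scope["session_id"]
    return filters


assert mem0_user_id("acme", "alice") == "acme:alice"
```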
API
The gateway exposes the service at `/v1/memory`.
Store a Memory
```bash
curl http://localhost:8080/v1/memory \
  -H "Content-Type: application/json" \
  -d '{
    "content": "User prefers concise TypeScript examples",
    "memory_type": "user",
    "scope": {
      "tenant_id": "default",
      "user_id": "admin"
    }
  }'
```
You can also pass `messages` and let Mem0 extract candidate memories:
```bash
curl http://localhost:8080/v1/memory \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "Use TypeScript for frontend examples."},
      {"role": "assistant", "content": "Understood."}
    ],
    "memory_type": "feedback",
    "scope": {"tenant_id": "default", "user_id": "admin"}
  }'
```
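The same store call can be issued from any HTTP client. This sketch only builds the request URL and JSON body (actually sending it is left out); the gateway base URL is taken from the examples above, and `build_store_request` is a hypothetical helper:

```python
import json

GATEWAY = "http://localhost:8080"


def build_store_request(content: str, memory_type: str,
                        tenant_id: str, user_id: str) -> tuple[str, str]:
    """Return (url, body) for a POST /v1/memory store call."""
    url = f"{GATEWAY}/v1/memory"
    body = {
        "content": content,
        "memory_type": memory_type,
        "scope": {"tenant_id": tenant_id, "user_id": user_id},
    }
    return url, json.dumps(body)


url, body = build_store_request(
    "User prefers concise TypeScript examples", "user", "default", "admin"
)
# `body` can be POSTed to `url` with a Content-Type: application/json header.
```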
Search Memories
```bash
curl http://localhost:8080/v1/memory/search \
  -H "Content-Type: application/json" \
  -d '{
    "query": "coding style preference",
    "scope": {"tenant_id": "default", "user_id": "admin"},
    "memory_type": "user",
    "limit": 5
  }'
```
Response:
```json
{
  "memories": [
    {
      "id": "mem-7f3a",
      "content": "User prefers concise TypeScript examples",
      "memory_type": "user",
      "scope": {
        "tenant_id": "default",
        "user_id": "admin",
        "agent_id": null,
        "session_id": null
      },
      "pinned": false,
      "expires_at": null,
      "source": null,
      "last_recalled_at": null,
      "why_stored": null,
      "created_at": null,
      "updated_at": null,
      "score": 0.87
    }
  ]
}
```
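A caller might filter search hits by score before using them as notes. This sketch assumes the response shape shown above; the sample data and the 0.5 threshold are arbitrary examples, not service defaults:

```python
# Sample data mimicking a /v1/memory/search response (ids are made up).
SAMPLE_RESPONSE = {
    "memories": [
        {"id": "mem-7f3a", "content": "User prefers concise TypeScript examples",
         "memory_type": "user", "score": 0.87},
        {"id": "mem-9c21", "content": "Old, weakly related note",
         "memory_type": "user", "score": 0.31},
    ]
}


def relevant_memories(response: dict, min_score: float = 0.5) -> list[str]:
    """Keep only sufficiently similar hits, highest score first."""
    hits = [m for m in response["memories"] if m.get("score", 0.0) >= min_score]
    hits.sort(key=lambda m: m["score"], reverse=True)
    return [m["content"] for m in hits]


# Only the high-scoring memory survives the threshold.
kept = relevant_memories(SAMPLE_RESPONSE)
```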
List, Update, and Delete
```bash
# List memories in a scope
curl "http://localhost:8080/v1/memory?tenant_id=default&user_id=admin"

# Update a memory
curl -X PUT http://localhost:8080/v1/memory/mem-7f3a \
  -H "Content-Type: application/json" \
  -d '{
    "content": "User prefers concise TypeScript examples with tests",
    "scope": {"tenant_id": "default", "user_id": "admin"}
  }'

# Delete a memory
curl -X DELETE "http://localhost:8080/v1/memory/mem-7f3a?tenant_id=default&user_id=admin"
```
`POST /v1/memory/{memory_id}/forget` is an alias for delete that records the operation as a user-initiated forget event.
Turn Events
When the gateway has minted a turn context, memory operations emit structured events to the audit service. These events become part of the sealed turn proof:
- `memory_write` for stores and updates
- `memory_recall` for searches
- `memory_forget` for deletes
See Audit Trail for the receipt and proof API.
Configuration
| Variable | Default | Description |
|---|---|---|
| `QDRANT_HOST` | `localhost` | Qdrant host. |
| `QDRANT_PORT` | `6333` | Qdrant port. |
| `COLLECTION_PREFIX` | `mem0` | Prefix for tenant memory collections. |
| `EMBEDDING_DIMS` | `384` | Expected embedding vector size. |
| `CONSOLIDATION_INTERVAL_HOURS` | `24` | Memory consolidation cadence. |
| `LLM_BASE_URL` | `http://inference-router:8004/v1` | LLM endpoint for consolidation. |
| `LLM_MODEL` | Compose fallback `meta-llama/llama-3.3-70b-instruct:free`; `deploy/.env.example` sets `MEMORY_LLM_MODEL=openai/gpt-5.4` | Model used by memory consolidation. |
| `MEMORY_PORT` | `8003` | Memory service port. |
On startup, the service validates existing Qdrant collections that match the configured prefix and recreates any whose vector dimensions do not match `EMBEDDING_DIMS`.
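The startup decision rule can be illustrated with plain logic, leaving out the actual Qdrant client calls. The function and input shape below are hypothetical; only the prefix-match-plus-dimension-check behavior comes from the description above:

```python
COLLECTION_PREFIX = "mem0"   # from COLLECTION_PREFIX
EMBEDDING_DIMS = 384         # from EMBEDDING_DIMS


def collections_to_recreate(existing: dict[str, int]) -> list[str]:
    """Given {collection_name: vector_size}, return the collections that
    match the configured prefix but have an incompatible dimension."""
    return [
        name for name, dims in existing.items()
        if name.startswith(COLLECTION_PREFIX) and dims != EMBEDDING_DIMS
    ]


# "mem0_acme" has stale 768-dim vectors and would be recreated;
# "other" does not match the prefix and is ignored.
stale = collections_to_recreate({"mem0_acme": 768, "mem0_beta": 384, "other": 128})
```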