Memory

AI-in-a-Box stores long-lived agent context in the memory service. The service persists typed memories through Mem0 into Qdrant, scopes every operation by tenant and user, and records memory operations as turn events when a request is inside a tracked agent turn.

The current agent runtime uses memory in two ways:

  • Before the model is constructed, the chat handler prefetches relevant memories and injects them into the prompt as recent relevant notes.
  • During a turn, the agent may explicitly call memory_store, memory_recall, or memory_get when the injected notes are not enough.
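The prefetch step above can be pictured as a small formatting pass over search results. The sketch below is not the chat handler's actual code: `format_notes`, the heading text, and the character budget are hypothetical, and the memory dicts assume only the `content` and `memory_type` fields.

```python
def format_notes(memories, max_chars=600):
    """Render prefetched memories as a notes block for the prompt.

    Hypothetical helper: the real prompt layout is not documented here.
    """
    lines = []
    used = 0
    for mem in memories:
        line = f"- [{mem['memory_type']}] {mem['content']}"
        if used + len(line) > max_chars:
            break  # stop once the (assumed) character budget is exhausted
        lines.append(line)
        used += len(line)
    if not lines:
        return ""  # nothing to inject; the agent can still call memory_recall
    return "Recent relevant notes:\n" + "\n".join(lines)
```

An empty result leaves the prompt unchanged, which matches the case where the agent falls back to an explicit memory_recall call.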

Agents are instructed to recall memory only when the user asks about remembered or past context, when there is a substantial topic shift, or when the injected notes are insufficient. They store memories for durable preferences, decisions, corrections, and references.

Memory Types

| Type | Purpose |
| --- | --- |
| user | User preferences and personal working style. |
| feedback | Corrections the agent should learn from. |
| project | Architecture decisions, project state, and implementation context. |
| reference | Links, external references, and durable factual notes. |
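A client can validate the memory_type value before sending a request. This is a minimal sketch, assuming the four type strings listed above are the complete set; `MemoryType` and `parse_memory_type` are illustrative names, not part of the service.

```python
from enum import Enum


class MemoryType(str, Enum):
    """The four memory types listed above (assumed to be exhaustive)."""
    USER = "user"
    FEEDBACK = "feedback"
    PROJECT = "project"
    REFERENCE = "reference"


def parse_memory_type(value: str) -> MemoryType:
    """Return the matching MemoryType or raise ValueError with the allowed values."""
    try:
        return MemoryType(value)
    except ValueError:
        allowed = ", ".join(t.value for t in MemoryType)
        raise ValueError(
            f"unknown memory_type {value!r}; expected one of: {allowed}"
        ) from None
```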

Scope

Every memory request carries a MemoryScope:

{
  "tenant_id": "acme",
  "user_id": "alice",
  "agent_id": "assistant",
  "session_id": "optional-session"
}

The tenant and user are folded into the Mem0 user id as {tenant_id}:{user_id}. The optional agent_id and session_id fields act as additional filters that narrow an operation further. In the gateway path, X-Tenant-ID, X-User-ID, and roles are derived from the authenticated principal; direct service calls still accept explicit scope fields for internal and development use.
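The folding rule can be sketched as follows. The `MemoryScope` dataclass mirrors the JSON fields above; `mem0_user_id` and `narrowing_filters` are hypothetical helper names, and the filter dict shape is an assumption rather than the service's internal representation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MemoryScope:
    tenant_id: str
    user_id: str
    agent_id: Optional[str] = None
    session_id: Optional[str] = None


def mem0_user_id(scope: MemoryScope) -> str:
    """Fold tenant and user into a single id: {tenant_id}:{user_id}."""
    return f"{scope.tenant_id}:{scope.user_id}"


def narrowing_filters(scope: MemoryScope) -> dict:
    """Collect only the optional fields that are set (assumed filter shape)."""
    filters = {}
    if scope.agent_id:
        filters["agent_id"] = scope.agent_id
    if scope.session_id:
        filters["session_id"] = scope.session_id
    return filters
```

The sketch does not handle ids that themselves contain a colon; how the service treats that case is not covered here.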

Non-admin callers can operate only on their own scope. Callers with the tenant_admin role can operate on any scope within their own tenant, and platform roles can operate across tenants.
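The access rules above reduce to a three-way check. This is a sketch under stated assumptions: `tenant_admin` comes from the text, but the platform role name `platform_admin` and the `can_access` function are hypothetical.

```python
# Hypothetical platform role name; the text only says "platform roles".
PLATFORM_ROLES = {"platform_admin"}


def can_access(caller_tenant, caller_user, roles, target_tenant, target_user) -> bool:
    """Return True when the caller may operate on the target scope."""
    roles = set(roles)
    if roles & PLATFORM_ROLES:
        return True  # platform roles may cross tenant boundaries
    if "tenant_admin" in roles:
        return caller_tenant == target_tenant  # any user inside the same tenant
    # Everyone else is limited to their own tenant and user.
    return caller_tenant == target_tenant and caller_user == target_user
```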

API

The gateway exposes the service at /v1/memory.

Store a Memory

curl http://localhost:8080/v1/memory \
  -H "Content-Type: application/json" \
  -d '{
    "content": "User prefers concise TypeScript examples",
    "memory_type": "user",
    "scope": {
      "tenant_id": "default",
      "user_id": "admin"
    }
  }'

You can also pass messages and let Mem0 extract candidate memories:

curl http://localhost:8080/v1/memory \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "Use TypeScript for frontend examples."},
      {"role": "assistant", "content": "Understood."}
    ],
    "memory_type": "feedback",
    "scope": {"tenant_id": "default", "user_id": "admin"}
  }'
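The same store request can be built from Python with the standard library. This sketch only constructs the request; `build_store_request` is a hypothetical helper, and the rule that a request carries either content or messages but not both is an assumption the text above does not state.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080/v1/memory"


def build_store_request(scope, memory_type, content=None, messages=None):
    """Build (but do not send) a POST request that stores a memory."""
    # Assumption: exactly one of content or messages is supplied.
    if (content is None) == (messages is None):
        raise ValueError("pass exactly one of content or messages")
    body = {"memory_type": memory_type, "scope": scope}
    if content is not None:
        body["content"] = content
    else:
        body["messages"] = messages
    return urllib.request.Request(
        BASE_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` sends it to a running gateway.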

Search Memories

curl http://localhost:8080/v1/memory/search \
  -H "Content-Type: application/json" \
  -d '{
    "query": "coding style preference",
    "scope": {"tenant_id": "default", "user_id": "admin"},
    "memory_type": "user",
    "limit": 5
  }'

Response:

{
  "memories": [
    {
      "id": "mem-7f3a",
      "content": "User prefers concise TypeScript examples",
      "memory_type": "user",
      "scope": {
        "tenant_id": "default",
        "user_id": "admin",
        "agent_id": null,
        "session_id": null
      },
      "pinned": false,
      "expires_at": null,
      "source": null,
      "last_recalled_at": null,
      "why_stored": null,
      "created_at": null,
      "updated_at": null,
      "score": 0.87
    }
  ]
}
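A caller will usually rank or filter the returned memories by score. The sketch below assumes the response shape shown above; `top_memories` and the 0.5 threshold are illustrative choices, not service behavior.

```python
import json


def top_memories(response_text, min_score=0.5):
    """Parse a search response and return hits at or above min_score, best first."""
    payload = json.loads(response_text)

    def score(mem):
        # Treat a missing or null score as 0.0 so sorting never fails.
        return mem.get("score") or 0.0

    hits = [m for m in payload.get("memories", []) if score(m) >= min_score]
    return sorted(hits, key=score, reverse=True)
```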

List, Update, and Delete

# List memories for a scope
curl "http://localhost:8080/v1/memory?tenant_id=default&user_id=admin"

# Update a memory by id
curl -X PUT http://localhost:8080/v1/memory/mem-7f3a \
  -H "Content-Type: application/json" \
  -d '{
    "content": "User prefers concise TypeScript examples with tests",
    "scope": {"tenant_id": "default", "user_id": "admin"}
  }'

# Delete a memory by id
curl -X DELETE "http://localhost:8080/v1/memory/mem-7f3a?tenant_id=default&user_id=admin"
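List and delete pass the scope as query parameters rather than a JSON body. A small sketch of building those URLs safely, assuming the paths shown above; the helper names are hypothetical.

```python
from urllib.parse import quote, urlencode

BASE_URL = "http://localhost:8080/v1/memory"


def scope_query(tenant_id, user_id):
    """Encode the scope as a query string, escaping reserved characters."""
    return urlencode({"tenant_id": tenant_id, "user_id": user_id})


def list_url(tenant_id, user_id):
    """URL for listing the memories in a scope."""
    return f"{BASE_URL}?{scope_query(tenant_id, user_id)}"


def memory_url(memory_id, tenant_id, user_id):
    """URL for a single memory, as used by the DELETE call."""
    return f"{BASE_URL}/{quote(memory_id, safe='')}?{scope_query(tenant_id, user_id)}"
```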

POST /v1/memory/{memory_id}/forget is an alias for delete that records the operation as a user-initiated forget event.

Turn Events

When the gateway has minted a turn context, memory operations emit structured events to the audit service. These events become part of the sealed turn proof:

  • memory_write for stores and updates
  • memory_recall for searches
  • memory_forget for deletes
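The mapping from operation to event type in the list above can be written as a lookup table. Only the three event type names come from the text; the operation keys, the event dict fields, and `build_turn_event` are hypothetical.

```python
# Event type names are from the list above; operation keys are assumed.
EVENT_TYPE_BY_OPERATION = {
    "store": "memory_write",
    "update": "memory_write",
    "search": "memory_recall",
    "delete": "memory_forget",
}


def build_turn_event(operation, memory_id=None, turn_id=None):
    """Return an event dict, or None when the request is outside a tracked turn."""
    if turn_id is None:
        return None  # no turn context was minted, so nothing is emitted
    return {
        "type": EVENT_TYPE_BY_OPERATION[operation],
        "turn_id": turn_id,
        "memory_id": memory_id,
    }
```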

See Audit Trail for the receipt and proof API.

Configuration

| Variable | Default | Description |
| --- | --- | --- |
| QDRANT_HOST | localhost | Qdrant host. |
| QDRANT_PORT | 6333 | Qdrant port. |
| COLLECTION_PREFIX | mem0 | Prefix for tenant memory collections. |
| EMBEDDING_DIMS | 384 | Expected embedding vector size. |
| CONSOLIDATION_INTERVAL_HOURS | 24 | Memory consolidation cadence. |
| LLM_BASE_URL | http://inference-router:8004/v1 | LLM endpoint for consolidation. |
| LLM_MODEL | Compose fallback meta-llama/llama-3.3-70b-instruct:free; deploy/.env.example sets MEMORY_LLM_MODEL=openai/gpt-5.4 | Model used by memory consolidation. |
| MEMORY_PORT | 8003 | Memory service port. |
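Reading these variables with their documented defaults might look like the sketch below. `MemoryConfig` and `load_config` are illustrative names, not the service's own settings code, and the LLM_MODEL default uses the Compose fallback value listed above.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryConfig:
    qdrant_host: str
    qdrant_port: int
    collection_prefix: str
    embedding_dims: int
    consolidation_interval_hours: int
    llm_base_url: str
    llm_model: str
    memory_port: int


def load_config(env=None) -> MemoryConfig:
    """Build a config from a mapping (os.environ by default), applying defaults."""
    env = os.environ if env is None else env
    return MemoryConfig(
        qdrant_host=env.get("QDRANT_HOST", "localhost"),
        qdrant_port=int(env.get("QDRANT_PORT", "6333")),
        collection_prefix=env.get("COLLECTION_PREFIX", "mem0"),
        embedding_dims=int(env.get("EMBEDDING_DIMS", "384")),
        consolidation_interval_hours=int(env.get("CONSOLIDATION_INTERVAL_HOURS", "24")),
        llm_base_url=env.get("LLM_BASE_URL", "http://inference-router:8004/v1"),
        llm_model=env.get("LLM_MODEL", "meta-llama/llama-3.3-70b-instruct:free"),
        memory_port=int(env.get("MEMORY_PORT", "8003")),
    )
```

Passing an explicit mapping instead of relying on `os.environ` keeps the loader easy to exercise in tests.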

On startup, the service validates every existing Qdrant collection whose name matches the configured prefix and recreates any collection whose vector dimensions are incompatible with the expected embedding size.
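The selection step of that startup check can be sketched without a Qdrant connection. The function below is hypothetical: it assumes prefix matching means a simple `startswith` test and takes collection sizes as a plain dict, whereas the real service reads them from Qdrant. The collection names in the usage example are invented.

```python
def collections_to_recreate(existing, prefix="mem0", expected_dims=384):
    """Return names of prefixed collections whose vector size differs from expected_dims.

    existing maps collection name -> vector size (assumed input shape).
    """
    return sorted(
        name
        for name, dims in existing.items()
        if name.startswith(prefix) and dims != expected_dims
    )


# Usage with invented collection names:
stale = collections_to_recreate({"mem0_acme": 384, "mem0_beta": 768, "other": 1536})
```

Collections outside the prefix are ignored even when their dimensions differ, which matches the statement that only collections matching the configured prefix are validated.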