Memory
AI-in-a-Box stores long-lived agent context in the memory service. The service persists typed memories through Mem0 into Qdrant, scopes every operation by tenant and user, and records memory operations as turn events when a request is inside a tracked agent turn.
The current agent runtime uses memory in two ways:
- Before the model is constructed, the chat handler prefetches relevant memories and injects them into the prompt as recent relevant notes.
- During a turn, the agent may explicitly call `memory_store`, `memory_recall`, or `memory_get` when the injected notes are not enough.
Agents are instructed to recall memory only when the user asks about remembered or past context, when there is a substantial topic shift, or when the injected notes are insufficient. They store memories for durable preferences, decisions, corrections, and references.
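The prefetch-and-inject flow described above can be sketched as follows. This is a minimal illustration, not the runtime's actual code: the note formatting and the `format_recalled_notes` helper are hypothetical stand-ins for the chat handler's internals.

```python
def format_recalled_notes(memories: list[dict]) -> str:
    """Render prefetched memories as a 'recent relevant notes' prompt block.

    `memories` mimics the shape of /v1/memory/search results; the exact
    injection format used by the real chat handler is an assumption here.
    """
    if not memories:
        return ""
    lines = ["Recent relevant notes:"]
    for mem in memories:
        lines.append(f"- ({mem['memory_type']}) {mem['content']}")
    return "\n".join(lines)


notes = format_recalled_notes([
    {"memory_type": "user", "content": "User prefers concise TypeScript examples"},
])
# The chat handler would prepend `notes` to the prompt before
# constructing the model request.
```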
Memory Types
| Type | Purpose |
|---|---|
| `user` | User preferences and personal working style. |
| `feedback` | Corrections the agent should learn from. |
| `project` | Architecture decisions, project state, and implementation context. |
| `reference` | Links, external references, and durable factual notes. |
Scope
Every memory request carries a `MemoryScope`:

```json
{
  "tenant_id": "acme",
  "user_id": "alice",
  "agent_id": "assistant",
  "session_id": "optional-session"
}
```
The tenant and user are folded into the Mem0 user id as `{tenant_id}:{user_id}`. Optional agent and session fields become narrower filters. In the gateway path, `X-Tenant-ID`, `X-User-ID`, and roles are derived from the authenticated principal; direct service calls still accept explicit scope fields for internal and development use.
Non-admin callers can operate only on their own scope. `tenant_admin` can operate within its tenant, and platform roles can operate across tenants.
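The scope folding is simple to state in code. This sketch follows the documented `{tenant_id}:{user_id}` rule and treats the optional fields as extra filters; the function names are illustrative, not the service's internals:

```python
def mem0_user_id(tenant_id: str, user_id: str) -> str:
    # Tenant and user are folded into a single Mem0 user id.
    return f"{tenant_id}:{user_id}"


def scope_filters(scope: dict) -> dict:
    """Build narrower filters from the optional agent/session fields."""
    filters = {}
    if scope.get("agent_id"):
        filters["agent_id"] = scope["agent_id"]
    if scope.get("session_id"):
        filters["session_id"] = scope["session_id"]
    return filters


assert mem0_user_id("acme", "alice") == "acme:alice"
```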
API
The gateway exposes the service at `/v1/memory`.
Store a Memory
```bash
curl http://localhost:8080/v1/memory \
  -H "Content-Type: application/json" \
  -d '{
    "content": "User prefers concise TypeScript examples",
    "memory_type": "user",
    "scope": {
      "tenant_id": "default",
      "user_id": "admin"
    }
  }'
```
You can also pass `messages` and let Mem0 extract candidate memories:
```bash
curl http://localhost:8080/v1/memory \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "Use TypeScript for frontend examples."},
      {"role": "assistant", "content": "Understood."}
    ],
    "memory_type": "feedback",
    "scope": {"tenant_id": "default", "user_id": "admin"}
  }'
```
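The same store call can be issued from any HTTP client. This sketch only builds the request URL and JSON body (actually sending it is left out); the gateway base URL is taken from the examples above, and `build_store_request` is a hypothetical helper:

```python
import json

GATEWAY = "http://localhost:8080"


def build_store_request(content: str, memory_type: str,
                        tenant_id: str, user_id: str) -> tuple[str, str]:
    """Return (url, body) for a POST /v1/memory store call."""
    url = f"{GATEWAY}/v1/memory"
    body = {
        "content": content,
        "memory_type": memory_type,
        "scope": {"tenant_id": tenant_id, "user_id": user_id},
    }
    return url, json.dumps(body)


url, body = build_store_request(
    "User prefers concise TypeScript examples", "user", "default", "admin"
)
# `body` can be POSTed to `url` with a Content-Type: application/json header.
```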
Search Memories
```bash
curl http://localhost:8080/v1/memory/search \
  -H "Content-Type: application/json" \
  -d '{
    "query": "coding style preference",
    "scope": {"tenant_id": "default", "user_id": "admin"},
    "memory_type": "user",
    "limit": 5
  }'
```
Response:
```json
{
  "memories": [
    {
      "id": "mem-7f3a",
      "content": "User prefers concise TypeScript examples",
      "memory_type": "user",
      "scope": {
        "tenant_id": "default",
        "user_id": "admin",
        "agent_id": null,
        "session_id": null
      },
      "pinned": false,
      "expires_at": null,
      "source": null,
      "last_recalled_at": null,
      "why_stored": null,
      "created_at": null,
      "updated_at": null,
      "score": 0.87
    }
  ]
}
```
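A caller might filter search hits by score before using them as notes. This sketch assumes the response shape shown above; the sample data and the 0.5 threshold are arbitrary examples, not service defaults:

```python
# Sample data mimicking a /v1/memory/search response (ids are made up).
SAMPLE_RESPONSE = {
    "memories": [
        {"id": "mem-7f3a", "content": "User prefers concise TypeScript examples",
         "memory_type": "user", "score": 0.87},
        {"id": "mem-9c21", "content": "Old, weakly related note",
         "memory_type": "user", "score": 0.31},
    ]
}


def relevant_memories(response: dict, min_score: float = 0.5) -> list[str]:
    """Keep only sufficiently similar hits, highest score first."""
    hits = [m for m in response["memories"] if m.get("score", 0.0) >= min_score]
    hits.sort(key=lambda m: m["score"], reverse=True)
    return [m["content"] for m in hits]


# Only the high-scoring memory survives the threshold.
kept = relevant_memories(SAMPLE_RESPONSE)
```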
List, Update, and Delete
```bash
# List memories in a scope
curl "http://localhost:8080/v1/memory?tenant_id=default&user_id=admin"

# Update a memory
curl -X PUT http://localhost:8080/v1/memory/mem-7f3a \
  -H "Content-Type: application/json" \
  -d '{
    "content": "User prefers concise TypeScript examples with tests",
    "scope": {"tenant_id": "default", "user_id": "admin"}
  }'

# Delete a memory
curl -X DELETE "http://localhost:8080/v1/memory/mem-7f3a?tenant_id=default&user_id=admin"
```
`POST /v1/memory/{memory_id}/forget` is an alias for delete that records the operation as a user-initiated forget event.
Turn Events
When the gateway has minted a turn context, memory operations emit structured events to the audit service. These events become part of the sealed turn proof:
- `memory_write` for stores and updates
- `memory_recall` for searches
- `memory_forget` for deletes
See Audit Trail for the receipt and proof API.
Configuration
| Variable | Default | Description |
|---|---|---|
| `QDRANT_HOST` | `localhost` | Qdrant host. |
| `QDRANT_PORT` | `6333` | Qdrant port. |
| `COLLECTION_PREFIX` | `mem0` | Prefix for tenant memory collections. |
| `EMBEDDING_DIMS` | `384` | Expected embedding vector size. |
| `CONSOLIDATION_INTERVAL_HOURS` | `24` | Memory consolidation cadence. |
| `LLM_BASE_URL` | `http://inference-router:8004/v1` | LLM endpoint for consolidation. |
| `LLM_MODEL` | Compose fallback `meta-llama/llama-3.3-70b-instruct:free`; `deploy/.env.example` sets `MEMORY_LLM_MODEL=openai/gpt-5.4` | Model used by memory consolidation. |
| `MEMORY_PORT` | `8003` | Memory service port. |
On startup, the service validates existing Qdrant collections that match the configured prefix and recreates any whose vector dimensions do not match `EMBEDDING_DIMS`.
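The startup decision rule can be illustrated with plain logic, leaving out the actual Qdrant client calls. The function and input shape below are hypothetical; only the prefix-match-plus-dimension-check behavior comes from the description above:

```python
COLLECTION_PREFIX = "mem0"   # from COLLECTION_PREFIX
EMBEDDING_DIMS = 384         # from EMBEDDING_DIMS


def collections_to_recreate(existing: dict[str, int]) -> list[str]:
    """Given {collection_name: vector_size}, return the collections that
    match the configured prefix but have an incompatible dimension."""
    return [
        name for name, dims in existing.items()
        if name.startswith(COLLECTION_PREFIX) and dims != EMBEDDING_DIMS
    ]


# "mem0_acme" has stale 768-dim vectors and would be recreated;
# "other" does not match the prefix and is ignored.
stale = collections_to_recreate({"mem0_acme": 768, "mem0_beta": 384, "other": 128})
```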