Upload documents

Index a file in the knowledge service and watch the agent answer with RAG-grounded context.

Prerequisites

  • Quickstart completed and the stack running.
  • Signed in to http://localhost.
  • A Keycloak access token (copied from browser devtools) if you want to call the gateway directly.

Steps

1. Prepare a test document

Any PDF, DOCX, or plain-text file works. To create a quick test file:

echo "AIBox runs chat, agent tools, guardrails, memory, and knowledge retrieval from the local stack." > test-doc.txt

2. Upload via the UI

Open the chat surface and click the paperclip in the input bar. Select your file and wait for the indexed confirmation.

The ingestion pipeline runs in five stages:

  1. Parse the file into raw text.
  2. Split the text into chunks.
  3. Add an LLM-generated context prefix to each chunk (contextual retrieval).
  4. Embed each chunk as a pair of vectors, one dense and one sparse.
  5. Store and index the chunks in the tenant's Qdrant collection.
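
The five stages above can be sketched as a toy pipeline. This is a minimal illustration in Python, not the knowledge service's implementation. Every function name is hypothetical, the fixed-size word splitter is an assumption, a string template stands in for the LLM-generated prefix, the vectors are toy hashed and term-count vectors, and an in-memory list stands in for the Qdrant collection.

```python
import hashlib
from collections import Counter


def parse(raw: bytes) -> str:
    # Stage 1: plain text only; PDF and DOCX would need a real parser.
    return raw.decode("utf-8").strip()


def split_into_chunks(text: str, max_words: int = 50) -> list[str]:
    # Stage 2: fixed-size word windows (an assumed splitting strategy).
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def add_context_prefix(chunk: str, title: str) -> str:
    # Stage 3: a template standing in for the LLM-generated context prefix.
    return f"[From {title}] {chunk}"


def embed(chunk: str, dim: int = 8) -> tuple[list[float], dict[str, int]]:
    # Stage 4: toy vectors. Dense = hashed bag of words, sparse = term counts.
    tokens = chunk.lower().split()
    dense = [0.0] * dim
    for token in tokens:
        bucket = int(hashlib.sha256(token.encode("utf-8")).hexdigest(), 16) % dim
        dense[bucket] += 1.0
    return dense, dict(Counter(tokens))


def ingest(raw: bytes, title: str, store: list[dict]) -> int:
    # Stage 5: append to an in-memory list standing in for the vector store.
    chunks = split_into_chunks(parse(raw))
    for index, chunk in enumerate(chunks):
        content = add_context_prefix(chunk, title)
        dense, sparse = embed(content)
        store.append(
            {"chunk_index": index, "content": content, "dense": dense, "sparse": sparse}
        )
    return len(chunks)
```

Calling ingest(open("test-doc.txt", "rb").read(), "test-doc.txt", store) with an empty list as store yields a single chunk for the one-line test file, which is consistent with the chunk_count of 1 reported by the API later in this guide.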

3. Ask a question grounded in the document

Type a question that the document can answer:

What does AIBox support?

In the tool timeline you should see the agent call knowledge_search, retrieve chunks, and use them to compose its answer.

4. Verify ingestion via the API

TOKEN="<paste-from-browser-devtools>"
curl "http://localhost:8080/v1/knowledge/documents?tenant_id=default" \
  -H "Authorization: Bearer $TOKEN"

Abbreviated response:

{
  "documents": [
    {
      "id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
      "tenant_id": "default",
      "title": "test-doc.txt",
      "filename": "test-doc.txt",
      "content_type": "text/plain",
      "chunk_count": 1,
      "ingest_status": "ready",
      "visibility_mode": "private",
      "owner_user_id": "admin",
      "created_at": "2026-05-28T10:00:00Z"
    }
  ]
}

tenant_id is required on knowledge routes: pass it as a query parameter on GET requests and as a body field on POST requests. It must match the tenant of your signed principal unless you are a platform admin. Do not send X-Tenant-ID; the gateway strips spoofable identity headers.
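
A script may want to poll the documents endpoint until the file is ready instead of checking by hand. The sketch below assumes Python 3.9+ and uses only the standard library. The helper names are hypothetical. It relies only on the fields shown in the response above (documents, filename, ingest_status) and on the "ready" and "failed" values mentioned in this guide. Treating any other status as still in progress is an assumption.

```python
import json
import time
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:8080"  # gateway address used in the curl examples


def fetch_documents(token: str, tenant_id: str = "default") -> dict:
    # GET /v1/knowledge/documents with tenant_id as a query parameter.
    query = urllib.parse.urlencode({"tenant_id": tenant_id})
    request = urllib.request.Request(
        f"{BASE_URL}/v1/knowledge/documents?{query}",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def ingest_status(payload: dict, filename: str):
    # Return the ingest_status of the first document with this filename, or None.
    for document in payload.get("documents", []):
        if document.get("filename") == filename:
            return document.get("ingest_status")
    return None


def wait_until_ready(fetch, filename: str, attempts: int = 10, delay: float = 2.0) -> bool:
    # fetch is any zero-argument callable returning the documents payload.
    for attempt in range(attempts):
        status = ingest_status(fetch(), filename)
        if status == "ready":
            return True
        if status == "failed":
            raise RuntimeError(f"ingestion failed for {filename}; check the knowledge logs")
        if attempt < attempts - 1:
            time.sleep(delay)  # assumed: any other status means still in progress
    return False
```

For example, wait_until_ready(lambda: fetch_documents(token), "test-doc.txt") returns True once the listing reports "ready" and False if the attempts run out first. Passing the fetch function in as an argument keeps the polling logic testable without a running stack.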

5. Search the knowledge base directly

curl http://localhost:8080/v1/knowledge/search \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"query": "local stack", "tenant_id": "default", "limit": 3}'

Abbreviated response:

{
  "results": [
    {
      "content": "AIBox runs chat, agent tools, guardrails...",
      "source": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
      "score": 0.92,
      "document_title": "test-doc.txt",
      "chunk_index": 0,
      "retrieval_rationale": "Matched chunk 0 from test-doc.txt"
    }
  ]
}
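
The same search can be issued from a script, which makes it easy to check results against a score threshold. The sketch below is a standard-library Python illustration with hypothetical helper names. It mirrors the curl request above and reads only the results and score fields shown in the sample response. The 0.5 default threshold is taken from the checklist in this guide.

```python
import json
import urllib.request

SEARCH_URL = "http://localhost:8080/v1/knowledge/search"  # from the curl example


def build_search_request(token: str, query: str, tenant_id: str = "default",
                         limit: int = 3) -> urllib.request.Request:
    # tenant_id travels in the JSON body on this POST route.
    body = json.dumps({"query": query, "tenant_id": tenant_id, "limit": limit})
    return urllib.request.Request(
        SEARCH_URL,
        data=body.encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def hits_above(payload: dict, min_score: float = 0.5) -> list[dict]:
    # Keep only results whose score is strictly greater than min_score.
    return [r for r in payload.get("results", []) if r.get("score", 0.0) > min_score]


def search(token: str, query: str, **kwargs) -> dict:
    # Send the request and decode the JSON response.
    request = build_search_request(token, query, **kwargs)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)
```

With a valid token, hits_above(search(token, "local stack")) returns the matches that clear the threshold. An empty list means either no results or only low-scoring ones.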

Verify

  • The upload UI shows indexed.
  • GET /v1/knowledge/documents lists the file with ingest_status: "ready".
  • POST /v1/knowledge/search returns at least one match with score > 0.5.
  • The chat answer cites the uploaded content.

Troubleshooting

  • ingest_status: "failed": open the knowledge container logs (docker logs aibox-knowledge-1) and look for the parser error. A common cause is a scanned PDF without an OCR layer.
  • Empty search results: re-run with a query that contains words from the document. Hybrid retrieval still needs lexical overlap when the corpus is tiny.
  • 401: your bearer token has expired. Copy a fresh one from browser devtools.
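
To confirm that a 401 comes from an expired token, you can inspect the token's exp claim locally. Keycloak access tokens are JWTs, so the payload is the second dot-separated segment, encoded as unpadded base64url. The sketch below uses hypothetical helper names and only reads the claims. It does not verify the signature, so it is a debugging aid and not an authentication check.

```python
import base64
import json
import time


def jwt_claims(token: str) -> dict:
    # Decode the payload segment of a JWT without verifying its signature.
    payload = token.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # restore the stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(payload))


def seconds_until_expiry(token: str, now=None) -> float:
    # Positive: the token is still valid. Negative: it expired that many seconds ago.
    current = time.time() if now is None else now
    return jwt_claims(token)["exp"] - current
```

A negative result from seconds_until_expiry(token) means the token has expired and a fresh one is needed.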

Verified against commit 7f571493 (2026-06-11) · sources 78100af62724.