Observability Reference

The observability service stores first-party generation records emitted by the inference router. It complements Langfuse and the audit service; it does not replace either.

Data Flow

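For each generation, the inference router records route metadata (backend, provider version, deterministic capability, weight hash, latency, token counts, and cost when available) and submits the record to the observability service through POST /v1/generations.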
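As a rough illustration of what such a record could look like, the sketch below models the metadata categories listed above as a Python dataclass and serializes it to JSON. Every field name, type, and sample value here is an assumption made for illustration; the service's actual schema is not specified in this reference.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class GenerationRecord:
    # Hypothetical field names: only the metadata categories (backend,
    # provider version, and so on) come from this reference.
    event_id: str
    backend: str
    provider_version: str
    deterministic_capable: bool
    weight_hash: str
    latency_ms: float
    input_tokens: int
    output_tokens: int
    cost_usd: Optional[float] = None  # cost is recorded only when available


def to_payload(record: GenerationRecord) -> str:
    """Serialize a record to JSON, leaving out fields that are unset."""
    body = {key: value for key, value in asdict(record).items() if value is not None}
    return json.dumps(body, sort_keys=True)


# Sample values are invented for the example.
record = GenerationRecord(
    event_id="evt-001",
    backend="vllm-local",
    provider_version="0.0.0-example",
    deterministic_capable=True,
    weight_hash="sha256:example",
    latency_ms=412.5,
    input_tokens=128,
    output_tokens=64,
)
payload = to_payload(record)
print(payload)
```

Dropping unset fields instead of sending nulls is one way to represent "cost when available"; the real ingestion endpoint may expect something different.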

API Surface

| Endpoint | Purpose |
| --- | --- |
| POST /v1/generations | Internal ingestion endpoint used by inference-router. |
| GET /v1/generations | List generation records. |
| GET /v1/generations/stats | Aggregate generation usage. |
| GET /v1/generations/{event_id} | Fetch one generation record. |
| GET /v1/prices | List known model prices. |
| POST /v1/prices/refresh | Refresh OpenRouter model pricing. |
| GET /v1/admin/observability/traces | Admin trace summaries through the gateway. |
| GET /v1/admin/observability/traces/{trace_id} | Admin trace detail through the gateway. |
| GET /v1/admin/observability/usage | Admin usage report. |
| GET /v1/admin/observability/budgets | List tenant budgets. |
| PUT /v1/admin/observability/budgets/{tenant_id} | Update tenant budget controls. |

Admin endpoints are reached through the gateway and require admin role context.
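The sketch below shows one way a client might build requests for the admin endpoints using only the Python standard library. The gateway base URL, the bearer-token header, and the budget body field are assumptions made for illustration; this reference does not specify how admin role context is conveyed or what the budget payload contains. The requests are constructed but not sent.

```python
import json
from typing import Optional
from urllib.parse import quote
from urllib.request import Request

# Hypothetical gateway address; substitute the real one for your deployment.
GATEWAY_URL = "https://gateway.example.internal"


def admin_request(method: str, path: str, token: str,
                  body: Optional[dict] = None) -> Request:
    """Build (but do not send) a request to an admin endpoint."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    # Assumption: admin role context travels as a bearer token.
    headers = {"Authorization": f"Bearer {token}"}
    if data is not None:
        headers["Content-Type"] = "application/json"
    return Request(GATEWAY_URL + path, data=data, headers=headers, method=method)


def trace_detail(trace_id: str, token: str) -> Request:
    # Percent-encode the identifier so it is safe inside a path segment.
    path = f"/v1/admin/observability/traces/{quote(trace_id, safe='')}"
    return admin_request("GET", path, token)


def update_budget(tenant_id: str, budget: dict, token: str) -> Request:
    path = f"/v1/admin/observability/budgets/{quote(tenant_id, safe='')}"
    return admin_request("PUT", path, token, budget)


# "monthly_limit_usd" is a made-up field name, not a documented one.
req = update_budget("tenant-a", {"monthly_limit_usd": 100.0}, token="example-token")
print(req.get_method(), req.full_url)
```

Passing the resulting object to urllib.request.urlopen would send the request; that step is omitted here so the example runs without network access.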

Pricing

OpenRouter prices come from OPENROUTER_MODELS_URL. Local model prices are operator-defined:

OBS_LOCAL_PRICE_JSON='{"vllm-local:google/gemma-4-E4B-it":{"in":0.0,"out":0.0}}'

Pricing is advisory. Use audit receipts for integrity claims and billing exports for final financial reconciliation.
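To make the local price format concrete, the following sketch parses the OBS_LOCAL_PRICE_JSON value shown above and derives an advisory cost estimate from token counts. The helper names are hypothetical, and the unit of the "in" and "out" values is not stated in this reference; the code assumes a price per single token purely for illustration.

```python
import json
from typing import Dict, Optional

# The example value from this section. In a running service it would
# presumably be read from the OBS_LOCAL_PRICE_JSON environment variable.
RAW_LOCAL_PRICES = '{"vllm-local:google/gemma-4-E4B-it":{"in":0.0,"out":0.0}}'


def load_local_prices(raw: str) -> Dict[str, Dict[str, float]]:
    """Parse the JSON mapping and check each entry has numeric in/out prices."""
    prices = json.loads(raw) if raw else {}
    for model_key, entry in prices.items():
        for side in ("in", "out"):
            if not isinstance(entry.get(side), (int, float)):
                raise ValueError(f"{model_key}: missing or non-numeric '{side}' price")
    return prices


def estimate_cost(prices: Dict[str, Dict[str, float]], model_key: str,
                  input_tokens: int, output_tokens: int) -> Optional[float]:
    """Return an advisory cost, or None when no price is known for the model.

    Assumption: prices are per token. Adjust the arithmetic if the real
    unit differs (for example, per million tokens).
    """
    entry = prices.get(model_key)
    if entry is None:
        return None
    return input_tokens * entry["in"] + output_tokens * entry["out"]


prices = load_local_prices(RAW_LOCAL_PRICES)
cost = estimate_cost(prices, "vllm-local:google/gemma-4-E4B-it", 128, 64)
print(cost)
```

Returning None for an unknown model, instead of 0.0, keeps "no price known" distinct from "free", which matches the idea of recording cost only when it is available.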

| System | Use it for |
| --- | --- |
| Langfuse | Prompt/model debugging and model-call traces. |
| Audit | Security events, hash-chain verification, and signed turn receipts. |
| Prometheus/Grafana/Jaeger | Optional infrastructure metrics and distributed tracing from docker-compose.observability.yml. |