Observability Reference
The observability service stores first-party generation records emitted by the inference router. It complements Langfuse and the audit service; it does not replace either.
Data Flow
The router records route metadata such as backend, provider version, deterministic capability, weight hash, latency, token counts, and cost when available.
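As a rough illustration of the metadata listed above, a generation record could be modeled as follows. This is a minimal sketch: the class name, every field name, the units, and the sample values are assumptions made for illustration, not the service's actual schema.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class GenerationRecord:
    """Hypothetical shape of one generation record (field names are assumed)."""

    event_id: str
    backend: str
    provider_version: str
    deterministic: bool            # whether the route reports deterministic capability
    weight_hash: Optional[str]     # may be absent for hosted providers (assumption)
    latency_ms: float
    input_tokens: int
    output_tokens: int
    cost_usd: Optional[float] = None  # None when no price is available


def to_payload(record: GenerationRecord) -> str:
    """Serialize a record to JSON with stable key ordering."""
    return json.dumps(asdict(record), sort_keys=True)


# Sample values are invented for the example.
record = GenerationRecord(
    event_id="evt-123",
    backend="vllm-local",
    provider_version="0.0.0",
    deterministic=True,
    weight_hash=None,
    latency_ms=412.5,
    input_tokens=128,
    output_tokens=64,
)
payload = to_payload(record)
```

Keeping cost optional mirrors the "when available" wording: a record without a known price is still valid.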
API Surface
| Endpoint | Purpose |
|---|---|
| POST /v1/generations | Internal ingestion endpoint used by inference-router. |
| GET /v1/generations | List generation records. |
| GET /v1/generations/stats | Aggregate generation usage. |
| GET /v1/generations/{event_id} | Fetch one generation record. |
| GET /v1/prices | List known model prices. |
| POST /v1/prices/refresh | Refresh OpenRouter model pricing. |
| GET /v1/admin/observability/traces | Admin trace summaries through the gateway. |
| GET /v1/admin/observability/traces/{trace_id} | Admin trace detail through the gateway. |
| GET /v1/admin/observability/usage | Admin usage report. |
| GET /v1/admin/observability/budgets | List tenant budgets. |
| PUT /v1/admin/observability/budgets/{tenant_id} | Update tenant budget controls. |
Admin endpoints are reached through the gateway and require admin role context.
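The read endpoints in the table can be addressed with standard-library helpers. The sketch below only builds request objects and sends nothing. The base URL is a placeholder, the `limit` query parameter is hypothetical (the supported filters are not documented here), and the headers that carry admin role context through the gateway are deliberately left out because they are not specified in this reference.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request

# Placeholder base URL; substitute the address of your deployment.
BASE_URL = "http://observability.example"


def generation_url(base: str, event_id: str) -> str:
    """Build the URL for GET /v1/generations/{event_id}, escaping the ID."""
    return f"{base.rstrip('/')}/v1/generations/{quote(event_id, safe='')}"


def list_generations_request(base: str, **filters) -> Request:
    """Build a GET /v1/generations request.

    Any keyword arguments become query parameters. Their names are
    hypothetical; check the service for the filters it actually accepts.
    """
    url = f"{base.rstrip('/')}/v1/generations"
    query = urlencode(sorted(filters.items()))
    if query:
        url = f"{url}?{query}"
    return Request(url, method="GET", headers={"Accept": "application/json"})


request = list_generations_request(BASE_URL, limit=10)
detail_url = generation_url(BASE_URL, "evt-123")
```

Passing the resulting `Request` to `urllib.request.urlopen` would perform the call; that step is omitted so the example has no network dependency.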
Pricing
OpenRouter prices are fetched from the URL set in OPENROUTER_MODELS_URL. Local model prices are operator-defined through OBS_LOCAL_PRICE_JSON:
OBS_LOCAL_PRICE_JSON='{"vllm-local:google/gemma-4-E4B-it":{"in":0.0,"out":0.0}}'
Pricing is advisory. Use audit receipts for integrity claims and billing exports for final financial reconciliation.
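To show how a local price map might feed an advisory cost estimate, the sketch below parses a value in the OBS_LOCAL_PRICE_JSON format and applies it to token counts. The function names are hypothetical, and the pricing unit is an assumption: the code treats "in" and "out" as prices per one million tokens, exposed as a parameter because this reference does not state the unit.

```python
import json
import os
from typing import Dict, Optional

PriceMap = Dict[str, Dict[str, float]]


def load_local_prices(raw: Optional[str] = None) -> PriceMap:
    """Parse a local price map, defaulting to the OBS_LOCAL_PRICE_JSON variable."""
    if raw is None:
        raw = os.environ.get("OBS_LOCAL_PRICE_JSON", "{}")
    prices = json.loads(raw)
    for key, entry in prices.items():
        # Each entry needs both an input and an output price.
        if not {"in", "out"} <= set(entry):
            raise ValueError(f"price entry for {key!r} needs 'in' and 'out'")
    return prices


def advisory_cost(
    prices: PriceMap,
    key: str,
    input_tokens: int,
    output_tokens: int,
    tokens_per_unit: int = 1_000_000,  # ASSUMPTION: prices are per million tokens
) -> Optional[float]:
    """Return an estimated cost, or None when the model has no known price."""
    entry = prices.get(key)
    if entry is None:
        return None
    total = input_tokens * entry["in"] + output_tokens * entry["out"]
    return total / tokens_per_unit


prices = load_local_prices(
    '{"vllm-local:google/gemma-4-E4B-it":{"in":0.0,"out":0.0}}'
)
cost = advisory_cost(prices, "vllm-local:google/gemma-4-E4B-it", 1200, 300)
```

Returning `None` for an unknown model, rather than zero, keeps "no price available" distinct from "priced at zero", which matters when the figures are later compared against billing exports.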
Related Systems
| System | Use it for |
|---|---|
| Langfuse | Prompt/model debugging and model-call traces. |
| Audit | Security events, hash-chain verification, and signed turn receipts. |
| Prometheus/Grafana/Jaeger | Optional infrastructure metrics and distributed tracing from docker-compose.observability.yml. |