Skip to documentation content Read the source on GitHub
Browse documentation
Getting Started
Reference
In progress Page being written
Metrics
Every Prometheus metric exposed by the LLMKube controller and metal-agent, with labels, units, and notes on what to alert on.
What this page will cover
- Controller metrics: reconcile duration, model download duration, InferenceService phase counts.
- Metal-agent metrics: memory pressure level, evictions total, per-process RSS.
- Pod-side llama.cpp metrics surfaced via the always-on --metrics flag.
- Recommended alert thresholds (paging vs ticketing) for each gauge and counter.