Metrics

Every Prometheus metric exposed by the LLMKube controller and metal-agent, with labels, units, and notes on what to alert on.

What this page will cover

Controller metrics: reconcile duration, model download duration, InferenceService phase counts.
Metal-agent metrics: memory pressure level, evictions total, per-process RSS.
Pod-side llama.cpp metrics surfaced via the always-on --metrics flag.
Recommended alert thresholds (paging vs ticketing) for each gauge and counter.