In progress: this page is still being written.

Metrics

Every Prometheus metric exposed by the LLMKube controller and metal-agent, with labels, units, and notes on what to alert on.

What this page will cover

  • Controller metrics: reconcile duration, model download duration, InferenceService phase counts.
  • Metal-agent metrics: memory pressure level, evictions total, per-process RSS.
  • Pod-side llama.cpp metrics surfaced via the always-on --metrics flag.
  • Recommended alert thresholds (paging vs ticketing) for each gauge and counter.
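
Until the full reference is written, the sketch below illustrates how gauges and counters like those listed above could be read from a scrape and sorted into paging versus ticketing severity. It is a minimal example, not LLMKube's implementation: the metric names (`example_memory_pressure_level`, `example_evictions_total`), the label names, the sample values, and the thresholds are all hypothetical placeholders, since the real names and recommended thresholds are not yet documented here. The only assumption about the format is that the endpoints serve the standard Prometheus text exposition format.

```python
import re

# Hypothetical scrape output. The metric and label names are illustrative
# placeholders, NOT the real names exposed by LLMKube or llama.cpp.
SAMPLE = """\
# HELP example_memory_pressure_level Illustrative gauge (0=normal, 1=warning, 2=critical).
# TYPE example_memory_pressure_level gauge
example_memory_pressure_level{node="mac-1"} 2
# TYPE example_evictions_total counter
example_evictions_total{node="mac-1",reason="memory"} 7
"""

# metric_name{optional labels} value [optional timestamp]
LINE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
# label="value", allowing backslash-escaped characters inside the value
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Parse Prometheus text exposition lines into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and # HELP / # TYPE comment lines.
        if not line or line.startswith("#"):
            continue
        match = LINE_RE.match(line)
        if match is None:
            continue
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


def severity(value, ticket_at, page_at):
    """Classify a sample: 'page' at or above page_at, 'ticket' at or above ticket_at."""
    if value >= page_at:
        return "page"
    if value >= ticket_at:
        return "ticket"
    return "ok"


if __name__ == "__main__":
    # Hypothetical thresholds: ticket at level 1, page at level 2.
    for name, labels, value in parse_metrics(SAMPLE):
        if name == "example_memory_pressure_level":
            print(name, labels, severity(value, ticket_at=1, page_at=2))
```

In a real deployment the same split would normally live in Prometheus alerting rules rather than in a script; the code only shows the shape of the decision. Note that a counter such as an evictions total is usually alerted on by its rate of increase, not its absolute value.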
LLMKube

Kubernetes for Local LLMs. Deploy, manage, and scale AI inference workloads with production-grade orchestration.

© 2026 Defilan Technologies LLC

Built for the Kubernetes and AI communities

LLMKube is not affiliated with or endorsed by the Cloud Native Computing Foundation or the Kubernetes project. Kubernetes® is a registered trademark of The Linux Foundation.