In progress: this page is being written.

Observability

Wire LLMKube into your existing Prometheus + Grafana stack. The operator ships a PodMonitor for inference pods and exposes ten custom metrics from the controller itself.

What this page will cover

  • Enabling the bundled PodMonitor under monitoring.podMonitor.enabled in values.yaml (a values sketch follows this list).
  • Switching to a ServiceMonitor on clusters that prefer Service-based scrape targets (see the manifest sketch below).
  • Controller-side metrics: reconcile timing, model download duration, and InferenceService phase counts (see the PromQL example below).
  • Pod-side metrics from llama.cpp's /metrics endpoint, always enabled via the --metrics flag (see the spot-check below).
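A minimal sketch of the values change, assuming the nested layout implied by the monitoring.podMonitor.enabled key; the PodMonitor CRD must already exist in the cluster (it ships with the Prometheus Operator / kube-prometheus-stack):

    # values.yaml — turn on the bundled PodMonitor.
    # Key path taken from this page; see the chart's values.yaml on GitHub
    # for the authoritative defaults.
    monitoring:
      podMonitor:
        enabled: true

Apply it with an ordinary helm upgrade -f values.yaml against your existing release; once the PodMonitor exists, the Prometheus Operator should pick up the new scrape targets without restarting anything.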
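Where Service-based scraping is preferred, a hand-written ServiceMonitor against the inference Service is one option. The selector labels and port name below are assumptions, not LLMKube's actual labels — inspect the Service the operator creates before copying this:

    apiVersion: monitoring.coreos.com/v1
    kind: ServiceMonitor
    metadata:
      name: llmkube-inference
    spec:
      namespaceSelector:
        any: true
      selector:
        matchLabels:
          app.kubernetes.io/managed-by: llmkube   # assumed label; verify on the Service
      endpoints:
        - port: http        # assumed port name on the inference Service
          path: /metrics
          interval: 30s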
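The reconcile-timing histogram comes from controller-runtime's built-in instrumentation, so the first query below works against any controller-runtime operator; the download-duration line is a hypothetical stand-in until the custom metric names are documented here:

    # p95 reconcile latency per controller (built-in controller-runtime histogram)
    histogram_quantile(0.95,
      sum by (le, controller) (
        rate(controller_runtime_reconcile_time_seconds_bucket[5m])))

    # Hypothetical name — swap in the documented download-duration metric:
    # histogram_quantile(0.95,
    #   sum by (le) (rate(llmkube_model_download_duration_seconds_bucket[5m])))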
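To spot-check the pod-side endpoint before wiring up dashboards, port-forward to an inference pod and read /metrics directly (the pod name and port are placeholders; 8080 is llama.cpp's server default):

    kubectl port-forward pod/<inference-pod> 8080:8080 &
    curl -s http://localhost:8080/metrics | head -n 20

The exact metric set depends on the llama.cpp build baked into the serving image.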
View Helm values on GitHub