In progress: this page is being written.

Observability

Wire LLMKube into your existing Prometheus + Grafana stack. The operator ships a PodMonitor for inference pods and exposes ten custom metrics from the controller itself.

What this page will cover

  • Enabling the bundled PodMonitor under monitoring.podMonitor.enabled in values.yaml (a values sketch follows this list).
  • Switching to a ServiceMonitor on clusters that prefer Service-based scrape targets (see the manifest sketch below).
  • Controller-side metrics: reconcile timing, model download duration, and InferenceService phase counts (see the PromQL example below).
  • Pod-side metrics from llama.cpp's /metrics endpoint, always enabled via the --metrics flag (see the spot-check below).
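A minimal sketch of the values change, assuming the nested layout implied by the monitoring.podMonitor.enabled key; the PodMonitor CRD must already exist in the cluster (it ships with the Prometheus Operator / kube-prometheus-stack):

    # values.yaml — turn on the bundled PodMonitor.
    # Key path taken from this page; see the chart's values.yaml on GitHub
    # for the authoritative defaults.
    monitoring:
      podMonitor:
        enabled: true

Apply it with an ordinary helm upgrade -f values.yaml against your existing release; once the PodMonitor exists, the Prometheus Operator should pick up the new scrape targets without restarting anything.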
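Where Service-based scraping is preferred, a hand-written ServiceMonitor against the inference Service is one option. The selector labels and port name below are assumptions, not LLMKube's actual labels — inspect the Service the operator creates before copying this:

    apiVersion: monitoring.coreos.com/v1
    kind: ServiceMonitor
    metadata:
      name: llmkube-inference
    spec:
      namespaceSelector:
        any: true
      selector:
        matchLabels:
          app.kubernetes.io/managed-by: llmkube   # assumed label; verify on the Service
      endpoints:
        - port: http        # assumed port name on the inference Service
          path: /metrics
          interval: 30s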
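The reconcile-timing histogram comes from controller-runtime's built-in instrumentation, so the first query below works against any controller-runtime operator; the download-duration line is a hypothetical stand-in until the custom metric names are documented here:

    # p95 reconcile latency per controller (built-in controller-runtime histogram)
    histogram_quantile(0.95,
      sum by (le, controller) (
        rate(controller_runtime_reconcile_time_seconds_bucket[5m])))

    # Hypothetical name — swap in the documented download-duration metric:
    # histogram_quantile(0.95,
    #   sum by (le) (rate(llmkube_model_download_duration_seconds_bucket[5m])))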
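To spot-check the pod-side endpoint before wiring up dashboards, port-forward to an inference pod and read /metrics directly (the pod name and port are placeholders; 8080 is llama.cpp's server default):

    kubectl port-forward pod/<inference-pod> 8080:8080 &
    curl -s http://localhost:8080/metrics | head -n 20

The exact metric set depends on the llama.cpp build baked into the serving image.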
View Helm values on GitHub