In progress: this page is being written.

CLI reference

The llmkube CLI is a thin client over the Kubernetes API: it knows how to construct Model and InferenceService manifests from a small set of flags.
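As a rough illustration of what the client generates, a deploy call might expand into manifests of the following shape. This is a sketch only: the API group, version, and every field name below are assumptions for illustration, not the actual CRD schema.

    # Hypothetical manifests; real field names may differ.
    apiVersion: llmkube.dev/v1alpha1    # assumed API group/version
    kind: Model
    metadata:
      name: my-model                    # placeholder name
    spec:
      sourceURL: https://example.com/model.Q4_K_M.gguf  # assumed field for a custom GGUF source
    ---
    apiVersion: llmkube.dev/v1alpha1    # assumed API group/version
    kind: InferenceService
    metadata:
      name: my-model
    spec:
      modelRef: my-model                # assumed reference to the Model above
      replicas: 1                       # assumed replica-count field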

What this page will cover

  • llmkube deploy: deploy a catalog model or a custom GGUF source URL.
  • llmkube catalog list / info: browse the bundled model catalog.
  • llmkube status: check phase, replicas, and endpoint for a deployed InferenceService.
  • llmkube version: print client and server build info.
Until this page is complete, the CLI section of the project README is the authoritative reference. Example invocations are sketched below.
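A minimal sketch of how the commands listed above might be invoked. The subcommand names come from the list; the positional model-name argument and the --source flag are illustrative assumptions, not confirmed syntax.

    # Browse the bundled model catalog.
    llmkube catalog list
    llmkube catalog info my-model       # placeholder model name

    # Deploy a catalog model, or a custom GGUF source URL
    # (--source is an assumed flag, shown for illustration).
    llmkube deploy my-model
    llmkube deploy my-model --source https://example.com/model.gguf

    # Check phase, replicas, and endpoint for a deployed InferenceService.
    llmkube status my-model

    # Print client and server build info.
    llmkube version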