In progress: this page is being written.

Runtime backends

LLMKube's RuntimeBackend interface lets a single operator drive multiple inference engines. This page explains how runtimes plug into the operator and what each built-in backend covers.

What this page will cover

  • The RuntimeBackend Go interface: ContainerName, DefaultImage, BuildArgs, BuildProbes.
  • Behavior of the built-in backends: llamacpp, vllm, tgi, personaplex, generic.
  • Where the metal-agent path differs (oMLX, Ollama) and why those bypass the in-cluster backend.
  • Adding a new backend: registering it in the runtime selector, wiring health probes, and adding optional CRD config blocks.
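Until the full page lands, here is a minimal sketch of what the interface could look like. The method names come from the list above; the signatures, the `Probe` type, and every concrete value in the `llamacpp` implementation (image, port, flags) are assumptions for illustration, not LLMKube's actual code:

```go
package main

import "fmt"

// Probe holds the fields the operator would copy into a corev1.Probe;
// this concrete type is an assumption for illustration.
type Probe struct {
	Path string // HTTP path polled by the kubelet
	Port int    // container port serving the health endpoint
}

// RuntimeBackend is the plug-in point: one implementation per engine.
// The method names come from this page; the signatures are assumed.
type RuntimeBackend interface {
	ContainerName() string           // name given to the inference container
	DefaultImage() string            // image used when the resource omits one
	BuildArgs(model string) []string // engine-specific CLI args for a model
	BuildProbes() (liveness, readiness Probe)
}

// llamaCppBackend is a hypothetical llamacpp implementation; the image,
// port, and flags below are illustrative, not LLMKube's real defaults.
type llamaCppBackend struct{}

func (llamaCppBackend) ContainerName() string { return "llamacpp" }
func (llamaCppBackend) DefaultImage() string  { return "ghcr.io/ggml-org/llama.cpp:server" }
func (llamaCppBackend) BuildArgs(model string) []string {
	return []string{"--model", model, "--host", "0.0.0.0", "--port", "8080"}
}
func (llamaCppBackend) BuildProbes() (Probe, Probe) {
	p := Probe{Path: "/health", Port: 8080}
	return p, p // same endpoint reused for liveness and readiness
}

func main() {
	var b RuntimeBackend = llamaCppBackend{}
	fmt.Println(b.ContainerName(), b.BuildArgs("/models/model.gguf"))
}
```

Keeping probe construction on the backend lets each engine report health on its own path and port while the operator stays engine-agnostic.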
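The runtime selector mentioned above can be sketched as a registry keyed by the runtime name from the deployment spec. The keys match the built-in backends this page lists; the struct contents and the fall-back-to-`generic` behavior are design assumptions, not documented LLMKube behavior:

```go
package main

import "fmt"

// backendInfo stands in for a full RuntimeBackend implementation; only
// the container name is modeled so the selection logic stays in focus.
type backendInfo struct {
	ContainerName string
}

// registry maps the runtime value from the deployment spec to a backend.
// Registering a new backend means adding one entry here.
var registry = map[string]backendInfo{
	"llamacpp":    {ContainerName: "llamacpp"},
	"vllm":        {ContainerName: "vllm"},
	"tgi":         {ContainerName: "tgi"},
	"personaplex": {ContainerName: "personaplex"},
	"generic":     {ContainerName: "generic"},
}

// selectBackend resolves a runtime name, falling back to the generic
// backend for anything unregistered (an assumption for this sketch).
func selectBackend(runtime string) backendInfo {
	if b, ok := registry[runtime]; ok {
		return b
	}
	return registry["generic"]
}

func main() {
	fmt.Println(selectBackend("vllm").ContainerName)
	fmt.Println(selectBackend("something-else").ContainerName)
}
```

Note that the metal-agent runtimes (oMLX, Ollama) would not appear in a registry like this, since they bypass the in-cluster backend path entirely.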
LLMKube

Kubernetes for Local LLMs. Deploy, manage, and scale AI inference workloads with production-grade orchestration.

© 2026 Defilan Technologies LLC


LLMKube is not affiliated with or endorsed by the Cloud Native Computing Foundation or the Kubernetes project. Kubernetes® is a registered trademark of The Linux Foundation.