In progress: this page is still being written.
Runtime backends
LLMKube's RuntimeBackend interface lets one operator drive multiple inference engines. This page explains how runtimes plug in and what each existing backend covers.
What this page will cover
- The RuntimeBackend Go interface: ContainerName, DefaultImage, BuildArgs, BuildProbes.
- Behavior of the built-in backends: llamacpp, vllm, tgi, personaplex, generic.
- Where the metal-agent path differs (oMLX, Ollama) and why those bypass the in-cluster backend.
- Adding a new backend: registering it in the runtime selector, wiring health probes, optional CRD config blocks.
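Until the full reference is written, the outline above can be pictured with a minimal Go sketch. Only the four method names (ContainerName, DefaultImage, BuildArgs, BuildProbes) come from this page. The method signatures, the `ModelSpec` and `Probe` types, the `Register`/`Select` helpers, and the `exampleBackend` with its image name and flags are all hypothetical placeholders, not LLMKube's actual API.

```go
package main

import (
	"fmt"
	"sort"
)

// ModelSpec is a hypothetical stand-in for the CRD fields a backend reads.
type ModelSpec struct {
	ModelPath string
	Port      int
}

// Probe is a simplified, hypothetical HTTP health probe.
type Probe struct {
	Path string
	Port int
}

// RuntimeBackend uses the four method names listed above.
// The signatures are assumptions made for this sketch.
type RuntimeBackend interface {
	ContainerName() string
	DefaultImage() string
	BuildArgs(spec ModelSpec) []string
	BuildProbes(spec ModelSpec) (liveness, readiness Probe)
}

// exampleBackend is a made-up backend, not one of the built-in ones.
type exampleBackend struct{}

func (exampleBackend) ContainerName() string { return "example-server" }
func (exampleBackend) DefaultImage() string  { return "example.invalid/inference:latest" }

// BuildArgs turns the spec into command-line flags (flag names are invented).
func (exampleBackend) BuildArgs(spec ModelSpec) []string {
	return []string{"--model", spec.ModelPath, "--port", fmt.Sprint(spec.Port)}
}

// BuildProbes reuses one HTTP check for both liveness and readiness.
func (exampleBackend) BuildProbes(spec ModelSpec) (Probe, Probe) {
	p := Probe{Path: "/health", Port: spec.Port}
	return p, p
}

// registry is a hypothetical runtime selector keyed by runtime name.
var registry = map[string]RuntimeBackend{}

// Register adds a backend under the given runtime name.
func Register(name string, b RuntimeBackend) { registry[name] = b }

// Select looks up a backend and reports the known names on failure.
func Select(name string) (RuntimeBackend, error) {
	b, ok := registry[name]
	if !ok {
		names := make([]string, 0, len(registry))
		for n := range registry {
			names = append(names, n)
		}
		sort.Strings(names)
		return nil, fmt.Errorf("unknown runtime %q (registered: %v)", name, names)
	}
	return b, nil
}

func main() {
	Register("example", exampleBackend{})
	b, err := Select("example")
	if err != nil {
		panic(err)
	}
	spec := ModelSpec{ModelPath: "/models/demo.gguf", Port: 8080}
	fmt.Println(b.ContainerName(), b.DefaultImage(), b.BuildArgs(spec))
}
```

In this sketch, adding a backend means implementing the interface and calling `Register` once. The real selector, probe types, and CRD wiring may be shaped differently.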