In progress: this page is being written.

Runtime backends

LLMKube's RuntimeBackend interface lets a single operator drive multiple inference engines. This page explains how runtimes plug into the operator and what each built-in backend covers.

What this page will cover

  • The RuntimeBackend Go interface: ContainerName, DefaultImage, BuildArgs, BuildProbes.
  • Behavior of the built-in backends: llamacpp, vllm, tgi, personaplex, generic.
  • Where the metal-agent path differs (oMLX, Ollama) and why those bypass the in-cluster backend.
  • Adding a new backend: registering it in the runtime selector, wiring health probes, and adding optional CRD config blocks.
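Until the full page lands, here is a minimal sketch of what the interface could look like. The method names come from the list above; the signatures, the `Probe` type, and every concrete value in the `llamacpp` implementation (image, port, flags) are assumptions for illustration, not LLMKube's actual code:

```go
package main

import "fmt"

// Probe holds the fields the operator would copy into a corev1.Probe;
// this concrete type is an assumption for illustration.
type Probe struct {
	Path string // HTTP path polled by the kubelet
	Port int    // container port serving the health endpoint
}

// RuntimeBackend is the plug-in point: one implementation per engine.
// The method names come from this page; the signatures are assumed.
type RuntimeBackend interface {
	ContainerName() string           // name given to the inference container
	DefaultImage() string            // image used when the resource omits one
	BuildArgs(model string) []string // engine-specific CLI args for a model
	BuildProbes() (liveness, readiness Probe)
}

// llamaCppBackend is a hypothetical llamacpp implementation; the image,
// port, and flags below are illustrative, not LLMKube's real defaults.
type llamaCppBackend struct{}

func (llamaCppBackend) ContainerName() string { return "llamacpp" }
func (llamaCppBackend) DefaultImage() string  { return "ghcr.io/ggml-org/llama.cpp:server" }
func (llamaCppBackend) BuildArgs(model string) []string {
	return []string{"--model", model, "--host", "0.0.0.0", "--port", "8080"}
}
func (llamaCppBackend) BuildProbes() (Probe, Probe) {
	p := Probe{Path: "/health", Port: 8080}
	return p, p // same endpoint reused for liveness and readiness
}

func main() {
	var b RuntimeBackend = llamaCppBackend{}
	fmt.Println(b.ContainerName(), b.BuildArgs("/models/model.gguf"))
}
```

Keeping probe construction on the backend lets each engine report health on its own path and port while the operator stays engine-agnostic.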
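The runtime selector mentioned above can be sketched as a registry keyed by the runtime name from the deployment spec. The keys match the built-in backends this page lists; the struct contents and the fall-back-to-`generic` behavior are design assumptions, not documented LLMKube behavior:

```go
package main

import "fmt"

// backendInfo stands in for a full RuntimeBackend implementation; only
// the container name is modeled so the selection logic stays in focus.
type backendInfo struct {
	ContainerName string
}

// registry maps the runtime value from the deployment spec to a backend.
// Registering a new backend means adding one entry here.
var registry = map[string]backendInfo{
	"llamacpp":    {ContainerName: "llamacpp"},
	"vllm":        {ContainerName: "vllm"},
	"tgi":         {ContainerName: "tgi"},
	"personaplex": {ContainerName: "personaplex"},
	"generic":     {ContainerName: "generic"},
}

// selectBackend resolves a runtime name, falling back to the generic
// backend for anything unregistered (an assumption for this sketch).
func selectBackend(runtime string) backendInfo {
	if b, ok := registry[runtime]; ok {
		return b
	}
	return registry["generic"]
}

func main() {
	fmt.Println(selectBackend("vllm").ContainerName)
	fmt.Println(selectBackend("something-else").ContainerName)
}
```

Note that the metal-agent runtimes (oMLX, Ollama) would not appear in a registry like this, since they bypass the in-cluster backend path entirely.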
LLMKube

Kubernetes for Local LLMs. Deploy, manage, and scale AI inference workloads with production-grade orchestration.

© 2026 Defilan Technologies LLC


LLMKube is not affiliated with or endorsed by the Cloud Native Computing Foundation or the Kubernetes project. Kubernetes® is a registered trademark of The Linux Foundation.