v0.5.1 Open Source · Kubernetes Native
Run production LLMs
on your own hardware
We analyzed 200 GitHub issues with a 14B model on two $400 GPUs. Total cost: one cent. LLMKube makes self-hosted inference actually work.
See it in action
Deploy GPU-accelerated LLMs in seconds with the llmkube CLI
terminal
$ llmkube catalog list
LLMKube Model Catalog (v1.0)
═══════════════════════════════════════════════════════════════
ID                 NAME                   SIZE  QUANT   VRAM
──                 ────                   ────  ─────   ────
llama-3.1-8b       Llama 3.1 8B Instruct  8B    Q5_K_M  5-8GB
qwen-2.5-coder-7b  Qwen 2.5 Coder 7B      7B    Q5_K_M  5-8GB
mistral-7b         Mistral 7B Instruct    7B    Q5_K_M  5-8GB
phi-3-mini         Phi-3 Mini (3.8B)      3.8B  Q5_K_M  2-4GB
💡 To deploy: llmkube deploy <MODEL_ID> --gpu
Step 1/4: Browse available models
Why LLMKube?
Local LLMs are great for prototyping. Scaling them for a team is where it gets hard.
The scaling problem
- Silent failures with no alerts
- Multi-GPU memory math by trial and error
- Updates that break your setup
- Docker Compose that doesn't scale
- One person managing everything
- Every machine set up by hand
With LLMKube
- Health checks that actually tell you when things break
- GPU layer offloading with automatic configuration
- Helm-pinned versions that don't break on update
- Infrastructure as code, not scripts and duct tape
- Your whole team can deploy and manage
- Prometheus + Grafana integration for GPU monitoring
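To make the memory math concrete, here is a back-of-the-envelope sketch in Python. It is not LLMKube's own logic, only an illustration of the estimate that otherwise gets done by trial and error. The figures are assumptions: roughly 5.7 bits per weight for Q5_K_M, 32 transformer layers for an 8B model, equally sized layers, and a hypothetical 1.5 GB reserve for the KV cache and runtime overhead.

```python
def estimate_weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GB (1 GB = 1e9 bytes)."""
    # params_billion * 1e9 weights * bits_per_weight / 8 bytes, divided by 1e9
    return params_billion * bits_per_weight / 8


def layers_that_fit(weights_gb: float, n_layers: int, vram_gb: float,
                    reserve_gb: float = 1.5) -> int:
    """Number of layers that fit in VRAM, treating all layers as equal in size.

    reserve_gb is an assumed allowance for KV cache and runtime overhead.
    """
    per_layer_gb = weights_gb / n_layers
    usable_gb = max(vram_gb - reserve_gb, 0.0)
    return min(n_layers, int(usable_gb / per_layer_gb))


if __name__ == "__main__":
    # Assumed figures for an 8B model at Q5_K_M (about 5.7 bits per weight).
    weights = estimate_weights_gb(8, 5.7)
    for vram in (4.0, 8.0):
        fit = layers_that_fit(weights, n_layers=32, vram_gb=vram)
        print(f"{vram:.0f} GB VRAM: ~{weights:.1f} GB of weights, {fit}/32 layers on GPU")
```

Under these assumptions an 8 GB card holds every layer, while a 4 GB card holds fewer than half and the rest stay on the CPU. Real numbers shift with context length and the model's actual layer sizes, so treat the result as a starting point rather than a guarantee.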
Ollama for dev. vLLM for speed. LLMKube for Kubernetes.
The platform layer your inference engine is missing.
Deploy an LLM in seconds
Simple, declarative YAML that feels native to Kubernetes developers
apiVersion: inference.llmkube.dev/v1alpha1
kind: Model
metadata:
  name: phi-3-mini
spec:
  source: https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-gguf/resolve/main/Phi-3-mini-4k-instruct-q4.gguf
  format: gguf
  quantization: Q4_K_M
  hardware:
    accelerator: cuda
    gpu:
      enabled: true
      count: 1
  resources:
    cpu: "2"
    memory: "4Gi"
Supports GGUF models from Hugging Face, with automatic download and caching
Limited to 10 Teams
Early Adopter Program
Help shape the future of LLMKube and get direct access to the maintainer.
What you get
- Private Discord with other early adopters
- Direct input on the roadmap
- Your logo on our website (when ready)
- Early access to new features
What we need
- Real-world feedback on your use case
- 30 minutes monthly for a feedback call
- Permission to share your story (anonymized if needed)
Apply to join
Ready to deploy your first LLM?
Join the community of developers deploying LLMs on Kubernetes.
Open source and free forever