Getting Started
Deploy your first LLM with LLMKube in under 5 minutes
Prerequisites
- Kubernetes cluster (v1.11.3+) - can be minikube, kind, GKE, EKS, or AKS (see the local example below)
- kubectl installed and configured
- Cluster admin permissions (to install CRDs)
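If you don't already have a cluster, a local one is enough for this guide. A minimal sketch using kind (assuming kind and Docker are installed; the cluster name is arbitrary):
# Create a small local cluster (the name is arbitrary)
kind create cluster --name llmkube-quickstart
# Point kubectl at it and confirm the node is Ready
kubectl cluster-info --context kind-llmkube-quickstart
kubectl get nodes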
1. Install LLMKube
Install the Custom Resource Definitions (CRDs) and deploy the LLMKube operator:
# Install CRDs
kubectl apply -f https://github.com/defilantech/llmkube/releases/latest/download/install.yaml
# Verify installation
kubectl get pods -n llmkube-system
Note: The operator should be running in the llmkube-system namespace. Wait for all pods to be in Running state before proceeding.
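To block until the operator is up instead of polling, one option (a sketch; it assumes the operator ships as standard Deployments in that namespace) is:
# Wait up to 2 minutes for all operator deployments to become Available
kubectl wait --for=condition=Available deployment --all -n llmkube-system --timeout=120s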
2. Deploy Your First Model
Create a Model resource that specifies which LLM to use:
cat <<EOF | kubectl apply -f -
apiVersion: inference.llmkube.dev/v1alpha1
kind: Model
metadata:
  name: phi-3-mini
spec:
  source: https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-gguf/resolve/main/Phi-3-mini-4k-instruct-q4.gguf
  format: gguf
  quantization: Q4_K_M
  hardware:
    accelerator: cpu
  resources:
    cpu: "2"
    memory: "4Gi"
EOF
Check the model download status:
kubectl get models
kubectl describe model phi-3-mini
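Downloading a multi-gigabyte GGUF file can take a few minutes. To watch progress rather than re-running the commands above, you can stream updates (a sketch; the exact status columns and conditions depend on the Model CRD):
# Watch the Model resource until its status reports the download as complete
kubectl get model phi-3-mini -w
# Or dump the raw status object for details
kubectl get model phi-3-mini -o jsonpath='{.status}'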
3. Create an Inference Service
Deploy an InferenceService that serves the model via an OpenAI-compatible API:
cat <<EOF | kubectl apply -f -
apiVersion: inference.llmkube.dev/v1alpha1
kind: InferenceService
metadata:
  name: phi-3-inference
spec:
  modelRef: phi-3-mini
  replicas: 1
  endpoint:
    port: 8080
    path: /v1/chat/completions
    type: ClusterIP
  resources:
    cpu: "2"
    memory: "4Gi"
EOF
Verify the service is running:
kubectl get inferenceservices
kubectl get pods -l app=phi-3-inference
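To wait until the serving pod is actually Ready before moving on (a sketch; it reuses the app=phi-3-inference label shown above and allows time for the model to load):
# Wait up to 5 minutes for the inference pod(s) to pass readiness checks
kubectl wait --for=condition=Ready pod -l app=phi-3-inference --timeout=300s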
4. Test the API
Port-forward to the service and test the OpenAI-compatible endpoint:
# Port forward to local machine
kubectl port-forward svc/phi-3-inference 8080:8080
In another terminal, send a test request:
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "phi-3-mini",
    "messages": [
      {
        "role": "user",
        "content": "Explain Kubernetes in one sentence"
      }
    ]
  }'
Success! You should receive a JSON response with the model's completion. The API is fully OpenAI-compatible, so you can use existing SDKs and tools.
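Because the endpoint follows the OpenAI API shape, other standard routes should behave the same way. For example, listing the models the server advertises (a sketch, assuming the server also exposes the standard /v1/models route):
# List available models via the OpenAI-compatible route
curl http://localhost:8080/v1/models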
Next Steps
Explore Examples
Check out more examples in the GitHub repository, including GPU deployments and advanced configurations.
Read Documentation
Dive deeper into CRD specifications, architecture, and advanced features in the full documentation.
Join the Community
Connect with other LLMKube users, ask questions, and share your experiences on GitHub Discussions.
Need Enterprise Support?
Get dedicated support, architecture reviews, and advanced features for production deployments.
Contact Sales →
Troubleshooting
Model stuck in "Downloading" state
Check the model controller logs:
kubectl logs -n llmkube-system -l app=llmkube-controller
Ensure your cluster has internet access (or the model is available via the configured source URL).
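A quick way to confirm outbound connectivity from inside the cluster is a one-off pod that fetches the model's source host (a sketch; curlimages/curl is just one convenient image):
# Launch a temporary pod, request only the response headers, then clean up
kubectl run connectivity-test --rm -i --restart=Never --image=curlimages/curl -- \
  curl -sI https://huggingface.co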
InferenceService pods not starting
Check if the Model is ready:
kubectl get models
Inspect the pod events:
kubectl describe pod -l app=phi-3-inference
API requests timing out
Ensure the pod is running and healthy:
kubectl get pods -l app=phi-3-inference
Check the inference server logs:
kubectl logs -l app=phi-3-inference
For larger models, increase resource allocations in the InferenceService spec.
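One way to raise the allocation without re-creating the resource is a merge patch against the spec from step 3 (a sketch; it assumes the same spec.resources layout shown above, and the values are only examples):
# Bump CPU and memory on the existing InferenceService
kubectl patch inferenceservice phi-3-inference --type merge \
  -p '{"spec":{"resources":{"cpu":"4","memory":"8Gi"}}}'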