Getting Started

Deploy your first LLM with LLMKube in under 5 minutes

Prerequisites

  • Kubernetes cluster (v1.11.3+) - can be minikube, kind, GKE, EKS, or AKS
  • kubectl installed and configured
  • Cluster admin permissions (to install CRDs)
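
You can quickly confirm the last two prerequisites from your terminal. This is a minimal check that assumes standard RBAC; a "yes" answer means your account is allowed to install CRDs:

# Confirm kubectl can reach the cluster
kubectl cluster-info

# Confirm you have permission to install CRDs (should print "yes")
kubectl auth can-i create customresourcedefinitions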

1. Install LLMKube

Install the Custom Resource Definitions (CRDs) and deploy the LLMKube operator:

# Install CRDs and deploy the operator
kubectl apply -f https://github.com/defilantech/llmkube/releases/latest/download/install.yaml

# Verify installation
kubectl get pods -n llmkube-system

Note: The operator should be running in the llmkube-system namespace. Wait for all pods to be in the Running state before proceeding.
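
Instead of polling, you can have kubectl block until the controller pod reports Ready. The label below matches the one used in the Troubleshooting section; adjust it if your installation labels pods differently:

kubectl wait --for=condition=Ready pod \
  -l app=llmkube-controller -n llmkube-system --timeout=120s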

2. Deploy Your First Model

Create a Model resource that specifies which LLM to use:

cat <<EOF | kubectl apply -f -
apiVersion: inference.llmkube.dev/v1alpha1
kind: Model
metadata:
  name: phi-3-mini
spec:
  source: https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-gguf/resolve/main/Phi-3-mini-4k-instruct-q4.gguf
  format: gguf
  quantization: Q4_K_M
  hardware:
    accelerator: cpu
  resources:
    cpu: "2"
    memory: "4Gi"
EOF

Check the model download status:

kubectl get models
kubectl describe model phi-3-mini
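
The download can take several minutes depending on model size and bandwidth. You can watch the resource until it leaves the Downloading state (the exact status columns depend on the CRD, so fall back to describe if they are unclear):

# Watch the Model resource until the download finishes (Ctrl+C to stop)
kubectl get models phi-3-mini -w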

3. Create an Inference Service

Deploy an InferenceService that serves the model via an OpenAI-compatible API:

cat <<EOF | kubectl apply -f -
apiVersion: inference.llmkube.dev/v1alpha1
kind: InferenceService
metadata:
  name: phi-3-inference
spec:
  modelRef: phi-3-mini
  replicas: 1
  endpoint:
    port: 8080
    path: /v1/chat/completions
    type: ClusterIP
  resources:
    cpu: "2"
    memory: "4Gi"
EOF

Verify the service is running:

kubectl get inferenceservices
kubectl get pods -l app=phi-3-inference
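
The operator also exposes the endpoint through a Kubernetes Service. Assuming it is named after the InferenceService (the port-forward in the next step relies on this), you can confirm it has ready endpoints before testing:

kubectl get svc phi-3-inference
kubectl get endpoints phi-3-inference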

4. Test the API

Port-forward to the service and test the OpenAI-compatible endpoint:

# Port forward to local machine
kubectl port-forward svc/phi-3-inference 8080:8080

In another terminal, send a test request:

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "phi-3-mini",
    "messages": [
      {
        "role": "user",
        "content": "Explain Kubernetes in one sentence"
      }
    ]
  }'

Success! You should receive a JSON response with the model's completion. The API is fully OpenAI-compatible, so you can use existing SDKs and tools.
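
Because the response follows the standard OpenAI chat completions schema, you can extract just the generated text with jq (assuming jq is installed on your machine):

curl -s http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "phi-3-mini", "messages": [{"role": "user", "content": "Explain Kubernetes in one sentence"}]}' \
  | jq -r '.choices[0].message.content'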

Next Steps

Troubleshooting

Model stuck in "Downloading" state

Check the model controller logs:

kubectl logs -n llmkube-system -l app=llmkube-controller

Ensure your cluster has outbound internet access, or that the model is reachable from within the cluster at the configured source URL.
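
A quick way to check outbound connectivity from inside the cluster is to run a throwaway pod and probe the model's source host. This sketch assumes the busybox image is pullable in your environment:

kubectl run net-check --rm -it --restart=Never --image=busybox -- \
  sh -c 'wget -q --spider https://huggingface.co && echo "outbound connectivity OK"'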

InferenceService pods not starting

Check if the Model is ready:

kubectl get models

Inspect the pod events:

kubectl describe pod -l app=phi-3-inference
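
If the describe output is not conclusive, the recent events in the namespace usually reveal scheduling or image-pull problems:

kubectl get events --sort-by=.lastTimestamp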

API requests timing out

Ensure the pod is running and healthy:

kubectl get pods -l app=phi-3-inference

Check the inference server logs:

kubectl logs -l app=phi-3-inference

For larger models, increase resource allocations in the InferenceService spec.
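
As a rough illustration (the values below are hypothetical and depend on the model and quantization), you could raise the allocations in place with a merge patch, or edit the spec and re-apply it:

# Example only: illustrative allocations for a larger model
kubectl patch inferenceservice phi-3-inference --type merge \
  -p '{"spec":{"resources":{"cpu":"8","memory":"16Gi"}}}'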