Getting Started

Deploy GPU-accelerated LLMs on Kubernetes in 5 minutes

5-Minute Quick Start

Try LLMKube locally on your laptop with Minikube - no cloud account required! Perfect for testing and development.

Prerequisites

  • Kubernetes cluster - Minikube, kind, GKE, EKS, or AKS (v1.11.3+)
  • kubectl installed and configured
  • Helm 3.0+ (for Helm installation method)
  • Cluster admin permissions (to install CRDs)

For local testing: We recommend Minikube with at least 4 CPUs and 8 GB of RAM; a quick start-and-verify sequence is shown below.
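
For example, assuming Minikube is installed locally:

# Start a local cluster sized for small models, then confirm the node is Ready
minikube start --cpus 4 --memory 8192
kubectl get nodes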

1. Install the LLMKube CLI

The llmkube CLI makes deployment simple - just one command to deploy any model.

Quick Install (macOS/Linux, recommended)

curl -sSL https://raw.githubusercontent.com/defilantech/LLMKube/main/install.sh | bash

The script detects your OS and architecture and downloads the latest release binary.

macOS via Homebrew

brew tap defilantech/tap
brew install llmkube

Windows Installation

Download the Windows binary from the latest release page.

Extract and add to your PATH.

Verify the installation:

llmkube version

2. Install LLMKube Operator

Install the LLMKube operator to your cluster using Helm (recommended) or Kustomize.

Option 1: Helm Chart (recommended)

# Add the Helm repository
helm repo add llmkube https://defilantech.github.io/LLMKube
helm repo update

# Install LLMKube
helm install llmkube llmkube/llmkube \
  --namespace llmkube-system \
  --create-namespace

# Verify installation
kubectl get pods -n llmkube-system

Option 2: Kustomize

# Clone and install (ensures correct image tags)
git clone https://github.com/defilantech/LLMKube.git
cd LLMKube
kubectl apply -k config/default

# Verify installation
kubectl get pods -n llmkube-system

Wait for ready: The operator should be running in the llmkube-system namespace. Wait for all pods to be in Running state before proceeding.
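
One way to wait without assuming a specific Deployment name is to target every Deployment in the namespace:

# List the operator's workloads, then block until they report Available
kubectl get deploy -n llmkube-system
kubectl wait --for=condition=Available --timeout=180s -n llmkube-system deployment --all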

3. Deploy Your First Model

Choose from the pre-configured model catalog or deploy any GGUF model from HuggingFace.

Deploy from Model Catalog (easiest)

Browse 10+ pre-configured popular models (Llama, Mistral, Qwen, DeepSeek, Phi-3, and more):

# Browse available models
llmkube catalog list

# Get details about a specific model
llmkube catalog info phi-3-mini

# Deploy with one command!
llmkube deploy phi-3-mini --cpu 500m --memory 1Gi

Deploy Custom Model

Deploy any GGUF model from HuggingFace:

llmkube deploy tinyllama \
  --source https://huggingface.co/TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF/resolve/main/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf \
  --cpu 500m \
  --memory 1Gi

Monitor Deployment

# Check deployment status
llmkube status phi-3-mini

# Or use kubectl
kubectl wait --for=condition=available --timeout=300s inferenceservice/phi-3-mini

What happens: LLMKube downloads the model (~600MB-3GB), creates a Deployment with an init container for model loading, and exposes an OpenAI-compatible API endpoint.
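
To see what the operator created, you can inspect the custom resource and the workloads behind it (the names below assume the phi-3-mini example from above):

# Inspect the InferenceService and its status
kubectl get inferenceservice phi-3-mini -o yaml

# The Deployment, Service, and pods created for it
kubectl get deployment,service,pods | grep phi-3-mini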

4. Test the API

Port-forward to the service and test the OpenAI-compatible endpoint:

# Port forward to local machine
kubectl port-forward svc/phi-3-mini 8080:8080

In another terminal, send a test request:

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {
        "role": "user",
        "content": "Explain Kubernetes in one sentence"
      }
    ],
    "max_tokens": 50
  }'

Success! You should receive a JSON response with the model's completion. The API is fully OpenAI-compatible, so you can use existing SDKs and tools.
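
If you have jq installed, you can pull out just the generated text; the response body follows the standard OpenAI chat-completions schema:

curl -s http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Say hello in five words"}], "max_tokens": 30}' \
  | jq -r '.choices[0].message.content'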

5. Use with OpenAI SDK

LLMKube is a drop-in replacement for the OpenAI API. Use any OpenAI SDK or library:

from openai import OpenAI

# Point to your LLMKube service
client = OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="not-needed"  # LLMKube doesn't require API keys
)

# Use exactly like OpenAI API
response = client.chat.completions.create(
    model="phi-3-mini",
    messages=[
        {"role": "user", "content": "What is Kubernetes?"}
    ]
)

print(response.choices[0].message.content)

Works with: LangChain, LlamaIndex, OpenAI SDKs (Python, Node.js, Go), and any tool that supports the OpenAI API format.
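
Many clients can also be pointed at LLMKube through environment variables instead of code changes; the official OpenAI Python SDK (v1+) reads the two below, though variable names vary by tool:

export OPENAI_BASE_URL=http://localhost:8080/v1
export OPENAI_API_KEY=not-needed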

🚀 GPU-Accelerated Deployment

Get 17x faster inference with GPU acceleration on GKE, EKS, or any cluster with NVIDIA GPUs:

  • CPU baseline: ~18 tok/s
  • NVIDIA L4 GPU: ~64 tok/s

# Deploy with GPU acceleration
llmkube deploy llama-3.1-8b --gpu --gpu-count 1

# Or from catalog
llmkube catalog info llama-3.1-8b  # See GPU requirements
llmkube deploy llama-3.1-8b --gpu
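
Once the pod is scheduled onto a GPU node, you can confirm the device is visible from inside the serving container. This assumes the container is named llama-server (as in the troubleshooting section below) and that the image ships nvidia-smi:

# Find the pod, then check the GPU from inside the container
kubectl get pods
kubectl exec <pod-name> -c llama-server -- nvidia-smi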

Troubleshooting

Model stuck in "Downloading" state

Check the init container logs to see download progress:

kubectl logs <pod-name> -c model-downloader

Ensure your cluster has internet access or the model is available via the configured source URL. Large models can take several minutes to download.
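
Pod events can also reveal scheduling or image-pull problems that block the download:

kubectl describe pod <pod-name>
kubectl get events --sort-by=.lastTimestamp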

Pod crashes with OOMKilled

Increase memory allocation for the deployment:

llmkube deploy <model> --memory 4Gi

Rule of thumb: Model memory should be at least 1.2x the GGUF file size.
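
As a worked example (the file size here is purely illustrative): a 2.5 GB GGUF file needs roughly 1.2 x 2.5 GB = 3 GB, so request at least 3Gi:

# ~2.5 GB model file x 1.2 headroom ~ 3 GB -> request 3Gi
llmkube deploy <model> --memory 3Gi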

GPU not detected

Verify the NVIDIA GPU operator is running:

kubectl get pods -n gpu-operator-resources

Check that GPU nodes are labeled correctly:

kubectl get nodes -l cloud.google.com/gke-accelerator
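
The label above is GKE-specific. A more generic check is whether the node advertises GPU capacity at all, assuming the standard NVIDIA device plugin is installed:

# Look for a non-zero nvidia.com/gpu count under Allocatable
kubectl describe node <gpu-node-name> | grep -A 10 "Allocatable"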

API requests timing out

Check if the service pod is running:

kubectl get pods -l app=<model-name>

View server logs for errors:

kubectl logs <pod-name> -c llama-server

For larger models or complex prompts, you may need to increase resource allocations or adjust timeout settings.
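
If the pod is healthy but simply slow, redeploying with more CPU and memory often helps; the flags below are the same ones used earlier in this guide, with illustrative values:

llmkube deploy <model> --cpu 2000m --memory 4Gi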

For more detailed troubleshooting, see the Minikube Quickstart Guide →