In progress: this page is being written.

Model cache

LLMKube creates a per-namespace PVC and downloads each model once. Recreating an InferenceService skips the download entirely.
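For orientation, the auto-provisioned cache PVC has roughly the following shape. This is a hypothetical sketch, not the operator's actual manifest: the PVC name `llmkube-model-cache` comes from this page, but the namespace, access mode, and requested size shown here are assumptions.

```yaml
# Hypothetical sketch of the per-namespace cache PVC that LLMKube
# provisions on first use. Field values below are assumptions.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: llmkube-model-cache
  namespace: my-namespace      # assumption: one PVC per namespace
spec:
  accessModes:
    - ReadWriteOnce            # assumption: single-node attach
  resources:
    requests:
      storage: 50Gi            # assumption: room for several models
```

Because the PVC is namespaced, two InferenceServices in the same namespace share one cache, while workloads in different namespaces each get their own.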

What this page will cover

  • How the per-namespace llmkube-model-cache PVC is provisioned automatically on first use.
  • The cache key (SHA256 of the source URL) and how the operator dedupes identical sources.
  • Sizing the cache for multi-model workloads, and when to switch to a shared StorageClass.
  • Manually pre-staging models onto the PVC for faster cold starts.
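The dedup rule described above (cache key = SHA256 of the source URL) can be sketched in a few lines of Python. This is an illustration under stated assumptions: it assumes the key is the hex digest of the raw URL string, with no normalization applied first; the operator's exact keying logic may differ.

```python
import hashlib


def cache_key(source_url: str) -> str:
    """Sketch of the cache key: SHA-256 hex digest of the source URL.

    Two InferenceServices that reference the same URL produce the same
    key, so the model is downloaded once and shared from the cache.
    """
    return hashlib.sha256(source_url.encode("utf-8")).hexdigest()


# Hypothetical model URL, for illustration only.
url = "https://example.com/models/llama-3-8b-q4.gguf"
print(cache_key(url))  # 64-char hex string; identical URLs dedupe
```

A consequence worth noting: because the key is derived from the URL, the same file served from two different URLs would be cached twice, while two services pointing at one URL always share a single entry.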


© 2026 Defilan Technologies LLC
