In progress: this page is being written.
Model cache
LLMKube provisions one PersistentVolumeClaim (PVC) per namespace and downloads each model into it once; recreating an InferenceService reuses the cached copy instead of downloading again.
What this page will cover
- How the per-namespace `llmkube-model-cache` PVC is provisioned automatically on first use.
- The cache key (SHA-256 of the source URL) and how the operator deduplicates identical sources.
- Sizing the cache for multi-model workloads, and when to switch to a shared StorageClass.
- Manually pre-staging models onto the PVC for faster cold starts.
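The cache key described above can be sketched as the hex-encoded SHA-256 of the model's source URL, so two InferenceServices pointing at the same URL resolve to the same cache entry. The exact encoding LLMKube uses is an assumption here:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// cacheKey derives a deterministic cache key from a model's source URL.
// Hex-encoded SHA-256 is assumed; the operator's actual encoding may differ.
func cacheKey(sourceURL string) string {
	sum := sha256.Sum256([]byte(sourceURL))
	return hex.EncodeToString(sum[:])
}

func main() {
	a := cacheKey("https://example.com/llama.gguf")
	b := cacheKey("https://example.com/llama.gguf")
	c := cacheKey("https://example.com/qwen.gguf")
	fmt.Println(len(a), a == b, a == c) // prints "64 true false"
}
```

Because the key depends only on the URL, renaming an InferenceService never invalidates the cache, while changing the source URL always produces a fresh entry.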