In progress: this page is being written.

Model cache

LLMKube creates a per-namespace PVC and downloads each model once. Recreating an InferenceService skips the download entirely.
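For orientation, the auto-provisioned cache PVC has roughly the following shape. This is a hypothetical sketch, not the operator's actual manifest: the PVC name `llmkube-model-cache` comes from this page, but the namespace, access mode, and requested size shown here are assumptions.

```yaml
# Hypothetical sketch of the per-namespace cache PVC that LLMKube
# provisions on first use. Field values below are assumptions.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: llmkube-model-cache
  namespace: my-namespace      # assumption: one PVC per namespace
spec:
  accessModes:
    - ReadWriteOnce            # assumption: single-node attach
  resources:
    requests:
      storage: 50Gi            # assumption: room for several models
```

Because the PVC is namespaced, two InferenceServices in the same namespace share one cache, while workloads in different namespaces each get their own.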

What this page will cover

  • How the per-namespace llmkube-model-cache PVC is provisioned automatically on first use.
  • The cache key (SHA256 of the source URL) and how the operator dedupes identical sources.
  • Sizing the cache for multi-model workloads, and when to switch to a shared StorageClass.
  • Manually pre-staging models onto the PVC for faster cold starts.
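The dedup rule described above (cache key = SHA256 of the source URL) can be sketched in a few lines of Python. This is an illustration under stated assumptions: it assumes the key is the hex digest of the raw URL string, with no normalization applied first; the operator's exact keying logic may differ.

```python
import hashlib


def cache_key(source_url: str) -> str:
    """Sketch of the cache key: SHA-256 hex digest of the source URL.

    Two InferenceServices that reference the same URL produce the same
    key, so the model is downloaded once and shared from the cache.
    """
    return hashlib.sha256(source_url.encode("utf-8")).hexdigest()


# Hypothetical model URL, for illustration only.
url = "https://example.com/models/llama-3-8b-q4.gguf"
print(cache_key(url))  # 64-char hex string; identical URLs dedupe
```

A consequence worth noting: because the key is derived from the URL, the same file served from two different URLs would be cached twice, while two services pointing at one URL always share a single entry.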


© 2026 Defilan Technologies LLC
