What we shipped in LLMKube 0.7.6: memory-pressure protection, mutable modelRef, and a community PR worth celebrating
0.7.6 is the biggest LLMKube release since multi-GPU sharding landed. It adds memory-pressure protection on the metal-agent (priority-based eviction with a friendly-fire guard), finally makes modelRef mutable, extends ParallelSlots to vLLM thanks to a polished community PR from @Faylixe, and introduces three new K8s-native pod fields (runtimeClassName, podAnnotations, podLabels). There's also a real CNCF-style docs site, plus a quickstart-breaking bug caught and fixed Saturday night. Here's what landed.