Building ShadowStack: Our On-Prem LLM Testing Lab
You can't truly validate air-gapped deployments in the cloud. So we built ShadowStack: a bare-metal testing environment that emulates what organizations actually run in their datacenters. Here's the story of why we built it, what's inside, and what we're testing on it.
Testing in the Shadows
When you're building infrastructure for defense contractors, healthcare systems, and financial institutions, "it works in my Kubernetes cluster on AWS" doesn't cut it. The environments where organizations actually want something like LLMKube include:
- SCIFs and classified networks where there's no internet access, period
- Hospital data centers with strict HIPAA compliance and legacy hardware
- Factory floors with edge deployments on constrained hardware
- Financial trading desks where latency is measured in microseconds and data never leaves the building
Building LLMKube as a one-person shop means being resourceful. We needed a way to test in an environment that actually resembles these scenarios. Not a cloud VM. Not a laptop. Real bare metal, real air-gaps, real constraints.
Enter ShadowStack.
The Build: 32 GB of VRAM and Zero Compromises
ShadowStack is purpose-built for one thing: running production-grade LLM workloads in conditions that mirror real-world datacenter environments. Here's what we packed into it:
The Hardware
| Component | Spec |
|---|---|
| CPU | AMD Ryzen 9 7900X (12c/24t, 4.7–5.6 GHz) |
| Memory | 64 GB DDR5-6000 (2×32 GB) |
| GPUs | 2× NVIDIA RTX 5060 Ti 16 GB (32 GB total VRAM) |
| Storage | Samsung 990 Pro 1 TB NVMe PCIe 4.0 |
| Cooling | Noctua NH-U12S (keeps the 7900X under 85°C) |
| Power | Corsair RM1000x 1000W 80+ Gold (plenty of headroom) |
Why These Parts Matter
Every component was chosen with intention:
- Dual RTX 5060 Ti 16 GB GPUs: 32 GB of total VRAM lets us run models in the ~30B class at Q4_K_M entirely on GPU, offload the majority of a 70B Q4_K_M model's layers (the weights alone run around 40 GB, so the remainder spills to system RAM), or serve multiple smaller models simultaneously. The new Blackwell architecture gives us 30-50% better performance than the previous generation.
- Ryzen 9 7900X: 12 cores and 24 threads provide massive headroom for Kubernetes orchestration, tokenization, RAG pipelines, and everything else happening alongside inference.
- 64 GB DDR5-6000: Plenty of system RAM for multiple large models in memory plus K3s overhead. Fast memory speeds help with model loading and preprocessing.
- 1 TB NVMe: Fast storage means sub-5-second cold starts when loading models from disk. Critical for testing auto-scaling behavior.
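As a sanity check on the sizing above, here's a rough back-of-the-envelope VRAM estimator. This is a hypothetical helper, not LLMKube code, and the bits-per-weight figures and flat KV-cache allowance are approximations:

```python
# Approximate average bits per weight for common GGUF quantizations.
BITS_PER_WEIGHT = {"Q4_K_M": 4.85, "Q8_0": 8.5, "F16": 16.0}

def model_vram_gib(params_billions: float, quant: str = "Q4_K_M",
                   kv_cache_gib: float = 2.0) -> float:
    """Weights plus a flat KV-cache allowance, in GiB."""
    weight_bytes = params_billions * 1e9 * BITS_PER_WEIGHT[quant] / 8
    return weight_bytes / 2**30 + kv_cache_gib

# A 70B model at Q4_K_M needs ~40 GiB -- more than ShadowStack's 32 GiB,
# so some layers spill to system RAM; a ~30B model fits comfortably.
for size in (8, 32, 70):
    print(f"{size}B Q4_K_M: ~{model_vram_gib(size):.1f} GiB")
```

Plugging in the 70B case makes the constraint concrete: the weights alone exceed 32 GB, which is why "fully offload" only applies to the ~30B class on this box.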
What ShadowStack Enables
Having real hardware unlocks testing scenarios that are impossible in cloud environments:
1. Air-Gap Simulation
We can physically disconnect ShadowStack from the internet and test the full offline installation workflow: downloading container images to USB drives, transferring model weights, setting up local registries. This is exactly what defense and healthcare organizations do.
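The fiddliest part of that workflow is confirming that multi-gigabyte artifacts survived the USB hop intact. A minimal sketch of the checksum-manifest approach, where `verify_manifest` and the file layout are illustrative, not an LLMKube API:

```python
import hashlib
from pathlib import Path

def sha256sum(path: Path) -> str:
    """Stream the file so multi-GB model weights don't blow up RAM."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_manifest(manifest: dict[str, str], root: Path) -> list[str]:
    """Return names of artifacts that are missing or corrupted."""
    bad = []
    for name, expected in manifest.items():
        p = root / name
        if not p.exists() or sha256sum(p) != expected:
            bad.append(name)
    return bad
```

On the connected side, the manifest is written next to the images and weights before they go on the drive; on the air-gapped side, an empty list from `verify_manifest` means the transfer is clean.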
2. Multi-Node Kubernetes
While ShadowStack currently runs as a single-node K3s cluster, we can expand it with additional nodes to test true distributed scenarios like model sharding, replica scheduling, network policies, and service mesh behavior.
3. Real GPU Workloads
No mocks. No simulations. We're running actual LLM inference with vLLM, Ollama, and LLMKube on real NVIDIA hardware. This lets us catch GPU-specific issues like memory fragmentation, CUDA errors, and thermal throttling that never show up in CPU-only testing.
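Thermal throttling in particular only shows up during long runs, so we poll `nvidia-smi` in CSV mode and watch temperatures. A sketch of the parsing side; the 83°C watch threshold is our choice, not an NVIDIA default, and the sample line stands in for a real subprocess call:

```python
# Parses output of:
#   nvidia-smi --query-gpu=index,temperature.gpu,memory.used \
#              --format=csv,noheader,nounits
def parse_gpu_stats(csv_text: str) -> list[dict]:
    gpus = []
    for line in csv_text.strip().splitlines():
        idx, temp, mem = (field.strip() for field in line.split(","))
        gpus.append({"index": int(idx), "temp_c": int(temp),
                     "mem_used_mib": int(mem)})
    return gpus

def too_hot(gpus: list[dict], limit_c: int = 83) -> list[int]:
    """Indices of GPUs at or above our throttle-watch threshold."""
    return [g["index"] for g in gpus if g["temp_c"] >= limit_c]

sample = "0, 71, 14925\n1, 85, 15102"  # illustrative, not live data
print(too_hot(parse_gpu_stats(sample)))  # GPU 1 is running hot
```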
4. Edge Deployment Testing
ShadowStack represents a typical edge datacenter or high-end workstation deployment. It's not a rack of servers but a single box that needs to handle everything. This constraint forces us to optimize for resource efficiency and multi-tenancy.
What ShadowStack Can Handle
With 32 GB of VRAM across dual RTX 5060 Ti GPUs, ShadowStack comfortably runs ~30B models fully on GPU, handles 70B models at Q4_K_M with most layers offloaded (the weights alone run about 40 GB, so some layers fall back to system RAM), and can serve multiple smaller models simultaneously. The 12-core Ryzen 9 7900X provides plenty of headroom for Kubernetes orchestration, preprocessing, and everything else happening alongside inference.
We'll be publishing detailed performance benchmarks as we put ShadowStack through its paces: testing everything from cold start times to multi-model workloads to failure scenarios. Real numbers from real hardware, not theoretical projections.
What's Next: Demos, Tutorials, and Real Deployments
ShadowStack isn't just for internal testing. It's also our content production environment. Over the coming months, you'll see:
- Video demos showing LLMKube deployments from scratch on bare metal
- Air-gap installation tutorials walking through the full offline workflow
- Performance benchmarks comparing different inference engines and model formats
- Multi-model deployments demonstrating how to run multiple LLMs with resource isolation
- Disaster recovery scenarios testing node failures, GPU crashes, and network partitions
All of this content will be filmed on ShadowStack, so you'll see exactly how things work on real hardware, not sanitized cloud examples.
Building Your Own ShadowStack
If you're building a similar test environment for your team, here are a few lessons we learned:
- Prioritize VRAM over everything. You can work around slow CPUs or limited system RAM, but if your models don't fit in GPU memory, you're dead in the water. Dual 16 GB GPUs hit the sweet spot for us.
- Don't skimp on the PSU. GPU power spikes are real. We went with 1000W to have plenty of headroom, and it's paid off in stability.
- Fast storage matters more than you think. Model loading is I/O bound. An NVMe drive turns 30-second cold starts into 5-second ones.
- Plan for expansion. We started with a single node, but chose a motherboard with dual PCIe 5.0 slots and room to add more storage. Future-proofing is worth the extra $50.
- Air-gapped testing requires discipline. It's tempting to just "quickly pull that container image" from the internet. Resist. Treat it like a real SCIF environment and you'll catch deployment issues early.
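The storage lesson above is easy to quantify: if model loading is sequential-read bound, cold-start time is roughly model size over disk throughput. The speeds below are ballpark figures, not measurements from ShadowStack:

```python
def cold_start_s(model_gb: float, disk_gb_per_s: float) -> float:
    """Naive lower bound: cold start ~ sequential read of the weights."""
    return model_gb / disk_gb_per_s

# A ~20 GB Q4_K_M model on a SATA SSD vs a PCIe 4.0 NVMe drive.
for name, speed in [("SATA SSD (~0.5 GB/s)", 0.5),
                    ("PCIe 4.0 NVMe (~7 GB/s)", 7.0)]:
    print(f"{name}: ~{cold_start_s(20, speed):.0f} s")
```

That's the difference between a 40-second and a roughly 3-second cold start, which is exactly the gap that makes or breaks auto-scaling tests.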
The Real Test Lab
Cloud testing is convenient. It's fast. It scales infinitely. But it doesn't prepare you for the reality of datacenter deployments where there's no `kubectl logs` piped to a cloud observability platform, no magical auto-scaling, and definitely no internet access.
ShadowStack gives us that reality check. It's our proving ground for LLMKube, a place where we can break things, fix them, and document the whole process for teams building similar systems.
Over the next few months, you'll see a lot more content coming from ShadowStack. We'll be documenting everything: the successes, the failures, and the weird edge cases that only show up on real hardware.
Stay tuned. The real work is just beginning.
Want to follow along? Watch the LLMKube GitHub repository for updates, and check back here for demos and tutorials filmed on ShadowStack.