Building ShadowStack: Our On-Prem LLM Testing Lab
You can't truly validate air-gapped deployments in the cloud. So we built ShadowStack: a bare-metal testing environment that emulates what organizations actually run in their datacenters. Here's the story of why we built it, what's inside, and what we're testing on it.
Testing in the Shadows
When you're building infrastructure for defense contractors, healthcare systems, and financial institutions, "it works in my Kubernetes cluster on AWS" doesn't cut it. The environments where organizations actually want something like LLMKube include:
- SCIFs and classified networks where there's no internet access, period
- Hospital data centers with strict HIPAA compliance and legacy hardware
- Factory floors with edge deployments on constrained hardware
- Financial trading desks where latency is measured in microseconds and data never leaves the building
Building LLMKube as a one-person shop means being resourceful. We needed a way to test in an environment that actually resembles these scenarios. Not a cloud VM. Not a laptop. Real bare metal, real air-gaps, real constraints.
Enter ShadowStack.
The Build: 32 GB of VRAM and Zero Compromises
ShadowStack is purpose-built for one thing: running production-grade LLM workloads in conditions that mirror real-world datacenter environments. Here's what we packed into it:
The Hardware
| Component | Spec |
|---|---|
| CPU | AMD Ryzen 9 7900X (12c/24t, 4.7–5.6 GHz) |
| Memory | 64 GB DDR5-6000 (2×32 GB) |
| GPUs | 2× NVIDIA RTX 5060 Ti 16 GB (32 GB total VRAM) |
| Storage | Samsung 990 Pro 1 TB NVMe PCIe 4.0 |
| Cooling | Noctua NH-U12S (keeps the 7900X under 85°C) |
| Power | Corsair RM1000x 1000W 80+ Gold (plenty of headroom) |
Why These Parts Matter
Every component was chosen with intention:
- Dual RTX 5060 Ti 16 GB GPUs: 32 GB of total VRAM lets us run models in the ~30B class at Q4_K_M entirely on GPU, offload the majority of a 70B Q4_K_M model's layers (the weights alone run around 40 GB, so the remainder spills to system RAM), or serve multiple smaller models simultaneously. The new Blackwell architecture gives us 30-50% better performance than the previous generation.
- Ryzen 9 7900X: 12 cores and 24 threads provide massive headroom for Kubernetes orchestration, tokenization, RAG pipelines, and everything else happening alongside inference.
- 64 GB DDR5-6000: Plenty of system RAM for multiple large models in memory plus K3s overhead. Fast memory speeds help with model loading and preprocessing.
- 1 TB NVMe: Fast storage means sub-5-second cold starts when loading models from disk. Critical for testing auto-scaling behavior.
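As a sanity check on the sizing above, here's a rough back-of-the-envelope VRAM estimator. This is a hypothetical helper, not LLMKube code, and the bits-per-weight figures and flat KV-cache allowance are approximations:

```python
# Approximate average bits per weight for common GGUF quantizations.
BITS_PER_WEIGHT = {"Q4_K_M": 4.85, "Q8_0": 8.5, "F16": 16.0}

def model_vram_gib(params_billions: float, quant: str = "Q4_K_M",
                   kv_cache_gib: float = 2.0) -> float:
    """Weights plus a flat KV-cache allowance, in GiB."""
    weight_bytes = params_billions * 1e9 * BITS_PER_WEIGHT[quant] / 8
    return weight_bytes / 2**30 + kv_cache_gib

# A 70B model at Q4_K_M needs ~40 GiB -- more than ShadowStack's 32 GiB,
# so some layers spill to system RAM; a ~30B model fits comfortably.
for size in (8, 32, 70):
    print(f"{size}B Q4_K_M: ~{model_vram_gib(size):.1f} GiB")
```

Plugging in the 70B case makes the constraint concrete: the weights alone exceed 32 GB, which is why "fully offload" only applies to the ~30B class on this box.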
What ShadowStack Enables
Having real hardware unlocks testing scenarios that are impossible in cloud environments:
1. Air-Gap Simulation
We can physically disconnect ShadowStack from the internet and test the full offline installation workflow: downloading container images to USB drives, transferring model weights, setting up local registries. This is exactly what defense and healthcare organizations do.
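The fiddliest part of that workflow is confirming that multi-gigabyte artifacts survived the USB hop intact. A minimal sketch of the checksum-manifest approach, where `verify_manifest` and the file layout are illustrative, not an LLMKube API:

```python
import hashlib
from pathlib import Path

def sha256sum(path: Path) -> str:
    """Stream the file so multi-GB model weights don't blow up RAM."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_manifest(manifest: dict[str, str], root: Path) -> list[str]:
    """Return names of artifacts that are missing or corrupted."""
    bad = []
    for name, expected in manifest.items():
        p = root / name
        if not p.exists() or sha256sum(p) != expected:
            bad.append(name)
    return bad
```

On the connected side, the manifest is written next to the images and weights before they go on the drive; on the air-gapped side, an empty list from `verify_manifest` means the transfer is clean.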
2. Multi-Node Kubernetes
While ShadowStack currently runs as a single-node K3s cluster, we can expand it with additional nodes to test true distributed scenarios like model sharding, replica scheduling, network policies, and service mesh behavior.
3. Real GPU Workloads
No mocks. No simulations. We're running actual LLM inference with vLLM, Ollama, and LLMKube on real NVIDIA hardware. This lets us catch GPU-specific issues like memory fragmentation, CUDA errors, and thermal throttling that never show up in CPU-only testing.
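Thermal throttling in particular only shows up during long runs, so we poll `nvidia-smi` in CSV mode and watch temperatures. A sketch of the parsing side; the 83°C watch threshold is our choice, not an NVIDIA default, and the sample line stands in for a real subprocess call:

```python
# Parses output of:
#   nvidia-smi --query-gpu=index,temperature.gpu,memory.used \
#              --format=csv,noheader,nounits
def parse_gpu_stats(csv_text: str) -> list[dict]:
    gpus = []
    for line in csv_text.strip().splitlines():
        idx, temp, mem = (field.strip() for field in line.split(","))
        gpus.append({"index": int(idx), "temp_c": int(temp),
                     "mem_used_mib": int(mem)})
    return gpus

def too_hot(gpus: list[dict], limit_c: int = 83) -> list[int]:
    """Indices of GPUs at or above our throttle-watch threshold."""
    return [g["index"] for g in gpus if g["temp_c"] >= limit_c]

sample = "0, 71, 14925\n1, 85, 15102"  # illustrative, not live data
print(too_hot(parse_gpu_stats(sample)))  # GPU 1 is running hot
```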
4. Edge Deployment Testing
ShadowStack represents a typical edge datacenter or high-end workstation deployment. It's not a rack of servers but a single box that needs to handle everything. This constraint forces us to optimize for resource efficiency and multi-tenancy.
What ShadowStack Can Handle
With 32 GB of VRAM across dual RTX 5060 Ti GPUs, ShadowStack comfortably runs ~30B models fully on GPU, handles 70B models at Q4_K_M with most layers offloaded (the weights alone run about 40 GB, so some layers fall back to system RAM), and can serve multiple smaller models simultaneously. The 12-core Ryzen 9 7900X provides plenty of headroom for Kubernetes orchestration, preprocessing, and everything else happening alongside inference.
We'll be publishing detailed performance benchmarks as we put ShadowStack through its paces: testing everything from cold start times to multi-model workloads to failure scenarios. Real numbers from real hardware, not theoretical projections.
What's Next: Demos, Tutorials, and Real Deployments
ShadowStack isn't just for internal testing. It's also our content production environment. Over the coming months, you'll see:
- Video demos showing LLMKube deployments from scratch on bare metal
- Air-gap installation tutorials walking through the full offline workflow
- Performance benchmarks comparing different inference engines and model formats
- Multi-model deployments demonstrating how to run multiple LLMs with resource isolation
- Disaster recovery scenarios testing node failures, GPU crashes, and network partitions
All of this content will be filmed on ShadowStack, so you'll see exactly how things work on real hardware, not sanitized cloud examples.
Building Your Own ShadowStack
If you're building a similar test environment for your team, here are a few lessons we learned:
- Prioritize VRAM over everything. You can work around slow CPUs or limited system RAM, but if your models don't fit in GPU memory, you're dead in the water. Dual 16 GB GPUs hit the sweet spot for us.
- Don't skimp on the PSU. GPU power spikes are real. We went with 1000W to have plenty of headroom, and it's paid off in stability.
- Fast storage matters more than you think. Model loading is I/O bound. An NVMe drive turns 30-second cold starts into 5-second ones.
- Plan for expansion. We started with a single node, but chose a motherboard with dual PCIe 5.0 slots and room to add more storage. Future-proofing is worth the extra $50.
- Air-gapped testing requires discipline. It's tempting to just "quickly pull that container image" from the internet. Resist. Treat it like a real SCIF environment and you'll catch deployment issues early.
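The storage lesson above is easy to quantify: if model loading is sequential-read bound, cold-start time is roughly model size over disk throughput. The speeds below are ballpark figures, not measurements from ShadowStack:

```python
def cold_start_s(model_gb: float, disk_gb_per_s: float) -> float:
    """Naive lower bound: cold start ~ sequential read of the weights."""
    return model_gb / disk_gb_per_s

# A ~20 GB Q4_K_M model on a SATA SSD vs a PCIe 4.0 NVMe drive.
for name, speed in [("SATA SSD (~0.5 GB/s)", 0.5),
                    ("PCIe 4.0 NVMe (~7 GB/s)", 7.0)]:
    print(f"{name}: ~{cold_start_s(20, speed):.0f} s")
```

That's the difference between a 40-second and a roughly 3-second cold start, which is exactly the gap that makes or breaks auto-scaling tests.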
The Real Test Lab
Cloud testing is convenient. It's fast. It scales infinitely. But it doesn't prepare you for the reality of datacenter deployments where there's no `kubectl logs` piped to a cloud observability platform, no magical auto-scaling, and definitely no internet access.
ShadowStack gives us that reality check. It's our proving ground for LLMKube, a place where we can break things, fix them, and document the whole process for teams building similar systems.
Over the next few months, you'll see a lot more content coming from ShadowStack. We'll be documenting everything: the successes, the failures, and the weird edge cases that only show up on real hardware.
Stay tuned. The real work is just beginning.
Want to follow along? Watch the LLMKube GitHub repository for updates, and check back here for demos and tutorials filmed on ShadowStack.