I Set the Agents Loose on My Kubernetes Operator. Here's What They Shipped.
I pointed a fleet of coding agents at LLMKube and told them to audit the repo. Not "find me bugs." Not "draft release notes." An honest, critical read of where the project looks solid and where it looks like a weekend project that grew up in public, ranked by which fixes would shift that impression the most per unit of effort.
Six hours later there were seventeen PRs on main. The 1,567-line god controller was 356 lines. The curl | bash installer that had been silently broken for eight months actually worked, and verified its own checksums on the way. govulncheck was gating CI against Go stdlib CVEs. Then I built the audit-fixed operator and rolled it onto the cluster running the model that ran the audit. No regressions. Six existing inference services kept serving.
This is what happened.
The setup
The way I run Claude Code on a codebase this size is never as a single thread. It dispatches subagents in parallel. One reads the Go code, another the CRDs, another the CI pipelines, another the Helm chart, another the docs. They come back with independent grades and findings, and the main thread stitches them into a single report. That's the "fleet" part. I set the brief (be critical, be ranked, be specific) and let them go.
The bottom line they came back with:
The project is past "sloppy" and well into "solidly built," but it carries the tell-tale scars of fast organic growth from v0.1 → v0.7: a god-reconciler, an alpha CRD that's sprouting runtime-specific fields at the top level, docs that haven't caught up to what the code can do, and a repo root that looks lived-in.
That's exactly right. Every one of those things was true.
The five critical issues
1. A 1,567-line god reconciler
internal/controller/inferenceservice_controller.go had absorbed almost everything over seven minor releases. Deployment construction. Service reconcile. HPA creation. Pod scheduling analysis. Phase determination. Queue-position math. Priority-class resolution. Every llama.cpp command-line flag appender. GPU tensor split ratio calculation. Status writes across five condition types.
It compiled. It worked. It had 83.1% test coverage. And it was the kind of file that makes a drive-by reviewer assume the rest of the codebase is equally unruly. It isn't. pkg/agent/ is tidy. But first impressions get made by the biggest file.
2. The PROJECT file pointed at the wrong GitHub org
The kubebuilder PROJECT metadata still said repo: github.com/defilan/llmkube from before the org rename, in three places. Any scaffolding regeneration would have minted new resources with the wrong import path.
3. ROADMAP frozen at v0.5.0
The header literally read:
**Current Version:** 0.5.0
**Last Updated:** March 2026

We were on 0.7.0. The "planned" section listed vLLM, TGI, PersonaPlex, HPA, hybrid offload. All already shipped, two releases back. A reader finding it would assume the project was abandoned.
4. SECURITY supported-versions lied
The table said we supported 0.5.x and 0.4.x. Users on 0.7.x didn't know if security patches applied to them.
5. No supply-chain hardening on a curl | bash installer
The recommended install path pipes a shell script from raw.githubusercontent.com into bash. That script then pipes a tarball from github.com/.../releases/... into tar. Neither step verified anything. No checksums. No cosign. No SBOM. Nothing stopping a compromised CDN or a hijacked token from pushing a binary onto every user's machine.
CI had the same gap on the ingest side. No govulncheck, no gosec, no container scan.
Three workstreams
Everything got sequenced into three parallel workstreams so the easy wins could ship without blocking on the controller split.
Workstream A, Polish (PR #309): Fixed the PROJECT path. Rewrote ROADMAP for the v0.7.0 reality. Updated SECURITY version matrix. Tightened .gitignore and swept about twenty stray PNG screenshots out of the repo root. Moved two internal design specs out of the public docs/ tree. Locked down Status.Phase with +kubebuilder:validation:Enum on both CRDs so silent drift becomes impossible. Logged a swallowed error the Metal agent was burying at pkg/agent/executor.go:114.
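For flavor, here's roughly what that Status.Phase lockdown looks like as a kubebuilder validation marker. A minimal sketch; the phase values listed are illustrative, not necessarily the project's actual set:

```go
package v1alpha1

// Phase is the lifecycle phase reported in status.
// With the Enum marker below, controller-gen bakes the allowed values into
// the CRD's OpenAPI schema, so the apiserver rejects anything outside the
// list and silent drift becomes impossible.
// +kubebuilder:validation:Enum=Pending;Creating;Ready;Failed
type Phase string

type InferenceServiceStatus struct {
	// Phase values illustrative; the real CRD defines its own list.
	Phase Phase `json:"phase,omitempty"`
}
```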
Workstream B, Supply-chain MVP (PR #310, #319): Added sha256 checksum verification to install.sh, with a LLMKUBE_SKIP_CHECKSUM=1 escape hatch for scripted test environments. Added a new .github/workflows/security.yml running govulncheck ./... on push, PR, and Monday 07:00 UTC cron. Added gosec and bodyclose to .golangci.yml, with a config-level exclude list for the rules that fire constantly on operator-style code that intentionally execs binaries and reads user-provided paths, plus nine targeted //nolint:gosec // G115: <reason> annotations for bounded integer conversions. Wired codecov-action@v5 after make test and followed up with codecov.yml marking patch coverage informational so pure-move refactors stop tripping the check.
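The govulncheck gate is small enough to sketch in full. A minimal version of such a workflow (runner, action versions, and step layout are my assumptions; the real security.yml may differ):

```yaml
# .github/workflows/security.yml (sketch)
name: security
on:
  push:
  pull_request:
  schedule:
    - cron: "0 7 * * 1"   # Mondays 07:00 UTC
jobs:
  govulncheck:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-go@v5
        with:
          go-version-file: go.mod   # scan with the pinned toolchain
      - run: go install golang.org/x/vuln/cmd/govulncheck@latest
      - run: govulncheck ./...
```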
Workstream C, Controller split (PR #312 through #318): Seven sequential pure-move PRs broke the 1,567-line file into focused siblings. Each PR preserved behavior exactly. Here's the end state:
| File | Lines | Responsibility |
| --- | --- | --- |
| inferenceservice_controller.go | 356 (was 1,567) | orchestration only |
| runtime_llamacpp_args.go | 130 | llama.cpp flag appenders |
| gpu_sharding.go | 126 | --split-mode and --tensor-split math |
| model_storage.go | 296 | PVC + cached + emptyDir model-volume wiring |
| hpa_reconciler.go | 195 | HPA create/update/delete |
| scheduling.go | 201 | phase, queue position, priority classes |
| deployment_builder.go | 229 | the 170-line constructDeployment + pod security |
| status_builder.go | 232 | updateStatusWithSchedulingInfo, VLLM spec |
| service_builder.go | 75 | constructService |

internal/controller coverage held at 83.1% through every PR. Zero tests needed rewriting. Every function resolved through the package unchanged.
The install.sh surprise
Adding checksum verification surfaced something I had to double-check: curl -sSL https://raw.githubusercontent.com/defilantech/LLMKube/main/install.sh | bash had been silently failing for every user for eight months.
The script constructed the archive filename as llmkube_0.7.0_darwin_arm64.tar.gz, lowercase llmkube_. But goreleaser's {{ .ProjectName }}_{{ .Version }}_{{ .Os }}_{{ .Arch }} template resolved .ProjectName to LLMKube and was emitting LLMKube_0.7.0_darwin_arm64.tar.gz, capital LLMKube_.
Every install attempt was fetching llmkube_...tar.gz (404), GitHub returned an HTML error page, curl silently saved it to disk as llmkube.tar.gz (because the script didn't use curl -f), and tar exited with "not a gzip file." Cryptic enough that nobody filed an issue.
I'd never caught it because I test via brew install defilantech/tap/llmkube (which works; the Homebrew tap is a different release path) or via go build (which bypasses releases entirely). The audit caught it on the first dry run because adding checksum verification made the actual download URL matter for the first time.
Same commit fixed the filename and added the sha256 verification. A separate ARCHIVE_PREFIX="LLMKube" constant sits next to BINARY_NAME="llmkube", and both curl calls got -f so HTTP errors fail fast instead of silently saving an error page to disk.
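The fixed pattern is worth seeing end to end. Here is a runnable sketch of the verify step, with the network calls commented out and the downloaded artifacts simulated locally; the variable names mirror the PR, but the release URL layout is my assumption:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Names mirror the PR; URLs below are illustrative.
ARCHIVE_PREFIX="LLMKube"   # goreleaser's {{ .ProjectName }} is capitalized
BINARY_NAME="llmkube"
VERSION="0.7.0"
OS="darwin"; ARCH="arm64"
ARCHIVE="${ARCHIVE_PREFIX}_${VERSION}_${OS}_${ARCH}.tar.gz"

# -f makes curl exit non-zero on HTTP errors instead of silently
# saving GitHub's HTML error page to disk:
# curl -fsSL "https://github.com/defilantech/LLMKube/releases/download/v${VERSION}/${ARCHIVE}" -o "${ARCHIVE}"
# curl -fsSL "https://github.com/defilantech/LLMKube/releases/download/v${VERSION}/checksums.txt" -o checksums.txt

# Simulate the downloaded artifacts so the verify step below is runnable:
echo "fake tarball contents" > "${ARCHIVE}"
sha256sum "${ARCHIVE}" > checksums.txt

# Verify unless the escape hatch is set (LLMKUBE_SKIP_CHECKSUM=1):
if [ "${LLMKUBE_SKIP_CHECKSUM:-0}" != "1" ]; then
  grep "${ARCHIVE}" checksums.txt | sha256sum -c -
fi
echo "checksum ok"
```

A tampered or truncated archive makes sha256sum -c exit non-zero, and set -e aborts the install before anything lands in PATH.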
The Go toolchain whack-a-mole
Turning on govulncheck in CI also immediately failed the build.
First wave: 20 stdlib vulnerabilities across net/url, crypto/tls, crypto/x509, encoding/asn1, encoding/pem, net/mail. All in the GO-2025-* series. All fixed in Go 1.25.2 or 1.25.3. go.mod was pinned to go 1.25.0. Bumped to 1.25.3, CI green.
Thirty seconds later CI went red again. The new run picked up the GO-2026-* series. 12 more stdlib vulnerabilities, fixed in 1.25.5 through 1.25.9. Bumped to 1.25.9, CI green.
The long-term fix isn't "keep bumping go.mod manually every time a new Go patch drops." It's splitting the single directive into a language floor plus a toolchain preference:
go 1.25.0
toolchain go1.25.9

Dependabot's gomod ecosystem now tracks the toolchain directive as a separate line item (the feature landed in dependabot/dependabot-core#10131 in early 2025). Weekly toolchain bumps ship as normal Dependabot PRs from here on.
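For completeness, the Dependabot side is a few lines of config. A sketch (the directory and schedule here are assumptions):

```yaml
# .github/dependabot.yml (sketch)
version: 2
updates:
  - package-ecosystem: "gomod"
    directory: "/"        # where go.mod lives
    schedule:
      interval: "weekly"  # picks up module and toolchain bumps alike
```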
Shadowstack validation
Once all 17 PRs had merged, I cross-built the controller for linux/amd64 against the post-audit main (registry.defilan.net/llmkube-controller:0.7.1-dev-audit-c0cbe32) and rolled it onto my home cluster via helm upgrade --reuse-values.
Clean rollout. Zero restarts. The new Status.Phase enum validation didn't reject any of the existing phase: "Ready" | "Creating" values on the six in-flight InferenceServices. Controller logs showed all ten Model resources reconciling ("already Ready, skipping") and the Metal-agent path firing correctly for the qwen35-metal-test service. Three Ready services kept serving. Three services stuck Creating before the audit (for separate GPU-pressure reasons that predate all this work) stayed stuck. Nothing new broke.
The recursive bit
Here's what lands for me.
The audit was run by a fleet of coding agents. The models that power agents like them, including the Qwen 3.6-35B that lives on two RTX 5060 Tis and shipped PR #283 last week, are served by LLMKube.
Agents ran on one substrate. They audited a Kubernetes operator that can run agents like them. They shipped the fixes. I deployed the fixes back onto the cluster. Same recursive loop as the April post where a local model wrote PR #283 overnight, just the audit version instead of the feature version.
That's the actual thing LLMKube is becoming. Not "Kubernetes but for LLMs." A platform where setting agents loose on your own hardware is a first-class workload. Overnight, private, no cloud API bill.
What's next
The v0.7.0 audit queue still has two big arcs left.
Release hardening. cosign-sign the controller image, the CLI binaries, and the Helm chart. Publish SBOMs via goreleaser's sboms: block. Document verification commands for users. This is the remaining supply-chain gap.
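The goreleaser side of that is small. A sketch of the sboms: block (goreleaser shells out to syft by default, so the release runner needs it installed):

```yaml
# .goreleaser.yaml (sketch)
sboms:
  - artifacts: archive   # emit one SBOM per release archive
```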
v1beta1 API cleanup. Consolidate llama.cpp flags under a LlamaCppConfig substruct so runtime-specific fields stop sprawling at the top of InferenceServiceSpec. Replace NoKvOffload *bool (negative booleans fail API-review everywhere) with an enum-typed field. Audit ExtraArgs usage and promote the top flags to typed fields. Validating webhook for cross-field rules and Secret existence. Phase → Conditions migration (Kubernetes upstream is moving that direction). Conversion webhook so existing v1alpha1 resources upgrade cleanly.
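A hypothetical sketch of where that consolidation could land; every field name below is illustrative, not the real v1beta1 design, which gets its own pass:

```go
package v1beta1

// Sketch only: runtime-specific knobs move under a substruct instead of
// sprawling at the top level of InferenceServiceSpec.
type InferenceServiceSpec struct {
	Model string `json:"model"`

	LlamaCpp *LlamaCppConfig `json:"llamaCpp,omitempty"`
}

type LlamaCppConfig struct {
	// Enum-typed field replacing the negative boolean NoKvOffload *bool,
	// so "enabled" reads positively and validation is schema-enforced.
	// +kubebuilder:validation:Enum=Enabled;Disabled
	KvOffload string `json:"kvOffload,omitempty"`

	// Escape hatch for flags not yet promoted to typed fields.
	ExtraArgs []string `json:"extraArgs,omitempty"`
}
```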
Both get their own audit + plan pass before starting.
One more thing worth saying here. None of the April work replaces what the community has already shipped. Matias Insaurralde has three merged PRs behind him: structured zap logging in the Metal agent, endpoint cleanup on process delete, and the Apple Silicon bug-report template that keeps showing up on incoming issues. Pablo Castorino shipped the runAsUser / runAsGroup inheritance fix that has been quietly saving every Metal-agent deployment from permission-denied errors on the model volume since it merged. The audit-day work was mine plus the agents. The project in front of you is not.
If you want to add a PR of your own, the good-first-issue label has the on-ramp. The repo is at 55 stars and 10 forks today; Discord is the fastest way to compare notes with anyone running local LLMs in production.
- Star the repo: github.com/defilantech/LLMKube
- Docs: llmkube.com
- Discord: discord.gg/Ktz85RFHDv
The PRs this audit spawned are all on main: #309, #310, #312 through #318, #319, #320 through #325.