Foreman

Foreman is the Kubernetes-native control plane for agentic workloads that ships as an opt-in add-on to LLMKube. It dispatches coder, verifier, and reviewer agents across a heterogeneous fleet of locally-hosted LLM nodes (Apple Silicon Metal, NVIDIA CUDA, Intel oneAPI / SYCL), runs each agent through a native Go function-calling loop, and produces PR-shaped contributions against your GitHub repositories.

If you only want LLMKube for serving local models, the operator and CRDs you already use are unchanged. Foreman lives in its own API group (foreman.llmkube.dev) and its own Helm chart. You install it on top of LLMKube when you’re ready for the pipeline shape; you ignore it otherwise.

This page is the entry point. Deep references are linked at the bottom.

Why Foreman

The argument for Foreman is the combination of three constraints that ordinary agentic frameworks don’t co-solve:

Kubernetes-native. Declarative CRDs, controller-runtime reconcilers, RBAC, Helm, OpenTelemetry. Drops into your existing ops surface rather than fighting it.
Heterogeneous-fleet by design. Capability-aware dispatch matches each task to a node whose advertised hardware actually fits the job. An Apple Silicon Mac runs the coder, an NVIDIA box runs the gate, a third node runs the reviewer.
Local by default. Inference happens against your own InferenceServices (vLLM, llama.cpp, mlx-server, vllm-swift). No cloud API egress unless you opt in to the cloud-reviewer escape hatch.

In-process frameworks (CrewAI, LangGraph, AutoGen) are excellent inside one Python or TypeScript process; they don’t solve the fleet dispatch problem. Datacenter inference platforms (NVIDIA Dynamo, vLLM serving) solve the inference side but not the agent-pipeline side. Foreman lives between them: above the inference engine, below the application-layer agent framework, with Kubernetes as the substrate.

Key concepts

Four CRDs make up the Foreman surface:

| CRD | Scope | What it represents |
|---|---|---|
| `Workload` | namespaced | The user-facing intent (“fix these eight issues in this repo”). The reconciler decomposes it into a pipeline of AgenticTasks. |
| `AgenticTask` | namespaced | One dispatchable unit of work. References an Agent + a payload. The scheduler claims it for a FleetNode whose capability matches. |
| `Agent` | namespaced | A reusable role definition: system prompt, tool whitelist, model endpoint, required capability. The same Agent can drive many AgenticTasks. |
| `FleetNode` | cluster-scoped | A node in the fleet. The foreman-agent on each host self-registers and advertises its capability (accelerator family, RAM, context window, roles). |

The minimal lifecycle:

Workload (intent)
  └── reconciler decomposes →
      ├── AgenticTask: code (agentRef: coder, kind: issue-fix)
      ├── AgenticTask: verify (agentRef: gate, kind: verify, dependsOn: code)
      └── AgenticTask: review (agentRef: reviewer, kind: review, dependsOn: verify)

Each task is claimed by the FleetNode whose capability satisfies the referenced Agent’s requiredCapability. The foreman-agent on that host runs the native Go loop against the local inference endpoint. Verdicts cascade to the parent Workload. Fork branches land on GitHub for human review.
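The capability check described above can be pictured as a simple predicate over what a node advertises. The following is a minimal sketch, not Foreman's scheduler: the `Capability` and `Requirement` types, their field names, and the first-match policy are all assumptions made for illustration, based only on the advertised attributes listed in the table (accelerator family, RAM, context window, roles).

```go
package main

import "fmt"

// Capability mirrors what the page says a FleetNode advertises.
// Field names are hypothetical, not the real CRD schema.
type Capability struct {
	Accelerator   string   // accelerator family, e.g. "metal", "cuda", "sycl"
	RAMGiB        int      // memory available to the model
	ContextWindow int      // maximum context length in tokens
	Roles         []string // roles the node accepts, e.g. "coder"
}

// Requirement is a hypothetical shape for an Agent's requiredCapability.
// Zero values mean "no constraint".
type Requirement struct {
	Accelerator      string
	MinRAMGiB        int
	MinContextWindow int
	Role             string
}

// Satisfies reports whether an advertised capability meets the requirement.
func Satisfies(c Capability, r Requirement) bool {
	if r.Accelerator != "" && c.Accelerator != r.Accelerator {
		return false
	}
	if c.RAMGiB < r.MinRAMGiB || c.ContextWindow < r.MinContextWindow {
		return false
	}
	if r.Role == "" {
		return true
	}
	for _, role := range c.Roles {
		if role == r.Role {
			return true
		}
	}
	return false
}

// FirstMatch returns the first node, in the given order, that satisfies r.
// Picking the first match is an assumption; the real matcher may rank nodes.
func FirstMatch(order []string, nodes map[string]Capability, r Requirement) (string, bool) {
	for _, name := range order {
		if Satisfies(nodes[name], r) {
			return name, true
		}
	}
	return "", false
}

func main() {
	nodes := map[string]Capability{
		"mac-studio": {Accelerator: "metal", RAMGiB: 128, ContextWindow: 131072, Roles: []string{"coder", "reviewer"}},
		"cuda-box":   {Accelerator: "cuda", RAMGiB: 48, ContextWindow: 32768, Roles: []string{"gate"}},
	}
	order := []string{"mac-studio", "cuda-box"}

	name, ok := FirstMatch(order, nodes, Requirement{Role: "gate"})
	fmt.Println(name, ok)
}
```

In this sketch a task whose Agent requires the gate role skips the Mac and lands on the CUDA host, which is the behavior the lifecycle above describes in prose.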

Pipeline shape (v0.1)

v0.1 ships the linear pipeline:

Coder. Reads the issue body via fetch_issue, edits the workspace, commits with a DCO sign-off, pushes the branch to a fork.
Verifier (gate). Pulls the branch in a Kubernetes Job and runs your gate command (in our reference setup, make fmt vet lint test manifests chart-crds). Emits GATE-PASS or GATE-FAIL.
Reviewer(s). Read the diff against the issue body, score it against an A-through-H checklist, emit APPROVE / REQUEST-CHANGES / REJECT with structured findings.

Reviewer ensembles are first-class: a Workload.spec.reviewerAgentRefs slice expands to one review-N-i task per (issue, reviewer) pair, and any REQUEST-CHANGES from any reviewer flips the Workload to Phase=Failed via the cascade rule.
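To make the expansion and the cascade rule concrete, here is a small sketch under stated assumptions. The `Task` type is a simplified stand-in for an AgenticTask, the zero-based reviewer index in review-N-i is a guess, and treating REJECT as failing is an assumption; only the REQUEST-CHANGES rule is stated on this page.

```go
package main

import "fmt"

// Task is a simplified stand-in for an AgenticTask, not the real CRD type.
type Task struct {
	Name      string
	Kind      string
	DependsOn string
}

// Expand builds the linear code -> verify -> review tasks for each issue.
// Names follow the code-N / verify-N / review-N-i pattern; the zero-based
// reviewer index is an assumption.
func Expand(issues []int, reviewers []string) []Task {
	var tasks []Task
	for _, n := range issues {
		code := fmt.Sprintf("code-%d", n)
		verify := fmt.Sprintf("verify-%d", n)
		tasks = append(tasks,
			Task{Name: code, Kind: "issue-fix"},
			Task{Name: verify, Kind: "verify", DependsOn: code},
		)
		for i := range reviewers {
			tasks = append(tasks, Task{
				Name:      fmt.Sprintf("review-%d-%d", n, i),
				Kind:      "review",
				DependsOn: verify,
			})
		}
	}
	return tasks
}

// CascadePhase folds reviewer verdicts into a Workload phase.
func CascadePhase(verdicts []string) string {
	for _, v := range verdicts {
		switch v {
		case "REQUEST-CHANGES": // stated in the text above
			return "Failed"
		case "REJECT": // assumption: a rejection fails the Workload as well
			return "Failed"
		}
	}
	return "Completed"
}

func main() {
	for _, task := range Expand([]int{510}, []string{"reviewer-a", "reviewer-b"}) {
		fmt.Printf("%s kind=%s dependsOn=%q\n", task.Name, task.Kind, task.DependsOn)
	}
	fmt.Println(CascadePhase([]string{"APPROVE", "REQUEST-CHANGES"}))
}
```

With one issue and two reviewers the sketch yields four tasks, and a single REQUEST-CHANGES among the review verdicts is enough to fail the Workload.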

DAGs, best-of-N selection, and an autonomous LLM-driven planner are v0.2+ work. See What v0.1 deliberately doesn’t ship below.

Install

Foreman is a separate Helm chart that depends on LLMKube core:

# Make sure LLMKube core is installed first (0.8.0+)
helm repo add llmkube https://defilantech.github.io/LLMKube
helm repo update
helm upgrade --install llmkube llmkube/llmkube \
  --namespace llmkube-system --create-namespace

# Then add Foreman
helm install foreman llmkube/foreman \
  --namespace foreman-system --create-namespace

That installs the foreman-operator (controllers for the four CRDs), plus a foreman-agent Deployment that registers a FleetNode for the gate-runner role on the Linux/K8s host. Apple Silicon coder / reviewer nodes run the foreman-agent binary directly via launchd; see the M3 runbook and the M4 runbook for the host-side install.

A minimal example

A two-step coder + gate pipeline against a single issue (the V3 shape from M4):

apiVersion: foreman.llmkube.dev/v1alpha1
kind: Workload
metadata:
  name: fix-one-bug
  namespace: default
spec:
  intent: "Fix the lint-all docs gap"
  repo: defilantech/LLMKube
  issues: [510]
  coderAgentRef:
    name: qwen36-35b-carnice-mtp-coder
  verifierAgentRef:
    name: shadowstack-gate

Apply the manifest with kubectl apply, then watch the resources:

kubectl get workload,agentictask -n default -w

The Workload synthesizes a code-510 AgenticTask (coder) and a verify-510 AgenticTask (gate, depends on code). When both succeed, a DCO-signed branch lands on the fork (Defilan/LLMKube:foreman/fix-one-bug/issue-510). Verdict GATE-PASS means it cleared the gate; open the branch as an upstream PR or queue more issues into the Workload.

For the full reviewer-ensemble shape, see examples/foreman/workload-v04-default.yaml in this repo.

Coder escalation tier

Setting Workload.spec.escalationCoderAgentRef opts the Workload into a coder escalation tier. When the base coder fails an issue at its model’s ceiling, Foreman re-runs that one issue on the named larger-model coder Agent, carrying the failed model’s own diagnosis forward as a prompt hint. The idea is a routing hop, not a retry storm: a fast, capable model takes the first pass, and only the issues it genuinely can’t close get a second, heavier attempt.

Escalation fires only on capability failures, where a bigger model plausibly helps:

a model-decided NO-GO (the coder concluded it could not solve the issue), or
a coder-gate failure (CODER-GATE-FAILED).

It deliberately does not fire on:

a model-decided INCOMPLETE (the model gave up or ran out of turns),
a stuck-loop detection, or
an error.

A larger, slower model won’t fix a give-up or a loop and is more likely to blow the turn budget, so those outcomes are left as-is.

escalationCoderAgentRef is a singular field: one escalation tier, run sequentially after the base coder. It applies to issue-batch mode only, alongside coderAgentRef and verifierAgentRef, and is ignored for explicit Pipeline-mode Workloads. Unset (the default) means the tier is disabled.

Recommended deployment. Run the base and escalation coders as a dual-box pair with both models hot, for example a fast MoE coder on one accelerator and a larger dense coder on another, so escalation is a routing hop rather than a cold model load. The operator is responsible for ensuring the escalation Agent’s model is reachable; the controller schedules against it but does not manage serving.

apiVersion: foreman.llmkube.dev/v1alpha1
kind: Workload
metadata:
  name: fix-a-batch
  namespace: default
spec:
  intent: "Clear the edit-fidelity backlog"
  repo: defilantech/LLMKube
  issues: [944, 911, 921]
  coderAgentRef:
    name: qwen36-35b-carnice-mtp-coder
  verifierAgentRef:
    name: shadowstack-gate
  escalationCoderAgentRef:
    name: qwopus-27b-dense-coder

Issues that the base coder closes never touch the escalation Agent. An issue whose base coder returns NO-GO or trips the coder gate is re-dispatched once to qwopus-27b-dense-coder, with the base model’s failure summary threaded into the prompt.

`ALREADY-RESOLVED` — honest “already done” bail (#970)

When a coder concludes the work is already present on the branch or upstream base (e.g. a Fixes #N commit since BaseBranch, or an existing pushed foreman/.../issue-N branch), it emits verdict="NO-GO" with extra.outcome="ALREADY-RESOLVED" and cites the resolution in extra.resolvedBy. This is distinct from a capability failure (MODEL-DECIDED):

shouldEscalateCoder excludes it. The Workload does not pay for a larger-model re-run that would re-derive “nothing to do.”
The Workload rolls to Phase=Completed (not Failed) with reason AllAlreadyResolved (pure case) or AllChildrenSucceeded (mixed case).
A CoderAlreadyResolved condition names the issue numbers so the operator can close them on GitHub.
A Kubernetes event per resolved issue (Reason=AlreadyResolved) gives an event-router / GitHub-app integration point without forcing auto-close.
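The rollup described in the list above can be sketched as follows. The `IssueOutcome` type is hypothetical, and the "ChildFailed" reason used for the failure path is a placeholder invented for this example; only the Completed phase and the AllAlreadyResolved / AllChildrenSucceeded reasons come from this page.

```go
package main

import "fmt"

// IssueOutcome is a hypothetical per-issue summary.
type IssueOutcome struct {
	Issue           int
	Succeeded       bool // the issue's tasks all passed
	AlreadyResolved bool // coder bailed with extra.outcome="ALREADY-RESOLVED"
}

// Rollup returns the Workload phase and reason, plus the issue numbers a
// CoderAlreadyResolved condition would name.
func Rollup(outcomes []IssueOutcome) (phase, reason string, resolved []int) {
	failed := false
	allResolved := len(outcomes) > 0
	for _, o := range outcomes {
		switch {
		case o.AlreadyResolved:
			resolved = append(resolved, o.Issue)
		case o.Succeeded:
			allResolved = false
		default:
			allResolved = false
			failed = true
		}
	}
	switch {
	case failed:
		return "Failed", "ChildFailed", resolved // reason string is hypothetical
	case allResolved:
		return "Completed", "AllAlreadyResolved", resolved // pure case
	default:
		return "Completed", "AllChildrenSucceeded", resolved // mixed case
	}
}

func main() {
	phase, reason, resolved := Rollup([]IssueOutcome{
		{Issue: 944, Succeeded: true},
		{Issue: 911, AlreadyResolved: true},
	})
	fmt.Println(phase, reason, resolved)
}
```

The point of the sketch is that an already-resolved issue counts toward completion rather than failure, while its number is still surfaced so an operator can close it on GitHub.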

What v0.1 deliberately doesn’t ship

Foreman v0.1 is the foundation, not the finished platform. We deliberately punted a few capabilities we know people will ask for:

Linear pipelines only. Full DAGs (parallel branches, joins, fan-out across competing candidates) land in v0.2.
No best-of-N or jury selection. Reviewers score the coder’s diff but don’t pick between competing coder candidates. Lands as a separate role in v0.2.
No autonomous planner. The current planner is a stub: you hand it an issue list, or an explicit pipeline. The LLM-driven decomposition lands in v0.2; the v0.1 CRD shape doesn’t change.
No self-improving routing. The capability matcher uses fixed rules. The AgentScore corpus that biases future dispatch based on past outcomes is on the roadmap; v0.1 records the data.
Model-tool-protocol compatibility is implicit, not declared. Foreman currently assumes every inference endpoint speaks OpenAI tool_calls. See model-compatibility for the calibrated table.

The v0.1 CRD shape was designed so each of those additions is a non-breaking extension. Pinning the foundation is the work of 0.8.0; everything above is what we build on it.

Deep references

M3 coder runbook: install the foreman-agent on the coder host (M5 Max / Apple Silicon).
M4 verifier runbook: install Foreman as a chart on the K8s cluster and stand up the gate Agent on a verifier node.
Verifier node install notes: deeper notes on the ShadowStack reference verifier deployment.
Model compatibility table: which models the v0.4 reviewer and coder roles have been empirically validated against.

Where to file issues

Repository: github.com/defilantech/LLMKube
Discord: discord.gg/Ktz85RFHDv
Issue templates: prefix the title with [BUG] or [FEATURE]; the templates under .github/ISSUE_TEMPLATE/ are mandatory for triage.

If your shop fits the target profile (on-prem GPU, sovereignty constraint, K8s in production), we’d love to hear about your fleet.