In progress: this page is being written.
Multi-GPU sharding
Run a single model across multiple GPUs on the same node. The Model CR controls the split strategy; the InferenceService CR allocates the GPUs.
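The division of responsibility between the two CRs can be sketched as below. This is an illustration only: the apiVersion values are placeholders, and the `strategy` field name and the resource-request layout on the InferenceService are assumptions, not confirmed field names; only `Model.spec.hardware.gpu.sharding` is named on this page.

```yaml
# Sketch only. apiVersion is a placeholder; field names other than
# spec.hardware.gpu.sharding are assumptions.
apiVersion: example.com/v1        # placeholder
kind: Model
metadata:
  name: my-model
spec:
  hardware:
    gpu:
      sharding:
        strategy: layer           # assumed field; layer (default), tensor/row, none, pipeline
---
apiVersion: example.com/v1        # placeholder
kind: InferenceService
metadata:
  name: my-model
spec:
  resources:                      # assumed location of the GPU request
    limits:
      nvidia.com/gpu: "2"        # count must match what the Model's split expects
```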
What this page will cover
- Split strategies: layer (default), tensor / row, none, and the pipeline alias.
- Custom layer ranges via Model.spec.hardware.gpu.sharding.layerSplit.
- Allocating GPUs on the InferenceService and matching the count in the Model.
- Verifying the split actually happened by checking llama-server startup logs and the GPU memory split.
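One way to check the memory side of the last point is to compare per-GPU usage reported by `nvidia-smi --query-gpu=memory.used --format=csv,noheader,nounits` inside the pod. The helper below is a hypothetical sketch, not part of the project: with a layer split, each GPU should hold a comparable share of the total used memory, while an unsplit model leaves all weights on one GPU.

```python
# Hypothetical helper: decide whether per-GPU memory usage looks sharded.
# Input is the per-GPU memory.used values (MiB) from:
#   nvidia-smi --query-gpu=memory.used --format=csv,noheader,nounits

def is_sharded(memory_used_mib, min_share=0.2):
    """Return True if every GPU holds at least `min_share` of the total.

    A layer split spreads weights roughly evenly; with no split, one
    GPU carries (almost) everything and the check fails.
    """
    total = sum(memory_used_mib)
    if total == 0 or len(memory_used_mib) < 2:
        return False
    return all(m / total >= min_share for m in memory_used_mib)

# Example: parse nvidia-smi-style output captured from the pod.
sample = "20480\n19876\n"
values = [int(line) for line in sample.split() if line]
print(is_sharded(values))   # both GPUs hold a comparable share -> True
```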