Running CUDA workloads¶
If you want to run CUDA workloads on the K3s container, you need to customize the container. CUDA workloads require the NVIDIA Container Runtime, so containerd needs to be configured to use this runtime. The K3s container itself also needs to run with this runtime. If you are using Docker, you can install the NVIDIA Container Toolkit.
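Before customizing K3s, it can help to confirm that the host's NVIDIA driver and the NVIDIA Container Toolkit already work with plain Docker. This is a minimal sketch, assuming Docker and the toolkit are installed on the host; the CUDA image tag is just an example:

# Should print the GPU table from inside a container; nvidia-smi is injected by the NVIDIA runtime
docker run --rm --gpus all nvcr.io/nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi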
Building a customized K3s image¶
To get the NVIDIA container runtime in the K3s image, you need to build your own K3s image. The native K3s image is based on Alpine, but the NVIDIA container runtime is not supported on Alpine yet. To get around this, we need to build the image with a supported base image.
Dockerfile¶
ARG K3S_TAG="v1.28.8-k3s1"
ARG CUDA_TAG="12.4.1-base-ubuntu22.04"
FROM rancher/k3s:$K3S_TAG as k3s
FROM nvcr.io/nvidia/cuda:$CUDA_TAG
# Install the NVIDIA container toolkit
RUN apt-get update && apt-get install -y curl \
    && curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
    && curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
        sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
        tee /etc/apt/sources.list.d/nvidia-container-toolkit.list \
    && apt-get update && apt-get install -y nvidia-container-toolkit \
    && nvidia-ctk runtime configure --runtime=containerd
COPY --from=k3s / / --exclude=/bin
COPY --from=k3s /bin /bin
# Deploy the nvidia driver plugin on startup
COPY device-plugin-daemonset.yaml /var/lib/rancher/k3s/server/manifests/nvidia-device-plugin-daemonset.yaml
VOLUME /var/lib/kubelet
VOLUME /var/lib/rancher/k3s
VOLUME /var/lib/cni
VOLUME /var/log
ENV PATH="$PATH:/bin/aux"
ENTRYPOINT ["/bin/k3s"]
CMD ["agent"]
This Dockerfile is based on the K3s Dockerfile. The following changes are applied:

- Change the base images to nvidia/cuda:12.4.1-base-ubuntu22.04 so the NVIDIA Container Toolkit can be installed. The version of cuda:xx.x.x must match the one you're planning to use.
- Add a manifest for the NVIDIA driver plugin for Kubernetes with an added RuntimeClass definition. See the k3s documentation.
The NVIDIA device plugin¶
To enable NVIDIA GPU support on Kubernetes you also need to install the NVIDIA device plugin. The device plugin is a daemonset and allows you to automatically:
- Expose the number of GPUs on each node of your cluster
- Keep track of the health of your GPUs
- Run GPU enabled containers in your Kubernetes cluster.
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: nvidia
handler: nvidia
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: nvidia-device-plugin-daemonset
  namespace: kube-system
spec:
  selector:
    matchLabels:
      name: nvidia-device-plugin-ds
  updateStrategy:
    type: RollingUpdate
  template:
    metadata:
      labels:
        name: nvidia-device-plugin-ds
    spec:
      runtimeClassName: nvidia # Explicitly request the runtime
      tolerations:
        - key: nvidia.com/gpu
          operator: Exists
          effect: NoSchedule
      # Mark this pod as a critical add-on; when enabled, the critical add-on
      # scheduler reserves resources for critical add-on pods so that they can
      # be rescheduled after a failure.
      # See https://kubernetes.io/docs/tasks/administer-cluster/guaranteed-scheduling-critical-addon-pods/
      priorityClassName: "system-node-critical"
      containers:
        - image: nvcr.io/nvidia/k8s-device-plugin:v0.15.0-rc.2
          name: nvidia-device-plugin-ctr
          env:
            - name: FAIL_ON_INIT_ERROR
              value: "false"
          securityContext:
            allowPrivilegeEscalation: false
            capabilities:
              drop: ["ALL"]
          volumeMounts:
            - name: device-plugin
              mountPath: /var/lib/kubelet/device-plugins
      volumes:
        - name: device-plugin
          hostPath:
            path: /var/lib/kubelet/device-plugins
Two modifications have been made to the original NVIDIA daemonset:

- Added the RuntimeClass definition to the YAML frontmatter:

  apiVersion: node.k8s.io/v1
  kind: RuntimeClass
  metadata:
    name: nvidia
  handler: nvidia

- Added runtimeClassName: nvidia to the Pod spec.

Note: you must explicitly add runtimeClassName: nvidia to all your Pod specs to use the GPU. See the k3s documentation.
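Once the cluster (created further below) is running and the manifest has been auto-deployed, you can verify that the RuntimeClass was registered. This quick check is not part of the original guide and only uses a standard kubectl command:

# Should list the nvidia RuntimeClass with handler "nvidia"
kubectl get runtimeclass nvidia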
Build the K3s image¶
To build the custom image we need the generated K3s output, which the Dockerfile above copies from the official rancher/k3s image.

Put the following files in a directory:

- Dockerfile
- device-plugin-daemonset.yaml
- build.sh
- cuda-vector-add.yaml (used later to test the GPU setup)

The build.sh script is configured using exports and defaults to v1.28.8+k3s1. Please set at least the IMAGE_REGISTRY variable! The script builds the custom K3s image including the NVIDIA drivers and pushes it to that registry.
#!/bin/bash
set -euxo pipefail
K3S_TAG=${K3S_TAG:="v1.28.8-k3s1"} # replace + with -, if needed
CUDA_TAG=${CUDA_TAG:="12.4.1-base-ubuntu22.04"}
IMAGE_REGISTRY=${IMAGE_REGISTRY:="MY_REGISTRY"}
IMAGE_REPOSITORY=${IMAGE_REPOSITORY:="rancher/k3s"}
IMAGE_TAG="$K3S_TAG-cuda-$CUDA_TAG"
IMAGE=${IMAGE:="$IMAGE_REGISTRY/$IMAGE_REPOSITORY:$IMAGE_TAG"}
echo "IMAGE=$IMAGE"
docker build \
  --build-arg K3S_TAG=$K3S_TAG \
  --build-arg CUDA_TAG=$CUDA_TAG \
  -t $IMAGE .
docker push $IMAGE
echo "Done!"
Run and test the custom image with k3d¶
You can use the image with k3d:
k3d cluster create gputest --image=$IMAGE --gpus=1
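The test manifest cuda-vector-add.yaml is not reproduced in this section. A minimal sketch, assuming the public NVIDIA vector-add sample image, could look like this:

apiVersion: v1
kind: Pod
metadata:
  name: cuda-vector-add
spec:
  restartPolicy: OnFailure
  runtimeClassName: nvidia   # use the NVIDIA runtime
  containers:
    - name: cuda-vector-add
      image: "nvidia/samples:vectoradd-cuda11.2.1"   # sample image; adjust to your CUDA version
      resources:
        limits:
          nvidia.com/gpu: 1   # request one GPU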
Deploy a test pod:
kubectl apply -f cuda-vector-add.yaml
kubectl logs cuda-vector-add
This should output something like the following:
$ kubectl logs cuda-vector-add
[Vector addition of 50000 elements]
Copy input data from the host memory to the CUDA device
CUDA kernel launch with 196 blocks of 256 threads
Copy output data from the CUDA device to the host memory
Test PASSED
Done
If the cuda-vector-add pod is stuck in Pending state, the device plugin daemonset probably didn't get deployed correctly from the auto-deploy manifests. In that case, you can apply it manually via kubectl apply -f device-plugin-daemonset.yaml.
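A couple of quick checks, not part of the original guide, may help narrow this down; they only use standard kubectl commands:

# Is the device plugin daemonset running?
kubectl -n kube-system get daemonset nvidia-device-plugin-daemonset

# Does the node advertise a GPU? (prints the allocatable nvidia.com/gpu count)
kubectl get nodes -o jsonpath='{.items[*].status.allocatable.nvidia\.com/gpu}'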
Acknowledgements¶
Most of the information in this article was obtained from various sources: