GPU autoscaling on Kubernetes with KEDA: Building an external scaler
KEDA cannot scale on GPU metrics natively: it is compiled with CGO_ENABLED=0, and the NVML bindings in go-nvml require cgo, so NVML is inaccessible from inside KEDA. A custom external scaler, deployed as a DaemonSet so that one instance runs on each GPU node, reads local hardware metrics via go-nvml and exposes them to KEDA over gRPC. KEDA can then drive HPA decisions from GPU utilization, memory usage, temperature, or power draw. Pre-built profiles cover common workloads: vLLM inference scales on memory usage and supports scale-to-zero, Triton scales on utilization, and training jobs scale on utilization with scale-down disabled.
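To make the profile behavior concrete, here is a minimal sketch in Go of how the three workload profiles might be represented and applied. It is not the scaler's actual code: the `Profile` type, the profile names, the target percentages, and the `DesiredReplicas` helper are all hypothetical, and the sketch uses only the standard library (no go-nvml or gRPC). The replica calculation follows the standard HPA ratio, ceil(currentReplicas × currentValue / targetValue), with the profile's scale-to-zero and scale-down rules applied afterwards.

```go
package main

import (
	"fmt"
	"math"
)

// Metric names the GPU signal a profile scales on.
type Metric string

const (
	Utilization Metric = "utilization" // GPU busy percentage
	Memory      Metric = "memory"      // GPU memory used, as a percentage
)

// Profile is a hypothetical shape for a pre-built workload profile.
type Profile struct {
	Metric         Metric
	TargetPercent  float64 // illustrative threshold, not taken from the source
	ScaleToZero    bool    // replicas may drop to 0 when the metric is idle
	AllowScaleDown bool    // false keeps the current replica count as a floor
}

// Profiles mirrors the three workloads named above.
// The keys and target values are assumptions made for this sketch.
var Profiles = map[string]Profile{
	"vllm":     {Metric: Memory, TargetPercent: 80, ScaleToZero: true, AllowScaleDown: true},
	"triton":   {Metric: Utilization, TargetPercent: 70, AllowScaleDown: true},
	"training": {Metric: Utilization, TargetPercent: 90},
}

// DesiredReplicas applies the HPA ratio ceil(current * value / target),
// then enforces the profile's scale-down and scale-to-zero rules.
func DesiredReplicas(p Profile, current int, valuePercent float64) int {
	desired := int(math.Ceil(float64(current) * valuePercent / p.TargetPercent))

	// Training-style profiles never shrink below the current count.
	if !p.AllowScaleDown && desired < current {
		desired = current
	}

	// Only scale-to-zero profiles may reach 0 replicas.
	floor := 1
	if p.ScaleToZero {
		floor = 0
	}
	if desired < floor {
		desired = floor
	}
	return desired
}

func main() {
	fmt.Println("vllm, 2 replicas, 95% memory:", DesiredReplicas(Profiles["vllm"], 2, 95))
	fmt.Println("vllm, 2 replicas, idle:", DesiredReplicas(Profiles["vllm"], 2, 0))
	fmt.Println("triton, 4 replicas, 35% util:", DesiredReplicas(Profiles["triton"], 4, 35))
	fmt.Println("training, 4 replicas, 45% util:", DesiredReplicas(Profiles["training"], 4, 45))
}
```

In a real deployment the split of responsibilities differs from this single function: KEDA handles activation (the transition between zero and one replica) and the HPA handles scaling between one and N, so the sketch only illustrates how the profile flags would shape the outcome.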