Slice, Share, Serve: Exploring CERN's GPU buffet

Diana Gaponcic

09:15 - 10:00

Pavillon

GPUs and accelerators are enabling High Energy Physics (HEP) to keep pace with growing data volumes and computational complexity. That said, the hardware's high cost and increasing demand make it crucial to maximize its use. Achieving optimal utilization means sharing GPUs between workloads, which adds complexity. In this talk, we will discuss how to use GPUs on Kubernetes and the challenges involved, and we will explore an exciting new way of allocating and sharing GPUs: Dynamic Resource Allocation (DRA). We go over the main GPU sharing options, namely time-slicing, Multi-Process Service (MPS), and Multi-Instance GPU (MIG), and analyse their trade-offs through extensive benchmarking. Finally, we describe how managing GPUs centrally improves resource utilization across interactive and batch workloads while reducing costs in the long run.