In my previous post I demonstrated how easily we can set up Docker EE (Enterprise Edition) and all of its components, including UCP (Universal Control Plane) and DTR (Docker Trusted Registry), on a single node. I also outlined steps to deploy a sample application using the Swarm orchestrator. Taking it further, in this post I am going to walk you through deploying GPU (Graphics Processing Unit) workloads on Docker EE using Kubernetes (K8s) as the orchestrator. GPU support in K8s is still experimental, while GPU support for Swarm is still being worked on. But before we get into the details, let me start by providing a quick perspective on GPUs.
If you are a geek, you have most likely heard of GPUs. They have been around for a while. The GPU, as you might know, was designed to perform the computations needed for 3D graphics (for instance, interactive video games), an area where CPUs fell short. That's how a computer's motherboard came to have two separate chips – one for the CPU and the other for the GPU. Technically, a GPU is a large array of small processors performing highly parallelized computation.
Cool! But why are modern cloud computing platforms (AWS, Azure, GCE) rushing to augment their compute services with GPUs? The cloud is typically used for backend processing and has nothing to do with traditional displays. So what's the rush about? The rush is to allow running computational workloads that have found a sweet spot with GPUs – these include AI (Artificial Intelligence), Machine Learning, Deep Learning, and HPC (High Performance Computing), among others. This application of GPUs to non-display use cases is popularly referred to as GPGPU – General-Purpose computing on GPUs. Do note, though, that while it's still possible to run these workloads with traditional CPUs, GPUs can reduce their processing time from hours to minutes. Nvidia is the leading manufacturer of GPUs, followed by AMD (Radeon). At the same time, cloud providers are coming up with their own native chips to cater to AI workloads.
So how can developers write programs that leverage GPUs? Well, a lot of that depends on the manufacturer and the tools built around its hardware. For this post let us focus on Nvidia and CUDA (Compute Unified Device Architecture). CUDA is a parallel computing platform developed by Nvidia that allows developers to program CUDA-enabled GPUs (whose processing elements are commonly referred to as CUDA cores). The CUDA platform was designed to work with programming languages such as C and C++. As demand for deep learning workloads grew, Nvidia extended the platform with the CUDA Deep Neural Network library (cuDNN), a GPU-accelerated library of primitives for deep neural networks. cuDNN is used by popular deep learning frameworks, including TensorFlow and MXNet, to achieve GPU acceleration.
By this time you should have gathered that to run GPU workloads you need the manufacturer's GPU driver / runtime, libraries, and frameworks along with your project binaries. You can deploy all of these components directly on your machines or, better, leverage containerization and schedule / scale them using Docker and K8s. As noted earlier, K8s support for GPUs is experimental. K8s implements Device Plugins to let Pods (the scheduling unit of K8s) access specialized hardware features such as GPUs. Device Plugins are available for both AMD and Nvidia, but for this post I will continue to stick with Nvidia. Furthermore, for K8s you have two Nvidia device plugin implementations to choose from – the official one from Nvidia and another customized for Google Cloud. I will go with the former and use an AWS Ubuntu 16.04 GPU-based EC2 instance to deploy the Nvidia driver, the Nvidia Docker runtime, and K8s, and then deploy a CUDA sample.
Let's start by spinning up a p2.xlarge Ubuntu-based Deep Learning AMI on AWS. This AMI comes with Nvidia CUDA and cuDNN preinstalled.
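If you prefer the AWS CLI over the console, launching the instance looks roughly like this. Note the AMI ID, key pair name, and security group below are placeholders – look up the current Deep Learning AMI ID for your region before running it:

```shell
# Launch a p2.xlarge instance from the Deep Learning AMI.
# ami-0123456789abcdef0, my-key-pair, and sg-0123456789abcdef0 are placeholders;
# substitute the values for your own account and region.
aws ec2 run-instances \
  --image-id ami-0123456789abcdef0 \
  --instance-type p2.xlarge \
  --key-name my-key-pair \
  --security-group-ids sg-0123456789abcdef0 \
  --count 1
```

This requires the AWS CLI configured with credentials that can launch EC2 instances; the console wizard achieves the same result.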
After logging in, you can execute the 'nvidia-smi' command to get details of the driver, CUDA version, GPU type, running processes, and more. The output should be similar to the one below.
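For reference, the command itself takes no arguments for the summary view, and also supports a query mode if you want specific fields in machine-readable form:

```shell
# Print the summary table: driver version, CUDA version, GPU model,
# memory usage, utilization, and running processes
nvidia-smi

# Query specific fields as CSV instead of the full table
nvidia-smi --query-gpu=name,driver_version,memory.total --format=csv
```

Both forms require the Nvidia driver to be installed, which the Deep Learning AMI takes care of.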
The first thing you need to do with this AMI is install K8s. There are various options for installing K8s, but for this post we will be using Docker EE. You can use the same commands to install Docker EE on this node that I outlined in my earlier post. Docker EE is bundled with upstream K8s (read here to get more details about Docker Kubernetes Service), hence that one-node setup should suffice for our exercise. After installing Docker EE, you can follow these quick start instructions to install the Nvidia Docker container runtime. As part of those instructions, don't forget to enable the K8s Device Plugin daemonset for Nvidia by executing the command below (if you don't have kubectl installed, you can follow the steps outlined here).
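The daemonset is deployed with a single kubectl create against the manifest in Nvidia's k8s-device-plugin repository. The release tag shown here (v1.12) is an example – pick the release that matches your cluster version:

```shell
# Deploy the Nvidia device plugin as a daemonset
# (v1.12 is an example tag; check the NVIDIA/k8s-device-plugin releases
# for the version matching your K8s cluster)
kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v1.12/nvidia-device-plugin.yml

# Verify the plugin pod is running and the node now advertises nvidia.com/gpu
kubectl get pods -n kube-system | grep nvidia
kubectl describe node | grep -i "nvidia.com/gpu"
```

Once the node reports a non-zero nvidia.com/gpu capacity, the scheduler can place GPU-requesting pods on it.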
That's it! You are all set to deploy your first GPU workload to K8s. To create the Pod you can use the UCP UI or the kubectl CLI. Here's a sample YAML file requesting 1 GPU to perform vector addition of 50,000 elements:
apiVersion: v1
kind: Pod
metadata:
  name: cuda-vector-add
spec:
  restartPolicy: OnFailure
  containers:
    - name: cuda-vector-add
      # https://github.com/kubernetes/kubernetes/blob/v1.7.11/test/images/nvidia-cuda/Dockerfile
      image: "k8s.gcr.io/cuda-vector-add:v0.1"
      resources:
        limits:
          nvidia.com/gpu: 1 # requesting 1 GPU
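Assuming the spec above is saved as cuda-vector-add.yaml (the filename is my choice, not mandated), the pod can be created and watched from the CLI:

```shell
# Create the pod from the spec above
kubectl create -f cuda-vector-add.yaml

# Watch its status move from Pending through Running to Completed
kubectl get pod cuda-vector-add
```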
The pod should be scheduled on our single-node cluster and run to completion. You can retrieve the logs via the UCP UI or the CLI as shown below. The logs should show the job completing with a success status.
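From the CLI, the logs are a one-liner; a successful run of this vector-add sample should end with a "Test PASSED" line:

```shell
# Fetch the completed pod's output; a successful run ends with "Test PASSED"
kubectl logs cuda-vector-add
```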
I will extend our basic setup in future blog posts to demonstrate workloads with popular deep learning libraries. Meanwhile, do let me know if you have any questions or comments on this deployment workflow.
Until then, happy learning 🙂