Deploying GPU Workloads with Docker EE & Kubernetes

In my previous post I demonstrated how easily you can set up Docker EE (Enterprise Edition) and all of its components, including UCP (Universal Control Plane) and DTR (Docker Trusted Registry), on a single node. I also outlined the steps to deploy a sample application using the Swarm orchestrator. Taking it further, in this post I will walk you through deploying GPU (Graphics Processing Unit) workloads on Docker EE using Kubernetes (K8s) as the orchestrator. GPU support in K8s is still experimental, while GPU support for Swarm is still being worked on. But before we get into the details, let me start with a quick perspective on GPUs.

If you are a geek, you have most likely heard of GPUs. They have been around for a while. The GPU, as you might know, was designed to perform the computations needed for 3D graphics (for instance, interactive video games), an area where CPUs fell short. That’s how a computer’s motherboard ended up with two separate chips – one for the CPU and another for the GPU. Technically, a GPU is a large array of small processing cores performing highly parallelized computation.

Cool! But why are modern cloud computing platforms rushing to augment their compute services with GPUs (AWS, Azure, GCE)? The cloud is typically used for backend processing and has nothing to do with traditional displays. So what’s the rush about? The rush is to allow running computational workloads that have found a sweet spot with GPUs – these include AI (Artificial Intelligence), Machine Learning, Deep Learning, and HPC (High Performance Computing), among others. This use of GPUs for non-display workloads is popularly referred to as GPGPU – General Purpose computing on GPUs. Do note, though, that while it’s still possible to run these workloads on traditional CPUs, with GPUs the processing time is often reduced from hours to minutes. Nvidia is the leading manufacturer of GPUs, followed by AMD with its Radeon line. At the same time, cloud providers are coming up with their own native chips to cater to AI workloads.

So how can developers write programs to leverage GPUs? Well, a lot of that depends on the manufacturer and the tools they have built around their hardware. For this post let us focus on Nvidia and CUDA (Compute Unified Device Architecture). CUDA is a parallel computing platform developed by Nvidia that lets developers use a CUDA-enabled GPU (whose processing units are commonly referred to as CUDA cores). The CUDA platform was designed to work with programming languages such as C and C++. As demand grew for deep learning workloads, Nvidia extended the platform with the CUDA Deep Neural Network library (cuDNN), which provides GPU-accelerated primitives for deep neural networks. cuDNN is used by popular deep learning frameworks, including TensorFlow and MXNet, to achieve GPU acceleration.

By now the picture should be clear – to run GPU workloads you need the manufacturer’s GPU driver / runtime, libraries and frameworks, along with your project binaries. You can deploy all of these components directly on your machines or, better, leverage containerization and schedule / scale them using Docker and K8s. In K8s, support for GPUs is in experimental mode. K8s implements Device Plugins to let Pods (the scheduling unit of K8s) access specialized hardware features such as GPUs. Device Plugins are available for AMD and Nvidia, but for this post I will continue to stick with Nvidia. Further, for K8s you have two Nvidia device plugin implementations to choose from – one is the official plugin from Nvidia and the other is customized for Google Cloud. I will go with the former and use an AWS Ubuntu 16.04 GPU-based EC2 instance to deploy the Nvidia driver, the Nvidia Docker runtime and K8s, and then deploy a CUDA sample.

Let’s start by spinning up a p2.xlarge Ubuntu-based Deep Learning AMI on AWS. This AMI has Nvidia CUDA and cuDNN preinstalled.

After logging in, you can execute the ‘nvidia-smi’ command to get details of the driver, CUDA version, GPU type, running processes and more. The output should be similar to the one below.
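For a quick sanity check from the shell, the two commands below confirm the driver and GPU are visible (I am not reproducing the full output here since it varies with the driver version and instance type):

# Show driver version, CUDA version, GPU utilization, memory usage and running processes
nvidia-smi

# List the GPUs detected on this instance; a p2.xlarge should show a single Tesla K80
nvidia-smi -L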

The first thing you need to do with this AMI is install K8s. There are various options to install K8s, but for this post we will be using Docker EE. You can use the same commands to install Docker EE on this node that I outlined in my earlier post. Docker EE is bundled with upstream K8s (read here to get more details about Docker Kubernetes Service), hence that one-node setup should suffice for our exercise. After installing Docker EE, you can follow these quick start instructions to install the Nvidia Docker container runtime. As part of those instructions, don’t forget to enable the K8s device plugin DaemonSet for Nvidia by executing the command below (if you don’t have kubectl installed you can follow the steps outlined here).

kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/1.0.0-beta/nvidia-device-plugin.yml
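Once the device plugin is up, you can verify that the node is advertising its GPU to the scheduler (assuming kubectl is already pointed at this cluster):

# The node's capacity / allocatable resources should now list nvidia.com/gpu
kubectl describe nodes | grep -i nvidia.com/gpu

# The device plugin runs as a DaemonSet pod in the kube-system namespace
kubectl get pods -n kube-system | grep nvidia-device-plugin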

That’s it! You are all set to deploy your first GPU workload to K8s. To create the Pod you can use the UCP UI or the kubectl CLI. Here’s a sample YAML file requesting 1 GPU to perform vector addition of 50000 elements:

apiVersion: v1
kind: Pod
metadata:
  name: cuda-vector-add
spec:
  restartPolicy: OnFailure
  containers:
    - name: cuda-vector-add
      # https://github.com/kubernetes/kubernetes/blob/v1.7.11/test/images/nvidia-cuda/Dockerfile
      image: "k8s.gcr.io/cuda-vector-add:v0.1"
      resources:
        limits:
          nvidia.com/gpu: 1 # requesting 1 GPU

The pod should get scheduled on our single-node cluster and run to completion. You can retrieve the logs via the UCP UI or the CLI as shown below. The logs should show the job completing with a success status.
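With kubectl the flow looks roughly like this (cuda-vector-add.yaml is simply the file I saved the above spec into):

# Create the pod from the spec above
kubectl create -f cuda-vector-add.yaml

# Watch it move from ContainerCreating to Completed
kubectl get pod cuda-vector-add

# Retrieve the logs; the vector addition sample reports Test PASSED on success
kubectl logs cuda-vector-add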

I will extend our basic setup in future blog posts to demonstrate workloads with popular deep learning libraries. Meanwhile, do let me know if you have any questions or comments on this deployment workflow.

Until then, happy learning 🙂

Working with Docker APIs

Docker’s popularity is due to its simplicity. Most developers can cover quite a bit of ground with just the build, push and run commands. But there are times when you need to look beyond the CLI. For instance, I was working with a customer who was trying to hook up their monitoring tool to Docker Engine to enhance the productivity of their ops team. Such a hook typically pulls status and stats of Docker components and reports / alerts on them. The way a monitoring tool achieves this is by tapping into the APIs provided by Docker Engine. Yes, Docker Engine does provide REST HTTP APIs which can be invoked by any language / runtime with HTTP support (that’s how the Docker client talks to Docker Engine). Docker even has SDKs for Go and Python. For this post I will focus on how you can access these APIs over HTTP for both Linux and Windows hosts. Finally, we will look at how to access the APIs for Docker Enterprise deployments.

Accessing APIs with Linux Hosts

For Linux, the Docker daemon listens by default on a Unix socket – /var/run/docker.sock. So, using our good old friend curl, here’s the command that you can execute locally to get the list of all running containers (the equivalent of docker ps):

curl --unix-socket /var/run/docker.sock -H "Content-Type: application/json" -X GET http://localhost/containers/json

But how about accessing the API remotely? Simple – you need to expose a Docker endpoint beyond the socket using the -H flag for the Docker daemon. This link shows you how to configure it. With that configuration in place you can fire curl again, but this time against the host’s IP rather than the socket.
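For reference, the daemon-side change boils down to starting dockerd with an additional -H listener. A minimal sketch, run manually here for clarity (on a systemd host you would normally put these flags into the service unit or daemon.json instead):

# Keep the local Unix socket and additionally listen on TCP (unencrypted; port 2376 to match the call below)
sudo dockerd -H unix:///var/run/docker.sock -H tcp://0.0.0.0:2376

With the daemon listening on TCP, the remote curl call below should work.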

curl -k  -H "Content-Type: application/json" -X GET http://172.31.6.154:2376/containers/json

So can just anyone access these APIs remotely? Is there an option to restrict or secure access to the APIs? The answer is yes. Docker supports certificate-based authentication wherein only authorized clients (possessing a cert obtained from an authorized CA) can communicate with the Docker Engine. This article covers the required steps in detail to configure the certs. Once the configuration is done you can use the curl command below to access the Docker APIs.
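On the daemon side, that setup typically amounts to starting dockerd with the TLS verification flags – a minimal sketch, with illustrative cert paths:

# Require clients to present a certificate signed by our CA
sudo dockerd --tlsverify --tlscacert=/etc/docker/certs/ca.pem --tlscert=/etc/docker/certs/server-cert.pem --tlskey=/etc/docker/certs/server-key.pem -H tcp://0.0.0.0:2376

The client then presents its own cert and key, as in the curl call below (pointed at whichever host and port your daemon is listening on).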

curl -k --cert cert.pem --cacert ca.pem --key key.pem https://3.14.248.232/containers/json

Accessing APIs with Windows Hosts

Unlike Linux, which uses sockets, Windows uses named pipes. Once you install Docker and run the pipelist command you should see ‘docker_engine’ in the pipe list. You can access this pipe using custom C# code or use the Docker.DotNet NuGet package. Below is sample code using the Docker.DotNet library to retrieve the list of containers via the ‘docker_engine’ pipe.

using Docker.DotNet;
using Docker.DotNet.Models;

// Connect to the local engine over the 'docker_engine' named pipe
DockerClient client = new DockerClientConfiguration(new Uri("npipe://./pipe/docker_engine")).CreateClient();

// Equivalent of 'docker ps' – lists the running containers
var containerList = client.Containers.ListContainersAsync(new ContainersListParameters());

foreach (var container in containerList.Result)
{...}

In addition, similar to Linux, you can use the -H option to allow TCP connections and also configure certs to secure the connection. You can do this via the config file, environment variables or just the command line, as shown below. You can refer to this article for securing Docker Engine with TLS on Windows. Once configured, you can use Docker.DotNet to invoke the Docker APIs.

dockerd -H npipe:// -H 0.0.0.0:2376 --tlsverify --tlscacert=C:\ProgramData\docker\daemoncerts\ca.pem --tlscert=C:\ProgramData\docker\daemoncerts\cert.pem --tlskey=C:\ProgramData\docker\daemoncerts\key.pem --register-service

Accessing APIs with Docker Enterprise Edition

So far we have invoked APIs against a standalone Community Edition engine, but Docker has other offerings, including Docker Enterprise Edition (EE). If you are not familiar with it, check out my post on creating a single-node test environment. Universal Control Plane (UCP), part of Docker EE, provides a set of APIs you can interact with. Also, unlike standalone engines, Docker UCP is secure by default. So to connect to UCP you need to either pass username / password credentials or use the certs from the client bundle, both of which are shown below. With credentials you retrieve an AUTHTOKEN and pass that token with subsequent requests to gain access.

#Accessing UCP APIs via Creds
AUTHTOKEN=$(curl --insecure -s -X POST -d "{ \"username\":\"admin\",\"password\":\"Password123\" }" "https://3.14.248.232/auth/login" | awk -F ':' '{print    $2}' | tr -d '"{}')

curl -k -H  "content-type: application/json" -H "Authorization: Bearer $AUTHTOKEN" -X GET https://3.14.248.232/containers/json 

#Accessing UCP APIs via Client Bundle
curl -k --cert cert.pem --cacert ca.pem --key key.pem https://3.14.248.232/images/json 

Hope this helps in extending your automation and monitoring tools to Docker!

What is DevOps?

Though the notion of DevOps has been around for years, I still see folks struggling to articulate what DevOps really is. Recently, in our user group meeting, participants hijacked the session for 30 minutes debating what DevOps stood for without really arriving at a conclusion. Similarly, I have seen architects, sales teams and developers struggle to explain in simple terms to their management what DevOps is, why it is important and what business value it provides. In this post, I will share my thought process, and I would like to hear from you if you have a simpler way of communicating the intent of DevOps.

Traditionally, software development has been a complex process involving multiple teams. You have developers, testers, project managers, operations, architects, UX designers, business analysts and many others collaborating to create value for the business. The collaboration among these teams requires handshakes (think of it as a multi-party supply chain) which often cause friction and lead to non-value-adding work. For instance, the handshake where the UX designer develops the UI and then developers add code, or the handshake where the analyst captures business requirements and the development team writes code to meet those requirements, or the traditional handshake between developers and testers for code validation, and so on. One critical handshake is between developers and operations, where developers typically toss software over to operations to deploy it in upstream environments (outside the developer’s workstation). Unfortunately, members of these two teams have been ignoring each other’s concerns for decades. Developers assume the code will just run (after all, it ran on their machine), but in reality it rarely does. And that’s where DevOps comes to the rescue.

Considering the above, it would be safe to summarize that DevOps is any tool, technology or process that reduces friction between Developers and Operations (thereby creating more value for the business, e.g. faster time to market, higher uptime, lower IT costs and so on). This could be app containerization bundling all dependencies into a single image, or setting up Continuous Integration and Continuous Delivery pipelines to allow for robust, consistent deployments, or adopting a microservices architecture to pave the way for loosely coupled deployments, or infrastructure as code to allow for reliable, version-controlled infrastructure setup, or AI-enabled app monitoring tools to proactively mitigate app issues, or even reorganizing IT teams and driving cultural change within the IT organization. But the objective is the same – reduce friction between Developers and Operations. Once you get these basics right, it’s easy to devise a strategy and approach.

So does this resonate with you? Or do you have a simpler way to explain DevOps? Looking forward to your comments.

NuGet Package Restore, Content Folder and Version Control

I was recently explaining this nuance to a dev on my team, and he suggested I should capture it in a blog post. So here we go. First, some NuGet background.

NuGet is the de facto standard for managing dependencies in the .NET world. Imagine you have some reusable code – rather than sharing that functionality with your team as a plain DLL, you can create a NuGet package for it. What’s the benefit, you may ask?

1) Firstly, NuGet can do a lot more than add a DLL to your project references. Your NuGet package can add configuration entries as part of the installation, execute scripts or create new folders / files within the Visual Studio project structure, which greatly simplifies the adoption of your reusable code.

2) Secondly, as a package owner you can declare dependencies on other packages or DLLs. So when a team member installs your package, she gets all the required dependencies in one go.

3) Finally, the NuGet package is local to your project; the assemblies are not installed in your system’s GAC. This not only helps keep development clean, it also helps at build time. Packages don’t have to be checked into version control – at build time you can restore them on your build server, so no more shared lib folders.

It’s quite a simple process to create NuGet packages. Download the NuGet command line utility, organize the artifacts (DLLs, scripts, source code templates, etc.) you want to include into their respective folders, create the package metadata (nuget spec), and pack them (nuget pack) to get your nupkg file. You can then install / restore the package through Visual Studio or through the command line (nuget install / restore).

Typically NuGet recommends 4 folders to organize your artifacts – ‘lib’ contains your binaries; ‘content’ contains the folder structure and files that will be added to your project root; ‘tools’ contains scripts, e.g. init.ps1 and install.ps1; and ‘build’ contains custom build targets / props.
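Putting that together, the authoring workflow looks roughly like this (a sketch – the package id MyCompany.Utils and the solution name are purely illustrative):

# Organize artifacts into the conventional folders
mkdir lib content tools build

# Generate the .nuspec metadata file, then edit the id, version, dependencies, etc.
nuget spec MyCompany.Utils

# Pack everything referenced by the .nuspec into a .nupkg file
nuget pack MyCompany.Utils.nuspec

# On a consumer's machine or build server, restore brings the packages back
nuget restore MySolution.sln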

Now let’s get to the crux of this post – the restore aspect and what you should check into your version control. When you add a NuGet package to your project, NuGet does two things – it creates a packages.config file and a packages folder. The config file keeps a list of all the added packages, and the packages folder contains the actual packages (it’s basically an unzip of your nupkg file). The recommended approach is to check in your packages.config file but not the packages folder. As part of NuGet restore, NuGet brings back all the packages in the packages folder (see the workflow image below).

[Image: NuGet package restore workflow]

The subtle catch is that NuGet restore doesn’t re-add content files to your project or re-apply the transformations that are part of the package. Those changes are applied the first time you install the NuGet package, and they should be checked into version control. This also means you shouldn’t put any DLLs inside the content folder (they should go into the lib folder anyway). If you must, you will have to check even those DLLs into your version control.


In summary, NuGet restore just restores the package files; it doesn’t perform any tokenization, transformation or script execution. Those activities are performed at package installation time, and the corresponding changes must be checked into version control.