When I joined Docker, my hiring manager mentioned that everything Docker does is fairly simple except Networking. While most would agree with that statement, it’s even more true from a developer’s standpoint. If you search for any Docker networking article, it invariably gets into the details of network namespaces, netns commands, ip links, iptables, IPVS, veths, MAC addresses, and so on. While those underpinnings are important, what most of these articles fail to do is paint a simple picture for devs who want to understand the overall traffic flow and how communication between containers works. That’s a common pain point I have heard from most of my customers. Hence, I thought I’d try to simplify this convoluted piece. Hope you find my attempt useful.
Let’s start by talking about networking in general, whose job is to make computers (VMs and devices) talk to each other. Networking primarily relies on two components: the bridge (or switch) and the router. Interestingly, most of us are already using these components at home to connect all our devices to the internet. If you need a refresher, check out this link on home networking.
Docker containers are no different from those devices, and to connect containers we need a similar setup. There are two parts to establishing connectivity – connecting containers running on a single VM, and connecting containers spread across VMs (nodes). The first one is pretty simple and, I guess, well understood. Docker uses a virtual bridge to connect all the containers running on a single VM. This bridge is called ‘docker0’, and all the containers running on a standalone Docker instance are connected to it. One end of the bridge is connected to the host ethernet, which allows those containers to reach external services. docker0 is also assigned a default IP CIDR range of 172.17.0.0/16, out of which individual containers get their own IP addresses.
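You can see this for yourself on any standalone Docker host. A quick sketch (assuming the Docker engine is running locally; output details will vary by host):

```shell
# List the built-in networks; the one named 'bridge' is backed by docker0.
docker network ls

# Inspect the bridge network -- the IPAM config shows the
# default 172.17.0.0/16 subnet that container IPs are carved from.
docker network inspect bridge --format '{{json .IPAM.Config}}'

# On the host itself, the bridge appears as a regular network interface.
ip addr show docker0
```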
Now, what about containers running across nodes, which is typical with Swarm – running Docker containers at scale? How do we connect all these containers? While the VMs themselves are connected, they have no clue about the containers or the IPs assigned to those containers. It would be very tedious to make physical network changes to enable this communication. That’s where Docker leverages software-defined networking, using VXLAN-based overlay networks (note that the VXLAN overlay is one of several network drivers supported by Docker; for a more comprehensive description of Docker networking, refer to this article). When you initialize Swarm on a Docker node (“docker swarm init”), it creates two networks by default – docker_gwbridge & ingress (we will get to ingress at the end).
So let’s take an example – say you have UI and DB containers, and you want the UI container to communicate with the DB. In this case the UI & DB containers are running on different nodes. To make this communication possible, you create a user-defined overlay network with “docker network create --driver overlay appnet” and then create containers attached to that network with “docker service create --network appnet nginx”. Yes, it’s that simple. As you can see below, the overlay network gets a CIDR range assigned on creation, and attached containers are assigned IPs from that range (you can customize the CIDR pool for all overlay networks at Swarm initialization with “docker swarm init --default-addr-pool <ip-range>”).
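Putting the steps above together, a minimal session might look like this (run on a Swarm manager node; the service names and images are just examples for the UI/DB scenario):

```shell
# Create a user-defined overlay network for the app.
docker network create --driver overlay appnet

# Create the two services attached to that network.
docker service create --name db --network appnet postgres
docker service create --name ui --network appnet nginx

# Inspect the overlay to see the CIDR range it was assigned;
# container IPs on this network come from this pool.
docker network inspect appnet --format '{{json .IPAM.Config}}'
```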
So is that all? Mostly, but there is one more thing. How do containers on these overlay networks reach the external ecosystem? What if the database we just mentioned is not running in the Swarm cluster but outside of it? On standalone Docker nodes this was handled by connecting all containers to the docker0 bridge. Following a similar pattern in Swarm mode, we use a gateway bridge (docker_gwbridge), which is created on every node when it joins the Swarm cluster. Each container on an overlay network has a gateway endpoint attached to this bridge for external access. The reason the gateway bridge is connected to individual containers, and not to the overlay as a whole, is that containers can be on any node; a single gateway for the entire overlay would be hard to place, and you also don’t want to funnel all traffic through one interface.
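You can verify this on any node that has joined the Swarm – a quick sketch (assumes at least one service task is running on the node):

```shell
# docker_gwbridge exists on every Swarm node as a local bridge network.
docker network ls --filter name=docker_gwbridge

# Each running container on the node has an extra endpoint here,
# used for its outbound (external) traffic.
docker network inspect docker_gwbridge --format '{{json .Containers}}'
```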
That brings us to the last component: ‘ingress’. Ingress (also called the routing mesh) is just another overlay network. Its job is to cater to requests coming from outside the cluster. Hence this overlay is different, in the sense that it spans all the nodes in the cluster, whereas the app-specific overlays are confined to the nodes where the app’s containers run. So how does ingress know where an incoming request should be routed, i.e. to which container? Ingress uses ports – you publish your app on a specific port, and traffic to that port is routed to the underlying app regardless of which node the request lands on. Now, a couple of important points about how this magic happens. The ingress overlay has a gateway endpoint configured on every node’s docker_gwbridge network, which allows it to receive incoming requests from any node. In addition, all containers that expose a published port are also attached to the ingress overlay and hence have dual IPs. This attachment of containers to both the ingress and app overlay networks is necessary; without it there would be no way for ingress to route requests to those containers, as overlays are isolated from any external traffic. The gateway endpoints for containers are still connected to docker_gwbridge, as explained earlier.
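To see the routing mesh in action, you can publish a port and hit it from any node – a sketch (the service name, port numbers, and node IP are illustrative):

```shell
# Publish port 8080 on the routing mesh; requests to port 8080 on
# ANY node in the cluster are routed to one of the ui replicas,
# even nodes that run no replica themselves.
docker service create --name ui --replicas 2 \
  --publish published=8080,target=80 nginx

# Curl any node's IP -- the mesh forwards the request to a replica.
curl http://<any-node-ip>:8080
```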
So there you go. I have taken some liberty to abstract things away, but I would be very happy if all my customers had this level of understanding when I start working with them. You can easily extend this setup by adding an L7 load balancer and reverse proxy like Traefik, Interlock, etc. That would allow you to do all app routing through the reverse proxy, and the reverse proxy would be the only component exposed through the ingress overlay. The Docker engine has DNS capability too, where name resolution works for services on the same overlay network (try pinging the DB container from the UI container without an IP address, using the service name of the DB swarm service), and there is built-in support for load balancing across service replicas via virtual IPs (you can look up the VIP using docker service inspect <service-name>).
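Continuing the UI/DB example, the DNS and VIP behavior can be checked like this (assumes both services are attached to the same ‘appnet’ overlay; the container ID is a placeholder):

```shell
# From inside a ui container, resolve the db service by name.
# The name resolves to the service's virtual IP, not a task IP.
docker exec -it <ui-container-id> ping -c 1 db

# Look up the VIP that load-balances across the db replicas.
docker service inspect db --format '{{json .Endpoint.VirtualIPs}}'
```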
That’s all for this post. Did I demystify the Docker networking magic for you? Is there anything else I could simplify further? Please let me know in the comments.