Kubernetes From Scratch Part 2 – Networking

Konrad Rotkiewicz
12 February 2018 · 13 min read

In our last blog post, we created a pseudo-cluster to show how Kubernetes works on the inside. Today we are going to add a second node and make sure the cluster utilizes it.

SECOND NODE

What about having more than one node? What if we would like to schedule pods on two nodes? It is as simple as running kubelet on another node and making sure it connects to our API Server.

First of all, we assume that we have the first node from our previous blog post, with internal IP 10.135.53.41, running the API Server, etcd, and the nginx deployment.

Now let’s create a second one, SSH into it, and run kubelet there.

$ doctl compute droplet create node2 --region fra1 --size 2gb --image ubuntu-16-04-x64 --enable-private-networking --ssh-keys 79:29:54:77:13:2f:9c:b8:06:3e:8b:fe:8d:c0:d7:ba
ID Name Public IPv4 Private IPv4 Public IPv6 Memory VCPUs Disk Region Image Status Tags
63460608 node2 46.101.98.124 2048 2 40 fra1 Ubuntu 16.04.3 x64 new
$ ssh root@46.101.98.124
root@node2 $ apt-get update && apt-get install -y docker.io
root@node2 $ wget -q --show-progress https://dl.k8s.io/v1.7.6/kubernetes-server-linux-amd64.tar.gz
root@node2 $ tar xzf kubernetes-server-linux-amd64.tar.gz
root@node2 $ mv kubernetes/server/bin/* /usr/local/bin/
root@node2 $ rm -rf *
root@node2 $ export MASTER_IP=10.135.53.41
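# kubelet registers this machine as a new node and starts managing pods on it;
# --api-servers points it at the insecure API server port on the first node: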
root@node2 $ kubelet --api-servers=$MASTER_IP:8080 &> /tmp/kubelet.log &

Now, back on the first node, we can check whether the new node has been recognized:

root@node $ kubectl get nodes
NAME STATUS AGE VERSION
node Ready 2h v1.7.6
node2 Ready 2m v1.7.6

This is how our nodes look now:

[Diagram: the cluster with two nodes, both registered with the API Server]

Next, let’s scale up our nginx deployment:

root@node $ kubectl scale deploy nginx --replicas=6
root@node $ kubectl get pods -o=wide
NAME READY STATUS RESTARTS AGE IP NODE
nginx 1/1 Running 0 1h 172.17.0.3 node
nginx-31893996-3dnx7 1/1 Running 0 1h 172.17.0.5 node
nginx-31893996-5d1ts 1/1 Running 0 1h 172.17.0.6 node
nginx-31893996-5xnhc 1/1 Running 0 17s 172.17.0.2 node2
nginx-31893996-9k93w 1/1 Running 0 1h 172.17.0.4 node
nginx-31893996-lfrzl 1/1 Running 0 17s 172.17.0.4 node2
nginx-31893996-q99cp 1/1 Running 0 17s 172.17.0.3 node2
nginx2 1/1 Running 0 1h 172.17.0.2 node

We can see that they are scheduled on both nodes.

Wait, can you see that the pods have duplicated IP addresses? This is because we have no way of managing pod IP addresses across the nodes; it also means that pods located on different nodes cannot communicate with each other.
To fix that we have to introduce another cluster component: a network fabric.
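The root cause is easy to see on the nodes themselves: with its stock configuration, Docker on every node hands out container addresses from the same default bridge subnet (172.17.0.0/16), so the two nodes inevitably reuse the same IPs. A quick check (output abbreviated, assuming Docker’s defaults):

root@node $ ip addr show docker0 | grep "inet "
    inet 172.17.0.1/16 scope global docker0
root@node2 $ ip addr show docker0 | grep "inet "
    inet 172.17.0.1/16 scope global docker0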

KUBERNETES NETWORKING USING FLANNEL

Kubernetes makes specific assumptions about networking in the cluster:

  • pods can communicate with each other using their unique pod IP addresses
  • nodes can communicate with pods using those same pod IP addresses
  • the IP that a container sees itself as is the same IP that others see it as

Kubernetes assumes that each pod and service in a cluster has a unique IP address and can communicate with other pods using their IP addresses. To achieve that, we need a way to assign a subnet of IP addresses to each node and ask Docker to use it when spawning containers; then we have to establish non-NAT communication between these IP addresses. There are many ways to do that; here we are going to focus on Flannel.

Flannel is one of the easiest ways to satisfy these assumptions. Basically, Flannel runs as an agent on each node and is responsible for allocating a subnet for that node out of a configured address space. That subnet is used by Docker to obtain IP addresses for pods. Each subnet, together with the node’s IP address, is stored in etcd and is readable by all agents. This allows Flannel to find the node behind a given pod IP and forward traffic to that node.
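To make this concrete, here is roughly what Flannel’s bookkeeping in etcd looks like once both agents are running. The key layout is Flannel’s real convention; the subnet values are illustrative and match the leases our nodes will receive below:

root@node $ etcdctl -C http://$MASTER_IP:2379 ls /coreos.com/network/subnets
/coreos.com/network/subnets/10.14.192.0-20
/coreos.com/network/subnets/10.11.32.0-20
root@node $ etcdctl -C http://$MASTER_IP:2379 get /coreos.com/network/subnets/10.11.32.0-20
{"PublicIP":"10.135.40.58","BackendType":"vxlan","BackendData":{...}}

Given a destination pod IP, an agent only has to find the subnet that contains it and tunnel the packet to that subnet’s PublicIP.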

This is how our networking will look; the diagram below also shows how Flannel works in the big picture.

[Diagram: cluster networking with Flannel]

Applying what we’ve just learned, we can now run Flannel on our nodes. The tricky part is configuring Docker to use Flannel. This is what we are going to do:

  • insert the initial Flannel configuration into etcd using etcdctl
  • run flanneld on both nodes, pointing it at our etcd
  • use mk-docker-opts.sh, which ships with Flannel; it generates a file of Docker environment variables that Docker sources on startup (see the example right after this list)
  • restart Docker so it picks up the new options
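For reference, here is roughly what the generated file contains. Treat the values as illustrative; the --bip subnet depends on the lease Flannel hands to that particular node:

root@node $ grep DOCKER_OPTS /etc/default/docker
DOCKER_OPTS=" --bip=10.14.192.1/20 --ip-masq=true --mtu=1450"

The --bip option moves docker0 onto the node’s Flannel subnet, and the lowered MTU leaves room for the VXLAN encapsulation headers.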

root@node $ export MASTER_IP=10.135.53.41
root@node $ export NODE_IP=10.135.53.41
root@node $ wget -q --show-progress https://github.com/coreos/flannel/releases/download/v0.6.2/flannel-v0.6.2-linux-amd64.tar.gz
flannel-v0.6.2-linux-amd64.tar.gz.1 100%[==================================================================================================================================>] 4.51M 1.15MB/s in 3.9s
root@node $ tar xzf flannel-v0.6.2-linux-amd64.tar.gz
root@node $ mv flanneld mk-docker-opts.sh /usr/local/bin/
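# Store the Flannel network config in etcd: pod IPs come from 10.0.0.0/8,
# each node leases a /20 subnet between 10.10.0.0 and 10.99.0.0, and the
# vxlan backend tunnels cross-node traffic with VNI 100 over UDP port 8472: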
root@node $ etcdctl -C http://$MASTER_IP:2379 set /coreos.com/network/config '{"Network": "10.0.0.0/8", "SubnetLen": 20, "SubnetMin": "10.10.0.0","SubnetMax": "10.99.0.0","Backend": {"Type": "vxlan","VNI": 100,"Port": 8472}}'
root@node $ flanneld -iface=$NODE_IP -etcd-endpoints http://$MASTER_IP:2379 &> /tmp/flanneld.log &
root@node $ mk-docker-opts.sh -d /etc/default/docker
root@node $ service docker restart
root@node2 $ export MASTER_IP=10.135.53.41
root@node2 $ export NODE_IP=10.135.40.58
root@node2 $ wget -q --show-progress https://github.com/coreos/flannel/releases/download/v0.6.2/flannel-v0.6.2-linux-amd64.tar.gz
flannel-v0.6.2-linux-amd64.tar.gz 100%[==================================================================================================================================>] 4.51M 1.04MB/s in 4.5s
root@node2 $ tar xzf flannel-v0.6.2-linux-amd64.tar.gz
root@node2 $ mv flanneld mk-docker-opts.sh /usr/local/bin/
root@node2 $ flanneld -iface=$NODE_IP -etcd-endpoints http://$MASTER_IP:2379 &> /tmp/flanneld.log &
root@node2 $ mk-docker-opts.sh -d /etc/default/docker
root@node2 $ service docker restart
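Before moving on, it is worth a quick sanity check. Flannel writes its lease to /run/flannel/subnet.env, and after the Docker restart docker0 should sit inside that node’s Flannel subnet. The values below are illustrative, matching the subnets we will see assigned in a moment:

root@node $ cat /run/flannel/subnet.env
FLANNEL_NETWORK=10.0.0.0/8
FLANNEL_SUBNET=10.14.192.1/20
FLANNEL_MTU=1450
FLANNEL_IPMASQ=false
root@node $ ip addr show docker0 | grep "inet "
    inet 10.14.192.1/20 scope global docker0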

And now we can check the pods’ IP addresses and try to ping pods running on different nodes:

root@node $ kubectl get pods -o=wide
NAME READY STATUS RESTARTS AGE IP NODE
busyboxx-2467095313-36vqq 1/1 Running 2 15m 10.11.32.2 node2
busyboxx-2467095313-ltrm4 1/1 Running 0 12m 10.14.192.7 node
nginx 1/1 Running 1 3d 10.14.192.4 node
nginx-31893996-3dnx7 1/1 Running 1 3d 10.14.192.5 node
nginx-31893996-5d1ts 1/1 Running 1 3d 10.14.192.3 node
nginx-31893996-5xnhc 1/1 Running 3 3d 10.11.32.5 node2
nginx-31893996-9k93w 1/1 Running 1 3d 10.14.192.6 node
nginx-31893996-lfrzl 1/1 Running 3 3d 10.11.32.3 node2
nginx-31893996-q99cp 1/1 Running 3 3d 10.11.32.4 node2
nginx2 1/1 Running 1 3d 10.14.192.2 node
root@node $ kubectl run -it curl --image=ulamlabs/curlping --command -- bash
root@curl-900741416-vwtjp $ ping 10.14.192.3 -c 1 && ping 10.11.32.5 -c 1
PING 10.14.192.3 (10.14.192.3): 56 data bytes
64 bytes from 10.14.192.3: icmp_seq=0 ttl=62 time=1.770 ms
--- 10.14.192.3 ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max/stddev = 1.770/1.770/1.770/0.000 ms
PING 10.11.32.5 (10.11.32.5): 56 data bytes
64 bytes from 10.11.32.5: icmp_seq=0 ttl=64 time=0.104 ms
--- 10.11.32.5 ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max/stddev = 0.104/0.104/0.104/0.000 ms
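Under the hood, the cross-node hop travels through a VXLAN device that flanneld created on each node; the device is named after the VNI, so with our config it is flannel.100. You can inspect it like this (output trimmed and illustrative):

root@node $ ip -d link show flannel.100
5: flannel.100: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN ...
    vxlan id 100 local 10.135.53.41 dev eth1 ... dstport 8472 ...

You can even spot the tunnel in the ping output above: the first ping crossed to the other node and lost two TTL hops along the way (ttl=62), while the second one stayed on the local node (ttl=64).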

Sweet, we have pod-to-pod communication, and this is how our nodes look now:

[Diagram: the two nodes connected by the Flannel overlay network]

LOAD BALANCING BETWEEN NODES

So now we have both nodes fully capable of running pods, but what about receiving traffic?
Currently, we accept traffic only on the first node. It will be forwarded to pods on the second node (by Flannel), but this is not a high-availability solution: the first node is a single point of failure.
To solve that, we should run kube-proxy on all worker nodes. After doing that, we can put the icing on our cake: a DigitalOcean Load Balancer that spreads ingress traffic between the nodes.
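The load balancer below targets port 30073 on both droplets: that is the NodePort of the nginx service from the previous post (kube-proxy is already running on the first node, so here we only start it on node2). If you want to double-check which port your own service exposes, a quick look like this should do; the output is illustrative, in the 1.7-era column layout:

root@node $ kubectl get svc nginx
NAME      CLUSTER-IP   EXTERNAL-IP   PORT(S)        AGE
nginx     10.0.0.120   <nodes>       80:30073/TCP   3d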

root@node2 $ kube-proxy --master=http://$MASTER_IP:8080 &> /tmp/proxy.log &
$ doctl compute load-balancer create --name lb --region fra1 --forwarding-rules entry_protocol:http,entry_port:80,target_protocol:http,target_port:30073 --health-check protocol:http,port:30073,path:/,check_interval_seconds:10,response_timeout_seconds:5,healthy_threshold:5,unhealthy_threshold:3
$ doctl compute droplet list "node*"
ID Name Public IPv4 Private IPv4 Public IPv6 Memory VCPUs Disk Region Image Status Tags
63370004 node 46.101.177.76 10.135.53.41 2048 2 40 fra1 Ubuntu 16.04.3 x64 active
63460608 node2 46.101.98.124 10.135.40.58 2048 2 40 fra1 Ubuntu 16.04.3 x64 active
$ doctl compute load-balancer add-droplets 58f02699-5717-43e6-bbfe-51ef4cc0a227 --droplet-ids 63370004,63460608
$ doctl compute load-balancer get 58f02699-5717-43e6-bbfe-51ef4cc0a227
ID IP Name Status Created At Algorithm Region Tag Droplet IDs SSL Sticky Sessions Health Check Forwarding Rules
58f02699-5717-43e6-bbfe-51ef4cc0a227 67.207.79.225 lb active 2017-09-27T09:40:56Z round_robin fra1 63370004,63460608 false type:none,cookie_name:,cookie_ttl_seconds:0 protocol:http,port:30073,path:/,check_interval_seconds:10,response_timeout_seconds:5,healthy_threshold:5,unhealthy_threshold:3 entry_protocol:http,entry_port:80,target_protocol:http,target_port:30073,certificate_id:,tls_passthrough:false
$ curl http://67.207.79.225
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>

Everything works; we could even pretend that we have a production-ready cluster 😀 but of course, we are far from that.

This is how our cluster looks now:

[Diagram: the cluster fronted by the DigitalOcean Load Balancer]

WRAPPING UP

In this blog post, we have learned how nodes in the cluster communicate with each other and how pods are exposed to the outside world through services. There are more aspects of Kubernetes that we need to cover before we can say that our cluster is production-ready. We are going to cover them in future blog posts, so stay tuned!
