Domesticated Kubernetes Networking
I have wanted to run Kubernetes at home for some time, but the main obstacle has been a reliable solution for providing load balancing for ingress or services, and the lack of a reasonable way to manage NAT transparently. While publicly routable IPv4 addresses are seemingly limitless* in the cloud, typically we only get one at home.
Similarly, there isn’t a straightforward way to build cloud-ey load balancers at home. While Google and Amazon can conjure up magic TCP load balancers on their complex overlay network platform, we don’t really have that luxury outside of the cloud. Rather, my costs are sunk and my gear is paid for. Any functionality they didn’t come with from the factory has to be hacked on with Linux and cussing.
The solution is to combine existing technologies in a new way: MetalLB, the Nginx ingress controller, a couple of flat VLANs, and dynamic iptables configuration to automate NAT bindings.
Of course, the standard disclaimer applies here. I will not say that this is a good idea, or that any sane person should do these things. This is strictly an exploration of some novel technology that I find interesting, working to solve a problem that nobody really has. With that said, enjoy!
My test cluster consists of four active nodes.
This cluster is built on vanilla Kubernetes 1.21 installed with Kubeadm, and running Calico for the pod network overlay. There is no other “fancy stuff” going on here, as I like to keep things as simple and clean as possible.
MetalLB (the load balancer for poor people)
Since I don’t have a “real” cloud, I can’t create true network load balancers. Instead, I will use MetalLB for this purpose. It’s still a beta product, so don’t use this in production unless you really like taking risks.
In its simplest form, MetalLB works very similarly to HSRP or Keepalived, where one node is assigned as the holder of a virtual IP. It then assigns that IP to one of its network bridges and listens on it for requests. Requests are forwarded to kube-proxy, which sends them to their respective services and pods.
The important thing to note is that MetalLB is not really a load balancer in this setup, it’s more or less a floating IP to provide failover. All requests are directed to a single node, then redistributed. This isn’t ideal, but it’s good enough for our purposes. I’m sure a more robust solution will emerge in time.
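Which node currently holds the virtual IP can be checked from any machine on the LAN by seeing whose MAC address answers ARP for it. Here is a small hypothetical helper to do that; the `vip_mac` function and the 10.1.0.200 address are my own illustration, not part of MetalLB:

```shell
# Hypothetical helper: given `ip neigh` output on stdin, print the MAC
# address currently answering for a virtual IP. The node whose interface
# owns that MAC is the current MetalLB leader for the address.
vip_mac() {
  awk -v ip="$1" '$1 == ip { print $5 }'
}

# Live usage from a LAN host (10.1.0.200 is an assumed MetalLB address):
#   ping -c1 10.1.0.200 >/dev/null   # populate the ARP/neighbor table
#   ip neigh | vip_mac 10.1.0.200    # compare against each node's MAC
```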
Install MetalLB on the cluster
Before MetalLB can be installed, a small patch must be made to the existing kube-proxy config to allow non-bound IPs to communicate with the node.
```shell
kubectl get configmap kube-proxy -n kube-system -o yaml | \
  sed -e "s/strictARP: false/strictARP: true/" | \
  kubectl diff -f - -n kube-system

kubectl get configmap kube-proxy -n kube-system -o yaml | \
  sed -e "s/strictARP: false/strictARP: true/" | \
  kubectl apply -f - -n kube-system
```
Then, the manifests can be applied to install MetalLB on the cluster:
```shell
kubectl apply -f https://raw.githubusercontent.com/metallb/metallb/v0.9.6/manifests/namespace.yaml
kubectl apply -f https://raw.githubusercontent.com/metallb/metallb/v0.9.6/manifests/metallb.yaml
kubectl create secret generic -n metallb-system memberlist --from-literal=secretkey="$(openssl rand -base64 128)"
```
If installed correctly, the pods will be visible in the metallb-system namespace:

```
$ kubectl get pods -n metallb-system
NAME                              READY   STATUS    RESTARTS   AGE
pod/controller-64f86798cc-dsb8t   1/1     Running   0          104s
pod/speaker-22wwl                 1/1     Running   0          104s
pod/speaker-6lxrs                 1/1     Running   0          104s
pod/speaker-nnpkb                 1/1     Running   0          104s
pod/speaker-rbvpn                 1/1     Running   0          104s
```
The final step is to apply a manifest specifying the IPv4 range the load balancers may occupy. This file, metallb-config.yaml, reserves the range 10.1.0.200-10.1.0.220 (21 addresses) for load balancers:
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  namespace: metallb-system
  name: config
data:
  config: |
    address-pools:
    - name: default
      protocol: layer2
      addresses:
      - 10.1.0.200-10.1.0.220
```
Finally, it can be applied to the cluster:
```shell
kubectl apply -f metallb-config.yaml -n metallb-system
```
Nginx Ingress Controller
While some distributions come pre-configured with Nginx for ingress, I have chosen to install it myself.
```shell
kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/master/deploy/static/provider/baremetal/deploy.yaml
```
This will install most of the needed components, but it is missing the connection for LoadBalancer type services, and relies on only NodePort or HostPort service types. This simply creates a service of type LoadBalancer for the Ingress controller to use:
```yaml
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/instance: ingress-nginx
    app.kubernetes.io/version: 0.45.0
    app.kubernetes.io/component: controller
  name: ingress-nginx-controller
  namespace: ingress-nginx
spec:
  type: LoadBalancer
  externalTrafficPolicy: Local
  ports:
  - name: http
    port: 80
    protocol: TCP
    targetPort: http
  - name: https
    port: 443
    protocol: TCP
    targetPort: https
  selector:
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/instance: ingress-nginx
    app.kubernetes.io/component: controller
```
After this manifest is applied, the service will expose its IPv4 address on the private network:
```
$ kubectl -n ingress-nginx get service
NAME                                 TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)                      AGE
ingress-nginx-controller             LoadBalancer   10.98.99.38     10.1.0.200    80:31347/TCP,443:30495/TCP   35m
ingress-nginx-controller-admission   ClusterIP      10.102.49.124   <none>        443/TCP                      35m
```
Any new ingress-enabled services will have access to this IP now, and all internal (on the local network) clients will be able to access the applications through the nginx ingress.
However, we must still contend with port forwarding through NAT.
Automate NAT rules for Ingress load-balancer
Since my router is a simple Linux server, it will be very easy to drop in new NAT rules as the Nginx ingress load balancer is configured.
The rules are quite simple; one DNAT rule for each of HTTP and HTTPS, and a masquerade rule to fix the source/dest addresses to allow traffic to return to the correct place. I believe this could be applied to just about any router, so long as there is some sort of API or sufficiently hackable web UI.
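For illustration, in iptables-restore format those rules might look like the fragment below. The eth0 WAN interface and the 10.1.0.200 load-balancer address are assumptions; the automation script fills in the real address:

```
*nat
-A PREROUTING -i eth0 -p tcp --dport 80 -j DNAT --to-destination 10.1.0.200:80
-A PREROUTING -i eth0 -p tcp --dport 443 -j DNAT --to-destination 10.1.0.200:443
-A POSTROUTING -p tcp -d 10.1.0.200 -m multiport --dports 80,443 -j MASQUERADE
COMMIT
```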
Otherwise, the fallback solution would be to NAT all traffic to the control-node-0 server and proxy it back to the virtual IP on whichever worker it resides on. This eliminates the failover aspect, so I wouldn't really recommend it unless it's strictly necessary.
I automated this using a fairly basic Bash script that templates the ‘external’ IP of the load balancer, and applies that configuration to the Linux firewall.
Note that the user nat_update has write access via ACL to the file /etc/iptables/nat.rules. A separate process on the firewall is responsible for validating and merging the rules with its existing set. There's almost certainly a better way to do this, but it works, and that's hard to argue with for a home-grown, hacked-together cluster.
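A minimal sketch of such a script, assuming the ingress-nginx-controller service from earlier and a router reachable as "router"; the render_nat_rules helper and the eth0 interface name are my own invention, not the actual script:

```shell
#!/usr/bin/env bash
# Hypothetical sketch: template the load balancer's 'external' IP into
# NAT rules and push them to the router for merging.
set -euo pipefail

# Render DNAT + masquerade rules for a given load-balancer IP.
render_nat_rules() {
  local lb_ip="$1"
  cat <<EOF
-A PREROUTING -i eth0 -p tcp --dport 80 -j DNAT --to-destination ${lb_ip}:80
-A PREROUTING -i eth0 -p tcp --dport 443 -j DNAT --to-destination ${lb_ip}:443
-A POSTROUTING -p tcp -d ${lb_ip} -m multiport --dports 80,443 -j MASQUERADE
EOF
}

main() {
  # Look up the IP MetalLB assigned to the ingress service.
  local lb_ip
  lb_ip=$(kubectl -n ingress-nginx get service ingress-nginx-controller \
    -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
  render_nat_rules "${lb_ip}" > /tmp/nat.rules
  # nat_update has ACL write access to this file on the firewall.
  scp /tmp/nat.rules nat_update@router:/etc/iptables/nat.rules
}

# main "$@"   # uncomment to run against a live cluster
```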
The system in action
To test the system out, I applied the manifests for the ‘Guestbook’ application and created an additional manifest for its ingress configuration:
```yaml
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: guestbook-ingress
spec:
  rules:
  - host: foo.bar
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: guestbook
            port:
              number: 3000
```
A few moments later, the configuration is applied and ready:
```
$ kubectl get ingress
NAME                CLASS    HOSTS     ADDRESS      PORTS   AGE
guestbook-ingress   <none>   foo.bar   10.1.0.103   80      45s
```
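The path can be exercised from any LAN client with curl before touching DNS; supplying the Host header (or using --resolve) satisfies the foo.bar rule. This assumes 10.1.0.200 is still the address MetalLB assigned to the ingress controller:

```shell
# Hit the MetalLB address assigned to the ingress controller earlier,
# presenting the Host header the Ingress rule matches on.
curl -s -H 'Host: foo.bar' http://10.1.0.200/

# Equivalent, without editing /etc/hosts or DNS:
curl -s --resolve foo.bar:80:10.1.0.200 http://foo.bar/
```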
Then, by some absolute miracle, all the pieces come together. There are hits on the NAT translation rule, packets are directed to the correct node to reach the virtual IP, and kube-proxy routes the request to the correct service, where it is terminated at the pod and container responsible for returning the response. It's horribly abstract, complex, and tightly coupled.
But it works, and that’s good enough for me.