K8s Homelab: High Availability for the Control Plane
This is the fifth post in our “K8s Homelab” series. Check out the previous post to see how we deployed MetalLB, Traefik, Longhorn, and container registries.
The Single Point of Failure Problem
The cluster was running beautifully. Three nodes, all services deployed, everything humming along. I could access my services from anywhere via VPN, manage the cluster with kubectl, and life was good.
Until I realized I had a problem.
Every time I wanted to connect to the cluster, I was using a kubeconfig that pointed to a single node IP. If that node went down, I’d lose access to the cluster entirely. Not exactly what you’d call “high availability.”
# My kubeconfig was pointing to this:
server: https://192.168.X.5:6443 # node1 - single point of failure
This was fine for a development cluster, but I wanted something more resilient. I wanted a Virtual IP (VIP) that would automatically route to any available control plane node, ensuring I could always reach the API server even if one node failed.
What could go wrong?
Chapter 1: The Plan (It Always Starts with a Plan)
The solution seemed straightforward:
- Deploy keepalived on all control plane nodes as a systemd service (host-level, independent of Kubernetes)
- Configure VRRP to manage the VIP (192.168.X.253) with automatic failover
- Add health checks to ensure only healthy nodes hold the VIP
- Update kubeconfig to use the VIP instead of a single node IP
- Ensure TLS certificates include the VIP in their Subject Alternative Names
Simple, right? Keepalived is a battle-tested solution for VIP management, runs independently of Kubernetes, and provides automatic health checking. This should be a quick afternoon project.
Famous last words.
Chapter 2: Why Keepalived Instead of MetalLB?
Initially, I considered using MetalLB for the control plane VIP, but I quickly realized this had a fundamental flaw: MetalLB depends on Kubernetes. If Kubernetes is down, MetalLB can’t manage the VIP, which means I’d lose access to the API server exactly when I need it most.
Keepalived solves this by running on the host (via systemd), completely independent of Kubernetes:
- Early availability: Starts during boot, before Kubernetes
- Independence: Works even if Kubernetes is down
- Direct network access: Manages VIP at the network interface level
- Health checks: Can verify K3s API server health and automatically failover
I configured keepalived to run in a container (via podman) managed by systemd, which is the standard approach on Fedora CoreOS. The configuration includes:
- VRRP instance: Manages the VIP (192.168.X.253) with automatic failover
- Health check script: Monitors the local K3s API server on port 6443
- Priority-based election: Determines which node should hold the VIP initially
- No preemption: Health checks determine VIP ownership, not just priority
The inventory structure defines the VIP:
# inventory.yaml
vips:
  control_planes: "192.168.X.253"  # HA control plane (managed by keepalived)
  workers: "192.168.X.254"         # Traefik ingress (managed by MetalLB)
This separation ensures that control plane HA is independent of Kubernetes, while worker services (like Traefik) can still use MetalLB for LoadBalancer functionality.
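To keep the two from colliding, MetalLB should only ever hand out the workers VIP. A minimal sketch of what that pool might look like, assuming a CRD-based MetalLB (v0.13+); the pool name and exact layout here are illustrative, not taken from the repo:

# Illustrative only: constrain MetalLB to the workers VIP so it never claims
# the control-plane VIP that keepalived manages.
kubectl apply -f - <<'EOF'
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: workers-pool        # hypothetical name
  namespace: metallb-system
spec:
  addresses:
    - 192.168.X.254/32      # Traefik ingress VIP only; .253 stays with keepalived
EOF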
Chapter 3: The Certificate Problem (TLS Strikes Again)
With the VIPs properly separated, I regenerated my kubeconfig to use the control planes VIP (192.168.X.253). I tried to connect:
$ kubectl get nodes
E1120 15:19:30.367321 7464 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"https://192.168.X.253:6443/api?timeout=32s\": tls: failed to verify certificate: x509: certificate is valid for 10.43.X.X, 127.0.0.1, 192.168.X.254, 192.168.X.5, 192.168.X.6, 192.168.X.7, ::1, not 192.168.X.253"
Ah, the classic TLS certificate problem. The k3s API server’s certificate didn’t include the new control planes VIP (192.168.X.253) in its Subject Alternative Names. It had the old VIP (192.168.X.254) and the individual node IPs, but not the new one.
I checked the k3s configuration on the nodes:
$ sudo cat /etc/rancher/k3s/config.yaml
tls-san: [192.168.X.254] # The old VIP, not the new one
The nodes had been provisioned with the old VIP in their tls-san configuration. I needed to update all control plane nodes to include the new VIP and restart k3s to regenerate the certificates.
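Before touching anything, it's worth confirming what the serving certificate actually contains. Something like this, run from the workstation against any control plane node, prints the same SAN list that kubectl complained about:

# Dump the SANs on the certificate the API server is currently presenting
openssl s_client -connect 192.168.X.5:6443 </dev/null 2>/dev/null \
  | openssl x509 -noout -text | grep -A1 'Subject Alternative Name'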
The Fix: Updating TLS SAN on All Nodes
I created a playbook to update the k3s configuration on all control plane nodes:
# machines/playbooks/update-k3s-tls-san.yaml
- name: Update k3s TLS SAN to include control planes VIP
  hosts: kubernetes_cluster
  gather_facts: false
  vars:
    control_planes_vip: "{{ hostvars[groups['kubernetes_cluster'][0]].vips.control_planes }}"
  tasks:
    - name: Check if node is control plane
      set_fact:
        is_control_plane: "{{ control_plane | default(false) | bool }}"

    - name: Skip if not control plane node
      meta: end_host
      when: not is_control_plane

    - name: Read current k3s config
      slurp:
        src: /etc/rancher/k3s/config.yaml
      register: k3s_config_content
      become: yes

    - name: Update k3s config to add VIP to tls-san
      copy:
        content: |
          {{ (k3s_config_content.content | b64decode | from_yaml) | combine({'tls-san': ((k3s_config_content.content | b64decode | from_yaml).get('tls-san', []) + [control_planes_vip]) | unique}) | to_yaml }}
        dest: /etc/rancher/k3s/config.yaml
        mode: '0644'
      become: yes
      register: config_updated

    - name: Restart k3s service to regenerate certificates
      systemd:
        name: k3s
        state: restarted
      become: yes
      when: config_updated.changed | default(false) | bool
This playbook:
- Reads the current k3s configuration
- Adds the control planes VIP to the tls-san list (if not already present)
- Restarts k3s to regenerate certificates with the new SAN
After running this on all three control plane nodes and waiting for the certificates to regenerate, the kubeconfig worked perfectly. I could now connect via the VIP from both my laptop (through VPN) and from machines on the same VLAN.
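A quick sanity check (assuming openssl is available on the workstation) confirms the regenerated certificate now carries the VIP, and that kubectl goes through it without TLS errors:

# The VIP should now appear in the SAN list...
openssl s_client -connect 192.168.X.253:6443 </dev/null 2>/dev/null \
  | openssl x509 -noout -text | grep -A1 'Subject Alternative Name'
# ...and kubectl should work through it
kubectl get nodes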
Success! Or so I thought.
Chapter 4: The Fallback Problem (What If HA Fails?)
Everything was working, but I had a nagging thought: what if keepalived goes down? What if the VIP becomes unavailable? My kubeconfig would be pointing to a VIP that doesn’t work, and I’d be locked out of the cluster.
This was a real concern. In a homelab, things break. Services restart, nodes reboot, network issues happen. I needed a resilient kubeconfig generation process that could detect when the HA VIP wasn’t working and fall back to a direct node connection.
The Solution: Smart Kubeconfig Generation with Fallback
I updated the get-kubeconfig.yaml playbook to:
- Test HA VIP connectivity - Check if the VIP is reachable
- Validate TLS certificate - Ensure the certificate is valid for the VIP
- Fallback to direct node IP - If HA fails, use the first responsive control plane node
# machines/playbooks/get-kubeconfig.yaml
- name: Check if HA control plane VIP is reachable
  command: "timeout 3 bash -c 'echo > /dev/tcp/{{ ha_control_plane_vip }}/6443'"
  register: ha_vip_check
  ignore_errors: true
  delegate_to: localhost
  become: false
  changed_when: false

- name: Create temporary kubeconfig for HA VIP test
  copy:
    content: "{{ kubeconfig_output.stdout | regex_replace('127\\.0\\.0\\.1:6443', ha_control_plane_vip + ':6443') }}"
    dest: "{{ playbook_dir }}/../data/kubeconfig-test-ha.tmp"
    mode: '0600'
  when:
    - kubeconfig_output.rc == 0
    - kubeconfig_output.stdout is defined
    - ha_vip_check.rc == 0
  become: false
  delegate_to: localhost

- name: Test HA VIP with kubectl (verify TLS certificate is valid)
  command: "timeout 5 kubectl --kubeconfig={{ playbook_dir }}/../data/kubeconfig-test-ha.tmp get nodes --no-headers"
  register: ha_vip_kubectl_test
  ignore_errors: true
  delegate_to: localhost
  become: false
  changed_when: false
  when:
    - kubeconfig_output.rc == 0
    - kubeconfig_output.stdout is defined
    - ha_vip_check.rc == 0

- name: Determine if HA VIP is working
  set_fact:
    ha_vip_working: "{{ (ha_vip_check.rc == 0) and (ha_vip_kubectl_test.rc | default(1) == 0) }}"
  become: false

- name: Display HA VIP status
  debug:
    msg: "HA VIP ({{ ha_control_plane_vip }}) is {{ 'working' if ha_vip_working else 'not working' }}. {{ 'Using HA VIP' if ha_vip_working else 'Falling back to direct node IP' }}."
  become: false

- name: Replace 127.0.0.1 with HA control plane VIP in kubeconfig (HA mode)
  set_fact:
    kubeconfig_final: "{{ kubeconfig_output.stdout | regex_replace('127\\.0\\.0\\.1:6443', ha_control_plane_vip + ':6443') }}"
  when:
    - kubeconfig_output.rc == 0
    - kubeconfig_output.stdout is defined
    - ha_vip_working | default(false) | bool
  become: false

- name: Replace 127.0.0.1 with responsive node IP in kubeconfig (fallback mode)
  set_fact:
    kubeconfig_final: "{{ kubeconfig_output.stdout | regex_replace('127\\.0\\.0\\.1:6443', source_node_ip + ':6443') }}"
  when:
    - kubeconfig_output.rc == 0
    - kubeconfig_output.stdout is defined
    - not (ha_vip_working | default(false) | bool)
  become: false
Now the playbook automatically:
- Detects if HA is working by testing both connectivity and TLS validation
- Uses the HA VIP when available (resilient, automatic failover)
- Falls back to direct node IP when HA is unavailable (ensures access even if keepalived is down)
This gives me the best of both worlds: high availability when everything is working, and a safety net when things break.
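Day to day, regenerating the kubeconfig is a single playbook run; the inventory flag below is only illustrative, the exact invocation may differ:

# Rebuild the kubeconfig; the playbook picks the VIP or a direct node IP automatically
ansible-playbook -i inventory.yaml machines/playbooks/get-kubeconfig.yaml
kubectl get nodes   # uses whichever endpoint the playbook wrote into the kubeconfig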
Chapter 5: The Keepalived Configuration (Getting the Details Right)
With keepalived running on all control plane nodes, I needed to ensure the configuration was correct. The keepalived setup includes:
VRRP Configuration
# /etc/keepalived/keepalived.conf
vrrp_instance VI_1 {
    state BACKUP
    interface eth0
    virtual_router_id 51
    priority 100    # Node-specific: lenovo1=100, lenovo2=90, lenovo3=80
    advert_int 1

    authentication {
        auth_type PASS
        auth_pass k3s-keepalived-secret
    }

    virtual_ipaddress {
        192.168.X.253/32
    }

    track_script {
        chk_k3s
    }

    nopreempt    # Health checks determine VIP ownership
}
Key points:
- Virtual Router ID: Unique ID (51) for the cluster’s VRRP instance
- Priority: Node-specific priorities (100, 90, 80) determine initial master election
- No preemption: Health checks determine who holds the VIP, not just priority
- Health check integration: The chk_k3s script monitors K3s API server health (its vrrp_script definition is sketched below)
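The chk_k3s track script itself is declared in a vrrp_script block. A minimal sketch, assuming the health check script path from the next section; the interval and thresholds here are illustrative, not copied from the repo:

vrrp_script chk_k3s {
    script "/usr/local/bin/k3s-healthcheck.sh"
    interval 5    # run the check every 5 seconds
    fall 2        # two consecutive failures mark the node unhealthy (VIP released)
    rise 2        # two consecutive successes mark it healthy again
}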
Health Check Script
#!/bin/bash
# /usr/local/bin/k3s-healthcheck.sh
# Returns 0 if healthy, non-zero if unhealthy

# Check if K3s API server is listening
if ! netstat -tuln 2>/dev/null | grep -q ':6443 '; then
    exit 1
fi

# Check if K3s API server responds to health check
if curl -k -s --max-time 2 --connect-timeout 2 https://localhost:6443/healthz >/dev/null 2>&1; then
    exit 0
else
    exit 1
fi
This script ensures that only nodes with a healthy K3s API server hold the VIP. If a node’s API server fails, keepalived automatically releases the VIP, allowing another healthy node to take over.
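The script can also be exercised by hand on a control plane node to confirm keepalived will see the right exit codes:

# Expect 0 while k3s is up; stop k3s on the node and it should flip to 1
sudo /usr/local/bin/k3s-healthcheck.sh; echo "exit code: $?"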
Systemd Service
# /etc/systemd/system/keepalived.service
[Unit]
Description=Keepalived for K3s API server HA
After=network-online.target
Before=k3s.service
Wants=network-online.target

[Service]
Type=notify
ExecStart=/usr/bin/podman run --rm \
    --name keepalived \
    --network host \
    --cap-add=NET_ADMIN \
    --cap-add=NET_BROADCAST \
    --cap-add=NET_RAW \
    --cap-add=SYS_TIME \
    --volume /etc/keepalived/keepalived.conf:/etc/keepalived/keepalived.conf:ro \
    --volume /usr/local/bin/k3s-healthcheck.sh:/usr/local/bin/k3s-healthcheck.sh:ro \
    docker.io/osixia/keepalived:latest
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target
The service:
- Runs keepalived in a container with host networking (required for VIP management)
- Starts before k3s (ensures VIP is available when Kubernetes starts)
- Automatically restarts on failure
- Mounts configuration and health check script as read-only volumes
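Two quick checks tell you whether the unit is healthy and which node currently owns the VIP (only the current MASTER should have .253 on its interface):

sudo systemctl status keepalived --no-pager    # container should be running under systemd
ip -4 addr show dev eth0 | grep 192.168.X.253  # present only on the node holding the VIP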
Chapter 6: The Network Configuration (Ensuring VPN Access Works)
With the HA control plane working within the cluster, I wanted to ensure it was accessible from my laptop through VPN. I checked the firewall configuration and found that VPN to Lab VLAN forwarding was already configured, but I wanted to make it explicit.
I added a specific firewall rule for the Kubernetes API server port:
# network/roles/router/templates/firewall/firewall.conf
# VPN to Lab forwarding (for remote development access)
config forwarding
    option src vlan30    # Source: VPN VLAN
    option dest vlan26   # Destination: Lab VLAN (example VLAN numbers)
The forwarding rule alone was sufficient: OpenWrt's zone-based firewall allows all traffic between zones once forwarding is enabled. The explicit port rule I had initially added was redundant, so I removed it to keep the configuration clean.
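From the laptop on the VPN side, a quick reachability test plus a kubectl call confirms the forwarding really is enough:

nc -zv 192.168.X.253 6443   # TCP connectivity to the API VIP across the VPN-to-Lab hop
kubectl get nodes            # full TLS + auth path through the VIP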
What We Achieved
With the HA control plane implementation complete, we now have:
✅ Resilient API Access: The k3s API server is accessible via a VIP that automatically routes to any available control plane node
✅ Automatic Failover: If one control plane node fails, keepalived automatically migrates the VIP to a healthy node
✅ Host-Level Independence: Keepalived runs on the host, independent of Kubernetes, ensuring VIP availability even if Kubernetes is down
✅ Health Checks: Only nodes with healthy K3s API servers hold the VIP
✅ Separated VIPs: Control planes (keepalived) and workers (MetalLB) have their own VIPs, preventing conflicts
✅ Smart Kubeconfig: Automatic detection of HA availability with fallback to direct node access
✅ VPN Access: The HA control plane is accessible from anywhere through the VPN
✅ TLS Security: All certificates properly include the VIP in their SAN lists
✅ Future-Proof: New nodes automatically get keepalived and correct TLS SAN configuration via Butane templates
The Implementation Structure
Keepalived is configured via Butane/Ignition during node provisioning, ensuring it’s baked into the node image:
machines/roles/build-pxe-files/templates/
├── butane/
│ ├── lenovo1-bootstrap.bu.j2 # Includes keepalived config
│ ├── lenovo1-reinstall.bu.j2 # Includes keepalived config
│ ├── lenovo2.bu.j2 # Includes keepalived config
│ └── lenovo3.bu.j2 # Includes keepalived config
└── configs/
├── keepalived.conf.j2 # VRRP configuration template
├── k3s-healthcheck.sh.j2 # Health check script template
└── keepalived.service.j2 # Systemd service unit template
The configuration is deployed to the router’s HTTP server and sourced by Butane files during node provisioning:
# Butane file excerpt
files:
  - path: /etc/keepalived/keepalived.conf
    contents:
      source: http://192.168.26.1/pxe/configs/keepalived-lenovo1.conf
  - path: /usr/local/bin/k3s-healthcheck.sh
    contents:
      source: http://192.168.26.1/pxe/configs/k3s-healthcheck.sh
  - path: /etc/systemd/system/keepalived.service
    contents:
      source: http://192.168.26.1/pxe/configs/keepalived.service
systemd:
  units:
    - name: keepalived.service
      enabled: true
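Once the Jinja templating has produced a plain .bu file, it gets compiled to Ignition with the butane CLI as part of the PXE build. A hedged example of that step (the filenames are illustrative):

# Compile the rendered Butane file into an Ignition config for provisioning
butane --pretty --strict < lenovo1-bootstrap.bu > lenovo1-bootstrap.ign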
Lessons Learned
This journey taught me several important lessons about high availability in Kubernetes:
Host-Level Independence Matters: Running keepalived on the host (systemd) instead of in Kubernetes ensures the VIP is available even when Kubernetes is down. This is critical for control plane HA.
Health Checks Are Essential: Keepalived’s health check script ensures only nodes with healthy K3s API servers hold the VIP. Without this, a node could hold the VIP even if its API server is down.
TLS Certificates Are Fussy: Always include VIPs in the tls-san configuration from the start. Updating certificates on running nodes is possible but requires restarts.
Fallback Is Essential: Even with HA, things can break. Building automatic fallback into tooling ensures you're never locked out of your cluster.
No Preemption Is Better: Using nopreempt in keepalived means health checks determine VIP ownership, not just priority. This prevents flapping when nodes recover.
Butane Integration: Configuring keepalived via Butane/Ignition ensures it's baked into the node image and automatically configured during reinstall.
Test Everything: Connectivity tests, TLS validation, and kubectl commands all need to work. Don’t assume that if one works, they all do.
Inventory Structure Matters: Clear separation of concerns (control_planes vs workers VIPs) makes the architecture easier to understand and maintain.
Container vs Native: Running keepalived in a container (podman) on Fedora CoreOS is the standard approach, providing isolation while maintaining host-level network access.
Conclusion
The cluster now has true high availability for the control plane. The Kubernetes API server is accessible via a VIP (192.168.X.253) managed by keepalived, which automatically migrates to healthy nodes when failures occur.
Keepalived runs on the host (systemd), completely independent of Kubernetes, ensuring the VIP is available even when Kubernetes is down. Health checks ensure only nodes with healthy K3s API servers hold the VIP, providing automatic failover based on actual service health, not just node availability.
The smart kubeconfig generation with automatic fallback provides an additional safety net, ensuring cluster access is maintained even if keepalived becomes unavailable.
The foundation is now truly resilient, and I can confidently manage the cluster knowing that single points of failure have been eliminated and that the control plane HA is independent of Kubernetes itself.
Check out the previous post to see how we deployed the cluster infrastructure, or read the first post for the complete journey from the beginning.