Building a Kubernetes Homelab: Deploying the Foundation Cluster Infrastructure
This is the fourth post in our “Building a Kubernetes Homelab” series. Check out the previous post to see how we deployed the physical K3s cluster with PXE boot and Butane configurations.
From Physical to Logical: Building the Foundation Layer
With the physical K3s cluster up and running across our three Lenovo nodes, it was time to deploy the essential infrastructure services that would form the foundation for everything else. These services needed to run at the physical cluster level—before we could deploy any specialized logical clusters.
The foundation layer consists of three critical components:
- MetalLB - Load balancer for bare-metal Kubernetes
- Traefik - Ingress controller for HTTP/HTTPS traffic
- Longhorn - Distributed block storage for persistent volumes
Together, these three services provide the infrastructure layer that specialized clusters will consume.
The Architecture: Foundation vs Specialized Clusters
Before diving into the implementation, let’s clarify the architecture. This was a key design decision I made early on:
graph TB
    subgraph "Physical Layer"
        subgraph "Foundation Cluster"
            K3s[K3s Cluster<br/>lenovo1, lenovo2, lenovo3]
            MetalLB[MetalLB<br/>VIP Manager]
            Traefik[Traefik<br/>Ingress Controller]
            Longhorn[Longhorn<br/>Distributed Storage]
        end
    end

    subgraph "Logical Layer - Future"
        projectX[projectX<br/>Specialized Cluster]
        Perso[Personal<br/>Specialized Cluster]
    end

    MetalLB -->|Provides VIPs| Traefik
    Traefik -->|Routes Traffic| FutureApps
    Longhorn -->|Provides PVs| AllClusters
    K3s -->|Hosts| AllServices
Foundation Cluster: The physical K3s cluster running the infrastructure services. These services are shared across all specialized clusters.
Specialized Clusters: Future logical clusters (like “projectX” for some project, “perso” for personal services) that will consume the foundation layer services.
This separation allows:
- Shared infrastructure across multiple logical clusters
- Clean separation of concerns
- Easier upgrades and maintenance
- Consistent patterns across all specialized clusters
Challenge 1: Load Balancing Without a Cloud Provider
Kubernetes was designed for cloud environments where load balancers are a first-class primitive. In a bare-metal homelab, you don’t have AWS ELB or Google Cloud Load Balancer. You need a way to expose services to the network.
The Solution: MetalLB with Layer 2 Mode
MetalLB provides the “missing” load balancer for bare-metal Kubernetes. It operates in two modes:
- Layer 2 Mode (what we used): Assigns IP addresses from a configured pool, advertises them via ARP
- BGP Mode: Integrates with your network’s BGP infrastructure
For a homelab, Layer 2 mode is simpler and perfectly adequate. MetalLB:
- Chooses one node as the “leader” to respond to ARP requests
- Automatically migrates the IP if that node fails
- Supports multiple IP pools for different services
The VIP Design
I decided to use a single Virtual IP (VIP) as the entry point for all cluster services:
192.168.X.254 → cluster.lab.x.y.z
This VIP becomes the single point of entry. All services get DNS CNAMEs that point to this VIP:
longhorn.lab.x.y.z → CNAME → cluster.lab.x.y.z
(traefik dashboard would be the same)
DNS on the OpenWRT router handles the hostname routing, while MetalLB ensures the VIP is always reachable even if nodes fail.
MetalLB Implementation
# cluster/roles/metallb/defaults/main.yaml
metallb_namespace: metallb-system
metallb_version: v0.14.3
metallb_ip_pool_name: cluster-vip
metallb_ip_address: "{{ hostvars[groups['kubernetes_cluster'][0]].cluster_vip }}/32"
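The cluster_vip value referenced here lives in the Ansible inventory. As a rough illustration (the exact file layout and variable placement are assumptions; only the variable and group names come from the role defaults), it could be defined as a group variable:

# inventory.yaml (illustrative excerpt)
kubernetes_cluster:
  hosts:
    lenovo1:
    lenovo2:
    lenovo3:
  vars:
    cluster_vip: 192.168.X.254   # the single VIP MetalLB will advertise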
The MetalLB configuration creates an IPAddressPool and an L2Advertisement:
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: cluster-vip
  namespace: metallb-system
spec:
  addresses:
    - 192.168.X.254/32
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: cluster-l2-advert
  namespace: metallb-system
spec:
  ipAddressPools:
    - cluster-vip
One critical issue we discovered: MetalLB’s webhook service needs endpoints to be in the “ready” state before you can apply these configurations. The playbook needed a wait condition:
- name: Wait for webhook service to have endpoints
  kubernetes.core.k8s_info:
    api_version: v1
    kind: Endpoints
    name: metallb-webhook-service
    namespace: "{{ metallb_namespace }}"
  register: webhook_endpoints
  until: >
    webhook_endpoints.resources | length > 0
    and webhook_endpoints.resources[0].subsets | default([]) | length > 0
    and webhook_endpoints.resources[0].subsets[0].addresses | default([]) | length > 0
  retries: 60
  delay: 5
This waits until the endpoint appears in the addresses array (ready state) rather than only in notReadyAddresses, which is where it sits while the controller is still starting.
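Once the webhook has ready addresses, the templated pool and advertisement can be applied safely. A minimal sketch of that follow-up task, assuming the role's config.yaml.j2 template renders the manifests shown above:

# cluster/roles/metallb/tasks/main.yaml (sketch of the step after the wait)
- name: Apply MetalLB IPAddressPool and L2Advertisement
  kubernetes.core.k8s:
    state: present
    namespace: "{{ metallb_namespace }}"
    template: config.yaml.j2   # renders the IPAddressPool and L2Advertisement above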
Challenge 2: Namespace Organization and the Traefik Saga
The original Traefik deployment used the kube-system namespace. This bothered me from a cleanliness perspective. I wanted dedicated namespaces for each infrastructure component:
- metallb-system - Load balancer
- traefik-system - Ingress controller
- longhorn-system - Storage
After some consideration, I decided to migrate Traefik to its own namespace. This required:
- Removing the existing Traefik deployment
- Adding namespace creation to the Ansible playbook
- Updating default variables
- Redeploying with the new namespace
The migration went smoothly, and now each infrastructure component has clear ownership and isolation.
Traefik Configuration
Traefik needed to use the MetalLB VIP and serve both HTTP and HTTPS:
# cluster/roles/ingress/templates/values.yaml.j2
service:
  type: LoadBalancer
  spec:
    loadBalancerIP: {{ traefik_loadbalancer_ip }}
ports:
  web:
    port: 80
    expose: {}
  websecure:
    port: 443
    expose: {}
Note the expose: {} syntax—the newer Traefik Helm chart expects an object, not a boolean.
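Once the chart is deployed, the quickest sanity check is that the Traefik Service actually picked up the MetalLB VIP as its external IP (the service name assumes the chart's default release name and the traefik-system namespace discussed above):

# EXTERNAL-IP should show the MetalLB VIP (192.168.X.254)
kubectl get svc -n traefik-system traefik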
Challenge 3: Distributed Storage with Longhorn
Longhorn provides distributed block storage for Kubernetes. Unlike cloud storage solutions that rely on external cloud storage APIs, Longhorn runs entirely within the cluster. It:
- Creates replicated volumes across nodes
- Handles node failures gracefully
- Provides storage classes for different use cases
- Includes a web UI for management
Longhorn Configuration
Longhorn’s configuration includes several important settings:
# cluster/roles/longhorn/defaults/main.yaml
longhorn_namespace: longhorn-system
longhorn_version: v1.5.3
longhorn_data_path: /var/lib/longhorn
longhorn_replica_count: 3
longhorn_ingress_enabled: true
longhorn_ingress_host: longhorn.lab.x.y.z
longhorn_ingress_class: traefik
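Taken together, the longhorn_ingress_* variables imply an Ingress object roughly like the one below, however it is produced (through the chart's ingress values or a dedicated template). The object name is illustrative; longhorn-frontend is the stock Longhorn UI service:

# Roughly what the rendered Longhorn UI Ingress looks like (sketch)
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: longhorn-ui            # illustrative name
  namespace: longhorn-system
spec:
  ingressClassName: traefik
  rules:
    - host: longhorn.lab.x.y.z
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: longhorn-frontend   # Longhorn's built-in UI service
                port:
                  number: 80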
The Helm values file configures Longhorn’s default settings:
# cluster/roles/longhorn/templates/values.yaml.j2
defaultSettings:
  defaultDataPath: {{ longhorn_data_path }}
  defaultReplicaCount: {{ longhorn_replica_count }}
  guaranteedEngineManagerCPU: "250"
  guaranteedReplicaManagerCPU: "250"
  storageReservedPercentageForDefaultDisk: "5"
The key decisions here:
- Replica count of 3: With three nodes, storing three replicas means each node holds one copy, so data survives a single node failure.
- Storage reserved percentage: Set to 5% (instead of the default 30%) to maximize the space available for volumes; Longhorn keeps this reserve for system operations and to guard against over-provisioning.
- Ingress enabled: Access the Longhorn UI through Traefik at longhorn.lab.x.y.z
- Data path: Store Longhorn data on dedicated HDDs (separate from the OS SSDs on each node)
Disk Tagging
Since our nodes have dedicated HDD disks for Longhorn storage (mounted at /var/lib/longhorn), we tag them with longhorn-hdd-raw to ensure they’re reserved for Longhorn use. The Ansible playbook automatically applies these tags:
# cluster/roles/longhorn/templates/node-disk-tags.yaml.j2
apiVersion: longhorn.io/v1beta2
kind: Node
metadata:
  name: {{ node_item.name }}
  namespace: {{ longhorn_namespace }}
spec:
  disks:
    "{{ node_item.disk_key }}":
      tags:
        - longhorn-hdd-raw
The tag uses a longhorn- prefix for clarity, making it explicit that these disks are reserved for Longhorn storage.
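The node_item values come from a per-node list that the playbook loops over. A hedged sketch of that task (the disk keys and list contents are assumptions; each key must match the disk name Longhorn already knows for that node):

# cluster/roles/longhorn/tasks/main.yaml (sketch; disk keys are illustrative)
- name: Tag Longhorn disks on each node
  kubernetes.core.k8s:
    state: present
    template: node-disk-tags.yaml.j2
  loop:
    - { name: lenovo1, disk_key: default-disk-1 }
    - { name: lenovo2, disk_key: default-disk-2 }
    - { name: lenovo3, disk_key: default-disk-3 }
  loop_control:
    loop_var: node_item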
Custom Storage Classes
Longhorn comes with a default storage class, but we created custom ones with disk selectors to ensure volumes are only created on our tagged HDD disks:
# cluster/roles/longhorn/templates/storage-classes.yaml.j2
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-hdd-raw-delete
provisioner: driver.longhorn.io
allowVolumeExpansion: true
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
parameters:
  numberOfReplicas: "3"
  staleReplicaTimeout: "30"
  diskSelector: "longhorn-hdd-raw"
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-hdd-raw-retain
provisioner: driver.longhorn.io
allowVolumeExpansion: true
reclaimPolicy: Retain
volumeBindingMode: WaitForFirstConsumer
parameters:
  numberOfReplicas: "3"
  staleReplicaTimeout: "30"
  diskSelector: "longhorn-hdd-raw"
This setup ensures:
- Disk isolation: Volumes are only created on disks tagged with longhorn-hdd-raw
- Reclaim policies: Two storage classes, one with Delete (the default) and one with Retain (for important data)
- Replica distribution: Three replicas spread across the three nodes for redundancy
The diskSelector parameter ensures that Longhorn only schedules volume replicas on our dedicated HDD disks, preventing accidental use of system SSDs.
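As a usage example, a workload that needs durable storage on the Retain class would request it with an ordinary PVC (name and size are illustrative):

# Example PVC bound to the Retain storage class
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: important-data         # illustrative
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: longhorn-hdd-raw-retain
  resources:
    requests:
      storage: 10Gi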
Challenge 4: The Great Version Mismatch Mystery
Everything was going smoothly until we noticed instability: pods restarting, services not responding. Investigation revealed a critical issue:
lenovo1: v1.28.5+k3s1
lenovo2: v1.33.5+k3s1 ← The problem
lenovo3: v1.28.5+k3s1
Somehow, lenovo2 was running a version five minor releases ahead of the other nodes! This created severe compatibility issues:
- MetalLB speakers couldn’t communicate on port 7946 (member discovery)
- Traefik pods were crashing
- API compatibility mismatches throughout the cluster
The Root Cause: Manual Intervention
After examining logs and timestamps, we discovered that the K3s installer was manually re-run without specifying a version. The installer defaults to the latest version, resulting in the mismatch.
This incident highlighted the need for:
- Version pinning in inventory configuration (see the snippet after this list)
- Automated upgrades via Ansible
- Rolling upgrade strategy to maintain cluster health
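Pinning the version is a one-line change. A hedged sketch of what it looks like as an inventory group variable (the file and exact placement are assumptions; k3s_version is the variable the upgrade playbook consumes):

# inventory.yaml (illustrative excerpt)
kubernetes_cluster:
  vars:
    k3s_version: v1.33.5+k3s1   # every node installs exactly this version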
The Solution: Rolling Upgrade Playbook
We created an Ansible playbook to safely upgrade K3s with:
- Serial execution: Only one node at a time
- Cordon and drain: Move workloads before upgrading
- Health checks: Wait for nodes to be ready before proceeding
- Automatic uncordon: Restore scheduling after successful upgrade
# machines/playbooks/upgrade-k3s.yaml
- name: Rolling K3s Upgrade
  hosts: kubernetes_cluster
  serial: 1  # One node at a time
  vars:
    target_version: "{{ k3s_version }}"
  tasks:
    - name: Cordon node
      command: kubectl cordon {{ node_name }}

    - name: Drain node
      command: kubectl drain {{ node_name }} --ignore-daemonsets --force

    - name: Upgrade K3s
      shell: curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION="{{ target_version }}" sh -

    - name: Wait for node ready
      command: kubectl get node {{ node_name }} -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}'
      register: node_status   # needed by the until condition below
      until: node_status.stdout == "True"
      retries: 60
      delay: 5

    - name: Uncordon node
      command: kubectl uncordon {{ node_name }}
By touching one node at a time and draining before upgrading, this playbook keeps the cluster healthy and its shared services available throughout the upgrade.
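After the playbook finishes, a quick check that every node reports the same K3s version (the VERSION column) catches any stragglers:

kubectl get nodes -o wide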
DNS Configuration: The Final Piece
With services deployed, we needed DNS resolution. The OpenWRT router’s DNSmasq configuration includes:
# A record for the cluster VIP
config domain
    option name 'cluster.lab.x.y.z'
    option ip '192.168.X.254'

# CNAME for each service
config cname
    option cname 'longhorn.lab.x.y.z'
    option target 'cluster.lab.x.y.z'
This allows accessing services by their friendly hostnames while only managing one actual IP address.
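A quick way to verify the chain end to end from a machine on the LAN (the hostnames are the placeholders used throughout this post):

# Resolve the CNAME chain, then hit Traefik on the VIP by hostname
dig +short longhorn.lab.x.y.z
curl -kI https://longhorn.lab.x.y.z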
The Ansible Structure
All foundation cluster infrastructure is organized in cluster/:
cluster/
├── playbooks/
│ └── deploy.yaml # Main playbook
├── roles/
│ ├── metallb/
│ │ ├── defaults/main.yaml
│ │ ├── tasks/main.yaml
│ │ └── templates/config.yaml.j2
│ ├── ingress/
│ │ ├── defaults/main.yaml
│ │ ├── tasks/main.yaml
│ │ └── templates/values.yaml.j2
│ └── longhorn/
│ ├── defaults/main.yaml
│ ├── tasks/main.yaml
│ └── templates/
│ ├── values.yaml.j2
│ ├── storage-classes.yaml.j2
│ └── node-disk-tags.yaml.j2
└── data/ # Generated files (gitignored)
├── metallb-config.yaml
├── traefik-values.yaml
├── longhorn-values.yaml
├── storage-classes.yaml
└── node-disk-tags.yaml
Each role follows a consistent pattern (a condensed sketch follows the list):
- Add Helm repository
- Create namespace
- Template values/configuration
- Deploy via Helm
- Wait for readiness
- Apply custom resources (StorageClasses, etc.)
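Condensed into one tasks file, the pattern looks roughly like this, using the MetalLB role as the example (module choices and the chart reference are my assumptions; the repository URL is MetalLB's published Helm repo):

# cluster/roles/metallb/tasks/main.yaml (condensed sketch of the common pattern)
- name: Add Helm repository
  kubernetes.core.helm_repository:
    name: metallb
    repo_url: https://metallb.github.io/metallb

- name: Create namespace
  kubernetes.core.k8s:
    state: present
    definition:
      apiVersion: v1
      kind: Namespace
      metadata:
        name: "{{ metallb_namespace }}"

- name: Deploy chart with templated values
  kubernetes.core.helm:
    name: metallb
    chart_ref: metallb/metallb
    release_namespace: "{{ metallb_namespace }}"
    wait: true          # covers the "wait for readiness" step

- name: Apply custom resources
  kubernetes.core.k8s:
    state: present
    template: config.yaml.j2   # pools, advertisements, storage classes, etc.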
Python Installation: A Necessary Evil
One unexpected requirement: Fedora CoreOS doesn’t include Python by default (it’s a minimal immutable OS). This caused issues with Ansible’s gather_facts module.
The solution: Add a Python installation service to the Butane configurations:
# In all .bu.j2 files
- path: /etc/systemd/system/python-install.service
  mode: 0644
  contents:
    inline: |
      [Unit]
      Description=Install Python for Ansible
      After=network-online.target
      Before=k3s-install.service

      [Service]
      Type=oneshot
      ExecStart=/usr/bin/rpm-ostree install python3

      [Install]
      WantedBy=multi-user.target
This installs Python 3 as a layered package on first boot, and the layer persists across Zincati-driven Fedora CoreOS updates.
Makefile Integration
The deployment is integrated into the main Makefile:
# Foundation Cluster Commands
.PHONY: cluster/deploy
cluster/deploy: machines/kubeconfig
	@echo "📦 Deploying foundation cluster infrastructure..."
	ansible-playbook -i inventory.yaml cluster/playbooks/deploy.yaml
Running make cluster/deploy:
- Ensures kubeconfig exists
- Creates required directories
- Runs all three roles in sequence
- Deploys the complete foundation layer
What We Achieved
With the foundation layer deployed, we now have:
✅ Load Balancing: MetalLB managing VIP distribution
✅ Ingress: Traefik routing HTTP/HTTPS traffic
✅ Storage: Longhorn providing persistent volumes
✅ DNS: CNAME-based routing for all services
✅ Automation: Complete Ansible-driven deployment
✅ Monitoring: All services accessible via friendly hostnames
✅ Redundancy: Services distributed across three nodes
The Road Ahead
The foundation is now solid. The next steps will be:
- Longhorn Access: Configure ingress to access Longhorn UI
- Testing: Verify VIP failover and service resilience
- Specialized Clusters: Deploy first logical cluster (projectX or perso)
- Service Migration: Move existing services to Kubernetes
- Monitoring Stack: Deploy Prometheus and Grafana
Lessons Learned
This phase taught me several important lessons:
- Webhooks Matter: Always wait for webhook endpoints to be ready before applying CRDs
- Namespace Organization: Dedicated namespaces improve clarity and maintainability
- Version Consistency: Never manually upgrade without proper procedures
- Layer Order: Infrastructure services must be stable before deploying workloads
- DNS Design: CNAME aliasing is cleaner than managing multiple A records
- Automation First: Infrastructure changes should always go through Ansible
Conclusion
The foundation cluster infrastructure is now deployed and stable. We have a solid base for building specialized clusters on top. The combination of MetalLB, Traefik, and Longhorn provides all the essential primitives that specialized clusters will consume.
In the next post, we’ll explore how to use this foundation to deploy specialized logical clusters, potentially using technologies like kcp or vCluster to create isolated, workload-specific environments.
Check out the previous post to see how we built the physical cluster, or read the first post for the complete journey from the beginning.