Motivation: Beyond Cloud Abstractions
Managed Kubernetes services (EKS, GKE) are designed to hide the complexities of underlying infrastructure. While beneficial for production velocity, this abstraction layer obscures the fundamental challenges of distributed systems engineering: network partitioning, partial failure modes, and physical resource contention.
To gain deep operational expertise, I architected a physical cluster that exposes these primitives. The goal was to build a system that forces confrontation with edge computing constraints—strictly limited memory, ARM64 architecture, and unreliable storage mediums—within a controlled environment.
The primary design constraint was environmental: the cluster functions as office infrastructure, requiring near-silent operation and a total power envelope of roughly 400 watts, while providing sufficient density to simulate complex failure scenarios.
Engineering Constraints
Acoustic Profile
Must operate below 30 dB (library quiet) for office compatibility, which precludes standard 1U server fans.
Power Envelope
Total draw capped at ~400W to minimize thermal load and operational cost.
Node Density
At least 10 nodes required to meaningfully test distributed consensus and failure recovery.
Hardware Architecture
Compute: Raspberry Pi CM4
The Compute Module 4 (8GB RAM) was selected for its form factor and PCIe connectivity. Unlike standard Raspberry Pis, the CM4 exposes a single PCIe Gen 2 lane, enabling direct NVMe storage attachment. This eliminates the USB bottleneck that typically plagues single-board computer clusters.
Interconnect: PoE+ Architecture
Power and networking are consolidated using a 24-port UniFi Pro PoE switch. This drastically simplifies cable management and allows for remote power cycling of hung nodes via the switch management API—a critical feature for headless fleet management.
Storage: Distributed NVMe
Each node hosts a 1TB NVMe SSD. Aggregated via distributed block storage software, this provides a high-performance, replicated storage tier capable of sustaining database I/O (PostgreSQL WAL writes) that would saturate SD card interfaces.
Physical Implementation
The entire cluster is contained within 4.33U of standard 19-inch rack space. The layout prioritizes thermal airflow and cable density.
Physical Rack Layout (4.33U Total)
Office environment • ~400W compute power • Silent operation
- Network Layer - 2U: UniFi
- Management Layer - 1.33U: Racknex Mount
- Compute Layer - 1U: Compute Blade

Software Stack
Talos Kubernetes orchestrating microservices
MLOps Services
- RAG Pipelines
- Vector Databases (pgvector)
- Embedding Generation
- Model Serving
- LLM Orchestration
Data Layer
- PostgreSQL (HA)
- Redis (Caching/PubSub)
- MinIO (S3-compatible)
- Distributed Block Storage
- 20TB Total NVMe
Observability
- Prometheus (Metrics)
- Grafana (Dashboards)
- Distributed Tracing
- AlertManager
- Custom Thermal Monitor
Network Architecture
- UniFi Dream Machine Pro (routing/gateway)
- USW Pro 24 PoE (24-port switch)
- Single cable per Pi (PoE+ power + data)
- Kubernetes CNI for pod networking
- Service mesh & network policies
Power & Thermal
- ~400W total power consumption
- PoE+ budget management (~25W/port)
- Custom Rust thermal controller
- Noctua fan curves (0-100% PWM)
- Silent operation in office environment
Operating System: Talos Linux
I standardized on Talos Linux, an immutable OS built from scratch for Kubernetes. Talos eliminates the traditional Linux package manager and shell, treating the operating system as an ephemeral layer configured solely via API.
This architectural choice enforces “Infrastructure as Code” discipline. There is no SSH to “fix” a node; configurations must be applied declaratively. Upgrades are atomic image swaps (A/B partitioning), ensuring that failed updates roll back automatically.
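As a concrete illustration of that workflow, both configuration and upgrades go through the talosctl CLI; the node address, file name, and installer version below are placeholders rather than this cluster's actual values:

```sh
# Push a declarative machine configuration to a node; there is no SSH step.
talosctl apply-config --nodes 10.0.0.11 --file worker.yaml

# Perform an atomic A/B image upgrade; a failed boot rolls back automatically.
talosctl upgrade --nodes 10.0.0.11 \
  --image ghcr.io/siderolabs/installer:v1.7.0
```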
Custom Thermal Control Plane
The high density of 20 nodes in 1U creates a significant thermal challenge. Stock fan-control solutions rely on userspace tools that are incompatible with Talos's restrictive environment.
I developed a lightweight Rust daemon to interface directly with the I2C bus on the carrier boards. This daemon runs as a DaemonSet on every node, autonomously managing fan PWM duty cycles based on local thermal telemetry.
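Temperature telemetry comes from the node itself. Below is a minimal sketch of the read path, assuming the standard Linux sysfs thermal zone is exposed to the DaemonSet; the path and error handling are illustrative, not the daemon's actual code:

```rust
use std::fs;

/// Read the SoC temperature in degrees Celsius from the Linux sysfs
/// thermal interface, which reports millidegrees as plain text.
fn read_soc_temp() -> std::io::Result<f32> {
    let raw = fs::read_to_string("/sys/class/thermal/thermal_zone0/temp")?;
    let millidegrees: f32 = raw
        .trim()
        .parse()
        .map_err(|e| std::io::Error::new(std::io::ErrorKind::InvalidData, e))?;
    Ok(millidegrees / 1000.0)
}
```

The measured temperature is then mapped onto a PWM duty cycle with a simple threshold curve: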
```rust
fn calculate_fan_speed(temp: f32) -> u8 {
    match temp {
        t if t < 40.0 => 0,   // Silent
        t if t < 50.0 => 30,  // Low
        t if t < 60.0 => 50,  // Medium
        t if t < 70.0 => 75,  // High
        _ => 100,             // Max
    }
}
```
The control loop implements hysteresis to prevent fan oscillation (“hunting”) and holds the SoC temperature in the 50-60°C band, balancing thermal headroom with acoustic comfort.
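A minimal sketch of that hysteresis logic, assuming a 3°C dead band; the daemon's real thresholds are tuned empirically:

```rust
/// Wraps the threshold curve with hysteresis: speed up immediately for
/// safety, but only slow down once the temperature has dropped well clear
/// of the threshold that selected the current speed.
struct FanController {
    current_duty: u8,
}

impl FanController {
    fn update(&mut self, temp: f32) -> u8 {
        const DEAD_BAND_C: f32 = 3.0;
        let target = calculate_fan_speed(temp);
        if target > self.current_duty {
            // Hotter: apply the higher duty cycle right away.
            self.current_duty = target;
        } else if calculate_fan_speed(temp + DEAD_BAND_C) < self.current_duty {
            // Cooler: step down only when even temp + dead band maps to a
            // lower duty cycle, which prevents oscillation ("hunting").
            self.current_duty = target;
        }
        self.current_duty
    }
}
```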
Distributed Storage Strategy
Reliable stateful workloads on unreliable hardware require robust replication. The storage layer uses a distributed block storage provider that implements 3-way synchronous replication.
Storage classes are configured to prefer local volume access (reading from the replica on the same node) to minimize network traversals, while synchronous writes ensure that a node failure results in zero data loss. This setup successfully survived multiple “pull the plug” tests during commissioning.
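As one possible shape for this policy, a Longhorn-style StorageClass expressing three synchronous replicas with best-effort data locality looks roughly like the following; the provisioner and parameter names are Longhorn's, so substitute your provider's equivalents:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: replicated-nvme
provisioner: driver.longhorn.io   # assumes Longhorn; swap in your CSI driver
parameters:
  numberOfReplicas: "3"           # 3-way synchronous replication across nodes
  dataLocality: "best-effort"     # keep a replica on the consuming node when possible
reclaimPolicy: Retain
allowVolumeExpansion: true
```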
Engineering Challenges
The ARM64 Tax
While ARM support is improving, the ecosystem is not fully mature. I frequently encounter container images lacking `linux/arm64` manifests. This necessitated building a custom CI pipeline to cross-compile and re-package upstream dependencies, a valuable exercise in supply chain management.
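A typical remediation is a multi-arch rebuild of the upstream image; a sketch using Docker Buildx, with placeholder image names and tags:

```sh
# Check whether an upstream image actually publishes a linux/arm64 manifest.
docker buildx imagetools inspect upstream/example-image:1.2.3

# Rebuild from the upstream Dockerfile and push a multi-arch manifest.
docker buildx build \
  --platform linux/amd64,linux/arm64 \
  -t registry.example.internal/example-image:1.2.3 \
  --push .
```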
Memory Pressure
Operating within an 8GB-per-node budget requires strict QoS enforcement. I learned to work with the OOM killer's behavior rather than against it, tuning `requests` and `limits` so that critical system components (CNI, storage) outrank application workloads during resource contention.
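For illustration, a Burstable-class application pod under these constraints might look like the following (names and figures are placeholders); critical system components instead run with requests equal to limits, landing in the Guaranteed QoS class so the OOM killer reaches them last:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-app                  # illustrative workload name
spec:
  containers:
    - name: app
      image: registry.example.internal/example-app:1.0
      resources:
        requests:                    # reserved by the scheduler on the 8GB node
          cpu: "100m"
          memory: "128Mi"
        limits:                      # hard ceiling; exceeding memory gets the pod OOM-killed
          cpu: "500m"
          memory: "512Mi"
```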
Conclusion
Constructing this cluster provided insights into distributed systems that are inaccessible through cloud consoles. It transformed theoretical knowledge of consensus algorithms, network latency, and failure domains into practical operational experience.
For engineers seeking to master Kubernetes, the physical constraints of a homelab offer a rigorous training ground. The friction encountered—hardware compatibility, thermal limits, network debugging—is not a bug, but the primary feature.