Production-grade MLOps infrastructure running on a 20-node Raspberry Pi cluster with custom thermal management and distributed storage.
This project originated from a requirement to simulate edge computing constraints—network partitions, hardware failures, and strict resource limits—within a physical environment. While cloud infrastructure abstracts these complexities, building resilient distributed systems requires confronting them directly.
The solution is a high-density compute cluster comprising 20 Raspberry Pi Compute Module 4 nodes, orchestrated by Talos Linux. Each node is equipped with NVMe storage and powered via PoE+, eliminating common bottlenecks associated with single-board computers. The entire infrastructure, including networking and management, occupies just 4.33U of rack space while operating silently in an office environment.
The cluster serves as the production backbone for my personal infrastructure, hosting RAG pipelines, vector databases, and distributed caching layers. It demonstrates that enterprise-grade architecture principles—immutable infrastructure, GitOps, and observability—can be effectively scaled down to edge hardware.
Office environment • ~400W compute power • Silent operation
Talos Kubernetes orchestrating microservices
Choosing physical hardware over virtualization was deliberate. Distributed consensus algorithms (Raft, Paxos) behave differently under real network latency. Troubleshooting physical node failures provides operational experience that managed Kubernetes services cannot replicate.
The implementation uses a unified PoE+ architecture, delivering both power and 1Gbps networking over a single cable per node. This reduces failure points and simplifies thermal management within the rack enclosure.
To support database workloads, I bypassed the USB bus entirely. Each CM4 node utilizes its single PCIe lane for NVMe storage, enabling high-throughput distributed block storage across the cluster.
Talos Linux was selected for its API-driven, immutable nature. Eliminating SSH and shell access reduces the attack surface and enforces declarative configuration management, aligning with modern GitOps practices.
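As a rough illustration of what that declarative model looks like, below is a heavily trimmed sketch of a Talos machine config; the disk path, hostname, and endpoint are placeholders, and a real config also carries tokens and certificates omitted here.

```yaml
version: v1alpha1
machine:
  type: worker
  install:
    disk: /dev/nvme0n1        # NVMe drive on the CM4's PCIe lane (illustrative path)
    wipe: false
  network:
    hostname: cm4-node-01     # placeholder hostname
cluster:
  controlPlane:
    endpoint: https://10.0.0.10:6443   # placeholder control-plane endpoint
```

Configs like this are applied over the Talos API (e.g. with `talosctl apply-config`), so node state lives entirely in version-controlled YAML rather than in an interactive shell session.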
The ARM architecture required a shift in CI/CD pipelines. I implemented multi-architecture build steps and, in several instances, contributed ARM64 support back to upstream open-source projects to enable deployment on the cluster.
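The write-up doesn't name the build tooling; one common way to produce such multi-architecture images is Docker Buildx with QEMU emulation, sketched below with a placeholder registry and tag.

```bash
# Build and push a single image manifest covering both x86 CI runners and the ARM64 cluster.
# Assumes a buildx builder with QEMU binfmt support is already configured.
docker buildx build \
  --platform linux/amd64,linux/arm64 \
  --tag registry.example.com/service:latest \
  --push .
```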
With 8GB RAM per node, memory is a scarce resource. This necessitated strict Quality of Service (QoS) classes, aggressive horizontal autoscaling, and optimized JVM/runtime configurations for hosted services.
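For context, Kubernetes assigns the Guaranteed QoS class when a container's resource requests equal its limits, which keeps memory usage predictable on the 8GB nodes. A minimal sketch of that pattern (names and values are illustrative, not the cluster's actual settings):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-service        # illustrative name
spec:
  containers:
    - name: app
      image: registry.example.com/service:latest
      resources:
        requests:
          cpu: "250m"
          memory: "256Mi"
        limits:
          cpu: "250m"          # requests == limits -> Guaranteed QoS class
          memory: "256Mi"      # hard cap so one pod cannot exhaust a node's 8GB
```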
Off-the-shelf PWM fan control depends on OS-level tooling that the immutable, shell-less OS cannot host. I developed a custom Rust daemon that interfaces directly with the I2C bus to manage fan curves from real-time thermal telemetry, maintaining optimal operating temperatures without OS-level dependencies.
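A condensed sketch of such a control loop, assuming the `i2cdev` crate and a fan controller that accepts a duty-cycle byte at a single register; the bus path, device address, register, polling interval, and fan curve below are assumptions for illustration, not the daemon's actual values.

```rust
use std::{fs, thread, time::Duration};

use i2cdev::core::I2CDevice;
use i2cdev::linux::LinuxI2CDevice;

// Assumed I2C address and register of the fan controller; the real values
// depend on the specific controller wired into the enclosure.
const FAN_CTRL_ADDR: u16 = 0x2f;
const FAN_DUTY_REG: u8 = 0x00;

/// Read the SoC temperature in degrees Celsius from sysfs (reported in millidegrees).
fn cpu_temp_c() -> std::io::Result<f32> {
    let raw = fs::read_to_string("/sys/class/thermal/thermal_zone0/temp")?;
    Ok(raw.trim().parse::<f32>().unwrap_or(0.0) / 1000.0)
}

/// Map temperature to a PWM duty cycle (0-255) along a simple linear curve.
fn fan_curve(temp_c: f32) -> u8 {
    match temp_c {
        t if t < 40.0 => 0,                       // passive below 40 °C
        t if t > 70.0 => 255,                     // full speed above 70 °C
        t => (((t - 40.0) / 30.0) * 255.0) as u8, // linear ramp in between
    }
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut fan = LinuxI2CDevice::new("/dev/i2c-1", FAN_CTRL_ADDR)?;
    loop {
        let temp = cpu_temp_c()?;
        fan.smbus_write_byte_data(FAN_DUTY_REG, fan_curve(temp))?;
        thread::sleep(Duration::from_secs(5));
    }
}
```

Because the daemon talks to sysfs and the I2C character device directly, it can run as a small container with access to `/dev/i2c-1`, with no packages or shell required on the host.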
Managing stateful workloads on ephemeral nodes required a robust storage layer. I deployed a replicated block storage system that ensures data locality where possible while guaranteeing consistency across node failures.
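The write-up doesn't name the storage system, but a replicated block-storage layer is typically consumed through a StorageClass along these lines; the provisioner and replica parameter below are hypothetical placeholders that depend on the chosen system.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: replicated-nvme                       # hypothetical name
provisioner: replicated-block.example.com     # placeholder for the storage system's provisioner
parameters:
  numberOfReplicas: "3"                       # hypothetical parameter: keep copies on multiple nodes
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer       # bind after scheduling so a replica can sit on the consuming node
```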