... automated via Flux, Renovate, and GitHub Actions
This is a repository for my home infrastructure and Kubernetes cluster. I try to adhere to Infrastructure as Code (IaC) and GitOps practices using tools like Kubernetes, Flux, Renovate, and GitHub Actions.
This semi hyper-converged cluster operates on Talos Linux, an immutable and ephemeral Linux distribution tailored for Kubernetes, and is deployed on bare-metal MS-A2 workstations. Rook supplies my workloads with persistent block, object, and file storage, while a separate server handles media file storage. The cluster is designed to enable a full teardown without any data loss.
There is a template at onedr0p/cluster-template if you want to follow along with some of the practices I use here.
- actions-runner-controller: Self-hosted GitHub runners.
- cert-manager: Creates SSL certificates for services in my cluster.
- cilium: eBPF-based networking for my workloads.
- cloudflared: Enables Cloudflare secure access to my routes.
- external-dns: Automatically syncs ingress DNS records to a DNS provider.
- external-secrets: Manages Kubernetes secrets using 1Password Connect.
- multus: Multi-homed pod networking.
- rook: Distributed block storage for persistent storage.
- spegel: Stateless cluster local OCI registry mirror.
- volsync: Backup and recovery of persistent volume claims.
Flux watches my kubernetes folder (see Directories below) and makes the changes to my clusters based on the state of my Git repository.
Flux recursively searches the kubernetes/apps folder until it finds the top-most kustomization.yaml in each directory, then applies all the resources listed in it. That kustomization.yaml generally contains only a namespace resource and one or more Flux Kustomizations (ks.yaml). Each Flux Kustomization in turn applies a HelmRelease or other resources related to the application.
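As a rough illustration of that layout, the sketch below shows what a namespace-level kustomization.yaml and one Flux Kustomization (ks.yaml) might look like. The namespace `default`, the app name `echo`, the file paths, and the `flux-system` GitRepository name are placeholders assumed for the example, not values taken from this repository.

```yaml
# Hypothetical file: kubernetes/apps/default/kustomization.yaml
# The top-most kustomization.yaml that Flux finds in this directory.
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: default
resources:
  - ./namespace.yaml   # the namespace resource
  - ./echo/ks.yaml     # one Flux Kustomization per application
---
# Hypothetical file: kubernetes/apps/default/echo/ks.yaml
# The Flux Kustomization that applies the app's HelmRelease and related resources.
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: echo
  namespace: flux-system
spec:
  interval: 1h
  path: ./kubernetes/apps/default/echo/app   # assumed folder holding the HelmRelease
  prune: true
  sourceRef:
    kind: GitRepository
    name: flux-system                        # assumed name of the Git source
  targetNamespace: default
  wait: false
```

Keeping the namespace-level file this small means adding an application is mostly a matter of adding one more ks.yaml entry to `resources`.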
Renovate monitors my entire repository for dependency updates and automatically creates a PR when updates are found. Once a PR is merged, Flux applies the changes to my cluster.
This Git repository contains the following directories under kubernetes.
📁 kubernetes      # Kubernetes cluster defined as code
├─📁 apps          # Apps deployed into my cluster grouped by namespace (see below)
├─📁 components    # Re-usable kustomize components
└─📁 flux          # Flux system configuration
This shows the two fundamental infrastructure workflows that enable secure, stateful applications in GitOps. These dependency chains solve the hardest problems newcomers face when building production-ready clusters: secrets management and persistent storage.
Solves the "chicken and egg" problem of bootstrapping secrets in GitOps
graph LR
A[1Password Vault]:::vault -->|Credentials| B[1Password Connect]:::connect
B -->|Creates| C[ClusterSecretStore]:::store
C -->|Enables| D[ExternalSecret]:::external
D -->|Syncs to| E[Kubernetes Secret]:::secret
E -->|Consumed by| F[Application Pods]:::app
classDef vault fill:#0066CC,stroke:#004499,stroke-width:2px,color:#fff;
classDef connect fill:#FF6B35,stroke:#CC5529,stroke-width:2px,color:#fff;
classDef store fill:#9C27B0,stroke:#7B1FA2,stroke-width:2px,color:#fff;
classDef external fill:#FF9800,stroke:#F57C00,stroke-width:2px,color:#fff;
classDef secret fill:#4CAF50,stroke:#2E7D32,stroke-width:2px,color:#fff;
classDef app fill:#2196F3,stroke:#0D47A1,stroke-width:2px,color:#fff;
Provides persistent storage with automated backup/restore capabilities
graph TD
A[Snapshot Controller]:::controller -->|Manages| B[VolumeSnapshots]:::snapshot
A -->|Enables| C[Rook Ceph Operator]:::operator
C -->|Creates| D[Ceph Cluster]:::cluster
D -->|Provides| E[StorageClasses]:::storage
E -->|Creates| F[PVCs]:::pvc
F -->|Backed up by| G[VolSync]:::volsync
G -->|Uses| B
F -->|Consumed by| H[Stateful Apps]:::app
classDef controller fill:#6A1B9A,stroke:#4A148C,stroke-width:2px,color:#fff;
classDef snapshot fill:#8E24AA,stroke:#6A1B9A,stroke-width:2px,color:#fff;
classDef operator fill:#00ACC1,stroke:#00838F,stroke-width:2px,color:#fff;
classDef cluster fill:#26A69A,stroke:#00695C,stroke-width:2px,color:#fff;
classDef storage fill:#66BB6A,stroke:#2E7D32,stroke-width:2px,color:#fff;
classDef pvc fill:#42A5F5,stroke:#0D47A1,stroke-width:2px,color:#fff;
classDef volsync fill:#FFA726,stroke:#E65100,stroke-width:2px,color:#fff;
classDef app fill:#EF5350,stroke:#C62828,stroke-width:2px,color:#fff;
Why These Workflows Matter:
Security Pipeline: Traditional "secrets in git" approaches don't work for production. This 1Password Connect workflow solves the bootstrap problem by providing a secure, auditable way to inject secrets into your cluster without storing them in Git. Each ExternalSecret automatically syncs from 1Password, enabling secure GitOps practices.
Why External Secret Stores? While SOPS (Secrets OPerationS) is popular for encrypting secrets in GitOps homelabs, I just don't like it. I don't like dealing with the VS Code plugins, and I don't like managing yet another tool. What I do like is using the secret manager where my passwords already live.
Secret Store Options: This cluster uses 1Password Connect (self-hosted), but there are many alternatives. Popular choices include HashiCorp Vault (powerful but complex), Bitwarden (familiar and self-hostable), Infisical (modern developer experience), and Doppler (simple cloud integration). 1Password also offers Service Accounts for easier cloud-based setup.
Check the Home Operations Discord for community experiences with different providers. Your choice depends on your security requirements, operational preferences, and existing infrastructure.
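The ClusterSecretStore and ExternalSecret steps of the security pipeline can be sketched with the External Secrets Operator 1Password Connect provider. Treat this as an assumption-laden example: the store name `onepassword`, the Connect URL, the vault name `Kubernetes`, the token Secret, and the item name `my-app` are all hypothetical, and the `apiVersion` depends on the installed operator version (older releases serve `external-secrets.io/v1beta1`).

```yaml
# Cluster-wide store that points at a 1Password Connect instance (names assumed).
apiVersion: external-secrets.io/v1
kind: ClusterSecretStore
metadata:
  name: onepassword
spec:
  provider:
    onepassword:
      connectHost: http://onepassword-connect.external-secrets.svc.cluster.local:8080
      vaults:
        Kubernetes: 1            # vault name mapped to its lookup priority
      auth:
        secretRef:
          connectTokenSecretRef:
            name: onepassword-connect-token   # hypothetical bootstrap Secret
            key: token
            namespace: external-secrets
---
# Per-application ExternalSecret that syncs one 1Password item into a Kubernetes Secret.
apiVersion: external-secrets.io/v1
kind: ExternalSecret
metadata:
  name: my-app
  namespace: default
spec:
  refreshInterval: 1h
  secretStoreRef:
    kind: ClusterSecretStore
    name: onepassword
  target:
    name: my-app-secret          # the Kubernetes Secret consumed by the pods
  dataFrom:
    - extract:
        key: my-app              # hypothetical 1Password item name
```

Only the Connect token has to exist in the cluster before Flux reconciles; every other secret is pulled from the vault, which is what keeps credentials out of Git.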
Storage Foundation: Stateful applications need reliable storage with backup/restore capabilities. This workflow shows how Rook Ceph provides distributed storage while VolSync handles automated backups using VolumeSnapshots. The Snapshot Controller enables point-in-time recovery for all your critical data.
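A minimal sketch of the backup half of that workflow is a VolSync ReplicationSource that snapshots a PVC and ships it to a restic repository. The PVC name, the restic Secret, the schedule, and the `ceph-block` / `csi-ceph-blockpool` class names are assumptions for illustration; substitute whatever your Rook Ceph install actually exposes.

```yaml
# Hypothetical backup definition for one application's PVC.
apiVersion: volsync.backube/v1alpha1
kind: ReplicationSource
metadata:
  name: my-app
  namespace: default
spec:
  sourcePVC: my-app-data                  # PVC created from a Rook Ceph StorageClass
  trigger:
    schedule: "0 * * * *"                 # hourly, cron syntax
  restic:
    repository: my-app-restic-secret      # Secret holding the restic repository settings
    copyMethod: Snapshot                  # back up from a VolumeSnapshot, not the live volume
    volumeSnapshotClassName: csi-ceph-blockpool   # assumed VolumeSnapshotClass name
    storageClassName: ceph-block                  # assumed StorageClass name
    pruneIntervalDays: 14
    retain:
      hourly: 24
      daily: 7
```

Using `copyMethod: Snapshot` is what ties VolSync to the Snapshot Controller in the diagram: the backup reads from a point-in-time snapshot, so the application keeps writing to its volume while the copy runs.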
In my cluster there are two instances of ExternalDNS running. One syncs private DNS records to my UDM Pro Max using the ExternalDNS webhook provider for UniFi, while the other syncs public DNS records to Cloudflare. This setup is managed by creating routes with two specific gateways: internal for private DNS and external for public DNS. The external-dns instances then sync the DNS records to their respective platforms accordingly.
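To make the gateway split concrete, here is a sketch of a Gateway API HTTPRoute attached to the external gateway. The hostname, the backend Service, the `kube-system` namespace, and the `https` listener name are assumed for the example; only the gateway names `internal` and `external` come from the description above.

```yaml
# Hypothetical route for a publicly reachable app.
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: my-app
  namespace: default
spec:
  hostnames:
    - my-app.example.com
  parentRefs:
    - name: external          # use "internal" instead to publish private DNS only
      namespace: kube-system  # assumed namespace of the Gateway
      sectionName: https      # assumed listener name
  rules:
    - backendRefs:
        - name: my-app        # assumed Service name
          port: 80
```

Because each ExternalDNS instance only looks at routes bound to its own gateway, switching `parentRefs` is enough to move a hostname between the private and public DNS providers.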
| Device | Count | OS Disk Size | Data Disk Size | RAM | Operating System | Purpose |
|---|---|---|---|---|---|---|
| Beelink EQ12 | 2 | 512GB (SSD) | 512GB (NVME) | 32GB | Talos | Kubernetes |
| Intel NUC7 | 1 | 512GB (SSD) | 512GB (NVME) | 32GB | Talos | Kubernetes |
| 45Drives HL15 | 1 | 2x512GB (SSD) | 8x14TB HDD | 128GB | TrueNAS Scale | NFS |
| PiKVM (RasPi 4) | 1 | - | - | 4GB | PiKVM | KVM |
| TESmart 8 Port KVM Switch | 1 | - | - | - | - | Network KVM (for PiKVM) |
| UniFi Gateway Max | 1 | - | 512GB (NVME) | - | UniFi OS | Router & NVR |
| UniFi USW Enterprise 8 POE | 1 | - | - | - | UniFi OS | 2.5Gb Core Switch |
| UniFi USW Pro 8 | 1 | - | - | - | UniFi OS | Garage PoE Switch |
| Lenovo Thinkstation P520 | 1 | - | Many Mixed NVME's | 128GB | UnRAID | Secondary/Flash NAS |
- Renovate Permission Issues: If you see "Cannot access vulnerability alerts" or "Package lookup failures", see docs/RENOVATE-TROUBLESHOOTING.md
- Cluster Issues: For node, storage, or networking problems, see docs/CLUSTER-TROUBLESHOOTING.md
- Setup Issues: For initial setup problems, see docs/SETUP-GUIDE.md
# Fix secret sync issues
task k8s:sync-secrets
# Fix Renovate permissions
./scripts/fix-renovate-permissions.sh
# Browse storage issues
task k8s:browse-pvc CLAIM=<pvc-name>
Many thanks to @onedr0p, @buroa, and all the fantastic people who donate their time to the Home Operations Discord community. Be sure to check out kubesearch.dev for ideas on how to deploy applications or what you might deploy.
See the latest release notes.
See LICENSE.
