# Talos with Cilium CNI, BGP and ArgoCD

## Introduction
I have been running a Kubernetes cluster in some form at home for about two years, initially with Ubuntu and kubeadm and more recently with Talos Linux.

Talos has reduced toil[^1] significantly over the last nine months of use. No longer do I need to regularly SSH into a node and upgrade its packages, wrangle with kubeadm to upgrade the cluster, or maintain Ansible playbooks for disaster recovery and maintenance.

When I came across Matthew Frost's Reddit post and the subsequent guide and code, I was immediately intrigued.
[^1]: Toil in the sense of busy work.
## The Hardware
- 5x Raspberry Pi 4B
  - 8 GB RAM
  - 32 GB SD card
  - 128 GB USB storage (only on 3-4 Pis, as some have died or are in the process of dying)
  - PoE HATs
- NUC with an Intel N6005
  - 46 GB RAM
  - 1 TB NVMe
  - Running Proxmox with these VMs ...
    - 4x Talos nodes (2 vCPU, 8 GB RAM, 10 GB system disk, 200 GB scratch disk)
## Talos installation

I will be light here, as the Talos docs have comprehensive guides and examples for installation and configuration on different systems. I will share that I am using Taskfile to help abstract commands.
### Config generation

Note: I am using proposal 2 of the experimental Map Variables feature.
```yaml
version: "3"

vars:
  MEMBERS:
    map:
      SEYCHELLES01: { "TYPE": "controlplane", "IP": "192.168.10.1", "IS_RPI": true }
      # SEYCHELLES02: { "TYPE": "worker", "IP": "192.168.10.2", "IS_RPI": true }
      # SEYCHELLES03: { "TYPE": "worker", "IP": "192.168.10.3", "IS_RPI": true }
      # SEYCHELLES04: { "TYPE": "worker", "IP": "192.168.10.4", "IS_RPI": true }
      # SEYCHELLES05: { "TYPE": "controlplane", "IP": "192.168.10.5", "IS_RPI": true }
      SEYCHELLES101: { "TYPE": "controlplane", "IP": "192.168.10.101", "IS_RPI": false }
      SEYCHELLES102: { "TYPE": "worker", "IP": "192.168.10.102", "IS_RPI": false }
      SEYCHELLES103: { "TYPE": "worker", "IP": "192.168.10.103", "IS_RPI": false }
      SEYCHELLES104: { "TYPE": "worker", "IP": "192.168.10.104", "IS_RPI": false }

tasks:
  get-secrets:
    cmd: op document get "Talos Secrets" --out-file secrets.yaml
    method: checksum
    sources:
      - secrets.yaml
    generates:
      - secrets.yaml

  generate-config:
    internal: true
    deps:
      - get-secrets
    requires:
      vars:
        - NAME
        - name: TYPE
          enum: [controlplane, worker]
        - IS_RPI
    vars:
      IS_controlplane:
        ref: eq .TYPE "controlplane"
      IS_RPI: # I can't seem to get IS_RPI passed in as a bool instead of a string, so just doing this and moving on
        ref: eq .IS_RPI "true"
    cmd: |
      talosctl gen config --force \
        --with-secrets secrets.yaml \
        --output-types {{ .TYPE }} \
        --config-patch @cni.patch \
        --config-patch @logging.patch \
        {{- if .IS_RPI }}
        --config-patch @install-rpi.patch \
        {{- end -}}
        {{ if .IS_controlplane }}
        --config-patch @network-controlplane.patch \
        {{- end }}
        --config-patch @{{ .NAME }}.patch \
        --output _out/{{ .NAME }}.yaml \
        --with-docs=false --with-examples=false \
        seychelles https://192.168.10.99:6443
    method: checksum
    label: "{{ .NAME }}"
    sources:
      - secrets.yaml
      - cni.patch
      - logging.patch
      - install-rpi.patch
      - network-controlplane.patch
      - "{{ .NAME }}.patch"
    generates:
      - _out/{{ .NAME }}.yaml

  generate-config-all:
    cmds:
      - for: { var: MEMBERS, as: NODE }
        task: generate-config
        vars:
          NAME: "{{ .KEY }}"
          TYPE: "{{ .NODE.TYPE }}"
          IS_RPI: "{{ .NODE.IS_RPI }}" # The ref version still passes this var to the task as a string
```
I have split my configuration over several patch files. The `cni.patch` is notable because it will prevent the cluster from reaching a healthy state on its own, as I have elected to install the CNI[^2] separately.
```yaml
cluster:
  network:
    cni:
      name: none
  proxy:
    disabled: true
```
[^2]: CNI - Container Network Interface
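Since this patch also disables kube-proxy, Cilium has to take over service handling when it is installed later. A minimal sketch of the Helm values that implies, using the control-plane VIP from my setup (the exact values here are assumptions about my environment, not a complete values file):

```yaml
# values.yaml fragment for the Cilium chart (sketch)
kubeProxyReplacement: true
# With kube-proxy gone, Cilium needs a direct route to the API server;
# 192.168.10.99:6443 is the control-plane VIP endpoint from the Talos config.
k8sServiceHost: 192.168.10.99
k8sServicePort: 6443
```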
### Other patches ... and a side note on naming
```yaml
# Selectively included on Raspberry Pi nodes
machine:
  install:
    disk: /dev/mmcblk0
```
```yaml
machine:
  logging:
    destinations:
      - endpoint: "udp://cairo.local:514/"
        format: "json_lines"
```
```yaml
# Selectively included on control plane nodes
machine:
  network:
    interfaces:
      - interface: eth0
        dhcp: true
        vip:
          ip: 192.168.10.99
```
```yaml
# This is repeated for each host, with its own hostname
machine:
  network:
    hostname: seychelles101
  nodeLabels:
    node.kubernetes.io/instance-type: santorini-vm
```
| Tier | Description | Examples |
| --- | --- | --- |
| secrets | Secrets generated and saved with IaC | generating/rotating/saving fine-grained API keys |
| metal | Machine creation and configuration | Ansible, Terraform, Talos |
| foundation | Basic cluster needs | networking, secrets retrieval, mail, permissions, DNS records |
| core_services | Services shared between apps | databases, ClusterTunnel |
| applications | Services used directly | APIs, apps, sites, and related infra (gateways) |
## Cilium CNI and ArgoCD installation and configuration

In general I prefer Helm combined with ArgoCD's App of Apps pattern. Here is an overview of the next steps:
- Add the ArgoCD CRDs.
- Retrieve secrets, to be consumed by Kustomize.
- Set up the Helm app of apps with Cilium and ArgoCD.
- Format the `values.yaml` for Cilium and ArgoCD to be consumed by Kustomize.
- Run `kubectl kustomize | kubectl apply -f -`, which does the following additional steps:
  - Installs the Helm charts for Cilium and ArgoCD
  - Adds the Kubernetes Gateway API CRDs (at the version Cilium requires)
  - Adds the ArgoCD namespace
  - Adds the `app_of_apps.yaml`
- Create a secret with the label `argocd.argoproj.io/secret-type: repository` so ArgoCD can access my private repo hosting the code.
- Wait for everything to roll out.
- Tell ArgoCD to take over the life-cycle of the Helm charts.
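The repository secret from the steps above looks roughly like this; the secret name, repo URL, and credentials here are placeholders for illustration, not my actual values:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: private-repo            # placeholder name
  namespace: argocd
  labels:
    # This label is what tells ArgoCD to treat the secret as a repository definition
    argocd.argoproj.io/secret-type: repository
stringData:
  type: git
  url: https://gitlab.com/example/infra.git  # placeholder URL
  username: git                              # placeholder credentials
  password: <access-token>
```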
And a visualisation of the folder layout:

```
infra
├── foundation
│   ├── _out
│   │   └── # un-committed files for consumption by kustomize
│   ├── app-of-apps
│   │   ├── Chart.yaml
│   │   ├── templates
│   │   │   ├── argo_app.yaml
│   │   │   ├── argo_namespace.yaml
│   │   │   └── cilium_app.yaml
│   │   └── values.yaml
│   ├── app_of_apps.yaml
│   ├── gitlab-creds.txt.tmpl
│   └── kustomization.yaml
└── metal
    ├── Taskfile.yaml
    ├── _out
    │   └── # generated configs
    ├── cni.patch
    ├── install-rpi.patch
    ├── logging.patch
    ├── network-controlplane.patch
    ├── secrets.yaml
    ├── seychelles01.patch
    ├── seychelles101.patch
    ├── seychelles102.patch
    ├── seychelles103.patch
    └── seychelles104.patch
```
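For a sense of what lives in the app-of-apps templates, a `cilium_app.yaml` could look roughly like the sketch below. The chart version and sync policy are assumptions for illustration; only the Application shape itself comes from ArgoCD's API:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: cilium
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://helm.cilium.io   # upstream Cilium chart repo
    chart: cilium
    targetRevision: 1.15.0            # placeholder version
  destination:
    server: https://kubernetes.default.svc
    namespace: kube-system
  syncPolicy:
    automated:                        # assumed policy; lets ArgoCD take over the chart life-cycle
      prune: true
      selfHeal: true
```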