Talos with Cilium CNI and BGP

Introduction

I have been running a Kubernetes cluster in some form at home for about 2 years, initially with Ubuntu and kubeadm and more recently with Talos Linux.

Talos has significantly reduced toil[^1] over the last 9 months of use. I no longer need to regularly SSH into a node to upgrade its packages, wrestle with kubeadm to upgrade the cluster, or maintain Ansible playbooks for disaster recovery and maintenance.

When I came across Matthew Frost's Reddit post, and their subsequent guide and code, I was immediately interested.

[^1]: Toil in the sense of busy work.

The Hardware

  • 5x Raspberry Pi 4b
    • 8GB RAM
    • 32 GB SD card
    • 128 GB USB storage (only on 3-4 Pis, as some have died or are in the process of dying)
    • PoE Hats
  • NUC with an Intel N6005
    • 46 GB RAM
    • 1 TB NVMe
    • Running Proxmox with these VMs:
      • 4x Talos nodes (2 vCPU, 8 GB RAM, 10 GB system disk, 200 GB scratch disk)

Talos installation

I will keep this section light, as the Talos docs have comprehensive guides and examples for installation and configuration on different systems.

I will share that I am using Taskfile to help abstract commands.

Config generation

Note: I am using proposal 2 of Taskfile's experimental Map Variables feature.

Taskfile.yaml
version: "3"
vars:
  MEMBERS:
    map:
      SEYCHELLES01: {"TYPE": "controlplane", "IP": "192.168.10.1", "IS_RPI": true}
      # SEYCHELLES02: { "TYPE": "worker", "IP": "192.168.10.2", "IS_RPI": true }
      # SEYCHELLES03: { "TYPE": "worker", "IP": "192.168.10.3", "IS_RPI": true }
      # SEYCHELLES04: { "TYPE": "worker", "IP": "192.168.10.4", "IS_RPI": true }
      # SEYCHELLES05: { "TYPE": "controlplane", "IP": "192.168.10.5", "IS_RPI": true }
      SEYCHELLES101: {"TYPE": "controlplane", "IP": "192.168.10.101", "IS_RPI": false}
      SEYCHELLES102: {"TYPE": "worker", "IP": "192.168.10.102", "IS_RPI": false}
      SEYCHELLES103: {"TYPE": "worker", "IP": "192.168.10.103", "IS_RPI": false}
      SEYCHELLES104: {"TYPE": "worker", "IP": "192.168.10.104", "IS_RPI": false}
tasks:
  get-secrets:
    cmd: op document get "Talos Secrets" --out-file secrets.yaml
    method: checksum
    sources:
      - secrets.yaml
    generates:
      - secrets.yaml
  generate-config:
    internal: true
    deps:
      - get-secrets
    requires:
      vars:
        - NAME
        - name: TYPE
          enum: [controlplane, worker]
        - IS_RPI
    vars:
      IS_CONTROLPLANE:
        ref: eq .TYPE "controlplane"
      IS_RPI: # I can't seem to get IS_RPI passed in as a bool instead of a string, so just doing this and moving on
        ref: eq .IS_RPI "true"
    cmd: |
      talosctl gen config --force \
      --with-secrets secrets.yaml \
      --output-types {{ .TYPE }} \
      --config-patch @cni.patch \
      --config-patch @logging.patch \
      {{- if .IS_RPI }}
      --config-patch @install-rpi.patch \
      {{- end -}}
      {{ if .IS_CONTROLPLANE }}
      --config-patch @network-controlplane.patch \
      {{- end }}
      --config-patch @{{ .NAME }}.patch \
      --output _out/{{ .NAME }}.yaml \
      --with-docs=false --with-examples=false \
      seychelles https://192.168.10.99:6443
    method: checksum
    label: "{{ .NAME }}"
    sources:
      - secrets.yaml
      - cni.patch
      - logging.patch
      - install-rpi.patch
      - network-controlplane.patch
      - "{{ .NAME }}.patch"
    generates:
      - _out/{{ .NAME }}.yaml
  generate-config-all:
    cmds:
      - for: {var: MEMBERS, as: NODE}
        task: generate-config
        vars:
          NAME: "{{ .KEY }}"
          TYPE: "{{ .NODE.TYPE }}"
          IS_RPI: "{{ .NODE.IS_RPI }}" # The ref version still passes this var to the task as a string
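
Once the configs are generated they still have to be pushed to each node. The following is a minimal sketch of companion tasks that could sit alongside the ones above, reusing the same MEMBERS map. The task names `apply-config` and `apply-config-all` are hypothetical and not part of the Taskfile shown here; only the `talosctl apply-config` command with its `--nodes` and `--file` flags comes from Talos itself.

```yaml
# Hypothetical additions to the tasks: section of the Taskfile above
tasks:
  apply-config:
    internal: true
    requires:
      vars: [NAME, IP]
    # Send the generated machine config for one node to that node's IP
    cmd: talosctl apply-config --nodes {{ .IP }} --file _out/{{ .NAME }}.yaml
    label: "apply-{{ .NAME }}"
  apply-config-all:
    cmds:
      # Same loop shape as generate-config-all, but passing the IP instead of the type
      - for: {var: MEMBERS, as: NODE}
        task: apply-config
        vars:
          NAME: "{{ .KEY }}"
          IP: "{{ .NODE.IP }}"
```

A node that is still in maintenance mode (first install) would most likely also need the `--insecure` flag, since it does not yet trust the cluster's certificates.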

I have split my configuration over several patch files. The cni.patch is notable: it disables the default CNI[^2] and kube-proxy, so the cluster will not reach a healthy state until a CNI is installed, which I have elected to do separately.

cni.patch
cluster:
  network:
    cni:
      name: none
  proxy:
    disabled: true

[^2]: CNI - Container Network Interface
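
With the built-in CNI and kube-proxy both turned off, whatever CNI is installed later has to take over service routing as well. For Cilium on Talos, the Helm values usually need to look something like the sketch below. This is an illustration based on the settings the Talos documentation recommends for Cilium, not a copy of my actual values.yaml. Pointing `k8sServiceHost` at the control plane VIP from this post is an assumption (the Talos docs more commonly use `localhost:7445` via KubePrism).

```yaml
# Illustrative Cilium Helm values for a Talos cluster without kube-proxy
ipam:
  mode: kubernetes
kubeProxyReplacement: true        # Cilium takes over the job of kube-proxy
k8sServiceHost: 192.168.10.99     # assumption: the control plane VIP used in this post
k8sServicePort: 6443
cgroup:
  autoMount:
    enabled: false                # Talos already mounts cgroup v2
  hostRoot: /sys/fs/cgroup
securityContext:
  capabilities:
    ciliumAgent: [CHOWN, KILL, NET_ADMIN, NET_RAW, IPC_LOCK, SYS_ADMIN, SYS_RESOURCE, DAC_OVERRIDE, FOWNER, SETGID, SETUID]
    cleanCiliumState: [NET_ADMIN, SYS_ADMIN, SYS_RESOURCE]
bgpControlPlane:
  enabled: true                   # assumption: needed for the BGP setup in the title
gatewayAPI:
  enabled: true                   # assumption: needs the Gateway API CRDs installed first
```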

Other patches ... and a side note on naming

install-rpi.patch
# Selectively included on Raspberry Pi nodes
machine:
  install:
    disk: /dev/mmcblk0

logging.patch
machine:
  logging:
    destinations:
      - endpoint: "udp://cairo.local:514/"
        format: "json_lines"

network-controlplane.patch
# Selectively included on control plane nodes
machine:
  network:
    interfaces:
    - interface: eth0
      dhcp: true
      vip:
        ip: 192.168.10.99

host.patch
# Repeated for each host, with that host's hostname
machine:
  network:
    hostname: seychelles101
  nodeLabels:
    node.kubernetes.io/instance-type: santorini-vm

I organise my infra folder in tiers, and name machines after international destinations I have visited.

| Tier | Description | Examples |
|---|---|---|
| secrets | Secrets generated and saved with IaC | generating/rotating/saving fine-grained API keys |
| metal | Machine creation and configuration | Ansible, Terraform, Talos |
| foundation | Basic cluster needs | networking, secrets retrieval, mail, permissions, DNS records |
| core_services | Services shared between apps | Databases, ClusterTunnel |
| applications | Services used directly | APIs, Apps, Sites, and related infra (gateways) |

Cilium CNI and ArgoCD Installation and configuration

In general I prefer Helm combined with ArgoCD's App of Apps pattern.

Here is an overview of the next steps:

  1. Add the ArgoCD CRDs
  2. Retrieve secrets, to be consumed by Kustomize
  3. Set up the Helm app of apps with Cilium and ArgoCD
  4. Format the values.yaml for Cilium and ArgoCD to be consumed by Kustomize
  5. Run kubectl kustomize | kubectl apply -f -, which additionally:
    1. Installs the Helm charts for Cilium and ArgoCD
    2. Adds the Kubernetes gateway-api CRDs (at the version Cilium requires)
    3. Adds the ArgoCD namespace
    4. Adds the app_of_apps.yaml
    5. Creates a secret with the label argocd.argoproj.io/secret-type: repository so ArgoCD can access my private repo hosting the code.
  6. Wait for everything to roll out.
  7. Tell ArgoCD to take over the life-cycle of the helm charts.
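
To make step 5 more concrete, a kustomization.yaml covering most of those sub-steps might look roughly like the sketch below. It is an illustration, not the actual file: the values file names, the secret name, the `argocd` namespace name and the Gateway API version are all assumptions, and the namespace resource is left out. Only the Kustomize fields themselves (`resources`, `helmCharts`, `secretGenerator`) are standard.

```yaml
# infra/foundation/kustomization.yaml (illustrative sketch, not the original file)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

resources:
  # Gateway API CRDs; v1.1.0 is an assumed version, match it to the Cilium release in use
  - https://github.com/kubernetes-sigs/gateway-api/releases/download/v1.1.0/standard-install.yaml
  - app_of_apps.yaml

helmCharts:
  - name: cilium
    repo: https://helm.cilium.io
    releaseName: cilium
    namespace: kube-system
    valuesFile: _out/cilium-values.yaml    # hypothetical file name
  - name: argo-cd
    repo: https://argoproj.github.io/argo-helm
    releaseName: argocd
    namespace: argocd                      # assumed namespace name
    valuesFile: _out/argocd-values.yaml    # hypothetical file name

secretGenerator:
  - name: gitlab-repo-creds                # hypothetical secret name
    namespace: argocd
    envs:
      - _out/gitlab-creds.txt              # assumed to be rendered from gitlab-creds.txt.tmpl
    options:
      disableNameSuffixHash: true
      labels:
        argocd.argoproj.io/secret-type: repository
```

Kustomize only inflates `helmCharts` entries when Helm support is switched on, so a setup like this would generally need `kubectl kustomize --enable-helm`. In practice it is also worth pinning a `version:` on each chart.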

And a visualisation of the folder layout:

infra
├── foundation
│   ├── _out
│   │   └── # un-committed files for consumption by kustomize
│   ├── app-of-apps
│   │   ├── Chart.yaml
│   │   ├── templates
│   │   │   ├── argo_app.yaml
│   │   │   ├── argo_namespace.yaml
│   │   │   └── cilium_app.yaml
│   │   └── values.yaml
│   ├── app_of_apps.yaml
│   ├── gitlab-creds.txt.tmpl
│   └── kustomization.yaml
└── metal
    ├── Taskfile.yaml
    ├── _out
    │   └── # generated configs
    ├── cni.patch
    ├── install-rpi.patch
    ├── logging.patch
    ├── network-controlplane.patch
    ├── secrets.yaml
    ├── seychelles01.patch
    ├── seychelles101.patch
    ├── seychelles102.patch
    ├── seychelles103.patch
    └── seychelles104.patch
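
For reference, the app_of_apps.yaml at the top of the foundation folder would typically be a single ArgoCD Application pointing at the app-of-apps chart directory. The sketch below shows the general shape only. The repository URL is a placeholder, and the application name, project, namespace and sync policy are assumptions rather than the values I actually use.

```yaml
# Illustrative app_of_apps.yaml: one Application that renders the app-of-apps chart
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: app-of-apps                  # hypothetical name
  namespace: argocd                  # assumed ArgoCD namespace
spec:
  project: default
  source:
    repoURL: https://gitlab.com/example/infra.git   # placeholder for the private repo
    targetRevision: HEAD
    path: infra/foundation/app-of-apps              # the chart directory from the layout above
  destination:
    server: https://kubernetes.default.svc          # the cluster ArgoCD itself runs in
    namespace: argocd
  syncPolicy:
    automated:
      prune: true                    # assumption: remove resources deleted from git
      selfHeal: true                 # assumption: revert manual drift
```

The templates inside the chart (argo_app.yaml, cilium_app.yaml) would then each define a further Application, which is how ArgoCD ends up managing the Helm releases that were first installed by hand.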

Cilium

BGP

Gateway API

ArgoCD

External DNS

External Secrets Operator with 1Password

Certman