| 1 | +# GitHub Actions for CloudNativePG Chaos Testing |
| 2 | + |
| 3 | +This directory contains GitHub Actions workflows and reusable composite actions for automated chaos testing of CloudNativePG clusters. |
| 4 | + |
| 5 | +## Workflows |
| 6 | + |
| 7 | +### `chaos-test-full.yml` |
| 8 | + |
| 9 | +Comprehensive chaos testing workflow that validates PostgreSQL cluster resilience under failure conditions. |
| 10 | + |
| 11 | +**What it does**: |
| 12 | +- Provisions a Kind cluster using cnpg-playground |
| 13 | +- Installs CloudNativePG operator and PostgreSQL cluster |
| 14 | +- Deploys Litmus Chaos and Prometheus monitoring |
| 15 | +- Runs Jepsen consistency tests with pod-delete chaos injection |
| 16 | +- **Validates resilience** - fails the build if chaos tests don't pass |
| 17 | +- Collects comprehensive artifacts including cluster state dumps on failure |
| 18 | + |
| 19 | +**Triggers**: |
| 20 | +- **Manual**: `workflow_dispatch` with configurable chaos duration (default: 300s) |
| 21 | +- **Automatic**: Pull requests targeting the `main` branch (documentation-only changes are skipped) |
| 22 | +- **Scheduled**: Weekly on Sundays at 13:00 UTC |
| 23 | + |
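As a rough sketch, the three triggers above might map onto an `on:` block like the one below. The input name `chaos_duration` and the `paths-ignore` patterns are assumptions made for illustration; the names actually used in `chaos-test-full.yml` may differ.

```yaml
on:
  workflow_dispatch:
    inputs:
      chaos_duration:            # hypothetical input name
        description: Chaos duration in seconds
        type: string
        default: "300"
  pull_request:
    branches: [main]
    paths-ignore:                # assumed patterns for documentation-only changes
      - "**.md"
      - "docs/**"
  schedule:
    - cron: "0 13 * * 0"         # every Sunday at 13:00 UTC
```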
| 24 | +**Quality Gates**: |
| 25 | +- Litmus chaos experiment must pass |
| 26 | +- Jepsen consistency validation must pass (`:valid? true`) |
| 27 | +- Workflow fails if either check fails |
| 28 | + |
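The two gates could be enforced with steps along these lines. This is a minimal sketch, not the workflow's actual implementation: the `CHAOS_RESULT` and `NAMESPACE` values and the location of `results.edn` are placeholders.

```yaml
- name: Check Litmus verdict
  env:
    CHAOS_RESULT: my-engine-pod-delete   # placeholder ChaosResult name
    NAMESPACE: default                   # placeholder namespace
  run: |
    # Read the verdict recorded on the ChaosResult resource
    verdict=$(kubectl get chaosresult "$CHAOS_RESULT" -n "$NAMESPACE" \
      -o jsonpath='{.status.experimentStatus.verdict}')
    echo "Litmus verdict: $verdict"
    test "$verdict" = "Pass"

- name: Check Jepsen validity
  # Assumes results.edn is in the working directory
  run: grep -q ':valid? true' results.edn
```

A plain `grep` is a coarse check: `results.edn` can contain nested `:valid?` entries, so a stricter gate would parse the EDN and inspect only the top-level key.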
| 29 | +--- |
| 30 | + |
| 31 | +## Reusable Composite Actions |
| 32 | + |
| 33 | +### `free-disk-space` |
| 34 | + |
| 35 | +Removes unnecessary pre-installed software from GitHub-hosted runners to free up ~40 GB of disk space. |
| 36 | + |
| 37 | +**What it removes**: |
| 38 | +- .NET SDK (~15-20 GB) |
| 39 | +- Android SDK (~12 GB) |
| 40 | +- Haskell tools (~5-8 GB) |
| 41 | +- Large tool caches (CodeQL, Go, Python, Ruby, Node) |
| 42 | +- Unused browsers |
| 43 | + |
| 44 | +**What it preserves**: |
| 45 | +- Docker |
| 46 | +- kubectl |
| 47 | +- Kind |
| 48 | +- Helm |
| 49 | +- jq |
| 50 | + |
| 51 | +**Usage**: |
| 52 | +```yaml |
| 53 | +- name: Free disk space |
| 54 | +  uses: ./.github/actions/free-disk-space |
| 55 | +``` |
| 56 | + |
| 57 | +--- |
| 58 | + |
| 59 | +### `setup-tools` |
| 60 | + |
| 61 | +Installs chaos testing tools, or upgrades them, to their latest stable versions. |
| 62 | + |
| 63 | +**Tools installed/upgraded**: |
| 64 | +- kubectl (latest stable) |
| 65 | +- Kind (latest release) |
| 66 | +- Helm (latest via official installer) |
| 67 | +- krew (kubectl plugin manager) |
| 68 | +- kubectl-cnpg plugin (via krew) |
| 69 | + |
| 70 | +**Usage**: |
| 71 | +```yaml |
| 72 | +- name: Setup chaos testing tools |
| 73 | +  uses: ./.github/actions/setup-tools |
| 74 | +``` |
| 75 | + |
| 76 | +--- |
| 77 | + |
| 78 | +### `setup-kind` |
| 79 | + |
| 80 | +Creates a Kind cluster using the proven cnpg-playground configuration. |
| 81 | + |
| 82 | +**Features**: |
| 83 | +- Multi-node cluster with PostgreSQL-labeled nodes |
| 84 | +- Configured for HA testing |
| 85 | +- Proven configuration from cnpg-playground |
| 86 | + |
| 87 | +**Inputs**: |
| 88 | +- `region` (optional): Region name for the cluster (default: `eu`) |
| 89 | + |
| 90 | +**Outputs**: |
| 91 | +- `kubeconfig`: Path to kubeconfig file |
| 92 | +- `cluster-name`: Name of the created cluster |
| 93 | + |
| 94 | +**Usage**: |
| 95 | +```yaml |
| 96 | +- name: Create Kind cluster |
| 97 | +  uses: ./.github/actions/setup-kind |
| 98 | +  with: |
| 99 | +    region: eu |
| 100 | +``` |
| 101 | + |
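To consume the `kubeconfig` and `cluster-name` outputs, the step needs an `id` that later steps can reference. A minimal sketch, where the step id `kind` is an arbitrary choice:

```yaml
- name: Create Kind cluster
  id: kind                       # hypothetical step id
  uses: ./.github/actions/setup-kind
  with:
    region: eu

- name: List cluster nodes
  env:
    # Point kubectl at the kubeconfig path reported by the action
    KUBECONFIG: ${{ steps.kind.outputs.kubeconfig }}
  run: |
    echo "Cluster: ${{ steps.kind.outputs.cluster-name }}"
    kubectl get nodes
```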
| 102 | +--- |
| 103 | + |
| 104 | +### `setup-cnpg` |
| 105 | + |
| 106 | +Installs CloudNativePG operator and deploys a PostgreSQL cluster. |
| 107 | + |
| 108 | +**What it does**: |
| 109 | +1. Installs CNPG operator using `kubectl cnpg install generate` (recommended method) |
| 110 | +2. Waits for operator deployment to be ready |
| 111 | +3. Applies CNPG operator configuration |
| 112 | +4. Waits for webhook to be fully initialized |
| 113 | +5. Deploys PostgreSQL cluster |
| 114 | +6. Waits for cluster to be ready with health checks |
| 115 | + |
| 116 | +**Requirements**: |
| 117 | +- `clusters/cnpg-config.yaml` - CNPG operator configuration |
| 118 | +- `clusters/pg-eu-cluster.yaml` - PostgreSQL cluster definition |
| 119 | + |
| 120 | +**Usage**: |
| 121 | +```yaml |
| 122 | +- name: Setup CloudNativePG |
| 123 | +  uses: ./.github/actions/setup-cnpg |
| 124 | +``` |
| 125 | + |
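The install sequence described for `setup-cnpg` could look roughly like the composite-action steps below. This is a sketch under assumptions: the operator is installed into the plugin's default `cnpg-system` namespace, the cluster defined in `clusters/pg-eu-cluster.yaml` is named `pg-eu`, and the retry loop stands in for whatever webhook readiness check the action really performs.

```yaml
- name: Install CNPG operator
  shell: bash
  run: |
    kubectl cnpg install generate | kubectl apply --server-side -f -
    kubectl rollout status deployment/cnpg-controller-manager \
      -n cnpg-system --timeout=180s
    kubectl apply -f clusters/cnpg-config.yaml

- name: Deploy PostgreSQL cluster
  shell: bash
  run: |
    # Retry until the admission webhook accepts the Cluster resource (assumed approach)
    for attempt in 1 2 3 4 5 6; do
      kubectl apply -f clusters/pg-eu-cluster.yaml && break
      sleep 10
    done
    # Cluster name pg-eu is an assumption based on the file name
    kubectl wait --for=condition=Ready cluster/pg-eu --timeout=600s
```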
| 126 | +--- |
| 127 | + |
| 128 | +### `setup-litmus` |
| 129 | + |
| 130 | +Installs Litmus Chaos operator, experiments, and RBAC configuration. |
| 131 | + |
| 132 | +**What it installs**: |
| 133 | +- litmus-core operator (via Helm) |
| 134 | +- pod-delete chaos experiment |
| 135 | +- Litmus RBAC (ServiceAccount, ClusterRole, ClusterRoleBinding) |
| 136 | + |
| 137 | +**Verification**: |
| 138 | +- Checks all CRDs are installed |
| 139 | +- Verifies operator is ready |
| 140 | +- Validates RBAC permissions |
| 141 | + |
| 142 | +**Requirements**: |
| 143 | +- `litmus-rbac.yaml` - RBAC configuration file |
| 144 | + |
| 145 | +**Usage**: |
| 146 | +```yaml |
| 147 | +- name: Setup Litmus Chaos |
| 148 | +  uses: ./.github/actions/setup-litmus |
| 149 | +``` |
| 150 | + |
| 151 | +--- |
| 152 | + |
| 153 | +### `setup-prometheus` |
| 154 | + |
| 155 | +Installs Prometheus monitoring (without Grafana) and configures CNPG ServiceMonitor. |
| 156 | + |
| 157 | +**What it installs**: |
| 158 | +- kube-prometheus-stack (Grafana and AlertManager disabled) |
| 159 | +- Prometheus Operator |
| 160 | +- kube-state-metrics |
| 161 | +- CNPG ServiceMonitor for PostgreSQL metrics |
| 162 | + |
| 163 | +**Resource limits** (optimized for CI): |
| 164 | +- Prometheus: 512Mi request, 1Gi limit |
| 165 | +- Prometheus Operator: 128Mi request, 256Mi limit |
| 166 | + |
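Expressed as Helm values for the `kube-prometheus-stack` chart, the settings above might look like this. It is a sketch only: the figures are taken to be memory requests and limits, and the action may pass them differently (for example via `--set` flags).

```yaml
grafana:
  enabled: false
alertmanager:
  enabled: false
prometheus:
  prometheusSpec:
    resources:
      requests:
        memory: 512Mi
      limits:
        memory: 1Gi
prometheusOperator:
  resources:
    requests:
      memory: 128Mi
    limits:
      memory: 256Mi
```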
| 167 | +**Requirements**: |
| 168 | +- `monitoring/podmonitor-pg-eu.yaml` - CNPG ServiceMonitor configuration |
| 169 | + |
| 170 | +**Usage**: |
| 171 | +```yaml |
| 172 | +- name: Setup Prometheus |
| 173 | +  uses: ./.github/actions/setup-prometheus |
| 174 | +``` |
| 175 | + |
| 176 | +--- |
| 177 | + |
| 178 | +## Artifacts |
| 179 | + |
| 180 | +Each workflow run produces the following artifacts (retained for 30 days): |
| 181 | + |
| 182 | +**Jepsen Results**: |
| 183 | +- `results.edn` - Test results in EDN format |
| 184 | +- `history.edn` - Operation history |
| 185 | +- `STATISTICS.txt` - Test statistics |
| 186 | +- `*.png` - Visualization graphs |
| 187 | + |
| 188 | +**Litmus Results**: |
| 189 | +- `chaosresult.yaml` - Chaos experiment results |
| 190 | + |
| 191 | +**Logs**: |
| 192 | +- `test.log` - Complete test execution log |
| 193 | + |
| 194 | +**Cluster State** (on failure only): |
| 195 | +- `cluster-state-dump.yaml` - Complete cluster state including pods, events, and operator logs |
| 196 | + |
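The retention period and the failure-only dump could be implemented with upload steps like these. Artifact names and file locations are assumptions; in line with the pinned `actions/checkout` reference used in this README, the action would normally be pinned to a commit SHA rather than a tag.

```yaml
- name: Upload test results
  if: always()                   # upload even when the quality gates fail
  uses: actions/upload-artifact@v4
  with:
    name: chaos-test-results     # hypothetical artifact name
    path: |
      results.edn
      history.edn
      STATISTICS.txt
      *.png
      chaosresult.yaml
      test.log
    retention-days: 30

- name: Upload cluster state
  if: failure()                  # only on failed runs
  uses: actions/upload-artifact@v4
  with:
    name: cluster-state-dump     # hypothetical artifact name
    path: cluster-state-dump.yaml
    retention-days: 30
```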
| 197 | +--- |
| 198 | + |
| 199 | +## Usage in Other Workflows |
| 200 | + |
| 201 | +You can reuse these actions in your own workflows: |
| 202 | + |
| 203 | +```yaml |
| 204 | +name: My Chaos Test |
| 205 | + |
| 206 | +on: |
| 207 | +  workflow_dispatch: |
| 208 | + |
| 209 | +jobs: |
| 210 | +  test: |
| 211 | +    runs-on: ubuntu-latest |
| 212 | +    permissions: |
| 213 | +      contents: read |
| 214 | +      actions: write |
| 215 | + |
| 216 | +    steps: |
| 217 | +      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2 |
| 218 | + |
| 219 | +      - name: Free disk space |
| 220 | +        uses: ./.github/actions/free-disk-space |
| 221 | + |
| 222 | +      - name: Setup tools |
| 223 | +        uses: ./.github/actions/setup-tools |
| 224 | + |
| 225 | +      - name: Create cluster |
| 226 | +        uses: ./.github/actions/setup-kind |
| 227 | +        with: |
| 228 | +          region: us |
| 229 | + |
| 230 | +      - name: Setup CNPG |
| 231 | +        uses: ./.github/actions/setup-cnpg |
| 232 | + |
| 233 | +      # Your custom chaos testing steps here |
| 234 | +``` |
| 235 | + |
| 236 | +--- |