
Conversation

@XploY04 (Collaborator) commented Oct 2, 2025

This PR introduces a production-ready chaos testing framework for CloudNativePG clusters, combining Jepsen (formal consistency verification) with LitmusChaos (pod deletion) to provide rigorous evidence of data integrity under failure conditions.

Part of LFX Mentorship Program 2025/3 to strengthen CloudNativePG resilience through rigorous testing.

Closes #2

What's Included

chaos-testing/
├── README.md                          # Comprehensive documentation (1300+ lines)
├── pg-eu-cluster.yaml                 # PostgreSQL cluster config
├── litmus-rbac.yaml                   # Chaos permissions
├── experiments/
│   └── cnpg-jepsen-chaos.yaml         # Combined Jepsen + chaos test
├── workloads/
│   ├── jepsen-cnpg-job.yaml           # Jepsen test job
│   └── jepsen-results-pvc.yaml        # Result storage
└── scripts/
    ├── run-jepsen-chaos-test.sh       # Main orchestration
    ├── monitor-cnpg-pods.sh           # Real-time monitoring
    └── get-chaos-results.sh           # Result extraction

Test Workflow

  1. Deploy CloudNativePG cluster (1 primary + 2 replicas)
  2. Start Jepsen workload (continuous read/write operations)
  3. Inject chaos (delete primary pod every 180s)
  4. CloudNativePG performs automatic failover
  5. Jepsen continues operations throughout chaos
  6. Elle checker analyzes history for consistency violations
  7. Generate verdict: :valid? true (PASS) or :valid? false (FAIL)
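
For reference, the failover injected in steps 3 and 4 can be reproduced by hand against the same cluster; this is a manual sketch, assuming CNPG's default labels and the pg-eu cluster name used below:

```bash
# Delete the current primary and watch CloudNativePG promote a replica.
kubectl delete pod -l cnpg.io/cluster=pg-eu,cnpg.io/instanceRole=primary
kubectl get pods -l cnpg.io/cluster=pg-eu -L cnpg.io/instanceRole -w
```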

Quick Start

# Deploy PostgreSQL cluster
kubectl apply -f pg-eu-cluster.yaml
kubectl wait --for=condition=ready cluster/pg-eu --timeout=300s

# Configure chaos RBAC
kubectl apply -f litmus-rbac.yaml

# Run 5-minute test
./scripts/run-jepsen-chaos-test.sh

# Check results
grep ":valid?" logs/jepsen-chaos-*/results/results.edn
cat logs/jepsen-chaos-*/STATISTICS.txt

Expected Results

Successful Test

{:valid? true
 :anomaly-types []
 :not #{}}

Statistics

Total :ok     : 14,523  (97.03%)  # Successful operations
Total :fail   : 445     (2.97%)   # Expected during failover
Total :info   : 0       (0.00%)   # Indeterminate operations

Result Files

  • results.edn - Consistency verdict (:valid? true/false)
  • history.edn - Complete operation log (3-6 MB)
  • timeline.html - Interactive visualization
  • STATISTICS.txt - High-level summary
  • jepsen.log - Full test execution logs
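
A minimal sketch, assuming the logs/jepsen-chaos-*/results/ layout above, of turning the verdict into a shell exit code:

```bash
# Locate the most recent run directory and fail unless Elle reported :valid? true.
latest=$(ls -td logs/jepsen-chaos-*/ | head -1)
if grep -q ":valid? true" "${latest}results/results.edn"; then
  echo "Consistency check PASSED"
else
  echo "Consistency check FAILED"
  exit 1
fi
```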

Technical Highlights

Dynamic Pod Targeting

TARGETS: "deployment:default:[cnpg.io/cluster=pg-eu,cnpg.io/instanceRole=primary]:intersection"
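
Before launching chaos, the selector can be checked by hand to confirm it resolves to exactly one pod, the current primary (the default namespace matches the target spec above):

```bash
# Should print a single pod name: the current primary of the pg-eu cluster.
kubectl get pods -n default \
  -l cnpg.io/cluster=pg-eu,cnpg.io/instanceRole=primary -o name
```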

Comprehensive Probes

  • Start-of-test: Cluster health check
  • End-of-test: Cluster recovery verification
  • Continuous: Replication lag monitoring (Prometheus)
  • Continuous: Primary availability check
  • Continuous: Cluster status validation
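
After a run, the probe verdicts are recorded on the ChaosResult; the resource name below assumes Litmus's usual `<engine>-<experiment>` naming and may differ in this setup:

```bash
# List ChaosResults and inspect the recorded probe verdicts.
kubectl -n litmus get chaosresults
kubectl -n litmus describe chaosresult cnpg-jepsen-chaos-pod-delete
```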

Configurable Parameters

  • Test duration: 60s to 1800s+
  • Chaos interval: 30s to 300s
  • Operation rate: 10-100+ ops/sec
  • Concurrency: 1-20 workers
  • Isolation levels: read-committed, repeatable-read, serializable
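
As a quick sanity check of the isolation-level parameter, the server-side default can be inspected directly; the pod name follows CNPG instance naming, and the Jepsen client is assumed to set its own per-transaction level:

```bash
# Show the default transaction isolation on the pg-eu-1 instance.
kubectl exec pg-eu-1 -- psql -c "SHOW default_transaction_isolation;"
```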

Testing Performed

  • ✅ Baseline tests (no chaos)
  • ✅ Primary pod deletion
  • ✅ Multiple failover cycles
  • ✅ Different isolation levels
  • ✅ Various chaos intervals

Future Enhancements

  • Network partition testing
  • Storage failure simulation
  • Multi-cluster testing
  • Performance regression detection
  • CI/CD pipeline templates

Signed-off-by: XploY04 [email protected]

- Updated README.md with prerequisites, environment setup, and chaos experiment instructions.
- Created EXPERIMENT-GUIDE.md for detailed chaos experiment execution and monitoring.
- Added YAML files for chaos experiments: cnpg-primary-pod-delete.yaml, cnpg-random-pod-delete.yaml, and cnpg-replica-pod-delete.yaml.
- Implemented Litmus RBAC configuration in litmus-rbac.yaml.
- Configured PostgreSQL cluster in pg-eu-cluster.yaml.
- Developed scripts for environment verification (check-environment.sh) and chaos results retrieval (get-chaos-results.sh).
- Enhanced status check script (status-check.sh) for Litmus installation verification.

Signed-off-by: XploY04 <[email protected]>
… updating documentation. Added support for chaos experiments without hard-coded pod names, improved README and quick start guides, and introduced monitoring scripts for better visibility during chaos experiments.

Signed-off-by: XploY04 <[email protected]>
…a consistency verification

- Implemented `setup-cnp-bench.sh` for configuring cnp-bench with detailed instructions for benchmarking CloudNativePG.
- Created `setup-prometheus-monitoring.sh` to apply PodMonitor configurations for Prometheus metrics scraping.
- Developed `verify-data-consistency.sh` to check data integrity after chaos experiments, including various consistency tests.
- Added `pgbench-continuous-job.yaml` for running continuous pgbench workloads during chaos testing, with options for custom workloads.

Signed-off-by: XploY04 <[email protected]>
…for consistency in chaos experiments

Signed-off-by: XploY04 <[email protected]>
- Created a Kubernetes Job definition for running the Jepsen PostgreSQL consistency test against a CloudNativePG cluster.
- The job includes environment variables for configuration, command execution for testing, and result handling.
- Added a PersistentVolumeClaim for storing Jepsen test results with a request for 2Gi of storage.

Signed-off-by: XploY04 <[email protected]>
@gbartolini (Contributor) commented:

Please use the latest version of the operator in the installation instructions:

kubectl apply -f \
  https://raw.githubusercontent.com/cloudnative-pg/cloudnative-pg/release-1.20/releases/cnpg-1.20.0.yaml

- Implemented a comprehensive bash script to orchestrate Jepsen consistency testing with chaos experiments.
- The script includes pre-flight checks, database cleanup, PVC management, Jepsen job deployment, chaos experiment application, and result extraction.
- Added logging functionality with color-coded output for better readability.
- Integrated error handling and cleanup procedures to ensure graceful exits and resource management.
- Provided detailed usage instructions and exit codes for user guidance.

Signed-off-by: XploY04 <[email protected]>
…ting

- Introduced a new ChaosEngine configuration () for running Jepsen tests without Prometheus probes, allowing for chaos testing in environments lacking monitoring.
- Updated existing  to remove unnecessary probe configurations and ensure compatibility with the new no-probes variant.
- Modified  to include a Service definition for metrics collection and changed PodMonitor to ServiceMonitor for better integration with Prometheus.
- Removed obsolete  and Jepsen job configurations that are no longer needed.
- Deleted scripts for fetching chaos results and monitoring CNPG pods, streamlining the testing process.
- Enhanced  to include namespace and context parameters for improved flexibility.

Signed-off-by: XploY04 <[email protected]>
…d improve primary pod identification logic

Signed-off-by: XploY04 <[email protected]>
@XploY04 XploY04 marked this pull request as ready for review November 20, 2025 21:14
@XploY04 XploY04 requested a review from a team as a code owner November 20, 2025 21:14
README.md Outdated
>
> ```bash
> VERSION="v1.27.1"
> curl -L "https://github.com/cloudnative-pg/cloudnative-pg/releases/download/${VERSION}/kubectl-cnpg_${VERSION}_linux_amd64.tar.gz" -o /tmp/kubectl-cnpg.tar.gz
Member:

This command line doesn't work: the downloaded archive doesn't use v1.27.1 as its version string, and amd64 isn't part of the binary name to download.
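
A corrected form would probably drop the `v` prefix from the archive name and use `x86_64` rather than `amd64`; the exact asset name is an assumption here, so verify it against the releases page:

```bash
# Hedged guess at the plugin asset name; confirm on the GitHub releases page.
VERSION="1.27.1"
curl -L "https://github.com/cloudnative-pg/cloudnative-pg/releases/download/v${VERSION}/kubectl-cnpg_${VERSION}_linux_x86_64.tar.gz" \
  -o /tmp/kubectl-cnpg.tar.gz
```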

README.md Outdated

```bash
# Re-export the playground kubeconfig if you opened a new shell
export KUBECONFIG=/path/to/cnpg-playground/k8s/kube-config.yaml
Member:

probably something like

export KUBECONFIG=$PWD/k8s/kube-config.yaml

Since you are already inside the cnpg-playground directory.

README.md Outdated
Comment on lines 79 to 85
# Apply the 1.27.1 operator manifest exactly as documented
kubectl apply --server-side -f \
https://raw.githubusercontent.com/cloudnative-pg/cloudnative-pg/release-1.27/releases/cnpg-1.27.1.yaml

# Alternatively, generate a custom manifest via the kubectl cnpg plugin
kubectl cnpg install generate --control-plane \
| kubectl apply --context kind-k8s-eu -f - --server-side
Member:

I would keep one or the other, but not both; it's confusing. My first question was "Why am I installing the same thing twice?"

> Follow these sections in order; each references the authoritative upstream documentation to keep this README concise.
### 1. Bootstrap the CNPG Playground
Member:

For this test it's important to mention that the max open files limit must be increased, otherwise it doesn't work; this applies to the playground, IIRC.

Collaborator Author:

I didn't think about this. I have added that.

Member:

Where was this added? I can't see it.

Collaborator Author:

  • Increase max open files limit if needed (required for Jepsen on some systems):
    ulimit -n 65536

In Prerequisites before running the script.

README.md Outdated
kubectl apply -n litmus -f https://hub.litmuschaos.io/api/chaos/master?file=faults/kubernetes/pod-delete/fault.yaml

# OR install from local file (if you need customization)
kubectl apply -n litmus -f chaosexperiments/pod-delete-cnpg.yaml
Member:

This failed with the following error:

error: the namespace from the provided object "default" does not match the namespace "litmus". You must pass '--namespace=default' to perform this operation.
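
One possible workaround, assuming the hub manifest pins `namespace: default`, is to rewrite the namespace before applying it to `litmus`:

```bash
# Fetch the fault definition and retarget it at the litmus namespace before applying.
curl -sL "https://hub.litmuschaos.io/api/chaos/master?file=faults/kubernetes/pod-delete/fault.yaml" \
  | sed 's/namespace: default/namespace: litmus/' \
  | kubectl apply -n litmus -f -
```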

Comment on lines 182 to 183
# Watch the chaos runner pod start (refreshes every 2s)
watch -n2 'kubectl -n litmus get pods | grep cnpg-jepsen-chaos-noprobes-runner'
Member:

And then what do I do? It's not clear here whether I should exit at some point; I'm guessing yes.

README.md Outdated
Comment on lines 188 to 190
# Check experiment logs to see pod deletions (ensure a pod exists first)
runner_pod=$(kubectl -n litmus get pods -l chaos-runner-name=cnpg-jepsen-chaos-noprobes -o jsonpath='{.items[0].metadata.name}') && \
kubectl -n litmus logs -f "$runner_pod"
Member:

This failed with the following error:

runner_pod=$(kubectl -n litmus get pods -l chaos-runner-name=cnpg-jepsen-chaos-noprobes -o jsonpath='{.items[0].metadata.name}') && \
kubectl -n litmus logs -f "$runner_pod"
error: error executing jsonpath "{.items[0].metadata.name}": Error executing template: array index out of bounds: index 0, length 0. Printing more information for debugging the template:
	template was:
		{.items[0].metadata.name}
	object given to jsonpath engine was:
		map[string]interface {}{"apiVersion":"v1", "items":[]interface {}{}, "kind":"List", "metadata":map[string]interface {}{"resourceVersion":""}}
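
A possible hardening, assuming the runner pod eventually appears with this label, is to retry until it exists before tailing its logs:

```bash
# Retry until the chaos-runner pod for this engine exists, then follow its logs.
until runner_pod=$(kubectl -n litmus get pods \
  -l chaos-runner-name=cnpg-jepsen-chaos-noprobes \
  -o jsonpath='{.items[0].metadata.name}' 2>/dev/null) && [ -n "$runner_pod" ]; do
  sleep 2
done
kubectl -n litmus logs -f "$runner_pod"
```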

README.md Outdated
Expose the CNPG metrics port (9187) through a dedicated Service + ServiceMonitor bundle, then verify Prometheus scrapes it. Manual management keeps you aligned with the operator deprecation of `spec.monitoring.enablePodMonitor` and dodges the PodMonitor regression in kube-prometheus-stack v79 where CNPG pods only advertise the `postgresql` and `status` ports:

```bash
kubectl create namespace monitoring --dry-run=client -o yaml | kubectl apply -f -
Member:

This produced the following warning:

 kubectl create namespace monitoring --dry-run=client -o yaml | kubectl apply -f -
Warning: resource namespaces/monitoring is missing the kubectl.kubernetes.io/last-applied-configuration annotation which is required by kubectl apply. kubectl apply should only be used on resources created declaratively by either kubectl create --save-config or kubectl apply. The missing annotation will be patched automatically.
namespace/monitoring configured


> **Note:** Keep using `experiments/cnpg-jepsen-chaos-noprobes.yaml` until Section 5 installs Prometheus/Grafana. Once monitoring is online, switch to `experiments/cnpg-jepsen-chaos.yaml` (probes enabled) for full observability.
### 5. Configure monitoring (Prometheus + Grafana)
Member:

At this step I ran out of disk space, even with 20G available; this should be clarified. My test just stopped here because it ran out of space.

Collaborator Author:

I added this now.

…, refine Jepsen prerequisites, and improve various command examples.

Signed-off-by: XploY04 <[email protected]>
…refine chaos result summary output.

Signed-off-by: XploY04 <[email protected]>
… corresponding setup instructions.

Signed-off-by: XploY04 <[email protected]>
…p and removing optional Litmus UI and advanced CNPG install details.

Signed-off-by: XploY04 <[email protected]>
README.md Outdated
export KUBECONFIG=$PWD/k8s/kube-config.yaml
kubectl config use-context kind-k8s-eu

# Apply the 1.27.1 operator manifest exactly as documented
Contributor:

Given that you have the plugin installed, I would rely on the plugin to install the latest version of the operator. See https://github.com/cloudnative-pg/cnpg-playground/blob/main/demo/setup.sh#L65

…est runner with EOT probe checks, and streamline cluster credential handling.

Signed-off-by: XploY04 <[email protected]>
- Added wait for webhook pod to be ready
- Prevents 'connection refused' error when creating cluster
- Gives webhook time to fully initialize

Signed-off-by: XploY04 <[email protected]>
- Changed from pod selector wait to deployment rollout status
- Handles controller manager restart correctly
- Prevents timeout when pods are recreated

Signed-off-by: XploY04 <[email protected]>
- Disable Alertmanager (not needed)
- Disable Node Exporter (not needed)
- Reduce Prometheus Operator memory
- Speeds up installation significantly

Signed-off-by: XploY04 <[email protected]>
- Runs full infrastructure setup (Steps 1-6)
- Executes run-jepsen-chaos-test-v2.sh script
- Collects and uploads test results
- Configurable chaos duration
- Scheduled daily runs

Signed-off-by: XploY04 <[email protected]>
- GitHub Actions needs a push event to register new workflows
- Added push trigger on dev-2 branch
- Workflow will now appear in Actions UI

Signed-off-by: XploY04 <[email protected]>
- litmus-core (Litmus 3.x) only has operator deployment
- No separate control plane/portal server in litmus-core
- Removed obsolete pre-flight check for control plane
- Fixes 'Litmus control plane deployment not found' error

Signed-off-by: XploY04 <[email protected]>
- Added logs/jepsen-chaos-*/results/*.png to artifact paths
- Captures latency-raw.png, latency-quantiles.png, rate.png
- Provides visual graphs of test performance

Signed-off-by: XploY04 <[email protected]>
- Added 90-second wait after Prometheus setup
- Allows 2+ scrape intervals for metric collection
- Verifies cnpg_collector_up metric is available
- Fixes probe failures (0/5 passed -> should pass now)

Signed-off-by: XploY04 <[email protected]>
- Show ServiceMonitor, Service, Endpoints status
- Display Prometheus configuration
- Wait 60s for target discovery
- Query Prometheus API to verify scraping
- Test cnpg_collector_up metric availability
- Helps diagnose probe failure issues

Signed-off-by: XploY04 <[email protected]>
…bility

- Test Prometheus API accessibility from pods (simulates Litmus probes)
- Query CNPG metrics and verify data availability
- Wait 90s for metrics scraping
- Remove Litmus control plane check (litmus-core doesn't have it)
- Add detailed debugging output for probe failures

Signed-off-by: XploY04 <[email protected]>
- Show individual probe verdicts and descriptions
- Collect chaos engine status
- Get experiment pod logs for probe errors
- Display full probe status JSON
- Helps identify why probes aren't executing

Signed-off-by: XploY04 <[email protected]>
- Changed TOTAL_CHAOS_DURATION from 600s to 300s
- Matches typical test duration passed to script
- Allows experiment to complete and finalize probe verdicts
- Fixes 'Awaited' probe status issue
- Added probe debugging to workflow

Signed-off-by: XploY04 <[email protected]>
- Capture full experiment pod logs (not filtered)
- Show complete chaos engine and chaosresult YAML
- Find experiment pod using chaosUID label
- Helps identify probe initialization failures
- Reduced default chaos duration to 300s

Signed-off-by: XploY04 <[email protected]>
- Chaos-runner pod crashed with exit code 2
- Need to check experiment job pod logs (where probes execute)
- Added job pod discovery and log collection
- Will show actual probe execution errors

Signed-off-by: XploY04 <[email protected]>
- Removed pull_request trigger from test-setup.yml
- Added push trigger on dev-2 to chaos-test-full.yml
- Every push now runs complete chaos validation
- Cleaner PR workflow without redundant infrastructure tests

Signed-off-by: XploY04 <[email protected]>
- Changed from 110s to 180s (3 minutes)
- Allows experiment to fully complete after chaos ends
- EOT probes need: 30s initialDelay + 60-90s retries + 30-60s finalization
- Fixes 'Awaited' probe verdicts by waiting for completion
- Added experiment job pod log collection for debugging

Signed-off-by: XploY04 <[email protected]>
- Changed from fixed 180s wait to dynamic phase checking
- Monitors ChaosResult.status.experimentStatus.phase
- Waits until phase changes to 'Completed'
- 10-minute timeout with progress updates every 30s
- Additional 10s buffer for final ChaosResult update
- Fixes 'Awaited' probe verdicts by ensuring completion

Signed-off-by: XploY04 <[email protected]>
…avior

- Removed replication-lag-continuous probe
- Probe failed on expected lag (45s) during primary deletion
- Caused experiment to go to Error instead of Completed
- Reduced wait timeout from 600s to 420s (7 minutes)
- Updated probe count from 5 to 4 probes
- Jepsen already tracks operation failures during chaos

Signed-off-by: XploY04 <[email protected]>
- Removed probe debugging section from workflow (~70 lines)
- Removed detailed probe output from script (~13 lines)
- Simplified Prometheus verification (~100 lines total)
- Saves ~2 minutes per workflow run
- Keeps essential monitoring and error handling
- Cleaner, more readable output

Signed-off-by: XploY04 <[email protected]>
- Added explicit permissions blocks to all workflows
- Grant only contents:read (minimum for checkout)
- Deny all other permissions implicitly
- Follows GitHub Actions security best practices
- Reduces attack surface if workflow is compromised

Signed-off-by: XploY04 <[email protected]>
- Run every Sunday at 2 AM UTC instead of daily
- Reduces CI resource usage
- Manual trigger still available anytime

Signed-off-by: XploY04 <[email protected]>
- Change schedule from Sunday 2 AM UTC to 2 PM Italy time (13:00 UTC)
- Add pull_request trigger for main and dev-2 branches
- Makes workflow visible in Actions tab for manual triggering

Signed-off-by: XploY04 <[email protected]>