18 commits
- `db6ad55` added Appinsights doc (rjayaswal, Jan 8, 2026)
- `0165f5c` dixed PII (rjayaswal, Feb 2, 2026)
- `7e64d7f` Implement Application Insights telemetry integration (rjayaswal, Feb 3, 2026)
- `453da4e` feat: Add Application Insights telemetry integration (rjayaswal, Feb 12, 2026)
- `ded55fa` Merge branch 'main' into users/rjayaswal/appinsights-implementation (Ritvik-Jayaswal, Feb 25, 2026)
- `615a3da` Address Copilot PR review comments for telemetry implementation (rjayaswal, Feb 25, 2026)
- `c5b0719` Remove accidentally added github-secrets-telemetry-setup.md (rjayaswal, Mar 9, 2026)
- `c3b92fe` feat: Inject App Insights connection string in release workflow (rjayaswal, Mar 9, 2026)
- `b795272` Merge branch 'main' into users/rjayaswal/appinsights-implementation (Ritvik-Jayaswal, Mar 9, 2026)
- `4481b4f` Merge main into users/rjayaswal/appinsights-implementation (rjayaswal, Mar 19, 2026)
- `244f5b9` docs: Add telemetry GUID strategy design and update metrics spec (rjayaswal, Apr 2, 2026)
- `08dd7a5` docs: Make telemetry GUID strategy document neutral (rjayaswal, Apr 2, 2026)
- `81ccaf7` docs: Present GUID strategy as greenfield design (rjayaswal, Apr 2, 2026)
- `05f942e` docs: Add Option 3 Hybrid Cloud-Native ID Strategy (rjayaswal, Apr 3, 2026)
- `831453c` docs: Add telemetry testing section to AGENTS.md (rjayaswal, Apr 3, 2026)
- `195eb55` Merge branch 'main' into users/rjayaswal/appinsights-implementation (Ritvik-Jayaswal, Apr 3, 2026)
- `131d925` fix: Fix Helm template issues in operator deployment (rjayaswal, Apr 3, 2026)
- `da1d1a8` docs: Add sample cluster for telemetry E2E testing (rjayaswal, Apr 3, 2026)
7 changes: 7 additions & 0 deletions .github/workflows/release_images.yml
@@ -112,6 +112,13 @@ jobs:
echo "values.yaml content:"
cat operator/documentdb-helm-chart/values.yaml

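- name: Inject telemetry connection string
  env:
    APPINSIGHTS_CONNECTION_STRING: ${{ secrets.APPINSIGHTS_CONNECTION_STRING }}
  run: |
    # The secrets context is not allowed in step-level `if:`, so test the value here instead.
    if [ -z "$APPINSIGHTS_CONNECTION_STRING" ]; then
      echo "No connection string configured; skipping telemetry injection"
      exit 0
    fi
    echo "Injecting Application Insights connection string for telemetry"
    # yq's strenv() reads the secret from the environment instead of interpolating it into the script.
    yq -i '.telemetry.connectionString = strenv(APPINSIGHTS_CONNECTION_STRING)' operator/documentdb-helm-chart/values.yaml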

- name: Set chart version
run: |
echo "CHART_VERSION=${{ github.event.inputs.version }}" >> $GITHUB_ENV
25 changes: 25 additions & 0 deletions AGENTS.md
@@ -371,6 +371,31 @@ Types:
- Mock external dependencies appropriately
- Ensure tests are idempotent and isolated

### Telemetry Testing

The operator includes Application Insights telemetry. For design details, see:
- [docs/designs/appinsights-metrics.md](docs/designs/appinsights-metrics.md) - Metrics design and implementation
- [docs/designs/telemetry-guid-strategy.md](docs/designs/telemetry-guid-strategy.md) - GUID strategy options

**Quick E2E Test:**
```bash
# 1. Set the instrumentation key (use a test Application Insights resource)
export APPINSIGHTS_INSTRUMENTATIONKEY="your-instrumentation-key"

# 2. Deploy the operator to an AKS or Kind cluster with telemetry enabled
helm install documentdb-operator ./operator/documentdb-helm-chart \
  --namespace documentdb-operator --create-namespace \
  --set telemetry.enabled=true \
  --set telemetry.instrumentationKey="$APPINSIGHTS_INSTRUMENTATIONKEY"

# 3. Create a DocumentDB cluster to trigger telemetry events
kubectl apply -f documentdb-playground/telemetry/sample-appinsights-cluster.yaml

# 4. Verify in Azure Portal → Application Insights → Live Metrics / Logs
```

For full observability setup (Prometheus, Grafana, OpenTelemetry), see `documentdb-playground/telemetry/`.

### Code Review

For thorough code reviews, reference the code review agent:
5 changes: 4 additions & 1 deletion docs/designs/appinsights-metrics.md
@@ -3,6 +3,9 @@
## Overview
This document specifies all telemetry data points to be collected by Application Insights for the DocumentDB Kubernetes Operator. These metrics provide operational insights, usage patterns, and error tracking for operator deployments.

### Cluster ID Generation
Cluster IDs (`cluster_id`) are generated using a deterministic SHA-256 hash of `namespace + cluster_name`. This ensures consistent IDs across operator restarts without requiring persistence. See [telemetry-guid-strategy.md](telemetry-guid-strategy.md) for details.
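
A minimal Go sketch of deterministic ID derivation. The `clusterID` helper, the `/` separator between namespace and name, and hex encoding of the full digest are assumptions for illustration, not the operator's code. The separator matters: plain concatenation would give `ns="ab", name="c"` and `ns="a", name="bc"` the same hash input, whereas `/` cannot occur in Kubernetes namespace or object names.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// clusterID derives a stable, opaque identifier for a DocumentDB cluster from its
// namespace and name: the same inputs always yield the same ID, so nothing has to
// be persisted across operator restarts. Hypothetical helper; the "/" separator and
// hex encoding are illustrative choices.
func clusterID(namespace, name string) string {
	// "/" cannot appear in Kubernetes namespace or object names, so joining with it
	// keeps distinct (namespace, name) pairs from producing the same hash input.
	sum := sha256.Sum256([]byte(namespace + "/" + name))
	return hex.EncodeToString(sum[:])
}

func main() {
	id := clusterID("prod", "orders-db")
	// Deterministic: recomputing after a restart gives the same value.
	fmt.Println(id == clusterID("prod", "orders-db")) // true
	// A 32-byte SHA-256 digest encodes to 64 hex characters.
	fmt.Println(len(id)) // 64
	// Plain concatenation would make these pairs collide; the separator keeps them apart.
	fmt.Println(clusterID("ab", "c") == clusterID("a", "bc")) // false
}
```

One design consideration: namespace and cluster names are often guessable (`default`, `prod`), so an unsalted hash can be reversed by hashing candidate names. Mixing in a per-installation secret salt would resist that, at the cost of IDs that no longer match across installations.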

---

## 1. Operator Lifecycle Metrics
@@ -21,7 +24,7 @@ This document specifies all telemetry data points to be collected by Application
- **Metric**: `operator.health.status`
- **Value**: `1` (healthy) or `0` (unhealthy)
- **Frequency**: Every 60 seconds
- **Dimensions**: `pod_name`, `namespace` *(removed)*
- **Dimensions**: `pod_name`, `namespace_hash` *(added)*

---

134 changes: 134 additions & 0 deletions docs/designs/telemetry-configuration.md
@@ -0,0 +1,134 @@
# Application Insights Telemetry Configuration

This document describes how to configure Application Insights telemetry collection for the DocumentDB Kubernetes Operator.

## Overview

The DocumentDB Operator can send telemetry data to Azure Application Insights to help monitor operator health, track cluster lifecycle events, and diagnose issues. All telemetry is designed with privacy in mind: no personally identifiable information (PII) is collected.

## Configuration

### Environment Variables

Configure telemetry by setting these environment variables in the operator deployment:

| Variable | Description | Required |
|----------|-------------|----------|
| `APPINSIGHTS_INSTRUMENTATIONKEY` | Application Insights instrumentation key | Yes (or connection string) |
| `APPLICATIONINSIGHTS_CONNECTION_STRING` | Application Insights connection string (alternative to instrumentation key) | Yes (or instrumentation key) |
| `DOCUMENTDB_TELEMETRY_ENABLED` | Set to `false` to disable telemetry collection | No (default: `true`) |

### Helm Chart Configuration

When installing via Helm, you can configure telemetry in your values.yaml:

```yaml
# values.yaml
telemetry:
  enabled: true
  instrumentationKey: "YOUR-INSTRUMENTATION-KEY-HERE"
  # Or use a connection string:
  # connectionString: "InstrumentationKey=xxx;IngestionEndpoint=https://..."
  # Or use an existing secret containing APPINSIGHTS_INSTRUMENTATIONKEY / APPLICATIONINSIGHTS_CONNECTION_STRING:
  # existingSecret: "documentdb-operator-telemetry"
```
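
An Application Insights connection string is a `;`-separated list of `Key=Value` pairs, as in the commented example above. A minimal Go sketch of reading one, assuming a hypothetical `parseConnectionString` helper; the Application Insights SDKs handle this themselves, so this only illustrates the format. Case-insensitive key lookup and the placeholder endpoint are assumptions of the sketch.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// parseConnectionString splits an Application Insights connection string such as
// "InstrumentationKey=...;IngestionEndpoint=https://..." into a map.
// Hypothetical helper: keys are lower-cased so lookups ignore case, and errors
// never echo the input because it is a secret.
func parseConnectionString(cs string) (map[string]string, error) {
	fields := make(map[string]string)
	for i, part := range strings.Split(cs, ";") {
		part = strings.TrimSpace(part)
		if part == "" {
			continue // tolerate empty segments, e.g. a trailing ';'
		}
		// Cut at the first '=' only, so a value may itself contain '='.
		key, value, ok := strings.Cut(part, "=")
		if !ok {
			return nil, fmt.Errorf("connection string segment %d is not Key=Value", i)
		}
		fields[strings.ToLower(strings.TrimSpace(key))] = strings.TrimSpace(value)
	}
	if fields["instrumentationkey"] == "" {
		return nil, errors.New("connection string has no InstrumentationKey")
	}
	return fields, nil
}

func main() {
	// Placeholder values, not a real resource.
	cs := "InstrumentationKey=00000000-0000-0000-0000-000000000000;" +
		"IngestionEndpoint=https://example.in.applicationinsights.azure.com/"
	fields, err := parseConnectionString(cs)
	if err != nil {
		panic(err)
	}
	fmt.Println(fields["ingestionendpoint"]) // https://example.in.applicationinsights.azure.com/
}
```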

### Kubernetes Secret

For production deployments, store the instrumentation key in a Kubernetes secret:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: documentdb-operator-telemetry
  namespace: documentdb-system
type: Opaque
stringData:
  APPINSIGHTS_INSTRUMENTATIONKEY: "YOUR-INSTRUMENTATION-KEY-HERE"
```

Then reference it from the operator container in the Deployment spec:

```yaml
envFrom:
  - secretRef:
      name: documentdb-operator-telemetry
```

## Privacy & Data Collection

### What We Collect

The operator collects anonymous, aggregated telemetry data including:

- **Operator lifecycle**: Startup events, health status, version information
- **Cluster operations**: Create, update, delete events (with timing metrics)
- **Backup operations**: Backup creation, completion, and expiration events
- **Error tracking**: Categorized errors (no raw error messages with sensitive data)
- **Performance metrics**: Reconciliation duration, API call latency

### What We DON'T Collect

To protect your privacy, we explicitly do NOT collect:

- Cluster names, namespace names, or any user-provided resource names
- Connection strings, passwords, or credentials
- IP addresses or hostnames
- Storage class names (may contain organizational information)
- Raw error messages (only categorized error types)
- Container image names

### Privacy Protection Mechanisms

1. **GUIDs Instead of Names**: All resources are identified by auto-generated GUIDs stored in annotations (`telemetry.documentdb.io/cluster-id`)
2. **Hashed Namespaces**: Namespace names are SHA-256 hashed before transmission
3. **Categorized Data**: Values like PVC sizes are categorized (small/medium/large) instead of exact values
4. **Error Sanitization**: Error messages are stripped of potential PII and truncated

## Disabling Telemetry

To completely disable telemetry collection:

1. **Via environment variable**:
   ```yaml
   env:
     - name: DOCUMENTDB_TELEMETRY_ENABLED
       value: "false"
   ```

2. **Via Helm**:
   ```yaml
   telemetry:
     enabled: false
   ```

3. **Omit the instrumentation key and connection string**: If neither `APPINSIGHTS_INSTRUMENTATIONKEY` nor `APPLICATIONINSIGHTS_CONNECTION_STRING` is set, telemetry is automatically disabled.
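
The environment-variable rules in this section reduce to a small decision function. A minimal Go sketch, assuming a hypothetical `telemetryEnabled` helper that takes an environment lookup function so it can be tested without touching the process environment. Matching `false` case-insensitively is an assumption, and the sketch does not model how the Helm `telemetry.enabled` value reaches the operator.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// telemetryEnabled applies the documented rules: an explicit
// DOCUMENTDB_TELEMETRY_ENABLED=false opts out, and telemetry also stays off when
// neither an instrumentation key nor a connection string is configured.
// Hypothetical helper; getenv is injected so the logic is easy to test.
func telemetryEnabled(getenv func(string) string) bool {
	if strings.EqualFold(strings.TrimSpace(getenv("DOCUMENTDB_TELEMETRY_ENABLED")), "false") {
		return false
	}
	return getenv("APPINSIGHTS_INSTRUMENTATIONKEY") != "" ||
		getenv("APPLICATIONINSIGHTS_CONNECTION_STRING") != ""
}

func main() {
	fmt.Println(telemetryEnabled(os.Getenv))
}
```

An unset `DOCUMENTDB_TELEMETRY_ENABLED` falls through to the credential check, which matches the documented default of `true`.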

## Telemetry Events Reference

See [appinsights-metrics.md](appinsights-metrics.md) for the complete specification of all telemetry events and metrics collected.

## Troubleshooting

### Telemetry Not Being Sent

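1. Verify the instrumentation key or connection string is configured:
   ```bash
   # With envFrom, only the referenced Secret name appears, not the key names
   kubectl get deployment documentdb-operator -n documentdb-system -o yaml | grep -E -A5 'APPINSIGHTS|APPLICATIONINSIGHTS|telemetry'
   ```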

2. Check operator logs for telemetry initialization:
   ```bash
   kubectl logs -n documentdb-system -l app=documentdb-operator | grep -i telemetry
   ```

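3. Verify network connectivity from the cluster to the Application Insights ingestion endpoint: the `IngestionEndpoint` in the connection string, or the legacy global endpoint (`dc.services.visualstudio.com`) when only an instrumentation key is configured.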

### High Cardinality Warnings

If you see high-cardinality warnings, a dimension has too many unique values. The telemetry system automatically samples high-frequency events to mitigate this.

## Support

For issues related to telemetry collection, please open an issue on the [GitHub repository](https://github.com/documentdb/documentdb-kubernetes-operator/issues).