astro-docs/src/content/docs/features/CI Features/flaky-tasks.mdoc
Lines changed: 53 additions & 5 deletions
@@ -39,12 +39,60 @@ In this image, the `e2e-ci--src/e2e/app.cy.ts` task is a flaky task that has bee
39
39
40
40
When a flaky task fails in CI with [distributed task execution](/docs/features/ci-features/distribute-task-execution) enabled, Nx will **automatically send that task to a different agent** and run it again (up to 2 tries in total). It's important to run the task on a different agent to ensure that neither the agent itself nor the other tasks that ran on it are the reason for the flakiness.
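The retry rule above can be sketched as follows. This is only an illustration of the idea, not how Nx Cloud is implemented: the `pickRetryAgent` function, the `MAX_TRIES` constant, and the agent names are all hypothetical.

```typescript
// Hypothetical sketch: choose where to rerun a failed flaky task.
// The limit of 2 tries in total comes from the text above.
const MAX_TRIES = 2;

function pickRetryAgent(
  agents: string[],
  failedAgent: string,
  triesSoFar: number
): string | undefined {
  // Stop once the task has already used all of its tries.
  if (triesSoFar >= MAX_TRIES) {
    return undefined;
  }
  // Prefer any agent other than the one the task just failed on,
  // so the agent's own state cannot be the cause of a second failure.
  return agents.find((agent) => agent !== failedAgent);
}

const retryAgent = pickRetryAgent(["agent-1", "agent-2"], "agent-1", 1);
const noRetry = pickRetryAgent(["agent-1", "agent-2"], "agent-1", 2);
```

Here `retryAgent` is a different agent from the one that failed, and `noRetry` is `undefined` because both tries have been used.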
41
41
42
-
## Manually Mark a Task as Flaky or Not Flaky
42
+
## Flaky Task Analytics
43
43
44
-
If you suspect that a task is flaky, but Nx has not confirmed it yet, you can manually **mark it as likely flaky** from the run details screen. Failed tasks that are not flaky will have a button that says **"Mark task as likely flaky"**.
Workspace flaky task analytics is currently available for organizations on the Enterprise plan. Reach out if your organization is [interested in Nx Enterprise](https://nx.dev/enterprise?utm_source=nx.dev&utm_medium=callout&utm_campaign=flaky-task-analytics).
46
+
{% /aside %}
45
47
46
-

48
+
Nx Cloud provides analytics to help you understand and manage flaky tasks across your workspace. The analytics dashboard gives you insights into which tasks are flaky, how often they fail, and how much time is being wasted on reruns.
47
49
48
-
Once you've resolved the issue that caused a task to be flaky, you can immediately mark the task as not flaky by clicking on **"Mark task as no longer flaky"** on the same run details screen.

52
+
The dashboard displays key metrics over the selected time range (7 days or 30 days) to give you a quick overview of your workspace health.
53
+
54
+
- **Active flaky tasks** - The total number of tasks in your workspace that have a flake rate greater than 0 within the selected time window.
55
+
- **Average flake rate** - A weighted average flake rate across all tasks in your workspace. Each task's flake rate is weighted by its sample size, so a task that ran 1000 times with a 5% flake rate has more impact than one that ran 10 times with a 50% flake rate.
56
+
- **High risk tasks** - The number of tasks with a flake rate higher than 20%, indicating severe reliability issues that need immediate attention.
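As a rough sketch of how the three metrics above relate to per-task data, assuming each task is described by a run count and a flake rate between 0 and 1. The `TaskStats` shape and the `summarize` function are illustrative names, not part of Nx Cloud.

```typescript
// Illustrative per-task input; field names are assumptions.
interface TaskStats {
  task: string;
  runs: number; // sample size within the selected time window
  flakeRate: number; // fraction between 0 and 1
}

// The 20% high-risk threshold comes from the text above.
const HIGH_RISK_THRESHOLD = 0.2;

function summarize(tasks: TaskStats[]) {
  const totalRuns = tasks.reduce((sum, t) => sum + t.runs, 0);
  // Weight each flake rate by how many times the task ran.
  const weightedSum = tasks.reduce((sum, t) => sum + t.flakeRate * t.runs, 0);
  return {
    activeFlakyTasks: tasks.filter((t) => t.flakeRate > 0).length,
    averageFlakeRate: totalRuns === 0 ? 0 : weightedSum / totalRuns,
    highRiskTasks: tasks.filter((t) => t.flakeRate > HIGH_RISK_THRESHOLD).length,
  };
}

const summary = summarize([
  { task: "my-app:test", runs: 1000, flakeRate: 0.05 },
  { task: "my-app:e2e", runs: 10, flakeRate: 0.5 },
]);
```

With these two tasks, the weighted average is (1000 × 0.05 + 10 × 0.5) / 1010, roughly 5.4%, far closer to the frequently-run task's rate than the unweighted mean of 27.5%.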
57
+
58
+
The chart shown provides a visual representation of your flaky tasks, helping you quickly identify which tasks need the most attention.
59
+
60
+
Tasks are plotted based on their **impact score**, which is calculated as `flake_rate × sample_size`. This means frequently-run flaky tasks are weighted higher than rarely-run flaky tasks.
61
+
62
+
Priority levels are determined using percentile-based thresholds that scale across organizations of any size:
63
+
64
+
- **High priority** (red) - Top 10% of tasks by impact score (90th percentile and above). These tasks have severe flakiness and should be addressed immediately.
65
+
- **Medium priority** (yellow) - Next 23% of tasks by impact score (67th-90th percentile). These tasks have moderate flakiness with sufficient data.
66
+
- **Low priority** (gray) - Bottom 67% of tasks by impact score. These tasks have minor flakiness or not enough data to be concerning.
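The scoring and bucketing described above can be sketched like this. The percentile here is computed as the fraction of tasks ranked below a given task, which is one possible reading of the thresholds; the exact method Nx Cloud uses (including how ties are handled) is not stated in this document. `FlakyTask`, `impactScore`, and `prioritize` are hypothetical names.

```typescript
type Priority = "high" | "medium" | "low";

// Illustrative input shape; field names are assumptions.
interface FlakyTask {
  task: string;
  flakeRate: number; // fraction between 0 and 1
  sampleSize: number; // number of runs
}

// impact score = flake_rate × sample_size, as defined above.
function impactScore(t: FlakyTask): number {
  return t.flakeRate * t.sampleSize;
}

function prioritize(tasks: FlakyTask[]): Map<string, Priority> {
  // Sort ascending so a task's index is the number of tasks ranked below it.
  const sorted = [...tasks].sort((a, b) => impactScore(a) - impactScore(b));
  const result = new Map<string, Priority>();
  sorted.forEach((t, index) => {
    const percentile = index / sorted.length;
    const priority: Priority =
      percentile >= 0.9 ? "high" : percentile >= 0.67 ? "medium" : "low";
    result.set(t.task, priority);
  });
  return result;
}

// Ten tasks with the same flake rate but increasing run counts.
const flakyTasks: FlakyTask[] = Array.from({ length: 10 }, (_, i) => ({
  task: `project-${i}:test`,
  flakeRate: 0.05,
  sampleSize: (i + 1) * 10,
}));
const priorities = prioritize(flakyTasks);
```

In this example the most frequently run task (`project-9:test`) lands in the high bucket, the next two in medium, and the remaining seven in low, even though all ten share the same flake rate.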
67
+
68
+
Tasks on the right side of the chart typically represent the highest priority items that need attention. The scatter plot shows up to 50 results, sorted by the most recently flaked tasks.
69
+
70
+
### Flaky Task Table
71
+
72
+
The table provides detailed information about each flaky task in your workspace:
By default, the table loads your most recent flaky tasks. Each row includes:
77
+
78
+
- **Task** - The project and target combination (e.g., `my-app:test`)
79
+
- **Flake rate** - Measures how often a task succeeds due to flakiness. Specifically, it represents the percentage of total successes that came from unreliable (flaky) task hashes: `flaky_successes / (flaky_successes + non_flaky_successes)`. This tells you: "Of all the times this task succeeded, how many successes came from unreliable code?"
80
+
- **Total reruns** - The number of times a task was executed more than once due to flakiness. This counts the "extra" executions that happened because the task failed and needed to be retried. Calculated as: `total_executions - unique_hash_count`
81
+
- **Time wasted** - An estimate of the total time spent on reruns, calculated by multiplying the total reruns by the average task duration
82
+
- **Last failure** - The timestamp of the most recent failure across all contributing task hashes
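A minimal sketch of the column formulas above, assuming the underlying counts are already available. The `TaskHistory` shape and the `tableRow` function are illustrative, and the numbers are made up for the example.

```typescript
// Illustrative aggregate counts for one task; field names are assumptions.
interface TaskHistory {
  flakySuccesses: number;
  nonFlakySuccesses: number;
  totalExecutions: number;
  uniqueHashCount: number;
  averageDurationMs: number;
}

function tableRow(h: TaskHistory) {
  const successes = h.flakySuccesses + h.nonFlakySuccesses;
  // flaky_successes / (flaky_successes + non_flaky_successes)
  const flakeRate = successes === 0 ? 0 : h.flakySuccesses / successes;
  // total_executions - unique_hash_count
  const totalReruns = h.totalExecutions - h.uniqueHashCount;
  // total reruns multiplied by the average task duration
  const timeWastedMs = totalReruns * h.averageDurationMs;
  return { flakeRate, totalReruns, timeWastedMs };
}

const row = tableRow({
  flakySuccesses: 5,
  nonFlakySuccesses: 45,
  totalExecutions: 60,
  uniqueHashCount: 50,
  averageDurationMs: 30_000,
});
```

For this input the flake rate is 5 / 50 = 10%, there are 60 - 50 = 10 reruns, and the estimated time wasted is 10 × 30 seconds = 5 minutes.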
83
+
84
+
#### Flaky Task Detail View
85
+
86
+
Click on any row in the table to view detailed information about a specific flaky task.
87
+
88
+
The **Overview** tab shows summary statistics and trends for the selected task, such as flake rate, time wasted, and automatic deflake counts.
The **Environments** tab provides insights into the different environments where the task was executed, helping identify if certain environments contribute to flakiness.
astro-docs/src/content/docs/reference/glossary.mdoc
Lines changed: 12 additions & 0 deletions
@@ -92,6 +92,12 @@ A script that performs some action on your code. This can include building, lint
92
92
93
93
> See: [Executors and Configurations](/docs/concepts/executors-and-configurations)
94
94
95
+
### Flake Rate
96
+
97
+
A metric that measures how often a task succeeds due to flakiness rather than reliable code. It represents the percentage of total successes that came from unreliable (flaky) task hashes: `flaky_successes / (flaky_successes + non_flaky_successes)`. This answers the question: "Of all the times this task succeeded, how many successes came from unreliable code?"
Tasks that will sometimes succeed and sometimes fail without any change to the inputs. These tasks are often e2e tests and are particularly problematic in CI. Nx Cloud automatically detects flaky tasks and re-runs them.
@@ -110,6 +116,12 @@ A computer science concept that consists of nodes connected by edges. In the Nx
110
116
111
117
> See: [Explore the Graph](/docs/features/explore-graph)
112
118
119
+
### Impact Score
120
+
121
+
A metric used in [Flaky Task Analytics](#flaky-tasks) to prioritize which flaky tasks need attention. Calculated as `flake_rate × sample_size`, it weights frequently-run flaky tasks higher than rarely-run flaky tasks. For example, a task with a 50% flake rate and 100 runs has an impact score of 50 (0.5 × 100), the same as a task with a 10% flake rate and 500 runs (0.1 × 500).
Launch Templates are used to set up an agent machine. They specify a resource class, an image and a series of set up steps before tasks are executed on that machine.