Commit 9c98ca7

Merge pull request #522 from MannLabs/improved-opt-methods
Improve method section for optimization
2 parents 1da75aa + 728c46a commit 9c98ca7

File tree

1 file changed: +54 −104 lines changed
# Search Parameter Optimization and Calibration

In peptide-centric DIA search, calibration of the library and optimization of the search parameters are required to maximize the number of confident identifications. AlphaDIA performs both calibration and optimization iteratively. Calibration removes the systematic deviation between observed and library values to account for technical variation from the LC or MS instrument. Optimization reduces the search space to improve the confidence in identifications and to accelerate the search.

:::{note}
Calibration and optimization are distinct, but both are connected to transfer learning. In [transfer learning](./transfer-learning.md) the residual (non-systematic) variation is learned and thereby reduced. This usually leads to better performance when used together with optimization and calibration.
:::

## Search Space Optimization

AlphaDIA supports two optimization strategies:

1. Targeted optimization toward a fixed tolerance (e.g., a 7 ppm mass tolerance)
2. Automatic optimization for optimal search results

### Initial Parameters

AlphaDIA starts with the following default search parameters:

- MS1 tolerance: 30 ppm
- MS2 tolerance: 30 ppm
- Ion mobility tolerance: 0.1 1/K0
- Retention time tolerance: 50% of the gradient length

### Optimization Algorithm

The optimization process follows these steps:

1. Search is performed batch-wise, starting with the first 8,000 precursors
2. The batch size increases exponentially (16,000, 32,000, 64,000, ...) until 200 precursors are identified at 1% FDR
3. For targeted optimization, the search space is updated to the 95th percentile of the observed deviations of identified precursors
4. For automatic optimization, the search space is set to the 99th percentile
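
The percentile update in steps 3 and 4 can be sketched as follows. This is an illustration of the rule only, not AlphaDIA's actual code, and the function names are hypothetical:

```python
def percentile(values, q):
    """Percentile with linear interpolation, q in [0, 100]."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

def update_tolerance(deviations, targeted=True):
    """Shrink the search tolerance to a percentile of the absolute
    deviations of precursors identified at 1% FDR (steps 3 and 4)."""
    abs_dev = [abs(d) for d in deviations]
    # targeted optimization: 95th percentile; automatic: wider 99th
    return percentile(abs_dev, 95 if targeted else 99)
```

Feeding, for example, the observed fragment mass errors in ppm into such an update yields the narrower MS2 tolerance used in the next iteration.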

The optimization uses different figures of merit:

- MS1 error: correlation of the observed and predicted isotope intensity profiles
- MS2, RT, and ion mobility: proportion of the library precursors detected at 1% FDR
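
For MS1, the figure of merit measures agreement between observed and predicted isotope patterns, which can be sketched as a Pearson correlation (a simplified illustration; AlphaDIA's exact scoring may differ):

```python
def isotope_correlation(observed, predicted):
    """Pearson correlation between an observed and a predicted
    isotope intensity profile (MS1 figure-of-merit sketch)."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / (var_o * var_p) ** 0.5
```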

Optimization stops once the property of interest stabilizes and the optimal value according to the figure of merit has been reached.

<img src="../_static/images/methods_optimization.png" width="100%" height="auto">

AlphaDIA iteratively performs calibration and optimization based on a subset of the spectral library used for search. The size of this subset is adjusted according to an exponential batch plan to balance accuracy and efficiency. A defined number of precursors, set by the ``optimization_lock_target`` (default: 200), needs to be identified at 1% FDR before calibration and optimization are performed. If fewer precursors than the target number are identified using a given step of the batch plan, AlphaDIA will search for precursors from the next step of the batch plan in addition to those already searched. If more precursors than the target number are identified, AlphaDIA will check whether any previous step of the batch plan is also likely to yield at least the target number, in which case it will use the smallest such step of the batch plan for the next iteration of calibration and optimization. In this way, AlphaDIA ensures that calibration is always performed on sufficient precursors to be reliable, while calibrating on the smallest possible subset of the library to maximize efficiency.
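
The batch-step selection just described can be sketched as follows, assuming a cumulative batch plan (8,000, 16,000, 32,000, ... precursors). The function name and the exact extrapolation rule are illustrative, not AlphaDIA's implementation:

```python
def next_batch_index(batch_plan, idx, n_identified, n_searched, target=200):
    """Choose the batch-plan step for the next calibration iteration.

    batch_plan   -- cumulative precursor counts, e.g. [8000, 16000, 32000, ...]
    idx          -- index of the current step
    n_identified -- precursors identified at 1% FDR in this iteration
    n_searched   -- precursors searched in this iteration
    """
    if n_identified < target:
        # not enough confident precursors: search the next, larger step too
        return min(idx + 1, len(batch_plan) - 1)
    # enough precursors: use the smallest earlier step that is still
    # expected to reach the target, given the observed identification rate
    rate = n_identified / n_searched
    for smaller in range(idx + 1):
        if batch_plan[smaller] * rate >= target:
            return smaller
    return idx
```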

The process of choosing the best batch step for calibration and optimization is illustrated in the figure below, which shows optimization and calibration over seven iterations. The batch size is increased in the first iteration until sufficient precursors are detected, and subsequently reduced when the proportion of identifications is high enough that a previous step should reach the target as well. If, however, fluctuations in the number of identifications mean that not enough precursors are actually identified, the next step in the batch plan is searched as well, ensuring that calibration is always performed on at least the target number of precursors.

<img src="../_static/images/methods_optimization_calibration.png" width="100%" height="auto">

### Targeted Optimization

To activate targeted optimization for a parameter, set its target value:

- For fragment m/z tolerance: `search.target_ms2_tolerance = 10` (10 ppm)
- For retention time:
  - Absolute value: `search.target_rt_tolerance = 300` (300 seconds)
  - Relative value: `search.target_rt_tolerance = 0.3` (30% of the gradient length)

### Automatic Optimization

To activate automatic optimization, set the target value to `0.0`:

- Example: `search.target_rt_tolerance = 0.0`

:::{tip}
We recommend using automatic optimization for retention time and ion mobility (the default setting). For mass tolerances, use automatic optimization in the first pass and then apply the optimized values in the second pass.
:::
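
Targeted and automatic optimization can be mixed per parameter in the `search` section of the configuration. A sketch using the documented defaults (targeted mass tolerances, automatic retention time and ion mobility):

```yaml
search:
    # Targeted optimization of precursor m/z tolerance.
    # Use absolute values in ppm or set to 0 for automatic optimization.
    target_ms1_tolerance: 5

    # Targeted optimization of fragment m/z tolerance.
    target_ms2_tolerance: 10

    # Automatic optimization of ion mobility tolerance (1/K0).
    target_mobility_tolerance: 0

    # Automatic optimization of retention time tolerance
    # (seconds, or a fraction of the gradient length).
    target_rt_tolerance: 0
```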

## Calibration

Calibration of systematic deviations is performed in parallel, based on confident precursors identified at 1% FDR. Library values are calibrated to match the observed distribution in the dataset using locally estimated scatterplot smoothing (LOESS) regression.

### LOESS Calibration Model

The calibration process uses:

- For fragment m/z: up to 5000 (minimum 500) of the best fragments, ranked by XIC correlation
- For precursors: all precursors passing 1% FDR

The LOESS regression architecture:

- Uses uniformly distributed kernels
- Applies first- and second-degree polynomial basis functions
- For m/z and ion mobility: two local estimators with tricubic kernels
- For retention time: six estimators with tricubic kernels

The model is built on scikit-learn and can be configured with different hyperparameters and predictors.
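
The core of such a model, local polynomial fits blended with tricubic weights, can be sketched in a few lines. This is a simplified, first-degree illustration, not the scikit-learn-based implementation:

```python
def tricubic(u):
    """Tricubic kernel weight: (1 - |u|^3)^3 inside [-1, 1], else 0."""
    u = abs(u)
    return (1.0 - u**3) ** 3 if u < 1.0 else 0.0

def loess_predict(xs, ys, x_new, n_kernels=6):
    """LOESS sketch: first-degree local fits on uniformly spaced
    tricubic kernels, blended with the same tricubic weights."""
    lo, hi = min(xs), max(xs)
    span = (hi - lo) / n_kernels            # uniform kernel spacing
    centers = [lo + (i + 0.5) * span for i in range(n_kernels)]
    num = den = 0.0
    for c in centers:
        w = [tricubic((x - c) / (1.5 * span)) for x in xs]  # local weights
        sw = sum(w)
        if sw == 0.0:
            continue
        # weighted least-squares line through the points near this kernel
        mx = sum(wi * xi for wi, xi in zip(w, xs)) / sw
        my = sum(wi * yi for wi, yi in zip(w, ys)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, xs))
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, xs, ys))
        slope = sxy / sxx if sxx else 0.0
        wk = tricubic((x_new - c) / (1.5 * span))   # blending weight
        num += wk * (my + slope * (x_new - mx))
        den += wk
    return num / den if den else sum(ys) / len(ys)
```

Calibrating, for example, retention times then amounts to evaluating the fitted model at each library value and using the prediction as the calibrated value.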

Individual properties like the retention time deviate from their library values and need to be calibrated (a). As a nonlinear but stable method, locally estimated scatterplot smoothing (LOESS) with both density- and uniformly distributed kernels is used. (b) A collection of polynomial kernels is fitted to uniformly distributed subregions of the data. These consist of first- and second-degree polynomial basis functions of the calibratable property. (c) The individual functions are combined and smoothed using tricubic weights. (d) Combining the kernels with their weighting functions allows the systematic deviation of the data to be approximated locally. (e) The sum of the weighted kernels can then be used for continuous approximation and calibration of retention times.

<img src="../_static/images/methods_loess.png" width="100%" height="auto">

### Configuring the LOESS Model

The type of model, the hyperparameters, and the columns used as input and target for calibration can be set in the `calibration_manager` section of the configuration file.
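
A sketch of what this section can look like, following the estimator layout above (`LOESSRegression` with two kernels for m/z and ion mobility, six for retention time). The nesting is reconstructed for illustration; consult the default configuration shipped with your AlphaDIA version for the authoritative schema:

```yaml
calibration_manager:
  - name: fragment
    estimators:
      - name: mz
        model: LOESSRegression
        model_args:
          n_kernels: 2
        input_columns:
          - mz_library
        target_columns:
          - mz_observed
        output_columns:
          - mz_calibrated
        # display deviation in ppm
        transform_deviation: 1e6
  - name: precursor
    estimators:
      - name: rt
        model: LOESSRegression
        model_args:
          n_kernels: 6
        input_columns:
          - rt_library
        target_columns:
          - rt_observed
        output_columns:
          - rt_calibrated
```

The precursor `mz` and `mobility` estimators follow the same pattern with two kernels each.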
