Commit a1c29d5
feat!(backend): refactor multi-segment submission (2/n) (#5398)
resolves #4708, #4734
partially resolves #5392, #5185 (comment)
Builds on #5382
When users submit to multi-segmented organisms and want to group
multiple segments under one metadata entry, they are now required to add
an additional `fastaId` column containing a space- or comma-separated
list of the `fastaIds` (fasta header IDs) of the respective sequences. If
no `fastaId` column is supplied, the `submissionId` is used instead and
the backend assumes that (as in the single-segmented case) there is
a one-to-one mapping of metadata `submissionId` to `fastaId`.
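The grouping rule above can be sketched in Python as follows. This is only an illustration: the helper name `fasta_ids_for_entry` and the dict-shaped metadata row are assumptions, and treating an empty `fastaId` cell like a missing column is also an assumption, not something this commit message states.

```python
import re


def fasta_ids_for_entry(metadata_row: dict[str, str]) -> list[str]:
    """Return the fasta header IDs grouped under one metadata entry.

    Hypothetical helper, not the backend's actual code.
    """
    raw = (metadata_row.get("fastaId") or "").strip()
    if not raw:
        # No fastaId given: assume a one-to-one mapping to submissionId,
        # as in the single-segmented case.
        return [metadata_row["submissionId"]]
    # Accept spaces and/or commas as separators.
    return [part for part in re.split(r"[,\s]+", raw) if part]
```

For example, a row with `fastaId` set to `seqL seqM,seqS` would yield three IDs, while a row without the column would yield just its `submissionId`.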
This new submission structure was voted for in microbioinfo:
https://microbial-bioinfo.slack.com/archives/CB0HYT53M/p1760961465729399
and discussed in
https://app.nuclino.com/Loculus/Development/2025-10-20-Weekly-6d5fe89f-8ded-4286-b892-d215e0a498f6
(and in other meetings)
Nextclade sort will be used to assign segments/subtypes for all aligned
sequences:
```
minimizer_index: <url_to_minimizer_index_used_by_nextclade_sort>
```
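As a rough sketch of how a sort result might be turned into a segment assignment, assuming the per-segment config fields `name`, `nextclade_dataset_name` and `accepted_sort_matches` described in this commit message: the input shape (fasta header mapped to the best-matching dataset name), the function names, and the assumption that `accepted_sort_matches` is a list of strings are all hypothetical, and this is not the preprocessing pipeline's actual code.

```python
def accepted_matches(seq_config: dict) -> set[str]:
    """Dataset names that count as a match for this segment."""
    explicit = seq_config.get("accepted_sort_matches")
    if explicit:
        return set(explicit)
    # If not given, nextclade_dataset_name is used.
    return {seq_config["nextclade_dataset_name"]}


def assign_segments(
    best_match_by_header: dict[str, str],
    nucleotide_sequences: list[dict],
) -> tuple[dict[str, list[str]], list[str]]:
    """Group fasta headers by segment; collect headers with no accepted match."""
    assigned: dict[str, list[str]] = {}
    unassigned: list[str] = []
    for header, dataset in best_match_by_header.items():
        segment = next(
            (s["name"] for s in nucleotide_sequences if dataset in accepted_matches(s)),
            None,
        )
        if segment is None:
            unassigned.append(header)
        else:
            assigned.setdefault(segment, []).append(header)
    return assigned, unassigned
```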
For organisms without a nextclade dataset we still allow the fasta
headers to be used to determine the segment/subtype: entries must have
the format `<submissionId>_<segmentName>` (as in the current setup).
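A minimal sketch of parsing that header format, assuming the set of valid segment names is known from the organism config. The function name `split_fasta_header` is hypothetical; matching on known segment suffixes (rather than splitting on the last underscore) is a design choice made here so that names containing underscores are still handled.

```python
def split_fasta_header(header: str, segment_names: list[str]) -> tuple[str, str]:
    """Split '<submissionId>_<segmentName>' into its two parts.

    Hypothetical helper: tries longer segment names first so that a name
    like 'seg_A' is preferred over a shorter name 'A'.
    """
    for name in sorted(segment_names, key=len, reverse=True):
        suffix = f"_{name}"
        if header.endswith(suffix) and len(header) > len(suffix):
            return header[: -len(suffix)], name
    raise ValueError(f"header {header!r} does not end in a known segment name")
```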
As preprocessing now assigns segments it will return a map from the
segment (or subtype) to the fastaHeader in the processedData:
`sequenceNameToFastaHeaderMap`. This allows us to surface this
assignment on the edit page.
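To illustrate the shape of this map, here is a small Python sketch that inverts a header-to-segment assignment into a segment-to-header map. The function name is hypothetical, and rejecting two headers for the same segment is an assumption about the intended behaviour, not something confirmed by this commit message.

```python
def to_sequence_name_to_fasta_header_map(segment_by_header: dict[str, str]) -> dict[str, str]:
    """Invert {fastaHeader: segment} into {segment: fastaHeader}."""
    result: dict[str, str] = {}
    for header, segment in segment_by_header.items():
        if segment in result:
            # Assumed rule: at most one sequence per segment in an entry.
            raise ValueError(
                f"segment {segment!r} assigned to both {result[segment]!r} and {header!r}"
            )
        result[segment] = header
    return result


# Illustrative fragment of processed data (only the map key name is from the source).
processed_data = {
    "sequenceNameToFastaHeaderMap": to_sequence_name_to_fasta_header_map(
        {"headerA": "L", "headerB": "M"}
    )
}
```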
You can use pathoplexus/dev_example_data#2 for
testing.
Instead of having a dictionary for the nextclade datasets and servers
(old format, first block below) we make `nucleotideSequences` a list of
sequences (new format, second block below):
```
nextclade_dataset_name:
  L: nextstrain/cchfv/linked/L
  M: nextstrain/cchfv/linked/M
  S: nextstrain/cchfv/linked/S
nextclade_dataset_server: https://raw.githubusercontent.com/nextstrain/nextclade_data/cornelius-cchfv/data_output
genes: [RdRp, GPC, NP]
```
```
nucleotideSequences:
  - name: L
    nextclade_dataset_name: nextstrain/cchfv/linked/L
    nextclade_dataset_tag: <optional - was previously incorrectly placed on an organism level>
    nextclade_dataset_server: <optional overwrites nextclade_dataset_server for this seq>
    accepted_sort_matches: <optional, used for classify_with_nextclade_sort and require_nextclade_sort_match, if not given nextclade_dataset_name is used>
    gene_prefix: <optional, prefix to add to genes produced by nextclade run, e.g. nextclade labels genes as `AV1` but we expect `EV1_AV1`, here `EV1` would be the prefix>
  - name: M
    nextclade_dataset_name: nextstrain/cchfv/linked/M
  - name: S
    nextclade_dataset_name: nextstrain/cchfv/linked/S
nextclade_dataset_server: https://raw.githubusercontent.com/nextstrain/nextclade_data/cornelius-cchfv/data_output
```
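The change of shape can be sketched as a small Python conversion over already-parsed config dicts. Both function names are hypothetical and this is not a migration tool shipped with the commit; other keys (such as `genes`) are passed through unchanged here because the commit message does not spell out how they are migrated.

```python
def migrate_config(old: dict) -> dict:
    """Convert the old dict-style dataset config to the list-style one."""
    new = {k: v for k, v in old.items() if k != "nextclade_dataset_name"}
    new["nucleotideSequences"] = [
        {"name": segment, "nextclade_dataset_name": dataset}
        for segment, dataset in old.get("nextclade_dataset_name", {}).items()
    ]
    return new


def effective_server(seq_config: dict, organism_config: dict):
    """Per-sequence server overrides the organism-level one, if present."""
    return seq_config.get(
        "nextclade_dataset_server",
        organism_config.get("nextclade_dataset_server"),
    )
```

With the CCHF example above, `migrate_config` would produce three list entries named `L`, `M` and `S`, each falling back to the organism-level `nextclade_dataset_server` unless it sets its own.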
Note the templates now also generate the genes list from the merged
config.
- [ ] Update values.schema.json
- [x] keep tests for alignment NONE case
- [x] Create a minimizer for tests using:
https://github.com/loculus-project/nextclade-sort-minimizer-creator
- [x] Any manual testing that has been done is documented: EVs from the
test folder were submitted with the same fastaHeader as the submissionId
(this succeeded); submission of CCHF with a `fastaId` column in the
metadata was tested (also in the folder above); revision of a segment
was also tested
- [x] Have preprocessing send back a segment: fastaHeader mapping
- [ ] add integration testing for full EV submission user journey
- [ ] improve CCHF minimizer (some segments are again not assigned)
- [ ] discuss if the originalData dictionary should be migrated
(persistent DB has segmentName as key, now we have fastaHeader as key)
- [ ] update PPX docs with new multi-segment submission format
🚀 Preview: https://multi-segment-submission.loculus.org
---------
Co-authored-by: Cornelius Roemer <[email protected]>
1 parent 123b94f commit a1c29d5
File tree
67 files changed, +1361 -649 lines changed
- backend
- docs/db
- src
- main
- kotlin/org/loculus/backend
- api
- controller
- model
- service/submission
- dbtables
- utils
- resources/db/migration
- test
- kotlin/org/loculus/backend
- controller/submission
- service
- utils
- resources
- ingest
- scripts
- tests/expected_output_cchf
- integration-tests/tests
- fixtures
- specs
- cli
- features
- search
- test-data
- kubernetes/loculus
- templates
- preprocessing
- nextclade
- src/loculus_preprocessing
- tests
- ebola-dataset/minimizer
- website
- src
- components
- Edit
- ReviewPage
- Submission
- FileUpload
- types