Commit 95aa5e9
feat!(backend): refactor multi-segment submission (2/n) (#5398)
resolves #4708, #4734
partially resolves #5392, #5185 (comment)
Builds on #5382
### BREAKING CHANGES
When users submit to multi-segmented organisms and want to group
multiple segments under one metadata entry, they must now add a
`fastaId` column containing a space- or comma-separated list of the
`fastaIds` (fasta header IDs) of the respective sequences. If no
`fastaId` column is supplied, the `submissionId` is used instead and
the backend assumes (as in the single-segmented case) a one-to-one
mapping of metadata `submissionId` to `fastaId`.
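As an illustration of the rule above, here is a minimal Python sketch of how one metadata row could be resolved to its fasta header IDs. The function name `fasta_ids_for_entry` and the dict-per-row representation are hypothetical and not taken from the backend code; treating an empty `fastaId` cell like a missing column is also an assumption.

```python
import re


def fasta_ids_for_entry(row: dict[str, str]) -> list[str]:
    """Return the fasta header IDs grouped under one metadata row.

    Hypothetical helper: `row` is one metadata line keyed by column name.
    """
    raw = (row.get("fastaId") or "").strip()
    if not raw:
        # No fastaId column (or, by assumption, an empty cell):
        # fall back to a one-to-one mapping submissionId -> fastaId.
        return [row["submissionId"]]
    # Accept spaces and/or commas as separators.
    return [part for part in re.split(r"[,\s]+", raw) if part]


grouped = fasta_ids_for_entry({"submissionId": "sample1", "fastaId": "seqA, seqB seqC"})
single = fasta_ids_for_entry({"submissionId": "sample2"})
```

Here `grouped` lists three fasta IDs for one metadata entry, while `single` falls back to the `submissionId`.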
This new submission structure was voted for in microbioinfo:
https://microbial-bioinfo.slack.com/archives/CB0HYT53M/p1760961465729399
and discussed in
https://app.nuclino.com/Loculus/Development/2025-10-20-Weekly-6d5fe89f-8ded-4286-b892-d215e0a498f6
(and in other meetings)
Nextclade sort will be used to assign segments/subtypes for all aligned
sequences, using the configured minimizer index:
```
minimizer_index: <url_to_minimizer_index_used_by_nextclade_sort>
```
For organisms without a Nextclade dataset, the fasta headers can still
be used to determine the segment/subtype; entries must have the format
`<submissionId>_<segmentName>` (as in the current setup).
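The header convention above can be sketched in Python as follows. This is only an illustration under assumptions: `split_fasta_header` is a hypothetical name, and matching against a known list of segment names (longest first, so that a `submissionId` may itself contain underscores) is a design choice of this sketch, not necessarily what preprocessing does.

```python
def split_fasta_header(header: str, segment_names: list[str]) -> tuple[str, str]:
    """Split '<submissionId>_<segmentName>' into (submissionId, segmentName).

    Hypothetical helper: segment names are matched as suffixes, longest
    first, so overlapping names resolve to the most specific one.
    """
    for segment in sorted(segment_names, key=len, reverse=True):
        suffix = f"_{segment}"
        # Require a non-empty submissionId in front of the suffix.
        if header.endswith(suffix) and len(header) > len(suffix):
            return header[: -len(suffix)], segment
    raise ValueError(f"Header {header!r} does not end in a known segment name")


parsed = split_fasta_header("my_sample_S", ["L", "M", "S"])
```

Matching on the suffix rather than splitting at the first underscore keeps submission IDs such as `my_sample` intact.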
As preprocessing now assigns segments, it returns a map from the
segment (or subtype) to the fastaHeader in the processedData:
`sequenceNameToFastaHeaderMap`. This allows us to surface the
assignment on the edit page.
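To make the shape of this map concrete, here is a small Python sketch that builds a segment-to-fastaHeader mapping from per-sequence assignments. The function name, the input format, and the choice to reject two sequences assigned to the same segment are assumptions for illustration only; the actual preprocessing logic may differ.

```python
from typing import Optional


def build_sequence_name_to_fasta_header_map(
    assignments: dict[str, Optional[str]],
) -> dict[str, str]:
    """Invert {fastaHeader: assigned segment or None} into {segment: fastaHeader}.

    Hypothetical sketch: unassigned sequences are skipped, and a segment
    claimed by two headers is treated as an error.
    """
    mapping: dict[str, str] = {}
    for fasta_header, segment in assignments.items():
        if segment is None:
            continue  # sequence could not be assigned to any segment
        if segment in mapping:
            raise ValueError(f"Segment {segment!r} assigned to multiple sequences")
        mapping[segment] = fasta_header
    return mapping


sequence_name_to_fasta_header_map = build_sequence_name_to_fasta_header_map(
    {"seqA": "L", "seqB": "M", "seqC": None}
)
```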
### Testing
You can use pathoplexus/dev_example_data#2 for
testing.
### Prepro config changes
Instead of a dictionary of Nextclade datasets and servers,
`nucleotideSequences` is now a list of sequences. Before:
```
nextclade_dataset_name:
  L: nextstrain/cchfv/linked/L
  M: nextstrain/cchfv/linked/M
  S: nextstrain/cchfv/linked/S
nextclade_dataset_server: https://raw.githubusercontent.com/nextstrain/nextclade_data/cornelius-cchfv/data_output
genes: [RdRp, GPC, NP]
```
After:
```
nucleotideSequences:
  - name: L
    nextclade_dataset_name: nextstrain/cchfv/linked/L
    nextclade_dataset_tag: <optional - was previously incorrectly placed on an organism level>
    nextclade_dataset_server: <optional, overrides nextclade_dataset_server for this seq>
    accepted_sort_matches: <optional, used for classify_with_nextclade_sort and require_nextclade_sort_match, if not given nextclade_dataset_name is used>
    gene_prefix: <optional, prefix to add to genes produced by nextclade run, e.g. nextclade labels genes as `AV1` but we expect `EV1_AV1`, here `EV1` would be the prefix>
  - name: M
    nextclade_dataset_name: nextstrain/cchfv/linked/M
  - name: S
    nextclade_dataset_name: nextstrain/cchfv/linked/S
nextclade_dataset_server: https://raw.githubusercontent.com/nextstrain/nextclade_data/cornelius-cchfv/data_output
```
Note that the templates now also generate the genes list from the merged
config.
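For anyone migrating an existing config by hand, the restructuring in this section can be sketched as a small Python transformation. This is a hypothetical helper, not part of the PR: it only moves the `nextclade_dataset_name` dictionary into a `nucleotideSequences` list and leaves every other key (including `genes` and the organism-level `nextclade_dataset_server`) untouched.

```python
def to_nucleotide_sequences(old_config: dict) -> dict:
    """Convert the old per-segment dataset dictionary into the new list form.

    Hypothetical migration sketch: optional per-sequence keys such as
    nextclade_dataset_tag or gene_prefix are not filled in here.
    """
    new_config = {
        key: value
        for key, value in old_config.items()
        if key != "nextclade_dataset_name"
    }
    new_config["nucleotideSequences"] = [
        {"name": segment, "nextclade_dataset_name": dataset}
        for segment, dataset in old_config.get("nextclade_dataset_name", {}).items()
    ]
    return new_config


old = {
    "nextclade_dataset_name": {
        "L": "nextstrain/cchfv/linked/L",
        "M": "nextstrain/cchfv/linked/M",
        "S": "nextstrain/cchfv/linked/S",
    },
    "nextclade_dataset_server": "https://raw.githubusercontent.com/nextstrain/nextclade_data/cornelius-cchfv/data_output",
}
new = to_nucleotide_sequences(old)
```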
### PR Checklist
- [ ] Update values.schema.json
- [x] keep tests for alignment NONE case
- [x] Create a minimizer for tests using:
https://github.com/loculus-project/nextclade-sort-minimizer-creator
- [x] Any manual testing that has been done is documented: EVs from the
test folder were submitted with the same fastaHeader as the
submissionId, which succeeded; submission of CCHF with a `fastaId`
column in the metadata was also tested (also in the folder above), as
was revision of a segment
- [x] Have preprocessing send back a segment: fastaHeader mapping
## Future Work
- [ ] add integration testing for full EV submission user journey
- [ ] improve CCHF minimizer (some segments are again not assigned)
- [ ] discuss if the originalData dictionary should be migrated
(persistent DB has segmentName as key, now we have fastaHeader as key)
- [ ] update PPX docs with new multi-segment submission format
🚀 Preview: https://multi-segment-submission.loculus.org
---------
Co-authored-by: Cornelius Roemer <[email protected]>
1 parent 5feacae
File tree
67 files changed (+1381, -661 lines)
- backend
- docs/db
- src
- main
- kotlin/org/loculus/backend
- api
- controller
- model
- service/submission
- dbtables
- utils
- resources/db/migration
- test
- kotlin/org/loculus/backend
- controller/submission
- service
- utils
- resources
- ingest
- scripts
- tests/expected_output_cchf
- integration-tests/tests
- fixtures
- specs
- cli
- features
- search
- test-data
- kubernetes/loculus
- templates
- preprocessing
- nextclade
- src/loculus_preprocessing
- tests
- ebola-dataset/minimizer
- website
- src
- components
- Edit
- ReviewPage
- Submission
- FileUpload
- types