BERT-style mask prediction pretraining #851
Open
niklasmei wants to merge 27 commits into graphnet-team:main from niklasmei:maskpred_pretraining
Conversation
This adds a module for unsupervised pretraining using BERT-style mask prediction.
I originally wrote it to pretrain a model that returns a latent view of the input sequence together with a single vector summarizing that sequence. That setup came from a DeepIce-based model, which yields both a cls-token and the processed sequence; some variable names still reflect this original use.
In the version here, providing a vector that summarizes the input data is optional; it is only used to predict a summary feature of the event (the total charge, by default).
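For concreteness, here is a minimal sketch of the objective described above, in PyTorch. It is not the code in this PR: the class name, the `mask_fraction` default, the learned mask token, and the MSE losses are all illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MaskPredictionPretraining(nn.Module):
    """Illustrative BERT-style masking objective (hypothetical, not this PR's code)."""

    def __init__(
        self,
        encoder: nn.Module,  # must keep the sequence length fixed (see below)
        feature_dim: int,
        latent_dim: int,
        mask_fraction: float = 0.15,
        use_summary: bool = False,
    ) -> None:
        super().__init__()
        self.encoder = encoder
        self.mask_token = nn.Parameter(torch.zeros(feature_dim))  # learned mask embedding
        self.reconstruct = nn.Linear(latent_dim, feature_dim)     # predicts masked features
        self.mask_fraction = mask_fraction
        # Optional head that predicts a per-event summary (e.g. total charge)
        # from the summary vector (e.g. cls-token) returned by the encoder.
        self.summary_head = nn.Linear(latent_dim, 1) if use_summary else None

    def forward(
        self, x: torch.Tensor, total_charge: torch.Tensor | None = None
    ) -> torch.Tensor:
        # x: (batch, seq_len, feature_dim), e.g. a padded pulse series.
        mask = torch.rand(x.shape[:2], device=x.device) < self.mask_fraction
        x_masked = torch.where(mask.unsqueeze(-1), self.mask_token.expand_as(x), x)

        out = self.encoder(x_masked)
        # The encoder may return just the sequence, or (sequence, summary vector).
        seq, summary = out if isinstance(out, tuple) else (out, None)

        # Reconstruct the original features at the masked positions only.
        loss = F.mse_loss(self.reconstruct(seq)[mask], x[mask])

        if self.summary_head is not None and summary is not None and total_charge is not None:
            loss = loss + F.mse_loss(
                self.summary_head(summary).squeeze(-1), total_charge
            )
        return loss
```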
The one requirement is that the model being pretrained must not change the number of sequence elements, beyond possibly adding a single summary vector such as a cls-token. Apart from that, the pretraining module should be agnostic to the model it wraps.
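To make that constraint concrete, any length-preserving encoder could be pretrained along these lines (a plain `nn.TransformerEncoder` and made-up shapes, continuing the sketch above):

```python
# Hypothetical usage: any encoder mapping (batch, seq_len, d) to
# (batch, seq_len, d) works, since the sequence length is unchanged.
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=32, nhead=4, batch_first=True),
    num_layers=2,
)
pretrainer = MaskPredictionPretraining(encoder, feature_dim=32, latent_dim=32)

pulses = torch.randn(8, 100, 32)  # dummy batch of 8 events, 100 pulses each
loss = pretrainer(pulses)
loss.backward()
```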