Description of feature
Related to but somewhat different from #43
Currently we just blindly exhaust the entire on-disk obs index, but there are multiple cases where you might want to break this model:

1. Different weighted sampling schemes (Planned)
2. Feat: groupby per dataset #186, CategoricalSampler support #119
3. Masked indexing, i.e., you only want the current data loader to focus on a fixed subset of the data (Implemented)

1 and 3 probably break the idea that we should pre-shuffle the entire dataset (although you should probably still do some degree of shuffling, and perhaps this motivates more flexible shuffling), but 2 could probably be made to work with pre-shuffling in some way or another.
The flip side is that these could be used to "weight" pre-shuffling so that the order of the on-disk data gives the desired randomness scheme (i.e., shuffling in a certain way to achieve 2 or 3 without actually changing the loading code).
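
To make the difference from "exhaust the whole index" concrete, here is a minimal sketch of how masked indexing and weighted sampling could choose which observation indices to visit. It is only an illustration using the standard library: `sample_obs_indices` and its parameters are hypothetical names, not part of this project's API, and drawing with replacement for the weighted case is an assumption.

```python
import random
from typing import List, Optional, Sequence


def sample_obs_indices(
    n_obs: int,
    mask: Optional[Sequence[bool]] = None,
    weights: Optional[Sequence[float]] = None,
    n_samples: Optional[int] = None,
    seed: int = 0,
) -> List[int]:
    """Hypothetical helper: pick which on-disk observation indices to visit."""
    rng = random.Random(seed)

    # Masked indexing: restrict the loader to a fixed subset of observations.
    candidates = [i for i in range(n_obs) if mask is None or mask[i]]
    if not candidates:
        return []

    if weights is None:
        # Unweighted case: visit each candidate once, in shuffled order.
        rng.shuffle(candidates)
        return candidates if n_samples is None else candidates[:n_samples]

    # Weighted sampling (assumed here to be with replacement):
    # draw indices with probability proportional to their weight.
    k = len(candidates) if n_samples is None else n_samples
    return rng.choices(candidates, weights=[weights[i] for i in candidates], k=k)


# Only even-numbered observations are eligible.
even_only = sample_obs_indices(10, mask=[i % 2 == 0 for i in range(10)])

# Observation 1 carries all of the weight, so it is the only one drawn.
skewed = sample_obs_indices(4, weights=[0.0, 1.0, 0.0, 0.0], n_samples=5)
```

With neither `mask` nor `weights` set, this reduces to a plain shuffle of the full index, which is the case that pre-shuffling on disk already covers.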