Custom indexes #87

@ilan-gold

Description of feature

Related to, but somewhat different from, #43.

Currently we simply iterate over the entire on-disk obs index until it is exhausted, but there are several cases where you might want to break this model:

  1. Different weighted sampling schemes (Planned: "Feat: groupby per dataset" #186, "CategoricalSampler support" #119)
  2. "Mapped indexing", i.e., indexing driven by a neighborhood graph
  3. Masked indexing, i.e., you only want the current data loader to focus on a fixed subset of the data (Implemented)
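
To make the three cases concrete, here is a minimal sketch in plain Python of the obs indices each scheme might hand to the loader. The function names (`weighted_indices`, `mapped_indices`, `masked_indices`) and the dict-of-lists graph representation are hypothetical and only for illustration; they are not an existing API of this project.

```python
import random


def weighted_indices(weights, n_samples, rng):
    """Case 1: draw obs indices with replacement, proportional to `weights`."""
    return rng.choices(range(len(weights)), weights=weights, k=n_samples)


def mapped_indices(seeds, neighbors):
    """Case 2: expand each seed index into itself followed by its graph neighbors.

    `neighbors` is assumed to be a dict mapping an obs index to a list of
    neighboring obs indices (a stand-in for a neighborhood graph).
    """
    out = []
    for seed in seeds:
        out.append(seed)
        out.extend(neighbors.get(seed, []))
    return out


def masked_indices(mask):
    """Case 3: keep only the obs indices where the boolean mask is true."""
    return [i for i, keep in enumerate(mask) if keep]


# Hypothetical toy data: 4 observations connected in a chain 0-1-2-3.
graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
rng = random.Random(0)

weighted = weighted_indices([0.1, 0.1, 0.1, 0.7], n_samples=8, rng=rng)
mapped = mapped_indices([0, 2], graph)
masked = masked_indices([True, False, True, True])
```

In all three cases the loader would consume an explicit list of indices instead of walking the full on-disk obs index, which is the part that conflicts with exhausting the index in order.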

Cases 1 and 3 probably break the idea that we should pre-shuffle the entire dataset (although you should probably still do some degree of shuffling, and perhaps this motivates more flexible shuffling), but case 2 could probably be made to work with pre-shuffling in some way or another.

The flip side is that these schemes could be used to "weight" pre-shuffling, so that the order of the on-disk data yields the desired randomness scheme (i.e., shuffling in a certain way to achieve 2 or 3 without actually changing the loading code).
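
As a rough illustration of that idea, the sketch below builds write-order permutations that a pre-shuffling step could apply before writing to disk, so that a loader reading sequentially gets the desired scheme for free. Both helpers (`mask_first_permutation`, `neighborhood_block_permutation`) are hypothetical names, and the dict-of-lists graph is an assumed stand-in for a real neighborhood graph.

```python
import random


def mask_first_permutation(mask, seed=0):
    """Case 3: shuffle masked-in rows to the front, the rest to the back.

    Returns the permutation and the number of masked-in rows, so a sequential
    reader can stop after that many rows to see only the subset.
    """
    rng = random.Random(seed)
    kept = [i for i, keep in enumerate(mask) if keep]
    rest = [i for i, keep in enumerate(mask) if not keep]
    rng.shuffle(kept)
    rng.shuffle(rest)
    return kept + rest, len(kept)


def neighborhood_block_permutation(neighbors, n_obs, seed=0):
    """Case 2: visit seeds in random order, placing each seed next to its
    not-yet-placed neighbors so neighborhoods end up contiguous on disk."""
    rng = random.Random(seed)
    seeds = list(range(n_obs))
    rng.shuffle(seeds)
    placed = set()
    order = []
    for seed_idx in seeds:
        for i in [seed_idx, *neighbors.get(seed_idx, [])]:
            if i not in placed:
                placed.add(i)
                order.append(i)
    return order


# Hypothetical toy data: 5 observations, 3 of them masked in.
perm, n_kept = mask_first_permutation([True, False, True, False, True])
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
block_order = neighborhood_block_permutation(chain, n_obs=4)
```

The trade-off is that the scheme is then baked into the on-disk order, so changing the mask or the graph would mean rewriting the data rather than changing a loader argument.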

Labels: enhancement (New feature or request)
