connect to Intake #13

@martindurant

Description

I don't know if you are aware of intake, but it is a data access and cataloguing package that aims to do a lot of what you have done here, for generic datasets rather than one specific example.

Firstly, the existing npy data source type shows how you might use intake on array data. Note that its use of open_files (here in the code) already allows access to data on remote file-systems (s3, gcs, http...) with optional compression, and the caching system handles download-on-first-use, again with various possible file layouts at the far end.
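
To illustrate the open_files point, here is a minimal sketch. It assumes the `open_files` function from the fsspec package, which I believe is what recent intake releases build on; treat that, along with the helper name `load_arrays` and the demo file names, as assumptions rather than intake's actual code. The demo writes two gzip-compressed .npy files to a temporary directory so it runs without network access.

```python
import gzip
import io
import os
import tempfile

import fsspec
import numpy as np


def load_arrays(urlpath, compression=None, **storage_options):
    """Load every .npy file matched by urlpath (hypothetical helper).

    urlpath may be a local glob or a remote URL such as "s3://...";
    storage_options are passed through to the file-system backend.
    """
    open_file_list = fsspec.open_files(
        urlpath, mode="rb", compression=compression, **storage_options
    )
    arrays = []
    for open_file in open_file_list:
        with open_file as f:
            # Read the (decompressed) bytes, then let numpy parse them.
            arrays.append(np.load(io.BytesIO(f.read())))
    return arrays


# Local demo: two small gzip-compressed arrays in a temp directory.
tmpdir = tempfile.mkdtemp()
for i in range(2):
    buf = io.BytesIO()
    np.save(buf, np.full((2, 3), i, dtype=np.uint8))
    with gzip.open(os.path.join(tmpdir, f"part{i}.npy.gz"), "wb") as f:
        f.write(buf.getvalue())

arrays = load_arrays(os.path.join(tmpdir, "part*.npy.gz"), compression="gzip")
```

Pointing the same call at a remote path should only require changing the URL and installing the matching backend (for example s3fs for s3); I have not shown that here because it needs credentials and network access.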

You would still need some of your code for the specifics of the mnist data format, but I believe you could make your work much smaller and more structured, and allow it to be included in other catalogues, or indeed distributed as a conda package.
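
For the format-specific part, something like the following is roughly all that should need to remain. This is a sketch of a parser for the IDX layout the mnist files use (two zero bytes, a type code, a dimension count, big-endian dimension sizes, then the data); the function name `parse_idx` is made up for illustration and is not taken from this repository.

```python
import gzip
import math
import struct

import numpy as np

# IDX type codes mapped to big-endian numpy dtypes.
IDX_DTYPES = {
    0x08: "u1",   # unsigned byte
    0x09: "i1",   # signed byte
    0x0B: ">i2",  # short
    0x0C: ">i4",  # int
    0x0D: ">f4",  # float
    0x0E: ">f8",  # double
}


def parse_idx(raw: bytes) -> np.ndarray:
    """Decode an uncompressed IDX byte string into an array (hypothetical helper)."""
    zero, type_code, ndim = struct.unpack(">HBB", raw[:4])
    if zero != 0 or type_code not in IDX_DTYPES:
        raise ValueError("not a valid IDX header")
    header_size = 4 + 4 * ndim
    shape = struct.unpack(f">{ndim}I", raw[4:header_size])
    data = np.frombuffer(
        raw,
        dtype=np.dtype(IDX_DTYPES[type_code]),
        count=math.prod(shape),
        offset=header_size,
    )
    return data.reshape(shape)


# Demo on a synthetic 2x3 unsigned-byte file, gzip-compressed like the mnist downloads.
header = struct.pack(">HBBII", 0, 0x08, 2, 2, 3)
compressed = gzip.compress(header + bytes(range(6)))
arr = parse_idx(gzip.decompress(compressed))
```

With file access, compression, and caching handled by intake, a parser of this size could sit inside a small custom data source.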
