Description
I don't know if you are aware of intake. It is a data access and cataloguing package that aims to do much of what you have done here, but for generic datasets rather than one specific example.
Firstly, the existing `npy` data source type shows how you might use intake on array data. Note that the use of `open_files` (here in the code) already allows access to data on remote file systems (s3, gcs, http, ...) with optional compression, and the caching system handles download-on-first-use, again with various possible file layouts at the far end.
You would still need some of your code for the specifics of the MNIST data format, but I believe you could make your work much smaller and more structured, and allow it to be included in other catalogues, or indeed distributed as a conda package.
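For illustration, the MNIST-specific part that would remain is essentially a parser for the IDX format the MNIST files are stored in. Below is a minimal sketch, assuming only numpy. `read_idx` is a hypothetical name (it is not part of intake). It takes any binary file-like object, so it does not care where the bytes come from: a local `gzip.open(...)`, or (assumption) a file object obtained from `open_files(..., compression="gzip")`, which would give you the remote access and decompression for free.

```python
import struct
from typing import BinaryIO

import numpy as np

# IDX type codes (third byte of the magic number) mapped to big-endian dtypes.
_IDX_DTYPES = {
    0x08: np.dtype(">u1"),
    0x09: np.dtype(">i1"),
    0x0B: np.dtype(">i2"),
    0x0C: np.dtype(">i4"),
    0x0D: np.dtype(">f4"),
    0x0E: np.dtype(">f8"),
}


def read_idx(f: BinaryIO) -> np.ndarray:
    """Parse one IDX array (the layout MNIST uses) from a binary file-like object.

    Hypothetical helper: header is two zero bytes, a type code, the number of
    dimensions, then one big-endian uint32 per dimension, then the raw data.
    """
    zero, type_code, ndim = struct.unpack(">HBB", f.read(4))
    if zero != 0 or type_code not in _IDX_DTYPES:
        raise ValueError("not an IDX file")
    shape = struct.unpack(">" + "I" * ndim, f.read(4 * ndim))
    dtype = _IDX_DTYPES[type_code]
    count = int(np.prod(shape))
    data = np.frombuffer(f.read(count * dtype.itemsize), dtype=dtype)
    if data.size != count:
        raise ValueError("truncated IDX data")
    # Convert to native byte order (this also returns a writable copy).
    return data.reshape(shape).astype(dtype.newbyteorder("="))
```

With the standard library alone, `with gzip.open("train-images-idx3-ubyte.gz", "rb") as f: images = read_idx(f)` should give a `(60000, 28, 28)` uint8 array for the usual MNIST training images; everything about fetching, caching and decompressing that file is what intake could take off your hands.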