Make Hugging Face an option. It should also be possible to pull directly from GitHub, or from a plain URL, e.g. S3.
The data should be stored in exactly the same way (datacard, etc.), whichever source it comes from. The huggingface_cli should only be used if the user intends to use Hugging Face, e.g.
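
A rough sketch of how the source selection could look, in Python. Everything named here is hypothetical: the `hf:` prefix, `Source`, `parse_source` and `fetch` are invented for illustration and are not part of any existing tool. It assumes public http(s) URLs (including S3 HTTPS links), `git` on the PATH for GitHub, and the `huggingface_hub` package for Hugging Face, imported only when that source is actually chosen.

```python
from dataclasses import dataclass
from pathlib import Path
from urllib.parse import urlparse
import subprocess
import urllib.request


@dataclass(frozen=True)
class Source:
    kind: str      # "huggingface", "github", or "url"
    location: str  # repo id (Hugging Face) or full URL


def parse_source(spec: str) -> Source:
    """Classify a source string. The 'hf:' prefix is a made-up convention."""
    if spec.startswith("hf:"):
        return Source("huggingface", spec[len("hf:"):])
    parsed = urlparse(spec)
    if parsed.scheme in ("http", "https"):
        if parsed.netloc == "github.com":
            return Source("github", spec)
        return Source("url", spec)
    raise ValueError(f"unrecognised source: {spec!r}")


def fetch(spec: str, dest: Path) -> Path:
    """Download the data into dest, whatever the source, and return dest."""
    source = parse_source(spec)
    dest.mkdir(parents=True, exist_ok=True)
    if source.kind == "huggingface":
        # Lazy import: Hugging Face tooling is only needed on this branch.
        from huggingface_hub import snapshot_download
        snapshot_download(
            repo_id=source.location, repo_type="dataset", local_dir=str(dest)
        )
    elif source.kind == "github":
        # Shallow clone; git requires dest to be empty.
        subprocess.run(
            ["git", "clone", "--depth", "1", source.location, str(dest)],
            check=True,
        )
    else:
        # Plain URL: save the file under its own name inside dest.
        filename = Path(urlparse(source.location).path).name or "download"
        urllib.request.urlretrieve(source.location, str(dest / filename))
    return dest
```

Since every branch writes into the same `dest`, the datacard and the rest of the storage layout could be produced by one shared step after `fetch` returns, so nothing downstream needs to know which source was used.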