Description
Since the requests package seems to exhaust system RAM by default, I think some APIs should pass stream=True so that downloads can be chunked. The current implementation hardcodes stream=None (equivalent to False), which can make the user's system unstable when downloading large datasets.
```python
settings = self._session.merge_environment_settings(http_request.url, {}, None, None, None)
```
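For reference, here is a minimal sketch (plain requests, not the Kaggle client, with a placeholder URL) of the difference stream=True makes:

```python
import requests

url = "https://example.com/large-dataset.zip"  # placeholder URL

# Default behavior: requests drains the entire response body into
# memory during the get() call, which is what exhausts RAM.
# resp = requests.get(url)
# data = resp.content  # whole file already buffered here

# With stream=True, the body stays on the socket and is pulled in
# only as iter_content() is consumed, so it can be written to disk
# chunk by chunk.
with requests.get(url, stream=True) as resp:
    resp.raise_for_status()
    with open("large-dataset.zip", "wb") as f:
        for chunk in resp.iter_content(chunk_size=1024 * 1024):
            if chunk:
                f.write(chunk)
```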
The download_file method in the KaggleApi class tries to support chunked downloads, but I am not sure this code works as expected, because by this point the download would already be considered complete.
kaggle-api/src/kaggle/api/kaggle_api_extended.py, line 2181 (commit b97668b):

```python
for data in response.iter_content(chunk_size):
```
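To illustrate the concern: if the request was not made with stream=True, requests has already read the whole body by the time iter_content() is called, so that loop only re-chunks bytes that are already in memory. A minimal sketch with a placeholder URL:

```python
import requests

url = "https://example.com/large-dataset.zip"  # placeholder URL

# Without stream=True, the whole body is downloaded inside get()
# itself, so the peak memory cost has already been paid here.
resp = requests.get(url)

# iter_content() still yields chunks, but it only slices up the
# bytes already buffered in resp.content; nothing is fetched lazily.
with open("large-dataset.zip", "wb") as f:
    for chunk in resp.iter_content(chunk_size=1024 * 1024):
        f.write(chunk)
```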
And I think the current pattern of using the client returned by kaggle.http_client() outside of the with self.build_kaggle_client() as kaggle: statement is not recommended, because resources managed by the kaggle object may already be closed once the with block exits.
```python
with self.build_kaggle_client() as kaggle:
    ...
    download_file(..., kaggle.http_client(), ...)
```

For example:
kaggle-api/src/kaggle/api/kaggle_api_extended.py, lines 1187 to 1196 (commit b97668b):

```python
with self.build_kaggle_client() as kaggle:
    request = ApiDownloadDataFileRequest()
    request.competition_name = competition
    request.file_name = file_name
    response = kaggle.competitions.competition_api_client.download_data_file(request)
    url = response.history[0].url
    outfile = os.path.join(effective_path, url.split('?')[0].split('/')[-1])
    if force or self.download_needed(response, outfile, quiet):
        self.download_file(response, outfile, kaggle.http_client(), quiet, not force)
```
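If streaming were enabled, I would expect the chunked read to be consumed entirely while the client is still open. A hedged, pseudocode-style sketch of that ordering, mirroring the snippet above (write_chunks is a hypothetical helper, not part of the Kaggle API):

```python
def write_chunks(response, outfile, chunk_size=1024 * 1024):
    # Hypothetical helper: consume a streamed response to disk.
    with open(outfile, "wb") as f:
        for chunk in response.iter_content(chunk_size):
            f.write(chunk)

with self.build_kaggle_client() as kaggle:
    request = ApiDownloadDataFileRequest()
    request.competition_name = competition
    request.file_name = file_name
    response = kaggle.competitions.competition_api_client.download_data_file(request)
    # Consume the response fully *inside* the with block, while the
    # session owned by the kaggle object is guaranteed to be open.
    write_chunks(response, outfile)

# Outside the block the kaggle-managed resources may be closed, so no
# further reads from `response` should happen here.
```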