fix: numpy default NaT handling#3863
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #3863 +/- ##
==========================================
+ Coverage 92.86% 92.91% +0.04%
==========================================
Files 86 86
Lines 10705 10707 +2
==========================================
+ Hits 9941 9948 +7
+ Misses 764 759 -5
🚀 New features to boost your workflow:
|
|
I'm not actually sure what the endgame is here. It started with this issue: numpy/numpy#28287, now it looks like the direction is requiring that |
src/zarr/core/dtype/npy/time.py
Outdated
| if isinstance(data, np.timedelta64) and np.isnan(data): | ||
| return np.timedelta64("NaT", self.unit) |
There was a problem hiding this comment.
Not really sure what to do here since the following strikes me as a bug that won't be fixed (i.e., the below is the codepath of the next line return self._cast_scalar_unchecked(data)):
# /// script
# requires-python = "==3.12"
# dependencies = [
# "numpy==2.0",
# ]
# ///
import numpy as np
# AssertionError
assert np.isnan(
np.dtype("timedelta64[10s]").type(
np.timedelta64("NaT", "s"), "10s"
)
)There was a problem hiding this comment.
either this is a bug in numpy that we should upstream, or there's some equivalent isNaT routine? Alternatively, we check the int64 value of the timedelta, i think all the NaT values land on the same value.
There was a problem hiding this comment.
i think all the NaT values land on the same value
I observed this as well
we should upstream
My "bug that won't be fixed" point was that this appears to be numpy<2.4 specific (i.e., the above script fails on all numpy versions I tried less than 2.4)
There was a problem hiding this comment.
# /// script
# requires-python = "==3.12"
# dependencies = [
# "numpy==2.0", # or 2.3
# ]
# ///
import numpy as np
# AssertionError
assert np.isnat(
np.dtype("timedelta64[10s]").type(
np.timedelta64("NaT", "s"), "10s"
)
)also fails.
The. output of
np.dtype("timedelta64[10s]").type(
np.timedelta64("NaT", "s"), "10s"
)is this magic integer you mentioned
There was a problem hiding this comment.
gotcha, I didn't see the numpy version pin. looks like numpy does have an isnat function so we should probably just use that instead
There was a problem hiding this comment.
wait what do we expect here? this isn't actually a NaT value, it's just a huge timedelta:
>>> np.dtype("timedelta64[10s]").type(np.timedelta64("NaT", "s"), "10s")
np.timedelta64(922337203685477579,'10s')There was a problem hiding this comment.
Before this change, the call would have been unit-less:
assert np.isnat(np.dtype("timedelta64[10s]").type(np.timedelta64("NaT"), "10s"))which works. Bringing in the units is what is causing the non-NaT value in pre-numpy 2.4 versions but now:
# /// script
# requires-python = "==3.12"
# dependencies = [
# "numpy>=2.4",
# ]
# ///
import numpy as np
assert np.isnat(np.dtype("timedelta64[10s]").type(np.timedelta64("NaT", "ms"), "10s"))
assert np.isnat(np.dtype("timedelta64[10s]").type(np.timedelta64("NaT", "ns"), "10s"))works so our old code path works fine although now that I'm writing this, is no longer hit when NaT.
Agreed |
|
@ilan-gold is this ready? |
|
Yea, I agree we should keep an eye on this. |
Upstream CI started failing because of
numpyNaTrequiring a unit, not sure if this is intended on their side sinceNaTwould be the one time I might not expect a unit..I am going to open an issueJudging by numpy/numpy#29619 (comment), this is the intention.I don't really get this as a newbie to datetimes since:
works.
See: https://github.com/zarr-developers/zarr-python/actions/runs/23841178099/job/69496895274?pr=3826 for the failures
TODO:
docs/user-guide/*.mdchanges/