Description of feature
In theory, it should be possible to pre-allocate and then reuse memory for:
- IO operations (i.e., reads from disk per anndata). For sparse data, this could prove challenging (though I'd guess not impossible) because of the uneven nature of the `data` and `indices` reads; still, upper bounds derived from `indptr` should make it possible to determine how much memory needs to be preallocated.
- the in-memory shuffle (i.e., we concatenate the read-from-disk data directly into preallocated buffers, shuffle the data that was put into that buffer, yield, and repeat after the next read; this suffers from the same problem of uneven buffers for sparse matrices as above).
Handling leftover data for the second buffer (i.e., the concat buffer) might make this challenging, but it's also possible we can at least upper bound the needed memory and then track how much data is needed and where it is stored.
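A minimal sketch of the upper-bound idea, using plain NumPy. The helper names (`preallocate_from_indptr`, `read_chunk_into`) and the `data_src`/`indices_src` arrays standing in for on-disk datasets (e.g. h5py-backed) are hypothetical, not existing API:

```python
import numpy as np

def preallocate_from_indptr(indptr, chunk_rows, dtype):
    """Upper-bound the nnz of any window of `chunk_rows` consecutive rows.

    For CSR, the nnz of rows [i, i + chunk_rows) is
    indptr[i + chunk_rows] - indptr[i], so the max over all windows bounds
    how large the reusable data/indices buffers need to be.
    """
    window_nnz = indptr[chunk_rows:] - indptr[:-chunk_rows]
    max_nnz = int(window_nnz.max()) if window_nnz.size else int(indptr[-1])
    data_buf = np.empty(max_nnz, dtype=dtype)
    indices_buf = np.empty(max_nnz, dtype=indptr.dtype)
    return data_buf, indices_buf

def read_chunk_into(data_src, indices_src, indptr, start, stop,
                    data_buf, indices_buf):
    """Copy one row-chunk's sparse payload into the preallocated buffers.

    `data_src`/`indices_src` play the role of the on-disk arrays; the read
    lands in the reused buffers instead of freshly allocated ones.
    """
    lo, hi = int(indptr[start]), int(indptr[stop])
    n = hi - lo
    data_buf[:n] = data_src[lo:hi]
    indices_buf[:n] = indices_src[lo:hi]
    return n  # number of valid entries currently in the buffers
```

Tracking the returned `n` per read is what "track how much and where the needed data is stored" would amount to in practice: the buffers are oversized, so only a prefix of each is valid at any time.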
The benefit here would of course be not having to spend time allocating memory. Is allocation actually a bottleneck, though? Maybe, maybe not.
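One way to get a first answer is a standalone microbenchmark (not tied to any particular loader) comparing a fresh allocation-plus-fill per "read" against filling a reused buffer; the sizes and iteration counts below are arbitrary:

```python
import timeit
import numpy as np

N = 4_000_000  # entries per simulated read (~16 MB of float32)

def fresh_alloc():
    # Allocate a new buffer every read, then fill it (pays page faults).
    buf = np.empty(N, dtype=np.float32)
    buf[:] = 1.0
    return buf

reused = np.empty(N, dtype=np.float32)

def reuse():
    # Fill a preallocated buffer that already has warm, mapped pages.
    reused[:] = 1.0
    return reused

t_fresh = timeit.timeit(fresh_alloc, number=20)
t_reuse = timeit.timeit(reuse, number=20)
print(f"fresh: {t_fresh:.3f}s  reuse: {t_reuse:.3f}s")
```

If the two numbers are close on realistic chunk sizes, the preallocation machinery above probably isn't worth its complexity; if fresh allocation is measurably slower, it is.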