-
Notifications
You must be signed in to change notification settings - Fork 63
Make fdasrsf dependency optional
#700
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Draft
vnmabus
wants to merge
26
commits into
develop
Choose a base branch
from
feature/dynamic_programming_warping
base: develop
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Draft
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add functionality to implement the DP algorithm for warping alignment (used in elastic registration) in pure Python. This would allow us to remove the dependency on fdasrsf, have a more clear and maintainable implementation, and implement elastic registration for more array types in the future.
Add penalty term to L2 energy function in the DP algorithm.
Add ASV benchmark for testing the time it takes to compute the Fisher-Rao elastic registration.
In order for elastic registration to be tractable with a high number of points, we can now reduce the grid used, so that only the energy of direct lines from near grid points are computed. This should not alter much the result (lines with far away points can be approximated with more line segments), but can drastically improve performance. We now use trapezoid quadrature to compute the integrals that appear in the energy computation. The reason for that change is that it is easier to compute the right quadrature weights when integrals with different domains are computed in a vectorized way.
Scaling by the square root of the slope is necessary when we work with the SRSF of the curves instead of with the curves themselves.
In addition, the code has been simplified to use the abstractions in scikit-fda.
In almost-constant parts of the function our implementation of line energies computation can obtain different results than the reference (fdasrsf), because we use a different quadrature (trapezoidal instead of rectangular). This can be fixed adding a small bit of regularization.
The penalty term was not integrated, as it should be.
Use the newly implemented DP algorithm everywhere.
In particular the evaluation has been optimized using always linear interpolation. This optimization should probably be applied in the future to normal evaluation too in the default case.
Some values could be computed once, or once per row.
Unfortunatly, even with these the algorithm is 3 times slower than the reference implementation.
Line energies are not computed inside the Python loop anymore, achieving potentially higher performance.
Here we try to use Numba to compute the expensive part of the DP algorithm, but without changing much the code. It is about three times *slower* than the equivalent NumPy-only code.
This reverts commit 721f63b.
This reverts commit c86f3cb.
This reverts commit 5b5fbef.
Use spline interpolation, which is faster, and einsum. Einsum is not optimized, as optimized one is slower, even with einsum_path.
We try a vectorization over columns, which is faster than over cells and consumes less memory (and is also faster) than computing all line energies at once. This is still 8 times slower than the reference implementation.
The fdasrsf library is no longer a mandatory dependency, but can still be used to achieve better performance.
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## develop #700 +/- ##
===========================================
- Coverage 87.04% 87.03% -0.01%
===========================================
Files 199 199
Lines 14639 14754 +115
===========================================
+ Hits 12742 12841 +99
- Misses 1897 1913 +16 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem with the accuracy of the line integrals in the DP algorithm: the times after the warping where the curve change slope are not guaranteed to correspond with the original ones. Thus, if the warping compresses the time, the integral could not take into account many details of the warped function. This was specially noticeable with non-smooth curves, such as in the Berkely Growth dataset. As a temporary solution, we increase the number of points used in the discretization of the curves, by a factor of `dim_grim` (which roughly corresponfs to the maximum slope). This improves accuracy, but makes the algorithm slower again.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Describe the proposed changes
Adds a new implementation of the dynamic programming (DP) algorithm used in elastic registration. The new implementation is written in pure Python, with no dependencies on
fdasrsf. Unfortunately, it is around 8-10 times slower for large data, so we still keepfdasrsfas an optional dependency, for those that really need the performance. Iffdasrsfis present it will be used.Additional information
There are several benefits to have our own implementation independent of
fdasrsf:fdasrsfhad a lot of problems to compile when Python versions change, or BLAS libraries are not easily found. This was disrupting for both our users and our continuous integration, even when these users do not need to use elastic registration.of possible uses, including having interactive documentation. However, non-mainstream libraries that are written in C (including
fdasrsf) are not available in Pyodide. After removing this dependency, I think the only missing thing would be to remove the dependency on Numba.All of these make having a Python implementation useful, even if it is (currently) not as performant as the one in
fdasrsf.Checklist before requesting a review