Skip to content

stages.flux.daemon_flux module: remove numba jit decorators, clean up and add documentation, speed up interpolation/evaluation#930

Open
thehrh wants to merge 11 commits into
masterfrom
daemon_flux_no_numba
Open

stages.flux.daemon_flux module: remove numba jit decorators, clean up and add documentation, speed up interpolation/evaluation#930
thehrh wants to merge 11 commits into
masterfrom
daemon_flux_no_numba

Conversation

@thehrh
Copy link
Copy Markdown
Collaborator

@thehrh thehrh commented Apr 22, 2026

On top of resolving #929, this PR fixes missing documentation and cleans up the module and demonstrates performance and accuracy in the notebook.

@thehrh
Copy link
Copy Markdown
Collaborator Author

thehrh commented Apr 22, 2026

Failing workflows are due to #927

thehrh added 2 commits April 22, 2026 21:45
…owed any longer), no need to set calc_mode in setup_function, cosmetics
…> 5 statements), no need to set representation in compute_function, turn default energy grid into global variable with units [no ci]
@thehrh
Copy link
Copy Markdown
Collaborator Author

thehrh commented Apr 23, 2026

Timings as of now:

branch (single CPU): flux daemon_flux
- setup:    Total time (s): +0.000, n calls: 1
- compute:  Total time (s): 74.671, n calls: 100, time/call (s): mean 0.747, max. 0.784, min. 0.735
- apply:    Total time (s): +0.000, n calls: 100, time/call (s): mean +0.000, max. +0.000, min. +0.000

master (single CPU): flux daemon_flux
- setup:    Total time (s): +0.000, n calls: 1
- compute:  Total time (s): 76.168, n calls: 100, time/call (s): mean 0.762, max. 2.039, min. 0.738
- apply:    Total time (s): +0.000, n calls: 100, time/call (s): mean +0.000, max. +0.000, min. +0.000

master (3 CPUs): flux daemon_flux
- setup:    Total time (s): +0.000, n calls: 1
- compute:  Total time (s): 76.544, n calls: 100, time/call (s): mean 0.765, max. 2.095, min. 0.740
- apply:    Total time (s): +0.000, n calls: 100, time/call (s): mean +0.000, max. +0.000, min. +0.000

Let's see the how much 1) the external flux calculation by daemonflux, 2) the splining, and 3) the spline evaluation contribute (calc_mode = events using the 3y DeepCore events file). Here's a fairly typical example of the durations:

daemonflux numu flux generation duration (s): 3.8e-02
splining duration (s): 1.6e-03
daemonflux antinumu flux generation duration (s): 3.8e-02
splining duration (s): 1.5e-03
daemonflux nue flux generation duration (s): 3.9e-02
splining duration (s): 1.6e-03
daemonflux antinue flux generation duration (s): 4.0e-02
splining duration (s): 1.5e-03
PISA spline evaluation time (s, 795502 events): 6.0e-01

One can tell that evaluating the splines in event-by-event mode takes roughly 75% of the total time.

If we compare the pipeline IceCube_3y_neutrinos_daemon.cfg to IceCube_3y_neutrinos.cfg, the latter has the services flux.honda_ip + flux.barr_simple instead of flux.daemon_flux and performs its flux computations on a 200 x 200 grid.

Between these two analysis configurations, the template generation time of the one using daemonflux is ~0.5 s slower on a single CPU.

If the daemonflux pipeline employs the same 200 x 200 grid instead, the above PISA spline evaluation time drops from 0.6 s-> 0.4 s (template generation time still ~0.3 s slower than for IceCube_3y_neutrinos.cfg).

@thehrh
Copy link
Copy Markdown
Collaborator Author

thehrh commented Apr 29, 2026

@marialiubarska @jpyanez
Given the significant contribution of the daemon_flux service to template generation times (I presume not just for this 3-year DRAGON pipeline, timings above) and to inform possible searches for alternatives or optimisation attempts, I'm wondering which reasons guided the choice of creating a 2D interpolant with RectBivariateSpline here? For example, was speed a factor at all?

@marialiubarska
Copy link
Copy Markdown
Contributor

Hey, I'm not sure I understood if your question is on the reason we do 2d interpolation or on the choice of sepcific interpolation method.

The reason for 2d interpolation is speed, because of how daemonflux package uses two 1d interpolations, which is not ideal for our case where each neutrino has random coszen and energy, using it directly was much slower then re-interpolating in 2d for each set of updated parameters.

In terms of the specific method, I don't remember why RectBivariateSpline was chosen, I don't think I looked into spped comparison with other interpolation methods.

@thehrh
Copy link
Copy Markdown
Collaborator Author

thehrh commented Apr 29, 2026

The choice of RectBivariateSpline is what I'm interested in, but the background on the 2D choice is useful to know also. I presume you can confirm that daemon_flux is a significant contributor to the time it takes to generate expected distributions during any given fit?

@marialiubarska
Copy link
Copy Markdown
Contributor

Hmm, I'm not 100% sure, but I think it's not just daemonflux, 2d interpolation does take at least comparable amount of time to the grid evaluation. It's just that it was much faster then looping over events.

There was a third method: for N events, sort them by energy and coszen into NxN grid, then use it to get flux from daemonflux, then keep the diagonal and discard other values. This approach was a bit faster then 2d interpolation, but we didn't use it because with larger numbers of events it could lead to memory issues

@marialiubarska
Copy link
Copy Markdown
Contributor

I remember we had a discussion with @afedynitch regarding this, maybe he also has some thoughts

@thehrh
Copy link
Copy Markdown
Collaborator Author

thehrh commented Apr 29, 2026

Hmm, I'm not 100% sure, but I think it's not just daemonflux, 2d interpolation does take at least comparable amount of time to the grid evaluation. It's just that it was much faster then looping over events.

There was a third method: for N events, sort them by energy and coszen into NxN grid, then use it to get flux from daemonflux, then keep the diagonal and discard other values. This approach was a bit faster then 2d interpolation, but we didn't use it because with larger numbers of events it could lead to memory issues

Based on my timings above, obtaining the fluxes from the external daemonflux dependency and creating the RectBivariateSpline objects is much less important than evaluating the latter (all in event-by-event mode), which is why I'm interested in optimising the PISA side (=interpolants created from daemonflux.Flux.flux) at this point.

The other aspect is that the total template generation time on a single core is of the order of twice the daemon_flux PISA service's compute time , so an $X\,\%$ performance gain here should result in an overall $\sim X/2\,\%$ gain. I would assume this relation to be roughly independent of the underlying event sample.

Any timings from a fit including the daemon_flux service similar to https://icecube.github.io/pisa/notebooks/IceCube_3y_oscillations_example.html#profiling (which uses honda+barr instead) would be very useful to me.

@thehrh thehrh marked this pull request as draft May 7, 2026 14:07
@thehrh
Copy link
Copy Markdown
Collaborator Author

thehrh commented May 8, 2026

I've now included an external numba-accelerated interpolation package as an (experimental) option. Compared to the above timings, the spline evaluation time is reduced by a factor of approx. 4, meaning it is now smaller than the total daemonflux flux generation time (given the indicated number of 795502 MC events in the DRAGON sample):

[    INFO] daemonflux numu flux generation duration (s): 4.90e-02
[    INFO] splining duration (s): 8.17e-04
[    INFO] daemonflux antinumu flux generation duration (s): 4.53e-02
[    INFO] splining duration (s): 5.43e-04
[    INFO] daemonflux nue flux generation duration (s): 3.92e-02
[    INFO] splining duration (s): 6.13e-04
[    INFO] daemonflux antinue flux generation duration (s): 4.00e-02
[    INFO] splining duration (s): 5.20e-04
[    INFO] PISA spline evaluation time (s, 795502) events): 1.50e-01

The event distributions given this initial speedup attempt differ at the sub-percent level; here are typical maximal and minimal absolute fractional differences between binwise count expectations:

nue_cc, max: 0.0006977199225309526, min: 0.0006444681210882939
numu_cc, max: 7.285358006379291e-05, min: 5.952816310806611e-05
nutau_cc, max: 0.0008284691842364866, min: 0.0006873706923714039
nue_nc, max: 0.006025318891604914, min: 0.005163450066271475
numu_nc, max: 0.00019059368789305282, min: 0.00015835165290412938
nutau_nc, max: 0.0005200821727574986, min: 0.00042091459621874044
nuebar_cc, max: 0.001419649454289934, min: 0.0012558469729848204
numubar_cc, max: 0.0001523649138680785, min: 0.00013414333696720485
nutaubar_cc, max: 0.0008741137723083838, min: 0.0007975125458589856
nuebar_nc, max: 0.014431783203481844, min: 0.011095828418334359
numubar_nc, max: 0.0008163497000315399, min: 0.0007224898112474274
nutaubar_nc, max: 0.003314216747600409, min: 0.003006488879502088

I've also generated 50 random combinations of the parameters of the daemon_flux service in the public DRAGON neutrino pipeline and plotted the ratios between the event distributions resulting from the two implementations here: https://user-web.icecube.wisc.edu/~tehrhardt/map_comparisons/daemon_flux_new_regular_vs_fast_interp/fast_interp_default_egrid/.

A numu CC example is:
image

Others seem to look very similar.

Given the current speedup of ~50% for flux.daemon_flux computations, I wonder whether analyses are sensitive to the choice of implementation.

@thehrh thehrh changed the title stages.flux.daemon_flux module: remove numba jit decorators, clean up and add documentation stages.flux.daemon_flux module: remove numba jit decorators, clean up and add documentation, speed up interpolation/evaluation May 8, 2026
@thehrh
Copy link
Copy Markdown
Collaborator Author

thehrh commented May 20, 2026

Since I'm not aware of a specific motivation behind the choice of the ENERGY_GRID global variable, following are timings and relative Map deviations for other choices. As a reference, the minimal daemon_flux compute time with fast interpolation and the default energy grid, of 500 log-spaced values between 0.1 and 10⁵ GeV, is ~0.31 s.

There does not seem to be any significant speed up once grid size drops below 100 values (compute time ~0.21 s), where the relative deviations are still very similar to those of template produced with the default grid though. At 10 (!) grid points, all relative deviations are at the percent level.

energy grid with 200 log-spaced values


nue_cc, max: 0.0007032636910766987, min: 0.0006480499738102815
numu_cc, max: 6.76457459031931e-05, min: 5.590904064865424e-05
nutau_cc, max: 0.0008341897363580841, min: 0.0006910765444860524
nue_nc, max: 0.006078318403203845, min: 0.005193687414849116
numu_nc, max: 0.00019264806947761118, min: 0.00016047070964865805
nutau_nc, max: 0.0005089321612748992, min: 0.0003961536630941301
nuebar_cc, max: 0.0014348484781289731, min: 0.001275686138113708
numubar_cc, max: 0.00015358870153954767, min: 0.00013423628626833082
nutaubar_cc, max: 0.000886009634639455, min: 0.0007861050605572776
nuebar_nc, max: 0.014420792412245613, min: 0.01108239079226981
numubar_nc, max: 0.0008108631668439374, min: 0.0007059665774915829
nutaubar_nc, max: 0.003314310641292867, min: 0.003006599110231705

energy grid with 100 log-spaced values


nue_cc, max: 0.0007562574961501183, min: 0.0006958975848508406
numu_cc, max: 9.408956167305278e-05, min: 7.177148553148146e-05
nutau_cc, max: 0.0008684512583732959, min: 0.0006928834822301446
nue_nc, max: 0.006124932126317551, min: 0.005278143020090229
numu_nc, max: 0.0001945088630831319, min: 0.0001625651172070575
nutau_nc, max: 0.0004409632070347742, min: 0.0003121188374449897
nuebar_cc, max: 0.0015471582456897215, min: 0.0014067864193395777
numubar_cc, max: 0.00014762517320871717, min: 0.00012948326146812488
nutaubar_cc, max: 0.0009158186987179999, min: 0.0007770057030777512
nuebar_nc, max: 0.01426574325371879, min: 0.010907300148320963
numubar_nc, max: 0.0008719907396266656, min: 0.0007393728077474975
nutaubar_nc, max: 0.00310803426122814, min: 0.0028841790153176774
[    INFO] daemonflux numu flux generation duration (s): 2.94e-02
[    INFO] splining duration (s): 4.76e-04
[    INFO] daemonflux antinumu flux generation duration (s): 2.42e-02
[    INFO] splining duration (s): 6.01e-04
[    INFO] daemonflux nue flux generation duration (s): 2.64e-02
[    INFO] splining duration (s): 4.76e-04
[    INFO] daemonflux antinue flux generation duration (s): 2.74e-02
[    INFO] splining duration (s): 4.75e-04
[    INFO] PISA spline evaluation time (s, 795502) events): 1.44e-01

energy grid with 50 log-spaced values


nue_cc, max: 0.0007601242442450019, min: 0.0006979777841420998
numu_cc, max: 0.0004458494877522111, min: 0.0002604620387897745
nutau_cc, max: 0.0009039585320506854, min: 0.0006548871616288167
nue_nc, max: 0.006186760045457843, min: 0.005318751930374121
numu_nc, max: 0.0003193439661157826, min: 0.00024513407589436863
nutau_nc, max: 0.0005713281179285144, min: 0.0003753845242440459
nuebar_cc, max: 0.0014010953700068023, min: 0.0012283004896959083
numubar_cc, max: 0.00029831239895296605, min: 0.0002219255914019182
nutaubar_cc, max: 0.0008997864739751873, min: 0.0007847121073513096
nuebar_nc, max: 0.014669337019489017, min: 0.011185927609781817
numubar_nc, max: 0.0010766805047367196, min: 0.0009330475312435296
nutaubar_nc, max: 0.003351553177938384, min: 0.0030404888162098884

energy grid with 25 log-spaced values


nue_cc, max: 0.0010238108720788549, min: 0.0009030754986619497
numu_cc, max: 0.000792844798613296, min: 0.0005191869667389621
nutau_cc, max: 0.001216129732052831, min: 0.0010716849205862611
nue_nc, max: 0.00599184761086515, min: 0.0051628881020905895
numu_nc, max: 0.0011878009231093092, min: 0.0009010935115135575
nutau_nc, max: 0.001156238624448378, min: 0.0007599743389719115
nuebar_cc, max: 0.0012960487235788546, min: 0.0010315298807703523
numubar_cc, max: 0.0004840622397312057, min: 0.00035036526953658917
nutaubar_cc, max: 0.0012904755428669134, min: 0.0011411679076896112
nuebar_nc, max: 0.014797319370727577, min: 0.010957170143665327
numubar_nc, max: 0.0010384473847636162, min: 0.0008936761776663445
nutaubar_nc, max: 0.0030963027675952205, min: 0.002820337667783901
  • minimal daemon_flux compute duration: 0.224 s (vs. regular implementation: 0.790 s)

energy grid with 10 (!) log-spaced values


nue_cc, max: 0.0308539886398366, min: 0.017277063903420285
numu_cc, max: 0.008770646400080415, min: 0.006050360347252464
nutau_cc, max: 0.010577352663165505, min: 0.005677879412975053
nue_nc, max: 0.02454013389333761, min: 0.014977111866241016
numu_nc, max: 0.009896584662672882, min: 0.005194588511014799
nutau_nc, max: 0.014032345142411268, min: 0.00775040902142593
nuebar_cc, max: 0.032752431356523906, min: 0.020514113485681613
numubar_cc, max: 0.01373526114756977, min: 0.008080009263064921
nutaubar_cc, max: 0.01709734102760799, min: 0.01072478751402709
nuebar_nc, max: 0.03377852929136632, min: 0.02546872585532833
numubar_nc, max: 0.010188422884134476, min: 0.005343213240449076
nutaubar_nc, max: 0.01932538860051647, min: 0.013034873837136392
  • minimal daemon_flux compute duration: 0.210 s (vs. regular implementation: 0.771 s)

@thehrh thehrh marked this pull request as ready for review May 20, 2026 17:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants