CAF (Common Analysis Framework) files are used in SBN analyses either through the CAFAna framework or through various standalone projects. In either case, they are a common starting point for many analyzers. The standard pathway for making CAFs only includes reconstruction outputs which are available in the artROOT files after the final step of the reconstruction chain.
The SPINE (Scalable Particle Imaging with Neural Embeddings) package (GitHub) implements an end-to-end reconstruction chain with machine learning algorithms at the heart of point classification, clustering, and particle identification. The reconstruction chain uses LArCV files as input and produces HDF5 files containing the reconstruction outputs. Because this takes place outside of the LArSoft ecosystem, making these reconstruction outputs available to analyzers requires a little extra work.
The sbn_ml_cafmaker package provides code for merging the ML reconstruction outputs into existing CAF files (enforcing event-by-event matching) and for producing standalone CAF files with only ML reconstruction outputs.
The package has two main requirements:
- HDF5 C++ library (instructions below)
- sbnanaobj (must have the ML variables implemented in StandardRecord)
The most recent versions of hdf5 available on UPS do not come packaged with the C++ API. The latest version that works as a viable dependency is v1.10.5 with qualifier e20, which is very old. Moreover, hdf5 will not be supported as a UPS product in the future. With the switch to qualifier e26, the UPS product is therefore essentially unworkable for the foreseeable future. The unfortunate solution is to clone and build the hdf5 repository from source.
CMake must be set up before beginning the installation:
export HDF5_HOME=/path/to/hdf5
mkdir -p $HDF5_HOME && cd $HDF5_HOME
git clone https://github.com/HDFGroup/hdf5.git $HDF5_HOME/hdf5_src
mkdir $HDF5_HOME/hdf5_build
mkdir $HDF5_HOME/hdf5_install
cd $HDF5_HOME/hdf5_src
git checkout hdf5-1_10_5
cd $HDF5_HOME/hdf5_build
cmake $HDF5_HOME/hdf5_src/ -DHDF5_BUILD_CPP_LIB=ON -DCMAKE_CXX_COMPILER="g++" -DCMAKE_CXX_STANDARD=17 -DCMAKE_INSTALL_PREFIX=$HDF5_HOME/hdf5_install/
make install
This should produce an hdf5 installation that is compatible with qualifier e26 and can be used to build sbn_ml_cafmaker.
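Because the whole point of building from source is to obtain the C++ API, it can be worth checking the install prefix before moving on. The function below is a minimal sketch, not part of sbn_ml_cafmaker; `has_hdf5_cpp` is a hypothetical name, and it assumes the usual `make install` layout (headers in `include/`, libraries in `lib/` or `lib64/`).

```shell
# Hypothetical helper (not part of sbn_ml_cafmaker): succeed only if an HDF5
# install prefix contains both the C++ header and the C++ library.
# Assumes the usual `make install` layout: include/ plus lib/ (or lib64/).
has_hdf5_cpp() {
    local prefix="$1" libdir lib
    # H5Cpp.h is the umbrella header of the HDF5 C++ API
    [ -f "$prefix/include/H5Cpp.h" ] || return 1
    for libdir in "$prefix/lib" "$prefix/lib64"; do
        # Accept shared or static builds and suffixes such as _debug;
        # if the glob matches nothing, the literal pattern fails the -e test
        for lib in "$libdir"/libhdf5_cpp*; do
            [ -e "$lib" ] && return 0
        done
    done
    return 1
}
```

For example, `has_hdf5_cpp "$HDF5_HOME/hdf5_install" && echo "C++ API found"` prints nothing if the C++ library was not built.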
The machine-learning reconstruction outputs are implemented as two additional branches within the StandardRecord tree. It is recommended to clone the latest version of sbnanaobj (so that it is compatible, or backwards compatible, with existing CAF files) and then git cherry-pick the single commit on the feature branch feature/mueller_mlcafs that implements these new variables. After building, this yields the most up-to-date version of StandardRecord for use in sbn_ml_cafmaker.
The repository can be easily built using CMake:
git clone https://github.com/justinjmueller/sbn_ml_cafmaker.git
cd sbn_ml_cafmaker
mkdir build
cd build
cmake /path/to/sbn_ml_cafmaker -DHDF5_INSTALL="/path/to/hdf5/install"
make
Before running CMake, set up a tagged sbnana version; the available versions can be listed with ups list -aK+ sbnana.
For example:
setup sbnana v10_01_04 -q e26:prof
cmake /path/to/sbn_ml_cafmaker -DHDF5_INSTALL="/path/to/hdf5/install"
make
The executable that merges the ML reconstruction outputs into an existing CAF file (with the same events) is merge_sources. It takes as input a standard (not flattened) CAF file and the HDF5 file containing the reconstruction outputs for the same set of events. It uses the run number and event number to build a look-up table for the events in the HDF5 file, then copies the matching outputs into the StandardRecord during the main loop over entries in the CAF file. Finally, the output CAF file is written. There are separate executables for data (reconstructed objects only) and simulation (which additionally contains truth objects). The executables can be used as:
./merge_sources_simulation <output_caf_file> <input_caf_file> <input_hdf5_file>
or
./merge_sources_data <output_caf_file> <input_caf_file> <input_hdf5_file>
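When many CAF/HDF5 pairs need to be merged, a small wrapper can pair them up and call the right executable. The sketch below is hypothetical: the function names `merge_one` and `merge_dir`, the `DRY_RUN` variable, and the naming convention (`<name>.caf.root` pairs with `<name>.h5` in the same directory) are assumptions, not part of the package. Only the executable names and their argument order come from the usage above.

```shell
# Hypothetical batch wrapper around the merge executables described above.
# Assumed naming convention: <indir>/<name>.caf.root pairs with <indir>/<name>.h5.
# Set DRY_RUN=1 to print the commands instead of running them.
merge_one() {
    local sample="$1" out="$2" caf="$3" h5="$4" exe
    case "$sample" in
        simulation|data) exe="./merge_sources_$sample" ;;
        *) echo "unknown sample type: $sample" >&2; return 2 ;;
    esac
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$exe $out $caf $h5"
    else
        "$exe" "$out" "$caf" "$h5"
    fi
}

merge_dir() {
    local sample="$1" indir="$2" outdir="$3" caf name h5
    for caf in "$indir"/*.caf.root; do
        [ -e "$caf" ] || continue            # glob matched nothing
        name=$(basename "$caf" .caf.root)
        h5="$indir/$name.h5"
        if [ ! -f "$h5" ]; then
            echo "skipping $name: no matching HDF5 file" >&2
            continue
        fi
        # Stop at the first failed merge and propagate its exit status
        merge_one "$sample" "$outdir/$name.merged.caf.root" "$caf" "$h5" || return
    done
}
```

Running `DRY_RUN=1 merge_dir simulation /path/to/inputs /path/to/outputs` first is a cheap way to check the pairing before launching the real merges.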
If no ML reconstruction outputs exist for an event, none are written. If the ML classes within the StandardRecord are already filled, they are erased and replaced with the new inputs. This allows the reconstruction outputs to be updated efficiently in the future.
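The (run, event) matching can be illustrated on plain-text listings. This is only a sketch of the idea, not the package's C++ implementation: `match_events` is a hypothetical function, and the two input files are assumed listings with one "run event" pair per line (producing them from real CAF and HDF5 files is not covered here). The first listing is loaded into a look-up table, as for the HDF5 file; each entry of the second, playing the CAF file, is then looked up, and an entry with no match corresponds to an event for which nothing is written.

```shell
# Illustration only (not the package's C++ code): the (run, event) lookup,
# applied to two hypothetical plain-text listings with one "run event" pair
# per line. Argument 1 plays the HDF5 file, argument 2 the CAF file.
# Prints "<CAF entry> <HDF5 entry>", or "<CAF entry> none" when no ML
# outputs exist for that event. Entries are counted from 0.
match_events() {
    awk '
        FILENAME == ARGV[1] { table[$1 " " $2] = FNR - 1; next }  # build the look-up table
        {
            key = $1 " " $2
            if (key in table) print FNR - 1, table[key]            # event found: copy outputs
            else              print FNR - 1, "none"                # nothing to write
        }
    ' "$1" "$2"
}
```

Keying on the (run, event) pair rather than on entry position is what makes the merge insensitive to the two files storing events in different orders.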
The executable that creates standalone CAF files containing only ML reconstruction outputs is make_standalone. It takes as input a list of HDF5 files and writes their contents to a single output CAF file. There are separate executables for data (reconstructed objects only) and simulation (which additionally contains truth objects). The executables can be used as:
./make_standalone_simulation <output_caf_file> <event_offset> <input_hdf5_file(s)>
or
./make_standalone_data <output_caf_file> <event_offset> <input_hdf5_file(s)>
The event_offset introduces an offset to the image_id attribute of interactions and particles. This may be useful for breaking the degeneracy of image_ids across multiple input files. The list of HDF5 input files may contain one or more files: the code loops over all remaining arguments and produces a single output file.
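One way to use event_offset is to process each HDF5 file in its own call and give each call a distinct offset. The sketch below assumes this per-file workflow; `standalone_per_file`, `run_or_echo`, `DRY_RUN`, and the stride value are all hypothetical. The stride is assumed to be larger than the largest image_id in any single file, so that the offset image_ids of different files cannot overlap.

```shell
# Hypothetical helper: one make_standalone call per HDF5 file, where the k-th
# file (k = 0, 1, 2, ...) gets event_offset = k * stride. Assumes stride is
# larger than the largest image_id in any single file.
# Set DRY_RUN=1 to print the commands instead of running them.
run_or_echo() {
    if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

standalone_per_file() {
    local sample="$1" stride="$2" outdir="$3" k=0 h5 name
    shift 3
    case "$sample" in
        simulation|data) ;;
        *) echo "unknown sample type: $sample" >&2; return 2 ;;
    esac
    for h5 in "$@"; do
        name=$(basename "$h5" .h5)
        run_or_echo "./make_standalone_$sample" "$outdir/$name.caf.root" \
            "$((k * stride))" "$h5" || return
        k=$((k + 1))
    done
}
```

For example, `DRY_RUN=1 standalone_per_file simulation 100000 /path/to/outputs a.h5 b.h5` prints one command per file with offsets 0 and 100000. Passing all files to a single call, as in the usage above, remains the simpler option when one output file is wanted.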
Reconstructed interaction variables:

| Variable Name | Variable Type | Units | Description |
|---|---|---|---|
| cathode_offset | double | None | Distance from the cathode. |
| flash_hypo_pe | double | None | Total PE of the hypothesized flash. |
| flash_id | int64_t | None | Flash ID for the matched flash. |
| flash_time | double | None | Time of the matched flash. |
| flash_total_pe | double | None | Total PE of the matched flash. |
| id | int64_t | None | Interaction ID. |
| index | std::vector<int64_t> | None | List of coordinate indices that comprise the interaction. |
| is_cathode_crosser | bool | None | Whether the interaction is a cathode-crosser. |
| is_contained | bool | None | Whether the interaction is contained. |
| is_fiducial | bool | None | Whether the interaction vertex is in the fiducial volume. |
| is_flash_matched | bool | None | Whether the flash is matched to the interaction. |
| is_matched | bool | None | Whether the interaction is matched to a true interaction. |
| is_truth | bool | None | Whether the interaction is a truth interaction. |
| match_ids | std::vector<int64_t> | None | Interaction IDs of the considered matches (correspond to true interactions). |
| match_overlaps | std::vector | None | Intersection over union (IoU) of the considered matches. |
| module_ids | std::vector | None | Module IDs of the interaction. |
| particle_ids | std::vector<int64_t> | None | Particle IDs in the interaction. |
| topology | char * | None | Topology of the interaction (e.g. "0g0e1mu0pi2p") considering only primaries. |
| units | char * | None | Units in which the position coordinates are expressed. |
| vertex | std::array<float, 3> | None | Vertex of the interaction in detector coordinates. |
Reconstructed particle variables:

| Variable Name | Variable Type | Units | Description |
|---|---|---|---|
| calo_ke | double | None | Calorimetric kinetic energy. |
| cathode_offset | double | None | Distance from the cathode. |
| csda_ke | double | None | Continuous-slowing-down-approximation kinetic energy. |
| end_dir | std::array<float, 3> | None | Unit direction vector calculated at the particle end point. |
| end_point | std::array<float, 3> | None | End point (vector) of the particle. |
| fragment_ids | std::vector<int32_t> | None | Fragment IDs comprising the particle. |
| id | int64_t | None | Particle ID. |
| index | std::vector<int64_t> | None | List of coordinate indices that comprise the particle. |
| interaction_id | int64_t | None | Parent interaction ID. |
| is_cathode_crosser | bool | None | Whether the particle is a cathode-crosser. |
| is_contained | bool | None | Whether the particle is contained. |
| is_matched | bool | None | Whether the particle is matched. |
| is_primary | bool | None | Whether the particle is a primary particle. |
| is_truth | bool | None | Whether the particle is a truth particle. |
| is_valid | bool | None | Whether the particle passes thresholds and counts towards topology. |
| ke | double | None | Kinetic energy according to assumed best energy estimator (CSDA, calorimetric, or MCS). |
| length | double | None | Length of the particle. |
| match_ids | std::vector<int64_t> | None | Particle IDs of the considered matches. |
| match_overlaps | std::vector | None | Intersection over union (IoU) of the considered matches. |
| mcs_ke | double | None | Multiple Coulomb scattering kinetic energy. |
| module_ids | std::vector | None | Module IDs of the particle. |
| momentum | std::array<float, 3> | None | Momentum (vector) of the particle. |
| p | float | None | Momentum magnitude. |
| pdg_code | int64_t | None | PDG code of the particle. |
| pid | Pid | None | Particle ID (see Pid enumeration). |
| pid_scores | std::array<float, 5> | None | PID softmax scores. |
| ppn_ids | std::vector<int32_t> | None | PPN IDs of the particle. |
| primary_scores | std::array<float, 2> | None | Primary softmax scores. |
| shape | Shape | None | Semantic type of the particle (see SemanticType enumeration). |
| start_dir | std::array<float, 3> | None | Unit direction vector calculated at the particle start point. |
| start_point | std::array<float, 3> | None | Start point (vector) of the particle. |
| units | char * | None | Units in which the position coordinates are expressed. |
True interaction variables:

| Variable Name | Variable Type | Units | Description |
|---|---|---|---|
| bjorken_x | double | None | Bjorken x of the neutrino interaction. |
| cathode_offset | double | None | Distance from the cathode. |
| creation_process | char * | None | Creation process of the neutrino. |
| current_type | CurrentType | None | Current type of the neutrino. |
| energy_init | double | None | Initial energy of the neutrino. |
| energy_transfer | double | None | Energy transfer (Q0) of the neutrino interaction. |
| flash_hypo_pe | double | None | Total PE of the hypothesized flash. |
| flash_id | int64_t | None | Flash ID for the matched flash. |
| flash_time | double | None | Time of the matched flash. |
| flash_total_pe | double | None | Total PE of the matched flash. |
| hadronic_invariant_mass | double | None | Hadronic invariant mass of the neutrino interaction. |
| id | int64_t | None | Interaction ID. |
| index | std::vector<int64_t> | None | List of coordinate indices that comprise the interaction. |
| index_adapt | std::vector<int64_t> | None | Index corresponding to the true interaction in the adapted cluster label points. |
| index_g4 | std::vector<int64_t> | None | Index corresponding to the true interaction in the G4 points (effectively SED). |
| inelasticity | double | None | Inelasticity of the neutrino interaction. |
| interaction_id | int64_t | None | Deprecated. |
| interaction_mode | InteractionMode | None | Interaction mode of the neutrino. |
| interaction_type | InteractionType | None | Interaction type of the neutrino. |
| is_cathode_crosser | bool | None | Whether the interaction is a cathode-crosser. |
| is_contained | bool | None | Whether the interaction is contained. |
| is_fiducial | bool | None | Whether the interaction vertex is in the fiducial volume. |
| is_flash_matched | bool | None | Whether the flash is matched to the interaction. |
| is_matched | bool | None | Whether the interaction is matched to a true interaction. |
| is_truth | bool | None | Whether the interaction is a truth interaction. |
| lepton_p | double | None | Momentum of the lepton in the interaction. |
| lepton_pdg_code | int64_t | None | PDG code of the lepton in the interaction. |
| lepton_track_id | int64_t | None | Track ID of the lepton in the neutrino interaction. |
| match_ids | std::vector<int64_t> | None | Interaction IDs of the considered matches (correspond to true interactions). |
| match_overlaps | std::vector | None | Intersection over union (IoU) of the considered matches. |
| mct_index | int64_t | None | Index of the neutrino in the original MCTruth array. |
| module_ids | std::vector<int32_t> | None | Module IDs of the interaction. |
| momentum | std::array<float, 3> | None | Momentum (vector) of the neutrino. |
| momentum_transfer | double | None | Momentum transfer (Q^2) of the neutrino interaction. |
| momentum_transfer_mag | double | None | Momentum transfer (Q3) of the neutrino interaction. |
| nu_id | int64_t | None | Neutrino ID (-1 = not a neutrino, 0 = first neutrino, 1 = second neutrino, etc.). |
| nucleon | int64_t | None | Nucleon in the neutrino interaction. |
| orig_id | int64_t | None | Original ID of the interaction. |
| particle_ids | std::vector<int64_t> | None | Particle IDs in the interaction. |
| pdg_code | int64_t | None | PDG code of the neutrino. |
| position | std::array<float, 3> | None | Position of the neutrino interaction. |
| quark | int64_t | None | Quark in the neutrino interaction. |
| reco_vertex | std::array<float, 3> | None | Vertex of the interaction in detector coordinates (reco). |
| target | int64_t | None | Target in the neutrino interaction. |
| theta | double | None | Angle of the neutrino interaction. |
| topology | char * | None | Topology of the interaction (e.g. "0g0e1mu0pi2p") considering only primaries. |
| track_id | int64_t | None | Track ID of the neutrino interaction. |
| units | char * | None | Units in which the position coordinates are expressed. |
| vertex | std::array<float, 3> | None | Vertex of the interaction in detector coordinates (truth). |
True particle variables:

| Variable Name | Variable Type | Units | Description |
|---|---|---|---|
| ancestor_creation_process | char * | None | Geant4 creation process of the ancestor particle. |
| ancestor_pdg_code | int64_t | None | PDG code of the ancestor particle. |
| ancestor_position | std::array<float, 3> | None | Position of the ancestor particle. |
| ancestor_t | double | None | Time of the ancestor particle. |
| ancestor_track_id | int64_t | None | Track ID of the ancestor particle. |
| calo_ke | double | None | Calorimetric kinetic energy. |
| cathode_offset | double | None | Distance from the cathode. |
| children_counts | std::vector<int64_t> | None | Number of children of the particle. |
| children_id | std::vector<int64_t> | None | List of particle IDs of the children particles. |
| creation_process | char * | None | Geant4 creation process of the particle. |
| csda_ke | double | None | Continuous-slowing-down-approximation kinetic energy. |
| end_dir | std::array<float, 3> | None | Unit direction vector calculated at the particle end point. |
| end_momentum | std::array<float, 3> | None | Momentum (vector) of the particle at the end. |
| end_p | float | None | Momentum magnitude of the particle at the end. |
| end_point | std::array<float, 3> | None | End point (vector) of the particle. |
| end_position | std::array<float, 3> | None | End position (vector) of the particle. |
| energy_deposit | double | None | Energy deposited by the particle. |
| energy_init | double | None | Initial energy of the particle. |
| first_step | std::array<float, 3> | None | Coordinates of the first step of the particle. |
| fragment_ids | std::vector<int64_t> | None | Fragment IDs comprising the particle. |
| gen_id | int64_t | None | Generator ID of the particle (may differ from Geant4 or Supera ID). |
| group_id | int64_t | None | Group ID of the particle. |
| group_primary | int64_t | None | Whether the particle is a primary within its group. |
| id | int64_t | None | Particle ID. |
| index | std::vector<int64_t> | None | List of coordinate indices that comprise the particle. |
| index_adapt | std::vector<int64_t> | None | Index corresponding to the true particle in the adapted cluster label points. |
| index_g4 | std::vector<int64_t> | None | Index corresponding to the true particle in the G4 points (effectively SED). |
| interaction_id | int64_t | None | Parent interaction ID. |
| interaction_primary | int64_t | None | Whether the particle is a primary within its interaction (equivalent to is_primary). |
| is_cathode_crosser | bool | None | Whether the particle is a cathode-crosser. |
| is_contained | bool | None | Whether the particle is contained. |
| is_matched | bool | None | Whether the particle is matched. |
| is_primary | bool | None | Whether the particle is a primary particle. |
| is_truth | bool | None | Whether the particle is a truth particle. |
| is_valid | bool | None | Whether the particle passes thresholds and counts towards topology. |
| ke | double | None | Kinetic energy according to assumed best energy estimator (CSDA, calorimetric, or MCS). |
| last_step | std::array<float, 3> | None | Coordinates of the last step of the particle. |
| length | double | None | Length of the particle. |
| match_ids | std::vector<int64_t> | None | Particle IDs of the considered matches. |
| match_overlaps | std::vector | None | Intersection over union (IoU) of the considered matches. |
| mcs_ke | double | None | Multiple Coulomb scattering kinetic energy. |
| mcst_index | int64_t | None | MCST index. |
| mct_index | int64_t | None | Index of the particle in the original MCTruth array. |
| module_ids | std::vector | None | Module IDs of the particle. |
| momentum | std::array<float, 3> | None | Momentum (vector) of the particle. |
| nu_id | int64_t | None | Neutrino ID (-1 = not a neutrino, 0 = first neutrino, 1 = second neutrino, etc.). |
| num_voxels | int64_t | None | Number of voxels comprising the particle. |
| orig_id | int64_t | None | Original ID of the particle. |
| orig_interaction_id | int64_t | None | Interaction ID as it was stored in the parent LArCV file under the interaction_id attribute. |
| p | float | None | Momentum magnitude. |
| parent_creation_process | char * | None | Geant4 creation process of the parent particle. |
| parent_id | int64_t | None | Parent particle ID. |
| parent_pdg_code | int64_t | None | PDG code of the parent particle. |
| parent_position | std::array<float, 3> | None | Position of the parent particle. |
| parent_t | double | None | Time of the parent particle. |
| parent_track_id | int64_t | None | Track ID of the parent particle. |
| pdg_code | int64_t | None | PDG code of the particle. |
| pid | Pid | None | Particle ID (see Pid enumeration). |
| position | std::array<float, 3> | None | Position of the particle. |
| shape | Shape | None | Semantic type of the particle (see SemanticType enumeration). |
| start_dir | std::array<float, 3> | None | Unit direction vector calculated at the particle start point. |
| start_point | std::array<float, 3> | None | Start point (vector) of the particle. |
| t | double | None | Time of the particle. |
| track_id | int64_t | None | Track ID of the particle. |
| truth_end_dir | std::array<double, 3> | None | Unit direction vector calculated at the truth particle end point. |
| truth_start_dir | std::array<double, 3> | None | Unit direction vector calculated at the truth particle start point. |
| units | char * | None | Units in which the position coordinates are expressed. |