Add DBSCAN clustering step to Sophronia city#951
Conversation
gonzaponte
left a comment
Looks quite good. Here is the first round of comments.
7995532 to
2eadf57
Compare
gonzaponte
left a comment
Besides these two cosmetic changes, I think this is good to go. I've asked @jwaiton to take a quick look to make sure I didn't miss anything important, but this is essentially approved.
Very nice work! I only have one suggestion. I'd suggest removing the The scaling applied with This removes a parameter that is poorly explained even in DBSCAN's own documentation (in my opinion), when it just means 'distance between two nodes'. If our distance will always be 1 or slightly less, there is no need for it 😸
But do we want that? As I remember, the optimal parameters include eps = 3. Would it be better to keep the eps value flexible?
Yes, I understand your point and, if that is the case, I would fix eps = 1.74. But again, I'm still concerned about the clustering process killing physical hits. For example, if a neighbouring hit is not activated (for some threshold-related reason) but the one next to it is, the latter can be labelled as noise. I don't know how likely this case is. Has the performance of this algorithm been studied using NEXT-100 data? I thought it had, and that the result was the set of parameters eps = 3, min_samples = 5. Maybe I misunderstood. If we are sure that one unit of distance is enough, we can fix the eps value. In any case, I'd like to discuss it in a meeting next week.
This is what my last comment is addressing. If this was the case, setting I think it is unlikely in NEXT-100 that the number of physical hits is on the order of 10 while any hit is two pitch lengths apart, but I can check some x-rays/bremsstrahlung for this information (I believe @Ian0sborne would have this information at hand 😸 ).
I studied/implemented a similar algorithm here, which I didn't provide defaults for, but used For the scale in NEXT-100 (pitch = 15.55, maximum z bin = 4), this is a scaling of 1.06 in XY and 1.08 in Z, which would then be Let's talk next week 👍
To retain only direct neighbour hits (including corner) as a cluster
As discussed, the eps value is fixed to 1.8. This ensures that any direct 3D neighbour (including corners) is considered part of the cluster. Any change to this condition must be made by applying different scaling factors.
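The reasoning behind a fixed eps of 1.8 can be checked with a few lines of arithmetic. The sketch below is only an illustration, not the code in this PR: it assumes hits sit on a regular grid and that coordinates are divided by the scaling factors so that one grid step equals one unit. The helper `scaled_distance` and the example scale values (the pitch and z bin quoted earlier in the thread) are hypothetical stand-ins.

```python
import itertools
import math

EPS = 1.8  # fixed eps value quoted in the discussion


def norm(v):
    """Euclidean length of a 3-vector."""
    return math.sqrt(sum(c * c for c in v))


# Offsets (in scaled units) to the 26 direct neighbours of a hit,
# i.e. faces, edges and corners of the surrounding 3x3x3 cube.
neighbours = [d for d in itertools.product((-1, 0, 1), repeat=3)
              if d != (0, 0, 0)]

# The farthest direct neighbour is a corner, at sqrt(3) ~ 1.732.
farthest_neighbour = max(norm(d) for d in neighbours)

# The closest position that is NOT a direct neighbour is two steps
# along a single axis, at distance 2.
second_shell = [d for d in itertools.product(range(-2, 3), repeat=3)
                if max(abs(c) for c in d) == 2]
closest_non_neighbour = min(norm(d) for d in second_shell)


def scaled_distance(hit_a, hit_b, scale_xy, scale_z):
    """Distance between two (x, y, z) hits after dividing by the scales.

    Hypothetical helper: the real scaling lives in the PR's code.
    """
    dx = (hit_a[0] - hit_b[0]) / scale_xy
    dy = (hit_a[1] - hit_b[1]) / scale_xy
    dz = (hit_a[2] - hit_b[2]) / scale_z
    return norm((dx, dy, dz))


# Illustrative scales only (assumed: one pitch in XY, one z bin in Z).
SCALE_XY, SCALE_Z = 15.55, 4.0

origin = (0.0, 0.0, 0.0)
corner = (SCALE_XY, SCALE_XY, SCALE_Z)   # diagonal neighbour
skipped = (2 * SCALE_XY, 0.0, 0.0)       # one missing hit in between

corner_is_neighbour = scaled_distance(origin, corner, SCALE_XY, SCALE_Z) <= EPS
skipped_is_neighbour = scaled_distance(origin, skipped, SCALE_XY, SCALE_Z) <= EPS

print(farthest_neighbour, closest_non_neighbour)
print(corner_is_neighbour, skipped_is_neighbour)
```

Under these assumptions, any eps between sqrt(3) and 2 selects exactly the 26 direct neighbours. This is also why a hit separated from the rest by one inactive position is not linked to them: its scaled distance is 2, which is above eps.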

Summary
This PR integrates a hit clustering step (using DBSCAN) into the Sophronia city workflow.

Key Changes
1. New Logic (reco/hits_functions.py)
- cluster_tagger: A function that applies DBSCAN on an event-by-event basis.
- scale_xy and scale_z parameters.
2. Integration (cities/sophronia.py)
- Adds clustering_params to the city configuration.
- If clustering_params is None (default), the city skips clustering and produces the exact same output structure as before (no cluster column).
3. Testing (cities/sophronia_test.py & reco/hits_functions_test.py)
- Checks that the cluster column appears only when enabled.

Configuration Example
To use this feature, add the following to the configuration file (these values are optimized for the NEXT-100 detector geometry):