This project implements a novel density-based clustering algorithm that automatically detects clusters of varying shapes, densities, and proximities without requiring predefined parameters such as the number of clusters.
The algorithm analyzes local density characteristics and uses the concept of "gauging proximity" between points and clusters. This lets it identify clusters in complex datasets, such as those with nonconvex shapes or uneven densities, where traditional algorithms may struggle.
Jinli Yao and Yong Zeng, "Gauging-β: Border-aware density clustering with hierarchical single-linkage." Pattern Recognition, 2025. (accepted)
- Automatic detection of clusters without specifying the number of clusters beforehand
- Handling of varying cluster shapes and densities
- Border point identification and assignment
- Adaptive thresholding for determining cluster proximity
- Support for different distance metrics (euclidean and precomputed)
- Support for different linkage methods (single and average)
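With the 'precomputed' metric, the input is presumably an n × n distance matrix (the convention scikit-learn uses; this is an assumption about this project). The minimal sketch below builds such a matrix with NumPy and shows how single and average linkage differ when measuring the distance between two clusters. The helpers `euclidean_distance_matrix` and `cluster_distance` are hypothetical and not part of this project's API.

```python
import numpy as np


def euclidean_distance_matrix(X):
    """Return the n x n matrix of pairwise Euclidean distances between rows of X."""
    X = np.asarray(X, dtype=float)
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def cluster_distance(D, a, b, linkage="single"):
    """Distance between clusters a and b (lists of point indices) under a linkage rule."""
    block = D[np.ix_(a, b)]  # all cross-cluster pairwise distances
    if linkage == "single":
        return float(block.min())   # distance of the closest pair of points
    if linkage == "average":
        return float(block.mean())  # mean over all cross-cluster pairs
    raise ValueError(f"unknown linkage: {linkage!r}")


X = np.array([[0.0], [1.0], [3.0]])
D = euclidean_distance_matrix(X)  # hypothetically usable as a 'precomputed' input
print(cluster_distance(D, [0, 1], [2], "single"))   # 2.0
print(cluster_distance(D, [0, 1], [2], "average"))  # 2.5
```

Single linkage reacts only to the closest pair, so it follows elongated or curved shapes; average linkage looks at all pairs and is less prone to chaining through thin bridges of points.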
The main clustering algorithm:
- Initializes each point as its own cluster
- Merges clusters based on distance and proximity measures
- Uses adaptive thresholds to determine when to merge clusters
- Handles the assignment of border points
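The merge loop described above can be illustrated with a minimal single-linkage sketch. It assumes Euclidean distances and uses a stand-in adaptive threshold (mean plus `z` standard deviations of the merge distances); that rule and the names `single_linkage_adaptive` and `z` are hypothetical and are not the Gauging-β criterion. Border-point handling is omitted.

```python
import numpy as np


def _find(parent, i):
    # Union-find lookup with path halving
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def single_linkage_adaptive(X, z=2.0):
    """Single-linkage merging that stops at a data-derived distance threshold.

    Stand-in threshold: mean + z * std of the merge distances (not the paper's rule).
    """
    X = np.asarray(X, dtype=float).reshape(len(X), -1)
    n = len(X)
    if n < 2:
        return np.zeros(n, dtype=int), 0.0
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    rows, cols = np.triu_indices(n, k=1)
    order = np.argsort(D[rows, cols], kind="stable")

    # Each point starts as its own cluster; the closest pair of clusters is
    # merged first (Kruskal). The accepted merges form a minimum spanning tree.
    parent = list(range(n))
    merges = []
    for k in order:
        a, b = int(rows[k]), int(cols[k])
        ra, rb = _find(parent, a), _find(parent, b)
        if ra != rb:
            parent[ra] = rb
            merges.append((a, b, D[a, b]))

    # Adaptive threshold computed from the merge distances themselves
    dists = np.array([d for _, _, d in merges])
    threshold = dists.mean() + z * dists.std()

    # Replay only the merges at or below the threshold
    parent = list(range(n))
    for a, b, d in merges:
        if d <= threshold:
            parent[_find(parent, a)] = _find(parent, b)

    # Relabel cluster roots as 0, 1, 2, ... in order of first appearance
    roots = [_find(parent, i) for i in range(n)]
    ids = {}
    labels = np.array([ids.setdefault(r, len(ids)) for r in roots])
    return labels, threshold


labels, t = single_linkage_adaptive(np.array([0, 1, 2, 3, 4, 100, 101, 102, 103, 104.0]))
print(labels)  # [0 0 0 0 0 1 1 1 1 1]
```

Replaying only the spanning-tree merges below a cutoff gives exactly the single-linkage clusters at that level, so the threshold decides when merging stops without fixing the number of clusters in advance.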
The border-detection step identifies lower-density points that may lie on the borders between clusters:
- Calculates point density using a radius-based approach
- Uses quartile-based thresholding to identify lower-density points
- Provides visualization for density-based point classification
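A radius-based density with a quartile cutoff can be sketched as follows. The sketch assumes that density is the number of other points within a user-supplied `radius` and that "lower density" means strictly below the first quartile (Q1). Both choices, and the name `low_density_points`, are illustrative assumptions; the project's exact rule may differ.

```python
import numpy as np


def low_density_points(X, radius):
    """Flag points whose neighbour count within `radius` falls below the first quartile."""
    X = np.asarray(X, dtype=float).reshape(len(X), -1)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # Density = number of other points within the radius (self excluded)
    density = (D <= radius).sum(axis=1) - 1
    q1 = np.percentile(density, 25)
    return density < q1, density


mask, density = low_density_points(np.arange(10.0), radius=1.5)
print(density)               # [1 2 2 2 2 2 2 2 2 1]
print(np.flatnonzero(mask))  # [0 9]
```

On evenly spaced points along a line, only the two endpoints have a single neighbour, so they are the ones flagged as candidate border points.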
The experiment runner controls the execution of the clustering algorithm on various datasets:
- Loads test data
- Applies the clustering algorithm
- Saves results and generates visualizations
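A batch run over several datasets might look like the minimal sketch below. It assumes whitespace-delimited numeric files with one point per row and a `cluster_fn` that returns one integer label per point. `run_on_dataset`, `run_all`, `cluster_fn`, and the `results` directory are hypothetical names, and plotting is left out.

```python
import os

import numpy as np


def run_on_dataset(data_path, cluster_fn, out_dir):
    """Load one dataset, cluster it, and save one integer label per line."""
    # Assumes a whitespace-delimited numeric file with one point per row
    X = np.loadtxt(data_path, ndmin=2)
    labels = np.asarray(cluster_fn(X), dtype=int)
    os.makedirs(out_dir, exist_ok=True)
    name = os.path.splitext(os.path.basename(data_path))[0]
    out_path = os.path.join(out_dir, f"{name}_labels.txt")
    np.savetxt(out_path, labels, fmt="%d")
    return out_path


def run_all(data_paths, cluster_fn, out_dir="results"):
    """Apply the same clustering function to every dataset and return the output paths."""
    return [run_on_dataset(p, cluster_fn, out_dir) for p in data_paths]
```

Passing the clustering step as a function keeps the loop independent of any particular model, so the same runner can compare different settings on identical data.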
```python
import numpy as np
from gauging_proximity import Perception

# Load your data (assumed here to be a whitespace-delimited numeric file, one point per row)
X = np.loadtxt('your_dataset.txt', ndmin=2)

# Initialize the model
model = Perception()

# Fit the model and get cluster assignments
labels, points = model.fit(X)

# Save or visualize results
```

Parameters:
- metric: Distance metric ('euclidean' or 'precomputed')
- linkage: Linkage method for clustering ('single' or 'average')
- log_path: Directory for saving plots and logs
The algorithm has been tested on various standard clustering datasets, including:
- Simple 2D/3D geometric shapes
- Spiral datasets
- Blobs with varying densities
- Complex shapes like rings and diamonds
- Real-world datasets
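To try the algorithm on data with similar characteristics, synthetic sets can be generated with NumPy, as in the minimal sketch below. These generators, their centres, spreads, and sizes are illustrative assumptions; they are not the benchmark datasets used in the paper.

```python
import numpy as np


def varying_density_blobs(rng, n_per_blob=100):
    """Three Gaussian blobs with different spreads (dense, medium, sparse)."""
    centers = np.array([[0.0, 0.0], [5.0, 0.0], [0.0, 6.0]])
    spreads = [0.3, 0.8, 1.5]
    X = np.vstack([c + s * rng.standard_normal((n_per_blob, 2))
                   for c, s in zip(centers, spreads)])
    y = np.repeat(np.arange(3), n_per_blob)  # ground-truth blob index
    return X, y


def ring_with_core(rng, n_ring=200, n_core=100, radius=5.0, noise=0.2):
    """A noisy ring surrounding a dense central blob."""
    theta = rng.uniform(0.0, 2.0 * np.pi, n_ring)
    r = radius + noise * rng.standard_normal(n_ring)
    ring = np.column_stack([r * np.cos(theta), r * np.sin(theta)])
    core = 0.5 * rng.standard_normal((n_core, 2))
    X = np.vstack([ring, core])
    y = np.concatenate([np.zeros(n_ring, dtype=int), np.ones(n_core, dtype=int)])
    return X, y


rng = np.random.default_rng(0)
X_blobs, y_blobs = varying_density_blobs(rng)
X_ring, y_ring = ring_with_core(rng)
```

The ring-with-core case is a classic failure mode for centroid-based methods, because the ring's mean lies inside the core cluster.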
The algorithm provides visualization capabilities to:
- Show cluster assignments
- Track the evolution of clusters
- Visualize border points and their assignments