A collection of papers, code, and datasets for RGB-T related tasks based on deep learning.
The main directions covered are RGB-T Fusion, RGB-T Salient Object Detection (SOD), RGB-T Vehicle Detection (VD), RGB-T Crowd Counting (CC), RGB-T Pedestrian Detection (PD), RGB-T Semantic Segmentation (SS), and RGB-T Tracking.
More than 120 papers have been included. 🎉🎉🎉
Feel free to star and fork~ 🌟🌟🌟
We will continue to update this repository 🏃🏃🏃
- RGB-T Fusion
- RGB-T Salient Object Detection (SOD)
- RGB-T Vehicle Detection
- RGB-T Crowd Counting
- RGB-T Pedestrian Detection
- RGB-T Semantic Segmentation
- RGB-T Tracking
- RGB-T ReID
- RGB-T Alignment/Registration
- 2025/12/5: RGB-T Fusion +4, RGB-T SOD +2, RGB-T SS +5, RGB-T Tracking +2
- 2025/11/3: RGB-T Fusion +1
- 2025/10/28: RGB-T SS +1
- Since 2025: RGB-T Fusion +12, RGB-T SOD +18, RGB-T VD +0, RGB-T CC +4, RGB-T PD +0, RGB-T SS +11, RGB-T Tracking +7, RGB-T ReID +6, RGB-T Alignment +0
🚀🚀🚀Update (in 2025-12-5)
| No. | Year | Model | Pub. | Title | Links |
|---|---|---|---|---|---|
| 16 | 2025 | ME-PMA | Inf. Fusion | Joint multi-view embedding with progressive multi-scale alignment for unaligned infrared-visible image fusion | Paper/Project |
| 15 | 2025 | MGDIF | Infrared Physics & Technology | An image fusion network using salient object mask-guided diffusion model | Paper |
| 14 | 2025 | BDDS | Inf. Fusion | Learning Bi-directional fusion and deformation-sensitive loss for RGB-T tiny object detection | Paper |
| 13 | 2025 | SFIFusion | Signal Proc | SFIFusion: Semantic-frequency integration for task-driven infrared and visible image fusion | Paper |
| 12 | 2025 | FreeFusion | TPAMI | FreeFusion: Infrared and Visible Image Fusion via Cross Reconstruction Learning | Paper/Project |
| 11 | 2025 | UMCFuse | TIP | UMCFuse: A Unified Multiple Complex Scenes Infrared and Visible Image Fusion Framework | Paper/Project |
| 10 | 2025 | CDTFusion | TPAMI | CDTFusion: Crossing Domain and Task for Infrared and Visible Image Fusion | Paper/Project |
| 9 | 2025 | FusionINV | TIP | FusionINV: A Diffusion-Based Approach for Multimodal Image Fusion | Paper/Project |
| 8 | 2025 | SDC-DDF | Neurocomputing | Dual-decoder conditional diffusion model based on spatial-domain difference compensation pre-fusion for infrared and visible image fusion | Paper |
| 7 | 2025 | VCIF | KBS | VCIF: Visually-compelling infrared and visible image fusion under darkness | Paper/Project |
| 6 | 2025 | LoME | TCSVT | LoME: LoRA-Driven Multimodal Extractor for RGB-X Vision Tasks | Paper/Project |
| 5 | 2025 | AMSFusion | TCSVT | AMSFusion: An Adaptive Multi-Scale Infrared and Visible Image Fusion Network Based on Attention Mechanisms | Paper |
| 4 | 2025 | IVFSCA | Infrared Physics & Technology | Infrared and visible image fusion based on spatial correlation attention | Paper |
| 3 | 2025 | HaIVFusion | JAS | HaIVFusion: Haze-free Infrared and Visible Image Fusion | Paper |
| 2 | 2025 | WSFM | TGRS | Weakly Supervised Cross Mixer for Infrared and Visible Image Fusion | Paper/Project |
| 1 | 2025 | T²EA | TCSVT | T²EA: Target-Aware Taylor Expansion Approximation Network for Infrared and Visible Image Fusion | Paper/Project |
| 2 | 2024 | HitFusion | TMM | HitFusion: Infrared and Visible Image Fusion for High-Level Vision Tasks Using Transformer | Paper |
| 1 | 2024 | CLIP | arXiv | From Text to Pixels: A Context-Aware Semantic Synergy Solution for Infrared and Visible Image Fusion | Paper/Explainer (Zhihu) |
| 2 | 2023 | RFVIF | Inf. Fusion | Feature dynamic alignment and refinement for infrared–visible image fusion: Translation robust fusion | Paper/Project |
| 1 | 2023 | MURF | TPAMI | ✨ MURF: Mutually Reinforcing Multi-Modal Image Registration and Fusion | Paper/Project |
| 1 | 2022 | RFNet | CVPR | ✨ RFNet: Unsupervised Network for Mutually Reinforcing Multi-modal Image Registration and Fusion | Paper/Project |
| 1 | 2020 | U2Fusion | TPAMI | ✨ U2Fusion: A Unified Unsupervised Image Fusion Network | Paper/Project |
| 1 | 2019 | Survey | Inf. Fusion | Infrared and visible image fusion methods and applications: A survey | Paper |
| 1 | 2016 | GTF | Inf. Fusion | Infrared and visible image fusion via gradient transfer and total variation minimization | Paper/Project |
✨: Work from Professor Ma's group at WHU
🚀🚀🚀Update (in 2025-12-5)
| No. | Year | Model | Pub. | Title | Links |
|---|---|---|---|---|---|
| 9 | 2025 | DualGazeNet | arXiv | DualGazeNet: A Biologically Inspired Dual-Gaze Query Network for Salient Object Detection | Paper |
| 8 | 2025 | HSMNet | EAAI | Hierarchical semantics guided multi-scale correlation network for alignment-free red-green-blue and thermal salient object detection | Paper/Project |
| 7 | 2025 | HyPSAM | TCSVT | HyPSAM: Hybrid Prompt-driven Segment Anything Model for RGB-Thermal Salient Object Detection | Paper/Project |
| 6 | 2025 | Samba | CVPR | Samba: A Unified Mamba-based Framework for General Salient Object Detection | Paper/Project |
| 5 | 2025 | DFINet | TIM | Cognition-Inspired Dynamic Feature Integration Network for RGB-D and RGB-T Salient Object Detection | Paper |
| 4 | 2025 | AlignSal | TGRS | Efficient Fourier Filtering Network With Contrastive Learning for AAV-Based Unaligned Bimodal Salient Object Detection | Paper/Project |
| 3 | 2025 | TwinsTNet | TIP | TwinsTNet: Broad-View Twins Transformer Network for Bi-Modal Salient Object Detection | Paper/Project |
| 2 | 2025 | KAN-SAM | arXiv | KAN-SAM: Kolmogorov-Arnold Network Guided Segment Anything Model for RGB-T Salient Object Detection | Paper |
| 1 | 2025 | TCINet | TIM | Three-decoder Cross-modal Interaction Network for Unregistered RGB-T Salient Object Detection | Paper/Project |
| 5 | 2024 | ConTriNet | TPAMI | Divide-and-Conquer: Modality-aware Triple-Decoder Network for Robust RGB-T Salient Object Detection | Paper/Project |
| 4 | 2024 | UTDNet | NN | UTDNet: A unified triplet decoder network for multimodal salient object detection | Paper |
| 3 | 2024 | MSEDNet | NN | MSEDNet: Multi-scale fusion and edge-supervised network for RGB-T salient object detection | Paper/Project |
| 2 | 2025 | PCNet | AAAI | Alignment-Free RGB-T Salient Object Detection: A Large-scale Dataset and Progressive Correlation Network | Paper/Project |
| 20 | 2023 | Survey | Journal of Data Acquisition and Processing | Deep Learning Based Salient Object Detection: A Survey (in Chinese) | Paper |
| 19 | 2023 | LSNet | TIP | LSNet: Lightweight Spatial Boosting Network for Detecting Salient Objects in RGB-Thermal Images | Paper/Project |
| 18 | 2023 | RGBTScribble | ICME | Scribble-Supervised RGB-T Salient Object Detection | Paper/Project |
| 17 | 2023 | EAEFNet | RAL | Explicit Attention-Enhanced Fusion for RGB-Thermal Perception Tasks | Paper/Project |
| 16 | 2023 | TAGFNet | EAAI | Thermal images-aware guided early fusion network for cross-illumination RGB-T salient object detection | Paper/Project |
| 15 | 2023 | FANet | NC | Feature aggregation with transformer for RGB-T salient object detection | Paper/Project |
| 14 | 2023 | MENet | NC | MENet: Lightweight multimodality enhancement network for detecting salient objects in RGB-thermal images | Paper |
| 13 | 2023 | TIDNet | KBS | Three-stream interaction decoder network for RGB-thermal salient object detection | Paper/Project |
| 12 | 2023 | PRLNet | TIP | Position-Aware Relation Learning for RGB-Thermal Salient Object Detection | Paper/Project |
| 11 | 2023 | MGAINet | TCSVT | Multiple Graph Affinity Interactive Network and a Variable Illumination Dataset for RGBT Image Salient Object Detection | Paper/Project |
| 10 | 2023 | CAVER | TIP | CAVER: Cross-Modal View-Mixed Transformer for Bi-Modal Salient Object Detection | Paper/Project |
| 9 | 2023 | C³A | PR | Cross-modal co-feedback cellular automata for RGB-T saliency detection | Paper |
| 8 | 2023 | WaveNet | TIP | WaveNet: Wavelet Network With Knowledge Distillation for RGB-T Salient Object Detection | Paper/Project |
| 7 | 2023 | MSAFNet | ICIP | Feature Enhancement and Fusion for RGB-T Salient Object Detection | Paper |
| 6 | 2023 | AiOSOD | arXiv | All in One: RGB, RGB-D, and RGB-T Salient Object Detection | Paper |
| 5 | 2023 | SPNet | ACM MM | Saliency Prototype for RGB-D and RGB-T Salient Object Detection | Paper/Project |
| 4 | 2023 | FFANet | PR | Frequency-aware feature aggregation network with dual-task consistency for RGB-T salient object detection | Paper |
| 3 | 2023 | UniSOD | arXiv | Unified-modal Salient Object Detection via Adaptive Prompt Learning | Paper/Project |
| 2 | 2023 | CMDBIF-Net | TCSVT | Cross-Modality Double Bidirectional Interaction and Fusion Network for RGB-T Salient Object Detection | Paper |
| 1 | 2023 | MITFNet | TCSVT | Modality-Induced Transfer-Fusion Network for RGB-D and RGB-T Salient Object Detection | Paper |
| 17 | 2022 | TSEDNet | AI | RGB-T salient object detection via CNN feature and result saliency map fusion | Paper |
| 16 | 2022 | MIA_DPD | NC | Multi-modal Interactive Attention and Dual Progressive Decoding Network for RGB-D/T Salient Object Detection | Paper/Project |
| 15 | 2022 | CGMDRNet | TCSVT | CGMDRNet: Cross-Guided Modality Difference Reduction Network for RGB-T Salient Object Detection | Paper |
| 14 | 2022 | RGB-T-Glass-Segmentation | arXiv | Glass Segmentation with RGB-Thermal Image Pairs | Paper/Project |
| 13 | 2022 | DCNet | TIP | Weakly Alignment-free RGBT Salient Object Detection with Deep Correlation Network | Paper/Project |
| 12 | 2022 | OSRNet | TIM | Real-time One-stream Semantic-guided Refinement Network for RGB-Thermal Salient Object Detection | Paper/Project |
| 11 | 2022 | CCFENet | TCSVT | Cross-Collaborative Fusion-Encoder Network for Robust RGB-Thermal Salient Object Detection | Paper/Project |
| 10 | 2022 | UniFusion | EAAI | Unidirectional RGB-T salient object detection with intertwined driving of encoding and fusion | Paper |
| 9 | 2022 | EAF-Net | MVA | EAF-Net: an enhancement and aggregation–feedback network for RGB-T salient object detection | Paper |
| 8 | 2022 | SwinMCNet | arXiv | Mirror Complementary Transformer Network for RGB-thermal Salient Object Detection | Paper/Project |
| 7 | 2022 | MS-SFA | CVIU | Enabling modality interactions for RGB-T salient object detection | Paper |
| 6 | 2022 | MCFNet | AI | Modal complementary fusion network for RGB-T salient object detection | Paper/Project |
| 5 | 2022 | TNet | TMM | Does Thermal really always matter for RGB-T salient object detection | Paper/Project |
| 4 | 2022 | ICANet | arXiv | Interactive Context-Aware Network for RGB-T Salient Object Detection | Paper |
| 3 | 2022 | MFENet | DSP | MFENet: Multitype fusion and enhancement network for detecting salient objects in RGB-T images | Paper/Project |
| 2 | 2022 | C³A | PR | Cross-modal co-feedback cellular automata for RGB-T saliency detection | Paper |
| 1 | 2022 | ACMANet | KBS | Asymmetric cross-modal activation network for RGB-T salient object detection | Paper/Project |
| 11 | 2021 | Survey | Acta Electronica Sinica | Review of the Methods for Salient Object Detection Based on Deep Learning (in Chinese) | Paper |
| 10 | 2021 | MIDD | TIP | Multi-Interactive Dual-Decoder for RGB-Thermal Salient Object Detection | Paper/Project |
| 9 | 2021 | ECFFNet | TCSVT | ECFFNet: Effective and Consistent Feature Fusion Network for RGB-T Salient Object Detection | Paper/Results(tx48) |
| 8 | 2021 | MMNet | TCSVT | Unified Information Fusion Network for Multi-Modal RGB-D and RGB-T Salient Object Detection | Paper |
| 7 | 2021 | CGFNet | TCSVT | CGFNet: Cross-Guided Fusion Network for RGB-T Salient Object Detection | Paper/Project |
| 6 | 2021 | CSRNet | TCSVT | Efficient Context-Guided Stacked Refinement Network for RGB-T Salient Object Detection | Paper/Project |
| 5 | 2021 | TSFNet | SPL | TSFNet: Two-Stage Fusion Network for RGB-T Salient Object Detection | Paper |
| 4 | 2021 | APNet | TETCI | APNet: Adversarial Learning Assistance and Perceived Importance Fusion Network for All-Day RGB-T Salient Object Detection | Paper/Project |
| 3 | 2021 | SwinNet | TCSVT | SwinNet: Swin Transformer drives edge-aware RGB-D and RGB-T salient object detection | Paper/Project |
| 2 | 2021 | MGFM | TCSVT | Multi-graph Fusion and Learning for RGBT Image Saliency Detection | Paper |
| 1 | 2021 | DFPN-SIPM | CYBER | Salient Target Detection in RGB-T Image based on Multi-level Semantic Information | Paper |
| 4 | 2020 | ADFC-MGF | TIP | RGB-T Salient Object Detection via Fusing Multi-Level CNN Features | Paper |
| 3 | 2020 | MS-MM-ML-FFM | TCSVT | Revisiting Feature Fusion for RGB-T Salient Object Detection | Paper |
| 2 | 2020 | RGBN-SOD | AAAI | Multi-Spectral Salient Object Detection by Adversarial Domain Adaptation | Paper |
| 1 | 2020 | RGBN-SOD | TMM | Deep Domain Adaptation Based Multi-spectral Salient Object Detection | Paper |
| 4 | 2019 | Survey | CVM | Salient Object Detection: A Survey (Chinese translation of the CVM 2019 journal paper) | Paper |
| 3 | 2019 | M3S-NIR | MIPR | M3S-NIR: Multi-Modal Multi-Scale Noise-Insensitive Ranking for RGB-T Saliency Detection | Paper/Project |
| 2 | 2019 | VT1000 | TMM | RGB-T Image Saliency Detection via Collaborative Graph Learning | Paper/Project |
| 1 | 2019 | VT821 | TCSVT | RGBT Salient Object Detection: Benchmark and A Novel Cooperative Ranking Approach | Paper/Project |
| 1 | 2018 | VT5000 | IGTA | RGB-T Saliency Detection Benchmark: Dataset, Baselines, Analysis and a Novel Approach | Paper/Project |
| 1 | 2017 | MDF-SVM | ISCID | Learning Multiscale Deep Features and SVM Regressors for Adaptive RGB-T Saliency Detection | Paper |
- https://github.com/zyrant/Summary-of-RGB-T-Salient-Object-Detection-and-Semantic-segmentation
- https://github.com/lz118/RGBT-Salient-Object-Detection
| No. | Year | Model | Pub. | Title | Links |
|---|---|---|---|---|---|
| 11 | 2025 | MDANet | JPTIP | Multi-stage differential-aware attention network for real-time underwater salient object detection | Paper |
| 10 | 2025 | LESOD | PR | LESOD: Lightweight and Efficient Network for RGB-D Salient Object Detection | Paper/Project |
| 9 | 2025 | PRANet | Neural Networks | Potential region attention network for RGB-D salient object detection | Paper |
| 8 | 2025 | HPI | TIP | Heterogeneous Experts and Hierarchical Perception for Underwater Salient Object Detection | Paper |
| 7 | 2025 | HDANet | TGRS | HDANet: Enhancing Underwater Salient Object Detection With Physics-Inspired Multimodal Joint Learning | Paper/Project |
| 6 | 2025 | MFINet | EAAI | Multi-modal feature integration network for Visible-Depth-Thermal salient object detection | Paper/Project |
| 5 | 2025 | LiteSalNet | TGRS | A Lightweight Multistream Framework for Salient Object Detection in Optical Remote Sensing | Paper/Project |
| 4 | 2025 | DSSN | TGRS | Hyperspectral Remote Sensing Images Salient Object Detection: The First Benchmark Dataset and Baseline | Paper/Project |
| 3 | 2025 | SPDE | TCSVT | Underwater Salient Object Detection via Dual-Stage Self-Paced Learning and Depth Emphasis | Paper/Project |
| 2 | 2025 | TRNet | TCSVT | TRNet: Two-Tier Recursion Network for Co-Salient Object Detection | Paper/Project |
| 1 | 2025 | IFENet | TIP | IFENet: Interaction, Fusion, and Enhancement Network for V-D-T Salient Object Detection | Paper/Project |
| 2 | 2024 | MambaSOD | arXiv | MambaSOD: Dual Mamba-Driven Cross-Modal Fusion Network for RGB-D Salient Object Detection | Paper/Project |
| 1 | 2024 | Saliency-Ranking-Paradigm | TIP | Rethinking Object Saliency Ranking: A Novel Whole-Flow Processing Paradigm | Paper/Project |
| 1 | 2023 | MFFNet | TMM | MFFNet: Multi-modal Feature Fusion Network for V-D-T Salient Object Detection | Paper |
| 1 | 2019 | LV-Net | TGRS | Nested Network with Two-Stream Pyramid for Salient Object Detection in Optical Remote Sensing Images | Paper/Project |
- VT821 Dataset: Paper, link
- VT1000 Dataset: Paper, link
- VT5000 Dataset: Paper, link [y9jj]
- VT723 Dataset: Paper, link
- Python version: CPU version and GPU version
- Matlab version: here (includes weighted F) and here
🚀🚀🚀Update (in 2025-06-26)
| No. | Year | Model | Pub. | Title | Links |
|---|---|---|---|---|---|
| 4 | 2023 | CALNet | ACM MM | Multispectral Object Detection via Cross-Modal Conflict-Aware Learning | Paper |
| 3 | 2023 | LRAF-Net | TNNLS | LRAF-Net: Long-Range Attention Fusion Network for Visible–Infrared Object Detection | Paper |
| 2 | 2023 | GF-detection | RS | GF-Detection: Fusion with GAN of Infrared and Visible Images for Vehicle Detection at Nighttime | Paper |
| 1 | 2023 | CMAFF | PR | Cross-modality attentive feature fusion for object detection in multispectral remote sensing imagery | Paper |
| 3 | 2022 | TSRA | ECCV | Translation, Scale and Rotation: Cross-Modal Alignment Meets RGB-Infrared Vehicle Detection | Paper |
| 2 | 2022 | UA-CMDet | TCSVT | Drone-based RGB-Infrared Cross-Modality Vehicle Detection via Uncertainty-Aware Learning | Paper |
| 1 | 2022 | RISNet | RS | Improving RGB-Infrared Object Detection by Reducing Cross-Modality Redundancy | Paper |
- DroneVehicle: partially aligned, link
- VEDAI: strictly aligned, link
- Multispectral Datasets for Detection and Segmentation: with Segmentation annotation, link
🚀🚀🚀Update (in 2025-08-30)
| No. | Year | Model | Pub. | Title | Links |
|---|---|---|---|---|---|
| 3 | 2025 | RGBT-Booster | JIOT | RGBT-Booster: Detail-Boosted Fusion Network for RGB-Thermal Crowd Counting With Local Contrastive Learning | Paper/Project |
| 2 | 2025 | MHKDF | TCSVT | A Mutual Head Knowledge Distillation Framework for Lightweight RGB-T Crowd Counting | Paper/Project |
| 1 | 2025 | MISF-Net | TMM | MISF-Net: Modality-Invariant and -Specific Fusion Network for RGB-T Crowd Counting | Paper/Project |
| 2 | 2022 | MAFNet | arXiv | MAFNet: A Multi-Attention Fusion Network for RGB-T Crowd Counting | Paper |
| 1 | 2022 | MAT | ICME | Multimodal Crowd Counting with Mutual Attention Transformers | Paper |
| 1 | 2021 | IADM | CVPR | Cross-Modal Collaborative Representation Learning and a Large-Scale RGBT Benchmark for Crowd Counting | Paper/Project |
| 1 | 2020 | MMCCN | ACCV | RGB-T Crowd Counting from Drone: A Benchmark and MMCCN Network | Paper/Project |
🚀🚀🚀Update (in 2025-06-27)
| No. | Year | Model | Pub. | Title | Links |
|---|---|---|---|---|---|
| 2 | 2024 | TFDet | TNNLS | TFDet: Target-Aware Fusion for RGB-T Pedestrian Detection | Paper/Project |
| 1 | 2024 | M2FNet | TMM | M2FNet: Mask-guided Multi-level Fusion for RGB-T Pedestrian Detection | Paper |
| 7 | 2023 | AANet | ACM MM | Attentive Alignment Network for Multispectral Pedestrian Detection | Paper |
| 6 | 2023 | CALNet | ACM MM | Multispectral Object Detection via Cross-Modal Conflict-Aware Learning | Paper |
| 5 | 2023 | SMPD | TCSVT | Stabilizing Multispectral Pedestrian Detection with Evidential Hybrid Fusion | Paper |
| 4 | 2023 | CSSA | CVPRW | Multimodal Object Detection by Channel Switching and Spatial Attention | Paper |
| 3 | 2023 | MFPT | TITS | Multi-Modal Feature Pyramid Transformer for RGB-Infrared Object Detection | Paper |
| 2 | 2023 | MCMHE-CAF | TMM | Multiscale Cross-modal Homogeneity Enhancement and Confidence-aware Fusion for Multispectral Pedestrian Detection | Paper |
| 1 | 2023 | HAFNet | RS | HAFNet: Hierarchical Attentive Fusion Network for Multispectral Pedestrian Detection | Paper |
| 7 | 2022 | ProbEn | ECCV | Multimodal Object Detection via Probabilistic Ensembling | Paper |
| 6 | 2022 | DCMNet | ACM MM | Learning a Dynamic Cross-Modal Network for Multispectral Pedestrian Detection | Paper |
| 5 | 2022 | CMPD | TMM | Confidence-aware Fusion using Dempster-Shafer Theory for Multispectral Pedestrian Detection | Paper |
| 4 | 2022 | AMSF | PRCV | Attention-Guided Multi-modal and Multi-scale Fusion for Multispectral Pedestrian Detection | Paper |
| 3 | 2022 | RISNet | ICIP | Improving RGB-Infrared Pedestrian Detection by Reducing Cross-Modality Redundancy | Paper |
| 2 | 2022 | MuFEm | TITS | Spatio-contextual deep network-based multimodal pedestrian detection for autonomous driving | Paper |
| 1 | 2022 | Sensory Fusion with the YOLOv4 | Sensors | Adopting the YOLOv4 Architecture for Low-Latency Multispectral Pedestrian Detection in Autonomous Driving | Paper |
| 5 | 2021 | CMPI | ICIP | Deep Active Learning from Multispectral Data Through Cross-Modality Prediction Inconsistency | Paper |
| 4 | 2021 | MCFF | Sensors | Attention Fusion for One-Stage Multispectral Pedestrian Detection | Paper |
| 3 | 2021 | UFF-UCG | TCSVT | Uncertainty-Guided Cross-Modal Learning for Robust Multispectral Pedestrian Detection | Paper |
| 2 | 2021 | CFL | TCSVT | Deep Cross-modal Representation Learning and Distillation for Illumination-invariant Pedestrian Detection | Paper |
| 1 | 2021 | GAFF | WACV | Guided Attentive Feature Fusion for Multispectral Pedestrian Detection | Paper |
| 3 | 2020 | Cyclic Fuse-and-Refine | ICIP | Multispectral Fusion for Object Detection with Cyclic Fuse-and-Refine Blocks | Paper |
| 2 | 2020 | MBNet | ECCV | Improving Multispectral Pedestrian Detection by Addressing Modality Imbalance Problems | Paper/Project |
| 1 | 2020 | FCRPNMPD | BMVC | Anchor-free Small-scale Multispectral Pedestrian Detection | Paper/Project |
| 4 | 2019 | AR-CNN | ICCV | Weakly Aligned Cross-Modal Learning for Multispectral Pedestrian Detection | Paper/Project |
| 3 | 2019 | BSSDNN | ISPRS | Box-level Segmentation Supervised Deep Neural Networks for Accurate and Real-time Multispectral Pedestrian Detection | Paper/Project |
| 2 | 2019 | CIAN | Information Fusion | Cross-modality interactive attention network for multispectral pedestrian detection | Paper/Project |
| 1 | 2019 | TS-RPN | Information Fusion | Pedestrian detection with unsupervised multispectral feature learning using deep neural networks | Paper |
| 2 | 2018 | MSDS-RCNN | BMVC | Multispectral Pedestrian Detection via Simultaneous Detection and Segmentation | Paper/Project/Project |
| 1 | 2018 | CWF-APF | PR | Unified Multi-spectral Pedestrian Detection Based on Probabilistic Fusion Networks | Paper |
| 1 | 2016 | ConvNet | BMVC | Multispectral Deep Neural Networks for Pedestrian Detection | Paper/Project |
| 1 | 2015 | ACF | CVPR | Multispectral Pedestrian Detection Benchmark Dataset and Baseline | Paper/Project |
KAIST dataset, CVC-14 dataset, FLIR dataset, LLVIP dataset, M3FD dataset
- Improved KAIST Testing Annotations provided by Liu et al. Link to download
- Sanitized KAIST Training Annotations provided by Li et al. Link to download
- Improved KAIST Training Annotations provided by Zhang et al. Link to download
- Evaluation code. Link to download
- Annotation conversion: vbb format -> xml format. Link to download
🚀🚀🚀Update (in 2025-12-5)
| No. | Year | Model | Pub. | Title | Links |
|---|---|---|---|---|---|
| 16 | 2025 | DTDC | HISS | Semi-supervised abdominal multi-organ segmentation via dual-task de-biased consistency | Paper |
| 15 | 2025 | VESSA | arXiv | Vision-Language Enhanced Foundation Model for Semi-supervised Medical Image Segmentation | Paper |
| 14 | 2025 | AMFC | IVC | Semi-supervised medical image segmentation via anatomy-preserving consistency training | Paper |
| 13 | 2025 | MCLNet | Inf. Fusion | Bridging RGB-T image fusion and semantic segmentation via multi-task collaborative learning | Paper |
| 12 | 2025 | AdaptRGB-t | IJON | AdaptRGB-t: Adaptive RGB-t semantic segmentation via efficient parameter-tuning with textual guidance | Paper |
| 11 | 2025 | AMDANet | ICCV | AMDANet: Attention-Driven Multi-Perspective Discrepancy Alignment for RGB-Infrared Image Fusion and Segmentation | Paper |
| 10 | 2025 | EBFNet | JSEN | Evidence-based Fusion for Low-quality RGB-T Semantic Segmentation | Paper |
| 9 | 2025 | Cascaded Embedded-Feature Pyramid Networks | Neurocomputing | Cascaded embedded-FPN: A cross-modality multi-scale feature fusion network for varied-sized objects semantic segmentation | Paper |
| 8 | 2025 | WSRT | Neurocomputing | A weight-sharing based RGB-T image semantic segmentation network with hierarchical feature enhancement and progressive feature fusion | Paper/Project |
| 7 | 2025 | HMFENet | TITS | HMFENet: Hierarchical Matching Guided Feature Enhancement Network for Few-Shot RGB-Thermal Urban Scene Segmentation | Paper/Project |
| 6 | 2025 | CFDHI-Net | TITS | CFDHI-Net: Correlation-Driven Feature Decoupling and Hierarchical Integration Network for RGB-Thermal Semantic Segmentation | Paper/Project |
| 5 | 2025 | ERTFNet | CVIU | ERTFNet: Enhanced RGB-T Fusion Network for semantic segmentation by integrating thermal edge features | Paper |
| 4 | 2025 | IQSeg | PR | Implicit alignment and query refinement for RGB-T semantic segmentation | Paper |
| 3 | 2025 | CCFFNet | Information Fusion | Towards efficient RGB-T semantic segmentation via feature generative distillation strategy | Paper |
| 2 | 2025 | SICFNet | ESWA | SICFNet: Shared Information Interaction and Complementary Feature Fusion Network for RGB-T traffic scene parsing | Paper/Project |
| 1 | 2025 | SCRNet | PR | Resolving semantic conflicts in RGB-T semantic segmentation | Paper |
| 3 | 2024 | OpenRSS | ECCV | Open-Vocabulary RGB-Thermal Semantic Segmentation | Paper/Project |
| 2 | 2024 | MDBFNet | TIV | Multi-branch Differential Bidirectional Fusion Network for RGB-T Semantic Segmentation | Paper |
| 1 | 2024 | CAITNet | TMM | Context-Aware Interaction Network for RGB-T Semantic Segmentation | Paper/Project |
| 12 | 2023 | ECGF-ARL | TITS | Embedded Control Gate Fusion and Attention Residual Learning for RGB–Thermal Urban Scene Parsing | Paper/Project |
| 11 | 2023 | EAEFNet | RAL | Explicit Attention-Enhanced Fusion for RGB-Thermal Perception Tasks | Paper/Project |
| 10 | 2023 | CCFFNet | PR | Complementarity-aware cross-modal feature fusion network for RGB-T semantic segmentation | Paper/Project |
| 9 | 2023 | MMSMCNet | TCSVT | MMSMCNet: Modal Memory Sharing and Morphological Complementary Networks for RGB-T Urban Scene Semantic Segmentation | Paper/Project |
| 8 | 2023 | CACFNet | TIV | CACFNet: Cross-Modal Attention Cascaded Fusion Network for RGB-T Urban Scene Parsing | Paper |
| 7 | 2023 | DBCNet | TSMC | DBCNet: Dynamic Bilateral Cross-Fusion Network for RGB-T Urban Scene Understanding in Intelligent Vehicles | Paper |
| 6 | 2023 | SGFNet | TCSVT | SGFNet: Semantic-Guided Fusion Network for RGB-Thermal Semantic Segmentation | Paper/Project |
| 5 | 2023 | DPLNet | arXiv | Efficient Multimodal Semantic Segmentation via Dual-Prompt Learning | Paper/Project |
| 4 | 2023 | FI | TITS | A RGB-Thermal Image Segmentation Method Based on Parameter Sharing and Attention Fusion for Safe Autonomous Driving | Paper |
| 3 | 2023 | UTFNet | GRSL | UTFNet: Uncertainty-Guided Trustworthy Fusion Network for RGB-Thermal Semantic Segmentation | Paper/Project |
| 2 | 2023 | SFAF-MA | TIM | SFAF-MA: Spatial Feature Aggregation and Fusion With Modality Adaptation for RGB-Thermal Semantic Segmentation | Paper/Project |
| 1 | 2023 | SASEM | TIV | On Exploring Shape and Semantic Enhancements for RGB-X Semantic Segmentation | Paper/Project |
| 8 | 2022 | EGFNet | AAAI | Edge-aware guidance fusion network for RGB–thermal scene parsing | Paper/Project |
| 7 | 2022 | MTANet | TIV | MTANet: Multitask-Aware Network with Hierarchical Multimodal Fusion for RGB-T Urban Scene Understanding | Paper/Project |
| 6 | 2022 | CMX | TITS | CMX: Cross-Modal Fusion for RGB-X Semantic Segmentation with Transformers | Paper/Project |
| 5 | 2022 | ARTSeg | ACPR | ARTSeg: Employing Attention for Thermal Images Semantic Segmentation | Paper |
| 4 | 2022 | GCNet | Neurocomputing | GCNet: Grid-Like Context-Aware Network for RGB-Thermal Semantic Segmentation | Paper |
| 3 | 2022 | LASNet | TCSVT | RGB-T Semantic Segmentation with Location, Activation, and Sharpening | Paper/Project |
| 2 | 2022 | GEBNet | SPL | GEBNet: Graph-Enhancement Branch Network for RGB-T Scene Parsing | Paper/Project |
| 1 | 2022 | FDCNet | TCSVT | A Feature Divide-and-Conquer Network for RGB-T Semantic Segmentation | Paper |
| 8 | 2021 | GMNet | TIP | GMNet: Graded-Feature Multilabel-Learning Network for RGB-Thermal Urban Scene Semantic Segmentation | Paper/Project |
| 7 | 2021 | ABMDRNet | CVPR | ABMDRNet: Adaptive-weighted Bi-directional Modality Difference Reduction Network for RGB-T Semantic Segmentation | Paper |
| 6 | 2021 | FEANet | IROS | FEANet: Feature-Enhanced Attention Network for RGB-Thermal Real-time Semantic Segmentation | Paper/Project |
| 5 | 2021 | MLFNet | Measurement | Robust semantic segmentation based on RGB-thermal in variable lighting scenes | Paper |
| 4 | 2021 | MFFENet | TMM | MFFENet: Multiscale Feature Fusion and Enhancement Network for RGB-Thermal Urban Road Scene Parsing | Paper/Project |
| 3 | 2021 | MMNet | Applied Intelligence | MMNet: Multi-modal multi-stage network for RGB-T image semantic segmentation | Paper |
| 2 | 2021 | CCAFFMNet | Neurocomputing | CCAFFMNet: Dual-spectral semantic segmentation network with channel-coordinate attention feature fusion module | Paper |
| 1 | 2021 | HeatNet | IROS | HeatNet: Bridging the Day-Night Domain Gap in Semantic Segmentation with Thermal Images | Paper/Project |
| 3 | 2020 | PST900 | ICRA | PST900: RGB-Thermal Calibration, Dataset and Segmentation Network | Paper/Project |
| 2 | 2020 | FuseSeg | TASE | FuseSeg: Semantic Segmentation of Urban Scenes Based on RGB and Thermal Data Fusion | Paper |
| 1 | 2020 | spark | CINE | Using thermal intensities to build conditional random fields for object segmentation at night | Paper |
| 1 | 2019 | RTFNet | RAL | RTFNet: RGB-Thermal Fusion Network for Semantic Segmentation of Urban Scenes | Paper/Project |
| 1 | 2017 | MFNet | IROS | MFNet: Towards Real-Time Semantic Segmentation for Autonomous Vehicles with Multi-Spectral Scenes | Paper/Project |
🚀🚀🚀Update (in 2025-12-5)
| No. | Year | Model | Pub. | Title | Links |
|---|---|---|---|---|---|
| 8 | 2025 | MoKA-HP | IJON | MoKA-HP: Motion-aware KAdaptation with historical prompts for efficient and robust RGB-T tracking | Paper |
| 7 | 2025 | QSTNet | TIP | Quality-Aware Spatio-Temporal Transformer Network for RGBT Tracking | Paper/Project |
| 6 | 2025 | TUMFNet | IJCAI | Template-based Uncertainty Multimodal Fusion Network for RGBT Tracking | Paper/Project |
| 5 | 2025 | FMTrack | TCSVT | FMTrack: Frequency-aware Interaction and Multi-Expert Fusion for RGB-T Tracking | Paper/Project |
| 4 | 2025 | LRPD | ICMR | Exploiting Multimodal Prompt Learning and Distillation for RGB-T Tracking | Paper |
| 3 | 2025 | MRTTrack | PR | Mining representative tokens via transformer-based multi-modal interaction for RGB-T tracking | Paper/Project |
| 2 | 2025 | MGNet | Neural Networks | MGNet: RGBT tracking via cross-modality cross-region mutual guidance | Paper/Project |
| 1 | 2025 | AETrack | TCSVT | Adaptive Expert Decision for RGB-T Tracking | Paper |
| 2 | 2024 | TGTrack | TCSVT | Top-Down Cross-Modal Guidance for Robust RGB-T Tracking | Paper |
| 1 | 2024 | BAT | AAAI | Bi-directional Adapter for Multimodal Tracking | Paper/Project |
| 6 | 2023 | MPLT | arXiv | RGB-T Tracking via Multi-Modal Mutual Prompt Learning | Paper/Project |
| 5 | 2023 | XMSNet | arXiv | Object Segmentation by Mining Cross-Modal Semantics | Paper/Project |
| 4 | 2023 | MTNet | ICME | MTNet: Learning Modality-aware Representation with Transformer for RGBT Tracking | Paper |
| 3 | 2023 | ViPT | CVPR | Visual Prompt Multi-Modal Tracking | Paper/Project |
| 2 | 2023 | CMD | CVPR | Efficient RGB-T Tracking via Cross-Modality Distillation | Paper |
| 1 | 2023 | TBSI | CVPR | Bridging Search Region Interaction with Template for RGB-T Tracking | Paper/Project |
| 4 | 2022 | RFC | SCIS | RGBT tracking via reliable feature configuration | Paper |
| 3 | 2022 | APFNet | AAAI | Attribute-Based Progressive Fusion Network for RGBT Tracking | Paper/Project |
| 2 | 2022 | GAP-WRS | ACM MM | Dense Feature Aggregation and Pruning for RGBT Tracking | Paper |
| 1 | 2022 | ProTrack | ACM MM | Prompting for Multi-Modal Tracking | Paper |
| 3 | 2021 | JMMAC | TIP | Jointly Modeling Motion and Appearance Cues for Robust RGB-T Tracking | Paper |
| 2 | 2021 | AENet | IJCV | Learning Adaptive Attribute-Driven Representation for Real-Time RGB-T Tracking | Paper |
| 1 | 2021 | FANet | TIV | Quality-Aware Feature Aggregation Network for Robust RGBT Tracking | Paper |
| 3 | 2020 | CANN | ECCV | Challenge-Aware RGBT Tracking | Paper |
| 2 | 2020 | Survey | Information Fusion | Object fusion tracking based on visible and infrared images: A comprehensive review | Paper |
| 1 | 2020 | CMPP | CVPR | Cross-Modal Pattern-Propagation for RGB-T Tracking | Paper |
| 1 | 2019 | RGBT234 | PR | RGB-T object tracking: Benchmark and baseline | Paper |
- VIReID: visible-infrared person re-identification
🚀🚀🚀Update (in 2025-07-25)
| No. | Year | Model | Pub. | Title | Links |
|---|---|---|---|---|---|
| 3 | 2025 | MASM | Neural Networks | Memory-augmented shuffled meta learning for visible–infrared person re-identification | Paper |
| 2 | 2025 | PMCM | PR | Visible–infrared person re-identification via patch-mixed cross-modality learning | Paper |
| 1 | 2025 | DMANet | PR | DMANet: Dual-modality alignment network for visible–infrared person re-identification | Paper |
| 5 | 2024 | MIA | Image and Vision Computing | Modality interactive attention for cross-modality person re-identification | Paper |
| 4 | 2024 | PDEM | Neurocomputing | Progressive discrepancy elimination for visible–infrared person re-identification | Paper/Project |
| 3 | 2024 | WF-CAMReViT | PR | Enhanced visible–infrared person re-identification based on cross-attention multiscale residual vision transformer | Paper |
| 2 | 2024 | CMGR | Neural Networks | Cross-modal group-relation optimization for visible–infrared person re-identification | Paper |
| 1 | 2024 | FDNM | arXiv | Frequency Domain Nuances Mining for Visible-Infrared Person Re-identification | Paper |
- SYSU-MM01
- RegDB
- LLCM
🚀🚀🚀Update (in 2023-12-26)
| No. | Year | Model | Pub. | Title | Links |
|---|---|---|---|---|---|
| 1 | 2023 | XMSNet | ACM MM | Object Segmentation by Mining Cross-Modal Semantics | Paper |