
Explainable Machine Learning with SHAP and LIME

This project demonstrates how to apply Explainable AI techniques to classical machine learning models using a real medical dataset. The goal is to make black-box models such as Random Forest and XGBoost interpretable using SHAP and LIME.

The dataset used in this project is the Breast Cancer dataset from scikit-learn, which is a standard binary classification medical dataset.

Project Features

Machine Learning Models (see the training sketch after this feature list):

Random Forest (scikit-learn)

XGBoost (optional, if installed)

Explainability Methods:

Classical Feature Importance

SHAP (global explanation)

LIME (local explanation for single samples)

Automatic Outputs:

Trained models are saved in the "models" folder

Classification reports are saved in the "reports" folder

Explanation plots are saved in the "figures" folder
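
For orientation, the training stage can be sketched as follows. This is a simplified illustration, not the exact code in main.py; the split, hyperparameters, and variable names are assumptions.

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier

# Load the dataset and hold out a stratified test split
X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Random Forest is always trained
rf = RandomForestClassifier(n_estimators=200, random_state=42)
rf.fit(X_train, y_train)

# XGBoost is optional: skip it if the package is not installed
try:
    from xgboost import XGBClassifier
    xgb = XGBClassifier(eval_metric="logloss", random_state=42)
    xgb.fit(X_train, y_train)
except ImportError:
    xgb = None

The output and explainability sketches later in this README reuse the X, X_train, X_test, y_test, and rf variables defined here.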

Installation

Make sure Python 3.9 or newer is installed.

Install required packages using:

pip install -r requirements.txt
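
The repository's requirements.txt is the authoritative list; based on the tools described in this README, it roughly corresponds to the following unpinned sketch:

scikit-learn
xgboost     # optional, only needed for the XGBoost model
shap
lime
joblib
matplotlib
numpy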

How to Run

Run the pipeline with the following command:

python main.py

After execution, the following folders will be created automatically:

models

reports

figures

Output Description

models/: Contains trained machine learning models saved with joblib.

reports/: Contains classification reports for each model in JSON format.

figures/: Contains feature importance plots, SHAP summary plots, and LIME explanations.
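
A sketch of how these artifacts can be produced, continuing from the training sketch above (file names are illustrative):

import json
import joblib
from pathlib import Path
from sklearn.metrics import classification_report

# Create the output folders if they do not exist yet
for folder in ("models", "reports", "figures"):
    Path(folder).mkdir(exist_ok=True)

# Persist the trained model and its evaluation report
joblib.dump(rf, "models/random_forest.joblib")
report = classification_report(y_test, rf.predict(X_test), output_dict=True)
with open("reports/random_forest_report.json", "w") as f:
    json.dump(report, f, indent=2)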

Explainability Methods

Feature Importance: Uses the built-in feature_importances_ attribute of tree-based models to show which features contribute most to the predictions.
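
A minimal sketch of such a plot, reusing rf and X from the training sketch above and the figures folder from the output sketch (output file name is illustrative):

import numpy as np
import matplotlib.pyplot as plt

# Rank the impurity-based importances and plot the top 10 features
importances = rf.feature_importances_
order = np.argsort(importances)[::-1][:10]
plt.barh(X.columns[order][::-1], importances[order][::-1])
plt.xlabel("Feature importance")
plt.tight_layout()
plt.savefig("figures/feature_importance_rf.png")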

SHAP: Uses SHAP TreeExplainer to generate global explanations for model predictions.
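
A sketch of a global SHAP summary for the Random Forest model, continuing from the sketches above (file name illustrative):

import shap
import matplotlib.pyplot as plt

# TreeExplainer is the fast explainer designed for tree ensembles
explainer = shap.TreeExplainer(rf)
shap_values = explainer.shap_values(X_test)

# Depending on the SHAP version, a binary classifier yields either a list
# of per-class arrays or a single (samples, features, classes) array
if isinstance(shap_values, list):
    values = shap_values[1]
elif getattr(shap_values, "ndim", 2) == 3:
    values = shap_values[:, :, 1]
else:
    values = shap_values

# Global summary of feature impact across the test set
shap.summary_plot(values, X_test, show=False)
plt.savefig("figures/shap_summary_rf.png", bbox_inches="tight")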

LIME: Uses LIME Tabular Explainer to generate local explanations for individual predictions.
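
A sketch of a local explanation for a single test sample, continuing from the sketches above (class names follow the scikit-learn dataset; file name illustrative):

from lime.lime_tabular import LimeTabularExplainer

# LIME needs the raw training data and feature names to build perturbations
lime_explainer = LimeTabularExplainer(
    training_data=X_train.values,
    feature_names=list(X_train.columns),
    class_names=["malignant", "benign"],
    mode="classification",
)

# Explain the first test sample using the Random Forest's probabilities
explanation = lime_explainer.explain_instance(
    X_test.values[0], rf.predict_proba, num_features=10
)
fig = explanation.as_pyplot_figure()
fig.savefig("figures/lime_explanation_sample_0.png", bbox_inches="tight")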

Dataset

The Breast Cancer dataset is loaded directly from scikit-learn and contains medical diagnostic features for tumor classification.
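
For reference, loading it takes a single call:

from sklearn.datasets import load_breast_cancer

# 569 samples, 30 numeric features, binary target (0 = malignant, 1 = benign)
data = load_breast_cancer(as_frame=True)
X, y = data.data, data.target
print(X.shape)                   # (569, 30)
print(list(data.target_names))   # ['malignant', 'benign']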

Author

Ali Zangeneh

Email: [email protected]

GitHub: https://github.com/alizangeneh

ORCID: https://orcid.org/0009-0002-5184-0571

License

This project is intended for educational and research purposes.
