Explainable Machine Learning with SHAP and LIME
This project demonstrates how to apply Explainable AI techniques to classical machine learning models using a real medical dataset. The goal is to make black-box models such as Random Forest and XGBoost interpretable using SHAP and LIME.
The dataset used in this project is the Breast Cancer dataset from scikit-learn, which is a standard binary classification medical dataset.
Project Features
Machine Learning Models:
Random Forest (scikit-learn)
XGBoost (optional, if installed)
Explainability Methods:
Classical Feature Importance
SHAP (global explanation)
LIME (local explanation for single samples)
Automatic Outputs:
Trained models are saved in the models/ folder
Classification reports are saved in the reports/ folder
Explanation plots are saved in the figures/ folder
Installation
Make sure Python 3.9 or newer is installed.
Install required packages using:
pip install -r requirements.txt
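The contents of requirements.txt are not listed here; based on the features described above, a typical set of dependencies (an assumption, not the project's actual file) would look like:

```text
scikit-learn
shap
lime
matplotlib
joblib
xgboost  # optional
```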
How to Run
Run the pipeline with the following command:
python main.py
After execution, the following folders will be created automatically:
models
reports
figures
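The folder-creation step can be sketched as follows. This is a hypothetical illustration of how main.py might ensure the output folders exist; the function name ensure_output_dirs is an assumption, and a temporary directory stands in for the project root:

```python
import tempfile
from pathlib import Path

# Hypothetical sketch: create the three output folders listed above
# before the pipeline writes any results.
def ensure_output_dirs(base: Path) -> None:
    for name in ("models", "reports", "figures"):
        (base / name).mkdir(parents=True, exist_ok=True)

base = Path(tempfile.mkdtemp())  # stand-in for the project root
ensure_output_dirs(base)
```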
Output Description
models/: Contains trained machine learning models saved with joblib.
reports/: Contains classification reports for each model in JSON format.
figures/: Contains feature importance plots, SHAP summary plots, and LIME explanations.
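The saved artifacts can be reused after the pipeline finishes. The sketch below is self-contained and illustrative only: the file names random_forest.joblib and random_forest_report.json are assumptions, not necessarily what main.py produces:

```python
import json
import tempfile
from pathlib import Path

import joblib
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

out = Path(tempfile.mkdtemp())  # stand-in for the output folders
X, y = load_breast_cancer(return_X_y=True)
model = RandomForestClassifier(n_estimators=50, random_state=42).fit(X, y)

# What the pipeline saves: a joblib model dump and a JSON report ...
joblib.dump(model, out / "random_forest.joblib")
(out / "random_forest_report.json").write_text(
    json.dumps({"accuracy": float(model.score(X, y))})
)

# ... and how to load both back for later use.
loaded = joblib.load(out / "random_forest.joblib")
report = json.loads((out / "random_forest_report.json").read_text())
```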
Explainability Methods
Feature Importance: Uses the built-in feature_importances_ attribute of tree-based models to show which features contribute most to predictions.
SHAP: Uses SHAP's TreeExplainer to generate global explanations of model behavior across the dataset.
LIME: Uses LIME's LimeTabularExplainer to generate local explanations for individual predictions.
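The three methods above can be sketched together on the Breast Cancer data. This is a minimal illustration, not the project's actual code; SHAP and LIME are treated as optional dependencies and skipped if unavailable:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

data = load_breast_cancer()
X, y = data.data, data.target
model = RandomForestClassifier(n_estimators=100, random_state=42).fit(X, y)

# 1) Classical feature importance: built into tree ensembles,
#    normalized so the values sum to 1.
importances = dict(zip(data.feature_names, model.feature_importances_))
top5 = sorted(importances, key=importances.get, reverse=True)[:5]

# 2) SHAP global explanation (optional dependency; skipped if unavailable).
try:
    import shap
    explainer = shap.TreeExplainer(model)
    shap_values = explainer.shap_values(X)  # feed to shap.summary_plot(...)
except Exception:
    shap_values = None

# 3) LIME local explanation for a single sample (optional dependency).
try:
    from lime.lime_tabular import LimeTabularExplainer
    lime_explainer = LimeTabularExplainer(
        X,
        feature_names=list(data.feature_names),
        class_names=list(data.target_names),
        mode="classification",
    )
    exp = lime_explainer.explain_instance(X[0], model.predict_proba, num_features=5)
except Exception:
    exp = None
```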
Dataset
The Breast Cancer dataset is loaded directly from scikit-learn and contains medical diagnostic features for tumor classification.
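Loading and inspecting the dataset takes a few lines; it ships with scikit-learn, so no download is needed:

```python
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer()
X, y = data.data, data.target
# 569 samples with 30 numeric diagnostic features each;
# the binary target distinguishes malignant (0) from benign (1) tumors.
print(X.shape, list(data.target_names))
```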
Author
Ali Zangeneh
Email: [email protected]
GitHub: https://github.com/alizangeneh
ORCID: https://orcid.org/0009-0002-5184-0571
License
This project is intended for educational and research purposes.