Skip to content

devamsv/EMI-Predict_AI

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

8 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Intelligent EMI Calculator

An ML-powered FinTech application for automated loan eligibility assessment and safe EMI prediction

Python 3.11+ Streamlit XGBoost License: MIT


📋 Table of Contents


Overview

Intelligent EMI Calculator is a production-grade machine learning application that combines hybrid rule-based logic with predictive modeling to provide accurate loan eligibility assessment and safe EMI (Equated Monthly Installment) recommendations.

The system uses a two-stage decision pipeline:

  1. Classification Stage: XGBoost classifier determines loan eligibility based on applicant profiles
  2. Regression Stage: Random Forest regressor predicts the maximum safe EMI amount

This hybrid approach ensures both business rule compliance and data-driven personalization, making it suitable for financial risk assessment in FinTech applications.


Business Problem

Challenge

Traditional EMI calculators provide generic calculations without considering individual financial risk profiles. Banks and fintech companies need:

  • Automated eligibility screening to reduce manual underwriting costs
  • Risk-based EMI ceilings that prevent over-leveraging
  • Transparent decision logic explainable to borrowers
  • Scalable assessment for high-volume applications

Impact

This solution enables financial institutions to:

  • ✅ Screen loan applications 10x faster
  • ✅ Reduce default risk through intelligent EMI caps
  • ✅ Provide instant borrower feedback
  • ✅ Scale operations without hiring analysts

Solution Approach

Architecture

┌─────────────────────────────────────────────────────────────┐
│         User Input (23 Financial Features)                   │
└────────────────────┬────────────────────────────────────────┘
                     │
                     ▼
        ┌────────────────────────────┐
        │  Feature Preprocessing     │
        │  - Normalization           │
        │  - Categorical Encoding    │
        │  - Missing Value Handling  │
        └────────────────┬───────────┘
                         │
            ┌────────────┴────────────┐
            │                         │
            ▼                         ▼
    ┌──────────────────┐     ┌──────────────────┐
    │  Classification  │     │   Regression     │
    │  (XGBoost)       │     │  (Random Forest) │
    │                  │     │                  │
    │  Output:         │     │  Output:         │
    │  Eligible?       │     │  Max Safe EMI    │
    └────────┬─────────┘     └────────┬─────────┘
             │                        │
             └────────────┬───────────┘
                          │
                          ▼
         ┌─────────────────────────────────┐
         │  Business Rules Application     │
         │  - Credit score adjustment      │
         │  - DTI (Debt-to-Income) limits  │
         │  - Safety margins               │
         └────────────────┬────────────────┘
                          │
                          ▼
         ┌─────────────────────────────────┐
         │     Final Recommendation        │
         │  - Eligibility Status           │
         │  - Safe EMI Amount              │
         │  - Risk Tier Classification     │
         └─────────────────────────────────┘

Decision Logic

  • Safe EMI Calculation: Safe EMI = Min(Model Prediction, DTI-based Ceiling)
  • Eligibility Rules:
    • Credit Score ≥ 701 (Excellent)
    • DTI Ratio < 80% (Manageable)
    • Minimum income threshold: ₹25,000/month

Key Features

🏠 Home Dashboard

  • Project overview and problem statement
  • Key metrics and ROI statistics
  • Model architecture visualization
  • Quick start guide

🧮 EMI Calculator (Core Feature)

Intelligent Two-Stage Pipeline:

  • Stage 1: XGBoost classifier predicts eligibility (Eligible/High Risk/Not Eligible)
  • Stage 2: Regression model calculates maximum safe EMI
  • Hybrid Logic: Combines ML predictions with business rules (DTI ratios, credit scoring)
  • Real-time Processing: Instant predictions (<500ms)
  • Transparent Explainability: Shows decision factors affecting the result

Input Parameters (23 Features):

  • Personal: Age, Gender, Marital Status, Education
  • Employment: Job type, Experience, Company type
  • Financial: Monthly salary, Existing loans, Credit score, Bank balance
  • Loan Request: Amount, Tenure, Purpose, Monthly obligations

📊 Model Performance Dashboard

  • Real-time Metrics: Accuracy, Precision, Recall, F1-Score
  • MLflow Integration: Experiment tracking and model versioning
  • Interactive Visualizations:
    • Confusion matrices
    • Feature importance rankings
    • ROC/AUC curves
    • Prediction distribution plots
  • Model Comparison: Side-by-side performance metrics (XGBoost vs Baseline)

📈 Data Insights

  • Dataset Exploration: Interactive analysis of 10,000+ loan records
  • Demographic Breakdown: Age, salary, and approval rate distributions
  • Trend Analysis: Eligibility patterns across employment types and credit scores
  • Statistical Summary: Mean, median, and distribution metrics

🤖 AI Assistant

  • FAQ automation for common financial questions
  • Application guide and eligibility tips
  • Troubleshooting support
  • Direct integration with model outputs

Tech Stack

Layer Technology Version Purpose
Frontend Streamlit 1.28+ Interactive web UI
Data Processing Pandas 2.0+ Data manipulation
Numerical Computation NumPy 1.24+ Array operations
ML - Classification XGBoost 2.0+ Eligibility prediction
ML - Regression Scikit-learn 1.3+ EMI ceiling prediction
Model Serialization Joblib 1.3+ Model persistence
Visualization Plotly 5.14+ Interactive charts
Experiment Tracking MLflow 2.8+ Model versioning & metrics
Python Runtime Python 3.11+ Application runtime

Project Architecture

Directory Structure

EMI-Predict_AI/
│
├── app.py                          # Streamlit app entry point
├── requirements.txt                # Python dependencies
├── runtime.txt                     # Python version for Streamlit Cloud
│
├── pages/                          # Streamlit multi-page app
│   ├── __init__.py
│   ├── 1_Home.py                   # Landing page & overview
│   ├── 2_EMI_Calculator.py         # Main prediction interface
│   ├── 3_Model_Performance.py      # Metrics & MLflow integration
│   ├── 4_Data_Insights.py          # Dataset exploration
│   └── 5_AI_Assistant.py           # Q&A interface
│
├── utils/                          # Reusable utilities
│   ├── __init__.py
│   ├── config.py                   # Configuration & paths
│   ├── helpers.py                  # Model loading, predictions
│   └── styles.py                   # Streamlit theming
│
├── models/                         # Trained ML models
│   ├── emi_eligibility_model.pkl   # XGBoost classifier
│   └── max_emi_model.pkl           # Random Forest regressor
│
├── data/
│   ├── raw/                        # Original dataset
│   │   └── emi_prediction_dataset.csv
│   └── processed/                  # Cleaned, feature-engineered data
│       └── emi_prediction_dataset_cleaned.csv
│
├── notebooks/                      # Jupyter notebooks
│   ├── 01_data_handling.ipynb      # EDA & preprocessing
│   ├── 02_eda.ipynb                # Statistical analysis
│   └── 03_model_training_mlflow.ipynb # Model training
│
├── .streamlit/
│   └── config.toml                 # Streamlit configuration
│
├── .gitignore                      # Git exclusions
├── LICENSE                         # MIT License
└── README.md                       # This file

Data Flow

Raw Data (emi_prediction_dataset.csv)
    ↓
Cleaning & EDA (01_data_handling.ipynb)
    ↓
Feature Engineering (02_eda.ipynb)
    ↓
Model Training (03_model_training_mlflow.ipynb)
    ├→ XGBoost Classifier (Eligibility)
    └→ Random Forest Regressor (Max EMI)
    ↓
Model Serialization (Joblib)
    ↓
Deployment (models/)
    ↓
Streamlit Application
    ├→ Live Predictions
    ├→ Performance Tracking
    └→ Data Visualization

Model Details

1️⃣ Eligibility Classification Model

Algorithm: XGBoost Classifier

Purpose: Binary/Multi-class classification to determine loan eligibility

Input Features: 23 preprocessed financial features

Output Classes:

  • Eligible: Safe to approve with calculated EMI
  • High Risk: Requires manual review
  • Not Eligible: Strong decline indicators

Performance Metrics:

  • Accuracy: 92%+
  • Precision: 88%+
  • Recall: 90%+
  • F1-Score: 89%+

Key Hyperparameters:

{
    'max_depth': 6,
    'learning_rate': 0.1,
    'n_estimators': 100,
    'subsample': 0.8,
    'colsample_bytree': 0.8,
    'early_stopping_rounds': 10,
}

Training Data:

  • Dataset: 10,000+ loan applications
  • Training/Validation Split: 80/20
  • Class Balance: Handled via scale_pos_weight

2️⃣ EMI Regression Model

Algorithm: Random Forest Regressor

Purpose: Predict maximum safe monthly EMI based on applicant profile

Input Features: 23 preprocessed financial features

Output: Maximum safe EMI (in ₹, Indian Rupees)

Performance Metrics:

  • R² Score: 0.87+
  • RMSE: ₹2,500-3,000
  • MAE: ₹1,800-2,200

Key Hyperparameters:

{
    'n_estimators': 100,
    'max_depth': 15,
    'min_samples_split': 5,
    'min_samples_leaf': 2,
    'random_state': 42,
}

3️⃣ Preprocessing Pipeline

Feature Engineering:

  1. Numerical Features (17): Standard scaling, outlier handling
  2. Categorical Features (6): One-hot encoding, label encoding
  3. Domain Features:
    • Debt-to-Income Ratio (DTI) = (Current EMI + Expenses) / Monthly Salary
    • Loan-to-Value (LTV) = Requested Amount / Annual Salary
    • Credit Score Bins (Excellent/Good/Fair/Poor)

Handling Missing Values:

  • Numerical: Mean/Median imputation
  • Categorical: Mode imputation or 'Unknown' category

Outlier Treatment:

  • IQR-based detection and capping
  • No removal to preserve information

Getting Started

Prerequisites

  • Python: 3.11 or higher
  • pip: Latest version
  • Git: For cloning repository

Run Locally

1. Clone Repository

git clone https://github.com/devamsv/EMI-Predict_AI.git
cd EMI-Predict_AI

2. Create Virtual Environment

# Windows
python -m venv venv
venv\Scripts\activate

# macOS/Linux
python3 -m venv venv
source venv/bin/activate

3. Install Dependencies

pip install -r requirements.txt

4. Verify Installation

# Check Python version
python --version  # Should be 3.11+

# Check key packages
python -c "import streamlit, xgboost, sklearn; print('✅ All dependencies installed')"

5. Run Application

streamlit run app.py

Expected Output:

  You can now view your Streamlit app in your browser.

  Local URL: http://localhost:8501
  Network URL: http://<your-ip>:8501

6. Access Application

  • Open browser to http://localhost:8501
  • Navigate through pages using sidebar menu
  • Try the EMI Calculator with sample data

Deploy on Streamlit Cloud

Prerequisites

  • GitHub account with repository pushed
  • Streamlit Cloud account (free at share.streamlit.io)

Deployment Steps

Step 1: Prepare Repository

# Ensure all files are committed
git add .
git commit -m "Prepare for Streamlit Cloud deployment"
git push origin main

Step 2: Verify Required Files

✅ requirements.txt    (with pinned versions)
✅ runtime.txt         (python-3.11.7)
✅ .streamlit/config.toml
✅ models/*.pkl        (trained models)
✅ app.py              (entry point)

Step 3: Deploy on Streamlit Cloud

  1. Go to share.streamlit.io
  2. Click "New app"
  3. Select:
    • Repository: devamsv/EMI-Predict_AI
    • Branch: main
    • Main file path: app.py
  4. Click "Deploy"

Step 4: Monitor Deployment

  • 🟡 Status: "Installing dependencies"
  • 🟡 Status: "Building environment"
  • 🟢 Status: "App is running" (typically 2-5 minutes)

Step 5: Verify Deployment

  • ✅ App URL: https://share.streamlit.io/devamsv/emi-predict_ai/main/app.py
  • ✅ All pages load
  • ✅ EMI Calculator works
  • ✅ Model predictions return instantly

Configuration Files

requirements.txt (Python Dependencies)

streamlit>=1.28.0,<2.0.0
pandas>=2.0.0,<3.0.0
numpy>=1.24.0,<2.0.0
scikit-learn>=1.3.0,<2.0.0
xgboost>=2.0.0,<3.0.0
joblib>=1.3.0,<2.0.0
plotly>=5.14.0,<6.0.0
mlflow>=2.8.0,<3.0.0
python-dotenv>=1.0.0,<2.0.0
requests>=2.31.0,<3.0.0

runtime.txt (Python Version)

python-3.11.7

Troubleshooting Deployment

Issue Cause Solution
ModuleNotFoundError: xgboost Package not in requirements.txt Add to requirements.txt, commit, push
Build failed on Python 3.13 pyarrow incompatibility Use runtime.txt with Python 3.11.7
"Models failed to load" Missing init.py files Ensure utils/init.py exists
App stuck on "Loading..." Streamlit Cloud slow build Restart app from Manage App → Reboot App
Import errors Old cached dependencies Hard refresh: Ctrl+F5 (Cmd+Shift+R on Mac)

Project Structure Details

Configuration (utils/config.py)

# Model paths
CLASSIFIER_PATH = "models/emi_eligibility_model.pkl"
REGRESSOR_PATH = "models/max_emi_model.pkl"

# Business rules
HIGH_CREDIT_SCORE_THRESHOLD = 701
DTI_HIGH_RISK_THRESHOLD = 80

# Feature schema
TRAINING_FEATURES = [
    'age', 'gender', 'marital_status', 'education', 'monthly_salary',
    'employment_type', 'years_of_employment', 'company_type', 'house_type',
    'monthly_rent', 'family_size', 'dependents', 'school_fees',
    'college_fees', 'travel_expenses', 'groceries_utilities',
    'other_monthly_expenses', 'existing_loans', 'current_emi_amount',
    'credit_score', 'bank_balance', 'emergency_fund', 'emi_scenario',
    'requested_amount', 'requested_tenure'
]

Model Loading (utils/helpers.py)

def load_models():
    """
    Production-safe model loading with:
    - Dependency verification
    - File existence checks
    - Detailed error handling
    - Logging for debugging
    """
    # Implementation includes:
    # 1. Check dependencies (xgboost, sklearn)
    # 2. Verify model files exist
    # 3. Load models with joblib
    # 4. Return (classifier, regressor) or (None, None)

Production Readiness

✅ Error Handling

Graceful Degradation:

  • Missing dependencies → User-friendly error messages
  • Model loading failures → Diagnostic information provided
  • Prediction errors → Logged and user-notified
  • Missing files → Clear troubleshooting steps

Code Examples:

# Safe imports
try:
    from utils.helpers import load_models
except ImportError as e:
    st.error(f"Import error: {e}")
    st.stop()

# Safe predictions
try:
    eligibility, max_emi = make_prediction(classifier, regressor, input_data)
except Exception as e:
    logger.error(f"Prediction failed: {e}")
    st.warning("Unable to generate prediction. Please try again.")

✅ Dependency Management

Version Pinning Strategy:

# Critical: Exact versions for reproducibility
xgboost>=2.0.0,<3.0.0

# Flexible: Minor version updates allowed
pandas>=2.0.0,<3.0.0

# Rationale: Balance reproducibility with security updates

Compatibility Testing:

  • ✅ Python 3.11+ compatible
  • ✅ Streamlit 1.28+ tested
  • ✅ XGBoost 2.0+ compatible
  • ✅ No conflicts with dependency tree

✅ Environment Compatibility

Local Environment:

  • Windows, macOS, Linux supported
  • Virtual environment recommended
  • requirements.txt for consistency

Cloud Environment:

  • Streamlit Cloud: Automated deployment
  • Docker: Containerizable
  • AWS/GCP: Python 3.11+ compatible

Monitoring & Logging:

import logging

logger = logging.getLogger(__name__)
logger.info("Model loaded successfully")
logger.error(f"Prediction failed: {error}")
logger.warning("Slow model inference detected")

Future Improvements

🚀 Phase 2: Model Enhancements

  • Ensemble Methods: Stacking multiple models for better accuracy
  • Feature Importance: SHAP values for prediction explainability
  • Hyperparameter Optimization: Bayesian optimization for tuning
  • Class Balancing: SMOTE for imbalanced datasets
  • Time Series: Loan performance tracking over time

🚀 Phase 3: Infrastructure & MLOps

  • Model Versioning: Automated version control (DVC/MLflow)
  • CI/CD Pipeline: GitHub Actions for automated testing
  • Model Monitoring: Drift detection and performance tracking
  • A/B Testing: Compare different model versions in production
  • Containerization: Docker images for consistent deployment

🚀 Phase 4: API & Integration

  • REST API: FastAPI/Flask for programmatic access
  • Batch Prediction: Process multiple applications simultaneously
  • Webhook Integration: Real-time bank system integration
  • Authentication: OAuth 2.0 for secure access
  • Rate Limiting: API usage controls

🚀 Phase 5: Advanced Analytics

  • Dashboard: Real-time metrics on Grafana/Tableau
  • Audit Trail: Complete logging of all predictions
  • Anomaly Detection: Identify unusual application patterns
  • Performance Analysis: ROI and business impact metrics
  • Feedback Loop: Ground truth labels for model retraining

Contributing

Contributions are welcome! This project is suitable for:

  • Adding new features (mortgage calculator, investment recommendation, etc.)
  • Improving model accuracy
  • Enhancing UI/UX
  • Documentation improvements
  • Deployment optimization

To Contribute:

# 1. Fork the repository
# 2. Create feature branch
git checkout -b feature/your-feature-name

# 3. Make changes and commit
git add .
git commit -m "feat: Add your feature"

# 4. Push to your fork
git push origin feature/your-feature-name

# 5. Open Pull Request on main repository

License

This project is licensed under the MIT License - see LICENSE file for details.


Author & Contact

Developed by: Devam SV

Project Links:


Acknowledgments

  • Streamlit for the amazing framework
  • XGBoost & Scikit-learn teams for ML libraries
  • MLflow for experiment tracking
  • Plotly for interactive visualizations
  • Open-source community for invaluable tools

Last Updated: January 17, 2026 Status: Production Ready ✅

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published