Intelligent EMI Calculator

An ML-powered FinTech application for automated loan eligibility assessment and safe EMI prediction

📋 Table of Contents

Overview
Business Problem
Solution Approach
Key Features
Tech Stack
Project Architecture
Model Details
Getting Started
- Run Locally
- Deploy on Streamlit Cloud
Project Structure
Production Readiness
Future Improvements
Contributing
License

Overview

Intelligent EMI Calculator is a production-grade machine learning application that combines rule-based business logic with predictive modeling to assess loan eligibility and recommend a safe EMI (Equated Monthly Installment).

The system uses a two-stage decision pipeline:

Classification Stage: XGBoost classifier determines loan eligibility based on applicant profiles
Regression Stage: Random Forest regressor predicts the maximum safe EMI amount

This hybrid approach ensures both business rule compliance and data-driven personalization, making it suitable for financial risk assessment in FinTech applications.

Business Problem

Challenge

Traditional EMI calculators provide generic calculations without considering individual financial risk profiles. Banks and fintech companies need:

Automated eligibility screening to reduce manual underwriting costs
Risk-based EMI ceilings that prevent over-leveraging
Transparent decision logic explainable to borrowers
Scalable assessment for high-volume applications

Impact

This solution enables financial institutions to:

✅ Screen loan applications 10x faster
✅ Reduce default risk through intelligent EMI caps
✅ Provide instant borrower feedback
✅ Scale operations without hiring analysts

Solution Approach

Architecture

┌─────────────────────────────────────────────────────────────┐
│         User Input (23 Financial Features)                   │
└────────────────────┬────────────────────────────────────────┘
                     │
                     ▼
        ┌────────────────────────────┐
        │  Feature Preprocessing     │
        │  - Normalization           │
        │  - Categorical Encoding    │
        │  - Missing Value Handling  │
        └────────────────┬───────────┘
                         │
            ┌────────────┴────────────┐
            │                         │
            ▼                         ▼
    ┌──────────────────┐     ┌──────────────────┐
    │  Classification  │     │   Regression     │
    │  (XGBoost)       │     │  (Random Forest) │
    │                  │     │                  │
    │  Output:         │     │  Output:         │
    │  Eligible?       │     │  Max Safe EMI    │
    └────────┬─────────┘     └────────┬─────────┘
             │                        │
             └────────────┬───────────┘
                          │
                          ▼
         ┌─────────────────────────────────┐
         │  Business Rules Application     │
         │  - Credit score adjustment      │
         │  - DTI (Debt-to-Income) limits  │
         │  - Safety margins               │
         └────────────────┬────────────────┘
                          │
                          ▼
         ┌─────────────────────────────────┐
         │     Final Recommendation        │
         │  - Eligibility Status           │
         │  - Safe EMI Amount              │
         │  - Risk Tier Classification     │
         └─────────────────────────────────┘

Decision Logic

Safe EMI Calculation: Safe EMI = Min(Model Prediction, DTI-based Ceiling)
Eligibility Rules:
- Credit Score ≥ 701 (Excellent)
- DTI Ratio < 80% (Manageable)
- Minimum income threshold: ₹25,000/month
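
The rules above can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual code: the `Applicant` class and the function names are hypothetical, and the README does not state how the DTI-based ceiling is derived, so the sketch assumes it is the EMI that would bring total obligations up to the 80% DTI limit.

```python
from dataclasses import dataclass

# Thresholds quoted in the eligibility rules above.
MIN_CREDIT_SCORE = 701
MAX_DTI = 0.80
MIN_MONTHLY_INCOME = 25_000  # rupees per month


@dataclass
class Applicant:
    """Hypothetical container for the fields the rules need."""
    monthly_salary: float
    credit_score: int
    monthly_obligations: float  # existing EMIs plus recurring expenses


def passes_rules(a: Applicant) -> bool:
    """Apply the three eligibility rules listed above."""
    if a.monthly_salary < MIN_MONTHLY_INCOME:
        return False
    dti = a.monthly_obligations / a.monthly_salary
    return a.credit_score >= MIN_CREDIT_SCORE and dti < MAX_DTI


def dti_ceiling(a: Applicant) -> float:
    """Assumed ceiling: the EMI that would raise total DTI to MAX_DTI."""
    return max(0.0, MAX_DTI * a.monthly_salary - a.monthly_obligations)


def safe_emi(model_prediction: float, a: Applicant) -> float:
    """Safe EMI = min(model prediction, DTI-based ceiling)."""
    return min(model_prediction, dti_ceiling(a))
```

Taking the minimum means the regressor can only lower the recommended EMI below the rule-based ceiling, never raise it above.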

Key Features

🏠 Home Dashboard

Project overview and problem statement
Key metrics and ROI statistics
Model architecture visualization
Quick start guide

🧮 EMI Calculator (Core Feature)

Intelligent Two-Stage Pipeline:

Stage 1: XGBoost classifier predicts eligibility (Eligible/High Risk/Not Eligible)
Stage 2: Random Forest regressor calculates maximum safe EMI
Hybrid Logic: Combines ML predictions with business rules (DTI ratios, credit scoring)
Real-time Processing: Instant predictions (<500ms)
Transparent Explainability: Shows decision factors affecting the result
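
One way the two stages could be chained is sketched below. Everything here is an assumption made for illustration: the `run_pipeline` name, the `LABELS` index-to-class mapping, the choice to return an EMI of zero for declined applicants, and the `ConstantModel` stub that stands in for the trained models. The only thing taken from this README is the order of operations: classify, predict an EMI, then cap it with a business-rule ceiling.

```python
from typing import Protocol, Sequence

# Assumed mapping from classifier output index to label; the real encoding
# depends on how labels were encoded during training.
LABELS = ("Not Eligible", "High Risk", "Eligible")


class Predictor(Protocol):
    """Anything with a scikit-learn style predict() method."""
    def predict(self, rows: Sequence[Sequence[float]]) -> Sequence[float]: ...


def run_pipeline(classifier: Predictor, regressor: Predictor,
                 row: Sequence[float], dti_ceiling: float) -> dict:
    """Stage 1 classifies, Stage 2 predicts an EMI, then the rule cap applies."""
    label = LABELS[int(classifier.predict([row])[0])]
    if label == "Not Eligible":
        # Assumption: declined applicants get no EMI recommendation.
        return {"eligibility": label, "safe_emi": 0.0}
    predicted_emi = float(regressor.predict([row])[0])
    return {"eligibility": label, "safe_emi": min(predicted_emi, dti_ceiling)}


class ConstantModel:
    """Stub that always predicts the same value, used only for the demo."""
    def __init__(self, value: float):
        self.value = value

    def predict(self, rows):
        return [self.value for _ in rows]


result = run_pipeline(ConstantModel(2), ConstantModel(12_000.0), [0.0], 10_000.0)
print(result)
```

Because the function depends only on a `predict` method, the same code path works with the stub and with models loaded from disk.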

Input Parameters (23 Features):

Personal: Age, Gender, Marital Status, Education
Employment: Job type, Experience, Company type
Financial: Monthly salary, Existing loans, Credit score, Bank balance
Loan Request: Amount, Tenure, Purpose, Monthly obligations

📊 Model Performance Dashboard

Real-time Metrics: Accuracy, Precision, Recall, F1-Score
MLflow Integration: Experiment tracking and model versioning
Interactive Visualizations:
- Confusion matrices
- Feature importance rankings
- ROC/AUC curves
- Prediction distribution plots
Model Comparison: Side-by-side performance metrics (XGBoost vs Baseline)

📈 Data Insights

Dataset Exploration: Interactive analysis of 10,000+ loan records
Demographic Breakdown: Age, salary, and approval rate distributions
Trend Analysis: Eligibility patterns across employment types and credit scores
Statistical Summary: Mean, median, and distribution metrics

🤖 AI Assistant

FAQ automation for common financial questions
Application guide and eligibility tips
Troubleshooting support
Direct integration with model outputs

Tech Stack

Layer	Technology	Version	Purpose
Frontend	Streamlit	1.28+	Interactive web UI
Data Processing	Pandas	2.0+	Data manipulation
Numerical Computation	NumPy	1.24+	Array operations
ML - Classification	XGBoost	2.0+	Eligibility prediction
ML - Regression	Scikit-learn	1.3+	EMI ceiling prediction
Model Serialization	Joblib	1.3+	Model persistence
Visualization	Plotly	5.14+	Interactive charts
Experiment Tracking	MLflow	2.8+	Model versioning & metrics
Python Runtime	Python	3.11+	Application runtime

Project Architecture

Directory Structure

EMI-Predict_AI/
│
├── app.py                          # Streamlit app entry point
├── requirements.txt                # Python dependencies
├── runtime.txt                     # Python version for Streamlit Cloud
│
├── pages/                          # Streamlit multi-page app
│   ├── __init__.py
│   ├── 1_Home.py                   # Landing page & overview
│   ├── 2_EMI_Calculator.py         # Main prediction interface
│   ├── 3_Model_Performance.py      # Metrics & MLflow integration
│   ├── 4_Data_Insights.py          # Dataset exploration
│   └── 5_AI_Assistant.py           # Q&A interface
│
├── utils/                          # Reusable utilities
│   ├── __init__.py
│   ├── config.py                   # Configuration & paths
│   ├── helpers.py                  # Model loading, predictions
│   └── styles.py                   # Streamlit theming
│
├── models/                         # Trained ML models
│   ├── emi_eligibility_model.pkl   # XGBoost classifier
│   └── max_emi_model.pkl           # Random Forest regressor
│
├── data/
│   ├── raw/                        # Original dataset
│   │   └── emi_prediction_dataset.csv
│   └── processed/                  # Cleaned, feature-engineered data
│       └── emi_prediction_dataset_cleaned.csv
│
├── notebooks/                      # Jupyter notebooks
│   ├── 01_data_handling.ipynb      # EDA & preprocessing
│   ├── 02_eda.ipynb                # Statistical analysis
│   └── 03_model_training_mlflow.ipynb # Model training
│
├── .streamlit/
│   └── config.toml                 # Streamlit configuration
│
├── .gitignore                      # Git exclusions
├── LICENSE                         # MIT License
└── README.md                       # This file

Data Flow

Raw Data (emi_prediction_dataset.csv)
    ↓
Cleaning & EDA (01_data_handling.ipynb)
    ↓
Feature Engineering (02_eda.ipynb)
    ↓
Model Training (03_model_training_mlflow.ipynb)
    ├→ XGBoost Classifier (Eligibility)
    └→ Random Forest Regressor (Max EMI)
    ↓
Model Serialization (Joblib)
    ↓
Deployment (models/)
    ↓
Streamlit Application
    ├→ Live Predictions
    ├→ Performance Tracking
    └→ Data Visualization

Model Details

1️⃣ Eligibility Classification Model

Algorithm: XGBoost Classifier

Purpose: Multi-class classification (three output classes) to determine loan eligibility

Input Features: 23 preprocessed financial features

Output Classes:

Eligible: Safe to approve with calculated EMI
High Risk: Requires manual review
Not Eligible: Strong decline indicators

Performance Metrics:

Accuracy: 92%+
Precision: 88%+
Recall: 90%+
F1-Score: 89%+
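
For readers who want to see how these four metrics relate, the sketch below computes them from label lists in plain Python. It treats one class as "positive" and everything else as "negative" (one-vs-rest); the README does not say which averaging scheme produced the figures above, so this is illustrative only. The function name and the sample labels are hypothetical.

```python
def one_vs_rest_metrics(y_true, y_pred, positive):
    """Accuracy over all classes; precision, recall, and F1 for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)

    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up labels, purely to exercise the function.
y_true = ["Eligible", "Eligible", "High Risk", "Not Eligible", "Eligible", "Not Eligible"]
y_pred = ["Eligible", "High Risk", "High Risk", "Not Eligible", "Eligible", "Eligible"]
metrics = one_vs_rest_metrics(y_true, y_pred, positive="Eligible")
```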

Key Hyperparameters:

{
    'max_depth': 6,
    'learning_rate': 0.1,
    'n_estimators': 100,
    'subsample': 0.8,
    'colsample_bytree': 0.8,
    'early_stopping_rounds': 10,
}

Training Data:

Dataset: 10,000+ loan applications
Training/Validation Split: 80/20
Class Balance: Handled via scale_pos_weight

2️⃣ EMI Regression Model

Algorithm: Random Forest Regressor

Purpose: Predict maximum safe monthly EMI based on applicant profile

Input Features: 23 preprocessed financial features

Output: Maximum safe EMI (in ₹, Indian Rupees)

Performance Metrics:

R² Score: 0.87+
RMSE: ₹2,500-3,000
MAE: ₹1,800-2,200
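
The three regression metrics above follow their standard definitions, which the sketch below implements in plain Python. The function name and the sample values are hypothetical and are not taken from the project's evaluation data.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R^2, RMSE, and MAE for paired actual and predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)          # residual sum of squares
    rmse = math.sqrt(ss_res / n)

    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1 - ss_res / ss_tot
    return {"r2": r2, "rmse": rmse, "mae": mae}


# Made-up EMI values in rupees, purely to exercise the function.
scores = regression_metrics([10_000, 20_000, 30_000], [12_000, 19_000, 29_000])
```

RMSE penalizes large misses more heavily than MAE, which is why it is reported as the larger of the two figures.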

Key Hyperparameters:

{
    'n_estimators': 100,
    'max_depth': 15,
    'min_samples_split': 5,
    'min_samples_leaf': 2,
    'random_state': 42,
}

3️⃣ Preprocessing Pipeline

Feature Engineering:

Numerical Features (17): Standard scaling, outlier handling
Categorical Features (6): One-hot encoding, label encoding
Domain Features:
- Debt-to-Income Ratio (DTI) = (Current EMI + Expenses) / Monthly Salary
- Loan-to-Value (LTV) = Requested Amount / Annual Salary
- Credit Score Bins (Excellent/Good/Fair/Poor)
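
The domain features above can be written as small helper functions. This is a sketch under stated assumptions: annual salary is taken as 12 times the monthly salary, and only the 701 cutoff for "Excellent" appears in this README, so the remaining bin edges below are placeholders rather than the project's real thresholds. All function names are hypothetical.

```python
def debt_to_income(current_emi: float, monthly_expenses: float,
                   monthly_salary: float) -> float:
    """DTI = (Current EMI + Expenses) / Monthly Salary."""
    return (current_emi + monthly_expenses) / monthly_salary


def loan_to_value(requested_amount: float, monthly_salary: float) -> float:
    """LTV as defined above: Requested Amount / Annual Salary."""
    annual_salary = 12 * monthly_salary  # assumption: 12 equal monthly salaries
    return requested_amount / annual_salary


def credit_score_bin(score: int) -> str:
    """Bucket a credit score; only the 701 edge comes from this README."""
    if score >= 701:
        return "Excellent"
    if score >= 651:   # placeholder edge
        return "Good"
    if score >= 551:   # placeholder edge
        return "Fair"
    return "Poor"
```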

Handling Missing Values:

Numerical: Mean/Median imputation
Categorical: Mode imputation or 'Unknown' category
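
A minimal sketch of both imputation strategies, using only the standard library and `None` as the missing-value marker. The function names are hypothetical; a real pipeline would more likely use pandas or scikit-learn for this step.

```python
from statistics import median, mode


def impute_numeric(values):
    """Replace None with the median of the observed values."""
    fill = median([v for v in values if v is not None])
    return [fill if v is None else v for v in values]


def impute_categorical(values, use_unknown=False):
    """Replace None with the most common category, or with 'Unknown'."""
    if use_unknown:
        fill = "Unknown"
    else:
        fill = mode([v for v in values if v is not None])
    return [fill if v is None else v for v in values]


salaries = impute_numeric([10, None, 30, 50])
jobs = impute_categorical(["Salaried", None, "Salaried", "Self-employed"])
```

The median is less sensitive to extreme salaries than the mean, which may be why both options are listed.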

Outlier Treatment:

IQR-based detection and capping
No removal to preserve information
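
IQR-based capping can be sketched as follows. The 1.5 multiplier is the conventional default and is an assumption here, as is the `cap_outliers` name; the README does not state the multiplier the project uses.

```python
from statistics import quantiles


def cap_outliers(values, k=1.5):
    """Clip values to [Q1 - k*IQR, Q3 + k*IQR] without dropping any rows."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, lower), upper) for v in values]


capped = cap_outliers([10, 12, 14, 16, 18, 100])
```

Clipping keeps the list the same length, so every applicant record survives and only the extreme value is pulled back to the fence.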

Getting Started

Prerequisites

Python: 3.11 or higher
pip: Latest version
Git: For cloning repository

Run Locally

1. Clone Repository

git clone https://github.com/devamsv/EMI-Predict_AI.git
cd EMI-Predict_AI

2. Create Virtual Environment

# Windows
python -m venv venv
venv\Scripts\activate

# macOS/Linux
python3 -m venv venv
source venv/bin/activate

3. Install Dependencies

pip install -r requirements.txt

4. Verify Installation

# Check Python version
python --version  # Should be 3.11+

# Check key packages
python -c "import streamlit, xgboost, sklearn; print('✅ All dependencies installed')"

5. Run Application

streamlit run app.py

Expected Output:

  You can now view your Streamlit app in your browser.

  Local URL: http://localhost:8501
  Network URL: http://<your-ip>:8501

6. Access Application

Open browser to http://localhost:8501
Navigate through pages using sidebar menu
Try the EMI Calculator with sample data

Deploy on Streamlit Cloud

Prerequisites

GitHub account with repository pushed
Streamlit Cloud account (free at share.streamlit.io)

Deployment Steps

Step 1: Prepare Repository

# Ensure all files are committed
git add .
git commit -m "Prepare for Streamlit Cloud deployment"
git push origin main

Step 2: Verify Required Files

✅ requirements.txt    (with bounded version ranges)
✅ runtime.txt         (python-3.11.7)
✅ .streamlit/config.toml
✅ models/*.pkl        (trained models)
✅ app.py              (entry point)

Step 3: Deploy on Streamlit Cloud

Go to share.streamlit.io
Click "New app"
Select:
- Repository: devamsv/EMI-Predict_AI
- Branch: main
- Main file path: app.py
Click "Deploy"

Step 4: Monitor Deployment

🟡 Status: "Installing dependencies"
🟡 Status: "Building environment"
🟢 Status: "App is running" (typically 2-5 minutes)

Step 5: Verify Deployment

✅ App URL: https://share.streamlit.io/devamsv/emi-predict_ai/main/app.py
✅ All pages load
✅ EMI Calculator works
✅ Model predictions return instantly

Configuration Files

requirements.txt (Python Dependencies)

streamlit>=1.28.0,<2.0.0
pandas>=2.0.0,<3.0.0
numpy>=1.24.0,<2.0.0
scikit-learn>=1.3.0,<2.0.0
xgboost>=2.0.0,<3.0.0
joblib>=1.3.0,<2.0.0
plotly>=5.14.0,<6.0.0
mlflow>=2.8.0,<3.0.0
python-dotenv>=1.0.0,<2.0.0
requests>=2.31.0,<3.0.0

runtime.txt (Python Version)

python-3.11.7

Troubleshooting Deployment

Issue	Cause	Solution
`ModuleNotFoundError: xgboost`	Package not in requirements.txt	Add to requirements.txt, commit, push
`Build failed on Python 3.13`	pyarrow incompatibility	Use runtime.txt with Python 3.11.7
"Models failed to load"	Missing __init__.py files	Ensure utils/__init__.py exists
App stuck on "Loading..."	Streamlit Cloud slow build	Restart app from Manage App → Reboot App
Import errors	Old cached dependencies	Hard refresh: Ctrl+F5 (Cmd+Shift+R on Mac)

Project Structure Details

Configuration (utils/config.py)

# Model paths
CLASSIFIER_PATH = "models/emi_eligibility_model.pkl"
REGRESSOR_PATH = "models/max_emi_model.pkl"

# Business rules
HIGH_CREDIT_SCORE_THRESHOLD = 701
DTI_HIGH_RISK_THRESHOLD = 80

# Feature schema
TRAINING_FEATURES = [
    'age', 'gender', 'marital_status', 'education', 'monthly_salary',
    'employment_type', 'years_of_employment', 'company_type', 'house_type',
    'monthly_rent', 'family_size', 'dependents', 'school_fees',
    'college_fees', 'travel_expenses', 'groceries_utilities',
    'other_monthly_expenses', 'existing_loans', 'current_emi_amount',
    'credit_score', 'bank_balance', 'emergency_fund', 'emi_scenario',
    'requested_amount', 'requested_tenure'
]
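
A feature schema like this is typically used to check user input and fix the column order before prediction. The sketch below shows one way to do that; the `to_feature_row` helper is hypothetical and is not confirmed to exist in `utils/helpers.py`.

```python
# Copied from utils/config.py as shown above.
TRAINING_FEATURES = [
    'age', 'gender', 'marital_status', 'education', 'monthly_salary',
    'employment_type', 'years_of_employment', 'company_type', 'house_type',
    'monthly_rent', 'family_size', 'dependents', 'school_fees',
    'college_fees', 'travel_expenses', 'groceries_utilities',
    'other_monthly_expenses', 'existing_loans', 'current_emi_amount',
    'credit_score', 'bank_balance', 'emergency_fund', 'emi_scenario',
    'requested_amount', 'requested_tenure'
]


def to_feature_row(payload: dict) -> list:
    """Return payload values in training order, or raise if any are missing."""
    missing = [name for name in TRAINING_FEATURES if name not in payload]
    if missing:
        raise ValueError(f"Missing features: {missing}")
    # Extra keys in the payload are ignored; order follows TRAINING_FEATURES.
    return [payload[name] for name in TRAINING_FEATURES]
```

Ordering by the schema rather than by the input dictionary guards against a model silently receiving columns in the wrong positions.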

Model Loading (utils/helpers.py)

def load_models():
    """
    Production-safe model loading with:
    - Dependency verification
    - File existence checks
    - Detailed error handling
    - Logging for debugging
    """
    # Implementation includes:
    # 1. Check dependencies (xgboost, sklearn)
    # 2. Verify model files exist
    # 3. Load models with joblib
    # 4. Return (classifier, regressor) or (None, None)
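
The four steps in the comments above could be implemented roughly as follows. This is a sketch, not the project's actual `load_models`: the optional path parameters and the `REQUIRED_PACKAGES` tuple are assumptions added for illustration, and the paths are copied from the configuration shown earlier.

```python
import importlib.util
import logging
from pathlib import Path

logger = logging.getLogger(__name__)

CLASSIFIER_PATH = "models/emi_eligibility_model.pkl"
REGRESSOR_PATH = "models/max_emi_model.pkl"
REQUIRED_PACKAGES = ("joblib", "xgboost", "sklearn")


def load_models(classifier_path=CLASSIFIER_PATH, regressor_path=REGRESSOR_PATH):
    """Return (classifier, regressor), or (None, None) if anything is missing."""
    # 1. Check dependencies without importing them yet.
    missing = [p for p in REQUIRED_PACKAGES
               if importlib.util.find_spec(p) is None]
    if missing:
        logger.error("Missing packages: %s", ", ".join(missing))
        return None, None

    # 2. Verify that both model files exist.
    for path in (classifier_path, regressor_path):
        if not Path(path).is_file():
            logger.error("Model file not found: %s", path)
            return None, None

    # 3. Load models with joblib (imported late so step 1 can report cleanly).
    import joblib
    try:
        # 4. Return the loaded pair on success.
        return joblib.load(classifier_path), joblib.load(regressor_path)
    except Exception:
        logger.exception("Failed to load models")
        return None, None
```

Returning `(None, None)` instead of raising lets the Streamlit page show a friendly message rather than a stack trace.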

Production Readiness

✅ Error Handling

Graceful Degradation:

Missing dependencies → User-friendly error messages
Model loading failures → Diagnostic information provided
Prediction errors → Logged and user-notified
Missing files → Clear troubleshooting steps

Code Examples:

import logging

import streamlit as st

logger = logging.getLogger(__name__)

# Safe imports
try:
    from utils.helpers import load_models
except ImportError as e:
    st.error(f"Import error: {e}")
    st.stop()

# Safe predictions
# (make_prediction, classifier, regressor, and input_data are defined elsewhere in the app)
try:
    eligibility, max_emi = make_prediction(classifier, regressor, input_data)
except Exception as e:
    logger.error(f"Prediction failed: {e}")
    st.warning("Unable to generate prediction. Please try again.")

✅ Dependency Management

Version Pinning Strategy:

# Critical: model files depend on this library, so stay within one major version
xgboost>=2.0.0,<3.0.0

# Flexible: minor and patch updates allowed within the major version
pandas>=2.0.0,<3.0.0

# Rationale: bounded ranges (not exact pins) balance reproducibility with security updates

Compatibility Testing:

✅ Python 3.11+ compatible
✅ Streamlit 1.28+ tested
✅ XGBoost 2.0+ compatible
✅ No conflicts with dependency tree

✅ Environment Compatibility

Local Environment:

Windows, macOS, Linux supported
Virtual environment recommended
requirements.txt for consistency

Cloud Environment:

Streamlit Cloud: Automated deployment
Docker: Containerizable
AWS/GCP: Python 3.11+ compatible

Monitoring & Logging:

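import logging

logging.basicConfig(level=logging.INFO)  # show INFO and above on stderr
logger = logging.getLogger(__name__)

logger.info("Model loaded successfully")
logger.warning("Slow model inference detected")

error = RuntimeError("example failure")  # placeholder for a caught exception
logger.error(f"Prediction failed: {error}")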

Future Improvements

🚀 Phase 2: Model Enhancements

Ensemble Methods: Stacking multiple models for better accuracy
Feature Importance: SHAP values for prediction explainability
Hyperparameter Optimization: Bayesian optimization for tuning
Class Balancing: SMOTE for imbalanced datasets
Time Series: Loan performance tracking over time

🚀 Phase 3: Infrastructure & MLOps

Model Versioning: Automated version control (DVC/MLflow)
CI/CD Pipeline: GitHub Actions for automated testing
Model Monitoring: Drift detection and performance tracking
A/B Testing: Compare different model versions in production
Containerization: Docker images for consistent deployment

🚀 Phase 4: API & Integration

REST API: FastAPI/Flask for programmatic access
Batch Prediction: Process multiple applications simultaneously
Webhook Integration: Real-time bank system integration
Authentication: OAuth 2.0 for secure access
Rate Limiting: API usage controls

🚀 Phase 5: Advanced Analytics

Dashboard: Real-time metrics on Grafana/Tableau
Audit Trail: Complete logging of all predictions
Anomaly Detection: Identify unusual application patterns
Performance Analysis: ROI and business impact metrics
Feedback Loop: Ground truth labels for model retraining

Contributing

Contributions are welcome! Good areas to contribute include:

Adding new features (mortgage calculator, investment recommendation, etc.)
Improving model accuracy
Enhancing UI/UX
Documentation improvements
Deployment optimization

To Contribute:

# 1. Fork the repository
# 2. Create feature branch
git checkout -b feature/your-feature-name

# 3. Make changes and commit
git add .
git commit -m "feat: Add your feature"

# 4. Push to your fork
git push origin feature/your-feature-name

# 5. Open Pull Request on main repository

License

This project is licensed under the MIT License - see LICENSE file for details.

Author & Contact

Developed by: Devam SV

Project Links:

🔗 GitHub: devamsv/EMI-Predict_AI
🌐 Live App: Streamlit Cloud Deployment
📧 Questions: Open an issue on GitHub

Acknowledgments

Streamlit for the amazing framework
XGBoost & Scikit-learn teams for ML libraries
MLflow for experiment tracking
Plotly for interactive visualizations
Open-source community for invaluable tools

Last Updated: January 17, 2026
Status: Production Ready ✅
