An ML-powered FinTech application for automated loan eligibility assessment and safe EMI prediction
- Overview
- Business Problem
- Solution Approach
- Key Features
- Tech Stack
- Project Architecture
- Model Details
- Getting Started
- Project Structure
- Production Readiness
- Future Improvements
- Contributing
- License
Intelligent EMI Calculator is a production-grade machine learning application that combines hybrid rule-based logic with predictive modeling to provide accurate loan eligibility assessment and safe EMI (Equated Monthly Installment) recommendations.
The system uses a two-stage decision pipeline:
- Classification Stage: XGBoost classifier determines loan eligibility based on applicant profiles
- Regression Stage: Random Forest regressor predicts the maximum safe EMI amount
This hybrid approach ensures both business rule compliance and data-driven personalization, making it suitable for financial risk assessment in FinTech applications.
Traditional EMI calculators provide generic calculations without considering individual financial risk profiles. Banks and fintech companies need:
- Automated eligibility screening to reduce manual underwriting costs
- Risk-based EMI ceilings that prevent over-leveraging
- Transparent decision logic explainable to borrowers
- Scalable assessment for high-volume applications
This solution enables financial institutions to:
- ✅ Screen loan applications 10x faster
- ✅ Reduce default risk through intelligent EMI caps
- ✅ Provide instant borrower feedback
- ✅ Scale operations without hiring analysts
┌─────────────────────────────────────────────────────────────┐
│             User Input (23 Financial Features)              │
└──────────────────────────────┬──────────────────────────────┘
                               │
                               ▼
                ┌────────────────────────────┐
                │ Feature Preprocessing      │
                │ - Normalization            │
                │ - Categorical Encoding     │
                │ - Missing Value Handling   │
                └──────────────┬─────────────┘
                               │
                  ┌────────────┴────────────┐
                  │                         │
                  ▼                         ▼
         ┌──────────────────┐      ┌──────────────────┐
         │  Classification  │      │    Regression    │
         │    (XGBoost)     │      │ (Random Forest)  │
         │                  │      │                  │
         │  Output:         │      │  Output:         │
         │  Eligible?       │      │  Max Safe EMI    │
         └────────┬─────────┘      └────────┬─────────┘
                  │                         │
                  └────────────┬────────────┘
                               │
                               ▼
              ┌─────────────────────────────────┐
              │ Business Rules Application      │
              │ - Credit score adjustment       │
              │ - DTI (Debt-to-Income) limits   │
              │ - Safety margins                │
              └────────────────┬────────────────┘
                               │
                               ▼
              ┌─────────────────────────────────┐
              │ Final Recommendation            │
              │ - Eligibility Status            │
              │ - Safe EMI Amount               │
              │ - Risk Tier Classification      │
              └─────────────────────────────────┘
- Safe EMI Calculation:
  Safe EMI = Min(Model Prediction, DTI-based Ceiling)
- Eligibility Rules:
- Credit Score ≥ 701 (Excellent)
- DTI Ratio < 80% (Manageable)
- Minimum income threshold: ₹25,000/month
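The rules above can be sketched in a few lines of Python. This is a minimal illustration rather than the project's code: the function names are hypothetical, and the DTI-based ceiling formula (80% of salary minus existing obligations) is an assumption, since the document does not define how the ceiling is derived.

```python
# Hypothetical sketch of the safe-EMI rule. Thresholds come from the rules
# listed above; the function names and the ceiling formula are assumptions.
CREDIT_SCORE_MIN = 701        # "Credit Score >= 701"
DTI_LIMIT = 0.80              # "DTI Ratio < 80%"
MIN_MONTHLY_INCOME = 25_000   # rupees per month


def dti_ceiling(monthly_salary, existing_obligations):
    """Largest new EMI that keeps total obligations under the DTI limit (assumed formula)."""
    return max(0.0, DTI_LIMIT * monthly_salary - existing_obligations)


def safe_emi(model_prediction, monthly_salary, existing_obligations):
    """Safe EMI = min(model prediction, DTI-based ceiling)."""
    return min(model_prediction, dti_ceiling(monthly_salary, existing_obligations))


def passes_rules(credit_score, monthly_salary, existing_obligations):
    """Apply the three eligibility rules; a non-positive salary never passes."""
    if monthly_salary <= 0:
        return False
    dti = existing_obligations / monthly_salary
    return (
        credit_score >= CREDIT_SCORE_MIN
        and dti < DTI_LIMIT
        and monthly_salary >= MIN_MONTHLY_INCOME
    )
```

With a salary of ₹50,000 and ₹20,000 in existing obligations, the assumed ceiling is ₹20,000, so a model prediction of ₹30,000 would be capped at ₹20,000.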
- Project overview and problem statement
- Key metrics and ROI statistics
- Model architecture visualization
- Quick start guide
Intelligent Two-Stage Pipeline:
- Stage 1: XGBoost classifier predicts eligibility (Eligible/High Risk/Not Eligible)
- Stage 2: Regression model calculates maximum safe EMI
- Hybrid Logic: Combines ML predictions with business rules (DTI ratios, credit scoring)
- Real-time Processing: Instant predictions (<500ms)
- Transparent Explainability: Shows decision factors affecting the result
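The ordering of the two stages can be illustrated with stand-in models. Everything here is a hedged sketch: the label encoding, the stub classes, the `recommend` function, and the choice to return an EMI of zero for "Not Eligible" applicants are assumptions, not behaviour confirmed by the project.

```python
from dataclasses import dataclass

# Assumed integer encoding of the three eligibility classes.
LABELS = {0: "Not Eligible", 1: "High Risk", 2: "Eligible"}


@dataclass
class Recommendation:
    eligibility: str
    safe_emi: float


class StubClassifier:
    """Stand-in for the trained classifier; any object with .predict works."""

    def predict(self, rows):
        return [2 if row["credit_score"] >= 701 else 0 for row in rows]


class StubRegressor:
    """Stand-in for the trained regressor."""

    def predict(self, rows):
        return [0.3 * row["monthly_salary"] for row in rows]


def recommend(classifier, regressor, applicant, emi_ceiling):
    """Stage 1 decides eligibility; stage 2 runs only if the applicant is not declined."""
    label = LABELS[int(classifier.predict([applicant])[0])]
    if label == "Not Eligible":
        return Recommendation(label, 0.0)  # assumed handling of declines
    predicted = float(regressor.predict([applicant])[0])
    # Business rule: never recommend more than the supplied ceiling.
    return Recommendation(label, min(predicted, emi_ceiling))


applicant = {"credit_score": 750, "monthly_salary": 60000}
result = recommend(StubClassifier(), StubRegressor(), applicant, emi_ceiling=15000)
```

Passing the ceiling in as an argument keeps the business rules separate from the models, so either side can change without touching the other.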
Input Parameters (23 Features):
- Personal: Age, Gender, Marital Status, Education
- Employment: Job type, Experience, Company type
- Financial: Monthly salary, Existing loans, Credit score, Bank balance
- Loan Request: Amount, Tenure, Purpose, Monthly obligations
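Before inputs like these reach a model, they usually need a basic sanity check. The sketch below validates a handful of numeric fields; the field names are taken from the feature schema in this README, while `REQUIRED_NUMERIC_FIELDS`, `validate_applicant`, and the specific checks are hypothetical.

```python
# Hypothetical input validation. Only a subset of the schema is checked here.
REQUIRED_NUMERIC_FIELDS = (
    "age",
    "monthly_salary",
    "credit_score",
    "requested_amount",
    "requested_tenure",
)


def validate_applicant(applicant):
    """Return a list of human-readable problems; an empty list means the input looks usable."""
    errors = []
    for name in REQUIRED_NUMERIC_FIELDS:
        if name not in applicant:
            errors.append(f"missing field: {name}")
            continue
        value = applicant[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            errors.append(f"not a number: {name}")
        elif value < 0:
            errors.append(f"negative value: {name}")
    return errors


sample = {
    "age": 30,
    "monthly_salary": 50000,
    "credit_score": 720,
    "requested_amount": 300000,
    "requested_tenure": 24,
}
problems = validate_applicant(sample)
```

Returning a list of messages instead of raising on the first problem lets a form show every issue at once.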
- Real-time Metrics: Accuracy, Precision, Recall, F1-Score
- MLflow Integration: Experiment tracking and model versioning
- Interactive Visualizations:
- Confusion matrices
- Feature importance rankings
- ROC/AUC curves
- Prediction distribution plots
- Model Comparison: Side-by-side performance metrics (XGBoost vs Baseline)
- Dataset Exploration: Interactive analysis of 10,000+ loan records
- Demographic Breakdown: Age, salary, and approval rate distributions
- Trend Analysis: Eligibility patterns across employment types and credit scores
- Statistical Summary: Mean, median, and distribution metrics
- FAQ automation for common financial questions
- Application guide and eligibility tips
- Troubleshooting support
- Direct integration with model outputs
| Layer | Technology | Version | Purpose |
|---|---|---|---|
| Frontend | Streamlit | 1.28+ | Interactive web UI |
| Data Processing | Pandas | 2.0+ | Data manipulation |
| Numerical Computation | NumPy | 1.24+ | Array operations |
| ML - Classification | XGBoost | 2.0+ | Eligibility prediction |
| ML - Regression | Scikit-learn | 1.3+ | EMI ceiling prediction |
| Model Serialization | Joblib | 1.3+ | Model persistence |
| Visualization | Plotly | 5.14+ | Interactive charts |
| Experiment Tracking | MLflow | 2.8+ | Model versioning & metrics |
| Python Runtime | Python | 3.11+ | Application runtime |
EMI-Predict_AI/
│
├── app.py                                  # Streamlit app entry point
├── requirements.txt                        # Python dependencies
├── runtime.txt                             # Python version for Streamlit Cloud
│
├── pages/                                  # Streamlit multi-page app
│   ├── __init__.py
│   ├── 1_Home.py                           # Landing page & overview
│   ├── 2_EMI_Calculator.py                 # Main prediction interface
│   ├── 3_Model_Performance.py              # Metrics & MLflow integration
│   ├── 4_Data_Insights.py                  # Dataset exploration
│   └── 5_AI_Assistant.py                   # Q&A interface
│
├── utils/                                  # Reusable utilities
│   ├── __init__.py
│   ├── config.py                           # Configuration & paths
│   ├── helpers.py                          # Model loading, predictions
│   └── styles.py                           # Streamlit theming
│
├── models/                                 # Trained ML models
│   ├── emi_eligibility_model.pkl           # XGBoost classifier
│   └── max_emi_model.pkl                   # Random Forest regressor
│
├── data/
│   ├── raw/                                # Original dataset
│   │   └── emi_prediction_dataset.csv
│   └── processed/                          # Cleaned, feature-engineered data
│       └── emi_prediction_dataset_cleaned.csv
│
├── notebooks/                              # Jupyter notebooks
│   ├── 01_data_handling.ipynb              # EDA & preprocessing
│   ├── 02_eda.ipynb                        # Statistical analysis
│   └── 03_model_training_mlflow.ipynb      # Model training
│
├── .streamlit/
│   └── config.toml                         # Streamlit configuration
│
├── .gitignore                              # Git exclusions
├── LICENSE                                 # MIT License
└── README.md                               # This file
Raw Data (emi_prediction_dataset.csv)
↓
Cleaning & EDA (01_data_handling.ipynb)
↓
Feature Engineering (02_eda.ipynb)
↓
Model Training (03_model_training_mlflow.ipynb)
├→ XGBoost Classifier (Eligibility)
└→ Random Forest Regressor (Max EMI)
↓
Model Serialization (Joblib)
↓
Deployment (models/)
↓
Streamlit Application
├→ Live Predictions
├→ Performance Tracking
└→ Data Visualization
Algorithm: XGBoost Classifier
Purpose: Binary/Multi-class classification to determine loan eligibility
Input Features: 23 preprocessed financial features
Output Classes:
- Eligible: Safe to approve with calculated EMI
- High Risk: Requires manual review
- Not Eligible: Strong decline indicators
Performance Metrics:
- Accuracy: 92%+
- Precision: 88%+
- Recall: 90%+
- F1-Score: 89%+
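For reference, the four metrics can be computed from true and predicted labels as follows. This is a generic sketch with made-up labels, treating one class as positive. The README does not say how its figures are averaged across the three classes, so the one-vs-rest choice and the function name are assumptions.

```python
def one_vs_rest_metrics(y_true, y_pred, positive):
    """Accuracy over all classes, plus precision/recall/F1 for one class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    correct = sum(t == p for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": correct / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Illustrative labels only; these are not results from the project.
y_true = ["Eligible", "Eligible", "High Risk", "Not Eligible",
          "Eligible", "Not Eligible", "High Risk", "Eligible"]
y_pred = ["Eligible", "High Risk", "High Risk", "Not Eligible",
          "Eligible", "Eligible", "High Risk", "High Risk"]
metrics = one_vs_rest_metrics(y_true, y_pred, positive="Eligible")
```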
Key Hyperparameters:
{
    'max_depth': 6,
    'learning_rate': 0.1,
    'n_estimators': 100,
    'subsample': 0.8,
    'colsample_bytree': 0.8,
    'early_stopping_rounds': 10,
}
Training Data:
- Dataset: 10,000+ loan applications
- Training/Validation Split: 80/20
- Class Balance: Handled via scale_pos_weight
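A common starting value for `scale_pos_weight` is the ratio of negative to positive training examples. The sketch below computes that ratio for a binary labelling; the helper name and the sample counts are hypothetical. Because `scale_pos_weight` applies to binary tasks, how it is used with the three eligibility classes is not stated here, and this example does not try to guess.

```python
from collections import Counter


def negative_to_positive_ratio(labels, positive=1):
    """Ratio of negative to positive examples, a typical scale_pos_weight starting point."""
    counts = Counter(labels)
    n_positive = counts[positive]
    n_negative = len(labels) - n_positive
    if n_positive == 0:
        raise ValueError("no positive examples in labels")
    return n_negative / n_positive


# Illustrative class counts only: 200 positives, 800 negatives.
labels = [1] * 200 + [0] * 800
weight = negative_to_positive_ratio(labels)
```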
Algorithm: Random Forest Regressor
Purpose: Predict maximum safe monthly EMI based on applicant profile
Input Features: 23 preprocessed financial features
Output: Maximum safe EMI (in ₹, Indian Rupees)
Performance Metrics:
- R² Score: 0.87+
- RMSE: ₹2,500-3,000
- MAE: ₹1,800-2,200
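The three regression metrics are defined as follows. The sketch uses plain Python and made-up EMI values so the arithmetic is easy to follow; it does not reproduce the project's evaluation.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Illustrative EMI amounts in rupees; not project data.
actual = [10000, 12000, 15000, 20000]
predicted = [11000, 11000, 16000, 19000]
scores = {
    "mae": mae(actual, predicted),
    "rmse": rmse(actual, predicted),
    "r2": r2_score(actual, predicted),
}
```

RMSE is never smaller than MAE, and the gap widens when a few predictions are far off. That is consistent with the RMSE range above sitting higher than the MAE range.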
Key Hyperparameters:
{
    'n_estimators': 100,
    'max_depth': 15,
    'min_samples_split': 5,
    'min_samples_leaf': 2,
    'random_state': 42,
}
Feature Engineering:
- Numerical Features (17): Standard scaling, outlier handling
- Categorical Features (6): One-hot encoding, label encoding
- Domain Features:
- Debt-to-Income Ratio (DTI) = (Current EMI + Expenses) / Monthly Salary
- Loan-to-Value (LTV) = Requested Amount / Annual Salary
- Credit Score Bins (Excellent/Good/Fair/Poor)
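The domain features can be written directly from the formulas above. In this sketch the function names are hypothetical, and only the 701 cut-off for "Excellent" appears elsewhere in this README; the other bin boundaries are placeholders chosen for illustration.

```python
def debt_to_income(current_emi, monthly_expenses, monthly_salary):
    """DTI = (Current EMI + Expenses) / Monthly Salary."""
    return (current_emi + monthly_expenses) / monthly_salary


def loan_to_value(requested_amount, monthly_salary):
    """LTV as defined above: Requested Amount / Annual Salary."""
    return requested_amount / (12 * monthly_salary)


def credit_score_bin(score):
    """Map a credit score to a bin. Only the 701 boundary comes from this README."""
    if score >= 701:
        return "Excellent"
    if score >= 651:   # assumed boundary
        return "Good"
    if score >= 551:   # assumed boundary
        return "Fair"
    return "Poor"


features = {
    "dti": debt_to_income(10000, 15000, 50000),
    "ltv": loan_to_value(600000, 50000),
    "credit_bin": credit_score_bin(720),
}
```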
Handling Missing Values:
- Numerical: Mean/Median imputation
- Categorical: Mode imputation or 'Unknown' category
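A minimal sketch of both imputation strategies, using only the standard library. Missing values are represented as `None` here, and the function names are hypothetical; the project's notebooks may do this with a dataframe library instead.

```python
from statistics import median, mode


def impute_numeric(values):
    """Replace None with the median of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    fill = median(observed)
    return [fill if v is None else v for v in values]


def impute_categorical(values, fallback="Unknown"):
    """Replace None with the most common observed value, or a fallback category."""
    observed = [v for v in values if v is not None]
    fill = mode(observed) if observed else fallback
    return [fill if v is None else v for v in values]


salaries = impute_numeric([30000, None, 50000, 40000])
job_types = impute_categorical(["Salaried", None, "Salaried", "Self-employed"])
empty_column = impute_categorical([None, None])
```

Median imputation is less sensitive to extreme salaries than mean imputation, which matters for skewed financial columns.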
Outlier Treatment:
- IQR-based detection and capping
- No removal to preserve information
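IQR-based capping clips extreme values to the fences instead of dropping rows. The sketch below uses the conventional 1.5 × IQR fences. The multiplier, the function name, and the sample values are assumptions, since the README does not state which fence width is used.

```python
from statistics import quantiles


def cap_outliers_iqr(values, k=1.5):
    """Clip values to [Q1 - k*IQR, Q3 + k*IQR]; the number of rows is unchanged."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, lower), upper) for v in values]


# Illustrative values with one obvious outlier at the end.
raw = [10, 12, 13, 14, 15, 16, 18, 100]
capped = cap_outliers_iqr(raw)
```

Only the final value is pulled down to the upper fence; the other seven are returned unchanged.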
- Python: 3.11 or higher
- pip: Latest version
- Git: For cloning repository
git clone https://github.com/devamsv/EMI-Predict_AI.git
cd EMI-Predict_AI
# Windows
python -m venv venv
venv\Scripts\activate
# macOS/Linux
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
# Check Python version
python --version # Should be 3.11+
# Check key packages
python -c "import streamlit, xgboost, sklearn; print('✅ All dependencies installed')"
streamlit run app.py
Expected Output:
You can now view your Streamlit app in your browser.
Local URL: http://localhost:8501
Network URL: http://<your-ip>:8501
- Open browser to http://localhost:8501
- Navigate through pages using sidebar menu
- Try the EMI Calculator with sample data
- GitHub account with repository pushed
- Streamlit Cloud account (free at share.streamlit.io)
Step 1: Prepare Repository
# Ensure all files are committed
git add .
git commit -m "Prepare for Streamlit Cloud deployment"
git push origin main
Step 2: Verify Required Files
✅ requirements.txt (with bounded version ranges)
✅ runtime.txt (python-3.11.7)
✅ .streamlit/config.toml
✅ models/*.pkl (trained models)
✅ app.py (entry point)
Step 3: Deploy on Streamlit Cloud
- Go to share.streamlit.io
- Click "New app"
- Select:
  - Repository: devamsv/EMI-Predict_AI
  - Branch: main
  - Main file path: app.py
- Click "Deploy"
Step 4: Monitor Deployment
- 🟡 Status: "Installing dependencies"
- 🟡 Status: "Building environment"
- 🟢 Status: "App is running" (typically 2-5 minutes)
Step 5: Verify Deployment
- ✅ App URL: https://share.streamlit.io/devamsv/emi-predict_ai/main/app.py
- ✅ All pages load
- ✅ EMI Calculator works
- ✅ Model predictions return instantly
requirements.txt (Python Dependencies)
streamlit>=1.28.0,<2.0.0
pandas>=2.0.0,<3.0.0
numpy>=1.24.0,<2.0.0
scikit-learn>=1.3.0,<2.0.0
xgboost>=2.0.0,<3.0.0
joblib>=1.3.0,<2.0.0
plotly>=5.14.0,<6.0.0
mlflow>=2.8.0,<3.0.0
python-dotenv>=1.0.0,<2.0.0
requests>=2.31.0,<3.0.0
runtime.txt (Python Version)
python-3.11.7
| Issue | Cause | Solution |
|---|---|---|
| ModuleNotFoundError: xgboost | Package not in requirements.txt | Add to requirements.txt, commit, push |
| Build failed on Python 3.13 | pyarrow incompatibility | Use runtime.txt with Python 3.11.7 |
| "Models failed to load" | Missing __init__.py files | Ensure utils/__init__.py exists |
| App stuck on "Loading..." | Streamlit Cloud slow build | Restart app from Manage App → Reboot App |
| Import errors | Old cached dependencies | Hard refresh: Ctrl+F5 (Cmd+Shift+R on Mac) |
# Model paths
CLASSIFIER_PATH = "models/emi_eligibility_model.pkl"
REGRESSOR_PATH = "models/max_emi_model.pkl"
# Business rules
HIGH_CREDIT_SCORE_THRESHOLD = 701
DTI_HIGH_RISK_THRESHOLD = 80
# Feature schema
TRAINING_FEATURES = [
'age', 'gender', 'marital_status', 'education', 'monthly_salary',
'employment_type', 'years_of_employment', 'company_type', 'house_type',
'monthly_rent', 'family_size', 'dependents', 'school_fees',
'college_fees', 'travel_expenses', 'groceries_utilities',
'other_monthly_expenses', 'existing_loans', 'current_emi_amount',
'credit_score', 'bank_balance', 'emergency_fund', 'emi_scenario',
'requested_amount', 'requested_tenure'
]

def load_models():
    """
    Production-safe model loading with:
    - Dependency verification
    - File existence checks
    - Detailed error handling
    - Logging for debugging
    """
    # Implementation includes:
    # 1. Check dependencies (xgboost, sklearn)
    # 2. Verify model files exist
    # 3. Load models with joblib
    # 4. Return (classifier, regressor) or (None, None)

Graceful Degradation:
- Missing dependencies → User-friendly error messages
- Model loading failures → Diagnostic information provided
- Prediction errors → Logged and user-notified
- Missing files → Clear troubleshooting steps
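The four steps in the `load_models` outline could be filled in roughly as follows. This is a sketch under stated assumptions, not the project's implementation: the default paths mirror the configuration values shown earlier, while `REQUIRED_PACKAGES`, the path parameters, and the log messages are hypothetical.

```python
import importlib.util
import logging
from pathlib import Path

logger = logging.getLogger(__name__)

CLASSIFIER_PATH = "models/emi_eligibility_model.pkl"
REGRESSOR_PATH = "models/max_emi_model.pkl"
REQUIRED_PACKAGES = ("joblib", "xgboost", "sklearn")  # assumed dependency list


def load_models(classifier_path=CLASSIFIER_PATH, regressor_path=REGRESSOR_PATH):
    """Return (classifier, regressor), or (None, None) if anything is missing."""
    # 1. Check dependencies without importing them yet.
    missing = [name for name in REQUIRED_PACKAGES
               if importlib.util.find_spec(name) is None]
    if missing:
        logger.error("Missing packages: %s", ", ".join(missing))
        return None, None

    # 2. Verify that both model files exist.
    for path in (classifier_path, regressor_path):
        if not Path(path).is_file():
            logger.error("Model file not found: %s", path)
            return None, None

    # 3. Load the models with joblib, which is known to be installed by now.
    import joblib
    try:
        return joblib.load(classifier_path), joblib.load(regressor_path)
    except Exception as exc:
        # 4. Degrade gracefully instead of crashing the app.
        logger.error("Model loading failed: %s", exc)
        return None, None
```

A caller can then test `if classifier is None` and show troubleshooting steps instead of a traceback.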
Code Examples:
import logging

import streamlit as st

logger = logging.getLogger(__name__)

# Safe imports
try:
    from utils.helpers import load_models
except ImportError as e:
    st.error(f"Import error: {e}")
    st.stop()

# Safe predictions
try:
    eligibility, max_emi = make_prediction(classifier, regressor, input_data)
except Exception as e:
    logger.error(f"Prediction failed: {e}")
    st.warning("Unable to generate prediction. Please try again.")

Version Pinning Strategy:
# Critical: bounded to a single major version for reproducibility
xgboost>=2.0.0,<3.0.0
# Flexible: minor version updates allowed
pandas>=2.0.0,<3.0.0
# Rationale: balance reproducibility with security updates
Compatibility Testing:
- ✅ Python 3.11+ compatible
- ✅ Streamlit 1.28+ tested
- ✅ XGBoost 2.0+ compatible
- ✅ No conflicts with dependency tree
Local Environment:
- Windows, macOS, Linux supported
- Virtual environment recommended
- requirements.txt for consistency
Cloud Environment:
- Streamlit Cloud: Automated deployment
- Docker: Containerizable
- AWS/GCP: Python 3.11+ compatible
Monitoring & Logging:
import logging
logger = logging.getLogger(__name__)
logger.info("Model loaded successfully")
logger.error(f"Prediction failed: {error}")  # error: the exception caught by the caller
logger.warning("Slow model inference detected")
- Ensemble Methods: Stacking multiple models for better accuracy
- Feature Importance: SHAP values for prediction explainability
- Hyperparameter Optimization: Bayesian optimization for tuning
- Class Balancing: SMOTE for imbalanced datasets
- Time Series: Loan performance tracking over time
- Model Versioning: Automated version control (DVC/MLflow)
- CI/CD Pipeline: GitHub Actions for automated testing
- Model Monitoring: Drift detection and performance tracking
- A/B Testing: Compare different model versions in production
- Containerization: Docker images for consistent deployment
- REST API: FastAPI/Flask for programmatic access
- Batch Prediction: Process multiple applications simultaneously
- Webhook Integration: Real-time bank system integration
- Authentication: OAuth 2.0 for secure access
- Rate Limiting: API usage controls
- Dashboard: Real-time metrics on Grafana/Tableau
- Audit Trail: Complete logging of all predictions
- Anomaly Detection: Identify unusual application patterns
- Performance Analysis: ROI and business impact metrics
- Feedback Loop: Ground truth labels for model retraining
Contributions are welcome! This project is suitable for:
- Adding new features (mortgage calculator, investment recommendation, etc.)
- Improving model accuracy
- Enhancing UI/UX
- Documentation improvements
- Deployment optimization
To Contribute:
# 1. Fork the repository
# 2. Create feature branch
git checkout -b feature/your-feature-name
# 3. Make changes and commit
git add .
git commit -m "feat: Add your feature"
# 4. Push to your fork
git push origin feature/your-feature-name
# 5. Open Pull Request on main repository
This project is licensed under the MIT License - see the LICENSE file for details.
Developed by: Devam SV
Project Links:
- 🔗 GitHub: devamsv/EMI-Predict_AI
- 🌐 Live App: Streamlit Cloud Deployment
- 📧 Questions: Open an issue on GitHub
- Streamlit for the amazing framework
- XGBoost & Scikit-learn teams for ML libraries
- MLflow for experiment tracking
- Plotly for interactive visualizations
- Open-source community for invaluable tools
Last Updated: January 17, 2026
Status: Production Ready ✅