MS in Computer Science | Data Science • Machine Learning • Analytics • Cloud
Passionate about continuous learning and building intelligent, scalable, and impactful software.
I'm a Computer Science graduate student at Ohio University, with strong hands-on experience in building machine learning models, creating insightful visualizations, and developing end-to-end ML workflows. I enjoy solving real-world problems using data, cloud tools, and code.
✦ Research: Machine learning for aviation turbulence forecasting using PIREPs + ERA5 with PCA and K-Means
✦ Core Strengths: Data wrangling, feature engineering, class balancing (SMOTE), outlier detection (Isolation Forest), and model tuning
✦ Current Interests: MLOps, geospatial mapping, AI dashboards, and cloud-deployed ML systems
📁 Multilingual LLM-based Medical FAQ Answering Pipeline
Generates multilingual healthcare FAQ answers on MedQuAD (English, Spanish, Telugu) with a scalable LLM RAG workflow.
→ Built using Databricks PySpark, NLTK, Parquet on Azure Blob Storage, LangChain + FAISS with sentence-transformer embeddings, Hugging Face FLAN-T5, and Google Cloud Translation.
→ Adds evaluation and governance with TF-IDF retrieval checks, ROUGE, and human review to support telehealth automation.
📁 Reddit AI Job Sentiment Tracker
Monitors global discourse on AI-driven job displacement using real-time Reddit data streams.
→ Built a Databricks + Apache Spark pipeline with the PRAW API, NLP preprocessing (NLTK), and sentiment analysis (VADER).
→ Created interactive Plotly dashboards and gradient-boosted models, reducing insight latency by 50% for PR teams.
📁 U.S. Turbulence Visualizer
Interactive dashboard for exploring turbulence zones by altitude, time, and risk category.
→ Built using: Plotly Dash, Flask, GeoPandas
→ Adds explainability and visual context to model predictions
📁 Turbulence Risk Predictor
Forecasts severe turbulence zones using 1.1M+ flight reports and ERA5 reanalysis weather data.
→ Highlights: Data cleaning, feature engineering, SMOTE, Isolation Forest, PCA, XGBoost
→ Achieved 91.8% accuracy on unseen 2025 flight data
📁 Cancer Genomics Classifier
Predicts cancer outcomes using 8,000+ genomic features and ensemble models.
→ Focus: Random Forest modeling with 9-fold cross-validation
→ Achieved 86% sensitivity and 90% specificity
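
For readers less familiar with the evaluation terms in the Cancer Genomics Classifier entry, here is a minimal, dependency-free sketch of how 9-fold splits and the sensitivity/specificity metrics can be computed. It is not the project's code: the helper names `kfold_indices` and `sensitivity_specificity` and the toy labels are hypothetical, no Random Forest is trained, and the folds are simple contiguous splits without shuffling or stratification.

```python
from typing import List, Sequence, Tuple


def kfold_indices(n_samples: int, k: int = 9) -> List[Tuple[List[int], List[int]]]:
    """Split range(n_samples) into k contiguous (train, test) index folds.

    Hypothetical helper for illustration; fold sizes differ by at most one.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    base, extra = divmod(n_samples, k)
    folds = []
    start = 0
    for i in range(k):
        # The first `extra` folds take one additional sample each.
        size = base + (1 if i < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((train, test))
        start += size
    return folds


def sensitivity_specificity(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for binary labels, where 1 is positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    # Sensitivity: share of actual positives that were caught.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    # Specificity: share of actual negatives that were correctly cleared.
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


if __name__ == "__main__":
    # Toy labels, made up for illustration only (not project data).
    y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
    sens, spec = sensitivity_specificity(y_true, y_pred)
    print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")

    for fold_id, (train_idx, test_idx) in enumerate(kfold_indices(20, k=9)):
        print(fold_id, len(train_idx), len(test_idx))
```

In practice each fold's model would be trained on the `train` indices and scored on the `test` indices, with the per-fold metrics then averaged; a stratified splitter is usually preferable when classes are imbalanced.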