godhanaravara/README.md

Hello! I'm Godha 👋

MS in Computer Science | Data Science • Machine Learning • Analytics • Cloud
Passionate about continuous learning and building intelligent, scalable, and impactful software.


👩‍💻 About Me

I'm a Computer Science graduate student at Ohio University with hands-on experience building machine learning models, creating insightful visualizations, and developing end-to-end ML workflows. I enjoy solving real-world problems using data, cloud tools, and code.

Research: Machine learning for aviation turbulence forecasting using PIREPs + ERA5 with PCA and K-Means
Core Strengths: Data wrangling, feature engineering, class balancing (SMOTE), outlier detection (Isolation Forest), and model tuning
Current Interests: MLOps, geospatial mapping, AI dashboards, and cloud-deployed ML systems


</> Featured Projects

📁 Multilingual LLM-based Medical FAQ Answering Pipeline
Generates multilingual healthcare FAQ answers on MedQuAD (English, Spanish, Telugu) with a scalable LLM RAG workflow.
→ Built using PySpark on Databricks, NLTK, Parquet on Azure Blob Storage, LangChain + FAISS with sentence-transformer embeddings, Hugging Face FLAN-T5, and Google Cloud Translation.
→ Adds evaluation and governance with TF-IDF retrieval checks, ROUGE, and human review to support telehealth automation.

📁 Reddit AI Job Sentiment Tracker
Monitors global discourse on AI-driven job displacement using real-time Reddit data streams.
→ Built a Databricks + Apache Spark pipeline with the PRAW API, NLP preprocessing (NLTK), and sentiment analysis (VADER).
→ Created interactive Plotly dashboards and gradient-boosted models, reducing insight latency by 50% for PR teams.

📁 U.S. Turbulence Visualizer
Interactive dashboard to explore turbulence zones by altitude, time, and risk category
→ Built using: Plotly Dash, Flask, GeoPandas
→ Adds explainability and visual context to model predictions

📁 Turbulence Risk Predictor
Forecasts severe turbulence zones using 1.1M+ flight reports and ERA5 reanalysis weather data
→ Highlights: Data cleaning, feature engineering, SMOTE, Isolation Forest, PCA, XGBoost
→ Achieved 91.8% accuracy on unseen 2025 flight data

📁 Cancer Genomics Classifier
Predicts cancer outcomes using 8,000+ genomic features and ensemble models
→ Focus: Random Forest modeling with 9-fold cross-validation
→ Achieved 86% sensitivity and 90% specificity


⚡ Technical Skills

• Languages & Databases

Python, SQL, R, MySQL, MATLAB, C++, Java

• Machine Learning & AI

Scikit-learn, XGBoost, CatBoost, LightGBM, TabNet, Random Forest, MLP, SVM, Logistic Regression, KNN, NLP, LLMs, AI Agents, SMOTE, Isolation Forest, TensorFlow, PyTorch

• Data Wrangling & Engineering

Pandas, NumPy, NetCDF4, Spark, Hadoop

• Visualization & Explainability

Plotly, Plotly Dash, Seaborn, Matplotlib, GeoPandas, Cartopy, SHAP, Tableau

• Cloud & Infrastructure

AWS, AWS SageMaker, Azure, GCP, Databricks, Snowflake, Docker, Git, GitHub Actions, WSL, Jupyter Notebook, Google Colab


📫 Connect With Me

🇮🇳 LinkedIn
[email protected]

Pinned

  1. aviation-turbulence-risk-predictor-ML (Public)

    Jupyter Notebook 1 1

  2. data-science-portfolio (Public)

    Jupyter Notebook

  3. godhanaravara (Public)

  4. reddit-trend-sentiment (Public)

    Jupyter Notebook 1