This repository serves as a collection of my coding projects over the years & a snapshot of ongoing work. Projects at various stages of completion, experiments that may lead somewhere (or not), and scripts that will definitely get rewritten a few more times. The cycle continues.

stochastic-sisyphus/past-portfolio-old-very-old-geriatric-even

M.S. Analytics; Machine Learning Specialization


Portfolio adjacent. This is a collection of projects in data analysis, data science, statistical methods, machine learning, deep learning, and related topics. I am fascinated by translating abstract ideas into tangible solutions, particularly at the intersection of technology and practical applications.

Skills and Technologies

  • Programming Languages: Python, R, SQL
  • Data Analysis Tools: Jupyter Notebook, RStudio
  • Machine Learning Libraries: TensorFlow, PyTorch, scikit-learn
  • Database Management: PostgreSQL, MySQL, MongoDB
  • Cloud Platforms: Google Cloud Platform (GCP), Amazon Web Services (AWS)
  • Development Tools: Docker, VS Code
  • Other Tools: Git, Firebase, Tableau

Featured Projects

Code Cartographer: a static analyzer for Python projects that helps manage code complexity and track code variants:

  • Function/class SHA-256 hashing to detect variants and clones
  • Cyclomatic complexity & maintainability analysis
  • Auto-generated LLM refactor prompts
  • Interactive dashboard for code analysis
  • Internal dependency graphing
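
The variant and clone detection idea above can be sketched roughly as follows. This is a minimal illustration, not the analyzer's actual code: it assumes functions are compared by hashing a dump of their arguments and body (so renamed copies still match), and it skips classes. The names `structural_hash` and `find_clones` are hypothetical.

```python
import ast
import hashlib
from collections import defaultdict


def structural_hash(node):
    """SHA-256 of a function's arguments and body, ignoring its name.

    ast.dump() leaves out line numbers by default, so comments and
    whitespace do not affect the hash.
    """
    payload = ast.dump(node.args) + "".join(ast.dump(stmt) for stmt in node.body)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def find_clones(source):
    """Return groups of function names whose structure is identical."""
    groups = defaultdict(list)
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            groups[structural_hash(node)].append(node.name)
    return [sorted(names) for names in groups.values() if len(names) > 1]


SAMPLE = '''
def add(a, b):
    return a + b

def plus(a, b):
    # same body, different name
    return a + b

def sub(a, b):
    return a - b
'''

clones = find_clones(SAMPLE)
```

Here `add` and `plus` land in the same group because only the function name and a comment differ. Hashing exact structure like this catches copies, not near-duplicates; fuzzier variant matching would need a different technique.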

Chicago Population Forecast: a population shift analysis and prediction system:

  • Multi-source data integration (Chicago Data Portal, Census Bureau, FRED)
  • Advanced modeling with scenario analysis
  • Zip code level impact assessment
  • Interactive visualization dashboard
  • Automated data pipeline

Masters Capstone - Bosch: an LLM-powered system for enriching metadata to improve internal knowledge retrieval:

  • Automated metadata extraction and enhancement
  • Custom LLM fine-tuning for domain-specific tasks
  • Semantic search integration
  • Knowledge graph construction

Feature Selection Framework: feature selection that combines multiple techniques:

  • PCA dimensionality reduction
  • LASSO regularization
  • Optuna hyperparameter optimization
  • Automated feature importance ranking
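
To illustrate only the LASSO part of the list above, here is a small sketch of how L1 regularization can rank features: coefficients of uninformative features are shrunk to exactly zero, and the rest are ordered by magnitude. It is a pure-Python coordinate-descent stand-in written for this README, not the project's code (which presumably relies on a library implementation); PCA and Optuna tuning are not shown, and the function names and toy data are assumptions.

```python
def soft_threshold(rho, alpha):
    """Shrink rho toward zero by alpha; values within [-alpha, alpha] become 0."""
    if rho > alpha:
        return rho - alpha
    if rho < -alpha:
        return rho + alpha
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_iters=100):
    """Minimize (1/2n) * ||y - Xw||^2 + alpha * ||w||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iters):
        for j in range(p):
            rho = 0.0
            z = 0.0
            for i in range(n):
                # Prediction using every feature except j.
                partial = sum(X[i][k] * w[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - partial)
                z += X[i][j] ** 2
            rho /= n
            z /= n
            w[j] = soft_threshold(rho, alpha) / z if z > 0 else 0.0
    return w


def rank_features(names, weights):
    """Names of features with nonzero weight, largest magnitude first."""
    kept = [(abs(w), name) for name, w in zip(names, weights) if w != 0]
    return [name for _, name in sorted(kept, reverse=True)]


# Toy data: y depends only on the first column; the second is unrelated.
X = [[-2, 1], [-1, -1], [0, 0], [1, -1], [2, 1]]
y = [-6, -3, 0, 3, 6]
weights = lasso_coordinate_descent(X, y, alpha=0.1)
ranking = rank_features(["x0", "x1"], weights)
```

In this toy case the second feature's weight is driven to zero and drops out of the ranking. The regularization strength `alpha` is the kind of parameter a tuner such as Optuna would search over.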

Synsearch: a custom semantic search implementation:

  • Sentence transformer embeddings
  • Efficient vector similarity search
  • Customizable ranking algorithms
  • API integration capabilities
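
The core of embedding-based search can be sketched as a cosine-similarity ranking over stored vectors. This is an illustrative sketch only: the short hand-written vectors stand in for real sentence-transformer embeddings, the brute-force scan stands in for whatever index the project uses, and `cosine_similarity` and `search` are hypothetical names.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query_vec, index, top_k=3):
    """Return the top_k (doc_id, score) pairs, best match first."""
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Placeholder 2-d "embeddings"; a real model would produce hundreds of dimensions.
index = {
    "doc_a": [1.0, 0.0],
    "doc_b": [0.0, 1.0],
    "doc_c": [1.0, 1.0],
}
results = search([1.0, 0.1], index, top_k=2)
```

A linear scan like this is fine for small collections; larger ones usually call for an approximate nearest-neighbor index.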

Advanced Data Pipeline: a production-grade data processing framework:

  • Modular pipeline architecture
  • Automated quality checks
  • Scalable processing components
  • Comprehensive logging and monitoring
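
One way to picture a modular pipeline with quality checks and logging is a list of small step functions applied in order. The sketch below is an assumption about the general shape, not the framework's actual API; `require_fields`, `normalize_names`, and `run_pipeline` are hypothetical names.

```python
import logging
from typing import Callable, Dict, List

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("pipeline_sketch")

Record = Dict[str, object]
Step = Callable[[List[Record]], List[Record]]


def require_fields(*fields: str) -> Step:
    """Quality check: build a step that drops records missing any required field."""
    def step(records: List[Record]) -> List[Record]:
        return [r for r in records if all(r.get(f) is not None for f in fields)]
    step.__name__ = "require_fields"
    return step


def normalize_names(records: List[Record]) -> List[Record]:
    """Transform step: trim and lowercase the 'name' field."""
    return [{**r, "name": str(r["name"]).strip().lower()} for r in records]


def run_pipeline(records: List[Record], steps: List[Step]) -> List[Record]:
    """Apply each step in order, logging how many records survive it."""
    for step in steps:
        before = len(records)
        records = step(records)
        logger.info("%s: %d -> %d records", step.__name__, before, len(records))
    return records


raw = [{"name": " Ada ", "age": 36}, {"name": None, "age": 41}, {"age": 29}]
clean = run_pipeline(raw, [require_fields("name", "age"), normalize_names])
```

Because every step shares one signature, steps can be reordered, swapped, or tested in isolation, which is the main appeal of the modular design.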

Repository Structure

Portfolio/
├── Artificial-Intelligence/
│   ├── code-cartographer/      # Deep static code analyzer
│   ├── content-processing/     # Content analysis tools
│   ├── research-tools/         # Research automation
│   ├── synsearch/              # Semantic search engine
│   └── web-automation/         # Web scraping and automation
├── Data-Science-and-Analysis/
│   ├── adv_data_processing_pipeline/  # Production data pipeline
│   ├── chipop-pred-apropos/          # Chicago population analysis
│   ├── data-quality-facelift/        # Data quality tools
│   ├── dna-analysis/                 # Genetic analysis
│   ├── early-analysis/               # Initial analysis projects
│   └── github-analyzers/             # GitHub analytics tools
├── Machine-Learning-and-Deep-Learning/
│   ├── basics/                       # ML fundamentals
│   ├── feature-selection-optuna-remix/ # Advanced feature selection
│   ├── computer-vision/              # Image processing
│   ├── nlp/                          # Natural language processing
│   └── recommender-systems/          # Recommendation engines
├── Masters-Capstone/
│   └── Masters-Capstone-Bosch-Metadata-LLM/  # LLM for metadata
├── Documentation/
│   ├── guides/                       # Usage guides
│   └── references/                   # Reference materials
└── Miscellaneous/
    ├── admin/                        # Administrative files
    └── assets/                       # Media and resources

Project Breakdown

| File/Directory | Summary |
| --- | --- |
| Code Cartographer | Deep static analyzer for Python projects. Features SHA-256 hashing for variant detection, complexity analysis, dependency graphing, and interactive dashboard. |
| Chicago Population Forecast | Population prediction and analysis system integrating multiple data sources (Chicago Data Portal, Census, FRED) with scenario modeling and zip-code level predictions. |
| Masters Capstone - Bosch | LLM-based system for enriching metadata in internal knowledge bases, featuring custom fine-tuning and semantic search integration. |
| Feature Selection Framework | Advanced feature selection combining PCA, LASSO, and Optuna optimization for optimal feature subset selection. |
| Synsearch | Custom semantic search engine using sentence transformers and efficient vector similarity search. |
| Advanced Data Pipeline | Production-grade data processing framework with modular components and comprehensive monitoring. |
| ML Basics | Machine learning fundamentals with backpropagation and gradient descent. |
| CIFAR10 Analysis | Image classification using logistic regression on CIFAR-10 dataset. |
| Deep Learning Language | Normalization and translation for language projects. |
| Language Modeling | Text analytics and language modeling techniques. |
| LSTM Text Modeling | Text modeling using LSTM neural networks. |
| NLTK Embeddings | Word sense disambiguation and embeddings using NLTK. |
| Recommender System | Implementation of recommendation algorithms. |
| PSID Web Scraping | Automated data retrieval from PSID database. |
| Web Summarizer | URL content summarization tool. |
| AI Research Synthesizer | Research synthesis with Nvidia API integration. |
| Data Quality Facelift | Data quality enhancement with Streamlit interface. |
| DNA Analysis | Comprehensive genetic analysis tool with health traits, ancestry analysis, and interactive dashboard. |
| GitHub Portfolio Analyzer | Analysis tool for GitHub portfolios. |
| GitHub Repo Analyzer | Repository analysis and insights tool. |
| Credit Risk Analysis | Statistical analysis of credit risk factors. |
| Housing Analysis | Housing market and phishing data analysis. |
| Student Placement | Predictive modeling for student placement. |

About Me

Connect with me on Medium or LinkedIn.
