Portfolio adjacent. This is a collection of projects covering data analysis, data science, statistical methods, machine learning, deep learning, and related topics. I am fascinated by translating abstract ideas into tangible solutions and am particularly interested in the intersection of technology and practical applications.
- Programming Languages: Python, R, SQL
- Data Analysis Tools: Jupyter Notebook, RStudio
- Machine Learning Libraries: TensorFlow, PyTorch, scikit-learn
- Database Management: PostgreSQL, MySQL, MongoDB
- Cloud Platforms: Google Cloud Platform (GCP), Amazon Web Services (AWS)
- Development Tools: Docker, VS Code
- Other Tools: Git, Firebase, Tableau
Code Cartographer is a static analyzer for Python projects that helps manage code complexity and track code variants:
- Function/class SHA-256 hashing to detect variants and clones
- Cyclomatic complexity & maintainability analysis
- Auto-generated LLM refactor prompts
- Interactive dashboard for code analysis
- Internal dependency graphing
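The hashing and complexity features listed for the static analyzer above can be illustrated with a minimal sketch built on the standard library's `ast` and `hashlib` modules. The function names `find_clones` and `cyclomatic_complexity` are hypothetical and are not taken from the project. Hashing only the body of each definition and counting a fixed set of branch nodes are simplifying assumptions, not a description of how the project itself works.

```python
import ast
import hashlib
from collections import defaultdict

# Node types treated as decision points (an assumption for this sketch).
BRANCH_NODES = (ast.If, ast.For, ast.AsyncFor, ast.While, ast.ExceptHandler, ast.IfExp)


def find_clones(source: str) -> list[list[str]]:
    """Group function/class names whose bodies hash to the same SHA-256 digest."""
    groups = defaultdict(list)
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # ast.dump omits line/column info by default, so comments and
            # whitespace differences do not change the digest.
            body_dump = "".join(ast.dump(stmt) for stmt in node.body)
            digest = hashlib.sha256(body_dump.encode("utf-8")).hexdigest()
            groups[digest].append(node.name)
    return [sorted(names) for names in groups.values() if len(names) > 1]


def cyclomatic_complexity(node: ast.AST) -> int:
    """Approximate cyclomatic complexity as 1 plus the number of decision points."""
    score = 1
    for child in ast.walk(node):
        if isinstance(child, BRANCH_NODES):
            score += 1
        elif isinstance(child, ast.BoolOp):
            # `a and b and c` adds two extra paths.
            score += len(child.values) - 1
    return score
```

Because only the body is hashed, two definitions that differ in name alone are reported as clones. A stricter variant could also hash the argument list.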
Chicago Population Forecast is a population shift analysis and prediction system:
- Multi-source data integration (Chicago Data Portal, Census Bureau, FRED)
- Advanced modeling with scenario analysis
- Zip code level impact assessment
- Interactive visualization dashboard
- Automated data pipeline
Masters Capstone - Bosch is an LLM-powered system for enriching metadata to improve internal knowledge retrieval:
- Automated metadata extraction and enhancement
- Custom LLM fine-tuning for domain-specific tasks
- Semantic search integration
- Knowledge graph construction
Feature Selection Framework combines multiple feature selection techniques:
- PCA dimensionality reduction
- LASSO regularization
- Optuna hyperparameter optimization
- Automated feature importance ranking
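As a rough illustration of the LASSO ranking step, the sketch below fits a scikit-learn `Lasso` model on standardized features and orders them by absolute coefficient. The function name `rank_features_lasso` and the fixed `alpha` are assumptions made for this example. The PCA step and the Optuna search over `alpha` are deliberately left out, so this is not the project's actual pipeline.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler


def rank_features_lasso(X, y, alpha=0.1):
    """Return (column index, |coefficient|) pairs, strongest first.

    Features whose coefficient LASSO shrinks to exactly zero are dropped.
    """
    # Standardize so coefficient magnitudes are comparable across features.
    X_scaled = StandardScaler().fit_transform(X)
    model = Lasso(alpha=alpha).fit(X_scaled, y)
    importance = np.abs(model.coef_)
    order = np.argsort(importance)[::-1]
    return [(int(i), float(importance[i])) for i in order if importance[i] > 0]


if __name__ == "__main__":
    # Synthetic data: only columns 0 and 3 influence the target.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 6))
    y = 3 * X[:, 0] - 2 * X[:, 3] + 0.1 * rng.normal(size=200)
    print(rank_features_lasso(X, y))
```

In a tuned setup, `alpha` would be chosen by a hyperparameter search rather than fixed by hand.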
Synsearch is a custom semantic search implementation:
- Sentence transformer embeddings
- Efficient vector similarity search
- Customizable ranking algorithms
- API integration capabilities
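The vector similarity step can be sketched with NumPy alone. The example assumes the embeddings already exist as rows of an array. In practice they would come from a sentence-transformer model, which is not shown here. The function name `top_k_cosine` is hypothetical, and brute-force cosine similarity stands in for whatever index the project actually uses.

```python
import numpy as np


def top_k_cosine(query: np.ndarray, corpus: np.ndarray, k: int = 3) -> list[tuple[int, float]]:
    """Return (row index, cosine similarity) for the k corpus rows closest to query."""
    # Normalize rows so a plain dot product equals cosine similarity.
    corpus_unit = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
    query_unit = query / np.linalg.norm(query)
    scores = corpus_unit @ query_unit
    # argsort is ascending, so take the last k indices and reverse them.
    best = np.argsort(scores)[-k:][::-1]
    return [(int(i), float(scores[i])) for i in best]


if __name__ == "__main__":
    # Toy 3-dimensional "embeddings"; real ones have hundreds of dimensions.
    corpus = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.9, 0.1, 0.0]])
    print(top_k_cosine(np.array([1.0, 0.0, 0.0]), corpus, k=2))
```

Brute-force scoring is linear in corpus size. Larger collections usually call for an approximate nearest-neighbor index instead.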
Advanced Data Pipeline is a production-grade data processing framework:
- Modular pipeline architecture
- Automated quality checks
- Scalable processing components
- Comprehensive logging and monitoring
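One way to picture a modular pipeline with automated quality checks and logging is the sketch below. The `Pipeline` class, its `add` and `run` methods, and the record format (a list of dictionaries) are all hypothetical choices made for illustration. They are not the framework's real interface.

```python
import logging
from dataclasses import dataclass, field
from typing import Callable

Rows = list[dict]
Step = Callable[[Rows], Rows]
Check = Callable[[Rows], bool]

logger = logging.getLogger("pipeline")


@dataclass
class Pipeline:
    """Runs named steps in order and validates the output of each one."""

    steps: list[tuple[str, Step, Check]] = field(default_factory=list)

    def add(self, name: str, step: Step, check: Check = lambda rows: True) -> "Pipeline":
        self.steps.append((name, step, check))
        return self  # allow chained .add(...) calls

    def run(self, rows: Rows) -> Rows:
        for name, step, check in self.steps:
            rows = step(rows)
            logger.info("step %s produced %d rows", name, len(rows))
            if not check(rows):
                raise ValueError(f"quality check failed after step {name!r}")
        return rows


def drop_missing(rows: Rows) -> Rows:
    """Remove records that have no value."""
    return [r for r in rows if r.get("value") is not None]


def to_float(rows: Rows) -> Rows:
    """Convert each record's value to a float."""
    return [{**r, "value": float(r["value"])} for r in rows]


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    pipeline = (
        Pipeline()
        .add("drop_missing", drop_missing, lambda rows: len(rows) > 0)
        .add("to_float", to_float)
    )
    print(pipeline.run([{"value": "1.5"}, {"value": None}]))
```

Keeping each step a plain function makes it easy to test in isolation and to reorder or replace without touching the runner.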
Portfolio/
├── Artificial-Intelligence/
│ ├── code-cartographer/ # Deep static code analyzer
│ ├── content-processing/ # Content analysis tools
│ ├── research-tools/ # Research automation
│ ├── synsearch/ # Semantic search engine
│ └── web-automation/ # Web scraping and automation
├── Data-Science-and-Analysis/
│ ├── adv_data_processing_pipeline/ # Production data pipeline
│ ├── chipop-pred-apropos/ # Chicago population analysis
│ ├── data-quality-facelift/ # Data quality tools
│ ├── dna-analysis/ # Genetic analysis
│ ├── early-analysis/ # Initial analysis projects
│ └── github-analyzers/ # GitHub analytics tools
├── Machine-Learning-and-Deep-Learning/
│ ├── basics/ # ML fundamentals
│ ├── feature-selection-optuna-remix/ # Advanced feature selection
│ ├── computer-vision/ # Image processing
│ ├── nlp/ # Natural language processing
│ └── recommender-systems/ # Recommendation engines
├── Masters-Capstone/
│ └── Masters-Capstone-Bosch-Metadata-LLM/ # LLM for metadata
├── Documentation/
│ ├── guides/ # Usage guides
│ └── references/ # Reference materials
└── Miscellaneous/
  ├── admin/ # Administrative files
  └── assets/ # Media and resources
| File/Directory | Summary |
|---|---|
| Code Cartographer | Deep static analyzer for Python projects. Features SHA-256 hashing for variant detection, complexity analysis, dependency graphing, and interactive dashboard. |
| Chicago Population Forecast | Population prediction and analysis system integrating multiple data sources (Chicago Data Portal, Census, FRED) with scenario modeling and zip-code level predictions. |
| Masters Capstone - Bosch | LLM-based system for enriching metadata in internal knowledge bases, featuring custom fine-tuning and semantic search integration. |
| Feature Selection Framework | Advanced feature selection combining PCA, LASSO, and Optuna optimization for optimal feature subset selection. |
| Synsearch | Custom semantic search engine using sentence transformers and efficient vector similarity search. |
| Advanced Data Pipeline | Production-grade data processing framework with modular components and comprehensive monitoring. |
| ML Basics | Machine learning fundamentals with backpropagation and gradient descent. |
| CIFAR10 Analysis | Image classification using logistic regression on the CIFAR-10 dataset. |
| Deep Learning Language | Normalization and translation for language projects. |
| Language Modeling | Text analytics and language modeling techniques. |
| LSTM Text Modeling | Text modeling using LSTM neural networks. |
| NLTK Embeddings | Word sense disambiguation and embeddings using NLTK. |
| Recommender System | Implementation of recommendation algorithms. |
| PSID Web Scraping | Automated data retrieval from the PSID database. |
| Web Summarizer | URL content summarization tool. |
| AI Research Synthesizer | Research synthesis with NVIDIA API integration. |
| Data Quality Facelift | Data quality enhancement with Streamlit interface. |
| DNA Analysis | Comprehensive genetic analysis tool with health traits, ancestry analysis, and interactive dashboard. |
| GitHub Portfolio Analyzer | Analysis tool for GitHub portfolios. |
| GitHub Repo Analyzer | Repository analysis and insights tool. |
| Credit Risk Analysis | Statistical analysis of credit risk factors. |
| Housing Analysis | Housing market and phishing data analysis. |
| Student Placement | Predictive modeling for student placement. |
