A full-featured PostgreSQL + Python project simulating a real-world AI data workflow.
This project demonstrates how to design, build, and automate a complete data pipeline — from raw data ingestion to AI-ready feature generation and model training.
Languages & Tools:
PostgreSQL · Python · pandas · SQLAlchemy · psycopg2 · scikit-learn · schedule
Concepts Covered:
ETL Pipelines · Feature Engineering · Data Cleaning · Automation · ML Integration · Scheduling
- SQL-based feature engineering with PostgreSQL
- Automated Python ETL pipeline (pandas + SQLAlchemy)
- Daily data refresh via scheduler
- AI-ready dataset export
- Logistic Regression model integration
- Modular SQL and Python code structure for learning and debugging
Quick start:

```bash
pip install -r requirements.txt
python python/etl_shopai.py
python python/model_train.py
python python/scheduler.py
```

Features generated:

| Feature | Description |
|---|---|
| total_orders | Total number of orders made by the user |
| avg_spend | Average amount spent per order |
| days_since_last_order | Number of days since last purchase |
| activity_flag | User activity label (Active / Inactive) |
| rank_in_city | Ranking of user by spend within their city |
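
For orientation, here is a minimal sketch of how features like these could be computed from Python with SQLAlchemy. The column names (city, amount, order_date, order_id) and the 7-day activity threshold are illustrative assumptions, not the project's actual schema; the real definitions live in sql/04_features.sql.

```python
import pandas as pd
from sqlalchemy import create_engine, text

# Placeholder connection string; adjust it to your local PostgreSQL setup.
engine = create_engine("postgresql+psycopg2://postgres:postgres@localhost:5432/shopai")

# Hypothetical feature query; column names and the 7-day cutoff are assumptions.
FEATURE_SQL = text("""
    SELECT u.user_id,
           COUNT(o.order_id)                                AS total_orders,
           AVG(o.amount)                                    AS avg_spend,
           CURRENT_DATE - MAX(o.order_date)                 AS days_since_last_order,
           CASE WHEN CURRENT_DATE - MAX(o.order_date) <= 7
                THEN 'Active' ELSE 'Inactive' END           AS activity_flag,
           RANK() OVER (PARTITION BY u.city
                        ORDER BY SUM(o.amount) DESC)        AS rank_in_city
    FROM users u
    JOIN orders o ON o.user_id = u.user_id
    GROUP BY u.user_id, u.city
""")

features = pd.read_sql(FEATURE_SQL, engine)
print(features.head())
```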
ShopAI simulates an AI-driven data pipeline of the kind used in e-commerce and analytics environments. It demonstrates how to:
- Design a PostgreSQL schema for analytics
- Build a Python-based ETL system
- Create feature stores for ML models
- Automate data refresh with scheduling techniques
This workflow mirrors how modern DataOps and MLOps teams manage AI datasets in production.
Project structure:

```
ShopAI-DataPipeline/
│
├── sql/
│   ├── 01_schema.sql
│   ├── 02_insert_sample_data.sql
│   ├── 03_cleaning.sql
│   ├── 04_features.sql
│   └── 05_feature_store.sql
│
├── python/
│   ├── etl_shopai.py
│   ├── model_train.py
│   └── scheduler.py
│
├── data/
│   ├── users.csv
│   └── orders.csv
│
├── docs/
│   ├── architecture.png
│   └── architecture.txt
│
├── requirements.txt
└── README.md
```
Pipeline flow: Raw CSV → PostgreSQL → SQL Feature Views → Python ETL → Materialized Feature Store → ML Model
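
As a rough illustration of that flow, the sketch below performs a full refresh: load the raw CSVs into PostgreSQL, read the derived features back, and materialize them as the Python-side feature store. The run_etl name and the use of user_features as the source view are assumptions; the real logic lives in python/etl_shopai.py.

```python
import pandas as pd
from sqlalchemy import create_engine, text

# Placeholder connection string; adjust credentials for your environment.
engine = create_engine("postgresql+psycopg2://postgres:postgres@localhost:5432/shopai")

def run_etl() -> None:
    """Full-refresh ETL: raw CSVs -> PostgreSQL -> feature store table."""
    print("Starting ETL...")

    # Extract: raw CSVs shipped in data/.
    users = pd.read_csv("data/users.csv")
    orders = pd.read_csv("data/orders.csv")

    # Refresh the raw tables in place (assumes 01_schema.sql created them);
    # TRUNCATE keeps dependent SQL views intact.
    with engine.begin() as conn:
        conn.execute(text("TRUNCATE TABLE users, orders"))
    users.to_sql("users", engine, if_exists="append", index=False)
    orders.to_sql("orders", engine, if_exists="append", index=False)

    # Transform: read the derived per-user features back from SQL.
    features = pd.read_sql(text("SELECT * FROM user_features"), engine)

    # Load: materialize the AI-ready feature table.
    features.to_sql("user_feature_store_py", engine, if_exists="replace", index=False)
    print("ETL finished. Features written to table 'user_feature_store_py'.")

if __name__ == "__main__":
    run_etl()
```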
Students enrolled in the CareerByteCode AI Data Pipeline Course must use this repository as part of their practical assessment.
Objective: identify, debug, and fix the small intentional issues planted in the project to gain real-world debugging experience.
| Task | Description |
|---|---|
| 🧩 Fix Script Errors | Identify small SQL or Python syntax issues and correct them |
| 🧹 Clean Data | Validate and clean datasets using SQL |
| 🧠 Run ETL | Execute and debug etl_shopai.py |
| 🧮 Verify Features | Confirm expected columns in user_feature_store_py |
| 🤖 Train Model | Run model_train.py and validate model output |
| ⏰ Automate | Test the scheduler by changing its interval to 1 minute (see the sketch after this table) |
| 💡 Extend Features | Add one new feature (e.g., days_since_signup) |
| 🧾 Document | Add notes in README about fixes and improvements |
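
For the automation task, a minimal sketch of what python/scheduler.py might look like with the schedule library (run_etl is the hypothetical entry point sketched earlier):

```python
import time

import schedule

from etl_shopai import run_etl  # hypothetical entry point in python/etl_shopai.py

# Production-style default: refresh once a day.
schedule.every().day.at("02:00").do(run_etl)

# For the assessment task, swap in a 1-minute interval instead:
# schedule.every(1).minutes.do(run_etl)

while True:
    schedule.run_pending()  # run any jobs whose scheduled time has arrived
    time.sleep(30)          # poll twice a minute
```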
You’ve successfully completed the project when:
- All SQL scripts and Python files run end-to-end without errors
- user_feature_store_py populates correctly (a quick check is sketched after this list)
- The model trains successfully with visible coefficients
- Scheduler triggers ETL automatically
- You can explain every step (schema → ETL → model)
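
A quick way to check the feature-store item on that list (the connection string is a placeholder):

```python
import pandas as pd
from sqlalchemy import create_engine

# Placeholder connection string; match it to your local PostgreSQL setup.
engine = create_engine("postgresql+psycopg2://postgres:postgres@localhost:5432/shopai")

# Confirm the table populated and carries the expected feature columns.
df = pd.read_sql("SELECT * FROM user_feature_store_py LIMIT 5", engine)
print(df.columns.tolist())
print(f"fetched {len(df)} sample rows")
```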
Optional extensions to go further:
- Create an additional materialized view that aggregates daily revenue (a starting sketch follows this list)
- Add a plot or visualization of average spend per city using Python
- Build a Dockerfile to containerize the project
- Deploy the PostgreSQL + ETL pipeline using Docker Compose
- Integrate with Grafana or Metabase for dashboard visualization
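
As a starting point for the daily-revenue extension, one possible approach run from Python; the amount and order_date column names are guesses against the sample schema:

```python
from sqlalchemy import create_engine, text

# Placeholder connection string; adjust for your environment.
engine = create_engine("postgresql+psycopg2://postgres:postgres@localhost:5432/shopai")

# Hypothetical daily-revenue rollup; column names are assumptions.
DDL = text("""
    CREATE MATERIALIZED VIEW IF NOT EXISTS daily_revenue AS
    SELECT order_date::date AS day,
           SUM(amount)      AS revenue
    FROM orders
    GROUP BY 1
    ORDER BY 1
""")

with engine.begin() as conn:  # begin() commits the DDL on success
    conn.execute(DDL)
    # Redundant on first creation, but keeps the view current on later runs.
    conn.execute(text("REFRESH MATERIALIZED VIEW daily_revenue"))
```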
After completing this project, you’ll master:
- SQL for AI & Feature Engineering
- Python + PostgreSQL ETL Automation
- Data Cleaning and Quality Validation
- Feature Store Design for ML Models
- Automation and Scheduling Concepts
Database tables:

| Table | Description |
|---|---|
| users | Raw customer data |
| orders | Transaction data |
| user_features | Derived features per user |
| user_feature_store | Materialized feature table |
| user_feature_store_py | ETL-generated feature table |
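
For reference, a minimal sketch of how the two raw tables might be declared; the real DDL lives in sql/01_schema.sql and these columns are illustrative guesses:

```python
from sqlalchemy import create_engine, text

# Placeholder connection string; adjust for your environment.
engine = create_engine("postgresql+psycopg2://postgres:postgres@localhost:5432/shopai")

# Illustrative DDL only; the project's real schema lives in sql/01_schema.sql.
SCHEMA = text("""
    CREATE TABLE IF NOT EXISTS users (
        user_id    SERIAL PRIMARY KEY,
        name       TEXT NOT NULL,
        city       TEXT
    );
    CREATE TABLE IF NOT EXISTS orders (
        order_id   SERIAL PRIMARY KEY,
        user_id    INTEGER REFERENCES users (user_id),
        amount     NUMERIC(10, 2),
        order_date DATE
    );
""")

with engine.begin() as conn:
    conn.execute(SCHEMA)
```

The sample rows below show what the ETL-generated feature store might contain: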
| user_id | name | total_spend | avg_spend | days_since_last_order | activity_flag |
|---|---|---|---|---|---|
| 1 | Alice | 300 | 100 | 5 | Active |
| 2 | Ben | 350 | 175 | 8 | Inactive |
```bash
python python/etl_shopai.py
python python/model_train.py
```

Expected output:

```
Starting ETL...
ETL finished. Features written to table 'user_feature_store_py'.
Model trained. Coefficients: {'total_orders': 0.01, 'total_spend': 0.04, 'avg_spend': -0.02, 'days_since_last_order': 0.1}
```
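
The training step itself is only a few lines with scikit-learn. A hedged sketch follows; deriving the binary target from activity_flag is an assumption about what model_train.py actually does:

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sqlalchemy import create_engine

# Placeholder connection string; match it to your local setup.
engine = create_engine("postgresql+psycopg2://postgres:postgres@localhost:5432/shopai")
df = pd.read_sql("SELECT * FROM user_feature_store_py", engine)

FEATURES = ["total_orders", "total_spend", "avg_spend", "days_since_last_order"]
X = df[FEATURES]
y = (df["activity_flag"] == "Active").astype(int)  # assumed binary target

model = LogisticRegression(max_iter=1000).fit(X, y)
coefs = {f: round(float(c), 2) for f, c in zip(FEATURES, model.coef_[0])}
print(f"Model trained. Coefficients: {coefs}")
```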
Assessment marking scheme:

| Area | Marks | Description |
|---|---|---|
| SQL Setup & Schema | 10 | Database structure and data correctness |
| Data Cleaning | 10 | Proper handling of duplicates, nulls |
| Feature Engineering | 20 | Accuracy and creativity of SQL features |
| ETL & Integration | 20 | Functional Python ETL pipeline |
| Automation | 10 | Scheduler working as expected |
| Model Training | 10 | Successful model execution |
| Documentation | 10 | Clear fixes and explanations |
| Extension Features | 10 | Added new feature successfully |
Total: 100 marks
Author: careerbytecode · DevOps & Cloud Engineer | Building AI-ready DataOps workflows
🔗 https://github.com/careerbytecode
Learners and contributors are encouraged to:
- Submit pull requests fixing issues
- Add new SQL or Python utilities
- Create new AI features for practice
- Document discovered bugs or improvements
This project is open-source under the MIT License. Use it for learning, practicing, and building your AI & DataOps portfolio.