A hands-on journey into LLMs, tokenization, RAG pipelines, PDF vector processing, and API development.
Welcome to my Generative AI learning repository! This project documents my step-by-step progress in understanding the core foundations of GenAI, including tokenization, embeddings, vector stores, retrieval-augmented generation (RAG), and backend integration using FastAPI.
Includes Python scripts I wrote while learning:
- How tokenizers work
- Creating custom vocab files
- Training BPE tokenizers
- Understanding merges, vocab length, and token distribution

Files: `vocab_train.py`, `vocab_path.py`, `vocab_length.py`, `my_bpe-merges.txt`, `my_bpe-vocab.json`, `words.txt`
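The core idea behind BPE training — repeatedly merging the most frequent adjacent symbol pair — can be sketched in plain Python. This is a toy illustration with a made-up corpus, not the repo's actual `vocab_train.py`:

```python
from collections import Counter

def most_frequent_pair(words):
    """Count adjacent symbol pairs across the corpus, weighted by word frequency."""
    pairs = Counter()
    for word, freq in words.items():
        symbols = word.split()
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs.most_common(1)[0][0]

def merge_pair(pair, words):
    """Fuse every occurrence of the pair into a single symbol.

    Naive string replace is fine for this toy; a real BPE implementation
    matches whole symbols to avoid merging across symbol boundaries.
    """
    spaced, joined = " ".join(pair), "".join(pair)
    return {word.replace(spaced, joined): freq for word, freq in words.items()}

# Toy corpus: each word is space-separated characters plus an end-of-word marker.
corpus = {"l o w </w>": 5, "l o w e r </w>": 2,
          "n e w e s t </w>": 6, "w i d e s t </w>": 3}

merges = []
for _ in range(3):
    pair = most_frequent_pair(corpus)
    merges.append(pair)
    corpus = merge_pair(pair, corpus)

print(merges)  # learned merge rules, in order
```

The `merges` list is exactly what ends up in a file like `my_bpe-merges.txt`: an ordered list of merge rules, while the vocab file maps each resulting symbol to an id.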
Hands-on implementation of:
- PDF text extraction
- Converting documents into embeddings
- Vector indexing and similarity search
- Building a simple RAG pipeline for answering queries

Folder: `Rag Model`
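The retrieval half of the pipeline boils down to: embed the chunks, embed the query, and return the most similar chunk. Here is a toy, stdlib-only sketch — a real pipeline would extract text with a PDF reader (e.g. pypdf), use a learned embedding model, and index with a vector store like FAISS; the bag-of-words "embedding" below is purely illustrative:

```python
import math
from collections import Counter

def embed(text):
    """Toy embedding: a bag-of-words term-frequency vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Chunks that would normally come from PDF extraction + chunking.
chunks = [
    "Tokenizers split text into subword units.",
    "FAISS indexes vectors for fast similarity search.",
    "FastAPI serves the retrieval endpoint over HTTP.",
]
index = [(chunk, embed(chunk)) for chunk in chunks]

query = embed("how does similarity search over vectors work")
best = max(index, key=lambda item: cosine(query, item[1]))
print(best[0])
```

The retrieved chunk is then stuffed into the LLM prompt as context — that handoff is what makes it retrieval-*augmented* generation.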
I built API endpoints to:
- Generate responses
- Interact with the RAG pipeline
- Test HTTP GET & POST requests
- Understand API structure

Folder: `API`
A beginner-level FastAPI project to understand:
- CRUD operations
- Path & query parameters
- Async routes

Folder: `fastapi-todo-main`
- 🧠 Generative AI Foundations
- 🔤 Tokenization (LLM Tokenizer Concepts)
- 📚 Embeddings & Vector Databases
- 🔍 RAG Workflow (Retriever + Generator)
- ⚙️ FastAPI for AI Backend Integration
- 📝 Working with PDF Text & Chunking
- 🐍 Python for AI & NLP Workflows
This repo contains my personal learning journey as I explore the core building blocks behind modern LLM systems and GenAI applications. I am documenting everything from basics to advanced workflows.
✔️ Tokenizer Training
✔️ PDF Vector Processing
✔️ Basic RAG
🔜 Integrating a full Q&A chatbot
🔜 Adding a frontend for the GenAI demo
🔜 Deploying the FastAPI backend
If you have ideas to improve the code or learning roadmap, feel free to open an issue or PR!
If you like this repo, consider giving it a star ⭐ — it helps me stay motivated to learn more every day!