RangeshPandianPT/GenAI

🚀 GenAI – Learning Generative AI from Scratch

A hands-on journey into LLMs, Tokenization, RAG pipelines, PDF vector processing, and API development.

Welcome to my Generative AI learning repository! This project documents my step-by-step progress in understanding the core foundations of GenAI, including tokenization, embeddings, vector stores, retrieval-augmented generation (RAG), and backend integration using FastAPI.


📌 What This Repository Contains

📁 1. Tokenization & Vocabulary Building

Python scripts I wrote while learning:

  • How tokenizers work
  • Creating custom vocab files
  • Training BPE tokenizers
  • Understanding merges, vocab length, and token distribution

Files:

  • vocab_train.py
  • vocab_path.py
  • vocab_length.py
  • my_bpe-merges.txt
  • my_bpe-vocab.json
  • words.txt
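
To make the idea of BPE merges concrete, here is a minimal pure-Python sketch of how merge rules can be learned from a word list. It is an illustration only, not the code in vocab_train.py; the function names (`merge_pair`, `learn_bpe`, `encode`) and the sample words are assumptions made for this example, and ties between equally frequent pairs are broken alphabetically so the result is deterministic.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace each adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    merged, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            merged.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            merged.append(symbols[i])
            i += 1
    return tuple(merged)


def learn_bpe(words, num_merges):
    """Learn up to `num_merges` merge rules and build a token -> id vocab."""
    # Each distinct word starts as a tuple of characters, weighted by frequency.
    corpus = Counter(tuple(word) for word in words)
    merges = []
    for _ in range(num_merges):
        # Count how often each adjacent symbol pair occurs across the corpus.
        pair_counts = Counter()
        for symbols, freq in corpus.items():
            for pair in zip(symbols, symbols[1:]):
                pair_counts[pair] += freq
        if not pair_counts:
            break  # every word is already a single symbol
        # Most frequent pair wins; ties go to the alphabetically smallest pair.
        best = min(pair_counts, key=lambda p: (-pair_counts[p], p))
        merges.append(best)
        updated = Counter()
        for symbols, freq in corpus.items():
            updated[merge_pair(symbols, best)] += freq
        corpus = updated
    # Vocab = base characters followed by merged tokens, without duplicates.
    base = sorted({ch for word in words for ch in word})
    tokens = list(dict.fromkeys(base + [a + b for a, b in merges]))
    vocab = {token: idx for idx, token in enumerate(tokens)}
    return merges, vocab


def encode(word, merges):
    """Tokenize one word by applying the merge rules in the order they were learned."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


# Hypothetical sample data, standing in for the contents of a word list.
sample_words = ["low", "low", "lower", "lowest", "newer", "newer"]
merges, vocab = learn_bpe(sample_words, num_merges=2)
print(merges)                    # [('l', 'o'), ('lo', 'w')]
print(encode("lowest", merges))  # ['low', 'e', 's', 't']
```

The ordered merge list corresponds to what a merges file stores, and the token-to-id mapping corresponds to a vocab file; the vocab length is simply the number of base characters plus the number of distinct merged tokens.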

📁 2. RAG Model (Retrieval-Augmented Generation)

Hands-on implementation of:

  • PDF text extraction
  • Converting documents into embeddings
  • Vector indexing and similarity search
  • Building a simple RAG pipeline for answering queries

Folder: Rag Model
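
The retrieval half of that pipeline can be sketched with the standard library alone. This is a toy stand-in, not the code in the Rag Model folder: word-count vectors replace a real embedding model, a plain list replaces a vector index, and the function names (`chunk_words`, `embed`, `cosine`, `retrieve`, `build_prompt`) are hypothetical. PDF extraction and the call to a generator model are left out.

```python
import math
import re
from collections import Counter


def chunk_words(text, size=8, overlap=2):
    """Split text into chunks of `size` words, with `overlap` words shared between neighbours."""
    if size <= overlap:
        raise ValueError("size must be larger than overlap")
    words = text.split()
    chunks = []
    for start in range(0, len(words), size - overlap):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


def embed(text):
    """Toy 'embedding': a sparse bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, chunks, k=2):
    """Return the k chunks most similar to the query (brute-force search)."""
    query_vec = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(query, chunks, k=2):
    """Assemble the augmented prompt that would be sent to a generator model."""
    context = "\n".join(retrieve(query, chunks, k))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


# Hypothetical chunks, standing in for text extracted from a PDF.
docs = [
    "BPE merges frequent symbol pairs",
    "FastAPI serves HTTP endpoints",
    "Embeddings map text to vectors",
]
print(build_prompt("how does BPE merge pairs", docs, k=1))
```

In a fuller pipeline, `embed` would call an embedding model and `retrieve` would query a vector store, but the flow stays the same: chunk, embed, rank by similarity, then pass the top chunks to the generator as context.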

📁 3. API Development (FastAPI)

I built API endpoints to:

  • Generate responses
  • Interact with the RAG pipeline
  • Test HTTP GET & POST requests
  • Understand API structure

Folder: API

📁 4. FastAPI Todo App

A beginner-level FastAPI project to understand:

  • CRUD operations
  • Path & query parameters
  • Async routes

Folder: fastapi-todo-main

🎯 Skills I’m Learning Through This Project

  • 🧠 Generative AI Foundations
  • 🔤 Tokenization (LLM Tokenizer Concepts)
  • 📚 Embeddings & Vector Databases
  • 🔍 RAG Workflow (Retriever + Generator)
  • ⚙️ FastAPI for AI Backend Integration
  • 📝 Working with PDF Text & Chunking
  • 🐍 Python for AI & NLP Workflows

📘 About This Repository

This repo contains my personal learning journey as I explore the core building blocks behind modern LLM systems and GenAI applications. I am documenting everything from basics to advanced workflows.


🚧 Work in Progress

  • ✔️ Tokenizer Training
  • ✔️ PDF Vector Processing
  • ✔️ Basic RAG
  • 🔜 Integrating a full Q&A chatbot
  • 🔜 Adding frontend for GenAI demo
  • 🔜 Deploying FastAPI backend


🤝 Contributions / Suggestions

If you have ideas to improve the code or learning roadmap, feel free to open an issue or PR!


⭐ Support

If you like this repo, consider giving it a star ⭐. It helps me stay motivated to learn more every day!

