This project is a simple question-answering pipeline that lets you query a PDF file and receive AI-generated answers using:
- 💡 LangChain for document splitting, embeddings, and vector storage
- 🧠 HuggingFace MiniLM for semantic similarity
- 💃 ChromaDB as a persistent vector store
- 🤖 Google Gemini API for generating natural language responses
- Loads and reads your PDF file (e.g., How We Think.pdf)
- Splits the document into smaller chunks using RecursiveCharacterTextSplitter
- Embeds these chunks using HuggingFace all-MiniLM-L6-v2
- Stores the embeddings in a local Chroma vector database
- Takes your query, finds the most relevant chunks
- Passes the results to Google Gemini for answer generation
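The split and retrieve steps above can be sketched in plain Python. This is a minimal illustration, not the project's code: `split_text`, `embed`, `cosine`, and `top_k` are hypothetical helpers, the fixed-size window is a simplified stand-in for RecursiveCharacterTextSplitter, and the word-count vectors stand in for the all-MiniLM-L6-v2 embeddings and the Chroma similarity search.

```python
import math
from collections import Counter


def split_text(text, chunk_size=200, chunk_overlap=40):
    """Cut text into fixed-size character windows that overlap.

    Simplified stand-in: the real splitter also tries to break on
    paragraph and sentence separators before falling back to characters.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


def embed(text):
    """Toy 'embedding': a word-count vector (assumption, not MiniLM)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query, chunks, k=2):
    """Return the k chunks most similar to the query, best first."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]
```

In the actual pipeline the vector store handles ranking; the sketch only shows why overlapping chunks and a similarity score are enough to pick the context that is sent on to Gemini.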
📆 pdf-gemini-qa/
├ 📄 rag.py
├ 📄 requirements.txt (contents listed below)
├ 📄 .env
├ 📄 How We Think.pdf
┗ 📁 chroma_db_nccn/
- Python 3.8 or higher
# Clone the repo
git clone https://github.com/your-username/pdf-gemini-qa.git
cd pdf-gemini-qa
# Create a virtual environment
python -m venv venv
source venv/bin/activate # On Windows use: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt

You need to add your Gemini API key to a .env file in the root folder:
GEMINI_API_KEY=your_actual_api_key_here

Here’s what’s in your requirements.txt:
langchain
langchain-community
chromadb
huggingface-hub
sentence-transformers
python-dotenv
google-generativeai
PyPDF2

If you’re not using requirements.txt, you can install packages manually:
pip install langchain
pip install langchain-community
pip install chromadb
pip install huggingface-hub
pip install sentence-transformers
pip install python-dotenv
pip install google-generativeai
pip install PyPDF2

Run the script:
python rag.py

You'll be prompted:
What is your Query:

Ask questions based on the content of your PDF!
What is your Query: What is reflective thinking according to the author?

GEMINI_API_KEY: Your API key from Google AI Studio (https://ai.google.dev/)
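A missing or placeholder key is a common cause of a failed first run, so it can help to check for it before calling the API. The sketch below is an assumption about how such a check might look, not the project's code: `require_api_key` is a hypothetical helper, and it expects the variables from .env to already be in the environment (for example after calling `load_dotenv()` from python-dotenv).

```python
import os


def require_api_key(env=os.environ, name="GEMINI_API_KEY"):
    """Return the API key from env, or raise if it is missing.

    Hypothetical helper: also rejects the placeholder value used in
    the example .env line so the error shows up before any API call.
    """
    key = (env.get(name) or "").strip()
    if not key or key == "your_actual_api_key_here":
        raise RuntimeError(
            f"{name} is not set; add it to the .env file in the project root."
        )
    return key
```

Passing the environment as an argument keeps the check easy to try out with a plain dictionary.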
## Context:
<relevant document content>
## Query:
Why does the author emphasize reflective thinking?
## Instructions:
- Use ONLY the provided context to answer the query.
- If the context does not contain enough information, say "I don't have enough information."
- Provide factual and well-structured responses.

HuggingFace
LangChain
Google AI / Gemini
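
The prompt template shown earlier (Context, Query, Instructions) can be assembled from the retrieved chunks with a small helper. This is a minimal sketch under assumptions: `PROMPT_TEMPLATE` and `build_prompt` are illustrative names, and the way chunks are joined is a guess rather than the script's confirmed behavior.

```python
PROMPT_TEMPLATE = """## Context:
{context}

## Query:
{query}

## Instructions:
- Use ONLY the provided context to answer the query.
- If the context does not contain enough information, say "I don't have enough information."
- Provide factual and well-structured responses.
"""


def build_prompt(chunks, query):
    """Fill the template with retrieved chunks and the user's query.

    Chunks are separated by blank lines so the model can tell where
    one retrieved passage ends and the next begins (an assumption).
    """
    context = "\n\n".join(chunk.strip() for chunk in chunks)
    return PROMPT_TEMPLATE.format(context=context, query=query.strip())
```

The returned string is what would be sent to Gemini as the full prompt.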