PROJECT_EXPLANATION.md
+8 -8 (8 additions & 8 deletions)
@@ -10,18 +10,18 @@ The primary goal is to provide students with a tool to explore and understand th
 ## Key Processes and Components

-1. **Data Ingestion and Processing (`pdf_chunker.py`, `embed_store.py`):**
+1. **Data Ingestion and Processing (`pdf_chunker.py`, `faiss_store.py`):**
 * **Loading:** The system first reads the `grade-11-history-text-book.pdf`.
 * **Chunking:** The text content is broken down into smaller, manageable chunks (paragraphs or sections). This is crucial for effective retrieval.
 * **Embedding:** Each text chunk is converted into a numerical representation called an "embedding" using a machine learning model (likely via `gemini_utils.py` or a dedicated embedding model). Embeddings capture the semantic meaning of the text.
 * **Vector Store Creation:** These embeddings (vectors) and their corresponding text chunks are stored in a specialized database called a vector store. This project uses FAISS (`faiss_store.py`), which allows for very fast searching of similar vectors. The store consists of `faiss_index.index` (for the vectors) and `faiss_metadata.pkl` (linking vectors back to text and metadata). This step only needs to be done once unless the source PDF changes.
 * **Query Analysis (`agents/query_analyzer.py`):** The user's query is analyzed using basic regex to extract keywords and entities.
 * **Query Embedding:** The user's question is also converted into an embedding using the same model as the document chunks.
 * **Retrieval (`agents/retriever.py`, `faiss_store.py`):** The system searches the FAISS vector store for the text chunks whose embeddings are most similar (closest in vector space) to the query embedding. These retrieved chunks are considered the most relevant context from the textbook.
 * **Context Expansion (`agents/context_expander.py` - potentially):** The retrieved context might be expanded or refined.
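The ingestion and retrieval steps described in this hunk (chunk the text, embed each chunk, then rank chunks by similarity to the query embedding) can be sketched in a few lines of Python. This is a minimal, dependency-free illustration under stated assumptions, not the project's code: `chunk_text`, `embed`, `cosine`, and `retrieve` are hypothetical names, the bag-of-words `embed` is a toy stand-in for the Gemini embedding model, and the brute-force cosine ranking stands in for the FAISS index search.

```python
import math
import re
from collections import Counter


def chunk_text(text: str, max_chars: int = 200) -> list[str]:
    """Split on blank lines, then pack paragraphs into chunks of at most max_chars.

    A single paragraph longer than max_chars is kept whole as its own chunk.
    """
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    chunks: list[str] = []
    current = ""
    for para in paragraphs:
        candidate = f"{current}\n\n{para}" if current else para
        if current and len(candidate) > max_chars:
            chunks.append(current)  # current chunk is full; start a new one
            current = para
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def embed(text: str) -> Counter:
    """Toy embedding (assumption): word counts instead of a learned vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query (brute force, no index)."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


# Usage with made-up sample text (not taken from the textbook).
sample = (
    "The Industrial Revolution began in Britain.\n\n"
    "Factories changed how people worked.\n\n"
    "The French Revolution started in 1789."
)
chunks = chunk_text(sample, max_chars=60)
print(retrieve("When did the French Revolution start?", chunks, k=1))
```

In a real pipeline the embedding step would call the embedding model once per chunk at ingestion time and once per query at question time, which is why the vector store only needs rebuilding when the source PDF changes.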
@@ -32,7 +32,7 @@ The primary goal is to provide students with a tool to explore and understand th
 * Instructions for the AI (e.g., "Answer the user's question using *only* the provided context.").
 * **Generation (`agents/generator.py`, `gemini_utils.py`):** The constructed prompt is sent to the Google Gemini API. Gemini reads the prompt, understands the question and the provided context, and generates a natural language answer.
 * **Reference Tracking (`agents/reference_tracker.py` - potentially):** The system might track which parts of the retrieved context were used to generate the answer, potentially for citation purposes (though this isn't explicitly shown in the UI).
-* **Orchestration (`agents/orchestrator.py`):** This component likely manages the flow between the different agents (retriever, generator, etc.) ensuring they work together correctly.
+* **Orchestration (`agents/orchestrator.py`):** This component manages the flow between the different agents (retriever, generator, etc.) ensuring they work together correctly.

 4. **Response Delivery:**
 * The generated answer is sent back to the user interface (web or CLI) and displayed.
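The prompt-construction step that precedes generation can be illustrated as follows. This is a hedged sketch only: `build_prompt` is a hypothetical helper, and the layout (numbered context chunks, then the question) is an assumption rather than the format used in `agents/generator.py`. Only the instruction sentence is taken from the text above. The call to the Gemini API is deliberately left out.

```python
def build_prompt(question: str, context_chunks: list[str]) -> str:
    """Assemble instructions, retrieved context, and the question into one prompt."""
    # Number each chunk so an answer could, in principle, point back to its source.
    context = "\n\n".join(
        f"[{i}] {chunk}" for i, chunk in enumerate(context_chunks, start=1)
    )
    return (
        "Answer the user's question using only the provided context.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Usage with made-up context (not taken from the textbook).
prompt = build_prompt(
    "When did the French Revolution start?",
    ["The French Revolution started in 1789.", "Factories changed how people worked."],
)
print(prompt)
```

Numbering the chunks is one simple way to support the kind of reference tracking mentioned above, since the generator's output can be matched against chunk indices afterwards.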
@@ -41,9 +41,9 @@ The primary goal is to provide students with a tool to explore and understand th
 ## Technology Stack

 * **Language:** Python
-* **Web Framework:** Flask (`app.py`)
+* **Web Framework:** Flask (`web.py`)
 * **Generative AI:** Google Gemini (`gemini_utils.py`)
 * **Vector Store:** FAISS (`faiss_store.py`)
-* **PDF Processing:** PyPDF (likely used in `pdf_chunker.py`)
+* **PDF Processing:** PyMuPDF (via `fitz` in `pdf_chunker.py`)
README.md
+5 -2 (5 additions & 2 deletions)
@@ -1,6 +1,6 @@
 # Yuhasa - History Tutor Chatbot

-This project implements a Retrieval-Augmented Generation (RAG) chatbot focused on answering questions about a Grade 11 History textbook. It uses Google's Gemini AI for language understanding and generation, and FAISS for efficient information retrieval from the textbook content.
+This project implements a Retrieval-Augmented Generation (RAG) chatbot focused on answering questions about a Grade 11 History textbook. It uses Google's Gemini AI for language understanding and generation, and FAISS for efficient information retrieval from the textbook content. The project intentionally uses minimal dependencies (Flask, FAISS, Gemini, PyPDF) for simplicity, speed, and maintainability.

 ## Prerequisites

@@ -28,7 +28,7 @@ This project implements a Retrieval-Augmented Generation (RAG) chatbot focused o
 pip install -r requirements.txt
 ```
-*(Note: A `requirements.txt` file might need to be created if it doesn't exist. Based on the project files, likely dependencies include: `flask`, `google-generativeai`, `faiss-cpu` or `faiss-gpu`, `langchain` (potentially), `pypdf`, `python-dotenv`, `numpy`, `spacy`)*
+*(Note: The `requirements.txt` file lists the necessary dependencies: `flask`, `google-generativeai`, `faiss-cpu`, `pypdf`, `python-dotenv`, `numpy`)*

 3. **Configure API Key:**
 * Create a file named `.env` in the root project directory.
@@ -76,3 +76,6 @@ There seem to be multiple ways to interact with the chatbot:
 * `chats/`: Stores conversation history (JSON files).
 * `grade-11-history-text-book.pdf`: The source document.
 * `faiss_index.index`, `faiss_metadata.pkl`: The generated vector store files.
+* `requirements.txt`: Lists the project dependencies.
+* `README.md`: This file.
+* `PROJECT_EXPLANATION.md`: Detailed explanation of the project architecture.