SyllabusRAG lets you ask natural-language questions about a university syllabus.
It reads a syllabus file, builds a knowledge base and uses an AI model to answer questions.
- The syllabus data is stored in a file called `syllabus.json`.
- Each course is read and turned into a simple text summary.
- These summaries are saved in `knowledge_base.txt`.
- The summaries are converted into vector embeddings using sentence-transformers.
- The vectors are stored in a FAISS index for fast similarity search.
- A small AI model (`google/gemma-7b-it`) is loaded from Hugging Face.
- When you ask a question:
  - If you include a course code, the system retrieves that course directly.
  - If not, it finds the closest course by comparing your question with the vector index.
  - The AI generates an answer based only on the matched course information.
  - If nothing relevant is found, it clearly responds that it cannot answer.
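
The question-routing steps above can be sketched in plain Python. This is only an illustration, not the notebook's code: the `SUMMARIES` dictionary, the course-code pattern, the `min_score` threshold and the function names are all assumptions, and a bag-of-words cosine similarity stands in for the sentence-transformers embeddings and the FAISS index so that the example runs without extra dependencies.

```python
import math
import re
from collections import Counter

# Hypothetical knowledge base: course code -> text summary (made-up content).
SUMMARIES = {
    "BECE309L": "BECE309L Artificial Intelligence and Machine Learning. "
                "Objectives: introduce search, learning and reasoning.",
    "BMAT101L": "BMAT101L Calculus. Objectives: limits, derivatives and integrals.",
}

# Assumed code format: four letters, three digits, one letter (as in BECE309L).
COURSE_CODE = re.compile(r"\b[A-Z]{4}\d{3}[A-Z]\b")


def _vector(text):
    """Bag-of-words counts; a stand-in for a real sentence embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def _cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def retrieve(question, summaries=SUMMARIES, min_score=0.2):
    """Return (course_code, summary), or (None, None) if nothing is relevant."""
    # Step 1: an explicit course code is looked up directly.
    match = COURSE_CODE.search(question.upper())
    if match and match.group() in summaries:
        return match.group(), summaries[match.group()]
    # Step 2: otherwise pick the most similar summary.
    query = _vector(question)
    best_code, best_score = None, 0.0
    for code, text in summaries.items():
        score = _cosine(query, _vector(text))
        if score > best_score:
            best_code, best_score = code, score
    # Step 3: refuse when the best match is too weak.
    if best_code is None or best_score < min_score:
        return None, None
    return best_code, summaries[best_code]


def build_prompt(question):
    """Build a prompt restricted to the matched course, or None to refuse."""
    code, context = retrieve(question)
    if context is None:
        return None  # the caller should reply that it cannot answer
    return (f"Answer using only this course information:\n{context}\n\n"
            f"Question: {question}")
```

In a real setup the threshold would need tuning against the actual embedding model, since similarity scores from dense embeddings behave differently from word-overlap scores.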
.
├── README.md
├── data/
│   └── syllabus.json          # Course syllabus data in JSON format
├── main.ipynb                 # Main implementation notebook
├── report/
│   └── report.pdf             # Project report
├── requirements.txt           # Python dependencies
└── training_data/
    ├── knowledge_base.txt     # Processed text summaries
    └── faiss_index/
        ├── index.faiss        # FAISS vector index
        └── index.pkl          # Index metadata
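
As a rough illustration of how `data/syllabus.json` could become `training_data/knowledge_base.txt`, here is a minimal sketch. The schema of `syllabus.json` is not shown in this README, so the keys used below (`code`, `title`, `prerequisites`, `objectives`), the assumption that the file holds a list of course objects, and both function names are hypothetical.

```python
import json
from pathlib import Path


def summarize_course(course):
    """Turn one course record into a single-line text summary.

    The keys read here are hypothetical; adjust them to the real schema.
    """
    prereqs = ", ".join(course.get("prerequisites", [])) or "None"
    objectives = "; ".join(course.get("objectives", [])) or "Not listed"
    return (
        f"Course {course.get('code', 'UNKNOWN')}: "
        f"{course.get('title', 'Untitled')}. "
        f"Prerequisites: {prereqs}. Objectives: {objectives}."
    )


def build_knowledge_base(syllabus_path, output_path):
    """Read the syllabus JSON and write one summary per line."""
    # Assumes the JSON file contains a list of course objects.
    courses = json.loads(Path(syllabus_path).read_text(encoding="utf-8"))
    summaries = [summarize_course(course) for course in courses]
    out = Path(output_path)
    out.parent.mkdir(parents=True, exist_ok=True)  # create training_data/ if needed
    out.write_text("\n".join(summaries) + "\n", encoding="utf-8")
    return summaries
```

Called as `build_knowledge_base("data/syllabus.json", "training_data/knowledge_base.txt")`, this would produce one summary line per course.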
To run the project, create a requirements.txt file with the following content:
transformers
torch
accelerate
bitsandbytes
langchain
langchain-community
sentence-transformers
faiss-cpu
langchain-huggingface
Then run this command in a notebook cell (in a terminal, omit the leading `!`):
!pip install -r requirements.txt
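
After installing, a quick check like the one below can confirm that every dependency is importable. It is a small sketch rather than part of the project: the `REQUIRED` mapping and `missing_packages` helper are illustrative names. The mapping is needed because some pip names differ from the module names used in `import` statements (for example, `faiss-cpu` is imported as `faiss`).

```python
from importlib.util import find_spec

# pip package name -> importable module name
REQUIRED = {
    "transformers": "transformers",
    "torch": "torch",
    "accelerate": "accelerate",
    "bitsandbytes": "bitsandbytes",
    "langchain": "langchain",
    "langchain-community": "langchain_community",
    "sentence-transformers": "sentence_transformers",
    "faiss-cpu": "faiss",
    "langchain-huggingface": "langchain_huggingface",
}


def missing_packages(required=REQUIRED):
    """Return the pip names whose module cannot be found in this environment."""
    return [pip_name for pip_name, module in required.items()
            if find_spec(module) is None]


missing = missing_packages()
print("All dependencies found." if not missing else "Missing: " + ", ".join(missing))
```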
- Open Google Colab (recommended) or Jupyter Notebook with GPU support.
- Clone or download this repository and upload `main.ipynb` along with `syllabus.json`.
- Create a free Hugging Face account and copy your access token.
- In Colab, click the 🔑 icon on the left, add a new secret named `HF_TOKEN` and paste your token.
- Open `main.ipynb` and run all cells.
This will:
- Install required dependencies ⚙️
- Process the syllabus data 📑
- Build a FAISS index 📊
- Load the model 🤖
- Enable direct Q&A from the syllabus 💬
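
For the `HF_TOKEN` secret from the setup steps, the sketch below shows one way a notebook might read it. It assumes the Colab secrets API (`google.colab.userdata`) when running in Colab and falls back to an `HF_TOKEN` environment variable elsewhere. The helper name `get_hf_token` and the fallback are assumptions, not necessarily what `main.ipynb` does.

```python
import os


def get_hf_token():
    """Return the Hugging Face token from Colab secrets, else from the environment."""
    try:
        # This module is only available inside Google Colab.
        from google.colab import userdata
    except ImportError:
        # Outside Colab (e.g. local Jupyter): read an environment variable instead.
        return os.environ.get("HF_TOKEN")
    # Raises an error in Colab if the secret is missing or notebook access is off.
    return userdata.get("HF_TOKEN")
```

The returned value can then be passed to `huggingface_hub.login(token=...)` before the model is loaded.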
ask_question("What is the title for BECE309L?")
ask_question("What are the prerequisites for BECE309L?")
ask_question("What are the objectives of the Artificial Intelligence and Machine Learning course?")
✅ After completing these steps, your setup will be ready to query the syllabus!
For detailed analysis and evaluation results, check out the comprehensive project report:
PDF: report/report.pdf
The report includes qualitative and quantitative analysis, performance metrics and experimental results.
- The model is very basic and only lightly fine-tuned on a small dataset.
- Data pre-processing is raw and automated, not carefully cleaned or standardized.
- Because of this, the model can sometimes give inaccurate or incomplete answers.
- It works best only for specific, direct questions related to the syllabus.
- This project should be seen as a beginner-friendly starting point, not a production-ready system.