This repository is a knowledge graph of all UWaterloo programs, majors, courses, and topics. It is similar to HyperPhysics, except it is generated algorithmically for any and all topics instead of just physics.
This repository contains four Python scripts that scrape academic programs, majors, courses, and syllabi from the University of Waterloo's academic calendar website:

- `programscrape.py`: Scrapes all undergraduate programs and their links into `programs.json`.
- `majorscrape.py`: Scrapes the majors under each program from `programs.json` and saves them into `majors.json`.
- `coursescraper.py`: Scrapes the courses under each major from `majors.json` and saves them into `courses.json`.
- `syllabuscraper.py`: Scrapes the syllabi under each course from `courses.json` and saves them into `syllabi.json`.
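As a rough illustration of how these outputs chain together, each script reads the previous script's JSON file and follows the links it contains. The schema below is an assumption for illustration only, not the exact format the scrapers emit:

```python
import json

# Hypothetical shape of programs.json -- the real field names may differ.
programs_json = '[{"name": "Mathematics", "link": "https://uwaterloo.ca/..."}]'

def links_of(items):
    """Yield the link of each scraped item so the next scraper can visit it."""
    for item in items:
        yield item["link"]

# majorscrape.py would open each of these URLs with Selenium:
for url in links_of(json.loads(programs_json)):
    print(url)
```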
- Python 3.9+: Download Python
- Chrome WebDriver: Required for Selenium automation.
  - Download it from ChromeDriver.
  - Ensure it is added to your system `PATH`, or place it in the project directory.
- Git (optional): For cloning the repository.
```shell
pip install selenium
pip install beautifulsoup4
pip install spacy
```
```shell
git clone https://github.com/tumph/hyperloo.git
cd scrapers
pip install selenium
pip install beautifulsoup4
pip install spacy
```

Step 1: Scrape Programs

Run the first script to generate `programs.json`:
The script is located at `Hyperloo/scrapers/programscraper/programscrape.py`.

```shell
python programscrape.py
```

Step 2: Scrape Majors

After `programs.json` is generated, run the second script to scrape majors:
The script is located at `Hyperloo/scrapers/majorscraper/majorscraper.py`.

```shell
python majorscrape.py
```

Step 3: Scrape Courses

After `majors.json` is generated, run the third script to scrape courses:
The script is located at `Hyperloo/scrapers/coursescraper/coursescraper.py`.

```shell
python coursescraper.py
```

Note that on macOS you will need to configure a virtual environment (venv) in order to run the Python scripts and pip.
Step 3b: STEM Major Scrape

You need to run

```shell
python STEMfilter.py
```

in order to generate the `stem_majors.json` file. This file is used to filter out the majors that are not relevant to the topic of interest.
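A minimal sketch of the kind of filtering `STEMfilter.py` might perform; the keyword list and matching rule here are assumptions for illustration, not the script's actual logic:

```python
# Hypothetical STEM keyword filter -- the real STEMfilter.py logic may differ.
STEM_KEYWORDS = {"mathematics", "science", "engineering", "computer",
                 "physics", "chemistry", "biology", "statistics"}

def is_stem(major_name: str) -> bool:
    """Heuristically decide whether a major name looks STEM-related."""
    words = major_name.lower().split()
    return any(keyword in words for keyword in STEM_KEYWORDS)

# Keep only the majors that match the keyword heuristic:
stem_majors = [m for m in ["Computer Science", "English Literature", "Physics"]
               if is_stem(m)]
```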
Step 4: Scrape Syllabi

After `courses.json` is generated, run the fourth script to scrape syllabi. The script is located at `Hyperloo/scrapers/syllabuscraper/syllabuscraper.py`.

```shell
python syllabuscraper.py
```

This creates the `syllabi.json` file.
Generating the NLP model is the most time-consuming part of the process. Training takes a few hours, so we made a chunker that splits the syllabi text into 60 chunks that are all processed in parallel. The chunker is located in the NLP folder; it is a Python script that takes `syllabi.json` as input and outputs a new folder called `chunks` that contains the chunked `syllabi.json` files.
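The chunking step can be sketched roughly as follows, assuming `syllabi.json` holds a JSON list; the real chunker in the NLP folder may differ in details such as file naming:

```python
import json
from pathlib import Path

NUM_CHUNKS = 60  # the chunker splits the syllabi into 60 chunks

def chunk_syllabi(syllabi, num_chunks=NUM_CHUNKS, out_dir="chunks"):
    """Split a list of syllabi into up to num_chunks JSON files under out_dir."""
    Path(out_dir).mkdir(exist_ok=True)
    size = max(1, -(-len(syllabi) // num_chunks))  # ceiling division
    paths = []
    for i in range(0, len(syllabi), size):
        path = Path(out_dir) / f"syllabi_{i // size}.json"
        path.write_text(json.dumps(syllabi[i:i + size]))
        paths.append(path)
    return paths
```

Each chunk file can then be handed to its own worker process for parallel processing.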
Run

```shell
python NLPtrainer.py
```

in order to train the NLP model. This will create a new folder called `syllabus_classifierv4` that contains the trained model.
Then, go into `NLP/Processing` and run these commands, as listed in `commands.txt`:

```shell
python split_syllabi.py
chmod +x run_parallel.sh
./run_parallel.sh
cat trees/trees_*.jsonl > final_trees.jsonl
# combine the error jsonls as well
cat missedtrees/trees_*.jsonl > final_missed_trees.jsonl
```

Taking the `final_trees.jsonl` file, we can generate the knowledge graph. The knowledge graph is a JSON file that contains all the information about the topics, majors, and courses. It is located in the UI folder.
Convert the `final_trees.jsonl` file into a JSON file, and then run the `Graph.js` file on it.
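Since JSON Lines is just one JSON object per line, the conversion can be sketched like this (the file names follow the steps above; adjust as needed):

```python
import json

def jsonl_to_list(jsonl_text):
    """Parse JSON Lines text (one JSON object per line) into a Python list."""
    return [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]

# Example usage -- write final_trees.jsonl out as a single JSON array
# that Graph.js can consume:
# with open("final_trees.jsonl") as src, open("trees.json", "w") as dst:
#     json.dump(jsonl_to_list(src.read()), dst, indent=2)
```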
You are done!