This is the Search Engine Project for CS 121, Group 20: a search engine for a selection of subdomains that belong to UC Irvine's School of ICS. It typically returns top-ranking results from 55,000+ webpages within 150 ms.
Group 20 members:
The project has several segments:
- text extraction from HTML format
- tokenization and stemming
- index creation
- given a query, rank documents' relevance using Cosine Similarity
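The segments above can be sketched end to end in a few dozen lines. This is a minimal illustration under stated assumptions, not the project's actual code: the names `extract_text`, `naive_stem`, `tokenize`, `build_index`, and `search` are hypothetical, the stemmer is a crude suffix stripper standing in for a real one such as Porter, and the weighting uses one common log-scaled TF-IDF variant with a smoothed IDF.

```python
import math
import re
from collections import Counter, defaultdict
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def extract_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def naive_stem(token):
    # Assumption: a crude stand-in for a real stemmer; strips a few suffixes.
    for suffix in ("ing", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def tokenize(text):
    return [naive_stem(t) for t in re.findall(r"[a-z0-9]+", text.lower())]


def build_index(pages):
    """pages maps url -> html. Returns (index, norms, idf)."""
    counts = {url: Counter(tokenize(extract_text(html))) for url, html in pages.items()}
    doc_freq = Counter()
    for doc_counts in counts.values():
        doc_freq.update(doc_counts.keys())
    n_docs = len(pages)
    # Smoothed IDF so that every indexed term has a positive weight.
    idf = {term: math.log10(1 + n_docs / df) for term, df in doc_freq.items()}
    index = defaultdict(dict)  # term -> {url: tf-idf weight}
    norms = {}                 # url -> Euclidean length of the document vector
    for url, doc_counts in counts.items():
        squared = 0.0
        for term, tf in doc_counts.items():
            weight = (1 + math.log10(tf)) * idf[term]
            index[term][url] = weight
            squared += weight * weight
        norms[url] = math.sqrt(squared)
    return index, norms, idf


def search(query, index, norms, idf, top_k=5):
    """Ranks documents by cosine similarity between query and document vectors."""
    q_counts = Counter(tokenize(query))
    q_weights = {t: (1 + math.log10(tf)) * idf[t] for t, tf in q_counts.items() if t in idf}
    q_norm = math.sqrt(sum(w * w for w in q_weights.values()))
    if q_norm == 0:
        return []
    dots = defaultdict(float)
    for term, q_weight in q_weights.items():
        for url, d_weight in index[term].items():
            dots[url] += q_weight * d_weight
    ranked = sorted(((dot / (q_norm * norms[url]), url) for url, dot in dots.items()), reverse=True)
    return [(url, score) for score, url in ranked[:top_k]]


# Tiny made-up corpus for illustration only.
pages = {
    "a": "<html><body><h1>Machine Learning</h1><p>Learning from data.</p><script>var x=1;</script></body></html>",
    "b": "<html><body><p>VR gaming research lab</p></body></html>",
    "c": "<html><body><p>Graduate admissions and courses</p></body></html>",
}
index, norms, idf = build_index(pages)
results = search("machine learning", index, norms, idf)
print(results)
```

Storing each document's vector length at index time means a query only touches the posting lists of its own terms, which is one plausible way to keep response times low on a corpus of this size.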
Visit this search engine website and simply type in your query and search.
For example, try searching VR gaming, machine learning, the name of your favorite professor, and more...
- Download and unzip the directory containing information of all the scraped webpages.
- Run the `M1` function inside `main.py` to create the full index file. This process may take some time.
- a) Search in Terminal
  - Run the `M2n3` function inside `main.py` and interact with the prompt.
- b) Host Webpage and Its Backend
  - Run the command `python3 flask_server.py` in a terminal. You can change the port inside `flask_server.py` as you wish.
  - Visit the host machine's `IP_address:port_number` in a browser to access the web search interface.