This repository is a demonstration of using BeautifulSoup to scrape movie information from TMDB and serve it to the user through FastAPI, in the form of automatically generated Swagger documentation.

Both JSON and console output are available to the user without the need for a database, although adding one could round out the repo.

Feel free to fork, clone and tinker with it. This is part of my ever "expanding" portfolio to showcase my understanding of development on a much smaller, personal scale, to enhance my ability to solve problems and, above all, to learn, learn and learn.
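To give a feel for the core idea, here is a minimal sketch of the scrape-and-serve flow. The function name, route, and CSS selector are illustrative assumptions, not the repo's actual code, which lives under `backend/src`:

```python
# Minimal sketch, not the repo's actual code: scrape TMDB's popular-movies
# listing with BeautifulSoup and expose it via FastAPI (Swagger UI at /docs).
import requests
from bs4 import BeautifulSoup
from fastapi import FastAPI

app = FastAPI(title="python-webscraper (sketch)")

def scrape_popular() -> list[dict]:
    # Hypothetical selector; TMDB's live markup may differ.
    html = requests.get(
        "https://www.themoviedb.org/movie",
        headers={"User-Agent": "Mozilla/5.0"},
        timeout=10,
    ).text
    soup = BeautifulSoup(html, "html.parser")
    return [{"title": a.get_text(strip=True)} for a in soup.select("h2 a")]

@app.get("/movies")
def movies() -> list[dict]:
    # JSON comes out of the box; FastAPI also generates the Swagger docs for free.
    return scrape_popular()
```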
A To Do list:

- Working WebScraper where it scrapes n pages instead of just one (see the asyncio sketch below) -> [?]
- Working FastAPI backend -> [x]
- Working authentication method -> [ ]
- Working React front-end (optional) -> [ ]
- Working Docker deployment -> [x]
- Python 3.12+
- React + Vite
- BeautifulSoup
- Asyncio HTTP
- FastAPI
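On the "n pages" To Do item: since the stack already includes asyncio-based HTTP, the pagination could plausibly be fetched concurrently. A minimal sketch, assuming TMDB's `?page=` query parameter and using `aiohttp` (the helper names are hypothetical):

```python
# Sketch of fetching n listing pages concurrently instead of one at a time.
import asyncio
import aiohttp

async def fetch_page(session: aiohttp.ClientSession, page: int) -> str:
    async with session.get(
        "https://www.themoviedb.org/movie", params={"page": page}
    ) as resp:
        resp.raise_for_status()
        return await resp.text()

async def fetch_pages(n: int) -> list[str]:
    # One shared connection pool; the n requests run concurrently via gather().
    async with aiohttp.ClientSession(headers={"User-Agent": "Mozilla/5.0"}) as session:
        return await asyncio.gather(*(fetch_page(session, p) for p in range(1, n + 1)))

if __name__ == "__main__":
    pages = asyncio.run(fetch_pages(3))
    print(f"Fetched {len(pages)} pages of HTML")
```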
- Clone the repository:

  `git clone https://github.com/jsnieg/python-webscraper.git <optional directory>`

- Install dependencies (move into the directory first):

  `cd python-webscraper`

  `pip install -r requirements.txt`

- Running the backend:

  `python backend/src/run.py`, or `cd backend/src/api` then `fastapi run api.py`.
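For context, `fastapi run api.py` expects the module to expose a FastAPI instance named `app`. A minimal sketch of what that looks like (the placeholder route is an assumption, not the repo's actual endpoint):

```python
# backend/src/api/api.py, sketched: the FastAPI CLI imports this module
# and serves the `app` object it finds (docs at /docs, default port 8000).
from fastapi import FastAPI

app = FastAPI(title="python-webscraper")

@app.get("/")
def root() -> dict:
    # Placeholder route; the real API serves the scraped movie data.
    return {"status": "ok"}
```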
Work in Progress.
- Clone the repository:

  `git clone https://github.com/jsnieg/python-webscraper.git <optional directory>`

- Move into the directory:

  `cd <your directory>`

- Create a Python virtual environment (to not install packages system-wide):

  `python -m venv /path/to/new/virtual/env`

- Run:

  `/path/to/venv/<your venv>/bin/pip install -r requirements.txt`

- Test running:

  `/path/to/env/<your env>/bin/python backend/src/main.py` should run the script.

- Test running:

  `/path/to/env/<your env>/bin/fastapi run backend/src/api/api.py` should run FastAPI with docs; alternatively, replace `run` with `dev` for hot-reload.
- Download Docker Desktop (Ubuntu or Windows).

- Run `docker build -t webscraper .` and wait until it finishes.

- Run `docker run -d --name 'webscraper-container' -p 80:80 webscraper`; this launches a container.

- Connect to `127.0.0.1:80` or `127.0.0.1/docs`. Voila.
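The `docker build` step implies a Dockerfile at the repository root. It isn't shown here, so the following is only a guess at its shape, wiring the FastAPI entry point to the `-p 80:80` mapping above:

```dockerfile
# Hypothetical Dockerfile sketch; the repo's actual one may differ.
FROM python:3.12-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
# Serve on port 80 inside the container to match `-p 80:80`.
CMD ["fastapi", "run", "backend/src/api/api.py", "--port", "80"]
```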
Janusz Snieg
TBA
- This is the very first personal project where I sat down and actually coded something up using just documentation, AI only for troubleshooting, and answers found on Stack Overflow and in articles.

- When moving development onto Linux, I came across the OS refusing to install packages system-wide. Hence, I needed to move to virtual environments. As a challenge, I did not want to use `anaconda` but rather Python's built-in `venv`, and still be able to successfully run the script the same way it did on Windows.

- Performance as of 23/07/2025 isn't a concern for this project; maybe down the line. However, the aim of this was to learn basic/advanced usage of web scraping and API development.