fix: rework broken Academy exercises #2180
Also fix stock unit counts and make some tests more benevolent.
b56a4fc to 0e71496
Discovered apify/crawlee-python#1673 when working on this.
sources/academy/webscraping/scraping_basics_python/exercises/crawlee_netflix_ratings.py
Wikipedia is unreliable as a scraping target; people are advised to use its API to get the content of articles. Also, after recent security issues, GitHub tightened up the npm registry. So this PR is mostly about replacing the exercises that target those two websites.
Meanwhile, IMDb changed its structure, so I fixed that here too. I also had to change some tests, as they proved to be too draconian.
Some of the time I spent on this was not fruitful: I reworked the examples to use the UNESCO website, but it later proved to be very unreliable. I also found apify/crawlee-python#1673 and spent some time debugging it. Fixes #2113, at least for now 😅
This is proof from my local machine:
I have no idea whether all of the exercises will work correctly from the data center IPs of GitHub Actions; we'll see once the tests run there. But at least they'll work for students trying to pass the courses.
Note
Shifts Academy scraping exercises to more reliable targets and aligns code, lessons, and tests.
IMO members (listing), WTA rankings (links and player birthplaces), and GitHub Topics LLM projects (JS); remove old Wikipedia/npm scripts and add new implementations for both JS and Python
[…] .eq(); adjust headings and sample outputs
[…] ipc-title-link-wrapper, only process first 5 films, and export dataset; mirror changes in Python version
[…] Mon, stock count expectation 77 → 76, add quieter uv -q, and adapt JSON filtering threshold (min_price > 50000)
Written by Cursor Bugbot for commit d1cc411.
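For readers unfamiliar with the JSON filtering mentioned in the summary, here is a minimal sketch of what filtering on min_price > 50000 could look like. Only the field name min_price and the threshold come from the summary above; the function name filter_expensive, the title field, and the sample records are hypothetical and not taken from the PR.

```python
import json


def filter_expensive(products, threshold=50000):
    """Keep only items whose min_price is strictly above the threshold."""
    return [item for item in products if item["min_price"] > threshold]


# Hypothetical sample data; the real exercise works with the scraped dataset.
sample = json.loads("""
[
  {"title": "Item A", "min_price": 7495},
  {"title": "Item B", "min_price": 134900},
  {"title": "Item C", "min_price": 50000}
]
""")

expensive = filter_expensive(sample)
print(json.dumps(expensive, indent=2))
```

The comparison is strict, so an item priced at exactly 50000 is excluded.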