
R webscraping tutorial

A quick introduction to webscraping in R using {rvest}, with a couple of use cases relevant to researchers.

R scripts

  • scrape-wiki.R shows an example of scraping multiple tables from a website (in this case Wikipedia) and joining them together based on a column with common values (see the first sketch after this list).
  • scrape-carjam.R shows an example of scraping tables from multiple webpages by changing an offset in the URL (see the second sketch after this list).
  • scrape-imdb.R shows an example of scraping a single table weekly, using a GitHub Action (in .github/workflows/update.yml) to automate the process and build up a dataset that shows changes over time (see the third sketch after this list).
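
The general pattern in scrape-wiki.R looks roughly like the sketch below. The URL, table indices, and join column here are placeholders, not the values used in the script:

```r
library(rvest)
library(dplyr)

# Hypothetical page: the actual article scraped is in scrape-wiki.R
url <- "https://en.wikipedia.org/wiki/Some_article"

# Read the page once, then extract every table on it as a list of tibbles
tables <- read_html(url) |>
  html_table()

# Keep the two tables of interest (indices are placeholders)
table_a <- tables[[1]]
table_b <- tables[[2]]

# Join them on the column the tables have in common ("Country" is a placeholder)
combined <- left_join(table_a, table_b, by = "Country")
```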
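
scrape-carjam.R builds one URL per page by pasting an offset onto a base URL. A minimal sketch of that pattern, with a made-up URL and offset step:

```r
library(rvest)
library(purrr)

# Hypothetical URL pattern and offsets: the real ones are in scrape-carjam.R
base_url <- "https://example.com/results?offset="
offsets  <- seq(0, 80, by = 20)

# Visit each offset page, grab its first table, and stack the results
all_pages <- map(offsets, function(offset) {
  Sys.sleep(1)  # be polite: pause between requests
  paste0(base_url, offset) |>
    read_html() |>
    html_table() |>
    pluck(1)
})

results <- list_rbind(all_pages)
```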
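
The weekly scrape in scrape-imdb.R boils down to appending the latest snapshot of a table to a growing CSV, which the GitHub Actions workflow then runs on a schedule. A minimal sketch of that idea, with placeholder URL, column, and file names:

```r
library(rvest)
library(dplyr)
library(readr)
library(purrr)

# Hypothetical page and output path: the real ones are in scrape-imdb.R
url      <- "https://www.imdb.com/chart/top/"
out_file <- "data/imdb.csv"

# Scrape the first table on the page and stamp each row with today's date
latest <- read_html(url) |>
  html_table() |>
  pluck(1) |>
  mutate(scrape_date = Sys.Date())

# Append to the accumulating dataset, writing headers only on the first run
write_csv(latest, out_file, append = file.exists(out_file))
```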

Data scraped from the IMDB example is saved in /data, and analyse.R can be used to analyse and plot the data to show changes over time.
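
A rough idea of what such an analysis can look like, assuming the dataset has scrape_date, title, and rating columns (placeholders, not necessarily the columns used in analyse.R):

```r
library(readr)
library(ggplot2)

# Hypothetical file and column names: see analyse.R for the real ones
imdb <- read_csv("data/imdb.csv")

# Plot each title's rating across scrape dates to show changes over time
ggplot(imdb, aes(x = scrape_date, y = rating, group = title)) +
  geom_line(alpha = 0.4) +
  labs(x = "Scrape date", y = "Rating")
```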

Other files

  • slides.qmd is a Quarto markdown document containing the code used to create the slides in the /docs directory, based on a couple of options in _quarto.yml, which are then published via GitHub Pages.
  • r-scraping-tutorial.Rproj is an RStudio project file, and .gitignore (created by RStudio) excludes certain files from being tracked by version control.
  • .nojekyll tells GitHub Pages not to apply any further processing to the 'website' used to host the slides.

Licence

All code in this repository is licensed under the MIT license.
