A quick introduction to web scraping in R using {rvest}, with a couple of use cases relevant to researchers.
- `scrape-wiki.R` shows an example of scraping multiple tables from a website (in this case Wikipedia) and joining them together based on a column with common values.
- `scrape-carjam.R` shows an example of scraping tables from multiple webpages using an offset in the URL.
- `scrape-imdb.R` shows an example of scraping a single table weekly and using a GitHub Action (in `.github/workflows/update.yml`) to automate this process, in order to build a dataset that shows changes over time.
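The first two techniques can be sketched roughly as follows, assuming the {rvest} and {dplyr} packages. To keep the sketch runnable offline, it parses a made-up HTML page with `minimal_html()` instead of fetching Wikipedia. The column names, values, the `scrape_offset_page()` helper and the `https://example.com/listing?offset=` URL are all hypothetical, and this is not the code in `scrape-wiki.R` or `scrape-carjam.R`.

```r
library(rvest)
library(dplyr)

# Made-up page with two tables that share a "Code" column.
# In practice the page would come from read_html() on a real URL.
page <- minimal_html('
  <table>
    <tr><th>Code</th><th>Area</th></tr>
    <tr><td>A</td><td>10</td></tr>
    <tr><td>B</td><td>20</td></tr>
    <tr><td>C</td><td>30</td></tr>
  </table>
  <table>
    <tr><th>Code</th><th>Population</th></tr>
    <tr><td>C</td><td>300</td></tr>
    <tr><td>A</td><td>100</td></tr>
    <tr><td>B</td><td>200</td></tr>
  </table>
')

# html_table() returns a list with one tibble per <table> element
tables <- html_table(page)

# Join the two tables on the column whose values they have in common
joined <- inner_join(tables[[1]], tables[[2]], by = "Code")

# Hypothetical paginated listing that takes an `offset` query parameter
# and shows 20 rows per page: build one URL per page.
base_url <- "https://example.com/listing?offset="
offsets <- seq(0, 60, by = 20)
urls <- paste0(base_url, offsets)

# Hypothetical helper: fetch one page and return its first table
scrape_offset_page <- function(url) {
  Sys.sleep(1)  # pause between requests to be polite to the server
  html_table(read_html(url))[[1]]
}

# Requires network access, so it is left commented out here:
# all_rows <- bind_rows(lapply(urls, scrape_offset_page))
```

`inner_join()` keeps only rows whose key appears in both tables; `left_join()` would keep every row of the first table instead.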
Data scraped in the IMDB example is saved in `/data`, and `analyse.R` can be used to analyse and plot the data to show changes over time.
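One way weekly snapshots might be accumulated and compared is sketched below, assuming {dplyr}. The `append_snapshot()` helper, the `scraped_on` column, and the film titles and ranks are hypothetical, and a temporary file stands in for a CSV in `/data`; `scrape-imdb.R` and `analyse.R` may well work differently.

```r
library(dplyr)

# Hypothetical helper: stamp a freshly scraped table with a date and append
# it to a CSV, writing the header row only when the file does not exist yet.
append_snapshot <- function(snapshot, path, date = Sys.Date()) {
  snapshot$scraped_on <- as.character(date)
  file_exists <- file.exists(path)  # check once, before writing
  write.table(snapshot, path, sep = ",", row.names = FALSE,
              col.names = !file_exists, append = file_exists)
  invisible(snapshot)
}

# Two made-up weekly snapshots of a ranked list
path <- tempfile(fileext = ".csv")
append_snapshot(data.frame(title = c("Film A", "Film B"), rank = c(1, 2)),
                path, as.Date("2024-01-01"))
append_snapshot(data.frame(title = c("Film A", "Film B"), rank = c(2, 1)),
                path, as.Date("2024-01-08"))

# Read the accumulated history and compute each title's week-on-week change
history <- read.csv(path)
changes <- history %>%
  arrange(title, scraped_on) %>%
  group_by(title) %>%
  mutate(rank_change = rank - lag(rank)) %>%
  ungroup()
```

Because the dates are stored as ISO 8601 strings, sorting them as text also sorts them chronologically.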
`slides.qmd` is a Quarto markdown document containing the code used to create the slides in the `/docs` directory, based on a couple of options in `_quarto.yml`; the slides are then published here via GitHub Pages. `r-scraping-tutorial.Rproj` is an RStudio project file, and `.gitignore` is created by RStudio to exclude certain files from being tracked by version control. `.nojekyll` tells GitHub Pages not to do any further processing of the 'website' used to host the slides.
All code in this repository is licensed under the MIT license.