Thank you for volunteering to teach this one-hour session on using the pandas library to analyze data. This teaching guide explains our setup and the suggested material to cover.
The class is one hour long. The exercises live in this Jupyter notebook.
In this session, you'll learn how to analyze data using the popular Python data analysis library pandas. You'll learn about the benefits of scripting your data projects and enough syntax to load, sort, filter and group a data set.
This class is good for: People who are comfortable working with data in spreadsheets or SQL and want to make the leap to programming.
Attendees should leave with a basic understanding of:
- How to write and run Python code in a Jupyter notebook
- When it makes sense to script your analysis (as opposed to just using Excel, SQL, etc.)
- Loading a CSV into a `pandas` dataframe
- Inspecting the dataframe with `head()`, `describe()` and other methods
- Sorting data with `sort_values()`
- Filtering data
- Grouping data (if time)
- How to find help when they get stuck
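If it helps to see the core objectives in one place before class, here is a minimal sketch of the load, inspect, sort, filter and group workflow. It assumes a made-up salary table with hypothetical `name`, `team` and `salary` columns, read from an in-memory string so it runs on its own; in the notebook you would pass a file path to `pd.read_csv()` instead.

```python
import io

import pandas as pd

# Made-up sample data standing in for the class CSV (column names are assumptions)
csv_text = """name,team,salary
Ada,NYY,500000
Ben,BOS,2500000
Cal,NYY,12000000
Dee,BOS,750000
"""

# Load the CSV into a dataframe (in class, this would be a path to a file)
df = pd.read_csv(io.StringIO(csv_text))

# Inspect: first rows, then summary statistics for the numeric columns
print(df.head())
print(df.describe())

# Sort from highest to lowest salary
by_salary = df.sort_values("salary", ascending=False)
print(by_salary)

# Filter: keep only the rows where the team is NYY
yankees = df[df["team"] == "NYY"]
print(yankees)

# Group: total salary per team
payroll = df.groupby("team")["salary"].sum()
print(payroll)
```

The filter step is worth slowing down on: `df["team"] == "NYY"` produces a column of `True`/`False` values, and putting that inside `df[...]` keeps only the `True` rows.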
I Do, We Do, You Do. Demonstrate a concept, go through it together, then give them plenty of time to experiment on their own while you and your coach walk around and answer questions (see sections marked ✍️ Try it yourself). The pace will be slower than you think, and that's OK! It's not the end of the world if you don't get through everything.
Most people who come to this class will have zero experience with programming, so be empathetic and try to remember how frustrating it is to feel lost.
Having the students open the included syntax reference notebook can be useful for reinforcing some basics.
We'll have the latest version of Python 3 installed. We're using uv to manage the virtual environment and project dependencies (jupyterlab and pandas), which will already have been installed and tested prior to your session.
Begin the class by (slowly!) walking everyone through the process of activating their virtual environments and launching JupyterLab. (If you prefer to use a different tool, like the Jupyter extension in VS Code or whatever, that's fine too. Or, if you're in a "BYO laptop" lab, use a tool like Colab, which can load files directly from GitHub repos.)
- Open Terminal (or `cmd` on a PC)
- `cd` into your class directory
- `uv run jupyter lab`
It will take everyone a few minutes to get going. You'll also probably get some questions about what, exactly, you're doing at this step. Try to avoid a lengthy digression into virtual environments, if you can -- it's beyond the scope of this hourlong session, so maybe offer to talk to them after class, or send 'em our way: training@ire.org.
Once everyone is good to go, toggle back to the terminal and show them what's going on: A Jupyter server is running in the background, so don't close that terminal window!
Go over some notebook basics: Adding cells, writing code and running cells, etc. A common beginner gotcha: Writing code that other cells depend on but forgetting to first run it to make it available.
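To demonstrate that gotcha, a sketch like the one below can help. It flattens a hypothetical two-cell notebook into one script: the first cell (shown only as a comment, with a made-up `salaries.csv` file name) was written but never run, so the second cell fails because the name it depends on was never created.

```python
# Cell 1 (written, but never run):
#     salaries = pd.read_csv("salaries.csv")
#
# Cell 2 (the student runs this one):
try:
    salaries.head()
    error_message = None
except NameError as err:
    # The bottom line of the traceback would read:
    # NameError: name 'salaries' is not defined
    error_message = str(err)

print(error_message)
```

The fix is simply to run Cell 1 first (or use "Run All" from the Run menu), which is a good moment to point out the execution counters next to each cell.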
Start working your way down the notebook: Importing pandas, loading data from file, sorting, filtering, grouping. Pause frequently to ask if anyone has questions.
Any time you see ✍️ Try it yourself, hit the brakes and give everyone time to play around with whatever concept you're discussing.
If you can, find an opportunity when someone has gotten an error and take 5 minutes to walk through basic debugging strategy: Reading the traceback error from bottom to top, strategic Googling, etc.
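If nobody hits an error on their own, you could stage one. This sketch assumes a made-up dataframe with a lowercase `salary` column and a typo (`"Salary"`) in the lookup; it captures the full traceback and pulls out its last line, which is where students should start reading.

```python
import traceback

import pandas as pd

# Made-up data; note the column is "salary" (lowercase)
df = pd.DataFrame({"name": ["Ada", "Ben"], "salary": [500000, 2500000]})

tb = ""
try:
    df["Salary"]  # typo: capital S
except KeyError:
    # Capture the full traceback text, exactly as Jupyter would display it
    tb = traceback.format_exc()

# Read from the bottom up: the last line names the error type and the bad key
last_line = tb.strip().splitlines()[-1]
print(last_line)
```

The lines above the last one point into pandas internals, which is typical: the useful clue is at the bottom, and the typo itself is in the student's own cell.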
Unlikely! But if you have extra time, you can ask them to load the combined file of MLB salaries to take their analysis up one level, compare over time, etc.
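For the extension exercise, one way to compare over time is to group by year. This is only a sketch: it assumes the combined file has a `year` column alongside `salary` (column names are assumptions, and the rows below are made up rather than taken from the real file).

```python
import io

import pandas as pd

# Made-up stand-in for the combined multi-year file ("year" column is an assumption)
combined_csv = """year,name,team,salary
2020,Ada,NYY,500000
2020,Ben,BOS,2500000
2021,Ada,NYY,800000
2021,Ben,BOS,3000000
2021,Cal,NYY,12000000
"""

combined = pd.read_csv(io.StringIO(combined_csv))

# One row per year: median salary and number of players
by_year = combined.groupby("year")["salary"].agg(["median", "count"])

# Year-over-year percent change in the median (the first year has nothing to compare to)
by_year["median_change_pct"] = by_year["median"].pct_change() * 100
print(by_year)
```

Medians are a reasonable default here because a few very large contracts can pull an average far away from what a typical player earns.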
- Have everyone close out of their notebook tabs
- In the terminal, press `Ctrl+C` to kill the server process
- Close the terminal window
- Clone or download/unzip this repo onto your computer
- In your command-line interface, `cd` into the folder
- `uv sync`
- `uv run jupyter lab`