---
format: html
toc: TRUE
toc-location: right
toc-depth: 4
toc-expand: true
sidebar: false
---

<div style="text-align: right;">
[![AGPL-3.0][agpl3-shield]][agpl3]
[![CC BY-SA 4.0][cc-by-sa-shield]][cc-by-sa]
</div>
### Cytometry in R: A Course for Beginners
Cytometry in R is a free virtual mini-course being organized by the [Flow Cytometry Shared Resource](https://www.medschool.umaryland.edu/cibr/core/umgccc_flow/) core at the University of Maryland's [Greenebaum Comprehensive Cancer Center](https://www.umms.org/umgccc). This course is a passion project arising from our desire to contribute back to the community. We are excited that you have chosen to take part and look forward to helping you get started on your own learning journey.
Course materials can be found [here](/course/00_GitHub/index.qmd) or via the Course tab in the navigation bar. The livestream recordings are available via [YouTube](https://www.youtube.com/@CytometryInR).
<br>
### Pre-Course Walkthroughs
[**Week 0: January 26, 2026**]{.underline} In these pre-course [walk-throughs](/course/00_GitHub/index.qmd), we ensure that everyone creates a [GitHub](https://github.com/) account and has their computer properly set up with the required software (including [R](https://cran.rstudio.com/), [Positron](https://positron.posit.co/), and [Git](https://git-scm.com/)). We then start to build each participant's familiarity with the software infrastructure that they will be using throughout the rest of the course.
<br>
<br>
### Installing R Packages
[**Week 1: February 2, 2026**]{.underline} During this [first session](/course/01_InstallingRPackages/index.qmd), we learn how to install R packages from the various repositories ([CRAN](https://cran.r-project.org/), [Bioconductor](https://www.bioconductor.org/), [GitHub](https://github.com/r-lib/remotes)), and how to troubleshoot the most common errors that occur during this process.
<br>
<br>
### File Paths
[**Week 2: February 9, 2026**]{.underline} For this [second session](/course/02_FilePaths/index.qmd), we focus on how to programmatically tell your computer where to locate your experimental files, introducing the concept of [file paths](https://ytakemon.github.io/2019-10-22-R-BCCRC/02-filedir/). We explore how the various operating systems (Linux, macOS, Windows) specify their respective folders and files, and how to identify where you currently are within the directory tree. Our goal by the end of this session is to have walked you through how to figure out where an .fcs file of interest is stored, and how to convey to your computer where you want it copied or moved to, without encountering the common pitfalls.
<br>
<br>
### Inside an .FCS file
[**Week 3: February 16, 2026**]{.underline} In the course of this [third session](/course/03_InsideFCSFile/index.qmd), we will slice into an .fcs file and examine the individual components that make it up. In the process, we will cover the main [data structures](http://adv-r.had.co.nz/Data-structures.html) within R (vectors, matrices, data.frames, lists) and how to identify which one we are working with. Additionally, we will explore how various cytometry software packages store their metadata variables under various keywords that can be useful to know about.
<br>
<br>
### Introduction to the Tidyverse
[**Week 4: February 23, 2026**]{.underline} Within this [fourth session](/course/04_IntroToTidyverse/index.qmd), we explore how the various [tidyverse](https://tidyverse.org/) packages can be utilized to reorganize rows and columns of data in ways that are useful for data analysis. We will primarily work with the MFI expression data we isolated from within the .fcs file in the previous session, identifying and isolating events that meet certain criteria. We introduce the concepts behind ["tidy"](https://vita.had.co.nz/papers/tidy-data.pdf) data and how it can improve our workflows.
<br>
<br>
### Gating Sets
[**Week 5: March 2, 2026**]{.underline} As part of this [fifth session](/course/05_GatingSets/index.qmd), we learn about the two main flow cytometry infrastructure packages in R we will be working with during the course, [flowCore](https://www.bioconductor.org/packages/release/bioc/vignettes/flowCore/inst/doc/HowTo-flowCore.pdf) and [flowWorkspace](https://www.bioconductor.org/packages/release/bioc/vignettes/flowWorkspace/inst/doc/flowWorkspace-Introduction.html). Throughout the session, we will compare how they differ in naming, memory usage, and accessing .fcs file metadata. We additionally explore how to add keywords to their respective metadata for use in filtering specimens of interest from the larger set of .fcs files.
<br>
<br>
### Visualizing with ggplot2
[**Week 6: March 16, 2026**]{.underline} During this [sixth](/course/06_Visualizing/index.qmd) session, we provide an introduction to the [ggplot2](https://ggplot2.tidyverse.org/) package. We will take the datasets we have collected from the previous sessions and see how, by varying the arguments at the respective plot layers, we can produce and customize many different forms of plots, focusing on both cytometry and statistics plots. We close by providing links to [additional helpful resources](https://youtu.be/_indbXPXUw8?si=iZRFHzWvBZg-wu_X) and highlighting the [TidyTuesday](https://github.com/rfordatascience/tidytuesday) project.
<br>
<br>
### Applying Transformations
[**Week 7: March 23, 2026**]{.underline} For this [seventh](/course/07_Transformations/index.qmd) session, we take a closer look at the raw values of the data within our .fcs files, and explore the various ways to [transform](https://docs.flowjo.com/flowjo/graphs-and-gating/gw-transform-overview/) (i.e., scale) flow cytometry data in R to better visualize "positive" and "negative" populations. In the process, we visualize the differences resulting from applying different transformations commonly used by commercial software.
<br>
<br>
### Conference Break 1
No class the week of March 30, 2026. If you are attending the [ABRF conference](https://web.cvent.com/event/6aeb3907-0f0b-418d-a0d5-91f4de72c144/summary?RefId=ABRF%202026%20Annual%20Meeting%20Home%20Page), track me down at the [Complex Data Analysis in Flow Cytometry: Navigating the Landscape](https://web.cvent.com/event/6aeb3907-0f0b-418d-a0d5-91f4de72c144/websitePage:89d4bbd7-0f7c-4235-a335-97866af9506b) talk on Monday, March 30th at 4:30 PM.
<br>
<br>
### Manual and Automated Gating
[**Week 8: April 6, 2026**]{.underline} Within this [eighth](/course/08_WaysToGate/index.qmd) session, we explore various ways to implement gating for flow cytometry files in R. We will explore manual approaches utilizing [flowGate](https://www.bioconductor.org/packages/release/bioc/html/flowGate.html), as well as automated options with [openCyto](https://www.bioconductor.org/packages/release/bioc/vignettes/openCyto/inst/doc/HowToAutoGating.html) and its gating templates. We will additionally explore how to provide gate constraints, and various ways to visually screen and evaluate the outcomes within the context of our own projects.
<br>
<br>
### It's Raining Functions!
[**Week 9: April 13, 2026**]{.underline} In the course of this [ninth](/course/09_Functions/index.qmd) session, we tackle one of the harder but most useful concepts for a beginner to learn, namely [functions](https://r4ds.had.co.nz/functions.html). We explore what they are, how their individual arguments work, how they differ from for-loops, and how to create our own to do useful work and reduce the number of times code gets copied and pasted. Additionally, we introduce some functional programming best practices, as well as how to use the walk and map functions from the [purrr](https://purrr.tidyverse.org/) package.
<br>
<br>
### Downsampling and Concatenation
[**Week 10: April 20, 2026**]{.underline} Within this [tenth](/course/10_Downsampling/index.qmd) session, we will expand on our growing understanding of GatingSets, functions, and .fcs file internals to write a function that downsamples your .fcs files to a desired number (or percentage) of cells for a given cell population. We will additionally learn how to concatenate these downsampled files together and save them to a new .fcs file in such a way that the metadata can be read by commercial software without the scaling being wildly thrown off.
<br>
<br>
### Retrieving data for Statistics
[**Week 11: April 27, 2026**]{.underline} Leveraging our increased familiarity with the various packages thus far in the course, in this session we will retrieve summary statistics for the gates within our GatingSet, and programmatically derive tidy data.frames for use in the statistical analyses typically used by many immunologists. In the process, we add a couple of additional plot types to our ggplot2 arsenal to hold in reserve should Prism prices go up again.
<br>
<br>
### Spectral Signatures
[**Week 12:**]{.underline} As part of this session, we will explore how to extract fluorescent signatures from our raw spectral flow cytometry reference controls. Building on prior concepts, we will learn to isolate median signatures from positive and negative gates, and how to derive and plot normalized signatures. We also introduce the [plotly](https://plotly.com/r/) package and its interactive plotting features, before showcasing various packages' attempts at facilitating signature retrieval.
<br>
<br>
### Similarities and Hotspots
[**Week 13:**]{.underline} During this session, we will utilize the spectral signature matrix isolated from raw spectral flow cytometry controls and compare different ways of evaluating how similar different fluorescent signatures are to each other. In the process, we will gain a better understanding of the metrics behind similarity (cosine), panel complexity (kappa), and unmixing-dependent spreading (collinearity).
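
As a rough preview of the first of these metrics, here is a sketch in our own notation (the symbols $a$, $b$, and $n$ are assumptions for illustration, not necessarily the form used by any particular package). The cosine similarity between two fluorescent signatures $a$ and $b$, each a vector of normalized intensities across $n$ detectors, is commonly written as:

$$
\cos(\theta) = \frac{a \cdot b}{\lVert a \rVert \, \lVert b \rVert} = \frac{\sum_{i=1}^{n} a_i b_i}{\sqrt{\sum_{i=1}^{n} a_i^2}\,\sqrt{\sum_{i=1}^{n} b_i^2}}
$$

For non-negative signatures, values close to 1 indicate nearly identical spectral shapes, while values near 0 indicate little overlap.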
<br>
<br>
### Unmixing in R
[**Week 14:**]{.underline} In the course of this session, we will attempt what is a reach goal for many, namely unmixing raw .fcs files using the spectral signatures we have isolated from our unmixing controls, and writing the results to new .fcs files. After evaluating the necessary internals, we will explore how various current cytometry R packages have implemented their own unmixing functions, and the various limitations that each approach has encountered.
<br>
<br>
### Cleaning Algorithms
[**Week 15:**]{.underline} In the span of this session, we will directly compare how various Bioconductor data cleanup algorithms (namely [PeacoQC](https://www.bioconductor.org/packages/release/bioc/vignettes/PeacoQC/inst/doc/PeacoQC_Vignette.pdf), [flowAI](https://www.bioconductor.org/packages/release/bioc/vignettes/flowAI/inst/doc/flowAI.html), [flowCut](https://www.bioconductor.org/packages/release/bioc/vignettes/flowCut/inst/doc/flowCut.html), and [flowClean](https://www.bioconductor.org/packages/release/bioc/vignettes/flowClean/inst/doc/flowClean.pdf)) tackle distinguishing and removing bad quality events. We will see how they perform with previously identified good quality and horrific quality .fcs files. We will evaluate whether the implemented algorithmic decisions made sense, and how to customize them within our workflows to achieve our own desired goals.
<br>
<br>
### Clustering Algorithms
[**Week 16:**]{.underline} As part of this session, we venture away from supervised and semi-supervised analyses to explore unsupervised clustering approaches, namely [FlowSOM](https://onlinelibrary.wiley.com/doi/10.1002/cyto.a.22625) and [Phenograph](https://www.colibri-cytometry.com/post/the-peculiarities-of-phenograph). We will compare outcomes depending on the markers included, transformations applied, and panel used, to gain a greater familiarity with how they work. We wrap up by investigating ways to visualize marker expression of the cells ending up in each cluster, and how to backgate them to our manual gates.
<br>
<br>
### Conference Break 2
No class the week of June 8, 2026. If you are attending the [Cyto conference](https://www.cytoconference.org/?gad_source=1&gad_campaignid=20633392465&gbraid=0AAAAADoJzsvHaLZAq9tqn_aTAQGEzIk_V&gclid=CjwKCAiA-sXMBhAOEiwAGGw6LJFFV69xaAU3s7bElL86RdnRNFwAYqOQO78MrIYQuG1qvRU6HTN3ZRoCGmAQAvD_BwE), track me down at my talks ([Open-Source automation](https://davidrach.github.io/abstracts.html#cyto-2026---flow-awarenesss) on June 7, 10:30-11:30 AM at Grand Ballroom; and [Semi-supervised pipeline](https://davidrach.github.io/abstracts.html#cyto-2026---alpha-beta) on June 9, 10:30-11:45 AM at Room 2DEF) or poster (grab some Cytometry in R course hex stickers!).
<br>
<br>
### Normalization: Batch Effect or Real Biology
[**Week 17:**]{.underline} During this session, we will dive into evaluating the performance of two commonly used normalization algorithms, [CytoNorm](https://github.com/saeyslab/CytoNorm) and [cyCombine](https://github.com/biosurf/cyCombine). We will utilize our ggplot2 and functional programming toolkits to create a customized workflow to visualize the differences for our respective cell populations before and after normalization, to better evaluate how the respective parameter choices can affect the process.
<br>
<br>
### Dimensionality Visualization
[**Week 18:**]{.underline} For this session, we explore how the dimensionality visualization algorithms [tSNE](https://github.com/lvdmaaten/bhtsne/) and [UMAP](https://github.com/jlmelville/uwot) perform in R using our raw and unmixed samples. In the process, we will explore how the markers included, the number of cells, and the presence of bad quality events can impact the [final](https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1011288) visualizations. Finally, we will provide an overview of how to link to Python to additionally run [PaCMAP](https://github.com/YingfanWang/PaCMAP) and [PHATE](https://github.com/KrishnaswamyLab/PHATE) visualizations for use in R.
<br>
<br>
### Annotating Unsupervised Clusters
[**Week 19:**]{.underline} In the course of this session, we explore ways to scale our efficiency in figuring out what an unsupervised cluster of cells may be, by employing several annotation packages. We explore how these work under the hood in their decision-making process, and how to link them to reference data from external repositories for additional evaluation.
<br>
<br>
### The Art of GitHub Diving
[**Week 20:**]{.underline} Within this session, we delve into the art of investigating a new-to-you GitHub repository. We discuss the overall structure of R packages stored as source files within GitHub repositories, and how to leverage this knowledge when troubleshooting errors thrown by underdocumented R packages. We discuss how to modify the functions identified, evaluate them, and the process of submitting helpful bug reports back to the original project to help fix the issue.
<br>
<br>
### XML Files All The Way Down
[**Week 21:**]{.underline} Breaking news alert: most of the experiment templates and worksheet layouts we work with as cytometrists are .xml files. In this session, we learn some additional coding tools that allow us to work with these types of files and extract useful data. We then test out our new problem-solving abilities by retrieving data from SpectroFlo and Diva .xml files to monitor how our core's flow cytometers behaved for various users last week.
<br>
<br>
### Utilizing Bioconductor packages
[**Week 22:**]{.underline} Many of the R packages for Flow Cytometry we have utilized in this course were packages from the [Bioconductor](https://www.bioconductor.org/) project. We take a look at what makes Bioconductor packages unique compared to packages found on GitHub and CRAN, explore some of their specific infrastructure types for flow cytometry data, and highlight some useful packages for downstream analysis that we haven't had time to properly explore.
<br>
<br>
### Building your First R package
[**Week 23:**]{.underline} For most of the course, we have been working with R packages that other individuals built and maintained. In this session, we leverage all your hard work from the rest of the course and corral the unwieldy arsenal of functions you wrote into your [first R package](https://r-pkgs.org/introduction.html) for easier use. We will discuss the individual pieces of an R package, the importance of a well-setup namespace file, and how to generate help page manuals to refer future-you back to what your individual function arguments actually do.
<br>
<br>
### Everyone Gets a Quarto Website
[**Week 24:**]{.underline} In this session, we will build on the knowledge of .R and .qmd files you have gained from the course to create your own website using [Quarto](). We discuss the additional files that are required, how to customize and render the website locally, and finally how to set up a [Quarto Pub](https://quartopub.com/) or [GitHub Pages](https://docs.github.com/en/pages) website that we can access online.
<br>
<br>
### Conference Break 3
No class the week of August 10, 2026. If you are attending the [BioC conference](https://bioc2026.bioconductor.org/), track me down at my talk/poster.
<br>
<br>
### Reproducibility and Replicability
[**Week 25:**]{.underline} Throughout the course, we emphasized the importance of making your workspaces and code reproducible and replicable. But what do we mean by these terms, and are there best practices we could add to our existing workflow to do this more efficiently? We explore a couple of community-led efforts within the cytometry space and troubleshoot their implementation into a previously published pipeline.
<br>
<br>
### Open Source Licenses
[**Week 26:**]{.underline} For this course, we have relied extensively on open-source software to create our own data analysis pipelines. In the process, you may have some recollection of the various license names. But what impact do all these different names have in the end? We take a brief deep-dive into the ecosystem of [free and open-source licenses](https://en.wikipedia.org/wiki/Free_and_open-source_software), and evaluate what their respective license terms mean for us as individual users of the code, as well as potential developers extending existing codebases.
<br>
<br>
### Validating Algorithmic Tools
[**Week 27:**]{.underline} We will be the first to admit, new implementations of algorithms as R packages are awesome! We appreciate the effort that went into them and into making them available to the community at large. But what is the best way of evaluating whether they behave as promised, or work for our dataset? During this session, we share tips and tricks for gaining a better understanding of how a new R package works, and things to watch out for when evaluating complicated algorithms. We wrap up with a walkthrough of how to generate simulated datasets with known distributions for use in testing.
<br>
<br>
### Databases and Repositories
[**Week 28:**]{.underline} During this session, we will learn how to identify and retrieve .fcs files from databases. While many of us are accustomed to working with large datasets of our own making, we are increasingly encountering larger-than-memory datasets, as well as files stored in large repositories. We will explore several database-focused R [packages](https://arrow.apache.org/docs/r/), before investigating how to identify and retrieve .fcs files and associated metadata of interest from repositories, namely [ImmPort](https://www.immport.org/shared/) (and maybe [FlowRepository](https://flowrepository.org/) if it can be pinged that afternoon).
<br>
<br>
### Assembling Web Data
[**Week 29:**]{.underline} In this session, we briefly delve into the concepts of web-scraping and APIs in general. We highlight useful packages, namely [httr2](https://httr2.r-lib.org/) and [rvest](https://rvest.tidyverse.org/), and best practices implemented to allow respectful retrieval of useful data without crashing someone's server like some AI startup bot. We finish by providing a list of additional useful resources for those interested in learning more.
<br>
<br>
### Future Directions
[**Week 30:**]{.underline} In this final planned session, we revisit our solutions to the challenge problems set out at the beginning of the course. We also discuss potential topics to visit in the future, and any additional resources that proved helpful throughout the course.
<br>
<br>
<div style="text-align: right;">
[![AGPL-3.0][agpl3-image]][agpl3]
[![CC BY-SA 4.0][cc-by-sa-image]][cc-by-sa]
</div>
[cc-by-sa]: http://creativecommons.org/licenses/by-sa/4.0/
[cc-by-sa-image]: https://licensebuttons.net/l/by-sa/4.0/88x31.png
[cc-by-sa-shield]: https://img.shields.io/badge/License-CC%20BY--SA%204.0-lightgrey.svg
[agpl3]: https://www.gnu.org/licenses/agpl-3.0.en.html
[agpl3-image]: https://www.gnu.org/graphics/agplv3-with-text-162x68.png
[agpl3-shield]: https://img.shields.io/badge/license-AGPLv3-blue