---
title: "Tidy Modeling with R: Chapter 7 (Workflow Classification)"
output:
  html_document: default
  pdf_document: default
---
```{r performance-setup, include = FALSE}
library(tidymodels)
library(tidyverse)
tidymodels_prefer()
```
This notebook works through chapter 7 of _Tidy Modeling with R_ (TMWR), which covers model workflows. Workflows encourage good methodology because they provide a single point of entry to the estimation components of a modeling process. They also help the user keep a project organized.
The data used is the Watson job attrition dataset, which is available from the `modeldata` package.
```{r data}
data(attrition)
glimpse(attrition)
```
We split the data 80/20 into training and test sets, stratified by the outcome variable `Attrition`.
```{r split}
set.seed(502)
attr_split <- initial_split(attrition, prop = 0.80, strata = Attrition)
attr_train <- training(attr_split)
attr_test <- testing(attr_split)
```
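To check that stratification behaved as intended, one can compare the class proportions of `Attrition` in the two partitions. The chunk below is a minimal sketch: it rebuilds the split under the hypothetical name `check_split` so that it runs on its own, and `class_props` is likewise a name introduced only for illustration.

```{r}
library(tidymodels)  # attaches rsample and dplyr, among others

# Rebuild the split so this chunk can run on its own
data(attrition, package = "modeldata")
set.seed(502)
check_split <- initial_split(attrition, prop = 0.80, strata = Attrition)

# Proportion of each outcome class within each partition
class_props <-
  bind_rows(
    train = count(training(check_split), Attrition),
    test  = count(testing(check_split), Attrition),
    .id = "partition"
  ) %>%
  group_by(partition) %>%
  mutate(prop = n / sum(n)) %>%
  ungroup()

class_props
```

If the split is stratified, the `prop` values for each class should be close across the two partitions.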
# Simple Logistic Regression Workflow
A workflow always requires a `parsnip` model object, so we create a logistic regression model with `parsnip`.
```{r}
attr_lm <-
  logistic_reg() %>%
  set_engine("glm") %>%
  set_mode("classification")
```
Let's add the model to a workflow.
```{r}
attr_workflow <-
  workflow() %>%
  add_model(attr_lm)
attr_workflow
```
We can add a formula as the preprocessor with the `add_formula()` function.
```{r}
attr_workflow <-
  attr_workflow %>%
  add_formula(Attrition ~ Age + DailyRate)
attr_workflow
```
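A formula that is already attached to a workflow can be replaced with `update_formula()`, which leaves the model specification untouched. The following is a sketch under stated assumptions: `wflow_sketch` is a hypothetical object created here so the chunk runs on its own, and the extra predictor `MonthlyIncome` is chosen only for illustration.

```{r}
library(tidymodels)

# A stand-alone workflow mirroring the one built above
wflow_sketch <-
  workflow() %>%
  add_model(logistic_reg() %>% set_engine("glm")) %>%
  add_formula(Attrition ~ Age + DailyRate)

# Swap in a new formula; the model specification is unchanged
wflow_sketch <-
  wflow_sketch %>%
  update_formula(Attrition ~ Age + DailyRate + MonthlyIncome)

# Inspect the preprocessor (here, the formula) stored in the workflow
extract_preprocessor(wflow_sketch)
```

Updating the preprocessor discards any existing fit, so a workflow changed this way has to be fitted again before predicting.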
We can then fit the model through the workflow object.
```{r}
lm_fit <- fit(attr_workflow, attr_train)
tidy(lm_fit)
```
Once a workflow has been fitted, we can use it to make predictions.
```{r}
predict(lm_fit, attr_test %>% slice(1:3))
```
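By default, `predict()` on a fitted classification workflow returns hard class predictions in a `.pred_class` column. Passing `type = "prob"` returns class probabilities instead. The chunk below is a minimal, self-contained sketch; the names `sketch_split`, `sketch_fit`, `new_rows`, and `class_probs` are hypothetical and exist only so the example does not overwrite objects defined earlier.

```{r}
library(tidymodels)

# Recreate the data split and the fitted workflow so this chunk runs on its own
data(attrition, package = "modeldata")
set.seed(502)
sketch_split <- initial_split(attrition, prop = 0.80, strata = Attrition)

sketch_fit <-
  workflow() %>%
  add_model(logistic_reg() %>% set_engine("glm")) %>%
  add_formula(Attrition ~ Age + DailyRate) %>%
  fit(data = training(sketch_split))

new_rows <- testing(sketch_split) %>% slice(1:3)

# One probability column per outcome level: .pred_No and .pred_Yes
class_probs <- predict(sketch_fit, new_rows, type = "prob")

# Show hard class predictions and probabilities side by side
bind_cols(predict(sketch_fit, new_rows), class_probs)
```

The probabilities are useful when the default 50% cutoff is not appropriate, for example with an imbalanced outcome.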
# Evaluating the Test Set
Let's say that we've concluded our model development and have settled on a final model. The convenience function `last_fit()` fits the model to the entire training set and evaluates it on the test set.
```{r}
final_lm_res <- last_fit(attr_workflow, attr_split)
final_lm_res
```
The `.workflow` column contains the fitted workflow and can be pulled out of the results using:
```{r}
fitted_lm_wflow <- extract_workflow(final_lm_res)
```
Similarly, `collect_metrics()` and `collect_predictions()` provide access to the performance metrics and predictions, respectively.
```{r}
collect_metrics(final_lm_res)
```
```{r}
collect_predictions(final_lm_res) %>% slice(1:5)
```
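The collected predictions can be passed to other `yardstick` functions. As one possible follow-up, the sketch below builds a confusion matrix with `conf_mat()`, which can be more informative than accuracy alone when one outcome class is much rarer than the other. The object names prefixed with `sketch_` are hypothetical; they recreate the workflow and split so that the chunk runs on its own.

```{r}
library(tidymodels)

# Recreate the split and the workflow so this chunk runs on its own
data(attrition, package = "modeldata")
set.seed(502)
sketch_split <- initial_split(attrition, prop = 0.80, strata = Attrition)

sketch_wflow <-
  workflow() %>%
  add_model(logistic_reg() %>% set_engine("glm")) %>%
  add_formula(Attrition ~ Age + DailyRate)

# Fit on the training set, predict on the test set
sketch_res <- last_fit(sketch_wflow, sketch_split)
sketch_preds <- collect_predictions(sketch_res)

# Cross-tabulate predicted classes against the observed outcome
sketch_cm <- conf_mat(sketch_preds, truth = Attrition, estimate = .pred_class)
sketch_cm
```

Each cell counts test-set rows, so the cells sum to the number of rows in the test set.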
# References
1. Max Kuhn and Julia Silge, _Tidy Modeling with R_, chapter 7. https://www.tmwr.org/workflows.html (2022)