Commit f813066

Merge branch 'main' into nsudh-dat-init
2 parents 836bb15 + 5a3456c commit f813066

15 files changed: +499 -258 lines changed


DESCRIPTION

Lines changed: 1 addition & 1 deletion
@@ -11,7 +11,7 @@ Description: Includes datasets for the Residential Energy Consumption
 Survey (2015 and 2020), the American National Election Studies (2020),
 and the National Crime Victimization Study (2021) - household, person,
 and incident files.
-License: CC BY 4.0
+License: CC BY 4.0 + file LICENSE
 Depends:
 R (>= 3.5)
 Suggests:

LICENSE

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+Re-distributing the ANES or NCVS datasets is subject to their policies.

LICENSE.md

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-Attribution 4.0 International
+# Attribution 4.0 International
 
 =======================================================================

R/data.R

Lines changed: 36 additions & 9 deletions
@@ -31,7 +31,7 @@
 #' \item{\code{SummerTempAway}}{double Summer thermostat setting or temperature in home when no one is home during the day}
 #' \item{\code{SummerTempNight}}{double Summer thermostat setting or temperature in home at night}
 #' \item{\code{NWEIGHT}}{double Final Analysis Weight}
-#' \item{\code{NWEIGHT1-NWEIGHT60}}{double Final Analysis Weight for replicate 1-60}
+#' \item{\code{NWEIGHT1 - NWEIGHT60}}{double Final Analysis Weight for replicate 1-60}
 #' \item{\code{BTUEL}}{double Total electricity use, in thousand Btu, 2020, including self-generation of solar power}
 #' \item{\code{DOLLAREL}}{double Total electricity cost, in dollars, 2020}
 #' \item{\code{BTUNG}}{double Total natural gas use, in thousand Btu, 2020}
@@ -43,7 +43,7 @@
 #' \item{\code{BTUWOOD}}{double Total wood use, in thousand Btu, 2020}
 #' \item{\code{TOTALBTU}}{double Total usage including electricity, natural gas, propane, and fuel oil, in thousand Btu, 2020}
 #' \item{\code{TOTALDOL}}{double Total cost including electricity, natural gas, propane, and fuel oil, in dollars, 2020}
-#'}
+#' }
 #' @source \url{https://www.eia.gov/consumption/residential/data/2020/index.php?view=microdata}
 "recs_2020"

@@ -81,7 +81,7 @@
 #' \item{\code{TOTUCSQFT}}{double Total uncooled square footage}
 #' \item{\code{TOTUSQFT}}{double Total unheated square footage}
 #' \item{\code{NWEIGHT}}{double Final sample weight}
-#' \item{\code{BRRWT1-BRRWT96}}{double Replicate weight 1 through 96}
+#' \item{\code{BRRWT1 - BRRWT96}}{double Replicate weight 1 through 96}
 #' \item{\code{CDD30YR}}{double Cooling degree days, 30-year average 1981-2010, base temperature 65F}
 #' \item{\code{CDD65}}{double Cooling degree days in 2015, base temperature 65F}
 #' \item{\code{CDD80}}{double Cooling degree days in 2015, base temperature 80F (used for garage cooling load estimation only)}
@@ -103,7 +103,7 @@
 #' \item{\code{TOTALDOL}}{double Total cost, in dollars, 2015 }
 #' \item{\code{BTUWOOD}}{double Total cordwood usage, in thousand Btu, 2015 (Wood consumption is not included in TOTALBTU or TOTALDOL)}
 #' \item{\code{BTUPELLET}}{double Total wood pellet usage, in thousand Btu, 2015 (Wood consumption is not included in TOTALBTU or TOTALDOL)}
-#'}
+#' }
 #' @source \url{https://www.eia.gov/consumption/residential/data/2015/index.php?view=microdata}
 "recs_2015"

@@ -171,7 +171,7 @@
 #' \item{\code{V4277E}}{integer C MULT OFF: TEACHER/SCHOOL STAFF (START 2007 Q1) (END 2016 Q4) (START 2021 Q1)}
 #' \item{\code{V4399}}{integer REPORTED TO POLICE}
 #' \item{\code{V4529}}{integer TYPE OF CRIME CODE (NEW, NCVS)}
-#'}
+#' }
 #' @source \url{https://doi.org/10.3886/ICPSR38429.v1}
 "ncvs_2021_incident"

@@ -191,7 +191,7 @@
 #' \item{\code{V2126B}}{integer PLACE SIZE CODE - 1990, 2000, 2010 SAMPLE DESIGN (START 1995 Q3)}
 #' \item{\code{V2127B}}{integer REGION - 1990, 2000, 2010 SAMPLE DESIGN (START 1995 Q3)}
 #' \item{\code{V2129}}{integer CBSA MSA STATUS}
-#'}
+#' }
 #' @source \url{https://doi.org/10.3886/ICPSR38429.v1}
 "ncvs_2021_household"

@@ -210,7 +210,7 @@
 #' \item{\code{V3024}}{integer HISPANIC ORIGIN}
 #' \item{\code{V3084}}{integer SEXUAL ORIENTATION (START 2017 Q1)}
 #' \item{\code{V3086}}{integer CURRENT GENDER IDENTITY (START 2017 Q1)}
-#'}
+#' }
 #' @source \url{https://doi.org/10.3886/ICPSR38429.v1}
 "ncvs_2021_person"

@@ -283,7 +283,7 @@
 #' \item{\code{V202109x}}{double PRE-POST: SUMMARY: Voter turnout in 2020}
 #' \item{\code{V202110x}}{double PRE-POST: SUMMARY: 2020 Presidential vote}
 #' \item{\code{VotedPres2020_selection}}{factor PRE-POST: SUMMARY: 2020 Presidential vote}
-#'}
+#' }
 #' @source \url{https://electionstudies.org/data-center/2020-time-series-study/}
 "anes_2020"

@@ -315,4 +315,31 @@
 #' \item{\code{POVERTY3}}{factor RC-POVERTY LEVEL-NEW INC (% OF US CENSUS POVERTY THRESHOLD)}
 #' }
 #' @source \url{https://www.samhsa.gov/data/data-we-collect/nsduh-national-survey-drug-use-and-health/datafiles}
-"nsduh_2023"
+"nsduh_2023"
+
+#' @title California Health Interview Survey (CHIS) (2023) data
+#' @description A subset of variables from the CHIS 2023 Public Use File
+#' @format A data frame with 21671 rows and 98 variables:
+#' \describe{
+#' \item{\code{PUF1Y_ID}}{character PUBLIC USE FILE ID - CHIS 1 YEAR DATAFILES}
+#' \item{\code{AH1V2}}{factor HAVE USUAL SOURCE OF HEALTH CARE}
+#' \item{\code{AH22}}{factor DELAY/NOT GET OTHER MEDICAL CARE IN PAST 12 MOS}
+#' \item{\code{SMKCUR30}}{factor CURRENT SMOKER (PAST 30 DAYS)}
+#' \item{\code{AB1}}{factor GENERAL HEALTH CONDITION}
+#' \item{\code{DIABETES}}{factor DOCTOR EVER TOLD HAVE DIABETES (NON-GESTATIONAL)}
+#' \item{\code{BMI_P}}{double BODY MASS INDEX (PUF RECODE)}
+#' \item{\code{RBMI}}{factor BMI DESCRIPTIVE}
+#' \item{\code{AB17}}{factor DOCTOR EVER TOLD HAVE ASTHMA}
+#' \item{\code{DSTRS12}}{factor LIKELY HAS HAD PSYCHOLOGICAL DISTRESS IN THE LAST YEAR}
+#' \item{\code{AB29V2}}{factor DOCTOR EVER TOLD HAVE HIGH BLOOD PRESSURE}
+#' \item{\code{SPK_ENG}}{factor ENGLISH USE AND PROFICIENCY}
+#' \item{\code{POVLL2_P1V2}}{double POVERTY LEVEL AS TIMES OF 100% FPL (PUF RECODE V2)}
+#' \item{\code{POVLL}}{factor POVERTY LEVEL}
+#' \item{\code{SRAGE_P1}}{ordered;factor SELF-REPORTED AGE (PUF 1 YR RECODE)}
+#' \item{\code{SRSEX}}{factor SELF-REPORTED GENDER}
+#' \item{\code{OMBSRR_P1}}{factor OMB/CURRENT DOF RACE - ETHNICITY (PUF 1 YR RECODE)}
+#' \item{\code{RAKEDW0}}{double CHIS 2023 FINAL RAKED WEIGHT}
+#' \item{\code{RAKEDW1 - RAKEDW80}}{CHIS2023 RAKED WEIGHT - REPLICATE 1 through REPLICATE 80}
+#' }
+#' @source \url{https://healthpolicy.ucla.edu/our-work/public-use-files/one-year-public-use-files-pufs}
+"chis_2023"

README.Rmd

Lines changed: 28 additions & 9 deletions
@@ -17,8 +17,8 @@ knitr::opts_chunk$set(
   warning = FALSE,
   message = FALSE,
   fig.retina = 2,
-  fig.align = 'center',
-  tidy = 'styler'
+  fig.align = "center",
+  tidy = "styler"
 )
 ```

@@ -46,7 +46,7 @@ This package includes data from three surveys including the American National El
 
 ### ANES
 
-The ANES data is based on the publicly available 2020 ANES data with additional derived variables and is subset to people who completed both pre and post-election interviews. The ANES Times Series Studies collect data on political polling in the United States and has been conducted since 1948.For more information about the 2020 study, see the [American National Election Studies website](https://electionstudies.org/data-center/2020-time-series-study/). On the ANES website, you can learn more about the study, see codebooks and methodology reports, and download the data (after registering). We received permission to distribute this data for the purpose of the book. Once the package is loaded, you can use the data immediately as follows:
+The ANES data is based on the publicly available 2020 ANES data with additional derived variables and is subset to people who completed both pre and post-election interviews. The ANES Times Series Studies collect data on political polling in the United States and has been conducted since 1948. For more information about the 2020 study, see the [American National Election Studies website](https://electionstudies.org/data-center/2020-time-series-study/). On the ANES website, you can learn more about the study, see codebooks and methodology reports, and download the data (after registering). We received permission to distribute this data for the purpose of the book. Once the package is loaded, you can use the data immediately as follows:
 
 ```{r}
 #| label: show-anes
@@ -61,7 +61,7 @@ Also, included in the package is a Stata version of the ANES data with a subset
 ```{r}
 #| label: anes-stata
 
-anes_stata <- haven::read_dta(system.file("extdata", "anes_2020_stata_example.dta", package="srvyrexploR"))
+anes_stata <- haven::read_dta(system.file("extdata", "anes_2020_stata_example.dta", package = "srvyrexploR"))
 ```
 
 ### NCVS
@@ -98,6 +98,19 @@ head(recs_2020)
 head(recs_2020_raw)
 ```
 
+### CHIS
+
+The CHIS data is a subset of variables from the 2023 California Health Interview Survey Adult Public Use File. CHIS is an annual survey of people in households in California with several topics related to [health and social determinants of health](https://healthpolicy.ucla.edu/our-work/california-health-interview-survey-chis/chis-design-and-methods/survey-topics-and-questionnaires). For more information about the study, refer to the [CHIS website](https://healthpolicy.ucla.edu/our-work/california-health-interview-survey-chis). To download a full version of the data with all variables or view codebooks, create an account and [download the public use files](https://healthpolicy.ucla.edu/our-work/public-use-files). See a snippet of the data below:
+
+```{r}
+#| label: show-chis
+
+head(chis_2023)
+```
+
+See `?chis_2023` for more information about the data.
+
+
 ## Examples
 
 To analyze the survey data, we recommend using the {srvyr} package as follows:
@@ -112,16 +125,18 @@ pak::pak("gergness/srvyr")
 library(srvyr)
 
 recs_des <- recs_2020 %>%
-  as_survey_rep(weights = NWEIGHT, repweights = NWEIGHT1:NWEIGHT60,
-    type = "JK1", scale = 59/60, mse = TRUE,
-    variables=c(ACUsed, Region))
+  as_survey_rep(
+    weights = NWEIGHT, repweights = NWEIGHT1:NWEIGHT60,
+    type = "JK1", scale = 59 / 60, mse = TRUE,
+    variables = c(ACUsed, Region)
+  )
 
 recs_des
 
 recs_des %>%
   group_by(Region) %>%
   summarize(
-    p=survey_mean(ACUsed, vartype="ci", proportion = TRUE, prop_method = "logit")
+    p = survey_mean(ACUsed, vartype = "ci", proportion = TRUE, prop_method = "logit")
   )
 ```

@@ -148,6 +163,10 @@ ANES:
 
 + American National Election Studies, 2021. ANES 2020 Time Series Study Full Release [dataset and documentation]. July 19, 2021 version. https://www.electionstudies.org
 
+CHIS:
+
++ California Health Interview Survey. CHIS 2023 Adult Public Use Files. [Computer file]. UCLA Center for Health Policy Research, Los Angeles, CA. February 2025 version
+
 NCVS:
 
 + United States. Bureau of Justice Statistics. National Crime Victimization Survey, [United States], 2021. Inter-university Consortium for Political and Social Research [distributor], 2022-09-19. https://doi.org/10.3886/ICPSR38429.v1
@@ -160,4 +179,4 @@ and Health: Public use file data users’ guide. https://www.samhsa.gov/data/dat
 RECS:
 
 + U.S. Energy Information Administration, 2024. Residential Energy Consumption 2020 Survey Data. [dataset and documentation]. January 2024 version. https://www.eia.gov/consumption/residential/data/2020/index.php?view=microdata
-+ U.S. Energy Information Administration, 2018 Residential Energy Consumption 2015 Survey Data. [dataset and documentation]. December 2018 version. https://www.eia.gov/consumption/residential/data/2015/index.php?view=microdata
++ U.S. Energy Information Administration, 2018 Residential Energy Consumption 2015 Survey Data. [dataset and documentation]. December 2018 version. https://www.eia.gov/consumption/residential/data/2015/index.php?view=microdata

README.md

Lines changed: 43 additions & 1 deletion
@@ -37,7 +37,7 @@ The ANES data is based on the publicly available 2020 ANES data with
 additional derived variables and is subset to people who completed both
 pre and post-election interviews. The ANES Times Series Studies collect
 data on political polling in the United States and has been conducted
-since 1948.For more information about the 2020 study, see the [American
+since 1948. For more information about the 2020 study, see the [American
 National Election Studies
 website](https://electionstudies.org/data-center/2020-time-series-study/).
 On the ANES website, you can learn more about the study, see codebooks
@@ -280,6 +280,42 @@ head(recs_2020_raw)
 #> # STUDIO <dbl>, WALLTYPE <dbl>, ROOFTYPE <dbl>, HIGHCEIL <dbl>, …
 ```
 
+### CHIS
+
+The CHIS data is a subset of variables from the 2023 California Health
+Interview Survey Adult Public Use File. CHIS is an annual survey of
+people in households in California with several topics related to
+[health and social determinants of
+health](https://healthpolicy.ucla.edu/our-work/california-health-interview-survey-chis/chis-design-and-methods/survey-topics-and-questionnaires).
+For more information about the study, refer to the [CHIS
+website](https://healthpolicy.ucla.edu/our-work/california-health-interview-survey-chis).
+To download a full version of the data with all variables or view
+codebooks, create an account and [download the public use
+files](https://healthpolicy.ucla.edu/our-work/public-use-files). See a
+snippet of the data below:
+
+``` r
+head(chis_2023)
+#> # A tibble: 6 × 98
+#> PUF1Y_ID AH1V2 AH22 SMKCUR30 AB1 DIABETES BMI_P RBMI AB17 DSTRS12 AB29V2
+#> <chr> <fct> <fct> <fct> <fct> <fct> <dbl> <fct> <fct> <fct> <fct>
+#> 1 23021436 Yes No No Very … No 35.6 Obes… No No No
+#> 2 23009146 Yes No No Excel… No 23.0 Norm… No No No
+#> 3 23005039 Yes No No Good No 25.6 Over… Yes No Borde…
+#> 4 23025815 Yes Yes No Fair No 42.5 Obes… No No Borde…
+#> 5 23010158 Yes No No Good Yes 24.7 Norm… No No Yes
+#> 6 23006250 Yes No No Excel… No 19.1 Norm… No No No
+#> # ℹ 87 more variables: SPK_ENG <fct>, POVLL2_P1V2 <dbl>, POVLL <fct>,
+#> # SRAGE_P1 <ord>, SRSEX <fct>, OMBSRR_P1 <fct>, RAKEDW0 <dbl>, RAKEDW1 <dbl>,
+#> # RAKEDW2 <dbl>, RAKEDW3 <dbl>, RAKEDW4 <dbl>, RAKEDW5 <dbl>, RAKEDW6 <dbl>,
+#> # RAKEDW7 <dbl>, RAKEDW8 <dbl>, RAKEDW9 <dbl>, RAKEDW10 <dbl>,
+#> # RAKEDW11 <dbl>, RAKEDW12 <dbl>, RAKEDW13 <dbl>, RAKEDW14 <dbl>,
+#> # RAKEDW15 <dbl>, RAKEDW16 <dbl>, RAKEDW17 <dbl>, RAKEDW18 <dbl>,
+#> # RAKEDW19 <dbl>, RAKEDW20 <dbl>, RAKEDW21 <dbl>, RAKEDW22 <dbl>, …
+```
+
+See `?chis_2023` for more information about the data.
+
 ## Examples
 
 To analyze the survey data, we recommend using the {srvyr} package as
@@ -362,6 +398,12 @@ ANES:
 Full Release \[dataset and documentation\]. July 19, 2021 version.
 <https://www.electionstudies.org>
 
+CHIS:
+
+- California Health Interview Survey. CHIS 2023 Adult Public Use Files.
+\[Computer file\]. UCLA Center for Health Policy Research, Los
+Angeles, CA. February 2025 version
+
 NCVS:
 
 - United States. Bureau of Justice Statistics. National Crime
