| Author | |
|---|---|
| Name | Claire Descombes |
| Affiliation | Universitätsklinik für Neurochirurgie, Inselspital Bern |
| Degree | MSc Statistics and Data Science, University of Bern |
| Contact | claire.descombes@insel.ch |

The reference material for this course, as well as some useful
literature to deepen your knowledge of R, can be found at the bottom of
the page.
NHANES data sets
The National Health and Nutrition Examination Survey (NHANES) 2011-2012 assessed the health and nutritional status of adults and children in the United States. It was conducted by the National Center for Health Statistics (NCHS). The data sets are in the `data_sets` folder.

| Dataset | NHANES Code | Description | File |
|---|---|---|---|
| Demographics | DEMO_G | Age, sex, race/ethnicity, income, education | DEMO_G.csv |
| Blood Pressure | BPX_G | Systolic/diastolic blood pressure, number of readings | BPX_G.csv |
| Body Measures | BMX_G | Height, weight, BMI, waist circumference | BMX_G.csv |
| Smoking Questionnaire | SMQ_G | Smoking habits, exposure to secondhand smoke | SMQ_G.csv |
| Mortality Data | NDI | Mortality follow-up data (status, follow-up time, cause of death) | NHANES_2011_2012_MORT.dat |

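Below is a minimal sketch of reading the four CSV files in base R and merging them into one data frame with one row per participant. It assumes that every CSV file contains `SEQN`, the NHANES respondent sequence number. Tables downloaded with nhanesA include this column. The helper name `load_nhanes` is hypothetical and does not come from any package.

```r
# Hypothetical helper (not from any package): read the NHANES CSV files
# and merge them by SEQN, the respondent sequence number shared by all tables
load_nhanes <- function(dir = "data_sets",
                        files = c("DEMO_G.csv", "BPX_G.csv",
                                  "BMX_G.csv", "SMQ_G.csv")) {
  # Read every file into a list of data frames
  tables <- lapply(file.path(dir, files), read.csv)
  # Merge pairwise from left to right; all.x = TRUE keeps every participant
  # of the first table (DEMO_G) and fills NA where a later table has no record
  Reduce(function(x, y) merge(x, y, by = "SEQN", all.x = TRUE), tables)
}

# Usage, from the course folder:
# nhanes_data <- load_nhanes()
```

Starting from DEMO_G with `all.x = TRUE` keeps participants who have no examination or questionnaire record rather than silently dropping them.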

💡 The codebook for each data set can be viewed either on the NCHS website or, variable by variable, directly in R with the function `nhanesCodebook(nh_table, colname)` from the `nhanesA` package (which I used to download the data).
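For example, the codebook of the gender variable RIAGENDR in DEMO_G could be displayed as shown below. This assumes that nhanesA is installed and that the NCHS website is reachable, since the codebook is downloaded on each call. The result should be a list holding the variable's labels, the question text, the target group and a table of coded values. Its exact layout may differ between nhanesA versions.

```r
# Requires the nhanesA package and an internet connection:
# the codebook is fetched from the NCHS website
library(nhanesA)

# Codebook of RIAGENDR (participant's gender) in the demographics table DEMO_G
cb <- nhanesCodebook("DEMO_G", "RIAGENDR")

# Inspect the structure of the returned object (labels, table of coded values)
str(cb)
```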
Below is a list of some key variables from the NHANES data sets.

Demographics (DEMO_G)

| Variable Name | Description |
|---|---|
| RIDAGEYR | Participant’s age in years |
| RIAGENDR | Participant’s gender |
| DMDHHSIZ | Total number of people in the household |
| DMDHHSZA | Number of children aged 5 or younger in the household |
| DMDHRAGE | Age of the household reference person |
| DMDHRMAR | Marital status of the household reference person |
| DMDHRGND | Gender of the household reference person |
| AIALANGA | Language of the interview |


Blood Pressure (BPX_G)

| Variable Name | Description |
|---|---|
| BPXSY1 | Systolic blood pressure (first reading) in mm Hg |
| BPXSY2 | Systolic blood pressure (second reading) in mm Hg |
| BPXSY3 | Systolic blood pressure (third reading) in mm Hg |
| BPXDI1 | Diastolic blood pressure (first reading) in mm Hg |
| BPXDI2 | Diastolic blood pressure (second reading) in mm Hg |
| BPXDI3 | Diastolic blood pressure (third reading) in mm Hg |
| BPXPULS | Pulse regular or irregular |
| BPXPLS | 60-second pulse (30-second pulse multiplied by 2) |
| BPXPTY | Pulse type (radial or brachial) |
| BPXML1 | Maximum inflation level (mm Hg) |


Body Measures (BMX_G)

| Variable Name | Description |
|---|---|
| BMXWT | Weight (kg) |
| BMXHT | Standing height (cm) |
| BMXBMI | Body Mass Index (kg/m²) |
| BMXWAIST | Waist circumference (cm) |
| BMXHIP | Hip circumference (cm) |
| BMXARML | Upper arm length (cm) |
| BMXARMC | Upper arm circumference (cm) |
| BMXLEG | Upper leg length (cm) |


Smoking Questionnaire (SMQ_G)

| Variable Name | Description |
|---|---|
| SMQ020 | Smoked at least 100 cigarettes in life |
| SMQ040 | Do you now smoke cigarettes? |
| SMQ050Q | Average number of cigarettes smoked per day |
| SMD030 | Age when first smoked cigarettes regularly |
| SMD070 | Age when last smoked cigarettes regularly |
| SMQ680 | Smoked cigars in the past 5 days |
| SMQ690 | Smoked pipes in the past 5 days |
| SMQ700 | Used chewing tobacco in the past 5 days |
| SMQ710 | Used snuff in the past 5 days |


Mortality data set (NHANES_2011_2012_MORT_2019_PUBLIC)

| Variable Name | Description |
|---|---|
| MORTSTAT | Mortality status at the end of follow-up (0 = assumed alive, 1 = assumed deceased) |
| PERMTH_INT | Person-months of follow-up from the NHANES interview date to the date of death or censoring |
| PERMTH_EXM | Person-months of follow-up from the NHANES examination date to the date of death or censoring |
| UCOD_LEADING | Underlying cause of death, grouped into leading-cause categories (public-use) |
| DIABETES | Indicator of whether diabetes was listed as a cause of death (public-use) |
| HYPERTEN | Indicator of whether hypertension was listed as a cause of death (public-use) |

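Unlike the other files, the mortality file is a fixed-width ASCII file rather than a CSV, so it needs a different reader. The sketch below uses base R's `read.fwf()`. The helper name `read_nhanes_mort` is hypothetical.

The column positions are an assumption based on the layout used in NCHS's sample read-in programs for the 2019 public-use linked mortality files:

- SEQN in columns 1-6
- MORTSTAT in column 16
- UCOD_LEADING in columns 17-19
- DIABETES in column 20
- HYPERTEN in column 21
- PERMTH_INT in columns 43-45
- PERMTH_EXM in columns 46-48

Check these positions against the NCHS file documentation before relying on the result. The sketch also assumes that "." marks missing values in the file.

```r
# Hypothetical helper: read the fixed-width NHANES linked mortality file.
# ASSUMPTION: column positions follow NCHS's sample read-in programs for the
# 2019 public-use files; verify them against the NCHS documentation.
read_nhanes_mort <- function(path = "data_sets/NHANES_2011_2012_MORT.dat") {
  mort <- read.fwf(
    path,
    # Negative widths skip columns 7-15 and 22-42, which are not read here
    widths = c(6, -9, 1, 3, 1, 1, -21, 3, 3),
    col.names = c("SEQN", "MORTSTAT", "UCOD_LEADING", "DIABETES",
                  "HYPERTEN", "PERMTH_INT", "PERMTH_EXM"),
    # ASSUMPTION: "." (or a blank field) marks a missing value
    na.strings = c("NA", "", "."),
    # Passed on to read.table(): strip padding before matching na.strings
    strip.white = TRUE
  )
  # Follow-up time from the examination date, in years instead of months
  mort$followup_years <- mort$PERMTH_EXM / 12
  mort
}

# Usage, from the course folder:
# mort <- read_nhanes_mort()
# table(mort$MORTSTAT, useNA = "ifany")
```

Because the sketch names the first column `SEQN`, the result can be merged with the CSV tables by that column.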
References
Burns, Patrick. n.d. The R Inferno. Accessed May 8, 2025. https://www.burns-stat.com/documents/books/the-r-inferno/.
“ChatGPT.” n.d. Accessed January 26, 2025. https://chatgpt.com.
“Create Elegant Data Visualisations Using the Grammar of Graphics.” n.d. Accessed January 26, 2025. https://ggplot2.tidyverse.org/.
Henzi, Alexander. 2021. “Programming and Data Analysis with R.” Lecture notes.
Kosourova, Elena. n.d. “RStudio Tutorial for Beginners: A Complete Guide.” Accessed January 26, 2025. https://www.datacamp.com/tutorial/r-studio-tutorial.
Mayer, Michael. 2025. “Mayer79/Statistical_computing_material.” https://github.com/mayer79/statistical_computing_material.
“The Comprehensive R Archive Network.” n.d. Accessed January 26, 2025. https://stat.ethz.ch/CRAN/.
Venables, W. N., D. M. Smith, and the R Core Team. n.d. “An Introduction to R.” Accessed May 8, 2025. https://cran.r-project.org/doc/manuals/r-release/R-intro.html.
Wickham, Hadley. n.d. Advanced R. Accessed May 8, 2025. https://adv-r.hadley.nz/introduction.html.
Wickham, Hadley, and Garrett Grolemund. n.d. R for Data Science. Accessed May 8, 2025. https://r4ds.had.co.nz/introduction.html.