Author
Name: Claire Descombes
Affiliation: Universitätsklinik für Neurochirurgie, Inselspital Bern
Degree: MSc Statistics and Data Science, University of Bern
Contact: claire.descombes@insel.ch

The reference material for this course, as well as some useful literature to deepen your knowledge of R, can be found at the bottom of the page.

1 NHANES data sets

The National Health and Nutrition Examination Survey (NHANES) 2011-2012, conducted by the National Center for Health Statistics (NCHS), assessed the overall health and nutrition of adults and children in the United States. The data sets can be found in the data_sets folder.

| Dataset | NHANES Code | Description | File |
|---|---|---|---|
| Demographics | DEMO_G | Age, sex, race/ethnicity, income, education | DEMO_G.csv |
| Blood Pressure | BPX_G | Systolic/diastolic blood pressure, number of readings | BPX_G.csv |
| Body Measures | BMX_G | Height, weight, BMI, waist circumference | BMX_G.csv |
| Smoking Questionnaire | SMQ_G | Smoking habits, exposure to secondhand smoke | SMQ_G.csv |
| Mortality Data | NDI | Mortality follow-up data (status, follow-up time, cause of death) | NHANES_2011_2012_MORT.dat |

💡 The codebook for each data set can be accessed either on the NCHS website or directly in R using the function nhanesCodebook(nh_table, colname) from the package nhanesA (which I used to download the data).
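As a minimal sketch of how the files and the codebook might be used together, the following assumes that the working directory contains the data_sets folder and that the CSV files have a header row. The helper load_nhanes() is a hypothetical name introduced here for illustration and is not part of the course material. The codebook call needs the nhanesA package and an internet connection, so it is left commented out.

```r
# Hypothetical helper (not part of the course material): read one NHANES
# CSV file from the data_sets folder into a data frame.
load_nhanes <- function(file, dir = "data_sets") {
  path <- file.path(dir, file)
  if (!file.exists(path)) {
    stop("File not found: ", path)
  }
  read.csv(path)
}

# Usage, assuming the folder layout described above:
# demo <- load_nhanes("DEMO_G.csv")
# str(demo)

# Codebook entry for a single variable (needs nhanesA and internet access):
# library(nhanesA)
# nhanesCodebook("DEMO_G", "RIDAGEYR")
```

Stopping with an explicit message when the file is missing makes a wrong working directory easier to spot than the default read.csv() error.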

Below is a list of some of the key variables from the NHANES data sets.

1.1 Demographics (DEMO_G)

| Variable Name | Description |
|---|---|
| RIDAGEYR | Participant’s age in years |
| RIAGENDR | Participant’s gender |
| DMDHHSIZ | Total number of people in the household |
| DMDHHSZA | Number of children aged 5 or younger in the household |
| DMDHRAGE | Age of the household reference person |
| DMDHRMAR | Marital status of the household reference person |
| DMDHRGND | Gender of the household reference person |
| AIALANGA | Language of the interview |

1.2 Blood Pressure (BPX_G)

| Variable Name | Description |
|---|---|
| BPXSY1 | Systolic blood pressure (first reading) in mm Hg |
| BPXSY2 | Systolic blood pressure (second reading) in mm Hg |
| BPXSY3 | Systolic blood pressure (third reading) in mm Hg |
| BPXDI1 | Diastolic blood pressure (first reading) in mm Hg |
| BPXDI2 | Diastolic blood pressure (second reading) in mm Hg |
| BPXDI3 | Diastolic blood pressure (third reading) in mm Hg |
| BPXPULS | Pulse regular or irregular |
| BPXPLS | 60-second pulse (30-second pulse multiplied by 2) |
| BPXPTY | Pulse type (radial or brachial) |
| BPXML1 | Maximum inflation level (mm Hg) |
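Since up to three readings are recorded per participant, a common first step is to average the available readings. The sketch below does this for the systolic values; the data are made up for illustration, SEQN is the NHANES respondent sequence number (not listed in the table above), and SYS_MEAN is a hypothetical column name.

```r
# Toy data, made up for illustration (not real NHANES values).
# SEQN is the NHANES respondent sequence number.
bp <- data.frame(
  SEQN   = c(1, 2, 3),
  BPXSY1 = c(120, 130, NA),
  BPXSY2 = c(118, NA, NA),
  BPXSY3 = c(122, 134, NA)
)

sys_cols <- c("BPXSY1", "BPXSY2", "BPXSY3")

# Mean of the available readings; na.rm = TRUE skips missing readings.
bp$SYS_MEAN <- rowMeans(bp[, sys_cols], na.rm = TRUE)

# rowMeans() returns NaN when all readings are missing; recode to NA.
bp$SYS_MEAN[is.nan(bp$SYS_MEAN)] <- NA

bp
```

The same pattern applies to the diastolic readings by swapping in BPXDI1, BPXDI2 and BPXDI3.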

1.3 Body Measures (BMX_G)

| Variable Name | Description |
|---|---|
| BMXWT | Weight (kg) |
| BMXHT | Standing height (cm) |
| BMXBMI | Body Mass Index (kg/m²) |
| BMXWAIST | Waist circumference (cm) |
| BMXHIP | Hip circumference (cm) |
| BMXARML | Upper arm length (cm) |
| BMXARMC | Upper arm circumference (cm) |
| BMXLEG | Upper leg length (cm) |
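The unit kg/m² indicates how BMI relates to weight and height: weight in kilograms divided by the square of height in metres. The sketch below recomputes it from BMXWT and BMXHT on made-up values; BMI_CALC is a hypothetical column name, and rounding to one decimal is an assumption made here for readability.

```r
# Toy values, made up for illustration (not real NHANES measurements).
bm <- data.frame(
  BMXWT = c(70, 90),    # weight in kg
  BMXHT = c(175, 180)   # standing height in cm
)

# BMI = weight (kg) / height (m)^2; height is converted from cm to m.
bm$BMI_CALC <- round(bm$BMXWT / (bm$BMXHT / 100)^2, 1)

bm$BMI_CALC
```

With these toy values the result is 22.9 and 27.8. On the real data, comparing such a column with BMXBMI is a quick consistency check.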

1.4 Smoking Questionnaire (SMQ_G)

| Variable Name | Description |
|---|---|
| SMQ020 | Smoked at least 100 cigarettes in life |
| SMQ040 | Do you now smoke cigarettes? |
| SMQ050Q | Time since quitting smoking cigarettes |
| SMD030 | Age when first smoked cigarettes regularly |
| SMD070 | Age when last smoked cigarettes regularly |
| SMQ680 | Smoked cigars in the past 5 days |
| SMQ690 | Smoked pipes in the past 5 days |
| SMQ700 | Used chewing tobacco in the past 5 days |
| SMQ710 | Used snuff in the past 5 days |

1.5 Mortality data set (NHANES_2011_2012_MORT_2019_PUBLIC)

| Variable Name | Description |
|---|---|
| MORTSTAT | Mortality status at the end of follow-up (0 = assumed alive, 1 = assumed deceased) |
| PERMTH_INT | Person-months of follow-up from the NHANES interview date to date of death or censoring |
| PERMTH_EXM | Person-months of follow-up from the NHANES examination date to date of death or censoring |
| UCOD_LEADING | Underlying cause of death, grouped into leading cause categories (public-use) |
| DIABETES | Indicator if diabetes was listed as a cause of death (public-use) |
| HYPERTEN | Indicator if hypertension was listed as a cause of death (public-use) |
References

Burns, Patrick. n.d. Impatient R. Accessed May 8, 2025. https://www.burns-stat.com/documents/tutorials/impatient-r/.
Burns, Patrick. n.d. The R Inferno. Accessed May 8, 2025. https://www.burns-stat.com/documents/books/the-r-inferno/.
CDC. 2025. “National Death Index.” Data Linkage. https://www.cdc.gov/nchs/linked-data/mortality-files/index.html.
“ChatGPT.” n.d. Accessed January 26, 2025. https://chatgpt.com.
“Create Elegant Data Visualisations Using the Grammar of Graphics.” n.d. Accessed January 26, 2025. https://ggplot2.tidyverse.org/.
David, Author. 2016. “BIRT Joins.” MBSE Chaos. https://mbsechaos.wordpress.com/2016/05/24/birt-joins/.
Endres, Christopher J. 2025. “Introducing nhanesA.” https://cran.r-project.org/web/packages/nhanesA/vignettes/Introducing_nhanesA.html.
Henzi, Alexander. 2021. “Programming and Data Analysis with R.” Lecture notes.
Kosourova, Elena. n.d. “RStudio Tutorial for Beginners: A Complete Guide.” Accessed January 26, 2025. https://www.datacamp.com/tutorial/r-studio-tutorial.
Mayer, Michael. 2025. “Mayer79/Statistical_computing_material.” https://github.com/mayer79/statistical_computing_material.
“P-Value.” 2025. Wikipedia. https://en.wikipedia.org/w/index.php?title=P-value&oldid=1305292611.
“Synthetic Dataset for AI in Healthcare.” n.d. Accessed May 9, 2025. https://www.kaggle.com/datasets/smmmmmmmmmmmm/synthetic-dataset-for-ai-in-healthcare.
“The Comprehensive R Archive Network.” n.d. Accessed January 26, 2025. https://stat.ethz.ch/CRAN/.
Venables, W. N., D. M. Smith, and the R Core Team. n.d. “An Introduction to R.” Accessed May 8, 2025. https://cran.r-project.org/doc/manuals/r-release/R-intro.html.
Wickham, Hadley. n.d. Advanced R. Accessed May 8, 2025. https://adv-r.hadley.nz/introduction.html.
Wickham, Hadley, and Garrett Grolemund. n.d. R for Data Science. Accessed May 8, 2025. https://r4ds.had.co.nz/introduction.html.