Author | |
---|---|
Name | Claire Descombes |
Affiliation | Universitätsklinik für Neurochirurgie, Inselspital Bern |
Degree | MSc Statistics and Data Science, University of Bern |
Contact | claire.descombes@insel.ch |
The reference material for this course, as well as some useful literature to deepen your knowledge of R, can be found at the bottom of the page.
Section | Topic | Subtopics |
---|---|---|
4.1 Tests for comparing two groups | Tests comparing means or proportions between two groups | Student’s t-Test, Wilcoxon-Mann-Whitney Test (Mann-Whitney-U-Test), Fisher’s Exact Test, McNemar Test |
4.2 Tests for more than two groups | Tests for comparing multiple groups | Kruskal-Wallis Test, Friedman Test, Pearson’s Chi-Square Test |
4.3 Tests for distribution and normality | Tests for distribution of data | Lilliefors Test (Kolmogorov-Smirnov-Lilliefors Test) |
4.4 Tests for survival analysis | Tests for time-to-event data | Log-Rank Test (Logrank Test) |
4.5 Correlation and association tests | Tests for relationships between variables | Correlation test by Pearson, Correlation test by Spearman |
4.6 Predictive modeling and regression | Predictive models and regression techniques | Generalized Linear Models (GLMs): Linear Regression, Logistic Regression, Cox Proportional Hazards Regression, Multivariable Regression; Mixed Effects Models; Generalized Additive Models (GAMs); Generalized Additive Mixed Models (GAMMs) |
Outcome type | 1 Sample | 2 Paired Samples | 2 Unpaired Samples | >2 Paired Samples | >2 Unpaired Samples | Continuous Predictor |
---|---|---|---|---|---|---|
Binary | Binomial test | McNemar test | Chi-square test, Fisher’s exact test | Cochran’s Q test | Chi-square test, extensions of Fisher’s test | Logistic regression |
Nominal | Chi-square goodness of fit test | Chi-square test, extensions of Fisher’s test | Chi-square test | Multinomial regression | ||
Ordinal | Wilcoxon signed-rank test, sign test | Sign test or Wilcoxon signed-rank test on differences | Mann-Whitney U test (Wilcoxon rank-sum test) | Friedman test | Kruskal–Wallis test | Ordinal regression |
Continuous | One-sample t-test | Paired t-test or Wilcoxon signed-rank test on differences | Two-sample t-test | Repeated measures ANOVA | ANOVA | Linear regression |
Time-to-event | One-sample log-rank test | Log-rank test | Cox regression, Weibull regression |
Student's t-test: Basis for comparing the means of two groups. Introduced first because it is widely used and foundational.
Wilcoxon-Mann-Whitney test: Presented as the non-parametric alternative to the t-test.
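The difference between the parametric and the rank-based comparison can be illustrated with the two test statistics themselves. The following is a minimal Python sketch (standard library only), assuming made-up measurements for two independent groups; the function names are hypothetical, and turning the statistics into p-values would additionally require the t distribution and the null distribution of U, which are not shown here.

```python
from math import sqrt
from statistics import mean, variance


def student_t_statistic(a, b):
    """Pooled-variance (Student) t statistic and its degrees of freedom."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))
    return t, na + nb - 2


def mann_whitney_u(a, b):
    """U statistic for group a: number of pairs (x, y) with x > y; ties count 0.5."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)


# Hypothetical measurements for two independent groups
group_a = [5.1, 4.9, 6.2, 5.8, 6.0]
group_b = [4.2, 4.8, 5.0, 4.4, 4.6]

t, df = student_t_statistic(group_a, group_b)
u = mann_whitney_u(group_a, group_b)
print(f"t = {t:.2f} on {df} degrees of freedom, U = {u}")
```

The t statistic depends on the actual values (means and variances), whereas U depends only on how the observations of one group rank against those of the other.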
Fisher's exact test: Focuses on categorical variables and small sample sizes.
McNemar test: Highlight its application to paired categorical data.
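For paired binary data, only the discordant pairs (those whose category changed between the two measurements) carry information. The sketch below assumes the exact binomial version of the McNemar test rather than the chi-square approximation; the function name and the counts are hypothetical.

```python
from math import comb


def mcnemar_exact(b, c):
    """Exact two-sided McNemar p-value from the two discordant counts.

    b: pairs that changed from category 1 to category 2
    c: pairs that changed from category 2 to category 1
    Under the null hypothesis, b follows Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0
    smaller = min(b, c)
    tail = sum(comb(n, i) for i in range(smaller + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical study: 8 patients changed in one direction, 2 in the other
p = mcnemar_exact(8, 2)
print(p)  # 0.109375
```

Concordant pairs do not enter the calculation at all, which is the main difference from an unpaired comparison of the same counts.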
Kruskal-Wallis test: Generalization of the Wilcoxon-Mann-Whitney test to more than two groups.
Friedman test: Generalization of paired tests (e.g., the Wilcoxon signed-rank test) to more than two related groups.
Pearson's chi-square test: Complements Fisher's exact test and is better suited to larger samples.
Lilliefors test: Tests for deviations from normality and sets the stage for deciding when to use parametric vs. non-parametric tests.
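The statistic behind this kind of normality test is the largest gap between the empirical distribution of the sample and a normal distribution fitted to it. Below is a minimal Python sketch of that distance, assuming mean and standard deviation are estimated from the sample; the function name and the two samples are hypothetical. The p-value is deliberately not computed, since it requires special tables or simulation.

```python
from statistics import NormalDist, mean, stdev


def lilliefors_statistic(sample):
    """Largest distance between the empirical CDF and a fitted normal CDF."""
    xs = sorted(sample)
    n = len(xs)
    fitted = NormalDist(mean(xs), stdev(xs))
    distance = 0.0
    for i, x in enumerate(xs):
        cdf = fitted.cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides
        distance = max(distance, (i + 1) / n - cdf, cdf - i / n)
    return distance


# Hypothetical samples: one roughly symmetric, one strongly skewed
symmetric = [-1.5, -0.8, -0.3, 0.0, 0.3, 0.8, 1.5]
skewed = [1, 1, 1, 1, 1, 1, 50]

d_symmetric = lilliefors_statistic(symmetric)
d_skewed = lilliefors_statistic(skewed)
print(f"symmetric: {d_symmetric:.3f}, skewed: {d_skewed:.3f}")
```

A larger distance indicates a stronger departure from normality; the skewed sample yields a much larger value than the symmetric one.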
Log-rank test: For analyzing time-to-event data. Mention Kaplan-Meier curves for context.
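Since Kaplan-Meier curves are what the log-rank test compares, a minimal Python sketch of the Kaplan-Meier estimator may help. It assumes right-censored data coded as 1 (event observed) or 0 (censored); the function name and the follow-up times are hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    events[i] is 1 if the event was observed at times[i], 0 if censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = censored = 0
        # Collect everything that happens at the same time point
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                deaths += 1
            else:
                censored += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Both events and censored subjects leave the risk set afterwards
        at_risk -= deaths + censored
    return curve


# Hypothetical follow-up times (months) and event indicators
times = [1, 2, 3, 4, 5]
events = [1, 0, 1, 1, 0]
curve = kaplan_meier(times, events)
print(curve)  # survival drops to 0.8, then about 0.533, then about 0.267
```

Censored subjects never cause a drop in the curve, but they shrink the risk set, so later events produce larger steps.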
Pearson correlation test: Basis for assessing the linear relationship between two continuous, normally distributed variables.
Spearman correlation test: Non-parametric alternative for monotonic relationships.
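The difference between the two coefficients shows up clearly on data that are monotonic but not linear. The sketch below is a minimal Python illustration, assuming Spearman's coefficient is computed as the Pearson correlation of the ranks (with tied values sharing their average rank); the function names and the data are hypothetical.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def ranks(values):
    """Ranks starting at 1; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical data: y increases with x, but not linearly
x = [1, 2, 3, 4, 5, 6]
y = [v ** 3 for v in x]

r_pearson = pearson(x, y)
r_spearman = spearman(x, y)
print(f"Pearson: {r_pearson:.3f}, Spearman: {r_spearman:.3f}")
```

Spearman's coefficient reaches 1 for any strictly increasing relationship, while Pearson's coefficient stays below 1 unless the relationship is exactly linear.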
Purpose: GLMs are an extension of linear models that allow for non-normal distributions of the response variable (e.g., binary, count, or categorical outcomes). They offer more flexibility than traditional linear regression by using different link functions and error distributions.
Key Features:
Linear relationship: GLMs assume a linear relationship between the predictors and the link-transformed mean of the response variable.
Link function: Links the linear predictor to the mean of the distribution. Common link functions: identity (linear regression), logit (logistic regression), log (Poisson regression).
Error distributions: GLMs can be applied with various error distributions:
* Normal for continuous data (linear regression)
* Binomial for binary data (logistic regression)
* Poisson for count data
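How a link function connects the linear predictor to the mean can be sketched in a few lines of Python. The example assumes the canonical links for the three distributions listed above (identity for normal, logit for binomial, log for Poisson); the dictionary name `LINKS` and the value of the linear predictor are hypothetical.

```python
from math import exp, log

# Each entry: (link g applied to the mean, inverse link applied to the linear predictor)
LINKS = {
    "identity": (lambda mu: mu, lambda eta: eta),
    "logit": (lambda mu: log(mu / (1 - mu)), lambda eta: 1 / (1 + exp(-eta))),
    "log": (lambda mu: log(mu), lambda eta: exp(eta)),
}

eta = 0.5  # hypothetical value of the linear predictor b0 + b1 * x

# The inverse link maps the same linear predictor to a mean on the right scale
means = {name: inverse(eta) for name, (_, inverse) in LINKS.items()}
for name, mu in means.items():
    print(f"{name:>8}: mean = {mu:.4f}")
```

The same linear predictor gives an unrestricted mean under the identity link, a probability between 0 and 1 under the logit link, and a positive expected count under the log link.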
Assumptions
Common Applications
Linear Regression
Purpose: Used to model the relationship between a continuous dependent variable and one or more independent variables.
Assumptions: Linearity, normality of residuals, homoscedasticity, independence.
Example Application: Predicting the price of a house based on square footage, number of rooms, etc.
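For a single predictor, the least-squares fit has a closed form, which the following minimal Python sketch illustrates with the house-price example. The data are made up and deliberately noise-free, so the recovered coefficients are easy to check; the function name is hypothetical.

```python
from statistics import mean


def fit_simple_linear_regression(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    mx, my = mean(x), mean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum(
        (a - mx) ** 2 for a in x
    )
    intercept = my - slope * mx
    return intercept, slope


# Hypothetical data: price (in thousands) generated as 10 + 0.2 * square footage
square_feet = [800, 1000, 1200, 1500, 2000]
price = [170, 210, 250, 310, 410]

intercept, slope = fit_simple_linear_regression(square_feet, price)
predicted = intercept + slope * 1300  # prediction for a 1300 sq ft house
print(f"intercept = {intercept:.2f}, slope = {slope:.3f}, prediction = {predicted:.1f}")
```

With real data the points would scatter around the line, and the residuals would be used to check the assumptions listed above.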
Logistic Regression
Purpose: Used when the dependent variable is binary (e.g., yes/no, success/failure).
Assumptions: Linear relationship between the log-odds of the outcome and predictors.
Example Application: Predicting the likelihood of a disease based on age, gender, and other factors.
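The idea that the log-odds are linear in the predictor can be made concrete with a small fit. The sketch below assumes a single predictor and uses Newton-Raphson on the log-likelihood, which is one common way to fit such a model, not necessarily what any particular software does; the function names and the data (age in decades, disease status) are hypothetical.

```python
from math import exp


def sigmoid(z):
    """Inverse logit: maps log-odds to a probability."""
    return 1 / (1 + exp(-z))


def fit_logistic(x, y, iterations=25):
    """Fit P(y = 1) = sigmoid(b0 + b1 * x) by Newton-Raphson."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        p = [sigmoid(b0 + b1 * xi) for xi in x]
        # Gradient of the log-likelihood
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum((yi - pi) * xi for xi, yi, pi in zip(x, y, p))
        # Entries of the negated Hessian, with weights p * (1 - p)
        w = [pi * (1 - pi) for pi in p]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 linear system for the update
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Hypothetical data: age in decades and disease status (1 = disease)
age = [2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0, 6.5, 7.0]
disease = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]

b0, b1 = fit_logistic(age, disease)
odds_ratio = exp(b1)  # multiplicative change in the odds per additional decade
print(f"b0 = {b0:.2f}, b1 = {b1:.2f}, odds ratio per decade = {odds_ratio:.2f}")
```

The slope is interpreted on the log-odds scale, so exponentiating it gives the odds ratio for a one-unit increase in the predictor. The data must not be perfectly separable, otherwise the estimates do not converge.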
Cox Proportional Hazards Regression
Purpose: Used for survival analysis, particularly when studying the time to an event (e.g., time to death, relapse).
Assumptions: Proportional hazards assumption, meaning the hazard ratio associated with each predictor is constant over time.
Example Application: Analyzing the impact of age, treatment type, and other covariates on patient survival times.
Multivariable Regression
Purpose: An extension of linear or logistic regression with more than one predictor variable.
Assumptions: Similar to linear and logistic regression, but more complex due to multiple predictors.
Example Application: Predicting a health outcome (e.g., cholesterol levels) based on multiple factors (e.g., diet, exercise, genetics).
Purpose: Mixed effects models allow for the inclusion of both fixed and random effects, providing flexibility for hierarchical or grouped data. They are especially useful when there is variation between groups or subjects.
Key Features
Fixed effects: These are the main predictors of interest (e.g., treatment, age, etc.), which are assumed to have the same effect across all groups.
Random effects: These account for variability across groups or clusters (e.g., random intercepts for subjects or random slopes for measurements over time).
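The distinction between fixed and random effects can be sketched with a random-intercept model. The Python example below does not fit anything; it assumes hypothetical effect values simply to show how a prediction is assembled from a shared fixed part and a subject-specific deviation. All names and numbers are made up.

```python
# Hypothetical fixed effects, shared by all patients
fixed_intercept = 120.0  # average baseline value
fixed_slope = -2.0       # average change per week of treatment

# Hypothetical random intercepts: each patient's deviation from the average baseline
random_intercepts = {"patient_a": 5.0, "patient_b": -3.0, "patient_c": 0.0}


def predict(patient, week):
    """Prediction = fixed part + patient-specific random intercept."""
    return fixed_intercept + random_intercepts[patient] + fixed_slope * week


for patient in random_intercepts:
    trajectory = [predict(patient, week) for week in range(4)]
    print(patient, trajectory)
```

Every patient follows the same slope (the fixed effect of time) but starts from a different level (the random intercept). In a fitted model, the random intercepts are assumed to come from a common distribution whose variance is estimated from the data.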
Assumptions
Common Applications
Purpose: GAMs extend GLMs by allowing for non-linear relationships between predictors and the outcome. This is useful when the relationship between the independent and dependent variables is not linear.
Key Features
Non-linear terms: Uses smooth functions (e.g., splines) for predictors, allowing for flexibility in modeling.
Additive structure: The model assumes that the total effect is an additive combination of linear and smooth non-linear terms.
Link function: Like GLMs, GAMs can use different link functions depending on the distribution of the outcome variable.
Common Applications: Modeling complex relationships in patient data where the effect of treatment or time may not be linear.
Purpose: GAMMs combine the flexibility of GAMs with random effects, useful for hierarchical or clustered data.
Key Features: Like GAMs, but with the inclusion of random effects to account for variability between groups.
Applications: Ideal for longitudinal studies or hierarchical data where both non-linear relationships and random effects are present.
Assumptions
Example Application: Analyzing patient data where outcomes are influenced by both individual patient characteristics and random hospital-specific effects (e.g., variability between hospitals).