Main Effects
plot_model(model, type = "pred", show.data = T)
## $ageP1
##
## $ageP2
##
## $cwcARS
##
## $gmcARS
##
## $SEX
library(dplyr)
library(lmerTest)
library(ggplot2)
library(sjPlot)
sjPlot::set_theme(base = theme_bw())
This is a guide that is designed to be your resource for making plots from multilevel models. You can use this as a starting point for visualizing your plots in a reliable way. If you want to learn more about touching up your plots to make them more visually appealing (for a publication, poster, or presentation), then you should read more on using ggplot2. You can find a cheat sheet here: https://www.rstudio.com/wp-content/uploads/2016/11/ggplot2-cheatsheet-2.1.pdf
Feel free to provide feedback so that this can serve as a useful and reliable guide for years to come. I have included what I think is sufficient annotation; however, if you believe anything is unclear, or you can think of a way to make the descriptions/explanations better, please help!
Update December 29, 2020: Special thanks to Daniel Eggelmann for prompting me to update the functions from sjPlot
!
These data were generated for a course on multilevel modeling in the Department of Psychology at University of Miami.
This guide will use the data set from probset4, since it has a diverse set of properties (e.g., non-linearity) that we would like to capture.
probset4 <- read.csv("probset4.csv", header = T, fileEncoding = "UTF-8-BOM")
probset4 <- probset4 %>%
group_by(ID) %>%
mutate(avgARS = mean(ARS)) %>%
mutate(cwcARS = ARS - avgARS)
probset4$gmcARS <- probset4$avgARS - mean(probset4$avgARS)
probset4$age0 <- probset4$AGE - min(probset4$AGE)
probset4$ageP1 <- ifelse(probset4$age0 <= 8, probset4$age0, 8)
probset4$ageP2 <- ifelse(probset4$age0 <= 8, 0, probset4$age0 - 8)
Often times the code for plotting can get messy. To clean things up and clearly separate what features we are adding to our plots, you will probably encounter two different approaches.
Approach 1: Creating an object, then adding features to the object
base.plot <- ggplot(dataName, aes(x = x, y = y, group = cluster))
base.plot + geom_line()
Approach 2: Creating a single plot, without any intermediate objects, but each feature is separated by a new line of code.
ggplot(dataName, aes(x = x, y = y, group = cluster)) +
geom_line()
When displaying code incrementally, it is often easier to use Approach 1, since you do not need to read through the same lines of code multiple times. Instead, you simply see an object, which was generated with code you already know, with some added features.
I would also recommend Approach 1 for your own models because if you want to make changes to any individual step, any errors you make in that process will be isolated, so you can easily trace where you are having problems.
Here, we want to understand the shape of our data. To do so, simply plot a regression line for every cluster.
base.plot <- ggplot(probset4, aes(x = AGE, y = INT, group = ID))
base.plot + geom_line()
This shows a clear non-linear relationship. Since this data was generated from a piecewise model, we will continue this example with the piecewise model.
You need to first build your model to view your residuals. Residuals are the distance between the observed and expected values, so this relies on the predict() function. There are also random effect estimates when you have random intercepts and slopes, which will rely on the ranef() function.
model <- lmer(INT ~ ageP1 + ageP2 + cwcARS + gmcARS + SEX + ageP1:cwcARS + ageP2:cwcARS + ageP1:gmcARS + ageP1:SEX + (1 + ageP1 + ageP2 | ID), data = probset4)
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
## Model failed to converge with max|grad| = 0.0193472 (tol = 0.002, component 1)
summary(model)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: INT ~ ageP1 + ageP2 + cwcARS + gmcARS + SEX + ageP1:cwcARS +
## ageP2:cwcARS + ageP1:gmcARS + ageP1:SEX + (1 + ageP1 + ageP2 | ID)
## Data: probset4
##
## REML criterion at convergence: 4873.5
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.5107 -0.6248 0.0447 0.6314 3.7544
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## ID (Intercept) 0.580571 0.76195
## ageP1 0.009479 0.09736 -0.77
## ageP2 0.003813 0.06175 0.51 0.16
## Residual 2.089865 1.44564
## Number of obs: 1308, groups: ID, 300
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 4.30948 0.20162 99.77947 21.374 < 2e-16 ***
## ageP1 1.22347 0.03292 158.74033 37.170 < 2e-16 ***
## ageP2 -0.03682 0.04084 230.20050 -0.902 0.3682
## cwcARS 0.41094 0.20693 1051.59911 1.986 0.0473 *
## gmcARS -1.10448 0.17411 129.70817 -6.343 3.44e-09 ***
## SEX 3.74009 0.27911 115.75695 13.400 < 2e-16 ***
## ageP1:cwcARS -0.02208 0.03577 1084.93308 -0.617 0.5372
## ageP2:cwcARS 0.56144 0.05677 1032.39992 9.890 < 2e-16 ***
## ageP1:gmcARS 0.20853 0.02876 212.17164 7.252 7.57e-12 ***
## ageP1:SEX -0.46331 0.04429 177.40236 -10.462 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) ageP1 ageP2 cwcARS gmcARS SEX agP1:cARS aP2:AR
## ageP1 -0.921
## ageP2 0.164 -0.303
## cwcARS -0.026 0.023 -0.003
## gmcARS -0.093 0.080 -0.007 -0.038
## SEX -0.697 0.622 0.018 0.032 0.034
## agP1:cwcARS 0.020 -0.017 0.003 -0.926 0.031 -0.023
## agP2:cwcARS -0.001 -0.001 0.005 0.322 -0.009 0.001 -0.523
## agP1:gmcARS 0.076 -0.062 -0.015 0.034 -0.920 -0.027 -0.027 0.010
## ageP1:SEX 0.640 -0.664 -0.030 -0.027 -0.028 -0.927 0.018 0.002
## agP1:gARS
## ageP1
## ageP2
## cwcARS
## gmcARS
## SEX
## agP1:cwcARS
## agP2:cwcARS
## agP1:gmcARS
## ageP1:SEX 0.030
## convergence code: 0
## Model failed to converge with max|grad| = 0.0193472 (tol = 0.002, component 1)
# Creating level 1 standardized residuals
probset4$predict <- predict(model)
probset4$resid <- probset4$INT - probset4$predict
probset4$zresid <- (probset4$resid - mean(probset4$resid))/sd(probset4$resid)
# Creating level 2 random effects
randeff <- ranef(model)$ID
names(randeff) <- c("u0", "u1", "u2") # Random intercept, random slope for P1, random slope for P2
randeff$ID <- rownames(randeff)
probset4 <- merge(probset4, randeff, by="ID")
To assess normality, just make a histogram of the level 1 residuals
hist(probset4$zresid)
To assess heteroscedasticity, plot residuals against predictors
plot(probset4$zresid ~ probset4$age0)
plot(probset4$zresid ~ probset4$cwcARS)
plot(probset4$zresid ~ probset4$SEX)
plot(probset4$zresid ~ probset4$gmcARS)
plot(probset4$zresid ~ probset4$predict)
To assess normality, make histograms of u0, u1, and u2
hist(probset4$u0, prob = T)
hist(probset4$u1, prob = T)
hist(probset4$u2, prob = T)
To assess heteroscedasticity, plot u0, u1, and u2 against predictors
# Random Intercepts
plot(probset4$u0 ~ probset4$age0)
plot(probset4$u0 ~ probset4$cwcARS)
plot(probset4$u0 ~ probset4$SEX)
plot(probset4$u0 ~ probset4$gmcARS)
plot(probset4$u0 ~ probset4$predict)
# Random slope (piece 1)
plot(probset4$u1 ~ probset4$age0)
plot(probset4$u1 ~ probset4$cwcARS)
plot(probset4$u1 ~ probset4$SEX)
plot(probset4$u1 ~ probset4$gmcARS)
plot(probset4$u1 ~ probset4$predict)
# Random Slope (piece 2)
plot(probset4$u2 ~ probset4$age0)
plot(probset4$u2 ~ probset4$cwcARS)
plot(probset4$u2 ~ probset4$SEX)
plot(probset4$u2 ~ probset4$gmcARS)
plot(probset4$u2 ~ probset4$predict)
The Q-Q plot for Level 2 residuals can be obtained from the plot_model
function of sjPlot
, using type = "diag"
(“diag” meaning “diagnostic”). Specifically, the Q-Q plot is the second of four plots generated from this code. The other three plots replicate some of diagnostic plots we created above.
diagnostic_plot <- sjPlot::plot_model(model, type = "diag")
diagnostic_plot[[2]]$ID
## `geom_smooth()` using formula 'y ~ x'
The remaining diagnostic plots can be isolated with the following code:
diagnostic_plot[[1]]
## `geom_smooth()` using formula 'y ~ x'
diagnostic_plot[[3]]
diagnostic_plot[[4]]
## `geom_smooth()` using formula 'y ~ x'
For most cases, you should use sjp.lmer. However, when you’re testing non-linear relationships, you will need to use the customized nonLinearPlot function to get the full picture (below).
plot_model(model, type = "pred", show.data = T)
## $ageP1
##
## $ageP2
##
## $cwcARS
##
## $gmcARS
##
## $SEX
plot_model(model, type = "int", mdrt.values = "meansd", show.data = T)
## [[1]]
##
## [[2]]
##
## [[3]]
##
## [[4]]
This is a work in progress. I built this function to create non-linear plots from both piecewise and quadratic interactions. I have expanded it to include functionality for OLS regression (with lm objects), for adding title commands (main, x-axis, y-axis, and legend titles), and for plotting main effects models. If you want to learn more about building functions, or expanding this one, I would be happy to help.
The function for nonLinearPlot() can be found in the R script nonLinearPlot.R. To use this function, simply place the nonLinearFunction_v5.R file into your working directory and run the following code:
source("nonLinearPlot.R")
Main Effects - Piecewise
nonLinearPlot(model = model, data = probset4, x1 = "ageP1", x2 = "ageP2", mod = NULL, type = "piece", scale = 5, piece.sep = 8, main = "Piecewise Main Effect of Age", xlab = "Age", ylab = "Internalizing Symptoms")
Interactions
nonLinearPlot(model = model, data = probset4, x1 = "ageP1", x2 = "ageP2", mod = "SEX", binary = T, type = "piece", scale = 5, piece.sep = 8, main = "Piecewise Interaction: Age and Sex", xlab = "Age", ylab = "Internalizing Symptoms", legend = "Sex")
nonLinearPlot(model = model, data = probset4, x1 = "ageP1", x2 = "ageP2", mod = "cwcARS", binary = F, type = "piece", scale = 5, piece.sep = 8, main = "Piecewise Interaction: Age and ARS_cwc", xlab = "Age", ylab = "Internalizing Symptoms", legend = "ARS Level")
nonLinearPlot(model = model, data = probset4, x1 = "ageP1", x2 = "ageP2", mod = "gmcARS", binary = F, type = "piece", scale = 5, piece.sep = 8, main = "Piecewise Interaction: Age and ARS_gmc", xlab = "Age", ylab = "Internalizing Symptoms", legend = "ARS Level")