library(dplyr)
library(lmerTest)
library(ggplot2)
library(sjPlot)
sjPlot::set_theme(base = theme_bw())

Use This Guide! (especially for lmer objects)

This is a guide that is designed to be your resource for making plots from multilevel models. You can use this as a starting point for visualizing your plots in a reliable way. If you want to learn more about touching up your plots to make them more visually appealing (for a publication, poster, or presentation), then you should read more on using ggplot2. You can find a cheat sheet here: https://www.rstudio.com/wp-content/uploads/2016/11/ggplot2-cheatsheet-2.1.pdf

Feel free to provide feedback so that this can serve as a useful and reliable guide for years to come. I have included what I think is sufficient annotation; however, if you believe anything is unclear, or you can think of a way to make the descriptions/explanations better, please help!

Update December 29, 2020: Special thanks to Daniel Eggelmann for prompting me to update the functions from sjPlot!

Data Background

These data were generated for a course on multilevel modeling in the Department of Psychology at University of Miami.

This guide will use the data set from probset4, since it has a diverse set of properties (e.g., non-linearity) that we would like to capture.

probset4 <- read.csv("probset4.csv", header = T, fileEncoding = "UTF-8-BOM")

probset4 <- probset4 %>% 
  group_by(ID) %>% 
  mutate(avgARS = mean(ARS)) %>% 
  mutate(cwcARS = ARS - avgARS)

probset4$gmcARS <- probset4$avgARS - mean(probset4$avgARS)

probset4$age0 <- probset4$AGE - min(probset4$AGE)

probset4$ageP1 <- ifelse(probset4$age0 <= 8, probset4$age0, 8)

probset4$ageP2 <- ifelse(probset4$age0 <= 8, 0, probset4$age0 - 8)

Approaching Plots

Often times the code for plotting can get messy. To clean things up and clearly separate what features we are adding to our plots, you will probably encounter two different approaches.

Approach 1: Creating an object, then adding features to the object

base.plot <- ggplot(dataName, aes(x = x, y = y, group = cluster))
base.plot + geom_line()

Approach 2: Creating a single plot, without any intermediate objects, but each feature is separated by a new line of code.

ggplot(dataName, aes(x = x, y = y, group = cluster)) +
  geom_line()

When displaying code incrementally, it is often easier to use Approach 1, since you do not need to read through the same lines of code multiple times. Instead, you simply see an object, which was generated with code you already know, with some added features.

I would also recommend Approach 1 for your own models because if you want to make changes to any individual step, any errors you make in that process will be isolated, so you can easily trace where you are having problems.

Raw Data

Here, we want to understand the shape of our data. To do so, simply plot a regression line for every cluster.

base.plot <- ggplot(probset4, aes(x = AGE, y = INT, group = ID))
base.plot + geom_line()

This shows a clear non-linear relationship. Since this data was generated from a piecewise model, we will continue this example with the piecewise model.

Diagnostic Plots

You need to first build your model to view your residuals. Residuals are the distance between the observed and expected values, so this relies on the predict() function. There are also random effect estimates when you have random intercepts and slopes, which will rely on the ranef() function.

model <- lmer(INT ~ ageP1 + ageP2 + cwcARS + gmcARS + SEX + ageP1:cwcARS + ageP2:cwcARS + ageP1:gmcARS + ageP1:SEX + (1 + ageP1 + ageP2 | ID), data = probset4)
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
## Model failed to converge with max|grad| = 0.0193472 (tol = 0.002, component 1)
summary(model)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: INT ~ ageP1 + ageP2 + cwcARS + gmcARS + SEX + ageP1:cwcARS +  
##     ageP2:cwcARS + ageP1:gmcARS + ageP1:SEX + (1 + ageP1 + ageP2 |      ID)
##    Data: probset4
## 
## REML criterion at convergence: 4873.5
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.5107 -0.6248  0.0447  0.6314  3.7544 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev. Corr       
##  ID       (Intercept) 0.580571 0.76195             
##           ageP1       0.009479 0.09736  -0.77      
##           ageP2       0.003813 0.06175   0.51  0.16
##  Residual             2.089865 1.44564             
## Number of obs: 1308, groups:  ID, 300
## 
## Fixed effects:
##                Estimate Std. Error         df t value Pr(>|t|)    
## (Intercept)     4.30948    0.20162   99.77947  21.374  < 2e-16 ***
## ageP1           1.22347    0.03292  158.74033  37.170  < 2e-16 ***
## ageP2          -0.03682    0.04084  230.20050  -0.902   0.3682    
## cwcARS          0.41094    0.20693 1051.59911   1.986   0.0473 *  
## gmcARS         -1.10448    0.17411  129.70817  -6.343 3.44e-09 ***
## SEX             3.74009    0.27911  115.75695  13.400  < 2e-16 ***
## ageP1:cwcARS   -0.02208    0.03577 1084.93308  -0.617   0.5372    
## ageP2:cwcARS    0.56144    0.05677 1032.39992   9.890  < 2e-16 ***
## ageP1:gmcARS    0.20853    0.02876  212.17164   7.252 7.57e-12 ***
## ageP1:SEX      -0.46331    0.04429  177.40236 -10.462  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) ageP1  ageP2  cwcARS gmcARS SEX    agP1:cARS aP2:AR
## ageP1       -0.921                                                    
## ageP2        0.164 -0.303                                             
## cwcARS      -0.026  0.023 -0.003                                      
## gmcARS      -0.093  0.080 -0.007 -0.038                               
## SEX         -0.697  0.622  0.018  0.032  0.034                        
## agP1:cwcARS  0.020 -0.017  0.003 -0.926  0.031 -0.023                 
## agP2:cwcARS -0.001 -0.001  0.005  0.322 -0.009  0.001 -0.523          
## agP1:gmcARS  0.076 -0.062 -0.015  0.034 -0.920 -0.027 -0.027     0.010
## ageP1:SEX    0.640 -0.664 -0.030 -0.027 -0.028 -0.927  0.018     0.002
##             agP1:gARS
## ageP1                
## ageP2                
## cwcARS               
## gmcARS               
## SEX                  
## agP1:cwcARS          
## agP2:cwcARS          
## agP1:gmcARS          
## ageP1:SEX    0.030   
## convergence code: 0
## Model failed to converge with max|grad| = 0.0193472 (tol = 0.002, component 1)
# Creating level 1 standardized residuals

probset4$predict <- predict(model)

probset4$resid <- probset4$INT - probset4$predict

probset4$zresid <- (probset4$resid - mean(probset4$resid))/sd(probset4$resid)

# Creating level 2 random effects

randeff <- ranef(model)$ID

names(randeff) <- c("u0", "u1", "u2") # Random intercept, random slope for P1, random slope for P2
randeff$ID <- rownames(randeff)

probset4 <- merge(probset4, randeff, by="ID")

Level 1 Residuals

Normality

To assess normality, just make a histogram of the level 1 residuals

hist(probset4$zresid)

Heteroscedasticity

To assess heteroscedasticity, plot residuals against predictors

plot(probset4$zresid ~ probset4$age0)

plot(probset4$zresid ~ probset4$cwcARS)

plot(probset4$zresid ~ probset4$SEX)

plot(probset4$zresid ~ probset4$gmcARS)

plot(probset4$zresid ~ probset4$predict)

Level 2 Residuals

Normality

To assess normality, make histograms of u0, u1, and u2

hist(probset4$u0, prob = T)

hist(probset4$u1, prob = T)

hist(probset4$u2, prob = T)

To assess heteroscedasticity, plot u0, u1, and u2 against predictors

# Random Intercepts
plot(probset4$u0 ~ probset4$age0)

plot(probset4$u0 ~ probset4$cwcARS)

plot(probset4$u0 ~ probset4$SEX)

plot(probset4$u0 ~ probset4$gmcARS)

plot(probset4$u0 ~ probset4$predict)

# Random slope (piece 1)

plot(probset4$u1 ~ probset4$age0)

plot(probset4$u1 ~ probset4$cwcARS)

plot(probset4$u1 ~ probset4$SEX)

plot(probset4$u1 ~ probset4$gmcARS)

plot(probset4$u1 ~ probset4$predict)

# Random Slope (piece 2)

plot(probset4$u2 ~ probset4$age0)

plot(probset4$u2 ~ probset4$cwcARS)

plot(probset4$u2 ~ probset4$SEX)

plot(probset4$u2 ~ probset4$gmcARS)

plot(probset4$u2 ~ probset4$predict)

Q-Q Plot of random effects

The Q-Q plot for Level 2 residuals can be obtained from the plot_model function of sjPlot, using type = "diag" (“diag” meaning “diagnostic”). Specifically, the Q-Q plot is the second of four plots generated from this code. The other three plots replicate some of diagnostic plots we created above.

diagnostic_plot <- sjPlot::plot_model(model, type = "diag")

diagnostic_plot[[2]]$ID
## `geom_smooth()` using formula 'y ~ x'

The remaining diagnostic plots can be isolated with the following code:

diagnostic_plot[[1]]
## `geom_smooth()` using formula 'y ~ x'

diagnostic_plot[[3]]

diagnostic_plot[[4]]
## `geom_smooth()` using formula 'y ~ x'

Prediction Plots

For most cases, you should use sjp.lmer. However, when you’re testing non-linear relationships, you will need to use the customized nonLinearPlot function to get the full picture (below).

Main Effects

plot_model(model, type = "pred", show.data = T)
## $ageP1

## 
## $ageP2

## 
## $cwcARS

## 
## $gmcARS

## 
## $SEX

Interactions

plot_model(model, type = "int", mdrt.values = "meansd", show.data = T)
## [[1]]

## 
## [[2]]

## 
## [[3]]

## 
## [[4]]

Non-linearity

This is a work in progress. I built this function to create non-linear plots from both piecewise and quadratic interactions. I have expanded it to include functionality for OLS regression (with lm objects), for adding title commands (main, x-axis, y-axis, and legend titles), and for plotting main effects models. If you want to learn more about building functions, or expanding this one, I would be happy to help.

The function for nonLinearPlot() can be found in the R script nonLinearPlot.R. To use this function, simply place the nonLinearFunction_v5.R file into your working directory and run the following code:

source("nonLinearPlot.R")

Main Effects - Piecewise

nonLinearPlot(model = model, data = probset4, x1 = "ageP1", x2 = "ageP2", mod = NULL, type = "piece", scale = 5, piece.sep = 8, main = "Piecewise Main Effect of Age", xlab = "Age", ylab = "Internalizing Symptoms")

Interactions

nonLinearPlot(model = model, data = probset4, x1 = "ageP1", x2 = "ageP2", mod = "SEX", binary = T, type = "piece", scale = 5, piece.sep = 8, main = "Piecewise Interaction: Age and Sex", xlab = "Age", ylab = "Internalizing Symptoms", legend = "Sex")

nonLinearPlot(model = model, data = probset4, x1 = "ageP1", x2 = "ageP2", mod = "cwcARS", binary = F, type = "piece", scale = 5, piece.sep = 8, main = "Piecewise Interaction: Age and ARS_cwc", xlab = "Age", ylab = "Internalizing Symptoms", legend = "ARS Level")

nonLinearPlot(model = model, data = probset4, x1 = "ageP1", x2 = "ageP2", mod = "gmcARS", binary = F, type = "piece", scale = 5, piece.sep = 8, main = "Piecewise Interaction: Age and ARS_gmc", xlab = "Age", ylab = "Internalizing Symptoms", legend = "ARS Level")