This document was set up to provide you with an easy and reliable resource for conducting regression analyses in R. It can, and should, be an evolving document; as you notice errors, inconsistencies, or the absence of a tool that you think is very important, please let your instructor, your TA, or Danny Forster (forster[dot]danny[at]gmail.com) know. We also want to make it as easy to understand as possible, so if you find anything unclear, or can suggest a better way to present information or accomplish a task, please contribute!
Also, note that R packages are constantly being updated, which can change how functions from various packages are called. This means that much of the syntax presented here will inevitably go out of date. Unless contacted, I will not be keeping this document up to date.
First, you should load all of the packages at the beginning of your script. If you don't have a package, use install.packages("packageName").
install.packages("ggplot2") # This installs only one package
install.packages(c("knitr", "ggplot2", "foreign", "car", "stargazer", "lmSupport", "sjPlot", "BaylorEdPsych", "lavaan", "semPlot", "gvlma")) # This installs many packages at once (car is included because vif() and durbinWatsonTest() below come from it)
Once your packages are installed, just use library("packageName").
library(ggplot2) # this package is good for making plots
library(foreign) # this allows you to read in spss data files
library(stargazer) # this creates nice descriptive tables
library(lmSupport) # this has nice features for comparing models
library(sjPlot) # This makes pretty plots for various models
library(BaylorEdPsych) # this has resources for logistic regression
library(lavaan) # this can fit structural equation models
library(semPlot) # this can create path diagrams from structural equation models
library(car) # this provides vif() and durbinWatsonTest() for regression diagnostics
library(gvlma) # assesses global linear model assumptions
sjp.setTheme(base = theme_bw()) # This changes the default to something I like...
Next, set your working directory. For any project, you should always set the working directory to a specific filepath on your computer. If you work across multiple computers with cloud storage, just take note of which working directory belongs to which computer. For example:
setwd("/Users/dannyforster/Dropbox/Dissertation/Analyses/") # home computer
setwd("/Users/dforster/Dropbox/Dissertation/Analyses/") # office computer
After you set your working directory, any file in that filepath can be accessed by referencing only the file name. For example:
dissData <- read.csv("dissertationData.csv", header = T) # if your data is saved as a .csv
dissData <- read.table("dissertationData.txt", header = T) # if your data is saved as a tab delimited .txt
dissData <- read.spss("dissertationData.sav", to.data.frame = T, use.value.labels = F) # if your data is saved as a .sav (SPSS file; requires the 'foreign' package)
For this example, we will use the continuous moderator data.
modData <- read.spss("continuous moderation example from Cohen et al.sav", to.data.frame = T)
## Warning in read.spss("continuous moderation example from Cohen et
## al.sav", : continuous moderation example from Cohen et al.sav: Unrecognized
## record type 7, subtype 18 encountered in system file
str(modData) # This tells you the structure of your data (whether your variables are numeric, factor, etc.)
## 'data.frame': 245 obs. of 11 variables:
## $ id : num 1 2 3 4 5 6 7 8 9 10 ...
## $ age : num 60 40 29 47 48 42 55 43 39 51 ...
## $ yearsexe: num 10 9 2 10 9 6 8 19 9 14 ...
## $ enduranc: num 18 36 51 18 23 30 8 40 28 15 ...
## $ cage : num 10.82 -9.18 -20.18 -2.18 -1.18 ...
## $ cyearsex: num -0.673 -1.673 -8.673 -0.673 -1.673 ...
## $ inter : num -7.28 15.37 175.06 1.47 1.98 ...
## $ hiexer : num -5.45 -6.45 -13.45 -5.45 -6.45 ...
## $ interhi : num -58.93 59.22 271.43 11.9 7.63 ...
## $ loexer : num 4.1 3.1 -3.9 4.1 3.1 ...
## $ lointer : num 44.37 -28.49 78.67 -8.96 -3.67 ...
## - attr(*, "codepage")= int 1252
head(modData) # This shows the first 6 rows of your data
## id age yearsexe enduranc cage cyearsex inter hiexer
## 1 1 60 10 18 10.816327 -0.6734694 -7.284465 -5.4482
## 2 2 40 9 36 -9.183673 -1.6734694 15.368596 -6.4482
## 3 3 29 2 51 -20.183673 -8.6734694 175.062474 -13.4482
## 4 4 47 10 18 -2.183673 -0.6734694 1.470637 -5.4482
## 5 5 48 9 23 -1.183673 -1.6734694 1.980841 -6.4482
## 6 6 42 6 30 -7.183673 -4.6734694 33.572678 -9.4482
## interhi loexer lointer
## 1 -58.929510 4.1022 44.3707347
## 2 59.218163 3.1022 -28.4895918
## 3 271.434078 -3.8978 78.6719224
## 4 11.897090 4.1022 -8.9578653
## 5 7.632563 3.1022 -3.6719918
## 6 67.872784 0.1022 -0.7341714
summary(modData) # This shows basic descriptives of your data
## id age yearsexe enduranc
## Min. : 1 Min. :20.00 Min. : 0.00 Min. : 0.00
## 1st Qu.: 63 1st Qu.:43.00 1st Qu.: 7.00 1st Qu.:19.00
## Median :124 Median :48.00 Median :11.00 Median :27.00
## Mean :125 Mean :49.18 Mean :10.67 Mean :26.53
## 3rd Qu.:187 3rd Qu.:56.00 3rd Qu.:14.00 3rd Qu.:33.00
## Max. :250 Max. :82.00 Max. :26.00 Max. :55.00
## cage cyearsex inter hiexer
## Min. :-29.184 Min. :-10.6735 Min. :-131.570 Min. :-15.448
## 1st Qu.: -6.184 1st Qu.: -3.6735 1st Qu.: -7.264 1st Qu.: -8.448
## Median : -1.184 Median : 0.3265 Median : 4.511 Median : -4.448
## Mean : 0.000 Mean : 0.0000 Mean : 13.587 Mean : -4.775
## 3rd Qu.: 6.816 3rd Qu.: 3.3265 3rd Qu.: 32.103 3rd Qu.: -1.448
## Max. : 32.816 Max. : 15.3265 Max. : 175.062 Max. : 10.552
## interhi loexer lointer
## Min. :-165.252 Min. :-5.898 Min. :-175.428
## 1st Qu.: -20.792 1st Qu.: 1.102 1st Qu.: -16.244
## Median : 7.507 Median : 5.102 Median : 2.002
## Mean : 13.587 Mean : 4.776 Mean : 13.587
## 3rd Qu.: 36.403 3rd Qu.: 8.102 3rd Qu.: 37.453
## Max. : 275.733 Max. :20.102 Max. : 298.701
stargazer(modData[, 2:10], type = "text") # modData[, 2:10] takes only the 2nd-10th columns (i.e., excludes id)
##
## ==============================================
## Statistic N Mean St. Dev. Min Max
## ----------------------------------------------
## age 245 49.184 10.107 20 82
## yearsexe 245 10.673 4.775 0 26
## enduranc 245 26.531 10.819 0 55
## cage 245 0.000 10.107 -29.184 32.816
## cyearsex 245 0.000 4.775 -10.673 15.327
## inter 245 13.587 46.012 -131.570 175.062
## hiexer 245 -4.775 4.775 -15.448 10.552
## interhi 245 13.587 67.131 -165.252 275.733
## loexer 245 4.776 4.775 -5.898 20.102
## ----------------------------------------------
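The centered predictors (cage, cyearsex) and their product term (inter) were created in SPSS, but you can build them directly in R. A quick sketch; the R-suffixed names are just illustrative, so we don't overwrite the SPSS versions:

# mean-center each predictor, then multiply the centered scores for the interaction term
modData$cageR     <- modData$age - mean(modData$age)
modData$cyearsexR <- modData$yearsexe - mean(modData$yearsexe)
modData$interR    <- modData$cageR * modData$cyearsexR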
First, create a model with enduranc regressed on cyearsex and cage.
m1 <- lm(enduranc ~ cyearsex + cage, data = modData)
# check multi-collinearity (vif() is from the car package)
vif(m1)
## cyearsex cage
## 1.086837 1.086837
sqrt(vif(m1)) > 2 # rule of thumb: sqrt(VIF) > 2 suggests problematic collinearity
## cyearsex cage
## FALSE FALSE
# check non-independence of errors (durbinWatsonTest() is from the car package)
durbinWatsonTest(m1)
## lag Autocorrelation D-W Statistic p-value
## 1 0.02165138 1.955424 0.772
## Alternative hypothesis: rho != 0
# check global assumptions (using gvlma package)
global1 <- gvlma(m1)
summary(global1)
##
## Call:
## lm(formula = enduranc ~ cyearsex + cage, data = modData)
##
## Residuals:
## Min 1Q Median 3Q Max
## -21.7994 -6.9040 0.5701 5.6326 27.2279
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 26.5306 0.6337 41.865 < 2e-16 ***
## cyearsex 0.9163 0.1386 6.610 2.44e-10 ***
## cage -0.2571 0.0655 -3.925 0.000113 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 9.919 on 242 degrees of freedom
## Multiple R-squared: 0.1663, Adjusted R-squared: 0.1594
## F-statistic: 24.14 on 2 and 242 DF, p-value: 2.754e-10
##
##
## ASSESSMENT OF THE LINEAR MODEL ASSUMPTIONS
## USING THE GLOBAL TEST ON 4 DEGREES-OF-FREEDOM:
## Level of Significance = 0.05
##
## Call:
## gvlma(x = m1)
##
## Value p-value Decision
## Global Stat 2.7023 0.6088 Assumptions acceptable.
## Skewness 0.7820 0.3765 Assumptions acceptable.
## Kurtosis 0.7505 0.3863 Assumptions acceptable.
## Link Function 0.3835 0.5357 Assumptions acceptable.
## Heteroscedasticity 0.7863 0.3752 Assumptions acceptable.
summary(m1)
##
## Call:
## lm(formula = enduranc ~ cyearsex + cage, data = modData)
##
## Residuals:
## Min 1Q Median 3Q Max
## -21.7994 -6.9040 0.5701 5.6326 27.2279
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 26.5306 0.6337 41.865 < 2e-16 ***
## cyearsex 0.9163 0.1386 6.610 2.44e-10 ***
## cage -0.2571 0.0655 -3.925 0.000113 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 9.919 on 242 degrees of freedom
## Multiple R-squared: 0.1663, Adjusted R-squared: 0.1594
## F-statistic: 24.14 on 2 and 242 DF, p-value: 2.754e-10
# get confidence intervals
confint(m1)
## 2.5 % 97.5 %
## (Intercept) 25.2823165 27.7789080
## cyearsex 0.6432272 1.1893923
## cage -0.3861168 -0.1280706
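If you also want standardized coefficients, one simple base-R route is to refit the model on z-scored variables using scale(). A sketch (m1.std is an illustrative name; cyearsex and cage are already centered, so scale() here just divides by the standard deviation):

# standardized version of m1: all variables in standard-deviation units
m1.std <- lm(scale(enduranc) ~ scale(cyearsex) + scale(cage), data = modData)
coef(m1.std) # coefficients are now interpretable as betas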
Then you can add the interaction term as a variable…
m2.var <- lm(enduranc ~ cyearsex + cage + inter, data = modData)
summary(m2.var)
##
## Call:
## lm(formula = enduranc ~ cyearsex + cage + inter, data = modData)
##
## Residuals:
## Min 1Q Median 3Q Max
## -21.165 -6.939 0.269 6.300 21.299
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 25.88872 0.64662 40.037 < 2e-16 ***
## cyearsex 0.97272 0.13653 7.124 1.20e-11 ***
## cage -0.26169 0.06406 -4.085 6.01e-05 ***
## inter 0.04724 0.01359 3.476 0.000604 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 9.7 on 241 degrees of freedom
## Multiple R-squared: 0.2061, Adjusted R-squared: 0.1962
## F-statistic: 20.86 on 3 and 241 DF, p-value: 4.764e-12
…or you can simply create the interaction within the model.
m2.star <- lm(enduranc ~ cyearsex * cage, data = modData) # The asterisk creates the main effects and interaction between two variables
summary(m2.star)
##
## Call:
## lm(formula = enduranc ~ cyearsex * cage, data = modData)
##
## Residuals:
## Min 1Q Median 3Q Max
## -21.165 -6.939 0.269 6.300 21.299
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 25.88872 0.64662 40.037 < 2e-16 ***
## cyearsex 0.97272 0.13653 7.124 1.20e-11 ***
## cage -0.26169 0.06406 -4.085 6.01e-05 ***
## cyearsex:cage 0.04724 0.01359 3.476 0.000604 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 9.7 on 241 degrees of freedom
## Multiple R-squared: 0.2061, Adjusted R-squared: 0.1962
## F-statistic: 20.86 on 3 and 241 DF, p-value: 4.764e-12
m2.colon <- lm(enduranc ~ cyearsex + cage + cyearsex:cage, data = modData) # The colon isolates the interaction term (i.e., the colon will not evaluate main effects, so they must be entered separately)
summary(m2.colon)
##
## Call:
## lm(formula = enduranc ~ cyearsex + cage + cyearsex:cage, data = modData)
##
## Residuals:
## Min 1Q Median 3Q Max
## -21.165 -6.939 0.269 6.300 21.299
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 25.88872 0.64662 40.037 < 2e-16 ***
## cyearsex 0.97272 0.13653 7.124 1.20e-11 ***
## cage -0.26169 0.06406 -4.085 6.01e-05 ***
## cyearsex:cage 0.04724 0.01359 3.476 0.000604 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 9.7 on 241 degrees of freedom
## Multiple R-squared: 0.2061, Adjusted R-squared: 0.1962
## F-statistic: 20.86 on 3 and 241 DF, p-value: 4.764e-12
Now let’s compare models:
modelCompare(m1, m2.star) # this uses the lmSupport package
## SSE (Compact) = 23810.33
## SSE (Augmented) = 22673.79
## Delta R-Squared = 0.03979367
## Partial Eta-Squared (PRE) = 0.04773338
## F(1,241) = 12.08038, p = 0.0006041869
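modelCompare() is convenient, but the same nested-model F-test is available in base R if you'd rather not load lmSupport:

anova(m1, m2.star) # base-R equivalent of the model comparison above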
This moderation is significant! We can plot this very easily in R.
sjp.int(m2.star, type = "eff", mdrt.values = "meansd")
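You can also compute the simple slopes behind that plot by hand from the model coefficients. A sketch for the slope of cage at plus and minus one SD of cyearsex (the hiexer/loexer columns in the data appear to be cyearsex recentered at those same values):

# simple slope of cage at +/- 1 SD of the moderator cyearsex
b <- coef(m2.star)
sdEx <- sd(modData$cyearsex)
b["cage"] + b["cyearsex:cage"] * sdEx # slope of cage at high (+1 SD) exercise
b["cage"] - b["cyearsex:cage"] * sdEx # slope of cage at low (-1 SD) exercise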
You can also create a nice summary table using sjt.lm().
sjt.lm(m2.star)
                enduranc
                B        CI               p
(Intercept)     25.89    24.61 – 27.16    <.001
cyearsex        0.97     0.70 – 1.24      <.001
cage            -0.26    -0.39 – -0.14    <.001
cyearsex:cage   0.05     0.02 – 0.07      <.001
Observations    245
R2 / adj. R2    .206 / .196
poly <- read.spss("polynomial revised.sav", to.data.frame = T)
head(poly)
## time gain time2 ctime ctime2
## 1 1 4 1 -2 4
## 2 1 2 1 -2 4
## 3 1 3 1 -2 4
## 4 1 2 1 -2 4
## 5 2 7 4 -1 1
## 6 2 5 4 -1 1
We want to regress gain on centered time (ctime) and centered time squared (ctime2).
polyreg <- lm(gain ~ ctime + ctime2, data = poly)
summary(polyreg)
##
## Call:
## lm(formula = gain ~ ctime + ctime2, data = poly)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.0071 -1.7214 0.2429 1.6357 5.9571
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 12.0429 1.0098 11.926 1.11e-09 ***
## ctime 1.3750 0.4582 3.001 0.008040 **
## ctime2 -1.9464 0.3872 -5.026 0.000104 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.898 on 17 degrees of freedom
## Multiple R-squared: 0.6684, Adjusted R-squared: 0.6294
## F-statistic: 17.13 on 2 and 17 DF, p-value: 8.415e-05
There is a significant quadratic effect. We can interpret this more easily by plotting it.
What happens if we try to plot it using sjPlot?
# plotting the linear effect of time
sjp.lm(polyreg, type = "pred", vars = "ctime", facet.grid = F)
# plotting the quadratic effect of time
sjp.lm(polyreg, type = "pred", vars = "ctime2", facet.grid = F)
This is confusing: We have a positive linear effect but a negative quadratic effect. We can’t really see the non-linearity here. What happens if we just include both variables in our plot?
sjp.lm(polyreg, type = "pred", vars = c("ctime", "ctime2"), facet.grid = F)
This also makes no sense. We have to coerce these two effects (linear and quadratic) onto a single, linear scale. To do so, you will have to use a function I created. If you have the nonLinearFunction_v7.R script in your working directory, you can simply load the function using source(). If you want to see how the function works, feel free to open it up and check it out. It’s pretty ugly, but I would be happy to take you through it, step-by-step.
source("nonLinearFunction_v7.R")
Now you can use the nonLinearPlot() function.
nonLinearPlot(model = polyreg, data = poly, x1 = "ctime", x2 = "ctime2", type = "quad", main = "Gain ~ Time", xlab = "Time", ylab = "Gain", scale = mean(poly$time))
Now we can clearly see how the combined linear and quadratic effects create a curvilinear pattern of predicted values.
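For comparison, here is a minimal base-R sketch of the same plot using predict(): build a grid of ctime values, square them for the quadratic term, and plot the predictions back on the original time scale (ctimeSeq and predGain are illustrative names):

# predicted gain across the observed range of ctime, plotted on the raw time scale
ctimeSeq <- seq(min(poly$ctime), max(poly$ctime), length.out = 100)
predGain <- predict(polyreg, newdata = data.frame(ctime = ctimeSeq, ctime2 = ctimeSeq^2))
plot(ctimeSeq + mean(poly$time), predGain, type = "l",
     xlab = "Time", ylab = "Predicted Gain", main = "Gain ~ Time")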
Logistic models are just as easy as OLS models. Instead of the lm() function, we will use the glm() function (generalized linear model). To see the different generalized linear models that R allows you to fit, use ?family. For logistic regression, we will use family = binomial(link = "logit").
First, import the data set.
logitData <- read.spss("logistic data.sav", to.data.frame = T)
## Warning in read.spss("logistic data.sav", to.data.frame = T): logistic
## data.sav: Unrecognized record type 7, subtype 18 encountered in system file
head(logitData)
## age agrp chd dummy cage c10year lnage
## 1 20 1 0 0 -24.38 -2.438 2.995732
## 2 23 1 0 0 -21.38 -2.138 3.135494
## 3 24 1 0 0 -20.38 -2.038 3.178054
## 4 25 1 0 0 -19.38 -1.938 3.218876
## 5 25 1 1 0 -19.38 -1.938 3.218876
## 6 26 1 0 0 -18.38 -1.838 3.258097
# to create a dummy variable
logitData$dummyAge <- ifelse(logitData$age < 50, 0, 1)
# compare new dummy variable to old dummy variable
logitData$dummyAge == logitData$dummy
## [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [15] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [29] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [43] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [57] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [71] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [85] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [99] TRUE TRUE
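Printing the full logical vector is fine for 100 rows, but if you just want a single yes/no answer, all() condenses the check:

all(logitData$dummyAge == logitData$dummy) # TRUE only if every element matches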
In this data set, we first want to predict chd with our dummy variable.
logMod1 <- glm(chd ~ dummy, data = logitData, family = binomial(link = "logit"))
summary(logMod1)
##
## Call:
## glm(formula = chd ~ dummy, family = binomial(link = "logit"),
## data = logitData)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.6481 -0.7787 -0.7787 0.7710 1.6378
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.0380 0.2822 -3.678 0.000235 ***
## dummy 2.0989 0.4788 4.384 1.17e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 136.66 on 99 degrees of freedom
## Residual deviance: 114.61 on 98 degrees of freedom
## AIC: 118.61
##
## Number of Fisher Scoring iterations: 4
# Exponentiate coefficients to get Odds and Odds Ratios
exp(logMod1$coefficients)
## (Intercept) dummy
## 0.3541667 8.1568627
# Obtain and exponentiate confidence intervals to get Odds and Odds Ratios
confint(logMod1)
## Waiting for profiling to be done...
## 2.5 % 97.5 %
## (Intercept) -1.619470 -0.5055285
## dummy 1.193538 3.0827738
exp(confint(logMod1))
## Waiting for profiling to be done...
## 2.5 % 97.5 %
## (Intercept) 0.1980036 0.6031867
## dummy 3.2987321 21.8188390
# Obtain pseudo-R-squared values from the BaylorEdPsych package
PseudoR2(logMod1)
## McFadden Adj.McFadden Cox.Snell Nagelkerke
## 0.1613743 0.1174706 0.1979135 0.2656432
## McKelvey.Zavoina Effron Count Adj.Count
## 0.2334979 0.2150322 0.7400000 0.3953488
## AIC Corrected.AIC
## 118.6090963 118.7328076
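As a sanity check, McFadden's pseudo-R-squared is just one minus the ratio of the residual and null deviances reported in summary(), so you can reproduce it by hand:

1 - (logMod1$deviance / logMod1$null.deviance) # McFadden's pseudo-R2, ~.161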
For the second analysis, we want to predict chd with mean-centered age.
logMod2 <- glm(chd ~ cage, data = logitData, family = binomial(link = "logit"))
summary(logMod2)
##
## Call:
## glm(formula = chd ~ cage, family = binomial(link = "logit"),
## data = logitData)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.9718 -0.8456 -0.4576 0.8253 2.2859
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.38677 0.23972 -1.613 0.107
## cage 0.11092 0.02406 4.610 4.02e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 136.66 on 99 degrees of freedom
## Residual deviance: 107.35 on 98 degrees of freedom
## AIC: 111.35
##
## Number of Fisher Scoring iterations: 4
# Exponentiate coefficients to get Odds and Odds Ratios
exp(logMod2$coefficients)
## (Intercept) cage
## 0.6792452 1.1173068
# Obtain and exponentiate confidence intervals to get Odds and Odds Ratios
confint(logMod2)
## Waiting for profiling to be done...
## 2.5 % 97.5 %
## (Intercept) -0.87154824 0.07534233
## cage 0.06693158 0.16200672
exp(confint(logMod2))
## Waiting for profiling to be done...
## 2.5 % 97.5 %
## (Intercept) 0.4183034 1.078253
## cage 1.0692223 1.175868
# Obtain pseudo-R-squared values from the BaylorEdPsych package
PseudoR2(logMod2)
## McFadden Adj.McFadden Cox.Snell Nagelkerke
## 0.2144684 0.1705648 0.2540516 0.3409928
## McKelvey.Zavoina Effron Count Adj.Count
## 0.3371659 0.2725374 0.7400000 0.3953488
## AIC Corrected.AIC
## 111.3530927 111.4768040
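To turn a logit back into a predicted probability by hand, apply the inverse-logit function. For example, because cage is mean-centered, the intercept alone gives the predicted probability of chd at the mean age:

plogis(coef(logMod2)["(Intercept)"]) # same as exp(b0) / (1 + exp(b0)); ~.40 here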
Again, the sjPlot package makes it easy to get prediction lines.
sjp.glm(logMod2, type = "pred", vars = "cage")
You can also create a nice summary table using sjt.glm().
sjt.glm(logMod2)
              chd
              Odds Ratio   CI            p
(Intercept)   0.68         0.42 – 1.08   .107
cage          1.12         1.07 – 1.18   <.001
Observations  100
These are advanced techniques, but this is just to demonstrate the ease of doing these types of analyses in R. First, I’m going to show you how to conduct the mediation analysis from your second homework assignment using the lavaan package.
ptsd <- read.spss("PTSD on warstr data with blood pressure.sav", to.data.frame = T)
head(ptsd)
## ID PTSD ANXTOT WARSTR BLOODPRE
## 1 1 19.00 21 7 98.8
## 2 2 15.31 9 6 105.0
## 3 3 30.00 11 9 113.0
## 4 4 5.00 15 12 125.0
## 5 5 19.00 3 4 88.0
## 6 6 18.00 1 6 100.0
From these data, we want to test whether PTSD mediates the relationship between War Stress and Blood Pressure, controlling for the influence of Anxiety on PTSD. With SEM, we can estimate everything, including the indirect effect, in a single step.
# First, create your model
mediate <- '
# variables must be spelled/capitalized correctly
# letters joined to variables with an asterisk are labels for the specified path
# Y on X and M
BLOODPRE ~ c*WARSTR + b*PTSD
# M on X (controlling for covariate)
PTSD ~ a*WARSTR + ANXTOT
# We can define new parameters (the indirect and total effects) from the labels above
# Indirect Effect
ab := a*b
# Total Effect
total := c + (a*b)
'
# Fit the model using bootstrap standard errors
fit <- sem(mediate, data = ptsd, se = "bootstrap")
summary(fit, ci = T)
## lavaan (0.5-23.1097) converged normally after 25 iterations
##
## Used Total
## Number of observations 151 152
##
## Estimator ML
## Minimum Function Test Statistic 0.096
## Degrees of freedom 1
## P-value (Chi-square) 0.757
##
## Parameter Estimates:
##
## Information Observed
## Standard Errors Bootstrap
## Number of requested bootstrap draws 1000
## Number of successful bootstrap draws 1000
##
## Regressions:
## Estimate Std.Err z-value P(>|z|) ci.lower ci.upper
## BLOODPRE ~
## WARSTR (c) 1.088 0.552 1.971 0.049 -0.092 2.123
## PTSD (b) 0.421 0.132 3.192 0.001 0.142 0.685
## PTSD ~
## WARSTR (a) 0.822 0.288 2.856 0.004 0.265 1.427
## ANXTOT 0.709 0.109 6.503 0.000 0.487 0.920
##
## Variances:
## Estimate Std.Err z-value P(>|z|) ci.lower ci.upper
## .BLOODPRE 172.194 24.518 7.023 0.000 126.288 221.636
## .PTSD 61.697 8.404 7.341 0.000 45.693 78.169
##
## Defined Parameters:
## Estimate Std.Err z-value P(>|z|) ci.lower ci.upper
## ab 0.346 0.159 2.179 0.029 0.082 0.677
## total 1.433 0.496 2.892 0.004 0.384 2.344
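If you would rather have the estimates in a data frame, or want bias-corrected bootstrap intervals for the indirect effect instead of the intervals above, lavaan's parameterEstimates() provides both:

parameterEstimates(fit, boot.ci.type = "bca.simple") # bias-corrected bootstrap CIs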
You can also get R to draw a path diagram for a fitted model using semPaths(), from the semPlot package. It isn't super polished (you're often better off using PowerPoint), but it can still give you a functional path diagram.
# rotation = 2 puts all exogenous variables on the left side
# edge.label.cex changes the font size (default = 0.8, which is rather small)
semPaths(fit, whatLabels = "par", rotation = 2, edge.label.cex = 1)
## Warning in qgraph(Edgelist, labels = nLab, bidirectional = Bidir, directed
## = Directed, : The following arguments are not documented and likely not
## arguments of qgraph and thus ignored: loopRotation; residuals; residScale;
## residEdge; CircleEdgeEnd