I was tagged today on twitter asking about categorical variables in `lavaan`

. I will say I have not done much with categorical predictors either endogenous or exogenous. I did a quick reproducible example of exogenous variables, and I will refer you to the help guide for `lavaan`

here.

You will need both the `lavaan`

and `psych`

packages to reproduce this code. Ironically, this data is binary outcome data (the `epi`

dataset in `psych`

), which wasn't intentional, I just knew it was a good dataset to work with to test how to do exogenous categorical variables.

First, let's make a model that works (I do assume you know a bit about `lavaan`

here, feel free to ask questions):

```
#load libraries and data
library(psych)
library(lavaan)
DF = epi
#lavaan model syntax
epi.model = 'latent =~ V1+V2+V3+V4
latent2 =~ V5+V6+V7+V8'
#analyze the model
epi.fit = cfa(model = epi.model,
data = DF)
#show a summary
summary(epi.fit)
```

The `cfa`

and `summary`

did not throw any errors, so the model at least runs smoothly, even if it is not a “good” model. For good measure, you can also use `semPlot`

to create a picture of this two-factor model:

```
library(semPlot)
#semPaths with basic options
semPaths(epi.fit,
whatLabels = "std",
edge.label.cex = 1)
```

Next, I created a fake dummy coded variable with three levels, although you could scale this easily with more levels:

```
DF$category = c(rep("group", nrow(epi)/3),
rep("group2", nrow(epi)/3),
rep("group3", nrow(epi)/3))
DF$category = as.factor(DF$category)
```

When I tried to run a new model with the `category`

variable, `lavaan`

was not happy:

```
Warning message:
In lav_data_full(data = data, group = group, cluster = cluster, :
lavaan WARNING: unordered factor(s) with more than 2 levels detected in data: category
```

Fine, let's dummy code them with the gloriously easy `dummy.code`

function in `psych`

:

```
#dummy code and combine with DF
DF_dc = cbind(DF, dummy.code(DF$category))
```

However, I will warn you that `psych`

does give you K columns where K = levels. Real dummy coding is K - 1 columns, so I find it odd that `psych`

gives you K output. For example, it took our `group`

, `group2`

, `group3`

labels and transformed them into three new columns with 0 as *not my group* and 1 as *my group*. Therefore, I will advise you to pick your favorite combination of K - 1 levels, and do not use all of them or you will create a singular matrix that will be difficult to troubleshoot in any regression based analysis. Here's an example of that error:

```
Error in lav_samplestats_icov(COV = cov[[g]], ridge = ridge, x.idx = x.idx[[g]], :
lavaan ERROR: sample covariance matrix is not positive-definite
```

I can add the first two to the model predicting one of the latents using `~`

for regression rather than `=~`

for create a latent:

```
#model syntax
epi.model2 = 'latent =~ V1+V2+V3+V4
latent2 =~ V5+V6+V7+V8
latent ~ group
latent ~ group2'
#analyze the model with the new DF
epi.fit2 = cfa(model = epi.model2,
data = DF_dc)
#summarize the model
summary(epi.fit2)
```

In your output, you will get two new lines for regression:

```
Regressions:
Estimate Std.Err z-value P(>|z|)
latent ~
group 0.026 0.014 1.847 0.065
group2 -0.001 0.014 -0.082 0.935
```

The interpretation here would be that group = group 1 versus group 3 was related to/predicted `latent`

at 0.026, so the difference in `latent`

for group 1 to group 3 was 0.026. The second variable would be group2 = group 2 versus group 3, and they basically have no difference on `latent`

. You can learn more about dummy coding here.

Here's the picture of that analysis:

```
semPaths(epi.fit,
whatLabels = "std",
edge.label.cex = 1)
```

Remember that any endogenous variables will get automatically correlated … so now we have a second latent variable hanging out in space we would want to either predict with our dummy coded variables or do something with. So, I would probably either add the correlation between latents back in with: `latent ~~ latent2`

or add in the regressions for using the categoricals to predict latent2: `latent2 ~ group`

and `latent2 ~ group2`

.

More lavaan help can be found on my youtube channel!.