Confirmatory factor analysis in R

Confirmatory factor analysis. As discussed above (background section), to begin a confirmatory factor analysis, the researcher should have a model in mind. The model chi-square is a meaningful test only when you have an over-identified model (i.e., there are still degrees of freedom left over after accounting for all the free parameters in your model). An absolute fit index, on the other hand, does not compare the user model against a baseline model, but instead compares it to the observed data. By the end of this training, you should understand enough of these concepts to run your own confirmatory factor analysis in lavaan. The variance standardization method assumes that the residual variance of the two first-order factors is one, which means that you assume homogeneous residual variance. As a simple analogy, suppose you have a data set with observed outcomes $y = 13, 14, 15$. The mean is the parameter $\mu$, and the estimate of this parameter is called “mu-hat”, denoted $\hat{\mu}=\bar{y}=\frac{1}{n}\sum y_i$; here $\hat{\mu}=(13+14+15)/3=14$.

Table of Contents: Data Input; Confirmatory Factor Analysis Using lavaan: Factor variance identification; Model Comparison Using lavaan; Calculating Cronbach’s Alpha Using psych. Made for Jonathan Butner’s Structural Equation Modeling Class, Fall 2017, University of Utah.

Finally, pass this object into summary but specify fit.measures=TRUE to obtain additional fit measures and standardized=TRUE to obtain both Std.lv and Std.all solutions. Can you think of other ways? In this case, you perform factor analysis first and then develop a general idea … The second argument is the dataset that contains the observed variables.
The measurement equation for the third item is

$$y_3 = \tau_3 + \lambda_{3}\eta_{1} + \epsilon_{3}$$

(Answer: 10.) The number of free parameters is defined as

$$\mbox{number of free parameters} = \mbox{number of (unique) model parameters } - \mbox{number of fixed parameters}.$$

How many free parameters have we obtained after fixing 10 (unique) model parameters? The model-implied covariance matrix is

$$\Sigma(\theta) = \mathbf{\Lambda} \Psi \mathbf{\Lambda}' + \Theta_{\epsilon}, \qquad \Theta_{\epsilon} = \begin{pmatrix} \theta_{11} & \theta_{12} & \theta_{13} \\ \theta_{21} & \theta_{22} & \theta_{23} \\ \theta_{31} & \theta_{32} & \theta_{33} \end{pmatrix}.$$

To make sure you fit an equivalent method, though, the degrees of freedom for the User model must be the same. Factors are correlated (it is conceptually useful to have correlated factors). Suppose the principal investigator thinks that the third, fourth and fifth items of the SAQ are the observed indicators of SPSS Anxiety. The parameters coming from the model are called model parameters. David Kenny states that if the CFI is less than one, then the CFI is always greater than the TLI. Then pass this object into the wrapper function cfa and store the lavaan-method object into onefac8items, but specify std.lv=TRUE to automatically use variance standardization. This chapter will cover conducting CFAs with the sem package. The option to.data.frame ensures the data imported is a data frame and not an R list, and use.value.labels = FALSE converts categorical variables to numeric values rather than factors. The more similar the deviation from the baseline model, the closer the ratio to one. Circles represent latent variables, squares represent observed indicators, triangles represent intercepts or means, one-way arrows represent paths, and two-way arrows represent either variances or covariances. Since we have 6 known values and 6 free parameters, our degrees of freedom are $6-6=0$, which defines a saturated model.
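The counting rule above can be checked with a few lines of base R. This is a minimal sketch for the three-item one-factor model under the marker method, assuming (as the text does) that known values come only from the covariance matrix. The helper names known_values and free_parameters are hypothetical and are not part of lavaan.

```r
# Hypothetical helpers (not part of lavaan) for counting degrees of freedom.

# Known values: unique elements of a p x p variance-covariance matrix.
known_values <- function(p) p * (p + 1) / 2

# Free parameters: unique model parameters minus fixed parameters.
free_parameters <- function(n_model, n_fixed) n_model - n_fixed

p       <- 3        # three items
n_model <- 10       # 3 loadings + 1 factor variance + 6 unique residual (co)variances
n_fixed <- 1 + 3    # one marker loading + three residual covariances set to zero

known <- known_values(p)
free  <- free_parameters(n_model, n_fixed)
df    <- known - free   # zero degrees of freedom means just-identified (saturated)
```

A positive df would indicate an over-identified model and a negative df an under-identified one.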
Recall from the CFI that $\delta=\chi^2 - df$, where $df$ is the degrees of freedom for that particular model. Our sample of $n=2,571$ is considered relatively large, hence our conclusion may be supplemented with other fit indices. In order to understand the model, we have to understand endogenous and exogenous factors. Recall that in the model-implied covariance matrix we have the model parameters $\Lambda$, $\Psi$ and $\Theta_{\epsilon}$. To begin, first count the number of known values in your observed population variance-covariance matrix $\Sigma$, given by the formula $p(p+1)/2$ where $p$ is the number of items in your survey. The observed population covariance matrix $\Sigma$ is a matrix of bivariate covariances that determines how many total parameters can be estimated in the model. Explain why fixing $\lambda_1=1$ and setting the unique residual covariances to zero (e.g., $\theta_{12}=\theta_{21}=0$, $\theta_{13}=\theta_{31}=0$, and $\theta_{23}=\theta_{32}=0$) results in a just-identified model. This seminar will show you how to perform a confirmatory factor analysis using lavaan in the R statistical programming language. The first line is the model statement. Given the eight-item one-factor model:

$$TLI= \frac{4164.572/28-554.191/20}{4164.572/28-1} =0.819.$$

We can confirm our answers for both the TLI and CFI, which are reported together in lavaan. Recall that the magnitude of a correlation, $|r|$, is the absolute value of the correlation. The off-diagonal cells in $S$ correspond to bivariate sample covariances between pairs of items, and the diagonal cells in $S$ correspond to the sample variance of each item (hence the term “variance-covariance matrix”). Notice that the number of free parameters is now 9 instead of 6; however, our degrees of freedom are still zero. Note that the loadings $\lambda$ are the same parameters shared between the measurement model and the model-implied covariance model.
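The TLI arithmetic quoted above can be reproduced in base R. This is a small sketch using only the chi-square values and degrees of freedom stated in the text for the eight-item one-factor model; the variable names are illustrative.

```r
# Values quoted in the text for the eight-item one-factor model.
chisq_baseline <- 4164.572; df_baseline <- 28
chisq_user     <- 554.191;  df_user     <- 20

# Relative chi-square (chi-square divided by its degrees of freedom).
rel_baseline <- chisq_baseline / df_baseline
rel_user     <- chisq_user / df_user

# TLI: drop in relative chi-square from baseline to user model,
# scaled by the distance of the baseline from its expected value of 1.
tli <- (rel_baseline - rel_user) / (rel_baseline - 1)
round(tli, 3)   # 0.819
```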
This means that $\theta$ is composed of the parameters $\Lambda, \Psi, \Theta_{\epsilon}$, which correspond to the loadings, the covariances of the latent variables and the covariances of the residual errors. A perfectly fitting model generates a TLI equal to 1. Recall that =~ represents the indicator equation, where the latent variable is on the left and the indicators (or observed variables) are to the right of the symbol. The syntax NA*f1 means to free the first loading, because by default the marker method fixes the loading to 1, and equal("f3=~f1")*f2 fixes the loading of the second factor on the third to be the same as that of the first factor. The concept of degrees of freedom is essential in CFA. An incremental fit index (a.k.a. relative fit index) compares the user model against a baseline model. Why do we care so much about the variance-covariance matrix of the items? Even though this is an SPSS file, R can translate this file directly to an R object through the function read.spss via the library foreign. Examples of incremental fit indexes are the CFI and TLI. The specification cov.ov stands for “observed covariance”. Typically, rejecting the null hypothesis is a good thing, but if we reject the CFA null hypothesis then we would reject our user model (which is bad). CFA belongs to the family of structural equation modeling techniques that allow for the investigation of causal relations among latent and observed variables in a priori specified, theory-derived models. It is used to test whether measures of a construct are consistent with a researcher's understanding of the nature of that construct (or factor). With the full data, the total number of model parameters is calculated accordingly:

$$ \mbox{number of model parameters} = \mbox{intercepts from the measurement model} + \mbox{ unique parameters in the model-implied covariance}$$

E.g., five-factor uncorrelated; five-factor correlated.
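The workflow described in the text (read the SPSS file, specify the model with =~, fit with cfa, inspect with summary) might look as follows. This is a sketch under stated assumptions, not the seminar's own script: it assumes the lavaan and foreign packages are installed, that SAQ.sav has been saved in the working directory, and that the items are named q03, q04 and q05 (the text mentions q03; the other two names are assumed by analogy). The object names model_3items, fit_marker and fit_stdlv are illustrative.

```r
# Model statement: latent factor f on the left of =~, indicators on the right.
# Item names q04 and q05 are assumed by analogy with q03 in the text.
model_3items <- ' f =~ q03 + q04 + q05 '

# The fit only runs when both packages and the data file are available.
if (requireNamespace("lavaan", quietly = TRUE) &&
    requireNamespace("foreign", quietly = TRUE) &&
    file.exists("SAQ.sav")) {          # assumed local path to the SPSS file
  library(lavaan)
  library(foreign)

  # Import as a data frame with numeric codes rather than factors.
  dat <- read.spss("SAQ.sav", to.data.frame = TRUE, use.value.labels = FALSE)

  # Marker method (lavaan default): first loading fixed to 1.
  fit_marker <- cfa(model_3items, data = dat)

  # Variance standardization method: factor variance fixed to 1.
  fit_stdlv <- cfa(model_3items, data = dat, std.lv = TRUE)

  # Fit measures plus Std.lv and Std.all columns.
  print(summary(fit_stdlv, fit.measures = TRUE, standardized = TRUE))
}
```

Both fits should report the same degrees of freedom and chi-square, since changing the identification method does not change model fit.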
Recall that the model-implied covariance matrix is defined as

$$\Sigma(\theta) = \mathbf{\Lambda} \Psi \mathbf{\Lambda}' + \Theta_{\epsilon}.$$

The formula for the CFI is:

$$CFI= \frac{\delta(\mbox{Baseline}) - \delta(\mbox{User})}{\delta(\mbox{Baseline})} $$

The root mean square error of approximation is an absolute measure of fit because it does not compare the discrepancy of the user model relative to a baseline model like the CFI or TLI. To request additional fit statistics, add the fit.measures=TRUE option to summary, passing in the lavaan object onefac8items_a. What would be the acceptable range of chi-square values based on the criterion that a relative chi-square greater than 2 indicates poor fit? Though several books have documented how to perform factor analysis using R (e.g., Beaujean 2014; Finch and French 2015), procedures for conducting a MCFA are not readily available and as of yet are not built into lavaan. The term used in the TLI is the relative chi-square, $\chi^2/df$. This is the confirmatory way of factor analysis, where the process is run to confirm an existing understanding of the data. Before beginning the seminar, please make sure you have R and RStudio installed. Then the only green paths are $\lambda$ and $\tau$; among the blue paths, $\lambda$ is again estimated, as well as $\theta$ and $\psi$. We proceed with a correlated two-factor CFA. In order to identify a factor in a CFA model with three or more items, there are two options, known respectively as the marker method and the variance standardization method. See the optional section Degrees of freedom with means for the more technically accurate explanation.
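The CFI formula can be evaluated with the same numbers this document quotes for the eight-item one-factor model (baseline chi-square 4164.572 on 28 degrees of freedom, user chi-square 554.191 on 20 degrees of freedom). A minimal base R sketch, with illustrative variable names:

```r
# Values quoted in this document for the eight-item one-factor model.
chisq_baseline <- 4164.572; df_baseline <- 28
chisq_user     <- 554.191;  df_user     <- 20

# delta = chi-square minus degrees of freedom, computed per model.
delta_baseline <- chisq_baseline - df_baseline
delta_user     <- chisq_user - df_user

# CFI: proportional reduction in delta relative to the baseline model.
cfi <- (delta_baseline - delta_user) / delta_baseline
round(cfi, 3)   # 0.871
```

The result is below 1 and larger than the TLI of 0.819 for the same model, in line with the Kenny remark about CFI and TLI.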
There are three main differences between the factor analysis model and linear regression. We can represent this multivariate model (i.e., multiple outcomes, items, or indicators) as a matrix equation:

$$\begin{pmatrix} y_1 \\ y_2 \\ y_3 \end{pmatrix} = \begin{pmatrix} \tau_1 \\ \tau_2 \\ \tau_3 \end{pmatrix} + \begin{pmatrix} \lambda_{1} \\ \lambda_{2} \\ \lambda_{3} \end{pmatrix} \eta_{1} + \begin{pmatrix} \epsilon_{1} \\ \epsilon_{2} \\ \epsilon_{3} \end{pmatrix}$$

In matrix notation, the variance standardization method (Option 2) looks like

$$\Sigma(\theta) = (1) \begin{pmatrix} \lambda_{1} \\ \lambda_{2} \\ \lambda_{3} \end{pmatrix} \begin{pmatrix} \lambda_{1} & \lambda_{2} & \lambda_{3} \end{pmatrix} + \begin{pmatrix} \theta_{11} & 0 & 0 \\ 0 & \theta_{22} & 0 \\ 0 & 0 & \theta_{33} \end{pmatrix}$$

The marker method assumes that both loadings from the second-order factor to the first-order factors are 1. NOTE: changing the standardization method should not change the degrees of freedom and chi-square value. Since we fix one loading and 3 unique residual covariances, the number of free parameters is $10-(1+3)=6$. The fa function in the psych package includes five methods of factor analysis (minimum residual, principal axis, weighted least squares, generalized least squares and maximum likelihood factor analysis). The first eight items consist of the following (note the actual items have been modified slightly from the original data set). Throughout the seminar we will use the terms items and indicators interchangeably, with the latter emphasizing the relationship of these items to a latent variable. The figure below represents the same model above as a path diagram. So how big of a sample do we need? A more common approach is to understand the data using factor analysis.
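To make the model-implied covariance matrix $\Sigma(\theta) = \Lambda \Psi \Lambda' + \Theta_{\epsilon}$ concrete, the following base R sketch builds it for a three-item one-factor model under the marker method. All numeric parameter values are hypothetical and chosen only for illustration; they are not estimates from the SAQ data.

```r
# Hypothetical parameter values, for illustration only.
Lambda <- matrix(c(1.0, 0.8, 0.6), ncol = 1)  # loadings; marker method fixes lambda_1 = 1
Psi    <- matrix(0.5, nrow = 1, ncol = 1)     # factor variance psi_11
Theta  <- diag(c(0.4, 0.3, 0.2))              # residual variances; residual covariances fixed to 0

# Model-implied covariance matrix: Lambda Psi Lambda' + Theta.
Sigma_theta <- Lambda %*% Psi %*% t(Lambda) + Theta
Sigma_theta
```

Each off-diagonal element equals $\lambda_i \lambda_j \psi_{11}$, and each diagonal element equals $\lambda_i^2 \psi_{11} + \theta_{ii}$, which is why the loadings appear in both the measurement model and the covariance model.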
Among the SAQ items are “I dream that Pearson is attacking me with correlation coefficients”, “Computers are useful only for playing games”, and Item 6, “My friends are better at statistics than me”.

Outline: A Practical Introduction to Factor Analysis: Exploratory Factor Analysis; Motivating example SPSS Anxiety Questionnaire; Known values, parameters, and degrees of freedom; Identification of a three-item one factor CFA; (Optional) How to manually obtain the standardized solution; One factor CFA with more than three items (SAQ-8); (Optional) Model test of the baseline or null model; (Optional) Warning message with second-order CFA; Inspect or extract information from a fitted lavaan object.

Since we picked Option 1, we set the loadings to be equal to each other. We know the factors are uncorrelated because the estimate of f1 ~~ f2 is zero under Covariances, which is what we expect. Model chi-square is sensitive to large sample sizes, but does that mean we stick with small samples? Approximate fit indexes can be further classified into a) absolute and b) incremental or relative fit indexes. As you can see in the path diagram below, there are in fact five free parameters: two residual variances $\theta_1, \theta_2$, two loadings $\lambda_1, \lambda_2$ and a factor variance $\psi_{11}$. Before we move on, let’s understand the confirmatory factor analysis model. For simplicity, let’s assume that the known values come only from the model-implied covariance matrix. The closer the CFI is to 1, the better the fit of the model, with the maximum being 1. However, if the theory is that the correlation between these two constructs is caused by a third factor, then these two first-order factors can serve as latent indicators of the underlying second-order factor.
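As a worked count for the two-item one-factor case mentioned above (five free parameters), and assuming the known values come only from the $2 \times 2$ covariance matrix, the degrees of freedom work out as follows:

```latex
\begin{aligned}
\text{known values} &= \frac{p(p+1)}{2} = \frac{2(3)}{2} = 3, \\
\text{free parameters} &= \underbrace{2}_{\lambda_1,\,\lambda_2}
  + \underbrace{2}_{\theta_1,\,\theta_2}
  + \underbrace{1}_{\psi_{11}} = 5, \\
df &= 3 - 5 = -2 < 0 .
\end{aligned}
```

With negative degrees of freedom the model is under-identified, so at least two parameters must be fixed or constrained before it can be estimated.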
The basic assumption of factor analysis is that for a collection of observed variables there is a set of underlying factors (fewer than the observed variables, i.e., the $\eta$s) that can explain the interrelationships among those variables. Exploratory Factor Analysis (EFA), roughly known as factor analysis in R, is a statistical technique used to identify the latent relational structure among a set of variables and narrow them down to a smaller number of variables. The first step involves the procedure that defines constructs theoretically. In this portion of the seminar, we will continue with the example of the SAQ. Alternatively, you can request a more condensed output of the standardized solution; note that this output reports only Std.all. For those readers who are more mathematically inclined, the appendix adds additional details. Recall that we already know how to manually derive Std.lv parameter estimates, as this corresponds to the variance standardization method. The acceptable chi-square values range between 20 (indicating perfect fit) and 40, since 40/20 = 2. From talking to the Principal Investigator, we decide to use only Items 1, 3, 4, 5, and 8 as indicators of SPSS Anxiety and Items 6 and 7 as indicators of Attribution Bias. Similarly, for a single item, the factor analysis model is:

$$y_{1} = \tau_1 + \lambda_1 \eta + \epsilon_{1} $$

Confirmatory factor analysis borrows many of the same concepts from exploratory factor analysis except that instead of letting the data tell us the factor structure, we pre-determine the factor structure and verify the psychometric structure of a previously de… Answer: False, the residual covariance uses sample estimates $S-\Sigma(\hat{\theta})$.
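The range of 20 to 40 quoted above follows from the relative chi-square criterion. As a sketch of the reasoning, assuming the eight-item one-factor model under the marker method (7 free loadings, 8 residual variances and 1 factor variance, with known values taken only from the covariance matrix):

```latex
\begin{aligned}
df &= \frac{8(9)}{2} - (7 + 8 + 1) = 36 - 16 = 20, \\
\frac{\chi^2}{df} \le 2 \;&\Longrightarrow\; \chi^2 \le 2 \times 20 = 40, \\
\frac{\chi^2}{df} = 1 \;&\Longrightarrow\; \chi^2 = 20 \quad \text{(perfect fit)} .
\end{aligned}
```

The lower bound reflects the fact that a chi-square statistic equal to its degrees of freedom gives a relative chi-square of exactly 1.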
An under-identified model means that the number of known values is less than the number of free parameters, and an over-identified model means that the number of known values is greater than the number of free parameters. Note that $\Sigma -\Sigma{(\theta)}=0$ is always true under the null hypothesis. Just as in our exploratory factor analysis, our Principal Investigator would like to evaluate the psychometric properties of our proposed 8-item SPSS Anxiety Questionnaire “SAQ-8”, proposed as a shortened version of the original SAQ in order to shorten the time commitment for participants while maintaining internal consistency and validity.
A sample size of less than 100 is almost always untenable according to Kline. The SAQ data, collected from 2,571 subjects so far, can be downloaded as an SPSS file: SAQ.sav. In a correlation table, the diagonal elements are always one because an item is always perfectly correlated with itself. In linear regression there is only one outcome per subject, whereas in factor analysis there are as many outcomes per subject as there are items. By default, lavaan uses the marker method (Option 1) if nothing else is specified. Failing to reject the null hypothesis is like a jury that has failed to prove the criminal guilty, which doesn’t necessarily mean he is innocent. Because the TLI and CFI are highly correlated, only one of the two should be reported. In lavaan, the syntax q03 ~ 1 means that the intercept for item 3 is estimated.