New Insights into PCA + Varimax for Psychological Researchers

A short commentary on Rohe & Zeng (2023)

Authors and affiliations

Florian Pargent (Department of Psychology, LMU Munich)

David Goretzko (Utrecht University)

Timo von Oertzen (Bundeswehr University Munich and Max Planck Institute for Human Development)

Published: March 26, 2024

Important

This document is an updated copy of a published commentary, to showcase Quarto manuscripts in our Quarto workshop. The official online repository of our published commentary can be found here.

Commentary

As psychologists, we appreciate Rohe & Zeng’s (R&Z; Rohe & Zeng, 2023) new insights into “vintage” principal component analysis with varimax rotation (PCA+VR). Theories of intelligence and personality, perhaps psychology’s best-known contributions outside our field, are a direct product of PCA. PCA+VR is still widely used to develop and evaluate psychological tests and questionnaires, although the methodological literature has argued against it in favor of more complex factor-analytic techniques (Fokkema & Greiff, 2017).

In our opinion, abandoning the simpler PCA(+VR) is a mistake. R&Z refute a common argument against it by proving that PCA+VR can perform statistical inference in latent variable models. The factor indeterminacy problem that has plagued VR since its invention applies only in the special case of normally distributed factors. For any other distribution, the factors are not perfectly indeterminate, although identifiability may be weak. Distributions that produce sparse components, however, satisfy a sufficient leptokurtic condition, which can be checked with simple diagnostics.
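The following toy illustration shows why normally distributed factors are indeterminate under rotation while leptokurtic ones are not. It is our own sketch, not R&Z’s formal condition or their diagnostics, and it uses sample excess kurtosis only as a rough proxy for the leptokurtic condition. Rotating two independent standard normal components by 45 degrees yields components that are again independent standard normals, so no rotation can be preferred over any other. Rotating two independent Laplace components mixes them, and the mixture is less leptokurtic than either original component. A criterion such as varimax, which roughly rewards fourth moments, can therefore single out the original axes. The helper names (`excess_kurtosis`, `rotate`, `laplace`) are illustrative only.

```python
import math
import random


def excess_kurtosis(x):
    """Sample excess kurtosis m4 / m2**2 - 3 (close to 0 for normal data)."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    return m4 / m2**2 - 3.0


def rotate(z1, z2, theta):
    """Rotate each pair of scores (z1[i], z2[i]) by the angle theta."""
    c, s = math.cos(theta), math.sin(theta)
    r1 = [c * a - s * b for a, b in zip(z1, z2)]
    r2 = [s * a + c * b for a, b in zip(z1, z2)]
    return r1, r2


def laplace(rng):
    # The difference of two independent Exp(1) draws is Laplace(0, 1),
    # a leptokurtic distribution with excess kurtosis 3.
    return rng.expovariate(1.0) - rng.expovariate(1.0)


rng = random.Random(2024)
n = 50_000
samplers = {
    "normal": lambda: rng.gauss(0.0, 1.0),
    "laplace": lambda: laplace(rng),
}

results = {}
for name, draw in samplers.items():
    z1 = [draw() for _ in range(n)]
    z2 = [draw() for _ in range(n)]
    r1, _ = rotate(z1, z2, math.pi / 4)  # a 45-degree rotation mixes z1 and z2
    results[name] = (excess_kurtosis(z1), excess_kurtosis(r1))
    before, after = results[name]
    print(f"{name:8s} before: {before:5.2f}  after: {after:5.2f}")
```

In theory, the Laplace component has excess kurtosis 3 before the rotation and 1.5 after it, while the normal component stays at 0 either way. The sample values scatter around these targets.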

Because R&Z’s results are technically demanding, we relate them to psychological applications. R&Z’s examples deal only with sparse binary network data. In typical psychological applications, by contrast, the data matrix \(A\) contains the responses of \(n\) persons to \(d\) items, which are binary (e.g., intelligence tests), integer-valued (e.g., personality questionnaires), or continuous (e.g., digital sensors). Psychologists are often interested in whether (i) the items can be structured in a simple way to represent a small number of meaningful components, and (ii) those components can be interpreted as psychological constructs that describe interindividual differences. R&Z show that “radial streaks” in the rotated loading matrix \(\hat{Y}\) suggest that item loadings are identified and can be estimated from the data with PCA+VR. Similarly, streaks in the component matrix \(\hat{Z}\) suggest that person scores can be estimated.
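The PCA+VR estimator can be sketched in a few lines of NumPy. The sketch follows our reading of R&Z’s notation: \(A \approx \hat{Z}\hat{B}\hat{Y}^\top\) with \(\hat{Z} = \sqrt{n}\,U R_U\), \(\hat{Y} = \sqrt{d}\,V R_V\), and \(\hat{B} = R_U^\top D R_V / \sqrt{nd}\), where \(U D V^\top\) is the rank-\(k\) SVD of \(A\). It uses a standard SVD-based varimax iteration. The functions `varimax` and `pca_varimax` are our own illustrative names, not the API of R&Z’s software. The sketch also leaves any centering or normalization of \(A\) to the user, and the simulated data are purely hypothetical.

```python
import numpy as np


def varimax(Phi, gamma=1.0, max_iter=100, tol=1e-6):
    """Return an orthogonal k x k matrix R that (locally) maximizes the
    varimax criterion of Phi @ R (Kaiser's varimax when gamma = 1)."""
    p, k = Phi.shape
    R = np.eye(k)
    progress = 0.0
    for _ in range(max_iter):
        progress_old = progress
        L = Phi @ R
        # Gradient of the criterion, projected back onto orthogonal matrices.
        G = Phi.T @ (L**3 - (gamma / p) * L @ np.diag(np.sum(L**2, axis=0)))
        u, s, vt = np.linalg.svd(G)
        R = u @ vt
        progress = np.sum(s)
        if progress_old != 0 and progress / progress_old < 1 + tol:
            break
    return R


def pca_varimax(A, k):
    """Rank-k PCA followed by varimax on both singular-vector matrices.
    Z_hat @ B_hat @ Y_hat.T reproduces the rank-k SVD approximation of A."""
    n, d = A.shape
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    U, s, V = U[:, :k], s[:k], Vt[:k].T
    R_U = varimax(np.sqrt(n) * U)
    R_V = varimax(np.sqrt(d) * V)
    Z_hat = np.sqrt(n) * U @ R_U      # person scores (n x k)
    Y_hat = np.sqrt(d) * V @ R_V      # item loadings (d x k)
    B_hat = R_U.T @ np.diag(s) @ R_V / np.sqrt(n * d)
    return Z_hat, B_hat, Y_hat


# Hypothetical toy data: 3 Laplace components, 30 items with simple structure.
rng = np.random.default_rng(0)
n, d, k = 500, 30, 3
Z = rng.laplace(size=(n, k))
Y = np.zeros((d, k))
Y[np.arange(d), np.arange(d) % k] = 1.0  # each item loads on one component
A = Z @ Y.T + rng.normal(scale=0.5, size=(n, d))

Z_hat, B_hat, Y_hat = pca_varimax(A, k)
```

To look for radial streaks, one would scatterplot pairs of columns of `Y_hat` or `Z_hat`. In this simulated example, each row of `Y_hat` should instead show one clearly dominant loading, which is the simple structure that streaks reflect.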

However, we question whether streaks are common in psychological data for both aspects. Test and questionnaire items are traditionally designed to measure a single construct, so the “simple structure” reflected by streaks in \(\hat{Y}\) might be expected. Psychological constructs, in contrast, are often conceptualized as roughly normally distributed, which makes streaks in \(\hat{Z}\) less likely. In our online materials (https://osf.io/5symf/), we analyze a dataset (Stachl et al., 2020) that contains both personality items (\(n = 687\), \(d = 300\)) and smartphone sensing variables (\(n = 624\), \(d = 1821\)). We found streaks only in \(\hat{Y}\), not in \(\hat{Z}\). The dataset is also a cautionary example: imputing missing values, combined with inappropriate data preprocessing, can produce apparent streaks in \(\hat{Z}\) that belong to uninterpretable components. Degree normalization, as discussed in R&Z, is not suitable for many psychological datasets. Other procedures, such as z-standardization, are often required to detect meaningful factors. Finally, we demonstrate R&Z’s side result that the matrix \(\hat{Z}\hat{B}\) from PCA+VR can estimate person scores simulated from oblique leptokurtic components.
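The two preprocessing options contrasted in the paragraph above can be sketched as follows. For degree normalization, we assume the common form for nonnegative data, \(D_r^{-1/2} A D_c^{-1/2}\), with row and column sums as degrees. The regularizer `tau` is our own hypothetical parameter, so please check R&Z for their exact variant. The data are random toy 5-point Likert responses, not the Stachl et al. (2020) data. The sketch highlights one possibly relevant property. Degree normalization does not center the items, so every entry of the normalized Likert matrix stays positive. By the Perron–Frobenius theorem, the leading singular vector then has entries of a single sign and tends to track overall response level rather than item content.

```python
import numpy as np


def degree_normalize(A, tau=0.0):
    """D_r^{-1/2} A D_c^{-1/2} with row/column sums as degrees.
    tau is a hypothetical regularizer added to every degree (0 = none)."""
    row_deg = A.sum(axis=1) + tau
    col_deg = A.sum(axis=0) + tau
    return A / np.sqrt(row_deg)[:, None] / np.sqrt(col_deg)[None, :]


def z_standardize(A):
    """Center each item (column) and scale it to unit sample variance."""
    return (A - A.mean(axis=0)) / A.std(axis=0, ddof=1)


rng = np.random.default_rng(1)
A = rng.integers(1, 6, size=(200, 10)).astype(float)  # toy Likert data, 1..5

L_deg = degree_normalize(A)
A_z = z_standardize(A)

# Leading left singular vectors of both preprocessed matrices.
u_deg = np.linalg.svd(L_deg, full_matrices=False)[0][:, 0]
u_z = np.linalg.svd(A_z, full_matrices=False)[0][:, 0]

for label, u in [("degree-normalized", u_deg), ("z-standardized", u_z)]:
    single_sign = bool(np.all(u > 0) or np.all(u < 0))
    print(f"{label}: leading vector has a single sign: {single_sign}")
```

Column z-standardization removes item means and scale differences before the SVD. This is one reason it can be the more natural default for questionnaire data.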

In our opinion, the main usefulness of PCA+VR does not necessarily stem from its ability to estimate latent variable models. PCA excels at providing meaningful descriptions in practical applications, but both R&Z’s examples and ours show that there is rarely a single definitive structure. Components are most useful when they predict other meaningful quantities, regardless of the assumed epistemological nature of psychological constructs (Yarkoni, 2020).

References

Fokkema, M., & Greiff, S. (2017). How Performing PCA and CFA on the Same Data Equals Trouble: Overfitting in the Assessment of Internal Structure and Some Editorial Thoughts on It. European Journal of Psychological Assessment, 33(6), 399–402. https://doi.org/10.1027/1015-5759/a000460
Rohe, K., & Zeng, M. (2023). Vintage factor analysis with Varimax performs statistical inference. Journal of the Royal Statistical Society Series B: Statistical Methodology, 85(4), 1037–1060. https://doi.org/10.1093/jrsssb/qkad029
Stachl, C., Au, Q., Schoedel, R., Gosling, S. D., Harari, G. M., Buschek, D., Völkel, S. T., Schuwerk, T., Oldemeier, M., Ullmann, T., Hussmann, H., Bischl, B., & Bühner, M. (2020). Predicting personality from patterns of behavior collected with smartphones. Proceedings of the National Academy of Sciences of the United States of America, 117(30), 17680–17687. https://doi.org/10.1073/pnas.1920484117
Yarkoni, T. (2020). Implicit Realism Impedes Progress in Psychology: Comment on Fried (2020). Psychological Inquiry, 31(4), 326–333. https://doi.org/10.1080/1047840X.2020.1853478

Citation

BibTeX citation:
@online{pargent2024,
  author = {Pargent, Florian and Goretzko, David and von Oertzen, Timo},
  title = {New {Insights} into {PCA} + {Varimax} for {Psychological}
    {Researchers}},
  date = {2024-03-26},
  url = {https://florianpargent.github.io/myquartomanuscript/},
  langid = {en}
}
For attribution, please cite this work as:
Pargent, F., Goretzko, D., & von Oertzen, T. (2024, March 26). New Insights into PCA + Varimax for Psychological Researchers. https://florianpargent.github.io/myquartomanuscript/