Not Yet: Large Language Models Cannot Replace Human Respondents for Psychometric Research

P. Wang, H. Zou, Z. Yan, F. Guo, T. Sun, Z. Xiao, B. Zhang, OSF Preprints. osf.io/rwy9b, 2024.

Keywords: Artificial Intelligence, Large Language Models, Personality, Psychometrics, Survey Methodology

Multiple studies have claimed that artificial intelligence (AI), particularly large language models (LLMs), can simulate human-like responses on various psychological tasks, such that AI may replace human respondents in social science studies. However, this claim may be premature because of limitations in the designs and evaluation metrics of previous studies. The present study aimed to provide a comprehensive evaluation of this claim, focusing on LLMs, by comparing six types of LLM-generated responses with human responses to the Big Five Inventory-2 (BFI-2) and the HEXACO-100 personality inventory. Whereas previous research has primarily highlighted similarities between LLM-generated and human responses at the broad personality domain level in terms of descriptive statistics (mean and standard deviation), we took a closer look by first comparing descriptive statistics at the item, facet, and domain levels. We then performed a comprehensive psychometric analysis (e.g., model fit, factor loadings, inter-factor correlations) to examine the degree to which LLM-generated responses produced results similar to those produced by human responses. Our findings indicated that although LLMs perform well in replicating broad-level patterns, they fall short at the item level, where human responses capture subtle individual differences more accurately, and significant psychometric challenges remain when using LLM-generated responses. Additionally, we explored the influence of social desirability on LLM-generated responses and applied logistic regression to differentiate between LLM and human responses. We emphasize the importance of rigorous validation and adherence to psychometric principles when using LLMs for psychological research.
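
As a rough illustration of why item-level and domain-level comparisons can disagree, here is a minimal sketch that computes a standardized mean difference (Cohen's d with a pooled SD) for each item and for a simple domain score. It is hypothetical and not the authors' analysis. The effect-size choice, the function names `cohens_d`, `item_level_d`, and `domain_level_d`, and the toy response matrices are all assumptions. Items are also assumed to be reverse-scored already wherever the inventory requires it.

```python
from math import sqrt
from statistics import mean, stdev


def cohens_d(a, b):
    """Standardized mean difference (a - b) using the pooled sample SD.

    Raises ZeroDivisionError if both samples have zero variance.
    """
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var)


def item_level_d(human, llm):
    """One d per item; each matrix is a list of respondents, each a list of item scores."""
    n_items = len(human[0])
    return [
        cohens_d([row[j] for row in human], [row[j] for row in llm])
        for j in range(n_items)
    ]


def domain_level_d(human, llm):
    """d for a domain score, taken here (an assumption) as each respondent's item mean."""
    return cohens_d([mean(row) for row in human], [mean(row) for row in llm])


# Hypothetical toy data: 4 respondents x 2 items on a 1-5 Likert scale.
human = [[4, 2], [3, 3], [5, 2], [2, 4]]
llm = [[5, 1], [4, 2], [5, 1], [4, 2]]

print([round(d, 2) for d in item_level_d(human, llm)])  # [-1.0, 1.58]
print(round(domain_level_d(human, llm), 2))              # 0.71
```

With these toy values, the two items differ in opposite directions, and the raw domain means are close (3.125 vs. 3.0). The domain-level d is still about 0.71 only because the toy LLM domain scores have zero variance, which shrinks the pooled SD. A real analysis would also inspect variances, facets, and the factor structure rather than relying on a single effect size.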
