Reliability generalization meta-analysis of the internal consistency of the Big Five Inventory (BFI) by comparing BFI (44 items) and BFI-2 (60 items) versions controlling for age, sex, language factors

Husain, W., Haddad, A. J., Husain, M. A., Ghazzawi, H., Trabelsi, K., Ammar, A., Saif, Z., Pakpour, A., & Jahrami, H. (2025). Reliability generalization meta-analysis of the internal consistency of the Big Five Inventory (BFI) by comparing BFI (44 items) and BFI-2 (60 items) versions controlling for age, sex, language factors. BMC Psychology, 13(1). doi:10.1186/s40359-024-02271-x

Abstract

The Big Five Inventory (BFI) is a popular measure that evaluates personality on the Big-Five model. Despite its use across cultures, the literature revealed no meta-analysis of the reliability of the different versions of the BFI and its translations. The current study carried out a reliability generalization meta-analysis (REGEMA) to establish the reliability of the BFI across cultures and languages. We searched 30 databases for relevant studies from 1991 to mid-November 2024. Studies were eligible for inclusion if they used the BFI (44 items) or the BFI-2 (60 items) and reported Cronbach's alpha or McDonald's omega reliability estimates. Our coded variables included BFI version, sample size, population type, age, gender, clinical state, and reliability. A total of 57 studies (datapoints) published in 34 research articles (involving 43,715 participants; 60.24% women; mean age = 30.08) from various cultures and languages were finally included. These studies used the BFI and BFI-2 in Arabic, Chinese, Croatian, Czech, Danish, Dutch, English, French, German, Indonesian, Italian, Japanese, Malay, Norwegian, Polish, Portuguese, Russian, Serbian, Spanish, Swahili, and Turkish. Data analysis was conducted using the metafor and meta packages in R. The average correlation was computed using a random-effects model, with reliability coefficients serving as the effect size. I² and Cochran's Q tests were used to examine heterogeneity, with prediction intervals suggesting genuine influences around the pooled estimate. Using funnel plots, regression-based tests (e.g., Egger's regression, rank correlation), and trim-and-fill imputation, publication bias was assessed and adjusted for to estimate unbiased effects. We calculated the individual and combined reliability of the BFI and BFI-2 across languages and cultures. The results revealed the reliability of all five factors used in the BFI/BFI-2.
The BFI estimates are as follows: openness is estimated at 0.77 (95% CI: 0.75; 0.80); conscientiousness is estimated at 0.80 (95% CI: 0.78; 0.82); extraversion is also estimated at 0.80 (95% CI: 0.79; 0.82); agreeableness is estimated at 0.73 (95% CI: 0.71; 0.76); and neuroticism is estimated at 0.80 (95% CI: 0.79; 0.82). The BFI-2 estimates are as follows: openness is estimated at 0.83 (95% CI: 0.82; 0.84); conscientiousness is estimated at 0.86 (95% CI: 0.85; 0.87); extraversion is estimated at 0.85 (95% CI: 0.84; 0.86); agreeableness is estimated at 0.80 (95% CI: 0.79; 0.81); and neuroticism is estimated at 0.89 (95% CI: 0.88; 0.89). The current meta-analysis represents the first reliability analysis of the BFI and the first comparison between its two different versions, the BFI (44 items) and the BFI-2 (60 items). The generalized reliability of both the BFI and BFI-2 was established. The findings confirm that the BFI and BFI-2 have good reliability across all five factors.

What This Study Is About

Researchers wanted to know whether a widely used personality test, the Big Five Inventory (BFI), is reliable. They compared the original 44-question version with a newer, longer 60-question version (the BFI-2) to see whether both give consistent results across different ages, sexes, and languages.

How They Studied It

Instead of testing new people, the researchers did a "meta-analysis", which is like a giant review of everyone else's homework. They combined 57 sets of results from 34 published articles, covering 43,715 participants from all over the world. The studies used the test in 21 different languages, including Arabic, Chinese, and Swahili. The team compared the older 44-question test with the newer 60-question version (the BFI-2).
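For readers curious how reliability figures from many studies get combined, here is a minimal Python sketch of random-effects pooling. It is not the authors' code (they used the metafor and meta packages in R), and the abstract does not say which transformation or heterogeneity estimator they used. The sketch assumes the Bonett log transformation and the DerSimonian-Laird estimator, and the four studies in it are invented for illustration.

```python
import math

def pool_alphas(studies):
    """Pool Cronbach's alphas; studies is a list of (alpha, n_people, n_items)."""
    # Assumed transformation (Bonett): y = -ln(1 - alpha),
    # with approximate sampling variance 2k / ((k - 1)(n - 2)).
    y = [-math.log(1 - a) for a, n, k in studies]
    v = [2 * k / ((k - 1) * (n - 2)) for a, n, k in studies]

    # Fixed-effect weights, Cochran's Q, and I-squared
    w = [1 / vi for vi in v]
    y_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, y))
    df = len(studies) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

    # DerSimonian-Laird estimate of the between-study variance tau^2
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)

    # Random-effects pooled estimate and 95% confidence interval
    w_re = [1 / (vi + tau2) for vi in v]
    y_re = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))

    def back(t):
        # Undo the transformation to return to the alpha scale
        return 1 - math.exp(-t)

    return {
        "alpha": back(y_re),
        "ci": (back(y_re - 1.96 * se), back(y_re + 1.96 * se)),
        "Q": q,
        "I2": i2,
        "tau2": tau2,
    }

# Hypothetical studies: (alpha, participants, items in the scale)
example_studies = [(0.78, 300, 8), (0.82, 1200, 8), (0.75, 150, 8), (0.80, 640, 8)]
result = pool_alphas(example_studies)
print(result)
```

The pooled value always lands between the smallest and largest alpha that went in, and larger studies pull it harder. I² reports how much of the spread between studies is more than chance would explain.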

What They Found

The researchers found that both versions of the test have good "reliability." The kind of reliability measured here is internal consistency: whether the questions that are meant to measure the same trait agree with one another, like several rulers that all give the same length for the same table.
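Internal consistency is most often reported as Cronbach's alpha, one of the two statistics the included studies had to report. The Python sketch below shows the standard formula on a tiny invented data set (five people answering four questions on a 1 to 5 scale). The numbers are hypothetical and are not taken from the study.

```python
from statistics import variance

def cronbach_alpha(rows):
    """rows holds one list of item scores per respondent."""
    k = len(rows[0])                      # number of items (questions)
    items = list(zip(*rows))              # regroup scores by item
    item_var_sum = sum(variance(col) for col in items)
    total_var = variance([sum(r) for r in rows])
    # alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores)
    return k / (k - 1) * (1 - item_var_sum / total_var)

# Hypothetical answers; each inner list is one person
responses = [
    [4, 5, 4, 5],
    [2, 2, 3, 2],
    [3, 3, 3, 4],
    [5, 4, 5, 5],
    [1, 2, 1, 2],
]
alpha = cronbach_alpha(responses)
print(round(alpha, 2))
```

When people who score high on one question also score high on the others, the total-score variance is large relative to the item variances and alpha approaches 1.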

They found that the newer, longer test (BFI-2) was more consistent than the original on all five main personality traits: Openness, Conscientiousness, Extraversion, Agreeableness, and Neuroticism. Pooled reliability ranged from 0.73 (Agreeableness) to 0.80 for the BFI, and from 0.80 (Agreeableness) to 0.89 (Neuroticism) for the BFI-2.
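The size of the difference can be read straight from the abstract. The short Python sketch below tabulates the reported estimates and 95% confidence intervals and checks whether the two versions' intervals overlap. It assumes the BFI-2 agreeableness upper bound, printed as "81" in the source abstract, means 0.81. Non-overlapping intervals are only a rough indication of a difference, not a formal test.

```python
# (estimate, CI lower, CI upper) as reported in the abstract
bfi = {
    "openness": (0.77, 0.75, 0.80),
    "conscientiousness": (0.80, 0.78, 0.82),
    "extraversion": (0.80, 0.79, 0.82),
    "agreeableness": (0.73, 0.71, 0.76),
    "neuroticism": (0.80, 0.79, 0.82),
}
bfi2 = {
    "openness": (0.83, 0.82, 0.84),
    "conscientiousness": (0.86, 0.85, 0.87),
    "extraversion": (0.85, 0.84, 0.86),
    "agreeableness": (0.80, 0.79, 0.81),  # assumed: source prints the upper bound as "81"
    "neuroticism": (0.89, 0.88, 0.89),
}

gains = {}
overlaps = {}
for trait in bfi:
    old_est, old_lo, old_hi = bfi[trait]
    new_est, new_lo, new_hi = bfi2[trait]
    gains[trait] = round(new_est - old_est, 2)
    # Intervals overlap if the BFI upper bound reaches the BFI-2 lower bound
    overlaps[trait] = old_hi >= new_lo
    print(f"{trait:18s} gain={gains[trait]:.2f} overlap={overlaps[trait]}")
```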

What This Might Mean

This could matter for aphantasia research. Aphantasia is the inability to create mental imagery (the "mind's eye"). To find out whether people with aphantasia differ in personality, for example by being more or less "Open to Experience", scientists need tools they can trust.

This study suggests that both the BFI and the BFI-2 are trustworthy tools for researchers worldwide, with the BFI-2 showing the higher reliability. However, the pooled studies span 21 languages and many cultures, and the authors tested for variation between studies, so scientists should still be careful when comparing results from different countries.

One Interesting Detail

The analysis controlled for participants' age, sex, and language, and the test's reliability held up across these groups. This describes how consistently the questionnaire measures the five traits around the globe. It does not by itself show that the traits themselves are stable.

