Published Date: 8/25/2025
A new research paper delves into the contentious issue of demographic fairness in facial recognition systems. Authored by researchers from Idiap and published in IEEE Transactions on Biometrics, Behavior, and Identity Science, the paper titled “Review of Demographic Fairness in Face Recognition” systematically examines the primary causes, datasets, assessment metrics, and mitigation approaches associated with performance differences in facial recognition across demographic groups.
The paper aims to provide researchers with a unified perspective on the state-of-the-art while emphasizing the critical need for equitable and trustworthy facial recognition (FR) systems. As FR technologies are increasingly deployed globally, disparities in performance across demographic groups—such as race, ethnicity, and gender—have garnered significant attention. The authors cite several real-world incidents that underscore the societal risks associated with such disparities.
Most of the incidents of false identifications by facial recognition technology (FRT) involve Black people. As such, the authors’ major focus is on race and ethnicity, but they also include gender-related studies within the broader context. Age is, for the most part, not in scope.
Having acknowledged the problem of potential bias in facial biometric systems, the paper notes that the issue has been formally incorporated into the evaluation frameworks of prominent initiatives. Specifically, the National Institute of Standards and Technology (NIST)’s Face Recognition Vendor Tests (FRVT) benchmarks have been pivotal. Since 2019, FRVT reports have incorporated analyses of demographic disparities, making them a key reference for assessing fairness in FR.
Other initiatives touching on bias include the Maryland Test Facility (MdTF) and the European Association for Biometrics (EAB), though at a smaller scale and scope compared to NIST. In the section analyzing the causes of varied performance in facial recognition systems, the categorization system encompasses factors such as imbalances in training datasets, variability in skin-tones, algorithmic sensitivity, image quality, related covariates, combined or intersectional factors, and soft attributes. It stresses that biased decisions often result from a combination of factors in concert.
On skin tone, the paper references a 2019 report from the Biometric Technology Rally organized by MdTF, which notes that skin reflectance—the measurable amount of light reflected from the skin surface—is a better metric than “skin tone,” which refers to perceived skin color. Using systematic linear modeling, their study demonstrated that darker skin tones were associated with longer transaction times (the time to complete the overall processing pipeline) and lower accuracy in biometric systems. Longer transaction times were primarily attributed to difficulties in the face detection or image acquisition stage under suboptimal lighting conditions.
Lower skin reflectance can reduce contrast, making it harder for detection algorithms to localize the face, thus increasing processing time. This dependency was found to vary substantially across systems, highlighting the important role of acquisition methods in determining the extent of performance differences. Still, the authors note that, while many studies report that individuals with lighter skin tones tend to be recognized more accurately than those with darker skin tones, there is no consensus that skin tone is the primary driver of differences in FR performance across demographic groups.
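The systematic linear modeling described above can be illustrated with a minimal sketch: regressing transaction time on measured skin reflectance and inspecting the sign of the slope. All data below are synthetic and the variable names are illustrative assumptions, not the MdTF study's actual measurements or methodology.

```python
import numpy as np

# Synthetic illustration only: the MdTF study's real data and model differ.
rng = np.random.default_rng(0)

# Hypothetical reflectance values (fraction of incident light reflected).
reflectance = rng.uniform(0.2, 0.8, size=200)

# Construct synthetic transaction times so that lower reflectance
# corresponds to longer acquisition time, plus measurement noise.
transaction_time = 4.0 - 2.5 * reflectance + rng.normal(0, 0.2, size=200)

# Ordinary least-squares fit: time ≈ slope * reflectance + intercept.
slope, intercept = np.polyfit(reflectance, transaction_time, deg=1)

# A negative slope indicates that less reflective (darker) skin is
# associated with longer transaction times in this synthetic setup.
print(f"slope = {slope:.2f} s per unit reflectance")
```

The sketch recovers the qualitative finding reported in the article (longer transactions at lower reflectance) only because the synthetic data were built that way; it shows the shape of the analysis, not its results.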
The report lists datasets used for the study of demographic accuracy differences and gets highly technical in its explanation of assessment methods and metrics. It looks at bias mitigation systems across the biometric processing lifecycle and casts a glance at future directions to be explored if remaining challenges are to be overcome.
In conclusion, the paper identifies training data imbalance, skin-tone variations, and image quality as key factors in facial recognition, as well as the growing recognition of non-demographic attributes, such as facial hair, hairstyle, and makeup, in shaping recognition outcomes. These factors, though not inherently demographic, are deeply intertwined with social and cultural norms that vary across gender and ethnicity. In the context of FR, these soft attributes function as partial occlusions or lead to shifts in the underlying data distribution. And when these occlusions correspond with specific demographic groups, they contribute to unequal recognition outcomes and further exacerbate existing disparities, mimicking demographic bias.
In short, more so than skin tone or gender in themselves, specific features like beards or hairstyles may be causing facial recognition systems to treat certain demographics less fairly. Recent studies have demonstrated that many of the observed demographic disparities in FR may in fact be driven by these correlated non-demographic traits. These collective insights underscore the significant influence of non-demographic but demographically correlated appearance factors in shaping recognition performance.
Q: What is the main focus of the research paper by Idiap researchers?
A: The main focus of the research paper is to examine the primary causes, datasets, assessment metrics, and mitigation approaches associated with performance differences in facial recognition across demographic groups.
Q: Why are false identifications by facial recognition technology (FRT) predominantly involving Black people?
A: False identifications by FRT predominantly involve Black people due to disparities in performance across demographic groups, which have been highlighted in several real-world incidents.
Q: What is the role of the National Institute of Standards and Technology (NIST) in assessing fairness in facial recognition?
A: NIST's Face Recognition Vendor Tests (FRVT) benchmarks have incorporated analyses of demographic disparities since 2019, making them a key reference for assessing fairness in facial recognition.
Q: What metric does the paper suggest is better than perceived skin tone?
A: Skin reflectance, the measurable amount of light reflected from the skin surface, is a better metric than perceived skin tone for studying skin-related performance differences in facial recognition systems.
Q: How do non-demographic attributes like facial hair and makeup contribute to bias in facial recognition?
A: Non-demographic attributes like facial hair and makeup can function as partial occlusions or lead to shifts in the underlying data distribution, contributing to unequal recognition outcomes and exacerbating existing disparities.