A recent study has revealed that OpenAI’s GPT-4 can perform facial recognition tasks with accuracy comparable to specialized biometric algorithms, despite not being explicitly trained for such functions. Conducted by researchers at the Norwegian University of Science and Technology, Mizani, and the Idiap Research Institute, the study tested GPT-4’s ability to recognize faces, determine gender, and estimate age from photos.
GPT-4 achieved a 100 percent accuracy rate in gender recognition on a dataset of 5,400 balanced images, surpassing the DeepFace model, which was specifically designed for this purpose and scored 99 percent accuracy. In age estimation, GPT-4 correctly identified the age range 74.25 percent of the time using the UTKFace dataset, though it tended to estimate wider age ranges for individuals over 60 years old.
In testing on the Labeled Faces in the Wild (LFW) dataset, however, GPT-4 achieved 95.15 percent accuracy in facial recognition tasks, a notable result that still falls short of the 99.57 percent achieved by MobileFaceNet, a model specifically optimized for facial recognition that was also used in the research.
The LFW dataset is widely used to benchmark face verification models because it contains over 13,000 images and 6,000 matched pairs. These pairs are balanced between genuine pairs (same individual) and imposter pairs (different individuals), which provides a robust basis for evaluating a model’s ability to correctly identify and differentiate faces under real-world, uncontrolled conditions. Although GPT-4’s performance is respectable, its lower accuracy relative to MobileFaceNet suggests that GPT-4’s general-purpose capabilities remain limited compared to specialized facial recognition systems when precision matters.
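To illustrate how such pair-based benchmarks are scored, here is a minimal Python sketch of verification accuracy over genuine and imposter pairs. The similarity scores, labels, and threshold below are illustrative placeholders, not data from the study.

```python
# LFW-style face verification: each trial is a pair of images labeled
# genuine (same person) or imposter (different people). A model produces
# a similarity score per pair; accuracy is the fraction of pairs classified
# correctly at a chosen decision threshold.

def verification_accuracy(scores, labels, threshold):
    """Fraction of pairs where (score >= threshold) matches the genuine label."""
    correct = sum(
        (score >= threshold) == is_genuine
        for score, is_genuine in zip(scores, labels)
    )
    return correct / len(scores)

# Hypothetical similarity scores for three genuine and three imposter pairs.
scores = [0.91, 0.84, 0.40, 0.35, 0.22, 0.75]
labels = [True, True, True, False, False, False]

# Two pairs are misclassified at this threshold: one genuine pair scores
# below 0.5 and one imposter pair scores above it, giving 4/6 correct.
print(verification_accuracy(scores, labels, threshold=0.5))
```

Because the benchmark is balanced between genuine and imposter pairs, a trivial model that always answers “same person” would score only 50 percent, which is what makes the reported accuracies meaningful.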
This accuracy level, however, is still impressive considering that GPT-4 was not explicitly trained or fine-tuned for biometric tasks. As a large language model with emergent capabilities in various domains, GPT-4 managed to recognize faces and distinguish between individuals by analyzing basic facial features. The performance gap observed in the LFW dataset illustrates the trade-off between the versatility of a general-purpose model and the accuracy of domain-specific models.
The research also uncovered that while GPT-4 generally provided detailed and convincing explanations for its face recognition analyses, it was sometimes susceptible to generating false positives—misidentifying non-matching faces as the same person. In these cases, GPT-4 often described the similarities between the images, such as similar expressions or shared features, in ways that were persuasive yet incorrect.
This points to GPT-4’s potential to make compelling but sometimes inaccurate claims, which could be problematic in high-stakes scenarios like identity verification.
The study found other issues, too, as researchers managed to bypass GPT-4’s built-in safeguards against revealing sensitive biometric information by claiming that images were AI-generated. This workaround allowed the model to analyze real photos, highlighting vulnerabilities in the current safety protocols of large language models.
The researchers emphasized the need for more comprehensive safety measures to prevent unauthorized access to and misuse of biometric data by AI systems. This unexpected capability of large language models raises important questions about the ethical implications and the necessity for robust regulatory frameworks to ensure the responsible use of AI in sensitive areas like identity verification and security.
Source: The Decoder