Clearview Doesn’t Know How Many Individuals Are In Its Database

Clearview AI’s facial recognition database now has 30 billion images, according to CEO Hoan Ton-That. It’s the latest illustration of the platform’s rapid – and massive – scaling, with Clearview having announced that it reached the 10 billion-image milestone in February of this year, and gone on to reach 20 billion images in late March.

The latest development comes as Clearview seeks to expand its business activities into the private sector, having launched a commercial solution designed to enable selfie-based identity verification earlier this year. The system is designed to use facial recognition to match an end user to their official ID, and can also be set up to match a user against images in a given client’s database.

That framework wouldn’t necessarily benefit from an increase in the size of Clearview’s own image database, which exists mainly to support the company’s police and security-focused facial recognition solution. That system has made Clearview AI notorious among privacy advocates, owing primarily to its trawling of the internet, including social media profiles, to collect face images against which large-scale biometric searches may be performed. This approach to data collection has drawn not only criticism but also serious fines from privacy regulators.

For his part, Ton-That confirmed to FindBiometrics that Clearview “continues to collect images from the public internet,” but stressed that the company is only taking data from “public social media posts,” for which there is no expectation of privacy. The system “doesn’t collect any private data,” the chief executive said.

Ton-That also offered some further clarity about how Clearview AI’s system operates, explaining that it doesn’t contain a database of unique individuals – indeed, Clearview’s administrators don’t actually know how many individuals are represented in its 30 billion face images.

“We don’t have a way from the way our database is setup to calculate the number of unique individuals,” he said. “Our database just returns the most similar to least similar photos in a search, and doesn’t have a concept of an ‘identity’, just an ordered list of results.”
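To illustrate the kind of search Ton-That describes, here is a minimal sketch of a similarity-ranked lookup. Clearview has not published its implementation, so everything here is an assumption for illustration: the three-number “embeddings,” the use of cosine similarity, and the hypothetical names `cosine_similarity`, `rank_by_similarity` and `gallery`. The point is only that such a search returns photos ordered by score, with nothing in the data that groups photos into people.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_similarity(query, gallery):
    """Return (photo_id, score) pairs ordered from most to least similar.

    The gallery maps photo IDs to vectors. There is no identity field, so
    the result is just an ordered list of photos, not a list of people.
    """
    scored = [(photo_id, cosine_similarity(query, vec))
              for photo_id, vec in gallery.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy data: real face embeddings have hundreds of dimensions.
gallery = {
    "photo_a": [0.9, 0.1, 0.0],
    "photo_b": [0.0, 1.0, 0.0],
    "photo_c": [0.8, 0.2, 0.1],
}
results = rank_by_similarity([1.0, 0.0, 0.0], gallery)
for photo_id, score in results:
    print(photo_id, round(score, 3))
```

In a structure like this, counting unique individuals would require a separate clustering step over the stored photos, which is consistent with Ton-That’s statement that the database alone cannot produce that number.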

Such an approach may at least partially address the concerns of privacy advocates. But Ton-That has also been eager to respond to broader criticisms of facial recognition technology with respect to racialized outcomes.

Many facial recognition algorithms have been found to show discrepancies in accuracy between subjects of different demographic groups. One study conducted by the National Institute of Standards and Technology (NIST) in 2019, for example, looked at 189 different algorithms, and found that depending on the system, Asian and African American subjects could be up to 100 times more likely to be misidentified than white males. Such disparities can have serious real-world consequences, such as wrongful arrests.

The disparities are generally believed to stem from inadequate training of machine learning systems, such as by using datasets of primarily white faces. But in a recent letter to the Pittsburgh Post-Gazette, Ton-That argued that the state of the art has advanced considerably in recent years.

“As of 2022, independent assessment by the National Institute of Standards and Technology, the world’s experts in algorithm evaluation, shows at least 25 algorithms exist that are 99 percent or more accurate at picking the correct matching image out of a lineup of millions of images, across all demographic groups,” he argued, adding, “These are the facts about the accuracy of facial recognition technology in 2022.”

For Clearview AI’s part, the company argued in a recent patent win announcement that its “unique data preparation and distributed training algorithms” helped to eliminate demographic bias in its facial recognition system. In that sense, at least, its enormous dataset appears to be a real asset in minimizing facial recognition’s potential for discrimination.

–

October 7, 2022 – by Alex Perala

[UPDATE 10/07/22: Ton-That’s letter to the Post-Gazette originally stated that there were 100 algorithms with a 99 percent or more accuracy rate; both the letter and this article have corrected the number of algorithms to “at least 25”.]
