Researchers at Pittsburgh’s Carnegie Mellon University (CMU) have created highly realistic 3D reconstructions of individual faces using smartphone video and deep learning algorithms.
Typically, creating an accurate 3D reconstruction of a human face requires expensive specialized scanners, cameras and a studio, not to mention the expertise. The researchers instead used a smartphone camera to shoot continuous footage of the front and sides of the face, generating a massive amount of data. Then, using a two-step process developed by CMU’s Robotics Institute, they turned that data into a digital reconstruction of the face with sub-millimeter accuracy.
Besides obvious applications in animation, gaming, and virtual or augmented reality, accurate digital faces are also useful for biometric identification and even in medicine, for example to build custom surgical masks or respirators.
“Building a 3D reconstruction of the face has been an open problem in computer vision and graphics because people are very sensitive to the look of facial features,” said Simon Lucey, an associate research professor in the Robotics Institute. “Even slight anomalies in the reconstructions can make the end result look unrealistic,” he added.
The research team demonstrated their method in a presentation in early March by shooting 15-20 seconds of continuous video in slow-motion on an iPhone X.
“The high frame rate of slow motion is one of the key things for our method because it generates a dense point cloud,” Lucey said.
The team then used a technique known as visual simultaneous localization and mapping (SLAM) — which triangulates points on a surface, calculating its shape while simultaneously determining the positioning of the camera — to create an initial geometric map of the face, though with some gaps due to missing data.
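The triangulation idea behind that first step can be illustrated with a small sketch. It is not the CMU team's code: it assumes the two camera positions and viewing directions are already known, whereas visual SLAM estimates them at the same time, and the function name `triangulate_midpoint` and the sample numbers are made up for illustration. Given two rays that see the same point on a surface, it returns the point midway between the places where the rays pass closest to each other.

```python
# Illustrative midpoint triangulation of one 3D point from two camera rays.
# Assumption: camera centers and ray directions are known in advance.

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _add(a, b):
    return tuple(x + y for x, y in zip(a, b))

def _scale(a, k):
    return tuple(x * k for x in a)

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def triangulate_midpoint(c1, d1, c2, d2):
    """Return the point midway between the closest points of two rays.

    c1, c2: camera centers; d1, d2: ray directions (need not be unit length).
    """
    w = _sub(c1, c2)
    a, b, c = _dot(d1, d1), _dot(d1, d2), _dot(d2, d2)
    d, e = _dot(d1, w), _dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("rays are parallel; the point cannot be triangulated")
    s = (b * e - c * d) / denom  # distance parameter along the first ray
    t = (a * e - b * d) / denom  # distance parameter along the second ray
    p1 = _add(c1, _scale(d1, s))
    p2 = _add(c2, _scale(d2, t))
    return _scale(_add(p1, p2), 0.5)

# Two hypothetical camera positions, both looking at a point 5 units in front.
point = triangulate_midpoint((-1.0, 0.0, 0.0), (1.0, 0.0, 5.0),
                             (1.0, 0.0, 0.0), (-1.0, 0.0, 5.0))
```

Repeating this for many matched points across many frames yields a point cloud, and more frames mean more rays to intersect.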
In the second step of the process, the researchers used deep learning algorithms to help fill in those gaps, mainly by identifying a person’s landmark features such as their eyes, ears, and nose, and then let classic computer vision techniques fill in the missing data.
Though the method took 30-40 minutes to process, it was completed entirely on a smartphone. The team sees its methods eventually being used to map any 3D object and create a digital reconstruction of it.
Source: Science Daily
April 2, 2020 – by Tony Bitzionis