Do explicit 3D priors improve face inpainting? We present 3DFaceFill!

31 May 2022 by Rahul Dey

Face completion is a specific and more challenging application of image completion centered on faces. For any inpainted face to look realistic, the structural integrity of the face must be maintained. Face inpainting has primarily been accomplished with end-to-end trained models. These models, usually designed as autoencoders, take in an artificially masked face image and output the complete image, and are typically trained with a combination of photometric, adversarial, and other face-specific losses. As such, these approaches have to model the geometric structure of faces implicitly. We experimented with several such approaches, including GFC, DeepFillv2, and PIC (cite), and observed that, in challenging cases, their outputs fail to produce a geometrically realistic face image.


Existing approaches typically employ an autoencoder to inpaint masked images
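
To make the training setup described above concrete, here is a minimal sketch of how a masked input and a photometric loss might be computed. The function names, the zero-fill masking convention, and the use of an L1 loss are assumptions for illustration; they are not taken from any of the baselines named above.

```python
import numpy as np

def make_masked_input(image, mask):
    """Zero out masked pixels. Assumed convention: mask == 1 means missing."""
    return image * (1.0 - mask[..., None])

def photometric_l1(pred, target):
    """Mean absolute error, one common choice of photometric loss."""
    return float(np.mean(np.abs(pred - target)))

rng = np.random.default_rng(0)
image = rng.random((8, 8, 3))          # stand-in for a face image in [0, 1]
mask = np.zeros((8, 8))
mask[2:6, 2:6] = 1.0                   # artificial square occlusion

masked = make_masked_input(image, mask)
pred = np.full_like(image, 0.5)        # stand-in for the autoencoder output
loss = photometric_l1(pred, image)
```

In a real pipeline, `pred` would come from the network, and the photometric term would be combined with adversarial and face-specific losses.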

A Face is a 3D Object

A face is, however, a 3D object whose appearance in the image is its 2D projection. The appearance of a face in an image depends on its shape, pose, albedo, background illumination, and camera parameters, among other factors.


A face image can be disentangled into its 3D shape, 3D pose, albedo, and illumination components
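
The sketch below illustrates how albedo and illumination combine to produce appearance. It assumes a simple Lambertian model with one directional light and an ambient term; this is an illustrative simplification, not necessarily the image formation model used in 3DFaceFill, and `lambertian_shading` and `render` are hypothetical names.

```python
import numpy as np

def lambertian_shading(normals, light_dir, ambient=0.3):
    """Per-pixel shading from surface normals (H, W, 3) and one light direction."""
    light_dir = np.asarray(light_dir, dtype=float)
    light_dir = light_dir / np.linalg.norm(light_dir)
    diffuse = np.clip(normals @ light_dir, 0.0, None)   # shape (H, W)
    return ambient + (1.0 - ambient) * diffuse

def render(albedo, normals, light_dir):
    """Appearance = albedo * shading (assumed Lambertian surface)."""
    return albedo * lambertian_shading(normals, light_dir)[..., None]

# A flat patch facing the camera, with uniform albedo.
normals = np.zeros((4, 4, 3))
normals[..., 2] = 1.0
albedo = np.full((4, 4, 3), 0.8)

frontal = render(albedo, normals, [0.0, 0.0, 1.0])   # light from the camera
side = render(albedo, normals, [1.0, 0.0, 0.0])      # light from the side
```

The same albedo yields different pixel values under the two lights, which is why a model that works directly on pixels has to untangle these factors implicitly.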

We observe that a face inpainting model that explicitly disentangles the face image into these factors, and then selectively fills in only the missing albedo, would better retain the geometric realism of the inpainted face. At the same time, approaches based on 3D morphable models (3DMMs), which can disentangle a single face image into such components, have seen tremendous focus and improvement in recent years. Motivated by these observations, we present 3DFaceFill, an analysis-by-synthesis approach for face completion. Below we show the architecture of 3DFaceFill.

Architecture of 3DFaceFill

Furthermore, facial albedo is largely symmetric, especially when represented in UV space (as shown in the image below). We leverage this in our proposed approach through an attention mechanism that copies features from the visible parts to their occluded counterparts in the opposite half of the face, as shown in the architecture figure.


Facial albedo is largely symmetric. We leverage this in our proposed 3DFaceFill approach for inpainting masked parts using features from the visible counterparts
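
The symmetry idea can be sketched as a hard, pixel-level copy in UV space. This is a deliberately simplified stand-in for the feature-level attention mechanism described above. It assumes the UV map is laid out so that a horizontal flip maps each texel to its mirror counterpart, and `symmetric_fill` is a hypothetical helper.

```python
import numpy as np

def symmetric_fill(uv_albedo, uv_visible):
    """Fill occluded texels from their visible mirror counterparts.

    uv_albedo:  (H, W, 3) albedo in UV space
    uv_visible: (H, W) boolean array, True where the texel was observed
    """
    mirrored_albedo = uv_albedo[:, ::-1]
    mirrored_visible = uv_visible[:, ::-1]
    fill = ~uv_visible & mirrored_visible     # occluded here, visible on the mirror side
    filled = uv_albedo.copy()
    filled[fill] = mirrored_albedo[fill]
    return filled, uv_visible | fill

# Build a perfectly symmetric toy albedo and occlude its left half.
rng = np.random.default_rng(0)
half = rng.random((4, 2, 3))
albedo = np.concatenate([half, half[:, ::-1]], axis=1)
visible = np.ones((4, 4), dtype=bool)
visible[:, :2] = False
corrupted = albedo.copy()
corrupted[~visible] = 0.0

filled, now_visible = symmetric_fill(corrupted, visible)
```

Real albedo is only approximately symmetric, which is one reason a learned, soft mechanism is preferable to a hard copy like this one.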

Results

Below, we show some qualitative results of face completion from various baselines and from 3DFaceFill. In these examples, one can observe the deformations and artifacts that the baselines introduce into the completed faces, which make the outputs look unrealistic. In comparison, the results from 3DFaceFill are geometrically and photometrically much more realistic.

Qualitative comparison of face completion by the baselines and 3DFaceFill

We also compared 3DFaceFill with two of the most competitive baselines, DeepFillv2 and Pluralistic Image Completion (PIC), on real occlusions. The results are shown in the figure below. Notice that, in the first row, the baselines create artifacts whereas 3DFaceFill does not. In the second row, the baselines change the jawline of the face, whereas 3DFaceFill retains the geometry of the face in the deoccluded image.

Face completion on real occlusions by DeepFillv2, PIC, and 3DFaceFill

To further show the advantage of explicit 3D consideration for face inpainting, we performed a cross-dataset evaluation on images with varying pose and illumination from the MultiPIE dataset. None of the methods was trained on MultiPIE. We show the results in the figure below. One can see that inpainting by the baselines becomes worse as the pose and illumination get more challenging, whereas the 3DFaceFill outputs are much less affected by such variations.

Cross-dataset face completion on MultiPIE under varying pose and illumination

Finally, 3DFaceFill also outperforms the baselines quantitatively across the whole range of mask-to-face area ratios (number of pixels under the mask / number of pixels in the face region). Furthermore, the gap between 3DFaceFill and the baselines widens as the mask ratio increases, which suggests that the benefit of 3DFaceFill grows as the occlusion becomes more challenging.


As measured in terms of PSNR and LPIPS, 3DFaceFill performs better face completion than the baselines across mask sizes.
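
For reference, the mask-to-face area ratio and PSNR can be computed as in the sketch below. It assumes images scaled to [0, 1] and binary mask and face-region arrays; the function names are hypothetical and this is not the evaluation code used for the reported numbers. LPIPS is omitted because it requires a pretrained network.

```python
import numpy as np

def mask_to_face_ratio(mask, face_region):
    """Number of pixels under the mask divided by number of pixels in the face region."""
    return float(mask.astype(bool).sum() / face_region.astype(bool).sum())

def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB for images with values in [0, max_val]."""
    mse = float(np.mean((pred - target) ** 2))
    if mse == 0.0:
        return float("inf")
    return 10.0 * np.log10(max_val ** 2 / mse)

mask = np.zeros((8, 8))
mask[2:6, 2:6] = 1          # 16 masked pixels
face = np.zeros((8, 8))
face[1:7, 1:7] = 1          # 36 face pixels
ratio = mask_to_face_ratio(mask, face)

target = np.full((8, 8, 3), 0.5)
pred = target + 0.1         # uniform error of 0.1, so MSE is about 0.01
score = psnr(pred, target)
```

Higher PSNR is better, while for LPIPS lower values indicate closer perceptual similarity.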

Conclusion

In this work, we proposed 3DFaceFill, a 3D-aware face completion method. Our solution was driven by the hypothesis that performing face completion with explicit 3D disentanglement allows us to effectively leverage the power of 3D correspondence and leads to face completions that are geometrically and photometrically more accurate. Experimental evaluations, both quantitative and qualitative and across multiple datasets, show the advantages of 3DFaceFill over the baselines, specifically under large variations in pose, illumination, shape, and appearance. These results validate our primary hypothesis.