Abstract

Generating 3D faces and rendering them into images has numerous practical applications, including AR/VR, dataset generation, and avatar creation. Recent years have seen a surge of high-fidelity 3D face generation techniques such as StyleSDF, which combine the benefits of 3D implicit neural representations with those of style-based 2D generative adversarial networks (GANs). Although these implicit 3D GAN approaches generate highly realistic faces using a 3D representation, the properties of the generated faces cannot easily be edited or controlled. Meanwhile, linear 3D morphable models (3DMMs) and their nonlinear extensions have made significant strides in expressive capacity and quality, but they have yet to match the image quality achieved by GANs. This paper proposes a new method, CoLa-SDF, which combines the controllability of nonlinear 3DMMs with the high fidelity of implicit 3D GANs. Inspired by the photorealism and expressive 3D representations of StyleSDF, our model uses a similar architecture but constrains the latent space to match the interpretable, physical parameters of the nonlinear 3D morphable model MOST-GAN. We demonstrate high-fidelity image synthesis and subsequent 3D manipulation with full control over the disentangled latent parameters.