CLR-Face: Conditional Latent Refinement for Blind Face Restoration Using Score-Based Diffusion Models

Maitreya Suin, Rama Chellappa

Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence
Main Track. Pages 1290-1298. https://doi.org/10.24963/ijcai.2024/143

Recent generative methods have shown promising blind face restoration performance. They usually project the degraded images to the latent space and then decode high-quality faces either by single-stage latent optimization or directly from the encoding. Generating fine-grained facial details faithful to inputs remains challenging. Most existing methods produce either overly smooth outputs or alter the identity. This could be attributed to the typical trade-off between quality and resolution in the latent space. If the latent is highly compressed, the decoded output is more robust to degradations but shows worse fidelity. On the other hand, a more flexible latent space can capture intricate details better, but is extremely difficult to optimize for highly degraded faces. We introduce a diffusion-based prior inside a VQGAN architecture that focuses on learning the distribution over uncorrupted latent embeddings. We iteratively recover the clean embedding, conditioned on its degraded counterpart. Furthermore, to ensure the reverse diffusion trajectory does not deviate from the underlying identity, we train a separate Identity Recovery Network and use its output to constrain the reverse diffusion. Specifically, using a learnable latent mask, we add gradients from a face-recognition network to a subset of latent features that correlates with the finer identity-related details in the pixel space, leaving the other features untouched. Disentanglement between perception and fidelity in the latent space allows us to achieve the best of both worlds. We perform extensive evaluations on multiple real and synthetic datasets to validate our approach.
Keywords:
Computer Vision: CV: Biometrics, face, gesture and pose recognition
Computer Vision: CV: Image and video synthesis and generation 
Machine Learning: ML: Generative models
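
The masked identity guidance described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function name, the per-channel sigmoid mask, the step size, and the random stand-in for the face-recognition gradient are all assumptions made for illustration. It only shows the general idea of applying an identity gradient to a masked subset of latent features while leaving the rest unchanged.

```python
import numpy as np

def masked_identity_guidance(z, id_grad, mask_logits, step_size=0.1):
    """Nudge a latent with an identity gradient, gated by a soft mask.

    z           : latent features, shape (channels, height, width)
    id_grad     : gradient of a (hypothetical) identity loss w.r.t. z
    mask_logits : mask parameters, shape (channels, 1, 1); in a real
                  system these would be learned, here they are fixed
    """
    # Sigmoid turns the logits into a soft mask in (0, 1).
    mask = 1.0 / (1.0 + np.exp(-mask_logits))
    # Only masked features move along the negative identity-loss gradient.
    return z - step_size * mask * id_grad

rng = np.random.default_rng(0)
z = rng.standard_normal((4, 8, 8))
id_grad = rng.standard_normal(z.shape)  # stand-in for a face-recognition gradient

# Assumed mask: channels 0-1 carry identity detail, channels 2-3 do not.
mask_logits = np.array([20.0, 20.0, -20.0, -20.0]).reshape(4, 1, 1)

z_new = masked_identity_guidance(z, id_grad, mask_logits)
```

In a full pipeline, an update of this kind would presumably be applied at each reverse diffusion step, with the gradient computed from the Identity Recovery Network's output rather than drawn at random.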