AK4Prompts: Aesthetics-driven Automatically Keywords-Ranking for Prompts in Text-To-Image Models

Haiyang Zhang, Mengchao Wang, Shuai He, Anlong Ming

Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence
Main Track. Pages 1661-1669. https://doi.org/10.24963/ijcai.2024/184

Current text-to-image synthesis (TIS) models can generate high-fidelity images from textual prompts. However, the efficacy of these models relies heavily on the keywords present in the prompts, and there is a dearth of objective analysis of how different keywords affect the quality of the generated results. Ascertaining the role of keywords therefore falls to manual evaluation, which is limited and inefficient. In this paper, we propose automated keywords-ranking for prompts (AK4Prompts), a keyword evaluation model built on mainstream TIS models that explicitly quantifies the multidimensional impact of individual keywords on prompt-based image generation. To enable personalized keyword evaluation based on prompt content, we propose decoupling the latent representations of keywords and prompts in TIS models, followed by integrating the semantic features of prompts into keywords. For quantitative and multidimensional evaluation, we align the fused features of keywords using HPSv2, aesthetic score, and CLIP score, each representing a distinct factor contributing to keyword impact. AK4Prompts can flexibly and automatically select the keywords that best match the original prompt according to individual user preferences. Extensive experimental results show that AK4Prompts significantly improves the quality of generated images over strong baselines. Our approach not only enhances usability and user experience but also addresses the current gap in automated analysis and evaluation of keyword effects. Our code is available at https://github.com/mRobotit/AK4Prompts.
Computer Vision: CV: Image and video synthesis and generation 
Computer Vision: CV: Computational photography
Computer Vision: CV: Machine learning for vision
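
The preference-based keyword selection described in the abstract can be illustrated with a minimal sketch. It assumes that each candidate keyword already has one score per dimension (HPSv2, aesthetic score, CLIP score) for a given prompt, and that a user preference is expressed as a weight per dimension. The function name `rank_keywords`, the weighted-sum combination, and all numbers below are hypothetical and are not taken from the paper or its released code.

```python
from typing import Dict, List

# Hypothetical type: keyword -> {dimension name -> score for the current prompt}.
KeywordScores = Dict[str, Dict[str, float]]


def rank_keywords(scores: KeywordScores,
                  weights: Dict[str, float],
                  top_k: int) -> List[str]:
    """Return the top_k keywords by a weighted sum of their per-dimension scores.

    The weighted sum is an illustrative assumption about how user preferences
    could be applied; it is not claimed to be the authors' selection rule.
    """
    def combined(keyword: str) -> float:
        # Dimensions missing from `weights` contribute nothing.
        return sum(weights.get(dim, 0.0) * value
                   for dim, value in scores[keyword].items())

    ranked = sorted(scores, key=combined, reverse=True)
    return ranked[:top_k]


# Made-up scores for three candidate keywords (illustration only).
example_scores: KeywordScores = {
    "highly detailed": {"hps": 0.8, "aesthetic": 0.6, "clip": 0.7},
    "oil painting": {"hps": 0.5, "aesthetic": 0.9, "clip": 0.3},
    "blurry": {"hps": 0.1, "aesthetic": 0.2, "clip": 0.4},
}

# A user who cares only about the aesthetic score.
aesthetic_first = rank_keywords(
    example_scores, {"hps": 0.0, "aesthetic": 1.0, "clip": 0.0}, top_k=2)

# A user who weights all three dimensions equally.
balanced = rank_keywords(
    example_scores, {"hps": 1.0, "aesthetic": 1.0, "clip": 1.0}, top_k=2)
```

Changing the weights reorders the same candidates without recomputing any scores, which is one plausible way the flexibility mentioned in the abstract could be exposed to users.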