Diffutoon: High-Resolution Editable Toon Shading via Diffusion Models

Zhongjie Duan, Chengyu Wang, Cen Chen, Weining Qian, Jun Huang

Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence
AI, Arts & Creativity. Pages 7645-7653. https://doi.org/10.24963/ijcai.2024/846

Toon shading is a non-photorealistic rendering task in animation whose primary purpose is to render objects with a flat, stylized appearance. As diffusion models have risen to the forefront of image synthesis, this paper explores an innovative form of toon shading based on diffusion models, aiming to render photorealistic videos directly in anime styles. Existing video stylization methods face persistent challenges, notably in maintaining temporal consistency and achieving high visual quality. In this paper, we model the toon shading problem as four subproblems: stylization, consistency enhancement, structure guidance, and colorization. To address the challenges in video stylization, we propose an effective toon shading approach called Diffutoon. Diffutoon is capable of rendering remarkably detailed, high-resolution, extended-duration videos in anime style. It can also edit video content according to input prompts via an additional branch. The efficacy of Diffutoon is evaluated through quantitative metrics and human evaluation; notably, Diffutoon surpasses both open-source and closed-source baseline approaches in our experiments. Our work is accompanied by the release of both the source code and example videos on GitHub.
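The abstract decomposes toon shading into four subproblems applied to a source video. A minimal sketch of how such a pipeline might be composed is shown below; all function names and data shapes here are illustrative assumptions, not the authors' actual implementation (which is available in the released source code).

```python
# Hypothetical sketch of a four-subproblem toon-shading pipeline, as
# described in the abstract: stylization, consistency enhancement,
# structure guidance, and colorization. Names are illustrative only.

def stylize(frame, prompt):
    # Subproblem 1: render the frame in an anime style, conditioned
    # on a text prompt (e.g. via a diffusion model).
    return {"frame": frame, "style": prompt}

def enhance_consistency(frames):
    # Subproblem 2: enforce temporal consistency across the
    # stylized frames (placeholder: identity).
    return frames

def guide_structure(stylized, source_frame):
    # Subproblem 3: keep outlines and structure aligned with the
    # corresponding source frame.
    stylized["structure"] = source_frame
    return stylized

def colorize(stylized, source_frame):
    # Subproblem 4: keep colors faithful to the source frame.
    stylized["color"] = source_frame
    return stylized

def toon_shade(video, prompt):
    # Compose the four stages over every frame of the input video.
    stylized = [stylize(f, prompt) for f in video]
    stylized = enhance_consistency(stylized)
    return [colorize(guide_structure(s, f), f)
            for s, f in zip(stylized, video)]
```

The editing branch mentioned in the abstract would plug in at the stylization stage, where the prompt conditions the rendered content.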
Keywords:
Application domains: Computer Graphics and Animation
Theory and philosophy of arts and creativity in AI systems: Autonomous creative or artistic AI