Integrating LLM, VLM, and Text-to-Image Models for Enhanced Information Graphics: A Methodology for Accurate and Visually Engaging Visualizations

Integrating LLM, VLM, and Text-to-Image Models for Enhanced Information Graphics: A Methodology for Accurate and Visually Engaging Visualizations

Chao-Ting Chen, Hen-Hsen Huang

Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence
Demo Track. Pages 8627-8630. https://doi.org/10.24963/ijcai.2024/995

This study presents an innovative approach to the creation of information graphics, where the accuracy of content and aesthetic appeal are of paramount importance. Traditional methods often struggle to balance these two aspects, particularly in complex visualizations like phylogenetic trees. Our methodology integrates the strengths of Large Language Models (LLMs), Vision Language Models (VLMs), and advanced text-to-image models to address this challenge. Initially, an LLM plans the layout and structure, employing Mermaid—a JavaScript-based tool that uses Markdown-like scripts for diagramming—to establish a precise and structured foundation. This structured script is crucial for ensuring data accuracy in the graphical representation. Following this, text-to-image models are employed to enhance the vector graphic generated by Mermaid, adding rich visual elements and enhancing overall aesthetic appeal. The integration of text-to-image models is a key innovation, enabling the creation of graphics that are not only informative but also visually captivating. Finally, a VLM performs quality control, ensuring that the visual enhancements align with the informational accuracy. This comprehensive approach effectively combines the accuracy of structured data representation, the creative potential of text-to-image models, and the validation capabilities of VLMs. The result is a new standard in information graphic creation, suitable for diverse applications ranging from education to scientific communication, where both information integrity and visual engagement are essential.
Keywords:
Multidisciplinary Topics and Applications: MDA: Arts and creativity
Computer Vision: CV: Vision and language 
Humans and AI: HAI: Intelligent user interfaces