Learning Label Dependencies for Visual Information Extraction

Minghong Yao, Liansheng Zhuang, Houqiang Li, Jiuchang Wei

Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence
Main Track. Pages 6615-6623. https://doi.org/10.24963/ijcai.2024/731

Visual Information Extraction (VIE), which aims to extract structured information from visually rich document images, has drawn much attention due to its wide applications in document understanding. However, previous methods often treat VIE as a sequence labeling problem and ignore the correlations among labels in the sequence, which may significantly degrade their performance. To address this issue, this paper proposes a novel framework that exploits label correlations to improve the performance of VIE models. Its key idea is to learn the label dependencies among entities and use them to regularize the predicted label sequence. Specifically, to capture these dependencies, a label transformer is pre-trained to assign a higher likelihood to label sequences that respect the label patterns of document layouts. At test time, an inference transformer predicts the label sequence by considering not only the features of each entity but also the likelihood of the label sequence as evaluated by the label transformer. The framework can be combined with existing popular VIE models such as LayoutLM and GeoLayoutLM. Extensive experiments on public datasets demonstrate the effectiveness of the framework.
Keywords:
Natural Language Processing: NLP: Applications
Natural Language Processing: NLP: Information extraction
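
The scoring idea in the abstract, combining per-entity evidence with the likelihood of the whole label sequence, can be illustrated with a minimal sketch. This is not the authors' implementation: a hand-written bigram table stands in for the pre-trained label transformer, exhaustive search stands in for the inference transformer, and the label set, probabilities, and the names `sequence_log_prior`, `decode`, and `weight` are all hypothetical.

```python
import itertools
import math

# Hypothetical label set, chosen only for illustration.
LABELS = ["header", "question", "answer"]


def sequence_log_prior(seq, bigram_logp, default=math.log(1e-3)):
    """Stand-in for the label transformer: log-likelihood of a label
    sequence under a bigram table (assumption, not the paper's model)."""
    return sum(bigram_logp.get((a, b), default) for a, b in zip(seq, seq[1:]))


def decode(entity_logp, bigram_logp, weight=1.0):
    """Return the label sequence that maximizes the sum of per-entity
    log-scores plus `weight` times the sequence-level log-prior."""
    best_seq, best_score = None, -math.inf
    # Exhaustive search: fine for a toy example, exponential in general.
    for seq in itertools.product(LABELS, repeat=len(entity_logp)):
        local = sum(scores[label] for scores, label in zip(entity_logp, seq))
        total = local + weight * sequence_log_prior(seq, bigram_logp)
        if total > best_score:
            best_seq, best_score = seq, total
    return list(best_seq)


# Made-up per-entity log-probabilities for two entities.
entity_logp = [
    {"header": math.log(0.1), "question": math.log(0.8), "answer": math.log(0.1)},
    {"header": math.log(0.1), "question": math.log(0.5), "answer": math.log(0.4)},
]

# Made-up label-transition log-probabilities: a question is usually
# followed by an answer, rarely by another question.
bigram_logp = {
    ("question", "answer"): math.log(0.8),
    ("question", "question"): math.log(0.1),
    ("header", "question"): math.log(0.8),
}

independent = decode(entity_logp, bigram_logp, weight=0.0)
regularized = decode(entity_logp, bigram_logp, weight=1.0)
```

With `weight=0.0` each entity is labeled independently and the second entity takes its locally preferred label; with `weight=1.0` the sequence prior overrides that weak local preference in favor of the more plausible question-then-answer pattern.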