Low-Resource NER by Data Augmentation With Prompting
Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence
Main Track. Pages 4252-4258.
https://doi.org/10.24963/ijcai.2022/590
Named entity recognition (NER) is a fundamental information extraction task that seeks to identify entity mentions of certain types in text.
Despite numerous advances, existing NER methods rely on extensive supervision for model training and therefore struggle in low-resource scenarios with limited training data.
In this paper, we propose a new data augmentation method for low-resource NER, by eliciting knowledge from BERT with prompting strategies.
Specifically, we devise a label-conditioned word replacement strategy that produces more label-consistent examples by capturing the underlying word-label dependencies, and a prompting-with-question-answering method that generates new training data from unlabeled texts.
The experimental results confirm the effectiveness of our approach.
In particular, in a low-resource scenario with only 150 training sentences, our approach outperforms previous methods without data augmentation by over 40% in F1 and the prior best data augmentation methods by over 2.0% in F1. Furthermore, our approach also applies to a zero-shot scenario, yielding promising results without using any human-labeled data for the task.
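To make the idea of label-conditioned word replacement concrete, the following is a minimal toy sketch and not the method of the paper: it swaps in a simple lookup table of words observed under each label where the paper elicits replacements from BERT with prompts. The function names (`build_label_vocab`, `augment`), the flat `PER`/`LOC`/`O` label scheme, and the `rate` parameter are all assumptions made for illustration.

```python
import random

def build_label_vocab(sentences):
    """Collect, for each label, the set of words seen with that label.

    Toy stand-in for a language model: the paper queries BERT, whereas
    this sketch only reuses words from the labeled training sentences.
    """
    vocab = {}
    for tokens, labels in sentences:
        for tok, lab in zip(tokens, labels):
            vocab.setdefault(lab, set()).add(tok)
    return vocab

def augment(tokens, labels, vocab, rng, rate=0.5):
    """Replace entity words with other words that carry the same label.

    Non-entity words (label "O") are left untouched, and the label
    sequence is copied unchanged, so the new example stays label-consistent.
    """
    new_tokens = []
    for tok, lab in zip(tokens, labels):
        # Candidate replacements: same label, different surface form.
        candidates = sorted(vocab.get(lab, set()) - {tok})
        if lab != "O" and candidates and rng.random() < rate:
            new_tokens.append(rng.choice(candidates))
        else:
            new_tokens.append(tok)
    return new_tokens, list(labels)

# Hypothetical two-sentence training set with flat entity-type labels.
train = [
    (["Alice", "visited", "Paris"], ["PER", "O", "LOC"]),
    (["Bob", "left", "Berlin"], ["PER", "O", "LOC"]),
]
vocab = build_label_vocab(train)
new_tokens, new_labels = augment(*train[0], vocab, random.Random(0), rate=1.0)
print(new_tokens, new_labels)
```

Because each replacement is drawn only from words that share the original word's label, the augmented sentence can reuse the original label sequence without re-annotation. A model-based variant would condition the replacement on both the label and the sentence context rather than on the label alone.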
Keywords:
Natural Language Processing: Information Extraction
Natural Language Processing: Knowledge Extraction