Prompt Learning with Extended Kalman Filter for Pre-trained Language Models

Quan Li, Xike Xie, Chao Wang, S. Kevin Zhou

Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence
Main Track. Pages 4452-4460. https://doi.org/10.24963/ijcai.2024/492

Prompt learning has gained popularity as a means to leverage the knowledge embedded in pre-trained language models (PLMs) for NLP tasks while using a limited number of trainable parameters. While it has shown promise in tasks like sentiment classification and natural language inference, generating prompts suited to PLMs, rather than relying on human-crafted prompts, remains a challenge. In this paper, we introduce an abstraction of the prompt learning process using an extended Kalman filter. Our approach, called Conditional Extended Kalman Filter based on Neural Networks (CEKFNN), infers more appropriate prompt tokens by enhancing the classic extended Kalman filter with a PLM's contextual representation power. Specifically, CEKFNN learns transition and emission functions from PLM embeddings of input sentences to infer latent prompt tokens. We refine CEKFNN with an alternate-training approach, retraining the PLM's emission function, as well as the initial and transition functions, with prompt tokens inferred by the prompt models (PMs); the PLM's output labels in turn assist in training the PMs. When updating the PLM, we use an adapter approach with few trainable parameters, leaving the PLM's original parameters frozen. We evaluate CEKFNN across open-source PLMs, demonstrating performance improvements over state-of-the-art methods while using a limited number of trainable parameters. The results show that CEKFNN performs on par with or better than fine-tuning, which requires updating all parameters in the PLM.
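
The sketch below illustrates the general idea of an extended Kalman filter step with neural transition and emission functions, as described in the abstract. It is a minimal, illustrative example only: the module names, dimensions, and noise covariances are hypothetical and are not taken from the paper or its released code.

```python
# Illustrative sketch: one extended-Kalman-filter step over a latent prompt-token
# embedding, using learned (neural) transition and emission functions and a PLM
# sentence embedding as the observation. All names and sizes are hypothetical.
import torch
import torch.nn as nn
from torch.autograd.functional import jacobian

STATE_DIM, OBS_DIM = 64, 768  # assumed latent-prompt and PLM-embedding sizes


class TransitionNet(nn.Module):
    """Learned transition f(z_t) -> z_{t+1} over latent prompt tokens."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(STATE_DIM, 128), nn.Tanh(),
                                 nn.Linear(128, STATE_DIM))

    def forward(self, z):
        return self.net(z)


class EmissionNet(nn.Module):
    """Learned emission h(z_t) -> expected PLM embedding of the input sentence."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(STATE_DIM, 256), nn.Tanh(),
                                 nn.Linear(256, OBS_DIM))

    def forward(self, z):
        return self.net(z)


def ekf_step(f, h, z, P, obs, Q, R):
    """One EKF predict/update with neural f and h.

    z: state mean (STATE_DIM,), P: state covariance,
    obs: observed PLM embedding (OBS_DIM,), Q/R: process/observation noise.
    """
    # Predict: propagate the mean through f and linearize via the Jacobian F.
    F = jacobian(f, z)                       # (STATE_DIM, STATE_DIM)
    z_pred = f(z)
    P_pred = F @ P @ F.T + Q
    # Update: linearize h, compute the Kalman gain, and correct the state.
    H = jacobian(h, z_pred)                  # (OBS_DIM, STATE_DIM)
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ torch.linalg.solve(S, torch.eye(OBS_DIM))
    z_new = z_pred + K @ (obs - h(z_pred))
    P_new = (torch.eye(STATE_DIM) - K @ H) @ P_pred
    return z_new, P_new


if __name__ == "__main__":
    f, h = TransitionNet(), EmissionNet()
    z, P = torch.zeros(STATE_DIM), torch.eye(STATE_DIM)
    Q, R = 0.01 * torch.eye(STATE_DIM), 0.1 * torch.eye(OBS_DIM)
    sentence_embedding = torch.randn(OBS_DIM)  # stand-in for a frozen PLM's output
    z, P = ekf_step(f, h, z, P, sentence_embedding, Q, R)
    print(z.shape, P.shape)
```

In an alternate-training scheme of the kind the abstract describes, the transition and emission networks would be optimized against signals from the (frozen, adapter-augmented) PLM, while the inferred latent prompt tokens are fed back to the PLM; the filtering step above only shows the inference side of that loop.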
Keywords:
Machine Learning: ML: Deep learning architectures
Machine Learning: ML: Bayesian learning
Natural Language Processing: NLP: Language models
Planning and Scheduling: PS: Markov decision processes