TaD: A Plug-and-Play Task-Aware Decoding Method to Better Adapt LLMs on Downstream Tasks
Xinhao Xu, Hui Chen, Zijia Lin, Jungong Han, Lixing Gong, Guoxin Wang, Yongjun Bao, Guiguang Ding
Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence
Main Track. Pages 6587-6596.
https://doi.org/10.24963/ijcai.2024/728
Fine-tuning pre-trained models on downstream tasks is a common practice for leveraging large language models (LLMs) today. A critical issue is how to better adapt pre-trained models to downstream tasks and thereby enhance their performance. This paper introduces Task-aware Decoding (TaD), a plug-and-play method that exploits the difference between the probability distributions before and after fine-tuning to boost the performance of LLMs on downstream tasks. TaD posits that the difference between the pre-finetuning probability distribution and the post-finetuning one represents the direction from common knowledge towards task-specific downstream knowledge. Aligning the final output probability distribution with that direction is thus likely to yield downstream-task performance superior to that of the original fine-tuned model. Experiments on various datasets across four task categories demonstrate TaD's effectiveness on different LLMs, i.e., GPT, BLOOM, and LLaMA, with different fine-tuning methods. Moreover, further experiments reveal that TaD provides larger performance gains in data-scarce scenarios.
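The abstract describes contrasting the pre-finetuning and post-finetuning output distributions and shifting the final distribution along that direction. Below is a minimal sketch of one way such a decoding-time adjustment could look; the log-space combination rule and the scaling weight `alpha` are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F


def task_aware_logits(logits_pre: torch.Tensor,
                      logits_ft: torch.Tensor,
                      alpha: float = 1.0) -> torch.Tensor:
    """Shift the fine-tuned next-token distribution further along the
    direction from the pre-finetuning distribution to the post-finetuning
    one. `alpha` is a hypothetical scaling hyperparameter for this sketch.
    """
    # Log-probabilities of the same prefix under both models.
    log_p_pre = F.log_softmax(logits_pre, dim=-1)
    log_p_ft = F.log_softmax(logits_ft, dim=-1)

    # Direction from common knowledge (pre-finetuning) towards
    # task-specific knowledge (post-finetuning), in log space.
    direction = log_p_ft - log_p_pre

    # Align the output with that direction: amplify tokens whose
    # probability grew during fine-tuning, suppress those that shrank.
    return log_p_ft + alpha * direction


# Usage at each decoding step (assuming two forward passes on the same prefix):
#   adjusted = task_aware_logits(pretrained_model(prefix), finetuned_model(prefix))
#   next_token = torch.argmax(adjusted, dim=-1)   # or sample from softmax(adjusted)
```

Because the adjustment only re-weights logits at decoding time, a scheme like this stays plug-and-play: it requires no change to training and can wrap any fine-tuning method that leaves the pre-trained checkpoint available for a second forward pass.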
Keywords:
Natural Language Processing: NLP: Language generation
Natural Language Processing: NLP: Applications
Natural Language Processing: NLP: Language models