X-former Elucidator: Reviving Efficient Attention for Long Context Language Modeling

Xupeng Miao, Shenhan Zhu, Fangcheng Fu, Ziyu Guo, Zhi Yang, Yaofeng Tu, Zhihao Jia, Bin Cui

Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence
Survey Track. Pages 8179-8187. https://doi.org/10.24963/ijcai.2024/904

Transformer-based LLMs are becoming increasingly important in various AI applications. Alongside the success of LLMs, however, the explosive demand for long-context handling capabilities has become a key and timely problem for both academia and industry. Due to the quadratic complexity of the attention mechanism, long-context scenarios require far more resources for LLM development and deployment, posing huge challenges to the underlying AI infrastructure. Meanwhile, we observe a trend of reviving earlier efficient attention mechanisms in the latest LLMs. However, how to select among these diverse approaches in practice remains an open question. In this paper, we answer this question from several aspects. First, we revisit these latest long-context LLM innovations and discuss their relationship to prior approaches through a novel and comprehensive taxonomy. Next, we conduct a thorough evaluation across various types of workloads, considering both efficiency and effectiveness. Finally, we provide an in-depth analysis, summarize our key findings, and offer insightful suggestions on the trade-offs of designing and deploying efficient attention mechanisms for Transformer-based LLMs.
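
For context on the quadratic complexity the abstract refers to, below is a minimal, illustrative sketch (not taken from the paper) of standard scaled dot-product attention in PyTorch. The n x n score matrix QK^T is where the O(n^2) cost in sequence length n arises, which the efficient attention variants surveyed here aim to avoid or approximate.

```python
# Illustrative sketch (assumption: PyTorch available), not the paper's method.
import math
import torch

def full_attention(q, k, v):
    # q, k, v: (batch, n, d)
    d = q.shape[-1]
    # (batch, n, n) score matrix: O(n^2) compute and memory in sequence length n
    scores = q @ k.transpose(-2, -1) / math.sqrt(d)
    weights = torch.softmax(scores, dim=-1)
    return weights @ v  # (batch, n, d)

q = k = v = torch.randn(1, 4096, 64)
out = full_attention(q, k, v)  # materializes a 4096 x 4096 score matrix
```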
Keywords:
Natural Language Processing: General
Machine Learning: ML: Attention models
Multidisciplinary Topics and Applications: MTA: Computational sustainability
Natural Language Processing: NLP: Language models