NanoAdapt: Mitigating Negative Transfer in Test Time Adaptation with Extremely Small Batch Sizes

Shiji Zhao, Shao-Yuan Li, Sheng-Jun Huang

Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence
Main Track. Pages 5572-5580. https://doi.org/10.24963/ijcai.2024/616

Test Time Adaptation (TTA) has garnered significant attention in recent years, with research focused on addressing distribution shifts at test time. As a fundamental component of many TTA methods, the Batch Normalization (BN) layer plays a crucial role in enabling model adaptability. However, existing BN strategies can prove detrimental when the batch size is extremely small. In numerous real-world scenarios, limited hardware resources or just-in-time demands necessitate adapting models with very small batch sizes, making existing methods less practical. In this paper, we first showcase and thoroughly analyze the negative transfer phenomenon that previous TTA methods exhibit under extremely small batch sizes. We then propose a novel batch size-agnostic method, NanoAdapt, that effectively mitigates negative transfer even with batch size 1. NanoAdapt comprises three key components: a dynamic BN calibration strategy that leverages historical information and a Taylor-series expansion to refine statistics estimation; an entropy-weighted gradient accumulation strategy that uses the entropy of each sample's label prediction to weight and accumulate losses for backpropagation; and a novel proxy computation graph that captures interactions among samples. Extensive experiments validate the superiority of NanoAdapt, showing that it consistently improves existing TTA methods.
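To illustrate the kind of BN calibration the abstract refers to: when only one test sample is available, its batch statistics are too noisy to use directly, so historical (source) statistics can be blended in. The sketch below is a simplified exponential-moving-average calibration, not the paper's actual Taylor-series refinement; the function name and the `momentum` parameter are illustrative assumptions.

```python
import numpy as np

def calibrate_bn_stats(source_mean, source_var, x, momentum=0.1):
    """Blend stored source BN statistics with statistics from a tiny test
    batch (even a single sample). Simplified sketch: NanoAdapt's actual
    calibration uses historical information and a Taylor-series expansion,
    which is more elaborate than this moving average."""
    batch_mean = x.mean(axis=0)   # per-channel mean of the test batch
    batch_var = x.var(axis=0)     # per-channel variance (0 for one sample)
    new_mean = (1 - momentum) * source_mean + momentum * batch_mean
    new_var = (1 - momentum) * source_var + momentum * batch_var
    return new_mean, new_var

# With batch size 1, the batch variance collapses to zero, so the blended
# statistics lean heavily on the source values -- the failure mode naive
# per-batch BN cannot avoid.
m, v = calibrate_bn_stats(np.zeros(3), np.ones(3), np.ones((1, 3)), momentum=0.1)
```

A small `momentum` keeps the estimates anchored to the source statistics, which is what makes such schemes usable at extreme batch sizes, at the cost of adapting more slowly to the shifted test distribution.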
Keywords:
Machine Learning: ML: Multi-task and transfer learning
Computer Vision: CV: Transfer, low-shot, semi- and un-supervised learning