With a Little Help from Language: Semantic Enhanced Visual Prototype Framework for Few-Shot Learning

Hecheng Cai, Yang Liu, Shudong Huang, Jiancheng Lv

Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence
Main Track. Pages 3751-3759. https://doi.org/10.24963/ijcai.2024/415

Few-shot learning (FSL) aims to recognize new categories given limited training samples. The core challenge is to avoid overfitting to the scarce data while ensuring good generalization to novel classes. One mainstream approach employs prototypes from a visual feature extractor as classifier weights, so performance depends on the quality of the prototypes. Because different categories may share similar visual features, purely visual prototypes are limited: existing methods learn only a simple visual feature extractor during the pre-training stage and neglect the importance of a well-structured feature space for the prototypes. We introduce the Semantic Enhanced Visual Prototype framework (SEVpro) to address this issue. SEVpro refines prototype learning from the pre-training stage onward and serves as a versatile plug-and-play framework for all prototype-based FSL methods. Specifically, we enhance prototype discriminability by transforming semantic embeddings into the visual space, which helps separate categories with similar visual features. For novel class learning, we leverage knowledge from the base classes and incorporate semantic information to further elevate prototype quality. Extensive experiments on FSL benchmarks and ablation studies demonstrate the superiority of the proposed SEVpro.
Keywords:
Machine Learning: ML: Few-shot learning
Machine Learning: ML: Multi-modal learning
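
To make the idea in the abstract concrete, the following is a minimal sketch (not the authors' released code) of a semantic-enhanced prototype classifier: visual prototypes are computed as class means of support features, semantic embeddings are projected into the visual space by a learned linear map, the two are fused, and query features are classified by similarity to the fused prototypes. The class name SemanticEnhancedPrototypes, the fusion weight alpha, and the use of cosine similarity are illustrative assumptions, not details taken from the paper.

    # Illustrative sketch only; hyperparameters and fusion scheme are assumptions.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class SemanticEnhancedPrototypes(nn.Module):
        def __init__(self, sem_dim: int, vis_dim: int, alpha: float = 0.5):
            super().__init__()
            # Learned mapping from the semantic space (e.g., word embeddings)
            # into the visual feature space.
            self.sem_to_vis = nn.Linear(sem_dim, vis_dim)
            self.alpha = alpha  # fusion weight between visual and semantic prototypes

        def forward(self, support_feats, support_labels, sem_embeds, query_feats):
            # support_feats: (N_support, vis_dim) visual features of support samples
            # support_labels: (N_support,) integer class labels in [0, n_way)
            # sem_embeds: (n_way, sem_dim) one semantic embedding per class
            # query_feats: (N_query, vis_dim) visual features of query samples
            n_way = sem_embeds.size(0)

            # Visual prototype: mean of support features per class.
            vis_protos = torch.stack(
                [support_feats[support_labels == c].mean(dim=0) for c in range(n_way)]
            )

            # Semantic prototype: semantic embedding projected into the visual space.
            sem_protos = self.sem_to_vis(sem_embeds)

            # Fused prototype used as the classifier weight.
            protos = self.alpha * vis_protos + (1.0 - self.alpha) * sem_protos

            # Classify queries by cosine similarity to the fused prototypes.
            logits = F.cosine_similarity(
                query_feats.unsqueeze(1), protos.unsqueeze(0), dim=-1
            )
            return logits  # (N_query, n_way)

    if __name__ == "__main__":
        # Toy 5-way 1-shot episode with random features, just to show the shapes.
        model = SemanticEnhancedPrototypes(sem_dim=300, vis_dim=640)
        support = torch.randn(5, 640)
        labels = torch.arange(5)
        semantics = torch.randn(5, 300)
        queries = torch.randn(15, 640)
        print(model(support, labels, semantics, queries).shape)  # torch.Size([15, 5])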