ParsNets: A Parsimonious Composition of Orthogonal and Low-Rank Linear Networks for Zero-Shot Learning

Jingcai Guo, Qihua Zhou, Xiaocheng Lu, Ruibin Li, Ziming Liu, Jie Zhang, Bo Han, Junyang Chen, Xin Xie, Song Guo

Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence
Main Track. Pages 4062-4070. https://doi.org/10.24963/ijcai.2024/449

This paper presents a novel, parsimonious yet efficient design for zero-shot learning (ZSL), dubbed ParsNets, in which we are interested in learning a composition of on-device-friendly linear networks, each with orthogonality and low-rankness properties, to achieve performance equivalent to or better than that of deep models. Concretely, we first refactor the core module of ZSL, i.e., the visual-semantics mapping function, into several base linear networks that correspond to diverse components of the semantic space, wherein the complex nonlinearity can be collapsed into simple local linearities. Then, to facilitate the generalization of the local linearities, we construct a maximal margin geometry on the learned features by enforcing low-rank constraints on intra-class samples and high-rank constraints on inter-class samples, resulting in orthogonal subspaces for different classes. To enhance the model's adaptability and counterbalance over- and under-fitting, a set of sample-wise indicators is employed to select a sparse subset of these base linear networks to form a composite semantic predictor for each sample. Notably, the maximal margin geometry guarantees feature diversity, while the local linearities guarantee efficiency. Thus, our ParsNets generalizes better to unseen classes and can be deployed flexibly on resource-constrained devices.
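The sketch below is a minimal, hypothetical reading of the abstract, not the authors' implementation: a set of base linear visual-to-semantic maps, a sample-wise indicator that keeps only a sparse top-k subset of them per sample, and a nuclear-norm surrogate for the low-rank (intra-class) and high-rank (inter-class) constraints. All names (ParsNetsSketch, rank_regularizer, num_bases, topk) and the exact gating and regularization forms are illustrative assumptions.

```python
# Illustrative sketch only; the authors' formulation may differ in detail.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ParsNetsSketch(nn.Module):
    """Composition of base linear visual-to-semantic maps with a sparse,
    sample-wise selection of bases (a rough reading of the abstract)."""

    def __init__(self, visual_dim, semantic_dim, num_bases=8, topk=2):
        super().__init__()
        # Each base network is a plain linear map; switching among a few of
        # them per sample stands in for a single nonlinear mapping.
        self.bases = nn.ModuleList(
            [nn.Linear(visual_dim, semantic_dim, bias=False)
             for _ in range(num_bases)]
        )
        # Sample-wise indicator: scores how relevant each base is to a sample.
        self.indicator = nn.Linear(visual_dim, num_bases)
        self.topk = topk

    def forward(self, x):
        scores = self.indicator(x)                              # (B, num_bases)
        top_vals, top_idx = scores.topk(self.topk, dim=-1)      # sparse subset
        weights = F.softmax(top_vals, dim=-1)                   # (B, topk)
        outs = torch.stack([b(x) for b in self.bases], dim=1)   # (B, num_bases, S)
        idx = top_idx.unsqueeze(-1).expand(-1, -1, outs.size(-1))
        picked = torch.gather(outs, 1, idx)                     # (B, topk, S)
        return (weights.unsqueeze(-1) * picked).sum(dim=1)      # composite prediction


def rank_regularizer(features, labels):
    """Nuclear norm as a convex surrogate for rank: penalize the rank of each
    intra-class feature matrix and reward the rank of the whole batch, pushing
    different classes toward (near-)orthogonal subspaces."""
    intra = features.new_zeros(())
    for c in labels.unique():
        intra = intra + torch.linalg.matrix_norm(features[labels == c], ord='nuc')
    inter = torch.linalg.matrix_norm(features, ord='nuc')
    return intra - inter
```

In this sketch, topk controls how parsimonious the per-sample composition is, and the regularizer would be added to whatever ZSL compatibility loss is used for the visual-semantics mapping.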
Keywords:
Machine Learning: ML: Cost-sensitive learning
Machine Learning: ML: Ensemble methods
Machine Learning: ML: Few-shot learning
Machine Learning: ML: Learning sparse models