SemanticMask: A Contrastive View Design for Anomaly Detection in Tabular Data

Shuting Tao, Tongtian Zhu, Hongwei Wang, Xiangming Meng

Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence
Main Track. Pages 2370-2378. https://doi.org/10.24963/ijcai.2024/262

Contrastive learning based on data augmentation techniques has recently achieved substantial advances in learning representations well suited to anomaly detection in the image domain. However, because tabular data lacks spatial structure, designing effective data augmentation methods for it remains challenging. Conventional techniques, such as random masking, disregard inter-feature correlations and fail to represent the data accurately. To address this issue, we propose a novel augmentation technique called SemanticMask, which leverages the semantic information in column names to generate better augmented views. SemanticMask aims to ensure that the information shared between views is sufficient for anomaly detection without being redundant. We analyze the relationship between shared information and anomaly detection performance and empirically demonstrate that good views for tabular anomaly detection tasks are feature-dependent. Our experimental results validate the superiority of SemanticMask over state-of-the-art anomaly detection methods and existing augmentation techniques for tabular data. In further evaluations on the multi-class novelty detection task, SemanticMask also significantly outperforms the baseline.
Keywords:
Data Mining: DM: Anomaly/outlier detection
Machine Learning: ML: Unsupervised learning
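
Illustrative sketch (not from the paper): the abstract contrasts random masking with masking guided by column names, but it does not say how the column-name semantics are turned into masks. The Python sketch below is therefore only a rough illustration under stated assumptions. The helper names (`random_mask`, `group_columns_by_prefix`, `grouped_mask`), the prefix-based grouping rule, and the example column names are all hypothetical stand-ins, and the grouping rule in particular is a crude placeholder for whatever semantic grouping the authors actually use.

```python
import numpy as np


def random_mask(x, mask_ratio, rng):
    """Conventional baseline: zero out a random subset of columns,
    ignoring what the columns mean."""
    n_cols = x.shape[1]
    n_masked = int(round(mask_ratio * n_cols))
    masked = rng.choice(n_cols, size=n_masked, replace=False)
    view = x.copy()
    view[:, masked] = 0.0
    return view, np.sort(masked)


def group_columns_by_prefix(column_names):
    """ASSUMPTION: stand-in for semantic grouping. Columns whose names
    share the text before the first underscore land in the same group."""
    groups = {}
    for idx, name in enumerate(column_names):
        groups.setdefault(name.split("_")[0], []).append(idx)
    return groups


def grouped_mask(x, groups, mask_ratio, rng):
    """Mask the same fraction of columns inside every group, so that no
    group of related columns is wiped out entirely while mask_ratio < 1."""
    masked = []
    for cols in groups.values():
        n_masked = int(mask_ratio * len(cols))  # floor keeps at least one column
        if n_masked > 0:
            chosen = rng.choice(cols, size=n_masked, replace=False)
            masked.extend(int(c) for c in chosen)
    masked = np.array(sorted(masked), dtype=int)
    view = x.copy()
    view[:, masked] = 0.0
    return view, masked


# Hypothetical column names, chosen only to make the grouping visible.
columns = ["blood_pressure", "blood_glucose", "heart_rate", "heart_rhythm", "age"]
rng = np.random.default_rng(0)
x = np.ones((4, len(columns)))

groups = group_columns_by_prefix(columns)
view_a, masked_a = grouped_mask(x, groups, mask_ratio=0.5, rng=rng)
view_b, masked_b = grouped_mask(x, groups, mask_ratio=0.5, rng=rng)
baseline_view, baseline_masked = random_mask(x, mask_ratio=0.4, rng=rng)
```

In a contrastive setup, `view_a` and `view_b` would serve as two augmented views of the same rows. With the grouped variant, each view retains at least one column from every group, whereas `random_mask` can by chance remove all columns of a related group.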