Discriminative Feature Decoupling Enhancement for Speech Forgery Detection

Discriminative Feature Decoupling Enhancement for Speech Forgery Detection

Yijun Bei, Xing Zhou, Erteng Liu, Yang Gao, Sen Lin, Kewei Gao, Zunlei Feng

Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence
Main Track. Pages 340-348. https://doi.org/10.24963/ijcai.2024/38

The emergence of AIGC has brought attention to the issue of generating realistic deceptive content. While AIGC has the potential to revolutionize content creation, it also facilitates criminal activities. Specifically, the manipulation of speech has been exploited in tele-fraud and financial fraud schemes, posing a significant threat to societal security. Current deep learning-based methods for detecting forged speech extract mixed features from the original speech, which often contain redundant information. Moreover, these methods fail to consider the distinct characteristics of human voice-specific features and the diversity of background environmental sounds. This paper introduces a framework called Discriminative fEature dEcoupling enhanceMent (DEEM) for detecting speech forgery. Initially, the framework decouples the original speech into human voice features and background sound features. Subsequently, DEEM enhances voice-specific features through temporal dimension aggregation and improves continuity-related features in the background sound map via spectral-dimension aggregation. By employing the decoupling enhancement features, extensive experiments demonstrate that DEEM achieves an accuracy improvement of over 5% on FoR dataset compared to the state-of-the-art methods.
Keywords:
AI Ethics, Trust, Fairness: ETF: Safety and robustness
AI Ethics, Trust, Fairness: ETF: AI and law, governance, regulation
AI Ethics, Trust, Fairness: ETF: Fairness and diversity
AI Ethics, Trust, Fairness: ETF: Societal impact of AI