Innovative Directional Encoding in Speech Processing: Leveraging Spherical Harmonics Injection for Multi-Channel Speech Enhancement

Jiahui Pan, Pengjie Shen, Hui Zhang, Xueliang Zhang

Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence
Main Track. Pages 6451-6459. https://doi.org/10.24963/ijcai.2024/713

Multi-channel speech enhancement uses multiple microphones to extract a target speech signal from background noise. Making effective use of directional cues is key to robust enhancement. Although deep learning shows promise for multi-channel speech processing, most methods operate directly on short-time Fourier transform (STFT) coefficients. We propose using spherical harmonics transform (SHT) coefficients, which concisely represent spatial distributions, as auxiliary model inputs. Because the SHT converts signals from varying numbers of microphones into coefficients of a consistent dimension, the proposed technique enables a single model to generalize across microphone arrays with different configurations, rather than requiring a specialized model for each array layout. We present two architectures with SHT-based auxiliary inputs: parallel and serial. The parallel model contains two encoders, one for the STFT and one for the SHT; the decoder fuses both encoders' outputs to estimate the enhanced STFT, which lets it incorporate spatial context. In the serial approach, we first apply the SHT to the signals and then take the STFT of the transformed signals as network inputs. Evaluations on the TIMIT dataset under fluctuating noise and reverberation demonstrate that our model outperforms established benchmarks. Remarkably, these results are attained with fewer computations and parameters. Furthermore, experiments on the MS-SNSD dataset show that the proposed method can improve the generalization ability of networks. The source code is publicly accessible at https://github.com/Pandade1997/SH_injection.
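The claim that the SHT maps any number of microphones to a fixed number of coefficients can be illustrated with a minimal sketch. It is not taken from the paper or its repository: it assumes real spherical harmonics truncated at order 1 (four coefficients), a least-squares fit per snapshot, and two hypothetical array layouts (tetrahedral and octahedral). All function names are illustrative. In a real pipeline the per-microphone values would presumably come from each time-frequency bin.

```python
import math

def real_sh_order1(azimuth, polar):
    """Real spherical harmonics up to order 1 for one direction (radians)."""
    c0 = 0.5 / math.sqrt(math.pi)
    c1 = math.sqrt(3.0 / (4.0 * math.pi))
    return [
        c0,                                        # Y_0^0
        c1 * math.sin(polar) * math.sin(azimuth),  # Y_1^{-1}
        c1 * math.cos(polar),                      # Y_1^0
        c1 * math.sin(polar) * math.cos(azimuth),  # Y_1^1
    ]

def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x

def sht_least_squares(values, directions):
    """Fit order-1 SH coefficients to one value per microphone."""
    basis = [real_sh_order1(az, pol) for az, pol in directions]
    k = len(basis[0])
    # Normal equations: (Y^T Y) c = Y^T x
    gram = [[sum(row[i] * row[j] for row in basis) for j in range(k)]
            for i in range(k)]
    rhs = [sum(row[i] * v for row, v in zip(basis, values)) for i in range(k)]
    return solve_linear(gram, rhs)

def to_angles(x, y, z):
    """Cartesian direction -> (azimuth, polar angle) in radians."""
    r = math.sqrt(x * x + y * y + z * z)
    return math.atan2(y, x), math.acos(z / r)

def sample_field(coeffs, direction):
    """Evaluate an order-1 spatial field at one microphone direction."""
    return sum(c * y for c, y in zip(coeffs, real_sh_order1(*direction)))

# Two hypothetical arrays with different microphone counts.
array_4 = [to_angles(*p) for p in
           [(1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1)]]
array_6 = [to_angles(*p) for p in
           [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]]

# The same (made-up) spatial field observed by both arrays.
true_coeffs = [1.0, 0.5, -0.25, 2.0]
coeffs_4 = sht_least_squares([sample_field(true_coeffs, d) for d in array_4], array_4)
coeffs_6 = sht_least_squares([sample_field(true_coeffs, d) for d in array_6], array_6)
```

Both arrays yield a four-element coefficient vector regardless of microphone count, which is the property that lets one network accept inputs from different layouts. A higher truncation order would give more coefficients but requires more microphones for a well-posed fit.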
Keywords:
Natural Language Processing: NLP: Information extraction
Machine Learning: ML: Applications
Machine Learning: ML: Representation learning
Machine Learning: ML: Trustworthy machine learning