Publications

You can also find my articles on my Google Scholar profile.

Geometrically Constrained Source Extraction and Dereverberation Based on Joint Optimization

Published in EUSIPCO, 2024

Blind-audio-source-separation (BASS) techniques, particularly those with low latency, play an important role in a wide range of real-time systems, e.g., hearing aids, in-car hands-free voice communication, and real-time human-machine interaction. Most existing BASS algorithms are designed to run in batch mode, and therefore large latency is unavoidable. Recently, some online algorithms were developed, which achieve separation on a frame-by-frame basis in the short-time-Fourier-transform (STFT) domain, so the latency is significantly reduced compared with batch methods. However, the latency of these algorithms may still be too long for many real-time systems to bear. To further reduce latency while achieving good separation performance, we propose in this work to integrate a weighted prediction error (WPE) module into a non-causal sample-truncating-based independent vector analysis (NST-IVA). The resulting algorithm can maintain the same algorithmic delay as NST-IVA if the delay of WPE is appropriately controlled, while achieving significantly better performance, which is validated by simulations.

Download Paper

On Semi-Blind Source Separation-Based Approaches to Nonlinear Echo Cancellation Based on Bilinear Alternating Optimization

Published in IEEE/ACM Trans. on Audio, Speech, and Lang. Process., 2024

Acoustic echo cancellation (AEC) is a crucial task in full-duplex communications. As conventional linear filtering approaches are ineffective in dealing with double-talk, various semi-blind source separation (SBSS)-based AEC algorithms have been devised, most of which are formulated and implemented in the frequency domain based on the multiplicative transfer function (MTF) model for computational efficiency. To avoid large latency and to deal with loudspeaker nonlinearities, the convolutive transfer function (CTF) model and odd power series expansion are leveraged, which are employed by numerous SBSS-based nonlinear AEC (SBSS-NAEC) algorithms. Conventional SBSS-NAEC methods estimate the series expansion coefficients and the CTF filter simultaneously, making the number of free parameters to estimate large. Hence, the corresponding algorithms are computationally expensive and difficult to optimize. In this work, we propose to decouple the series expansion coefficients and the CTF filters into a bilinear form and present a bilinear alternating optimization framework for estimating the model parameters. An alternating iterative projection (AIP) algorithm and an alternating element-wise iterative source steering (AEISS) algorithm are proposed. As the bilinear representation consists of fewer parameters than the conventional methods, the proposed algorithms not only improve the AEC performance but also reduce the computational complexity, which is validated by comprehensive simulations and experiments.
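As a rough illustration of why a bilinear form needs fewer parameters, the sketch below compares a joint model (one filter per odd-power term) with a decoupled one (one coefficient vector plus one shared filter). It is a time-domain toy under assumed sizes, not the STFT-domain CTF formulation or the AIP/AEISS algorithms from the paper; the names `odd_power_basis`, `a`, `h`, `P`, and `L` are hypothetical.

```python
import numpy as np

def odd_power_basis(x, order):
    """Stack the odd powers x, x^3, ..., x^(2*order-1) as rows."""
    return np.stack([x ** (2 * p + 1) for p in range(order)])

rng = np.random.default_rng(0)
P, L = 3, 8                        # assumed expansion order and filter length
x = rng.standard_normal(64)        # toy far-end (loudspeaker) signal
a = np.array([1.0, -0.2, 0.05])    # assumed series expansion coefficients
h = rng.standard_normal(L)         # assumed echo-path filter

basis = odd_power_basis(x, P)      # shape (P, 64)

# Decoupled (bilinear) model: apply the nonlinearity, then one shared filter.
echo_bilinear = np.convolve(a @ basis, h)

# Joint model: one length-L filter per power term; here the filters are the
# rows of the rank-1 matrix outer(a, h), so both models give the same echo.
W = np.outer(a, h)
echo_joint = sum(np.convolve(basis[p], W[p]) for p in range(P))

same_echo = np.allclose(echo_bilinear, echo_joint)
n_joint, n_bilinear = P * L, P + L   # free parameters in each model
print(same_echo, n_joint, n_bilinear)
```

In this toy setting the joint model carries P·L free parameters while the decoupled one carries P + L, and the gap widens as the expansion order or filter length grows.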

Download Paper

Light Gated Multi Mini-patch Extractor For Audio Classification

Published in IEEE ICASSP HSCMA, 2024

Audio classification, which serves as a fundamental step for acoustic signal processing, has attracted a lot of research interest, and numerous audio classification neural networks have been proposed. In these networks, down-sampling blocks, which compress audio features, are essential due to limited computational capacity. However, compressing the signal will inevitably cause the loss of relevant information. To mitigate this issue, a large number of parameters are used. In this paper, we present a novel down-sampling method called gated multi mini-patch extractor (GMME), in which multiple convolutive layers are used to extract relevant information at different levels, including time frames, pseudo-frequency bins, and global features. A gating mechanism is adopted to retain the correlation with the original features. Several simulations demonstrate that, compared to the baseline, our method can achieve comparable or slightly better performance with a significant reduction in the number of parameters.

Download Paper

Stereophonic Music Source Separation with Spatially-Informed Bridging Band-Split Network

Published in IEEE ICASSP, 2024

Stereophonic music source separation (MSS) is the problem of extracting individual source tracks, e.g., bass, drums, and vocals, from a stereo music recording. Deep neural network (DNN) based MSS systems have demonstrated great promise, though spatial panning cues and time-frequency spectral structures in stereo music have not yet been fully explored in such systems. This paper presents a spatially-informed MSS method using a bridging band-split neural network that incorporates both spatial and spectral information. The spatial panning angles of each target source are used as input to the network, along with the time-frequency spectrograms. Moreover, the inter-track correlations are exploited for further performance improvement. Experiments show that the proposed method significantly outperforms the baseline systems as a result of using spatial cues, spectral characteristics, and inter-track relationships.

Download Paper

A Computationally Efficient Semi-blind Source Separation Approach for Nonlinear Echo Cancellation Based on an Element-wise Iterative Source Steering

Published in IEEE ICASSP, 2024

While semi-blind source separation-based acoustic echo cancellation (SBSS-AEC) has received much research attention due to its promising performance during double-talk compared to the traditional adaptive algorithms, it suffers from system latency and nonlinear distortions. To circumvent these drawbacks, the recently developed ideas on convolutive transfer function (CTF) approximation and nonlinear expansion have been used in the iterative projection (IP)-based semi-blind source separation (SBSS) algorithm. However, because of the introduction of CTF approximation and nonlinear expansion, this algorithm becomes computationally very expensive, which makes it difficult to implement in embedded systems. Thus, we attempt in this paper to improve this IP-based algorithm, thereby developing an element-wise iterative source steering (EISS) algorithm. In comparison with the IP-based SBSS algorithm, the proposed algorithm is computationally much more efficient, especially when the nonlinear expansion order is high and the CTF filter is long. Meanwhile, its AEC performance is as good as that of IP-based SBSS.

Download Paper

On Joint Dereverberation and Source Separation with Geometrical Constraints and Iterative Source Steering

Published in APSIPA ASC, 2023

In order to improve both the separation performance and the convergence speed, several geometrically constrained independent vector analysis (GC-IVA) algorithms have been developed. Those algorithms are based on the multiplicative transfer function model, which assumes that the analysis window length is longer than the effective part of the room impulse responses. However, this assumption often does not hold in reverberant environments, particularly if the reverberation is strong, which makes the algorithms suffer from significant performance degradation. To circumvent this issue, an algorithm was developed, which jointly optimizes the weighted prediction error (WPE) dereverberation method and GC-IVA (GC-WPE-IVA). While it has demonstrated promising performance, this joint optimization method involves matrix inversion, so it is computationally very expensive. This work attempts to improve the efficiency and stability of GC-WPE-IVA. We develop an iterative source steering (ISS) updating algorithm in the framework of GC-WPE-IVA. The experimental results show that the developed method is computationally much more efficient, yet it can achieve separation performance comparable to that of GC-WPE-IVA in reverberant environments.

Download Paper

Geometrically Constrained Source Extraction and Dereverberation Based on Joint Optimization

Published in EUSIPCO, 2023

Source extraction, which aims at extracting the target source signals from the observed reverberant mixtures, plays an important role in voice communication and human-machine interfaces. Among the numerous source extraction methods that have been developed, the geometrically constrained (GC) one, which incorporates the direction-of-arrival (DOA) information of the target signals, has demonstrated great potential. However, this method generally suffers from significant performance degradation in strongly reverberant environments, since it is challenging to obtain in such environments the accurate DOA estimates that the algorithm needs. To address this problem, we present in this work an iterative algorithm, which integrates the source-wise weighted prediction error (WPE)-based dereverberation principle with the geometrically constrained source extraction method. We show that this algorithm is able to improve the DOA estimation accuracy as well as the source extraction performance.

Download Paper

On Multiple-Input/Binaural-Output Antiphasic Speaker Signal Extraction

Published in IEEE ICASSP, 2023

This paper studies the problem of target speaker signal extraction and antiphasic rendering with an array of microphones in scenarios where there are two active speakers. Based on important findings in the psychoacoustic field as well as our recent works on single-channel speech enhancement, we present a rendering-based approach in which a temporal convolutional network (TCN) is trained to take the multiple signals observed by the microphone array as its inputs and generate two output (binaural) signals. The TCN is trained in such a way that, when the binaural output signals are listened to over headsets, the speech signal from the desired speaker is perceived on one side of and close to the listener’s head, while the competing speech signal is perceived on the opposite side and also away from the listener’s head. Benefiting from rendering and the signal-to-interference ratio (SIR) improvement, this antiphasic binaural presentation enables the listener to better focus on the target speaker’s signal while ignoring the impact of the competing speech. Modified rhyme tests (MRTs) are performed to validate the superiority of the proposed method.

Download Paper

Spatially Informed Independent Vector Analysis for Source Extraction Based on the Convolutive Transfer Function Model

Published in IEEE ICASSP, 2023

Spatial information can help improve source separation performance. Numerous spatially informed source extraction methods based on independent vector analysis (IVA) have been developed, which can achieve reasonably good performance in non- or weakly reverberant environments. However, the performance of those methods degrades quickly as the reverberation increases. The underlying reason is that those methods are derived based on the multiplicative transfer function model with a rank-1 assumption, which does not hold true if reverberation is strong. To circumvent this issue, this paper proposes to use the convolutive transfer function (CTF) model to improve the source extraction performance and develops a spatially informed IVA algorithm. Simulations demonstrate the efficacy of the developed method even in highly reverberant environments.

Download Paper

Independent Vector Analysis Assisted Adaptive Beamforming for Speech Source Separation with an Acoustic Vector Sensor

Published in IEEE IWAENC, 2022

Acoustic vector sensor (AVS), as a compact sensor with the capability of forming a frequency-invariant spatial beampattern over the 3D space, has potential in source separation. A straightforward way to achieve source separation with AVS is through adaptive beamforming. Such a method requires the direction-of-arrival (DOA) information, which is challenging to estimate accurately in reverberant environments. To circumvent this issue, we present a framework jointly implementing adaptive beamforming and independent vector analysis (IVA). Different from conventional beamforming, the presented method only requires a rough DOA estimate for initialization. It iteratively refines the estimates of source DOA and signal statistics. The proposed method has the advantages of improving source separation performance and enhancing DOA estimation accuracy. Simulations demonstrate the properties of the developed method.

Download Paper

A Minimum Variance Distortionless Response Spectral Estimator with Kronecker Product Filters

Published in EUSIPCO, 2022

Spectral estimation is of significant practical importance in a wide range of applications. This paper proposes a minimum variance distortionless response (MVDR) method for spectral estimation based on the Kronecker product. Taking advantage of the particular structure of the Fourier vector, we decompose it as a Kronecker product of two shorter vectors. Then, we design the spectral estimation filters under the same structure, i.e., as a Kronecker product of two filters. Consequently, the conventional MVDR spectrum problem is transformed to one of estimating two filters of much shorter lengths. Since it has far fewer parameters to estimate, the proposed method is able to achieve better performance than its conventional counterpart, particularly when the number of available signal samples is small. Also presented in this paper is the generalization to the estimation of the cross-spectrum and coherence function.
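The Kronecker structure of the Fourier vector mentioned above can be checked numerically. The following is a minimal sketch, assuming a vector length that factors as N1 × N2; it illustrates only the decomposition of the Fourier vector, not the MVDR filter design from the paper, and the helper `fourier_vector` and the chosen sizes are hypothetical.

```python
import numpy as np

def fourier_vector(omega, length, stride=1):
    """Return [1, e^{j*omega*stride}, ..., e^{j*omega*stride*(length-1)}]."""
    return np.exp(1j * omega * stride * np.arange(length))

# Assumed sizes: a length-12 Fourier vector factored as 3 x 4.
N1, N2 = 3, 4
omega = 2 * np.pi * 0.1

f_full = fourier_vector(omega, N1 * N2)      # full-length Fourier vector
f1 = fourier_vector(omega, N1, stride=N2)    # coarse steps of N2 samples
f2 = fourier_vector(omega, N2)               # fine steps of one sample

# Entry (i*N2 + k) of kron(f1, f2) is e^{j*omega*(i*N2 + k)}, which is
# exactly the corresponding entry of the full-length vector.
kron_matches = np.allclose(f_full, np.kron(f1, f2))
print(kron_matches)
```

With this factorization, a filter constrained to the same structure has N1 + N2 coefficients rather than N1 · N2.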

Download Paper

Time Difference of Arrival Estimation Based on a Kronecker Product Decomposition

Published in IEEE Signal Process. Lett., 2020

Time difference of arrival (TDOA) estimation, which often serves as the fundamental step for a source localization or a beamforming system, is of significant practical importance in a wide spectrum of applications. To deal with reverberation, the TDOA estimation problem is often transformed into one of identifying the relative acoustic impulse responses. This letter presents a method to efficiently identify the relative acoustic impulse response between two microphones for TDOA estimation based on the so-called Kronecker product decomposition. By decomposing the relative impulse response into a series of Kronecker products of shorter filters, the original channel identification problem with a long impulse response is converted into one of identifying a number of short filters. Since the TDOA information is embedded only in the direct path of the relative impulse response, the dimension of the Kronecker product decomposition can be very small and, as a result, the developed algorithm is expected to work well in real environments with a small number of data snapshots.

Download Paper