Portfolio item number 1
Short description of portfolio item number 1
Short description of portfolio item number 2
Published in IEEE Signal Process. Lett., 2020
Time difference of arrival (TDOA) estimation, which often serves as the fundamental step for a source localization or beamforming system, is of significant practical importance in a wide spectrum of applications. To deal with reverberation, the TDOA estimation problem is often transformed into one of identifying the relative acoustic impulse responses. This letter presents a method to efficiently identify the relative acoustic impulse response between two microphones for TDOA estimation based on the so-called Kronecker product decomposition. By decomposing the relative impulse response into a series of Kronecker products of shorter filters, the original channel identification problem with a long impulse response is converted into one of identifying a number of short filters. Since the TDOA information is embedded only in the direct path of the relative impulse response, the dimension of the Kronecker product decomposition can be very small and, as a result, the developed algorithm is expected to work well in real environments with a small number of data snapshots.
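As a rough illustration of why such a decomposition reduces the number of unknowns, the sketch below builds a long filter as a sum of Kronecker products of short filters and compares the parameter counts. It is not the letter's identification algorithm: the filter values, the lengths `L1` and `L2`, the number of terms `P`, and the helper names `kron` and `kron_sum` are all hypothetical choices made for illustration.

```python
def kron(a, b):
    """Kronecker product of two 1-D sequences: entry i*len(b)+j is a[i]*b[j]."""
    return [x * y for x in a for y in b]


def kron_sum(pairs):
    """Build a long filter as a sum of Kronecker products of short filter pairs."""
    first_a, first_b = pairs[0]
    h = [0.0] * (len(first_a) * len(first_b))
    for a, b in pairs:
        for n, value in enumerate(kron(a, b)):
            h[n] += value
    return h


# Hypothetical short filters; the values are illustrative only.
L1, L2, P = 8, 8, 2
pairs = [
    ([0.5 ** i for i in range(L1)], [1.0 if j == 2 else 0.0 for j in range(L2)]),
    ([(-0.3) ** i for i in range(L1)], [0.1 * (j + 1) for j in range(L2)]),
]

h = kron_sum(pairs)
long_params = len(h)          # coefficients in the long filter
short_params = P * (L1 + L2)  # coefficients in the short filters
print(long_params, short_params)
```

With these assumed sizes, the long filter has `L1 * L2` coefficients while the decomposition carries only `P * (L1 + L2)`, and the gap widens as the filter gets longer and `P` stays small.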
Published in EUSIPCO, 2022
Spectral estimation is of significant practical importance in a wide range of applications. This paper proposes a minimum variance distortionless response (MVDR) method for spectral estimation based on the Kronecker product. Taking advantage of the particular structure of the Fourier vector, we decompose it as a Kronecker product of two shorter vectors. Then, we design the spectral estimation filters under the same structure, i.e., as a Kronecker product of two filters. Consequently, the conventional MVDR spectrum problem is transformed to one of estimating two filters of much shorter lengths. Since it has far fewer parameters to estimate, the proposed method is able to achieve better performance than its conventional counterpart, particularly when the number of available signal samples is small. Also presented in this paper is the generalization to the estimation of the cross-spectrum and coherence function.
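The structural property this abstract relies on can be checked numerically. The sketch below is a minimal illustration, assuming a Fourier vector with entries exp(jωn) (the paper may use the opposite sign convention) and hypothetical lengths `L1` and `L2`; the helper names are invented for this example. It verifies that the length-`L1*L2` Fourier vector equals a Kronecker product of two shorter vectors, and that the response of a Kronecker-structured filter factorizes into the product of two short-filter responses.

```python
import cmath


def fourier_vector(omega, length, step=1):
    """Vector with entries exp(j * omega * step * n), n = 0..length-1."""
    return [cmath.exp(1j * omega * step * n) for n in range(length)]


def kron(a, b):
    """Kronecker product of two 1-D sequences: entry i*len(b)+j is a[i]*b[j]."""
    return [x * y for x in a for y in b]


def inner(h, d):
    """Inner product h^H d (conjugate applied to the first argument)."""
    return sum(x.conjugate() * y for x, y in zip(h, d))


omega = 0.7    # assumed angular frequency, for illustration only
L1, L2 = 4, 5  # assumed short-vector lengths

d_full = fourier_vector(omega, L1 * L2)
d1 = fourier_vector(omega, L1, step=L2)  # coarse phase steps of L2 samples
d2 = fourier_vector(omega, L2)           # fine phase steps of one sample
max_err = max(abs(x - y) for x, y in zip(d_full, kron(d1, d2)))

# A filter with the same Kronecker structure has a response that factorizes.
h1 = [complex(1.0 / L1)] * L1
h2 = [complex(1.0 / L2)] * L2
response = inner(kron(h1, h2), d_full)
factored = inner(h1, d1) * inner(h2, d2)
print(max_err, abs(response - factored))
```

Because the response factorizes, a distortionless constraint on the long filter can be met by constraining each short filter separately, which is what makes estimating two short filters possible.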
Published in IEEE IWAENC, 2022
The acoustic vector sensor (AVS), as a compact sensor with the capability of forming a frequency-invariant spatial beampattern over the 3D space, has potential in source separation. A straightforward way to achieve source separation with an AVS is through adaptive beamforming. Such a method requires the direction-of-arrival (DOA) information, which is challenging to estimate accurately in reverberant environments. To circumvent this issue, we present a framework jointly implementing adaptive beamforming and independent vector analysis (IVA). Different from conventional beamforming, the presented method only requires a rough DOA estimate for initialization. It iteratively refines the estimates of source DOA and signal statistics. The proposed method improves both the source separation performance and the DOA estimation accuracy. Simulations demonstrate the properties of the developed method.
Published in IEEE ICASSP, 2023
Spatial information can help improve source separation performance. Numerous spatially informed source extraction methods based on independent vector analysis (IVA) have been developed, which can achieve reasonably good performance in non- or weakly reverberant environments. However, the performance of those methods degrades quickly as the reverberation increases. The underlying reason is that those methods are derived based on the multiplicative transfer function model with a rank-1 assumption, which does not hold true if reverberation is strong. To circumvent this issue, this paper proposes to use the convolutive transfer function (CTF) model to improve the source extraction performance and develops a spatially informed IVA algorithm. Simulations demonstrate the efficacy of the developed method even in highly reverberant environments.
Published in IEEE ICASSP, 2023
This paper studies the problem of target speaker signal extraction and antiphasic rendering with an array of microphones in scenarios where there are two active speakers. Based on important findings in the psychoacoustic field as well as our recent work on single-channel speech enhancement, we present a rendering-based approach in which a temporal convolutional network (TCN) is trained to take the multiple signals observed by the microphone array as its inputs and generate two output (binaural) signals. The TCN is trained in such a way that, when the binaural output signals are listened to through headsets, the speech signal from the desired speaker is perceived on one side of and close to the listener’s head, while the competing speech signal is perceived on the opposite side and also away from the listener’s head. Benefiting from rendering and the signal-to-interference ratio (SIR) improvement, this antiphasic binaural presentation enables the listener to better focus on the target speaker’s signal while ignoring the impact of the competing speech. Modified rhyme tests (MRTs) are performed to validate the superiority of the proposed method.
Published in EUSIPCO, 2023
Source extraction, which aims at extracting the target source signals from the observed reverberant mixtures, plays an important role in voice communication and human-machine interfaces. Among the numerous source extraction methods that have been developed, the geometrically constrained (GC) one, which incorporates the direction-of-arrival (DOA) information of the target signals, has demonstrated great potential. However, this method generally suffers from significant performance degradation in strongly reverberant environments since it is challenging to obtain in such environments the accurate DOA estimates that are needed by the algorithm. To address this problem, we present in this work an iterative algorithm, which integrates the source-wise weighted prediction error (WPE)-based dereverberation principle with the geometrically constrained source extraction method. We show that this algorithm is able to improve the DOA estimation accuracy as well as the source extraction performance.
Published in APSIPA ASC, 2023
In order to improve both the separation performance and the convergence speed, several geometrically constrained independent vector analysis (GC-IVA) algorithms have been developed. Those algorithms are based on the multiplicative transfer function model, which assumes that the analysis window length is longer than the effective part of the room impulse responses. However, this assumption often does not hold in reverberant environments, particularly if the reverberation is strong, which makes the algorithms suffer from significant performance degradation. To circumvent this issue, an algorithm was developed that jointly optimizes the weighted prediction error (WPE) dereverberation method and GC-IVA (GC-WPE-IVA). While it has demonstrated promising performance, this joint optimization method involves matrix inversion, so it is computationally very expensive. This work attempts to improve the efficiency and stability of GC-WPE-IVA. We develop an iterative source steering (ISS) updating algorithm in the framework of GC-WPE-IVA. The experimental results show that the developed method is computationally much more efficient, yet it can achieve separation performance comparable to that of GC-WPE-IVA in reverberant environments.
Published in IEEE ICASSP, 2024
While semi-blind source separation-based acoustic echo cancellation (SBSS-AEC) has received much research attention due to its promising performance during double-talk compared to the traditional adaptive algorithms, it suffers from system latency and nonlinear distortions. To circumvent these drawbacks, the recently developed ideas on convolutive transfer function (CTF) approximation and nonlinear expansion have been used in the iterative projection (IP)-based semi-blind source separation (SBSS) algorithm. However, because of the introduction of CTF approximation and nonlinear expansion, this algorithm becomes computationally very expensive, which makes it difficult to implement in embedded systems. Thus, we attempt in this paper to improve this IP-based algorithm, thereby developing an element-wise iterative source steering (EISS) algorithm. In comparison with the IP-based SBSS algorithm, the proposed algorithm is computationally much more efficient, especially when the nonlinear expansion order is high and the CTF filter is long. Meanwhile, its AEC performance is as good as that of IP-based SBSS.
Published in IEEE ICASSP, 2024
Stereophonic music source separation (MSS) is the problem of extracting individual source tracks, e.g., bass, drums, vocals, from a stereo music recording. Deep neural network (DNN)-based MSS systems have demonstrated great promise, though spatial panning cues and time-frequency spectral structures in stereo music have not yet been fully explored in such systems and methods. This paper presents a spatially informed MSS method using a bridging band-split neural network that incorporates both spatial and spectral information. The spatial panning angles of each target source are used as input to the network, along with the time-frequency spectrograms. Moreover, the inter-track correlations are exploited for further performance improvement. Experiments show that the proposed method significantly outperforms the baseline systems as a result of using spatial cues, spectral characteristics, and inter-track relationships.
Published in IEEE ICASSP HSCMA, 2024
Audio classification, which serves as a fundamental step for acoustic signal processing, has attracted a lot of research interest, and numerous audio classification neural networks have been proposed. In these networks, down-sampling blocks, which compress audio features, are essential due to limited computational capacity. However, compressing the signal will inevitably cause the loss of relevant information. To mitigate this issue, a large number of parameters are used. In this paper, we present a novel down-sampling method called gated multi mini-patch extractor (GMME), in which multiple convolutive layers are used to extract relevant information at different levels, including time frames, pseudo-frequency bins, and global features. A gate mechanism is adopted to retain the correlation with the original features. Several simulations demonstrate that, compared to the baseline, our method can achieve comparable or slightly better performance with a significant reduction in the number of parameters.
Published in IEEE/ACM Trans. on Audio, Speech, and Lang. Process., 2024
Acoustic echo cancellation (AEC) is a crucial task in full-duplex communications. As conventional linear filtering approaches are ineffective in dealing with double-talk, various semi-blind source separation (SBSS)-based AEC algorithms have been devised, most of which are formulated and implemented in the frequency domain based on the multiplicative transfer function (MTF) model for computational efficiency. To avoid large latency and to deal with loudspeaker nonlinearities, the convolutive transfer function (CTF) model and odd power series expansion are leveraged, which are employed by numerous SBSS-based nonlinear AEC (SBSS-NAEC) algorithms. Conventional SBSS-NAEC methods estimate the series expansion coefficients and the CTF filter simultaneously, making the number of free parameters to estimate large. Hence, the corresponding algorithms are computationally expensive and difficult to optimize. In this work, we propose to decouple the series expansion coefficients and the CTF filters into a bilinear form and present a bilinear alternating optimization framework for estimating the model parameters. An alternating iterative projection (AIP) algorithm and an alternating element-wise iterative source steering (AEISS) algorithm are proposed. As the bilinear representation consists of fewer parameters than the conventional methods, the proposed algorithms not only improve the AEC performance but also reduce the computational complexity, which is validated by comprehensive simulations and experiments.
Published in EUSIPCO, 2024
Blind-audio-source-separation (BASS) techniques, particularly those with low latency, play an important role in a wide range of real-time systems, e.g., hearing aids, in-car hands-free voice communication, real-time human-machine interaction, etc. Most existing BASS algorithms are designed to run in batch mode, and therefore large latency is unavoidable. Recently, some online algorithms were developed, which achieve separation on a frame-by-frame basis in the short-time-Fourier-transform (STFT) domain, and the latency is significantly reduced as compared to those batch methods. However, the latency with these algorithms may still be too long for many real-time systems to bear. To further reduce latency while achieving good separation performance, we propose in this work to integrate a weighted prediction error (WPE) module into a non-causal sample-truncating-based independent vector analysis (NST-IVA). The resulting algorithm can maintain the same algorithmic delay as NST-IVA if the delay with WPE is appropriately controlled while achieving significantly better performance, which is validated by simulations.
Published in IWAENC, 2024
Acoustic echo cancellation (AEC), interference suppression, and noise reduction play important roles in full-duplex communication. However, conventional systems that cascade adaptive filters and beamformers often experience a degradation in performance during double-talk situations. To tackle this issue, this paper presents a multichannel semi-blind source separation (SBSS) method that combines the element-wise iterative source steering (EISS) AEC algorithm with a geometrically constrained independent vector analysis source extraction algorithm for full-duplex communications. Simulation results confirm the effectiveness of the proposed method.
Published in IWAENC, 2024
Nonlinear acoustic echo cancellation (NAEC) is of significant importance in acoustic telecommunication. To improve NAEC performance in the double-talk case, semi-blind source separation-based NAEC (SBSS-NAEC) algorithms have been proposed. However, to deal with reverberation and loudspeaker nonlinearities, convolutive transfer function (CTF) models and power series expansions are employed, which significantly increase the number of free parameters and consequently lead to slow convergence and, hence, limited performance. In this paper, we introduce the data-reuse strategy, well known in the adaptive filter literature, into an SBSS-NAEC framework and propose two algorithms: data-reuse iterative projection (DR-IP) and data-reuse element-wise iterative source steering (DR-EISS). Several simulations demonstrate the superiority of the proposed methods, especially the tracking capability when the impulse response changes.
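For readers unfamiliar with the data-reuse strategy, the sketch below shows the idea in its classical adaptive-filter setting: a normalized LMS (NLMS) filter that applies several updates to the same data frame before moving on. This is only a generic illustration, not the DR-IP or DR-EISS algorithm; the function names, the three-tap system `w_true`, the step size, and the noise-free synthetic signals are all assumptions made for the example.

```python
import random


def data_reuse_nlms(x, d, num_taps, reuse=1, mu=0.1, eps=1e-12):
    """NLMS system identification with `reuse` updates per input frame."""
    w = [0.0] * num_taps
    for n in range(num_taps - 1, len(x)):
        u = [x[n - k] for k in range(num_taps)]  # most recent samples first
        norm = sum(v * v for v in u) + eps       # regularized frame energy
        for _ in range(reuse):                   # reuse the same frame
            e = d[n] - sum(wk * uk for wk, uk in zip(w, u))
            w = [wk + mu * e * uk / norm for wk, uk in zip(w, u)]
    return w


def misalignment(w, w_true):
    """Squared distance between the estimated and true filters."""
    return sum((a - b) ** 2 for a, b in zip(w, w_true))


random.seed(0)
w_true = [0.8, -0.4, 0.2]  # hypothetical system to identify
x = [random.gauss(0.0, 1.0) for _ in range(60)]
d = [sum(w_true[k] * x[n - k] for k in range(3) if n - k >= 0) for n in range(60)]

w_plain = data_reuse_nlms(x, d, num_taps=3, reuse=1)
w_reuse = data_reuse_nlms(x, d, num_taps=3, reuse=4)
print(misalignment(w_plain, w_true), misalignment(w_reuse, w_true))
```

In this noise-free toy setting, reusing each frame extracts more from a short record at the cost of extra updates per frame, which is the trade-off the strategy exploits when convergence is slow.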
Published:
This is a description of your talk, which is a markdown file that can be all markdown-ified like any other post. Yay markdown!
Published:
This is a description of your conference proceedings talk; note the different field in type. You can put anything in this field.
Undergraduate course, University 1, Department, 2014
This is a description of a teaching experience. You can use markdown like any other post.
Workshop, University 1, Department, 2015
This is a description of a teaching experience. You can use markdown like any other post.