Beamforming node performs sound source separation based on the following methods:
DS : Delay-and-Sum beamforming
WDS : Weighted Delay-and-Sum beamforming
NULL : NULL beamforming
ILSE : Iterative Least Squares with Enumeration
LCMV : Linearly Constrained Minimum Variance beamforming
GJ : Griffiths-Jim Beamforming
MSNR : Maximum Signal-to-Noise Ratio
GSS : Geometrically constrained Source Separation
GICA : Geometrically constrained Independent Component Analysis
GHDSS : Geometrically constrained Higher-order Decorrelation-based Source Separation
Node inputs are:
Multi-channel complex spectrum of the input acoustic signal,
Direction of localized sound sources,
Direction or multi-channel complex spectrum of the noise.
Node outputs are a set of complex spectra, one for each separated sound.
Corresponding parameter name | Description |
TF_CONJ_FILENAME | Transfer function of microphone array |
INITW_FILENAME | Initial value of separation matrix |
When to use
Given a sound source direction, the node separates a sound source originating from the direction with a microphone array. As a sound source direction, either a value estimated by sound source localization or a constant value may be used.
Typical connection
Figure 6.40 shows a connection example of the Beamforming node. The node has three inputs:
INPUT_FRAMES takes a multi-channel complex spectrum containing the mixture of sounds, for example from MultiFFT,
INPUT_SOURCES takes the results of sound source localization, for example from LocalizeMUSIC or ConstantLocalization,
INPUT_NOISE_SOURCES takes the results of sound source localization for noise sources, for example from LocalizeMUSIC or ConstantLocalization.
The output is the separated signals.
Parameter name | Type | Default value | Unit | Description |
LENGTH | int | 512 | [pt] | Analysis frame length. |
ADVANCE | int | 160 | [pt] | Shift length of frame. |
SAMPLING_RATE | int | 16000 | [Hz] | Sampling frequency. |
SPEED_OF_SOUND | float | 343.0 | [m/s] | Sound speed. |
TF_CONJ_FILENAME | string | | | File name of the transfer function database of your microphone array. This is valid for any BF_METHOD. |
INITW_FILENAME | string | | | Name of the file that describes the initial value of the separation matrix. Use the file exported by EXPORT_W. |
SS_METHOD | string | ADAPTIVE | | Stepsize calculation method based on blind source separation. This is valid only when BF_METHOD=GSS,GICA,GHDSS. Select FIX, LC_MYU or ADAPTIVE. FIX uses the fixed stepsize specified by SS_MYU. LC_MYU sets SS_MYU=LC_MYU. ADAPTIVE tunes the stepsize adaptively. |
SS_MYU | float | 1.0 | | Stepsize for updating a separation matrix based on blind source separation. This is valid only when BF_METHOD=GSS,GICA,GHDSS. If SS_METHOD=FIX, SS_MYU is the fixed stepsize. If SS_METHOD=LC_MYU, this parameter is ignored. If SS_METHOD=ADAPTIVE, the adaptively determined stepsize is multiplied by SS_MYU. |
LC_METHOD | string | ADAPTIVE | | Stepsize calculation method based on geometric constraints. This is valid only when BF_METHOD=LCMV,GJ,GSS,GICA,GHDSS. Select FIX or ADAPTIVE. FIX uses the fixed stepsize specified by LC_MYU. ADAPTIVE tunes the stepsize adaptively. |
LC_MYU | float | 1.0 | | Stepsize for updating a separation matrix based on geometric constraints. This is valid only when BF_METHOD=LCMV,GJ,GSS,GICA,GHDSS. If LC_METHOD=FIX, LC_MYU is the fixed stepsize. If LC_METHOD=ADAPTIVE, the adaptively determined stepsize is multiplied by LC_MYU. |
EXPORT_W | bool | false | | Designates whether separation matrices are written to files. |
EXPORT_W_FILENAME | string | | | Name of the file to which the separation matrix is written. This is valid only when EXPORT_W=true. |
ALPHA | float | 0.99 | | Stepsize for updating correlation matrices when BF_METHOD=MSNR. |
NL_FUNC | string | TANH | | Function for computing higher-order correlation matrices. Currently, only TANH is supported. This is valid only when BF_METHOD=GICA,GHDSS. |
SS_SCAL | float | 1.0 | | Scale factor in the higher-order correlation matrix computation. This is valid only when BF_METHOD=GICA,GHDSS. |
BF_METHOD | string | LCMV | | Selection of the sound source separation method. |
ENABLE_DEBUG | bool | false | | Enables debug output. |
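As a rough illustration of what the default LENGTH, ADVANCE and SAMPLING_RATE values imply, the following sketch converts them to frame duration, frame shift and the number of frequency bins. The helper name `frame_timing` is hypothetical, and the bin count assumes a one-sided spectrum of a real-valued signal (LENGTH/2 + 1 bins), which is an assumption rather than something stated in this section.

```python
# Default parameter values taken from the parameter table.
LENGTH = 512           # analysis frame length [samples]
ADVANCE = 160          # frame shift [samples]
SAMPLING_RATE = 16000  # sampling frequency [Hz]


def frame_timing(length, advance, sampling_rate):
    """Return (frame duration [ms], frame shift [ms], number of bins)."""
    frame_ms = 1000.0 * length / sampling_rate
    shift_ms = 1000.0 * advance / sampling_rate
    # Assumption: a one-sided spectrum keeps length/2 + 1 frequency bins.
    num_bins = length // 2 + 1
    return frame_ms, shift_ms, num_bins


timing = frame_timing(LENGTH, ADVANCE, SAMPLING_RATE)
print(timing)  # (32.0, 10.0, 257)
```

With the defaults, each frame covers 32 ms of signal and consecutive frames start 10 ms apart.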
Input
INPUT_FRAMES : Matrix<complex<float> > type. Multi-channel complex spectra. Rows correspond to channels, i.e., complex spectra of the waveforms input from the microphones, and columns correspond to frequency bins.
INPUT_SOURCES : Vector<ObjectRef> type. A Vector array of Source type objects in which source localization results are stored. It is typically connected to the outputs of the SourceTracker node or the SourceIntervalExtender node.
INPUT_NOISE_SOURCES : Vector<ObjectRef> type. A Vector array of Source type objects in which noise source localization results are stored. The type is the same as INPUT_SOURCES.
Output
: Map<int, ObjectRef> type. Each entry is a pair of the sound source ID of a separated sound and the 1-channel complex spectrum of that separated sound (Vector<complex<float> > type).
Parameter
LENGTH : int type. Analysis frame length [samples], which must match the value used in the preceding nodes (e.g. AudioStreamFromMic or MultiFFT). The default is 512.
ADVANCE : int type. Shift length of a frame [samples], which must match the value used in the preceding nodes (e.g. AudioStreamFromMic or MultiFFT). The default is 160.
SAMPLING_RATE : int type. Sampling frequency of the input waveform [Hz]. The default is 16000.
SPEED_OF_SOUND : float type. Sound speed [m/s]. The default is 343.0 [m/s].
TF_CONJ_FILENAME : string type. The name of the file in which the transfer function database of your microphone array is saved. Refer to Section 5.1.2 for details of the file format. This is valid for every BF_METHOD.
INITW_FILENAME : string type. The name of the file in which the initial value of the separation matrix is described. Initializing with a separation matrix that has already converged in a preliminary computation allows precise separation from the first frames. The file must be prepared beforehand by setting EXPORT_W to true and saving the separation matrix under the file name given in EXPORT_W_FILENAME. For its format, see Section 5.1.3. This function is currently disabled.
SS_METHOD : string type. Selects the stepsize calculation method for blind source separation. This is valid only when BF_METHOD=GSS,GICA,GHDSS. For GSS, it determines the stepsize of DSS (Decorrelation-based Source Separation). For GICA, it determines the stepsize of ICA (Independent Component Analysis). For GHDSS, it determines the stepsize of HDSS (Higher-order Decorrelation-based Source Separation). Select one of FIX, LC_MYU, and ADAPTIVE. If FIX, the stepsize is the fixed value specified by SS_MYU. If LC_MYU, SS_MYU=LC_MYU. If ADAPTIVE, the stepsize is determined adaptively.
SS_MYU : float type. The stepsize used when updating a separation matrix based on blind source separation. The default value is 1.0. This is valid only when BF_METHOD=GSS,GICA,GHDSS. When SS_METHOD=FIX, SS_MYU is the stepsize itself. When SS_METHOD=LC_MYU, this parameter is ignored. When SS_METHOD=ADAPTIVE, the adaptive stepsize is multiplied by SS_MYU to give the final stepsize. Setting this value and LC_MYU to zero and passing a delay-and-sum beamformer separation matrix as INITW_FILENAME yields processing equivalent to delay-and-sum beamforming when BF_METHOD=GSS,GICA,GHDSS.
LC_METHOD : string type. Selects the stepsize calculation method for separation based on geometric constraints (GC). This is valid only when BF_METHOD=LCMV,GJ,GSS,GICA,GHDSS. Select FIX or ADAPTIVE. If FIX, the stepsize is the fixed value specified by LC_MYU. If ADAPTIVE, the stepsize is determined adaptively.
LC_MYU : float type. The stepsize used when updating a separation matrix based on geometric constraints. The default value is 1.0. This is valid only when BF_METHOD=LCMV,GJ,GSS,GICA,GHDSS. When LC_METHOD=FIX, LC_MYU is the stepsize itself. When LC_METHOD=ADAPTIVE, the adaptive stepsize is multiplied by LC_MYU to give the final stepsize. Setting this value and SS_MYU to zero and passing a delay-and-sum beamformer separation matrix as INITW_FILENAME yields processing equivalent to delay-and-sum beamforming when BF_METHOD=GSS,GICA,GHDSS.
EXPORT_W : bool type. The default value is false. Determines whether the separation matrix updated by Beamforming is written to a file. When true, specify EXPORT_W_FILENAME.
EXPORT_W_FILENAME : string type. This parameter is valid when EXPORT_W is set to true. Designates the name of the file to which the separation matrix is written. For its format, see Section 5.1.3.
ALPHA : float type. The stepsize for updating correlation matrices when BF_METHOD=MSNR. The default value is 0.99.
NL_FUNC : string type. The function for computing higher-order correlation matrices. Currently, only TANH (hyperbolic tangent) is supported. This is valid only when BF_METHOD=GICA,GHDSS.
SS_SCAL : float type. The default value is 1.0. Designates the scale factor of the hyperbolic tangent function (tanh) used to calculate the higher-order correlation matrix when BF_METHOD=GICA,GHDSS. The value must be a positive real number. The smaller the value, the weaker the non-linearity, which brings the calculation closer to an ordinary correlation matrix calculation.
BF_METHOD : string type. Designates the sound source separation method. Currently, this node supports the following separation methods:
DS : Delay-and-Sum beamforming [1]
WDS : Weighted Delay-and-Sum beamforming [1]
NULL : NULL beamforming [1]
ILSE : Iterative Least Squares with Enumeration [2]
LCMV : Linearly Constrained Minimum Variance beamforming [3]
GJ : Griffiths-Jim beamforming [4]
MSNR : Maximum Signal-to-Noise Ratio method [5]
GSS : Geometrically constrained Source Separation [6]
GICA : Geometrically constrained Independent Component Analysis [7]
GHDSS : Geometrically constrained Higher-order Decorrelation-based Source Separation [7]
ENABLE_DEBUG : bool type. The default value is false. If true, this node prints the status of separation to the standard output.
Technical details: The technical details of each separation method can be found in the references below.
Brief explanation of sound source separation:
Table 6.35 shows the notation of variables used in sound source separation problems. Since source separation is performed frame by frame in the frequency domain, all variables are complex-valued. The separation is performed independently for each of the $K$ frequency bins ($1 \leq k \leq K$), so we omit $k$ from the notation. Let $N$, $M$, and $f$ denote the number of sound sources, the number of microphones, and the frame index, respectively.
Variables | Description |
$\boldsymbol{S}(f) = \left[S_1(f), \dots, S_N(f)\right]^T$ | Complex spectrum of target sound sources at the $f$-th frame |
$\boldsymbol{X}(f) = \left[X_1(f), \dots, X_M(f)\right]^T$ | Complex spectrum of a microphone observation at the $f$-th frame, which corresponds to INPUT_FRAMES. |
$\boldsymbol{N}(f) = \left[N_1(f), \dots, N_M(f)\right]^T$ | Complex spectrum of added noise |
$\boldsymbol{H} = \left[\boldsymbol{H}_1, \dots, \boldsymbol{H}_N\right] \in \mathbb{C}^{M \times N}$ | Transfer function matrix from the $n$-th sound source ($1 \leq n \leq N$) to the $m$-th microphone ($1 \leq m \leq M$) |
$\boldsymbol{W}(f) = \left[\boldsymbol{W}_1, \dots, \boldsymbol{W}_M\right] \in \mathbb{C}^{N \times M}$ | Separation matrix at the $f$-th frame |
$\boldsymbol{Y}(f) = \left[Y_1(f), \dots, Y_N(f)\right]^T$ | Complex spectrum of separated signals |
We use the following linear model for the signal processing:
$$\boldsymbol{X}(f) = \boldsymbol{H}\boldsymbol{S}(f) + \boldsymbol{N}(f)~. \tag{28}$$
The purpose of the separation is to estimate $\boldsymbol {W}(f)$ based on the following equation:
$$\boldsymbol{Y}(f) = \boldsymbol{W}(f)\boldsymbol{X}(f) \tag{29}$$
so that $\boldsymbol{Y}(f)$ approaches $\boldsymbol{S}(f)$. After separation, the estimated $\boldsymbol{W}(f)$ can be saved by setting EXPORT_W=true and specifying a file name in EXPORT_W_FILENAME.
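The observation and separation models can be tried out on a toy example. The sketch below is only an illustration for a single frequency bin with two sources and two microphones; the helper `matvec`, the mixing matrix and the source values are hypothetical, the noise term is set to zero, and the separation matrix is simply the inverse of the known mixing matrix rather than anything the node estimates.

```python
def matvec(A, v):
    """Multiply a matrix (list of rows) by a vector of complex numbers."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


# Hypothetical 2-microphone, 2-source transfer function matrix H (M x N).
H = [[1.0 + 0.0j, 0.5 + 0.0j],
     [0.5 + 0.0j, 1.0 + 0.0j]]
# Hypothetical source spectra S at one frame and one frequency bin.
S = [1.0 + 1.0j, 2.0 - 1.0j]

# Noise-free observation, X = H S (the linear model with N = 0).
X = matvec(H, S)

# In this toy case the ideal separation matrix W (N x M) is the inverse of H.
det = H[0][0] * H[1][1] - H[0][1] * H[1][0]
W = [[H[1][1] / det, -H[0][1] / det],
     [-H[1][0] / det, H[0][0] / det]]

# Separation, Y = W X.
Y = matvec(W, X)

# The node output pairs each source ID with its separated spectrum;
# a dict keyed by a made-up ID mimics that shape for one bin.
outputs = {source_id: y for source_id, y in enumerate(Y)}
print(outputs)
```

In practice $\boldsymbol{H}$ is not known exactly and noise is present, which is why the methods selected by BF_METHOD estimate $\boldsymbol{W}(f)$ instead of inverting a matrix.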
TF_CONJ_FILENAME specifies the transfer function matrix $\boldsymbol{H}$, which is measured or calculated in advance. Hereinafter, we denote this pre-measured transfer function as $\hat{\boldsymbol{H}}$ to distinguish it from the true $\boldsymbol{H}$.
Separation by BF_METHOD=DS,WDS,NULL,ILSE: $\boldsymbol {W}(f)$ is directly determined using $\hat{\boldsymbol {H}}$ and corresponding directions of target and noise sources coming from INPUT_SOURCES and INPUT_NOISE_SOURCES.
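To make "directly determined" concrete, here is a minimal sketch of one common textbook form of delay-and-sum weights for a single target direction and a single frequency bin. It assumes a free-field steering vector built from per-microphone delays and normalizes the conjugated vector so that the target direction has unit gain. The function names and the delay values are hypothetical, and the node's actual normalization may differ.

```python
import cmath
import math


def steering_vector(delays_s, freq_hz):
    """Free-field transfer function for per-microphone delays (assumed model)."""
    return [cmath.exp(-2j * math.pi * freq_hz * d) for d in delays_s]


def ds_weights(h):
    """Delay-and-sum weights w = h^H / (h^H h), giving unit gain towards h."""
    norm = sum(abs(h_m) ** 2 for h_m in h)
    return [h_m.conjugate() / norm for h_m in h]


def response(w, h):
    """Beamformer output for a unit-amplitude source with transfer function h."""
    return sum(w_m * h_m for w_m, h_m in zip(w, h))


# Hypothetical delays [s] for a 3-microphone array and two directions.
h_target = steering_vector([0.0, 1e-4, 2e-4], 1000.0)
h_other = steering_vector([0.0, -1e-4, -2e-4], 1000.0)

w = ds_weights(h_target)
gain_target = abs(response(w, h_target))
gain_other = abs(response(w, h_other))
print(gain_target, gain_other)
```

The target direction passes with gain 1, while the other direction is attenuated because its phases no longer align after weighting.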
Separation by BF_METHOD=MSNR: The cost function $J_{\textrm{MSNR}}(\boldsymbol{W}(f))$ for updating the separation matrix is defined by the directions of the target and noise sources coming from INPUT_SOURCES and INPUT_NOISE_SOURCES. While sound sources exist in INPUT_SOURCES (target sound sources are present), the signal correlation matrix $\boldsymbol{R}_{ss}(f)$ used in MSNR is updated with the correlation matrix $\boldsymbol{R}_{xx}(f)$ of the observation:
$$\boldsymbol{R}_{ss}(f+1) = \alpha \boldsymbol{R}_{ss}(f) + (1-\alpha)\boldsymbol{R}_{xx}(f)~. \tag{30}$$
While sound sources exist in INPUT_NOISE_SOURCES (noise sources are present), the noise correlation matrix $\boldsymbol{R}_{nn}(f)$ is updated in the same way with the correlation matrix $\boldsymbol{R}_{xx}(f)$ of the observation:
$$\boldsymbol{R}_{nn}(f+1) = \alpha \boldsymbol{R}_{nn}(f) + (1-\alpha)\boldsymbol{R}_{xx}(f)~. \tag{31}$$
$\alpha$ in Eqs. (30) and (31) is specified by the ALPHA parameter. The sound sources are separated by updating $\boldsymbol{W}(f)$ from the updated $\boldsymbol{R}_{ss}(f)$ and $\boldsymbol{R}_{nn}(f)$.
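The two recursive updates can be sketched as follows for one frequency bin. This is an illustration only: it assumes that $\boldsymbol{R}_{xx}(f)$ is the instantaneous outer product $\boldsymbol{X}(f)\boldsymbol{X}(f)^H$, which this section does not state, and the function names and the boolean activity flags are hypothetical stand-ins for "sources exist in INPUT_SOURCES / INPUT_NOISE_SOURCES".

```python
def outer_conj(x):
    """Instantaneous correlation matrix x x^H (assumed definition of R_xx)."""
    return [[a * b.conjugate() for b in x] for a in x]


def smooth(R_prev, R_xx, alpha):
    """Recursive average: alpha * R_prev + (1 - alpha) * R_xx."""
    return [[alpha * p + (1.0 - alpha) * q for p, q in zip(row_p, row_q)]
            for row_p, row_q in zip(R_prev, R_xx)]


def update_correlations(R_ss, R_nn, X, target_active, noise_active, alpha=0.99):
    """Update R_ss while targets are present and R_nn while noise is present."""
    R_xx = outer_conj(X)
    if target_active:
        R_ss = smooth(R_ss, R_xx, alpha)
    if noise_active:
        R_nn = smooth(R_nn, R_xx, alpha)
    return R_ss, R_nn


# Toy 2-microphone example starting from zero matrices.
zeros = [[0j, 0j], [0j, 0j]]
X = [1.0 + 0.0j, 0.0 + 1.0j]
R_ss, R_nn = update_correlations(zeros, zeros, X,
                                 target_active=True, noise_active=False,
                                 alpha=0.5)
print(R_ss)
print(R_nn)
```

A value of ALPHA close to 1, such as the default 0.99, makes each new frame contribute little, so the matrices change slowly.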
Separation by BF_METHOD=LCMV,GJ: The cost function $J_{\textrm{L}}(\boldsymbol{W}(f))$ for updating the separation matrix is defined by the directions of the target and noise sources coming from INPUT_SOURCES and INPUT_NOISE_SOURCES. In simplified form, the separation matrix is updated as follows:
$$\boldsymbol{W}(f+1) = \boldsymbol{W}(f) + \mu \nabla_{\boldsymbol{W}}\boldsymbol{J}_{\textrm{L}}(\boldsymbol{W})(f)~, \tag{32}$$
where $\nabla_{\boldsymbol{W}}\boldsymbol{J}_{\textrm{L}}(\boldsymbol{W}) = \frac{\partial \boldsymbol{J}_{\textrm{L}}(\boldsymbol{W})}{\partial \boldsymbol{W}}$. LC_MYU specifies the value of $\mu$. If LC_METHOD=ADAPTIVE, this node computes the adaptive stepsize with the following equation:
$$\mu = \left. \frac{\boldsymbol{J}_{\textrm{L}}(\boldsymbol{W})}{\left| \nabla_{\boldsymbol{W}}\boldsymbol{J}_{\textrm{L}}(\boldsymbol{W})\right|^2} \right|_{\boldsymbol{W} = \boldsymbol{W}(f)} \tag{33}$$
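The adaptive stepsize is the current cost divided by the squared magnitude of its gradient. A minimal sketch, assuming that $|\cdot|^2$ is the sum of squared magnitudes over all matrix entries and adding a guard against a vanishing gradient that is not part of the equation; the function name and the example values are hypothetical:

```python
def adaptive_stepsize(cost, grad, eps=1e-12):
    """Return cost / |grad|^2, with |.|^2 summed over all matrix entries.

    The eps guard for a near-zero gradient is an added safety measure,
    not something described in this section.
    """
    grad_norm_sq = sum(abs(g) ** 2 for row in grad for g in row)
    if grad_norm_sq < eps:
        return 0.0
    return cost / grad_norm_sq


# Made-up cost value and gradient for a 2 x 2 separation matrix.
cost = 2.0
grad = [[1.0 + 1.0j, 0.0j],
        [0.0j, 2.0 + 0.0j]]
mu = adaptive_stepsize(cost, grad)
print(mu)  # 2 / (2 + 4), i.e. one third
```

The stepsize shrinks as the cost approaches zero, so the updates become smaller once the separation matrix is close to satisfying the constraint.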
Separation by BF_METHOD=GSS,GICA,GHDSS: The cost function $J_{\textrm{G}}(\boldsymbol{W}(f))$ for updating the separation matrix is defined by the directions of the target and noise sources coming from INPUT_SOURCES and INPUT_NOISE_SOURCES, and consists of two terms:
$$J_{\textrm{G}}(\boldsymbol{W}(f)) = J_{\textrm{SS}}(\boldsymbol{W}(f)) + J_{\textrm{LC}}(\boldsymbol{W}(f))~, \tag{34}$$
where $J_{\textrm{SS}}(\boldsymbol{W}(f))$ is the cost function for blind source separation and $J_{\textrm{LC}}(\boldsymbol{W}(f))$ is the cost function for source separation based on geometric constraints. In simplified form, the separation matrix is updated as follows:
$$\boldsymbol{W}(f+1) = \boldsymbol{W}(f) + \mu_{\textrm{SS}} \nabla_{\boldsymbol{W}}\boldsymbol{J}_{\textrm{SS}}(\boldsymbol{W})(f) + \mu_{\textrm{LC}} \nabla_{\boldsymbol{W}}\boldsymbol{J}_{\textrm{LC}}(\boldsymbol{W})(f)~, \tag{35}$$
where $\nabla_{\boldsymbol{W}}$ denotes the partial derivative with respect to $\boldsymbol{W}$, as in Eq. (32). $\mu_{\textrm{SS}}$ and $\mu_{\textrm{LC}}$ are specified by SS_MYU and LC_MYU, respectively. If SS_METHOD=ADAPTIVE, $\mu_{\textrm{SS}}$ is adaptively determined by
$$\mu_{\textrm{SS}} = \left. \frac{\boldsymbol{J}_{\textrm{SS}}(\boldsymbol{W})}{\left| \nabla_{\boldsymbol{W}}\boldsymbol{J}_{\textrm{SS}}(\boldsymbol{W})\right|^2} \right|_{\boldsymbol{W} = \boldsymbol{W}(f)}~. \tag{36}$$
If LC_METHOD=ADAPTIVE, $\mu _{\textrm{LC}}$ is adaptively determined by
$$\mu_{\textrm{LC}} = \left. \frac{\boldsymbol{J}_{\textrm{LC}}(\boldsymbol{W})}{\left| \nabla_{\boldsymbol{W}}\boldsymbol{J}_{\textrm{LC}}(\boldsymbol{W})\right|^2} \right|_{\boldsymbol{W} = \boldsymbol{W}(f)}~. \tag{37}$$
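How SS_METHOD, SS_MYU, LC_METHOD and LC_MYU combine into the two stepsizes can be summarized in a short sketch. The function `resolve_stepsizes` is hypothetical and is not part of HARK. It takes the adaptive values as inputs already computed elsewhere, and it assumes that SS_METHOD=LC_MYU means the blind-separation stepsize reuses the resolved geometric-constraint stepsize, which is one possible reading of "SS_MYU=LC_MYU".

```python
def resolve_stepsizes(ss_method, ss_myu, lc_method, lc_myu,
                      adaptive_ss, adaptive_lc):
    """Return (mu_ss, mu_lc) following the parameter descriptions.

    adaptive_ss and adaptive_lc stand for the adaptively computed
    stepsizes of the blind-separation and geometric-constraint terms.
    """
    if lc_method == "FIX":
        mu_lc = lc_myu
    elif lc_method == "ADAPTIVE":
        # The adaptive stepsize is scaled by LC_MYU.
        mu_lc = lc_myu * adaptive_lc
    else:
        raise ValueError("LC_METHOD must be FIX or ADAPTIVE")

    if ss_method == "FIX":
        mu_ss = ss_myu
    elif ss_method == "LC_MYU":
        # Assumption: reuse the resolved geometric-constraint stepsize;
        # SS_MYU itself is ignored in this mode.
        mu_ss = mu_lc
    elif ss_method == "ADAPTIVE":
        # The adaptive stepsize is scaled by SS_MYU.
        mu_ss = ss_myu * adaptive_ss
    else:
        raise ValueError("SS_METHOD must be FIX, LC_MYU or ADAPTIVE")

    return mu_ss, mu_lc


print(resolve_stepsizes("FIX", 0.5, "ADAPTIVE", 2.0, 0.1, 0.25))  # (0.5, 0.5)
print(resolve_stepsizes("LC_MYU", 9.0, "FIX", 0.3, 0.1, 0.2))     # (0.3, 0.3)
```

With the default settings (both methods ADAPTIVE, both multipliers 1.0), the adaptive values are used unchanged.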
Troubleshooting: Follow the troubleshooting guidance for the GHDSS node.
[1] H. Krim and M. Viberg, 'Two decades of array signal processing research: the parametric approach', IEEE Signal Processing Magazine, vol. 13, no. 4, pp. 67–94, 1996. D. H. Johnson and D. E. Dudgeon, Array Signal Processing: Concepts and Techniques, Prentice-Hall, 1993.
[2] S. Talwar, et al., 'Blind separation of synchronous co-channel digital signals using an antenna array. I. Algorithms', IEEE Transactions on Signal Processing, vol. 44, no. 5, pp. 1184–1197.
[3] O. L. Frost III, 'An Algorithm for Linearly Constrained Adaptive Array Processing', Proc. of the IEEE, vol. 60, no. 8, 1972.
[4] L. Griffiths and C. Jim, 'An alternative approach to linearly constrained adaptive beamforming', IEEE Trans. on Antennas and Propagation, vol. AP-30, no. 1, 1982.
[5] P. W. Howells, 'Intermediate Frequency Sidelobe Canceller', U.S. Patent No. 3202990, 1965.
[6] L. C. Parra, et al., 'Geometric source separation: Merging convolutive source separation with geometric beamforming', IEEE Trans. SAP, vol. 10, no. 6, pp. 352–362, 2002.
[7] H. Nakajima, et al., 'Blind Source Separation With Parameter-Free Adaptive Step-Size Method for Robot Audition', IEEE Trans. ASL, vol. 18, no. 6, pp. 1476–1485, 2010.