speechbrain.processing.multi_mic module
Multi-microphone components.
This library contains functions for multi-microphone signal processing.
Example

>>> import torch
>>> from speechbrain.dataio.dataio import read_audio
>>> from speechbrain.processing.features import STFT, ISTFT
>>> from speechbrain.processing.multi_mic import Covariance
>>> from speechbrain.processing.multi_mic import GccPhat, SrpPhat, Music
>>> from speechbrain.processing.multi_mic import DelaySum, Mvdr, Gev
>>> xs_speech = read_audio(
...     'samples/audio_samples/multi_mic/speech_-0.82918_0.55279_-0.082918.flac'
... )
>>> xs_speech = xs_speech.unsqueeze(0)  # [batch, time, channels]
>>> xs_noise_diff = read_audio('samples/audio_samples/multi_mic/noise_diffuse.flac')
>>> xs_noise_diff = xs_noise_diff.unsqueeze(0)
>>> xs_noise_loc = read_audio('samples/audio_samples/multi_mic/noise_0.70225_-0.70225_0.11704.flac')
>>> xs_noise_loc = xs_noise_loc.unsqueeze(0)
>>> fs = 16000  # sampling rate
>>> ss = xs_speech
>>> nn_diff = 0.05 * xs_noise_diff
>>> nn_loc = 0.05 * xs_noise_loc
>>> xs_diffused_noise = ss + nn_diff
>>> xs_localized_noise = ss + nn_loc

>>> # Delay-and-sum beamforming with GCC-PHAT localization
>>> stft = STFT(sample_rate=fs)
>>> cov = Covariance()
>>> gccphat = GccPhat()
>>> delaysum = DelaySum()
>>> istft = ISTFT(sample_rate=fs)
>>> Xs = stft(xs_diffused_noise)
>>> XXs = cov(Xs)
>>> tdoas = gccphat(XXs)
>>> Ys_ds = delaysum(Xs, tdoas)
>>> ys_ds = istft(Ys_ds)

>>> # MVDR beamforming with SRP-PHAT localization
>>> mvdr = Mvdr()
>>> mics = torch.zeros((4, 3), dtype=torch.float)
>>> mics[0,:] = torch.FloatTensor([-0.05, -0.05, +0.00])
>>> mics[1,:] = torch.FloatTensor([-0.05, +0.05, +0.00])
>>> mics[2,:] = torch.FloatTensor([+0.05, +0.05, +0.00])
>>> mics[3,:] = torch.FloatTensor([+0.05, -0.05, +0.00])
>>> srpphat = SrpPhat(mics=mics)
>>> doas = srpphat(XXs)
>>> Ys_mvdr = mvdr(Xs, XXs, doas, doa_mode=True, mics=mics, fs=fs)
>>> ys_mvdr = istft(Ys_mvdr)

>>> # MVDR beamforming with MUSIC localization
>>> music = Music(mics=mics)
>>> doas = music(XXs)
>>> Ys_mvdr2 = mvdr(Xs, XXs, doas, doa_mode=True, mics=mics, fs=fs)
>>> ys_mvdr2 = istft(Ys_mvdr2)

>>> # GEV beamforming
>>> gev = Gev()
>>> Xs = stft(xs_localized_noise)
>>> Ss = stft(ss)
>>> Nn = stft(nn_loc)
>>> SSs = cov(Ss)
>>> NNs = cov(Nn)
>>> Ys_gev = gev(Xs, SSs, NNs)
>>> ys_gev = istft(Ys_gev)
Authors:
- William Aris
- Francois Grondin
Summary

Classes:

Covariance – Computes the covariance matrices of the signals.
DelaySum – Performs delay-and-sum beamforming using the TDOAs and the first channel as a reference.
GccPhat – Generalized Cross-Correlation with Phase Transform (GCC-PHAT) localization.
Gev – Generalized EigenValue decomposition (GEV) beamforming.
Music – Multiple Signal Classification (MUSIC) localization.
Mvdr – Performs minimum variance distortionless response (MVDR) beamforming using an input signal in the frequency domain, its covariance matrices, and TDOAs (to compute a steering vector).
SrpPhat – Steered-Response Power with Phase Transform (SRP-PHAT) localization.

Functions:

doas2taus – Converts directions of arrival (xyz coordinates expressed in meters) into time differences of arrival (expressed in samples).
sphere – Generates cartesian coordinates (xyz) for a set of points forming a 3D sphere.
steering – Computes a steering vector from the time differences of arrival for each channel (in samples) and the number of bins (n_fft).
tdoas2taus – Selects the TDOA of each channel and puts them in a tensor.
Reference

class speechbrain.processing.multi_mic.Covariance(average=True)
Bases: torch.nn.modules.module.Module

Computes the covariance matrices of the signals.

Parameters
average (bool) – Informs the module whether it should return an average (computed over the time dimension) of the covariance matrices. The default value is True.
Example

>>> import torch
>>> from speechbrain.dataio.dataio import read_audio
>>> from speechbrain.processing.features import STFT
>>> from speechbrain.processing.multi_mic import Covariance
>>> xs_speech = read_audio(
...     'samples/audio_samples/multi_mic/speech_-0.82918_0.55279_-0.082918.flac'
... )
>>> xs_speech = xs_speech.unsqueeze(0)  # [batch, time, channels]
>>> xs_noise = read_audio('samples/audio_samples/multi_mic/noise_diffuse.flac')
>>> xs_noise = xs_noise.unsqueeze(0)
>>> xs = xs_speech + 0.05 * xs_noise
>>> fs = 16000
>>> stft = STFT(sample_rate=fs)
>>> cov = Covariance()
>>> Xs = stft(xs)
>>> XXs = cov(Xs)
>>> XXs.shape
torch.Size([1, 1001, 201, 2, 10])
forward(Xs)

This method uses the utility function _cov to compute covariance matrices. The result has the following format: (batch, time_step, n_fft/2 + 1, 2, n_mics + n_pairs).

The order on the last dimension corresponds to the triu_indices of a square matrix. For instance, with 4 channels the order is: (0, 0), (0, 1), (0, 2), (0, 3), (1, 1), (1, 2), (1, 3), (2, 2), (2, 3) and (3, 3). Therefore, XXs[..., 0] corresponds to channel pair (0, 0) and XXs[..., 1] to channel pair (0, 1); the sketch below shows how to recover this mapping.

Parameters
Xs (tensor) – A batch of audio signals in the frequency domain. The tensor must have the following format: (batch, time_step, n_fft/2 + 1, 2, n_mics).
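A minimal sketch of the pair ordering using torch.triu_indices (plain PyTorch; nothing here is specific to SpeechBrain):

>>> import torch
>>> n_mics = 4
>>> pairs = torch.triu_indices(n_mics, n_mics)  # shape: (2, n_mics + n_pairs)
>>> pairs[:, 0], pairs[:, 1]  # channel pairs behind XXs[..., 0] and XXs[..., 1]
(tensor([0, 0]), tensor([0, 1]))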
class speechbrain.processing.multi_mic.DelaySum
Bases: torch.nn.modules.module.Module

Performs delay-and-sum beamforming using the TDOAs and the first channel as a reference.

Example

>>> import torch
>>> from speechbrain.dataio.dataio import read_audio
>>> from speechbrain.processing.features import STFT, ISTFT
>>> from speechbrain.processing.multi_mic import Covariance
>>> from speechbrain.processing.multi_mic import GccPhat, DelaySum
>>> xs_speech = read_audio(
...     'samples/audio_samples/multi_mic/speech_-0.82918_0.55279_-0.082918.flac'
... )
>>> xs_speech = xs_speech.unsqueeze(0)  # [batch, time, channels]
>>> xs_noise = read_audio('samples/audio_samples/multi_mic/noise_diffuse.flac')
>>> xs_noise = xs_noise.unsqueeze(0)  # [batch, time, channels]
>>> fs = 16000
>>> xs = xs_speech + 0.05 * xs_noise
>>> stft = STFT(sample_rate=fs)
>>> cov = Covariance()
>>> gccphat = GccPhat()
>>> delaysum = DelaySum()
>>> istft = ISTFT(sample_rate=fs)
>>> Xs = stft(xs)
>>> XXs = cov(Xs)
>>> tdoas = gccphat(XXs)
>>> Ys = delaysum(Xs, tdoas)
>>> ys = istft(Ys)
forward(Xs, localization_tensor, doa_mode=False, mics=None, fs=None, c=343.0)

This method computes a steering vector from the TDOAs/DOAs and then calls the utility function _delaysum to perform beamforming. The result has the following format: (batch, time_step, n_fft, 2, 1).

Parameters
Xs (tensor) – A batch of audio signals in the frequency domain. The tensor must have the following format: (batch, time_step, n_fft/2 + 1, 2, n_mics).
localization_tensor (tensor) – A tensor containing either time differences of arrival (TDOAs, in samples) for each timestamp or directions of arrival (DOAs, xyz coordinates in meters). If localization_tensor holds TDOAs, its format is (batch, time_steps, n_mics + n_pairs). If it holds DOAs, its format is (batch, time_steps, 3).
doa_mode (bool) – Set this parameter to True if localization_tensor holds DOAs instead of TDOAs. The default value is False; a DOA-mode call is sketched after this list.
mics (tensor) – The cartesian position (xyz coordinates in meters) of each microphone. The tensor must have the format (n_mics, 3). This parameter is only mandatory when localization_tensor holds DOAs.
fs (int) – The sample rate of the signals in Hertz. This parameter is only mandatory when localization_tensor holds DOAs.
c (float) – The speed of sound in the medium, in meters per second; the default value is 343 m/s. This parameter is only used when localization_tensor holds DOAs.
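A minimal sketch of a DOA-mode call, assuming a geometry tensor mics, a sample rate fs, and a DOA tensor doas (e.g. from SrpPhat, as in the module-level example) are already defined:

>>> Ys = delaysum(Xs, doas, doa_mode=True, mics=mics, fs=fs)  # doas: (batch, time_steps, 3)
>>> ys = istft(Ys)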
class speechbrain.processing.multi_mic.Mvdr(eps=1e-20)
Bases: torch.nn.modules.module.Module

Performs minimum variance distortionless response (MVDR) beamforming using an input signal in the frequency domain, its covariance matrices, and TDOAs (to compute a steering vector).

Example

>>> import torch
>>> from speechbrain.dataio.dataio import read_audio
>>> from speechbrain.processing.features import STFT, ISTFT
>>> from speechbrain.processing.multi_mic import Covariance
>>> from speechbrain.processing.multi_mic import GccPhat, Mvdr
>>> xs_speech = read_audio(
...     'samples/audio_samples/multi_mic/speech_-0.82918_0.55279_-0.082918.flac'
... )
>>> xs_speech = xs_speech.unsqueeze(0)  # [batch, time, channels]
>>> xs_noise = read_audio('samples/audio_samples/multi_mic/noise_diffuse.flac')
>>> xs_noise = xs_noise.unsqueeze(0)  # [batch, time, channels]
>>> fs = 16000
>>> xs = xs_speech + 0.05 * xs_noise
>>> stft = STFT(sample_rate=fs)
>>> cov = Covariance()
>>> gccphat = GccPhat()
>>> mvdr = Mvdr()
>>> istft = ISTFT(sample_rate=fs)
>>> Xs = stft(xs)
>>> XXs = cov(Xs)
>>> tdoas = gccphat(XXs)
>>> Ys = mvdr(Xs, XXs, tdoas)
>>> ys = istft(Ys)
forward(Xs, XXs, localization_tensor, doa_mode=False, mics=None, fs=None, c=343.0)

This method computes a steering vector before using the utility function _mvdr to perform beamforming. The result has the following format: (batch, time_step, n_fft, 2, 1). The classic per-bin weight formula is sketched after the parameter list.

Parameters
Xs (tensor) – A batch of audio signals in the frequency domain. The tensor must have the following format: (batch, time_step, n_fft/2 + 1, 2, n_mics).
XXs (tensor) – The covariance matrices of the input signal. The tensor must have the format (batch, time_steps, n_fft/2 + 1, 2, n_mics + n_pairs).
localization_tensor (tensor) – A tensor containing either time differences of arrival (TDOAs, in samples) for each timestamp or directions of arrival (DOAs, xyz coordinates in meters). If localization_tensor holds TDOAs, its format is (batch, time_steps, n_mics + n_pairs). If it holds DOAs, its format is (batch, time_steps, 3).
doa_mode (bool) – Set this parameter to True if localization_tensor holds DOAs instead of TDOAs. The default value is False.
mics (tensor) – The cartesian position (xyz coordinates in meters) of each microphone. The tensor must have the format (n_mics, 3). This parameter is only mandatory when localization_tensor holds DOAs.
fs (int) – The sample rate of the signals in Hertz. This parameter is only mandatory when localization_tensor holds DOAs.
c (float) – The speed of sound in the medium, in meters per second; the default value is 343 m/s. This parameter is only used when localization_tensor holds DOAs.
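For intuition, the textbook MVDR weights at a single frequency bin minimize the output power w^H Phi w subject to the distortionless constraint w^H a = 1, giving w = Phi^-1 a / (a^H Phi^-1 a). A self-contained NumPy sketch of that formula (an illustration of the principle, not of the _mvdr internals):

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
Phi = A @ A.conj().T + 1e-6 * np.eye(4)  # input covariance at one bin (regularized)
a = np.exp(-2j * np.pi * rng.random(4))  # steering vector for that bin

Phi_inv_a = np.linalg.solve(Phi, a)      # solve instead of forming an inverse
w = Phi_inv_a / (a.conj() @ Phi_inv_a)   # unit gain toward a, minimum output power
assert np.isclose(w.conj() @ a, 1.0)     # distortionless constraint holds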
class speechbrain.processing.multi_mic.Gev
Bases: torch.nn.modules.module.Module

Generalized EigenValue decomposition (GEV) beamforming.

Example

>>> import torch
>>> from speechbrain.dataio.dataio import read_audio
>>> from speechbrain.processing.features import STFT, ISTFT
>>> from speechbrain.processing.multi_mic import Covariance
>>> from speechbrain.processing.multi_mic import Gev
>>> xs_speech = read_audio(
...     'samples/audio_samples/multi_mic/speech_-0.82918_0.55279_-0.082918.flac'
... )
>>> xs_speech = xs_speech.unsqueeze(0)  # [batch, time, channels]
>>> xs_noise = read_audio('samples/audio_samples/multi_mic/noise_0.70225_-0.70225_0.11704.flac')
>>> xs_noise = xs_noise.unsqueeze(0)
>>> fs = 16000
>>> ss = xs_speech
>>> nn = 0.05 * xs_noise
>>> xs = ss + nn
>>> stft = STFT(sample_rate=fs)
>>> cov = Covariance()
>>> gev = Gev()
>>> istft = ISTFT(sample_rate=fs)
>>> Ss = stft(ss)
>>> Nn = stft(nn)
>>> Xs = stft(xs)
>>> SSs = cov(Ss)
>>> NNs = cov(Nn)
>>> Ys = gev(Xs, SSs, NNs)
>>> ys = istft(Ys)
forward(Xs, SSs, NNs)

This method uses the utility function _gev to perform generalized eigenvalue decomposition beamforming. The result has the following format: (batch, time_step, n_fft, 2, 1). The underlying eigenproblem is sketched after the parameter list.

Parameters
Xs (tensor) – A batch of audio signals in the frequency domain. The tensor must have the following format: (batch, time_step, n_fft/2 + 1, 2, n_mics).
SSs (tensor) – The covariance matrices of the target signal. The tensor must have the format (batch, time_steps, n_fft/2 + 1, 2, n_mics + n_pairs).
NNs (tensor) – The covariance matrices of the noise signal. The tensor must have the format (batch, time_steps, n_fft/2 + 1, 2, n_mics + n_pairs).
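For intuition, a GEV beamformer picks, per frequency bin, the weights w maximizing the SNR w^H Phi_SS w / w^H Phi_NN w, i.e. the principal generalized eigenvector of Phi_SS w = lambda Phi_NN w. A self-contained SciPy sketch of that criterion (an illustration of the principle, not of the _gev internals):

import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
Phi_ss = A @ A.conj().T                     # target covariance at one bin
Phi_nn = B @ B.conj().T + 1e-6 * np.eye(4)  # noise covariance (positive definite)

# Generalized eigenproblem Phi_ss w = lambda Phi_nn w; eigh returns the
# eigenvalues in ascending order, so the last column maximizes the SNR.
eigvals, eigvecs = eigh(Phi_ss, Phi_nn)
w = eigvecs[:, -1]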
class speechbrain.processing.multi_mic.GccPhat(tdoa_max=None, eps=1e-20)
Bases: torch.nn.modules.module.Module

Generalized Cross-Correlation with Phase Transform (GCC-PHAT) localization.

Parameters
tdoa_max (int) – Specifies a range to search for delays. For example, if tdoa_max = 10, the method restricts its search to delays between -10 and +10 samples. This parameter is optional and its default value is None; in that case, the method searches for delays between -n_fft/2 and n_fft/2 (the full range).
eps (float) – A small value to avoid divisions by 0 in the phase transformation. The default value is 1e-20.

Example

>>> import torch
>>> from speechbrain.dataio.dataio import read_audio
>>> from speechbrain.processing.features import STFT, ISTFT
>>> from speechbrain.processing.multi_mic import Covariance
>>> from speechbrain.processing.multi_mic import GccPhat, DelaySum
>>> xs_speech = read_audio(
...     'samples/audio_samples/multi_mic/speech_-0.82918_0.55279_-0.082918.flac'
... )
>>> xs_speech = xs_speech.unsqueeze(0)  # [batch, time, channels]
>>> xs_noise = read_audio('samples/audio_samples/multi_mic/noise_diffuse.flac')
>>> xs_noise = xs_noise.unsqueeze(0)  # [batch, time, channels]
>>> fs = 16000
>>> xs = xs_speech + 0.05 * xs_noise
>>> stft = STFT(sample_rate=fs)
>>> cov = Covariance()
>>> gccphat = GccPhat()
>>> Xs = stft(xs)
>>> XXs = cov(Xs)
>>> tdoas = gccphat(XXs)
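Continuing the example above, restricting the search range is just a constructor argument:

>>> gccphat_near = GccPhat(tdoa_max=10)
>>> tdoas_near = gccphat_near(XXs)  # delays confined to [-10, +10] samples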
forward(XXs)

Performs generalized cross-correlation with phase transform localization using the utility function _gcc_phat, extracting the delays (in samples) and then applying a quadratic interpolation to improve accuracy. The result has the format (batch, time_steps, n_mics + n_pairs).

The order on the last dimension corresponds to the triu_indices of a square matrix. For instance, with 4 channels the order is: (0, 0), (0, 1), (0, 2), (0, 3), (1, 1), (1, 2), (1, 3), (2, 2), (2, 3) and (3, 3). Therefore, delays[..., 0] corresponds to channel pair (0, 0) and delays[..., 1] to channel pair (0, 1).

Parameters
XXs (tensor) – The covariance matrices of the input signal. The tensor must have the format (batch, time_steps, n_fft/2 + 1, 2, n_mics + n_pairs).
class speechbrain.processing.multi_mic.SrpPhat(mics, space='sphere', sample_rate=16000, speed_sound=343.0, eps=1e-20)
Bases: torch.nn.modules.module.Module

Steered-Response Power with Phase Transform (SRP-PHAT) localization.

Parameters
mics (tensor) – The cartesian coordinates (xyz) in meters of each microphone. The tensor must have the format (n_mics, 3).
space (string) – If this parameter is set to 'sphere', the localization is done in 3D by searching a sphere of possible DOAs. If it is set to 'circle', the search is done in 2D by searching a circle. By default, this parameter is set to 'sphere'. Note: the 'circle' option is not implemented yet.
sample_rate (int) – The sample rate in Hertz of the signals to perform SRP-PHAT on. By default, this parameter is set to 16000 Hz.
speed_sound (float) – The speed of sound in the medium, in meters per second; the default value is 343 m/s.
eps (float) – A small value to avoid errors like division by 0. The default value is 1e-20.

Example

>>> import torch
>>> from speechbrain.dataio.dataio import read_audio
>>> from speechbrain.processing.features import STFT
>>> from speechbrain.processing.multi_mic import Covariance
>>> from speechbrain.processing.multi_mic import SrpPhat
>>> xs_speech = read_audio('samples/audio_samples/multi_mic/speech_-0.82918_0.55279_-0.082918.flac')
>>> xs_noise = read_audio('samples/audio_samples/multi_mic/noise_diffuse.flac')
>>> fs = 16000
>>> xs_speech = xs_speech.unsqueeze(0)  # [batch, time, channels]
>>> xs_noise = xs_noise.unsqueeze(0)
>>> ss1 = xs_speech
>>> ns1 = 0.05 * xs_noise
>>> xs1 = ss1 + ns1
>>> ss2 = xs_speech
>>> ns2 = 0.20 * xs_noise
>>> xs2 = ss2 + ns2
>>> ss = torch.cat((ss1, ss2), dim=0)
>>> ns = torch.cat((ns1, ns2), dim=0)
>>> xs = torch.cat((xs1, xs2), dim=0)
>>> mics = torch.zeros((4, 3), dtype=torch.float)
>>> mics[0,:] = torch.FloatTensor([-0.05, -0.05, +0.00])
>>> mics[1,:] = torch.FloatTensor([-0.05, +0.05, +0.00])
>>> mics[2,:] = torch.FloatTensor([+0.05, +0.05, +0.00])
>>> mics[3,:] = torch.FloatTensor([+0.05, -0.05, +0.00])
>>> stft = STFT(sample_rate=fs)
>>> cov = Covariance()
>>> srpphat = SrpPhat(mics=mics)
>>> Xs = stft(xs)
>>> XXs = cov(Xs)
>>> doas = srpphat(XXs)
forward(XXs)

Performs SRP-PHAT localization on a signal by computing a steering vector and then using the utility function _srp_phat to extract the DOAs. The result is a tensor containing the directions of arrival (xyz coordinates, in meters, pointing toward the sound source). The output tensor has the format (batch, time_steps, 3).

This localization method uses the Global Coherence Field (GCF): https://www.researchgate.net/publication/221491705_Speaker_localization_based_on_oriented_global_coherence_field

Parameters
XXs (tensor) – The covariance matrices of the input signal. The tensor must have the format (batch, time_steps, n_fft/2 + 1, 2, n_mics + n_pairs).
class speechbrain.processing.multi_mic.Music(mics, space='sphere', sample_rate=16000, speed_sound=343.0, eps=1e-20, n_sig=1)
Bases: torch.nn.modules.module.Module

Multiple Signal Classification (MUSIC) localization.

Parameters
mics (tensor) – The cartesian coordinates (xyz) in meters of each microphone. The tensor must have the format (n_mics, 3).
space (string) – If this parameter is set to 'sphere', the localization is done in 3D by searching a sphere of possible DOAs. If it is set to 'circle', the search is done in 2D by searching a circle. By default, this parameter is set to 'sphere'. Note: the 'circle' option is not implemented yet.
sample_rate (int) – The sample rate in Hertz of the signals to perform MUSIC on. By default, this parameter is set to 16000 Hz.
speed_sound (float) – The speed of sound in the medium, in meters per second; the default value is 343 m/s.
eps (float) – A small value to avoid errors like division by 0. The default value is 1e-20.
n_sig (int) – An estimate of the number of sound sources. The default value is one source.

Example

>>> import torch
>>> from speechbrain.dataio.dataio import read_audio
>>> from speechbrain.processing.features import STFT
>>> from speechbrain.processing.multi_mic import Covariance
>>> from speechbrain.processing.multi_mic import Music
>>> xs_speech = read_audio('samples/audio_samples/multi_mic/speech_-0.82918_0.55279_-0.082918.flac')
>>> xs_noise = read_audio('samples/audio_samples/multi_mic/noise_diffuse.flac')
>>> fs = 16000
>>> xs_speech = xs_speech.unsqueeze(0)  # [batch, time, channels]
>>> xs_noise = xs_noise.unsqueeze(0)
>>> ss1 = xs_speech
>>> ns1 = 0.05 * xs_noise
>>> xs1 = ss1 + ns1
>>> ss2 = xs_speech
>>> ns2 = 0.20 * xs_noise
>>> xs2 = ss2 + ns2
>>> ss = torch.cat((ss1, ss2), dim=0)
>>> ns = torch.cat((ns1, ns2), dim=0)
>>> xs = torch.cat((xs1, xs2), dim=0)
>>> mics = torch.zeros((4, 3), dtype=torch.float)
>>> mics[0,:] = torch.FloatTensor([-0.05, -0.05, +0.00])
>>> mics[1,:] = torch.FloatTensor([-0.05, +0.05, +0.00])
>>> mics[2,:] = torch.FloatTensor([+0.05, +0.05, +0.00])
>>> mics[3,:] = torch.FloatTensor([+0.05, -0.05, +0.00])
>>> stft = STFT(sample_rate=fs)
>>> cov = Covariance()
>>> music = Music(mics=mics)
>>> Xs = stft(xs)
>>> XXs = cov(Xs)
>>> doas = music(XXs)
forward(XXs)

Performs MUSIC localization on a signal by computing a steering vector and then using the utility function _music to extract the DOAs. The result is a tensor containing the directions of arrival (xyz coordinates, in meters, pointing toward the sound source). The output tensor has the format (batch, time_steps, 3). The noise-subspace criterion behind MUSIC is sketched after the parameter list.

Parameters
XXs (tensor) – The covariance matrices of the input signal. The tensor must have the format (batch, time_steps, n_fft/2 + 1, 2, n_mics + n_pairs).
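For intuition, MUSIC scores each candidate steering vector against the noise subspace of the covariance matrix; with n_sig sources, the noise subspace spans the n_mics - n_sig eigenvectors with the smallest eigenvalues. A self-contained NumPy sketch of that criterion (an illustration of the principle, not of the _music internals):

import numpy as np

rng = np.random.default_rng(0)
n_mics, n_sig = 4, 1
A = rng.standard_normal((n_mics, n_mics)) + 1j * rng.standard_normal((n_mics, n_mics))
XX = A @ A.conj().T                           # covariance at one bin

eigvals, V = np.linalg.eigh(XX)               # ascending eigenvalues
En = V[:, : n_mics - n_sig]                   # noise subspace
a = np.exp(-2j * np.pi * rng.random(n_mics))  # candidate steering vector

# The pseudospectrum peaks where a is (nearly) orthogonal to the noise subspace.
score = 1.0 / np.linalg.norm(En.conj().T @ a) ** 2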
speechbrain.processing.multi_mic.doas2taus(doas, mics, fs, c=343.0)

Converts directions of arrival (xyz coordinates expressed in meters) into time differences of arrival (expressed in samples). The result has the following format: (batch, time_steps, n_mics).

Parameters
doas (tensor) – The directions of arrival expressed with cartesian coordinates (xyz) in meters. The tensor must have the format (batch, time_steps, 3).
mics (tensor) – The cartesian position (xyz) in meters of each microphone. The tensor must have the format (n_mics, 3).
fs (int) – The sample rate of the signals in Hertz.
c (float) – The speed of sound in the medium, in meters per second; the default value is 343 m/s.

Example

>>> import torch
>>> from speechbrain.dataio.dataio import read_audio
>>> from speechbrain.processing.multi_mic import sphere, doas2taus
>>> xs = read_audio('samples/audio_samples/multi_mic/speech_-0.82918_0.55279_-0.082918.flac')
>>> xs = xs.unsqueeze(0)  # [batch, time, channels]
>>> fs = 16000
>>> mics = torch.zeros((4, 3), dtype=torch.float)
>>> mics[0,:] = torch.FloatTensor([-0.05, -0.05, +0.00])
>>> mics[1,:] = torch.FloatTensor([-0.05, +0.05, +0.00])
>>> mics[2,:] = torch.FloatTensor([+0.05, +0.05, +0.00])
>>> mics[3,:] = torch.FloatTensor([+0.05, -0.05, +0.00])
>>> doas = sphere()
>>> taus = doas2taus(doas, mics, fs)
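For intuition, under the far-field model each microphone's delay is the projection of its position onto the DOA direction, scaled from seconds to samples by fs / c. Continuing the example, a hedged sketch of that relation (the sign convention and any reference-channel offset inside doas2taus may differ):

>>> taus_manual = doas.matmul(mics.t()) * fs / 343.0
>>> taus_manual.shape  # one delay per point on the sphere and per microphone
torch.Size([2562, 4])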
speechbrain.processing.multi_mic.tdoas2taus(tdoas)

Selects the TDOA of each channel and puts them in a tensor. The result has the following format: (batch, time_steps, n_mics).

Parameters
tdoas (tensor) – The time differences of arrival (TDOAs, in samples) for each timestamp. The tensor has the format (batch, time_steps, n_mics + n_pairs).

Example

>>> import torch
>>> from speechbrain.dataio.dataio import read_audio
>>> from speechbrain.processing.features import STFT
>>> from speechbrain.processing.multi_mic import Covariance
>>> from speechbrain.processing.multi_mic import GccPhat, tdoas2taus
>>> xs_speech = read_audio(
...     'samples/audio_samples/multi_mic/speech_-0.82918_0.55279_-0.082918.flac'
... )
>>> xs_noise = read_audio('samples/audio_samples/multi_mic/noise_diffuse.flac')
>>> xs = xs_speech + 0.05 * xs_noise
>>> xs = xs.unsqueeze(0)
>>> fs = 16000
>>> stft = STFT(sample_rate=fs)
>>> cov = Covariance()
>>> gccphat = GccPhat()
>>> Xs = stft(xs)
>>> XXs = cov(Xs)
>>> tdoas = gccphat(XXs)
>>> taus = tdoas2taus(tdoas)
speechbrain.processing.multi_mic.steering(taus, n_fft)

Computes a steering vector from the time differences of arrival for each channel (in samples) and the number of bins (n_fft). The result has the following format: (batch, time_step, n_fft/2 + 1, 2, n_mics).

Parameters
taus (tensor) – The time differences of arrival for each channel. The tensor must have the format (batch, time_steps, n_mics).
n_fft (int) – The number of bins resulting from the STFT. It is assumed that the argument "onesided" was set to True for the STFT.

Example

>>> import torch
>>> from speechbrain.dataio.dataio import read_audio
>>> from speechbrain.processing.features import STFT
>>> from speechbrain.processing.multi_mic import Covariance
>>> from speechbrain.processing.multi_mic import GccPhat, tdoas2taus, steering
>>> xs_speech = read_audio(
...     'samples/audio_samples/multi_mic/speech_-0.82918_0.55279_-0.082918.flac'
... )
>>> xs_noise = read_audio('samples/audio_samples/multi_mic/noise_diffuse.flac')
>>> xs = xs_speech + 0.05 * xs_noise
>>> xs = xs.unsqueeze(0)  # [batch, time, channels]
>>> fs = 16000
>>> stft = STFT(sample_rate=fs)
>>> cov = Covariance()
>>> gccphat = GccPhat()
>>> Xs = stft(xs)
>>> n_fft = Xs.shape[2]
>>> XXs = cov(Xs)
>>> tdoas = gccphat(XXs)
>>> taus = tdoas2taus(tdoas)
>>> As = steering(taus, n_fft)
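For intuition, a steering vector is a per-bin complex exponential exp(-j 2 pi k tau_m / N) with real and imaginary parts stacked on the fourth dimension. Continuing the example, a hedged sketch of that standard form, assuming N is the full FFT length recovered from the one-sided bin count (the normalization inside steering may differ):

>>> import math
>>> N = (n_fft - 1) * 2                                 # full FFT length
>>> k = torch.arange(n_fft).view(1, 1, -1, 1).float()   # frequency bin indices
>>> phase = -2.0 * math.pi * k * taus.unsqueeze(2) / N
>>> As_manual = torch.stack((phase.cos(), phase.sin()), dim=3)
>>> As_manual.shape  # (batch, time_steps, n_fft/2 + 1, 2, n_mics)
torch.Size([1, 1001, 201, 2, 4])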
speechbrain.processing.multi_mic.sphere(levels_count=4)

Generates cartesian coordinates (xyz) for a set of points forming a 3D sphere. The coordinates are expressed in meters and can be used as DOAs. The result has the format (n_points, 3).

Parameters
levels_count (int) – A number proportional to the number of points the user wants to generate.
If levels_count = 1, the sphere has 42 points.
If levels_count = 2, the sphere has 162 points.
If levels_count = 3, the sphere has 642 points.
If levels_count = 4, the sphere has 2562 points.
If levels_count = 5, the sphere has 10242 points.
In general the count follows 10 * 4^levels_count + 2, so each additional level roughly quadruples the number of points. By default, levels_count is set to 4.

Example

>>> import torch
>>> from speechbrain.processing.multi_mic import sphere
>>> doas = sphere()
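Given the counts listed above, the shape of the result can be checked directly (the default levels_count of 4 yields 2562 points):

>>> sphere(levels_count=1).shape
torch.Size([42, 3])
>>> doas.shape
torch.Size([2562, 3])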