speechbrain.processing.multi_mic module

Multi-microphone components.

This library contains functions for multi-microphone signal processing.

Example

>>> import torch
>>> from speechbrain.dataio.dataio import read_audio
>>> from speechbrain.processing.features import STFT, ISTFT
>>> from speechbrain.processing.multi_mic import Covariance
>>> from speechbrain.processing.multi_mic import GccPhat, SrpPhat, Music
>>> from speechbrain.processing.multi_mic import DelaySum, Mvdr, Gev
>>> xs_speech = read_audio(
...     'samples/audio_samples/multi_mic/speech_-0.82918_0.55279_-0.082918.flac'
... )
>>> xs_speech = xs_speech.unsqueeze(0) # [batch, time, channels]
>>> xs_noise_diff = read_audio('samples/audio_samples/multi_mic/noise_diffuse.flac')
>>> xs_noise_diff = xs_noise_diff.unsqueeze(0)
>>> xs_noise_loc = read_audio('samples/audio_samples/multi_mic/noise_0.70225_-0.70225_0.11704.flac')
>>> xs_noise_loc = xs_noise_loc.unsqueeze(0)
>>> fs = 16000 # sampling rate

>>> ss = xs_speech
>>> nn_diff = 0.05 * xs_noise_diff
>>> nn_loc = 0.05 * xs_noise_loc
>>> xs_diffused_noise = ss + nn_diff
>>> xs_localized_noise = ss + nn_loc

>>> # Delay-and-Sum beamforming with GCC-PHAT localization
>>> stft = STFT(sample_rate=fs)
>>> cov = Covariance()
>>> gccphat = GccPhat()
>>> delaysum = DelaySum()
>>> istft = ISTFT(sample_rate=fs)
>>> Xs = stft(xs_diffused_noise)
>>> XXs = cov(Xs)
>>> tdoas = gccphat(XXs)
>>> Ys_ds = delaysum(Xs, tdoas)
>>> ys_ds = istft(Ys_ds)

>>> # MVDR beamforming with SRP-PHAT localization
>>> mvdr = Mvdr()
>>> mics = torch.zeros((4, 3), dtype=torch.float)
>>> mics[0,:] = torch.FloatTensor([-0.05, -0.05, +0.00])
>>> mics[1,:] = torch.FloatTensor([-0.05, +0.05, +0.00])
>>> mics[2,:] = torch.FloatTensor([+0.05, +0.05, +0.00])
>>> mics[3,:] = torch.FloatTensor([+0.05, -0.05, +0.00])
>>> srpphat = SrpPhat(mics=mics)
>>> doas = srpphat(XXs)
>>> Ys_mvdr = mvdr(Xs, XXs, doas, doa_mode=True, mics=mics, fs=fs)
>>> ys_mvdr = istft(Ys_mvdr)

>>> # MVDR beamforming with MUSIC localization
>>> music = Music(mics=mics)
>>> doas = music(XXs)
>>> Ys_mvdr2 = mvdr(Xs, XXs, doas, doa_mode=True, mics=mics, fs=fs)
>>> ys_mvdr2 = istft(Ys_mvdr2)

>>> # GEV beamforming
>>> gev = Gev()
>>> Xs = stft(xs_localized_noise)
>>> Ss = stft(ss)
>>> Nn = stft(nn_loc)
>>> SSs = cov(Ss)
>>> NNs = cov(Nn)
>>> Ys_gev = gev(Xs, SSs, NNs)
>>> ys_gev = istft(Ys_gev)

Authors:
  • William Aris

  • Francois Grondin

Summary

Classes:

  • Covariance – Computes the covariance matrices of the signals.

  • DelaySum – Performs delay-and-sum beamforming using the TDOAs and the first channel as a reference.

  • GccPhat – Generalized Cross-Correlation with Phase Transform (GCC-PHAT) localization.

  • Gev – Generalized EigenValue decomposition (GEV) beamforming.

  • Music – Multiple Signal Classification (MUSIC) localization.

  • Mvdr – Performs minimum variance distortionless response (MVDR) beamforming, using an input signal in the frequency domain, its covariance matrices, and TDOAs (to compute a steering vector).

  • SrpPhat – Steered-Response Power with Phase Transform (SRP-PHAT) localization.

Functions:

  • doas2taus – Converts directions of arrival (xyz coordinates expressed in meters) into time differences of arrival (expressed in samples).

  • sphere – Generates cartesian coordinates (xyz) for a set of points forming a 3D sphere.

  • steering – Computes a steering vector from the time differences of arrival of each channel (in samples) and the number of bins (n_fft).

  • tdoas2taus – Selects the TDOAs of each channel and puts them in a tensor.

Reference

class speechbrain.processing.multi_mic.Covariance(average=True)[source]

Bases: torch.nn.modules.module.Module

Computes the covariance matrices of the signals.

Parameters

average (bool) – Informs the module whether it should return an average (computed on the time dimension) of the covariance matrices. The default value is True.

Example

>>> import torch
>>> from speechbrain.dataio.dataio import read_audio
>>> from speechbrain.processing.features import STFT
>>> from speechbrain.processing.multi_mic import Covariance
>>> xs_speech = read_audio(
...     'samples/audio_samples/multi_mic/speech_-0.82918_0.55279_-0.082918.flac'
... )
>>> xs_speech = xs_speech.unsqueeze(0) # [batch, time, channels]
>>> xs_noise = read_audio('samples/audio_samples/multi_mic/noise_diffuse.flac')
>>> xs_noise = xs_noise.unsqueeze(0)
>>> xs = xs_speech + 0.05 * xs_noise
>>> fs = 16000
>>> stft = STFT(sample_rate=fs)
>>> cov = Covariance()
>>> Xs = stft(xs)
>>> XXs = cov(Xs)
>>> XXs.shape
torch.Size([1, 1001, 201, 2, 10])

forward(Xs)[source]

This method uses the utility function _cov to compute covariance matrices. The result has the following format: (batch, time_step, n_fft/2 + 1, 2, n_mics + n_pairs).

The order on the last dimension corresponds to the triu_indices for a square matrix. For instance, if we have 4 channels, we get the following order: (0, 0), (0, 1), (0, 2), (0, 3), (1, 1), (1, 2), (1, 3), (2, 2), (2, 3) and (3, 3). Therefore, XXs[…, 0] corresponds to channels (0, 0) and XXs[…, 1] corresponds to channels (0, 1).
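For illustration, this ordering can be reproduced directly with torch.triu_indices:

>>> import torch
>>> idx = torch.triu_indices(4, 4)
>>> list(zip(idx[0].tolist(), idx[1].tolist()))
[(0, 0), (0, 1), (0, 2), (0, 3), (1, 1), (1, 2), (1, 3), (2, 2), (2, 3), (3, 3)]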

Parameters

Xs (tensor) – A batch of audio signals in the frequency domain. The tensor must have the following format: (batch, time_step, n_fft/2 + 1, 2, n_mics).

training: bool
class speechbrain.processing.multi_mic.DelaySum[source]

Bases: torch.nn.modules.module.Module

Performs delay and sum beamforming by using the TDOAs and the first channel as a reference.

Example

>>> import torch
>>> from speechbrain.dataio.dataio import read_audio
>>> from speechbrain.processing.features import STFT, ISTFT
>>> from speechbrain.processing.multi_mic import Covariance
>>> from speechbrain.processing.multi_mic import GccPhat, DelaySum
>>> xs_speech = read_audio(
...     'samples/audio_samples/multi_mic/speech_-0.82918_0.55279_-0.082918.flac'
... )
>>> xs_speech = xs_speech.unsqueeze(0) # [batch, time, channels]
>>> xs_noise = read_audio('samples/audio_samples/multi_mic/noise_diffuse.flac')
>>> xs_noise = xs_noise.unsqueeze(0) # [batch, time, channels]
>>> fs = 16000
>>> xs = xs_speech + 0.05 * xs_noise
>>> stft = STFT(sample_rate=fs)
>>> cov = Covariance()
>>> gccphat = GccPhat()
>>> delaysum = DelaySum()
>>> istft = ISTFT(sample_rate=fs)
>>> Xs = stft(xs)
>>> XXs = cov(Xs)
>>> tdoas = gccphat(XXs)
>>> Ys = delaysum(Xs, tdoas)
>>> ys = istft(Ys)

forward(Xs, localization_tensor, doa_mode=False, mics=None, fs=None, c=343.0)[source]

This method computes a steering vector from the TDOAs/DOAs and then calls the utility function _delaysum to perform beamforming. The result has the following format: (batch, time_step, n_fft/2 + 1, 2, 1). A conceptual sketch of the operation follows the parameter list below.

Parameters
  • Xs (tensor) – A batch of audio signals in the frequency domain. The tensor must have the following format: (batch, time_step, n_fft/2 + 1, 2, n_mics)

  • localization_tensor (tensor) – A tensor containing either time differences of arrival (TDOAs) (in samples) for each timestamp or directions of arrival (DOAs) (xyz coordinates in meters). If localization_tensor represents TDOAs, then its format is (batch, time_steps, n_mics + n_pairs). If localization_tensor represents DOAs, then its format is (batch, time_steps, 3)

  • doa_mode (bool) – The user needs to set this parameter to True if localization_tensor represents DOAs instead of TDOAs. Its default value is set to False.

  • mics (tensor) – The cartesian position (xyz coordinates in meters) of each microphone. The tensor must have the following format (n_mics, 3). This parameter is only mandatory when localization_tensor represents DOAs.

  • fs (int) – The sample rate in Hertz of the signals. This parameter is only mandatory when localization_tensor represents DOAs.

  • c (float) – The speed of sound in the medium. The speed is expressed in meters per second and the default value of this parameter is 343 m/s. This parameter is only used when localization_tensor represents DOAs.
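Conceptually, once the steering vector is known, delay-and-sum reduces to aligning the phase of each channel and averaging across microphones. The following is a minimal sketch of that idea (a hypothetical helper, not the module's internal code), assuming complex tensors rather than the (…, 2, n_mics) real/imaginary layout used by the module:

import torch

def delaysum_sketch(Xs, As):
    # Xs: (batch, time, n_fft/2 + 1, n_mics) complex spectra
    # As: steering vectors with the same shape and unit magnitude per bin
    # Multiplying by the conjugate steering phase aligns all channels;
    # the mean across microphones is the beamformed output.
    return (Xs * As.conj()).mean(dim=-1, keepdim=True)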

training: bool
class speechbrain.processing.multi_mic.Mvdr(eps=1e-20)[source]

Bases: torch.nn.modules.module.Module

Performs minimum variance distortionless response (MVDR) beamforming, using an input signal in the frequency domain, its covariance matrices, and TDOAs (to compute a steering vector).

Example

>>> import torch
>>> from speechbrain.dataio.dataio import read_audio
>>> from speechbrain.processing.features import STFT, ISTFT
>>> from speechbrain.processing.multi_mic import Covariance
>>> from speechbrain.processing.multi_mic import GccPhat, Mvdr
>>> xs_speech = read_audio(
...     'samples/audio_samples/multi_mic/speech_-0.82918_0.55279_-0.082918.flac'
... )
>>> xs_speech = xs_speech.unsqueeze(0) # [batch, time, channels]
>>> xs_noise = read_audio('samples/audio_samples/multi_mic/noise_diffuse.flac')
>>> xs_noise = xs_noise.unsqueeze(0) # [batch, time, channels]
>>> fs = 16000
>>> xs = xs_speech + 0.05 * xs_noise
>>> stft = STFT(sample_rate=fs)
>>> cov = Covariance()
>>> gccphat = GccPhat()
>>> mvdr = Mvdr()
>>> istft = ISTFT(sample_rate=fs)
>>> Xs = stft(xs)
>>> XXs = cov(Xs)
>>> tdoas = gccphat(XXs)
>>> Ys = mvdr(Xs, XXs, tdoas)
>>> ys = istft(Ys)

forward(Xs, XXs, localization_tensor, doa_mode=False, mics=None, fs=None, c=343.0)[source]

This method computes a steering vector before using the utility function _mvdr to perform beamforming. The result has the following format: (batch, time_step, n_fft/2 + 1, 2, 1). A sketch of the underlying closed form follows the parameter list below.

Parameters
  • Xs (tensor) – A batch of audio signals in the frequency domain. The tensor must have the following format: (batch, time_step, n_fft/2 + 1, 2, n_mics)

  • XXs (tensor) – The covariance matrices of the input signal. The tensor must have the format (batch, time_steps, n_fft/2 + 1, 2, n_mics + n_pairs)

  • localization_tensor (tensor) – A tensor containing either time differences of arrival (TDOAs) (in samples) for each timestamp or directions of arrival (DOAs) (xyz coordinates in meters). If localization_tensor represents TDOAs, then its format is (batch, time_steps, n_mics + n_pairs). If localization_tensor represents DOAs, then its format is (batch, time_steps, 3)

  • doa_mode (bool) – The user needs to set this parameter to True if localization_tensor represents DOAs instead of TDOAs. Its default value is set to False.

  • mics (tensor) – The cartesian position (xyz coordinates in meters) of each microphone. The tensor must have the following format (n_mics, 3). This parameter is only mandatory when localization_tensor represents DOAs.

  • fs (int) – The sample rate in Hertz of the signals. This parameter is only mandatory when localization_tensor represents DOAs.

  • c (float) – The speed of sound in the medium. The speed is expressed in meters per second and the default value of this parameter is 343 m/s. This parameter is only used when localization_tensor represents DOAs.
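As background, the MVDR weights for a single frequency bin follow the classical closed form w = R^-1 a / (a^H R^-1 a). A minimal per-bin sketch (a hypothetical helper, not the module's internal code):

import torch

def mvdr_weights_sketch(R, a, eps=1e-20):
    # R: (n_mics, n_mics) complex covariance matrix for one bin
    # a: (n_mics,) complex steering vector for the look direction
    R = R + eps * torch.eye(R.shape[0], dtype=R.dtype)  # regularization
    Rinv_a = torch.linalg.solve(R, a)                   # R^{-1} a
    return Rinv_a / (a.conj() @ Rinv_a)                 # distortionless scaling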

training: bool
class speechbrain.processing.multi_mic.Gev[source]

Bases: torch.nn.modules.module.Module

Generalized EigenValue decomposition (GEV) Beamforming.

Example

>>> import torch
>>> from speechbrain.dataio.dataio import read_audio
>>> from speechbrain.processing.features import STFT, ISTFT
>>> from speechbrain.processing.multi_mic import Covariance
>>> from speechbrain.processing.multi_mic import Gev
>>> xs_speech = read_audio(
...     'samples/audio_samples/multi_mic/speech_-0.82918_0.55279_-0.082918.flac'
... )
>>> xs_speech = xs_speech.unsqueeze(0) # [batch, time, channels]
>>> xs_noise = read_audio('samples/audio_samples/multi_mic/noise_0.70225_-0.70225_0.11704.flac')
>>> xs_noise = xs_noise.unsqueeze(0)
>>> fs = 16000
>>> ss = xs_speech
>>> nn = 0.05 * xs_noise
>>> xs = ss + nn
>>> stft = STFT(sample_rate=fs)
>>> cov = Covariance()
>>> gev = Gev()
>>> istft = ISTFT(sample_rate=fs)
>>> Ss = stft(ss)
>>> Nn = stft(nn)
>>> Xs = stft(xs)
>>> SSs = cov(Ss)
>>> NNs = cov(Nn)
>>> Ys = gev(Xs, SSs, NNs)
>>> ys = istft(Ys)

forward(Xs, SSs, NNs)[source]

This method uses the utility function _gev to perform generalized eigenvalue decomposition beamforming. The result has the following format: (batch, time_step, n_fft/2 + 1, 2, 1). A sketch of the underlying eigenproblem follows the parameter list below.

Parameters
  • Xs (tensor) – A batch of audio signals in the frequency domain. The tensor must have the following format: (batch, time_step, n_fft/2 + 1, 2, n_mics).

  • SSs (tensor) – The covariance matrices of the target signal. The tensor must have the format (batch, time_steps, n_fft/2 + 1, 2, n_mics + n_pairs).

  • NNs (tensor) – The covariance matrices of the noise signal. The tensor must have the format (batch, time_steps, n_fft/2 + 1, 2, n_mics + n_pairs).
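For intuition, GEV beamforming picks, for each frequency bin, the generalized eigenvector of the pair (SS, NN) with the largest eigenvalue, which maximizes the output SNR. A minimal per-bin sketch (a hypothetical helper; real implementations also normalize the arbitrary phase of the eigenvector):

import torch

def gev_weights_sketch(SS, NN):
    # SS, NN: (n_mics, n_mics) complex speech/noise covariance matrices
    # Solving NN^{-1} SS turns the generalized problem into a standard one.
    vals, vecs = torch.linalg.eig(torch.linalg.solve(NN, SS))
    return vecs[:, torch.argmax(vals.abs())]  # principal eigenvector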

training: bool
class speechbrain.processing.multi_mic.GccPhat(tdoa_max=None, eps=1e-20)[source]

Bases: torch.nn.modules.module.Module

Generalized Cross-Correlation with Phase Transform localization.

Parameters
  • tdoa_max (int) – Specifies a range to search for delays. For example, if tdoa_max = 10, the method will restrict its search for delays between -10 and 10 samples. This parameter is optional and its default value is None. When tdoa_max is None, the method will search for delays between -n_fft/2 and n_fft/2 (full range).

  • eps (float) – A small value to avoid divisions by 0 with the phase transformation. The default value is 1e-20.

Example

>>> import torch
>>> from speechbrain.dataio.dataio import read_audio
>>> from speechbrain.processing.features import STFT
>>> from speechbrain.processing.multi_mic import Covariance
>>> from speechbrain.processing.multi_mic import GccPhat
>>> xs_speech = read_audio(
...     'samples/audio_samples/multi_mic/speech_-0.82918_0.55279_-0.082918.flac'
... )
>>> xs_speech = xs_speech.unsqueeze(0) # [batch, time, channels]
>>> xs_noise = read_audio('samples/audio_samples/multi_mic/noise_diffuse.flac')
>>> xs_noise = xs_noise.unsqueeze(0) # [batch, time, channels]
>>> fs = 16000
>>> xs = xs_speech + 0.05 * xs_noise
>>> stft = STFT(sample_rate=fs)
>>> cov = Covariance()
>>> gccphat = GccPhat()
>>> Xs = stft(xs)
>>> XXs = cov(Xs)
>>> tdoas = gccphat(XXs)

forward(XXs)[source]

Performs generalized cross-correlation with phase transform (GCC-PHAT) localization using the utility function _gcc_phat: the delays (in samples) are extracted and then refined with a quadratic interpolation to improve accuracy. The result has the format: (batch, time_steps, n_mics + n_pairs). A conceptual sketch of the phase transform follows the parameter description below.

The order on the last dimension corresponds to the triu_indices for a square matrix. For instance, if we have 4 channels, we get the following order: (0, 0), (0, 1), (0, 2), (0, 3), (1, 1), (1, 2), (1, 3), (2, 2), (2, 3) and (3, 3). Therefore, delays[…, 0] corresponds to channels (0, 0) and delays[…, 1] corresponds to channels (0, 1).

Parameters

XXs (tensor) – The covariance matrices of the input signal. The tensor must have the format (batch, time_steps, n_fft/2 + 1, 2, n_mics + n_pairs).
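For intuition, the phase transform discards magnitude and keeps only phase, so the inverse transform of the normalized cross-spectrum peaks sharply at the true delay. A minimal sketch for one microphone pair (a hypothetical helper, without the quadratic interpolation step):

import torch

def gcc_phat_sketch(XX, eps=1e-20):
    # XX: (n_fft/2 + 1,) complex cross-spectrum of one mic pair
    XX = XX / (XX.abs() + eps)    # phase transform: unit magnitude per bin
    xcorr = torch.fft.irfft(XX)   # cross-correlation in the time domain
    return torch.argmax(xcorr)    # delay in samples (negative delays wrap around)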

training: bool
class speechbrain.processing.multi_mic.SrpPhat(mics, space='sphere', sample_rate=16000, speed_sound=343.0, eps=1e-20)[source]

Bases: torch.nn.modules.module.Module

Steered-Response Power with Phase Transform Localization.

Parameters
  • mics (tensor) – The cartesian coordinates (xyz) in meters of each microphone. The tensor must have the following format (n_mics, 3).

  • space (string) – If this parameter is set to ‘sphere’, the localization will be done in 3D by searching in a sphere of possible DOAs. If it is set to ‘circle’, the search will be done in 2D by searching in a circle. By default, this parameter is set to ‘sphere’. Note: the ‘circle’ option isn’t implemented yet.

  • sample_rate (int) – The sample rate in Hertz of the signals to perform SRP-PHAT on. By default, this parameter is set to 16000 Hz.

  • speed_sound (float) – The speed of sound in the medium. The speed is expressed in meters per second and the default value of this parameter is 343 m/s.

  • eps (float) – A small value to avoid errors like division by 0. The default value of this parameter is 1e-20.

Example

>>> import torch
>>> from speechbrain.dataio.dataio import read_audio
>>> from speechbrain.processing.features import STFT
>>> from speechbrain.processing.multi_mic import Covariance
>>> from speechbrain.processing.multi_mic import SrpPhat
>>> xs_speech = read_audio('samples/audio_samples/multi_mic/speech_-0.82918_0.55279_-0.082918.flac')
>>> xs_noise = read_audio('samples/audio_samples/multi_mic/noise_diffuse.flac')
>>> fs = 16000
>>> xs_speech = xs_speech.unsqueeze(0) # [batch, time, channels]
>>> xs_noise = xs_noise.unsqueeze(0)
>>> ss1 = xs_speech
>>> ns1 = 0.05 * xs_noise
>>> xs1 = ss1 + ns1
>>> ss2 = xs_speech
>>> ns2 = 0.20 * xs_noise
>>> xs2 = ss2 + ns2
>>> ss = torch.cat((ss1, ss2), dim=0)
>>> ns = torch.cat((ns1, ns2), dim=0)
>>> xs = torch.cat((xs1, xs2), dim=0)
>>> mics = torch.zeros((4, 3), dtype=torch.float)
>>> mics[0,:] = torch.FloatTensor([-0.05, -0.05, +0.00])
>>> mics[1,:] = torch.FloatTensor([-0.05, +0.05, +0.00])
>>> mics[2,:] = torch.FloatTensor([+0.05, +0.05, +0.00])
>>> mics[3,:] = torch.FloatTensor([+0.05, -0.05, +0.00])
>>> stft = STFT(sample_rate=fs)
>>> cov = Covariance()
>>> srpphat = SrpPhat(mics=mics)
>>> Xs = stft(xs)
>>> XXs = cov(Xs)
>>> doas = srpphat(XXs)

forward(XXs)[source]

Performs SRP-PHAT localization on a signal by computing a steering vector and then using the utility function _srp_phat to extract the DOAs. The result is a tensor containing the directions of arrival (xyz coordinates, in meters, pointing toward the sound source). The output tensor has the format (batch, time_steps, 3). A conceptual sketch of the scoring follows the parameter description below.

This localization method uses Global Coherence Field (GCF): https://www.researchgate.net/publication/221491705_Speaker_localization_based_on_oriented_global_coherence_field

Parameters

XXs (tensor) – The covariance matrices of the input signal. The tensor must have the format (batch, time_steps, n_fft/2 + 1, 2, n_mics + n_pairs).
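Conceptually, SRP-PHAT scans a grid of candidate directions and scores each one by how well the observed cross-spectra align with that direction's steering phases. A minimal single-frame sketch (a hypothetical helper, not the module's internal code):

import torch

def srp_phat_sketch(XXs_phat, As, doas):
    # XXs_phat: (n_pairs, n_fft/2 + 1) PHAT-normalized complex cross-spectra
    # As: (n_doas, n_pairs, n_fft/2 + 1) complex steering phases per candidate
    # doas: (n_doas, 3) candidate directions (e.g., from sphere())
    powers = (XXs_phat.unsqueeze(0) * As.conj()).sum(dim=(1, 2)).real
    return doas[torch.argmax(powers)]  # best-scoring direction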

training: bool
class speechbrain.processing.multi_mic.Music(mics, space='sphere', sample_rate=16000, speed_sound=343.0, eps=1e-20, n_sig=1)[source]

Bases: torch.nn.modules.module.Module

Multiple Signal Classification (MUSIC) localization.

Parameters
  • mics (tensor) – The cartesian coordinates (xyz) in meters of each microphone. The tensor must have the following format (n_mics, 3).

  • space (string) – If this parameter is set to ‘sphere’, the localization will be done in 3D by searching in a sphere of possible DOAs. If it is set to ‘circle’, the search will be done in 2D by searching in a circle. By default, this parameter is set to ‘sphere’. Note: the ‘circle’ option isn’t implemented yet.

  • sample_rate (int) – The sample rate in Hertz of the signals to perform MUSIC on. By default, this parameter is set to 16000 Hz.

  • speed_sound (float) – The speed of sound in the medium. The speed is expressed in meters per second and the default value of this parameter is 343 m/s.

  • eps (float) – A small value to avoid errors like division by 0. The default value of this parameter is 1e-20.

  • n_sig (int) – An estimation of the number of sound sources. The default value is set to one source.

Example

>>> import torch
>>> from speechbrain.dataio.dataio import read_audio
>>> from speechbrain.processing.features import STFT
>>> from speechbrain.processing.multi_mic import Covariance
>>> from speechbrain.processing.multi_mic import Music
>>> xs_speech = read_audio('samples/audio_samples/multi_mic/speech_-0.82918_0.55279_-0.082918.flac')
>>> xs_noise = read_audio('samples/audio_samples/multi_mic/noise_diffuse.flac')
>>> fs = 16000
>>> xs_speech = xs_speech.unsqueeze(0) # [batch, time, channels]
>>> xs_noise = xs_noise.unsqueeze(0)
>>> ss1 = xs_speech
>>> ns1 = 0.05 * xs_noise
>>> xs1 = ss1 + ns1
>>> ss2 = xs_speech
>>> ns2 = 0.20 * xs_noise
>>> xs2 = ss2 + ns2
>>> ss = torch.cat((ss1, ss2), dim=0)
>>> ns = torch.cat((ns1, ns2), dim=0)
>>> xs = torch.cat((xs1, xs2), dim=0)
>>> mics = torch.zeros((4, 3), dtype=torch.float)
>>> mics[0,:] = torch.FloatTensor([-0.05, -0.05, +0.00])
>>> mics[1,:] = torch.FloatTensor([-0.05, +0.05, +0.00])
>>> mics[2,:] = torch.FloatTensor([+0.05, +0.05, +0.00])
>>> mics[3,:] = torch.FloatTensor([+0.05, -0.05, +0.00])
>>> stft = STFT(sample_rate=fs)
>>> cov = Covariance()
>>> music = Music(mics=mics)
>>> Xs = stft(xs)
>>> XXs = cov(Xs)
>>> doas = music(XXs)

forward(XXs)[source]

Performs MUSIC localization on a signal by computing a steering vector and then using the utility function _music to extract the DOAs. The result is a tensor containing the directions of arrival (xyz coordinates, in meters, pointing toward the sound source). The output tensor has the format (batch, time_steps, 3). A conceptual sketch of the subspace scoring follows the parameter description below.

Parameters

XXs (tensor) – The covariance matrices of the input signal. The tensor must have the format (batch, time_steps, n_fft/2 + 1, 2, n_mics + n_pairs).
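For intuition, MUSIC splits the covariance matrix into signal and noise subspaces and scores candidate directions by how nearly orthogonal their steering vectors are to the noise subspace. A minimal per-bin sketch (a hypothetical helper, not the module's internal code):

import torch

def music_spectrum_sketch(R, A, n_sig=1):
    # R: (n_mics, n_mics) complex covariance matrix for one bin
    # A: (n_doas, n_mics) complex steering vectors of the candidate DOAs
    _, vecs = torch.linalg.eigh(R)             # eigenvalues in ascending order
    En = vecs[:, : R.shape[0] - n_sig]         # noise subspace
    proj = A.conj() @ En                       # projection onto noise subspace
    return 1.0 / (proj.abs() ** 2).sum(dim=1)  # pseudo-spectrum, peaks at sources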

training: bool
speechbrain.processing.multi_mic.doas2taus(doas, mics, fs, c=343.0)[source]

This function converts directions of arrival (xyz coordinates expressed in meters) into time differences of arrival (expressed in samples). The result has the following format: (batch, time_steps, n_mics).

Parameters
  • doas (tensor) – The directions of arrival expressed with cartesian coordinates (xyz) in meters. The tensor must have the following format: (batch, time_steps, 3).

  • mics (tensor) – The cartesian position (xyz) in meters of each microphone. The tensor must have the following format (n_mics, 3).

  • fs (int) – The sample rate in Hertz of the signals.

  • c (float) – The speed of sound in the medium. The speed is expressed in meters per second and the default value of this parameter is 343 m/s.

Example

>>> import torch
>>> from speechbrain.processing.multi_mic import sphere, doas2taus
>>> fs = 16000
>>> mics = torch.zeros((4, 3), dtype=torch.float)
>>> mics[0,:] = torch.FloatTensor([-0.05, -0.05, +0.00])
>>> mics[1,:] = torch.FloatTensor([-0.05, +0.05, +0.00])
>>> mics[2,:] = torch.FloatTensor([+0.05, +0.05, +0.00])
>>> mics[3,:] = torch.FloatTensor([+0.05, -0.05, +0.00])
>>> doas = sphere()
>>> taus = doas2taus(doas, mics, fs)
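Under a far-field (plane-wave) assumption, the conversion amounts to projecting each microphone position onto the unit direction vector and rescaling to samples. A minimal sketch (a hypothetical helper; the library's sign convention may differ):

import torch

def doas2taus_sketch(doas, mics, fs, c=343.0):
    # doas: (..., 3) unit direction vectors; mics: (n_mics, 3) positions in meters
    # The projection gives path-length differences in meters; fs / c converts
    # them to delays in samples.
    return (fs / c) * torch.matmul(doas, mics.t())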

speechbrain.processing.multi_mic.tdoas2taus(tdoas)[source]

This function selects the TDOAs of each channel and puts them in a tensor. The result has the following format: (batch, time_steps, n_mics).

Parameters

tdoas (tensor) – The time differences of arrival (TDOAs) (in samples) for each timestamp. The tensor has the format (batch, time_steps, n_mics + n_pairs).

Example

>>> import torch
>>> from speechbrain.dataio.dataio import read_audio
>>> from speechbrain.processing.features import STFT
>>> from speechbrain.processing.multi_mic import Covariance
>>> from speechbrain.processing.multi_mic import GccPhat, tdoas2taus
>>> xs_speech = read_audio(
...     'samples/audio_samples/multi_mic/speech_-0.82918_0.55279_-0.082918.flac'
... )
>>> xs_noise = read_audio('samples/audio_samples/multi_mic/noise_diffuse.flac')
>>> xs = xs_speech + 0.05 * xs_noise
>>> xs = xs.unsqueeze(0) # [batch, time, channels]
>>> fs = 16000
>>> stft = STFT(sample_rate=fs)
>>> cov = Covariance()
>>> gccphat = GccPhat()
>>> Xs = stft(xs)
>>> XXs = cov(Xs)
>>> tdoas = gccphat(XXs)
>>> taus = tdoas2taus(tdoas)
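For intuition: in triu order the pairs (0, 0), (0, 1), …, (0, n_mics - 1) come first, so the per-channel delays relative to the first microphone are simply the first n_mics entries of the last dimension. A minimal sketch (a hypothetical helper):

def tdoas2taus_sketch(tdoas):
    # The last dimension holds n_mics + n_pairs = n * (n + 1) / 2 entries;
    # recover n from that count, then keep the (0, m) pairs.
    n_total = tdoas.shape[-1]
    n_mics = int(((1 + 8 * n_total) ** 0.5 - 1) / 2)
    return tdoas[..., :n_mics]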

speechbrain.processing.multi_mic.steering(taus, n_fft)[source]

This function computes a steering vector by using the time differences of arrival for each channel (in samples) and the number of bins (n_fft). The result has the following format: (batch, time_step, n_fft/2 + 1, 2, n_mics).

Parameters
  • taus (tensor) – The time differences of arrival for each channel. The tensor must have the following format: (batch, time_steps, n_mics).

  • n_fft (int) – The number of bins resulting from the STFT. It is assumed that the argument “onesided” was set to True when computing the STFT.

Example

>>> import torch
>>> from speechbrain.dataio.dataio import read_audio
>>> from speechbrain.processing.features import STFT
>>> from speechbrain.processing.multi_mic import Covariance
>>> from speechbrain.processing.multi_mic import GccPhat, tdoas2taus, steering
>>> xs_speech = read_audio(
...     'samples/audio_samples/multi_mic/speech_-0.82918_0.55279_-0.082918.flac'
... )
>>> xs_noise = read_audio('samples/audio_samples/multi_mic/noise_diffuse.flac')
>>> xs = xs_speech + 0.05 * xs_noise
>>> xs = xs.unsqueeze(0) # [batch, time, channels]
>>> fs = 16000
>>> stft = STFT(sample_rate=fs)
>>> cov = Covariance()
>>> gccphat = GccPhat()
>>> Xs = stft(xs)
>>> n_fft = Xs.shape[2]
>>> XXs = cov(Xs)
>>> tdoas = gccphat(XXs)
>>> taus = tdoas2taus(tdoas)
>>> As = steering(taus, n_fft)
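As background, the steering vector applies, for each frequency bin k, a phase rotation exp(-j * 2 * pi * k * tau / N), where N is the full FFT length implied by the onesided STFT. A minimal sketch of that computation (a hypothetical helper, stacking real and imaginary parts as in the module's (…, 2, n_mics) layout):

import math
import torch

def steering_sketch(taus, n_fft):
    # taus: (batch, time_steps, n_mics) delays in samples
    # n_fft: number of onesided bins, so the full FFT length is 2 * (n_fft - 1)
    N = 2 * (n_fft - 1)
    k = torch.arange(n_fft).view(1, 1, n_fft, 1)        # frequency bin indices
    phase = -2.0 * math.pi * k * taus.unsqueeze(2) / N  # per bin and per mic
    return torch.stack((torch.cos(phase), torch.sin(phase)), dim=3)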

speechbrain.processing.multi_mic.sphere(levels_count=4)[source]

This function generates cartesian coordinates (xyz) for a set of points forming a 3D sphere. The coordinates are expressed in meters and can be used as doas. The result has the format: (n_points, 3).

Parameters

levels_count (int) –

A number proportional to the number of points that the user wants to generate.

  • If levels_count = 1, then the sphere will have 42 points

  • If levels_count = 2, then the sphere will have 162 points

  • If levels_count = 3, then the sphere will have 642 points

  • If levels_count = 4, then the sphere will have 2562 points

  • If levels_count = 5, then the sphere will have 10242 points

By default, levels_count is set to 4.
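These counts follow the icosphere subdivision pattern 10 * 4**levels_count + 2, which can be checked directly:

>>> [10 * 4 ** k + 2 for k in range(1, 6)]
[42, 162, 642, 2562, 10242]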

Example

>>> import torch
>>> from speechbrain.processing.multi_mic import sphere
>>> doas = sphere()