speechbrain.lobes.models.Xvector module

A popular speaker recognition and diarization model.

Authors
  • Nauman Dawalatabad 2020

  • Mirco Ravanelli 2020

Summary

Classes:

Classifier

This class implements the final MLP on top of x-vector features.

Discriminator

This class implements a discriminator on top of x-vector features.

Xvector

This model extracts X-vectors for speaker recognition and diarization.

Reference

class speechbrain.lobes.models.Xvector.Xvector(device='cpu', activation=<class 'torch.nn.modules.activation.LeakyReLU'>, tdnn_blocks=5, tdnn_channels=[512, 512, 512, 512, 1500], tdnn_kernel_sizes=[5, 3, 3, 1, 1], tdnn_dilations=[1, 2, 3, 1, 1], lin_neurons=512, in_channels=40)[source]

Bases: torch.nn.modules.module.Module

This model extracts X-vectors for speaker recognition and diarization.

Parameters
  • device (str) – Device used, e.g., “cpu” or “cuda”.

  • activation (torch class) – A class for constructing the activation layers.

  • tdnn_blocks (int) – Number of time-delay neural (TDNN) layers.

  • tdnn_channels (list of ints) – Output channels for each TDNN layer.

  • tdnn_kernel_sizes (list of ints) – List of kernel sizes for each TDNN layer.

  • tdnn_dilations (list of ints) – List of dilations for kernels in each TDNN layer.

  • lin_neurons (int) – Number of neurons in linear layers.

  • in_channels (int) – Number of input feature channels (e.g., 40 for 40-dimensional filterbank features).
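
The arguments above let the TDNN stack be resized, provided the block count and the channel, kernel-size, and dilation lists stay consistent. The configuration below is a purely illustrative sketch (it is not a SpeechBrain recipe): three TDNN blocks and 256-dimensional embeddings, with all values chosen only for demonstration.

>>> import torch
>>> from speechbrain.lobes.models.Xvector import Xvector
>>> # Illustrative smaller configuration: 3 TDNN blocks, 256-dim embeddings
>>> small_xvect = Xvector(
...     tdnn_blocks=3,
...     tdnn_channels=[256, 256, 768],
...     tdnn_kernel_sizes=[5, 3, 1],
...     tdnn_dilations=[1, 2, 1],
...     lin_neurons=256,
...     in_channels=40,
... )
>>> outputs = small_xvect(torch.rand([5, 10, 40]))
>>> outputs.shape
torch.Size([5, 1, 256])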

Example

>>> compute_xvect = Xvector('cpu')
>>> input_feats = torch.rand([5, 10, 40])
>>> outputs = compute_xvect(input_feats)
>>> outputs.shape
torch.Size([5, 1, 512])
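
For speaker verification, the embeddings returned above can be compared directly. The sketch below (not part of the original documentation) scores two utterances with cosine similarity; the random feature tensors and the 0.25 decision threshold are illustrative assumptions, and the threshold would normally be tuned on a development set.

>>> import torch
>>> from speechbrain.lobes.models.Xvector import Xvector
>>> compute_xvect = Xvector('cpu')
>>> feats_a = torch.rand([1, 120, 40])            # features of the enrollment utterance
>>> feats_b = torch.rand([1, 80, 40])             # features of the test utterance
>>> xvect_a = compute_xvect(feats_a).squeeze(1)   # [1, 512]
>>> xvect_b = compute_xvect(feats_b).squeeze(1)   # [1, 512]
>>> score = torch.nn.functional.cosine_similarity(xvect_a, xvect_b)
>>> same_speaker = score > 0.25                   # illustrative threshold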

forward(x, lens=None)[source]

Returns the x-vectors.

Parameters

  • x (torch.Tensor) – Input features of shape [batch, time, features].

  • lens (torch.Tensor) – Relative lengths of each utterance in the batch (optional).
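
When a batch contains zero-padded utterances, the optional lens argument can carry the relative length of each one. A minimal sketch, assuming the usual SpeechBrain convention of lengths expressed as fractions of the longest utterance; the values here are placeholders.

>>> import torch
>>> from speechbrain.lobes.models.Xvector import Xvector
>>> compute_xvect = Xvector('cpu')
>>> feats = torch.rand([3, 100, 40])        # 3 utterances, zero-padded to 100 frames
>>> lens = torch.tensor([1.0, 0.8, 0.5])    # assumed relative lengths (fraction of max)
>>> xvects = compute_xvect(feats, lens)
>>> xvects.shape
torch.Size([3, 1, 512])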

training: bool
class speechbrain.lobes.models.Xvector.Classifier(input_shape, activation=<class 'torch.nn.modules.activation.LeakyReLU'>, lin_blocks=1, lin_neurons=512, out_neurons=1211)[source]

Bases: speechbrain.nnet.containers.Sequential

This class implements the final MLP on top of x-vector features.

Parameters
  • input_shape (tuple) – Expected shape of an example input.

  • activation (torch class) – A class for constructing the activation layers.

  • lin_blocks (int) – Number of linear layers.

  • lin_neurons (int) – Number of neurons in linear layers.

  • out_neurons (int) – Number of output neurons.

Example

>>> input_feats = torch.rand([5, 10, 40])
>>> compute_xvect = Xvector()
>>> xvects = compute_xvect(input_feats)
>>> classify = Classifier(input_shape=xvects.shape)
>>> output = classify(xvects)
>>> output.shape
torch.Size([5, 1, 1211])
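
A minimal closed-set identification sketch (not from the original documentation): the classifier scores each x-vector over the speaker inventory and the highest-scoring index is taken as the predicted speaker. All tensors here are random placeholders.

>>> import torch
>>> from speechbrain.lobes.models.Xvector import Xvector, Classifier
>>> compute_xvect = Xvector()
>>> feats = torch.rand([5, 10, 40])               # batch of 5 utterances
>>> xvects = compute_xvect(feats)                 # [5, 1, 512]
>>> classify = Classifier(input_shape=xvects.shape, out_neurons=1211)
>>> scores = classify(xvects)                     # [5, 1, 1211]
>>> predicted_speaker = scores.squeeze(1).argmax(dim=-1)   # one speaker index per utterance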

training: bool
class speechbrain.lobes.models.Xvector.Discriminator(input_shape, activation=<class 'torch.nn.modules.activation.LeakyReLU'>, lin_blocks=1, lin_neurons=512, out_neurons=1)[source]

Bases: speechbrain.nnet.containers.Sequential

This class implements a discriminator on top of x-vector features.

Parameters
  • input_shape (tuple) – Expected shape of an example input.

  • activation (torch class) – A class for constructing the activation layers.

  • lin_blocks (int) – Number of linear layers.

  • lin_neurons (int) – Number of neurons in linear layers.

  • out_neurons (int) – Number of output neurons.

Example

>>> input_feats = torch.rand([5, 10, 40])
>>> compute_xvect = Xvector()
>>> xvects = compute_xvect(input_feats)
>>> discriminate = Discriminator(xvects.shape)
>>> output = discriminate(xvects)
>>> output.shape
torch.Size([5, 1, 1])
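
In adversarial setups the discriminator can be trained to separate “real” x-vectors from generated ones. The sketch below is an assumption-laden illustration: it treats the single output neuron as an unnormalized score and therefore uses a with-logits BCE loss; if the model ends with a sigmoid, plain BCE would be used instead. The tensors and labels are placeholders.

>>> import torch
>>> from speechbrain.lobes.models.Xvector import Xvector, Discriminator
>>> compute_xvect = Xvector()
>>> real_feats = torch.rand([5, 10, 40])
>>> real_xvects = compute_xvect(real_feats)                 # [5, 1, 512]
>>> discriminate = Discriminator(real_xvects.shape)
>>> scores = discriminate(real_xvects).reshape(-1)          # [5] raw scores
>>> targets = torch.ones(5)                                 # label 1 = real x-vectors
>>> loss = torch.nn.functional.binary_cross_entropy_with_logits(scores, targets)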

training: bool