Representation learning-based features

Reconstruction error between the input and decoded spectrograms

Created on Dec 18 2020


class replearning.RepLearning(model)

Feature extraction from speech signals based on representation learning strategies using convolutional and recurrent autoencoders

Two types of features are computed:

  1. 256 features extracted from the bottleneck layer of the autoencoders
  2. 128 features based on the MSE between the decoded and input spectrograms of the autoencoder in different frequency regions

Additionally, static (one vector per utterance) or dynamic (one vector per 500 ms speech segment) features can be computed:

- The static feature vector is formed with 1536 features and contains (384 descriptors) x (4 functionals: mean, std, skewness, kurtosis)
- The dynamic feature matrix is formed with the 384 descriptors computed for speech segments of 500 ms length with a 250 ms time-shift
- You can choose between features computed from a convolutional or a recurrent autoencoder
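The static vector is simply the four functionals stacked per descriptor. A minimal numpy sketch (the function name, segment count, and descriptor count are illustrative, not part of the library API):

```python
import numpy as np

def static_from_dynamic(feat):
    """Collapse a (n_frames, n_descriptors) dynamic matrix into one
    static vector: mean, std, skewness, kurtosis per descriptor."""
    mu = feat.mean(axis=0)
    sd = feat.std(axis=0)
    z = (feat - mu) / sd
    skew = (z ** 3).mean(axis=0)            # third standardized moment
    kurt = (z ** 4).mean(axis=0) - 3.0      # excess kurtosis
    return np.hstack([mu, sd, skew, kurt])

# e.g. 10 half-second segments x 384 descriptors -> 384 * 4 values
X = np.random.randn(10, 384)
print(static_from_dynamic(X).shape)  # (1536,)
```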

The script is called as follows

>>> python replearning.py <file_or_folder_audio> <file_features> <static (true or false)> <plots (true or false)> <format (csv, txt, npy, kaldi, torch)> <model (CAE, RAE)>

Command-line examples:

>>> python replearning.py "../audios/001_ddk1_PCGITA.wav" "replearningfeaturesDDKst.txt" "true" "true" "txt" "CAE"
>>> python replearning.py "../audios/001_ddk1_PCGITA.wav" "replearningfeaturesDDKdyn.pt" "false" "true" "torch" "CAE"
>>> python replearning.py "../audios/" "replearningfeaturesst.txt" "true" "false" "txt" "CAE"
>>> python replearning.py "../audios/" "replearningfeaturesst.csv" "true" "false" "csv" "CAE"
>>> python replearning.py "../audios/" "replearningfeaturesdyn.pt" "false" "false" "torch" "CAE"

Examples directly in Python:

>>> from replearning import RepLearning
>>> replearning=RepLearning('CAE')
>>> file_audio="../audios/001_a1_PCGITA.wav"
>>> features1=replearning.extract_features_file(file_audio, static=True, plots=True, fmt="npy")
>>> features2=replearning.extract_features_file(file_audio, static=True, plots=True, fmt="dataframe")
>>> features3=replearning.extract_features_file(file_audio, static=False, plots=True, fmt="torch")
>>> replearning.extract_features_file(file_audio, static=False, plots=False, fmt="kaldi", kaldi_file="./test")
extract_features_file(audio, static=True, plots=False, fmt='npy', kaldi_file='')

Extract the representation learning features from an audio file

Parameters:
  • audio – .wav audio file.
  • static – whether to compute and return statistical functionals (mean, std, skewness, kurtosis) over the feature matrix, or to return the feature matrix computed over frames
  • plots – whether to plot the input and decoded spectrograms
  • fmt – format to return the features (npy, dataframe, torch, kaldi)
  • kaldi_file – file to store kaldi features; only valid when fmt=="kaldi"
Returns:

features computed from the audio file.

>>> replearning=RepLearning('CAE')
>>> file_audio="../audios/001_ddk1_PCGITA.wav"
>>> features1=replearning.extract_features_file(file_audio, static=True, plots=True, fmt="npy")
>>> features2=replearning.extract_features_file(file_audio, static=True, plots=True, fmt="dataframe")
>>> features3=replearning.extract_features_file(file_audio, static=False, plots=True, fmt="torch")
>>> replearning.extract_features_file(file_audio, static=False, plots=False, fmt="kaldi", kaldi_file="./test")
extract_features_path(path_audio, static=True, plots=False, fmt='npy', kaldi_file='')

Extract the representation learning features for audios inside a path

Parameters:
  • path_audio – directory with (.wav) audio files inside, sampled at 16 kHz
  • static – whether to compute and return statistical functionals (mean, std, skewness, kurtosis) over the feature matrix, or to return the feature matrix computed over frames
  • plots – whether to plot the input and decoded spectrograms
  • fmt – format to return the features (npy, dataframe, torch, kaldi)
  • kaldi_file – file to store kaldi features; only valid when fmt=="kaldi"
Returns:

features computed from the audio files inside the path.

>>> replearning=RepLearning('CAE')
>>> path_audio="../audios/"
>>> features1=replearning.extract_features_path(path_audio, static=True, plots=False, fmt="npy")
>>> features2=replearning.extract_features_path(path_audio, static=True, plots=False, fmt="csv")
>>> features3=replearning.extract_features_path(path_audio, static=False, plots=True, fmt="torch")
>>> replearning.extract_features_path(path_audio, static=False, plots=False, fmt="kaldi", kaldi_file="./test.ark")
class replearning.AEspeech(model, units)
compute_bottleneck_features(wav_file, return_numpy=True)

Compute the bottleneck features of the autoencoder

Parameters:
  • wav_file – .wav file with a sampling frequency of 16 kHz
  • return_numpy – return the features as a numpy array (True) or a Pytorch tensor (False)
Returns:

Pytorch tensor or numpy array (nf, h) with the extracted features. nf: number of frames, h: size of the bottleneck space

compute_dynamic_features(wav_directory)

Compute both the bottleneck and the reconstruction error features from the autoencoder for the wav files inside a directory

Parameters:
  • wav_directory – directory with .wav files sampled at 16 kHz
Returns:

dictionary with the extracted bottleneck and error features, and with information about which frame corresponds to which wav file in the directory

compute_global_features(wav_directory, stack_feat=False)

Compute global features (1 vector per utterance), both for the bottleneck and the reconstruction error features from the autoencoder, for the wav files inside a directory

Parameters:
  • wav_directory – directory with .wav files sampled at 16 kHz
  • stack_feat – if True, also returns a feature matrix with the bottleneck and error features stacked column-wise
Returns:

pandas dataframes with the bottleneck and error features
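The stack_feat=True behaviour can be sketched as a column-wise concatenation of the two per-utterance dataframes; the column names and values below are hypothetical placeholders, not the library's real output:

```python
import pandas as pd

# Hypothetical per-utterance features: one row per wav file.
bottleneck = pd.DataFrame({"bn_0": [0.1, 0.2], "bn_1": [0.3, 0.4]})
error = pd.DataFrame({"err_0": [0.5, 0.6]})

# Stack bottleneck and reconstruction-error features side by side.
stacked = pd.concat([bottleneck, error], axis=1)
print(list(stacked.columns))  # ['bn_0', 'bn_1', 'err_0']
```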

compute_rec_error_features(wav_file, return_numpy=True)

Compute the reconstruction error features from the autoencoder

Parameters:
  • wav_file – .wav file with a sampling frequency of 16 kHz
  • return_numpy – return the features as a numpy array (True) or a Pytorch tensor (False)
Returns:

Pytorch tensor or numpy array (nf, 128) with the extracted features. nf: number of frames
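One way the 128 error values per frame can arise is a per-Mel-band MSE between the input and decoded spectrograms; a numpy sketch, assuming (N, 1, 128, T)-shaped tensors (the function name is illustrative and the real banding may differ):

```python
import numpy as np

def band_mse(inp, dec):
    """Per-frequency-band MSE between input and decoded spectrograms.
    inp, dec: arrays of shape (N, 1, 128, T). Returns (N, 128): one
    error value per spectrogram chunk and Mel band."""
    err = (inp - dec) ** 2
    return err.mean(axis=(1, 3))  # average over channel and time

inp = np.random.rand(4, 1, 128, 126)
dec = np.random.rand(4, 1, 128, 126)
print(band_mse(inp, dec).shape)  # (4, 128)
```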

compute_rec_spectrogram(wav_file, return_numpy=True)

Compute the reconstructed spectrograms from the autoencoder

Parameters:
  • wav_file – .wav file with a sampling frequency of 16 kHz
  • return_numpy – return the spectrograms as a numpy array (True) or a Pytorch tensor (False)
Returns:

Pytorch tensor (N, C, F, T). N: batch of spectrograms extracted every 500 ms, C: number of channels (1), F: number of Mel frequencies (128), T: time steps (126)

compute_spectrograms(wav_file)

Compute the tensor of Mel-scale spectrograms used as input to the autoencoders from a wav file

Parameters:
  • wav_file – .wav file with a sampling frequency of 16 kHz
Returns:

Pytorch tensor (N, C, F, T). N: batch of spectrograms extracted every 500 ms, C: number of channels (1), F: number of Mel frequencies (128), T: time steps (126)
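Each spectrogram along the batch dimension N comes from one short chunk of the waveform. Assuming the 500 ms window with 250 ms time-shift described for the dynamic features, the chunking can be sketched as (illustrative helper, not the library's code):

```python
import numpy as np

FS = 16000                 # sampling frequency (Hz)
WIN = int(0.5 * FS)        # 500 ms analysis window -> 8000 samples
SHIFT = int(0.25 * FS)     # 250 ms time-shift -> 4000 samples

def segment(signal):
    """Split a 16 kHz signal into the overlapping chunks that each
    become one spectrogram in the (N, C, F, T) tensor."""
    starts = range(0, len(signal) - WIN + 1, SHIFT)
    return np.stack([signal[s:s + WIN] for s in starts])

x = np.zeros(FS * 2)           # 2 s of audio
print(segment(x).shape)        # (7, 8000)
```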

destandard(tensor)

Destandardize an input tensor from the autoencoders

Parameters:
  • tensor – standardized input tensor for the AEs (N, 128, 126)
Returns:

destandardized tensor for the AEs (N, 128, 126)

plot_spectrograms(wav_file)

Plot the spectrograms decoded by the AEs

Parameters:
  • wav_file – .wav file with a sampling frequency of 16 kHz

show_spectrograms(spectrograms)

Visualize the computed tensor of Mel-scale spectrograms used as input to the autoencoders

Parameters:
  • spectrograms – tensor of spectrograms obtained from compute_spectrograms(wav_file)

standard(tensor)

Standardize an input tensor for the autoencoders

Parameters:
  • tensor – input tensor for the AEs (N, 128, 126)
Returns:

standardized tensor for the AEs (N, 128, 126)
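standard() and destandard() are inverses of each other. A minimal numpy sketch, where MU and SIGMA are hypothetical corpus statistics (the real values are learned from the autoencoder's training data):

```python
import numpy as np

# Hypothetical spectrogram statistics; placeholders only.
MU, SIGMA = -30.0, 12.0

def standard(t):
    """Standardize an input tensor (N, 128, 126) for the AEs."""
    return (t - MU) / SIGMA

def destandard(t):
    """Invert standard(): recover the original dB scale."""
    return t * SIGMA + MU

x = np.random.rand(2, 128, 126)
np.testing.assert_allclose(destandard(standard(x)), x, atol=1e-10)
```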