mindspore.ops.CTCLoss

class mindspore.ops.CTCLoss(*args, **kwargs)

Calculates the CTC (Connectionist Temporal Classification) loss and the gradient.

The CTC algorithm was proposed in the paper "Connectionist Temporal Classification: Labeling Unsegmented Sequence Data with Recurrent Neural Networks".

Parameters
  • preprocess_collapse_repeated (bool) – If true, repeated labels will be collapsed prior to the CTC calculation. Default: False.

  • ctc_merge_repeated (bool) – If false, repeated non-blank labels are not merged during the CTC calculation and are instead interpreted as individual labels. This is a simplified version of CTC. Default: True.

  • ignore_longer_outputs_than_inputs (bool) – If true, sequences with longer outputs than inputs will be ignored. Default: False.
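
To make the two repeat-handling flags concrete, here is a minimal plain-Python sketch. It only illustrates the concepts and is not MindSpore's implementation: the helper names collapse_repeated and alignment_to_labels are hypothetical, and the sketch assumes that "repeated" means consecutive identical labels.

```python
from itertools import groupby


def collapse_repeated(labels):
    """Collapse runs of consecutive identical labels.

    Assumed meaning of preprocess_collapse_repeated=True; hypothetical helper.
    """
    return [label for label, _ in groupby(labels)]


def alignment_to_labels(path, blank, merge_repeated=True):
    """Map a frame-level alignment to a label sequence, CTC style.

    With merge_repeated=True, consecutive repeats are merged before blanks
    are dropped. With merge_repeated=False, every non-blank frame is kept
    as an individual label. Hypothetical helper for illustration only.
    """
    if merge_repeated:
        path = [label for label, _ in groupby(path)]
    return [label for label in path if label != blank]


print(collapse_repeated([1, 1, 2, 2, 1]))  # [1, 2, 1]

path = [0, 0, 2, 1, 1, 2, 1]  # class 2 plays the role of the blank label
print(alignment_to_labels(path, blank=2))  # [0, 1, 1]
print(alignment_to_labels(path, blank=2, merge_repeated=False))  # [0, 0, 1, 1, 1]
```

In the last two calls the blank between the two runs of 1 is what keeps them as separate labels when merging is enabled.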

Inputs:
  • inputs (Tensor) - A 3-D tensor of shape (max_time, batch_size, num_classes). num_classes must equal num_labels + 1, where num_labels is the number of actual labels; the extra class is reserved for the blank label, which defaults to num_classes - 1. Data type must be float16, float32 or float64.

  • labels_indices (Tensor) - The indices of labels. labels_indices[i, :] == [b, t] means labels_values[i] stores the id for (batch b, time t). The type must be int64 and rank must be 2.

  • labels_values (Tensor) - A 1-D input tensor. The values are associated with the given batch and time. The type must be int32. labels_values[i] must be in the range [0, num_classes).

  • sequence_length (Tensor) - A tensor containing sequence lengths with the shape of (batch_size). The type must be int32. Each value in the tensor must not be greater than max_time.
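
The sparse label layout described by labels_indices and labels_values can be sketched in plain Python as follows. The helper name to_sparse_labels is hypothetical and not part of MindSpore; it assumes the labels for each batch item are given as a Python list, with position t counting from 0 within that item.

```python
def to_sparse_labels(label_seqs):
    """Convert per-sample label lists to (indices, values) lists.

    indices[i] == [b, t] means values[i] is the label id for batch b, time t.
    Hypothetical helper for illustration only.
    """
    indices = []
    values = []
    for b, seq in enumerate(label_seqs):
        for t, label in enumerate(seq):
            indices.append([b, t])
            values.append(label)
    return indices, values


# Two batch items, each with a single label of id 2.
indices, values = to_sparse_labels([[2], [2]])
print(indices)  # [[0, 0], [1, 0]]
print(values)   # [2, 2]
```

The resulting lists match the labels_indices and labels_values used in the Examples section; they would still need to be wrapped as Tensor objects with dtypes int64 and int32 respectively.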

Outputs:
  • loss (Tensor) - A tensor containing log-probabilities, with shape (batch_size). It has the same type as inputs.

  • gradient (Tensor) - The gradient of the loss, with the same type and shape as inputs.

Raises
  • TypeError – If preprocess_collapse_repeated, ctc_merge_repeated or ignore_longer_outputs_than_inputs is not a bool.

  • TypeError – If inputs, labels_indices, labels_values or sequence_length is not a Tensor.

  • TypeError – If dtype of inputs is not one of the following: float16, float32 or float64.

  • TypeError – If dtype of labels_indices is not int64.

  • TypeError – If dtype of labels_values or sequence_length is not int32.

Supported Platforms:

Ascend GPU CPU

Examples

>>> import numpy as np
>>> import mindspore
>>> from mindspore import Tensor, ops
>>> np.random.seed(0)
>>> inputs = Tensor(np.random.random((2, 2, 3)), mindspore.float32)
>>> labels_indices = Tensor(np.array([[0, 0], [1, 0]]), mindspore.int64)
>>> labels_values = Tensor(np.array([2, 2]), mindspore.int32)
>>> sequence_length = Tensor(np.array([2, 2]), mindspore.int32)
>>> ctc_loss = ops.CTCLoss()
>>> loss, gradient = ctc_loss(inputs, labels_indices, labels_values, sequence_length)
>>> print(loss)
[ 0.7864997  0.720426 ]
>>> print(gradient)
[[[ 0.30898064  0.36491138  -0.673892  ]
  [ 0.33421117  0.2960548  -0.63026595 ]]
 [[ 0.23434742  0.36907154  0.11261538 ]
  [ 0.27316454  0.41090325  0.07584976 ]]]