mindspore.dataset.text.transforms.PythonTokenizer
class mindspore.dataset.text.transforms.PythonTokenizer(tokenizer)
Callable class to be used for a user-defined string tokenizer.
Parameters
tokenizer (Callable) – Python function that takes a str and returns a list of str as tokens.
Examples
>>> import mindspore.dataset.text as text
>>>
>>> def my_tokenizer(line):
...     return line.split()
>>> text_file_dataset = text_file_dataset.map(operations=text.PythonTokenizer(my_tokenizer))
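The tokenizer parameter only needs to be a Python function that takes a str and returns a list of str, so it does not have to split on whitespace. The following is a minimal sketch of an alternative callable that also separates punctuation. The function name `punctuation_aware_tokenizer` and the regular expression are illustrative assumptions, not part of MindSpore, and the sketch runs without MindSpore installed.

```python
import re

# Hypothetical pattern (an assumption for illustration): match either a run of
# word characters or a single character that is neither a word character nor
# whitespace.
_TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")


def punctuation_aware_tokenizer(line):
    """Take a str and return a list of str, as the tokenizer parameter requires."""
    return _TOKEN_PATTERN.findall(line)


tokens = punctuation_aware_tokenizer("Hello, world!")
print(tokens)  # ['Hello', ',', 'world', '!']
```

Such a function could then be passed to PythonTokenizer in the same way as my_tokenizer in the example above.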