nltk.tokenize.BlanklineTokenizer¶
- class nltk.tokenize.BlanklineTokenizer[source]¶
Bases:
RegexpTokenizer
Tokenize a string, treating any sequence of blank lines as a delimiter. Blank lines are defined as lines containing no characters, except for space or tab characters.
- span_tokenize(text)¶
Identify the tokens using integer offsets
(start_i, end_i)
, wheres[start_i:end_i]
is the corresponding token.- Return type
Iterator[Tuple[int, int]]
- span_tokenize_sents(strings: List[str]) Iterator[List[Tuple[int, int]]] ¶
Apply
self.span_tokenize()
to each element ofstrings
. I.e.:return [self.span_tokenize(s) for s in strings]
- Yield
List[Tuple[int, int]]
- Parameters
strings (List[str]) –
- Return type
Iterator[List[Tuple[int, int]]]