TYPO3 7.6
Public Member Functions
__construct ()
split2Words ($wordString)
addWords (&$words, &$wordString, $start, $len)
get_word (&$str, $pos=0)
utf8_is_letter (&$str, &$len, $pos=0)
charType ($cp)
utf8_ord (&$str, &$len, $pos=0, $hex=false)
Public Attributes
$debug = false
$debugString = ''
$csObj
$lexerConf
Lexer class for indexed_search. A lexer splits text into words.
addWords (&$words, &$wordString, $start, $len)
Add word to word-array. This function should be used to make sure CJK sequences are split up in the right way.
array | $words | Array of accumulated words |
string | $wordString | Complete input string from which to extract the word |
int | $start | Start position of word in input string |
int | $len | The length of the word string from the start position |
Definition at line 117 of file Lexer.php.
References Lexer\charType(), and Lexer\utf8_ord().
Referenced by Lexer\split2Words().
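The description above says only that CJK sequences must be split "in the right way" and does not spell out the rule. One common technique for indexing CJK text is to emit overlapping two-character pairs (bigrams). The sketch below illustrates that technique only. Whether Lexer.php does exactly this is an assumption, and `sketchAddCjkBigrams` is a hypothetical standalone helper, not part of the class.

```php
<?php
/**
 * Hypothetical helper (not a Lexer method): appends the overlapping
 * two-character pairs of a CJK run to $words. A run of a single
 * character is appended unchanged.
 */
function sketchAddCjkBigrams(array &$words, $run)
{
    // An empty pattern with the "u" modifier splits per UTF-8 character.
    $chars = preg_split('//u', $run, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        // Invalid UTF-8: keep the raw string instead of guessing.
        $words[] = $run;
        return;
    }
    $count = count($chars);
    if ($count === 1) {
        $words[] = $chars[0];
        return;
    }
    // Pair each character with its successor.
    for ($i = 0; $i < $count - 1; $i++) {
        $words[] = $chars[$i] . $chars[$i + 1];
    }
}

$words = [];
sketchAddCjkBigrams($words, '日本語');
// $words now holds the two pairs 日本 and 本語
```

Overlapping pairs let a search for any two adjacent characters match without a dictionary, at the cost of a larger index.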
charType ($cp)
Determine the type of a character.
int | $cp | Unicode number to evaluate |
Definition at line 260 of file Lexer.php.
Referenced by Lexer\addWords(), and Lexer\utf8_is_letter().
get_word (&$str, $pos = 0)
Get the first word in a given UTF-8 string (initial non-letters are skipped).
string | $str | Input string (reference) |
int | $pos | Starting position in input string |
Definition at line 164 of file Lexer.php.
References Lexer\utf8_is_letter().
Referenced by Lexer\split2Words().
split2Words ($wordString)
Splitting string into words. Used for indexing, can also be used to find words in query.
string | $wordString | String with UTF-8 content to process. |
Definition at line 73 of file Lexer.php.
References Lexer\addWords(), debug(), and Lexer\get_word().
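To make the idea of splitting a UTF-8 string into words concrete, here is a deliberately simplified sketch. It is not the algorithm in Lexer.php, which walks the string through `get_word()` and `addWords()`. Instead it relies on PCRE Unicode properties, and `sketchSplit2Words` is a hypothetical name used only for illustration.

```php
<?php
/**
 * Hypothetical stand-in for a word splitter (not Lexer::split2Words()).
 * Treats every run of Unicode letters or numbers as one word.
 */
function sketchSplit2Words($wordString)
{
    // \p{L} matches any Unicode letter, \p{N} any Unicode number character;
    // the "u" modifier makes the pattern operate on UTF-8 code points.
    $found = preg_match_all('/[\p{L}\p{N}]+/u', $wordString, $matches);
    if ($found === false) {
        // Invalid UTF-8 or another PCRE error: report no words.
        return [];
    }
    return $matches[0];
}

$words = sketchSplit2Words('Hello, wörld! 42');
// $words is ['Hello', 'wörld', '42']
```

Unlike this sketch, the real method also has to decide how CJK runs are broken up, which is what `addWords()` is documented to handle.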
utf8_is_letter (&$str, &$len, $pos = 0)
See if a character is a letter (or a string of letters or non-letters).
string | $str | Input string (reference) |
int | $len | Byte-length of character sequence (reference, return value) |
int | $pos | Starting position in input string |
Definition at line 189 of file Lexer.php.
References Lexer\charType(), GeneralUtility\inList(), and Lexer\utf8_ord().
Referenced by Lexer\get_word().
utf8_ord (&$str, &$len, $pos = 0, $hex = false)
Converts a UTF-8 multibyte character to a Unicode code point.
string | $str | UTF-8 multibyte character string (reference) |
int | $len | The length of the character (reference, return value) |
int | $pos | Starting position in input string |
bool | $hex | If set, a hexadecimal number is returned |
Definition at line 287 of file Lexer.php.
Referenced by Lexer\addWords(), and Lexer\utf8_is_letter().
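The conversion described for `utf8_ord` can be sketched as standard UTF-8 decoding: read the lead byte, derive the sequence length from its high bits, then fold in six bits from each continuation byte. The function below is a minimal standalone illustration under those assumptions, not the code at line 287 of Lexer.php. The name `sketchUtf8Ord` is hypothetical, and the exact format of the hexadecimal return value is assumed.

```php
<?php
/**
 * Hypothetical helper (not the Lexer method): decodes the UTF-8 character
 * starting at byte offset $pos and reports its byte length through $len.
 * Input is assumed to be valid UTF-8; no validation is performed.
 */
function sketchUtf8Ord($str, &$len, $pos = 0, $hex = false)
{
    $byte = ord($str[$pos]);
    $len = 1;
    $cp = $byte;

    if ($byte >= 0xC0) {
        // The high bits of the lead byte give the total sequence length.
        if ($byte >= 0xF0) {
            $len = 4;
        } elseif ($byte >= 0xE0) {
            $len = 3;
        } else {
            $len = 2;
        }
        // Keep only the payload bits of the lead byte.
        $cp = $byte & (0xFF >> ($len + 1));
        // Each continuation byte contributes its lower six bits.
        for ($i = 1; $i < $len; $i++) {
            $cp = ($cp << 6) | (ord($str[$pos + $i]) & 0x3F);
        }
    }

    // Assumed format: plain lowercase hex digits when $hex is set.
    return $hex ? dechex($cp) : $cp;
}

$len = 0;
$cp = sketchUtf8Ord("a\xE2\x82\xACb", $len, 1);
// $cp is 8364 (U+20AC, the euro sign) and $len is 3
```

Returning the byte length through `$len` lets a caller advance its position by exactly one character, which is how a byte-oriented scanner can step through multibyte text.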