
Character string patterns

Character strings are represented by concatenations or operators.

Concatenations

Concatenations are valid regular expressions that are written after each other. If r and s are regular expressions, the concatenation rs matches all character strings that can be formed from the concatenation of character strings that match r and s.

Examples

The following table shows some results from the test. X means that the entire text matches the pattern, and - means that it does not.

Pattern Text match
H[aeu]llo Hallo X
H[aeu]llo Hello X
H[aeu]llo Hullo X
H[aeu]llo Hollo -

H[aeu]llo is the concatenation of five regular expressions for single characters.
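The checks in the table can be reproduced with a short sketch. It assumes that the static method MATCHES of the class CL_ABAP_MATCHER is used as the test and that it returns abap_true only if the entire text matches the pattern; the document itself does not state which test was used.

```abap
" Assumption: cl_abap_matcher=>matches returns abap_true only if the
" entire text matches the pattern.
ASSERT cl_abap_matcher=>matches( pattern = 'H[aeu]llo' text = 'Hello' ) = abap_true.
ASSERT cl_abap_matcher=>matches( pattern = 'H[aeu]llo' text = 'Hullo' ) = abap_true.

" "o" is not in the set [aeu], so the second single-character
" expression of the concatenation fails.
ASSERT cl_abap_matcher=>matches( pattern = 'H[aeu]llo' text = 'Hollo' ) = abap_false.
```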

Operators for character strings

These operators are made up of the special characters {, }, *, +, ?, |, (, ), and \. The special characters can be made into literal characters using the prefix \ or by enclosing with \Q ... \E.

Chaining Operators

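The operators {n}, {n,m}, {n,}, *, +, and ? (where n and m are natural numbers, including zero) can be written directly after a regular expression r, and thus generate concatenations rrr... of the regular expression: r{n} stands for exactly n concatenations of r, r{n,m} for at least n and at most m, and r{n,} for at least n. r? is equivalent to r{0,1}, r* to r{0,}, and r+ to r{1,}.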

Note

For a regular expression with concatenation operators, the primary rule is that the entire expression must match where possible. This rule limits the length of the character strings matched by concatenations with the operators * and +, and therefore limits their greedy behavior.

Examples

The following table shows some results from the test.

Pattern Text match
Hel{2}o Hello X
H.{4} Hello X
.{0,4} Hello -
.{4,} Hello X
.+H.+e.+l.+l.+o.+ Hello -
x*Hx*ex*lx*lx*ox* Hello X
l+ ll X
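The following sketch contrasts a match against the entire text with a search inside the text. The use of cl_abap_matcher=>matches as the full-match test is an assumption, and the variable names off and len are chosen for illustration only.

```abap
DATA off TYPE i.
DATA len TYPE i.

" Full match: .{0,4} covers at most four characters, "Hello" has five.
ASSERT cl_abap_matcher=>matches( pattern = '.{0,4}' text = 'Hello' ) = abap_false.
ASSERT cl_abap_matcher=>matches( pattern = '.{4,}'  text = 'Hello' ) = abap_true.

" x* also matches zero occurrences, so the missing "x" characters are no obstacle.
ASSERT cl_abap_matcher=>matches( pattern = 'x*Hx*ex*lx*lx*ox*' text = 'Hello' ) = abap_true.

" Search: l+ is greedy and takes both "l" characters of "Hello".
FIND REGEX 'l+' IN 'Hello' MATCH OFFSET off MATCH LENGTH len.
ASSERT sy-subrc = 0 AND off = 2 AND len = 2.
```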

Example

The first partial expression a+ is matched against the first five characters "aaaaa" of text, and the last character "a" of text is reserved for the second partial expression a.

DATA text TYPE string.
DATA result_tab TYPE match_result_tab.

text = 'aaaaaa'.

FIND ALL OCCURRENCES OF REGEX '(a+)(a)'
     IN text RESULTS result_tab.
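
The result described above can be checked by reading the submatches from the results table. This is a sketch that repeats the example and adds two work areas; the names result and submatch are introduced here for illustration.

```abap
DATA text TYPE string.
DATA result_tab TYPE match_result_tab.
DATA result TYPE match_result.
DATA submatch TYPE submatch_result.

text = 'aaaaaa'.

FIND ALL OCCURRENCES OF REGEX '(a+)(a)'
     IN text RESULTS result_tab.

" One occurrence covering the whole text is expected.
READ TABLE result_tab INTO result INDEX 1.
ASSERT sy-subrc = 0 AND result-offset = 0 AND result-length = 6.

" First subgroup (a+): the first five characters.
READ TABLE result-submatches INTO submatch INDEX 1.
ASSERT submatch-offset = 0 AND submatch-length = 5.

" Second subgroup (a): the last character.
READ TABLE result-submatches INTO submatch INDEX 2.
ASSERT submatch-offset = 5 AND submatch-length = 1.
```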

Alternatives

The operator | can be written between two regular expressions r and s, and thereby creates a single regular expression r|s that matches every character string matched by r or by s.

Note

Concatenations and other operators bind more strongly than |. In other words, r|st and r|s+ are equivalent to r|(?:st) and r|(?:s+) respectively, but not to (?:r|s)t or (?:r|s)+.

Examples

The following table shows some results from the test.

Pattern Text match
H(e|a|u)llo Hello X
H(e|a|u)llo Hollo -
He|a|ullo Hello -
He|a|ullo ullo X
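
The binding strength of | can be illustrated with a few checks. The sketch assumes cl_abap_matcher=>matches as a full-match test; the texts He and a are added here to show all three alternatives of the ungrouped pattern.

```abap
" With a subgroup, | only separates the single characters e, a, and u.
ASSERT cl_abap_matcher=>matches( pattern = 'H(e|a|u)llo' text = 'Hello' ) = abap_true.
ASSERT cl_abap_matcher=>matches( pattern = 'H(e|a|u)llo' text = 'Hollo' ) = abap_false.

" Without a subgroup, the alternatives are the concatenations He, a, and ullo.
ASSERT cl_abap_matcher=>matches( pattern = 'He|a|ullo' text = 'Hello' ) = abap_false.
ASSERT cl_abap_matcher=>matches( pattern = 'He|a|ullo' text = 'He' )    = abap_true.
ASSERT cl_abap_matcher=>matches( pattern = 'He|a|ullo' text = 'a' )     = abap_true.
ASSERT cl_abap_matcher=>matches( pattern = 'He|a|ullo' text = 'ullo' )  = abap_true.
```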

Subgroups

The operators ( ... ) and (?: ... ) group concatenations of regular expressions, forming a unit. This affects the scope of other operators such as * or |. Here, the regular expressions (r) and (?:r) correspond to the regular expression r. The difference between ( ... ) and (?: ... ) is that the operator ( ... ) stores the subsequences it matches in registers and (?: ... ) does not.

Note

The above-mentioned greedy behavior of concatenation operators also applies to subgroups, from left to right. This does not break the rule that generally requires the entire regular expression to match.

Examples

The following table shows some results from the test.

Pattern Text match
Tral+a Tralala -
Tr(al)+a Tralala X
Tr(?:al)+a Tralala X

In the first expression, the concatenation with the operator + affects the literal character l, and in the second and third expression it affects the subgroup al.
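
A minimal sketch of the scope difference, again assuming cl_abap_matcher=>matches as a full-match test. The text Tralla is an addition for illustration and does not appear in the table.

```abap
" + applies only to the literal l: "Tra", one or more "l", then "a".
ASSERT cl_abap_matcher=>matches( pattern = 'Tral+a' text = 'Tralla' )  = abap_true.
ASSERT cl_abap_matcher=>matches( pattern = 'Tral+a' text = 'Tralala' ) = abap_false.

" + applies to the whole subgroup al: "Tr", one or more "al", then "a".
ASSERT cl_abap_matcher=>matches( pattern = 'Tr(al)+a'   text = 'Tralala' ) = abap_true.
ASSERT cl_abap_matcher=>matches( pattern = 'Tr(?:al)+a' text = 'Tralala' ) = abap_true.
```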

Subgroups with Registration

Besides creating subgroups, the operator ( ... ) also stores the subsequences matched by its subgroups in registers, in the order of the subgroups, when the regular expression is matched against a character string. An operator \1, \2, \3, ... is assigned to each subgroup. It can be written within the expression after its subgroup, where it acts as a placeholder for the character string stored in the corresponding register. In text replacements, the special characters $1, $2, $3, ... can be used to access the last assignment to the register.

The number of subgroups and registers is only limited by the capacity of the platform.

Notes

The addition SUBMATCHES of the statements FIND and REPLACE and the column of the same name in the results table filled using the addition RESULTS can be used to access the content of all registers for a found location. The class CL_ABAP_MATCHER contains the method GET_SUBMATCH for this purpose.

If you only want to use grouping and you do not want to save any subsequences in registers, you can use the operator (?: ... ) instead of ( ... ). With regard to the generation of subgroups, both operators have the same effect. However, (?: ... ) does not save anything in registers.

Examples

The following table shows some results from the test.

Pattern Text match
(["']).+\1 "Hello" X
(["']).+\1 "Hello' -
(["']).+\1 'Hello' X

The concatenation (["']).+\1 matches all text strings of which the first character is " or ' and the last character is the same as the first. For the two successful checks, the register receives the values " or '.
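
The register access described in the notes above can be sketched as follows. The pattern is extended by a second subgroup (.+) so that the quoted content gets its own register; the sample text and the names quote, content, and matcher are assumptions made for illustration.

```abap
DATA text TYPE string.
DATA quote TYPE string.
DATA content TYPE string.
DATA matcher TYPE REF TO cl_abap_matcher.

text = `say "Hello" now`.

" SUBMATCHES fills one variable per register.
FIND REGEX `(["'])(.+)\1` IN text SUBMATCHES quote content.
ASSERT sy-subrc = 0 AND quote = `"` AND content = `Hello`.

" The same register read through CL_ABAP_MATCHER.
matcher = cl_abap_matcher=>create( pattern = `(["'])(.+)\1` text = text ).
IF matcher->find_next( ) = abap_true.
  content = matcher->get_submatch( 2 ).
  ASSERT content = `Hello`.
ENDIF.

" $2 in the replacement text inserts the content of register 2.
REPLACE ALL OCCURRENCES OF REGEX `(["'])(.+)\1` IN text WITH `<$2>`.
ASSERT text = `say <Hello> now`.
```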

Example

The example demonstrates the greedy behavior of the operator + in subgroups and its relation to the primary rule that the entire regular expression must match where possible. The first subgroup takes as many "a" characters as possible: it is assigned the first four characters "aaaa". One character "a" remains for each of the other two subgroups.

DATA text TYPE string.
DATA result_tab TYPE match_result_tab.

text = 'aaaaaa'.

FIND ALL OCCURRENCES OF REGEX '(a+)(a+)(a+)'
     IN text RESULTS result_tab.
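
The distribution of the characters over the three registers can be checked with the addition SUBMATCHES. This is a sketch; the variable names s1, s2, and s3 are chosen for illustration.

```abap
DATA text TYPE string.
DATA s1 TYPE string.
DATA s2 TYPE string.
DATA s3 TYPE string.

text = 'aaaaaa'.

FIND REGEX '(a+)(a+)(a+)' IN text SUBMATCHES s1 s2 s3.

" The first subgroup is greedy but must leave one "a" for each
" of the two subgroups that follow it.
ASSERT sy-subrc = 0.
ASSERT s1 = `aaaa` AND s2 = `a` AND s3 = `a`.
```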

Literal Characters

The operators \Q ... \E form a character string of literal characters from all enclosed characters. Special characters have no effect in this character string.

The following table shows some results from the test.

Pattern Text match
.+\w\d Special: \w\d -
.+\\w\\d Special: \w\d X
.+\Q\w\d\E Special: \w\d X
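
The three rows of the table can be written as checks. The sketch assumes cl_abap_matcher=>matches as a full-match test.

```abap
" \w and \d act as operators here; the text ends in "d", which is not a digit.
ASSERT cl_abap_matcher=>matches( pattern = '.+\w\d'     text = 'Special: \w\d' ) = abap_false.

" Each backslash is escaped separately with the prefix \.
ASSERT cl_abap_matcher=>matches( pattern = '.+\\w\\d'   text = 'Special: \w\d' ) = abap_true.

" \Q ... \E turns the whole enclosed sequence into literal characters.
ASSERT cl_abap_matcher=>matches( pattern = '.+\Q\w\d\E' text = 'Special: \w\d' ) = abap_true.
```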

Reserved Enhancements

The character string (? ... ) is generally reserved for later language enhancements. Apart from the operators already supported, namely (?: ... ), (?= ... ), (?! ... ), and (?> ... ), this string raises the exception CX_SY_INVALID_REGEX.
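
A hedged sketch of how such an exception might be handled. The pattern (?<=a)b is only an assumed example of a (? ... ) sequence outside the supported list, and the flag invalid is a name introduced here. Whether a given release rejects this particular pattern is not confirmed by this document.

```abap
DATA pattern TYPE string.
DATA invalid TYPE abap_bool.

" Assumed example of a (? ... ) sequence that is not in the supported list.
pattern = `(?<=a)b`.

TRY.
    FIND REGEX pattern IN `ab`.
  CATCH cx_sy_invalid_regex.
    " The pattern was rejected as an invalid regular expression.
    invalid = abap_true.
ENDTRY.
```

Passing the pattern in a variable rather than as a literal keeps the check at runtime, where the exception can be caught.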