...
Extract regex groups: If the regular expression contains one or more groups and this option is enabled, the values of the groups are stored in the destination metadata. If it is not enabled, the complete match is stored instead. Groups in a regular expression are delimited by ( and ). For example, the regular expression word1 (.*?) word2 has one group. It searches for the part of a text that starts with word1 and ends with word2. Take the text "My word1 titi tata toto word2 and other words". With the regular expression above, if the checkbox is enabled, the destination metadata gets the value "titi tata toto". If the checkbox is not enabled, the destination metadata gets the value "word1 titi tata toto word2". If the regex has several groups, their values are stored separated by a space. For example, with the regex word1 (.*?) tata (.*?) word2 the destination metadata gets the value "titi toto": "titi" and "toto" are extracted and joined with a space. This field is optional.
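The behaviour described above can be reproduced with a short Python sketch. This is an illustration only, not the connector's code: the function name extract_value and its extract_groups flag are hypothetical stand-ins for the checkbox, and the two-group pattern is written with (.*?) in both groups, which is what the stated result "titi toto" requires.

```python
import re


def extract_value(pattern, text, extract_groups):
    """Return the value that would be stored, or None if nothing matches.

    Hypothetical helper: it mimics the documented option, it is not the
    connector's implementation.
    """
    match = re.search(pattern, text)
    if match is None:
        return None
    if extract_groups and match.groups():
        # Option enabled: keep only the captured groups, joined by a space.
        return " ".join(g for g in match.groups() if g is not None)
    # Option disabled (or no group in the pattern): keep the complete match.
    return match.group(0)


text = "My word1 titi tata toto word2 and other words"

one_group = extract_value(r"word1 (.*?) word2", text, True)
full_match = extract_value(r"word1 (.*?) word2", text, False)
two_groups = extract_value(r"word1 (.*?) tata (.*?) word2", text, True)

print(one_group)
print(full_match)
print(two_groups)
```

The three printed values correspond to the three cases in the text: one group with the option enabled, the same regex with the option disabled, and a regex with two groups.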
Note |
---|
|
...
But if you want to extract only the department of <part1>, this is not possible with the Regex Entity Connector, because it processes a file line by line. The file above spans several lines (i.e. it contains return characters). The regex to extract the department of <part1> could have been <part1>\s<department>(.*?)</department>, where \s is meant to match the return character after <part1>. Even so, \s cannot help: the connector reads one line at a time, so the return character after <part1> means that only <part1> is read when the regex is applied, and the regex never sees <part1> and <department> together.
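The effect of line-by-line processing can be seen in a small Python sketch. The sample content is hypothetical (the original file is not reproduced here), and the second pass over the whole string is included only for contrast; according to the text above, the connector does not work that way.

```python
import re

pattern = re.compile(r"<part1>\s<department>(.*?)</department>")

# Hypothetical sample standing in for a multi-line file.
sample = (
    "<part1>\n"
    "<department>Sales</department>\n"
    "</part1>\n"
    "<part2>\n"
    "<department>Legal</department>\n"
    "</part2>"
)

# Line by line, as described for the connector: no single line contains
# both <part1> and <department>, so the regex never matches.
line_by_line = []
for line in sample.splitlines():
    match = pattern.search(line)
    if match:
        line_by_line.append(match.group(1))

# Whole text at once, for contrast only: \s can now match the newline.
whole_text = [m.group(1) for m in pattern.finditer(sample)]

print(line_by_line)
print(whole_text)
```

The first list is empty because each line is matched in isolation, while the second contains only the department that directly follows <part1>.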
...