Page Comparison

...

Extract regex groups: If the regular expression contains one or more groups and this option is enabled, the values of the groups are stored in the destination metadata. If it is not checked, the complete match is stored in the destination metadata. The groups of a regular expression are the parts enclosed between ( and ). This field is optional.

For example, this regular expression has one group: word1 (.*?) word2. It searches for the part of a text starting with word1 and ending with word2. Let's take this text: My word1 titi tata toto word2 and other words. If the checkbox is enabled, the destination metadata gets the value "titi tata toto". If the checkbox is not enabled, the destination metadata gets the value "word1 titi tata toto word2".

Now, if we change the regex slightly so that it has several groups, for example word1 (.*?) tata (.*?) word2, the result is: destination metadata = "titi toto". “titi” and “toto” are extracted and set in the destination metadata, separated by a space.

With several groups, the values are added to the destination metadata separated by a space.
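
The behaviour of the checkbox can be sketched outside the connector. The snippet below is only an illustration, not the connector's code: it uses Python's re module as a stand-in for the connector's own regex engine, the function name extract and its extract_groups parameter are hypothetical, and skipping groups that did not take part in the match is an assumption.

```python
import re

def extract(pattern, text, extract_groups):
    """Return the value stored for the first match, or None if nothing matches.

    Hypothetical helper: 'extract_groups' mimics the "Extract regex groups" checkbox.
    """
    match = re.search(pattern, text)
    if match is None:
        return None
    if extract_groups and match.groups():
        # Join the group values with a space.
        # Assumption: groups that did not participate in the match are skipped.
        return " ".join(g for g in match.groups() if g is not None)
    # Checkbox disabled (or no group in the regex): keep the complete match.
    return match.group(0)

text = "My word1 titi tata toto word2 and other words"
one_group = r"word1 (.*?) word2"
two_groups = r"word1 (.*?) tata (.*?) word2"

print(extract(one_group, text, extract_groups=True))    # titi tata toto
print(extract(one_group, text, extract_groups=False))   # word1 titi tata toto word2
print(extract(two_groups, text, extract_groups=True))   # titi toto
```

The three calls reproduce the three cases described above with the same sample text.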

Note

The source metadata must be “content”, “url” or any ManifoldCF document field provided by your Repository Connector.
The destination metadata must exist in the Solr environment (no check is performed).
Depending on your regular expression, several different values may be found in a document, so the metadata receiving the results must be multi-valued, otherwise it will contain the last match found.
If the checkbox “Keep only one value” is set to true or if “Value if true” is specified, then only one value will be used.
This connector processes a file line by line: a line ends at an end-of-line character or is limited to a capacity of 65536 bytes.
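
To make the multi-valued note concrete, here is a minimal sketch of line-by-line matching, again using Python's re module as a stand-in for the connector. The function collect_matches and its multi_valued parameter are hypothetical names, the 65536-byte line cap is not modeled, and the “Keep only one value” option is not modeled either, since the notes above do not say which value it keeps.

```python
import re

def collect_matches(pattern, document, multi_valued):
    """Scan the document line by line and return the values a field would hold.

    Hypothetical helper: a multi-valued field keeps every match, while a
    single-valued field ends up holding only the last match found.
    """
    values = []
    for line in document.splitlines():
        # A pattern can match several times on the same line.
        for match in re.finditer(pattern, line):
            values.append(match.group(0))
    if multi_valued or not values:
        return values
    # Single-valued field: each new match overwrites the previous one.
    return values[-1:]

document = "ref A12 and ref B34\nno reference here\nref C56"
pattern = r"ref [A-Z]\d{2}"

print(collect_matches(pattern, document, multi_valued=True))   # ['ref A12', 'ref B34', 'ref C56']
print(collect_matches(pattern, document, multi_valued=False))  # ['ref C56']
```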

You can add as many destination metadata, regular expressions and source metadata as you want by clicking the Add button.

...

Ignore case: (?i)searched_word: retrieves “searched_word” regardless of character case.
Retrieve the line containing: .*searched_word.*
Search for a literal dot: \. (“\” is the escape character).
Spaces are taken into account, so searching for “word1 word2” matches that exact expression in the content.
E-mails: ([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})
Phone number: (\+)([\s.\(\)]*\d{1}){8,13}(-)?(\d{1,5})
Search “word1” or “word2”: word1|word2.
For examples using regex groups, see the section above about the “Extract regex groups” option.
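
The patterns listed above can be tried out before configuring the connector. The following sketch runs them with Python's re module, which is an assumption made for illustration: the connector's own regex engine may differ in details, and the sample content is invented.

```python
import re

# Invented sample content, one item per line.
content = (
    "Contact: john.doe@example.com\n"
    "Phone: +33 1 23 45 67 89\n"
    "A searched_word appears here\n"
    "SEARCHED_WORD in capitals"
)

# Ignore case: both spellings are found.
any_case = re.findall(r"(?i)searched_word", content)

# Retrieve the line containing the word ("." does not match a newline).
whole_line = re.search(r".*searched_word.*", content).group(0)

# Literal dot: the backslash escapes the "any character" meaning of ".".
dots = re.findall(r"\.", content)

# Alternation: matches are returned in order of appearance.
either = re.findall(r"word1|word2", "word2 then word1")

# E-mail and phone number patterns, copied from the list above.
email = re.search(r"([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})", content)
phone = re.search(r"(\+)([\s.\(\)]*\d{1}){8,13}(-)?(\d{1,5})", content)

print(any_case)        # ['searched_word', 'SEARCHED_WORD']
print(whole_line)      # A searched_word appears here
print(len(dots))       # 2
print(either)          # ['word2', 'word1']
print(email.groups())  # ('john.doe', 'example', 'com')
print(phone.group(0))  # +33 1 23 45 67 89
```

Note that the phone number pattern requires a leading “+”, so numbers written without an international prefix are not matched by it.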

More regex examples are available here: Regex typical use cases

...

Version	Old Version 30	New Version Current
Changes made by	Guylaine BASSETTE	Cedric
Saved on	29 Jan, 2025	29 Jan, 2025
