...

  • Ignore case: (?i)searched_word: retrieves “searched_word” regardless of character case.

  • Retrieve the line containing: .*searched_word.*

  • Search for a literal dot (period): \. The backslash “\” is the escape character.

  • Spaces are significant, so searching for “word1 word2” matches only that exact expression in the content.

  • Email addresses: ([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})

  • Phone number: (\+)([\s.\(\)]*\d{1}){8,13}(-)?(\d{1,5})

  • Search “word1” or “word2”: word1|word2.
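
The patterns listed above can be tried out in a small script before putting them in the crawler configuration. The sketch below uses Python's `re` module purely for illustration; that choice is an assumption, and the regex engine used by the crawler may differ on edge cases. The sample text and variable names are hypothetical.

```python
import re

# Patterns copied verbatim from the list above.
EMAIL = r"([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})"
PHONE = r"(\+)([\s.\(\)]*\d{1}){8,13}(-)?(\d{1,5})"

# Hypothetical sample content, one item per line.
text = (
    "Contact: John.Doe@example.com\n"
    "Phone: +33 1 23 45 67 89\n"
    "Release 1.2 mentions WORD1 word2"
)

# (?i) makes the match case-insensitive.
ignore_case = re.search(r"(?i)word1", text).group(0)

# .*Phone.* returns the whole line, because "." does not match a newline.
whole_line = re.search(r".*Phone.*", text).group(0)

# \. matches a literal dot instead of "any character".
literal_dot = re.findall(r"\d\.\d", text)

# Spaces are significant: one space matches, two spaces do not.
one_space = re.search(r"WORD1 word2", text) is not None
two_spaces = re.search(r"WORD1  word2", text) is not None

# Email and phone patterns; group(0) is the full matched text.
email = re.search(EMAIL, text).group(0)
phone = re.search(PHONE, text).group(0)

# "|" matches either alternative; (?i) applies to both here.
either = re.findall(r"(?i)word1|word2", text)

print(ignore_case, whole_line, literal_dot, one_space, two_spaces, email, phone, either)
```

Note that the phone pattern as written requires a leading “+”, so numbers without an international prefix are not matched.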

More regex examples are available here: Regex typical use cases

Some example use cases:

...

With the first line, the crawl extracts all email addresses from the content and stores them in the multi-valued field “entity_email”.

The second line indicates that if the document content contains at least one phone number, the “entity_phone_present” Solr field is set to “true”. Otherwise, it is set to “false”.

The third line extracts the first phone number appearing in the document and stores it in the “entity_phone” field. If no phone number is found, this field is not added to the document.

Finally, the last line indicates that if a line from the document contains the expression “word”, then that line is added to the multi-valued field “entity_word”. If the expression is not found, then “entity_word” is set to “No word here”.
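
The four behaviours described above can be sketched in plain Python to make the differences concrete. This is only an illustration of the described semantics, not the crawler's actual implementation: the function `extract_entities` and the dictionary standing in for a Solr document are hypothetical. Two points are assumptions not stated above: “entity_email” is omitted when no address is found, and every line containing “word” is added to “entity_word”.

```python
import re

# Patterns taken from the regex list earlier on this page.
EMAIL = r"([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})"
PHONE = r"(\+)([\s.\(\)]*\d{1}){8,13}(-)?(\d{1,5})"


def extract_entities(content: str) -> dict:
    """Hypothetical helper mimicking the four configuration lines."""
    doc = {}

    # Line 1: keep every match in a multi-valued field.
    # Assumption: the field is omitted when nothing matches.
    emails = [m.group(0) for m in re.finditer(EMAIL, content)]
    if emails:
        doc["entity_email"] = emails

    # Line 2: presence flag, always set to "true" or "false".
    first_phone = re.search(PHONE, content)
    doc["entity_phone_present"] = "true" if first_phone else "false"

    # Line 3: keep only the first match; no field when nothing is found.
    if first_phone:
        doc["entity_phone"] = [first_phone.group(0)]

    # Line 4: keep each whole line containing "word", else a default value.
    # Assumption: all matching lines are kept, not just the first one.
    lines = [m.group(0) for m in re.finditer(r".*word.*", content)]
    doc["entity_word"] = lines or ["No word here"]

    return doc


doc1 = extract_entities(
    "Write to a@b.org or c.d@e.com\nCall +33 1 23 45 67 89 today\nno match here"
)
doc2 = extract_entities("first word line\nnothing\nanother word")
print(doc1)
print(doc2)
```

All values are stored as lists here because the destination fields are expected to be multi-valued.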

Note

Warning! When using a destination field, make sure it exists in Solr, and create it if necessary. It must be a multi-valued field. If you want to use it in facets, it must also be a “String” field.