
Info

...

Valid from 4.3

The documentation below is valid from Datafari 4.3 upwards

Datafari comes bundled with a simple entity extraction tool capable of:

  • Extracting names provided in a list from the documents

  • Extracting phone numbers from the documents (if they conform to a certain format)

  • Extracting custom "special" entities that match a provided regular expression
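The three extraction modes listed above can be illustrated with a short Python sketch. It is not Datafari's implementation, which runs inside Solr at indexing time. The name list, the French-style phone format and the `REF-` pattern are hypothetical values chosen for the example, because the real ones come from the configuration.

```python
import re

# Hypothetical configuration values, for illustration only.
NAMES = ["Alice Martin", "Bob Durand"]
# Assumed phone format: French-style numbers such as "01 23 45 67 89" or "01.23.45.67.89".
PHONE_RE = re.compile(r"\b0\d(?:[ .-]?\d{2}){4}\b")
# Assumed custom "special" entity: an internal reference such as "REF-123456".
SPECIAL_RE = re.compile(r"\bREF-\d{6}\b")


def extract_entities(content: str) -> dict[str, list[str]]:
    """Return the entities found in a document's content, grouped by kind."""
    # Names: each listed name must appear as a whole phrase (word boundaries on both sides).
    names = [n for n in NAMES if re.search(r"\b" + re.escape(n) + r"\b", content)]
    # Phone numbers and special entities: every non-overlapping regex match.
    phones = PHONE_RE.findall(content)
    specials = SPECIAL_RE.findall(content)
    return {"names": names, "phones": phones, "specials": specials}


print(extract_entities("Contact Alice Martin at 01 23 45 67 89 about REF-004217."))
# {'names': ['Alice Martin'], 'phones': ['01 23 45 67 89'], 'specials': ['REF-004217']}
```

The word boundaries around each name keep "Alice Martin" from matching inside "Alice Martinez".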

Warning

Configuration of the entity extraction tool must be done BEFORE indexation.

Any configuration change made after the first indexation only takes effect once you wipe the entire index, reload the Solr core FileShare and index the files again.
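A minimal sketch of the wipe and reload steps follows. It assumes Solr is reachable at `http://localhost:8983/solr` and that `FileShare` can be reloaded through the standard CoreAdmin API. On a SolrCloud deployment, the Collections API `RELOAD` action applies instead. Re-indexing itself is done by re-running your crawl jobs and is not covered here.

```python
import json
import urllib.parse
import urllib.request

# Assumption: adapt this base URL to your own deployment.
SOLR_BASE = "http://localhost:8983/solr"


def build_reset_requests(base_url: str, core: str) -> list[urllib.request.Request]:
    """Build the Solr requests that wipe a core and then reload it."""
    # 1. Delete every document and commit, using the standard update handler.
    wipe = urllib.request.Request(
        f"{base_url}/{core}/update?commit=true",
        data=json.dumps({"delete": {"query": "*:*"}}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # 2. Reload the core so the edited configuration is picked up (CoreAdmin API).
    #    On SolrCloud, use /admin/collections?action=RELOAD&name=<collection> instead.
    query = urllib.parse.urlencode({"action": "RELOAD", "core": core})
    reload_core = urllib.request.Request(f"{base_url}/admin/cores?{query}", method="GET")
    return [wipe, reload_core]


def run(requests: list[urllib.request.Request]) -> None:
    """Send each request in order; urlopen raises on the first HTTP error status."""
    for req in requests:
        with urllib.request.urlopen(req) as resp:
            print(req.get_method(), req.full_url, resp.status)
```

Calling `run(build_reset_requests(SOLR_BASE, "FileShare"))` once, then re-running the crawl jobs, follows the order given in the warning: wipe, reload, re-index.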

Note

Entity extraction takes place during indexation, so activating this feature slows indexing down. The size of the impact depends on how many extractors are activated and on the complexity of the regexes used (that is, the time needed to compute matches).
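One way to get a rough idea of the extra cost before activating extractors is to time each regex on representative content. This sketch uses Python's `re` engine with hypothetical patterns and sample text. Datafari evaluates its regexes inside Solr, so absolute figures will differ, and only the relative cost between patterns is meaningful.

```python
import re
import time


def average_scan_time(patterns: dict[str, str], sample: str, repeat: int = 50) -> dict[str, float]:
    """Average seconds each compiled pattern needs to find all matches in `sample`."""
    timings: dict[str, float] = {}
    for label, pattern in patterns.items():
        compiled = re.compile(pattern)
        start = time.perf_counter()
        for _ in range(repeat):
            compiled.findall(sample)
        timings[label] = (time.perf_counter() - start) / repeat
    return timings


# Hypothetical patterns and sample document; replace them with your own.
sample = "Call 01 23 45 67 89 about REF-004217. " * 2000
results = average_scan_time(
    {"phones": r"\b0\d(?:[ .-]?\d{2}){4}\b", "specials": r"\bREF-\d{6}\b"},
    sample,
)
# Slowest pattern first.
for label, seconds in sorted(results.items(), key=lambda item: item[1], reverse=True):
    print(f"{label}: {seconds * 1000:.3f} ms per document")
```

Adding up the per-pattern times gives a rough estimate of the cost of activating all of these extractors together.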

Activating and de-activating the Feature

The activation state of this feature can be managed only from the Search Engine Configuration → Entity Extraction page of the Datafari administration panel, shown in the image below.

...

This page provides a global switch that activates or deactivates the whole feature, plus one switch per extractor to toggle each one separately.
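The switch semantics can be summarised in a few lines. The `ExtractionSettings` class below is hypothetical and only models the behaviour described for the admin page: when the global switch is off, no extractor runs, whatever its own switch says.

```python
from dataclasses import dataclass, field


@dataclass
class ExtractionSettings:
    """Hypothetical model of the admin page: one global switch plus one switch per extractor."""

    enabled: bool = False
    extractors: dict[str, bool] = field(default_factory=dict)

    def active_extractors(self) -> list[str]:
        # The global switch takes precedence: when it is off, nothing runs.
        if not self.enabled:
            return []
        # Otherwise, only the extractors switched on individually run.
        return [name for name, on in self.extractors.items() if on]


settings = ExtractionSettings(True, {"names": True, "phones": False, "specials": True})
print(settings.active_extractors())  # ['names', 'specials']
```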

...

Edit the file solrconfig.xml and go to the section 

...

<processor class="com.francelabs.datafari.updateprocessor.DatafariUpdateProcessorFactory">

 which should look like this:

...

Note

Extraction is performed at indexation time against the content of the documents only.

A complex regex, or a regex that matches long stretches of text, can degrade indexing performance.
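The following sketch shows the difference between an unbounded and a bounded pattern for a custom entity. The `REF-` identifier format is hypothetical. The code uses Python's `re`, while the processor presumably relies on Java's `java.util.regex`. Both are backtracking engines, so the lesson carries over: prefer explicit, bounded quantifiers over `.*`.

```python
import re

# Hypothetical document: one reference followed by a long run of text, with no newlines.
content = "Ticket REF-123456 opened. " + "Lorem ipsum dolor sit amet. " * 1000


def first_match(pattern: str, text: str) -> str:
    """Return the first match of `pattern` in `text`, or "" if there is none."""
    match = re.search(pattern, text)
    return match.group() if match else ""


# Unbounded: ".*" consumes everything up to the end of the line.
print(len(first_match(r"REF-.*", content)))
# Bounded: exactly six digits, so the match stays short.
print(first_match(r"REF-\d{6}", content))  # REF-123456
```

Because the sample contains no newline, the unbounded pattern captures the rest of the document as a single entity value. The bounded pattern stops after six digits.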

See Simple entity extraction implementation - Enterprise Edition for details on where the code that manages entity extraction and its display in Datafari is located.

...

Info

...

Valid from 4.1

The documentation below is valid from Datafari 4.1 upwards

Datafari comes bundled with a simple entity extraction tool capable of:

  • Extracting names provided in a list from the documents

  • Extracting phone numbers from the documents (if they conform to a certain format)

  • Extracting custom "special" entities that match a provided regular expression

Warning

Configuration of the entity extraction tool must be done BEFORE indexation.

Any configuration change made after the first indexation only takes effect once you wipe the entire index, reload the Solr core FileShare and index the files again.

Note

Entity extraction takes place during indexation, so activating this feature slows indexing down. The size of the impact depends on how many extractors are activated and on the complexity of the regexes used (that is, the time needed to compute matches).

Activating and de-activating the Feature

The activation state of this feature can be managed from the Search Engine Configuration → Entity Extraction page of the Datafari administration panel, shown in the image below.

...

This page provides a global switch that activates or deactivates the whole feature, plus one switch per extractor to toggle each one separately.

...

Note

Extraction is performed at indexation time against the content of the documents only.

A complex regex, or a regex that matches long stretches of text, can degrade indexing performance.

See Simple entity extraction implementation - Enterprise Edition for details on where the code that manages entity extraction and its display in Datafari is located.