Info |
---|
...
Valid from 4.3: The documentation below applies to Datafari 4.3 and later |
Datafari comes bundled with a simple entity extraction tool capable of:
Extracting names provided in a list from the documents
Extracting phone numbers from the documents (if they conform to a certain format)
Extracting custom "special" entities that match a provided regular expression
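The three kinds of extraction listed above can be pictured with a minimal Python sketch. This is an illustration only, not Datafari's implementation: the name list, the phone format, the "special" pattern, and the function names are all hypothetical values chosen for the example.

```python
import re

# Hypothetical inputs: in Datafari these would come from the configuration,
# not from hard-coded values.
KNOWN_NAMES = ["Ada Lovelace", "Alan Turing"]
# Assumed phone format for illustration: 10 digits in pairs, e.g. "01 23 45 67 89".
PHONE_PATTERN = re.compile(r"\b\d{2}(?:[ .]\d{2}){4}\b")
# Assumed "special" entity: ticket identifiers such as "TICKET-1234".
SPECIAL_PATTERN = re.compile(r"\bTICKET-\d+\b")


def extract_names(text, names):
    """Return the listed names that occur in the text (case-insensitive)."""
    found = []
    for name in names:
        if re.search(r"\b" + re.escape(name) + r"\b", text, re.IGNORECASE):
            found.append(name)
    return found


def extract_entities(text):
    """Collect the three kinds of entities into one dictionary."""
    return {
        "names": extract_names(text, KNOWN_NAMES),
        "phones": PHONE_PATTERN.findall(text),
        "special": SPECIAL_PATTERN.findall(text),
    }


sample = "Call alan turing at 01 23 45 67 89 about TICKET-1234."
print(extract_entities(sample))
```

A phone number written in another format (for example with dashes) would not be picked up by this assumed pattern, which mirrors the "if they conform to a certain format" restriction.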
Warning |
---|
The entity extraction tool must be configured BEFORE indexing. Any configuration change made after the first indexing run only takes effect after completely wiping the index, reloading the Solr core FileShare, and indexing the files again. |
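As a rough sketch of the wipe and reload steps, the snippet below builds (without sending) the two standard Solr HTTP requests involved: a delete-by-query on the update handler and a CoreAdmin RELOAD. The base URL is an assumption and must be adapted to the installation, the helper names are hypothetical, and an installation running in SolrCloud mode would likely rely on the Collections API instead. Datafari may also offer its own procedure for these steps, which takes precedence over this sketch.

```python
import json
import urllib.parse
import urllib.request

# Assumption: Solr is reachable at this base URL; adjust to your installation.
SOLR_BASE = "http://localhost:8983/solr"
CORE = "FileShare"  # core name taken from the warning above


def build_wipe_request(base=SOLR_BASE, core=CORE):
    """Build (but do not send) a delete-by-query request removing every document."""
    url = f"{base}/{core}/update?commit=true"
    body = json.dumps({"delete": {"query": "*:*"}}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )


def build_reload_url(base=SOLR_BASE, core=CORE):
    """Build the CoreAdmin RELOAD URL for the given core."""
    query = urllib.parse.urlencode({"action": "RELOAD", "core": core})
    return f"{base}/admin/cores?{query}"


wipe = build_wipe_request()
print(wipe.get_method(), wipe.full_url)
print(build_reload_url())
# To actually send a request: urllib.request.urlopen(wipe)
```

Nothing is sent unless `urllib.request.urlopen` is called explicitly, so the requests can be reviewed before the index is touched. Re-indexing the files afterwards is done from Datafari itself and is not covered by this sketch.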
Note |
---|
Entity extraction takes place during indexing, so activating this feature affects indexing performance. The impact depends on the number of extraction features activated and on the complexity of the regex used (the time needed to compute matches). |
Activating and de-activating the Feature
The activation state of this feature can only be managed from the Search Engine Configuration → Entity Extraction page in the Datafari administration panel, shown in the image below.
...
From there, a global switch activates or deactivates the whole feature, and a separate switch toggles each specific extraction feature individually.
...
Edit the file solrconfig.xml and go to the section
...
<processor class="com.francelabs.datafari.updateprocessor.DatafariUpdateProcessorFactory">
which should look like this:
...
Note |
---|
Extraction is performed at indexing time, against the content of the documents only. A complex regex, or a regex that matches long stretches of text, can degrade indexing performance. |
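To get a feel for how much a pattern's shape can matter, the following sketch times two patterns that accept exactly the same strings. It uses Python's `re` module purely for illustration; the regex engine used by Datafari at indexing time may behave differently, and the helper `time_search` is a hypothetical name introduced for this example.

```python
import re
import time


def time_search(pattern, text, repeats=3):
    """Return the best wall-clock time (in seconds) over `repeats` searches."""
    compiled = re.compile(pattern)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        compiled.search(text)
        best = min(best, time.perf_counter() - start)
    return best


# Text that almost matches: a run of "a" followed by a character that breaks the match.
text = "a" * 18 + "!"

simple = r"^a+$"     # a single quantifier: the failure is detected quickly
nested = r"^(a+)+$"  # nested quantifiers: the engine retries many ways to split the run

print(f"simple: {time_search(simple, text):.6f} s")
print(f"nested: {time_search(nested, text):.6f} s")
```

Neither pattern matches the sample text, but with a backtracking engine the nested form needs far more steps to conclude that, and the cost grows quickly with the length of the run. Testing a custom regex against long, nearly matching input before indexing is a cheap way to spot this kind of problem.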
See Simple entity extraction implementation - Enterprise Edition for details on where the code that manages entity extraction and display is located in Datafari.
...
Info |
---|
...
Valid from 4.1: The documentation below applies to Datafari 4.1 and later |
Datafari comes bundled with a simple entity extraction tool capable of:
Extracting names provided in a list from the documents
Extracting phone numbers from the documents (if they conform to a certain format)
Extracting custom "special" entities that match a provided regular expression
Warning |
---|
The entity extraction tool must be configured BEFORE indexing. Any configuration change made after the first indexing run only takes effect after completely wiping the index, reloading the Solr core FileShare, and indexing the files again. |
Note |
---|
Entity extraction takes place during indexing, so activating this feature affects indexing performance. The impact depends on the number of extraction features activated and on the complexity of the regex used (the time needed to compute matches). |
Activating and de-activating the Feature
The activation state of this feature can be managed from the Search Engine Configuration → Entity Extraction page in the Datafari administration panel, shown in the image below.
...
From there, a global switch activates or deactivates the whole feature, and a separate switch toggles each specific extraction feature individually.
...
Note |
---|
Extraction is performed at indexing time, against the content of the documents only. A complex regex, or a regex that matches long stretches of text, can degrade indexing performance. |
See Simple entity extraction implementation - Enterprise Edition for details on where the code that manages entity extraction and display is located in Datafari.