Valid from Datafari 4.1

This page explains the basic entity extraction bundled in Datafari. It is inspired by the following Lucidworks blog post: https://lucidworks.com/2013/06/27/poor-mans-entity-extraction-with-solr/

We put some commented-out code in Datafari Enterprise so that it can be reused for customers.

...

-- Fieldtype key_phrased added:

If you have a select list of special terms or phrases for your domain that you would like to turn into facets, so that you can easily filter the documents that contain them, this field type will be useful.

-- Examples of fields created and commented out to extract entities: entity_phone, entity_phone_present, entity_people, entity_people_present

-- Commented-out copyfield used to fill the entity_person field (it may or may not be present, depending on whether the names entity extraction is activated in Datafari)

  • keep_phrases.txt

A text file containing the entities to be identified in the documents when the names entity extraction feature is activated. One entity per line. It was designed to extract names, but it can be used to extract any list of phrases the user wants.
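To illustrate the "one entity per line" convention, here is a minimal Java sketch. It is not the Datafari or Solr implementation: in Datafari the matching is handled by the Solr field type described above. The class name `KeepPhrasesSketch`, its methods, the sample names, and the choice of case-insensitive whole-word matching are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration only; not part of Datafari.
final class KeepPhrasesSketch {

    // Parses the content of a keep_phrases.txt-style file:
    // one entity per line, blank lines ignored.
    static List<String> parsePhrases(String fileContent) {
        List<String> phrases = new ArrayList<>();
        for (String line : fileContent.split("\\R")) {
            String phrase = line.trim();
            if (!phrase.isEmpty()) {
                phrases.add(phrase);
            }
        }
        return phrases;
    }

    // Returns the phrases that occur in the text as whole words.
    // Case-insensitive matching is an assumption of this sketch.
    static List<String> findEntities(List<String> phrases, String text) {
        List<String> found = new ArrayList<>();
        for (String phrase : phrases) {
            Pattern p = Pattern.compile(
                "\\b" + Pattern.quote(phrase) + "\\b", Pattern.CASE_INSENSITIVE);
            if (p.matcher(text).find()) {
                found.add(phrase);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        List<String> phrases = parsePhrases("Ada Lovelace\nAlan Turing\n\nGrace Hopper\n");
        System.out.println(findEntities(phrases, "A talk about alan turing and Grace Hopper."));
    }
}
```

Because each line is treated as an opaque phrase, the same file format works for product names, project codes, or any other list of terms.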


  • DatafariUpdateProcessor.java 

This class contains an entity extraction section that was added and commented out. It needs to be uncommented to activate the feature. The code checks variables provided in the updateprocessor definition in solrconfig.xml to see whether the feature is activated, which ensures that the extraction code does not run when the feature is not activated.

The entity_phone field is filled using a regex pattern. If we find a US phone number in the content field, we extract the matched expression and copy it into a specific field: entity_phone. We also set the field entity_phone_present to true.
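The behaviour described above can be sketched in plain Java as follows. This is not the code from DatafariUpdateProcessor.java: the class name `PhoneEntitySketch`, the `enabled` flag standing in for the variables read from solrconfig.xml, and the regex itself are assumptions made for the example. Only the field names entity_phone and entity_phone_present come from this page.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not the actual Datafari update processor.
final class PhoneEntitySketch {

    // Assumed pattern for US phone numbers such as 555-123-4567 or (555) 123-4567.
    // The regex actually used in Datafari may differ.
    private static final Pattern US_PHONE = Pattern.compile(
        "(?:\\(\\d{3}\\)\\s?|\\b\\d{3}[-.\\s])\\d{3}[-.\\s]\\d{4}\\b");

    // Returns the fields to add to the document. The map is empty when the
    // feature is disabled or when no phone number is found in the content.
    static Map<String, Object> extract(boolean enabled, String content) {
        Map<String, Object> fields = new LinkedHashMap<>();
        if (!enabled || content == null) {
            return fields; // feature not activated: extraction code is skipped
        }
        List<String> phones = new ArrayList<>();
        Matcher m = US_PHONE.matcher(content);
        while (m.find()) {
            phones.add(m.group());
        }
        if (!phones.isEmpty()) {
            fields.put("entity_phone", phones);
            fields.put("entity_phone_present", Boolean.TRUE);
        }
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(extract(true, "Call (555) 123-4567 or 555-987-6543."));
    }
}
```

Checking the flag before compiling any matches keeps the cost of the feature at zero when it is turned off, which mirrors the purpose of the checks described above.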


2. UI configuration

To add the facets, uncomment the code related to entity extraction in:

-- Datafari/js/search.js

...

See the blog post http://www.francelabs.com/blog/?p=475&preview=true for more details.

The page Basic Text Tagging at indexing and Searching time gives you the process to follow to configure the already implemented entity extraction in Datafari.