...
...
...
...
...
...
Info |
---|
Valid from Datafari 4.1 |
...
in EE and valid from Datafari 5 for CE |
Datafari bundles a basic entity extraction feature. It was inspired by the following Lucidworks blog post : https://lucidworks.com/2013/06/27/ (How to Perform Entity Extraction in Solr | Lucidworks)
We put some commented-out code in Datafari Enterprise so that it can be reused for customers.
The goal of this page is to explain the work already done and how to reuse it.
1. Solr configuration :
Schema.xml
-- Fieldtype key_phrased added : if you have a select list of special terms or phrases for your domain that you would like to turn into facets, so that the documents containing them can be filtered easily, this field type will be useful.
-- Examples of fields created and commented out to extract entities : entity_phone, entity_phone_present, entity_people, entity_people_present
-- Copyfield, commented out, used to fill the entity_person field (it may or may not be present, depending on whether the names entity extraction is activated in Datafari)
keep_phrases.txt
...
A text file containing the entities to be identified in the documents when the names entity extraction feature is activated, with one entity per line. It was designed to extract names, but it can be used to extract any list of phrases the user wants.
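To illustrate the "one entity per line" idea, here is a minimal sketch in plain Java. It is not how Datafari or Solr performs the matching (in Solr this is handled by the analysis chain of the field type). The class name `KeepPhrasesSketch` and its methods are hypothetical, and the normalization rules (lowercasing, treating punctuation as spaces) are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class KeepPhrasesSketch {

    // Lowercase the text and replace every run of non letter/digit characters
    // with a single space, so that phrases and content are compared alike.
    static String normalize(String text) {
        return text.toLowerCase(Locale.ROOT)
                .replaceAll("[^\\p{L}\\p{N}]+", " ")
                .trim();
    }

    // phraseLines: the lines of a keep_phrases.txt-like file, one entity per line.
    // Returns the phrases (as written in the file) that occur in the content
    // as whole words.
    public static List<String> findEntities(String content, List<String> phraseLines) {
        String haystack = " " + normalize(content) + " ";
        List<String> found = new ArrayList<>();
        for (String line : phraseLines) {
            String phrase = normalize(line);
            if (phrase.isEmpty()) {
                continue; // skip blank lines
            }
            if (haystack.contains(" " + phrase + " ")) {
                found.add(line.trim());
            }
        }
        return found;
    }

    public static void main(String[] args) {
        List<String> phrases = Arrays.asList("Marie Curie", "", "Albert Einstein");
        String content = "Notes on Marie Curie's work, not on Einstein alone.";
        System.out.println(findEntities(content, phrases)); // prints [Marie Curie]
    }
}
```

Padding both the content and the phrase with spaces is a simple way to match whole words only, so that a phrase such as "Ann" would not match inside "Annual".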
DatafariUpdateProcessor.java
This class contains a commented-out "entity extraction" section, which must be uncommented to activate the feature. Some checks are done against variables provided in the updateprocessor definition in solrconfig.xml
to see whether the feature is activated. This ensures that the code is not run if the feature is not activated.
The entity_person field is filled with a regex pattern. If a US phone number is found in the content field, the matching expression is extracted and copied to a dedicated field : entity_phone. The field entity_phone_present is also set to true.
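The phone number extraction described above can be sketched as follows. This is an illustration only, not the code shipped in DatafariUpdateProcessor.java: the class `PhoneEntitySketch`, its methods and the `enabled` flag are hypothetical, a plain `Map` stands in for the Solr document, and the regular expression is an assumed pattern for common US formats that may differ from the one Datafari uses.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PhoneEntitySketch {

    // Assumed pattern for US phone numbers such as 555-123-4567 or (555) 123-4567.
    // The lookarounds prevent matching inside a longer run of digits.
    private static final Pattern US_PHONE = Pattern.compile(
            "(?<!\\d)(?:\\(\\d{3}\\)\\s?|\\d{3}[-.\\s])\\d{3}[-.\\s]\\d{4}(?!\\d)");

    // Returns every substring of the content that matches the phone pattern.
    public static List<String> extractPhones(String content) {
        List<String> phones = new ArrayList<>();
        if (content == null) {
            return phones;
        }
        Matcher matcher = US_PHONE.matcher(content);
        while (matcher.find()) {
            phones.add(matcher.group());
        }
        return phones;
    }

    // "doc" stands in for the Solr document; "enabled" stands in for the
    // activation variable read from the updateprocessor definition.
    public static void fillEntityFields(Map<String, Object> doc, boolean enabled) {
        if (!enabled) {
            return; // feature not activated: leave the document untouched
        }
        List<String> phones = extractPhones((String) doc.get("content"));
        if (!phones.isEmpty()) {
            doc.put("entity_phone", phones);
            doc.put("entity_phone_present", Boolean.TRUE);
        }
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new HashMap<>();
        doc.put("content", "Call the office at (555) 123-4567 or 555-987-6543.");
        fillEntityFields(doc, true);
        System.out.println(doc.get("entity_phone"));         // [(555) 123-4567, 555-987-6543]
        System.out.println(doc.get("entity_phone_present")); // true
    }
}
```

Checking the activation flag first mirrors the behaviour described above: when the feature is not activated, no regex is run and no entity field is added.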
2. UI configuration :
To add the facets, uncomment the code marked "entity extraction". This applies only to Ajaxfrancelabs (for DatafariUI, check the DatafariUI config documentation).
The code for the facets related to entity extraction can be found in :
-- Datafari/js/search.js
-- Datafari/searchView.jsp
-- Datafari/js/AjaxFranceLabs/widgets/SubClassResult.widget.js
See the blog post : http://www.francelabs.com/blog/?p=475&preview=true for more details.
The page Basic Text Tagging at indexing and Searching time gives you the process to follow to configure the already implemented entity extraction in Datafari.