This page explains how to do basic entity extraction in Datafari.
The approach is inspired by the Lucidworks blog post: https://lucidworks.com/2013/06/27/poor-mans-entity-extraction-with-solr/
Some commented-out code is included in Datafari Enterprise so that it can be reused for customers.
The goal of this page is to explain the work already done and how to reuse it.
1. Solr configuration
- Schema.xml
-- Fieldtype key_phrased added:
This field type is useful if you have a select list of special terms or phrases for your domain that you would like to turn into facets, so that the documents containing them can be filtered easily.
-- Example fields created and commented out: entity_phone, entity_phone_present, entity_people, entity_people_presen
-- Copyfield added and commented out, used to fill the entity_person field
- keep_phrases.txt
- DatafariUpdateProcessor.java
An entity extraction section has been added and commented out. Uncomment it to activate entity extraction.
The entity_person field is filled with a regex pattern. If a US phone number is found in the content field, the matching text is extracted and copied to a dedicated field, entity_phone, and the field entity_phone_present is set to true.
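The regex step described above can be sketched in plain Java. This is only an illustration under stated assumptions, not the code shipped in DatafariUpdateProcessor.java: the class name PhoneEntityExtractor, the extract method and the phone pattern itself are hypothetical, and the pattern is deliberately simple (it only recognizes forms such as (555) 123-4567, 555-123-4567 and 555.123.4567). Only the field names entity_phone and entity_phone_present come from this page.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration; not the actual Datafari code. */
final class PhoneEntityExtractor {

    // Assumed, simplified US phone pattern: "(555) 123-4567", "555-123-4567", "555.123.4567".
    private static final Pattern US_PHONE =
            Pattern.compile("(?:\\(\\d{3}\\)\\s?|\\b\\d{3}[-.])\\d{3}[-.]\\d{4}\\b");

    /**
     * Scans the text of the content field and returns the values that would be
     * written to entity_phone (all matches) and entity_phone_present (true if any match).
     */
    static Map<String, Object> extract(String content) {
        List<String> phones = new ArrayList<>();
        Matcher matcher = US_PHONE.matcher(content == null ? "" : content);
        while (matcher.find()) {
            phones.add(matcher.group());
        }
        Map<String, Object> fields = new LinkedHashMap<>();
        fields.put("entity_phone", phones);
        fields.put("entity_phone_present", !phones.isEmpty());
        return fields;
    }

    public static void main(String[] args) {
        // Prints the two extracted field values for a sample content string.
        System.out.println(extract("Call (555) 123-4567 or 555-987-6543."));
    }
}
```

In a real Solr update processor, the same two values would presumably be set on the document being indexed rather than returned in a map. The map is used here only to keep the sketch free of Solr dependencies.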
2. UI configuration
To add the facets, uncomment the entity extraction code in:
-- Datafari/js/search.js
-- Datafari/searchView.jsp
-- Datafari/js/AjaxFranceLabs/widgets/SubClassResult.widget.js
See the blog post http://www.francelabs.com/blog/?p=475&preview=true for more details.