This page explains how to add entity recognition to your Datafari using an external FastAPI service (which, in our case, gives access to spaCy models) to extract entities. It covers the whole path, from adding entity extraction to the indexing pipeline to using the extracted entities in an autocomplete component that helps users with their searches.
Setting up the FastAPI server
Follow the documentation "Setting up a server to host Spacy for Named Entity Recognition" to set up this service.
Adding Entity Extraction to a Job
In this step, you need to create an instance of the transformation connector. Then you will need to add this transformation to your job and run your job to extract the entities. These steps are explained in https://datafari.atlassian.net/wiki/pages/resumedraft.action?draftId=2469920769.
Adding a Search Component and Search Handler in Solr for Autocomplete
In this section, we will assume that the field storing the entities you want autocomplete on is entity_keywords.
To perform the following actions, you need to have access to the server hosting Datafari and be able to modify some files on it.
We will add the configuration we need in the files located in the folder ${DATAFARI_HOME}/solr/solrcloud/FileShare/conf/customs_solrconfig/
First, in the file solr/solrcloud/FileShare/conf/customs_solrconfig/custom_search_components.incl, add the following:
<searchComponent class="solr.SuggestComponent" name="suggestEntityKeywords">
  <lst name="suggester">
    <str name="name">suggesterEntityKeywords</str>
    <str name="suggestAnalyzerFieldType">text_general</str>
    <str name="lookupImpl">AnalyzingInfixLookupFactory</str>
    <str name="maxEdits">0</str>
    <str name="field">entity_keywords</str>
    <str name="highlight">false</str>
    <str name="dictionaryImpl">HighFrequencyDictionaryFactory</str>
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>
Second, in the file solr/solrcloud/FileShare/conf/customs_solrconfig/custom_search_components.incl, add the following:
<requestHandler class="org.apache.solr.handler.component.SearchHandler" name="/suggestKeywords">
  <lst name="defaults">
    <str name="suggest">true</str>
    <str name="suggest.dictionary">suggesterEntityKeywords</str>
    <str name="suggest.count">10</str>
  </lst>
  <arr name="components">
    <str>suggestEntityKeywords</str>
  </arr>
</requestHandler>
You can change the names used here to match the type of entity you are working with, but be careful to stay consistent in your changes. For example, the suggest.dictionary property of the request handler refers to the name of the suggester declared inside the searchComponent (suggesterEntityKeywords here), while the components array of the request handler refers to the name of the searchComponent itself (suggestEntityKeywords here). As you can also see in this configuration, the field property of the searchComponent refers to the field holding the entity type we are interested in.
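The naming rule above is easy to get wrong after a rename, so it can help to check it mechanically. The following is a minimal sketch, not part of Datafari: it parses trimmed copies of the two snippets with Python's standard library and reports any mismatch. The function name check_names and the trimmed snippets are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed copies of the two snippets above (only the names that must match).
SEARCH_COMPONENT = """
<searchComponent class="solr.SuggestComponent" name="suggestEntityKeywords">
  <lst name="suggester">
    <str name="name">suggesterEntityKeywords</str>
    <str name="field">entity_keywords</str>
  </lst>
</searchComponent>
"""

REQUEST_HANDLER = """
<requestHandler class="org.apache.solr.handler.component.SearchHandler" name="/suggestKeywords">
  <lst name="defaults">
    <str name="suggest.dictionary">suggesterEntityKeywords</str>
  </lst>
  <arr name="components">
    <str>suggestEntityKeywords</str>
  </arr>
</requestHandler>
"""


def check_names(component_xml, handler_xml):
    """Hypothetical helper: return a list of naming mismatches (empty if consistent)."""
    component = ET.fromstring(component_xml)
    handler = ET.fromstring(handler_xml)

    component_name = component.get("name")
    # Names of the suggesters declared inside the searchComponent.
    suggester_names = {
        el.text
        for el in component.findall("./lst[@name='suggester']/str[@name='name']")
    }
    # Names referenced by the request handler.
    dictionary = handler.findtext(
        "./lst[@name='defaults']/str[@name='suggest.dictionary']"
    )
    used_components = {
        el.text for el in handler.findall("./arr[@name='components']/str")
    }

    problems = []
    if dictionary not in suggester_names:
        problems.append(f"suggest.dictionary '{dictionary}' matches no suggester name")
    if component_name not in used_components:
        problems.append(f"searchComponent '{component_name}' is not listed in components")
    return problems


print(check_names(SEARCH_COMPONENT, REQUEST_HANDLER))
```

An empty list means the two snippets agree. To check your real files, you would read their content instead of the inline strings, keeping in mind that an .incl file holding several top-level elements needs to be wrapped in a single root element before parsing.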
We now need to load these configuration changes into ZooKeeper and make them effective in Solr. To do so, go to the System Configuration Manager (Zookeeper) in the Datafari admin interface. There, click the “PUSH” button and, once it is done, the “REFRESH” button.
Now run the following command from your Datafari server to build the suggestion dictionary (replace the /suggestKeywords part if you gave your request handler a different name):
curl "localhost:8983/solr/FileShare/suggestKeywords?suggest.q=something&suggest.build=true"
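Once the dictionary is built, the same handler can be queried to see which suggestions it returns. Below is a minimal Python sketch, assuming the default names used on this page (core FileShare, handler suggestKeywords, suggester suggesterEntityKeywords) and the JSON layout Solr uses for suggester responses. The helper names and the sample response are illustrative assumptions, not real output.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: Solr is reachable locally, as in the curl command above.
BASE_URL = "http://localhost:8983/solr/FileShare"


def suggest_url(query, handler="suggestKeywords", base=BASE_URL, build=False):
    """Build the URL of a suggest request; build=True also rebuilds the dictionary."""
    params = {"suggest.q": query, "wt": "json"}
    if build:
        params["suggest.build"] = "true"
    return f"{base}/{handler}?{urlencode(params)}"


def extract_terms(response, dictionary="suggesterEntityKeywords"):
    """Collect the suggested terms from a parsed suggester response."""
    by_query = response.get("suggest", {}).get(dictionary, {})
    terms = []
    for result in by_query.values():
        terms.extend(s["term"] for s in result.get("suggestions", []))
    return terms


def fetch_terms(query):
    """Query the handler over HTTP (needs a running Solr) and return the terms."""
    with urlopen(suggest_url(query)) as resp:
        return extract_terms(json.load(resp))


# Illustrative response, hand-written to show the expected structure.
SAMPLE_RESPONSE = {
    "suggest": {
        "suggesterEntityKeywords": {
            "data": {
                "numFound": 2,
                "suggestions": [
                    {"term": "database", "weight": 3, "payload": ""},
                    {"term": "datafari", "weight": 1, "payload": ""},
                ],
            }
        }
    }
}

print(extract_terms(SAMPLE_RESPONSE))
```

If fetch_terms returns an empty list for a prefix you expect to match, the usual suspects are a dictionary that has not been built yet or a field that contains no indexed entities.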
Our search component and search handler should be ready to go at this stage. We will now configure Datafari to serve the new suggest endpoint and add an autocomplete suggester to DatafariUI.
Configuring Datafari to Serve the New Suggest Request Handler
As the request handler we added is one that is used for suggestions and autocomplete, we need to modify the file ${DATAFARI_HOME}/tomcat/conf/entity-autocomplete.properties.
Add an entry to the AUTOCOMPLETESUGGESTERS array with the following information:
{\
"i18nKey":"asKeywords",\
"serverUrl":"",\
"solrCore":"",\
"servlet":"suggestKeywords",\
"suggestComponent":"suggesterKeywords",\
"maxSuggest":2,\
"categoryKey":"keywords",\
"categoryi18nKey":"keywords",\
"cssClass":"",\
"activated": "false"\
}
The only properties that really matter here are the servlet and suggestComponent keys. The other properties are used by the AjaxFranceLabs UI framework, but since we are going to use DatafariUI, they do not matter.
Save the file once you are done. You should not need to restart anything for that change to be effective.
Adding an Autocomplete Suggester in DatafariUI
Now that everything is ready, we can add an autocomplete suggester in DatafariUI. To do so, open the ${DATAFARI_HOME}/www/ui-config.json file and modify the suggesters array, which is in the searchBar object:
"searchBar": {
  "suggesters": [
    {
      "type": "BASIC",
      "props": {
        "maxSuggestion": 5,
        "title": "SUGGESTED QUERIES",
        "subtitle": "Queries extending your current query terms"
      }
    },
    {
      "type": "ENTITY",
      "props": {
        "field": "entity_keywords",
        "suggester": "suggestKeywords",
        "dictionary": "suggesterEntityKeywords",
        "asFacet": false,
        "maxSuggestion": 5,
        "title": "Suggested keywords search",
        "subtitle": "Keywords matching the text you typed"
      }
    }
  ]
}
As you can see, we added a suggester of type ENTITY, which refers to the field, request handler and suggester dictionary configured earlier. We recommend you leave asFacet set to false unless you know the exact behavior of this option. You can modify the maximum number of suggestions shown, the title and the subtitle as you see fit. Title and subtitle text can also be added to the translation files if you need to display them in different languages (use the text you put in this ui-config file as the key, and the translated text as the value, in each language file you need).
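Because a missing or misspelled property in ui-config.json is easy to overlook, a quick check before reloading the UI can save time. The sketch below is only an illustration: it assumes searchBar sits at the top level of the file, and it treats field, suggester and dictionary as required for ENTITY suggesters, which is a convention chosen for this example and not something DatafariUI is documented to enforce.

```python
import json

# Assumption: these three props must be set for an ENTITY suggester to work.
REQUIRED_ENTITY_PROPS = ("field", "suggester", "dictionary")


def entity_suggester_problems(ui_config):
    """Hypothetical helper: list missing or empty props of ENTITY suggesters."""
    suggesters = ui_config.get("searchBar", {}).get("suggesters", [])
    problems = []
    for index, suggester in enumerate(suggesters):
        if suggester.get("type") != "ENTITY":
            continue  # other suggester types are not checked here
        props = suggester.get("props", {})
        for key in REQUIRED_ENTITY_PROPS:
            if not props.get(key):
                problems.append(f"suggesters[{index}].props.{key} is missing or empty")
    return problems


# Reduced version of the configuration shown above.
SAMPLE_UI_CONFIG = """
{
  "searchBar": {
    "suggesters": [
      {"type": "BASIC", "props": {"maxSuggestion": 5}},
      {
        "type": "ENTITY",
        "props": {
          "field": "entity_keywords",
          "suggester": "suggestKeywords",
          "dictionary": "suggesterEntityKeywords",
          "asFacet": false,
          "maxSuggestion": 5
        }
      }
    ]
  }
}
"""

print(entity_suggester_problems(json.loads(SAMPLE_UI_CONFIG)))
```

Loading the file with json.loads also catches plain syntax errors such as a trailing comma, which would otherwise only show up when the UI fails to read its configuration.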
Result
That’s it: you now have an autocomplete section covering your entity type in DatafariUI.