New Language Configuration
The documentation below is valid from Datafari 5.0 upwards.
Out of the box, Datafari displays its menus and features in English, French, German, Italian, Arabic, Brazilian Portuguese and Russian. These user interface languages are not to be confused with the languages that can be analysed at the indexing phase; the default languages analysed at the indexing and search phases are described further below.
Step-by-step guide for an existing Datafari
Internationalization of the user interface:
The i18n files are stored in /opt/datafari/tomcat/webapps/Datafari/resources/js/AjaxFranceLabs/locale. Add a file for your new language to this folder, named after its language code: for example, for a German translation, add a file named de.json.
Also add an entry for your new language to every i18n file in this folder. For example, to add Spanish, we added the following entry to all the i18n files: "es_locale" : "Español"
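When there are many locale files, editing each one by hand is error-prone. The sketch below is a hypothetical helper (AddLocaleEntry is not part of Datafari) that inserts the entry before the final closing brace of every .json file in the locale folder. It assumes each file contains a single JSON object and uses plain string handling rather than a JSON library, so back up the folder before running it.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not shipped with Datafari: adds a "xx_locale" entry
// to every i18n JSON file of a folder. Back up the folder before running it.
public class AddLocaleEntry {

    // Returns json with "key" : "value" inserted before the final closing brace.
    // Assumes the text holds one JSON object; no JSON library is used.
    public static String addEntry(String json, String key, String value) {
        if (json.contains("\"" + key + "\"")) {
            return json; // entry already present: leave the content unchanged
        }
        int end = json.lastIndexOf('}');
        if (end < 0) {
            throw new IllegalArgumentException("no closing brace found");
        }
        String body = json.substring(0, end).stripTrailing();
        String separator = body.endsWith("{") ? "" : ","; // no comma in an empty object
        return body + separator + "\n  \"" + key + "\" : \"" + value + "\"\n" + json.substring(end);
    }

    public static void main(String[] args) throws IOException {
        // Default folder from this page; pass another path as the first argument.
        Path dir = Paths.get(args.length > 0 ? args[0]
                : "/opt/datafari/tomcat/webapps/Datafari/resources/js/AjaxFranceLabs/locale");
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir, "*.json")) {
            for (Path file : files) {
                String json = Files.readString(file); // read as UTF-8
                // "Espa\u00f1ol" is "Español", escaped so the source file encoding does not matter
                Files.writeString(file, addEntry(json, "es_locale", "Espa\u00f1ol"));
                System.out.println("updated " + file.getFileName());
            }
        }
    }
}
```

Running it twice is harmless: files that already contain the key are written back unchanged.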
For now, the new language must also be declared in a Java class, so you need to download the Datafari source code, modify the class and upload the compiled class to your Datafari server. The class to modify is com.francelabs.datafari.utils.LanguageUtils, located in the datafari-webapp module. Add your language to the following line:
public static final List<String> availableLanguages = Arrays.asList("en", "fr", "it", "ar", "ru");
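For example, adding Spanish means appending "es" to that list, matching the es.json file name used for the i18n files. The class below is only an illustration (LanguageUtilsExample is a hypothetical name) reduced to the edited field; in the real LanguageUtils class you change this single line and keep everything else. The codes already present in your version of the class may differ from the ones shown here.

```java
import java.util.Arrays;
import java.util.List;

// Illustration only: a stand-in for com.francelabs.datafari.utils.LanguageUtils
// showing just the edited field. Edit the real class instead of copying this one.
public class LanguageUtilsExample {

    // Existing language codes followed by the new one ("es" for Spanish).
    public static final List<String> availableLanguages =
            Arrays.asList("en", "fr", "it", "ar", "ru", "es");

    public static void main(String[] args) {
        System.out.println(availableLanguages); // prints [en, fr, it, ar, ru, es]
    }
}
```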
Then recompile the source code and copy the file $YOUR_DATAFARI_PROJECT/datafari-webapp/target/classes/com/francelabs/datafari/utils/LanguageUtils.class to the following location on your Datafari server: /opt/datafari/tomcat/webapps/Datafari/WEB-INF/classes/com/francelabs/datafari/utils.
Then restart Datafari to load your changes.
Add the new language code (for example 'es') to the availableLanguages array in /opt/datafari/tomcat/webapps/Datafari/resources/js/AjaxFranceLabs/i18njs.js:
availableLanguages : [ 'en', 'fr', 'it', 'ar', 'ru' ],
Create a file named after the new language, LOCALE.json (for example es.json), in the folder /opt/datafari/tomcat/webapps/Datafari/resources/customs/i18n.
Its content must be an empty JSON object. This is not a mistake: the file really contains only {}
Internationalization of language detection, indexing and search by the Datafari Solr engine:
Modify the Solr schema to handle the new language
Add the stopwords file for your language. To do so, download the Solr distribution used by your version of Datafari (the paths below refer to Solr 8.5.2), then open the folder solr-8.5.2/server/solr/configsets/_default/conf/lang and find the stopwords file for your language (for example stopwords_ru.txt for Russian).
Copy it to /opt/datafari/solr/solrcloud/FileShare/conf/lang/ on your Datafari server.
Then push the file to Zookeeper: in the Datafari admin UI, open the Search Engine Configuration / System Configuration manager menu, click the push my modifications button, then apply my modifications (see the page System Configuration Manager (Zookeeper) for more explanations).
Add the fieldType of your language. To do so, in the same Solr distribution, open the file solr-8.5.2/server/solr/configsets/_default/conf/managed-schema and find the fieldType corresponding to your language.
For example, to add Russian, look for the text_ru fieldType. Add it to your Datafari server: edit the file /opt/datafari/solr/solrcloud/FileShare/conf/customs_schema/custom_fieldTypes.incl and paste the adapted definition into it (the file /opt/datafari/solr/solrcloud/FileShare/conf/customs_schema/custom_fieldTypes.incl.example can help you).
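As a sketch, the Russian fieldType in the Solr 8.5.2 _default managed-schema looks roughly like the fragment below (the matching dynamicField line is left out). Treat the managed-schema you downloaded as authoritative, and check custom_fieldTypes.incl.example for any surrounding layout the include file expects. The stopwords path points to the stopwords_ru.txt file copied into conf/lang earlier.

```xml
<!-- Russian fieldType, based on the Solr 8.5.2 _default managed-schema -->
<fieldType name="text_ru" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- stopwords file copied to conf/lang/ in the previous step -->
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_ru.txt" format="snowball"/>
    <filter class="solr.SnowballPorterFilterFactory" language="Russian"/>
  </analyzer>
</fieldType>
```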
Then add the new fieldType to Solr by running the script located in /opt/datafari/solr/solrcloud/FileShare/conf/customs_schema.
Add fields specific to your new language:
You will notice that two fields are language specific: content and title. Therefore we have "content_en", "title_en", "content_fr", "title_fr", "title_de" and "content_de". Create your own "content_xy" and "title_xy" fields for your new language (xy being its language code) and configure them with the fieldType you just added. They must be indexed, stored and multiValued.
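For Russian (xy = ru), and assuming the fieldType is named text_ru as in Solr's _default schema, the two field declarations could look like the fragment below. This assumes custom_fields.incl takes plain Solr field elements; check custom_fields.incl.example for the exact layout.

```xml
<!-- Russian-specific title and content fields: indexed, stored and multiValued -->
<field name="title_ru" type="text_ru" indexed="true" stored="true" multiValued="true"/>
<field name="content_ru" type="text_ru" indexed="true" stored="true" multiValued="true"/>
```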
Add them to your Datafari server: edit the file /opt/datafari/solr/solrcloud/FileShare/conf/customs_schema/custom_fields.incl and create the two fields there (the file /opt/datafari/solr/solrcloud/FileShare/conf/customs_schema/custom_fields.incl.example can help you).
Then add the new fields to Solr by running the script located in /opt/datafari/solr/solrcloud/FileShare/conf/customs_schema.
Modify the dedicated Solr update processor, declared in /opt/datafari/solr/solrcloud/FileShare/conf/solrconfig.xml within updateRequestProcessorChainDatafari, which detects the language of each document based on the content and title fields. By default it handles English, French and German. To add a new language, add its code to the "langid.whitelist" element:
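For example, with the default English, French and German plus Russian, the parameter might read as below. This assumes the whitelist is declared as a comma-separated str parameter, as in Solr's standard language identification configuration; keep the processor's other parameters as they are.

```xml
<!-- allowed language codes for detection: defaults plus Russian -->
<str name="langid.whitelist">en,fr,de,ru</str>
```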
Then push your changes to Zookeeper: in the Datafari admin UI, open the Search Engine Configuration / System Configuration manager menu, click the push my modifications button, then apply my modifications (see the page System Configuration Manager (Zookeeper) for more explanations).
Finally, update the index fields relevancy weights configuration: your new title_xy and content_xy fields must be added to the search algorithm. You can do this directly in the Datafari admin UI: go to Search Engine Configuration → Fields Weight, then click the add a field button to add the two fields. For example, for Russian we add:
title_ru 50
content_ru 10
(more explanations on the page Index fields relevancy weights Configuration)