Info |
---|
Valid from 5.0: the documentation below is valid from Datafari 5.0 upwards |
For now, Datafari is preconfigured to display its menus and functionalities in English, French, German, Italian, Arabic, Brazilian Portuguese and Russian. These user interface languages are not to be confused with the languages that can be analysed at the indexing phase. See further below for the default languages that can be analysed at the indexing and search phases.
Step-by-step guide for an existing Datafari
...
For the internationalization of language detection, indexing and search by the Datafari Solr engine:
Modify the dedicated Solr update processor, which detects the language of a document based on the content and title fields. It is declared in /opt/datafari/solr/solrcloud/FileShare/conf/solrconfig.xml, where you can find it at updateRequestProcessorChainDatafari. By default, we use English, French and German. To add a new language, add its code to the "langid.whitelist" element:
Code Block
<processor class="org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessorFactory">
  <str name="langid.fl">content,title</str>
  <str name="langid.langField">language</str>
  <str name="langid.map">true</str>
  <str name="langid.whitelist">en,fr,de</str>
  <str name="langid.fallback">en</str>
</processor>
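The whitelist change described above can also be scripted. The following is only a sketch: the function name add_langid_language is hypothetical, it assumes the langid.whitelist element sits on a single line as in the block above, and it edits the file in place after saving a .bak copy.

```shell
# Hypothetical helper (not part of Datafari): append a language code to
# the langid.whitelist element of solrconfig.xml.
# Assumes the element is written on one line and the code is a plain
# two-letter code such as "ru".
add_langid_language() {
  local lang="$1"
  local config="${2:-/opt/datafari/solr/solrcloud/FileShare/conf/solrconfig.xml}"

  # Nothing to do if the code is already listed.
  if grep -Eq "<str name=\"langid.whitelist\">([^<]*,)?${lang}(,[^<]*)?</str>" "$config"; then
    return 0
  fi

  # Keep a backup next to the file, then append ",<lang>" to the list.
  sed -i.bak -E "s|(<str name=\"langid.whitelist\">[^<]*)(</str>)|\1,${lang}\2|" "$config"
}

# Example call for Russian (uses the default path from this page):
#   add_langid_language ru
```

Editing the file by hand is just as valid; the script only helps when the same change has to be repeated on several servers.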
Modify the Solr schema to handle the new language
Add the stopwords file specific to your language. To do so, download the Solr version used by your current version of Datafari here: https://archive.apache.org/dist/lucene/solr/8.5.2/solr-8.5.2.tgz then open the folder solr-8.5.2/server/solr/configsets/_default/conf/lang and look for the stopwords file corresponding to your language.
Then copy it to this location on your Datafari server: /opt/datafari/solr/solrcloud/FileShare/conf/lang/
Then you need to push the file to Zookeeper. To do so, go to the admin UI of Datafari, open the Search Engine Configuration / System Configuration manager menu, click on the push my modifications button, then on apply my modifications (see the page System Configuration Manager (Zookeeper) for more explanations).
Add the fieldType of your language. To do so, download the Solr version used by your current version of Datafari here: https://archive.apache.org/dist/lucene/solr/8.5.2/solr-8.5.2.tgz then open the file solr-8.5.2/server/solr/configsets/_default/conf/managed-schema and look for the fieldType corresponding to your language.
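The download and copy steps above can be sketched on the command line. The function names fetch_solr and install_stopwords are hypothetical, and the stopwords_<code>.txt naming is assumed from the Solr default configset (for example stopwords_ru.txt); check the lang folder if your language uses a different file name.

```shell
# Locations taken from the steps above.
SOLR_URL="https://archive.apache.org/dist/lucene/solr/8.5.2/solr-8.5.2.tgz"
SOLR_CONF="solr-8.5.2/server/solr/configsets/_default/conf"

# Download and unpack the Solr archive into a working directory.
fetch_solr() {
  local work="$1"
  curl -fsSL -o "$work/solr-8.5.2.tgz" "$SOLR_URL" &&
    tar -xzf "$work/solr-8.5.2.tgz" -C "$work"
}

# Copy stopwords_<lang>.txt from the unpacked archive to the Datafari
# conf/lang folder (or to another folder given as third argument).
install_stopwords() {
  local work="$1" lang="$2"
  local dest="${3:-/opt/datafari/solr/solrcloud/FileShare/conf/lang}"
  local src="$work/$SOLR_CONF/lang/stopwords_${lang}.txt"

  if [ ! -f "$src" ]; then
    echo "No stopwords file for '$lang' in the archive" >&2
    return 1
  fi
  cp "$src" "$dest/"
}

# Example for Russian:
#   work="$(mktemp -d)"
#   fetch_solr "$work" && install_stopwords "$work" ru
#   grep -n 'name="text_ru"' "$work/$SOLR_CONF/managed-schema"
```

The last commented line locates the fieldType definition in managed-schema. The copied file still has to be pushed to Zookeeper as described above.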
For example, if we want to add the Russian language, the Solr configuration is this one:
Code Block
<!-- Russian -->
<fieldType name="text_ru" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_ru.txt" format="snowball" />
    <filter class="solr.SnowballPorterFilterFactory" language="Russian"/>
    <!-- less aggressive: <filter class="solr.RussianLightStemFilterFactory"/> -->
  </analyzer>
</fieldType>
We have to add it to our Datafari server. Edit the file /opt/datafari/solr/solrcloud/FileShare/conf/customs_schema/custom_fieldTypes.incl and put the previous configuration into it, adapted to the JSON syntax used by that file (you can look at the file /opt/datafari/solr/solrcloud/FileShare/conf/customs_schema/custom_fieldTypes.incl.example to help you):
Code Block
{
  "name":"text_ru",
  "class":"solr.TextField",
  "positionIncrementGap":"100",
  "analyzer" : {
    "tokenizer":{
      "class":"solr.StandardTokenizerFactory" },
    "filters":[{
      "class":"solr.LowerCaseFilterFactory" },
    {
      "class":"solr.StopFilterFactory","ignoreCase":"true","words":"lang/stopwords_ru.txt","format":"snowball" },
    {
      "class":"solr.SnowballPorterFilterFactory","language":"Russian"
    }]
  }
}
Add the new fieldType to Solr by launching the script addCustomSchemaInfo.sh located in /opt/datafari/solr/solrcloud/FileShare/conf/customs_schema:
Code Block
cd /opt/datafari/solr/solrcloud/FileShare/conf/customs_schema
bash addCustomSchemaInfo.sh
Add fields specific to your new language :
You will notice that two fields are language specific, namely content and title. Therefore, we have "content_en", "title_en", "content_fr", "title_fr", "content_de" and "title_de". You need to create your own "content_xy" and "title_xy" fields for your new language, where xy is the language code, and configure them with the fieldType you just added. We want them to be indexed, stored and multiValued.
We have to add them to our Datafari server. Edit the file /opt/datafari/solr/solrcloud/FileShare/conf/customs_schema/custom_fields.incl and create the two fields as shown below (you can look at the file /opt/datafari/solr/solrcloud/FileShare/conf/customs_schema/custom_fields.incl.example to help you). For example, here is the configuration for the Russian language:
Code Block
{
  "name":"title_ru",
  "type":"text_ru",
  "stored":true,
  "multiValued":true,
  "indexed":true
}
&&
{
  "name":"content_ru",
  "type":"text_ru",
  "stored":true,
  "multiValued":true,
  "indexed":true
}
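For another language code, the two definitions follow the same pattern. The sketch below prints them for a given code; the function name print_language_fields is hypothetical, and it assumes the text_<code> fieldType naming and the "&&" separator used in the Russian example.

```shell
# Hypothetical helper: print the title_<lang> and content_<lang> field
# definitions, separated by "&&" as in the Russian example.
print_language_fields() {
  local lang="$1" name first=1

  for name in "title_${lang}" "content_${lang}"; do
    # Emit the separator before every definition except the first.
    [ "$first" -eq 1 ] || printf '&&\n'
    first=0
    printf '{ "name":"%s", "type":"text_%s", "stored":true, "multiValued":true, "indexed":true }\n' \
      "$name" "$lang"
  done
}

# Example: print the definitions for Russian, then review them before
# copying them into the custom fields file.
#   print_language_fields ru
```

Printing to the terminal rather than writing to the file directly leaves room to compare the result with the .example file first.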
Add the new fields to Solr by launching the script addCustomSchemaInfo.sh located in /opt/datafari/solr/solrcloud/FileShare/conf/customs_schema:
Code Block
cd /opt/datafari/solr/solrcloud/FileShare/conf/customs_schema
bash addCustomSchemaInfo.sh
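After the script has run, one way to check the result is to ask the Solr Schema API for the new fields. This is a sketch under assumptions: the base URL http://localhost:8983/solr/FileShare (host, port and collection name) is a guess that may not match your installation, Solr may require authentication, and both function names are hypothetical.

```shell
# Build the Schema API URL of a field. The default base URL is an
# assumption; pass your own as second argument if it differs.
schema_field_url() {
  local field="$1"
  local base="${2:-http://localhost:8983/solr/FileShare}"
  printf '%s/schema/fields/%s\n' "$base" "$field"
}

# Report whether title_<lang> and content_<lang> are known to Solr.
# curl -f fails on HTTP errors, e.g. when the field does not exist.
check_language_fields() {
  local lang="$1"
  local base="${2:-http://localhost:8983/solr/FileShare}"
  local field

  for field in "title_${lang}" "content_${lang}"; do
    if curl -fsS "$(schema_field_url "$field" "$base")" >/dev/null; then
      echo "$field: present"
    else
      echo "$field: missing" >&2
      return 1
    fi
  done
}

# Example for Russian:
#   check_language_fields ru
```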
Modify the search request handler named select in DATAFARI_HOME/solr/solr_home/FileShare/conf/solrconfig.xml. There, change the qf and pf parameters: add the new fields title_xy and content_xy to the existing chain of parameters.
Now you can restart your Datafari for the changes to be taken into account.
Then you need to push your changes to Zookeeper. To do so, go to the admin UI of Datafari, open the Search Engine Configuration / System Configuration manager menu, click on the push my modifications button, then on apply my modifications (see the page System Configuration Manager (Zookeeper) for more explanations).
Finally, change the index fields relevancy weights configuration: you need to add your new fields title_xy and content_xy to the search algorithm. You can do it directly in the Datafari admin UI: go to Search Engine Configuration → Fields Weight, then click on the add a field button to add the two fields. For example, for Russian we add:
title_ru 50
content_ru 10
(see the page Index fields relevancy weights Configuration for more explanations)
...
Info |
---|
Valid from 4.4: the documentation below is valid from Datafari 4.4 upwards |
For now, Datafari is preconfigured to display its menus and functionalities in English, French, German, Italian, Arabic, Brazilian Portuguese and Russian. These user interface languages are not to be confused with the languages that can be analysed at the indexing phase. See further below for the default languages that can be analysed at the indexing and search phases.
Step-by-step guide
For the internationalization of the user interface, we use the i18n java library:
...