[DEPRECATED] Analytics (openDistro ELK) Configuration

Valid from 5.3

From Datafari 5.3 onwards, Apache Zeppelin with Logstash replaces ELK and Open Distro.


Valid from 4.5 up to 5.2

The documentation below is valid from Datafari v4.5 up to v5.2.

By default, aside from Kibana, Elasticsearch and Logstash are automatically configured by Datafari on first start to match your installation directory, so they are ready to run.

When you launch the initialization script (init-datafari.sh), it asks whether you want to activate ELK right away. If you answer yes, ELK is initialized and launched immediately, and you do not need to perform the steps below in the ELK Configuration admin UI.
If for some reason (an architecture decision, for example) you move one or more of these components to a location other than the Datafari installation directory, you must modify the ELK environment parameters located in [DATAFARI_HOME]/tomcat/conf/datafari.properties and [DATAFARI_HOME]/elk/scripts/set-elk-env.sh, and possibly also the [DATAFARI_HOME]/elk/scripts/start-elk.sh and [DATAFARI_HOME]/elk/scripts/stop-elk.sh scripts.
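Before editing datafari.properties by hand, it can help to list the entries that look ELK-related. The sketch below is a hypothetical helper, not a Datafari tool: it assumes the relevant keys contain "elk" or "kibana" in their names (not verified against the actual file) and only understands simple key=value / key:value lines, without the line continuations or escape sequences of the full .properties format.

```python
from pathlib import Path


def parse_properties(text: str) -> dict[str, str]:
    """Parse simple Java-style 'key=value' or 'key:value' lines.

    Skips blank lines and comments starting with '#' or '!'.
    Line continuations and escape sequences are NOT handled.
    """
    props: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#!":
            continue
        # Split on whichever separator ('=' or ':') appears first,
        # so values such as URLs keep their own colons.
        seps = [i for i in (line.find("="), line.find(":")) if i != -1]
        if not seps:
            props[line] = ""
            continue
        i = min(seps)
        props[line[:i].strip()] = line[i + 1:].strip()
    return props


def elk_entries(props: dict[str, str], needles=("elk", "kibana")) -> dict[str, str]:
    """Keep entries whose key contains one of the needles (case-insensitive).

    The needle list is an assumption about how ELK keys are named.
    """
    return {k: v for k, v in props.items() if any(n in k.lower() for n in needles)}


def print_elk_entries(path: str) -> None:
    """Print the presumed ELK-related entries of a properties file."""
    text = Path(path).read_text(encoding="utf-8")
    for key, value in elk_entries(parse_properties(text)).items():
        print(f"{key} = {value}")
```

For example, `print_elk_entries("[DATAFARI_HOME]/tomcat/conf/datafari.properties")` (with the placeholder replaced by your real path) prints the candidate lines; the matching parameters in set-elk-env.sh still need to be checked separately.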

Always leave the Kibana URI at its default value, http://localhost/app/kibana, in both mono-server and multi-server configurations. The only reason to change it is if you use an external ELK. For more explanations, see: https://datafari.atlassian.net/wiki/spaces/DATAFARI/pages/20054029

The first time you run Datafari, go to the admin UI and to the 'ELK Configuration' page (Statistics/ELK Configuration). You will find this UI:

Let us go through the 4 options:

  • "ON/OFF" button: Whether ELK should start when Datafari is running. (Note that switching it ON starts ELK immediately.)

  • Kibana URI: The URI used to reach Kibana. By default, Kibana runs on the same server as the Datafari web app and is reached through the Apache proxy, so leave the default value, http://localhost/app/kibana. Change it only if you have externalised ELK, in which case point it to your Kibana host.

  • Monitoring User: Fill in this parameter only if you use ACLs for search. In that case, enter the user account used to crawl the files; if you use ACLs and leave it empty, ELK will not be able to generate statistics on the corpus.

  • External ELK: Check this box if you use an external ELK, that is, if your ELK installation is not on the same server as the Datafari web app.
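The rules in the four options above can be summarised as a small consistency check. This is an illustrative sketch only: the `ElkSettings` field names and the `search_uses_acls` flag are hypothetical and do not correspond to Datafari's internal configuration keys. It simply encodes what the page says: keep the default Kibana URI for a local ELK, point it elsewhere for an external ELK, and set a Monitoring User when ACLs are used.

```python
from dataclasses import dataclass

DEFAULT_KIBANA_URI = "http://localhost/app/kibana"


@dataclass
class ElkSettings:
    """Hypothetical mirror of the 'ELK Configuration' page (names are illustrative)."""
    elk_on: bool = False
    kibana_uri: str = DEFAULT_KIBANA_URI
    monitoring_user: str = ""
    external_elk: bool = False


def check_settings(s: ElkSettings, search_uses_acls: bool) -> list[str]:
    """Return warnings for settings that contradict the documented rules."""
    warnings: list[str] = []
    # Local ELK: Kibana is reached through the Apache proxy on localhost.
    if not s.external_elk and s.kibana_uri != DEFAULT_KIBANA_URI:
        warnings.append("Local ELK: leave Kibana URI at " + DEFAULT_KIBANA_URI)
    # External ELK: the URI must point to the remote Kibana host.
    if s.external_elk and s.kibana_uri == DEFAULT_KIBANA_URI:
        warnings.append("External ELK: point Kibana URI to your Kibana host")
    # ACLs: without the crawl user, corpus statistics cannot be generated.
    if search_uses_acls and not s.monitoring_user:
        warnings.append("ACLs in use: set Monitoring User to the crawl user")
    return warnings
```

An empty list means the settings are consistent with the guidance above; the check does not contact ELK or validate that the URI is reachable.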

Once you have started ELK (be aware that ELK takes approx. 15 seconds on average to become fully operational on a "standard" PC), click any statistics tab ('Usage Statistics', 'Corpus Statistics' or 'Corpus Over Time Statistics'). The corresponding dashboard opens in a new browser tab (you first need to authenticate with an appropriate user):
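Since ELK needs some time to become operational after being switched ON, a script that depends on it can poll the Kibana URI before opening dashboards. The following is a minimal sketch using only the Python standard library, assuming the default URI http://localhost/app/kibana; any HTTP status below 500 (including 401/403 from the authentication step) is treated as "up". The 60-second default timeout is an arbitrary margin over the approx. 15 seconds mentioned above, not a Datafari setting.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def wait_for_kibana(
    uri: str = "http://localhost/app/kibana",
    timeout_s: float = 60.0,
    interval_s: float = 2.0,
    opener: Callable = urllib.request.urlopen,
) -> bool:
    """Poll `uri` until it answers with an HTTP status below 500.

    Returns True once Kibana (or the proxy in front of it) responds,
    False if it is still unreachable after `timeout_s` seconds.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with opener(uri, timeout=5) as resp:
                if resp.status < 500:
                    return True
        except urllib.error.HTTPError as err:
            # HTTPError must be caught before URLError (it is a subclass).
            # A 401/403 still proves the service is answering.
            if err.code < 500:
                return True
        except (urllib.error.URLError, OSError):
            pass  # connection refused or timed out: not up yet
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

Usage: `if not wait_for_kibana(): print("ELK is not reachable yet")`. The `opener` parameter exists only so the function can be exercised without a running Kibana.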


By default, aside from Kibana, Elasticsearch and Logstash are automatically configured by Datafari on first start to match your installation directory, so they are ready to run.
If for some reason (an architecture decision, for example) you move one or more of these components to a location other than the Datafari installation directory, you must modify the ELK environment parameters located in [DATAFARI_HOME]/elk/scripts/set-elk-env.sh, and possibly also the [DATAFARI_HOME]/elk/scripts/start-elk.sh and [DATAFARI_HOME]/elk/scripts/stop-elk.sh scripts.

The first time you run Datafari, go to the admin UI and to the 'ELK Configuration' page (Statistics/ELK Configuration). You will find this UI:

Let us go through the 3 options:

  • "ON/OFF" button: Whether ELK should start when Datafari is running. (Note that switching it ON starts ELK immediately.)

  • Kibana URI: The URI used to reach Kibana. By default, it points to the real IP address of your Datafari server; if you have externalised ELK, point it to your Kibana host instead.

  • Monitoring User: Fill in this parameter only if you use ACLs for search. In that case, enter the user account used to crawl the files; if you use ACLs and leave it empty, ELK will not be able to generate statistics on the corpus.

Once you have started ELK (be aware that ELK takes approx. 15 seconds on average to become fully operational on a "standard" PC), click any statistics tab ('Usage Statistics', 'Corpus Statistics' or 'Corpus Over Time Statistics') to see the corresponding dashboard: