[DEPRECATED] ELK
This documentation is deprecated as of Datafari version 5.0
SECURITY WARNING
In the community version, ELK is not secured. This means that anyone who knows the URL of the dashboards will see everything. It is up to you to secure your ELK environment: your options include switching to the Datafari Enterprise edition, securing it yourself, or disabling ELK.
Feature only available as of Datafari v2.2
The 2.2 version of Datafari comes with an Elasticsearch/Logstash/Kibana (ELK) stack in order to bring customizable monitoring and analytics views to Datafari.
Two new kinds of logs are generated, each in a separate file, in order to be exploited in Kibana:
- Statistics logs (datafari.statistic.log)
- Core Monitoring logs (datafari-monitoring.log)
The diagram below shows the flow between Datafari and Kibana:
As Datafari writes lines to the log files, Logstash, in near real time, catches each new line and inserts it into the "statistic/logs" or the "monitoring/logs" index of Elasticsearch, depending on whether the log line comes from the "datafari.statistic.log" file or the "datafari-monitoring.log" file.
The "logstash.conf" file which describe the path of the input log files and how to process the lines and insert them into Elasticsearch can be found in the Logstash main repository. Refer to the Logstash documentation for a deep understanding on how it works.
You will also find the templates used for each Elasticsearch index in the "templates" directory under the Logstash main directory. The files are called "datafari-statistic-template.json" and "datafari-monitoring-template.json".
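As an illustration, the elasticsearch output of Logstash can be pointed at such a template file so that the index mapping is applied when the index is created. The path below is an assumption, not the actual Datafari layout:

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "statistic"
    # Apply the mapping shipped with Datafari (path is an assumption)
    template => "/opt/logstash/templates/datafari-statistic-template.json"
    template_name => "datafari-statistic"
    template_overwrite => true
  }
}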
Let's describe how the flow works for each kind of log:
Statistics logs
These logs give information about the queries and actions performed by users in Datafari. For each query or action, a log line is written to the datafari.statistic.log file; Logstash detects the new line, applies some filters (defined in the logstash.conf file), and inserts it as a document into the statistic/logs index of Elasticsearch.
As described in the Statistics logs page, Datafari generates an id for each query (the actions related to a query, like clicking on a result, keep the id of the query). This id is used as the document id in Elasticsearch. This guarantees that, in Elasticsearch and then in Kibana, there is no duplicate data for the same query/action.
So, if several lines in the log file are related to the same query/action, they are inserted into the same Elasticsearch document, which means that the document corresponds to the most recent log line. This does not mean that information is lost, however, because a log line contains the history of the query and its related actions.
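On the Logstash side, this deduplication boils down to a single setting: using the Datafari query id as the Elasticsearch document id, so that a newer log line for the same query overwrites the previous document instead of creating a new one. A minimal sketch, assuming the filters have extracted the query id into an "id" field:

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "statistic"
    document_type => "logs"
    # Reuse Datafari's query id: several log lines about the same
    # query/action update one document instead of creating duplicates
    document_id => "%{id}"
  }
}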
See the Usage Analytics [DEPRECATED] page for examples of the kinds of charts you can create thanks to this flow.
Core Monitoring logs
These logs give information about the content of the main Solr core of Datafari, at a fixed rate. The default rate is once per hour and, for now, it can only be changed in the code.
So, at each iteration, Datafari performs some queries on Solr, formats the results and writes several log lines to the datafari-monitoring.log file. In near real time, Logstash detects the new lines, applies some filters (defined in the logstash.conf file), and inserts them as documents into the monitoring/logs index of Elasticsearch.
As described in the Core Monitoring logs page, a log line contains an id based on the facet value, the facet field and the time event unit. This id is used as the document id in Elasticsearch.
The time event unit, which is "daily" by default, corresponds to the granularity used to visualize data in Kibana. For example, with the default "daily" unit, you will have one Elasticsearch document per facet value and field, per day. If you set the time event unit to "hourly", you will have one Elasticsearch document per facet value and field, per hour.
This means that, as long as a new time event has not started, only one document is created/updated for each pair of facet value and field. When a new time event starts, a new document is created for the same pair of facet value and field.
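As an illustration of how such an id could be assembled in the filter section of logstash.conf (the field names and the id format are assumptions, not the actual Datafari scheme):

filter {
  # Truncate the timestamp to the time event unit: one id per day here.
  # Using "%{+YYYY-MM-dd-HH}" instead would give an hourly granularity,
  # i.e. one document per facet value and field, per hour.
  mutate {
    add_field => { "doc_id" => "%{facetField}_%{facetValue}_%{+YYYY-MM-dd}" }
  }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "monitoring"
    document_id => "%{doc_id}"
  }
}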
Concretely, a daily time event unit allows Kibana visualizations such as the ones described below.
As you can see on the time-based line chart on the left, which represents the document type distribution over time, there is only one dot per doc type per day: so, over the last 5 days, 5 values for each facet value of the facet field "extension".
The global doc type distribution pie chart on the right shows the global distribution of document types in the Solr core. Unlike the line chart, this chart is not time-based, so, to get a correct distribution view, you need to set the Kibana time frame to the time event unit mentioned before: the current day.
Notice that, as the monitoring log iterations are hourly (by default), the data is updated every hour without ever having more than one dot per day in the time-based line chart.
You now have a better understanding of the time event unit that you can change in the code, and of its effect on Elasticsearch and on the visualization of data in Kibana. It is up to you to set your own time event unit and adapt your visualizations accordingly.
You can find more examples of time-based charts and "normal" charts that you can create thanks to this flow in the Content Analysis Over Time and Content Analysis pages.