Info |
---|
Valid from Datafari 5.3: this documentation only applies to Datafari 5.3 and later |
Note |
---|
Before using Apache Zeppelin notebooks, be sure to read section 3, Apache Zeppelin Notebooks traps |
Datafari 5.3 has dropped the Open Distro stack in favor of Apache Zeppelin, which consumes far fewer resources.
1. How it works
Where Open Distro required indexing Datafari's analytics data into Elasticsearch, we now index it into new Solr indexes:
Crawl: all logs related to the crawls (Enterprise Edition only)
Access: all logs related to connections to the search UI
Logs: all component logs (Cassandra, Solr, Tomcat, etc.) (Enterprise Edition only)
Statistics: all logs concerning searches performed by users
Monitoring: all logs concerning the corpus of documents (number of docs, file types, etc.)
By indexing those logs in Solr instead of Elasticsearch, we reduced resource consumption and removed security complexity.
We kept Logstash as the log shipper because Logstash OSS is open source and lightweight, and because we already had every log parser configured. We simply replaced the Elasticsearch output in the configuration with a Solr output, thanks to a Solr plugin.
As for the Kibana dashboards, we migrated to Apache Zeppelin and replaced the dashboards with notebooks.
2. How to access and use Apache Zeppelin in Datafari
By default, Apache Zeppelin is started and stopped automatically along with Datafari. This is not the case if you disabled it during installation by answering “no” to the question “Do you want to enable analytic stack (yes/no) [yes] ?”, or if you disabled it in the configuration file [DATAFARI_HOME]/tomcat/conf/datafari.properties by setting the parameter “AnalyticsActivation” to false.
If Analytics are disabled, enable them by setting the parameter “AnalyticsActivation” to “true” in the configuration file [DATAFARI_HOME]/tomcat/conf/datafari.properties:
Code Block |
---|
#Analytics
AnalyticsActivation=true |
Then restart Datafari.
Once Analytics are enabled, Apache Zeppelin is started and stopped together with Datafari, and the notebooks that replaced the old dashboards are accessible through the Datafari admin UI in the following sections:
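If you prefer to script this change rather than edit the file by hand, the following is a minimal Python sketch. The helper name `set_property` is hypothetical and not part of Datafari. It assumes the file uses plain `key=value` lines, as in the block above. Back up the file before running anything like this.

```python
from pathlib import Path


def set_property(path: Path, key: str, value: str) -> bool:
    """Set key=value in a simple .properties file; return True if the file changed."""
    lines = path.read_text(encoding="utf-8").splitlines()
    new_line = f"{key}={value}"
    found = False
    changed = False
    for i, line in enumerate(lines):
        stripped = line.strip()
        # Leave comments and blank lines untouched
        if not stripped or stripped.startswith(("#", "!")):
            continue
        name, sep, _ = stripped.partition("=")
        if sep and name.strip() == key:
            found = True
            if stripped != new_line:
                lines[i] = new_line
                changed = True
    if not found:
        # The key was absent: append it at the end of the file
        lines.append(new_line)
        changed = True
    if changed:
        path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return changed
```

A call would look like `set_property(Path("<DATAFARI_HOME>/tomcat/conf/datafari.properties"), "AnalyticsActivation", "true")`, where `<DATAFARI_HOME>` stands for your actual installation directory. Datafari must still be restarted for the change to take effect.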
Usage Analysis → Corpus Analysis
Usage Analysis → Queries Analysis
System Analysis → Check problematic files (Enterprise Only)
System Analysis → Logs Analysis (Enterprise Only)
Once you have opened one of the Apache Zeppelin notebooks, you can navigate between the available notebooks using the “Notebook” header menu:
...
3. Apache Zeppelin Notebooks traps
Unlike Kibana, Apache Zeppelin does not automatically refresh the data of a notebook. In particular, when a user opens a notebook for the very first time, no data is displayed. To refresh the data of a notebook, you MUST do it manually by clicking the “Run all paragraph” button, located next to the notebook name at the top of the notebook:
...
Clicking that button refreshes all the data of the notebook. You must perform this operation the first time you open a notebook and every time you want to refresh its data.
You can also refresh one paragraph at a time (in Apache Zeppelin, a visualization is called a paragraph) by clicking the “Run this paragraph” button in the top right corner of the paragraph:
...
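As a possible alternative to clicking the button, Apache Zeppelin itself provides a REST endpoint that runs all paragraphs of a note (`POST /api/notebook/job/[noteId]`). The sketch below only builds such a request and does not send it. Whether this endpoint is reachable in a Datafari installation, and under which base URL and authentication, is an assumption that this page does not confirm. The function name, base URL and note id are hypothetical.

```python
from urllib.request import Request


def run_all_request(base_url: str, note_id: str) -> Request:
    """Build (without sending) a POST request asking Zeppelin to run every paragraph of a note."""
    # base_url and note_id are placeholders: adapt them to your own Zeppelin instance
    url = f"{base_url.rstrip('/')}/api/notebook/job/{note_id}"
    return Request(url, method="POST")
```

Sending the request would be done with `urllib.request.urlopen(run_all_request(...))`, most likely with extra authentication headers depending on how Zeppelin is secured in your environment.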
4. Filtering the data of a visualization
Unlike in Kibana, you cannot filter data in a notebook simply by clicking on a value. Instead, you need to directly modify the query of the visualization, which is displayed above it:
...
This requires familiarity with both the Apache Zeppelin Solr plugin syntax and the Solr query syntax, which takes some skill. Also, any modification of a query is applied and saved as soon as you click the “Run this paragraph” or “Run all paragraph” button. This overwrites the original query, and a mistake may put the visualization, or even the whole notebook, in error. Keep a copy of the original query before changing it.
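To give an idea of the Solr side of such a filter, here is a minimal Python sketch that composes standard Solr request parameters, using `fq` (filter query) to restrict the results. It does not show the Zeppelin Solr plugin syntax, which wraps the query differently. The function name and the field name `extension` used in the usage note are hypothetical and must be replaced with fields that actually exist in the index you are querying.

```python
from urllib.parse import urlencode


def build_solr_params(query: str, filters: dict, rows: int = 10) -> str:
    """Return URL-encoded Solr parameters with one fq clause per filter."""
    params = [("q", query), ("rows", str(rows))]
    for field, value in filters.items():
        # Quote the value so that spaces and special characters are matched as a phrase
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        params.append(("fq", f'{field}:"{escaped}"'))
    return urlencode(params)
```

For example, `build_solr_params("*:*", {"extension": "pdf"})` yields parameters that match all documents and then keep only those whose (hypothetical) `extension` field equals `pdf`. Using `fq` rather than adding the condition to `q` keeps the main query untouched, which makes it easier to restore the original visualization.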