Info |
---|
Valid from Datafari 5.3 up to 5.5. This documentation is only valid for Datafari 5.3 up to 5.5 |
Note |
---|
Before starting to use Apache Zeppelin notebooks, be sure to read the section 3. Apache Zeppelin Notebooks trap |
Datafari 5.3 has shifted from the Open Distro stack to Apache Zeppelin, as it is much less demanding in terms of resource consumption.
1. How does it work?
Where Open Distro required indexing the analytics data of Datafari into Elasticsearch, we can now index them in dedicated Solr indexes:
Crawl: all logs related to the crawls (Enterprise Edition only)
Access: all logs related to connections to the search UI
Logs: all component logs (Cassandra, Solr, Tomcat, etc.) (Enterprise Edition only)
Statistics: all logs concerning searches performed by users
Monitoring: all logs concerning the corpus of documents (number of docs, file types, etc.)
By indexing these logs into Solr instead of Elasticsearch, we cut the resource consumption and got rid of the security complexity.
We kept Logstash as a log pusher because Logstash OSS is open source and light, and we already had all of the log parsers configured. We just replaced the Elasticsearch output in the configuration with a Solr output, thanks to a Solr plugin.
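To give an idea of what indexing a log event into a Solr index can look like at the HTTP level, here is a minimal Python sketch that builds a request for Solr's JSON update handler. It is not the Logstash plugin itself, and it does not send anything. The base URL, the collection name `Access`, and the document fields are assumptions made for illustration only.

```python
import json
import urllib.request


def build_solr_update_request(base_url, collection, docs):
    """Build (without sending) a POST request for Solr's JSON update handler."""
    url = f"{base_url}/{collection}/update?commit=true"
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical log event: the field names are illustrative, not Datafari's schema.
event = {"id": "access-1", "message": "GET /Datafari/Search 200"}

# Assumed Solr base URL and collection name.
req = build_solr_update_request("http://localhost:8983/solr", "Access", [event])
print(req.get_method(), req.full_url)
```

Passing the request to `urllib.request.urlopen(req)` would actually send it. In Datafari this step is handled by Logstash.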
Concerning the Kibana dashboards, they have been migrated to Apache Zeppelin, by replacing the dashboards with notebooks.
2. How to access and use Apache Zeppelin in Datafari
By default, Apache Zeppelin is automatically started and stopped together with Datafari, unless you disabled it during the install phase by answering “no” to the question “Do you want to enable analytic stack (yes/no) [yes] ?”, or in the configuration file [DATAFARI_HOME]/tomcat/conf/datafari.properties by setting the parameter “AnalyticsActivation” to false.
If the Analytics are disabled in an already installed Datafari, you can enable them by setting the parameter “AnalyticsActivation” to “true” in the configuration file [DATAFARI_HOME]/tomcat/conf/datafari.properties:
Code Block |
---|
#Analytics AnalyticsActivation=true |
Then restart Datafari.
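If you prefer to script the property change rather than edit the file by hand, the following Python sketch shows one possible way to do it. It only works on the text of a properties file, so reading and writing datafari.properties is left to you. The function name `enable_analytics` and the sample parameter `OtherParam` are hypothetical.

```python
import re


def enable_analytics(properties_text: str) -> str:
    """Return the properties text with AnalyticsActivation set to true."""
    # Match an existing AnalyticsActivation line, whatever its current value.
    pattern = re.compile(r"^[ \t]*AnalyticsActivation[ \t]*=.*$", re.MULTILINE)
    if pattern.search(properties_text):
        return pattern.sub("AnalyticsActivation=true", properties_text)
    # Parameter not present: append it on its own line at the end.
    needs_newline = bool(properties_text) and not properties_text.endswith("\n")
    suffix = "\n" if needs_newline else ""
    return properties_text + suffix + "AnalyticsActivation=true\n"


# Sample content; OtherParam is a made-up parameter used only for illustration.
sample = "#Analytics\nAnalyticsActivation=false\nOtherParam=1\n"
updated = enable_analytics(sample)
print(updated)
```

Datafari still needs to be restarted afterwards for the change to be taken into account.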
Once the Analytics are enabled, Apache Zeppelin will be started and stopped synchronously with Datafari, and the notebooks that replaced the old dashboards will be accessible through the admin UI of Datafari via the admin menu:
Usage Analysis → Corpus Analysis
Usage Analysis → Queries Analysis
System Analysis → Check problematic files (Enterprise Only)
System Analysis → Logs Analysis (Enterprise Only)
Once you are connected to one of the Apache Zeppelin notebooks, you can navigate through the different notebooks available thanks to the “Notebook” header menu:
...
3. Apache Zeppelin Notebooks trap
Unlike Kibana, Apache Zeppelin does not automatically refresh the notebooks data. Please note that when a user accesses a notebook for the first time ever, no data will be displayed. To refresh the data of a notebook, you MUST do it manually by clicking on the “Run all paragraph” button, which is located next to the notebook name at the top of the notebook:
...
By clicking on that button, all of the notebook data will be refreshed. You must perform this operation each time you open a notebook for the first time and each time you want to refresh the data of a notebook.
You can also refresh one paragraph at a time (a visualization is called a paragraph in Apache Zeppelin) by clicking on the “Run this paragraph” button in the top right corner of the paragraph:
...
4. Filtering the data of a paragraph
Unlike Kibana, in a notebook, you are unable to simply filter data by clicking on a value. Instead, you will need to directly modify the query of the paragraph (a visualization is called a paragraph in Apache Zeppelin) that is displayed above it:
...