Note

Before starting to use Apache Zeppelin notebooks, be sure to read section “3. Apache Zeppelin Notebooks trap”.

Datafari 5.3 has shifted from the Open Distro stack to Apache Zeppelin, as it is much less demanding in terms of resource consumption.

1. How does it work?

Where Open Distro required the analytics data of Datafari to be indexed into Elasticsearch, we now index it into dedicated Solr indexes:

  • Crawl: all logs related to the crawls (Enterprise Edition only)

  • Access: all logs related to connections to the search UI

  • Logs: all component logs (Cassandra, Solr, Tomcat, etc.) (Enterprise Edition only)

  • Statistics: all logs concerning searches performed by users

  • Monitoring: all logs concerning the corpus of documents (number of docs, file types, etc.)
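Because the analytics data now lives in ordinary Solr indexes, it can in principle be inspected with a standard Solr `/select` query. The sketch below is illustrative only: it assumes Solr is reachable at `http://localhost:8983/solr` and uses a hypothetical collection name, `Statistics`. The real collection names, the Solr address, and whether direct Solr access is allowed by your security setup all depend on your Datafari installation.

```python
import json
import urllib.parse
import urllib.request

# Assumptions: adjust both values to your own Datafari installation.
SOLR_BASE_URL = "http://localhost:8983/solr"  # assumed Solr address
COLLECTION = "Statistics"  # hypothetical collection name


def build_count_url(base_url: str, collection: str, query: str = "*:*") -> str:
    """Build a Solr /select URL that returns only the hit count (rows=0)."""
    params = urllib.parse.urlencode({"q": query, "rows": 0, "wt": "json"})
    return f"{base_url.rstrip('/')}/{urllib.parse.quote(collection)}/select?{params}"


def count_documents(base_url: str, collection: str, query: str = "*:*") -> int:
    """Send the query to Solr and return response.numFound."""
    url = build_count_url(base_url, collection, query)
    with urllib.request.urlopen(url, timeout=10) as resp:
        payload = json.load(resp)
    return payload["response"]["numFound"]


if __name__ == "__main__":
    # Prints the URL only; call count_documents() to actually query Solr.
    print(build_count_url(SOLR_BASE_URL, COLLECTION))
```

Setting `rows=0` keeps the response small because Solr then returns the total count without any documents.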

By indexing these logs into Solr instead of Elasticsearch, we cut resource consumption and got rid of the security complexity!

We kept Logstash as the log pusher because Logstash OSS is open source and lightweight, and we already had all of the log parsers configured. We just replaced the Elasticsearch output in the configuration with a Solr output, thanks to a Solr plugin.

The Kibana dashboards have been migrated to Apache Zeppelin, where they are replaced with notebooks.

2. How to access and use Apache Zeppelin in Datafari

By default, Apache Zeppelin is automatically started and stopped whenever Datafari is started and stopped. This does not happen if you disabled it during the installation phase by answering “no” to the question “Do you want to enable analytic stack (yes/no) [yes] ?”, or if you disabled it in the configuration file [DATAFARI_HOME]/tomcat/conf/datafari.properties by setting the parameter “AnalyticsActivation” to false.

If the Analytics are disabled in an already installed Datafari, you can enable them by setting the parameter “AnalyticsActivation” to “true” in the configuration file [DATAFARI_HOME]/tomcat/conf/datafari.properties:

Code Block
#Analytics
AnalyticsActivation=true

Then restart Datafari.
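If you need to apply this change from a script, for example across several servers, the edit can be automated. The sketch below is a hypothetical helper and not a Datafari tool. It handles only plain `key=value` lines, skips comment lines starting with `#` or `!`, and appends the key if it is missing. Datafari still has to be restarted afterwards.

```python
import sys
from pathlib import Path

KEY = "AnalyticsActivation"


def enable_analytics(props_path: Path) -> None:
    """Set AnalyticsActivation=true in a properties file, adding the key if absent.

    Simplified sketch: only "key=value" lines are recognised (no ':' separators
    or multi-line values), and comment lines are left untouched.
    """
    lines = props_path.read_text(encoding="utf-8").splitlines()
    found = False
    for i, line in enumerate(lines):
        stripped = line.strip()
        if stripped.startswith(("#", "!")):
            continue  # comment such as "#Analytics"
        if stripped.split("=", 1)[0].strip() == KEY:
            lines[i] = f"{KEY}=true"
            found = True
    if not found:
        lines.append(f"{KEY}=true")
    props_path.write_text("\n".join(lines) + "\n", encoding="utf-8")


if __name__ == "__main__":
    # Usage: python enable_analytics.py [DATAFARI_HOME]/tomcat/conf/datafari.properties
    enable_analytics(Path(sys.argv[1]))
```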

Once the Analytics are enabled, Apache Zeppelin will be started and stopped together with Datafari. The notebooks that replaced the old dashboards will then be accessible in the Datafari admin UI through the following admin menu sections:

  • Usage Analysis → Corpus Analysis

  • Usage Analysis → Queries Analysis

  • System Analysis → Check problematic files (Enterprise Edition only)

  • System Analysis → Logs Analysis (Enterprise Edition only)

Once you have opened one of the Apache Zeppelin notebooks, you can navigate between the available notebooks using the “Notebook” header menu:

...

3. Apache Zeppelin Notebooks trap

Unlike Kibana, Apache Zeppelin does not automatically refresh notebook data! Please note that no data will be displayed when a user accesses a notebook for the first time. To refresh the data of a notebook, you MUST do it manually by clicking the “Run all paragraphs” button, located next to the notebook name at the top of the notebook:

...

Clicking that button refreshes all of the notebook data. You must perform this operation the first time you open a notebook and each time you want to refresh its data!

You can also refresh one paragraph at a time (a visualization is called a paragraph in Apache Zeppelin) by clicking on the “Run this paragraph” button in the top right corner of the paragraph:

...
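If you need to trigger these refreshes from a script, Apache Zeppelin documents a REST API. `POST /api/notebook/job/{noteId}` runs all paragraphs of a note, and `POST /api/notebook/job/{noteId}/{paragraphId}` runs a single paragraph. The sketch below rests on several assumptions. It assumes Zeppelin is reachable at `http://localhost:8080` with no authentication. Whether Datafari exposes this API, on which address and port, and whether a login is required must all be checked on your installation. The note ID appears in the notebook URL, and `NOTE_ID`/`PARAGRAPH_ID` are placeholders.

```python
import sys
import urllib.request
from typing import Optional

# Assumption: Zeppelin address in your Datafari installation may differ.
ZEPPELIN_URL = "http://localhost:8080"


def build_run_request(base_url: str, note_id: str,
                      paragraph_id: Optional[str] = None) -> urllib.request.Request:
    """Build a POST request that runs every paragraph of a note,
    or only one paragraph when paragraph_id is given."""
    path = f"/api/notebook/job/{note_id}"
    if paragraph_id is not None:
        path += f"/{paragraph_id}"
    return urllib.request.Request(base_url.rstrip("/") + path, data=b"", method="POST")


def run_note(base_url: str, note_id: str, paragraph_id: Optional[str] = None) -> int:
    """Send the request and return the HTTP status.

    urlopen raises HTTPError on non-2xx responses, so a returned value means success.
    """
    request = build_run_request(base_url, note_id, paragraph_id)
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.status


if __name__ == "__main__":
    # Usage: python run_note.py NOTE_ID [PARAGRAPH_ID]
    note = sys.argv[1]
    paragraph = sys.argv[2] if len(sys.argv) > 2 else None
    print(run_note(ZEPPELIN_URL, note, paragraph))
```

These calls only start the run. Refreshed data appears in the notebook once its paragraphs have finished executing.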