Analytic Stack

Valid from Datafari 6.0

Datafari 6.0 replaced Apache Zeppelin with a homegrown React-based system, which is much less demanding in terms of resource consumption.

Starting from Datafari 6.1, and only if you run Datafari in multiserver mode (Enterprise Edition), the Solr indexes for analytics data are hosted in a dedicated Solr instance located on the main server. As a result, the Solr search servers are only used to index data from the sources that need to be indexed: analytics data causes no drop in performance (see https://datafari.atlassian.net/wiki/spaces/DATAFARI/pages/559415376 for the architecture diagram).

1. How does it work?

We index analytics data in dedicated Solr indexes:

  • Crawl: all logs related to the crawls (Enterprise Edition only)

  • Logs: all component logs (Cassandra, Solr, Tomcat, etc.) (Enterprise Edition only)

  • Access: all logs related to connections to the search UI

  • Statistics: all logs concerning searches performed by users

  • Monitoring: all logs concerning the corpus of documents (number of docs, file types, etc.)

By indexing these logs into Solr, which Datafari already runs, we reduce both resource consumption and the amount of technical management required.
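Because the analytics data lives in ordinary Solr indexes, it can in principle be queried like any other Solr index. The following is a minimal sketch in Python that builds a standard Solr `/select` URL for one of the indexes listed above. The base URL `http://localhost:8983/solr` and the helper name `build_select_url` are assumptions made for illustration, and the actual index names, host and port in a given Datafari installation may differ.

```python
from urllib.parse import urlencode

# Assumption: index names as listed above; the actual Solr index
# names in a Datafari installation may differ.
ANALYTICS_INDEXES = ("Crawl", "Logs", "Access", "Statistics", "Monitoring")


def build_select_url(base_url, index, query="*:*", rows=10):
    """Build a standard Solr /select URL for one analytics index."""
    if index not in ANALYTICS_INDEXES:
        raise ValueError(f"unknown analytics index: {index}")
    params = urlencode({"q": query, "rows": rows, "wt": "json"})
    return f"{base_url.rstrip('/')}/{index}/select?{params}"


# Hypothetical base URL; replace it with the address of your analytics Solr.
url = build_select_url("http://localhost:8983/solr", "Statistics", rows=5)
print(url)
```

The resulting URL can be opened with any HTTP client that is allowed to reach the Solr instance.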

We kept Logstash as the log pusher because Logstash OSS is open source and lightweight, and because we already had all of the log parsers configured. We simply replaced the Elasticsearch output in the configuration with a Solr output, using a Solr plugin.
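To make the role of a Solr output more concrete: conceptually, it turns each parsed log event into a document and posts it to the update handler of the matching index. The sketch below is not the Logstash plugin's code. It only illustrates the idea in Python using Solr's JSON `/update` handler, and the base URL, the helper name `build_update_request` and the event field names are all hypothetical.

```python
import json
from urllib.request import Request


def build_update_request(base_url, index, events):
    """Build (without sending) a POST to Solr's JSON /update handler."""
    body = json.dumps(events).encode("utf-8")
    return Request(
        url=f"{base_url.rstrip('/')}/{index}/update?commit=true",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical parsed log event; the field names are illustrative only.
event = {"id": "access-0001", "timestamp": "2024-03-13T09:17:17Z", "user": "jdoe"}
req = build_update_request("http://localhost:8983/solr", "Access", [event])
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would actually send the documents; the sketch stops before that step so that it can be run without a Solr server.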

The dashboards have been migrated from Apache Zeppelin by rebuilding them as React widgets.

2. How to access and use the analytics Dashboards in Datafari

The dashboards are accessible through the admin UI of Datafari via the admin menu:

  • Usage Analysis → Corpus Analysis

  • Usage Analysis → Queries Analysis

  • System Analysis → Check problematic files (Enterprise Edition only)

  • System Analysis → Logs Analysis (Enterprise Edition only)

Once you have opened the analytics dashboards, you can navigate between the available dashboards using the upper-right menu.

3. Modifying the time windows and refresh rates

From the upper-right menu, you can modify the time window that applies to all of the displayed widgets, as well as the refresh rate.

For each widget that displays a “Change gap” dropdown, you can also modify the time granularity of that widget.
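One way to picture the relationship between the time window and the gap is through Solr range faceting, where the window sets the start and end of the range and the gap sets the size of each bucket. How the widgets actually query Solr is not described here, so the following Python sketch is only an illustration under that assumption. The field name `timestamp` and the helper name `range_facet_params` are hypothetical, while the `facet.range.*` parameters and the date math syntax are standard Solr.

```python
def range_facet_params(field, window, gap):
    """Translate a time window and a gap into Solr range-facet parameters.

    window and gap use Solr date math units, e.g. window="7DAYS", gap="1DAY".
    """
    return {
        "q": "*:*",
        "rows": 0,  # only the facet counts are needed, not the documents
        "facet": "true",
        "facet.range": field,
        "facet.range.start": f"NOW/DAY-{window}",  # window start, rounded to the day
        "facet.range.end": "NOW",
        "facet.range.gap": f"+{gap}",  # bucket size, i.e. the time granularity
    }


# Hypothetical field name; a 7-day window split into 1-day buckets.
params = range_facet_params("timestamp", window="7DAYS", gap="1DAY")
for name, value in params.items():
    print(f"{name}={value}")
```

With these values, widening the window changes only `facet.range.start`, while changing the gap to `1HOUR` would produce hourly buckets over the same window.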