Custom Analytics Stats

Valid from Datafari v5.3 upwards

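By default, Datafari generates two kinds of logs that feed the Analytic stack, which uses them to build visualization paragraphs in Apache Zeppelin notebooks:

  • Core Monitoring logs: daily snapshots of the indexed corpus (number of documents, file types, etc.)

  • Statistics logs: user actions in the search view (searches, clicks on results, pages and facets)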

With a default installation of Datafari, the visualization paragraphs you can create are limited to the data provided by these two kinds of logs.

To add more data to the Analytic stack, you therefore need either to modify these logs or to create your own.

1. Modify the existing logs

The Core Monitoring logs are generated by the com.francelabs.datafari.monitoring.IndexMonitoring class. This class runs a scheduled thread that queries the Solr index once a day with facet queries and formats the results into log entries. This is therefore the class in which to add your own facet queries, and to adapt the log format if needed, when you want to add more data.
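To make the two steps concrete (a facet query, then one log entry per facet value), here is a minimal sketch written in JavaScript for brevity; the real class is Java. It assumes a Solr JSON response with the default flat `facet_fields` layout (`[value, count, value, count, ...]`). The `extension` field, the `buildFacetParams` and `facetCountsToLogLines` helpers and the pipe-delimited line layout are hypothetical and do not reproduce Datafari's actual Core Monitoring format.

```javascript
// Hypothetical sketch: not Datafari's IndexMonitoring code.

// Build the parameters of a facet-only query: rows=0 returns no documents,
// only the facet counts for the requested field.
function buildFacetParams(field) {
  const params = new URLSearchParams({
    q: "*:*",
    rows: "0",
    facet: "true",
    "facet.field": field,
    "facet.mincount": "1", // skip values with a zero count
  });
  return params.toString();
}

// Convert Solr's flat facet list ["pdf", 12, "docx", 5] into one log line per value.
// The "date|field|value|count" layout is a hypothetical format.
function facetCountsToLogLines(date, field, solrResponse) {
  const flat = solrResponse.facet_counts.facet_fields[field] || [];
  const lines = [];
  for (let i = 0; i + 1 < flat.length; i += 2) {
    lines.push(`${date}|${field}|${flat[i]}|${flat[i + 1]}`);
  }
  return lines;
}

// Example with a hand-written response, so no Solr server is needed.
const response = {
  facet_counts: { facet_fields: { extension: ["pdf", 12, "docx", 5] } },
};
console.log(buildFacetParams("extension"));
console.log(facetCountsToLogLines("2024-05-01", "extension", response));
```

Keeping one value per line makes each entry an independent event once it reaches Logstash.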

The Statistics logs are generated by the com.francelabs.datafari.statistics.StatsPusher class, which is called whenever a user performs an action in the search view (a search, or a click on a result, a page or a facet). This class contains two main methods:

  • pushDocument: this method is called by the com.francelabs.datafari.servlets.URL servlet, which is triggered when a user clicks on a result in the search view (in both the AjaxFranceLabs UI and DatafariUI)

  • pushQuery: this method is called by com.francelabs.datafari.api.SearchAPI, which is triggered when a user performs a search in the search view (in both the AjaxFranceLabs UI and DatafariUI)

In each method, the com.francelabs.datafari.statistics.StatsUtils class is used to format and produce the logs.
So you can either extend the existing logs with more data or produce your own logs from the StatsPusher class.
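As an illustration of adding data to a statistics entry, the sketch below builds one log line per search from values found in a Solr query response. It is only an assumption-laden outline in JavaScript: the real entries are built by StatsUtils in Java, and the `formatQueryLog` helper, the JSON-lines layout and the extra `author` field (which only appears if it is requested in the query field list) are hypothetical.

```javascript
// Hypothetical sketch: the real statistics entries are built by StatsUtils (Java).

// Build a single-line JSON log entry for one search.
function formatQueryLog(event, solrResponse) {
  const docs = solrResponse.response.docs;
  const entry = {
    timestamp: event.timestamp,
    user: event.user,
    query: event.query,
    numFound: solrResponse.response.numFound,
    qtime: solrResponse.responseHeader.QTime,
    // Extra data: authors of the returned documents that have one.
    // "author" is a hypothetical field, present only if it is in "fl".
    authors: docs.map((d) => d.author).filter((a) => a !== undefined),
  };
  // JSON.stringify escapes quotes and newlines, so a query containing
  // special characters cannot break the one-entry-per-line layout.
  return JSON.stringify(entry);
}

const line = formatQueryLog(
  { timestamp: "2024-05-01T10:00:00Z", user: "alice", query: 'annual "report"' },
  {
    responseHeader: { QTime: 7 },
    response: { numFound: 2, docs: [{ id: "1", author: "Bob" }, { id: "2" }] },
  }
);
console.log(line);
```

Whatever format you choose, every field you add here must later be handled by the Logstash filter and declared in the Solr schema of the Statistics index.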

To add more data (i.e. Solr fields) to the logs generated from the com.francelabs.datafari.statistics.StatsPusher class or com.francelabs.datafari.api.SearchAPI, you need to add the wanted fields to the field list (fl) parameter of the queries. This can be done in DATAFARI_HOME/tomcat/webapps/Datafari/js/main.js for the AjaxFranceLabs UI, or in the ui-config.json file for DatafariUI (see Customizing DatafariUI for more details).

For the AjaxFranceLabs UI, add the desired fields to the existing list in DATAFARI_HOME/tomcat/webapps/Datafari/js/main.js:

Manager.store.addByValue("fl", 'title,url,id,extension,preview_content');

Once this modification is done, the values of those fields are returned in the query responses, so you can add them to the logs.

For DatafariUI, refer to the “Result list” section of the Customizing DatafariUI documentation.
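Once a field is requested, it does not always come back in the same shape: a single-valued field is returned as a scalar, a multiValued field as an array, and a field that a document lacks is simply absent. A minimal sketch of normalizing those cases before writing them to a log, assuming a hypothetical helper `fieldValueForLog` and hypothetical `author` (multiValued) and `last_modified` fields:

```javascript
// Hypothetical helper: normalize a Solr document field into a loggable string.
function fieldValueForLog(doc, field, separator = ";") {
  const value = doc[field];
  if (value === undefined || value === null) {
    return ""; // field not returned for this document
  }
  if (Array.isArray(value)) {
    return value.join(separator); // multiValued field
  }
  return String(value);
}

// A hand-written document as it could appear in a query response.
const doc = {
  id: "file://server/share/report.pdf",
  title: ["Annual report"],
  author: ["Bob", "Carol"],
  last_modified: "2024-04-30T08:15:00Z",
};
for (const field of ["title", "author", "last_modified", "language"]) {
  console.log(field, "=>", fieldValueForLog(doc, field));
}
```

Choose a separator that cannot clash with the delimiter of your log format, otherwise the Logstash filter will split the value.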

 

2. Create your own logs

If the logs you want to create are meant to be generated on a user action in the search view, we strongly recommend generating them either from the com.francelabs.datafari.servlets.URL servlet (when a user clicks on a result) or from com.francelabs.datafari.api.SearchAPI (when a user performs a search).
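Writing your own entries to a dedicated file keeps them separate from the default logs and gives Logstash a distinct input to read. In Datafari this code would live in the Java servlet or SearchAPI; the JavaScript sketch below only illustrates the principle, and the `appendClickLog` helper, the file name and the entry fields are all hypothetical.

```javascript
// Hypothetical sketch: append one JSON entry per line to a dedicated log file.
const fs = require("fs");
const os = require("os");
const path = require("path");

function appendClickLog(logFile, click) {
  const entry = {
    timestamp: new Date().toISOString(),
    user: click.user,
    url: click.url,
    position: click.position, // rank of the clicked result
    queryId: click.queryId, // lets you join the click to its search
  };
  // appendFileSync creates the file if needed; the trailing newline keeps
  // one entry per line, which is what a line-oriented Logstash input expects.
  fs.appendFileSync(logFile, JSON.stringify(entry) + "\n", "utf8");
}

const logFile = path.join(os.tmpdir(), "custom-click-stats.log");
appendClickLog(logFile, {
  user: "alice",
  url: "http://intranet/doc.pdf",
  position: 3,
  queryId: "q-42",
});
```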

If the logs you want to create are meant to extract information (fields, etc.) from a Solr index at a defined frequency, we strongly suggest following the example of the com.francelabs.datafari.monitoring.IndexMonitoring and com.francelabs.datafari.initializers.IndexMonitoringInitializer classes.
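The core of a periodic extraction is computing the delay until the first run, then repeating at a fixed period. In Java this maps naturally onto `ScheduledExecutorService.scheduleAtFixedRate(command, initialDelay, period, unit)`; the sketch below shows the same idea in JavaScript, with hypothetical helpers `msUntilNextRun` and `scheduleDaily`. It is not a description of how IndexMonitoringInitializer is implemented.

```javascript
// Hypothetical sketch: run a task once a day at a fixed local hour.
const DAY_MS = 24 * 60 * 60 * 1000;

// Milliseconds from `now` until the next occurrence of hour:00 local time.
function msUntilNextRun(now, hour) {
  const next = new Date(now.getTime());
  next.setHours(hour, 0, 0, 0);
  if (next.getTime() <= now.getTime()) {
    next.setDate(next.getDate() + 1); // already past today: run tomorrow
  }
  return next.getTime() - now.getTime();
}

// Wait for the first run, then repeat every 24 hours.
function scheduleDaily(hour, task) {
  return setTimeout(() => {
    task();
    setInterval(task, DAY_MS);
  }, msUntilNextRun(new Date(), hour));
}

// Usage (not executed here, since it keeps the process alive):
// scheduleDaily(2, () => console.log("extract data and write log entries"));
```

A fixed 24-hour period drifts by one hour across daylight-saving changes; recomputing the delay after each run avoids that.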

 

3. Update the Logstash and Solr configuration

Whichever option you choose (creating new logs or adding data to existing ones), you will need to modify the Logstash configuration so that the logs are correctly indexed into Solr and then available in Apache Zeppelin notebooks.

In DATAFARI_HOME/analytic-stack/logstash/, edit the logstash-datafari.conf file. It defines where the log files are read from (input section), how they are parsed (filter section) and where the resulting events are sent (output section). Refer to the official Logstash documentation to understand how to update this configuration.
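The filter section is where each raw line becomes a structured event. Before writing it, it can help to pin down the mapping from line to fields in plain code. The JavaScript sketch below is not Logstash syntax: it mimics what a dissect-style filter would do for a hypothetical pipe-delimited line `date|field|value|count`, with a hypothetical `parseMonitoringLine` helper.

```javascript
// Hypothetical sketch: the mapping a filter must perform for a pipe-delimited line.
const FIELD_NAMES = ["date", "field", "value", "count"]; // hypothetical layout

function parseMonitoringLine(line) {
  const parts = line.trim().split("|");
  if (parts.length !== FIELD_NAMES.length) {
    // Logstash's dissect and grok filters tag such events
    // (_dissectfailure, _grokparsefailure); here we simply reject them.
    return null;
  }
  const event = {};
  FIELD_NAMES.forEach((name, i) => {
    event[name] = parts[i];
  });
  event.count = Number.parseInt(event.count, 10); // type conversion, as a filter would do
  return event;
}

console.log(parseMonitoringLine("2024-05-01|extension|pdf|12"));
console.log(parseMonitoringLine("malformed|line"));
```

The field names produced at this stage are exactly the names that the Solr schema must accept.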

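Once the Logstash configuration is updated, you also need to update the schema of the Solr index corresponding to the kind of log you have created or modified, so that the log entries are correctly indexed and not rejected: Solr rejects documents containing fields that are not declared in its schema (unless a dynamic field matches them).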

Two analytics indexes may be affected by your changes:

  • Statistics: all logs concerning searches performed by users

  • Monitoring: all logs concerning the corpus of documents (number of docs, file types, etc)

Depending on your needs, update the schema of the index that matches the logs you changed.
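If the analytics collection uses a managed schema, new fields can be declared through the Solr Schema API with an `add-field` command posted to `/solr/<collection>/schema`; otherwise, edit the schema file of its configset. The sketch below only builds the JSON command, with a hypothetical `buildAddFieldCommand` helper and hypothetical field names. The `string` and `pint` types exist in Solr's default configset, but the analytics schemas may define different types, so check them first.

```javascript
// Hypothetical sketch: build a Solr Schema API "add-field" command for new log fields.
function buildAddFieldCommand(fields) {
  return {
    // An array lets a single request declare several fields.
    "add-field": fields.map(({ name, type, multiValued = false }) => ({
      name,
      type,
      stored: true,
      indexed: true,
      multiValued,
    })),
  };
}

const command = buildAddFieldCommand([
  { name: "authors", type: "string", multiValued: true },
  { name: "qtime", type: "pint" },
]);
// Body to send with: POST /solr/<collection>/schema (Content-Type: application/json)
console.log(JSON.stringify(command, null, 2));
```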