[DEPRECATED] Content Analysis

Deprecated

This documentation is deprecated as of Datafari version 5.2. Kibana has been replaced by Apache Zeppelin and the new “dashboards” are presented in the following documentation: https://datafari.atlassian.net/wiki/spaces/DATAFARI/pages/2736095237

 

Valid from 5.0

The documentation below is valid from Datafari v5.0 onwards.

This dashboard is based on the Core Monitoring logs generated by Datafari, which are inserted by Logstash in Elasticsearch under the "monitoring/logs" index.

It shows how documents are distributed in the core.
There is one pie chart per distribution type. By default, the types are document language, document type and document source, and each pie is based on the corresponding facet: language, extension and source. You can create more charts based on other facets once you are familiar with how Kibana and our ELK setup work.

The most important setting on this dashboard is the selected time frame. It is shown at the top right of Kibana and is set to "Today" by default.
This default is deliberate. Kibana works with time events, and by default we use one time event per day, which means there is only one Elasticsearch document per facet value per day. These "daily documents" are updated every hour by Datafari, when a monitoring iteration happens, so the data displayed in Kibana stays up to date. It also means that if you change the time frame of this dashboard to anything other than "Today", the data used by the charts will be wrong. The time frame must always cover exactly one time event: the current day if the time event is daily, the current hour if it is hourly, and the current minute if it is per minute.
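
The "one document per facet value per time event" idea can be illustrated with a minimal sketch. This is not Datafari's actual code: the key layout, the `bucket_key` and `record` helpers and the in-memory `index` dictionary are all hypothetical, and only show how truncating the timestamp to the time event makes each hourly iteration overwrite the same document.

```python
from datetime import datetime

# Hypothetical truncation formats, one per time-event granularity.
BUCKET_FORMATS = {
    "daily": "%Y-%m-%d",
    "hourly": "%Y-%m-%dT%H",
    "minutely": "%Y-%m-%dT%H:%M",
}


def bucket_key(facet, value, when, granularity="daily"):
    """Build a deterministic ID for one facet value within one time event."""
    return f"{facet}|{value}|{when.strftime(BUCKET_FORMATS[granularity])}"


def record(index, facet, value, count, when, granularity="daily"):
    """Upsert: a later iteration in the same time event replaces the earlier one."""
    index[bucket_key(facet, value, when, granularity)] = count


# A plain dict stands in for the Elasticsearch index (assumption for illustration).
index = {}
record(index, "language", "en", 120, datetime(2020, 3, 1, 9, 0))
record(index, "language", "en", 135, datetime(2020, 3, 1, 10, 0))  # same day: overwrites
record(index, "language", "en", 150, datetime(2020, 3, 2, 9, 0))   # next day: new document
```

With a daily time event, the two iterations of the first day share one key, so the index ends up with one document per day rather than one per iteration.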

For the moment, the Kibana time event cannot be configured from the Datafari admin UI; it can only be changed in the code (see ELK for a better understanding).


Valid from 3.0 up to 4.x

The documentation below is valid from Datafari v3.0 up to v4.x.

This dashboard is based on the Core Monitoring logs generated by Datafari, which are inserted by Logstash in Elasticsearch under the "monitoring/logs" index.

It shows how documents are distributed in the core.
There is one pie chart per distribution type. By default, the types are document language, document type and document source, and each pie is based on the corresponding facet: language, extension and source. You can create more charts based on other facets once you are familiar with how Kibana and our ELK setup work.

The most important setting on this dashboard is the selected time frame. It is shown at the top right of Kibana and is set to "Today" by default.
This default is deliberate. Kibana works with time events, and by default we use one time event per day, which means there is only one Elasticsearch document per facet value per day. These "daily documents" are updated every hour by Datafari, when a monitoring iteration happens, so the data displayed in Kibana stays up to date. It also means that if you change the time frame of this dashboard to anything other than "Today", the data used by the charts will be wrong. The time frame must always cover exactly one time event: the current day if the time event is daily, the current hour if it is hourly, and the current minute if it is per minute.
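
To see why a wider time frame gives wrong data, consider the sketch below. It assumes, purely for illustration, that a chart adds up the counts of every document found in the selected time frame; the `daily_docs` values and the `total_in_frame` helper are hypothetical and do not come from Datafari.

```python
from datetime import date

# Hypothetical daily documents: (day, facet value, number of documents in the core).
daily_docs = [
    (date(2020, 3, 1), "pdf", 1000),
    (date(2020, 3, 2), "pdf", 1010),
    (date(2020, 3, 3), "pdf", 1025),
]


def total_in_frame(docs, start, end):
    """Sum the counts of every daily document whose day falls inside the time frame."""
    return sum(count for day, _, count in docs if start <= day <= end)


today = date(2020, 3, 3)
only_today = total_in_frame(daily_docs, today, today)
last_three_days = total_in_frame(daily_docs, date(2020, 3, 1), today)
```

Under that assumption, `only_today` reflects the current state of the core, while `last_three_days` adds three snapshots of the same documents together and overstates the total roughly threefold.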

For the moment, the Kibana time event cannot be configured from the Datafari admin UI; it can only be changed in the code (see ELK for a better understanding).