Datafari generates a statistic log each time a user performs a query or clicks on a facet, a result or a page.
Here is what a statistic log looks like:
2015-11-05 15:24:45 STAT StatsPusher:95 - 508e9b3e-fdb0-4831-8fab-2acad81f1cb5|2015-11-05T14:21:29.740+0100|engine|0|4|1|7|2|1|[engine//////4///7///1//////, engine///(extension:docx )///2///6///1//////, ///////////////file:/home/youp/Downloads/doc/Alertes.docx///2]|||file:/home/youp/Downloads/doc/Alertes.docx
It follows this format:
[log4j_timestamp] [log_level] [logger_class]:[line_in_class] - [query_id] | [query_timestamp] | [query] | [noHits] | [numFound] | [numClicks] | [QTime] | [positionClickTot] | [click] | [history] | [spell] | [suggest] | [url]
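As an illustration, a line in this format can be split into its named fields with a few lines of Python. This is a minimal sketch, not part of Datafari: the names parse_stat_line, FIELDS, HEADER and SAMPLE are hypothetical, every value is kept as a string, and it assumes that no field (the query in particular) contains a literal '|'.

```python
import re

# Field names after the " - " separator, in the order of the documented format.
FIELDS = [
    "query_id", "query_timestamp", "query", "noHits", "numFound",
    "numClicks", "QTime", "positionClickTot", "click",
    "history", "spell", "suggest", "url",
]

# Assumed log4j prefix: "<date> <time> <level> <class>:<line> - <payload>"
HEADER = re.compile(
    r"^(?P<log4j_timestamp>\S+ \S+) (?P<log_level>\S+) "
    r"(?P<logger_class>[^:\s]+):(?P<line_in_class>\d+) - (?P<payload>.*)$"
)

# The example log line from this page.
SAMPLE = (
    "2015-11-05 15:24:45 STAT StatsPusher:95 - "
    "508e9b3e-fdb0-4831-8fab-2acad81f1cb5|2015-11-05T14:21:29.740+0100|"
    "engine|0|4|1|7|2|1|"
    "[engine//////4///7///1//////, engine///(extension:docx )///2///6///1//////, "
    "///////////////file:/home/youp/Downloads/doc/Alertes.docx///2]"
    "|||file:/home/youp/Downloads/doc/Alertes.docx"
)


def parse_stat_line(line: str) -> dict:
    """Split one statistic log line into named fields (values stay strings)."""
    match = HEADER.match(line.rstrip("\n"))
    if match is None:
        raise ValueError("not a statistic log line")
    record = match.groupdict()
    # Assumption: no field (e.g. the query) contains a literal '|'.
    values = record.pop("payload").split("|")
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    record.update(zip(FIELDS, values))
    return record


if __name__ == "__main__":
    rec = parse_stat_line(SAMPLE)
    print(rec["log_level"], rec["query"], rec["numFound"], rec["QTime"])
    # prints: STAT engine 4 7
```

Keeping the values as strings leaves the type conversion (for example int(rec["numFound"])) to the caller, so empty fields such as [spell] and [suggest] do not need special handling at parse time.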
Each field is described below:
- [log4j_timestamp] : the timestamp of the log, set by the log4j API
- [log_level] : the log level (STAT in the example above)
- [logger_class] : the name of the class which has generated the log
- [line_in_class] : the line number in the class that has generated the log
- [query_id] : the id of the query. This id is used to keep track of the user behaviour.
For example, if the user searches for the word "engine", a query id is generated. Then, if the user clicks on the facet "Language=>fr" in the query results, the resulting facet "sub-query" keeps the id of the "engine" query but enriches its history.
- [query_timestamp] : the full timestamp of the query, measured by Datafari
- [query] : the literal query performed by the user
- [noHits] : indicates whether the query returned hits. Two values are possible: '0' if the query has at least one hit, '1' if it has none
- [numFound] : the number of documents found for the query
- [numClicks] : the number of clicks on the found documents
- [QTime] : the query time in milliseconds. This value is directly provided by Solr
- [positionClickTot] : represents the sum of the positions of the documents clicked.
For example, if the user clicks on the first result and then on the third, positionClickTot is 1+3=4
- [click] : like [noHits], a boolean value where '0' indicates that the user did not click on any result and '1' indicates that the user clicked on at least one result.
- [history] : keeps track of the user behaviour. It is a list of user "actions": [user_action1, user_action2, ...]
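A user action is formatted as follows: "query"///"facet_used"///"num_doc_found"///"query_time"///"num_of_page"///"url"///"url_position". Fields that do not apply to an action are left empty, which explains the runs of slashes in the example log.
- "query" : the query performed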
- "facet_used" : if the user has clicked on a facet, it is formatted like this : ("facet_type":"facet_value" )
For example, the second user action in the history of the example log contains (extension:docx ), which means that the user clicked on the facet value "docx" of the facet type "extension".
- "num_doc_found" : the number of documents returned by the query
- "query_time" : the query time measured by Solr
- "num_of_page" : the number of the results page the user was on
- "url" : the URL of the clicked document
- "url_position" : the position of the clicked document (first result = 1, second = 2, etc.)
- [spell] : TODO
- [suggest] : TODO
- [url] : the last clicked document URL
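The [history] field can be decoded in the same spirit. The sketch below uses hypothetical helpers (parse_history, parse_facet, EXAMPLE_HISTORY), not Datafari code. It assumes that user actions are separated by ", " and that no query or URL contains ", " or "///"; a URL such as file:///... would break the split, so treat it as a starting point rather than a robust parser.

```python
from __future__ import annotations

# Fields of one user action, in the documented order.
ACTION_FIELDS = [
    "query", "facet_used", "num_doc_found", "query_time",
    "num_of_page", "url", "url_position",
]

# The [history] value of the example log line on this page.
EXAMPLE_HISTORY = (
    "[engine//////4///7///1//////, engine///(extension:docx )///2///6///1//////, "
    "///////////////file:/home/youp/Downloads/doc/Alertes.docx///2]"
)


def parse_history(history: str) -> list[dict]:
    """Turn a [history] value "[action1, action2, ...]" into a list of dicts."""
    inner = history.strip()
    if inner.startswith("[") and inner.endswith("]"):
        inner = inner[1:-1]
    if not inner:
        return []
    actions = []
    # Assumption: actions are separated by ", " and no field contains ", " or "///".
    for raw in inner.split(", "):
        parts = raw.split("///")
        if len(parts) != len(ACTION_FIELDS):
            raise ValueError(f"unexpected user action: {raw!r}")
        actions.append(dict(zip(ACTION_FIELDS, parts)))
    return actions


def parse_facet(facet_used: str) -> tuple[str, str] | None:
    """Return (facet_type, facet_value) from "(type:value )", or None if empty."""
    text = facet_used.strip().strip("()").strip()
    if not text:
        return None
    facet_type, _, facet_value = text.partition(":")
    return facet_type, facet_value


if __name__ == "__main__":
    for action in parse_history(EXAMPLE_HISTORY):
        print(parse_facet(action["facet_used"]), action["url"], action["url_position"])
```

On the example log, the third action has only "url" and "url_position" filled in, which matches the single click recorded in [numClicks] and [url].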
By default, statistic logs are not displayed in the console but are written to dedicated log files.
The log files (path, size, number of files, etc.) are configured in the log4j properties file located at tomcat/lib/log4j.properties.
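Because every statistic log line carries its [query_id], the lines written to these files can be regrouped per query to follow a user's successive actions. The following is a hypothetical Python sketch (group_by_query_id is not a Datafari function). It assumes that the file is read in chronological order and that statistic lines contain the " STAT " level and the " - " separator shown in the example; the file path passed on the command line is a placeholder for whatever your log4j.properties defines.

```python
import sys
from collections import defaultdict
from typing import Dict, Iterable, List


def group_by_query_id(lines: Iterable[str]) -> Dict[str, List[List[str]]]:
    """Group the '|'-separated fields of statistic log lines by [query_id]."""
    groups: Dict[str, List[List[str]]] = defaultdict(list)
    for line in lines:
        # Assumption: statistic lines carry the " STAT " level and the " - "
        # separator shown in the example; any other line is skipped.
        if " STAT " not in line or " - " not in line:
            continue
        payload = line.rstrip("\n").split(" - ", 1)[1]
        fields = payload.split("|")
        groups[fields[0]].append(fields)  # fields[0] is [query_id]
    return dict(groups)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python group_stats.py <path to the statistic log file>
    with open(sys.argv[1], encoding="utf-8") as handle:
        for query_id, records in group_by_query_id(handle).items():
            # Records keep file order; records[-1][2] is the [query] of the
            # last line read for that id.
            print(query_id, len(records), records[-1][2])
```

Grouping on the raw field lists keeps the sketch independent of any parser; each list can then be passed through a field-by-field parser if the named fields are needed.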