Datafari generates a statistic log each time a user performs a query or clicks on a facet, a result, or a page.
Here is what a statistic log looks like:
2015-11-05 15:24:45 STAT StatsPusher:95 - 508e9b3e-fdb0-4831-8fab-2acad81f1cb5|2015-11-05T14:21:29.740+0100|engine|0|4|1|7|2|1|[engine//////4///7///1//////, engine///(extension:docx )///2///6///1//////, ///////////////file:/home/youp/Downloads/doc/Alertes.docx///2]|file:/home/youp/Downloads/doc/Alertes.docx
It follows a specific format:
[log4j_timestamp] [log_level] [logger_class]:[line_in_class] - [query_id] | [query_timestamp] | [query] | [noHits] | [numFound] | [numClicks] | [QTime] | [positionClickTot] | [click] | [history] | [spell] | [suggest] | [url]
Let's explain each field:
- [log4j_timestamp] : the timestamp of the log, set by the log4j API
- [log_level] : literally the log level value
- [logger_class] : the name of the class which has generated the log
- [line_in_class] : the line number in the class that has generated the log
- [query_id] : the id of the query. This id is used to keep track of the user behaviour.
A query id is generated each time the user clicks on the search button or presses the "Enter" key while the focus is on the Datafari search field. However, when the user performs a "sub-query", such as selecting a facet or a page, or even clicking on a result, the "sub-query" keeps the id of the root query.
For example, if the user searches for the word "engine", a query id is generated. Then, if the user clicks on the facet "fr" in the query results, the facet "sub-query" keeps the id of the root "engine" query. Several log lines can therefore share one query id, since one log is generated for each action performed.
- [query_timestamp] : the full timestamp of the query, measured by Datafari
- [query] : the literal query performed by the user
- [noHits] : indicates whether the query has hits. Two values are possible: '0' if the query has hits, '1' if the query does not have any hit
- [numFound] : the number of documents found for the query
- [numClicks] : the number of clicks on the found documents
- [QTime] : the query time in milliseconds. This value is directly provided by Solr
- [positionClickTot] : represents the sum of the positions of the documents clicked.
For example, if the user clicks on the first result and the third, the positionClickTot will be 1+3=4
The positions are absolute and are not based on the page. If there are 10 results per page and the user clicks on the first link of the second page, the position number will be 11.
- [click] : like [noHits], it is a boolean value where '0' indicates that the user did not click on any result, and '1' indicates that the user has clicked on at least one result.
- [history] : keeps track of the user behaviour. It represents a list of user "actions" : [user_action1,user_action2,...]
A user action is formatted as follows: "query"///"facet_used"///"num_doc_found"///"query_time"///"num_of_page"///"url"///"url_position"
- "query" : the query performed
- "facet_used" : if the user has clicked on a facet, it is formatted like this : ("facet_field":"facet_value" )
For example, if you look at the second "user action" in the history of the example log, (extension:docx ), you can deduce that the user has clicked on the facet "docx", which is based on the field named "extension".
- "num_doc_found" : number of docs returned by the query
- "query_time" : the query time measured by Solr
- "num_of_page" : the number of the page the user was on
- "url" : the URL of the clicked document
- "url_position" : the position of the clicked document (first result = 1, second = 2, etc.). Remember that the position is absolute and based on all the available results.
- [url] : the clicked document URL. This value is only set if the log itself represents the click
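The field list above can be turned into a small parser. The following Python sketch is only an illustration and is not part of Datafari: the names StatLog, parse_stat_log and absolute_position are made up for this example. It assumes that neither the query nor the history contains a "|" character, and that [url] is always the last field, so it accepts the example line (which has no [spell] and [suggest] values) as well as the full format. The history is kept as a raw string here. The helper at the end restates the absolute position rule used by [positionClickTot].

```python
from dataclasses import dataclass

# Example line copied from the text above.
EXAMPLE = (
    "2015-11-05 15:24:45 STAT StatsPusher:95 - "
    "508e9b3e-fdb0-4831-8fab-2acad81f1cb5|2015-11-05T14:21:29.740+0100|"
    "engine|0|4|1|7|2|1|"
    "[engine//////4///7///1//////, "
    "engine///(extension:docx )///2///6///1//////, "
    "///////////////file:/home/youp/Downloads/doc/Alertes.docx///2]|"
    "file:/home/youp/Downloads/doc/Alertes.docx"
)


@dataclass
class StatLog:
    """Hypothetical container, one attribute per documented field."""
    log4j_timestamp: str
    log_level: str
    logger_class: str
    line_in_class: int
    query_id: str
    query_timestamp: str
    query: str
    no_hits: bool            # True when the query returned no hit ('1')
    num_found: int
    num_clicks: int
    qtime_ms: int
    position_click_tot: int
    click: bool              # True when at least one result was clicked ('1')
    history: str             # raw "[action1, action2, ...]" string
    url: str                 # empty unless the log represents a click


def parse_stat_log(line: str) -> StatLog:
    # Left of " - ": "[log4j_timestamp] [log_level] [logger_class]:[line_in_class]"
    header, payload = line.rstrip("\n").split(" - ", 1)
    date, time, level, origin = header.split()
    logger_class, line_in_class = origin.rsplit(":", 1)

    # Assumption: no field contains a "|" of its own.
    fields = [f.strip() for f in payload.split("|")]
    if len(fields) < 11:
        raise ValueError(f"expected at least 11 fields, got {len(fields)}")
    (query_id, query_ts, query, no_hits, num_found,
     num_clicks, qtime, pos_tot, click, history) = fields[:10]

    return StatLog(
        log4j_timestamp=f"{date} {time}",
        log_level=level,
        logger_class=logger_class,
        line_in_class=int(line_in_class),
        query_id=query_id,
        query_timestamp=query_ts,
        query=query,
        no_hits=(no_hits == "1"),
        num_found=int(num_found),
        num_clicks=int(num_clicks),
        qtime_ms=int(qtime),
        position_click_tot=int(pos_tot),
        click=(click == "1"),
        history=history,
        url=fields[-1],  # assumption: [url] is the last field
    )


def absolute_position(page: int, rank_on_page: int, results_per_page: int = 10) -> int:
    """Absolute position of a result, counted over all pages (first result = 1)."""
    return (page - 1) * results_per_page + rank_on_page


log = parse_stat_log(EXAMPLE)
print(log.query, log.num_found, log.qtime_ms, log.url)
```

With 10 results per page, absolute_position(2, 1) gives 11, which matches the rule stated for [positionClickTot].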
Now that the basics are settled, let's decode the example log. Here is what it says:
- The log was generated when the user clicked on a document (the [url] field is not empty)
- The original query was "engine", which returned 4 results and took 7 milliseconds
- The user has clicked on one document, whose URL is "file:/home/youp/Downloads/doc/Alertes.docx"
- Relying on the history, we can say that:
- the original query was "engine", returned 4 results and took 7 milliseconds
- then the user clicked on the facet "docx" which is based on the field named "extension". This "sub-query" returned 2 results and took 6 milliseconds
- the last action (which is the subject of the log) is a click on the 2nd result, whose URL is "file:/home/youp/Downloads/doc/Alertes.docx"
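The reading of the history above can be reproduced with a few lines of Python. This is a minimal sketch and not Datafari code: UserAction, parse_history and describe are hypothetical names. It assumes that actions are separated by ", " as in the example log, and that neither queries nor URLs contain ", " or "///".

```python
from typing import List, NamedTuple, Optional

# [history] field copied from the example log.
HISTORY = (
    "[engine//////4///7///1//////, "
    "engine///(extension:docx )///2///6///1//////, "
    "///////////////file:/home/youp/Downloads/doc/Alertes.docx///2]"
)


class UserAction(NamedTuple):
    query: str
    facet_used: str
    num_doc_found: Optional[int]
    query_time: Optional[int]
    num_of_page: Optional[int]
    url: str
    url_position: Optional[int]


def _to_int(value: str) -> Optional[int]:
    # Empty slots in a user action are mapped to None.
    return int(value) if value else None


def parse_history(history: str) -> List[UserAction]:
    body = history.strip()
    if body.startswith("[") and body.endswith("]"):
        body = body[1:-1]
    if not body:
        return []
    actions = []
    for raw in body.split(", "):  # assumption: ", " separates the actions
        parts = raw.split("///")
        if len(parts) != 7:
            raise ValueError(f"expected 7 slots in a user action, got {len(parts)}")
        query, facet, found, qtime, page, url, position = parts
        actions.append(UserAction(
            query, facet.strip(), _to_int(found), _to_int(qtime),
            _to_int(page), url, _to_int(position),
        ))
    return actions


def describe(action: UserAction) -> str:
    """One-line summary of a user action."""
    if action.url:
        return f"click on result #{action.url_position}: {action.url}"
    if action.facet_used:
        return (f"facet {action.facet_used} on '{action.query}': "
                f"{action.num_doc_found} results in {action.query_time} ms")
    return (f"query '{action.query}': "
            f"{action.num_doc_found} results in {action.query_time} ms")


for action in parse_history(HISTORY):
    print(describe(action))
```

Run on the example, this prints one line per action, in the same order as the three points listed above: the root query, the facet "sub-query", then the click.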
By default, these logs are not displayed in the console but are written to specific log files.
The configuration of the log files (path, size, number, etc.) can be set in the log4j properties file located at tomcat/lib/log4j.properties.