Statistics logs

Datafari generates a statistic log each time a user performs a query or clicks on a facet, a result or a page.
Here is what a statistic log looks like:

statistic log
2015-11-05 15:24:45 STAT StatsPusher:95 - 508e9b3e-fdb0-4831-8fab-2acad81f1cb5|2015-11-05T14:21:29.740+0100|engine|0|4|1|7|2|1|[engine//////4///7///1//////, engine///(extension:docx )///2///6///1//////, ///////////////file:/home/youp/Downloads/doc/Alertes.docx///2]|file:/home/youp/Downloads/doc/Alertes.docx

It follows a specific format:
[query_id] | [query_timestamp] | [query] | [noHits] | [numFound] | [numClicks] | [QTime] | [positionClickTot] | [click] | [history] | [url]
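
As a minimal sketch, a log line in this format could be split into named fields as shown below. The function name "parse_stat_line" and the dictionary layout are hypothetical and not part of Datafari. The sketch assumes that the logger prefix looks like the one in the example ("date time LEVEL Class:line - ") and that no field before [url] contains a "|" character.

```python
import re

# Field names in the order given by the format line above.
FIELD_NAMES = [
    "query_id", "query_timestamp", "query", "noHits", "numFound",
    "numClicks", "QTime", "positionClickTot", "click", "history", "url",
]

# Assumption: the logger prefix matches the example line,
# e.g. "2015-11-05 15:24:45 STAT StatsPusher:95 - ".
PREFIX = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} \S+ \S+ - ")


def parse_stat_line(line):
    """Split one statistic log line into a dict of raw string values."""
    payload = PREFIX.sub("", line.strip(), count=1)
    # maxsplit keeps any extra "|" inside the last field ([url]).
    values = payload.split("|", len(FIELD_NAMES) - 1)
    if len(values) != len(FIELD_NAMES):
        raise ValueError("expected %d fields, got %d" % (len(FIELD_NAMES), len(values)))
    return dict(zip(FIELD_NAMES, values))
```

Applied to the example log, the "query" entry is "engine" and the "url" entry is "file:/home/youp/Downloads/doc/Alertes.docx". All values are kept as strings; converting them is left to the caller.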

Let's explain each field:

  • [query_id] : the id of the query. This id is used to keep track of the user behaviour.
    A query id is generated each time the user clicks on the search button, or presses the "Enter" key when the focus is on the search field of Datafari. However, when the user performs a "sub-query" like selecting a facet or a page or even clicking on a result, the "sub-query" keeps the id of the root query.
    For example, if the user searches for the word "engine", a query id is generated. Then, if the user clicks on the facet "fr" in the query results, the facet "sub-query" keeps the id of the root "engine" query. Several log lines can therefore share the same query id, because one log is generated for each action performed.

  • [query_timestamp] : the full timestamp of the query, measured by Datafari

  • [query] : the literal query performed by the user

  • [noHits] : indicates whether the query returned no hits. Two values are possible: '0' if the query has hits, '1' if the query does not have any hits

  • [numFound] : the number of documents found for the query

  • [numClicks] : the number of clicks on the found documents

  • [QTime] : the query time in milliseconds. This value is directly provided by Solr

  • [positionClickTot] : represents the sum of the positions of the documents clicked.
    For example, if the user clicks on the first result and the third, the positionClickTot will be 1+3=4
    The positions are absolute and are not based on the page. If there are 10 results per page and the user clicks on the first link of the second page, the position number will be 11

  • [click] : like [noHits], it is a boolean value where '0' indicates that the user did not click on any result, and '1' indicates that the user has clicked on at least one result.

  • [history] : keeps track of the user behaviour. It represents a list of user "actions" : [user_action1,user_action2,...]
    A user action is formatted as follows: "query"///"facet_used"///"num_doc_found"///"query_time"///"num_of_page"///"url"///"url_position"

    • "query" : the query performed

    • "facet_used" : filled if the user has clicked on a facet, it is formatted like this : ("facet_field":"facet_value" )
      For example, the second "user action" in the history of the example log contains (extension:docx ), from which you can deduce that the user has clicked on the facet "docx", which is based on the field named "extension".

    • "num_doc_found" : number of docs returned by the query

    • "query_time" : the query time measured by Solr

    • "num_of_page" : the number of the page the user was on

    • "url" : the URL of the clicked document

    • "url_position" : the position of the clicked document (first result = 1, second = 2, etc.). Remember that the position is absolute and based on all the available results.

    When the user clicks on a document, only two values of the history are set : the "url" and the "url_position"

  • [url] : the clicked document URL. This value is only set if the log itself represents the click
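
The [history] field can be broken down in the same spirit. The following is only a sketch: the names "parse_history" and "clicked_positions" are hypothetical, and the code assumes that user actions are separated by commas and that neither queries nor URLs contain a comma or the "///" separator (a URL such as "file:///..." would break this simple split).

```python
# Keys of one user action, in the order described above.
ACTION_KEYS = [
    "query", "facet_used", "num_doc_found", "query_time",
    "num_of_page", "url", "url_position",
]


def parse_history(history):
    """Turn the [history] field into a list of user-action dicts (string values)."""
    text = history.strip()
    if text.startswith("[") and text.endswith("]"):
        text = text[1:-1]
    actions = []
    for raw in text.split(","):
        raw = raw.lstrip()  # the example log puts a space after each comma
        if not raw:
            continue
        parts = raw.split("///")
        if len(parts) != len(ACTION_KEYS):
            raise ValueError("unexpected user action: %r" % raw)
        actions.append(dict(zip(ACTION_KEYS, parts)))
    return actions


def clicked_positions(actions):
    """Return the absolute positions of the actions that are document clicks."""
    return [int(a["url_position"]) for a in actions if a["url"]]
```

For the example log, the history yields three user actions, and the only click action has position 2, which matches the [positionClickTot] value of that log.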

Now that the basics are settled, let's decode the example log. Here is what it says :

  • The log was generated by a user click on a document (the [url] field is not empty)

  • The original query was "engine", which returned 4 results and took 7 milliseconds

  • The user has clicked on one document whose URL is "file:/home/youp/Downloads/doc/Alertes.docx"

  • Relying on the history, we can say that :

    • the original query was "engine", returned 4 results and took 7 milliseconds

    • then the user clicked on the facet "docx" which is based on the field named "extension". This "sub-query" returned 2 results and took 6 milliseconds

    • the last action (which is the subject of this log) is a click on the 2nd result, whose URL is "file:/home/youp/Downloads/doc/Alertes.docx"
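
The reading of the scalar fields above can also be expressed in code. This is a sketch under assumptions: the function name "describe" and its output keys are hypothetical, the input is a dict of raw string values keyed by the field names of the format, and the timestamp is assumed to always look like the one in the example (milliseconds followed by a "+hhmm" offset).

```python
from datetime import datetime


def describe(fields):
    """Interpret the scalar fields of one statistic log (all inputs are strings)."""
    return {
        # Assumed layout, taken from the example: 2015-11-05T14:21:29.740+0100
        "timestamp": datetime.strptime(fields["query_timestamp"], "%Y-%m-%dT%H:%M:%S.%f%z"),
        "query": fields["query"],
        "has_hits": fields["noHits"] == "0",      # '0' means the query has hits
        "num_found": int(fields["numFound"]),
        "num_clicks": int(fields["numClicks"]),
        "qtime_ms": int(fields["QTime"]),
        "position_click_total": int(fields["positionClickTot"]),
        "has_click": fields["click"] == "1",      # '1' means at least one click
        "is_click_log": fields["url"] != "",      # [url] is only set on a click log
    }


# Values copied from the example log.
example = {
    "query_timestamp": "2015-11-05T14:21:29.740+0100",
    "query": "engine",
    "noHits": "0",
    "numFound": "4",
    "numClicks": "1",
    "QTime": "7",
    "positionClickTot": "2",
    "click": "1",
    "url": "file:/home/youp/Downloads/doc/Alertes.docx",
}
summary = describe(example)
```

Here "summary" reports a query with hits, 4 documents found, a query time of 7 milliseconds, and a log generated by a click.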

By default, these logs are not displayed in the console but are written to specific log files.
The configuration of the log files (path, size, number, etc.) can be set in the log4j properties file located at tomcat/lib/log4j.properties.