Deduplication has been straightforward to set up since Solr 1.4, because it can be enabled directly in Solr's configuration files. Datafari builds on this functionality and exposes it in the front end.

For the front-end, we want to expose duplicates using a facet:

For this, we created a new class called FacetDuplicates, located in /datafari/WebContent/js/AjaxFranceLabs/widgets/. This class inherits from TableWidget and overrides the update method. The override is needed because the duplication facet returns only the hashes, whereas we want to display the name of a document from each set of duplicated documents. We therefore send one GET query for every hash in the facet. We have also set a mincount on the facet so that it lists only duplicated file names, which is not the case when mincount is 0 (the default). A short description of the mincount parameter is available at https://cwiki.apache.org/confluence/display/solr/Faceting. Finally, we hide the facet when it contains no duplicated document: a simple $('#facet_signature').show/hide does the trick.
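
The logic described above can be sketched as a few plain JavaScript helpers. This is only an illustration, not the actual FacetDuplicates code: the field name `signature`, the mincount value of 2, and all function names are assumptions made for the example. It also assumes Solr's default JSON facet layout, in which facet values and counts alternate in a flat array.

```javascript
// Hypothetical field name holding the deduplication hash (assumption).
const SIGNATURE_FIELD = "signature";

// Build the query string for the facet request. A mincount of 2 (assumed
// value) keeps only hashes shared by at least two documents.
function buildFacetParams(minCount) {
  const params = new URLSearchParams();
  params.set("q", "*:*");
  params.set("rows", "0");
  params.set("wt", "json");
  params.set("facet", "true");
  params.set("facet.field", SIGNATURE_FIELD);
  params.set("facet.mincount", String(minCount));
  return params.toString();
}

// Convert Solr's flat facet array [value, count, value, count, ...]
// into a list of { hash, count } objects.
function parseFacetCounts(flat) {
  const entries = [];
  for (let i = 0; i + 1 < flat.length; i += 2) {
    entries.push({ hash: flat[i], count: flat[i + 1] });
  }
  return entries;
}

// Keep only the hashes that occur on two or more documents.
function duplicatedHashes(entries) {
  return entries.filter((e) => e.count >= 2).map((e) => e.hash);
}

// Build the query string of the per-hash GET query that fetches one
// document carrying this hash, so that its name can be displayed.
// Assumes the hash contains no double quote (hex digests do not).
function buildLookupParams(hash) {
  const params = new URLSearchParams();
  params.set("q", SIGNATURE_FIELD + ':"' + hash + '"');
  params.set("rows", "1");
  params.set("wt", "json");
  return params.toString();
}

// The facet is worth showing only if at least one hash is duplicated.
function shouldShowFacet(entries) {
  return duplicatedHashes(entries).length > 0;
}
```

In a widget, the result of `shouldShowFacet` would decide between calling `$('#facet_signature').show()` and `$('#facet_signature').hide()`, and `buildLookupParams` would be called once per hash returned by `duplicatedHashes`.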