Not active since v3.0 - Enterprise Edition only
Since version 3.0, deduplication is no longer active or maintained.
Datafari can let a user see which documents are duplicated in the search results.
The deduplication functionality hashes each document with the MD5 algorithm so that Solr can recognize which documents are duplicates of one another.
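As a rough illustration of the hashing idea, the sketch below computes an MD5 signature over a document's raw bytes and compares signatures to detect duplicates. It is not Datafari's or Solr's actual code: the function name `content_signature` and the sample documents are hypothetical, and it assumes the hash is taken over the whole content.

```python
import hashlib


def content_signature(content: bytes) -> str:
    """Return the MD5 hex digest of a document's raw content (hypothetical helper)."""
    return hashlib.md5(content).hexdigest()


# Sample documents, made up for this example.
doc_a = b"Quarterly report 2015"
doc_b = b"Quarterly report 2015"  # byte-for-byte copy of doc_a
doc_c = b"Quarterly report 2016"  # differs by one character

sig_a = content_signature(doc_a)
sig_b = content_signature(doc_b)
sig_c = content_signature(doc_c)

print(sig_a == sig_b)  # True: identical content gives the same signature
print(sig_a == sig_c)  # False: different content gives a different signature
```

Under this assumption, only documents with identical content share a signature; documents that differ even slightly are not treated as duplicates.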
When activated, users see a special “duplication” facet at the bottom left of the results page. Each item in this facet represents a set of duplicated documents, shown with a name and the number of duplicates in parentheses.
Clicking a facet item displays all the duplicated documents in that set. This functionality can be useful for finding out how many duplicated documents are present in the corpus.
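The facet behaviour described above can be approximated with a short sketch that groups documents by their MD5 signature and keeps only the groups with more than one member. This is an illustration under assumptions, not the Datafari implementation: `duplicate_facet`, the sample corpus, and the choice of the first document name as the facet label are all hypothetical.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List


def duplicate_facet(documents: Dict[str, bytes]) -> Dict[str, List[str]]:
    """Group document names by MD5 signature, keeping only duplicated sets."""
    groups = defaultdict(list)
    for name, content in documents.items():
        groups[hashlib.md5(content).hexdigest()].append(name)
    # A signature shared by a single document is not a duplicate set.
    return {sig: sorted(names) for sig, names in groups.items() if len(names) > 1}


# Sample corpus, made up for this example.
corpus = {
    "a/report.pdf": b"Quarterly report 2015",
    "b/report.pdf": b"Quarterly report 2015",
    "notes.txt": b"Meeting notes",
}

facet = duplicate_facet(corpus)

# One line per facet item: a label and the number of duplicates in parentheses.
for names in facet.values():
    print(f"{names[0]} ({len(names)})")  # prints "a/report.pdf (2)"
```

Selecting a facet item then corresponds to listing every name stored under that signature.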