Deduplication

Valid from Datafari v5.4

The documentation below is valid for Datafari v5.4 and later.

When Deduplication is active and properly configured, a user with the searchexpert role can check for duplicates in the admin UI, under the Extra Functionalities menu.

More details about this functionality are available at https://datafari.atlassian.net/wiki/spaces/DATAFARI/pages/681574448


Valid from Datafari v5.0 - Enterprise Edition only

The documentation below is valid from Datafari v5.0 up to and including v5.3.

When Deduplication is active and properly configured, a user with the searchexpert role can check for duplicates in the admin UI, under the Extra Functionalities menu.

More details about this functionality are available at https://datafari.atlassian.net/wiki/spaces/DATAFARI/pages/681574448


Not active since v3.0 - Enterprise Edition only

Since version 3.0, Deduplication is no longer active or maintained.

Datafari can let a user see which documents in the search results are duplicates.

The deduplication functionality hashes each document with the MD5 algorithm so that Solr can recognize which documents are duplicates.
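The idea of hash-based duplicate detection can be illustrated with a minimal sketch. This is not Datafari's or Solr's actual implementation; the function names (md5_signature, group_duplicates) and the sample documents are hypothetical, and the sketch assumes that two documents count as duplicates only when their raw content is byte-for-byte identical.

```python
import hashlib
from collections import defaultdict


def md5_signature(content: bytes) -> str:
    """Return the MD5 digest of a document's content as a hex string."""
    return hashlib.md5(content).hexdigest()


def group_duplicates(documents: dict[str, bytes]) -> dict[str, list[str]]:
    """Group document ids by MD5 signature, keeping only signatures shared by 2+ documents."""
    groups: dict[str, list[str]] = defaultdict(list)
    for doc_id, content in documents.items():
        groups[md5_signature(content)].append(doc_id)
    return {sig: ids for sig, ids in groups.items() if len(ids) > 1}


# Hypothetical sample corpus: a.txt and b.txt have identical content.
docs = {
    "a.txt": b"quarterly report",
    "b.txt": b"quarterly report",
    "c.txt": b"meeting notes",
}
duplicates = group_duplicates(docs)
```

Because identical input always produces the same digest, comparing short signatures is enough to spot exact copies without comparing full document bodies. A single changed byte yields a different signature, so near-duplicates are not detected by this approach.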

When activated, users have a special “duplication” facet that appears on the bottom left of the results page. Each item in this facet represents a set of duplicated documents, with a name and the number of duplicates in parentheses.

Clicking a facet item displays all the duplicated documents in that set. This is useful for finding out how many duplicated documents are present in the corpus.
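The facet behaviour described above can be sketched as follows. This is an illustrative model only, not the code behind the Datafari UI: the function names, the use of the signature itself as the facet item name, and the sample data are all assumptions.

```python
from collections import Counter


def duplication_facet(signatures: dict[str, str]) -> list[str]:
    """Build facet labels of the form 'name (count)' for signatures shared by 2+ documents.

    Assumption: the signature string itself serves as the facet item name.
    """
    counts = Counter(signatures.values())
    return [f"{sig} ({n})" for sig, n in counts.most_common() if n > 1]


def documents_for(signatures: dict[str, str], facet_signature: str) -> list[str]:
    """Return the ids of all documents sharing the clicked facet item's signature."""
    return [doc_id for doc_id, sig in signatures.items() if sig == facet_signature]


# Hypothetical mapping of document id to content signature.
sigs = {
    "a.txt": "sig1",
    "b.txt": "sig1",
    "c.txt": "sig2",
    "d.txt": "sig3",
    "e.txt": "sig3",
    "f.txt": "sig3",
}
facet = duplication_facet(sigs)
clicked = documents_for(sigs, "sig1")
```

In this sketch, documents with a unique signature (here c.txt) produce no facet item, which matches the idea that each item represents a set of duplicates.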