DREDUPLICATE

Removes or tags duplicates after indexing.

This index action runs on a specified subset of the content and locates duplicates using a variety of methods. Duplicates can then be deleted, moved to a different database, or tagged in a specified field, depending on the value of DuplicateAction.

Note: The DREDUPLICATE index action only removes duplicate documents within a single Distributed Index Handler instance, rather than removing duplicates over the whole distributed system. To remove all duplicates, you must ensure that duplicates of a document are all sent to the same instance of Distributed Index Handler, for example by using DistributeByFields mode.
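The following Python sketch illustrates why distributing by a field value places duplicates on the same instance. It shows the general idea only and is not how Distributed Index Handler implements DistributeByFields; the function name pick_instance, the CRC32 hash, and the sample references are assumptions made for illustration.

```python
import zlib


def pick_instance(reference: str, instance_count: int) -> int:
    """Map a reference value to an instance index in a deterministic way.

    Hypothetical helper: the hash function and the modulo scheme are
    illustrative assumptions, not the product's actual routing logic.
    """
    return zlib.crc32(reference.encode("utf-8")) % instance_count


# Sample reference values; the first and third documents are duplicates.
references = [
    "http://example.com/a",
    "http://example.com/b",
    "http://example.com/a",
]

# Equal reference values always map to the same instance index.
routes = [pick_instance(ref, 3) for ref in references]
```

Because the two documents with the same reference are routed to the same instance, a DREDUPLICATE action that runs on that instance can see both of them.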

Example

http://12.3.4.56:20001/DREDUPLICATE?DuplicateAction=Delete&ReferenceField=*/DREREFERENCE

In this example, duplicates are identified using the DREREFERENCE field, and any duplicates found are deleted.
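As a minimal sketch, the example request above could be assembled in Python as follows. The helper name build_dreduplicate_url is hypothetical, the host and port are taken from the example, and the code only builds the URL string; it does not send the request.

```python
from urllib.parse import urlencode


def build_dreduplicate_url(host: str, port: int, **params: str) -> str:
    """Build a DREDUPLICATE index action URL (hypothetical helper).

    The characters '*' and '/' are left unescaped so that a field
    specifier such as */DREREFERENCE appears as it does in the example.
    """
    query = urlencode(params, safe="*/")
    return f"http://{host}:{port}/DREDUPLICATE?{query}"


url = build_dreduplicate_url(
    "12.3.4.56",
    20001,
    DuplicateAction="Delete",
    ReferenceField="*/DREREFERENCE",
)
```

Here url holds the same string as the example request.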

Parameters

Parameter              Description                                                                                    Required
ChecksumField          A reference field used to determine whether a match is exact.
Database               The database to move duplicates to.                                                            see Comments
DatabaseMatch          A list of databases to search for duplicates in.
DuplicateAction        The action to perform on duplicates.                                                           Yes
IgnoreMaxPendingItems  Whether to ignore the IndexQueueMaxPendingItems limit for this index action.
IndexUID               An identification code for any document tracking events.
MaxID                  The last DocID to find duplicates of.
MinID                  The first DocID to find duplicates of.
Priority               The priority for the index job.
ReferenceField         A reference field to use as the initial determination of whether two documents are a match.    Yes
TagField               The field to tag duplicates with.                                                              see Comments
TagValue               The static value to tag duplicates with in the TagField.
ThreadHashField        The field containing the thread hash values used to determine whether a match is a duplicate.
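A client could check for the two parameters marked as required before it sends the action. The sketch below is one possible way to do this; the names REQUIRED_PARAMETERS and missing_required are hypothetical. It covers only DuplicateAction and ReferenceField, and does not check Database or TagField, whose requirements are conditional (see Comments).

```python
# Parameters marked "Yes" in the Required column.
REQUIRED_PARAMETERS = ("DuplicateAction", "ReferenceField")


def missing_required(params: dict) -> list:
    """Return the required parameter names that are absent or empty.

    Hypothetical helper: conditional requirements (Database, TagField)
    are deliberately not checked here.
    """
    return [name for name in REQUIRED_PARAMETERS if not params.get(name)]


# A request that lacks ReferenceField is reported as incomplete.
incomplete = missing_required({"DuplicateAction": "Delete"})

# The parameters from the Example section pass the check.
complete = missing_required(
    {"DuplicateAction": "Delete", "ReferenceField": "*/DREREFERENCE"}
)
```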

Comments


© 2013 Hewlett-Packard Development Company, L.P.