Locate Duplicate Documents

You can locate duplicate documents in the data index after indexing has taken place by using the DREDUPLICATE index action. This action locates duplicates in a specified subset of the content, and then removes them, tags a field, or moves the duplicate documents to another database.

http://ContentHost:indexPort/DREDUPLICATE?ReferenceField=Field&DuplicateAction=Action

where:

ContentHost is the IP address or host name of the machine on which the IDOL Content component is installed.
indexPort is the IDOL Content component index port (specified as IndexPort in the [Server] section of the IDOL Content component configuration file).
Field is a ReferenceType field that Content uses to make the initial determination of whether two documents match.
Action is the action to perform on a duplicate. The following options are available:

  • Delete. Deletes all duplicate documents.

  • Database. Moves all duplicate documents to a database. If you select the Database action, you must specify the database in the Database parameter.

  • Tag. Tags a specified field in the duplicate document. You must specify the field in the TagField index parameter. You can also specify a value to tag the field with by using the TagValue parameter. If you do not specify a TagValue, the field is tagged with the value 1.

For example:

http://MyHost:20001/DREDUPLICATE?ReferenceField=DOCUMENT/DREREFERENCE&DuplicateAction=Database&Database=Duplicates

This action uses port 20001 to locate duplicates in the IDOL Content component that is located on the machine with the host name MyHost. Content uses the DREREFERENCE field to identify duplicate documents, and moves them to the Duplicates database.

http://MyHost:20001/DREDUPLICATE?ReferenceField=DOCUMENT/DREREFERENCE&DuplicateAction=Tag&TagField=DOCUMENT/DRETITLE&TagValue=Duplicate

In this example, Content initially uses the DREREFERENCE field to identify the duplicate documents, and then sets the DRETITLE field of each duplicate document to the value Duplicate.
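
If you generate these index actions from a script, the following Python sketch shows one way to assemble the URL and check that the parameter each DuplicateAction requires is present. The function name build_dreduplicate_url and the REQUIRED_PARAMS table are hypothetical helpers introduced for illustration; they are not part of IDOL. The sketch only builds the URL string and does not send it to Content.

```python
from urllib.parse import urlencode

# Extra parameter that each DuplicateAction needs, following the option list above.
# This table is an illustrative assumption, not an IDOL API.
REQUIRED_PARAMS = {
    "Delete": [],
    "Database": ["Database"],
    "Tag": ["TagField"],  # TagValue is optional
}


def build_dreduplicate_url(host, index_port, reference_field, action, **extra):
    """Return a DREDUPLICATE index action URL as a string (hypothetical helper)."""
    if action not in REQUIRED_PARAMS:
        raise ValueError(f"Unknown DuplicateAction: {action}")
    missing = [name for name in REQUIRED_PARAMS[action] if name not in extra]
    if missing:
        raise ValueError(f"DuplicateAction={action} requires: {', '.join(missing)}")
    params = {"ReferenceField": reference_field, "DuplicateAction": action, **extra}
    # safe="/" leaves field paths such as DOCUMENT/DREREFERENCE unescaped
    query = urlencode(params, safe="/")
    return f"http://{host}:{index_port}/DREDUPLICATE?{query}"


if __name__ == "__main__":
    # Rebuilds the two example actions shown above.
    print(build_dreduplicate_url(
        "MyHost", 20001, "DOCUMENT/DREREFERENCE", "Database",
        Database="Duplicates"))
    print(build_dreduplicate_url(
        "MyHost", 20001, "DOCUMENT/DREREFERENCE", "Tag",
        TagField="DOCUMENT/DRETITLE", TagValue="Duplicate"))
```

Checking the required parameters before building the URL catches an incomplete Database or Tag action in the script, before anything is sent to the index port.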

To prevent Content from indexing duplicate documents, use the KillDuplicates parameter with the DREADD and DREADDDATA index actions.

For details on the other parameters that are available for the DREDUPLICATE index action, refer to the IDOL Server Reference.

