Deduplication Options—KillDuplicates

Use the following parameters to specify deduplication options. HPE IDOL Server uses these parameters to determine whether documents match.

The following options are available for the deduplication parameters.

NONE

Allows duplicate documents in HPE IDOL Server. HPE IDOL Server does not replace or delete documents.

REFERENCE

Replaces an existing document with the new document if the document to index has the same value in its DREREFERENCE field as the existing document.

REFERENCEMATCHN

Replaces the existing document with the new document if the content of the document to index is more than N percent similar to the existing document. HPE IDOL Server determines the similarity by comparing the content of the SourceType fields in the document, or the Index fields if no SourceType fields are configured.

NOTE:

This method can deduplicate only documents that are already synced in the IDOL Server index. It cannot deduplicate similar documents in the same index job.

FieldName

Replaces the existing document with the new document if the document to index contains a ReferenceType field named FieldName that has the same content as the FieldName field in the existing document.

You can specify multiple ReferenceType fields in this option (separated by a plus symbol or space), in which case HPE IDOL Server deletes documents that contain any of the specified fields with identical content.

NOTE:

You identify fields as ReferenceType fields through field processes in the HPE IDOL Server configuration file. If you list multiple fields in the same PropertyFieldCSVs parameter where you list the FieldName for deduplication, HPE IDOL Server uses all the fields to eliminate duplicate documents. If you want to define multiple ReferenceType fields but do not want to use all fields for duplicate elimination, set up multiple field processes.
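The matching rule for the FieldName option can be illustrated with a short Python sketch. This is not IDOL Server code. It only models the behavior described above, assuming documents are represented as plain dictionaries; the function names and the ISBN and DOI field names are hypothetical.

```python
import re


def parse_field_names(option):
    """Split a KillDuplicates field list on plus symbols or whitespace."""
    return [name for name in re.split(r"[+\s]+", option.strip()) if name]


def is_duplicate(existing, incoming, option):
    """Return True if any listed field has identical content in both documents.

    'existing' and 'incoming' are hypothetical dict representations of
    documents (field name -> field content). A field that is missing from
    either document never counts as a match.
    """
    for name in parse_field_names(option):
        if name in existing and name in incoming:
            if existing[name] == incoming[name]:
                return True
    return False


# Hypothetical field names, used for illustration only.
old_doc = {"ISBN": "111", "DOI": "10.1/abc"}
new_doc = {"ISBN": "222", "DOI": "10.1/abc"}
print(is_duplicate(old_doc, new_doc, "ISBN+DOI"))
```

In this sketch a match on any one of the listed fields is enough to treat the documents as duplicates, which mirrors the "any of the specified fields" wording above.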

ReferenceField,GREATER:VersionField

Replaces the existing document with the new document if the document to index contains a ReferenceType field named ReferenceField that has the same content as the ReferenceField field in the existing document, and if the VersionField field in the document to index has a higher value than the VersionField in the existing document. For XML documents, you must fully qualify the path of the XML field that you want to use as the version field (you cannot use wildcard values).

VersionField must contain a positive integer value, but you do not need to configure it as a numeric field. If only one of the incoming and existing documents has a valid value in the VersionField, IDOL Server keeps the document that has the valid value. When both documents have the same VersionField value, IDOL Server keeps the existing document.

NOTE:

When you index IDX documents, for the version comparison to work correctly, the value in the field that you use as the VersionField must be listed in quotation marks (""). That is, the field must have the following format in the IDX:

#DREFIELD MyField="N"

IDOL Server treats existing documents with a missing or non-numeric value in the VersionField as having a version number of negative infinity. It treats a new document with a missing or non-numeric value in the VersionField as having a version number of 0.
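The version comparison rules above can be summarized in a small Python sketch. It is an illustration of the documented behavior rather than IDOL Server's implementation; the function names are hypothetical, and the outcome when both documents lack a valid version is inferred from the negative infinity and 0 defaults stated above.

```python
import math
import re


def parse_version(raw, is_incoming):
    """Map a VersionField value to a comparable number.

    A valid value is a positive integer. A missing or non-numeric value
    falls back to 0 for the incoming document and to negative infinity
    for the existing document, as described in the text.
    """
    fallback = 0 if is_incoming else -math.inf
    if raw is None:
        return fallback
    text = raw.strip()
    if re.fullmatch(r"[0-9]+", text) and int(text) > 0:
        return int(text)
    return fallback


def incoming_replaces_existing(existing_raw, incoming_raw):
    """Return True if the incoming document should replace the existing one.

    Assumes the ReferenceField content of the two documents already matches.
    The comparison is strict, so equal versions keep the existing document.
    """
    incoming = parse_version(incoming_raw, is_incoming=True)
    existing = parse_version(existing_raw, is_incoming=False)
    return incoming > existing


print(incoming_replaces_existing("3", "4"))
print(incoming_replaces_existing("4", "4"))
print(incoming_replaces_existing("4", None))
```

Because a valid version is at least 1, the fallback values reproduce the rule that the document with the valid VersionField is kept when only one of the two has one.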

NOOP (DREADDDATA only)

Use the KillDuplicates parameter in the [Server] section of the HPE IDOL Server configuration file to determine how to treat duplicate documents.

NOTE:

This option is available only for the DREADDDATA action.

When you specify a deduplication option, note that:

