There are some constraints on deduplication when using other IDOL parameters.
The IDOL Content component cannot use the same ReferenceType field for deduplication as it uses for the Combine action parameter. The Combine operation occurs at query time and clashes with deduplication. If you intend to deduplicate when indexing and use the Combine action parameter, you must set up separate ReferenceType fields for these processes.
You can enable the DIH for reference-based indexing. Refer to the DIH Administration Guide.
If you index documents into IDOL with the DIH enabled for reference-based indexing, it might prevent deduplication of documents with different references. In this case, use only one of the following deduplication options:
You can use field-based indexing in the DIH to ensure correct deduplication in a distributed system. For more information on configuring the DIH for field-based indexing, refer to the DIH Administration Guide.
If you set False, or use KillDuplicatesDB options, it might prevent correct deduplication. To deduplicate correctly, you can distribute data by the DeDupeHash field (MD5 hash) of the documents. In this way, the DIH sends all duplicates to the same child server. Setting DeDupeHash during the indexing action then ensures accurate deduplication.
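The following Python sketch illustrates why distributing by a content hash keeps duplicates together. It is not the DIH implementation: the function names, the modulo-based routing rule, and the choice to let a newer document replace an older one with the same hash are all assumptions made for illustration.

```python
import hashlib


def dedupe_hash(content: str) -> str:
    """Return an MD5 hex digest of the document content (stand-in for DeDupeHash)."""
    return hashlib.md5(content.encode("utf-8")).hexdigest()


def choose_child(hash_hex: str, child_count: int) -> int:
    """Hypothetical routing rule: map a hash to a child server index."""
    return int(hash_hex, 16) % child_count


def distribute(documents, child_count):
    """Route (reference, content) pairs to child servers by content hash.

    Each child is modeled as a dict keyed by hash, so a document whose hash
    already exists on that child replaces the earlier one (assumed behavior).
    """
    children = [dict() for _ in range(child_count)]
    for reference, content in documents:
        h = dedupe_hash(content)
        children[choose_child(h, child_count)][h] = reference
    return children


docs = [
    ("ref-a", "same text"),   # duplicate content, different reference
    ("ref-b", "same text"),
    ("ref-c", "other text"),
]
children = distribute(docs, child_count=3)
```

Because identical content always produces the same hash, both copies of "same text" reach the same child, where the per-child check can remove the duplicate. With reference-based routing, "ref-a" and "ref-b" could land on different children and neither child would see the other copy.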
To use a field for deduplication, you must configure it as a ReferenceType field. You do not need to configure it as ReferenceType in the DIH configuration file.
Deduplication of content occurs for all reference fields specified in a single PropertyFieldCSVs list in the IDOL Content component configuration file. To use only the DeDupeHash field to deduplicate, and not also the DREREFERENCE field, you must set these reference fields in separate field processing sections in the IDOL Content component configuration file.