Enable Deduplication for all Index Jobs
To enable deduplication for all indexing jobs (that is, to make deduplication the default for the DREADD and DREADDATA actions), set the KillDuplicates configuration parameter in the [Server] section. You must enable deduplication before you start indexing documents into IDOL server.
You can use the KillDuplicatesChecksumField parameter to reverse normal deduplication. IDOL server compares the value of a specified field in the incoming document with the value of the same field in the existing document, and retains the existing document if the values match.
You can use the KillDuplicatesPreserveFields parameter to configure one or more IDX fields that IDOL server copies to a newer version of a duplicate document.
To enable deduplication as the default for all indexing jobs
1. Open the IDOL server configuration file in a text editor.
2. In the [Server] section, set the KillDuplicates parameter to REFERENCE, REFERENCEMATCHN, or the names of the ReferenceType fields to use to determine which documents are duplicates.
You can identify fields that contain document references by setting up an appropriate field process. When you index a document that has the same value in the same ReferenceType field as an existing document in IDOL server, IDOL server detects the duplicate. It deletes the existing document and replaces it with the new one.
3. Save and close the configuration file. Restart IDOL server for your changes to take effect.
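As a minimal sketch of the setting that this procedure describes, the [Server] section might contain the following lines. REFERENCE is one of the values listed above. The // comment line is for illustration only, and other parameters in the section are omitted.
[Server]
// Enable deduplication by default for the DREADD and DREADDATA actions
KillDuplicates=REFERENCE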
Limit ReferenceType Fields used for Deduplication
You identify fields as ReferenceType fields through field processes. If the field process whose PropertyFieldCSVs parameter lists the field that you name in KillDuplicates also lists other fields, IDOL server uses all of those fields to eliminate duplicate documents. For example:
[SetReferenceFields]
Property=Reference
PropertyFieldCSVs=*/DREREFERENCE,*/URL
In this example, IDOL server uses both the DREREFERENCE field and URL field to eliminate duplicate copies if you set KillDuplicates to DREREFERENCE.
If you want to define multiple ReferenceType fields but do not want to use them all for duplicate elimination, set up multiple field processes. For example:
[SetReferenceFields]
Property=Reference
PropertyFieldCSVs=*/DREREFERENCE
[SetMoreReferenceFields]
Property=Reference
PropertyFieldCSVs=*/URL
In this example, IDOL server uses only the DREREFERENCE field to eliminate duplicate copies if KillDuplicates is DREREFERENCE. It does not use the URL field.
Use KillDuplicatesChecksumField to Prevent Unnecessary Indexing
By default, when IDOL server detects that a new document is a duplicate of an existing one, it replaces the existing document with the new one.
When deduplication is enabled through KillDuplicates, you can also use the KillDuplicatesChecksumField configuration parameter to specify a checksum field. IDOL server then compares the value of this field in the incoming document and the existing document. If the value is the same, IDOL server keeps the existing document rather than replacing it with the new document.
This process prevents unnecessary updates. For example, when re-fetching a Web site, use KillDuplicatesChecksumField to configure IDOL to update the index for this site only if the site has changed.
 
NOTE The field that you specify in KillDuplicatesChecksumField must be a ReferenceType field.
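The following fragment sketches how these settings might fit together. It assumes a hypothetical field named CONTENT_HASH that holds a checksum of the document content. It also assumes that KillDuplicatesChecksumField is set in the [Server] section alongside KillDuplicates, and that the field is named without a path. Check the section and the field-name format against the parameter reference for your IDOL server version. CONTENT_HASH has its own field process so that IDOL server treats it as a ReferenceType field without using it to detect duplicates.
[Server]
KillDuplicates=DREREFERENCE
// CONTENT_HASH is a hypothetical field name
KillDuplicatesChecksumField=CONTENT_HASH
[SetReferenceFields]
Property=Reference
PropertyFieldCSVs=*/DREREFERENCE
[SetChecksumField]
Property=Reference
PropertyFieldCSVs=*/CONTENT_HASH
With a configuration along these lines, re-indexing a document whose DREREFERENCE and CONTENT_HASH values both match an existing document would leave the existing document in place.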
Use KillDuplicatesPreserveFields to Preserve a Field
If there is a field that you want to keep in all versions of a document, regardless of whether it is later deleted or changed, you can use the KillDuplicatesPreserveFields configuration parameter.
To preserve fields, set KillDuplicatesPreserveFields to a comma-separated list of fields that you want to save.
When IDOL server receives a duplicate document, it copies these fields from the existing version of the document to the newer version when it performs deduplication.
 
NOTE If there is more than one copy of the document in the IDOL server index when a new version arrives, IDOL server copies the preserve field from the existing duplicate with the highest document ID.
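As an illustration, the following fragment assumes two hypothetical fields, REVIEW_STATUS and FIRST_INDEXED, that you want to carry forward. It also assumes that KillDuplicatesPreserveFields is set in the [Server] section alongside KillDuplicates, and that field names are listed without a path. Check both assumptions against the parameter reference for your IDOL server version.
[Server]
KillDuplicates=REFERENCE
// REVIEW_STATUS and FIRST_INDEXED are hypothetical field names
KillDuplicatesPreserveFields=REVIEW_STATUS,FIRST_INDEXED
Here, when a newer version of a document replaces an existing duplicate, IDOL server would copy the REVIEW_STATUS and FIRST_INDEXED fields from the existing document into the new one.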