Tag Documents into Clusters
After indexing, you can tag documents into clusters of similar documents. Tagging can be useful for grouping duplicate documents together.
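To tag documents, send the DRETAGDOCCLUSTERS index action. The example below uses its TagField, MinScore, TagSourceField, MinID, MaxID, ChecksumField, TaggedDBName, and RelevanceField parameters.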
 
DRETAGDOCCLUSTERS Example
IDOL server indexes three documents:
#DREREFERENCE A
#DREDBNAME Default
#DREFIELD CHECKSUM="ABCD1234"
#DRECONTENT
apple banana cheese
#DREENDDOC
#DREREFERENCE B
#DREDBNAME Default
#DREFIELD CHECKSUM="ABCD1234"
#DRECONTENT
apple banana cheese
#DREENDDOC
#DREREFERENCE C
#DREDBNAME Default
#DREFIELD CHECKSUM="XYZ9876"
#DRECONTENT
apple banana data
#DREENDDOC
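The following is a minimal Python sketch of how documents like these could be serialized into the IDX layout shown above. The function name to_idx and the dictionary keys (reference, database, fields, content) are hypothetical names chosen for illustration. The sketch emits only the line types that appear in this example; a complete IDX file may need lines that are not shown here.

```python
def to_idx(docs):
    """Serialize documents into the IDX line layout used in this example."""
    lines = []
    for doc in docs:
        lines.append(f"#DREREFERENCE {doc['reference']}")
        lines.append(f"#DREDBNAME {doc['database']}")
        # One #DREFIELD line per named field, with the value in double quotes.
        for name, value in doc.get("fields", {}).items():
            lines.append(f'#DREFIELD {name}="{value}"')
        lines.append("#DRECONTENT")
        lines.append(doc["content"])
        lines.append("#DREENDDOC")
    return "\n".join(lines) + "\n"


docs = [
    {"reference": "A", "database": "Default",
     "fields": {"CHECKSUM": "ABCD1234"}, "content": "apple banana cheese"},
    {"reference": "B", "database": "Default",
     "fields": {"CHECKSUM": "ABCD1234"}, "content": "apple banana cheese"},
    {"reference": "C", "database": "Default",
     "fields": {"CHECKSUM": "XYZ9876"}, "content": "apple banana data"},
]
idx_text = to_idx(docs)
```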
After indexing, you send the following action:
[...]/DRETAGDOCCLUSTERS?TagField=DOCUMENT/CLUSTERID&MinScore=60&TagSourceField=DOCUMENT/DREREFERENCE&MinID=1&MaxID=3&ChecksumField=DOCUMENT/CHECKSUM&TaggedDBName=tagged&RelevanceField=DOCUMENT/CLUSTERSCORE
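As an illustration, an action string like this one could be assembled in Python with the standard library. This is a sketch only: the helper name build_tag_action is hypothetical, and the host and port below are placeholders, not values from this guide. Substitute the address of your own IDOL server index port.

```python
from urllib.parse import urlencode


def build_tag_action(base_url, **params):
    """Build a DRETAGDOCCLUSTERS action URL from keyword parameters."""
    # safe="/" leaves the slashes in field paths such as DOCUMENT/CLUSTERID
    # unescaped, so the result matches the form shown in this example.
    query = urlencode(params, safe="/")
    return f"{base_url}/DRETAGDOCCLUSTERS?{query}"


url = build_tag_action(
    "http://idol.example.com:9001",  # hypothetical host and port
    TagField="DOCUMENT/CLUSTERID",
    MinScore=60,
    TagSourceField="DOCUMENT/DREREFERENCE",
    MinID=1,
    MaxID=3,
    ChecksumField="DOCUMENT/CHECKSUM",
    TaggedDBName="tagged",
    RelevanceField="DOCUMENT/CLUSTERSCORE",
)
print(url)
```

Keyword arguments keep their order, so the parameters appear in the URL in the order in which they are passed.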
IDOL server adds the CLUSTERID and CLUSTERSCORE fields to each document and places the tagged documents in the Tagged database:
#DREREFERENCE A
#DREDBNAME Tagged
#DREFIELD CHECKSUM="ABCD1234"
#DREFIELD CLUSTERID="A"
#DREFIELD CLUSTERSCORE="100.00"
#DRECONTENT
apple banana cheese
#DREENDDOC
#DREREFERENCE B
#DREDBNAME Tagged
#DREFIELD CHECKSUM="ABCD1234"
#DREFIELD CLUSTERID="A"
#DREFIELD CLUSTERSCORE="100.00"
#DRECONTENT
apple banana cheese
#DREENDDOC
#DREREFERENCE C
#DREDBNAME Tagged
#DREFIELD CHECKSUM="XYZ9876"
#DREFIELD CLUSTERID="A"
#DREFIELD CLUSTERSCORE="70.00"
#DRECONTENT
apple banana data
#DREENDDOC
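Document A is tagged with cluster ID A (its own reference) because it doesn't match any existing cluster, so it starts a new one.
Document B is tagged with cluster ID A because its CHECKSUM field matches that of A.
Document C is tagged with cluster ID A because it is similar to A, and its score (70.00) is higher than the specified MinScore (60).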
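The decision order described above (checksum match, then similarity against MinScore, otherwise a new cluster) can be sketched in Python as follows. This is not IDOL server's implementation. The function names are hypothetical, and term_overlap is a stand-in similarity measure assumed for illustration, so it does not reproduce the 70.00 score that IDOL server reports for document C.

```python
def term_overlap(a, b):
    """Hypothetical similarity: shared terms as a percentage of the longer text."""
    terms_a, terms_b = set(a.split()), set(b.split())
    longest = max(len(terms_a), len(terms_b))
    return 100.0 * len(terms_a & terms_b) / longest if longest else 0.0


def tag_clusters(docs, min_score, similarity=term_overlap):
    """Return {reference: (cluster_id, score)} for documents processed in order."""
    seeds = []        # (cluster_id, content) of the document that started each cluster
    by_checksum = {}  # checksum -> cluster_id of the first document that carried it
    tags = {}
    for doc in docs:
        if doc["checksum"] in by_checksum:
            # 1. Identical checksum: join that cluster with a full score.
            tag = (by_checksum[doc["checksum"]], 100.0)
        else:
            # 2. Otherwise compare against each cluster's first document.
            best = max(
                ((similarity(doc["content"], content), cid) for cid, content in seeds),
                default=None,
            )
            if best is not None and best[0] >= min_score:
                tag = (best[1], best[0])
            else:
                # 3. No match: start a new cluster named after this document.
                seeds.append((doc["reference"], doc["content"]))
                tag = (doc["reference"], 100.0)
        by_checksum.setdefault(doc["checksum"], tag[0])
        tags[doc["reference"]] = tag
    return tags


tags = tag_clusters(
    [
        {"reference": "A", "checksum": "ABCD1234", "content": "apple banana cheese"},
        {"reference": "B", "checksum": "ABCD1234", "content": "apple banana cheese"},
        {"reference": "C", "checksum": "XYZ9876", "content": "apple banana data"},
    ],
    min_score=60,
)
```

With the three example documents, all of them end up in cluster A, as in the output above. Raising min_score above the score for C would instead make C start its own cluster.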