Set up an OCR Task
Use the following procedure to set up an OCR task.
To set up an OCR task
1. Open the IDOL server configuration file in a text editor.
2. Create a section for the OCR task. For example:
[MyOCRTask]
3. Set Module to OCR to identify the task as an OCR task.
Module=OCR
4. Specify the task that you want IDOL server to perform if a document meets the quality criteria. For example:
GoodTask=MyIndexTask
5. Specify the task that you want IDOL server to perform if a document does not meet the quality criteria. For example:
BadTask=MyFileWriterTask
6. Specify the task that you want IDOL server to perform if a document is empty. For example:
EmptyTask=MyFileWriterTask
7. Specify the path to the term file to use. For example:
TermFile=C:\TermFiles\EnglishTerms.dat
8. Specify the path to the stop list file to use. IDOL server ignores stop list words when it determines the document quality. For example:
StopList=C:\Autonomy\IDOLserver\IDOL\langfiles\English.dat
9. Specify any other parameters that you want to apply to your OCR task. For details on available parameters, refer to the IDOL server Online Help. For example:
Language=ENGLISH
Encoding=UTF8
10. Add an OCRFilter section if you want to specify details that determine whether an OCR document is good or poor quality. For example:
[OCRFilter]
11. Specify the quality threshold value (0–200) that determines how IDOL server processes a document next. If a document score is lower than the Threshold value, IDOL server executes the specified GoodTask next. If the score is higher than the Threshold value, IDOL server executes the specified BadTask next. For example:
Threshold=50
12. Specify any other parameters that you want to apply in the [OCRFilter] section. For details on available parameters, refer to the IDOL server Online Help. For example:
Punctuation=:;/*<>
MinimumValidTerms=2
13. Save the IDOL server configuration file and restart IDOL server for your configuration changes to take effect.
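Before you restart IDOL server, it can help to check the new sections for obvious mistakes. The following Python sketch is one possible way to do that, not part of IDOL server. It assumes that the configuration can be read as INI-style text, and it treats GoodTask and BadTask as required, which is an assumption of this sketch. The helper name check_ocr_task is hypothetical, and the sample text is assembled from the values shown in the procedure.

```python
import configparser

# Sample configuration assembled from the values used in the procedure.
SAMPLE_CONFIG = r"""
[MyOCRTask]
Module=OCR
GoodTask=MyIndexTask
BadTask=MyFileWriterTask
EmptyTask=MyFileWriterTask
TermFile=C:\TermFiles\EnglishTerms.dat
StopList=C:\Autonomy\IDOLserver\IDOL\langfiles\English.dat
Language=ENGLISH
Encoding=UTF8

[OCRFilter]
Threshold=50
Punctuation=:;/*<>
MinimumValidTerms=2
"""


def check_ocr_task(config_text, task_section="MyOCRTask"):
    """Return a list of problems found in an OCR task section (hypothetical helper)."""
    # Only "=" separates names from values, so ":" can appear inside a value.
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.optionxform = str  # keep parameter names exactly as written
    parser.read_string(config_text)

    if not parser.has_section(task_section):
        return [f"missing section [{task_section}]"]

    problems = []
    task = parser[task_section]
    if task.get("Module", "").upper() != "OCR":
        problems.append("Module must be set to OCR")
    # Assumption of this sketch: both routing targets must be present.
    for key in ("GoodTask", "BadTask"):
        if not task.get(key):
            problems.append(f"{key} is not set")

    # The procedure gives the Threshold range as 0-200.
    if parser.has_section("OCRFilter") and "Threshold" in parser["OCRFilter"]:
        raw = parser["OCRFilter"]["Threshold"]
        try:
            in_range = 0 <= int(raw) <= 200
        except ValueError:
            in_range = False
        if not in_range:
            problems.append("Threshold must be an integer from 0 to 200")
    return problems


print(check_ocr_task(SAMPLE_CONFIG))
```

An empty list means that the sketch found nothing to report. It does not confirm that the tasks named in GoodTask and BadTask exist or that the file paths are valid.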
Example
In the following example, an OCR task automatically checks the quality of OCR documents. It uses the following configuration:
[MyOCRTask]
Module=OCR
GoodTask=MyIndexTask
BadTask=MyFileWriterTask
TermFile=EnglishTerms.dat
StopList=C:\Autonomy\IDOLserver\IDOL\langfiles\English.dat
[OCRFilter]
Threshold=50
Punctuation=:;/*<>
MinimumValidTerms=10
PercentagePunctuation=15
Every time IDOL server performs this task on an incoming document, it looks at the quality score of the document. It also checks the number of valid terms in the document and the percentage of punctuation characters (counting only the characters specified in the Punctuation parameter).
For the document to be considered good quality, it must satisfy the following conditions:
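* Its quality score is lower than the Threshold value (50).
* It contains at least the number of valid terms set by MinimumValidTerms (10).
* Its percentage of punctuation characters does not exceed the PercentagePunctuation value (15).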
If the document meets these conditions, IDOL server forwards it to MyIndexTask, and indexes it.
If any of these conditions is not met, IDOL server forwards the document to MyFileWriterTask, and writes it to disk.
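The routing in this example can be modeled with a short Python sketch. It is an illustration only, not the IDOL server implementation. It assumes that a document is good when its score is below Threshold, it has at least MinimumValidTerms valid terms, and punctuation makes up no more than PercentagePunctuation percent of its characters. The exact boundary behavior and the way the percentage is measured are assumptions. The score and the valid term count are passed in as inputs because this topic does not describe how IDOL server calculates them. The names OcrFilter and next_task are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OcrFilter:
    """Values from the [OCRFilter] section of the example configuration."""
    threshold: int = 50
    punctuation: str = ":;/*<>"
    minimum_valid_terms: int = 10
    percentage_punctuation: float = 15.0


def next_task(score, valid_terms, text, ocr_filter=OcrFilter()):
    """Return the name of the task that would receive the document (hypothetical helper)."""
    # Count only the characters listed in the Punctuation parameter.
    punct_count = sum(1 for ch in text if ch in ocr_filter.punctuation)
    # Assumption: the percentage is measured against all characters in the text.
    punct_pct = 100.0 * punct_count / len(text) if text else 0.0

    is_good = (
        score < ocr_filter.threshold
        and valid_terms >= ocr_filter.minimum_valid_terms
        and punct_pct <= ocr_filter.percentage_punctuation
    )
    return "MyIndexTask" if is_good else "MyFileWriterTask"


# A low score, enough valid terms, and no listed punctuation: treated as good.
print(next_task(30, 25, "The quick brown fox jumps over the lazy dog."))
# Same score and term count, but 8 of 10 characters are listed punctuation.
print(next_task(30, 25, "<<>>;;::ab"))
```

Under these assumptions, the first call returns MyIndexTask and the second returns MyFileWriterTask, because 80 percent of the second text consists of characters listed in Punctuation.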