Check OCR Document Quality

The OCR task module automatically checks the quality of incoming documents that contain optical character recognition (OCR) output when IDOL Server receives them. You can then perform different tasks depending on the quality of each document.
IDOL Server compares the content of the document SourceType fields to the language model in the term file to identify good or bad OCR output. It gives each OCR document a score based on the proportion of words in the document that are real words rather than nonsense. A low score indicates that only a low proportion of the terms in the document match the internal language model that OCR tasks use, which suggests that the OCR output is poor.
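The scoring idea can be illustrated with a small Python sketch. It is not IDOL Server's implementation or API. The word set KNOWN_WORDS is a hypothetical stand-in for the term-file language model, and the function names ocr_quality_score and route_document, the 0.6 threshold, and the "index" and "review" actions are all assumptions made for illustration.

```python
import re

# Hypothetical stand-in for the term-file language model: a plain set of
# known lowercase words. IDOL Server's real language model is not shown here.
KNOWN_WORDS = {
    "the", "quick", "brown", "fox", "jumps", "over", "lazy", "dog",
    "invoice", "total", "amount", "due",
}


def ocr_quality_score(text: str, vocabulary: set) -> float:
    """Return the fraction of alphabetic tokens in text found in vocabulary.

    Tokens are runs of ASCII letters, so digits and punctuation split words.
    A document with no tokens scores 0.0.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return 0.0
    real_words = sum(1 for token in tokens if token in vocabulary)
    return real_words / len(tokens)


def route_document(text: str, vocabulary: set, threshold: float = 0.6) -> str:
    """Choose a hypothetical follow-up task based on the quality score."""
    score = ocr_quality_score(text, vocabulary)
    return "index" if score >= threshold else "review"


if __name__ == "__main__":
    good = "The quick brown fox jumps over the lazy dog"
    bad = "Tbe qvick brwn f0x"
    print(ocr_quality_score(good, KNOWN_WORDS))  # 1.0
    print(ocr_quality_score(bad, KNOWN_WORDS))   # 0.0
    print(route_document(good, KNOWN_WORDS))     # index
    print(route_document(bad, KNOWN_WORDS))      # review
```

A simple set lookup is the most direct way to express "real word versus nonsense"; the threshold in route_document shows how a score could drive the choice of task that follows.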