How Spelling Correction Works

To enable spell checking, set the SpellCheckMaxCheckTerms, SpellCheckIncorrectMaxDocOccs, and UnstemmedMinDocOccs parameters in the [Server] section of the configuration file before you index content. When you send a query that includes Spellcheck=True, the IDOL Content component uses these settings in the spell checking process as follows:

  1. Content determines if the query is eligible for spell checking.

    Content counts the terms in the query text, ignoring stop words, proper-name terms, and hyphenated terms. If the count does not exceed the value of SpellCheckMaxCheckTerms, the query is eligible for spell checking.

  2. Content determines which terms are misspelled.

    Content checks how many times each query term occurs in its data index. If a term occurs fewer times than the value of SpellCheckIncorrectMaxDocOccs, Content assumes that the term is misspelled.

  3. Content finds correct spellings and suggests them.

    Content uses a proprietary term-distancing algorithm to find the terms in its data index that are closest to the misspelled terms. It then checks how many times each of these candidate terms occurs. If a candidate occurs at least UnstemmedMinDocOccs times, Content uses it as a spell check suggestion.

    Content returns the corrected terms as a comma-separated list in an <autn:spelling> field. It also returns a corrected version of the query text in an <autn:spellingquery> field.

  4. When you shut down the IDOL Content component, it creates a spelling correction file.

    The spelling correction file stores the corrections that you make. You can add further corrections to the file or amend existing corrections.
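
Steps 1 to 3 can be illustrated with a minimal Python sketch. It is not IDOL code: the term-distancing algorithm is proprietary, so the sketch substitutes the similarity ratio from Python's standard difflib module. The threshold values, the stop word list, the function names, and the sample index are all hypothetical, and proper-name detection is left out.

```python
from difflib import get_close_matches

# Hypothetical values for illustration only; the real settings are read
# from the [Server] section of the configuration file.
SPELL_CHECK_MAX_CHECK_TERMS = 3
SPELL_CHECK_INCORRECT_MAX_DOC_OCCS = 2
UNSTEMMED_MIN_DOC_OCCS = 5

# Hypothetical stop word list.
STOP_WORDS = {"the", "a", "of", "in"}


def checkable_terms(query):
    """Return query terms that count toward the limit in step 1.

    Stop words and hyphenated terms are ignored. Proper-name detection
    is omitted from this sketch.
    """
    return [t for t in query.lower().split()
            if t not in STOP_WORDS and "-" not in t]


def suggest_corrections(query, doc_occs):
    """Return {misspelled term: suggestion} for a query.

    doc_occs maps each index term to its number of occurrences.
    """
    terms = checkable_terms(query)

    # Step 1: the query is eligible only if it has few enough terms.
    if len(terms) > SPELL_CHECK_MAX_CHECK_TERMS:
        return {}

    corrections = {}
    for term in terms:
        # Step 2: a term that occurs often enough is assumed correct.
        if doc_occs.get(term, 0) >= SPELL_CHECK_INCORRECT_MAX_DOC_OCCS:
            continue

        # Step 3: find the closest index terms (difflib stands in for the
        # proprietary algorithm), then keep only well-attested ones.
        close = get_close_matches(term, list(doc_occs), n=3, cutoff=0.6)
        good = [c for c in close
                if c != term and doc_occs[c] >= UNSTEMMED_MIN_DOC_OCCS]
        if good:
            corrections[term] = good[0]
    return corrections


def corrected_query(query, corrections):
    """Rebuild the query text with each misspelled term replaced."""
    return " ".join(corrections.get(t, t) for t in query.lower().split())
```

With a sample index such as `{"weather": 40, "forecast": 25, "wether": 1, "leather": 8}`, calling `suggest_corrections("the wether forecast", index)` treats "wether" as misspelled because it occurs only once, and proposes "weather" because it is the closest term that meets the minimum occurrence count.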