Character Tokenization
You can tokenize text into character N-grams of a specified size. Set the NGram configuration parameter in your language configuration section to the number of characters to use in each N-gram.
 
NOTE: Do not use NGram together with the SentenceBreaking configuration parameter.
For example, if you set NGram to 2, then IDOL server tokenizes the word Hello as:
he el ll lo
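The overlapping grouping shown above can be reproduced with a short Python sketch. This only illustrates the general N-gram technique, not IDOL server's implementation. The function name char_ngrams is hypothetical, the lowercasing is assumed from the example output, and returning a short word whole is an assumption.

```python
def char_ngrams(word: str, n: int) -> list[str]:
    """Return the overlapping character N-grams of a word, lowercased."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    # Assumption: lowercase the input, matching the example output above.
    word = word.lower()
    # Assumption: a word no longer than n is returned whole.
    if len(word) <= n:
        return [word] if word else []
    # Slide a window of n characters along the word, one character at a time.
    return [word[i:i + n] for i in range(len(word) - n + 1)]


print(" ".join(char_ngrams("Hello", 2)))  # he el ll lo
```

Each N-gram starts one character after the previous one, so a word of length k yields k - n + 1 groups.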
To apply N-gram tokenization only to multibyte strings, set NGramMultiByteOnly to true. For example:
[Japanese]
NGram=2
NGramMultiByteOnly=TRUE
With this configuration, if a document contains both English and Asian (multibyte) text, IDOL server splits the Asian text into N-grams according to the NGram parameter. It does not apply N-gram tokenization to the English text.
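The effect of restricting N-grams to multibyte text can be sketched in Python as follows. This is an illustration under stated assumptions, not IDOL server's algorithm. The names is_multibyte and tokenize_multibyte_only are hypothetical, "multibyte" is assumed to mean a character that needs more than one byte in UTF-8, and single-byte text is simply split on whitespace.

```python
from itertools import groupby


def is_multibyte(ch: str) -> bool:
    # Assumption: "multibyte" means more than one byte when encoded as UTF-8.
    return len(ch.encode("utf-8")) > 1


def tokenize_multibyte_only(text: str, n: int) -> list[str]:
    """Split multibyte runs into N-grams; leave single-byte runs whole."""
    tokens = []
    for chunk in text.split():
        # Group consecutive characters by whether they are multibyte.
        for multibyte, run in groupby(chunk, key=is_multibyte):
            run = "".join(run)
            if multibyte and len(run) > n:
                # Overlapping N-grams over the multibyte run.
                tokens.extend(run[i:i + n] for i in range(len(run) - n + 1))
            else:
                # Single-byte (for example, English) text is kept as one token.
                tokens.append(run)
    return tokens


print(tokenize_multibyte_only("Hello 東京都庁", 2))
# ['Hello', '東京', '京都', '都庁']
```

The English word stays intact, while the four-character Japanese string produces three overlapping two-character groups.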