
Stop Word Lists for Supported Languages
Each language that IDOL server supports needs a stop word list (stop list). If the IDOL server installer does not include a stop word list for the language that you want to use, you can create one. A stop word list is a list of common words that IDOL server does not index. Words such as "the" or "a" occur too frequently to carry any significance, and IDOL server does not need them to understand the concepts in a text.
You can use a standard text editor to edit the stop word list that your IDOL server uses, for example to add other words that occur in most or all of your documents. Stop word lists are located in the IDOL/langfiles directory of your IDOL server.
You can list the words in the stop word list in any of the valid encodings for that language (for example, in Russian you can specify stop words in KOI8, UTF8, ISO and so on). You can use different encodings within the same stop word list file.
You need to specify each word only once. For example, you do not need to specify a word in several different encodings.
For all operations, IDOL server recognizes words as stop words irrespective of the encoding they are in. For example, in Russian you can list a stop word in the KOI8 encoding in the stop word list file and IDOL server recognizes it if it occurs in a document in UTF8.
For each encoding you want to use, create a section in your stop word list file. Give each section the same name as the language type for that encoding (for example, CYRILLIC_KOI8 or CYRILLIC_ISO). You can specify words in upper or lower case, and you can separate them with spaces or new lines.
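The following Python sketch models the behavior described above; it is not IDOL server's implementation. It assumes the stop list has already been split into sections, passed as a dictionary from section name to raw bytes, and it assumes that CYRILLIC_KOI8 corresponds to Python's koi8_r codec and CYRILLIC_ISO to iso8859_5. The names SECTION_CODECS, load_stop_words, and is_stop_word are hypothetical. Decoding every section to Unicode and case-folding each word is what lets a word listed once, in any encoding and either case, match a document stored in a different encoding.

```python
# Sketch only: models encoding-independent stop word matching; this is not
# IDOL server's implementation.

# Assumed mapping from stop list section names to Python codec names.
SECTION_CODECS = {
    "CYRILLIC_KOI8": "koi8_r",
    "CYRILLIC_ISO": "iso8859_5",
}


def load_stop_words(sections: dict[str, bytes]) -> set[str]:
    """Decode each section with its codec and collect case-folded words."""
    stop_words: set[str] = set()
    for name, raw in sections.items():
        text = raw.decode(SECTION_CODECS[name])
        # split() with no argument treats spaces and newlines alike.
        stop_words.update(word.casefold() for word in text.split())
    return stop_words


def is_stop_word(word: str, stop_words: set[str]) -> bool:
    """Case-insensitive lookup against the normalized stop word set."""
    return word.casefold() in stop_words


# Hypothetical stop list: each word appears once, in one encoding only.
sections = {
    "CYRILLIC_KOI8": "ДЛЯ ДО\nЕЕ ЕГО".encode("koi8_r"),
    "CYRILLIC_ISO": "А БЕЗ\nБОЛЕЕ".encode("iso8859_5"),
}
stop_words = load_stop_words(sections)

# A UTF-8 document still matches words listed in KOI8 or ISO.
document = "для вас и без него".encode("utf_8")
tokens = document.decode("utf_8").split()
kept = [token for token in tokens if not is_stop_word(token, stop_words)]
print(kept)  # ['вас', 'и', 'него']
```

Normalizing every section to one Unicode form before comparison is why a word never needs to be repeated in several encodings.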
For example:
äìñ äï
åå åçï
° ±µ· ±¾»µµ ±Ë ±Ë»s
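In this example, a Russian stop word list contains 10 words, of which five are in the CYRILLIC_KOI8 encoding and five are in the CYRILLIC_ISO encoding. The words appear as accented Latin characters and symbols because the raw KOI8 and ISO bytes are displayed as if they were Latin-1 text; for example, äìñ is the KOI8 encoding of the Russian word ДЛЯ ("for").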
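To see which Russian words the example contains, you can reverse the Latin-1 rendering: encode the displayed characters back to their byte values with Latin-1, then decode those bytes with the section's Cyrillic codec. This Python sketch assumes that KOI8 here means KOI8-R; the helper name decode_rendered is hypothetical. The final ISO token, ±Ë»s, ends in an ASCII s and is left out.

```python
def decode_rendered(text: str, codec: str) -> str:
    """Recover the raw bytes behind a Latin-1 rendering, then decode them."""
    # Latin-1 maps each byte 0x00-0xFF to exactly one character, so
    # encoding with it returns the original byte values unchanged.
    return text.encode("latin_1").decode(codec)


# Characters copied from the example, grouped by section.
koi8_words = decode_rendered("äìñ äï åå åçï", "koi8_r").split()
iso_words = decode_rendered("° ±µ· ±¾»µµ ±Ë", "iso8859_5").split()

print(koi8_words)  # ['ДЛЯ', 'ДО', 'ЕЕ', 'ЕГО']
print(iso_words)   # ['А', 'БЕЗ', 'БОЛЕЕ', 'БЫ']
```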