Language Configuration

For each language that you use, create a [MyLanguage] section, where MyLanguage is the name of the language as it appears in the list below. In each section, configure the parameters that determine how to handle that language.

The individual language configuration parameters override any values that you set for these parameters in the [LanguageTypes] section. If you do not specify a parameter for the individual language, IDOL Content Component uses the value in the [LanguageTypes] section, or the internal default.
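
The override order can be illustrated with a minimal sketch. The [german] section, the language type names germanASCII and germanUTF8, and the value IndexNumbers=0 are assumptions chosen for illustration, not recommended settings. The comment lines assume that your configuration file accepts // comments.

[LanguageTypes]
// Illustrative value: applies to every language that does not set IndexNumbers itself.
IndexNumbers=1

[english]
// No IndexNumbers here, so the [LanguageTypes] value (1) applies.
Encodings=ASCII:englishASCII,UTF8:englishUTF8

[german]
// Hypothetical override: this value takes precedence over [LanguageTypes] for this language only.
Encodings=ASCII:germanASCII,UTF8:germanUTF8
IndexNumbers=0

In this sketch, any parameter that appears in neither the language section nor [LanguageTypes] falls back to the internal default.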

NOTE:

If you have enabled automatic language detection, you can configure a General language to apply to any documents that are not in a specific language. IDOL Content Component assigns documents to the General language if it identifies the encoding but not the language. Add the appropriate encodings to the [General] configuration section.

If the document encoding is not configured, the document is placed into the DefaultLanguageType.

For example:

[english]
Encodings=ASCII:englishASCII,UTF8:englishUTF8
Stoplist=english.dat
IndexNumbers=1

[afrikaans]
Encodings=ASCII:afrikaansASCII,UTF8:afrikaansUTF8
IndexNumbers=1
			
[albanian]
Encodings=ASCII:albanianASCII,UTF8:albanianUTF8
IndexNumbers=1
			
[arabic]
Encodings=ARABIC_ISO:arabicARABIC_ISO,ARABIC:arabicARABIC,UTF8:arabicUTF8
IndexNumbers=1
			
[chinese]
Encodings=CHINESESIMPLIFIED:chineseCHINESESIMPLIFIED,CHINESETRADITIONAL:chineseCHINESETRADITIONAL,UTF8:chineseUTF8
SentenceBreaking=chinesebreaking
IndexNumbers=1
			
[general]
Encodings=UTF8:generalUTF8,ASCII:generalASCII,CYRILLIC:generalCYRILLIC
IndexNumbers=1
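
As a further sketch of the fallback behavior described in the note above, the following fragment assumes that DefaultLanguageType is set in the [LanguageTypes] section, and uses englishASCII only as an illustrative value. The comment lines assume that your configuration file accepts // comments.

[LanguageTypes]
// Assumed for illustration: documents whose encoding is not configured are placed into this language type.
DefaultLanguageType=englishASCII

[general]
// Documents whose encoding is identified (here UTF8 or ASCII) but whose language is not are assigned to General.
Encodings=UTF8:generalUTF8,ASCII:generalASCII

With these settings, only documents in an encoding that no section lists would reach the default language type.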

You can create a section for each of the following languages:

Acehnese             Galician     Luxembourgish     Slovenian
Afrikaans            Georgian     Macedonian        Somali
Albanian             German       Malagasy          Sorbian
Amharic              Gilaki       Malay             Spanish
Arabic               Greek        Malayalam         Sranan
Armenian             Greenlandic  Maltese           Sundanese
Azeri                Guarani      Manipuri          Swahili
Basque               Gujarati     Maori             Swedish
Belorussian          Haitian      Marathi           Syriac
Bengali              Hausa        Mazandarani       Tagalog
Berber               Hawaiian     Mirandese         Tahitian
Bihari               Hebrew       Mongolian         Tajik
Bikol                Hindi        Nahuatl           Tamil
Bishnupriya          Hungarian    Navajo            Tatar
Bosnian              Icelandic    Ndebele           Telugu
Breton               Igbo         Nepali            Tetum
Bulgarian            Ilokano      Newari            Thai
Burmese              Indonesian   Norwegian         Tibetan
Catalan              Italian      Oriya             Tokpisin
Cebuano              Japanese     Ossetian          Tongan
Cherokee             Javanese     Panjabi           Tsonga
Chinese Traditional  Kalmyk       Papiamentu        Tswana
Chinese Simplified   Kannada      Persian           Turkish
Chuvash              Kapampangan  Polish            Turkmen
Croatian             Kazakh       Portuguese        Ukrainian
Czech                Khmer        Pushto            Urdu
Danish               Kikongo      Quechua           Uyghur
Divehi               Kinyarwanda  Rhaeto-Romance    Uzbek
Dutch                Kirundi      Romanian          Valencian
English              Komi         Russian           Venda
Erzya                Korean       Sakha             Vietnamese
Esperanto            Kurdish      Sami              Waraywaray
Estonian             Kyrgyz       Sanskrit          Welsh
Ethiopic             Lao          Serbian           Wolof
Faroese              Lappish      Sesotho           Xhosa
Finnish              Latin        Sesotho sa Leboa  Yiddish
French               Latvian      Singhalese        Yoruba
Frisian              Lingala      Siswant           Zulu
Gaelic               Lithuanian   Slovak

You can set the following parameters in each language section:

AugmentSeparators

DecompositionFile

DiminishSeparators

Encodings

HyphenChars

IndexNumbers

NGram

NGramMultibyteOnly

NGramOrientalOnly

Normalise

NumberPunctuation

OCRLanguageFile

ProperNames

SentenceBreaking

SentenceBreakingOptions

SoftSeparators

Stemming

StemmingFile

Stoplist

TangibleCharacters

Transliteration

