Convert Character Sets

Export enables you to control the character set of both the input and the output text. You can do this either when the file is converted or when a subfile is extracted from a container.

The character sets are defined as constants in the Export class. Not all character sets can be used to specify the target character set. See Coded Character Sets for a list of character sets that can be used as a target character set.

Determine the Character Set of the Output Text

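To determine the output character set of a converted document, Export considers whether the document character set can be determined from the file, and whether a source or target character set is specified in the API.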

Guidelines for Character Set Conversion

shows how the output character set is determined when the document character set can be determined.

shows how the output character set is determined when the document character set cannot be determined.

Examples of Character Set Conversion

The examples below demonstrate possible configurations for mapping character sets and the expected output for each scenario.

Document Character Set Can be Determined

For the example in Document Character Set Can be Determined, the document is an RTF file. The section Word Processing Formats indicates that the document character set can be obtained from this file type. The document character set is Traditional Chinese (BIG5).

Document Character Set Can be Determined

| Source charset set in API | Target charset set in API | Output charset | Description |
| --- | --- | --- | --- |
| KVCS_GB | KVCS_UTF8 | KVCS_UTF8 | Converts GB (Simplified Chinese) to UTF-8. The output character set is the target character set specified in the API. |
| KVCS_GB | -- | KVCS_GB | Converts BIG5 to GB (Simplified Chinese). The output character set is the source character set specified in the API. |
| -- | KVCS_UTF8 | KVCS_UTF8 | Converts BIG5 to UTF-8. The output character set is the target character set specified in the API. |
| -- | -- | KVCS_BIG5 | The output character set is the document character set. No conversion. |

Document Character Set Cannot be Determined

For the example in Document Character Set Cannot be Determined, the document is an ASCII file. The section Word Processing Formats indicates that the document character set cannot be obtained from this file type. The document’s source character set is KVCS_1251.

Document Character Set Cannot be Determined

| Source charset set in API | Target charset set in API | Output charset | Description |
| --- | --- | --- | --- |
| KVCS_1252 | KVCS_UTF8 | KVCS_UTF8 | Converts KVCS_1252 to KVCS_UTF8. The output character set is the target character set specified in the API. |
| KVCS_1252 | KVCS_UNKNOWN | KVCS_1252 | The output character set is the source character set specified in the API because KVCS_UNKNOWN cannot be used. No conversion. |
| KVCS_1252 | -- | KVCS_1252 | The output character set is the source character set specified in the API. No conversion. |
| -- | KVCS_1252 | KVCS_1252 | Converts the OS code page to KVCS_1252. The output character set is the target character set specified in the API. |
| -- | -- | OS code page | The output character set is the OS code page. No conversion. |
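The example rows in both tables are consistent with a simple order of precedence: a usable target character set wins, then a source character set specified in the API, then the document character set, and finally the OS code page. The following Java sketch models only that order. The names `OutputCharsetSketch` and `resolve` are hypothetical and are not part of the Export API, character sets are represented as plain strings, and `null` stands for "not set" or "cannot be determined". Inputs that the tables do not cover (for example, a target of KVCS_UNKNOWN with no source set) follow the same fallback by assumption.

```java
// Hypothetical model of the example tables; not part of the Export API.
class OutputCharsetSketch {
    static final String UNKNOWN = "KVCS_UNKNOWN";
    static final String OS_CODE_PAGE = "OS code page";

    /**
     * Returns the output character set suggested by the example rows.
     *
     * @param documentCharset character set read from the file, or null if it cannot be determined
     * @param source          source character set specified in the API, or null if not set
     * @param target          target character set specified in the API, or null if not set
     */
    static String resolve(String documentCharset, String source, String target) {
        // A target that is set and usable always decides the output.
        if (target != null && !target.equals(UNKNOWN)) {
            return target;
        }
        // Otherwise the source specified in the API is used as the output.
        if (source != null) {
            return source;
        }
        // With nothing specified, fall back to the document character set ...
        if (documentCharset != null) {
            return documentCharset;
        }
        // ... or to the OS code page when the document gives no information.
        return OS_CODE_PAGE;
    }

    public static void main(String[] args) {
        // RTF example: the document character set is KVCS_BIG5.
        System.out.println(resolve("KVCS_BIG5", "KVCS_GB", "KVCS_UTF8")); // KVCS_UTF8
        System.out.println(resolve("KVCS_BIG5", "KVCS_GB", null));        // KVCS_GB
        System.out.println(resolve("KVCS_BIG5", null, null));             // KVCS_BIG5

        // ASCII example: the document character set cannot be determined.
        System.out.println(resolve(null, "KVCS_1252", "KVCS_UNKNOWN"));   // KVCS_1252
        System.out.println(resolve(null, null, null));                    // OS code page
    }
}
```

The sketch reports only which character set the output ends up in. It does not model which character set the input is read as, which is what the descriptions in the tables add.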

Set the Character Set During Conversion

You can convert the character set of a file at the time the file is converted.

To specify the source character set, use the setSourceCharSet method of the OptionInfo object and set setForceSourceCharSet to TRUE.

To specify the target character set, use the setOutputCharSet method of the OptionInfo object and set setForceOutputCharSet to TRUE.
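To illustrate why the source character set matters when it cannot be read from the file, the following sketch performs the same kind of conversion with the standard `java.nio.charset` classes rather than the Export API. The names `TranscodeSketch` and `transcode` are hypothetical, and the sample text is an assumption chosen for illustration. Decoding Windows-1251 bytes as Windows-1252 still produces valid UTF-8, but with the wrong characters.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustration with standard Java charsets only; this is not the Export API.
class TranscodeSketch {
    /** Decodes the bytes using the source charset, then re-encodes them in the target charset. */
    static byte[] transcode(byte[] input, Charset source, Charset target) {
        return new String(input, source).getBytes(target);
    }

    public static void main(String[] args) {
        Charset cp1251 = Charset.forName("windows-1251");
        Charset cp1252 = Charset.forName("windows-1252");

        // Cyrillic sample text ("Privet"), written with Unicode escapes to keep the source ASCII-only.
        String original = "\u041f\u0440\u0438\u0432\u0435\u0442";
        byte[] document = original.getBytes(cp1251); // one byte per letter

        // Correct source charset: the text survives the conversion to UTF-8.
        String right = new String(transcode(document, cp1251, StandardCharsets.UTF_8),
                StandardCharsets.UTF_8);

        // Wrong source charset: the same bytes are read as Western European letters.
        String wrong = new String(transcode(document, cp1252, StandardCharsets.UTF_8),
                StandardCharsets.UTF_8);

        System.out.println(right.equals(original)); // true
        System.out.println(wrong.equals(original)); // false
    }
}
```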

Set the Character Set During File Extraction from a Container

You can convert the character set of a container subfile at the time the subfile is extracted from the container and before it is converted to HTML. This is most often used to set the character set of a mail message’s body text. See Use the File Extraction API.

To specify the source and target character sets of a subfile:

  1. Use the methods of the ExtSubFileExtractConfig object to set the source and target character sets.

  2. Call the extExtractSubFile method of the Export object and pass in the ExtSubFileExtractConfig object.

