FilterTest

The FilterTest program demonstrates most of the filtering methods available in the Java API. It filters an input document to an output document and accepts command-line options, which are listed in Options for FilterTest Sample Program.

To run FilterTest

  1. Add the location of the javaapi\KeyView.jar file, the javaapi\sample directory, and the Filter bin directory to the CLASSPATH environment variable.

  2. Type the following command line:

    java -Djava.library.path=bin_directory FilterTest [options] bin_directory input_file output_file

    where:

    bin_directory is the path to the Filter bin directory.

    options is one or more of the options listed in Options for FilterTest Sample Program.

    input_file is the path and file name of the source file.

    output_file is the path and file name of the generated file. If a path is not specified, the file is output to the current directory.
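
The command line above can also be assembled programmatically, for example to run FilterTest from a test harness. The following is a minimal sketch, not part of the Java API: the FilterTestCommand class and its build method are hypothetical helpers, and the paths are placeholders. It assumes that the CLASSPATH is already set as described in step 1.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of the Java API): assembles the FilterTest
// command line in the argument order shown above.
public class FilterTestCommand {

    public static List<String> build(String binDirectory, List<String> options,
                                     String inputFile, String outputFile) {
        List<String> command = new ArrayList<>();
        command.add("java");
        command.add("-Djava.library.path=" + binDirectory);
        command.add("FilterTest");
        command.addAll(options);      // for example -hf, -hftags
        command.add(binDirectory);    // the bin directory is passed again as an argument
        command.add(inputFile);
        command.add(outputFile);
        return command;
    }

    public static void main(String[] args) {
        // Placeholder values; substitute real paths before running.
        List<String> command = build("bin_directory", List.of("-hf", "-hftags"),
                                     "input_file", "output_file");
        System.out.println(String.join(" ", command));
    }
}
```

The resulting list can be passed to java.lang.ProcessBuilder to start FilterTest as a child process.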

 

Options for FilterTest Sample Program

Option Description
-is Sets the input as a stream. The default is file.
-os Sets the output as a stream. The default is file.
-chunk Filters an input source and returns one chunk of output data. The program calls the filter method repeatedly until the entire output buffer is processed.
-docformat filename

Extracts the file format information and writes it to a file.

filename is the name of the file to which the format information is written.

-summary filename

Extracts the metadata and writes it to a file.

filename is the name of the file to which the metadata is written. See Extract Metadata.

-getTargetCS Writes the character set used in the output file to standard output.
-c charset

Sets the character set of the output file. Use the option -getTargetCS to determine whether the target character set specified is used in the output file.

charset is a character set defined in the Filter class. See Coded Character Sets.

-cs charset

Sets the character set of the source file.

charset is a character set defined in the Filter class. See Coded Character Sets.

-rc character Sets a replacement character for characters that cannot be mapped. The default is a question mark (?).
-ip Runs Filter in the same process as the calling application (in process). See Run Filter In Process.
-ooplog Enables error logging. See Enable or Disable Error Logging. Error logs are not generated when in-process filtering is enabled.
-oopmem Enables the memory trace system in the error logs. The memory trace system reports memory leaks and memory overwrites in the log file. See Report Memory Errors. Error logs are not generated when in-process filtering is enabled.
-hf Extracts headers and footers, as well as the body text.
-hftags Puts tags around header and footer data.
-lo Specifies that PowerPoint PPT97 and PPTX file text data is output in a logical reading order.
-lsbmsb Uses LSBMSB byte order for Unicode text. LSBMSB stands for "Least Significant Byte, Most Significant Byte", which is the byte order for little-endian systems.
-msblsb Uses MSBLSB byte order for Unicode text. MSBLSB stands for "Most Significant Byte, Least Significant Byte", which is the byte order for big-endian systems.
-bomarker Generates the byte order marker for Unicode text.
-nodefcsconv Prevents default conversion of document character encoding. See Prevent the Default Conversion of a Character Set.
-x xmlconfigfile Filters an XML file using customized extraction settings defined in the kvxconfig.ini file. If you do not enter the full path to the INI file, the program looks for the file in the current working directory. See Filter XML Files.
-z tempdirectory

Specifies a temporary directory where temporary files generated by the filtering process are stored. The default is the current working directory.

On Windows systems, there is a 64 K size limit to the temp directory. Once the limit is reached, you must either create a new directory or delete the contents of the existing directory; otherwise, you might receive an error message.

-ps password Specifies a password to open a password-protected PST file. This option uses the Container API, which is obsolete.
-pdflorder orderFlag

Specifies that PDF files are output in a logical reading order. The parameter orderFlag is one of the following:

  • ltr—left-to-right paragraph direction.
  • rtl—right-to-left paragraph direction.
  • auto—The PDF filter determines the paragraph direction (left-to-right or right-to-left) for each PDF page, and then sets the direction accordingly.
  • raw—Unstructured paragraph flow.

See Filter PDF Files.

-rm Extracts text that was deleted from a document with revision tracking enabled and includes it in the filtered output. See Extract Tracked Deleted Text.
-embeddedfont If you set this option, text that contains embedded fonts is not filtered from PDF documents. See Filter PDF Files.
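
The byte orders named by the -lsbmsb, -msblsb, and -bomarker options can be illustrated with the standard Java charsets. The following is a minimal sketch that uses only the Java standard library. It does not call Filter and does not show FilterTest output. The ByteOrderDemo class is a hypothetical name used for illustration, and the assumption is that Unicode output here means 16-bit code units.

```java
import java.nio.charset.StandardCharsets;

// Illustration only: shows how one character is laid out in each byte order.
public class ByteOrderDemo {

    // Formats bytes as space-separated two-digit hexadecimal values.
    public static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "A"; // U+0041

        // Least significant byte first (little endian).
        System.out.println("LSBMSB:   " + hex(text.getBytes(StandardCharsets.UTF_16LE))); // 41 00

        // Most significant byte first (big endian).
        System.out.println("MSBLSB:   " + hex(text.getBytes(StandardCharsets.UTF_16BE))); // 00 41

        // Java's UTF_16 encoder writes a big-endian byte order marker (FE FF) first.
        System.out.println("With BOM: " + hex(text.getBytes(StandardCharsets.UTF_16)));   // FE FF 00 41
    }
}
```

A reader of the output file needs either the byte order marker or prior knowledge of the byte order to decode the text correctly.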
