filter_document

Filtering is the extraction of text from a document. This sample program demonstrates filtering by using the filter API.

The program takes two positional arguments: the input file to filter, and the path of the output file that receives the extracted text.

By default, the output is encoded in UTF-8.

$ ./filter_document input_file output.txt

CAUTION:

Not all document formats can be filtered. For example, trying to filter a PNG file produces an error message. For some file formats (notably emails), KeyView treats the text as an embedded subfile, which you should access by using the extraction API rather than the filter API.
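Because some inputs cannot be filtered, a script that runs the sample over many files should expect some of them to fail. The following is a minimal POSIX shell sketch, not part of the KeyView distribution. It assumes that filter_document exits with a non-zero status when filtering fails (this page only says that an error message is produced). The function name `filter_all` and the `FILTER_DOCUMENT` variable are hypothetical names introduced for illustration.

```sh
# Hypothetical: path to the sample binary; override it before calling filter_all.
FILTER_DOCUMENT=${FILTER_DOCUMENT:-./filter_document}

# filter_all OUT_DIR FILE...
# Runs the sample once per input file, writing OUT_DIR/<name>.txt for each.
# Returns 0 only if every file was filtered.
# Assumption: filter_document signals failure with a non-zero exit status.
filter_all() {
    out_dir=$1
    shift
    mkdir -p "$out_dir" || return 1
    failed=0
    for input in "$@"; do
        output="$out_dir/$(basename "$input").txt"
        # Two positional arguments: the input document, then the output text file.
        if "$FILTER_DOCUMENT" "$input" "$output"; then
            echo "filtered: $input -> $output"
        else
            # For example, a PNG image, or an email whose text must be
            # retrieved through the extraction API instead.
            echo "could not filter: $input" >&2
            failed=$((failed + 1))
        fi
    done
    [ "$failed" -eq 0 ]
}
```

For example, `filter_all out report.doc photo.png` would write `out/report.doc.txt` and report `photo.png` as unfilterable, returning a non-zero status so that a calling script can notice the failure.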
