The Filter SDK enables you to write custom readers for formats not directly supported by KeyView. A reader is required to parse the file format and generate a KeyView token stream, which represents the content and format of the document. Filter can then use this token stream to generate a text version of the original document. The readers interact with a structured access layer and a writer to generate a text file in Filter, an HTML file in HTML Export, an XML file in XML Export, and a near-to-original view of the document in the Viewing SDK.
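The reader-to-writer flow described above can be illustrated with a toy sketch. Everything in it is hypothetical: the `Token` type, the `TOK_*` kinds, and the `toy_reader` and `toy_writer` functions are invented for illustration and are not part of the KeyView API. It assumes a trivial source format in which each line is a paragraph.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical token kinds; the real KeyView token set is not shown here. */
typedef enum { TOK_TEXT, TOK_PARA_BREAK, TOK_END } TokenKind;

typedef struct {
    TokenKind   kind;
    const char *text;  /* points into the source buffer (TOK_TEXT only) */
    size_t      len;   /* number of bytes of text (TOK_TEXT only) */
} Token;

/* Toy "reader": emits a TOK_TEXT for each non-empty line and a
   TOK_PARA_BREAK for each newline, then a closing TOK_END.
   Returns the number of tokens written, including TOK_END. */
size_t toy_reader(const char *src, Token *out, size_t max_tokens)
{
    size_t n = 0;
    const char *p = src;

    if (max_tokens == 0) return 0;
    while (*p != '\0' && n + 1 < max_tokens) {
        const char *eol = strchr(p, '\n');
        size_t len = eol ? (size_t)(eol - p) : strlen(p);
        if (len > 0) {
            out[n++] = (Token){ TOK_TEXT, p, len };
        }
        p += len;
        if (*p == '\n') {
            if (n + 1 < max_tokens) {
                out[n++] = (Token){ TOK_PARA_BREAK, NULL, 0 };
            }
            p++;
        }
    }
    out[n++] = (Token){ TOK_END, NULL, 0 };  /* one slot is always reserved */
    return n;
}

/* Toy "writer": renders the token stream as plain text.
   Returns the number of bytes written, excluding the terminator. */
size_t toy_writer(const Token *toks, char *dst, size_t cap)
{
    size_t used = 0;

    if (cap == 0) return 0;
    for (; toks->kind != TOK_END; toks++) {
        const char *s = (toks->kind == TOK_TEXT) ? toks->text : "\n";
        size_t len = (toks->kind == TOK_TEXT) ? toks->len : 1;
        if (used + len >= cap) break;  /* leave room for the terminator */
        memcpy(dst + used, s, len);
        used += len;
    }
    dst[used] = '\0';
    return used;
}
```

Separating the two steps means the same token stream could, in principle, feed a different writer (for example one that emits HTML) without changing the reader.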
The complexity of a custom reader depends on the file format used by the source document type. A simple reader extracts only the textual content and ignores formatting and all other non-textual content. Readers of increasing complexity must address one or more of the following:
Even a simple reader might have to parse the following components of a document:
You must fully understand the specification for the file format used by the document. The specification determines how to parse the source file and generate a token stream that accurately and effectively represents the original document.
Within Filter, the custom reader must interact with a structured access layer and the format detection API, which in turn interacts with the top-level API. For a description of the Filter architecture, see
The custom reader must have a module definition file (*.def) that defines the exported API function calls. In addition, the formats.ini file must be modified to identify the custom reader and its associated format detection function.
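As a rough illustration of what a format detection function might check, the sketch below tests whether a buffer begins with the UTF-8 byte order mark (bytes EF BB BF). The function name `has_utf8_bom` and its signature are hypothetical and do not reflect the actual KeyView format detection interface. The byte order mark is optional in UTF-8 files, so a real detection routine would need further checks.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical detection helper (not a KeyView function):
   returns 1 when the buffer starts with the UTF-8 byte order mark
   (EF BB BF), and 0 otherwise. */
int has_utf8_bom(const unsigned char *buf, size_t len)
{
    static const unsigned char bom[3] = { 0xEF, 0xBB, 0xBF };
    return len >= sizeof bom && memcmp(buf, bom, sizeof bom) == 0;
}
```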
See the source code for the sample custom reader (utf8sr), which parses plain text files encoded in UTF-8. The source code is in the directory
install is the path name of the Filter installation directory.
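To give a sense of the parsing work a UTF-8 plain text reader involves, here is a minimal decoding sketch. It is not taken from the utf8sr source; the function names `utf8_decode` and `utf8_count` are hypothetical. It decodes one code point at a time and rejects truncated sequences, overlong encodings, surrogates, and values above U+10FFFF.

```c
#include <stddef.h>
#include <stdint.h>

/* Decodes one UTF-8 sequence from s[0..len). On success, stores the
   code point in *cp and returns the number of bytes consumed (1 to 4).
   Returns 0 if the sequence is malformed or truncated. */
size_t utf8_decode(const unsigned char *s, size_t len, uint32_t *cp)
{
    size_t need;
    uint32_t v, min;

    if (len == 0) return 0;
    if (s[0] < 0x80) { *cp = s[0]; return 1; }
    else if ((s[0] & 0xE0) == 0xC0) { need = 2; v = s[0] & 0x1F; min = 0x80; }
    else if ((s[0] & 0xF0) == 0xE0) { need = 3; v = s[0] & 0x0F; min = 0x800; }
    else if ((s[0] & 0xF8) == 0xF0) { need = 4; v = s[0] & 0x07; min = 0x10000; }
    else return 0;                      /* stray continuation or invalid lead byte */

    if (len < need) return 0;           /* truncated sequence */
    for (size_t i = 1; i < need; i++) {
        if ((s[i] & 0xC0) != 0x80) return 0;   /* not a continuation byte */
        v = (v << 6) | (uint32_t)(s[i] & 0x3F);
    }
    /* Reject overlong forms, surrogates, and out-of-range values. */
    if (v < min || v > 0x10FFFF || (v >= 0xD800 && v <= 0xDFFF)) return 0;
    *cp = v;
    return need;
}

/* Counts the code points in a buffer; returns -1 if it is not valid UTF-8. */
long utf8_count(const unsigned char *s, size_t len)
{
    long count = 0;
    size_t i = 0;

    while (i < len) {
        uint32_t cp;
        size_t used = utf8_decode(s + i, len - i, &cp);
        if (used == 0) return -1;
        i += used;
        count++;
    }
    return count;
}
```

A reader built on a loop like `utf8_count` could emit text tokens for runs of valid code points and decide separately how to handle invalid bytes.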