When a container file is extracted, any relationships between the subfiles in the container are not maintained. However, the File Extraction interface provides information that enables you to recreate the hierarchy. The hierarchy can be used to create a directory structure in a file system, or to categorize documents according to their relationship to each other. For example, if you use KeyView to generate text for a search engine, the hierarchical information enables your users to search for a document based on the document’s parent or sibling. In addition, when the document is returned to the user, the parent and sibling documents can be returned as recommendations.
The information needed to recreate a file’s hierarchy is provided in the call to
. Call this method to retrieve an object of the
class, then use the
methods in this object to retrieve information about the subfile’s parent and children. Since you can only retrieve the first-level children in a subfile, you must call
repeatedly until information for the leaf-node children is extracted.
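The repeated retrieval described above amounts to a level-by-level walk of the container. The following is a minimal sketch of that walk, not KeyView code: `SubfileWalk`, `walk`, and the `childrenOf` callback are hypothetical names, and `childrenOf` stands in for whatever call returns the first-level children of one subfile. It assumes the subfiles form a tree with no cycles.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.IntFunction;

public class SubfileWalk {
    /**
     * Visits every subfile reachable from startIndex, breadth first.
     * childrenOf is a hypothetical stand-in for the per-subfile information
     * call: given a subfile index, it returns only that subfile's
     * first-level children.
     */
    public static List<Integer> walk(int startIndex, IntFunction<int[]> childrenOf) {
        List<Integer> visited = new ArrayList<>();
        Deque<Integer> pending = new ArrayDeque<>();
        pending.add(startIndex);
        while (!pending.isEmpty()) {
            int index = pending.poll();
            visited.add(index);
            // Queue the first-level children; their own children are
            // requested when they are taken off the queue in turn.
            for (int child : childrenOf.apply(index)) {
                pending.add(child);
            }
        }
        return visited;
    }
}
```

The walk ends when the queue is empty, which happens once every leaf node has reported an empty child list.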
Because of their structure, some container files do not contain a subfile or folder that acts as a root directory on which the hierarchy can be based. For example, subfiles in a Zip archive can be extracted, but none of the subfiles represents the root of the hierarchy. In this case, an artificial root node must be created at the top of the file hierarchy as a point of reference for each child, and ultimately to recreate the relationships. This artificial root node is an internal object, and is extracted to disk as a directory called
root. Its index number is 0.
To create a root node, call the
method in the
object, and pass
ExtOpenDocConfig to the
method. When a root node is created, the value returned from the
method in the
object includes the root node. For example, when you call
on a Microsoft Word document with three embedded OLE objects and the root node is disabled, the number of subfiles is 3. If you create a root node, the number of subfiles is 4.
For example, you might extract a PST file that contains seven subfiles with a root node enabled. The call to
extGetMainFileInfo() returns the number of subfiles as 8 (seven subfiles and one root node). Extracted PST File shows the structure and the available hierarchy information after the subfiles are extracted:
Extracted PST File
parentIndex specifies the index number of a subfile’s parent. The
childArray specifies an array of a subfile’s children. With this information, you can recreate the hierarchy shown in Recreated File Hierarchy:
Recreated File Hierarchy
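As an illustration of how a parent index alone is enough to rebuild a directory-style path, here is a minimal sketch. It does not use the KeyView API: `HierarchySketch`, `pathOf`, and `childrenOf` are hypothetical names, the subfile names and parent indexes are assumed to have been copied into plain arrays beforehand, and a parent index of -1 for the root node is an assumption of this sketch rather than a documented value.

```java
import java.util.ArrayList;
import java.util.List;

public class HierarchySketch {
    /**
     * Builds a slash-separated path for one subfile by following parent
     * indexes up to the root. names[i] and parentIndex[i] describe the
     * subfile with index i; -1 marks "no parent" (an assumption).
     */
    public static String pathOf(int index, String[] names, int[] parentIndex) {
        StringBuilder path = new StringBuilder(names[index]);
        int parent = parentIndex[index];
        while (parent >= 0) {
            path.insert(0, names[parent] + "/");
            parent = parentIndex[parent];
        }
        return path.toString();
    }

    /** Derives the list of first-level children of one node from the parent indexes. */
    public static List<Integer> childrenOf(int index, int[] parentIndex) {
        List<Integer> children = new ArrayList<>();
        for (int i = 0; i < parentIndex.length; i++) {
            if (parentIndex[i] == index) {
                children.add(i);
            }
        }
        return children;
    }
}
```

With the root node at index 0, every path starts with the `root` directory, so the resulting strings can be used directly as relative paths when writing the extracted subfiles to disk.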