Recreate a File’s Hierarchy

When you extract a container file, any relationships between the subfiles in the container are not maintained. However, the File Extraction interface provides information that enables you to recreate the hierarchy. You can use the hierarchy to create a directory structure in a file system, or to categorize documents according to their relationship to each other. For example, if you use KeyView to generate text for a search engine, the hierarchical information enables your users to search for a document based on the document’s parent or sibling. In addition, when the document is returned to the user, the parent and sibling documents can be returned as recommendations.

The information needed to recreate a file’s hierarchy is provided in the call to fpGetSubFileInfo(). The members KVSubFileInfo->parentIndex and KVSubFileInfo->childArray provide information about a subfile’s parent and children. Because each call returns only the first-level children of a subfile, you must call fpGetSubFileInfo() for each child in turn, and repeat this until you have extracted information for the leaf-node children.
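The repeated calls described above amount to a depth-first walk of the hierarchy. The following is a minimal sketch of that walk, not KeyView code: DemoSubFileInfo, its childCount field, demo_get_info(), and demo_walk() are hypothetical names, the table lookup stands in for a real call to fpGetSubFileInfo(), and a parentIndex of -1 is assumed to mean "no parent".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the hierarchy members of KVSubFileInfo. */
typedef struct {
    int parentIndex;        /* index of the parent; -1 is ASSUMED to mean "no parent" */
    int childCount;         /* number of entries in childArray (assumed field) */
    const int *childArray;  /* indexes of the first-level children only */
} DemoSubFileInfo;

/* Stands in for one call to fpGetSubFileInfo(): looks up a single subfile. */
static const DemoSubFileInfo *demo_get_info(const DemoSubFileInfo *table,
                                            int count, int index)
{
    return (index >= 0 && index < count) ? &table[index] : NULL;
}

/* Depth-first walk starting at `index`.
 * Records the depth of every visited subfile in depths[] and returns
 * the number of subfiles visited. */
int demo_walk(const DemoSubFileInfo *table, int count,
              int index, int depth, int *depths)
{
    assert(table != NULL && depths != NULL);

    const DemoSubFileInfo *info = demo_get_info(table, count, index);
    if (info == NULL || depth >= count) {
        return 0;  /* invalid index, or deeper than any valid tree (cycle guard) */
    }

    depths[index] = depth;
    int visited = 1;

    /* Each lookup exposes only first-level children, so recurse into each one. */
    for (int i = 0; i < info->childCount; i++) {
        visited += demo_walk(table, count, info->childArray[i], depth + 1, depths);
    }
    return visited;
}
```

Calling demo_walk(table, count, 0, 0, depths) starts the walk at index 0 and stops descending when it reaches subfiles that have no children. In a real application, the lookup would be replaced by the actual extraction call and its error handling.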

Create a Root Node

Because of their structure, some container files do not contain a subfile or folder that acts as a root directory on which the hierarchy can be based. For example, you can extract the subfiles in a Zip archive, but none of them represents the root of the hierarchy. In this case, you must create an artificial root node at the top of the file hierarchy to serve as a point of reference for each child, so that you can recreate the relationships. This artificial root node is an internal object, and is extracted to disk as a directory called root. Its index number is 0.

To create the root node, set openFlag to KVOpenFileFlag_CreateRootNode in the call to fpOpenFile(). When you create a root node, the value of numSubFiles in KVMainFileInfo includes the root node. For example, when you call fpGetMainFileInfo() on a Microsoft Word document with three embedded OLE objects and the root node is disabled, numSubFiles is 3. If you create a root node, numSubFiles is 4.
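The effect of the root node on the count can be sketched as simple arithmetic. The helpers below are hypothetical and do not call KeyView. They assume, based on the description above, that the root node occupies index 0 and that the remaining subfiles follow it.

```c
#include <assert.h>

/* Number of subfiles that come from the container itself, given the
 * numSubFiles value and whether an artificial root node was created. */
int demo_real_subfile_count(int numSubFiles, int rootNodeCreated)
{
    assert(numSubFiles >= 0);
    /* The root node is included in numSubFiles, so subtract it. */
    return (rootNodeCreated && numSubFiles > 0) ? numSubFiles - 1 : numSubFiles;
}

/* First index that refers to a real subfile (ASSUMES the root node,
 * when present, is always index 0). */
int demo_first_real_index(int rootNodeCreated)
{
    return rootNodeCreated ? 1 : 0;
}
```

For the Word document above, demo_real_subfile_count(4, 1) and demo_real_subfile_count(3, 0) both give 3, the number of embedded OLE objects.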

Recreate a File’s Hierarchy—Example

For example, you might extract a PST file that contains seven subfiles, with the root node enabled. The call to fpGetMainFileInfo() returns the number of subfiles as eight (seven subfiles and one root node). The following diagram shows the structure and the available hierarchy information after the subfiles are extracted:

The parentIndex specifies the index number of a subfile’s parent. The childArray specifies an array of a subfile’s children. With this information, you can recreate the hierarchy shown in the following diagram.
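One way to turn parentIndex values into a directory structure is to follow the parent links from a subfile up to the top and join the names in reverse order. The sketch below illustrates this under stated assumptions: DemoNode and demo_build_path() are hypothetical, the names would come from the extracted subfile information in practice, and a parentIndex of -1 is assumed to mark the topmost node.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define DEMO_MAX_DEPTH 64  /* arbitrary limit chosen for this sketch */

/* Hypothetical record holding what the sketch needs for each subfile. */
typedef struct {
    const char *name;   /* subfile or folder name */
    int parentIndex;    /* index of the parent; -1 is ASSUMED for the top node */
} DemoNode;

/* Builds the path of nodes[index] (for example "root/Inbox/mail.msg") in buf.
 * Returns 0 on success, or -1 on a bad index, a cycle, or a buffer too small. */
int demo_build_path(const DemoNode *nodes, int count, int index,
                    char *buf, size_t size)
{
    assert(nodes != NULL && buf != NULL);
    if (index < 0 || index >= count || size == 0) {
        return -1;
    }

    /* Walk up from the subfile to the top node, recording each index. */
    int chain[DEMO_MAX_DEPTH];
    int depth = 0;
    for (int i = index; i != -1; i = nodes[i].parentIndex) {
        if (i < 0 || i >= count || depth >= DEMO_MAX_DEPTH || depth >= count) {
            return -1;  /* invalid parent link, or a chain longer than the table */
        }
        chain[depth++] = i;
    }

    /* Join the names from the top node down to the subfile. */
    size_t used = 0;
    buf[0] = '\0';
    for (int d = depth - 1; d >= 0; d--) {
        int n = snprintf(buf + used, size - used, "%s%s",
                         used ? "/" : "", nodes[chain[d]].name);
        if (n < 0 || (size_t)n >= size - used) {
            return -1;  /* output would be truncated */
        }
        used += (size_t)n;
    }
    return 0;
}
```

The resulting string can be used as a relative path when writing the extracted subfile to disk, or stored as metadata so that a search application can find a document by its parent. The childArray information supports the opposite direction, from a parent down to its children.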