Recreate a File Hierarchy

When a container file is extracted, any relationships between the subfiles in the container are not maintained. However, the File Extraction interface provides information that enables you to recreate the hierarchy. The hierarchy can be used to create a directory structure in a file system, or to categorize documents according to their relationship to each other. For example, if you use KeyView to generate text for a search engine, the hierarchical information enables your users to search for a document based on the document’s parent or sibling. In addition, when the document is returned to the user, the parent and sibling documents can be returned as recommendations.

The information needed to recreate a file’s hierarchy is provided by the extGetSubFileInfo method. Call this method to retrieve an object of the ExtSubFileInfo class, then use the getParentIndex() and getChildArray() methods of this object to retrieve information about the subfile’s parent and children. Because each call returns only the first-level children of a subfile, you must call extGetSubFileInfo for each child in turn until you reach the leaf-node children.
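The repeated calls described above amount to a depth-first walk of the container. The following is a minimal sketch of that walk and does not call KeyView itself: the SubFile class and the SubFileSource interface are hypothetical stand-ins for an ExtSubFileInfo object and a call to extGetSubFileInfo, and the six-node sample data is invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class HierarchyWalk {
    /** Hypothetical stand-in for the values read from an ExtSubFileInfo object. */
    static final class SubFile {
        final String name;
        final int[] childArray; // indexes of the first-level children only

        SubFile(String name, int... childArray) {
            this.name = name;
            this.childArray = childArray;
        }
    }

    /** Hypothetical stand-in for a call to extGetSubFileInfo for one index. */
    interface SubFileSource {
        SubFile get(int index);
    }

    /** Records a subfile, then repeats the lookup for each child until leaf nodes are reached. */
    static void walk(SubFileSource source, int index, int depth, List<String> out) {
        SubFile info = source.get(index);
        out.add("  ".repeat(depth) + info.name);
        for (int child : info.childArray) {
            walk(source, child, depth + 1, out);
        }
    }

    /** Walks an invented six-node container, starting at the root node (index 0). */
    static List<String> demo() {
        Map<Integer, SubFile> sample = Map.of(
            0, new SubFile("root", 1, 2),
            1, new SubFile("Inbox", 3, 4),
            2, new SubFile("Sent"),
            3, new SubFile("msg1.msg", 5),
            4, new SubFile("msg2.msg"),
            5, new SubFile("report.doc"));
        List<String> out = new ArrayList<>();
        walk(i -> sample.get(i), 0, 0, out);
        return out;
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

In a real application, the lookup passed to walk would wrap the extraction interface, and each visit could also extract the subfile to disk before descending into its children.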

Create a Root Node

Because of their structure, some container files do not contain a subfile or folder that acts as a root directory on which the hierarchy can be based. For example, the subfiles in a Zip archive can be extracted, but none of them represents the root of the hierarchy. In this case, an artificial root node must be created at the top of the file hierarchy as a point of reference for each child, so that the relationships can be recreated. This artificial root node is an internal object, and is extracted to disk as a directory called root. Its index number is 0.

To create a root node, call the setCreateNode method in the ExtOpenDocConfig object, and pass the ExtOpenDocConfig object to the extOpenDocument method. When a root node is created, the value returned from the getNumSubFiles method in the ExtMainFileInfo object includes the root node. For example, when you call extGetMainFileInfo on a Microsoft Word document that has three embedded OLE objects and the root node is disabled, the number of subfiles is 3. If you create a root node, the number of subfiles is 4.


For example, you might extract a PST file that contains seven subfiles with a root node enabled. The call to extGetMainFileInfo() returns the number of subfiles as 8 (seven subfiles and one root node). Extracted PST File shows the structure and the available hierarchy information after the subfiles are extracted:

Extracted PST File

The parentIndex specifies the index number of a subfile’s parent. The childArray specifies an array of a subfile’s children. With this information, you can recreate the hierarchy shown in Recreated File Hierarchy:

Recreated File Hierarchy
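As an illustration of how parentIndex values alone are enough to rebuild a directory-style path, the following minimal sketch follows the parent links from a subfile up to the root node. It does not call KeyView: the names and parent indexes are invented sample data rather than the contents of the PST file above, and the use of -1 as the root node’s parent index is an assumption of this sketch.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class HierarchyPaths {
    // Invented sample data: element i holds the name and parentIndex of subfile i.
    static final String[] NAMES = {"root", "Inbox", "Sent", "msg1.msg", "msg2.msg", "report.doc"};
    // Assumption: the root node (index 0) reports -1 because it has no parent.
    static final int[] PARENT = {-1, 0, 0, 1, 1, 3};

    /** Builds the path of a subfile by following parent indexes up to the root node. */
    static String pathOf(int index, String[] names, int[] parent) {
        Deque<String> parts = new ArrayDeque<>();
        for (int i = index; i >= 0; i = parent[i]) {
            parts.addFirst(names[i]); // prepend so that the root ends up first
        }
        return String.join("/", parts);
    }

    public static void main(String[] args) {
        for (int i = 0; i < NAMES.length; i++) {
            System.out.println(i + " -> " + pathOf(i, NAMES, PARENT));
        }
    }
}
```

The same loop can drive directory creation in a file system, or supply the parent and sibling relationships that a search engine stores alongside each document.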