To filter a file, you must first determine whether the file contains any subfiles (attachments, embedded OLE objects, and so on). A file that contains subfiles is called a container file. A container file has a main file (parent) and subfiles (children) embedded in the main file.
The following are examples of container files:
Archive files such as ZIP, TAR, and RAR.
Mail messages such as Outlook (MSG) and Outlook Express (EML).
Mail stores such as Microsoft Outlook Personal Folders (PST), Mailbox (MBX), and Lotus Notes database (NSF).
PDF files that contain file attachments.
Compound documents with embedded OLE objects such as a Microsoft Word document with an embedded Excel chart.
NOTE: Supported Formats indicates which formats are treated as container files and are supported by the File Extraction API.
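As a rough illustration of the "is this a container?" check, the following Python sketch tests for two of the archive formats listed above using only the Python standard library. It is not the File Extraction API; the function name is_archive_container is hypothetical, and the check covers ZIP and TAR only (not RAR, mail formats, PDF attachments, or OLE objects).

```python
import tarfile
import zipfile


def is_archive_container(path: str) -> bool:
    """Return True if the existing file at `path` is a ZIP or TAR archive.

    Hypothetical helper for illustration: it inspects the file's
    structure rather than trusting the file extension.
    """
    return zipfile.is_zipfile(path) or tarfile.is_tarfile(path)
```

A file that passes this check would have subfiles to extract before filtering; a file that fails it might still be a container of another kind.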
The subfiles might also be container files, creating a file hierarchy of multiple levels. For example, an MSG file (the root parent) might contain three attachments:
a Microsoft Word document that contains an embedded Microsoft Excel spreadsheet.
an AutoCAD drawing file (DWG).
an EML file with an attached ZIP file, which in turn contains four archived files.
NOTE: The parent MSG file contains four first-level children: the three attachments plus the body text of the message. Although the body text is not a standalone file in the container, it is considered a child of the parent file.
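The hierarchy in this example can be modeled as a simple tree. The Python sketch below is illustrative only: the Node class, the helper functions, and the placeholder names for the four archived files are assumptions, not part of the File Extraction API. The nested EML file's own body text is left out to keep the tree limited to what the example lists.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """One file (or message body) in a container hierarchy."""
    name: str
    children: list[Node] = field(default_factory=list)


def count_descendants(node: Node) -> int:
    """Count every subfile below `node`, at any level."""
    return sum(1 + count_descendants(child) for child in node.children)


def max_depth(node: Node) -> int:
    """Return the number of levels below `node` (0 for a non-container)."""
    if not node.children:
        return 0
    return 1 + max(max_depth(child) for child in node.children)


# The MSG example: the body text plus three attachments.
# The archived file names are placeholders; the source does not name them.
root = Node("message.msg", [
    Node("body text"),
    Node("Word document", [Node("embedded Excel spreadsheet")]),
    Node("AutoCAD drawing (DWG)"),
    Node("EML file", [
        Node("ZIP file", [Node(f"archived file {i}") for i in range(1, 5)]),
    ]),
])

print(len(root.children))       # first-level children
print(count_descendants(root))  # all subfiles at every level
print(max_depth(root))          # levels below the root parent
```

In this model the root has four first-level children, ten descendants in total, and three levels below it (EML file, ZIP file, archived files).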