Extract Mail Metadata

You can extract metadata such as subject, sender, and recipient from MSG, EML, MBX, PST, and NSF files by calling the extGetSubFileMetadata() method. You can extract a predefined set of metadata fields, individual fields that are specific to a file format, or both.

Default Metadata Set

KeyView internally defines a set of common mail metadata fields that can be extracted as a group from mail formats. This default metadata set is listed in Default Mail Metadata List. When you retrieve all metadata for a file (that is, you pass NULL for the array of metadata), KeyView returns the complete default metadata set, not all of the metadata available in the file.

Default Mail Metadata List

| Field Name (string to specify) | Description |
| --- | --- |
| From | The display name and email address of the sender. |
| Sent | The time the message was sent. |
| To | The display names and email addresses of the recipients. |
| Cc | The display names and email addresses of recipients who receive copies of the email. |
| Bcc | The display names and email addresses of recipients who received blind copies of the email. |
| Subject | The text in the subject line of the message. |
| Priority | The priority applied to the message. |

Because mail formats use different terms for the same fields, the format’s reader maps the default field name to the appropriate format-specific name. For example, when retrieving the default metadata set, the NSF field Importance is mapped to the name Priority and is returned.

You can also extract the default fields individually by passing the field name (such as From, To, or Subject); however, in this case, the string is not mapped to the format-specific name. For example, if you pass Priority in the call, you retrieve the contents of the Priority field from an MBX file, but you do not retrieve the contents of the Importance field from an NSF file.
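The difference between the two lookup paths can be modeled in a few lines of Java. This is a conceptual sketch only, not KeyView code: the DefaultFieldNames class, its methods, and its lookup table are hypothetical, and the table holds only the one NSF mapping (Priority to Importance) that the text mentions.

```java
import java.util.Map;

/** Conceptual model only; this is not part of the KeyView API. */
final class DefaultFieldNames {
    // Hypothetical table: format -> (default field name -> format-specific name).
    // Only the NSF Priority/Importance pair is taken from the documentation.
    private static final Map<String, Map<String, String>> FORMAT_SPECIFIC =
            Map.of("NSF", Map.of("Priority", "Importance"));

    private DefaultFieldNames() {}

    /** Field read when the whole default set is requested: the name is mapped. */
    static String resolveForDefaultSet(String format, String defaultName) {
        Map<String, String> names = FORMAT_SPECIFIC.getOrDefault(format, Map.of());
        return names.getOrDefault(defaultName, defaultName);
    }

    /** Field read when a name is passed individually: the name is used as given. */
    static String resolveForIndividualRequest(String format, String requestedName) {
        return requestedName;
    }

    public static void main(String[] args) {
        System.out.println(resolveForDefaultSet("NSF", "Priority"));
        System.out.println(resolveForIndividualRequest("NSF", "Priority"));
    }
}
```

In this model, requesting the default set from an NSF file reads Importance, while passing "Priority" individually looks for a field literally named Priority.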

NOTE: You cannot pass the field names listed in Default Mail Metadata List individually for PST files. However, you can pass either the MAPI tag number or one of the constants in the Export class as integers. See Microsoft Personal Folders File (PST) Metadata.

Extract the Default Metadata Set

To extract the default metadata set, call the extGetSubFileMetadata(long docContextID, int nSubFileIndex, ExtSubFileMetaConfig config) method.

For example:

// Calling the overload without a field-name array returns the default metadata set.
ExtSubFileMetaConfig metaConfig = new ExtSubFileMetaConfig();
ExtSubFileMetadata subfilemeta = m_objExport.extGetSubFileMetadata(extContextID, index, metaConfig);

Microsoft Outlook (MSG) Metadata

In addition to the default metadata set, you can extract the metadata fields listed in MSG-Specific Metadata List from MSG files. Pass the field name to metaNameArray in the call to the extGetSubFileMetadata() method.

MSG-Specific Metadata List

| Field Name (string to specify) | Description |
| --- | --- |
| AttachFileName | An attachment's long file name and extension, excluding path. |
| ConversationTopic | The topic of the first message in a conversation thread. A conversation thread is a series of messages and replies. This is the first message’s subject with any prefix removed. |
| CreationTime | The time the message or attachment was created. This value is displayed in the Sent field in the message’s Properties dialog in Outlook. |
| InternetMessageID | The identifier for messages that come in over the Internet. This is the MAPI property PR_INTERNET_MESSAGE_ID. This property is not in the MAPI headers or MAPI documentation. |
| LastModificationTime | The time the message or attachment was last modified. This value is displayed in the Modified field in the message’s Properties dialog in Outlook. |
| Location | The physical location of the event specified in the Outlook calendar entry. |
| MessageID | The message transfer system (MTS) identifier for the message transfer agent (MTA). This value is displayed on the Message ID tab in the message’s Properties dialog in Outlook. |
| Received | The date and time a message was delivered. This value is displayed in the Received field in the message’s Properties dialog in Outlook. |
| Sender | The name and email address of the message sender. This value is a concatenation of two MAPI properties in the following format: "PR_SENDER_NAME" <PR_SENDER_EMAIL_ADDRESS>. The Sender value might be the same as or different than the default metadata From value (see Default Metadata Set), depending on which MAPI properties exist in the MSG file. |
| Sensitivity | The value indicating the message sender's opinion of the sensitivity of a message, such as Personal, Private, or Confidential. This value is displayed in the Sensitivity field in the message’s Properties dialog in Outlook. |
| TransportMsgHeaders | Contains transport-specific message envelope information. This value corresponds to the MAPI property PR_TRANSPORT_MESSAGE_HEADERS. |
| StartDate | Contains an appointment start date. This value corresponds to the PR_START_DATE MAPI property. |
| EndDate | Contains an appointment end date. This value corresponds to the PR_END_DATE MAPI property. |

Extract MSG-Specific Metadata

To extract specific metadata fields from an MSG file, use the method extGetSubFileMetadata(long docContextID, int nSubFileIndex, java.lang.String[] metaNameArray, ExtSubFileMetaConfig config) and pass the field name defined in MSG-Specific Metadata List to metaNameArray (the string is not case sensitive).

For example, the following code extracts the contents of the ConversationTopic and MessageID fields:

ExtSubFileMetaConfig metaConfig = new ExtSubFileMetaConfig();
// Field names are not case sensitive, so "conversationtopic" matches ConversationTopic.
String[] metaNameArray = {"conversationtopic", "MessageID"};
ExtSubFileMetadata subfilemeta = m_objExport.extGetSubFileMetadata(extContextID, index, metaNameArray, metaConfig);
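Because the names are matched without regard to case, an application might want to normalize user-supplied names before building metaNameArray. The following is a minimal sketch of that idea, not KeyView code: the MsgFieldNames class and its canonical() method are hypothetical, and the list of names is copied from MSG-Specific Metadata List.

```java
import java.util.List;
import java.util.Optional;

/** Illustrative helper; not part of the KeyView API. */
final class MsgFieldNames {
    // Names as they appear in MSG-Specific Metadata List.
    private static final List<String> KNOWN = List.of(
            "AttachFileName", "ConversationTopic", "CreationTime", "InternetMessageID",
            "LastModificationTime", "Location", "MessageID", "Received", "Sender",
            "Sensitivity", "TransportMsgHeaders", "StartDate", "EndDate");

    private MsgFieldNames() {}

    /** Returns the documented spelling of a field name, ignoring case, if it is listed. */
    static Optional<String> canonical(String requested) {
        return KNOWN.stream()
                .filter(name -> name.equalsIgnoreCase(requested))
                .findFirst();
    }

    public static void main(String[] args) {
        System.out.println(canonical("conversationtopic").orElse("(not an MSG-specific field)"));
        System.out.println(canonical("NoSuchField").orElse("(not an MSG-specific field)"));
    }
}
```

A check like this only catches typos against the documented list; it says nothing about whether a particular MSG file actually contains the field.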

Microsoft Outlook Express (EML) and Mailbox (MBX) Metadata

In addition to the default metadata set, you can extract any metadata field that exists in the header of an EML or MBX file by passing the field’s name. If the name is a valid field in the file, the contents of the field are returned. For example, to retrieve the name of the last mail server that received the message before it was delivered, you can pass the string "Received".
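To make "any field that exists in the header" concrete, the sketch below parses a message header block and looks fields up by name without regard to case. It is an illustration under stated assumptions, not how KeyView's reader works: the HeaderLookup class is hypothetical, it assumes RFC 822 style "Name: value" headers with folded continuation lines, and when a header repeats it keeps the first occurrence (for Received, mail servers normally prepend their own line, so the first one is usually the most recent).

```java
import java.util.Map;
import java.util.TreeMap;

/** Illustrative parser; not part of the KeyView API. */
final class HeaderLookup {
    private HeaderLookup() {}

    /** Parses the header block, that is, everything before the first blank line. */
    static Map<String, String> parseHeaders(String message) {
        // A case-insensitive comparator makes "Mime-version" match "MIME-Version".
        Map<String, String> headers = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        String current = null; // header that continuation lines belong to, if any
        for (String line : message.split("\r?\n")) {
            if (line.isEmpty()) {
                break; // a blank line ends the header block
            }
            if (line.startsWith(" ") || line.startsWith("\t")) {
                // Folded continuation of the previous header line.
                if (current != null) {
                    headers.merge(current, line.trim(), (a, b) -> a + " " + b);
                }
                continue;
            }
            int colon = line.indexOf(':');
            if (colon <= 0) {
                current = null; // not a header line; skip it
                continue;
            }
            String name = line.substring(0, colon).trim();
            if (headers.containsKey(name)) {
                current = null; // repeated header: keep the first occurrence only
                continue;
            }
            headers.put(name, line.substring(colon + 1).trim());
            current = name;
        }
        return headers;
    }

    public static void main(String[] args) {
        String eml = "Received: from a.example.com\r\n\tby b.example.com\r\n"
                + "MIME-Version: 1.0\r\nSubject: Hi\r\n\r\nBody text\r\n";
        Map<String, String> headers = parseHeaders(eml);
        System.out.println(headers.get("received"));
        System.out.println(headers.get("Mime-version"));
    }
}
```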

Extract EML- or MBX-Specific Metadata

To extract specific metadata fields from an EML or MBX file, use the method extGetSubFileMetadata(long docContextID, int nSubFileIndex, java.lang.String[] metaNameArray, ExtSubFileMetaConfig config) and pass the metadata name to metaNameArray (the string is not case sensitive).

For example, the following code extracts the contents of the Received and Mime-version fields:

ExtSubFileMetaConfig metaConfig = new ExtSubFileMetaConfig();
// Any header field name can be requested; matching is not case sensitive.
String[] metaNameArray = {"Received", "Mime-version"};
ExtSubFileMetadata subfilemeta = m_objExport.extGetSubFileMetadata(extContextID, index, metaNameArray, metaConfig);

Lotus Notes Database (NSF) Metadata

In addition to the default metadata set, you can extract any Lotus field that exists in an NSF file by passing the field’s name. (You can extract fields from mail NSF files and non-mail NSF files.) If the name is a valid field in the file, the contents of the field are returned. For example, to retrieve the date a document in an NSF file was last accessed, pass the string "$LastAccessedDB".

NOTE: A complete list of NSF fields is provided in the Lotus Notes file stdnames.h. This header file is available in the Lotus API Toolkit.

Extract NSF-Specific Metadata

To extract specific metadata fields from an NSF file, use the method extGetSubFileMetadata(long docContextID, int nSubFileIndex, java.lang.String[] metaNameArray, ExtSubFileMetaConfig config) and pass the metadata name to metaNameArray (the string is not case sensitive).

For example, the following code extracts the contents of the Description and Categories fields:

ExtSubFileMetaConfig metaConfig = new ExtSubFileMetaConfig();
// Lotus field names are passed as strings; matching is not case sensitive.
String[] metaNameArray = {"description", "Categories"};
ExtSubFileMetadata subfilemeta = m_objExport.extGetSubFileMetadata(extContextID, index, metaNameArray, metaConfig);

Microsoft Personal Folders File (PST) Metadata

In addition to the default metadata set, you can extract Messaging Application Programming Interface (MAPI) properties from a PST file. These properties describe elements (subject, sender, recipient, and so on) of Outlook items within the PST file. Since the properties are stored in the PST file itself, they can be retrieved before the contents of the PST are extracted. This enables you to determine whether an Outlook item should be extracted based on a subfile’s attributes. MAPI properties are also stored for Outlook attachments that are not mail messages (such as an attached Microsoft Word document or Lotus 1-2-3 file).

MAPI Properties

Each MAPI property is identified by a property tag, which is a constant that contains the property type and a unique identifier. For example, the property that indicates whether a message has attachments has the following components:

| Property | Identifier | Property type | Property tag |
| --- | --- | --- | --- |
| PR_HASATTACH | 0x0E1B | PT_BOOLEAN (000B) | 0x0E1B000B |

The Microsoft MAPI documentation on the Microsoft Developer Network website lists all available MAPI properties, their tags, and types.
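The way a property tag combines its two parts can be sketched in Java. This is an illustration only: the MapiTag class and its methods are hypothetical and are not part of the KeyView API. It assumes the standard MAPI layout, in which the identifier occupies the high 16 bits of the tag and the property type the low 16 bits, which matches the PR_HASATTACH example above.

```java
/** Illustrative helper; not part of the KeyView API. */
final class MapiTag {
    static final int PT_BOOLEAN = 0x000B;

    private MapiTag() {}

    /** Combines a 16-bit property identifier and a 16-bit property type into a tag. */
    static int of(int identifier, int type) {
        return (identifier << 16) | (type & 0xFFFF);
    }

    /** Extracts the identifier (high 16 bits) from a tag. */
    static int identifier(int tag) {
        return tag >>> 16;
    }

    /** Extracts the property type (low 16 bits) from a tag. */
    static int type(int tag) {
        return tag & 0xFFFF;
    }

    public static void main(String[] args) {
        int hasAttach = of(0x0E1B, PT_BOOLEAN);
        System.out.printf("PR_HASATTACH tag: 0x%08X%n", hasAttach);
        System.out.printf("identifier: 0x%04X, type: 0x%04X%n",
                identifier(hasAttach), type(hasAttach));
    }
}
```

Splitting a tag this way is a quick check that a hand-typed tag number carries one of the property types that can be retrieved.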

You can retrieve any MAPI property whose type is one of the following:

PT_I2

PT_DOUBLE

PT_STRING8

PT_I4

PT_FLOAT

PT_TSTRING

PT_BINARY

PT_LONG

PT_SYSTIME

PT_BOOLEAN

PT_SHORT

PT_UNICODE

NOTE: Properties with a PT_TSTRING type have the property type recompiled to either a Unicode string (PT_UNICODE) or to an ANSI string (PT_STRING8) depending on the operating system’s character set. To retrieve the Unicode property, pass in the Unicode version of the tag. For example, the property tag for PR_SUBJECT is either 0x0037001E for an ANSI string, or 0x0037001F for a Unicode string.
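Switching between the ANSI and Unicode versions of a string tag only changes the low 16 bits. The sketch below shows this under the assumption that PT_STRING8 is 0x001E and PT_UNICODE is 0x001F, which is consistent with the PR_SUBJECT tags quoted in the note. The StringTagVariant class is hypothetical and is not part of the KeyView API.

```java
/** Illustrative helper; not part of the KeyView API. */
final class StringTagVariant {
    static final int PT_STRING8 = 0x001E; // ANSI string
    static final int PT_UNICODE = 0x001F; // Unicode string

    private StringTagVariant() {}

    /** Returns the Unicode version of an ANSI or Unicode string property tag. */
    static int toUnicode(int tag) {
        return withType(tag, PT_UNICODE);
    }

    /** Returns the ANSI version of an ANSI or Unicode string property tag. */
    static int toAnsi(int tag) {
        return withType(tag, PT_STRING8);
    }

    private static int withType(int tag, int newType) {
        int type = tag & 0xFFFF;
        if (type != PT_STRING8 && type != PT_UNICODE) {
            throw new IllegalArgumentException(
                    String.format("0x%08X is not a string property tag", tag));
        }
        // Keep the identifier (high 16 bits) and replace the type (low 16 bits).
        return (tag & 0xFFFF0000) | newType;
    }

    public static void main(String[] args) {
        System.out.printf("0x%08X%n", toUnicode(0x0037001E));
    }
}
```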

Extract PST-Specific Metadata

In the call to extract subfile metadata, you can pass either the MAPI tag number (such as 0x0070001E, the ANSI tag for PR_CONVERSATION_TOPIC) or one of the constants in the Export class (such as KVPR_SUBJECT). These constants cover a subset of MAPI properties and use a KeyView naming convention. For example, the property PR_CONVERSATION_TOPIC is defined as KVPR_CONVERSATION_TOPIC. If the property you want to retrieve is not defined as a constant in the Export class, you must pass the MAPI tag number.

To extract specific MAPI properties from a PST file, use the method extGetSubFileMetadata(long docContextID, int nSubFileIndex, int[] metaNameArray, ExtSubFileMetaConfig config) and pass the tag number or constant to metaNameArray.

For example, the following code extracts the MAPI properties PR_SUBJECT and PR_ALTERNATE_RECIPIENT:

ExtSubFileMetaConfig metaConfig = new ExtSubFileMetaConfig();
// PR_SUBJECT has an Export constant; PR_ALTERNATE_RECIPIENT is passed as its raw tag number.
int[] metaNameArray = {Export.KVPR_SUBJECT, 0x3A010102};
ExtSubFileMetadata subfilemeta = m_objExport.extGetSubFileMetadata(extContextID, index, metaNameArray, metaConfig);
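As noted at the start of this section, MAPI properties can be read before an item is extracted, so they can drive the decision to extract it at all. The following sketch shows one possible shape for such a check. It is hypothetical throughout: the PstItemFilter class is not part of the KeyView API, and it assumes the application has already copied the retrieved property values into a plain Map keyed by property tag, because the accessors of ExtSubFileMetadata are not described here.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Illustrative filter; not part of the KeyView API. */
final class PstItemFilter {
    static final int PR_HASATTACH = 0x0E1B000B;    // PT_BOOLEAN
    static final int PR_SUBJECT_ANSI = 0x0037001E; // PT_STRING8

    private PstItemFilter() {}

    /** Extract only items that have attachments and whose subject contains the keyword. */
    static boolean shouldExtract(Map<Integer, Object> properties, String keyword) {
        boolean hasAttachments = Boolean.TRUE.equals(properties.get(PR_HASATTACH));
        Object subject = properties.get(PR_SUBJECT_ANSI);
        boolean subjectMatches = subject instanceof String
                && ((String) subject).toLowerCase(Locale.ROOT)
                        .contains(keyword.toLowerCase(Locale.ROOT));
        return hasAttachments && subjectMatches;
    }

    public static void main(String[] args) {
        // Values an application might have copied out of the retrieved metadata.
        Map<Integer, Object> item = new HashMap<>();
        item.put(PR_HASATTACH, Boolean.TRUE);
        item.put(PR_SUBJECT_ANSI, "Quarterly Report");
        System.out.println(shouldExtract(item, "report"));
    }
}
```

The criteria here (attachments plus a subject keyword) are arbitrary; the point is only that the test runs on property values and does not need the item's contents.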

Exclude Metadata from the Extracted Text File

When a mail message is extracted, both the message text and the header information (To, From, Sent, and so on) are extracted. You can prevent the header information from appearing in the text file.

To exclude the header information, call the setExcludeMailHeader() method of the ExtSubFileExtractConfig object, and pass that object to the extExtractSubFile() method. For example:

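// Set the flag to true to leave the mail header (To, From, Sent, and so on) out of the extracted text.
m_excludeMailHeader = true;
extconfig = new ExtSubFileExtractConfig();
extconfig.setExcludeMailHeader(m_excludeMailHeader);
extinfo = m_objExport.extExtractSubFile(extContextID, i, extconfig);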
