Extract Mail Metadata

You can extract metadata, such as the subject, sender, and recipient, from MSG, EML, MBX, PST, and NSF files by calling the fpGetSubFileMetaData() function. You can extract a predefined set of metadata fields, individual fields that are unique to a file format, or both.

Default Metadata Set

KeyView internally defines a set of common mail metadata fields that you can extract as a group from mail formats. This default metadata set is listed in the following table. When you retrieve all metadata for a file (that is, you pass NULL for the array of metadata names), the complete set of default metadata is returned, not all of the metadata available in the file.

Default Mail Metadata List

Field Name (string to specify) Description
From The display name and email address of the sender.
Sent The time that the message was sent.
To The display names and email addresses of the recipients.
Cc The display names and email addresses of recipients who receive copies of the email.
Bcc The display names and email addresses of recipients who receive blind copies of the email.
Subject The text in the subject line of the message.
Priority The priority applied to the message.

Because mail formats use different terms for the same fields, the format’s reader maps the default field name to the appropriate format-specific name. For example, when retrieving the default metadata set, the NSF field Importance is mapped to the name Priority and is returned.

You can also extract the default fields individually by passing the field name (such as From, To, or Subject). In this case, however, the string is not mapped to the format-specific name. For example, if you pass Priority in the call, you retrieve the contents of the Priority field from an MBX file, but you do not retrieve the contents of the Importance field from an NSF file.

NOTE: You cannot pass the field names listed in the table individually for PST files. However, you can pass either the MAPI tag number or the MAPI tag name, each as an integer. See Microsoft Personal Folders File (PST) Metadata.

Extract the Default Metadata Set

To extract the default metadata set, call the fpGetSubFileMetaData() function, and pass 0 for metaNameCount and NULL for metaNameArray.

    KVGetSubFileMetaArgRec metaArg;
    KVSubFileMetaData pMetaData = NULL;
    KVStructInit(&metaArg);

    metaArg.index = subFileIndex;
    metaArg.metaNameCount = 0;      /* 0 and NULL request the default set */
    metaArg.metaNameArray = NULL;

    error = extractInterface->fpGetSubFileMetaData(pFile, &metaArg, &pMetaData);
    ...
    /* Free the returned metadata when you are finished with it. */
    extractInterface->fpFreeStruct(pFile, pMetaData);
    pMetaData = NULL;

Microsoft Outlook (MSG) Metadata

In addition to the default metadata set, you can extract the metadata fields listed in the following table for MSG files. You must pass the field name to metaNameArray in the call to the fpGetSubFileMetaData() function.

MSG-specific Metadata List

Field Name (string to specify) Description
AttachFileName An attachment's long file name and extension, excluding the path.
ConversationTopic The topic of the first message in a conversation thread. A conversation thread is a series of messages and replies. This is the first message’s subject with any prefix removed.
CreationTime The time that the message or attachment was created. This value is displayed in the Sent field in the message’s Properties dialog in Outlook.
InternetMessageID The identifier for messages that come in over the Internet. This is the MAPI property PR_INTERNET_MESSAGE_ID. This property is not in the MAPI headers or MAPI documentation.
LastModificationTime The time that the message or attachment was last modified. This value is displayed in the Modified field in the message’s Properties dialog in Outlook.
Location The physical location of the event specified in the Outlook calendar entry.
MessageID The message transfer system (MTS) identifier for the message transfer agent (MTA). This value is displayed on the Message ID tab in the message’s Properties dialog in Outlook.
Received The date and time a message was delivered. This value is displayed in the Received field in the message’s Properties dialog in Outlook.
Sender The name and email address of the message sender. This value is a concatenation of two MAPI properties in the following format: "PR_SENDER_NAME" <PR_SENDER_EMAIL_ADDRESS>. The Sender value might be the same as or different from the default metadata From value (see Default Metadata Set), depending on which MAPI properties exist in the MSG file.

Sensitivity The value indicating the message sender's opinion of the sensitivity of a message. For example, Personal, Private, or Confidential. This value is displayed in the Sensitivity field in the message’s Properties dialog in Outlook.
TransportMsgHeaders Transport-specific message envelope information. This value corresponds to the MAPI property PR_TRANSPORT_MESSAGE_HEADERS.
StartDate An appointment start date. This value corresponds to the PR_START_DATE MAPI property.
EndDate An appointment end date. This value corresponds to the PR_END_DATE MAPI property.
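
The Sender layout described in the table, a quoted display name followed by the address in angle brackets, can be illustrated with a short sketch. The helper names format_sender and sender_address are hypothetical and are not part of the KeyView API. The code only builds and splits strings in that layout and assumes that the address itself contains no angle brackets.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: builds a string in the Sender layout,
   "name" <address>. Returns 0 on success, -1 if the buffer is too small. */
static int format_sender(char *out, size_t out_size,
                         const char *name, const char *address)
{
    int n = snprintf(out, out_size, "\"%s\" <%s>", name, address);
    return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
}

/* Hypothetical helper: copies the text between the last '<' and the
   following '>' into out. Returns 0 on success, -1 if no bracketed
   address is found or it does not fit in the buffer. */
static int sender_address(const char *sender, char *out, size_t out_size)
{
    const char *open = strrchr(sender, '<');
    const char *close = open ? strchr(open, '>') : NULL;
    size_t len;

    if (open == NULL || close == NULL)
        return -1;
    len = (size_t)(close - open - 1);
    if (len + 1 > out_size)
        return -1;
    memcpy(out, open + 1, len);
    out[len] = '\0';
    return 0;
}
```

A caller that has retrieved a Sender value could pass it to sender_address to isolate the email address for comparison with the From field.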

Extract MSG-Specific Metadata

To extract specific metadata fields from an MSG file, call the fpGetSubFileMetaData() function, and pass the field name defined in MSG-specific Metadata List to metaNameArray (the string is not case sensitive).

For example, the following code extracts the contents of the ConversationTopic and MessageID fields:

    KVGetSubFileMetaArgRec metaArg;
    KVSubFileMetaData pMetaData = NULL;
    KVMetaNameRec names[2];
    KVMetaName    pname[2];

    KVStructInit(&metaArg);

    /* Field names are matched without regard to case. */
    names[0].type = KVMetaNameType_String;
    names[0].name.sname = "conversationtopic";
    names[1].type = KVMetaNameType_String;
    names[1].name.sname = "MessageID";

    /* metaNameArray takes an array of pointers to the name records. */
    pname[0] = &names[0];
    pname[1] = &names[1];

    metaArg.metaNameCount = 2;
    metaArg.metaNameArray = pname;
    metaArg.index = subFileIndex;

    error = extractInterface->fpGetSubFileMetaData(pFile, &metaArg, &pMetaData);
    ...
    extractInterface->fpFreeStruct(pFile, pMetaData);
    pMetaData = NULL;

Microsoft Outlook Express (EML) and Mailbox (MBX) Metadata

In addition to the default metadata set, you can extract any metadata field that exists in the header of an EML or MBX file by passing the field’s name. If the name is a valid field in the file, the content of the field is returned. For example, to retrieve the name of the last mail server that received the message before it was delivered, you can pass the string "Received".

Extract EML- or MBX-Specific Metadata

To extract specific metadata fields from an EML or MBX file, call the fpGetSubFileMetaData() function, and pass the metadata name to metaNameArray (the string is not case sensitive).

For example, the following code extracts the contents of the Received and Mime-version fields:

    KVGetSubFileMetaArgRec metaArg;
    KVSubFileMetaData pMetaData = NULL;
    KVMetaNameRec names[2];
    KVMetaName    pname[2];

    KVStructInit(&metaArg);

    /* Any header field name that exists in the file can be requested. */
    names[0].type = KVMetaNameType_String;
    names[0].name.sname = "Received";
    names[1].type = KVMetaNameType_String;
    names[1].name.sname = "Mime-version";

    pname[0] = &names[0];
    pname[1] = &names[1];

    metaArg.metaNameCount = 2;
    metaArg.metaNameArray = pname;
    metaArg.index = subFileIndex;

    error = extractInterface->fpGetSubFileMetaData(pFile, &metaArg, &pMetaData);
    ...
    extractInterface->fpFreeStruct(pFile, pMetaData);
    pMetaData = NULL;

Lotus Notes Database (NSF) Metadata

In addition to the default metadata set, you can extract any Lotus field name that exists in an NSF file by passing the field’s name. (You can extract fields from mail NSF files and non-mail NSF files.) If the name is a valid field in the file, the field is returned. For example, to retrieve the date when a document in an NSF file was last accessed, you would pass the string "$LastAccessedDB".

NOTE: A complete list of NSF fields is provided in the Lotus Notes file stdnames.h. This header file is available in the Lotus API Toolkit.

Extract NSF-Specific Metadata

To extract specific metadata fields from an NSF file, call the fpGetSubFileMetaData() function, and pass the metadata name to metaNameArray (the string is not case sensitive).

For example, the following code extracts the contents of the Description and Categories fields:

    KVGetSubFileMetaArgRec metaArg;
    KVSubFileMetaData pMetaData = NULL;
    KVMetaNameRec names[2];
    KVMetaName    pname[2];

    KVStructInit(&metaArg);

    /* Any Lotus field name that exists in the file can be requested. */
    names[0].type = KVMetaNameType_String;
    names[0].name.sname = "description";
    names[1].type = KVMetaNameType_String;
    names[1].name.sname = "Categories";

    pname[0] = &names[0];
    pname[1] = &names[1];

    metaArg.metaNameCount = 2;
    metaArg.metaNameArray = pname;
    metaArg.index = subFileIndex;

    error = extractInterface->fpGetSubFileMetaData(pFile, &metaArg, &pMetaData);
    ...
    extractInterface->fpFreeStruct(pFile, pMetaData);
    pMetaData = NULL;

Microsoft Personal Folders File (PST) Metadata

In addition to the default metadata set, you can extract Messaging Application Programming Interface (MAPI) properties from a PST file. These properties describe all elements of an Outlook item in a PST file (such as subject, sender, recipient, and message text). Because the properties are stored in the PST file itself, you can retrieve them before you extract the contents of the PST. This enables you to determine whether an Outlook item should be extracted based on its attributes. Some MAPI properties are also stored for Outlook attachments that are not mail messages (such as an attached Microsoft Word document or Lotus 1-2-3 file).

NOTE: Because all elements of a message (except non-mail attachments) are represented by MAPI properties, you can extract all components of a subfile, including the header and message text, by calling the fpGetSubFileMetaData() function.

MAPI Properties

Each MAPI property is identified by a property tag, which is a constant that contains the property type and a unique identifier. For example, the property that indicates whether a message has attachments has the following components:

Property PR_HASATTACH
Identifier 0x0E1B
Property type PT_BOOLEAN (000B)
Property tag 0x0E1B000B

The Microsoft MAPI documentation on the Microsoft Developer Network website lists all available MAPI properties, their tags, and types.
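
The tag layout described above, with the identifier in the high 16 bits and the property type in the low 16 bits, can be sketched in C as follows. The helper names are hypothetical and are not part of KeyView or MAPI. They are written out only so that the arithmetic is visible without the Windows headers, and are similar in spirit to the PROP_TAG, PROP_ID, and PROP_TYPE macros in mapidefs.h.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: combines a 16-bit identifier and a 16-bit
   property type into a 32-bit property tag. */
static uint32_t make_prop_tag(uint32_t prop_id, uint32_t prop_type)
{
    assert(prop_id <= 0xFFFFu && prop_type <= 0xFFFFu);
    return (prop_id << 16) | prop_type;
}

/* Hypothetical helper: returns the identifier (high 16 bits) of a tag. */
static uint32_t prop_tag_id(uint32_t tag)
{
    return tag >> 16;
}

/* Hypothetical helper: returns the property type (low 16 bits) of a tag. */
static uint32_t prop_tag_type(uint32_t tag)
{
    return tag & 0xFFFFu;
}
```

With these helpers, make_prop_tag(0x0E1B, 0x000B) yields 0x0E1B000B, which is the PR_HASATTACH tag shown above.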

You can retrieve any MAPI property that is of one of the MAPI property types listed below:

PT_I2 PT_DOUBLE PT_STRING8
PT_I4 PT_FLOAT PT_TSTRING
PT_BINARY PT_LONG PT_SYSTIME
PT_BOOLEAN PT_SHORT PT_UNICODE

NOTE: Properties with a PT_TSTRING type have the property type recompiled to either a Unicode string (PT_UNICODE) or to an ANSI string (PT_STRING8) depending on the operating system’s character set. To retrieve the Unicode property, pass in the Unicode version of the tag. For example, the property tag for PR_SUBJECT is either 0x0037001E for an ANSI string, or 0x0037001F for a Unicode string.
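
As a rough illustration of the type list and the note above, the following sketch checks whether a tag carries one of the listed property types and derives the Unicode variant of an ANSI string tag. The function and enum names are hypothetical. The numeric type codes are written out so that the code compiles without the Windows headers, on the assumption that they match the values in mapidefs.h. Verify them against your copy of the header before you rely on them.

```c
#include <assert.h>
#include <stdint.h>

/* Property type codes (assumed to match mapidefs.h). PT_TSTRING is not
   listed separately because it compiles to PT_STRING8 or PT_UNICODE. */
enum {
    TYPE_SHORT   = 0x0002,  /* PT_I2, PT_SHORT */
    TYPE_LONG    = 0x0003,  /* PT_I4, PT_LONG */
    TYPE_FLOAT   = 0x0004,  /* PT_FLOAT */
    TYPE_DOUBLE  = 0x0005,  /* PT_DOUBLE */
    TYPE_BOOLEAN = 0x000B,  /* PT_BOOLEAN */
    TYPE_STRING8 = 0x001E,  /* PT_STRING8 */
    TYPE_UNICODE = 0x001F,  /* PT_UNICODE */
    TYPE_SYSTIME = 0x0040,  /* PT_SYSTIME */
    TYPE_BINARY  = 0x0102   /* PT_BINARY */
};

/* Hypothetical helper: returns 1 if the tag's type (low 16 bits) is one
   of the types listed above, and 0 otherwise. */
static int is_retrievable_type(uint32_t tag)
{
    switch (tag & 0xFFFFu) {
    case TYPE_SHORT:
    case TYPE_LONG:
    case TYPE_FLOAT:
    case TYPE_DOUBLE:
    case TYPE_BOOLEAN:
    case TYPE_STRING8:
    case TYPE_UNICODE:
    case TYPE_SYSTIME:
    case TYPE_BINARY:
        return 1;
    default:
        return 0;
    }
}

/* Hypothetical helper: returns the Unicode variant of an ANSI string
   tag. Tags of any other type are returned unchanged. */
static uint32_t to_unicode_tag(uint32_t tag)
{
    uint32_t result = tag;

    if ((tag & 0xFFFFu) == TYPE_STRING8)
        result = (tag & 0xFFFF0000u) | (uint32_t)TYPE_UNICODE;
    assert((result >> 16) == (tag >> 16));  /* identifier is preserved */
    return result;
}
```

For example, to_unicode_tag(0x0037001E) returns 0x0037001F, the Unicode form of the PR_SUBJECT tag given in the note.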

Extract PST-Specific Metadata

In the call to extract subfile metadata, you can pass either the MAPI tag number (such as 0x0070001e) or the MAPI tag name (such as PR_CONVERSATION_TOPIC). If you specify the MAPI tag name, you must include the mapitags.h and mapidefs.h Windows header files, in which the MAPI tag name is defined as a tag number.

To extract specific MAPI properties from a PST file, call the fpGetSubFileMetaData() function, and pass the property tag to metaNameArray. The tag is passed as an integer.

For example, the following code extracts the MAPI properties PR_SUBJECT and PR_ALTERNATE_RECIPIENT:

    KVGetSubFileMetaArgRec metaArg;
    KVSubFileMetaData pMetaData = NULL;
    KVMetaNameRec names[2];
    KVMetaName    pName[2];

    /* Tag passed by name (requires the MAPI headers). */
    names[0].type = KVMetaNameType_Integer;
    names[0].name.iname = PR_SUBJECT;

    /* Tag passed by number: PR_ALTERNATE_RECIPIENT. */
    names[1].type = KVMetaNameType_Integer;
    names[1].name.iname = 0x3A010102;

    pName[0] = &names[0];
    pName[1] = &names[1];

    KVStructInit(&metaArg);

    metaArg.metaNameCount = 2;
    metaArg.metaNameArray = pName;
    metaArg.index = subFileIndex;

    error = extractInterface->fpGetSubFileMetaData(pFile, &metaArg, &pMetaData);
    ...
    extractInterface->fpFreeStruct(pFile, pMetaData);
    pMetaData = NULL;

NOTE: You must include the mapitags.h and mapidefs.h Windows header files, in which PR_SUBJECT is defined as 0x0037001E.

Exclude Metadata from the Extracted Text File

When you extract a mail message, both the message text and the header information (To, From, Sent, and so on) are extracted. You can prevent the header information from appearing in the text file.

To exclude the header information, set extractFlag to KVExtractionFlag_ExcludeMailHeader in the call to fpExtractSubFile().

