VgwStreamGetTokenFnc



Description

Use the VgwStreamGetTokenFnc function to submit a token that contains content from a document. The Verity engine calls your VgwStreamGetTokenFnc function when reading a stream; for example, during indexing or viewing.


Syntax

VdkError VgwStreamGetTokenFnc(
    VgwAppStream   vgwStream,
    VdkToken*      pvdkToken)

Arguments

 


vgwStream

VgwAppStream   The stream handle, which was created in your driver’s VgwStreamNewFnc function.

pvdkToken

VdkToken*   A pointer to a VdkToken, which your function sets to the address of a VdkTokenRec structure that holds the token type and the data associated with the token.


Member Descriptions

 

Table 7-2    VdkTokenRec Members


Member

Type/Description

owner

VdkVoidp   A pointer to the VgwAppStream structure that identifies the stream, which was created by your driver’s VgwStreamNewFnc callback function.

language

VdkVoidp   Internal.

docText

VdkCString   A pointer to a constant string that contains the document’s contents.

docBytes

VdkUint4   The size of the document’s contents, in bytes.

position

VdkUint4   Internal.

tokFlags

VdkUint1   One or more of the following flags:

VdkTokenFlag_SafeCopy if token need not be freed

VdkTokenFlag_HighLight if data is to be highlighted

VdkTokenFlag_SamePos if position in buffer is logically the same as last position

VdkTokenFlag_BinaryInfo if binary data, i.e. no character mapping is required

VdkTokenFlag_NoFieldOverride if the field should not override an existing field

VdkTokenFlag_IncrPos if incrementing the word position when the zone position is the same as the previous zone

ownerFlags

VdkUint1   Stream-specific flags; typically, these are one-bit flags.

type

VdkTokenType   A token type. Valid values are:

VdkTokenType_Eof for end of document

VdkTokenType_Nop for separator

VdkTokenType_Para for end of paragraph

VdkTokenType_Sent for end of sentence

VdkTokenType_Word for word

VdkTokenType_Punct for punctuation character (such as a new line, space, or tab)

VdkTokenType_Line for null-terminated text

VdkTokenType_Buffer for a buffer of characters

VdkTokenType_Highlight for a buffer of highlighted characters

VdkTokenType_Field for populating an internal field

VdkTokenType_Zone for writing a document zone

VdkTokenType_FileByName for inserting a file

VdkTokenType_NewDoc for generating a new document

VdkTokenType_SkipDoc to skip the current document, deleting it from the collection

data

VdkTokenDataRec   The document contents associated with the token; it is the union of the following structures:

VdkTokenBufferRec for VdkTokenType_Buffer token type

VdkTokenNewDocRec for VdkTokenType_NewDoc token type

VdkTokenFieldRec for VdkTokenType_Field token type

VdkTokenZoneRec for VdkTokenType_Zone token type

VdkTokenWordRec for VdkTokenType_Word token type

VdkTokenFileByNameRec for VdkTokenType_FileByName token type

VdkTokenHighlightRec for VdkTokenType_Highlight token type


 

Table 7-3    VdkTokenBufferRec Members


Member

Type/Description

buffer

VdkBuffer   A pointer to the contents of the buffer.

size

VdkUint4   The length, in bytes, of the text in the buffer. If the data is null-terminated, size does not include the terminating null character.


 

Table 7-4    VdkTokenFieldRec Members


Member

Type/Description

name

VdkCString   An internal field name, as defined in the collection’s style.ddd file.

type

VdkUint4   A VdkFieldType flag for field tokens. Valid values are:

VdkFieldType_Text for text strings

VdkFieldType_Signed for signed integers

VdkFieldType_Unsigned for unsigned integers

VdkFieldType_Date for VdkDate type data

VdkFieldType_Float for double floats

VdkFieldType_Stream for a binary blob of data

VdkFieldType_Invalid for invalid data

precedence

VdkInt2   Internal.

value

union   The value, which is a union of variables. The VdkFieldType of the data being returned determines which variable to use.

value.number

VdkUint4   A numeric field value (signed or unsigned integer). This value is used for VdkFieldType_Signed and VdkFieldType_Unsigned field tokens.

value.realnum

VdkFloat   An IEEE 754 double-precision floating-point field value. Used for VdkFieldType_Float field tokens.

value.date

VdkDate   A date field value in date format. Used for VdkFieldType_Date field tokens.

value.buf

struct   A structure to hold the buffer and buffer size of a text string. Used for VdkFieldType_Text and VdkFieldType_Stream field tokens.

value.buf.buffer

VdkBuffer   A pointer to the field’s value for VdkFieldType_Text and VdkFieldType_Stream field tokens.

value.buf.size

VdkInt4   The length, in bytes, of the value pointed to by buf.buffer. The value of size affects how data referenced by the buffer member will be interpreted. See the discussion below for more information.


 

Table 7-5    VdkTokenZoneRec Members


Member

Type/Description

name

VdkCString   The name of the zone to start or end.

flags

VdkUint2   A flag indicating an action. Valid values are:

VdkTokenZone_Begin to start a new zone

VdkTokenZone_End to end the current zone
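A zone is typically bracketed by two zone tokens: one that begins it and one that ends it, with the zone's content tokens in between. The following is a minimal sketch of filling a VdkTokenZoneRec for either case. The typedefs, the numeric flag values, and the fill_zone helper are hypothetical stand-ins so that the fragment compiles without the Verity headers; a real driver takes the types and flag values from those headers.

```c
#include <stddef.h>

/* Hypothetical stand-ins; the real definitions, including the actual
 * flag values, come from the Verity headers. */
typedef const char*    VdkCString;
typedef unsigned short VdkUint2;
enum { VdkTokenZone_Begin = 1, VdkTokenZone_End = 2 };  /* assumed values */

typedef struct {
    VdkCString name;   /* name of the zone to start or end */
    VdkUint2   flags;  /* VdkTokenZone_Begin or VdkTokenZone_End */
} VdkTokenZoneRec;

/* Hypothetical helper: fill a zone record that begins (begin != 0) or
 * ends (begin == 0) the named zone. Returns 0 on success, -1 if an
 * argument is NULL. */
int fill_zone(VdkTokenZoneRec* zone, VdkCString name, int begin)
{
    if (!zone || !name)
        return -1;
    zone->name  = name;
    zone->flags = (VdkUint2)(begin ? VdkTokenZone_Begin : VdkTokenZone_End);
    return 0;
}
```

A driver would call such a helper once with begin set before sending the zone's content and once with begin cleared afterward, using the same zone name both times.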


 

Table 7-6    VdkTokenFileByNameRec Members


Member

Type/Description

name

VdkCString   The file path to the file to be inserted.

offset

VdkInt4   The start offset into the file.

size

VdkInt4   The number of bytes from the start offset of the file.

flags

VdkUint2   The file name string format flag. Valid values are:

VdkTokenFileByName_Exported for external character mapping format

VdkTokenFileByName_Imported for internal character mapping format
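As an illustration of how the members in Table 7-6 fit together, the sketch below fills a VdkTokenFileByNameRec that refers to a byte range of a file. The typedefs, the numeric flag values, the member order, and the fill_file_by_name helper are hypothetical stand-ins so that the fragment compiles without the Verity headers. The choice of VdkTokenFileByName_Exported assumes that the path string is in the external character mapping format.

```c
#include <stddef.h>

/* Hypothetical stand-ins; the real definitions come from the Verity
 * headers. */
typedef const char*    VdkCString;
typedef int            VdkInt4;
typedef unsigned short VdkUint2;
enum { VdkTokenFileByName_Exported = 1,
       VdkTokenFileByName_Imported = 2 };   /* assumed values */

typedef struct {
    VdkCString name;    /* path of the file to be inserted */
    VdkInt4    offset;  /* start offset into the file */
    VdkInt4    size;    /* number of bytes from the start offset */
    VdkUint2   flags;   /* file name string format flag */
} VdkTokenFileByNameRec;

/* Hypothetical helper: describe `size` bytes of the file at `path`,
 * starting at `offset`. Returns 0 on success, -1 on bad arguments. */
int fill_file_by_name(VdkTokenFileByNameRec* rec, VdkCString path,
                      VdkInt4 offset, VdkInt4 size)
{
    if (!rec || !path || offset < 0 || size < 0)
        return -1;
    rec->name   = path;
    rec->offset = offset;
    rec->size   = size;
    rec->flags  = VdkTokenFileByName_Exported;  /* assumption: external format */
    return 0;
}
```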


 

Table 7-7    VdkTokenNewDocRec Members


Member

Type/Description

key

VdkCString   The VdkDocKey of the new document.

flags

VdkUint4   A flag that, if set to VdkTokenNewDoc_Child, specifies that the document is a child of the current document.

dispatchField

VdkCString   The dispatch field to be populated.

dispatchFN

VdkCString   VDK dispatch field _FN value.

dispatchOF

VdkInt4   VDK dispatch field _OF value.

dispatchSZ

VdkInt4   VDK dispatch field _SZ value.



Returns

This function must return one of the following error codes:

VdkSuccess for success

 

VdkError_* for a standard Verity Developer Kit API error as described in the Verity Developer’s Kit Programming Reference

 

VdkFail for a non-specific error

 


Discussion

Your VgwStreamGetTokenFnc function can either read source data and populate a VdkTokenRec structure, or it can obtain a previously created VdkTokenRec structure. The function then assigns the structure to the pvdkToken argument before returning. The VdkTokenRec structure specifies the kind of token and the data being returned to the Verity engine. When no more source data exists, assign a VdkTokenRec structure of type VdkTokenType_Eof before returning.

The tokens sent by your VgwStreamGetTokenFnc function depend on the kind of document. For more information, see Creating Virtual Documents.

You must identify the stream associated with each token by assigning the vgwStream argument (your driver’s VgwAppStream pointer) to the owner member of the VdkTokenRec structure in your VgwStreamGetTokenFnc callback function; otherwise, the Verity engine may call VgwStreamFreeTokenFnc for the wrong stream.

When the Verity engine calls your driver’s VgwStreamFreeTokenFnc callback function, the owner member of the VdkToken argument may no longer be set to the driver’s VgwAppStream pointer; you should use the VgwAppStream argument to your driver’s VgwStreamFreeTokenFnc callback function instead.

The ownerFlags member in a VdkTokenRec structure is for internal use only and should not be used by your gateway driver.

There is no guarantee that a token will be freed in the order it was created or returned by your driver’s VgwStreamGetTokenFnc callback function.
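The two points above (do not rely on the owner member at free time, and do not rely on the order in which tokens are freed) can be illustrated with a small sketch. Everything in it is hypothetical: the reduced VdkTokenRec, the MyStream state, and the my_new_token and my_free_token functions are stand-ins, and my_free_token is not the real VgwStreamFreeTokenFnc signature, which this section does not show. The idea is only that bookkeeping is done through the stream argument and that each token is released independently.

```c
#include <stdlib.h>

/* Hypothetical, reduced stand-ins for the VDK token types. */
typedef void* VdkVoidp;
typedef struct { VdkVoidp owner; /* other members omitted */ } VdkTokenRec;
typedef VdkTokenRec* VdkToken;

/* Hypothetical driver-private stream state. */
typedef struct { int outstanding; } MyStream;

/* Allocate a token, record its owner, and count it as outstanding. */
VdkToken my_new_token(MyStream* stream)
{
    VdkToken tok = (VdkToken)calloc(1, sizeof(VdkTokenRec));
    if (tok) {
        tok->owner = stream;
        stream->outstanding++;
    }
    return tok;
}

/* Release one token. The stream is taken from the argument, never from
 * tok->owner, and nothing here depends on the order of the calls. */
void my_free_token(MyStream* stream, VdkToken tok)
{
    if (!stream || !tok)
        return;
    free(tok);
    stream->outstanding--;
}
```

Because each token carries its own allocation and the stream only keeps a count, freeing tokens in reverse or arbitrary order leaves the stream state consistent.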

The Verity engine performs automatic conversion of field token buffers whose field types are VdkFieldType_Text and VdkFieldType_Stream. The contents of the value.buf.buffer member of the VdkTokenFieldRec structure are converted if the value in value.buf.size is zero. In this case, the buffer must contain a null-terminated string, which the Verity engine converts to the target data type, as specified in a style file. The supported target data types are text, dates, integers, and floating-point numbers. If possible, use the universal date format (YYYYMMDDhhmmss) for dates that you want the Verity engine to convert.

If the value of value.buf.size is greater than zero, the contents of the value.buf.buffer member are not converted; for example, you can use it to store binary data. You specify the size as the length of the data in the buffer exclusive of a terminating null character; a terminating null character is not required when the value of value.buf.size is greater than zero.
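The two cases described above can be sketched as follows: a null-terminated string with a size of zero, which the engine converts, and a sized buffer, which it does not. The typedefs, the numeric field-type values, the reduced SketchFieldRec structure, and all three helper functions are hypothetical stand-ins so that the fragment compiles without the Verity headers. Only strftime is a real library call; it is used here to produce the universal date format.

```c
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical stand-ins; the real definitions come from the Verity
 * headers. */
typedef const char*  VdkCString;
typedef void*        VdkBuffer;
typedef int          VdkInt4;
typedef unsigned int VdkUint4;
enum { VdkFieldType_Text = 1, VdkFieldType_Stream = 2 };  /* assumed values */

/* Reduced stand-in for VdkTokenFieldRec: only the members used here. */
typedef struct {
    VdkCString name;
    VdkUint4   type;
    union {
        struct { VdkBuffer buffer; VdkInt4 size; } buf;
    } value;
} SketchFieldRec;

/* Write `when` as YYYYMMDDhhmmss into out (cap must be at least 15). */
int format_universal_date(const struct tm* when, char* out, size_t cap)
{
    if (!when || !out)
        return -1;
    return strftime(out, cap, "%Y%m%d%H%M%S", when) == 14 ? 0 : -1;
}

/* Size zero: the buffer must be a null-terminated string, and the
 * engine converts it to the field's target data type. */
int set_converted_field(SketchFieldRec* rec, VdkCString name, char* text)
{
    if (!rec || !name || !text)
        return -1;
    memset(rec, 0, sizeof *rec);
    rec->name = name;
    rec->type = VdkFieldType_Text;
    rec->value.buf.buffer = text;
    rec->value.buf.size   = 0;
    return 0;
}

/* Size greater than zero: the bytes are passed through unconverted and
 * need no terminating null character. */
int set_raw_field(SketchFieldRec* rec, VdkCString name, void* data, VdkInt4 len)
{
    if (!rec || !name || !data || len <= 0)
        return -1;
    memset(rec, 0, sizeof *rec);
    rec->name = name;
    rec->type = VdkFieldType_Stream;
    rec->value.buf.buffer = data;
    rec->value.buf.size   = len;
    return 0;
}
```

In this sketch, a date destined for a date field is first formatted as a string and then handed over with a size of zero, leaving the conversion to the engine.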


Example

#include <stdlib.h>   /* malloc, free */
#include <string.h>   /* memset, strlen, strcpy */

#define TEXTBUFFER "... document data ..."

static VdkError
VDK_CALLBACK VgwStreamGetToken(VgwAppStream pStream,
                               VdkToken*    ppToken)
{
    VdkToken pToken = NULL;

    if (!ppToken)
        return VdkError_InvalidArgs;
    *ppToken = NULL;

    if (!pStream)
        return VdkError_InvalidArgs;

    /***********************************************
     * sending one VDK token up the document stream
     ***********************************************/

    if ( !(pToken = (VdkToken)malloc(sizeof(VdkTokenRec))) )
        return VdkError_OutOfMemory;

    memset(pToken, 0, sizeof(VdkTokenRec));

    /**********************************************************
     * if there are no more tokens, send the end-of-file token
     **********************************************************/

    if (pStream->lastToken){
        pToken->type = VdkTokenType_Eof;

    } else {
        /*****************************************************
         * send a buffer token, which contains the above text
         *****************************************************/

        pToken->type = VdkTokenType_Buffer;
        pToken->data.buf.size = (VdkUint4)strlen(TEXTBUFFER);

        /* size excludes the terminating null; allocate one extra byte for it */
        if ( !(pToken->data.buf.buffer =
               (VdkBuffer)malloc(pToken->data.buf.size+1)) ){
            free(pToken);
            return VdkError_OutOfMemory;
        }

        strcpy((char*)pToken->data.buf.buffer, TEXTBUFFER);

        /* all data has been sent, so the next call returns the end-of-file
         * token (assumes the driver does not set lastToken elsewhere) */
        pStream->lastToken = 1;
    }

    /* identify the stream that owns this token */
    pToken->owner = pStream;

    *ppToken = pToken;
    return VdkSuccess;
}