Adds Parameter

The adds parameter specifies XML that describes the document to ingest. A document must have a unique reference. It can consist of metadata only, content only, or metadata and content. You can specify content using either plain text or a file.

The following XML describes a document with metadata and a file:

<adds>
   <add>
      <document>
         <reference>http://www.example.com/</reference>
         <metadata name="Field1" value="Value1"/>
         <metadata name="Field2" value="Value2"/>
      </document>
      <source filename="MyFile.doc" lifetime="permanent"/>
   </add>
</adds>
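
The introduction notes that a document can also consist of metadata only. A minimal sketch of that case, assuming that you express it by omitting the source element (which the table below lists as optional); the field names and values are placeholders:

<adds>
   <add>
      <document>
         <reference>http://www.example.com/</reference>
         <!-- Placeholder fields; no source or pages element is supplied (assumption) -->
         <metadata name="Field1" value="Value1"/>
         <metadata name="Field2" value="Value2"/>
      </document>
   </add>
</adds>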

You can specify XML metadata using the xmlmetadata element:

<adds>
   <add>
      <document>
         <reference>http://www.example.com/</reference>
         <xmlmetadata>
            <Field1>Value1</Field1>
            <Field2>
               <SubFieldOne>First</SubFieldOne>
               <SubFieldTwo>Second</SubFieldTwo>
            </Field2>
         </xmlmetadata>
      </document>
      <source filename="MyFile.doc" lifetime="permanent"/>
   </add>
</adds>

In the preceding examples, the source is a file on the file system. You can also specify the source as a base64-encoded string:

<adds>
   <add>
      <document>
         <reference>http://www.example.com/</reference>
      </document>
      <source content="U29tZSB0ZXh0DQo="/>
   </add>
</adds>
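
The two approaches can presumably be combined to send metadata together with inline content. The following sketch assumes that metadata elements and a base64 source can appear in the same add element, as they do for a file source; the field name and value are placeholders. The content string is the one used above, which decodes to the text "Some text" followed by a carriage return and line feed:

<adds>
   <add>
      <document>
         <reference>http://www.example.com/</reference>
         <!-- Placeholder field, shown only to illustrate metadata plus inline content (assumption) -->
         <metadata name="Field1" value="Value1"/>
      </document>
      <!-- Base64 for "Some text" plus CR LF -->
      <source content="U29tZSB0ZXh0DQo="/>
   </add>
</adds>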

You can also specify the plain text for each section using the pages element:

<adds>
   <add>
      <document>
         <reference>http://www.example.com/</reference>
         <pages content="Page 1 content"/>
         <pages content="Page 2 content"/>
      </document>
   </add>
</adds>

To ingest multiple documents, specify multiple <add> elements. The following table describes the XML elements.

XML element Description
add (required)

The add element describes a single document for CFS to ingest. To ingest multiple documents, use additional add elements.

Each add element should contain a document element that describes the document to be indexed. It can also contain a source element, which describes a binary file to be ingested.

document (required)

The document element describes the document content and metadata.

Each document element must contain a reference element that gives the unique reference for the document. It can also contain metadata, xmlmetadata, or pages elements.

reference (required)

The reference element provides a unique reference (DREREFERENCE) for the document.

metadata (optional)

The metadata element provides a key-value pair that describes metadata to be ingested. It takes two attributes:

  • name is the name of the metadata field.
  • value is the value of the field.

xmlmetadata (optional)

The xmlmetadata element contains any XML metadata that should be associated with the document.

pages (optional)

The pages element specifies any filtered document content to send with the document. You can use multiple pages elements; each maps to a separate DRESECTION.

page (optional)

The page element specifies the content for a single DRESECTION. Specify the content as plain text.

source (optional)

The source element describes the location or content of the binary file.

You must set either the filename attribute or the content attribute.

  • filename is the full path to the document.
  • content is the binary content of the document as a base64 string.

If you specify the location of a file using filename, you must also set the lifetime attribute. This attribute can take one of two values:

  • permanent instructs CFS not to delete the file.
  • temporary instructs CFS to delete the file after it has been ingested.

NOTE:

The lifetime attribute is ignored when you ingest an IDX file. CFS does not delete IDX files that are ingested.
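
None of the examples in this section use the temporary lifetime. A minimal sketch, assuming a hypothetical staging path, of a file that CFS should delete after it has been ingested:

<adds>
   <add>
      <document>
         <reference>http://www.example.com/</reference>
      </document>
      <!-- Hypothetical path; lifetime="temporary" asks CFS to delete the file after ingestion -->
      <source filename="C:/Staging/MyFile.doc" lifetime="temporary"/>
   </add>
</adds>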

Example

<adds>
   <add>
      <document>
         <reference>C:\Autonomy\newfs\data\050309-020409.xls</reference>
         <xmlmetadata>
            <AUTN_GROUP>fs</AUTN_GROUP>
            <AUTN_IDENTIFIER>PGlkIHM9IlRBU0sxIiByPSJDOlxBdXRvbm9teVxuZXdmc1xkYXRhXDA1MDMwOS0wMjA0MDkueGxzIi8+</AUTN_IDENTIFIER>
            <CREATED>2012-Feb-13 10:42:56.232479</CREATED>
            <DocTrackingId>9f3684f499aef4a9c025a43d8125029f</DocTrackingId>
            <DREDBNAME>Test</DREDBNAME>
            <FILESIZE>61440</FILESIZE>
            <LASTACCESSED>2012-Feb-13 10:42:56.232479</LASTACCESSED>
            <LASTCHANGED>2012-Feb-14 09:16:38.250813</LASTCHANGED>
            <LASTMODIFIED>2009-Apr-06 10:21:15.032472</LASTMODIFIED>
         </xmlmetadata>
      </document>
      <source filename="C:/Autonomy/newfs/data/050309-020409.xls" lifetime="permanent"/>
   </add>

   <add>
      <document>
         <reference>C:\Autonomy\newfs\data\070610-100610.xls</reference>
         <xmlmetadata>
            <AUTN_GROUP>fs</AUTN_GROUP>
            <AUTN_IDENTIFIER>PGlkIHM9IlRBU0sxIiByPSJDOlxBdXRvbm9teVxuZXdmc1xkYXRhXDA3MDYxMC0xMDA2MTAueGxzIi8+</AUTN_IDENTIFIER>
            <CREATED>2012-Feb-13 10:42:56.232479</CREATED>
            <DocTrackingId>4d7fbaa368fa1727177b9f1ef06caa57</DocTrackingId>
            <DREDBNAME>Test</DREDBNAME>
            <FILESIZE>54784</FILESIZE>
            <LASTACCESSED>2012-Feb-13 10:42:56.232479</LASTACCESSED>
            <LASTCHANGED>2012-Feb-14 09:16:38.235226</LASTCHANGED>
            <LASTMODIFIED>2010-Jun-14 13:52:24.041742</LASTMODIFIED>
         </xmlmetadata>
      </document>
      <source filename="C:/Autonomy/newfs/data/070610-100610.xls" lifetime="permanent"/>
   </add>
</adds>

Ingest an IDX File

You can ingest an IDX file by using the <source> element to specify the path to the file.

<adds>
   <add>
      <source filename="data.idx" />
   </add>
</adds>

If the <add> element includes a <document> element specifying a reference, metadata, or content:

If the IDX file contains sectioned documents, the sections are merged into a single document.
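
A sketch of an add element of this kind, pairing an IDX source with a document element. The reference and the metadata field are placeholders, and how CFS applies them to the documents in the IDX file is not shown here:

<adds>
   <add>
      <document>
         <!-- Placeholder reference and field (assumptions for illustration) -->
         <reference>http://www.example.com/</reference>
         <metadata name="Field1" value="Value1"/>
      </document>
      <source filename="data.idx" />
   </add>
</adds>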

If CFS fails to parse the IDX file and no documents were successfully extracted, the IDX file is processed as a regular file. If CFS fails to parse the IDX file but at least one document was successfully extracted, an error is logged and the remainder of the file is not processed.

