text_to_docs

The text_to_docs function splits a file into multiple documents.

Syntax

text_to_docs( doc, sectionName, filename )

Arguments

Argument       Description
doc            (LuaDocument) The document that you want to divide into multiple documents.
sectionName    (string) The name of the section in the CFS configuration file that contains the TextToDocs configuration parameters. For information about these parameters, see TextToDocs Task Parameters.
filename       (string) The file that contains the text to be converted (the original file that resulted in the document).

Returns

LuaDocuments. A list of document objects representing the documents that are produced.

Example

You might have a connector ingesting files from a repository, but want to split those files into multiple documents. The following example uses the get_filename function to find the path of the file associated with an ingested document, and uses the text_to_docs function to generate multiple documents. This example splits the file using settings in the [MyTextToDocs] section of the HPE CFS configuration file. It adds a PROCESSED field to each resulting document and then calls the ingest function to add the document to the ingest queue. When a generated document later reaches the handler, the PROCESSED field causes the handler to return immediately, so the document is not split a second time.

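function handler(document)
   -- Documents generated below already have the PROCESSED field; do not split them again.
   if document:hasField("PROCESSED") then
      return true
   end

   -- Find the source file for this document and split it into multiple documents.
   local file = get_filename(document)
   local docs = text_to_docs(document, "MyTextToDocs", file)

   -- Mark each generated document and add it to the ingest queue.
   for _, doc in ipairs(docs) do
      doc:addField("PROCESSED", "YES")
      ingest(doc)
   end

   -- Returning true continues processing the original document.
   return true
end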

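In this example, the original documents are also indexed, because the handler returns true for them. If you want to index only the documents generated by the text_to_docs function, you could return false at the end of the handler function, so that the original document is discarded. The early return for documents that have the PROCESSED field must still return true, otherwise the generated documents are discarded as well.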
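
The variation described above might look like the following. This is a minimal sketch rather than a definitive implementation: it assumes the same [MyTextToDocs] configuration section and the same PROCESSED marker field as the example, and it assumes that text_to_docs always returns a list that can be iterated with ipairs.

```lua
-- Sketch only: a variant of the example handler that discards the original document.
function handler(document)
   -- Generated documents carry the PROCESSED field; let them continue to be indexed.
   if document:hasField("PROCESSED") then
      return true
   end

   -- Split the source file using the settings in [MyTextToDocs] (assumed section name).
   local file = get_filename(document)
   local docs = text_to_docs(document, "MyTextToDocs", file)

   -- Mark each generated document and add it to the ingest queue.
   for _, doc in ipairs(docs) do
      doc:addField("PROCESSED", "YES")
      ingest(doc)
   end

   -- Returning false discards the original document, so only the
   -- generated documents are indexed.
   return false
end
```

Only the final return value differs from the example. The early return for documents that have the PROCESSED field stays true so that the generated documents are not discarded when they reach the handler.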

