The text_to_docs function splits a file into multiple documents.
text_to_docs(doc, sectionName, filename)
|doc|(LuaDocument) The document that you want to divide into multiple documents.|
|sectionName|(string) The name of the section in the CFS configuration file that contains the TextToDocs configuration parameters. For information about these parameters, see TextToDocs Task Parameters.|
|filename|(string) The file that contains the text to be converted (the original file that resulted in the document).|
LuaDocuments. A list of document objects representing the documents that are produced.
You might have a connector ingesting files from a repository, but want to split those files into multiple documents. The following example uses the get_filename function to find the path of the file associated with an ingested document, and uses the
text_to_docs function to generate multiple documents. This example splits the file using settings in the
[MyTextToDocs] section of the HPE CFS configuration file. It then calls the ingest function to add the resulting documents to the ingest queue.
function handler(document)
    if document:hasField("PROCESSED") then
        return true
    end
    local file = get_filename(document)
    local docs = text_to_docs(document, "MyTextToDocs", file)
    for i, doc in ipairs(docs) do
        doc:addField("PROCESSED", "YES")
        ingest(doc)
    end
    return true
end
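To trace the control flow of this handler outside CFS, the following minimal sketch wraps it in a dry-run harness. The new_document helper and the stub versions of get_filename, text_to_docs, and ingest are hypothetical stand-ins written for illustration only. They are not the CFS implementations, and the stubbed text_to_docs simply pretends that every file splits into three documents. The handler itself is the same as in the example.

```lua
-- Dry-run harness. Everything except handler() is a hypothetical stand-in,
-- NOT the real CFS API.

-- Minimal stand-in for a LuaDocument: only hasField and addField are mocked.
local function new_document(reference)
    local doc = { reference = reference, fields = {} }
    function doc:hasField(name) return self.fields[name] ~= nil end
    function doc:addField(name, value) self.fields[name] = value end
    return doc
end

local ingested = {}  -- records every document passed to ingest()

-- Stub: derive a made-up path from the document reference.
function get_filename(document)
    return "/tmp/" .. document.reference
end

-- Stub: pretend the file always splits into three documents.
function text_to_docs(document, sectionName, filename)
    local docs = {}
    for i = 1, 3 do
        docs[i] = new_document(document.reference .. "#" .. i)
    end
    return docs
end

-- Stub: record the document instead of adding it to an ingest queue.
function ingest(document)
    ingested[#ingested + 1] = document
end

-- Handler from the example, unchanged in behavior.
function handler(document)
    if document:hasField("PROCESSED") then
        return true
    end
    local file = get_filename(document)
    local docs = text_to_docs(document, "MyTextToDocs", file)
    for i, doc in ipairs(docs) do
        doc:addField("PROCESSED", "YES")
        ingest(doc)
    end
    return true
end

local original = new_document("report.txt")
local keep = handler(original)
print(keep, #ingested)

-- A generated document that reaches the handler again is passed through,
-- because it already carries the PROCESSED field.
handler(ingested[1])
print(#ingested)
```

In this dry run, the PROCESSED check is what prevents a generated document from being split a second time if it reaches the handler again.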
In this example, the original documents are also indexed. If you want to index only the documents generated by the
text_to_docs function, you could return
false from the