parse_document_xml

The parse_document_xml function parses an XML file into documents and calls a function on each document.

Syntax

parse_document_xml( filename, handler [, params ] )

Arguments

Argument Description
filename (string) The path and file name of the XML file to parse into documents.
handler (document_handler_function) The function to call on each document that is parsed from the XML file. The function must accept a LuaDocument as its only argument.
params (table) A table of named parameters that configure parsing. The table maps parameter names (strings) to parameter values. For information about the parameters that you can set, see the following table.

Named Parameters

Named Parameter Description
content_paths (string list, default DRECONTENT) The paths in the XML to the elements that contain document content. You can specify a list of paths.
document_root_paths (string list, default DOCUMENT) The paths in the XML to the elements that represent the root of a document. You can specify a list of paths.
include_root_path (boolean, default false) Specifies whether to include the node matched by document_root_paths in the document metadata. When this parameter is false (the default), only the children of the root node are included.
reference_paths (string list, default DREREFERENCE) The paths in the XML to the elements that contain document references. Although you can specify a list of paths, each document must have exactly one reference.

Example

The following example parses an XML file named data.xml and calls the function printReference on each document. Two values are set for the named parameter content_paths. You might do this if content is spread across multiple fields, or if you want to use the same script with XML files that have different schemas.

local function printReference(document)
    print(document:getReference())
end

local xmlParams = {
    document_root_paths = {"DOC"},
    reference_paths = {"REF"},
    content_paths = {"CONTENT", "MORE_CONTENT"}
}

parse_document_xml("./data.xml", printReference, xmlParams)

Returns

Nothing.
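Because parse_document_xml returns nothing, any results you need must be captured by the handler itself. The following is a minimal sketch, not taken from the product documentation, that collects each document reference into a Lua table through a closure. The names collectReference and references are hypothetical, and the XML layout (DOC, REF, and CONTENT elements) is assumed to match the earlier example. The sketch also sets include_root_path so that the DOC node itself is kept in the metadata. The call is guarded so that the handler can be exercised in a plain Lua interpreter, where parse_document_xml is not defined.

```lua
-- Hypothetical accumulator: parse_document_xml returns nothing, so the
-- handler stores what it needs in a table that outlives the call.
local references = {}

-- Hypothetical handler name; it receives one LuaDocument per parsed document.
local function collectReference(document)
    table.insert(references, document:getReference())
end

-- Assumed XML layout: <DOC> elements with <REF> and <CONTENT> children.
local xmlParams = {
    document_root_paths = {"DOC"},
    reference_paths = {"REF"},
    content_paths = {"CONTENT"},
    -- Keep the DOC node itself in the metadata, not only its children.
    include_root_path = true
}

-- parse_document_xml exists only in the host environment that provides it;
-- outside that environment this call is skipped.
if parse_document_xml then
    parse_document_xml("./data.xml", collectReference, xmlParams)
    print(#references .. " document(s) parsed")
end
```

After the call returns, the references table holds one entry per document that the handler received.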


_HP_HTML5_bannerTitle.htm