HtmlExtraction

The HtmlExtraction task extracts the useful information from an HTML file and discards irrelevant content, such as invalid HTML, headers, sidebars, advertisements, and scripts.

HPE recommends that you configure the HtmlExtraction task as a Pre-import task. For example:

[ImportTasks]
Pre0=HtmlExtraction

This task has no configuration parameters.
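
To give a rough idea of the kind of filtering described above, the following is a minimal conceptual sketch in Python. It is not the implementation used by the HtmlExtraction task, whose rules are not described here. The class name MainTextExtractor, the function extract_main_text, and the SKIPPED_TAGS set are all hypothetical and chosen only for illustration. The sketch keeps text that appears outside a small set of elements that commonly hold scripts, headers, and sidebars.

```python
from html.parser import HTMLParser

# Hypothetical list of elements treated as irrelevant content.
# The real task's criteria are an unknown; this set is an assumption.
SKIPPED_TAGS = {"script", "style", "header", "footer", "nav", "aside"}


class MainTextExtractor(HTMLParser):
    """Collects text that is not nested inside any skipped element."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # how many skipped elements are currently open
        self.chunks = []      # text fragments that were kept

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def extract_main_text(html):
    """Return the kept text fragments, one per line."""
    parser = MainTextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)
```

For instance, calling extract_main_text on a page that contains a nav menu, an inline script, and a single paragraph would return only the paragraph text. A real extractor would need far more than a fixed tag list (for example, handling malformed markup and advertisements that use ordinary elements), so treat this only as an illustration of the idea.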

