Returning Document Summaries

Document summarization generates content summaries for the documents listed on the search-results page. Its goal is to speed up browsing of the results returned by the Verity search engine.

By presenting a short summary for each document in a results list, a Verity application helps users quickly assess the relevance of the returned documents without loading and skimming the full text of each one.

The following types of document summarization are available in K2:

Static summaries. K2 applications can support two types of static summarization:

Simple summarization displays information from the beginning of a document, for example, the first 400 bytes.

Content summarization generates summaries by selecting sentences from the document that are indicative of the overall theme of the text. This kind of summarization relies on Verity feature extraction, as described in Extracting Document Features.

Static summaries are enabled by the administrator and created at collection-indexing time.
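As an illustration of the simple summarization idea described above, the following Python sketch takes the leading bytes of a document's text. It is not Verity code and does not use any Verity API; the function name simple_summary, the UTF-8 default, and the trailing "..." marker are assumptions made for this example only.

```python
def simple_summary(text: str, max_bytes: int = 400, encoding: str = "utf-8") -> str:
    """Return the leading part of `text` that fits in `max_bytes` encoded bytes.

    Hypothetical helper for illustration; not part of any Verity API.
    """
    raw = text.encode(encoding)
    if len(raw) <= max_bytes:
        return text  # short documents are returned unchanged

    # Cut at the byte limit. errors="ignore" drops a multi-byte character
    # that the cut may have split in half.
    clipped = raw[:max_bytes].decode(encoding, errors="ignore")

    # Back up to the last space so the summary does not end mid-word.
    head, sep, _ = clipped.rpartition(" ")
    body = head if sep else clipped
    return body.rstrip() + "..."
```

The limit is counted in bytes rather than characters, so text in a multi-byte encoding yields fewer characters per summary. The "..." marker is appended after the cut and is not counted against the limit.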

Passage-based summaries. A passage-based summary consists of one or more passages (sentences or phrases) from the document, each containing instances, usually highlighted, of the search terms that were used to locate the document. For example, with passage-based summaries enabled, searching for the term report in a collection might yield a result like this:

Installed Reporting Components
...components in the tree: The report server. This is a standard K2 Server whose... one report server in a K2 domain. Its alias is report_server. The ... report index is attached to the report server. There is only one report index in ...

Your application displays a passage-based summary for a document by making calls to the Client C API or the VSearch Java API. Passage-based summaries must be enabled by the administrator at the time of indexing.
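To make the idea of a passage-based summary concrete, the following Python sketch builds one from plain text. It is a generic illustration only: it does not call the Client C API or the VSearch Java API, and the function name passage_summary, the fixed character window, and the square-bracket highlighting are all assumptions made for this example.

```python
import re


def passage_summary(text, terms, width=40, max_passages=3):
    """Join short windows of text around search-term hits, marking each hit.

    Hypothetical helper for illustration; not part of any Verity API.
    """
    if not terms:
        return ""
    # Match any search term as a whole word, ignoring case.
    alternatives = "|".join(re.escape(t) for t in terms)
    pattern = re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)
    hits = list(pattern.finditer(text))

    passages = []
    last_end = -1
    for hit in hits:
        if hit.start() < last_end:
            continue  # this hit starts inside the previous passage
        start = max(0, hit.start() - width)
        end = min(len(text), hit.end() + width)

        # Rebuild the window, bracketing every hit that lies fully inside it.
        pieces, cursor = [], start
        for h in hits:
            if h.start() >= start and h.end() <= end:
                pieces.append(text[cursor:h.start()])
                pieces.append("[" + h.group(0) + "]")
                cursor = h.end()
        pieces.append(text[cursor:end])

        passages.append("".join(pieces).strip())
        last_end = end
        if len(passages) == max_passages:
            break

    if not passages:
        return ""
    return "..." + " ... ".join(passages) + "..."
```

For example, passage_summary("The report server is", ["report"], width=4) returns "...The [report] ser...". A display layer would normally replace the brackets with its own highlighting markup.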

For more information on summarization, see the Verity Developer’s Kit Programming Reference, the Verity Collection Reference, and the Verity K2 Client Programming Guide.