Clustering Results

K2 applications can use document clustering to group related documents on a search results page. Grouping documents this way gives the user a sense of the main subject areas covered in a set of search results. For example, if only one of several document groups in the search results is of interest, the user can focus on that group without wasting time scanning the rest.

Clustering works by analyzing the feature vectors of the documents in a set and grouping together documents that are more semantically similar to each other than they are to the documents in other clusters. Each document is assigned to exactly one cluster.
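
The idea of grouping documents by feature-vector similarity, with each document landing in exactly one cluster, can be illustrated with a minimal sketch. It is not the Verity engine's algorithm, which this section does not describe; it assumes plain term-frequency vectors, cosine similarity, and a simple k-means-style loop seeded with the first k documents. All function names (feature_vector, cosine, centroid, cluster) are hypothetical.

```python
import math
from collections import Counter


def feature_vector(text):
    """Unit-length term-frequency vector; a stand-in for real feature vectors."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {term: c / norm for term, c in counts.items()}


def cosine(a, b):
    """Cosine similarity of two sparse vectors that are already unit length."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def centroid(vectors):
    """Normalized sum of the member vectors of one cluster."""
    total = Counter()
    for v in vectors:
        for t, w in v.items():
            total[t] += w
    norm = math.sqrt(sum(w * w for w in total.values())) or 1.0
    return {t: w / norm for t, w in total.items()}


def cluster(texts, k, iterations=10):
    """Return one cluster label per document (hard assignment)."""
    vectors = [feature_vector(t) for t in texts]
    centroids = vectors[:k]  # assumption: naive seeding with the first k documents
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Each document goes to the single most similar centroid.
        labels = [
            max(range(len(centroids)), key=lambda c: cosine(v, centroids[c]))
            for v in vectors
        ]
        # Recompute each centroid; keep the old one if a cluster is empty.
        centroids = [
            centroid([v for v, l in zip(vectors, labels) if l == c]) or centroids[c]
            for c in range(len(centroids))
        ]
    return labels


docs = [
    "apple banana fruit",
    "stock market shares",
    "banana fruit salad",
    "market shares trading",
]
print(cluster(docs, k=2))
```

Because every document receives exactly one label, the result is a partition of the document set. The naive seeding keeps the sketch deterministic, but it can give poor groupings when the first k documents happen to be similar to each other.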

Document clustering is an inherently ambiguous process. There is no one “correct” grouping of documents into clusters. The number of clusters can be fixed in advance, or it can be automatically determined by the Verity engine based on the application’s preference for cluster granularity. Documents are clustered on the basis of their text content only, and not on the basis of meta-information such as title or other fields.
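
One way to picture a cluster count that follows from a granularity preference rather than a fixed number is a threshold-based "leader" scheme, sketched below. This is an illustration only and makes no claim about how the Verity engine chooses the count. The granularity parameter, the leader_cluster function, and the use of Jaccard similarity over word sets are all assumptions made for the example.

```python
def jaccard(a, b):
    """Jaccard similarity of two word sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def leader_cluster(texts, granularity):
    """Assign each document to one cluster; the cluster count is not fixed.

    granularity is a hypothetical knob in [0, 1]: higher values demand
    closer matches and therefore yield more, smaller clusters.
    """
    leaders = []  # word set of the first document placed in each cluster
    labels = []
    for text in texts:
        words = set(text.lower().split())
        best, best_sim = None, -1.0
        for idx, leader in enumerate(leaders):
            sim = jaccard(words, leader)
            if sim > best_sim:
                best, best_sim = idx, sim
        if best is not None and best_sim >= granularity:
            labels.append(best)  # close enough to an existing cluster
        else:
            leaders.append(words)  # start a new cluster with this document
            labels.append(len(leaders) - 1)
    return labels


docs = [
    "apple banana fruit",
    "stock market shares",
    "banana fruit salad",
    "market shares trading",
]
for g in (0.0, 0.4, 0.9):
    labels = leader_cluster(docs, g)
    print(g, labels, "clusters:", max(labels) + 1)
```

The sketch also shows why no single grouping is "correct": the same four documents fall into one, two, or four clusters depending only on the threshold chosen.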

When displaying a cluster, a Verity application can also display the most important keywords for that cluster, which further helps the user find the most relevant information quickly.
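
How cluster keywords might be chosen can be sketched as follows. The section does not say how the Verity engine ranks keywords, so this example assumes a simple scoring rule: a term's frequency within the cluster, weighted down when the term also appears in other clusters. The cluster_keywords function and its top_n parameter are hypothetical.

```python
import math
from collections import Counter


def cluster_keywords(texts, labels, top_n=3):
    """Return the top_n highest-scoring terms for each cluster label."""
    # Term counts per cluster.
    per_cluster = {}
    for text, label in zip(texts, labels):
        per_cluster.setdefault(label, Counter()).update(text.lower().split())

    # Number of clusters in which each term occurs.
    cluster_freq = Counter()
    for counts in per_cluster.values():
        cluster_freq.update(counts.keys())

    k = len(per_cluster)
    keywords = {}
    for label, counts in per_cluster.items():
        # Assumed scoring rule: frequency in the cluster times a penalty
        # for terms that are spread across many clusters.
        scored = {
            t: c * math.log(1 + k / cluster_freq[t]) for t, c in counts.items()
        }
        # Sort by descending score, then alphabetically for stable output.
        ranked = sorted(scored, key=lambda t: (-scored[t], t))
        keywords[label] = ranked[:top_n]
    return keywords


docs = [
    "apple banana fruit report",
    "stock market shares report",
    "banana fruit salad report",
    "market shares trading report",
]
labels = [0, 1, 0, 1]  # assumed to come from an earlier clustering step
print(cluster_keywords(docs, labels, top_n=2))
```

The word "report" occurs in every document, so it scores lower than terms concentrated in one cluster and does not appear among the top keywords for either group.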