Setting Up Parametric Search

Most enterprise information is semi-structured. For instance, a text document (unstructured) commonly includes associated metadata (structured) such as author, content source, date of creation, size, format, and language. Textual product pages in an online catalog commonly include extensive metadata relating to product features.

Parametric search is a Verity search capability in which users can locate information by simultaneously selecting values in structured metadata and searching through unstructured text.

In a typical setup, each document used for searching includes both unstructured data and structured attributes. For example, in the case of documents that describe cars, attributes might include Color, Price, Make, Model, Mileage, Location, and Year. Attributes can have numeric, date, or string values. The free-text portion of a search queries the unstructured data, while the parametric-selection portion queries the structured data through its attributes.

For example, a parametric-search query for a car might be expressed like this:

Find Red cars less than $15,000 with “air conditioning”

in which Red is one of the available values for the Color parameter, less than $15,000 selects a range from the Price parameter, and air conditioning is a phrase to be searched for in the unstructured text.
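
The query above can be sketched in a few lines of Python. This is only an illustration of the idea, not Verity's API: the CarDoc record, the parametric_search function, and the sample documents are all hypothetical, and a plain substring test stands in for the engine's text search.

```python
from dataclasses import dataclass


@dataclass
class CarDoc:
    """Hypothetical document: structured attributes plus free text."""
    doc_id: int
    color: str    # structured attribute (string)
    price: float  # structured attribute (numeric)
    text: str     # unstructured portion


def parametric_search(docs, color=None, max_price=None, phrase=None):
    """Return the documents that satisfy every criterion that was supplied."""
    hits = []
    for d in docs:
        # Parametric selection: exact value for Color.
        if color is not None and d.color != color:
            continue
        # Parametric selection: range on Price (strictly less than).
        if max_price is not None and not d.price < max_price:
            continue
        # Free-text portion: naive case-insensitive phrase match.
        if phrase is not None and phrase.lower() not in d.text.lower():
            continue
        hits.append(d)
    return hits


# Made-up sample data for illustration only.
DOCS = [
    CarDoc(1, "Red", 12500, "Low mileage, air conditioning, one owner."),
    CarDoc(2, "Red", 17900, "Air conditioning and leather seats."),
    CarDoc(3, "Silver", 9800, "Air conditioning, new tires."),
    CarDoc(4, "Red", 14000, "Manual windows, no extras."),
]

result = parametric_search(
    DOCS, color="Red", max_price=15000, phrase="air conditioning"
)
print([d.doc_id for d in result])  # prints [1]
```

Only document 1 satisfies all three conditions: document 2 is too expensive, document 3 is the wrong color, and document 4 does not mention air conditioning.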

In addition to combining text queries with parametric values, parametric search can rank results by either text-query scores or parameter values. The application can then sort the results for the user and allow the user to further refine or broaden the search.
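
The two ranking options can be illustrated with a small Python sketch. Everything here is an assumption made for the example: the text_score and rank functions are hypothetical, the sample hits are invented, and counting phrase occurrences is a toy stand-in for whatever relevance scoring the engine actually performs.

```python
def text_score(text, phrase):
    """Toy relevance score: how many times the phrase occurs in the text."""
    return text.lower().count(phrase.lower())


def rank(hits, phrase, by="score"):
    """Order hits by text score (highest first) or by a parameter value (lowest first)."""
    if by == "score":
        return sorted(
            hits, key=lambda h: text_score(h["text"], phrase), reverse=True
        )
    return sorted(hits, key=lambda h: h[by])


# Made-up result set for illustration only.
HITS = [
    {"id": 1, "price": 12500,
     "text": "Air conditioning. Recently serviced air conditioning."},
    {"id": 2, "price": 9900,
     "text": "Air conditioning, sunroof."},
    {"id": 3, "price": 14200,
     "text": "Dual-zone air conditioning; rear air conditioning; "
             "air conditioning filter replaced."},
]

by_score = [h["id"] for h in rank(HITS, "air conditioning")]
by_price = [h["id"] for h in rank(HITS, "air conditioning", by="price")]
print(by_score)  # prints [3, 1, 2]
print(by_price)  # prints [2, 1, 3]
```

The same result set is presented in a different order depending on whether the application sorts on the text-query score or on the Price parameter.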

Parametric Indexes

Parametric indexes are the structures that underlie parametric selection. You can think of a parametric index as an extension to a collection’s word index; the Verity engine uses it to identify documents matching the requested parameters.

Conceptually, you can view a parametric index as an n-dimensional “parametric cube,” a matrix in which each dimension represents a parameter. Figure 3-3 shows a portion of a three-dimensional version, in which the parameters are color, model year, and price.

Figure 3-3    A “parametric cube”

Each individual value or range of values for a parameter is called a bucket, and each parameter as a whole (each dimension of the cube) makes up a bucket set. Each bucket holds references to the documents that have that value for that parameter. For example, in the portion of the cube shown in Figure 3-3, the Color bucket set contains three buckets: Black, Red, and Silver. The Red bucket identifies all documents in the collection that relate to red cars.

In this example, a parametric selection for all red cars costing less than $15,000 and made in either 1999 or 2000 (the shaded area in Figure 3-3) returns only those documents whose field values satisfy all three criteria.
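
One way to picture buckets and bucket sets is as sets of document IDs grouped by parameter value. The following Python sketch is a conceptual model only, not the engine's actual index structure; the build_bucket_set and price_bucket helpers, the bucket labels, and the sample data are hypothetical.

```python
from collections import defaultdict


def build_bucket_set(docs, field, bucket_of=lambda value: value):
    """Map each bucket (a value or range label) to the IDs of matching documents."""
    buckets = defaultdict(set)
    for doc_id, fields in docs.items():
        buckets[bucket_of(fields[field])].add(doc_id)
    return buckets


def price_bucket(price):
    """Assign a price to one of two hypothetical range buckets."""
    return "under 15000" if price < 15000 else "15000 and over"


# Made-up collection for illustration only.
DOCS = {
    1: {"color": "Red", "year": 1999, "price": 12000},
    2: {"color": "Red", "year": 2000, "price": 14500},
    3: {"color": "Red", "year": 2001, "price": 13000},
    4: {"color": "Black", "year": 1999, "price": 11000},
    5: {"color": "Red", "year": 2000, "price": 18000},
    6: {"color": "Silver", "year": 2000, "price": 9000},
}

# One bucket set per parameter (one per dimension of the cube).
index = {
    "color": build_bucket_set(DOCS, "color"),
    "year": build_bucket_set(DOCS, "year"),
    "price": build_bucket_set(DOCS, "price", price_bucket),
}

# Red AND (1999 OR 2000) AND under $15,000.
selection = (
    index["color"]["Red"]
    & (index["year"][1999] | index["year"][2000])
    & index["price"]["under 15000"]
)
print(sorted(selection))  # prints [1, 2]
```

Choosing several buckets within one bucket set (1999 or 2000) is a union, while combining criteria across bucket sets is an intersection, which corresponds to the shaded region of the cube.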

Administrators or knowledge workers can build parametric indexes on top of existing Verity collections or directly from specifically formatted XML documents.

To create a parametric index, the administrator first creates an XML-based outline file that specifies the collection or XML fields from which to create the parameters. The administrator can then use the mkpi command-line tool to create the parametric index itself. Alternatively, a knowledge worker can use the graphical interface of the Verity Collaborative Classifier to set up the outline file and create the index.

For more information on configuring and using parametric selection, see the Verity K2 Dashboard Administrator Guide, the Verity Collaborative Classifier Guide, or the Verity K2 Parametric Developer Guide.

Note   Parametric indexes can also support user browsing and selection from taxonomies (hierarchical classifications of information). See About Taxonomies and About Relational Taxonomies.