Implementing Search

The K2 search features provide your application with many choices for giving users precise results. By using the correct type of search, users can quickly find the information they need. Even one-word queries can return accurate results.

Your application conducts searches by making calls to the K2 VSearch Java API or the Verity Client C API.

Simple Search

By default, single words are searched case-insensitively and strings of words are searched as phrases. Your application can force case-sensitive searching by adding the <CASE> operator to the user’s search term. Your application can force a search for the individual words in a phrase (such as air conditioning) by converting the string to a comma-separated list of terms (air,conditioning) before submitting the query to K2.
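
The two conversions described above can be sketched as plain string handling done before the query is submitted. This is a minimal sketch, not part of the K2 API: the class and method names are hypothetical, and it assumes the <CASE> operator is written directly in front of the term.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

// Hypothetical helper; only the resulting query strings come from the text above.
final class SimpleQueryTerms {

    // Prefix the term with <CASE> to request a case-sensitive match.
    // Assumption: the operator is written directly before the term.
    static String caseSensitive(String term) {
        return "<CASE>" + term.trim();
    }

    // Turn "air conditioning" into "air,conditioning" so the words are
    // searched individually instead of as a phrase.
    static String asIndividualWords(String phrase) {
        return Arrays.stream(phrase.trim().split("\\s+"))
                .filter(word -> !word.isEmpty())
                .collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        System.out.println(caseSensitive("Verity"));
        System.out.println(asIndividualWords("air conditioning"));
    }
}
```

The resulting strings would then be passed to whichever API call your application uses to submit a query.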

Stemmed Search

Stemmed search is a fuzzy search (an inexact search) that returns occurrences of indexed words whose word stems match the search term. For example, a stemmed search for dance returns documents that contain dance or dancer or dances.

To conduct a stemmed search, your application adds the <STEM> operator to the user’s search terms.
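
Adding an operator to a term is a small string operation. The sketch below assumes the operator is placed directly before the term, in the same way as the <LANG/fr><STEM>fort example later in this section. The names OperatorTerms, withOperator, and stemmed are hypothetical and do not come from the K2 API.

```java
import java.util.Locale;

// Hypothetical helper for prefixing a search term with a query operator.
final class OperatorTerms {

    // Builds "<OPERATOR>term". The operator name is restricted to letters
    // so that a malformed name fails early instead of producing a broken query.
    static String withOperator(String operator, String term) {
        if (!operator.matches("[A-Za-z]+")) {
            throw new IllegalArgumentException("unexpected operator name: " + operator);
        }
        return "<" + operator.toUpperCase(Locale.ROOT) + ">" + term.trim();
    }

    // Convenience wrapper for a stemmed search.
    static String stemmed(String term) {
        return withOperator("STEM", term);
    }

    public static void main(String[] args) {
        System.out.println(stemmed("dance"));
    }
}
```

The same helper could build the other single-operator queries described in this section by passing a different operator name.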

Typo Search

Typo search is a fuzzy search that allows your users to find and retrieve documents even when they misspell the search terms. K2 allows a customizable, limited range of deviation in spelling between search terms and their indexed equivalents. For example, a typo search for contiment might return documents containing either continent or condiment.

To conduct a typo search, your application adds the <TYPO> operator to the user’s search term.
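
This guide does not say how K2 measures the deviation between a search term and an indexed word. One common measure is edit distance (the number of single-character insertions, deletions, or substitutions), and the sketch below uses it purely as an illustration of why contiment is close to both continent and condiment. It is not K2's implementation, and the class name is hypothetical.

```java
// Illustration only: classic Levenshtein edit distance between two words.
final class EditDistanceSketch {

    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        // Distance from the empty prefix of a to each prefix of b.
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertOrDelete = Math.min(curr[j - 1], prev[j]) + 1;
                curr[j] = Math.min(insertOrDelete, prev[j - 1] + cost);
            }
            // Reuse the two rows instead of allocating a full matrix.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(distance("contiment", "continent"));
        System.out.println(distance("contiment", "condiment"));
    }
}
```

Both words differ from contiment by a single substituted letter, which is the kind of small deviation a typo search is meant to tolerate.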

Synonym Search

Synonym search is a fuzzy search that returns occurrences of the search term or any of its synonyms. The synonyms are listed in a thesaurus file. For example, a synonym search for brave might return documents that contain brave or courageous or fearless.

To conduct a synonym search, your application adds the <THESAURUS> operator to the user’s search term. A properly constructed and compiled thesaurus file must be installed. K2 is delivered with default thesauruses for some languages; for others, the administrator may have to create a thesaurus file, as described in Using Thesauruses.

Soundex Search

Soundex search is a fuzzy search that retrieves documents containing terms that are phonetically similar to the search term. For example, a Soundex search for Joan might return documents containing either Jean or Jane.

To conduct a Soundex search, your application adds the <SOUNDEX> operator to the user’s search term.
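
To show what "phonetically similar" can mean in practice, the sketch below implements the classic American Soundex code, under which Joan, Jean, and Jane all receive the same code. Whether K2 uses exactly this variant is an assumption; the class is hypothetical and is shown only to illustrate the idea.

```java
import java.util.Locale;

// Illustration only: classic American Soundex (letter + three digits).
final class SoundexSketch {

    // Digit for each letter A..Z; '0' marks vowels and H, W, Y (not coded).
    private static final String CODES = "01230120022455012623010202";

    static String encode(String word) {
        String s = word.toUpperCase(Locale.ROOT);
        StringBuilder out = new StringBuilder();
        char prev = '0';
        for (int i = 0; i < s.length() && out.length() < 4; i++) {
            char c = s.charAt(i);
            if (c < 'A' || c > 'Z') {
                continue; // skip anything that is not a basic Latin letter
            }
            char code = CODES.charAt(c - 'A');
            if (out.length() == 0) {
                out.append(c); // the first letter is kept as is
                prev = code;
            } else {
                if (code != '0' && code != prev) {
                    out.append(code);
                }
                // H and W do not separate letters with the same code; vowels do.
                if (c != 'H' && c != 'W') {
                    prev = code;
                }
            }
        }
        while (out.length() < 4) {
            out.append('0'); // pad short codes with zeros
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("Joan") + " " + encode("Jean") + " " + encode("Jane"));
    }
}
```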

Search terms can include word stems and indexed synonyms, and relevancy ranking can be used to assign a user-identified level of importance to documents. Users can also employ wildcards when they know only a few characters, or one characteristic, of the string being searched for.

Wildcard Search

In wildcard search, users can substitute a wildcard character when they know only some of the characters of the term they are searching for. For example, a wildcard search for ta*l returns documents that contain tail or tall or tactical.

To conduct a wildcard search, your application adds the <WILDCARD> operator to the user’s search term.
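
The matching behavior in the ta*l example can be reproduced with a regular expression, which may help when testing what a pattern is expected to match. This sketch assumes that * stands for any run of zero or more characters and that the <WILDCARD> operator is written directly before the pattern. The class and method names are hypothetical, and the regular expression only imitates the behavior; it is not how K2 evaluates wildcards.

```java
import java.util.regex.Pattern;

// Illustration only: imitate '*' wildcard matching with java.util.regex.
final class WildcardSketch {

    // Builds the query term, for example "<WILDCARD>ta*l".
    static String wildcardQuery(String pattern) {
        return "<WILDCARD>" + pattern.trim();
    }

    // Translates '*' to ".*" and quotes every other character literally.
    static Pattern toRegex(String wildcard) {
        StringBuilder regex = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') {
                regex.append(".*");
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(regex.toString(), Pattern.CASE_INSENSITIVE);
    }

    // True when the whole word matches the wildcard pattern.
    static boolean matches(String wildcard, String word) {
        return toRegex(wildcard).matcher(word).matches();
    }

    public static void main(String[] args) {
        for (String word : new String[] {"tail", "tall", "tactical", "table"}) {
            System.out.println(word + " " + matches("ta*l", word));
        }
    }
}
```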

Language-Specific Search

K2 includes support for search and display of documents in multiple languages. An application can be licensed to support one or more locales, each of which allows a user to search according to the rules of the locale’s language.

If your K2 installation uses the multilanguage locale (see Locales), users of your application can also conduct stemmed searches in any of that locale's languages for which your installation is licensed.

For example, a single collection might include documents in English, French, and Japanese. The user can select a language and enter a term, and then your application can construct a query term like this:

<LANG/fr><STEM>fort

in which case only documents containing French words whose stem is fort will be returned.
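
Constructing that query term from the user's selections might look like the following sketch. The class and method names are hypothetical, and the check on the language code (a short alphabetic code such as fr) is an assumption made for illustration; consult the locale documentation for the codes your installation accepts.

```java
// Hypothetical helper that builds a language-specific stemmed query term.
final class LanguageQueryTerms {

    // Returns a term such as "<LANG/fr><STEM>fort".
    static String stemmedIn(String languageCode, String term) {
        // Assumption: language codes are short and purely alphabetic.
        if (!languageCode.matches("[A-Za-z]{2,8}")) {
            throw new IllegalArgumentException("unexpected language code: " + languageCode);
        }
        return "<LANG/" + languageCode + "><STEM>" + term.trim();
    }

    public static void main(String[] args) {
        // The language and term would normally come from the user interface.
        System.out.println(stemmedIn("fr", "fort"));
    }
}
```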

Accent-Insensitive Search

Depending on the language of the documents being searched, searches with the simple query parser are by default either accent-insensitive or accent-sensitive. In an accent-insensitive search, using the search term resume returns documents that contain resume or resumé or résumé. In an accent-sensitive search, using the search term resume returns only documents that contain resume.

Converting your application between accent-insensitive and accent-sensitive searching is not a programming task; it is a configuration task performed by the administrator, as explained in the Verity Locale Configuration Guide.