Glossary

accent-insensitive search

A type of search that includes all accented variations of a letter in the search term. In accent-insensitive search, the search term si would find all instances of both si and sí, for example. Conversely, in accent-sensitive search, the search term si would find only instances of the unaccented si.
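
The difference between the two modes can be sketched in a few lines of Python. This is only an illustration of the idea, not how K2 implements it: the function names are hypothetical, and accent folding is approximated here by Unicode decomposition followed by removal of combining marks.

```python
import unicodedata

def strip_accents(text: str) -> str:
    """Decompose characters (NFD) and drop the combining accent marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def find_term(term: str, words: list[str], accent_sensitive: bool = False) -> list[str]:
    """Return the words that match term under the chosen accent mode."""
    if accent_sensitive:
        return [w for w in words if w == term]
    folded = strip_accents(term)
    return [w for w in words if strip_accents(w) == folded]

# "s\u00ed" is "si" with an acute accent on the i.
words = ["si", "s\u00ed", "no"]
insensitive_hits = find_term("si", words)
sensitive_hits = find_term("si", words, accent_sensitive=True)
```

Here the accent-insensitive call matches both spellings, while the accent-sensitive call matches only the unaccented one.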

Active Server Page (ASP)

A Windows-specific file for generating Web pages. It contains a combination of server-side scripting, HTML, and COM components. The scripting language can be VBScript or JScript (the Microsoft dialects of Visual Basic and JavaScript), and the Web server must be ASP-aware. Compare Java Server Page (JSP).

adaptive ranking

The scoring and ranking of documents based on the historical behavior of users who have issued similar searches.

administrator

A person who designs, installs, configures, and maintains the K2 installation. Administrators may also create collections and taxonomies.

Administration Server

A repository for, and synchronizer of, configuration information. In a K2 domain, there is one Administration Server for every host. Compare Master Administration Server.

authentication

The process of identifying a user by passing credentials to a secure server, such as a K2 Server, K2 Broker, or K2 Ticket Server.

browse

A command-line tool that lists the contents (field names and values) of a collection’s document table.

bucket

A value or range of values for a parameter in a parametric index. It identifies a set of like documents for parametric selection.

bucket set

The set of all buckets associated with a parameter in a parametric index.
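
As a rough illustration of how buckets group like documents, the sketch below assigns documents to the buckets of a single parameter. Everything in it is hypothetical (the price parameter, the bucket labels and ranges, and the function names); it does not reflect the format of a real parametric index.

```python
from collections import defaultdict

# Hypothetical bucket set for a "price" parameter.
# Each bucket is (label, low, high); the high bound is exclusive.
PRICE_BUCKETS = [
    ("under 100", 0, 100),
    ("100 to 499", 100, 500),
    ("500 and up", 500, float("inf")),
]

def bucket_for(value, bucket_set):
    """Return the label of the bucket whose range contains value, or None."""
    for label, low, high in bucket_set:
        if low <= value < high:
            return label
    return None

def build_index(docs, field, bucket_set):
    """Map each bucket label to the IDs of the documents that fall in it."""
    index = defaultdict(list)
    for doc_id, fields in docs.items():
        label = bucket_for(fields[field], bucket_set)
        if label is not None:
            index[label].append(doc_id)
    return dict(index)

docs = {
    "d1": {"price": 40},
    "d2": {"price": 250},
    "d3": {"price": 99},
    "d4": {"price": 900},
}
index = build_index(docs, "price", PRICE_BUCKETS)
```

Selecting the bucket "under 100" then amounts to reading index["under 100"], which holds d1 and d3.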

category

A subject of interest used to collate documents that are relevant to the subject. Logically, a category represents a node in a taxonomy.

character set

A numeric encoding of the characters of a language. Text in a given language can be stored and manipulated using one or more character sets. Examples include ASCII, Shift-JIS, and UTF-8.

classification

The process of assigning documents to categories in a taxonomy.

cluster

A group of documents related by similarities in their content.

clustering

The process of automatically defining the clusters in a set of documents. Clustering uses Verity feature extraction technology.

collection

The set of index files and other information needed to search and classify documents in a repository. A collection stores the locations of all the indexed documents, the locations of all the indexed words in those documents, and metadata about the documents. It does not store the documents themselves.

collection indexing job

A specification of an indexing process, including which documents to index and the times when indexing should occur. Also called K2 Spider job. Compare user-defined job (UDJ).

collection-level security

A security feature that controls a user’s access to a collection as a whole. Collection-level security relies on the user’s group membership in the enterprise’s native security system. Compare document-level security.

concept tree

A hierarchy of key concepts generated by thematic mapping.

content organization

See classification.

controller

A K2 Spider process that manages crawlers and indexers for indexing.

corpus

A large collection or set of documents. An enterprise’s set of repositories can be considered its corpus.

crawler

A K2 Spider process that gathers document data for indexing.

crawling

The process of seeking out documents in a repository to determine if they are valid candidates for indexing. If those documents contain links to other documents, K2 can be configured to follow those links and also crawl the target documents.

developer

A person who creates search applications and implements user interfaces that leverage Verity search, classification, and social-network technologies.

didump

A command-line tool that displays a list of the words in a collection’s word index.

distributed search group

A search group in which some servers or brokers in the group are from a different domain.

document filter

A Verity software module that can read documents in one or more specific formats (such as PDF, XML, or Microsoft Word). Document filters receive documents from gateways, extract text data and field information from them, and pass that information along for indexing and storage in a collection.

document-level security

A security feature that controls which documents can either (1) appear in search results for a particular user, or (2) be viewed by that user. Document-level security relies on the user’s access permissions to the individual repositories. Compare collection-level security.

document profile

The representation of a document in a recommendation index. A document profile is based on the document’s feature vector and evolves over time from information based on queries that select the document. See also user profile.

document table

A table in a collection that specifies the location of each indexed document. The document table also contains all metadata (parameters) associated with each document.

domain

A grouping of K2 services consisting of one Master Administration Server and all the K2 services (K2 Ticket Servers, K2 Brokers, K2 Servers, Administration Servers, and so on) that it configures.

end users

People who use K2 applications to search, browse, and retrieve information.

entity

In the Recommendation Engine, a person, document, query, or other object or concept that can be profiled using tensor-space analysis.

feature extraction

The process of automatically discovering the subjects addressed in a document by performing vector analysis on nouns and noun phrases. Feature extraction is performed during indexing.

feature vector

A mathematical structure, constructed during feature extraction, that represents the set of subjects addressed in a particular document.

field

A discrete item of document metadata, such as author, title, location, or creation date, in a Verity collection.

filter

See document filter.

fuzzy search

A search with the ability to retrieve documents containing words with spelling and typographical differences from the search term. Fuzzy search types include typo search, Soundex search, and stemmed search.

gateway

A Verity software module used to retrieve documents from a specific type of repository. K2 includes gateways for local file systems, HTTP, Documentum, ODBC databases, MAPI (MS Exchange), and Lotus Notes.

index

A Verity structure that provides the basis for searching. Examples include collection indexes, knowledge trees, parametric indexes, and recommendation indexes.

indexer

A K2 Spider process that performs indexing.

indexing

The process of scanning a document to create a word index and to store its metadata (fields and internal zones) into a collection.

input context

In the tensor analysis performed by the Recommendation Engine, the combination of a query, the document being viewed, and the user's identity.

intellectual capital management

The process of combining human knowledge and experience (both implicit and explicit) with the information and data in an enterprise for the purpose of exploiting greater value.

interest profile

A VQL query, stored in a profile net, against which the Profiler Service compares documents for the purpose of document classification or message routing.

Java Server Page (JSP)

A file containing Java code mixed with HTML and JavaScript, used to generate Web pages. Compare Active Server Page (ASP).

job

See collection indexing job, user-defined job.

K2 Broker

A K2 service that receives client search requests and distributes them to available K2 Servers.

K2 Dashboard

A browser-based user interface that enables administrators to view and change configuration settings for K2 services from a single computer, even when the K2 services reside on many different computers.

K2 domain

A K2 system consisting of one Master Administration Server and all the K2 services configured by that Master Administration Server. Note that a K2 domain is unrelated to a Windows NT domain.

K2 search group

A set of K2 Brokers and K2 Servers containing one top-level K2 Broker and all the other K2 Brokers and K2 Servers attached below it, possibly including ones in different K2 domains. A search request handled by the top-level K2 Broker can be passed to any of the other K2 Brokers and K2 Servers in the search group, including those from other K2 domains.

K2 Server

A K2 service that receives search, viewing, profiling, and recommendation requests and performs searches of collections, knowledge trees, parametric indexes, RE Doc Indexes, and RE User Indexes.

K2 services

The executable processes in a K2 system, such as a K2 Broker, a K2 Server, or a K2 Ticket Server.

K2 Spider

A tool to perform spidering. K2 Spider executes through the K2 Server, and thus can perform distributed spidering. Compare Verity Spider.

K2 Spider Client

The executable and command-line tool used to interface with K2 Spider Servers to create and manage indexing jobs.

K2 system

A generic term meaning a K2 installation. It may be either a K2 domain or a K2 search group.

K2 Ticket Server

A K2 service that is used to implement secure access to collections, search results and documents. The K2 Ticket Server stores information in memory for users who have been authenticated.

knowledge tree

A structure for organizing documents for navigation to subjects of interest. A knowledge tree consists of a taxonomy plus category definitions plus documents attached to those categories.

knowledge worker

A librarian or domain expert who decides which information sources to make available to users of a K2 installation. Knowledge workers index collections and create and populate taxonomies.

language identification filter

A document filter (flt_lang) used by the multilanguage locale to assign a language to a document before indexing.

locale

1. A geographic or political region that shares the same language and customs. 2. See Verity locale.

Logistic Regression Classifier (LRC)

A Verity tool that creates a category definition from a set of positive and negative exemplary documents.

Master Administration Server

An Administration Server that is the central hub for K2 configuration information. A K2 domain must have one and only one Master Administration Server.

metadata

Data that describes other data. For example, Author and Size could be metadata for a Microsoft Word document. Fields in Verity collections contain document metadata that can be searched for.

mirroring

The creation of multiple duplicate collections attached to different K2 Servers. K2 Spider can be configured to create mirrored collections.

mkprf

A command-line tool for building and maintaining profile nets.

mksyd

A command-line tool used to build a thesaurus from a thesaurus control file.

mktopics

A command-line tool for building and updating topic sets.

mkvdk

An all-purpose command-line collection maintenance tool.

multilanguage locale

A Verity locale (uni) that supports multiple languages simultaneously. See also single-language locale.

no results filtering

A setting for document-level security in which all found documents are displayed in results lists, regardless of user access rights. Compare results-list filtering.

noun phrase

A group of words (for example, due process or court of law) that functions as a noun. Part-of-speech processing during indexing can lead to the automatic extraction of noun phrases, which can be used in document feature extraction.

OTL file

See topic outline file (OTL).

outline file

1. An XML file that specifies the structure of a parametric index. 2. See topic outline file (OTL).

parallel querying

The ability to simultaneously search multiple collections. K2 Server and K2 Broker support parallel querying.

parametric index

An index that supports parametric selection.

parametric selection

The ability to search for documents based on the values of one or more document parameters, combined with full-text search on document content.

passage-based summary (PBS)

An automatically generated document summary that consists of text passages in which the search term appears, typically highlighted.

profile

See document profile, interest profile, user profile.

profile net

A set of stored interest profiles (queries) against which the Profiler Service evaluates documents.

Profiler Service

A K2 service that evaluates an incoming stream of documents against the interest profiles in a profile net. Developers can use the Profiler Service in applications such as message routing and document classification.

proximity search

A type of search that returns documents in which the specified terms are close to each other (for example, in the same sentence or separated by no more than a specified number of words).
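
The "separated by no more than a specified number of words" case can be sketched as follows. This is a simplified illustration with a hypothetical function name; it assumes plain whitespace tokenization and ignores punctuation and sentence boundaries.

```python
def within_distance(text: str, term_a: str, term_b: str, max_gap: int) -> bool:
    """True if term_a and term_b occur with at most max_gap words between them."""
    words = text.lower().split()
    positions_a = [i for i, w in enumerate(words) if w == term_a]
    positions_b = [i for i, w in enumerate(words) if w == term_b]
    # abs(i - j) - 1 is the number of words that lie between the two positions.
    return any(
        abs(i - j) - 1 <= max_gap
        for i in positions_a
        for j in positions_b
        if i != j
    )

text = "the court ruled that due process applies"
near = within_distance(text, "court", "process", 3)
not_near = within_distance(text, "court", "process", 2)
```

In the sample sentence, three words separate court and process, so the first call succeeds and the second does not.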

rcadmin

A command-line tool used to administer K2. It has similar functionality to the K2 Dashboard.

rck2

A command-line tool used to connect to K2 Servers for searching collections and other Verity indexes.

rcvdk

A command-line tool used for searching collections and displaying documents.

RE Doc Index

A data file that contains the profiles of the documents in a collection.

RE User Index

A data file that contains the profiles of a set of users on a host.

Recommendation Engine

The K2 component that provides recommendations.

recommendation index

A data file that contains entity profiles used by the Recommendation Engine.

relational taxonomies

A Verity feature in which more than one taxonomy is applied to a set of information. Relational taxonomies allow users to navigate through the different taxonomies simultaneously.

repository

A group of documents that are all stored in the same location and accessed through the same protocol, such as a file system. Repositories can include relational databases or proprietary storage systems such as Microsoft Exchange folders or Lotus Notes databases.

results-list filtering

A setting for document-level security in which results lists show only those documents that a user can retrieve. Compare no results filtering.

score

A numerical value indicating the degree of match between a document and a query. Scores, usually expressed to the end user as a decimal number between 0 and 1, are calculated during Verity search or Profiler operations. Scores are based on numerous factors, including the number of times search/query words appear in the document, their location in the document, and their proximity.

search group

A grouping of K2 services consisting of one top-level K2 Broker plus all the other K2 Brokers and K2 Servers attached to it.

search worker

A software module in federated search that connects to and retrieves information from a particular kind of information source.

session-based profile

A temporary, dynamic user profile that can be used to track relevant searches and purchases so that similar products can also be recommended.

single-language locale

A Verity locale that supports only one language. Most locales are single-language. Compare multilanguage locale.

social network

A model of the explicit and implicit relationships between the people in an organization and the documents they create, modify, access, search, and organize.

Soundex search

A kind of search in which occurrences of the search term plus any words with similar pronunciation are returned. Verity supports Soundex search for the English language only.
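
For readers unfamiliar with Soundex, the sketch below implements the widely used American Soundex coding, in which words that sound alike receive the same four-character code. It is offered as an illustration of the general technique only; whether Verity's implementation follows exactly these rules is an assumption.

```python
# Letter groups and their Soundex digits.
# Vowels and y have no code; h and w are handled separately below.
CODES = {}
for letters, digit in (
    ("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
    ("l", "4"), ("mn", "5"), ("r", "6"),
):
    for ch in letters:
        CODES[ch] = digit

def soundex(word: str) -> str:
    """Return the four-character American Soundex code for word."""
    letters = [ch for ch in word.lower() if ch.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not break a run of same-coded letters
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeat across a vowel is kept
    return (result + "000")[:4]

def soundex_matches(term: str, vocabulary: list[str]) -> list[str]:
    """Return the vocabulary words that share the term's Soundex code."""
    target = soundex(term)
    return [w for w in vocabulary if soundex(w) == target]

matches = soundex_matches("Robert", ["Rupert", "Rubin", "Robert"])
```

Robert and Rupert both code to R163, so a Soundex search for either name would return the other.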

spidering

The process of crawling and indexing the contents of a repository.

stemmed search

A kind of search that locates all words that share the same word stem. For example, a stemmed search for the term dance would find all occurrences of dance, but also all occurrences of dances and dancer.

stop-word list

A file containing search terms that should be ignored. Verity supports several types of stop-word lists, some used at indexing time and others at search time.

style file

A file used to configure the indexes and fields in a collection.

StyleSet Editor

The Verity application that enables administrators to create and modify style files.

synonym search

A type of search that returns all occurrences of the search term and also any of its synonyms, as defined in a thesaurus.
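
A minimal sketch of the idea, assuming a thesaurus held as an in-memory dictionary. The thesaurus contents, document texts, and function names are all hypothetical; a real Verity thesaurus is built with mksyd from a control file.

```python
# Hypothetical thesaurus: each term maps to its set of synonyms.
THESAURUS = {"car": {"automobile", "auto"}}

def expand(term: str, thesaurus: dict) -> set:
    """Return the term together with any synonyms the thesaurus defines."""
    return {term} | thesaurus.get(term, set())

def synonym_search(term: str, docs: dict, thesaurus: dict) -> list:
    """Return IDs of documents containing the term or any of its synonyms."""
    terms = expand(term, thesaurus)
    return [
        doc_id
        for doc_id, text in docs.items()
        if terms & set(text.lower().split())
    ]

docs = {
    "d1": "used car for sale",
    "d2": "automobile insurance rates",
    "d3": "bicycle repair guide",
}
hits = synonym_search("car", docs, THESAURUS)
```

The search for car returns d1 (a direct match) and d2 (a match on the synonym automobile).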

taxonomy

The hierarchical organization of categories. A taxonomy defines a structure for accessing data.

tensor

A multidimensional mathematical structure used by the Recommendation Engine to construct a weighted representation of the significant subjects and actions of a document or user.

tensor space

A multidimensional space to hold tensors used by the Recommendation Engine.

thematic mapping

A process that automatically discovers the key concepts in a collection of documents and maps the hierarchical relationships between them.

thesaurus

A dictionary of synonyms. Each Verity locale supports use of a thesaurus for searching. In a synonym search, all occurrences of the search term and any of its synonyms are returned.

ticket

A temporary access pass granted by the K2 Ticket Server to a user for as long as the user is logged in.

topic

A stored query expression written in the Verity Query Language (VQL). Topics are used to model concepts of interest in a classification task, or to enable users to quickly find information without having to compose sophisticated queries. See also topic set.

topic outline file (OTL)

A text file that defines the structure of a topic set. Topic outline files have a file extension of .otl.

topic set

A grouping of topics that have been compiled for use by a Verity application. For classification tasks, a topic set contains one or more topics used to classify documents in a collection.

transaction

A modification of one or more entities in a recommendation index. For example, a transaction may make a document more relevant to a particular query due to user input.

typo search

A kind of search that corrects for minor misspellings in the search terms. In a typo search, occurrences of the search term and any words close to it in spelling are returned.
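
One common way to measure how close two words are in spelling is the Levenshtein edit distance: the minimum number of single-character insertions, deletions, and substitutions needed to turn one word into the other. The sketch below uses it to illustrate the concept. It is an assumption that this resembles what typo search does internally, and the function names are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between a and b, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute, or keep if equal
            ))
        prev = curr
    return prev[-1]

def typo_matches(term: str, vocabulary: list[str], max_edits: int = 1) -> list[str]:
    """Return the vocabulary words within max_edits of term."""
    return [w for w in vocabulary if edit_distance(term, w) <= max_edits]

close_words = typo_matches("documnt", ["document", "documents", "comment"])
```

With a limit of one edit, the misspelling documnt matches document but not documents or comment.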

Unicode

A character encoding standard that covers the characters of all major modern languages in one character set. Unicode text can be stored in several encoding forms, such as UTF-8 and UTF-16. The encoding used by the Verity multilanguage locale is UTF-8.

universal filter

A document filter that determines the file type of the incoming document and then invokes a suitable helper filter for extracting the available text and metadata.

user-defined job (UDJ)

A specification, created by the administrator, of a command-line tool to be executed plus its associated arguments. Jobs can be scheduled and chained. Compare collection indexing job.

user profile

The representation of a user in a recommendation index. A user profile is created over time from information such as documents authored by the user, interests submitted, queries asked, and documents rated or viewed. See also document profile.

VDK

1. Verity Developer's Kit, the API that enables developers to build Verity functionality into their products. 2. The programming core on which most Verity applications are built.

Verity Intelligent Classifier

An application for creating, viewing, editing, and testing topics and taxonomies.

Verity locale

A software module that allows Verity applications to operate on documents in a specific language or set of languages. A locale provides one or more capabilities that may include tokenization, stemming, part-of-speech recognition, and thesaurus use. See also single-language locale, multilanguage locale.

Verity Query Language (VQL)

Verity’s standard language for creating search queries.

vspider

A command-line tool that provides document indexing capabilities. See also K2 Spider.

wildcard search

A type of search in which the search term contains special symbols that represent multiple characters. For example, a wildcard search with the term abc* returns all documents containing occurrences of words that start with abc.
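
The abc* example can be reproduced with shell-style pattern matching from the Python standard library. This is only an analogy: fnmatch also gives special meaning to ? and [...], and the full set of wildcard symbols that Verity supports is not described here.

```python
from fnmatch import fnmatchcase

def wildcard_matches(pattern: str, words: list[str]) -> list[str]:
    """Return the words that match a shell-style wildcard pattern."""
    return [w for w in words if fnmatchcase(w, pattern)]

hits = wildcard_matches("abc*", ["abc", "abcd", "xabc", "ab"])
```

Only abc and abcd start with abc, so they are the only words returned.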

word index

A collection index that lists all words that appear in the collection’s documents and the location of every instance of each word.
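
Conceptually, a word index is an inverted index that records positions. The sketch below builds one in memory. It is a simplified illustration with hypothetical names and whitespace tokenization, not the on-disk structure of a Verity collection.

```python
from collections import defaultdict

def build_word_index(docs: dict) -> dict:
    """Map each word to a list of (document ID, word position) pairs."""
    index = defaultdict(list)
    for doc_id, text in docs.items():
        for position, word in enumerate(text.lower().split()):
            index[word].append((doc_id, position))
    return dict(index)

docs = {"a": "due process", "b": "Process due process"}
word_index = build_word_index(docs)
process_hits = word_index["process"]
```

Looking up a word returns every document and position where it occurs, which is the information that term and proximity searches need.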

worker

See search worker.

zone

A named region of a document. Examples are HTML tags such as TITLE, BODY, and H1, and email fields such as TO, FROM, and SUBJECT. Zones can be made searchable in collections and can also be saved as collection fields.