Glossary



 


accent-insensitive search

A type of search that matches all accented variations of a letter in the search term. In accent-insensitive search, the search term si would find all instances of both si and sí, for example. Conversely, in accent-sensitive search, the search term si would find only instances of the unaccented si.
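The behavior can be sketched in a few lines of Python. This is only an illustration, not Verity's implementation: it assumes that stripping Unicode combining marks is an acceptable way to fold accents, and the helper names strip_accents and accent_insensitive_find are hypothetical.

```python
import unicodedata


def strip_accents(text):
    """Drop combining marks so that accented and unaccented forms compare equal."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def accent_insensitive_find(term, words):
    """Return every word that equals the term once accents are ignored."""
    key = strip_accents(term)
    return [w for w in words if strip_accents(w) == key]


# "s\u00ed" is "si" with an acute accent on the i.
words = ["si", "s\u00ed", "no"]
matches = accent_insensitive_find("si", words)
```

An accent-sensitive search would skip the strip_accents step and compare the raw strings.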

Active Server Page (ASP)

A Windows-specific web page containing a combination of server-side scripting, HTML, and COM components. The programming language can be VBScript, Visual Basic, JScript, or JavaScript, and the web server must be ASP-aware. It is similar in concept to JSP.

adaptive ranking

The scoring and ranking of documents based on the historical behavior of users who have issued similar searches. The ranks of documents that are relevant to the search terms improve over time. Also called popularity-based search results or popular ranking.

administrator

A person who designs, installs, configures, and maintains a K2 installation. An administrator may also create collections and taxonomies.

Administration Server

A repository for configuration information. In a K2 Domain, there is one Master Administration Server or Administration Server for every host. An Administration Server services the administration needs on a host and synchronizes with the Master Administration Server.

API

See Application Programming Interface.

Application Programming Interface (API)

A set of routines that a program uses to interface with other programs.

ASP

See Active Server Page (ASP).

attribute

In parametric selection, a parameter type consisting of a name-value pair.

authentication

The process of passing credentials to a secure server, such as an Information Server or a K2 Server, K2 Broker, or K2 Ticket Server.

auto-case

A Verity search feature which, when enabled, conducts case-insensitive search when the search term is single-case (such as cat or CAT), and case-sensitive search when the search term is mixed-case (such as Cat or caT). With Auto-Case, the word cAt would be found by searching for cat or CAT, but not by searching for Cat or caT.
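The rule above can be expressed as a small predicate. This is a minimal sketch of the described behavior only; the function name auto_case_match is hypothetical and is not part of any Verity API.

```python
def auto_case_match(term, word):
    """Single-case terms match case-insensitively; mixed-case terms match exactly."""
    single_case = term == term.lower() or term == term.upper()
    if single_case:
        return term.lower() == word.lower()
    return term == word


# The word cAt is found by cat or CAT, but not by Cat or caT.
results = {term: auto_case_match(term, "cAt") for term in ("cat", "CAT", "Cat", "caT")}
```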

auto-detection

A Verity capability in which a document is analyzed to determine its character set and/or its language. Verity’s auto-detection can accurately determine both the character set and the native language of many documents.

auto-initialization

Pre-populating the Recommendation Engine with user information.

BIF

See bulk insert file.

branch node

In Verity Intelligent Classifier, a node that is between the top-level nodes and the leaf nodes.

browse

A command-line tool that lists the contents (field names and values) of a collection’s document table.

browsing

In K2, the act of traversing a knowledge tree to view the contents of a collection category-by-category. See also category drill-down.

bucket

A container for all keys associated with a discrete value or range in a parameter in a parametric index; it identifies a set of like documents for parametric selection.

bucket set

The set of all buckets associated with a parameter in a parametric index.
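As an illustration of buckets and bucket sets, the following sketch groups document keys by price range. The parameter (price), the ranges, and the function name build_bucket_set are all assumptions made for the example; the actual layout of a parametric index is not described here.

```python
# Hypothetical half-open ranges [low, high) for a "price" parameter.
PRICE_RANGES = [(0, 100), (100, 500), (500, 1000)]


def build_bucket_set(doc_prices):
    """Map each range (bucket) to the keys of the documents that fall in it."""
    bucket_set = {price_range: [] for price_range in PRICE_RANGES}
    for doc_key, price in doc_prices.items():
        for low, high in PRICE_RANGES:
            if low <= price < high:
                bucket_set[(low, high)].append(doc_key)
                break
    return bucket_set


bucket_set = build_bucket_set({"doc1": 50, "doc2": 250, "doc3": 75})
```

Selecting the bucket (0, 100) then identifies doc1 and doc3 as the set of like documents for that range.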

bulk insert file (BIF)

A text file used to compile data into Verity internal formats. For example, a BIF is used to submit documents to a collection for indexing, or to insert queries into a Profile Net. Typically, BIFs are also used to insert or edit metadata not found in the source documents being indexed.

callback function

A function supplied by the application that is called by the Verity engine.

case-insensitive search

A type of search in which the case of the letters in the search term does not matter. In case-insensitive search, the search term Cat would find all instances of cat or CAT or Cat, for example. Conversely, in case-sensitive search, the search term Cat would find only instances of Cat.

category

A subject of interest used to collate documents that are relevant to the subject; logically, a category represents a node in a taxonomy.

categorization

See classification.

category alias

An alternative name used to identify the category in all parts of the system.

category definition

A mathematical rule against which a document can be evaluated for membership in the category.

category drill-down

The process of successively browsing from a category to one of its sub-categories to find information.

category ID

A unique identifier for a category.

category name

The name or label associated with a category, used when browsing a knowledge tree or results list. Category names do not have to be unique.

CGI

See Common Gateway Interface.

character map

A mapping between characters encoded in one character set and the same characters encoded in a second character set. Often, different character sets encode the same characters, because different standards were set up by different groups of people to encode the same set of characters. A character map allows text to be converted from one character set to another without loss.

character set

A numeric encoding of the characters of a language. Text in a given language can be stored and manipulated using one or more character sets. The mapping occurs between characters and byte strings; that is, the combination of a particular character encoding (which maps between byte strings and integers) and a particular coded character set (which maps between integers and characters). Examples include ASCII, Shift-JIS, and UTF-8.
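The mapping between characters and byte strings can be seen by encoding the same character in different character sets. The sketch below uses Python's built-in codecs as a stand-in; the convert helper is hypothetical and only illustrates the idea of a character map, not how K2 performs conversion.

```python
# The same character has different byte strings in different character sets.
e_acute = "\u00e9"                      # e with an acute accent
latin1_bytes = e_acute.encode("latin-1")
utf8_bytes = e_acute.encode("utf-8")

hiragana_a = "\u3042"                   # Japanese hiragana "a"
sjis_bytes = hiragana_a.encode("shift_jis")


def convert(data, source, target):
    """Re-encode bytes from one character set to another (a character map)."""
    return data.decode(source).encode(target)


# Shift-JIS bytes converted to UTF-8 without loss.
as_utf8 = convert(sjis_bytes, "shift_jis", "utf-8")
```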

charset

In general terms, this is an abbreviation of character set. In K2, this term corresponds to the -charset argument, which sets a character set using a given command-line tool.

child node

In Verity Intelligent Classifier, a node directly underneath another node in the topic tree.

classification

The process of assigning documents to categories in a taxonomy. See also content organization.

cluster

A group of documents related by similarities in their content.

clustering

The process of automatically discovering the clusters in a set of documents.

collection

The set of files and folders that stores all the information needed by VDK to search and classify documents in a repository. A collection stores the locations of all the indexed documents, the locations of all the indexed words in those documents, and metadata about the documents. It does not store the documents themselves.

collection indexing job

Specifies details for the indexing process, such as which documents to include in a collection, or the times when indexing should occur. These are also called K2 Spider jobs. See also user-defined job (UDJ).

collection purge

The most common method for purging indexed documents. A collection purge removes all indexed content from the collection, leaving only the style files that contain the collection schema. This type of purge is necessary when style file changes are made that affect the schema.

A collection purge automatically resets all indexing jobs that route information to only the specified collection. However, an indexing job can be set to route information to more than one collection, as in the case of mirrored collections. When a job routes information to more than one collection, you need to perform a job purge to properly reset it for indexing. See also job purge.

collection schema

The fields internal to a collection (as specified in style.ddd and any optional style.ufl files) and fields external to the collection (as specified by the gateway).

collection-level security

A security system that controls access to collections based on the user’s identity. See also document-level security.

Common Gateway Interface (CGI)

A standard that allows a web browser such as Netscape or Microsoft Internet Explorer to interact with a program running on the web server.

compound word

A word created by linking several independent words. Decomposition in indexing breaks up a compound word into subwords and creates index entries for each one.

concept extraction

The analysis and extraction of recurring key concepts in documents, and the process of relating these key concepts to the particular documents containing them.

concept tree

A hierarchy of key concepts, which is generated by thematic mapping.

content organization

The process of building taxonomies, defining categories, and populating taxonomies. See also classification.

Controller

A K2 Spider Server process executed with the -controller option. A Controller manages Crawlers and Indexers and the gathering and distribution of indexing job information.

corpus

A set of documents drawn from one or more repositories and indexed by K2.

CPU binding

A feature that detects whether you are licensed for fewer CPUs than are available on your system and prompts you for information on the processors your K2 Server(s) will use.

Crawler

A K2 Spider Server process executed with the -crawler option. A Crawler gathers document data for indexing jobs.

crawling

The process of seeking out documents in a repository to determine if they are valid candidates for indexing. If those documents contain links to other documents, K2 can be configured to follow those links and also crawl the target documents.

cross-reference

A subcategory that refers to another category elsewhere in the knowledge tree, where it is defined.

decomposition

The process of breaking a compound word into its constituent subwords for indexing. Searches for a subword will then return all occurrences of the compound word.

decomposition pattern

In a user dictionary for the japanb locale, a numeric pattern that specifies how a compound word is to be broken into subwords.

default installation locale

The locale specified in the configuration file verity.cfg. If defined, it is the default session locale.

default session language

The language used as the default for queries during a VDK session. This applies only when the session locale is the multilanguage locale (uni).

default session locale

The locale assigned to a VDK session if no locale is specified when the session is opened.

delimiter

A character used by the tokenizer to split document text into searchable units. For many locales, white space and punctuation are the most common delimiters. See also tokenization.

developer

A person who creates search applications and implements user interfaces that leverage Verity search, classification, and social-network technologies.

didump

A command-line tool that generates a list of the words (tokens) in a collection’s word index.

distributed search group

A search group in which some servers or brokers in the group are from a different domain.

DLL

See dynamic link library.

document filter

A driver-level plug-in software module that can read documents in one or more specific formats (such as PDF, XML, or Microsoft Word). Document filters receive documents from gateways, extract text data and field information from them, and pass that information along for indexing and storage in a collection.

document index

A recommendation index that contains profiles of documents in a collection.

document key

A unique identifier for each document in a collection. It identifies the gateway key and collection alias of a document.

document profile

An internal representation of a document in a recommendation index. It is based on the content of the document and the relevance of the document content to query terms. Document profiles are created on a per-collection basis, and are initially seeded with the information stored about the document in a Verity collection. See also user profile, entity profile.

document summary

A short summary of the content in a document. K2 applications can automatically generate document summaries for display in search results. A document summary can be a static summary, dynamic summary, or passage-based summary.

document table

A table in a collection that specifies the location of each indexed document. The document table also contains all metadata (parameters) associated with each document.

document-level security

A method of controlling which documents appear in search results for a particular user. See also collection-level security.

domain

A grouping of K2 services consisting of one Master Administration Server and all the K2 Services (K2 Ticket Servers, K2 Brokers, K2 Servers, Administration Servers, and so on) that it configures.

double-byte string

A string formed by a sequence of characters from a double-byte character set. Each character in a double-byte character set is strictly two bytes in length. The null character that terminates the string is also a two-byte null character. A double-byte string differs from a multibyte string in that the characters are all a fixed length of two bytes, whereas the characters in a multibyte string are a variable length. The UCS-2 encoding of Unicode is an example of a double-byte character set.

dynamic highlighting

A method of highlighting the search term in a document summary or in a retrieved document. In dynamic highlighting, the application actually searches through the results or the document to locate and highlight the term. Dynamic highlighting is slower but more accurate than static highlighting.

dynamic link library (DLL)

A module containing functions that other programs or DLLs can call. DLLs cannot run by themselves; they must be loaded by other programs.

dynamic summary

A document summary that is generated at viewing time. A dynamic summary consists of phrases selected by K2 as being representative of the document as a whole. Compare static summary, passage-based summary.

end user

A person who uses K2 applications to search, browse, and retrieve information.

entity

A single resource (such as a user, document, expert, product, category, query) stored in a recommendation index.

entity index

A recommendation index that contains profiles for entities other than users or documents.

entity profile

The representation of an entity (other than a user or document) in a recommendation index. See also document profile, user profile.

enumeration

In parametric selection, a parameter type. Enumerations are useful for flat, non-hierarchical data.

extended character

1. A character above the ASCII range (32 through 127) in Windows-based single-byte character sets. 2. An accented character.

eXtensible Markup Language (XML)

A simplified form of SGML and a W3C standard for semantic and structural tagging of documents. It is a set of rules for forming semantic tags that break a document into parts and identify the different parts of the document.

external fields

Repository field names exposed to the Verity search engine. External field names can be the same as repository field names or an obvious mapping of the repository field name. See also internal fields.

external service

A K2 service (K2 Server, K2 Broker, or K2 Ticket Server) that belongs to a different K2 Domain. K2 services in the same K2 domain are called local services.

feature extraction

The process of automatically discovering the subjects addressed in a document by performing vector analysis on nouns and noun phrases. Feature extraction is performed during indexing.

feature vector

A mathematical structure, constructed during feature extraction, that represents the set of subjects addressed in a particular document.

federated search

A capability provided by the Verity Federator application that can (a) post a query to multiple sources of information, such as internal indices, web searches, and proprietary subscription sources (news feeds and business information services), (b) retrieve results from all of them, and (c) merge them into a unified presentation for the user.
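The three steps (post, retrieve, merge) can be sketched generically. Nothing below reflects the Verity Federator API: the two source functions, their canned results, and the assumption that scores from different sources are directly comparable are all hypothetical simplifications.

```python
from concurrent.futures import ThreadPoolExecutor


def search_internal_index(query):
    """Hypothetical source: returns (title, score) pairs."""
    return [("Intranet policy page", 0.9), ("Team wiki", 0.4)]


def search_news_feed(query):
    """Hypothetical source: returns (title, score) pairs."""
    return [("Industry news item", 0.7)]


def federated_search(query, sources):
    # Post the query to every source in parallel and retrieve the results.
    with ThreadPoolExecutor() as pool:
        result_lists = list(pool.map(lambda source: source(query), sources))
    # Merge all hits into one list, best score first.
    merged = [hit for hits in result_lists for hit in hits]
    return sorted(merged, key=lambda hit: hit[1], reverse=True)


results = federated_search("k2", [search_internal_index, search_news_feed])
```

A real merge step usually has to reconcile scoring schemes that differ from source to source; the simple sort here sidesteps that problem.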

field

A discrete data item associated with a document, such as the author, title, document location, or creation date.

foreign service

(This term is obsolete. Use external service.)

foreign word

In the uni locale, a word in any language other than the overall language of its document.

full-width character

In Japanese, a Katakana or Romaji character that occupies the same amount of horizontal space as a Kanji character. In Japanese character sets, a full-width character has a different character code than its half-width equivalent.

fuzzy search

The ability to find and retrieve a document even when the search term contains spelling or typographical errors. Phonetic and sounds-like searches are also possible.
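The general idea can be illustrated with Python's standard difflib module, which ranks candidate words by string similarity. This is a swapped-in technique chosen for brevity, not the algorithm Verity uses, and the word list is made up for the example.

```python
import difflib

# Hypothetical list of indexed words.
indexed_words = ["receive", "deceive", "recipe"]

# A misspelled search term still finds the closest indexed word.
suggestions = difflib.get_close_matches("recieve", indexed_words, n=1)
```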

gateway

A Verity software module used to retrieve documents from a specific type of repository for both indexing and viewing. K2 includes gateways for local file systems, HTTP, Documentum, ODBC databases, MAPI (MS Exchange), and Lotus Notes.

gateway key

A unique identifier for any document or record in a collection. It is used to locate and retrieve the contents of the document. It is also known as the VdkVgwKey, short for Verity Development Kit Verity Gateway Key.

half-width character

In Japanese, a Katakana or Romaji character that occupies half the horizontal space of a Kanji character. In Japanese character sets, a half-width character has a different character code from its full-width equivalent.

HTML Export

See Verity Export SDK.

i18n

An abbreviation for internationalization.

index

A Verity collection, knowledge tree, or parametric index. See also RE Doc Index and RE User Index.

Indexer

A K2 Spider Server process executed with the -indexer option. An Indexer inserts document data into a Verity collection.

index file

A file used by a search engine to locate specific web pages in a web site. The structure of an index file is similar in concept to the index of a book, where keywords are cross-referenced to their occurrence on pages.

indexing

A process that scans each document to be indexed and enters document text words and locations, as well as the metadata (title, author, size, internal zones, and so on) into a collection. See also crawling.

input context

The combination of a query, the document being viewed, and the user's identity.

intellectual capital management

The process of combining human knowledge and experience (both implicit and explicit) with the information and data in an enterprise for the purpose of exploiting greater value.

interest profile

A VQL query, stored in a Profile Net, against which the Profile Service compares documents for the purpose of document classification or message routing.

internal character set

See session character set.

internal fields

Document fields that are stored in indexed document tables internally managed by the Verity search engine. In contrast, external fields are stored in indexed document tables in external repositories, such as applications, relational databases, and so on. Internal fields are stored in the style.ddd file associated with a collection.

Internal fields can be displayed in a results list or used to define how the document body text is viewed. Only internal fields, not external fields, can be searched using Verity Query Language operators.

internal locale

See session locale.

internationalization

The process that occurs during application development that makes localization easier by separating locale differences from the rest of the program, which stays the same. If internationalization is thorough, localization requires no programming.

Internet-style query parser

A free-text query parser that lets users conduct familiar web-style searches. See also query parser.

iterator

Software used to “walk” an arbitrary string, one character at a time. It recognizes that multibyte characters vary in length and can jump ahead several bytes at a time. It keeps some state information when walking the string so that it can recognize the difference between one-byte and two-byte mode. This allows it to return characters from the string one at a time, no matter how many bytes are in each character.
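A minimal sketch of such an iterator follows. It assumes UTF-8 input, where the length of each character can be read from its first byte; the entry above also covers encodings that switch between one-byte and two-byte modes, which would need extra state that this sketch omits. The function names are hypothetical.

```python
def utf8_char_length(lead_byte):
    """Return how many bytes the character starting with lead_byte occupies."""
    if lead_byte < 0x80:
        return 1
    if lead_byte >> 5 == 0b110:
        return 2
    if lead_byte >> 4 == 0b1110:
        return 3
    if lead_byte >> 3 == 0b11110:
        return 4
    raise ValueError("not a valid UTF-8 lead byte")


def iter_chars(data):
    """Walk a UTF-8 byte string and yield one character at a time."""
    pos = 0
    while pos < len(data):
        length = utf8_char_length(data[pos])
        yield data[pos:pos + length].decode("utf-8")
        pos += length  # jump ahead by the width of this character


# One-, two-, and three-byte characters in the same string.
chars = list(iter_chars("a\u00e9\u3042".encode("utf-8")))
```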

Java Server Page (JSP)

Java code embedded in web pages. Similar in concept to ASP.

job

See collection indexing job and user-defined job (UDJ).

job purge

Purges the job that originally indexed the documents that now need removal from the collection. A job purge resets the particular job by removing its document records. It also removes from the associated collection(s) only those documents inserted by the job. A job purge is useful when the properties of one job are changed, and multiple jobs are associated with the collection. See also collection purge.

JSP

See Java Server Page.

K2 Broker

A K2 service that receives client search requests and distributes them to available K2 Servers.

K2 Dashboard

A browser-based user interface that enables administrators to view and change configuration settings for K2 services from a single computer, even when the K2 services reside on many different computers.

K2 Domain

A K2 system consisting of one Master Administration Server and all the K2 services (K2 Ticket Servers, K2 Brokers, K2 Servers, Administration Servers, and so on) configured by that Master Administration Server. See also K2 system.

K2 Index Server

 

A K2 service that maintains parametric indexes, taxonomies, and topic sets.

K2 Reporting

An administrative tool that allows you to generate, view, and export user activity reports. These reports include search and document retrieval information, such as the most frequent queries entered, categories navigated, or result documents selected.

The report data helps you determine how to tune K2 system performance, and how to make content more accessible to users. You can use the K2 Dashboard to view user activity reports for your entire K2 domain, or for specific K2 Brokers, Servers, or indexes.

K2 search group

A set of K2 Brokers and K2 Servers containing one top-level K2 Broker and all the other K2 Brokers and K2 Servers attached below it, possibly including ones in different K2 Domains. Therefore, a search request handled by the top-level K2 Broker can be passed to any of the other K2 Brokers and K2 Servers in the search group, including those from other K2 Domains. See also external service.

K2 Server

A K2 service that receives search, viewing, profiling, and recommendation requests and performs searches of collections, knowledge trees, parametric indexes, RE Doc Indexes, and RE User Indexes.

K2 services

The executable processes in a K2 system, such as a K2 Broker, a K2 Server, or a K2 Ticket Server.

K2 Spider

A tool to perform spidering. K2 Spider performs crawling and, by default, also performs indexing. However, this indexing can be disabled and the crawled material can be indexed using another Verity tool. See also Verity Spider (vspider).

K2 Spider Client

The executable and command-line tool used to interface with K2 Spider Servers to create and manage indexing jobs.

K2 Spider job

See collection indexing job.

K2 Spider Server

The executable and command-line tool that is run as either a Controller or some combination of Crawlers and Indexers.

K2 system

A generic term meaning a K2 installation. It may be either a K2 Domain or a K2 search group.

K2 Ticket Server

The K2 service that is used to implement document-level security and (along with the gateway) to implement collection-level security.

K2User

The information obtained via a K2UserLogin call from a K2 Ticket Server. It contains encrypted user ID information and encrypted server specifications.

KeyView filter

A document filter, based on Verity KeyView technology, that is used during indexing to process many types of files.

knowledge tree

A structure for organizing documents for navigation to subjects of interest. A knowledge tree consists of a taxonomy, category definitions, documents that have been placed into categories within the taxonomy by applying category definitions, and other information about the documents.

knowledge worker

A librarian or domain expert who decides which information sources to make available to users of a K2 installation. Knowledge workers index collections, and create and populate taxonomies.

l10n

An abbreviation for localization.

language ID

A two-character (ISO 639) code that specifies an individual language. Examples are en for English and zh for simplified Chinese. Verity uses language IDs for specifying languages for the multilanguage locale and the language identification command-line tool.

language identification

A Verity command-line tool that identifies the language of a document.

language identification filter

A document filter (flt_lang) used by the multilanguage locale to assign a language to a document before indexing.

LCID

See locale ID.

leaf nodes

In Verity Intelligent Classifier, the nodes that form the lowest level of the topic tree.

local service

See external service.

locale

A geographic or political region that shares the same language and customs. See also Verity Locale.

locale definition file

A file (loc00.lng) in each locale’s directory that controls the language handling characteristics of the locale.

locale ID (LCID)

A 32-bit value defined by Windows that consists of a language ID, a sort ID, and reserved bits.
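The fields can be separated with bit masks. The sketch below assumes the layout documented by Microsoft for LCIDs: the language ID in bits 0 through 15 (itself a 10-bit primary language and a 6-bit sublanguage), the sort ID in bits 16 through 19, and the remaining bits reserved. The function name split_lcid is hypothetical.

```python
def split_lcid(lcid):
    """Break a 32-bit Windows locale ID into its component fields."""
    return {
        "language_id": lcid & 0xFFFF,          # bits 0-15
        "primary_language": lcid & 0x3FF,      # bits 0-9 of the language ID
        "sublanguage": (lcid >> 10) & 0x3F,    # bits 10-15 of the language ID
        "sort_id": (lcid >> 16) & 0xF,         # bits 16-19
        "reserved": (lcid >> 20) & 0xFFF,      # bits 20-31
    }


us_english = split_lcid(0x0409)            # English (United States), default sort
german_phonebook = split_lcid(0x00010407)  # German (Germany), phone book sort
```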

locale-sensitive

Exhibiting different behavior or returning different data, depending on the locale. For example, the Win32 sort functions return different results depending on the locale parameter sent to each function.

localization

The process of adapting a program for a specific international market, which includes translating the user interface, resizing dialog boxes, defining lexing and stemming rules, customizing features (if necessary), and testing results to ensure that the program works as expected. See also internationalization.

Logistic Regression Classifier (LRC)

Software that creates a category definition from a set of positive and negative exemplary documents. Positive documents refer to documents that are relevant to the topic (or category) of interest. Negative documents are the opposite, that is, documents that are irrelevant to the topic (or category) of interest.

LRC

See Logistic Regression Classifier.

map

Provides a way of limiting the repository fields available to the Verity search engine. The gateway can specify external fields and then map them to the repository fields they represent.

Master Administration Server

The central hub for K2 configuration information. A K2 Domain must have one and only one Master Administration Server.

metadata

Data that describes other data. For example, Author and Size could be metadata for a Microsoft Word document. Most metadata can be used in a Verity Query Language expression to search for documents containing the given metadata value.

mkprf

A command-line tool for building and maintaining Profile Nets.

mksyd

A command-line tool used to build a thesaurus from a thesaurus control file.

mktopics

A command-line tool for building and updating topic sets.

mkvdk

An all-purpose command-line collection maintenance tool.

modifiers

Query terms used in conjunction with operators to change the standard behavior of an operator.

multibyte string

A string formed from characters encoded with a multibyte character set. A multibyte character set uses a variable number of bytes to represent one character.

multilanguage locale

A Verity locale (uni) that supports multiple languages simultaneously. See also single-language locale.

no results filtering

A setting for document-level security in which all found documents are displayed in results lists, regardless of user access rights. Compare results-list filtering.

named nodes

In Verity Intelligent Classifier, nodes that have each been given a name that distinguishes them from all other nodes in the topic set. A named node is equivalent to a topic.

node

In Verity Intelligent Classifier, an element in the topic tree. A node can be an unnamed node, or a named node. Named nodes represent topics.

normalization

An indexing feature in which a single version of a character is used when alternate versions exist (such as half-width and full-width kana in Japanese), and a single spelling is used for a word that has alternate spellings (such as color and colour in English). Users searching a normalized collection for a word find all words with either the common spelling or any of the alternate spellings.
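Both kinds of normalization can be sketched with the Python standard library. This is an illustration under stated assumptions, not Verity's indexing code: Unicode NFKC folding stands in for character-width normalization, and the SPELLING_VARIANTS table is a hypothetical one-entry example.

```python
import unicodedata

# Hypothetical table mapping alternate spellings to a single indexed form.
SPELLING_VARIANTS = {"colour": "color"}


def normalize_token(token):
    """Fold character-width variants, then map alternate spellings to one form."""
    folded = unicodedata.normalize("NFKC", token)
    return SPELLING_VARIANTS.get(folded, folded)


half_width_ka = "\uff76"   # half-width katakana KA
full_width_ka = "\u30ab"   # full-width katakana KA
same_kana = normalize_token(half_width_ka) == normalize_token(full_width_ka)
same_word = normalize_token("colour") == normalize_token("color")
```

If the same function is applied to both indexed words and search terms, a search for either variant finds both.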

noun phrase

A group of words (for example, due process or court of law) that functions as a noun. Part-of-speech processing during indexing can lead to the automatic extraction of noun phrases, which can be used in the automatic creation of document features and summaries.

ODBC

See Open Database Connectivity.

ODK

See Organization Developer’s Kit.

okurigana

In Japanese, kana characters that follow a Kanji stem to complete a word, typically marking its inflection.

Open Database Connectivity (ODBC)

A middleware software standard that allows an application to communicate with various database engines.

Organization Developer’s Kit (ODK)

An SDK containing APIs that enable developers’ applications to create and populate taxonomies.

This SDK provides similar functionality to Verity Intelligent Classifier.

operators

Reserved words that describe the relationship between search terms in the Verity Query Language.

OTL file

A text file containing a representation of a topic set. See also topic outline file (OTL).

outline file

1. An XML file that specifies the structure of a parametric index. 2. See topic outline file (OTL).

parallel querying

The ability to simultaneously search multiple collections. K2 Server and K2 Broker support parallel querying.

parameter

In parametric selection, a set of discrete values or ranges.

parametric index

An index that enables retrieval of documents based on the values of parameters.

parametric selection

The ability to search for documents based on a value or values of one or more parameters. Parametric selection can be combined with full-text search on document content.

parametric tree

In parametric selection, a parameter type that defines a set of buckets whose values identify a category within a taxonomy; for example, the buckets associated with a taxonomy that identifies a category of country, state or province, and city, such as U.S.A./California/San Jose.

parent node

In Verity Intelligent Classifier, a node directly above another node in the topic tree.

partition

A subdivision of a collection. Partitioning collections improves scalability and searching performance.

part-of-speech processing

During indexing, the assignment of the appropriate part of speech (noun, verb, adjective, and so on) to each token in the word index.

passage-based summary

A document summary that is generated at viewing time and is based on document content plus the user query that retrieved the document. The summary consists of one or more passages (phrases) from the document, each of which contains a term from the query. Compare dynamic summary, static summary.

popular ranking

See adaptive ranking.

primary document key

See document key.

profile

1. A Recommendation Engine structure that is a dynamic representation of an entity. For example, a user profile represents the preferences, history, and interactions of a user. 2. A query (topic) stored in a profile net and used by the Profile Service. See document profile, interest profile, user profile.

Profile Net

The set of stored queries against which a Profile Service evaluates documents.

Profile Service

A service that processes large numbers of queries into Profile Nets and evaluates the incoming stream of documents against these queries. Developers can use Profile Services in applications such as message routing, document tagging, and classification.

promotion

A user interface element, such as an advertisement or datasheet, created in the Business Console, configured in the Component Framework, and displayed on a web page based on the rule used.

proximity search

A type of search that returns documents in which the specified terms are close to each other (for example, in the same sentence or separated by no more than a specified number of words).
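The "separated by no more than a specified number of words" case can be sketched as follows. The whitespace tokenization and the function name within_distance are assumptions made for the example; they are not Verity Query Language behavior.

```python
def within_distance(text, first, second, max_gap):
    """Return True if the two terms occur with at most max_gap words between them."""
    words = text.lower().split()
    first_positions = [i for i, w in enumerate(words) if w == first]
    second_positions = [i for i, w in enumerate(words) if w == second]
    return any(
        i != j and abs(i - j) - 1 <= max_gap
        for i in first_positions
        for j in second_positions
    )


text = "the quick brown fox jumps"
one_word_apart = within_distance(text, "quick", "fox", 1)   # "brown" is between them
adjacent_only = within_distance(text, "quick", "fox", 0)
```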

purging

The act of deleting all records from a collection.

query parser

A mechanism that controls how the Verity engine parses query strings; for example, using standard Internet syntax consisting of words, quoted strings, and operators such as plus ( + ).

query suggestion

See spelling suggestion.

rcadmin

A command-line tool used to administer K2. It has similar functionality to the K2 Dashboard.

rck2

A command-line tool used to connect to K2 Servers for searching collections and other Verity indexes.

rcvdk

A command-line tool used for searching collections and displaying documents.

recommendation index

A logical grouping of closely related entity profiles used by the Recommendation Engine. For example, a book index might contain profiles of books, and it might furthermore be specialized to contain profiles of fantasy-adventure books only.

RE Doc Index

A data file that contains the profiles of the documents in a collection.

RE User Index

A data file that contains the profiles of a set of users on a host.

recommendation

A feature that recommends documents relevant to user context. Recommendations are based on the users’ previously accessed documents and searches, on their departments and colleagues, on what documents their colleagues have accessed, and so on.

Recommendation Engine

The K2 component that provides recommendations.

relational taxonomies

A set of taxonomies containing the same documents, supporting simultaneous navigation through all the taxonomies. See also taxonomy.

report index

A specialized data structure that stores query log file data for use in generating reports. The report index is attached to the report server. There is only one report index in a K2 domain.

report server

A standard K2 Server whose function is to accept report-viewing requests, retrieve the appropriate data from the report index, and pass it to an application or to the K2 Dashboard for display. There is only one report server in a K2 domain.

repository

A group of documents that are all stored in the same location, such as a file system. Other repositories can be organized in a relational database or a proprietary storage system such as Microsoft Exchange folders or Lotus Notes databases.

results-list filtering

A setting for document-level security in which results lists show only those documents that a user can retrieve. Compare no results filtering.

routing

The ability to specify which documents get indexed into which collections, based on characteristics (MIME type, document keys, or K2 Profiler matches) of the documents.

scoped search

A search limited to documents in specified categories.

score

A numerical value indicating the degree of match between a document and a query. Scores, usually expressed to the end user as a decimal number between 0 and 1, are calculated during Verity search or Profiler operations. Scores are based on numerous factors, including the number of times search/query words appear in the document, their location in the document, and their proximity.

SDK

See Software Development Kit.

search application

The K2 components enabling administrators to index content and end-users to search and view that content. A search application typically includes K2 services (K2 Broker, K2 Server, K2 Ticket Server, and K2 Spider), JSP or ASP files, and Verity collections.

search engine

A software application that queries an index file. In broader terms, it refers to a collection of programs that index a repository and then present a user interface in which queries can be constructed and run.

search group

A grouping of K2 services consisting of one top-level K2 Broker plus all the other K2 Brokers and K2 Servers attached to it.

search worker

A software module in federated search that connects to and retrieves information from a particular kind of information source.

session-based profile

A temporary and dynamic user profile that can be used to track relevant searches and purchases so that similar products can also be recommended.

session character set

The character set used for input to and output from VDK during a VDK session. It must be a character set supported by the session locale.

session locale

The locale used for all operations during a VDK session.

sibling node

In Verity Intelligent Classifier, a node at the same level as another node, and directly below a parent node.

simple tokens

A behavior, available for some locales, in which nearly all symbols (in addition to white space and punctuation) are defined as delimiters. In simple-token behavior, words are broken down into smaller searchable units, thus increasing the potential for search hits.
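
As a rough sketch of the effect, the following compares whitespace-only splitting with a simple-token style split. The rule used here (every character other than an ASCII letter or digit is a delimiter) is an assumption made for illustration; the actual delimiter set is defined per locale.

```python
import re

def whitespace_tokens(text):
    # Baseline: only white space separates tokens.
    return text.split()

def simple_tokens(text):
    # Assumed rule for illustration: anything that is not an ASCII
    # letter or digit acts as a delimiter.
    return re.findall(r"[A-Za-z0-9]+", text)

sample = "order#A-17/rev2"
print(whitespace_tokens(sample))
print(simple_tokens(sample))
```

The first call yields one token, while the second yields four smaller units, each of which can be matched by a search term on its own.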

single-byte string

A string written in a single-byte character set. Each character in a single-byte character set is one byte long. Single-byte strings can be 8-bit strings that use extended characters, or 7-bit strings that use ASCII.

single-language locale

A Verity Locale that supports only one language. Most locales are single-language. Compare multilanguage locale.

social network

A model of the patent and latent relationships between the people in an organization and the documents they create, modify, access, search, and organize.

Software Development Kit (SDK)

A set of APIs and related tools that enables a developer to create applications.

sorting order

The order in which a locale sorts the characters of its language. Verity Locales sort characters in a manner that facilitates accent-insensitive and case-insensitive search and display.

Soundex search

A type of search in which occurrences of the search term plus any words with similar pronunciation are returned. Verity supports Soundex search for the English language only.
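
For readers unfamiliar with the technique, here is a minimal sketch of the classic American Soundex code, under which similar-sounding English words receive the same four-character key. This is the generic textbook algorithm, not necessarily the variant Verity uses.

```python
# Map each consonant to its Soundex digit; vowels, h, w, and y have no digit.
CODES = {}
for digit, letters in {"1": "bfpv", "2": "cgjkqsxz", "3": "dt",
                       "4": "l", "5": "mn", "6": "r"}.items():
    for ch in letters:
        CODES[ch] = digit

def soundex(word):
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters that share a code
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code after it is kept
    return (result + "000")[:4]  # pad or truncate to four characters

print(soundex("Robert"), soundex("Rupert"))
```

A Soundex search can then return every indexed word whose key equals the key of the search term.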

spelling suggestion

A K2 feature that allows a search application to suggest corrections to mistyped words in a user’s query. If a search returns no or few results, the application can display a message on the search results page, listing a suggested alternate query.

spidering

The process of crawling and indexing. See also K2 Spider and Verity Spider (vspider).

static highlighting

A method of highlighting the search term in a document summary or retrieved document. In static highlighting, the application uses offsets in the collection’s word index to calculate the positions of terms to highlight. Static highlighting is faster but less accurate than dynamic highlighting.

static summary

A document summary that is generated at indexing time. A static summary can consist of either the initial text of the document or a set of phrases selected by K2 as being representative of the document as a whole. Compare dynamic summary, passage-based summary.

stemmed search

A type of search that locates all words that share the same word stem. For example, a stemmed search for the term house would find all occurrences of house and also all occurrences of houses, housed, and housing.
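
The house example can be reproduced with a toy suffix-stripping stemmer. This is a deliberately naive sketch; the suffix list and function names are hypothetical, and real locales use far more careful, language-specific stemming.

```python
# Toy suffix list, checked in order; an assumption for illustration only.
SUFFIXES = ("ing", "ed", "es", "s", "e")

def toy_stem(word):
    word = word.lower()
    for suffix in SUFFIXES:
        # Strip the first matching suffix, keeping at least three characters.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def stemmed_search(term, words):
    target = toy_stem(term)
    return [w for w in words if toy_stem(w) == target]

print(stemmed_search("house", ["house", "houses", "housed", "housing", "household"]))
```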

stop word

A search term that should be ignored. Verity supports several types of stop-word lists, some used at indexing time and others used at search time.

stop-word list

A file containing search terms that should be ignored. Verity supports several types of stop-word lists, some used at indexing time and others at search time.

style file

A file used to configure the series of indexes in a collection that store data about its documents. The choice of a particular style file determines how indexing utilities function.

style.dft

A collection style file that controls the contents of the virtual document created during indexing.

style.fxs

A collection style file that contains feature-extraction stop words, that is, words that should not appear in document summaries and clusters. See also vdk30.stp.

style.lex

A collection style file that can control how tokenization occurs during indexing. Use of style.lex is discouraged; tokenization control is now available through the locale definition file associated with each locale.

style.prm

A collection style file containing parameters that control the generation of specialized indexes.

style.stp

A collection style file that contains indexing stop words, that is, words that should not be included in the collection’s word index.

style.ufl

A collection style file that defines custom fields to be included in the collection’s document table and optionally specifies the generation of indexes for those fields.

style.uni

A collection style file that controls the functioning of the universal filter.

style.zon

A filter style file that controls functioning of the zone filter.

styleset

The complete set of style files.

StyleSet Editor

The Verity application that enables administrators to create and modify style files.

subword

A constituent element of a compound word.

summary

See document summary.

synonym

A word associated with other words. In the Business Console Synonyms module, synonyms can be defined to refine a query result.

synonym search

A type of search that returns all occurrences of the search term and also any of its synonyms, as defined in a thesaurus.
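
A minimal sketch of the idea follows, with the thesaurus modeled as a plain dictionary. The sample entries and the function name are hypothetical; a compiled Verity thesaurus is a different structure.

```python
# Hypothetical thesaurus: each term maps to its set of synonyms.
THESAURUS = {"car": {"automobile", "auto"}}

def synonym_search(term, documents, thesaurus):
    # Expand the query term with its synonyms, then match whole words.
    wanted = {term.lower()} | thesaurus.get(term.lower(), set())
    return [doc for doc in documents if wanted & set(doc.lower().split())]

docs = ["The automobile stalled", "A car passed", "Bicycles only"]
print(synonym_search("car", docs, THESAURUS))
```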

system default locale

The locale that is used as the default session locale when the default installation locale is not defined.

TAX file

A taxonomy file. This stores the taxonomy in a text format.

taxonomy

The hierarchical organization of data by category. A taxonomy defines the view(s) by which administrators want to organize data.

tensor

An algebraic representation of the terms significant to a user or a document.

tensor space

A space which has a dimension for each unique term in the Recommendation Engine. Each tensor in this space can be measured by a weight for each term.

thematic mapping

The process of automatically extracting the key concepts contained in a set of documents and organizing them into a hierarchy, which is called a concept tree.

thesaurus

A dictionary of synonyms. Each Verity Locale supports use of a thesaurus for searching. In a synonym search, all occurrences of the search term and any of its synonyms are returned.

thesaurus control file

A text file containing lists of synonyms. Administrators create a thesaurus control file, then they use the mksyd command-line tool to compile it into a thesaurus.

ticket

A temporary access pass granted by the K2 Ticket Server to a user for as long as the user is logged in.

time to live (TTL)

The maximum time for a K2 search. If the search takes longer than this time, K2 stops the search.

token

A searchable unit in a document. Tokens are typically the individual words in a document, but they can also be word stems, or any string fragments that occur between delimiter characters.

tokenization

The process by which the tokenizer converts a document’s text into searchable units (such as words and word stems). The tokens are then stored in a collection’s word index.

top level topic

The topic at the top level of the topic tree.

topic

A stored query expression written in the Verity Query Language (VQL) that is used 1. to model a concept of interest in a classification task, or 2. to enable users to quickly find information without having to compose sophisticated queries using complex syntax in a search task. In a classification task, a topic can be used by itself, or combined with other topics to specify a category definition.

topic outline file (OTL)

A text file that defines the structure of a topic set. Topic outline files have a file extension of .otl.

topic set

A grouping of topics that have been compiled for use by a Verity application; for classification tasks, a topic set contains one or more topics used to classify documents in a collection.

topic tree

The hierarchy of the topic set. It is shown in the Topic Pane of Verity Intelligent Classifier.

transaction

A modification of one or more entities in a Recommendation Engine index. For example, a transaction may make a document more relevant to a particular query due to user input.

TTL

See time to live.

typo search

A type of search that corrects for minor misspellings in the search terms. In a typo search, occurrences of the search term and any words close to it in spelling are returned.
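
One common way to measure how close two words are in spelling is the Levenshtein edit distance. The sketch below uses it to illustrate the concept; whether Verity uses this particular measure is not stated here, and the max_distance threshold is an assumed parameter.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum single-character inserts, deletes, or substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]

def typo_search(term, words, max_distance=1):
    # Hypothetical helper: keep words within max_distance edits of the term.
    return [w for w in words if edit_distance(term.lower(), w.lower()) <= max_distance]

print(typo_search("serch", ["search", "server", "march"]))
```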

UDJ

See user-defined job.

Unicode

A standard character set that encodes the characters for all major modern languages, so that all characters can be represented in one character set. In its double-byte (16-bit) form, characters are always a fixed size (2 bytes). There are various implementations of portions of the Unicode standard. The implementation used by the Verity multilanguage locale is UTF-8, a variable-length encoding.
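
The difference between encodings can be checked directly. The short sketch below reports how many bytes a character occupies in UTF-8 and in a 16-bit encoding (UTF-16, little-endian, without a byte-order mark); the helper name is hypothetical.

```python
def encoded_lengths(ch):
    # Returns (bytes in UTF-8, bytes in UTF-16 little-endian).
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-le"))

# Latin small a, e with acute accent (U+00E9), euro sign (U+20AC)
for ch in ("a", "\u00e9", "\u20ac"):
    print(repr(ch), encoded_lengths(ch))
```

For these three characters the UTF-8 length varies from one to three bytes, while the 16-bit form uses two bytes each.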

universal filter

A document filter that receives raw data from the gateway and determines the file type of the incoming document. Based on this file type, the filter invokes a suitable helper filter, which extracts the available text and metadata from the document.

user-defined job (UDJ)

A job created by the K2 administrator. User-defined jobs specify a command-line tool and its associated arguments. Once a job is created, administrators can run it right away or schedule it to run later. Administrators can also chain jobs so that other jobs start when a given job finishes. See also collection indexing job.

user index

A recommendation index that contains profiles of users on a particular host machine.

user profile

The representation of a user in a recommendation index. A user profile is created over time from information such as documents authored by the user, interests submitted, queries asked, and documents rated or viewed. See also document profile, entity profile.

VDK

See Verity Developer’s Kit.

vdk30.stp

A locale-specific file that contains feature-extraction stop words, that is, words that should not appear in document summaries and clusters. See also style.fxs.

VdkVgwKey

See gateway key.

Verity Developer’s Kit (VDK)

1. The API that enables OEM developers to build Verity functionality into their products. 2. The programming core on which most Verity applications are built.

Verity Export SDK

An API that enables custom and template-driven conversion of popular word processing, spreadsheet, and presentation files into high-quality HTML or XML through a variety of programming interfaces. It includes HTML Export and XML Export.

Verity Federator

An application that enables the user to perform federated searches.

Verity Filter SDK

An API that enables developers to extract text from a wide variety of word processing, spreadsheet, and presentation formats. The API also supports metadata extraction and automatic detection of document types.

Verity Intelligent Classifier

An application for creating, viewing, editing, and testing topics and taxonomies.

Verity Locale

A software module that allows Verity applications to operate on documents in a specific language or set of languages. A locale provides one or more capabilities that may include tokenization, stemming, part-of-speech recognition, and thesaurus use. See also single-language locale and multilanguage locale.

Verity Query Language (VQL)

Verity’s standard language for issuing searches, in which queries are built from search terms and operators.

Verity Spider (vspider)

A command-line tool that provides indexing capabilities for many different document formats throughout the enterprise, including web-based, disk file systems, Lotus Notes, Microsoft Exchange, and ODBC. See also K2 Spider.

Verity Viewing SDK

An application that allows developers to build their own 32-bit applications using the Verity conversion and viewing tools. Developers can build these tools into their document management, web server, Internet/Intranet, groupware, information retrieval, e-mail, and imaging applications for seamless WYSIWYG viewing and printing of word processing, spreadsheet, presentation graphics, picture, and compression formats.

Verity web services

Web services that consist of a remote procedure call (RPC) interface to a search engine that enables a client application to invoke operations on the search engine over the internet.

VQL

See Verity Query Language.

web services

See Verity web services.

weight

In the Verity Query Language, a number that can be used to represent the importance of different parts of the query. Weights are combined with the scores returned by the query’s operators and hence affect the documents’ scores.

wildcard search

A type of search in which the search term contains special symbols that represent multiple characters. For example, a wildcard search with the term abc* returns occurrences of all words that start with abc.
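
The abc* example can be sketched with shell-style pattern matching from the Python standard library. This only illustrates the concept; the wildcard symbols that Verity accepts may differ from those of fnmatch, and the function name is hypothetical.

```python
from fnmatch import fnmatchcase

def wildcard_search(pattern, words):
    # In fnmatch patterns, * matches any run of characters (including none).
    return [w for w in words if fnmatchcase(w, pattern)]

print(wildcard_search("abc*", ["abc", "abcde", "xabc", "ab"]))
```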

word index

Stored in a collection, a list of all words that appear in the documents, plus the location of every instance of the word.
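
Conceptually this is an inverted index with positions. The sketch below builds one in memory from a dictionary of documents; the structure and names are assumptions for illustration and say nothing about the on-disk format of a collection.

```python
import re
from collections import defaultdict

def build_word_index(documents):
    """Map each word to a list of (document id, word position) pairs."""
    index = defaultdict(list)
    for doc_id, text in documents.items():
        for position, word in enumerate(re.findall(r"\w+", text.lower())):
            index[word].append((doc_id, position))
    return dict(index)

index = build_word_index({"d1": "the cat sat", "d2": "a cat"})
print(index["cat"])
```

Storing positions as well as document identifiers is what makes features such as proximity search and highlighting possible without rescanning the documents.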

worker

See search worker.

XML

See eXtensible Markup Language.

XML Export

See Verity Export SDK.

XML filter

A document filter that processes XML documents.

XML schema

A structured framework or plan that contains elements or tags and their definitions to outline the organization of XML file content.

zone

A named region of a document that can be searched. Examples are HTML tags such as H1, H2, BODY, TITLE, and so on, or the values of the TO, FROM, and SUBJECT fields in email and Usenet messages.

zone filter

A document filter that processes documents—such as HTML, Usenet news, and email documents—that contain zones. See also XML filter.