Known Problems and Limitations



Installation and Uninstallation

 

 
Note   If you are migrating to K2 6.0 from K2 5.x, see important instructions and requirements in the Verity K2 Migration Guide.
 

 

Business Console requires Component Framework

 

If you are installing Business Console in a custom installation, you must also install the Component Framework, or else Business Console will not function. (101769)

K2 Administration Server will not restart on Windows if user account password is changed

 

If you specify a user account for the K2 Administration Server and the password for that user is later changed, the K2 Administration Server will not restart until you update the password stored for that account in the K2 Administration Server Windows service. (90446)

Configuration required for File System Gateway and the UNIX login module to support shadow passwords

 

In order for the File System Gateway and the UNIX Login Module to support shadow passwords, ensure that you make the following configuration changes (90841):

a. Install K2 on a local UNIX host. Support for shadow passwords will be on this host.

If you install K2 on a network share, you must configure NFS to share root access for the local root user. See your network administrator for assistance.

b. Change the ownership of k2/platform/bin/vspget to the local root user.

c. Enable the setuid bit on k2/platform/bin/vspget (chmod +s).
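The result of steps b and c can be checked after the fact. The following is a minimal sketch and not part of K2: the helper names are hypothetical, and it assumes that chmod +s is expected to leave the set-user-ID bit on a file owned by uid 0 (root).

```python
import os
import stat


def vspget_mode_ok(st_uid: int, st_mode: int) -> bool:
    """Return True if the file is owned by root (uid 0) and has the setuid bit."""
    owned_by_root = st_uid == 0
    has_setuid = bool(st_mode & stat.S_ISUID)
    return owned_by_root and has_setuid


def check_vspget(path: str) -> bool:
    """Stat the given vspget binary and apply the check above."""
    info = os.stat(path)
    return vspget_mode_ok(info.st_uid, info.st_mode)
```

Call check_vspget with the path to vspget in your own installation; a False result suggests that step b or step c still needs to be done.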

Installing documentation after K2 Services leaves documentation unsearchable

 

If you install K2 Services, and then in a separate installation session, install K2 Documentation, the documentation collection (verity_doccoll ) is not searchable because it is not attached to any instance of a K2 Server.

Workaround: Use K2 Dashboard or rcadmin to create the documentation server (the default name is hostname_docserver), and add verity_doccoll and the other sample collections to it. (90885, 101562, 101564)

K2 Installation Script on UNIX Requires Space in /tmp

 

The K2 installation script requires free space in /tmp or /var/tmp, even if you use the -is:tempdir option to specify a different temporary directory. Before running the installation script on UNIX, ensure that approximately 400 MB is available. (101234)

Run rcadmin -update after installation

 

To ensure license information for an installation is synchronized with the Master Administration Server, always run rcadmin -update after installing anything on a host on which you previously ran an installation. For information about the -update option, see the Verity K2 rcadmin Guide. (87109, 90854)

Cannot uninstall Verity ODBC drivers without removing DSNs configured with the Verity ODBC drivers

 

You must remove any DSNs created with the Verity ODBC drivers before you can uninstall the Verity ODBC drivers. (84722)


Limitations:

Installer does not accept multibyte input

 

In general, the K2 Dashboard and K2 Services (K2 Administration Server, K2 Broker, etc.) support multibyte characters. However, during installation, multibyte characters cannot be used in input fields or paths; they should be composed of 7-bit ASCII characters only. Additionally, for the JSP version of the K2 Dashboard on UNIX, administrator user names cannot use multibyte characters. (72738)

Workaround: After installation, open K2 Dashboard, set the correct encoding, and then change the properties.


Indexing

Limited support for PSW encoding

 

PSW (paragraph-sentence-word) encoding is limited in this release to defaults supported by VDK. The K2 6.0 locales do not send paragraph tokens to the tokenizer. Therefore, if PSW encoding is selected in a collection’s style.prm file, the tokenizer uses the VDK default, which is to break paragraphs at every 15th sentence. (97130)

Workaround: Do not specify PSW encoding in style.prm .

Customizing virtual document format may require use of constants

 

Adjacent un-zoned fields in style.dft will by default be abutted in the virtual document, concatenating the last word of the first field to the first word of the second field, and so on. (90590)

Workaround: In style.dft , either zone the field using /zone=zone_name on each of your field statements, or separate fields with either of the following lines:

constant: "\n"

or

constant: " "
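The effect of abutted fields can be pictured with plain string joins. This is only an illustration in Python, not how VDK builds the virtual document, and the field values are made up.

```python
# Hypothetical values of two adjacent un-zoned fields.
fields = ["Quarterly sales report", "Finance department"]

# Default behavior described above: fields are abutted with nothing between them.
abutted = "".join(fields)

# Effect of placing constant: "\n" between the two field statements.
separated = "\n".join(fields)

print(abutted.split())    # 'report' and 'Finance' fuse into one token
print(separated.split())  # every word stays a separate token
```

In the abutted case a search for either "report" or "Finance" as a whole word would miss, which is the symptom the workaround avoids.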

Topic warnings or errors may be due to locale mismatch

 

When attempting to access a topic in a topic set, you might receive a warning such as Ignoring topic A (not the name of an existing topic) or an error such as No topic built . If you know that the topic exists, the error could be the result of a mismatch between the current locale (for the VDK session) and the locale used when the topic was built. (90175)

For example, you might encounter this error if the topic set was built under an older version of K2 (in which the default locale is english , englishx , or englishv ) and you attempt to access it with K2 6.0 in the default locale (which is uni).

Workaround: Change your current locale to the topic set’s locale, or rebuild the topic set in the new locale.

Content on password-protected UNC shares cannot be accessed unless K2 Services run as user account

 

In order to index, search and view content on password-protected UNC shares, you must run K2 services (Administration Servers, K2 Servers, K2 Spider) as a specific user account with the necessary privileges to access the content. (90447)

Workaround: Run all K2 services from the command line.

Fully qualified domain name required for secure Web indexing

 

When indexing secure web sites, you must specify the fully qualified domain name for hosts in your starting points. (90787, 100965)

For example:

http://host:port/path will fail.

http://host.domain.com:port/path will succeed.
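Starting points can be screened for this before a job is created. The sketch below is an assumption-laden heuristic, not a K2 feature: the function name is hypothetical, and it simply treats any host name that contains a dot as fully qualified (so a dotted IP address also passes).

```python
from urllib.parse import urlsplit


def has_qualified_host(url: str) -> bool:
    """Return True if the URL's host name contains at least one dot."""
    host = urlsplit(url).hostname or ""
    return "." in host.strip(".")
```

With this check, http://host:8080/path is rejected and http://host.domain.com:8080/path is accepted; the port numbers are placeholders.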

UNC provides most robust selection of network file system content on Windows

 

When indexing network-mounted drives on Windows platforms, use UNC paths (\\machine\path\). UNC paths are preferable to mapped drive letters, which K2 also supports but which may not map correctly after subsequent restarts, when the collection is refreshed or when documents are viewed. (90320)

Insufficient memory causes docs to be skipped

 

When there is insufficient memory to index, documents are skipped and reported as having bad keys. Ensure that the host performing the indexing has sufficient memory for the indexing job. (89056)

Behavior of maxmerge and maxclean for mkvdk

 

The maxmerge optimize option cleans up only partitions that are more than ten minutes old. To force an immediate cleanup, use maxclean, but only if no read operations are expected on the collection. (86178)

Indexed word count may not match word count of source document

 

As part of VDK indexing work, VDK reports the total number of words indexed. This value may not match the number of words in the physical documents due to extra metadata that is also indexed. (86536)

Running out of file handles when indexing UNC paths with the File System Gateway on Windows 2000

 

If you are having problems indexing UNC paths with the File System Gateway, read the following Windows Knowledge Base article (72459):

Configuring Opportunistic Locking in Windows 2000 (Q296264)

Purging a collection does not clean up its ZIP indexing cache

 

If a collection includes container files (ZIP or PST), indexing creates a cache to hold the extracted files for indexing. Subsequent purges of the collection do not by themselves empty the cache, which can grow large enough to cause disk-space problems. (94442)

Workaround: Manually delete the cache. The default path for the cache is collectionDir/kvcache, where collectionDir is the collection's directory.
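The manual deletion can be scripted. This is a minimal sketch, assuming the cache is in its default location (a kvcache directory directly under the collection directory) and, as a further assumption, that no indexing job is using the collection while the script runs. The function name is hypothetical.

```python
import shutil
from pathlib import Path


def clear_kvcache(collection_dir: str) -> bool:
    """Delete collection_dir/kvcache if it exists; return True if it was removed."""
    cache = Path(collection_dir) / "kvcache"
    if not cache.is_dir():
        return False
    shutil.rmtree(cache)
    return True
```

A scheduled task could call clear_kvcache after each purge of the collection.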


Limitations:

Zip file containing more than 65535 files cannot be indexed

 

The maximum number of files that a Zip file can hold and still be indexable is 65535. A Zip file containing 65536 files will be indexed as an empty file. (99071)
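Archives can be screened against this limit before indexing. The sketch below uses Python's standard zipfile module; the function names are hypothetical, and it counts every entry in the archive, including directory entries, which may or may not match how K2 counts files.

```python
import zipfile

MAX_INDEXABLE_ENTRIES = 65535  # limit stated above


def zip_entry_count(path: str) -> int:
    """Return the number of entries recorded in the archive."""
    with zipfile.ZipFile(path) as archive:
        return len(archive.infolist())


def within_entry_limit(path: str) -> bool:
    """Return True if the archive holds no more than 65535 entries."""
    return zip_entry_count(path) <= MAX_INDEXABLE_ENTRIES
```

Archives that fail the check could be split into smaller Zip files before the indexing job runs.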

Verity indexing tools cannot index files with certain characteristics

 

Verity indexing tools generate Error -17 when attempting to index files with all of the following characteristics (101003):

the file exists in a Windows directory with a space in the name

the file name is identical to the first word of the directory name

the file does not have an extension

For example, the following file cannot be indexed:

C:\My Programs\My
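The three characteristics can be tested for in advance. This is a rough sketch with a hypothetical function name; it assumes that only the immediate parent directory matters and that "identical" means an exact, case-sensitive match.

```python
from pathlib import PureWindowsPath


def matches_error_17_pattern(path: str) -> bool:
    """Return True if the path has all three characteristics listed above."""
    p = PureWindowsPath(path)
    directory = p.parent.name          # name of the containing directory
    words = directory.split()
    dir_has_space = " " in directory
    same_as_first_word = bool(words) and p.name == words[0]
    no_extension = p.suffix == ""
    return dir_has_space and same_as_first_word and no_extension
```

PureWindowsPath parses Windows paths on any operating system, so the check can also run on a UNIX host that holds a file listing.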

Required access rights for processes and domain user

 

To use provided native document security (either Windows Active Directory or NT Domain), the K2 Server, rcvdk , mkvdk , vspider and k2spider processes require read access to the files that have been indexed in order to retrieve them for indexing or viewing.

All files must be assigned read and other appropriate permissions in the Access Control Lists (ACLs) for the domain user who runs any of the processes: K2 Server, K2 Ticket Server, rcvdk , mkvdk , vspider and k2spider .

On Windows systems, in order for rcvdk to access a collection with the u command, the domain user account under which the rcvdk process is running must have the following rights and privileges:

Read and ACL access permission to the file

Act as part of the operating system

Log on as a service (if K2 processes are to be run as NT services)

Log on locally

Log on as a batch job

Replace a process level token


Filters

Unicode documents require a byte order mark

 

The KeyView filter (flt_kv ) does not recognize a Unicode text document if it lacks a byte order mark.

An exception is when the first 1024 bytes are Basic Latin Unicode. In this case a byte order mark is not required. (75233)
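To find Unicode text files that lack a byte order mark, the first bytes of each file can be compared with the standard marks. This is a sketch using Python's codecs constants; the function name is hypothetical, and which Unicode encodings flt_kv actually accepts is not stated here.

```python
import codecs

# UTF-32 marks are listed first because the UTF-32 LE mark begins with the UTF-16 LE mark.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def detect_bom(data: bytes):
    """Return the encoding implied by a leading byte order mark, or None."""
    for mark, encoding in _BOMS:
        if data.startswith(mark):
            return encoding
    return None
```

Passing the first few bytes of a file is enough; a None result for a file known to be Unicode text indicates a candidate for the problem described above.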

Some charts in spreadsheets are not supported

 

The KeyView filter (flt_kv) does not support donut, radar, surface or custom charts in spreadsheets.

Lotus Word Pro files are supported on Windows only

 

The KeyView filter (flt_kv) only supports Lotus Word Pro files (.lwp ) on Windows platforms because the filter uses the Lotus Word Pro SDK to support the format, and the Lotus SDK only runs on Windows.

Some characters in multibyte Lotus Word Pro files may not be filtered correctly

 

Some characters in multibyte Lotus Word Pro files may not be filtered correctly because the KeyView filter uses the Lotus Word Pro SDK to support the format, and the Lotus SDK has character set limitations.


Limitations:

Document-size limit

 

The KeyView filter does not handle files larger than 2GB. (97885)

Filtering/indexing limitations for PST files

 

Encryption and password-protection are only supported for PST files. Other file formats that are protected are not supported. (56661, 74399)

Since KeyView accesses PST files using MAPI, PST files are supported on Windows platforms only. This also requires that either an Outlook 2002 or 2003 client is installed and is the default email client.

PST files are only supported on Windows x86. (95997)

The contents of an Outlook folder that is deeply nested in the folder structure might not be extracted. For example, the contents of a folder that is 20 nested levels from the root folder may be ignored and not included in the output. The Windows platform restricts file paths to 256 characters. When an extensive folder structure is extracted to the file system, the file path created could exceed the 256-character maximum. KeyView does not allow the path restriction to be violated, and ignores any folder that would force the extracted file path to exceed this limit. (95966)

Pathnames to indexes and binaries should be fully qualified.

KeyView does not support read-only PST files. For KeyView to open a PST file, the file must allow read and write access.

The version of Outlook client must be equal to or later than the version used to create the PST. For example, KeyView cannot filter an Outlook 2003 Unicode PST file with an Outlook 2002 client installed. (100723)

The Outlook client should not be running during indexing.

Filtering/indexing limitations for Microsoft Outlook files (msg)

 

Message descriptors, such as priority, status, and flags are not supported.

Metadata in the main message is not extracted. However, metadata, headers, and footers are extracted from the attachments.

Embedded graphics are not supported.

Format information (font, text effect, colors, and so on) in text and tables is not extracted.

An attachment may not be filtered correctly if the character set of the attachment is different from the character set of the original message. The outcome depends on whether the specified output character set is compatible with the character set of the attachment.

If an attachment filename is in a multibyte language, and the file is extracted on an English machine, the attachment can be filtered, but the multibyte filename is not preserved. The attachment filename after filtering is different from the original filename. Multibyte filenames are not supported on non-locale OS machines.

Filtering/indexing limitations for Microsoft Outlook Express files (eml)

 

Metadata in the main message is not extracted. However, metadata, headers, and footers are extracted from the attachments.

Embedded graphics are not supported.

Format information (font, text effect, colors, and so on) in text and tables is not extracted.

EML files generated by Microsoft Exchange are not supported.

If an attachment filename is in a multibyte language, and the file is extracted on an English machine, the attachment can be filtered, but the multibyte filename is not preserved. The attachment filename after filtering is different from the original filename. Multibyte filenames are not supported on non-locale OS machines.

If the body message of an EML file is in HTML format, the HTML is treated as plain text. All text and HTML tags are extracted.

Only two types of encoded text can be decoded: Base64 encoding and Quoted-Printable encoding.
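For reference, the two supported transfer encodings look like this. The snippet uses Python's standard base64 and quopri modules purely as an illustration of the encodings; it says nothing about how KeyView decodes them.

```python
import base64
import quopri

# "Hello, K2" encoded with each of the two transfer encodings.
b64_text = b"SGVsbG8sIEsy"
qp_text = b"Hello,=20K2"

print(base64.b64decode(b64_text))      # b'Hello, K2'
print(quopri.decodestring(qp_text))    # b'Hello, K2'
```

Message parts that use any other Content-Transfer-Encoding fall outside what the filter can decode.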

Indexing and viewing limitations for flt_kv with PDF files

 

Supports 40-bit or 128-bit encryption. All PDF security attributes are supported, except user passwords and master passwords.

Embedded fonts in a PDF file are not translated correctly. They are usually displayed using the question mark (?) replacement character. (63767, 62927, 68261, 87987, 89141)

If an unsupported font is encountered during conversion, the default font, Times New Roman, is substituted. If the original font is wider than the substituted font, extra whitespace will appear in the output HTML file.

Bi-directional text is not supported.

Hyperlinks are not supported.

Annotations, such as notes, sound, and movies are not supported.

All pre-defined CMaps in the PDF 1.3 specification are supported. KeyView does not support CMaps that were added in the PDF 1.4 and PDF 1.5 specifications.

The following PDF color spaces are supported: DeviceRGB, DeviceGray, DeviceCMYK, CalGray, and CalRGB. Indexed color spaces are supported as long as they are used with a supported basic color space.

Vector graphics are not supported. Since background colors are defined in PDFs as vector graphics, background colors are also not supported. Raster graphics are supported.

When filtering a PDF document, flt_kv uses absolute positioning; that is, the text appears in the same position as in the original document. However, table of contents entries and summary information do not contain absolute positioning information. Therefore, if the main document, the table of contents, and the summary information are generated in the same HTML output file, the TOC entries and summary information may overlap the body text in the document.

When viewing a PDF file, PDF bookmarks do not lead to the exact location of the destination marker, but jump to the page on which the destination marker exists. This is similar to the behavior of the Adobe Acrobat Reader.

The following features of PDF version 1.5 for Acrobat 6.0 are not supported:

Tagged PDFs

Images compressed in JPEG2000

Crypt Filter encryption

Hidden content in a PDF document, such as Optional Content and OCG-State Actions

Interactive forms

Embedded multimedia presentation

Digital signatures and signature fields

Interactive presentations, that is, navigation between pages and transition actions

Donut, radar, surface and custom charts in spreadsheets are not supported

 

Donut, radar, surface and custom charts in a spreadsheet document are not displayed. Line charts are displayed correctly. (65499)

Background colors in word processing documents may not display correctly

 

A 16-color palette is used to display backgrounds of pages, tables, cells, or frames in word processing documents. A background using a 256-color palette is translated to 16 colors. (85545, 90388)

Links containing highlighted query terms do not function

 

If a query term appears in a hypertext link in a viewed document, the link is not active because the query term is highlighted. (76047, 76160)

Contents of word processing text boxes are not displayed correctly

 

If a graphic or table appears in a word processing text box, HTML Export cannot position it correctly in the HTML output file. (67271, 66762)


K2 Spider and VSpider

K2 Spider cannot index a SiteMinder-protected site

 

K2 Spider cannot index a Web site protected by basic authentication and a Netegrity SiteMinder system. (99694)

Workaround: To index a Web site protected by basic authentication and a Netegrity SiteMinder system, use vspider or Ultra Spider. Follow these steps:

a. Create the single sign-on style set, the collection, and the indexing job as described in the Netegrity SiteMinder Integration Technical Note.

b. Run the indexing job. Although the job will fail, the gateway file (vgwhttp.cfg ) will be created. You must now run the indexing job using either vspider or Verity Ultra Spider.

(Alternatively, you can manually create the file vgwhttp.vgw .)

c. To index the site using vspider, use the following command:

vspider -collection <collname> -style <ssostyleset> -loglevel debug -start <startpoint> -locale <localename> -auth <job.auth> -header "cookie: SMCHALLENGE=YES"

The job authorization file must contain the password in plain text. For example, if you are indexing http://www.verity.com/finance , the job.auth file could contain:

*.verity.com "" username mypassword

d. To index the site using Ultra Spider, access the Ultra Spider interface using the K2 Dashboard.

In the tree at the left side of the K2 Dashboard, select the Administration Server on which Verity Ultra Spider is installed. (Verity Ultra Spider is installed separately from K2 Enterprise.) The K2 Dashboard displays the associated detail page.

Click the Ultra Spider Administration action link. You are prompted for a user name and password. Enter a user name and password, and click OK. The Verity Ultra Spider interface appears in a separate browser window. For information on this application, see the Verity Ultra Spider Administrator Guide.

Note: If you do not expect to use form-based authentication for searching and viewing, you can use Ultra Spider with a default HTTP style set instead of creating a single sign-on style set.

K2 Spider cannot index a forms-based authenticated site

 

K2 Spider cannot index a Web site protected by forms-based authentication. (99703, 98770)

 

Workaround: To index a Web site protected by forms-based authentication, use Ultra Spider. Follow these steps:

a. In the K2 Dashboard, create a Style Set and define the required authentication forms.

b. Create the collection using the new Style Set.

c. Create the indexing job and select the form(s) that apply to the job.

d. Run the indexing job. Although the job will fail, the gateway file (vgwhttp.cfg ) will be created. You must now run the indexing job using Verity Ultra Spider.

e. To index the site using Ultra Spider, access the Ultra Spider interface using the K2 Dashboard.

In the tree at the left side of the K2 Dashboard, select the Administration Server on which Verity Ultra Spider is installed. (Verity Ultra Spider is installed separately from K2 Enterprise.) The K2 Dashboard displays the associated detail page.

Click the Ultra Spider Administration action link. You are prompted for a user name and password. Enter a user name and password, and click OK. The Verity Ultra Spider interface appears in a separate browser window. For information on this application, see the Verity Ultra Spider Administrator Guide.

When running an indexing tool from the command line to index PST files, absolute paths are required

 

When indexing PST files with vspider, mkvdk, a K2 process, or K2 Spider from the command line (as opposed to using K2 Spider within K2 Dashboard), you must specify absolute paths for the following:

collection locations

work paths

job work paths (if specified in a job INI file)

cookie paths (if specified in a job INI file)

auth paths (if specified in a job INI file)

(101435)

K2 Spider may re-index entire ZIP/PST file if duplicate detection is not enabled

 

To avoid re-indexing all documents in a ZIP or PST file that has changed since it was last indexed, ensure duplicate detection is enabled for File System and HTTP Gateway indexing jobs. For File System Gateway jobs, set detectdupfile = true in the job INI file. For HTTP Gateway jobs, set nodupdetect = false in the job INI file. (101588)

Indexing mirrored collections using K2 Spider and K2 Dashboard

 

Here is a summary of the requirements for indexing mirrored collections (92187):

All collections can have the same alias, but the collection directory names must be different.

Each individual collection has to be on a different machine.

You cannot stop an indexing job with mirrored collections. If the job stops, purge the job and also purge individual collections, then restart the job.

By default K2 Spider will not index over 500,000 documents from an HTTP server

 

K2 Spider limits the maximum number of documents that can be indexed from an HTTP server to the number defined by the environment variable VERITY_K2SPIDER_MAXDOCPERHOST . If this environment variable is not defined, the default value of 500,000 is used. Thus by default K2 Spider will not index more than 500,000 documents from a single site. (90989)

Workaround: Define and set this environment variable in the K2 Spider Controller's environment before creating the indexing job.

Currently, there is no way to set this environment variable for jobs that run from the K2 Dashboard. All jobs running from the Dashboard are subject to the default limit.
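When the K2 Spider Controller is started from a script, the variable can be placed in the child process environment. The sketch below only builds that environment; the function name is hypothetical, the controller command line is deliberately not shown, and a plain integer value is an assumption about the expected format.

```python
import os


def spider_environment(max_docs_per_host: int, base_env=None) -> dict:
    """Return a copy of the environment with VERITY_K2SPIDER_MAXDOCPERHOST set."""
    env = dict(os.environ if base_env is None else base_env)
    env["VERITY_K2SPIDER_MAXDOCPERHOST"] = str(max_docs_per_host)
    return env
```

The returned dictionary can be passed as the env argument of subprocess.Popen when launching the controller, so that the value is in place before any indexing job is created.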

K2 Spider log files cannot grow beyond 500MB

 

K2 Spider does not allow its log files to grow beyond 500 MB. When a log file reaches 500 MB, it is backed up as FileName.old. For example, if skip.log grows to 500 MB, it is renamed to skip.log.old and a new, empty file named skip.log is created. When the new file grows to 500 MB, it is renamed and the previous skip.log.old is overwritten. Thus an administrator can see only the most recent 500 MB to 1 GB of log data. (90999)
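The amount of log data that survives can be worked out with a small model. This is an illustrative calculation only, assuming that rotation happens at 500 MB as in the skip.log example and that exactly one .old file is kept; the function name is hypothetical.

```python
LIMIT_MB = 500  # rotation threshold described above


def visible_log_mb(total_written_mb: int) -> int:
    """Return how many MB of log data remain on disk (current file plus .old)."""
    if total_written_mb < LIMIT_MB:
        return total_written_mb            # no rotation has happened yet
    current = total_written_mb % LIMIT_MB  # size of the new, growing file
    return LIMIT_MB + current              # one full .old file plus the current file
```

For a job that has written 1200 MB of skip.log data, the model gives 700 MB still on disk: the 500 MB in skip.log.old plus 200 MB in the current skip.log.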

Indexing batch size should not exceed 4000

 

The recommended upper limit for an indexing batch size is 4000. (90942)

Verity Spider and K2 Spider from command line:

The default value for the -submitsize option is 1024.

K2 Spider from K2 Dashboard:

The default value for Indexing Batch Size on the Indexing tab of job properties is 1024.

Specifying collectanchortext with K2 Spider can consume disk space and memory

 

When you specify the collectanchortext option with a K2 Spider indexing job, allow for sufficient disk space and memory usage. All documents that meet the indexing job criteria are cached locally and parsed prior to indexing. The size of the repository you are indexing and the broadness of your criteria determine how many documents are cached. (90611)

Must edit authorization file created for access to Web sites secured by NTLM

 

The authorization file created by the K2 Dashboard is invalid for accessing web sites secured by NTLM. (87559)

Workaround from K2 Dashboard:

This workaround applies when you use K2 Dashboard to create collections and run indexing jobs.

a. In K2 Dashboard, create an indexing job. Note the alias you specify for the job.

b. In / data/jobs/ jobalias, copy job.auth to NTLMjob.auth. The path to data depends on your installation, and jobalias is the alias you noted in step a.

c. Open NTLMjob.auth in a text editor and for each starting point entry, replace path with NTLM as follows:

Entry in vgwhttp.auth created by StyleSet Editor:
# Encrypted "host.domain:port" "path" "domain\user" "password"

Edited entry:
# Encrypted "host.domain:port" "NTLM" "domain\user" "password"

d. In K2 Dashboard, create a user-defined job that copies NTLMjob.auth over job.auth . Chain the user-defined job to your indexing job such that the user-defined job executes before indexing occurs.
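Steps b and c can be scripted instead of done in a text editor. The sketch below is not a Verity tool: the function names are hypothetical, and it assumes that every entry consists of exactly four double-quoted fields in the order shown above (host, path, user, password) and that the file is plain text readable as UTF-8. Lines that do not fit that layout are copied unchanged.

```python
import re

_FIELD = re.compile(r'"([^"]*)"')


def to_ntlm_entry(line: str) -> str:
    """Replace the second quoted field (the path) with NTLM; leave other lines alone."""
    matches = list(_FIELD.finditer(line))
    if len(matches) != 4:
        return line  # comment, blank line, or an entry in another layout
    path_field = matches[1]
    return line[:path_field.start()] + '"NTLM"' + line[path_field.end():]


def convert_auth_file(src: str, dst: str) -> None:
    """Copy src (job.auth) to dst (NTLMjob.auth), rewriting each entry."""
    with open(src, encoding="utf-8") as fin, open(dst, "w", encoding="utf-8") as fout:
        for line in fin:
            fout.write(to_ntlm_entry(line.rstrip("\n")) + "\n")
```

The chained user-defined job from step d is still needed, because the script only produces NTLMjob.auth and does not stop the original job.auth from being used.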

K2 Spider Controller can consume too much memory with unlimited=true

 

When you specify unlimited=true in the job.ini file for a K2 Spider indexing job, the K2 Spider Controller can consume excessive amounts of memory. Verity recommends that you limit the scope of your K2 Spider indexing jobs with either the host= or domain= options in job.ini . (84892)

vspider command option: -useget

 

When refreshing an HTTP page, vspider first sends a HEAD request to the web server to get the last modified time for the web page. vspider compares this timestamp with what it has stored in VSDB from the last crawl of this page. If this timestamp has changed, vspider generates a GET request to the web server to download the whole document and index it.

Some security modules used by web servers are not able to respond cleanly to a HEAD request even when the right credentials are supplied. If you run into such issues, you can use the new vspider command option -useget . When you specify -useget , the vspider web crawler will always use HTTP GET method to retrieve pages, instead of using HEAD and then GET. The default is off. (94816)
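The difference between the two modes can be summarized as the sequence of HTTP requests sent for one page. This is a simplified model of the behavior described above, not vspider's code; the function name and the use of strings for timestamps are assumptions.

```python
def refresh_requests(stored_timestamp: str, server_timestamp: str, use_get: bool) -> list:
    """Return the HTTP methods issued for one page during a refresh."""
    if use_get:
        return ["GET"]                      # -useget: always fetch the page
    methods = ["HEAD"]                      # default: ask for the last-modified time first
    if server_timestamp != stored_timestamp:
        methods.append("GET")               # changed since the last crawl: download and index
    return methods
```

The model shows the trade-off: with -useget every page costs a full GET even when it has not changed, in exchange for never sending a HEAD request that the server's security module might mishandle.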

K2 Spider parameter: reposcharmap

 

A new parameter, reposcharmap, is allowed in the [JOB] section of the k2spider job.ini file. It specifies the character map (charmap) of the repository when the filesys or http gateway is used. For example, if you are indexing a file system in which the file names may contain SJIS characters, the job.ini file should contain "reposcharmap=sjis" in the [JOB] section. (91119)

You can also specify reposCharMap for a job through the environment variable VERITY_REPOS_CHARMAP. For example, if VERITY_REPOS_CHARMAP is set to sjis, all jobs behave as if reposCharMap were set to sjis. Environment variable support is provided for k2spider jobs that are run from the Dashboard. The environment variable must be set before the K2 processes are started. If both are specified, the reposcharmap value in the job.ini file takes precedence over the environment variable.

If a reposCharMap value is provided, the BIF files that k2spider creates for the filesys gateway type contain the DOC_FN key, with the file path in encoded form. The DOC_FN key is not written to the BIF files if reposCharMap is not specified. To force the DOC_FN keys for filesys to be written (as encoded strings), set reposCharMap to * .
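The precedence between the job.ini setting and the environment variable can be sketched as follows. This is a model of the rule stated above, not K2 code: the function name is hypothetical, and it assumes a job.ini simple enough for Python's configparser to read.

```python
import configparser


def effective_repos_charmap(job_ini_text: str, environ: dict):
    """Return the charmap a job would use, or None if neither source sets one."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(job_ini_text)
    # configparser lower-cases option names, so reposCharMap and reposcharmap match.
    from_ini = parser.get("JOB", "reposcharmap", fallback=None)
    if from_ini:
        return from_ini.strip()
    return environ.get("VERITY_REPOS_CHARMAP")
```

A job.ini with reposcharmap=sjis in the [JOB] section therefore yields sjis even when VERITY_REPOS_CHARMAP names a different charmap.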

 

 
Note   vspider has an equivalent option, called -repos_charmap . (91386)
 

 

K2 Spider environment variable: VERITY_K2SPIDER_HTTP_USEGET

 

When refreshing an HTTP page, K2 Spider first sends a HEAD request to the web server to get the last modified time for the web page. K2 Spider compares this timestamp with what it has stored in VSDB from the last crawl of this page. If this timestamp has changed, K2 Spider generates a GET request to the web server to download the whole document and index it.

Some security modules used by web servers are not able to respond cleanly to a HEAD request even when the right credentials are supplied. If you run into such issues, you can use the environment variable VERITY_K2SPIDER_HTTP_USEGET to control how K2 Spider crawls an HTTP repository during a refresh. If the variable is on (set to 1), the K2 Spider web crawler will always use the HTTP GET method to retrieve pages, instead of using HEAD and then GET . The default is off. (94550)

K2 Spider environment variable: VERITY_K2SPIDER_DATE_TIME_FORMAT

 

The environment variable VERITY_K2SPIDER_DATE_TIME_FORMAT lets users specify to K2 Spider what their date format is. Users can use 'Y' for year, 'M' for month, 'D' for day, 'h' for hour, 'm' for minute, 's' for seconds and specify any arbitrary pattern, like

MM/DD/YYYY-hh:mm:ss

Once this variable is set to this pattern, a date such as "06/04/2004-10:34:27" is recognized as June 4th, 2004, 10:34:27 AM GMT. Note that the time zone used is always GMT, irrespective of the date/time format.

Setting this environment variable is necessary only if you are indexing a web server that uses non-standard date formats in HTTP headers and/or HTML meta tags. K2 Spider already recognizes RFC822, Asciitime and RFC850 date formats automatically. (93454)
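A pattern can be tried out against sample dates before it is put into the environment. The sketch below translates the pattern letters into Python strptime directives; it is an approximation of the behavior described above, not K2 Spider's parser. The function names are hypothetical, 'h' is assumed to mean a 24-hour value, and years are assumed to have four digits.

```python
import re
from datetime import datetime, timezone

# Assumed mapping from the pattern letters above to strptime directives.
_DIRECTIVES = {"Y": "%Y", "M": "%m", "D": "%d", "h": "%H", "m": "%M", "s": "%S"}


def to_strptime_format(pattern: str) -> str:
    """Translate a pattern such as MM/DD/YYYY-hh:mm:ss into a strptime format."""
    escaped = pattern.replace("%", "%%")
    return re.sub(r"Y+|M+|D+|h+|m+|s+", lambda m: _DIRECTIVES[m.group(0)[0]], escaped)


def parse_gmt(pattern: str, text: str) -> datetime:
    """Parse text with the given pattern and attach the GMT (UTC) time zone."""
    parsed = datetime.strptime(text, to_strptime_format(pattern))
    return parsed.replace(tzinfo=timezone.utc)
```

Calling parse_gmt("MM/DD/YYYY-hh:mm:ss", "06/04/2004-10:34:27") gives June 4th, 2004, 10:34:27 in GMT, matching the example above.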

K2 Spider environment variable: VERITY_K2SPIDER_VdkServiceType

 

K2 Spider has an environment variable called VERITY_K2SPIDER_VdkServiceType that can be set to a combination of service levels, for example:

putenv VERITY_K2SPIDER_VdkServiceType "VdkServiceType_Search | VdkServiceType_Index"

You can use this environment variable to remove the VdkServiceType_DBA service level, which is the root cause of the maxclean. VdkServiceType_Search is always on by default and should be kept on when you define this variable. VdkServiceType_Index must be turned on whenever K2 Spider Indexer runs without the -noindex option. If the -nooptimize option of K2 Spider Indexer is not used, VdkServiceType_Optimize should also be turned on, along with VdkServiceType_DBA. You can remove one or more of these service levels or add new ones with this variable. Exercise caution when using this variable. (94266, 92210)


Limitations:

Support for container documents (ZIP/PST)

 

K2 Spider has been enhanced to support container documents such as ZIP and PST files. Note these limitations (100149):

Child documents are indexed if they meet the same indexing job criteria as the parent document.

Child documents are not parsed for further links, but they are re-indexed if they are modified.

When new child documents are added to a container file, any that meet the job criteria are indexed; any child documents that are removed are also removed from the collection.

If you restart a job that was indexing PST files, some child keys for documents that were being indexed when the job was stopped may be skipped. (101698)

Container document support exists for the HTTP and File System Gateways only.

K2 Spider Indexers Shared Across Jobs Can Take a Long Time to Report State as Finished

 

If K2 Spider Indexers are shared across multiple jobs, the jobs that finish earlier than the others are not reported as Finished immediately after indexing is finished. Those jobs that finish early cannot be reported as finished because threads are still in use for optimization for the other jobs. (100716)

Form-based authentication URLs must match job start points and be fully qualified

 

When you specify URL patterns in StyleSet Editor for forms-based authentication, the URLs must be fully qualified, as in http://hostx.domain.ext , and the host names must match what is specified in the starting points. For example, a URL pattern of http://cairo.thecompany.com for forms-based authentication must be matched by http://cairo.thecompany.com in a starting point. (101686)

K2 Spider collects link information per job only

 

K2 Spider collects link information on a per-job basis. Consolidating link information from multiple jobs is not supported. Because link information such as in-link count, out-link count, and anchor text is used for scoring, this limitation can have some impact on the relevance of search results.

K2 Spider collects links from web repositories only

 

K2 Spider collects link information only for web repositories (that is, HTTP collections). Collecting link information for other repository types is not supported.

Limited refreshing of link information

 

K2 Spider refreshes document link information in two cases only:

When a new document is added to the collection, the link information for that document is added to the collection.

When a document is modified and a new link that points to a new document is found, the link information for both documents is added or refreshed in the collection.

Workaround: Purge and re-index the collection.

Fast recovery for K2 Spider controller

 

When a controller crashes and is brought back up, it recovers all the jobs that were running at the time of the crash but does not automatically restart them.

Workaround: You must explicitly specify the -recover option on the controller command line if you want the jobs to be automatically restarted.

Always use event table when running vspider in persist mode

 

The following configuration is not supported (it causes the duplication of documents):

Running VSpider in persist mode, and

Using ODBC gateway with no event-table definition in the vgwodbc.cfg file


Searching and Viewing

(See also Indexing and viewing limitations for flt_kv with PDF files under Indexing.)

Searching strings that contain periods

 

To support searching for strings that contain periods, configure the collection to ignore sentence tagging when counting word positions (for details, see the comments in style.prm). This behavior is controlled by a collection-level configuration parameter (NOEOS) that is locale-independent. Changing it requires re-indexing the collection. Sentence tokens are still available to VDK summarization and feature extraction. (94144)

The following related comment and definition are from style.prm :

This example enables Word Count word position format but ignores sentence tagging. The word position is bumped upon sentence tokens. However, the sentence breaks may be incorrect; this option ignores sentence tagging during indexing time for word position counting (i.e., word positions will not be bumped upon sentence breaks).

#$define IDX-CONFIG "WCT NOEOS"

Cannot search documentation if K2 uses a third-party application server

 

If K2 has been configured to use a different application server from the Verity Administration Web Server, the document-search application (accessed through the K2 Dashboard Help or through the Start menu shortcut in Windows) does not function. (90778)

Workaround: Deploy a Verity-supplied documentation WAR file to your application server, then modify a link in the documentation-index page to point to the WAR file.

a. Locate the file verity_docs_webapp.war , in the directory installDir\data\docs\webapp .

b. Copy the WAR file to the appropriate directory of your application server. For Tomcat, for example, the directory is TomcatInstall\webapps ; for WebSphere, it is WebSphereInstall\installedApps . See your application-server documentation for details.

c. Locate the Verity documentation index page index.html , in the directory installDir\data\docs .

d. Open the file for editing. Modify the anchor tag associated with the Search the Library link, by changing the href attribute from

href="/verity_docs/webapp/pages/search/basic.jsp"

to

href="/verity_docs_webapp/webapp/pages/search/basic.jsp"

e. Open the file web.xml , in the directory verity_docs_webapp (the uncompressed WAR-file directory beneath your application-server directory).

f. Change the host and port in the following tag pair

<param-name>VERITY_DOC_HOSTPORT</param-name>
<param-value>localhost:9920</param-value>

to the actual host and port used by your K2 Server.
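Steps d and f are plain text substitutions and can be scripted. The following is a minimal sketch under stated assumptions: the function names are hypothetical, the files are handled as text that has already been read into memory, and the host name k2host.example.com is a placeholder for your own K2 Server. Work on a backup copy of each file.

```python
import re

OLD_HREF = 'href="/verity_docs/webapp/pages/search/basic.jsp"'
NEW_HREF = 'href="/verity_docs_webapp/webapp/pages/search/basic.jsp"'

# Matches the VERITY_DOC_HOSTPORT tag pair and captures the text around the value.
HOSTPORT_PATTERN = re.compile(
    r"(<param-name>\s*VERITY_DOC_HOSTPORT\s*</param-name>\s*<param-value>)"
    r"[^<]*(</param-value>)"
)


def update_search_link(index_html_text):
    """Step d: point the Search the Library link at the deployed WAR file."""
    return index_html_text.replace(OLD_HREF, NEW_HREF)


def set_doc_hostport(web_xml_text, hostport):
    """Step f: replace the host and port value, leaving the rest untouched."""
    new_text, count = HOSTPORT_PATTERN.subn(
        lambda m: m.group(1) + hostport + m.group(2), web_xml_text
    )
    if count == 0:
        raise ValueError("VERITY_DOC_HOSTPORT tag pair not found")
    return new_text


sample = (
    "<param-name>VERITY_DOC_HOSTPORT</param-name>\n"
    "<param-value>localhost:9920</param-value>"
)
# k2host.example.com is a placeholder, not a value from the product.
print(set_doc_hostport(sample, "k2host.example.com:9920"))
```

A text substitution is used instead of an XML parser so that the rest of web.xml (declarations, comments, layout) is written back unchanged.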

Sort order for multilanguage locale

 

For searches on multiple collections in different languages created in the multilanguage (uni ) locale, K2 Server and K2 Broker sort results according to the Unicode Collation Algorithm. The language used in the query (as specified with the <LANG> modifier) has no effect on the sorting order.

Moving a collection with relative dockey paths requires moving the repository

 

If you move a Windows File System collection in which the keys use relative paths, you must also move the source documents to maintain their relative location compared to the collection. Otherwise, folder security checking will not work correctly and viewing will not be possible. (90985)

Highlighting in Exchange Gateway attachments

 

To support highlighted viewing of Exchange Gateway attachments when the VdkVgwkey is not in the collection character set, take these steps (94259,90864):

a. Add one entry to the config file vgwmsxch.cfg in order to turn on the appropriate charset conversion. For example:

[/] .... charset=1252 ....

in which the specified charset (1252 in the above example) is the repository charset.

b. Purge and re-index the collection.

Highlighting for PDF documents

 

When viewing PDF documents through sample applications or a test search in the K2 Dashboard, the universal filter invokes flt_kv for viewing, and the results are displayed as HTML. In some cases, highlighting of search terms may be incorrect. An alternative is to view the results as PDF with dynamic highlighting.

Add the following lines to the type statement for PDFs in the style.uni file:

type: "application/pdf"
$ifdef VDKSTREAMMODE_VIEW
/format-filter = "flt_pdf"
/charset = none
$else
/format-filter = "flt_kv"
/charset = utf8
$endif

 

   
  Note   By default, K2 uses dynamic highlighting. Dynamic highlighting performs highlighting on-the-fly without revisiting the word index to calculate highlights.
   

 

(For static highlighting with PDF viewing, use flt_pdf at indexing time.)

Highlighting hit counts may be inaccurate for small documents

 

The hit count in highlighting data may be inaccurate. By default, Verity conducts zone searches of the VdkSummary field when calculating the number of hits in a document. The smaller the number of words in a document, the greater the likelihood that the search term will occur in the summary field as well as the document itself.

For example, create a profile net with a single query for the word cat. Evaluate a buffer document containing the words a cute cat using the rck2 command line tool. If you then request highlighting information, the count of the number of times cat was found will be 2, since it occurs in both the buffer and the VdkSummary field. (85079)
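The arithmetic behind this example can be shown with a toy model. The sketch below is not how Verity computes highlights; it only assumes that hits are counted once in the document text and once more in a summary that, for a very short document, repeats the whole text.

```python
def count_hits(term, body, summary):
    """Count whole-word matches of term in the body plus the summary zone."""
    def occurrences(text):
        return sum(1 for word in text.lower().split() if word == term.lower())

    return occurrences(body) + occurrences(summary)


body = "a cute cat"
summary = body  # assumption: the summary of a tiny document is the document itself
print(count_hits("cat", body, summary))  # 2
```

The longer the document, the less likely it is that a given term also falls inside the summary, which is why the discrepancy mainly affects small documents.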

Double escapes required for special characters with MATCHES operator

 

Searches on collection fields containing document paths do not work when the path contains backslash characters and the MATCHES operator is used with the * wildcard character. Consider the following example:

VdkVgwKey <starts> \\a\\b <and> <not> VdkVgwKey <matches> \\a\\b\\*\\*

This example returns all documents instead of just those in the path \a\b . (71227, 91217)

The = (equals) operator performs an exact literal match. The MATCHES operator performs a wildcard match. The string operand of MATCHES is parsed twice, so any special characters are unescaped twice (the first time by the query parser and the second time by the wildcard or regex matching procedure). (71217, 91217)

Workaround: Because it is unescaped twice, use a double escape (\\ ) when using the backslash (\ ) to escape a special character.
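The effect of the two parsing passes can be illustrated with a simplified model. The sketch below assumes that each pass removes exactly one level of backslash escaping; the real query parser and wildcard matcher may treat other characters specially, so this is an illustration of the principle, not of the K2 implementation.

```python
def unescape_once(text):
    """Remove one level of backslash escaping from text."""
    out = []
    i = 0
    while i < len(text):
        if text[i] == "\\" and i + 1 < len(text):
            out.append(text[i + 1])  # keep the escaped character, drop the backslash
            i += 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


# Single escape: the matcher receives a bare *, which is a wildcard again.
print(unescape_once(r"\*"))    # *

# Double escape: the matcher receives \*, an escaped (literal) asterisk.
print(unescape_once(r"\\*"))   # \*

# A literal backslash needs four backslashes to survive both passes.
print(unescape_once(unescape_once(r"\\\\a")))  # \a
```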

Field searches are slow

 

FIELD operators are slower than others. (91354)

Incorrect highlighting for queries with multiple LANG/ID modifiers

 

For queries that have multiple LANG/ID modifiers, highlighting on results from legacy collections can be incorrect. Use dynamic highlighting for these results; if you do not use dynamic highlighting, you must re-index your 5.5 collections. (95004)

Symbol for ‘unknown character’ is replaceable

 

When a query or other string is translated from one character set to another, such as from UTF8 to 1252, some characters in the input string may have no translation in the output character set. The charset driver kvcs by default replaces such characters with a question mark (? ). Since ? is interpreted as a single-character wildcard by the simple query parser, this can cause unexpected search results. For example, a two-character Chinese query applied to an English/1252 collection will match all two-letter English words, because the converted query will be ?? . (99056)
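The effect can be reproduced outside K2. In the sketch below, Python's own cp1252 codec and the fnmatch module stand in for the kvcs driver and the simple query parser; they are analogies chosen for illustration, not the components K2 uses.

```python
import fnmatch


def convert_to_1252(text):
    """Convert text to code page 1252, replacing unmappable characters with ?."""
    return text.encode("cp1252", errors="replace").decode("cp1252")


query = "\u4e2d\u6587"  # a two-character Chinese query
pattern = convert_to_1252(query)
print(pattern)  # ??

# Interpreted as wildcards, ?? matches every two-letter word.
words = ["an", "of", "cat", "to"]
print([w for w in words if fnmatch.fnmatchcase(w, pattern)])  # ['an', 'of', 'to']
```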

You can replace the default ‘unknown character’ symbol in any of the VDK character-set (.cs ) files (in productDir/common ) that use the kvcs driver. In these files, the kvcs driver config line

driver: "kvcs"

accepts the -replace option followed by a numeric value, like this:

driver: "kvcs -replace asciiChar"

where asciiChar is the decimal code of the ASCII character that you want to use as the ‘unknown character’ symbol. Values of 1 to 127 are accepted; other values are ignored with no error message. The default is the question mark (decimal 63).

While no character is completely immune to misinterpretation by a query parser, the most neutral choices may be these:

 


Name        Character    Decimal value

hashmark    #            35

dot         .            46

tilde       ~            126
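The decimal value for any candidate character can be computed rather than looked up. The following minimal sketch builds the driver line described above; the function name is hypothetical, and the range check mirrors the documented limit of 1 to 127.

```python
def kvcs_driver_line(replacement_char):
    """Build the kvcs driver line for a single replacement character."""
    code = ord(replacement_char)
    if not 1 <= code <= 127:
        # Per the text, kvcs silently ignores values outside this range.
        raise ValueError("kvcs accepts only decimal values 1 to 127")
    return 'driver: "kvcs -replace %d"' % code


for ch in "#.~":
    print(kvcs_driver_line(ch))
# driver: "kvcs -replace 35"
# driver: "kvcs -replace 46"
# driver: "kvcs -replace 126"
```

Raising an error in the sketch is a deliberate choice: because kvcs reports nothing for an out-of-range value, checking before editing the .cs file avoids a setting that has no effect.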



Business Console

Automatically creating a taxonomy from a very large collection is slow.

 

Using a large collection to automatically create a taxonomy can be slow, especially if you are using topic creation. (92739)

Importing an expanded topic might fail.

 

If a topic is extremely large, performing an import of the topic into Business Console can fail. (92984)

The Unused category is published when the Root and subtree are published.

 

Publishing the Root category and its subtree also publishes the Unused category. (92829)

Workaround: To avoid publishing the Unused category, do not perform a recursive publish on the Root category.

Cannot open new taxonomies after associating many collections.

 

If you cannot open new taxonomies after associating many collections, the Index Server might have run out of synchronous threads (each association uses two). (93329)

Workaround: Increase the number of synchronous threads of your Index Server, and restart Business Console. See the Verity Business Console Guide for more information about configuring Business Console.

Incorporate Taxonomy does not work with paths that contain double-byte characters.

 

The Incorporate Taxonomy feature does not incorporate a taxonomy that is located at an operating system path that contains double-byte characters (for example, a path that contains Chinese characters). (94548)

Business Console only works with K2 6.0 systems.

 

Business Console only works with pure 6.0 systems. It might not function correctly if linked to an earlier K2 version when using multiple nodes. (101493)

Copying categories with linked topics expands the links.

 

Copying a resource taxonomy that contains categories with linked topics expands the links rather than retaining the topic links within the subtree. (99258)

Cannot delete a parametric index from the Business Console Administration window.

 

If you create a parametric index from scratch and do not associate it with a collection or taxonomy, you cannot delete the parametric index from the Administration window. (101654)

Promotions do not work when mirrored to multiple hosts.

 

Promotions need to reside on the host running the Business Console server. (101633)

You cannot use the same name for parametric indexes on different hosts.

 

Parametric index mirroring is not fully supported in Business Console. (100797)

Workaround: Use K2 Dashboard to mirror the parametric indexes.

Transport tab available in Business Console client login dialog box.

 

The Transport tab is not intended for general users, but it is available in the 6.0 client login dialog box. (101507)

Populating a parametric index using a taxonomy that has a category with no defined rule assigns all documents to that category.

 

If you populate a parametric index from the Parametric Index module of Business Console, and use a taxonomy that has a category with no rule defined, all the documents in the collection are assigned to this category. (101605)

Error after deleting a reference category.

 

Deleting a reference category produces an error when attempting to select and edit the original category, if the original category has an associated topic. (101546)

Importing a synonym control file fails.

 

List and key identifiers must be lower case in the control file. See Verity Query Language and Topic Guide for information about synonym control file structure.

Workaround: Check the identifiers in the .ctl file to ensure that they do not use upper case, and change them if required. (101488)

Deleted Promotion Properties still display.

 

If all of the promotion links for a promotion set are deleted, the last saved promotion link still displays when the promotion is triggered. If at least one existing promotion link is not deleted, then the deleted properties do not display. (101431)

Linked topics expand during an import.

 

Links are not supported during an import, so linked topics are expanded. This can create very large topics, and can produce an error. (100684)

You can add multiple promotion link properties with the same name.

 

Multiple link properties can be added with the same name and data type in the Promotions wizard. (98243)

Collection associations cannot be created if a K2 Server is stopped.

 

In a K2 Server and K2 Broker environment, where a single K2 Broker manages several K2 Servers, all the servers need to be running for Business Console to function correctly even if the resources being used are not on the stopped server. (92242)


Command-Line Tools

Don’t abort mkpi or mkprf with CTRL-C when used with Notes Gateway

 

When using mkpi or mkprf with collections or knowledge trees whose documents are accessed through a Notes gateway, aborting the session with CTRL-C can sometimes prevent Lotus Notes from shutting down. (76449, 76655)

Workaround: Instead of CTRL-C, let the tool complete its task or use an appropriate operating system method to kill it directly.


Limitations:

Cannot re-attempt full wire encryption with rck2

 

The rck2 command-line tool allows only a single attempt to add the encryption key for full wire encryption. Any attempt beyond the first to add the same key fails. If you need to replace a previously added encryption key, shut down rck2 and restart it. After the restart, you can set the encryption key again. (74114)


Security

Use of Distinguished name on Notes LDAP servers

 

On many LDAP servers, a distinguished name is of the form

uid=userName,ou=people,o=organizationName

in which the keys uid and cn can both be used to specify a user name. On a Notes LDAP server, the distinguished name is of the form

CN=userName,OU=people,O=organizationName

in which the uid key is not used. When configuring the LDAP login module, be sure to use a user-name key (such as uid or cn ) that is valid on the specific LDAP server you are using.

Intermittent login failures with UNIX login module on AIX

 

When you select the UNIX login module on AIX, you may experience intermittent login failures with valid user IDs and passwords. The problem was observed when running AIX fileset bos.rte.libc, level 4.3.3.78. However, this problem was not seen when running bos.rte.libc level 4.3.3.75 or bos.rte.libc level 5.1.0.10 (AIX 5.1).

The problem occurs after a nonexistent user ID attempts to log in. After this, valid credentials may or may not be accepted. Invalid user IDs and passwords always fail. (76482)

Nested groups are not shown for LDAP Schemas 3 and 4

 

When you select the Authentication Server Type of LDAP while installing K2 Platform, you are presented with several LDAP schemas. Nested groups for LDAP schemas 3 and 4 are not shown in the K2 Dashboard’s Modify window. (74179)


Profiler/Profile Nets

Cannot import a Japanese, Chinese, or Korean PI or taxonomy into a profile net with mkprf on HP-UX

 

Because of incompatibilities between Asian locales (japanb , koreab , simpcb , tradcb ) and the multilanguage (uni ) locale, mkprf can crash when attempting this import. (100487)

Workaround: Before importing, disable support for those Asian languages in the uni locale. See the Verity Locales Release Notes V6.0 for instructions.

Profiler environment variables

 

The environment variables VDK_NO_PRF_CASE, VDK_NO_PRF_STEM, and VDK_NO_PRF_SNDX can be used to disable CASE, STEM, and SNDX in the profiler. The environment variables must be set before the prf is opened and before the queries are loaded. With the environment variables set, the profiler turns off case, stem, and soundex variations when matching queries. (93822, 94150, 93889)
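Because the variables must be in place before the profiler starts, one approach is to prepare the environment first and then launch the profiling application with it. The following is a minimal sketch: the function name is hypothetical, and the value "1" is an assumption, since the text does not say which value the variables expect.

```python
import os

PROFILER_SWITCHES = {
    "case": "VDK_NO_PRF_CASE",
    "stem": "VDK_NO_PRF_STEM",
    "soundex": "VDK_NO_PRF_SNDX",
}


def profiler_environment(disable, base_env=None, value="1"):
    """Return a copy of the environment with the chosen variations disabled."""
    env = dict(os.environ if base_env is None else base_env)
    for feature in disable:
        env[PROFILER_SWITCHES[feature]] = value  # "1" is an assumed value
    return env


env = profiler_environment(["case", "stem"], base_env={})
print(sorted(env))  # ['VDK_NO_PRF_CASE', 'VDK_NO_PRF_STEM']
```

The resulting dictionary can be passed as the env argument of subprocess.run when starting the application that opens the prf, so that the variables are set before any queries are loaded.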

Using K2 Profiler to profile gateway (non-indexed) documents

 

To allow K2 Profiler to profile gateway documents (non-collection documents) through a K2 Broker, take these steps (93314):

1. Define the environment variable VERITY_K2BROKER_GW_PROF_ON .

2. If VERITY_K2BROKER_GW_PROF_ON is set (to any value), K2 Broker does not block gateway profiling requests but passes them down to the server, with the following assumptions:

The K2 Servers are correctly configured with the "same/replicated" styleset alias.

The actual dockey is accessible from all K2 Servers using the round-robin load balancing mechanism.

3. If VERITY_K2BROKER_GW_PROF_ON is not set, the current behavior is retained.

VDK profiling of secure documents through Documentum gateway fails

 

Gateways such as Documentum that perform a security check when returning fields cause the VDK Profiler to fail when it attempts to profile a secure document. (101744)

Workaround: Profile only public documents (until this issue is corrected in a soon-to-be-released patch).

Profiling without anonymous access

 

When the logged-in user has credentials on the gateway repository that differ from those of the anonymous user, profiling fails with keys from anonymous collections. Anonymous credentials are not used, irrespective of login. (74283)


Limitations:

User-defined properties not imported into profile net

 

VIC and Business Console produce taxonomies in which categories can include user-defined properties. If you create such a taxonomy in XML format and import it into a profile net using mkprf -importXML , the user-defined properties are not imported. (96097, 95385)


Recommendation Engine

Using mkre -log reset causes K2 Broker error

 

In a Linux environment, executing mkre -log reset -alias b1 can cause a fatal error in K2 Broker when the pingDelay setting is over 3000. (73948)

Workaround: Change the pingDelay setting to something smaller, such as 300 or 3000.


Limitations:

RITypes “doc”, “user” and “query” are reserved

 

In the Recommendation Engine and in the corresponding VAdministration Java OEM APIs, the RITypes “doc”, “user” and “query” are reserved and are treated in a special way.

The RI alias “users” (of type “user”) is used to refer to the host-wide default user index and should not be explicitly created by the administrator.

The type “query” is reserved for transient queries and is used internally by the engine. Administrators should not create any RIs of type “query”.

The “doc” type refers to RIs associated with Collections that already exist on the system.

Further, certain fields are not applicable to these special entity types, and certain defaults are used.

Workaround: Do not create an RI of type “query”. In rcadmin , do not define any RI of type “doc” or an RI with type “user” and alias “users”. To build an RI index for a collection using rcadmin , just enable Recommendation for that collection.

RI alias should not match existing collection alias

 

Do not create any RIs with aliases that match any collections on the system. Undesirable effects will occur. For example, if a collection exists with alias “c1”, do not create an RI of type “book” with the same alias “c1”.
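The restrictions in the two preceding limitations can be collected into a single pre-check before an RI is defined. The following is a minimal sketch, not a K2 tool: the function name is hypothetical, and the comparison of types and aliases is assumed to be case-sensitive.

```python
def check_ri_definition(ri_type, alias, collection_aliases):
    """Return a list of problems with a proposed RI definition (empty if none)."""
    problems = []
    if ri_type == "query":
        problems.append('type "query" is reserved for transient queries')
    if ri_type == "doc":
        problems.append(
            'type "doc" is reserved; enable Recommendation for the collection instead'
        )
    if ri_type == "user" and alias == "users":
        problems.append('alias "users" is the host-wide default user index')
    if alias in collection_aliases:
        problems.append('alias "%s" matches an existing collection alias' % alias)
    return problems


# Example from the text: a collection already uses the alias c1.
print(check_ri_definition("book", "c1", {"c1"}))
print(check_ri_definition("book", "b1", {"c1"}))  # []
```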

Specify only valid collections for recommendation requests to a K2 Server

 

If a client connects directly to a K2 Server for a Recommendation Engine request, only collections which are actually attached to that server should be specified in the request. If an invalid collection is specified, the request will fail with an error. (73976)


Component Framework

Repository authentication components in K2

 

If you ignore a repository for authentication and this repository is required for authenticating an index, you can cause an infinite loop. Specifically, the search page throws a “Need authentication” exception and the error handler component redirects to the repository authentication page. However, if the only repository for authentication is ignored, the authentication page has no repositories to draw, acts as if the server sent no repositories to authenticate, and redirects to the search page. The search page resubmits the query, causing the cycle to repeat infinitely.

Workaround: Either do not use the ignore attribute with repositories requiring authentication or disable the collection from being searched. (101397)

.NET requestValidation behavior in K2

 

When you submit queries containing <and> or <or> operators in a K2 .NET application, the following exception may be thrown:

[HttpRequestValidationException (0x80004005): A potentially dangerous Request.QueryString value was detected from the client (advancedSearch:SearchTerm="...ollection <and> k2 <and> secur...").]

Workaround: Use a different query parser so that you can rewrite the query to avoid the use of angle brackets.

 

   
  Note   Removing the angle bracket ( <> ) detection for IIS is not recommended because it creates a security hole; for more information, see http://www.asp.net/faq/RequestValidation.aspx .
   

 

(99905, 99920)

K2 .NET sample applications

 

On Windows 2003, if you install K2 and immediately try to run a K2 sample application, such as search or recipes , an exception may be thrown.

Workaround: Restart your Windows 2003 machine. (99806)


Other

Cannot load uni-locale collection and Asian-locale collection simultaneously

 

One K2 Server or indexing application cannot load a collection using the multilanguage (uni) locale and a collection using an Asian locale (japanb , koreab , simpcb , tradcb ) into the same process. (97590, 98187, 100603)

Workaround: Disable support for those Asian languages in the uni locale. See the Verity Locales Release Notes V6.0.

For HTTP gateway, name of dynamically retrieved mod-date field has changed

 

If you are retrieving the document-modification date dynamically from the gateway, note that the name of the gateway field has changed, from modified to modifiedDate .

Verity recommends that you do not read modification date dynamically, because it can seriously affect performance. Instead, read it from the existing default collection field Modified . (101096)

Verity Control Module watched services polling interval

 

To set the VCM watched services polling interval, create the environment variable VERITY_WS_POLLING . If its value is less than 30, 30 is used as the default. Otherwise, its value is used as the polling interval. (93930, 92287)
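The rule amounts to a lower bound of 30 on the configured value. The following minimal sketch models it; the function name is hypothetical, and the behavior when the variable is not set at all is not stated in the text, so the sketch returns None in that case.

```python
MIN_POLLING_INTERVAL = 30


def effective_polling_interval(env):
    """Return the interval implied by VERITY_WS_POLLING, or None if it is unset."""
    raw = env.get("VERITY_WS_POLLING")
    if raw is None:
        return None  # unset: the resulting interval is not documented here
    return max(MIN_POLLING_INTERVAL, int(raw))


print(effective_polling_interval({"VERITY_WS_POLLING": "10"}))   # 30
print(effective_polling_interval({"VERITY_WS_POLLING": "120"}))  # 120
```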

Serializing VDK transactions

 

To serialize VDK transactions, use the style.plc option /serialize = 1 . In some circumstances, this option can cause a performance problem, because indexing concurrency (for any one collection) is lost. (86674, 92901, 93136)

Here is a sample of style.plc using this option:

$control: 1
policy:
{
mode: default
/inherit = Generic #inherits from Generic policy mode
/serialize = 1
}
$$

VParametric methods to configure passage-based summaries

 

Two methods have been added to the ResultSet object to set passage-based summary (PBS) properties (93500):

public void setSummaryMaxPassageBytes(int pbsMaxPsgBytes);

Sets the maximum number of bytes of passages that will be returned for a passage-based summary (PBS). Note that this number includes the bytes required for highlight tags if highlighting was requested. If the highlight tags are long, the actual text (excluding the highlight tags) returned for the passages may be less.

public void setSummaryMaxPassageCount(int pbsMaxPsgCount);

Sets the maximum number of passages to retrieve with PBS.