Information Repositories

An enterprise might store millions of files on dozens of file servers. This same enterprise might also store many thousands of Web pages on multiple Web servers. In each case, files are stored in their native formats on a single type of platform or storage medium, or accessed through a specific connection protocol. The file server represents one type of information repository, while the Web server represents another.

Figure 2-1    Examples of information repositories

K2 includes indexing technology that gives you simultaneous access to many types of repositories. You can classify, search for, retrieve, and compare information in locations as diverse as a record in an Oracle database and an XML page on an IIS Web server.

Indexed repository data is the basis for many of Verity’s information-management features, including text search, parametric selection, topic sets, and knowledge trees.
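
The following minimal Python sketch makes the distinction between text search and parametric selection concrete. It is not Verity's implementation. The indexed records, their IDs, and the `source` and `year` fields are all hypothetical. Text search runs against a simple inverted index, and parametric selection then filters those hits on exact field values.

```python
import re
from collections import defaultdict

# Hypothetical indexed records: body text plus separately stored fields.
DOCS = {
    "db:tickets/101": {"text": "Printer jams on the third floor",
                       "fields": {"source": "odbc", "year": 2001}},
    "web:intranet/faq": {"text": "How to clear a printer jam",
                         "fields": {"source": "http", "year": 2000}},
    "fs:/specs/plan.doc": {"text": "Floor plan for the new office",
                           "fields": {"source": "file", "year": 2001}},
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Inverted index: map each word to the IDs of documents containing it."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        for word in tokenize(doc["text"]):
            index[word].add(doc_id)
    return index


def search(index, docs, query, **field_filters):
    """Text search (every query word must occur), then parametric selection
    (every keyword argument must equal the document's field value)."""
    words = tokenize(query)
    if not words:
        return []
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(
        doc_id for doc_id in hits
        if all(docs[doc_id]["fields"].get(k) == v
               for k, v in field_filters.items())
    )


index = build_index(DOCS)
print(search(index, DOCS, "printer"))             # ['db:tickets/101', 'web:intranet/faq']
print(search(index, DOCS, "printer", year=2001))  # ['db:tickets/101']
```

Because fields are stored apart from the body text, a parametric condition such as `year=2001` narrows the results without needing to appear anywhere in the words being searched.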

Verity Gateways

Verity applications access repositories through gateways, driver modules that provide interfaces to specific repository types. Gateways unify your business information by making it all available for indexing and retrieval, regardless of where and how it is stored.
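
The gateway idea (one driver per repository type, all behind a common interface) can be sketched in Python as shown below. This is an illustration only, not the Verity gateway API. The `Gateway` base class, the two toy gateways, and the `collect` function are hypothetical names. The point is that the indexer consumes a single, uniform stream of documents no matter which repository each one came from.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Dict, Iterator, List


@dataclass
class Document:
    doc_id: str                                   # repository-qualified key
    text: str                                     # raw content for filtering/indexing
    fields: Dict[str, str] = field(default_factory=dict)


class Gateway(ABC):
    """Hypothetical driver interface: one subclass per repository type."""

    @abstractmethod
    def documents(self) -> Iterator[Document]:
        """Yield every document this repository exposes."""


class InMemoryFileGateway(Gateway):
    """Toy stand-in for a file-system repository (path -> content)."""

    def __init__(self, files: Dict[str, str]):
        self.files = files

    def documents(self) -> Iterator[Document]:
        for path, content in self.files.items():
            yield Document(f"file:{path}", content, {"path": path})


class RowGateway(Gateway):
    """Toy stand-in for a database table (list of row dicts)."""

    def __init__(self, table: str, rows: List[Dict[str, str]]):
        self.table, self.rows = table, rows

    def documents(self) -> Iterator[Document]:
        for row in self.rows:
            yield Document(f"db:{self.table}/{row['id']}", row["body"],
                           {"table": self.table})


def collect(gateways: List[Gateway]) -> List[Document]:
    """The indexer sees one stream, regardless of where the data lives."""
    return [doc for gw in gateways for doc in gw.documents()]


docs = collect([
    InMemoryFileGateway({"/share/readme.txt": "Quarterly sales summary"}),
    RowGateway("helpdesk", [{"id": "7", "body": "Reset VPN password"}]),
])
print([d.doc_id for d in docs])  # ['file:/share/readme.txt', 'db:helpdesk/7']
```

Adding support for a new repository type then means writing one more `Gateway` subclass. Nothing downstream of `collect` has to change.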

The gateways supplied by Verity with the K2 product include the following:

Verity ODBC Gateway. Provides access to ODBC-compliant databases. The gateway can combine data from any number of databases and tables, such as help desk, sales tracking, or marketing information, so users can view it alongside other enterprise information resources.

Verity Lotus Notes Gateway. Provides connectivity to Lotus Notes repositories. The gateway allows remote or local access, supports encrypted secure Internet passwords, and permits users to search and retrieve information in all views, as well as attachments, OLE objects and encrypted fields.

Verity Documentum Gateway. Provides connectivity to the Documentum eContent Server. The gateway allows users to search and retrieve Documentum content, metadata, and repository-managed properties. It supports hierarchies and relationships, such as simple and virtual documents, and it handles annotations.

Verity Exchange Gateway. Provides secure access to documents in Microsoft Exchange public folders through an Exchange MAPI client. The gateway allows authorized users to search and retrieve information in e-mail attachments and public folders.

Verity HTTP Gateway. Provides simultaneous access to information on multiple Internet and intranet Web sites. The gateway allows exploration of all CGI-compliant Web servers. It supports proxy and firewall authentication, HTTPS/SSL, and various login methods.

Verity File System Gateway. Provides access to information on UNIX and Microsoft NTFS file systems. It supports local access, as well as remote mounted, mapped, or UNC access.
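
The ODBC Gateway entry above describes combining data from several databases and tables so that the data can be viewed alongside other enterprise content. The sketch below illustrates that idea under some assumptions. It uses Python's built-in `sqlite3` module as a stand-in for an ODBC connection, and the `customers`/`tickets` schema and the `row_documents` function are hypothetical. Each joined row becomes one document: its free-text column is treated as content, and its other columns become fields.

```python
import sqlite3

# Hypothetical help-desk schema in an in-memory database (ODBC stand-in).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE tickets (id INTEGER PRIMARY KEY, customer_id INTEGER,
                          summary TEXT, status TEXT);
    INSERT INTO customers VALUES (1, 'Acme Corp'), (2, 'Globex');
    INSERT INTO tickets VALUES (10, 1, 'Invoice totals wrong', 'open'),
                               (11, 2, 'Cannot log in', 'closed');
""")


def row_documents(conn):
    """Join two tables and yield one indexable document per ticket row."""
    query = """
        SELECT t.id, t.summary, t.status, c.name
        FROM tickets AS t JOIN customers AS c ON c.id = t.customer_id
        ORDER BY t.id
    """
    for ticket_id, summary, status, customer in conn.execute(query):
        yield {
            "doc_id": f"helpdesk/tickets/{ticket_id}",
            "text": summary,                                      # full-text content
            "fields": {"status": status, "customer": customer},  # field data
        }


docs = list(row_documents(conn))
for d in docs:
    print(d["doc_id"], d["fields"])
```

Once the data is expressed as documents with fields, it can be indexed and searched in the same way as files or Web pages from other gateways.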

Note: Additional types of gateways may be available through Verity Professional Services.

If your business stores information in repositories other than those described here, Verity offers the Gateway Development Kit (GDK), a set of APIs for writing custom gateways for unique repositories. With these APIs you can build new gateways, or modify existing ones, to support features specific to your repositories. See the Verity Gateway Developer’s Kit Programming Reference for more information.
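
A custom gateway usually plugs into some dispatch mechanism that maps a repository to the driver that can read it. The sketch below shows one way to do this in Python, using a registry keyed by URL scheme. It does not reproduce the GDK, whose actual APIs are documented in the Programming Reference. The `memo` scheme, the `register` decorator, and the `open_repository` function are all hypothetical.

```python
from typing import Callable, Dict, Iterator, List, Tuple
from urllib.parse import urlparse

# Hypothetical registry: repository URL scheme -> gateway factory.
GatewayFactory = Callable[[str], Iterator[Tuple[str, str]]]
REGISTRY: Dict[str, GatewayFactory] = {}


def register(scheme: str):
    """Decorator that plugs a custom gateway into the registry."""
    def wrap(factory: GatewayFactory) -> GatewayFactory:
        REGISTRY[scheme] = factory
        return factory
    return wrap


@register("memo")
def memo_gateway(location: str) -> Iterator[Tuple[str, str]]:
    """Custom gateway for a made-up 'memo' repository; yields (doc_id, text)."""
    fake_store = {"board": ["Budget approved", "Hiring freeze lifted"]}
    for i, text in enumerate(fake_store.get(location, [])):
        yield f"memo://{location}/{i}", text


def open_repository(url: str) -> List[Tuple[str, str]]:
    """Pick the registered gateway for the URL's scheme and read everything."""
    parsed = urlparse(url)
    try:
        factory = REGISTRY[parsed.scheme]
    except KeyError:
        raise ValueError(f"no gateway registered for {parsed.scheme!r}") from None
    return list(factory(parsed.netloc))


print(open_repository("memo://board"))
# [('memo://board/0', 'Budget approved'), ('memo://board/1', 'Hiring freeze lifted')]
```

The registry keeps the indexer ignorant of individual repository types. Supporting a new type means registering one more factory.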

Verity Document Filters

Accessing a file in a repository is only the first step toward indexing its information. Repositories store documents in hundreds of native file formats, so Verity also supplies document filters: driver modules that can detect, open, and extract the text from files in hundreds of the most popular file types. The supplied filters include the following:

Word processing files, such as Microsoft Word, Lotus Word Pro and Corel WordPerfect

Spreadsheet documents, such as Lotus 1-2-3, Corel QuattroPro and Microsoft Excel

Presentation files, such as Corel Presentations, Microsoft PowerPoint, and Lotus Freelance

Adobe Acrobat PDF

HTML and XML

Entity-extraction filter
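
Filters typically detect a file's type from its leading bytes rather than from its file name. The sketch below checks a few well-known signatures:

- `%PDF-` for PDF.
- The OLE2 compound-file header used by legacy Word, Excel, and PowerPoint files.
- The ZIP header used by the Office Open XML formats (.docx, .xlsx, .pptx).

If none of these match, it falls back to a simple text check for XML and HTML. This is a simplified illustration, not the way Verity's filters work, and the format labels it returns are hypothetical. Telling Word from Excel inside an OLE2 or ZIP container would require inspecting the container's contents.

```python
# Leading byte signatures ("magic numbers") for some of the formats above.
SIGNATURES = [
    (b"%PDF-", "pdf"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "ole2"),  # legacy .doc/.xls/.ppt
    (b"PK\x03\x04", "zip"),                          # .docx/.xlsx/.pptx packages
]


def detect_format(data: bytes) -> str:
    """Return a coarse format label based on the file's first bytes."""
    for magic, name in SIGNATURES:
        if data.startswith(magic):
            return name
    # Markup formats have no binary signature; inspect the leading text.
    head = data[:512].lstrip().lower()
    if head.startswith(b"<?xml"):
        return "xml"
    if head.startswith(b"<!doctype html") or head.startswith(b"<html"):
        return "html"
    return "unknown"


print(detect_format(b"%PDF-1.7 ..."))                  # pdf
print(detect_format(b"<!DOCTYPE html><html></html>"))  # html
```

A filter framework would then dispatch on the returned label to the matching text extractor.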

Document filters not only extract the textual content of documents for indexing but can also extract field information, such as the title or author of a document. (Field information from a file is indexed separately from its text content.) The entity-extraction filter goes further: it extracts entities, such as names or addresses, from a document’s body text and saves them in collection fields.
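
The split between body text and fields, and the basic idea of entity extraction, can be sketched with Python's standard `html.parser` module. In this illustration:

- The `<title>` element becomes a `title` field.
- The remaining text becomes the indexed content.
- A regular expression that finds e-mail addresses stands in for entity extraction.

The field names and the regex are assumptions made for illustration. Extracting real entities such as personal names or postal addresses takes far more than a pattern match.

```python
import re
from html.parser import HTMLParser


class TextAndTitle(HTMLParser):
    """Collects <title> text as a field and all other text as body content."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = []
        self.body = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        (self.title if self.in_title else self.body).append(data)


# Assumed stand-in for entity extraction: match e-mail addresses only.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def filter_html(html: str) -> dict:
    """Return body text for indexing plus separately stored fields."""
    parser = TextAndTitle()
    parser.feed(html)
    parser.close()
    body = " ".join(" ".join(parser.body).split())  # collapse whitespace
    return {
        "text": body,
        "fields": {
            "title": " ".join("".join(parser.title).split()),
            "email": EMAIL.findall(body),           # extracted entities
        },
    }


page = ("<html><head><title>Support Contacts</title></head>"
        "<body><p>Write to help@example.com or call the desk.</p></body></html>")
print(filter_html(page))
```

Keeping the title and the extracted entities in fields, rather than only in the body text, is what allows them to be queried separately, as the paragraph above describes.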