Multiple Primary Storage Formats

The lowest level of the Intelligent Content Services architecture consists of unstructured, semi-structured, and structured data belonging to a given organization or enterprise.

The typical enterprise has documents residing in a variety of content sources and databases. K2 provides access to documents in the following types of information repositories:

File systems (Windows and UNIX)
Web sites
Lotus Notes servers
Email servers (Microsoft Exchange)
Databases (ODBC)
Document-management systems (Documentum)

Documents can exist in many file formats, ranging from Microsoft Office and Lotus SmartSuite documents to Adobe PDF. K2 reads documents in over 200 such formats and extracts both the structured and the unstructured content from them. Documents may be written in many languages with different character sets (sets of numeric codes that represent the characters of a language) and may have attachments in multiple MIME types and languages. K2 supports close to 100 languages and popular character sets, including the Unicode standard.
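To make the idea of a character set concrete, the following minimal Python sketch shows the same character mapped to different numeric codes under different character sets, and one simple way a reader of mixed-source documents might fall back between them. It is an illustration only and does not represent how K2 detects or converts character sets; the function name decode_with_fallback and the choice of candidate character sets are assumptions made for this example.

```python
# Illustration only: the same text maps to different numeric codes
# depending on the character set used to encode it.
text = "é"

latin1_bytes = text.encode("latin-1")   # single-byte Western European set
utf8_bytes = text.encode("utf-8")       # Unicode, variable-length encoding
utf16_bytes = text.encode("utf-16-be")  # Unicode, 16-bit units, big-endian


def decode_with_fallback(raw, candidates=("utf-8", "latin-1")):
    """Hypothetical helper: try each candidate character set in order
    and return the text from the first one that decodes cleanly."""
    for name in candidates:
        try:
            return raw.decode(name)
        except UnicodeDecodeError:
            continue
    raise ValueError("no candidate character set could decode the input")


if __name__ == "__main__":
    for label, raw in (("latin-1", latin1_bytes),
                       ("utf-8", utf8_bytes),
                       ("utf-16-be", utf16_bytes)):
        print(label, list(raw))
    print(decode_with_fallback(latin1_bytes))
```

Because latin-1 assigns a character to every possible byte value, it acts as a catch-all when placed last in the candidate list; a real system would need stronger evidence (declared encodings, language detection) before trusting such a fallback.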

K2 makes documents from this wide variety of sources searchable by processing them into Verity collections, which are index structures that support rapid, flexible searching over very large numbers of documents.
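The general idea of an index structure that makes documents searchable can be sketched with a toy inverted index in Python. This is not the Verity collection format, which is not described here; the function names build_index and search and the sample document names are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from typing import Dict, Set


def build_index(documents: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each lowercased word to the set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return dict(index)


def search(index: Dict[str, Set[str]], query: str) -> Set[str]:
    """Return the IDs of documents that contain every word in the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    # Copy the first posting set so the index itself is never modified.
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical documents drawn from different kinds of repositories.
docs = {
    "report.doc": "quarterly sales report",
    "memo.txt": "sales memo for the team",
    "notes.nsf": "team meeting notes",
}
index = build_index(docs)

if __name__ == "__main__":
    print(sorted(search(index, "sales")))
    print(sorted(search(index, "sales team")))
```

Looking up a word in the index costs roughly the same regardless of how many documents were processed, which is why index structures of this general kind scale to large document sets far better than scanning each document at query time.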