File System Connector can index documents into Vertica, so that you can run queries on structured fields (document metadata).
Depending on the metadata contained in your documents, you could investigate the average age of documents in a repository. You might want to answer questions such as: How much time has passed since the documents were last updated? How many files are regularly updated? Does this represent a small proportion of the total number of documents? Who are the most active users?
Tip: In most cases, HPE recommends sending documents to a Connector Framework Server (CFS). CFS extracts metadata and content from any files that the connector has retrieved, and can manipulate and enrich documents before they are indexed. CFS can also insert documents into more than one index, for example IDOL Server and a Vertica database. For information about sending documents to CFS, see Send Data to Connector Framework Server.
When documents are indexed into Vertica, File System Connector adds a timestamp that contains the time when the document was indexed. The field is named VERTICA_INDEXER_TIMESTAMP and the timestamp is in the format
When a document in a data repository is modified, File System Connector adds a new record to the database with a new timestamp. All of the fields are populated with the latest data. The record describing the older version of the document is not deleted. You can create a projection to make sure your queries only return the latest record for a document.
When File System Connector detects that a document has been deleted from a repository, the connector inserts a new record into the database. The record contains only the DREREFERENCE and the field VERTICA_INDEXER_DELETED set to
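Because modified and deleted documents both produce additional records, a query usually has to reduce the table to the newest record for each document and then discard documents whose newest record marks a deletion. The following Python sketch illustrates that logic on rows that have already been fetched. It is not the connector's or Vertica's own mechanism. The column names DREREFERENCE, VERTICA_INDEXER_TIMESTAMP and VERTICA_INDEXER_DELETED come from the text above. The sample rows, the `owner` column, the `latest_records` helper, the use of `datetime` values, the use of `True` as the deleted marker, and the presence of a timestamp on the deletion record are assumptions made for illustration.

```python
from datetime import datetime

# Hypothetical rows, as they might be returned by a query on the table.
# The values, the "owner" column, and True as the deleted marker are assumptions.
rows = [
    {"DREREFERENCE": "a.txt", "owner": "alice",
     "VERTICA_INDEXER_TIMESTAMP": datetime(2016, 1, 1, 9, 0, 0)},
    {"DREREFERENCE": "a.txt", "owner": "bob",
     "VERTICA_INDEXER_TIMESTAMP": datetime(2016, 2, 1, 9, 0, 0)},
    {"DREREFERENCE": "b.txt", "owner": "carol",
     "VERTICA_INDEXER_TIMESTAMP": datetime(2016, 1, 15, 9, 0, 0)},
    # Deletion record: assumed to carry only the reference, the marker,
    # and the indexing timestamp.
    {"DREREFERENCE": "b.txt", "VERTICA_INDEXER_DELETED": True,
     "VERTICA_INDEXER_TIMESTAMP": datetime(2016, 3, 1, 9, 0, 0)},
]


def latest_records(rows):
    """Return the newest record per DREREFERENCE, skipping deleted documents."""
    newest = {}
    for row in rows:
        ref = row["DREREFERENCE"]
        seen = newest.get(ref)
        # Keep the row with the most recent indexing timestamp.
        if seen is None or row["VERTICA_INDEXER_TIMESTAMP"] > seen["VERTICA_INDEXER_TIMESTAMP"]:
            newest[ref] = row
    # Drop documents whose most recent record is a deletion marker.
    return {ref: row for ref, row in newest.items()
            if not row.get("VERTICA_INDEXER_DELETED")}


current = latest_records(rows)
```

With the sample rows, `current` holds a single entry for `a.txt` (the later of its two records), and `b.txt` is excluded because its newest record is a deletion. In a real deployment the same reduction would normally be done inside the database, for example through the projection mentioned above.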
Documents that are created by connectors can have multiple levels of fields, and field attributes. A database table has a flat structure, so this information is indexed into Vertica as follows:
A field named my_field with a sub-field named subfield results in two columns.
A field named my_field, with an attribute named my_attribute, results in two columns: my_field holding the field value and my_field.my_attribute holding the attribute value.
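The flattening described above can be sketched in Python as follows. This is an illustration only, not the connector's implementation. The dotted column name for attributes (my_field.my_attribute) comes from the text. The `flatten` helper, the dictionary layout used to represent a field, and the assumption that sub-fields use the same dotted naming as attributes are all hypothetical.

```python
def flatten(field_name, field):
    """Flatten one document field into {column_name: value} pairs.

    `field` is a hypothetical dict with optional keys:
      "value"      - the field's own value
      "attributes" - mapping of attribute name to attribute value
      "subfields"  - mapping of sub-field name to a nested field dict
    """
    columns = {}
    if "value" in field:
        columns[field_name] = field["value"]
    # Attributes become columns named <field>.<attribute>, as described above.
    for attr_name, attr_value in field.get("attributes", {}).items():
        columns[f"{field_name}.{attr_name}"] = attr_value
    # Assumption: sub-fields follow the same dotted naming as attributes.
    for sub_name, sub_field in field.get("subfields", {}).items():
        columns.update(flatten(f"{field_name}.{sub_name}", sub_field))
    return columns


with_attribute = flatten(
    "my_field", {"value": "report", "attributes": {"my_attribute": "en"}}
)
with_subfield = flatten(
    "my_field", {"value": "report", "subfields": {"subfield": {"value": "draft"}}}
)
```

Here `with_attribute` contains the two columns `my_field` and `my_field.my_attribute`, matching the attribute case described above. `with_subfield` shows the assumed naming for a sub-field.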