Customize Entity Extraction

The Passage Extractor entity extraction file provides HPE Answer Server with a map that specifies which components to use to extract entities for each question classification.

When you ask a question, Passage Extractor classifies it by using the question classifier, and then finds matching documents and document sections in the data store. It uses IDOL Content highlighting to find the most relevant passages, which it uses as candidate answers. Passage Extractor then uses Eduction and an Agentstore component to find entities in the candidate answers that match the question classification.

For example, if you have an Agent entity database with the names of plants, and you send a question that Passage Extractor classifies as plants, Passage Extractor uses the Agentstore component to find the relevant plant entities in the candidate answer text.

By default, if you configure an Agentstore component, Passage Extractor uses the Agentstore for the classifications HUM:gr, all LOC classifications, ENTY:plant, ENTY:animal, and ENTY:lang. It uses Eduction and Agentstore for the HUM:ind question classification, and Eduction only for all other question classifications.
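
The default behavior described above can be sketched as a small lookup. This is only an illustration of the documented defaults for the case where an Agentstore component is configured, not code from HPE Answer Server. The function name default_components and the example classification LOC:city are assumptions made for this sketch.

```python
# Classifications that use only the Agentstore component by default
# (in addition to every classification that starts with "LOC:").
AGENTSTORE_ONLY = {"HUM:gr", "ENTY:plant", "ENTY:animal", "ENTY:lang"}


def default_components(question_class):
    """Return the set of components used by default for a classification.

    Models only the case where an Agentstore component is configured.
    """
    if question_class == "HUM:ind":
        return {"eduction", "agentstore"}
    if question_class in AGENTSTORE_ONLY or question_class.startswith("LOC:"):
        return {"agentstore"}
    # Every other question classification uses Eduction only.
    return {"eduction"}


# "LOC:city" is a hypothetical LOC classification used for illustration.
for name in ("HUM:ind", "LOC:city", "NUM:date"):
    print(name, sorted(default_components(name)))
```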

You can use the entity extraction file to modify these defaults, for example, if you create additional Agent entity files for your data.

NOTE:

You do not need to specify an entity type to extract for every question classification. If a question classification does not appear in the entity extraction file, Passage Extractor does not attempt to extract entities. This might be appropriate for many question classifications (for example, if the appropriate answer is a long description, there might not be a corresponding entity).
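
As a minimal sketch of the rule in this note, the following Python looks up a question classification in a parsed entity extraction file and returns nothing when the classification is absent. The helper name find_entry, the trimmed file content, and the DESC:def classification are hypothetical and are used only for illustration.

```python
import json

# Hypothetical, trimmed-down entity extraction file content.
ENTITY_FILE = json.loads("""
{
   "entity_map": [
      {"entity_type": "NUM:date", "eduction": {"entities": ["num/date", "date/*"]}}
   ]
}
""")


def find_entry(config, question_class):
    """Return the entity_map entry for a classification, or None if absent."""
    for entry in config.get("entity_map", []):
        if entry.get("entity_type") == question_class:
            return entry
    # No entry: in this case no entity extraction is attempted.
    return None


print(find_entry(ENTITY_FILE, "NUM:date"))
print(find_entry(ENTITY_FILE, "DESC:def"))
```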

Passage Extractor also attempts to corroborate the candidate answers by comparing how often particular entities occur. In most cases, this improves the quality of the answers.

In some cases, corroboration might not be appropriate. For example, if valid answers include very common words (such as one and two), the words might occur in multiple places, and be falsely corroborated as a likely answer. For this reason, corroboration is turned off for the NUM:count entity type in the default entity extraction JSON file.

You might also want to turn corroboration off if likely answers occur only once in your data set. In these cases you can modify the entity extraction JSON file to turn corroboration off for particular entities.
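
To make the idea of corroboration concrete, the following Python sketch counts how many candidate answers contain each entity. It is a simplified assumption about frequency-based corroboration, not the algorithm that Passage Extractor actually uses. The function name corroborate and the sample data are invented for this illustration.

```python
from collections import Counter


def corroborate(candidates):
    """Rank entities by the number of candidate answers that contain them.

    candidates is a list of lists, one list of entity strings per
    candidate answer. Matching is case-insensitive in this sketch.
    """
    counts = Counter()
    for entities in candidates:
        # Count each entity at most once per candidate answer.
        counts.update({entity.lower() for entity in entities})
    return counts.most_common()


# Invented sample data: three candidate answers and their extracted entities.
ranked = corroborate([["Paris"], ["paris", "London"], ["Paris"]])
print(ranked)
```

A scheme like this favors any value that appears often, which is why very common words such as one and two can be falsely corroborated, and why a data set in which each answer occurs only once gains nothing from it.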

The Entity Extraction File Format

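The entity extraction file contains the question classifications, which match the values that you use in the classifier training file. For each question classification, it also contains at least one of the following: a list of Eduction entities to extract, or a list of Agentstore databases that contain relevant entities.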

When there is an Agentstore database, you can also specify Agent FieldText to use in a query to the Agentstore entity database for the question classification.

The entity extraction file is a JSON file, with the following structure:

{
   "entity_map": [
      {
         "entity_type": "QuestionClass1", 
         "agentstore": {
            "databases": [ListOfAgentstoreDatabases],
            "fieldtext": "FieldTextRestriction"
         }, 
         "eduction": {"entities": [ListOfEductionEntities]},
         "corroborate": Boolean
      },
      {
         "entity_type": "QuestionClass2", 
         "agentstore": {
            "databases": [ListOfAgentstoreDatabases],
            "fieldtext": "FieldTextRestriction"
         }, 
         "eduction": {"entities": [ListOfEductionEntities]},
         "corroborate": Boolean
      }
      ...
   ]
}

where,

QuestionClassN is the name of the question classification (for example, HUM:ind).
ListOfEductionEntities is an array of relevant Eduction entities.
ListOfAgentstoreDatabases is an array of databases in the Agentstore component that contain relevant entities.
FieldTextRestriction is an IDOL FieldText expression to use to restrict the Agent query in the specified database.

You must specify at least one of the eduction or agentstore properties for each question classification. If you specify the agentstore property, the databases property is required, but fieldtext is optional.

If you do not want to use entity extraction for a particular question classification, do not include it in the entity extraction file.

The corroborate property is optional. The default value is true.
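
The rules above can be checked before you deploy a modified file. The following Python is a minimal sketch of such a check, assuming the file has already been parsed with json.load. The function name validate_entity_map is hypothetical, and treating an empty databases array as missing is an assumption of this sketch.

```python
def validate_entity_map(config):
    """Return a list of problems found in a parsed entity extraction file."""
    problems = []
    for index, entry in enumerate(config.get("entity_map", [])):
        label = entry.get("entity_type", "entry {}".format(index))
        if "entity_type" not in entry:
            problems.append("{}: missing entity_type".format(label))
        # At least one of eduction or agentstore must be present.
        if "eduction" not in entry and "agentstore" not in entry:
            problems.append("{}: needs eduction or agentstore".format(label))
        # databases is required inside agentstore; fieldtext is optional.
        # Assumption: an empty databases array is treated as missing.
        agentstore = entry.get("agentstore")
        if agentstore is not None and not agentstore.get("databases"):
            problems.append("{}: agentstore requires databases".format(label))
        # corroborate is optional (default true) but must be a Boolean.
        if not isinstance(entry.get("corroborate", True), bool):
            problems.append("{}: corroborate must be true or false".format(label))
    return problems


example = {
    "entity_map": [
        {"entity_type": "ENTY:plant", "agentstore": {"fieldtext": "X"}},
        {"entity_type": "NUM:count"},
    ]
}
for problem in validate_entity_map(example):
    print(problem)
```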

The following example gives some of the question classifications in the default entity extraction file:

{
   "entity_map": [
      {
         "entity_type": "HUM:ind", 
         "agentstore": {"databases": ["people"]}, 
         "eduction": {"entities": ["hum/ind"]}
      },
      {
         "entity_type": "NUM:date", 
         "eduction": {"entities": ["num/date", "date/*"]}
      },
      {
         "entity_type": "ENTY:plant", 
         "agentstore": {
            "databases": ["organisms"], 
            "fieldtext": "MATCH{PLANTAE,VIRIDIPLANTAE}:ORGANISMS_KINGDOM"
         }
      },
      {
         "entity_type": "NUM:count", 
         "eduction": {"entities": ["num/count"]}, 
         "corroborate": false
      },

...

Modify the Entity Extraction File

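The default entity extraction file, included in your HPE Answer Server installation, is appropriate for most installations. However, you might need to modify the file, for example if you create additional Agent entity files for your data, or if you want to turn corroboration off for particular entities.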

To update the entity extraction file

  1. Open the entity extraction JSON file in a text editor.

  2. Make the necessary modifications. You can add, delete, or update any of the details for the question classifications.

    To turn off corroboration, add the corroborate property in a particular group and set it to false. For example:

    {
       "entity_type": "NUM:count", 
       "eduction": {"entities": ["num/count"]}, 
       "corroborate": false
    }
  3. Save and close the entity extraction file.

  4. Restart HPE Answer Server for your changes to take effect.

    NOTE:

    If you add new question classifications that do not exist in the classifier training file, you must also update the classifier training file and retrain the classifier. See Train Passage Extractor Classifiers.
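
If you prefer to script the edit rather than use a text editor, the modification step might look like the following Python sketch. The helper name set_corroborate and the inline file content are hypothetical. In practice you would read and write the file named by the EntityExtractionFile parameter, and you would still need to restart HPE Answer Server afterwards.

```python
import json


def set_corroborate(config, question_class, enabled):
    """Set corroborate for one classification; return True if it was found."""
    for entry in config.get("entity_map", []):
        if entry.get("entity_type") == question_class:
            entry["corroborate"] = enabled
            return True
    return False


# Hypothetical file content, inlined so that the sketch is self-contained.
config = json.loads(
    '{"entity_map": [{"entity_type": "NUM:count",'
    ' "eduction": {"entities": ["num/count"]}}]}'
)

# Turn corroboration off for NUM:count, then print the updated JSON.
set_corroborate(config, "NUM:count", False)
print(json.dumps(config, indent=3))
```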

Use a Different Entity Extraction File

You can use the EntityExtractionFile configuration parameter to configure the location of the entity extraction file. If you want to move or rename the entity extraction file, or use a different file for any reason, you must modify the value of this parameter to specify the name and location of the new file.

