Glossary

compiled grammar
A grammar file that has been compiled from XML into ECR format with the Eduction command-line tool, edktool, so that Eduction can use it directly. See also standard grammar, grammar, and user grammar.
dictionary
An XML file that provides a vocabulary for an entity. Eduction uses the dictionary to scan a document and extract the defined entities that match the search pattern. See also extraction.
ECR
A proprietary format for grammar files that Eduction can read efficiently at runtime. You can write grammar files in XML and then use the Eduction command-line tool, edktool, to compile them into ECR format. See also compiled grammar.
Extensible Markup Language (XML)
extraction
Eduction extracts entities from documents based on the rules that you define in your dictionaries and grammars. It then returns an XML list of matches, or adds the matches to the source document as new fields. See also grammar and dictionary.
Linguistic Sentiment Analysis (LSA)
A tool, based on Eduction and sentiment grammar files, that you can use to identify positive, negative, or neutral sentiment in text.
Luhn algorithm
A formula used to validate identification numbers, such as credit card numbers and social security numbers. The formula detects errors by performing mathematical operations on the digits of the number to calculate a value that must agree with the final (check) digit of the number.
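Credit card numbers, for example, are validated with the Luhn algorithm. The following is a minimal Python sketch of that check, for illustration only; it is not how Eduction implements validation internally, and the function name luhn_valid is hypothetical.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn check."""
    # Ignore spaces and other separators; keep only the digits.
    digits = [int(c) for c in number if c.isdigit()]
    if len(digits) < 2:
        return False
    total = 0
    # Walk from the rightmost digit (the check digit) leftwards.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            # Double every second digit; if the result exceeds 9, subtract 9.
            d *= 2
            if d > 9:
                d -= 9
        total += d
    # The number is valid when the total is a multiple of 10.
    return total % 10 == 0
```

For example, luhn_valid("4111 1111 1111 1111") returns True, while changing the final digit to 2 makes it return False.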
pattern
A description of the entity that you want to extract, which Eduction uses to produce a list of matches. A pattern can explicitly list what Eduction must look for (for example, a list of names), or it can specify in general terms what a match looks like (for example, the shape of a phone number). See also entities, extraction, and grammar.
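To illustrate the two kinds of pattern, the following Python sketch pairs an explicit list of names with a general regular expression for phone numbers. It uses Python's re module, not Eduction grammar syntax, and the names NAMES, PHONE, and find_matches and the US-style phone format are assumptions made for the example.

```python
import re

# Explicit pattern: a fixed list of names to look for (hypothetical list).
NAMES = ["Ada Lovelace", "Alan Turing"]

# General pattern: any string shaped like a US-style phone number, such as
# 555-123-4567. Illustrative only; this is not Eduction grammar syntax.
PHONE = re.compile(r"\b\d{3}-\d{3}-\d{4}\b")


def find_matches(text: str) -> list[str]:
    """Return names from NAMES, then phone-number-shaped strings, found in text."""
    matches = [name for name in NAMES if name in text]
    matches += PHONE.findall(text)
    return matches
```

Calling find_matches("Call Alan Turing on 555-123-4567.") returns ["Alan Turing", "555-123-4567"]: the name matches because it is listed explicitly, and the number matches because it has the general shape that the expression describes.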
recall
The percentage of all the matches that an extraction should return (that is, every correct match in the text) that it actually returns. See also precision.
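As a worked example of recall, the following Python sketch compares the set of matches an extraction returned with the set it should have returned. It also computes precision using its usual definition (the percentage of returned matches that are correct) for contrast. The function names and sample data are hypothetical.

```python
def recall_percent(returned: set[str], expected: set[str]) -> float:
    """Percentage of the expected (correct) matches that were actually returned."""
    if not expected:
        raise ValueError("expected must contain at least one match")
    return 100.0 * len(returned & expected) / len(expected)


def precision_percent(returned: set[str], expected: set[str]) -> float:
    """Percentage of the returned matches that are correct (usual definition)."""
    if not returned:
        raise ValueError("returned must contain at least one match")
    return 100.0 * len(returned & expected) / len(returned)


# Hypothetical data: four correct matches exist; the extraction found two of
# them plus one false match.
expected = {"Paris", "London", "Berlin", "Rome"}
returned = {"Paris", "London", "Lyon"}
```

Here recall_percent(returned, expected) is 50.0 (two of the four correct matches were found), and precision_percent(returned, expected) is about 66.7 (two of the three returned matches are correct).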
shallow parsing
A form of sentence analysis that identifies the constituent parts of a sentence, such as noun phrases, but not their internal structure or their role in the sentence. See also chunking.
standard grammar
Eduction includes a set of standard grammars that allow you to extract the most common entities, such as person, place, or company names, legal terms, addresses, dates, and times. See also compiled grammar, grammar, and user grammar.
token
IDOL Server stores document text as a series of tokens. Generally, a token is a word, but it can also be another string of characters, such as a phone number or an e-mail address.
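The following Python sketch shows the general idea of a tokenizer that keeps an e-mail address or a hyphenated phone number together as a single token instead of splitting it into words. The rules are hypothetical and deliberately simple; they are not the tokenization rules that IDOL Server uses.

```python
import re

# Hypothetical token rules, tried in order at each position:
# an e-mail address, a run of digits joined by hyphens, or an ordinary word.
# This is NOT IDOL Server's tokenizer.
TOKEN = re.compile(
    r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"  # e-mail address, e.g. jo@example.com
    r"|\d+(?:-\d+)+"                 # hyphenated digits, e.g. 555-123-4567
    r"|\w+"                          # ordinary word
)


def tokenize(text: str) -> list[str]:
    """Split text into tokens according to the hypothetical rules above."""
    return TOKEN.findall(text)
```

For example, tokenize("Email jo@example.com or call 555-123-4567") returns ["Email", "jo@example.com", "or", "call", "555-123-4567"], so the address and the number each remain one token.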
user grammar
An XML file that you create, using the Eduction grammar language, to describe the entities and patterns that Eduction must locate in text. See also Extensible Markup Language (XML).