About Eduction

Eduction is a tool that you can use to identify and extract an entity (a word, phrase, or block of information) from text, based on a pattern that you define. The pattern can be a dictionary of names, such as people or places (see Figure 1), or it can describe what the text looks like without listing every value explicitly, for example, a telephone number or a time (see Figure 2). Entities are defined in grammar files.
Eduction includes standard grammar files, which allow you to quickly and easily extract commonly sought entities, such as social security numbers, names, telephone numbers, addresses, and so on.
For example, if you apply a grammar file that contains rules for identifying telephone numbers to an e-mail chain, the output consists of a list of all the telephone numbers that Eduction identified in the e-mail chain. In some cases, Eduction might also identify the type of telephone number (for example, mobile or landline), and where it occurs in the document.
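The telephone number example above can be pictured with a minimal Python sketch. It is an analogy only: it uses Python's standard `re` module, not Eduction or its grammar format, and the pattern, the `extract_phone_numbers` function, and the sample e-mail text are all hypothetical. It shows the general idea of applying a rule to text and getting back each match together with its position.

```python
import re

# Hypothetical rule, not an Eduction grammar: matches numbers shaped like 555-010-1234.
PHONE_PATTERN = re.compile(r"\b\d{3}-\d{3}-\d{4}\b")


def extract_phone_numbers(text):
    """Return every match with its entity label, matched text, and character offset."""
    return [
        {"entity": "phone", "text": m.group(), "offset": m.start()}
        for m in PHONE_PATTERN.finditer(text)
    ]


# Illustrative input standing in for an e-mail chain.
email_chain = (
    "From: Ana\nCall me on 555-010-1234.\n"
    "From: Ben\nMy desk line is 555-010-9876.\n"
)

matches = extract_phone_numbers(email_chain)
for match in matches:
    print(match)
```

A real grammar would usually cover many more number formats and might also label each match as, for example, mobile or landline; this sketch deliberately handles a single format.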
 
NOTE Figure 1 and Figure 2 provide simplified versions of grammar files for example purposes, not actual source code.
Figure 1 Simplified grammar file containing a dictionary of place names
Figure 2 Simplified grammar file containing patterns to match times of day
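As a rough analogue of the two kinds of pattern described above, the following Python sketch contrasts a dictionary of explicit values with a pattern that describes the shape of the text. It does not use Eduction grammar syntax; the place names, the time pattern, and the `extract` function are assumptions chosen for illustration.

```python
import re

# Dictionary-style entity: an explicit list of values (illustrative place names).
PLACES = ["London", "New York", "Paris"]
PLACE_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(place) for place in PLACES) + r")\b"
)

# Pattern-style entity: describes what a 24-hour time looks like (for example 9:30
# or 23:05) instead of listing every possible value.
TIME_PATTERN = re.compile(r"\b(?:[01]?\d|2[0-3]):[0-5]\d\b")


def extract(text):
    """Return (entity, matched text, offset) tuples in document order."""
    results = [("place", m.group(), m.start()) for m in PLACE_PATTERN.finditer(text)]
    results += [("time", m.group(), m.start()) for m in TIME_PATTERN.finditer(text)]
    return sorted(results, key=lambda item: item[2])


text = "The train leaves Paris at 9:30 and reaches London at 11:47."
found = extract(text)
for entity, value, offset in found:
    print(entity, value, offset)
```

The dictionary approach suits closed sets of known values, while the pattern approach suits values that follow a predictable format but are too numerous to list.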
Eduction also allows you to extend existing grammars, and to author new ones.