Extract
This option extracts entities from a document and writes the results to a file or to STDOUT. You can use it to test your grammars before running the Eduction IndexTasks module.
 
-l <licensefile>
The file containing a valid license key for Eduction. By default, edktool looks for the licensekey.dat file in the working directory.
-i <inputfile>
The file to perform entity extraction on. The input file can be an IDOL IDX file, an IDOL XML file, or a plain text file. It must be UTF-8 encoded.
NOTE: If the input file is an XML file, the configuration file (in either IDOL configuration file format or XML format) must contain entries for the DocumentDelimiterCSVs parameter. If this parameter is not set correctly, Eduction might not find any documents in the XML file. For information on how to set this parameter, refer to the Eduction Parameters.
-c <configfile>
A configuration file that controls the extraction. The configuration file can be either an IDOL Server-style .CFG configuration file or an XML configuration file. See Configuration Files for Eduction Settings.
You can specify one or more grammar files and one or more entities in place of a configuration file. If you specify a configuration file, it overrides the grammar and entity parameters.
-g <grammarfile>
If a grammar file is provided but no entities are specified with -e, all entities in the grammar file are extracted.
-e <entity>
-o <outputfile>
The file to write the extraction results to. This file is optional. Its content depends on the type of input file provided and whether the -m option is used.
If the input file is an IDOL file (IDX or XML) and the -m option is not used, the output file is identical to the input file, except that the matched entities are appended to each document as additional fields. This behavior is the same as Eduction running in IDOL.
If the input file is a plain text file or an IDOL file with the -m option, the output file is an XML file containing the matched entities.
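The output rules above can be summarized as a small decision function. This is only an illustrative sketch in Python, not part of edktool: the function name describe_output and the input_kind labels are hypothetical, and the returned strings simply restate the two cases described for the -o option.

```python
def describe_output(input_kind, m_option=False):
    """Summarize what the -o output file contains, per the rules above.

    input_kind is a hypothetical label: "idx" (IDOL IDX), "xml" (IDOL XML),
    or "text" (plain text). m_option is True when -m is on the command line.
    """
    if input_kind not in ("idx", "xml", "text"):
        raise ValueError("input_kind must be 'idx', 'xml', or 'text'")
    if input_kind in ("idx", "xml") and not m_option:
        # IDOL input without -m: the input is reproduced with extra fields.
        return "copy of the input with matched entities appended as fields"
    # Plain text input, or IDOL input with -m.
    return "XML file containing the matched entities"


if __name__ == "__main__":
    for kind, m in (("idx", False), ("idx", True), ("text", False)):
        print(kind, m, "->", describe_output(kind, m))
```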
The extract option requires an input file (in IDOL IDX, IDOL XML, or plain text format) and either a configuration file or a grammar file. If you do not provide a configuration file, edktool searches the input file for the entities that you specify from the specified grammar files (or for all entities in those grammar files, if you specify none). For example, in the simplest command line:
C:\>edktool e -i myData.txt -g grammar1.ecr,grammar2.ecr
edktool is invoked with no configuration file. It uses the command-line arguments to process the data file myData.txt with the grammar files grammar1.ecr and grammar2.ecr. Because no entities are specified, Eduction identifies all the entities in the two grammar files and matches on them. The output is sent to the console in XML format. It identifies the matches in the data file and uses the entity names to generate the names of the fields that contain the matched data. Assuming myData.txt is a plain text file, the entire body of the file is searched for matches.
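If you call edktool from a script, the rules above can be captured in a small helper that assembles the argument list. The following Python sketch is an assumption-laden illustration rather than a supported interface: build_extract_command is a hypothetical name, only the flags described in this section (-i, -l, -c, -g, -e, -o) are used, grammar files are joined with commas as in the example command line, and only a single entity is passed because the syntax for several entities is not described here.

```python
import shutil
import subprocess


def build_extract_command(input_file, grammars=(), entity=None, config_file=None,
                          output_file=None, license_file=None, exe="edktool"):
    """Assemble an argument list for the extract option (hypothetical helper)."""
    if not config_file and not grammars:
        raise ValueError("extract needs a configuration file or a grammar file")
    cmd = [exe, "e", "-i", input_file]
    if license_file:
        cmd += ["-l", license_file]
    if config_file:
        # A configuration file overrides -g and -e, so they are left out.
        cmd += ["-c", config_file]
    else:
        # Comma-separated grammar list, as in the example command line.
        cmd += ["-g", ",".join(grammars)]
        if entity:
            cmd += ["-e", entity]
    if output_file:
        # Without -o, the results go to STDOUT.
        cmd += ["-o", output_file]
    return cmd


if __name__ == "__main__":
    cmd = build_extract_command("myData.txt",
                                grammars=["grammar1.ecr", "grammar2.ecr"])
    print(" ".join(cmd))
    # Run the tool only if an edktool executable is actually on the PATH.
    if shutil.which(cmd[0]):
        subprocess.run(cmd)
```

Called with myData.txt and the two grammar files, the helper reproduces the arguments of the example command line. Dropping -g and -e when a configuration file is supplied mirrors the override rule, so the resulting command never carries arguments that edktool would ignore.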