Regular Expressions
This section describes the regular expressions syntax that Eduction supports.
The engine’s parser interprets regular expressions almost identically to UNIX regular expression syntax. It also supports some extensions for matching substrings.
Operators
Table 1 lists the base regular expression operators available in the Eduction engine and the pattern that each operator matches.
 
Table 1 Operators  
Quantifiers
 
Table 2 Quantifiers  
{n,m}
Match at least n times, but no more than m times.
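The bounded quantifier described above can be tried out with a minimal sketch. It uses Python's `re` module purely as a stand-in, on the assumption that it treats `{n,m}` the same way; it is not the Eduction engine. The pattern and the helper name `match_whole` are hypothetical and chosen only for illustration.

```python
import re

# Python's re module is a stand-in here; it is not the Eduction engine.
# {2,4} means: at least 2 repetitions, but no more than 4.
BOUNDED_DIGITS = re.compile(r"[0-9]{2,4}")


def match_whole(text: str) -> bool:
    """Return True when the entire text consists of 2 to 4 digits."""
    return BOUNDED_DIGITS.fullmatch(text) is not None


if __name__ == "__main__":
    for sample in ["7", "42", "2024", "12345"]:
        print(sample, match_whole(sample))
```

Using `fullmatch` keeps the bounds visible: a search in a longer digit run would still find a 4-digit substring, which hides the upper limit.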
Metacharacters
 
Table 3 Metacharacters  
Match end of string. Never match at line breaks; only match at the end of the final buffer of text submitted for matching.
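The difference between matching at the end of the string and matching at line breaks can be sketched as follows. This is an analogy in Python's `re` module, not Eduction itself: Python's `\Z` is assumed to be the closest counterpart, and the sketch does not model the engine's buffering of submitted text. The helper `match_offsets` is a hypothetical name.

```python
import re

TEXT = "first line\nsecond line"

# In Python, \Z matches only at the very end of the string.
END_OF_STRING = re.compile(r"line\Z")
# With re.MULTILINE, $ also matches just before each line break.
END_OF_LINE = re.compile(r"line$", re.MULTILINE)


def match_offsets(pattern: re.Pattern, text: str) -> list:
    """Return the start offset of every match of pattern in text."""
    return [m.start() for m in pattern.finditer(text)]


if __name__ == "__main__":
    print(match_offsets(END_OF_STRING, TEXT))  # only the final "line"
    print(match_offsets(END_OF_LINE, TEXT))    # both occurrences of "line"
```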
Extensions
 
Table 4 Extensions  
(?A:entity)
Copying an entity improves pattern execution speed, but increases compilation time and memory usage. Copying is recommended unless the copied entity is large and is copied multiple times.
(?A^entity)
Referencing an entity minimizes the size and memory usage of the grammar, but decreases performance. The performance impact can vary from unnoticeable to significant, depending on the size and structure of the grammar.
(?A!expr)
Match the expression expr but exclude its output. Designates an expression that helps identify an entity, but is not part of it.
For example, if the grammar is:
   <grammar name="person">
      <entity name="age" type="public">
         <pattern>(?A!Age:\s)[1-9][0-9]?</pattern>
      </entity>
   </grammar>
and the input text is:
   Name: Simon. Age: 32. Address. 12 Fifth Street, Las Vegas.
then the text 32 is returned, but 12 is ignored because it does not have the prefix “Age:”, which is matched but excluded from the output.
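The same behavior can be approximated outside Eduction for quick experimentation. The sketch below assumes that a Python `re` lookbehind is a reasonable analogue of `(?A!expr)`; the two are not identical, because a lookbehind checks the prefix without consuming it. The helper name `extract_ages` is hypothetical.

```python
import re

# Analogy only: a lookbehind requires the "Age: " prefix but keeps it
# out of the returned match. This is Python's re, not the Eduction engine.
PREFIXED_AGE = re.compile(r"(?<=Age:\s)[1-9][0-9]?")

TEXT = "Name: Simon. Age: 32. Address. 12 Fifth Street, Las Vegas."


def extract_ages(text: str) -> list:
    """Return every one- or two-digit number directly preceded by 'Age: '."""
    return PREFIXED_AGE.findall(text)


if __name__ == "__main__":
    print(extract_ages(TEXT))
```

The street number 12 is not returned because it is not preceded by the required prefix.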
(?A=component:expr)
Define a component within an entity’s definition. A component is a named part of an entity.
For example, the following grammar defines areacode and main as components:
   <grammar name="number">
      <entity name="phone" type="public">
         <pattern>(?A=areacode:[0-9]{3})-(?A=main:[0-9]{3}-[0-9]{4})</pattern>
      </entity>
   </grammar>
If the input text is:
   The phone number is 408-555-1342.
and the configuration contains the following settings:
   <OutputSimpleMatchInfo>false</OutputSimpleMatchInfo>
   <EnableComponents>true</EnableComponents>
then the output displays the areacode value 408 and the main value 555-1342 separately.
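Components behave much like named groups in other regular expression dialects. As a rough sketch, assuming Python's `(?P<name>...)` syntax as the analogue of `(?A=component:expr)`, the phone pattern above can be reproduced as follows. The helper name `extract_components` is hypothetical, and the output format of the Eduction engine itself is not modeled.

```python
import re

# Analogy only: named groups stand in for Eduction components.
PHONE = re.compile(r"(?P<areacode>[0-9]{3})-(?P<main>[0-9]{3}-[0-9]{4})")

TEXT = "The phone number is 408-555-1342."


def extract_components(text: str) -> dict:
    """Return the named parts of the first phone number found, or {}."""
    match = PHONE.search(text)
    return match.groupdict() if match else {}


if __name__ == "__main__":
    print(extract_components(TEXT))
```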
Token Properties
 
Table 5 Token Properties  
(?A:{properties})
Match a token that satisfies the listed properties. Specify the properties as a comma-separated list of one or more of the following: