Jo Ebergen, Charles Molnar, Radu Negulescu, Huub Schols,Send comments to rsproull@east.sun.com
Bob Sproull (editor), Jan Tijmen Udding, Tom Verhoeff
An ad hoc meeting of several members of the research community working on techniques for designing and analyzing asynchronous systems felt it would be a good idea to define a simple text file format for exchanging descriptions of finite automata and similar structures. This document describes the format, whose name is pronounced "Andover IF", "Andover" in honor of the town in which the meeting was held (Andover, Mass.) and "IF" to stand variously and ambiguously for "intermediate form" or "interchange format."
The basic objectives of the design are:
0. Be simple.
1. Use only ASCII text, so that AND/IF files can be exchanged easily, embedded in electronic mail messages, etc.
2. Be parsable easily by computers and (just barely) readable by humans. While AND/IF files may be prepared by hand with a text editor, they will more likely be generated from another, more convenient notation.
3. Be extensible, to accommodate experimental uses without the need for widespread changes to software embedded within numerous tools. In particular, the structures encoded by AND/IF can be given non-standard interpretations.
(
", or a
right parenthesis ")
". Define a "whitespace" character to be any of
the characters: space, tab, carriage return, line feed, form feed, or
vertical tab. A comment is any sequence of characters on the same
line starting with a "%
" character and ending with an end-of-line. An
"identifier" is a nonempty sequence of characters, none of which is a
whitespace character or "(
" or ")
" or "%
".
Identifiers will be used to tag different parts of the file, to name
states, to name symbols in the alphabet of the finite automaton, and
for other purposes explained below. Identifiers are not case
sensitive (i.e., NAME is equal to name and Name). However, programs
that read AND/IF files are encouraged to preserve the case of
identifiers as they appear in the file, so that presentations of the
symbols closely match the AND/IF description. By convention,
underscore (_
) is used to simulate spaces within identifiers. A
"number" is an identifier that can be parsed as a non-negative integer
in standard decimal notation.
If the input stream has been separated into tokens up to a given character, the next token is the longest string of characters that could constitute a token. After tokens are identified by lexical scanning, whitespace characters and comments are discarded.
A special lexical convention permits finding the beginning of an AND/IF specification that is embedded within a larger text file. The first 11 characters of the AND/IF text are a "herald" that appears on a single line (no leading white space), as follows:
(AND/IF_1.0
This convention establishes a way for a parser to skip over extraneous
material (such as email headers or other comments) until it detects the
herald that signals the beginning of the text that should be parsed
according to the rules specified below. If new versions of the AND/IF
format are developed, the first 8 characters of the herald will remain
invariant, but a new version number will be substituted. A parser
willing to deal with arbitrary versions should therefore match only the
first 8 characters of the herald.
Vertical bar (|
) separates syntactic alternatives. Kleene star (*
)
indicates zero or more repetitions of the preceding non-terminal. If
a non-terminal appears on the left-hand side of more than one
production, each right-hand side is interpreted as a syntactic
alternative for the non-terminal.
The AND/IF specification can be parsed by a simple recursive-descent parser.
AND/IF uses a convention in which the specification is a well-formed list structure. The first item in each list is an identifier ("type") that determines the meaning of the rest of the items in the list. This document defines a number of standard types. If the parser does not know how to interpret a type, it can easily skip over the rest of the list labeled by that type.
You may introduce private types for experimental purposes; in this way,
you can experiment with enhancements to AND/IF files that may later be
accepted into the standard. To avoid name clashes on experimental
types, the prefix of a type you define should be your last name,
followed by a slash (/
). Thus "verhoeff/box
" might be a type introduced
by Tom Verhoeff. Places where private types may appear in the format
are identified by the symbol <<private_type>>.
<AND/IF_specification> ::= ( AND/IF_1.0 <description>* )Each description contains a type and any number of clauses. (Note: Despite appearances, this BNF rule is consistent with the lexical convention specified above that requires the first 8 characters of an AND/IF file to be "
(AND/IF_
". BNF rules are formatted to show clearly
the separate tokens in them, hence there is whitespace between the two
tokens "(
" and "AND/IF_1.0
" in the line above. However, the lexical rules
do not require whitespace between a "(
" and a subsequent identifier. In
the particular case of the first two tokens in the file, the special
lexical convention forbids whitespace to appear.)
<description> ::= ( NFA <nfa_clause>* ) <nfa_clause> ::= <nfa_symbols> | <nfa_states> | <nfa_transitions> | <nfa_options>An NFA is described by a number of clauses. It is a context condition that an NFA description must contain at least one SYMBOLS clause, one STATES clause, and one TRANSITIONS clause. Note that there may be more than one clause of each type. The OPTIONS clause is optional.
<nfa_symbols> ::= ( SYMBOLS <symbol_description>* ) <symbol_description> ::= <<symbol_name>> | ( <<symbol_name>> <symbol_property>* ) <symbol_property> ::= INPUT | OUTPUT | INOUT | EPSILON | <<private_type>>The SYMBOLS clauses together specify the alphabet of the NFA by giving an identifier for each symbol in the alphabet; it may also specify a symbol to be used to identify "epsilon transitions." Optionally, a symbol may have one or more properties. Each symbol must be listed only once.
The properties INPUT, OUTPUT, and INOUT determine whether the symbol corresponds to a signal in the machine that is an input, output, or both. The EPSILON property identifies the single symbol (if any) that labels "epsilon transitions" in the NFA. The epsilon symbol is *not* a member of the alphabet, but the symbol can be used to label transitions.
<nfa_states> ::= ( STATES <state_description>* ) <state_description> ::= <<state_name>> | ( <<state_name>> <state_property>* ) <state_property> ::= INITIAL | FINAL | TRANSIENT | DEMANDING | BOX | BOTTOM | TOP | <<private_type>>The STATES clauses together specify the set of states of an NFA by giving an identifier for each state. The identifiers may be numbers but need not be. Optionally, a state may have one or more properties. Each state must be listed only once. States are permitted to be named by identifiers specified in a SYMBOLS clause, but this is not recommended.
The interpretations for the optional state properties are:
<nfa_transitions> ::= ( TRANSITIONS <transition>* ) <transition> ::= ( <<from_state>> <<to_state>> <<symbol_name>> <transition_property>* ) <transition_property> ::= <<private_type>>The TRANSITIONS clauses together specify the set of transitions of the NFA. Each transition must be labeled by a symbol. Each state and each symbol in a transition must have been specified in a preceding STATES and SYMBOLS clause. Although it is not customary to label transitions with properties, we have provided syntax for optionally doing so.
The following clauses are optional.
<nfa_options> ::= ( NAME <<identifier_or_list>>* )This optional clause gives the NFA a printable name. It is recommended that each NFA be given a name.
<nfa_options> ::= ( NOTE <<identifier_or_list>>* )This optional clause lets you associate notes with an NFA, such as what tool generated it, what problem it is solving, etc.
<nfa_options> ::= ( INTERPRETATION <<private_type>> )The standard interpretation of an NFA is that given by Hopcroft and Ullman. However, the AND/IF format can be used to encode descriptions that have the same structure as an NFA but a different interpretation. For example, a transition labeled OTHER might be used to specify ways that the NFA interacts with other automata in a composition rather than a transition that consumes (or emits) a symbol named OTHER. While this machine is structurally equivalent to an NFA, it has a quite different interpretation. If the description has a non-standard interpretation, the author is encouraged to flag this with the INTERPRETATION option, using a private name for the interpretation.
<nfa_options> ::= <private_list> <private_list> ::= ( <<private_type>> <identifier_or_list>* ) <identifier_or_list> ::= <<any_identifier>> | <private_list>This optional clause lets you add clauses to the definition of an NFA.
<description> ::= <private_list>
(AND/IF_1.0 (NFA (NOTE Created by hand) (NAME Join module with inputs A and B, and output C.) (SYMBOLS (A INPUT) (B input) (C output)) (STATES (0 INITIAL) 1 2 3) (TRANSITIONS (0 1 A)) (Transitions (1 3 B)) (TRANSITIONS (0 2 B)) (TRANSITIONS (2 3 A)) (TRANSITIONS (3 0 C)) ))Here is an equivalent form that uses only one TRANSITIONS clause:
(AND/IF_1.0 (NFA (NAME Join module with inputs A and B, and output C.) (SYMBOLS (A INPUT) (B input) (C output)) (STATES (0 INITIAL) 1 2 3) (TRANSITIONS (0 1 A) (1 3 B) (0 2 B) (2 3 A) (3 0 C)) ))Here is an NFA that makes use of epsilon transitions (Hopcroft and Ullman, Fig. 2.8, p. 24):
(AND/IF_1.0 (NFA (NAME Hopcroft and Ullman Figure 2.8) (SYMBOLS 0 1 2 (e EPSILON)) (STATES (Q0 INITIAL) Q1 (Q2 FINAL)) (TRANSITIONS (Q0 Q0 0) (Q0 Q1 e) (Q1 Q1 1) (Q1 Q2 e) (Q2 Q2 2)) ))Here is an NFA with a single initial state and no transitions.
(AND/IF_1.0 (NFA (NAME A single-state NFA with no transitions) (SYMBOLS ) (STATES (Q INITIAL)) (TRANSITIONS ) ))The empty list of symbols and transitions must be present, since each NFA description must have at least one SYMBOLS, STATES, and TRANSITIONS clause.
<AND/IF_specification> ::= ( AND/IF_1.0 <description>* ) <description> ::= ( NFA <nfa_clause>* ) | <private_list> <nfa_clause> ::= <nfa_symbols> | <nfa_states> | <nfa_transitions> | <nfa_options> <nfa_options> ::= ( NAME <identifier_or_list>* ) | ( NOTE <identifier_or_list>* ) | ( INTERPRETATION <<private_type>> ) | <private_list> <nfa_symbols> ::= ( SYMBOLS <symbol_description>* ) <symbol_description> ::= <<symbol_name>> | ( <<symbol_name>> <symbol_property>* ) <symbol_property> ::= INPUT | OUTPUT | INOUT | EPSILON | <<private_type>> <nfa_states> ::= ( STATES <state_description>* ) <state_description> ::= <<state_name>> | ( <<state_name>> <state_property>* ) <state_property> ::= INITIAL | FINAL | TRANSIENT | DEMANDING | BOX | BOTTOM | TOP | <<private_type>> <nfa_transitions> ::= ( TRANSITIONS <transition>* ) <transition> ::= ( <<from_state>> <<to_state>> <<symbol_name>> <transition_property>* ) <transition_property> ::= <<private_type>> <private_list> ::= ( <<private_type>> <identifier_or_list>* ) <identifier_or_list> ::= <<any_identifier>> | <private_list>