AND/IF 1.0 -- a file format for exchanging finite automata descriptions

Contributors:

Jo Ebergen, Charles Molnar, Radu Negulescu, Huub Schols,
Bob Sproull (editor), Jan Tijmen Udding, Tom Verhoeff

An ad hoc meeting of several members of the research community working on techniques for designing and analyzing asynchronous systems felt it would be a good idea to define a simple text file format for exchanging descriptions of finite automata and similar structures. This document describes the format, whose name is pronounced "Andover IF", "Andover" in honor of the town in which the meeting was held (Andover, Mass.) and "IF" to stand variously and ambiguously for "intermediate form" or "interchange format."

The basic objectives of the design are:

0. Be simple.

1. Use only ASCII text, so that AND/IF files can be exchanged easily, embedded in electronic mail messages, etc.

2. Be parsable easily by computers and (just barely) readable by humans. While AND/IF files may be prepared by hand with a text editor, they will more likely be generated from another, more convenient notation.

3. Be extensible, to accommodate experimental uses without the need for widespread changes to software embedded within numerous tools. In particular, the structures encoded by AND/IF can be given non-standard interpretations.

I. Lexical conventions

Lexical processing decodes the AND/IF text file into a sequence of tokens. A token is either an identifier, a left parenthesis "(", or a right parenthesis ")". Define a "whitespace" character to be any of the characters: space, tab, carriage return, line feed, form feed, or vertical tab. A comment is any sequence of characters on the same line starting with a "%" character and ending with an end-of-line. An "identifier" is a nonempty sequence of characters, none of which is a whitespace character or "(" or ")" or "%".

Identifiers will be used to tag different parts of the file, to name states, to name symbols in the alphabet of the finite automaton, and for other purposes explained below. Identifiers are not case sensitive (i.e., NAME is equal to name and Name). However, programs that read AND/IF files are encouraged to preserve the case of identifiers as they appear in the file, so that presentations of the symbols closely match the AND/IF description. By convention, underscore (_) is used to simulate spaces within identifiers. A "number" is an identifier that can be parsed as a non-negative integer in standard decimal notation.

If the input stream has been separated into tokens up to a given character, the next token is the longest string of characters that could constitute a token. After tokens are identified by lexical scanning, whitespace characters and comments are discarded.

A special lexical convention permits finding the beginning of an AND/IF specification that is embedded within a larger text file. The first 11 characters of the AND/IF text are a "herald" that appears on a single line (no leading white space), as follows:

(AND/IF_1.0

This convention establishes a way for a parser to skip over extraneous material (such as email headers or other comments) until it detects the herald that signals the beginning of the text that should be parsed according to the rules specified below. If new versions of the AND/IF format are developed, the first 8 characters of the herald will remain invariant, but a new version number will be substituted. A parser willing to deal with arbitrary versions should therefore match only the first 8 characters of the herald.

II. Syntactic conventions

The description of the AND/IF syntax uses a simplified form of BNF together with informally described context conditions. Any string of the form <...> is a non-terminal symbol. Any string of the form <<...>> stands for an identifier token whose interpretation is given by the text within <<...>>. Any string of characters and punctuation stands for the corresponding identifier token. Parentheses, namely ( and ), stand for the corresponding parenthesis tokens.

Vertical bar (|) separates syntactic alternatives. Kleene star (*) indicates zero or more repetitions of the preceding non-terminal. If a non-terminal appears on the left-hand side of more than one production, each right-hand side is interpreted as a syntactic alternative for the non-terminal.

The AND/IF specification can be parsed by a simple recursive-descent parser.

AND/IF uses a convention in which the specification is a well-formed list structure. The first item in each list is an identifier ("type") that determines the meaning of the rest of the items in the list. This document defines a number of standard types. If the parser does not know how to interpret a type, it can easily skip over the rest of the list labeled by that type.

You may introduce private types for experimental purposes; in this way, you can experiment with enhancements to AND/IF files that may later be accepted into the standard. To avoid name clashes on experimental types, the prefix of a type you define should be your last name, followed by a slash (/). Thus "verhoeff/box" might be a type introduced by Tom Verhoeff. Places where private types may appear in the format are identified by the symbol <<private_type>>.

III. AND/IF format

A file contains any number of descriptions:

<AND/IF_specification> ::= ( AND/IF_1.0 <description>* )

Each description contains a type and any number of clauses. (Note: Despite appearances, this BNF rule is consistent with the lexical convention specified above that requires the first 8 characters of an AND/IF file to be "(AND/IF_". BNF rules are formatted to show clearly the separate tokens in them, hence there is whitespace between the two tokens "(" and "AND/IF_1.0" in the line above. However, the lexical rules do not require whitespace between a "(" and a subsequent identifier. In the particular case of the first two tokens in the file, the special lexical convention forbids whitespace to appear.)

IV. NFA description

An NFA defines a non-deterministic finite automaton. The nomenclature and standard interpretation for an NFA is taken from J.E. Hopcroft and J.D. Ullman, "Introduction to Automata Theory, Languages, and Computation," Addison Wesley 1979.

<description> ::= ( NFA <nfa_clause>* )
<nfa_clause>  ::= <nfa_symbols> | <nfa_states> | <nfa_transitions> |
                  <nfa_options>

An NFA is described by a number of clauses. It is a context condition that an NFA description must contain at least one SYMBOLS clause, one STATES clause, and one TRANSITIONS clause. Note that there may be more than one clause of each type. The OPTIONS clause is optional.

<nfa_symbols> ::= ( SYMBOLS <symbol_description>* )
<symbol_description> ::= <<symbol_name>> |
                         ( <<symbol_name>> <symbol_property>* )
<symbol_property> ::= INPUT | OUTPUT | INOUT | EPSILON |
		      <<private_type>>

The SYMBOLS clauses together specify the alphabet of the NFA by giving an identifier for each symbol in the alphabet; it may also specify a symbol to be used to identify "epsilon transitions." Optionally, a symbol may have one or more properties. Each symbol must be listed only once.

The properties INPUT, OUTPUT, and INOUT determine whether the symbol corresponds to a signal in the machine that is an input, output, or both. The EPSILON property identifies the single symbol (if any) that labels "epsilon transitions" in the NFA. The epsilon symbol is *not* a member of the alphabet, but the symbol can be used to label transitions.

<nfa_states> ::= ( STATES <state_description>* )
<state_description> ::= <<state_name>> | 
                        ( <<state_name>> <state_property>* )
<state_property> ::= INITIAL | FINAL | TRANSIENT | DEMANDING |
                     BOX | BOTTOM | TOP | <<private_type>>

The STATES clauses together specify the set of states of an NFA by giving an identifier for each state. The identifiers may be numbers but need not be. Optionally, a state may have one or more properties. Each state must be listed only once. States are permitted to be named by identifiers specified in a SYMBOLS clause, but this is not recommended.

The interpretations for the optional state properties are:

INITIAL: A "starting state" of the NFA. Some tools may insist that an NFA have exactly one INITIAL state.
FINAL: An accepting state for an NFA defining a regular language
TRANSIENT: The module being defined by this NFA is unstable in this state, and is expected to make a transition by generating an output.
DEMANDING: The module being defined by this NFA is expecting an input from the environment in this state.
BOX: No obligation for module or environment to proceed.
BOTTOM: The "chaos" state in which further behavior of the module is indeterminate, because the environment has misbehaved.
TOP: The state in which further behavior of the environment is indeterminate, because the module has misbehaved.
<<private_type>>: To accommodate unanticipated properties.

<nfa_transitions> ::= ( TRANSITIONS <transition>* )
<transition> ::= ( <<from_state>> <<to_state>>
		<<symbol_name>> <transition_property>* )
<transition_property> ::= <<private_type>>

The TRANSITIONS clauses together specify the set of transitions of the NFA. Each transition must be labeled by a symbol. Each state and each symbol in a transition must have been specified in a preceding STATES and SYMBOLS clause. Although it is not customary to label transitions with properties, we have provided syntax for optionally doing so.

The following clauses are optional.

<nfa_options> ::= ( NAME <<identifier_or_list>>* )

This optional clause gives the NFA a printable name. It is recommended that each NFA be given a name.

<nfa_options> ::= ( NOTE <<identifier_or_list>>* )

This optional clause lets you associate notes with an NFA, such as what tool generated it, what problem it is solving, etc.

<nfa_options> ::= ( INTERPRETATION <<private_type>> )

The standard interpretation of an NFA is that given by Hopcroft and Ullman. However, the AND/IF format can be used to encode descriptions that have the same structure as an NFA but a different interpretation. For example, a transition labeled OTHER might be used to specify ways that the NFA interacts with other automata in a composition rather than a transition that consumes (or emits) a symbol named OTHER. While this machine is structurally equivalent to an NFA, it has a quite different interpretation. If the description has a non-standard interpretation, the author is encouraged to flag this with the INTERPRETATION option, using a private name for the interpretation.

<nfa_options> ::= <private_list>
<private_list>  ::= ( <<private_type>> <identifier_or_list>* )
<identifier_or_list> ::= <<any_identifier>> | <private_list>

This optional clause lets you add clauses to the definition of an NFA.

V. Other descriptions

In the future, AND/IF may be extended to allow other standard descriptions. The following structure allows you to include non-standard descriptions in an AND/IF file, provided they obey a well-defined list structure:

<description> ::= <private_list>

VI. NFA examples

Here is an example of a JOIN module with inputs A and B, and output C:

(AND/IF_1.0
(NFA
	(NOTE Created by hand)
	(NAME Join module with inputs A and B, and output C.)
	(SYMBOLS (A INPUT) (B input) (C output))
	(STATES (0 INITIAL) 1 2 3)
	(TRANSITIONS (0 1 A))
	(Transitions (1 3 B))
	(TRANSITIONS (0 2 B))
	(TRANSITIONS (2 3 A))
	(TRANSITIONS (3 0 C))
))

Here is an equivalent form that uses only one TRANSITIONS clause:

(AND/IF_1.0
(NFA
	(NAME Join module with inputs A and B, and output C.)
	(SYMBOLS (A INPUT) (B input) (C output))
	(STATES (0 INITIAL) 1 2 3)
	(TRANSITIONS (0 1 A) (1 3 B) (0 2 B) (2 3 A) (3 0 C))
))

Here is an NFA that makes use of epsilon transitions (Hopcroft and Ullman, Fig. 2.8, p. 24):

(AND/IF_1.0
(NFA
	(NAME Hopcroft and Ullman Figure 2.8)
	(SYMBOLS 0 1 2 (e EPSILON))
	(STATES
		(Q0 INITIAL)
                Q1
		(Q2 FINAL))
	(TRANSITIONS
		(Q0 Q0 0)
		(Q0 Q1 e)
		(Q1 Q1 1)
		(Q1 Q2 e)
		(Q2 Q2 2))
))

Here is an NFA with a single initial state and no transitions.

(AND/IF_1.0
(NFA
	(NAME A single-state NFA with no transitions)
	(SYMBOLS )
	(STATES (Q INITIAL))
	(TRANSITIONS )
))

The empty list of symbols and transitions must be present, since each NFA description must have at least one SYMBOLS, STATES, and TRANSITIONS clause.

VII. Summary of context conditions

There are a few general context conditions for the AND/IF grammar. Here they are.

Each NFA description must contain at least one SYMBOLS clause, one STATES clause, and one TRANSITIONS clause.
Each symbol name may be listed only once in the SYMBOLS clause(s).
Each state name may be listed only once in the STATES clause(s).
At most one symbol name may be specified as the EPSILON symbol.
Symbol names and state names must be declared before use in a TRANSITIONS clause.

In addition to these general conditions, specific interpretations or tools may require more context conditions to hold. For example, there may be further conditions on attaching symbol properties to symbols, on attaching state properties to states, or on transitions. Such conditions can be associated with particular interpretations of an NFA and can be specified later.

VIII. Summary of the grammar rules

Here are all the grammar rules summarized.

<AND/IF_specification> 
   ::= ( AND/IF_1.0 <description>* )

<description> 
   ::= ( NFA <nfa_clause>* )
     | <private_list>

<nfa_clause>  
   ::= <nfa_symbols> 
     | <nfa_states> 
     | <nfa_transitions> 
     | <nfa_options>

<nfa_options> 
   ::= ( NAME <identifier_or_list>* )
     | ( NOTE <identifier_or_list>* )
     | ( INTERPRETATION <<private_type>> )
     | <private_list>

<nfa_symbols>        
   ::= ( SYMBOLS <symbol_description>* )

<symbol_description> 
   ::= <<symbol_name>> 
     | ( <<symbol_name>> <symbol_property>* )

<symbol_property>    
   ::= INPUT | OUTPUT | INOUT | EPSILON | <<private_type>>

<nfa_states>         
   ::= ( STATES <state_description>* )

<state_description>  
   ::= <<state_name>> 
     | ( <<state_name>> <state_property>* )

<state_property>   
   ::= INITIAL | FINAL | TRANSIENT | DEMANDING | BOX | BOTTOM | TOP 
     | <<private_type>>

<nfa_transitions> 
   ::= ( TRANSITIONS <transition>* )

<transition> 
   ::= ( <<from_state>> <<to_state>> <<symbol_name>> <transition_property>* )

<transition_property> 
   ::= <<private_type>>

<private_list> 
   ::= ( <<private_type>> <identifier_or_list>* )
<identifier_or_list> 
   ::= <<any_identifier>> | <private_list>

AND/IF 1.0 -- a file format for exchanging finite automata descriptions

Last modified at Mon Oct 26 11:56:33 1998