incubator-flex-dev mailing list archives

From "David Arno" <da...@davidarno.org>
Subject [gosh] On the sad tale of BNF and optional semicolons
Date Tue, 21 Feb 2012 17:27:59 GMT
I have spent a bit of time over the last few days trying to define a BNF
grammar for AS3. As "Left Right" predicted, it's looking like this just
isn't possible. The problem is such a trivial little thing too: optional
semicolons.
 
To illustrate the problem, let me give an example piece of AS3 BNF:
 
imports = import | imports import
import = 'import' type_reference ';'
 
As I'm sure there are many here who are unfamiliar with BNF, I'll explain
what the above means. The first line defines a symbol: imports, which is
defined as either being import, or a recursive reference to itself, followed
by import. In other words, a language consisting of one or more import
symbols is a language that matches the imports symbol. Next, I define the
import symbol: an "import" keyword token, followed by another symbol,
type_reference (which I haven't included here), and finally a semicolon.
This snippet then covers import collections, such as:
 
       import flash.display.DisplayObject;
       import flash.events.Event;
       import flash.events.MouseEvent;
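
As a rough illustration only, the two rules above can be checked by hand with a short Python sketch. The names tokenize, parse_import and parse_imports are hypothetical, and type_reference is assumed to be a dotted identifier, since the grammar above leaves it undefined.

```python
import re

# Assumption: type_reference is a dotted identifier such as flash.events.Event;
# the BNF snippet does not define it.
TOKEN = re.compile(
    r"\s*(import\b|[A-Za-z_][A-Za-z0-9_]*(?:\.[A-Za-z_][A-Za-z0-9_]*)*|;)"
)


def tokenize(source):
    """Split source into 'import', dotted-name and ';' tokens, skipping whitespace."""
    source = source.rstrip()
    tokens, pos = [], 0
    while pos < len(source):
        match = TOKEN.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse_import(tokens, i):
    """import = 'import' type_reference ';'"""
    if i + 3 > len(tokens):
        raise SyntaxError("incomplete import statement")
    keyword, ref, terminator = tokens[i:i + 3]
    if keyword != "import" or ref in ("import", ";") or terminator != ";":
        raise SyntaxError(f"malformed import near token {i}")
    return ref, i + 3


def parse_imports(tokens):
    """imports = import | imports import (that is, one or more imports)."""
    ref, i = parse_import(tokens, 0)
    refs = [ref]
    while i < len(tokens):
        ref, i = parse_import(tokens, i)
        refs.append(ref)
    return refs
```

Fed the three import lines above, parse_imports(tokenize(...)) returns the three type references. Remove a semicolon and it raises SyntaxError, because newlines are simply discarded as whitespace.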
 
However, there is a problem. AS3 doesn't mandate that I specify those
semicolons at the end of the lines. AS3 supports implied semicolons: in
certain circumstances, the newline character is good enough to tell the
compiler that the end of a statement has been reached, so a semicolon isn't
required. Unfortunately, such a concept cannot be handled by BNF. If end of
line characters are significant within the grammar, then within BNF, they
must be explicitly referenced. As a result, the BNF becomes really complex:
 
imports = import | imports import
import = 'import' type_reference statement_terminator
    | 'import' unimportant_newlines type_reference statement_terminator
unimportant_newlines = '\n' | unimportant_newlines '\n'
statement_terminator = ';' 
    | ';' unimportant_newlines
    | '\n'
    | '\n' unimportant_newlines
    | unimportant_newlines ';'
    | unimportant_newlines ';' unimportant_newlines
 
Not only does this become unreadable (remember, unimportant_newlines will
appear absolutely everywhere in the BNF where whitespace is allowed in the
code), but any tool that generates a parser from BNF-like definitions will
complain of conflicts: a newline character can match several rules at once,
so the parser has to guess which one to use.
 
As far as I can see, we have three choices:
 
1.     Hand craft a parser that can handle optional semicolons, rather than
using a BNF-based one. I really don't want to do this as it requires us to
pick one language for the compiler, it's harder to maintain and takes
longer to write.
2.     Hand craft a lexical analyser that knows about optional semicolons
and inserts missing ones into the token stream passed to the parser. I have
to confess I've no idea at this stage how feasible this would be, and it has
the same issues of language-specificity and complexity as the previous
option.
3.     Make semicolons mandatory.
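
To give a feel for option 2, here is a minimal Python sketch of a lexer that inserts implied semicolons for the import subset only. It rests on a simplified rule that is an assumption, not the real AS3 rule: a newline (or end of input) ends a statement when the preceding token could end one and no explicit semicolon follows. The names lex_with_implied_semicolons and CANNOT_END_STATEMENT are hypothetical.

```python
import re

# Hypothetical token pattern: keyword, dotted name, semicolon or newline.
# Spaces and tabs are skipped; newlines are kept as tokens so they can be judged.
TOKEN = re.compile(
    r"[ \t]*(import\b|[A-Za-z_][A-Za-z0-9_]*(?:\.[A-Za-z_][A-Za-z0-9_]*)*|;|\n)"
)

# Assumption: after these tokens a newline is "unimportant" and never
# stands in for a semicolon.
CANNOT_END_STATEMENT = ("import", ";")


def lex_with_implied_semicolons(source):
    """Return a token list in which every import ends with an explicit ';'."""
    source = source.replace("\r\n", "\n").rstrip()
    tokens, pos, newline_seen = [], 0, False
    while pos < len(source):
        match = TOKEN.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        pos = match.end()
        tok = match.group(1)
        if tok == "\n":
            # Remember the newline, but decide what it means only once
            # the next real token is known.
            newline_seen = True
            continue
        if (newline_seen and tok != ";" and tokens
                and tokens[-1] not in CANNOT_END_STATEMENT):
            tokens.append(";")  # the newline acted as a statement terminator
        newline_seen = False
        tokens.append(tok)
    # Assumption: end of input also terminates an unfinished statement.
    if tokens and tokens[-1] not in CANNOT_END_STATEMENT:
        tokens.append(";")
    return tokens
```

The parser behind such a lexer would only ever see the simple two-rule grammar with mandatory semicolons. The catch is that CANNOT_END_STATEMENT would have to grow to cover every construct in the full language, which is where the complexity mentioned in option 2 would show up.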
 
The purpose of this email is to gauge people's reactions to option 3. If we
created a compiler that mandated semicolons, would this cause problems for
anyone? Is it an idea we can consider, or is it a complete no-no?
 
Thoughts and opinions please people.
 
David. 
