hadoop-pig-dev mailing list archives

From "Olga Natkovich" <ol...@yahoo-inc.com>
Subject switching to different parser in Pig
Date Thu, 12 Feb 2009 23:58:50 GMT
Pig Developers,
 
Pig currently uses javacc for parsing pig commands. We have found
several shortcomings with using javacc. In particular,
 
(1) Lack of good documentation, which makes it hard and time-consuming
to learn javacc and make changes to the Pig grammar
(2) No easy way to customize error handling and error messages
(3) Single pass that performs both tokenizing and parsing
 
We are considering using JFlex and Cup, which are Java versions of Lex
and Bison, instead. The main advantage of this transition is that both are
proven, well-known, and well-understood technology with a familiar input
format. In addition, they address the issues stated above.
 
One problem with the transition is that JFlex and Cup have a GPL license,
which is not compatible with the Apache license. A workaround could be that
we don't commit the tools into SVN; instead, developers who need to
update the grammar would install them on their own. Note that we can commit
the input grammar as well as the output generated from the grammar into SVN,
which means that developers who are just compiling code or making non-parser
changes will see no impact.
 
Please comment on whether you think this is a reasonable change.
 
Thanks,
 
Olga
