hadoop-pig-dev mailing list archives

From pi song <pi.so...@gmail.com>
Subject Re: switching to different parser in Pig
Date Sat, 14 Feb 2009 12:10:53 GMT
Due to my limited knowledge, I don't quite understand why building an AST
from outside Pig would be helpful. Isn't Pig Latin already good enough as an
interface to the world?

In terms of parser generators, has anyone considered ANTLR? I spent a few
weeks on it a while ago. It is quite well-documented and the tools are
GREAT!! (see http://www.antlr.org/works/index.html) Its license is BSD, which
is the same as JavaCC anyway. The only ugly thing is that you'll have
antlr.jar in your distribution.


On Fri, Feb 13, 2009 at 6:34 PM, Mridul Muralidharan wrote:

> This sounds like a great idea!
> It would be great if other means of generating ASTs for Pig were possible.
> Regards,
> Mridul
> Ted Dunning wrote:
>> In general, it would be really, really nice if it were easy to build
>> abstract Pig syntax trees outside of the normal parser.
>> For instance, I find the fact that Pig is not a full-scale scripting
>> language incredibly confining.  I would love to be able to build a DSL in
>> Groovy that let me use Groovy for scripting, but still execute Pig jobs
>> easily.  If I could build Pig syntax trees easily, then I would be, as
>> they say, in pig heaven.
>> That would also let the switch to a different parsing technology happen
>> gradually rather than all at once.  Two different grunt interpreters could
>> coexist for a short time while the new one is proved out.
>> On Thu, Feb 12, 2009 at 3:58 PM, Olga Natkovich <olgan@yahoo-inc.com>
>> wrote:
>>> Pig Developers,
>>> Pig currently uses JavaCC for parsing Pig commands. We have found
>>> several shortcomings with using JavaCC. In particular,
>>> (1) Lack of good documentation, which makes it hard and time-consuming
>>> to learn JavaCC and make changes to the Pig grammar
>>> (2) No easy way to customize error handling and error messages
>>> (3) Single path that performs both tokenizing and parsing
>>> We are considering using JFlex and Cup, which are Java versions of Lex
>>> and Bison, instead. The main advantage of this transition is that they
>>> are a proven, well-known, and well-understood technology and input
>>> format. In addition, it addresses the issues stated above.
>>> One problem with the transition is that JFlex and Cup have a GPL license
>>> that is not compatible with the Apache license. The workaround could be
>>> that we don't commit the tools into SVN and instead developers who need
>>> to update the grammar would install them on their own. Note that we can
>>> commit the input grammar as well as the output of the grammar into SVN,
>>> which means that for developers just compiling code or making non-parser
>>> changes, there will be no impact.
>>> Please comment on whether you think this is a reasonable change.
>>> Thanks,
>>> Olga
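
A minimal sketch of what keeping tokenizing and parsing as separate stages
(point 3 in the quoted message) could look like, with the same tree node also
built directly, without going through the parser, along the lines suggested
earlier in the thread. Everything here is an assumption for illustration: the
toy grammar, the `Load` node, and the `tokenize`/`parse` functions are
hypothetical and are not Pig's grammar or API.

```python
import re
from dataclasses import dataclass

# Toy grammar (hypothetical, NOT Pig's real grammar):
#   statement := IDENT '=' 'LOAD' STRING ';'
TOKEN_RE = re.compile(r"""
      (?P<IDENT>[A-Za-z_][A-Za-z0-9_]*)
    | (?P<STRING>'[^']*')
    | (?P<PUNCT>[=;])
""", re.VERBOSE)


@dataclass(frozen=True)
class Load:
    """Hypothetical syntax tree node for a LOAD statement."""
    alias: str
    path: str


def tokenize(text):
    """Stage 1: turn characters into (kind, value) tokens."""
    tokens, pos = [], 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
        tokens.append((m.lastgroup, m.group(m.lastgroup)))
        pos = m.end()
    return tokens


def parse(tokens):
    """Stage 2: turn tokens into a tree node, with its own error messages."""
    expected = ["IDENT", "=", "LOAD", "STRING", ";"]
    if len(tokens) != len(expected):
        raise SyntaxError(f"expected {len(expected)} tokens, got {len(tokens)}")
    for index, ((kind, value), want) in enumerate(zip(tokens, expected)):
        # Token classes are matched by kind, keywords and punctuation by value.
        ok = kind == want if want in ("IDENT", "STRING") else value == want
        if not ok:
            raise SyntaxError(f"token {index}: expected {want}, got {value!r}")
    return Load(alias=tokens[0][1], path=tokens[3][1][1:-1])


# The same node can come from the parser or be built directly by a caller.
from_text = parse(tokenize("A = LOAD 'data.txt';"))
built_directly = Load(alias="A", path="data.txt")
```

Because the tree node is a plain value, a front end written in another
language or style only needs to produce equal nodes; it does not have to go
through the text parser at all.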
