hadoop-pig-dev mailing list archives

From Milind A Bhandarkar <mili...@yahoo-inc.com>
Subject Re: Consider cleaning up backend code
Date Thu, 22 Apr 2010 21:01:02 GMT
I think it is a great idea to be able to plug in different back-ends.

But the way to do that, IMHO, is to make the intermediate artifacts public
(akin to making byte-code specs public).

That way, independent projects can spring up that take the translated pig
script, and provide a new interpreter for that physical plan, and show their
superiority / cool features etc.

My suggestion is this:

pigcc -L myScript.pig -> parses the pig script, generates the logical plan, and
stores it in myScript.pig.l

pigcc -P myScript.pig.l -> produces the physical plan from the logical plan, and
stores it in myScript.pig.p

pigcc -M myScript.pig.p -> produces the map-reduce plan, myScript.pig.m

pig myScript.pig.m -> interprets the MR plan. The MR plan can also be split
into multiple sequential MR job plans, myScript.pig.m.{1,2,3..}, so that one
way to execute the pig script is to run

hadoop jar pigRT.jar myScript.pig.m.1
hadoop jar pigRT.jar myScript.pig.m.2
hadoop jar pigRT.jar myScript.pig.m.3
hadoop jar pigRT.jar myScript.pig.m.4

in sequence or as a DAG.
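
To make the proposal concrete, here is a rough sketch in Python of how a
driver might chain the stages and order the per-job plans. Everything in it
is hypothetical: pigcc and pigRT.jar are only the names suggested above (with
the command names lowercased, as they would be typed in a shell), the
dependency map between job plans is an assumed input, and the functions only
build command lines rather than run anything.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical stage table from the proposal: (pigcc flag, output suffix).
STAGES = [("-L", ".l"), ("-P", ".p"), ("-M", ".m")]


def compile_commands(script):
    """Build the proposed pigcc command lines for one script.

    Returns the list of commands and the name of the final MR plan file.
    Each stage reads the previous stage's artifact and writes script + suffix.
    """
    commands = []
    current = script
    for flag, suffix in STAGES:
        commands.append(["pigcc", flag, current])
        current = script + suffix
    return commands, current


def run_order(mr_plan, deps):
    """Order the per-job plans so every job follows its prerequisites.

    deps maps a job number to the job numbers it depends on (an assumed
    input; the proposal does not say where this information would live).
    """
    sorter = TopologicalSorter({job: set(pre) for job, pre in deps.items()})
    return [
        ["hadoop", "jar", "pigRT.jar", f"{mr_plan}.{job}"]
        for job in sorter.static_order()
    ]


if __name__ == "__main__":
    cmds, plan = compile_commands("myScript.pig")
    # A plain chain 1 -> 2 -> 3 -> 4, i.e. the "in sequence" case.
    cmds += run_order(plan, {1: [], 2: [1], 3: [2], 4: [3]})
    for cmd in cmds:
        print(" ".join(cmd))
```

For the DAG case, the same run_order call accepts a map such as
{1: [], 2: [], 3: [1, 2]}; jobs 1 and 2 have no ordering between them and
could be launched in parallel by a smarter driver.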

That also makes it easy for someone to write an experimental runtime, or a
full-fledged translator to other languages, without having to wait for pig
committers to commit their patches. This will have a beneficial impact
on the pig ecosystem.

Dmitriy, you might remember that we spoke about this at CMU last October.

- Milind

On 4/22/10 1:34 PM, "Dmitriy Ryaboy" <dvryaboy@gmail.com> wrote:

> I kind of dig the concept of being able to plug in a different backend,
> though I definitely think we should get rid of the dead localmode code. Can
> you give an example of how this will simplify the codebase? Is it more than
> just GenericClass foo = new SpecificClass(), and the associated extra files?
> -D
> On Thu, Apr 22, 2010 at 1:25 PM, Arun C Murthy <acm@yahoo-inc.com> wrote:
>> +1
>> Arun
>> On Apr 22, 2010, at 11:35 AM, Richard Ding wrote:
>>> Pig has an abstraction layer (interfaces and abstract classes) to
>>> support multiple execution engines. After PIG-1053, Hadoop is the only
>>> execution engine supported by Pig. I wonder if we should remove this
>>> layer of code, and make Hadoop THE execution engine for Pig. This will
>>> simplify the backend code a lot.
>>> Thanks,
>>> -Richard

Milind Bhandarkar
Y!IM: GridSolutions
Tel: 408-203-5213 
