hadoop-pig-dev mailing list archives

From "Antonio Magnaghi (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-32) Abstraction Layer to decouple Pig from Back-End
Date Fri, 14 Dec 2007 14:45:43 GMT

    [ https://issues.apache.org/jira/browse/PIG-32?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12551817

Antonio Magnaghi commented on PIG-32:

Attaching to the bug this high-level summary that I sent out to the mailing list a few days ago.

I have discussed this with Ben; one aspect we talked about was extending the API to provide
a way to collect logging and debugging information.

From: Antonio Magnaghi 
Sent: Monday, December 10, 2007 9:29 AM
To: 'pig-dev@incubator.apache.org'
Subject: Abstraction layer: execution engine (PIG-32)

I'm starting to work on the execution engine portion of the abstraction layer, which separates
the front-end from the back-end.

Based on some previous discussions with various folks, including Trevor Strohman from the
Galago project, I think it is possible to identify some requirements/changes that I've summarized
below (in addition to what is currently posted at http://wiki.apache.org/pig/PigAbstractionLayer).

I would like to get some feedback on these points and whether I have left out aspects that'd
need to be considered as well.


Change logical plan representation: the goal is to change the representation of logical plans
so that:
•	details pertaining to the physical query plan execution are no longer present in the logical plan
•	a new logical plan submitted to the back-end can reference a portion (or alias) of another
logical plan

Aspects affected by the changes above are:
1.	need to remove data collectors and the logic that manages data pipes from the eval specs and
cond's of logical operators. These data structures are used in the local execution mode.
We can add physical eval specs and cond's where data pipes and data collectors are set up.
This has the disadvantage of duplicating code (the new code is similar to the code for logical
eval specs and logical cond's), but the overall separation of the logical aspects from the
physical execution should be much cleaner.
2.	need to remove the table of query results, where aliases are mapped to intermediate results.
This data structure is populated when the logical plan is compiled. The concept of intermediate
results does not seem to belong in the front-end. (Information about the generation of intermediate
results will be maintained in the back-end.)
3.	extend the representation of logical operators by assigning each one a scope and a unique id
within that scope. The motivation is that new logical plans submitted to the back-end can
reference previous logical plans (or parts of them) via a (scope id, node id) pair.
Having the concept of scope can provide support in the back-end for purging information about
entities that go out of scope. For instance, the session id could be used as the scope, so that
entities no longer needed can be garbage collected in the back-end.
4.	need to add a catalog that maps aliases to logical trees. For instance, when a store operation
is encountered, the front-end can determine which dependent logical trees to serialize
and send to the back-end, or which (scope, id) pairs of previous plans to reference.
5.	the serialization process from the front-end to the back-end can produce a representation of
the logical plan and its dependencies that includes the (scope, id) of each operator to send to
the back-end.
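
To make points 3 to 5 more concrete, here is a minimal sketch in Java. It is only an
illustration under assumptions: every name in it (LogicalPlanCatalogSketch, OperatorKey,
LogicalOperator, Catalog, planForStore) is hypothetical and not taken from the Pig code base,
and the "serialized" form is just a list of strings standing in for a real wire format.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

// Hypothetical sketch: none of these names come from the Pig code base.
class LogicalPlanCatalogSketch {

    /** A (scope id, node id) pair that identifies one logical operator. */
    static final class OperatorKey {
        final String scope;
        final long id;

        OperatorKey(String scope, long id) {
            this.scope = Objects.requireNonNull(scope);
            this.id = id;
        }

        @Override public boolean equals(Object o) {
            if (!(o instanceof OperatorKey)) return false;
            OperatorKey other = (OperatorKey) o;
            return id == other.id && scope.equals(other.scope);
        }

        @Override public int hashCode() { return Objects.hash(scope, id); }

        @Override public String toString() { return scope + "-" + id; }
    }

    /** A purely logical operator: no data pipes, no data collectors. */
    static final class LogicalOperator {
        final OperatorKey key;
        final String description;
        final List<LogicalOperator> inputs;

        LogicalOperator(OperatorKey key, String description, LogicalOperator... inputs) {
            this.key = key;
            this.description = description;
            this.inputs = Arrays.asList(inputs);
        }
    }

    /** Front-end catalog that maps aliases to the roots of logical trees. */
    static final class Catalog {
        private final Map<String, LogicalOperator> aliases = new LinkedHashMap<>();

        void define(String alias, LogicalOperator root) { aliases.put(alias, root); }

        LogicalOperator lookup(String alias) { return aliases.get(alias); }
    }

    /**
     * Lists what to send for a store on root, inputs first. Operators whose key is in
     * alreadySent are emitted as a reference only, and their inputs are not visited.
     */
    static List<String> planForStore(LogicalOperator root, Set<OperatorKey> alreadySent) {
        List<String> out = new ArrayList<>();
        visit(root, alreadySent, new HashSet<OperatorKey>(), out);
        return out;
    }

    private static void visit(LogicalOperator op, Set<OperatorKey> alreadySent,
                              Set<OperatorKey> seen, List<String> out) {
        if (!seen.add(op.key)) return;      // shared sub-tree, already handled
        if (alreadySent.contains(op.key)) {
            out.add("ref " + op.key);       // back-end already knows this operator
            return;
        }
        for (LogicalOperator input : op.inputs) visit(input, alreadySent, seen, out);
        out.add("op " + op.key + " " + op.description);
    }
}
```

In this sketch the front-end would look up the alias being stored in the catalog, call
planForStore on the resulting tree, and pass in the set of keys it has already shipped, so
that previously submitted operators travel only as (scope, id) references.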

1.	the back-end would maintain the table of intermediate results
2.	compilation of the logical plan to a physical plan would take place in the back-end
3.	a local back-end would generate physical trees using the physical eval specs and physical
cond's (as described above)
4.	a Hadoop back-end would compile the logical plan to map/reduce
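
The four back-end points above could be sketched along the following lines. This is again an
illustration only: the interface and class names (ExecutionEngine, TableBackedEngine,
LocalEngine, HadoopEngine) and their methods are hypothetical, plain strings stand in for
logical plans, physical plans and intermediate results, and no real Hadoop API is called.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: these types are not part of Pig or Hadoop.
class BackEndSketch {

    /** What a front-end would need from any back-end in this sketch. */
    interface ExecutionEngine {
        /** Compiles the logical plan in the back-end and records the result under (scope, id). */
        String submit(String scope, long nodeId, String logicalPlan);

        /** Returns the recorded result for (scope, id), or null if there is none. */
        String resultFor(String scope, long nodeId);

        /** Purges everything that belongs to a scope, for example a finished session. */
        void closeScope(String scope);
    }

    /** Shared part: the table of intermediate results lives in the back-end. */
    abstract static class TableBackedEngine implements ExecutionEngine {
        private final Map<String, Map<Long, String>> results =
            new HashMap<String, Map<Long, String>>();

        /** Engine-specific compilation from logical to physical form. */
        protected abstract String compile(String logicalPlan);

        @Override public String submit(String scope, long nodeId, String logicalPlan) {
            String physical = compile(logicalPlan);
            Map<Long, String> byId = results.get(scope);
            if (byId == null) {
                byId = new HashMap<Long, String>();
                results.put(scope, byId);
            }
            byId.put(nodeId, physical);
            return physical;
        }

        @Override public String resultFor(String scope, long nodeId) {
            Map<Long, String> byId = results.get(scope);
            return byId == null ? null : byId.get(nodeId);
        }

        @Override public void closeScope(String scope) { results.remove(scope); }
    }

    /** Local back-end: would build physical trees from physical eval specs and cond's. */
    static final class LocalEngine extends TableBackedEngine {
        @Override protected String compile(String logicalPlan) {
            return "physical-tree(" + logicalPlan + ")";
        }
    }

    /** Hadoop back-end: would compile the logical plan to map/reduce. */
    static final class HadoopEngine extends TableBackedEngine {
        @Override protected String compile(String logicalPlan) {
            return "map-reduce(" + logicalPlan + ")";
        }
    }
}
```

Keying the table by scope first makes purging a whole scope a single removal, which matches
the idea of using the session id as the scope for garbage collection.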

> Abstraction Layer to decouple Pig from Back-End
> -----------------------------------------------
>                 Key: PIG-32
>                 URL: https://issues.apache.org/jira/browse/PIG-32
>             Project: Pig
>          Issue Type: New Feature
>          Components: impl
>            Reporter: Antonio Magnaghi
>            Assignee: Antonio Magnaghi
>         Attachments: DataStorage.diff, DataStorage20071212.diff
> I'm opening a new issue to track the development work to support an abstraction layer
> for Pig as defined at http://wiki.apache.org/pig/PigAbstractionLayer

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
