hadoop-pig-dev mailing list archives

From "Shravan Matthur Narayanamurthy (JIRA)" <j...@apache.org>
Subject [jira] Updated: (PIG-161) Rework physical plan
Date Sat, 19 Apr 2008 00:28:21 GMT

     [ https://issues.apache.org/jira/browse/PIG-161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

Shravan Matthur Narayanamurthy updated PIG-161:

    Attachment: incr5.patch

This patch mainly contains the MRCompiler and its subordinate classes in the mapReduceLayer
package, along with tests for them. For better testability, I have included some dummy
operators such as POGlobalRearrange and POCast, since the compiler does not care about an
operator's functionality. Also included is the POSplit Map Reduce operator. It is essentially
a dummy, as it does not do any work. The compiler translates it into a store-load pair and
assumes that the logical-to-physical translation will ensure that the relevant filters are
used as outputs of the Split.
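
As a rough illustration of the store-load idea (not the code in the patch), the sketch
below cuts a linear pipeline at each Split, ending one segment with a store to a temporary
location and starting the next with a load from it. The class name, the string-based
operators, and the tmp naming scheme are all assumptions made for this example; the real
MRCompiler works on physical operator objects.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SplitToStoreLoadSketch {
    // Hypothetical helper: operators are plain strings here, not Pig classes.
    // Each "Split" closes the current segment with a Store and opens a new
    // segment with a Load of the same temporary location.
    static List<List<String>> cutAtSplit(List<String> pipeline, String tmpPrefix) {
        List<List<String>> segments = new ArrayList<>();
        List<String> current = new ArrayList<>();
        int tmpCount = 0;
        for (String op : pipeline) {
            if (op.equals("Split")) {
                String tmp = tmpPrefix + tmpCount++;
                current.add("Store(" + tmp + ")");
                segments.add(current);
                current = new ArrayList<>();
                current.add("Load(" + tmp + ")");
            } else {
                current.add(op);
            }
        }
        segments.add(current);
        return segments;
    }

    public static void main(String[] args) {
        List<String> pipeline =
            Arrays.asList("Load(in)", "Filter", "Split", "Filter", "Store(out)");
        for (List<String> segment : cutAtSplit(pipeline, "tmp")) {
            System.out.println(segment);
        }
    }
}
```

The Split itself disappears from the output, which matches the description of it as an
operator that does no work of its own.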

Also in the patch is an implementation of the POUnion operator, which works for both the
MapReduce and Local backends. I also have tests for it.

Another class included is a PlanPrinter, which does tree-like pretty printing of the plan.
I am attaching another file with all the test cases I have run for the MRCompiler, about 14
in total. For each one it has the PlanPrinter representation of the plan that was compiled
and of the resulting compiled plan. Please check whether the conversion taking place is correct.
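
For readers unfamiliar with tree-like plan printing, here is a minimal sketch of the
general technique. It is not the PlanPrinter from the patch: the OpNode type, the operator
names, and the exact output layout are assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class PlanPrinterSketch {
    // Hypothetical stand-in for a physical operator and its inputs.
    static final class OpNode {
        final String name;
        final List<OpNode> inputs = new ArrayList<>();
        OpNode(String name) { this.name = name; }
        OpNode add(OpNode input) { inputs.add(input); return this; }
    }

    // Renders the root on its own line, then each input indented beneath it.
    static String render(OpNode root) {
        StringBuilder sb = new StringBuilder();
        sb.append(root.name).append('\n');
        renderInputs(root, "", sb);
        return sb.toString();
    }

    private static void renderInputs(OpNode node, String prefix, StringBuilder sb) {
        for (int i = 0; i < node.inputs.size(); i++) {
            OpNode input = node.inputs.get(i);
            boolean last = (i == node.inputs.size() - 1);
            sb.append(prefix).append("|---").append(input.name).append('\n');
            // Keep a vertical bar in the prefix while siblings remain below.
            renderInputs(input, prefix + (last ? "    " : "|   "), sb);
        }
    }

    public static void main(String[] args) {
        OpNode plan = new OpNode("Store")
            .add(new OpNode("Union")
                .add(new OpNode("Load-A"))
                .add(new OpNode("Load-B")));
        System.out.print(render(plan));
    }
}
```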

The MRCompiler doesn't support the POSort operator yet. After much thought I decided to
submit the patch without it, because the MapReduce POSort needs POUserFunc and the local
POSort. So I decided to wait for them to be checked in.

Adding POSort support later would not be a major change and would not affect existing code.

Another thing is that the MRCompiler uses the GenPhyOp class, because of which I have included
some test folder classes in the compilation of the src folder classes. As an artifact of the
changes in GenPhyOp, which now calls PigContext.connect() and hence needs the MiniCluster,
all tests that use it will take much longer to execute. The test time has shot up to 1 min
46 sec. Is there a way to create the MiniCluster just once rather than in each TestCase?
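
One common way to pay an expensive setup cost only once per JVM is the
initialization-on-demand holder idiom, sketched below. FakeCluster is a hypothetical
stand-in, not the real MiniCluster, whose API is not shown in this message. Whether this
helps in practice also depends on the test runner: if it forks a new JVM per test class,
each fork would still build its own instance.

```java
import java.util.concurrent.atomic.AtomicInteger;

class SharedClusterSketch {
    // Counts how many times the expensive setup actually ran.
    static final AtomicInteger STARTS = new AtomicInteger();

    // Hypothetical stand-in for an expensive fixture such as a mini cluster.
    static final class FakeCluster {
        FakeCluster() { STARTS.incrementAndGet(); }
    }

    // The JVM initializes Holder lazily and exactly once, on first access,
    // so every caller of get() shares the same instance.
    private static final class Holder {
        static final FakeCluster INSTANCE = new FakeCluster();
    }

    static FakeCluster get() { return Holder.INSTANCE; }

    public static void main(String[] args) {
        FakeCluster first = get();   // triggers the one-time setup
        FakeCluster second = get();  // reuses the existing instance
        System.out.println(first == second);
        System.out.println(STARTS.get());
    }
}
```

Each test case would then call get() in its setup instead of constructing the fixture
itself.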

Pretty long patch and an important one too. So please review it thoroughly. Awaiting comments.

> Rework physical plan
> --------------------
>                 Key: PIG-161
>                 URL: https://issues.apache.org/jira/browse/PIG-161
>             Project: Pig
>          Issue Type: Sub-task
>            Reporter: Alan Gates
>            Assignee: Alan Gates
>         Attachments: arithmeticOperators.patch, incr2.patch, incr3.patch, incr4.patch,
incr5.patch, Phy_AbsClass.patch, pogenerate.patch, pogenerate.patch, pogenerate.patch, posort.patch
> This bug tracks work to rework all of the physical operators as described in http://wiki.apache.org/pig/PigTypesFunctionalSpec

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
