hadoop-pig-dev mailing list archives

From "Santhosh Srinivasan (JIRA)" <j...@apache.org>
Subject [jira] Issue Comment Edited: (PIG-161) Rework physical plan
Date Mon, 16 Jun 2008 23:33:45 GMT

    [ https://issues.apache.org/jira/browse/PIG-161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12605456#action_12605456 ]

sms edited comment on PIG-161 at 6/16/08 4:33 PM:
------------------------------------------------------------------

Consider the following example, a modification of Case 2 in the previous comment:

{code}
A = load 'myfile';
B = group A by $0;
C = foreach B {
    C1 = distinct $1;
    generate group + SUM(C1);
};
{code}

Top level plan:

load -> group -> foreach

The foreach will have a nested plan:

plan 1: project(1) -> distinct -> accumulate

The accumulate will have a nested plan of: 

{noformat}
project( * )
              \
               SUM()
              /
project(group)
{noformat}

The accumulate operator requires two inputs:

1. The tuple from foreach for projecting 'group'
2. The bag from distinct for the aggregate SUM
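
As a rough illustration of why both inputs are needed, the sketch below evaluates the generate expression for a single tuple produced by group. The function name evaluate_group and the tuple layout (key, bag of single-field numeric tuples) are assumptions made for this example; they are not taken from the Pig code base.

```python
def evaluate_group(group_tuple):
    """Evaluate `group + SUM(distinct $1)` for one tuple emitted by group.

    Assumed layout (hypothetical, for illustration only): group_tuple is
    (key, bag), where bag is a list of single-field numeric tuples.
    """
    key, bag = group_tuple                    # input 1: the tuple seen by foreach
    distinct_bag = set(bag)                   # input 2: the bag after distinct
    total = sum(t[0] for t in distinct_bag)   # SUM over the distinct bag
    return key + total                        # needs both the key and the sum
```

For example, evaluate_group((3, [(3,), (3,), (3,)])) returns 6: the key 3 comes from the tuple itself, and the sum 3 comes from the bag after duplicates are removed.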

With the proposed changes, accumulate cannot receive inputs from both foreach and
distinct. To solve this problem, accumulate has to be made a proxy root: foreach attaches
its input tuple to accumulate directly. The second input, from distinct, is then retrieved
using getNext().

In addition to the changes proposed in the previous comment, the following changes have to
be made:

1. In the logical layer, indicate whether accumulate requires its input from foreach.
2. In the physical layer (for foreach), attach input should attach the tuple to accumulate
in addition to all the roots in the nested plans of foreach.
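
The proxy-root idea can be sketched in a few lines of Python. This is a minimal model under stated assumptions, not the Pig implementation: the class names, the proxy_roots parameter, and the build_foreach helper are hypothetical, and attach_input / get_next only mirror the "attach input" and getNext() calls mentioned above.

```python
class Project:
    """Root operator: returns one field of the attached tuple."""
    def __init__(self, index):
        self.index = index
        self.input = None

    def attach_input(self, tup):
        self.input = tup

    def get_next(self):
        return self.input[self.index]


class Distinct:
    """Pulls a bag from its predecessor and drops duplicate tuples."""
    def __init__(self, predecessor):
        self.predecessor = predecessor

    def get_next(self):
        seen, out = set(), []
        for t in self.predecessor.get_next():
            if t not in seen:
                seen.add(t)
                out.append(t)
        return out


class Accumulate:
    """Leaf of the nested plan that is also treated as a proxy root."""
    def __init__(self, predecessor):
        self.predecessor = predecessor
        self.input = None

    def attach_input(self, tup):
        self.input = tup

    def get_next(self):
        group = self.input[0]               # input 1: attached by foreach
        bag = self.predecessor.get_next()   # input 2: pulled from distinct
        return group + sum(t[0] for t in bag)


class ForEach:
    """Attaches each tuple to the nested-plan roots and to any proxy roots."""
    def __init__(self, roots, leaf, proxy_roots=()):
        self.roots = list(roots)
        self.leaf = leaf
        self.proxy_roots = list(proxy_roots)

    def process(self, tup):
        for op in self.roots + self.proxy_roots:
            op.attach_input(tup)
        return self.leaf.get_next()


def build_foreach():
    """Wire up plan 1: project(1) -> distinct -> accumulate."""
    project = Project(1)
    distinct = Distinct(project)
    accumulate = Accumulate(distinct)
    return ForEach(roots=[project], leaf=accumulate, proxy_roots=[accumulate])
```

In this model, build_foreach().process((3, [(3,), (3,)])) returns 6. Without the proxy_roots entry, accumulate would never see the tuple and could not project 'group'. This assumes the bag holds single-field numeric tuples.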

> Rework physical plan
> --------------------
>
>                 Key: PIG-161
>                 URL: https://issues.apache.org/jira/browse/PIG-161
>             Project: Pig
>          Issue Type: Sub-task
>            Reporter: Alan Gates
>            Assignee: Alan Gates
>         Attachments: arithmeticOperators.patch, BinCondAndNegative.patch, CastAndMapLookUp.patch,
incr2.patch, incr3.patch, incr4.patch, incr5.patch, logToPhyTranslator.patch, missingOps.patch,
MRCompilerTests_PlansAndOutputs.txt, Phy_AbsClass.patch, physicalOps.patch, physicalOps.patch,
physicalOps.patch, physicalOps.patch, physicalOps_latest.patch, POCast.patch, POCast.patch,
podistinct.patch, pogenerate.patch, pogenerate.patch, pogenerate.patch, posort.patch, POUserFuncCorrection.patch,
TEST-org.apache.pig.test.TestLocalJobSubmission.txt, TEST-org.apache.pig.test.TestLogToPhyCompiler.txt,
TEST-org.apache.pig.test.TestLogToPhyCompiler.txt, TEST-org.apache.pig.test.TestMapReduce.txt,
TEST-org.apache.pig.test.TestTypeCheckingValidator.txt, TEST-org.apache.pig.test.TestUnion.txt,
translator.patch, translator.patch, translator.patch, translator.patch
>
>
> This bug tracks work to rework all of the physical operators as described in http://wiki.apache.org/pig/PigTypesFunctionalSpec

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

