hadoop-pig-dev mailing list archives

From "Shravan Matthur Narayanamurthy (JIRA)" <j...@apache.org>
Subject [jira] Issue Comment Edited: (PIG-161) Rework physical plan
Date Tue, 17 Jun 2008 11:19:44 GMT

    [ https://issues.apache.org/jira/browse/PIG-161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12605551#action_12605551 ]

shravanmn edited comment on PIG-161 at 6/17/08 4:19 AM:
-----------------------------------------------------------------------------

I have a different suggestion.

Top level plan:
load -> group -> foreach

The foreach will have a nested plan:
plan1: project(1) -> distinct -> accumulate
{noformat}
Add
|
|---- project(0)
|
|---- accumulate
        |        |
        |        |---SUM()
        |              |
        |              |--- project(1)
        |                    |
        |                    |--- project(*)
        |---- distinct
                |
                |---- project(1)
{noformat}
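A minimal Java sketch of how the nested plan above could evaluate for one output tuple of the group, assuming tuples are modeled as plain List<Object> and bags as List<List<Object>>. The class and method names (NestedPlanSketch, distinct, accumulateSum, evaluate) are hypothetical and are not Pig's physical operator API. Each method mirrors one node of the tree: project(0) and project(1) read the group tuple, distinct removes duplicate tuples, and accumulate collects its whole input before SUM(project(1)) runs over it.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical model of the nested plan, not Pig's physical operator API.
// A tuple is a List<Object>; a bag is a List<List<Object>>.
public class NestedPlanSketch {

    static List<Object> tuple(Object... fields) {
        return List.of(fields);  // a single array argument becomes the list's elements
    }

    // distinct: drop duplicate tuples, keeping first-seen order.
    static List<List<Object>> distinct(List<List<Object>> bag) {
        return new ArrayList<>(new LinkedHashSet<>(bag));
    }

    // accumulate: collect every tuple from its input into one bag, then run the
    // attached expression SUM(project(1)(project(*))) over that bag.
    static long accumulateSum(List<List<Object>> input) {
        List<List<Object>> accumulated = new ArrayList<>(input);  // project(*)
        long sum = 0;
        for (List<Object> t : accumulated) {
            sum += ((Number) t.get(1)).longValue();              // project(1)
        }
        return sum;                                               // SUM()
    }

    // Add(project(0), accumulate(distinct(project(1)))) for one output tuple of group.
    @SuppressWarnings("unchecked")
    static long evaluate(List<Object> groupTuple) {
        long group = ((Number) groupTuple.get(0)).longValue();            // project(0)
        List<List<Object>> bag = (List<List<Object>>) groupTuple.get(1);  // project(1)
        return group + accumulateSum(distinct(bag));                       // Add
    }

    public static void main(String[] args) {
        List<List<Object>> bag = List.of(tuple(1, 5), tuple(1, 5), tuple(1, 7));
        System.out.println(evaluate(tuple(1, bag)));  // prints 13: 1 + (5 + 7)
    }
}
```

The duplicate (1, 5) is removed by distinct before accumulate sums field 1, so only 5 and 7 contribute.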
But I think this still has some issues. Consider the following script:
{noformat}
A = load 'myfile';
B = group A by $0;
C = foreach B {
    C1 = distinct $1;
    C2 = filter $1 by $0>10;
    generate group + SUM(C1.$1), (myUDF1(C1,C2)*myUDF2(C1,C2))+(COUNT(C1)*group);
};
{noformat}
Here we definitely need Accumulate to handle multiple inputs, because myUDF1 and myUDF2 each take both C1 and C2.
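A rough Java sketch of what a multi-input accumulate might do for the script above, under the same assumptions as a plain-Java model (tuples as List<Object>, bags as List<List<Object>>). All names here are hypothetical, including BagUdf, and myUDF1/myUDF2 are caller-supplied stand-ins because the script does not define them. The point it illustrates: C1 and C2 are derived from the same group bag, and neither UDF can run until both inputs have been fully materialized.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.function.BiFunction;

// Hypothetical sketch, not Pig's real operator API: tuples are List<Object>,
// bags are List<List<Object>>, and myUDF1/myUDF2 are caller-supplied stand-ins.
public class MultiInputAccumulateSketch {

    // Stand-in type for a UDF that takes two bags, like myUDF1(C1, C2).
    interface BagUdf extends BiFunction<List<List<Object>>, List<List<Object>>, Long> {}

    static List<Object> tuple(Object... fields) {
        return List.of(fields);
    }

    // Multi-input accumulate: drain every input into its own bag. Only after all
    // inputs are materialized can an expression that reads several of them run.
    static List<List<List<Object>>> accumulate(List<Iterable<List<Object>>> inputs) {
        List<List<List<Object>>> bags = new ArrayList<>();
        for (Iterable<List<Object>> input : inputs) {
            List<List<Object>> bag = new ArrayList<>();
            for (List<Object> t : input) {
                bag.add(t);
            }
            bags.add(bag);
        }
        return bags;
    }

    // Evaluates both generate fields for one group:
    //   group + SUM(C1.$1), (myUDF1(C1,C2) * myUDF2(C1,C2)) + (COUNT(C1) * group)
    static long[] generate(long group, List<List<Object>> bag, BagUdf myUDF1, BagUdf myUDF2) {
        // C1 = distinct $1 (first-seen order kept)
        List<List<Object>> c1Input = new ArrayList<>(new LinkedHashSet<>(bag));
        // C2 = filter $1 by $0 > 10
        List<List<Object>> c2Input = new ArrayList<>();
        for (List<Object> t : bag) {
            if (((Number) t.get(0)).longValue() > 10) {
                c2Input.add(t);
            }
        }

        List<List<List<Object>>> bags = accumulate(List.of(c1Input, c2Input));
        List<List<Object>> c1 = bags.get(0);
        List<List<Object>> c2 = bags.get(1);

        long sumC1 = 0;
        for (List<Object> t : c1) {
            sumC1 += ((Number) t.get(1)).longValue();  // SUM(C1.$1)
        }
        long first = group + sumC1;
        long second = myUDF1.apply(c1, c2) * myUDF2.apply(c1, c2) + (long) c1.size() * group;
        return new long[] {first, second};
    }

    public static void main(String[] args) {
        List<List<Object>> bag = List.of(tuple(12, 3), tuple(12, 3), tuple(12, 4));
        BagUdf myUDF1 = (c1, c2) -> (long) (c1.size() + c2.size());  // hypothetical
        BagUdf myUDF2 = (c1, c2) -> (long) (c2.size() - c1.size());  // hypothetical
        long[] out = generate(12, bag, myUDF1, myUDF2);
        System.out.println(out[0] + ", " + out[1]);  // prints 19, 29
    }
}
```

With this input, C1 holds two tuples and C2 holds all three, so the first field is 12 + (3 + 4) = 19 and the second is (5 * 1) + (2 * 12) = 29. The single-input Accumulate in the plan above has no place to hold C2 alongside C1, which is the gap this example exposes.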



  
> Rework physical plan
> --------------------
>
>                 Key: PIG-161
>                 URL: https://issues.apache.org/jira/browse/PIG-161
>             Project: Pig
>          Issue Type: Sub-task
>            Reporter: Alan Gates
>            Assignee: Alan Gates
>         Attachments: arithmeticOperators.patch, BinCondAndNegative.patch, CastAndMapLookUp.patch,
incr2.patch, incr3.patch, incr4.patch, incr5.patch, logToPhyTranslator.patch, missingOps.patch,
MRCompilerTests_PlansAndOutputs.txt, Phy_AbsClass.patch, physicalOps.patch, physicalOps.patch,
physicalOps.patch, physicalOps.patch, physicalOps_latest.patch, POCast.patch, POCast.patch,
podistinct.patch, pogenerate.patch, pogenerate.patch, pogenerate.patch, posort.patch, POUserFuncCorrection.patch,
TEST-org.apache.pig.test.TestLocalJobSubmission.txt, TEST-org.apache.pig.test.TestLogToPhyCompiler.txt,
TEST-org.apache.pig.test.TestLogToPhyCompiler.txt, TEST-org.apache.pig.test.TestMapReduce.txt,
TEST-org.apache.pig.test.TestTypeCheckingValidator.txt, TEST-org.apache.pig.test.TestUnion.txt,
translator.patch, translator.patch, translator.patch, translator.patch
>
>
> This bug tracks work to rework all of the physical operators as described in http://wiki.apache.org/pig/PigTypesFunctionalSpec

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

