hadoop-pig-dev mailing list archives

From "Ashutosh Chauhan (JIRA)" <j...@apache.org>
Subject [jira] Updated: (PIG-834) incorrect plan when algebraic functions are nested
Date Fri, 05 Feb 2010 23:00:28 GMT

     [ https://issues.apache.org/jira/browse/PIG-834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashutosh Chauhan updated PIG-834:
---------------------------------

    Attachment: pig-834_2.patch

The correct approach is as follows. If the leaf of the ForEach's inner plan is not combinable,
we never use the combiner. If the leaf is combinable, the inner plan must not contain any other
combinable POUserFunc. The first check already exists in trunk. This patch adds the second
check: it makes sure the combiner is not used when the ForEach's inner plan contains any
combinable POUserFunc other than the leaf POUserFunc.
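
The two checks above can be sketched roughly as follows. This is a toy model in Python, not
the code in pig-834_2.patch: the Op class, its combinable flag, and the function names are
hypothetical stand-ins for Pig's physical operators, and "inputs" here simply means the
operators feeding the leaf.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Op:
    """Hypothetical stand-in for one operator in a ForEach inner plan."""
    name: str
    combinable: bool = False  # True models an algebraic (combinable) POUserFunc
    inputs: List["Op"] = field(default_factory=list)


def walk(op: Op) -> Iterator[Op]:
    """Yield op and every operator that feeds it, directly or indirectly."""
    yield op
    for child in op.inputs:
        yield from walk(child)


def can_use_combiner(leaf: Op) -> bool:
    # Check 1 (described as already present in trunk): leaf must be combinable.
    if not leaf.combinable:
        return False
    # Check 2 (what this patch is described as adding): no other combinable
    # user function anywhere else in the inner plan.
    return not any(op.combinable for child in leaf.inputs for op in walk(child))


# COUNT(Distinct($1.$2)): a combinable function nested under a combinable leaf.
nested = Op("COUNT", True, [Op("Distinct", True, [Op("Project")])])
# COUNT($1): only the leaf is combinable.
simple = Op("COUNT", True, [Op("Project")])

print(can_use_combiner(nested))  # combiner must not be used
print(can_use_combiner(simple))  # combiner may be used
```

In this model the nested case from the issue is rejected, so the whole ForEach would be
evaluated in the reduce stage, while a plain COUNT still qualifies for the combiner.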

> incorrect plan when algebraic functions are nested
> --------------------------------------------------
>
>                 Key: PIG-834
>                 URL: https://issues.apache.org/jira/browse/PIG-834
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>            Reporter: Thejas M Nair
>            Assignee: Ashutosh Chauhan
>             Fix For: 0.7.0
>
>         Attachments: pig-834.patch, pig-834_2.patch
>
>
> a = load 'students.txt' as (c1,c2,c3,c4); 
> c = group a by c2;  
> f = foreach c generate COUNT(org.apache.pig.builtin.Distinct($1.$2));
> Notice that the Distinct UDF is missing from the combine and reduce stages. As a result,
> Distinct has no effect and incorrect results are produced.
> Distinct should have been evaluated in all three stages (map, combine, reduce), and the
> output of Distinct should be given to COUNT in the reduce stage.
> {code}
> # Map Reduce Plan                                  
> #--------------------------------------------------
> MapReduce node 1-122
> Map Plan
> Local Rearrange[tuple]{bytearray}(false) - 1-139
> |   |
> |   Project[bytearray][1] - 1-140
> |
> |---New For Each(false,false)[bag] - 1-127
>     |   |
>     |   POUserFunc(org.apache.pig.builtin.COUNT$Initial)[tuple] - 1-125
>     |   |
>     |   |---POUserFunc(org.apache.pig.builtin.Distinct)[bag] - 1-126
>     |       |
>     |       |---Project[bag][2] - 1-123
>     |           |
>     |           |---Project[bag][1] - 1-124
>     |   |
>     |   Project[bytearray][0] - 1-133
>     |
>     |---Pre Combiner Local Rearrange[tuple]{Unknown} - 1-141
>         |
>         |---Load(hdfs://wilbur11.labs.corp.sp1.yahoo.com/user/tejas/students.txt:org.apache.pig.builtin.PigStorage) - 1-111--------
> Combine Plan
> Local Rearrange[tuple]{bytearray}(false) - 1-143
> |   |
> |   Project[bytearray][1] - 1-144
> |
> |---New For Each(false,false)[bag] - 1-132
>     |   |
>     |   POUserFunc(org.apache.pig.builtin.COUNT$Intermediate)[tuple] - 1-130
>     |   |
>     |   |---Project[bag][0] - 1-135
>     |   |
>     |   Project[bytearray][1] - 1-134
>     |
>     |---POCombinerPackage[tuple]{bytearray} - 1-137--------
> Reduce Plan
> Store(fakefile:org.apache.pig.builtin.PigStorage) - 1-121
> |
> |---New For Each(false)[bag] - 1-120
>     |   |
>     |   POUserFunc(org.apache.pig.builtin.COUNT$Final)[long] - 1-119
>     |   |
>     |   |---Project[bag][0] - 1-136
>     |
>     |---POCombinerPackage[tuple]{bytearray} - 1-145--------
> Global sort: false
> {code}
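
To illustrate why the plan above gives wrong results, here is a small Python simulation. The
rows are made up for illustration (they are not from students.txt), and the functions only
mimic the shape of the plan: in the buggy version Distinct sees one row at a time in the map
stage, so each row contributes 1 and the later stages just add the partial counts.

```python
from collections import defaultdict

# Made-up rows with columns (c1, c2, c3, c4); grouping is on c2, and $1.$2 is c3.
rows = [
    ("a", "g1", "x", 1),
    ("b", "g1", "x", 2),
    ("c", "g1", "y", 3),
    ("d", "g2", "z", 4),
]


def expected_counts(rows):
    """COUNT(Distinct(c3)) per c2, with Distinct applied to the whole group."""
    groups = defaultdict(set)
    for _, c2, c3, _ in rows:
        groups[c2].add(c3)
    return {key: len(values) for key, values in groups.items()}


def buggy_plan_counts(rows):
    """Mimic the plan above: Distinct runs only in the map stage, per row."""
    partial = defaultdict(list)
    for _, c2, c3, _ in rows:
        # Map: Distinct over a one-row bag, then the initial count of that bag.
        partial[c2].append(len({c3}))
    # Combine and reduce: partial counts are summed, with no Distinct applied.
    return {key: sum(counts) for key, counts in partial.items()}


print(expected_counts(rows))
print(buggy_plan_counts(rows))
```

With these rows the buggy version counts every row in group g1, duplicates included, instead
of the number of distinct c3 values.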

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

