hadoop-pig-dev mailing list archives

From "Olga Natkovich (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-1116) Remove redundant map-reduce job for merge join
Date Thu, 03 Dec 2009 01:55:20 GMT

    [ https://issues.apache.org/jira/browse/PIG-1116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12785134#action_12785134 ]

Olga Natkovich commented on PIG-1116:
-------------------------------------

+1

> Remove redundant map-reduce job for merge join
> ----------------------------------------------
>
>                 Key: PIG-1116
>                 URL: https://issues.apache.org/jira/browse/PIG-1116
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Daniel Dai
>            Assignee: Pradeep Kamath
>             Fix For: 0.6.0
>
>         Attachments: PIG-1116.patch
>
>
> In merge join, when we convert the right-hand side file into a side file, we do not remove it from the map-reduce plan; we only disconnect it from the plan. When we run the query, the redundant load still reads the data but does nothing with it. This operation should be removed entirely.

> Eg: 
> a = load '/user/pig/tests/data/zebra/singlefile/studentsortedtab10k' using org.apache.hadoop.zebra.pig.TableLoader('', 'sorted') as (name, age, gpa);
> b = load '/user/pig/tests/data/zebra/singlefile/votersortedtab10k' using org.apache.hadoop.zebra.pig.TableLoader('', 'sorted') as (name, age, registration, contributions);
> c = join a by name, b by name using "merge";
> explain c;
> {code}
> #--------------------------------------------------
> # Map Reduce Plan                                  
> #--------------------------------------------------
> MapReduce node 1-21
> Map Plan
> Load(hdfs://wilbur20.labs.corp.sp1.yahoo.com:9020/user/pig/tests/data/zebra/singlefile/votersortedtab10k:org.apache.hadoop.zebra.pig.TableLoader('','sorted')) - 1-13--------
> Global sort: false
> ----------------
> MapReduce node 1-20
> Map Plan
> Store(fakefile:org.apache.pig.builtin.PigStorage) - 1-19
> |
> |---MergeJoin[tuple] - 1-16
>     |
>     |---Load(hdfs://wilbur20.labs.corp.sp1.yahoo.com:9020/user/pig/tests/data/zebra/singlefile/studentsortedtab10k:org.apache.hadoop.zebra.pig.TableLoader('','sorted')) - 1-12--------
> Global sort: false
> ----------------
> {code}
> 1-21 should be removed.
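
To illustrate the difference between disconnecting and removing a node, here is a minimal Python sketch of a toy plan graph. The `Plan` class and the `build_plan` helper are hypothetical and are not Pig's plan classes. The sketch also assumes that node 1-21 originally fed node 1-20, and that every node still present in the plan is scheduled as a job.

```python
from dataclasses import dataclass, field


@dataclass
class Plan:
    """Toy operator graph (hypothetical; not Pig's real plan implementation)."""
    nodes: set = field(default_factory=set)
    edges: set = field(default_factory=set)  # (source, destination) pairs

    def add(self, node):
        self.nodes.add(node)

    def connect(self, src, dst):
        self.edges.add((src, dst))

    def disconnect(self, src, dst):
        # Drops only the edge; both nodes stay in the plan.
        self.edges.discard((src, dst))

    def remove(self, node):
        # Drops the node and every edge that touches it.
        self.nodes.discard(node)
        self.edges = {(s, d) for (s, d) in self.edges if node not in (s, d)}


def build_plan():
    # Assumption: before the side-file conversion, 1-21 fed 1-20.
    plan = Plan()
    plan.add("1-21")  # load of the right-hand side input
    plan.add("1-20")  # merge join over the left-hand side input
    plan.connect("1-21", "1-20")
    return plan


plan = build_plan()
plan.disconnect("1-21", "1-20")
print(sorted(plan.nodes))  # 1-21 is still in the plan, so it would still run

plan.remove("1-21")
print(sorted(plan.nodes))  # only 1-20 is left
```

With `disconnect` alone, both 1-20 and 1-21 remain in the plan, which mirrors the explain output above; after `remove`, only 1-20 is left.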

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

