pig-dev mailing list archives

From "Thejas M Nair (JIRA)" <j...@apache.org>
Subject [jira] Updated: (PIG-1672) order of relations in replicated join gets switched in a query where first relation has two mergeable foreach statements
Date Sat, 09 Oct 2010 00:00:31 GMT

     [ https://issues.apache.org/jira/browse/PIG-1672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thejas M Nair updated PIG-1672:
-------------------------------

    Status: Patch Available  (was: Open)

Unit tests and test-patch have passed with PIG-1672.2.patch.


> order of relations in replicated join gets switched in a query where first relation has two mergeable foreach statements
> ------------------------------------------------------------------------------------------------------------------------
>
>                 Key: PIG-1672
>                 URL: https://issues.apache.org/jira/browse/PIG-1672
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.8.0
>            Reporter: Thejas M Nair
>            Assignee: Thejas M Nair
>             Fix For: 0.8.0
>
>         Attachments: PIG-1672.1.patch, PIG-1672.2.patch
>
>
> The replicated join query was running out of memory because the order of relations got
> switched during logical plan optimization and it was attempting to load the larger (left)
> relation into memory.
> {code}
> cat replj.pig
> l1 = load 'x' as (a);
> l2 = load 'y' as (b);
> l3 = load 'z' as (a1,b1,c1,d1);
> f1 = foreach l3 generate a1 as a, b1 as b, c1 as c, d1 as d;
> f2 = foreach f1 generate a,b,c; 
> j1 = join f2 by a, l1 by a using 'replicated';
> j2 = join j1 by b, l2 by b using 'replicated';
> explain j2;
> Note that in the MR plan printed below, the Load in the MR job with join operations has
> 'x' as the input instead of 'z'.
> #--------------------------------------------------
> # Map Reduce Plan                                  
> #--------------------------------------------------
> MapReduce node scope-30
> Map Plan
> Store(file:/tmp/temp101387354/tmp-125684214:org.apache.pig.impl.io.InterStorage) - scope-31
> |
> |---l2: Load(file:///Users/tejas/pig-0.8/branch-0.8/y:org.apache.pig.builtin.PigStorage) - scope-17--------
> Global sort: false
> ----------------
> MapReduce node scope-27
> Map Plan
> j2: Store(fakefile:org.apache.pig.builtin.PigStorage) - scope-26
> |
> |---j2: FRJoin[tuple] - scope-20
>     |   |
>     |   Project[bytearray][1] - scope-18
>     |   |
>     |   Project[bytearray][0] - scope-19
>     |
>     |---j1: FRJoin[tuple] - scope-11
>         |   |
>         |   Project[bytearray][0] - scope-9
>         |   |
>         |   Project[bytearray][0] - scope-10
>         |
>         |---l1: Load(file:///Users/tejas/pig-0.8/branch-0.8/x:org.apache.pig.builtin.PigStorage) - scope-0--------
> Global sort: false
> ----------------
> MapReduce node scope-28
> Map Plan
> Store(file:/tmp/temp101387354/tmp-890864787:org.apache.pig.impl.io.InterStorage) - scope-29
> |
> |---f2: New For Each(false,false,false)[bag] - scope-8
>     |   |
>     |   Project[bytearray][0] - scope-2
>     |   |
>     |   Project[bytearray][1] - scope-4
>     |   |
>     |   Project[bytearray][2] - scope-6
>     |
>     |---l3: Load(file:///Users/tejas/pig-0.8/branch-0.8/z:org.apache.pig.builtin.PigStorage) - scope-1--------
> Global sort: false
> ----------------
> {code}
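
For readers unfamiliar with why the relation order matters here: in a replicated join the first relation is streamed through the map task, while the remaining relations are loaded into memory. The Python sketch below is only an illustration of that idea, not Pig's implementation. The function name, the sample rows, and the row counts are all hypothetical.

```python
from collections import defaultdict


def replicated_join(streamed, replicated, streamed_key, replicated_key):
    """Toy map-side hash join (illustrative only, not Pig's code).

    `replicated` is fully materialized in an in-memory hash table;
    `streamed` is read one row at a time and probed against that table.
    Returns the joined rows and the number of rows held in memory.
    """
    table = defaultdict(list)
    rows_in_memory = 0
    for row in replicated:
        table[row[replicated_key]].append(row)
        rows_in_memory += 1

    joined = []
    for row in streamed:
        for match in table.get(row[streamed_key], []):
            joined.append(row + match)
    return joined, rows_in_memory


# Hypothetical stand-ins for the relations in the report:
# f2 plays the large left relation (from 'z'), l1 the small one (from 'x').
f2 = [(1, 10, 100), (2, 20, 200), (3, 30, 300)]
l1 = [(1,), (3,)]

# Intended order: stream f2, hold l1 in memory.
joined, held = replicated_join(f2, l1, 0, 0)

# Switched order (the behaviour described in this issue): f2 is held in memory.
swapped, held_swapped = replicated_join(l1, f2, 0, 0)
```

Both orders match the same pairs of rows, but the number of rows held in memory follows whichever relation ends up on the replicated side, which is consistent with the out-of-memory failure described above once the larger relation is the one loaded.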

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

