hadoop-pig-dev mailing list archives

From "Ashutosh Chauhan (JIRA)" <j...@apache.org>
Subject [jira] Created: (PIG-1590) Use POMergeJoin for Left Outer Join when join using 'merge'
Date Wed, 01 Sep 2010 20:34:52 GMT
Use POMergeJoin for Left Outer Join when join using 'merge'
-----------------------------------------------------------

                 Key: PIG-1590
                 URL: https://issues.apache.org/jira/browse/PIG-1590
             Project: Pig
          Issue Type: Improvement
          Components: impl
    Affects Versions: 0.8.0
            Reporter: Ashutosh Chauhan
            Priority: Minor


C = join A by $0 left, B by $0 using 'merge';

will result in a map-side sort-merge join. Internally, it is translated to POMergeCogroup
+ ForEachFlatten. POMergeCogroup places quite a few restrictions on its loaders (A and B in
this case), which is cumbersome. Currently, only Zebra is known to satisfy all of those requirements.
It would be better to use POMergeJoin in this case, since it places far fewer requirements on
its loaders. Importantly, it works with PigStorage. In addition, POMergeJoin should be faster than
POMergeCogroup + ForEachFlatten.
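
To make the proposal concrete, here is a minimal sketch of a left outer sort-merge join over two inputs that are already sorted on the join key. It is written in Python purely for illustration and is not Pig's POMergeJoin implementation. The function name `left_outer_merge_join`, the `right_width` parameter, and the sample tuples are hypothetical, and the sketch assumes non-null join keys in the first field of each tuple.

```python
def left_outer_merge_join(left, right, right_width):
    """Left outer join of two iterables of tuples, both sorted on field 0.

    Illustrative sketch only: assumes non-null keys and ascending order.
    right_width is the number of fields padded with None when a left
    row has no match on the right side.
    """
    right_iter = iter(right)
    r = next(right_iter, None)   # current unread right row
    matches = []                 # right rows buffered for the current key
    matches_key = object()       # sentinel that equals no real key

    for l in left:
        k = l[0]
        if k != matches_key:
            # Skip right rows whose key is smaller than the left key.
            while r is not None and r[0] < k:
                r = next(right_iter, None)
            # Buffer every right row that shares the left key.
            matches, matches_key = [], k
            while r is not None and r[0] == k:
                matches.append(r)
                r = next(right_iter, None)
        if matches:
            for m in matches:
                yield l + m
        else:
            # Left outer semantics: keep the left row, pad the right side.
            yield l + (None,) * right_width


if __name__ == "__main__":
    A = [(1, "a"), (2, "b"), (2, "c"), (4, "d")]
    B = [(0, "z"), (2, "x"), (2, "y"), (3, "w")]
    for row in left_outer_merge_join(A, B, right_width=2):
        print(row)
```

Each input is read once, front to back, and only the right-side rows for the current key are buffered, which is why a join of this shape can run on the map side when both inputs are pre-sorted.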

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

