pig-dev mailing list archives

From "Ashutosh Chauhan (JIRA)" <j...@apache.org>
Subject [jira] Updated: (PIG-1353) Map-side joins
Date Fri, 16 Apr 2010 22:05:25 GMT

     [ https://issues.apache.org/jira/browse/PIG-1353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashutosh Chauhan updated PIG-1353:

          Status: Resolved  (was: Patch Available)
    Release Note: 
With this patch, it is now possible to perform [left|right|full] outer joins on two tables,
as well as inner joins on more than two tables, on the map side in Pig, provided the data is sorted
and one of the loaders implements the {{CollectableLoader}} interface. The primary algorithm is based on sort-merge join.
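
To make the sort-merge idea concrete, here is a minimal Java sketch of a two-way merge join over two in-memory inputs that are already sorted ascending with nulls first. It supports inner, left, right, and full outer joins. It is not Pig's implementation: the class MergeJoinSketch, its Row type, and the rule that null keys never match anything (borrowed from SQL semantics) are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Objects;

// Hypothetical illustration of a sort-merge join; not Pig's actual code.
final class MergeJoinSketch {

    enum JoinType { INNER, LEFT, RIGHT, FULL }

    /** One input record: a nullable join key plus an opaque payload. */
    static final class Row {
        final Integer key;
        final String value;

        Row(Integer key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    /** Ascending order with nulls first, the order the inputs are required to have. */
    static final Comparator<Integer> KEY_ORDER =
            Comparator.nullsFirst(Comparator.<Integer>naturalOrder());

    /**
     * Merges two inputs already sorted by KEY_ORDER. Each output pair is
     * {leftValue, rightValue}; the side without a match is null.
     * Assumption: null keys never match anything (SQL-style semantics).
     */
    static List<String[]> join(List<Row> left, List<Row> right, JoinType type) {
        boolean keepLeft = type == JoinType.LEFT || type == JoinType.FULL;
        boolean keepRight = type == JoinType.RIGHT || type == JoinType.FULL;
        List<String[]> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < left.size() && j < right.size()) {
            Integer lk = left.get(i).key;
            Integer rk = right.get(j).key;
            if (lk == null) {                    // null key on the left never matches
                if (keepLeft) out.add(new String[] {left.get(i).value, null});
                i++;
            } else if (rk == null) {             // null key on the right never matches
                if (keepRight) out.add(new String[] {null, right.get(j).value});
                j++;
            } else {
                int cmp = KEY_ORDER.compare(lk, rk);
                if (cmp < 0) {                   // left key has no partner on the right
                    if (keepLeft) out.add(new String[] {left.get(i).value, null});
                    i++;
                } else if (cmp > 0) {            // right key has no partner on the left
                    if (keepRight) out.add(new String[] {null, right.get(j).value});
                    j++;
                } else {
                    // Equal keys: find each side's run of duplicates, emit the cross product.
                    int iEnd = i, jEnd = j;
                    while (iEnd < left.size() && Objects.equals(left.get(iEnd).key, lk)) iEnd++;
                    while (jEnd < right.size() && Objects.equals(right.get(jEnd).key, rk)) jEnd++;
                    for (int a = i; a < iEnd; a++) {
                        for (int b = j; b < jEnd; b++) {
                            out.add(new String[] {left.get(a).value, right.get(b).value});
                        }
                    }
                    i = iEnd;
                    j = jEnd;
                }
            }
        }
        // Whatever remains on one side has no partner on the other.
        if (keepLeft) {
            for (; i < left.size(); i++) out.add(new String[] {left.get(i).value, null});
        }
        if (keepRight) {
            for (; j < right.size(); j++) out.add(new String[] {null, right.get(j).value});
        }
        return out;
    }
}
```

Both inputs are consumed in a single forward pass, which is why the sort order matters: if either input were out of order, matching keys could be skipped.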

Additional implementation details:
1) No other operations can be done between load and join statements.
2) Data must be sorted on the join key in ascending (ASC) order.
3) Nulls are considered smaller than everything else. So, if the data contains null keys, they
must occur before all other keys.
4) The left-most loader must implement the CollectableLoader interface as well as OrderedLoadFunc.
5) All other loaders must implement IndexableLoadFunc.
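
Conditions 2 and 3 define a single key ordering (ascending, nulls first). As we understand merge join, the IndexableLoadFunc requirement in condition 5 is what lets a non-left-most input be positioned near a given key instead of being scanned from the start. The Java sketch below illustrates both ideas over an in-memory list of keys. SortedInputSketch, isSortedNullsFirst, and lowerBound are hypothetical names; this is not Pig's IndexableLoadFunc API, and its real seek semantics may differ.

```java
import java.util.Comparator;
import java.util.List;

// Hypothetical helpers illustrating conditions 2, 3 and 5; not part of Pig.
final class SortedInputSketch {

    /** Conditions 2 and 3: ascending key order, with null keys before everything else. */
    static final Comparator<Integer> KEY_ORDER =
            Comparator.nullsFirst(Comparator.<Integer>naturalOrder());

    /** Returns true if no key is followed by a smaller one under KEY_ORDER. */
    static boolean isSortedNullsFirst(List<Integer> keys) {
        for (int i = 1; i < keys.size(); i++) {
            if (KEY_ORDER.compare(keys.get(i - 1), keys.get(i)) > 0) return false;
        }
        return true;
    }

    /**
     * Binary-searches a sorted key list for the first position whose key is
     * >= target (a lower bound), or keys.size() if every key is smaller.
     * Only meaningful when isSortedNullsFirst(keys) holds.
     */
    static int lowerBound(List<Integer> keys, Integer target) {
        int lo = 0, hi = keys.size();
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;           // unsigned shift avoids int overflow
            if (KEY_ORDER.compare(keys.get(mid), target) < 0) lo = mid + 1;
            else hi = mid;
        }
        return lo;
    }
}
```

The binary search only works because of the global ordering in conditions 2 and 3; on unsorted input it can return an arbitrary position.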

Note that the Zebra loader satisfies all of these conditions, so it can be used out of the box.
Similar conditions apply to map-side cogroups (PIG-1309) as well.
      Resolution: Fixed

Patch checked-in.

> Map-side joins
> --------------
>                 Key: PIG-1353
>                 URL: https://issues.apache.org/jira/browse/PIG-1353
>             Project: Pig
>          Issue Type: Improvement
>          Components: impl
>            Reporter: Ashutosh Chauhan
>            Assignee: Ashutosh Chauhan
>             Fix For: 0.8.0
>         Attachments: pig-1353.patch, pig-1353.patch
> Pig already has a couple of map-side join implementations: Merge Join and Fragmented-Replicate
Join. But both of them are pretty restrictive. Merge Join can only join two tables, and even then
it can only do an inner join. FR Join can join multiple relations, but it likewise supports only inner
and left outer joins. Further, it requires the side relations to be small enough to fit in memory.
It would be nice if we could do map-side joins on multiple tables, as well as inner, left outer,
right outer, and full outer joins.
> A lot of groundwork for this has already been done in PIG-1309. The remaining work will be tracked
in this JIRA.
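
The description asks for map-side joins over more than two tables. For the inner case, one common way to extend a sort-merge join to k sorted inputs is to advance every cursor to the largest current key until all cursors agree, then emit the cross product of the matching runs. The Java sketch below shows that idea. MultiWayMergeJoinSketch and its Row type are hypothetical; keys are assumed non-null (null keys cannot match in an inner join, so they are assumed to have been filtered out); and this is not necessarily how PIG-1353 implements it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical k-way inner merge join over sorted inputs; not Pig's actual code.
final class MultiWayMergeJoinSketch {

    /** One input record; keys are assumed non-null (nulls cannot match in an inner join). */
    static final class Row {
        final int key;
        final String value;

        Row(int key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    /** Inner-joins k inputs, each sorted ascending by key; each result holds one value per input. */
    static List<List<String>> innerJoin(List<List<Row>> inputs) {
        List<List<String>> out = new ArrayList<>();
        int k = inputs.size();
        if (k == 0) return out;
        int[] pos = new int[k];                          // one cursor per input
        scan:
        while (true) {
            // The largest current key is a lower bound for any possible match.
            int max = Integer.MIN_VALUE;
            for (int r = 0; r < k; r++) {
                if (pos[r] >= inputs.get(r).size()) break scan;   // an input is exhausted
                max = Math.max(max, inputs.get(r).get(pos[r]).key);
            }
            // Advance every lagging cursor to the first key >= max.
            boolean aligned = true;
            for (int r = 0; r < k; r++) {
                List<Row> in = inputs.get(r);
                while (pos[r] < in.size() && in.get(pos[r]).key < max) pos[r]++;
                if (pos[r] >= in.size()) break scan;
                if (in.get(pos[r]).key != max) aligned = false;
            }
            if (!aligned) continue;                      // some input overshot; retry with a new max
            // All cursors sit on `max`: collect each input's run of that key.
            List<List<String>> runs = new ArrayList<>();
            for (int r = 0; r < k; r++) {
                List<Row> in = inputs.get(r);
                List<String> run = new ArrayList<>();
                while (pos[r] < in.size() && in.get(pos[r]).key == max) {
                    run.add(in.get(pos[r]++).value);
                }
                runs.add(run);
            }
            emitCrossProduct(runs, 0, new ArrayList<>(), out);
        }
        return out;
    }

    /** Appends every combination that picks one value from each run, in input order. */
    private static void emitCrossProduct(List<List<String>> runs, int depth,
                                         List<String> prefix, List<List<String>> out) {
        if (depth == runs.size()) {
            out.add(new ArrayList<>(prefix));
            return;
        }
        for (String v : runs.get(depth)) {
            prefix.add(v);
            emitCrossProduct(runs, depth + 1, prefix, out);
            prefix.remove(prefix.size() - 1);
        }
    }
}
```

Every iteration either advances at least one cursor or consumes a full run of matching keys, so the loop terminates after a single pass over each input.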

This message is automatically generated by JIRA.

