hadoop-pig-dev mailing list archives

From "Olga Natkovich (JIRA)" <j...@apache.org>
Subject [jira] Updated: (PIG-1309) Map-side Cogroup
Date Fri, 20 Aug 2010 18:57:17 GMT

     [ https://issues.apache.org/jira/browse/PIG-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olga Natkovich updated PIG-1309:

    Release Note: 
With this patch, it is now possible to perform a map-side cogroup if the data is sorted and one
of the loaders implements the {{CollectableLoader}} interface. The primary algorithm is based on sort-merge

Additional implementation details: 
1) No other operations can be done between load and join statements. 
2) Data must be sorted in ASC order. 
3) Nulls are considered smaller than everything else. So, if the data contains null keys, they should
occur before anything else. 
4) Left-most loader must implement CollectableLoader interface as well as OrderedLoadFunc.

5) All other loaders must implement IndexableLoadFunc. 
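
Conditions 2 and 3 together describe one ordering: ascending keys, with nulls ahead of every other key. The following is a minimal Python sketch of a check for that ordering. It is not part of Pig or the patch; the function names are hypothetical, and keys are assumed to be integers with None standing in for null.

```python
from typing import Iterable, Optional, Tuple


def sort_key(key: Optional[int]) -> Tuple[bool, int]:
    # Hypothetical helper: maps a key to a tuple that orders None (null)
    # before every non-null key, and non-null keys in ascending order.
    return (key is not None, key if key is not None else 0)


def is_sorted_nulls_first(keys: Iterable[Optional[int]]) -> bool:
    # Returns True when keys are in ASC order with all nulls at the front.
    previous = None
    for key in keys:
        current = sort_key(key)
        if previous is not None and current < previous:
            return False
        previous = current
    return True
```

Under these assumptions, a key column such as null, null, 1, 2, 2, 5 passes the check, while 1, null, 2 does not.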

Note that the Zebra loader satisfies all of these conditions, so it can be used out of the box. 
Similar conditions apply to map-side cogroups (PIG-1309) as well. 

A = load 'data1' using org.apache.hadoop.zebra.pig.TableLoader('', 'sorted'); 
B = load 'data2' using org.apache.hadoop.zebra.pig.TableLoader('', 'sorted'); 
C = COGROUP A by id, B by id using 'merge'; 
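
To illustrate why sorted input lets a cogroup run without a shuffle, here is a rough Python sketch of a sort-merge cogroup over two inputs. It is not the code from the patch: the names merge_cogroup, _runs and _order are hypothetical, rows are assumed to be (key, payload) tuples with integer keys, and a null key (None) is treated like any other key value, which may differ from how Pig itself handles null keys in COGROUP.

```python
from itertools import groupby
from typing import Iterable, Iterator, List, Optional, Tuple

Row = Tuple[Optional[int], str]  # (key, payload); None stands in for null


def _order(key: Optional[int]) -> Tuple[bool, int]:
    # Nulls sort before everything else; other keys sort ascending.
    return (key is not None, key if key is not None else 0)


def _runs(rows: Iterable[Row]) -> Iterator[Tuple[Optional[int], List[Row]]]:
    # Collect each run of consecutive rows that share a key.
    for key, group in groupby(rows, key=lambda row: row[0]):
        yield key, list(group)


def merge_cogroup(
    left: Iterable[Row], right: Iterable[Row]
) -> Iterator[Tuple[Optional[int], List[Row], List[Row]]]:
    # Walk both sorted inputs in step and emit (key, left_bag, right_bag).
    left_runs, right_runs = _runs(left), _runs(right)
    l = next(left_runs, None)
    r = next(right_runs, None)
    while l is not None or r is not None:
        if r is None or (l is not None and _order(l[0]) < _order(r[0])):
            # Key is present only on the left side.
            yield l[0], l[1], []
            l = next(left_runs, None)
        elif l is None or _order(r[0]) < _order(l[0]):
            # Key is present only on the right side.
            yield r[0], [], r[1]
            r = next(right_runs, None)
        else:
            # Same key on both sides: emit both bags together.
            yield l[0], l[1], r[1]
            l = next(left_runs, None)
            r = next(right_runs, None)
```

Each input is read once, front to back, and only the rows for the current key are held in memory at any time. This single-pass property is what the sorted-input requirement buys.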

> Map-side Cogroup
> ----------------
>                 Key: PIG-1309
>                 URL: https://issues.apache.org/jira/browse/PIG-1309
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>            Reporter: Ashutosh Chauhan
>            Assignee: Ashutosh Chauhan
>             Fix For: 0.7.0, 0.8.0
>         Attachments: mapsideCogrp.patch, pig-1309_1.patch, pig-1309_2.patch, PIG_1309_7.patch
> In the never-ending quest to make Pig go faster, we want to parallelize as many relational
operations as possible. It's already possible to do Group-by (PIG-984) and Joins (PIG-845,
PIG-554) purely map-side in Pig. This jira is to add a map-side implementation of Cogroup
in Pig. Details to follow.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
