hadoop-pig-dev mailing list archives

From "Olga Natkovich (JIRA)" <j...@apache.org>
Subject [jira] Updated: (PIG-282) Custom Partitioner
Date Sat, 21 Aug 2010 00:12:18 GMT

     [ https://issues.apache.org/jira/browse/PIG-282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olga Natkovich updated PIG-282:
-------------------------------

    Release Note: 
This feature lets you specify a Hadoop Partitioner for the following operations: GROUP/COGROUP,
CROSS, DISTINCT, and JOIN (except 'skewed' join). The Partitioner controls how the keys of the
intermediate map outputs are partitioned. See http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/Partitioner.html
for more details.

To use this feature, add a PARTITION BY clause to the appropriate operator:
A = load 'input_data';
B = group A by $0 PARTITION BY org.apache.pig.test.utils.SimpleCustomPartitioner parallel 2;
.....
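Here is the code for SimpleCustomPartitioner:

package org.apache.pig.test.utils;

import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapreduce.Partitioner;
import org.apache.pig.impl.io.PigNullableWritable;

public class SimpleCustomPartitioner extends Partitioner<PigNullableWritable, Writable> {
    @Override
    public int getPartition(PigNullableWritable key, Writable value, int numPartitions) {
        // Integer keys are partitioned by their value, all other keys by their hash code.
        Object pigValue = key.getValueAsPigType();
        int hash = (pigValue instanceof Integer) ? ((Integer) pigValue).intValue() : key.hashCode();
        // Clear the sign bit so a negative key or hash code cannot yield a negative partition.
        return (hash & Integer.MAX_VALUE) % numPartitions;
    }
}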
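
A partitioner has to return a value between 0 and numPartitions - 1 for every key. The sketch below is a Hadoop-free illustration of that arithmetic only, not part of the patch: the class PartitionArithmeticSketch and both method names are hypothetical. It shows that Java's % operator returns a negative result for a negative key, and that clearing the sign bit first (the approach Hadoop's own HashPartitioner takes) keeps the result in range.

```java
public class PartitionArithmeticSketch {

    // Plain remainder: in Java the result takes the sign of the key,
    // so a negative key gives a negative (invalid) partition number.
    static int naivePartition(int key, int numPartitions) {
        return key % numPartitions;
    }

    // Clearing the sign bit first makes the value non-negative,
    // so the remainder always falls in 0 .. numPartitions - 1.
    static int maskedPartition(int key, int numPartitions) {
        return (key & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        int numPartitions = 2; // matches "parallel 2" in the Pig example
        int[] keys = {4, 7, -3};
        for (int key : keys) {
            System.out.println(key
                    + " -> naive " + naivePartition(key, numPartitions)
                    + ", masked " + maskedPartition(key, numPartitions));
        }
    }
}
```

For the key -3 with two partitions, the plain remainder is -1, while the masked version returns 1. Non-negative keys map to the same partition either way.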

> Custom Partitioner
> ------------------
>
>                 Key: PIG-282
>                 URL: https://issues.apache.org/jira/browse/PIG-282
>             Project: Pig
>          Issue Type: New Feature
>    Affects Versions: 0.7.0
>            Reporter: Amir Youssefi
>            Assignee: Aniket Mokashi
>            Priority: Minor
>             Fix For: 0.8.0
>
>         Attachments: CustomPartitioner.patch, CustomPartitionerFinale.patch, CustomPartitionerTest.patch
>
>
> By adding custom partitioner we can give control over which output partition a key (/value)
> goes to. We can add keywords to language e.g.
> PARTITION BY UDF(...)
> or a similar syntax. UDF returns a number between 0 and n-1 where n is number of output
> partitions.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

