hadoop-user mailing list archives

From java8964 java8964 <java8...@hotmail.com>
Subject RE: secondary sort - number of reducers
Date Fri, 30 Aug 2013 01:42:28 GMT
The getPartition() method needs to return a non-negative number. Simply using the hashCode()
method is not enough, because hashCode() can be negative.
See the Hadoop HashPartitioner implementation:

return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
When I first read this code, I wondered why it does not use Math.abs. Is ( & Integer.MAX_VALUE)
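
As a rough illustration of the difference between the three ways of turning a hash into a partition index, here is a small standalone sketch. The class and method names (PartitionIndexDemo, rawModulo, absModulo, maskedModulo) are made up for this example and are not part of Hadoop. It relies on two facts about Java: the % operator keeps the sign of the dividend, and Math.abs(Integer.MIN_VALUE) is still Integer.MIN_VALUE.

```java
// Hypothetical demo class, not part of Hadoop.
final class PartitionIndexDemo {

    // Plain modulo: the result is negative whenever the hash is negative.
    static int rawModulo(int hash, int numParts) {
        return hash % numParts;
    }

    // Math.abs fixes most cases, but Math.abs(Integer.MIN_VALUE) overflows
    // and stays negative, so this can still return a negative index.
    static int absModulo(int hash, int numParts) {
        return Math.abs(hash) % numParts;
    }

    // Clearing the sign bit always yields a value in [0, numParts).
    static int maskedModulo(int hash, int numParts) {
        return (hash & Integer.MAX_VALUE) % numParts;
    }

    public static void main(String[] args) {
        int numParts = 3;
        int[] hashes = {17, -7, Integer.MIN_VALUE};
        for (int h : hashes) {
            System.out.printf("hash=%d raw=%d abs=%d masked=%d%n",
                    h,
                    rawModulo(h, numParts),
                    absModulo(h, numParts),
                    maskedModulo(h, numParts));
        }
    }
}
```

Note that the masked version does not map a negative hash to the same index as Math.abs would in general; it only guarantees that the index is in the valid range.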
Date: Thu, 29 Aug 2013 20:55:46 -0400
Subject: Re: secondary sort - number of reducers
From: adeelmahmood@gmail.com
To: user@hadoop.apache.org

Okay, so when I specify the number of reducers (in my example I'm using 4, for a much smaller
data set), it works if I use a single column in my composite key. But if I add multiple columns
to the composite key, separated by a delimiter, it throws the illegal partition error. (Keys
before the pipe are group keys, keys after the pipe are sort keys, and my partitioner only
uses the group keys.)

java.io.IOException: Illegal partition for Atlanta:GA|Atlanta:GA:1:Adeel (-1)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1073)
        at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:691)
        at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
        at com.att.hadoop.hivesort.HSMapper.map(HSMapper.java:39)
        at com.att.hadoop.hivesort.HSMapper.map(HSMapper.java:1)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1136)
        at org.apache.hadoop.mapred.Child.main(Child.java:249)

	public int getPartition(Text key, HCatRecord record, int numParts) {
		// extract the group key from composite key
		String groupKey = key.toString().split("\\|")[0];
		return groupKey.hashCode() % numParts;
	}
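
One possible way to keep the partition index for a composite key in range is sketched below in plain Java, without the Hadoop types, so it can be run on its own. The class GroupKeyPartition and its methods groupKey and partitionFor are hypothetical helpers for illustration; in a real job the same expression would go inside getPartition(). It assumes the group key is everything before the first pipe character, as described in the message above.

```java
// Hypothetical standalone helper, not part of Hadoop or the original job.
final class GroupKeyPartition {

    // Returns the part of the composite key before the first '|'.
    // If there is no '|', the whole key is treated as the group key.
    static String groupKey(String compositeKey) {
        int pipe = compositeKey.indexOf('|');
        return pipe < 0 ? compositeKey : compositeKey.substring(0, pipe);
    }

    // Masks off the sign bit before the modulo, so the result is always
    // in the range [0, numParts), even when hashCode() is negative.
    static int partitionFor(String compositeKey, int numParts) {
        return (groupKey(compositeKey).hashCode() & Integer.MAX_VALUE) % numParts;
    }

    public static void main(String[] args) {
        int numParts = 4;
        // Two records with the same group key but different sort keys
        // (the second record is invented for this example).
        String a = "Atlanta:GA|Atlanta:GA:1:Adeel";
        String b = "Atlanta:GA|Atlanta:GA:2:Other";
        System.out.println(a + " -> " + partitionFor(a, numParts));
        System.out.println(b + " -> " + partitionFor(b, numParts));
    }
}
```

Because only the group key feeds the hash, records that share a group key land in the same partition regardless of their sort keys, which is what secondary sort needs.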


On Thu, Aug 29, 2013 at 8:31 PM, Shekhar Sharma <shekhar2581@gmail.com> wrote:

No. The partitioner decides which keys go to which reducer, and
you need to decide the number of reducers. The number of reducers depends on
factors like the number of key-value pairs, the use case, etc.


Som Shekhar Sharma


On Fri, Aug 30, 2013 at 5:54 AM, Adeel Qureshi <adeelmahmood@gmail.com> wrote:

> So it can't figure out an appropriate number of reducers as it does for
> mappers? In my case Hadoop is using 2100+ mappers and then only 1 reducer.
> Since I'm overriding the partitioner class, shouldn't that decide how
> many reducers there should be, based on how many different partition values
> are returned by the custom partitioner?



> On Thu, Aug 29, 2013 at 7:38 PM, Ian Wrigley <ian@cloudera.com> wrote:


>> If you don't specify the number of Reducers, Hadoop will use the default
>> -- which, unless you've changed it, is 1.


>> Regards


>> Ian.


>> On Aug 29, 2013, at 4:23 PM, Adeel Qureshi <adeelmahmood@gmail.com> wrote:


>> I have implemented secondary sort in my MR job, and for some reason, if I
>> don't specify the number of reducers, it uses 1. That doesn't seem right,
>> because I'm working with 800M+ records and one reducer slows things down
>> significantly. Is this some kind of limitation of secondary sort, that
>> it has to use a single reducer? That would defeat the purpose of
>> having a scalable solution such as secondary sort. I would appreciate any
>> help.


>> Thanks

>> Adeel




>> ---

>> Ian Wrigley

>> Sr. Curriculum Manager

>> Cloudera, Inc

>> Cell: (323) 819 4075


