hadoop-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Adeel Qureshi <adeelmahm...@gmail.com>
Subject Re: secondary sort - number of reducers
Date Fri, 30 Aug 2013 13:24:35 GMT
yup it was negative and by doing this now it seems to be working fine


On Fri, Aug 30, 2013 at 3:09 AM, Shekhar Sharma <shekhar2581@gmail.com>wrote:

> Is the hash code of that key  is negative.?
> Do something like this
>
> return groupKey.hashCode() & Integer.MAX_VALUE % numParts;
>
> Regards,
> Som Shekhar Sharma
> +91-8197243810
>
>
> On Fri, Aug 30, 2013 at 6:25 AM, Adeel Qureshi <adeelmahmood@gmail.com>
> wrote:
> > okay so when i specify the number of reducers e.g. in my example i m
> using 4
> > (for a much smaller data set) it works if I use a single column in my
> > composite key .. but if I add multiple columns in the composite key
> > separated by a delimi .. it then throws the illegal partition error (keys
> > before the pipe are group keys and after the pipe are the sort keys and
> my
> > partioner only uses the group keys
> >
> > java.io.IOException: Illegal partition for Atlanta:GA|Atlanta:GA:1:Adeel
> > (-1)
> >         at
> >
> org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1073)
> >         at
> >
> org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:691)
> >         at
> >
> org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
> >         at com.att.hadoop.hivesort.HSMapper.map(HSMapper.java:39)
> >         at com.att.hadoop.hivesort.HSMapper.map(HSMapper.java:1)
> >         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
> >         at
> org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
> >         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
> >         at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
> >         at java.security.AccessController.doPrivileged(Native Method)
> >         at javax.security.auth.Subject.doAs(Subject.java:396)
> >         at
> >
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1136)
> >         at org.apache.hadoop.mapred.Child.main(Child.java:249)
> >
> >
> > public int getPartition(Text key, HCatRecord record, int numParts) {
> > //extract the group key from composite key
> > String groupKey = key.toString().split("\\|")[0];
> > return groupKey.hashCode() % numParts;
> > }
> >
> >
> > On Thu, Aug 29, 2013 at 8:31 PM, Shekhar Sharma <shekhar2581@gmail.com>
> > wrote:
> >>
> >> No...partitionr decides which keys should go to which reducer...and
> >> number of reducers you need to decide...No of reducers depends on
> >> factors like number of key value pair, use case etc
> >> Regards,
> >> Som Shekhar Sharma
> >> +91-8197243810
> >>
> >>
> >> On Fri, Aug 30, 2013 at 5:54 AM, Adeel Qureshi <adeelmahmood@gmail.com>
> >> wrote:
> >> > so it cant figure out an appropriate number of reducers as it does for
> >> > mappers .. in my case hadoop is using 2100+ mappers and then only 1
> >> > reducer
> >> > .. since im overriding the partitioner class shouldnt that decide how
> >> > manyredeucers there should be based on how many different partition
> >> > values
> >> > being returned by the custom partiotioner
> >> >
> >> >
> >> > On Thu, Aug 29, 2013 at 7:38 PM, Ian Wrigley <ian@cloudera.com>
> wrote:
> >> >>
> >> >> If you don't specify the number of Reducers, Hadoop will use the
> >> >> default
> >> >> -- which, unless you've changed it, is 1.
> >> >>
> >> >> Regards
> >> >>
> >> >> Ian.
> >> >>
> >> >> On Aug 29, 2013, at 4:23 PM, Adeel Qureshi <adeelmahmood@gmail.com>
> >> >> wrote:
> >> >>
> >> >> I have implemented secondary sort in my MR job and for some reason
> if i
> >> >> dont specify the number of reducers it uses 1 which doesnt seems
> right
> >> >> because im working with 800M+ records and one reducer slows things
> down
> >> >> significantly. Is this some kind of limitation with the secondary
> sort
> >> >> that
> >> >> it has to use a single reducer .. that kind of would defeat the
> purpose
> >> >> of
> >> >> having a scalable solution such as secondary sort. I would appreciate
> >> >> any
> >> >> help.
> >> >>
> >> >> Thanks
> >> >> Adeel
> >> >>
> >> >>
> >> >>
> >> >> ---
> >> >> Ian Wrigley
> >> >> Sr. Curriculum Manager
> >> >> Cloudera, Inc
> >> >> Cell: (323) 819 4075
> >> >>
> >> >
> >
> >
>

Mime
View raw message