crunch-dev mailing list archives

From Chao Shi <stepi...@live.com>
Subject Re: Sort with multiple reducers not working?
Date Wed, 31 Jul 2013 06:54:37 GMT
Got it. I had to test my patch manually on a real cluster, and it works. Is
there any way to do this in a unit test?


On Tue, Jul 30, 2013 at 11:32 PM, Josh Wills <jwills@cloudera.com> wrote:

> Hey Chao,
>
> It's just a problem w/the LocalJobRunner, which always uses a single
> reducer no matter what you set it to in the configuration.
>
> J
>
>
> On Tue, Jul 30, 2013 at 1:06 AM, Chao Shi <stepinto@live.com> wrote:
>
> > Hi devs,
> >
> > Has anyone tried sorting with multiple reducers? I seem to hit a problem
> > when trying to implement the HFile bulk loader.
> >
> > You can reproduce this as follows:
> > 1. modify SortIT to run multiple reducers
> > 2. run SortIT#testWritableSortDesc
> >
> > I got the following exception:
> > java.lang.IllegalArgumentException: Can't read partitions file
> >         at org.apache.crunch.lib.sort.TotalOrderPartitioner.setConf(TotalOrderPartitioner.java:81)
> >         at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:62)
> >         at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
> >         at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:677)
> >         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:756)
> >         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
> >         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:214)
> > Caused by: java.io.IOException: Wrong number of partitions in keyset
> >         at org.apache.crunch.lib.sort.TotalOrderPartitioner.setConf(TotalOrderPartitioner.java:77)
> >         ... 6 more
> >
> > It seems that TotalOrderPartitioner does not receive the correct number of
> > reducers. Any ideas?
> >
> > Thanks,
> > Chao
> >
>
>
>
> --
> Director of Data Science
> Cloudera <http://www.cloudera.com>
> Twitter: @josh_wills <http://twitter.com/josh_wills>
>
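
For readers following the thread: the "Wrong number of partitions in keyset"
message is consistent with a check that the partitions file holds exactly one
fewer split point than the job has reducers, which is what Hadoop's own
TotalOrderPartitioner enforces. The sketch below is a standalone illustration
of that check, not Crunch's actual code; the class name PartitionCheck and its
methods are hypothetical. It assumes a partitions file written for four
reducers (three split points) and shows why the check would fail once the
LocalJobRunner forces a single reducer.

```java
import java.io.IOException;

public class PartitionCheck {

    // Hypothetical helper: models the consistency rule that a total-order
    // partitioner needs (numReduceTasks - 1) split points to define
    // numReduceTasks key ranges.
    public static void validate(int splitPoints, int numReduceTasks) throws IOException {
        if (splitPoints != numReduceTasks - 1) {
            throw new IOException("Wrong number of partitions in keyset");
        }
    }

    // Convenience wrapper that reports the result as a boolean.
    public static boolean isConsistent(int splitPoints, int numReduceTasks) {
        try {
            validate(splitPoints, numReduceTasks);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Partitions file written for 4 reducers: 3 split points.
        int splitPoints = 3;

        // On a real cluster the job really runs with 4 reducers.
        System.out.println("4 reducers: " + isConsistent(splitPoints, 4)); // true

        // Under the LocalJobRunner the job sees a single reducer (assumption
        // based on the explanation in this thread), so the check fails.
        System.out.println("1 reducer:  " + isConsistent(splitPoints, 1)); // false
    }
}
```

Under this assumption, a test that runs through the LocalJobRunner cannot
exercise the multi-reducer path, which matches the need to verify the patch on
a real cluster.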
