hadoop-common-user mailing list archives

From Ted Dunning <tdunn...@veoh.com>
Subject Re: Multi-threaded Reduce
Date Thu, 04 Oct 2007 05:59:09 GMT

You don't need to do all that work.

Just set:

<property>
  <name>mapred.reduce.tasks</name>
  <value>4</value>
  <description>The default number of reduce tasks per job.  Typically set
  to a prime close to the number of available hosts.
  Ignored when mapred.job.tracker is "local".
  </description>
</property>

Either in hadoop-site.xml or in your program using
conf.setInt("mapred.reduce.tasks", 4)

That will give you 4 reduce tasks running in parallel, which gets you the
same effect as 4 reduce threads.  You can have lots more than that if
you like.
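The parallelism comes from partitioning: every map output key is routed to exactly one of the N reduce tasks, and each task works through its own keys independently. Hadoop's default HashPartitioner computes the partition as `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`. Below is a minimal stdlib-only sketch of that arithmetic showing how keys would split across 4 tasks; the class and method names (`ReducePartitionSketch`, `partitionFor`, `assign`) are made up for illustration and are not Hadoop API.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: reproduces the default hash-partitioning formula without Hadoop.
public class ReducePartitionSketch {

    // Same arithmetic as Hadoop's default HashPartitioner: clear the sign bit, then take the modulus.
    public static int partitionFor(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Groups keys by the reduce task that would receive them.
    public static List<List<String>> assign(List<String> keys, int numReduceTasks) {
        List<List<String>> buckets = new ArrayList<>();
        for (int i = 0; i < numReduceTasks; i++) {
            buckets.add(new ArrayList<>());
        }
        for (String key : keys) {
            buckets.get(partitionFor(key, numReduceTasks)).add(key);
        }
        return buckets;
    }

    public static void main(String[] args) {
        List<String> keys = List.of("apple", "banana", "cherry", "date", "elderberry", "fig");
        List<List<String>> buckets = assign(keys, 4); // 4 = mapred.reduce.tasks
        for (int i = 0; i < buckets.size(); i++) {
            System.out.println("reduce task " + i + " gets " + buckets.get(i));
        }
    }
}
```

Because a given key always hashes to the same partition, all values for that key still meet in a single reduce() call, which is why adding reduce tasks is safe in a way that ad-hoc threads are not.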


On 10/3/07 6:50 PM, "Nguyen Manh Tien" <tien.nguyenmanh@gmail.com> wrote:

> I know that in Hadoop we can implement multi-threaded, asynchronous mapping
> with MapRunnable. But there doesn't seem to be a similar class for
> multi-threading in the reduce phase. Can we do multi-threading in the reduce
> phase? Does the following code work?
> 
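> public void reduce(WritableComparable key, Iterator values,
>                    OutputCollector output, Reporter reporter) {
>   // hand the key, values and OutputCollector over to a new thread
>   new SomeThread(key, values, output).start();
> }
> 
> public class SomeThread extends Thread {
>   private final WritableComparable key;
>   private final Iterator values;
>   private final OutputCollector output;
> 
>   public SomeThread(WritableComparable key, Iterator values,
>                     OutputCollector output) {
>     this.key = key;
>     this.values = values;
>     this.output = output;
>   }
> 
>   public void run() {
>     try {
>       while (values.hasNext()) {
>         output.collect(key, (Writable) values.next());
>       }
>     } catch (IOException e) {
>       throw new RuntimeException(e);
>     }
>   }
> }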
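If you really do want threads inside a single reducer, the quoted approach has two problems beyond the typos: reduce() returns while the thread is still running, so the values iterator may already have moved on by the time the thread reads it, and nothing waits for the threads before the task finishes. Below is a minimal plain-Java sketch under stated assumptions, not Hadoop code: `OutputSink` and `ThreadedReduceSketch` are hypothetical names standing in for OutputCollector and a reducer, the iterator is assumed to be valid only until reduce() returns, the collector is assumed not to be thread-safe, and summing is just a placeholder reduction. It copies each key's values, processes them on a small thread pool, serializes calls to the collector, and drains the pool in close(). In a real job that draining step would presumably go in the reducer's close() method, though raising mapred.reduce.tasks is usually the simpler route.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Sketch only: hands each key's (copied) values to a worker pool.
public class ThreadedReduceSketch {

    // Hypothetical stand-in for Hadoop's OutputCollector.
    public interface OutputSink<K, V> {
        void collect(K key, V value);
    }

    private final ExecutorService pool;
    private final OutputSink<String, Integer> output;

    public ThreadedReduceSketch(int threads, OutputSink<String, Integer> output) {
        this.pool = Executors.newFixedThreadPool(threads);
        this.output = output;
    }

    // Copy the values first: the iterator is assumed to be valid only until reduce() returns.
    public void reduce(String key, Iterator<Integer> values) {
        List<Integer> copy = new ArrayList<>();
        while (values.hasNext()) {
            copy.add(values.next());
        }
        pool.submit(() -> {
            int sum = 0; // placeholder reduction
            for (int v : copy) {
                sum += v;
            }
            // The sink is assumed not to be thread-safe, so serialize calls to it.
            synchronized (output) {
                output.collect(key, sum);
            }
        });
    }

    // Wait for every queued key before the task is allowed to finish.
    public void close() {
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("worker threads did not finish");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
    }

    public static void main(String[] args) {
        List<String> seen = Collections.synchronizedList(new ArrayList<>());
        ThreadedReduceSketch r = new ThreadedReduceSketch(2, (k, v) -> seen.add(k + "=" + v));
        r.reduce("a", List.of(1, 2, 3).iterator());
        r.reduce("b", List.of(10, 20).iterator());
        r.close();
        System.out.println(seen); // contains a=6 and b=30, in either order
    }
}
```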
