hadoop-common-user mailing list archives

From Ted Dunning <tdunn...@veoh.com>
Subject Re: Multi-threaded Reduce
Date Thu, 04 Oct 2007 17:27:24 GMT

Arun,

I think that you are strictly correct, but that the original questioner
simply needed some parallelism for reduces, not necessarily parallelism on a
single node.

I could be very wrong.  It is always difficult to determine what a question
means, of course, since the person asking the question generally doesn't
understand something about the system (hence the question).
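
In case parallelism inside a single reduce call really is what was meant, here
is a minimal sketch of how worker threads could share one collector. It is
plain Java, not Hadoop code: Collector is a hypothetical stand-in for
OutputCollector, expensive() is a made-up placeholder for the per-value work,
and the pool size of 4 is arbitrary. It assumes the collector is not
thread-safe, so every collect() call is synchronized, and it waits for all
workers before reduce() returns so that nothing is emitted afterwards.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for Hadoop's OutputCollector; not the real interface.
interface Collector<K, V> {
    void collect(K key, V value) throws Exception;
}

final class ThreadedReduceSketch {
    // Hypothetical per-value work; here it just doubles the input.
    static int expensive(int v) {
        return v * 2;
    }

    static void reduce(String key, Iterator<Integer> values,
                       Collector<String, Integer> output) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        List<Future<?>> pending = new ArrayList<>();
        try {
            while (values.hasNext()) {
                // Read the value on the calling thread; the iterator is never shared.
                final int v = values.next();
                pending.add(pool.submit(() -> {
                    int result = expensive(v);
                    // Assumption: the collector is not thread-safe, so serialize access.
                    synchronized (output) {
                        output.collect(key, result);
                    }
                    return null;
                }));
            }
            // Block until every worker is done; a failure in any worker surfaces here.
            for (Future<?> f : pending) {
                f.get();
            }
        } finally {
            pool.shutdown();
        }
    }
}
```

The order in which values reach the collector is not deterministic with this
approach, so it only fits reduces whose output does not depend on emit order.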


On 10/4/07 1:13 AM, "Arun C Murthy" <arunc@yahoo-inc.com> wrote:

> Ted Dunning wrote:
>> You don't need to do all that work.
>> 
>> Just set:
>> 
>> <property>
>>   <name>mapred.reduce.tasks</name>
>>   <value>4</value>
>>   <description>The default number of reduce tasks per job.  Typically set
>>   to a prime several times greater than number of available hosts.
>>   Ignored when mapred.job.tracker is "local".
>>   </description>
>> </property>
>> 
>> Either in hadoop-site or in your program using
>> conf.setInt("mapred.reduce.tasks", 4)
>> 
>> That will give you 4 reduce threads.  You can have lots more than that if
>> you like.
>> 
> 
> Err... no.
> 
> *mapred.reduce.tasks* is the default no. of reduces for a job.
> 
> I think the config knob you want is *mapred.tasktracker.tasks.maximum*.
> (http://lucene.apache.org/hadoop/hadoop-default.html#mapred.tasktracker.tasks.maximum)
> 
> That, btw, is the maximum no. of tasks of a given kind (map or reduce)
> which can be simultaneously running on a given tasktracker (separate
> jvms). This is a cluster-wide limit, and there are jira issues open to
> make that a per-tracker knob (HADOOP-1245 & HADOOP-1274).
> 
> Arun
> 
>> 
>> On 10/3/07 6:50 PM, "Nguyen Manh Tien" <tien.nguyenmanh@gmail.com> wrote:
>> 
>> 
>>> I know that in Hadoop we can implement multi-threaded, asynchronous mapping
>>> with the MapRunnable class, but there is no similar class for multi-threading
>>> in the reduce phase. Can we use multiple threads in the reduce phase?
>>> Does the following code work?
>>> 
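>>> public void reduce(WritableComparable key, Iterator values,
>>>                     OutputCollector output, Reporter reporter) {
>>>     // transfer the OutputCollector, the key and one value to the thread
>>>     new SomeThread(output, key, (Writable) values.next()).start();
>>> }
>>> 
>>> public class SomeThread extends Thread {
>>>   OutputCollector output;
>>>   WritableComparable key;
>>>   Writable value;
>>>   public SomeThread(OutputCollector output, WritableComparable key,
>>>                     Writable value) {
>>>     this.output = output;
>>>     this.key = key;
>>>     this.value = value;
>>>   }
>>>   public void run() {
>>>     try {
>>>       output.collect(key, value);
>>>     } catch (IOException e) {  // collect() declares IOException
>>>       throw new RuntimeException(e);
>>>     }
>>>   }
>>> }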
>> 
>> 
> 

