hadoop-common-user mailing list archives

From: Doug Balog <d...@conviva.com>
Subject: Re: Why separate Map/Reduce task limits per node ?
Date: Tue, 28 Oct 2008 20:41:16 GMT
Hi Alex, I'm sorry, but I think you misunderstood my question. Let me
explain some more.

I have a Hadoop cluster of dual quad-core machines.
I'm running hadoop-0.18.1 with Matei's fair scheduler patch
(https://issues.apache.org/jira/browse/HADOOP-3746) in FIFO mode.
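For reference, wiring in the scheduler is just a hadoop-site.xml
change. Something like the following, if I'm remembering the property
name from the patch correctly:

  <property>
    <name>mapred.jobtracker.taskScheduler</name>
    <value>org.apache.hadoop.mapred.FairScheduler</value>
  </property>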
I have about 5 different jobs running in a pipeline; the number of
map and reduce tasks per job varies with the input data.
I assign the jobs different priorities, and Matei's FIFO scheduler
does almost exactly what I want. (The default scheduler did a
horrible job with our workload, because it prefers map tasks.)
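In case it's useful to anyone: I believe a job's priority can be set
either through the JobConf API or as a plain property, e.g.

  <property>
    <name>mapred.job.priority</name>
    <value>HIGH</value>
  </property>

(If memory serves, the accepted values are VERY_HIGH, HIGH, NORMAL,
LOW, and VERY_LOW.)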

I'm trying to tune the number of tasks per node to fully utilize my
cluster; my goal is < 10% idle.
I'm pretty sure my jobs are CPU bound. I can control the number of
tasks per node by setting mapred.tasktracker.map.tasks.maximum
and mapred.tasktracker.reduce.tasks.maximum in hadoop-site.xml.
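For example, my current settings look something like this (the 5+3
split I mention below):

  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>5</value>
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>3</value>
  </property>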

But I don't run a fixed mix of maps and reduces, so saying 5+3 tends
to leave my nodes more idle than I want; during a map-only stretch,
for example, at most 5 of my 8 cores are busy. I just want to say
"run 8 tasks per node" and not care about the mix of map and reduce
tasks.

>> I've been wondering why there are separate task limits for map and
>> reduce. Why not a single generic task limit per node?

The only reason I can think of for having separate map and reduce
task limits is the default scheduler: it wants to schedule all map
tasks first, so you really need to limit the number of them so that
reduces have a chance to run.

Thanks for any insight,
Doug


On Oct 27, 2008, at 6:26 PM, Alex Loddengaard wrote:

> In most jobs, map and reduce tasks are significantly different,
> and their runtimes vary as well.  The number of reducers also
> determines how many output files you have.  So in the case where
> you would want one output file, having a single generic task limit
> would mean that you'd also have one mapper.  This would be quite a
> limiting setup.
> Hope this helps.
>
> Alex
>
> On Mon, Oct 27, 2008 at 1:31 PM, Doug Balog <doug@conviva.com> wrote:
>
>> Hi,
>> I've been wondering why there are separate task limits for map and
>> reduce. Why not a single generic task limit per node?
>>
>> Thanks for any insight,
>>
>> Doug
>>

