hadoop-mapreduce-user mailing list archives

From Harsh J <ha...@cloudera.com>
Subject Re: reduce copy rate
Date Fri, 15 Apr 2011 17:57:16 GMT
Hello Juwei,

On Fri, Apr 15, 2011 at 10:43 PM, Juwei Shi <shijuwei@gmail.com> wrote:
> Harsh,
>
> Do you know why reducers start one by one with several seconds' interval?
> They do not start at the same time. For example, suppose we set the reduce task
> capacity (max concurrent reduce tasks) to 100, and the average run time
> of a reduce task is 15 seconds. Although all map tasks are completed, some
> reduce tasks are not yet initiated when the prior reduce tasks have already
> completed. Then the number of concurrently running reduce tasks will be about
> 20 rather than 100.
>
> This may not be a problem because MapReduce is designed for high throughput
> not low latency. But if I have some requirement to optimize the latency, do
> you know how to control it? Either by tuning parameters or changing some
> source code such as heartbeat interval.

Have a look at this thread:
http://search-hadoop.com/m/bYupFnX7FY1/number+of+tasks+assign+per+heartbeat

The one-task-per-heartbeat behavior exists to spread reduce tasks more
evenly across TT hosts (which improves network usage during the shuffle).
If you need lower latency, you can try writing your own scheduler and/or
investigate alternative scheduler behaviors.
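To see why the ramp-up tops out around 20 concurrent reducers, here is a
back-of-the-envelope sketch using Little's law (concurrency = assignment
rate x task duration). The TaskTracker count and heartbeat interval below
are illustrative assumptions, not values from this thread; only the 15 s
reduce runtime and the ~20-task observation come from the question.

```python
# Rough model of reduce-task ramp-up when the JobTracker hands out at most
# one reduce task per TaskTracker heartbeat (the behavior discussed in the
# linked thread). All cluster numbers here are assumed for illustration.

tasktrackers = 4          # assumed cluster size (hypothetical)
heartbeat_s = 3.0         # assumed TaskTracker heartbeat interval in seconds
task_runtime_s = 15.0     # average reduce-task runtime from the question

# One reduce task assigned per heartbeat per TaskTracker, so by Little's law
# the steady-state number of running reducers is rate * duration:
steady_state_concurrency = tasktrackers * task_runtime_s / heartbeat_s

print(steady_state_concurrency)  # 20.0 -- far below a 100-slot reduce capacity
```

Under these assumptions, short tasks finish before enough heartbeats have
elapsed to fill all slots, which matches the ~20-of-100 observation. Besides
a custom scheduler, knobs worth checking in your Hadoop version include the
reduce slow-start threshold (mapred.reduce.slowstart.completed.maps in the
1.x property naming) and the heartbeat interval itself.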

-- 
Harsh J
