hadoop-mapreduce-user mailing list archives

From Juwei Shi <shiju...@gmail.com>
Subject Re: reduce copy rate
Date Fri, 15 Apr 2011 17:13:41 GMT

Do you know why reducers start one by one, several seconds apart, rather
than all at the same time? For example, suppose the reduce task capacity
(max concurrent reduce tasks) is set to 100 and the average run time of a
reduce task is 15 seconds. Even though all map tasks have completed, some
reduce tasks have still not been launched by the time earlier reduce tasks
have already finished. As a result, only about 20 reduce tasks run
concurrently rather than 100.
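
As a rough check on these numbers, here is a back-of-the-envelope sketch
based on Little's law (steady-state concurrency is roughly launch rate times
average run time). It is not a measurement from the cluster, and the function
names are purely illustrative.

```python
def implied_launch_rate(concurrent_tasks, avg_run_time_s):
    """Launch rate (tasks/second) that sustains the given concurrency."""
    return concurrent_tasks / avg_run_time_s


def steady_state_concurrency(launch_rate_per_s, avg_run_time_s, capacity):
    """Concurrency sustained by a launch rate, capped by the slot capacity."""
    return min(capacity, launch_rate_per_s * avg_run_time_s)


# Figures from the example above: about 20 concurrent reducers, 15 s each.
observed_rate = implied_launch_rate(20, 15.0)
# Rate that would be needed to keep all 100 reduce slots busy.
needed_rate = implied_launch_rate(100, 15.0)

print(f"observed: {observed_rate:.2f} launches/s, needed: {needed_rate:.2f} launches/s")
```

In other words, the observed concurrency corresponds to roughly 1.3 reduce
launches per second, while filling 100 slots with 15-second tasks would take
roughly 6.7 per second.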

This may not be a problem, because MapReduce is designed for high throughput
rather than low latency. But if I need to reduce latency, do you know how to
control this behavior, either by tuning parameters or by changing source
code (for example, the heartbeat interval)?
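
To illustrate why the heartbeat interval might matter, here is a toy model.
It rests on an assumption not confirmed anywhere in this thread: that each
TaskTracker is handed at most one new reduce task per heartbeat. The tracker
count and heartbeat intervals are made-up values, and the function name is
hypothetical.

```python
def expected_concurrency(num_trackers, heartbeat_interval_s, avg_run_time_s,
                         capacity, reduces_per_heartbeat=1):
    """Toy estimate of concurrently running reduce tasks.

    ASSUMPTION: every tracker receives at most `reduces_per_heartbeat`
    new reduce tasks each time it sends a heartbeat.
    """
    # Tasks launched across the cluster during one task's lifetime.
    launched = (num_trackers * reduces_per_heartbeat * avg_run_time_s
                / heartbeat_interval_s)
    return min(capacity, launched)


# Hypothetical cluster: 4 trackers, 15 s reduce tasks, 100 reduce slots.
slow_heartbeat = expected_concurrency(4, 3.0, 15.0, 100)
fast_heartbeat = expected_concurrency(4, 1.0, 15.0, 100)
print(slow_heartbeat, fast_heartbeat)
```

Under that assumption, a shorter heartbeat interval or more reduce tasks
assigned per heartbeat would raise the launch rate, which is the quantity
that limits concurrency for short tasks.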

2011/4/16 Harsh J <harsh@cloudera.com>

> Hello Baran,
> On Fri, Apr 15, 2011 at 8:19 PM, baran cakici <barancakici@gmail.com>
> wrote:
> > Hi,
> >
> > I have a question about the copy speed of a MapReduce job. I have a
> > cluster with 4 slaves and 1 master; the computers are connected to each
> > other through one 8-port switch (up to 1000 Mbps). The copy speed of my
> > job is 1.6 - 1.8 MB. Is that not too slow?
> As Juwei has pointed out, the Reducer's copy phase is a progressive
> one by default (controlled by [1]). Reducers may be initialized once
> 5% of your maps have reported completion, and they begin copying early
> outputs as these become available. Thus, you won't really see the
> maximum possible copy rate all the time (the data is pulled gradually,
> and grows in size as further map waves complete).
> [1] - The mapred.reduce.slowstart.completed.maps property @
> http://hadoop.apache.org/common/docs/r0.20.0/mapred-default.html
> --
> Harsh J
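
The slow-start behavior described in the quoted reply can be sketched as
follows. This is a simplified model, not Hadoop's actual code: the 0.05
default comes from the 5% figure quoted above, rounding the threshold up to
a whole number of maps is an assumption, and the function names are
hypothetical.

```python
import math


def maps_needed_before_reduces(total_maps, slowstart_fraction=0.05):
    """Completed maps required before reducers may be scheduled.

    ASSUMPTION: the fractional threshold is rounded up to a whole map.
    """
    return math.ceil(slowstart_fraction * total_maps)


def reduces_may_start(completed_maps, total_maps, slowstart_fraction=0.05):
    """True once enough maps have finished for reducers to begin copying."""
    return completed_maps >= maps_needed_before_reduces(total_maps,
                                                        slowstart_fraction)


# With the default fraction, reducers of a 100-map job can start early;
# with a fraction of 1.0 they wait until every map has completed.
print(reduces_may_start(10, 100))
print(reduces_may_start(10, 100, slowstart_fraction=1.0))
```

A fraction close to 1.0 delays the copy phase until most map output already
exists, so the copy then runs in a shorter, denser burst instead of being
spread across the map phase.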

- Juwei
