hadoop-mapreduce-user mailing list archives

From Harsh J <ha...@cloudera.com>
Subject Re: Reduce Copy Speed
Date Mon, 01 Oct 2012 18:12:58 GMT
Hi Brandon,

On Mon, Oct 1, 2012 at 11:23 PM, Brandon <bmason@upstreamsoftware.com> wrote:
> What speed do people typically see for the copy during a reduce?

It varies due to a few factors. But there are highly improved
Netty-based transfers in Hadoop 2.x that you can use for even faster
and more reliable transfers.

> From tasktracker here is an average on:
> reduce > copy (500 of 504 at 1.52 MB/s) >
> We have seen it range from .5 to 4 MB/s.
> That seems a bit slow.

Slow compared to what, exactly? How many concurrent reducers fetch at
the same time from a single machine? And what is your slowstart
threshold? Raising it above the default 5% makes the reducers wait for
many more maps to finish before they begin pulling data from other
tasktrackers. That leads to continuous large transfers rather than
small, frequent transfers of a few tasks at a time, which saves
resources and improves speed.
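
To put rough numbers on the two questions above, here is a minimal
sketch. It assumes the slowstart threshold (the
mapred.reduce.slowstart.completed.maps property in Hadoop 1.x) is
applied as a fraction of the total map count, rounded up; that rounding
rule is an assumption, and the function names, the 0.80 threshold and
the reducer count of 8 are hypothetical values chosen for illustration.
Only the 504 maps and the 1.52 MB/s figure come from the quoted report.

```python
import math

def maps_before_reduce_start(num_maps, slowstart=0.05):
    """Maps that must finish before reducers begin copying.

    Assumes the threshold is a fraction of all maps, rounded up.
    """
    return math.ceil(slowstart * num_maps)

def aggregate_fetch_rate(per_reducer_mb_s, concurrent_reducers):
    """Combined copy rate if every concurrent reducer fetches at the same rate."""
    return per_reducer_mb_s * concurrent_reducers

# 504 maps, as in the quoted tasktracker status line.
print(maps_before_reduce_start(504))        # default 5% threshold
print(maps_before_reduce_start(504, 0.80))  # hypothetical 80% threshold

# Hypothetical: 8 reducers copying at once, each at the reported 1.52 MB/s.
print(aggregate_fetch_rate(1.52, 8))
```

With the default threshold the reducers start copying after only a
small share of the maps has finished, so each fetch round has little
data available. A higher threshold defers the copy phase until most map
output exists. The per-reducer rate also says little by itself until it
is multiplied by the number of reducers fetching concurrently.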

> Does anyone else have other benchmark numbers to share?

Harsh J
