hadoop-common-user mailing list archives

From Todd Lipcon <t...@cloudera.com>
Subject Re: hadoop idle time on terasort
Date Wed, 02 Dec 2009 23:30:53 GMT
Hi Vasilis,

This is seen reasonably often, and could be partly due to missed
configuration changes. A few things to check:

- Did you increase the number of tasks per node from the default? If you
have a reasonable number of disks/cores, you're going to want to run a lot
more than 2 map and 2 reduce tasks on each node.

- Have you tuned any other settings? If you google around you can find some
guides for configuration tuning that should help squeeze some performance
out of your cluster.
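
As a rough illustration of the first point, here is a minimal Python sketch that produces the two per-node slot settings for mapred-site.xml. The property names are the standard 0.20 ones (both default to 2). The sizing rule (about one slot per core, split between maps and reduces) and the helper names `slot_properties` and `to_mapred_site_xml` are assumptions made for illustration only, not a recommendation from this thread; the right numbers depend on your disks, memory, and workload.

```python
import xml.etree.ElementTree as ET


def slot_properties(cores, reduce_fraction=0.5):
    """Suggest per-node slot limits (hypothetical heuristic, not a rule)."""
    # Assumption: about one task slot per core in total, and never fewer
    # than the 0.20 defaults of 2 map and 2 reduce slots.
    total = max(cores, 4)
    reduce_slots = max(2, int(total * reduce_fraction))
    map_slots = max(2, total - reduce_slots)
    return {
        "mapred.tasktracker.map.tasks.maximum": map_slots,
        "mapred.tasktracker.reduce.tasks.maximum": reduce_slots,
    }


def to_mapred_site_xml(props):
    """Render the properties as a <configuration> fragment."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Example: an 8-core tasktracker node.
    print(to_mapred_site_xml(slot_properties(8)))
```

The resulting properties would go into mapred-site.xml on each tasktracker node, and as far as I know the tasktrackers need a restart to pick them up.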

There are several patches that aren't in 0.20.1 but will be in 0.21 that
help performance. These aren't eligible for backport into 0.20 since point
releases are for bug fixes only. Some are eligible for backporting into
Cloudera's distro (or Yahoo's) and may show up in our next release (CDH3)
which should be available first in January for those who like to live on the
edge.

Thanks,
-Todd

On Wed, Dec 2, 2009 at 12:22 PM, Vasilis Liaskovitis <vliaskov@gmail.com> wrote:

> Hi,
>
> I am using hadoop-0.20.1 to run terasort and randsort benchmarking
> tests on a small 8-node linux cluster. Most runs show low (<50%)
> core utilization in the map and reduce phases, as well as heavy
> I/O phases. For a large fraction of the runtime, the cores are
> idle and disk I/O traffic is not heavy.
>
> On average for the duration of a terasort run I get 20-30% cpu
> utilization, 10-30% iowait times and the rest 40-70% is idle time.
> This is data collected with mpstat for the duration of the run across
> the cores of a specific node. This utilization behaviour is true and
> symmetric for all tasktracker/data nodes (The namenode cores and I/O
> are mostly idle, so there doesn’t seem to be a bottleneck in the
> namenode).
>
> I am looking for an explanation for the significant idle-time in the
> runs. Could it have something to do with misconfigured network/RPC
> latency hadoop parameters? For example, I have tried to increase
> mapred.heartbeats.in.second to 1000 from 100 but that didn’t help. The
> network bandwidth (1GigE card on each node) is not saturated during
> the runs, according to my netstat results.
>
> Have other people noticed significant cpu idle times that can’t be
> explained by I/O traffic?
>
> Is it reasonable to always expect decreasing idle times as the
> terasort dataset scales on the same cluster? I’ve only tried 2 small
> datasets of 40GB and 64GB each, but core utilizations didn’t increase
> with the runs done so far.
>
> Yahoo’s paper on terasort (http://sortbenchmark.org/Yahoo2009.pdf)
> mentions several performance optimizations, some of which seem
> relevant to idle times. I am wondering which, if any, of the yahoo
> patches are part of the hadoop-0.20.1 distribution.
>
> Would it be a good idea to try a development version of hadoop to
> resolve this issue?
>
> thanks,
>
> - Vasilis
>
