hadoop-user mailing list archives

From Todd Lipcon <t...@cloudera.com>
Subject Re: HyperThreading in TaskTracker nodes?
Date Tue, 05 Feb 2013 21:09:27 GMT
Power issues aside, I've seen similar sorts of performance gains for MR
workloads - around 15-20%.

I think a fair bit of it is due to poor CPU cache utilization in various
parts of Hadoop - hyperthreading gets some extra parallelism there while
the core is waiting on round trips to DRAM.


On Tue, Feb 5, 2013 at 10:03 AM, Brad Sarsfield <brad@bing.com> wrote:

> Hate to say it, but HyperThreading can have either positive or negative
> performance characteristics.  It all depends on your workload.  You have to
> measure very carefully; it may not even be a bottleneck(!) :)
> I hit a pretty significant power issue when I enabled HyperThreading at
> multi-thousand node scale.  We hit a ~8-10% power utilization increase,
> which, if rolled out to the entire cluster, would put me a few percent over
> our max spec power. In this case, for our workload, we actually saw a 15%
> improvement in processing throughput / job latency.   We ended up literally
> turning off machines and enabling HyperThreading on the remaining ones and saw
> an overall ~10% efficiency gain in the cluster, with a few fewer machines,
> but running hot on power.
> ~Brad
> -----Original Message-----
> From: Terry Healy [mailto:thealy@bnl.gov]
> Sent: Tuesday, February 5, 2013 7:20 AM
> To: user@hadoop.apache.org
> Subject: HyperThreading in TaskTracker nodes?
> I would like to get some opinions / recommendations about the pros and
> cons of enabling HyperThreading on TaskTracker nodes. Presumably memory
> could be an issue, but is there anything to be gained, perhaps because of
> I/O wait? My small cluster is made of relatively slow and old systems,
> which mostly are quite slow to/from disk, if that matters.
> Thanks,
> Terry
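
The power/throughput trade-off Brad describes can be sketched as a quick
back-of-the-envelope calculation. This is only an illustration, not Brad's
actual method: the function names are made up, the 9% power increase is the
midpoint of his ~8-10% range, and the 96% baseline power draw is a
hypothetical figure (the thread only says a full rollout would land a few
percent over max spec power). It also assumes power and throughput scale
linearly with the number of machines left on.

```python
def max_fraction_with_ht(baseline_power_fraction, ht_power_increase):
    """Largest fraction of machines that can stay on with HT enabled
    without exceeding max spec power.

    baseline_power_fraction: cluster power draw with HT off, as a
        fraction of max spec power (hypothetical input).
    ht_power_increase: per-machine power increase from HT (0.09 = 9%).
    """
    if baseline_power_fraction <= 0 or ht_power_increase < 0:
        raise ValueError("inputs must be positive / non-negative")
    # Power with HT on scales as fraction * baseline * (1 + increase);
    # solve for the fraction that keeps this at or below 1.0.
    limit = 1.0 / (baseline_power_fraction * (1.0 + ht_power_increase))
    return min(1.0, limit)


def net_throughput_gain(machine_fraction, ht_throughput_increase):
    """Cluster-wide throughput change versus all machines on, HT off."""
    return machine_fraction * (1.0 + ht_throughput_increase) - 1.0


if __name__ == "__main__":
    # Assumed inputs: 96% baseline power (hypothetical), +9% power and
    # +15% throughput per machine with HT (figures from the thread).
    fraction = max_fraction_with_ht(0.96, 0.09)
    gain = net_throughput_gain(fraction, 0.15)
    print(f"machines kept on: {fraction:.1%}, net throughput gain: {gain:.1%}")
```

With inputs in this range the net gain comes out close to the ~10% Brad
reports, but the result is sensitive to the assumed baseline, so measure
your own cluster's power draw before relying on it.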

Todd Lipcon
Software Engineer, Cloudera
