hadoop-common-user mailing list archives

From Mathias Herberts <mathias.herbe...@gmail.com>
Subject Re: Sanity check re: value of 10GbE NICs for Hadoop?
Date Tue, 28 Jun 2011 23:05:18 GMT
On Wed, Jun 29, 2011 at 01:02, Matei Zaharia <matei@eecs.berkeley.edu> wrote:
> Ideally, to evaluate whether you want to go for 10GbE NICs, you would profile your target
> Hadoop workload and see whether it's communication-bound. Hadoop jobs can definitely be communication-bound
> if you shuffle a lot of data between map and reduce, but I've also seen a lot of clusters
> that are CPU-bound (due to decompression, running python, or just running expensive user code)
> or disk-IO-bound. You might be surprised at what your bottleneck is.

From my experience, jobs that shuffle lots of data are also very often
slowed down by the sort phase. Compressing the mappers' output is a first
step to improve performance. Given the cost of a 10GbE infrastructure
with no oversubscription, I'd monitor bandwidth usage very closely prior
to investing in that kind of network gear.
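
As a rough illustration of what monitoring bandwidth usage could look like on a
single Linux node, here is a minimal sketch that samples the byte counters in
/proc/net/dev twice and reports how much of the link was used in between. The
function names, the 10 second interval, and the 1 Gbit/s default link speed are
assumptions made for this example, not anything from the thread; a real cluster
would more likely rely on its existing monitoring system.

```python
import time


def parse_net_dev(text):
    """Return {interface: (rx_bytes, tx_bytes)} from /proc/net/dev contents."""
    counters = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # the two header lines contain no colon
        name, fields = line.split(":", 1)
        values = fields.split()
        # Column 0 is bytes received, column 8 is bytes transmitted.
        counters[name.strip()] = (int(values[0]), int(values[8]))
    return counters


def utilization(before, after, interval_s, link_bps=1_000_000_000):
    """Return (rx, tx) as fractions of link capacity over the interval.

    link_bps is an assumed link speed (1 Gbit/s here); adjust to the real NIC.
    """
    rx = (after[0] - before[0]) * 8 / interval_s / link_bps
    tx = (after[1] - before[1]) * 8 / interval_s / link_bps
    return rx, tx


def measure(interface, interval_s=10.0, path="/proc/net/dev"):
    """Sample the counters twice and return the (rx, tx) utilization."""
    with open(path) as f:
        before = parse_net_dev(f.read())[interface]
    time.sleep(interval_s)
    with open(path) as f:
        after = parse_net_dev(f.read())[interface]
    return utilization(before, after, interval_s)
```

Calling measure("eth0") while a shuffle-heavy job is running (the interface
name is a placeholder) gives a first hint: values that stay well below 1.0
suggest the existing links are not the bottleneck on that node.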
