hadoop-user mailing list archives

From Guang Yang <gy...@millennialmedia.com>
Subject question of how to take full advantage of cluster resources
Date Fri, 14 Dec 2012 23:03:54 GMT

We have a beefy Hadoop cluster with 12 worker nodes, each with 32 cores. We have been
running MapReduce jobs on this cluster, and we noticed that if we configure the cluster's
map/reduce capacity to be less than the number of available processors (32 x 12 = 384),
say 216 map slots and 144 reduce slots (360 total), the jobs run fine. But if we configure
the total map/reduce capacity to be more than 384, we observe that jobs sometimes run
unusually long; the symptom is that certain tasks (usually map tasks) are stuck in the
"initializing" stage for a long time on certain nodes before being processed. The nodes
exhibiting this behavior are random and not tied to specific boxes. Isn't the general rule
of thumb to configure map/reduce capacity at twice the number of processors in the cluster?
What do people usually do to maximize the usage of cluster resources in terms of capacity
configuration? I'd appreciate any responses.
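For reference, here is how the per-node slot counts described above would be set in mapred-site.xml on each TaskTracker (MRv1); the values shown are one possible split matching the 360-slot configuration mentioned (216 / 12 nodes = 18 map slots and 144 / 12 = 12 reduce slots per node), not necessarily our exact config:

```xml
<!-- mapred-site.xml on each worker node (TaskTracker, MRv1) -->
<!-- Example values: 18 map + 12 reduce slots per node x 12 nodes = 360 total -->
<configuration>
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>18</value>
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>12</value>
  </property>
</configuration>
```

The TaskTracker daemon must be restarted on each node for these values to take effect.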

Guang Yang
