hadoop-common-user mailing list archives

From Harsh J <ha...@cloudera.com>
Subject Re: question of how to take full advantage of cluster resources
Date Fri, 14 Dec 2012 23:12:57 GMT
Please add in your RAM details as well, as that matters for
concurrently spawned JVMs.

On Sat, Dec 15, 2012 at 4:33 AM, Guang Yang <gyang@millennialmedia.com> wrote:
> Hi,
> We have a beefy Hadoop cluster with 12 worker nodes, each with 32
> cores. We have been running MapReduce jobs on this cluster and noticed
> that if we configure the map/reduce capacity to be less than the number
> of available processors in the cluster (32 x 12 = 384), say 216 map slots
> and 144 reduce slots (360 total), the jobs run okay. But if we configure the
> total map/reduce capacity to be more than 384, we observe that jobs
> sometimes run unusually long. The symptom is that certain tasks (usually
> map tasks) are stuck in the "initializing" stage for a long time on certain
> nodes before they get processed. The nodes exhibiting this behavior are
> random and not tied to specific boxes. Isn't the general rule of thumb to
> configure M/R capacity to be twice the number of processors in the cluster?
> What do people usually do to maximize the usage of cluster resources in
> terms of cluster capacity configuration? I'd appreciate any responses.
> Thanks,
> Guang Yang
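
To make the slot arithmetic above concrete, here is a rough sketch of the per-node numbers. The node count, core count and slot totals come from the message; the per-task heap (1024 MB) and the node RAM (48 GB) are made-up placeholders, since the thread does not state them. In MRv1 the per-node slot counts would normally be set through mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum, and the task heap through mapred.child.java.opts; the helper names below are hypothetical.

```python
def per_node_slots(total_slots, nodes):
    """Split a cluster-wide slot total evenly across worker nodes."""
    per_node, remainder = divmod(total_slots, nodes)
    if remainder:
        raise ValueError("slot total does not divide evenly across nodes")
    return per_node


def node_budget(map_slots, reduce_slots, cores, heap_mb, ram_mb):
    """Summarize CPU and heap pressure on one node if every slot is busy."""
    slots = map_slots + reduce_slots
    return {
        "slots": slots,
        "slots_per_core": slots / cores,
        # Max heap only; each task JVM needs some memory beyond its heap.
        "heap_mb": slots * heap_mb,
        "heap_fits_in_ram": slots * heap_mb <= ram_mb,
    }


NODES = 12                  # from the message
CORES_PER_NODE = 32         # from the message
ASSUMED_HEAP_MB = 1024      # assumption: not stated in the thread
ASSUMED_RAM_MB = 48 * 1024  # assumption: not stated in the thread

# The configuration reported to run okay: 216 map + 144 reduce slots.
working = node_budget(per_node_slots(216, NODES), per_node_slots(144, NODES),
                      CORES_PER_NODE, ASSUMED_HEAP_MB, ASSUMED_RAM_MB)

# The "twice the number of processors" rule, split evenly between map and reduce.
doubled = node_budget(32, 32, CORES_PER_NODE, ASSUMED_HEAP_MB, ASSUMED_RAM_MB)

print(working)
print(doubled)
```

With these placeholder values the working configuration comes to 30 slots per node, just under the 32 cores, while the doubled one asks for more heap than the assumed RAM. The real outcome depends on the actual RAM and heap settings, which is why those details matter.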

Harsh J
