hadoop-hdfs-user mailing list archives

From Jeffrey Buell <jbu...@vmware.com>
Subject Re: question of how to take full advantage of cluster resources
Date Fri, 14 Dec 2012 23:29:15 GMT
Number of CPU cores is just one of several hardware constraints on the number of tasks that
can be run efficiently at the same time. Other constraints: 

- Usually 1 to 2 map tasks per physical disk. 
- Leave half the machine's memory for the buffer cache and other uses, and note that
a task's memory footprint can be roughly twice its maximum heap size. I'd say 4 GB/core is the minimum; 8-12 GB/core
would be better. 
- With 32 cores you need at least 10 GbE networking. 
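The rules of thumb above can be combined into a rough per-node sizing sketch. All of the input numbers below (disk count, RAM, heap size) are illustrative assumptions, not values from this thread:

```python
# Rough per-node slot sizing from the rules of thumb above.
# Input values are illustrative assumptions, not from this thread.

def slot_estimate(cores, disks, ram_gb, heap_gb=1.0):
    """Estimate map/reduce slots per node:
    - cap map slots at ~2 per physical disk
    - leave half of RAM for the buffer cache and other uses
    - assume each task may use ~2x its maximum heap size
    """
    usable_gb = ram_gb / 2.0          # half of RAM reserved for buffer cache etc.
    per_task_gb = 2.0 * heap_gb       # task footprint ~ 2x max heap
    mem_slots = int(usable_gb / per_task_gb)
    disk_map_slots = 2 * disks        # high end of 1-2 map tasks per disk
    map_slots = min(disk_map_slots, cores, mem_slots)
    reduce_slots = max(1, map_slots // 2)  # a common ~2:1 map:reduce split
    return map_slots, reduce_slots

# Example: 32 cores, 8 disks, 128 GB RAM (4 GB/core), 1 GB max heap per task
m, r = slot_estimate(cores=32, disks=8, ram_gb=128, heap_gb=1.0)
# Here the 8 disks are the binding constraint: 16 map slots, 8 reduce slots.
```

Note that with these assumptions the disks, not the 32 cores, end up limiting the slot count, which is the point of the list above.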


----- Original Message -----

From: "Guang Yang" <gyang@millennialmedia.com> 
To: user@hadoop.apache.org 
Cc: "Peter Sheridan" <psheridan@millennialmedia.com>, "Jim Brooks" <jbrooks@millennialmedia.com>

Sent: Friday, December 14, 2012 3:03:54 PM 
Subject: question of how to take full advantage of cluster resources 


We have a beefy Hadoop cluster with 12 worker nodes, each with 32 cores. We have been
running Map/Reduce jobs on this cluster, and we noticed that if we configure the Map/Reduce
capacity in the cluster to be less than the available processors (32 x 12 =
384), say 216 map slots and 144 reduce slots (360 total), the jobs run okay. But if we configure
the total Map/Reduce capacity to be more than 384, we observe that jobs sometimes run unusually
long: certain tasks (usually map tasks) are stuck in the "initializing"
stage for a long time on certain nodes before being processed. The nodes exhibiting this behavior
are random and not tied to specific boxes. Isn't the general rule of thumb to configure
M/R capacity at twice the number of processors in the cluster? What do people usually do
to maximize the usage of cluster resources in terms of capacity configuration?
I'd appreciate any responses. 
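For context, in Hadoop 1.x (current at the time of this thread) the cluster-wide totals above come from per-TaskTracker settings in mapred-site.xml. A sketch of how the 216/144 totals could be produced, assuming the slots are spread evenly (18 map and 12 reduce slots on each of the 12 nodes):

```xml
<!-- mapred-site.xml on each TaskTracker (Hadoop 1.x / MRv1) -->
<!-- 18 map slots x 12 nodes = 216; 12 reduce slots x 12 nodes = 144 -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>18</value>
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>12</value>
</property>
```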

Guang Yang 
