hadoop-user mailing list archives

From Jeffrey Buell <jbu...@vmware.com>
Subject RE: Spindle per Cores
Date Fri, 12 Oct 2012 21:19:53 GMT
I've done some experiments along these lines.  I'm using high-performance 15K RPM SAS drives
instead of the more usual SATA drives, which should reduce the number of drives I need.  I
have dual 4-core processors at 3.6 GHz.  These are more powerful than the average 4-core processor,
which should increase the number of drives I need.  Assuming these 2 effects cancel, then
my results should also apply to machines with SATA drives and average processors.  Using 8
drives (1:1, one per core) gets good performance for teragen and terasort.  Going to 12 drives (1.5 per
core) increases terasort performance by 15%.  That might not seem like much compared to increasing
the number of drives by 50%, but a better comparison is that 4 extra drives increased the
cost of each machine by only about 12%, so the extra drives are (barely) worth it. If you're
more time sensitive than cost sensitive, then they're definitely worth it.  The extra drives
did not help teragen, apparently because both CPU and the internal storage controller were
close to saturation. So, of course everything depends on the app.  You're shooting for saturated
CPUs and disk bandwidth.  Check that the CPU is not saturated (after checking Hadoop tuning
and optimizing the number of tasks). Check that you have enough memory for more tasks with
room leftover for a large buffer cache.  Use 10 GbE networking or make sure the network has
enough headroom.  Check the storage controller can handle more bandwidth.  If all are true
(that is, no other bottlenecks), consider adding more drives.
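
To make the cost comparison above concrete, here is a minimal sketch of the arithmetic in Python. The function name relative_value is hypothetical and only for illustration; the inputs are the figures quoted above (15% faster terasort, about 12% higher machine cost, 12 drives on 8 cores). It assumes throughput per unit cost is the metric you care about, which may not hold for your workload.

```python
def relative_value(perf_gain: float, cost_increase: float) -> float:
    """Throughput per unit cost relative to the baseline machine.

    A result above 1.0 means the upgrade buys more performance than it costs;
    exactly 1.0 is break-even. Both arguments are fractions (0.15 == 15%).
    """
    return (1.0 + perf_gain) / (1.0 + cost_increase)


# Figures from the message above: 8 cores, 12 drives,
# terasort 15% faster, machine cost about 12% higher.
drives_per_core = 12 / 8
value = relative_value(perf_gain=0.15, cost_increase=0.12)

print(f"{drives_per_core:.1f} drives per core, relative perf/cost = {value:.3f}")
# prints "1.5 drives per core, relative perf/cost = 1.027"
```

A ratio of roughly 1.03 is why the extra drives are only barely worth it on cost alone. For teragen, where the extra drives gave no gain, the same function returns a value below 1.0.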


> -----Original Message-----
> From: Hank Cohen [mailto:hank.cohen@altior.com]
> Sent: Friday, October 12, 2012 1:46 PM
> To: user@hadoop.apache.org
> Subject: RE: Spindle per Cores
> What empirical evidence is there for this rule of thumb?
> In other words, what tests or metrics would indicate an optimal
> spindle/core ratio and how dependent is this on the nature of the data
> and of the map/reduce computation?
> My understanding is that there are lots of clusters with more spindles
> than cores.  Specifically, typical 2U servers can hold 12 3.5" disk
> drives.  So lots of Hadoop clusters have dual 4 core processors and 12
> spindles.  Would it be better to have 6 core processors if you are
> loading up the boxes with 12 disks?  And most importantly, how would
> one know that the mix was optimal?
> Hank Cohen
> Altior Inc.
> -----Original Message-----
> From: Patai Sangbutsarakum [mailto:silvianhadoop@gmail.com]
> Sent: Friday, October 12, 2012 10:46 AM
> To: user@hadoop.apache.org
> Subject: Spindle per Cores
> I have been reading about hardware recommendations for Hadoop
> clusters.
> One of them recommends a 1:1 ratio of spindles to cores.
> Intel CPUs come with Hyper-Threading, which doubles the number of
> logical cores on one physical CPU, e.g. 8 cores with Hyper-Threading
> become 16, which is where we start to calculate the number of task
> slots per node.
> When it comes to spindles, I strongly believe I should pick 8 cores
> and 8 disks in order to get a 1:1 ratio.
> Please suggest
> Patai
