hadoop-user mailing list archives

From ranjith raghunath <ranjith.raghuna...@gmail.com>
Subject Re: Spindle per Cores
Date Sat, 13 Oct 2012 03:27:50 GMT
Thanks Michael.
On Oct 12, 2012 9:59 PM, "Michael Segel" <michael_segel@hotmail.com> wrote:

> I think what we are seeing is the ratio based on physical Xeon cores,
> so hyper-threading wouldn't make any change to the actual ratio.
> (1 disk per physical core would be 1 disk per 2 virtual cores.)
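For illustration of the physical-versus-virtual distinction, here is a minimal Python sketch (not from the original mail; Linux-specific, hypothetical helper name) that counts both kinds of cores from /proc/cpuinfo and derives the 1:1 disk count from the physical number:

    # Hypothetical sketch: count logical vs. physical cores on a Linux node.
    # The 1:1 spindle rule of thumb is per *physical* core; Hyper-Threading
    # only doubles the logical count.
    def core_counts(path="/proc/cpuinfo"):
        logical, physical, phys_id = 0, set(), None
        with open(path) as f:
            for line in f:
                key, _, value = line.partition(":")
                key, value = key.strip(), value.strip()
                if key == "processor":
                    logical += 1
                elif key == "physical id":
                    phys_id = value
                elif key == "core id":
                    physical.add((phys_id, value))
        # Fall back to the logical count if the core/physical ids are absent.
        return logical, len(physical) or logical

    logical, physical = core_counts()
    print("logical: %d, physical: %d, disks for 1:1: %d"
          % (logical, physical, physical))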
>
> Again YMMV and of course thanks to this guy Moore who decided to write
> some weird laws... the ratio could change over time as the CPUs become more
> efficient and faster.
>
>
> On Oct 12, 2012, at 9:52 PM, ranjith raghunath <
> ranjith.raghunath1@gmail.com> wrote:
>
> Does hyperthreading affect this ratio?
> On Oct 12, 2012 9:36 PM, "Michael Segel" <michael_segel@hotmail.com>
> wrote:
>
>> First, the obvious caveat... YMMV
>>
>> Having said that.
>>
>> The key here is to take a look across the various jobs that you will run.
>> Some may be more CPU intensive, others more I/O intensive.
>>
>> If you monitor these jobs via Ganglia, when you have too few spindles
>> you should see the wait CPU rise on the machines in the cluster.  That
>> is to say, you are putting an extra load on the systems because you're
>> waiting for the disks to catch up.
>>
>> If you increase the ratio of disks to CPU, you should see that load drop
>> as you are not wasting CPU cycles.
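A quick way to spot the same wait-CPU signal without Ganglia is to sample /proc/stat directly. The Python sketch below is hypothetical (not part of the original mail) and Linux-only; it reports the I/O-wait percentage over a short interval:

    # Hypothetical sketch: estimate CPU I/O-wait percentage by sampling
    # /proc/stat twice. Field order per proc(5): user nice system idle iowait ...
    import time

    def cpu_times():
        with open("/proc/stat") as f:
            return [int(x) for x in f.readline().split()[1:]]

    def iowait_percent(interval=5.0):
        before = cpu_times()
        time.sleep(interval)
        after = cpu_times()
        deltas = [a - b for a, b in zip(after, before)]
        total = sum(deltas)
        return 100.0 * deltas[4] / total if total else 0.0  # index 4 = iowait

    # A value that stays high while jobs run suggests too few spindles per core.
    print("iowait: %.1f%%" % iowait_percent())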
>>
>> Note that it's not just the number of spindles; the bus and the
>> controller cards can also affect the throughput of disk I/O.
>>
>> Now, just IMHO, there was a discussion on some of the CPU
>> recommendations. To a point, it doesn't matter that much. You want to
>> maximize the bang for the buck you get with your hardware purchase.
>>
>> Use the ratio as a buying guide. At fewer than 1 disk per core, you're
>> wasting the CPU that you bought.
>>
>> Go higher than a ratio of 1, like 1.5, and you may be buying too many
>> spindles without seeing a performance gain that offsets your cost.
>>
>> Search for a happy medium and don't sweat the maximum performance that
>> you may get.
>>
>> HTH
>>
>> On Oct 12, 2012, at 4:19 PM, Jeffrey Buell <jbuell@vmware.com> wrote:
>>
>> > I've done some experiments along these lines.  I'm using
>> > high-performance 15K RPM SAS drives instead of the more usual SATA
>> > drives, which should reduce the number of drives I need.  I have dual
>> > 4-core processors at 3.6 GHz.  These are more powerful than the
>> > average 4-core processor, which should increase the number of drives I
>> > need.  Assuming these 2 effects cancel, my results should also apply
>> > to machines with SATA drives and average processors.
>> >
>> > Using 8 drives (1:1) gets good performance for teragen and terasort.
>> > Going to 12 drives (1.5 per core) increases terasort performance by
>> > 15%.  That might not seem like much compared to increasing the number
>> > of drives by 50%, but a better comparison is that 4 extra drives
>> > increased the cost of each machine by only about 12%, so the extra
>> > drives are (barely) worth it.  If you're more time sensitive than cost
>> > sensitive, then they're definitely worth it.  The extra drives did not
>> > help teragen, apparently because both the CPU and the internal storage
>> > controller were close to saturation.  So, of course, everything
>> > depends on the app.
>> >
>> > You're shooting for saturated CPUs and disk bandwidth.  Check that the
>> > CPU is not saturated (after checking Hadoop tuning and optimizing the
>> > number of tasks).  Check that you have enough memory for more tasks
>> > with room left over for a large buffer cache.  Use 10 GbE networking
>> > or make sure the network has enough headroom.  Check that the storage
>> > controller can handle more bandwidth.  If all are true (that is, no
>> > other bottlenecks), consider adding more drives.
>> >
>> > Jeff
>> >
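To make the cost trade-off explicit, here is a back-of-the-envelope check in Python (hypothetical, using only the numbers Jeff quotes above) of throughput per dollar for the 8-drive versus 12-drive configuration:

    # Jeff's numbers: 12 drives gave ~15% more terasort throughput for ~12%
    # more cost per machine than 8 drives (1 per core).
    base_perf, base_cost = 1.00, 1.00      # 8 drives, relative units
    more_perf, more_cost = 1.15, 1.12      # 12 drives

    gain = (more_perf / more_cost) / (base_perf / base_cost) - 1
    print("throughput per dollar changes by %+.1f%%" % (100 * gain))
    # ~ +2.7%: barely worth it on cost alone, clearly worth it if you value
    # time over cost.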
>> >> -----Original Message-----
>> >> From: Hank Cohen [mailto:hank.cohen@altior.com]
>> >> Sent: Friday, October 12, 2012 1:46 PM
>> >> To: user@hadoop.apache.org
>> >> Subject: RE: Spindle per Cores
>> >>
>> >> What empirical evidence is there for this rule of thumb?
>> >> In other words, what tests or metrics would indicate an optimal
>> >> spindle/core ratio and how dependent is this on the nature of the data
>> >> and of the map/reduce computation?
>> >>
>> >> My understanding is that there are lots of clusters with more spindles
>> >> than cores.  Specifically, typical 2U servers can hold 12 3.5" disk
>> >> drives.  So lots of Hadoop clusters have dual 4 core processors and 12
>> >> spindles.  Would it be better to have 6 core processors if you are
>> >> loading up the boxes with 12 disks?  And most importantly, how would
>> >> one know that the mix was optimal?
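For concreteness, the ratios behind the configurations Hank mentions work out as follows (a hypothetical Python helper, not from the thread, counting physical cores only as Michael suggests above):

    # Spindle-per-physical-core ratio for a node.
    def spindles_per_core(disks, sockets, cores_per_socket):
        return disks / float(sockets * cores_per_socket)

    print(spindles_per_core(12, 2, 4))   # dual 4-core, 12 disks -> 1.5
    print(spindles_per_core(12, 2, 6))   # dual 6-core, 12 disks -> 1.0
    print(spindles_per_core(8,  2, 4))   # dual 4-core,  8 disks -> 1.0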
>> >>
>> >> Hank Cohen
>> >> Altior Inc.
>> >>
>> >> -----Original Message-----
>> >> From: Patai Sangbutsarakum [mailto:silvianhadoop@gmail.com]
>> >> Sent: Friday, October 12, 2012 10:46 AM
>> >> To: user@hadoop.apache.org
>> >> Subject: Spindle per Cores
>> >>
>> >> I have read around about the hardware recommendations for a Hadoop
>> >> cluster. One of them recommends a 1:1 ratio of spindles per core.
>> >>
>> >> Intel CPUs come with Hyper-Threading, which doubles the number of
>> >> cores on one physical CPU, e.g. 8 cores with Hyper-Threading become
>> >> 16, which is where we start calculating the number of task slots per
>> >> node.
>> >>
>> >> When it comes to spindles, I strongly believe I should pick 8 cores
>> >> and 8 disks in order to get a 1:1 ratio.
>> >>
>> >> Please suggest
>> >> Patai
>> >>
>> >
>> >
>>
>>
>
