hadoop-general mailing list archives

From Scott Carey <sc...@richrelevance.com>
Subject Re: Dedicated disk for operating system
Date Wed, 10 Aug 2011 17:24:25 GMT

On 8/10/11 7:56 AM, "Evert Lammerts" <Evert.Lammerts@sara.nl> wrote:

>A short, slightly off-topic question:
>>       Also note that in this configuration that one cannot take
>> advantage of the "keep the machine up at all costs" features in newer
>> Hadoop's, which require that root, swap, and the log area be mirrored
>> to be truly effective.  I'm not quite convinced that those features are
>> worth it yet for anything smaller than maybe a 12 disk config.
>Dell and Cloudera promote the C2100. I'd like to see the calculations
>behind that config. Am I wrong thinking that keeping your cluster up with
>such dense nodes will only work if you have many (order of magnitude
>100+) of them, and interconnected with 10Gb Ethernet?
>If you don't then recovery times from failing disks / rack switches are
>going to get crazy, right? If you want to get bang for buck, don't the
>proportions "disk IO / processing power", "node storage capacity /
>ethernet speed" and "total amount of nodes / ethernet speed", indicate
>many small nodes with not too many disks and 1Gb Ethernet?

IMO and experience, absolutely.

Get to 40 nodes before you even think about going for high density
machines with more than 5 drives and the higher end network infrastructure.

1Gb Ethernet (or 2x1Gb bonded) with smaller machines (1 quad+ core CPU, 4
drives) and 40 nodes will beat the same total cost cluster with larger (8+
drives, 2x quad core CPUs, double the RAM) nodes (~18 or so, with more
cores but lower MHz) every time. And your failure scenario (re-replicate a
40th of the data versus an 18th if a node fails) is better.
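
A back-of-the-envelope sketch of that failure scenario, in Python. Only
the node counts (40 vs ~18) and drive counts (4 vs 8) come from the text
above. The drive size, fill level, and the share of each link available
for re-replication are made-up illustrative values, and the model (the
surviving nodes split the copy work evenly) is a simplifying assumption,
not how any particular Hadoop version schedules it.

```python
def rereplication_estimate(nodes, drives_per_node, tb_per_drive,
                           link_gbit, fill=0.7, link_share=0.5):
    """Rough cost of losing one node.

    Returns (fraction of cluster data lost, hours to re-replicate it).
    Assumes data is spread evenly and that every surviving node devotes
    `link_share` of its link to re-replication (both are assumptions).
    """
    fraction_lost = 1.0 / nodes
    lost_tb = drives_per_node * tb_per_drive * fill
    survivors = nodes - 1
    aggregate_gbit = survivors * link_gbit * link_share
    seconds = (lost_tb * 8e12) / (aggregate_gbit * 1e9)  # TB -> bits
    return fraction_lost, seconds / 3600.0


# Hypothetical 2 TB drives on 1Gb Ethernet in both clusters.
small_fraction, small_hours = rereplication_estimate(40, 4, 2.0, 1.0)
large_fraction, large_hours = rereplication_estimate(18, 8, 2.0, 1.0)

print(f"40 small nodes: lose {small_fraction:.1%}, ~{small_hours:.1f} h")
print(f"18 large nodes: lose {large_fraction:.1%}, ~{large_hours:.1f} h")
```

Under these assumptions the dense cluster loses both a larger share of
its data and has fewer links to recover over, so the gap in recovery
time is wider than the 1/18 vs 1/40 ratio alone suggests.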

In short, try to get to 50 to 100 nodes before you look at larger
machines.  For larger clusters (200 - thousands) the tradeoffs are very
different -- expensive network infrastructure is required and the
incremental cost of going to 10Gb network isn't as large.

The cost sweet spot for a server goes from ~$4k to ~$10k depending on
cluster size and whether power or space is a larger cost or limiting
factor.

FWIW, Dell R310s are decent small nodes (R410s are popular for more CPU
heavy workloads, but I'm not a fan due to the larger CPU/disk ratio and
much higher CPU$ / (CPU * MHz) ratio).  Next gen single socket, 4 drive
servers with Sandy Bridge Xeon E3-1200 series processors will use even less
power and have about 50% better CPU performance per core than today's
common Nehalem processors used in 2 socket servers, due to much higher
clock speeds and about 15% better performance at the same clock.
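
To make the arithmetic in that paragraph concrete, here is a small Python
sketch. It assumes the simple model "per-core performance = clock speed x
work per clock", and it reads the "CPU" in the CPU$ / (CPU * MHz) ratio as
core count; both are interpretations, not something stated above. The
prices and clock speeds in the second half are hypothetical placeholders,
not real list prices.

```python
def implied_clock_ratio(per_core_gain, per_clock_gain):
    """Clock-speed ratio implied by a per-core gain and a per-clock gain.

    Assumes per-core performance = clock speed * work done per clock.
    """
    return (1.0 + per_core_gain) / (1.0 + per_clock_gain)


def dollars_per_cpu_mhz(cpu_cost, cpus, mhz):
    """CPU$ / (CPU * MHz); `cpus` is treated as core count (assumption)."""
    return cpu_cost / (cpus * mhz)


# Figures quoted above: ~50% better per core, ~15% better at equal clock.
clock_ratio = implied_clock_ratio(0.50, 0.15)
print(f"implied clock speed ratio: {clock_ratio:.2f}x")

# Hypothetical prices and clocks, for illustration only.
single_socket = dollars_per_cpu_mhz(cpu_cost=300.0, cpus=4, mhz=3300.0)
dual_socket = dollars_per_cpu_mhz(cpu_cost=1200.0, cpus=8, mhz=2400.0)
print(f"single socket: ${single_socket:.4f} per core-MHz")
print(f"dual socket:   ${dual_socket:.4f} per core-MHz")
```

With the quoted 50% and 15% figures, the remaining gain has to come from
a clock speed roughly 1.3 times higher.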

The next gen Intel processors after that (Ivy Bridge) promise another big
MHz jump without a power increase in late 2012 / early 2013.  At that
point, I expect that single socket, 4 or 6 core machines will be optimal
for a larger range of use cases than now. Per socket CPU power is
increasing at a faster rate than drive and network performance.

