hadoop-common-user mailing list archives

From Michel Segel <michael_se...@hotmail.com>
Subject Re: Hadoop and hardware
Date Sat, 17 Dec 2011 11:30:51 GMT
Uhm... If I may add something...

Joep is correct. There are a lot of factors that will affect your cluster design.
And there have been a lot of threads on this topic, because hardware prices change
frequently, along with advances in technology and non-commodity solutions aimed at niche spaces.
Plus, this is the biggest decision that you can't easily change, so you're forced to live
with it...
(I think there's a potential blog in this ...)

Going from memory at 4:30 am is not a good thing to do, but I believe a standard rack
has 42 1U spaces, so you can fit 20 2U boxes in your rack and still have room for your
ToR switch. There is also the issue of power and cooling, which may take up space too...
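To make that arithmetic concrete, here's a quick sketch. The 42U rack size comes from the thread; reserving 2U for the ToR switch plus a PDU/patch panel is my assumption, so adjust for your facility:

```python
# Rough rack-capacity arithmetic.
RACK_UNITS = 42   # standard rack, per the thread
NODE_U = 2        # 2U data nodes
RESERVED_U = 2    # assumption: 1U ToR switch + 1U PDU/patch panel

nodes_per_rack = (RACK_UNITS - RESERVED_U) // NODE_U
print(nodes_per_rack)  # 20
```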

The one common question that no one seems to ask is... "What are your constraints?"
For some it may be physical space; for others budget... power... hardware availability...
That one question will have a big impact on your cluster design, and nobody asks it.

With respect to quad socket vs dual socket...

There was a post on Cloudera's site which recommended 2 drives per core, so with 16 cores you
would have 32 spindles. To maximize your data density, you will want 3.5" drives. I don't think
you can fit 16 3.5" drives in a 2U box, let alone 32... Note that I didn't even consider
24 cores...
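Spelled out, the rule of thumb from that post works out like this (the 2-drives-per-core ratio is from the Cloudera recommendation cited above; the core counts are the ones discussed in this thread):

```python
# Spindle count from the "2 drives per core" rule of thumb.
drives_per_core = 2
for cores in (16, 24):
    spindles = cores * drives_per_core
    print(cores, "cores ->", spindles, "spindles")
# 16 cores -> 32 spindles (already more 3.5" bays than a 2U chassis holds)
# 24 cores -> 48 spindles
```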

And as Joep points out, with this much disk, 1 GbE, even port bonded, isn't going to cut it...
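A back-of-envelope comparison shows why. The per-disk throughput (~100 MB/s sequential for a SATA drive) and the 1 GbE line rate (~125 MB/s) are my assumed figures, not from the thread:

```python
# Aggregate disk bandwidth vs. a bonded pair of 1 GbE links.
disks = 12                  # e.g. Joep's 12-disk node
per_disk_mb_s = 100         # assumption: ~100 MB/s sequential per SATA disk
gbe_mb_s = 125              # 1 Gbit/s ~= 125 MB/s line rate

aggregate_disk = disks * per_disk_mb_s   # 1200 MB/s
bonded_pair = 2 * gbe_mb_s               # 250 MB/s
print(aggregate_disk > bonded_pair)      # True: the network is the bottleneck
```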

Lots of things to think about...

Sent from a remote device. Please excuse any typos...

Mike Segel

On Dec 16, 2011, at 10:47 AM, "J. Rottinghuis" <jrottinghuis@gmail.com> wrote:

> Pierre,
> As discussed in recent other threads, it depends.
> The most sensible thing for Hadoop nodes is to find a sweet spot for
> price/performance.
> In general that will mean keeping a balance between compute power, disks,
> and network bandwidth, and factor in racks, space, operating costs etc.
> How much storage capacity are you thinking of when you target "about 120
> data nodes"?
> If you had for example 60 quad core nodes with 12 * 2 TB disks (or more) I
> would suspect you would be bottle-necked on your 1GB network connections.
> Other things to consider: how many nodes per rack? If these 60 nodes
> were 2U and you fit 20 nodes in a rack, then losing one top-of-rack
> switch means losing 1/3 of the capacity of your cluster.
> Yet another consideration is how easily you want to be able to expand your
> cluster incrementally? Until you run Hadoop 0.23 you probably want all your
> nodes to be roughly similar in capacity.
> Cheers,
> Joep
> On Fri, Dec 16, 2011 at 3:50 AM, Cussol <pierre.cussol@cnes.fr> wrote:
>> In my company, we intend to set up a Hadoop cluster to run analytics
>> applications. This cluster would have about 120 data nodes with dual-socket
>> servers and a GbE interconnect. We are also exploring a solution with 60
>> quad-socket servers. How do quad-socket and dual-socket servers compare
>> in a Hadoop cluster?
>> Any help?
>> pierre
>> pierre
>> --
>> View this message in context:
>> http://old.nabble.com/Hadoop-and-hardware-tp32987374p32987374.html
>> Sent from the Hadoop core-user mailing list archive at Nabble.com.
