hbase-user mailing list archives

From Jean-Marc Spaggiari <jean-m...@spaggiari.org>
Subject Re: recommended nodes
Date Wed, 28 Nov 2012 16:33:04 GMT
Hi Mike,

Thanks for all those details!

So to simplify the equation: for 16 virtual cores we need 48 to 64GB,
which means 3 to 4GB per core. So with a quad core, 12GB to 16GB would
be a good start? Or did I simplify it too much?

Regarding the hard drives: if you add more than one drive, do you need
to set them up in RAID or a similar system? Or can Hadoop/HBase be
configured to use more than one drive directly?

Thanks,

JM

2012/11/27, Michael Segel <michael_segel@hotmail.com>:
>
> OK... I don't know why Cloudera is so hung up on 32GB. ;-) [It's an inside
> joke ...]
>
> So here's the problem...
>
> By default, your child processes in a map/reduce job get 512MB each.
> The majority of the time, this gets raised to 1GB.
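For reference, the child-task heap Mike mentions is controlled by `mapred.child.java.opts` in the Hadoop 1.x-era mapred-site.xml. A hedged sketch of bumping it to the 1GB he describes (the value here is illustrative, not a recommendation for every workload):

```xml
<!-- mapred-site.xml (Hadoop 1.x): raise each map/reduce child JVM heap to 1 GB -->
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx1024m</value>
</property>
```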
>
> 8 cores (dual quad cores) show up as 16 virtual processors in Linux. (Note:
> This is why, when people talk about the number of cores, you have to specify
> physical cores or logical cores....)
>
> So if you were to oversubscribe and have, let's say, 12 mappers and 12
> reducers, that's 24 slots. Which means that you would need 24GB of memory
> reserved just for the child processes. On a 32GB node, this would leave 8GB
> for the DN, TT and the rest of the Linux OS processes.
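Mike's slot arithmetic can be sanity-checked with a quick back-of-the-envelope sketch. The 32GB node total is taken from the thread's running example; treat this as rough budgeting only:

```python
# Back-of-the-envelope slot memory budget, using the numbers from this thread.
total_ram_gb = 32           # assumed node total, per the 32GB example above
mappers, reducers = 12, 12  # oversubscribed slot counts from the example
child_heap_gb = 1           # the typical bump from the 512MB default

slots = mappers + reducers               # total task slots on the node
child_total = slots * child_heap_gb      # memory reserved for child JVMs
leftover = total_ram_gb - child_total    # what remains for DN, TT, OS, etc.
print(slots, child_total, leftover)      # 24 slots, 24 GB reserved, 8 GB left

# Bumping child heaps to 2 GB (e.g. to run R) doubles the requirement:
print(slots * 2)                         # 48 GB -- swap territory on a 32 GB box
```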
>
> Can you live with that? Sure.
> Now add in R, HBase, Impala, or some other set of tools on top of the
> cluster.
>
> Oops! Now you are in trouble because you will swap.
> Also, adding in R, you may want to bump those child procs from 1GB to
> 2GB. That means the 24 slots would now require 48GB. Now you will swap,
> and when that happens you will see HBase go into a cascading failure.
>
> So while you can do a rolling restart with the changed configuration
> (reducing the number of mappers and reducers), you end up with fewer slots,
> which means longer run times for your jobs. (Fewer slots == less
> parallelism.)
>
> Looking at the price of memory... you can get 48GB or even 64GB for around
> the same price point. (8GB DIMMs)
>
> And I didn't even talk about adding SOLR, which again is a memory hog... ;-)
>
> Note that I matched the number of mappers with reducers. You could go with
> fewer reducers if you want. I tend to recommend a 2:1 ratio of mappers to
> reducers, depending on the workflow....
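The per-node slot counts behind that ratio are set in the Hadoop 1.x mapred-site.xml. A sketch of a 2:1 split as Mike suggests (the counts 8 and 4 are illustrative for one node; tune them against the memory budget discussed above):

```xml
<!-- mapred-site.xml (Hadoop 1.x): a 2:1 mapper-to-reducer slot ratio, per node -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>8</value>
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>4</value>
</property>
```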
>
> As to the disks... no, 7200 RPM SATA III drives are fine. The SATA III
> interface is pretty much standard in the new kit being shipped.
> It's just that you don't have enough drives: 8 cores should mean 8 spindles,
> if available.
> Otherwise you end up seeing your CPU load climb on wait states as the
> processes wait for the disk i/o to catch up.
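On JM's earlier question about RAID: Hadoop is normally given the drives as plain JBOD, with one mount point per spindle listed in the configs. A hedged sketch using Hadoop 1.x property names (the `/data/N/...` mount paths are hypothetical examples):

```xml
<!-- hdfs-site.xml (Hadoop 1.x): one directory per spindle, JBOD -- no RAID needed -->
<property>
  <name>dfs.data.dir</name>
  <value>/data/1/dfs/dn,/data/2/dfs/dn,/data/3/dfs/dn,/data/4/dfs/dn</value>
</property>
```

MapReduce spill space (`mapred.local.dir` in mapred-site.xml) is typically spread across the same disks the same way, so map output and HDFS i/o share the spindles.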
>
> I mean, you could build out a cluster with 4 x 3.5" 2TB drives in a 1U
> chassis based on price. You're making a trade-off, and you should be aware
> of the performance hit you will take.
>
> HTH
>
> -Mike
>
> On Nov 27, 2012, at 1:52 PM, Jean-Marc Spaggiari <jean-marc@spaggiari.org>
> wrote:
>
>> Hi Michael,
>>
>> So are you recommending 32GB per node?
>>
>> What about the disks? Are SATA drives too slow?
>>
>> JM
>>
>> 2012/11/26, Michael Segel <michael_segel@hotmail.com>:
>>> Uhm, those specs are actually now out of date.
>>>
>>> If you're running HBase, or want to also run R on top of Hadoop, you
>>> will
>>> need to add more memory.
>>> Also, forget 1 GbE, go 10 GbE; and with 2 SATA drives, you will be
>>> disk-i/o bound
>>> way too quickly.
>>>
>>>
>>> On Nov 26, 2012, at 8:05 AM, Marcos Ortiz <mlortiz@uci.cu> wrote:
>>>
>>>> Are you asking about hardware recommendations?
>>>> Eric Sammer, in his "Hadoop Operations" book, did a great job on
>>>> this:
>>>> For mid-size clusters (up to 300 nodes):
>>>> Processor: a dual quad-core 2.6 GHz
>>>> RAM: 24 GB DDR3
>>>> Dual 1 Gb Ethernet NICs
>>>> a SAS drive controller
>>>> at least two SATA II drives in a JBOD configuration
>>>>
>>>> The replication factor depends heavily on the primary use of your
>>>> cluster.
>>>>
>>>> On 11/26/2012 08:53 AM, David Charle wrote:
>>>>> Hi,
>>>>>
>>>>> What are the recommended node counts for NN, HMaster, and ZK for a
>>>>> larger cluster, let's say 50-100+?
>>>>>
>>>>> Also, what would be the ideal replication factor for larger clusters
>>>>> when you have 3-4 racks?
>>>>>
>>>>> --
>>>>> David
>>>>> 10th ANNIVERSARY OF THE FOUNDING OF THE UNIVERSITY OF INFORMATICS
>>>>> SCIENCES...
>>>>> CONNECTED TO THE FUTURE, CONNECTED TO THE REVOLUTION
>>>>>
>>>>> http://www.uci.cu
>>>>> http://www.facebook.com/universidad.uci
>>>>> http://www.flickr.com/photos/universidad_uci
>>>>
>>>> --
>>>>
>>>> Marcos Luis Ortíz Valmaseda
>>>> about.me/marcosortiz <http://about.me/marcosortiz>
>>>> @marcosluis2186 <http://twitter.com/marcosluis2186>
>>>>
>>>>
>>>>
>>>
>>>
>>
>
>
