hbase-user mailing list archives

From Jean-Marc Spaggiari <jean-m...@spaggiari.org>
Subject Re: recommended nodes
Date Wed, 28 Nov 2012 17:26:49 GMT
Hi Mike,

That helped a lot. It just showed me that none of my nodes is correctly specified ;)
But now I know which way to go.

Regarding SATA II vs. SATA III, is there a big difference? I found many
JBOD cards that work with SATA II, but I did not find any (at a good price)
that handle SATA III.

Or could LVM replace a JBOD card? The documentation says that LVM is
suitable for "Creating single logical volumes of multiple physical
volumes or entire hard disks (somewhat similar to RAID 0, but more
similar to JBOD), allowing for dynamic volume resizing." This is what
we want to achieve here, right?

JM


2012/11/28, Adrien Mogenet <adrien.mogenet@gmail.com>:
> Does HBase really benefit from 64 GB of RAM, given that allocating too
> large a heap might increase GC time?
>
> Another question: why not RAID 0, in order to aggregate disk bandwidth
> (and thus keep the 3x replication factor)?
>
>
> On Wed, Nov 28, 2012 at 5:58 PM, Michael Segel
> <michael_segel@hotmail.com> wrote:
>
>> Sorry,
>>
>> I need to clarify.
>>
>> 4GB per physical core is a good starting point.
>> So with 2 quad core chips, that is going to be 32GB.
>>
>> IMHO that's a minimum. If you go with HBase, you will want more.
>> (Actually, you will need more.) The next logical jump would be to 48 or
>> 64GB.
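The rule of thumb above counts physical cores, but Linux lists logical CPUs, which is two per core when Hyper-Threading is on. Below is a minimal Python sketch of the arithmetic. It assumes the 4GB-per-physical-core starting point and two hardware threads per core. The function names are hypothetical.

```python
def physical_cores(sockets: int, cores_per_socket: int) -> int:
    return sockets * cores_per_socket


def logical_cpus(sockets: int, cores_per_socket: int, threads_per_core: int = 2) -> int:
    """CPUs as listed by Linux; threads_per_core=2 assumes Hyper-Threading is on."""
    return physical_cores(sockets, cores_per_socket) * threads_per_core


def baseline_ram_gb(sockets: int, cores_per_socket: int, gb_per_core: int = 4) -> int:
    """Starting-point RAM per node: gb_per_core times the *physical* core count."""
    return physical_cores(sockets, cores_per_socket) * gb_per_core


# Two quad-core chips: 8 physical cores, listed as 16 logical CPUs.
print(baseline_ram_gb(2, 4))  # 32
print(logical_cpus(2, 4))     # 16
```

Applying the same 4GB to all 16 logical CPUs would give 64GB instead. Conversely, 48 to 64GB across 16 logical CPUs works out to 3 to 4GB per logical CPU. That is 6 to 8GB per physical core, which matches the "next logical jump" rather than the 32GB minimum.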
>>
>> If you start to price out memory, then depending on the vendor and your
>> company's procurement, there really isn't much of a price difference
>> between 32, 48, or 64 GB.
>> Note that it also depends on the chips themselves. You also need to see
>> how many memory channels the motherboard has. You may need to buy in
>> pairs or triplets. Your hardware vendor can help you. (Also, you need to
>> keep an eye on your hardware vendor. Sometimes they will give you higher
>> density chips that are going to be more expensive...) ;-)
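The DIMM count has to line up with the board's memory channels. This small Python sketch checks whether a target capacity divides evenly across the channels. The 8GB DIMM size comes from later in this thread. The channel counts (2 for "pairs", 3 for "triplets") are hypothetical inputs that you should confirm with your vendor. On a two-socket board, count the channels of both sockets.

```python
from typing import Tuple


def dimm_layout(total_gb: int, dimm_gb: int, channels: int) -> Tuple[int, bool]:
    """Return (number of DIMMs, whether they populate every channel evenly)."""
    if total_gb % dimm_gb:
        raise ValueError("total_gb must be a multiple of dimm_gb")
    dimms = total_gb // dimm_gb
    return dimms, dimms % channels == 0


for total in (32, 48, 64):
    for channels in (2, 3):  # hypothetical: buying in pairs vs. triplets
        dimms, balanced = dimm_layout(total, 8, channels)
        print(f"{total}GB = {dimms} x 8GB, {channels} channels, balanced={balanced}")
```

With 8GB DIMMs, 48GB (six DIMMs) fills both pair and triplet layouts evenly. 32GB and 64GB only fill pairs evenly.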
>>
>> I tend to like having extra memory from the start.
>> It gives you a bit more freedom and also protects you from 'fat' code.
>>
>> Looking at YARN... you will need more memory too.
>>
>>
>> With respect to the hard drives...
>>
>> The best recommendation is to keep the drives as JBOD and then use 3x
>> replication.
>> In this case, make sure that the disk controller cards can handle JBOD.
>> (Some don't support JBOD out of the box)
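This also bears on JM's question in the quoted message below about using more than one drive without RAID. The HDFS DataNode accepts a comma-separated list of directories and spreads blocks across them. The property is `dfs.data.dir` in Hadoop 1.x and `dfs.datanode.data.dir` in Hadoop 2.x. HBase, which stores its files in HDFS, gets the extra drives through the DataNode. The Python sketch below builds that value. The `/data/N` mount points and the `dfs/dn` subdirectory are a hypothetical naming layout, not something from this thread.

```python
def data_dir_value(num_drives: int, mount_prefix: str = "/data", subdir: str = "dfs/dn") -> str:
    """Comma-separated DataNode directory list, one entry per separately mounted JBOD drive."""
    if num_drives < 1:
        raise ValueError("need at least one drive")
    return ",".join(f"{mount_prefix}/{n}/{subdir}" for n in range(1, num_drives + 1))


def as_property(name: str, value: str) -> str:
    """Render a single hdfs-site.xml <property> element."""
    return (
        "<property>\n"
        f"  <name>{name}</name>\n"
        f"  <value>{value}</value>\n"
        "</property>"
    )


# Hadoop 1.x name; Hadoop 2.x uses dfs.datanode.data.dir for the same setting.
print(as_property("dfs.data.dir", data_dir_value(4)))
```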
>>
>> With respect to RAID...
>>
>> If you are running MapR, no need for RAID.
>> If you are running an Apache derivative, you could use RAID 1 and then cut
>> your replication to 2x. This makes it easier to manage drive failures.
>> (It's not the norm, but it works...) Some clusters use appliances like
>> NetApp's E-Series, where the machines see the drives as locally attached
>> storage, and I think the appliances themselves are using RAID. I haven't
>> played with this configuration; however, it could make sense, and it's a
>> valid design.
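The capacity cost of the RAID 1 option can be counted directly. Each HDFS replica sits on a mirrored pair, so 2x replication stores four physical copies, versus three for JBOD with 3x. Below is a minimal Python sketch. The 12 x 2TB node is a hypothetical example, and the figures ignore any space reserved for MapReduce temp data. RAID 1 with 2x also keeps each block on only two nodes, trading node-level redundancy and capacity for simpler disk swaps.

```python
def physical_copies(hdfs_replication: int, raid1_mirrored: bool) -> int:
    """On-disk copies of each block: HDFS replicas, doubled if every disk is a RAID 1 mirror."""
    return hdfs_replication * (2 if raid1_mirrored else 1)


def usable_tb(raw_tb: float, hdfs_replication: int, raid1_mirrored: bool) -> float:
    """Usable HDFS capacity from raw disk, ignoring temp/MapReduce reserves."""
    return raw_tb / physical_copies(hdfs_replication, raid1_mirrored)


raw = 12 * 2.0  # hypothetical node: 12 x 2TB drives
print(usable_tb(raw, 3, False))  # JBOD + 3x:   8.0
print(usable_tb(raw, 2, True))   # RAID 1 + 2x: 6.0
```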
>>
>> HTH
>>
>> -Mike
>>
>> On Nov 28, 2012, at 10:33 AM, Jean-Marc Spaggiari
>> <jean-marc@spaggiari.org>
>> wrote:
>>
>> > Hi Mike,
>> >
>> > Thanks for all those details!
>> >
>> > So to simplify the equation: for 16 virtual cores we need 48 to 64GB,
>> > which means 3 to 4GB per core. So with quad cores, 12GB to 16GB is a
>> > good start? Or did I simplify it too much?
>> >
>> > Regarding the hard drives: if you add more than one drive, do you need
>> > to build them into a RAID or similar system? Or can Hadoop/HBase be
>> > configured to use more than one drive?
>> >
>> > Thanks,
>> >
>> > JM
>> >
>> > 2012/11/27, Michael Segel <michael_segel@hotmail.com>:
>> >>
>> >> OK... I don't know why Cloudera is so hung up on 32GB. ;-) [It's an
>> >> inside joke ...]
>> >>
>> >> So here's the problem...
>> >>
>> >> By default, the child processes in a map/reduce job get 512MB.
>> >> The majority of the time, this gets raised to 1GB.
>> >>
>> >> 8 cores (dual quad-cores) show up as 16 virtual processors in Linux
>> >> (with Hyper-Threading). (Note: This is why, when people talk about the
>> >> number of cores, you have to specify physical cores or logical cores....)
>> >>
>> >> So if you were to oversubscribe and have, let's say, 12 mappers and 12
>> >> reducers, that's 24 slots, which means that you would need 24GB of
>> >> memory reserved just for the child processes. On a 32GB node, this would
>> >> leave 8GB for the DN, TT, and the rest of the Linux OS processes.
>> >>
>> >> Can you live with that? Sure.
>> >> Now add in R, HBase, Impala, or some other set of tools on top of the
>> >> cluster.
>> >>
>> >> Oops! Now you are in trouble because you will swap.
>> >> Also, when adding in R, you may want to bump up those child procs from
>> >> 1GB to 2GB. That means the 24 slots would now require 48GB. Now you
>> >> swap, and if that happens you will see HBase in a cascading failure.
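Here is the slot arithmetic from these two paragraphs as a small Python sketch. Child-JVM memory is the slot count times the per-child heap. Whatever is left has to cover the DataNode, TaskTracker, OS, and anything else on the node (HBase, R, ...). The function names are hypothetical, and the model ignores JVM memory beyond the configured heap.

```python
def child_memory_gb(mappers: int, reducers: int, gb_per_child: int) -> int:
    """Memory claimed by map/reduce child JVMs when every slot is busy."""
    return (mappers + reducers) * gb_per_child


def headroom_gb(node_ram_gb: int, mappers: int, reducers: int, gb_per_child: int) -> int:
    """RAM left for DN, TT, the OS and other services; negative means the node will swap."""
    return node_ram_gb - child_memory_gb(mappers, reducers, gb_per_child)


print(headroom_gb(32, 12, 12, 1))  # 8: the 1GB-per-child case
print(headroom_gb(32, 12, 12, 2))  # -16: 48GB of child heap on a 32GB node
```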
>> >>
>> >> So while you can do a rolling restart with the changed configuration
>> >> (reducing the number of mappers and reducers), you end up with fewer
>> >> slots, which means longer run times for your jobs. (Fewer slots == less
>> >> parallelism.)
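The flip side of the rolling-restart fix is how many slots still fit once each child gets 2GB. Below is a minimal Python sketch. It assumes the 8GB left for DN, TT, and the OS in the 1GB example is kept as a fixed reserve. With an HBase RegionServer on the same node, its heap would also have to come out of that reserve, so real slot counts would be lower.

```python
def max_slots(node_ram_gb: int, gb_per_child: int, reserve_gb: int) -> int:
    """Largest number of concurrent map+reduce slots that fits without swapping."""
    return max(0, (node_ram_gb - reserve_gb) // gb_per_child)


RESERVE_GB = 8  # assumption: DN + TT + OS, taken from the 1GB-per-child example
print(max_slots(32, 2, RESERVE_GB))  # 12: half of the original 24 slots
print(max_slots(64, 2, RESERVE_GB))  # 28
```

The 64GB line is the case for buying the extra memory up front. With it, the same 2GB children fit in more slots instead of fewer.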
>> >>
>> >> Looking at the price of memory... you can get 48GB or even 64GB for
>> >> around the same price point (with 8GB chips).
>> >>
>> >> And I didn't even talk about adding SOLR, again a memory hog... ;-)
>> >>
>> >> Note that I matched the number of mappers with reducers. You could go
>> >> with fewer reducers if you want. I tend to recommend a 2:1 ratio of
>> >> mappers to reducers, depending on the workflow....
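This small Python sketch splits a fixed slot budget at a given mappers-to-reducers ratio. Rounding the reducer count down and giving the remainder to mappers is an arbitrary choice here, not part of the advice above.

```python
from typing import Tuple


def split_slots(total_slots: int, mappers_per_reducer: int = 2) -> Tuple[int, int]:
    """Return (mappers, reducers) for a mappers_per_reducer:1 ratio."""
    reducers = total_slots // (mappers_per_reducer + 1)
    return total_slots - reducers, reducers


print(split_slots(24))     # (16, 8): 2:1
print(split_slots(24, 1))  # (12, 12): the matched case
print(split_slots(12))     # (8, 4)
```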
>> >>
>> >> As to the disks... no, 7200 RPM SATA III drives are fine. The SATA III
>> >> interface is pretty much available in the new kit being shipped.
>> >> It's just that you don't have enough drives. 8 cores should have 8
>> >> spindles, if available.
>> >> Otherwise you end up seeing your CPU load climb on I/O wait states as
>> >> the processes wait for the disk I/O to catch up.
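Here is a quick Python check of the one-spindle-per-core rule for the 8-core node discussed here. It only counts drives and says nothing about per-drive speed. The 2-drive case corresponds to the spec quoted further down this thread.

```python
def cores_per_spindle(physical_cores: int, drives: int) -> float:
    """How many cores compete for each drive; 1.0 or below meets the rule of thumb."""
    return physical_cores / drives


def drives_short(physical_cores: int, drives: int) -> int:
    """Extra drives needed to reach one spindle per physical core."""
    return max(0, physical_cores - drives)


for drives in (2, 4, 8):
    print(drives, cores_per_spindle(8, drives), drives_short(8, drives))
```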
>> >>
>> >> I mean, you could build out a cluster with 4 x 3 3.5" 2TB drives in a 1U
>> >> chassis based on price. You're making a trade-off, and you should be
>> >> aware of the performance hit you will take.
>> >>
>> >> HTH
>> >>
>> >> -Mike
>> >>
>> >> On Nov 27, 2012, at 1:52 PM, Jean-Marc Spaggiari
>> >> <jean-marc@spaggiari.org> wrote:
>> >>
>> >>> Hi Michael,
>> >>>
>> >>> So are you recommending 32GB per node?
>> >>>
>> >>> What about the disks? Are SATA drives too slow?
>> >>>
>> >>> JM
>> >>>
>> >>> 2012/11/26, Michael Segel <michael_segel@hotmail.com>:
>> >>>> Uhm, those specs are actually now out of date.
>> >>>>
>> >>>> If you're running HBase, or want to also run R on top of Hadoop, you
>> >>>> will need to add more memory.
>> >>>> Also, forget 1GbE, go with 10GbE; and with 2 SATA drives, you will be
>> >>>> disk I/O bound way too quickly.
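Below is a rough Python bandwidth comparison behind the 10GbE advice. Line rates follow from the link speed: 1Gbit/s is 125MB/s before protocol overhead. The per-drive streaming rate, however, is a hypothetical placeholder to replace with a measured figure. The sketch also ignores seeks, which are what hurt most when many tasks share two spindles.

```python
def nic_mb_per_s(gbit_per_s: float) -> float:
    """Raw line rate in MB/s (decimal units), before protocol overhead."""
    return gbit_per_s * 1000 / 8


def disks_mb_per_s(drives: int, per_drive_mb_s: float) -> float:
    """Aggregate sequential throughput if every drive streams at the same rate."""
    return drives * per_drive_mb_s


PER_DRIVE_MB_S = 100.0  # hypothetical; measure your own drives
for drives in (2, 8):
    print(f"{drives} drives: {disks_mb_per_s(drives, PER_DRIVE_MB_S):.0f} MB/s, "
          f"1GbE: {nic_mb_per_s(1):.0f} MB/s, 10GbE: {nic_mb_per_s(10):.0f} MB/s")
```

Under that assumption, 8 drives could stream several times what a single 1GbE link carries.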
>> >>>>
>> >>>>
>> >>>> On Nov 26, 2012, at 8:05 AM, Marcos Ortiz <mlortiz@uci.cu> wrote:
>> >>>>
>> >>>>> Are you asking about hardware recommendations?
>> >>>>> Eric Sammer, in his "Hadoop Operations" book, did a great job on
>> >>>>> this:
>> >>>>> For mid-size clusters (up to 300 nodes):
>> >>>>> Processor: a dual quad-core 2.6 GHz
>> >>>>> RAM: 24 GB DDR3
>> >>>>> Dual 1 Gb Ethernet NICs
>> >>>>> A SAS drive controller
>> >>>>> At least two SATA II drives in a JBOD configuration
>> >>>>>
>> >>>>> The replication factor depends heavily on the primary use of your
>> >>>>> cluster.
>> >>>>>
>> >>>>> On 11/26/2012 08:53 AM, David Charle wrote:
>> >>>>>> hi
>> >>>>>>
>> >>>>>> What are the recommended nodes for the NN, HMaster, and ZK nodes for
>> >>>>>> a larger cluster, let's say 50-100+?
>> >>>>>>
>> >>>>>> Also, what would be the ideal replication factor for larger clusters
>> >>>>>> when you have 3-4 racks?
>> >>>>>>
>> >>>>>> --
>> >>>>>> David
>> >>>>>> 10th ANNIVERSARY OF THE FOUNDING OF THE UNIVERSIDAD DE LAS CIENCIAS
>> >>>>>> INFORMATICAS...
>> >>>>>> CONNECTED TO THE FUTURE, CONNECTED TO THE REVOLUTION
>> >>>>>>
>> >>>>>> http://www.uci.cu
>> >>>>>> http://www.facebook.com/universidad.uci
>> >>>>>> http://www.flickr.com/photos/universidad_uci
>> >>>>>
>> >>>>> --
>> >>>>>
>> >>>>> Marcos Luis Ortíz Valmaseda
>> >>>>> about.me/marcosortiz <http://about.me/marcosortiz>
>> >>>>> @marcosluis2186 <http://twitter.com/marcosluis2186>
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>
>> >>>>
>> >>>
>> >>
>> >>
>> >
>>
>>
>
>
> --
> Adrien Mogenet
> 06.59.16.64.22
> http://www.mogenet.me
>
