hbase-user mailing list archives

From Jean-Marc Spaggiari <jean-m...@spaggiari.org>
Subject Re: recommended nodes
Date Thu, 20 Dec 2012 22:07:47 GMT
I did the test with a 2GB file... So read and write were spread over the 2
drives for RAID0.

Those tests were to give an overall idea of the performance vs. CPU usage,
etc., and you might need to adjust them based on the way things are configured
on your system.

I don't know how RAID0 manages small files (<=64k), but maybe they're still
spread over the 2 disks too?
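
Whether a <=64k read touches one disk or both mostly depends on the RAID0
chunk (stripe unit) size. A rough sketch of the idea, assuming a plain
round-robin chunk layout (the class, chunk size and numbers below are just for
illustration, not from my benchmark):

  import java.util.Set;
  import java.util.TreeSet;

  public class Raid0Sketch {
      // Which RAID0 member disks does a read of `length` bytes at `offset` touch?
      static Set<Integer> disksTouched(long offset, long length, long chunkSize, int disks) {
          Set<Integer> touched = new TreeSet<>();
          for (long c = offset / chunkSize; c <= (offset + length - 1) / chunkSize; c++) {
              touched.add((int) (c % disks)); // chunk c lives on disk c mod N
          }
          return touched;
      }

      public static void main(String[] args) {
          // A 64 KB read with a 512 KB chunk size can sit entirely on one disk...
          System.out.println(disksTouched(0, 64 * 1024, 512 * 1024, 2));  // [0]
          // ...while a 2 GB sequential read is spread across both disks.
          System.out.println(disksTouched(0, 2L << 30, 512 * 1024, 2));   // [0, 1]
      }
  }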

JM

2012/12/20 Varun Sharma <varun@pinterest.com>

> Hmm, I thought that RAID0 simply stripes across all disks. So if you got 4
> disks - an HFile block for example could get striped across 4 disks. So to
> read that block, you would need all 4 of them to seek so that you could
> read all 4 stripes for that HFile block. This could make things as slow as
> the slowest seeking disk for that random read. However, certainly, data
> xfer rate would be much faster with RAID0 but since this is merely 64K for
> a HFile block, I would have expected the seek latency to play a major role
> and not really the data xfer latency.
>
> However, your tests indeed show that RAID0 still outperforms JBOD on seeks.
> Am I missing something ?
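>
> For reference, the back-of-the-envelope I have in mind (assumed numbers, not
> measurements):
>
>   public class SeekVsTransfer {
>       public static void main(String[] args) {
>           double seekMs = 10.0;                            // assumed 7200 rpm seek + rotational delay
>           double transferMs = 64.0 / (100 * 1024) * 1000;  // 64 KB at ~100 MB/s ~= 0.6 ms
>           System.out.printf("seek ~%.1f ms vs transfer ~%.2f ms%n", seekMs, transferMs);
>           // If the block is striped over several disks, the read still waits on
>           // the slowest of those seeks, so the ~10 ms term doesn't go away.
>       }
>   }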
>
> On Thu, Dec 20, 2012 at 1:26 PM, Jean-Marc Spaggiari <
> jean-marc@spaggiari.org> wrote:
>
> > Hi Varun,
> >
> > The hard drives I used are now used on the hadoop/hbase cluster, but they
> > were clean and formatted for the tests I did. The computer where I ran
> > those tests was one of the region servers. It was re-installed to be
> > completely clean, and it's now running a datanode and a RS.
> >
> > Regarding RAID, I think you are confusing RAID0 and RAID1. It's RAID1
> > which needs to access both copies each time. RAID0 is more like JBOD, but
> > faster.
> >
> > JM
> >
> > 2012/12/20 Varun Sharma <varun@pinterest.com>
> >
> > > Hi Jean,
> > >
> > > Very interesting benchmark - how are these numbers arrived at? Is this
> > > on a real hbase cluster? To me, it felt kind of counterintuitive that
> > > RAID0 beats JBOD on random seeks, because with RAID0 all disks need to
> > > seek at the same time and the performance should basically be as bad as
> > > the slowest seeking disk.
> > >
> > > Varun
> > >
> > > On Wed, Dec 19, 2012 at 5:14 PM, Michael Segel <michael_segel@hotmail.com>
> > > wrote:
> > >
> > > > Yeah,
> > > > I couldn't argue against LVMs when talking with the system admins.
> > > > In terms of speed it's noise, because the CPUs are pretty efficient and
> > > > unless you have more than 1 drive per physical core, you will end up
> > > > saturating your disk I/O.
> > > >
> > > > In terms of MapR, you want the raw disk. (But we're talking Apache)
> > > >
> > > >
> > > > On Dec 19, 2012, at 4:59 PM, Jean-Marc Spaggiari <jean-marc@spaggiari.org>
> > > > wrote:
> > > >
> > > > > Finally, it took me a while to run those tests because it was way
> > > > > longer than expected, but here are the results:
> > > > >
> > > > > http://www.spaggiari.org/bonnie.html
> > > > >
> > > > > LVM is not really slower than JBOD and not really taking more CPU. So
> > > > > I will say, if you have to choose between the 2, take the one you
> > > > > prefer. Personally, I prefer LVM because it's easy to configure.
> > > > >
> > > > > The big winner here is RAID0. It's WAY faster than anything else. But
> > > > > it's using twice the space... Your choice.
> > > > >
> > > > > I did not get a chance to test with the Ubuntu tool because it's not
> > > > > working with LVM drives.
> > > > >
> > > > > JM
> > > > >
> > > > > 2012/11/28, Michael Segel <michael_segel@hotmail.com>:
> > > > >> Ok, just a caveat.
> > > > >>
> > > > >> I am discussing MapR as part of a complete response. As Mohit posted,
> > > > >> MapR takes the raw device for their MapR File System.
> > > > >> They do stripe on their own within what they call a volume.
> > > > >>
> > > > >> But going back to Apache...
> > > > >> You can stripe drives, however I wouldn't recommend it. I don't think
> > > > >> the performance gains would really matter.
> > > > >> You're going to end up getting blocked first by disk i/o, then your
> > > > >> controller card, then your network... assuming 10GbE.
> > > > >>
> > > > >> With only 2 disks on an 8 core system, you will hit disk i/o first and
> > > > >> then you'll watch your CPU wait I/O climb.
> > > > >>
> > > > >> HTH
> > > > >>
> > > > >> -Mike
> > > > >>
> > > > >> On Nov 28, 2012, at 7:28 PM, Jean-Marc Spaggiari
> > > > >> <jean-marc@spaggiari.org> wrote:
> > > > >>
> > > > >>> Hi Mike,
> > > > >>>
> > > > >>> Why not use LVM with MapR? Since LVM is reading from 2 drives almost
> > > > >>> at the same time, it should be better than RAID0 or a single drive,
> > > > >>> no?
> > > > >>>
> > > > >>> 2012/11/28, Michael Segel <michael_segel@hotmail.com>:
> > > > >>>> Just a couple of things.
> > > > >>>>
> > > > >>>> I'm neutral on the use of LVMs. Some would point out that there's
> > > > >>>> some overhead, but on the flip side, it can make managing the
> > > > >>>> machines easier.
> > > > >>>> If you're using MapR, you don't want to use LVMs but raw devices.
> > > > >>>>
> > > > >>>> In terms of GC, it's going to depend on the heap size and not the
> > > > >>>> total memory. With respect to HBase... MSLAB is the way to go.
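> > > > >>>>
> > > > >>>> (A minimal sketch of turning MSLAB on, assuming the stock
> > > > >>>> hbase.hregion.memstore.mslab.* properties; normally you'd set these
> > > > >>>> in hbase-site.xml rather than in code.)
> > > > >>>>
> > > > >>>>   import org.apache.hadoop.conf.Configuration;
> > > > >>>>   import org.apache.hadoop.hbase.HBaseConfiguration;
> > > > >>>>
> > > > >>>>   public class MslabCheck {
> > > > >>>>       public static void main(String[] args) {
> > > > >>>>           Configuration conf = HBaseConfiguration.create();
> > > > >>>>           // enabled by default on recent releases, but worth checking explicitly
> > > > >>>>           conf.setBoolean("hbase.hregion.memstore.mslab.enabled", true);
> > > > >>>>           // allocation chunk size; 2 MB is the usual default
> > > > >>>>           conf.setInt("hbase.hregion.memstore.mslab.chunksize", 2 * 1024 * 1024);
> > > > >>>>           System.out.println(conf.get("hbase.hregion.memstore.mslab.enabled"));
> > > > >>>>       }
> > > > >>>>   }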
> > > > >>>>
> > > > >>>>
> > > > >>>> On Nov 28, 2012, at 12:05 PM, Jean-Marc Spaggiari
> > > > >>>> <jean-marc@spaggiari.org>
> > > > >>>> wrote:
> > > > >>>>
> > > > >>>>> Hi Gregory,
> > > > >>>>>
> > > > >>>>> I found this about LVM:
> > > > >>>>> -> http://blog.andrew.net.au/2006/08/09
> > > > >>>>> -> http://www.phoronix.com/scan.php?page=article&item=fedora_15_lvm&num=2
> > > > >>>>>
> > > > >>>>> Seems that performance is still decent with it. I will most
> > > > >>>>> probably give it a try and bench that too... I have one new hard
> > > > >>>>> drive which should arrive tomorrow. Perfect timing ;)
> > > > >>>>>
> > > > >>>>>
> > > > >>>>>
> > > > >>>>> JM
> > > > >>>>>
> > > > >>>>> 2012/11/28, Mohit Anchlia <mohitanchlia@gmail.com>:
> > > > >>>>>>
> > > > >>>>>>
> > > > >>>>>>
> > > > >>>>>>
> > > > >>>>>> On Nov 28, 2012, at 9:07 AM, Adrien Mogenet
> > > > >>>>>> <adrien.mogenet@gmail.com> wrote:
> > > > >>>>>>
> > > > >>>>>>> Does HBase really benefit from 64 GB of RAM, since allocating too
> > > > >>>>>>> large a heap might increase GC time?
> > > > >>>>>>>
> > > > >>>>>> The benefit you get is from the OS cache.
> > > > >>>>>>> Another question: why not RAID 0, in order to aggregate disk
> > > > >>>>>>> bandwidth? (and thus keep 3x replication factor)
> > > > >>>>>>>
> > > > >>>>>>>
> > > > >>>>>>> On Wed, Nov 28, 2012 at 5:58 PM, Michael Segel
> > > > >>>>>>> <michael_segel@hotmail.com> wrote:
> > > > >>>>>>>
> > > > >>>>>>>> Sorry,
> > > > >>>>>>>>
> > > > >>>>>>>> I need to clarify.
> > > > >>>>>>>>
> > > > >>>>>>>> 4GB per physical core is a good starting point.
> > > > >>>>>>>> So with 2 quad core chips, that is going to be 32GB.
> > > > >>>>>>>>
> > > > >>>>>>>> IMHO that's a minimum. If you go with HBase, you will want more.
> > > > >>>>>>>> (Actually you will need more.) The next logical jump would be to
> > > > >>>>>>>> 48 or 64GB.
> > > > >>>>>>>>
> > > > >>>>>>>> If we start to price out memory, depending on vendor and your
> > > > >>>>>>>> company's procurement, there really isn't much of a price
> > > > >>>>>>>> difference in terms of 32, 48, or 64 GB.
> > > > >>>>>>>> Note that it also depends on the chips themselves. Also you need
> > > > >>>>>>>> to see how many memory channels exist on the motherboard. You may
> > > > >>>>>>>> need to buy in pairs or triplets. Your hardware vendor can help
> > > > >>>>>>>> you. (Also you need to keep an eye on your hardware vendor.
> > > > >>>>>>>> Sometimes they will give you higher density chips that are going
> > > > >>>>>>>> to be more expensive...) ;-)
> > > > >>>>>>>>
> > > > >>>>>>>> I tend to like having extra memory from the start.
> > > > >>>>>>>> It gives you a bit more freedom and also protects you from 'fat'
> > > > >>>>>>>> code.
> > > > >>>>>>>>
> > > > >>>>>>>> Looking at YARN... you will need more memory too.
> > > > >>>>>>>>
> > > > >>>>>>>>
> > > > >>>>>>>> With respect to the hard drives...
> > > > >>>>>>>>
> > > > >>>>>>>> The best recommendation is to keep the drives as JBOD and then
> > > > >>>>>>>> use 3x replication.
> > > > >>>>>>>> In this case, make sure that the disk controller cards can handle
> > > > >>>>>>>> JBOD. (Some don't support JBOD out of the box.)
> > > > >>>>>>>>
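> > > > >>>>>>>> (A minimal sketch of the JBOD setup, assuming the Hadoop 1.x
> > > > >>>>>>>> property name dfs.data.dir; normally this goes in hdfs-site.xml,
> > > > >>>>>>>> the Java form below is just to show the value.)
> > > > >>>>>>>>
> > > > >>>>>>>>   import org.apache.hadoop.conf.Configuration;
> > > > >>>>>>>>
> > > > >>>>>>>>   public class JbodDataDirs {
> > > > >>>>>>>>       public static void main(String[] args) {
> > > > >>>>>>>>           Configuration conf = new Configuration();
> > > > >>>>>>>>           // one local directory per physical disk; the DataNode
> > > > >>>>>>>>           // round-robins block writes across them, no RAID needed
> > > > >>>>>>>>           conf.set("dfs.data.dir",
> > > > >>>>>>>>               "/data/1/dfs/dn,/data/2/dfs/dn,/data/3/dfs/dn");
> > > > >>>>>>>>           // same idea for MapReduce spill space
> > > > >>>>>>>>           conf.set("mapred.local.dir",
> > > > >>>>>>>>               "/data/1/mapred/local,/data/2/mapred/local");
> > > > >>>>>>>>           System.out.println(conf.get("dfs.data.dir"));
> > > > >>>>>>>>       }
> > > > >>>>>>>>   }
> > > > >>>>>>>>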
> > > > >>>>>>>> With respect to RAID...
> > > > >>>>>>>>
> > > > >>>>>>>> If you are running MapR, no need for RAID.
> > > > >>>>>>>> If you are running an Apache derivative, you could use RAID 1 and
> > > > >>>>>>>> then cut your replication to 2x. This makes it easier to manage
> > > > >>>>>>>> drive failures. (It's not the norm, but it works...) In some
> > > > >>>>>>>> clusters, they are using appliances like NetApp's E-Series, where
> > > > >>>>>>>> the machines see the drives as local attached storage and I think
> > > > >>>>>>>> the appliances themselves are using RAID. I haven't played with
> > > > >>>>>>>> this configuration, however it could make sense and it's a valid
> > > > >>>>>>>> design.
> > > > >>>>>>>>
> > > > >>>>>>>> HTH
> > > > >>>>>>>>
> > > > >>>>>>>> -Mike
> > > > >>>>>>>>
> > > > >>>>>>>> On Nov 28, 2012, at 10:33 AM, Jean-Marc Spaggiari
> > > > >>>>>>>> <jean-marc@spaggiari.org> wrote:
> > > > >>>>>>>>
> > > > >>>>>>>>> Hi Mike,
> > > > >>>>>>>>>
> > > > >>>>>>>>> Thanks for all those details!
> > > > >>>>>>>>>
> > > > >>>>>>>>> So to simplify the equation, for 16 virtual cores we need 48 to
> > > > >>>>>>>>> 64GB. Which means 3 to 4GB per core. So with quad cores, 12GB to
> > > > >>>>>>>>> 16GB are a good start? Or did I simplify it too much?
> > > > >>>>>>>>>
> > > > >>>>>>>>> Regarding the hard drives: if you add more than one drive, do you
> > > > >>>>>>>>> need to build them into RAID or similar systems? Or can
> > > > >>>>>>>>> Hadoop/HBase be configured to use more than one drive?
> > > > >>>>>>>>>
> > > > >>>>>>>>> Thanks,
> > > > >>>>>>>>>
> > > > >>>>>>>>> JM
> > > > >>>>>>>>>
> > > > >>>>>>>>> 2012/11/27, Michael Segel <michael_segel@hotmail.com>:
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> OK... I don't know why Cloudera is so hung up on 32GB. ;-)
> > > > >>>>>>>>>> [It's an inside joke ...]
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> So here's the problem...
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> By default, your child processes in a map/reduce job get a
> > > > >>>>>>>>>> default of 512MB. The majority of the time, this gets raised to
> > > > >>>>>>>>>> 1GB.
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> 8 cores (dual quad cores) show up as 16 virtual processors in
> > > > >>>>>>>>>> Linux. (Note: This is why when people talk about the number of
> > > > >>>>>>>>>> cores, you have to specify physical cores or logical cores....)
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> So if you were to oversubscribe and have, let's say, 12 mappers
> > > > >>>>>>>>>> and 12 reducers, that's 24 slots. Which means that you would
> > > > >>>>>>>>>> need 24GB of memory reserved just for the child processes. This
> > > > >>>>>>>>>> would leave 8GB for DN, TT and the rest of the linux OS
> > > > >>>>>>>>>> processes.
> > > > >>>>>>>>>>
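> > > > >>>>>>>>>> (A rough sketch of where those numbers come from, assuming the
> > > > >>>>>>>>>> MRv1 property names; the slot counts and heap size are the
> > > > >>>>>>>>>> example values from above, normally set in mapred-site.xml.)
> > > > >>>>>>>>>>
> > > > >>>>>>>>>>   import org.apache.hadoop.conf.Configuration;
> > > > >>>>>>>>>>
> > > > >>>>>>>>>>   public class SlotMemoryBudget {
> > > > >>>>>>>>>>       public static void main(String[] args) {
> > > > >>>>>>>>>>           Configuration conf = new Configuration();
> > > > >>>>>>>>>>           conf.setInt("mapred.tasktracker.map.tasks.maximum", 12);
> > > > >>>>>>>>>>           conf.setInt("mapred.tasktracker.reduce.tasks.maximum", 12);
> > > > >>>>>>>>>>           conf.set("mapred.child.java.opts", "-Xmx1024m"); // 1 GB per child JVM
> > > > >>>>>>>>>>           int slots = 12 + 12;
> > > > >>>>>>>>>>           // 24 slots * 1 GB = 24 GB before the DN, TT and OS get anything
> > > > >>>>>>>>>>           System.out.println("worst case child heap: " + slots + " GB");
> > > > >>>>>>>>>>       }
> > > > >>>>>>>>>>   }
> > > > >>>>>>>>>>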
> > > > >>>>>>>>>> Can you live with that? Sure.
> > > > >>>>>>>>>> Now add in R, HBase, Impala, or some other set of tools on top
> > > > >>>>>>>>>> of the cluster.
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> Ooops! Now you are in trouble because you will swap.
> > > > >>>>>>>>>> Also adding in R, you may want to bump up those child procs
> > > > >>>>>>>>>> from 1GB to 2GB. That means the 24 slots would now require
> > > > >>>>>>>>>> 48GB. Now you have swap, and if that happens you will see HBase
> > > > >>>>>>>>>> in a cascading failure.
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> So while you can do a rolling restart with the changed
> > > > >>>>>>>>>> configuration (reducing the number of mappers and reducers),
> > > > >>>>>>>>>> you end up with fewer slots, which will mean longer run times
> > > > >>>>>>>>>> for your jobs. (Fewer slots == less parallelism)
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> Looking at the price of memory... you can get 48GB or even
> > > > >>>>>>>>>> 64GB for around the same price point. (8GB chips)
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> And I didn't even talk about adding SOLR either, again a memory
> > > > >>>>>>>>>> hog... ;-)
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> Note that I matched the number of mappers with reducers. You
> > > > >>>>>>>>>> could go with fewer reducers if you want. I tend to recommend a
> > > > >>>>>>>>>> ratio of 2:1 mappers to reducers, depending on the work flow....
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> As to the disks... no, 7200 rpm SATA III drives are fine. The
> > > > >>>>>>>>>> SATA III interface is pretty much available in the new kit
> > > > >>>>>>>>>> being shipped.
> > > > >>>>>>>>>> It's just that you don't have enough drives. 8 cores should be
> > > > >>>>>>>>>> 8 spindles, if available.
> > > > >>>>>>>>>> Otherwise you end up seeing your CPU load climb on wait states
> > > > >>>>>>>>>> as the processes wait for the disk i/o to catch up.
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> I mean, you could build out a cluster with 4 x 3.5" 2TB drives
> > > > >>>>>>>>>> in a 1U chassis based on price. You're making a trade-off and
> > > > >>>>>>>>>> you should be aware of the performance hit you will take.
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> HTH
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> -Mike
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> On Nov 27, 2012, at 1:52 PM, Jean-Marc Spaggiari
> > > > >>>>>>>>>> <jean-marc@spaggiari.org> wrote:
> > > > >>>>>>>>>>
> > > > >>>>>>>>>>> Hi Michael,
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>> so are you recommending 32GB per node?
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>> What about the disks? Are SATA drives too slow?
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>> JM
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>> 2012/11/26, Michael Segel <michael_segel@hotmail.com>:
> > > > >>>>>>>>>>>> Uhm, those specs are actually now out of date.
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>> If you're running HBase, or want to also run R on top of
> > > > >>>>>>>>>>>> Hadoop, you will need to add more memory.
> > > > >>>>>>>>>>>> Also forget 1GbE, go 10GbE, and with 2 SATA drives, you will
> > > > >>>>>>>>>>>> be disk i/o bound way too quickly.
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>> On Nov 26, 2012, at 8:05 AM, Marcos Ortiz <mlortiz@uci.cu>
> > > > >>>>>>>>>>>> wrote:
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>>> Are you asking about hardware recommendations?
> > > > >>>>>>>>>>>>> Eric Sammer, in his "Hadoop Operations" book, did a great job
> > > > >>>>>>>>>>>>> on this.
> > > > >>>>>>>>>>>>> For mid-size clusters (up to 300 nodes):
> > > > >>>>>>>>>>>>> Processor: a dual quad-core 2.6 GHz
> > > > >>>>>>>>>>>>> RAM: 24 GB DDR3
> > > > >>>>>>>>>>>>> Dual 1 Gb Ethernet NICs
> > > > >>>>>>>>>>>>> A SAS drive controller
> > > > >>>>>>>>>>>>> At least two SATA II drives in a JBOD configuration
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>> The replication factor depends heavily on the primary use of
> > > > >>>>>>>>>>>>> your cluster.
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>> On 11/26/2012 08:53 AM, David Charle wrote:
> > > > >>>>>>>>>>>>>> hi
> > > > >>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>> what are the recommended nodes for NN, hmaster and zk nodes
> > > > >>>>>>>>>>>>>> for a larger cluster, let's say 50-100+?
> > > > >>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>> also, what would be the ideal replication factor for larger
> > > > >>>>>>>>>>>>>> clusters when you have 3-4 racks?
> > > > >>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>> --
> > > > >>>>>>>>>>>>>> David
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>> --
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>> Marcos Luis Ortíz Valmaseda
> > > > >>>>>>>>>>>>> about.me/marcosortiz <http://about.me/marcosortiz>
> > > > >>>>>>>>>>>>> @marcosluis2186 <http://twitter.com/marcosluis2186>
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>
> > > > >>>>>>>
> > > > >>>>>>> --
> > > > >>>>>>> Adrien Mogenet
> > > > >>>>>>> 06.59.16.64.22
> > > > >>>>>>> http://www.mogenet.me
> > > > >>>>>>
> > > > >>>>>
> > > > >>>>
> > > > >>>>
> > > > >>>
> > > > >>
> > > > >>
> > > > >
> > > >
> > > >
> > >
> >
>
