hbase-user mailing list archives

From Vidhyashankar Venkataraman <vidhy...@yahoo-inc.com>
Subject Re: Performance at large number of regions/node
Date Wed, 02 Jun 2010 15:24:28 GMT
         I increased the flush size to 800M. Now two things have started to happen. (I forgot
to mention in my previous mails that the data is uncompressed: it is generated at random
anyway, and its size is about the size of the compressed data we will be handling.)

    (a) The flush size isn't being reached because the global memstore is hitting its max more
often. I will change the max and min global memstore limits and see what happens.
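
A rough sketch of the arithmetic behind (a), not taken from the thread itself. It assumes the "max and min global memstore" are heap fractions (in HBase of this era, hbase.regionserver.global.memstore.upperLimit and lowerLimit) and that they are set to 0.40 and 0.35; those two values are assumptions, picked only because they reproduce the 1.2g and roughly 1.0g figures in the log below for the 3 GB heap mentioned later in this message. The function names are made up for illustration.

```python
GB = 1024 ** 3
MB = 1024 ** 2

def global_memstore_limits(heap_bytes, upper_fraction, lower_fraction):
    """Return (forced-flush trigger, flush-down-to target) in bytes."""
    return heap_bytes * upper_fraction, heap_bytes * lower_fraction

def regions_able_to_fill(global_limit_bytes, flush_size_bytes):
    """How many memstores can reach the flush size before the global limit trips."""
    return int(global_limit_bytes // flush_size_bytes)

heap = 3 * GB  # heap size stated in this message
# 0.40 and 0.35 are assumed fractions, chosen because they match the log.
upper, lower = global_memstore_limits(heap, 0.40, 0.35)

print(f"forced flushes start at {upper / GB:.2f} GB and stop at {lower / GB:.2f} GB")
for flush_mb in (100, 800):
    n = regions_able_to_fill(upper, flush_mb * MB)
    print(f"flush size {flush_mb} MB: at most {n} region(s) can fill a memstore")
```

Under these assumptions only one region at a time could ever reach an 800M memstore, which is consistent with the forced flush of a 164.1m memstore shown below.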

2010-06-02 15:07:28,424 INFO org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Forced flushing of DocData,68372423,1275489341661 because global memstore limit of 1.2g exceeded; currently 1.2g and flushing till 1.0g
2010-06-02 15:07:28,424 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Started memstore flush for region DocData,68372423,1275489341661. Current region memstore size 164.1m
2010-06-02 15:07:29,516 DEBUG org.apache.hadoop.hbase.regionserver.Store: Added hdfs://b5120231.yst.yahoo.net:4600/hbase/
30353376, entries=5700, sequenceid=17022, memsize=87.6m, filesize=87.2m to DocData,68372423,1275489341661
2010-06-02 15:07:30,345 DEBUG org.apache.hadoop.hbase.regionserver.Store: Added hdfs://b5120231.yst.yahoo.net:4600/hbase/DocData/55952207/bigColumn/1303331353726747525, entries=627000, sequenceid=17022, memsize=76.5m, filesize=29.9m to DocData,68372423,1275489341661
2010-06-02 15:07:30,346 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush of ~164.1m for region DocData,68372423,1275489341661 in 1922ms, sequence id=17022, compaction

    (b) Compactions still keep happening, since the flushes happen frequently anyway.

2010-06-02 15:07:30,346 DEBUG org.apache.hadoop.hbase.regionserver.CompactSplitThread: Compaction requested for region DocData,68372423,1275489341661/55952207 because: regionserver/
15:07:30,346 INFO org.apache.hadoop.hbase.regionserver.HRegion: Starting compaction on region
2010-06-02 15:07:30,349 DEBUG org.apache.hadoop.hbase.regionserver.Store: Compaction size of CONTENT: 1.9g; Skipped 1 file(s), size: 1362960491
2010-06-02 15:07:30,349 DEBUG org.apache.hadoop.hbase.regionserver.Store: Started compaction of 3 file(s)  into /hbase/DocData/compaction.dir/55952207, seqid=17022
2010-06-02 15:07:50,935 DEBUG org.apache.hadoop.hbase.regionserver.Store: Completed compaction of CONTENT; new storefile is hdfs://b5120231.yst.yahoo.net:4600/hbase/DocData/55952207/CONTENT/4437999784964187538; store size is 1.9g
2010-06-02 15:07:50,937 DEBUG org.apache.hadoop.hbase.regionserver.Store: Compaction size of bigColumn: 643.5m; Skipped 1 file(s), size: 460567753
2010-06-02 15:07:50,937 DEBUG org.apache.hadoop.hbase.regionserver.Store: Started compaction of 3 file(s)  into /hbase/DocData/compaction.dir/55952207, seqid=17022
2010-06-02 15:08:01,173 DEBUG org.apache.hadoop.hbase.regionserver.Store: Completed compaction of bigColumn; new storefile is hdfs://b5120231.yst.yahoo.net:4600/hbase/DocData/55952207/bigColumn/8483730575294442675; store size is 643.5m
2010-06-02 15:08:01,175 INFO org.apache.hadoop.hbase.regionserver.HRegion: compaction completed on region DocData,68372423,1275489341661 in 30sec
2010-06-02 15:08:15,825 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes: Total=4.2187347MB (4423664), Free=895.61255MB (939117840),

   This was a reply that I sent to Jonathan but not to the entire group:

        Forgot to add this: should I increase hbase.hstore.compactionThreshold (right now,
3) as well? The block multiplier is already 8.
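
A small sketch of what a block multiplier of 8 implies, assuming "block multiplier" here means hbase.hregion.memstore.block.multiplier and that updates to a region block once its memstore reaches flush size times that multiplier. Both are assumptions about the setup, and blocking_size is a made-up helper. The flush sizes (100M before, 800M now) and the 3 GB heap are the values quoted in this message.

```python
MB = 1024 ** 2
GB = 1024 ** 3

def blocking_size(flush_size_bytes, block_multiplier):
    """Memstore size at which updates to one region would block (assumed rule)."""
    return flush_size_bytes * block_multiplier

heap = 3 * GB
for flush_mb in (100, 800):
    limit = blocking_size(flush_mb * MB, 8)
    fits = "fits in" if limit <= heap else "exceeds"
    print(f"flush size {flush_mb} MB: updates block at {limit // MB} MB, which {fits} the heap")
```

If the assumption holds, the per-region blocking threshold at an 800M flush size is larger than the whole heap, so any blocking would have to come from some other limit.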

>> What's going on in your logs, especially region servers?  Do you see blocking of updates?
Yes, I do see blocking of updates, with minor compactions running. I now realize I had been
looking only at the iowait field shown by sar (consistently around 2-3%), but that wouldn't
reflect the actual I/O performed by minor compactions, since updates are blocked anyway. The
memstore flush size is around 100M. Should I increase it to around 1 gig or so?

>> Anything unusual like flushes that aren't from hitting the flush size?
Flushes are mostly hitting the full size of 100M...

>> Are you starting from an empty table?  Are your insertion keys random?
Are you asking in order to verify that the load is balanced? I am running the clients in such
a way that they cover mutually exclusive key ranges, and each client inserts its rows
sequentially, in key order. How does it matter whether I choose keys at random or sequentially?
(I think I am missing something here.) Region splits happen only when a region gets too large,
but what row ranges do the resulting regions cover after a split? Can you give an example?
(I should look into the logs to see if I can get an answer.)

Total heap size is 3 gigs.. Around the max that I can use for 32-bit java..


On 6/1/10 5:10 PM, "Jonathan Gray" <jgray@facebook.com> wrote:

This is significantly lower than the top write speeds I've seen, like an order of magnitude.
 And you are running on 4 disks per node so should be way faster.  One thing to keep in mind
though is HBase does not support concurrent compactions so we don't always fully utilize multi-disk
setups.  Multiple compactions should be included in the next major release.

What's going on in your logs, especially region servers?  Do you see blocking of updates?
 Anything unusual like flushes that aren't from hitting the flush size?

Are you starting from an empty table?  Are your insertion keys random?

Your 1.5MB/sec/node comes from a steady-state insertion load once the table is evenly distributed
across nodes?  How many regions at this time and do you see even or uneven load across RS?
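
For reference, the 1.5MB/sec/node figure can be reproduced from the numbers in the quoted message below (500 rows per second for the whole cluster, 15KB rows, 5 region servers, 120m rows). This is only a sketch of that arithmetic: it takes 1 MB as 1000 KB, and the function names are made up for illustration.

```python
def per_node_mb_per_sec(rows_per_sec_cluster, row_kb, nodes):
    """Payload throughput per node in MB/s (1 MB taken as 1000 KB)."""
    return rows_per_sec_cluster * row_kb / 1000 / nodes

def load_time_hours(total_rows, rows_per_sec_cluster):
    """Hours needed to insert total_rows at a steady cluster-wide rate."""
    return total_rows / rows_per_sec_cluster / 3600

rate = per_node_mb_per_sec(500, 15, 5)
hours = load_time_hours(120_000_000, 500)
print(f"{rate} MB/s per node; about {hours:.1f} hours to load 120m rows")
```

Note that this counts only the payload the clients send, not the extra bytes written by flushes and compactions.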

What I remember was being on the order of 1/2 or 1/4 the raw write throughput of the drives,
something in that range though I'm forgetting the details.  There's no architectural reason
not to be in that range or better.  In these calculations, however, all the writes to disk
were being used in the calculation (io used for flushes, compactions, etc).  Your calculation
is based on the actual size of the data, though behind the scenes HBase is writing this multiple

Did you change the MemStore flush size?  You're going to end up doing a ton of compactions
if you are flushing small MemStores but have a big max region size.  The flush size is one
factor.  The total heap on each RS and the number of regions per RS will also impact the sizes
of flushed files.  Each time you do a compaction, you rewrite data, this kills io.
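
The effect of flush size on rewritten data can be sketched with a deliberately naive model. This is not HBase's actual compaction policy: it assumes every flush writes one file of a fixed size and that, once a store holds `threshold` files, all of them are merged into a single file. Real HBase selects files by size and can skip large ones. The function name and the example sizes are hypothetical.

```python
def write_amplification(total_data_mb, flush_mb, threshold):
    """Bytes written (flushes + compactions) divided by bytes of data flushed.

    Naive model: when the store reaches `threshold` files, merge them all.
    """
    files = []      # sizes of the store files currently on disk, in MB
    written = 0     # everything written to disk so far, in MB
    flushed = 0     # payload flushed so far, in MB
    while flushed < total_data_mb:
        files.append(flush_mb)
        written += flush_mb
        flushed += flush_mb
        if len(files) >= threshold:
            merged = sum(files)
            written += merged   # compaction rewrites every merged byte
            files = [merged]
    return written / flushed

small = write_amplification(2000, 100, 3)   # many small flushes
large = write_amplification(2400, 800, 3)   # few large flushes
print(f"100 MB flushes: {small:.2f}x, 800 MB flushes: {large:.2f}x")
```

Because the model rewrites the whole store on every compaction, it overstates the absolute numbers; what it illustrates is only the trend that smaller flushed files mean the same data is rewritten more often before a region reaches its maximum size.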

There are lots of changes coming up in the next release.  Follow along HBASE-2375 and related
jiras for the compaction/split/flush improvements being worked on.


> -----Original Message-----
> From: Vidhyashankar Venkataraman [mailto:vidhyash@yahoo-inc.com]
> Sent: Tuesday, June 01, 2010 4:21 PM
> To: user@hbase.apache.org
> Subject: Re: Performance at large number of regions/node
> I have a related question: I tried a simple load experiment too using
> Hbase's Java API.. (The nodes do only loading: nothing else.. The
> client programs generate random data on the fly to load.. So, no reads
> of the input data)..
> 120m rows 15KB each. 2 column families.
> 5 region servers, ran around 4 or 5 clients per node on the 5 nodes
> that run the region servers..
> 2MB block size, 2gigs region size, WAL disabled, auto flush disabled..
> 2MB write buffer.. Major compactions disabled..
> The other configs are quite similar to the configs discussed in this
> thread..
> And I get a throughput of around 1.5 MB per second per node..
> (500 rows per second for the entire cluster)..  Do these values seem
> reasonable?
> Thanks
> Vidhya
> On 5/29/10 6:36 PM, "Jacob Isaac" <jacob@ebrary.com> wrote:
> Hi J-D
> We have 8 drives (~500G per drive - total 4T) per machine
> The metrics from my run indicate that I achieve, for writes,
> around 1 row (5K) in 2 ms => 500 rows (5K) in 1 sec => 2.5 MB/sec,
> and from your observation at StumbleUpon,
> 200k rows/sec (presuming 100 bytes per row) => 20 MB/sec.
> Wow! That's an order of magnitude difference.
> I am sure disabling WAL during the writes is giving you a significant
> boost.
> Are you reading the data at the same time as you are writing?
> Thx
> Jacob
> On Fri, May 28, 2010 at 9:04 PM, Jean-Daniel Cryans
> <jdcryans@apache.org> wrote:
> >> What I wanted out of this discussion was to find out whether I am in the
> >> ballpark of what I can juice out of HBase or I am way off the mark.
> >>
> >
> > I understand... but this is a distributed system we're talking about.
> > Unless I have the same code, hbase/hadoop version, configuration,
> > number of nodes, cpu, RAM, # of HDDs, OS, network equipment, data set,
> > etc... it's really hard to assess right? For starters, I don't think
> > you specified the number of drives you have per machine, and HBase is
> > mostly IO-bound.
> >
> > FWIW, here's our experience. At StumbleUpon, we uploaded our main data
> > set consisting of 13B*2 rows on 20 machines (2xi7, 24GB (8 for HBase),
> > 4x 1TB JBOD) with MapReduce (using 8 maps per machine) pulling from a
> > MySQL cluster (we were selecting large ranges in batches), inserting
> > at an average rate of 150-200k rows per second, peaks at 1M. Our rows
> > are a few bytes, mostly integers and some text. We did it in the time
> > with HBase 0.20.3 + the parallel-put patch we wrote here (available in
> > trunk) with the configuration I pasted previously. For that upload the
> > WAL was disabled and ALL our tables are LZOed (can't stress enough the
> > importance of compressing your tables!) and 1GB max file size.
> >
> > My guess is yes you can juice it out more, first by using LZO ;)
> >
> > Also, are your machines even stressed during the test? Do you monitor?
> > Could you increase the number of clients?
> >
> > Sorry I can't give you a very clear answer, but without using a common
> > benchmark to compare numbers we're pretty much all in the dark. YCSB
> > is one, but IIRC it needs some patches to work efficiently (Todd
> > Lipcon from Cloudera has them in his github).
> >
> > J-D
> >

------ End of Forwarded Message
