hbase-user mailing list archives

From lars hofhansl <la...@apache.org>
Subject Re: Hbase write stream blocking and any solutions?
Date Mon, 10 Jun 2013 17:46:49 GMT
Thanks Kevin,

great point about the HLog sizing.
That reminds me, I wrote a blog about RegionServer sizing guidelines: http://hadoop-hbase.blogspot.com/2013/01/hbase-region-server-memory-sizing.html
I'll add a comment about the HLog size/number there as well.

The HLogs are by default sized at 95% of the HDFS block size, and the default number of HLogs is
32 (so 0.95 × 64MB × 32 ≈ 1945MB by default).

The overall memstore size per RegionServer is capped at 40% of the heap by default. I think
we could do some better auto-tuning here.
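To put numbers on that, here is a quick back-of-the-envelope sketch (the 64MB HDFS block size and the 10GB heap are illustrative assumptions, not values from a real cluster):

// Sketch of the default WAL and memstore budgets described above.
public class DefaultBudgets {
    public static void main(String[] args) {
        double logRollMultiplier = 0.95; // hbase.regionserver.logroll.multiplier
        double hdfsBlockMb = 64;         // HDFS block size (assumed 64MB default)
        int maxLogs = 32;                // hbase.regionserver.maxlogs

        double walBudgetMb = logRollMultiplier * hdfsBlockMb * maxLogs;
        System.out.printf("WAL budget: %.1fMB%n", walBudgetMb); // ~1945.6MB

        double heapMb = 10 * 1024;       // RegionServer heap (assumed 10GB)
        double upperLimit = 0.4;         // hbase.regionserver.global.memstore.upperLimit
        System.out.printf("Global memstore cap: %.0fMB%n", heapMb * upperLimit); // 4096MB
    }
}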

-- Lars
From: Kevin O'dell <kevin.odell@cloudera.com>
To: "user@hbase.apache.org" <user@hbase.apache.org> 
Cc: lars hofhansl <larsh@apache.org> 
Sent: Monday, June 10, 2013 6:37 AM
Subject: Re: Hbase write stream blocking and any solutions?

Hi Yun,

Sorry for the novel, also I am not fully awake so hopefully all of this makes sense!

  Something like that (peak/off-peak settings) does not exist, but a properly sized and tuned
system should not experience this behavior. The blog you linked (great blog, btw) covers a
fair portion of it.  You have to look at it as a whole.  If you have a peak that you cannot
sustain, you are probably undersized, as you need a cluster that can handle peak +10% (an
arbitrary number, but it makes the point).  I would be curious to take a look at your logs
during the peak time when you are seeing blocking.

  Just raising the memstore flush size won't help you if you are already flushing at too
small a size.  For example:

Memstore flush size = 128MB
Total heap = 10GB
Memstore upper limit = 0.4
Total heap devoted to memstore = 4GB
100 active regions per RegionServer

4096MB / 100 regions ≈ 41MB max per region

In the above case, if you raise your memstore flush size to 256MB, nothing is gained, because
the bottleneck was never the flush size.  The bottleneck is heap based, so we either need to
raise our heap, allocate more to our upper/lower limits, or lower the region count.
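Here is that arithmetic as a minimal, self-contained sketch (the numbers are the hypothetical ones from the example above, not measurements):

// Kevin's point in code: once the region count is high enough, the global
// heap cap, not the configured flush size, bounds each region's memstore.
public class EffectiveFlushSize {
    public static void main(String[] args) {
        double heapMb = 10 * 1024;   // total RegionServer heap (example value)
        double upperLimit = 0.4;     // hbase.regionserver.global.memstore.upperLimit
        int regions = 100;           // active regions on this RegionServer
        double flushSizeMb = 128;    // hbase.hregion.memstore.flush.size, in MB

        double perRegionCapMb = heapMb * upperLimit / regions; // ~41MB
        double effective = Math.min(flushSizeMb, perRegionCapMb);
        System.out.printf("Effective flush trigger: ~%.0fMB per region%n", effective);
        // Raising flushSizeMb to 256 leaves this unchanged: the heap cap binds.
    }
}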

  Another aspect I look at is HLog count/size.  You want to size the total number of HLogs
* the size of each HLog to be roughly equal to your memstore flush size, so that they roll
right around the same time.  If you don't, you will have big memstore flush sizes (256/512MB),
but your HLogs will roll first and cause premature small flushes.  This also means more flushes,
hence more compactions, and can lead to blocking.
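For reference, a hedged sketch of the knobs involved (0.94-era property names; the values shown are the defaults, and the right numbers would have to come from your own memstore sizing):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

// Illustrative only: the WAL budget is roughly
// maxlogs * (logroll.multiplier * HDFS block size);
// tune the count so logs roll near memstore flush time.
public class WalAlignment {
    public static void main(String[] args) {
        Configuration conf = HBaseConfiguration.create();
        conf.setInt("hbase.regionserver.maxlogs", 32);                  // default 32
        conf.setFloat("hbase.regionserver.logroll.multiplier", 0.95f);  // default 0.95
        System.out.println("maxlogs = " + conf.getInt("hbase.regionserver.maxlogs", 32));
    }
}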

  Raising the blocking number is typically a last resort for me.  I do think 7 is too low
a number, and I usually set systems to 15.  If you just raise this to 100 or even 1000,
you are just masking the issue.  Also, if compactions get too far behind, you may never
be able to catch up.
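The knob in question, as a minimal sketch (the default of 7 and the 15 are the numbers from this thread):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

// hbase.hstore.blockingStoreFiles defaults to 7; Kevin's usual middle
// ground is 15. Much larger values mostly mask the underlying issue.
public class BlockingThreshold {
    public static void main(String[] args) {
        Configuration conf = HBaseConfiguration.create();
        conf.setInt("hbase.hstore.blockingStoreFiles", 15);
        System.out.println(conf.getInt("hbase.hstore.blockingStoreFiles", 7));
    }
}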

  There is also a chance that you are trying to do too much with too little.  Like I said
before, always size your system for your peak loads.

On Sun, Jun 9, 2013 at 10:17 PM, yun peng <pengyunmomo@gmail.com> wrote:

>thanks lars for the insights. I guess current HBase may have to block the write
>stream even when the data write rate does not reach the limit of the IO subsystem.
>Blocking happens because compaction is resource-consuming and has to
>be invoked synchronously (say, to keep #hfiles < K), so the invocation of
>compaction could block the write stream..? (correct me if I am wrong).
>On Sun, Jun 9, 2013 at 7:33 PM, lars hofhansl <larsh@apache.org> wrote:
>> One thing to keep in mind is that this typically happens when you write
>> faster than your IO subsystem can support.
>> For a while HBase will absorb this by buffering in the memstore, but if
>> you sustain the write load, something will have to slow down the writers.
>> Granted, this could be done a bit more gracefully.
>> -- Lars
>> ________________________________
>> From: yun peng <pengyunmomo@gmail.com>
>> To: user@hbase.apache.org
>> Sent: Sunday, June 9, 2013 6:28 AM
>> Subject: Hbase write stream blocking and any solutions?
>> Hi, All
>> HBase can block online write operations when there is too much data
>> in the memstore (to make the potential compaction incurred by this flush more
>> efficient when there are already many files on disk). This blocking effect has
>> also been observed by others (e.g.,
>> http://gbif.blogspot.com/2012/07/optimizing-writes-in-hbase.html).
>> The solution proposed in the above blog is to increase the memstore
>> size so there are fewer flushes, and to tolerate a bigger number of files on
>> disk (by increasing blockingStoreFiles). This is a kind of HBase tuning towards
>> write-intensive workloads.
>> My targeted application has a dynamic workload which may change from
>> write-intensive to read-intensive. Also, there are peak hours (when blocking
>> is user perceivable and should not be invoked) and off-peak hours (when
>> blocking is tolerable). I am wondering whether any more intelligent
>> solution (say, a clever scheduling policy that blocks only during off-peak
>> hours) exists in the latest HBase version that could minimize the effect of
>> write stream blocking?
>> Regards
>> Yun

Kevin O'Dell
Systems Engineer, Cloudera   
