hbase-user mailing list archives

From Kireet <kir...@feedly.com>
Subject Re: HBase load problems
Date Thu, 17 Oct 2013 01:26:13 GMT
Is there a downside to going to larger regions? Merging regions looks like a
larger operation, so changing the config setting would simply let existing
regions grow to the larger value, right? Would the way to confirm this be to
check compaction rates and maybe also the size of the store files in HDFS?
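
For the store file check, something like the sketch below is what I have in mind. It assumes an HBase 0.94-style layout where store files live under /hbase/<table>/<region>/<family>/ in HDFS; the table name and root path are placeholders, not our actual names.

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class StoreFileSizes {
  public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    FileSystem fs = FileSystem.get(conf);
    // Layout assumption: /hbase/<table>/<region>/<family>/<storefile>; "mytable" is a placeholder.
    FileStatus[] files = fs.globStatus(new Path("/hbase/mytable/*/*/*"));
    long totalBytes = 0;
    int count = 0;
    if (files != null) {
      for (FileStatus f : files) {
        if (!f.isDir()) {            // skip directories (e.g. .tmp) that match at this depth
          totalBytes += f.getLen();
          count++;
        }
      }
    }
    System.out.printf("%d store files, average %.1f MB%n",
        count, count == 0 ? 0.0 : totalBytes / (1024.0 * 1024.0 * count));
  }
}

If the average comes out well below the 128MB default flush size, that would seem to line up with the memstore pressure Vladimir describes below.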

On 10/16/13 2:00 PM, Vladimir Rodionov wrote:
> There is pressure on the memstores to flush more frequently and create smaller store files
> when you have too many active regions (50 in your case).
> If you have all your settings at their defaults, you allocate only 40% of the heap to memstores. If
> you have, say, an 8GB heap, the memstores get about 3.2GB in total, so
> your flush size is going to be roughly 3.2GB / (50 * number of column families), i.e. about 65MB
> or less. You create ~65MB store files and you need to run compaction more frequently
> in this case.
> What options do you have?
> 1. Increase the heap (if it is not large already) and/or increase the memstore fraction from the default 0.4
> (do not forget to decrease the block cache accordingly).
> 2. Increase the region size.
> 3. You may play with the WAL log size (the default is the HDFS block size) and with the number of WAL
> files per region server. Both need to be increased as well.
> Monitor split/compaction activity (see the HBase book for how to do this).
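
For reference, a sketch of the knobs behind options 1 to 3, using what I believe are the 0.94-era property names; the values are only examples, and in practice these belong in hbase-site.xml on the region servers rather than being set from client code:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class TuningSketch {
  public static void main(String[] args) {
    Configuration conf = HBaseConfiguration.create();
    // Option 1: give memstores a larger share of the heap, shrink the block cache to compensate.
    conf.setFloat("hbase.regionserver.global.memstore.upperLimit", 0.5f);
    conf.setFloat("hfile.block.cache.size", 0.2f);
    // Option 2: larger regions, e.g. 20GB, so there are fewer, bigger regions per server.
    conf.setLong("hbase.hregion.max.filesize", 20L * 1024 * 1024 * 1024);
    // Option 3: bigger WAL files and more of them before flushes are forced.
    conf.setLong("hbase.regionserver.hlog.blocksize", 256L * 1024 * 1024);
    conf.setInt("hbase.regionserver.maxlogs", 64);
  }
}
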
> Best regards,
> Vladimir Rodionov
> Principal Platform Engineer
> Carrier IQ, www.carrieriq.com
> e-mail: vrodionov@carrieriq.com
> ________________________________________
> From: Kireet [kireet@feedly.com]
> Sent: Wednesday, October 16, 2013 8:47 AM
> To: user@hbase.apache.org
> Subject: HBase load problems
> Over the past couple of months we have seen a significant increase in
> datanode I/O load in our cluster, an increase of 100% in disk read/write
> rates while our application requests have increased by a much smaller
> amount, perhaps 5-10%. The read/write rate has been increasing gradually
> over time.
> The data size of our cluster has increased quite a bit. In particular, we
> have one table that is keyed by a randomized timestamp (random bytes +
> timestamp). It has grown at about 40GB/day (before replication) with an
> average row size of about 1KB in a single column. It makes up about 80%
> of our total data size and is at about 50 regions per data node. Our
> first guess is that the issue has something to do with this table, since it
> dominates the cluster data size.
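
(For illustration only, a minimal sketch of what a "random bytes + timestamp" key of this kind could look like; the 4-byte salt and big-endian layout are assumptions, not necessarily the actual schema.)

import java.nio.ByteBuffer;
import java.security.SecureRandom;

public class SaltedKey {
  private static final SecureRandom RANDOM = new SecureRandom();

  // Builds a row key of the "random bytes + timestamp" form: a random salt followed
  // by the event time. The 4-byte salt and big-endian long are illustrative choices.
  public static byte[] rowKey(long timestampMillis) {
    byte[] salt = new byte[4];
    RANDOM.nextBytes(salt);
    return ByteBuffer.allocate(4 + 8)
        .put(salt)
        .putLong(timestampMillis)
        .array();
  }

  public static void main(String[] args) {
    System.out.println("key length: " + rowKey(System.currentTimeMillis()).length); // 12 bytes
  }
}
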
> We are considering splitting the table into multiple tables organized by
> timestamp. 90% or more of reads/writes are for recent data, so our
> thinking is that we could keep the "most recent data" table much smaller by
> doing this and perhaps make it easier for HBase to optimize things.
> E.g., compactions would be quicker, and perhaps the block cache would
> become more effective, as each block would hold recent data instead of a
> continually decreasing fraction of it.
> However, this would be a big code change and we would like to confirm
> as much as possible that this is the true problem. What are the key
> metrics we should look at for confirmation?
> Also, we don't have short-circuit reads enabled at the moment. We have
> seen articles on the web claiming big improvements in some cases but no
> change in others. Are there particular characteristics of systems that
> will see big improvements when this setting is enabled?
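
In case it is useful while evaluating that, here is a sketch of the legacy short-circuit read settings as I understand them for this era; they need to be enabled on the datanodes and region servers (hdfs-site.xml / hbase-site.xml), not just set in client code, and the user name is a placeholder:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class ShortCircuitSketch {
  public static void main(String[] args) {
    Configuration conf = HBaseConfiguration.create();
    // Legacy short-circuit reads: the client reads local block files directly,
    // bypassing the DataNode, for the configured user.
    conf.setBoolean("dfs.client.read.shortcircuit", true);
    conf.set("dfs.block.local-path-access.user", "hbase");   // placeholder user name
  }
}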
