hbase-user mailing list archives

From Vladimir Rodionov <vrodio...@carrieriq.com>
Subject RE: HBase Random Read latency > 100ms
Date Tue, 08 Oct 2013 01:45:05 GMT
Ramu,

If your working set of data fits into 192GB, you may get an additional boost by utilizing the OS page
cache, or you can wait for the 0.98 release, which introduces a new bucket cache implementation
(a port of Facebook's L2 cache). You can try the vanilla bucket cache in 0.96 (not released yet,
but due soon). Both caches store data off-heap, but the Facebook version can store encoded
and compressed data, while the vanilla bucket cache cannot.
So there are some options for utilizing the available RAM efficiently (at least in upcoming HBase
releases). If your data set does not fit in RAM, then your only hope is your 24 SAS drives, and
throughput will depend on your RAID settings, disk I/O performance, and HDFS configuration
(I think the latest Hadoop is preferable here).
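For reference, a sketch of how the off-heap bucket cache might be enabled in hbase-site.xml on 0.96+ (property names and sizing semantics as I recall them; please verify against the release you actually deploy):

```xml
<!-- Sketch only: enable the off-heap bucket cache (0.96+).
     Verify property names and defaults against your HBase release. -->
<property>
  <name>hbase.bucketcache.ioengine</name>
  <value>offheap</value>
</property>
<property>
  <!-- Bucket cache size; interpreted as megabytes (example: 16 GB
       off-heap). Also bump the JVM's direct-memory limit accordingly
       (-XX:MaxDirectMemorySize). -->
  <name>hbase.bucketcache.size</name>
  <value>16384</value>
</property>
```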

The OS page cache is the most vulnerable and volatile option: it cannot be controlled and can
easily be polluted either by other processes or by HBase itself (a long scan).
With the block cache you have more control, but the first truly usable *official* implementation
is going to be part of the 0.98 release.
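To illustrate the kind of control you get: the on-heap block cache is sized with a single knob in hbase-site.xml (a sketch; 0.25 of heap has been the usual default):

```xml
<property>
  <!-- Fraction of the RegionServer heap given to the block cache. -->
  <name>hfile.block.cache.size</name>
  <value>0.4</value>
</property>
```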

As far as I understand, your use case would definitely be covered by something similar to BigTable's
ScanCache (a RowCache), but there is no such cache in HBase yet.
One major advantage of a RowCache over the BlockCache (apart from being much more efficient in RAM
usage) is resilience to Region compactions. Each minor Region compaction partially invalidates a
Region's data in the BlockCache, and a major compaction invalidates that Region's data completely.
This would not be the case with a RowCache (were it implemented).

Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: vrodionov@carrieriq.com

________________________________________
From: Ramu M S [ramu.malur@gmail.com]
Sent: Monday, October 07, 2013 5:25 PM
To: user@hbase.apache.org
Subject: Re: HBase Random Read latency > 100ms

Vladimir,

Yes, I am fully aware of the HDD limitations and the wrong configuration w.r.t.
RAID.
Unfortunately, the hardware is leased from others for this work, and I
wasn't consulted on the h/w specification for the tests that I am
doing now. The RAID cannot even be turned off or set to RAID-0.

The production system is sized according to the Hadoop needs (100 nodes with 16-core
CPUs, 192 GB RAM, and 24 x 600GB SAS drives; RAID cannot be completely turned
off, so we are creating one virtual disk per physical disk, with
the VD RAID level set to RAID-0). These systems are still not available. If
you have any suggestions on the production setup, I will be glad to hear them.
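With one single-disk RAID-0 VD per physical drive, the effect is essentially JBOD, so each VD would typically be mounted separately and listed individually for the DataNode, along the lines of this sketch (paths are hypothetical; the property is `dfs.data.dir` on Hadoop 1):

```xml
<property>
  <!-- One mount point per single-disk virtual disk (hypothetical paths);
       HDFS round-robins block writes across the listed directories. -->
  <name>dfs.datanode.data.dir</name>
  <value>/data/disk01/dfs,/data/disk02/dfs,/data/disk03/dfs</value>
</property>
```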

Also, as pointed out earlier, we are planning to use HBase as an in-memory
KV store to access the latest data as well.
That's why the RAM was sized so large in this configuration. But it looks like we
would run into more problems than gains from this.
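For the latest-data part, one knob that already exists is marking a column family IN_MEMORY, which gives its blocks priority in the block cache (priority, not guaranteed residency). A shell sketch with hypothetical table/family names:

```ruby
# HBase shell: give the 'latest' family in-memory block-cache priority
# (table and family names are hypothetical).
create 'metrics', {NAME => 'latest', IN_MEMORY => 'true'}
```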

Keeping that aside, I was trying to get the maximum out of the current
cluster. Or, as you said, is 500-1000 OPS the max I could get out of this
setup?
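My own back-of-envelope lands in the same order of magnitude, assuming roughly 100-150 random IOPS per 10k RPM SAS drive and 2-4 disk reads per Get once the working set falls out of cache (both figures are my assumptions, not measurements):

```python
# Back-of-envelope random-read ceiling for one node.
# Assumptions (mine, not measured): a 10k RPM SAS drive sustains
# ~100-150 random IOPS, and each HBase Get costs 2-4 disk reads once
# data no longer fits in cache (multiple HFiles, non-resident
# index/bloom blocks).
DRIVES = 24
IOPS_PER_DRIVE = (100, 150)   # assumed per-disk random IOPS range
READS_PER_GET = (2, 4)        # assumed disk reads per HBase Get

raw_low, raw_high = (DRIVES * i for i in IOPS_PER_DRIVE)
gets_low = raw_low // READS_PER_GET[1]    # worst case: 2400 / 4
gets_high = raw_high // READS_PER_GET[0]  # best case: 3600 / 2

print(raw_low, raw_high)     # aggregate disk IOPS: 2400 3600
print(gets_low, gets_high)   # achievable Gets/s:   600 1800
```

So a few hundred to ~1500-ish Gets per second per node looks like a plausible disk-bound ceiling under these assumptions, consistent with the 500-1000 OPS figure.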

Regards,
Ramu



