hbase-user mailing list archives

From lars hofhansl <la...@apache.org>
Subject Re: Slow Get Performance (or how many disk I/O does it take for one non-cached read?)
Date Sun, 02 Feb 2014 04:06:32 GMT
HBase always loads the whole block and then seeks forward in that block until it finds the
KV it is looking for (there is no indexing inside the block).
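
To make that read path concrete, here is a toy Python model. It is not HBase code: the block layout, `block_index`, and `get` are hypothetical simplifications. A binary search over the block index picks the one candidate block, the whole block is loaded, and its KVs are scanned forward in key order because nothing indexes the inside of the block.

```python
from bisect import bisect_right

def get(block_index, blocks, key):
    """Toy point lookup; returns the value for `key` or None.

    Hypothetical model: `blocks` is a list of data blocks, each a list of
    (key, value) pairs sorted by key; `block_index[i]` is the first key of
    blocks[i]. This illustrates the idea and is not HBase's implementation.
    """
    # Binary search the block index to find the only block that could hold key.
    i = bisect_right(block_index, key) - 1
    if i < 0:
        return None  # key sorts before the first block
    block = blocks[i]  # the whole block is loaded, from cache or disk
    # No index inside the block: walk forward KV by KV.
    for k, v in block:
        if k == key:
            return v
        if k > key:
            break  # sorted order means the key is not in this block
    return None

blocks = [[("a", 1), ("c", 2)], [("e", 3), ("g", 4)]]
block_index = [blk[0][0] for blk in blocks]  # ["a", "e"]
print(get(block_index, blocks, "g"), get(block_index, blocks, "d"))
```

In this model the cost of a lookup inside a block grows with the number of KVs ahead of the target, and the whole block has to be fetched even when only one KV is wanted.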

Also note that HBase has compression and block encoding. These are different. Compression
compresses the files on disk (at the HDFS level) and not in memory, so it does not help with
your cache size. Encoding is applied at the HBase block level and is retained in the block cache.
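
The difference can be illustrated with a simplified prefix encoding in the spirit of HBase's PREFIX data block encoding. The real HBase format is different and also encodes other parts of the KV; the functions and sample keys here are hypothetical. Because encoding is retained in the block, similar row keys take less room there, and because each key is rebuilt from the one before it, an encoded block is naturally read front to back.

```python
def prefix_encode(sorted_keys):
    """Store each key as (length of prefix shared with previous key, remaining suffix)."""
    encoded, prev = [], ""
    for key in sorted_keys:
        shared = 0
        limit = min(len(prev), len(key))
        while shared < limit and prev[shared] == key[shared]:
            shared += 1
        encoded.append((shared, key[shared:]))
        prev = key
    return encoded

def prefix_decode(encoded):
    """Rebuild the keys; each one depends on the previous, so decoding is sequential."""
    keys, prev = [], ""
    for shared, suffix in encoded:
        key = prev[:shared] + suffix
        keys.append(key)
        prev = key
    return keys

keys = ["row0001/cf:a", "row0001/cf:b", "row0002/cf:a"]
enc = prefix_encode(keys)
raw_chars = sum(len(k) for k in keys)
enc_chars = sum(len(s) for _, s in enc)  # ignores the small cost of storing `shared`
print(enc, raw_chars, enc_chars)
```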

I'm really curious as to what kind of improvement you see with a smaller block size. Remember
that after you change BLOCKSIZE, you need to issue a major compaction so that the data is
rewritten into smaller blocks.
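
A back-of-the-envelope sketch of the block size trade-off, as a simplified model: it assumes one block-index entry per data block, that a cache-missing random get reads exactly one data block, and it ignores compression and encoding. The 10 GiB store size is made up for illustration.

```python
import math

def blocksize_tradeoff(store_bytes: int, block_bytes: int) -> dict:
    """Rough effect of BLOCKSIZE on one store under the simplified model above."""
    blocks = math.ceil(store_bytes / block_bytes)
    return {
        "index_entries": blocks,             # grows as blocks shrink
        "bytes_read_per_miss": block_bytes,  # shrinks as blocks shrink
    }

GiB = 1024 ** 3
for kb in (64, 16, 8):
    print(kb, "KB:", blocksize_tradeoff(10 * GiB, kb * 1024))
```

Smaller blocks mean less data pulled in per cache miss but a larger block index to keep cached. Only files written after the change use the new block size, which is why the major compaction matters.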

We should really document this stuff better.

-- Lars

From: Jan Schellenberger <leipzig3@gmail.com>
To: user@hbase.apache.org 
Sent: Friday, January 31, 2014 10:31 PM
Subject: RE: Slow Get Performance (or how many disk I/O does it take for one non-cached read?)

A lot of useful information here...

I disabled bloom filters.
I changed to gz compression (which shrank the files significantly).

I'm now seeing about *80 gets/sec/server*, which is a pretty good improvement.
Since I estimate that the server is capable of about 300-350 hard disk
operations/second, that's about 4 hard disk operations/get.
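
The estimate is a simple division. A minimal sketch, assuming the disks are the bottleneck, so that every disk operation the server can do is spent on gets:

```python
def disk_ops_per_get(disk_ops_per_sec: float, gets_per_sec: float) -> float:
    """Average disk operations consumed by each get, assuming saturated disks."""
    return disk_ops_per_sec / gets_per_sec

low, high = disk_ops_per_get(300, 80), disk_ops_per_get(350, 80)
print(f"{low:.2f} to {high:.2f} disk ops per get")
```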

I will experiment with the BLOCKSIZE next. Unfortunately, upgrading our
system to a newer HBase/Hadoop is tricky for various IT/regulation reasons,
but I'll ask to upgrade. From what I see, even Cloudera 4.5.0 still comes
with HBase 0.94.6.

I also restarted the regionservers and am now getting
blockCacheHitCachingRatio=51% and blockCacheHitRatio=51%.
So conceivably, each get could be hitting:
root index (cache hit)
block index (cache hit)
on average 2 data blocks (most likely cache misses, as my total
heap space is 1/7 of the compressed dataset)
That would be about 52% cache hits overall, and if each data access requires 2
hard drive reads (data + checksum), that would explain my throughput.
It still seems high but probably within the realm of reason.
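
The cache model in the list above can be turned into numbers. A minimal sketch, assuming Jan's access pattern (two cached index lookups, two uncached data blocks), 2 disk reads per miss, and a hypothetical 325 disk ops/s, the middle of the 300-350 estimate:

```python
def get_cost(accesses, disk_reads_per_miss=2):
    """Return (cache hit ratio, disk reads) for one get.

    `accesses` lists each block touched by the get: True for a block-cache
    hit, False for a miss. Every miss is assumed to cost `disk_reads_per_miss`
    disk reads (data + checksum, per Jan's assumption).
    """
    hits = sum(accesses)
    misses = len(accesses) - hits
    return hits / len(accesses), misses * disk_reads_per_miss

# root index (hit), block index (hit), two data blocks (miss)
hit_ratio, disk_reads = get_cost([True, True, False, False])
gets_per_sec = 325 / disk_reads  # hypothetical 325 disk ops/s
print(hit_ratio, disk_reads, gets_per_sec)
```

The modeled 50% hit ratio and roughly 81 gets/s line up with the observed 51% and 80 gets/sec/server, which supports the reading that each get costs about 4 disk reads.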

Does HBase always read a full block (the 64k HFile block, not the HDFS
block) at a time, or can it just jump to a particular location within the
block?

View this message in context: http://apache-hbase.679495.n3.nabble.com/Slow-Get-Performance-or-how-many-disk-I-O-does-it-take-for-one-non-cached-read-tp4055545p4055564.html

Sent from the HBase User mailing list archive at Nabble.com.