Date: Fri, 31 Jan 2014 22:31:43 -0800 (PST)
From: Jan Schellenberger <leipzig3@gmail.com>
To: user@hbase.apache.org
Subject: RE: Slow Get Performance (or how many disk I/O does it take for one
 non-cached read?)

A lot of useful information here...

I disabled Bloom filters.
I switched to GZ compression (which shrank the files significantly).

I'm now seeing about *80 gets/sec/server*, which is a pretty good improvement.
Since I estimate that each server can do about 300-350 random hard disk
operations/second, that works out to about 4 disk operations per Get.
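
As a rough sketch of that estimate, assuming the disks are saturated and are the only bottleneck, the per-Get cost is just disk throughput divided by Get throughput (the function name is made up for illustration):

```python
def disk_ops_per_get(disk_iops: float, gets_per_sec: float) -> float:
    """Estimate random disk operations per Get.

    Assumes the disks are fully busy and are the only bottleneck,
    so every disk operation is attributable to some Get.
    """
    return disk_iops / gets_per_sec


# Figures from this post: ~300-350 disk ops/sec/server, ~80 gets/sec/server.
for iops in (300, 350):
    cost = disk_ops_per_get(iops, 80)
    print(f"{iops} IOPS / 80 gets/s = {cost:.2f} disk ops per Get")
```

The range comes out between roughly 3.75 and 4.4, hence "about 4".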

I will experiment with BLOCKSIZE next. Unfortunately, upgrading our
system to a newer HBase/Hadoop is tricky for various IT/regulatory reasons,
but I'll ask to upgrade. From what I see, even Cloudera CDH 4.5.0 still ships
with HBase 0.94.6.


I also restarted the regionservers and am now getting
blockCacheHitCachingRatio=51% and blockCacheHitRatio=51%.
So conceivably, a single Get could be hitting:
- the root index (cache hit)
- the block index (cache hit)
- on average 2 data blocks (most likely cache misses, as my total heap
  space is 1/7 of the compressed dataset)
That would be roughly a 50% cache hit ratio overall, close to what I observe.
If each data block miss requires 2 hard drive reads (data + checksum), that
is 4 disk reads per Get, which would explain my throughput.
It still seems high, but probably within the realm of reason.
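
Here is the same back-of-envelope model as a small Python sketch. The class and field names are hypothetical, and the per-Get access pattern (2 cached index blocks, 2 uncached data blocks, 2 disk reads per miss) is my assumption from above, not something HBase reports:

```python
from dataclasses import dataclass


@dataclass
class GetCostModel:
    """Hypothetical cost model for one Get; all counts are assumptions."""
    index_hits: int = 2          # root index + block index, assumed cached
    data_block_misses: int = 2   # data blocks loaded per Get, assumed uncached
    reads_per_miss: int = 2      # assumed: one read for data, one for checksum

    def cache_hit_ratio(self) -> float:
        # Fraction of block accesses served from the block cache.
        total = self.index_hits + self.data_block_misses
        return self.index_hits / total

    def disk_reads_per_get(self) -> int:
        # Only cache misses go to disk.
        return self.data_block_misses * self.reads_per_miss

    def gets_per_sec(self, disk_iops: float) -> float:
        # Throughput if the disks are the only bottleneck.
        return disk_iops / self.disk_reads_per_get()


model = GetCostModel()
print(f"predicted block cache hit ratio: {model.cache_hit_ratio():.0%}")
print(f"disk reads per Get: {model.disk_reads_per_get()}")
for iops in (300, 350):
    print(f"{iops} IOPS -> about {model.gets_per_sec(iops):.1f} gets/sec/server")
```

With these assumptions the model predicts 75-87.5 gets/sec/server, which brackets the ~80 I'm seeing.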

Does HBase always read a full block (the 64 KB HFile block, not the HDFS
block) at a time, or can it jump to a particular location within the
block?


--
View this message in context: http://apache-hbase.679495.n3.nabble.com/Slow-Get-Performance-or-how-many-disk-I-O-does-it-take-for-one-non-cached-read-tp4055545p4055564.html
Sent from the HBase User mailing list archive at Nabble.com.