hbase-user mailing list archives

From Wayne <wav...@gmail.com>
Subject Breaking down an HBase read through thrift
Date Sat, 08 Jan 2011 16:46:32 GMT
I am trying to understand exactly what an HBase read does when it goes
through Thrift (python), so that we know what to change to improve our read
latency. We have turned off all caching to make testing consistent.

*Region/Meta Cache*
Often the region list is not "hot" and thrift has to talk to the meta
table. We have 6k+ regions, growing quickly, and expect 1k+ per node. Can we
help our performance by pre-caching all region locations? How many region
locations can thrift keep in its cache before it starts overwriting entries
(given default settings)? Does it make sense to write a program that cycles
through all thrift servers and warms up the region cache after a lot of
writes?
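
To make the warm-up idea concrete, here is a rough sketch of what such a program might look like. It assumes `client` is an already-connected `Hbase.Client` from the Thrift-generated Python bindings, and that `getTableRegions`, `scannerOpen`, `scannerGetList` and `scannerClose` take the arguments shown (newer HBase versions add an extra attributes argument to `scannerOpen`). The function name `warm_region_cache` is made up for illustration, and whether this actually helps is exactly the question above.

```python
def warm_region_cache(client, table_name):
    """Open a one-row scan at the start key of every region.

    `client` is assumed to be a connected Hbase.Client (Thrift-generated
    bindings). Each scan forces the Thrift server to locate that region,
    which should leave the location in its client-side cache.
    Returns the number of regions touched.
    """
    regions = client.getTableRegions(table_name)
    for region in regions:
        # An empty column list is assumed to mean "all columns".
        scanner_id = client.scannerOpen(table_name, region.startKey, [])
        try:
            # Fetch at most one row; we only care about the region lookup.
            client.scannerGetList(scanner_id, 1)
        finally:
            # Always release the server-side scanner.
            client.scannerClose(scanner_id)
    return len(regions)
```

Each thrift server would have to be warmed separately, so this would be run once per server with a client connected to that server.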

*Thrift Logs*
Below is an example of a scan from the thrift logs. I am trying to
understand what is going on and where things can be sped up. The
scannerGetList call logged below took 43ms, which is longer than usual. What
would slow this down: waiting for the region server to respond (there is
write load on the cluster)? It then took 12ms to finish scanning, which
makes sense, but closing the scanner appears to take almost as long (11ms).
The whole read took 66ms when, from what I can tell, only 12ms of that was
spent reading the data. That does not make sense to me. Disk I/O should be
the bulk of the time spent on a read, yet it appears to be a small part of
this 66ms. How can we speed up read latency? Faster disks should be the
answer, but I am not sure they would make much of a difference here.

2011-01-08 16:28:47,921 DEBUG
org.apache.hadoop.hbase.client.HTable$ClientScanner: Creating scanner over
xxx starting at key '12345'
2011-01-08 16:28:47,921 DEBUG
org.apache.hadoop.hbase.client.HTable$ClientScanner: Advancing internal
scanner to startKey at '12345'
2011-01-08 16:28:47,964 DEBUG
org.apache.hadoop.hbase.thrift.ThriftServer$HBaseHandler: scannerGetList:
2011-01-08 16:28:47,976 DEBUG
org.apache.hadoop.hbase.client.HTable$ClientScanner: Finished with scanning
at REGION => {NAME => ....
2011-01-08 16:28:47,987 DEBUG
org.apache.hadoop.hbase.thrift.ThriftServer$HBaseHandler: scannerClose:
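
The per-step numbers quoted above can be reproduced from the timestamps in the log. A small sketch follows; the short labels are my own shorthand for the log messages, and only the timestamps are taken from the log itself.

```python
from datetime import datetime, timedelta

# (timestamp, shorthand label) pairs copied from the log lines above.
LOG_EVENTS = [
    ("2011-01-08 16:28:47,921", "Creating scanner"),
    ("2011-01-08 16:28:47,921", "Advancing internal scanner"),
    ("2011-01-08 16:28:47,964", "scannerGetList"),
    ("2011-01-08 16:28:47,976", "Finished with scanning"),
    ("2011-01-08 16:28:47,987", "scannerClose"),
]


def parse_ts(text):
    """Parse a log4j-style timestamp with comma-separated milliseconds."""
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S,%f")


def step_deltas_ms(events):
    """Return (label, ms since previous event) for every event after the first."""
    one_ms = timedelta(milliseconds=1)
    times = [parse_ts(ts) for ts, _ in events]
    return [
        (events[i][1], (times[i] - times[i - 1]) // one_ms)
        for i in range(1, len(events))
    ]


deltas = step_deltas_ms(LOG_EVENTS)
total_ms = sum(ms for _, ms in deltas)

for label, ms in deltas:
    print("%-28s +%dms" % (label, ms))
print("%-28s  %dms" % ("total", total_ms))
```

This gives 43ms up to scannerGetList, 12ms to the end of scanning, 11ms to scannerClose, and 66ms in total.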

