hbase-dev mailing list archives

From Varun Sharma <va...@pinterest.com>
Subject Poor HBase random read performance
Date Sat, 29 Jun 2013 19:13:23 GMT

I was doing some tests on how good HBase random reads are. The setup
consists of a 1-node cluster with dfs replication set to 1. Short-circuit
local reads and HBase checksums are enabled. The data set is small enough
to be largely cached in the filesystem cache - 10G on a 60G machine.
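For reference, the settings behind that setup (assuming the stock key names
for a CDH4-era HDFS client and HBase 0.94; they may differ on other builds):

    dfs.client.read.shortcircuit = true          (short-circuit local reads)
    hbase.regionserver.checksum.verify = true    (HBase-level checksums, HBASE-5074)
    dfs.replication = 1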

The client sends out multi-get operations in batches of 10, and I try to
measure throughput.
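
For concreteness, the client loop is roughly the following (a sketch against
the 0.94 client API; the table name and key scheme are placeholders, not the
actual test code):

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Random;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.util.Bytes;

    public class MultiGetBench {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "testtable");        // placeholder name
        long end = System.currentTimeMillis() + 120 * 1000L; // 120 second test
        long ops = 0;
        Random rnd = new Random();
        while (System.currentTimeMillis() < end) {
          List<Get> batch = new ArrayList<Get>(10);
          for (int i = 0; i < 10; i++) {
            // placeholder key scheme: uniform random keys over the data set
            batch.add(new Get(Bytes.toBytes(Long.toString(rnd.nextLong()))));
          }
          Result[] results = table.get(batch);               // one multi-get of 10
          ops += results.length;
        }
        System.out.println("read ops: " + ops + " (" + (ops / 120) + "/sec)");
        table.close();
      }
    }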

Test #1

All data was cached in the block cache.

Test Time = 120 seconds
Num Read Ops = 12M

Throughput = 100K per second

Test #2

I disable the block cache, but now all the data is in the file system
cache. I verify this by making sure that IOPs on the disk drive are 0
during the test. I run the same test with batched ops (the standard ways
to turn the cache off are sketched below).

Test Time = 120 seconds
Num Read Ops = 0.6M
Throughput = 5K per second
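
For anyone wanting to reproduce this, the two standard ways to turn the
block cache off in 0.94:

    // Per request: skip the block cache for these reads
    Get get = new Get(row);
    get.setCacheBlocks(false);

    // Or globally, in hbase-site.xml: size the LRU block cache to zero
    //   hfile.block.cache.size = 0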

Test #3

I saw that all the threads were stuck in idLock.lockEntry(), so I now run
with that lock disabled and the block cache disabled (a sketch of what this
lock does follows the numbers below).

Test Time = 120 seconds
Num Read Ops = 1.2M
Throughput = 10K per second
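
For anyone who hasn't looked at this lock: idLock serializes readers of the
same block offset so that only one thread does the actual load while the
others wait. A minimal self-contained sketch of the pattern (my paraphrase,
not HBase's actual code; the real IdLock also cleans up entries on release):

    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ConcurrentMap;

    // Sketch of the idLock pattern around block loads: all readers of one
    // block offset serialize, so only one thread reads from HDFS. With the
    // block cache disabled, the re-check under the lock never hits, so
    // every reader of a hot block queues up here.
    class IdLockSketch {
      private final ConcurrentMap<Long, Object> locks =
          new ConcurrentHashMap<Long, Object>();
      private final ConcurrentMap<Long, byte[]> blockCache =
          new ConcurrentHashMap<Long, byte[]>();

      byte[] readBlock(long offset, boolean useCache) {
        Object candidate = new Object();
        Object lock = locks.putIfAbsent(offset, candidate);
        if (lock == null) {
          lock = candidate;
        }
        synchronized (lock) {
          if (useCache) {
            byte[] cached = blockCache.get(offset);
            if (cached != null) {
              return cached;                    // loaded by an earlier waiter
            }
          }
          byte[] block = readFromHdfs(offset);  // slow path, serialized per offset
          if (useCache) {
            blockCache.put(offset, block);
          }
          return block;
        }
      }

      private byte[] readFromHdfs(long offset) {
        return new byte[64 * 1024];             // stand-in for the real read
      }
    }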

Test #4

I re-enable the block cache, and this time hack HBase to cache only index
and bloom blocks, so data blocks come from the file system cache (the idea
is sketched below).

Test Time = 120 seconds
Num Read Ops = 1.6M
Throughput = 13K per second
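
The hack is essentially a filter on block category at cache-on-read time;
against 0.94's CacheConfig it looks roughly like this (a sketch of the idea,
not the exact patch):

    // Inside o.a.h.h.io.hfile.CacheConfig: cache only INDEX and BLOOM
    // blocks on read, so data blocks always come from the OS page cache.
    public boolean shouldCacheBlockOnRead(BlockCategory category) {
      return isBlockCacheEnabled()
          && (category == BlockCategory.INDEX
              || category == BlockCategory.BLOOM);
    }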

So I wonder how come there is such a massive drop in throughput. I know
that the HDFS code adds tremendous overhead, but this seems pretty high to
me. I use HBase 0.94.7 and CDH 4.2.0.

