hbase-user mailing list archives

From Juhani Connolly <juha...@gmail.com>
Subject 0.92 and Read/writes not scaling
Date Mon, 19 Mar 2012 10:41:12 GMT
Hi,

We're running into a brick wall: our throughput numbers will not scale
as we increase server counts, both with our custom in-house tests and
with YCSB.

We're using HBase 0.92 on Hadoop 0.20.2 (we also experienced the same
issues with 0.90 before switching our testing to this version).

Our cluster consists of:
- Namenode and HMaster on separate servers: 24 cores, 64 GB RAM
- Up to 11 datanode/regionserver nodes: 24 cores, 64 GB RAM, 4 x 1 TB
disks (we hope to get this changed)

We have adjusted our GC settings and the MSLAB configuration:

  <property>
    <name>hbase.hregion.memstore.mslab.enabled</name>
    <value>true</value>
  </property>

  <property>
    <name>hbase.hregion.memstore.mslab.chunksize</name>
    <value>2097152</value>
  </property>

  <property>
    <name>hbase.hregion.memstore.mslab.max.allocation</name>
    <value>1024768</value>
  </property>
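
On the GC side, the tuning is along these lines in hbase-env.sh (the
heap size and flag values here are illustrative rather than our exact
settings):

  # hbase-env.sh -- illustrative GC flags, not our literal values
  export HBASE_REGIONSERVER_OPTS="-Xmx16g -Xms16g \
      -XX:+UseParNewGC -XX:+UseConcMarkSweepGC \
      -XX:CMSInitiatingOccupancyFraction=70 \
      -XX:+CMSParallelRemarkEnabled"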

The HDFS xceiver limit is set to 8192.
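
That is, in hdfs-site.xml (using the 0.20-era property name with its
historical spelling):

  <property>
    <name>dfs.datanode.max.xcievers</name>
    <value>8192</value>
  </property>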

We've experimented with a variety of handler counts for the namenode,
datanodes and regionservers with no change in throughput (the usual
properties, sketched below).
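
The values below are just examples of the ranges we tried, not our
current settings:

  <!-- hbase-site.xml -->
  <property>
    <name>hbase.regionserver.handler.count</name>
    <value>100</value>
  </property>

  <!-- hdfs-site.xml -->
  <property>
    <name>dfs.namenode.handler.count</name>
    <value>64</value>
  </property>
  <property>
    <name>dfs.datanode.handler.count</name>
    <value>10</value>
  </property>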

For testing with YCSB, we do the following each time (with nothing else
using the cluster); the workload and commands are sketched below:
- truncate the test table
- add a small amount of data, then split the table into 32 regions and
call the balancer from the shell
- load 10 million rows
- do a 1:2:7 insert:update:read test with 10 million rows (64k ops/sec)
- do a 5:5 insert:update test with 10 million rows (23k ops/sec)
- do a pure read test with 10 million rows (75k ops/sec)
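
For reference, the mixed workloads are driven with a YCSB CoreWorkload
properties file and commands roughly like the following (the column
family name and thread count are placeholders, not our exact values):

  # workload-127.properties (sketch for the 1:2:7 insert:update:read mix)
  workload=com.yahoo.ycsb.workloads.CoreWorkload
  recordcount=10000000
  operationcount=10000000
  insertproportion=0.1
  updateproportion=0.2
  readproportion=0.7
  scanproportion=0
  requestdistribution=zipfian

  # load the 10 million rows, then run the mixed workload
  bin/ycsb load hbase -P workload-127.properties -p columnfamily=family -threads 100 -s
  bin/ycsb run hbase -P workload-127.properties -p columnfamily=family -threads 100 -s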

We have watched ganglia, iostat -d -x, iptraf, top, dstat and a
variety of other diagnostic tools, and network/IO/CPU/memory
bottlenecks seem highly unlikely, as none of those resources is ever
seriously taxed. This leads me to assume this is some kind of locking
issue? Delaying WAL flushes gives a small throughput bump, but it
doesn't scale either.

There also don't seem to be many published figures around to compare
ours to. We can push our throughput numbers higher with tricks like not
writing to the WAL, delaying flushes, or batching requests (sketched
below), but nothing scales with additional slaves.
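
For what it's worth, those tricks are just the standard 0.92
client-side knobs, roughly like the sketch below (table, family and
row names are placeholders):

  import java.io.IOException;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.client.Put;
  import org.apache.hadoop.hbase.util.Bytes;

  public class NoWalBatchedPuts {
    public static void main(String[] args) throws IOException {
      Configuration conf = HBaseConfiguration.create();
      HTable table = new HTable(conf, "usertable");   // table name is a placeholder
      table.setAutoFlush(false);                      // batch puts in the client write buffer
      table.setWriteBufferSize(2L * 1024 * 1024);     // flush roughly every 2 MB

      Put put = new Put(Bytes.toBytes("row-00000001"));
      put.add(Bytes.toBytes("family"), Bytes.toBytes("qual"), Bytes.toBytes("value"));
      put.setWriteToWAL(false);                       // skip the WAL entirely (unsafe for real data)
      table.put(put);

      table.flushCommits();                           // push the buffered puts to the regionservers
      table.close();
      // Deferred log flush can also be enabled per table via
      // HTableDescriptor.setDeferredLogFlush(true).
    }
  }
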
Could anyone provide guidance as to what may be preventing throughput
figures from scaling as we increase our slave count?
