hbase-user mailing list archives

From "Ramkrishna.S.Vasudevan" <ramkrishna.vasude...@huawei.com>
Subject RE: 0.92 and Read/writes not scaling
Date Mon, 19 Mar 2012 11:03:08 GMT
Hi Juhani

Can you tell us more about how the regions are balanced?
Are you overloading only a specific region server?
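
For example, the shell's status command gives a quick per-regionserver
view (a sketch, run from any node with the hbase shell):

  hbase(main):001:0> status 'simple'    # region/request counts per regionserver
  hbase(main):002:0> status 'detailed'  # per-region load, to spot hot regions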

Regards
Ram

> -----Original Message-----
> From: Juhani Connolly [mailto:juhanic@gmail.com]
> Sent: Monday, March 19, 2012 4:11 PM
> To: user@hbase.apache.org
> Subject: 0.92 and Read/writes not scaling
> 
> Hi,
> 
> We're running into a brick wall: our throughput numbers do not
> scale as we increase server counts, both with our custom in-house
> tests and with YCSB.
> 
> We're using HBase 0.92 on Hadoop 0.20.2 (we experienced the same
> issues with 0.90 before switching our testing to this version).
> 
> Our cluster consists of:
> - Namenode and HMaster on separate servers, 24 cores, 64 GB
> - up to 11 datanode/regionservers: 24 cores, 64 GB, 4 x 1 TB disks
>   (we hope to get this changed)
> 
> We have adjusted our GC settings and enabled MSLAB:
> 
>   <property>
>     <name>hbase.hregion.memstore.mslab.enabled</name>
>     <value>true</value>
>   </property>
> 
>   <property>
>     <name>hbase.hregion.memstore.mslab.chunksize</name>
>     <value>2097152</value>
>   </property>
> 
>   <property>
>     <name>hbase.hregion.memstore.mslab.max.allocation</name>
>     <value>1024768</value>
>   </property>
> 
> HDFS xceivers (dfs.datanode.max.xcievers) is set to 8192.
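> 
> That stanza in hdfs-site.xml on each datanode looks like this (a
> sketch; note the property name really is spelled "xcievers" in
> Hadoop 0.20):
> 
>   <property>
>     <name>dfs.datanode.max.xcievers</name>
>     <value>8192</value>
>   </property>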
> 
> We've experimented with a variety of handler counts for the namenode,
> datanodes and regionservers with no change in throughput.
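> 
> (For the regionserver side that means hbase.regionserver.handler.count
> in hbase-site.xml; the value below is just an example, not what we
> settled on:)
> 
>   <property>
>     <name>hbase.regionserver.handler.count</name>
>     <value>100</value>
>   </property>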
> 
> For testing with YCSB, we do the following each time (with nothing
> else using the cluster):
> - truncate the test table
> - add a small amount of data, then split the table into 32 regions
>   and call the balancer from the shell (see the shell sketch after
>   this list)
> - load 10 million rows
> - do a 1:2:7 insert:update:read test with 10 million rows (64k ops/sec)
> - do a 5:5 insert:update test with 10 million rows (23k ops/sec)
> - do a pure read test with 10 million rows (75k ops/sec)
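> 
> The split/balance step is roughly this from the hbase shell (a
> sketch; 'usertable' is YCSB's default table name):
> 
>   hbase(main):001:0> split 'usertable'   # repeat until 32 regions exist
>   hbase(main):002:0> balancer            # ask the master to rebalance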
> 
> We have watched ganglia, iostat -d -x, iptraf, top, dstat and a
> variety of other diagnostic tools; network/IO/CPU/memory bottlenecks
> seem highly unlikely, as none of them is ever seriously taxed. This
> leads me to suspect some kind of locking issue. Delaying WAL flushes
> gives a small throughput bump, but it doesn't scale either.
> 
> There also don't seem to be many published figures to compare ours
> against. We can push our throughput numbers higher with tricks like
> not writing to the WAL, delaying flushes, or batching requests, but
> nothing scales with additional slaves.
> Could anyone provide guidance as to what may be preventing our
> throughput figures from scaling as we increase the slave count?
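> 
> For reference, the client-side "tricks" above look roughly like this
> against the 0.92 Java API (a sketch; the table, family and row names
> are placeholders):
> 
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.hbase.HBaseConfiguration;
> import org.apache.hadoop.hbase.client.HTable;
> import org.apache.hadoop.hbase.client.Put;
> import org.apache.hadoop.hbase.util.Bytes;
> 
> public class BufferedLoader {
>   public static void main(String[] args) throws Exception {
>     Configuration conf = HBaseConfiguration.create();
>     HTable table = new HTable(conf, "usertable");
>     // Batch puts client-side instead of one RPC per row.
>     table.setAutoFlush(false);
>     table.setWriteBufferSize(4 * 1024 * 1024); // 4 MB write buffer
>     for (int i = 0; i < 10000; i++) {
>       Put put = new Put(Bytes.toBytes("row" + i));
>       put.add(Bytes.toBytes("f1"), Bytes.toBytes("q1"),
>               Bytes.toBytes("value" + i));
>       // Skip the WAL: faster, but data is lost if the regionserver
>       // dies before the memstore is flushed.
>       put.setWriteToWAL(false);
>       table.put(put);
>     }
>     table.flushCommits(); // push any remaining buffered puts
>     table.close();
>   }
> }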

