hbase-user mailing list archives

From "Ramkrishna.S.Vasudevan" <ramkrishna.vasude...@huawei.com>
Subject RE: 0.92 and Read/writes not scaling
Date Mon, 19 Mar 2012 12:27:55 GMT
Hi

In our experience, rather than increasing threads, increase the number of
clients. Increasing the client count has given us better throughput.
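
For example, instead of one ycsb JVM with 400 threads, run several
independent client processes (ideally spread over more than one machine),
each along these lines. This is only a rough sketch; the table, family and
key names are placeholders:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class ReadClient {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create(); // reads hbase-site.xml
    HTable table = new HTable(conf, "usertable");     // one process = one client
    long hits = 0;
    for (int i = 0; i < 1000000; i++) {
      Get g = new Get(Bytes.toBytes("user" + (i % 10000000)));
      Result r = table.get(g);
      if (!r.isEmpty()) {
        hits++;
      }
    }
    System.out.println("done, hits=" + hits);
    table.close();
  }
}

Each extra process brings its own connections to the region servers, and in
our tests several moderately threaded clients pushed more load than one
heavily threaded one.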

Regards
Ram

> -----Original Message-----
> From: Juhani Connolly [mailto:juhanic@gmail.com]
> Sent: Monday, March 19, 2012 5:33 PM
> To: user@hbase.apache.org
> Subject: Re: 0.92 and Read/writes not scaling
> 
> I was concerned that might be the case too, which is why we ran the ycsb
> tests in addition to our application-specific and general performance
> tests. Checking profiles of the execution just showed the vast majority of
> time spent waiting for responses. These were all run with 400
> threads (though we tried more/fewer just in case).
> 2012/03/19 20:57 "Mingjian Deng" <koven2049@gmail.com>:
> 
> > @Juhani:
> > How many clients did you test with? Maybe the bottleneck was the client?
> >
> > 2012/3/19 Ramkrishna.S.Vasudevan <ramkrishna.vasudevan@huawei.com>
> >
> > > Hi Juhani
> > >
> > > Can you tell us more about how the regions are balanced?
> > > Are you overloading only a specific region server?
> > >
> > > Regards
> > > Ram
> > >
> > > > -----Original Message-----
> > > > From: Juhani Connolly [mailto:juhanic@gmail.com]
> > > > Sent: Monday, March 19, 2012 4:11 PM
> > > > To: user@hbase.apache.org
> > > > Subject: 0.92 and Read/writes not scaling
> > > >
> > > > Hi,
> > > >
> > > > We're running into a brick wall where our throughput numbers will not
> > > > scale as we increase server counts, both using custom in-house tests and
> > > > ycsb.
> > > >
> > > > We're using hbase 0.92 on hadoop 0.20.2 (we also experienced the same
> > > > issues using 0.90 before switching our testing to this version).
> > > >
> > > > Our cluster consists of:
> > > > - Namenode and hmaster on separate servers, 24 core, 64gb
> > > > - up to 11 datanode/regionservers, 24 core, 64gb, 4 * 1tb disks (hope
> > > > to get this changed)
> > > >
> > > > We have adjusted our gc settings, and mslabs:
> > > >
> > > >   <property>
> > > >     <name>hbase.hregion.memstore.mslab.enabled</name>
> > > >     <value>true</value>
> > > >   </property>
> > > >
> > > >   <property>
> > > >     <name>hbase.hregion.memstore.mslab.chunksize</name>
> > > >     <value>2097152</value>
> > > >   </property>
> > > >
> > > >   <property>
> > > >     <name>hbase.hregion.memstore.mslab.max.allocation</name>
> > > >     <value>1024768</value>
> > > >   </property>
> > > >
> > > > hdfs xceivers is set to 8192
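> > > >
> > > > (That is dfs.datanode.max.xcievers in hdfs-site.xml; the property name
> > > > assumes the hadoop 0.20-era spelling:)
> > > >
> > > >   <property>
> > > >     <name>dfs.datanode.max.xcievers</name>
> > > >     <value>8192</value>
> > > >   </property>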
> > > >
> > > > We've experimented with a variety of handler counts for namenode,
> > > > datanodes and regionservers with no changes in throughput.
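> > > >
> > > > (The handler counts in question are along these lines; the values below
> > > > are placeholders rather than our actual settings:)
> > > >
> > > >   <property>
> > > >     <name>hbase.regionserver.handler.count</name>
> > > >     <value>100</value>
> > > >   </property>
> > > >
> > > >   <property>
> > > >     <name>dfs.datanode.handler.count</name>
> > > >     <value>10</value>
> > > >   </property>
> > > >
> > > >   <property>
> > > >     <name>dfs.namenode.handler.count</name>
> > > >     <value>64</value>
> > > >   </property>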
> > > >
> > > > For testing with ycsb, we do the following each time (with nothing else
> > > > using the cluster; a rough Java sketch of the table prep follows the
> > > > list):
> > > > - truncate test table
> > > > - add a small amount of data, then split the table into 32 regions and
> > > > call balancer from the shell
> > > > - load 10m rows
> > > > - do a 1:2:7 insert:update:read test with 10 million rows (64k/sec)
> > > > - do a 5:5 insert:update test with 10 million rows (23k/sec)
> > > > - do a pure read test with 10 million rows (75k/sec)
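> > > >
> > > > The table prep, roughly, as Java against the 0.92 client API (the table
> > > > and family names here are only placeholders; we actually drive this from
> > > > the shell):
> > > >
> > > > import org.apache.hadoop.conf.Configuration;
> > > > import org.apache.hadoop.hbase.HBaseConfiguration;
> > > > import org.apache.hadoop.hbase.HColumnDescriptor;
> > > > import org.apache.hadoop.hbase.HTableDescriptor;
> > > > import org.apache.hadoop.hbase.client.HBaseAdmin;
> > > >
> > > > public class TablePrep {
> > > >   public static void main(String[] args) throws Exception {
> > > >     Configuration conf = HBaseConfiguration.create();
> > > >     HBaseAdmin admin = new HBaseAdmin(conf);
> > > >
> > > >     // "truncate": disable + drop + recreate the test table
> > > >     if (admin.tableExists("usertable")) {
> > > >       admin.disableTable("usertable");
> > > >       admin.deleteTable("usertable");
> > > >     }
> > > >     HTableDescriptor desc = new HTableDescriptor("usertable");
> > > >     desc.addFamily(new HColumnDescriptor("family"));
> > > >     admin.createTable(desc);
> > > >
> > > >     // ...load a small amount of data here, then ask for splits and
> > > >     // rebalance; split() is asynchronous, so we repeat/wait until the
> > > >     // table has 32 regions before starting the real load
> > > >     admin.split("usertable");
> > > >     admin.balancer();
> > > >   }
> > > > }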
> > > >
> > > > We have observed ganglia, iostat -d -x, iptraf, top, dstat and a
> > > > variety of other diagnostic tools; network/io/cpu/memory bottlenecks
> > > > seem highly unlikely, as none of them is ever seriously taxed. This
> > > > leads me to assume this is some kind of locking issue?
> > > > Delaying WAL flushes gives a small throughput bump, but it doesn't
> > > > scale.
> > > >
> > > > There also don't seem to be many figures around to compare ours to.
> > > > We can get our throughput numbers higher with tricks like not writing
> > > > the WAL, delaying flushes, or batching requests (sketched at the end of
> > > > this mail), but nothing seems to scale with additional slaves.
> > > > Could anyone provide guidance as to what may be preventing throughput
> > > > figures from scaling as we increase our slave count?
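> > > >
> > > > For reference, the "tricks" as a minimal 0.92-era client sketch (table,
> > > > family and row keys are placeholders, and skipping the WAL is of course
> > > > not safe for production):
> > > >
> > > > import org.apache.hadoop.conf.Configuration;
> > > > import org.apache.hadoop.hbase.HBaseConfiguration;
> > > > import org.apache.hadoop.hbase.client.HTable;
> > > > import org.apache.hadoop.hbase.client.Put;
> > > > import org.apache.hadoop.hbase.util.Bytes;
> > > >
> > > > public class TunedWriter {
> > > >   public static void main(String[] args) throws Exception {
> > > >     Configuration conf = HBaseConfiguration.create();
> > > >     HTable table = new HTable(conf, "usertable");
> > > >     table.setAutoFlush(false);                 // batch puts client-side
> > > >     table.setWriteBufferSize(8 * 1024 * 1024); // flush every ~8MB
> > > >     for (int i = 0; i < 100000; i++) {
> > > >       Put p = new Put(Bytes.toBytes("user" + i));
> > > >       p.setWriteToWAL(false);                  // "not writing the WAL"
> > > >       p.add(Bytes.toBytes("family"), Bytes.toBytes("field0"),
> > > >             Bytes.toBytes("value-" + i));
> > > >       table.put(p);
> > > >     }
> > > >     table.flushCommits();
> > > >     table.close();
> > > >   }
> > > > }
> > > >
> > > > ("delaying flushes" means deferred log flush, which is set per table via
> > > > HTableDescriptor.setDeferredLogFlush(true) when creating or altering it.)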
> > >
> > >
> >

