hbase-user mailing list archives

From Juhani Connolly <juha...@gmail.com>
Subject Re: 0.92 and Read/writes not scaling
Date Mon, 26 Mar 2012 16:48:21 GMT
Easiest to answer one mail at a time:

On Mon, Mar 26, 2012 at 10:58 PM, Matt Corgan <mcorgan@hotpads.com> wrote:
> When you increased regions on your previous test, did it start maxing out
> CPU?  What improvement did you see?
>

Once we increased the number of regions to match the CPU count, things
started scaling: not linearly, but there were definite increases in the numbers.

> Have you tried increasing the memstore flush size to something like 512MB?
>  Maybe you're blocked on flushes.  40,000 (4,000/server) is pretty slow for
> a disabled WAL, I think, especially with a batch size of 10.  If you increase
> the write batch size to 1000, how much does your write throughput increase?
>

We haven't; I'll give it a shot.
Things are actually much worse now with hdfs 0.23, but our original
scaling problem seems to be sorted by increasing the number of regions
to match the number of cpus. The problem has now shifted to extremely
slow writes (about 14,000/s over 11 servers with a full region count).
We may just switch back to the older hdfs version, but that seems a bit
counterproductive :/
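
For reference, the kind of client-side batching Matt is asking about looks
roughly like this (a minimal sketch against the 0.92 client API; the table
name, column family and value sizes are placeholders, not our actual test
code):

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class BatchedWriter {
  public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "usertable");   // placeholder table name

    // Buffer writes client-side instead of issuing one RPC per put.
    table.setAutoFlush(false);
    table.setWriteBufferSize(8 * 1024 * 1024);      // 8MB buffer, tune as needed

    List<Put> batch = new ArrayList<Put>(1000);
    for (int i = 0; i < 1000; i++) {
      Put p = new Put(Bytes.toBytes("row-" + i));
      // ~128-256 byte values, similar to what we expect in our application
      p.add(Bytes.toBytes("f"), Bytes.toBytes("q"), new byte[200]);
      batch.add(p);
    }
    table.put(batch);        // queued in the client-side write buffer
    table.flushCommits();    // sent to the regionservers as multi-put RPCs
    table.close();
  }
}

With autoflush off, the write buffer size effectively decides how many puts
go out per multi-put RPC, so raising it is an easy way to try the
batch-of-1000 suggestion.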

>
> On Fri, Mar 23, 2012 at 3:48 AM, Juhani Connolly <juhanic@gmail.com> wrote:
>
>> Also, the latency on requests is extremely long. If we group them into
>> sets of 10 puts (128-256 bytes each) before flushing the client table,
>> latency is over 1 second.
>>
>> We get entries like this in our logs:
>> 22:17:51,010 WARN org.apache.hadoop.ipc.HBaseServer: (responseTooSlow):
>> {"processingtimems":16692,"call":"multi(org.apache.hadoop.hbase.client.MultiAction@65312e3b), rpc version=1, client version=29, methodsFingerPrint=54742778","client":"10.172.109.3:42725","starttimems":1332335854317,"queuetimems":6387,"class":"HRegionServer","responsesize":0,"method":"multi"}
>>
>> Any suggestions as to where we should be digging?
>>
>> On Fri, Mar 23, 2012 at 4:40 PM, Juhani Connolly <juhanic@gmail.com> wrote:
>> > Status update:
>> >
>> > - We moved to cdh 4b1, so hbase 0.92 and hdfs 0.23 (until now we were
>> > using the 0.20.2 series)
>> > - Did the tests again with 256/512 regions; the numbers do appear to
>> > scale, which is good.
>> >
>> > BUT, our write throughput has gone down the drain. If we disable WAL
>> > writes, we still get nearly 40,000 a second, but with it on, we're
>> > lucky to get more than 12,000. Before, we were getting as high as
>> > 70,000 by grouping puts together. I have set up log collection and am
>> > not finding anything unusual in the logs.
>> >
>> > Mikael: One of the tests is the ycsb one where we just let it choose
>> > the size. Our own custom test has a configurable size, but we have
>> > been testing with entries of 128-256 bytes each, as this is
>> > what we expect in our application. What exactly should we be looking
>> > at with the storefiles?
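
For clarity, the two WAL "tricks" being compared in this thread look roughly
like this in the 0.92 client API (a sketch only; the table name is a
placeholder, not our actual harness):

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class WalTuning {
  public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();

    // 1) Skip the WAL entirely for a given Put (fast, but unflushed data is
    //    lost if a regionserver crashes).
    HTable table = new HTable(conf, "usertable");   // placeholder table name
    Put p = new Put(Bytes.toBytes("row-1"));
    p.add(Bytes.toBytes("f"), Bytes.toBytes("q"), new byte[200]);
    p.setWriteToWAL(false);
    table.put(p);
    table.close();

    // 2) Keep the WAL but let the regionserver flush it asynchronously
    //    (deferred log flush), set per table on the descriptor.
    HBaseAdmin admin = new HBaseAdmin(conf);
    HTableDescriptor desc = admin.getTableDescriptor(Bytes.toBytes("usertable"));
    desc.setDeferredLogFlush(true);
    admin.disableTable("usertable");
    admin.modifyTable(Bytes.toBytes("usertable"), desc);
    admin.enableTable("usertable");
  }
}

Skipping the WAL risks losing acknowledged writes if a regionserver crashes;
deferred log flush keeps the WAL but syncs it periodically, so it only
narrows that window.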
>> >
>> > On Wed, Mar 21, 2012 at 2:29 PM, Mikael Sitruk <mikael.sitruk@gmail.com> wrote:
>> >> Juhani,
>> >> Can you look at the storefiles and tell how they behave during the test?
>> >> What is the size of the data you insert/update?
>> >> Mikael
>> >> On Mar 20, 2012 8:10 PM, "Juhani Connolly" <juhanic@gmail.com> wrote:
>> >>
>> >>> Hi Matt,
>> >>>
>> >>> this is something we haven't tested much; we were always running with
>> >>> about 32 regions, which gave enough coverage for an even spread over
>> >>> all machines.
>> >>> I will run our tests with enough regions per server to cover all cores
>> >>> and get back to the mailing list.
>> >>>
>> >>> On Tue, Mar 20, 2012 at 1:55 AM, Matt Corgan <mcorgan@hotpads.com> wrote:
>> >>> > I'd be curious to see what happens if you split the table into 1
>> >>> > region per CPU core, so 24 cores * 11 servers = 264 regions.  Each
>> >>> > region has 1 memstore which is a ConcurrentSkipListMap, and you're
>> >>> > currently hitting each CSLM with 8 cores which might be too
>> >>> > contentious.  Normally in production you would want multiple
>> >>> > memstores per CPU core.
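
A minimal sketch of pre-creating a table with one region per core across the
cluster (264 in the example above), with the larger per-table memstore flush
size mentioned at the top of the thread thrown in; the table name, family and
key range are placeholders:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

public class PreSplitTable {
  public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);

    HTableDescriptor desc = new HTableDescriptor("usertable");  // placeholder
    desc.addFamily(new HColumnDescriptor("f"));
    desc.setMemStoreFlushSize(512L * 1024 * 1024);  // 512MB per-table flush size

    // 24 cores * 11 servers = 264 regions, split evenly over the key range.
    byte[] startKey = Bytes.toBytes("00000000");
    byte[] endKey   = Bytes.toBytes("zzzzzzzz");
    admin.createTable(desc, startKey, endKey, 264);
  }
}

createTable() splits the key space evenly between the start and end keys, so
this only spreads load if the row keys are roughly uniform over that range.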
>> >>> >
>> >>> >
>> >>> > On Mon, Mar 19, 2012 at 5:31 AM, Juhani Connolly <juhanic@gmail.com> wrote:
>> >>> >
>> >>> >> Actually we did try running off two machines both running our own
>> >>> >> tests in parallel. Unfortunately the results were a split that
>> >>> >> results in the same total throughput. We also did the same thing
>> >>> >> with iperf running from each machine to another machine, indicating
>> >>> >> 800Mb additional throughput between each pair of machines.
>> >>> >> However we didn't try these tests very thoroughly so I will revisit
>> >>> >> them as soon as I get back to the office, thanks.
>> >>> >>
>> >>> >> On Mon, Mar 19, 2012 at 9:21 PM, Christian Schäfer <syrious3000@yahoo.de> wrote:
>> >>> >> > referring to my experiences I expect the client to be the
>> >>> >> > bottleneck, too.
>> >>> >> >
>> >>> >> > So try to increase the count of client-machines (not client
>> >>> >> > threads) each with its own unshared network interface.
>> >>> >> >
>> >>> >> > In my case I could double write throughput by doubling client
>> >>> >> > machine count with a much smaller system than yours (5 machines,
>> >>> >> > 4gigs RAM each).
>> >>> >> >
>> >>> >> > Good Luck
>> >>> >> > Chris
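
For comparison, a bare-bones multi-threaded put loader looks something like
the sketch below (one HTable per thread, since HTable instances are not
thread-safe; every name and count here is a placeholder rather than anyone's
actual test client):

import java.io.IOException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class MultiThreadedLoader {
  public static void main(String[] args) throws Exception {
    final Configuration conf = HBaseConfiguration.create();
    final int threads = 400;          // same order as the 400 threads used in the tests
    final int putsPerThread = 25000;  // 400 * 25000 = 10M rows total

    ExecutorService pool = Executors.newFixedThreadPool(threads);
    for (int t = 0; t < threads; t++) {
      final int threadId = t;
      pool.submit(new Runnable() {
        public void run() {
          try {
            // One HTable per thread: HTable instances are not thread-safe.
            HTable table = new HTable(conf, "usertable");  // placeholder name
            table.setAutoFlush(false);
            for (int i = 0; i < putsPerThread; i++) {
              Put p = new Put(Bytes.toBytes("t" + threadId + "-" + i));
              p.add(Bytes.toBytes("f"), Bytes.toBytes("q"), new byte[200]);
              table.put(p);
            }
            table.flushCommits();
            table.close();
          } catch (IOException e) {
            e.printStackTrace();
          }
        }
      });
    }
    pool.shutdown();
  }
}

Running the same loader from several client machines then just means
partitioning the row key space per host, which is effectively what doubling
the client machine count does.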
>> >>> >> >
>> >>> >> >
>> >>> >> >
>> >>> >> > ________________________________
>> >>> >> >  From: Juhani Connolly <juhanic@gmail.com>
>> >>> >> > To: user@hbase.apache.org
>> >>> >> > Sent: 13:02 Monday, 19 March 2012
>> >>> >> > Subject: Re: 0.92 and Read/writes not scaling
>> >>> >> >
>> >>> >> > I was concerned that may be the case too, which is why we ran
>> >>> >> > the ycsb tests in addition to our application specific and
>> >>> >> > general performance tests. Checking profiles of the execution
>> >>> >> > just showed the vast majority of time spent waiting for
>> >>> >> > responses. These were all run with 400 threads (though we tried
>> >>> >> > more/less just in case).
>> >>> >> > 2012/03/19 20:57 "Mingjian Deng" <koven2049@gmail.com>:
>> >>> >> >
>> >>> >> >> @Juhani:
>> >>> >> >> How many clients did you test? Maybe the bottleneck was the
>> >>> >> >> client?
>> >>> >> >>
>> >>> >> >> 2012/3/19 Ramkrishna.S.Vasudevan <ramkrishna.vasudevan@huawei.com>
>> >>> >> >>
>> >>> >> >> > Hi Juhani
>> >>> >> >> >
>> >>> >> >> > Can you tell more on how the regions are balanced?
>> >>> >> >> > Are you overloading only a specific region server?
>> >>> >> >> >
>> >>> >> >> > Regards
>> >>> >> >> > Ram
>> >>> >> >> >
>> >>> >> >> > > -----Original Message-----
>> >>> >> >> > > From: Juhani Connolly [mailto:juhanic@gmail.com]
>> >>> >> >> > > Sent: Monday, March 19, 2012 4:11 PM
>> >>> >> >> > > To: user@hbase.apache.org
>> >>> >> >> > > Subject: 0.92 and Read/writes not scaling
>> >>> >> >> > >
>> >>> >> >> > > Hi,
>> >>> >> >> > >
>> >>> >> >> > > We're running into a brick wall where our throughput
>> >>> >> >> > > numbers will not scale as we increase server counts both
>> >>> >> >> > > using custom inhouse tests and ycsb.
>> >>> >> >> > >
>> >>> >> >> > > We're using hbase 0.92 on hadoop 0.20.2 (we also
>> >>> >> >> > > experienced the same issues using 0.90 before switching our
>> >>> >> >> > > testing to this version).
>> >>> >> >> > >
>> >>> >> >> > > Our cluster consists of:
>> >>> >> >> > > - Namenode and hmaster on separate servers, 24 core, 64gb
>> >>> >> >> > > - up to 11 datanode/regionservers. 24 core, 64gb, 4 * 1tb
>> >>> >> >> > > disks (hope to get this changed)
>> >>> >> >> > >
>> >>> >> >> > > We have adjusted our gc settings, and mslabs:
>> >>> >> >> > >
>> >>> >> >> > >   <property>
>> >>> >> >> > >     <name>hbase.hregion.memstore.mslab.enabled</name>
>> >>> >> >> > >     <value>true</value>
>> >>> >> >> > >   </property>
>> >>> >> >> > >
>> >>> >> >> > >   <property>
>> >>> >> >> > >     <name>hbase.hregion.memstore.mslab.chunksize</name>
>> >>> >> >> > >     <value>2097152</value>
>> >>> >> >> > >   </property>
>> >>> >> >> > >
>> >>> >> >> > >   <property>
>> >>> >> >> > >     <name>hbase.hregion.memstore.mslab.max.allocation</name>
>> >>> >> >> > >     <value>1024768</value>
>> >>> >> >> > >   </property>
>> >>> >> >> > >
>> >>> >> >> > > hdfs xceivers is set to 8192
>> >>> >> >> > >
>> >>> >> >> > > We've experimented with a variety of handler counts for
>> >>> >> >> > > namenode, datanodes and regionservers with no changes in
>> >>> >> >> > > throughput.
>> >>> >> >> > >
>> >>> >> >> > > For testing with ycsb, we do the following each time (with
>> >>> >> >> > > nothing else using the cluster):
>> >>> >> >> > > - truncate test table
>> >>> >> >> > > - add a small amount of data, then split the table into 32
>> >>> >> >> > > regions and call balancer from the shell.
>> >>> >> >> > > - load 10m rows
>> >>> >> >> > > - do a 1:2:7 insert:update:read test with 10 million rows
>> >>> >> >> > > (64k/sec)
>> >>> >> >> > > - do a 5:5 insert:update test with 10 million rows (23k/sec)
>> >>> >> >> > > - do a pure read test with 10 million rows (75k/sec)
>> >>> >> >> > >
>> >>> >> >> > > We have observed ganglia, iostat -d -x, iptraf, top, dstat
>> >>> >> >> > > and a variety of other diagnostic tools, and
>> >>> >> >> > > network/io/cpu/memory bottlenecks seem highly unlikely as
>> >>> >> >> > > none of them is ever seriously taxed. This leaves me to
>> >>> >> >> > > assume this is some kind of locking issue? Delaying WAL
>> >>> >> >> > > flushes gives a small throughput bump but it doesn't scale.
>> >>> >> >> > >
>> >>> >> >> > > There also don't seem to be many figures around to compare
>> >>> >> >> > > ours to. We can get our throughput numbers higher with
>> >>> >> >> > > tricks like not writing the WAL, delaying flushes, or
>> >>> >> >> > > batching requests, but nothing seems to scale with
>> >>> >> >> > > additional slaves.
>> >>> >> >> > > Could anyone provide guidance as to what may be preventing
>> >>> >> >> > > throughput figures from scaling as we increase our slave
>> >>> >> >> > > count?
>> >>> >> >> >
>> >>> >> >> >
>> >>> >> >>
>> >>> >>
>> >>>
>>
