hbase-user mailing list archives

From Juhani Connolly <juha...@gmail.com>
Subject Re: 0.92 and Read/writes not scaling
Date Mon, 26 Mar 2012 16:59:24 GMT
On Mon, Mar 26, 2012 at 11:21 PM, Mikael Sitruk <mikael.sitruk@gmail.com> wrote:
> Juhani hi
>
> By storefile behavior I meant: look at the metrics and check the number of
> store files over time, and see whether it stays bounded or whether the files
> increase and decrease all the time. If that is not the case (and the number
> of store files just keeps increasing), HBase will throttle the requests.
> 128-256 bytes per request, grouped in 10, is not much data; I have a data
> set where each request is approx 4K, with insert times of 7-10 ms.
> Do you see this latency problem on inserts throughout the whole test, or
> only at certain times?
>

I'm aware that the writes are small, even batched. So long as there
are not too many threads, we were getting very fast writes (3-4 ms),
but as we increase the thread count, latency grows with it (even if
the clients are on two separate servers). There is a fixed relation
between thread count and latency, with a seeming hard limit on how
many writes will go through (which with HDFS 0.23 is a terrible
14,000 over 11 servers).
I suspected this might be something along the lines of insufficient
connections to the datanodes and tried increasing RPC threads on the
master, namenode, datanodes and regionservers; nothing changed.
We have also run the tests with significantly larger pieces of data.
Running with 16 KB inserts the throughput drops a little, but far from
a linear drop with the payload size: an increase from 256 bytes to
16 KB didn't result in more than, say, a halving of speed (this is
from memory, I'll recheck the figures tomorrow from my notes).
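
For reference, the handler-count knobs we have been bumping are roughly the
following (the values shown are only illustrative; none of the combinations
we tried changed the throughput):

In hbase-site.xml:

  <property>
    <name>hbase.regionserver.handler.count</name>
    <value>100</value>
  </property>
  <property>
    <name>hbase.master.handler.count</name>
    <value>100</value>
  </property>

In hdfs-site.xml:

  <property>
    <name>dfs.namenode.handler.count</name>
    <value>64</value>
  </property>
  <property>
    <name>dfs.datanode.handler.count</name>
    <value>10</value>
  </property>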

> Did you check your network latency?
Network throughput with iperf is far beyond what our setup is pushing
at the moment. I'll have to check the latency, but on single puts it
is 2-3 ms.

> BTW, batch is not supported by YCSB, so when you say a set of 10 puts, do
> you mean the table write buffer? In my test it is disabled.
>

We did a lot of our tests on YCSB (which incidentally has good write
throughput for the load because it disables immediate WAL flushing);
using the run mode, the throughput is miserable. I wrote a separate
test tool for trying out writes without the WAL, multi-puts, and
switching off WAL flushing and flushing it manually in the program
(and whatever other test took our fancy). Writes that aren't sent to
the WAL are much, much faster (on a table with only one region, normal
writes were about 3,000/s; WAL-less writes, 40k/s).
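
For anyone curious, the WAL-less and manually flushed write paths in the test
tool boil down to something like the sketch below (HBase 0.92 client API; the
table name, column family and payload are placeholders, not our real schema):

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class WalToggleSketch {
  public static void main(String[] args) throws IOException {
    // pass "nowal" to skip the WAL, anything else for normal writes
    boolean skipWal = args.length > 0 && "nowal".equals(args[0]);
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "testtable");   // placeholder table name
    table.setAutoFlush(false);                      // buffer puts client-side
    byte[] family = Bytes.toBytes("f");             // placeholder column family
    byte[] qualifier = Bytes.toBytes("q");
    byte[] value = new byte[256];                   // 128-256 byte payloads in our tests
    for (int i = 0; i < 100000; i++) {
      Put put = new Put(Bytes.toBytes("row-" + i));
      put.add(family, qualifier, value);
      if (skipWal) {
        put.setWriteToWAL(false);                   // the "WAL-less" variant
      }
      table.put(put);                               // only buffers while autoflush is off
      if (i % 10 == 9) {
        table.flushCommits();                       // push each group of 10 in one round trip
      }
    }
    table.flushCommits();
    table.close();
  }
}

HTable isn't thread-safe, so in the multithreaded runs each client thread gets
its own instance.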

I can only guess there is something really weird going on when writing
to the WAL (threads locking each other out?).

> Mikael.S
>
>
> On Mon, Mar 26, 2012 at 6:58 AM, Matt Corgan <mcorgan@hotpads.com> wrote:
>
>> When you increased regions on your previous test, did it start maxing out
>> CPU?  What improvement did you see?
>>
>> Have you tried increasing the memstore flush size to something like 512MB?
>>  Maybe you're blocked on flushes.  40,000 (4,000/server) is pretty slow for
>> a disabled WAL, I think, especially with a batch size of 10.  If you increase
>> write batch size to 1000 how much does your write throughput increase?
>>
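
(Side note for anyone reproducing this: bumping the flush size as suggested
would be something like the following in hbase-site.xml; 512 MB shown, and we
haven't yet verified whether it helps.)

  <property>
    <name>hbase.hregion.memstore.flush.size</name>
    <value>536870912</value>
  </property>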
>>
>> On Fri, Mar 23, 2012 at 3:48 AM, Juhani Connolly <juhanic@gmail.com>
>> wrote:
>>
>> > Also, the latency on requests is extremely long. If we group them into
>> > sets of 10 puts (128-256 bytes each) before flushing the client table
>> > buffer, latency is over 1 second.
>> >
>> > We get entries like this in our logs:
>> > 22:17:51,010 WARN org.apache.hadoop.ipc.HBaseServer: (responseTooSlow):
>> > {"processingtimems":16692,"call":"multi(org.apache.hadoop.hbase.client.MultiAction@65312e3b),
>> > rpc version=1, client version=29, methodsFingerPrint=54742778",
>> > "client":"10.172.109.3:42725","starttimems":1332335854317,
>> > "queuetimems":6387,"class":"HRegionServer","responsesize":0,"method":"multi"}
>> >
>> > Any suggestions as to where we should be digging?
>> >
>> > On Fri, Mar 23, 2012 at 4:40 PM, Juhani Connolly <juhanic@gmail.com>
>> > wrote:
>> > > Status update:
>> > >
>> > > - We moved to CDH 4b1, so HBase 0.92 and HDFS 0.23 (until now we were
>> > > using the 0.20.2 series)
>> > > - Did the tests now with 256/512 regions; the numbers do appear to
>> > > scale, which is good.
>> > >
>> > > BUT, our write throughput has gone down the drain. If we disable WAL
>> > > writes, we still get nearly 40,000 a second, but with it on, we're
>> > > lucky to get more than 12,000. Before, we were getting as high as
>> > > 70,000 when grouping puts together. I have set up log collection, and
>> > > am not finding anything unusual in the logs.
>> > >
>> > > Mikael: One of the tests is the YCSB one, where we just let it choose
>> > > the size. Our own custom test has a configurable size, but we have
>> > > been testing with entries of 128-256 bytes, as this is what we expect
>> > > in our application. What exactly should we be looking at with the
>> > > storefiles?
>> > >
>> > > On Wed, Mar 21, 2012 at 2:29 PM, Mikael Sitruk <mikael.sitruk@gmail.com> wrote:
>> > >> Juhani,
>> > >> Can you look at the storefiles and tell how they behave during the test?
>> > >> What is the size of the data you insert/update?
>> > >> Mikael
>> > >> On Mar 20, 2012 8:10 PM, "Juhani Connolly" <juhanic@gmail.com> wrote:
>> > >>
>> > >>> Hi Matt,
>> > >>>
>> > >>> This is something we haven't tested much; we were always running with
>> > >>> about 32 regions, which gave enough coverage for an even spread over
>> > >>> all machines.
>> > >>> I will run our tests with enough regions per server to cover all cores
>> > >>> and get back to the ML.
>> > >>>
>> > >>> On Tue, Mar 20, 2012 at 1:55 AM, Matt Corgan <mcorgan@hotpads.com> wrote:
>> > >>> > I'd be curious to see what happens if you split the table into 1
>> > >>> > region per CPU core, so 24 cores * 11 servers = 264 regions.  Each
>> > >>> > region has 1 memstore which is a ConcurrentSkipListMap, and you're
>> > >>> > currently hitting each CSLM with 8 cores which might be too
>> > >>> > contentious.  Normally in production you would want multiple
>> > >>> > memstores per CPU core.
>> > >>> >
>> > >>> >
>> > >>> > On Mon, Mar 19, 2012 at 5:31 AM, Juhani Connolly <juhanic@gmail.com> wrote:
>> > >>> >
>> > >>> >> Actually we did try running off two machines both running our own
>> > >>> >> tests in parallel. Unfortunately the results were a split that
>> > >>> >> results in the same total throughput. We also did the same thing
>> > >>> >> with iperf running from each machine to another machine, indicating
>> > >>> >> 800Mb additional throughput between each pair of machines.
>> > >>> >> However we didn't try these tests very thoroughly so I will revisit
>> > >>> >> them as soon as I get back to the office, thanks.
>> > >>> >>
>> > >>> >> On Mon, Mar 19, 2012 at 9:21 PM, Christian Schäfer <syrious3000@yahoo.de> wrote:
>> > >>> >> > Referring to my experience, I expect the client to be the
>> > >>> >> > bottleneck, too.
>> > >>> >> >
>> > >>> >> > So try to increase the count of client machines (not client
>> > >>> >> > threads), each with its own unshared network interface.
>> > >>> >> >
>> > >>> >> > In my case I could double write throughput by doubling the
>> > >>> >> > client machine count, with a much smaller system than yours
>> > >>> >> > (5 machines, 4 GB RAM each).
>> > >>> >> >
>> > >>> >> > Good luck
>> > >>> >> > Chris
>> > >>> >> >
>> > >>> >> >
>> > >>> >> >
>> > >>> >> > ________________________________
>> > >>> >> >  From: Juhani Connolly <juhanic@gmail.com>
>> > >>> >> > To: user@hbase.apache.org
>> > >>> >> > Sent: Monday, 19 March 2012, 13:02
>> > >>> >> > Subject: Re: 0.92 and Read/writes not scaling
>> > >>> >> >
>> > >>> >> > I was concerned that may be the case too, which is why we ran
>> > >>> >> > the YCSB tests in addition to our application-specific and
>> > >>> >> > general performance tests. Checking profiles of the execution
>> > >>> >> > just showed the vast majority of time spent waiting for
>> > >>> >> > responses. These were all run with 400 threads (though we tried
>> > >>> >> > more/less just in case).
>> > >>> >> > 2012/03/19 20:57 "Mingjian Deng" <koven2049@gmail.com>:
>> > >>> >> >
>> > >>> >> >> @Juhani:
>> > >>> >> >> How many clients did you test? Maybe the bottleneck was the client?
>> > >>> >> >>
>> > >>> >> >> 2012/3/19 Ramkrishna.S.Vasudevan <ramkrishna.vasudevan@huawei.com>
>> > >>> >> >>
>> > >>> >> >> > Hi Juhani
>> > >>> >> >> >
>> > >>> >> >> > Can you tell us more about how the regions are balanced?
>> > >>> >> >> > Are you overloading only a specific region server?
>> > >>> >> >> >
>> > >>> >> >> > Regards
>> > >>> >> >> > Ram
>> > >>> >> >> >
>> > >>> >> >> > > -----Original Message-----
>> > >>> >> >> > > From: Juhani Connolly [mailto:juhanic@gmail.com]
>> > >>> >> >> > > Sent: Monday, March 19, 2012 4:11 PM
>> > >>> >> >> > > To: user@hbase.apache.org
>> > >>> >> >> > > Subject: 0.92 and Read/writes not scaling
>> > >>> >> >> > >
>> > >>> >> >> > > Hi,
>> > >>> >> >> > >
>> > >>> >> >> > > We're running into a brick wall where our throughput
>> > >>> >> >> > > numbers will not scale as we increase server counts, both
>> > >>> >> >> > > using custom in-house tests and YCSB.
>> > >>> >> >> > >
>> > >>> >> >> > > We're using HBase 0.92 on Hadoop 0.20.2 (we also experienced
>> > >>> >> >> > > the same issues using 0.90 before switching our testing to
>> > >>> >> >> > > this version).
>> > >>> >> >> > >
>> > >>> >> >> > > Our cluster consists of:
>> > >>> >> >> > > - Namenode and HMaster on separate servers, 24 cores, 64 GB
>> > >>> >> >> > > - up to 11 datanodes/regionservers: 24 cores, 64 GB, 4 * 1 TB
>> > >>> >> >> > >   disks (hoping to get this changed)
>> > >>> >> >> > >
>> > >>> >> >> > > We have adjusted our GC settings and MSLAB:
>> > >>> >> >> > >
>> > >>> >> >> > >   <property>
>> > >>> >> >> > >     <name>hbase.hregion.memstore.mslab.enabled</name>
>> > >>> >> >> > >     <value>true</value>
>> > >>> >> >> > >   </property>
>> > >>> >> >> > >
>> > >>> >> >> > >   <property>
>> > >>> >> >> > >     <name>hbase.hregion.memstore.mslab.chunksize</name>
>> > >>> >> >> > >     <value>2097152</value>
>> > >>> >> >> > >   </property>
>> > >>> >> >> > >
>> > >>> >> >> > >   <property>
>> > >>> >> >> > >     <name>hbase.hregion.memstore.mslab.max.allocation</name>
>> > >>> >> >> > >     <value>1024768</value>
>> > >>> >> >> > >   </property>
>> > >>> >> >> > >
>> > >>> >> >> > > hdfs xceivers is set to 8192
>> > >>> >> >> > >
>> > >>> >> >> > > We've experimented with a variety of handler counts for the
>> > >>> >> >> > > namenode, datanodes and regionservers, with no change in
>> > >>> >> >> > > throughput.
>> > >>> >> >> > >
>> > >>> >> >> > > For testing with YCSB, we do the following each time (with
>> > >>> >> >> > > nothing else using the cluster):
>> > >>> >> >> > > - truncate the test table
>> > >>> >> >> > > - add a small amount of data, then split the table into 32
>> > >>> >> >> > >   regions and call balancer from the shell
>> > >>> >> >> > > - load 10m rows
>> > >>> >> >> > > - do a 1:2:7 insert:update:read test with 10 million rows (64k/sec)
>> > >>> >> >> > > - do a 5:5 insert:update test with 10 million rows (23k/sec)
>> > >>> >> >> > > - do a pure read test with 10 million rows (75k/sec)
>> > >>> >> >> > >
>> > >>> >> >> > > We have observed Ganglia, iostat -d -x, iptraf, top, dstat
>> > >>> >> >> > > and a variety of other diagnostic tools, and
>> > >>> >> >> > > network/IO/CPU/memory bottlenecks seem highly unlikely as
>> > >>> >> >> > > none of them are ever seriously taxed. This leaves me to
>> > >>> >> >> > > assume this is some kind of locking issue? Delaying WAL
>> > >>> >> >> > > flushes gives a small throughput bump but it doesn't scale.
>> > >>> >> >> > >
>> > >>> >> >> > > There also don't seem to be many figures around to compare
>> > >>> >> >> > > ours to. We can get our throughput numbers higher with
>> > >>> >> >> > > tricks like not writing the WAL or delaying flushes, or
>> > >>> >> >> > > batching requests, but nothing seems to scale with
>> > >>> >> >> > > additional slaves.
>> > >>> >> >> > > Could anyone provide guidance as to what may be preventing
>> > >>> >> >> > > throughput figures from scaling as we increase our slave
>> > >>> >> >> > > count?
>> > >>> >> >> >
>> > >>> >> >> >
>> > >>> >> >>
>> > >>> >>
>> > >>>
>> >
>>
>
>
>
> --
> Mikael.S
