hbase-user mailing list archives

From Todd Lipcon <t...@cloudera.com>
Subject Re: 0.92 and Read/writes not scaling
Date Tue, 27 Mar 2012 01:43:37 GMT
Hi Juhani,

I wouldn't have expected CDH4b1 (0.23) to be slower than 0.20 for
writes. They should be around the same speed, or even a little faster
in some cases. That said, I haven't personally run any benchmarks in
several months on this setup. I know our performance/QA team has done
some, so I asked them to take a look; hopefully we'll have some
results soon.

If you can take 10-20 jstacks of the RegionServer and the DN on that
same machine while performing your write workload, that would be
helpful. It's possible we had a regression during some recent
development right before the 4b1 release. If you're feeling
adventurous, you can also try upgrading to CDH4b2 snapshot builds,
which do have a couple of performance improvements/bugfixes that may
help. Drop by #cloudera on IRC and one of us can point you in the
right direction if you're willing to try (though of course the nightly
builds are somewhat volatile and haven't had any QA).
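If it helps, here's a dry-run sketch of that sampling loop. The PIDs below are placeholders (find the real ones with `jps` or `pgrep -f HRegionServer` / `pgrep -f DataNode`); the script only prints the jstack commands into a file so you can review them before running:

```shell
# Dry-run sketch: generate ~15 paired thread-dump commands for the
# RegionServer and DataNode. PIDs are placeholders, not real values.
RS_PID=${RS_PID:-12345}   # RegionServer PID (placeholder)
DN_PID=${DN_PID:-23456}   # DataNode PID (placeholder)
: > jstack-commands.sh
for i in $(seq 1 15); do
  echo "jstack $RS_PID > rs-stack-$i.txt" >> jstack-commands.sh
  echo "jstack $DN_PID > dn-stack-$i.txt" >> jstack-commands.sh
  echo "sleep 2" >> jstack-commands.sh
done
```

Running `sh jstack-commands.sh` on the RegionServer box while the write workload is going gives a time-spaced set of dumps we can diff for stuck or contended threads.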


On Mon, Mar 26, 2012 at 10:08 AM, Juhani Connolly <juhanic@gmail.com> wrote:
> On Tue, Mar 27, 2012 at 1:42 AM, Stack <stack@duboce.net> wrote:
>> On Mon, Mar 26, 2012 at 6:58 AM, Matt Corgan <mcorgan@hotpads.com> wrote:
>>> When you increased regions on your previous test, did it start maxing out
>>> CPU?  What improvement did you see?
>> What Matt asks, what is your cluster doing?  What changes do you see
>> when you say, increase size of your batching or as Mat asks, what is
>> the difference when you went from less to more regions?
> None of our hardware is even near its limit. Ganglia rarely shows a
> single machine over 25% load, and we have verified that io, network, cpu
> and memory all have plenty of breathing room with other tools (top,
> iostat, dstat and others mentioned in the hstack article).
>>> Have you tried increasing the memstore flush size to something like 512MB?
>>>  Maybe you're blocked on flushes.  40,000 (4,000/server) is pretty slow for
>>> a disabled WAL, I think, especially with a batch size of 10.  If you increase
>>> write batch size to 1000 how much does your write throughput increase?
>> The above sounds like something to try -- upping flush sizes.
>> Are you spending your time compacting all the time?  For kicks try
>> disabling compactions when doing your write tests.  Does it make a
>> difference?  What does ganglia show as hot?  Are you network-bound,
>> io-bound, cpu-bound?
>> Thanks,
>> St.Ack
> The compaction and flush times according to ganglia are pretty short
> and insignificant. I've also been watching the rpcs and past events
> from the html control panel which don't seem to be indicative of a
> problem. However, I will try changing the flushes and using bigger
> batches; it might turn up something interesting, thanks.
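For reference, the two knobs suggested above (bigger memstore flushes, compactions off for the duration of the write test) live in hbase-site.xml. A sketch, assuming cluster-wide settings are acceptable (the region servers need a restart to pick them up):

```xml
<!-- Raise the memstore flush threshold from the default to 512 MB -->
<property>
  <name>hbase.hregion.memstore.flush.size</name>
  <value>536870912</value>
</property>
<!-- Disable time-based major compactions for the duration of the test -->
<property>
  <name>hbase.hregion.majorcompaction</name>
  <value>0</value>
</property>
```

Note that setting hbase.hregion.majorcompaction to 0 only stops the periodic major compactions; minor compactions will still trigger as store files accumulate.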

Todd Lipcon
Software Engineer, Cloudera
