hbase-user mailing list archives

From Juhani Connolly <juha...@gmail.com>
Subject Re: 0.92 and Read/writes not scaling
Date Tue, 03 Apr 2012 03:02:11 GMT
Jon,

we had a fair few long pauses. Our test tool reported request latencies,
and a lot of requests took much longer than they should have.
Unfortunately we didn't hold onto our logs from the PerformanceEvaluation runs.

Also, I would note that PerformanceEvaluation internally disables
autoFlush, so it does not run into the issues I have described. I
would recommend running some code with autoFlush set to true to
reproduce this problem.
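
For reference, a rough sketch of the kind of client I mean, against the
0.92-era API (the table, family and qualifier names are just placeholders):

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.client.Put;
  import org.apache.hadoop.hbase.util.Bytes;

  public class AutoFlushWriteTest {
    public static void main(String[] args) throws Exception {
      Configuration conf = HBaseConfiguration.create();
      // Placeholder table with a family "f"; any existing test table works.
      HTable table = new HTable(conf, "perf_test");
      // No client-side grouping: every put() goes out as its own RPC.
      table.setAutoFlush(true);
      for (int i = 0; i < 100000; i++) {
        Put put = new Put(Bytes.toBytes(String.format("row-%08d", i)));
        put.add(Bytes.toBytes("f"), Bytes.toBytes("q"), Bytes.toBytes("value-" + i));
        table.put(put);  // sent immediately because autoFlush is on
      }
      table.close();
    }
  }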

We've moved our environment back to 0.20.2 as we start testing ahead
of production use, so unfortunately we can't run any more tests on the
previous setup, sorry :/

On Tue, Apr 3, 2012 at 9:21 AM, Jonathan Hsieh <jon@cloudera.com> wrote:
> Juhani,
>
> Have you looked at any of the logs from your perf runs?  Can you try
> running HBase's PerformanceEvaluation with debug logging on?  I'd like
> to know if what I'm seeing is the same as what you're seeing.
>
> I've started running some of these and have encountered what seem to be
> networking code issues (SocketTimeoutExceptions, a bunch of delayedAcks in
> Ganglia, and a 4x-5x degradation in writes from 0.90 runs to 0.92 runs).
>
> == cmd lines:
> hbase org.apache.hadoop.hbase.PerformanceEvaluation sequentialWrite 1
> hbase org.apache.hadoop.hbase.PerformanceEvaluation randomWrite 1
>
> == in log4j.properties
> log4j.logger.org.apache.hadoop.hbase=DEBUG
>
>
> Jon.
>
>
> On Thu, Mar 29, 2012 at 12:05 AM, Juhani Connolly <juhanic@gmail.com> wrote:
>
>> On Thu, Mar 29, 2012 at 1:10 PM, Stack <stack@duboce.net> wrote:
>> > On Wed, Mar 28, 2012 at 7:36 PM, Juhani Connolly
>> > <juhani_connolly@cyberagent.co.jp> wrote:
>> >> Since we haven't heard anything on expected throughput we're
>> >> downgrading our hdfs back to 0.20.2, I'd be curious to hear how
>> >> other people do with 0.23 and the throughput they're getting.
>> >>
>> >
>> > We don't have much experience running on 0.23, I think it's fair to
>> > say.  It works, but not much more than that can be said.  The sync
>> > code path is different in 0.23 than in 0.20.2 and has had less
>> > scrutiny (when you say 0.20.2, do you mean CDH?  Which CDH?).  I
>> > think it's good to go back.
>>
>> Thanks for the info on 0.23. I suspect that the change in sync you
>> mentioned may well have something to do with this, since decreasing
>> the frequency of appends through the use of a moderately sized
>> writeBuffer at the client end pays huge dividends (as, of course, does
>> removing the appends altogether by disabling WAL writes). Writes that
>> aren't grouped in some way (whether by batched puts, delayed client
>> flushing, or delayed WAL flushing) seem to suffer pretty badly in high
>> volumes under 0.23. We'll be moving back to 0.20.2 as it seems to be
>> much better tested and stressed, likely to the CDH distro (3u3).
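>>
>> For concreteness, roughly the kind of grouping I mean, sketched against
>> the 0.92-era client API (the table/family names and the 2MB buffer size
>> are just placeholders):
>>
>>   import org.apache.hadoop.conf.Configuration;
>>   import org.apache.hadoop.hbase.HBaseConfiguration;
>>   import org.apache.hadoop.hbase.client.HTable;
>>   import org.apache.hadoop.hbase.client.Put;
>>   import org.apache.hadoop.hbase.util.Bytes;
>>
>>   public class BufferedWriteTest {
>>     public static void main(String[] args) throws Exception {
>>       Configuration conf = HBaseConfiguration.create();
>>       HTable table = new HTable(conf, "perf_test");   // placeholder table
>>       table.setAutoFlush(false);                      // buffer puts client-side
>>       table.setWriteBufferSize(2L * 1024 * 1024);     // e.g. a 2MB write buffer
>>       for (int i = 0; i < 100000; i++) {
>>         Put put = new Put(Bytes.toBytes(String.format("row-%08d", i)));
>>         put.add(Bytes.toBytes("f"), Bytes.toBytes("q"), Bytes.toBytes("value-" + i));
>>         put.setWriteToWAL(false);   // optionally drop the WAL append as well
>>         table.put(put);             // only queued; sent when the buffer fills
>>       }
>>       table.flushCommits();         // final grouped flush to the region servers
>>       table.close();
>>     }
>>   }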
>>
>> >
>> > Regarding numbers, it's hard to compare workloads, but if it helps:
>> > looking at our frontend now, it's relatively idle, doing between
>> > 100-500k hits on 30 machines that are smaller than yours (less
>> > memory), 10k regions, with a workload that is mostly increments
>> > (read-mostly-from-block-cache/modify/write).
>> >
>>
>> Thanks... It's nice to have a frame of reference to compare against.
>>
>> > Yes, the errors are relatively few but poke around more if you can.
>> > Why are there errors at all?
>> > St.Ack
>>
>> I'm not sure. As has been said, they're likely unrelated; I'm going
>> to try and figure it out.
>>
>> Thanks,
>>  Juhani
>>
>
>
>
> --
> // Jonathan Hsieh (shay)
> // Software Engineer, Cloudera
> // jon@cloudera.com
