hbase-user mailing list archives

From Andrew Purtell <apurt...@apache.org>
Subject Re: Performance oddity between AWS instance sizes
Date Thu, 18 Sep 2014 22:02:01 GMT
A 1 GB heap is nowhere near enough if you're trying to test something
real (or approximate it with YCSB). Try 4 or 8 GB, or anything up to
31 GB, depending on the use case. At >= 32 GB you give up compressed
OOPs and may run into GC issues.
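
For reference, a minimal sketch of what that could look like in
hbase-env.sh. The 8192 figure is only an example value, and I'm assuming
the 0.98 behavior where HBASE_HEAPSIZE is interpreted in MB:

```shell
# hbase-env.sh (sketch; the value below is an example, not a recommendation)
# Assumption: in 0.98 this number is read as megabytes and the default is 1000.
# Stay below 32 GB so the JVM can keep using compressed OOPs.
export HBASE_HEAPSIZE=8192
```

Restart the region servers after changing it. If you want to confirm
what the JVM does at a given size, something like
`java -Xmx31g -XX:+PrintFlagsFinal -version | grep UseCompressedOops`
should show whether compressed OOPs are still enabled.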

Also, I recently redid the HBase YCSB client in a modern way for >=
0.98. See https://github.com/apurtell/YCSB/tree/new_hbase_client . IMHO
it behaves in a more useful fashion than the previous client for what
YCSB is intended to do, but it might need some tuning (I haven't tried
it on a cluster of significant size). One difference you should see is
that we won't back up for 30-60 seconds after a bunch of threads flush
fat 12+ MB write buffers.
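
If you want to compare runs, one rough way to spot those pauses is to
scan the YCSB status output for stretches of "0 current ops/sec". A
minimal sketch, assuming the status line format quoted below; the
function name find_stalls is made up for illustration and is not part of
YCSB:

```shell
# Print each stretch of consecutive zero-throughput status lines.
# Reads YCSB status output on stdin.
find_stalls() {
  awk '
    # Only look at status lines that report a current throughput;
    # wrapped latency lines and the initial "0 sec" line are skipped.
    $2 == "sec:" && /current ops\/sec/ {
      t = $1 + 0        # elapsed seconds
      rate = $5 + 0     # current ops/sec
      if (rate == 0) {
        if (!in_stall) { in_stall = 1; start = t }
        last = t
      } else if (in_stall) {
        printf "stall from %d to %d sec\n", start, last
        in_stall = 0
      }
    }
    END {
      if (in_stall)
        printf "stall from %d to %d sec (still stalled at end of input)\n", start, last
    }
  '
}
```

Feed it the saved status output, e.g. `find_stalls < ycsb.log` (the file
name is just a placeholder). It only looks at the client side, so it
tells you when throughput dropped to zero, not why.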

On Thu, Sep 18, 2014 at 2:31 PM, Josh Williams <jwilliams@endpoint.com> wrote:
> Ted,
>
> Stack trace, that's definitely a good idea.  Here's one jstack snapshot
> from the region server while there's no apparent activity going on:
> https://gist.github.com/joshwilliams/4950c1d92382ea7f3160
>
> If it's helpful, this is the YCSB side of the equation right around the
> same time:
> https://gist.github.com/joshwilliams/6fa3623088af9d1446a3
>
>
> And Gary,
>
> As far as the memory configuration, that's a good question.  Looks like
> HBASE_HEAPSIZE isn't set, which I now see has a default of 1GB.  There
> isn't any swap configured, and 12G of the memory on the instance is
> going to file cache, so there's definitely room to spare.
>
> Maybe it'd help if I gave it more room by setting HBASE_HEAPSIZE.
> Couldn't hurt to try that now...
>
> What's strange is running on m3.xlarge, which also has 15G of RAM but
> fewer CPU cores, it runs fine.
>
> Thanks to you both for the insight!
>
> -- Josh
>
>
>
> On Thu, 2014-09-18 at 11:42 -0700, Gary Helmling wrote:
>> What do you have HBASE_HEAPSIZE set to in hbase-env.sh?  Is it
>> possible that you're overcommitting memory and the instance is
>> swapping?  Just a shot in the dark, but I see that the m3.2xlarge
>> instance has 30G of memory vs. 15G for c3.2xlarge.
>>
>> On Wed, Sep 17, 2014 at 3:28 PM, Ted Yu <yuzhihong@gmail.com> wrote:
>> > bq. there's almost no activity on either side
>> >
>> > During this period, can you capture stack trace for the region server and
>> > pastebin the stack ?
>> >
>> > Cheers
>> >
>> > On Wed, Sep 17, 2014 at 3:21 PM, Josh Williams <jwilliams@endpoint.com>
>> > wrote:
>> >
>> >> Hi, everyone.  Here's a strange one, at least to me.
>> >>
>> >> I'm doing some performance profiling, and as a rudimentary test I've
>> >> been using YCSB to drive HBase (originally 0.98.3, recently updated to
>> >> 0.98.6.)  The problem happens on a few different instance sizes, but
>> >> this is probably the closest comparison...
>> >>
>> >> On m3.2xlarge instances, works as expected.
>> >> On c3.2xlarge instances, HBase barely responds at all during workloads
>> >> that involve read activity, falling silent for ~62 second intervals,
>> >> with the YCSB throughput output resembling:
>> >>
>> >>  0 sec: 0 operations;
>> >>  2 sec: 918 operations; 459 current ops/sec; [UPDATE
>> >> AverageLatency(us)=1252778.39] [READ AverageLatency(us)=1034496.26]
>> >>  4 sec: 918 operations; 0 current ops/sec;
>> >>  6 sec: 918 operations; 0 current ops/sec;
>> >> <snip>
>> >>  62 sec: 918 operations; 0 current ops/sec;
>> >>  64 sec: 5302 operations; 2192 current ops/sec; [UPDATE
>> >> AverageLatency(us)=7715321.77] [READ AverageLatency(us)=7117905.56]
>> >>  66 sec: 5302 operations; 0 current ops/sec;
>> >>  68 sec: 5302 operations; 0 current ops/sec;
>> >> (And so on...)
>> >>
>> >> While that happens there's almost no activity on either side: the CPUs
>> >> and disks are idle, with no iowait at all.
>> >>
>> >> There isn't much that jumps out at me when digging through the Hadoop
>> >> and HBase logs, except that those 62-second intervals are often (but
>> >> not always) associated with ClosedChannelExceptions in the regionserver
>> >> logs.  But I believe that's just HBase finding that a TCP connection it
>> >> wants to reply on had been closed.
>> >>
>> >> As far as I've seen this happens every time on this or any of the larger
>> >> c3 class of instances, surprisingly.  The m3 instance class sizes all
>> >> seem to work fine.  These are built with a custom AMI that has HBase and
>> >> all installed, and run via a script, so the different instance type
>> >> should be the only difference between them.
>> >>
>> >> Anyone seen anything like this?  Any pointers as to what I could look at
>> >> to help diagnose this odd problem?  Could there be something I'm
>> >> overlooking in the logs?
>> >>
>> >> Thanks!
>> >>
>> >> -- Josh
>> >>
>> >>
>> >>
>
>



-- 
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet
Hein (via Tom White)
