hbase-user mailing list archives

From Josh Williams <jwilli...@endpoint.com>
Subject Re: Performance oddity between AWS instance sizes
Date Thu, 18 Sep 2014 21:31:39 GMT
Ted,

Stack trace, that's definitely a good idea.  Here's one jstack snapshot
from the region server while there's no apparent activity going on:
https://gist.github.com/joshwilliams/4950c1d92382ea7f3160
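(For anyone following along, a capture along these lines works, assuming jps and jstack from the same JDK are on the PATH and the process shows up in jps output as HRegionServer:)

```shell
# Look up the region server's PID from jps output, then dump its stack.
RS_PID=$(jps | awk '/HRegionServer/ {print $1}')
jstack "$RS_PID" > "rs-jstack-$(date +%s).txt"
```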

If it's helpful, this is the YCSB side of the equation right around the
same time:
https://gist.github.com/joshwilliams/6fa3623088af9d1446a3


And Gary,

As far as the memory configuration, that's a good question.  Looks like
HBASE_HEAPSIZE isn't set, which I now see has a default of 1GB.  There
isn't any swap configured, and 12G of the memory on the instance is
going to file cache, so there's definitely room to spare.
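(A quick sanity check that confirms the no-swap situation, Linux-only since it reads /proc/meminfo:)

```shell
# With no swap configured, SwapTotal and SwapFree both report 0 kB.
awk '/^Swap/ {print $1, $2, $3}' /proc/meminfo
```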

Maybe it'd help if I gave it more room by setting HBASE_HEAPSIZE.
Couldn't hurt to try that now...
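(Something like this in conf/hbase-env.sh, I'd guess; 8000 MB is an arbitrary first try that still leaves roughly half the 15G for the OS file cache:)

```shell
# conf/hbase-env.sh -- raise the heap from the 1G default.
# 0.98-era start scripts interpret a bare number as megabytes.
export HBASE_HEAPSIZE=8000
```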

What's strange is that an m3.xlarge, which also has 15G of RAM but
fewer CPU cores, runs fine.

Thanks to you both for the insight!

-- Josh



On Thu, 2014-09-18 at 11:42 -0700, Gary Helmling wrote:
> What do you have HBASE_HEAPSIZE set to in hbase-env.sh?  Is it
> possible that you're overcommitting memory and the instance is
> swapping?  Just a shot in the dark, but I see that the m3.2xlarge
> instance has 30G of memory vs. 15G for c3.2xlarge.
> 
> On Wed, Sep 17, 2014 at 3:28 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> > bq. there's almost no activity on either side
> >
> > During this period, can you capture stack trace for the region server and
> > pastebin the stack ?
> >
> > Cheers
> >
> > On Wed, Sep 17, 2014 at 3:21 PM, Josh Williams <jwilliams@endpoint.com>
> > wrote:
> >
> >> Hi, everyone.  Here's a strange one, at least to me.
> >>
> >> I'm doing some performance profiling, and as a rudimentary test I've
> >> been using YCSB to drive HBase (originally 0.98.3, recently updated to
> >> 0.98.6.)  The problem happens on a few different instance sizes, but
> >> this is probably the closest comparison...
> >>
> >> On m3.2xlarge instances, it works as expected.
> >> On c3.2xlarge instances, HBase barely responds at all during workloads
> >> that involve read activity, falling silent for ~62 second intervals,
> >> with the YCSB throughput output resembling:
> >>
> >>  0 sec: 0 operations;
> >>  2 sec: 918 operations; 459 current ops/sec; [UPDATE
> >> AverageLatency(us)=1252778.39] [READ AverageLatency(us)=1034496.26]
> >>  4 sec: 918 operations; 0 current ops/sec;
> >>  6 sec: 918 operations; 0 current ops/sec;
> >> <snip>
> >>  62 sec: 918 operations; 0 current ops/sec;
> >>  64 sec: 5302 operations; 2192 current ops/sec; [UPDATE
> >> AverageLatency(us)=7715321.77] [READ AverageLatency(us)=7117905.56]
> >>  66 sec: 5302 operations; 0 current ops/sec;
> >>  68 sec: 5302 operations; 0 current ops/sec;
> >> (And so on...)
> >>
> >> While that happens there's almost no activity on either side, the CPU's
> >> and disks are idle, no iowait at all.
> >>
> >> There isn't much that jumps out at me when digging through the Hadoop
> >> and HBase logs, except that those 62-second intervals are often (but
> >> not always) associated with ClosedChannelExceptions in the regionserver
> >> logs.  But I believe that's just HBase finding that a TCP connection it
> >> wants to rely on has been closed.
> >>
> >> As far as I've seen, this happens every time on this or any of the larger
> >> c3-class instances, surprisingly.  The m3 instance class sizes all
> >> seem to work fine.  These are built with a custom AMI that has HBase and
> >> all installed, and run via a script, so the different instance type
> >> should be the only difference between them.
> >>
> >> Anyone seen anything like this?  Any pointers as to what I could look at
> >> to help diagnose this odd problem?  Could there be something I'm
> >> overlooking in the logs?
> >>
> >> Thanks!
> >>
> >> -- Josh
> >>
> >>
> >>


