hbase-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Gary Helmling <ghelml...@gmail.com>
Subject Re: Performance oddity between AWS instance sizes
Date Thu, 18 Sep 2014 18:42:31 GMT
What do you have HBASE_HEAPSIZE set to in hbase-env.sh?  Is it
possible that you're overcommitting memory and the instance is
swapping?  Just a shot in the dark, but I see that the m3.2xlarge
instance has 30G of memory vs. 15G for c3.2xlarge.

On Wed, Sep 17, 2014 at 3:28 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> bq. there's almost no activity on either side
>
> During this period, can you capture stack trace for the region server and
> pastebin the stack ?
>
> Cheers
>
> On Wed, Sep 17, 2014 at 3:21 PM, Josh Williams <jwilliams@endpoint.com>
> wrote:
>
>> Hi, everyone.  Here's a strange one, at least to me.
>>
>> I'm doing some performance profiling, and as a rudimentary test I've
>> been using YCSB to drive HBase (originally 0.98.3, recently updated to
>> 0.98.6.)  The problem happens on a few different instance sizes, but
>> this is probably the closest comparison...
>>
>> On m3.2xlarge instances, works as expected.
>> On c3.2xlarge instances, HBase barely responds at all during workloads
>> that involve read activity, falling silent for ~62 second intervals,
>> with the YCSB throughput output resembling:
>>
>>  0 sec: 0 operations;
>>  2 sec: 918 operations; 459 current ops/sec; [UPDATE
>> AverageLatency(us)=1252778.39] [READ AverageLatency(us)=1034496.26]
>>  4 sec: 918 operations; 0 current ops/sec;
>>  6 sec: 918 operations; 0 current ops/sec;
>> <snip>
>>  62 sec: 918 operations; 0 current ops/sec;
>>  64 sec: 5302 operations; 2192 current ops/sec; [UPDATE
>> AverageLatency(us)=7715321.77] [READ AverageLatency(us)=7117905.56]
>>  66 sec: 5302 operations; 0 current ops/sec;
>>  68 sec: 5302 operations; 0 current ops/sec;
>> (And so on...)
>>
>> While that happens there's almost no activity on either side, the CPU's
>> and disks are idle, no iowait at all.
>>
>> There isn't much that jumps out at me when digging through the Hadoop
>> and HBase logs, except that those 62-second intervals are often (but
>> note always) associated with ClosedChannelExceptions in the regionserver
>> logs.  But I believe that's just HBase finding that a TCP connection it
>> wants to reply on had been closed.
>>
>> As far as I've seen this happens every time on this or any of the larger
>> c3 class of instances, surprisingly.  The m3 instance class sizes all
>> seem to work fine.  These are built with a custom AMI that has HBase and
>> all installed, and run via a script, so the different instance type
>> should be the only difference between them.
>>
>> Anyone seen anything like this?  Any pointers as to what I could look at
>> to help diagnose this odd problem?  Could there be something I'm
>> overlooking in the logs?
>>
>> Thanks!
>>
>> -- Josh
>>
>>
>>

Mime
View raw message