hbase-user mailing list archives

From Jean-Daniel Cryans <jdcry...@apache.org>
Subject Re: REST servers locked up on single RS malfunction.
Date Mon, 25 Apr 2011 20:15:52 GMT
There's a good chance that if the region server started getting slow,
the requests from the REST servers would start piling up in the queues
and finally blow out the memory. You could confirm that by looking at
the GC logs before the OOME.
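Checking the GC logs presumes GC logging was enabled up front. A typical way to do that (an illustrative hbase-env.sh fragment; the log path is a placeholder) is:

```shell
# Illustrative hbase-env.sh fragment -- config sketch, log path is a placeholder.
# These are the standard HotSpot flags of that era for verbose GC logging.
export HBASE_OPTS="$HBASE_OPTS -verbose:gc -XX:+PrintGCDetails \
  -XX:+PrintGCTimeStamps -Xloggc:/path/to/logs/gc-hbase.log"
```

With that in place, a long run of full GCs reclaiming almost nothing right before the OOME would support the queue-pile-up theory.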

Also, when it died it should have dumped an hprof file. If you have that
file (it should be a few GBs), it would be possible to tell what was
using all that space.
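For reference, the JDK of that era ships a tool that can open such a dump (illustrative commands only; the dump file name is a placeholder):

```shell
# Illustrative only -- the dump path/name is a placeholder.
# jhat (bundled with JDK 6) serves a browsable view of the heap dump on
# port 7000; give it plenty of heap since the dump itself is several GB.
jhat -J-Xmx6g /path/to/java_pid12345.hprof
```

Eclipse MAT is another common choice for dumps this large.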

It would be interesting to see what happened in the logs before that,
including the metrics dump; it might give us a clue.

Once we have a better understanding of what happened, we could look
into finding the right solution.

J-D

On Mon, Apr 25, 2011 at 1:04 PM, Jack Levin <magnito@gmail.com> wrote:
> that's a separate cluster, it's barely getting any traffic so I don't
> think the queue would be an issue.   We do however have very large files
> stored (one file per row).  So the question is, if it's a GET that breaks
> things, how can we avoid it?
>
> -Jack
>
> On Mon, Apr 25, 2011 at 10:37 AM, Jean-Daniel Cryans
> <jdcryans@apache.org> wrote:
>> Can't tell what it was because it OOME'd while reading whatever was coming in.
>>
>> Did you bump the number of handlers in that cluster too? Because you
>> might hit what we talked about in this jira:
>> https://issues.apache.org/jira/browse/HBASE-3813
>>
>> "Chatting w/ J-D this morning, he asked if the queues hold 'data'. The
>> queues hold 'Calls'. Calls are the client request. They contain data.
>> Jack had 2500 items queued. If each item to insert was 1MB, that's 2500
>> * 1MB (~2.5 GB) of memory that is outside of our general accounting."
>>
>> So the higher the number of handlers, the more memory could be used by
>> the queues.
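To make the accounting above concrete, here is a back-of-the-envelope sketch using the numbers from the quote (the class name and method are hypothetical, not HBase code; 2500 queued 1 MB calls come to roughly 2.4 GB):

```java
// Back-of-the-envelope sketch of the queue memory math from HBASE-3813.
// QueueMemoryEstimate is a hypothetical name, not an actual HBase class.
public class QueueMemoryEstimate {

    /** Worst case: every queued Call still holds its full request payload. */
    static long worstCaseQueueBytes(int queuedCalls, long payloadBytesPerCall) {
        return (long) queuedCalls * payloadBytesPerCall;
    }

    public static void main(String[] args) {
        long oneMb = 1024L * 1024L;
        // The numbers from the quote: 2500 queued items, 1 MB each.
        long bytes = worstCaseQueueBytes(2500, oneMb);
        System.out.println(bytes / oneMb + " MB queued"); // prints "2500 MB queued"
    }
}
```

Since the queue capacity scales with the handler count, bumping handlers multiplies this worst case, and none of it is counted against the memstore or block-cache limits.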
>>
>> J-D
>>
>> On Mon, Apr 25, 2011 at 10:32 AM, Jack Levin <magnito@gmail.com> wrote:
>>> Stack:
>>>
>>> Exception in thread "pool-1-thread-9" java.lang.OutOfMemoryError: Java
>>> heap space
>>>        at org.apache.hadoop.hbase.ipc.HBaseRPC$Invocation.readFields(HBaseRPC.java:120)
>>>        at org.apache.hadoop.hbase.ipc.HBaseServer$Connection.processData(HBaseServer.java:959)
>>>        at org.apache.hadoop.hbase.ipc.HBaseServer$Connection.readAndProcess(HBaseServer.java:927)
>>>        at org.apache.hadoop.hbase.ipc.HBaseServer$Listener.doRead(HBaseServer.java:503)
>>>        at org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.run(HBaseServer.java:297)
>>>        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>>        at java.lang.Thread.run(Thread.java:619)
>>>
>>> Btw, is this a put or a read?  Perhaps we are crashing on some sort of large read?
>>>
>>> -Jack
>>>
>>> On Thu, Apr 21, 2011 at 12:47 AM, Jack Levin <magnito@gmail.com> wrote:
>>>> Shouldn't the RS just shut down then?  Because it stays half alive and
>>>> none of the puts succeed.  Also, the OOME happened right after a
>>>> flush/compaction/split... so clearly the RS was busy, and it could
>>>> just be a matter of hitting the heap ceiling.
>>>>
>>>> -Jack
>>>>
>>>> On Thu, Apr 21, 2011 at 12:13 AM, Stack <stack@duboce.net> wrote:
>>>>> This looks like a bug.  Elsewhere in the RPC you can register a
>>>>> handler for OOME explicitly, and we have a callback up into the
>>>>> regionserver where we set whether the server aborts or stops depending
>>>>> on the type of OOME we see.  In this case it looks like on OOME we just
>>>>> throw, and then all the executors fill up so no more executors are
>>>>> available to process requests (this is my current assessment -- it
>>>>> could be a different one by morning).
>>>>>
>>>>> The root cause would look to be a big put.  Could that be the case?
>>>>>
>>>>> On the naming, that looks to be the default naming of executor threads
>>>>> done by the hosting executorservice.
>>>>>
>>>>> St.Ack
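The mechanism Stack describes can be sketched roughly as follows (a hypothetical, simplified stand-in for illustration only, not the actual HBase RPC code; the class name, `shouldAbort`, and the handler wiring are all invented here):

```java
// Hypothetical sketch of an OOME-aware uncaught-exception handler -- not
// the real HBase code, just the shape of the mechanism described above.
public class OomeHandlerSketch {

    /** Heap exhaustion is unrecoverable for the process: abort. */
    static boolean shouldAbort(Throwable t) {
        return t instanceof OutOfMemoryError;
    }

    public static void main(String[] args) {
        Thread.setDefaultUncaughtExceptionHandler(new Thread.UncaughtExceptionHandler() {
            public void uncaughtException(Thread thread, Throwable err) {
                if (shouldAbort(err)) {
                    // A real server would invoke its abort/stop hook here instead
                    // of letting the thread die silently while executors fill up.
                    System.err.println("OOME in " + thread.getName() + "; aborting");
                }
            }
        });
        System.out.println(shouldAbort(new OutOfMemoryError("Java heap space"))); // prints "true"
    }
}
```

The bug being discussed is precisely the absence of such a hook on this pool's threads: the OOME is thrown, the worker dies, and the half-alive server keeps accepting connections it can never serve.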
>>>>>
>>>>>
>>>>> On Wed, Apr 20, 2011 at 10:11 PM, Jack Levin <magnito@gmail.com> wrote:
>>>>>> Hello, with HBase 0.89 we see the following: all REST servers get
>>>>>> locked up trying to connect to one of our RS servers.  The error in the
>>>>>> .out file on that region server looks like this:
>>>>>>
>>>>>> Exception in thread "pool-1-thread-3" java.lang.OutOfMemoryError: Java
>>>>>> heap space
>>>>>>        at org.apache.hadoop.hbase.ipc.HBaseRPC$Invocation.readFields(HBaseRPC.java:120)
>>>>>>        at org.apache.hadoop.hbase.ipc.HBaseServer$Connection.processData(HBaseServer.java:959)
>>>>>>        at org.apache.hadoop.hbase.ipc.HBaseServer$Connection.readAndProcess(HBaseServer.java:927)
>>>>>>        at org.apache.hadoop.hbase.ipc.HBaseServer$Listener.doRead(HBaseServer.java:503)
>>>>>>        at org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.run(HBaseServer.java:297)
>>>>>>        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>>>>>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>>>>>        at java.lang.Thread.run(Thread.java:619)
>>>>>>
>>>>>> The question is, how come the region server did not die after this but
>>>>>> just hogged the REST connections?  And what does pool-1-thread-3
>>>>>> actually do?
>>>>>>
>>>>>> -Jack
>>>>>>
>>>>>
>>>>
>>>
>>
>
