hbase-user mailing list archives

From Stack <st...@duboce.net>
Subject Re: Avoiding OutOfMemory Java heap space in region servers
Date Tue, 10 Aug 2010 22:40:54 GMT
OOME may manifest in one place but be caused by some other behavior
altogether.  It's an Error: you can't tell for sure what damage it's
done to the running process (though, in your stack trace, an OOME
during the array copy could well be caused by very large cells).
Rather than let the damaged server continue, HBase is conservative and
shuts itself down to minimize possible data loss whenever it gets an
OOME (it keeps aside an emergency memory supply that it releases on
OOME so the shutdown can 'complete' successfully).
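For reference, the "release a reserve on OOME" idea looks roughly like the sketch below. This is a minimal illustration of the pattern only; the class, field, and reserve size are made up here and are not HBase's actual code:

```java
// Sketch of the emergency-reserve pattern: hold a block of memory that is
// dropped when an OutOfMemoryError is caught, so the shutdown path has
// headroom to allocate (log messages, cleanup objects, etc.).
public class OomeReserveDemo {
    // Hypothetical reserve size; the real value would be configurable.
    static byte[] reserve = new byte[64 * 1024];

    static String runTask() {
        try {
            // Stand-in for real work that exhausts the heap.
            throw new OutOfMemoryError("simulated allocation failure");
        } catch (OutOfMemoryError e) {
            reserve = null; // release the reserve so shutdown code can allocate
            return "aborting: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(runTask()); // prints "aborting: simulated allocation failure"
    }
}
```

The point of catching the Error at all is only to run an orderly abort; the process still goes down, by design.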

Are you doing large multiputs?  Do you have lots of handlers running?
If the multiputs are held up because things are running slow, the
memory held out on the handlers could throw you over, especially if
your heap is small.
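If large multiputs do turn out to be the cause, one client-side mitigation is to cap the batch size before sending, so each RPC carries less data for a server handler to buffer at once. A generic chunking sketch (deliberately not tied to any particular HBase client API):

```java
import java.util.ArrayList;
import java.util.List;

public class BatchSplitDemo {
    // Split a large batch into fixed-size chunks; each chunk would be
    // submitted as its own (smaller) multiput.
    static <T> List<List<T>> chunk(List<T> items, int size) {
        List<List<T>> out = new ArrayList<>();
        for (int i = 0; i < items.size(); i += size) {
            out.add(new ArrayList<>(items.subList(i, Math.min(i + size, items.size()))));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> puts = new ArrayList<>();
        for (int i = 0; i < 10; i++) puts.add(i); // stand-ins for Put objects
        System.out.println(chunk(puts, 4).size()); // prints 3
    }
}
```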

What size heap are you running with?
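(If you need to raise it, the region server heap is set in conf/hbase-env.sh; the 4 GB value below is just an example, not a recommendation for your workload:)

```shell
# conf/hbase-env.sh -- heap size in MB for HBase daemons
export HBASE_HEAPSIZE=4096
```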

St.Ack



On Tue, Aug 10, 2010 at 3:26 PM, Stuart Smith <stu24mail@yahoo.com> wrote:
> Hello,
>
>   I'm seeing errors like so:
>
> 2010-08-10 12:58:38,938 DEBUG org.apache.hadoop.hbase.client.HConnectionManager$ClientZKWatcher: Got ZooKeeper event, state: Disconnected, type: None, path: null
> 2010-08-10 12:58:38,939 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Got ZooKeeper event, state: Disconnected, type: None, path: null
>
> 2010-08-10 12:58:38,941 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: OutOfMemoryError, aborting.
> java.lang.OutOfMemoryError: Java heap space
>        at java.util.Arrays.copyOf(Arrays.java:2786)
>        at java.io.ByteArrayOutputStream.toByteArray(ByteArrayOutputStream.java:133)
>        at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:942)
>
> Then I see:
>
> 2010-08-10 12:58:39,408 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 79 on 60020, call close(-2793534857581898004) from 192.168.195.88:41233: error: java.io.IOException: Server not running, aborting
> java.io.IOException: Server not running, aborting
>
> And finally:
>
> 2010-08-10 12:58:39,514 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Stop requested, clearing toDo despite exception
> 2010-08-10 12:58:39,515 INFO org.apache.hadoop.ipc.HBaseServer: Stopping server on 60020
> 2010-08-10 12:58:39,515 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 1 on 60020: exiting
>
> And the server begins to shut down.
>
> Now, it's very likely these are due to retrieving unusually large cells - in fact, that's my current assumption. I'm seeing M/R tasks fail intermittently with the same issue on the read of cell data.
>
> My question is why does this bring the whole regionserver down? I would think the regionserver would just fail the Get() and move on...
>
> Am I misdiagnosing the error? Or is it the case that if I want different behavior, I should pony up with some code? :)
>
> Take care,
>  -stu
