hbase-dev mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] Created: (HBASE-706) On OOME, regionserver sticks around and doesn't go down with cluster
Date Wed, 25 Jun 2008 17:34:44 GMT
On OOME, regionserver sticks around and doesn't go down with cluster

                 Key: HBASE-706
                 URL: https://issues.apache.org/jira/browse/HBASE-706
             Project: Hadoop HBase
          Issue Type: Bug
            Reporter: stack
             Fix For: 0.2.0

On John Gray's cluster, an errant, massive store file caused an OOME. Shutting down the cluster
left this regionserver running. A thread dump also failed with an OOME. Here is the last thing in the log:

2008-06-25 03:21:55,111 INFO org.apache.hadoop.hbase.HRegionServer: worker thread exiting
2008-06-25 03:24:26,923 FATAL org.apache.hadoop.hbase.HRegionServer: Set stop flag in regionserver/0:0:0:0:0:0:0:0:60020.cacheFlusher
java.lang.OutOfMemoryError: Java heap space
        at java.util.HashMap.<init>(HashMap.java:226)
        at java.util.HashSet.<init>(HashSet.java:103)
        at org.apache.hadoop.hbase.HRegionServer.getRegionsToCheck(HRegionServer.java:1789)
        at org.apache.hadoop.hbase.HRegionServer$Flusher.enqueueOptionalFlushRegions(HRegionServer.java:479)
        at org.apache.hadoop.hbase.HRegionServer$Flusher.run(HRegionServer.java:385)
2008-06-25 03:24:26,923 INFO org.apache.hadoop.ipc.Server: IPC Server handler 2 on 60020,
call batchUpdate(items,,1214272763124, 9223372036854775807, org.apache.hadoop.hbase.io.BatchUpdate@67d6b1e2)
from error: java.io.IOException: Server not running
java.io.IOException: Server not running
        at org.apache.hadoop.hbase.HRegionServer.checkOpen(HRegionServer.java:1758)
        at org.apache.hadoop.hbase.HRegionServer.batchUpdate(HRegionServer.java:1547)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:616)
        at org.apache.hadoop.hbase.ipc.HbaseRPC$Server.call(HbaseRPC.java:413)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:901)

If we get an OOME just trying to take a thread dump, it would seem we need to start keeping
a small memory reservoir around for emergencies such as this, just so we can shut down cleanly.
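
A minimal Java sketch of the "memory reservoir" idea, assuming a worker thread (such as the cache flusher) can catch the OutOfMemoryError itself. The class and method names (MemoryReservoir, GuardedWorker, runGuarded) are hypothetical and are not part of HBase; the issue does not say how the fix was actually implemented. The reserve is only reclaimed by the next garbage collection after its reference is dropped, and whether it frees enough headroom depends on its size relative to what logging and shutdown need.

```java
import java.util.concurrent.atomic.AtomicReference;

/**
 * Hypothetical sketch: hold a block of heap in reserve so that, after an
 * OutOfMemoryError, dropping it gives the JVM some headroom to log, take a
 * thread dump, and shut down cleanly.
 */
final class MemoryReservoir {
    // The reserved block; null once it has been released.
    private final AtomicReference<byte[]> reserve;

    MemoryReservoir(int bytes) {
        this.reserve = new AtomicReference<>(new byte[bytes]);
    }

    /** True while the emergency block is still held. */
    boolean isHeld() {
        return reserve.get() != null;
    }

    /**
     * Drops the reference so the garbage collector can reclaim the block.
     * Returns true only for the first caller, so the emergency path runs once.
     */
    boolean release() {
        return reserve.getAndSet(null) != null;
    }
}

/** Hypothetical wrapper showing where a worker loop would use the reservoir. */
final class GuardedWorker {
    /**
     * Runs the task. On OutOfMemoryError, releases the reservoir and, if this
     * is the first release, runs the shutdown callback. Returns true only if
     * the task completed normally.
     */
    static boolean runGuarded(Runnable task, MemoryReservoir reservoir, Runnable onFatal) {
        try {
            task.run();
            return true;
        } catch (OutOfMemoryError e) {
            if (reservoir.release()) {
                // Headroom is now available for logging and a clean shutdown.
                onFatal.run();
            }
            return false;
        }
    }
}
```

The design choice here is to release the reserve exactly once (via getAndSet), so concurrent threads hitting OOME at the same time do not each try to drive the shutdown.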

Moving this into 0.2. It seems important to fix if robustness is the name of the game.

