hbase-dev mailing list archives

From "ryan rawson (JIRA)" <j...@apache.org>
Subject [jira] Created: (HBASE-1384) when a regionserver loses its ZK connection, it becomes permanently hosed
Date Thu, 07 May 2009 04:24:30 GMT
when a regionserver loses its ZK connection, it becomes permanently hosed
-------------------------------------------------------------------------

                 Key: HBASE-1384
                 URL: https://issues.apache.org/jira/browse/HBASE-1384
             Project: Hadoop HBase
          Issue Type: Bug
    Affects Versions: 0.20.0
            Reporter: ryan rawson
            Assignee: Nitay Joffe
             Fix For: 0.20.0


Some regionservers lost their ZK connection (timed out), and then this happened:

2009-05-06 21:09:31,558 WARN org.apache.zookeeper.ClientCnxn: Exception closing session 0x1210ac3ab1400e1 to sun.nio.ch.SelectionKeyImpl@736921fd
java.io.IOException: TIMED OUT
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:837)
2009-05-06 21:09:31,558 WARN org.apache.hadoop.hbase.regionserver.HRegionServer: Processing message (Retry: 0)
org.apache.hadoop.hbase.Leases$LeaseStillHeldException
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:539)
        at org.apache.hadoop.hbase.RemoteExceptionHandler.decodeRemoteException(RemoteExceptionHandler.java:94)
        at org.apache.hadoop.hbase.RemoteExceptionHandler.checkThrowable(RemoteExceptionHandler.java:48)
        at org.apache.hadoop.hbase.RemoteExceptionHandler.checkIOException(RemoteExceptionHandler.java:66)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:496)
        at java.lang.Thread.run(Thread.java:717)

At this point, the regionserver has been hosed for over an hour, and shows no signs of returning.

Of my 19 regionservers, 15 are left, and when I ls /hbase/rs I see only 15 ephemeral nodes.

But the master isn't giving it up and refuses to let the regionservers rejoin the cluster.


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

