hbase-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "ryan rawson (JIRA)" <j...@apache.org>
Subject [jira] Created: (HBASE-1384) when a regionserver loses it's ZK connection, it becomes permanently hosed
Date Thu, 07 May 2009 04:24:30 GMT
when a regionserver loses it's ZK connection, it becomes permanently hosed

                 Key: HBASE-1384
                 URL: https://issues.apache.org/jira/browse/HBASE-1384
             Project: Hadoop HBase
          Issue Type: Bug
    Affects Versions: 0.20.0
            Reporter: ryan rawson
            Assignee: Nitay Joffe
             Fix For: 0.20.0

Some regionservers lost their ZK connection (timed out) then this happened:

2009-05-06 21:09:31,558 WARN org.apache.zookeeper.ClientCnxn: Exception closing session 0x1210ac3ab1400e1
to sun.nio.ch.SelectionKeyImpl@736921fd
java.io.IOException: TIMED OUT
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:837)
2009-05-06 21:09:31,558 WARN org.apache.hadoop.hbase.regionserver.HRegionServer: Processing
message (Retry: 0)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:539)
        at org.apache.hadoop.hbase.RemoteExceptionHandler.decodeRemoteException(RemoteExceptionHandler.java:94)
        at org.apache.hadoop.hbase.RemoteExceptionHandler.checkThrowable(RemoteExceptionHandler.java:48)
        at org.apache.hadoop.hbase.RemoteExceptionHandler.checkIOException(RemoteExceptionHandler.java:66)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:496)
        at java.lang.Thread.run(Thread.java:717)

At this point, the regionserver has been hosed for over an hour, and shows no signs of returning.

Of my 19 regionservers, 15 are left, and when i ls /hbase/rs I only see 15 ephermeral nodes.

But the master isn't giving it up and refuses to let the regionservers rejoin the cluster.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.

View raw message