hbase-issues mailing list archives

From "stack (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-5666) RegionServer doesn't retry to check if base node is available
Date Mon, 09 Apr 2012 22:49:18 GMT

    [ https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13250269#comment-13250269 ]

stack commented on HBASE-5666:

Patch looks good.

The patch logs:

+            LOG.warn(zkw.prefix("Unable to set watcher on znode (" + znode + ")"), keeperEx);

... but the method says it's checkExists w/o setting a watch.

I think this is a bad idea; i.e. sleeping w/o interrupt.  How long is SOCKET_RETRY_WAIT_MS?
What if we try to stop the hosting server in the meantime?  Do we have to wait for it to come
out of this loop?

+        Threads.sleepWithoutInterrupt(HConstants.SOCKET_RETRY_WAIT_MS);
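
To make the concern concrete, here is a rough sketch of a wait that a stop request can cut short. It is not taken from the patch or from HBase; the class StoppableWait and its methods are hypothetical names used only for illustration, and it assumes the stop request can be modelled as a one-shot latch.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper (not HBase code): a wait that ends early when stop is requested. */
public class StoppableWait {
    // Counts down exactly once, when a stop is requested.
    private final CountDownLatch stopLatch = new CountDownLatch(1);

    /** Called by whoever is shutting the server down. */
    public void stop() {
        stopLatch.countDown();
    }

    /** True once stop() has been called. */
    public boolean isStopped() {
        return stopLatch.getCount() == 0;
    }

    /**
     * Waits up to waitMs milliseconds.
     *
     * @return true if a stop was requested before or during the wait,
     *         false if the full wait elapsed without a stop request
     * @throws InterruptedException if the waiting thread is interrupted
     */
    public boolean waitOrStop(long waitMs) throws InterruptedException {
        return stopLatch.await(waitMs, TimeUnit.MILLISECONDS);
    }
}
```

A retry loop built on something like this could return as soon as waitOrStop reports true, instead of sleeping out the whole interval while the server is being stopped.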

Passing 0, are we supposed to try once only?  My guess is that we could try more than once
given how the loop runs; i.e. we may loop multiple times in the same millisecond.  You might
want to exit the loop if the timeout is zero.
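
Something along these lines is what I have in mind for the zero-timeout case. This is only a sketch under my own assumptions, not the patch's code: RetryWithTimeout, retry, and the BooleanSupplier check are hypothetical stand-ins for the znode existence check, and it uses a plain interruptible Thread.sleep between attempts.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

/** Hypothetical sketch (not HBase code): retry a check until it passes or a timeout expires. */
public class RetryWithTimeout {
    /**
     * Runs check repeatedly until it returns true or timeoutMs has elapsed.
     * A timeout of zero means exactly one attempt, regardless of clock granularity.
     *
     * @return true if the check passed, false if the timeout expired first
     * @throws InterruptedException if interrupted while waiting between attempts
     */
    public static boolean retry(BooleanSupplier check, long timeoutMs, long retryWaitMs)
            throws InterruptedException {
        // nanoTime is monotonic, so wall-clock adjustments cannot stretch the loop.
        final long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        while (true) {
            if (check.getAsBoolean()) {
                return true;
            }
            // Explicit zero check: never loop twice just because the clock has not advanced.
            if (timeoutMs == 0 || System.nanoTime() - deadline >= 0) {
                return false;
            }
            Thread.sleep(retryWaitMs);
        }
    }
}
```

With this shape, retry(check, 0, wait) evaluates the check once and returns its result, while a positive timeout keeps retrying until the deadline passes.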

What happens if a client comes in during this time?  Will it crash out immediately because
there is no base node?

Thanks Matteo.
> RegionServer doesn't retry to check if base node is available
> -------------------------------------------------------------
>                 Key: HBASE-5666
>                 URL: https://issues.apache.org/jira/browse/HBASE-5666
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver, zookeeper
>    Affects Versions: 0.92.1, 0.94.0, 0.96.0
>            Reporter: Matteo Bertozzi
>            Assignee: Matteo Bertozzi
>         Attachments: HBASE-5666-v1.patch, HBASE-5666-v2.patch, HBASE-5666-v3.patch, HBASE-5666-v4.patch,
>                      HBASE-5666-v5.patch, HBASE-5666-v6.patch, hbase-1-regionserver.log, hbase-2-regionserver.log,
>                      hbase-3-regionserver.log, hbase-master.log, hbase-regionserver.log, hbase-zookeeper.log
> I have a script that starts HBase and a couple of region servers in distributed mode (hbase.cluster.distributed = true):
> {code}
> $HBASE_HOME/bin/start-hbase.sh
> $HBASE_HOME/bin/local-regionservers.sh start 1 2 3
> {code}
> but the region servers are not able to start...
> It seems that during RS startup the znode is not yet available, and HRegionServer.initializeZooKeeper()
> checks just once whether the base node is available.
> {code}
> 2012-03-28 21:54:05,013 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Check the value configured in 'zookeeper.znode.parent'. There could be a mismatch with the one configured in the master.
> 2012-03-28 21:54:08,598 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server localhost,60202,1332964444824: Initialization of RS failed.  Hence aborting
> java.io.IOException: Received the shutdown message while waiting.
> 	at org.apache.hadoop.hbase.regionserver.HRegionServer.blockAndCheckIfStopped(HRegionServer.java:626)
> 	at org.apache.hadoop.hbase.regionserver.HRegionServer.initializeZooKeeper(HRegionServer.java:596)
> 	at org.apache.hadoop.hbase.regionserver.HRegionServer.preRegistrationInitialization(HRegionServer.java:558)
> 	at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:672)
> 	at java.lang.Thread.run(Thread.java:662)
> {code}
