hbase-issues mailing list archives

From "Himanshu Vashishtha (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-5549) Master can fail if ZooKeeper session expires
Date Sat, 07 Jul 2012 20:54:35 GMT

    [ https://issues.apache.org/jira/browse/HBASE-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13408770#comment-13408770 ]

Himanshu Vashishtha commented on HBASE-5549:
--------------------------------------------

From the code (and javadoc), it seems we are not 100% sure that the ZooKeeper session-close event has actually occurred.
We currently avoid test failures caused by this, either by re-creating zkw (as in TestZooKeeper#testClientSessionExpired) or by removing the asserts altogether (as in TestReplicationPeer).
Would it be OK to impose a hard wait on the session timeout? The basic idea is to have this in HBaseTestingUtility#expireSession:
{code}
+    final boolean[] isClosed = new boolean[]{false} ;
+    ZooKeeper monitorWatcher = new ZooKeeper(quorumServers, sessionTimeout, new Watcher() {
+      @Override
+      public void process(WatchedEvent event) {
+        LOG.info("Closed in the monitor.");
+        isClosed[0] = true ;
+      }
+    }, sessionID, password);
     monitorWatcher.close();
+    while(!isClosed[0]){
+      // sleep;
+      Thread.sleep(sessionTimeout);
+    }
{code}

And remove the two-handler approach.

This way, we are sure that the session has indeed expired, and we can clean up the tests. The downside is that we sleep until the session has actually expired (alternatively, we could sleep for increasing durations and fail after a hard limit).
It would be good to know what others think.
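
For discussion, here is a minimal sketch of the "increasing sleep, then fail after a hard limit" variant, in plain Java with no ZooKeeper dependency. The names `BoundedWait` and `waitFor` are hypothetical and not existing HBase or ZooKeeper APIs; the condition would be whatever flag the monitoring watcher sets.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

/** Hypothetical test helper (not part of HBase): bounded polling with growing sleeps. */
final class BoundedWait {
  private BoundedWait() {}

  /**
   * Polls the condition until it is true or hardLimitMs has elapsed.
   * The sleep starts at initialSleepMs and doubles after every miss.
   *
   * @return true if the condition became true in time; false on timeout or interrupt
   */
  static boolean waitFor(BooleanSupplier condition, long initialSleepMs, long hardLimitMs) {
    if (initialSleepMs <= 0 || hardLimitMs < 0) {
      throw new IllegalArgumentException("initialSleepMs must be > 0 and hardLimitMs >= 0");
    }
    final long start = System.nanoTime();
    final long limitNanos = TimeUnit.MILLISECONDS.toNanos(hardLimitMs);
    long sleepMs = initialSleepMs;
    while (!condition.getAsBoolean()) {
      long remainingMs =
          TimeUnit.NANOSECONDS.toMillis(limitNanos - (System.nanoTime() - start));
      if (remainingMs <= 0) {
        return false; // hard limit reached; the caller should fail the test
      }
      try {
        // Never sleep past the hard limit.
        Thread.sleep(Math.min(sleepMs, remainingMs));
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve interrupt status for the caller
        return false;
      }
      // Double the sleep, capped at the hard limit (this also guards against overflow).
      sleepMs = (sleepMs > hardLimitMs / 2) ? hardLimitMs : sleepMs * 2;
    }
    return true;
  }
}
```

In expireSession this might be called as `BoundedWait.waitFor(expired::get, 100, 10L * sessionTimeout)`, where `expired` is an `AtomicBoolean` set by the watcher; the factor of 10 is an arbitrary assumption. The flag should be an `AtomicBoolean` or a `volatile` field, since it is written on the ZooKeeper event thread and read on the test thread.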
                
> Master can fail if ZooKeeper session expires
> --------------------------------------------
>
>                 Key: HBASE-5549
>                 URL: https://issues.apache.org/jira/browse/HBASE-5549
>             Project: HBase
>          Issue Type: Bug
>          Components: master, zookeeper
>    Affects Versions: 0.96.0
>         Environment: all
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 5549.v10.patch, 5549.v11.patch, 5549.v6.patch, 5549.v7.patch, 5549.v8.patch, 5549.v9.patch, nochange.patch
>
>
> There is a retry mechanism in RecoverableZooKeeper, but when the session expires, the whole ZooKeeperWatcher is recreated, hence the retry mechanism does not work in this case. This is why a sleep is needed in TestZooKeeper#testMasterSessionExpired: we need to wait for ZooKeeperWatcher to be recreated before using the connection.
> This can happen in real life when:
> - master & zookeeper start
> - the zookeeper connection is cut
> - master enters the retry loop
> - in the meantime the session expires
> - the network comes back, the session is recreated
> - the retries continue, but on the wrong object, hence they fail.
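
The failure sequence in the issue description can be modelled in plain Java. This is only an illustrative sketch, not the RecoverableZooKeeper code: `StaleHandleRetry`, `Handle`, and `retry` are hypothetical names, and `Handle` merely stands in for a watcher/connection object that becomes unusable once its session expires. The retry helper re-reads the current handle on every attempt; whether the retries succeed therefore depends on whether the supplier returns a captured (stale) object or the recreated one.

```java
import java.util.function.Function;
import java.util.function.Supplier;

/** Illustrative model only; Handle stands in for a watcher/connection object. */
final class StaleHandleRetry {
  private StaleHandleRetry() {}

  /** A connection-like object that fails every call once its session has expired. */
  static final class Handle {
    private final int generation;
    private volatile boolean expired;

    Handle(int generation) {
      this.generation = generation;
    }

    void expire() {
      expired = true;
    }

    String read() {
      if (expired) {
        throw new IllegalStateException("session expired, generation " + generation);
      }
      return "data@" + generation;
    }
  }

  /** Runs op up to maxAttempts times, asking the supplier for the handle on each attempt. */
  static <T> T retry(Supplier<Handle> current, Function<Handle, T> op, int maxAttempts) {
    if (maxAttempts <= 0) {
      throw new IllegalArgumentException("maxAttempts must be positive");
    }
    IllegalStateException last = null;
    for (int attempt = 0; attempt < maxAttempts; attempt++) {
      try {
        return op.apply(current.get()); // re-resolve the handle on every attempt
      } catch (IllegalStateException e) {
        last = e; // remember the failure and try again
      }
    }
    throw last; // every attempt failed
  }
}
```

With an `AtomicReference<Handle>` that is swapped when the session is recreated, `retry(ref::get, Handle::read, 3)` picks up the new handle, whereas `retry(() -> captured, Handle::read, 3)` keeps failing on the expired one, which mirrors the last bullet above.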


        
