hbase-issues mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-5549) Master can fail if ZooKeeper session expires
Date Thu, 23 Aug 2012 15:26:42 GMT

    [ https://issues.apache.org/jira/browse/HBASE-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13440378#comment-13440378 ]

stack commented on HBASE-5549:
------------------------------

@Himanshu Where is the issue that adds waiting on session expiry, if it's not being done over in HBASE-6354?

I backported this patch because it removed a purportedly 'useless' test that was messing up
a subsequent test. I also backported it to get the other cleanups around zk interaction that
this patch adds, and to give us parity here from 0.92 through to trunk. If there
is an issue with a fix for wait-on-expire, I can backport that too.

Suggest you look at Alex's countdown latch in his watcher in this patch: https://reviews.facebook.net/D4605
It does something like what you pasted above for the expiration monitor.
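For readers unfamiliar with the pattern, here is a minimal sketch of a countdown-latch expiration monitor. It uses only `java.util.concurrent`; the class and method names (`ExpirationMonitor`, `onSessionExpired`, `awaitExpiration`) are hypothetical, and this is not the code from D4605 nor part of the ZooKeeper client API. The event callback counts the latch down once, and a caller blocks on it with a timeout rather than a fixed sleep.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical sketch: lets a caller block until a session-expired
 * notification has been delivered. Not the implementation from D4605.
 */
public class ExpirationMonitor {
    // Counted down exactly once, when the expiration notification arrives.
    private final CountDownLatch expired = new CountDownLatch(1);

    /** Called from the (hypothetical) watcher callback when the session expires. */
    public void onSessionExpired() {
        expired.countDown();
    }

    /**
     * Blocks until the expiration has been observed or the timeout elapses.
     * Returns true if the expiration was seen within the timeout.
     */
    public boolean awaitExpiration(long timeout, TimeUnit unit) throws InterruptedException {
        return expired.await(timeout, unit);
    }

    public static void main(String[] args) throws InterruptedException {
        ExpirationMonitor monitor = new ExpirationMonitor();
        // Simulate an event thread delivering the expiration shortly afterwards.
        Thread eventThread = new Thread(() -> {
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            monitor.onSessionExpired();
        });
        eventThread.start();
        boolean seen = monitor.awaitExpiration(5, TimeUnit.SECONDS);
        System.out.println("expiration observed: " + seen);
        eventThread.join();
    }
}
```

A latch fits here because expiration is a one-shot event: once counted down, every later `await` returns immediately, so a test or caller can wait only as long as needed.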
                
> Master can fail if ZooKeeper session expires
> --------------------------------------------
>
>                 Key: HBASE-5549
>                 URL: https://issues.apache.org/jira/browse/HBASE-5549
>             Project: HBase
>          Issue Type: Bug
>          Components: master, zookeeper
>    Affects Versions: 0.96.0
>         Environment: all
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.92.2, 0.96.0, 0.94.2
>
>         Attachments: 5549_092.txt, 5549_094.txt, 5549.v10.patch, 5549.v11.patch, 5549.v6.patch, 5549.v7.patch, 5549.v8.patch, 5549.v9.patch, nochange.patch
>
>
> There is a retry mechanism in RecoverableZooKeeper, but when the session expires, the
> whole ZooKeeperWatcher is recreated, so the retry mechanism does not work in this case.
> This is why a sleep is needed in TestZooKeeper#testMasterSessionExpired: we need to wait for
> the ZooKeeperWatcher to be recreated before using the connection.
> This can happen in real life, for example when:
> - the master & ZooKeeper start
> - the ZooKeeper connection is cut
> - the master enters the retry loop
> - in the meantime, the session expires
> - the network comes back and the session is recreated
> - the retries continue, but on the wrong (stale) object, and therefore fail.
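The failure sequence above can be illustrated with a small, self-contained Java sketch. It assumes a simplified model with hypothetical `Connection` and `StaleRetryDemo` classes (not HBase's RecoverableZooKeeper or ZooKeeperWatcher): when the session expires, the connection object is replaced, so a retry loop that captured the old object keeps failing, while one that looks up the current object on every attempt succeeds.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

public class StaleRetryDemo {

    /** Hypothetical stand-in for a session-bound client object. */
    static final class Connection {
        final int generation;
        volatile boolean expired;

        Connection(int generation) {
            this.generation = generation;
        }

        String read() {
            if (expired) {
                throw new IllegalStateException("session expired (generation " + generation + ")");
            }
            return "data@" + generation;
        }
    }

    // Always points at the most recently created connection.
    static final AtomicReference<Connection> current = new AtomicReference<>(new Connection(1));

    /** Simulates session expiry: the old object is dead and a new one replaces it. */
    static void expireAndRecreate() {
        Connection old = current.get();
        old.expired = true;
        current.set(new Connection(old.generation + 1));
    }

    /** Retries a read up to `attempts` times, fetching the connection from `source` on each attempt. */
    static String readWithRetries(Supplier<Connection> source, int attempts) {
        if (attempts < 1) {
            throw new IllegalArgumentException("attempts must be >= 1");
        }
        IllegalStateException last = null;
        for (int i = 0; i < attempts; i++) {
            try {
                return source.get().read();
            } catch (IllegalStateException e) {
                last = e; // remember the failure and try again
            }
        }
        throw last;
    }

    public static void main(String[] args) {
        Connection captured = current.get(); // reference taken before the outage
        expireAndRecreate();                 // session expires and is recreated

        // Buggy pattern: every retry goes to the same, now-dead object.
        try {
            readWithRetries(() -> captured, 3);
            System.out.println("unexpected success");
        } catch (IllegalStateException e) {
            System.out.println("captured reference: " + e.getMessage());
        }

        // Fixed pattern: look up the current object on each attempt.
        System.out.println("fresh lookup: " + readWithRetries(current::get, 3));
    }
}
```

Running `main` prints that the captured reference fails with generation 1, while the fresh lookup returns `data@2`.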

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
