hbase-issues mailing list archives

From "Himanshu Vashishtha (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-5549) Master can fail if ZooKeeper session expires
Date Thu, 23 Aug 2012 01:30:42 GMT

    [ https://issues.apache.org/jira/browse/HBASE-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13439992#comment-13439992 ]

Himanshu Vashishtha commented on HBASE-5549:

@Stack: The patch is at HBASE-6354. It is based on the approach described in the comment
above. The patch works with the current approach, but when I tried to explicitly catch the
"Expired" event in the watcher's process method, it hung as if it never received that event.
In that case, the only event it received was SyncConnected, which is delivered when you
create a new zk instance.

Basically, this:
    ZooKeeper monitorWatcher = new ZooKeeper(quorumServers, 1000,
        new Watcher() {
          @Override
          public void process(WatchedEvent event) {
            KeeperState state = event.getState();
            switch (state) {
            case Expired:
              LOG.info("processing SessionExpired event");
              synchronized (sessionClosed) {
                // ...
              }
              break;
            default:
              LOG.info("processing default event " + state);
              break;
            }
          }
        }, sessionID, password);
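
One way to keep such a check from hanging when the Expired event never arrives is to wait on a latch with a timeout instead of blocking indefinitely. The following is only a minimal plain-Java sketch of that idea: `State` and `ExpiryListener` are hypothetical stand-ins, not ZooKeeper or HBase classes, and the 100 ms timeout is an arbitrary value chosen for illustration.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class ExpiredEventWaitDemo {

    /** Hypothetical stand-in for ZooKeeper's KeeperState; only the two states discussed here. */
    enum State { SyncConnected, Expired }

    /** Records whether an Expired event was ever delivered. */
    static final class ExpiryListener {
        private final CountDownLatch expired = new CountDownLatch(1);

        void process(State state) {
            if (state == State.Expired) {
                expired.countDown();
            }
            // Any other state (for example SyncConnected) is ignored here.
        }

        /** Returns true if Expired arrived within the timeout, false otherwise. */
        boolean awaitExpired(long timeoutMillis) throws InterruptedException {
            return expired.await(timeoutMillis, TimeUnit.MILLISECONDS);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Only SyncConnected is delivered: the wait gives up instead of hanging.
        ExpiryListener onlyConnected = new ExpiryListener();
        onlyConnected.process(State.SyncConnected);
        System.out.println(onlyConnected.awaitExpired(100));  // false

        // Expired is delivered: the wait returns immediately.
        ExpiryListener sawExpiry = new ExpiryListener();
        sawExpiry.process(State.Expired);
        System.out.println(sawExpiry.awaitExpired(100));      // true
    }
}
```

With a bounded wait, the caller can log which events were actually seen and fail the test with a clear message rather than stalling.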

Sorry for the late reply.
> Master can fail if ZooKeeper session expires
> --------------------------------------------
>                 Key: HBASE-5549
>                 URL: https://issues.apache.org/jira/browse/HBASE-5549
>             Project: HBase
>          Issue Type: Bug
>          Components: master, zookeeper
>    Affects Versions: 0.96.0
>         Environment: all
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>         Attachments: 5549_092.txt, 5549.v10.patch, 5549.v11.patch, 5549.v6.patch, 5549.v7.patch,
> 5549.v8.patch, 5549.v9.patch, nochange.patch
>
> There is a retry mechanism in RecoverableZooKeeper, but when the session expires, the
> whole ZooKeeperWatcher is recreated, hence the retry mechanism does not work in this case.
> This is why a sleep is needed in TestZooKeeper#testMasterSessionExpired: we need to wait for
> ZooKeeperWatcher to be recreated before using the connection.
> This can happen in real life when:
> - master and zookeeper start
> - the zookeeper connection is cut
> - master enters the retry loop
> - in the meantime the session expires
> - the network comes back and the session is recreated
> - the retries continue, but on the wrong object, and hence fail.
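
The sequence in the quoted description can be reproduced in miniature without ZooKeeper. The sketch below is an illustration under stated assumptions, not the HBase code: `FakeConnection` is a hypothetical stand-in for the recreated watcher/connection, and the `AtomicReference` holder is just one possible way to let a retry loop pick up the replacement object.

```java
import java.util.concurrent.atomic.AtomicReference;

public class StaleHandleRetryDemo {

    /** Hypothetical stand-in for a ZooKeeper connection; not an HBase or ZooKeeper class. */
    static final class FakeConnection {
        private final String name;
        private volatile boolean expired;

        FakeConnection(String name) { this.name = name; }

        void expire() { expired = true; }

        String read() {
            if (expired) {
                throw new IllegalStateException("session expired on " + name);
            }
            return "data via " + name;
        }
    }

    /** Retries against one captured object: keeps failing once that object has expired. */
    static String retryOnCapturedHandle(FakeConnection captured, int maxAttempts) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return captured.read();
            } catch (IllegalStateException e) {
                // A real loop would back off here before the next attempt.
            }
        }
        return null;
    }

    /** Re-reads the current connection from a shared holder before every attempt. */
    static String retryOnCurrentHandle(AtomicReference<FakeConnection> holder, int maxAttempts) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return holder.get().read();
            } catch (IllegalStateException e) {
                // The holder may have been swapped since the last attempt.
            }
        }
        return null;
    }

    public static void main(String[] args) {
        FakeConnection first = new FakeConnection("session-1");
        AtomicReference<FakeConnection> holder = new AtomicReference<>(first);

        // The session expires and the owner recreates the connection.
        first.expire();
        holder.set(new FakeConnection("session-2"));

        System.out.println(retryOnCapturedHandle(first, 3));  // null
        System.out.println(retryOnCurrentHandle(holder, 3));  // data via session-2
    }
}
```

The first loop mirrors the failure mode described above: every retry targets the object that existed before the session expired. The second loop only succeeds because it looks up the current object on each attempt.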


