hadoop-common-issues mailing list archives

From "Bikas Saha (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-8212) Improve ActiveStandbyElector's behavior when session expires
Date Tue, 27 Mar 2012 17:36:26 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-8212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13239683#comment-13239683 ]

Bikas Saha commented on HADOOP-8212:

bq. The patch does add the same handling to StatCallback. It uses the ZooKeeper "context"
parameter to pass the original zkClient. Unfortunately the Watcher interface doesn't have
any context object, which is why I had to introduce the wrapper class there.
Not this. I was talking about the handling of the session expired code that was added to the create
callback (and is the title of this jira). The same thing could happen in the stat callback:
it could get the session expired code and send a fatal error instead of letting
the process watcher callback rejoin the election on session expiry.
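
To make the concern concrete, here is a minimal sketch of the behavior I mean. It is not the patch
and does not use the real ZooKeeper API: ResultCode, RecordingApp, ElectorSketch and their methods
are hypothetical stand-ins, and the normal election handling is elided. The point is only that the
stat callback treats session expiry as non-fatal and leaves the rejoin to the watcher path.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the result codes a callback receives;
// this is NOT the real KeeperException.Code enum.
enum ResultCode { OK, NONODE, SESSIONEXPIRED, APIERROR }

// Hypothetical app-facing callback that simply records what it was told.
class RecordingApp {
    final List<String> events = new ArrayList<>();
    void enterNeutralMode() { events.add("neutral"); }
    void notifyFatalError(String msg) { events.add("fatal: " + msg); }
}

// Sketch of an elector whose stat callback tolerates session expiry.
class ElectorSketch {
    private final RecordingApp app;
    private Object client = new Object(); // stands in for the current ZooKeeper handle
    private int sessionsCreated = 1;

    ElectorSketch(RecordingApp app) { this.app = app; }

    Object currentClient() { return client; }
    int sessionsCreated() { return sessionsCreated; }

    // Shape of a stat callback: a result code plus the context object,
    // which carries the client that issued the request.
    void processStatResult(ResultCode rc, Object ctx) {
        if (ctx != client) {
            return; // result from a stale client, ignore it
        }
        switch (rc) {
            case OK:
            case NONODE:
                return; // normal election handling elided in this sketch
            case SESSIONEXPIRED:
                return; // not fatal: the watcher path below rejoins the election
            default:
                app.notifyFatalError("unexpected result code " + rc);
        }
    }

    // Shape of the watcher's reaction to a session-expired event.
    void processSessionExpired(Object sourceClient) {
        if (sourceClient != client) {
            return; // event from a stale client, ignore it
        }
        app.enterNeutralMode();
        client = new Object(); // assumption: rejoining starts with a fresh session
        sessionsCreated++;
    }
}
```

With this shape, a session expired code arriving in the stat callback produces no fatal error, and
the expiry event seen by the watcher moves the app to neutral mode and opens a new session.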

bq.In my experience working on similar projects in the past, getting all the initial code
in place is only half the battle. The real work starts once the code is there and you start
banging on it in realistic test scenarios.
And it might be that if we go slowly at first, we save that time in the later
phase :)

bq.We'd like to see automatic failover be a supported piece of the HA solution in 0.23.x (..err..2.0),
and to hit that timeline, we need to get into the latter phase ASAP.
That seems like a worthwhile engineering goal!

bq.If you'd prefer, I'm happy to create a feature branch for auto-failover and then call a
merge vote when it's ready for the full QA onslaught.
That's a really good idea! This way at least all the jiras will be linked to that and easy to
follow. And we can make aggressive changes without worrying about churning or destabilizing
trunk. I don't think voting to bring this important piece of work back to trunk will be a problem.
But it makes the process a lot more manageable and trackable. I am +1 for it. Thanks!

> Improve ActiveStandbyElector's behavior when session expires
> ------------------------------------------------------------
>                 Key: HADOOP-8212
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8212
>             Project: Hadoop Common
>          Issue Type: Improvement
>    Affects Versions: 0.23.3, 0.24.0
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>             Fix For: 0.23.3, 0.24.0
>         Attachments: hadoop-8212.txt, hadoop-8212.txt
> Currently when the ZK session expires, it results in a fatal error being sent to the
application callback. This is not the best behavior -- for example, in the case of HA, if
ZK goes down, we would like the current state to be maintained, rather than causing either
NN to abort. When the ZK clients are able to reconnect, they should sort out the correct leader
based on the normal locking schemes.


