hadoop-common-issues mailing list archives

From "Tom White (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HADOOP-9220) Unnecessary transition to standby in ActiveStandbyElector
Date Fri, 18 Jan 2013 17:12:13 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-9220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

Tom White updated HADOOP-9220:

    Attachment: HADOOP-9220.patch

I've written a test that fails without the patch. It checks that the HA service transitions
to active the expected number of times.

There is another part to the fix, in addition to the previous patch. In ZKFailoverController#recheckElectability()
the check may be postponed if the FC has ceded its active state and is waiting for a timeout
(10s) before rejoining the election. The trouble is that the FC may have become active again
in the intervening time, but recheckElectability() doesn't take this into account: it will still
call ActiveStandbyElector#createLockNodeAsync, and so the FC will transition to standby and
then to active again. The fix I have implemented makes a postponed recheckElectability()
join the election only if the FC is not currently active.
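
To illustrate the guard described above, here is a minimal, self-contained sketch. It is a toy model and not the code in the attached patch: the class PostponedRecheckSketch and its methods are hypothetical names, and joinElection() only imitates the effect described above (a transition to standby followed by a transition back to active), on the assumption that this node wins the election again.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of a postponed electability recheck; not the real ZKFailoverController. */
final class PostponedRecheckSketch {
    enum State { STANDBY, ACTIVE }

    private State state = State.STANDBY;
    private final List<String> transitions = new ArrayList<>();
    private final boolean guardEnabled;

    PostponedRecheckSketch(boolean guardEnabled) {
        this.guardEnabled = guardEnabled;
    }

    void becomeActive() {
        if (state != State.ACTIVE) {
            state = State.ACTIVE;
            transitions.add("active");
        }
    }

    void becomeStandby() {
        if (state != State.STANDBY) {
            state = State.STANDBY;
            transitions.add("standby");
        }
    }

    /**
     * Hypothetical stand-in for rejoining the election. Assumption: rejoining
     * forces a transition to standby, after which this node wins again.
     */
    void joinElection() {
        becomeStandby();
        becomeActive();
    }

    /** The postponed recheck: with the guard on, an already-active FC is left alone. */
    void postponedRecheck() {
        if (guardEnabled && state == State.ACTIVE) {
            return; // already active, so rejoining would only cause a needless flip
        }
        joinElection();
    }

    List<String> transitions() {
        return transitions;
    }

    public static void main(String[] args) {
        for (boolean guard : new boolean[] {false, true}) {
            PostponedRecheckSketch fc = new PostponedRecheckSketch(guard);
            fc.becomeActive();     // the FC became active again during the delay
            fc.postponedRecheck(); // the delayed check now fires
            System.out.println("guard=" + guard + " transitions=" + fc.transitions());
        }
    }
}
```

Without the guard the model records active, standby, active; with the guard it records a single transition to active, which is the kind of count the test mentioned above would check.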
> Unnecessary transition to standby in ActiveStandbyElector
> ---------------------------------------------------------
>                 Key: HADOOP-9220
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9220
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: ha
>            Reporter: Tom White
>            Assignee: Tom White
>         Attachments: HADOOP-9220.patch, HADOOP-9220.patch
> When performing a manual failover from one HA node to a second, under some circumstances
> the second node will transition from standby -> active -> standby -> active. This
> is with automatic failover enabled, so there is a ZK cluster doing leader election.
