hadoop-common-issues mailing list archives

From "Karthik Kambatla (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-10584) ActiveStandbyElector goes down if ZK quorum become unavailable
Date Sat, 10 May 2014 22:13:16 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-10584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13993080#comment-13993080 ]

Karthik Kambatla commented on HADOOP-10584:
-------------------------------------------

More background: We saw this when ZK became inaccessible for a few minutes. ZKFC went down
and the corresponding master was transitioned to Standby. 

bq. You mean instead of calling fatalError() like it's doing now?
Yes. Or we could have two retry modes: the retries we have today, followed by a call to becomeStandby,
all within an outer retry-forever loop that sleeps for a shorter time between inner loops.
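
To make the two retry modes concrete, here is a rough Java sketch. It is not the ActiveStandbyElector code: the class TwoLevelRetrySketch, the Client interface, and the parameters maxInnerRetries and outerSleepMillis are hypothetical names chosen for illustration, and the ZK operation is modelled as a plain BooleanSupplier that returns true on success.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical sketch of the two-level retry idea; not the real elector code. */
public class TwoLevelRetrySketch {

    /** Stand-in for the elector's client; only the callback named in this thread. */
    public interface Client {
        void becomeStandby();
    }

    /**
     * Runs op until it succeeds. Each inner pass makes up to maxInnerRetries
     * attempts (the bounded retries that exist today). When a pass is exhausted,
     * the client is moved to standby and the outer loop sleeps before starting
     * another pass, instead of treating the failure as fatal.
     *
     * @return how many times becomeStandby was called before op succeeded
     */
    public static int runWithRetries(BooleanSupplier op, Client client,
                                     int maxInnerRetries, long outerSleepMillis) {
        int standbyCalls = 0;
        while (true) {                                   // outer retry-forever loop
            for (int i = 0; i < maxInnerRetries; i++) {  // inner bounded retries
                try {
                    if (op.getAsBoolean()) {
                        return standbyCalls;
                    }
                } catch (RuntimeException e) {
                    // Assumption: any runtime failure counts as one failed attempt.
                }
            }
            // Inner retries exhausted: step down rather than exit the process.
            client.becomeStandby();
            standbyCalls++;
            try {
                Thread.sleep(outerSleepMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting to retry", e);
            }
        }
    }
}
```

With maxInnerRetries = 3 and an operation that first succeeds on its fifth attempt, this calls becomeStandby once (after the first three failures) and returns during the second inner pass.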



> ActiveStandbyElector goes down if ZK quorum become unavailable
> --------------------------------------------------------------
>
>                 Key: HADOOP-10584
>                 URL: https://issues.apache.org/jira/browse/HADOOP-10584
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: ha
>    Affects Versions: 2.4.0
>            Reporter: Karthik Kambatla
>            Assignee: Karthik Kambatla
>            Priority: Critical
>
> ActiveStandbyElector retries operations a few times. If the ZK quorum itself is down, the elector goes down too and the daemons have to be brought up again.
> Instead, it should log the fact that it is unable to talk to ZK, call becomeStandby on its client, and continue to attempt connecting to ZK.



--
This message was sent by Atlassian JIRA
(v6.2#6252)