ambari-dev mailing list archives

From "Alejandro Fernandez (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (AMBARI-12488) RU - Use haadmin failover command instead of killing ZKFC during upgrade/downgrade
Date Thu, 23 Jul 2015 22:42:04 GMT

     [ https://issues.apache.org/jira/browse/AMBARI-12488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alejandro Fernandez resolved AMBARI-12488.
------------------------------------------
    Resolution: Fixed

Committed additional patches.
trunk:
commit 0e01ff15e51073a6e2cbebf1f4e988cbacca5d6a
commit 9ce140c0cab3594c66508f300dabd485c466e979

branch-2.1:
commit 4532b5192c4677165a321778b92d8fe14530024b
commit ac74b79e8a1c458fd8486adf3ea4fe4c68a06d55

> RU - Use haadmin failover command instead of killing ZKFC during upgrade/downgrade
> ----------------------------------------------------------------------------------
>
>                 Key: AMBARI-12488
>                 URL: https://issues.apache.org/jira/browse/AMBARI-12488
>             Project: Ambari
>          Issue Type: Story
>          Components: ambari-server
>    Affects Versions: 2.0.0
>            Reporter: Alejandro Fernandez
>            Assignee: Alejandro Fernandez
>              Labels: rolling_upgrade
>             Fix For: 2.1.1
>
>         Attachments: AMBARI-12488.patch, AMBARI-12488.v1.patch, AMBARI-12488.v2.patch
>
>
> Currently, RU orchestration during upgrade/downgrade kills ZKFC on the active NameNode to initiate a failover to the standby. We should instead use the haadmin failover command.
> E.g.,
> {code}
> su hdfs -c 'hdfs haadmin -failover nn1 nn2'
> {code}
> Here nn1 is the NameNode that is currently active, and nn2 is the other NameNode.
> This is safer than killing ZKFC on the active NameNode because the command first tries to gracefully transition the active NameNode to the Standby state. If this fails, the fencing methods (as configured by dfs.ha.fencing.methods) are attempted in order until one succeeds. Only after this process is the second NameNode transitioned to the Active state.
> It also avoids the long delay between killing ZKFC, the failure being detected only after a timeout, and the standby NameNode finally becoming active.
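The step described above (fail over from whichever NameNode is currently active to the other one) can be sketched in shell. This is a minimal, hypothetical sketch, not the code committed for this issue. It assumes the HA service IDs are nn1 and nn2 (as in the example command), that the caller may run commands as the hdfs user via su, and that `hdfs haadmin -getServiceState <serviceId>` prints "active" or "standby". The helper names choose_failover_order and failover_active_namenode are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical sketch only; not Ambari's actual implementation.
# Assumptions: NameNode service IDs are nn1 and nn2, and the caller
# may run commands as the hdfs user via su.

# Given the HA states of nn1 and nn2, print "<from> <to>" so that <from>
# is the active NameNode, as expected by `hdfs haadmin -failover <from> <to>`.
choose_failover_order() {
  local nn1_state="$1" nn2_state="$2"
  if [ "$nn1_state" = "active" ]; then
    echo "nn1 nn2"
  elif [ "$nn2_state" = "active" ]; then
    echo "nn2 nn1"
  else
    # Neither NameNode reports active: refuse to guess a direction.
    return 1
  fi
}

# Query both NameNodes, then ask haadmin to fail over from the active one.
# haadmin itself tries a graceful transition to standby first and falls back
# to the fencing methods in dfs.ha.fencing.methods before activating the
# other NameNode.
failover_active_namenode() {
  local s1 s2 order from to
  s1=$(su hdfs -c 'hdfs haadmin -getServiceState nn1') || return 1
  s2=$(su hdfs -c 'hdfs haadmin -getServiceState nn2') || return 1
  order=$(choose_failover_order "$s1" "$s2") || {
    echo "No active NameNode found; not failing over" >&2
    return 1
  }
  read -r from to <<< "$order"
  su hdfs -c "hdfs haadmin -failover $from $to"
}
```

After sourcing the functions, calling failover_active_namenode issues the same command as the example above, with the arguments ordered by the current HA state. Keeping the ordering logic in a separate function means it can be checked without a running cluster.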



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
