ambari-dev mailing list archives

From "Alejandro Fernandez" <afernan...@hortonworks.com>
Subject Re: Review Request 30603: RU Hacks and Technical Debt - Namenode order of active/standby in code is flipped
Date Wed, 04 Feb 2015 19:08:22 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30603/
-----------------------------------------------------------

(Updated Feb. 4, 2015, 7:08 p.m.)


Review request for Ambari, Dmitro Lisnichenko, Jonathan Hurley, Nate Cole, and Yurii Shylov.


Changes
-------

Added a unit test to check the namenode order.


Bugs: AMBARI-9467
    https://issues.apache.org/jira/browse/AMBARI-9467


Repository: ambari


Description
-------

UpgradeHelper somehow calls the active NameNode first, but by the time it gets called that NameNode
has become the standby; investigate why.

We will abide by the order in the runbook: first upgrade the standby NameNode, then the active one,
which then causes a flip of the active/standby roles.
In rare cases, if a NameNode fails for whatever reason, ZKFC will initiate a failover, which
explains why the order may sometimes already be flipped by the time the NameNode prepare step runs.
This explains the rare behavior. However, the namenode_upgrade.py script works in both cases
(active first or standby first), so the flip does not break it.
There is another Jira to run the namenode_upgrade script as part of the Pre-Cluster group to
make the backup, which should reduce the likelihood of a flip happening after the order has been
calculated.
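The runbook order (standby first, then active) and the way a ZKFC failover can invalidate an order computed earlier can be sketched as follows. This is a hypothetical Python model for illustration only, not Ambari's UpgradeHelper code: the `NameNode` class, the `"active"`/`"standby"` state strings, the helper names, and the example hostnames are all assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class NameNode:
    # Hypothetical model: host name plus the HA state observed when the
    # upgrade order was calculated ("active" or "standby").
    host: str
    ha_state: str


def upgrade_order(namenodes: List[NameNode]) -> List[str]:
    """Order hosts per the runbook: standby NameNode first, then active."""
    rank = {"standby": 0, "active": 1}
    # sorted() is stable, so hosts with equal state keep their input order.
    ordered = sorted(namenodes, key=lambda nn: rank[nn.ha_state])
    return [nn.host for nn in ordered]


def flipped_hosts(planned: List[NameNode], current: Dict[str, str]) -> List[str]:
    """Return hosts whose HA state differs from the state recorded at
    planning time, e.g. because ZKFC initiated a failover in between."""
    return [nn.host for nn in planned if current.get(nn.host) != nn.ha_state]


if __name__ == "__main__":
    plan = [NameNode("nn1.example.com", "active"),
            NameNode("nn2.example.com", "standby")]
    print(upgrade_order(plan))  # ['nn2.example.com', 'nn1.example.com']
    # After a failover the roles are swapped, so both hosts are reported.
    print(flipped_hosts(plan, {"nn1.example.com": "standby",
                               "nn2.example.com": "active"}))
```

The point of the second helper is that an order calculated up front is only a snapshot; re-reading the HA state right before each step is one way to notice a flip, which matters less when, as noted above, the upgrade script tolerates either order.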


Diffs (updated)
-----

  ambari-server/src/main/java/org/apache/ambari/server/serveraction/upgrades/FinalizeUpgradeAction.java fceb44d
  ambari-server/src/main/java/org/apache/ambari/server/state/UpgradeHelper.java 0c6f68a
  ambari-server/src/main/java/org/apache/ambari/server/state/cluster/ClusterImpl.java 4a8c020
  ambari-server/src/main/resources/common-services/ZOOKEEPER/3.4.5.2.0/package/scripts/params.py 2484463
  ambari-server/src/main/resources/common-services/ZOOKEEPER/3.4.5.2.0/package/scripts/service_check.py 338de32
  ambari-server/src/main/resources/common-services/ZOOKEEPER/3.4.5.2.0/package/scripts/zookeeper_server.py a7ca335
  ambari-server/src/test/java/org/apache/ambari/server/state/UpgradeHelperTest.java 396a91c
  ambari-server/src/test/java/org/apache/ambari/server/state/cluster/ClusterTest.java bb6a713


Diff: https://reviews.apache.org/r/30603/diff/


Testing
-------

Verified a Rolling Upgrade on a 3-node cluster with HDFS, ZK, and NameNode HA. The flip happens
rarely, but Ambari must be robust enough to handle it.

Unit tests are in progress.


Thanks,

Alejandro Fernandez

