ambari-dev mailing list archives

From "Alejandro Fernandez" <afernan...@hortonworks.com>
Subject Re: Review Request 30603: RU Hacks and Technical Debt - Namenode order of active/standby in code is flipped
Date Wed, 04 Feb 2015 02:55:08 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30603/#review70915
-----------------------------------------------------------



ambari-server/src/main/java/org/apache/ambari/server/serveraction/upgrades/FinalizeUpgradeAction.java
<https://reviews.apache.org/r/30603/#comment116382>

    Unrelated fix.



ambari-server/src/main/java/org/apache/ambari/server/state/cluster/ClusterImpl.java
<https://reviews.apache.org/r/30603/#comment116383>

    This logic should not be here: it prevents the states from being calculated accurately
    and would require another restart, which can only be done through the API or the
    experimental flag.



ambari-server/src/main/resources/common-services/ZOOKEEPER/3.4.5.2.0/package/scripts/zookeeper_server.py
<https://reviews.apache.org/r/30603/#comment116384>

    More debugging info.


- Alejandro Fernandez


On Feb. 4, 2015, 2:53 a.m., Alejandro Fernandez wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30603/
> -----------------------------------------------------------
> 
> (Updated Feb. 4, 2015, 2:53 a.m.)
> 
> 
> Review request for Ambari, Dmitro Lisnichenko, Jonathan Hurley, Nate Cole, and Yurii Shylov.
> 
> 
> Bugs: AMBARI-9467
>     https://issues.apache.org/jira/browse/AMBARI-9467
> 
> 
> Repository: ambari
> 
> 
> Description
> -------
> 
> UpgradeHelper somehow calls the active Namenode first, but this ends up being the standby
> namenode by the time it gets called; investigate why.
> 
> We will abide by the order in the runbook: first upgrade the standby namenode, then the
> active namenode, which then causes a flip.
> In rare cases, if a namenode fails for whatever reason, ZKFC will initiate a failover,
> which explains why the order may sometimes be flipped by the time the Namenode prepare
> happens. However, the namenode_upgrade.py script works in both cases (active first, or
> standby first). So this explains the rare behavior.
> There's another Jira to run the namenode_upgrade script as part of the Pre-Cluster group
> to make the backup, so this should reduce the likelihood of a flip happening after the
> calculation was made.
> 
> 
> Diffs
> -----
> 
>   ambari-server/src/main/java/org/apache/ambari/server/serveraction/upgrades/FinalizeUpgradeAction.java fceb44d
>   ambari-server/src/main/java/org/apache/ambari/server/state/UpgradeHelper.java 0c6f68a
>   ambari-server/src/main/java/org/apache/ambari/server/state/cluster/ClusterImpl.java db17109
>   ambari-server/src/main/resources/common-services/ZOOKEEPER/3.4.5.2.0/package/scripts/params.py 2484463
>   ambari-server/src/main/resources/common-services/ZOOKEEPER/3.4.5.2.0/package/scripts/service_check.py 338de32
>   ambari-server/src/main/resources/common-services/ZOOKEEPER/3.4.5.2.0/package/scripts/zookeeper_server.py a7ca335
> 
> Diff: https://reviews.apache.org/r/30603/diff/
> 
> 
> Testing
> -------
> 
> Verified Rolling Upgrade on a 3-node cluster with HDFS, ZK, and Namenode HA. The flip
> happens rarely, but Ambari must be robust enough to handle it.
> 
> Unit tests are in progress.
> 
> 
> Thanks,
> 
> Alejandro Fernandez
> 
>
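
For readers following the quoted description, here is a minimal sketch of the ordering idea it discusses: plan the upgrade with standby namenodes first, then re-check each host's HA state at execution time, since a ZKFC failover may have flipped the roles after the plan was calculated. This is not the code from the review. The function names, the state strings, and the `get_current_state` callback are all hypothetical and stand in for whatever UpgradeHelper and namenode_upgrade.py actually do.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical state labels; the real Ambari code may represent these differently.
ACTIVE = "active"
STANDBY = "standby"


def order_namenodes(states: Dict[str, str]) -> List[str]:
    """Return hosts ordered standby first, then active (the runbook order)."""
    # sorted() is stable, so hosts in the same state keep their input order.
    return sorted(states, key=lambda host: 0 if states[host] == STANDBY else 1)


def plan_and_recheck(
    planned_states: Dict[str, str],
    get_current_state: Callable[[str], str],
) -> List[Tuple[str, str, bool]]:
    """Walk hosts in planned order and report whether each one has flipped.

    get_current_state is an assumed callback that returns the HA state of a
    host at the moment its upgrade step runs.
    """
    report = []
    for host in order_namenodes(planned_states):
        current = get_current_state(host)
        flipped = current != planned_states[host]
        # The step proceeds either way; a flip is recorded, not treated as fatal.
        report.append((host, current, flipped))
    return report


if __name__ == "__main__":
    planned = {"nn1.example.com": ACTIVE, "nn2.example.com": STANDBY}
    # Simulate a failover that happened after the plan was calculated.
    after_failover = {"nn1.example.com": STANDBY, "nn2.example.com": ACTIVE}
    for entry in plan_and_recheck(planned, after_failover.__getitem__):
        print(entry)
```

Recording the flip instead of aborting mirrors the statement in the description that the upgrade script works whether the active or the standby namenode is handled first.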

