ambari-dev mailing list archives

From "Alejandro Fernandez" <afernan...@hortonworks.com>
Subject Re: Review Request 30603: RU Hacks and Technical Debt - Namenode order of active/standby in code is flipped
Date Wed, 04 Feb 2015 19:08:28 GMT


> On Feb. 4, 2015, 1:27 p.m., Jonathan Hurley wrote:
> > ambari-server/src/main/resources/common-services/ZOOKEEPER/3.4.5.2.0/package/scripts/service_check.py, lines 42-43
> > <https://reviews.apache.org/r/30603/diff/1/?file=847234#file847234line42>
> >
> >     You're defaulting the client_port to None, but then you use it in the format; will this not cause problems when generating the string?

The format function calls str() on the value, so setting client_port to None generates:
Execute['/var/lib/ambari-agent/data/tmp/zkSmoke.sh /usr/hdp/current/zookeeper-client/bin/zkCli.sh ambari-qa /etc/zookeeper/conf None False  no_keytab no_principal']

which results in:
Exception in thread "main" java.lang.NumberFormatException: For input string: "None"
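
To illustrate the point, here is a minimal sketch. It uses Python's built-in str.format as a stand-in for the format helper used in the script (an assumption; the real helper may differ in other respects), and the function names smoke_args and checked_smoke_args are hypothetical, not taken from service_check.py. It shows that formatting None does not fail in Python, and how a guard could surface the problem before the command reaches the Java client.

```python
def smoke_args(client_port):
    """Build a shortened, illustrative argument string for the smoke test.

    Plain str.format is assumed here in place of the script's own format
    helper. Formatting None does not raise; it yields the text "None".
    """
    return "/etc/zookeeper/conf {0} False".format(client_port)


def checked_smoke_args(client_port):
    """Hypothetical guard: reject a missing or non-numeric port up front."""
    if client_port is None:
        raise ValueError("client_port is not set")
    int(client_port)  # raises ValueError if the value is not numeric
    return smoke_args(client_port)


if __name__ == "__main__":
    # The unguarded version silently embeds the string "None".
    print(smoke_args(None))
    # The guarded version fails in Python instead of in the Java client.
    try:
        checked_smoke_args(None)
    except ValueError as err:
        print("rejected:", err)
```

With a guard like this, the failure would show up in the Python script with a clear message rather than as a NumberFormatException from zkCli.sh.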


- Alejandro


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30603/#review70958
-----------------------------------------------------------


On Feb. 4, 2015, 7:08 p.m., Alejandro Fernandez wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30603/
> -----------------------------------------------------------
> 
> (Updated Feb. 4, 2015, 7:08 p.m.)
> 
> 
> Review request for Ambari, Dmitro Lisnichenko, Jonathan Hurley, Nate Cole, and Yurii Shylov.
> 
> 
> Bugs: AMBARI-9467
>     https://issues.apache.org/jira/browse/AMBARI-9467
> 
> 
> Repository: ambari
> 
> 
> Description
> -------
> 
> UpgradeHelper somehow calls the active Namenode first, but this ends up being the standby namenode by the time it gets called; investigate why.
> 
> We will abide by the order in the runbook to first upgrade the standby then the active namenode, which then causes a flip.
> In rare cases, if a namenode fails for whatever reason, ZKFC will initiate a failover, which explains why the order may sometimes be flipped by the time the Namenode prepare happens. However, the namenode_upgrade.py script works in both cases (active first, or standby first).
> There's another Jira to run the namenode_upgrade script as part of the Pre-Cluster group to make the backup, so this should reduce the likelihood of a flip happening after the calculation was made.
> 
> 
> Diffs
> -----
> 
>   ambari-server/src/main/java/org/apache/ambari/server/serveraction/upgrades/FinalizeUpgradeAction.java fceb44d
>   ambari-server/src/main/java/org/apache/ambari/server/state/UpgradeHelper.java 0c6f68a
>   ambari-server/src/main/java/org/apache/ambari/server/state/cluster/ClusterImpl.java 4a8c020
>   ambari-server/src/main/resources/common-services/ZOOKEEPER/3.4.5.2.0/package/scripts/params.py 2484463
>   ambari-server/src/main/resources/common-services/ZOOKEEPER/3.4.5.2.0/package/scripts/service_check.py 338de32
>   ambari-server/src/main/resources/common-services/ZOOKEEPER/3.4.5.2.0/package/scripts/zookeeper_server.py a7ca335
>   ambari-server/src/test/java/org/apache/ambari/server/state/UpgradeHelperTest.java 396a91c
>   ambari-server/src/test/java/org/apache/ambari/server/state/cluster/ClusterTest.java bb6a713
> 
> Diff: https://reviews.apache.org/r/30603/diff/
> 
> 
> Testing
> -------
> 
> Verified Rolling Upgrade on a 3-node cluster with HDFS, ZK, and Namenode HA. The flip happens rarely, but Ambari must be robust enough to handle it.
> 
> Unit tests are in progress.
> 
> 
> Thanks,
> 
> Alejandro Fernandez
> 
>

