ambari-dev mailing list archives

From "Jonathan Hurley" <jhur...@hortonworks.com>
Subject Re: Review Request 37739: Do Not Automatically Abort Stack Repository Installation When A Host Timed Out
Date Tue, 25 Aug 2015 02:22:28 GMT


> On Aug. 24, 2015, 8:43 p.m., Nate Cole wrote:
> > ambari-server/src/main/java/org/apache/ambari/server/controller/internal/ClusterStackVersionResourceProvider.java, line 114
> > <https://reviews.apache.org/r/37739/diff/1/?file=1048842#file1048842line114>
> >
> >     If 100 hosts per stage, would 30 have to fail to fail the stage (not 3) if set to 70%?

Don't ever let me accuse you of not reading the comments in a code review. Fixed.
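
For readers following the thread, the arithmetic in the comment above can be sketched as below. This is only an illustration, not the code under review: the class name, the method name stageFailed, and the rule "a stage fails when the fraction of successful hosts drops below the success factor" are assumptions, and the 0.7 value comes from the 70% in the question.

```java
public class SuccessFactorExample {

    /**
     * Hypothetical helper: a stage is treated as failed when the fraction of
     * hosts that succeeded falls below the success factor (assumed rule).
     */
    static boolean stageFailed(int totalHosts, int failedHosts, float successFactor) {
        int succeeded = totalHosts - failedHosts;
        return (float) succeeded / totalHosts < successFactor;
    }

    public static void main(String[] args) {
        // 100 hosts per stage with a 70% success factor (values from the comment).
        System.out.println(stageFailed(100, 3, 0.7f));   // 97% succeeded
        System.out.println(stageFailed(100, 30, 0.7f));  // exactly 70% succeeded
        System.out.println(stageFailed(100, 31, 0.7f));  // 69% succeeded
    }
}
```

Under these assumptions, 3 failed hosts out of 100 do not fail the stage; it takes more than 30.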


> On Aug. 24, 2015, 8:43 p.m., Nate Cole wrote:
> > ambari-server/src/main/java/org/apache/ambari/server/controller/internal/ClusterStackVersionResourceProvider.java, line 341
> > <https://reviews.apache.org/r/37739/diff/1/?file=1048842#file1048842line341>
> >
> >     formatting

Fixed.


> On Aug. 24, 2015, 8:43 p.m., Nate Cole wrote:
> > ambari-server/src/main/java/org/apache/ambari/server/controller/internal/ClusterStackVersionResourceProvider.java, line 396
> > <https://reviews.apache.org/r/37739/diff/1/?file=1048842#file1048842line396>
> >
> >     successFactor is already a Float, no need to unbox it to the primitive.

Fixed.


> On Aug. 24, 2015, 8:43 p.m., Nate Cole wrote:
> > ambari-server/src/main/java/org/apache/ambari/server/controller/internal/ClusterStackVersionResourceProvider.java, lines 503-506
> > <https://reviews.apache.org/r/37739/diff/1/?file=1048842#file1048842line503>
> >
> >     Thank you!

You're welcome :)


- Jonathan


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/37739/#review96255
-----------------------------------------------------------


On Aug. 24, 2015, 7:30 p.m., Jonathan Hurley wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/37739/
> -----------------------------------------------------------
> 
> (Updated Aug. 24, 2015, 7:30 p.m.)
> 
> 
> Review request for Ambari, Alejandro Fernandez and Nate Cole.
> 
> 
> Bugs: AMBARI-12867
>     https://issues.apache.org/jira/browse/AMBARI-12867
> 
> 
> Repository: ambari
> 
> 
> Description
> -------
> 
> On a 1000-node RU, I had 2.3.0.0-2557 installed with some 20 hosts down with heartbeat lost. Then I registered 2.3.2.0-2664, and when I proceeded to install it, the installation would always get aborted with no logs on the server or agents.
> 
> It turns out that whenever we install, we do so in stages containing 100 hosts each. If any of the hosts fails or times out, the rest of the stages are aborted. So in this case the first stage had 1 host time out, which resulted in that stage and the other stages being aborted.
> 
> I cannot install a version without all hosts being alive. The workaround seems to be to delete the lost hosts from Ambari.
> 
> The solution is to use the stage's success criteria to determine if the other stages in the request should be aborted.
> 
> 
> Diffs
> -----
> 
>   ambari-server/src/main/java/org/apache/ambari/server/Role.java 636df3f 
>   ambari-server/src/main/java/org/apache/ambari/server/controller/internal/ClusterStackVersionResourceProvider.java 6133885
>   ambari-server/src/test/java/org/apache/ambari/server/controller/internal/ClusterStackVersionResourceProviderTest.java a56823b
> 
> Diff: https://reviews.apache.org/r/37739/diff/
> 
> 
> Testing
> -------
> 
> mvn clean test
> 
> 
> Thanks,
> 
> Jonathan Hurley
> 
>
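
As a rough illustration of the behavior described in the review request above, the sketch below walks a request made of 100-host stages and stops at the first stage that misses its success criteria. It is not the Ambari implementation: the Stage class, meetsSuccessCriteria, and stagesExecuted are hypothetical names, and the 0.7 and 1.0 success factors are assumed values chosen to contrast the new and old behavior.

```java
import java.util.ArrayList;
import java.util.List;

public class StageAbortSketch {

    /** Hypothetical stand-in for one installation stage. */
    static final class Stage {
        final int totalHosts;
        final int failedHosts;
        final float successFactor;

        Stage(int totalHosts, int failedHosts, float successFactor) {
            this.totalHosts = totalHosts;
            this.failedHosts = failedHosts;
            this.successFactor = successFactor;
        }

        /** Assumed rule: enough hosts succeeded to satisfy the success factor. */
        boolean meetsSuccessCriteria() {
            return (float) (totalHosts - failedHosts) / totalHosts >= successFactor;
        }
    }

    /** Counts the stages that run before the request is aborted (or finishes). */
    static int stagesExecuted(List<Stage> stages) {
        int executed = 0;
        for (Stage stage : stages) {
            executed++;
            if (!stage.meetsSuccessCriteria()) {
                break; // the remaining stages would be aborted
            }
        }
        return executed;
    }

    /** Builds a request of equally sized stages with the same failure count. */
    static List<Stage> request(int stageCount, int failedPerStage, float successFactor) {
        List<Stage> stages = new ArrayList<>();
        for (int i = 0; i < stageCount; i++) {
            stages.add(new Stage(100, failedPerStage, successFactor));
        }
        return stages;
    }

    public static void main(String[] args) {
        // 1000 hosts in 10 stages, 2 unreachable hosts per stage (about 20 in total).
        System.out.println(stagesExecuted(request(10, 2, 0.7f))); // tolerant criteria
        System.out.println(stagesExecuted(request(10, 2, 1.0f))); // any failure aborts
    }
}
```

With the tolerant factor all 10 stages run; requiring every host to succeed stops the request after the first stage, which matches the symptom reported in the description.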

