ambari-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Dmitro Lisnichenko" <dlysniche...@hortonworks.com>
Subject Re: Review Request 40139: SKIPPED_FAILED state should not be bubbled up to the Upgrade level
Date Wed, 11 Nov 2015 10:50:04 GMT


> On Nov. 10, 2015, 9:31 p.m., Jonathan Hurley wrote:
> > ambari-server/src/main/java/org/apache/ambari/server/controller/internal/CalculatedStatus.java,
lines 352-361
> > <https://reviews.apache.org/r/40139/diff/2/?file=1122084#file1122084line352>
> >
> >     So my main issue here is that for someone not completely familiar with this
issue, they could easily use the other method which would produce a status of SKIPPED_FAILED.
> >     
> >     It's just too easy to pick the wrong method here when writing dependent code.
> >     
> >     With that said, why are we even bubbling this up? This should behave as any
other failure:
> >     
> >     counters.get(HostRoleStatus.FAILED) > 0 && !skippable ? HostRoleStatus.FAILED
:
> >     
> >     - If the command fails and it wasn't skippable, then the stage and request are
FAILED
> >     - If the command fails and it was skippable, then the stage and request are
COMPLETED but the command itself still failed
> >     
> >     Just because it was "auto skipped" doesn't mean it should display differently.
> 
> Jonathan Hurley wrote:
>     To clarify; auto skipped should leave the command as SKIPPED_FAILED, but the stage
and request should not be - just like how it is for a failure which was skipped by hand.
> 
> Nate Cole wrote:
>     The main thrust of this is that the UI guys want the Stage (Upgrade Item and Upgrade
Group) to show SKIPPED_FAILED with the icon they have (that "reply" arrow swoop), but the
request (Upgrade) should not.  They want that without having to iterate to the Task json object
to find out that there are SKIPPED_FAILED tasks.
> 
> Jonathan Hurley wrote:
>     I think they should have to iterate over it - logic-wise, there's no difference between
this and a normal skipped failure. The state calculations should remain the same. If the UI
needs to display different information, then they should do the iteration work.
> 
> Dmitro Lisnichenko wrote:
>     Jonathan, my modifications affect only summary request status. Stage state calculations
remain the same. Do you mean that UI should handle SKIPPED_FAILED itself on request level,
and patch is not required at all?

Read discussion in jira and got it


- Dmitro


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/40139/#review105932
-----------------------------------------------------------


On Nov. 10, 2015, 7:55 p.m., Dmitro Lisnichenko wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/40139/
> -----------------------------------------------------------
> 
> (Updated Nov. 10, 2015, 7:55 p.m.)
> 
> 
> Review request for Ambari, Alejandro Fernandez, Jonathan Hurley, Jayush Luniya, Nate
Cole, and Yusaku Sako.
> 
> 
> Bugs: AMBARI-13818
>     https://issues.apache.org/jira/browse/AMBARI-13818
> 
> 
> Repository: ambari
> 
> 
> Description
> -------
> 
> When there is a skipped failure, the "upgrade" state itself comes SKIPPED_FAILED. Even
when the upgrade is running or paused, it is returning "SKIPPED_FAILED". The API should not
roll this up to the "upgrade" level as the current behavior is confusing. At the top level,
it should just be HOLDING, IN_PROGRESS, COMPLETED, etc. SKIPPED_FAILED should be bubbled up
to the upgrade group level and stop there.
> 
> 
> Also fixes another blocker:
> STR:
> 1) Install and deploy cluster with older HDP version
> 2) Enable NameNode HA
> 3) Register, install new HDP version
> 4) Start Rolling Upgrade with "Skip all Service Check failures" and "Skip all Slave Component
failures" options
> 5) Break datanode_upgrade.py script and wait for Core Slaves failures
> 6) Click "Pause upgrade" on "Core Slaves - >Verifying Skipped Failures" step
> Result:
> Button "Resume upgrade" doesn't work. After clicking on this button I've got next http
response
> {
>   "status" : 400,
>   "message" : "java.lang.IllegalArgumentException: Can only set status to PENDING when
the upgrade is ABORTED (currently SKIPPED_FAILED)"
> }
> 
> 
> Diffs
> -----
> 
>   ambari-server/src/main/java/org/apache/ambari/server/controller/internal/CalculatedStatus.java
f87c32c 
>   ambari-server/src/test/java/org/apache/ambari/server/controller/internal/CalculatedStatusTest.java
4b8587f 
> 
> Diff: https://reviews.apache.org/r/40139/diff/
> 
> 
> Testing
> -------
> 
> checked on live cluster
> 
> mvn clean test in progress
> 
> 
> Thanks,
> 
> Dmitro Lisnichenko
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message