ambari-issues mailing list archives

From "Alejandro Fernandez (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (AMBARI-20593) RU Auto-retry does not start when Restarting NN Batch 2 step is corrupted [Batch 1 was corrupted and fixed before]
Date Tue, 28 Mar 2017 02:08:41 GMT

     [ https://issues.apache.org/jira/browse/AMBARI-20593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alejandro Fernandez updated AMBARI-20593:
-----------------------------------------
    Description: 
STR:
1) Install Ambari 2.5.0.1.
   In the ambari.properties file, set:
   stack.upgrade.auto.retry.timeout.mins=6
   stack.upgrade.auto.retry.check.interval.secs=30
2) Install HDP with any set of services.
3) Add NameNode HA.
4) Register and install a new HDP stack version.
5) Start RU.
6) Corrupt one step from the Core Masters group (e.g., stop ambari-agent on a node while the command is running).
   Ambari will restart the "Restarting NN Batch 1" step.
7) Fix the corrupted step (e.g., start ambari-agent again).
8) Corrupt another step before its command is scheduled (e.g., stop ambari-agent on a node).
9) Fix the corrupted step (e.g., start ambari-agent again).
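
The two ambari.properties settings in the steps above bound the auto-retry window. As a rough illustration of the arithmetic, the Python sketch below parses those two lines and computes how many periodic checks fit inside the timeout. The helper parse_properties is hypothetical and assumes a plain key=value file; it is not part of Ambari.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and '#' comments.

    Hypothetical helper for illustration only; not Ambari code.
    """
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


# The two settings from the reproduction steps.
AMBARI_PROPERTIES = """\
stack.upgrade.auto.retry.timeout.mins=6
stack.upgrade.auto.retry.check.interval.secs=30
"""

props = parse_properties(AMBARI_PROPERTIES)
timeout_secs = int(props["stack.upgrade.auto.retry.timeout.mins"]) * 60
interval_secs = int(props["stack.upgrade.auto.retry.check.interval.secs"])

# Number of periodic checks that fit inside the retry window.
checks_in_window = timeout_secs // interval_secs
print(checks_in_window)  # 12
```

With these values, a failed command would be examined about 12 times over 6 minutes before auto-retry gives up.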

Expected: Ambari Server schedules the command on the second node.
Actual: because the command never received an original_start_time or start_time, the RetryUpgradeActionService could not retry it; it had no timestamps to compare against.
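
The failure mode described above can be sketched in a few lines of Python. This is only a simplified model under stated assumptions, not the actual RetryUpgradeActionService logic: the Command class, the can_retry function, the "FAILED" status string, and the use of None for a missing timestamp are all hypothetical. Only the field names original_start_time and start_time come from the description.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed retry window: 6 minutes, matching
# stack.upgrade.auto.retry.timeout.mins=6 from the reproduction steps.
TIMEOUT_MS = 6 * 60 * 1000


@dataclass
class Command:
    """Hypothetical stand-in for a scheduled upgrade command."""
    status: str
    original_start_time: Optional[int] = None  # ms since epoch; None if never started
    start_time: Optional[int] = None


def can_retry(cmd, now_ms, timeout_ms=TIMEOUT_MS):
    """Return True if a failed command is still inside the retry window."""
    if cmd.status != "FAILED":
        return False
    # A command that failed before it was ever scheduled has no timestamps,
    # so there is nothing to measure the timeout against.
    if cmd.original_start_time is None or cmd.start_time is None:
        return False
    return now_ms - cmd.original_start_time < timeout_ms


now = 1_000_000
started_then_failed = Command("FAILED", now - 60_000, now - 60_000)
failed_before_scheduling = Command("FAILED")

print(can_retry(started_then_failed, now))       # True
print(can_retry(failed_before_scheduling, now))  # False
```

In this model, the Batch 1 case is retried because the command ran long enough to record timestamps, while the Batch 2 case is silently skipped, which matches the behavior reported in this issue.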

  was:
STR:
1) Deploy cluster
2) Register and install new stack version
3) Add properties for auto retries in the ambari.properties file:
stack.upgrade.auto.retry.timeout.mins=6
stack.upgrade.auto.retry.check.interval.secs=30
4) Start RU
5) Corrupt one step from CORE for rolling upgrade (stop ambari-agent on a node) [Restarting NN Batch 1]
6) Fix corrupted step
7) Corrupt another step from CORE for rolling upgrade (stop ambari-agent on another node) [Restarting NN Batch 2]

Actual result: RU paused the upgrade (the step failed), but auto retries did not happen.


> RU Auto-retry does not start when Restarting NN Batch 2 step is corrupted [Batch 1 was corrupted and fixed before]
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: AMBARI-20593
>                 URL: https://issues.apache.org/jira/browse/AMBARI-20593
>             Project: Ambari
>          Issue Type: Bug
>          Components: ambari-server
>    Affects Versions: 2.5.0
>         Environment: rolling upgrade
>            Reporter: Sviatoslav Tereshchenko
>              Labels: rolling_upgrade
>             Fix For: 2.5.1



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
