ambari-dev mailing list archives

From "Jonathan Hurley (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (AMBARI-12850) Downgrades That Are Retried With Unhealthy Hosts Can Produce Multiple Stages In Progress
Date Fri, 21 Aug 2015 13:46:45 GMT

     [ https://issues.apache.org/jira/browse/AMBARI-12850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Hurley updated AMBARI-12850:
-------------------------------------
    Attachment: AMBARI-12850.patch

> Downgrades That Are Retried With Unhealthy Hosts Can Produce Multiple Stages In Progress
> ----------------------------------------------------------------------------------------
>
>                 Key: AMBARI-12850
>                 URL: https://issues.apache.org/jira/browse/AMBARI-12850
>             Project: Ambari
>          Issue Type: Bug
>          Components: ambari-server
>    Affects Versions: 2.1.0
>            Reporter: Jonathan Hurley
>            Assignee: Jonathan Hurley
>            Priority: Critical
>             Fix For: 2.1.2
>
>         Attachments: AMBARI-12850.patch
>
>
> When performing a downgrade from HDP 2.3 to HDP 2.2, the web client sometimes fails to
> show the Retry button after a failure. The problem stems from two issues:
> - When a stage is retried while some hosts are not heartbeating, the entire stage is
> automatically aborted.
> - Task updates within a stage are not performed atomically.
> {code:java}
> {
>   "Upgrade" : {
>     "cluster_name" : "c1",
>     "request_id" : 21
>   },
>   "upgrade_groups" : [
>     {
>       "UpgradeGroup" : {
>         "completed_task_count" : 2,
>         "group_id" : 22,
>         "in_progress_task_count" : 0,
>         "name" : "CORE_SLAVES",
>         "progress_percent" : 100.0,
>         "request_id" : 21,
>         "status" : "COMPLETED",
>         "title" : "Core Slaves",
>         "total_task_count" : 2
>       }
>     },
>     {
>       "UpgradeGroup" : {
>         "completed_task_count" : 5,
>         "group_id" : 23,
>         "in_progress_task_count" : 3,
>         "name" : "CORE_MASTER",
>         "progress_percent" : 81.42857142857143,
>         "request_id" : 21,
>         "status" : "HOLDING_FAILED",
>         "title" : "Core Masters",
>         "total_task_count" : 8
>       }
>     },
>     {
>       "UpgradeGroup" : {
>         "completed_task_count" : 2,
>         "group_id" : 24,
>         "in_progress_task_count" : 1,
>         "name" : "ZOOKEEPER",
>         "progress_percent" : 66.66666666666666,
>         "request_id" : 21,
>         "status" : "IN_PROGRESS",
>         "title" : "ZooKeeper",
>         "total_task_count" : 3
>       }
>     },
>     {
>       "UpgradeGroup" : {
>         "completed_task_count" : 5,
>         "group_id" : 25,
>         "in_progress_task_count" : 0,
>         "name" : "POST_CLUSTER",
>         "progress_percent" : 100.0,
>         "request_id" : 21,
>         "status" : "COMPLETED",
>         "title" : "Finalize Downgrade",
>         "total_task_count" : 5
>       }
>     }
>   ]
> }
> {code}
> This situation arises because an upgrade group is in the IN_PROGRESS state while no
> upgrade items are in the IN_PROGRESS state. After an upgrade fails, all IN_PROGRESS groups
> and items should be transitioned to the ABORTED state, so this is a backend (BE) issue that
> must be fixed to avoid such a collision.
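
The inconsistency described above can be illustrated with a minimal sketch that works on a payload shaped like the JSON in this issue. It is not the attached patch and not Ambari server code: the function names `find_stale_groups` and `abort_in_progress` are hypothetical, the sample data is a trimmed copy of the payload above, and the rule "a group left IN_PROGRESS while another group is HOLDING_FAILED is stale" is an assumption drawn from the description.

```python
import copy

# Status names copied from the JSON payload and the description above.
IN_PROGRESS = "IN_PROGRESS"
HOLDING_FAILED = "HOLDING_FAILED"
ABORTED = "ABORTED"


def find_stale_groups(payload):
    """Return names of groups still IN_PROGRESS although another group has failed.

    Assumption: once any group is HOLDING_FAILED, no group should remain
    IN_PROGRESS. This rule is inferred from the issue text, not from Ambari code.
    """
    groups = [entry["UpgradeGroup"] for entry in payload.get("upgrade_groups", [])]
    if not any(g["status"] == HOLDING_FAILED for g in groups):
        return []
    return [g["name"] for g in groups if g["status"] == IN_PROGRESS]


def abort_in_progress(payload):
    """Return a copy of the payload with every IN_PROGRESS group marked ABORTED.

    All changes are made on a deep copy and returned together, so a caller
    never observes a half-updated set of groups (a stand-in for the atomic
    update the description asks for).
    """
    updated = copy.deepcopy(payload)
    for entry in updated.get("upgrade_groups", []):
        group = entry["UpgradeGroup"]
        if group["status"] == IN_PROGRESS:
            group["status"] = ABORTED
    return updated


# Trimmed copy of the payload above; only the fields the sketch reads are kept.
SAMPLE = {
    "Upgrade": {"cluster_name": "c1", "request_id": 21},
    "upgrade_groups": [
        {"UpgradeGroup": {"group_id": 22, "name": "CORE_SLAVES", "status": "COMPLETED"}},
        {"UpgradeGroup": {"group_id": 23, "name": "CORE_MASTER", "status": "HOLDING_FAILED"}},
        {"UpgradeGroup": {"group_id": 24, "name": "ZOOKEEPER", "status": "IN_PROGRESS"}},
        {"UpgradeGroup": {"group_id": 25, "name": "POST_CLUSTER", "status": "COMPLETED"}},
    ],
}
```

Calling `find_stale_groups(SAMPLE)` flags the `ZOOKEEPER` group, and `abort_in_progress(SAMPLE)` returns a new payload in which that group is `ABORTED` while `SAMPLE` itself is left unchanged.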



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
