incubator-ambari-dev mailing list archives

From "Sumit Mohanty (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (AMBARI-2651) If there is at least one host that is not heartbeating with host components in INSTALL_FAILED state, service operations fail
Date Tue, 16 Jul 2013 04:28:48 GMT

    [ https://issues.apache.org/jira/browse/AMBARI-2651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13709453#comment-13709453 ]

Sumit Mohanty commented on AMBARI-2651:
---------------------------------------

It looks like INSTALL_FAILED is the state that causes this issue; other combinations work fine.
The problem is that once an SCH is in the INSTALL_FAILED state, the only way out of it is
to perform a successful INSTALL.

* Tasks get created because only SCHs in MAINTENANCE or UNKNOWN are ignored
* SCHs in INSTALL_FAILED do not go to UNKNOWN

Possible solutions:
* Do not create tasks when HOST is in HEARTBEAT_LOST or UNHEALTHY state
* Allow going into MAINTENANCE from more states than just INSTALLED

We should implement the first solution. The second can be discussed later.
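
As a rough illustration of the first solution, here is a minimal sketch of the check. It is not taken from the attached patch or from the Ambari code base: the class, enum, and method names are hypothetical, and only the state names come from the comment above.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical sketch; none of these types are Ambari's own classes.
class TaskFilterSketch {
    // State names taken from the comment; HEALTHY is an assumed "normal" host state.
    enum HostState { HEALTHY, HEARTBEAT_LOST, UNHEALTHY }
    enum ComponentState { INSTALLED, STARTED, INSTALL_FAILED, MAINTENANCE, UNKNOWN }

    // Behaviour described above: only SCHs in these states are ignored.
    static final Set<ComponentState> IGNORED_COMPONENT_STATES =
        EnumSet.of(ComponentState.MAINTENANCE, ComponentState.UNKNOWN);

    // Proposed addition: hosts in these states get no tasks at all.
    static final Set<HostState> UNREACHABLE_HOST_STATES =
        EnumSet.of(HostState.HEARTBEAT_LOST, HostState.UNHEALTHY);

    // Check as described in the comment: the host state is not consulted.
    static boolean shouldCreateTaskBefore(ComponentState component) {
        return !IGNORED_COMPONENT_STATES.contains(component);
    }

    // Check with the first proposed solution applied.
    static boolean shouldCreateTask(HostState host, ComponentState component) {
        if (UNREACHABLE_HOST_STATES.contains(host)) {
            return false; // host is not heartbeating or is unhealthy: skip it
        }
        return !IGNORED_COMPONENT_STATES.contains(component);
    }

    public static void main(String[] args) {
        // The failing case from the issue: INSTALL_FAILED on a host that stopped heartbeating.
        System.out.println(shouldCreateTaskBefore(ComponentState.INSTALL_FAILED));
        System.out.println(shouldCreateTask(HostState.HEARTBEAT_LOST, ComponentState.INSTALL_FAILED));
    }
}
```

With a check like this, an SCH in INSTALL_FAILED on a healthy host would still get an install task, while the same SCH on a host in HEARTBEAT_LOST would be skipped, so the stage 1 task would not be created on that host.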
                
> If there is at least one host that is not heartbeating with host components in INSTALL_FAILED state, service operations fail
> ----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: AMBARI-2651
>                 URL: https://issues.apache.org/jira/browse/AMBARI-2651
>             Project: Ambari
>          Issue Type: Bug
>          Components: controller
>    Affects Versions: 1.2.5
>            Reporter: Sumit Mohanty
>            Assignee: Sumit Mohanty
>            Priority: Critical
>             Fix For: 1.2.5
>
>         Attachments: AMBARI-2651.patch
>
>
> This was a 3-host cluster. Tried to add one host via the Add Hosts Wizard.
> Forced an install failure and stopped ambari-agent on it. The Add Hosts Wizard was stuck
> in the "Install, Start and Test" state. Fired an API call to get out of this state. This left
> the host in a state where its host components are in INSTALL_FAILED state.
> Invoked MapReduce stop from the UI. This created host component install tasks on the
> host as stage 1 tasks. This causes stage 2 tasks to be aborted (in this example, JobTracker
> stop).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
