incubator-ambari-dev mailing list archives

From "Siddharth Wagle (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (AMBARI-2041) If a host that has a service client installed and the host is down, service start will fail
Date Mon, 29 Apr 2013 18:42:16 GMT

     [ https://issues.apache.org/jira/browse/AMBARI-2041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Siddharth Wagle updated AMBARI-2041:
------------------------------------

    Attachment: AMBARI-2041.patch

+ Unit Test
                
> If a host that has a service client installed and the host is down, service start will fail
> -------------------------------------------------------------------------------------------
>
>                 Key: AMBARI-2041
>                 URL: https://issues.apache.org/jira/browse/AMBARI-2041
>             Project: Ambari
>          Issue Type: Bug
>          Components: controller
>    Affects Versions: 1.3.0
>            Reporter: Siddharth Wagle
>            Assignee: Siddharth Wagle
>             Fix For: 1.3.0
>
>         Attachments: AMBARI-2041.patch
>
>
> In condor, service start may include client install on some hosts. If the host where
> a client is being installed is down (heartbeat lost), then service start fails. This is
> because the success factor for clients (tested with MAPREDUCE_CLIENT) is 1, so a single
> failure fails the stage. During service start there are three stages, one each for
> installs, starts, and the service check. When the install stage fails, the later stages
> are aborted.
> A few observations:
>     The client goes to the INSTALL_FAILED state, so the second attempt skips installing
>     the client and thereby succeeds in starting the service. (This is a bug, as we should
>     retry installing a component that is in the INSTALL_FAILED state. However, at this
>     point we are saved by this bug.)
>     A service check can be scheduled on a host that is in the UNHEALTHY/UNKNOWN state
>     and can fail.
>     Now the service cannot be stopped because:
>         The Stop command sees the INSTALL_FAILED state and schedules an INSTALL task for
>         the client, which fails.
>         The STOP commands for other components are in a later stage and are aborted.
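
The failure mode described above (a success factor of 1 on the client role failing the install stage, which then aborts the start and check stages) can be illustrated with a small model. This is only a sketch under stated assumptions, not Ambari's actual code: the names Stage, success_factors and run_stages are hypothetical, the host outcomes are made up, and the relaxed factor of 0.5 at the end is purely illustrative and is not claimed to be what AMBARI-2041.patch does.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Stage:
    """Hypothetical model of one stage in a service start request."""
    name: str
    # role -> per-host task outcomes (True means the task succeeded)
    results: Dict[str, List[bool]]
    # role -> minimum fraction of hosts that must succeed (default 1.0)
    success_factors: Dict[str, float] = field(default_factory=dict)

    def succeeded(self) -> bool:
        for role, outcomes in self.results.items():
            required = self.success_factors.get(role, 1.0)
            ratio = sum(outcomes) / len(outcomes) if outcomes else 1.0
            if ratio < required:
                return False
        return True


def run_stages(stages: List[Stage]) -> Dict[str, str]:
    """Evaluate stages in order; once one fails, later stages are aborted."""
    status: Dict[str, str] = {}
    failed = False
    for stage in stages:
        if failed:
            status[stage.name] = "ABORTED"
        elif stage.succeeded():
            status[stage.name] = "COMPLETED"
        else:
            status[stage.name] = "FAILED"
            failed = True
    return status


def build_stages(client_factor: float) -> List[Stage]:
    # Assumed scenario: three client hosts, one of which has lost heartbeat.
    return [
        Stage("install", {"MAPREDUCE_CLIENT": [True, True, False]},
              {"MAPREDUCE_CLIENT": client_factor}),
        Stage("start", {"JOBTRACKER": [True]}),
        Stage("check", {"MAPREDUCE_SERVICE_CHECK": [True]}),
    ]


# Success factor 1: the single down host fails the install stage.
strict_status = run_stages(build_stages(1.0))
# Illustrative relaxed factor: two of three hosts is enough to proceed.
relaxed_status = run_stages(build_stages(0.5))
print(strict_status)
print(relaxed_status)
```

With a factor of 1, the one unreachable host is enough to mark the install stage as failed and abort everything after it, which matches the behaviour reported in the issue. The same model also suggests why a later STOP request is stuck: an INSTALL task scheduled ahead of the STOP tasks fails in an earlier stage, so the STOP tasks never run.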

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
