X-Original-To: apmail-incubator-ambari-dev-archive@minotaur.apache.org
Delivered-To: apmail-incubator-ambari-dev-archive@minotaur.apache.org
Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 7D67E107DB for ; Mon, 29 Apr 2013 18:48:16 +0000 (UTC)
Received: (qmail 22914 invoked by uid 500); 29 Apr 2013 18:48:16 -0000
Delivered-To: apmail-incubator-ambari-dev-archive@incubator.apache.org
Received: (qmail 22876 invoked by uid 500); 29 Apr 2013 18:48:16 -0000
Mailing-List: contact ambari-dev-help@incubator.apache.org; run by ezmlm
Precedence: bulk
Reply-To: ambari-dev@incubator.apache.org
Delivered-To: mailing list ambari-dev@incubator.apache.org
Received: (qmail 22716 invoked by uid 99); 29 Apr 2013 18:48:16 -0000
Received: from arcas.apache.org (HELO arcas.apache.org) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 29 Apr 2013 18:48:16 +0000
Date: Mon, 29 Apr 2013 18:48:16 +0000 (UTC)
From: "Sumit Mohanty (JIRA)"
To: ambari-dev@incubator.apache.org
Subject: [jira] [Commented] (AMBARI-2041) If a host that has a service client installed and the host is down, service start will fail
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394

    [ https://issues.apache.org/jira/browse/AMBARI-2041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13644750#comment-13644750 ]

Sumit Mohanty commented on AMBARI-2041:
---------------------------------------

+1, LGTM.

> If a host that has a service client installed and the host is down, service start will fail
> -------------------------------------------------------------------------------------------
>
>                 Key: AMBARI-2041
>                 URL: https://issues.apache.org/jira/browse/AMBARI-2041
>             Project: Ambari
>          Issue Type: Bug
>          Components: controller
>    Affects Versions: 1.3.0
>            Reporter: Siddharth Wagle
>            Assignee: Siddharth Wagle
>             Fix For: 1.3.0
>
>         Attachments: AMBARI-2041.patch
>
>
> In condor, service start may include client install on some hosts. If the host where a client is being installed is down (heartbeat lost), then service start fails. This is because the success factor for clients (tested with MAPREDUCE_CLIENT) is 1, so a single failure fails the stage. During service start there are three stages, one each for installs, starts, and checks. When the install stage fails, the later stages are aborted.
> A few observations:
> The client goes to the INSTALL_FAILED state, so a second attempt skips the install on that client and thereby succeeds in starting the service. (This is a bug, as we should retry installing a component that is in the INSTALL_FAILED state. However, at this point we are saved by this bug.)
> The service check can be scheduled on a host that is in the UNHEALTHY/UNKNOWN state and can fail.
> The service now cannot be stopped because:
>     The stop command sees the INSTALL_FAILED state and schedules an INSTALL task for the client, which fails.
>     The STOP commands for other components are in a later stage and are aborted.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
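
The failure mode in the issue description above can be illustrated with a small model. This is only a sketch under stated assumptions, not Ambari's actual controller code: the `Task`, `Stage`, `run_stages`, and `build_stages` names are hypothetical, the host names are made up, and the JOBTRACKER and MAPREDUCE_SERVICE_CHECK roles are used purely for illustration. It assumes a stage fails when, for any role, the fraction of successful tasks falls below that role's success factor, and that every stage after a failed one is aborted.

```python
from dataclasses import dataclass


@dataclass
class Task:
    host: str
    role: str
    succeeded: bool


@dataclass
class Stage:
    name: str
    tasks: list
    success_factors: dict  # role -> required fraction of successful tasks

    def passed(self) -> bool:
        # Group task results by role, then compare each role's success
        # ratio against its required success factor (default 1.0).
        by_role = {}
        for task in self.tasks:
            by_role.setdefault(task.role, []).append(task.succeeded)
        for role, results in by_role.items():
            required = self.success_factors.get(role, 1.0)
            if sum(results) / len(results) < required:
                return False
        return True


def run_stages(stages):
    """Evaluate stages in order; abort everything after the first failure."""
    outcome = {}
    failed = False
    for stage in stages:
        if failed:
            outcome[stage.name] = "ABORTED"
        elif stage.passed():
            outcome[stage.name] = "COMPLETED"
        else:
            outcome[stage.name] = "FAILED"
            failed = True
    return outcome


def build_stages(client_factor):
    # Hypothetical service start: host2 is down, so its client install fails.
    factors = {"MAPREDUCE_CLIENT": client_factor}
    install = Stage("install", [
        Task("host1", "MAPREDUCE_CLIENT", True),
        Task("host2", "MAPREDUCE_CLIENT", False),
    ], factors)
    start = Stage("start", [Task("host1", "JOBTRACKER", True)], factors)
    check = Stage("check", [Task("host1", "MAPREDUCE_SERVICE_CHECK", True)], factors)
    return [install, start, check]


if __name__ == "__main__":
    # Success factor 1: the single failed client install fails the stage.
    print(run_stages(build_stages(1.0)))
    # A lower (hypothetical) factor of 0.5 lets the service start proceed.
    print(run_stages(build_stages(0.5)))
```

With a success factor of 1, the install stage is reported as FAILED and the start and check stages as ABORTED, which matches the behaviour described in the issue. Lowering the factor to 0.5 in this model lets all three stages complete. Whether the attached patch takes that approach is not stated in this message.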