From: koushik-das
To: dev@cloudstack.apache.org
Reply-To: dev@cloudstack.apache.org
Subject: [GitHub] cloudstack issue #1640: CLOUDSTACK-9458: Fix HA bug when VMs are stopped on ...
Message-Id: <20160919055731.3DC8BE0158@git1-us-west.apache.org>
Date: Mon, 19 Sep 2016 05:57:31 +0000 (UTC)

GitHub user koushik-das commented on the issue:

    https://github.com/apache/cloudstack/pull/1640

@abhinandanprateek In the latest master, the sequence of events described above only happens when the host has been determined to be 'Down'. Refer to the code below. So the bug described won't happen.
Earlier, the same sequence was triggered even when the host state was 'Alert', which possibly killed healthy VMs.

> if (host != null && host.getStatus() == Status.Down) {
>     _haMgr.scheduleRestartForVmsOnHost(host, true);
> }

If there is still a possibility of healthy VMs getting killed, then that scenario needs to be clearly identified. If we need to fix anything, the first step would be to look at improving the VM investigators rather than changing the existing fencing logic.

If we go ahead with the above fix, I can think of the following scenario that breaks: in a genuine host-down scenario, non-HA VMs remain in the 'Running' state and no operations can be performed on them. Currently, non-HA VMs are marked as 'Stopped' after fencing succeeds, and they can be manually started on another host.
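
To make the two behaviours discussed above concrete, here is a minimal, self-contained sketch. It is not CloudStack code: `HaGateSketch`, `HostStatus`, `VmState`, `Vm`, and `handleHost` are hypothetical stand-ins invented for illustration, and the fencing step is assumed to have already succeeded. It only models (a) the guard that acts solely on a 'Down' host and (b) non-HA VMs being marked 'Stopped' so they can be started manually elsewhere.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified model; none of these types are the real CloudStack classes.
final class HaGateSketch {
    enum HostStatus { Up, Alert, Down }
    enum VmState { Running, Stopped }

    static final class Vm {
        final String name;
        final boolean haEnabled;
        VmState state = VmState.Running;

        Vm(String name, boolean haEnabled) {
            this.name = name;
            this.haEnabled = haEnabled;
        }
    }

    // Mirrors the quoted guard: act only when the host was determined to be Down.
    // A null status (unknown host) or an Alert status does nothing.
    static boolean shouldScheduleRestart(HostStatus status) {
        return status == HostStatus.Down;
    }

    // Assumption: fencing has succeeded for every VM on the host.
    // All VMs are marked Stopped; only HA-enabled VMs are returned for restart.
    static List<Vm> handleHost(HostStatus status, List<Vm> vmsOnHost) {
        List<Vm> toRestart = new ArrayList<>();
        if (!shouldScheduleRestart(status)) {
            return toRestart; // VMs are left untouched
        }
        for (Vm vm : vmsOnHost) {
            vm.state = VmState.Stopped; // non-HA VMs become manually startable
            if (vm.haEnabled) {
                toRestart.add(vm);
            }
        }
        return toRestart;
    }

    public static void main(String[] args) {
        List<Vm> vms = Arrays.asList(new Vm("ha-vm", true), new Vm("plain-vm", false));
        for (HostStatus status : new HostStatus[] { HostStatus.Alert, HostStatus.Down }) {
            List<Vm> restart = handleHost(status, vms);
            System.out.println(status + ": VMs queued for restart = " + restart.size());
            for (Vm vm : vms) {
                System.out.println("  " + vm.name + " -> " + vm.state);
            }
        }
    }
}
```

In this model, skipping the state update for non-HA VMs would leave `plain-vm` in `Running` after a genuine host-down event, which is the broken scenario described above.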