Mailing-List: contact dev-help@cloudstack.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@cloudstack.apache.org
From: koushik-das <git@git.apache.org>
To: dev@cloudstack.apache.org
References: <git-pr-1640-cloudstack@git.apache.org>
In-Reply-To: <git-pr-1640-cloudstack@git.apache.org>
Subject: [GitHub] cloudstack issue #1640: CLOUDSTACK-9458: Fix HA bug when VMs are stopped on ...
Content-Type: text/plain
Message-Id: <20160919055731.3DC8BE0158@git1-us-west.apache.org>
Date: Mon, 19 Sep 2016 05:57:31 +0000 (UTC)
archived-at: Mon, 19 Sep 2016 05:57:38 -0000

Github user koushik-das commented on the issue:

    https://github.com/apache/cloudstack/pull/1640
  
    @abhinandanprateek In the latest master, the sequence of events described above happens only once the host has been determined to be 'Down'; see the code below. So the bug described won't occur. Earlier, the same sequence was triggered even when the host state was 'Alert', which possibly killed healthy VMs.
    
    > if (host != null && host.getStatus() == Status.Down) {
    >     _haMgr.scheduleRestartForVmsOnHost(host, true);
    > }
    
    If there is still a possibility of healthy VMs getting killed, then that scenario needs to be clearly identified. If anything needs fixing, the first step would be to look at improving the VM investigators rather than changing the existing fencing logic.
    
    If we go ahead with the above fix, I can think of the following scenario that would break: in a genuine host-down scenario, non-HA VMs would remain in the 'Running' state and no operations could be performed on them. Currently, non-HA VMs are marked as 'Stopped' after fencing succeeds, so they can be manually started on another host.
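    
    For illustration only, the behaviour described in this comment can be modelled roughly as follows. This is a simplified, self-contained sketch and not CloudStack code: `HostDownSketch`, `HostStatus`, `VmState`, `Vm`, and `onHostStatus` are hypothetical names, and fencing is assumed to always succeed.
    
    ```java
    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.List;

    public class HostDownSketch {
        // Hypothetical stand-ins for illustration; not CloudStack's real types.
        enum HostStatus { Up, Alert, Down }
        enum VmState { Running, Stopped }

        static final class Vm {
            final String name;
            final boolean haEnabled;
            VmState state = VmState.Running;

            Vm(String name, boolean haEnabled) {
                this.name = name;
                this.haEnabled = haEnabled;
            }
        }

        /**
         * Acts only when the host is 'Down'. Every VM on the host is then
         * marked 'Stopped' (assumption: fencing succeeded), and the names of
         * the HA-enabled VMs are returned as the automatic-restart queue.
         */
        static List<String> onHostStatus(HostStatus status, List<Vm> vms) {
            List<String> restartQueue = new ArrayList<>();
            if (status != HostStatus.Down) {
                return restartQueue; // 'Alert' or 'Up': leave the VMs untouched
            }
            for (Vm vm : vms) {
                vm.state = VmState.Stopped;
                if (vm.haEnabled) {
                    restartQueue.add(vm.name);
                }
            }
            return restartQueue;
        }

        public static void main(String[] args) {
            List<Vm> vms = Arrays.asList(new Vm("ha-vm", true), new Vm("plain-vm", false));

            System.out.println(onHostStatus(HostStatus.Alert, vms)); // prints []
            System.out.println(vms.get(1).state);                    // prints Running

            System.out.println(onHostStatus(HostStatus.Down, vms));  // prints [ha-vm]
            System.out.println(vms.get(1).state);                    // prints Stopped
        }
    }
    ```
    
    In this model the non-HA VM ends up 'Stopped' rather than stuck in 'Running', which is the state that allows an operator to start it manually on another host.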

