cloudstack-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CLOUDSTACK-9458) Some VMs are being stopped when agent is reconnecting
Date Tue, 23 Aug 2016 05:24:21 GMT

    [ https://issues.apache.org/jira/browse/CLOUDSTACK-9458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15432166#comment-15432166 ]

ASF GitHub Bot commented on CLOUDSTACK-9458:
--------------------------------------------

Github user koushik-das commented on the issue:

    https://github.com/apache/cloudstack/pull/1640
  
    @jburwell The reported issue occurs on a custom branch; @marcaurele probably
needs to cherry-pick some additional commits from ACS. Master/4.9 doesn't have this issue,
so in that sense the PR is not needed.
    
    @marcaurele Please read my last comment again and go through the restart() method logic
in HA manager code.
    >>> If the management server cannot determine the state of the VM, it could mark
them as stopped (even though I don't think it should). But it should not create a StopVM job,
because that might trigger a proper stop of the VM if the agent is reconnecting while the
job is picked by async job workers.
    The above is not correct. If the MS is not able to determine the state of the VM, it tries
to fence off the VM (using the various fencers available). If the VM cannot be fenced off successfully,
its state is left unchanged. Also, the VM is marked as stopped only if one of the investigators
is able to determine that the VM state is Down. Hope that clarifies things.
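
    The decision order described above could be sketched roughly as follows. This is only an illustration, not the actual HA manager code: the `Investigator` and `Fencer` interfaces, the `VmState` enum, and the `resolve` method are hypothetical names made up for this sketch. It also assumes that a VM reported alive keeps its state and that a successful fence leads to the VM being treated as stopped, which the comment implies but does not state.

```java
import java.util.List;
import java.util.Optional;

public class HaDecisionSketch {
    public enum VmState { RUNNING, STOPPED }

    /** Hypothetical: empty when the state cannot be determined, otherwise true if the VM is alive. */
    public interface Investigator {
        Optional<Boolean> isVmAlive(String vmId);
    }

    /** Hypothetical: true when the VM was fenced off successfully. */
    public interface Fencer {
        boolean fenceOff(String vmId);
    }

    public static VmState resolve(String vmId, VmState current,
                                  List<Investigator> investigators, List<Fencer> fencers) {
        // Step 1: ask each investigator; the first definite answer wins.
        for (Investigator investigator : investigators) {
            Optional<Boolean> alive = investigator.isVmAlive(vmId);
            if (alive.isPresent()) {
                // Down -> mark as stopped; alive -> leave the state alone (assumption).
                return alive.get() ? current : VmState.STOPPED;
            }
        }
        // Step 2: state is unknown, so try the available fencers.
        for (Fencer fencer : fencers) {
            if (fencer.fenceOff(vmId)) {
                // Assumption: a fenced VM is treated as stopped.
                return VmState.STOPPED;
            }
        }
        // Step 3: nothing could be determined or fenced, so the state is left unchanged.
        return current;
    }
}
```

    In this sketch, a VM whose state is unknown and which cannot be fenced keeps whatever state it had before, matching the point that the state is left unchanged in that case.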


> Some VMs are being stopped when agent is reconnecting
> -----------------------------------------------------
>
>                 Key: CLOUDSTACK-9458
>                 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-9458
>             Project: CloudStack
>          Issue Type: Bug
>      Security Level: Public(Anyone can view this level - this is the default.) 
>            Reporter: Marc-Aurèle Brothier
>            Assignee: Marc-Aurèle Brothier
>
> If you lose communication between the management server and one of the agents for a few minutes, the HighAvailibilityManager kicks in even though HA mode is not active and starts to schedule VM restarts. Those tasks are inserted as async jobs in the DB, and if the agent comes back online while the jobs are still in the async table, they are pushed to the agent and shut down the VMs. Then, since HA is not active, the VMs are not restarted.
> The expected behavior, in my opinion, is that the VMs should not be restarted at all if HA mode is not active on them, and that the agent should update the VM state with the power report.
> The bug lies in {{HighAvailibilityManagerImpl.scheduleRestartForVmsOnHost(final HostVO host, boolean investigate)}}; a PR will follow.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
