cloudstack-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <>
Subject [jira] [Commented] (CLOUDSTACK-9458) Some VMs are being stopped when agent is reconnecting
Date Thu, 18 Aug 2016 06:26:20 GMT


ASF GitHub Bot commented on CLOUDSTACK-9458:

Github user koushik-das commented on the issue:
    @marcaurele Based on the first few lines of the logs, the agent went to the Alert state.
    srv02 2016-08-08 11:56:03,895 DEBUG [agent.manager.AgentManagerImpl] (AgentTaskPool-16:ctx-8b5b6956) The next status of agent 44692is Alert, current status is Up
    srv02 2016-08-08 11:56:03,896 DEBUG [agent.manager.AgentManagerImpl] (AgentTaskPool-16:ctx-8b5b6956) Deregistering link for 44692 with state Alert
    As per the latest ACS code (4.9/master), restarts of VMs on a host are scheduled only if
the state of the host is determined to be Down. In the case of Alert, nothing is done.
    On which version of CS are you seeing this issue?
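
    The rule described above can be sketched as follows. This is a minimal illustration, not the actual CloudStack code: `HostStateSketch`, `HostStatus`, and `shouldScheduleVmRestarts` are hypothetical names, and only the Up, Alert, and Down states mentioned in this comment are modeled.

```java
// Hypothetical sketch of the rule described in the comment; not CloudStack source.
class HostStateSketch {
    // Only the three states named in the comment are modeled here (assumption).
    enum HostStatus { UP, ALERT, DOWN }

    // VM restarts are scheduled only when the host is determined to be Down.
    // For Alert (and Up) nothing is scheduled.
    static boolean shouldScheduleVmRestarts(HostStatus determined) {
        return determined == HostStatus.DOWN;
    }

    public static void main(String[] args) {
        for (HostStatus status : HostStatus.values()) {
            System.out.println(status + " -> schedule restarts: "
                    + shouldScheduleVmRestarts(status));
        }
    }
}
```

    Under this rule, the Up to Alert transition shown in the log lines would not schedule any restart by itself.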

> Some VMs are being stopped when agent is reconnecting
> -----------------------------------------------------
>                 Key: CLOUDSTACK-9458
>                 URL:
>             Project: CloudStack
>          Issue Type: Bug
>      Security Level: Public (Anyone can view this level - this is the default.)
>            Reporter: Marc-Aurèle Brothier
>            Assignee: Marc-Aurèle Brothier
> If you lose communication between the management server and one of the agents for
a few minutes, the HighAvailibilityManager kicks in and starts scheduling VM restarts, even
though HA mode is not active. Those tasks are inserted as async jobs in the DB, and if
the agent comes back online while the jobs are still in the async table, they are
pushed to the agent and shut down the VMs. Then, since HA is not active, the VMs are not restarted.
> The expected behavior, in my opinion, is that VMs should not be restarted at all if
HA mode is not active on them; the agent should instead update the VM state with the power report.
> The bug lies in {{HighAvailibilityManagerImpl.scheduleRestartForVmsOnHost(final HostVO
host, boolean investigate)}}; a PR will follow.
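
The expected behavior stated in the description can be sketched roughly as below. This is only an illustration under assumptions, not the pending fix or the real `scheduleRestartForVmsOnHost` implementation: the `Vm` class, its `haEnabled` flag, and `vmsToRestart` are hypothetical names introduced for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the expected behavior; not CloudStack source.
class HaRestartSketch {
    // Minimal stand-in for a VM record (assumption, not a CloudStack type).
    static final class Vm {
        final String name;
        final boolean haEnabled;

        Vm(String name, boolean haEnabled) {
            this.name = name;
            this.haEnabled = haEnabled;
        }
    }

    // Returns the names of the VMs for which a restart would be scheduled.
    // VMs without HA are skipped, so their state is left to the agent's power report.
    static List<String> vmsToRestart(List<Vm> vmsOnHost) {
        List<String> scheduled = new ArrayList<>();
        for (Vm vm : vmsOnHost) {
            if (vm.haEnabled) {
                scheduled.add(vm.name);
            }
        }
        return scheduled;
    }

    public static void main(String[] args) {
        List<Vm> vms = Arrays.asList(new Vm("vm-a", true), new Vm("vm-b", false));
        System.out.println(vmsToRestart(vms)); // prints [vm-a]
    }
}
```

With a filter of this kind, no restart job would be queued for a VM without HA, so nothing would be pushed to the agent for that VM when it reconnects.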

