cloudstack-users mailing list archives

From Jean-Francois Nadeau <the.jfnad...@gmail.com>
Subject Re: Recover VM after KVM host down (and HA not working) ?
Date Sat, 23 Dec 2017 15:14:40 GMT
Clearly the management server doesn't realize that the instance on the failed
host is no longer running, even though the host is in the Alert state, powered
down, and missing NFS heartbeats.

2017-12-23 14:57:52,427 DEBUG [c.c.h.Status] (AgentTaskPool-10:ctx-694feb6c) (logid:160220c5) Transition:[Resource state = Enabled, Agent event = AgentDisconnected, Host id = 4, name = r62-i122-36-01.domain.com]
2017-12-23 14:58:24,487 DEBUG [c.c.c.CapacityManagerImpl] (CapacityChecker:ctx-66fbe484) (logid:1f53cd63) Found 1 VMs on host 4
2017-12-23 14:58:24,495 DEBUG [c.c.c.CapacityManagerImpl] (CapacityChecker:ctx-66fbe484) (logid:1f53cd63) Found 0 VM, not running on host 4
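The mismatch described above (a host that logged `AgentDisconnected` while the capacity checker still counts a VM on it) can be spotted across a whole management-server log with a small scan like the one below. This is a hypothetical helper, not part of CloudStack: its regular expressions are written only against the three log lines quoted here, and it assumes one log entry per line, as in the log file itself rather than in this wrapped email.

```python
import re
from typing import Dict, Iterable, Set

# Hypothetical patterns, derived only from the 4.10 log lines quoted above;
# other CloudStack versions may word these messages differently.
DISCONNECT_RE = re.compile(r"Agent event = AgentDisconnected, Host id = (\d+)")
# Matches "Found 1 VMs on host 4" but not "Found 0 VM, not running on host 4".
VM_COUNT_RE = re.compile(r"Found (\d+) VMs on host (\d+)")


def vms_on_disconnected_hosts(lines: Iterable[str]) -> Dict[int, int]:
    """Return {host_id: vm_count} for hosts that logged AgentDisconnected
    and for which the capacity checker still reports at least one VM."""
    disconnected: Set[int] = set()
    counts: Dict[int, int] = {}
    for line in lines:
        match = DISCONNECT_RE.search(line)
        if match:
            disconnected.add(int(match.group(1)))
        match = VM_COUNT_RE.search(line)
        if match:
            # Keep the most recent count seen for each host.
            counts[int(match.group(2))] = int(match.group(1))
    return {host: n for host, n in counts.items() if host in disconnected and n > 0}
```

Fed an open handle on the management-server log (assumed here to be the packaged default, `/var/log/cloudstack/management/management-server.log`), it returns `{4: 1}` for the excerpt above: host 4 disconnected, yet one VM is still counted on it.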

What is the next step?

On Sat, Dec 23, 2017 at 9:49 AM, Jean-Francois Nadeau <
the.jfnadeau@gmail.com> wrote:

> I'd really like to get to the bottom of this. It does sound like the
> behavior mentioned in https://issues.apache.org/jira/browse/CLOUDSTACK-5582,
> but that should have been fixed long ago.
>
> One suspect log entry (possibly unrelated) that I noticed is this recurring
> exception in the management server logs:
>
> ERROR [c.c.v.UserVmManagerImpl] (UserVm-ipfetch-3:ctx-d4c44c2b)
> (logid:16dd70ad) Caught the Exception in VmIpFetchTask
>
> I guess this is caused by the use of an external DHCP server, so the manager
> fails to determine a running VM's IP address. That brings me to my next
> question: how is a VM marked for HA actually monitored?
>
>
> On Sat, Dec 23, 2017 at 3:38 AM, Eric Green <eric.lee.green@gmail.com>
> wrote:
>
>> If all else fails, change its state to the correct state in the MySQL
>> database and restart the management service. Sadly, that was the only way I
>> could do it when my CloudStack got confused and stuck an instance in an
>> intermediate state where I couldn't do anything with it.
>>
>> On Dec 22, 2017 at 9:09 AM, Jean-Francois Nadeau <the.jfnadeau@gmail.com>
>> wrote:
>>
>> Good morning,
>>
>> I'm new to ACS and doing a POC with 4.10 on CentOS 7 and KVM.
>>
>> I'm trying to recover VMs after a host failure (host powered off out-of-band).
>>
>> Primary storage is NFS, and IPMI is configured for the KVM hosts. The zone
>> is in advanced mode with VLAN separation, and I created a shared network
>> with no services since I want to use an external DHCP server.
>>
>> First, say I don't have a compute offering with HA enabled and a KVM host
>> goes down. I can't put the host in maintenance mode while it is down, and
>> disabling it has no effect on the state of the lost VMs: according to the
>> manager, they stay in the Running state. What should I do to force them to
>> restart on the remaining healthy hosts?
>>
>> Then I enabled IPMI on all KVM hosts and repeated the same experiment
>> with an HA-enabled compute offering. Same result: the manager does see the
>> host as disconnected and powered off, but takes no action. I'm certainly
>> missing something here. Please help!
>>
>> Regards,
>>
>> Jean-Francois
>>
>
>
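Eric's suggestion of correcting the instance state directly in the MySQL database could look roughly like the sketch below. It is a minimal, hypothetical helper: the table and column names (`vm_instance`, `uuid`, `state`, `host_id`, `removed`) are assumptions based on the CloudStack 4.x `cloud` schema and should be checked against your version, and the database should be backed up before any manual edit. It only builds a parameterized statement; executing it and restarting the management service are left to the operator.

```python
from typing import Tuple

# Assumed subset of CloudStack VM states that make sense as a manual fix target.
ALLOWED_STATES = {"Running", "Stopped"}


def build_state_fix(vm_uuid: str, new_state: str = "Stopped") -> Tuple[str, tuple]:
    """Return (sql, params) for a DB-API driver using the %s placeholder style.

    Hypothetical helper: table/column names are assumptions, not verified
    against every CloudStack release.
    """
    if new_state not in ALLOWED_STATES:
        raise ValueError(f"unexpected target state: {new_state}")
    if new_state == "Stopped":
        # Assumption: a stopped instance is not bound to a host, so host_id is
        # cleared too; last_host_id is left alone to remember the old host.
        sql = ("UPDATE vm_instance SET state = %s, host_id = NULL "
               "WHERE uuid = %s AND removed IS NULL")
    else:
        sql = ("UPDATE vm_instance SET state = %s "
               "WHERE uuid = %s AND removed IS NULL")
    return sql, (new_state, vm_uuid)
```

With a driver that uses `%s` placeholders (PyMySQL, for example), the pair can be passed to `cursor.execute(sql, params)` followed by `connection.commit()`; Eric's last step, restarting the management service, still applies afterwards. Parameter binding keeps the VM UUID out of the SQL text itself.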
