cloudstack-users mailing list archives

From Bryan Whitehead <dri...@megahappy.net>
Subject Re: HA not working - CloudStack 4.1.0 and KVM hypervisor hosts
Date Wed, 24 Jul 2013 23:26:09 GMT
The same thing happened to me, except it was a power supply that died
on a box. All my templates have HA turned on.

All the VMs (including 1 system-router-vm) were shown as "Running"
and the host itself was simply marked "Disconnected". When I tried to
shut down the VMs to start them again, I got errors about not being
able to communicate with the agent. I tried restarting the management
server, but that didn't change anything.

Getting the router working again was extremely annoying. After I
changed its state to Stopped, CloudStack kept trying to start it again
on the dead host. I marked it destroyed, then restarted the network with
the force option. That fixed it. Then I hacked the DB to change every VM
that was marked Running but not actually running to Stopped, after which
I was able to start all the VMs that were down on the bad host.

Anyway, the time between the host dying and me finding out was about 4
days, as these were on a customer's managed servers and their
monitoring of each host wasn't working. They were pretty unhappy. :(

Other notes: this is KVM with sharedmountpoint on a gluster mount.
After the host got back online, gluster resynced about 200GB of data,
and I migrated VMs to the host at the same time, as normal. I've had a
similar thing happen with a 3.0.2 install of CloudStack, and everything
restarted seamlessly. Disappointing that this happened with 4.1.
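
For anyone wanting to check for the same mismatch: the reconnect log
lines quoted below have the form "VM i-2-273-VM: cs state = Running and
realState = Stopped". Here is a minimal sketch in Python that scans raw
management-server log text for such lines. It is not a CloudStack tool,
it assumes only that line format, and the name find_state_mismatches is
made up for illustration.

```python
import re

# Matches reconcile lines such as:
#   "VM i-2-273-VM: cs state = Running and realState = Stopped"
# Assumption: the input is raw log text, not an email-quoted copy
# (a "> " quote prefix in the middle of a wrapped line would not match).
MISMATCH_RE = re.compile(
    r"VM\s+(?P<vm>\S+):\s+cs\s+state\s+=\s+(?P<cs>\w+)"
    r"\s+and\s+realState\s+=\s+(?P<real>\w+)"
)


def find_state_mismatches(log_text):
    """Return {vm_name: (cs_state, real_state)} for VMs whose states differ."""
    mismatches = {}
    for match in MISMATCH_RE.finditer(log_text):
        cs_state = match.group("cs")
        real_state = match.group("real")
        if cs_state != real_state:
            # Repeated log lines for the same VM collapse into one entry.
            mismatches[match.group("vm")] = (cs_state, real_state)
    return mismatches
```

Feeding it the contents of the management server log returns one entry
per VM that CloudStack believes is in a different state than the
hypervisor reports, which is a safer starting point than editing the DB
blind.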

On Wed, Jul 24, 2013 at 9:23 AM, Indra Pramana <indra@sg.or.id> wrote:
> Dear Chip, Geoff and all,
>
> I scrutinized the management server's logs during the time when I shutdown
> the host and the time when I turned the host back on.
>
> This is the management server's logs when the host is being shut down:
>
> http://pastebin.com/4wfV830Z
>
> During that time, I noted quite a lot of "Sending Disconnect
> to listener" messages, which implies that the management server tries to
> notify other listeners that the host is going down. However, I subsequently
> didn't see any messages in the logs showing that the management server is
> trying to activate the HA capability to start the affected VMs on another
> available host.
>
> This is the management server's logs when the host is being turned back on:
>
> http://pastebin.com/JrLJxbXH
>
> When the agent is reconnected, then CloudStack marked the affected VMs as
> stopped from previously running:
>
> ===
> 2013-07-24 23:04:57,406 DEBUG [cloud.vm.VirtualMachineManagerImpl]
> (AgentConnectTaskPool-7:null) Found 5 VMs for host 34
> 2013-07-24 23:04:57,408 DEBUG [cloud.vm.VirtualMachineManagerImpl]
> (AgentConnectTaskPool-7:null) VM i-2-273-VM: cs state = Running and
> realState = Stopped
> 2013-07-24 23:04:57,408 DEBUG [cloud.vm.VirtualMachineManagerImpl]
> (AgentConnectTaskPool-7:null) VM i-2-273-VM: cs state = Running and
> realState = Stopped
> 2013-07-24 23:04:57,408 DEBUG [cloud.ha.HighAvailabilityManagerImpl]
> (AgentConnectTaskPool-7:null) VM does not require investigation so I'm
> marking it as Stopped: VM[User|Ubuntu-12-04-2-64bit]
> 2013-07-24 23:04:57,450 DEBUG [cloud.capacity.CapacityManagerImpl]
> (AgentConnectTaskPool-7:null) VM state transitted from :Running to Stopping
> with event: StopRequestedvm's original host id: 28 new host id: 34 host id
> before state transition: 34
> ===
>
> Then the HA starts to kick in.
>
> ===
> 2013-07-24 23:04:57,955 INFO  [cloud.ha.HighAvailabilityManagerImpl]
> (HA-Worker-1:work-307) Processing HAWork[307-HA-273-Stopped-Scheduled]
> 2013-07-24 23:04:57,956 DEBUG [cloud.capacity.CapacityManagerImpl]
> (AgentConnectTaskPool-7:null) VM state transitted from :Running to Stopping
> with event: StopRequestedvm's original host id: 28 new host id: 34 host id
> before state transition: 34
> 2013-07-24 23:04:57,960 DEBUG [agent.transport.Request]
> (AgentConnectTaskPool-7:null) Seq 34-105644038: Sending  { Cmd , MgmtId:
> 161342671900, via: 34, Ver: v1, Flags: 100111,
> [{"StopCommand":{"isProxy":false,"vmName":"i-2-281-VM","wait":0}}] }
> 2013-07-24 23:04:57,968 INFO  [cloud.ha.HighAvailabilityManagerImpl]
> (HA-Worker-1:work-307) HA on VM[User|Ubuntu-12-04-2-64bit]
> 2013-07-24 23:04:57,984 DEBUG [cloud.capacity.CapacityManagerImpl]
> (HA-Worker-1:work-307) VM state transitted from :Stopped to Starting with
> event: StartRequestedvm's original host id: 28 new host id: null host id
> before state transition: null
> 2013-07-24 23:04:57,984 DEBUG [cloud.vm.VirtualMachineManagerImpl]
> (HA-Worker-1:work-307) Successfully transitioned to start state for
> VM[User|Ubuntu-12-04-2-64bit] reservation id =
> b56364ef-90d8-443f-a348-7660fda48d34
> 2013-07-24 23:04:58,025 DEBUG [cloud.vm.VirtualMachineManagerImpl]
> (HA-Worker-1:work-307) Trying to deploy VM, vm has dcId: 6 and podId: 6
> 2013-07-24 23:04:58,025 DEBUG [cloud.vm.VirtualMachineManagerImpl]
> (HA-Worker-1:work-307) Deploy avoids pods: null, clusters: null, hosts: null
> 2013-07-24 23:04:58,031 DEBUG [cloud.vm.VirtualMachineManagerImpl]
> (HA-Worker-1:work-307) Root volume is ready, need to place VM in volume's
> cluster
> 2013-07-24 23:04:58,031 DEBUG [cloud.vm.VirtualMachineManagerImpl]
> (HA-Worker-1:work-307) Vol[295|vm=273|ROOT] is READY, changing deployment
> plan to use this pool's dcId: 6 , podId: 6 , and clusterId: 6
> ===
>
> My question is why HA only kicks in when the host is turned back on? By
> right it should kick in soon after the host is shut down and marked as
> "Disconnected".
>
> Any insights on the possible solutions to this problem is highly
> appreciated.
>
> Looking forward to your reply, thank you.
>
> Cheers.
>
>
>
> On Thu, Jul 25, 2013 at 12:00 AM, Indra Pramana <indra@sg.or.id> wrote:
>
>> Hi Chip,
>>
>> Yes, "Offer HA" is set to "Yes" on all my compute offerings.
>>
>> Hi Geoff,
>>
>> Yes, I am using KVM. Is this a known issue and is there any solution to
>> this problem?
>>
>> Looking forward to your reply, thank you.
>>
>> Cheers.
>>
>>
>>
>> On Wed, Jul 24, 2013 at 11:38 PM, Geoff Higginbottom <
>> geoff.higginbottom@shapeblue.com> wrote:
>>
>>> Is it running on KVM? We are seeing some real issues with HA simply not
>>> working on KVM.
>>>
>>> Regards
>>>
>>> Geoff Higginbottom
>>>
>>> D: +44 20 3603 0542 | S: +44 20 3603 0540 | M: +447968161581
>>>
>>> geoff.higginbottom@shapeblue.com
>>>
>>> -----Original Message-----
>>> From: Chip Childers [mailto:chip.childers@sungard.com]
>>> Sent: 24 July 2013 16:37
>>> To: <users@cloudstack.apache.org>
>>> Subject: Re: HA not working - CloudStack 4.1.0 and KVM hypervisor hosts
>>>
>>> Did you enable HA for your compute offering?
>>>
>>> On Jul 24, 2013, at 11:25 AM, Indra Pramana <indra@sg.or.id> wrote:
>>>
>>> > Dear all,
>>> >
>>> > I tried to shut down one of my hypervisor hosts to simulate a server
>>> > failure, and HA is not working: none of the VMs on the affected host
>>> > are started on another available host.
>>> >
>>> > I am using CloudStack 4.1.0 with KVM hypervisors and Ceph RBD for
>>> > primary storage.
>>> >
>>> > My issue is similar to what is being described here:
>>> >
>>> > https://issues.apache.org/jira/browse/CLOUDSTACK-3535
>>> >
>>> > Except that on my case, the host is indeed marked as "Disconnected"
>>> > but there is no attempt from CloudStack to try starting the VMs on
>>> > another host. I can't provide logs since there's nothing on the logs
>>> > which suggest that CloudStack tries to activate the HA and start the
>>> > affected VMs on another host.
>>> >
>>> > Anyone has similar experience? Anyone knows if the above bug has been
>>> > resolved?
>>> >
>>> > Looking forward to your reply, thank you.
>>> >
>>> > Cheers.
>>>
>>
>>
