cloudstack-issues mailing list archives

From "Paul Angus (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CLOUDSTACK-3535) No HA actions are performed when a KVM host goes offline
Date Thu, 22 Aug 2013 12:53:54 GMT

    [ https://issues.apache.org/jira/browse/CLOUDSTACK-3535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13747484#comment-13747484 ]

Paul Angus commented on CLOUDSTACK-3535:
----------------------------------------

I've tested the HA functionality on KVM and found that it did not work.

CloudStack seems unable to 'stop' the VM that was on the failed host because the host
is unavailable.  I waited an hour and the instance remained in the 'Stopping' state.  I then
restarted the host and the instance stopped, but 5 hours later it still has not restarted.


2013-08-22 08:35:09,802 INFO  [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-0:work-3)
KVMInvestigator found VM[User|HA-Test1]to be alive? null
2013-08-22 08:35:09,802 DEBUG [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-0:work-3)
Fencing off VM that we don't know the state of
2013-08-22 08:35:09,802 DEBUG [cloud.ha.XenServerFencer] (HA-Worker-0:work-3) Don't know how
to fence non XenServer hosts KVM
2013-08-22 08:35:09,803 INFO  [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-0:work-3)
Fencer null returned null
2013-08-22 08:35:09,807 DEBUG [agent.transport.Request] (HA-Worker-0:work-3) Seq 2-1715210012:
Sending  { Cmd , MgmtId: 345049337494, via: 2, Ver: v1, Flags: 100011, [{"com.cloud.agent.api.FenceCommand":{"vmName":"i-2-42-VM","hostGuid":"fdf1e936-0373-389b-abef-a68e339ff910-LibvirtComputingResource","hostIp":"10.0.100.41","inSeq":false,"wait":0}}]
}
2013-08-22 08:35:09,905 DEBUG [agent.transport.Request] (AgentManager-Handler-13:null) Seq
2-1715210012: Processing:  { Ans: , MgmtId: 345049337494, via: 2, Ver: v1, Flags: 10, [{"com.cloud.agent.api.FenceAnswer":{"result":true,"wait":0}}]
}
2013-08-22 08:35:09,905 DEBUG [agent.transport.Request] (HA-Worker-0:work-3) Seq 2-1715210012:
Received:  { Ans: , MgmtId: 345049337494, via: 2, Ver: v1, Flags: 10, { FenceAnswer } }
2013-08-22 08:35:09,905 INFO  [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-0:work-3)
Fencer KVMFenceBuilder returned true
2013-08-22 08:35:09,911 DEBUG [cloud.capacity.CapacityManagerImpl] (HA-Worker-0:work-3) VM
state transitted from :Running to Stopping with event: StopRequestedvm's original host id:
5 new host id: 5 host id before state transition: 5
2013-08-22 08:35:09,916 WARN  [cloud.vm.VirtualMachineManagerImpl] (HA-Worker-0:work-3) Unable
to stop vm, agent unavailable: com.cloud.exception.AgentUnavailableException: Resource [Host:5]
is unreachable: Host 5: Host with specified id is not in the right state: Down
2013-08-22 08:35:09,916 WARN  [cloud.vm.VirtualMachineManagerImpl] (HA-Worker-0:work-3) Unable
to actually stop VM[User|HA-Test1] but continue with release because it's a force stop
2013-08-22 08:35:09,920 ERROR [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-0:work-3)
Terminating HAWork[3-HA-42-Running-Investigating]
com.cloud.utils.exception.CloudRuntimeException: Caught exception even though it should be
handled.
	at com.cloud.ha.HighAvailabilityManagerImpl.restart(HighAvailabilityManagerImpl.java:479)
	at com.cloud.ha.HighAvailabilityManagerImpl$WorkerThread.run(HighAvailabilityManagerImpl.java:831)
Caused by: com.cloud.exception.AgentUnavailableException: Resource [Host:5] is unreachable:
Host 5: Host with specified id is not in the right state: Down
	at com.cloud.agent.manager.ClusteredAgentManagerImpl.getAttache(ClusteredAgentManagerImpl.java:540)
	at com.cloud.agent.manager.AgentManagerImpl.send(AgentManagerImpl.java:479)
	at com.cloud.agent.manager.AgentManagerImpl.send(AgentManagerImpl.java:439)
	at com.cloud.vm.VirtualMachineManagerImpl.advanceStop(VirtualMachineManagerImpl.java:1220)
	at com.cloud.ha.HighAvailabilityManagerImpl.restart(HighAvailabilityManagerImpl.java:476)
	... 1 more
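
The sequence in the log and stack trace above can be sketched as a small model. This is only a reading of the trace, not CloudStack's actual code: the function and class names below (xen_fencer, kvm_fencer, advance_stop, restart, HAWorkTerminated) are hypothetical stand-ins, and the sketch assumes the fencers are tried in the order the log shows them.

```python
class AgentUnavailableError(Exception):
    """Stand-in for com.cloud.exception.AgentUnavailableException."""


class HAWorkTerminated(RuntimeError):
    """Stand-in for the CloudRuntimeException that terminates the HA work item."""

    def __init__(self, message, steps):
        super().__init__(message)
        self.steps = steps


def xen_fencer(hypervisor):
    # Like XenServerFencer in the log: declines (None) for non-XenServer hosts.
    return True if hypervisor == "XenServer" else None


def kvm_fencer(hypervisor):
    # Like KVMFenceBuilder in the log: reports success (True) for KVM.
    return True if hypervisor == "KVM" else None


def advance_stop(host_state):
    # The stop command needs a reachable agent; a host marked Down has none.
    if host_state == "Down":
        raise AgentUnavailableError(
            "Host with specified id is not in the right state: Down"
        )


def restart(hypervisor, host_state):
    steps = []
    # The investigator answered "null", so the VM state is unknown:
    # try each fencer in turn until one gives a definite answer.
    for name, fencer in (("XenServerFencer", xen_fencer),
                         ("KVMFenceBuilder", kvm_fencer)):
        result = fencer(hypervisor)
        steps.append(f"{name} returned {result}")
        if result is not None:
            break
    try:
        advance_stop(host_state)
    except AgentUnavailableError as exc:
        # Fencing succeeded, but the stop still fails and ends the HA work.
        steps.append("stop failed: agent unavailable")
        raise HAWorkTerminated(
            "Caught exception even though it should be handled.", steps
        ) from exc
    steps.append("stop succeeded")
    return steps
```

Calling restart("KVM", "Down") in this model raises HAWorkTerminated after a successful fence, which matches what the trace suggests: the failure comes from the stop step needing the agent on the downed host, not from fencing itself.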

                
> No HA actions are performed when a KVM host goes offline
> --------------------------------------------------------
>
>                 Key: CLOUDSTACK-3535
>                 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-3535
>             Project: CloudStack
>          Issue Type: Bug
>      Security Level: Public(Anyone can view this level - this is the default.) 
>          Components: Hypervisor Controller, KVM, Management Server
>    Affects Versions: 4.1.0, 4.1.1, 4.2.0
>         Environment: KVM (CentOS 6.3) with CloudStack 4.1
>            Reporter: Paul Angus
>            Assignee: edison su
>            Priority: Blocker
>             Fix For: 4.2.0
>
>         Attachments: extract-management-server.log.2013-08-09, KVM-HA-4.1.1.2013-08-09-v1.patch, management-server.log.Agent
>
>
> If a KVM host 'goes down', CloudStack does not perform HA for instances which are marked as HA enabled on that host (including system VMs).
> CloudStack does not show the host as disconnected.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
