cloudstack-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Musayev, Ilya" <imusa...@webmd.net>
Subject RE: [MAJOR][BUG] ACS powers off some VMs in vSphere - when MS service is restarted
Date Wed, 25 Sep 2013 22:13:26 GMT
Sorry if the error log is abit hard to read, as an example, please track the vmname i-2-262-acs-docs-fc17.

2013-09-25 14:35:49,928 DEBUG [vmware.resource.VmwareResource] (AgentTaskPool-1:null) Detecting
a new state but couldn't find a old state so adding it to the changes: i-2-262-acs-docs-fc17
2013-09-25 14:35:53,614 DEBUG [cloud.vm.VirtualMachineManagerImpl] (AgentTaskPool-1:null)
VM i-2-262-acs-docs-fc17: cs state = Running and realState = Stopped
2013-09-25 14:35:53,614 DEBUG [cloud.vm.VirtualMachineManagerImpl] (AgentTaskPool-1:null)
VM i-2-262-acs-docs-fc17: cs state = Running and realState = Stopped
2013-09-25 14:35:53,614 INFO  [cloud.ha.HighAvailabilityManagerImpl] (AgentTaskPool-1:null)
Skip HA for VMware VM i-2-262-acs-docs-fc17
2013-09-25 14:35:53,694 DEBUG [agent.transport.Request] (AgentTaskPool-1:null) Seq 11-1418264581:
Sending  { Cmd , MgmtId: 345049078181, via: 11, Ver: v1, Flags: 100101, [{"StopCommand":{"isProxy":false,"vmName":"i-2-262-acs-docs-fc17","wait":0}},{"StopCommand":{"isProxy":false,"vmName":"i-2-278-demo01t-ops-08","wait":0}},{"StopCommand":{"isProxy":false,"vmName":"i-2-281-acs-appliance","wait":0}},{"StopCommand":{"isProxy":false,"vmName":"i-2-283-vmbld01l-ops-08","wait":0}},{"StopCommand":{"isProxy":false,"vmName":"i-2-285-ossec01l-ops-08","wait":0}}]
}
2013-09-25 14:35:53,695 DEBUG [agent.transport.Request] (AgentTaskPool-1:null) Seq 11-1418264581:
Executing:  { Cmd , MgmtId: 345049078181, via: 11, Ver: v1, Flags: 100101, [{"StopCommand":{"isProxy":false,"vmName":"i-2-262-acs-docs-fc17","wait":0}},{"StopCommand":{"isProxy":false,"vmName":"i-2-278-demo01t-ops-08","wait":0}},{"StopCommand":{"isProxy":false,"vmName":"i-2-281-acs-appliance","wait":0}},{"StopCommand":{"isProxy":false,"vmName":"i-2-283-vmbld01l-ops-08","wait":0}},{"StopCommand":{"isProxy":false,"vmName":"i-2-285-ossec01l-ops-08","wait":0}}]
}
2013-09-25 14:35:53,702 INFO  [vmware.resource.VmwareResource] (DirectAgent-3:vmha62d-ops-08.portal.webmd.com)
Executing resource StopCommand: {"isProxy":false,"vmName":"i-2-262-acs-docs-fc17","wait":0}
2013-09-25 14:35:53,703 DEBUG [vmware.mo.HostMO] (DirectAgent-3:vmha62d-ops-08.portal.webmd.com)
find VM i-2-262-acs-docs-fc17 on host
2013-09-25 14:35:54,753 INFO  [vmware.resource.VmwareResource] (DirectAgent-3:vmha62d-ops-08.portal.webmd.com)
VM i-2-262-acs-docs-fc17 is already in stopped state

> -----Original Message-----
> From: Musayev, Ilya [mailto:imusayev@webmd.net]
> Sent: Wednesday, September 25, 2013 6:08 PM
> To: dev@cloudstack.apache.org
> Subject: [MAJOR][BUG] ACS powers off some VMs in vSphere - when MS
> service is restarted
> 
> Hi All,
> 
> I'd like to raise an awareness on the issue with ACS 4.1.1 (I assume other
> versions are affected - since I could not find any changes in the latest code
> that would state otherwise).
> 
> On MS server start/restart, it checks all the VMs for their state, if some
> reason state is either not found or comes as stopped - which is inaccurate,
> the vms will *power off*.
> 
> On todays occurrence, half of my plant  went down because ACS invoked
> StopCommand on the vms that either had no state or for some reason
> agentState == Stopped.
> 
> The details of this issue is in
> https://issues.apache.org/jira/browse/CLOUDSTACK-4740
> 
> The error in the logs is:
> 
> 2013-09-25 14:35:49,928 DEBUG [vmware.resource.VmwareResource]
> (AgentTaskPool-1:null) Detecting a new state but couldn't find a old state so
> adding it to the changes: i-2-262-acs-docs-fc17
> 2013-09-25 14:35:51,213 DEBUG [agent.transport.Request] (AgentTaskPool-
> 1:null) Seq -1--1: Startup request from directly connected host: { Cmd ,
> MgmtId: -1, via: -1, Ver: v1, Flags: 11,
> [{"cpus":16,"speed":2199,"memory":68683468800,"dom0MinMemory":0,"po
> olSync":false,"vms":{"i-8-270-CLOUD411":{"state":"Running"},"r-15-
> CLOUD41-OLD":{"state":"Stopped"},"v-260-
> CLOUD411":{"state":"Running"},"i-2-283-vmbld01l-ops-
> 08":{"state":"Running"},"i-2-104-ACS41VM":{"state":"Running"},"--s-1-
> CLOUD41-OLD":{"state":"Running"},"i-27-280-
> CLOUD411":{"state":"Running"},"i-2-285-ossec01l-ops-
> 08":{"state":"Running"},"i-2-262-acs-docs-fc17":{"state":"Stopped"},"i-24-
> 265-test3":{"state":"Running"},"cloud01l-ops-
> 08.portal.webmd.com":{"state":"Running"},"i-2-278-demo01t-ops-
> 08":{"state":"Running"},"s-63-CLOUD411":{"state":"Running"},"r-66-
> CLOUD411":{"state":"Running"},"i-2-281-acs-
> appliance":{"state":"Running"}},"caps":"hvm","hypervisorType":"VMware","
> hostDetails":{"com.cloud.network.Networks.RouterPrivateIpStrategy":"DcGl
> obal","NativeHA":"true"},"hypervisorVersion":"5.0","type":"Routing","dataC
> enter":"2","pod":"2","cluster":"3","guid":"HostSystem:host-19716@vc00q-
> ops-08.portal.webmd.com","name":"vmha62d-ops-
> 08.portal.webmd.com","version":"4.1.1-
> SNAPSHOT","privateIpAddress":"172.25.243.31","privateMacAddress":"68:b5
> :99:73:0b:c2","privateNetmask":"255.255.255.0","storageIpAddress":"172.25.
> 243.31","storageNetmask":"255.255.255.0","storageMacAddress":"68:b5:99:7
> 3:0b:c2","wait":0},{"totalSize":0,"poolInfo":{"uuid":"72c8aedb-58c4-4569-
> ac51-adc5af770bf6","host":"vmha62d-ops-
> 08.portal.webmd.com","localPath":"","hostPath":"datastore-
> 19718","poolType":"LVM","capacityBytes":141465485312,"availableBytes":14
> 0383354880},"resourceType":"STORAGE_POOL","hostDetails":{},"type":"Stor
> age","dataCenter":"2","pod":"2","cluster":"3","guid":"72c8aedb-58c4-4569-
> ac51-adc5af770bf6","name":"72c8aedb-58c4-4569-ac51-
> adc5af770bf6","wait":0}] }
> 2013-09-25 14:35:53,614 DEBUG [cloud.vm.VirtualMachineManagerImpl]
> (AgentTaskPool-1:null) VM i-2-262-acs-docs-fc17: cs state = Running and
> realState = Stopped
> 2013-09-25 14:35:53,614 DEBUG [cloud.vm.VirtualMachineManagerImpl]
> (AgentTaskPool-1:null) VM i-2-262-acs-docs-fc17: cs state = Running and
> realState = Stopped
> 2013-09-25 14:35:53,614 INFO [cloud.ha.HighAvailabilityManagerImpl]
> (AgentTaskPool-1:null) Skip HA for VMware VM i-2-262-acs-docs-fc17
> 2013-09-25 14:35:53,694 DEBUG [agent.transport.Request] (AgentTaskPool-
> 1:null) Seq 11-1418264581: Sending { Cmd , MgmtId: 345049078181, via: 11,
> Ver: v1, Flags: 100101, [{"StopCommand":{"isProxy":false,"vmName":"i-2-
> 262-acs-docs-
> fc17","wait":0}},{"StopCommand":{"isProxy":false,"vmName":"i-2-278-
> demo01t-ops-08","wait":0}},{"StopCommand":{"isProxy":false,"vmName":"i-
> 2-281-acs-
> appliance","wait":0}},{"StopCommand":{"isProxy":false,"vmName":"i-2-283-
> vmbld01l-ops-08","wait":0}},{"StopCommand":{"isProxy":false,"vmName":"i-
> 2-285-ossec01l-ops-08","wait":0}}] }
> 2013-09-25 14:35:53,695 DEBUG [agent.transport.Request] (AgentTaskPool-
> 1:null) Seq 11-1418264581: Executing: { Cmd , MgmtId: 345049078181, via: 11,
> Ver: v1, Flags: 100101, [{"StopCommand":{"isProxy":false,"vmName":"i-2-
> 262-acs-docs-
> fc17","wait":0}},{"StopCommand":{"isProxy":false,"vmName":"i-2-278-
> demo01t-ops-08","wait":0}},{"StopCommand":{"isProxy":false,"vmName":"i-
> 2-281-acs-
> appliance","wait":0}},{"StopCommand":{"isProxy":false,"vmName":"i-2-283-
> vmbld01l-ops-08","wait":0}},{"StopCommand":{"isProxy":false,"vmName":"i-
> 2-285-ossec01l-ops-08","wait":0}}] }
> 2013-09-25 14:35:53,702 INFO [vmware.resource.VmwareResource]
> (DirectAgent-3:vmha62d-ops-08.portal.webmd.com) Executing resource
> StopCommand: {"isProxy":false,"vmName":"i-2-262-acs-docs-fc17","wait":0}
> 2013-09-25 14:35:53,703 DEBUG [vmware.mo.HostMO] (DirectAgent-
> 3:vmha62d-ops-08.portal.webmd.com) find VM i-2-262-acs-docs-fc17 on
> host
> 2013-09-25 14:35:54,753 INFO [vmware.resource.VmwareResource]
> (DirectAgent-3:vmha62d-ops-08.portal.webmd.com) VM i-2-262-acs-docs-
> fc17 is already in stopped state
> 2013-09-25 14:37:50,264 DEBUG [agent.transport.Request] (DirectAgent-
> 3:null) Seq 11-1418264581: Processing: { Ans: , MgmtId: 345049078181, via: 11,
> Ver: v1, Flags: 100, [{"StopAnswer":{"vncPort":0,"result":true,"details":"VM
> i-2-262-acs-docs-fc17 is already in stopped
> state","wait":0}},{"StopAnswer":{"vncPort":0,"result":true,"details":"Stop
> VM i-2-278-demo01t-ops-08
> Succeed","wait":0}},{"StopAnswer":{"vncPort":0,"result":true,"details":"Stop
> VM i-2-281-acs-appliance
> Succeed","wait":0}},{"StopAnswer":{"vncPort":0,"result":true,"details":"Stop
> VM i-2-283-vmbld01l-ops-08
> Succeed","wait":0}},{"StopAnswer":{"vncPort":0,"result":true,"details":"Stop
> VM i-2-285-ossec01l-ops-08 Succeed","wait":0}}] }
> 2013-09-25 14:38:14,528 DEBUG [vmware.mo.HostMO] (DirectAgent-
> 20:vmha62d-ops-08.portal.webmd.com) find VM i-2-262-acs-docs-fc17 on
> host
> 
> I wonder why this was not noted earlier.
> 
> The code in question is below, and comes from
> engine/orchestration/src/com/cloud/vm/VirtualMachineManagerImpl.java
> on master branch.
> 
> 2365 if (serverState == State.Running) {
> 2366 // Our records showed that it should be running so let's restart
> 2367 // it.
> 2368 _haMgr.scheduleRestart(vm, false);
> 2369 } else if (serverState == State.Stopping) {
> 2370 _haMgr.scheduleStop(vm, hostId, WorkType.ForceStop);
> 2371 s_logger.debug("Scheduling a check stop for VM in stopping mode: " +
> vm);
> 2372 } else if (serverState == State.Starting) {
> 2373 s_logger.debug("Ignoring VM in starting mode: " +
> vm.getInstanceName());
> 2374 _haMgr.scheduleRestart(vm, false);
> 2375 }
> 2376 command = cleanup(vm);
> 
> If there is a quick fix, which I think for now, I can just comment out the
> "command= cleanup(vm)", please let me know. I'm not certain what would
> be the repercussions.
> 
> As always any feedback, is appreciated.
> 
> Thanks
> ilya
> 
> 



Mime
View raw message