Return-Path: X-Original-To: apmail-cloudstack-issues-archive@www.apache.org Delivered-To: apmail-cloudstack-issues-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 5AB2817A5E for ; Tue, 15 Sep 2015 10:15:47 +0000 (UTC) Received: (qmail 72404 invoked by uid 500); 15 Sep 2015 10:15:47 -0000 Delivered-To: apmail-cloudstack-issues-archive@cloudstack.apache.org Received: (qmail 72366 invoked by uid 500); 15 Sep 2015 10:15:47 -0000 Mailing-List: contact issues-help@cloudstack.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@cloudstack.apache.org Delivered-To: mailing list issues@cloudstack.apache.org Received: (qmail 72357 invoked by uid 500); 15 Sep 2015 10:15:47 -0000 Delivered-To: apmail-incubator-cloudstack-issues@incubator.apache.org Received: (qmail 72353 invoked by uid 99); 15 Sep 2015 10:15:47 -0000 Received: from arcas.apache.org (HELO arcas.apache.org) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 15 Sep 2015 10:15:47 +0000 Date: Tue, 15 Sep 2015 10:15:47 +0000 (UTC) From: =?utf-8?Q?Ren=C3=A9_Moser_=28JIRA=29?= To: cloudstack-issues@incubator.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Updated] (CLOUDSTACK-8848) Unexpected VR reboot after out-of-band migration MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 [ https://issues.apache.org/jira/browse/CLOUDSTACK-8848?page=3Dcom.atl= assian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ren=C3=A9 Moser updated CLOUDSTACK-8848: ----------------------------------- Description:=20 In some conditions (race condition), VR gets rebooted after a out of band m= igration was done on vCenter.=20 {panel:bgColor=3D#FFFFCE} Note, new global setting in 4.5.2 "VR reboot after out of band migration" i= s set to *false* and this looks more like a bug. {panel} After a VR migration to a host _and_ when the VM power state report gatheri= ng is running, the VR (and also any user VM as well) will get into the "Pow= erReportMissing". If the VM is a VR. it will be powered off and started again on vCenter. Tha= t is what we see. In can not be reproduced every time a migration was done,= but it seems the problem is related to "powerReportMissing". I grep-ed the source and found this line related https://github.com/apache/cloudstack/blob/4.5.2/engine/orchestration/src/co= m/cloud/vm/VirtualMachineManagerImpl.java#L3616 and also it seems that the graceful period might be also related,=20 https://github.com/apache/cloudstack/blob/4.5.2/engine/orchestration/src/co= m/cloud/vm/VirtualMachinePowerStateSyncImpl.java#L110 In case it is a user VM, we see in the logs that the state will be set to p= owered-off, but the VM keeps running. After a while a new VM power state re= port is running and the state for the user VM gets updated to Running again= . Below the logs=20 h2. VR r-342-VM reboot log {code:none} 2015-09-15 09:37:06,508 DEBUG [c.c.v.VirtualMachinePowerStateSyncImpl] (Dir= ectAgentCronJob-253:ctx-c4f59216) Run missing VM report. current time: 1442= 302626508 2015-09-15 09:37:06,508 DEBUG [c.c.v.VirtualMachinePowerStateSyncImpl] (Dir= ectAgentCronJob-253:ctx-c4f59216) Detected missing VM. host: 19, vm id: 342= , power state: PowerReportMissing, last state update: 1442302506000 2015-09-15 09:37:06,508 DEBUG [c.c.v.VirtualMachinePowerStateSyncImpl] (Dir= ectAgentCronJob-253:ctx-c4f59216) vm id: 342 - time since last state update= (120508ms) has passed graceful period 2015-09-15 09:37:06,517 DEBUG [c.c.v.VirtualMachinePowerStateSyncImpl] (Dir= ectAgentCronJob-253:ctx-c4f59216) VM state report is updated. host: 19, vm = id: 342, power state: PowerReportMissing=20 2015-09-15 09:37:06,525 INFO [c.c.v.VirtualMachineManagerImpl] (DirectAgen= tCronJob-253:ctx-c4f59216) VM r-342-VM is at Running and we received a powe= r-off report while there is no pending jobs on it 2015-09-15 09:37:06,532 DEBUG [c.c.a.t.Request] (DirectAgentCronJob-253:ctx= -c4f59216) Seq 19-4511199451741686482: Sending { Cmd , MgmtId: 34505112210= 6, via: 19(cu01-testpod01-esx03.stxt.media.int), Ver: v1, Flags: 100011, [{= "com.cloud.agent.api.StopCommand":{"isProxy":false,"executeInSequence":fals= e,"checkBeforeCleanup":true,"vmName":"r-342-VM","wait":0}}] } 2015-09-15 09:37:06,532 DEBUG [c.c.a.t.Request] (DirectAgentCronJob-253:ctx= -c4f59216) Seq 19-4511199451741686482: Executing: { Cmd , MgmtId: 34505112= 2106, via: 19(cu01-testpod01-esx03.stxt.media.int), Ver: v1, Flags: 100011,= [{"com.cloud.agent.api.StopCommand":{"isProxy":false,"executeInSequence":f= alse,"checkBeforeCleanup":true,"vmName":"r-342-VM","wait":0}}] } 2015-09-15 09:37:06,532 DEBUG [c.c.a.m.DirectAgentAttache] (DirectAgent-136= :ctx-9bc0a401) Seq 19-4511199451741686482: Executing request 2015-09-15 09:37:06,532 INFO [c.c.h.v.r.VmwareResource] (DirectAgent-136:c= tx-9bc0a401 cu01-testpod01-esx03.stxt.media.int, cmd: StopCommand) Executin= g resource StopCommand: {"isProxy":false,"executeInSequence":false,"checkBe= foreCleanup":true,"vmName":"r-342-VM","wait":0} 2015-09-15 09:37:06,551 DEBUG [c.c.h.v.m.HostMO] (DirectAgent-136:ctx-9bc0a= 401 cu01-testpod01-esx03.stxt.media.int, cmd: StopCommand) find VM r-342-VM= on host 2015-09-15 09:37:06,551 INFO [c.c.h.v.m.HostMO] (DirectAgent-136:ctx-9bc0a= 401 cu01-testpod01-esx03.stxt.media.int, cmd: StopCommand) VM r-342-VM not = found in host cache 2015-09-15 09:37:06,551 DEBUG [c.c.h.v.m.HostMO] (DirectAgent-136:ctx-9bc0a= 401 cu01-testpod01-esx03.stxt.media.int, cmd: StopCommand) load VM cache on= host 2015-09-15 09:37:06,577 INFO [c.c.h.v.r.VmwareResource] (DirectAgent-136:c= tx-9bc0a401 cu01-testpod01-esx03.stxt.media.int, cmd: StopCommand) VM r-342= -VM is no longer in vSphere 2015-09-15 09:37:06,578 DEBUG [c.c.a.m.DirectAgentAttache] (DirectAgent-136= :ctx-9bc0a401) Seq 19-4511199451741686482: Response Received:=20 2015-09-15 09:37:06,578 DEBUG [c.c.a.t.Request] (DirectAgent-136:ctx-9bc0a4= 01) Seq 19-4511199451741686482: Processing: { Ans: , MgmtId: 345051122106,= via: 19, Ver: v1, Flags: 10, [{"com.cloud.agent.api.StopAnswer":{"result":= true,"details":"VM r-342-VM is no longer in vSphere","wait":0}}] } 2015-09-15 09:37:06,578 DEBUG [c.c.a.t.Request] (DirectAgentCronJob-253:ctx= -c4f59216) Seq 19-4511199451741686482: Received: { Ans: , MgmtId: 34505112= 2106, via: 19, Ver: v1, Flags: 10, { StopAnswer } } 2015-09-15 09:37:06,585 DEBUG [c.c.n.r.VirtualNetworkApplianceManagerImpl] = (DirectAgentCronJob-253:ctx-c4f59216) Successfully updated user statistics = as a part of domR VM[DomainRouter|r-342-VM] reboot/stop 2015-09-15 09:37:06,617 DEBUG [c.c.c.CapacityManagerImpl] (DirectAgentCronJ= ob-253:ctx-c4f59216) VM state transitted from :Running to Stopped with even= t: FollowAgentPowerOffReportvm's original host id: 20 new host id: null hos= t id before state transition: 19 2015-09-15 09:37:06,627 WARN [o.a.c.alerts] (DirectAgentCronJob-253:ctx-c4= f59216) alertType:: 27 // dataCenterId:: 1 // podId:: 1 // clusterId:: nul= l // message:: VM state sync alert 2015-09-15 09:37:06,630 DEBUG [c.c.a.AlertManagerImpl] (DirectAgentCronJob-= 253:ctx-c4f59216) Have already sent: 1 emails for alert type '27' -- skippi= ng send email 2015-09-15 09:37:06,630 INFO [c.c.v.VirtualMachineManagerImpl] (DirectAgen= tCronJob-253:ctx-c4f59216) VM r-342-VM is sync-ed to at Stopped state accor= ding to power-off report from hypervisor 2015-09-15 09:37:06,630 DEBUG [c.c.v.VirtualMachinePowerStateSyncImpl] (Dir= ectAgentCronJob-253:ctx-c4f59216) Done with process of VM state report. hos= t: 19 2015-09-15 09:37:07,506 DEBUG [c.c.c.ConsoleProxyManagerImpl] (consoleproxy= -1:ctx-90d7ef3a) Zone 1 is ready to launch console proxy 2015-09-15 09:37:07,524 INFO [c.c.h.v.r.VmwareResource] (DirectAgentCronJo= b-64:ctx-47298215) Scan hung worker VM to recycle 2015-09-15 09:37:07,611 INFO [c.c.h.v.r.VmwareResource] (DirectAgentCronJo= b-254:ctx-757ec6a9) Scan hung worker VM to recycle 2015-09-15 09:37:07,722 DEBUG [c.c.a.m.DirectAgentAttache] (DirectAgentCron= Job-64:ctx-47298215) Ping from 25(cu01-testpod01-esx02.stxt.media.int) 2015-09-15 09:37:07,722 DEBUG [c.c.v.VirtualMachinePowerStateSyncImpl] (Dir= ectAgentCronJob-64:ctx-47298215) Process host VM state report from ping pro= cess. host: 25 2015-09-15 09:37:07,722 DEBUG [c.c.v.VirtualMachinePowerStateSyncImpl] (Dir= ectAgentCronJob-64:ctx-47298215) Process VM state report. host: 25, number = of records in report: 0 2015-09-15 09:37:07,723 DEBUG [c.c.v.VirtualMachinePowerStateSyncImpl] (Dir= ectAgentCronJob-64:ctx-47298215) Done with process of VM state report. host= : 25 2015-09-15 09:37:07,732 DEBUG [o.a.c.s.SecondaryStorageManagerImpl] (secsto= rage-1:ctx-d63339a0) Zone 1 is ready to launch secondary storage VM 2015-09-15 09:37:07,774 DEBUG [c.c.a.m.DirectAgentAttache] (DirectAgentCron= Job-254:ctx-757ec6a9) Ping from 20(cu01-testpod01-esx04.stxt.media.int) 2015-09-15 09:37:07,774 DEBUG [c.c.v.VirtualMachinePowerStateSyncImpl] (Dir= ectAgentCronJob-254:ctx-757ec6a9) Process host VM state report from ping pr= ocess. host: 20 2015-09-15 09:37:07,806 DEBUG [c.c.v.VirtualMachinePowerStateSyncImpl] (Dir= ectAgentCronJob-254:ctx-757ec6a9) Process VM state report. host: 20, number= of records in report: 11 2015-09-15 09:37:07,806 DEBUG [c.c.v.VirtualMachinePowerStateSyncImpl] (Dir= ectAgentCronJob-254:ctx-757ec6a9) VM state report. host: 20, vm id: 356, po= wer state: PowerOn 2015-09-15 09:37:07,809 DEBUG [c.c.v.VirtualMachinePowerStateSyncImpl] (Dir= ectAgentCronJob-254:ctx-757ec6a9) VM power state does not change, skip DB w= riting. vm id: 356 2015-09-15 09:37:07,809 DEBUG [c.c.v.VirtualMachinePowerStateSyncImpl] (Dir= ectAgentCronJob-254:ctx-757ec6a9) VM state report. host: 20, vm id: 342, po= wer state: PowerOn 2015-09-15 09:37:07,825 DEBUG [c.c.v.VirtualMachinePowerStateSyncImpl] (Dir= ectAgentCronJob-254:ctx-757ec6a9) VM state report is updated. host: 20, vm = id: 342, power state: PowerOn 2015-09-15 09:37:07,832 INFO [c.c.v.VirtualMachineManagerImpl] (DirectAgen= tCronJob-254:ctx-757ec6a9) VM r-342-VM is at Stopped and we received a powe= r-on report while there is no pending jobs on it 2015-09-15 09:37:07,845 INFO [c.c.n.r.VirtualNetworkApplianceManagerImpl] = (DirectAgentCronJob-254:ctx-757ec6a9) Schedule a router reboot task as rout= er 342 is powered-on out-of-band. we need to reboot to refresh network rule= s 2015-09-15 09:37:07,846 INFO [c.c.n.r.VirtualNetworkApplianceManagerImpl] = (RouterMonitor-1:ctx-c118d628) Reboot router 342 to refresh network rules 2015-09-15 09:37:07,847 DEBUG [c.c.c.CapacityManagerImpl] (DirectAgentCronJ= ob-254:ctx-757ec6a9) VM state transitted from :Stopped to Running with even= t: FollowAgentPowerOnReportvm's original host id: 20 new host id: 20 host i= d before state transition: null {code} h2. user vm log for i-2-355-VM=20 {code:none} 2015-09-14 17:35:06,525 DEBUG [c.c.v.VirtualMachinePowerStateSyncImpl] (Dir= ectAgentCronJob-36:ctx-08ae2caf) Run missing VM report. current time: 14422= 44906525 2015-09-14 17:35:06,525 DEBUG [c.c.v.VirtualMachinePowerStateSyncImpl] (Dir= ectAgentCronJob-36:ctx-08ae2caf) Detected missing VM. host: 19, vm id: 355,= power state: PowerReportMissing, last state update: 1441982337000 2015-09-14 17:35:06,525 DEBUG [c.c.v.VirtualMachinePowerStateSyncImpl] (Dir= ectAgentCronJob-36:ctx-08ae2caf) vm id: 355 - time since last state update(= 262569525ms) has passed graceful period 2015-09-14 17:35:06,534 DEBUG [c.c.v.VirtualMachinePowerStateSyncImpl] (Dir= ectAgentCronJob-36:ctx-08ae2caf) VM state report is updated. host: 19, vm i= d: 355, power state: PowerReportMissing=20 2015-09-14 17:35:06,540 INFO [c.c.v.VirtualMachineManagerImpl] (DirectAgen= tCronJob-36:ctx-08ae2caf) VM i-2-355-VM is at Running and we received a pow= er-off report while there is no pending jobs on it 2015-09-14 17:35:06,546 DEBUG [c.c.a.t.Request] (DirectAgentCronJob-36:ctx-= 08ae2caf) Seq 19-4511199451741683988: Sending { Cmd , MgmtId: 345051122106= , via: 19(cu01-testpod01-esx03.stxt.media.int), Ver: v1, Flags: 100011, [{"= com.cloud.agent.api.StopCommand":{"isProxy":false,"executeInSequence":false= ,"checkBeforeCleanup":true,"vmName":"i-2-355-VM","wait":0}}] } 2015-09-14 17:35:06,546 DEBUG [c.c.a.t.Request] (DirectAgentCronJob-36:ctx-= 08ae2caf) Seq 19-4511199451741683988: Executing: { Cmd , MgmtId: 345051122= 106, via: 19(cu01-testpod01-esx03.stxt.media.int), Ver: v1, Flags: 100011, = [{"com.cloud.agent.api.StopCommand":{"isProxy":false,"executeInSequence":fa= lse,"checkBeforeCleanup":true,"vmName":"i-2-355-VM","wait":0}}] } 2015-09-14 17:35:06,546 DEBUG [c.c.a.m.DirectAgentAttache] (DirectAgent-424= :ctx-4793889d) Seq 19-4511199451741683988: Executing request 2015-09-14 17:35:06,547 INFO [c.c.h.v.r.VmwareResource] (DirectAgent-424:c= tx-4793889d cu01-testpod01-esx03.stxt.media.int, cmd: StopCommand) Executin= g resource StopCommand: {"isProxy":false,"executeInSequence":false,"checkBe= foreCleanup":true,"vmName":"i-2-355-VM","wait":0} 2015-09-14 17:35:06,565 DEBUG [c.c.h.v.m.HostMO] (DirectAgent-424:ctx-47938= 89d cu01-testpod01-esx03.stxt.media.int, cmd: StopCommand) find VM i-2-355-= VM on host 2015-09-14 17:35:06,565 INFO [c.c.h.v.m.HostMO] (DirectAgent-424:ctx-47938= 89d cu01-testpod01-esx03.stxt.media.int, cmd: StopCommand) VM i-2-355-VM no= t found in host cache 2015-09-14 17:35:06,565 DEBUG [c.c.h.v.m.HostMO] (DirectAgent-424:ctx-47938= 89d cu01-testpod01-esx03.stxt.media.int, cmd: StopCommand) load VM cache on= host 2015-09-14 17:35:06,595 INFO [c.c.h.v.r.VmwareResource] (DirectAgent-424:c= tx-4793889d cu01-testpod01-esx03.stxt.media.int, cmd: StopCommand) VM i-2-3= 55-VM is no longer in vSphere 2015-09-14 17:35:06,595 DEBUG [c.c.a.m.DirectAgentAttache] (DirectAgent-424= :ctx-4793889d) Seq 19-4511199451741683988: Response Received:=20 2015-09-14 17:35:06,595 DEBUG [c.c.a.t.Request] (DirectAgent-424:ctx-479388= 9d) Seq 19-4511199451741683988: Processing: { Ans: , MgmtId: 345051122106,= via: 19, Ver: v1, Flags: 10, [{"com.cloud.agent.api.StopAnswer":{"result":= true,"details":"VM i-2-355-VM is no longer in vSphere","wait":0}}] } 2015-09-14 17:35:06,595 DEBUG [c.c.a.t.Request] (DirectAgentCronJob-36:ctx-= 08ae2caf) Seq 19-4511199451741683988: Received: { Ans: , MgmtId: 345051122= 106, via: 19, Ver: v1, Flags: 10, { StopAnswer } } 2015-09-14 17:35:06,666 DEBUG [c.c.n.NetworkModelImpl] (DirectAgentCronJob-= 36:ctx-08ae2caf) Service SecurityGroup is not supported in the network id= =3D211 2015-09-14 17:35:06,672 DEBUG [c.c.n.NetworkModelImpl] (DirectAgentCronJob-= 36:ctx-08ae2caf) Service SecurityGroup is not supported in the network id= =3D211 2015-09-14 17:35:06,672 DEBUG [c.c.c.CapacityManagerImpl] (DirectAgentCronJ= ob-36:ctx-08ae2caf) VM state transitted from :Running to Stopped with event= : FollowAgentPowerOffReportvm's original host id: 19 new host id: null host= id before state transition: 19 2015-09-14 17:35:06,689 WARN [o.a.c.alerts] (DirectAgentCronJob-36:ctx-08a= e2caf) alertType:: 27 // dataCenterId:: 1 // podId:: 1 // clusterId:: null= // message:: VM state sync alert 2015-09-14 17:35:06,692 DEBUG [c.c.a.AlertManagerImpl] (DirectAgentCronJob-= 36:ctx-08ae2caf) Have already sent: 1 emails for alert type '27' -- skippin= g send email 2015-09-14 17:35:06,692 INFO [c.c.v.VirtualMachineManagerImpl] (DirectAgen= tCronJob-36:ctx-08ae2caf) VM i-2-355-VM is sync-ed to at Stopped state acco= rding to power-off report from hypervisor 2015-09-14 17:35:06,692 DEBUG [c.c.v.VirtualMachinePowerStateSyncImpl] (Dir= ectAgentCronJob-36:ctx-08ae2caf) Done with process of VM state report. host= : 19 ... 2015-09-14 17:35:07,754 DEBUG [c.c.v.VirtualMachinePowerStateSyncImpl] (Dir= ectAgentCronJob-137:ctx-80d2695e) VM state report is updated. host: 20, vm = id: 355, power state: PowerOn 2015-09-14 17:35:07,762 INFO [c.c.v.VirtualMachineManagerImpl] (DirectAgen= tCronJob-137:ctx-80d2695e) VM i-2-355-VM is at Stopped and we received a po= wer-on report while there is no pending jobs on it 2015-09-14 17:35:07,793 DEBUG [c.c.c.CapacityManagerImpl] (DirectAgentCronJ= ob-137:ctx-80d2695e) VM state transitted from :Stopped to Running with even= t: FollowAgentPowerOnReportvm's original host id: 19 new host id: 20 host i= d before state transition: nul {code} was: In some conditions (race condition), VR gets rebooted after a out of band m= igration was done on vCenter.=20 {panel:bgColor=3D#FFFFCE} Note, new global setting in 4.5.2 "VR reboot after out of band migration" i= s set to *false* and this looks more like a bug. {panel} After a VR migration to a host _and_ when the VM power state report gatheri= ng is running, the VR (and also any user VM as well) will get into the "Pow= erReportMissing". If the VM is a VR. it will be powered off and started again on vCenter. Tha= t is what we see. In can not be reproduced every time a migration was done,= but it seems the problem is related to "powerReportMissing". I grep-ed the source and found this line related - https://github.com/apache/cloudstack/blob/master/engine/orchestration/src= /com/cloud/vm/VirtualMachineManagerImpl.java#L3702 - https://github.com/apache/cloudstack/blob/4.5.2/engine/orchestration/src/co= m/cloud/vm/VirtualMachineManagerImpl.java#L3616 and also it seems that the graceful period might be also related,=20 - https://github.com/apache/cloudstack/blob/master/engine/orchestration/src= /com/cloud/vm/VirtualMachinePowerStateSyncImpl.java#L110 - https://github.com/apache/cloudstack/blob/4.5.2/engine/orchestration/src/co= m/cloud/vm/VirtualMachinePowerStateSyncImpl.java#L110 In case it is a user VM, we see in the logs that the state will be set to p= owered-off, but the VM keeps running. After a while a new VM power state re= port is running and the state for the user VM gets updated to Running again= . Below the logs=20 h2. VR r-342-VM reboot log {code:none} 2015-09-15 09:37:06,508 DEBUG [c.c.v.VirtualMachinePowerStateSyncImpl] (Dir= ectAgentCronJob-253:ctx-c4f59216) Run missing VM report. current time: 1442= 302626508 2015-09-15 09:37:06,508 DEBUG [c.c.v.VirtualMachinePowerStateSyncImpl] (Dir= ectAgentCronJob-253:ctx-c4f59216) Detected missing VM. host: 19, vm id: 342= , power state: PowerReportMissing, last state update: 1442302506000 2015-09-15 09:37:06,508 DEBUG [c.c.v.VirtualMachinePowerStateSyncImpl] (Dir= ectAgentCronJob-253:ctx-c4f59216) vm id: 342 - time since last state update= (120508ms) has passed graceful period 2015-09-15 09:37:06,517 DEBUG [c.c.v.VirtualMachinePowerStateSyncImpl] (Dir= ectAgentCronJob-253:ctx-c4f59216) VM state report is updated. host: 19, vm = id: 342, power state: PowerReportMissing=20 2015-09-15 09:37:06,525 INFO [c.c.v.VirtualMachineManagerImpl] (DirectAgen= tCronJob-253:ctx-c4f59216) VM r-342-VM is at Running and we received a powe= r-off report while there is no pending jobs on it 2015-09-15 09:37:06,532 DEBUG [c.c.a.t.Request] (DirectAgentCronJob-253:ctx= -c4f59216) Seq 19-4511199451741686482: Sending { Cmd , MgmtId: 34505112210= 6, via: 19(cu01-testpod01-esx03.stxt.media.int), Ver: v1, Flags: 100011, [{= "com.cloud.agent.api.StopCommand":{"isProxy":false,"executeInSequence":fals= e,"checkBeforeCleanup":true,"vmName":"r-342-VM","wait":0}}] } 2015-09-15 09:37:06,532 DEBUG [c.c.a.t.Request] (DirectAgentCronJob-253:ctx= -c4f59216) Seq 19-4511199451741686482: Executing: { Cmd , MgmtId: 34505112= 2106, via: 19(cu01-testpod01-esx03.stxt.media.int), Ver: v1, Flags: 100011,= [{"com.cloud.agent.api.StopCommand":{"isProxy":false,"executeInSequence":f= alse,"checkBeforeCleanup":true,"vmName":"r-342-VM","wait":0}}] } 2015-09-15 09:37:06,532 DEBUG [c.c.a.m.DirectAgentAttache] (DirectAgent-136= :ctx-9bc0a401) Seq 19-4511199451741686482: Executing request 2015-09-15 09:37:06,532 INFO [c.c.h.v.r.VmwareResource] (DirectAgent-136:c= tx-9bc0a401 cu01-testpod01-esx03.stxt.media.int, cmd: StopCommand) Executin= g resource StopCommand: {"isProxy":false,"executeInSequence":false,"checkBe= foreCleanup":true,"vmName":"r-342-VM","wait":0} 2015-09-15 09:37:06,551 DEBUG [c.c.h.v.m.HostMO] (DirectAgent-136:ctx-9bc0a= 401 cu01-testpod01-esx03.stxt.media.int, cmd: StopCommand) find VM r-342-VM= on host 2015-09-15 09:37:06,551 INFO [c.c.h.v.m.HostMO] (DirectAgent-136:ctx-9bc0a= 401 cu01-testpod01-esx03.stxt.media.int, cmd: StopCommand) VM r-342-VM not = found in host cache 2015-09-15 09:37:06,551 DEBUG [c.c.h.v.m.HostMO] (DirectAgent-136:ctx-9bc0a= 401 cu01-testpod01-esx03.stxt.media.int, cmd: StopCommand) load VM cache on= host 2015-09-15 09:37:06,577 INFO [c.c.h.v.r.VmwareResource] (DirectAgent-136:c= tx-9bc0a401 cu01-testpod01-esx03.stxt.media.int, cmd: StopCommand) VM r-342= -VM is no longer in vSphere 2015-09-15 09:37:06,578 DEBUG [c.c.a.m.DirectAgentAttache] (DirectAgent-136= :ctx-9bc0a401) Seq 19-4511199451741686482: Response Received:=20 2015-09-15 09:37:06,578 DEBUG [c.c.a.t.Request] (DirectAgent-136:ctx-9bc0a4= 01) Seq 19-4511199451741686482: Processing: { Ans: , MgmtId: 345051122106,= via: 19, Ver: v1, Flags: 10, [{"com.cloud.agent.api.StopAnswer":{"result":= true,"details":"VM r-342-VM is no longer in vSphere","wait":0}}] } 2015-09-15 09:37:06,578 DEBUG [c.c.a.t.Request] (DirectAgentCronJob-253:ctx= -c4f59216) Seq 19-4511199451741686482: Received: { Ans: , MgmtId: 34505112= 2106, via: 19, Ver: v1, Flags: 10, { StopAnswer } } 2015-09-15 09:37:06,585 DEBUG [c.c.n.r.VirtualNetworkApplianceManagerImpl] = (DirectAgentCronJob-253:ctx-c4f59216) Successfully updated user statistics = as a part of domR VM[DomainRouter|r-342-VM] reboot/stop 2015-09-15 09:37:06,617 DEBUG [c.c.c.CapacityManagerImpl] (DirectAgentCronJ= ob-253:ctx-c4f59216) VM state transitted from :Running to Stopped with even= t: FollowAgentPowerOffReportvm's original host id: 20 new host id: null hos= t id before state transition: 19 2015-09-15 09:37:06,627 WARN [o.a.c.alerts] (DirectAgentCronJob-253:ctx-c4= f59216) alertType:: 27 // dataCenterId:: 1 // podId:: 1 // clusterId:: nul= l // message:: VM state sync alert 2015-09-15 09:37:06,630 DEBUG [c.c.a.AlertManagerImpl] (DirectAgentCronJob-= 253:ctx-c4f59216) Have already sent: 1 emails for alert type '27' -- skippi= ng send email 2015-09-15 09:37:06,630 INFO [c.c.v.VirtualMachineManagerImpl] (DirectAgen= tCronJob-253:ctx-c4f59216) VM r-342-VM is sync-ed to at Stopped state accor= ding to power-off report from hypervisor 2015-09-15 09:37:06,630 DEBUG [c.c.v.VirtualMachinePowerStateSyncImpl] (Dir= ectAgentCronJob-253:ctx-c4f59216) Done with process of VM state report. hos= t: 19 2015-09-15 09:37:07,506 DEBUG [c.c.c.ConsoleProxyManagerImpl] (consoleproxy= -1:ctx-90d7ef3a) Zone 1 is ready to launch console proxy 2015-09-15 09:37:07,524 INFO [c.c.h.v.r.VmwareResource] (DirectAgentCronJo= b-64:ctx-47298215) Scan hung worker VM to recycle 2015-09-15 09:37:07,611 INFO [c.c.h.v.r.VmwareResource] (DirectAgentCronJo= b-254:ctx-757ec6a9) Scan hung worker VM to recycle 2015-09-15 09:37:07,722 DEBUG [c.c.a.m.DirectAgentAttache] (DirectAgentCron= Job-64:ctx-47298215) Ping from 25(cu01-testpod01-esx02.stxt.media.int) 2015-09-15 09:37:07,722 DEBUG [c.c.v.VirtualMachinePowerStateSyncImpl] (Dir= ectAgentCronJob-64:ctx-47298215) Process host VM state report from ping pro= cess. host: 25 2015-09-15 09:37:07,722 DEBUG [c.c.v.VirtualMachinePowerStateSyncImpl] (Dir= ectAgentCronJob-64:ctx-47298215) Process VM state report. host: 25, number = of records in report: 0 2015-09-15 09:37:07,723 DEBUG [c.c.v.VirtualMachinePowerStateSyncImpl] (Dir= ectAgentCronJob-64:ctx-47298215) Done with process of VM state report. host= : 25 2015-09-15 09:37:07,732 DEBUG [o.a.c.s.SecondaryStorageManagerImpl] (secsto= rage-1:ctx-d63339a0) Zone 1 is ready to launch secondary storage VM 2015-09-15 09:37:07,774 DEBUG [c.c.a.m.DirectAgentAttache] (DirectAgentCron= Job-254:ctx-757ec6a9) Ping from 20(cu01-testpod01-esx04.stxt.media.int) 2015-09-15 09:37:07,774 DEBUG [c.c.v.VirtualMachinePowerStateSyncImpl] (Dir= ectAgentCronJob-254:ctx-757ec6a9) Process host VM state report from ping pr= ocess. host: 20 2015-09-15 09:37:07,806 DEBUG [c.c.v.VirtualMachinePowerStateSyncImpl] (Dir= ectAgentCronJob-254:ctx-757ec6a9) Process VM state report. host: 20, number= of records in report: 11 2015-09-15 09:37:07,806 DEBUG [c.c.v.VirtualMachinePowerStateSyncImpl] (Dir= ectAgentCronJob-254:ctx-757ec6a9) VM state report. host: 20, vm id: 356, po= wer state: PowerOn 2015-09-15 09:37:07,809 DEBUG [c.c.v.VirtualMachinePowerStateSyncImpl] (Dir= ectAgentCronJob-254:ctx-757ec6a9) VM power state does not change, skip DB w= riting. vm id: 356 2015-09-15 09:37:07,809 DEBUG [c.c.v.VirtualMachinePowerStateSyncImpl] (Dir= ectAgentCronJob-254:ctx-757ec6a9) VM state report. host: 20, vm id: 342, po= wer state: PowerOn 2015-09-15 09:37:07,825 DEBUG [c.c.v.VirtualMachinePowerStateSyncImpl] (Dir= ectAgentCronJob-254:ctx-757ec6a9) VM state report is updated. host: 20, vm = id: 342, power state: PowerOn 2015-09-15 09:37:07,832 INFO [c.c.v.VirtualMachineManagerImpl] (DirectAgen= tCronJob-254:ctx-757ec6a9) VM r-342-VM is at Stopped and we received a powe= r-on report while there is no pending jobs on it 2015-09-15 09:37:07,845 INFO [c.c.n.r.VirtualNetworkApplianceManagerImpl] = (DirectAgentCronJob-254:ctx-757ec6a9) Schedule a router reboot task as rout= er 342 is powered-on out-of-band. we need to reboot to refresh network rule= s 2015-09-15 09:37:07,846 INFO [c.c.n.r.VirtualNetworkApplianceManagerImpl] = (RouterMonitor-1:ctx-c118d628) Reboot router 342 to refresh network rules 2015-09-15 09:37:07,847 DEBUG [c.c.c.CapacityManagerImpl] (DirectAgentCronJ= ob-254:ctx-757ec6a9) VM state transitted from :Stopped to Running with even= t: FollowAgentPowerOnReportvm's original host id: 20 new host id: 20 host i= d before state transition: null {code} h2. user vm log for i-2-355-VM=20 {code:none} 2015-09-14 17:35:06,525 DEBUG [c.c.v.VirtualMachinePowerStateSyncImpl] (Dir= ectAgentCronJob-36:ctx-08ae2caf) Run missing VM report. current time: 14422= 44906525 2015-09-14 17:35:06,525 DEBUG [c.c.v.VirtualMachinePowerStateSyncImpl] (Dir= ectAgentCronJob-36:ctx-08ae2caf) Detected missing VM. host: 19, vm id: 355,= power state: PowerReportMissing, last state update: 1441982337000 2015-09-14 17:35:06,525 DEBUG [c.c.v.VirtualMachinePowerStateSyncImpl] (Dir= ectAgentCronJob-36:ctx-08ae2caf) vm id: 355 - time since last state update(= 262569525ms) has passed graceful period 2015-09-14 17:35:06,534 DEBUG [c.c.v.VirtualMachinePowerStateSyncImpl] (Dir= ectAgentCronJob-36:ctx-08ae2caf) VM state report is updated. host: 19, vm i= d: 355, power state: PowerReportMissing=20 2015-09-14 17:35:06,540 INFO [c.c.v.VirtualMachineManagerImpl] (DirectAgen= tCronJob-36:ctx-08ae2caf) VM i-2-355-VM is at Running and we received a pow= er-off report while there is no pending jobs on it 2015-09-14 17:35:06,546 DEBUG [c.c.a.t.Request] (DirectAgentCronJob-36:ctx-= 08ae2caf) Seq 19-4511199451741683988: Sending { Cmd , MgmtId: 345051122106= , via: 19(cu01-testpod01-esx03.stxt.media.int), Ver: v1, Flags: 100011, [{"= com.cloud.agent.api.StopCommand":{"isProxy":false,"executeInSequence":false= ,"checkBeforeCleanup":true,"vmName":"i-2-355-VM","wait":0}}] } 2015-09-14 17:35:06,546 DEBUG [c.c.a.t.Request] (DirectAgentCronJob-36:ctx-= 08ae2caf) Seq 19-4511199451741683988: Executing: { Cmd , MgmtId: 345051122= 106, via: 19(cu01-testpod01-esx03.stxt.media.int), Ver: v1, Flags: 100011, = [{"com.cloud.agent.api.StopCommand":{"isProxy":false,"executeInSequence":fa= lse,"checkBeforeCleanup":true,"vmName":"i-2-355-VM","wait":0}}] } 2015-09-14 17:35:06,546 DEBUG [c.c.a.m.DirectAgentAttache] (DirectAgent-424= :ctx-4793889d) Seq 19-4511199451741683988: Executing request 2015-09-14 17:35:06,547 INFO [c.c.h.v.r.VmwareResource] (DirectAgent-424:c= tx-4793889d cu01-testpod01-esx03.stxt.media.int, cmd: StopCommand) Executin= g resource StopCommand: {"isProxy":false,"executeInSequence":false,"checkBe= foreCleanup":true,"vmName":"i-2-355-VM","wait":0} 2015-09-14 17:35:06,565 DEBUG [c.c.h.v.m.HostMO] (DirectAgent-424:ctx-47938= 89d cu01-testpod01-esx03.stxt.media.int, cmd: StopCommand) find VM i-2-355-= VM on host 2015-09-14 17:35:06,565 INFO [c.c.h.v.m.HostMO] (DirectAgent-424:ctx-47938= 89d cu01-testpod01-esx03.stxt.media.int, cmd: StopCommand) VM i-2-355-VM no= t found in host cache 2015-09-14 17:35:06,565 DEBUG [c.c.h.v.m.HostMO] (DirectAgent-424:ctx-47938= 89d cu01-testpod01-esx03.stxt.media.int, cmd: StopCommand) load VM cache on= host 2015-09-14 17:35:06,595 INFO [c.c.h.v.r.VmwareResource] (DirectAgent-424:c= tx-4793889d cu01-testpod01-esx03.stxt.media.int, cmd: StopCommand) VM i-2-3= 55-VM is no longer in vSphere 2015-09-14 17:35:06,595 DEBUG [c.c.a.m.DirectAgentAttache] (DirectAgent-424= :ctx-4793889d) Seq 19-4511199451741683988: Response Received:=20 2015-09-14 17:35:06,595 DEBUG [c.c.a.t.Request] (DirectAgent-424:ctx-479388= 9d) Seq 19-4511199451741683988: Processing: { Ans: , MgmtId: 345051122106,= via: 19, Ver: v1, Flags: 10, [{"com.cloud.agent.api.StopAnswer":{"result":= true,"details":"VM i-2-355-VM is no longer in vSphere","wait":0}}] } 2015-09-14 17:35:06,595 DEBUG [c.c.a.t.Request] (DirectAgentCronJob-36:ctx-= 08ae2caf) Seq 19-4511199451741683988: Received: { Ans: , MgmtId: 345051122= 106, via: 19, Ver: v1, Flags: 10, { StopAnswer } } 2015-09-14 17:35:06,666 DEBUG [c.c.n.NetworkModelImpl] (DirectAgentCronJob-= 36:ctx-08ae2caf) Service SecurityGroup is not supported in the network id= =3D211 2015-09-14 17:35:06,672 DEBUG [c.c.n.NetworkModelImpl] (DirectAgentCronJob-= 36:ctx-08ae2caf) Service SecurityGroup is not supported in the network id= =3D211 2015-09-14 17:35:06,672 DEBUG [c.c.c.CapacityManagerImpl] (DirectAgentCronJ= ob-36:ctx-08ae2caf) VM state transitted from :Running to Stopped with event= : FollowAgentPowerOffReportvm's original host id: 19 new host id: null host= id before state transition: 19 2015-09-14 17:35:06,689 WARN [o.a.c.alerts] (DirectAgentCronJob-36:ctx-08a= e2caf) alertType:: 27 // dataCenterId:: 1 // podId:: 1 // clusterId:: null= // message:: VM state sync alert 2015-09-14 17:35:06,692 DEBUG [c.c.a.AlertManagerImpl] (DirectAgentCronJob-= 36:ctx-08ae2caf) Have already sent: 1 emails for alert type '27' -- skippin= g send email 2015-09-14 17:35:06,692 INFO [c.c.v.VirtualMachineManagerImpl] (DirectAgen= tCronJob-36:ctx-08ae2caf) VM i-2-355-VM is sync-ed to at Stopped state acco= rding to power-off report from hypervisor 2015-09-14 17:35:06,692 DEBUG [c.c.v.VirtualMachinePowerStateSyncImpl] (Dir= ectAgentCronJob-36:ctx-08ae2caf) Done with process of VM state report. host= : 19 ... 2015-09-14 17:35:07,754 DEBUG [c.c.v.VirtualMachinePowerStateSyncImpl] (Dir= ectAgentCronJob-137:ctx-80d2695e) VM state report is updated. host: 20, vm = id: 355, power state: PowerOn 2015-09-14 17:35:07,762 INFO [c.c.v.VirtualMachineManagerImpl] (DirectAgen= tCronJob-137:ctx-80d2695e) VM i-2-355-VM is at Stopped and we received a po= wer-on report while there is no pending jobs on it 2015-09-14 17:35:07,793 DEBUG [c.c.c.CapacityManagerImpl] (DirectAgentCronJ= ob-137:ctx-80d2695e) VM state transitted from :Stopped to Running with even= t: FollowAgentPowerOnReportvm's original host id: 19 new host id: 20 host i= d before state transition: nul {code} > Unexpected VR reboot after out-of-band migration > ------------------------------------------------ > > Key: CLOUDSTACK-8848 > URL: https://issues.apache.org/jira/browse/CLOUDSTACK-884= 8 > Project: CloudStack > Issue Type: Bug > Security Level: Public(Anyone can view this level - this is the defa= ult.)=20 > Components: VMware > Affects Versions: 4.5.2 > Reporter: Ren=C3=A9 Moser > Priority: Critical > Fix For: 4.5.3 > > > In some conditions (race condition), VR gets rebooted after a out of band= migration was done on vCenter.=20 > {panel:bgColor=3D#FFFFCE} > Note, new global setting in 4.5.2 "VR reboot after out of band migration"= is set to *false* and this looks more like a bug. > {panel} > After a VR migration to a host _and_ when the VM power state report gathe= ring is running, the VR (and also any user VM as well) will get into the "P= owerReportMissing". > If the VM is a VR. it will be powered off and started again on vCenter. T= hat is what we see. In can not be reproduced every time a migration was don= e, but it seems the problem is related to "powerReportMissing". > I grep-ed the source and found this line related > https://github.com/apache/cloudstack/blob/4.5.2/engine/orchestration/src/= com/cloud/vm/VirtualMachineManagerImpl.java#L3616 > and also it seems that the graceful period might be also related,=20 > https://github.com/apache/cloudstack/blob/4.5.2/engine/orchestration/src/= com/cloud/vm/VirtualMachinePowerStateSyncImpl.java#L110 > In case it is a user VM, we see in the logs that the state will be set to= powered-off, but the VM keeps running. After a while a new VM power state = report is running and the state for the user VM gets updated to Running aga= in. Below the logs=20 > h2. VR r-342-VM reboot log > {code:none} > 2015-09-15 09:37:06,508 DEBUG [c.c.v.VirtualMachinePowerStateSyncImpl] (D= irectAgentCronJob-253:ctx-c4f59216) Run missing VM report. current time: 14= 42302626508 > 2015-09-15 09:37:06,508 DEBUG [c.c.v.VirtualMachinePowerStateSyncImpl] (D= irectAgentCronJob-253:ctx-c4f59216) Detected missing VM. host: 19, vm id: 3= 42, power state: PowerReportMissing, last state update: 1442302506000 > 2015-09-15 09:37:06,508 DEBUG [c.c.v.VirtualMachinePowerStateSyncImpl] (D= irectAgentCronJob-253:ctx-c4f59216) vm id: 342 - time since last state upda= te(120508ms) has passed graceful period > 2015-09-15 09:37:06,517 DEBUG [c.c.v.VirtualMachinePowerStateSyncImpl] (D= irectAgentCronJob-253:ctx-c4f59216) VM state report is updated. host: 19, v= m id: 342, power state: PowerReportMissing=20 > 2015-09-15 09:37:06,525 INFO [c.c.v.VirtualMachineManagerImpl] (DirectAg= entCronJob-253:ctx-c4f59216) VM r-342-VM is at Running and we received a po= wer-off report while there is no pending jobs on it > 2015-09-15 09:37:06,532 DEBUG [c.c.a.t.Request] (DirectAgentCronJob-253:c= tx-c4f59216) Seq 19-4511199451741686482: Sending { Cmd , MgmtId: 345051122= 106, via: 19(cu01-testpod01-esx03.stxt.media.int), Ver: v1, Flags: 100011, = [{"com.cloud.agent.api.StopCommand":{"isProxy":false,"executeInSequence":fa= lse,"checkBeforeCleanup":true,"vmName":"r-342-VM","wait":0}}] } > 2015-09-15 09:37:06,532 DEBUG [c.c.a.t.Request] (DirectAgentCronJob-253:c= tx-c4f59216) Seq 19-4511199451741686482: Executing: { Cmd , MgmtId: 345051= 122106, via: 19(cu01-testpod01-esx03.stxt.media.int), Ver: v1, Flags: 10001= 1, [{"com.cloud.agent.api.StopCommand":{"isProxy":false,"executeInSequence"= :false,"checkBeforeCleanup":true,"vmName":"r-342-VM","wait":0}}] } > 2015-09-15 09:37:06,532 DEBUG [c.c.a.m.DirectAgentAttache] (DirectAgent-1= 36:ctx-9bc0a401) Seq 19-4511199451741686482: Executing request > 2015-09-15 09:37:06,532 INFO [c.c.h.v.r.VmwareResource] (DirectAgent-136= :ctx-9bc0a401 cu01-testpod01-esx03.stxt.media.int, cmd: StopCommand) Execut= ing resource StopCommand: {"isProxy":false,"executeInSequence":false,"check= BeforeCleanup":true,"vmName":"r-342-VM","wait":0} > 2015-09-15 09:37:06,551 DEBUG [c.c.h.v.m.HostMO] (DirectAgent-136:ctx-9bc= 0a401 cu01-testpod01-esx03.stxt.media.int, cmd: StopCommand) find VM r-342-= VM on host > 2015-09-15 09:37:06,551 INFO [c.c.h.v.m.HostMO] (DirectAgent-136:ctx-9bc= 0a401 cu01-testpod01-esx03.stxt.media.int, cmd: StopCommand) VM r-342-VM no= t found in host cache > 2015-09-15 09:37:06,551 DEBUG [c.c.h.v.m.HostMO] (DirectAgent-136:ctx-9bc= 0a401 cu01-testpod01-esx03.stxt.media.int, cmd: StopCommand) load VM cache = on host > 2015-09-15 09:37:06,577 INFO [c.c.h.v.r.VmwareResource] (DirectAgent-136= :ctx-9bc0a401 cu01-testpod01-esx03.stxt.media.int, cmd: StopCommand) VM r-3= 42-VM is no longer in vSphere > 2015-09-15 09:37:06,578 DEBUG [c.c.a.m.DirectAgentAttache] (DirectAgent-1= 36:ctx-9bc0a401) Seq 19-4511199451741686482: Response Received:=20 > 2015-09-15 09:37:06,578 DEBUG [c.c.a.t.Request] (DirectAgent-136:ctx-9bc0= a401) Seq 19-4511199451741686482: Processing: { Ans: , MgmtId: 34505112210= 6, via: 19, Ver: v1, Flags: 10, [{"com.cloud.agent.api.StopAnswer":{"result= ":true,"details":"VM r-342-VM is no longer in vSphere","wait":0}}] } > 2015-09-15 09:37:06,578 DEBUG [c.c.a.t.Request] (DirectAgentCronJob-253:c= tx-c4f59216) Seq 19-4511199451741686482: Received: { Ans: , MgmtId: 345051= 122106, via: 19, Ver: v1, Flags: 10, { StopAnswer } } > 2015-09-15 09:37:06,585 DEBUG [c.c.n.r.VirtualNetworkApplianceManagerImpl= ] (DirectAgentCronJob-253:ctx-c4f59216) Successfully updated user statistic= s as a part of domR VM[DomainRouter|r-342-VM] reboot/stop > 2015-09-15 09:37:06,617 DEBUG [c.c.c.CapacityManagerImpl] (DirectAgentCro= nJob-253:ctx-c4f59216) VM state transitted from :Running to Stopped with ev= ent: FollowAgentPowerOffReportvm's original host id: 20 new host id: null h= ost id before state transition: 19 > 2015-09-15 09:37:06,627 WARN [o.a.c.alerts] (DirectAgentCronJob-253:ctx-= c4f59216) alertType:: 27 // dataCenterId:: 1 // podId:: 1 // clusterId:: n= ull // message:: VM state sync alert > 2015-09-15 09:37:06,630 DEBUG [c.c.a.AlertManagerImpl] (DirectAgentCronJo= b-253:ctx-c4f59216) Have already sent: 1 emails for alert type '27' -- skip= ping send email > 2015-09-15 09:37:06,630 INFO [c.c.v.VirtualMachineManagerImpl] (DirectAg= entCronJob-253:ctx-c4f59216) VM r-342-VM is sync-ed to at Stopped state acc= ording to power-off report from hypervisor > 2015-09-15 09:37:06,630 DEBUG [c.c.v.VirtualMachinePowerStateSyncImpl] (D= irectAgentCronJob-253:ctx-c4f59216) Done with process of VM state report. h= ost: 19 > 2015-09-15 09:37:07,506 DEBUG [c.c.c.ConsoleProxyManagerImpl] (consolepro= xy-1:ctx-90d7ef3a) Zone 1 is ready to launch console proxy > 2015-09-15 09:37:07,524 INFO [c.c.h.v.r.VmwareResource] (DirectAgentCron= Job-64:ctx-47298215) Scan hung worker VM to recycle > 2015-09-15 09:37:07,611 INFO [c.c.h.v.r.VmwareResource] (DirectAgentCron= Job-254:ctx-757ec6a9) Scan hung worker VM to recycle > 2015-09-15 09:37:07,722 DEBUG [c.c.a.m.DirectAgentAttache] (DirectAgentCr= onJob-64:ctx-47298215) Ping from 25(cu01-testpod01-esx02.stxt.media.int) > 2015-09-15 09:37:07,722 DEBUG [c.c.v.VirtualMachinePowerStateSyncImpl] (D= irectAgentCronJob-64:ctx-47298215) Process host VM state report from ping p= rocess. host: 25 > 2015-09-15 09:37:07,722 DEBUG [c.c.v.VirtualMachinePowerStateSyncImpl] (D= irectAgentCronJob-64:ctx-47298215) Process VM state report. host: 25, numbe= r of records in report: 0 > 2015-09-15 09:37:07,723 DEBUG [c.c.v.VirtualMachinePowerStateSyncImpl] (D= irectAgentCronJob-64:ctx-47298215) Done with process of VM state report. ho= st: 25 > 2015-09-15 09:37:07,732 DEBUG [o.a.c.s.SecondaryStorageManagerImpl] (secs= torage-1:ctx-d63339a0) Zone 1 is ready to launch secondary storage VM > 2015-09-15 09:37:07,774 DEBUG [c.c.a.m.DirectAgentAttache] (DirectAgentCr= onJob-254:ctx-757ec6a9) Ping from 20(cu01-testpod01-esx04.stxt.media.int) > 2015-09-15 09:37:07,774 DEBUG [c.c.v.VirtualMachinePowerStateSyncImpl] (D= irectAgentCronJob-254:ctx-757ec6a9) Process host VM state report from ping = process. host: 20 > 2015-09-15 09:37:07,806 DEBUG [c.c.v.VirtualMachinePowerStateSyncImpl] (D= irectAgentCronJob-254:ctx-757ec6a9) Process VM state report. host: 20, numb= er of records in report: 11 > 2015-09-15 09:37:07,806 DEBUG [c.c.v.VirtualMachinePowerStateSyncImpl] (D= irectAgentCronJob-254:ctx-757ec6a9) VM state report. host: 20, vm id: 356, = power state: PowerOn > 2015-09-15 09:37:07,809 DEBUG [c.c.v.VirtualMachinePowerStateSyncImpl] (D= irectAgentCronJob-254:ctx-757ec6a9) VM power state does not change, skip DB= writing. vm id: 356 > 2015-09-15 09:37:07,809 DEBUG [c.c.v.VirtualMachinePowerStateSyncImpl] (D= irectAgentCronJob-254:ctx-757ec6a9) VM state report. host: 20, vm id: 342, = power state: PowerOn > 2015-09-15 09:37:07,825 DEBUG [c.c.v.VirtualMachinePowerStateSyncImpl] (D= irectAgentCronJob-254:ctx-757ec6a9) VM state report is updated. host: 20, v= m id: 342, power state: PowerOn > 2015-09-15 09:37:07,832 INFO [c.c.v.VirtualMachineManagerImpl] (DirectAg= entCronJob-254:ctx-757ec6a9) VM r-342-VM is at Stopped and we received a po= wer-on report while there is no pending jobs on it > 2015-09-15 09:37:07,845 INFO [c.c.n.r.VirtualNetworkApplianceManagerImpl= ] (DirectAgentCronJob-254:ctx-757ec6a9) Schedule a router reboot task as ro= uter 342 is powered-on out-of-band. we need to reboot to refresh network ru= les > 2015-09-15 09:37:07,846 INFO [c.c.n.r.VirtualNetworkApplianceManagerImpl= ] (RouterMonitor-1:ctx-c118d628) Reboot router 342 to refresh network rules > 2015-09-15 09:37:07,847 DEBUG [c.c.c.CapacityManagerImpl] (DirectAgentCro= nJob-254:ctx-757ec6a9) VM state transitted from :Stopped to Running with ev= ent: FollowAgentPowerOnReportvm's original host id: 20 new host id: 20 host= id before state transition: null > {code} > h2. user vm log for i-2-355-VM=20 > {code:none} > 2015-09-14 17:35:06,525 DEBUG [c.c.v.VirtualMachinePowerStateSyncImpl] (D= irectAgentCronJob-36:ctx-08ae2caf) Run missing VM report. current time: 144= 2244906525 > 2015-09-14 17:35:06,525 DEBUG [c.c.v.VirtualMachinePowerStateSyncImpl] (D= irectAgentCronJob-36:ctx-08ae2caf) Detected missing VM. host: 19, vm id: 35= 5, power state: PowerReportMissing, last state update: 1441982337000 > 2015-09-14 17:35:06,525 DEBUG [c.c.v.VirtualMachinePowerStateSyncImpl] (D= irectAgentCronJob-36:ctx-08ae2caf) vm id: 355 - time since last state updat= e(262569525ms) has passed graceful period > 2015-09-14 17:35:06,534 DEBUG [c.c.v.VirtualMachinePowerStateSyncImpl] (D= irectAgentCronJob-36:ctx-08ae2caf) VM state report is updated. host: 19, vm= id: 355, power state: PowerReportMissing=20 > 2015-09-14 17:35:06,540 INFO [c.c.v.VirtualMachineManagerImpl] (DirectAg= entCronJob-36:ctx-08ae2caf) VM i-2-355-VM is at Running and we received a p= ower-off report while there is no pending jobs on it > 2015-09-14 17:35:06,546 DEBUG [c.c.a.t.Request] (DirectAgentCronJob-36:ct= x-08ae2caf) Seq 19-4511199451741683988: Sending { Cmd , MgmtId: 3450511221= 06, via: 19(cu01-testpod01-esx03.stxt.media.int), Ver: v1, Flags: 100011, [= {"com.cloud.agent.api.StopCommand":{"isProxy":false,"executeInSequence":fal= se,"checkBeforeCleanup":true,"vmName":"i-2-355-VM","wait":0}}] } > 2015-09-14 17:35:06,546 DEBUG [c.c.a.t.Request] (DirectAgentCronJob-36:ct= x-08ae2caf) Seq 19-4511199451741683988: Executing: { Cmd , MgmtId: 3450511= 22106, via: 19(cu01-testpod01-esx03.stxt.media.int), Ver: v1, Flags: 100011= , [{"com.cloud.agent.api.StopCommand":{"isProxy":false,"executeInSequence":= false,"checkBeforeCleanup":true,"vmName":"i-2-355-VM","wait":0}}] } > 2015-09-14 17:35:06,546 DEBUG [c.c.a.m.DirectAgentAttache] (DirectAgent-4= 24:ctx-4793889d) Seq 19-4511199451741683988: Executing request > 2015-09-14 17:35:06,547 INFO [c.c.h.v.r.VmwareResource] (DirectAgent-424= :ctx-4793889d cu01-testpod01-esx03.stxt.media.int, cmd: StopCommand) Execut= ing resource StopCommand: {"isProxy":false,"executeInSequence":false,"check= BeforeCleanup":true,"vmName":"i-2-355-VM","wait":0} > 2015-09-14 17:35:06,565 DEBUG [c.c.h.v.m.HostMO] (DirectAgent-424:ctx-479= 3889d cu01-testpod01-esx03.stxt.media.int, cmd: StopCommand) find VM i-2-35= 5-VM on host > 2015-09-14 17:35:06,565 INFO [c.c.h.v.m.HostMO] (DirectAgent-424:ctx-479= 3889d cu01-testpod01-esx03.stxt.media.int, cmd: StopCommand) VM i-2-355-VM = not found in host cache > 2015-09-14 17:35:06,565 DEBUG [c.c.h.v.m.HostMO] (DirectAgent-424:ctx-479= 3889d cu01-testpod01-esx03.stxt.media.int, cmd: StopCommand) load VM cache = on host > 2015-09-14 17:35:06,595 INFO [c.c.h.v.r.VmwareResource] (DirectAgent-424= :ctx-4793889d cu01-testpod01-esx03.stxt.media.int, cmd: StopCommand) VM i-2= -355-VM is no longer in vSphere > 2015-09-14 17:35:06,595 DEBUG [c.c.a.m.DirectAgentAttache] (DirectAgent-4= 24:ctx-4793889d) Seq 19-4511199451741683988: Response Received:=20 > 2015-09-14 17:35:06,595 DEBUG [c.c.a.t.Request] (DirectAgent-424:ctx-4793= 889d) Seq 19-4511199451741683988: Processing: { Ans: , MgmtId: 34505112210= 6, via: 19, Ver: v1, Flags: 10, [{"com.cloud.agent.api.StopAnswer":{"result= ":true,"details":"VM i-2-355-VM is no longer in vSphere","wait":0}}] } > 2015-09-14 17:35:06,595 DEBUG [c.c.a.t.Request] (DirectAgentCronJob-36:ct= x-08ae2caf) Seq 19-4511199451741683988: Received: { Ans: , MgmtId: 3450511= 22106, via: 19, Ver: v1, Flags: 10, { StopAnswer } } > 2015-09-14 17:35:06,666 DEBUG [c.c.n.NetworkModelImpl] (DirectAgentCronJo= b-36:ctx-08ae2caf) Service SecurityGroup is not supported in the network id= =3D211 > 2015-09-14 17:35:06,672 DEBUG [c.c.n.NetworkModelImpl] (DirectAgentCronJo= b-36:ctx-08ae2caf) Service SecurityGroup is not supported in the network id= =3D211 > 2015-09-14 17:35:06,672 DEBUG [c.c.c.CapacityManagerImpl] (DirectAgentCro= nJob-36:ctx-08ae2caf) VM state transitted from :Running to Stopped with eve= nt: FollowAgentPowerOffReportvm's original host id: 19 new host id: null ho= st id before state transition: 19 > 2015-09-14 17:35:06,689 WARN [o.a.c.alerts] (DirectAgentCronJob-36:ctx-0= 8ae2caf) alertType:: 27 // dataCenterId:: 1 // podId:: 1 // clusterId:: nu= ll // message:: VM state sync alert > 2015-09-14 17:35:06,692 DEBUG [c.c.a.AlertManagerImpl] (DirectAgentCronJo= b-36:ctx-08ae2caf) Have already sent: 1 emails for alert type '27' -- skipp= ing send email > 2015-09-14 17:35:06,692 INFO [c.c.v.VirtualMachineManagerImpl] (DirectAg= entCronJob-36:ctx-08ae2caf) VM i-2-355-VM is sync-ed to at Stopped state ac= cording to power-off report from hypervisor > 2015-09-14 17:35:06,692 DEBUG [c.c.v.VirtualMachinePowerStateSyncImpl] (D= irectAgentCronJob-36:ctx-08ae2caf) Done with process of VM state report. ho= st: 19 > ... > 2015-09-14 17:35:07,754 DEBUG [c.c.v.VirtualMachinePowerStateSyncImpl] (D= irectAgentCronJob-137:ctx-80d2695e) VM state report is updated. host: 20, v= m id: 355, power state: PowerOn > 2015-09-14 17:35:07,762 INFO [c.c.v.VirtualMachineManagerImpl] (DirectAg= entCronJob-137:ctx-80d2695e) VM i-2-355-VM is at Stopped and we received a = power-on report while there is no pending jobs on it > 2015-09-14 17:35:07,793 DEBUG [c.c.c.CapacityManagerImpl] (DirectAgentCro= nJob-137:ctx-80d2695e) VM state transitted from :Stopped to Running with ev= ent: FollowAgentPowerOnReportvm's original host id: 19 new host id: 20 host= id before state transition: nul > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)