cloudstack-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CLOUDSTACK-8848) Unexpected VR reboot after out-of-band migration
Date Sun, 27 Sep 2015 10:47:04 GMT

    [ https://issues.apache.org/jira/browse/CLOUDSTACK-8848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14909683#comment-14909683
] 

ASF GitHub Bot commented on CLOUDSTACK-8848:
--------------------------------------------

Github user karuturi commented on the pull request:

    https://github.com/apache/cloudstack/pull/885#issuecomment-143542706
  
    code LGTM. I am facing some issues with my devcloud4 setup and hence couldnt run tests.



> Unexpected VR reboot after out-of-band migration
> ------------------------------------------------
>
>                 Key: CLOUDSTACK-8848
>                 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-8848
>             Project: CloudStack
>          Issue Type: Bug
>      Security Level: Public(Anyone can view this level - this is the default.) 
>          Components: VMware
>    Affects Versions: 4.5.2, 4.6.0
>            Reporter: René Moser
>            Priority: Blocker
>             Fix For: 4.5.3, 4.6.0
>
>
> In some conditions (race condition), VR gets rebooted after a out of band migration was
done on vCenter. 
> {panel:bgColor=#FFFFCE}
> Note, new global setting in 4.5.2 "VR reboot after out of band migration" is set to *false*
and this looks more like a bug.
> {panel}
> After a VR migration to a host _and_ when the VM power state report gathering is running,
the VR (and also any user VM as well) will get into the "PowerReportMissing".
> If the VM is a VR. it will be powered off and started again on vCenter. That is what
we see. In can not be reproduced every time a migration was done, but it seems the problem
is related to "powerReportMissing".
> I grep-ed the source and found this line related
> https://github.com/apache/cloudstack/blob/4.5.2/engine/orchestration/src/com/cloud/vm/VirtualMachineManagerImpl.java#L3616
> and also it seems that the graceful period might be also related, 
> https://github.com/apache/cloudstack/blob/4.5.2/engine/orchestration/src/com/cloud/vm/VirtualMachinePowerStateSyncImpl.java#L110
> In case it is a user VM, we see in the logs that the state will be set to powered-off,
but the VM keeps running. After a while a new VM power state report is running and the state
for the user VM gets updated to Running again. Below the logs 
> h2. VR  r-342-VM reboot log
> {code:none}
> 2015-09-15 09:37:06,508 DEBUG [c.c.v.VirtualMachinePowerStateSyncImpl] (DirectAgentCronJob-253:ctx-c4f59216)
Run missing VM report. current time: 1442302626508
> 2015-09-15 09:37:06,508 DEBUG [c.c.v.VirtualMachinePowerStateSyncImpl] (DirectAgentCronJob-253:ctx-c4f59216)
Detected missing VM. host: 19, vm id: 342, power state: PowerReportMissing, last state update:
1442302506000
> 2015-09-15 09:37:06,508 DEBUG [c.c.v.VirtualMachinePowerStateSyncImpl] (DirectAgentCronJob-253:ctx-c4f59216)
vm id: 342 - time since last state update(120508ms) has passed graceful period
> 2015-09-15 09:37:06,517 DEBUG [c.c.v.VirtualMachinePowerStateSyncImpl] (DirectAgentCronJob-253:ctx-c4f59216)
VM state report is updated. host: 19, vm id: 342, power state: PowerReportMissing 
> 2015-09-15 09:37:06,525 INFO  [c.c.v.VirtualMachineManagerImpl] (DirectAgentCronJob-253:ctx-c4f59216)
VM r-342-VM is at Running and we received a power-off report while there is no pending jobs
on it
> 2015-09-15 09:37:06,532 DEBUG [c.c.a.t.Request] (DirectAgentCronJob-253:ctx-c4f59216)
Seq 19-4511199451741686482: Sending  { Cmd , MgmtId: 345051122106, via: 19(cu01-testpod01-esx03.stxt.media.int),
Ver: v1, Flags: 100011, [{"com.cloud.agent.api.StopCommand":{"isProxy":false,"executeInSequence":false,"checkBeforeCleanup":true,"vmName":"r-342-VM","wait":0}}]
}
> 2015-09-15 09:37:06,532 DEBUG [c.c.a.t.Request] (DirectAgentCronJob-253:ctx-c4f59216)
Seq 19-4511199451741686482: Executing:  { Cmd , MgmtId: 345051122106, via: 19(cu01-testpod01-esx03.stxt.media.int),
Ver: v1, Flags: 100011, [{"com.cloud.agent.api.StopCommand":{"isProxy":false,"executeInSequence":false,"checkBeforeCleanup":true,"vmName":"r-342-VM","wait":0}}]
}
> 2015-09-15 09:37:06,532 DEBUG [c.c.a.m.DirectAgentAttache] (DirectAgent-136:ctx-9bc0a401)
Seq 19-4511199451741686482: Executing request
> 2015-09-15 09:37:06,532 INFO  [c.c.h.v.r.VmwareResource] (DirectAgent-136:ctx-9bc0a401
cu01-testpod01-esx03.stxt.media.int, cmd: StopCommand) Executing resource StopCommand: {"isProxy":false,"executeInSequence":false,"checkBeforeCleanup":true,"vmName":"r-342-VM","wait":0}
> 2015-09-15 09:37:06,551 DEBUG [c.c.h.v.m.HostMO] (DirectAgent-136:ctx-9bc0a401 cu01-testpod01-esx03.stxt.media.int,
cmd: StopCommand) find VM r-342-VM on host
> 2015-09-15 09:37:06,551 INFO  [c.c.h.v.m.HostMO] (DirectAgent-136:ctx-9bc0a401 cu01-testpod01-esx03.stxt.media.int,
cmd: StopCommand) VM r-342-VM not found in host cache
> 2015-09-15 09:37:06,551 DEBUG [c.c.h.v.m.HostMO] (DirectAgent-136:ctx-9bc0a401 cu01-testpod01-esx03.stxt.media.int,
cmd: StopCommand) load VM cache on host
> 2015-09-15 09:37:06,577 INFO  [c.c.h.v.r.VmwareResource] (DirectAgent-136:ctx-9bc0a401
cu01-testpod01-esx03.stxt.media.int, cmd: StopCommand) VM r-342-VM is no longer in vSphere
> 2015-09-15 09:37:06,578 DEBUG [c.c.a.m.DirectAgentAttache] (DirectAgent-136:ctx-9bc0a401)
Seq 19-4511199451741686482: Response Received: 
> 2015-09-15 09:37:06,578 DEBUG [c.c.a.t.Request] (DirectAgent-136:ctx-9bc0a401) Seq 19-4511199451741686482:
Processing:  { Ans: , MgmtId: 345051122106, via: 19, Ver: v1, Flags: 10, [{"com.cloud.agent.api.StopAnswer":{"result":true,"details":"VM
r-342-VM is no longer in vSphere","wait":0}}] }
> 2015-09-15 09:37:06,578 DEBUG [c.c.a.t.Request] (DirectAgentCronJob-253:ctx-c4f59216)
Seq 19-4511199451741686482: Received:  { Ans: , MgmtId: 345051122106, via: 19, Ver: v1, Flags:
10, { StopAnswer } }
> 2015-09-15 09:37:06,585 DEBUG [c.c.n.r.VirtualNetworkApplianceManagerImpl] (DirectAgentCronJob-253:ctx-c4f59216)
Successfully updated user statistics as a part of domR VM[DomainRouter|r-342-VM] reboot/stop
> 2015-09-15 09:37:06,617 DEBUG [c.c.c.CapacityManagerImpl] (DirectAgentCronJob-253:ctx-c4f59216)
VM state transitted from :Running to Stopped with event: FollowAgentPowerOffReportvm's original
host id: 20 new host id: null host id before state transition: 19
> 2015-09-15 09:37:06,627 WARN  [o.a.c.alerts] (DirectAgentCronJob-253:ctx-c4f59216)  alertType::
27 // dataCenterId:: 1 // podId:: 1 // clusterId:: null // message:: VM state sync alert
> 2015-09-15 09:37:06,630 DEBUG [c.c.a.AlertManagerImpl] (DirectAgentCronJob-253:ctx-c4f59216)
Have already sent: 1 emails for alert type '27' -- skipping send email
> 2015-09-15 09:37:06,630 INFO  [c.c.v.VirtualMachineManagerImpl] (DirectAgentCronJob-253:ctx-c4f59216)
VM r-342-VM is sync-ed to at Stopped state according to power-off report from hypervisor
> 2015-09-15 09:37:06,630 DEBUG [c.c.v.VirtualMachinePowerStateSyncImpl] (DirectAgentCronJob-253:ctx-c4f59216)
Done with process of VM state report. host: 19
> 2015-09-15 09:37:07,506 DEBUG [c.c.c.ConsoleProxyManagerImpl] (consoleproxy-1:ctx-90d7ef3a)
Zone 1 is ready to launch console proxy
> 2015-09-15 09:37:07,524 INFO  [c.c.h.v.r.VmwareResource] (DirectAgentCronJob-64:ctx-47298215)
Scan hung worker VM to recycle
> 2015-09-15 09:37:07,611 INFO  [c.c.h.v.r.VmwareResource] (DirectAgentCronJob-254:ctx-757ec6a9)
Scan hung worker VM to recycle
> 2015-09-15 09:37:07,722 DEBUG [c.c.a.m.DirectAgentAttache] (DirectAgentCronJob-64:ctx-47298215)
Ping from 25(cu01-testpod01-esx02.stxt.media.int)
> 2015-09-15 09:37:07,722 DEBUG [c.c.v.VirtualMachinePowerStateSyncImpl] (DirectAgentCronJob-64:ctx-47298215)
Process host VM state report from ping process. host: 25
> 2015-09-15 09:37:07,722 DEBUG [c.c.v.VirtualMachinePowerStateSyncImpl] (DirectAgentCronJob-64:ctx-47298215)
Process VM state report. host: 25, number of records in report: 0
> 2015-09-15 09:37:07,723 DEBUG [c.c.v.VirtualMachinePowerStateSyncImpl] (DirectAgentCronJob-64:ctx-47298215)
Done with process of VM state report. host: 25
> 2015-09-15 09:37:07,732 DEBUG [o.a.c.s.SecondaryStorageManagerImpl] (secstorage-1:ctx-d63339a0)
Zone 1 is ready to launch secondary storage VM
> 2015-09-15 09:37:07,774 DEBUG [c.c.a.m.DirectAgentAttache] (DirectAgentCronJob-254:ctx-757ec6a9)
Ping from 20(cu01-testpod01-esx04.stxt.media.int)
> 2015-09-15 09:37:07,774 DEBUG [c.c.v.VirtualMachinePowerStateSyncImpl] (DirectAgentCronJob-254:ctx-757ec6a9)
Process host VM state report from ping process. host: 20
> 2015-09-15 09:37:07,806 DEBUG [c.c.v.VirtualMachinePowerStateSyncImpl] (DirectAgentCronJob-254:ctx-757ec6a9)
Process VM state report. host: 20, number of records in report: 11
> 2015-09-15 09:37:07,806 DEBUG [c.c.v.VirtualMachinePowerStateSyncImpl] (DirectAgentCronJob-254:ctx-757ec6a9)
VM state report. host: 20, vm id: 356, power state: PowerOn
> 2015-09-15 09:37:07,809 DEBUG [c.c.v.VirtualMachinePowerStateSyncImpl] (DirectAgentCronJob-254:ctx-757ec6a9)
VM power state does not change, skip DB writing. vm id: 356
> 2015-09-15 09:37:07,809 DEBUG [c.c.v.VirtualMachinePowerStateSyncImpl] (DirectAgentCronJob-254:ctx-757ec6a9)
VM state report. host: 20, vm id: 342, power state: PowerOn
> 2015-09-15 09:37:07,825 DEBUG [c.c.v.VirtualMachinePowerStateSyncImpl] (DirectAgentCronJob-254:ctx-757ec6a9)
VM state report is updated. host: 20, vm id: 342, power state: PowerOn
> 2015-09-15 09:37:07,832 INFO  [c.c.v.VirtualMachineManagerImpl] (DirectAgentCronJob-254:ctx-757ec6a9)
VM r-342-VM is at Stopped and we received a power-on report while there is no pending jobs
on it
> 2015-09-15 09:37:07,845 INFO  [c.c.n.r.VirtualNetworkApplianceManagerImpl] (DirectAgentCronJob-254:ctx-757ec6a9)
Schedule a router reboot task as router 342 is powered-on out-of-band. we need to reboot to
refresh network rules
> 2015-09-15 09:37:07,846 INFO  [c.c.n.r.VirtualNetworkApplianceManagerImpl] (RouterMonitor-1:ctx-c118d628)
Reboot router 342 to refresh network rules
> 2015-09-15 09:37:07,847 DEBUG [c.c.c.CapacityManagerImpl] (DirectAgentCronJob-254:ctx-757ec6a9)
VM state transitted from :Stopped to Running with event: FollowAgentPowerOnReportvm's original
host id: 20 new host id: 20 host id before state transition: null
> {code}
> h2. user vm log for  i-2-355-VM 
> {code:none}
> 2015-09-14 17:35:06,525 DEBUG [c.c.v.VirtualMachinePowerStateSyncImpl] (DirectAgentCronJob-36:ctx-08ae2caf)
Run missing VM report. current time: 1442244906525
> 2015-09-14 17:35:06,525 DEBUG [c.c.v.VirtualMachinePowerStateSyncImpl] (DirectAgentCronJob-36:ctx-08ae2caf)
Detected missing VM. host: 19, vm id: 355, power state: PowerReportMissing, last state update:
1441982337000
> 2015-09-14 17:35:06,525 DEBUG [c.c.v.VirtualMachinePowerStateSyncImpl] (DirectAgentCronJob-36:ctx-08ae2caf)
vm id: 355 - time since last state update(262569525ms) has passed graceful period
> 2015-09-14 17:35:06,534 DEBUG [c.c.v.VirtualMachinePowerStateSyncImpl] (DirectAgentCronJob-36:ctx-08ae2caf)
VM state report is updated. host: 19, vm id: 355, power state: PowerReportMissing 
> 2015-09-14 17:35:06,540 INFO  [c.c.v.VirtualMachineManagerImpl] (DirectAgentCronJob-36:ctx-08ae2caf)
VM i-2-355-VM is at Running and we received a power-off report while there is no pending jobs
on it
> 2015-09-14 17:35:06,546 DEBUG [c.c.a.t.Request] (DirectAgentCronJob-36:ctx-08ae2caf)
Seq 19-4511199451741683988: Sending  { Cmd , MgmtId: 345051122106, via: 19(cu01-testpod01-esx03.stxt.media.int),
Ver: v1, Flags: 100011, [{"com.cloud.agent.api.StopCommand":{"isProxy":false,"executeInSequence":false,"checkBeforeCleanup":true,"vmName":"i-2-355-VM","wait":0}}]
}
> 2015-09-14 17:35:06,546 DEBUG [c.c.a.t.Request] (DirectAgentCronJob-36:ctx-08ae2caf)
Seq 19-4511199451741683988: Executing:  { Cmd , MgmtId: 345051122106, via: 19(cu01-testpod01-esx03.stxt.media.int),
Ver: v1, Flags: 100011, [{"com.cloud.agent.api.StopCommand":{"isProxy":false,"executeInSequence":false,"checkBeforeCleanup":true,"vmName":"i-2-355-VM","wait":0}}]
}
> 2015-09-14 17:35:06,546 DEBUG [c.c.a.m.DirectAgentAttache] (DirectAgent-424:ctx-4793889d)
Seq 19-4511199451741683988: Executing request
> 2015-09-14 17:35:06,547 INFO  [c.c.h.v.r.VmwareResource] (DirectAgent-424:ctx-4793889d
cu01-testpod01-esx03.stxt.media.int, cmd: StopCommand) Executing resource StopCommand: {"isProxy":false,"executeInSequence":false,"checkBeforeCleanup":true,"vmName":"i-2-355-VM","wait":0}
> 2015-09-14 17:35:06,565 DEBUG [c.c.h.v.m.HostMO] (DirectAgent-424:ctx-4793889d cu01-testpod01-esx03.stxt.media.int,
cmd: StopCommand) find VM i-2-355-VM on host
> 2015-09-14 17:35:06,565 INFO  [c.c.h.v.m.HostMO] (DirectAgent-424:ctx-4793889d cu01-testpod01-esx03.stxt.media.int,
cmd: StopCommand) VM i-2-355-VM not found in host cache
> 2015-09-14 17:35:06,565 DEBUG [c.c.h.v.m.HostMO] (DirectAgent-424:ctx-4793889d cu01-testpod01-esx03.stxt.media.int,
cmd: StopCommand) load VM cache on host
> 2015-09-14 17:35:06,595 INFO  [c.c.h.v.r.VmwareResource] (DirectAgent-424:ctx-4793889d
cu01-testpod01-esx03.stxt.media.int, cmd: StopCommand) VM i-2-355-VM is no longer in vSphere
> 2015-09-14 17:35:06,595 DEBUG [c.c.a.m.DirectAgentAttache] (DirectAgent-424:ctx-4793889d)
Seq 19-4511199451741683988: Response Received: 
> 2015-09-14 17:35:06,595 DEBUG [c.c.a.t.Request] (DirectAgent-424:ctx-4793889d) Seq 19-4511199451741683988:
Processing:  { Ans: , MgmtId: 345051122106, via: 19, Ver: v1, Flags: 10, [{"com.cloud.agent.api.StopAnswer":{"result":true,"details":"VM
i-2-355-VM is no longer in vSphere","wait":0}}] }
> 2015-09-14 17:35:06,595 DEBUG [c.c.a.t.Request] (DirectAgentCronJob-36:ctx-08ae2caf)
Seq 19-4511199451741683988: Received:  { Ans: , MgmtId: 345051122106, via: 19, Ver: v1, Flags:
10, { StopAnswer } }
> 2015-09-14 17:35:06,666 DEBUG [c.c.n.NetworkModelImpl] (DirectAgentCronJob-36:ctx-08ae2caf)
Service SecurityGroup is not supported in the network id=211
> 2015-09-14 17:35:06,672 DEBUG [c.c.n.NetworkModelImpl] (DirectAgentCronJob-36:ctx-08ae2caf)
Service SecurityGroup is not supported in the network id=211
> 2015-09-14 17:35:06,672 DEBUG [c.c.c.CapacityManagerImpl] (DirectAgentCronJob-36:ctx-08ae2caf)
VM state transitted from :Running to Stopped with event: FollowAgentPowerOffReportvm's original
host id: 19 new host id: null host id before state transition: 19
> 2015-09-14 17:35:06,689 WARN  [o.a.c.alerts] (DirectAgentCronJob-36:ctx-08ae2caf)  alertType::
27 // dataCenterId:: 1 // podId:: 1 // clusterId:: null // message:: VM state sync alert
> 2015-09-14 17:35:06,692 DEBUG [c.c.a.AlertManagerImpl] (DirectAgentCronJob-36:ctx-08ae2caf)
Have already sent: 1 emails for alert type '27' -- skipping send email
> 2015-09-14 17:35:06,692 INFO  [c.c.v.VirtualMachineManagerImpl] (DirectAgentCronJob-36:ctx-08ae2caf)
VM i-2-355-VM is sync-ed to at Stopped state according to power-off report from hypervisor
> 2015-09-14 17:35:06,692 DEBUG [c.c.v.VirtualMachinePowerStateSyncImpl] (DirectAgentCronJob-36:ctx-08ae2caf)
Done with process of VM state report. host: 19
> ...
> 2015-09-14 17:35:07,754 DEBUG [c.c.v.VirtualMachinePowerStateSyncImpl] (DirectAgentCronJob-137:ctx-80d2695e)
VM state report is updated. host: 20, vm id: 355, power state: PowerOn
> 2015-09-14 17:35:07,762 INFO  [c.c.v.VirtualMachineManagerImpl] (DirectAgentCronJob-137:ctx-80d2695e)
VM i-2-355-VM is at Stopped and we received a power-on report while there is no pending jobs
on it
> 2015-09-14 17:35:07,793 DEBUG [c.c.c.CapacityManagerImpl] (DirectAgentCronJob-137:ctx-80d2695e)
VM state transitted from :Stopped to Running with event: FollowAgentPowerOnReportvm's original
host id: 19 new host id: 20 host id before state transition: nul
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message