cloudstack-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Mani Prashanth Varma Manthena (JIRA)" <j...@apache.org>
Subject [jira] [Created] (CLOUDSTACK-9575) ACS 4.9 + VMware/ESXi 6.0: Sometimes VRs are failing to fully come up into running state
Date Fri, 04 Nov 2016 10:21:58 GMT
Mani Prashanth Varma Manthena created CLOUDSTACK-9575:
---------------------------------------------------------

             Summary: ACS 4.9 + VMware/ESXi 6.0: Sometimes VRs are failing to fully come up
into running state
                 Key: CLOUDSTACK-9575
                 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-9575
             Project: CloudStack
          Issue Type: Bug
      Security Level: Public (Anyone can view this level - this is the default.)
          Components: Management Server, VMware
    Affects Versions: 4.9.0
            Reporter: Mani Prashanth Varma Manthena
            Priority: Critical


Here is the exception in the cloudstack-management server log:
{noformat}
2016-10-21 07:33:53,361 WARN  [c.c.n.v.VpcManagerImpl] (API-Job-Executor-62:ctx-749d66c8 job-102
ctx-25e96d70) (logid:502d5b39) Failed to start vpc [VPC [9-nuageVpcWithoutStaticNatNetACL]
due to
com.cloud.exception.AgentUnavailableException: Resource [Host:3] is unreachable: Host 3: Unable
to start instance due to Unable to start  VM:bd09d7ab-81f1-4053-a838-869d108cb827 due to error
in finalizeStart, not retrying
        at com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMachineManagerImpl.java:1120)
        at com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMachineManagerImpl.java:4592)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at com.cloud.vm.VmWorkJobHandlerProxy.handleVmWorkJob(VmWorkJobHandlerProxy.java:107)
        at com.cloud.vm.VirtualMachineManagerImpl.handleVmWorkJob(VirtualMachineManagerImpl.java:4753)
        at com.cloud.vm.VmWorkJobDispatcher.runJob(VmWorkJobDispatcher.java:102)
        at org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.runInContext(AsyncJobManagerImpl.java:554)
        at org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49)
        at org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)
        at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)
        at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)
        at org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46)
        at org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.run(AsyncJobManagerImpl.java:502)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Caused by: com.cloud.utils.exception.ExecutionException: Unable to start  VM:bd09d7ab-81f1-4053-a838-869d108cb827
due to error in finalizeStart, not retrying
        at com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMachineManagerImpl.java:1084)
        at com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMachineManagerImpl.java:4592)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        ... 18 more
2016-10-21 07:33:53,362 DEBUG [c.c.n.v.VpcManagerImpl] (API-Job-Executor-62:ctx-749d66c8 job-102
ctx-25e96d70) (logid:502d5b39) Destroying vpc [VPC [9-nuageVpcWithoutStaticNatNetACL] that
failed to start
2016-10-21 07:33:53,362 DEBUG [c.c.n.v.VpcManagerImpl] (API-Job-Executor-62:ctx-749d66c8 job-102
ctx-25e96d70) (logid:502d5b39) Destroying vpc [VPC [9-nuageVpcWithoutStaticNatNetACL]
2016-10-21 07:33:53,363 DEBUG [c.c.n.v.VpcManagerImpl] (API-Job-Executor-62:ctx-749d66c8 job-102
ctx-25e96d70) (logid:502d5b39) Updating VPC [VPC [9-nuageVpcWithoutStaticNatNetACL] with state
Inactive as a part of vpc delete
2016-10-21 07:33:53,373 DEBUG [c.c.n.v.VpcManagerImpl] (API-Job-Executor-62:ctx-749d66c8 job-102
ctx-25e96d70) (logid:502d5b39) Shutting down vpc [VPC [9-nuageVpcWithoutStaticNatNetACL]
2016-10-21 07:33:53,376 DEBUG [c.c.n.r.NetworkHelperImpl] (API-Job-Executor-62:ctx-749d66c8
job-102 ctx-25e96d70) (logid:502d5b39) Attempting to destroy router 13
{noformat}

Note: We are hitting this issue 1 out of three times in our regressions.

Here are example failing regression runs due to this issue:
http://mvdcdev28.us.alcatel-lucent.com/results/2016/Month_10/Oct_20/08:07:50.Prashanth.Manthena/masterlog_list.html
http://mvdcdev28.us.alcatel-lucent.com/results/2016/Month_10/Oct_20/10:28:09.Prashanth.Manthena/masterlog_list.html
http://mvdcdev28.us.alcatel-lucent.com/results/2016/Month_10/Oct_21/05:53:01.Prashanth.Manthena/masterlog_list.html


Here are the failures on the ESXi 6.0 host:
{noformat}
2016-10-25T05:54:12.964Z error hostd[2B981B70] [Originator@6876 sub=Hbrsvc] Failed to retrieve
VM config (id=7)                                                                         
                           
2016-10-25T05:54:12.964Z error hostd[2B981B70] [Originator@6876 sub=Hbrsvc] Replicator: VmReconfig
failed to retrieve replication config for VM 7, ignoring: vim.fault.ReplicationVmConfigFault
2016-10-25T05:54:12.996Z warning hostd[2BB81B70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/469adbc4-fd0b07aa/r-7-VM/r-7-VM.vmx
opID=76510a94-ea-f9-04a0 user=vpxuser:VSPHERE.LOCAL\Administrator] CannotRetrieveCorefiles:
VM is in an invalid state
2016-10-25T05:54:13.022Z info hostd[2BB81B70] [Originator@6876 sub=Libs opID=76510a94-ea-f9-04a0
user=vpxuser:VSPHERE.LOCAL\Administrator] Error getting extended config name from config file.
                    
2016-10-25T05:54:13.022Z info hostd[2BB81B70] [Originator@6876 sub=Libs opID=76510a94-ea-f9-04a0
user=vpxuser:VSPHERE.LOCAL\Administrator] Error getting extended config file path.
2016-10-25T05:54:55.764Z info hostd[2BC85B70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/469adbc4-fd0b07aa/r-7-VM/r-7-VM.vmx
opID=HB-SpecSync-host-3855@93-2beb0d5-87-06ad user=vpxuser] State Transition (VM_STATE_ON
-> VM_STATE_RECONFIGURING)
2016-10-25T05:54:55.765Z info hostd[2BC85B70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/469adbc4-fd0b07aa/r-7-VM/r-7-VM.vmx
opID=HB-SpecSync-host-3855@93-2beb0d5-87-06ad user=vpxuser] State Transition (VM_STATE_RECONFIGURING
-> VM_STATE_ON)
2016-10-25T05:54:55.766Z info hostd[2BBC2B70] [Originator@6876 sub=Guestsvc.GuestFileTransferImpl]
Entered VmPowerStateListener                                                             
                       
2016-10-25T05:54:55.766Z info hostd[2BBC2B70] [Originator@6876 sub=Guestsvc.GuestFileTransferImpl]
VmPowerStateListener succeeded                                                           
                       
2016-10-25T05:54:55.766Z info hostd[2BBC2B70] [Originator@6876 sub=Hbrsvc] Replicator: powerstate
change VM: 7 Old: 1 New: 1                                                               
                        
2016-10-25T05:54:55.766Z info hostd[2BC85B70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/469adbc4-fd0b07aa/r-7-VM/r-7-VM.vmx
opID=HB-SpecSync-host-3855@93-2beb0d5-87-06ad user=vpxuser] Send config update invoked
2016-10-25T05:54:55.783Z info hostd[2BC85B70] [Originator@6876 sub=Vimsvc.TaskManager opID=HB-SpecSync-host-3855@93-2beb0d5-87-06ad
user=vpxuser] Task Completed : haTask-ha-root-pool-vim.ResourcePool.updateChildResourceConfiguration-1928
Status success
2016-10-25T05:54:58.330Z info hostd[2A59FB70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/469adbc4-fd0b07aa/r-7-VM/r-7-VM.vmx]
Send config update invoked                                                           
2016-10-25T05:54:58.366Z info hostd[2A59FB70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/469adbc4-fd0b07aa/r-7-VM/r-7-VM.vmx]
Send config update invoked                                                           
2016-10-25T05:54:58.385Z info hostd[2A59FB70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/469adbc4-fd0b07aa/r-7-VM/r-7-VM.vmx]
Send config update invoked                                                           
2016-10-25T05:54:58.438Z info hostd[2A59FB70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/469adbc4-fd0b07aa/r-7-VM/r-7-VM.vmx]
Tools manifest version status changed from guestToolsNotInstalled to guestToolsUnmana
2016-10-25T05:54:58.447Z info hostd[2A59FB70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/469adbc4-fd0b07aa/r-7-VM/r-7-VM.vmx]
Send config update invoked                                                           
2016-10-25T05:54:58.469Z info hostd[2A59FB70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/469adbc4-fd0b07aa/r-7-VM/r-7-VM.vmx]
Send config update invoked                                                           
2016-10-25T05:54:58.492Z info hostd[2A59FB70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/469adbc4-fd0b07aa/r-7-VM/r-7-VM.vmx]
Tools manifest version status changed from guestToolsUnmanaged to guestToolsUnmanaged
2016-10-25T05:54:58.498Z info hostd[2A59FB70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/469adbc4-fd0b07aa/r-7-VM/r-7-VM.vmx]
Send config update invoked                                                           
2016-10-25T05:54:58.515Z info hostd[2A59FB70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/469adbc4-fd0b07aa/r-7-VM/r-7-VM.vmx]
Setting the tools properties cache.                                                  
2016-10-25T05:55:05.555Z info hostd[FFA02A70] [Originator@6876 sub=Vimsvc.TaskManager opID=134074df-7f-06cc
user=vpxuser:VSPHERE.LOCAL\vpxd-extension-57107a00-4014-11e6-85ca-90e2ba4b0016] Task Created
: haTask-ha-host-vim.HostSystem.acquireCimServicesTicket-1943
2016-10-25T05:55:05.556Z info hostd[2BBC2B70] [Originator@6876 sub=Vimsvc.TaskManager opID=134074df-7f-06cc
user=vpxuser:VSPHERE.LOCAL\vpxd-extension-57107a00-4014-11e6-85ca-90e2ba4b0016] Task Completed
: haTask-ha-host-vim.HostSystem.acquireCimServicesTicket-1943 Status success
2016-10-25T05:55:05.780Z info hostd[2A59FB70] [Originator@6876 sub=Vimsvc.TaskManager opID=HB-host-3855@531-609c8d99-79-06d3
user=vpxuser] Task Created : haTask--vim.OverheadService.downloadVMXConfig-1946        
2016-10-25T05:55:05.781Z info hostd[2A59FB70] [Originator@6876 sub=Vimsvc.TaskManager opID=HB-host-3855@531-609c8d99-79-06d3
user=vpxuser] Task Completed : haTask--vim.OverheadService.downloadVMXConfig-1946 Status success
2016-10-25T05:55:28.986Z info hostd[2BBC2B70] [Originator@6876 sub=Vimsvc.TaskManager opID=358a7cd8-cf-06ea
user=vpxuser:VSPHERE.LOCAL\Administrator] Task Created : haTask-7-vim.VirtualMachine.shutdownGuest-1957
2016-10-25T05:55:28.987Z info hostd[2BBC2B70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/469adbc4-fd0b07aa/r-7-VM/r-7-VM.vmx
opID=358a7cd8-cf-06ea user=vpxuser:VSPHERE.LOCAL\Administrator] State Transition (VM_STATE_ON
-> VM_STATE_SHUTTING_DOWN)
2016-10-25T05:55:28.987Z info hostd[2BBC2B70] [Originator@6876 sub=Vimsvc.ha-eventmgr opID=358a7cd8-cf-06ea
user=vpxuser:VSPHERE.LOCAL\Administrator] Event 496 : Guest OS shut down for r-7-VM on esxi-2.mvdcdev28.us.alcatel-lucent.com
in ha-datacenter
2016-10-25T05:55:29.202Z info hostd[2BBC2B70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/469adbc4-fd0b07aa/r-7-VM/r-7-VM.vmx
opID=358a7cd8-cf-06ea user=vpxuser:VSPHERE.LOCAL\Administrator] State Transition (VM_STATE_SHUTTING_DOWN
-> VM_STATE_ON_SHUTTING_DOWN)
2016-10-25T05:55:29.203Z info hostd[2BBC2B70] [Originator@6876 sub=Vimsvc.TaskManager opID=358a7cd8-cf-06ea
user=vpxuser:VSPHERE.LOCAL\Administrator] Task Completed : haTask-7-vim.VirtualMachine.shutdownGuest-1957
Status success
2016-10-25T05:55:30.163Z info hostd[2BB81B70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/469adbc4-fd0b07aa/r-7-VM/r-7-VM.vmx]
Send config update invoked                                                           
2016-10-25T05:55:30.182Z info hostd[2BB81B70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/469adbc4-fd0b07aa/r-7-VM/r-7-VM.vmx]
Send config update invoked                                                           
[LikewiseGetDomainJoinInfo:355] QueryInformation(): ERROR_FILE_NOT_FOUND (2/0):          
                                                                                         
                                
2016-10-25T05:55:48.770Z info hostd[2A59FB70] [Originator@6876 sub=Hostsvc opID=6ea70743]
VsanSystemVmkProvider : GetRuntimeInfo: Complete, runtime info: (vim.vsan.host.VsanRuntimeInfo)
{                         
-->    accessGenNo = 0,                                                               
                                                                                         
                                   
--> }                                                                                 
                                                                                         
                                   
2016-10-25T05:55:54.814Z info hostd[2B981B70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/469adbc4-fd0b07aa/r-7-VM/r-7-VM.vmx]
Send config update invoked                                                           
2016-10-25T05:55:54.825Z info hostd[2C080B70] [Originator@6876 sub=Vimsvc.ha-eventmgr] Event
497 : The dvPort 262 link was down in the vSphere Distributed Switch  in ha-datacenter   
                             
2016-10-25T05:55:54.887Z warning hostd[FFA43B70] [Originator@6876 sub=Libs] VigorOnlineStatusCb:
Failed to find RPC Request in pending list for sequence 0                                
                         
2016-10-25T05:55:54.887Z info hostd[FFA43B70] [Originator@6876 sub=Libs] SOCKET 5 (44)   
                                                                                         
                                
2016-10-25T05:55:54.887Z info hostd[FFA43B70] [Originator@6876 sub=Libs] recv detected client
closed connection                                                                        
                            
2016-10-25T05:55:54.887Z info hostd[FFA43B70] [Originator@6876 sub=Libs] VigorTransportClientProcessError:
Remote connection failure                                                                
               
2016-10-25T05:55:54.887Z info hostd[FFA43B70] [Originator@6876 sub=Libs] VigorTransportClientDrainRecv:
draining read.                                                                           
                  
2016-10-25T05:55:54.887Z info hostd[FFA43B70] [Originator@6876 sub=Libs] SOCKET 5 (44)   
                                                                                         
                                
2016-10-25T05:55:54.887Z info hostd[FFA43B70] [Originator@6876 sub=Libs] recv detected client
closed connection                                                                        
                            
2016-10-25T05:55:54.887Z info hostd[FFA43B70] [Originator@6876 sub=Libs] VigorTransportClientProcessError:
closing connection.                                                                      
               
2016-10-25T05:55:54.887Z info hostd[FFA43B70] [Originator@6876 sub=Libs] VigorTransportClientManageConnection:
connection closed.                                                                       
           
2016-10-25T05:55:54.892Z info hostd[FFA43B70] [Originator@6876 sub=Libs] CnxAuthdProtoSecureConnect:
Unencrypted connection, skipping thumbprint exchange.                                    
                     
2016-10-25T05:55:54.900Z info hostd[FFA43B70] [Originator@6876 sub=Libs] CnxConnectAuthd:
Returning false because CnxAuthdProtoConnect failed                                      
                                
2016-10-25T05:55:54.900Z info hostd[FFA43B70] [Originator@6876 sub=Libs] Cnx_Connect: Returning
false because CnxConnectAuthd failed                                                     
                          
2016-10-25T05:55:54.900Z info hostd[FFA43B70] [Originator@6876 sub=vm:Cnx_Connect: Error message:
There is no VMware process running for config file /vmfs/volumes/469adbc4-fd0b07aa/r-7-VM/r-7-VM.vmx]
            
2016-10-25T05:55:54.900Z warning hostd[FFA43B70] [Originator@6876 sub=vm:VigorTransportClientManageConnection:
Failed to re-connect to VM /vmfs/volumes/469adbc4-fd0b07aa/r-7-VM/r-7-VM.vmx]            
           
2016-10-25T05:55:54.902Z info hostd[FFA43B70] [Originator@6876 sub=Libs] VigorOnlineDisconnectCb:
connection closed (is final).                                                            
                        
2016-10-25T05:55:54.920Z info hostd[2B981B70] [Originator@6876 sub=Hostsvc] Lookupvm: Cartel
ID not set for VM 7                                                                      
                             
2016-10-25T05:55:54.920Z warning hostd[2B981B70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/469adbc4-fd0b07aa/r-7-VM/r-7-VM.vmx]
Unable to get resource settings for a powered on VM                               
2016-10-25T05:55:54.920Z info hostd[2B981B70] [Originator@6876 sub=Hostsvc] Lookupvm: Cartel
ID not set for VM 7                                                                      
                             
2016-10-25T05:55:54.931Z info hostd[2B981B70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/469adbc4-fd0b07aa/r-7-VM/r-7-VM.vmx]
Send config update invoked                                                           
2016-10-25T05:55:54.952Z info hostd[2B981B70] [Originator@6876 sub=Hostsvc] Lookupvm: Cartel
ID not set for VM 7                                                                      
                             
2016-10-25T05:55:54.952Z warning hostd[2B981B70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/469adbc4-fd0b07aa/r-7-VM/r-7-VM.vmx]
Unable to get resource settings for a powered on VM                               
2016-10-25T05:55:54.952Z info hostd[2B981B70] [Originator@6876 sub=Hostsvc] Lookupvm: Cartel
ID not set for VM 7                                                                      
                             
2016-10-25T05:55:54.962Z info hostd[2B981B70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/469adbc4-fd0b07aa/r-7-VM/r-7-VM.vmx]
State Transition (VM_STATE_ON_SHUTTING_DOWN -> VM_STATE_OFF)
2016-10-25T05:55:54.963Z info hostd[2A59FB70] [Originator@6876 sub=Guestsvc.GuestFileTransferImpl]
Entered VmPowerStateListener                                                             
                       
2016-10-25T05:55:54.963Z info hostd[2A59FB70] [Originator@6876 sub=Guestsvc.GuestFileTransferImpl]
VmPowerStateListener succeeded                                                           
                       
2016-10-25T05:55:54.963Z info hostd[2A59FB70] [Originator@6876 sub=Hbrsvc] Replicator: powerstate
change VM: 7 Old: 1 New: 0                                                               
                        
2016-10-25T05:55:54.963Z info hostd[2A59FB70] [Originator@6876 sub=Hbrsvc] Replicator: Poweroff
for VM: (id=7)                                                                           
                          
2016-10-25T05:55:54.963Z warning hostd[2B981B70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/469adbc4-fd0b07aa/r-7-VM/r-7-VM.vmx]
Failed to find activation record, event user unknown.                             
2016-10-25T05:55:54.964Z info hostd[2B981B70] [Originator@6876 sub=Vimsvc.ha-eventmgr] Event
498 : r-7-VM on  esxi-2.mvdcdev28.us.alcatel-lucent.com in ha-datacenter is powered off
{noformat}

Workaround: Redo/retry the operation that brings up the VRs (e.g. create VPCs/networks, restart
VPCs/networks with cleanup, etc.).




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message