cloudstack-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Mani Prashanth Varma Manthena (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (CLOUDSTACK-9547) ACS 4.9 + VMware: Unable to remove one of the NICs of a multi-nic guest VM
Date Tue, 18 Oct 2016 09:12:58 GMT

    [ https://issues.apache.org/jira/browse/CLOUDSTACK-9547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15584950#comment-15584950
] 

Mani Prashanth Varma Manthena edited comment on CLOUDSTACK-9547 at 10/18/16 9:12 AM:
-------------------------------------------------------------------------------------

I am only hitting this issue for guest VMs (i.e. not with VPC VRs) created in ACS 4.9 (i.e.
not in ACS 4.7) with VMware setups.

Moreover, I get the same error when I am trying to remove the NIC (i.e. network adapter) directly
from VMware's Vcenter.

There is a possible workaround for this issue from VMware on Internet, which doesn't work
in this scenario both from CloudStack and VMware: https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2081503

Most likely, this issue has something to do with how we deploy (multi-nic) guest VMs in ACS
4.9 with VMware setups.

However, installing vmware tools (not easy in our built-in templates) made the problem go
away.  With the tools installed the removal was pretty much instant.

Doesn't explain the difference between 4.7 and 4.9 that we are seeing on the same VMware setup
regarding this issue.


was (Author: prashanth varma manthena):
I am only hitting this issue for guest VMs (i.e. not with VPC VRs) created in ACS 4.9 (i.e.
not in ACS 4.7) with VMware setups.

Moreover, I get the same error when I am trying to remove the NIC (i.e. network adapter) directly
from VMware's Vcenter.

There is a possible workaround for this issue from VMware on Internet, which doesn't work
in this scenario both from CloudStack and VMware: https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2081503

Most likely, this issue has something to do with how we deploy (multi-nic) guest VMs in ACS
4.9 with VMware setups.

However, installing vmware tools (not easy in our built-in templates) made the problem go
away.  With the tools installed the removal was pretty much instant.

Doesn't explain the difference 4.7 and 4.9 that we are seeing though on the same VMware setup.

> ACS 4.9 + VMware: Unable to remove one of the NICs of a multi-nic guest VM
> --------------------------------------------------------------------------
>
>                 Key: CLOUDSTACK-9547
>                 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-9547
>             Project: CloudStack
>          Issue Type: Bug
>      Security Level: Public(Anyone can view this level - this is the default.) 
>          Components: Management Server, VMware
>    Affects Versions: 4.9.0
>            Reporter: Mani Prashanth Varma Manthena
>            Priority: Critical
>              Labels: vmware
>
> Steps to reproduce:
> 1) Deploy a multi-nic guest VM (or) add a nic to a single-nic guest VM
> 2) Remove the non-default nic from the multi-nic guest VM, which fails with the following
error/exception in the management server log:
> {noformat}
> 2016-10-05 06:13:28,251 DEBUG [c.c.a.ApiServlet] (catalina-exec-14:ctx-f8dc6bd0 ctx-ee610e01)
(logid:58e9cf98) ===END===  10.31.52.95 -- GET  command=queryAsyncJobResult&jobId=9ad66ce9-6e1b-4c25-bd2e-763f4586dd86&response=json&_=1475673245452
> 2016-10-05 06:13:29,787 ERROR [c.c.h.v.r.VmwareResource] (DirectAgent-302:ctx-78a58d67
10.31.56.178, job-171/job-172, cmd: UnPlugNicCommand) (logid:9ad66ce9) Unexpected exception:
> java.lang.RuntimeException: The guest operating system did not respond to a hot-remove
request for device ethernet1 in a timely manner.
>         at com.cloud.hypervisor.vmware.util.VmwareClient.waitForTask(VmwareClient.java:354)
>         at com.cloud.hypervisor.vmware.mo.VirtualMachineMO.configureVm(VirtualMachineMO.java:949)
>         at com.cloud.hypervisor.vmware.resource.VmwareResource.execute(VmwareResource.java:1103)
>         at com.cloud.hypervisor.vmware.resource.VmwareResource.executeRequest(VmwareResource.java:469)
>         at com.cloud.agent.manager.DirectAgentAttache$Task.runInContext(DirectAgentAttache.java:315)
>         at org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49)
>         at org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)
>         at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)
>         at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)
>         at org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> 2016-10-05 06:13:29,788 DEBUG [c.c.a.m.DirectAgentAttache] (DirectAgent-302:ctx-78a58d67)
(logid:9ad66ce9) Seq 4-1440588930805137508: Response Received:
> 2016-10-05 06:13:29,788 DEBUG [c.c.a.m.ClusteredAgentManagerImpl] (DirectAgent-302:ctx-78a58d67)
(logid:9ad66ce9) Seq 4-1440588930805137508: MgmtId 275619427423488: Resp: Routing to peer
> 2016-10-05 06:13:29,789 DEBUG [c.c.a.m.AgentAttache] (DirectAgent-302:ctx-78a58d67) (logid:9ad66ce9)
Seq 4-1440588930805137508: No more commands found
> 2016-10-05 06:13:31,120 DEBUG [o.s.b.f.s.DefaultListableBeanFactory] (API-Job-Executor-8:ctx-a6e36538
job-171 ctx-446c510f) (logid:9ad66ce9) Returning cached instance of singleton bean 'messageBus'
> 2016-10-05 06:13:31,127 ERROR [c.c.a.ApiAsyncJobDispatcher] (API-Job-Executor-8:ctx-a6e36538
job-171) (logid:9ad66ce9) Unexpected exception while executing org.apache.cloudstack.api.command.admin.vm.RemoveNicFromVMCmdByAdmin
> com.cloud.utils.exception.CloudRuntimeException: Unable to remove Ntwk[205|Guest|16]
from VM[User|i-2-3-VM]
>         at com.cloud.vm.UserVmManagerImpl.removeNicFromVirtualMachine(UserVmManagerImpl.java:1291)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:606)
> at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:317)
>         at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:183)
>         at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:150)
>         at org.apache.cloudstack.network.contrail.management.EventUtils$EventInterceptor.invoke(EventUtils.java:106)
>         at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:161)
>         at com.cloud.event.ActionEventInterceptor.invoke(ActionEventInterceptor.java:51)
>         at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:161)
>         at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:91)
>         at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
>         at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:204)
>         at com.sun.proxy.$Proxy198.removeNicFromVirtualMachine(Unknown Source)
>         at org.apache.cloudstack.api.command.admin.vm.RemoveNicFromVMCmdByAdmin.execute(RemoveNicFromVMCmdByAdmin.java:44)
>         at com.cloud.api.ApiDispatcher.dispatch(ApiDispatcher.java:150)
>         at com.cloud.api.ApiAsyncJobDispatcher.runJob(ApiAsyncJobDispatcher.java:108)
>         at org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.runInContext(AsyncJobManagerImpl.java:554)
>         at org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49)
>         at org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)
>         at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)
>         at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)
>         at org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46)
>         at org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.run(AsyncJobManagerImpl.java:502)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> 2016-10-05 06:13:31,129 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (API-Job-Executor-8:ctx-a6e36538
job-171) (logid:9ad66ce9) Complete async job-171, jobStatus: FAILED, resultCode: 530, result:
org.apache.cloudstack.api.response.ExceptionResponse/null/{"uuidList":[],"errorcode":530,"errortext":"Unable
to remove Ntwk[205|Guest|16] from VM[User|i-2-3-VM]"}
> {noformat}
> {noformat}
> [root@csc-1 i-2-9-VM]# pwd
> /export/primary/i-2-9-VM
> [root@csc-1 i-2-9-VM]# tail -f vmware.log 
> 2016-10-06T04:34:33.214Z| vcpu-0| I120: CDROM: Emulate GET CONFIGURATION RT 2 starting
feature 0
> 2016-10-06T04:34:33.214Z| vcpu-0| I120: CDROM: Emulate GET CONFIGURATION RT 2 starting
feature 0
> 2016-10-06T04:36:09.127Z| vmx| I120: GuestRpcSendTimedOut: message to toolbox timed out.
> 2016-10-06T04:36:09.128Z| vmx| I120: Vix: [80405 guestCommands.c:1924]: Error VIX_E_TOOLS_NOT_RUNNING
in VMAutomationTranslateGuestRpcError(): VMware Tools are not running in the guest
> 2016-10-06T04:43:08.823Z| vmx| I120: Tools: No activity for 10 minutes, resetting Tools
version.
> 2016-10-06T04:43:08.824Z| vmx| I120: ToolsSetVersionWork did nothing; new tools version
(0) matches old Tools version
> 2016-10-06T04:43:23.152Z| vmx| W110: Requesting hot-remove of ethernet1
> 2016-10-06T04:50:13.889Z| mks| I120: SOCKET 3 (121) recv detected client closed connection
> 2016-10-06T04:50:13.890Z| mks| I120: SOCKET 3 (121) VNC Remote Disconnect: socket closed.
> 2016-10-06T04:53:25.699Z| vmx| W110: VigorHotRemovePoll: device ethernet1 failed to disappear.
> {noformat}
> Note: I am hitting the same issue when I am trying to directly remove the VM NIC from
VMware.
> We don't hit this issue with VPC VR while deleting one of its tiers. The vNIC gets deleted
successfully from VMware as well. 
> {noformat}
> [root@csc-1 r-5-VM]# pwd
> /export/primary/r-5-VM
> [root@csc-1 r-5-VM]# tail -f vmware.log
> 2016-10-14T02:09:27.520Z| vmx| W110: Requesting hot-remove of ethernet2
> 2016-10-14T02:09:27.622Z| vcpu-0| I120: Destroying virtual dev for scsi0:0 vscsi=8203
> 2016-10-14T02:09:27.622Z| vcpu-0| I120: VMMon_VSCSIStopVports: No such target on adapter
> 2016-10-14T02:09:27.666Z| vcpu-0| I120: Closing all the disks of the VM.
> 2016-10-14T02:09:27.666Z| vcpu-0| I120: Closing disk scsi0:0
> 2016-10-14T02:09:27.667Z| vcpu-0| I120: DISKLIB-VMFS  : "/vmfs/volumes/469adbc4-fd0b07aa/r-5-VM/ROOT-5-000001-flat.vmdk"
: closed.
> 2016-10-14T02:09:27.695Z| vcpu-0| I120: Checkpoint_Unstun: vm stopped for 73057 us
> 2016-10-14T02:09:27.695Z| vcpu-0| I120: SCSI: switching scsi0 to push completion mode
> 2016-10-14T02:09:27.702Z| vcpu-0| I120: DISK: OPEN scsi0:0 '/vmfs/volumes/469adbc4-fd0b07aa/r-5-VM/ROOT-5-000001.vmdk'
persistent R[]
> 2016-10-14T02:09:27.722Z| vcpu-0| I120: DISKLIB-VMFS  : "/vmfs/volumes/469adbc4-fd0b07aa/r-5-VM/ROOT-5-000001-flat.vmdk"
: open successful (10) size = 3145728000, hd = 3513569. Type 3
> 2016-10-14T02:09:27.722Z| vcpu-0| I120: DISKLIB-DSCPTR: Opened [0]: "ROOT-5-000001-flat.vmdk"
(0xa)
> 2016-10-14T02:09:27.722Z| vcpu-0| I120: DISKLIB-LINK  : Opened '/vmfs/volumes/469adbc4-fd0b07aa/r-5-VM/ROOT-5-000001.vmdk'
(0xa): vmfs, 6144000 sectors / 2.9 GB.
> 2016-10-14T02:09:27.723Z| vcpu-0| I120: DISKLIB-LIB   : Opened "/vmfs/volumes/469adbc4-fd0b07aa/r-5-VM/ROOT-5-000001.vmdk"
(flags 0xa, type vmfs).
> 2016-10-14T02:09:27.724Z| vcpu-0| I120: DISK: Disk '/vmfs/volumes/469adbc4-fd0b07aa/r-5-VM/ROOT-5-000001.vmdk'
has UUID '60 00 c2 93 cd 78 05 be-22 47 1c 78 bd 04 05 e0'
> 2016-10-14T02:09:27.724Z| vcpu-0| I120: DISK: OPEN '/vmfs/volumes/469adbc4-fd0b07aa/r-5-VM/ROOT-5-000001.vmdk'
Geo (6095/16/63) BIOS Geo (761/128/63)
> 2016-10-14T02:09:27.725Z| vcpu-0| I120: Creating virtual dev for scsi0:0 
> 2016-10-14T02:09:27.726Z| vcpu-0| I120: DumpDiskInfo: scsi0:0 createType=11, capacity
= 6144000, numLinks = 1, allocationType = 3
> 2016-10-14T02:09:27.726Z| vcpu-0| I120: SCSIDiskESXPopulateVDevDesc: Using FS backend
> 2016-10-14T02:09:27.726Z| vcpu-0| I120: DISKUTIL: scsi0:0 : geometry=761/128/63
> 2016-10-14T02:09:27.822Z| vcpu-0| I120: Powering off Ethernet2
> 2016-10-14T02:09:27.822Z| vcpu-0| I120: VMXVigorClassEthernet_SetConnectionStatus: request
(0 200 0) ignored as device ethernet2 could not be found.
> 2016-10-14T02:09:27.838Z| vcpu-0| I120: Hot removal done.
> 2016-10-14T02:09:31.062Z| vcpu-0| I120: HBACommon: First write on scsi0:0.fileName='/vmfs/volumes/469adbc4-fd0b07aa/r-5-VM/ROOT-5-000001.vmdk'
> 2016-10-14T02:09:31.062Z| vcpu-0| I120: DDB: "longContentID" = "72b284a1a34cba7d565f5553fa862bed"
(was "9be5af4dfbba31a0bc98f40bfef5451d")
> 2016-10-14T02:09:31.077Z| vcpu-0| I120: DISKLIB-CHAIN : DiskChainUpdateContentID: old=0xfef5451d,
new=0xfa862bed (72b284a1a34cba7d565f5553fa862bed)
> 2016-10-14T02:10:27.840Z| vmx| I120: GuestRpcSendTimedOut: message to toolbox-dnd timed
out.
> {noformat}
> Note: I am not hitting this issue with ACP 4.7 + VMware, where I am using using the same
VMware setup as above.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message