cloudstack-issues mailing list archives

From "Sheng Yang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CLOUDSTACK-4540) Parallel deployment - Vmware - When deploying 30 parallel Vms , 16 Vms fails to get deployed due to "VmDataCommand failed due to Exception: java.lang.Exception Message: Timed out in waiting SSH execution result"
Date Thu, 14 Nov 2013 00:29:21 GMT

    [ https://issues.apache.org/jira/browse/CLOUDSTACK-4540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13822032#comment-13822032 ]

Sheng Yang commented on CLOUDSTACK-4540:
----------------------------------------

In theory, this issue is basically unavoidable.

The thing is, the VR takes time to execute each command; say it needs time t1 (which
is greater than 0).

The interval between commands arriving during parallel deployment is t2 (which can be almost 0).

In any case, the VR has to handle commands in sequence internally, so if t1 > t2, each
new task in the VR waits longer and longer before it executes, and some commands ultimately
time out. No matter how long the timeout is, if enough tasks are queued for the VR,
the last ones can time out.
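
The queueing argument above can be illustrated with a small simulation. This is only
a sketch: the function names and the numbers (2 s per command, a new command every 0.1 s,
a 20 s timeout) are made up for illustration and are not measured from the VR or taken
from CloudStack.

```python
def completion_delays(n_commands, t1, t2):
    """Return, for each command, the time from its arrival until it finishes.

    Command i arrives at time i * t2. The VR runs commands one at a time,
    each taking t1 seconds (a hypothetical, constant service time).
    """
    delays = []
    vr_free_at = 0.0  # time at which the VR has worked off its backlog
    for i in range(n_commands):
        arrival = i * t2
        start = max(arrival, vr_free_at)  # wait if the VR is still busy
        vr_free_at = start + t1
        delays.append(vr_free_at - arrival)
    return delays


def timed_out(delays, timeout):
    """Indices of commands whose delay exceeds the caller's timeout."""
    return [i for i, d in enumerate(delays) if d > timeout]


# Illustrative numbers only (assumptions, not measurements).
delays = completion_delays(n_commands=30, t1=2.0, t2=0.1)
late = timed_out(delays, timeout=20.0)
```

In this model the delay grows by t1 - t2 with every queued command, so raising the
timeout only moves the point at which commands start to fail; it does not remove it.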

Currently the VR has a robust mechanism to sequence jobs internally, and I confirmed that
it works well in this case. But there is no way to fix this issue if the VR is already at
100% load all the time.

We can probably improve the speed of the VR’s internal execution, but the ultimate answer
seems to be: set execute.in.sequence.network.element.commands to true. The VR doesn’t know
how long the mgmt. server will wait before timing out; only the mgmt. server knows that.

I’ve tested deploying 30 VMs on Shweta’s setup. With parallel execution of commands, roughly
the last 6~7 failed due to timeout (with a lot of lock pending info in /var/log/messages,
though the locks were all cleared after execution completed). There were no failures when
parallel execution was turned off for network element commands.

So setting execute.in.sequence.network.element.commands to true is a solution.
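
Why sending commands in sequence avoids the timeouts can be sketched with a simplified
model. The model assumes that the mgmt. server's timeout clock starts when a command is
sent to the VR; that assumption, the function name and all numbers below are illustrative
and not taken from the CloudStack code or from the test above.

```python
def count_timeouts(n_commands, t1, timeout, in_sequence):
    """Count commands that exceed the timeout under a simplified model.

    Assumptions: the timeout clock starts when the command is sent, and the
    VR runs one command at a time, each taking t1 seconds.
    """
    failures = 0
    vr_free_at = 0.0  # time at which the VR finishes everything sent so far
    for _ in range(n_commands):
        # In sequence: send only once the VR is idle.
        # In parallel: every command is sent at time 0.
        sent_at = vr_free_at if in_sequence else 0.0
        finished_at = max(sent_at, vr_free_at) + t1
        vr_free_at = finished_at
        if finished_at - sent_at > timeout:
            failures += 1
    return failures


# Illustrative numbers only (assumptions, not measurements).
parallel_failures = count_timeouts(30, t1=2.0, timeout=40.0, in_sequence=False)
sequential_failures = count_timeouts(30, t1=2.0, timeout=40.0, in_sequence=True)
```

Under these assumptions the total work done by the VR is the same in both modes. The
difference is that in sequence each command's clock covers only its own execution time,
while in parallel the last commands' clocks also cover the whole backlog ahead of them.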

> Parallel deployment - Vmware - When deploying 30 parallel Vms , 16 Vms fails to get deployed due to "VmDataCommand failed due to Exception: java.lang.Exception Message: Timed out in waiting SSH execution result"
> -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: CLOUDSTACK-4540
>                 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-4540
>             Project: CloudStack
>          Issue Type: Bug
>      Security Level: Public(Anyone can view this level - this is the default.) 
>          Components: Management Server
>    Affects Versions: 4.2.0
>         Environment: Build from 4.2-forward.
>            Reporter: Sangeetha Hariharan
>            Assignee: Sheng Yang
>            Priority: Blocker
>             Fix For: 4.3.0
>
>         Attachments: management-server.log
>
>
> Parallel deployment - Vmware - When deploying 30 parallel Vms , 16 Vms fails to get deployed due to "VmDataCommand failed due to Exception: java.lang.Exception
> Message: Timed out in waiting SSH execution result"
> Set up - Advanced zone with 1 Vmware 5.0.0 Esxi host.
> Deploy 30 Vms in parallel.
> 16 out of 30 vms deployed in parallel , failed due to "VmDataCommand failed due to Exception: java.lang.Exception
> Message: Timed out in waiting SSH execution result"
> Following exception seen in Management server logs:
> 2013-08-28 10:26:58,939 ERROR [vmware.resource.VmwareResource] (DirectAgent-21:10.223.58.66) VmDataCommand failed due to Exception: java.lang.Exception
> Message: Timed out in waiting SSH execution result
> java.lang.Exception: Timed out in waiting SSH execution result
>         at com.cloud.utils.ssh.SshHelper.sshExecute(SshHelper.java:166)
>         at com.cloud.utils.ssh.SshHelper.sshExecute(SshHelper.java:37)
>         at com.cloud.hypervisor.vmware.resource.VmwareResource.execute(VmwareResource.java:2470)
>         at com.cloud.hypervisor.vmware.resource.VmwareResource.executeRequest(VmwareResource.java:441)
>         at com.cloud.agent.manager.DirectAgentAttache$Task.run(DirectAgentAttache.java:186)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:165)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:266)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>         at java.lang.Thread.run(Thread.java:679)
> 2013-08-28 10:26:58,940 DEBUG [agent.manager.DirectAgentAttache] (DirectAgent-21:null) Seq 1-170983503: Response Received:
> 2013-08-28 10:26:58,941 DEBUG [agent.transport.Request] (DirectAgent-21:null) Seq 1-170983503: Processing:  { Ans: , MgmtId: 7083743249448, via: 1, Ver: v1, Flags: 10, [{"com.cloud.agent.api.Answer":{"result":true,"wait":0}},{"com.cloud.agent.api.Answer":{"result":false,"details":"VmDataCommand failed due to Exception: java.lang.Exception\nMessage: Timed out in waiting SSH execution result\n","wait":0}}] }
> 2013-08-28 10:26:58,941 DEBUG [agent.transport.Request] (Job-Executor-29:job-398 = [ b3a34f25-37b2-4f33-b183-c0ea348d7af9 ]) Seq 1-170983503: Received:  { Ans: , MgmtId: 7083743249448, via: 1, Ver: v1, Flags: 10, { Answer, Answer } }
> 2013-08-28 10:26:58,979 INFO  [cloud.vm.VirtualMachineManagerImpl] (Job-Executor-29:job-398 = [ b3a34f25-37b2-4f33-b183-c0ea348d7af9 ]) Unable to contact resource.
> com.cloud.exception.ResourceUnavailableException: Resource [DataCenter:1] is unreachable: Unable to apply userdata and password entry on router
>         at com.cloud.network.router.VirtualNetworkApplianceManagerImpl.applyRules(VirtualNetworkApplianceManagerImpl.java:3808)
>         at com.cloud.network.router.VirtualNetworkApplianceManagerImpl.applyUserData(VirtualNetworkApplianceManagerImpl.java:2993)
>         at com.cloud.network.element.VirtualRouterElement.addPasswordAndUserdata(VirtualRouterElement.java:926)
>         at com.cloud.network.NetworkManagerImpl.prepareElement(NetworkManagerImpl.java:2076)
>         at com.cloud.network.NetworkManagerImpl.prepareNic(NetworkManagerImpl.java:2191)
>         at com.cloud.network.NetworkManagerImpl.prepare(NetworkManagerImpl.java:2127)
>         at com.cloud.vm.VirtualMachineManagerImpl.advanceStart(VirtualMachineManagerImpl.java:886)
>         at com.cloud.vm.VirtualMachineManagerImpl.start(VirtualMachineManagerImpl.java:578)
>         at org.apache.cloudstack.engine.cloud.entity.api.VMEntityManagerImpl.deployVirtualMachine(VMEntityManagerImpl.java:227)
>         at org.apache.cloudstack.engine.cloud.entity.api.VirtualMachineEntityImpl.deploy(VirtualMachineEntityImpl.java:209)
>         at com.cloud.vm.UserVmManagerImpl.startVirtualMachine(UserVmManagerImpl.java:3406)
>         at com.cloud.vm.UserVmManagerImpl.startVirtualMachine(UserVmManagerImpl.java:2966)
>         at com.cloud.vm.UserVmManagerImpl.startVirtualMachine(UserVmManagerImpl.java:2952)
>         at com.cloud.utils.component.ComponentInstantiationPostProcessor$InterceptorDispatcher.intercept(ComponentInstantiationPostProcessor.java:125)
>         at org.apache.cloudstack.api.command.user.vm.DeployVMCmd.execute(DeployVMCmd.java:420)
>         at com.cloud.api.ApiDispatcher.dispatch(ApiDispatcher.java:158)
>         at com.cloud.async.AsyncJobManagerImpl$1.run(AsyncJobManagerImpl.java:531)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>         at java.lang.Thread.run(Thread.java:679)



--
This message was sent by Atlassian JIRA
(v6.1#6144)
