Mailing-List: contact issues-help@cloudstack.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@cloudstack.apache.org
Date: Thu, 14 Nov 2013 00:29:21 +0000 (UTC)
From: "Sheng Yang (JIRA)" <jira@apache.org>
To: cloudstack-issues@incubator.apache.org
Message-ID: <JIRA.12665975.1377714723298.70469.1384388961814@arcas>
In-Reply-To: <JIRA.12665975.1377714723298@arcas>
References: <JIRA.12665975.1377714723298@arcas>
Subject: [jira] [Commented] (CLOUDSTACK-4540) Parallel deployment - Vmware -
 When deploying 30 parallel Vms , 16 Vms fails to get deployed due to
 "VmDataCommand failed due to Exception: java.lang.Exception Message: Timed
 out in waiting SSH execution result"
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable


    [ https://issues.apache.org/jira/browse/CLOUDSTACK-4540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13822032#comment-13822032 ]

Sheng Yang commented on CLOUDSTACK-4540:
----------------------------------------

I found that, in theory, this issue is basically inevitable.

The thing is, the VR takes time to execute each command; say it needs time t1 (which is greater than 0).

And the interval between parallel deployments is t2 (which can be almost 0).

In any case, the VR needs to handle commands in sequence internally, so if t1 > t2, each new task in the VR waits longer and longer to execute, and some commands ultimately time out. No matter how long the timeout is, if enough tasks are queued for the VR, the last ones can time out.

Currently the VR has a robust mechanism to sequence the jobs internally, and I confirmed that it works well in this case. But there is no way to fix this issue if the VR is already at 100% load all the time.

We can probably improve the speed of the VR's internal execution, but it seems the ultimate answer is: set execute.in.sequence.network.element.commands to true. The VR doesn't know how long the mgmt. server will wait before timing out; only the mgmt. server knows that.

I've tested deploying 30 VMs on Shweta's setup. With parallel execution of commands, almost exactly the last 6~7 failed due to timeout (with a lot of lock pending info in /var/log/messages, but the locks were all cleared after execution completed). There was no failure with parallel execution set to false for network element commands.
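A rough way to see why sequencing on the mgmt. server side helps is to compare what the sender observes in each mode. The sketch below is a simplified model under stated assumptions, not the actual CloudStack implementation: it assumes the timeout clock for a command starts when that command is sent to the VR, the function name is hypothetical, and t1 = 4 s and the 100 s timeout are made-up values.

```python
def sender_side_latencies(n_commands, t1, in_sequence):
    """Time between sending each command and receiving its answer.

    Parallel (in_sequence=False): every command is sent at time 0 and
    queues up inside the VR, which still runs them one at a time.
    In sequence (in_sequence=True): the sender holds each command back
    until the previous one has been answered.
    """
    latencies = []
    vr_free_at = 0  # time at which the VR finishes its current command
    for _ in range(n_commands):
        sent_at = vr_free_at if in_sequence else 0
        vr_free_at = max(sent_at, vr_free_at) + t1
        latencies.append(vr_free_at - sent_at)
    return latencies


if __name__ == "__main__":
    timeout = 100  # hypothetical per-command timeout, seconds
    for mode in (False, True):
        lat = sender_side_latencies(30, t1=4, in_sequence=mode)
        failed = sum(1 for x in lat if x > timeout)
        print("in_sequence =", mode, "max latency:", max(lat), "timed out:", failed)
```

Under these assumptions the total time to finish all commands is the same in both modes (n * t1); what changes is that each command's wait, as measured against the timeout, stays at t1 when commands are sent in sequence.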

So setting execute.in.sequence.network.element.commands to true is a solution.

> Parallel deployment - Vmware - When deploying 30 parallel Vms , 16 Vms fa=
ils to get deployed due to "VmDataCommand failed due to Exception: java.lan=
g.Exception Message: Timed out in waiting SSH execution result"
> -------------------------------------------------------------------------=
---------------------------------------------------------------------------=
---------------------------------------------------------------
>
>                 Key: CLOUDSTACK-4540
>                 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-4540
>             Project: CloudStack
>          Issue Type: Bug
>      Security Level: Public(Anyone can view this level - this is the default.)
>          Components: Management Server
>    Affects Versions: 4.2.0
>         Environment: Build from 4.2-forward.
>            Reporter: Sangeetha Hariharan
>            Assignee: Sheng Yang
>            Priority: Blocker
>             Fix For: 4.3.0
>
>         Attachments: management-server.log
>
>
> Parallel deployment - Vmware - When deploying 30 parallel Vms , 16 Vms fails to get deployed due to "VmDataCommand failed due to Exception: java.lang.Exception
> Message: Timed out in waiting SSH execution result"
> Set up - Advanced zone with 1 Vmware 5.0.0 Esxi host.
> Deploy 30 Vms in parallel.
> 16 out of 30 vms deployed in parallel , failed due to "VmDataCommand failed due to Exception: java.lang.Exception
> Message: Timed out in waiting SSH execution result"
> Following exception seen in Management server logs:
> 2013-08-28 10:26:58,939 ERROR [vmware.resource.VmwareResource] (DirectAgent-21:10.223.58.66) VmDataCommand failed due to Exception: java.lang.Exception
> Message: Timed out in waiting SSH execution result
> java.lang.Exception: Timed out in waiting SSH execution result
>         at com.cloud.utils.ssh.SshHelper.sshExecute(SshHelper.java:166)
>         at com.cloud.utils.ssh.SshHelper.sshExecute(SshHelper.java:37)
>         at com.cloud.hypervisor.vmware.resource.VmwareResource.execute(VmwareResource.java:2470)
>         at com.cloud.hypervisor.vmware.resource.VmwareResource.executeRequest(VmwareResource.java:441)
>         at com.cloud.agent.manager.DirectAgentAttache$Task.run(DirectAgentAttache.java:186)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:165)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:266)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>         at java.lang.Thread.run(Thread.java:679)
> 2013-08-28 10:26:58,940 DEBUG [agent.manager.DirectAgentAttache] (DirectAgent-21:null) Seq 1-170983503: Response Received:
> 2013-08-28 10:26:58,941 DEBUG [agent.transport.Request] (DirectAgent-21:null) Seq 1-170983503: Processing:  { Ans: , MgmtId: 7083743249448, via: 1, Ver: v1, Flags: 10, [{"com.cloud.agent.api.Answer":{"result":true,"wait":0}},{"com.cloud.agent.api.Answer":{"result":false,"details":"VmDataCommand failed due to Exception: java.lang.Exception\nMessage: Timed out in waiting SSH execution result\n","wait":0}}] }
> 2013-08-28 10:26:58,941 DEBUG [agent.transport.Request] (Job-Executor-29:job-398 = [ b3a34f25-37b2-4f33-b183-c0ea348d7af9 ]) Seq 1-170983503: Received:  { Ans: , MgmtId: 7083743249448, via: 1, Ver: v1, Flags: 10, { Answer, Answer } }
> 2013-08-28 10:26:58,979 INFO  [cloud.vm.VirtualMachineManagerImpl] (Job-Executor-29:job-398 = [ b3a34f25-37b2-4f33-b183-c0ea348d7af9 ]) Unable to contact resource.
> com.cloud.exception.ResourceUnavailableException: Resource [DataCenter:1] is unreachable: Unable to apply userdata and password entry on router
>         at com.cloud.network.router.VirtualNetworkApplianceManagerImpl.applyRules(VirtualNetworkApplianceManagerImpl.java:3808)
>         at com.cloud.network.router.VirtualNetworkApplianceManagerImpl.applyUserData(VirtualNetworkApplianceManagerImpl.java:2993)
>         at com.cloud.network.element.VirtualRouterElement.addPasswordAndUserdata(VirtualRouterElement.java:926)
>         at com.cloud.network.NetworkManagerImpl.prepareElement(NetworkManagerImpl.java:2076)
>         at com.cloud.network.NetworkManagerImpl.prepareNic(NetworkManagerImpl.java:2191)
>         at com.cloud.network.NetworkManagerImpl.prepare(NetworkManagerImpl.java:2127)
>         at com.cloud.vm.VirtualMachineManagerImpl.advanceStart(VirtualMachineManagerImpl.java:886)
>         at com.cloud.vm.VirtualMachineManagerImpl.start(VirtualMachineManagerImpl.java:578)
>         at org.apache.cloudstack.engine.cloud.entity.api.VMEntityManagerImpl.deployVirtualMachine(VMEntityManagerImpl.java:227)
>         at org.apache.cloudstack.engine.cloud.entity.api.VirtualMachineEntityImpl.deploy(VirtualMachineEntityImpl.java:209)
>         at com.cloud.vm.UserVmManagerImpl.startVirtualMachine(UserVmManagerImpl.java:3406)
>         at com.cloud.vm.UserVmManagerImpl.startVirtualMachine(UserVmManagerImpl.java:2966)
>         at com.cloud.vm.UserVmManagerImpl.startVirtualMachine(UserVmManagerImpl.java:2952)
>         at com.cloud.utils.component.ComponentInstantiationPostProcessor$InterceptorDispatcher.intercept(ComponentInstantiationPostProcessor.java:125)
>         at org.apache.cloudstack.api.command.user.vm.DeployVMCmd.execute(DeployVMCmd.java:420)
>         at com.cloud.api.ApiDispatcher.dispatch(ApiDispatcher.java:158)
>         at com.cloud.async.AsyncJobManagerImpl$1.run(AsyncJobManagerImpl.java:531)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>         at java.lang.Thread.run(Thread.java:679)


--
This message was sent by Atlassian JIRA
(v6.1#6144)