Date: Thu, 18 Jul 2013 23:00:48 +0000 (UTC)
From: "Prachi Damle (JIRA)"
To: cloudstack-issues@incubator.apache.org
Reply-To: dev@cloudstack.apache.org
Subject: [jira] [Resolved] (CLOUDSTACK-3451) Parallel deployment - Xenserver - When deploying 30 Vms in parallel, some of the Vm deployment fails when “applying dhcp entry/applying userdata and password entry on router” and retry eventually happens when they succeed.

     [ https://issues.apache.org/jira/browse/CLOUDSTACK-3451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prachi Damle resolved CLOUDSTACK-3451.
--------------------------------------
    Resolution: Fixed

> Parallel deployment - Xenserver - When deploying 30 Vms in parallel, some of the Vm deployment fails when “applying dhcp entry/applying userdata and password entry on router” and retry eventually happens when they succeed.
> -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: CLOUDSTACK-3451
>                 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-3451
>             Project: CloudStack
>          Issue Type: Bug
>      Security Level: Public(Anyone can view this level - this is the default.)
>          Components: Management Server
>    Affects Versions: 4.2.0
>         Environment: Build from 4.2
>            Reporter: Sangeetha Hariharan
>            Assignee: Prachi Damle
>            Priority: Critical
>             Fix For: 4.2.0
>
>         Attachments: xenparallel.rar
>
>
> Steps to reproduce the problem:
> Advanced zone set up with Xenserver host.
> Deploy 30 Vms in parallel.
> 9 of the Vm deployments actually had a failure when trying to “apply userdata and password entry on router” / “apply dhcp entry”. But in all these cases I see that we stop the Vm that is in “Starting” state and immediately attempt to start the Vm, which succeeds this time.
> Issues:
> 1. We should not be seeing any failures during the jobs for “applying userdata and password entry on router” / “applying dhcp entry”.
> 2. Why is there logic to retry the Vm deployment again? In my case I have only 1 host in the setup, which seems to be put in avoid state as part of the initial failure. But again the Vm gets successfully deployed on this host.
> Jobs that went thru this scenario are:
> [root@asfmgmt management]# grep -i "stopcommand" management-server.log | grep Executing
> 2013-07-09 16:30:05,707 DEBUG [agent.transport.Request] (Job-Executor-18:job-18) Seq 1-729350370: Executing: { Cmd , MgmtId: 7200344900649, via: 1, Ver: v1, Flags: 100011, [{"com.cloud.agent.api.StopCommand":{"isProxy":false,"executeInSequence":false,"vmName":"i-3-9-VM","wait":0}}] }
> 2013-07-09 16:30:12,547 DEBUG [agent.transport.Request] (Job-Executor-15:job-15) Seq 1-729350377: Executing: { Cmd , MgmtId: 7200344900649, via: 1, Ver: v1, Flags: 100011, [{"com.cloud.agent.api.StopCommand":{"isProxy":false,"executeInSequence":false,"vmName":"i-3-7-VM","wait":0}}] }
> 2013-07-09 16:30:20,990 DEBUG [agent.transport.Request] (Job-Executor-38:job-38) Seq 1-729350381: Executing: { Cmd , MgmtId: 7200344900649, via: 1, Ver: v1, Flags: 100011, [{"com.cloud.agent.api.StopCommand":{"isProxy":false,"executeInSequence":false,"vmName":"i-3-26-VM","wait":0}}] }
> 2013-07-09 16:30:23,529 DEBUG [agent.transport.Request] (Job-Executor-25:job-25) Seq 1-729350383: Executing: { Cmd , MgmtId: 7200344900649, via: 1, Ver: v1, Flags: 100011, [{"com.cloud.agent.api.StopCommand":{"isProxy":false,"executeInSequence":false,"vmName":"i-3-18-VM","wait":0}}] }
> 2013-07-09 16:30:32,231 DEBUG [agent.transport.Request] (Job-Executor-35:job-35) Seq 1-729350390: Executing: { Cmd , MgmtId: 7200344900649, via: 1, Ver: v1, Flags: 100011, [{"com.cloud.agent.api.StopCommand":{"isProxy":false,"executeInSequence":false,"vmName":"i-3-27-VM","wait":0}}] }
> 2013-07-09 16:30:45,744 DEBUG [agent.transport.Request] (Job-Executor-17:job-17) Seq 1-729350407: Executing: { Cmd , MgmtId: 7200344900649, via: 1, Ver: v1, Flags: 100011, [{"com.cloud.agent.api.StopCommand":{"isProxy":false,"executeInSequence":false,"vmName":"i-3-8-VM","wait":0}}] }
> 2013-07-09 16:30:46,511 DEBUG [agent.transport.Request] (Job-Executor-36:job-36) Seq 1-729350408: Executing: { Cmd , MgmtId: 7200344900649, via: 1, Ver: v1, Flags: 100011, [{"com.cloud.agent.api.StopCommand":{"isProxy":false,"executeInSequence":false,"vmName":"i-3-25-VM","wait":0}}] }
> 2013-07-09 16:31:04,826 DEBUG [agent.transport.Request] (Job-Executor-35:job-35) Seq 1-729350422: Executing: { Cmd , MgmtId: 7200344900649, via: 1, Ver: v1, Flags: 100011, [{"com.cloud.agent.api.StopCommand":{"isProxy":false,"executeInSequence":false,"vmName":"i-3-27-VM","wait":0}}] }
> 2013-07-09 16:31:17,707 DEBUG [agent.transport.Request] (Job-Executor-26:job-26) Seq 1-729350432: Executing: { Cmd , MgmtId: 7200344900649, via: 1, Ver: v1, Flags: 100011, [{"com.cloud.agent.api.StopCommand":{"isProxy":false,"executeInSequence":false,"vmName":"i-3-15-VM","wait":0}}] }
> Management server log snippet:
> 2013-07-09 16:30:22,214 DEBUG [cloud.vm.VirtualMachineManagerImpl] (Job-Executor-38:job-38) Successfully cleanued up resources for the vm VM[User|hello-16] in Starting state
> 2013-07-09 16:30:22,220 DEBUG [cloud.deploy.DeploymentPlanningManagerImpl] (Job-Executor-38:job-38) Deploy avoids pods: null, clusters: null, hosts: [1]
> 2013-07-09 16:30:22,220 DEBUG [cloud.deploy.DeploymentPlanningManagerImpl] (Job-Executor-38:job-38) DataCenter id = '1' provided is in avoid set, DeploymentPlanner cannot allocate the VM, returning.
> 2013-07-09 16:30:22,236 DEBUG [cloud.capacity.CapacityManagerImpl] (Job-Executor-38:job-38) VM state transitted from :Starting to Stopped with event: OperationFailedvm's original host id: null new host id: null host id before state transition: 1
> 2013-07-09 16:30:22,241 DEBUG [cloud.capacity.CapacityManagerImpl] (Job-Executor-38:job-38) Hosts's actual total CPU: 9044 and CPU after applying overprovisioning: 9044
> 2013-07-09 16:30:22,241 DEBUG [cloud.capacity.CapacityManagerImpl] (Job-Executor-38:job-38) Hosts's actual total RAM: 16190149248 and RAM after applying overprovisioning: 16190149632
> 2013-07-09 16:30:22,241 DEBUG [cloud.capacity.CapacityManagerImpl] (Job-Executor-38:job-38) release cpu from host: 1, old used: 4600,reserved: 0, actual total: 9044, total with overprovisioning: 9044; new used: 4500,reserved:0; movedfromreserved: false,moveToReserveredfalse
> 2013-07-09 16:30:22,241 DEBUG [cloud.capacity.CapacityManagerImpl] (Job-Executor-38:job-38) release mem from host: 1, old used: 9602859008,reserved: 0, total: 16190149632; new used: 9340715008,reserved:0; movedfromreserved: false,moveToReserveredfalse
> 2013-07-09 16:30:22,256 DEBUG [cloud.capacity.CapacityManagerImpl] (Job-Executor-38:job-38) VM state transitted from :Stopped to Starting with event: StartRequestedvm's original host id: null new host id: null host id before state transition: null
> 2013-07-09 16:30:22,256 DEBUG [cloud.vm.VirtualMachineManagerImpl] (Job-Executor-38:job-38) Successfully transitioned to start state for VM[User|hello-16] reservation id = 8b8d8303-0f78-4f03-9ee3-4cc3e129c746
> 2013-07-09 16:30:22,262 DEBUG [cloud.vm.VirtualMachineManagerImpl] (Job-Executor-38:job-38) Trying to deploy VM, vm has dcId: 1 and podId: 1
> 2013-07-09 16:30:22,262 DEBUG [cloud.vm.VirtualMachineManagerImpl] (Job-Executor-38:job-38) Deploy avoids pods: null, clusters: null, hosts: null
> 2013-07-09 16:30:22,269 DEBUG [cloud.deploy.DeploymentPlanningManagerImpl] (Job-Executor-38:job-38) Deploy avoids pods: null, clusters: null, hosts: null
> 2013-07-09 16:30:22,270 DEBUG [cloud.deploy.DeploymentPlanningManagerImpl] (Job-Executor-38:job-38) DeploymentPlanner allocation algorithm: com.cloud.deploy.FirstFitPlanner_EnhancerByCloudStack_b2132c10@13aee390
> 2013-07-09 16:30:22,270 DEBUG [cloud.deploy.DeploymentPlanningManagerImpl] (Job-Executor-38:job-38) Trying to allocate a host and storage pools from dc:1, pod:1,cluster:null, requested cpu: 100, requested ram: 262144000
> 2013-07-09 16:30:22,273 DEBUG [cloud.deploy.DeploymentPlanningManagerImpl] (Job-Executor-38:job-38) Is ROOT volume READY (pool already allocated)?: No
> 2013-07-09 16:30:22,273 DEBUG [cloud.deploy.FirstFitPlanner] (Job-Executor-38:job-38) Searching resources only under specified Pod: 1
> 2013-07-09 16:30:22,273 DEBUG [cloud.deploy.FirstFitPlanner] (Job-Executor-38:job-38) Listing clusters in order of aggregate capacity, that have (atleast one host with) enough CPU and RAM capacity under this Pod: 1
> 2013-07-09 16:30:22,280 DEBUG [cloud.deploy.DeploymentPlanningManagerImpl] (Job-Executor-38:job-38) Checking resources in Cluster: 1 under Pod: 1
> 2013-07-09 16:30:22,280 DEBUG [allocator.impl.FirstFitAllocator] (Job-Executor-38:job-38 FirstFitRoutingAllocator) Looking for hosts in dc: 1 pod:1 cluster:1
> 2013-07-09 16:30:22,282 DEBUG [allocator.impl.FirstFitAllocator] (Job-Executor-38:job-38 FirstFitRoutingAllocator) FirstFitAllocator has 1 hosts to check for allocation: [Host[-1-Routing]]

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
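
The grep over management-server.log quoted in the report can also be done programmatically when the stop/retry cycles need to be counted per VM. The following is only a rough sketch: the regular expression is inferred from the log lines shown in this message rather than from any documented CloudStack log format, and the names STOP_RE, stop_commands, stops_per_vm, _entry and SAMPLE are hypothetical helpers introduced for illustration.

```python
import re
from collections import Counter

# Pattern inferred from the StopCommand lines quoted in the report
# (an assumption, not a documented CloudStack log format).
STOP_RE = re.compile(
    r'(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) .*?'
    r'\(Job-Executor-\d+:job-(?P<job>\d+)\).*?Executing:.*?'
    r'"com\.cloud\.agent\.api\.StopCommand":\{.*?"vmName":"(?P<vm>[^"]+)"'
)


def stop_commands(lines):
    """Yield (timestamp, job id, VM name) for each executed StopCommand."""
    for line in lines:
        match = STOP_RE.search(line)
        if match:
            yield match.group("ts"), int(match.group("job")), match.group("vm")


def stops_per_vm(lines):
    """Count how many StopCommand executions were logged for each VM."""
    return Counter(vm for _ts, _job, vm in stop_commands(lines))


def _entry(ts, job, seq, vm):
    """Rebuild one log line in the shape quoted in the report (illustrative)."""
    return (
        f'{ts} DEBUG [agent.transport.Request] (Job-Executor-{job}:job-{job}) '
        f'Seq 1-{seq}: Executing: {{ Cmd , MgmtId: 7200344900649, via: 1, '
        f'Ver: v1, Flags: 100011, [{{"com.cloud.agent.api.StopCommand":'
        f'{{"isProxy":false,"executeInSequence":false,"vmName":"{vm}",'
        f'"wait":0}}}}] }}'
    )


# Three StopCommand entries taken from the report, plus one unrelated line.
SAMPLE = [
    _entry("2013-07-09 16:30:05,707", 18, 729350370, "i-3-9-VM"),
    _entry("2013-07-09 16:30:32,231", 35, 729350390, "i-3-27-VM"),
    _entry("2013-07-09 16:31:04,826", 35, 729350422, "i-3-27-VM"),
    "2013-07-09 16:30:22,262 DEBUG [cloud.vm.VirtualMachineManagerImpl] "
    "(Job-Executor-38:job-38) Trying to deploy VM, vm has dcId: 1 and podId: 1",
]

if __name__ == "__main__":
    for vm, count in sorted(stops_per_vm(SAMPLE).items()):
        print(vm, count)
```

To run it against a real log, pass an open file object instead of SAMPLE, for example stops_per_vm(open("management-server.log")). In the excerpt quoted above, i-3-27-VM appears in two StopCommand entries (both from job-35), so the nine entries cover eight distinct VM names.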