Mailing-List: contact issues-help@cloudstack.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@cloudstack.apache.org
Date: Wed, 22 May 2013 19:16:20 +0000 (UTC)
From: "ASF subversion and git services (JIRA)" <jira@apache.org>
To: cloudstack-issues@incubator.apache.org
Message-ID: <JIRA.12648384.1369027021224.6082.1369250180705@arcas>
In-Reply-To: <JIRA.12648384.1369027021224@arcas>
References: <JIRA.12648384.1369027021224@arcas>
Subject: [jira] [Commented] (CLOUDSTACK-2568) ACS41 regression in storage
 subsystem (seen with local storage and 2 or more hosts)
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


    [ https://issues.apache.org/jira/browse/CLOUDSTACK-2568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13664426#comment-13664426 ] 

ASF subversion and git services commented on CLOUDSTACK-2568:
-------------------------------------------------------------

Commit 78186c3b0201ecf55779b3f4bb6a3105fec1288d in branch refs/heads/4.1 from [~prachidamle]
[ https://git-wip-us.apache.org/repos/asf?p=cloudstack.git;h=78186c3 ]

CLOUDSTACK-2568: ACS41 regression in storage subsystem (seen with local storage and 2 or more hosts)

Patch for 4.1, changes:
- In VolumeReservationVO, the getter method of a column had a typo, causing us to create a wrong search builder: it searched over the 'id' column instead of 'vm_reservation_id'.
- This bug caused VM deployment to choose the wrong pool, since the search returned incorrect results.
- The related bug in GenericSearchBuilder is also fixed: if a getter method does not use the standard 'get' or 'is' prefix, annotate that method with
 @Column(name = "<column_name>") to indicate which column it refers to. This lets GenericSearchBuilder identify the field correctly.
- Also, let the planner search for pools instead of selecting the reserved one, because there is currently no way to pass multiple pools to the planner, which may cause issues when a VM has multiple disks.
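
The getter-to-column rule described above can be illustrated with a minimal sketch. This is not the actual GenericSearchBuilder code: the class ColumnLookupSketch, the entity ReservationRow, the nested @Column annotation (a stand-in for the real persistence annotation so the example compiles on its own), and the snake_case naming convention are all assumptions made for illustration.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

public class ColumnLookupSketch {

    // Hypothetical stand-in for the persistence @Column annotation.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    public @interface Column {
        String name();
    }

    // Hypothetical entity: one standard getter, one non-standard getter.
    public static class ReservationRow {
        private long id;
        private long vmReservationId;

        public long getId() {
            return id;
        }

        // No 'get' or 'is' prefix, so the column is named explicitly.
        @Column(name = "vm_reservation_id")
        public long vmReservationId() {
            return vmReservationId;
        }
    }

    // Resolve the column a getter refers to: an explicit annotation wins,
    // otherwise derive the name from a standard 'get'/'is' prefix.
    public static String columnFor(Method getter) {
        Column column = getter.getAnnotation(Column.class);
        if (column != null) {
            return column.name();
        }
        String name = getter.getName();
        if (name.startsWith("get") && name.length() > 3) {
            return toSnakeCase(name.substring(3));
        }
        if (name.startsWith("is") && name.length() > 2) {
            return toSnakeCase(name.substring(2));
        }
        // Refuse to guess rather than silently pick the wrong column.
        throw new IllegalArgumentException(
            "Cannot infer column for " + name + "; annotate it with @Column");
    }

    // Convenience overload that looks the getter up by name.
    public static String columnFor(Class<?> type, String getterName) {
        try {
            return columnFor(type.getMethod(getterName));
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException("No such getter: " + getterName, e);
        }
    }

    // Assumed naming convention: VmReservationId -> vm_reservation_id.
    static String toSnakeCase(String property) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < property.length(); i++) {
            char ch = property.charAt(i);
            if (Character.isUpperCase(ch)) {
                if (i > 0) {
                    sb.append('_');
                }
                sb.append(Character.toLowerCase(ch));
            } else {
                sb.append(ch);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(columnFor(ReservationRow.class, "getId"));
        System.out.println(columnFor(ReservationRow.class, "vmReservationId"));
    }
}
```

In this sketch, a getter that has neither a standard prefix nor an annotation raises an error instead of resolving to some other column, which is the failure mode the patch notes describe.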

Signed-off-by: Chip Childers <chip.childers@gmail.com>

                
> ACS41 regression in storage subsystem (seen with local storage and 2 or more hosts)
> -----------------------------------------------------------------------------------
>
>                 Key: CLOUDSTACK-2568
>                 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-2568
>             Project: CloudStack
>          Issue Type: Bug
>      Security Level: Public(Anyone can view this level - this is the default.) 
>          Components: Management Server
>    Affects Versions: 4.1.0
>         Environment: RHES64 as in OEL64. Install from RPM built from latest GIT on OEL64.
> 2 or more KVM hypervisors with local storage in one cluster that has one primary NFS storage pool.
>            Reporter: Ove Ewerlid
>            Assignee: Prachi Damle
>            Priority: Blocker
>             Fix For: 4.1.0
>
>         Attachments: var-log-cloudstack-management.tar.gz
>
>
> ACS402 works with no issues when tested in exactly the setup where ACS41 fails.
> Identical configuration (the same setup program is used for testing both versions).
> In ACS410, startVM fails if and only if the advanceStart: log line reports a poolID that is not valid.
> The poolID reported in this log line appears random across a large number of tests.
> If the poolID cannot be reached by the host selected for deployment, startVM fails.
> This is blocking upgrade from 4.0 to 4.1, since there is no reliable way to start VMs that have been deployed. If a deployed VM fails to start, issuing the startVM command multiple times will eventually make the VM start.
> The more hosts there are, the less likely it is that the poolID is correct, and so the less likely it is that a startVM will succeed.
> The log portion below shows that the VM has a "correct" Deployment Destination reported, but advanceStart reports a different poolID; since the selected hypervisor cannot reach that pool, startVM fails.
> The bug never triggers if there is only one KVM host with local storage, since the poolId cannot be wrong: there is just one (and the NFS pool is always valid).
> -------------------
> 2013-05-20 06:49:29,477 DEBUG [cloud.deploy.FirstFitPlanner] (Job-Executor-34:job-34) Found a potential host id: 1 name: vm3-net0-s0-14.test.devops and associated storage pools for this VM
> 2013-05-20 06:49:29,478 DEBUG [cloud.deploy.FirstFitPlanner] (Job-Executor-34:job-34) Returning Deployment Destination: Dest[Zone(Id)-Pod(Id)-Cluster(Id)-Host(Id)-Storage(Volume(Id|Type-->Pool(Id))] : Dest[Zone(1)-Pod(1)-Cluster(1)-Host(1)-Storage(Volume(10|ROOT-->Pool(200), Volume(11|DATADISK-->Pool(200))]
> 2013-05-20 06:49:29,495 DEBUG [cloud.capacity.CapacityManagerImpl] (Job-Executor-34:job-34) VM state transitted from :Stopped to Starting with event: StartRequestedvm's original host id: null new host id: null host id before state transition: null
> 2013-05-20 06:49:29,495 DEBUG [cloud.vm.VirtualMachineManagerImpl] (Job-Executor-34:job-34) Successfully transitioned to start state for VM[User|testvm-a] reservation id = e644d55e-3627-4395-9f89-639e6fc2f261
> 2013-05-20 06:49:29,502 DEBUG [cloud.vm.VirtualMachineManagerImpl] (Job-Executor-34:job-34) Trying to deploy VM, vm has dcId: 1 and podId: null
> 2013-05-20 06:49:29,502 DEBUG [cloud.vm.VirtualMachineManagerImpl] (Job-Executor-34:job-34) advanceStart: DeploymentPlan is provided, using dcId:1, podId: 1, clusterId: 1, hostId: 1, poolId: 201
> 2013-05-20 06:49:29,502 DEBUG [cloud.vm.VirtualMachineManagerImpl] (Job-Executor-34:job-34) Deploy avoids pods: null, clusters: null, hosts: null
> 2013-05-20 06:49:29,504 DEBUG [cloud.deploy.FirstFitPlanner] (Job-Executor-34:job-34) DeploymentPlanner allocation algorithm: random
> 2013-05-20 06:49:29,504 DEBUG [cloud.deploy.FirstFitPlanner] (Job-Executor-34:job-34) Trying to allocate a host and storage pools from dc:1, pod:1,cluster:1, requested cpu: 4000, requested ram: 2147483648
> 2013-05-20 06:49:29,504 DEBUG [cloud.deploy.FirstFitPlanner] (Job-Executor-34:job-34) Is ROOT volume READY (pool already allocated)?: Yes
> 2013-05-20 06:49:29,504 DEBUG [cloud.deploy.FirstFitPlanner] (Job-Executor-34:job-34) DeploymentPlan has host_id specified, making no checks on this host, looks like admin test: 1
> 2013-05-20 06:49:29,505 DEBUG [cloud.deploy.FirstFitPlanner] (Job-Executor-34:job-34) Looking for suitable pools for this host under zone: 1, pod: 1, cluster: 1
> 2013-05-20 06:49:29,506 DEBUG [cloud.deploy.FirstFitPlanner] (Job-Executor-34:job-34) Checking suitable pools for volume (Id, Type): (10,ROOT)
> 2013-05-20 06:49:29,506 DEBUG [cloud.deploy.FirstFitPlanner] (Job-Executor-34:job-34) Volume has pool(201) already allocated, checking if pool can be reused, poolId: null
> 2013-05-20 06:49:29,506 DEBUG [cloud.deploy.FirstFitPlanner] (Job-Executor-34:job-34) finding pool by id '201'
> 2013-05-20 06:49:29,507 DEBUG [cloud.deploy.FirstFitPlanner] (Job-Executor-34:job-34) Planner need not allocate a pool for this volume since its READY
> 2013-05-20 06:49:29,507 DEBUG [cloud.deploy.FirstFitPlanner] (Job-Executor-34:job-34) Checking suitable pools for volume (Id, Type): (11,DATADISK)
> 2013-05-20 06:49:29,507 DEBUG [cloud.deploy.FirstFitPlanner] (Job-Executor-34:job-34) Volume has pool(201) already allocated, checking if pool can be reused, poolId: null
> 2013-05-20 06:49:29,507 DEBUG [cloud.deploy.FirstFitPlanner] (Job-Executor-34:job-34) finding pool by id '201'
> 2013-05-20 06:49:29,508 DEBUG [cloud.deploy.FirstFitPlanner] (Job-Executor-34:job-34) Planner need not allocate a pool for this volume since its READY
> 2013-05-20 06:49:29,508 DEBUG [cloud.deploy.FirstFitPlanner] (Job-Executor-34:job-34) Trying to find a potenial host and associated storage pools from the suitable host/pool lists for this VM
> 2013-05-20 06:49:29,508 DEBUG [cloud.deploy.FirstFitPlanner] (Job-Executor-34:job-34) Checking if host: 1 can access any suitable storage pool for volume: DATADISK
> 2013-05-20 06:49:29,508 DEBUG [cloud.deploy.FirstFitPlanner] (Job-Executor-34:job-34) Host: 1 cannot access pool: 201
> 2013-05-20 06:49:29,508 DEBUG [cloud.deploy.FirstFitPlanner] (Job-Executor-34:job-34) Could not find a potential host that has associated storage pools from the suitable host/pool lists for this VM
> 2013-05-20 06:49:29,508 DEBUG [cloud.deploy.FirstFitPlanner] (Job-Executor-34:job-34) Cannnot deploy to specified host, returning.
> 2013-05-20 06:49:29,524 DEBUG [cloud.capacity.CapacityManagerImpl] (Job-Executor-34:job-34) VM state transitted from :Starting to Stopped with event: OperationFailedvm's original host id: null new host id: null host id before state transition: null
> 2013-05-20 06:49:29,533 ERROR [cloud.async.AsyncJobManagerImpl] (Job-Executor-34:job-34) Unexpected exception while executing org.apache.cloudstack.api.command.user.vm.StartVMCmd
> com.cloud.exception.InsufficientServerCapacityException: Unable to create a deployment for VM[User|testvm-a]Scope=interface com.cloud.dc.DataCenter; id=1
>         at com.cloud.vm.VirtualMachineManagerImpl.advanceStart(VirtualMachineManagerImpl.java:728)
>         at com.cloud.vm.VirtualMachineManagerImpl.start(VirtualMachineManagerImpl.java:471)
>         at org.apache.cloudstack.engine.cloud.entity.api.VMEntityManagerImpl.deployVirtualMachine(VMEntityManagerImpl.java:212)
>         at org.apache.cloudstack.engine.cloud.entity.api.VirtualMachineEntityImpl.deploy(VirtualMachineEntityImpl.java:209)
>         at com.cloud.vm.UserVmManagerImpl.startVirtualMachine(UserVmManagerImpl.java:3865)
>         at com.cloud.vm.UserVmManagerImpl.startVirtualMachine(UserVmManagerImpl.java:2573)
>         at com.cloud.utils.component.ComponentInstantiationPostProcessor$InterceptorDispatcher.intercept(ComponentInstantiationPostProcessor.java:125)
>         at org.apache.cloudstack.api.command.user.vm.StartVMCmd.execute(StartVMCmd.java:120)
>         at com.cloud.api.ApiDispatcher.dispatch(ApiDispatcher.java:162)
>         at com.cloud.async.AsyncJobManagerImpl$1.run(AsyncJobManagerImpl.java:437)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:166)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira