incubator-cloudstack-dev mailing list archives

From Nitin Mehta <Nitin.Me...@citrix.com>
Subject Re: Review Request: CLOUDSTACK-606: Starting VM fails with 'ConcurrentOperationException' in a clustered MS scenario
Date Fri, 01 Feb 2013 10:07:39 GMT
Submitted with 
commit 777147ce8a47238125a5439f207c225aa9db5304
Author: Koushik Das <koushik.das@citrix.com>
Date:   Fri Feb 1 15:34:41 2013 +0530



On 01/02/13 2:55 PM, "Koushik Das" <koushik.das@citrix.com> wrote:

>
>-----------------------------------------------------------
>This is an automatically generated e-mail. To reply, visit:
>https://reviews.apache.org/r/9133/
>-----------------------------------------------------------
>
>(Updated Feb. 1, 2013, 9:25 a.m.)
>
>
>Review request for cloudstack, Abhinandan Prateek and Alex Huang.
>
>
>Changes
>-------
>
>Updated exception message as the fix is not merged yet.
>
>
>Description
>-------
>
>The issue happens randomly when the hosts in a cluster get distributed
>across multiple MS. Hosts can get split in the following scenarios:
>    a. Add host - The MS on which add host is executed takes ownership of
>the host. So if 2 hosts belonging to the same cluster are added from 2
>different MS, the cluster gets split.
>    b. scanDirectAgentToLoad - This runs every 90 secs and checks whether
>there are any hosts that need to be reconnected. The current logic of
>the host scan can also lead to a split.
>    
>    The idea is to fix (b) to ensure that the hosts in a cluster are
>managed by the same MS. For (a), only the entry in the database is going
>to be created, unless the host being added is the first in the cluster
>(in that case agent creation happens at the same time); (b) will then
>take care of the connection and agent creation part. Since addHost
>currently only creates an entry in the db, there is a small window where
>the host state will be shown as 'Alert' until (b) is scheduled and picks
>up the host to make a connection. The MS doing add host will immediately
>schedule a scan task and also send a notification to its peers to start
>the scan task.
>
>
>This addresses bug CLOUDSTACK-606.
>
>
>Diffs (updated)
>-----
>
>  api/src/com/cloud/agent/api/ScheduleHostScanTaskCommand.java PRE-CREATION
>  server/src/com/cloud/agent/manager/ClusteredAgentManagerImpl.java ca0bf5c
>  server/src/com/cloud/cluster/ClusterManagerImpl.java e341b88
>  server/src/com/cloud/host/dao/HostDaoImpl.java 0881675
>  server/src/com/cloud/resource/ResourceManagerImpl.java f82424a
>
>Diff: https://reviews.apache.org/r/9133/diff/
>
>
>Testing
>-------
>
>Manually tested the following scenarios:
>
>- Added hostA in cluster1 from MS1, gets owned by MS1 as first host in
>cluster. Added hostB in same cluster1 from MS2. Once both hosts are in
>'Up' state ensure that they are owned by the same MS (i.e. MS1).
>- Error scenarios where a host goes to the disconnected, alert or down
>state (host disconnected from the network) and is then reconnected
>(connected back to the network). Ensure that once reconnected, the host
>is owned by the same MS as the other hosts in the cluster.
>- A scenario where hosts are already in a distributed state (before the
>fix, hosts were added to the same cluster from different MSs). Ensure
>that after applying the patch and restarting the MSs, distribution
>happens properly.
>- Did basic validation in a single MS setup, added multiple hosts in a
>cluster and created VMs on them.
>
>
>Thanks,
>
>Koushik Das
>
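
For readers following the thread, the ownership rule in the quoted description can be sketched roughly as below. This is only an illustration under stated assumptions, not the code in the patch: the class ClusterOwnershipSketch, the Host stand-in and the methods chooseOwner and scan are hypothetical names, the host table is modelled as an in-memory list, and the database locking a real clustered MS would need is left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the "one MS per cluster" rule; not the CloudStack implementation.
class ClusterOwnershipSketch {

    // Minimal stand-in for a host row: host id, cluster id, owning MS id (null = unowned).
    static final class Host {
        final long id;
        final long clusterId;
        Long ownerMsId;

        Host(long id, long clusterId, Long ownerMsId) {
            this.id = id;
            this.clusterId = clusterId;
            this.ownerMsId = ownerMsId;
        }
    }

    // Returns the MS that should own the candidate: the MS already owning another
    // host in the same cluster if there is one, otherwise the MS running the scan.
    static long chooseOwner(Host candidate, List<Host> allHosts, long scanningMsId) {
        for (Host h : allHosts) {
            if (h.id != candidate.id
                    && h.clusterId == candidate.clusterId
                    && h.ownerMsId != null) {
                return h.ownerMsId;
            }
        }
        return scanningMsId;
    }

    // One scan pass executed by scanningMsId. Unowned hosts are claimed only when
    // this MS is the chosen owner; returns the ids of the hosts it should connect to.
    static List<Long> scan(List<Host> allHosts, long scanningMsId) {
        List<Long> toConnect = new ArrayList<>();
        for (Host h : allHosts) {
            if (h.ownerMsId != null) {
                continue; // already managed by some MS
            }
            if (chooseOwner(h, allHosts, scanningMsId) == scanningMsId) {
                h.ownerMsId = scanningMsId;
                toConnect.add(h.id);
            }
        }
        return toConnect;
    }
}
```

In this sketch, a host added from MS2 to a cluster whose first host is owned by MS1 stays unowned after MS2's scan and is picked up only when MS1 scans, which mirrors the first test scenario in the quoted message. Redistributing hosts that are already split across MSs is not modelled here.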

