incubator-cloudstack-dev mailing list archives

From Chiradeep Vittal <Chiradeep.Vit...@citrix.com>
Subject Re: Review Request: CLOUDSTACK-606: Starting VM fails with 'ConcurrentOperationException' in a clustered MS scenario
Date Thu, 31 Jan 2013 19:28:55 GMT
I hope you have read this:
https://cwiki.apache.org/confluence/x/twDFAQ
A good log has the 5 W's: who, what, where, why, and when. Your log does
not indicate the what, why, or where.
A better log would be:
 "Caught an exception while trying to schedule a host scan task on <>:
ignoring because foo"
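A minimal sketch of building such a message, assuming a hypothetical helper on the management server (the class, method, and parameter names here are illustrative, not the actual CloudStack API; only the 90-second scan interval comes from the review below):

```java
// Hypothetical sketch; names are illustrative, not the actual CloudStack API.
public class HostScanScheduler {
    // Builds a log message carrying the what (scheduling a host scan task),
    // where (the peer management server id), and why (the exception), with
    // an explicit note on why it is safe to ignore.
    public static String describeFailure(long mgmtServerId, Exception e) {
        return "Caught " + e.getClass().getSimpleName()
            + " while scheduling a host scan task on management server "
            + mgmtServerId + "; ignoring because the next periodic scan"
            + " (every 90 secs) will retry: " + e.getMessage();
    }

    public static void main(String[] args) {
        System.out.println(describeFailure(2L,
            new IllegalStateException("peer not ready")));
    }
}
```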

On 1/31/13 11:01 AM, "Koushik Das" <koushik.das@citrix.com> wrote:

>
>
>> On Jan. 31, 2013, 6:32 p.m., Chiradeep Vittal wrote:
>> > server/src/com/cloud/cluster/ClusterManagerImpl.java, line 371
>> > 
>><https://reviews.apache.org/r/9133/diff/3/?file=253825#file253825line371>
>> >
>> >     If the cloud operator sees this WARNING, what is he supposed to
>>do? Should it be INFO? Should you tell him that it is safe to ignore?
>
>What is the logging guideline in the case of suppressing an exception? I
>see in other places in the code that a warning is logged in a similar
>situation. As long as there is consistency, I feel that a warning is
>fine. I would interpret the warning as: some operation failed, but the
>system can recover from it.
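One way to make that convention concrete, as a hypothetical helper (the class name, wording, and the boolean flag are illustrative, not from the CloudStack code): WARN only when the operator may need to act, INFO with an explicit "safe to ignore" when the system recovers on its own.

```java
// Hypothetical sketch of the level-choosing convention discussed above;
// names and message wording are illustrative, not CloudStack's.
public class ScanLogLevel {
    public static String format(String what, boolean systemRecovers) {
        if (systemRecovers) {
            // Recoverable: say so explicitly, so the operator knows no
            // action is needed.
            return "INFO: " + what + "; safe to ignore, the system will recover";
        }
        return "WARN: " + what + "; operator attention may be needed";
    }

    public static void main(String[] args) {
        System.out.println(format("failed to schedule host scan task", true));
    }
}
```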
>
>
>- Koushik
>
>
>-----------------------------------------------------------
>This is an automatically generated e-mail. To reply, visit:
>https://reviews.apache.org/r/9133/#review15951
>-----------------------------------------------------------
>
>
>On Jan. 31, 2013, 9:10 a.m., Koushik Das wrote:
>> 
>> -----------------------------------------------------------
>> This is an automatically generated e-mail. To reply, visit:
>> https://reviews.apache.org/r/9133/
>> -----------------------------------------------------------
>> 
>> (Updated Jan. 31, 2013, 9:10 a.m.)
>> 
>> 
>> Review request for cloudstack, Abhinandan Prateek and Alex Huang.
>> 
>> 
>> Description
>> -------
>> 
>> The issue happens randomly when hosts in a cluster get distributed
>>across multiple MSs. Hosts can get split in the following scenarios:
>>     a. Add host - the MS on which add host is executed takes ownership
>>of the host. So if 2 hosts belonging to the same cluster are added from
>>2 different MSs, the cluster gets split.
>>     b. scanDirectAgentToLoad - this runs every 90 secs. and checks if
>>there are any hosts that need to be reconnected. The current logic of
>>the host scan can also lead to a split.
>>     
>>     The idea is to fix (b) to ensure that hosts in a cluster are
>>managed by the same MS. For (a), only the entry in the database is
>>created, except when the host being added is the first in the cluster
>>(in that case agent creation happens at the same time); (b) then takes
>>care of the connection and agent creation. Since addHost currently only
>>creates an entry in the db, there is a small window where the host
>>state is shown as 'Alert' until (b) is scheduled and picks up the host
>>to make a connection. The MS doing the add host will immediately
>>schedule a scan task and also notify its peers to start the scan task.
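The ownership rule behind (b) can be sketched as follows; this is a hypothetical simplification (the Map-based lookup and all names are illustrative, not the actual ClusterManagerImpl logic):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the rule described above: during a scan, a host
// is routed to whichever MS already owns hosts in its cluster, so a
// cluster is never split across management servers. All names here are
// illustrative, not the actual CloudStack implementation.
public class ClusterOwnership {
    // clusterOwner maps clusterId -> owning MS id (absent if no host in
    // the cluster is connected yet).
    public static long pickOwner(Map<Long, Long> clusterOwner,
                                 long clusterId, long thisMsId) {
        // Defer to the existing owner; otherwise this MS claims the cluster.
        return clusterOwner.getOrDefault(clusterId, thisMsId);
    }

    public static void main(String[] args) {
        Map<Long, Long> owners = new HashMap<>();
        owners.put(10L, 1L); // cluster 10 already owned by MS1
        System.out.println(pickOwner(owners, 10L, 2L)); // MS2 defers to MS1
        System.out.println(pickOwner(owners, 20L, 2L)); // unowned: MS2 claims it
    }
}
```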
>> 
>> 
>> This addresses bug CLOUDSTACK-606.
>> 
>> 
>> Diffs
>> -----
>> 
>>   api/src/com/cloud/agent/api/ScheduleHostScanTaskCommand.java
>>PRE-CREATION 
>>   server/src/com/cloud/agent/manager/ClusteredAgentManagerImpl.java
>>ca0bf5c 
>>   server/src/com/cloud/cluster/ClusterManagerImpl.java e341b88
>>   server/src/com/cloud/host/dao/HostDaoImpl.java 0881675
>>   server/src/com/cloud/resource/ResourceManagerImpl.java f82424a
>> 
>> Diff: https://reviews.apache.org/r/9133/diff/
>> 
>> 
>> Testing
>> -------
>> 
>> Manually tested the following scenarios:
>> 
>> - Added hostA in cluster1 from MS1; it gets owned by MS1 as the first
>>host in the cluster. Added hostB in the same cluster1 from MS2. Once
>>both hosts are in the 'Up' state, ensured that they are owned by the
>>same MS (i.e. MS1).
>> - Error scenarios where a host goes to the disconnected, alert or down
>>state (host disconnected from the network) and is reconnected back
>>(connected to the network). Ensured that once connected back, the host
>>is owned by the same MS as the other hosts in the cluster.
>> - Started from a scenario where hosts are already in a distributed
>>state (before the fix, added hosts to the same cluster from different
>>MSs) and ensured that after applying the patch and restarting the MSs
>>the distribution happens properly.
>> - Did basic validation in a single MS setup, added multiple hosts in a
>>cluster and created VMs on them.
>> 
>> 
>> Thanks,
>> 
>> Koushik Das
>> 
>>
>

