From: Prachi Damle
To: dev@cloudstack.apache.org
Subject: RE: Possible bug in DeploymentPlanner?
Date: Wed, 16 Oct 2013 21:51:58 +0000

-----Original Message-----
From: Koushik Das [mailto:koushik.das@citrix.com]
Sent: Wednesday, October 16, 2013 5:21 AM
To:
Subject: Re: Possible bug in DeploymentPlanner?

On 16-Oct-2013, at 3:12 AM, Prachi Damle wrote:

> -----Original Message-----
> From: Koushik Das [mailto:koushik.das@citrix.com]
> Sent: Tuesday, October 15, 2013 11:43 AM
> To:
> Subject: Re: Possible bug in DeploymentPlanner?
>
> Thanks for the explanation, Prachi.
>
> Here is one issue I see with what you explained below. If an allocator is not supposed to handle a particular scenario, how should it deal with the avoid set? Should it put all available pools in the avoid set? In that case subsequent allocators won't get a chance to do anything, since all pools are in the avoid set. If more than one allocator can operate on a particular pool, the situation becomes tricky.
>
> To give an example, suppose there is a request for a volume on shared storage and the first allocator to get called is the local allocator. Since it is not going to handle this case, it will return an empty/null list.
> Now, should it put all the other shared pools in the avoid set, or only the local pools? Similarly, if the request is for a local volume and the request first goes to the zone-wide or cluster-wide storage allocator, what should be in the avoid set?
>
> [Prachi] I see the bug you are thinking about - DeploymentPlanningManager::canAvoidCluster() checks only the shared storages, and thus does not put the cluster in the avoid set in case the local storage is not a match for the host.
>
> I think the storage allocators should put the resources they did not consider into the avoid set, but only those within their scope, and only when they are supposed to handle the request. Thus the local allocator sees only local pools, so it should place those in the avoid set when they are not considered, and only when this allocator is supposed to handle the scenario; the zone-wide/cluster-wide allocators should likewise set those within their scope.
> Also, the allocators have a defined order in which they get invoked, and we need to maintain the order defined in componentContext.xml.
>
> Now, to solve the bug, DeploymentPlanningManager::canAvoidCluster() needs to consider which scenario applies for the given vmProfile - i.e. whether it is local storage or shared - to correctly figure out whether the cluster can be avoided by considering the applicable pools only.
> Does this work fine for your scenario?

[Koushik] Not sure if canAvoidCluster() is the right place. Shared and local pools can belong to the same cluster, so the cluster cannot be put in the avoid set. Only the pools need to be selectively added to the avoid set.

[Prachi] The allocators add the resources to the avoid set, and then the DPM can figure out whether a cluster is done searching by looking at the avoid set, so that the planner need not choose that cluster on the next try.
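To make the scoping rule concrete, here is a minimal sketch of how a local-storage allocator could update the avoid set. The class and method names below (Pool, AvoidSet, LocalPoolAllocatorSketch, wantsLocal) are simplified stand-ins for illustration, not the actual CloudStack allocator API:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Sketch of the scoping rule: an allocator records in the avoid set only
// pools within its own scope, and only when the request is one it handles.
class Pool {
    final long id;
    final boolean local;
    Pool(long id, boolean local) { this.id = id; this.local = local; }
}

class AvoidSet {
    final Set<Long> poolIds = new HashSet<>();
    void addPool(long id) { poolIds.add(id); }
}

class LocalPoolAllocatorSketch {
    // wantsLocal mirrors the service offering's "use local storage" flag.
    List<Pool> allocate(List<Pool> pools, boolean wantsLocal, AvoidSet avoid) {
        if (!wantsLocal) {
            // Not our scenario: return empty and touch nothing, so the
            // shared-pool allocators further down the chain still get a chance.
            return new ArrayList<>();
        }
        List<Pool> suitable = new ArrayList<>();
        for (Pool p : pools) {
            if (!p.local) {
                continue; // out of scope: leave shared pools to other allocators
            }
            if (isSuitable(p)) {
                suitable.add(p);
            } else {
                avoid.addPool(p.id); // in scope and rejected: record it
            }
        }
        return suitable;
    }

    // Placeholder for a real capacity/tag check.
    boolean isSuitable(Pool p) { return p.id % 2 == 0; }
}
```

The two failure modes from the thread map onto this sketch: returning an empty list without touching the avoid set keeps later allocators in play, while recording every rejected in-scope pool is what lets the planner eventually decide a cluster is exhausted.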
So two changes are needed: the storage allocators should set resources in the avoid set within their scope, and the DPM should check whether a cluster is done accordingly.
Can you let me know the specific bug you want to change the allocators for? I will add the above change in a separate ticket if it is not already done for your bug.

> On 15-Oct-2013, at 11:31 PM, Prachi Damle wrote:
>
>> Koushik,
>>
>> The deployment planning process is divided between DeploymentPlanners and Allocators. The planners are supposed to feed the allocators a list of clusters to look into to find a potential destination, while the allocators should hand back to the planners an avoid set consisting of any resources not considered.
>>
>> Reasoning about how the planners work:
>> Planners can set/reset clusters in the avoid set depending on the heuristics they implement and the order in which they want the clusters to be traversed.
>> When all options of a planner are exhausted, it should return an empty cluster list, halting the search.
>>
>> What should the allocators do?
>> Iterate over the given cluster list and find the suitable resources in each cluster - all other resources that were not considered AND are not suitable MUST be added to the avoid set, so that the planners get the correct avoid input.
>> This is necessary for the logic not to enter an infinite loop.
>>
>> As you see, only planners can halt the search process, by referring to the avoid set provided by the allocators. If you see any case not following this, it needs to be fixed.
>>
>> Also, I do think that in general it will be safe to add a configurable retry limit on this loop, to keep control in case any new logic in the allocators does not follow this reasoning. I will add that limit.
>>
>> Thanks,
>> Prachi
>>
>>
>> -----Original Message-----
>> From: Koushik Das [mailto:koushik.das@citrix.com]
>> Sent: Tuesday, October 15, 2013 9:19 AM
>> To:
>> Subject: Possible bug in DeploymentPlanner?
>>
>> I was making some changes in the storage pool allocators related to a bug fix and came across this code snippet in the planDeployment() method of DeploymentPlanningManagerImpl.java.
>> Here, if checkClustersforDestination() returns null and the 'avoids' parameter is not correctly updated (one such place can be the storage allocators), then the while loop will never terminate. Is there any assumption about how the 'avoids' parameter needs to be updated? From the code it is not very intuitive. I saw some places in the storage pool allocators where it will not get updated.
>>
>> I wanted to understand the reason for doing it this way. Can the while (true) condition be replaced with something more intuitive?
>>
>> while (true) {
>>     if (planner instanceof DeploymentClusterPlanner) {
>>         ExcludeList plannerAvoidInput = new ExcludeList(avoids.getDataCentersToAvoid(),
>>                 avoids.getPodsToAvoid(), avoids.getClustersToAvoid(), avoids.getHostsToAvoid(),
>>                 avoids.getPoolsToAvoid());
>>
>>         clusterList = ((DeploymentClusterPlanner) planner).orderClusters(vmProfile, plan, avoids);
>>         if (clusterList != null && !clusterList.isEmpty()) {
>>             // planner refactoring: call allocators to list hosts
>>             ExcludeList plannerAvoidOutput = new ExcludeList(avoids.getDataCentersToAvoid(),
>>                     avoids.getPodsToAvoid(), avoids.getClustersToAvoid(), avoids.getHostsToAvoid(),
>>                     avoids.getPoolsToAvoid());
>>
>>             resetAvoidSet(plannerAvoidOutput, plannerAvoidInput);
>>
>>             dest = checkClustersforDestination(clusterList, vmProfile, plan, avoids, dc,
>>                     getPlannerUsage(planner, vmProfile, plan, avoids), plannerAvoidOutput);
>>             if (dest != null) {
>>                 return dest;
>>             }
>>             // reset the avoid input to the planners
>>             resetAvoidSet(avoids, plannerAvoidOutput);
>>         } else {
>>             return null;
>>         }
>>     } else {
>>         ............
>>         ............
>>     }
>> }
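For illustration, the configurable retry limit discussed above could bound the loop roughly as in the toy sketch below. Integer cluster ids stand in for CloudStack's ExcludeList/DeploymentClusterPlanner types, and all names (PlannerLoopSketch, MAX_RETRIES, planDeployment stub) are made up for the example, not the actual fix:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Sketch of bounding the planner loop: the search ends either when the
// planner returns an empty cluster list (normal termination via the avoid
// set) or when a configurable retry limit is hit (safety net against
// allocators that forget to grow the avoid set).
class PlannerLoopSketch {
    static final int MAX_RETRIES = 5; // would come from a global config in practice

    // Stand-in for the planner: order clusters not yet in the avoid set.
    static List<Integer> orderClusters(List<Integer> all, Set<Integer> avoid) {
        List<Integer> out = new ArrayList<>();
        for (Integer c : all) {
            if (!avoid.contains(c)) out.add(c);
        }
        return out;
    }

    // Returns the chosen cluster id, or -1 if the search is exhausted.
    static int planDeployment(List<Integer> clusters, Set<Integer> avoid, int wanted) {
        for (int retry = 0; retry < MAX_RETRIES; retry++) { // bounded, not while (true)
            List<Integer> clusterList = orderClusters(clusters, avoid);
            if (clusterList.isEmpty()) {
                return -1; // planner halts the search: every option is in the avoid set
            }
            for (Integer c : clusterList) {
                if (c == wanted) {
                    return c; // "destination" found in this cluster
                }
                avoid.add(c); // the allocator MUST record failed clusters here
            }
        }
        return -1; // retry limit hit: the avoid set was not growing as expected
    }
}
```

The invariant from the thread is visible in the shape of the loop: as long as every failed pass grows the avoid set, orderClusters() eventually returns an empty list and the search halts on its own; the retry counter only matters when an allocator breaks that contract.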