Subject: svn commit: r734868 - in /hadoop/core/trunk: CHANGES.txt src/contrib/capacity-scheduler/src/java/org/apache/hadoop/mapred/CapacityTaskScheduler.java src/contrib/capacity-scheduler/src/test/org/apache/hadoop/mapred/TestCapacityScheduler.java
Date: Fri, 16 Jan 2009 00:22:50 -0000
To: core-commits@hadoop.apache.org
From: yhemanth@apache.org
Reply-To: core-dev@hadoop.apache.org

Author: yhemanth
Date: Thu Jan 15 16:22:50 2009
New Revision: 734868

URL: http://svn.apache.org/viewvc?rev=734868&view=rev
Log:
HADOOP-4988. Fix reclaim capacity to work even when there are queues with no capacity. Contributed by Vivek Ratan.

Modified:
    hadoop/core/trunk/CHANGES.txt
    hadoop/core/trunk/src/contrib/capacity-scheduler/src/java/org/apache/hadoop/mapred/CapacityTaskScheduler.java
    hadoop/core/trunk/src/contrib/capacity-scheduler/src/test/org/apache/hadoop/mapred/TestCapacityScheduler.java

Modified: hadoop/core/trunk/CHANGES.txt
URL: http://svn.apache.org/viewvc/hadoop/core/trunk/CHANGES.txt?rev=734868&r1=734867&r2=734868&view=diff
==============================================================================
--- hadoop/core/trunk/CHANGES.txt (original)
+++ hadoop/core/trunk/CHANGES.txt Thu Jan 15 16:22:50 2009
@@ -588,6 +588,9 @@
     HADOOP-4977. Fix a deadlock between the reclaimCapacity and assignTasks
     in capacity scheduler. (Vivek Ratan via yhemanth)
 
+    HADOOP-4988. Fix reclaim capacity to work even when there are queues with
+    no capacity. (Vivek Ratan via yhemanth)
+
 Release 0.19.1 - Unreleased
 
   IMPROVEMENTS

Modified: hadoop/core/trunk/src/contrib/capacity-scheduler/src/java/org/apache/hadoop/mapred/CapacityTaskScheduler.java
URL: http://svn.apache.org/viewvc/hadoop/core/trunk/src/contrib/capacity-scheduler/src/java/org/apache/hadoop/mapred/CapacityTaskScheduler.java?rev=734868&r1=734867&r2=734868&view=diff
==============================================================================
--- hadoop/core/trunk/src/contrib/capacity-scheduler/src/java/org/apache/hadoop/mapred/CapacityTaskScheduler.java (original)
+++ hadoop/core/trunk/src/contrib/capacity-scheduler/src/java/org/apache/hadoop/mapred/CapacityTaskScheduler.java Thu Jan 15 16:22:50 2009
@@ -405,26 +405,16 @@
         return -1;
       } else if ((0 == t1.reclaimList.size()) && (0 == t2.reclaimList.size())){
-        // neither needs to reclaim. If either doesn't have a capacity yet,
-        // it comes at the end of the queue.
-        if ((t1.guaranteedCapacity == 0) &&
-            (t2.guaranteedCapacity != 0)) {
-          return 1;
-        } else if ((t1.guaranteedCapacity != 0) &&
-            (t2.guaranteedCapacity == 0)) {
-          return -1;
-        } else if ((t1.guaranteedCapacity == 0) &&
-            (t2.guaranteedCapacity == 0)) {
-          // both don't have capacities, treat them as equal.
-          return 0;
-        } else {
-          // look at how much capacity they've filled
-          double r1 = (double)t1.numRunningTasks/(double)t1.guaranteedCapacity;
-          double r2 = (double)t2.numRunningTasks/(double)t2.guaranteedCapacity;
-          if (r1<r2) return -1;
-          else if (r1>r2) return 1;
-          else return 0;
-        }
+        // neither needs to reclaim.
+        // look at how much capacity they've filled. Treat a queue with gc=0
+        // equivalent to a queue running at capacity
+        double r1 = (0 == t1.guaranteedCapacity)? 1.0f:
+          (double)t1.numRunningTasks/(double)t1.guaranteedCapacity;
+        double r2 = (0 == t2.guaranteedCapacity)? 1.0f:
+          (double)t2.numRunningTasks/(double)t2.guaranteedCapacity;
+        if (r1<r2) return -1;
+        else if (r1>r2) return 1;
+        else return 0;
       }
       else {
         // both have to reclaim. Look at which one needs to reclaim earlier
@@ -768,12 +758,10 @@
   // collections are up-to-date.
   private TaskLookupResult assignTasks(TaskTrackerStatus taskTracker) throws IOException {
     for (QueueSchedulingInfo qsi : qsiForAssigningTasks) {
-      if (getTSI(qsi).guaranteedCapacity <= 0.0f) {
-        // No capacity is guaranteed yet for this queue.
-        // Queues are sorted so that ones without capacities
-        // come towards the end. Hence, we can simply return
-        // from here without considering any further queues.
-        return TaskLookupResult.getNoTaskFoundResult();
+      // we may have queues with gc=0. We shouldn't look at jobs from
+      // these queues
+      if (0 == getTSI(qsi).guaranteedCapacity) {
+        continue;
       }
       TaskLookupResult tlr = getTaskFromQueue(taskTracker, qsi);
       TaskLookupResult.LookUpStatus lookUpStatus = tlr.getLookUpStatus();

Modified: hadoop/core/trunk/src/contrib/capacity-scheduler/src/test/org/apache/hadoop/mapred/TestCapacityScheduler.java
URL: http://svn.apache.org/viewvc/hadoop/core/trunk/src/contrib/capacity-scheduler/src/test/org/apache/hadoop/mapred/TestCapacityScheduler.java?rev=734868&r1=734867&r2=734868&view=diff
==============================================================================
--- hadoop/core/trunk/src/contrib/capacity-scheduler/src/test/org/apache/hadoop/mapred/TestCapacityScheduler.java (original)
+++ hadoop/core/trunk/src/contrib/capacity-scheduler/src/test/org/apache/hadoop/mapred/TestCapacityScheduler.java Thu Jan 15 16:22:50 2009
@@ -1415,6 +1415,52 @@
   }
 
+  // test code to reclaim capacity with one queue having zero GC
+  // (HADOOP-4988).
+  // Simple test: reclaim capacity should work even if one of the
+  // queues has a gc of 0.
+  public void testReclaimCapacityWithZeroGC() throws Exception {
+    // set up some queues
+    String[] qs = {"default", "q2", "q3"};
+    taskTrackerManager.addQueues(qs);
+    resConf = new FakeResourceManagerConf();
+    ArrayList<FakeQueueInfo> queues = new ArrayList<FakeQueueInfo>();
+    // we want q3 to have 0 GC. Map slots = 4.
+    queues.add(new FakeQueueInfo("default", 50.0f, 1000, true, 25));
+    queues.add(new FakeQueueInfo("q2", 40.0f, 1000, true, 25));
+    queues.add(new FakeQueueInfo("q3", 10.0f, 1000, true, 25));
+    // note: because of the way we convert gc% into actual gc, q2's gc
+    // will be 1, not 2.
+    resConf.setFakeQueues(queues);
+    resConf.setReclaimCapacityInterval(500);
+    scheduler.setResourceManagerConf(resConf);
+    scheduler.start();
+
+    // set up a situation where q2 is under capacity, and default
+    // is over capacity
+    FakeJobInProgress j1 = submitJobAndInit(JobStatus.PREP, 10, 10, null, "u1");
+    //FakeJobInProgress j2 = submitJobAndInit(JobStatus.PREP, 10, 10, "q3", "u1");
+    checkAssignment("tt1", "attempt_test_0001_m_000001_0 on tt1");
+    checkAssignment("tt1", "attempt_test_0001_m_000002_0 on tt1");
+    checkAssignment("tt2", "attempt_test_0001_m_000003_0 on tt2");
+    checkAssignment("tt2", "attempt_test_0001_m_000004_0 on tt2");
+    // now submit a job to q2
+    FakeJobInProgress j3 = submitJobAndInit(JobStatus.PREP, 10, 10, "q2", "u1");
+    // get scheduler to notice that q2 needs to reclaim
+    scheduler.reclaimCapacity();
+    // our queue reclaim time is 1000s, heartbeat interval is 5 sec, so
+    // we start reclaiming when 15 secs are left.
+    clock.advance(400000);
+    scheduler.reclaimCapacity();
+    // no tasks should have been killed yet
+    assertEquals(j1.runningMapTasks, 4);
+    clock.advance(200000);
+    scheduler.reclaimCapacity();
+    // task from j1 will be killed
+    assertEquals(j1.runningMapTasks, 3);
+
+  }
+
   /*
    * Following is the testing strategy for testing scheduling information.
    * - start capacity scheduler with two queues.