Message-ID: <542012650.1234328099649.JavaMail.jira@brutus>
Date: Tue, 10 Feb 2009 20:54:59 -0800 (PST)
From: "Vinod K V (JIRA)"
To: core-dev@hadoop.apache.org
Subject: [jira] Commented: (HADOOP-5214) ConcurrentModificationException in FairScheduler.getTotalSlots
In-Reply-To: <2005911189.1234327742723.JavaMail.jira@brutus>

[ https://issues.apache.org/jira/browse/HADOOP-5214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12672509#action_12672509 ]

Vinod K V commented on HADOOP-5214:
-----------------------------------

Here's the trace.

{code}
2009-02-08 16:27:57,496 ERROR org.apache.hadoop.mapred.FairScheduler: Failed to update fair share calculations
java.util.ConcurrentModificationException
        at java.util.HashMap$HashIterator.nextEntry(HashMap.java:793)
        at java.util.HashMap$ValueIterator.next(HashMap.java:822)
        at org.apache.hadoop.mapred.FairScheduler.getTotalSlots(FairScheduler.java:703)
        at org.apache.hadoop.mapred.FairScheduler.updateFairShares(FairScheduler.java:622)
        at org.apache.hadoop.mapred.FairScheduler.update(FairScheduler.java:358)
        at org.apache.hadoop.mapred.FairScheduler$UpdateThread.run(FairScheduler.java:212)
2009-02-08 16:27:57,500 ERROR org.apache.hadoop.mapred.JobTracker: Tracker Expiry Thread got exception: java.lang.NullPointerException
{code}

The underlying problem is that getTotalSlots gets the list of trackers via TaskTrackerManager.taskTrackers() and iterates through it to calculate the total number of map and reduce slots in the cluster. The exception occurs when the JT internally modifies the list of TaskTrackers, for example while updating the list of lost TaskTrackers.

These totals are already available via ClusterStatus. Using it will fix the issue.

> ConcurrentModificationException in FairScheduler.getTotalSlots
> --------------------------------------------------------------
>
>                 Key: HADOOP-5214
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5214
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/fair-share
>            Reporter: Vinod K V
>

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
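
For reference, the failure mode described in the comment can be reproduced outside Hadoop. The sketch below is only an illustration under stated assumptions: TrackerRegistry, liveView and the tracker names are hypothetical stand-ins, not Hadoop's classes, and the modification happens on the same thread to keep the failure deterministic (in the report it comes from another JT thread). It shows a HashMap value iteration failing when the map changes underneath it, and a read of a pre-computed aggregate that never iterates, which is roughly the idea behind reading the totals from ClusterStatus.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;

public class SlotCountSketch {

    /** Hypothetical stand-in for a tracker registry; not Hadoop's actual classes. */
    static class TrackerRegistry {
        private final Map<String, Integer> slotsByTracker = new HashMap<>();
        private int totalSlots = 0; // running aggregate, updated on every change

        synchronized void addTracker(String name, int slots) {
            Integer old = slotsByTracker.put(name, slots);
            totalSlots += slots - (old == null ? 0 : old);
        }

        synchronized void removeTracker(String name) {
            Integer old = slotsByTracker.remove(name);
            if (old != null) {
                totalSlots -= old;
            }
        }

        /** Reads the aggregate without iterating the map. */
        synchronized int getTotalSlots() {
            return totalSlots;
        }

        /** Exposes the live map, mirroring the pattern that fails. */
        Map<String, Integer> liveView() {
            return slotsByTracker;
        }
    }

    /** Returns true if iterating the live map while it changes throws a CME. */
    static boolean unsafeIterationFails(TrackerRegistry registry) {
        try {
            int total = 0;
            for (int slots : registry.liveView().values()) {
                total += slots;
                // Simulates a lost tracker being dropped in the middle of the iteration.
                registry.removeTracker("tt-lost");
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        TrackerRegistry registry = new TrackerRegistry();
        registry.addTracker("tt-1", 4);
        registry.addTracker("tt-2", 4);
        registry.addTracker("tt-lost", 4);

        System.out.println("unsafe iteration failed: " + unsafeIterationFails(registry));
        System.out.println("total slots from aggregate: " + registry.getTotalSlots());
    }
}
```

The aggregate is kept under the same lock as the map, so a reader sees a consistent total without ever holding an iterator over a collection that another thread may modify.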