Message-ID: <14263958.1198818883232.JavaMail.jira@brutus>
Date: Thu, 27 Dec 2007 21:14:43 -0800 (PST)
From: "Devaraj Das (JIRA)"
To: hadoop-dev@lucene.apache.org
Subject: [jira] Commented: (HADOOP-2492) ConcurrentModificationException in org.apache.hadoop.ipc.Server.Responder
In-Reply-To: <15223271.1198740823223.JavaMail.jira@brutus>

    [ https://issues.apache.org/jira/browse/HADOOP-2492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12554653 ]

Devaraj Das commented on HADOOP-2492:
-------------------------------------

There is no stack trace because Responder.run catches the exception and only logs its string form (_LOG.warn("Exception in Responder " + e)_). It doesn't print the stack trace...

> ConcurrentModificationException in org.apache.hadoop.ipc.Server.Responder
> -------------------------------------------------------------------------
>
>                 Key: HADOOP-2492
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2492
>             Project: Hadoop
>          Issue Type: Bug
>          Components: ipc
>    Affects Versions: 0.16.0
>            Reporter: Devaraj Das
>            Assignee: dhruba borthakur
>             Fix For: 0.16.0
>
>
> I was running hadoop on 800 machines and after running a couple of jobs, and running 100% of the maps of the current job, the JobTracker stopped responding - *all* tasktrackers were lost ... When I looked at the JT logs, these seemed alarming:
> 2007-12-26 19:18:30,185 WARN org.apache.hadoop.ipc.Server: Exception in Responder java.util.ConcurrentModificationException
> Following the above exception, I saw a whole lot of exceptions like:
> 2007-12-26 19:23:10,926 WARN org.apache.hadoop.ipc.Server: Call queue overflow discarding oldest call heartbeat(org.apache.hadoop.mapred.TaskTrackerStatus@5a05f9, false, true, 1758) from 1.2.3.4:1234
> From the number of exceptions to do with call queue overflow, it seemed like the jobtracker was not processing RPCs after it got the ConcurrentModificationException, and around that time the tasktrackers started getting timeouts on RPCs...
> There were two occurrences of the ConcurrentModificationException but the first instance seemed to not have any effect on the call queue...
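
For illustration only, the logging behaviour described in the comment can be reproduced in isolation. The sketch below is not the Hadoop code: the class and method names (ResponderLoggingSketch, provoke, concatenated, withStackTrace) are hypothetical, and the exception is provoked in a single thread by modifying a list while iterating over it, which may well differ from how it arose in the Responder. It shows that concatenating the exception into the message yields only its toString() form, which matches the WARN line quoted in the issue, whereas rendering the Throwable itself includes the stack frames.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.List;

// Hypothetical stand-alone sketch; none of these names come from Hadoop.
public class ResponderLoggingSketch {

    // Provokes a ConcurrentModificationException by structurally modifying
    // an ArrayList while a for-each loop is iterating over it.
    static ConcurrentModificationException provoke() {
        List<Integer> pending = new ArrayList<>(Arrays.asList(1, 2, 3));
        try {
            for (Integer call : pending) {
                pending.remove(call); // remove(Object): changes the list mid-iteration
            }
        } catch (ConcurrentModificationException e) {
            return e;
        }
        throw new IllegalStateException("expected a ConcurrentModificationException");
    }

    // What the quoted log statement produces: the message plus e.toString() only.
    static String concatenated(Throwable e) {
        return "Exception in Responder " + e;
    }

    // Roughly what a logger renders when it is handed the Throwable itself:
    // the message followed by the full stack trace.
    static String withStackTrace(Throwable e) {
        StringWriter out = new StringWriter();
        out.write("Exception in Responder" + System.lineSeparator());
        e.printStackTrace(new PrintWriter(out, true));
        return out.toString();
    }

    public static void main(String[] args) {
        ConcurrentModificationException e = provoke();
        System.out.println(concatenated(e));   // one line, no frames
        System.out.println(withStackTrace(e)); // message plus stack frames
    }
}
```

Assuming LOG is an Apache Commons Logging Log, the two-argument overload LOG.warn(message, e) would pass the Throwable to the logger instead of flattening it into the message, which is the usual way to get the trace into the log.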