Date: Wed, 21 Mar 2007 17:20:32 -0700 (PDT)
From: "David Bowen (JIRA)"
To: hadoop-dev@lucene.apache.org
Subject: [jira] Updated: (HADOOP-1140) Deadlock bug involving the o.a.h.metrics package

[ https://issues.apache.org/jira/browse/HADOOP-1140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Bowen updated HADOOP-1140:
--------------------------------

Attachment: 1140.patch

A simple fix in
org.apache.hadoop.metrics.spi.AbstractMetricsContext. The context object was holding its own lock while calling back into application code. The fix is to stop holding that lock during the callback.

Note: I've also made some minor cleanups in this file, simply restoring the uses of Java 1.5 generic types that I had to take out when I originally checked the code in, because Java 1.4 still had to be supported at that time.

> Deadlock bug involving the o.a.h.metrics package
> ------------------------------------------------
>
>                 Key: HADOOP-1140
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1140
>             Project: Hadoop
>          Issue Type: Bug
>            Reporter: David Bowen
>         Assigned To: David Bowen
>         Attachments: 1140.patch
>
>
> Hi David,
> Our nightly benchmarks are occasionally failing (2 to 4 of them per night) due to this deadlock in the JT that looks to be caused by Simon. Do you have time to fix this in the morning?
> Thanks,
> Nige
> Found one Java-level deadlock:
> =============================
> "expireLaunchingTasks":
>   waiting to lock monitor 0x08141b44 (object 0x57eafdd0, a org.apache.hadoop.mapred.JobTracker),
>   which is held by "IPC Server handler 8 on 50020"
> "IPC Server handler 8 on 50020":
>   waiting to lock monitor 0x08141630 (object 0x57de46b8, a com.yahoo.simon.hadoop.metrics.SimonContext),
>   which is held by "Timer-0"
> "Timer-0":
>   waiting to lock monitor 0x08141b44 (object 0x57eafdd0, a org.apache.hadoop.mapred.JobTracker),
>   which is held by "IPC Server handler 8 on 50020"
> Java stack information for the threads listed above:
> ===================================================
> "expireLaunchingTasks":
>     at org.apache.hadoop.mapred.JobTracker$ExpireLaunchingTasks.run(JobTracker.java:152)
>     - waiting to lock <0x57eafdd0> (a org.apache.hadoop.mapred.JobTracker)
>     at java.lang.Thread.run(Thread.java:619)
> "IPC Server handler 8 on 50020":
>     at org.apache.hadoop.metrics.spi.AbstractMetricsContext.createRecord(AbstractMetricsContext.java:192)
>     - waiting to lock <0x57de46b8> (a com.yahoo.simon.hadoop.metrics.SimonContext)
>     at org.apache.hadoop.mapred.JobInProgress.<init>(JobInProgress.java:130)
>     at org.apache.hadoop.mapred.JobTracker.submitJob(JobTracker.java:1383)
>     - locked <0x57eafdd0> (a org.apache.hadoop.mapred.JobTracker)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:336)
>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:559)
> "Timer-0":
>     at org.apache.hadoop.mapred.JobTracker.getRunningJobs(JobTracker.java:943)
>     - waiting to lock <0x57eafdd0> (a org.apache.hadoop.mapred.JobTracker)
>     at org.apache.hadoop.mapred.JobTracker$JobTrackerMetrics.doUpdates(JobTracker.java:429)
>     at org.apache.hadoop.metrics.spi.AbstractMetricsContext.timerEvent(AbstractMetricsContext.java:275)
>     - locked <0x57de46b8> (a com.yahoo.simon.hadoop.metrics.SimonContext)
>     at org.apache.hadoop.metrics.spi.AbstractMetricsContext.access$000(AbstractMetricsContext.java:48)
>     at org.apache.hadoop.metrics.spi.AbstractMetricsContext$1.run(AbstractMetricsContext.java:242)
>     at java.util.TimerThread.mainLoop(Timer.java:512)
>     at java.util.TimerThread.run(Timer.java:462)
> Found 1 deadlock.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
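The fix in the note at the top (the metrics context no longer holds its own lock while calling back into application code) can be illustrated with a minimal Java sketch. It is a hypothetical reconstruction, not the contents of 1140.patch: `SimpleMetricsContext`, `Updater`, `createRecord`, `timerEvent` and `runScenario` are simplified stand-ins loosely named after the frames in the thread dump, and a plain `Object` plays the role of the JobTracker monitor. The timer path copies its callback list while holding the context lock and then invokes the callbacks with no lock held; `runScenario` replays the interleaving from the dump (the IPC thread holds the JobTracker lock and calls `createRecord` while the timer's callback waits for the JobTracker lock) and reports whether both threads finish.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;

// Hypothetical, simplified stand-ins; NOT the real Hadoop metrics classes or 1140.patch.
class MetricsDeadlockSketch {

    /** Callback run on each timer tick (hypothetical; loosely modeled on doUpdates). */
    interface Updater {
        void doUpdates(SimpleMetricsContext context);
    }

    static class SimpleMetricsContext {
        private final List<Updater> updaters = new ArrayList<>();
        private int recordCount = 0;

        synchronized void registerUpdater(Updater updater) {
            updaters.add(updater);
        }

        /** Called by application threads, possibly while they hold their own locks. */
        synchronized String createRecord(String name) {
            recordCount++;
            return name + "#" + recordCount;
        }

        /** Called from the timer thread. */
        void timerEvent() {
            List<Updater> snapshot;
            // Hold the context lock only long enough to copy the callback list...
            synchronized (this) {
                snapshot = new ArrayList<>(updaters);
            }
            // ...then call back into application code with no context lock held.
            for (Updater updater : snapshot) {
                updater.doUpdates(this);
            }
        }
    }

    /** Replays the interleaving from the thread dump; true if both threads finish. */
    static boolean runScenario() throws InterruptedException {
        final Object jobTrackerLock = new Object(); // stands in for the JobTracker monitor
        final SimpleMetricsContext context = new SimpleMetricsContext();
        final CountDownLatch ipcHoldsLock = new CountDownLatch(1);
        final CountDownLatch updaterStarted = new CountDownLatch(1);

        // Like JobTrackerMetrics.doUpdates -> getRunningJobs: needs the JobTracker lock.
        context.registerUpdater(ctx -> {
            updaterStarted.countDown();
            synchronized (jobTrackerLock) {
                // read job state here
            }
        });

        // Like submitJob: holds the JobTracker lock, then creates a metrics record.
        Thread ipc = new Thread(() -> {
            synchronized (jobTrackerLock) {
                ipcHoldsLock.countDown();
                try {
                    updaterStarted.await(); // timer thread has entered the callback
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                context.createRecord("job");
            }
        }, "IPC Server handler");

        // Like Timer-0: fires the metrics timer once the IPC thread holds the lock.
        Thread timer = new Thread(() -> {
            try {
                ipcHoldsLock.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            context.timerEvent();
        }, "Timer-0");

        ipc.setDaemon(true);   // lets the JVM exit even if the threads were to deadlock
        timer.setDaemon(true);
        ipc.start();
        timer.start();
        ipc.join(5000);
        timer.join(5000);
        return !ipc.isAlive() && !timer.isAlive();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runScenario() ? "no deadlock" : "deadlock");
    }
}
```

As written, `main` prints `no deadlock`. If `timerEvent` were instead declared `synchronized`, so that the context lock stayed held across the callbacks as in the `Timer-0` stack above, the IPC thread would block in `createRecord` while the timer thread blocked on the JobTracker monitor, and `runScenario` would return false once its join timeouts expired.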