From: "Michael Bieniosek (JIRA)"
To: hadoop-dev@lucene.apache.org
Date: Fri, 5 Oct 2007 16:44:50 -0700 (PDT)
Subject: [jira] Commented: (HADOOP-2001) Deadlock in jobtracker

    [ https://issues.apache.org/jira/browse/HADOOP-2001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12532798 ]

Michael Bieniosek commented on HADOOP-2001:
-------------------------------------------

I could submit a quick fix
patch that removes the synchronized modifier from JobTracker.finalizeJob, but I don't know whether that would break other things, or whether other deadlock paths would remain. Does anyone else know more about this code?

> Deadlock in jobtracker
> ----------------------
>
>                 Key: HADOOP-2001
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2001
>             Project: Hadoop
>          Issue Type: Bug
>    Affects Versions: 0.14.0
>            Reporter: Michael Bieniosek
>            Priority: Critical
>
> My jobtracker deadlocked; the output from kill -QUIT is:
>
> Found one Java-level deadlock:
> =============================
> "IPC Server handler 2 on 10001":
>   waiting to lock monitor 0x0813724c (object 0xd5175488, a org.apache.hadoop.mapred.JobInProgress),
>   which is held by "SocketListener0-1"
> "SocketListener0-1":
>   waiting to lock monitor 0x081146d4 (object 0xd24d9c50, a org.apache.hadoop.mapred.JobTracker),
>   which is held by "IPC Server handler 2 on 10001"
>
> Java stack information for the threads listed above:
> ===================================================
> "IPC Server handler 2 on 10001":
>     at org.apache.hadoop.mapred.JobInProgress.updateTaskStatus(JobInProgress.java:367)
>     - waiting to lock <0xd5175488> (a org.apache.hadoop.mapred.JobInProgress)
>     at org.apache.hadoop.mapred.JobTracker.updateTaskStatuses(JobTracker.java:1719)
>     at org.apache.hadoop.mapred.JobTracker.processHeartbeat(JobTracker.java:1240)
>     - locked <0xd24d9c50> (a org.apache.hadoop.mapred.JobTracker)
>     at org.apache.hadoop.mapred.JobTracker.heartbeat(JobTracker.java:1116)
>     - locked <0xd24d9c50> (a org.apache.hadoop.mapred.JobTracker)
>     at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
>     at java.lang.reflect.Method.invoke(Unknown Source)
>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:340)
>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:566)
> "SocketListener0-1":
>     at org.apache.hadoop.mapred.JobTracker.finalizeJob(JobTracker.java:907)
>     - waiting to lock <0xd24d9c50> (a org.apache.hadoop.mapred.JobTracker)
>     at org.apache.hadoop.mapred.JobInProgress.garbageCollect(JobInProgress.java:1059)
>     - locked <0xd5175488> (a org.apache.hadoop.mapred.JobInProgress)
>     at org.apache.hadoop.mapred.JobInProgress.kill(JobInProgress.java:891)
>     - locked <0xd5175488> (a org.apache.hadoop.mapred.JobInProgress)
>     at org.apache.hadoop.mapred.jobdetails_jsp._jspService(jobdetails_jsp.java:158)
>     at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:94)
>     at javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
>     at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:427)
>     at org.mortbay.jetty.servlet.WebApplicationHandler.dispatch(WebApplicationHandler.java:475)
>     at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:567)
>     at org.mortbay.http.HttpContext.handle(HttpContext.java:1565)
>     at org.mortbay.jetty.servlet.WebApplicationContext.handle(WebApplicationContext.java:635)
>     at org.mortbay.http.HttpContext.handle(HttpContext.java:1517)
>     at org.mortbay.http.HttpServer.service(HttpServer.java:954)
>     at org.mortbay.http.HttpConnection.service(HttpConnection.java:814)
>     at org.mortbay.http.HttpConnection.handleNext(HttpConnection.java:981)
>     at org.mortbay.http.HttpConnection.handle(HttpConnection.java:831)
>     at org.mortbay.http.SocketListener.handleConnection(SocketListener.java:244)
>     at org.mortbay.util.ThreadedServer.handle(ThreadedServer.java:357)
>     at org.mortbay.util.ThreadPool$PoolThread.run(ThreadPool.java:534)
>
> Found 1 deadlock.
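The dump shows a lock-order inversion. The heartbeat thread holds the JobTracker monitor and waits for the JobInProgress monitor (processHeartbeat -> updateTaskStatuses -> updateTaskStatus), while the web UI kill path holds the JobInProgress monitor and waits for the JobTracker monitor (jobdetails.jsp -> JobInProgress.kill -> garbageCollect -> finalizeJob). The following is a minimal, self-contained sketch of that pattern, assuming only the two monitors matter. `LockOrderDemo`, `Tracker` and `Job` are hypothetical stand-ins, not Hadoop classes, and the `CountDownLatch` exists only to force the bad interleaving deterministically. The second method illustrates one generic remedy (every path takes the locks in the same order); it is not the patch proposed in the comment, and not necessarily how HADOOP-2001 was resolved.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical model of the two monitors in the dump; not the real Hadoop classes.
public class LockOrderDemo {
    static final class Tracker {}  // stands in for the JobTracker monitor
    static final class Job {}      // stands in for the JobInProgress monitor

    // Heartbeat path locks Tracker then Job; kill path locks Job then Tracker.
    // The latch makes both threads hold their first lock before either asks
    // for the second, so the inversion always deadlocks.
    // Returns true if the JVM reports a monitor deadlock within 5 seconds.
    static boolean reproduceDeadlock() {
        Tracker tracker = new Tracker();
        Job job = new Job();
        CountDownLatch bothHoldFirstLock = new CountDownLatch(2);

        Thread heartbeat = new Thread(() -> {
            synchronized (tracker) {
                bothHoldFirstLock.countDown();
                awaitQuietly(bothHoldFirstLock);
                synchronized (job) { /* like updateTaskStatus */ }
            }
        }, "heartbeat");
        Thread kill = new Thread(() -> {
            synchronized (job) {
                bothHoldFirstLock.countDown();
                awaitQuietly(bothHoldFirstLock);
                synchronized (tracker) { /* like finalizeJob */ }
            }
        }, "kill");
        // Daemon threads, so the permanently blocked pair cannot keep the JVM alive.
        heartbeat.setDaemon(true);
        kill.setDaemon(true);
        heartbeat.start();
        kill.start();

        // Same detection the JVM performs when printing "Found one Java-level deadlock".
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(5);
        while (System.nanoTime() < deadline) {
            long[] ids = mx.findMonitorDeadlockedThreads();  // null when none
            if (ids != null && ids.length >= 2) {
                return true;
            }
            pause(50);
        }
        return false;
    }

    // Both paths acquire Tracker first, then Job: one global lock order,
    // so no cycle can form. Returns true if both threads finish.
    static boolean runWithConsistentOrder() {
        Tracker tracker = new Tracker();
        Job job = new Job();
        Runnable trackerThenJob = () -> {
            synchronized (tracker) {
                synchronized (job) { /* update status or finalize the job */ }
            }
        };
        Thread heartbeat = new Thread(trackerThenJob, "heartbeat");
        Thread kill = new Thread(trackerThenJob, "kill");
        heartbeat.setDaemon(true);
        kill.setDaemon(true);
        heartbeat.start();
        kill.start();
        try {
            heartbeat.join(5000);
            kill.join(5000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !heartbeat.isAlive() && !kill.isAlive();
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    private static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("inverted order deadlocks: " + reproduceDeadlock());
        System.out.println("consistent order completes: " + runWithConsistentOrder());
    }
}
```

Running `main` should report `true` for both checks. The design point is that removing `synchronized` from one method only breaks this particular cycle, whereas a single documented acquisition order (JobTracker before JobInProgress, in this sketch) rules out any two-lock cycle between these monitors, which is the concern the comment raises about other deadlock paths.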