hadoop-common-dev mailing list archives

From "Hemanth Yamijala (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-4977) Deadlock between reclaimCapacity and assignTasks
Date Thu, 15 Jan 2009 01:51:59 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-4977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12663979#action_12663979 ]

Hemanth Yamijala commented on HADOOP-4977:

Attached a patch on which I've run dos2unix. Vivek, in the future, can you please make sure
patch files do not contain Windows EOL characters? A setting in your editor can probably
change this.

Also, in the future, please attach test-patch results when uploading a patch. Here are the
results for this one:

     [exec] +1 overall.
     [exec]     +1 @author.  The patch does not contain any @author tags.
     [exec]     +1 tests included.  The patch appears to include 3 new or modified tests.
     [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
     [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
     [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.
     [exec]     +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

> Deadlock between reclaimCapacity and assignTasks
> ------------------------------------------------
>                 Key: HADOOP-4977
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4977
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/capacity-sched
>    Affects Versions: 0.19.0
>            Reporter: Matei Zaharia
>            Assignee: Vivek Ratan
>            Priority: Blocker
>             Fix For: 0.20.0
>         Attachments: 4977.1.patch, 4977.2.patch, 4977.3.patch, 4977.4.patch, 4977.4.patch,
> I was running the latest trunk with the capacity scheduler and saw the JobTracker lock
> up with the following deadlock reported in jstack:
> Found one Java-level deadlock:
> =============================
> "18107298@qtp0-4":
>   waiting to lock monitor 0x08085b40 (object 0x56605100, a org.apache.hadoop.mapred.JobTracker),
>   which is held by "IPC Server handler 4 on 54311"
> "IPC Server handler 4 on 54311":
>   waiting to lock monitor 0x0808594c (object 0x5660e518, a org.apache.hadoop.mapred.CapacityTaskScheduler$MapSchedulingMgr),
>   which is held by "reclaimCapacity"
> "reclaimCapacity":
>   waiting to lock monitor 0x08085b40 (object 0x56605100, a org.apache.hadoop.mapred.JobTracker),
>   which is held by "IPC Server handler 4 on 54311"
> Java stack information for the threads listed above:
> ===================================================
> "18107298@qtp0-4":
> 	at org.apache.hadoop.mapred.JobTracker.getClusterStatus(JobTracker.java:2695)
> 	- waiting to lock <0x56605100> (a org.apache.hadoop.mapred.JobTracker)
> 	at org.apache.hadoop.mapred.jobtracker_jsp._jspService(jobtracker_jsp.java:93)
> 	at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:97)
> 	at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
> 	at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:502)
> 	at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:363)
> 	at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
> 	at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
> 	at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
> 	at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:417)
> 	at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
> 	at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
> 	at org.mortbay.jetty.Server.handle(Server.java:324)
> 	at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:534)
> 	at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:864)
> 	at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:533)
> 	at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:207)
> 	at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:403)
> 	at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:409)
> 	at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:522)
> "IPC Server handler 4 on 54311":
> 	at org.apache.hadoop.mapred.CapacityTaskScheduler$TaskSchedulingMgr.updateQSIObjects(CapacityTaskScheduler.java:564)
> 	- waiting to lock <0x5660e518> (a org.apache.hadoop.mapred.CapacityTaskScheduler$MapSchedulingMgr)
> 	at org.apache.hadoop.mapred.CapacityTaskScheduler$TaskSchedulingMgr.assignTasks(CapacityTaskScheduler.java:855)
> 	at org.apache.hadoop.mapred.CapacityTaskScheduler$TaskSchedulingMgr.access$1000(CapacityTaskScheduler.java:294)
> 	at org.apache.hadoop.mapred.CapacityTaskScheduler.assignTasks(CapacityTaskScheduler.java:1336)
> 	- locked <0x5660dd20> (a org.apache.hadoop.mapred.CapacityTaskScheduler)
> 	at org.apache.hadoop.mapred.JobTracker.heartbeat(JobTracker.java:2288)
> 	- locked <0x56605100> (a org.apache.hadoop.mapred.JobTracker)
> 	at sun.reflect.GeneratedMethodAccessor16.invoke(Unknown Source)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:597)
> 	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
> 	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
> Unfortunately I accidentally failed to select all of the jstack output, so some is missing,
> but it appears that reclaimCapacity locks the MapSchedulingMgr and then tries to lock the
> JobTracker, whereas updateQSIObjects, called from assignTasks, holds a lock on the JobTracker
> (the JobTracker grabs this lock when it calls assignTasks) and then tries to lock the
> MapSchedulingMgr. The other thread listed is a Jetty thread serving the web interface and
> isn't part of the circular wait. The solution would be to lock the JobTracker in
> reclaimCapacity before locking anything else.
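
The fix proposed above is the classic consistent-lock-ordering remedy: every code path must acquire the outer (JobTracker) monitor before the inner (scheduler) monitor. A minimal sketch of the idea, with illustrative class and lock names rather than Hadoop's actual code:

```java
// Hypothetical sketch of the lock-ordering fix. Both code paths acquire
// the locks in the same order: jobTrackerLock first, schedulerLock second.
// In the buggy version, reclaimCapacity took schedulerLock first and then
// tried jobTrackerLock, closing the cycle reported in the jstack output.
public class LockOrderingDemo {
    private final Object jobTrackerLock = new Object(); // stands in for the JobTracker monitor
    private final Object schedulerLock  = new Object(); // stands in for MapSchedulingMgr

    // Path 1: heartbeat -> assignTasks already holds the JobTracker lock,
    // then takes the scheduler lock to update queue-scheduling state.
    public void assignTasks() {
        synchronized (jobTrackerLock) {
            synchronized (schedulerLock) {
                // update QSI objects, pick a task to assign...
            }
        }
    }

    // Path 2, fixed order: reclaimCapacity now takes the JobTracker lock
    // first, matching assignTasks, so no circular wait is possible.
    public void reclaimCapacity() {
        synchronized (jobTrackerLock) {
            synchronized (schedulerLock) {
                // reclaim capacity from queues running over their limit...
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        LockOrderingDemo demo = new LockOrderingDemo();
        Thread t1 = new Thread(() -> { for (int i = 0; i < 1000; i++) demo.assignTasks(); });
        Thread t2 = new Thread(() -> { for (int i = 0; i < 1000; i++) demo.reclaimCapacity(); });
        t1.start(); t2.start();
        t1.join(); t2.join(); // completes only if no deadlock occurs
        System.out.println("no deadlock");
    }
}
```

Because both threads acquire the monitors in the same global order, the "hold one, wait for the other" cycle between reclaimCapacity and the heartbeat handler cannot form.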

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
