hadoop-mapreduce-issues mailing list archives

From "Matei Zaharia (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-1436) Deadlock in preemption code in fair scheduler
Date Mon, 01 Feb 2010 20:00:19 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-1436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12828249#action_12828249 ]

Matei Zaharia commented on MAPREDUCE-1436:
------------------------------------------

Here is jstack output showing the problem:

{code}
Found one Java-level deadlock:
=============================
"72353846@qtp0-7":
  waiting to lock monitor 0x00000000423f6370 (object 0x00007f22039e2b48, a org.apache.hadoop.mapred.JobTracker),
  which is held by "IPC Server handler 14 on 9001"
"IPC Server handler 14 on 9001":
  waiting to lock monitor 0x0000000041cdc130 (object 0x00007f22039e2fa8, a org.apache.hadoop.mapred.FairScheduler),
  which is held by "FairScheduler update thread"
"FairScheduler update thread":
  waiting to lock monitor 0x0000000041c29fa8 (object 0x00007f2260640948, a org.apache.hadoop.mapred.JobInProgress),
  which is held by "IPC Server handler 14 on 9001"

Java stack information for the threads listed above:
===================================================
"72353846@qtp0-7":
	at org.apache.hadoop.mapred.JobTracker.getClusterStatus(JobTracker.java:3071)
	- waiting to lock <0x00007f22039e2b48> (a org.apache.hadoop.mapred.JobTracker)
	at org.apache.hadoop.mapred.jobtracker_jsp._jspService(jobtracker_jsp.java:91)
	at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:97)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
	at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:502)
	at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:363)
	at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
	at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
	at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
	at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:417)
	at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
	at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
	at org.mortbay.jetty.Server.handle(Server.java:324)
	at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:534)
	at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:864)
	at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:533)
	at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:207)
	at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:403)
	at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:409)
	at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:522)
"IPC Server handler 14 on 9001":
	at org.apache.hadoop.mapred.JobTracker.finalizeJob(JobTracker.java:2115)
	- waiting to lock <0x00007f22039e2fa8> (a org.apache.hadoop.mapred.FairScheduler)
	- locked <0x00007f22039e33e0> (a java.util.TreeMap)
	- locked <0x00007f22039e2b48> (a org.apache.hadoop.mapred.JobTracker)
	at org.apache.hadoop.mapred.JobInProgress.garbageCollect(JobInProgress.java:2510)
	- locked <0x00007f2260640948> (a org.apache.hadoop.mapred.JobInProgress)
	at org.apache.hadoop.mapred.JobInProgress.jobComplete(JobInProgress.java:2146)
	at org.apache.hadoop.mapred.JobInProgress.completedTask(JobInProgress.java:2084)
	- locked <0x00007f2260640948> (a org.apache.hadoop.mapred.JobInProgress)
	at org.apache.hadoop.mapred.JobInProgress.updateTaskStatus(JobInProgress.java:883)
	- locked <0x00007f2260640948> (a org.apache.hadoop.mapred.JobInProgress)
	at org.apache.hadoop.mapred.JobTracker.updateTaskStatuses(JobTracker.java:3564)
	at org.apache.hadoop.mapred.JobTracker.processHeartbeat(JobTracker.java:2758)
	- locked <0x00007f22039e2b48> (a org.apache.hadoop.mapred.JobTracker)
	at org.apache.hadoop.mapred.JobTracker.heartbeat(JobTracker.java:2553)
	- locked <0x00007f22039e2b48> (a org.apache.hadoop.mapred.JobTracker)
	at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
"FairScheduler update thread":
	at org.apache.hadoop.mapred.JobInProgress.runningMaps(JobInProgress.java:549)
	- waiting to lock <0x00007f2260640948> (a org.apache.hadoop.mapred.JobInProgress)
	at org.apache.hadoop.mapred.JobSchedulable.getRunningTasks(JobSchedulable.java:103)
	at org.apache.hadoop.mapred.PoolSchedulable.getRunningTasks(PoolSchedulable.java:129)
	at org.apache.hadoop.mapred.FairScheduler.isStarvedForMinShare(FairScheduler.java:704)
	at org.apache.hadoop.mapred.FairScheduler.updatePreemptionVariables(FairScheduler.java:686)
	at org.apache.hadoop.mapred.FairScheduler.update(FairScheduler.java:595)
	- locked <0x00007f22039e2fa8> (a org.apache.hadoop.mapred.FairScheduler)
	at org.apache.hadoop.mapred.FairScheduler$UpdateThread.run(FairScheduler.java:277)
{code}

The issue is that the JobTracker's heartbeat path locks the JobTracker, then a JobInProgress,
and then tries to lock the FairScheduler in finalizeJob. The fair scheduler's
updatePreemptionVariables code, however, locks the scheduler first and then tries to lock
the same JobInProgress, so each thread ends up waiting on a lock the other one holds. The
right thing is to lock the JobTracker before the FairScheduler. update() does this when it
gets the cluster status, but updatePreemptionVariables() does not.
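
To make the lock-ordering argument concrete, here is a minimal sketch of the idea, not the
actual patch. The class LockOrderSketch, its three monitor objects and its method names are
hypothetical stand-ins for the JobTracker, FairScheduler and JobInProgress. The sketch
assumes that every path takes the JobTracker lock before any of the other two, so the inner
locks can no longer be acquired in conflicting orders at the same time.

```java
public class LockOrderSketch {
    // Hypothetical stand-ins for the three monitors seen in the jstack output.
    static final Object jobTracker = new Object();
    static final Object fairScheduler = new Object();
    static final Object jobInProgress = new Object();

    // Guarded by jobTracker, which both paths hold while incrementing.
    static int completed = 0;

    // Mirrors the heartbeat path: JobTracker, then JobInProgress, then FairScheduler.
    static void heartbeatPath() {
        synchronized (jobTracker) {
            synchronized (jobInProgress) {
                synchronized (fairScheduler) {
                    completed++;
                }
            }
        }
    }

    // Mirrors the update thread with the proposed ordering: the JobTracker lock is
    // taken first, so the scheduler and job locks are only touched under it.
    static void updatePath() {
        synchronized (jobTracker) {
            synchronized (fairScheduler) {
                synchronized (jobInProgress) {
                    completed++;
                }
            }
        }
    }

    // Runs both paths concurrently and returns how many critical sections completed.
    static int runBoth(int iterations) {
        completed = 0;
        Thread heartbeat = new Thread(() -> {
            for (int i = 0; i < iterations; i++) heartbeatPath();
        });
        Thread update = new Thread(() -> {
            for (int i = 0; i < iterations; i++) updatePath();
        });
        heartbeat.start();
        update.start();
        try {
            // Bounded joins so a deadlock would show up as a failure, not a hang.
            heartbeat.join(10_000);
            update.join(10_000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        if (heartbeat.isAlive() || update.isAlive()) {
            throw new IllegalStateException("threads did not finish, possible deadlock");
        }
        return completed;
    }

    public static void main(String[] args) {
        System.out.println(runBoth(10_000));
    }
}
```

Removing the outer synchronized (jobTracker) from updatePath() reproduces the shape of the
cycle in the trace above: one thread holds the job and wants the scheduler while the other
holds the scheduler and wants the job.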

> Deadlock in preemption code in fair scheduler
> ---------------------------------------------
>
>                 Key: MAPREDUCE-1436
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1436
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: contrib/fair-share
>    Affects Versions: 0.21.0, 0.22.0
>            Reporter: Matei Zaharia
>            Assignee: Matei Zaharia
>            Priority: Blocker
>
> In testing the fair scheduler with preemption, I found a deadlock between updatePreemptionVariables
> and some code in the JobTracker. This was found while testing a backport of the fair scheduler
> to Hadoop 0.20, but it looks like it could also happen in trunk and 0.21. Details are in a
> comment below.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

