hadoop-mapreduce-issues mailing list archives

From "Vinod K V (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-805) Deadlock in Jobtracker
Date Tue, 28 Jul 2009 03:55:14 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12735907#action_12735907 ]

Vinod K V commented on MAPREDUCE-805:
-------------------------------------

Had a cursory look at the patch. It would be good to add javadoc for JobInProgress.initTasks()
and JobInProgress.fail() stating that these methods are NOT supposed to be called directly
by the schedulers, and suggesting that the JobTracker methods be preferred over the JobInProgress
methods for general use.

Given this issue, it would also be helpful to document the locking order (JobTracker first, then
JobInProgress) so that, e.g., schedulers don't lock JobInProgress asynchronously before calling
these methods.
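
To make the locking order concrete, here is a minimal sketch, not the actual Hadoop code. TrackerSketch and
JobSketch are hypothetical stand-ins for JobTracker and JobInProgress, and failJob() is an assumed
tracker-level entry point. Both threads take the tracker lock before the job lock, so the cycle seen
between "IPC Server handler 51" and "pool-1-thread-2" in the trace below cannot form.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for JobTracker; NOT the real Hadoop class.
class TrackerSketch {
    private final JobSketch job = new JobSketch(this);
    private int finalizedJobs = 0;

    // RPC-style read path: tracker lock first, then job lock.
    synchronized long getJobCounters() {
        return job.getCounters();
    }

    // Assumed entry point for schedulers: tracker lock first, then job lock.
    synchronized void failJob() {
        job.fail();
    }

    synchronized void finalizeJob() {
        finalizedJobs++;
    }

    synchronized int finalizedJobs() {
        return finalizedJobs;
    }
}

// Hypothetical stand-in for JobInProgress; NOT the real Hadoop class.
class JobSketch {
    private final TrackerSketch tracker;
    private long counters = 0;

    JobSketch(TrackerSketch tracker) {
        this.tracker = tracker;
    }

    synchronized long getCounters() {
        return counters;
    }

    // Calls back into the tracker while holding the job lock. This is only
    // safe when the caller already holds the tracker lock (the re-entrant
    // acquire then succeeds immediately), hence "do not call directly".
    synchronized void fail() {
        counters++;
        tracker.finalizeJob();
    }
}

final class LockOrderDemo {
    // Runs a reader and a "failing" thread concurrently and returns how many
    // times finalizeJob() ran.
    static int run(int iterations) {
        TrackerSketch tracker = new TrackerSketch();
        ExecutorService pool = Executors.newFixedThreadPool(2);
        pool.execute(() -> {
            for (int i = 0; i < iterations; i++) {
                tracker.getJobCounters();
            }
        });
        pool.execute(() -> {
            for (int i = 0; i < iterations; i++) {
                tracker.failJob();
            }
        });
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                pool.shutdownNow();
                throw new IllegalStateException("threads did not finish; possible deadlock");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return tracker.finalizedJobs();
    }

    public static void main(String[] args) {
        System.out.println("finalized jobs: " + run(1000));
    }
}
```

If a scheduler thread called JobSketch.fail() directly, it would hold the job lock while waiting for
the tracker lock, which is the inverted order that the javadoc should warn against.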

Though not directly related to the patch, it will be good to document that JobTracker is locked
while calling JobInProgressListener update methods.
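
As an illustration of why this is worth documenting, the sketch below uses hypothetical names
(ListenerSketch, NotifyingTracker, updateJob) rather than the real JobInProgressListener API. It only
shows that a listener invoked from a synchronized tracker method runs with the tracker's monitor held,
so a listener that blocks or waits on another lock there stalls every other tracker operation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical listener interface; NOT the real JobInProgressListener.
interface ListenerSketch {
    void jobUpdated(String jobId);
}

// Hypothetical stand-in for JobTracker; NOT the real Hadoop class.
class NotifyingTracker {
    private final List<ListenerSketch> listeners = new ArrayList<>();

    synchronized void addListener(ListenerSketch listener) {
        listeners.add(listener);
    }

    // Listeners are invoked while this tracker's monitor is held.
    synchronized void updateJob(String jobId) {
        for (ListenerSketch listener : listeners) {
            listener.jobUpdated(jobId);
        }
    }
}

final class ListenerLockDemo {
    // Returns whether the listener observed the tracker lock as held by
    // the calling thread during the callback.
    static boolean listenerSawTrackerLock() {
        NotifyingTracker tracker = new NotifyingTracker();
        boolean[] held = new boolean[1];
        tracker.addListener(jobId -> held[0] = Thread.holdsLock(tracker));
        tracker.updateJob("job_hypothetical_0001");
        return held[0];
    }

    public static void main(String[] args) {
        System.out.println("tracker locked during callback: " + listenerSawTrackerLock());
    }
}
```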

> Deadlock in Jobtracker
> ----------------------
>
>                 Key: MAPREDUCE-805
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-805
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Michael Tamm
>         Attachments: MAPREDUCE-805-v1.1.patch
>
>
> We are running a hadoop cluster (version 0.20.0) and have detected the following deadlock on our jobtracker:
> {code}
> "IPC Server handler 51 on 9001":
> 	at org.apache.hadoop.mapred.JobInProgress.getCounters(JobInProgress.java:943)
> 	- waiting to lock <0x00007f2b6fb46130> (a org.apache.hadoop.mapred.JobInProgress)
> 	at org.apache.hadoop.mapred.JobTracker.getJobCounters(JobTracker.java:3102)
> 	- locked <0x00007f2b5f026000> (a org.apache.hadoop.mapred.JobTracker)
> 	at sun.reflect.GeneratedMethodAccessor21.invoke(Unknown Source)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:597)
> 	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
> 	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
> 	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:396)
> 	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
> "pool-1-thread-2":
> 	at org.apache.hadoop.mapred.JobTracker.finalizeJob(JobTracker.java:2017)
> 	- waiting to lock <0x00007f2b5f026000> (a org.apache.hadoop.mapred.JobTracker)
> 	at org.apache.hadoop.mapred.JobInProgress.garbageCollect(JobInProgress.java:2483)
> 	- locked <0x00007f2b6fb46130> (a org.apache.hadoop.mapred.JobInProgress)
> 	at org.apache.hadoop.mapred.JobInProgress.terminateJob(JobInProgress.java:2152)
> 	- locked <0x00007f2b6fb46130> (a org.apache.hadoop.mapred.JobInProgress)
> 	at org.apache.hadoop.mapred.JobInProgress.terminate(JobInProgress.java:2169)
> 	- locked <0x00007f2b6fb46130> (a org.apache.hadoop.mapred.JobInProgress)
> 	at org.apache.hadoop.mapred.JobInProgress.fail(JobInProgress.java:2245)
> 	- locked <0x00007f2b6fb46130> (a org.apache.hadoop.mapred.JobInProgress)
> 	at org.apache.hadoop.mapred.EagerTaskInitializationListener$InitJob.run(EagerTaskInitializationListener.java:86)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> 	at java.lang.Thread.run(Thread.java:619)
> {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

