hadoop-mapreduce-issues mailing list archives

From "Amar Kamat (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-805) Deadlock in Jobtracker
Date Tue, 11 Aug 2009 04:14:15 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12741681#action_12741681 ]

Amar Kamat commented on MAPREDUCE-805:

Note that I purposely added sleeps in JobTracker.initJob() and JobInProgress.initTasks()
to check for race conditions, and I didn't see any side effects. With this patch, init always
leaves the job in the PREP state. After that, depending on whether
- setup is required,
- there are tasks to run,
- a job-kill was issued during init, or
- job-init failed,

the job can move to the RUNNING, SUCCEEDED, KILLED, or FAILED state, or remain in the PREP state.
Here is how the state transition happens (note that after job.initTasks() the job will be
in the PREP state):
||setup needed?||maps=0 and reduces=0?||job killed during init?||init failed?||new state||comment||
|*|*|*|yes|FAILED|irrespective of the config, if the job fails in init, it is marked as FAILED|
|*|*|yes|no|KILLED|irrespective of the config, if the job is killed during init and init passed normally, the job is marked as KILLED|
|yes|*|no|no|PREP|if the job is configured to run setup, it remains in the PREP state|
|no|yes|no|no|SUCCEEDED|if the job has no setup configured and there are no maps and no reduces, the job is marked as SUCCEEDED|
|no|no|no|no|RUNNING|if the job has no setup configured and there are maps or reduces to run, the job is marked as RUNNING|
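
The table can also be read as a top-to-bottom decision chain. The following is only a sketch of that reading, not code from the patch: the class InitStateSketch, the enum JobState, and the method stateAfterInit are hypothetical names, and the four booleans stand in for the four condition columns.

```java
public class InitStateSketch {
    /** Hypothetical stand-in for the job states named in the table. */
    public enum JobState { PREP, RUNNING, SUCCEEDED, KILLED, FAILED }

    /**
     * Evaluates the table rows in order; the first matching row wins.
     * noTasks corresponds to the "maps=0 and reduces=0?" column.
     */
    public static JobState stateAfterInit(boolean setupNeeded, boolean noTasks,
                                          boolean killedDuringInit, boolean initFailed) {
        if (initFailed) {
            return JobState.FAILED;      // row 1: a failed init overrides everything else
        }
        if (killedDuringInit) {
            return JobState.KILLED;      // row 2: init passed, but a kill was issued
        }
        if (setupNeeded) {
            return JobState.PREP;        // row 3: the job waits in PREP for setup to run
        }
        // rows 4 and 5: no setup configured, so the state depends on whether tasks exist
        return noTasks ? JobState.SUCCEEDED : JobState.RUNNING;
    }

    public static void main(String[] args) {
        // A job with no setup, no maps and no reduces, not killed, init passed.
        System.out.println(stateAfterInit(false, true, false, false));
    }
}
```

Ordering the checks this way makes the wildcard (*) cells explicit: once an earlier row matches, the remaining columns are never consulted.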

> Deadlock in Jobtracker
> ----------------------
>                 Key: MAPREDUCE-805
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-805
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Michael Tamm
>         Attachments: MAPREDUCE-805-v1.1.patch, MAPREDUCE-805-v1.11-branch-0.20.patch, MAPREDUCE-805-v1.11.patch, MAPREDUCE-805-v1.2.patch, MAPREDUCE-805-v1.3.patch, MAPREDUCE-805-v1.6.patch,
>
> We are running a Hadoop cluster (version 0.20.0) and have detected the following deadlock on our jobtracker:
> {code}
> "IPC Server handler 51 on 9001":
> 	at org.apache.hadoop.mapred.JobInProgress.getCounters(JobInProgress.java:943)
> 	- waiting to lock <0x00007f2b6fb46130> (a org.apache.hadoop.mapred.JobInProgress)
> 	at org.apache.hadoop.mapred.JobTracker.getJobCounters(JobTracker.java:3102)
> 	- locked <0x00007f2b5f026000> (a org.apache.hadoop.mapred.JobTracker)
> 	at sun.reflect.GeneratedMethodAccessor21.invoke(Unknown Source)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:597)
> 	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
> 	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
> 	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:396)
> 	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
>  "pool-1-thread-2":
> 	at org.apache.hadoop.mapred.JobTracker.finalizeJob(JobTracker.java:2017)
> 	- waiting to lock <0x00007f2b5f026000> (a org.apache.hadoop.mapred.JobTracker)
> 	at org.apache.hadoop.mapred.JobInProgress.garbageCollect(JobInProgress.java:2483)
> 	- locked <0x00007f2b6fb46130> (a org.apache.hadoop.mapred.JobInProgress)
> 	at org.apache.hadoop.mapred.JobInProgress.terminateJob(JobInProgress.java:2152)
> 	- locked <0x00007f2b6fb46130> (a org.apache.hadoop.mapred.JobInProgress)
> 	at org.apache.hadoop.mapred.JobInProgress.terminate(JobInProgress.java:2169)
> 	- locked <0x00007f2b6fb46130> (a org.apache.hadoop.mapred.JobInProgress)
> 	at org.apache.hadoop.mapred.JobInProgress.fail(JobInProgress.java:2245)
> 	- locked <0x00007f2b6fb46130> (a org.apache.hadoop.mapred.JobInProgress)
> 	at org.apache.hadoop.mapred.EagerTaskInitializationListener$InitJob.run(EagerTaskInitializationListener.java:86)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> 	at java.lang.Thread.run(Thread.java:619)
> {code}
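
The trace above shows a lock-order inversion: the IPC handler holds the JobTracker monitor and waits for the JobInProgress monitor, while the init thread holds the JobInProgress monitor and waits for the JobTracker monitor. The following is a minimal sketch of one common remedy, acquiring both monitors in the same order on every path. It is an illustration under assumptions, not the approach taken by the attached patches: the class LockOrderSketch and all of its members are hypothetical, and plain Object monitors stand in for the two Hadoop classes.

```java
public class LockOrderSketch {
    // Stand-ins for the JobTracker and JobInProgress monitors seen in the trace.
    private final Object trackerLock = new Object();
    private final Object jobLock = new Object();
    private int countersRead = 0;
    private boolean finalized = false;

    /** Mirrors the IPC handler path: tracker monitor first, then job monitor. */
    public void getJobCounters() {
        synchronized (trackerLock) {
            synchronized (jobLock) {
                countersRead++;
            }
        }
    }

    /**
     * In the trace, the failure path took the job monitor first and then
     * wanted the tracker monitor. Here it takes the tracker monitor first,
     * so both paths agree on the order and cannot wait on each other in a cycle.
     */
    public void failJob() {
        synchronized (trackerLock) {
            synchronized (jobLock) {
                finalized = true;
            }
        }
    }

    public int countersRead() { return countersRead; }

    public boolean isFinalized() { return finalized; }

    /** Runs both paths concurrently and returns once both threads have finished. */
    public static LockOrderSketch runBoth(int iterations) {
        LockOrderSketch sketch = new LockOrderSketch();
        Thread ipc = new Thread(() -> {
            for (int i = 0; i < iterations; i++) sketch.getJobCounters();
        });
        Thread init = new Thread(() -> {
            for (int i = 0; i < iterations; i++) sketch.failJob();
        });
        ipc.start();
        init.start();
        try {
            ipc.join();   // join() also makes the threads' writes visible to the caller
            init.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for threads", e);
        }
        return sketch;
    }

    public static void main(String[] args) {
        LockOrderSketch sketch = runBoth(1000);
        System.out.println(sketch.countersRead() + " " + sketch.isFinalized());
    }
}
```

Swapping the two synchronized blocks in failJob() would reproduce the inverted order from the trace, and the two threads could then block each other indefinitely.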

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
