hadoop-common-dev mailing list archives

From "Arun C Murthy (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-1970) tasktracker hang in reduce. Deadlock between main and comm thread
Date Mon, 01 Oct 2007 05:34:50 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-1970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12531411 ]

Arun C Murthy commented on HADOOP-1970:
---------------------------------------

OK, this is indeed a deadlock. The issue is that the parent and child Progress objects are
locked in a different order on different code paths in Progress.java.

As the stack trace shows, {{Progress.complete}} locks the child first and then the parent
(via {{Progress.startNextPhase}}), whereas {{Progress.toString(StringBuffer)}} locks the parent
first and then the child. The straightforward fix is to ensure that the parent is always
locked first, e.g. in {{Progress.complete}}.
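
To illustrate the lock-ordering rule, here is a minimal sketch. It is not the actual Progress.java: the classes {{ProgressNode}} and {{LockOrderDemo}} and the methods {{addPhase}} and {{describe}} are hypothetical names made up for this example. The only point it shows is that both the "complete" path and the "render to string" path take the parent monitor before the child monitor, so the cycle seen in the stack trace cannot form.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a parent/child progress tree; not the Hadoop class. */
final class ProgressNode {
    private final String name;
    private final ProgressNode parent;   // null for the root; fixed at construction
    private final List<ProgressNode> phases = new ArrayList<>();
    private int completedPhases = 0;     // guarded by this node's monitor
    private boolean done = false;        // guarded by this node's monitor

    ProgressNode(String name) { this(name, null); }

    private ProgressNode(String name, ProgressNode parent) {
        this.name = name;
        this.parent = parent;
    }

    synchronized ProgressNode addPhase(String phaseName) {
        ProgressNode phase = new ProgressNode(phaseName, this);
        phases.add(phase);
        return phase;
    }

    synchronized int completedPhases() { return completedPhases; }

    /** Marks this node done and advances the parent, locking parent before child. */
    void complete() {
        if (parent == null) {
            synchronized (this) { done = true; }
            return;
        }
        synchronized (parent) {          // 1st: parent monitor
            synchronized (this) {        // 2nd: child monitor
                done = true;
            }
            parent.completedPhases++;    // still holding the parent monitor
        }
    }

    /** Renders this node and its phases; this path also locks parent before child. */
    synchronized void describe(StringBuilder out) {
        out.append(name).append(done ? "[done]" : "[running]");
        for (ProgressNode phase : phases) {
            out.append(" > ");
            phase.describe(out);         // takes the child monitor while holding ours
        }
    }

    @Override
    public String toString() {
        StringBuilder out = new StringBuilder();
        describe(out);
        return out.toString();
    }
}

final class LockOrderDemo {
    /** Runs a reporter thread and a worker thread concurrently; returns the final rendering. */
    static String run() {
        final ProgressNode task = new ProgressNode("task");
        final ProgressNode copy = task.addPhase("copy");
        final ProgressNode sort = task.addPhase("sort");

        Thread reporter = new Thread(() -> {
            for (int i = 0; i < 10_000; i++) {
                task.toString();         // parent, then child
            }
        });
        Thread worker = new Thread(() -> {
            copy.complete();             // parent, then child: same order
            sort.complete();
        });
        reporter.start();
        worker.start();
        try {
            reporter.join();
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return task.toString();
    }

    public static void main(String[] args) {
        System.out.println(run());       // prints task[running] > copy[done] > sort[done]
    }
}
```

If {{complete}} instead took the child monitor first and then reached for the parent, a reporter thread already holding the parent and waiting on the child would produce exactly the cycle jstack reported. The real patch may of course look different; this only demonstrates the ordering rule.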

> tasktracker hang in reduce. Deadlock between main and comm thread
> -----------------------------------------------------------------
>
>                 Key: HADOOP-1970
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1970
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.14.1
>            Reporter: Koji Noguchi
>            Assignee: Vivek Ratan
>            Priority: Blocker
>             Fix For: 0.14.2
>
>
> Saw one reduce task stuck on copy.
> jstack on the reduce task(task_200709272248_0001_r_000150_0)  process showed 
> {noformat} 
> Found one Java-level deadlock:
> =============================
> "Comm thread for task_200709272248_0001_r_000150_0":
>   waiting to lock monitor 0x08144020 (object 0xd4e30aa8, a org.apache.hadoop.util.Progress),
>   which is held by "main"
> "main":
>   waiting to lock monitor 0x08144084 (object 0xd4e30958, a org.apache.hadoop.util.Progress),
>   which is held by "Comm thread for task_200709272248_0001_r_000150_0"
> Java stack information for the threads listed above:
> ===================================================
> "Comm thread for task_200709272248_0001_r_000150_0":
>         at org.apache.hadoop.util.Progress.toString(Progress.java:113)
>         - waiting to lock <0xd4e30aa8> (a org.apache.hadoop.util.Progress)
>         at org.apache.hadoop.util.Progress.toString(Progress.java:116)
>         - locked <0xd4e30958> (a org.apache.hadoop.util.Progress)
>         at org.apache.hadoop.util.Progress.toString(Progress.java:108)
>         at org.apache.hadoop.mapred.Task$1.run(Task.java:268)
>         at java.lang.Thread.run(Thread.java:619)
> "main":
>         at org.apache.hadoop.util.Progress.startNextPhase(Progress.java:58)
>         - waiting to lock <0xd4e30958> (a org.apache.hadoop.util.Progress)
>         at org.apache.hadoop.util.Progress.complete(Progress.java:70)
>         - locked <0xd4e30aa8> (a org.apache.hadoop.util.Progress)
>         at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:253)
>         at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1777)
> {noformat} 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.