hadoop-common-dev mailing list archives

From "sam rash (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-5407) Sometimes, Reduce tasks hang, State is unassigned
Date Thu, 12 Mar 2009 17:28:50 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-5407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12681408#action_12681408 ]

sam rash commented on HADOOP-5407:
----------------------------------

After taking a series of thread dumps, we found that the parent TaskTracker is hung in the
following synchronized block:

{code}
          synchronized (numFreeSlots) {
            while (numFreeSlots.get() == 0) {
              numFreeSlots.wait();
            }
{code}

Here is the relevant section of the thread dumps (note that the thread has been in this state
for 8+ hours):

{noformat}
"TaskLauncher for task" daemon prio=10 tid=0x6f7f0400 nid=0x1857 in Object.wait() [0x6eae0000..0x6eae1030]
   java.lang.Thread.State: WAITING (on object monitor)
	at java.lang.Object.wait(Native Method)
	at java.lang.Object.wait(Object.java:485)
	at org.apache.hadoop.mapred.TaskTracker$TaskLauncher.run(TaskTracker.java:1585)
	- locked <0x74f3cdf0> (a org.apache.hadoop.io.IntWritable)
{noformat}

This is consistent with the previous report that a restart of the TaskTracker is necessary
to fix the problem.
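
For reference, the wait loop quoted above is the standard guarded-wait idiom: the waiting thread only wakes up if some other thread both makes the condition true and calls notify()/notifyAll() on the same monitor. The thread dump shows only that the launcher is waiting; it does not show why no slot was ever released. Below is a minimal sketch of the pattern with a bounded wait, so that a stuck launcher can at least be detected and logged. SlotGate and its methods are hypothetical names used for illustration, not TaskTracker code, and this is not a proposed patch.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for the free-slot counter; not the TaskTracker's actual class. */
public class SlotGate {
    private final Object lock = new Object(); // dedicated monitor, never replaced
    private int freeSlots;

    public SlotGate(int initialSlots) {
        this.freeSlots = initialSlots;
    }

    /**
     * Waits up to timeoutMillis for a free slot and takes it.
     * Returns false if no slot appeared in time or the thread was interrupted.
     */
    public boolean acquire(long timeoutMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        synchronized (lock) {
            while (freeSlots == 0) {
                long remainingNanos = deadline - System.nanoTime();
                if (remainingNanos <= 0) {
                    return false; // caller can log the stall instead of hanging silently
                }
                // wait(0) means "wait forever", so round up to at least 1 ms
                long remainingMillis = Math.max(1, TimeUnit.NANOSECONDS.toMillis(remainingNanos));
                try {
                    lock.wait(remainingMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve the interrupt status
                    return false;
                }
            }
            freeSlots--;
            return true;
        }
    }

    /** Returns a slot and wakes every waiter; each waiter re-checks the condition. */
    public void release() {
        synchronized (lock) {
            freeSlots++;
            lock.notifyAll();
        }
    }

    /** Current number of free slots, read under the same lock. */
    public int free() {
        synchronized (lock) {
            return freeSlots;
        }
    }
}
```

In this sketch, a code path that finishes a task without calling release() would leave acquire() returning false on every attempt, which is the bounded equivalent of the indefinite wait seen in the dump.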

> Sometimes, Reduce tasks hang, State is unassigned
> -------------------------------------------------
>
>                 Key: HADOOP-5407
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5407
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.19.0
>            Reporter: ZhuGuanyin
>
> Hi, all
> When our cluster runs for a long time, some reduce tasks running on some tasktrackers
> hang. Their state is UNASSIGNED. After that, all reduce tasks on these tasktrackers hang.
> When we kill a hung reduce task, the task attempt is re-scheduled to the same tasktracker
> and continues to hang. When we fail it, it goes to another tasktracker, where it executes
> successfully.
> A tasktracker that has a hung reduce task will still receive new reduce tasks, but those
> tasks hang forever.
> When we reboot the tasktracker machine, reduce tasks no longer hang.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

