[ https://issues.apache.org/jira/browse/MAPREDUCE-3226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13132642#comment-13132642 ]
Hudson commented on MAPREDUCE-3226:
-----------------------------------
Integrated in Hadoop-Mapreduce-trunk #867 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/867/])
MAPREDUCE-3226. Fix shutdown of fetcher threads. Contributed by Vinod K V.
acmurthy : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1187116
Files :
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/EventFetcher.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/Fetcher.java
> Few reduce tasks hanging in a gridmix-run
> -----------------------------------------
>
> Key: MAPREDUCE-3226
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3226
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: mrv2, task
> Affects Versions: 0.23.0
> Reporter: Vinod Kumar Vavilapalli
> Assignee: Vinod Kumar Vavilapalli
> Priority: Blocker
> Fix For: 0.23.0
>
> Attachments: MAPREDUCE-3226-20111020.txt
>
>
> In a gridmix run with ~1000 jobs, one job is getting stuck because of 2-3 hanging reducers.
> All of them are stuck after downloading all map outputs and have the following thread dump.
> {code}
> "EventFetcher for fetching Map Completion Events" daemon prio=10 tid=0xa325fc00 nid=0x1ca4 waiting on condition [0xa315c000]
> java.lang.Thread.State: TIMED_WAITING (sleeping)
> at java.lang.Thread.sleep(Native Method)
> at org.apache.hadoop.mapreduce.task.reduce.EventFetcher.run(EventFetcher.java:71)
> "main" prio=10 tid=0x080ed400 nid=0x1c71 in Object.wait() [0xf73a2000]
> java.lang.Thread.State: WAITING (on object monitor)
> at java.lang.Object.wait(Native Method)
> - waiting on <0xa94b23d8> (a org.apache.hadoop.mapreduce.task.reduce.EventFetcher)
> at java.lang.Thread.join(Thread.java:1143)
> - locked <0xa94b23d8> (a org.apache.hadoop.mapreduce.task.reduce.EventFetcher)
> at java.lang.Thread.join(Thread.java:1196)
> at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:135)
> at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:367)
> at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:147)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:396)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1135)
> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:142)
> {code}
> Thanks to [~karams] for helping track this down.
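
For readers unfamiliar with this kind of hang: the thread dump shows "main" blocked in Thread.join() on the EventFetcher thread while that thread is inside Thread.sleep(). The committed change lives in EventFetcher.java and Fetcher.java and is not reproduced here. The following is only a minimal sketch of one common way to shut down a sleeping poller thread so that join() returns promptly. All names in it (FetcherShutdownSketch, PollingFetcher, shutDown, startAndShutDown) and the 5-second join timeout are hypothetical and are not taken from the Hadoop source.

```java
public class FetcherShutdownSketch {

    // Hypothetical stand-in for a polling thread; NOT the real EventFetcher.
    static class PollingFetcher extends Thread {
        private volatile boolean stopped = false;
        private final long sleepMillis;

        PollingFetcher(long sleepMillis) {
            this.sleepMillis = sleepMillis;
            setName("PollingFetcher (sketch)");
            setDaemon(true);
        }

        @Override
        public void run() {
            // Re-check the stop flag on every iteration so a shutdown
            // request is noticed even if the interrupt arrives between sleeps.
            while (!stopped && !Thread.currentThread().isInterrupted()) {
                try {
                    // Placeholder for "poll for new events, then wait".
                    Thread.sleep(sleepMillis);
                } catch (InterruptedException e) {
                    // Interrupted while sleeping: treat it as a stop request.
                    return;
                }
            }
        }

        // Set the flag first, then interrupt to wake the thread out of
        // sleep(), and only then join. The timeout bounds the wait.
        void shutDown() throws InterruptedException {
            stopped = true;
            interrupt();
            join(5000);
        }
    }

    // Starts a fetcher that would otherwise sleep for sleepMillis, shuts it
    // down immediately, and reports whether the thread has terminated.
    static boolean startAndShutDown(long sleepMillis) {
        PollingFetcher fetcher = new PollingFetcher(sleepMillis);
        fetcher.start();
        try {
            fetcher.shutDown();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !fetcher.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("terminated=" + startAndShutDown(60_000));
    }
}
```

In this sketch the caller never calls join() on a thread that has not first been told to stop, which is the situation the "main" stack above appears to be in. Whether the actual patch uses a flag, an interrupt, or both should be checked against the revision linked above.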
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira