hadoop-mapreduce-issues mailing list archives

From "xieguiming (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-5) Shuffle's getMapOutput() fails with EofException, followed by IllegalStateException
Date Sun, 27 May 2012 09:32:24 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-5?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13284130#comment-13284130

xieguiming commented on MAPREDUCE-5:

I have analyzed this problem for a whole day, and here are some more details.
1> The TT throws the EofException and the IllegalStateException for getMapOutput.

2> I then used the netstat command to check the HTTP port (50060) and found 83 connections
in the CLOSE_WAIT state. The CLOSE_WAIT connections never go away. At least, for 24

3> From the TT log, after the exception is printed, the TT HTTP server stops working properly.
It cannot accept any HTTP request (no "sent out" log entry is found afterwards), and the JT adds
the TT to the blacklist. When I use the curl command to access the HTTP service, the client
times out, while the DataNode HTTP service on the same node is fine.

4> I also found that the TT CPU usage is 100% even when there is no child JVM.

5> I also found that reduce tasks on the same node copy more slowly from other nodes.

6> After I restarted the TT, it works well again.
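
The netstat check in item 2 can be repeated with a small script. This is only a minimal sketch: it assumes Linux-style `netstat -tan` output (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State), and the function name `count_states` and the sample rows are made up for illustration. Only the port number 50060 comes from the report above.

```python
from collections import Counter

def count_states(netstat_output, port):
    """Count TCP states for sockets whose LOCAL port equals `port`.

    Assumes Linux `netstat -tan` column layout; other platforms
    (for example Solaris) print a different format.
    """
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip headers and anything that is not a TCP row.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local_addr, state = fields[3], fields[5]
        # The port is whatever follows the last ':' (works for IPv4 and IPv6).
        if local_addr.rsplit(":", 1)[-1] == str(port):
            counts[state] += 1
    return counts

# Illustrative input only; these addresses are invented, not from the cluster.
SAMPLE = """\
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 0.0.0.0:50060           0.0.0.0:*               LISTEN
tcp        1      0 10.0.0.5:50060          10.0.0.7:41822          CLOSE_WAIT
tcp        1      0 10.0.0.5:50060          10.0.0.8:52310          CLOSE_WAIT
tcp        0      0 10.0.0.5:50075          10.0.0.8:52311          ESTABLISHED
"""

if __name__ == "__main__":
    print(count_states(SAMPLE, 50060))
```

Running the same function on real `netstat -tan` output a few times, some minutes apart, would show whether the CLOSE_WAIT count on port 50060 stays constant or keeps growing.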

I have attached the TT logs. If you need other logs, tell me. I am sorry that we do not have
the matching userlogs, because userlogs are deleted after only 3 hours, and many hours had
already passed by the time we found the problem.
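
The curl check in item 3 can also be written as a small probe that tells a hung HTTP server (connection accepted, no response) apart from one that is not listening at all. This is a rough sketch, not the tool I actually used: the function name `probe` and the host name in the usage comment are hypothetical, and only port 50060 comes from the report.

```python
import socket
import urllib.error
import urllib.request

def probe(url, timeout=5.0):
    """Return 'ok <status>', 'http error <code>', 'timeout' or 'unreachable'."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return "ok %d" % resp.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status.
        return "http error %d" % err.code
    except urllib.error.URLError as err:
        # Connect-phase failures are wrapped in URLError.
        if isinstance(err.reason, (socket.timeout, TimeoutError)):
            return "timeout"
        return "unreachable"
    except (socket.timeout, TimeoutError):
        # Read-phase timeout: connected, but no response arrived in time.
        return "timeout"

# Hypothetical usage (host name is a placeholder):
#   probe("http://tt-host.example:50060/", timeout=10)
```

A "timeout" result from the TT port together with an "ok" result from the DataNode HTTP port on the same node would match what I saw with curl.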

> Shuffle's getMapOutput() fails with EofException, followed by IllegalStateException
> -----------------------------------------------------------------------------------
>                 Key: MAPREDUCE-5
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 0.20.2
>         Environment: Sun Java 1.6.0_13, OpenSolaris, running on a SunFire 4150 (x64) 10 node cluster
>            Reporter: George Porter
>         Attachments: temp.rar
> During the shuffle phase, I'm seeing a large sequence of the following actions:
> 1) WARN org.apache.hadoop.mapred.TaskTracker: getMapOutput(attempt_200905181452_0002_m_000010_0,0) failed : org.mortbay.jetty.EofException
> 2) WARN org.mortbay.log: Committed before 410 getMapOutput(attempt_200905181452_0002_m_000010_0,0) failed : org.mortbay.jetty.EofException
> 3) ERROR org.mortbay.log: /mapOutput java.lang.IllegalStateException: Committed
> The map phase completes with 100%, and then the reduce phase crawls along with the above
> errors in each of the TaskTracker logs.  None of the tasktrackers get lost.  When I run non-data
> jobs like the 'pi' test from the example jar, everything works fine.


