hadoop-mapreduce-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Piotr Kołaczkowski (JIRA) <j...@apache.org>
Subject [jira] [Created] (MAPREDUCE-4506) EofException / 'connection reset by peer' while copying map output
Date Thu, 02 Aug 2012 09:31:02 GMT
Piotr Kołaczkowski created MAPREDUCE-4506:
---------------------------------------------

             Summary: EofException / 'connection reset by peer' while copying map output 
                 Key: MAPREDUCE-4506
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4506
             Project: Hadoop Map/Reduce
          Issue Type: Bug
    Affects Versions: 1.0.3
         Environment: Ubuntu Linux 12.04 LTS, 64-bit, Java 6 update 33
            Reporter: Piotr Kołaczkowski
            Priority: Minor


When running complex mapreduce jobs with many mappers and reducers (e.g. 8 mappers, 8 reducers
on a 8 core machine), sometimes the following exceptions pop up in the logs during the shuffle
phase:

{noformat}
WARN [570516323@qtp-2060060479-164] 2012-07-19 02:50:21,229 TaskTracker.java (line 3894) getMapOutput(attempt_201207161621_0217_m_000071_0,0)
failed :
org.mortbay.jetty.EofException
        at org.mortbay.jetty.HttpGenerator.flush(HttpGenerator.java:787)
        at org.mortbay.jetty.AbstractGenerator$Output.flush(AbstractGenerator.java:568)
        at org.mortbay.jetty.HttpConnection$Output.flush(HttpConnection.java:1005)
        at org.mortbay.jetty.AbstractGenerator$Output.write(AbstractGenerator.java:648)
        at org.mortbay.jetty.AbstractGenerator$Output.write(AbstractGenerator.java:579)
        at org.apache.hadoop.mapred.TaskTracker$MapOutputServlet.doGet(TaskTracker.java:3872)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
        at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
        at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1166)
        at org.apache.hadoop.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:835)
        at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1157)
        at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:388)
        at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
        at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
        at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:765)
        at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:418)
        at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
        at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
        at org.mortbay.jetty.Server.handle(Server.java:326)
        at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
        at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:923)
        at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:547)
        at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
        at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
        at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:409)
        at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
Caused by: java.io.IOException: Connection reset by peer
        at sun.nio.ch.FileDispatcher.write0(Native Method)
        at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:29)
        at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:72)
        at sun.nio.ch.IOUtil.write(IOUtil.java:43)
        at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:334)
        at org.mortbay.io.nio.ChannelEndPoint.flush(ChannelEndPoint.java:169)
        at org.mortbay.io.nio.SelectChannelEndPoint.flush(SelectChannelEndPoint.java:221)
        at org.mortbay.jetty.HttpGenerator.flush(HttpGenerator.java:721)
{noformat}

The problem looks like some network problems at first, however it turns out that hadoop shuffleInMemory
sometimes deliberately closes map-output-copy connections just to reopen them a few milliseconds
later, because of temporary unavailability of free memory. Because the sending side does not
expect this, an exception is thrown. Additionally this leads to wasting resources on the sender
side, which does more work than required serving additional requests. 



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

       

Mime
View raw message