Mailing-List: contact mapreduce-issues-help@hadoop.apache.org; run by ezmlm
Reply-To: mapreduce-issues@hadoop.apache.org
Date: Sat, 2 May 2015 04:22:06 +0000 (UTC)
From: "Hadoop QA (JIRA)"
To: mapreduce-issues@hadoop.apache.org
Subject: [jira] [Commented] (MAPREDUCE-4506) EofException / 'connection reset by peer' while copying map output

    [ https://issues.apache.org/jira/browse/MAPREDUCE-4506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14524557#comment-14524557 ]

Hadoop QA commented on MAPREDUCE-4506:
--------------------------------------

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12538889/ReduceTask.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / f1a152c |
| Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5494/console |

This message was automatically generated.

> EofException / 'connection reset by peer' while copying map output
> -------------------------------------------------------------------
>
>                 Key: MAPREDUCE-4506
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4506
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 1.0.3
>         Environment: Ubuntu Linux 12.04 LTS, 64-bit, Java 6 update 33
>            Reporter: Piotr Kołaczkowski
>            Priority: Minor
>         Attachments: RamManager.patch, ReduceTask.patch
>
>
> When running complex mapreduce jobs with many mappers and reducers (e.g. 8 mappers and 8 reducers on an 8-core machine), the following exceptions sometimes appear in the logs during the shuffle phase:
> {noformat}
> WARN [570516323@qtp-2060060479-164] 2012-07-19 02:50:21,229 TaskTracker.java (line 3894) getMapOutput(attempt_201207161621_0217_m_000071_0,0) failed:
> org.mortbay.jetty.EofException
>         at org.mortbay.jetty.HttpGenerator.flush(HttpGenerator.java:787)
>         at org.mortbay.jetty.AbstractGenerator$Output.flush(AbstractGenerator.java:568)
>         at org.mortbay.jetty.HttpConnection$Output.flush(HttpConnection.java:1005)
>         at org.mortbay.jetty.AbstractGenerator$Output.write(AbstractGenerator.java:648)
>         at org.mortbay.jetty.AbstractGenerator$Output.write(AbstractGenerator.java:579)
>         at org.apache.hadoop.mapred.TaskTracker$MapOutputServlet.doGet(TaskTracker.java:3872)
>         at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
>         at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
>         at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
>         at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1166)
>         at org.apache.hadoop.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:835)
>         at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1157)
>         at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:388)
>         at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
>         at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
>         at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:765)
>         at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:418)
>         at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
>         at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
>         at org.mortbay.jetty.Server.handle(Server.java:326)
>         at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
>         at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:923)
>         at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:547)
>         at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
>         at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
>         at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:409)
>         at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
> Caused by: java.io.IOException: Connection reset by peer
>         at sun.nio.ch.FileDispatcher.write0(Native Method)
>         at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:29)
>         at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:72)
>         at sun.nio.ch.IOUtil.write(IOUtil.java:43)
>         at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:334)
>         at org.mortbay.io.nio.ChannelEndPoint.flush(ChannelEndPoint.java:169)
>         at org.mortbay.io.nio.SelectChannelEndPoint.flush(SelectChannelEndPoint.java:221)
>         at org.mortbay.jetty.HttpGenerator.flush(HttpGenerator.java:721)
> {noformat}
> At first this looks like a network problem. However, it turns out that hadoop shuffleInMemory sometimes deliberately closes map-output-copy connections, only to reopen them a few milliseconds later, because free memory is temporarily unavailable. Because the sending side does not expect this, an exception is thrown. Additionally, this wastes resources on the sender side, which does more work than required by serving the additional requests.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
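
The issue description attributes the exception to the reduce side closing and reopening a copy connection while it waits for free memory. A minimal sketch of the alternative pattern, in which the copier waits for a memory reservation and never abandons an open connection for lack of memory, might look like the following. `ShuffleMemoryBudget` and all of its methods are hypothetical names chosen for illustration; this is not Hadoop's `RamManager`, and it is not a reconstruction of the attached patches.

```java
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

/**
 * Hypothetical sketch (not Hadoop code): a byte budget for in-memory shuffle
 * buffers. A copier can block on reserve() until memory is freed instead of
 * closing its connection and retrying a few milliseconds later.
 */
class ShuffleMemoryBudget {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition freed = lock.newCondition();
    private final long capacity;
    private long used;

    ShuffleMemoryBudget(long capacityBytes) {
        if (capacityBytes <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacityBytes;
    }

    /** Non-blocking attempt: returns false when the bytes do not fit right now. */
    boolean tryReserve(long bytes) {
        lock.lock();
        try {
            if (bytes < 0 || bytes > capacity - used) {
                return false;
            }
            used += bytes;
            return true;
        } finally {
            lock.unlock();
        }
    }

    /**
     * Blocks until the bytes fit, then reserves them. The wait is
     * uninterruptible to keep the sketch short; real code would likely want
     * an interruptible or timed wait.
     */
    void reserve(long bytes) {
        if (bytes < 0 || bytes > capacity) {
            throw new IllegalArgumentException("request can never be satisfied: " + bytes);
        }
        lock.lock();
        try {
            while (bytes > capacity - used) {
                freed.awaitUninterruptibly();
            }
            used += bytes;
        } finally {
            lock.unlock();
        }
    }

    /** Returns bytes to the budget and wakes every waiting copier. */
    void release(long bytes) {
        lock.lock();
        try {
            used = Math.max(0, used - bytes);
            freed.signalAll();
        } finally {
            lock.unlock();
        }
    }

    /** Bytes that could be reserved right now without waiting. */
    long available() {
        lock.lock();
        try {
            return capacity - used;
        } finally {
            lock.unlock();
        }
    }
}
```

In this sketch a copier thread would call `reserve(size)` once it knows the map output size, read the data on the same connection, and call `release(size)` after the buffer has been merged or spilled. Whether blocking while a connection is held open is acceptable depends on the sender's timeouts, which the report does not cover.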