Date: Sun, 27 May 2012 09:32:24 +0000 (UTC)
From: "xieguiming (JIRA)"
To: mapreduce-issues@hadoop.apache.org
Message-ID: <211038532.6791.1338111144708.JavaMail.jiratomcat@issues-vm>
Subject: [jira] [Commented] (MAPREDUCE-5) Shuffle's getMapOutput() fails with EofException, followed by IllegalStateException

    [ https://issues.apache.org/jira/browse/MAPREDUCE-5?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13284130#comment-13284130 ]

xieguiming commented on MAPREDUCE-5:
------------------------------------

I spent a whole day analyzing this problem; here are some more details.
1> The TT throws the EofException and the IllegalStateException for getMapOutput.
2> I then used the netstat command to check the HTTP port (50060) and found 83 connections in the CLOSE_WAIT state. These CLOSE_WAIT connections did not disappear; they remained for at least 24 hours.
3> According to the TT log, after the exception is printed the TT HTTP server stops working properly: it cannot accept any HTTP request (no "sent out" log entries appear afterwards), and the JT adds the TT to the blacklist. When I access the HTTP service with curl, the client times out, while the DataNode HTTP service on the same node works fine.
4> The TT's CPU usage stays at 100% even when there is no child JVM running.
5> Reduce tasks on the same node also copy data from other nodes more slowly.
6> After I restarted the TT, it worked well again.

I have attached the TT logs. If you need other logs, let me know. Unfortunately we do not have the matching userlogs, because userlogs are deleted after only 3 hours and many hours had already passed by the time we found the problem.

> Shuffle's getMapOutput() fails with EofException, followed by IllegalStateException
> -----------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-5
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 0.20.2
>         Environment: Sun Java 1.6.0_13, OpenSolaris, running on a SunFire 4150 (x64) 10 node cluster
>            Reporter: George Porter
>         Attachments: temp.rar
>
>
> During the shuffle phase, I'm seeing a large sequence of the following actions:
> 1) WARN org.apache.hadoop.mapred.TaskTracker: getMapOutput(attempt_200905181452_0002_m_000010_0,0) failed : org.mortbay.jetty.EofException
> 2) WARN org.mortbay.log: Committed before 410 getMapOutput(attempt_200905181452_0002_m_000010_0,0) failed : org.mortbay.jetty.EofException
> 3) ERROR org.mortbay.log: /mapOutput java.lang.IllegalStateException: Committed
> The map phase completes with 100%, and then the reduce phase crawls along with the
> above errors in each of the TaskTracker logs. None of the tasktrackers get lost. When I run non-data jobs like the 'pi' test from the example jar, everything works fine.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
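
The netstat check from point 2 of the comment (counting CLOSE_WAIT connections on the TaskTracker HTTP port, 50060) could be scripted roughly as follows. This is only a sketch: the function name `count_close_wait` is hypothetical, the sample text is made-up illustrative data rather than output from the affected cluster, and it assumes the Linux-style column layout of `netstat -ant` (proto, recv-q, send-q, local address, foreign address, state). Other platforms print different columns and would need a different parser.

```python
def count_close_wait(netstat_output, port=50060):
    """Count TCP sockets in CLOSE_WAIT whose local port equals `port`.

    Assumes Linux-style `netstat -ant` columns:
    proto, recv-q, send-q, local address, foreign address, state.
    """
    # Accept both "host:port" and "host.port" local-address notations.
    suffixes = (":%d" % port, ".%d" % port)
    count = 0
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip headers, blank lines, and anything that is not a TCP row.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local_addr, state = fields[3], fields[5]
        if state == "CLOSE_WAIT" and local_addr.endswith(suffixes):
            count += 1
    return count


# Illustrative sample only; addresses and ports are invented.
SAMPLE = """\
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 0.0.0.0:50060           0.0.0.0:*               LISTEN
tcp        1      0 10.0.0.5:50060          10.0.0.7:41234          CLOSE_WAIT
tcp        1      0 10.0.0.5:50060          10.0.0.8:41300          CLOSE_WAIT
tcp        0      0 10.0.0.5:50075          10.0.0.8:41301          CLOSE_WAIT
tcp        0      0 10.0.0.5:50060          10.0.0.9:41400          ESTABLISHED
"""

if __name__ == "__main__":
    print(count_close_wait(SAMPLE))
```

On a live node, the same function could be fed the text captured from the netstat command instead of `SAMPLE`. Running it periodically would show whether the CLOSE_WAIT count on port 50060 keeps growing or stays stuck, which is the symptom described in the comment.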