Mailing-List: contact core-dev-help@hadoop.apache.org; run by ezmlm
Reply-To: core-dev@hadoop.apache.org
Message-ID: <1162885481.1236174236428.JavaMail.jira@brutus>
Date: Wed, 4 Mar 2009 05:43:56 -0800 (PST)
From: "Ramya R (JIRA)"
To: core-dev@hadoop.apache.org
Subject: [jira] Commented: (HADOOP-5338) Reduce tasks are stuck waiting for map outputs when none are in progress
In-Reply-To: <948845037.1235652662145.JavaMail.jira@brutus>

    [ https://issues.apache.org/jira/browse/HADOOP-5338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12678732#action_12678732 ]

Ramya R commented on HADOOP-5338:
---------------------------------

Tested the above patch on a 500-node cluster. The reducers are no longer stuck, and the job completes successfully after multiple JT restarts.

However, there is one thing to note. The situation where "reducers wait for maps when none are running" still occurs. With the above patch, though, the reducers no longer hang indefinitely when this happens. Instead, the TT pulls back all the events and completes the task successfully.

> Reduce tasks are stuck waiting for map outputs when none are in progress
> ------------------------------------------------------------------------
>
>                 Key: HADOOP-5338
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5338
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.20.0
>            Reporter: Ramya R
>            Assignee: Amar Kamat
>            Priority: Blocker
>             Fix For: 0.20.0
>
>         Attachments: HADOOP-5338-v2.1.patch, log.txt
>
>
> When the JT is restarted several times, the reduce tasks can get stuck forever waiting for map outputs, even though the map phase is 100% complete and none of the map tasks are in progress. The reduce tasks wait indefinitely.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.