Date: Mon, 1 Aug 2011 21:58:50 +0000 (UTC)
From: "Arun C Murthy (JIRA)"
To: mapreduce-issues@hadoop.apache.org
Reply-To: mapreduce-issues@hadoop.apache.org
Message-ID: <804864081.246.1312235930232.JavaMail.tomcat@hel.zones.apache.org>
In-Reply-To: <371656456.15331.1297706997543.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Commented] (MAPREDUCE-2324) Job should fail if a reduce task can't be scheduled anywhere

    [ https://issues.apache.org/jira/browse/MAPREDUCE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13073813#comment-13073813 ]

Arun C Murthy commented on MAPREDUCE-2324:
------------------------------------------

Robert - the problem with reduce.input.limit was not the 'right' value for the constant, but the fact that 'guessing' the reduce input was broken.

For now, should we commit the logging change while you investigate whether we can fix the 'guess'?

> Job should fail if a reduce task can't be scheduled anywhere
> ------------------------------------------------------------
>
>                 Key: MAPREDUCE-2324
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2324
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>   Affects Versions: 0.20.2, 0.20.205.0
>            Reporter: Todd Lipcon
>            Assignee: Robert Joseph Evans
>             Fix For: 0.20.205.0
>
>         Attachments: MR-2324-security-v1.txt, MR-2324-security-v2.txt, MR-2324-security-v3.patch, MR-2324-secutiry-just-log-v1.patch
>
>
> If there's a reduce task that needs more disk space than is available on any mapred.local.dir in the cluster, that task will stay pending forever. For example, we reproduced this in a QA cluster by accidentally running terasort with one reducer: since no mapred.local.dir had 1T free, the job remained in the pending state for several days. The reason for the "stuck" task wasn't clear from a user perspective until we looked at the JT logs.
>
> Probably better to just fail the job if a reduce task goes through all TTs and finds that there isn't enough space.
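
The proposal in the issue description (fail the job once a reduce task has been offered to every TT and none has enough space) could be sketched roughly as follows. This is only an illustration, not the attached patch: the class, enum, and method names are hypothetical and do not exist in Hadoop, and it assumes a trustworthy estimate of the reduce input size, which the comment above identifies as the part that was actually broken.

```java
import java.util.List;

// Hypothetical sketch; none of these names come from the Hadoop code base.
final class ReduceSchedulability {

    enum Decision { SCHEDULE, KEEP_PENDING, FAIL_JOB }

    /**
     * Decides what to do with a pending reduce task.
     *
     * @param estimatedInputBytes assumed estimate of the reduce input size
     * @param freeBytesPerTracker free local-dir space reported by each TT checked so far
     * @param allTrackersChecked  true once every TT in the cluster has been checked
     */
    static Decision decide(long estimatedInputBytes,
                           List<Long> freeBytesPerTracker,
                           boolean allTrackersChecked) {
        for (long free : freeBytesPerTracker) {
            if (free >= estimatedInputBytes) {
                return Decision.SCHEDULE; // at least one TT can host the task
            }
        }
        // No TT seen so far has room. Only give up once all of them were checked;
        // otherwise the task stays pending, which is the behaviour reported in the issue.
        return allTrackersChecked ? Decision.FAIL_JOB : Decision.KEEP_PENDING;
    }
}
```

With the terasort case from the description (one reducer needing about 1T and no TT with that much free space), decide would return FAIL_JOB once every TT had been checked, instead of leaving the task pending for days. If the size estimate is unreliable, the same check would fail jobs that could have run, which is why committing only the logging change first is the more conservative option.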