Date: Thu, 14 Jul 2011 13:25:00 +0000 (UTC)
From: "Robert Joseph Evans (JIRA)"
To: mapreduce-issues@hadoop.apache.org
Subject: [jira] [Resolved] (MAPREDUCE-2684) Job Tracker can starve reduces with very large input.

[ https://issues.apache.org/jira/browse/MAPREDUCE-2684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Joseph Evans resolved MAPREDUCE-2684.
--------------------------------------------

Resolution: Duplicate

> Job Tracker can starve reduces with very large input.
> -----------------------------------------------------
>
> Key: MAPREDUCE-2684
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2684
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: jobtracker
> Affects Versions: 0.20.204.0
> Reporter: Robert Joseph Evans
> Assignee: Robert Joseph Evans
>
> If mapreduce.reduce.input.limit is misconfigured, or if a cluster is simply running low on disk space, then reduces with a large input may never get scheduled. The job then never fails and never succeeds; it just starves until it is killed.
>
> The JobInProgress tries to guess the size of the input to all reducers in a job. If the estimated size is over mapreduce.reduce.input.limit, the job is killed. Otherwise, findNewReduceTask() checks whether the estimated size is too big to fit on the node currently looking for work. If it is too big, no reduce is assigned and some other task gets a chance at the slot.
>
> The idea is to keep track of how often a reduce slot is rejected for lack of space versus how often an assignment succeeds, and then guess whether the reduce tasks will ever be scheduled.
>
> So I would like some feedback on this.
>
> 1) How should we guess? Someone who found the bug here suggested P1 + (P2 * S), where S is the number of successful assignments. Possibly P1 = 20 and P2 = 2.0. I am not really sure.
>
> 2) What should we do when we guess that it will never get a slot? Should we fail the job, or should we say that even though it might fail, let's just schedule it and see if it really will fail?
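
The heuristic in question 1 can be sketched in a few lines of Java. This is only an illustration under stated assumptions, not code from Hadoop or from any patch on this issue: the class name ReduceStarvationEstimator and its methods are hypothetical, and reading P1 + (P2 * S) as a budget of tolerated slot rejections (flagging the job once rejections exceed it) is one possible interpretation of the suggestion.

```java
/**
 * Hypothetical sketch of the P1 + (P2 * S) heuristic; not a Hadoop class.
 * Assumption: the expression is a budget of tolerated slot rejections,
 * and the job is considered starved once rejections exceed that budget.
 */
final class ReduceStarvationEstimator {
    private final double p1;   // base number of rejections always tolerated
    private final double p2;   // extra rejections tolerated per successful assignment
    private long rejections;   // reduce slots declined for lack of disk space
    private long successes;    // S: reduce tasks successfully assigned

    ReduceStarvationEstimator(double p1, double p2) {
        this.p1 = p1;
        this.p2 = p2;
    }

    /** Call when a slot is turned down because the estimated input does not fit. */
    void recordRejection() {
        rejections++;
    }

    /** Call when a reduce task is actually assigned to a slot. */
    void recordSuccess() {
        successes++;
    }

    /** Current rejection budget: P1 + (P2 * S). */
    double threshold() {
        return p1 + p2 * successes;
    }

    /** True once rejections exceed the budget, i.e. we guess the reduces will never run. */
    boolean looksStarved() {
        return rejections > threshold();
    }
}
```

With the values floated above (P1 = 20, P2 = 2.0), a job with no successful assignments would be flagged on its 21st rejection, and each successful assignment would raise the budget by two. What to do once looksStarved() returns true is exactly the open question 2, so the sketch deliberately stops at the guess.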
-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira