Date: Tue, 6 Nov 2012 19:12:12 +0000 (UTC)
From: "Robert Joseph Evans (JIRA)"
To: mapreduce-issues@hadoop.apache.org
Subject: [jira] [Created] (MAPREDUCE-4775) Reducer will "never" commit suicide

Robert Joseph Evans created MAPREDUCE-4775:
----------------------------------------------

             Summary: Reducer will "never" commit suicide
                 Key: MAPREDUCE-4775
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4775
             Project: Hadoop Map/Reduce
          Issue Type: Bug
          Components: mrv2
            Reporter: Robert Joseph Evans
            Assignee: Robert Joseph Evans
            Priority: Critical

In 1.0 there are a number of conditions that will cause a reducer to commit suicide and exit. These include the reducer being stalled and the error percentage of total fetches being too high.
In the new code it will only commit suicide when the total number of failures for a single task attempt is >= max(30, totalMaps/10). In the best case, with the quadratic back-off, it would take 20.5 hours for a single map attempt to reach 30 failures. And unless there is only one reducer running, the map task would have been restarted before then.

We should go back to including the same reducer suicide checks that are in 1.0.
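
To make the threshold concrete, here is a minimal sketch of the condition described above, max(30, totalMaps/10). The function names are hypothetical and are not taken from the Hadoop source, and integer division is an assumption about how totalMaps/10 is evaluated.

```python
def suicide_threshold(total_maps: int) -> int:
    """Failure count at which the reducer gives up, per the issue text.

    Assumption: totalMaps/10 is integer division.
    """
    return max(30, total_maps // 10)


def should_commit_suicide(failures: int, total_maps: int) -> bool:
    """True once the failure count reaches the threshold (hypothetical helper)."""
    return failures >= suicide_threshold(total_maps)


if __name__ == "__main__":
    # Show how the threshold scales with job size.
    for total_maps in (50, 300, 1000, 5000):
        print(total_maps, suicide_threshold(total_maps))
```

Under this reading the threshold never drops below 30, so even a small job needs 30 failures before the reducer exits, which is the count the 20.5 hour estimate refers to.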