Date: Fri, 3 Jan 2014 17:17:55 +0000 (UTC)
From: "Karthik Kambatla (JIRA)"
To: mapreduce-issues@hadoop.apache.org
Subject: [jira] [Commented] (MAPREDUCE-5689) MRAppMaster does not preempt reducer when scheduled Maps cannot be fulfilled

    [ https://issues.apache.org/jira/browse/MAPREDUCE-5689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13861674#comment-13861674 ]

Karthik Kambatla commented on MAPREDUCE-5689:
---------------------------------------------

+1. Will commit this shortly.

> MRAppMaster does not preempt reducer when scheduled Maps cannot be fulfilled
> ----------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-5689
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5689
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 3.0.0, 2.2.0
>            Reporter: Lohit Vijayarenu
>            Assignee: Lohit Vijayarenu
>            Priority: Critical
>        Attachments: MAPREDUCE-5689.1.patch, MAPREDUCE-5689.2.patch
>
>
> We saw a corner case where jobs running on the cluster were hung. The scenario was roughly this: a job was running within a pool that was at capacity. All available containers were occupied by reducers and the last 2 mappers, and a few more reducers were waiting in the pipeline to be scheduled.
> At this point the two running mappers failed and went back to the scheduled state. The two freed containers were assigned to reducers, so the whole pool was now full of reducers waiting on the two maps to complete. The 2 maps never got scheduled because the pool was full.
> Ideally, reducer preemption should have kicked in to make room for the mappers via this code in RMContainerAllocator:
> {code}
> int completedMaps = getJob().getCompletedMaps();
> int completedTasks = completedMaps + getJob().getCompletedReduces();
> if (lastCompletedTasks != completedTasks) {
>   lastCompletedTasks = completedTasks;
>   recalculateReduceSchedule = true;
> }
> if (recalculateReduceSchedule) {
>   preemptReducesIfNeeded();
> }
> {code}
> But in this scenario lastCompletedTasks always equals completedTasks because the maps never complete, so preemptReducesIfNeeded() is never reached and the job hangs forever. As a workaround, killing a few reducers lets the mappers get scheduled and the job complete.

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
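
To make the hang concrete, below is a minimal, self-contained sketch of one way to break it: recalculate the reduce schedule whenever map requests are still outstanding, not only when the completed-task count changes. All names here (JobView, scheduledMaps, heartbeat) are illustrative stand-ins for this sketch, not the actual RMContainerAllocator source or the attached patches.

{code}
// Hedged sketch: the heartbeat check above, plus an extra "pending maps"
// condition. Names are illustrative, not the real Hadoop code.
public class PreemptionCheckSketch {

  /** Snapshot of job progress, standing in for the getJob() calls above. */
  static final class JobView {
    int completedMaps;
    int completedReduces;
    int scheduledMaps; // map attempts still waiting for a container
  }

  static int lastCompletedTasks = 0;
  static boolean recalculateReduceSchedule = false;

  static void heartbeat(JobView job) {
    int completedTasks = job.completedMaps + job.completedReduces;
    // Original condition: only a change in completed tasks triggers the
    // recalculation. Extra condition: outstanding map requests must also
    // trigger it, otherwise a pool full of reducers never frees room.
    if (lastCompletedTasks != completedTasks || job.scheduledMaps > 0) {
      lastCompletedTasks = completedTasks;
      recalculateReduceSchedule = true;
    }
    if (recalculateReduceSchedule) {
      preemptReducesIfNeeded(job);
      recalculateReduceSchedule = false;
    }
  }

  static void preemptReducesIfNeeded(JobView job) {
    // Stand-in for the real preemption logic: just report that a reducer
    // would be asked to give up its container for a scheduled map.
    if (job.scheduledMaps > 0) {
      System.out.println("would preempt a reducer for "
          + job.scheduledMaps + " pending map(s)");
    }
  }

  public static void main(String[] args) {
    // The scenario from the description: two maps failed and went back to
    // the scheduled state while every container is held by a reducer.
    JobView job = new JobView();
    job.completedMaps = 98;
    job.completedReduces = 0;
    job.scheduledMaps = 2;
    heartbeat(job); // with only the original condition this would be a no-op
  }
}
{code}

Running the sketch prints the preemption message for the two pending maps; with the original condition alone, the heartbeat would do nothing and the job would stay wedged until reducers were killed by hand.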