Date: Thu, 5 May 2016 18:26:13 +0000 (UTC)
From: "Wangda Tan (JIRA)"
To: mapreduce-issues@hadoop.apache.org
Subject: [jira] [Updated] (MAPREDUCE-6514) Job hangs as ask is not updated after ramping down of all reducers

     [ https://issues.apache.org/jira/browse/MAPREDUCE-6514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wangda Tan updated MAPREDUCE-6514:
----------------------------------
    Status: Patch Available  (was: Open)

> Job hangs as ask is not updated after ramping down of all reducers
> ------------------------------------------------------------------
>
>                 Key: MAPREDUCE-6514
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6514
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: applicationmaster
>   Affects Versions: 2.7.1
>            Reporter: Varun Saxena
>            Assignee: Varun Saxena
>            Priority: Critical
>         Attachments: MAPREDUCE-6514.01.patch, MAPREDUCE-6514.02.patch
>
>
> In RMContainerAllocator#preemptReducesIfNeeded, we simply clear the scheduled reduces map and move these reducers to pending. The ask is not updated to reflect this, so the RM keeps on assigning reduce containers and the AM cannot assign them because no reducer is scheduled (see the logs below the code).
> If the ask is updated immediately, the RM can schedule mappers right away, which is the intention when we ramp down reducers anyway.
> The scheduler need not allocate containers for ramped-down reducers.
> If this is not handled, it can lead to map starvation, as pointed out in MAPREDUCE-6513.
> {code}
>     LOG.info("Ramping down all scheduled reduces:"
>         + scheduledRequests.reduces.size());
>     for (ContainerRequest req : scheduledRequests.reduces.values()) {
>       pendingReduces.add(req);
>     }
>     scheduledRequests.reduces.clear();
> {code}
> {noformat}
> 2015-10-13 04:55:04,912 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Container not assigned : container_1437451211867_1485_01_000215
> 2015-10-13 04:55:04,912 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Cannot assign container Container: [ContainerId: container_1437451211867_1485_01_000216, NodeId: hdszzdcxdat6g06u04p:26009, NodeHttpAddress: hdszzdcxdat6g06u04p:26010, Resource: , Priority: 10, Token: Token { kind: ContainerToken, service: 10.2.33.236:26009 }, ] for a reduce as either container memory less than required 4096 or no pending reduce tasks - reduces.isEmpty=true
> 2015-10-13 04:55:04,912 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Container not assigned : container_1437451211867_1485_01_000216
> 2015-10-13 04:55:04,912 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Cannot assign container Container: [ContainerId: container_1437451211867_1485_01_000217, NodeId: hdszzdcxdat6g06u06p:26009, NodeHttpAddress: hdszzdcxdat6g06u06p:26010, Resource: , Priority: 10, Token: Token { kind: ContainerToken, service: 10.2.33.239:26009 }, ] for a reduce as either container memory less than required 4096 or no pending reduce tasks - reduces.isEmpty=true
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
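
The bookkeeping problem described in the issue can be illustrated with a small, self-contained Java sketch. This is not the real RMContainerAllocator and not the attached patch: the class RampDownSketch, its fields, and its methods are all hypothetical names made up for illustration, and the "ask" is reduced to a single counter. It only shows why moving reducers from scheduled to pending should also withdraw the corresponding outstanding request.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical, heavily simplified model of the bookkeeping in the report.
// None of these names exist in Hadoop; the "ask" is modelled as one counter.
class RampDownSketch {
    private final List<String> scheduledReduces = new ArrayList<>();
    private final Deque<String> pendingReduces = new ArrayDeque<>();
    // Number of reduce containers currently requested from the RM.
    private int outstandingReduceAsk = 0;

    public void scheduleReduce(String taskId) {
        scheduledReduces.add(taskId);
        outstandingReduceAsk++; // scheduling a reducer adds to the ask
    }

    // Behaviour described in the report: reducers go back to pending,
    // but the ask is left untouched and becomes stale.
    public void rampDownWithoutAskUpdate() {
        pendingReduces.addAll(scheduledReduces);
        scheduledReduces.clear();
    }

    // Assumed remedy: withdraw one unit of ask per ramped-down reducer.
    public void rampDownWithAskUpdate() {
        for (String taskId : scheduledReduces) {
            pendingReduces.add(taskId);
            outstandingReduceAsk--;
        }
        scheduledReduces.clear();
    }

    public int getOutstandingReduceAsk() { return outstandingReduceAsk; }
    public int getScheduledCount() { return scheduledReduces.size(); }
    public int getPendingCount() { return pendingReduces.size(); }

    public static void main(String[] args) {
        RampDownSketch stale = new RampDownSketch();
        stale.scheduleReduce("r1");
        stale.scheduleReduce("r2");
        stale.rampDownWithoutAskUpdate();
        // Nothing is scheduled, yet two containers are still being asked for.
        System.out.println("stale ask: " + stale.getOutstandingReduceAsk());

        RampDownSketch updated = new RampDownSketch();
        updated.scheduleReduce("r1");
        updated.scheduleReduce("r2");
        updated.rampDownWithAskUpdate();
        System.out.println("updated ask: " + updated.getOutstandingReduceAsk());
    }
}
```

In the first case the model ends up with no scheduled reducers but a non-zero ask, which corresponds to the "no pending reduce tasks - reduces.isEmpty=true" messages in the log above: containers keep arriving for reducers that can no longer accept them.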