Date: Fri, 1 May 2015 21:33:07 +0000 (UTC)
From: "Jian He (JIRA)"
To: yarn-issues@hadoop.apache.org
Subject: [jira] [Commented] (YARN-1662) Capacity Scheduler reservation issue cause Job Hang

    [ https://issues.apache.org/jira/browse/YARN-1662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14523998#comment-14523998 ]

Jian He commented on YARN-1662:
-------------------------------

Hi [~sunilg],
YARN-1198 fixed a number of headroom issues so that the reported headroom is correct and reducer preemption kicks in when it should. In that case, might this problem already be resolved?
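To make the numbers in the quoted report concrete, here is a minimal sketch of the memory arithmetic. It is not YARN code: the `Node` class and both helper functions are hypothetical names introduced for illustration, and summing free memory across nodes is only a rough stand-in for how headroom is actually computed. The sketch also does not model why the scheduler keeps the reservation on NM1 instead of using NM2.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical stand-in for a NodeManager; sizes are in GB."""
    name: str
    capacity_gb: int
    used_gb: int

    @property
    def free_gb(self) -> int:
        return self.capacity_gb - self.used_gb


def reservation_satisfiable(node: Node, request_gb: int) -> bool:
    """True if the node currently has enough free memory for the request."""
    return node.free_gb >= request_gb


def aggregate_free_gb(nodes) -> int:
    """Free memory summed over all nodes (a simplified view of headroom)."""
    return sum(n.free_gb for n in nodes)


# State described in the report.
nm1 = Node("NM1", capacity_gb=8, used_gb=2 + 3)  # AM (2GB) + Reducer_1 (3GB)
nm2 = Node("NM2", capacity_gb=8, used_gb=3)      # Reducer_2 (3GB)
MAP_GB = 5                                       # size of the pending map

print("NM1 free:", nm1.free_gb, "GB")
print("NM2 free:", nm2.free_gb, "GB")
print("Aggregate free:", aggregate_free_gb([nm1, nm2]), "GB")
print("Reservation on NM1 satisfiable:", reservation_satisfiable(nm1, MAP_GB))
```

Under these assumptions the aggregate free memory (8GB) exceeds the 5GB map request, which matches the report's remark that some headroom is still available, while the reservation held on NM1 (3GB free) cannot be satisfied until a container there finishes.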
> Capacity Scheduler reservation issue cause Job Hang
> ---------------------------------------------------
>
>                 Key: YARN-1662
>                 URL: https://issues.apache.org/jira/browse/YARN-1662
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.2.0
>         Environment: Suse 11 SP1 + Linux
>            Reporter: Sunil G
>
> There are 2 node managers in my cluster.
> NM1 with 8GB
> NM2 with 8GB
> I am submitting a job with the details below:
> AM with 2GB
> Map needs 5GB
> Reducer needs 3GB
> slowstart is enabled with 0.5
> 10 maps and 50 reducers are assigned.
> 5 maps are completed. Now a few reducers have been scheduled.
> Now NM1 has the 2GB AM and 3GB Reducer_1 [Used 5GB]
> NM2 has 3GB Reducer_2 [Used 3GB]
> A Map has now reserved (5GB) on NM1, which has only 3GB free.
> It hangs forever.
> The potential issue is that the reservation on NM1 is now blocked for a Map which needs 5GB.
> But Reducer_1 hangs, waiting for a few map outputs.
> Reducer-side preemption also did not happen, as some headroom is still available.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)