Date: Wed, 2 Sep 2015 03:27:45 +0000 (UTC)
From: "Chang Li (JIRA)"
To: yarn-issues@hadoop.apache.org
Reply-To: yarn-issues@hadoop.apache.org
Subject: [jira] [Updated] (YARN-4105) Capacity Scheduler headroom for DRF is wrong
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit

     [ https://issues.apache.org/jira/browse/YARN-4105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chang Li updated YARN-4105:
---------------------------
    Attachment: YARN-4105.patch

> Capacity Scheduler headroom for DRF is wrong
> --------------------------------------------
>
>                 Key: YARN-4105
>                 URL: https://issues.apache.org/jira/browse/YARN-4105
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Chang Li
>            Assignee: Chang Li
>         Attachments: YARN-4105.patch
>
>
> Related to the problem discussed in YARN-1857, but the min method is flawed when we are using DRC (the DominantResourceCalculator). We have run into a real scenario in production where queueCapacity: , qconsumed: , consumed: , limit: . The headroom calculation returns 88064 even though only 1536 is left in the queue, because DRC effectively compares by vCores. This then caused a deadlock: the RMContainerAllocator thought there was still space for a mapper and would not preempt a reducer in the full queue to schedule one. Propose to fix with componentwiseMin.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
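
A minimal sketch of the failure mode described above, assuming the Resources utility methods in org.apache.hadoop.yarn.util.resource; the resource values below are hypothetical (the production numbers were not preserved in this message) and the cluster size is assumed only to make the dominant-share comparison concrete:

{code:java}
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.DominantResourceCalculator;
import org.apache.hadoop.yarn.util.resource.ResourceCalculator;
import org.apache.hadoop.yarn.util.resource.Resources;

public class HeadroomMinSketch {
  public static void main(String[] args) {
    // Hypothetical figures only; not the redacted production values from the report.
    Resource cluster        = Resource.newInstance(1048576, 1024); // assumed cluster size
    Resource queueRemaining = Resource.newInstance(1536, 200);     // queue has little memory left
    Resource userRemaining  = Resource.newInstance(88064, 8);      // user limit leaves lots of memory

    ResourceCalculator drc = new DominantResourceCalculator();

    // Resources.min compares the two operands by their dominant share of the cluster.
    // Here userRemaining's dominant share (memory, ~0.084) is smaller than
    // queueRemaining's (vCores, ~0.195), so min returns <memory:88064, vCores:8>
    // even though the queue only has 1536 MB of memory left.
    Resource byDominantShare = Resources.min(drc, cluster, queueRemaining, userRemaining);

    // componentwiseMin takes the minimum of each dimension independently, so the
    // reported headroom cannot exceed either operand in any dimension: <memory:1536, vCores:8>.
    Resource byComponent = Resources.componentwiseMin(queueRemaining, userRemaining);

    System.out.println("min with DRC:     " + byDominantShare);
    System.out.println("componentwiseMin: " + byComponent);
  }
}
{code}

With these made-up numbers, min under the DominantResourceCalculator reports 88064 MB of headroom while the queue dimension-wise has only 1536 MB left, which is the overstatement that let the RMContainerAllocator keep waiting for mapper space instead of preempting a reducer; componentwiseMin caps each dimension independently and avoids it.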