Date: Thu, 13 Mar 2014 08:36:49 +0000 (UTC)
From: "PengZhang (JIRA)"
To: yarn-issues@hadoop.apache.org
Reply-To: yarn-issues@hadoop.apache.org
Subject: [jira] [Updated] (YARN-1829) CapacityScheduler can't schedule job after misconfiguration


     [ https://issues.apache.org/jira/browse/YARN-1829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

PengZhang updated YARN-1829:
----------------------------

    Description:
CapacityScheduler validates a new configuration to make sure all existing queues are still present. But this check is not enough:

1. When we change a queue (A) from leaf to parent, the new configuration passes validation and A's new child (X) is added to queues. Later, root.reinitialize() fails because the queue type has changed.
2. Then we add a new parent queue (B) with child X, and change queue A's state to STOPPED. This applies successfully, but a job submitted to queue X can never be scheduled, because LeafQueue X was already added to queues in step 1 and its parent points to A, which is STOPPED.

  root
  /
 A
queues: root, A

    root
    /
   A
  /
 X
reinitialize failed, but X is added to queues
queues: root, A, X

   root
   /  \
  A    B
        \
         X
new node X will not replace old one
queues: root, A, X (the value for X is not the LeafQueue that is in the tree)


> CapacityScheduler can't schedule job after misconfiguration
> -----------------------------------------------------------
>
>                 Key: YARN-1829
>                 URL: https://issues.apache.org/jira/browse/YARN-1829
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacityscheduler
>            Reporter: PengZhang
>



--
This message was sent by Atlassian JIRA
(v6.2#6252)
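
The two steps in the description can be reproduced with a small stand-alone model. This is only a sketch under assumptions: StaleQueueSketch, Queue, merge() and demo() are hypothetical names, not Hadoop classes or methods, and queue states such as STOPPED are not modeled. It shows only the ordering problem from the report: new queue names are registered in the queues map before the tree update runs, and the map is not rolled back when that update fails.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class StaleQueueSketch {
    /** Hypothetical stand-in for a scheduler queue; not Hadoop's CSQueue. */
    static final class Queue {
        final String name;
        final boolean leaf;
        Queue parent;
        final List<Queue> children = new ArrayList<>();

        Queue(String name, Queue parent, boolean leaf) {
            this.name = name;
            this.parent = parent;
            this.leaf = leaf;
            if (parent != null) {
                parent.children.add(this);
            }
        }

        /** Returns the direct child with the given name, or null. */
        Queue child(String childName) {
            for (Queue c : children) {
                if (c.name.equals(childName)) {
                    return c;
                }
            }
            return null;
        }
    }

    final Queue root = new Queue("root", null, false);
    final Map<String, Queue> queues = new HashMap<>();

    StaleQueueSketch() {
        // Initial state from the report: root with a single leaf queue A.
        queues.put("root", root);
        queues.put("A", new Queue("A", root, true));
    }

    /** Collects every queue of a parsed tree by name. */
    static void collect(Queue q, Map<String, Queue> out) {
        out.put(q.name, q);
        for (Queue c : q.children) {
            collect(c, out);
        }
    }

    /** Merges a freshly parsed tree into the live one; rejects leaf/parent changes. */
    static void merge(Queue live, Queue fresh) {
        if (live.leaf != fresh.leaf) {
            throw new IllegalStateException("queue type changed: " + live.name);
        }
        for (Queue f : fresh.children) {
            Queue l = live.child(f.name);
            if (l == null) {
                live.children.add(f); // new subtree is attached as-is
                f.parent = live;
            } else {
                merge(l, f);
            }
        }
    }

    /** Same order as in the report: validate, register new names, then update the tree. */
    void reinitialize(Queue newRoot) {
        Map<String, Queue> parsed = new HashMap<>();
        collect(newRoot, parsed);
        for (String name : queues.keySet()) {
            if (!parsed.containsKey(name)) {
                throw new IllegalStateException("queue removed: " + name);
            }
        }
        for (Map.Entry<String, Queue> e : parsed.entrySet()) {
            queues.putIfAbsent(e.getKey(), e.getValue()); // existing entries are kept
        }
        merge(root, newRoot); // may throw after the map was already changed
    }

    static StaleQueueSketch demo() {
        StaleQueueSketch s = new StaleQueueSketch();
        // Step 1: A becomes a parent with child X; the tree update fails.
        Queue r1 = new Queue("root", null, false);
        Queue a1 = new Queue("A", r1, false);
        new Queue("X", a1, true);
        try {
            s.reinitialize(r1);
        } catch (IllegalStateException expected) {
            // X stays registered although the live tree was not updated.
        }
        // Step 2: A is a leaf again and X sits under the new parent B.
        Queue r2 = new Queue("root", null, false);
        new Queue("A", r2, true);
        Queue b2 = new Queue("B", r2, false);
        new Queue("X", b2, true);
        s.reinitialize(r2);
        return s;
    }

    public static void main(String[] args) {
        StaleQueueSketch s = demo();
        Queue inTree = s.root.child("B").child("X");
        Queue inMap = s.queues.get("X");
        System.out.println("same object: " + (inMap == inTree));
        System.out.println("parent of X in map: " + inMap.parent.name);
        System.out.println("parent of X in tree: " + inTree.parent.name);
    }
}
```

In this model the map entry for X is the object created in step 1, whose parent is named A, while the X reachable from root sits under B. One possible remedy, again only as an assumption about the design, would be to register new names in the map only after the tree update has succeeded.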