From: "Carlo Curino (JIRA)"
To: yarn-issues@hadoop.apache.org
Date: Tue, 18 Oct 2016 23:36:59 +0000 (UTC)
Subject: [jira] [Commented] (YARN-5734) OrgQueue for easy CapacityScheduler queue configuration management

    [ https://issues.apache.org/jira/browse/YARN-5734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15587034#comment-15587034 ]

Carlo Curino commented on YARN-5734:
------------------------------------

[~jhung] What I was saying is a bit different, but what you mention makes sense.
What I was pointing out is that we already had a solution for tweaking some of the key parameters (for {{ReservationQueue}}) in a very cheap, dynamic way. As part of YARN-4193, our prototype included support for node labels, and we did further scalability work (lock tweaks in the CapacityScheduler) so that it could sustain many changes per second (300 queues, with many node labels, updated every second). The insight was to make "surgical", local changes to specific parameters instead of large, lock-heavy operations like refreshQueues. That said, I agree that some of the work you are doing could be used (if it is cheap enough) to enforce the {{Plan}} and to generalize what reservations can "set" in the queues.

Finally, during our conversation with [~mshen] I pointed out that the {{ReservationSystem}} can be used to provide a time-varying notion of queues (think of a daily sine curve for the queue capacity), which in turn could be used to "multiply" the sellable capacity of the cluster. For example, we could promise highly guaranteed access to the "dev" queue during the day and exclusive access to the "reporting" queue at night (note that this provides much stronger guarantees than over-capacity fair sharing). Integrating this with what you have built would be neat.

> OrgQueue for easy CapacityScheduler queue configuration management
> ------------------------------------------------------------------
>
>                 Key: YARN-5734
>                 URL: https://issues.apache.org/jira/browse/YARN-5734
>             Project: Hadoop YARN
>          Issue Type: New Feature
>            Reporter: Min Shen
>            Assignee: Min Shen
>         Attachments: OrgQueue_Design_v0.pdf
>
>
> The current XML-based configuration mechanism in the CapacityScheduler makes it very inconvenient to apply any changes to the queue configurations. We saw two main drawbacks in the file-based configuration mechanism:
> # It is very inconvenient to automate queue configuration updates.
> For example, in our cluster setup we leverage the queue mapping feature from YARN-2411 to route users to their dedicated organization queues. It would be extremely cumbersome to keep updating the config file to manage the very dynamic mapping between users and organizations.
> # Even if a user has admin permission on one specific queue, that user is unable to make any queue configuration changes, such as resizing the subqueues, changing queue ACLs, or creating new queues. All of these operations need to be performed centrally by the cluster administrators.
>
> Given these limitations, we realized the need for a more flexible configuration mechanism that allows queue configurations to be stored and managed more dynamically. We developed this feature internally at LinkedIn; it introduces the concept of a MutableConfigurationProvider. Essentially, it provides a set of configuration mutation APIs that allow queue configurations to be updated externally through a set of REST APIs. When queue configuration changes are performed, the queue ACLs are honored, which means only queue administrators can make configuration changes to a given queue. MutableConfigurationProvider is implemented as a pluggable interface, and we have one implementation of this interface, based on the Derby embedded database.
>
> This feature has been deployed on LinkedIn's Hadoop cluster for a year now, and has gone through several iterations of gathering feedback from users and improving accordingly. With this feature, cluster administrators are able to automate many queue configuration management tasks, such as setting queue capacities to adjust cluster resources between queues based on established resource consumption patterns, or updating the user-to-queue mappings. We have attached our design document to this ticket and would like to receive feedback from the community on how best to integrate it with the latest version of YARN.
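To make the ACL rule in the description concrete, here is a minimal in-memory sketch of a store that accepts per-queue configuration mutations and honors queue administrators. The class QueueConfigStore, its methods, and the rule that an admin of a queue may also change its subqueues are illustrative assumptions; this is not the MutableConfigurationProvider interface or any existing YARN API. Each update touches only one queue's entry, in the spirit of the "surgical" per-queue changes mentioned in the comment, rather than reloading the whole queue configuration as refreshQueues does.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory sketch; these names are not YARN APIs.
final class QueueConfigStore {
    // Queue path (e.g. "root.eng") -> that queue's key/value settings.
    private final Map<String, Map<String, String>> configs = new ConcurrentHashMap<>();
    // Queue path -> users allowed to administer that queue and its subqueues.
    private final Map<String, Set<String>> admins = new ConcurrentHashMap<>();

    /** Bootstrap a queue without an ACL check (e.g. loaded by the cluster admin). */
    void addQueue(String path, Set<String> queueAdmins) {
        admins.put(path, Set.copyOf(queueAdmins));
        configs.put(path, new ConcurrentHashMap<>());
    }

    /** True if the user administers this queue or any of its ancestors. */
    boolean isAdmin(String user, String path) {
        String p = path;
        while (true) {
            if (admins.getOrDefault(p, Set.of()).contains(user)) {
                return true;
            }
            int dot = p.lastIndexOf('.');
            if (dot < 0) {
                return false;
            }
            p = p.substring(0, dot); // walk up to the parent queue
        }
    }

    /** Change one key of one queue after an ACL check; other queues are untouched. */
    void update(String user, String path, String key, String value) {
        Map<String, String> conf = configs.get(path);
        if (conf == null) {
            throw new IllegalArgumentException("unknown queue: " + path);
        }
        if (!isAdmin(user, path)) {
            throw new SecurityException(user + " may not administer " + path);
        }
        conf.put(key, value);
    }

    /** Create a child queue; only admins of the parent (or above) may do this. */
    void createSubqueue(String user, String parent, String name, Set<String> queueAdmins) {
        if (!configs.containsKey(parent)) {
            throw new IllegalArgumentException("unknown queue: " + parent);
        }
        if (!isAdmin(user, parent)) {
            throw new SecurityException(user + " may not administer " + parent);
        }
        String path = parent + "." + name;
        if (configs.putIfAbsent(path, new ConcurrentHashMap<>()) != null) {
            throw new IllegalStateException("queue already exists: " + path);
        }
        admins.put(path, Set.copyOf(queueAdmins));
    }

    /** Returns the value, or null if the queue or key is unknown. */
    String get(String path, String key) {
        Map<String, String> conf = configs.get(path);
        return conf == null ? null : conf.get(key);
    }

    public static void main(String[] args) {
        QueueConfigStore store = new QueueConfigStore();
        store.addQueue("root", Set.of("yarnadmin"));
        store.addQueue("root.eng", Set.of("alice"));
        store.update("alice", "root.eng", "capacity", "60");
        store.createSubqueue("alice", "root.eng", "search", Set.of("carol"));
        System.out.println("root.eng capacity = " + store.get("root.eng", "capacity"));
    }
}
```

A real provider would also need to persist each change (the description mentions a Derby-backed implementation), validate values such as sibling capacities summing to 100%, and expose the mutations over REST; all of that is left out of this sketch.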
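The time-varying queue idea from Carlo's comment (a daily sine curve for queue capacity, with "dev" favored by day and "reporting" by night) can be sketched as a capacity function of the hour of day. This is a hypothetical illustration: the class name, the 14:00 peak, and the cosine shape are assumptions, and it does not use the {{ReservationSystem}} or {{Plan}} APIs. "dev" gets 100% at the peak hour and 0% twelve hours later, while "reporting" always receives the remainder, so it has exclusive access at the trough.

```java
// Hypothetical sketch of a daily, sine-shaped capacity split; not a YARN API.
final class DailyCapacitySchedule {
    private final double peakHour; // hour of day (0-24) at which "dev" gets 100%

    DailyCapacitySchedule(double peakHour) {
        this.peakHour = peakHour;
    }

    /** Percent of the cluster guaranteed to "dev": 100 at peakHour, 0 twelve hours later. */
    double devCapacity(double hourOfDay) {
        double phase = 2.0 * Math.PI * (hourOfDay - peakHour) / 24.0;
        return 50.0 + 50.0 * Math.cos(phase);
    }

    /** "reporting" gets whatever "dev" does not, so the two always sum to 100. */
    double reportingCapacity(double hourOfDay) {
        return 100.0 - devCapacity(hourOfDay);
    }

    public static void main(String[] args) {
        DailyCapacitySchedule schedule = new DailyCapacitySchedule(14.0);
        for (int hour = 0; hour < 24; hour += 3) {
            System.out.printf("%02d:00  dev=%5.1f%%  reporting=%5.1f%%%n",
                    hour, schedule.devCapacity(hour), schedule.reportingCapacity(hour));
        }
    }
}
```

Something (for example, a periodic job or the reservation plan) would still have to push these values into the scheduler; the sketch only shows that the two guarantees stay complementary at every hour.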
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)