hadoop-yarn-issues mailing list archives

From "Subramaniam Venkatraman Krishnan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-1707) Making the CapacityScheduler more dynamic
Date Tue, 29 Jul 2014 01:17:39 GMT

    [ https://issues.apache.org/jira/browse/YARN-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14077242#comment-14077242 ]

Subramaniam Venkatraman Krishnan commented on YARN-1707:

[~wangda] Thanks for the very detailed comments. I agree that understanding the context is
essential, and I am glad to help with that. Overall your understanding is spot on; please find
answers to your questions below:

1) Yes, it is possible to have multiple PlanQueues (e.g., if two organizations want to dynamically
allocate their resources but not share them with each other). This is also a good way to "try"
reservations on a small scale and ramp up slowly at each organization's pace.
2) The extra confs are needed to automate the initialization of key parameters of the dynamic
ReservationQueues (without requiring a full specification of each of them).
3) Correct
4) Correct
5) First, the Plan guarantees that the sum of reservations never exceeds the available resources
(replanning if needed to maintain this invariant in the face of failures). On the other hand, as
in the normal scheduler, we can leverage "overcapacity" to ensure high cluster utilization.
More precisely, depending on the configuration (or dynamically, on whether reservations have
gang semantics or not), we can allow the resources allocated to PlanQueues and ReservationQueues
to exceed their guaranteed capacity (i.e., set the dynamic max-capacity above the guaranteed
one). In this case preemption might kick in if other apps with more rights on the resources have
pending asks. Part of the changes in YARN-1957 were driven by this.
6) To limit the scope of the changes, we agreed to address HA in a follow-up JIRA. Our intuition
is that it is sufficient to persist the Plan alone. During recovery, the _Plan Follower_
will resync the Plan with the scheduler by creating the dynamic queues for the currently active
reservations. We will be happy to have your input when we work on the HA JIRA.

[~curino] will answer your questions specific to this JIRA.

> Making the CapacityScheduler more dynamic
> -----------------------------------------
>                 Key: YARN-1707
>                 URL: https://issues.apache.org/jira/browse/YARN-1707
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: capacityscheduler
>            Reporter: Carlo Curino
>            Assignee: Carlo Curino
>              Labels: capacity-scheduler
>         Attachments: YARN-1707.patch
> The CapacityScheduler is rather static at the moment, and refreshqueue provides a rather
> heavy-handed way to reconfigure it. To move towards long-running services (tracked in YARN-896)
> and to enable more advanced admission control and resource parcelling, we need to make the
> CapacityScheduler more dynamic. This is instrumental to the umbrella JIRA YARN-1051.
> Concretely, this requires the following changes:
> * create queues dynamically
> * destroy queues dynamically
> * dynamically change queue parameters (e.g., capacity) 
> * modify refreshqueue validation to enforce sum(child.getCapacity()) <= 100% instead of == 100%
> We limit this to LeafQueues. 
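
The relaxed refreshqueue validation can be sketched as follows, assuming child capacities are expressed as percentages of the parent queue. The class name QueueCapacityValidation, its methods and the EPSILON rounding tolerance are hypothetical and are not the actual CapacityScheduler code. The only point illustrated is the switch from an equality check to an upper bound, which leaves headroom that dynamically created queues can later occupy.

```java
import java.util.List;

// Hypothetical sketch, not CapacityScheduler code: compares the old
// "children sum to exactly 100%" rule with the relaxed "at most 100%" rule.
final class QueueCapacityValidation {

    // Assumed tolerance for floating-point rounding of percentages.
    private static final float EPSILON = 1e-4f;

    // Old rule: child capacities must add up to exactly 100%.
    static boolean strictlyValid(List<Float> childCapacities) {
        return Math.abs(sum(childCapacities) - 100f) <= EPSILON;
    }

    // Relaxed rule: child capacities may add up to less than 100%, leaving
    // room for queues created dynamically later; negatives are rejected.
    static boolean relaxedValid(List<Float> childCapacities) {
        for (float c : childCapacities) {
            if (c < 0f) {
                return false;
            }
        }
        return sum(childCapacities) <= 100f + EPSILON;
    }

    private static float sum(List<Float> capacities) {
        float total = 0f;
        for (float c : capacities) {
            total += c;
        }
        return total;
    }
}
```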

This message was sent by Atlassian JIRA
