hadoop-yarn-issues mailing list archives

From "Carlo Curino (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-1707) Making the CapacityScheduler more dynamic
Date Tue, 29 Jul 2014 01:35:42 GMT

    [ https://issues.apache.org/jira/browse/YARN-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14077256#comment-14077256 ]

Carlo Curino commented on YARN-1707:

Thanks again for the fast and insightful feedback. 

*Regarding how the patch matches the JIRA:*
Our initial implementation did indeed make the changes (i.e., the dynamic behaviors) in
ParentQueue and LeafQueue themselves. Previous feedback pushed us to introduce subclasses, so as
to isolate the changes in dynamic subclasses. I think we can go back to the version that directly
modifies ParentQueue and LeafQueue if there is consensus. #4 is required because we cannot
transactionally “add Q1, resize Q2” so that the invariant “size of children is == 100%”
is maintained. As a consequence we must relax the constraints (either in ParentQueue if we
remove the hierarchy, or as it is today in PlanQueue).  The good news is that the percentages
from the configuration are not interpreted as actual percentages, but rather used as relative
"weights" (ranking queues by used_resources / guaranteed_resources). This means that even
a careless admin will not leave resources unused.  For example, if we set two queues to 10 and 40
(i.e., values that do not add up to 100), the behavior is equivalent to setting them to
20 and 80 (as they are used only for relative ranking of siblings). I think this is also ok for
hierarchies (worth double checking this part).
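
To make the relative-weight argument concrete, here is a minimal sketch. It is not code from the patch: the class {{QueueWeights}} and its methods are hypothetical names, and it assumes siblings are ranked purely by used / guaranteed. Rescaling every sibling's configured value by the same factor leaves that ranking unchanged, which is why 10,40 behaves like 20,80.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper for illustration only; not part of the YARN-1707 patch. */
final class QueueWeights {

    /** Rescales the configured sibling capacities so that they sum to 100. */
    static Map<String, Double> normalize(Map<String, Double> configured) {
        double total = 0.0;
        for (double capacity : configured.values()) {
            if (capacity < 0.0) {
                throw new IllegalArgumentException("negative capacity: " + capacity);
            }
            total += capacity;
        }
        if (total <= 0.0) {
            throw new IllegalArgumentException("capacities must sum to a positive value");
        }
        Map<String, Double> shares = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : configured.entrySet()) {
            shares.put(e.getKey(), 100.0 * e.getValue() / total);
        }
        return shares;
    }

    /**
     * Compares two sibling queues by used / guaranteed (assumed ranking key).
     * A negative result means queue A is the more underserved of the two.
     */
    static int compareUsage(double usedA, double weightA, double usedB, double weightB) {
        return Double.compare(usedA / weightA, usedB / weightB);
    }

    public static void main(String[] args) {
        Map<String, Double> configured = new LinkedHashMap<>();
        configured.put("q1", 10.0);
        configured.put("q2", 40.0);
        // The normalized shares are what the raw values are equivalent to.
        System.out.println(normalize(configured));
        // The ranking is the same with the raw and the normalized weights.
        System.out.println(compareUsage(5, 10, 10, 40) == compareUsage(5, 20, 10, 80));
    }
}
```

Because only the ratio between siblings matters in this sketch, no share of the parent is left unassigned even when the configured values do not sum to 100.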

So all in all we can pull up to {{ParentQueue}} and {{LeafQueue}} all the dynamic behavior
if there is consensus that this is the right path.

*Regarding move:*
1) Good catch... We will wait for feedback from Jian on this.
2) I think we had that at some point and it did not work correctly. We will try again.
3) There are a few invariants we do not check. {{MaxApplicationsPerUser}} is one of them, but
so is how many applications can be active in the target queue, etc. As I mentioned in
my previous comment, this is likely fine for the limited usage we will make of this from {{ReservationSystem}},
but it is worth expanding the checks we make (see {{FairScheduler.verifyMoveDoesNotViolateConstraints(..)}})
before exposing move to users via the CLI.
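
A rough sketch of the kind of pre-move check meant in 3). This is an illustration under assumptions, not the patch and not the FairScheduler code: {{MoveChecks}}, {{TargetQueue}} and {{verifyMove}} are hypothetical names, and only the two limits mentioned above are modelled.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration of pre-move invariant checks; not scheduler code. */
final class MoveChecks {

    /** Minimal stand-in for the state of the queue an application is moved into. */
    static final class TargetQueue {
        final int maxActiveApps;
        final int maxAppsPerUser;
        int activeApps;
        final Map<String, Integer> appsPerUser = new HashMap<>();

        TargetQueue(int maxActiveApps, int maxAppsPerUser) {
            this.maxActiveApps = maxActiveApps;
            this.maxAppsPerUser = maxAppsPerUser;
        }

        /** Records one more active application for the given user. */
        void admit(String user) {
            activeApps++;
            appsPerUser.merge(user, 1, Integer::sum);
        }
    }

    /** Throws if moving one application of this user would exceed a limit. */
    static void verifyMove(String user, TargetQueue target) {
        if (target.activeApps >= target.maxActiveApps) {
            throw new IllegalStateException("target queue is at its active-application limit");
        }
        int userApps = target.appsPerUser.getOrDefault(user, 0);
        if (userApps >= target.maxAppsPerUser) {
            throw new IllegalStateException("user " + user + " is at the per-user application limit");
        }
    }

    public static void main(String[] args) {
        TargetQueue target = new TargetQueue(2, 1);
        target.admit("alice");
        verifyMove("bob", target); // passes: room in the queue and for bob
        try {
            verifyMove("alice", target); // rejected: alice already has one application
        } catch (IllegalStateException e) {
            System.out.println("move rejected: " + e.getMessage());
        }
    }
}
```

Running the checks before the move, rather than after, keeps a rejected move from leaving the application half-transferred between queues.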

> Making the CapacityScheduler more dynamic
> -----------------------------------------
>                 Key: YARN-1707
>                 URL: https://issues.apache.org/jira/browse/YARN-1707
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: capacityscheduler
>            Reporter: Carlo Curino
>            Assignee: Carlo Curino
>              Labels: capacity-scheduler
>         Attachments: YARN-1707.patch
> The CapacityScheduler is rather static at the moment, and refreshqueue provides a rather
heavy-handed way to reconfigure it. To move towards long-running services (tracked in YARN-896)
and to enable more advanced admission control and resource parcelling, we need to make the
CapacityScheduler more dynamic. This is instrumental to the umbrella JIRA YARN-1051.
> Concretely this requires the following changes:
> * create queues dynamically
> * destroy queues dynamically
> * dynamically change queue parameters (e.g., capacity) 
> * modify refreshqueue validation to enforce sum(child.getCapacity())<= 100% instead
of ==100%
> We limit this to LeafQueues. 

This message was sent by Atlassian JIRA
