hadoop-yarn-issues mailing list archives

From "Carlo Curino (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-4003) ReservationQueue inherit getAMResourceLimit() from LeafQueue, but behavior is not consistent
Date Mon, 21 Dec 2015 23:44:46 GMT

    [ https://issues.apache.org/jira/browse/YARN-4003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15067269#comment-15067269 ]

Carlo Curino commented on YARN-4003:
------------------------------------

[~sunilg], I think what you propose is possible, but the semantics would be a bit unpleasant.
Assume a reservation queue R1 launches tons of AMs, and now another reservation queue R2 is
stuck not being able to run any job. I wouldn't want that... I would rather have a reservation
burning its entire capacity in AMs, but allow other reservation queues to launch their jobs.
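
The R1/R2 scenario can be sketched in a few lines. This is only an illustration under stated assumptions: it assumes the alternative being discussed pools the AM limit across reservation queues, and the class `AmLimitSemantics`, its methods, and all the megabyte figures are hypothetical, not taken from the patch or from the YARN code base.

```java
/** Hypothetical illustration of two AM-limit semantics; not YARN code. */
final class AmLimitSemantics {

    /** Pooled limit (assumed semantics): all reservation queues draw AMs from one shared budget. */
    static boolean admitPooled(long sharedLimitMb, long usedByAllQueuesMb, long amSizeMb) {
        return usedByAllQueuesMb + amSizeMb <= sharedLimitMb;
    }

    /** Per-queue limit: a reservation may spend up to its own capacity on AMs. */
    static boolean admitPerQueue(long queueCapacityMb, long usedByThisQueueMb, long amSizeMb) {
        return usedByThisQueueMb + amSizeMb <= queueCapacityMb;
    }

    public static void main(String[] args) {
        long amSizeMb = 1024;

        // R1 has already consumed the whole shared AM budget (made-up numbers).
        boolean r2Pooled = admitPooled(8192, 8192, amSizeMb);

        // With a per-queue limit, R2 only competes against its own usage.
        boolean r2PerQueue = admitPerQueue(4096, 0, amSizeMb);

        System.out.println("R2 can start an AM (pooled limit):    " + r2Pooled);   // false
        System.out.println("R2 can start an AM (per-queue limit): " + r2PerQueue); // true
    }
}
```

Under the pooled check R2 is blocked by R1's AMs, while under the per-queue check R1 can only burn its own capacity.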


I think the cleaner solution (but definitely longer term) would be to treat the RM scheduling
bandwidth as a separate (reservable) resource. So a queue (and similarly a reservation) can
be configured to allow up to a certain number of AMs (which in turn bounds how much RM scheduling
bandwidth I am devoting to this queue). This would also make a lot of sense for the federation
effort: YARN-2915 (where we need to partition jobs across sub-clusters to protect the RMs
from excessive AM-RM traffic due to the scale-out nature of federation). 
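
A minimal sketch of what such a bound could look like, assuming the "scheduling bandwidth" of a queue is approximated by a cap on its concurrently running AMs. The class `AmBudget` and its methods are hypothetical names used only for illustration; nothing here reflects an existing YARN API or the attached patch.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration: a per-queue cap on running AMs, independent of queue capacity. */
final class AmBudget {
    private final Map<String, Integer> maxAms = new HashMap<>();
    private final Map<String, Integer> runningAms = new HashMap<>();

    /** Configure how many AMs the named queue (or reservation) may run at once. */
    void setLimit(String queue, int limit) {
        if (limit < 0) {
            throw new IllegalArgumentException("limit must be >= 0");
        }
        maxAms.put(queue, limit);
    }

    /** Admit a new AM if the queue is under its cap; unconfigured queues admit nothing. */
    boolean tryStartAm(String queue) {
        int limit = maxAms.getOrDefault(queue, 0);
        int running = runningAms.getOrDefault(queue, 0);
        if (running >= limit) {
            return false;
        }
        runningAms.put(queue, running + 1);
        return true;
    }

    /** Release a slot when an AM finishes. */
    void finishAm(String queue) {
        int running = runningAms.getOrDefault(queue, 0);
        if (running > 0) {
            runningAms.put(queue, running - 1);
        }
    }

    public static void main(String[] args) {
        AmBudget budget = new AmBudget();
        budget.setLimit("R1", 2);
        budget.setLimit("R2", 1);

        System.out.println(budget.tryStartAm("R1")); // true
        System.out.println(budget.tryStartAm("R1")); // true
        System.out.println(budget.tryStartAm("R1")); // false: R1 is at its cap
        System.out.println(budget.tryStartAm("R2")); // true: R2 is unaffected by R1
    }
}
```

Because the cap is a fixed count rather than a percentage of capacity, it would not move when a reservation's capacity changes.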

What do folks generally think about explicitly capturing the cost of scheduler bandwidth
(e.g., a service that launches 10 tasks and never asks for anything again is much less work
for the RM than an MR job running many short-lived tasks)?
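
To make the comparison concrete, here is a rough back-of-the-envelope sketch. It assumes scheduler cost is proportional to the number of container requests the RM has to serve, which is a simplification; the class `SchedulerLoad`, the 30-second task duration, and the one-hour window are all hypothetical.

```java
/** Hypothetical cost model: scheduler work counted as container requests served by the RM. */
final class SchedulerLoad {

    /** Requests per minute needed to keep `concurrentTasks` slots busy with tasks of `taskSeconds`. */
    static long steadyRequestsPerMinute(int concurrentTasks, int taskSeconds) {
        if (taskSeconds <= 0) {
            throw new IllegalArgumentException("taskSeconds must be > 0");
        }
        return (long) concurrentTasks * 60 / taskSeconds;
    }

    /** Total requests over a window: the initial launch plus steady-state churn. */
    static long totalRequests(int initialTasks, long requestsPerMinute, long minutes) {
        return initialTasks + requestsPerMinute * minutes;
    }

    public static void main(String[] args) {
        // Long-running service: 10 containers requested once, nothing afterwards.
        long service = totalRequests(10, 0, 60);

        // MR-style job: 10 slots kept busy with 30-second tasks for one hour.
        long perMinute = steadyRequestsPerMinute(10, 30);
        long mapReduce = totalRequests(10, perMinute, 60);

        System.out.println("service requests in 1h: " + service);   // 10
        System.out.println("MR job requests in 1h:  " + mapReduce); // 1210
    }
}
```

Both workloads hold the same 10 containers at any moment, yet under this model the short-task job costs the RM two orders of magnitude more requests.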

> ReservationQueue inherit getAMResourceLimit() from LeafQueue, but behavior is not consistent
> --------------------------------------------------------------------------------------------
>
>                 Key: YARN-4003
>                 URL: https://issues.apache.org/jira/browse/YARN-4003
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>            Reporter: Carlo Curino
>            Assignee: Carlo Curino
>         Attachments: YARN-4003.patch
>
>
> The inherited behavior from LeafQueue (limit AM % based on capacity) is not a good fit
> for ReservationQueue (which has highly dynamic capacity).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
