hadoop-yarn-issues mailing list archives

From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-1769) CapacityScheduler: Improve reservations
Date Fri, 04 Apr 2014 22:57:18 GMT

    [ https://issues.apache.org/jira/browse/YARN-1769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13960541#comment-13960541 ]

Jason Lowe commented on YARN-1769:
----------------------------------

The patch no longer applies cleanly after YARN-1512.  Other comments on the patch:

- Nit: In LeafQueue.assignToQueue we could cache Resource.add(usedResources, required) in a local when we're computing potentialNewCapacity so we don't have to recompute it as part of the potentialNewWithoutReservedCapacity computation (rough sketch after this list).
- LeafQueue.assignToQueue and LeafQueue.assignToUser don't seem to need the new priority argument,
and therefore LeafQueue.checkLimitsToReserve wouldn't seem to need it either once those others
are updated.
- Should FiCaSchedulerApp.getAppToUnreserve really be called getNodeIdToUnreserve or getNodeToUnreserve, since it returns a node ID rather than an app?
- In LeafQueue.findNodeToUnreserve, isn't it kinda bad if the app thinks it has reservations on the node but the scheduler doesn't know about it?  Wondering if the bookkeeping is messed up at that point; if so, something a bit more than debug seems like the appropriate log level, and further fixup may be needed.
- LeafQueue.findNodeToUnreserve is adjusting the headroom when it unreserves, but I don't see other unreservations doing a similar calculation.  Wondering whether this fixup is something that should have been in completedContainer or needs to be done elsewhere.  I could easily be missing something here, but asking just in case other unreservation situations also need to have the headroom fixed.
- LeafQueue.assignContainer uses the much more expensive scheduler.getConfiguration().getReservationContinueLook()
when it should be able to use the reservationsContinueLooking member instead.
- LeafQueue.getReservationContinueLooking should be package private
- Nit: LeafQueue.assignContainer has some reformatting of the log message after the "// Inform the node" comment; it was clearer to read/maintain before, since each label and its value were on a line by themselves.  Same goes for the "Reserved container" log towards the end of the method.
- Ultra-Nit: ParentQueue.setupQueueConfig's log message should have the reservationsContinueLooking
on the previous line to match the style of other label/value pairs in the log message.
- ParentQueue.getReservationContinueLooking should be package private.
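
For the assignToQueue nit in the first bullet, here's roughly the shape of the caching I have in mind.  This is just an illustrative fragment, not the patch's exact code; in particular, reservedResources is a stand-in for whatever the patch subtracts when computing the without-reserved figure:

{code:java}
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.ResourceCalculator;
import org.apache.hadoop.yarn.util.resource.Resources;

class AssignToQueueCachingSketch {
  // Compute the sum once and reuse it for both capacity figures instead of
  // calling Resources.add(usedResources, required) twice.
  static void computeCapacities(ResourceCalculator resourceCalculator,
      Resource clusterResource, Resource usedResources, Resource required,
      Resource reservedResources) {
    Resource potentialTotal = Resources.add(usedResources, required);

    float potentialNewCapacity = Resources.divide(
        resourceCalculator, clusterResource, potentialTotal, clusterResource);

    // reservedResources is a placeholder for whatever the patch subtracts here.
    float potentialNewWithoutReservedCapacity = Resources.divide(
        resourceCalculator, clusterResource,
        Resources.subtract(potentialTotal, reservedResources), clusterResource);

    // ... feed the two values into the existing limit checks ...
  }
}
{code}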

> CapacityScheduler:  Improve reservations
> ----------------------------------------
>
>                 Key: YARN-1769
>                 URL: https://issues.apache.org/jira/browse/YARN-1769
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: capacityscheduler
>    Affects Versions: 2.3.0
>            Reporter: Thomas Graves
>            Assignee: Thomas Graves
>         Attachments: YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, YARN-1769.patch
>
>
> Currently the CapacityScheduler uses reservations in order to handle requests for large containers and the fact that there might not currently be enough space available on a single host.
> The current algorithm for reservations is to reserve as many containers as are currently required, and then start to reserve more above that after a certain number of re-reservations (currently biased against larger containers).  Any time it hits the limit on the number reserved, it stops looking at any other nodes.  This can result in missing nodes that have enough space to fulfill the request.
> The other place for improvement is that reservations currently count against your queue capacity.  If you have reservations you could hit the various limits, which would then stop you from looking further at that node.
> The above 2 cases can cause an application requesting a larger container to take a long time to get its resources.
> We could improve upon both of those by simply continuing to look at incoming nodes to see if we could potentially swap out a reservation for an actual allocation.
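
As a rough illustration of the "keep looking and swap a reservation for a real allocation" idea described in the issue above, here is a hypothetical sketch; the types and method names below are stand-ins, not the actual CapacityScheduler/FiCaSchedulerApp APIs:

{code:java}
import java.util.List;

class ReservationSwapSketch {
  // Stand-in node abstraction (not a real YARN type).
  interface Node {
    boolean hasAvailable(int memoryMb);          // enough free space right now?
    void allocate(String appId, int memoryMb);   // hand out a real container
  }

  // Stand-in application abstraction (not a real YARN type).
  interface App {
    String id();
    int requestedMemoryMb();
    List<Node> reservedNodes();                  // nodes where we only hold a reservation
    void unreserve(Node node);                   // give back a reservation
  }

  /**
   * Instead of stopping once the reservation limit is hit, keep examining
   * incoming node updates: if this node can satisfy the request outright,
   * drop one of the existing reservations and allocate here instead.
   */
  static boolean tryAllocateOrSwap(App app, Node incoming) {
    int needed = app.requestedMemoryMb();
    if (!incoming.hasAvailable(needed)) {
      return false;                              // nothing to do on this node
    }
    if (!app.reservedNodes().isEmpty()) {
      app.unreserve(app.reservedNodes().get(0)); // swap out a reservation...
    }
    incoming.allocate(app.id(), needed);         // ...for an actual allocation
    return true;
  }
}
{code}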



--
This message was sent by Atlassian JIRA
(v6.2#6252)
