hadoop-yarn-issues mailing list archives

From "Thomas Graves (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-1769) CapacityScheduler: Improve reservations
Date Tue, 04 Mar 2014 15:15:24 GMT

    [ https://issues.apache.org/jira/browse/YARN-1769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13919506#comment-13919506 ]

Thomas Graves commented on YARN-1769:

If canAllocContainer is false, then you can't reserve another container.  This can happen
if you hit the reservation limits with no containers to unreserve and this node doesn't
have space available.

      // Without the feature, always reserve as was done previously.
      if ((!scheduler.getConfiguration().getReservationContinueLook())
          // If we hit our reservation limit and there is no available space on this node,
          // don't reserve another one.
          || (canAllocContainer)
          // If this was called because the node already had a reservation, we need to
          // make sure it gets recorded as a re-reservation.
          || (rmContainer != null)) {

I can simplify this a bit.  I don't really need the !scheduler.getConfiguration().getReservationContinueLook()
check anymore, since canAllocContainer defaults to true in that case.
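
To illustrate the simplification, here is a minimal sketch, not the actual patch. It replaces the scheduler objects with plain booleans: continueLookEnabled stands in for getReservationContinueLook() and hasReservation stands in for rmContainer != null. The class and method names are hypothetical. It relies on the assumption stated above that canAllocContainer stays true whenever the feature is off.

```java
final class ReservationCheck {
    // The condition as quoted above, with the config lookup passed in as a boolean.
    static boolean shouldReserveOriginal(boolean continueLookEnabled,
                                         boolean canAllocContainer,
                                         boolean hasReservation) {
        return !continueLookEnabled || canAllocContainer || hasReservation;
    }

    // The proposed simplification: the feature-flag check is dropped.
    static boolean shouldReserveSimplified(boolean canAllocContainer,
                                           boolean hasReservation) {
        return canAllocContainer || hasReservation;
    }

    // Compares both forms on every input that can occur under the assumption
    // that canAllocContainer is always true when the feature is disabled.
    static boolean conditionsAgree() {
        boolean[] flags = {false, true};
        for (boolean feature : flags) {
            for (boolean canAlloc : flags) {
                for (boolean hasRes : flags) {
                    if (!feature && !canAlloc) {
                        continue; // unreachable under the stated assumption
                    }
                    if (shouldReserveOriginal(feature, canAlloc, hasRes)
                            != shouldReserveSimplified(canAlloc, hasRes)) {
                        return false;
                    }
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("conditions agree: " + conditionsAgree());
    }
}
```

The two forms only differ when the feature is off and canAllocContainer is false, which is exactly the combination the default rules out.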

> CapacityScheduler:  Improve reservations
> ----------------------------------------
>                 Key: YARN-1769
>                 URL: https://issues.apache.org/jira/browse/YARN-1769
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: capacityscheduler
>    Affects Versions: 2.3.0
>            Reporter: Thomas Graves
>            Assignee: Thomas Graves
>         Attachments: YARN-1769.patch
> Currently the CapacityScheduler uses reservations in order to handle requests for large containers and the fact that there might not currently be enough space available on a single host.
> The current algorithm for reservations is to reserve as many containers as currently required, and then it will start to reserve more above that after a certain number of re-reservations (currently biased against larger containers).  Any time it hits the limit on the number reserved, it stops looking at any other nodes.  This results in potentially missing nodes that have enough space to fulfill the request.
> The other place for improvement is that reservations currently count against your queue capacity.  If you have reservations, you could hit the various limits, which would then stop you from looking further at that node.
> The above 2 cases can cause an application requesting a larger container to take a long time to get its resources.
> We could improve upon both of those by simply continuing to look at incoming nodes to see if we could potentially swap out a reservation for an actual allocation.
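
As a rough illustration of the proposal quoted above, the following toy model contrasts stopping at the reservation limit with continuing to look at nodes. It is not CapacityScheduler code: the class name, the schedule method, and the representation of nodes as an array of free memory sizes are all assumptions made for the example, and the real scheduler is driven by node heartbeats and queue limits that are ignored here.

```java
final class ReservationSwapSketch {
    /**
     * Visits nodes in order for a single pending container request.
     * Returns the index of the node the container is allocated on,
     * or -1 if the request ends up holding only reservations.
     */
    static int schedule(int[] freeMb, int requestMb,
                        int reservationLimit, boolean continueLooking) {
        int reserved = 0;
        for (int i = 0; i < freeMb.length; i++) {
            boolean atLimit = reserved >= reservationLimit;
            if (atLimit && !continueLooking) {
                return -1; // behavior described in the issue: stop looking at other nodes
            }
            if (freeMb[i] >= requestMb) {
                if (reserved > 0) {
                    reserved--; // swap one reservation for the actual allocation
                }
                return i;
            }
            if (!atLimit) {
                reserved++; // not enough space here, so reserve on this node
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] freeMb = {1024, 2048, 8192};
        System.out.println(schedule(freeMb, 4096, 2, false)); // prints -1
        System.out.println(schedule(freeMb, 4096, 2, true));  // prints 2
    }
}
```

With a limit of two reservations, the first mode never reaches the third node even though it has room for the 4096 MB request, while the second mode keeps looking and allocates there.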
