hadoop-yarn-issues mailing list archives

From "Sandy Ryza (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-568) FairScheduler: support for work-preserving preemption
Date Sat, 04 May 2013 01:56:16 GMT

    [ https://issues.apache.org/jira/browse/YARN-568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13648982#comment-13648982 ]

Sandy Ryza commented on YARN-568:
---------------------------------

Hi Carlo,

I just looked over your patch.  It's looking good.  Glad that this is getting added to the
fair scheduler.

Here's my understanding of how fair scheduler preemption works with and without the patch,
so that we have some common ground to stand on when talking about it: every preemptionInterval
seconds, the fair scheduler checks to see whether anyone is starved for their shares, and
then kills containers until the killed containers' resources sum up to the amount that the
apps were behind.  With the patch, instead of killing containers, it marks them (and reports
to the AM that they're marked).  A container that was already marked on an earlier pass is
killed once maxWaitTimeBeforeKill has elapsed.
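
To make that description concrete, here is a minimal Java sketch of one preemption pass
with the patch applied.  It is not taken from the patch: the class and method names, the
string container IDs, and the millisecond timestamps are all assumptions for illustration,
and the notification to the AM is reduced to a comment.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch only; none of these names come from the YARN-568 patch.
public class PreemptionPassSketch {
    // Container id -> time (ms) at which the container was first marked.
    private final Map<String, Long> markedAt = new HashMap<>();
    private final long maxWaitTimeBeforeKill;
    // Containers killed so far, in kill order.
    public final List<String> killed = new ArrayList<>();

    public PreemptionPassSketch(long maxWaitTimeBeforeKillMs) {
        this.maxWaitTimeBeforeKill = maxWaitTimeBeforeKillMs;
    }

    /** One pass: mark unmarked candidates, kill marked ones whose wait has elapsed. */
    public void pass(List<String> candidates, long now) {
        for (String id : candidates) {
            Long since = markedAt.get(id);
            if (since == null) {
                // First sighting: mark it (the real scheduler would also tell the AM).
                markedAt.put(id, now);
            } else if (now - since >= maxWaitTimeBeforeKill) {
                // Marked on an earlier pass and the grace period is over: kill it.
                markedAt.remove(id);
                killed.add(id);
            }
            // Otherwise: marked but still inside the grace period, so leave it running.
        }
    }

    public boolean isMarked(String id) {
        return markedAt.containsKey(id);
    }
}
```

Note that in this form a container is never killed on the pass that first marks it, whatever
the value of maxWaitTimeBeforeKill.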

A couple of comments:
* Nothing ensures that the containers marked on a preemption pass are the first ones to be
considered on a subsequent pass.  This means that new containers could be marked for preemption
instead of killing the ones that should be killed.  Instead, the ones that were marked should
probably be saved and checked first.
* toPreempt is decremented whenever a container is considered, whether it's marked, killed,
or just observed as marked but not ready to be killed. I don't have a clear idea at this moment
of whether this is the correct behavior, but I think it would be good to discuss it and include
some rationale as comments.  Do you have thoughts?
* There should be a way to make killing instant.  I.e. if I set maxWaitTimeBeforeKill to 0,
I should not have to wait for the second pass for containers to be killed.
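
One way the first and third points could look is sketched below in Java.  This is an
assumption-laden illustration, not a proposed patch: the class name, the map of container
id to memory in MB, and the integer toPreempt budget are all hypothetical.  Previously
marked containers are visited before new ones, and a container whose wait has already
elapsed (including the maxWaitTimeBeforeKill = 0 case) is killed on the same pass that
marks it.  toPreempt is decremented for every container considered, which simply mirrors
the behavior described in the second point and leaves that question open.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch only; none of these names come from the YARN-568 patch.
public class MarkedFirstPreemptionSketch {
    // Container id -> time (ms) first marked; insertion order is marking order.
    private final Map<String, Long> markedAt = new LinkedHashMap<>();
    private final long maxWaitTimeBeforeKill;
    // Containers killed so far, in kill order.
    public final List<String> killed = new ArrayList<>();

    public MarkedFirstPreemptionSketch(long maxWaitTimeBeforeKillMs) {
        this.maxWaitTimeBeforeKill = maxWaitTimeBeforeKillMs;
    }

    /**
     * candidates maps container id to its memory in MB (an assumed resource model);
     * toPreempt is the amount of memory the starved apps are behind by.
     */
    public void preempt(Map<String, Integer> candidates, int toPreempt, long now) {
        // Visit containers marked on earlier passes first, then the unmarked ones.
        List<String> ordered = new ArrayList<>();
        for (String id : markedAt.keySet()) {
            if (candidates.containsKey(id)) ordered.add(id);
        }
        for (String id : candidates.keySet()) {
            if (!markedAt.containsKey(id)) ordered.add(id);
        }

        for (String id : ordered) {
            if (toPreempt <= 0) break;
            Long since = markedAt.get(id);
            if (since == null) {
                since = now;
                markedAt.put(id, now); // newly marked; the AM would be notified here
            }
            if (now - since >= maxWaitTimeBeforeKill) {
                // Wait elapsed; with a wait of 0 this fires on the marking pass itself.
                markedAt.remove(id);
                killed.add(id);
            }
            // Decremented whether the container was marked, killed, or still waiting.
            toPreempt -= candidates.get(id);
        }
    }

    public boolean isMarked(String id) {
        return markedAt.containsKey(id);
    }
}
```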
                
> FairScheduler: support for work-preserving preemption 
> ------------------------------------------------------
>
>                 Key: YARN-568
>                 URL: https://issues.apache.org/jira/browse/YARN-568
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: scheduler
>            Reporter: Carlo Curino
>            Assignee: Carlo Curino
>         Attachments: YARN-568.patch, YARN-568.patch
>
>
> In the attached patch, we modified the FairScheduler to substitute its preemption-by-killing
with a work-preserving version of preemption (followed by killing if the AMs do not respond
quickly enough). This should allow us to run preemption checking more often, but kill less often
(proper tuning to be investigated).  Depends on YARN-567 and YARN-45, is related to YARN-569.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
