hadoop-yarn-issues mailing list archives

From "Sunil G (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-4390) Do surgical preemption based on reserved container in CapacityScheduler
Date Mon, 25 Apr 2016 16:23:12 GMT

    [ https://issues.apache.org/jira/browse/YARN-4390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15256541#comment-15256541 ]

Sunil G commented on YARN-4390:
-------------------------------

Hi [~leftnoteasy]

Thanks for sharing the patch here. A few comments/doubts from my side.

1. In {{CapacitySchedulerPreemptionUtils#deductPreemptableResourcesBasedSelectedCandidates}},
the partition is retrieved from {{SchedulerNode}}. I think we can get this from {{RMContainer#getNodeLabelExpression}}
itself, considering we will update the RMContainer label expression (via a set call) when we invoke
{{replaceLabelsOnNode}}.
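Roughly what I have in mind (just a sketch, assuming the label expression on the container is kept current on {{replaceLabelsOnNode}}):
{code}
// Sketch: read the partition from the container itself instead of the node.
// Assumes RMContainer#getNodeLabelExpression is updated whenever labels on
// the node are replaced.
String partition = rmContainer.getNodeLabelExpression();
{code}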
2. {{context.getQueueByPartition}} has a code path that can return NULL, even though that is not
expected to happen. With this assumption, the null check is not handled on the caller side. So would
it be better to raise an exception and skip that round of preemption, to add more clarity?
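Something along these lines (sketch only; the signature of {{getQueueByPartition}} and the variable names are approximations):
{code}
TempQueuePerPartition tq = context.getQueueByPartition(queueName, partition);
if (tq == null) {
  // Fail fast instead of propagating a null that callers do not check;
  // the policy can catch this and skip the current preemption round.
  throw new YarnRuntimeException("Could not find TempQueuePerPartition for"
      + " queue=" + queueName + ", partition=" + partition);
}
{code}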
3. From {{FifoCandidatesSelector}},
{code}
CapacitySchedulerPreemptionUtils
        .deductPreemptableResourcesBasedSelectedCandidates(
{code}
Ideally this code snippet deducts the resources of the already-selected containers from the corresponding
{{TempQueuePerPartition}}. On that note, I think this code does not have to live inside one such
selector. Rather, it may be better suited at the caller end, ensuring that after each round
of selection the necessary deduction is done.
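Something like the following at the caller (only a sketch; method and variable names are approximated, not the patch's actual code):
{code}
// After each selector finishes its round, deduct what it selected from the
// per-partition preemptable totals, instead of doing it inside each selector.
for (PreemptionCandidatesSelector selector : candidatesSelectionPolicies) {
  selectedCandidates = selector.selectCandidates(selectedCandidates,
      clusterResource, totalPreemptionAllowed);
  CapacitySchedulerPreemptionUtils
      .deductPreemptableResourcesBasedSelectedCandidates(preemptionContext,
          selectedCandidates);
}
{code}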

4. One doubt about the newly changed {{PreemptableResourceCalculator}} ctor: could we get
the {{considersReservedResourceWhenCalculateIdeal}} information from {{preemptionContext}}
itself? We could add it as a getter API so that the code is cleaner.
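i.e. something like this (sketch; the getter name is only a suggestion):
{code}
// Hypothetical getter on the preemption context, so the calculator does not
// need the extra boolean constructor argument.
boolean considerReserved =
    preemptionContext.considersReservedResourceWhenCalculateIdeal();
{code}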

5. {{getNodesForPreemption}} in {{ReservedContainerCandidatesSelector}} has to scan all nodes
in the cluster to find out whether any node has a reserved container or not. For recent UI work, I
was thinking of a metric for the count of reserved nodes in the cluster. Is it costly to
keep a reserved-nodes list in the scheduler? If not, could that be used here? It's just a thought,
I haven't really checked the cost of keeping such a list, so it's fine to scrap this idea as well
:)
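If we went that way, roughly something like this (purely hypothetical sketch, not part of the patch):
{code}
// Scheduler keeps track of nodes that currently hold a reserved container,
// updated on reserve/unreserve, so selectors and UI metrics can avoid a full
// cluster scan.
private final Set<NodeId> nodesWithReservedContainers =
    Collections.newSetFromMap(new ConcurrentHashMap<NodeId, Boolean>());

void markReserved(NodeId nodeId)   { nodesWithReservedContainers.add(nodeId); }
void markUnreserved(NodeId nodeId) { nodesWithReservedContainers.remove(nodeId); }
{code}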
6. In {{ReservedContainerCandidatesSelector}}, containers are sorted by launch time.
Is it possible that we may select a recently launched high-priority container over a slightly
older low-priority one?
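For example, something like this ordering (sketch; accessor names and the comparison direction are assumptions):
{code}
// Consider container priority before launch time, so a freshly launched
// high-priority container is not picked ahead of a slightly older
// low-priority one. The direction of the comparison would follow the
// scheduler's priority semantics.
Collections.sort(containers, new Comparator<RMContainer>() {
  @Override
  public int compare(RMContainer a, RMContainer b) {
    int byPriority = a.getContainer().getPriority().compareTo(
        b.getContainer().getPriority());
    if (byPriority != 0) {
      return byPriority;
    }
    return Long.compare(b.getCreationTime(), a.getCreationTime());
  }
});
{code}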
7. As I see it, {{RMContainer}} is a read-only interface. I feel {{setQueueName}} can be moved
out of this interface and placed in {{RMContainerImpl}}; I think we could downcast and use
the same.
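i.e. at the call sites inside the scheduler, something like (sketch; variable names illustrative):
{code}
// setQueueName would live only on RMContainerImpl; scheduler-internal callers
// downcast where they need to mutate it.
if (rmContainer instanceof RMContainerImpl) {
  ((RMContainerImpl) rmContainer).setQueueName(newQueueName);
}
{code}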

Minor nits:
1. {{ReservedContainerCandidatesSelector#getPreemptionCandidatesOnNode}} has some commented-out
code; it can be removed.
2. In {{TempQueuePerPartition}}, could we add {{reserved}} to {{toString}} for logging?
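e.g. just appending it in {{toString}} (sketch; {{sb}} stands for whatever builder/format the method already uses):
{code}
// Include the reserved resource so it shows up in preemption debug logging.
sb.append(" RESERVED: ").append(reserved);
{code}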


I think the test class looks very clean and readable after the new framework change. It is very good
and really useful :).
I plan to try this framework sometime tomorrow and will share if I have any comments.

> Do surgical preemption based on reserved container in CapacityScheduler
> -----------------------------------------------------------------------
>
>                 Key: YARN-4390
>                 URL: https://issues.apache.org/jira/browse/YARN-4390
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: capacity scheduler
>    Affects Versions: 3.0.0, 2.8.0, 2.7.3
>            Reporter: Eric Payne
>            Assignee: Wangda Tan
>         Attachments: YARN-4390-design.1.pdf, YARN-4390-test-results.pdf, YARN-4390.1.patch, YARN-4390.2.patch, YARN-4390.3.branch-2.patch, YARN-4390.3.patch, YARN-4390.4.patch, YARN-4390.5.patch
>
>
> There are multiple reasons why preemption could unnecessarily preempt containers. One is that an app could be requesting a large container (say 8 GB), and the preemption monitor could conceivably preempt multiple containers (say 8, 1-GB containers) in order to fill the large container request. These smaller containers would then be rejected by the requesting AM and potentially given right back to the preempted app.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
