hadoop-yarn-issues mailing list archives

From "Wangda Tan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-4945) [Umbrella] Capacity Scheduler Preemption Within a queue
Date Thu, 15 Sep 2016 23:22:21 GMT

    [ https://issues.apache.org/jira/browse/YARN-4945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15494825#comment-15494825 ]

Wangda Tan commented on YARN-4945:

1) YarnConfiguration:
- Instead of having a separate SELECT_CANDIDATES_FOR_INTRAQUEUE_PREEMPTION, should we only have
a "queue.intra-queue-preemption-enabled"? I cannot clearly see what the separate option means
semantically. For example, after we have user-limit preemption support, what happens if we only
enable user-limit preemption (without priority preemption enabled)?

2) PCPP:
- Unused imports / methods
- getPartitionResource: can we avoid cloning resources? Currently we clone the resource twice for
every app. If you are concerned about consistency, you can clone it once before starting the
preemption calculation.
- It seems to me, partitionToUnderServedQueues can be kept in AbstractPreemptableResourceCalculator.
In addition, Map<String, LinkedHashSet<String>> could be Map<String, List<String>>.
(LinkedHashSet is not necessarily needed, because we won't have two TempQueuePerPartition
with the same queueName and same partition)
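
A minimal sketch of the Map<String, List<String>> suggestion, assuming plain String keys for partition and queue name. The class and method names here are hypothetical stand-ins, not the ones in the patch:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical holder; in the patch this state would live in
// AbstractPreemptableResourceCalculator.
final class UnderServedQueuesSketch {
    // partition -> under-served queue names, in insertion order
    private final Map<String, List<String>> partitionToUnderServedQueues =
        new HashMap<>();

    // A plain List keeps insertion order. No de-duplication is needed if each
    // (queueName, partition) pair is only ever added once.
    void addUnderServedQueue(String partition, String queueName) {
        partitionToUnderServedQueues
            .computeIfAbsent(partition, k -> new ArrayList<>())
            .add(queueName);
    }

    List<String> getUnderServedQueues(String partition) {
        return partitionToUnderServedQueues
            .getOrDefault(partition, Collections.emptyList());
    }
}
```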

3) CapacitySchedulerPreemptionUtils:
- deductPreemptableResourcePerApp: is the following comment still valid?
bq. // High priority app is coming first 
- Remove the unnecessary params in methods and the unnecessary explicit generic types in new
expressions (like new HashMap(...)); better to move to IntelliJ? :p
- {{getResToObtainByPartitionForApps}} can be removed; we can directly use policy.getResourceDemandFromAppsPerQueue.

4) FiCaSchedulerApp: 
- Move getTotalPendingRequestsPerPartition to ResourceUsage? I can see we could have requirements
for getUsedResourceByPartition, getReservedResourceByPartition, etc. in the future.

5) PreemptionCandidatesSelector:
- All non-abstract methods can be static, correct?
- All TODOs in comments are done, correct?

6) IntraQueuePreemptionPolicy and PriorityIntraQueuePreemptionPolicy:
- Overall: do you think the -Policy name is too broad? What it essentially does is compute
how much resource to preempt from each app. How about calling it something like IntraQueuePreemptionComputePlugin?
I would like to hear thoughts from you and Eric on this as well.
- Rename the PriorityIntraQueuePreemptionPolicy to FifoIntraQueuePreemptionPolicy if you agree
with [my comment|https://issues.apache.org/jira/browse/YARN-4945?focusedCommentId=15494454&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15494454]
- PriorityIntraQueuePreemptionPolicy#getResourceDemandFromAppsPerQueue: 
a. resToObtainByPartition can be removed from parameter
b. IIUC, it gets resourceToObtain for each app rather than resourceDemand for each app;
rename it accordingly?
c. This logic is not correct:
        // If demand is from same priority level, skip the same.
        if (!tq.isPendingDemandForHigherPriorityApps(a1.getPriority())) {
It can only prevent the highest-priority applications in a queue from preempting each other; it
cannot prevent the 2nd-highest-priority applications from preempting each other. The performance
can be improved as well: I believe in some settings maxAppPriority can be as large as MAX_INT.
Please see the comments/pseudocode below for details.
- computeAppsIdealAllocation:
a. Calling getUserLimitHeadRoomPerApp is too expensive; instead we can add a method in LeafQueue
to get the user limit by userName. Keeping a Map of username to headroom inside the method lets us
compute the user limit at most once per user. And this logic can be reused to compute user-limit
b. {{tq.addPendinResourcePerPriority(tmpApp.getPriority(), tmpApp.pending);}} could be changed
if you agree with c. above
c. I think we should move the {{skip the same priority demand}} logic into this method. One
approach in my mind is:
// General idea:
// Use two pointers, one starting from the most prioritized app, one from the least prioritized app.
// Each app has two quotas: one is how much resource it requires (ideal - used),
// the other is how much resource can be preempted from it.
// Move the two pointers and update the two quotas to answer:
// for application X, is there any app with higher priority that needs the resource?

p1 = most-prioritized-app.iterator
p2 = least-prioritized-app.iterator

// For each app, we have:
// - "toBePreemptFromOther", initialized to (ideal - (used - selected))
// - "toBePreempt", how much can be preempted from this app
// - "actuallyToBePreempted", initialized to 0

while (p1.getPriority() > p2.getPriority() && p1 != p2) {
    Resource rest = p2.toBePreempt - p2.actuallyToBePreempted;
    if (rest > 0 && p1.toBePreemptFromOther > 0) {
        Resource toPreempt = min(p1.toBePreemptFromOther, rest);
        p1.toBePreemptFromOther -= toPreempt;
        p2.actuallyToBePreempted += toPreempt;
    }

    if (p2.toBePreempt - p2.actuallyToBePreempted <= 0) {
        // Nothing more can be preempted from p2, move to the next app
        p2--;
    }

    if (p1.toBePreemptFromOther <= 0) {
        // p1 is satisfied, move to the next app
        p1++;
    }
}
d. After the change in c., getResourceDemandFromAppsPerQueue will simply return actuallyToBePreempted
for the apps in a queue, grouped by partition.
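
To make the two-pointer idea in c. concrete, here is a small runnable sketch. It is only an illustration under simplifying assumptions: resources are a single long instead of a multi-dimensional Resource, only one partition is considered, and the App class is a hypothetical stand-in for the per-app snapshot in the patch. Field names follow the pseudocode.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class TwoPointerPreemptionSketch {

    /** Hypothetical per-app snapshot for one partition. */
    static final class App {
        final String name;
        final int priority;
        long toBePreemptFromOther;   // unmet demand this app may take from others
        final long toBePreempt;      // upper bound on what can be taken from this app
        long actuallyToBePreempted;  // result: what is taken from this app

        App(String name, int priority, long demand, long preemptable) {
            this.name = name;
            this.priority = priority;
            this.toBePreemptFromOther = demand;
            this.toBePreempt = preemptable;
        }
    }

    /** Fills in actuallyToBePreempted for every app of one queue. */
    static void compute(List<App> apps) {
        List<App> sorted = new ArrayList<>(apps);
        // Highest priority first; List.sort is stable for equal priorities.
        sorted.sort(Comparator.comparingInt((App a) -> a.priority).reversed());

        int p1 = 0;                  // most prioritized app
        int p2 = sorted.size() - 1;  // least prioritized app
        while (p1 < p2 && sorted.get(p1).priority > sorted.get(p2).priority) {
            App high = sorted.get(p1);
            App low = sorted.get(p2);
            long rest = low.toBePreempt - low.actuallyToBePreempted;
            if (rest > 0 && high.toBePreemptFromOther > 0) {
                long toPreempt = Math.min(high.toBePreemptFromOther, rest);
                high.toBePreemptFromOther -= toPreempt;
                low.actuallyToBePreempted += toPreempt;
                rest -= toPreempt;
            }
            if (rest <= 0) {
                p2--;  // nothing more can be preempted from the low-priority app
            }
            if (high.toBePreemptFromOther <= 0) {
                p1++;  // the high-priority app is satisfied
            }
        }
    }

    public static void main(String[] args) {
        List<App> apps = Arrays.asList(
            new App("A", 3, 4, 0),
            new App("B", 2, 0, 3),
            new App("C", 1, 0, 2));
        compute(apps);
        for (App app : apps) {
            System.out.println(app.name + " -> " + app.actuallyToBePreempted);
        }
    }
}
```

In this example A needs 4, so C gives up 2 and B gives up 2. The loop stops as soon as both pointers reach the same priority level, so apps at the same priority never preempt from each other, and the cost is linear in the number of apps regardless of how large maxAppPriority is.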

7) TempAppPerQueue:
- It should be TempAppPerPartition
- Is it possible to add a common class for app/queue to avoid dup logic?
- Is it better to rename Temp- to something like AppPartitionSnapshot or QueuePartitionSnapshot?
This can be done in a separate patch for better review
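
A rough sketch of what such a common class could look like, assuming scalar resources. All class, field, and method names below are hypothetical; the actual shared fields would depend on what TempQueuePerPartition and the per-app class have in common:

```java
// Hypothetical shared base for the app-level and queue-level snapshots.
abstract class PartitionSnapshot {
    final String partition;
    final long used;
    final long pending;
    long idealAssigned;

    PartitionSnapshot(String partition, long used, long pending) {
        this.partition = partition;
        this.used = used;
        this.pending = pending;
    }

    /** Resource held beyond the ideal allocation; never negative. */
    long getOverIdeal() {
        return Math.max(0L, used - idealAssigned);
    }
}

final class AppPartitionSnapshot extends PartitionSnapshot {
    final String applicationId;
    final int priority;

    AppPartitionSnapshot(String applicationId, int priority,
                         String partition, long used, long pending) {
        super(partition, used, pending);
        this.applicationId = applicationId;
        this.priority = priority;
    }
}

final class QueuePartitionSnapshot extends PartitionSnapshot {
    final String queueName;

    QueuePartitionSnapshot(String queueName,
                           String partition, long used, long pending) {
        super(partition, used, pending);
        this.queueName = queueName;
    }
}
```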

I haven't looked at the detailed code logic of IntraQueueCandidatesSelector and IntraQueueCalculator
/ Policy, etc. yet. Will do that in the next iteration.

> [Umbrella] Capacity Scheduler Preemption Within a queue
> -------------------------------------------------------
>                 Key: YARN-4945
>                 URL: https://issues.apache.org/jira/browse/YARN-4945
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Wangda Tan
>         Attachments: Intra-Queue Preemption Use Cases.pdf, IntraQueuepreemption-CapacityScheduler (Design).pdf, YARN-2009-wip.2.patch, YARN-2009-wip.patch, YARN-2009-wip.v3.patch, YARN-2009.v0.patch, YARN-2009.v1.patch, YARN-2009.v2.patch
> This is an umbrella ticket to track efforts on preemption within a queue, supporting features
> YARN-2009, YARN-2113, and YARN-4781.

