hadoop-yarn-issues mailing list archives

From "Wangda Tan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-4390) Consider container request size during CS preemption
Date Fri, 22 Apr 2016 00:02:13 GMT

    [ https://issues.apache.org/jira/browse/YARN-4390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15253032#comment-15253032
] 

Wangda Tan commented on YARN-4390:
----------------------------------

[~eepayne], thanks for the review!

bq. I think this JIRA gets us closer to that goal, but there may be a possibility for the
killed container to go someplace else. Is that right?
Yes, that is true: reservation needs to happen before we can correctly preempt resources
for large containers. For example, if YARN-4280 occurs, we cannot reserve a container and preempt
containers correctly.
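To illustrate the ordering (a minimal standalone sketch with hypothetical Node / reserveThenPreempt names, not the actual CS classes): reserve the large request on a node that can eventually fit it, and only then preempt on that node, so the freed resources cannot be handed elsewhere:
{code}
import java.util.ArrayList;
import java.util.List;

public class ReserveBeforePreemptSketch {

  // Hypothetical node model: just a name and total memory in MB.
  static class Node {
    final String name;
    final int totalMb;
    boolean reservedForLargeRequest;  // reservation flag, for the sketch only

    Node(String name, int totalMb) {
      this.name = name;
      this.totalMb = totalMb;
    }
  }

  // Reserve the request on a node that can eventually fit it, then preempt there.
  static Node reserveThenPreempt(List<Node> nodes, int requestMb) {
    for (Node n : nodes) {
      if (n.totalMb >= requestMb) {        // node can fit the request once freed
        n.reservedForLargeRequest = true;  // pin the freed space to this request
        return n;                          // preemption should target this node only
      }
    }
    return null;                           // no node can ever fit it: don't preempt
  }

  public static void main(String[] args) {
    List<Node> nodes = new ArrayList<>();
    nodes.add(new Node("n1", 4096));       // too small for an 8 GB container
    nodes.add(new Node("n2", 16384));      // can fit 8 GB after preemption

    Node target = reserveThenPreempt(nodes, 8192);
    System.out.println(target == null ? "no fit" : "preempt on " + target.name);
  }
}
{code}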

Addressed most of your comments, except:

bq. Even though killableContainers is an unmodifiableMap, I think it can still change, can't
it?
Yes, it can change. And actually, all of the existing preemption logic assumes changes can happen:
- In the micro view: we clone queue metrics at the beginning of editSchedule, but the queue metrics
can change while the preemption logic runs.
- In the macro view: selected candidates can flip between invalid and valid before max-kill-wait
is reached (since a queue's resource usage can be updated during the max-kill-wait period).
So back to your question: if killableContainers is modified during editSchedule, we can fix it up
in the next (and next-next ...) editSchedule.
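A minimal sketch of that snapshot-per-pass pattern (hypothetical names, not the actual PCPP code): clone the map at the start of a pass, work only against the clone, and let the next pass rebuild it:
{code}
import java.util.HashMap;
import java.util.Map;

public class SnapshotPerPassSketch {

  // Live map maintained by the scheduler; may change at any time.
  private final Map<String, Integer> liveKillableContainers = new HashMap<>();

  // One pass of a hypothetical preemption monitor.
  void editSchedulePass() {
    // Snapshot at the start of the pass, same idea as cloning queue metrics.
    Map<String, Integer> snapshot = new HashMap<>(liveKillableContainers);

    // Work against the snapshot only; concurrent changes to the live map are
    // invisible to this pass and get picked up by the next editSchedule.
    for (Map.Entry<String, Integer> entry : snapshot.entrySet()) {
      selectCandidate(entry.getKey(), entry.getValue());
    }
  }

  private void selectCandidate(String containerId, Integer memoryMb) {
    // ... candidate selection based on the (possibly already stale) snapshot ...
  }
}
{code}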

bq. I am a little concerned about calling preemptionContext.getScheduler().getAllNodes())
to get the list of all of the nodes on every iteration of the preemption monitor...
This is a valid concern. However, as far as I know, the Fair Scheduler uses this method when
doing async scheduling, and async scheduling is widely used by Fair Scheduler users.
See this logic:
{code} 
  void continuousSchedulingAttempt() throws InterruptedException {
    long start = getClock().getTime();
    List<FSSchedulerNode> nodeIdList =
        nodeTracker.sortedNodeList(nodeAvailableResourceComparator);

    // iterate all nodes
    for (FSSchedulerNode node : nodeIdList) {
      try {
        if (Resources.fitsIn(minimumAllocation,
            node.getUnallocatedResource())) {
          attemptScheduling(node);
        }
      }
      ....
   }
{code}
I didn't see any JIRA complaining about a performance impact from this approach.
And since it uses a R/W lock, the write lock is acquired only on node add / remove or node resource
update, so in most cases nobody holds the write lock. I agree to cache the node list inside PCPP
if we do see performance issues.
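For reference, a minimal standalone sketch of that R/W-lock pattern (a hypothetical tracker, not the real ClusterNodeTracker): readers copy the node list under the read lock, and the write lock is taken only for node add / remove / resource update:
{code}
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class NodeTrackerSketch {
  private final Map<String, Integer> nodeMemoryMb = new HashMap<>();
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

  // Frequent read path: the preemption monitor / async scheduler iterating nodes.
  public List<String> getAllNodeIds() {
    lock.readLock().lock();
    try {
      return new ArrayList<>(nodeMemoryMb.keySet());  // copy under the read lock
    } finally {
      lock.readLock().unlock();
    }
  }

  // Rare write path: node added or its resource updated.
  public void addOrUpdateNode(String nodeId, int memoryMb) {
    lock.writeLock().lock();
    try {
      nodeMemoryMb.put(nodeId, memoryMb);
    } finally {
      lock.writeLock().unlock();
    }
  }

  // Rare write path: node removed.
  public void removeNode(String nodeId) {
    lock.writeLock().lock();
    try {
      nodeMemoryMb.remove(nodeId);
    } finally {
      lock.writeLock().unlock();
    }
  }
}
{code}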

Attaching ver.4 patch; please kindly review.

> Consider container request size during CS preemption
> ----------------------------------------------------
>
>                 Key: YARN-4390
>                 URL: https://issues.apache.org/jira/browse/YARN-4390
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: capacity scheduler
>    Affects Versions: 3.0.0, 2.8.0, 2.7.3
>            Reporter: Eric Payne
>            Assignee: Wangda Tan
>         Attachments: YARN-4390-design.1.pdf, YARN-4390-test-results.pdf, YARN-4390.1.patch,
>                      YARN-4390.2.patch, YARN-4390.3.branch-2.patch, YARN-4390.3.patch
>
>
> There are multiple reasons why preemption could unnecessarily preempt containers. One
> is that an app could be requesting a large container (say 8-GB), and the preemption monitor
> could conceivably preempt multiple containers (say 8, 1-GB containers) in order to fill the
> large container request. These smaller containers would then be rejected by the requesting
> AM and potentially given right back to the preempted app.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
