hadoop-yarn-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "YunFan Zhou (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (YARN-6361) FairScheduler: FSLeafQueue.fetchAppsWithDemand CPU usage is high with big queues
Date Sun, 06 Aug 2017 15:05:00 GMT

    [ https://issues.apache.org/jira/browse/YARN-6361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16115449#comment-16115449
] 

YunFan Zhou edited comment on YARN-6361 at 8/6/17 3:04 PM:
-----------------------------------------------------------

[~yufeigu]  Thank you very much, Yufei. 
I have to say that this JIRA's description describes a very good idea, and I have achieved
the corresponding code.
I used *O(N)* complexity before sorting to calculate the properties of each application, 
and then according to the computed properties of each application ordering all applications
using O(n*log(n)) complexity.
It does look a lot faster. And I'll test the bench mark tomorrow, and see how the optimization
of this approach improves performance compared to the unoptimized performance.


was (Author: daemon):
[~yufeigu]  Thank you very much, Yufei. 
I have to say that this JIRA's description describes a very good idea, and I have achieved
the corresponding code.
I used *O(N)* complexity before sorting to calculate the properties of each application, 
and then according to the computed properties of each application ordering all applications
using O(n*log(n)) complexity.
It does look a lot faster. And I'll test the bench mark tomorrow, and see how the optimization
of this approach improves performance compared to the unoptimized performance.

Here are some snippets of the code I've written, and many of the details have to be changed.
But I think our general idea is similar, using O(N) complexity to calculate each of the Schedulable
attribute values (such as the *SchedulableInnerAttr*). Then, order N of the SchedulableInnerAttr
using O(n*log(n)) time complexity. 
Because every member variable of the *SchedulableInnerAttr* is the base type, it can be very
fast in sorting. And it will be much faster than the traditional sort. I will upload the corresponding
detailed code and bench mark for the test tomorrow.Agree with what I said?

{code:java}
  private Schedulable sched;
  private boolean isDemandNotNone;
  private boolean isNeedy;
  private double minShareRatio;
  private boolean isWeightExceedZero;
  private double memorySizePerWeight;
  private long startTime;
  private String name;
  
  public SchedulableInnerAttr(Schedulable sched) {
    this.sched = sched;
    this.isDemandNotNone = sched.getDemand().equals(Resources.none());
  
    Resource resourceUsage = sched.getResourceUsage();
    Resource minShare = Resources.min(RESOURCE_CALCULATOR, null,
        sched.getMinShare(), sched.getDemand());
    this.isNeedy = Resources.lessThan(RESOURCE_CALCULATOR, null,
        resourceUsage, minShare);
    this.minShareRatio = (double) resourceUsage.getMemorySize() /
        Resources.max(RESOURCE_CALCULATOR, null, minShare, ONE)
            .getMemorySize();
  
    double weight = sched.getWeights().getWeight(ResourceType.MEMORY);
    this.isWeightExceedZero = weight > 0.0;
  
    this.memorySizePerWeight = resourceUsage.getMemorySize();
    if (this.isWeightExceedZero) {
      this.memorySizePerWeight /= weight;
    }
    
    this.startTime = sched.getStartTime();
    this.name = sched.getName();
  }
{code}


{code:java}
  @Override
   public int compare(SchedulableInnerAttr s1, SchedulableInnerAttr s2) {
      int res = 0;
      if (s1.isDemandNotNone() && !s2.isDemandNotNone()) {
        res = -1;
      } else if (!s1.isDemandNotNone() && s2.isDemandNotNone()) {
        res = 1;
      }
      if (res == 0) {
        if (s1.isNeedy() && !s2.isNeedy()) {
          res = -1;
        } else if (!s1.isNeedy() && s2.isNeedy()) {
          res = 1;
        }
      }
      if (res == 0) {
        res = (int) Math.signum(s1.getMinShareRatio() - s2.getMinShareRatio());
      }
      if (res == 0) {
        if (s1.isWeightExceedZero() && !s2.isWeightExceedZero()) {
          res = 1;
        } else if (!s1.isWeightExceedZero() && s2.isWeightExceedZero()) {
          res = -1;
        }
      }
      if (res == 0) {
        res = (int) Math.signum(s1.getMemorySizePerWeight() - s2.getMemorySizePerWeight());
      }
      if (res == 0) {
        res = (int) Math.signum(s1.getStartTime() - s2.getStartTime());
      }
      
      if (res == 0) {
        res = s1.getName().compareTo(s2.getName());
      }
      return res;
    }
{code}



> FairScheduler: FSLeafQueue.fetchAppsWithDemand CPU usage is high with big queues
> --------------------------------------------------------------------------------
>
>                 Key: YARN-6361
>                 URL: https://issues.apache.org/jira/browse/YARN-6361
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Miklos Szegedi
>            Assignee: YunFan Zhou
>         Attachments: dispatcherthread.png, threads.png, YARN-6361.001.pre.patch
>
>
> FSLeafQueue.fetchAppsWithDemand sorts the applications by the current policy. Most of
the time is spent in FairShareComparator.compare. We could improve this by doing the calculations
outside the sort loop {{(O\(n\))}} and we sorted by a fixed number inside instead {{O(n*log\(n\))}}.
This could be an performance issue when there are huge number of applications in a single
queue. The attachments shows the performance impact when there are 10k applications in one
queue.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org


Mime
View raw message