hadoop-common-dev mailing list archives

From Eric Payne <eric.payne1...@yahoo.com.INVALID>
Subject Re: [DISCUSS] Merge Absolute resource configuration support in Capacity Scheduler (YARN-5881) to trunk
Date Tue, 28 Nov 2017 17:40:48 GMT
Thanks Sunil for the great work on this feature.
I looked through the design document, reviewed the code, and tested out branch YARN-5881.
The design makes sense, and the code appears to implement the design in a sensible
way. However, I have encountered a couple of bugs. I opened https://issues.apache.org/jira/browse/YARN-7575
to track my findings. Here is a summary:

The design document from YARN-5881 says that for max-capacity:    
3)  For each queue, we require: a) if max-resource not set, it automatically set to parent.max-resource
     
When I do not set any yarn.scheduler.capacity.<queue-path>.maximum-capacity, the
RM UI scheduler page refuses to render. The failure appears to be in CapacitySchedulerPage$LeafQueueInfoBlock.
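
For reference, the rule quoted from the design document (an unset max-resource falls back to the parent's) can be sketched roughly as below. This is only an illustration, not the CapacityScheduler code: the function names, the dict-based configuration, and the fallback to the total cluster resource when nothing is set up to the root are all assumptions made for the example.

```python
from typing import Dict, Optional

# Hypothetical resource representation, e.g. {"memory_mb": 8192, "vcores": 8}.
Resource = Dict[str, int]


def parent_path(queue_path: str) -> Optional[str]:
    """Return the parent of a dotted queue path, or None for the root queue."""
    head, sep, _ = queue_path.rpartition(".")
    return head if sep else None


def effective_max_resource(
    queue_path: str,
    configured_max: Dict[str, Resource],
    cluster_resource: Resource,
) -> Resource:
    """Resolve a queue's max-resource by walking up to the nearest ancestor
    that has one configured.

    Assumption for this sketch: if no queue up to the root sets a value,
    the total cluster resource is used.
    """
    path: Optional[str] = queue_path
    while path is not None:
        value = configured_max.get(path)
        if value is not None:
            return value
        path = parent_path(path)
    return cluster_resource


cluster = {"memory_mb": 16384, "vcores": 16}
conf = {"root.a": {"memory_mb": 8192, "vcores": 8}}

# root.a.leaf has no max-resource of its own, so it inherits root.a's value.
leaf_max = effective_max_resource("root.a.leaf", conf, cluster)
# root.b and root set nothing, so the sketch falls back to the cluster total.
other_max = effective_max_resource("root.b", conf, cluster)
```

Under this reading, a leaf queue with no maximum configured should still resolve to a concrete value, so the UI would have something to render.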

Also, a job will run in the leaf queue with no max capacity set and will grow to the
max capacity of the cluster, but if I then add resources to the node, the job won't grow any more
even though it has pending resources.

Thanks,
Eric


From: Sunil G <sunilg@apache.org>
To: "yarn-dev@hadoop.apache.org" <yarn-dev@hadoop.apache.org>; Hadoop Common <common-dev@hadoop.apache.org>; Hdfs-dev <hdfs-dev@hadoop.apache.org>; "mapreduce-dev@hadoop.apache.org" <mapreduce-dev@hadoop.apache.org>
Sent: Friday, November 24, 2017 11:49 AM
Subject: [DISCUSS] Merge Absolute resource configuration support in Capacity Scheduler (YARN-5881) to trunk
   
Hi All,

We would like to bring up the discussion of merging “absolute min/max
resources support in capacity scheduler” branch (YARN-5881) [2] into trunk
in a few weeks. The goal is to get it in for Hadoop 3.1.

*Major work happened in this branch*

  - YARN-6471. Support to add min/max resource configuration for a queue
  - YARN-7332. Compute effectiveCapacity per each resource vector
  - YARN-7411. Inter-Queue preemption's computeFixpointAllocation need to
  handle absolute resources.

*Regarding design details*

Please refer to [1] for the detailed design document.

*Regarding testing:*

We tested the feature extensively over the last couple of months,
comparing it against the latest trunk.

- For the SLS benchmark: we did not see an observable performance gap in
simulated tests based on SLS traces of 8K nodes (1 PB memory). We got 3k+
containers allocated per second.

- For the microbenchmark: we used the performance test cases added by YARN-6775, and
they did not show much performance regression compared to trunk.

*YARN-5881* <https://issues.apache.org/jira/browse/YARN-5881>

#ResourceTypes = 2. Avg of fastest 20: 55294.52
#ResourceTypes = 2. Avg of fastest 20: 55401.66

*trunk*
#ResourceTypes = 2. Avg of fastest 20: 55865.92
#ResourceTypes = 2. Avg of fastest 20: 55096.418
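
As a rough reading of the figures above, the two runs per build can be averaged and compared. The sketch below only does that arithmetic on the quoted numbers; the unit of the figures is not stated in this message, and the variable and function names are made up for the example.

```python
from statistics import mean

# "Avg of fastest 20" figures quoted above, two runs per build.
branch_runs = [55294.52, 55401.66]   # YARN-5881 branch
trunk_runs = [55865.92, 55096.418]   # trunk


def relative_gap(branch, trunk):
    """Fractional difference of the branch mean from the trunk mean."""
    return (mean(trunk) - mean(branch)) / mean(trunk)


gap = relative_gap(branch_runs, trunk_runs)
# Spread between the two trunk runs, for comparison with the gap itself.
trunk_spread = abs(trunk_runs[0] - trunk_runs[1])

print(f"branch vs. trunk gap: {gap:.2%}")
print(f"spread between the two trunk runs: {trunk_spread:.3f}")
```

The gap between the averaged figures is well under one percent, and the two trunk runs differ from each other by more than the branch and trunk averages do, which is consistent with the statement that no significant regression was observed.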

*Regarding API stability:*

All newly added @Public APIs are @Unstable.

The documentation JIRA [3] should help by providing detailed configuration
information. The feature works end to end; we have been running it in our
development cluster for the last couple of months, and it has undergone a good
amount of testing. The branch code is run against trunk and tracked via [4].

We would love to get your thoughts before opening a voting thread.

Special thanks to the team of folks who worked hard and contributed to
this effort, including design discussions, patches, and reviews: Wangda
Tan, Vinod Kumar Vavilapalli, Rohith Sharma K S.

[1] : https://issues.apache.org/jira/secure/attachment/12855984/YARN-5881.Support.Absolute.Min.Max.Resource.In.Capacity.Scheduler.design-doc.v1.pdf

[2] : https://issues.apache.org/jira/browse/YARN-5881

[3] : https://issues.apache.org/jira/browse/YARN-7533

[4] : https://issues.apache.org/jira/browse/YARN-7510

Thanks,

Sunil G and Wangda Tan

   