hadoop-yarn-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Wangda Tan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-5889) Improve user-limit calculation in capacity scheduler
Date Wed, 01 Feb 2017 21:30:51 GMT

    [ https://issues.apache.org/jira/browse/YARN-5889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15848962#comment-15848962

Wangda Tan commented on YARN-5889:

Thanks [~sunilg] for updating the patch.

Last wave (hopefully :p) of comments for the latest patch

1) updateUserResourceUsage:
- javadocs parameter need to change
- remove LOG.info for debug

2) incResourceUsagePerUser/decResourceUsagePerUser are mostly identical, suggest to add the
"allocated" parameter and rename it to updateResourceUsagePerUser. And writeLock is not necessary

3) getComputedResourceLimitForActiveUsers:
- Why {{userLimitNeedsRecompute}} is called here? Will it make the following {{isRecomputeNeeded}}
to always return true?
My guess is, now we have only one localVersionOfUsersState for both of active user and total
user. If we have two such map, one for active user and one for total user, it could solve
the problem, correct?

4) isRecomputeNeeded:
- When userLimitPerSchedulingMode gonna be null?
- I'm still not quite sure about why {{userLimitPerSchedulingMode}} is required for {{isRecomputeNeeded}}:
{{getLocalVersionOfUsersState}} returns -1 when userLimitPerSchedulingMode doesn't contain
schedulingMode, correct?
- And also, we don't need {{latestVersionOfUserCount}}, instead we should call {{latestVersionOfUsersState.get()}}.

5) So a summary of 3/4: I think we need two maps for local version, and isRecomputeNeed should
take 3 parameters: schedulingMode, partition, and {{boolean activeUsers}}. Existing logic
looks not correct to me: if version updated to 2 for partition=x, and scheduling_mode=y; then
we get user-limit for active-user/total-user; and then version update to 3 for partition=x
and scheduling_mode=y; then we get user-limit for active-user/total-user again, the 2nd time
UL of total-user will not be updated.

6) getLatestVersionOfUsersState is too simple to be a method, better to remove.

7) userLimitNeedsRecompute:
Need to consider value becomes negative, and we use "-1" as default value for "not found"
local version, we should make sure value is always >= 0.
I think you can do things like:
int x = version.incrementAndGet();
if (x < 0) {
    x = version.get();
	while (x < 0 && !version.compareAndSet(x, 0)) {
		x = version.get();

8) Possible redundant null checks:
I'm not sure if they're required, we can keep them to make RM not crash, but highly suggest
to print warning:
- incResourceUsagePerUser/decResourceUsagePerUser: {{totalResourceUsageForUsers}}

> Improve user-limit calculation in capacity scheduler
> ----------------------------------------------------
>                 Key: YARN-5889
>                 URL: https://issues.apache.org/jira/browse/YARN-5889
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacity scheduler
>            Reporter: Sunil G
>            Assignee: Sunil G
>         Attachments: YARN-5889.0001.patch, YARN-5889.0001.suggested.patchnotes, YARN-5889.0002.patch,
YARN-5889.0003.patch, YARN-5889.0004.patch, YARN-5889.0005.patch, YARN-5889.0006.patch, YARN-5889.0007.patch,
YARN-5889.0008.patch, YARN-5889.v0.patch, YARN-5889.v1.patch, YARN-5889.v2.patch
> Currently user-limit is computed during every heartbeat allocation cycle with a write
lock. To improve performance, this tickets is focussing on moving user-limit calculation out
of heartbeat allocation flow.

This message was sent by Atlassian JIRA

To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org

View raw message