hadoop-yarn-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Wangda Tan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-5889) Improve user-limit calculation in capacity scheduler
Date Fri, 13 Jan 2017 22:32:26 GMT

    [ https://issues.apache.org/jira/browse/YARN-5889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15822457#comment-15822457
] 

Wangda Tan commented on YARN-5889:
----------------------------------

Thanks for update the patch, my comments for ver.3 patch.

1) LQ:
- getUserAndAddIfAbsent could be move to UM
- addUser/removeUser could be private method of UM
- getUsers -> getUsersInfo, and move implemenation to UM
- getComputedActiveUserLimit -> getComputedResourceLimitForActiveUsers (or shorter name)
- getComputedUserLimit -> getComputedResourceLimitForAllUsers (mostly for more consistency
naming to above, and more acurate and less ambiguous)
- Add more description to above two methods.
- Is it better to move recalculateQueueUsageRatio/calculateUserUsageRatio into UM, and recalculateQueueUsageRatio
can be renamed to "clusterResourceUpdated"
- inc/decUsedResource, keep User reference to avoid hit hashMap twice.
- Not caused by your patch: could you add a desc to UsageRatios?

2) UM:
- UM#userCountToRecalculateUserlimit could be AtomicLong to avoid writelock and less frequently
wrapped to MIN_VALUE. Rename it to latestVersion.
- rename UM#getUserCountToRecalculateUserlimit -> getLatestVersion and readlock is not
required. 
- A potential race condition:
{code} 
 ....
    // In allocation thread, cross check whether user became active or not.
    // Such a change needs recomputation for all user-limits.
    if (activeUsersManager.isUserAddedOrRemoved()) {  // 1
      // Clear computed user-limits and ask for recomputation. 
      userLimitNeedsRecompute(); // 2

      // reset the flag so that we can further track for active users.
      activeUsersManager.setUserAddedOrRemoved(false); // 3
    }
 ....
{code} 
If thread#1 at  // 2, and thread #2 calls setUserAddedOrRemoved, the {{isUserAddedOrRemoved}}
could be overwritten and cannot properly trigger another update.
To simplify interface and avoid the race condition, can we change {{isUserAddedOrRemoved}}
to use an AtomicBoolean, invokes getAndSet(false) to do the "flip" operation.
- Renaming {{userCountToRecalculateUserlimit}} to {{localVersion}}?
- Similiarily,
{code}
    if (isRecomputeNeeded(user, userLimitPerSchedulingMode)) {
      // recompute
      userLimitPerSchedulingMode = reComputeUserLimits(rc, userName,
          nodePartition, clusterResource, schedulingMode, false);

      // update user count to cache so that we can avoid recompute if no major
      // changes.
      user.setUserCountToRecalculateUserlimit(
          getUserCountToRecalculateUserlimit());
    }
{code}
To avoid race condition, is it better to save value of "latestVersion" and set it to "local"
version, pesudo code like:
{code}
   int latestVersion = UM#getLatestVersion
   if (getLocalVersion != latestVersion) {
       // recompute ...
       setLocalVersion(latestVersion);
   }
{code}
- {{getComputedUserLimit}} / {{getComputedActiveUserLimit}}: I'm wondering if the UserToPartitionRecord
is required, since existing computation doesn't need the "user" field.
- Does existing logic handle following scenario? Assume userLimitPerSchedulingMode is null
and no new changes to user-limit, will NPE happen if we get a RESPECT_EXCLUSIVITY first and
then get a IGNORE_EXCLUSIVITY?
- Inside {{computeUserLimit}}, it calls {{user.setUserResourceLimit(userLimitResource);}},
I think we need to file a separate JIRA to make sure REST API / web UI can get expected UL.
- TODO, after you update the patch, need to double check locks of UM.

> Improve user-limit calculation in capacity scheduler
> ----------------------------------------------------
>
>                 Key: YARN-5889
>                 URL: https://issues.apache.org/jira/browse/YARN-5889
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacity scheduler
>            Reporter: Sunil G
>            Assignee: Sunil G
>         Attachments: YARN-5889.0001.patch, YARN-5889.0001.suggested.patchnotes, YARN-5889.0002.patch,
YARN-5889.0003.patch, YARN-5889.v0.patch, YARN-5889.v1.patch, YARN-5889.v2.patch
>
>
> Currently user-limit is computed during every heartbeat allocation cycle with a write
lock. To improve performance, this tickets is focussing on moving user-limit calculation out
of heartbeat allocation flow.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org


Mime
View raw message