[ https://issues.apache.org/jira/browse/YARN-5892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16077246#comment-16077246 ]

Eric Payne edited comment on YARN-5892 at 7/7/17 12:33 PM:
-----------------------------------------------------------

[~sunilg], [~leftnoteasy], [~jlowe]:
Since branch-2 and 2.8 are somewhat different than trunk, it was necessary to make some design decisions that I would like you to be aware of when reviewing this backport:
- As noted [here|https://issues.apache.org/jira/browse/YARN-2113?focusedCommentId=16023111&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16023111], I did not backport YARN-5889 because it depends on locking changes from YARN-3140 and other locking JIRAs.
- In trunk, a change was made in YARN-5889 that changed the way {{computeUserLimit}} calculates user limit. In branch-2 and branch-2.8, {{userLimitResource = (all used resources in queue) / (num active users in queue)}}. In trunk after YARN-5889, {{userLimitResource = (all used resources by active users in queue) / (num active users)}}.
-- Since branch-2 and 2.8 use {{all used resources in queue}} instead of {{all used resources by active users in queue}}, it is not necessary to modify {{LeafQueue}} to update used resource when users are activated and deactivated, as was done in {{UsersManager}} in trunk.
-- However, I did add the activeUsersSet to LeafQueue and all the places it is modified so it can be used to sum active users times weight.
-- Therefore, it wasn't necessary to create a separate UsersManager class as was done in YARN-5889. Instead, I added a small amount of code in ActiveUsersManager to keep track of active users and to indicate when users are either activated or deactivated.
- {{LeafQueue#sumActiveUsersTimesWeights}} should not do anything that synchronizes or locks.
This is to avoid deadlocks because it is called by getHeadRoom (indirectly), which is called by {{FiCaSchedulerApp}}.
{code}
  float sumActiveUsersTimesWeights() {
    float count = 0.0f;
    for (String userName : activeUsersSet) {
      User user = users.get(userName);
      count += (user != null) ? user.getWeight() : 1.0f;
    }
    return count;
  }
{code}
-- This opens up a race condition when a user is added to or removed from {{activeUsersSet}} while {{sumActiveUsersTimesWeights}} is iterating over the set.
--- I'm not an expert in Java synchronization. Does this expose {{LeafQueue}} to concurrent modification exceptions?
--- There is no {{ConcurrentHashSet}} so should I make {{activeUsersSet}} a {{ConcurrentHashMap}}?

was (Author: eepayne):
[~sunilg], [~leftnoteasy], [~jlowe]:
Since branch-2 and 2.8 are somewhat different than trunk, it was necessary to make some design decisions that I would like you to be aware of when reviewing this backport:
- As noted [here|https://issues.apache.org/jira/browse/YARN-2113?focusedCommentId=16023111&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16023111], I did not backport YARN-5889 because it depends on locking changes from YARN-3140 and other locking JIRAs.
- In trunk, a change was made in YARN-5889 that changed the way {{computeUserLimit}} calculates user limit. In branch-2 and branch-2.8, {{userLimitResource = (all used resources in queue) / (num active users in queue)}}. In trunk after YARN-5889, {{userLimitResource = (all used resources by active users in queue) / (num active users)}}.
-- Since branch-2 and 2.8 use {{all used resources by active users in queue}} instead of {{all used resources in queue}}, it is not necessary to modify {{LeafQueue}} to keep track of when resources are activated and deactivated, as was done in {{UsersManager}} in trunk.
-- However, I did add the activeUsersSet to LeafQueue and all the places it is modified so it can be used to sum active users times weight.
-- Therefore, it wasn't necessary to create a separate UsersManager class as was done in YARN-5889. Instead, I added a small amount of code in ActiveUsersManager to keep track of active users and to indicate when users are either activated or deactivated.
- {{LeafQueue#sumActiveUsersTimesWeights}} should not do anything that synchronizes or locks. This is to avoid deadlocks because it is called by getHeadRoom (indirectly), which is called by {{FiCaSchedulerApp}}.
{code}
  float sumActiveUsersTimesWeights() {
    float count = 0.0f;
    for (String userName : activeUsersSet) {
      User user = users.get(userName);
      count += (user != null) ? user.getWeight() : 1.0f;
    }
    return count;
  }
{code}
-- This opens up a race condition when a user is added to or removed from {{activeUsersSet}} while {{sumActiveUsersTimesWeights}} is iterating over the set.
--- I'm not an expert in Java synchronization. Does this expose {{LeafQueue}} to concurrent modification exceptions?
--- There is no {{ConcurrentHashSet}} so should I make {{activeUsersSet}} a {{ConcurrentHashMap}}?

> Support user-specific minimum user limit percentage in Capacity Scheduler
> -------------------------------------------------------------------------
>
>                 Key: YARN-5892
>                 URL: https://issues.apache.org/jira/browse/YARN-5892
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: capacityscheduler
>            Reporter: Eric Payne
>            Assignee: Eric Payne
>             Fix For: 3.0.0-alpha4
>
>         Attachments: Active users highlighted.jpg, YARN-5892.001.patch, YARN-5892.002.patch, YARN-5892.003.patch, YARN-5892.004.patch, YARN-5892.005.patch, YARN-5892.006.patch, YARN-5892.007.patch, YARN-5892.008.patch, YARN-5892.009.patch, YARN-5892.010.patch, YARN-5892.012.patch, YARN-5892.013.patch, YARN-5892.014.patch, YARN-5892.015.patch, YARN-5892.branch-2.015.patch
>
>
> Currently, in the capacity scheduler, the {{minimum-user-limit-percent}} property is per queue.
> A cluster admin should be able to set the minimum user limit percent on a per-user basis within the queue.
> This functionality is needed so that when intra-queue preemption is enabled (YARN-4945 / YARN-2113), some users can be deemed more important than other users, and resources from VIP users won't be as likely to be preempted.
> For example, if the {{getstuffdone}} queue has a MULP of 25 percent, but user {{jane}} is a power user of queue {{getstuffdone}} and needs to be guaranteed 75 percent, the properties for {{getstuffdone}} and {{jane}} would look like this:
> {code}
>   <property>
>     <name>yarn.scheduler.capacity.root.getstuffdone.minimum-user-limit-percent</name>
>     <value>25</value>
>   </property>
>   <property>
>     <name>yarn.scheduler.capacity.root.getstuffdone.jane.minimum-user-limit-percent</name>
>     <value>75</value>
>   </property>
> {code}
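
On the question raised in the comment about iterating {{activeUsersSet}} without locking: a plain {{HashSet}} iterator is fail-fast, so a concurrent add or remove can surface as a {{ConcurrentModificationException}}. One possible approach, offered only as a sketch and not as what the patch does, is a set view backed by a {{ConcurrentHashMap}}, whose iterators are weakly consistent and never throw that exception. The class name {{ActiveUsersSketch}}, the {{weights}} map, and the method names other than {{sumActiveUsersTimesWeights}} are hypothetical stand-ins for the {{LeafQueue}} state described above. {{Collections.newSetFromMap}} is used because it is available on Java 7; on Java 8 and later, {{ConcurrentHashMap.newKeySet()}} is an equivalent alternative.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the relevant LeafQueue state; not actual YARN code.
final class ActiveUsersSketch {
  // Assumption: per-user weights, standing in for users.get(name).getWeight().
  private final Map<String, Float> weights = new ConcurrentHashMap<String, Float>();

  // Concurrent set view: safe to iterate while other threads add or remove.
  private final Set<String> activeUsersSet =
      Collections.newSetFromMap(new ConcurrentHashMap<String, Boolean>());

  public void setWeight(String userName, float weight) {
    weights.put(userName, weight);
  }

  public void activateUser(String userName) {
    activeUsersSet.add(userName);
  }

  public void deactivateUser(String userName) {
    activeUsersSet.remove(userName);
  }

  // No synchronized block or lock: the iterator is weakly consistent, so the
  // sum may miss or include a user changed mid-iteration, but it cannot throw
  // ConcurrentModificationException.
  public float sumActiveUsersTimesWeights() {
    float count = 0.0f;
    for (String userName : activeUsersSet) {
      Float weight = weights.get(userName);
      count += (weight != null) ? weight : 1.0f; // default weight of 1.0
    }
    return count;
  }

  public static void main(String[] args) {
    ActiveUsersSketch queue = new ActiveUsersSketch();
    queue.setWeight("jane", 3.0f);
    queue.activateUser("jane");
    queue.activateUser("bob"); // no explicit weight, counts as 1.0
    System.out.println(queue.sumActiveUsersTimesWeights()); // prints 4.0
  }
}
```

The trade-off is that the returned sum is a best-effort snapshot: a user activated or deactivated during the loop may or may not be counted. Whether that is acceptable for headroom calculation is a judgment for the reviewers.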