Date: Mon, 23 Jan 2017 10:32:26 +0000 (UTC)
From: "Sunil G (JIRA)"
To: yarn-issues@hadoop.apache.org
Subject: [jira] [Commented] (YARN-5889) Improve user-limit calculation in capacity scheduler

    [ https://issues.apache.org/jira/browse/YARN-5889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15834233#comment-15834233 ]

Sunil G commented on YARN-5889:
-------------------------------

Hi [~eepayne]

Thank you for the detailed comments.

bq. do we need the isAnActiveUser checks in assignContainer and releaseContainer?
bq. I removed these checks in my local build and the application is able to use all of the queue and cluster.

If we remove the active-user check, then {{activeUsersManager.getTotalResUsedByActiveUsers}} will account for all users, and hence it behaves like the old code. But I agree that the computation is not quite correct. For example, *user1* was initially active, and whenever a container was allocated for *user1* we added its resource to {{AUM#TotalResUsedByActiveUsers}}. Once *user1* becomes inactive because it no longer has any outstanding resource requests, its resources have to be removed from {{AUM#TotalResUsedByActiveUsers}} at that point, and this is not happening today. A sketch of the intended bookkeeping is given below.
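To make this concrete, here is a minimal sketch of the kind of per-user bookkeeping being discussed. The class and method names are illustrative only, not the actual {{ActiveUsersManager}} code: usage is added on allocation and removed on release, and, per the fix just described, a user's entire usage is subtracted from the active-user total at the moment the user is deactivated.

{code:java}
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Illustrative sketch only; hypothetical names, not Hadoop's ActiveUsersManager.
public class ActiveUserUsageTracker {
  private final Map<String, Long> usedByUser = new HashMap<>();
  private final Set<String> activeUsers = new HashSet<>();
  private long totalResUsedByActiveUsers = 0;

  // A container was allocated to this user.
  public synchronized void containerAllocated(String user, long resource) {
    usedByUser.merge(user, resource, Long::sum);
    if (activeUsers.contains(user)) {
      totalResUsedByActiveUsers += resource;
    }
  }

  // A container held by this user was released.
  public synchronized void containerReleased(String user, long resource) {
    usedByUser.merge(user, -resource, Long::sum);
    if (activeUsers.contains(user)) {
      totalResUsedByActiveUsers -= resource;
    }
  }

  // The user has new outstanding requests: fold its current usage back in.
  public synchronized void activateUser(String user) {
    if (activeUsers.add(user)) {
      totalResUsedByActiveUsers += usedByUser.getOrDefault(user, 0L);
    }
  }

  // The user has no outstanding requests left. This is the step the
  // comment says is missing today: the now-inactive user's usage must
  // be removed from the active-user total.
  public synchronized void deactivateUser(String user) {
    if (activeUsers.remove(user)) {
      totalResUsedByActiveUsers -= usedByUser.getOrDefault(user, 0L);
    }
  }

  public synchronized long getTotalResUsedByActiveUsers() {
    return totalResUsedByActiveUsers;
  }
}
{code}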
Even after I fix this, there are some changes in behavior, which I can explain.

{noformat}
// User limit resource is determined by:
// max(resourceUsedForActiveUsers / #activeUsers,
//     queueCapacity * user-limit-percentage%)
{noformat}

Now let's assume two cases: (1) {{usedResource < queueCap}} and (2) {{usedResource > queueCap}}.

1. {{resourceUsedForActiveUsers / #activeUsers}} will now be a much smaller value, since we count only the resources used by active users; in the old code, {{total_used / #activeUsers}} is definitely larger. So, per the equation above, UL will be {{queueCapacity * userLimit%}} for a higher MULP (something like 80~99%), and hence UL will be less than queueCapacity. (If MULP is a smaller value, then UL will be lower still.)
2. If {{usedResource > queueCap}}, then UL can exceed the queue capacity for either of two reasons: #active_users is small and the active users' resource usage is more than the queue capacity, OR usedResource, which is already more than queueCap, gets multiplied by a higher MULP value.

Altogether, the first part of the existing UL equation matters only when #active-users is small or MULP is very low in the cluster. I think this is mostly fine; a toy walk-through of the two cases is included after the issue summary below. Please share your thoughts.

> Improve user-limit calculation in capacity scheduler
> ----------------------------------------------------
>
>                 Key: YARN-5889
>                 URL: https://issues.apache.org/jira/browse/YARN-5889
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacity scheduler
>            Reporter: Sunil G
>            Assignee: Sunil G
>         Attachments: YARN-5889.0001.patch, YARN-5889.0001.suggested.patchnotes, YARN-5889.0002.patch, YARN-5889.0003.patch, YARN-5889.0004.patch, YARN-5889.0005.patch, YARN-5889.v0.patch, YARN-5889.v1.patch, YARN-5889.v2.patch
>
>
> Currently, user-limit is computed during every heartbeat allocation cycle under a write lock. To improve performance, this ticket focuses on moving the user-limit calculation out of the heartbeat allocation flow.
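As referenced above, here is a toy walk-through of the two cases. All numbers are invented for illustration (abstract resource units, a hypothetical queue), using the simplified two-term equation quoted in the comment:

{code:java}
// Toy walk-through of the two user-limit cases discussed in the comment.
// All numbers are assumed for illustration only.
public class UserLimitExample {

  // Simplified form of the quoted equation:
  // UL = max(resourceUsedForActiveUsers / #activeUsers, queueCapacity * MULP)
  static double userLimit(double resUsedByActiveUsers, int numActiveUsers,
                          double queueCapacity, double mulp) {
    return Math.max(resUsedByActiveUsers / numActiveUsers,
                    queueCapacity * mulp);
  }

  public static void main(String[] args) {
    double queueCapacity = 100.0;

    // Case 1: usedResource < queueCap. Four active users hold only 40 units,
    // so the per-user share is 10 and the MULP term dominates:
    // UL = max(40 / 4, 100 * 0.9) = 90, below queueCapacity.
    System.out.println(userLimit(40.0, 4, queueCapacity, 0.9)); // 90.0

    // Case 2: usedResource > queueCap with few active users. Two active users
    // hold 240 units, so the per-user share is 120 and UL exceeds queue
    // capacity: UL = max(240 / 2, 100 * 0.9) = 120.
    System.out.println(userLimit(240.0, 2, queueCapacity, 0.9)); // 120.0
  }
}
{code}

Note that in this simplified form the MULP term itself cannot exceed queueCapacity, so the second factor mentioned for case 2 (usedResource above queueCap being multiplied by a high MULP) would only show up if the capacity term tracked consumed resources rather than the configured queue capacity.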