Date: Mon, 23 Jan 2017 10:32:26 +0000 (UTC)
From: "Sunil G (JIRA)"
To: yarn-issues@hadoop.apache.org
Subject: [jira] [Commented] (YARN-5889) Improve user-limit calculation in capacity scheduler

    [ https://issues.apache.org/jira/browse/YARN-5889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15834233#comment-15834233 ]

Sunil G commented on YARN-5889:
-------------------------------

Hi [~eepayne]

Thank you for the detailed comments.

bq. do we need the isAnActiveUser checks in assignContainer and releaseContainer?
bq. I removed these checks in my local build and the application is able to use all of the queue and cluster.

If we remove the active-user check, then {{activeUsersManager.getTotalResUsedByActiveUsers}} will account for all users, and hence it behaves like the old code. But I agree that the computation is not quite correct. For example, *user1* was initially active, and whenever a container was allocated for *user1* we added its resource to {{AUM#TotalResUsedByActiveUsers}}. Once *user1* becomes inactive because it no longer has any outstanding resource requests, its resources have to be removed from {{AUM#TotalResUsedByActiveUsers}} at that point, and this is not happening today. A sketch of the intended bookkeeping is given below.
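To make this concrete, here is a minimal sketch of the kind of per-user bookkeeping being discussed. The class and method names are illustrative only, not the actual {{ActiveUsersManager}} code: usage is added on allocation and removed on release, and, per the fix just described, a user's entire usage is subtracted from the active-user total at the moment the user is deactivated.

{code:java}
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Illustrative sketch only; hypothetical names, not Hadoop's ActiveUsersManager.
public class ActiveUserUsageTracker {
  private final Map<String, Long> usedByUser = new HashMap<>();
  private final Set<String> activeUsers = new HashSet<>();
  private long totalResUsedByActiveUsers = 0;

  // A container was allocated to this user.
  public synchronized void containerAllocated(String user, long resource) {
    usedByUser.merge(user, resource, Long::sum);
    if (activeUsers.contains(user)) {
      totalResUsedByActiveUsers += resource;
    }
  }

  // A container held by this user was released.
  public synchronized void containerReleased(String user, long resource) {
    usedByUser.merge(user, -resource, Long::sum);
    if (activeUsers.contains(user)) {
      totalResUsedByActiveUsers -= resource;
    }
  }

  // The user has new outstanding requests: fold its current usage back in.
  public synchronized void activateUser(String user) {
    if (activeUsers.add(user)) {
      totalResUsedByActiveUsers += usedByUser.getOrDefault(user, 0L);
    }
  }

  // The user has no outstanding requests left. This is the step the
  // comment says is missing today: the now-inactive user's usage must
  // be removed from the active-user total.
  public synchronized void deactivateUser(String user) {
    if (activeUsers.remove(user)) {
      totalResUsedByActiveUsers -= usedByUser.getOrDefault(user, 0L);
    }
  }

  public synchronized long getTotalResUsedByActiveUsers() {
    return totalResUsedByActiveUsers;
  }
}
{code}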
Even after I fix this, there are some changes in behavior, which I can explain.

{noformat}
// User limit resource is determined by:
// max(resourceUsedForActiveUsers / #activeUsers,
//     queueCapacity * user-limit-percentage%)
{noformat}

Now let's assume two cases: (1) {{usedResource < queueCap}} and (2) {{usedResource > queueCap}}.

1. {{resourceUsedForActiveUsers / #activeUsers}} will now be a much smaller value, since we count only the resources used by active users; in the old code, {{total_used / #activeUsers}} is definitely larger. So, per the equation above, UL will be {{queueCapacity * userLimit%}} for a higher MULP (something like 80~99%), and hence UL will be less than queueCapacity. (If MULP is a smaller value, then UL will be lower still.)
2. If {{usedResource > queueCap}}, then UL can exceed the queue capacity for either of two reasons: #active_users is small and the active users' resource usage is more than the queue capacity, OR usedResource, which is already more than queueCap, gets multiplied by a higher MULP value.

Altogether, the first part of the existing UL equation matters only when #active-users is small or MULP is very low in the cluster. I think this is mostly fine; a toy walk-through of the two cases is included after the issue summary below. Please share your thoughts.

> Improve user-limit calculation in capacity scheduler
> ----------------------------------------------------
>
>                 Key: YARN-5889
>                 URL: https://issues.apache.org/jira/browse/YARN-5889
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacity scheduler
>            Reporter: Sunil G
>            Assignee: Sunil G
>         Attachments: YARN-5889.0001.patch, YARN-5889.0001.suggested.patchnotes, YARN-5889.0002.patch, YARN-5889.0003.patch, YARN-5889.0004.patch, YARN-5889.0005.patch, YARN-5889.v0.patch, YARN-5889.v1.patch, YARN-5889.v2.patch
>
>
> Currently, user-limit is computed during every heartbeat allocation cycle under a write lock. To improve performance, this ticket focuses on moving the user-limit calculation out of the heartbeat allocation flow.
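As referenced above, here is a toy walk-through of the two cases. All numbers are invented for illustration (abstract resource units, a hypothetical queue), using the simplified two-term equation quoted in the comment:

{code:java}
// Toy walk-through of the two user-limit cases discussed in the comment.
// All numbers are assumed for illustration only.
public class UserLimitExample {

  // Simplified form of the quoted equation:
  // UL = max(resourceUsedForActiveUsers / #activeUsers, queueCapacity * MULP)
  static double userLimit(double resUsedByActiveUsers, int numActiveUsers,
                          double queueCapacity, double mulp) {
    return Math.max(resUsedByActiveUsers / numActiveUsers,
                    queueCapacity * mulp);
  }

  public static void main(String[] args) {
    double queueCapacity = 100.0;

    // Case 1: usedResource < queueCap. Four active users hold only 40 units,
    // so the per-user share is 10 and the MULP term dominates:
    // UL = max(40 / 4, 100 * 0.9) = 90, below queueCapacity.
    System.out.println(userLimit(40.0, 4, queueCapacity, 0.9)); // 90.0

    // Case 2: usedResource > queueCap with few active users. Two active users
    // hold 240 units, so the per-user share is 120 and UL exceeds queue
    // capacity: UL = max(240 / 2, 100 * 0.9) = 120.
    System.out.println(userLimit(240.0, 2, queueCapacity, 0.9)); // 120.0
  }
}
{code}

Note that in this simplified form the MULP term itself cannot exceed queueCapacity, so the second factor mentioned for case 2 (usedResource above queueCap being multiplied by a high MULP) would only show up if the capacity term tracked consumed resources rather than the configured queue capacity.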